A Quantile Approach for Retrieving the “Core Urban-Suburban-Rural” (USR) Structure Based on Nighttime Light

: Accurate and timely information on the “core urban-suburban-rural” (USR) spatial structure in a metropolitan region is signiﬁcant for both the scientiﬁc and policy-making communities. However, USR is usually considered as a single land use type, such as an impervious area, rather than three combined subcategories in remote-sensing image retrieval, especially for suburban areas, which obscures the details of the urbanization process. In this paper, we propose a quantile approach to retrieve the structure of USR based on stable nighttime light (NTL) data from the Defense Meteorological Satellite Program / Operational Linescan System (DMSP / OLS) and apply it in the Beijing-Tianjin-Hebei (JJJ) of China from 1995 to 2013. The key parameters of the NTL threshold, which is the maximum change point of the NTL intensity at the USR boundary, used to retrieve the three subcategories of USR are automatically deﬁned based on the quantile approach with three iterations. Then, the overall accuracy and consistency of the retrieval results are evaluated using the corresponding visual interpretation map from Landsat images with a 30 m resolution. Moreover, the inﬂuence of parameter uncertainty is compared by introducing the human settlement index (HSI). According to the time-series analysis of USR retrieval in this study, the JJJ experienced rapid urbanization from 1995 to 2013, with the core urban area expanding by 7098 km 2 (average increase of 2.7 times), the suburban area expanding by 12,690 km 2 (average increase of 2.8 times), and the rural area increasing by 4986 km 2 (average increase of 0.38 times). The USR results retrieved based on the approach agree well with the validation of the visual interpretation map, with an overall accuracy (OA) of 0.904 and a kappa coe ﬃ cient (KC) of 0.650 at the city level. The USR result with the HSI as the input shows that NTL is more suitable for USR structure retrieval as the NTL shows less uncertainty compared with other parameters such as the vegetation index (VI). This study proposes an improved quantile approach for USR mapping from NTL images on a regional scale, which will provide a useful method for urbanization dynamics analysis.


Introduction
Urbanization is widely recognized as an important factor that is one of the grand challenges of humanity, affecting the functions of terrestrial ecosystems, climate change [1], population, medical treatment, policy [2][3][4], etc. In China, most subcategories of the extent of the urban area display significant "core urban-suburban-rural" (USR) triad structures [4,5]. Recently, rapid Chinese by combining NTL and VI, LST, or both. However, a reliable estimate of the optimal thresholds of the indices is still the key parameter for mapping the extent of the urban area, which is the same as in the threshold-based approaches. Besides, as the performance of NTL satellite sensors improve, index-based approaches face the challenge of additional hypotheses between the parameters in an urban area and introducing more data uncertainty among parameters (e.g., the retrieved VI and LST from other satellite images), which may reduce the accuracy of the retrieval of the extent of the urban area.
In light of the aforementioned problems and the fact that most studies focus on retrieving the urban area as a whole due to the "disappearance" of the boundaries among the three USR subcategories, especially suburb [33], we improve a quantile approach for retrieving the extent of USR subcategories from NTL images. Furthermore, an index-based parameter of the HSI is applied as another input in the same study area of the JJJ Metropolitan Area in China to evaluate the accuracy of the improved approach. In this paper, USR corresponds to the core urban, suburban, and rural areas that are defined as the residential land types with obvious distinctions in their NTL intensities.

Study Area
The study area of the JJJ, which is both the national capital region of China and largest urbanized region in eastern Asia, lies in North China from 36 • N to 42 • N and from 111 • E to 120 • E (as shown in Figure 1). It includes 13 major cities along the coast of China's Bohai Sea. The JJJ has experienced sustained rapid urbanization since the 1980s. In 2014, the Chinese government proposed the Beijing-Tianjin-Hebei Regional Integration (BRI) regional development project to improve the JJJ as a world-class city cluster, which will significantly promote the JJJ's urbanization process [6].

Dataset Collection and Preprocessing
The main dataset used in this study for retrieving USR was the DMSP/OLS dataset acquired from (https://ngdc.noaa.gov/eog/download.html). The DMSP/OLS NTL datasets from 1995 to 2013, including F12 (1995-1996), F14 (1997-2000), F15 (2001-2003), F16 (2004-2009), and F18 (2010-2013), were chosen to be consistent with other datasets and the rapid urbanization process of the JJJ. In addition, unstable light sources such as fires and ship lights were excluded from the DMSP/OLS NTL, and water and gas flares were removed based on the MODIS water product MOD44W and gas flare masks [21,35,36] for further analysis. The background (nonlight) values and confounding factors of auroras, fires, boats, and other temporal lights were eliminated or reduced [21]. To calculate the HSI, we used the Terra MODIS NDVI data MOD13A3 in this study, which are calendar-month composites at a 1 km spatial resolution. The quality control flags from MOD13A3 were used to remove abnormal pixels caused by clouds, snow, or other geometric problems. The MOD13A3 datasets of July, August, In addition to rapid urbanization over the past several decades, the BRI will inevitably propel the JJJ's urban and population clusters further. However, unplanned urban sprawl requires sufficient land for residential, industrial, and commercial purposes, hospitals, and schools, which generally may consume the precious agricultural and residential land surrounded by cities in the JJJ [34]. Because of the important "Permanent Basic Cropland and Regulation on the Protection of Basic Farmlands" policy in China, suburban areas and rural areas are exposed and particularly vulnerable to core urban area expansion, such as through population suburbanization and industrial transfer to the suburbs. Take the Huilongguan community in Beijing as an example, which used to be a rural area 20 km north Remote Sens. 2020, 12, 4179 4 of 16 of the core urban area before 2000. The community has been a huge living community in Asia with a population of 0.3 million and an area of 8.5 million m 2 . Numerous counties surrounding Beijing are also affected by the metropolis' urban expansion, such as the rapid urbanization of eastern Yanjiao Town and southern Gu'an County, which are in Hebei Province. Therefore, the JJJ Metropolitan Area is a typical study area.

Dataset Collection and Preprocessing
The main dataset used in this study for retrieving USR was the DMSP/OLS dataset acquired from (https://ngdc.noaa.gov/eog/download.html). The DMSP/OLS NTL datasets from 1995 to 2013, including F12 (1995-1996), F14 (1997-2000), F15 (2001-2003), F16 (2004-2009), and F18 (2010-2013), were chosen to be consistent with other datasets and the rapid urbanization process of the JJJ. In addition, unstable light sources such as fires and ship lights were excluded from the DMSP/OLS NTL, and water and gas flares were removed based on the MODIS water product MOD44W and gas flare masks [21,35,36] for further analysis. The background (nonlight) values and confounding factors of auroras, fires, boats, and other temporal lights were eliminated or reduced [21]. To calculate the HSI, we used the Terra MODIS NDVI data MOD13A3 in this study, which are calendar-month composites at a 1 km spatial resolution. The quality control flags from MOD13A3 were used to remove abnormal pixels caused by clouds, snow, or other geometric problems. The MOD13A3 datasets of July, August, and September covering the JJJ were chosen, and the best quality image from each month was applied to estimate the HSI. Furthermore, the land use maps of China produced by the Chinese Academy of Sciences were used in our study. Four annual land use datasets from 1995, 2000, 2005, 2010, which were derived from Landsat image with a 30 m resolution based on visual interpretation (the data sets are provided by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) http://www.resdc.cn) [23,37], were applied for validation in this study. In this study, based on the 30 m land use data (Appendix A), a set of rural-urban scale verification data was produced manually. To maintain the consistency of spatial resolutions with that of the NTL data, some rural settlements with small areas (almost independent) were excluded. All datasets were processed to a 1 km spatial resolution for consistency and the dataset details are listed in the Table 1.

Methods
Although the current research literature shows a wide variety of sources for USR, there is still no common or consistent definition of this term. In particular, suburbs have been insufficiently considered in the land use retrieval from satellite images. USR is developed based on the flow and redistribution of people, materials, and information among the three subcategories of the extent of the urban area [38]; it essentially reveals the variations of human activities and land use, including the significant differences in NTL. In this study, USR is defined as a residential land use type that consists of core urban, suburban, and rural areas with obvious NTL digital number (DN) values. The general flowchart to delineate the extent of the three CSR subcategories is shown in Figure 2. More details are presented in the following sections.

MOD13A 3
The annual 16 day composite MOD13A2 product with 1-km spatial resolution. Band 1 is NDVI and Band 2 is EVI.

Methods
Although the current research literature shows a wide variety of sources for USR, there is still no common or consistent definition of this term. In particular, suburbs have been insufficiently considered in the land use retrieval from satellite images. USR is developed based on the flow and redistribution of people, materials, and information among the three subcategories of the extent of the urban area [38]; it essentially reveals the variations of human activities and land use, including the significant differences in NTL. In this study, USR is defined as a residential land use type that consists of core urban, suburban, and rural areas with obvious NTL digital number (DN) values. The general flowchart to delineate the extent of the three CSR subcategories is shown in Figure 2. More details are presented in the following sections.

A Quantile-Based Algorithm for USR Retrieval
In this paper, we improved a multiple iterating quantile approach, which can ignore the complicated impacts of the patterns of USR's spatial distribution and aggregation, to determine the values of the thresholds among three subcategories of USR. The three subcategories of USR were retrieved based on the NTL values from the core urban to rural areas in turn and concentrated in the entire administrative area. Firstly, other areas (e.g., sand, swamp, and undeveloped areas) with

A Quantile-Based Algorithm for USR Retrieval
In this paper, we improved a multiple iterating quantile approach, which can ignore the complicated impacts of the patterns of USR's spatial distribution and aggregation, to determine the values of the thresholds among three subcategories of USR. The three subcategories of USR were retrieved based on the NTL values from the core urban to rural areas in turn and concentrated in the entire administrative area. Firstly, other areas (e.g., sand, swamp, and undeveloped areas) with values of zero were excluded. Then, all NTL values of the remaining pixels in an administrative area were applied to construct a quantile curve, as shown in Figure 3.
As previously mentioned, optimal thresholds are always important factors when extracting the extent of the urban area from either NTL or related indices. Although the potential extent of USR has various NTL distribution features such as core-urban dominant or the other two subcategories dominant, the key hypothesis of NTL changing rapidly around boundaries among core urban, suburban, and rural areas can be also applied to retrieve the three subcategories of USR ( Figure 4). values of zero were excluded. Then, all NTL values of the remaining pixels in an administrative area were applied to construct a quantile curve, as shown in Figure 3. In (a-c), the x-coordinate is the quantile (Q), which is arranged inversely from 100th to 0th quantile, and the y-coordinate is the DN value of the NTL DMSP data (D). In (d), the threshold corresponds to the USR boundary.
As previously mentioned, optimal thresholds are always important factors when extracting the extent of the urban area from either NTL or related indices. Although the potential extent of USR has various NTL distribution features such as core-urban dominant or the other two subcategories dominant, the key hypothesis of NTL changing rapidly around boundaries among core urban, suburban, and rural areas can be also applied to retrieve the three subcategories of USR ( Figure 4). Based on the assumption that the NTL intensity decreases from urban to rural areas (see Figure  4), the three subcategories of USR were retrieved based on the NTL values from the core urban to rural areas in turn and concentrated in the entire administrative area. In the algorithm, the quantile method selects the threshold from low to high (from rural to urban), and the part of the NTL lower than the threshold is excluded to determine the maximum boundary of the USR subcategories. The NTL data within the maximum boundary are used as the new inputs to continue the next iteration process. When the iterations are finished, the range of the different subcategories of USR can be obtained. In (a-c), the x-coordinate is the quantile (Q), which is arranged inversely from 100th to 0th quantile, and the y-coordinate is the DN value of the NTL DMSP data (D). In (d), the threshold corresponds to the USR boundary.
were applied to construct a quantile curve, as shown in Figure 3. In (a-c), the x-coordinate is the quantile (Q), which is arranged inversely from 100th to 0th quantile, and the y-coordinate is the DN value of the NTL DMSP data (D). In (d), the threshold corresponds to the USR boundary.
As previously mentioned, optimal thresholds are always important factors when extracting the extent of the urban area from either NTL or related indices. Although the potential extent of USR has various NTL distribution features such as core-urban dominant or the other two subcategories dominant, the key hypothesis of NTL changing rapidly around boundaries among core urban, suburban, and rural areas can be also applied to retrieve the three subcategories of USR ( Figure 4). Based on the assumption that the NTL intensity decreases from urban to rural areas (see Figure  4), the three subcategories of USR were retrieved based on the NTL values from the core urban to rural areas in turn and concentrated in the entire administrative area. In the algorithm, the quantile method selects the threshold from low to high (from rural to urban), and the part of the NTL lower than the threshold is excluded to determine the maximum boundary of the USR subcategories. The NTL data within the maximum boundary are used as the new inputs to continue the next iteration process. When the iterations are finished, the range of the different subcategories of USR can be obtained. Based on the assumption that the NTL intensity decreases from urban to rural areas (see Figure 4), the three subcategories of USR were retrieved based on the NTL values from the core urban to rural areas in turn and concentrated in the entire administrative area. In the algorithm, the quantile method selects the threshold from low to high (from rural to urban), and the part of the NTL lower than the threshold is excluded to determine the maximum boundary of the USR subcategories. The NTL data within the maximum boundary are used as the new inputs to continue the next iteration process. When the iterations are finished, the range of the different subcategories of USR can be obtained.
Specifically, quantile curves are constructed by calculating the intensity from percentiles 0-100 in the NTL data and arranging them in reverse order. Taking the line between the starting point and the end point of the curve as the reference line, the point farthest from the reference line is defined as the turning point. The key to retrieve the USR area is to find the NTL intensity corresponding to the turning point via the quantile curve. The calculation process is as follows ( Figure 5).
Specifically, quantile curves are constructed by calculating the intensity from percentiles 0-100 in the NTL data and arranging them in reverse order. Taking the line between the starting point and the end point of the curve as the reference line, the point farthest from the reference line is defined as the turning point. The key to retrieve the USR area is to find the NTL intensity corresponding to the turning point via the quantile curve. The calculation process is as follows ( Figure 5).  Figure 5. USR NTL threshold determination.
In the first iteration, the quantile curve of the NTL data of the whole target area is calculated. Through the distance between the curve and the reference line, the threshold d at the maximum light intensity change in the target area can be obtained (Figure 3a).
reflects the sudden change of the NTL from nothing, that is, the border between rural areas and other areas (basically no lighting areas). The part of the NTL data whose NTL intensity is less than is removed to determine the input data of the second iteration.
The second iteration is the same as the first iteration. In the quantile curve constructed in the second iteration, the NTL intensity ( Figure 3b) corresponding to the turning point is the boundary line between the rural and suburban areas. In addition, the part of the input data in the previous step whose NTL intensity is less than is deleted as the input of the third iteration. In the third iteration, the input data do not include other regions and rural areas, which is different from the above two iterations.
( Figure 3c) has two cases on the quantile curve. When < , the corresponding NTL intensity at the turning point, , is the boundary between the suburban and urban areas; when = , there is no obvious mutation. According to the definition of the NTL intensity of USR, > > > ℎ , and the urbanization is not completely reversible in time; therefore, the rural area retrieved in the second iteration should be classified as suburban. The spatial distribution range of USR is obtained by stacking the results of the three iterations.

Validation Algorithm
Visual interpretation of the land use map from the 30 m [37] resolution Landsat multispectral images are recognized as a relatively exact dataset that can be used to validate the accuracy of the improved approach. Then, the 30 m resolution land use data were upscaled to a 1 km spatial resolution to match the resolution of the NTL and HSI data. Moreover, in the existing land use data, suburban and rural areas are usually combined as rural residential land types. We also followed this principle in the evaluation of the results. The two accuracy indices of the kappa coefficient (KC) [39] and overall accuracy (OA) were used in this paper based on calculation of the confusion matrix (as shown in Table 2). In the first iteration, the quantile curve of the NTL data of the whole target area is calculated. Through the distance between the curve and the reference line, the threshold d at the maximum light intensity change in the target area can be obtained (Figure 3a). D rural reflects the sudden change of the NTL from nothing, that is, the border between rural areas and other areas (basically no lighting areas). The part of the NTL data whose NTL intensity is less than D rural is removed to determine the input data of the second iteration.
The second iteration is the same as the first iteration. In the quantile curve constructed in the second iteration, the NTL intensity D suburban (Figure 3b) corresponding to the turning point is the boundary line between the rural and suburban areas. In addition, the part of the input data in the previous step whose NTL intensity is less than D suburban is deleted as the input of the third iteration.
In the third iteration, the input data do not include other regions and rural areas, which is different from the above two iterations. D urban (Figure 3c) has two cases on the quantile curve. When D urban < DN max , the corresponding NTL intensity at the turning point, D urban , is the boundary between the suburban and urban areas; when D urban = DN max , there is no obvious mutation. According to the definition of the NTL intensity of USR, DN urban > DN suburban > DN rural > DN other , and the urbanization is not completely reversible in time; therefore, the rural area retrieved in the second iteration should be classified as suburban. The spatial distribution range of USR is obtained by stacking the results of the three iterations.

Validation Algorithm
Visual interpretation of the land use map from the 30 m [37] resolution Landsat multispectral images are recognized as a relatively exact dataset that can be used to validate the accuracy of the improved approach. Then, the 30 m resolution land use data were upscaled to a 1 km spatial resolution to match the resolution of the NTL and HSI data. Moreover, in the existing land use data, suburban and rural areas are usually combined as rural residential land types. We also followed this principle in the evaluation of the results. The two accuracy indices of the kappa coefficient (KC) [39] and overall accuracy (OA) were used in this paper based on calculation of the confusion matrix (as shown in Table 2).

Others R and S Urban
Others Where C ij is the number of pixles that category i in verification data classified as prediction category j.
The KC is an index used to test the spatial consistency of two maps and can also be used to measure classification accuracy. The KC is expressed by where p o and p e can be defined as: where C ii is calculated from Table 2, C i, * is the total number of pixels in the ith row in Table 2, C * ,i is the total number of pixels in the ith column in Table 2, and n is the total number of pixels in the prediction map from the proposed method. The value of KC is between 0 and 1, which has been used for assessing maps agreement in common research [40,41]. Generally, the KC can be divided into five levels [38]: slight (0.0~0.20), fair (0.21~0.40), moderate (0.41~0.60), substantial (0.61~0.80), and almost perfect (0.81~1).
OA represents the proportion of the correct categories in the total categories, which can be expressed by where C ii and n are also calculated from Table 2 as Equation (2). In this paper, the OA refers to the probability that the USR subcategory is consistent with the 30 m resolution land use data.

Spatial Distribution of Retrieved USR
Using the above method, we mapped the extent of the USR of the JJJ from 1995 to 2013 based on NTL data ( Figure 6). As shown in Figure 6, since 1995, the trend of the urbanization of the JJJ has been obvious, and the extent of the urban area has increased significantly. By 2013, The urban areas of Beijing, Tianjin, and Hebei retrieved by the quantile method were 3253 km 2 , 1528 km 2 , and 6720 km 2 , respectively, (in the 2014 Beijing Land Use Statistical Yearbook, the urban area was 2977.59 km 2 , and the other two places lack relevant statistical items [42]) and the extent of the urban area has increased by 7098 km 2 (an increase of approximately 2.7 times). In addition, the gap between suburban and rural areas has narrowed in terms of the NTL intensity. By 2004, there was no significant difference in the NTL intensity between the suburban and rural areas of Beijing. Tianjin showed the same trend. Around 2006, the gap between the intensity of the NTL in the rural and suburban areas of Tianjin almost disappeared. In Beijing and Tianjin, the rural area increased to approximately 3417 km 2 before 2004, and the rural areas gradually merged with the suburban areas after 2004. Different from Beijing and Tianjin, in the process of urbanization in Hebei Province, besides the increase of the urban area by 3955 km 2 , the rural area also expanded by approximately 8742 km 2 (an increase of approximately 0.94 times, higher than the average growth of Beijing and Tianjin), and the distribution was more concentrated and accompanied by a trend of consolidation.
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 16 0.94 times, higher than the average growth of Beijing and Tianjin), and the distribution was more concentrated and accompanied by a trend of consolidation.  Table 2 shows the classification accuracies for Beijing, Tianjin, and Hebei Province referring to the visual interpretation results. By analyzing the retrieval results in 1995, 2000, 2005, and 2010, it can be seen that the quantile method based on NTL has more stable overall classification accuracy and KC and its effects are slightly different in different regions. The OA and KC in Beijing and Tianjin are significantly higher than those in Hebei. The average OA in Beijing and Tianjin reached 0.904, and the KC averaged 0.650. According to the consistency classification of the KC, the overall classification results of Beijing and Tianjin are highly consistent [39]. In Hebei, the results are generally consistent. When USR is retrieved by NTL, the results are significantly affected by the scale of the target region.

Evaluation
As shown in Table 3, the retrieval accuracies in Beijing and Tianjin are higher than that in Hebei. The differences in the development levels among the cities in Hebei Province are also reflected in the differences in the intensity of NTL. When retrieved by the scope of the entire province, a higher NTL intensity weakens the rural-urban NTL intensity difference in areas with low NTL intensity, and it makes this difference more obvious between different cities in the province. As a result, areas with low NTL are mistakenly classified as rural or suburban areas.   Table 2 shows the classification accuracies for Beijing, Tianjin, and Hebei Province referring to the visual interpretation results. By analyzing the retrieval results in 1995, 2000, 2005, and 2010, it can be seen that the quantile method based on NTL has more stable overall classification accuracy and KC and its effects are slightly different in different regions. The OA and KC in Beijing and Tianjin are significantly higher than those in Hebei. The average OA in Beijing and Tianjin reached 0.904, and the KC averaged 0.650. According to the consistency classification of the KC, the overall classification results of Beijing and Tianjin are highly consistent [39]. In Hebei, the results are generally consistent. When USR is retrieved by NTL, the results are significantly affected by the scale of the target region.

Evaluation
As shown in Table 3, the retrieval accuracies in Beijing and Tianjin are higher than that in Hebei. The differences in the development levels among the cities in Hebei Province are also reflected in the differences in the intensity of NTL. When retrieved by the scope of the entire province, a higher NTL intensity weakens the rural-urban NTL intensity difference in areas with low NTL intensity, and it makes this difference more obvious between different cities in the province. As a result, areas with low NTL are mistakenly classified as rural or suburban areas. In order to compare the retrieval accuracies of different types of land in rural, suburban, and urban areas, we separately calculated the accuracies of core urban land, rural land, suburban land, and others. The results are shown in Table 4. In Beijing and Tianjin, the "others" type of area has large surface area, and the intensity of the NTL is almost zero, which makes them easier to distinguish. The classification accuracy of the "others" type is the highest with an average of 0.939. Due to the relative concentration of the urban area and the strongest intensity of NTL, the classification accuracy of the urban type is the second best with an average of 0.854. Rural settlements are easy to classify into other areas because their discrete distribution and weak NTL, thus the minimum average accuracy is 0.825. However, with the development of rural areas and the continuous infrastructure improvements, the area and the level of NTL is increasing each year. The retrieval accuracy has also improved, and fewer rural settlements have been left out.
For further validation, we compared the results of quantile methods with the retrieval results of HSI, VANUI, and other indexes based on global-fixed threshold methods, which were acquired from previous studies [27]. As shown in Table 5, the USR retrieval of NTL data using the quantile method achieved relatively good quality results. The OA and KC of Beijing and Tianjin were almost the same as those in the previous method. However, due to the feature of automatic selecting of the optimal thresholds and un-using additional parameters, the retrieval process of the quantile method is more convenient and efficient than are the fixed threshold methods. In addition, we also introduced HSI into the threshold method in the discussion subsection so as to compare the USR retrieval results based on different data (or indicators) through the quantile method.

The Problem of Retrieving Rural Settlements
The spatial distribution of rural settlements is more discrete than that of metropolitan areas. The retrieval of rural areas using DMSP/OLS data is limited to indicating the scope of the rural distribution rather than identifying the discrete distribution of rural settlements. Rural settlements will be missed when the following two situations exist:

1.
When scattered, a DMSP/OLS sensor cannot detect the NTL intensity, and thus rural settlements will be missed. In addition, due to the blooming effect of NTL, rural settlements cannot be accurately distinguished. Due to the relatively discrete distribution of rural settlements, when there are many rural settlements, the presence of the blooming effect of the NTL causes the surrounding pixels to be identified as rural areas. As a result, the discretely distributed rural settlements are merged into one area, and so only their approximate scope can be retrieved. Unlike rural settlements, the urban land itself has a more concentrated distribution and larger area. When retrieving the urban area, the impact of the blooming effect is mostly concentrated within the city, and so the impact on the entire urban area is small (Figure 7 Case I).

2.
When using data from earlier years, NTL is not an effective way to identify rural areas, since in the past the economic development in those areas was relatively minor and there was a power shortage in some rural areas. With the construction of the rural infrastructure and the promotion of corresponding policies, the situation of rural electricity consumption has been greatly improved, and the situation of being omitted due to the absence of NTL is gradually reduced (Figure 7 Case II).
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 16 Unlike rural settlements, the urban land itself has a more concentrated distribution and larger area. When retrieving the urban area, the impact of the blooming effect is mostly concentrated within the city, and so the impact on the entire urban area is small (Figure 7 Case I). 2. When using data from earlier years, NTL is not an effective way to identify rural areas, since in the past the economic development in those areas was relatively minor and there was a power shortage in some rural areas. With the construction of the rural infrastructure and the promotion of corresponding policies, the situation of rural electricity consumption has been greatly improved, and the situation of being omitted due to the absence of NTL is gradually reduced (Figure 7 Case II).

Differences between the Results Based on HSI or NTL
Lu, D et al. proposed the human settlement index (HSI) for mapping the extent of the urban area by considering both NTL and vegetation status, which is widely used in mapping the extent of the urban area. The index assumes that the NTL are high and the vegetation index is low in residential areas [1]. In order to compare the impacts of introducing other data or indicators on USR retrieval results, we applied the HSI as an input in the study area of the JJJ. The HSI, which combines NTL and NDVI data, can be expressed as: where is the normalized night lights value. After replacing the NTL data with the HSI data, the iterative process of the quantile method is performed. Compared with the results retrieved by the quantile method based on the NTL DN values, the OA and KC calculated using the HSI are poor. The results using the HSI are obviously biased in OA, and the average OA in the JJJ is only 0.8. The average KC is 0.485. According to the KC classification standard, the retrieval results only reach general consistency. Different from the NTLbased quantile method, the HSI is less affected by the scale of the study area. In the JJJ, the USR retrieval results of Beijing, Tianjin, and Hebei are more consistent. However, some discretely distributed areas in the city can be better retrieved. Compared with the results of NTL, the blooming effect is lower (as shown in Table 6). Therefore, it is possible to reduce the problem of excessively large USR retrieval.

Differences between the Results Based on HSI or NTL
Lu, D et al. proposed the human settlement index (HSI) for mapping the extent of the urban area by considering both NTL and vegetation status, which is widely used in mapping the extent of the urban area. The index assumes that the NTL are high and the vegetation index is low in residential areas [1]. In order to compare the impacts of introducing other data or indicators on USR retrieval results, we applied the HSI as an input in the study area of the JJJ. The HSI, which combines NTL and NDVI data, can be expressed as: (1 − NDVI max ) + OLS nor (1 − OLS nor ) + NDVI max + OLS nor * NDVI max (5) where OLS nor is the normalized night lights value. After replacing the NTL data with the HSI data, the iterative process of the quantile method is performed. Compared with the results retrieved by the quantile method based on the NTL DN values, the OA and KC calculated using the HSI are poor. The results using the HSI are obviously biased in OA, and the average OA in the JJJ is only 0.8. The average KC is 0.485. According to the KC classification standard, the retrieval results only reach general consistency. Different from the NTL-based quantile method, the HSI is less affected by the scale of the study area. In the JJJ, the USR retrieval results of Beijing, Tianjin, and Hebei are more consistent. However, some discretely distributed areas in the city can be better retrieved. Compared with the results of NTL, the blooming effect is lower (as shown in Table 6). Therefore, it is possible to reduce the problem of excessively large USR retrieval. We consider this to have been caused by the shortcomings of the HSI. While using the HSI, the assumption that the relationship between the NDVI and NTL follows the power law is introduced. When NTL reaches its maximum value, the NDVI is close to zero, and the HSI will grow exponentially. Some studies' calculations of the HSI of global cities show that it overcorrects the saturation of urban areas, resulting in a reduction in the data of urban boundary areas [31]. In previous studies, in many regions of the world, urban expansion takes place in areas with reduced or lower vegetation [43]. With the development of China's rural economy and the promotion of the construction of ecological civilization, rural infrastructure has become more complete and urban greening has gradually improved. This makes the difference in the degree of vegetation coverage and the intensity of the NTL between rural and urban areas gradually smaller. The assumption that the HSI retrieves USR is further weakened, which ultimately leads to unsatisfactory results in the final retrieval. This also proves that the retrieval of the structure of USR and the introduction of parameters other than NTL (e.g., VI, LST) will increase the uncertainty in the retrieval results.

The Weakness of DMSP and Potential Problems
DMSP data have some weaknesses, such as bloom effect and oversaturation, which may also reduce the accuracy of the proposed method. Compared with other night light sensors such as VIIRS (750 m spatial resolution) and Luojia (130 m spatial resolution), the spatial resolution is lower [24]. Bloom effect may lead to the combination of lighting in a certain range, which cannot be accurately expressed for the scattered rural areas. Moreover, the problem of oversaturation will gradually blur the boundaries between rural, suburban, and core-urban areas in the process of human activities, which may lead to the disappearance of the difference in NTL intensity of USR in DMSP data. In recent studies, other sensors were processed for spatiotemporal consistency, which adjusted for the defects of DMSP data [44]. For example, by using the data of VIIRS, the problem of merging scattered regions caused by bloom effect can be further suppressed, and there is no oversaturation problem in VIIRS, which can keep the difference of NTL intensity between USR.
In addition, with the development of lighting methods and related technologies, the light radiation also changes in NTL data [45]. Compared to different times, different space ranges at the same time are more important. Table 3 confirms the influence of the space range on the results of the quantile method. For a scale such as the municipal level, the difference in infrastructure construction (such as lights) and human activities is smaller than that of the whole province, and the spatial impact is also relatively small. As for the development of new technologies such as LED, controlling the scope of the research area can better control the lighting difference in the space to reduce the influence of lighting mode on USR retrieval results. However, in the developed countries in Europe or in the United States, there may be counter-urbanization phenomenon, so the retrieval results of USR structure may need to be further modified.

Conclusions
Considering how few studies previously focused on retrieving the three sub-categories of "core urban-suburban-rural" from nighttime light images, especially suburb areas, we proposed an improved quantile approach for retrieving the three USR subcategories from NTL images, which automatically defines the boundary thresholds (D rural , D suburban , and D urban ) of rural, suburban, and urban areas using DMSP NTL data without introducing empirical knowledge and additional data to retrieve the USR structure. Then, the approach was applied to USR retrieval for the JJJ from 1995 to 2013. According to the retrieval results, the JJJ has experienced rapid urbanization during the past decades. The core urban and suburban areas have increased significantly, and they more than doubled (7098 and 12,690 km 2 , respectively). Meanwhile, the rural areas expanded at a slower speed by approximately 0.38 times (increase of 4986 km 2 ). In Beijing and Tianjin, the average OA was 0.904 and the average KC was 0.650. Compared with the former method based on a fixed threshold, the quantile method further improves the retrieval accuracy. Through comparison with NTL and the HSI, it was found that NTL DN was more suitable for USR retrieval, being 12.5% on average higher than that of the HSI (from 0.803 to 0.904, respectively). Furthermore, the overall consistency was improved from the general consistency (KC of 0.485) of the HSI to substantial (KC of 0.650) for NTL DN. The retrieval results show that the increase in the parameter uncertainty and the deviation between the original hypothesis and the actual situation may have a serious impact on the retrieval of the USR structure. Due to the blooming effect of NTL, the actual results may overestimate the borders of the countryside, and there will also be missing points for areas that do not meet the assumption of NTL. With the improvement of the spatial resolution of NTL data and sensor sensitivity (such as Luojia No. 1 data), the accuracy of the retrieval of rural settlements with discrete partitions might be further improved. At the city level, the proposed quantile approach achieves similar results to that of the visual interpretation of USR retrieval, with lower labor and time costs, which increases the automation of the method. We believe that the proposed approach will provide an efficient, low-cost method for research on urbanization or urban expansion.