A Simulation Study on the Urban Population of China Based on Nighttime Light Data Acquired from DMSP/OLS

The urban population (UP) is one of the most direct indicators of the urbanization process and the impacts of human activities. The dynamics of the UP are of great importance to the study of urban economic and social development and resource utilization. Currently, China lacks long time series UP data with consistent standards and comparability over time. The nighttime light images from the Defense Meteorological Satellite Program's (DMSP) Operational Linescan System (OLS) allow the acquisition of continuous and highly comparable long time series UP information. However, existing studies mainly focus on simulating the total population or population density based on nighttime light data, and few have focused on simulating the UP in China. Based on three regression models (linear, power function, and exponential), the present study discusses the relationship between DMSP/OLS nighttime light data and the UP and establishes optimal regression models for simulating the UPs of 339 major cities in China from 1990 to 2010. In addition, the present study evaluates the accuracy of UP and non-agricultural population (NAP) simulations conducted using the same method. The simulation results show that, at the national level, the power function model is the optimal regression model between DMSP/OLS nighttime light data and UP data for 1990–2010. At the provincial scale, the optimal regression model varies among provinces, and the linear regression model is optimal for more than 60% of the provinces. In addition, the comparison results show that at the national, provincial, and city levels, the fitting results of the UP based on DMSP/OLS nighttime light data are better than those of the NAP. Therefore, DMSP/OLS nighttime light data can be used to effectively retrieve the UP of a large-scale region. In the context of frequent population flows between urban and rural areas in China and the difficulty of obtaining accurate UP data, this study provides a timely and effective method for solving this problem.


Introduction
Sustainability 2016, 8, 521; doi:10.3390/su8060521; www.mdpi.com/journal/sustainability
Currently, countries worldwide, and developing countries in particular, are undergoing an unprecedented urbanization process [1]. As one of the most direct indicators of the urbanization process and the impacts of human activities, the urban population (UP) is of great importance to urban economic and social development and resource utilization [2]. Since the onset of economic reforms, the proportion of the UP in China has increased from 17.9% in 1978 to 52.6% in 2012. If the current trend continues, the UP in China will reach one billion over the next 20 years [3,4]. Meanwhile, China is implementing a new urbanization strategy, called the "New-Type Urbanization Plan". Therefore, the accuracy of the UP indicator is directly related to judgments on urbanization processes in China. Acquiring information on the UP of China in a timely and accurate manner is of great significance for formulating scientific urban system plans while reasonably optimizing the UP distribution.
Currently, two common methods are used to acquire information on the UP in China. The first method involves acquiring information on the UP through a census. Since the founding of the People's Republic of China, China has conducted six national censuses (in 1953, 1964, 1982, 1990, 2000, and 2010), in which the population was counted on a general, door-to-door, and person-by-person basis. These censuses have facilitated the acquisition of relatively accurate statistical data on the UP [5–8]. However, acquiring information on the UP through a census still presents some disadvantages. First, each census was completed at the cost of large amounts of manpower and material resources. Second, the interval between two consecutive censuses was approximately 10 years, so the data lack timeliness. As a result, statistical data on the UP acquired through censuses have limited research relevance and practical applicability. More importantly, of the six censuses conducted, those conducted in 1953, 1964, and 1982 employed a relatively consistent statistical definition of the UP that nevertheless differed from the one used for the censuses conducted in 1990, 2000, and 2010. This difference in statistical definitions undermines the continuity and comparability of the data [7,9,10].
The second common method involves substituting the non-agricultural population (NAP) for the UP. Owing to their relatively high continuity, data on the NAP have been extensively used in statistical publications and in urban studies in China. However, substituting the NAP for the UP also presents many problems. Most importantly, there is a significant difference between the UP and the NAP, and the two cannot be equated precisely. By definition, the UP refers to the population that lives within the boundaries of cities and is statistically determined mainly on the basis of the division between urban and rural areas. In comparison, the NAP refers to the population that is engaged in non-agricultural production activities together with its supported population, and it is statistically determined on the basis of the agricultural or non-agricultural status recorded in the household registration system [11]. In 2010, the UP accounted for 49.68% of the total population of China, whereas the NAP accounted for only 34.17% of the total population. The difference between the UP and the NAP was approximately 200 million. This difference is mainly attributable to the fact that the numerous migrant workers in cities who originated from rural areas are not recorded in a given city's NAP; thus, substituting the NAP for the UP often underestimates the real UP. This problem is particularly prominent in large cities with high labor demands but strict control over household registration [12]. Therefore, China currently lacks a long time series of UP data that are accurate, consistent, and comparable over time [6].
Therefore, the present study aims to develop a means of acquiring long time series information on the UP in China based on DMSP/OLS nighttime light data. The method can compensate for current deficiencies in statistical UP data for China. Specifically, the main objectives of the present study are to (1) compare and discuss the relationship between DMSP/OLS nighttime light data and the UP based on multiple regression models and (2) simulate the UP of 339 major cities in China for 1990 to 2010 based on the established optimal regression models and evaluate the accuracy of the simulation results.

Study Area and Data
Three main types of data were used in the present study: DMSP/OLS stable nighttime light data, statistical data, and geographical information system (GIS) auxiliary data (Figure 1).
The first type of data is DMSP/OLS stable nighttime light data, which are provided by the National Geophysical Data Center (NGDC) of the National Oceanic and Atmospheric Administration (NOAA). These data record stable lights emitted from urban areas, rural areas, and other locations, exclude occasional noise such as flames, and have a spatial resolution of 30 arc-seconds. The pixel value represents the mean light intensity (range: 1-63). The stable nighttime light data are created through the strict screening of all usable data archived by the DMSP/OLS each year and through the de-clouding of selected data. Because the DMSP/OLS time series is collected by several sensors on different satellites without on-board radiometric calibration, the differences between sensors, the differences in overpass time between satellites, and the deterioration of sensors all affect the comparability of the nighttime light data. Therefore, relatively large differences can be found between data acquired by different satellites and between data acquired by the same satellite in different years. In other words, the raw data are not directly comparable, which limits the practical application of nighttime light data [39–41]. Following the method proposed by Liu et al. [25], the data for mainland China from 1992 to 2010 were calibrated. Specifically, the stable nighttime light data were subjected to a series of calibration treatments, including relative radiometric calibration, intra-annual calibration, and inter-annual series calibration, to enhance the continuity and comparability of the data. The grid was resampled to a resolution of 1 km and reprojected to the Albers equal-area projection.
The second type of data is statistical data on the UP and NAP in China for 1990, 2000, and 2010. Considering the lack of statistical data on the populations of Hong Kong, Macao, and Taiwan, we examined 339 major cities in 31 provinces and regions in China as our study area. UP data were taken from the censuses, and NAP data were taken from the statistical yearbooks of Chinese cities [43].
The third type of data is GIS auxiliary data, namely 1:4,000,000-scale data on the administrative boundaries of provinces and prefecture-level cities in China published on the National Geomatics Center of China website [44].
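The regressions in the following sections use a per-city total of the light values. As a minimal sketch of that aggregation step, the snippet below sums the digital numbers of lit pixels inside each city on the 1 km grid. It assumes the city boundaries have already been rasterized onto the same grid as an integer label array; the function name `total_digital_number`, the label convention (-1 for pixels outside every city), and the toy arrays are illustrative assumptions, not part of the study's workflow.

```python
import numpy as np

def total_digital_number(dn, city_id, n_cities):
    """Sum the DN of lit pixels (DN >= 1) inside each city.

    dn       : 2-D array of stable-light digital numbers on the 1 km grid
    city_id  : 2-D integer array of the same shape; values 0..n_cities-1
               label the city a pixel belongs to, -1 marks pixels outside
               every city (hypothetical convention)
    n_cities : number of cities, so that unlit cities still get a 0 entry
    """
    dn = np.asarray(dn)
    city_id = np.asarray(city_id)
    inside = (city_id >= 0) & (dn >= 1)
    # bincount with weights performs a zonal sum in a single pass
    return np.bincount(city_id[inside], weights=dn[inside], minlength=n_cities)

# Toy 3 x 3 grid with two hypothetical cities
dn = np.array([[0, 5, 63],
               [10, 0, 20],
               [3, 7, 0]])
city = np.array([[0, 0, 1],
                 [0, -1, 1],
                 [-1, 1, 1]])
tdn = total_digital_number(dn, city, 2)
```

With real data, the label array would come from rasterizing the 1:4,000,000 boundary polygons onto the Albers grid, which requires a GIS library and is omitted here.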

Regression Method
Regression analysis is a common method used to establish the relationship between the total digital number (TDN) of DMSP/OLS nighttime light data and population data. Accordingly, urban populations in 1990, 2000, and 2010 were simulated with regression models. Because the DMSP/OLS dataset begins in 1992, we used the nighttime light data for 1992 and the census data for 1990 to develop the simulation model for the urban population in 1990. In accordance with the literature [45], we examined the relationship between the DMSP/OLS nighttime light and the UP using three regression models: the linear regression model, the exponential regression model, and the power function regression model. The equations used were as follows:
$$UP_i = a \cdot TDN_i + b \qquad (1)$$
$$UP_i = m \cdot e^{n \cdot TDN_i} \qquad (2)$$
$$UP_i = p \cdot TDN_i^{\,q} \qquad (3)$$
where UP_i represents the statistical value of the UP of city i; TDN_i represents the total light brightness of city i; and a, b, m, n, p, and q represent the corresponding parameters of the regression equations.
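The three candidate models can be fitted with ordinary least squares. The sketch below is one possible implementation, assuming the conventional forms UP = a·TDN + b (linear), UP = m·exp(n·TDN) (exponential), and UP = p·TDN^q (power function). The exponential and power models are fitted after a logarithmic transformation, and R² is computed on the original scale; both choices are assumptions, since the text does not state how the models were estimated. The sample values are synthetic.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination on the original (untransformed) scale."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_models(tdn, up):
    """Fit linear, exponential, and power models of UP against TDN.

    Both inputs must be strictly positive because logarithms are taken.
    Returns {model_name: {"params": tuple, "r2": float}}.
    """
    tdn = np.asarray(tdn, dtype=float)
    up = np.asarray(up, dtype=float)
    results = {}

    # Linear: UP = a * TDN + b (polyfit returns the slope first)
    a, b = np.polyfit(tdn, up, 1)
    results["linear"] = {"params": (a, b), "r2": r_squared(up, a * tdn + b)}

    # Exponential: ln(UP) = ln(m) + n * TDN
    n, ln_m = np.polyfit(tdn, np.log(up), 1)
    m = np.exp(ln_m)
    results["exponential"] = {"params": (m, n),
                              "r2": r_squared(up, m * np.exp(n * tdn))}

    # Power: ln(UP) = ln(p) + q * ln(TDN)
    q, ln_p = np.polyfit(np.log(tdn), np.log(up), 1)
    p = np.exp(ln_p)
    results["power"] = {"params": (p, q), "r2": r_squared(up, p * tdn ** q)}
    return results

# Synthetic data that follow UP = 2 * TDN^0.5 exactly
tdn_sample = np.array([100.0, 400.0, 900.0, 1600.0, 2500.0])
up_sample = 2.0 * tdn_sample ** 0.5
fits = fit_models(tdn_sample, up_sample)
best = max(fits, key=lambda name: fits[name]["r2"])
```

Statistical packages often report R² for the exponential and power models on the log-transformed scale instead, which can change the ranking of the models, so the scale should be fixed before comparing results.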

Optimal Regression Model
Due to the significant regional heterogeneities of mainland China, we selected the optimal regression models at two scales: the national and provincial scales (Figure 2). At the national scale, regressions were conducted between the UP and the TDN in 339 major cities of mainland China. At the provincial scale, the regression analysis used the UP and the TDN in each province. Since directly controlled municipalities (i.e., Beijing, Tianjin, Shanghai, and Chongqing) in China are also provincial-level administrative units, data on the UPs and TDNs in Beijing and Tianjin were combined with their counterparts in Hebei Province; data in Shanghai were combined with their counterparts in Jiangsu Province; and data in Chongqing were combined with their counterparts in Sichuan Province. The optimal regression models at the two scales were determined by the coefficient of determination (R²) and the significance test of the regression equations. At the provincial scale, when none of the three regression models passed the significance test, the national optimal regression model was used as the optimal regression model for the given province/region. When different forms of optimal regression were obtained for a given province in different years, the regression form that occurred most frequently was used as the optimal regression for that province over time.
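The selection rules described above can be sketched as follows. This is an illustrative reading rather than the study's own code: it assumes that R² and a p-value are already available for each model and year, applies the significance threshold year by year, and resolves ties in the majority vote in favour of the model selected in the earliest year, a detail the text does not specify. The function names and the numbers in the example are hypothetical.

```python
from collections import Counter

def best_for_year(fits, alpha=0.05):
    """Return the significant model with the highest R², or None.

    fits: {model_name: (r2, p_value)} for one region and one year.
    """
    significant = {name: r2 for name, (r2, p) in fits.items() if p < alpha}
    if not significant:
        return None
    return max(significant, key=significant.get)

def optimal_model(fits_by_year, national_model, alpha=0.05):
    """Pick one model form for a province across all years.

    Falls back to the national optimal model when no model is
    significant in any year; otherwise returns the most frequent
    yearly winner.
    """
    picks = [best_for_year(fits, alpha) for fits in fits_by_year.values()]
    picks = [name for name in picks if name is not None]
    if not picks:
        return national_model
    return Counter(picks).most_common(1)[0][0]

# Hypothetical province: linear wins in 1990 and 2010, power in 2000
province = {
    1990: {"linear": (0.90, 0.001), "exponential": (0.80, 0.010), "power": (0.85, 0.002)},
    2000: {"linear": (0.92, 0.001), "exponential": (0.70, 0.020), "power": (0.95, 0.001)},
    2010: {"linear": (0.91, 0.001), "exponential": (0.75, 0.030), "power": (0.88, 0.004)},
}
# Hypothetical province in which no model reaches significance
weak = {year: {"linear": (0.30, 0.20), "exponential": (0.25, 0.30), "power": (0.35, 0.10)}
        for year in (1990, 2000, 2010)}

chosen = optimal_model(province, national_model="power")
fallback = optimal_model(weak, national_model="power")
```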

National Scale
Nighttime light data can be used to retrieve a city's urban population in China. All three regression models passed the significance test at the 0.001 level. However, the power function regression model was the optimal regression model between the TDN of nighttime light data and UP data for 1990, 2000, and 2010, with a goodness of fit (R²) of at least 0.674 (Table 1). Specifically, the R² values of the linear regression model for the TDN and UP data for 1990, 2000, and 2010 were 0.562, 0.616, and 0.563, respectively. The R² values of the exponential regression model for 1990, 2000, and 2010 were 0.353, 0.430, and 0.508, respectively, whereas the R² values of the power function regression model for the three years were 0.696, 0.711, and 0.674, respectively. Clearly, the R² of the power function regression model was higher than that of the other two regression models in each of the three years.

Provincial Scale
At the provincial scale, the optimal regression model varied across provinces (Table 2), although the linear regression model was the optimal model for most provinces. Of the three regression models for 1990-2010, the linear regression model was the optimal regression model for 64% of the provinces. For example, the mean R² value of the linear regression model for Hubei Province was as high as 0.941. The power function regression model was the optimal regression model for 20% of the provinces. For instance, the mean R² value of the power function regression model for Qinghai Province was 0.810. The exponential regression model was the optimal regression model for 16% of the provinces. The mean R² value of the exponential regression model for Beijing-Tianjin-Hebei was 0.894. In addition, the R² value of the optimal regression model for each province/region first increased and then decreased from 1990 to 2010. The mean R² values for 1990, 2000, and 2010 were 0.721, 0.798, and 0.746, respectively. None of the fitting results of the three regression models for the Ningxia Hui Autonomous Region and Hainan Province passed the significance test at the 0.05 level; therefore, the regression model at the national scale was adopted for these two provinces.
Notes to Table 2: * denotes significance at the 0.05 level; ** denotes significance at the 0.005 level; *** denotes significance at the 0.001 level. ¹ As none of the regression models for the Ningxia Hui Autonomous Region and Hainan Province passed the significance test at the 0.05 level, the optimal regression model at the national scale was used as the optimal regression model. The relative errors between census data and simulated results in Ningxia in 1990, 2000, and 2010 were 18.47%, 42.36%, and 7.11%, respectively. The relative errors in Hainan for the three years were −47.25%, −41.72%, and −4.19%, respectively.

Simulation Results of the UP Based on Nighttime Light Data Are Better than those of the NAP
To verify the reliability of the simulation results for the UP by comparison, the NAP of mainland China for 1990-2010 was also simulated from statistical data on the NAP using the same approach shown in Figure 2. The accuracy of our simulations of the UP and NAP for 1990-2010 based on DMSP/OLS stable nighttime light data was then evaluated and compared at the national and provincial scales (Figures 3 and 4).
At the national scale, the fitting results of the UP based on DMSP/OLS nighttime light data were better than those of the NAP (Figure 3). The R² values of the optimal regression model for the TDN and the UP for 1990, 2000, and 2010 were 0.696, 0.711, and 0.674, respectively. In comparison, the R² values of the optimal regression model for the TDN and the NAP for 1990, 2000, and 2010 were 0.640, 0.669, and 0.666, respectively. All of the optimal regression models constructed based on TDN and UP/NAP data for 1990-2010 passed the significance test at the 0.001 level.
At the provincial scale, the fitting results of the UP based on DMSP/OLS nighttime light data were also better than those of the NAP. The mean R² value of the optimal regression models for the TDN and the UP for all provinces from 1990 to 2010 was 0.755, whereas the mean R² value of the optimal regression models for the TDN and the NAP was 0.713. A comparison between the UP and NAP fitting results for four representative regions that differ in location, development level, and optimal fitting model showed that the R² value of the UP was higher than that of the NAP in each case (Figure 4). For the Beijing-Tianjin-Hebei region, the optimal fitting model was the exponential regression model, and the mean R² values of the UP and NAP based on the TDN were 0.894 and 0.857, respectively. For Gansu Province, the linear regression model was the optimal fitting model, and the mean R² values of the UP and NAP were 0.892 and 0.840, respectively. For Fujian Province, the linear regression model was the optimal fitting model, and the mean R² values of the UP and NAP were 0.715 and 0.466, respectively. For Heilongjiang Province, the power function regression model was the optimal fitting model, and the mean R² values of the UP and NAP were 0.783 and 0.730, respectively.
In summary, the accuracy evaluation showed that the fitting results of the UP based on DMSP/OLS nighttime light data were better than those of the NAP. This confirms that there is a clear difference between NAP data and the actual UP of China, so substituting the NAP for the UP introduces data errors. The difference between the NAP and the UP is attributable to the major population flows between urban and rural areas that have accompanied the recent rapid urbanization of China. However, due to the unique features of the Chinese household registration system, the large population that moved from rural to urban areas has not truly obtained urban resident status [46,47]. Data from the sixth national census showed that the total migrant population of China reached 260,100,000 in 2010, an 81.0% increase from the total migrant population in 2000. Migrant workers originating from rural areas accounted for much of this migrant population. Therefore, simulating long time series information on the UP of China based on DMSP/OLS nighttime light data can compensate for the deficiencies in statistical datasets on the UP in China.

Simulation Accuracy Levels Varied Across Cities of Different Sizes
The size of a city is a key factor that affects the accuracy of simulations. We divided all the cities into four groups (i.e., mega-, large, medium, and small cities) according to their urban population in 2010 [48]. The relative errors for the UPs in large cities and megacities were higher than those in medium and small cities (Table 3). For cities with a statistical UP of more than 10,000,000 in 2010 (e.g., Shanghai, Beijing, and Chongqing), the relative errors of the simulation results of the UP for 2000 had a mean value of approximately 16.76%, while the mean relative error for cities with a statistical UP of 3,000,000-10,000,000 (e.g., Chengdu, Harbin, and Hangzhou) was 14.08%. For cities with a statistical UP of less than 3,000,000 in 2010 (e.g., Luoyang, Lanzhou, and Xiaogan), the mean relative error of the simulation results was the smallest (11.29%).

The relative errors of the simulation results of the NAP in large cities and megacities were also larger than those in medium and small cities. For large cities and megacities (UP > 10 million in 2010), the relative errors of the simulation results for the NAP for 2000 had a mean value of 27.32%. Meanwhile, the relative errors of the simulation results for the NAP in 2000 were 19.06% for medium cities (3 million < UP < 10 million in 2010) and 20.61% for small cities (UP < 3 million in 2010). Two plausible causes might account for the difference. First, the policies for urban population registration in China's large cities and megacities are stricter than those in medium and small cities, where the flow of the urban population is more flexible. Second, the frequent adjustment of administrative boundaries in large cities and megacities might make urban population figures incomparable over time. During the last two decades, the administrative boundaries of a number of megacities and large cities, such as Chongqing, Beijing, Shanghai, Guangzhou, and Tianjin, have been adjusted dramatically [49]. Therefore, the simulation accuracy in medium and small cities in China was higher than in large cities and megacities.
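The size-class comparison above can be reproduced in outline as follows. The sketch assumes that the relative error is defined as (simulated − statistical) / statistical × 100 and that absolute values are averaged within each class; the text reports signed errors for individual provinces and does not state how the class means were formed, so both points are assumptions. The class labels and the sample records are hypothetical, while the thresholds of 3 million and 10 million follow the text.

```python
from collections import defaultdict

def relative_error(simulated, statistical):
    """Signed relative error in percent."""
    return (simulated - statistical) / statistical * 100.0

def size_class(up_2010):
    """Assign a size class from the statistical UP in 2010."""
    if up_2010 > 10_000_000:
        return "large_and_mega"
    if up_2010 >= 3_000_000:
        return "medium"
    return "small"

def mean_abs_error_by_size(cities):
    """cities: iterable of (up_2010, statistical_up, simulated_up) tuples.

    Returns {size_class: mean absolute relative error in percent}.
    """
    groups = defaultdict(list)
    for up_2010, statistical, simulated in cities:
        groups[size_class(up_2010)].append(abs(relative_error(simulated, statistical)))
    return {name: sum(errors) / len(errors) for name, errors in groups.items()}

# Hypothetical records: (UP in 2010, statistical UP in 2000, simulated UP in 2000)
records = [
    (12_000_000, 10_000_000, 12_000_000),  # overestimated by 20%
    (5_000_000, 4_000_000, 3_600_000),     # underestimated by 10%
    (1_000_000, 1_000_000, 1_050_000),     # overestimated by 5%
    (2_000_000, 2_000_000, 1_900_000),     # underestimated by 5%
]
errors_by_size = mean_abs_error_by_size(records)
```

Averaging signed errors instead would let over- and underestimates cancel, which is why the absolute value is used in this sketch.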

Limitations and Avenues for Future Research
The use of DMSP/OLS nighttime light data for UP simulations has some shortcomings. First, the low spatial resolution of DMSP/OLS nighttime light data and the saturation of pixels in urban cores could limit the accuracy of UP simulations. In the future, corrected nighttime light data may be used to overcome the saturation issues [50]. Second, consumption and living habits vary across different areas in China, so socioeconomic models based on nighttime light data differ between regions. Therefore, integrating remote sensing products with higher spatial, temporal, and spectral resolutions from multiple sources may improve the accuracy of simulations. For example, using newly published nighttime light data from the National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) can improve the spatial and spectral resolutions of the data while overcoming data saturation issues in city centers, and consequently lead to a better simulation of the UP [51].

Conclusions
DMSP/OLS nighttime light data can be used to retrieve long time series data on the UP of China. Due to regional variations in China, the retrieval model varied at different scales. At the national scale, all three regression models passed the significance test at the 0.001 level. However, the power function model produced the best results, with an R² greater than 0.67. At the provincial level, the linear regression model was the best regression model for 64% of the provinces. The power function model and the exponential model were the best regression models for 20% and 16% of the provinces, respectively.
The fitting results of the UP based on DMSP/OLS nighttime light data were better than those of the NAP. At the national scale, the mean R² values of the optimal regression models for the UP and NAP data from 1990 to 2010 were 0.694 and 0.658, respectively. At the provincial scale, the mean R² values of the optimal regression models for the UP and NAP data from 1990 to 2010 were 0.755 and 0.713, respectively. In addition, a city's size also affected the simulation results. Relative errors between the simulation results and statistical data on the UP for cities with a UP of fewer than 3,000,000 were smaller than those for cities with a UP of more than 3,000,000. Therefore, we believe that fitting the UP based on nighttime light data is particularly suitable for China, a country with a large migratory population that is undergoing rapid urbanization.

Figure 1. Urban population (UP) and the digital number (DN) of US Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) nighttime light data in mainland China for 2010.

Figure 2. Flow chart for choosing an optimal regression model.

Figure 3. Optimal regression results of the total digital number (TDN) from the DMSP/OLS nighttime light data and UP/NAP at the national scale.

Figure 4. Optimal regression results of the TDN from the DMSP/OLS nighttime light data and UP/NAP at the province level.

Table 1. The regression results of the DMSP/OLS nighttime light brightness data and statistical data for the UP at the national scale.

Table 2. The regression results of the total digital number from the DMSP/OLS nighttime light data and statistical data for the UP.

Table 3. Comparisons between the accuracy of the simulation results for the UP and NAP across cities (the cities are ordered by their sizes in a descending manner).