Where Does Nighttime Light Come From? Insights from Source Detection and Error Attribution

Abstract: Nighttime light remote sensing has become popular because of its advantages in estimating socioeconomic indicators and quantifying human activities in response to the changing world. Despite the many advances in method development and implementation of nighttime light remote sensing over the past decades, few studies have addressed the question: Where does nighttime light come from? This hinders our capability of identifying specific sources of nighttime light in urbanized regions. To address this shortcoming, we propose a parcel-oriented temporal linear unmixing method (POTLUM) to identify specific nighttime light sources with the integration of land use data. The ratio of root mean square error (rRMSE) was used as the measure to assess the unmixing accuracy, and a parcel purity index and a source sufficiency index were proposed to attribute unmixing errors. Using the Visible Infrared Imaging Radiometer Suite (VIIRS) nighttime light dataset from the Suomi National Polar-Orbiting Partnership (NPP) satellite and the newly released Essential Urban Land Use Categories in China (EULUC-China) product, we applied the proposed method and conducted experiments in two Chinese cities of different sizes, Shanghai and Quzhou. The results showed that POTLUM is relatively robust in detecting specific nighttime light sources, achieving an rRMSE of 3.38% and 1.04% in Shanghai and Quzhou, respectively. The major unmixing errors resulted from using impure land parcels as endmembers (i.e., parcel purity index for Shanghai and Quzhou: 54.48% and 64.09%, respectively), while the predefined light sources proved sufficient (i.e., source sufficiency index for Shanghai and Quzhou: 96.53% and 99.55%, respectively). The method presented in this study makes it possible to identify specific sources of nighttime light and is expected to enrich the estimation of structural socioeconomic indicators, as well as better support various applications in urban planning and management.


Introduction
In addition to an environment-based remote sensing dataset such as Landsat and Moderate Resolution Imaging Spectroradiometer (MODIS), a nighttime light (NTL) remote sensing dataset is more recognized as human-oriented [1], reflecting the distribution and intensity of human activities. Therefore, more and more human-related studies have extracted the urban built-up areas [2], estimated socioeconomic variables such as gross domestic product (GDP) values [3,4], population [5], and carbon

EULUC-China 2018
The dataset of Essential Urban Land Use Categories in China (EULUC-China) for 2018 is a vector parcel-based product downloaded from http://data.ess.tsinghua.edu.cn/. This product was derived from a set of 10-m satellite images, OpenStreetMap, nighttime lights, points of interest, and Tencent location-based service data in 2018 using machine learning algorithms. Its two-level classification system was adapted from the Chinese Standard of Land Use Classification. Here we used its first-level classification results as the input of our method (as Table 2 shows). In addition to the EULUC-China 2018 dataset, the field trip samples hierarchically generated for training its classifiers were also collected. They record the category, landmark buildings and facilities, and the mixed land use situation with the respective estimated proportions. They are used as original parcels here. More information on the EULUC-China product can be found in reference [26]. On this basis, parcels can be further refined by filtering on their recorded mixed situation and estimated proportions. No parcel visually validated as mixed is included in the refined parcels. The original and refined parcels are used as endmembers in different cases to test the applicability of PPI.
Shanghai is one of the most prosperous metropolises in China; it attracts people from various regions, and its land parcels cover an adequate range of land use classes. The NTL sources in Shanghai have been detected using POTLUM and assessed by rRMSE, with three PPI and two SSI indices used to attribute the error. Figure 1 shows the study sites with NPP-VIIRS nighttime light remote sensing data, EULUC-China 2018 maps, original parcels, refined parcels, and random points.

Figure 1. ... (C,D). Random points, original parcels, and refined parcels used for POTLUM are also delineated in (B,D) with dots and polygons. Res, Com, Ind, Tsp, and Pub represent residential, commercial, industrial, transportation, and public service land use types, respectively.

Methods
To take advantage of both spatial and temporal methods, the temporal unmixing method is adopted [23], which forms a reflectance profile over multiple temporal units instead of spectral bands. It further assumes that the proportion of each class reflectance in a given pixel does not change within the selected period. Among various unmixing methods, linear unmixing is the simplest and clearest one [22]: it takes the pixel reflectance directly as the sum of the endmember reflectances multiplied by their respective fractions. NTL source detection thus becomes a parcel-based temporal linear unmixing problem, given the consensus that land use is a parcel-oriented concept rather than a pixel-oriented one [26]. A parcel-oriented temporal linear unmixing method (POTLUM) is therefore proposed.
In addition, this study proposes error-attributing indices derived from the constraints of pixel unmixing. Among possible influential factors, the accuracy of unmixing results is mainly constrained by three aspects [23]: (1) model practicability; (2) endmember purity; and (3) light source sufficiency. The ratio of RMSE (rRMSE; RMSE divided by the average of the endmember matrix) is adopted to assess the unmixing error and to evaluate the first constraint. Two more series of indices are proposed to attribute the unmixing error to the other two constraints, namely the parcel purity index (PPI) and the source sufficiency index (SSI).

Parcel-Oriented Temporal Linear Unmixing Method (POTLUM)
Because the monthly NTL dataset and the annual EULUC-China product both refer to the year 2018, the fraction of each light source should remain the same over the period, which meets the requirement of POTLUM. In addition, the NTL pixel values of parcels in the same class are averaged to form the endmember profile over the twelve months, mitigating the impact of abrupt changes unrelated to land use in individual parcels. With the estimated endmember matrix and the downloaded NTL dataset, the source fraction in each NTL pixel can be estimated by solving Equation (1) for F:

$$N_{p,m} = F_{p,c}\,E_{c,m} \quad (1)$$

where F denotes the source fraction matrix, N denotes the NTL matrix, and E denotes the endmember matrix, with subscripts denoting the rows and columns of each matrix. Specifically, p, c, and m represent the total number of pixels, the number of light source classes, and the number of months, respectively. The unmixing calculation is conducted directly according to this equation, without other constraints.
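The unmixing step described above can be illustrated with a minimal, dependency-free sketch. It assumes that the unconstrained solution is the ordinary least-squares one, obtained per pixel from the normal equations; the function names (`endmember_profiles`, `solve_linear`, `unmix`) and the input layout are hypothetical choices for illustration and are not taken from the study.

```python
def endmember_profiles(samples):
    """Average the monthly NTL profiles of all samples in each class.

    samples: dict mapping class name -> list of monthly profiles (lists).
    Returns (names, E), where E[k][j] is the mean of class names[k] in month j.
    """
    names = sorted(samples)
    E = []
    for name in names:
        profiles = samples[name]
        months = len(profiles[0])
        E.append([sum(p[j] for p in profiles) / len(profiles) for j in range(months)])
    return names, E


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("endmember profiles are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def unmix(N, E):
    """Unconstrained least-squares fractions F (pixels x classes) with N ~ F E.

    Assumption: each pixel is solved via the normal equations (E E^T) f = E n.
    Fractions are NOT clipped to [0, 1] and are NOT forced to sum to one.
    """
    c, m = len(E), len(E[0])
    gram = [[sum(E[a][j] * E[b][j] for j in range(m)) for b in range(c)] for a in range(c)]
    F = []
    for profile in N:
        rhs = [sum(E[a][j] * profile[j] for j in range(m)) for a in range(c)]
        F.append(solve_linear(gram, rhs))
    return F
```

Leaving the fractions unconstrained is deliberate in this sketch: values outside [0, 1] and row sums away from one are the signals that the purity and sufficiency indices later rely on.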
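To obtain a pixel-based assessment, the root square error (RSE) is adopted (Equation (2)), and the overall accuracy of POTLUM is assessed by rRMSE (Equation (3)):

$$RSE_{i} = \sqrt{\sum_{j=1}^{m}\left(N_{i,j} - \hat{N}_{i,j}\right)^{2}} \quad (2)$$

$$rRMSE = \frac{\sqrt{\frac{1}{p\,m}\sum_{i=1}^{p}\sum_{j=1}^{m}\left(N_{i,j} - \hat{N}_{i,j}\right)^{2}}}{\mathrm{Mean}(E_{c,m})} \quad (3)$$

where $N_{i,j}$ denotes the original NTL value for pixel i in month j, $\hat{N}_{i,j}$ represents the remixed NTL value, and $\mathrm{Mean}(E_{c,m})$ denotes the average of the whole endmember matrix, with all parameters sharing the same meanings as above.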
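As a hedged illustration of the two accuracy measures, the sketch below remixes the NTL profiles from the fractions and endmembers and then computes a per-pixel RSE and an overall rRMSE. It assumes that RSE is the square root of the summed squared residuals over months (no averaging) and that RMSE is taken over all pixel-month pairs; the function names are hypothetical.

```python
import math


def remix(F, E):
    """Rebuild NTL profiles from fractions F (pixels x classes) and endmembers E (classes x months)."""
    months = len(E[0])
    return [[sum(f * E[k][j] for k, f in enumerate(row)) for j in range(months)] for row in F]


def rse_per_pixel(N, N_hat):
    """Root square error of each pixel over all months (assumed form, no averaging)."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(obs, est)))
            for obs, est in zip(N, N_hat)]


def rrmse(N, N_hat, E):
    """RMSE over all pixel-month pairs divided by the mean of the endmember matrix."""
    count = len(N) * len(N[0])
    sq_sum = sum((a - b) ** 2 for obs, est in zip(N, N_hat) for a, b in zip(obs, est))
    mean_e = sum(sum(row) for row in E) / (len(E) * len(E[0]))
    return math.sqrt(sq_sum / count) / mean_e
```

Dividing by the mean of the endmember matrix makes the overall error comparable between cities whose NTL radiance levels differ, which is why a ratio is reported rather than the raw RMSE.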

Parcel Purity Index
As shown in Equation (1), the calculation of the E matrix controls the unmixing accuracy. Since E is calculated from sample parcels, it is the parcel purity that actually contributes most to the unmixing accuracy [27]. Although no existing index assesses parcel purity, it is reasonable to infer it from the unmixed results. Logically, the unmixed light source fraction should satisfy $F_{p,c} \in [0,1]$, indicating that a source fraction should neither be lower than 0 nor exceed 1. On this basis, the parcel purity (PP) matrix is labeled as follows:

$$PP_{p_i,c_i} = \begin{cases} 1, & 0 \le F_{p_i,c_i} \le 1 \\ 0, & \text{otherwise} \end{cases}$$

where $p_i$ and $c_i$ indicate the $p_i$-th row and $c_i$-th column in $F_{p,c}$, respectively.
With the help of the PP matrix, a pixel-based, a class-based, and an overall PPI can be calculated as follows:

$$PPQA_{p_i} = \frac{1}{c}\sum_{c_i=1}^{c} PP_{p_i,c_i}$$

$$PPCA_{c_i} = \frac{1}{p}\sum_{p_i=1}^{p} PP_{p_i,c_i}$$

$$PPOA = \frac{1}{p\,c}\sum_{p_i=1}^{p}\sum_{c_i=1}^{c} PP_{p_i,c_i}$$

where p and c denote the total number of pixels and the number of light source classes, respectively, with all variables sharing the same meanings as above.
Remote Sens. 2020, 12, 1922
The pixel-based PPI is calculated as the number of qualified elements in the PP matrix divided by the number of all elements within a pixel (one row in the PP matrix), and can be used as a quality assessment (PPQA) band. The class-based assessment PPI (PPCA) is calculated as the number of qualified elements divided by the number of all elements within a class (one column in the PP matrix). The overall assessment PPI (PPOA) is calculated as the number of all qualified elements divided by the number of all elements in the whole matrix.
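The three purity indices can be sketched as simple counts over the fraction matrix. The code below is an illustrative sketch only: it assumes the qualifying interval is closed, [0, 1], returns the indices as ratios rather than percentages, and uses hypothetical function names.

```python
def parcel_purity(F):
    """PP matrix: 1 where an unmixed fraction lies in [0, 1], otherwise 0."""
    return [[1 if 0.0 <= f <= 1.0 else 0 for f in row] for row in F]


def ppi(F):
    """Return (PPQA per pixel, PPCA per class, PPOA overall) as ratios in [0, 1]."""
    pp = parcel_purity(F)
    p, c = len(pp), len(pp[0])
    ppqa = [sum(row) / c for row in pp]                             # one value per pixel (row)
    ppca = [sum(pp[i][k] for i in range(p)) / p for k in range(c)]  # one value per class (column)
    ppoa = sum(map(sum, pp)) / (p * c)                              # whole matrix
    return ppqa, ppca, ppoa
```

For a fraction matrix such as `[[0.2, 0.8], [1.3, -0.3], [0.5, 1.1], [0.0, 1.0]]`, the second class qualifies in only two of four pixels, so its PPCA is lower than that of the first class, which points to that endmember as the less pure one.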

Source Sufficiency Index
Detecting NTL sources within the urban area through POTLUM rests on the assumption that the light sources, i.e., the land use classes, are sufficient. If the sources are exactly sufficient, the source fractions of the same pixel should satisfy $\sum_{c_i=1}^{c} F_{p_i,c_i} = 1$; otherwise the sum is either lower than or exceeds one. Of these two cases, a sum lower than one but larger than zero is acceptable, since the land use classes adapted from the Chinese Standard of Land Use Classification cannot cover all classes within the urban area. Therefore, the source sufficiency (SS) matrix is labeled as follows:

$$SS_{p_i} = \begin{cases} 1, & 0 < \sum_{c_i=1}^{c} F_{p_i,c_i} \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Like PPI, a pixel-based and an overall index are proposed and similarly named as SSQA and SSOA, with all parameters sharing the same meanings.
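A minimal sketch of the sufficiency indices follows, under stated assumptions: a pixel qualifies when its fraction sum lies in (0, 1], SSQA is taken to be the per-pixel flag itself, and SSOA is the share of qualifying pixels. The small tolerance `tol` is an addition for floating-point sums and is not part of the described method; the function names are hypothetical.

```python
def source_sufficiency(F, tol=1e-9):
    """SS flag per pixel: 1 if the fractions sum to a value in (0, 1], otherwise 0.

    tol is an assumed floating-point tolerance for sums that should equal one.
    """
    return [1 if 0.0 < sum(row) <= 1.0 + tol else 0 for row in F]


def ssi(F):
    """Return (SSQA per pixel, SSOA overall); SSOA is a ratio in [0, 1]."""
    ssqa = source_sufficiency(F)
    ssoa = sum(ssqa) / len(ssqa)
    return ssqa, ssoa
```

A pixel whose fractions sum to 1.4 would be flagged as not sufficient here, which under the reasoning above suggests that some light in that pixel comes from a source outside the predefined classes.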

Practicability of POTLUM, PPI, and SSI
The endmember matrix for unmixing is generated by averaging the NTL values of all samples in the same category for each month, shown as profiles along the timeline (Figures S1-S3). To test the practicability of PPI and SSI, four types of samples are selected, namely random point samples, original parcel samples, refined parcel samples, and class-adjusted refined parcel samples. For the last type, the commercial, industrial, and public service land use types of the refined parcel samples are integrated into one category by averaging their NTL values, to generate purer parcel samples. On this basis, the sample purity increases across these four types of samples. Results from these controlled experiments in Shanghai and Quzhou verify this trend and support the practicability of PPI, as summarized in Tables 3 and 4. Comparing the last two rows in these tables, the SSOA of the class-adjusted samples is lower than that of the non-adjusted ones in both Shanghai and Quzhou, which also supports the practicability of SSI. Moreover, comparing the first row with any of the others justifies the advantage of the parcel-oriented approach over the point-oriented one. Note that the pixel-based assessments from RSE, PPQA, and SSQA are not summarized in these tables.
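The class adjustment used in the fourth sample type can be sketched as follows. This is an illustration under an explicit assumption: the merged category is the month-wise mean of the class profiles being combined (the alternative of pooling all parcels before averaging would weight classes by sample count). The function name and the label of the merged class are hypothetical.

```python
def merge_classes(E, names, to_merge, merged_name):
    """Replace the listed classes with one profile equal to their month-wise mean.

    E: endmember matrix (classes x months); names: class labels in row order.
    Returns (new_names, new_E) with the merged class appended last.
    """
    kept = [(n, row) for n, row in zip(names, E) if n not in to_merge]
    merged_rows = [row for n, row in zip(names, E) if n in to_merge]
    if not merged_rows:
        raise ValueError("no class to merge")
    months = len(merged_rows[0])
    merged = [sum(r[j] for r in merged_rows) / len(merged_rows) for j in range(months)]
    new_names = [n for n, _ in kept] + [merged_name]
    new_E = [row for _, row in kept] + [merged]
    return new_names, new_E
```

Calling it with the five class labels and `to_merge={"Com", "Ind", "Pub"}` leaves three endmembers, so the unmixing then runs with fewer, purer sources, at the cost of describing less of the light in each pixel.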

Nighttime Light Sources
Overall, the unmixed source fractions depict the spatial pattern of different light sources throughout Shanghai. Five fraction maps have been rendered in different colors (see the left panel of each subfigure in Figure 2). Pixels with a high residential proportion are located next to the city center, whereas high-intensity transportation is evenly distributed throughout the city but accounts for a relatively small proportion in each pixel. Purer commercial areas are also distributed in an orderly way in Shanghai, with the highest proportions in the city center and near the two airports, and areas with higher proportions of public management and service are highly concentrated in certain places, similar to the distribution of administrative sites and some famous parks. However, industrial sources account for a large area in Shanghai. Although some of them describe the industrial districts well, others may be misestimated. More detailed information is shown for four famous landmarks, namely Century Park, Chenghuang Temple, Baosteel corporation, and Hongqiao Airport. They mainly comprise administrative and leisure places; shopping, cultural, and religious sites with dense highways; industrial parks and ferry stations; and transportation and commercial service land uses, respectively. The four zoomed-in figures capture these differences well.
Unlike metropolises such as Shanghai, NTL in Quzhou, a less developed prefecture but a traffic thoroughfare in Zhejiang province, is mainly attributed to the residential (Figure 3A) and transportation (Figure 3D) categories. Specifically, Kaihua (II) and Longyou (V) counties are more developed regions, where public service (Figure 3E) and industrial types (Figure 3C) account for the majority of NTL values, respectively. Qiuchuan town (III) and Quzhou airport (IV) are typical areas of intensive residential NTL (Figure 3A) and intensive transportation NTL (Figure 3D), respectively.

Unmixing Accuracy
Besides visual validation, quantitative validation figures are also provided. RSE calculates unmixing errors pixel by pixel. Although the results are acceptable according to the quantitative assessment in Tables 3 and 4 and the qualitative assessment in Figures 2 and 3, misestimations still exist in several pixels.
Judging by the RSE maps, these pixels are mainly clustered in city centers, the airport, Disney Park, and along the Huangpu River in Shanghai. Having located the erroneous pixels in the RSE map, PPI and SSI are further calculated to attribute these errors to a parcel purity or a source sufficiency problem. As can be seen from Figure 4(II to V), most erroneous pixels coincide with lower values in PPQA and SSQA, indicating that these errors can be corrected by refining parcel purity and source sufficiency rather than by changing the unmixing method. In detail, errors in Century Park and Chenghuang Temple result from both impure parcels and insufficient sources, and those in Hongqiao Airport mainly result from impure sources, while Baosteel corporation shows a satisfactory unmixing result. More specifically, PPCA can be used to check which endmember includes the most impure parcels. As the third row in Table 3 shows, the PPOA, SSOA, rRMSE, and the PPCA of the other four sources are acceptable, while the industrial parcel samples are the least pure, in accord with the visual validation of Figure 2.
Quantitatively, the NTL unmixing results in Quzhou are much better than those in Shanghai because of fewer mixed land use parcels, as seen by comparing the values in Table 4 with those in Table 3. Erroneous pixels in Quzhou are similarly clustered in downtown areas, for example, the more developed regions in Kaihua (II) and Longyou (V) counties (Figure 5C), and can also be reduced by improving parcel purity (Figure 5A).

Comparison with Previous Works
NTL source detection has long been a difficult problem and has prompted multiple experiments in recent years. For example, Li et al. incorporated nonnegative constraints when solving for $F_{p,c}$ [18], to ensure that the unmixed result is physically meaningful. Moreover, Ma et al. plotted the cumulative distribution of pixel-level NTL radiance for different types of land cover [21]. The cumulative distributions demonstrate the one-to-one correspondence between NTL radiance and the composition of different types. Meanwhile, Chen et al. utilized random forest to map information from different types of POI to NTL [20] and took the contributions of different types to the mapping algorithm as their contributions to NTL, which also represent fractions. These studies detect NTL sources through multiple methods, but Li et al. and Ma et al. did not assess the error objectively. While Chen et al. assessed the error objectively, the physical interpretation of the contribution and the error is not clear enough, since random forest is a black box. So far, no study has been devoted to attributing the error. Note that although POTLUM includes no further constraint such as nonnegativity, the rRMSE is still acceptable. It is this unconstrained unmixing procedure that makes PPI and SSI possible, because out-of-range fractions remain visible in the result. The workflow put forward in this study both detects light sources and attributes errors objectively.
However, NTL source fraction products are interpreted differently from land use products such as EULUC-China. Each pixel value in an NTL source fraction map shows the proportion of a light source, with a higher value suggesting a more intensive light source and indicating a higher intensity of human activities. In comparison, land use products such as EULUC-China focus on delineating a clear boundary for each land use type, without sufficient information on intensity. They can be used in different cases in future studies according to their different emphases.
The development of PPI in this study not only helps to acquire a more precise NTL source fraction, but is also applicable to explaining the classification error in EULUC-China. Taking Shanghai as an example, whose land use complexity is similar to that of China as a whole, PPCA in Shanghai shows a high correlation with the User's Accuracy (R = 0.883) and the Producer's Accuracy (R = 0.642) of the EULUC-China classification results, using the same original parcels (refer to Table S1). Since an impure parcel is difficult to classify into one specific type, PPCA can be applied to select better training samples for parcel-oriented land use classification.

Visual Validation, RMSE-Related Indices and Error Attributing Indices
Visual validation is the most prevalent method for labeling parcel class. However, two points should be considered, i.e., the different methods of acquiring parcels and the different understandings of validation. Field trip parcels in this study were generated from polygons clipped by the surrounding buffered roads [26], which unavoidably introduces bias within parcels. With this bias, campaign researchers would assign 100% purity to a parcel that comprises areas belonging to a pure type, without considering whether this parcel really represents the pure type. Since the average value of the features within the segmented parcel is calculated and used to estimate endmembers, this discrepancy can make a big difference, which motivates the more objective indices proposed here.
Among objective indices, PPI and SSI are both mathematically and physically independent of RMSE-related indices. RMSE evaluates the difference between the original $N_{i,j}$ and the remixed $\hat{N}_{i,j}$, and collects all errors without considering their attribution. In contrast, the calculation of PPI and SSI derives from physical inference on the unmixed results and therefore attributes the errors detected in the RSE map to either parcel purity or source sufficiency (Figures 4 and 5). For example, in Figure 4, the unmixed map of Shanghai, the higher RSE in the Chenghuang Temple area (III) can be attributed to both impure parcels and insufficient sources. Errors not detected by PPI and SSI can then be attributed to model practicability. PPI and SSI are also not necessarily dependent on each other. For example, in Chongming District, a northern island in Shanghai, a slightly higher RSE in the middle north of the island is mainly attributed to insufficient classes rather than impure parcels, since the area is covered with croplands instead of urban land use classes.
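The attribution logic in this paragraph can be written as a small decision rule for a single pixel. The sketch below is illustrative only: the thresholds (`rse_threshold`, and `ppqa_threshold` defaulting to 1.0, i.e., any out-of-range fraction counts as impurity) and the function name are assumptions, since no cut-off values are stated in the text.

```python
def attribute_error(rse, ppqa, ssqa, rse_threshold, ppqa_threshold=1.0):
    """Label the likely error source of one pixel from its RSE, PPQA, and SSQA.

    rse_threshold and ppqa_threshold are hypothetical cut-offs chosen by the user.
    ssqa is the per-pixel sufficiency flag (1 = sufficient, 0 = not sufficient).
    """
    if rse <= rse_threshold:
        return "acceptable"
    causes = []
    if ppqa < ppqa_threshold:
        causes.append("impure parcels")
    if ssqa == 0:
        causes.append("insufficient sources")
    # Errors that neither index explains fall back to the model itself.
    return " and ".join(causes) if causes else "model practicability"
```

Under this rule a pixel with high RSE, a low PPQA, and a zero SSQA would be labeled "impure parcels and insufficient sources", while a pixel with high RSE and a zero SSQA only would be labeled "insufficient sources", mirroring the two examples given above.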

Uncertainty and Implications
This study proposed POTLUM, PPI, and SSI, whose practicability has been verified by controlled cases. Applying them to each administrative area individually poses no further challenge. However, it remains unknown whether they can be applied to all administrative units as a whole. Since PPOA and SSOA differ considerably between Shanghai and Quzhou (Tables 3 and 4), there appear to be systematic background differences between cities. Mitigating the large disparity of NTL values among regions is the key to successfully applying POTLUM, PPI, and SSI to the whole country or even the whole world at once, rather than one region at a time.
Land use products are essential for tracing the footprint of human activity. The NTL source fractions produced here and land use datasets such as EULUC-China can both be applied to discover human-environment connections historically, and be used as forcing factors to forecast future changes through simulation models such as WRF-Chem [28]. Models have long taken land cover datasets as inputs for forecasting. Recently, researchers have found that modeling accuracy can be significantly improved in simulating heat, wind, and pollution-related variables if finer classes within the urban land cover category are input [29]. Since those variables are highly correlated with human activities, detailed simulation within urban areas has increasingly been emphasized. Among other model improvements, a cooling tower scheme takes human-related sensible heat into consideration [30]. It needs to distinguish different types of air conditioners in different human activity categories, where both NTL source fractions and EULUC-China can play a role.
Given the simplicity and interpretability of POTLUM, it can easily be applied in multiple regions over a longer period of time. RMSE-related indices, together with PPI and SSI, help to produce products of higher accuracy for the target regions and periods. If human activity intensity can be traced over a larger extent and a longer time, its footprint can be delineated, and the simulation accuracy of the aforementioned cooling tower scheme is more likely to improve. In addition, other studies associated with human behavior are forthcoming [10].

Conclusions
This study sought to answer the challenging question of where nighttime light comes from by detecting nighttime light sources and attributing the detection errors. Towards this goal, we developed an urban NTL source detection method called the parcel-oriented temporal linear unmixing method (POTLUM) and proposed two indices to attribute the unmixing errors. Results showed that with this simple and straightforward method, we could successfully detect urban NTL sources with plausible accuracies. We also found that, according to PPI and SSI, most unmixing errors could be attributed to endmember estimation and NTL source definition rather than model selection. Various controlled experiments further verified the capability of PPI and SSI to capture parcel purity and source sufficiency, which could partially explain the error of parcel-oriented land use classification. With the help of PPI and SSI, finer endmembers and NTL sources could be estimated and adapted, so that POTLUM can be applied to larger areas over a longer time. In conclusion, the method presented in this study makes it possible to identify specific sources of nighttime light and is expected to enrich the estimation of structural socioeconomic indicators, as well as better support various applications in urban planning and management.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/12/1922/s1, Figure S1: Temporal profile of original parcels (a) and refined parcels (b), Figure S2: Temporal violin plots of original pixel values throughout twelve months (from left to right and then from top to bottom), Figure S3: Temporal violin plots of refined parcel values throughout twelve months (from left to right and then from top to bottom), Table S1: Pearson's correlation between PPCA and class accuracy of EULUC-China 2018 from original parcels.