Potential of Night-Time Lights to Measure Regional Inequality

: Night-time lights satellite images provide a new opportunity to measure regional inequality in real-time by developing the Night Light Development Index (NLDI). The NLDI was extracted using the Gini coe ﬃ cient approach based on population and night light spatial distribution in Romania. Night-time light data were calculated using a grid with a 0.15 km 2 area, based on Defense Meteorological Satellite Program (DMSP) / Operational Linescan System (OLS satellite imagery for the 1992–2013 period and based on the National Polar-orbiting Partnership–Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) satellite imagery for the 2014–2018 period. Two population density grids were created at the level of equal cells (0.15 km 2 ) using ArcGIS and PostgreSQL software, and census data from 1992 and 2011. Subsequently, based on this data and using the Gini index approach, the Night Light Development Index (NLDI) was calculated within the MATLAB software. The NLDI was obtained for 42 administrative counties (nomenclature of territorial units for statistics level 3 (NUTS-3 units)) for the 1992–2018 period. The statistical relationship between the NLDI and the socio-economic, demographic, and geographic variables highlighted a strong indirect relationship with local tax income and gross domestic product (GDP) per capita. The polynomial model proved to be better in estimating income based on the NLDI and R 2 coe ﬃ cients showed a signiﬁcant improvement in total variation explained compared to the linear regression model. The NLDI calculated on the basis of night-time lights satellite images proved to be a good proxy for measuring regional inequalities. Therefore, it can play a crucial role in monitoring the progress made in the implementation of Sustainable Development Goal 10 (reduced inequalities). Polar-orbiting Partnership (NPP) satellite records light intensity using VIIRS (Visible Infrared Imaging Radiometer Suite) sensors, which collect data in 22 di ﬀ erent spectral bands, one of which is the DNB (day / night band). These data are available at an annual scale for 2015 and 2016, and a monthly scale for the 2012–2019 period; their resolution is 15 arc-s (approximately 500 m). In this study, we used Version 1 VIIRS Day / Night Band Night-time Lights suite produced by the Earth Observations Group (EOG) at NOAA / NCEI (National Centers for Environmental Information) [61]. From the annual data we used the "vcm-orm-ntl" layer (VIIRS Cloud Mask–Outlier Removed–Night-time Lights) for 2016, which contains cloud-free average emitted radiance expressed in nW / cm 2 / sr; these data were subject to an outlier removal process to ﬁlter out ﬁres and other ephemeral lights. As regards monthly data, there are two conﬁgurations of the NPP-VIIRS composites: “vcmcfg”—data contaminated by stray light are removed, and “vcmslcfg”—data contaminated by stray light are corrected [59]. Thus, for 2014 and 2018, the averaging pixel brightness was calculated based on the monthly data (“vcmslcfg”) corresponding to the summer months (May–September), and two composite VIIRS night-time light images resulted. We used the summer months (May-September) because there were no areas covered in snow in Romania during this period.


Introduction
At the United Nations (UN) Sustainable Development Summit held in September 2015, the leaders of 193 UN member states expressed their adherence to the 17 Sustainable Development Goals (SDGs) of the 2030 Agenda which, as a global action plan, aims at reducing poverty, fighting inequalities and injustice, and protecting the environment by 2030 [1]. In monitoring progress towards these goals, Earth observation (EO) solutions play an important role in providing information where national data are not available or are costly [2][3][4][5][6][7].

DMSP Night-Time Light Data
The Defense Meteorological Satellite Program (DMSP) satellite images were collected by the National Oceanic and Atmospheric Administration (NOAA). They are available free of charge for the 1992-2013 period and have a spatial resolution of 30 arc s (1 km) (Table 1).

DMSP Night-Time Light Data
The Defense Meteorological Satellite Program (DMSP) satellite images were collected by the National Oceanic and Atmospheric Administration (NOAA). They are available free of charge for the 1992-2013 period and have a spatial resolution of 30 arc s (1 km) (Table 1). DMSP-OLS satellites orbit the Earth and record the intensity of light using OLS sensors. The light intensity measured by the DMSP-OLS satellites ranges from 0 to 63. Digital number: 0 corresponds to non-illuminated areas, and the 63 values correspond to the strongly illuminated areas ( Figure 2). The DMSP images for Romania were extracted from the NOAA/NGDC database Version 4 DMSP-OLS Night-time Lights Time Series [60]. Abbreviations: DMSP, Defense Meteorological Satellite Program; OLS, Operational Linescan System; NTL, night-time lights; NPP, National Polar-orbiting Partnership; VIIRS, Visible Infrared Imaging Radiometer Suite; DNB, day/night band.

NPP-VIIRS Night-Time Light Imagery
The NPP-VIIRS satellite images were collected by the National Oceanic and Atmospheric Administration and National Aeronautics and Space Administration (NASA). The Suomi National Polar-orbiting Partnership (NPP) satellite records light intensity using VIIRS (Visible Infrared Imaging Radiometer Suite) sensors, which collect data in 22 different spectral bands, one of which is the DNB (day/night band). These data are available at an annual scale for 2015 and 2016, and a monthly scale for the 2012-2019 period; their resolution is 15 arc-s (approximately 500 m). In this study, we used Version 1 VIIRS Day/Night Band Night-time Lights suite produced by the Earth Observations Group (EOG) at NOAA/NCEI (National Centers for Environmental Information) [61]. From the annual data we used the "vcm-orm-ntl" layer (VIIRS Cloud Mask-Outlier Removed-Night-time Lights) for 2016, which contains cloud-free average emitted radiance expressed in nW/cm 2 /sr; these data were subject to an outlier removal process to filter out fires and other ephemeral lights. As regards monthly data, there are two configurations of the NPP-VIIRS composites: "vcmcfg"-data contaminated by stray light are removed, and "vcmslcfg"-data contaminated by stray light are corrected [59]. Thus, for 2014 and 2018, the averaging pixel brightness was calculated based on the monthly data ("vcmslcfg") corresponding to the summer months (May-September), and two composite VIIRS night-time light images resulted. We used the summer months (May-September) because there were no areas covered in snow in Romania during this period.

NPP-VIIRS Night-Time Light Imagery
The NPP-VIIRS satellite images were collected by the National Oceanic and Atmospheric Administration and National Aeronautics and Space Administration (NASA). The Suomi National Polar-orbiting Partnership (NPP) satellite records light intensity using VIIRS (Visible Infrared Imaging Radiometer Suite) sensors, which collect data in 22 different spectral bands, one of which is the DNB (day/night band). These data are available at an annual scale for 2015 and 2016, and a monthly scale for the 2012-2019 period; their resolution is 15 arc-s (approximately 500 m). In this study, we used Version 1 VIIRS Day/Night Band Night-time Lights suite produced by the Earth Observations Group (EOG) at NOAA/NCEI (National Centers for Environmental Information) [61]. From the annual data we used the "vcm-orm-ntl" layer (VIIRS Cloud Mask-Outlier Removed-Nighttime Lights) for 2016, which contains cloud-free average emitted radiance expressed in nW/cm 2 /sr; these data were subject to an outlier removal process to filter out fires and other ephemeral lights. As regards monthly data, there are two configurations of the NPP-VIIRS composites: "vcmcfg"-data contaminated by stray light are removed, and "vcmslcfg"-data contaminated by stray light are corrected [59]. Thus, for 2014 and 2018, the averaging pixel brightness was calculated based on the monthly data ("vcmslcfg") corresponding to the summer months (May-September), and two composite VIIRS night-time light images resulted. We used the summer months (May-September) because there were no areas covered in snow in Romania during this period.

Demographic and Economic Data
The data on the population of villages, cities/towns, and municipalities (LAU2) were taken from the 1992 and 2011 population census [62]. These data were aggregated on equal grid networks of 0.15 km 2 ; two population density grids were created with the ArcGIS and PostgreSQL software and later used to calculate the NLDI. In the absence of other sources of population data at the local level, the population density grid obtained for 1992 was used to calculate the NLDI for 1992, and the population density grid resulted for 2011 was used to calculate the NLDI for The data on GDP per capita at the county level were extracted from the Eurostat database [64] for the 2008-2016 period, and from the National Institute of Statistics [63] for 1995 (since it was the closest year to 1992 available). The local tax income data at the county level were obtained from the Ministry of Regional Development and Public Administration (MRDPA) [65,66] (Table 2).

Methodology
Earth observation allowed us to calculate the Night Light Development Index (NLDI) which we used to measure regional inequality. This index is calculated based on the methodology for the calculation of the Gini coefficient, commonly used to measure regional inequality [43,48,53,67].
In this study, we intended to measure regional inequalities in Romania using the Night Light Development Index (NLDI), which can be regarded as a proxy variable in highlighting inter-regional differentials [54]. The Gini coefficient is one of the most commonly used indices worldwide for measuring income inequalities among individuals or households in a country. It ranges from 0 to 10, 0 representing perfect equality and 10 perfect inequality. Using the Gini coefficient approach, first we calculated the Night Light Development Index (NLDI) at county level based on data on the spatial distribution of the population and the aggregate night-time light, both at the grid level. Next, we determined the Night Light Inequality Index (NLII) at the national level, based on the Night Light Development Index (NLDI) values at the county level, using the Gini coefficient methodology. The values of the NLII range from 0 to 10, where 0 indicates perfect equality and 10 perfect inequality. The large volume of data required the automation of the NLDI calculation; to this end, a script was developed in MATLAB, and later we integrated therein the gini.m script created by Lengwiler [68]. Gini.m calculates the Gini coefficient (NLDI in our case) and the Lorenz curve using vectors of equal length and values greater than or equal to 0. NLDI values range from 0 to 1, the highly developed counties have low NLDI values and the least developed counties have high NLDI values [55]. The Lorenz curve is the graphical representation of the distribution and it is a straight line in the case of perfect equality (Figure 3).
where NLDI is the Night Light Development Index; A is the area between the perfect equality line and the Lorenz curve; B is the area under the Lorenz curve.
determined the Night Light Inequality Index (NLII) at the national level, based on the Night Light Development Index (NLDI) values at the county level, using the Gini coefficient methodology. The values of the NLII range from 0 to 10, where 0 indicates perfect equality and 10 perfect inequality. The large volume of data required the automation of the NLDI calculation; to this end, a script was developed in MATLAB, and later we integrated therein the gini.m script created by Lengwiler [68]. Gini.m calculates the Gini coefficient (NLDI in our case) and the Lorenz curve using vectors of equal length and values greater than or equal to 0. NLDI values range from 0 to 1, the highly developed counties have low NLDI values and the least developed counties have high NLDI values [55]. The Lorenz curve is the graphical representation of the distribution and it is a straight line in the case of perfect equality (Figure 3).
because A + B = 0.5 (since the axes scale from 0 to 1).
where NLDI is the Night Light Development Index; A is the area between the perfect equality line and the Lorenz curve; B is the area under the Lorenz curve. The NLDI is determined using the calculated area between the Lorenz curve and the axes (B). This area is calculated by computing the average of the left and right Riemann-like sums (rrapezoidal rule). These are called Riemann-like sums because they are calculated in points given by population values and not on a uniform grid (Figure 3).
This area calculation method is commonly used in mathematical analysis to approximate a definite integral. The values of the f function over an interval are approximated by the average of the values at the left and right extremities. A simple calculation involves using the trapezoidal area formula: where h is the height of the trapeze; b1 and b2 are parallel sides. The NLDI is determined using the calculated area between the Lorenz curve and the axes (B). This area is calculated by computing the average of the left and right Riemann-like sums (rrapezoidal rule). These are called Riemann-like sums because they are calculated in points given by population values and not on a uniform grid (Figure 3).
This area calculation method is commonly used in mathematical analysis to approximate a definite integral. The values of the f function over an interval are approximated by the average of the values at the left and right extremities. A simple calculation involves using the trapezoidal area formula: where h is the height of the trapeze; b 1 and b 2 are parallel sides.
where the interval [a, b] is divided into n subintervals, each of length: ∆x = (b − a)/n. The points in the partition will then be: a, a + ∆x, a + 2 ∆x, . . . .a + (n − 1) ∆x, b.
Using the Gini coefficient approach, the Night Light Inequality Index (NLII) was determined at the county level based on the Night Light Development Index (NLDI) values. In addition, we compared the evolution of the NLII for 1992-2018 with the official household income Gini index measured and published by Eurostat [64] in order to check the robustness of the NLII.
A large part of the literature on regional inequalities is focused on income and GDP per capita, as a complex measure of economic development highlighting inter-regional differentials worldwide. In order to validate the NLDI, we performed a correlation and regression analysis using the income data available at the county level. Using typical regression notations/notions, we can specify the relation between income and the NLDI as follows:Î ncome = a + b * NLDI + ε where Y represents the dependent variable (in our case the estimated income); X represents the independent variable (representing the NLDI); ε is the residual variable; a, b, c are regression parameters estimated by applying the regression function, based on data series used to define two variables. In order to find the best fitting (trend) line, we tested both the exponential function model (Equation (6)) and the polynomial model (Equation (7)). By substituting our variables, we ended up with the following equations:Î ncome = a * e (−b * NLDI) + c (6) where the NLDI represents the Nigh Light Development Index; a, b, c are regression parameters. The classic measure of fit in a linear regression model is the R 2 and its adjusted counterpart, R 2 a, which shows the validity of the chosen model for explaining the variation of Y (the percentage of income estimation explained by the NLDI). Adjusted R 2 (R2a) is a coefficient of determination corrected with degrees of freedom that has the same meaning as R 2 .
Evaluation of the regression function's capacity to estimate income is often based on the relative error (RE) or residuals (the difference between the observed y values and the predicted y values) and relative root mean square error (RRMSE). The lower the absolute value of the RE, the better the model fitting. Similarly, a lower RRMSE (which is derived from RE and the analyzed units) indicates an increased accuracy of the regression function (better fit of the model).
The Akaike information criterion (AIC) is a useful tool for the selection of models. It is based on the likelihood function and ranks regression models according to a score, where a lower value represents better results. The AIC is computed as: where L represents the value attributed to likelihood and k is the number of estimated parameters. Hence, the smaller the value of the information criterion, the better the model. The annual NLDI values were related to socio-economic variables (GDP, income), demographic variables (total population, net migration rate), and geographical factors (altitude, geographical position) using multiple regression.
Multiple regression also allows us to determine the overall model fit (variation explained) and the relative contribution of each of the predictors to the total variation explained. We used the SPSS software to calculate the regression parameters. The enter regression method was used at a 95% confidence level.
where a is the intercept; b 1 , b 2 . . . b 7 are the unstandardized coefficients; ε is error; NLDI represents the Nigh Light Development Index; Alt is altitude, Long is longitude; Lat is latitude; Income is local tax income; GDP-is gross domestic product per capita; NMR is net migration rate; Pop is population.   For classification of the NLDI, we used the natural breaks (Jenks) classification method which aims to ensure natural grouping in the data by minimizing variance between classes and maximizing Remote Sens. 2020, 12, 33 9 of 15 variance across them [33]. The analysis conducted on the NLDI values for each year taken into consideration shows a downward trend until the economic crisis, followed by a short increase. All this illustrates a growing polarization between the most and least developed counties. Counties with a higher development level and an implicitly lower value of the NLDI are concentrated in the Bucharest capital area, the southern and north-western parts of the historical region of Transylvania (Sibiu (SB), Bras , ov (BV), Cluj (CJ) counties), and the western border region (Timis , county (TM)) along with areas close to the Black Sea Coast (Constanta county (CT)). These counties have the highest level of human capital and foreign direct investments, and the best accessibility and productivity in Romania [35][36][37][38][39][40].

Results and Discussion
In contrast, the lowest levels of economic development reflected through the NLDI can be found in the northern, eastern, and south-western parts of Romania (Figure 4), which are among the poorest regions of the EU.
Even though the geographical position and proximity are important factors in development, we cannot unambiguously identify a duality of development between the west and the east or the north and the south, the country being similar to a mosaic of rich and poor areas, reflecting high regional inequalities (Table 3). The spatial distribution of NLDI values are influenced by a number of factors such as socio-economic variables, demographic variables, and geographical factors. The Pearson correlation coefficient was calculated (Table 3) to highlight the relationship between the NLDI and these factors.
The correlation analysis indicates the preservation of the value and the sign of the relationship between the NLDI and the parameters, in all the years subject to analysis. Thus, the relationship is inverse in terms of development indicators, as the NLDI is higher in the counties with a lower local tax income and GDP per capita. Similar results at the subnational scale in China were obtained by Xu [56] and Zhou [54] using VIIRS night-time radiance. In the case of demographic variables, the NLDI has an indirect relationship with the county population and the positive net migration rate, and a direct relationship with negative net migration rate. There is nothing unusual in these findings: higher levels of economic development are generally related to the positive market effects generated by a larger population concentration. These areas are also attractive for migration due to the higher number of jobs, higher wages, and good quality of life. All these generate positive migration balances. The relationship between the NLDI and the geographical factors indicates higher NLDI values in the counties with a higher average altitude, and an increase of the NLDI values from south to north and from east to west. However, the very low correlation values show that there is no spatial clustering for these parameters (Figure 4).
The results show statistically significant correlations with the variables that reflect the development level and the demographic variables, and non-significant correlations with the geographical variables; except for 2014 and 2016, where we can see a statistically significant direct relationship with latitude. A possible explanation would be the different resolution of the satellite images (VIIRS compared to DMSP), but also an increase of economic development between the north and the south especially after the economic crisis ( Figure 4). The relatively small correlation value (0.47) indicates that there is not a very clear differentiation of the development level in the north-south direction, due to the presence of less developed areas in the southwestern part of the country.
Given the low values of the Pearson correlation coefficients for the geographical factors, we further analyzed their relative contribution to the estimation of the NLDI using multiple linear regression. The regression was calculated for the most recent year (2016) for which there is complete data ( Table 4). Multiple regression analysis explains ≈80% of the total variability (adjusted R 2 = 0.798) of the NLDI for the year under study. A clearer picture of the results of multiple regression is provided by the relative contribution of each predictor to the total variation explained. The analysis of the standardized coefficients shows a relatively balanced relationship between the socio-economic variables and the demographic variables in the multiple regression results (Table 4). In general, geographical variables had a smaller contribution to the result of the multiple regression, indicating a lower impact on the NLDI values. It should be noted that the income and GDP variables have a greater impact in the calculation of the index than population and net migration rate. Table 3 shows that the NLDI correlates best with income, which indicates a high potential of the index in estimating income at the county level. In this context, we performed a regression analysis to identify which type of regression was best suited for income estimation. The analysis was conducted only for the 2008-2018 period because there was no income data for 1992. In this analysis we used several criteria to select the best model: total variation explained (R 2 ), error (RMSE), and AIC (Table 5). The best results were obtained in the second-order (k = 2) polynomial model ( Figure 5), for all the criteria considered (Figure 5a-f). The differences are small between the polynomial model and the exponential one, but if we consider R 2 , we can see a significant improvement compared to the linear Remote Sens. 2020, 12, 33 11 of 15 model. If we use this model, we can see an improvement of the explained variation by 6% in 2016 and 14% in 2012, compared to linear regression. Similar results were obtained by Dai [12] who, at the provincial scale, highlighted a higher accuracy of the polynomial model compared to the linear one. Satellite images proved to be a good proxy in estimating economic output and regional inequalities at the subnational level.
One of the main limitations of this study is the statistically insignificant relationship between the NLDI and the income for all 3181 LAU1 (local administrative units: communes and cities). It indicates a low potential of the index in estimating income and measuring inequality at the local level. This may result from the huge differences in population size among LAUs. Therefore, in line with the existing literature, our future research intends to find out the population thresholds where the relationship between the SOL (sum of lights) and income becomes statistically significant.
The second limitation is related to the absence of population data at LAU2 level (villages and towns) for each year. The existence of this data would have helped the authors to improve the NLDI for each year. We will continue to improve our future research by including other datasets such as the annual human settlements or population-grid dataset.
The third important limitation is due to the heterogeneity in the resolution of the satellite images used in the study. Until 2013 we employed lower resolution satellite images (1 kmp), while for the rest of the 2014-2018 period we used higher resolution satellite images (500 kmp). It is hard to overcome this limitation due to the unavailability of higher resolution images before 2013.

Conclusions
We demonstrated in our study that Earth observation solutions could play an important role in monitoring progress in the direction of SDG 10 (reduced inequalities). The NLDI calculated from night-time lights (NTL) satellite images proved to be a good proxy for the real time measuring of economic output, while the NLII calculated for the estimation of regional inequalities represents, to our knowledge, an absolute novum in inequality literature. The NLDI has high potential for estimating income and implicitly GDP, as evidenced by the high correlation coefficients. In this regard, the second-order polynomial model proved to be the best model in estimating income, compared to the linear and exponential regression model. The polynomial regression model brings about a significant improvement in the explained variation compared to linear regression.
Moreover, we successfully combined geospatial information and EO (from satellite) with modern data processing (MATLAB, GIS), thus offering an unprecedented opportunity for the realtime tracking of SDG 10. All these tools provided reliable information on the state of economic output Satellite images proved to be a good proxy in estimating economic output and regional inequalities at the subnational level.
One of the main limitations of this study is the statistically insignificant relationship between the NLDI and the income for all 3181 LAU1 (local administrative units: communes and cities). It indicates a low potential of the index in estimating income and measuring inequality at the local level. This may result from the huge differences in population size among LAUs. Therefore, in line with the existing literature, our future research intends to find out the population thresholds where the relationship between the SOL (sum of lights) and income becomes statistically significant.
The second limitation is related to the absence of population data at LAU2 level (villages and towns) for each year. The existence of this data would have helped the authors to improve the NLDI for each year. We will continue to improve our future research by including other datasets such as the annual human settlements or population-grid dataset.
The third important limitation is due to the heterogeneity in the resolution of the satellite images used in the study. Until 2013 we employed lower resolution satellite images (1 kmp), while for the rest of the 2014-2018 period we used higher resolution satellite images (500 kmp). It is hard to overcome this limitation due to the unavailability of higher resolution images before 2013.

Conclusions
We demonstrated in our study that Earth observation solutions could play an important role in monitoring progress in the direction of SDG 10 (reduced inequalities). The NLDI calculated from night-time lights (NTL) satellite images proved to be a good proxy for the real time measuring of economic output, while the NLII calculated for the estimation of regional inequalities represents, to our knowledge, an absolute novum in inequality literature. The NLDI has high potential for estimating income and implicitly GDP, as evidenced by the high correlation coefficients. In this regard, the second-order polynomial model proved to be the best model in estimating income, compared to the linear and exponential regression model. The polynomial regression model brings about a significant improvement in the explained variation compared to linear regression.
Moreover, we successfully combined geospatial information and EO (from satellite) with modern data processing (MATLAB, GIS), thus offering an unprecedented opportunity for the real-time tracking of SDG 10. All these tools provided reliable information on the state of economic output and economic regional inequalities, as well as their change over time. We believe that this combination can be successfully used in other spatial and national contexts where data on local tax incomes are available. The above-mentioned combination of methods and techniques is not place-bounded since satellite images are available for all countries. It would be very interesting to see if the relationship between regional economic development and the night-light satellite images is maintained in countries with advanced economies and where light pollution is reduced by modern technologies. Our future research will continue into this direction.
The analysis of the statistical relationship between the NLDI and demographic and socio-economic variables has shown a strong indirect relationship between the NLDI and local tax income and GDP per capita. The lack of relationship between the NLDI and geographical variables shows an aleatory spatial distribution of rich and poor areas. Multiple regression analysis using standardized coefficient values reinforces previous results, with income and GDP having the largest contribution in explaining the total variation. Author Contributions: All authors contributed equally to the research presented in this paper and to the preparation of the final manuscript. All authors have read and agreed to the published version of the manuscript.