1. Introduction
At the United Nations (UN) Sustainable Development Summit held in September 2015, the leaders of 193 UN member states expressed their adherence to the 17 Sustainable Development Goals (SDGs) of the 2030 Agenda which, as a global action plan, aims at reducing poverty, fighting inequalities and injustice, and protecting the environment by 2030 [1]. In monitoring progress towards these goals, Earth observation (EO) solutions play an important role in providing information where national data are not available or are costly [2,3,4,5,6,7].
Night-time lights (NTL) satellite images have also been used in a variety of other studies, such as gross domestic product (GDP) estimation [8,9,10,11,12,13], economic decline detection [14,15], ship detection [16], light pollution detection [17,18,19], urban expansion monitoring [20,21,22,23], human well-being measurement [24], built-up area mapping [25,26], modeling of electricity consumption dynamics [27,28,29,30,31], and fire detection [32].
The adoption of the 2030 Agenda with its 17 Sustainable Development Goals (SDGs) created a framework for a radical change in the use of geospatial and EO solutions: 60% of the 169 SDG-related targets and 232 indicators can be directly monitored with EO solutions [4]. SDG 10 aims at reducing inequalities, which includes, among other actions, producing empirical evidence and monitoring the evolution of inequalities within and among countries. Monitoring inequalities among countries is not difficult for most countries where national accounts and national statistical offices have been established [33]. The difficulties lie in measuring sub-national regional inequalities, which faces two shortcomings: scarce statistical data and a considerable time delay in calculating regional GDPs. Our study links statistical and geospatial frameworks for improved monitoring and reporting on SDG 10. To our knowledge, it is also the first attempt to introduce EO solutions in measuring SDG 10 at the sub-national level.
We chose Romania as the study area for three reasons: it is one of the most unequal countries of the European Union (EU) [34,35,36,37,38,39,40]; these regional inequalities have been generated in the last 20 years [41,42,43]; and the country has started an economic development process from a low level, making it very suitable for the application of night-time light data from satellite imagery [44]. However, the results are not limited to the area of study, as will be shown in the following parts of the article.
The literature on regional inequalities focused until recently on local tax income from household surveys and GDP per capita from national accounts, as a complex measure of economic development highlighting inter-regional differentials worldwide [34,45,46,47,48,49,50,51,52,53] or applying case studies in a Romanian context [34,35,36,42].
However, the above-mentioned economic indicators used to measure regional inequalities have several limitations. For example, GDP is measured directly only at the national level, while at the regional level it is calculated indirectly with a two-year delay compared to the national one [3,54]. Compared to traditional statistical data, night-time lights satellite images have a number of advantages: regional inequality can be calculated locally in real time (instead of waiting two years for the indirect calculations from national accounts); they have higher temporal resolution; and they can be obtained free of charge. Lately, some studies have focused on the use of night-time light data in measuring regional inequalities [54,55,56,57,58,59]. Xu [56] used population density and night-time lights (NTL) satellite images to measure regional inequality of public services at four different scales in China (national, economic regions, provinces, and prefectural cities) and their changes between 2005 and 2010. Zhou [54] analyzed socio-economic inequality at the subnational level in China, calculating three inequality indices based on the methodology used to calculate the Gini coefficient, the Theil index, and the Lorenz asymmetry coefficient. They found that the Gini and Theil indices at the provincial level were lower than the indices obtained at the county level, and they obtained a positive statistical correlation between Visible Infrared Imaging Radiometer Suite (VIIRS) night-time radiance and both GDP and population.
Although a series of studies in recent decades have dealt with the issue of regional inequalities, a large research gap remains at subnational scales. The purpose of our study is to measure regional inequality in Romania for the 1992–2018 period at the county level (NUTS 3 units), using night-time lights (NTL) satellite images, in order to monitor progress towards SDG 10 (reduced inequalities).
3. Methodology
Earth observation allowed us to calculate the Night Light Development Index (NLDI), which we used to measure regional inequality. This index is calculated based on the methodology for the calculation of the Gini coefficient, commonly used to measure regional inequality [43,48,53,67].
In this study, we intended to measure regional inequalities in Romania using the Night Light Development Index (NLDI), which can be regarded as a proxy variable in highlighting inter-regional differentials [54]. The Gini coefficient is one of the most commonly used indices worldwide for measuring income inequalities among individuals or households in a country. It ranges from 0 to 1, with 0 representing perfect equality and 1 perfect inequality. Using the Gini coefficient approach, we first calculated the NLDI at the county level based on data on the spatial distribution of the population and the aggregate night-time light, both at the grid level. Next, we determined the Night Light Inequality Index (NLII) at the national level, based on the NLDI values at the county level, again using the Gini coefficient methodology. The values of the NLII range from 0 to 1, where 0 indicates perfect equality and 1 perfect inequality. The large volume of data required the automation of the NLDI calculation; to this end, a script was developed in MATLAB, into which we later integrated the gini.m script created by Lengwiler [68]. The gini.m script calculates the Gini coefficient (the NLDI in our case) and the Lorenz curve using vectors of equal length and values greater than or equal to 0. NLDI values range from 0 to 1; highly developed counties have low NLDI values and the least developed counties have high NLDI values [55]. The Lorenz curve is the graphical representation of the distribution, and it is a straight line in the case of perfect equality (Figure 3).
$$\mathrm{NLDI} = \frac{A}{A + B} = 2A = 1 - 2B$$

because A + B = 0.5 (since the axes scale from 0 to 1),

where NLDI is the Night Light Development Index; A is the area between the perfect equality line and the Lorenz curve; B is the area under the Lorenz curve.
The NLDI is determined using the calculated area between the Lorenz curve and the axes (B). This area is calculated by computing the average of the left and right Riemann-like sums (trapezoidal rule). These are called Riemann-like sums because they are calculated at points given by population values and not on a uniform grid (Figure 3).
This area calculation method is commonly used in mathematical analysis to approximate a definite integral. The values of the function f over an interval are approximated by the average of the values at the left and right endpoints. A simple calculation involves using the trapezoidal area formula:

$$\mathrm{Area} = \frac{h\,(b_1 + b_2)}{2}$$

where h is the height of the trapezoid; $b_1$ and $b_2$ are the parallel sides.
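$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{2}\left[f(x_0) + 2f(x_1) + \dots + 2f(x_{n-1}) + f(x_n)\right]$$

where the interval [a, b] is divided into n subintervals, each of length Δx = (b − a)/n. The points in the partition will then be: a, a + Δx, a + 2Δx, …, a + (n − 1)Δx, b.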
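The Lorenz-curve calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB script or the gini.m code used in the study: the function name `lorenz_gini`, the choice to drop unpopulated cells, and the ordering of cells by light per person are all assumptions made for the example.

```python
def lorenz_gini(population, light):
    """Gini-type index from a Lorenz curve of cumulative light vs. cumulative population.

    Hypothetical helper: the inputs are per-cell population counts and per-cell
    summed night-time light, as two vectors of equal length with values >= 0.
    """
    if len(population) != len(light):
        raise ValueError("population and light must have equal length")
    if any(p < 0 for p in population) or any(v < 0 for v in light):
        raise ValueError("values must be greater than or equal to 0")

    # Assumption: keep populated cells only, ordered from least to most light per person.
    cells = sorted(
        ((p, v) for p, v in zip(population, light) if p > 0),
        key=lambda cell: cell[1] / cell[0],
    )
    total_pop = sum(p for p, _ in cells)
    total_light = sum(v for _, v in cells)
    if total_pop == 0 or total_light == 0:
        raise ValueError("need positive total population and total light")

    # Lorenz curve points: x = cumulative population share, y = cumulative light share.
    xs, ys = [0.0], [0.0]
    cum_pop = cum_light = 0.0
    for p, v in cells:
        cum_pop += p
        cum_light += v
        xs.append(cum_pop / total_pop)
        ys.append(cum_light / total_light)

    # Area B under the Lorenz curve: one trapezoid per cell on the
    # non-uniform grid given by the population shares.
    area_b = sum(
        0.5 * (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) for i in range(1, len(xs))
    )
    # Index = A / (A + B) = 1 - 2B, because A + B = 0.5.
    return 1.0 - 2.0 * area_b, list(zip(xs, ys))
```

Each trapezoid has width equal to a cell's population share, which is why the sums are "Riemann-like" rather than taken on a uniform grid. Two cells with equal population, one dark and one lit, give an index of 0.5; cells with identical light per person give 0.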
Using the Gini coefficient approach, the Night Light Inequality Index (NLII) was determined at the national level based on the county-level Night Light Development Index (NLDI) values. In addition, we compared the evolution of the NLII for 1992–2018 with the official household income Gini index measured and published by Eurostat [64] in order to check the robustness of the NLII.
A large part of the literature on regional inequalities is focused on income and GDP per capita, as a complex measure of economic development highlighting inter-regional differentials worldwide. In order to validate the NLDI, we performed a correlation and regression analysis using the income data available at the county level.
Using typical regression notation, we can specify the relation between income and the NLDI as follows:

$$Y = a\,e^{bX} + \varepsilon$$

$$Y = a + bX + cX^2 + \varepsilon$$

where Y represents the dependent variable (in our case the estimated income); X represents the independent variable (the NLDI); ε is the residual variable; a, b, c are regression parameters estimated by applying the regression function, based on the data series used to define the two variables.
In order to find the best fitting (trend) line, we tested both the exponential function model (Equation (6)) and the polynomial model (Equation (7)). By substituting our variables, we ended up with the following equations:

$$\mathrm{Income} = a\,e^{b\,\mathrm{NLDI}} \qquad (6)$$

$$\mathrm{Income} = a + b\,\mathrm{NLDI} + c\,\mathrm{NLDI}^2 \qquad (7)$$

where NLDI represents the Night Light Development Index; a, b, c are regression parameters.
The classic measure of fit in a linear regression model is R² and its adjusted counterpart, R²ₐ, which shows the validity of the chosen model for explaining the variation of Y (the percentage of income estimation explained by the NLDI). Adjusted R² (R²ₐ) is a coefficient of determination corrected for degrees of freedom that has the same meaning as R².
Evaluation of the regression function’s capacity to estimate income is often based on the relative error (RE) or residuals (the difference between the observed y values and the predicted y values) and relative root mean square error (RRMSE). The lower the absolute value of the RE, the better the model fitting. Similarly, a lower RRMSE (which is derived from RE and the analyzed units) indicates an increased accuracy of the regression function (better fit of the model).
The Akaike information criterion (AIC) is a useful tool for the selection of models. It is based on the likelihood function and ranks regression models according to a score, where a lower value represents better results. The AIC is computed as:

$$\mathrm{AIC} = 2k - 2\ln(L)$$

where L represents the maximized value of the likelihood function and k is the number of estimated parameters. Hence, the smaller the value of the information criterion, the better the model.
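The model comparison described above (linear, second-order polynomial, and exponential fits ranked by R², RRMSE, and AIC) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the exponential model is fitted on log-transformed income, RRMSE is taken as the root mean square of the relative errors, and the AIC uses the Gaussian least-squares form n·ln(RSS/n) + 2k, which equals 2k − 2 ln(L) up to a constant shared by all models.

```python
import math

def solve(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(design, y):
    """Ordinary least squares through the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def metrics(y, y_hat, k):
    """R², RRMSE and AIC for observed y, predicted y_hat and k estimated parameters."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((o - p) ** 2 for o, p in zip(y, y_hat))
    tss = sum((o - mean_y) ** 2 for o in y)
    rel_err = [(p - o) / o for o, p in zip(y, y_hat)]  # assumes observed values != 0
    return {
        "r2": 1.0 - rss / tss,
        "rrmse": math.sqrt(sum(e * e for e in rel_err) / n),
        "aic": n * math.log(rss / n) + 2 * k,  # assumes Gaussian errors, rss > 0
    }

def compare_models(nldi, income):
    """Fit the three candidate models and return their fit statistics."""
    results = {}
    a, b = ols([[1.0, x] for x in nldi], income)
    results["linear"] = metrics(income, [a + b * x for x in nldi], 2)
    a, b, c = ols([[1.0, x, x * x] for x in nldi], income)
    results["polynomial"] = metrics(income, [a + b * x + c * x * x for x in nldi], 3)
    # Assumption: exponential model fitted as ln(income) = ln(a) + b * NLDI.
    ln_a, b = ols([[1.0, x] for x in nldi], [math.log(v) for v in income])
    results["exponential"] = metrics(income, [math.exp(ln_a + b * x) for x in nldi], 2)
    return results
```

Calling `compare_models` with a list of county NLDI values and the matching list of incomes returns one dictionary per model. The model with the highest `r2` and the lowest `rrmse` and `aic` would be preferred. Because the linear model is nested in the polynomial one, the polynomial R² can never be lower, which is why the AIC penalty on the extra parameter matters.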
The annual NLDI values were related to socio-economic variables (GDP, income), demographic variables (total population, net migration rate), and geographical factors (altitude, geographical position) using multiple regression.
Multiple regression also allows us to determine the overall model fit (variation explained) and the relative contribution of each of the predictors to the total variation explained. We used the SPSS software to calculate the regression parameters. The enter regression method was used at a 95% confidence level.
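$$\mathrm{NLDI} = a + b_1\,\mathrm{Alt} + b_2\,\mathrm{Long} + b_3\,\mathrm{Lat} + b_4\,\mathrm{Income} + b_5\,\mathrm{GDP} + b_6\,\mathrm{NMR} + b_7\,\mathrm{Pop} + \varepsilon$$

where a is the intercept; $b_1, b_2, \dots, b_7$ are the unstandardized coefficients; ε is the error; NLDI represents the Night Light Development Index; Alt is altitude; Long is longitude; Lat is latitude; Income is local tax income; GDP is gross domestic product per capita; NMR is net migration rate; Pop is population.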
4. Results and Discussion
Figures showing the NLDI (Figure 4) illustrate the existing economic development inequalities.
For classification of the NLDI, we used the natural breaks (Jenks) classification method, which aims to ensure natural grouping in the data by minimizing variance within classes and maximizing variance between them [33]. The analysis conducted on the NLDI values for each year taken into consideration shows a downward trend until the economic crisis, followed by a short increase. All this illustrates a growing polarization between the most and least developed counties.
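To make the natural breaks criterion concrete, the sketch below groups sorted values into a fixed number of classes so that the total within-class sum of squared deviations is as small as possible. It is a plain dynamic-programming illustration of the idea; the function name `natural_breaks` is hypothetical, and GIS packages may use a different algorithm or tie-breaking, so this is not presented as what the mapping software does internally.

```python
def natural_breaks(values, n_classes):
    """Split values into n_classes contiguous groups minimizing within-class variance.

    Returns the groups as lists of sorted values.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the sum of squared deviations of data[i:j] in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total within-class SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the stored split points to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes
```

For example, `natural_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3)` separates the three visible clusters. Minimizing the within-class sum is equivalent to maximizing the between-class sum, since the two add up to the fixed total variation.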
Counties with a higher development level, and implicitly a lower NLDI value, are concentrated in the Bucharest capital area, the southern and north-western parts of the historical region of Transylvania (Sibiu (SB), Brașov (BV), Cluj (CJ) counties), and the western border region (Timiș county (TM)), along with areas close to the Black Sea Coast (Constanța county (CT)). These counties have the highest level of human capital and foreign direct investments, and the best accessibility and productivity in Romania [35,36,37,38,39,40].
In contrast, the lowest levels of economic development reflected through the NLDI can be found in the northern, eastern, and south-western parts of Romania (Figure 4), which are among the poorest regions of the EU.
Even though geographical position and proximity are important factors in development, we cannot unambiguously identify a duality of development between the west and the east or the north and the south, the country being similar to a mosaic of rich and poor areas, reflecting high regional inequalities (Table 3).
The spatial distribution of NLDI values is influenced by a number of factors such as socio-economic variables, demographic variables, and geographical factors. The Pearson correlation coefficient was calculated (Table 3) to highlight the relationship between the NLDI and these factors.
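As a reminder of what the Pearson coefficient measures, a minimal sketch is given here. The function name `pearson_r` is hypothetical, and the inputs in the usage note are illustrative values only, not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

A value near −1, as in `pearson_r([1, 2, 3], [6, 4, 2])`, indicates an inverse relationship of the kind expected between an inequality-type index and income; a value near 0 indicates no linear association.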
The correlation analysis indicates the preservation of the value and the sign of the relationship between the NLDI and the parameters in all the years subject to analysis. Thus, the relationship is inverse in terms of development indicators, as the NLDI is higher in the counties with a lower local tax income and GDP per capita. Similar results at the subnational scale in China were obtained by Xu [56] and Zhou [54] using VIIRS night-time radiance. In the case of demographic variables, the NLDI has an inverse relationship with the county population and the positive net migration rate, and a direct relationship with the negative net migration rate. There is nothing unusual in these findings: higher levels of economic development are generally related to the positive market effects generated by a larger population concentration. These areas are also attractive for migration due to the higher number of jobs, higher wages, and good quality of life. All these generate positive migration balances. The relationship between the NLDI and the geographical factors indicates higher NLDI values in the counties with a higher average altitude, and an increase of the NLDI values from south to north and from east to west. However, the very low correlation values show that there is no spatial clustering for these parameters (Figure 4).
The results show statistically significant correlations with the variables that reflect the development level and with the demographic variables, and non-significant correlations with the geographical variables, except for 2014 and 2016, where we can see a statistically significant direct relationship with latitude. A possible explanation would be the different resolution of the satellite images (VIIRS compared to DMSP), but also an increase of economic development between the north and the south, especially after the economic crisis (Figure 4). The relatively small correlation value (0.47) indicates that there is not a very clear differentiation of the development level in the north-south direction, due to the presence of less developed areas in the southwestern part of the country.
Given the low values of the Pearson correlation coefficients for the geographical factors, we further analyzed their relative contribution to the estimation of the NLDI using multiple linear regression. The regression was calculated for the most recent year (2016) for which there is complete data (Table 4). Multiple regression analysis explains ≈80% of the total variability (adjusted R² = 0.798) of the NLDI for the year under study.
A clearer picture of the results of multiple regression is provided by the relative contribution of each predictor to the total variation explained. The analysis of the standardized coefficients shows a relatively balanced relationship between the socio-economic variables and the demographic variables in the multiple regression results (Table 4). In general, geographical variables had a smaller contribution to the result of the multiple regression, indicating a lower impact on the NLDI values. It should be noted that the income and GDP variables have a greater impact in the calculation of the index than population and net migration rate.
Table 3 shows that the NLDI correlates best with income, which indicates a high potential of the index in estimating income at the county level. In this context, we performed a regression analysis to identify which type of regression was best suited for income estimation. The analysis was conducted only for the 2008–2018 period because there was no income data for 1992. In this analysis we used several criteria to select the best model: total variation explained (R²), error (RMSE), and AIC (Table 5).
The best results were obtained with the second-order (k = 2) polynomial model (Figure 5), for all the criteria considered (Figure 5a–f). The differences between the polynomial model and the exponential one are small, but if we consider R², we can see a significant improvement compared to the linear model. If we use this model, we can see an improvement of the explained variation by 6% in 2016 and 14% in 2012, compared to linear regression. Similar results were obtained by Dai [12] who, at the provincial scale, highlighted a higher accuracy of the polynomial model compared to the linear one.
Satellite images proved to be a good proxy in estimating economic output and regional inequalities at the subnational level.
One of the main limitations of this study is the statistically insignificant relationship between the NLDI and the income for all 3181 LAU1 (local administrative units: communes and cities). It indicates a low potential of the index in estimating income and measuring inequality at the local level. This may result from the huge differences in population size among LAUs. Therefore, in line with the existing literature, our future research intends to find out the population thresholds where the relationship between the SOL (sum of lights) and income becomes statistically significant.
The second limitation is related to the absence of population data at LAU2 level (villages and towns) for each year. The existence of this data would have helped the authors to improve the NLDI for each year. We will continue to improve our future research by including other datasets such as the annual human settlements or population-grid dataset.
The third important limitation is due to the heterogeneity in the resolution of the satellite images used in the study. Until 2013 we employed lower resolution satellite images (approximately 1 km), while for the 2014–2018 period we used higher resolution satellite images (approximately 500 m). It is hard to overcome this limitation due to the unavailability of higher resolution images before 2013.
5. Conclusions
We demonstrated in our study that Earth observation solutions could play an important role in monitoring progress in the direction of SDG 10 (reduced inequalities). The NLDI calculated from night-time lights (NTL) satellite images proved to be a good proxy for the real time measuring of economic output, while the NLII calculated for the estimation of regional inequalities represents, to our knowledge, an absolute novum in inequality literature. The NLDI has high potential for estimating income and implicitly GDP, as evidenced by the high correlation coefficients. In this regard, the second-order polynomial model proved to be the best model in estimating income, compared to the linear and exponential regression model. The polynomial regression model brings about a significant improvement in the explained variation compared to linear regression.
Moreover, we successfully combined geospatial information and EO (from satellite) with modern data processing (MATLAB, GIS), thus offering an unprecedented opportunity for the real-time tracking of SDG 10. All these tools provided reliable information on the state of economic output and economic regional inequalities, as well as their change over time. We believe that this combination can be successfully used in other spatial and national contexts where data on local tax incomes are available. The above-mentioned combination of methods and techniques is not place-bounded since satellite images are available for all countries. It would be very interesting to see if the relationship between regional economic development and the night-light satellite images is maintained in countries with advanced economies and where light pollution is reduced by modern technologies. Our future research will continue into this direction.
The analysis of the statistical relationship between the NLDI and demographic and socio-economic variables has shown a strong indirect relationship between the NLDI and local tax income and GDP per capita. The lack of relationship between the NLDI and geographical variables shows an aleatory spatial distribution of rich and poor areas. Multiple regression analysis using standardized coefficient values reinforces previous results, with income and GDP having the largest contribution in explaining the total variation.