Can Nighttime Satellite Imagery Inform Our Understanding of Education Inequality?

Abstract: Education is a human right, and equal access to education is important for achieving sustainable development. Measuring socioeconomic development, especially the changes to education inequality, can help educators, practitioners, and policymakers with decision- and policy-making. This article presents an approach that combines population distribution, human settlements, and nighttime light (NTL) data to assess and explore development and education inequality trajectories at national levels across multiple time periods using latent growth models (LGMs). Results show that countries and regions with initially low human development levels tend to have higher levels of associated education inequality and uneven distribution of urban population. Additionally, the initial status of human development can be used to explain the linear growth rate of education inequality, but the association between trajectories becomes less significant over time.


Introduction
Assessing our socioeconomic development in a frequent, rapid, and accurate manner is important for achieving the United Nations' Sustainable Development Goals (SDGs) on various national and global scales [1]. The United Nations' 2030 Agenda for Sustainable Development was developed to transform our world by urging countries to solve current development challenges related to education, poverty, inequality, climate change, etc. [2][3][4][5]. Recently, many countries and regional organizations have made significant progress toward the achievement of these goals. Nevertheless, due to the complexity of socioeconomic development, many countries are still suffering from these problems, and some of the actions and policies are not implemented in an effective and efficient way.
To support the 2030 Agenda for Sustainable Development, it is important to monitor and evaluate the current socioeconomic development status to provide scientific evidence for facilitating the policy- and decision-making processes. Measuring socioeconomic development, especially the status of education inequality, in a timely and accurate manner can help educators, practitioners, scientists, and policymakers compare and evaluate a variety of key education indicators. Measuring education inequality, for example, can help us better evaluate the fairness and effectiveness of our education systems and the processes of current educational development [6]. Since education is the foundation of development and growth, measuring socioeconomic data related to education inequality will also help countries achieve many of the SDGs in the long run, including stable economic growth [7][8][9], eradication of poverty [10,11], reduction of inequality and exclusion [12,13], and achievement of sustainable development [14]. This paper presents an approach that combines multi-source data (including population distribution, human settlement, and artificial light data monitored from space) to assess changes in trajectories of human development and education inequality at a national level from 1990 to 2010. This research utilizes nighttime light (NTL) data collected by the Defense Meteorological Satellite Program (DMSP) and human settlement data from the Global Human Settlement Layer (GHSL) to measure human development and evaluate its association with education inequality. Many researchers have demonstrated that NTL data can be used to assess regional inequality and economic development [15][16][17]. Studies have also shown that NTL is capable of capturing regional uneven development [18][19][20]. Therefore, we use DMSP NTL data to estimate human development [21] and assess the associations of growth patterns with education inequality.
Education is a human right, and equal access to education is not only crucial for an individual's well-being, but is also essential for eradicating poverty, transforming our society, ensuring long-term prosperity for all, and achieving sustainable development. Many researchers have proposed that equal access to education can be achieved by distributing education resources more equally [6]. Therefore, it is important to develop indicators that can measure education inequality so we can monitor the changes to education resource allocation status over time. Nevertheless, unlike many socioeconomic indicators (e.g., the Gross Domestic Product) that are developed based on a series of sophisticated accounting and statistical methods, it is difficult to measure education inequality by assigning a monetary value to education accessibility or student achievement and attainment. Some studies have demonstrated the use of Gini coefficients for measuring education inequality. An Education Gini (EG) index [6], for example, is developed based on the education attainment of the concerned population using the following formula:

$$E_L = \frac{1}{\mu} \sum_{i=2}^{n} \sum_{j=1}^{i-1} p_i \left| y_i - y_j \right| p_j$$

where $E_L$ is the education Gini, $\mu$ is the mean years of schooling, $p_i$ and $p_j$ are the percentages of the population with certain levels of schooling, $y_i$ and $y_j$ are the years of schooling at different education attainment levels, and $n$ is the number of levels of the attainment data for the concerned population. Thomas et al. [6] have also adopted the Lorenz curve to calculate an education Gini based on the cumulative proportion of the population with certain years of schooling, which is similar to the calculation of an income Gini. Generally, although different studies have proposed different approaches to education Gini calculation, an education Gini is mainly derived from the proportion of the population with various education attainment levels.
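As an illustration of the attainment-based calculation, the following minimal Python sketch evaluates the double-sum form of the education Gini attributed to Thomas et al. [6], that is, (1/µ) Σ_{i=2..n} Σ_{j<i} p_i |y_i − y_j| p_j, using the symbols defined above. The function name `education_gini` and the attainment shares in the usage note are hypothetical and are not taken from the data used in this study.

```python
from typing import Sequence


def education_gini(shares: Sequence[float], years: Sequence[float]) -> float:
    """Education Gini from attainment shares (p) and years of schooling (y).

    shares[i] is the proportion of the population at attainment level i
    (the shares must sum to 1); years[i] is the years of schooling at level i.
    """
    if len(shares) != len(years) or not shares:
        raise ValueError("shares and years must be non-empty and equally long")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")

    # mu: mean years of schooling of the concerned population
    mu = sum(p * y for p, y in zip(shares, years))
    if mu == 0:
        return 0.0  # nobody has any schooling, so there is no dispersion

    # Sum p_i * |y_i - y_j| * p_j over every pair of levels with j < i
    total = 0.0
    for i in range(1, len(shares)):
        for j in range(i):
            total += shares[i] * abs(years[i] - years[j]) * shares[j]
    return total / mu
```

For instance, with hypothetical shares of 0.25, 0.25, and 0.5 at 0, 6, and 12 years of schooling, `education_gini([0.25, 0.25, 0.5], [0, 6, 12])` evaluates the three pairwise terms and divides by a mean of 7.5 years.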
Recently, many scientists have also incorporated multi-source data to enhance model performance for evaluating various socioeconomic indicators that are related to human development. There are many difficulties associated with collecting traditional socioeconomic data for measuring human well-being. Accurate information about the distribution of the population, settlements, and even wealth is not available for many less developed regions, for example, and sometimes these data are of poor quality [22]. Nevertheless, remote sensing technology and satellite imagery can help us observe, explore, and evaluate the status of human development on the Earth's surface [23]. Hence, geospatial data can be an alternative way for scientists to study and monitor human activities in a timely, consistent, and affordable way. NTL data, for instance, is widely used for estimating and evaluating socioeconomic activities since it captures artificial light at night [24][25][26]. Based on remotely sensed NTL data, for example, Sutton et al. [27] estimated global marketed and non-marketed economic value from classified satellite images. Elvidge et al. [28] produced a global poverty map on a subnational scale based on population and DMSP NTL data. Therefore, the subnational data generated from NTLs can greatly help scientists measure human activities on various spatial scales.
Many scientists have also adopted Gini concepts for calculating other socioeconomic indexes based on the Lorenz curve. Elvidge et al. [21], for example, produced the Nighttime Light Development Index (NLDI) based on DMSP NTL data and LandScan population density data to measure human development. The NLDI for each country is calculated based on the Lorenz curve produced from the cumulative proportion of the NTL and the cumulative proportion of the population. Generally, results show that developed countries tend to have low NLDI values and less developed countries have high NLDI values. They also show that the NLDI has a strong correlation with other indicators like the Human Development Index (HDI), poverty rate, and the proportion of the urban population. Therefore, the NLDI can be an alternative way of measuring human development using NTL data. Song et al. [29] have also used the Spatial Lorenz Curve (SLC) and Gini coefficients to measure land use changes based on an unsupervised land use classification method with cloud-free Landsat Thematic Mapper (TM) images. Similar to the NLDI, the SLC is calculated based on the cumulative proportion of land use and the cumulative proportion of land. Therefore, these studies show that there is great potential for scientists to utilize geospatial data to monitor the allocation of resources, the distribution of population, and the different levels of development on various spatiotemporal scales. In addition, the availability of geospatial data can help us establish a consistent, objective, and globally applicable method for characterizing and measuring education inequality that is caused by development problems like income inequality, urbanization, and resource allocation.
This research utilizes multi-source data to evaluate human development levels and the uneven distribution of the urban population on various spatiotemporal scales to explore development trajectories and patterns of human development and education inequality. The rest of this paper is organized as follows. Section 2 describes data processing procedures and the development of latent growth models (LGMs) for measuring different development trajectories and patterns. Section 3 presents the results from LGMs to evaluate the growth patterns for each factor included in this study. Section 4 discusses the associations between trajectories. Finally, Section 5 summarizes the results and draws conclusions.

Gini Coefficients for Human Development and Education
In this study, we analyze the relationship between an Education Gini (EG), Nighttime Light Development Index (NLDI), and population distribution at a national level in 1990, 2000, and 2010. The NLDI for each country is calculated as a proxy for human development [21]. Moreover, an urban population Gini (UG) index is also constructed based on similar procedures [21,29] to measure the levels of urbanization with Lorenz curves. A higher UG value represents higher levels of rural-urban population distribution inequality which, in turn, indicates that a smaller share of the population is likely to benefit from improved economic activity, better shared infrastructure, and higher standards of living due to urbanization [30][31][32]. The datasets used in this study are described in Table 1. This study utilizes the Defense Meteorological Satellite Program nighttime light (DMSP NTL) data (Figure 1a) and the Global Human Settlement Layer (GHSL) population data (Figure 1b) to construct an NLDI and UG for countries and regions around the world. Due to data availability issues, population data from 2015 (rather than 2010) and DMSP NTL data from 1992 (rather than 1990) are used to calculate these indexes. Based on the population distribution, NTL intensity, and human settlements, the Gini coefficients for the NLDI and urbanization are calculated from the corresponding Lorenz curves. Here, $G$ is the Gini coefficient for the NLDI or urbanization, $N_i$ is the cumulative proportion of the NTL (for calculating the NLDI) or the urban population (for calculating an urbanization Gini) in the subnational entities, and $P_i$ is the cumulative proportion of the population in the same subnational entities. The NLDI and UG at the national level are constructed using level 0 and level 1 administrative units. Level 0 represents national-level administrative boundaries, and level 1 represents state- and provincial-level boundaries.
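For reference, one common way to write a Gini coefficient in terms of cumulative proportions is the trapezoidal approximation of the area under the Lorenz curve. The expression below is a sketch under that assumption and may differ from the exact form used in this study. It assumes that the $n$ subnational entities are sorted by ascending NTL (or urban population) per person and that $P_0 = N_0 = 0$.

```latex
G = 1 - \sum_{i=1}^{n} \left( P_i - P_{i-1} \right) \left( N_i + N_{i-1} \right)
```

Under this form, $G = 0$ when every entity holds the same share of NTL as of population, and $G$ approaches 1 as the NTL concentrates in entities that hold a vanishing share of the population.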
To construct the Lorenz curve for each country based on the cumulative proportion of the NTL and population, this study uses the level 1 subdivisions' administrative boundary layer (state or province) to calculate the sum of the population and NTL within each subdivision. Based on the cumulative percentage of the NTL and population data, this study calculates the NLDI value for each country for the corresponding year. The subnational NLDI at level 1 subdivisions is calculated based on the level 2 subdivisions' data using the same procedures. After matching and filtering the data (i.e., based on the ISO3 country code), a total of 141 countries and regions from 1990, 2000, and 2010 are included in this study for trajectory analysis. Latent growth models (LGMs) [34] are then constructed to study the trends of the EG, UG, and NLDI changes (see Appendix A) on a national scale.
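The aggregation steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the processing chain used in this study. It assumes that the population and NTL have already been summed within each level 1 subdivision, and it uses a trapezoidal Lorenz-curve Gini, which is one common choice. The function name `lorenz_gini` and the numbers in the usage note are hypothetical.

```python
from typing import Sequence


def lorenz_gini(population: Sequence[float], resource: Sequence[float]) -> float:
    """Gini of a resource (e.g., summed NTL) distributed over population.

    population[k] and resource[k] are the totals within subdivision k.
    Passing urban population as `resource` gives an urbanization-style Gini.
    """
    if len(population) != len(resource) or not population:
        raise ValueError("inputs must be non-empty and equally long")
    if any(p <= 0 for p in population):
        raise ValueError("every subdivision needs a positive population")
    total_pop = sum(population)
    total_res = sum(resource)
    if total_res <= 0:
        raise ValueError("total resource must be positive")

    # Order subdivisions from least to most resource per person
    units = sorted(zip(population, resource), key=lambda u: u[1] / u[0])

    cum_p = cum_n = 0.0  # cumulative proportions P_i and N_i
    area = 0.0           # twice the area under the Lorenz curve
    for pop, res in units:
        next_p = cum_p + pop / total_pop
        next_n = cum_n + res / total_res
        area += (next_p - cum_p) * (next_n + cum_n)
        cum_p, cum_n = next_p, next_n
    return 1.0 - area
```

With two hypothetical subdivisions of equal population where one emits all of the light, `lorenz_gini([50, 50], [0, 100])` gives 0.5, whereas any case in which light is proportional to population gives a value of (numerically) zero.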

Development of Associative Latent Growth Models (LGMs)
To better analyze the developmental trajectories of an Education Gini (EG), Nighttime Light Development Index (NLDI), and Urban Population Gini (UG) for each country over time, an unspecified associative latent growth model (LGM) is developed due to its greater capacity to (1) test the efficiency and adequacy of the hypothesized growth structure, especially the non-linear growth curve [35][36][37]; (2) integrate time-invariant and time-varying covariates [38] so as to estimate their effects on developmental trajectories; and (3) identify growth patterns based on the estimations of individual change, intra-individual differences from individual change, and within-group error [39]. More importantly, associative LGMs allow researchers to explore interrelations among parameters for individual differences [40][41][42]. In other words, this model is specified to investigate the synchronous model's correlation coefficients, which are the correlations of trajectories between the factors that are included in this study [38].
It is suggested that the parallel process of LGM analysis methodology can be implemented to test the research hypotheses [43]. First, three separate unconditional (i.e., without covariates) single-factor polynomial LGMs are constructed and evaluated for the NLDI, UG, and EG, respectively. Second, these three single-factor LGMs are examined based on their model fits. Three single-factor LGMs then are combined to construct the unconditional three-factor associative LGM to further explain the associations between the growth parameters of these three major factors. Third, this study evaluates the model fits of the associative LGM and examines the growth trajectories between the NLDI, UG, and EG by interpreting model fit indices and values of growth parameters.

Latent Growth Model (LGM) Configuration Procedures

Unconditional Latent Growth Model (LGM) Specification for All Factors
Mplus software, Version 8 [44], is used to specify, configure, and estimate the latent growth models (LGMs). To test and determine the growth shape of the Nighttime Light Development Index (NLDI), a single-factor polynomial LGM with a quadratic growth factor is specified. Since each major factor has been measured 3 times (i.e., 1990, 2000, and 2010), the factor loadings of the latent intercept are all set to 1, and those of the linear latent slope are set to 0, 1, and 2, respectively. Moreover, the factor loadings for the quadratic growth factor are set to 0, 1, and 4 [45]. Additionally, the covariances between the latent intercept, slope, and quadratic factors are set to be freely estimated. To ensure that the model is overidentified with positive degrees of freedom, the error variances and mean structures of the latent factors are set to 0.
The single-factor polynomial LGMs for the Urban Population Gini (UG) and Education Gini (EG) use the same specifications and constraints as the LGM for the NLDI, both to determine the growth shape and to ensure model identification.
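The loading pattern described above can be written out explicitly. The short Python sketch below is only an illustration of how the fixed loadings (1, 1, 1), (0, 1, 2), and (0, 1, 4) translate a country's intercept, linear, and quadratic growth factors into values at the three waves. It does not reproduce the Mplus estimation, and the function names and the growth-factor values in the usage note are hypothetical.

```python
from typing import List, Sequence

# Time scores for the three waves (1990, 2000, 2010), coded 0, 1, 2
TIME_SCORES = (0, 1, 2)


def loading_matrix(time_scores: Sequence[int] = TIME_SCORES) -> List[List[int]]:
    """One row per wave; columns are intercept, linear, and quadratic loadings."""
    return [[1, t, t * t] for t in time_scores]


def implied_trajectory(intercept: float, slope: float, quadratic: float,
                       time_scores: Sequence[int] = TIME_SCORES) -> List[float]:
    """Values implied at each wave by one country's three growth factors."""
    return [intercept + slope * t + quadratic * t * t for t in time_scores]
```

For example, a hypothetical country with an initial index of 0.5, a linear change of −0.1 per decade, and a quadratic term of 0.02 would follow `implied_trajectory(0.5, -0.1, 0.02)`, a decline that flattens between the second and third waves.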

Unconditional Three-Factor Associative Latent Growth Model (LGM)
The unconditional associative latent growth model (LGM) is developed by combining the three separate single-factor polynomial LGMs to evaluate the associations between the latent growth factors. To ensure that the model is overidentified, the residual variances for the 9 observed measures are set to 0 (i.e., t1-t9, since there are 3 factors and each factor has 3 time points), and the mean structures for the growth factors are also set to 0. If the three-factor unconditional associative LGM shows an acceptable model fit, further analyses are conducted to interpret the covariances between growth parameters within and across latent factors.
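To see why these constraints matter for identification, the following Python sketch counts observed moments against free parameters. It is a counting exercise under stated assumptions and is not output from Mplus: it assumes a mean structure is analyzed, that the intercepts of the observed measures are fixed to 0, and that all variances and covariances among the growth factors are freely estimated. The function name `growth_model_df` is hypothetical, and the degrees of freedom reported by the software may differ if the actual specification differs.

```python
def growth_model_df(n_measures: int, n_growth_factors: int,
                    free_residual_variances: int = 0,
                    free_factor_means: int = 0) -> int:
    """Degrees of freedom = observed moments minus free parameters."""
    # Observed moments: variances and covariances plus means of the measures
    moments = n_measures * (n_measures + 1) // 2 + n_measures
    # Free variances and covariances among the latent growth factors
    factor_cov = n_growth_factors * (n_growth_factors + 1) // 2
    free_parameters = factor_cov + free_residual_variances + free_factor_means
    return moments - free_parameters


# Single-factor quadratic model: 3 measures, 3 growth factors
single_constrained = growth_model_df(3, 3)
single_unconstrained = growth_model_df(3, 3, free_residual_variances=3,
                                       free_factor_means=3)

# Three-factor associative model: 9 measures, 9 growth factors
associative_constrained = growth_model_df(9, 9)
```

Under these assumptions, freeing the residual variances and factor means in the single-factor quadratic model would leave more parameters than observed moments, while fixing them to 0 yields positive degrees of freedom for both the single-factor and the three-factor models.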

Model Estimation and the Fit Indices
Multiple fit indices are used to evaluate the latent growth models (LGMs), including the Chi-square test statistic, the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR), which are the common fit statistics used for assessing structural equation models [46]. The thresholds used to determine whether a model is acceptable are as follows: (1) RMSEA values ranging from 0.08 to 0.10 indicate a mediocre fit [47]. Moreover, it has been strongly argued that RMSEA values alone cannot accurately determine the model fit and that it is reasonable to combine RMSEA values with confidence intervals. Therefore, when testing closeness of fit with a 90% confidence interval, the p value should be greater than 0.50 to indicate an acceptable model fit [48]; (2) Hu et al. [49] suggested that values of the CFI and TLI greater than 0.95 indicate an acceptable model fit; (3) a smaller SRMR value indicates a better model fit, and an SRMR value of 0 indicates a perfect model fit [50].
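The thresholds listed above can be collected into a simple screening helper. The Python sketch below is a hypothetical convenience function and not part of Mplus or of the authors' workflow. Treating an RMSEA above 0.10 as a separate, worse category is an assumption added here, and the SRMR is carried along without a cutoff because the text gives none.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FitIndices:
    rmsea: float
    close_fit_p: float  # p value of the test of close fit
    cfi: float
    tli: float
    srmr: float         # reported only; smaller is better, 0 is perfect


def screen_fit(fit: FitIndices) -> List[str]:
    """Return the list of threshold concerns; an empty list means none."""
    notes = []
    if fit.rmsea > 0.10:
        # Assumption: values above the mediocre band are flagged separately
        notes.append("RMSEA above 0.10")
    elif fit.rmsea >= 0.08:
        notes.append("RMSEA in the mediocre 0.08-0.10 band")
    if fit.close_fit_p <= 0.50:
        notes.append("close-fit p value not greater than 0.50")
    if fit.cfi <= 0.95:
        notes.append("CFI not greater than 0.95")
    if fit.tli <= 0.95:
        notes.append("TLI not greater than 0.95")
    return notes
```

A model with made-up indices such as `FitIndices(rmsea=0.03, close_fit_p=0.80, cfi=0.99, tli=0.98, srmr=0.02)` would pass every check, whereas any model with an RMSEA above 0.10 would be flagged regardless of its other indices.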

Model Parameter Estimation and Interpretation
Regarding either unconditional or associative latent growth models (LGMs), the variances of intercepts indicate the differences of countries on human development and educational status at the baseline. The variations in latent growth factors (such as the slope and quadratic rates of change) can indicate differences of individual countries in the probability of progressing in a linear or quadratic rate of change over time. Moreover, in the associative model, the direction and magnitude of the covariances among growth factors can indicate the directions and strengths of the relationships between the growth trajectories for human development and education factors.

Model Configuration Results
Separate unconditional single-factor polynomial latent growth models (LGMs) are constructed for each factor. As shown in Table 2, the single-factor polynomial LGMs fit the Education Gini (EG) and Urban Population Gini (UG) adequately. For the Nighttime Light Development Index (NLDI), however, the model does not yield an acceptable fit (root mean square error of approximation (RMSEA) = 0.335), although the single-factor polynomial LGM with a quadratic growth parameter still fits better than those with constant or linear growth. Therefore, all factors show quadratic change patterns. In the next step, a three-factor associative LGM is constructed to explore the associations of developmental trajectories between factors, following the model configuration procedures described in Section 2.
Table 2. Model fit indices, including the root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis index (TLI), and standardized root mean square residual (SRMR), for latent growth models (LGMs) based on the Education Gini (EG), Urban Population Gini (UG), and Nighttime Light Development Index (NLDI).

Associative Growth Trends
Based on the results in Table 2, the three-factor associative latent growth model (LGM) yields an acceptable model fit for the dataset used in this study. Therefore, for the rest of Section 3, we use this associative model to investigate the interrelationships between the growth patterns of the factors. First, to interpret how each factor changes over time, the statistically significant growth parameter estimates within each factor are presented as follows: (1) For the Nighttime Light Development Index (NLDI), the association between its initial status and linear slope growth is statistically significant (covariance Cov1 = −0.435, standard error (S.E.) = 0.068, p < 0.001), and the association between the linear slope growth and quadratic growth is also statistically significant (Cov2 = −0.869, S.E. = 0.021, p < 0.001). In other words, countries with lower initial NLDI values tend to have a higher linear growth but a lower quadratic growth, whereas countries with higher initial NLDI values show a lower linear growth but a higher quadratic growth. (2) Regarding the Education Gini (EG), the association between the initial EG status and the linear slope growth is statistically significant (Cov3 = −0.307, S.E. = 0.076, p < 0.001), indicating that countries with a greater initial education inequality tend to have a slower linear rate of change. The association between the linear slope growth factor and the quadratic growth factor is also statistically significant (Cov4 = −0.845, S.E. < 0.024, p < 0.001), which means that countries with a higher linear rate of change in education inequality tend to have a slower quadratic rate of change. Therefore, the EG exhibits a similar growth pattern to the NLDI: countries with higher EG values at the initial stage demonstrate a slower linear growth but a higher quadratic growth, whereas countries with lower EG values at the initial stage tend to have a higher linear growth but a lower quadratic growth.
(3) Regarding the Urban Population Gini (UG), the linear slope and quadratic growth factor covary significantly (Cov5 = −0.978, S.E. = 0.004, p < 0.001), indicating that countries with a higher linear growth in the UG show a slower quadratic growth.
The associative LGM allows us to explore growth parameters across factors that are statistically significant (Table 3). The LGM trajectory analysis results are also reflected in Figure 2 (plotted based on data in Appendix A). Figure 2 shows that both the EG and NLDI experience downward trends from 1990 to 2010, which means that most of the countries included in this study moved toward less education inequality and higher human development levels. Nevertheless, the urbanization Gini decreases from 1990 to 2000 and then increases from 2000 to 2010. Therefore, the urban population distribution has become more uneven in recent years. In 1990, there were positive associations between the initial status of the EG, UG, and NLDI. This indicates that the countries with initially lower levels of human development also had a higher education inequality and a more uneven urban population distribution. From 1990 to 2000, all factors experienced decreasing trends, and the EG demonstrated a higher decreasing rate. From 2000 to 2010, the quadratic change rates of the UG and NLDI showed a less significant change, whereas the quadratic change rate of the EG still demonstrated a decreasing trend.

Discussion
Although nighttime light (NTL) does not measure human activities directly, results from previous studies have shown that NTL is capable of estimating socioeconomic development accurately on different spatial scales [27,28]. Therefore, we calculate the Nighttime Light Development Index (NLDI), Urban Population Gini (UG), and Education Gini (EG) at the country level based on the Defense Meteorological Satellite Program (DMSP) NTL and the Global Human Settlement Layer (GHSL) population distribution. By analyzing the results from the associative latent growth model (LGM), we are able to identify the different growth trajectory patterns across multiple years, which can further inform us about the associations between development and education inequality.
From 1990 to 2010, we see a significant drop in education inequality. From 1990 to 2000, that drop is accompanied by similar drops in the NLDI (related to human development). However, from 2000 to 2010, the gains in the NLDI have ceased while improvements in education inequality have continued. This bifurcation raises some interesting questions. Theory suggests that human development will correlate with higher levels of education, which appears to be true from 1990 to 2000 [51]. Therefore, those trends lead to a series of questions that need to be explored: (1) Is the departure from these correlated trajectories due to exogenous or endogenous forces? (2) Could the departure be related to fundamental resource constraints such as the availability of adequate food, water, and energy? (3) Will improved educational outcomes occurring simultaneously with slowed changes to human development foster increased levels of social unrest?

Conclusions
Here, we analyzed the trajectories of human development, urban population distribution, and education inequality using multi-source data on multiple spatiotemporal scales. Generally, human development levels are increasing and education inequality is decreasing in most of the countries. However, the urban population distribution becomes more uneven over time. Different development patterns are identified through latent growth models (LGMs). For example, (1) countries with low initial human development levels tend to have greater associated education inequality; (2) countries with higher initial human development levels tend to show higher linear and lower quadratic rates of change in human development over time; (3) education inequality changes show a stronger association with the trajectories of urban population distributions than with those of human development levels. To be more specific, countries with a greater initial education inequality are associated with a slower linear rate of change in the uneven distribution of the urban population. Over time, however, the countries with a greater initial education inequality are also associated with a greater quadratic rate of change in the uneven distribution of the urban population; (4) the growth patterns of the human development levels and education inequality, in contrast, show less significant associations.
It has been demonstrated that the Defense Meteorological Satellite Program (DMSP) nighttime light (NTL) data can support the estimation of socioeconomic data, especially at the country level, as some of the outlier effects are minimized with data aggregation [52]. Nevertheless, due to its limitations, it may not be able to capture human activities at smaller regional levels (e.g., city or town levels). Therefore, there is potential for using the Visible Infrared Imaging Radiometer Suite (VIIRS) NTL data for estimating socioeconomic development in the future. VIIRS outperforms DMSP in many ways, including its better resolution and higher sensitivity for capturing artificial lights [53], and the results derived from VIIRS NTL are more accurate [54,55]. Thus, VIIRS can help us better capture the spatial heterogeneity of economic development on a finer scale (e.g., a provincial level). VIIRS data can also help us better assess and explore disparities in education not only across countries but also between urban and rural areas within countries and regions. With more accurate subnational socioeconomic data, there is potential to develop advanced models (e.g., multi-level models) that capture the within-cluster and between-cluster variations to better analyze education disparities.
Going forward, several important steps can take this research to the next level: (1) using more accurate education Gini data to estimate education inequality, as the current data is developed based on a few indicators and may not reflect the true education inequality on various scales; (2) collecting more historical data, including socioeconomic and geospatial data, to monitor and forecast education inequality changes and to build LGMs with greater complexity that characterize the commonalities of trajectories; (3) developing suitable statistical models, such as hierarchical linear models, to cluster countries and their subnational entities in terms of their levels of development to better compare intra-group growth patterns; (4) using the VIIRS NTL data for future studies.