Integrating Nighttime Light and Household Survey Data to Monitor Income Inequality: Implications for China’s Socioeconomic Sustainability

Zhuo, Li; Wu, Qiuying; Guo, Siying

doi:10.3390/su18020734

Li Zhuo ¹,², Qiuying Wu ¹ and Siying Guo ³,*

¹ School of Geography and Planning, Sun Yat-sen University, Guangzhou 510006, China

² Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519000, China

³ Guangzhou Urban Planning & Design Survey Research Institute, Guangzhou 510060, China

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(2), 734; https://doi.org/10.3390/su18020734

Submission received: 8 December 2025 / Revised: 2 January 2026 / Accepted: 8 January 2026 / Published: 10 January 2026

Abstract

Accurate monitoring of income inequality is critical for sustainable socioeconomic development and realizing the United Nations Sustainable Development Goals (SDGs). However, assessing inequality at the county level remains challenging because of the high cost of household surveys and the limited accuracy of traditional nighttime light (NTL) proxies. To address this gap, we develop the Distribution Matching-based Individual Income Inequality Estimation Model (DM-I3EM), which integrates NTL data with household surveys. The model employs a three-stage workflow: logarithmic transformation of NTL data, estimation of Gini coefficients through Weibull distribution fitting, and selection of region-specific regression models, enabling high-resolution mapping and spatiotemporal analysis of county-level income inequality across China. Results show that DM-I3EM achieves superior performance, with an R² of 0.76 in China’s Eastern region, compared with the correlation coefficients of about 0.5 (R ≈ 0.5) reported for conventional NTL-based methods. By overcoming the spatiotemporal gaps of survey data, the model enables full-coverage estimation, revealing a regional divergence in income inequality across China from 2013 to 2022: inequality is intensifying in northern and western counties while stabilizing in the developed southern coastal regions. Furthermore, spatial agglomeration of inequality has strengthened, particularly in coastal urban clusters. These findings highlight emerging risks to socioeconomic sustainability. This study provides a robust, replicable framework for estimating inequality in data-scarce regions, offering policymakers actionable evidence to identify high-risk areas and design targeted strategies for advancing SDG 10 (Reduced Inequalities).

Keywords:

nighttime light (NTL); income inequality; distribution matching; sustainable development; county-level

1. Introduction

Income inequality remains a pivotal socioeconomic challenge globally. The 2022 World Inequality Report notes that the top 10% of earners worldwide capture over 50% of total income. Although the gap in China between the top 10% and bottom 50% is narrower than that in the U.S. and India, it still exceeds the average among European countries [1]. High income inequality undermines individuals’ motivation to pursue success [2], intensifies social stratification [3], and shortens cycles of sustained economic growth [4], thereby hindering regional sustainable development. Timely and accurate measurement of income inequality is essential for designing targeted policies to narrow disparities and advance common prosperity, in line with SDG 10 of the United Nations’ Sustainable Development Goals [5].

Current approaches to assessing income inequality rely predominantly on micro-level surveys, including the China Family Panel Studies (CFPS) [6], the European Union Statistics on Income and Living Conditions (EU-SILC) [7], and the World Bank’s PovcalNet database [8]. Disparities are then quantified with indices such as the Gini coefficient and the Theil index [9,10]. While survey-based estimates offer granular insights, they suffer from critical limitations: high implementation costs restrict spatiotemporal coverage, sampling variability can distort results, and missing data or underreporting of income by high-income households further erodes data reliability [11,12,13].

In contrast, nighttime light (NTL) data offer distinct advantages, including greater objectivity, lower acquisition cost, timely updates, long temporal coverage, broad spatial coverage, finer spatial resolution than statistics aggregated by administrative unit, and insensitivity to changes in administrative boundaries [14,15,16,17]. A strong association between NTL intensity and socioeconomic activities has been observed in previous studies, making NTL a valuable proxy for quantifying economic dynamics, particularly in underdeveloped regions where reliable socioeconomic statistics are scarce [18,19,20]. Consequently, NTL remote sensing has been widely applied in socioeconomic research, such as poverty prediction [21,22], urban economic vitality assessment [23], GDP estimation [24], and urban growth or shrinkage detection [25,26].

While relatively few studies have leveraged NTL for inequality estimation, the data’s fine spatial resolution and long temporal coverage underscore its considerable potential for capturing regional income disparities [27,28]. For example, Mirza et al. [29] compute Gini coefficients of average NTL intensity across grid cells to estimate household income inequality across countries, reporting a positive correlation (R ≈ 0.5) between NTL-derived and survey-based income inequality indices. Weidmann et al. [30] establish buffer zones around Demographic and Health Surveys (DHS) sample clusters and find that NTL intensity within a 5 km radius best represents household income inequality. Despite these advances, several fundamental challenges remain. First, existing studies use spatial differences in NTL as a proxy for income inequality. However, this approach captures variation across spatial units rather than differences among individuals; its accuracy is therefore limited and depends on the degree of spatial segregation among income groups, which varies widely across regions with different levels of economic development. Second, the empirical relationship between the distribution of NTL intensity and the distribution of income remains insufficiently understood. Most existing studies rely on simple spatial correspondence, without a rigorous framework for interpreting how the distribution of light intensity reflects the underlying income structure. Finally, the reliability of validation remains uncertain because the survey-based income data used as reference are frequently underreported, and these reporting biases can substantially distort the benchmarks used to evaluate NTL-based estimates.

To address these gaps, we propose a comprehensive methodological framework. First, we develop a Distribution Matching-based Individual Income Inequality Estimation Model (DM-I3EM) that integrates NTL data with household income survey data. Second, we refine the Gini coefficient calculation through Weibull distribution fitting, establishing a robust quantitative correspondence between the NTL intensity distribution and the household income distribution. Third, we systematically compare four regression models, namely polynomial regression (PR), generalized additive models (GAM), random forest (RF), and Gaussian process regression (GPR), to identify the optimal matching strategy for each of several economically heterogeneous regions. This approach advances understanding of the spatiotemporal patterns of income inequality and provides technical and data support for formulating targeted, regionally coordinated development policies and achieving common prosperity.

The remainder of this paper is organized as follows: Section 2 presents an overview of the study area (China) and details the datasets and the model construction process. Section 3 presents model performance metrics and the results of income inequality estimation. Section 4 discusses the results and proposes directions for future research. Section 5 summarizes the conclusions.

2. Materials and Methods

2.1. Study Area

This study examines China as the research context (Figure 1). Located in East Asia along the western Pacific coast, the country covers around 9.6 million square kilometers and ranks as the world’s third-largest by land area. Spanning multiple geographical zones and economic belts, China exhibits pronounced regional disparities in development, making it a representative case for research on income inequality. Since the advent of reform and opening-up, China’s economy has grown rapidly, with its global GDP share increasing from 2% to 15%, establishing it as the world’s second-largest economy [31]. However, alongside economic development and rising income levels, income inequality has become increasingly prominent. According to official figures from the National Bureau of Statistics, China’s Gini coefficient has consistently exceeded the international alert threshold of 0.4 since 2000, reaching 0.467 in 2022. Such disparities not only hinder the transformation of economic structure but also threaten social stability and impede the realization of common prosperity, a core goal of China’s development agenda [32]. Against this backdrop, developing an income inequality estimation model tailored to China is critical for narrowing the wealth gap and advancing common prosperity.

2.2. Data Source

Two main datasets employed in this research are presented in Table 1: the NPP-VIIRS Annual Version 2 Nighttime Lights (VNL V2) product (2013–2022) and household micro-survey data from the China Household Income Project (CHIP). The former serves as the primary remote sensing input for the inequality estimation model, while the latter is used to provide reference data for model calibration and validation.

2.2.1. NPP-VIIRS Annual VNL V2 Data

The VNL V2 product is processed from observations of the Visible Infrared Imaging Radiometer Suite Day/Night Band (NPP-VIIRS DNB) aboard the Suomi National Polar-orbiting Partnership (NPP) satellite, a joint Earth observation mission of the National Oceanic and Atmospheric Administration (NOAA) and the National Aeronautics and Space Administration (NASA). The DNB operates within the 0.5–0.9 µm wavelength range, covering regions between 70° N and 65° S, and is specifically optimized for high-sensitivity measurement of nighttime surface brightness [33]. Compared with the older DMSP/OLS archives, NPP-VIIRS delivers significant quality improvements, including reduced radiometric saturation, higher spatial resolution (15 arcsec), and consistent radiometric calibration. These advances address critical limitations of earlier nighttime light products for socioeconomic analysis.

To ensure reliability for income inequality estimation, the annual VNL V2 data employed in this research are generated from monthly NPP-VIIRS composite data after filtering to remove pixels contaminated by sunlight (in polar regions), moonlight, cloud cover, and temporary light sources. This workflow yields noise-reduced annual composites that accurately reflect stable nighttime light emissions from human economic activities. While the VNL V2 product covers the period from 2012 to 2022, the 2012 dataset is excluded because its annual coverage is incomplete, providing data only from April onward. Thus, this study uses VNL V2 data from 2013 to 2022, which aligns temporally with the CHIP survey data to ensure consistency for model development.

2.2.2. CHIP Micro-Survey Data

The CHIP is a series of comprehensive surveys on household income and expenditure conducted by the Chinese Academy of Income Distribution, with six rounds implemented between 1995 and 2018. Each round updates and expands the sample size to enhance representativeness. The surveys cover urban, rural, and migrant worker populations, capturing income dynamics amid China’s economic and social transformation. Samples span multiple provinces, encompassing diverse geographical zones and economic development levels, ensuring coverage of regional heterogeneity critical for county-level inequality estimation. Data collection relies on structured questionnaires, which gather key socioeconomic metrics, including household demographics, income sources, expenditure patterns, education and health status, and housing conditions [34,35]. As a widely cited dataset in academic research, policy analysis, and international comparisons, CHIP provides a robust ground truth basis for validating income inequality estimates.

Consistent with the temporal scope of the VNL V2 data (2013–2022), this study uses CHIP survey samples collected after 2012, specifically the 2013 and 2018 survey rounds, which include 35,136 households across 531 counties and districts nationwide. These samples are used to calibrate the proposed estimation model and to assess the accuracy of the county-level individual income inequality estimates.

2.3. Methodology

The fundamental framework for estimating individual income inequality is illustrated in Figure 2. It comprises three key components: data preprocessing, the development of the DM-I3EM (including the calculation of the Weibull distribution-based Gini coefficient and establishing the relationship between NTL-based and income-based Gini coefficients), and the evaluation and application of the DM-I3EM. The specific methodological details are as follows.

2.3.1. Data Preprocessing

As illustrated in Figure 3, both the individual income distribution (Figure 3a) and the nighttime light intensity distribution within a region (Figure 3b) are heavy-tailed. The light intensity distribution is even more strongly right-skewed, with probability density concentrated near zero. To estimate income inequality accurately, the probability distribution of pixel light intensity must be matched to that of individual income, and the shape of the fitted distribution is sensitive to the dynamic range of light intensity values. Therefore, we first exclude pixels with values below 1 to reduce the concentration of near-zero values and then apply a natural logarithmic transformation (ln) to mitigate the skewness of the distribution (Figure 3c). For the income survey data, we clean the CHIP dataset and correct outliers to obtain valid annual household income data for the 2013 and 2018 survey samples.
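A minimal Python sketch of the NTL preprocessing step, assuming the pixel values of one county are already available as a flat list (raster reading and clipping to county boundaries are omitted). The helper name `preprocess_ntl` is hypothetical, and keeping only values strictly greater than 1 (rather than greater than or equal to 1) is our assumption: it keeps every log value positive, which the Weibull likelihood used in Section 2.3.2 requires.

```python
import math

def preprocess_ntl(radiance, threshold=1.0):
    """Drop missing or dim pixels, then apply the natural log.

    `radiance` holds the VNL V2 values of the pixels in one county
    (units assumed to be nW/cm^2/sr). Keeping only values strictly above
    `threshold` is an assumption that guarantees ln(v) > 0.
    """
    kept = [v for v in radiance
            if v is not None and not math.isnan(v) and v > threshold]
    return [math.log(v) for v in kept]

pixels = [0.2, 0.9, 1.0, 2.718281828459045, 10.0, float("nan"), None]
log_ntl = preprocess_ntl(pixels)  # two values survive: ln(e) and ln(10)
```

The filter runs before the transform so that the logarithm is never applied to zero, negative, or missing values.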

2.3.2. Development of the DM-I3EM: Step 1—Calculation of the Gini Coefficient Based on the Weibull Distribution

This section focuses on establishing a quantitative link between NTL distribution and income distribution by fitting the Weibull distribution, addressing the key challenge of distributional mismatch between NTL data and income data at the spatial scale of counties. The Gini coefficient is used as the target indicator for quantifying income inequality due to its widespread recognition and applicability in cross-regional and fine-scale assessments [36].

Because comprehensive income data, and thus a true Lorenz curve, are difficult to obtain, three mainstream methods are used to calculate the Gini coefficient. The group-based method calculates the Gini coefficient from income group data in provincial statistical yearbooks. The sample generation method first estimates the income distribution function from yearbook data and then generates micro-sample data for the Gini calculation. The distribution function method fits the income distribution function and Lorenz curve directly to existing household micro-survey samples to obtain the Gini coefficient. Compared with the group-based and sample generation methods, the distribution function method avoids the need to estimate population counts for unequally sized income groups, reduces the risk of unreliable distribution fits caused by concentrated income data, and reduces biases in Gini coefficient calculations arising from limitations in survey sample selection [37]. Crucially, it aligns with our core goal of matching NTL and income distributions, as it connects the two datasets directly through distribution fitting, unlike grouped statistics or synthetic samples, which weaken this linkage. Therefore, we adopt the distribution function method to establish the matching relationship between light distribution and income distribution.

Previous studies have shown that commonly used income distribution functions, including the Weibull, Fisk, Burr, Inverse-Gamma, Exponential-normal, and Lomax distributions, are right-skewed and heavy-tailed. To identify the most suitable distribution for income data, we compare their fitting performance using the Kolmogorov–Smirnov (KS) test, log-likelihood, the Akaike Information Criterion (AIC), and the Normalized Root Mean Square Error (NRMSE); NRMSE is preferred over the Root Mean Square Error (RMSE) because it is not dominated by the large dynamic range of income values. As shown in Table 2, the Weibull and Fisk distributions both exhibit superior fitting performance, with higher KS p-values and lower AIC and NRMSE scores, indicating a stronger capacity to capture income distribution characteristics across diverse regions.
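The goodness-of-fit criteria named above can be illustrated with a short, standard-library Python sketch. It computes the one-sample KS statistic, an asymptotic KS p-value (with Stephens' finite-sample correction), the log-likelihood, and AIC = 2k − 2 ln L for two candidate distributions on a synthetic Weibull sample. This is not the authors' code: the sample is simulated, the Weibull candidate's parameters are set directly rather than estimated (so its p-value is only illustrative), all function names are hypothetical, and NRMSE is omitted because its normalization is not specified in the text.

```python
import math
import random

def ks_statistic(sample, cdf):
    """One-sample KS statistic: largest gap between the empirical and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def kolmogorov_sf(lam):
    """Survival function P(K > lam) of the limiting Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # For small lam this theta-function series converges much faster.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam * lam))
            for k in range(1, 20))
        return 1.0 - cdf
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 20))

def ks_pvalue(d, n):
    """Approximate p-value with Stephens' correction (conservative if parameters were fitted)."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    return min(1.0, max(0.0, kolmogorov_sf(lam)))

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def weibull_cdf(a, b):
    return lambda x: 1.0 - math.exp(-((x / a) ** b))

def weibull_loglik(xs, a, b):
    return sum(math.log(b / a) + (b - 1) * math.log(x / a) - (x / a) ** b for x in xs)

# Synthetic sample from a Weibull with scale a = 2 and shape b = 3.
rng = random.Random(42)
sample = [rng.weibullvariate(2.0, 3.0) for _ in range(2000)]
mean = sum(sample) / len(sample)

# name -> (cdf, log-likelihood, number of parameters that would be estimated)
candidates = {
    "weibull": (weibull_cdf(2.0, 3.0), weibull_loglik(sample, 2.0, 3.0), 2),
    # Exponential = Weibull with b = 1; its ML scale estimate is the sample mean.
    "exponential": (weibull_cdf(mean, 1.0), weibull_loglik(sample, mean, 1.0), 1),
}
results = {}
for name, (cdf, ll, k) in candidates.items():
    d = ks_statistic(sample, cdf)
    results[name] = {"ks_d": d, "ks_p": ks_pvalue(d, len(sample)),
                     "loglik": ll, "aic": aic(ll, k)}
```

AIC penalizes each additional estimated parameter, so a more flexible family (such as Burr) must raise the log-likelihood by more than one unit per extra parameter to be preferred.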

Given the superior performance of the Weibull and Fisk distributions in modeling income (Table 2), we validate their applicability to NTL data and examine how spatial resampling of NTL influences fitting accuracy (Figure 4). Both distributions are used to fit the probability density functions of pixel light intensity at six resampled scales (500 m, 1 km, 2 km, 3 km, 4 km, and 5 km). As shown in Figure 4, the Weibull distribution outperforms the Fisk distribution at most scales, and some counties fail the KS test at 500 m and 1 km, with p-values below 0.05 indicating that the null hypothesis (that the pixel values follow the fitted distribution) should be rejected. In general, fitting accuracy increases as the resampling scale becomes coarser.

However, overly coarse resampling is suboptimal, because excessive aggregation dilutes information, producing discontinuous probability histograms and mismatches between fitted and actual distributions. Figure 5 shows that the histograms become increasingly discontinuous from 2 km to 5 km, with the Weibull curves deviating further from the observed densities. Resampling NTL to a 2 km scale produces a fitted distribution that better matches the original. In addition, a 2 km spatial resolution corresponds to a pixel area of 4 km², which is consistent with official figures from the National Bureau of Statistics (https://www.stats.gov.cn/, accessed on 17 December 2024), according to which the mean built-up area of designated towns in China is 4.03 km². At this scale, the nighttime light value of each pixel reflects the general economic conditions of a township-level built-up area, and the fitting performance is relatively better, with the fitted distribution agreeing more closely with the theoretical income distribution. The above analysis shows that the Weibull distribution achieves the best fit with 2 km spatial resampling. Thus, we resample NTL pixels to 2 km and derive the county-level Gini coefficient from the probability density function of the Weibull distribution.
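The aggregation to 2 km can be sketched as a block mean over a finer grid. The sketch assumes the NTL raster has already been reprojected to an equal-area grid with 500 m cells, so that a factor of 4 yields 2 km pixels. The text does not state the resampling method, so averaging (while ignoring missing cells) is an assumption, as are the function and variable names; whether aggregation happens before or after the log transform is also not stated, and the sketch works either way.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D grid (list of rows) into factor x factor blocks by averaging.

    Missing cells (None) are ignored; a block with no valid cells becomes None.
    Partial blocks at the right/bottom edges are averaged over the cells present.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            vals = [grid[r][c]
                    for r in range(r0, min(r0 + factor, rows))
                    for c in range(c0, min(c0 + factor, cols))
                    if grid[r][c] is not None]
            out_row.append(sum(vals) / len(vals) if vals else None)
        out.append(out_row)
    return out

FINE_CELL_M = 500      # assumed cell size of the reprojected input grid
TARGET_CELL_M = 2000   # 2 km target resolution used in the text
factor = TARGET_CELL_M // FINE_CELL_M          # 4 fine cells per side
pixel_area_km2 = (TARGET_CELL_M / 1000) ** 2   # 4.0 km^2 per 2 km pixel

fine = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]  # values 1..16
coarse = block_mean(fine, 2)  # [[3.5, 5.5], [11.5, 13.5]]
```

Averaging preserves the mean radiance of each block, which is one reasonable reading of "resampling"; summing or nearest-neighbour sampling would give different histograms.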

The empirical results reported above are consistent with those documented in previous studies. First formalized by Weibull [38] as a versatile statistical tool, the Weibull distribution was later introduced to income analysis by D’Addario [39], who derived the Gini coefficient within this framework. Since then, it has been widely adopted to model income data across various countries, and the Gini coefficient derived from this distribution has performed robustly in measuring income inequality [40,41]. On this basis, this study follows existing derivation methods and calculates the Gini coefficient from the Weibull distribution to estimate income inequality.

The Weibull probability density function (PDF) for a random variable x ≥ 0 is

f(x) = \frac{b}{a} \left(\frac{x}{a}\right)^{b - 1} \exp\left[-\left(\frac{x}{a}\right)^{b}\right]

(1)

where f(x) denotes the Weibull probability density function, with a as the scale parameter and b as the shape parameter. Both parameters can be estimated using the maximum likelihood method.
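As a hedged illustration of the maximum likelihood step, the following standard-library Python sketch estimates a and b for a two-parameter Weibull distribution. It uses the standard profile-likelihood approach: the shape b solves Σ x^b ln x / Σ x^b − 1/b − (1/n) Σ ln x = 0, whose left-hand side increases monotonically in b and can therefore be solved by bisection, after which the scale follows in closed form as a = ((1/n) Σ x^b)^(1/b). The function name and solver settings are hypothetical; a library routine (for example, SciPy's `weibull_min.fit` with the location fixed at 0) would serve equally well.

```python
import math
import random

def fit_weibull_mle(xs, tol=1e-10, max_iter=200):
    """Two-parameter Weibull MLE (scale a, shape b) for strictly positive data."""
    xs = [x for x in xs if x > 0]
    if len(xs) < 2 or max(xs) == min(xs):
        raise ValueError("need at least two distinct positive values")
    n = len(xs)
    x_max = max(xs)
    # Work with z = x / x_max <= 1 so that z**b cannot overflow for large b.
    log_z = [math.log(x / x_max) for x in xs]
    mean_log_z = sum(log_z) / n

    def score(b):
        # Profile score for the shape: weighted mean of ln z - 1/b - mean of ln z.
        # It is strictly increasing in b, so its root is unique.
        w = [math.exp(b * lz) for lz in log_z]  # z**b
        return sum(wi * lz for wi, lz in zip(w, log_z)) / sum(w) - 1.0 / b - mean_log_z

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:        # expand the bracket until the root is enclosed
        lo, hi = hi, 2.0 * hi
        if hi > 1e6:
            raise RuntimeError("could not bracket the shape parameter")
    for _ in range(max_iter):   # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    b = 0.5 * (lo + hi)
    # Closed-form scale given the shape: a = (mean of x**b) ** (1/b)
    a = x_max * (sum(math.exp(b * lz) for lz in log_z) / n) ** (1.0 / b)
    return a, b

rng = random.Random(1)
sample = [rng.weibullvariate(2.0, 3.0) for _ in range(5000)]  # true a = 2, b = 3
a_hat, b_hat = fit_weibull_mle(sample)
```

Bisection is slower than Newton's method but cannot diverge, which matters when the routine is run unattended over thousands of counties.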

By integrating the probability density function over the variable x, the cumulative distribution function is obtained:

F(x) = \int_{0}^{x} b a^{-b} t^{b - 1} \exp\left[-\left(\frac{t}{a}\right)^{b}\right] dt = 1 - \exp\left[-\left(\frac{x}{a}\right)^{b}\right]

(2)

where F(x) represents the cumulative distribution function, i.e., the probability that the random variable takes a value less than or equal to x.

Using the geometric interpretation of the Gini coefficient (twice the area between the line of equality and the Lorenz curve), the corresponding formula for the Weibull distribution is derived as follows:

L(p) = \frac{1}{\mu} \int_{0}^{F^{-1}(p)} t f(t) \, dt

(3)

G = 1 - 2 \int_{0}^{1} L(p) \, dp = 1 - 2^{-\frac{1}{b}}

(4)

where p is the cumulative probability ranging from 0 to 1, L(p) is the Lorenz function giving the share of the total held by the bottom fraction p of the population, F^{-1}(p) is the quantile function (the inverse of the cumulative distribution function), giving the variable level corresponding to cumulative probability p, μ is the mathematical expectation of the variable (for the Weibull distribution, μ = aΓ(1 + 1/b)), and G denotes the Gini coefficient.

Notably, the Gini coefficient depends solely on the shape parameter b of the Weibull distribution and is independent of the scale parameter a. The relationship between NTL-based spatial inequality and sample income-based individual inequality can therefore be expressed as a relationship between the shape parameters of the Weibull distributions fitted to NTL pixel light intensity and to sample income.
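The closed form in Eq. (4) can be checked numerically. The standard-library Python sketch below evaluates G = 1 − 2^(−1/b) and compares it with the Gini coefficient of a dense, deterministic set of Weibull quantiles (obtained by inverting Eq. (2)), computed with the mean-absolute-difference form of the Gini coefficient. It also shows that the result does not depend on the scale a. The function names are illustrative only.

```python
import math

def gini_weibull(b):
    """Eq. (4): Gini coefficient of a Weibull distribution with shape b."""
    return 1.0 - 2.0 ** (-1.0 / b)

def weibull_quantile(p, a, b):
    """Quantile function F^{-1}(p), obtained by inverting Eq. (2)."""
    return a * (-math.log(1.0 - p)) ** (1.0 / b)

def empirical_gini(values):
    """Gini coefficient via the mean-absolute-difference identity on sorted data."""
    xs = sorted(values)
    n = len(xs)
    # With 0-based rank i, the weight (2i - n + 1) equals (2r - n - 1) for 1-based rank r.
    return sum((2 * i - n + 1) * x for i, x in enumerate(xs)) / (n * sum(xs))

def quantile_grid_gini(a, b, n=100_000):
    """Gini of n Weibull quantiles at mid-point probabilities; approaches Eq. (4) as n grows."""
    return empirical_gini(weibull_quantile((k + 0.5) / n, a, b) for k in range(n))

for b in (0.8, 1.5, 3.0):
    print(f"b={b}: closed form {gini_weibull(b):.4f}, numerical {quantile_grid_gini(1.0, b):.4f}")
```

Larger shape parameters give smaller Gini values (b = 1, the exponential case, gives exactly 0.5), so a county whose fitted distribution is more peaked is assigned lower inequality.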

2.3.3. Development of the DM-I3EM: Step 2—Establishing the Relationship Between NTL-Based and Income-Based Gini Coefficients

This section aims to improve the accuracy of the relationship between NTL-based and income-based Gini coefficients through a systematic comparison of regression models. The preliminary relationship established through the Weibull distribution (Section 2.3.2) does not specify the quantitative link between individual income Gini coefficients and NTL-based spatial Gini coefficients. Therefore, a model that can accurately characterize this relationship is needed. In this study, we compare four types of regression models, namely PR, GAM, RF, and GPR, each with distinct advantages for capturing complex relationships.

PR is an extension of linear regression that enables linear fitting on the polynomial basis of input variables, thereby capturing complex nonlinear relationships [42]. GAM is a flexible statistical model developed to represent the nonlinear relationships between predictor variables and response variables, and it can accommodate response variables with different distributions (e.g., binomial, Poisson, or normal distributions) [43]. RF is an ensemble learning algorithm that enhances overall prediction accuracy and stability by integrating predictions from multiple decision trees. It has the advantages of high estimation precision and no need for prior assumptions about probability distribution [44,45]. GPR is a core model in probabilistic supervised machine learning that uses prior knowledge (i.e., kernels) for prediction and provides uncertainty metrics for the predicted results. It assumes that the data are drawn from a multivariate Gaussian process, thereby facilitating the prediction of outputs for new data points [46].

Socioeconomic development levels vary significantly across China’s regions, which may lead to heterogeneous relationships between NTL-based spatial inequality and survey-based income inequality. To reduce errors caused by this spatial heterogeneity, we stratify China into four economic regions (the Northeast region, the East region, the Central region, and the West region). The PR, GAM, RF, and GPR models are trained for each region with sample data from 2013 and 2018.
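A minimal sketch of region-specific model fitting, using polynomial regression (the simplest of the four candidates) in standard-library Python. The polynomial degree, the region keys, and the sample pairs are hypothetical, since the text does not report them; GAM, RF, and GPR would typically come from a statistics or machine-learning library and are not reproduced here. Each region receives its own fitted function, which plays the role of f_region in Eq. (7).

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree] via the normal equations."""
    m = degree + 1
    # Normal matrix A[j][k] = sum x^(j+k); right-hand side r[j] = sum y * x^j
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    r = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, m):
            f = A[i][col] / A[col][col]
            for k in range(col, m):
                A[i][k] -= f * A[col][k]
            r[i] -= f * r[col]
    # Back substitution
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (r[i] - sum(A[i][k] * coef[k] for k in range(i + 1, m))) / A[i][i]
    return coef

def make_poly(coef):
    return lambda x: sum(c * x ** j for j, c in enumerate(coef))

def fit_region_models(samples, degree=2):
    """samples: {region: [(g_ntl, g_income), ...]} -> {region: fitted function}"""
    models = {}
    for region, pairs in samples.items():
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        models[region] = make_poly(polyfit(xs, ys, degree))
    return models

xs = [0.1 + 0.05 * i for i in range(11)]  # hypothetical NTL Gini values
samples = {  # hypothetical (NTL Gini, income Gini) pairs per region
    "East": [(x, 0.05 + 0.90 * x) for x in xs],
    "West": [(x, 0.10 + 0.50 * x + 0.60 * x * x) for x in xs],
}
f_region = fit_region_models(samples, degree=2)
```

Fitting each region separately lets the slope and curvature differ between, for example, the East and the West, which is the purpose of the stratification described above.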

2.3.4. Evaluation and Application of DM-I3EM

After completing the two core development steps of the DM-I3EM, we conduct a performance comparison among the four regression models to finalize the DM-I3EM using two evaluation metrics, the RMSE and the coefficient of determination (R²).

The finalized DM-I3EM takes NTL imagery resampled to a 2 km scale for target counties as input and generates the individual income inequality levels of these counties. Extrapolating the optimized model to the national scale enables the estimation of individual income inequality in regions lacking survey samples, effectively overcoming the data scarcity bottleneck in traditional survey-based measures of inequality. The key formulas for evaluation and model application are as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(5)

R M S E = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{n}}

(6)

N L I I_{i} = f_{region} (G_{{NTL}_{i}})

(7)

where y_i and ŷ_i represent the reference and estimated values of income inequality for county i, respectively, ȳ denotes the sample mean of the reference values y_i, n represents the dataset size, G_{NTL_i} is the NTL-based Gini coefficient for county i, NLII_i is the Nighttime Light Inequality Index (NLII) estimated using DM-I3EM, and f_region is the mapping function between the light Gini coefficient and NLII fitted for each of the four economic regions.
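Eqs. (5) to (7) translate directly into code. The standard-library Python sketch below computes R² and RMSE for a set of reference and estimated Gini values and applies a per-region mapping to NTL Gini coefficients. The linear mappings in the example are placeholders, not the fitted DM-I3EM models, and the text does not state whether R² is computed in-sample or by cross-validation.

```python
import math

def r_squared(y_ref, y_est):
    """Eq. (5): 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_ref) / len(y_ref)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_ref, y_est))
    ss_tot = sum((y - mean) ** 2 for y in y_ref)
    return 1.0 - ss_res / ss_tot

def rmse(y_ref, y_est):
    """Eq. (6): root mean square error."""
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_ref, y_est)) / len(y_ref))

def estimate_nlii(counties, f_region):
    """Eq. (7): NLII_i = f_region(G_NTL_i), using each county's own regional mapping.

    counties: {county_id: (region, G_NTL)}; f_region: {region: callable}
    """
    return {cid: f_region[region](g_ntl) for cid, (region, g_ntl) in counties.items()}

# Placeholder mappings for illustration only; not the fitted DM-I3EM functions.
f_region = {"East": lambda g: 0.10 + 0.80 * g,
            "Central": lambda g: 0.12 + 0.75 * g}
nlii = estimate_nlii({"county_a": ("East", 0.5), "county_b": ("Central", 0.4)}, f_region)
```

Because the Gini coefficient lies between 0 and 1, RMSE values stay small for every model, which is why R² is the more discriminating of the two metrics.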

2.3.5. Spatial Autocorrelation Analysis

To analyze the spatial autocorrelation of income inequality distribution across China, we employ Global Moran’s I [47,48], which is calculated as

I = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j} (x_{i} - \bar{x}) (x_{j} - \bar{x})}{s^{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i j}}

(8)

where n represents the total number of counties, x_i and x_j denote the NLII of county i and county j, x̄ is the mean NLII, and s² is the sample variance, s² = (1/n) Σ_{i=1}^{n} (x_i − x̄)². w_ij is the element of the spatial weight matrix characterizing spatial adjacency: if county i and county j share a common boundary or a vertex (queen contiguity), then w_ij = 1; otherwise, w_ij = 0. The value of I generally lies between −1 and 1, where positive values signify spatial agglomeration, negative values indicate dispersion, and values near zero indicate a random spatial distribution.
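Global Moran's I as defined in Eq. (8), with binary queen-contiguity weights, can be sketched in standard-library Python as follows. The adjacency is passed as a dictionary of neighbour sets; in practice it would be derived from county polygons, whereas the helper `queen_grid` here builds it for a regular grid purely for illustration. All names are hypothetical, and significance testing (for example, by permutation) is not shown.

```python
def global_morans_i(values, neighbors):
    """Eq. (8) with binary weights: w_ij = 1 if j is in neighbors[i], else 0.

    values:    {county_id: NLII}
    neighbors: {county_id: set of adjacent county_ids} (symmetric)
    """
    ids = list(values)
    n = len(ids)
    mean = sum(values.values()) / n
    dev = {i: values[i] - mean for i in ids}
    s2 = sum(d * d for d in dev.values()) / n          # s^2 with 1/n, as in the text
    if s2 == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    cross = sum(dev[i] * dev[j] for i in ids for j in neighbors[i])
    w_total = sum(len(neighbors[i]) for i in ids)      # sum of all w_ij
    return cross / (s2 * w_total)

def queen_grid(rows, cols):
    """Queen contiguity on a regular grid: cells sharing an edge or a corner are neighbours."""
    nb = {}
    for r in range(rows):
        for c in range(cols):
            nb[(r, c)] = {(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)
                          and 0 <= r + dr < rows and 0 <= c + dc < cols}
    return nb

grid_nb = queen_grid(4, 4)
halves = {(r, c): 1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)}  # two blocks
stripes = {(r, c): float(c % 2) for r in range(4) for c in range(4)}         # alternating columns
i_halves = global_morans_i(halves, grid_nb)    # 11/21, about 0.524: like values cluster
i_stripes = global_morans_i(stripes, grid_nb)  # -9/21, about -0.429: neighbours tend to differ
```

With binary weights, I is a correlation-like ratio of neighbour cross-products to overall variance, so the same pattern yields different values under rook and queen contiguity.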

Furthermore, to identify local spatial association patterns within the study area, we employ Local Moran’s I [48,49]. The calculation is defined as

I_{i} = \frac{\sum_{j = 1}^{n} w_{i j} (x_{i} - \bar{x}) (x_{j} - \bar{x})}{s^{2}}

(9)

where I_i refers to the Local Moran’s I for county i. A positive I_i indicates spatial clustering of similar values, appearing as High-High (H-H) or Low-Low (L-L) clusters, whereas a negative I_i indicates clustering of dissimilar values, appearing as High-Low (H-L) or Low-High (L-H) outliers.
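Local Moran's I (Eq. (9)) and the quadrant labels can be sketched similarly in standard-library Python. Each county is labelled by whether its own value and its neighbours' summed deviation lie above or below the mean. LISA maps are usually restricted to statistically significant counties, typically via a permutation test; that step is omitted here, so every county receives a label. The five-county chain and its values are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Eq. (9) for every county, plus a quadrant label (no significance test).

    Returns {county_id: (I_i, label)} with label in H-H, L-L, H-L, L-H.
    Counties exactly at the mean (or with a zero lag) are labelled "L" on that side.
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {i: v - mean for i, v in values.items()}
    s2 = sum(d * d for d in dev.values()) / n
    out = {}
    for i, d_i in dev.items():
        lag = sum(dev[j] for j in neighbors[i])  # sum_j w_ij (x_j - mean) with binary weights
        local_i = d_i * lag / s2
        own = "H" if d_i > 0 else "L"            # county above/below the mean
        nbr = "H" if lag > 0 else "L"            # neighbours above/below the mean
        out[i] = (local_i, f"{own}-{nbr}")
    return out

# Hypothetical chain of five counties: A-B-C-D-E
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
nlii = {"A": 0.45, "B": 0.44, "C": 0.30, "D": 0.31, "E": 0.46}
lisa = local_morans_i(nlii, chain)
```

A positive I_i always pairs with an H-H or L-L label and a negative I_i with H-L or L-H, matching the interpretation given above.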

3. Results

3.1. Statistical Comparison of Models’ Performance

Following the evaluation methodology described in Section 2.3.4, we compare RMSE and R² of PR, RF, GPR, and GAM by analyzing the consistency between predicted and actual values.

All four regression models yield relatively low and comparable RMSE values. This is attributable to the bounded range of the Gini coefficient (0 to 1), which limits the magnitude of prediction errors. However, as shown in Figure 6, R² differs markedly among the models. GPR performs worst in prediction, likely because the standard GPR formulation assumes Gaussian-distributed noise, whereas the actual distributions of NTL intensity and sample income are skewed and deviate from this assumption, which affects model performance. RF achieves moderate performance but is less stable across regions, suggesting limited robustness in heterogeneous spatial contexts. In contrast, PR and GAM, which capture both linear and nonlinear relationships, outperform the more flexible nonlinear models RF and GPR. This suggests that the relationship between log-transformed NTL-based spatial inequality and sample income-based individual income inequality is close to linear.

Comparing PR and GAM across the four regions, both models perform best in the Eastern region, followed by the Northeast, with the Central and Western regions showing the weakest performance. One important reason is the spatial imbalance of household income survey samples: more observations are collected in the Eastern region than in the Western region, which prevents the model from adequately learning the relationship between income and nighttime light intensity in the West during training. The regional variation in model performance also partly reflects differences in the geographic distribution of economic activities, as captured by nighttime light intensity. Nighttime light in the Eastern region is more concentrated and stable, whereas in the Western region it is more spatially dispersed and weaker; as a result, nighttime light intensity in the West is less effective at representing income distribution, leading to lower model accuracy. Overall, the two models exhibit similar estimation accuracy in the Eastern and Central regions, but GAM slightly outperforms PR in the Northeast and Western regions. Therefore, we select GAM to construct DM-I3EM. This choice yields the best performance, with an R² of 0.76 in the Eastern region, a marked improvement over the correlation coefficients of about 0.5 typically reported in previous NTL-based income inequality studies [30].
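Because the earlier studies report a correlation coefficient R while DM-I3EM is evaluated with R² (Eq. (5)), putting both on one scale helps. Treating R² as a squared correlation, which holds exactly only for in-sample least-squares linear fits with an intercept, the two figures convert as follows:

```latex
R_{\text{DM-I3EM}} = \sqrt{R^{2}} = \sqrt{0.76} \approx 0.87,
\qquad
R^{2}_{\text{previous}} = (0.5)^{2} = 0.25
```

On either scale the gap is substantial, although the comparison is indicative only, since the studies differ in spatial units and reference data.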

3.2. Spatiotemporal Distribution of Income Inequality Based on DM-I3EM

DM-I3EM effectively addresses the spatiotemporal gaps inherent in traditional income surveys. By inputting 2013–2022 NTL data resampled to a 2 km resolution, the model generates a national county-level time series of income inequality measured by NLII. It allows reliable estimation for years lacking survey data and achieves full coverage across all counties, enabling detailed characterization of fine-grained spatiotemporal variations in individual income inequality.

Figure 7 shows the annual average NLII for China and its four major economic regions. Over time, all four major economic regions exhibit an overall upward trend in NLII from 2013 to 2022, indicating that a widening income gap has become a common feature of county-level development. The Western, Northeastern, and Central regions consistently maintain higher inequality than the national average, with the Western region showing the highest inequality and the Eastern region the lowest.

Figure 8 presents county-level NLII estimates and interval proportions, with darker red representing higher inequality and darker blue representing lower inequality. Spatially, socioeconomically advanced regions (e.g., Pearl River Delta, PRD; Yangtze River Delta, YRD; Beijing-Tianjin-Hebei, BTH) consistently display lower inequality. By comparison, many less developed western and northeastern counties exhibit substantially higher levels. However, several counties in Tibet show relatively low NLII values. This is likely because their economies have a narrow, homogeneous structure with concentrated income sources; in addition, small and spatially clustered populations limit income variation, reducing measured inequality. The temporal evolution from 2013 to 2022 reinforces this spatial pattern. Inequality is greatest in the Western region, with the Northeast, Central, and Eastern regions exhibiting progressively lower levels, which is consistent with the regional averages shown in Figure 7. As shown in Figure 8d, the number of counties with NLII values greater than 0.41 increases markedly during the decade and exceeds half of all counties by 2022. This reflects a broad intensification of inequality, particularly in the Northeast and in the north-central part of the country. In contrast, coastal counties south of the Yangtze River experience little change or slight decreases in inequality, indicating relatively stable socioeconomic conditions in these areas.

Notably, DM-I3EM’s NLII estimates have finer spatiotemporal granularity than traditional survey data, revealing inequality patterns that surveys cannot resolve. For example, in Jiangsu Province, low income inequality is concentrated in Suzhou and Wuxi in southern Jiangsu (NLII < 0.3), which are characterized by a relatively equitable income distribution. In contrast, most counties in central and northern Jiangsu exhibit greater inequality, and in some of them the gap widened further between 2013 and 2022. Such fine-scale variation would be difficult to capture with traditional survey data alone.

3.3. Spatiotemporal Evolution of Spatial Clustering of Income Inequality Based on DM-I3EM

Using the DM-I3EM-derived NLII estimates, which have fine spatiotemporal resolution, we examine how the spatial clustering of income inequality evolved across China.

Table 3 shows the global spatial autocorrelation results for county-level income inequality. Global Moran’s I rises from 0.17 in 2013 to 0.20 in 2022 (p < 0.001 in all years), suggesting that the spatial clustering of county-level income inequality became more pronounced over the study period and that high- and low-inequality regions have become more spatially segregated.
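Global Moran’s I compares each county’s deviation from the mean NLII with the deviations of its neighbors. The sketch below computes it in plain Python under stated assumptions: binary, symmetric contiguity weights supplied as a hypothetical `neighbors` dictionary, and a permutation-based pseudo z-score. The spatial weights and inference procedure behind Table 3 are not specified in this section, so this is an illustration of the statistic rather than a reproduction of the reported values.

```python
import random
import statistics
from typing import Dict, List, Sequence

# Hypothetical input format: neighbors[i] lists the indices of counties
# adjacent to county i (binary, symmetric contiguity weights assumed).
Neighbors = Dict[int, List[int]]


def global_morans_i(values: Sequence[float], neighbors: Neighbors) -> float:
    """Global Moran's I: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                          # deviations from the mean
    s0 = sum(len(neighbors.get(i, [])) for i in range(n))  # total weight for 0/1 weights
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors.get(i, []))
    return n * cross / (s0 * sum(d * d for d in z))


def permutation_z(values: Sequence[float], neighbors: Neighbors,
                  n_perm: int = 999, seed: int = 0) -> float:
    """Pseudo z-score of the observed I against spatially reshuffled values."""
    rng = random.Random(seed)
    observed = global_morans_i(values, neighbors)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                               # destroy any spatial arrangement
        sims.append(global_morans_i(shuffled, neighbors))
    return (observed - statistics.fmean(sims)) / statistics.stdev(sims)


# Toy example: four counties in a row (0-1-2-3).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = [1.0, 1.0, 0.0, 0.0]    # similar values sit next to each other
alternating = [1.0, 0.0, 1.0, 0.0]  # dissimilar values sit next to each other
print(global_morans_i(clustered, chain))    # positive: spatial clustering
print(global_morans_i(alternating, chain))  # negative: spatial dispersion
print(permutation_z(clustered, chain))
```

In this toy case the clustered arrangement gives I = 1/3 and the alternating one gives I = −1. Under no spatial autocorrelation the expected value is −1/(n − 1) = −1/3, so values above it indicate clustering. This matches the reading of the rising, significant I values in Table 3.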

As shown in Figure 9, income inequality exhibits distinct regional clustering patterns. High-High (H-H) clusters, where high-NLII counties are surrounded by other high-NLII counties, are mainly concentrated in the Western and Northeastern regions, whereas Low-Low (L-L) clusters are concentrated in the Central and Eastern coastal areas, notably in the PRD, YRD, and BTH city clusters. From 2013 to 2022, the proportion of H-H counties remained broadly stable nationwide, but their spatial extent expanded in the Northeast and West. In the Northeast, this expansion is likely driven by the decline of traditional industries, continued labor outflow, and uneven resource allocation, which have intensified and spatially concentrated income disparities across counties. In the West, several counties in Tibet shifted from L-L to H-H, possibly because localized resource exploitation, tourism, and other region-specific economic activities diversified income sources and created high-income groups, widening local disparities. By contrast, the proportion of L-L counties rose slightly from 14.6% to 15.5%, with L-L clusters expanding marginally in coastal city clusters such as the PRD. This trend may reflect the effectiveness of national regional development policies in reducing interregional income disparities.
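The H-H, L-L, H-L, and L-H labels come from local indicators of spatial association (Anselin’s LISA). Below is a minimal sketch of the local Moran’s I and its quadrant labels. It assumes row-standardized contiguity weights and uses a hypothetical `neighbors` dictionary on toy data. It also omits the conditional permutation test normally used to retain only statistically significant clusters: it labels every county, whereas a map such as Figure 9 would typically show only significant ones.

```python
from typing import Dict, List, Sequence, Tuple


def local_morans_i(values: Sequence[float],
                   neighbors: Dict[int, List[int]]) -> Tuple[List[float], List[str]]:
    """Local Moran's I_i = (z_i / m2) * lag_i with row-standardized weights,
    plus the Moran scatterplot quadrant of each county."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n                  # normalizer: mean squared deviation
    local_i: List[float] = []
    labels: List[str] = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average deviation of the neighbors (row-standardized weights)
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i.append(z[i] / m2 * lag)
        if z[i] > 0:
            labels.append("H-H" if lag > 0 else "H-L")
        elif z[i] < 0:
            labels.append("L-L" if lag < 0 else "L-H")
        else:
            labels.append("mean")                    # county sits exactly at the mean
    return local_i, labels


# Toy example: six counties in a row, a high-NLII block next to a low-NLII block.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
nlii = [0.5, 0.5, 0.5, 0.25, 0.25, 0.25]
local_i, labels = local_morans_i(nlii, chain)
print(labels)  # ['H-H', 'H-H', 'H-L', 'L-H', 'L-L', 'L-L']
```

Counties 2 and 3 sit on the boundary between the two blocks. Their neighbors average out to zero, so their local I is zero, and a significance test would not flag them as clusters.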

4. Discussion

4.1. Sensitivity of Auxiliary Variables to Income Inequality

This study uses the NTL-derived Gini coefficient as the sole input for estimating individual income inequality. To assess whether additional remote sensing indicators improve estimation performance, we test twelve auxiliary variables across three domains. The first domain is NTL characteristics: total lit area (NTL > 0), cumulative NTL intensity, mean NTL intensity, and NTL variability. The second is vegetation dynamics: vegetated area (NDVI > 0.2), cumulative NDVI, mean NDVI, and NDVI variability. The third is built environment features: urbanized area (NDBBI > 0.2), cumulative NDBBI, mean NDBBI, and NDBBI variability. Variable selection follows established socioeconomic mechanisms and prior research in applied geography. NTL metrics proxy regional economic development, as luminosity patterns correlate with industrial and commercial activity. NDVI reflects ecological equity through vegetation coverage, which is often linked to the distribution of urban green space and to environmental justice. NDBBI captures urbanization intensity, reflecting infrastructure patterns and their socioeconomic impacts. To mitigate multicollinearity, we retain from each domain only the indicator most strongly correlated with the income Gini coefficient. This yields mean NTL intensity, cumulative NDVI, and cumulative NDBBI as the three predictors, which we combine with the NTL-derived Gini coefficient for model training and testing.
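The per-domain screening can be sketched as follows. This assumes that "most correlated" means the largest absolute Pearson correlation with the survey-based income Gini coefficient, evaluated separately within each domain. The variable names and toy values are hypothetical, and this section does not give the authors’ exact criterion.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_one_per_domain(domains: Dict[str, Dict[str, Sequence[float]]],
                          target: Sequence[float]) -> Dict[str, str]:
    """For each domain, keep the candidate with the largest |r| against the target."""
    chosen = {}
    for domain, candidates in domains.items():
        chosen[domain] = max(candidates,
                             key=lambda name: abs(pearson_r(candidates[name], target)))
    return chosen


# Hypothetical county-level toy data (five counties).
income_gini = [0.30, 0.32, 0.34, 0.36, 0.38]
domains = {
    "NTL": {"mean_ntl": [2.0, 4.0, 6.0, 8.0, 10.0],
            "ntl_variability": [5.0, 3.0, 4.0, 1.0, 2.0]},
    "NDVI": {"cumulative_ndvi": [5.0, 4.0, 3.0, 2.0, 1.0],
             "mean_ndvi": [1.0, 3.0, 2.0, 5.0, 4.0]},
}
print(select_one_per_domain(domains, income_gini))
# {'NTL': 'mean_ntl', 'NDVI': 'cumulative_ndvi'}
```

Using the absolute value keeps strongly negative predictors, such as `cumulative_ndvi` in the toy data, in contention. Keeping a single indicator per domain limits redundancy within each domain, but it does not by itself rule out correlation between domains.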

Figure 10 shows that the auxiliary variables provide limited improvement. Compared with the baseline model (without additional remote sensing metrics), all models except GPR show minimal or no accuracy gains in most regions. The accuracy of GAM even declines slightly in the Eastern and Western regions, suggesting that the added variables may introduce noise or increase model complexity without adding predictive value. Although GPR improves marginally with the auxiliary variables, its overall accuracy remains low, highlighting its limitations for this task.

These findings underscore the parsimony and robustness of the proposed model. Consistent with Occam’s razor, the NTL-derived Gini coefficient proves to be a sufficient and efficient proxy for capturing the primary variance in regional income inequality. By achieving high accuracy without relying on complex multi-source datasets, DM-I3EM reduces the risk of overfitting and lowers the data barrier for application. This simplicity should enhance the model’s transferability to other data-scarce developing regions, where consistent, high-quality auxiliary data (e.g., vegetation or built-up indices) may be difficult to obtain.

4.2. Transferability of DM-I3EM to Other Inequality Indices

Compared to the commonly used formula-based method for calculating the Gini coefficient, the distribution matching method derives the Gini coefficient indirectly from the shape parameters of the probability density function, based on the definition of the Lorenz curve. This methodological advantage allows for flexible extension to other inequality indices that share a theoretical foundation in the Lorenz curve framework, addressing a key limitation of existing NTL-based models, which are often tailored to a single indicator.
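For a Weibull distribution with shape parameter b, the Lorenz-curve definition yields a closed-form Gini coefficient, G = 1 − 2^(−1/b), so once the shape is fitted the Gini follows directly. Below is a minimal standard-library sketch of that step. It assumes a two-parameter Weibull (location fixed at zero) fitted by maximum likelihood to strictly positive values, for example a county’s log-transformed NTL pixel values. The paper’s actual fitting routine and NTL preprocessing are not reproduced here. The helpers `fit_weibull_mle` and `sample_gini` are illustrative.

```python
import math
import random
from typing import Sequence, Tuple


def fit_weibull_mle(x: Sequence[float], tol: float = 1e-8) -> Tuple[float, float]:
    """Two-parameter Weibull maximum-likelihood fit (location fixed at 0).

    Returns (shape b, scale). The shape solves the profile-likelihood equation
    sum(y^b ln y) / sum(y^b) - 1/b - mean(ln y) = 0, found here by bisection.
    """
    if any(v <= 0 for v in x):
        raise ValueError("Weibull fitting requires strictly positive values")
    top = max(x)
    if top == min(x):
        raise ValueError("need at least two distinct values")
    y = [v / top for v in x]        # rescale to (0, 1]; the shape MLE is scale-invariant
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(b: float) -> float:
        w = [v ** b for v in y]     # cannot overflow because y <= 1
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    lo, hi = 1e-3, 10.0
    while score(hi) < 0:            # score increases with b; widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    scale = top * (sum(v ** b for v in y) / len(y)) ** (1.0 / b)
    return b, scale


def weibull_gini(b: float) -> float:
    """Gini coefficient implied by a Weibull distribution with shape b: 1 - 2^(-1/b)."""
    return 1.0 - 2.0 ** (-1.0 / b)


def sample_gini(x: Sequence[float]) -> float:
    """Empirical Gini coefficient from sorted data, for comparison."""
    s = sorted(x)
    n = len(s)
    weighted = sum((i + 1) * v for i, v in enumerate(s))
    return 2.0 * weighted / (n * sum(s)) - (n + 1) / n


rng = random.Random(42)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(20_000)]  # scale 1, shape 2
b, scale = fit_weibull_mle(sample)
print(round(b, 2), round(scale, 2))
print(round(weibull_gini(b), 3), round(sample_gini(sample), 3))
```

Because the Gini depends only on b, the scale parameter (and hence the overall brightness level) does not affect the distribution-based estimate. The formula-based and distribution-based Gini values should agree closely on the synthetic sample.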

These Lorenz curve-derived indices complement rather than replace one another, as they capture distinct dimensions of inequality that a single metric (e.g., the Gini coefficient) cannot fully encapsulate. For instance, the Gini coefficient emphasizes overall disparity across the entire income distribution, whereas the Bonferroni index (B index) focuses on the relative disadvantage of low-income groups by comparing the mean income of the lower portion of the distribution with the overall mean. For incomes arranged in ascending order, the mean income of the group with income not exceeding x (the interval mean) is calculated as follows [50,51]:

m(x) = \frac{1}{F(x)} \int_{x_{0}}^{x} t \, dF(t), \quad x \in [x_{0}, x_{m}]

(10)

where $m(x)$ denotes the mean income of the group with income $\leq x$, $F(x)$ is the cumulative distribution function (the proportion of the group with income $\leq x$), $t$ is the integration variable for income, $x_{0}$ is the lower bound of income, and $x_{m}$ is the upper bound of income.

For a given income $x$, the relative deviation of the mean income of individuals with income at or below $x$ from the overall mean is expressed as:

ϕ(x) = \frac{μ - m(x)}{μ}, \quad x \in [x_{0}, x_{m}]

(11)

where $ϕ(x)$ is the relative difference function, $μ = \int_{x_{0}}^{x_{m}} t \, dF(t)$ is the overall mean income (the mathematical expectation of $x$), and $m(x)$ is the interval mean defined in Formula (10). Because $m(x)$ is non-decreasing in $x$, never exceeds $μ$, and is at least $x_{0}$, the function $ϕ(x)$ is non-negative and non-increasing, and satisfies $ϕ(x) \leq 1 - x_{0}/μ < 1$.

The B index is the expected value of the relative difference $ϕ(x)$:

B = E(ϕ(x)) = \int_{x_{0}}^{x_{m}} ϕ(x) \, dF(x) = \int_{x_{0}}^{x_{m}} \frac{μ - m(x)}{μ} \, dF(x)

(12)

where $B$ is the B index and $E(ϕ(x))$ denotes the mathematical expectation, obtained by integrating $ϕ(x)$ with respect to the cumulative distribution function $F(x)$.
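Formula (12) can be applied directly to survey microdata by replacing F with the empirical distribution of sorted incomes. Under that substitution, m(x) becomes the running mean of the i poorest individuals. The sketch below shows one simple plug-in estimator under that assumption. Finite-sample variants exist (for example, averaging over the first n − 1 positions only), and this section does not state which variant was applied to the CHIP data.

```python
import math
import random
from typing import Sequence


def bonferroni_index(incomes: Sequence[float]) -> float:
    """Plug-in Bonferroni index: mean over individuals of (mu - m_i) / mu,
    where m_i is the mean income of the i poorest individuals (Formula (12)
    with F replaced by the empirical distribution)."""
    s = sorted(incomes)
    n = len(s)
    mu = sum(s) / n
    if mu <= 0:
        raise ValueError("mean income must be positive")
    running = 0.0
    total = 0.0
    for i, v in enumerate(s, start=1):
        running += v
        m_i = running / i            # interval mean m(x) at the i-th smallest income
        total += (mu - m_i) / mu     # relative difference phi(x)
    return total / n


print(bonferroni_index([5.0, 5.0, 5.0, 5.0]))  # 0.0 (perfect equality)
print(bonferroni_index([0.0, 0.0, 0.0, 4.0]))  # 0.75 (one person holds all income)

# For exponentially distributed incomes the population B index is pi^2/6 - 1.
rng = random.Random(7)
exp_sample = [rng.expovariate(1.0) for _ in range(200_000)]
print(round(bonferroni_index(exp_sample), 3), round(math.pi ** 2 / 6 - 1, 3))
```

Ties are handled by rank order here, which is a common approximation. A strict reading of m(x) would average over all tied incomes at once.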

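The Lorenz curve is expressed as

L(p) = \frac{1}{μ} \int_{0}^{p} F^{-1}(u) \, du

(13)

where $L(p)$ is the Lorenz function, $F^{-1}$ is the quantile function (the inverse of $F$), and $p \in [0, 1]$ is the cumulative probability. Setting $p = F(x)$, this equals $\frac{1}{μ} \int_{x_{0}}^{x} t \, dF(t)$, the share of total income held by the group with income $\leq x$.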

Therefore, with $p = F(x)$, we have $L(p)/p = m(x)/μ$, so $ϕ(x) = 1 - \frac{L(p)}{p}$, and the B index can be expressed in another form as:

B = 1 - \int_{0}^{1} \frac{L(p)}{p} \, dp = 1 - \int_{0}^{1} B(p) \, dp

(14)

where $B(p) = L(p)/p$ is the Bonferroni curve.

In this form, the B index equals the area between the Bonferroni curve $B(p)$ and the line of perfect equality, $B(p) = 1$. For the Weibull distribution, the B index is

B = 1 - \sum_{l = 0}^{\infty} \frac{(l + 2)^{- \frac{1}{b} - 1}}{l + 1}

(15)

where $l$ is the summation index taking values 0, 1, 2, …, and $b$ is the shape parameter of the Weibull distribution.
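Formula (15) can be evaluated numerically by truncating the series. The terms decay only like l^(−2−1/b), so the sketch below adds a rough integral estimate of the neglected tail. The truncation length and tail correction are our own assumptions, not part of the original method. For b = 1 (the exponential distribution) the series has the closed form B = π²/6 − 1 ≈ 0.645, which serves as a check. Each term grows with b, so B falls as the shape parameter rises.

```python
import math


def weibull_bonferroni(b: float, n_terms: int = 100_000) -> float:
    """Bonferroni index of a Weibull distribution with shape b, via Formula (15)."""
    if b <= 0:
        raise ValueError("shape parameter b must be positive")
    a = 1.0 + 1.0 / b                        # terms are (l + 2)^(-a) / (l + 1)
    partial = math.fsum((l + 2) ** (-a) / (l + 1) for l in range(n_terms))
    # Terms decay roughly like l^(-1 - a), so the neglected tail is about
    # n_terms^(-a) / a; adding it reduces the truncation error.
    tail = n_terms ** (-a) / a
    return 1.0 - (partial + tail)


print(weibull_bonferroni(1.0), math.pi ** 2 / 6 - 1)  # should agree closely
for shape in (1.0, 2.0, 3.0):
    # Larger shape means a more concentrated distribution and a smaller B index.
    print(shape, round(weibull_bonferroni(shape), 4))
```

As with the Gini coefficient, only the shape parameter enters, so the same fitted b can yield both indices.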

Based on Formulas (10)–(15), we compute the B index from the micro-survey income data and estimate it with DM-I3EM. Figure 11 shows the correlation between the two. The estimated and survey-derived B indices are strongly correlated, with correlation coefficients exceeding 0.6 in all four economic regions. Consistent with the Gini coefficient results, the model performs best in the Eastern region when estimating the B index. This indicates that the model developed in this study can estimate not only the county-level individual income Gini coefficient but also other inequality indices derived from the Lorenz curve. It thereby provides a more comprehensive picture of individual income inequality in a region, covering both overall disparity and the relative deprivation of vulnerable groups. This transferability enhances the model’s utility for interdisciplinary research and policymaking, as it can serve diverse analytical needs, such as evaluating the effects of poverty alleviation programs or monitoring overall inequality trends.

4.3. Limitations and Prospects

While NTL data provide a valuable perspective for estimating income inequality, they reflect regional income levels only indirectly, by capturing nighttime economic activity. In affluent areas that are green, sparsely populated, and far from urban centers, such as suburban villa districts and ecotourism resorts, NTL may misrepresent income because non-industrial, low-density development emits little light at night. Similarly, in energy-intensive regions with high NTL intensity but low average income (e.g., some resource-extraction counties), the decoupling between the luminosity of economic activity and individual income could bias estimates. Therefore, research on small-scale regions (e.g., townships) or specific time periods (e.g., holidays with altered nighttime activity patterns) may benefit from integrating other geospatial and remote sensing data, such as high-resolution housing price data and street view imagery, which may help mitigate such biases.

The income inequality estimation model developed in this study is trained and tested using data from the China Household Income Project (CHIP) surveys, which reflect China’s specific socioeconomic context, including rapid urbanization and regional development policies. Its applicability to other countries and regions, particularly those with distinct economic structures, remains to be verified. Future research could therefore use the World Inequality Database (WID) and global NTL datasets to examine model performance in other regions and adapt the distribution fitting parameters and regression model structures to local contexts.

In addition, this study estimates county-level individual income inequality and does not assess inequality at multiple scales within administrative units. Future research could employ deep learning models such as convolutional long short-term memory (ConvLSTM) networks, which excel at capturing local sequential spatiotemporal dependencies, or spatiotemporal Transformer models, which can capture long-range global spatiotemporal correlations. Such models could integrate high-resolution NTL, land-use, and population density data to explore the magnitude, spatial distribution, and evolution of income and income inequality within regions at multiple scales.

5. Conclusions

Accurate monitoring of income inequality is a fundamental prerequisite for achieving socioeconomic sustainability. However, reliance on costly household surveys and the limited accuracy of traditional remote sensing proxies have created significant blind spots in inequality assessment. To address these challenges, this study developed the Distribution Matching-based Individual Income Inequality Estimation Model (DM-I3EM). By combining nighttime light data with household surveys through Weibull distribution fitting, the framework overcomes the spatiotemporal gaps inherent in micro-survey data and achieves higher accuracy than conventional NTL-based methods (R² of 0.76 in the Eastern region). This advances inequality measurement by providing a reliable way to estimate individual income inequality in counties that lack micro-survey data across multiple years and regions.

In the context of China, applying this model has important implications for the nation’s socioeconomic sustainability. With its high spatiotemporal resolution, the model enables accurate, full-coverage estimation of county-level income inequality, providing a fine-grained view of spatiotemporal dynamics that was previously difficult to achieve. The results reveal a regional divergence, with inequality intensifying in northern and western counties, alongside strengthening spatial agglomeration of inequality. These insights can help policymakers move from broad economic monitoring to targeted social interventions. By identifying areas at high risk of social stratification, the model provides actionable evidence for optimizing resource allocation and advancing common prosperity, thereby mitigating the risks that inequality poses to long-term economic resilience and social stability.

The significance of this study extends beyond the specific case of China. The core methodology of combining distribution matching with region-specific calibration offers a cost-effective and replicable solution for the Global South and other data-scarce regions facing similar development challenges. By providing a robust tool to visualize inequality at fine spatiotemporal granularities, this framework enables global stakeholders to monitor progress toward United Nations Sustainable Development Goal 10 (Reduced Inequalities). Ultimately, this study bridges the gap between remote sensing technology and social science, offering a scalable pathway toward a more inclusive and sustainable global future.

Author Contributions

Conceptualization, L.Z. and S.G.; methodology, L.Z., Q.W. and S.G.; validation, S.G.; formal analysis, Q.W. and S.G.; investigation, Q.W.; resources, L.Z., Q.W. and S.G.; data curation, Q.W. and S.G.; writing—original draft preparation, L.Z., Q.W. and S.G.; writing—review and editing, L.Z., Q.W. and S.G.; visualization, Q.W.; supervision, L.Z.; project administration, L.Z.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42471343; the Guangdong Basic and Applied Basic Research Foundation, grant number 2022B1515130001; and the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), grant number 311024004.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The NPP-VIIRS Annual VNL V2 data employed in this study are publicly accessible from the Earth Observation Group at https://eogdata.mines.edu/products/vnl/ (accessed on 13 November 2023). The CHIP Micro-Survey data employed in this study can be obtained publicly from the China Household Income Data Sharing Platform at http://chip.bnu.edu.cn (accessed on 13 November 2023).

Acknowledgments

The authors sincerely appreciate the editors and anonymous reviewers for their detailed comments and insightful suggestions, which have greatly contributed to improving this paper. The authors also acknowledge the use of GPT 3.5 to improve the language of the manuscript. All scientific content and original ideas presented are solely the work of the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chancel, L.; Piketty, T.; Saez, E.; Zucman, G. World Inequality Report 2022; Harvard University Press: Cambridge, MA, USA, 2022.
2. Baron, R.A. Income inequality: How efforts to reduce it can undermine motivation—And the pursuit of excellence. J. Entrep. Public Policy 2017, 6, 2–10.
3. Marchand, Y.; Dubé, J.; Breau, S. Exploring the causes and consequences of regional income inequality in Canada. Econ. Geogr. 2020, 96, 83–107.
4. Berg, A.G.; Ostry, J.D. Inequality and unsustainable growth: Two sides of the same coin? IMF Econ. Rev. 2017, 65, 792–815.
5. Neckerman, K.M.; Torche, F. Inequality: Causes and consequences. Annu. Rev. Sociol. 2007, 33, 335–357.
6. Xing, Z.; He, C.; Lin, J.; Pan, Y. Who you are versus where you are: Revealing the importance of determinants of within-city income inequality in China through an interpretable machine learning approach. Appl. Geogr. 2025, 184, 103759.
7. Dorling, D. Income inequality in the UK: Comparisons with five large Western European countries and the USA. Appl. Geogr. 2015, 61, 24–34.
8. Constantine, C. Income inequality in Guyana: Class or ethnicity? New evidence from survey data. World Dev. 2024, 173, 106429.
9. Marco, R.; Llano, C.; Pérez-Balsalobre, S. Economic complexity, environmental quality and income equality: A new trilemma for regions? Appl. Geogr. 2022, 139, 102646.
10. Xia, H.; Guan, Y.; Pan, H.; Shen, Y. Spatial evolution and influencing factors of income gap in resource-based cities based on GWR. GeoJournal 2025, 90, 132.
11. Arya, P.K.; Sur, K.; Dhote, S.; Siral, H.; Kundu, T.; Mehta, B.S.; Srivastava, R. Integrating multi-source satellite imagery and socio-economic household data for wealth-based poverty assessment of India: A GIS and machine learning based approach. Soc. Indic. Res. 2025, 179, 653–676.
12. Hu, B.; Zhai, W.; Li, D.; Tang, J. Application note: Evaluation of the Gini coefficient at the county level in mainland China based on Luojia 1-01 nighttime light images. Comput. Urban Sci. 2024, 4, 1.
13. Jestl, S.; List, E. Inequality, redistribution, and the financial crisis: Evidence from distributional national accounts for Austria. Rev. Income Wealth 2023, 69, 195–227.
14. Levin, N.; Kyba, C.C.; Zhang, Q.; de Miguel, A.S.A.N.; Román, M.O.; Li, X.; Portnov, B.A.; Molthan, A.L.; Jechow, A.; Miller, S.D.; et al. Remote sensing of night lights: A review and an outlook for the future. Remote Sens. Environ. 2020, 237, 111443.
15. Chen, Z.; Wei, Y.; Shi, K.; Zhao, Z.; Wang, C.; Wu, B.; Qiu, B.; Yu, B. The potential of nighttime light remote sensing data to evaluate the development of digital economy: A case study of China at the city level. Comput. Environ. Urban Syst. 2022, 92, 101749.
16. Bao, H.; Tao, H.; Zhuo, L.; Shi, Q.; Guo, S. Estimation of Economic Spillover Effects under the Hierarchical Structure of Urban Agglomeration Based on Time-Series Night-Time Lights: A Case Study of the Pearl River Delta, China. Remote Sens. 2024, 16, 394.
17. Zhang, F.; Zhang, Q.; Xu, M. Remote sensing insights into urban–rural imbalance and sustainable development: A case study in Guangdong, China. Sustainability 2025, 17, 2247.
18. Small, C.; Elvidge, C.D.; Balk, D.; Montgomery, M. Spatial scaling of stable night lights. Remote Sens. Environ. 2011, 115, 269–280.
19. McSharry, P.; Mawejje, J. Estimating urban GDP growth using nighttime lights and machine learning techniques in data poor environments: The case of South Sudan. Technol. Forecast. Soc. Change 2024, 203, 123399.
20. Ivan, K.; Holobâcă, I.; Benedek, J.; Török, I. VIIRS Nighttime Light Data for Income Estimation at Local Level. Remote Sens. 2020, 12, 2950.
21. Liu, H.; Wang, J.; Liu, H.; Chen, Y.; Liu, X.; Guo, Y.; Huang, H. Identification of relative poverty based on 2012–2020 NPP/VIIRS night light data: In the area surrounding Beijing and Tianjin in China. Sustainability 2022, 14, 5559.
22. Chen, Z.; Luo, H.; Li, M.; Lin, J.; Zhang, X.; Li, S. Fine-scale poverty estimation by integrating SDGSAT-1 glimmer images and urban functional zoning data. Remote Sens. Environ. 2025, 329, 114925.
23. Jiang, Y.; Sun, S.; Zheng, S. Exploring urban expansion and socioeconomic vitality using NPP-VIIRS data in Xia-Zhang-Quan, China. Sustainability 2019, 11, 1739.
24. Lin, W.; Xu, W.; Wu, Z.; Cao, J. Enhancing county-level GDP estimation accuracy with downscaled NPP-VIIRS nighttime light data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2025, 18, 17552–17564.
25. Huang, C.; Zhuang, Q.; Meng, X.; Guo, H.; Han, J. An improved nightlight threshold method for revealing the spatiotemporal dynamics and driving forces of urban expansion in China. J. Environ. Manag. 2021, 289, 112574.
26. Chen, S.; Cheng, C. Monitoring Fine-Scale Urban Shrinkage Space with NPP-VIIRS Imagery. Remote Sens. 2025, 17, 688.
27. Yang, L.; Cao, J.; Zhuo, L.; Shi, Q. A novel consistency calibration method for DMSP-OLS nighttime stable light time-series images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 2621–2631.
28. Lessmann, C.; Seidel, A. Regional inequality, convergence, and its determinants—A view from outer space. Eur. Econ. Rev. 2017, 92, 110–132.
29. Mirza, M.U.; Xu, C.; Bavel, B.V.; van Nes, E.H.; Scheffer, M. Global inequality remotely sensed. Proc. Natl. Acad. Sci. USA 2021, 118, e1919913118.
30. Weidmann, N.B.; Theunissen, G. Estimating local inequality from nighttime lights. Remote Sens. 2021, 13, 4624.
31. Li, W.; Wu, M.; Niu, Z. Spatialization and analysis of China’s GDP based on NPP/VIIRS data from 2013 to 2023. Appl. Sci. 2024, 14, 8599.
32. Hou, X.; Gao, J. Toward Common Prosperity: Measuring decrease in inequality in China prefecture-level cities. Struct. Change Econ. Dyn. 2025, 72, 29–46.
33. Elvidge, C.D.; Zhizhin, M.; Ghosh, T.; Hsu, F.C.; Taneja, J. Annual time series of global VIIRS nighttime lights derived from monthly averages: 2012 to 2019. Remote Sens. 2021, 13, 922.
34. Wang, J. How Does New Urban Land Affect Rural-Urban Migration? Spat. Demogr. 2025, 13, 19.
35. Tian, Z.; Xin, Y.; Lin, Y. Do roads help rural populations escape poverty? New evidence from Chinese survey data. Appl. Econ. 2025, 1–14.
36. Wang, Z.; Jv, Y. Revisiting income inequality among households: New evidence from the Chinese Household Income Project. China Econ. Rev. 2023, 81, 102039.
37. Bai, X.; Li, L.; Song, P. Balancing efficiency and fairness: The research on the impact of China’s digital economy development on economic growth and income inequality. J. Xi’an Jiaotong Univ. Soc. Sci. 2023, 43, 38–50. (In Chinese)
38. Weibull, W. A statistical distribution function of wide applicability. J. Appl. Mech. 1951, 18, 293–297.
39. D’Addario, R. Intorno ad una funzione di distribuzione. G. Degli Econ. Ann. Econ. 1974, 33, 205–214.
40. Chotikapanich, D.; Rao, D.P.; Tang, K.K. Estimating income inequality in China using grouped data and the generalized beta distribution. Rev. Income Wealth 2007, 53, 127–147.
41. Mirzaei, S.; Mohtashami Borzadaran, G.R.; Amini, M.; Jabbari, H. A new generalized Weibull distribution in income economic inequality curves. Commun. Stat.-Theory Methods 2019, 48, 889–908.
42. Kekatos, V.; Giannakis, G.B. Sparse Volterra and polynomial regression models: Recoverability and estimation. IEEE Trans. Signal Process. 2011, 59, 5907–5920.
43. Feng, Y.; Yang, Q.; Tong, X.; Chen, L. Evaluating land ecological security and examining its relationships with driving factors using GIS and generalized additive model. Sci. Total Environ. 2018, 633, 1469–1479.
44. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
45. Dhibi, K.; Fezai, R.; Mansouri, M.; Trabelsi, M.; Kouadri, A.; Bouzara, K.; Nounou, H.; Nounou, M. Reduced kernel random forest technique for fault detection and classification in grid-tied PV systems. IEEE J. Photovolt. 2020, 10, 1864–1871.
46. Zhang, X.; Xu, J.; Zhong, S.; Wang, Z. Assessing uneven regional development using nighttime light satellite data and machine learning methods: Evidence from county-level improved HDI in China. Land 2024, 13, 1524.
47. Moran, P.A. Notes on continuous stochastic phenomena. Biometrika 1950, 37, 17–23.
48. Qu, R.; Lee, S.; Rhee, Z.; Bae, S. Analysis of inequality levels of industrial development in rural areas through inequality indices and spatial autocorrelation. Sustainability 2023, 15, 8102.
49. Anselin, L. Local indicators of spatial association—LISA. Geogr. Anal. 1995, 27, 93–115.
50. Giorgi, G.M.; Nadarajah, S. Bonferroni and Gini indices for various parametric families of distributions. Metron 2010, 68, 23–46.
51. Chakravarty, S.R.; Sarkar, P. New perspectives on the Gini and Bonferroni indices of inequality. Soc. Choice Welf. 2023, 60, 47–64.

Figure 1. Map of the study area.

Figure 2. Overview of the individual income inequality estimation methodology.

Figure 3. Distribution of household income and NTL intensity: (a) Household income; (b) Raw NTL intensity; (c) Processed NTL intensity.

Figure 4. Boxplot comparison of Weibull vs. Fisk fitting performance across spatial resampled scales.

Figure 5. Weibull fitting of NTL intensity distributions at different spatial resampled scales.

Figure 6. Comparison of the model’s performance.

Figure 7. Temporal changes in county-level income inequality in China (2013–2022).

Figure 10. Performance of models after incorporating remote sensing auxiliary variables.

Figure 11. Correlation analysis between NTL-derived B index and survey-derived income B index.

Table 1. Description of the key datasets used.

Type	Dataset	Time Horizon	Spatial Resolution	Unit	Data Source
NTL Data	NPP-VIIRS Annual VNL V2	2013–2022	15 arcsec	nW·cm⁻²·sr⁻¹	https://eogdata.mines.edu/products/vnl/ (accessed on 13 November 2023)
Micro Survey Data	CHIP	2013, 2018	County	Person or family	http://chip.bnu.edu.cn (accessed on 13 November 2023)

Table 2. Comparison of fitting performance across income distribution functions.

Indicator	Quantile	Weibull	Fisk	Burr	Inverse-Gamma	Exponential-Normal	Lomax
KS p-value	25th Percentile	0.22	0.71¹	0.08	0.06	0.00	0.00
	Median	0.52	0.85	0.29	0.26	0.00	0.00
	75th Percentile	0.75	0.95	0.68	0.63	0.06	0.06
Log-Likelihood	25th Percentile	−1159.08	−1149.94	−1159.14	−1172.09	−1167.12	−1166.29
	Median	−910.28	−911.65	−916.05	−921.50	−915.67	−915.32
	75th Percentile	−565.44	−561.36	−560.02	−572.20	−577.46	−578.21
AIC	25th Percentile	1131.48	1121.99	1121.00	1144.18	1154.16	1155.64
	Median	1821.53	1828.65	1829.04	1846.50	1833.48	1833.49
	75th Percentile	2321.18	2302.95	2322.37	2344.94	2337.71	2337.30
NRMSE	25th Percentile	0.78	0.83	0.92	0.84	1.07	1.07
	Median	0.94	1.04	1.34	1.07	1.20	1.22
	75th Percentile	1.13	1.33	1.67	1.41	1.36	1.34

¹ The bold values in this table indicate the distribution with the best fitting effect for each indicator.

Table 3. County-level income inequality measured by Global Moran’s I (2013–2022).

Year	Global Moran’s I	Standardized z-Score	Significance (p-Value)
2013	0.17	99.89	p < 0.001
2018	0.19	111.23	p < 0.001
2022	0.20	117.88	p < 0.001

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhuo, L.; Wu, Q.; Guo, S. Integrating Nighttime Light and Household Survey Data to Monitor Income Inequality: Implications for China’s Socioeconomic Sustainability. Sustainability 2026, 18, 734. https://doi.org/10.3390/su18020734

AMA Style

Zhuo L, Wu Q, Guo S. Integrating Nighttime Light and Household Survey Data to Monitor Income Inequality: Implications for China’s Socioeconomic Sustainability. Sustainability. 2026; 18(2):734. https://doi.org/10.3390/su18020734

Chicago/Turabian Style

Zhuo, Li, Qiuying Wu, and Siying Guo. 2026. "Integrating Nighttime Light and Household Survey Data to Monitor Income Inequality: Implications for China’s Socioeconomic Sustainability" Sustainability 18, no. 2: 734. https://doi.org/10.3390/su18020734

APA Style

Zhuo, L., Wu, Q., & Guo, S. (2026). Integrating Nighttime Light and Household Survey Data to Monitor Income Inequality: Implications for China’s Socioeconomic Sustainability. Sustainability, 18(2), 734. https://doi.org/10.3390/su18020734
