Estimation of Poverty Using Random Forest Regression with Multi-Source Data: A Case Study in Bangladesh
1 Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai 200241, China
2 School of Geographic Sciences, East China Normal University, Shanghai 200241, China
3 School of Earth and Environmental Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
* Authors to whom correspondence should be addressed.
Remote Sens. 2019, 11(4), 375; https://doi.org/10.3390/rs11040375
Received: 20 January 2019 / Revised: 2 February 2019 / Accepted: 9 February 2019 / Published: 13 February 2019
(This article belongs to the Special Issue Advances in Remote Sensing with Nighttime Lights)
Spatially explicit and reliable data on poverty are critical for both policy makers and researchers. However, such data remain scarce, particularly in developing countries. Existing studies mostly use environmental data from individual sources in isolation to estimate poverty, even though poverty is a complex phenomenon that cannot be quantified, either theoretically or practically, by a single data type. This study proposes a random forest regression (RFR) model to estimate poverty at 10 km × 10 km spatial resolution by combining features extracted from multiple data sources, including National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) Day/Night Band (DNB) nighttime light (NTL) data, Google satellite imagery, a land cover map, a road map, and division headquarter location data. The household wealth index (WI) drawn from the Demographic and Health Surveys (DHS) program was used to reflect poverty level. We trained the RFR model using data from Bangladesh and applied the model to both Bangladesh and Nepal to evaluate its accuracy. The results show that the R² between the actual and estimated WI in Bangladesh is 0.70, indicating good predictive power of our model in WI estimation. An R² of 0.61 between the actual and estimated WI in Nepal also indicates good generalization ability. Furthermore, a negative correlation is observed between the district average WI and the poverty head count ratio (HCR) in Bangladesh, with a Pearson correlation coefficient of −0.6. Using Gini importance, we identify proximity to urban areas as the most important variable for explaining poverty, contributing 37.9% of the explanatory power. Compared with the study that used NTL and Google satellite imagery in isolation to estimate poverty, our method increases the accuracy of estimation. Given that the data we use are globally and publicly available, the methodology reported in this study would also be applicable in other countries or regions to estimate the extent of poverty.
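The two accuracy measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that R² denotes the coefficient of determination (the paper may instead define it as the squared correlation of a fitted line), and all numbers below are hypothetical toy values, not data from the study. The function and variable names are illustrative only.

```python
from math import sqrt


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values: actual vs. estimated wealth index (WI) per grid cell.
actual_wi = [-1.0, -0.5, 0.0, 0.5, 1.0]
estimated_wi = [-0.8, -0.6, 0.1, 0.4, 0.9]

# Hypothetical toy values: district average WI vs. poverty head count ratio (%).
district_wi = [-1.0, 0.0, 1.0]
district_hcr = [40.0, 30.0, 20.0]

r2 = r_squared(actual_wi, estimated_wi)
r = pearson_r(district_wi, district_hcr)
print(f"R^2 = {r2:.3f}, Pearson r = {r:.3f}")
```

A negative Pearson coefficient, as in the toy district data, corresponds to the pattern reported above: districts with a higher average WI tend to have a lower HCR.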
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
MDPI and ACS Style
Zhao, X.; Yu, B.; Liu, Y.; Chen, Z.; Li, Q.; Wang, C.; Wu, J. Estimation of Poverty Using Random Forest Regression with Multi-Source Data: A Case Study in Bangladesh. Remote Sens. 2019, 11, 375.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.