# Predicting Multidimensional Poverty with Machine Learning Algorithms: An Open Data Source Approach Using Spatial Data


## Abstract


## 1. Introduction

## 2. Related Works

- They used census covariates as predictors and Principal Component Analysis (PCA) as a dimensionality reduction technique, retaining five components. They reported that their MPI contains values of exactly 0 and 1, which they excluded in the first experiment. The methods were Gradient Boosting tree regression and Random Forest. Gradient Boosting performed best (R-squared of 0.6789 and RMSE of 0.7818);
- In the second experiment, they used the same variables as in the first but included the 0 and 1 MPI values. Gradient Boosting again performed best, with an R-squared of 0.6537 and an RMSE of 1.1898; the second-best algorithm achieved an R-squared of 0.6281 and an RMSE of 1.233;
- In the third experiment, they used Sentinel-2 images as input features. They used ResNet34 as a pre-trained model for transfer learning and fine-tuned it on the data. For data augmentation, they applied 90-degree rotations on the horizontal and vertical axes and adjusted image contrast. They extracted 512 covariates from the neural network (the weights). After the feature extraction, they applied PCA to obtain five components, which they interpolated to the block level with the natural neighbor method. They estimated the MPI using approach 1 (A1), which includes the 0 and 1 MPI values, and approach 2 (A2), which excludes them. The best reported result was an RMSE of 0.9067 and an R-squared of 0.5757, obtained with a Random Forest under the A2 approach.
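The workflow summarized in these experiments (PCA to five components, two tree-based regressors, and evaluation with and without the boundary MPI values) can be sketched in a few lines. This is only an illustrative sketch using scikit-learn on synthetic data, not the implementation of the work described above. The helper names `make_synthetic_blocks` and `evaluate`, the 75/25 train/test split, the default hyperparameters, and the synthetic target are all assumptions introduced for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def make_synthetic_blocks(n_blocks=400, n_covariates=20, seed=0):
    """Hypothetical stand-in for block-level covariates and an MPI-like target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_blocks, n_covariates))
    signal = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n_blocks)
    # Clipping to [0, 1] produces some exact 0 and 1 values, as in the reported MPI.
    y = np.clip(0.5 + 0.25 * signal, 0.0, 1.0)
    return X, y


def evaluate(X, y, exclude_bounds, seed=0):
    """Fit PCA(5) + each regressor; return {name: (r2, rmse)} on a held-out split."""
    if exclude_bounds:
        # Drop observations whose target is exactly 0 or 1.
        keep = (y > 0.0) & (y < 1.0)
        X, y = X[keep], y[keep]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed  # assumed split, not from the source
    )
    models = {
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # Five principal components, matching the number stated in the text.
        pipe = make_pipeline(PCA(n_components=5), model)
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
        scores[name] = (float(r2_score(y_te, pred)), rmse)
    return scores


X, y = make_synthetic_blocks()
results_excluding = evaluate(X, y, exclude_bounds=True)   # 0 and 1 values removed
results_including = evaluate(X, y, exclude_bounds=False)  # 0 and 1 values kept
```

Because the data are synthetic, the resulting scores say nothing about the figures reported above; the sketch only shows how the two evaluation settings differ in the rows they keep before fitting.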

#### Contributions of This Study

- Propose an accessible data source to estimate multidimensional poverty at a high level of granularity;
- Apply machine learning methods on spatial features to estimate multidimensional poverty at the street block level.

## 3. Materials and Methods

#### 3.1. Area of Interest

#### 3.2. Data

#### 3.2.1. National Department of Statistics (DANE)

#### 3.2.2. OpenStreetMap (OSM)

#### 3.2.3. European Space Agency (ESA)

#### 3.3. Methods

#### 3.3.1. Linear Regression

#### 3.3.2. Support Vector Regression Machine

#### 3.3.3. Random Forest

#### 3.3.4. eXtreme Gradient Boosting (XGBoost)

#### 3.3.5. Light Gradient Boosting Machine (LightGBM)

#### 3.3.6. CatBoost

#### 3.4. Data Preparation and Estimation

#### 3.5. Software

## 4. Results

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
| --- | --- |
| DANE | National Statistical Department |
| ESA | European Space Agency |
| MPI | Multidimensional poverty index |

## Notes

1. Estimación Pobreza multidimensional: https://github.com/sandboxDANE/IPM-Pobrezamultidimensional, accessed on 18 March 2023.
2. https://planet.openstreetmap.org/, accessed on 18 March 2023.
3. https://geoportal.dane.gov.co/visipm/, accessed on 18 March 2023.
4. LightGBM Documentation: https://lightgbm.readthedocs.io/en/v3.3.2/, accessed on 18 March 2023.
5. Cuadra (urbanismo), https://es.wikipedia.org/wiki/Cuadra_(urbanismo), accessed on 18 March 2023; Urban design compendium, https://wiki.sustainabletechnologies.ca/images/8/8f/2_UrbanDesignCompendium.pdf, accessed on 18 March 2023.


**Figure 5.** Difference between the ground truth value of the MPI and the value estimated by the Random Forest model 2.

**Figure 7.** Difference between the ground truth value of the MPI and the value estimated by the XGBoost model.

**Figure 9.** Difference between the ground truth value of the MPI and the value estimated by the LightGBM model.

**Figure 11.** Difference between the ground truth value of the MPI and the value estimated by the CatBoost model.

**Figure 13.** Difference between the ground truth value of the MPI and the value estimated by the SVM model.

**Figure 15.** Difference between the ground truth value of the MPI and the value estimated by the GLM model.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Muñetón-Santa, G.; Manrique-Ruiz, L.C.
Predicting Multidimensional Poverty with Machine Learning Algorithms: An Open Data Source Approach Using Spatial Data. *Soc. Sci.* **2023**, *12*, 296.
https://doi.org/10.3390/socsci12050296
