Detecting Ethnic Spatial Distribution of Business People Using Machine Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Large-Scale Surname Data
2.2. Using RNN
2.3. Method for Abstracting Ethnic Linkages
3. Results
3.1. Prediction of Nationality
3.2. Classification by Ethnicity
3.3. Prediction of Ethnic Distribution within a Country
3.4. Spatial Distribution Analysis of African Continent
3.5. Spatial Distribution Analysis of Europe
4. Discussion
4.1. Segregation of Racial and Ethnic Groups in Society
4.2. Extensions to Methodology
4.3. Relationship between Information and Ethnic Networks
4.4. Future Economic Environment and National Network
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
Input | Top 3 Predicted Nationalities (Log-Likelihood) |
---|---|
Smith | United Kingdom (−0.264), Australia (−0.320), New Zealand (−0.401) |
Obama | Kenya (−0.246), Nigeria (−0.424), Japan (−0.470) |
Mori | Japan (−0.042), Papua New Guinea (−0.614), Italy (−0.656) |
Rank | Country | Precision |
---|---|---|
1 | Iceland | 0.95 |
2 | Japan | 0.91 |
3 | Bulgaria | 0.86 |
4 | Greece | 0.85 |
5 | Thailand | 0.83 |
… | ||
73 | United States | 0.52 |
74 | New Zealand | 0.51 |
75 | Australia | 0.51 |
76 | Canada | 0.50 |
77 | Afghanistan | 0.50 |
AVG. | | 0.66 |
Region | Balanced Accuracy |
---|---|
UK | 0.61 |
Germany | 0.68 |
Italy | 0.70 |
India | 0.57 |
France | 0.73 |
Spain | 0.71 |
No. | Ethnic Groups | Population (%) | Prediction (%) | Business (%) | Watch List (%) | Physicians (%) |
---|---|---|---|---|---|---|
1 | Non-Hispanic whites | 69.13 | 68.8 | 69.9 | 75.7 | 61.5 |
2 | Hispanics | 12.5 | 11.3 | 6.9 | 8.2 | 11.0 |
3 | African Americans | 12.0 | 13.1 | 12.3 | 10.2 | 13.9 |
4 | Asian Americans | 3.6 | 6.6 | 10.8 | 5.7 | 13.5 |
Rank | Country | Dominant Ethnic Group | % | Entropy |
---|---|---|---|---|
1 | Rep. of Korea | Korean | 99 | 0.48573 |
2 | Iceland | Icelandic | 93 | 0.561222 |
3 | Japan | Japanese | 98.5 | 0.853479 |
4 | Vietnam | Vietnamese | 85.7 | 1.0056 |
5 | Bulgaria | Bulgarian | 85 | 1.223591 |
… | ||||
38 | Belgium | Flemish | 52 | 4.254518 |
39 | Canada | Canadian | 32.3 | 4.38808 |
40 | Luxembourg | Luxembourgers | 55 | 4.486005 |
41 | Philippines | Visayan | 32.9 | 4.563974 |
42 | Afghanistan | Pashtun | 42.1 | 4.621831 |
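
The Entropy column in the table above ranks countries from the most homogeneous (low entropy) to the most diverse (high entropy). A common way to obtain such a score is the Shannon entropy of a country's ethnic-share distribution. The sketch below assumes that definition with a base-2 logarithm; the base, the function name, and the example shares are assumptions for illustration, not values or code taken from the paper.

```python
import math


def shannon_entropy(shares, base=2.0):
    """Shannon entropy of a distribution given as non-negative weights.

    The weights are normalized to sum to 1 before the entropy is computed.
    The base-2 default is an assumption; the source does not state the base.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must contain at least one positive weight")

    entropy = 0.0
    for weight in shares:
        if weight > 0:  # terms with zero probability contribute nothing
            p = weight / total
            entropy -= p * math.log(p) / math.log(base)
    return entropy


# Hypothetical share distributions (not data from the paper).
homogeneous = [0.97, 0.01, 0.01, 0.01]  # one dominant group
diverse = [0.25, 0.25, 0.25, 0.25]      # four equally sized groups

h_low = shannon_entropy(homogeneous)
h_high = shannon_entropy(diverse)
```

Under this definition a single-group population scores 0 and four equal groups score 2 bits, so a distribution dominated by one group lands near the top of a ranking sorted by ascending entropy.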
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Jun, J.; Mizuno, T. Detecting Ethnic Spatial Distribution of Business People Using Machine Learning. Information 2020, 11, 197. https://doi.org/10.3390/info11040197