Application of Machine Learning Models in Social Sciences: Managing Nonlinear Relationships
Definition
1. Introduction
1.1. Overview of Nonlinear Relationships in Social Sciences
1.2. Introduction to Machine Learning
2. How to Apply Machine Learning Models in Social Sciences
2.1. Machine Learning Models for Nonlinear Relationships
2.2. Model Evaluation, Validation, and Handling Imbalanced Data
2.3. Practical Recommendations for Applying Machine Learning in Social Science Research
2.3.1. Prioritize Data Quality and Preprocessing
2.3.2. Model Selection Based on Research Goals
2.3.3. Avoid Overfitting and Ensure Generalization
2.3.4. Incorporate Ethical Considerations
2.3.5. Interpreting Complex Machine Learning Models
2.3.6. Communicating Results to Diverse Audiences
3. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Aspect | Key Points | Challenges Addressed |
---|---|---|
Model Validation | | |
Cross-Validation Techniques | | |
Performance Metrics | | |
Handling Imbalanced Data | | |
Ethical Considerations | | |
Transparency Tools | | |
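The sketch below illustrates three rows of the table above: cross-validation techniques, performance metrics, and handling imbalanced data. It is a minimal, dependency-free example that assumes a binary outcome, a hypothetical 9:1 class split, and a naive majority-class baseline. The function names are illustrative and do not represent the authors' procedure.

The code first assigns observations to stratified folds, so each fold keeps the sample's class proportions. It then shows how plain accuracy can look strong while balanced accuracy and minority-class F1 reveal that the rare class is never detected.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Split observation indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its proportional share.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy alongside metrics that stay informative when one class is rare."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity on the rare class
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "f1_minority": f1,
    }


def majority_class(train_labels):
    """Naive baseline 'model': always predict the most frequent training label."""
    counts = defaultdict(int)
    for y in train_labels:
        counts[y] += 1
    return max(counts, key=counts.get)


def cross_validate_baseline(labels, k=5):
    """Average each metric over k stratified folds for the majority-class baseline."""
    folds = stratified_kfold_indices(labels, k)
    totals = defaultdict(float)
    for test_idx in folds:
        test_set = set(test_idx)
        train_y = [y for i, y in enumerate(labels) if i not in test_set]
        guess = majority_class(train_y)  # "fit" on the training folds only
        y_true = [labels[i] for i in test_idx]
        for name, value in imbalance_metrics(y_true, [guess] * len(y_true)).items():
            totals[name] += value / k
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical outcome with a 9:1 imbalance (e.g., a rare behavior coded 1).
    labels = [0] * 90 + [1] * 10
    for name, value in cross_validate_baseline(labels).items():
        print(f"{name}: {value:.3f}")
```

Run as a script, it prints a mean accuracy of 0.900, a balanced accuracy of 0.500, and a minority-class F1 of 0.000. A model that ignores the rare outcome can therefore appear accurate. In applied work, scikit-learn's `StratifiedKFold`, `balanced_accuracy_score`, and `f1_score` are tested equivalents of these hand-written helpers.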
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kyriazos, T.; Poga, M. Application of Machine Learning Models in Social Sciences: Managing Nonlinear Relationships. Encyclopedia 2024, 4, 1790-1805. https://doi.org/10.3390/encyclopedia4040118