# Modeling Job Satisfaction of Peruvian Basic Education Teachers Using Machine Learning Techniques


## Abstract


## 1. Introduction

- We determined that the XGBoost and Random Forest algorithms yield predictive models of job satisfaction for Peruvian basic education teachers with a balanced accuracy of 74%, a sensitivity of 74%, an F1-score of 0.48, a negative predictive value of 0.94, the highest count of true positives (479 instances), and the lowest count of false negatives (168 instances) among the models evaluated. These values are unprecedented in the prediction of job satisfaction of basic education teachers in Peru.
- We identified economic income; satisfaction with life, self-esteem, pedagogical activity, and the relationship with the principal; perception of living conditions; satisfaction with family relationships; depression-related health problems; and satisfaction with the relationship with colleagues as the most important variables for predicting teachers' job satisfaction.
- Finally, we make available to the scientific community a pre-processed data set with 13,302 records, 11 predictor columns, and one target column, so that replication experiments can be performed with other machine learning algorithms. These columns result from the feature selection process applied to the original data set, which initially had 15,087 records and 942 columns.
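The headline sensitivity above can be reproduced directly from the quoted confusion-matrix counts (TP = 479, FN = 168). A minimal sketch, using only the figures stated in the text:

```python
# Sketch: recovering the reported sensitivity from the confusion-matrix
# counts quoted above (TP = 479 true positives, FN = 168 false negatives).
def sensitivity(tp: int, fn: int) -> float:
    """Recall of the positive class: TP / (TP + FN)."""
    return tp / (tp + fn)

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recall = sensitivity(479, 168)
print(f"sensitivity = {recall:.2f}")  # 0.74, matching the reported value
```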

## 2. State-of-the-Art

## 3. Materials and Methods

#### 3.1. Data Cleaning and Preprocessing

#### 3.1.1. Inputting Missing Values and Eliminating Outliers
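The section title names two steps: imputing missing values and eliminating outliers. An illustrative sketch (not necessarily the authors' exact pipeline) combining median imputation with the Local Outlier Factor algorithm cited in the references, both available in scikit-learn:

```python
# Illustrative sketch: median imputation of missing cells, then outlier
# removal with Local Outlier Factor (LOF) on toy data.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[1.0, 2.0], [np.nan, 3.0], [2.0, np.nan],
              [1.5, 2.5], [100.0, 200.0]])  # gaps plus one extreme row

# 1) Fill missing cells with the column median.
X_filled = SimpleImputer(strategy="median").fit_transform(X)

# 2) LOF labels inliers +1 and outliers -1; keep only the inliers.
labels = LocalOutlierFactor(n_neighbors=3).fit_predict(X_filled)
X_clean = X_filled[labels == 1]
print(X_clean.shape)
```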

#### 3.1.2. Robust Scaling Data
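Robust scaling centers each feature on its median and divides by its interquartile range, so extreme values (for example, very high incomes or commute times) do not dominate the scale. A minimal sketch with scikit-learn's `RobustScaler`:

```python
# Sketch of robust scaling: (x - median) / IQR per column.
import numpy as np
from sklearn.preprocessing import RobustScaler

income = np.array([[800.0], [1000.0], [1200.0], [1500.0], [20000.0]])
scaled = RobustScaler().fit_transform(income)
# The median value (1200) maps to 0; the IQR (1500 - 1000 = 500) sets the
# unit, so the extreme value no longer dictates the scale of the feature.
print(scaled.ravel())
```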

#### 3.2. Feature Selection

#### 3.2.1. ANOVA F-Test Filter
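The ANOVA F-test filter ranks continuous predictors by how strongly their class-conditional means differ. A minimal sketch on synthetic data, using scikit-learn's `SelectKBest` with `f_classif`:

```python
# Sketch of the ANOVA F-test filter: keep the k features whose class-wise
# means differ most, as scored by the F statistic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
top = selector.get_support(indices=True)  # indices of the 3 best features
print(top)
```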

#### 3.2.2. Chi-Square Filter
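The chi-square filter tests the independence between each categorical predictor and the class label; it requires non-negative inputs, which ordinal codes such as the satisfaction scales satisfy. A minimal sketch:

```python
# Sketch of the chi-square filter on ordinal codes 0..3; the target here
# depends only on column 0, so that column should rank highest.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(300, 6)).astype(float)  # ordinal predictors
y = (X[:, 0] >= 2).astype(int)                       # label tied to column 0
best = SelectKBest(chi2, k=2).fit(X, y).get_support(indices=True)
print(best)
```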

#### 3.2.3. Construction of the Final Data Set
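As a hypothetical sketch (the merge step itself is not spelled out in this text), the final data set can be assembled as the union of the columns retained by the two filters plus the target column, with variable names taken from the selected-features table:

```python
# Hypothetical sketch: unite the columns kept by the ANOVA F-test filter
# (continuous) with those kept by the chi-square filter (ordinal), then
# append the target column. Names follow the selected-features table.
anova_selected = {"P501_A"}  # continuous predictor retained
chi2_selected = {"P818_1", "P818_6", "P819_1", "P819_5", "P509",
                 "P819_8", "P818_8", "P401_12", "P819_4"}
final_columns = sorted(anova_selected | chi2_selected) + ["JobSat"]
# 10 selected variables plus the target; one-hot encoding of P401_12
# later expands these into the 11 predictor columns of the final set.
print(len(final_columns))
```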

#### 3.3. Predictive Modeling

#### 3.3.1. Training and Test Data Set
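A stratified split keeps the class ratio equal in the training and test partitions. A sketch on data shaped like the final set (13,302 rows, 11 predictors); the 80/20 proportion here is an assumption for illustration:

```python
# Sketch of a stratified train/test split on simulated data with the same
# shape as the final data set (13,302 records, 11 predictors).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(13302, 11))    # 11 predictors, as in the final set
y = rng.integers(0, 2, size=13302)  # binary target: JobSat
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)
```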

#### 3.3.2. Hyper-Parameter Tuning and Model Training
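Grid search evaluates every combination in a hyper-parameter grid by cross-validation and keeps the best. A sketch for the Random Forest model, using a subset of the value ranges reported in the tuning table (the toy data and reduced grid are for illustration only):

```python
# Sketch of grid-search tuning for Random Forest over a subset of the
# reported grid (n_estimators, max_depth, criterion).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=11, random_state=0)
grid = {
    "n_estimators": [10, 48, 80],   # subset of the reported range
    "max_depth": [3, 5, 7],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      cv=3, scoring="balanced_accuracy").fit(X, y)
print(search.best_params_)
```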

#### 3.3.3. Model Validation
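The ANOVA and Duncan tables below analyse 100 per-model Kappa values, consistent with repeated cross-validation; a 10-fold × 10-repeat scheme is assumed here for illustration:

```python
# Sketch of repeated stratified cross-validation: 10 folds x 10 repeats
# yields 100 per-fold scores per model (the exact scheme is an assumption).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=11, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="balanced_accuracy")
print(len(scores), round(scores.mean(), 2))
```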

#### 3.3.4. Evaluation and Comparison of Models Obtained
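The comparison metrics defined in Appendix A.3 are all derivable from the confusion matrix. A minimal sketch on a toy prediction vector, using scikit-learn's metric functions:

```python
# Sketch: computing the Appendix A.3 metrics from a toy confusion matrix.
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall of the positive class
specificity = tn / (tn + fp)
npv = tn / (tn + fn)           # negative predictive value
print(balanced_accuracy_score(y_true, y_pred),
      f1_score(y_true, y_pred),
      cohen_kappa_score(y_true, y_pred))
```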

## 4. Results

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## Appendix A

#### Appendix A.1

#### Appendix A.2

#### Appendix A.3

**Balanced accuracy.** For binary classification, this value equals the arithmetic mean of the sensitivity and the specificity:

$$\text{Balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$

- $TP$ are the true positives; $TN$ are the true negatives.
- $FN$ are the false negatives; $FP$ are the false positives.

**Sensitivity.** The proportion of actual positive instances (Positive Class 1: Dissatisfied) that the model predicts correctly: $TP/(TP + FN)$.

**Specificity.** The proportion of actual negative instances (Negative Class 0: Satisfied) that the model predicts correctly: $TN/(TN + FP)$.

**Positive predictive value (PPV).** Also called precision; the probability that an instance predicted as positive by the model actually belongs to the positive class: $TP/(TP + FP)$.

**Negative predictive value (NPV).** The probability that an instance predicted as negative by the model actually belongs to the negative class: $TN/(TN + FN)$.

**Cohen’s kappa coefficient.** Measures the agreement between classifiers beyond what is expected by chance:

$$\kappa = \frac{Pr\left(a\right) - Pr\left(e\right)}{1 - Pr\left(e\right)}$$

- $Pr\left(a\right)$: the observed proportion of agreement.
- $Pr\left(e\right)$: the probability that the inter-rater agreement is due to chance.

**Area under the ROC curve (AUC).** Indicates how well the model discriminates between instances of the positive and the negative class. This value ranges from 0 to 1; according to Ref. [67], the closer it is to 1, the greater the discriminating capacity of the model.

#### Appendix A.4

| Description | Variable |
|---|---|
| Total income in the previous month | P501_A |
| Time since first job as a teacher | TPT |
| Number of months working at this school as a contract teacher | P311_2m |
| Number of years working consecutively at this school as a contract teacher | P311_2a |
| Travel time in hours to work | THT |
| Number of times attending film screenings in the past 12 months | P815B$06 |
| Number of times visiting monuments in the last 12 months | P815B$02 |
| Average number of students per classroom | P314A2 |
| Number of times visiting archaeological sites in the last 12 months | P815B$03 |
| Number of months working at this school as an appointed teacher | P311_1m |
| Number of times visiting museums in the last 12 months | P815B$01 |
| Number of times visiting book fairs in the last 12 months | P815B$10 |
| Time spent working as a team or conversing with colleagues from the school | P316A_H$4 |
| Number of times visiting craft fairs in the last 12 months | P815B$11 |
| Number of public schools in which the teacher currently teaches | P305_1 |
| Number of times visiting libraries or reading rooms in the last 12 months | P815B$09 |
| Total cost of commuting to work | CTT |
| Number of years working consecutively at this school as an appointed teacher | P311_1a |
| Commute time in minutes to work | TMT |
| Time spent teaching classes in a typical week | P316A_H$1 |

| Description | Variable |
|---|---|
| Satisfaction with their life | P818_1 |
| Satisfaction with their self-esteem | P818_6 |
| Satisfaction with their pedagogical activity | P819_1 |
| Satisfaction with the relationship with the principal | P819_5 |
| Perception of living conditions | P509 |
| Satisfaction with their salary | P819_8 |
| Satisfaction with their family relationships | P818_8 |
| Suffered from depression in the previous year | P401_12 |
| Satisfaction with the relationship with their colleagues | P819_4 |
| Frequency of working with poor lighting in class | P407_3 |
| Satisfaction with the achievements of their students | P819_2 |
| Suffered stress in the previous year | P401_9 |
| Satisfaction with the conditions of their future retirement | P818_5 |
| Suffered anxiety in the previous year | P401_10 |
| Satisfaction with the education they can give their children | P818_4 |
| If they could choose any teaching position in the country, would it be in the same district? | P319 |
| Satisfaction with their health | P818_2 |
| Rating of the teaching methodology used in their teacher training | P210A_2 |
| Rating of the thematic contents of the courses/learning areas received in their teacher training | P210A_1 |
| Satisfaction with their relationship with parents | P819_6 |

## References

1. Hassan, O.; Ibourk, A. Burnout, self-efficacy and job satisfaction among primary school teachers in Morocco. Soc. Sci. Humanit. Open **2021**, 4, 100148.
2. Serrano-García, V.; Ortega-Andeane, P.; Reyes-Lagunes, I.; Riveros-Rosas, A. Traducción y Adaptación al Español del Cuestionario de Satisfacción Laboral para Profesores. Acta Investig. Psicol. **2015**, 5, 2112–2123.
3. Lopes, J.; Oliveira, C. Teacher and school determinants of teacher job satisfaction: A multilevel analysis. Sch. Eff. Sch. Improv. **2020**, 31, 641–659.
4. Sadeghi, K.; Ghaderi, F.; Abdollahpour, Z. Self-reported teaching effectiveness and job satisfaction among teachers: The role of subject matter and other demographic variables. Heliyon **2021**, 7, e07193.
5. Aouadni, I.; Rebai, A. Decision support system based on genetic algorithm and multi-criteria satisfaction analysis (MUSA) method for measuring job satisfaction. Ann. Oper. Res. **2016**, 256, 3–20.
6. Lee, A.N.; Nie, Y. Understanding teacher empowerment: Teachers’ perceptions of principal’s and immediate supervisor’s empowering behaviours, psychological empowerment and work-related outcomes. Teach. Teach. Educ. **2014**, 41, 67–79.
7. Valles-Coral, M.A.; Salazar-Ramírez, L.; Injante, R.; Hernandez-Torres, E.A.; Juárez-Díaz, J.; Navarro-Cabrera, J.R.; Pinedo, L.; Vidaurre-Rojas, P. Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels. Data **2022**, 7, 165.
8. Lopes Martins, D. Data science teaching and learning models: Focus on the Information Science area. Adv. Notes Inf. Sci. **2022**, 2, 140–148.
9. Araoz, E.G.E.; Ramos, N.A.G. Satisfacción laboral y compromiso organizacional en docentes de la amazonía peruana. Educ. Form. **2021**, 6, e3854.
10. Ruiz-Quiles, M.; Moreno-Murcia, J.A.; Vera-Lacárcel, J.A. Del soporte de autonomía y la motivación autodeterminada a la satisfacción docente. Eur. J. Educ. Psychol. **2015**, 8, 68–75.
11. Gabrani, G.; Kwatra, A. Machine learning based predictive model for risk assessment of employee attrition. Lect. Notes Comput. Sci. **2018**, 10963, 189–201.
12. Sisodia, D.S.; Vishwakarma, S.; Pujahari, A. Evaluation of machine learning models for employee churn prediction. In Proceedings of the International Conference on Inventive Computing and Informatics (ICICI 2017), Coimbatore, India, 23–24 November 2017; pp. 1016–1020.
13. Yogesh, I.; Suresh Kumar, K.R.; Candrashekaran, N.; Reddy, D.; Sampath, H. Predicting Job Satisfaction and Employee Turnover Using Machine Learning. J. Comput. Theor. Nanosci. **2020**, 17, 4092–4097.
14. Homocianu, D.; Plopeanu, A.P.; Florea, N.; Andries, A.M. Exploring the patterns of job satisfaction for individuals aged 50 and over from three historical regions of Romania: An inductive approach with respect to triangulation, cross-validation and support for replication of results. Appl. Sci. **2020**, 10, 2573.
15. Saisanthiya, D.; Gayathri, V.M.; Supraja, P. Employee attrition prediction using machine learning and sentiment analysis. Int. J. Adv. Trends Comput. Sci. Eng. **2020**, 9, 7550–7557.
16. Seok, B.W.; Wee, K.H.; Park, J.Y.; Anil Kumar, D.; Reddy, N.S. Modeling the teacher job satisfaction by artificial neural networks. Soft Comput. **2021**, 25, 11803–11815.
17. Rustam, F.; Ashraf, I.; Shafique, R.; Mehmood, A.; Ullah, S.; Sang Choi, G. Review prognosis system to predict employees job satisfaction using deep neural network. Comput. Intell. **2021**, 37, 924–950.
18. Chen, T.; Cao, Z.; Cao, Y. Comparison of job satisfaction prediction models for construction workers: CART vs. neural network. Teh. Vjesn. **2021**, 28, 1174–1181.
19. Hong-Hua, M.; Mi, W.; Hong-Yum, L.; Yong-Mei, H. Influential factors of China’s elementary school teachers’ job satisfaction. Springer Proc. Math. Stat. **2016**, 167, 339–361.
20. Tomás, J.M.; De Los Santos, S.; Fernández, I. Job satisfaction of the Dominican teacher: Labor background. Rev. Colomb. Psicol. **2019**, 28, 63–76.
21. Asadujjaman, M.D.; Rashid, M.H.O.; Nayon, M.A.A.; Biswas, T.K.; Arani, M.; Billal, M.M. Teachers’ job satisfaction at tertiary education: A case of an engineering university in Bangladesh. In Proceedings of the International Conference on e-Learning (ICEL), Sakheer, Bahrain, 6–7 December 2020; pp. 238–242.
22. Al-Mahdy, Y.F.H.; Alazmi, A.A. Principal Support and Teacher Turnover Intention in Kuwait: Implications for Policymakers. Leadersh. Policy Sch. **2023**, 22, 44–59.
23. Luque-Reca, O.; García-Martínez, I.; Pulido-Martos, M.; Lorenzo Burguera, J.; Augusto-Landa, J.M. Teachers’ life satisfaction: A structural equation model analyzing the role of trait emotion regulation, intrinsic job satisfaction and affect. Teach. Teach. Educ. **2022**, 113, 103668.
24. Eirín-Nemiña, R.; Sanmiguel-Rodríguez, A.; Rodríguez-Rodríguez, J. Professional satisfaction of physical education teachers. Sport Educ. Soc. **2022**, 27, 85–98.
25. Karunanayake, V.J.; Wanniarachchi, J.C.; Karunanayake, P.N.; Rajapaksha, U.U. Intelligent System to Verify the Effectiveness of Proposed Teacher Transfers Incorporating Human Factors. In Proceedings of the 2nd International Conference on Advanced Research in Computing (ICARC 2022), Belihuloya, Sri Lanka, 23–24 February 2022; pp. 1–6.
26. Saleh, L.; Abu-Soud, S. Predicting Jordanian Job Satisfaction Using Artificial Neural Network and Decision Tree. In Proceedings of the 11th International Conference on Advanced Computer Information Technologies (ACIT 2021), Deggendorf, Germany, 15–17 September 2021; pp. 735–738.
27. Talingting, R.E. A data mining-driven model for job satisfaction prediction of school administrators in DepEd Surigao del Norte division. Int. J. Adv. Trends Comput. Sci. Eng. **2019**, 8, 556–560.
28. Arambepola, N.; Munasinghe, L. What makes job satisfaction in the information technology industry? In Proceedings of the 2021 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 16 September 2021; pp. 99–105.
29. Lu, M.H.; Luo, J.; Chen, W.; Wang, M.C. The influence of job satisfaction on the relationship between professional identity and burnout: A study of student teachers in Western China. Curr. Psychol. **2022**, 41, 289–297.
30. Perera, H.N.; Maghsoudlou, A.; Miller, C.J.; McIlveen, P.; Barber, D.; Part, R.; Reyes, A.L. Relations of science teaching self-efficacy with instructional practices, student achievement and support, and teacher job satisfaction. Contemp. Educ. Psychol. **2022**, 69, 102041.
31. Zhang, J.; Yin, H.; Wang, T. Exploring the effects of professional learning communities on teacher’s self-efficacy and job satisfaction in Shanghai, China. Educ. Stud. **2023**, 49, 17–34.
32. Xia, J.; Wang, M.; Zhang, S. School culture and teacher job satisfaction in early childhood education in China: The mediating role of teaching autonomy. Asia Pac. Educ. Rev. **2022**, 24, 101–111.
33. Zhang, X.; Cheng, X.; Wang, Y. How Is Science Teacher Job Satisfaction Influenced by Their Professional Collaboration? Evidence from PISA 2015 Data. Int. J. Environ. Res. Public Health **2023**, 20, 1137.
34. Aktan, O.; Toraman, Ç. The relationship between technostress levels and job satisfaction of teachers within the COVID-19 period. Educ. Inf. Technol. **2022**, 27, 10429–10453.
35. Elrayah, M. Improving Teaching Professionals’ Satisfaction through the Development of Self-efficacy, Engagement, and Stress Control: A Cross-sectional Study. Educ. Sci. Theory Pract. **2022**, 22, 1–12.
36. Smet, M. Professional development and teacher job satisfaction: Evidence from a multilevel model. Mathematics **2022**, 10, 51.
37. Hussain, S.; Saba, N.U.; Ali, Z.; Hussain, H.; Hussain, A.; Khan, A. Job Satisfaction as a Predictor of Wellbeing Among Secondary School Teachers. SAGE Open **2022**, 12, 21582440221138726.
38. MINEDU. Ministerio de Educación del Perú. 2021. Available online: https://escale.minedu.gob.pe/uee/-/document_library_display/GMv7/view/5384052 (accessed on 30 September 2022).
39. Brownlee, J.; Sanderson, M.; Koshy, A.; Cheremskoy, A.; Halfyard, J. Machine Learning Mastery With Python: Data Cleaning, Feature Selection, and Data Transforms in Python; Machine Learning Mastery: Vermont, VIC, Australia, 2020.
40. Useche, L.; Mesa, D. Una introducción a la imputación de valores perdidos. Terra **2006**, XXII, 127–152.
41. Navarro-Pastor, J.; Losilla-Vidal, J. Análisis de datos faltantes mediante redes neuronales artificiales. Psicothema **2000**, 12, 503–510.
42. Cuesta, M.; Fonseca-Pedrero, E.; Vallejo, G.; Muñiz, J. Datos perdidos y propiedades psicométricas en los tests de personalidad. An. Psicol. **2013**, 29, 285–292.
43. Rosati, G. Construcción de un modelo de imputación para variables de ingreso con valores perdidos a partir de ensamble learning: Aplicación en la Encuesta Permanente de Hogares (EPH). SaberEs **2017**, 9, 91–111.
44. Alshawabkeh, M.; Jang, B.; Kaeli, D. Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Pittsburgh, PA, USA, 14 March 2010; pp. 104–110.
45. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
46. Dashdondov, K.; Lee, S.M.; Kim, M.H. OrdinalEncoder and PCA based NB Classification for Leaked Natural Gas Prediction Using IoT based Remote Monitoring System. Smart Innov. Syst. Technol. **2021**, 212, 252–259.
47. Quintero, M.A.; Duran, M. Análisis del error tipo I en las pruebas de bondad de ajuste e independencia utilizando el muestreo con parcelas de tamaño variable (Bitterlich). Bosque **2004**, 25, 45–55.
48. Hancock, J.T.; Khoshgoftaar, T.M. Survey on categorical data for neural networks. J. Big Data **2020**, 7, 1–41.
49. Potdar, K. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers. Int. J. Comput. Appl. **2017**, 175, 975–8887.
50. Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018; p. 218.
51. Fallucchi, F.; Coladangelo, M.; Giuliano, R.; De Luca, E.W. Predicting employee attrition using machine learning techniques. Computers **2020**, 9, 86.
52. Moon, N.N.; Mariam, A.; Sharmin, S.; Islam, M.M.; Nur, F.N.; Debnath, N. Machine learning approach to predict the depression in job sectors in Bangladesh. Curr. Res. Behav. Sci. **2021**, 2, 100058.
53. Torres-Vásquez, M.; Hernández-Torruco, J.; Hernández-Ocaña, B.; Chávez-Bosquez, O. Impacto de los algoritmos de sobremuestreo en la clasificación de subtipos principales del síndrome de Guillain-Barré. Ingenius Rev. Cienc. Tecnol. **2021**, 25, 20–31.
54. Bourel, M.; Segura, A.M.; Crisci, C.; López, G.; Sampognaro, L.; Vidal, V.; Kruk, C.; Piccini, C.; Perera, G. Machine learning methods for imbalanced data set for prediction of faecal contamination in beach waters. Water Res. **2021**, 202, 117450.
55. Elgeldawi, E.; Sayed, A.; Galal, A.R.; Zaki, A.M. Hyper-parameter Tuning for Machine Learning Algorithms Used for Arabic Sentiment Analysis. Informatics **2021**, 8, 79.
56. Yang, L.; Shami, A. On hyper-parameter optimization of machine learning algorithms: Theory and practice. Neurocomputing **2020**, 415, 295–316.
57. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyper-parameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. **2019**, 17, 26–40.
58. Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J.; et al. API design for machine learning software: Experiences from the scikit-learn project. In Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, Würzburg, Germany, 16–20 September 2019; pp. 108–122.
59. Wiȩckowska, B.; Kubiak, K.B.; Jóźwiak, P.; Moryson, W.; Stawińska-Witoszyńska, B. Cohen’s Kappa Coefficient as a Measure to Assess Classification Improvement following the Addition of a New Marker to a Regression Model. Int. J. Environ. Res. Public Health **2022**, 19, 10213.
60. Fujimura, S.; Kojima, T.; Okanoue, Y.; Shoji, K.; Inoue, M.; Omori, K.; Hori, R. Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network. J. Voice **2022**, 36, 15–20.
61. Kim, K.B.; Park, H.J.; Song, D.H. Combining Supervised and Unsupervised Fuzzy Learning Algorithms for Robust Diabetes Diagnosis. Appl. Sci. **2022**, 13, 351.
62. Kvak, D.; Chromcová, A.; Biroš, M.; Hrubý, R.; Kvaková, K.; Pajdaković, M.; Ovesná, P. Chest X-ray Abnormality Detection by Using Artificial Intelligence: A Single-Site Retrospective Study of Deep Learning Model Performance. BioMedInformatics **2023**, 3, 82–101.
63. Makansi, F.; Schmitz, K. Data-Driven Condition Monitoring of a Hydraulic Press Using Supervised Learning and Neural Networks. Energies **2022**, 15, 6217.
64. Moreno-Ibarra, M.A.; Villuendas-Rey, Y.; Lytras, M.D.; Yáñez-Márquez, C.; Salgado-Ramírez, J.C. Classification of Diseases Using Machine Learning Algorithms: A Comparative Study. Mathematics **2021**, 9, 1817.
65. Gironés, J.; Casas, J.; Minguillón, J.; Caihuelas, R. Minería de Datos: Modelos y Algoritmos, 1st ed.; Editorial UOC: Barcelona, Spain, 2017; p. 274.
66. Montgomery, D. Diseño y Análisis de Experimentos, 2nd ed.; Limusa Wiley: Mexico City, Mexico, 2004; pp. 69–71.
67. Cerda, J.; Cifuentes, L. Uso de curvas ROC en investigación clínica: Aspectos teórico-prácticos. Rev. Chil. Infectol. **2012**, 29, 138–141.

**Figure 5.** Class distribution in the target variable. (**a**) Before data balancing. (**b**) After data balancing.

**Figure 7.** Confusion matrix of the studied models. (**a**) Logistic regression. (**b**) Decision tree-CART. (**c**) Random forest. (**d**) Gradient boosting. (**e**) XGBoost.

**Figure 9.** Area under the ROC curve. (**a**) Logistic Regression model. (**b**) Decision Tree-CART model. (**c**) Random Forest model. (**d**) Gradient Boosting model. (**e**) XGBoost model.

| Name | Value |
|---|---|
| Rows | 15,087 |
| Columns | 942 |
| Discrete columns | 66 |
| Continuous columns | 873 |
| All-missing columns | 3 |
| Missing observations | 4,717,895 |
| Complete rows | 0 |
| Total observations | 14,211,954 |

| Description | Designation |
|---|---|
| Life satisfaction. | P818_1 |
| Satisfaction with their self-esteem. | P818_6 |
| Satisfaction with their pedagogical activity. | P819_1 |
| Satisfaction with the relationship with the principal. | P819_5 |
| Perception of living conditions. | P509 |
| Satisfaction with their salary. | P819_8 |
| Satisfaction with family relationships. | P818_8 |
| Depression in the previous year. | P401_12 |
| Satisfaction with their relationship with colleagues. | P819_4 |
| Total income in the previous month. | P501_A |

| | P40112No | P40112Si | P501A | P509 | P8181 | P8186 | P8188 | P8191 | P8194 | P8195 | P8198 | JobSat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | −0.06520 | 2 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 0 |
| 1 | 1 | 0 | 0.59641 | 2 | 2 | 3 | 3 | 2 | 2 | 2 | 3 | 0 |
| 2 | 1 | 0 | −0.24108 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13,299 | 1 | 0 | 0.59641 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 |
| 13,300 | 1 | 0 | 1.01516 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 3 | 0 |
| 13,301 | 1 | 0 | 1.01516 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 0 |

| Model | Hyper-Parameter | Value Range | Description | Default Value | Optimal Value |
|---|---|---|---|---|---|
| LR | penalty | ["l1", "l2", "elasticnet"] | Penalty type. | l2 | l2 |
| | C | [0.001, 0.01, 0.1, 1, 10] | Inverse of regularization strength. | 1.0 | 1 |
| | solver | ["lbfgs", "liblinear", "newton-cg", "newton-cholesky", "sag", "saga"] | Algorithm to use in the optimization problem. | lbfgs | liblinear |
| RF | n_estimators | [10, 17, 25, 33, 41, 48, 56, 64, 72, 80] | Number of decision trees in the random forest. | 100 | 64 |
| | max_depth | [3, 5, 7] | Maximum depth of the trees. | None | 7 |
| | criterion | ["gini", "entropy"] | Function that measures the quality of a split. | gini | entropy |
| | min_samples_split | [2, 5] | Minimum number of samples required to split an internal node. | 2 | 5 |
| | min_samples_leaf | [1, 2] | Minimum number of samples required at a leaf node. | 1 | 2 |
| | max_features | ["auto", "sqrt"] | Number of features randomly selected without replacement at each split. | auto | auto |
| XGB | n_estimators | [10, 17, 25, 33, 41, 48, 56, 64, 72, 80] | Number of decision trees in XGB. | 100 | 72 |
| | max_depth | [3, 5, 7] | Maximum depth of the trees. | 6 | 3 |
| | learning_rate | [0.1] | Learning rate. | 0.3 | 0.1 |
| | subsample | [0.6: 0.9] step 0.1 | Fraction of observations randomly sampled for each tree. | 1 | 0.9 |
| | colsample_bytree | [0.6: 0.9] step 0.1 | Fraction of columns randomly sampled for each tree. | 1 | 0.6 |
| GB | n_estimators | [10, 17, 25, 33, 41, 48, 56, 64, 72, 80] | Number of sequential trees to model. | 100 | 48 |
| | max_depth | [3, 5, 7] | Maximum depth of the trees. | 3 | 3 |
| | min_samples_split | [500: 1000] step 5.05 | Minimum number of samples required at a node for splitting. | 2 | 949 |
| | min_samples_leaf | [20, 28, 37, 46, 55, 64, 73, 82, 91, 100] | Minimum number of samples required at a leaf node. | 1 | 55 |
| | max_features | ["auto", "sqrt", "log2"] | Number of features to consider when looking for the best split. | None | log2 |
| | subsample | [0: 1] step 0.1 | Fraction of observations to select for each tree. | 1.0 | 0.3 |
| | learning_rate | [0.01: 0.1] step 0.03 | Learning rate. | 0.1 | 0.09 |
| DT-CART | criterion | ["gini", "entropy", "log_loss"] | Function to measure the quality of a split. | gini | entropy |
| | max_depth | [2: 10] step 1 | Maximum tree depth. | None | 6 |
| | min_samples_split | [1: 10] step 1 | Minimum number of samples required for splitting an internal node. | 2 | 2 |
| | min_samples_leaf | [2: 10] step 1 | Minimum number of samples required at a leaf node. | 1 | 3 |
| | max_features | ["auto", "sqrt", "log2", None] | Number of features to consider when looking for the best split of the tree. | None | None |

| Model | DT-CART | GB | LR | RF | XGB |
|---|---|---|---|---|---|
| Accuracy | $0.75 \pm 0.03$ | $0.76 \pm 0.04$ | $0.74 \pm 0.03$ | $\mathbf{0.77 \pm 0.04}$ | $0.76 \pm 0.04$ |
| Sensitivity | $0.70 \pm 0.05$ | $0.78 \pm 0.05$ | $0.75 \pm 0.05$ | $\mathbf{0.81 \pm 0.05}$ | $0.79 \pm 0.05$ |
| Specificity | $\mathbf{0.80 \pm 0.04}$ | $0.73 \pm 0.05$ | $0.73 \pm 0.05$ | $0.73 \pm 0.05$ | $0.73 \pm 0.05$ |
| PPV | $\mathbf{0.78 \pm 0.04}$ | $0.75 \pm 0.04$ | $0.73 \pm 0.03$ | $0.75 \pm 0.04$ | $0.75 \pm 0.04$ |
| NPV | $0.73 \pm 0.04$ | $0.77 \pm 0.04$ | $0.74 \pm 0.04$ | $\mathbf{0.79 \pm 0.04}$ | $0.78 \pm 0.05$ |
| F1-Score | $0.73 \pm 0.04$ | $0.76 \pm 0.04$ | $0.74 \pm 0.04$ | $\mathbf{0.78 \pm 0.03}$ | $0.77 \pm 0.04$ |
| AUC | $0.83 \pm 0.03$ | $0.84 \pm 0.03$ | $0.82 \pm 0.03$ | $\mathbf{0.85 \pm 0.03}$ | $\mathbf{0.85 \pm 0.03}$ |
| Kappa | $0.50 \pm 0.07$ | $0.51 \pm 0.07$ | $0.47 \pm 0.07$ | $\mathbf{0.53 \pm 0.07}$ | $0.52 \pm 0.08$ |

**Kappa**

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between groups | 0.231 | 4 | 0.058 | 11.489 | 0.000 |
| Within groups | 2.490 | 495 | 0.005 | | |
| Total | 2.721 | 499 | | | |

**Kappa** (Duncan’s test; subsets for alpha = 0.05)

| Model | N | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| LR | 100 | 0.4727280 | | | |
| DT-CART | 100 | | 0.4952041 | | |
| GB | 100 | | 0.5118734 | 0.5118734 | |
| XGBoost | 100 | | | 0.5229337 | 0.5229337 |
| RF | 100 | | | | 0.5338400 |
| Sig. | | 1.000 | 0.0970000 | 0.2710000 | 0.2770000 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Holgado-Apaza, L.A.; Carpio-Vargas, E.E.; Calderon-Vilca, H.D.; Maquera-Ramirez, J.; Ulloa-Gallardo, N.J.; Acosta-Navarrete, M.S.; Barrón-Adame, J.M.; Quispe-Layme, M.; Hidalgo-Pozzi, R.; Valles-Coral, M.
Modeling Job Satisfaction of Peruvian Basic Education Teachers Using Machine Learning Techniques. *Appl. Sci.* **2023**, *13*, 3945.
https://doi.org/10.3390/app13063945
