Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education
Abstract
1. Introduction
2. Data Description
- Sociodemographic information, such as age, gender, and type of zone to which the student’s address belongs.
- Enrollment information, such as program, school, and educational model.
- Academic information related to the student, such as the average of the previous level, the average in the first term or midterm of the first semester, and the number of failed subjects.
- Information associated with scores on admission tests, such as the admission test, standardized English proficiency test, and Mathematics grade.
- Academic history, such as type of school of origin, national/international student, and relationship with the Tecnologico de Monterrey system.
- Student life, such as participation in sports, cultural, and leadership activities.
- Scholarship and financial aid information, such as type of scholarship, percentage of scholarship, and percentage of scholarship loan.
- Academic information related to the student’s parents, such as educational level and whether the parents were students of the Tecnologico de Monterrey.
- Information on the student’s retention or dropout in the first year.
3. Materials and Methods
3.1. Data Planning
3.2. Data Collection
3.3. Data Assurance
1. To protect the privacy of students and faculty, the data must be de-identified before it is made available for institutional use and research purposes [22]. Therefore, the student’s enrollment identifier (student.id) and the name of the previous-level school (id.school.origin) were replaced with masked, non-identifiable values, as they represent sensitive information.
2. All records were translated into English.
3.
4. The categorical values of each variable were checked for spelling and typographical errors.
5. Missing values for the variables socioeconomic.level and social.lag were filled in with “No information”.
6. Empty values of admission.test at the Undergraduate level were replaced by “Does not apply” when the variable tec.no.tec has the value “TEC”, that is, when the student graduated from a Tecnologico de Monterrey high school.
7. The variable dropout.semester was categorized according to the period in which the student dropped out: before or during the semester.
8. The values of the variables scholarship.perc, loan.perc, and total.scholarship.loan were multiplied by 100 to represent a percentage.
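Steps 5, 6, and 8 above are simple rule-based transformations. The following is a minimal sketch in plain Python, not the authors' actual pipeline. It assumes each record is a dictionary keyed by the attribute names used in this paper, that missing values arrive as `None` or blank strings, and that the financial columns are stored as fractions before cleaning. The names `is_missing` and `clean_record` and the sample record are hypothetical.

```python
NO_INFO = "No information"
NOT_APPLICABLE = "Does not apply"


def is_missing(value):
    """Treat None and blank strings as missing (an assumption about the raw export)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean_record(record):
    """Return a cleaned copy of one student record (steps 5, 6, and 8)."""
    cleaned = dict(record)

    # Step 5: fill missing socioeconomic.level and social.lag.
    for column in ("socioeconomic.level", "social.lag"):
        if is_missing(cleaned.get(column)):
            cleaned[column] = NO_INFO

    # Step 6: Undergraduate students who graduated from a TEC high school
    # have no admission test, so an empty value means "Does not apply".
    if (
        cleaned.get("level") == "Undergraduate"
        and cleaned.get("tec.no.tec") == "TEC"
        and is_missing(cleaned.get("admission.test"))
    ):
        cleaned["admission.test"] = NOT_APPLICABLE

    # Step 8: convert fractions (assumed 0 to 1) into percentages (0 to 100).
    for column in ("scholarship.perc", "loan.perc", "total.scholarship.loan"):
        if not is_missing(cleaned.get(column)):
            cleaned[column] = float(cleaned[column]) * 100

    return cleaned


# Hypothetical raw record, for illustration only.
raw = {
    "level": "Undergraduate",
    "tec.no.tec": "TEC",
    "admission.test": "",
    "socioeconomic.level": None,
    "social.lag": "Low",
    "scholarship.perc": 0.5,
    "loan.perc": 0.25,
    "total.scholarship.loan": 0.75,
}
cleaned = clean_record(raw)
```

Returning a copy rather than mutating the input keeps the raw export untouched, which makes it easier to audit each cleaning rule separately.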
3.4. Data Description
3.5. Data Preservation
3.6. Data Discovery
3.7. Data Integration
3.8. Data Analysis
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AD | August–December |
CC0 | Creative Commons Zero |
DOI | Digital Object Identifier |
LiFE | Leadership and Student Education |
MDPI | Multidisciplinary Digital Publishing Institute |
PAA | Academic Aptitude Test (Prueba de Aptitud Académica) |
PAL | Online Aptitude Test (Prueba de Aptitud en Línea) |
SAP | Systemanalyse Programmentwicklung |
WebI | SAP BusinessObjects Web Intelligence |
XAI | Explainable Artificial Intelligence |
Appendix A
Program | Meaning |
---|---|
ADI | Architecture and Design/Exploration |
AMC | Built Environment/Exploration |
ARQ | B.A. in Architecture |
BIO | Bioengineering and Chemical Process/Exploration |
CIS | Law, Economics and International Relations/Exploration |
COM | Communication and Digital Production/Exploration |
CPF | B.A. in Finance & Accounting |
ESC | Creative Studies/Exploration |
IA | B.S. Agronomy Engineering |
IBN | B.S. Biobusiness Engineering |
IBQ | Engineering-Bioengineering and Chemical Process (avenue)/Exploration |
IBT | B.S. in Biotechnology Engineering |
IC | B.S. Civil Engineering |
ICI | Engineering-Applied Sciences (avenue)/Exploration |
ICT | Engineering-Computer Science and Information Technologies (avenue)/Exploration |
IDA | B.S. Automotive Engineering |
IDS | B.S. Sustainable Development Engineering |
IFI | B.S. in Engineering Physics |
IIA | B.S. Food Industry Engineering |
IID | B.S. Innovation and Development Engineering |
IIN | B.S. Industrial Innovation Engineering |
IIS | B.S. Industrial Engineering with minor in Systems Engineering |
IIT | Engineering-Innovation and Transformation (avenue)/Exploration |
IMA | B.S. Mechanical Engineering (administrator) |
IMD | B.S. Biomedical Engineering |
IME | B.S. Mechanical Engineering (electrician) |
IMI | B.S. Digital Music Production Engineering |
IMT | B.S. in Mechatronics Engineering |
ING | Engineering/Exploration |
INQ | B.S. Chemistry and Nanotechnology Engineering |
INT | B.S. Business Informatics |
IQA | B.S. Chemical Engineering (administrator) |
IQP | B.S. Chemical Engineering (sustainable processes) |
ISC | B.S. Computer Science and Technology |
ISD | B.S. Digital Systems and Robotics Engineering |
ITC | B.S. in Computer Science and Technology |
ITE | B.S. Electronic and Computer Engineering |
ITI | B.S. Information and Communication Technologies |
ITS | B.S. Telecommunications and Electronic Systems |
LAD | B.A. Animation and Digital Art |
LAE | B.A. Business Administration |
LAF | B.A. Financial Management |
LBC | B.A. in Biosciences |
LCD | B.A. Communication and Digital Media |
LCMD | B.A. Communication and Digital Media |
LDE | B.A. in Entrepreneurship |
LDF | B.A. Law with Minor in Finance |
LDI | B.A. Industrial design |
LDN | B.A. Business Innovation and Management |
LDP | B.A. Law with Minor in Political Science |
LEC | B.A. Economics |
LED | B.A. in Law |
LEF | B.A. Economics and Finances |
LEM | B.A. in Marketing |
LIN | B.A. in International Business |
LLE | B.A. Spanish Literature |
LLN | B.A. International Logistics |
LMC | B.A. Marketing and Communication |
LMI | B.A. Journalism and Media Studies |
LNB | B.A. in Nutrition and Wellness |
LP | B.A. Psychology |
LPL | B.A. Political Science |
LPM | B.A. Advertising and Marketing Communications |
LPO | B.A. Organizational Psychology |
LPS | B.S. Clinical Psychology and Health |
LRI | B.A. International Relations |
LTS | B.A. Social Transformation |
MC | Physician & Surgeon |
MO | Medical and Surgical Dentist |
NEG | Business/Exploration |
PBB | Bicultural High School |
PBI | International High School |
PTB | Bilingual High School |
PTM | Multicultural High School |
SLD | Health Sciences/Exploration |
TIE | Information Technologies and Electronics/Exploration |
References
1. Latif, A.; Choudhary, A.I.; Hammayun, A.A. Economic Effects of Student Dropouts: A Comparative Study. J. Global Econ. 2015, 3, 137.
2. Raisman, N. The Cost of College Attrition at Four-Year Colleges & Universities—An Analysis of 1669 US Institutions. Policy Perspect. 2013, 269. Available online: https://eric.ed.gov/?q=source%3A%22Educational+Policy+Institute%22&id=ED562625 (accessed on 24 August 2022).
3. Da Silva, J.J.; Roman, N.T. Predicting Dropout in Higher Education: A Systematic Review. In Proceedings of the Anais do XXXII Simpósio Brasileiro de Informática na Educação; SBC: Porto Alegre, Brazil, 2021; pp. 1107–1117.
4. Fahd, K.; Venkatraman, S.; Miah, S.J.; Ahmed, K. Application of machine learning in higher education to assess student academic performance, at-risk, and attrition: A meta-analysis of literature. Educ. Inf. Technol. 2022, 27, 3743–3775.
5. Ranjeeth, S.; Latchoumi, T.P.; Paul, P.V. A Survey on Predictive Models of Learning Analytics. Procedia Comput. Sci. 2020, 167, 37–46.
6. Dutt, A.; Ismail, M.A.; Herawan, T. A Systematic Review on Educational Data Mining. IEEE Access 2017, 5, 15991–16005.
7. Kumar, M.; Singh, A.J.; Handa, D. Literature Survey on Educational Dropout Prediction. Int. J. Educ. Manag. Eng. 2017, 7, 8.
8. Saleem, F.; Ullah, Z.; Fakieh, B.; Kateb, F. Intelligent Decision Support System for Predicting Student’s E-Learning Performance Using Ensemble Machine Learning. Mathematics 2021, 9, 2078.
9. Hilliger, I.; Ortiz-Rojas, M.; Pesántez-Cabrera, P.; Scheihing, E.; Tsai, Y.S.; Muñoz-Merino, P.J.; Broos, T.; Whitelock-Wainwright, A.; Pérez-Sanagustín, M. Identifying needs for learning analytics adoption in Latin American universities: A mixed-methods approach. Internet High. Educ. 2020, 45, 100726.
10. Namoun, A.; Alshanqiti, A. Predicting Student Performance Using Data Mining and Learning Analytics Techniques: A Systematic Literature Review. Appl. Sci. 2021, 11, 237.
11. Cardona, T.A.; Cudney, E.A. Predicting Student Retention Using Support Vector Machines. Procedia Manuf. 2019, 39, 1827–1833.
12. Lázaro Alvarez, N.; Callejas, Z.; Griol, D. Predicting computer engineering students’ dropout in cuban higher education with pre-enrollment and early performance data. J. Technol. Sci. Educ. 2020, 10, 241–258.
13. Nagy, M.; Molontay, R. Predicting Dropout in Higher Education Based on Secondary School Performance. In Proceedings of the 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES), Las Palmas de Gran Canaria, Spain, 21–23 June 2018; pp. 389–394.
14. Varga, E.B.; Sátán, Á. Detecting at-risk students on Computer Science bachelor programs based on pre-enrollment characteristics. Hung. Educ. Res. J. 2021, 11, 297–310.
15. Kiss, B.; Nagy, M.; Molontay, R.; Csabay, B. Predicting Dropout Using High School and First-semester Academic Achievement Measures. In Proceedings of the 2019 17th International Conference on Emerging eLearning Technologies and Applications (ICETA), Starý Smokovec, Slovakia, 21–22 November 2019; pp. 383–389.
16. Alshanqiti, A.; Namoun, A. Predicting Student Performance and Its Influential Factors Using Hybrid Regression and Multi-Label Classification. IEEE Access 2020, 8, 203827–203844.
17. Hoffman, J.L.; Lowitzki, K.E. Predicting College Success with High School Grades and Test Scores: Limitations for Minority Students. Rev. High. Educ. 2005, 28, 455–474.
18. Zwick, R.; Himelfarb, I. The Effect of High School Socioeconomic Status on the Predictive Validity of SAT Scores and High School Grade-Point Average. J. Educ. Meas. 2011, 48, 101–121.
19. Freitas, F.A.d.S.; Vasconcelos, F.F.X.; Peixoto, S.A.; Hassan, M.M.; Dewan, M.A.A.; Albuquerque, V.H.C.D.; Filho, P.P.R. IoT System for School Dropout Prediction Using Machine Learning Techniques Based on Socioeconomic Data. Electronics 2020, 9, 1613.
20. Séllei, B.; Stumphauser, N.; Molontay, R. Traits versus Grades—The Incremental Predictive Power of Positive Psychological Factors over Pre-Enrollment Achievement Measures on Academic Performance. Appl. Sci. 2021, 11, 1744.
21. Terry, M. The Effects that Family Members and Peers Have on Students’ Decisions to Drop out of School. Educ. Res. Q. 2008, 31, 25–38.
22. Slade, S.; Prinsloo, P. Learning Analytics: Ethical Issues and Dilemmas. Am. Behav. Sci. 2013, 57, 1510–1529.
23. Ferreyra, M.M.; Avitabile, C.; Botero Álvarez, J.; Haimovich Paz, F.; Urzúa, S. At a Crossroads: Higher Education in Latin America and the Caribbean; The World Bank Group: Washington, DC, USA, 2017.
24. Ferreira, F.H.G.; Messina, J.; Rigolini, J.; López-Calva, L.F.; Lugo, M.A.; Vakis, R. Economic Mobility and the Rise of the Latin American Middle Class; The World Bank Group: Washington, DC, USA, 2013.
25. Lemaitre, M.J. Quality assurance in Latin America: Current situation and future challenges. Tuning J. High. Educ. 2017, 5, 21–40.
26. González-Velosa, C.; Rucci, G.; Sarzosa, M.; Urzúa, S. Returns to Higher Education in Chile and Colombia; Technical Report, IDB Working Paper Series No. IDB-WP-587; Inter-American Development Bank: Washington, DC, USA, 2015.
27. Cobo, C.; Aguerrebere, C. Building capacity for learning analytics in Latin America. In Learning Analytics for the Global South; Lim, C.P., Tinio, V.L., Eds.; Foundation for Information Technology Education and Development, Inc.: Quezon City, Philippines, 2018; Volume 58, pp. 63–67.
28. Call for Proposals: Bringing New Solutions to the Challenges of Predicting and Countering Student Dropout in Higher Education. 2022. Available online: https://ifelldh.tec.mx/en/student-dropout-higher-education (accessed on 9 June 2022).
29. Tecnologico de Monterrey. Tecnologico de Monterrey. 2022. Available online: https://tec.mx/en (accessed on 11 May 2022).
30. The Tec Is Transforming Its Educational Model to Become More Flexible. 2022. Available online: https://conecta.tec.mx/en/news/national/education/tec-transforming-its-educational-model-become-more-flexible (accessed on 18 May 2022).
31. Tec de Monterrey Has Reinvented Its Student Experience, Presents LiFE. 2022. Available online: https://conecta.tec.mx/en/news/national/institution/tec-de-monterrey-has-reinvented-its-student-experience-presents-life (accessed on 18 May 2022).
32. Gestión de Datos de Investigación. 2022. Available online: https://biblioguias.cepal.org/c.php?g=495473&p=4994826 (accessed on 21 June 2022).
33. Primer on Data Management: What You Always Wanted to Know. 2022. Available online: https://old.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf (accessed on 21 June 2022).
34. Rastrollo-Guerrero, J.L.; Gómez-Pulido, J.A.; Durán-Domínguez, A. Analyzing and Predicting Students’ Performance by Means of Machine Learning: A Review. Appl. Sci. 2020, 10, 1042.
35. Baranyi, M.; Nagy, M.; Molontay, R. Interpretable Deep Learning for University Dropout Prediction. In Proceedings of the 21st Annual Conference on Information Technology Education, Virtual, 7–9 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 13–19.
36. Nagy, M.; Molontay, R.; Szabó, M. A Web Application for Predicting Academic Performance and Identifying the Contributing Factors. In Proceedings of the SEFI 47th Annual Conference, Budapest, Hungary, 16–19 September 2019; pp. 1794–1806.
37. Smith, B.I.; Chimedza, C.; Bührmann, J.H. Individualized help for at-risk students using model-agnostic and counterfactual explanations. Educ. Inf. Technol. 2022, 27, 1539–1558.
No. | Attribute | Data Type | Description | Values |
---|---|---|---|---|
1 | student.id | Integer | Masked enrollment number of the student. Student identifiers (IDs) may be duplicated, because one identifier can appear at more than one educational level (High School and Undergraduate). In addition, some student IDs are repeated three times because those students have additional information related to different generations. | 1-121584 |
2 | generation | String | Unique indicator that denotes the generation to which the student belongs. | AD14, AD15, AD16, AD17, AD18, AD19, AD20 |
3 | educational.model | Binary | Educational model to which the student belongs. | 1: TEC21 Model, 0: Previous educational model |
4 | level | String | Educational level to which the student belongs. | High School, Undergraduate |
5 | gender | String | Student gender. | Male, Female |
6 | age | Integer | Student’s age. | Range from 13 to 55 years |
7 | zone.type | String | Description of the type of zone to which the student’s address belongs. | Rural, Semiurban, Urban, No information |
8 | socioeconomic.level | String | Socioeconomic level of the student. | Level 1, Level 2, Level 3, Level 4, Level 5, Level 6, Level 7, No information |
9 | social.lag | String | Level of social lag (social backwardness) of the urban area of the student’s address, determined by zip code. | Low, Medium, High, No information |
10 | id.school.origin | String | Masked identifier of the school where the student comes from. | Range from “School 0” to “School 10242”. |
11 | school.cost | String | Classification of the tuition cost of the student’s school of origin. | Public, Low cost, Medium cost, Medium-high cost, High cost, Not defined |
12 | tec.no.tec | String | Indicator that denotes if the student comes from a school that belongs to Tecnologico de Monterrey. | TEC, NO TEC |
13 | max.degree.parents | String | Highest educational level obtained by the student’s parents. | No information, No degree, Undergraduate degree, Master degree, PhD |
14 | father.education.complete | String | Description of the last educational level completed by the father. | Attended university, but did not graduate; Graduated from elementary or middle school; Graduated from high school; None educational degree; Received master degree; Received PhD; Received technical or commercial degree; Received undergraduate degree; No information |
15 | father.education.summary | String | Classification of the last educational level completed by the father. | No information, No degree, Undergraduate degree, Master degree, PhD |
16 | mother.education.complete | String | Description of the last educational level completed by the mother. | Attended university, but did not graduate; Graduated from elementary or middle school; Graduated from high school; None educational degree; Received master degree; Received PhD; Received technical or commercial degree; Received undergraduate degree; No information |
17 | mother.education.summary | String | Classification of the last educational level completed by the mother. | No information, No degree, Undergraduate degree, Master degree, PhD |
18 | parents.exatec | String | Indicator that denotes if either of the parents is an exatec (was a student at Tecnologico de Monterrey). | Yes, No, No information |
19 | father.exatec | String | Indicator that denotes if the student’s father is an exatec (was a student at Tecnologico de Monterrey). | Yes, No, No information |
20 | mother.exatec | String | Indicator that denotes if the student’s mother is an exatec (was a student at Tecnologico de Monterrey). | Yes, No, No information |
21 | first.generation | String | It indicates if the student is the first person in the family to study for a professional career. | Yes, No, No information, Does not apply |
22 | school | String | Acronyms of the school to which the student’s academic program belongs. | High school, EN = Business School, EMCS = School of Medicine and Health Sciences, EIC = School of Engineering and Sciences, ECSG = School of Social Sciences and Government, EHE = School of Humanities and Education, EAAD = School of Architecture, Art and Design |
23 | program | String | Acronyms of the academic program to which the student belongs. | The meaning of the acronyms is found in Appendix A |
24 | region | String | Code of the region to which the campus where the student is enrolled belongs. | RM = Monterrey Region, RO = West Region, RCM = Mexico City Region, RCS = South/Central Region, DR = Regional Development Region |
25 | foreign | String | Indicates whether the student is a foreigner (Yes: Foreigner), a Mexican student whose birthplace is different from the location of the school campus (Yes: National), or a student from the same location as the campus (Local). | Local, Yes: National, Yes: Foreigner |
26 | PNA | Float | Previous level score (average) | Range from 0 to 100 |
27 | english.evaluation | Integer | Level of English obtained from a standardized test of English language proficiency. | Level 0: No information, Level 1: Beginner, Level 2: Basic, Level 3: Basic, Level 4: Intermediate, Level 5: Intermediate, Level 6: Upper Intermediate, Level 7: Advanced |
28 | admission.test | Integer and String | Admission test score. There are two scoring scales depending on how the test is applied: (1) Academic Aptitude Test (Prueba de Aptitud Académica, PAA): admission test applied face-to-face for all generations of students before the closure due to the COVID-19 pandemic. The range of scores is from 400 to 1600. (2) Online Aptitude Test (Prueba de Aptitud en Línea, PAL): admission test that, as a consequence of the closure due to COVID-19, is applied online. The range of scores is from 0 to 100. | Ranges from 1 to 100 and from 400 to 1600, Does not apply |
29 | online.test | Binary | It indicates if the student took the online admission test. | 1: Yes, 0: No |
30 | general.math.eval | Float and String | Mathematics score from the admission test or from the school of origin. | Range from 0 to 100, Does not apply, No information |
31 | admission.rubric | Integer | Score generated from the student’s profile where 50 is outstanding and 0 is average. | Range from 0 to 50 |
32 | scholarship.type | String | Type of scholarship. | Academic talent, Army/Navy scholarship, Child of Professor/Employee/Director, Contingency scholarship, Cultural talent, Entrepreneurial talent, Leaders of Tomorrow Scholarship, Leadership talent, No scholarship, Sports Talent, Traditional |
33 | scholarship.perc | Integer | Scholarship percentage. | Range from 0 to 100 |
34 | loan.perc | Integer | Percentage of the educational loan. | Range from 0 to 50 |
35 | total.scholarship.loan | Integer | Total percentage of financial support provided to the student for education (scholarship + educational loan). | Range from 0 to 100 |
36 | FTE | Float | It indicates if the student is a full-time student at Tecnologico de Monterrey according to the number of subjects enrolled. | Range from 0.04 to 1.44 |
37 | average.first.period | Float | Average obtained in the first term (five weeks–Undergraduate) or the first midterm (six weeks–High School) of the student’s first semester. This data corresponds only to the AD19 and AD20 generations (TEC21 Model). | Range from 0 to 100 |
38 | failed.subject.first.period | Integer | Number of subjects failed in the first term (five weeks–Undergraduate) or the first midterm (six weeks–High School) of the student’s first semester. This data corresponds only to the AD19 and AD20 generations (TEC21 Model). | Range from 0 to 8 |
39 | dropped.subject.first.period | Integer | Number of subjects dropped in the first term (five weeks–Undergraduate) or the first midterm (six weeks–High School) of the student’s first semester. This data corresponds only to the AD19 and AD20 generations (TEC21 Model). | Range from 0 to 9 |
40 | retention | Binary | Value that indicates if the student continues studying at Tecnologico de Monterrey. | 1: Retention, 0: Dropout |
41 | dropout.semester | Integer | Value indicating the semester when the student dropped out. Where 0 = the student continues studying, 1 = the student dropped out during the first semester, 2 = the student did not enroll in the second semester, 3 = the student dropped out during the second semester, and 4 = the student did not enroll in the third semester. | 0, 1, 2, 3, 4 |
42 | physical.education | Binary and String | Value that indicates if the student was enrolled in any physical education activities during the first semester. This data corresponds only to the AD14, AD15, AD16, and AD17 generations. | 0: No, 1: Yes, Does not apply, No information |
43 | cultural.diffusion | Binary and String | Value that indicates if the student was enrolled in any cultural diffusion activities during the first semester. This data corresponds only to the AD14, AD15, AD16, and AD17 generations. | 0: No, 1: Yes, Does not apply, No information |
44 | student.society | Binary and String | Value that indicates if the student was enrolled in any student society activities during the first semester. This data corresponds only to the AD14, AD15, AD16, and AD17 generations. | 0: No, 1: Yes, Does not apply, No information |
45 | total.life.activities | Integer and String | Number of LiFE (Leadership and Student Education) activities in which the student was enrolled during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0, 1, 2, 3, 4, 5, Does not apply, No information |
46 | athletic.sports | Binary and String | Value that indicates if the student was enrolled in any athletic or sports activities during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0: No, 1: Yes, Does not apply, No information |
47 | art.culture | Binary and String | Value that indicates if the student was enrolled in any artistic or cultural activities during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0: No, 1: Yes, Does not apply, No information |
48 | student.society.leadership | Binary and String | Value that indicates if the student was enrolled in any student society activities and a leadership program during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0: No, 1: Yes, Does not apply, No information |
49 | life.work.mentoring | Binary and String | Value that indicates if the student received advice on life and work plans during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0: No, 1: Yes, Does not apply, No information |
50 | wellness.activities | Binary and String | Value that indicates if the student was enrolled in any integral wellness activities during the first semester. This data corresponds only to the AD18, AD19, and AD20 generations. | 0: No, 1: Yes, Does not apply, No information |
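Several attributes in the table above mix numeric values with the sentinel strings “Does not apply” and “No information”, and admission.test combines two scoring scales. The following is a minimal sketch of one way to separate these when loading the data. It assumes values arrive as strings, as they would from a CSV export, and it uses online.test to pick the scale, which is an assumption based on the attribute descriptions rather than a documented rule. The helper names `parse_mixed` and `admission_scale` are hypothetical.

```python
SENTINELS = {"Does not apply", "No information"}


def parse_mixed(value):
    """Split a mixed-type cell into (number, status).

    Returns (None, sentinel) for the documented sentinel strings,
    (None, "Missing") for blanks, and (float, "Observed") otherwise.
    """
    text = "" if value is None else str(value).strip()
    if text in SENTINELS:
        return None, text
    if text == "":
        return None, "Missing"
    return float(text), "Observed"


def admission_scale(record):
    """Return (scale, score) for admission.test.

    "PAL" is the online test (0 to 100) and "PAA" the face-to-face
    test (400 to 1600). Returns (None, None) when there is no score.
    """
    score, _status = parse_mixed(record.get("admission.test"))
    if score is None:
        return None, None
    took_online = str(record.get("online.test")).strip() == "1"
    return ("PAL" if took_online else "PAA"), score


# Hypothetical records, for illustration only.
face_to_face = admission_scale({"admission.test": "1259", "online.test": "0"})
online = admission_scale({"admission.test": "87", "online.test": "1"})
exempt = admission_scale({"admission.test": "Does not apply", "online.test": "0"})
```

Keeping the status next to the number preserves the distinction between a value that does not apply and one that is simply unknown, which a plain numeric conversion would collapse into a single missing marker.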
No. | Attribute | Unique | Mean | Min | Max | Information Gain |
---|---|---|---|---|---|---|
6 | age | 32 | 17 | 13 | 55 | 0.0086 |
26 | PNA | 2881 | 88.15 | 0 | 100 | 0.0068 |
28 | admission.test | 907 | 1259 | 1 | 1600 | 0.0026 |
30 | general.math.eval | 423 | 68.50 | 0 | 100 | 0.0062 |
31 | admission.rubric | 51 | 33 | 0 | 50 | 0.0025 |
33 | scholarship.perc | 26 | 17 | 0 | 100 | 0.0066 |
34 | loan.perc | 14 | 4 | 0 | 50 | 0.0010 |
35 | total.scholarship.loan | 3066 | 21 | 0 | 100 | 0.0064 |
36 | FTE | 64 | 1.02 | 0.04 | 1.44 | 0.0154 |
37 | average.first.period | 545 | 87.26 | 0 | 100 | 0.0321 |
38 | failed.subject.first.period | 9 | 0 | 0 | 8 | 0.0039 |
39 | dropped.subject.first.period | 10 | 0 | 0 | 9 | 0.0006 |
45 | total.life.activities | 8 | 1.74 | 0 | 8 | 0.0061 |
No. | Attribute | Unique | Mode | Frequency | Information Gain |
---|---|---|---|---|---|
2 | generation | 7 | AD20 | 21,962 | 0.0047 |
3 | educational.model | 2 | 0 | 99,534 | 0.0029 |
4 | level | 2 | Undergraduate | 77,517 | 0.0089 |
5 | gender | 2 | Male | 75,285 | 0.0081 |
7 | zone.type | 4 | No information | 101,920 | 0.0058 |
8 | socioeconomic.level | 8 | No information | 124,041 | 0.0174 |
9 | social.lag | 4 | No information | 119,327 | 0.0208 |
10 | id.school.origin | 10,243 | School 5328 | 3106 | 0.0080 |
11 | school.cost | 6 | High cost | 67,135 | 0.0057 |
12 | tec.no.tec | 2 | NO TEC | 102,481 | 0.0026 |
13 | max.degree.parents | 5 | Undergraduate degree | 52,494 | 0.0128 |
14 | father.education.complete | 9 | Received undergraduate degree | 49,888 | 0.0110 |
15 | father.education.summary | 5 | Undergraduate degree | 49,888 | 0.0124 |
16 | mother.education.complete | 9 | Received undergraduate degree | 53,453 | 0.0119 |
17 | mother.education.summary | 5 | Undergraduate degree | 53,453 | 0.0130 |
18 | parents.exatec | 3 | No | 94,020 | 0.0056 |
19 | father.exatec | 3 | No | 97,845 | 0.0047 |
20 | mother.exatec | 3 | No | 104,787 | 0.0039 |
21 | first.generation | 4 | Does not apply | 65,809 | 0.0064 |
22 | school | 7 | High School | 65,809 | 0.0100 |
23 | program | 76 | PBB | 38,506 | 0.0074 |
24 | region | 5 | RCM | 36,678 | 0.0078 |
25 | foreign | 3 | Local | 116,933 | 0.0020 |
27 | english.evaluation | 8 | 6 | 49,296 | 0.0070 |
29 | online.test | 2 | 0 | 142,204 | 0.0004 |
32 | scholarship.type | 11 | No scholarship | 71,866 | 0.0165 |
40 | retention | 2 | 1 | 131,687 | Target |
41 | dropout.semester | 5 | 0 | 131,687 | 0.2819 |
42 | physical.education | 4 | 1 | 58,701 | 0.0243 |
43 | cultural.diffusion | 4 | 1 | 40,768 | 0.0233 |
44 | student.society | 4 | 0 | 52,710 | 0.0235 |
46 | athletic.sports | 4 | 1 | 36,908 | 0.0176 |
47 | art.culture | 4 | 0 | 43,566 | 0.0174 |
48 | student.society.leadership | 4 | 0 | 42,987 | 0.0175 |
49 | life.work.mentoring | 4 | 0 | 51,553 | 0.0176 |
50 | wellness.activities | 4 | 0 | 44,364 | 0.0175 |
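The Information Gain column in the two tables above scores each attribute against the retention target. The exact procedure behind those numbers is not given here. As a minimal sketch, assuming the standard entropy-based definition IG(Y; X) = H(Y) − H(Y | X) for a categorical attribute (numeric attributes would first need to be discretized), it could be computed as follows. This sketch uses logarithms in base 2; the base used for the tables is not stated, and it only changes the scale of the scores. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a categorical feature X and target Y."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)

    total = len(labels)
    # H(Y | X): entropy within each feature value, weighted by its frequency.
    conditional = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Toy data: 1 = retention, 0 = dropout.
retention = [1, 1, 0, 0]
perfect_feature = ["a", "a", "b", "b"]  # separates the classes completely
useless_feature = ["a", "b", "a", "b"]  # carries no information about the target

ig_perfect = information_gain(perfect_feature, retention)
ig_useless = information_gain(useless_feature, retention)
```

By its description, dropout.semester is 0 exactly when the student continues studying, so its comparatively high score reflects the target itself; it would normally be excluded from the set of predictors.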
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alvarado-Uribe, J.; Mejía-Almada, P.; Masetto Herrera, A.L.; Molontay, R.; Hilliger, I.; Hegde, V.; Montemayor Gallegos, J.E.; Ramírez Díaz, R.A.; Ceballos, H.G. Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education. Data 2022, 7, 119. https://doi.org/10.3390/data7090119
Chicago/Turabian Style: Alvarado-Uribe, Joanna, Paola Mejía-Almada, Ana Luisa Masetto Herrera, Roland Molontay, Isabel Hilliger, Vinayak Hegde, José Enrique Montemayor Gallegos, Renato Armando Ramírez Díaz, and Hector G. Ceballos. 2022. "Student Dataset from Tecnologico de Monterrey in Mexico to Predict Dropout in Higher Education" Data 7, no. 9: 119. https://doi.org/10.3390/data7090119