An Extended Dataset of Educational Quality Across Countries (1970–2023)
Abstract
1. Summary
- A balanced panel of harmonized test scores for 15-year-olds, aligned with TIMSS.
- Annual educational quality indicators for the 15–19 age cohort, spanning 1970–2023.
- Educational quality indexes for the working-age population (ages 15–64) for 2015 and 2023, incorporating population weights and estimated returns to test scores.
2. Data Description
- Test Score (1970–2023): Includes harmonized and imputed test scores for 15-year-olds at approximately four-year intervals, estimated with two methods.
- Annual Educational Quality for Ages 15–19 (1970–2023): Provides yearly measures for the 15–19 cohort constructed from harmonized test score estimates.
- Working-Age Educational Quality Index (2015, 2023): Aggregated indicators for the 15–64 population, incorporating population weights and estimated wage return variables.
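As a rough illustration of the population-weighting step only, the sketch below averages cohort-level scores using population shares as weights. It is not the authors' formula: the adjustment for estimated returns to test scores is omitted because its functional form is not given here, and the function name, cohort labels, and numbers are all hypothetical.

```python
def population_weighted_quality(cohort_scores, cohort_populations):
    """Average cohort scores, weighting each cohort by its population share.

    Assumption: a plain weighted mean. The returns-to-test-score adjustment
    mentioned in the text is NOT modeled here.
    """
    total_population = sum(cohort_populations[c] for c in cohort_scores)
    return sum(
        cohort_scores[c] * cohort_populations[c] / total_population
        for c in cohort_scores
    )


# Hypothetical cohort scores and populations (in millions) for one country.
scores = {"15-19": 500.0, "20-24": 490.0, "25-64": 470.0}
populations = {"15-19": 10.0, "20-24": 10.0, "25-64": 80.0}

index = population_weighted_quality(scores, populations)
print(index)
```

Because the oldest group dominates the working-age population in this made-up example, the index sits much closer to its score than to that of the youngest cohort.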
3. Methods
3.1. Data on International Test Scores
- Harmonization error: Different test designs, participant ages, and content coverage across assessments may introduce measurement error. The high cross-assessment correlations and validated equi-percentile linking method minimize this concern.
- Imputation uncertainty: Missing values represent 48% of potential observations. We mitigate this with two complementary methods, linear interpolation and LASSO regression, which yield highly consistent estimates. This near-perfect agreement between methodologically distinct approaches provides strong validation of the imputed values. Furthermore, the machine learning approach predicts well out of sample, with an RMSE of 15.7 and an R² of 0.905. Together, the high correlation between methods and the strong predictive accuracy suggest that imputation has minimal impact on data quality.
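The interpolation step and the two accuracy metrics quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the sample series are hypothetical, gaps at the start or end of a series are left unfilled, and the LASSO model itself is not reproduced.

```python
import math


def interpolate_linear(years, scores):
    """Fill None entries lying between two observed scores by linear interpolation.

    Leading and trailing gaps stay None (no extrapolation).
    """
    filled = list(scores)
    observed = [i for i, s in enumerate(scores) if s is not None]
    for left, right in zip(observed, observed[1:]):
        span = years[right] - years[left]
        for i in range(left + 1, right):
            weight = (years[i] - years[left]) / span
            filled[i] = scores[left] + weight * (scores[right] - scores[left])
    return filled


def rmse(actual, predicted):
    """Root-mean-squared error between held-out and predicted scores."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical series at four-year intervals; None marks a missing score.
years = [1995, 1999, 2003, 2007, 2011]
scores = [500.0, None, 520.0, None, 510.0]
filled = interpolate_linear(years, scores)
print(filled)  # [500.0, 510.0, 520.0, 515.0, 510.0]
```

To evaluate either imputation method out of sample, one would mask a subset of observed scores, predict them, and pass the held-out and predicted values to `rmse` and `r_squared`.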
3.2. Constructing a Measure of Educational Quality
4. Data Analysis
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
PISA | Programme for International Student Assessment |
TIMSS | Trends in International Mathematics and Science Study |
IEA | International Association for the Evaluation of Educational Achievement |
CSV | Comma-Separated Values |
IAEP | International Assessment of Educational Progress |
LASSO | Least Absolute Shrinkage and Selection Operator |
OECD | Organisation for Economic Co-operation and Development |
NCES | National Center for Education Statistics |
RMSE | Root-Mean-Squared Error |
ML | Machine Learning |
NAEP | National Assessment of Educational Progress |
UN | United Nations |
Variable | Description |
---|---|
CountryName | Country name |
CountryCode | ISO 3-letter country code |
Year | Year of observation |
Observed_data_flag | Indicator for availability of original, observed test score (1: Yes; 0: No) |
Tscore_INT | Harmonized test scores: based on original data and interpolated estimates |
Tscore_ML | Harmonized test scores: based on original data and machine learning estimates |
Tscore1519_INT | Educational quality indicator for cohort 15–19 based on Tscore_INT |
Tscore1519_ML | Educational quality indicator for cohort 15–19 based on Tscore_ML |
Q_INT | Educational quality index for working-age population (interpolation-based) |
Q_ML | Educational quality index for working-age population (machine-learning-based) |
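The variables above lend themselves to a quick consistency check between the two imputation methods. The sketch below uses `Observed_data_flag` to isolate imputed rows and reports their share along with the mean absolute gap between `Tscore_INT` and `Tscore_ML`. The in-memory CSV, its country codes, and its values are hypothetical stand-ins for the published file, whose exact name and layout are not assumed here.

```python
import csv
import io

# Hypothetical rows using the column names from the variable table.
SAMPLE_CSV = """CountryCode,Year,Observed_data_flag,Tscore_INT,Tscore_ML
AAA,1995,1,500,500
AAA,1999,0,510,512
AAA,2003,1,520,520
BBB,1995,0,450,446
BBB,1999,1,460,460
"""


def summarize_imputation(csv_text):
    """Return (share of imputed rows, mean |Tscore_INT - Tscore_ML| on those rows)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    imputed = [r for r in rows if r["Observed_data_flag"] == "0"]
    share_imputed = len(imputed) / len(rows)
    gaps = [abs(float(r["Tscore_INT"]) - float(r["Tscore_ML"])) for r in imputed]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return share_imputed, mean_gap


share_imputed, mean_gap = summarize_imputation(SAMPLE_CSV)
print(share_imputed, mean_gap)
```

Only imputed rows enter the gap calculation, since both series are described as carrying the original score wherever one was observed.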
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, H.; Lee, J.-W. An Extended Dataset of Educational Quality Across Countries (1970–2023). Data 2025, 10, 130. https://doi.org/10.3390/data10080130