Data Descriptor

An Extended Dataset of Educational Quality Across Countries (1970–2023)

1 Research Institute of Economics and Management, Southwestern University of Finance and Economics, 555, Liutai Avenue, Wenjiang District, Chengdu 611130, China
2 Economics Department, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea
* Author to whom correspondence should be addressed.
Data 2025, 10(8), 130; https://doi.org/10.3390/data10080130
Submission received: 21 July 2025 / Revised: 9 August 2025 / Accepted: 13 August 2025 / Published: 15 August 2025

Abstract

This study presents an extended dataset on educational quality covering 101 countries from 1970 to 2023. While existing international assessments, such as the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS), offer valuable snapshots of student performance, their limited coverage across countries and years constrains broader analyses. To address this limitation, we harmonized observed test scores across assessments and imputed missing values using both linear interpolation and machine learning (Least Absolute Shrinkage and Selection Operator (LASSO) regression). The dataset includes (i) harmonized test scores for 15-year-olds, (ii) annual educational quality indicators for the 15–19 age group, and (iii) educational quality indexes for the working-age population (ages 15–64). These measures are provided in machine-readable formats and support empirical research on human capital, economic development, and global education inequalities across economies.
Dataset: The datasets supporting the findings of this study are publicly available at https://doi.org/10.5281/zenodo.16778072.
Dataset License: Creative Commons Attribution 4.0 International (CC BY 4.0)

1. Summary

This study introduces an extended cross-country dataset on educational quality, spanning 101 countries from 1970 to 2023. The dataset harmonizes mathematics and science test scores for secondary students from the major international assessments: the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS), and earlier studies by the International Association for the Evaluation of Educational Achievement (IEA). Missing values are imputed using two methods, (i) linear interpolation and (ii) machine learning prediction based on the Least Absolute Shrinkage and Selection Operator (LASSO), incorporating a diverse set of economic and educational indicators.
Key features of the dataset include the following:
- A balanced panel of harmonized test scores for 15-year-olds, aligned with the TIMSS scale.
- Annual educational quality indicators for the 15–19 age cohort, spanning 1970–2023.
- Educational quality indexes for the working-age population (ages 15–64) for 2015 and 2023, incorporating population weights and estimated returns to test scores.
This dataset supports cross-country research on education, human capital, and development by offering enhanced temporal coverage and broader country representation. It complements existing data sources [1,2,3] and is publicly available for further use.

2. Data Description

The dataset is available in two formats, CSV and Stata (.dta, version 18), and includes panel data spanning 101 countries from 1970 to 2023. It comprises three main panel datasets.
  • Test Score (1970–2023): Includes harmonized and imputed test scores at approximately four-year intervals for 15-year-olds using two estimation methods.
  • Annual Educational Quality for Ages 15–19 (1970–2023): Provides yearly measures for the 15–19 cohort constructed from harmonized test score estimates.
  • Working-Age Educational Quality Index (2015, 2023): Aggregated indicators for the 15–64 population, incorporating population weights and estimated wage return variables.
The dataset also includes identifiers (Country Name, ISO3 code, Year) and an indicator for the availability of original observed test scores. Table 1 summarizes the key variables in the dataset.
The variables Tscore_INT and Tscore_ML represent harmonized test scores derived through two distinct imputation approaches. All test scores are harmonized to the TIMSS 1995 scale (mean = 500, SD = 100), following the TIMSS metric convention. The harmonization process transforms raw scores from different assessments into a comparable scale, with score transformations building directly on the TIMSS framework, while incorporating PISA data through equi-percentile linking methods.
A harmonized dataset of test scores for 15-year-olds from 1970 to 2023 was constructed using 631 original observed data points from international assessments: 323 observations from TIMSS (1995–2023), 245 from PISA (2000–2022), and 63 from earlier IEA and IAEP studies (1970–1991).
To construct a balanced panel, 581 missing values out of 1212 potential country–year observations (48%) were imputed using two complementary methods. First, linear interpolation and extrapolation (Tscore_INT) fills gaps by linearly interpolating between observed assessments within each country, or extrapolating trends when observations are available only on one side. Second, LASSO regression (Tscore_ML) builds a predictive model using 501 economic and educational indicators from the World Bank to estimate test scores, enabling predictions even for countries with sparse or no historical assessment data.
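The interpolation-based method (Tscore_INT) can be sketched as follows. This is our illustration, not the authors' code: the helper interpolates linearly between observed assessment years and extends the boundary trend outward where observations exist on only one side, and the score values below are invented.

```python
import numpy as np

def fill_series(years, scores):
    """Balance one country's score series: linear interpolation between
    observed assessments, linear trend extrapolation at the edges."""
    years = np.asarray(years, dtype=float)
    scores = np.asarray(scores, dtype=float)
    obs = ~np.isnan(scores)
    yo, so = years[obs], scores[obs]
    out = np.interp(years, yo, so)          # fills interior gaps; edges stay flat
    if len(yo) >= 2:                        # replace flat edges with trend lines
        left, right = years < yo[0], years > yo[-1]
        out[left] = so[0] + (so[1] - so[0]) / (yo[1] - yo[0]) * (years[left] - yo[0])
        out[right] = so[-1] + (so[-1] - so[-2]) / (yo[-1] - yo[-2]) * (years[right] - yo[-1])
    return out

# The 12 assessment years; the scores are illustrative, not from the dataset.
years = [1970, 1980, 1984, 1990, 1995, 1999, 2003, 2007, 2011, 2015, 2019, 2023]
scores = [np.nan, np.nan, 480, np.nan, 500, np.nan, 510, np.nan, np.nan, 520, np.nan, np.nan]
filled = fill_series(years, scores)
```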
The variables Tscore1519_INT and Tscore1519_ML represent annual educational quality indicators for the 15–19 age cohort. The educational quality indexes (Q_INT and Q_ML) for the working-age population (ages 15–64) are derived from cohort-level scores.

3. Methods

3.1. Data on International Test Scores

Our analysis builds on student achievement data collected through various international testing initiatives over the past five decades. While multiple assessment programs contribute to the database, TIMSS and PISA constitute the core sources, providing systematic mathematics and science performance measures. The resulting dataset covers 101 countries with observations spanning from 1970 through 2023.
Launched in 1995, the TIMSS assesses mathematics and science achievement at Grades 4 and 8 every four years. Grade 8 scores are used as a proxy for secondary school quality, as this age group closely aligns with PISA’s 15-year-old target population, facilitating harmonization across assessments. The program has been conducted in 1995, 1999, 2003, 2007, 2011, 2015, 2019, and most recently in 2023, with participation varying from 36 to 72 countries across different cycles. The 2023 cycle included 72 participating countries and regional benchmarks.
First administered in 2000, PISA evaluates reading, mathematics, and science literacy among 15-year-olds. The dataset incorporates eight PISA waves through 2022, conducted in 2000, 2003, 2006, 2009, 2012, 2015, 2018, and 2022. Participation has expanded substantially, from 42 countries in the initial assessment to 81 countries and territories in 2022.
To extend coverage to earlier decades, this study incorporates results from the IEA’s pioneering international assessments, which established methodologies that later informed TIMSS and PISA frameworks. These include the First International Science Study (1970–1972) covering 16 countries, the Second International Mathematics Study (1980–1982) with 17 participating countries, and the Second International Science Study (1983–1984) also involving 17 countries. Additionally, we include data from the International Assessment of Educational Progress (IAEP), administered by NCES in 1988 and 1990–1991, which assessed mathematics and science achievement in 6 and 19 countries, respectively. Without these earlier data points, we would miss critical variation in educational quality during periods of significant educational reform in many countries, particularly the rapid educational expansion experienced by East Asian economies during the 1970s and 1980s.
Countries were excluded from the final sample based on two criteria: (i) the absence of nationally representative samples (e.g., China and India) and (ii) missing key national indicators in the World Bank's education and economic datasets, which are essential for panel construction and data imputation.
The analysis covers 12 key assessment years between 1970 and 2023 (1970, 1980, 1984, 1990, 1995, 1999, 2003, 2007, 2011, 2015, 2019, and 2023), yielding an unbalanced panel of 101 countries, with 805 mathematics and 828 science observations. TIMSS served as the reference metric, with all scores anchored to its 1995 scale (mean = 500, SD = 100). PISA scores were mapped onto this scale using equi-percentile linking [4], aligning cumulative distributions across assessments. This approach is supported by strong cross-country correlations between TIMSS and PISA: 0.88 for mathematics and 0.91 for science. Notably, mathematics and science scores demonstrate remarkable similarity—averaging 475 versus 481 points on the TIMSS and 462 versus 467 on the PISA—justifying our use of their simple average as a comprehensive measure of educational quality.
To extend comparability to pre-1995 assessments (IEA and IAEP), scores from the US-based National Assessment of Educational Progress (NAEP) were employed as the temporal benchmark, leveraging the US's consistent participation and applying variance equalization across Organisation for Economic Co-operation and Development (OECD) countries.
After the harmonization procedures were completed, the average achievement score for each country–year was computed as the simple mean of the mathematics and science scores. In cases where only one subject score was available, that score was used as the representative achievement value for that year.
Of the 1212 potential country–year observations (101 countries × 12 years), 631 were observed, leaving 581 missing values (48%). These missing values were imputed using a combination of linear interpolation and LASSO regression based on economic and educational predictors. The two methods yielded highly consistent estimates (correlation = 0.967), enabling the construction of a balanced panel.
The LASSO model draws on 501 fully observed predictors from the World Bank’s Development Indicators and Education Statistics [5,6], selected from an initial pool of 3442 variables. To improve predictive accuracy, existing test score data were incorporated—specifically, country-level mean scores and the nearest available assessment for each year t, prioritizing earlier observations in cases of equidistant data points. This approach adheres to standard machine learning practices [7,8]. In line with standard protocols, the data were split into training (80%) and validation (20%) sets, and a grid search with tenfold cross-validation was applied to minimize the Root-Mean-Squared Error (RMSE). The final model, trained on the full dataset, achieved an RMSE of 17.5 and an R2 of 0.912.
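The pipeline described above (80/20 split, tenfold cross-validated grid search over the L1 penalty, RMSE criterion) can be sketched with scikit-learn. The synthetic data below merely stand in for the 501 World Bank predictors; nothing here reproduces the actual model or its reported accuracy.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Synthetic stand-in: 400 country-year rows, 50 predictors, a sparse
# true coefficient vector, and noise (all values illustrative).
X = rng.normal(size=(400, 50))
beta = np.zeros(50)
beta[:5] = [8, -6, 5, 4, -3]
y = 500 + X @ beta + rng.normal(scale=10, size=400)

# 80% training / 20% validation split, as in the text.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search over the penalty with tenfold cross-validation, minimizing
# RMSE (scored as negative MSE in scikit-learn).
grid = GridSearchCV(Lasso(max_iter=10_000),
                    {"alpha": np.logspace(-3, 1, 20)},
                    cv=10, scoring="neg_mean_squared_error")
grid.fit(X_tr, y_tr)

# Out-of-sample RMSE on the held-out validation set.
rmse = mean_squared_error(y_va, grid.predict(X_va)) ** 0.5
```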
The dataset construction addresses several methodological challenges related to data quality and comparability across different assessments.
First, we address the fundamental question of whether PISA and TIMSS data are commensurable. While PISA assesses 15-year-olds’ ability to apply knowledge in real-world contexts and TIMSS evaluates Grade 8 students’ mastery of curriculum-based content [9,10,11], empirical evidence strongly supports their comparability. The cross-country correlations between TIMSS and PISA scores are remarkably high—0.88 for mathematics and 0.91 for science—indicating that despite methodological differences, both assessments capture similar underlying educational quality.
We employ the equi-percentile linking method [4] to harmonize the scales while preserving relative country standings. This approach aligns cumulative distribution functions between assessments without assuming identical content, and anchors all scores to the TIMSS 1995 scale.
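A minimal sketch of equi-percentile linking under the assumption of simple empirical-CDF matching (the function and the synthetic score distributions are our illustration, not the authors' implementation): each score's percentile rank in the source distribution is looked up, and the score at the same rank in the target distribution is returned.

```python
import numpy as np

def equipercentile_link(scores, source_dist, target_dist):
    """Map scores from the source scale to the target scale by matching
    empirical cumulative distribution functions (percentile ranks)."""
    source_dist = np.sort(np.asarray(source_dist, dtype=float))
    target_dist = np.sort(np.asarray(target_dist, dtype=float))
    # Percentile rank of each score within the source distribution.
    ranks = np.searchsorted(source_dist, scores) / len(source_dist)
    # Score at the same percentile rank of the target distribution.
    tgt_ranks = (np.arange(len(target_dist)) + 0.5) / len(target_dist)
    return np.interp(ranks, tgt_ranks, target_dist)

rng = np.random.default_rng(1)
# Illustrative distributions: a PISA-like scale and the TIMSS 1995
# reference scale (mean 500, SD 100); the samples are synthetic.
pisa_like = rng.normal(470, 90, size=5000)
timss_like = rng.normal(500, 100, size=5000)
linked = equipercentile_link(pisa_like, pisa_like, timss_like)
```

By construction, the linked scores inherit the target scale's distribution while preserving each country's relative standing in the source distribution.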
Second, we identify and address potential sources of error in the dataset construction:
  • Harmonization error: Different test designs, participant ages, and content coverage across assessments may introduce measurement error. The high cross-assessment correlations and validated equi-percentile linking method minimize this concern.
  • Imputation uncertainty: Missing values represent 48% of potential observations. We mitigate this through two complementary methods, linear interpolation and LASSO regression, which yield highly consistent estimates (correlation = 0.967). This near-perfect agreement between methodologically distinct approaches provides strong validation of the imputed values. Furthermore, the machine learning approach demonstrates excellent predictive performance, with an out-of-sample RMSE of 15.7 and R2 of 0.905, indicating low prediction error. The high correlation between methods and strong predictive accuracy suggest that imputation has minimal impact on data quality.
Figure 1 depicts the evolution of test scores from 1970 to 2023 for 12 of the 101 selected countries. The figure distinguishes between three data sources: solid black dots represent harmonized scores derived from original observations from international assessments (including TIMSS, PISA, and earlier IEA studies); hollow blue triangles indicate interpolation-based estimates; and hollow red circles denote machine learning estimates generated using LASSO regression.
The selected countries span a broad range of geographic regions and development stages, from high-performing East Asian economies, such as Japan and the Republic of Korea, which consistently maintain average scores above 500, to lower-performing countries, such as Ghana and South Africa, which start from a lower baseline. The figure highlights diverse national trajectories: Japan demonstrates consistently high performance, while Brazil shows gradual improvement beginning in the 1990s. In contrast, countries such as Indonesia and Finland display declining trends in recent years. The interpolated and machine learning estimates align closely for most countries; however, discrepancies emerge in a few cases, notably in Ghana and Serbia, where early-year extrapolations differ between the two approaches.

3.2. Constructing a Measure of Educational Quality

Annual educational quality data for the 15–19 age group were constructed based on the estimated test scores of 15-year-olds. Using the two estimation methods (interpolation and machine learning), these values were labeled Tscore1519_INT and Tscore1519_ML, respectively. This study assumes that educational quality in year t corresponds to the test scores of 15-year-olds assessed in that year, aligning with the design of international assessments, which routinely evaluate cognitive skills at age 15. For example, the educational quality of the 15–19 age group in 2023 was calculated as the population-weighted average of annual educational quality from 2019 to 2023. Since assessments are typically conducted at four-year intervals, interpolation was employed to generate an annual dataset.
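The population-weighted averaging in the example above can be written directly; the quality values and weights below are invented for illustration, not taken from the dataset.

```python
import numpy as np

# Annual 15-19 quality values for 2019-2023 and the corresponding
# population weights (illustrative numbers, not from the dataset).
annual_quality = np.array([505.0, 506.0, 507.0, 508.0, 509.0])  # 2019..2023
pop_weights = np.array([1.00, 1.01, 1.02, 1.02, 1.03])          # cohort sizes

# The 15-19 indicator for 2023 is the weighted average over 2019-2023.
tscore1519_2023 = np.average(annual_quality, weights=pop_weights)
```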
In addition, an index of educational quality was constructed for the working-age population, Q_t, defined as
Q_t = Σ_a e^(β_q · q_t^a) · l_t^a
where a indexes the five-year age groups (15–19, 20–24, …, 60–64), l_t^a is the population share of age group a at time t, q_t^a is the normalized test score for group a at time t, and q_t^(15–19) corresponds to the normalized Tscore1519. β_q denotes the return to the "normalized" test score, which is set at 9.5%, based on the estimated wage return to a one-standard-deviation increase in the test score (Lee and Lee [3], Table 2, Column 2).
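A numerical sketch of the index for one country-year. The cohort scores and population shares below are made up, and the normalization of q shown here is only illustrative (the dataset's exact normalization should be taken from the source data), so the resulting value is not comparable to the published indexes.

```python
import numpy as np

BETA_Q = 0.095  # wage return to one SD of the normalized test score [3]

# Illustrative inputs for one country-year: normalized test scores q_t^a
# and population shares l_t^a for the ten five-year age groups
# 15-19, 20-24, ..., 60-64 (all numbers invented).
q = np.array([5.2, 5.1, 5.0, 4.9, 4.9, 4.8, 4.7, 4.6, 4.5, 4.4])
l = np.array([0.12, 0.11, 0.11, 0.10, 0.10, 0.10, 0.10, 0.09, 0.09, 0.08])

# Q_t = sum_a exp(beta_q * q_t^a) * l_t^a
Q = np.sum(np.exp(BETA_Q * q) * l)
```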
The methodology relies on two key assumptions designed to ensure empirical feasibility, given the structure of international assessment data.
First, it is assumed that β q remains consistent across cohorts and countries.
Second, a uniform quality score is assigned to all individuals within a cohort, regardless of their educational track. For example, members of the 15–19 age group in 2023 received the same score irrespective of whether they attended primary, secondary, or tertiary institutions. This implies that the cohort assessed at age 15 in 2015 is assumed to have received the same quality of education throughout their schooling, both prior to and following the assessment.
A systematic approach was used to link test scores with age cohorts across different time periods. This methodology maintains consistent temporal relationships while accounting for cohort progression. For example, the 15–19 age cohort in 2015 (20–24 in 2020) incorporated the test scores from 2011–2015.
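The cohort-progression rule can be expressed as a small helper (the function name is ours); it reproduces the example in the text, where the 15–19 cohort of 2015 (20–24 in 2020) draws on scores from 2011–2015.

```python
def score_window(year, age_low):
    """Assessment years of 15-19 educational quality that feed a cohort
    aged [age_low, age_low + 4] in the given year, i.e., the five years
    in which that cohort passed through the 15-19 age range."""
    offset = age_low - 15          # years since the cohort was 15-19
    return (year - offset - 4, year - offset)
```

For example, `score_window(2020, 20)` returns the 2011–2015 window, matching the cohort-linking example in the text.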

4. Data Analysis

Figure 2 presents the distribution of the educational quality index for the working-age population across 101 countries in 2023, comparing the estimates derived from the interpolation and machine learning approaches. Both methods yield broadly similar distributions, each exhibiting a roughly symmetric shape centered around a value close to 2.0.
Figure 3 plots educational quality, measured by the Q interpolation index, against educational quantity, represented by the average years of schooling among adults aged 25–64, based on Barro and Lee [12], for the year 2015. The scatterplot reveals a generally positive relationship between the two measures (correlation = 0.67), indicating that countries with higher educational quality also tend to exhibit longer average durations of schooling.
The upper-right quadrant features the strongest performers, countries that combine high educational quality with extensive schooling. The US exemplifies this pattern (13.33 years; Q = 2.11), along with Japan (12.83 years; Q = 2.30), the Republic of Korea (12.84 years; Q = 2.27), and Singapore (12.77 years; Q = 2.28). Germany (12.28 years; Q = 2.14) and other European countries also cluster in this high-performance group, reflecting strong systems in both educational access and learning outcomes.
In contrast, some countries exhibit substantial educational quantity but comparatively low quality. South Africa stands out with 10.18 years of schooling but the lowest Q score in the dataset (1.48), indicating persistent challenges in translating schooling into learning. Qatar exhibits a similar pattern, with 9.41 years of schooling and a Q score of 1.62.
The lower-left quadrant includes countries grappling with the dual challenge of limited educational access and low quality. Ghana (8.13 years; Q = 1.53) and Cambodia (4.87 years; Q = 1.69) exemplify this group, where resource constraints impede both school participation and learning outcomes.

Author Contributions

Conceptualization, J.-W.L.; methodology, H.L. and J.-W.L.; software, H.L.; validation, H.L. and J.-W.L.; formal analysis, H.L. and J.-W.L.; investigation, H.L. and J.-W.L.; data curation, H.L.; writing—original draft preparation, H.L. and J.-W.L.; writing—review and editing, H.L. and J.-W.L.; visualization, H.L.; supervision, J.-W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study received no external funding.

Data Availability Statement

The datasets supporting the findings of this study are publicly available at https://doi.org/10.5281/zenodo.16778072. The repository provides (i) the test-score database with original and imputed international assessment scores for 1970–2023, (ii) annual educational-quality series for the 15–19 age cohort (1970–2023), and (iii) educational-quality indices for the working-age population (ages 15–64) for 2015 and 2023. The data are provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, allowing unrestricted use with appropriate attribution.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
PISA: Programme for International Student Assessment
TIMSS: Trends in International Mathematics and Science Study
IEA: International Association for the Evaluation of Educational Achievement
CSV: Comma-Separated Values
IAEP: International Assessment of Educational Progress
LASSO: Least Absolute Shrinkage and Selection Operator
OECD: Organisation for Economic Co-operation and Development
NCES: National Center for Education Statistics
RMSE: Root-Mean-Squared Error
ML: Machine Learning
NAEP: National Assessment of Educational Progress
UN: United Nations

References

  1. Angrist, N.; Djankov, S.; Goldberg, P.K.; Patrinos, H.A. Measuring human capital using global learning data. Nature 2021, 592, 403–408. [Google Scholar] [CrossRef] [PubMed]
  2. Altinok, N.; Diebolt, C.; Demeulemeester, J.L. A new international database on education quality: 1965–2010. Appl. Econ. 2014, 46, 1212–1247. [Google Scholar] [CrossRef]
  3. Lee, H.; Lee, J.W. Educational quality and disparities in income and growth across countries. J. Econ. Growth 2024, 29, 361–389. [Google Scholar] [CrossRef]
  4. Braun, H.I.; Holland, P.W. Observed-score test equating: A mathematical analysis of some ETS equating procedures. In Test Equating; Holland, P.W., Rubin, D.B., Eds.; Academic Press: New York, NY, USA, 1982; pp. 9–49. [Google Scholar]
  5. World Bank. World Development Indicators. 2025. Available online: https://datacatalog.worldbank.org/dataset/world-development-indicators (accessed on 14 June 2025).
  6. World Bank. Education Statistics. 2025. Available online: https://datacatalog.worldbank.org/dataset/education-statistics (accessed on 14 June 2025).
  7. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef] [PubMed]
  8. Little, R.J.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2019. [Google Scholar]
  9. IEA. TIMSS 2003 International Mathematics Report; TIMSS International Study Centre, Boston College: Chestnut Hill, MA, USA, 2003. [Google Scholar]
  10. OECD. Learning for Tomorrow’s World; OECD Publishing: Paris, France, 2004. [Google Scholar]
  11. OECD. Comparing the Similarities and Differences of PISA 2003 and TIMSS; OECD Education Working Papers; OECD Publishing: Paris, France, 2010. [Google Scholar]
  12. Barro, R.J.; Lee, J.W. A new data set of educational attainment in the world, 1950–2010. J. Dev. Econ. 2013, 104, 184–198. [Google Scholar] [CrossRef]
Figure 1. Trends of test scores by country, 1970–2023. Note: This figure displays three types of data points. Solid black dots represent the observed test scores from international student assessments; hollow blue triangles indicate interpolated estimates; and hollow red circles correspond to machine learning-based estimates using the LASSO method.
Figure 2. Density distributions of educational quality for the working-age population, 2023.
Figure 3. Educational quality index vs. average years of schooling, 2015.
Table 1. Variable descriptions and definitions.
Variable | Description
CountryName | Country name
CountryCode | ISO 3-letter country code
Year | Year of observation
Observed_data_flag | Indicator for availability of original, observed test score (1: Yes; 0: No)
Tscore_INT | Harmonized test scores: based on original data and interpolated estimates
Tscore_ML | Harmonized test scores: based on original data and machine learning estimates
Tscore1519_INT | Educational quality indicator for cohort 15–19 based on Tscore_INT
Tscore1519_ML | Educational quality indicator for cohort 15–19 based on Tscore_ML
Q_INT | Educational quality index for working-age population (interpolation-based)
Q_ML | Educational quality index for working-age population (machine learning based)

