Assessing Education from Space: Using Satellite Earth Observation to Quantify Overcrowding in Primary Schools in Rural Areas of Nigeria

Abstract: Nigeria is a country with a rapidly growing, youthful population, and the availability of good-quality education for all is a key priority in the sustainable development of the country. An important element of this is the need to improve access to high-quality primary education in rural areas. A key indicator of progress is the provision of adequate classroom space for the more than 20 million learners in Nigerian public schools, because overpopulated classrooms are known to have a strong negative impact on the performance of both pupils and their teachers. However, it can be challenging to rapidly monitor this indicator for the more than 60 thousand primary schools, especially in rural areas. In this research, we used satellite Earth Observation (EO) and Nigerian government data to determine the size of available teaching spaces and evaluate the degree of overcrowding in a sample of 1900 randomly selected rural primary schools across 19 Nigerian states spanning all regions of the country. Our analysis shows that 81.4% of the schools examined were overcrowded according to the minimum standard threshold for school size of at least 1.2 m² of classroom space per pupil defined by the Federal Government of Nigeria. Such overcrowding can be expected to have a negative impact on educational performance, on achieving universal basic education and UN Sustainable Development Goal (SDG) 4 (Quality Education), and it can lead to poverty. While floor area can be measured manually on site, collecting and reporting such data for the number of rural primary schools in a large and populous country such as Nigeria is a serious, time-consuming administrative task with considerable potential for errors and data gaps. Satellite EO data are readily available, including for remote areas, are reproducible, and are easy to update over time. This paper provides a proof-of-concept example of how such EO data can contribute to addressing this socio-economic dimension of the SDGs framework.


Introduction
Nigeria is the most populous country in Africa, and seventh in the world. It also has one of the largest populations of youth in the world [1]. From an estimated 42.5 million people at the time of independence in 1960, Nigeria's population has grown to around 195 million in 2018 [2]. Although Nigeria became the largest economy on the African continent in 2014 [3], the country still faces many serious issues such as violent rebellion and terrorism, endemic corruption, low life expectancy, inadequacies in public health systems, income inequalities, and high illiteracy rates [4,5].
The educational system in Nigeria is based on the Universal Basic Education (UBE) programme, which was launched in 1999 with the aim of providing free, universal, and compulsory basic education for children aged 6 to 15 years. UBE covers six years of primary school and three years of junior secondary education. This can be followed by an optional three years of senior secondary education and four years of tertiary education [6]. It should be noted that primary education is the only level of education that is available in both urban and rural areas throughout the developed and developing world. It is also the largest subsector of any education system and so offers a unique opportunity to contribute to the transformation of societies [7,8].
Nigeria's education system struggles with a persistent lack of adequate facilities. There is evidence that UBE is challenged by multiple issues such as insufficient classroom space in relation to high pupil enrolment, inadequate furniture and no functional chalkboards, lack of maintenance of building infrastructure, and a lack of teachers [9,10]. All these lead to overcrowded classrooms and limit the quality of educational attainment [11-13]. While the Universal Basic Education Commission (UBEC) report does not specify a legal minimum space requirement for classroom dimensions, it does provide guidance on space norms, which include a minimum standard learning space of 1.2 m²/pupil in rural primary schools (1.4 m²/pupil for semi-urban and urban primary schools) [6]. Currently, among the suite of standards listed in the UBEC report, the key indicator that the Nigerian government uses for measuring quality education and equity is the Pupil-Teacher Ratio (PTR), with an ideal value set at 35:1 for primary schools [14]. Values higher than this equate to overcrowding in schools. However, while the PTR is often claimed to be a key indicator, we found limited literature reporting it at the state and Local Government Area (LGA) level in Nigeria, reflecting a weak and often highly politicised statistical system [15,16]. Issues facing national statistics offices in developing countries such as Nigeria often include a lack of timely data of suitable quality, a simple lack of data, limited independence of statistical information, unstable budgets, and misaligned incentives. These issues encourage the production of inaccurate data, the domination of national priorities by various sponsors, and limited access to and usability of traditional data [17-20].
Despite these data challenges, researchers have conducted surveys in different states of Nigeria. For example, Opanuga et al. [14] noted that 81% of the 133 public primary schools in Ogun State have a PTR over 35:1. Moreover, a survey conducted by Ndem et al. [21] in Cross River State schools found high PTRs, such as 49:1 in primary and 62:1 at the junior secondary level. A most remarkable example of high PTR linked with overcrowding is noted by Sherry (2008) [22]: "all the schools I have seen are hugely overcrowded. In one record case, in a rural school, I saw a class of over 200 pupils of ages ranging from 11 to 21 with only one teacher to attend to them." ([22], pp. 39-40). A five-year study that sought to support equitable access to education and improve the learning outcomes from basic education systems, entitled 'Education Data, Research and Evaluation in Nigeria (EDOREN)', found that: "consideration needs to be given to alternative ways of assessing classroom overcrowding, to complement pupil-teacher ratio rates, as the latter does not necessarily give an accurate indication of the numbers on the ground and can give the impression that classes are of manageable size when in reality they are not" [9]. Overcrowded classrooms are well known to be detrimental to educational outcomes [14,23] and have been reported in many studies as having a negative impact on adult and youth literacy [24,25]. Respondents to a survey conducted by Olaleye et al. [26] concluded that the shortage of building infrastructure of adequate quality was a major cause of overcrowded classrooms in Nigeria, and Ikoya and Onoyase [27] presented a comprehensive national survey of primary school infrastructure that found 53% of the schools surveyed lacked fundamental structures.
In addition, the assessment of basic education facilities in Kano, Jigawa, and Kaduna States by the Education Sector Support Programme in Nigeria (ESSPIN) concluded that around 75% of school infrastructure was "very poor" [28], while in Adamawa State 67% of public primary school classrooms were deemed to be in "poor condition" [13]. In a federal system such as in Nigeria, where taxes are raised at national and state levels, there can be disparities in wealth between states and this can influence the resources they have available to allocate to education [9]. Disparities in the resourcing of education between states can, in turn, lead to differences in education outcomes (e.g., literacy and numeracy) and differences in educational outcomes can in turn influence socio-economic indicators such as poverty; thus, there can be a negative reinforcement of inequality between states [29].
The present research aimed to evaluate the utility of data derived from satellite Earth Observation (EO) data as a direct measurement of classroom size and to determine the amount of classroom space allocated to pupils in rural primary schools in Nigeria. Although classroom areas could be individually measured by school staff, EO provides an opportunity for an independent, and, through image recognition machine learning algorithms, rapid assessment applicable to the whole country. Because Nigerian rural primary schools are built to a common pattern, they are easy to detect from satellite imagery. Each school is located near a road, with a playing field in front and a line of rectangular buildings (classrooms) behind and typically running parallel with the road. Thus, unlike surveys and measurements on site, EO satellite data provide the potential for a rapid, inexpensive, and accurate assessment [30].
In this study, we provide a first proof-of-concept assessment of the use of EO data for measuring school building footprints (area in m²) that could help governments and nongovernmental organisations (NGOs) quickly identify schools with overcrowded classrooms. Classroom areas (m²) were measured for 1900 rural primary schools across 19 Nigerian states and, in combination with available enrolment data, the area per pupil was determined for each school. Primary schools in rural areas were chosen for the research because (i) they are an important component of delivering Nigeria's UBE ambitions and (ii) most rural primary schools are single-storey buildings, which facilitates accurate measurements from satellite imagery.
Having estimated the area per pupil as an indicator of resourcing per pupil, the research then sought to use the data to explore whether there are links with existing data on educational outcomes such as literacy and numeracy. The latter has often been noted in studies based in the developed world [31], but literature is scarce to support such an association for the developing world, especially for Sub-Saharan Africa [26,32].

Study Area
Policies aimed at providing free universal primary education for all children in Nigeria pre-date independence from Britain in 1960. In the 1950s, the colonial government recognised that secondary and tertiary education should be prioritised to provide the required number of teachers to achieve universal primary education (UPE). The attainment of UPE gathered pace during the 1960s but was piecemeal, as separate states implemented their own policies [33]. However, in 1976 the military government launched a major effort to implement free UPE across the entire country based on a significant school building programme and the recruitment and training of teachers [33]. Despite the prioritisation of UPE by successive governments (both civilian and military), the realisation of UPE remains a challenge and, since 1976, there have been many initiatives designed to address bottlenecks and constraints within the system. The UPE programme later evolved into UBE.
Nigeria has a federal system of governance, and the country comprises 36 states, each with its own governor and state assembly, plus the Federal Capital Territory (FCT), which houses the capital city of Abuja. Within the states, there are 768 Local Government Areas (LGAs), and there are six Area Councils within the FCT, totalling 774 [34]. Responsibility for educational institutions is shared between different bodies at the federal, state, and local government levels, and a suite of indicators has been developed to help assess the quality of UBE received by pupils and their attainment. Various standards for basic education were urged by the Universal Basic Education Commission (UBEC) in 2000 as part of Nigeria's efforts to achieve the second Millennium Development Goal (MDG) of universal primary education [6]. For convenience, these states and the FCT are often classified in terms of six geopolitical zones, primarily based on location, but which also broadly encompass the major ethnic groups in the country. The South-West geopolitical zone, for example, largely comprises the Yoruba ethnic group, while the South-East largely comprises the Igbo ethnic group [35,36]. However, given that the country comprises hundreds of ethnic groups, a geopolitical zone will not be homogeneous [36,37]. The same point applies to factors such as religious belief. In broad terms, the country comprises an Islamic north and a Christian south, but geopolitical zones such as the South-West and North-Central, in particular, will be mixed [36]. The country also has an economic axis that runs from south to north in terms of wealth per capita; the southern zones tend to be richer than the northern ones [38], and this results in southern states having more resources available for investment in education. The authors hypothesised that these differences in wealth between states would, in turn, result in differences in area per pupil.
For this study, 19 states (Figure 1) were selected to span all six geopolitical regions of the country.

Data Sources
Nigerian government data resources for rural primary schools [39] and satellite imagery from Google Earth Pro were used as the main data sources for the research. Firstly, the location of all primary schools was obtained via the Education Facilities in Nigeria (EFN) dataset [40], which includes school location (latitude and longitude), school type (primary, secondary, etc.), school name, number of children registered, number of toilets, date of survey (survey period between 2009 and 2014), and number of teachers. The database comprises 98,667 primary schools across Nigeria. Its goal was to build Nigeria's first nation-wide inventory of education facilities and to make the data collected available to planners, government officials, and the public, so that they could be used to make strategic decisions for planning relevant interventions and to help achieve the MDGs.
The Google Earth Pro platform provides historical satellite and aerial imagery at different spatial resolutions, with each image collected at a specific date and time. Most images used for this study came from very high resolution satellites, and the date of the images was chosen to be close to the date of the pupil number survey (listed in [39]). For best results in measuring the footprints of school buildings, a top-down view of the images was used, as recommended in [39].
Given the lack of published and official data on the area (m²) per pupil or pupil density in primary schools in Nigeria, and also to support validation of the Google Earth measurements and later the calculation of the teaching area per pupil, classroom buildings in a subset of schools were physically measured. A professional town planner team was recruited to conduct on-site measurements of school buildings in Ogun State. Owing to COVID-19 travel restrictions and practical issues, this state was one of the few where travelling was allowed at the time of the research, but there were still constraints on accessing the school interiors. Although this state was not part of our original selection, all rural schools were built in a nationwide characteristic pattern and with a typical morphology, so we expect similar results in other parts of the country.
The team measured the exterior dimensions of the building for 21 primary schools from rural areas (listed in Table S1), and these were compared to the estimates made via satellite images (Table S2).
A series of national survey data was used to establish a causal relationship between the space per pupil and the educational outcomes of literacy and numeracy rates, as these variables have often been noted in the literature to have a significant negative association with pupil density.
The youth literacy and numeracy percentages (children aged 5-16 able to read) by state were taken from the Nigeria Education Data Survey [41] for the 19 studied states (Table 1). This survey was designed to provide information about the ability of children aged 5-16 years and adults in a sample of 30,000 households to read and be numerate. As a further level of exploration, we sought to check whether the area/pupil indicator is linked with data on a variety of socio-economic measures of poverty available at the state level in Nigeria. For the latter, we used the following widely used indices for measuring poverty [42]: the poverty headcount ratio at US$3.20, consumption poverty headcount, Multidimensional Poverty Index (MPI) headcount, and relative poverty. The data have been collected under different surveys and methodologies and calculated at the state level in Nigeria (Table 2). The poverty rates of these indicators are given in Table 3 for each state in the present research.

Table 2.

Poverty headcount ratio at US$3.20
Data are based on primary household surveys obtained from Nigerian statistical agencies and the World Bank. The indicator calculates the percentage of the population living on less than US$3.20 a day at 2011 international purchasing power parity (PPP). A detailed description of this poverty indicator is presented by Ferreira et al. [42]. [43]

Consumption Poverty Headcount (2013)
Data on consumption are collected by the General Household Survey, which asks households about broad categories of consumed items such as food, health care, and schools. The indicator is obtained by aggregating information on food consumption and non-food consumption. [44,45]

MPI Headcount (2013)
MPI uses 10 indicators to measure poverty in three dimensions: education, health, and living standard. The intensity of poverty denotes the proportion of weighted indicators in which people are deprived. A person who is deprived in 90% of the weighted indicators has a greater intensity of deprivation than someone deprived in 40% of the weighted indicators. The proportion of the population that is multidimensionally poor is the incidence of poverty, or MPI headcount ratio. This index was calculated using 2013 data from Demographic Health Surveys. The consumption poverty and MPI headcount indicators are both widely used to measure poverty, but the data are collected under two different surveys and methods; thus, the poor according to the MPI do not always correspond to the poor measured according to consumption poverty.

Relative poverty (2010)
Relative poverty measurement is defined by the living standards of the majority and separates the poor from the non-poor. The threshold at which relative poverty is defined varies from one country to another. In Nigeria, households with expenditure greater than two-thirds of the total household per capita expenditure are considered non-poor, whereas those below this threshold are considered poor. [46]

Analysis
An overview of the analysis approach used in this paper is shown in Figure 2. The main aim of the analysis was to explore the utility of a method for evaluating the overcrowding of an individual school from satellite EO imagery of the school buildings and national statistical office data on pupil enrolments. The question being asked was whether it is feasible to use EO-derived data to assess building footprint area and thereby use that measure as part of the area per pupil indicator. This process includes an analysis of the challenges involved in the measurement process. The EO-based measurements were used to assess area per pupil and, based on the target employed in Nigeria of at least 1.2 m² per pupil, it was possible to assess the degree of overcrowding (i.e., the proportion of schools having < 1.2 m²/pupil). Firstly, we queried and extracted the EFN information relating to school type (e.g., primary), school management (e.g., public), school location, name, date of survey, and the number of pupils registered. Then, the width and length of each school building were measured using very high resolution (VHR) satellite images and the external area (m²) was determined. The assessment of building footprint was checked directly against results obtained via ground-truthing (using a sample of schools in Ogun State; details below) and against UBEC guidance that 15% of a school's built area should be attributed to administration [6]. This process allowed the authors to assess whether their EO measurements had potential inaccuracies due to factors such as the correct identification of buildings used for teaching rather than for other uses such as storage, and school buildings having a large veranda. However, even in the latter case, it is common for schools in Nigeria to use verandas as teaching spaces.
Furthermore, we used the corrected teaching area per pupil to test for a possible association with educational attainment (youth literacy and numeracy rates) and with poverty indices.
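The overcrowding assessment described above reduces to a simple per-school comparison. The following is a minimal sketch rather than the authors' implementation: the function names and the example values are hypothetical, and only the 1.2 m²/pupil rural threshold is taken from the text.

```python
# Rural threshold quoted in the text (m^2 of classroom space per pupil).
RURAL_MIN_AREA_PER_PUPIL = 1.2


def area_per_pupil(teaching_area_m2, pupils):
    """Return teaching area per pupil in m^2 (hypothetical helper)."""
    if pupils <= 0:
        raise ValueError("pupils must be positive")
    return teaching_area_m2 / pupils


def overcrowded_share(schools, threshold=RURAL_MIN_AREA_PER_PUPIL):
    """Fraction of schools whose area per pupil is below the threshold.

    `schools` is an iterable of (teaching_area_m2, pupils) pairs.
    """
    flags = [area_per_pupil(area, n) < threshold for area, n in schools]
    return sum(flags) / len(flags)


# Illustrative values only; these are NOT data from the study.
example_schools = [(300.0, 200), (150.0, 400), (500.0, 350), (90.0, 120)]
share = overcrowded_share(example_schools)
```

With these illustrative values, two of the four schools fall below the threshold, so `share` is 0.5.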

Evaluating the Teaching Area
The EFN datasets were queried by school type to obtain data on 60,000 public primary schools across the country. The approximate coordinates of the school locations were overlaid on the Google Earth satellite images, and 1900 rural schools (100 schools per state) spanning all the geopolitical zones were selected at random, with at least 2 km between schools, using the buffer measure tool in Google Earth.
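A random selection with a minimum spacing, as described above, can be sketched programmatically. The study used the buffer measure tool in Google Earth; the greedy procedure below is a substitute chosen for illustration, and the function names, the seed, and the haversine distance on a spherical Earth are all assumptions.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_spaced(candidates, n, min_km=2.0, seed=0):
    """Greedily pick up to n (lat, lon) points that are >= min_km apart.

    Candidates are visited in random order; a candidate is kept only if it
    is far enough from every point already kept.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    chosen = []
    for lat, lon in pool:
        if all(haversine_km(lat, lon, c_lat, c_lon) >= min_km for c_lat, c_lon in chosen):
            chosen.append((lat, lon))
            if len(chosen) == n:
                break
    return chosen


# Hypothetical candidate grid (about 1.1 km spacing), not real school locations.
grid = [(7.0 + 0.01 * i, 3.0 + 0.01 * j) for i in range(10) for j in range(10)]
picked = sample_spaced(grid, n=5)
```

A greedy pass can return fewer than `n` points when the candidates are too densely clustered, so the length of the result should be checked before use.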

Google Earth Schools' Measurements
The total area (m²) of the school buildings was obtained by measuring the length and width of each building within the selected school, using historical images from the same year as the EFN survey (2011-2016). All measurements were performed manually by the same operator using the ruler tool in Google Earth and were stored in an Excel spreadsheet. Where images from the same year were not available, the closest year for which images were available was selected. Identifying the school buildings was straightforward, as almost all sample schools followed a common pattern, as expected from the rapid school building programme that took place during the mid-1970s. Figure 3 shows four examples of school configurations and locations that present characteristic patterns and a typical morphology, such as a schoolyard with bare soil or mowed grass, rectangular-shaped buildings with up to 5 building units in a row, an L or U building layout, and a peripheral location in the village adjacent to the main road (a dirt road in most cases). Thus, teaching blocks (perhaps comprising more than one classroom) can be readily identified as being the larger buildings surrounding the playing field. While a typical school classroom block will have a veranda, the pressure on space is such that these are also often used for teaching. For operational use at the national scale, an image-processing classification algorithm supported by machine learning techniques could alternatively be used to perform this step, and there are likely to be benefits in acquiring operational VHR data and using a more sophisticated processing platform.

Validation and Uncertainty Analysis
In any analysis method, and particularly when an approach is new, it is important to understand the uncertainty associated with the approach and to validate the results. This provides the information users need to judge the fitness for purpose of the data and the inferences drawn from them. Here, we wanted to evaluate a quantitative uncertainty associated with the area per pupil estimates and to validate the satellite measurements against on-site measurements.
When validating the measurements taken from Google Earth satellite images, we were aware of two main sources of uncertainty. The first is that satellite images do not provide information about building functionality (classroom, laboratory, office, veranda, etc.). The second concerns the repeatability of the measurement process in Google Earth.
The Guide to the Expression of Uncertainty in Measurement (GUM) provides a standard method for evaluating and propagating uncertainties [47] using two methods: the Law of Propagation of Uncertainties and Monte Carlo Analysis [48]. Both were employed in this analysis.
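To illustrate how the two GUM approaches relate, the sketch below propagates length and width uncertainties to the footprint area of a single rectangular building, first with the Law of Propagation of Uncertainties and then with a Monte Carlo simulation. It is an illustration under stated assumptions, not the analysis performed in the paper: the two inputs are assumed independent and normally distributed, and the dimensions (30 m by 8 m) and the 0.3 m standard uncertainties are illustrative values.

```python
import math
import random
import statistics


def area_uncertainty_lpu(length_m, width_m, u_length_m, u_width_m):
    """First-order propagation for A = L * W with independent inputs:
    u_A^2 = (W * u_L)^2 + (L * u_W)^2."""
    return math.sqrt((width_m * u_length_m) ** 2 + (length_m * u_width_m) ** 2)


def area_uncertainty_mc(length_m, width_m, u_length_m, u_width_m, draws=20000, seed=1):
    """Monte Carlo estimate: sample L and W from normal distributions
    (an assumption) and take the standard deviation of the simulated areas."""
    rng = random.Random(seed)
    areas = [
        rng.gauss(length_m, u_length_m) * rng.gauss(width_m, u_width_m)
        for _ in range(draws)
    ]
    return statistics.stdev(areas)


# Illustrative building; values are assumptions for this sketch.
u_lpu = area_uncertainty_lpu(30.0, 8.0, 0.3, 0.3)
u_mc = area_uncertainty_mc(30.0, 8.0, 0.3, 0.3)
```

For a product of two inputs whose uncertainties are small relative to their values, the two estimates should agree closely. Larger differences would point to non-linearity or to distributional assumptions that matter.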
To evaluate the uncertainty, a sub-sample of 21 rural schools (with a total of 55 buildings) was selected for ground-truth measurement in Ogun State (South-Western Zone), and the UBEC standards for administrative areas [6] were applied. For each school, the team of planners provided the width and length of each building and veranda (if present), along with a building plan in PDF format and a drone image (one example is presented in Figure 4). They also provided the number of buildings, the number of building floors (always one for primary schools in our sample), details about building conditions, the number of offices, and the lavatory facilities within the school (see Table S1). From the on-site measurements, it was apparent that buildings at least 6.5 m wide had a veranda, and that toilets, with a width of less than 4 m and a length of less than 7 m, were detached small buildings (as seen in Figure 4, the building without a roof) (see Table S2). As the planner team was not allowed to take internal measurements within buildings, the administrative area is unknown. UBEC [6] recommends that 15% of the total building area for each school should be attributed to administrative purposes (e.g., office, storage).
To estimate the uncertainty associated with the measured building sizes, we considered two components that we defined as "reproducibility uncertainty" (accuracy of the Google Earth measurements compared to on-site measurements) and "repeatability uncertainty" (repeatability of multiple measurements in Google Earth).
It should be noted that Google imagery does not have the resolution needed to provide sharp demarcations for the buildings. Thus, it was not possible to know precisely where the edge of the building was in the image. Hence it is important to establish an uncertainty associated with the Google-based assessments of area and inferences.

Model Development
As described above, the initial measurements of buildings using Google Earth might contain inaccuracies associated with building functionality, since there are aspects that are opaque to satellite EO, such as the size of administrative areas (e.g., offices), storage areas, and verandas. The on-site measurements revealed that small buildings with less than 4 m width and 7 m length are non-teaching areas (e.g., toilet blocks or storage areas), and hence they were removed from further analysis. For this analysis, and despite the fact that schools often use verandas as teaching areas, we removed a veranda area for larger buildings. We also removed our best estimate of administrative areas.
The calculation of "teaching area per pupil" thus followed Equations (1) and (2):
$$A_{T,j} = \left(1 - S_{\mathrm{off}} B_j\right) \sum_{i=1}^{n} L_i \left(W_i - W_{V,i}\right) \qquad (1)$$
where $A_{T,j}$ is the corrected teaching area of the $j$th school (in m²), $i$ is an index representing the individual school buildings (in total there are $n$ buildings), $L_i$ is the external length of the $i$th building measured from satellite imagery (in metres), $W_i$ is the external width of the $i$th building measured from satellite imagery (in metres), and $W_{V,i}$ is the assumed width of the veranda for school building $i$. Based on the analysis of the 21 schools measured on-site, it was assumed that the veranda width would be 2 m for buildings wider than 10 m and 1.6 m for buildings from 6.5 m to 10 m wide. Buildings narrower than 6.5 m had no veranda. Mathematically expressed:
$$W_{V,i} = \begin{cases} 2\ \mathrm{m}, & W_i > 10\ \mathrm{m} \\ 1.6\ \mathrm{m}, & 6.5\ \mathrm{m} \le W_i \le 10\ \mathrm{m} \\ 0\ \mathrm{m}, & W_i < 6.5\ \mathrm{m} \end{cases}$$
To account for the space taken by administrative areas, an extra term was included: $(1 - S_{\mathrm{off}} B_j)$ reduces the teaching area of the school if the buildings are big enough to have office and storage space. If the school has three or more buildings, it was assumed that the office space is 15% of the total school area after removing the veranda.
$S_{\mathrm{off}} = 0.15$ is the proportion of the building area taken up by offices.
$B_j$ is a Boolean that takes the value 1 if there are three or more buildings in school $j$ and 0 otherwise. Therefore, the teaching area per pupil ($\alpha$) is given by
$$\alpha = \frac{A_T}{N_p} \qquad (2)$$
where $A_T$ is the corrected teaching area and $N_p$ is the number of pupils in a measured school.
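The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code from Table S3: the function names are hypothetical, the area is taken as length times (width minus veranda width), the office reduction is applied when the school has three or more measured buildings, and the example school is invented.

```python
from typing import List, Tuple

S_OFF = 0.15  # office proportion stated in the text (UBEC recommendation)


def veranda_width(width_m: float) -> float:
    """Assumed veranda width (m) as a step function of building width."""
    if width_m > 10.0:
        return 2.0
    if width_m >= 6.5:
        return 1.6
    return 0.0


def teaching_area(buildings: List[Tuple[float, float]], s_off: float = S_OFF) -> float:
    """Corrected teaching area (m^2) of one school from (length, width) pairs."""
    # Small detached buildings (width < 4 m and length < 7 m) are non-teaching areas.
    kept = [(l, w) for l, w in buildings if not (w < 4.0 and l < 7.0)]
    area = sum(l * (w - veranda_width(w)) for l, w in kept)
    # Assumption: the Boolean counts all measured buildings, including small ones.
    b_j = 1 if len(buildings) >= 3 else 0
    return (1.0 - s_off * b_j) * area


def area_per_pupil(buildings: List[Tuple[float, float]], n_pupils: int) -> float:
    """Teaching area per pupil (m^2 per pupil)."""
    return teaching_area(buildings) / n_pupils


# Invented example school: two classroom blocks, one toilet block, 400 pupils.
example = [(30.0, 8.0), (20.0, 11.0), (5.0, 3.0)]
alpha = area_per_pupil(example, 400)
print(f"{alpha:.2f} m^2 per pupil, overcrowded: {alpha < 1.2}")
```

For this invented school the two classroom blocks contribute 192 m² and 180 m² after the veranda is removed, and the 15% office reduction leaves about 0.79 m² per pupil, which is below the 1.2 m² threshold.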

Estimation of the Uncertainty of the Google Earth Measurements of the Buildings
To estimate the repeatability uncertainty, each of the 55 buildings in the 21 reference schools was measured 10 times (by the same operator) in Google Earth. The standard deviations of the 10 measurements are shown in Figure 5a and show the spread (28%) expected from a standard deviation calculated from just ten measurements. Those values also show no pattern as a function of actual length or width and therefore we consider the mean value (0.3 m) to be the uncertainty associated with random effects in a single Google Earth Pro measurement (the 1900 schools in the main set were each only measured once). To estimate the reproducibility uncertainty, we took the average value of the 10 measurements of the length and width from the Google Earth Pro measurements and subtracted the on-site measured length or width from this. The differences obtained are given in Figure 5b. If the difference could be entirely explained by the random repeatability effects, we would expect the uncertainty associated with the mean of the ten measurements to be equal to the uncertainty associated with a single measurement (0.3 m) divided by √10 (i.e., 0.095 m). We see from Figure 5b that the actual spread is closer to 0.4 m (the standard deviation of the points is 0.43 m).
The increased spread of 0.4 m in Figure 5b is symmetrical around the zero axis; there is no obvious systematic bias between the on-site measurements and Google Earth Pro, but there is a randomly distributed difference between the two that cannot be accounted for by the random spread in the Google Earth Pro measurements alone.
From these two analyses, we can determine that the uncertainty associated with a single (rather than the mean of ten) measurement in Google Earth Pro contains a repeatability component of 0.3 m and a reproducibility component of 0.4 m. The uncertainty associated with a single building's length and/or width is obtained by combining these two quantities according to the GUM's law of propagation of uncertainties, and is, therefore:
$$u = \sqrt{(0.3\ \mathrm{m})^2 + (0.4\ \mathrm{m})^2} = 0.5\ \mathrm{m}$$
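The arithmetic behind these figures can be checked with a short Python snippet. It only restates numbers given in the text (0.3 m, 0.4 m, ten repeats); the variable names are illustrative.

```python
import math

u_repeat = 0.3   # m, repeatability of a single Google Earth Pro measurement
u_reprod = 0.4   # m, reproducibility against the on-site measurements
n_repeats = 10   # repeated measurements per building

# Uncertainty of the mean of ten repeats if only random repeatability mattered.
u_mean = u_repeat / math.sqrt(n_repeats)

# Combined standard uncertainty of a single length or width (root sum of squares).
u_single = math.hypot(u_repeat, u_reprod)

print(f"{u_mean:.3f} m, {u_single:.2f} m")  # prints "0.095 m, 0.50 m"
```

The combined value of 0.5 m is the standard deviation used for the length and width perturbations in the Monte Carlo analysis.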

Uncertainty Analysis for Teaching Area per Pupil
To establish the uncertainty associated with the teaching area per pupil for a single school, Monte Carlo analysis was performed, using the uncertainty distributions described in Table 4. For this, a Python-coded algorithm that calculates Equations (1) and (2) was placed within a "for loop" and run 50 times (see Table S3). Within each loop, the different parameters were varied, for example by adding to the length and width of each building a random error from a Gaussian distribution with a standard deviation of 0.5 m, or by treating the office proportion, S_off, as a quantity taken from a uniform random distribution between 0.10 and 0.20. The different errors were treated entirely independently; that is, a separate random number was generated for every length and width measurement of every building in every school and for every Monte Carlo iteration.

Table 4. Terms in Equations (1) and (2), and the probability distribution used to create the error for the Monte Carlo simulation.

| Equations (1) and (2) term | Probability distribution the Monte Carlo error is drawn from | Where this came from |
|---|---|---|
| N_p, number of pupils | 0 | It is assumed that the number of pupils is known from enrolment statistics without uncertainty. |
| Set of L_i, W_i for this school: lengths and widths of the external buildings measured in Google Earth | A Gaussian (normal) distribution centred on the original measurement, with a standard deviation of 0.5 m. Note that each length and width has a different random error drawn from this distribution. | The analysis is described in Section 2.5.2 and Figure 5a,b. Statistically determined. |
| W_V,i, veranda width | No uncertainty is associated with the step points (6.5 m and 10 m). Veranda width is described by a Gaussian (normal) distribution centred on the calculated width (2 m or 1.6 m), with a standard deviation of 0.3 m. | The on-site data showed this variety in the veranda widths (see Section 2.5.1). |
| S_off = 0.15, the proportion of the buildings taken up by offices (for a school big enough) is 15% | Office proportion was taken as a uniform distribution from S_off = 0.10 to S_off = 0.20. That is, each school that is big enough for an office is assigned an office proportion randomly from this interval, with any value in the interval equally likely. | The authors do not have any strong justification for this range and have made a "best guess" based on the UBEC report [6] requirement of 15% area for a school, allowing for a "reasonable" range of values around this. |
| B_j, Boolean criterion to decide whether or not to subtract office space | No uncertainty is assumed. | Arguably, other criteria could be used to decide whether or not to subtract office space, but this was not analysed in the Monte Carlo simulation. |
| Form of equation | No uncertainty is assumed. | Arguably, the form of Equation (1) could be different; for example, the office area could be removed before subtracting a veranda. For this analysis, alternative forms were not considered. |
Monte Carlo simulations are a method of uncertainty analysis described in the GUM [48]. The standard deviation of the results of the 50 individual Monte Carlo iterations provides an uncertainty estimate for the results obtained without perturbing the data. For a Monte Carlo simulation to provide a reliable estimate of the uncertainty, an estimate of the uncertainty associated with the individual input parameters is required in order to define the probability distribution from which the random errors are calculated. Table 4 lists the uncertainties that we assumed.
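The Monte Carlo procedure can be sketched as follows. This is an illustrative sketch, not the code from Table S3: the function names and the example school are hypothetical, small non-teaching buildings are assumed to have been removed beforehand, and the veranda step function is assumed to be evaluated on the perturbed width. The distributions follow Table 4.

```python
import random
import statistics
from typing import List, Tuple

Building = Tuple[float, float]  # (length_m, width_m), measured externally


def veranda_width(width_m: float) -> float:
    """Assumed veranda width (m); the step points carry no uncertainty."""
    if width_m > 10.0:
        return 2.0
    if width_m >= 6.5:
        return 1.6
    return 0.0


def monte_carlo_area_per_pupil(buildings: List[Building], n_pupils: int,
                               n_runs: int = 50, seed: int = 0) -> List[float]:
    """Return one perturbed teaching-area-per-pupil value per Monte Carlo run."""
    rng = random.Random(seed)
    b_j = 1 if len(buildings) >= 3 else 0  # office space only for 3+ buildings
    results = []
    for _ in range(n_runs):
        s_off = rng.uniform(0.10, 0.20)      # office proportion, uniform
        area = 0.0
        for length, width in buildings:
            l = rng.gauss(length, 0.5)       # independent error per length
            w = rng.gauss(width, 0.5)        # independent error per width
            v = veranda_width(w)
            if v > 0.0:
                v = rng.gauss(v, 0.3)        # veranda width varies between schools
            area += l * (w - v)
        results.append((1.0 - s_off * b_j) * area / n_pupils)
    return results


# Invented example: three classroom blocks and 400 pupils.
example_school = [(30.0, 8.0), (20.0, 11.0), (25.0, 9.0)]
example_runs = monte_carlo_area_per_pupil(example_school, 400)
print(f"mean {statistics.mean(example_runs):.2f}, "
      f"standard uncertainty {statistics.stdev(example_runs):.2f} m^2 per pupil")
```

The standard deviation of the returned values plays the role of the standard uncertainty for that school, and a building whose width lies close to 6.5 m or 10 m can switch veranda class between runs.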

Statistical Analysis using Socio-Economic Indicators
It has been well reported in the literature that pupil density (or area/pupil) is linked to outcomes such as literacy and numeracy [14,23-25]. Therefore, we tested the validity of the values obtained from Equation (2) by evaluating, at the state level and using Welch's ANOVA, the relationship between teaching area/pupil and the percentage of children (age 5-16) able to read and be numerate.
Welch's ANOVA is a test of multiple comparisons of means (a modified one-way ANOVA) that is appropriate when there are unequal sample sizes and heterogeneity of variance. Non-parametric methods such as Kruskal-Wallis can also be used, but Welch's ANOVA fits better, especially with large heterogeneous datasets [49]. Welch's ANOVA was performed in Excel (SigmaXL extension), and we tested the hypotheses as follows:
• Null hypothesis (H0): all five group means are the same
• Alternative hypothesis (Ha): at least one mean is different
First, we used the teaching area (m²)/pupil data measured for 1900 schools and the literacy and numeracy rates of the states where the schools are located. Then, we grouped the states into five classes for the literacy and numeracy rates, as shown in Tables 5 and 6. To apply Welch's ANOVA, we used the literacy and numeracy groups in relation to the mean teaching area (m²)/pupil.
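For readers who want to reproduce the test outside Excel, the Welch statistic can be computed directly from group means and variances. The sketch below uses the standard Welch formulas with only the Python standard library; the function name is hypothetical, the example groups are invented, and every group is assumed to have a non-zero variance. A p-value would follow from the upper tail of the F distribution with the returned degrees of freedom (for example with `scipy.stats.f.sf`, if SciPy is available).

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisy groups count for less.
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)
    grand = sum(w_i * m_i for w_i, m_i in zip(w, m)) / w_sum
    between = sum(w_i * (m_i - grand) ** 2 for w_i, m_i in zip(w, m)) / (k - 1)
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, float(k - 1), df2


# Invented teaching-area-per-pupil samples for three literacy groups.
low = [0.4, 0.6, 0.5, 0.7, 0.45]
medium = [0.8, 0.9, 1.1, 0.7]
high = [1.3, 1.0, 1.6, 1.2, 1.5, 1.1]
f_stat, df1, df2 = welch_anova([low, medium, high])
print(f"F = {f_stat:.2f} with ({df1:.0f}, {df2:.1f}) degrees of freedom")
```

With two groups, this statistic reduces to the square of Welch's t statistic, which gives a convenient sanity check.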

Values and Associated Uncertainties for Each School
To establish the uncertainty associated with the teaching area per pupil calculated according to Equation (2), a Python program was written to calculate Equations (1) and (2) 50 times as a Monte Carlo simulation (Table S3). As an example of the output, Figure 6 shows the 50 Monte Carlo runs of the teaching area/pupil and the original data analysis (Equations (1) and (2), shown as black diamonds) for the first 50 schools in the dataset. For the vast majority of schools, the originally measured value is close to the centre of the Monte Carlo distribution. However, there are cases where it lies closer to the top or bottom; these are schools where at least one building is close in width to the boundary conditions for having a veranda or not. Such cases create a bias between the Monte Carlo result set and the original data. Tables S4 and S5 and Figures S1-S3 provide a further investigation of both the bias (the difference between the average of the Monte Carlo output and the originally determined value) and the standard deviation of the Monte Carlo output.

Basic Summary Statistics
From the Monte Carlo analysis, we determined that, typically, for a single school with the teaching area per pupil calculated according to Equations (1) and (2), the uncertainty associated with that school's "teaching area per pupil" was 10% of the value (see Supplementary Material Table S5, Figures S2 and S3).
A histogram of the area per pupil calculated for the 1900 individual schools is given in Figure 7. This histogram was calculated an additional 50 times, each time using the results of a separate Monte Carlo run for the 1900 schools. The error bars in Figure 7 are calculated as the standard deviation of the histograms calculated for the 50 Monte Carlo runs.

We calculated the summary statistics for the set of 1900 schools (Table 7) and found that 81.4% of the measured schools do not have the minimum required teaching space for children (a school is overcrowded if the teaching area per pupil is less than 1.2 m²). Note that the standard uncertainties for the statistical values in Table 7 are considerably smaller than the 10% standard uncertainty associated with a single school. This is because the uncertainties are random from school to school, and their effect is reduced by the very large number of schools considered.
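The reduction of uncertainty with sample size can be illustrated with a small simulation. The sketch below uses invented per-school values (a uniform spread between 0.3 and 2.0 m² per pupil, not the study's data) and perturbs each school independently by 10% in each of 50 runs, then looks at how much the mean and the overcrowded share vary between runs.

```python
import random
import statistics

rng = random.Random(42)
n_schools, n_runs = 1900, 50
threshold = 1.2  # m^2 per pupil, the minimum standard quoted in the text

# Invented "original" values for illustration only, NOT the study's data.
alpha = [rng.uniform(0.3, 2.0) for _ in range(n_schools)]

run_means, run_shares = [], []
for _ in range(n_runs):
    # Independent 10% relative error for every school in every run.
    perturbed = [a * rng.gauss(1.0, 0.10) for a in alpha]
    run_means.append(statistics.mean(perturbed))
    run_shares.append(sum(p < threshold for p in perturbed) / n_schools)

rel_u_mean = statistics.stdev(run_means) / statistics.mean(alpha)
u_share = statistics.stdev(run_shares)
print(f"relative uncertainty of the mean: {100 * rel_u_mean:.2f}%")
print(f"uncertainty of the overcrowded share: {100 * u_share:.2f} percentage points")
```

Because the errors are independent from school to school, the relative uncertainty of the mean is roughly the single-school value divided by the square root of the number of schools, so it falls well below 1% in this illustration.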
Figure 7. Histogram of the teaching area per pupil for the 1900 schools, calculated using Equations (1) and (2); error bars represent the standard uncertainty associated with this, calculated from the Monte Carlo outputs.
Welch's ANOVA was applied to examine the relationship between the mean teaching area (m²)/pupil and educational performance (literacy and numeracy rates). Table 8 presents the sample size (number of schools), the mean area (m²)/pupil, and the standard error and standard deviation for each literacy and numeracy group (also presented above in Tables 5 and 6, Section 2.6.2), as well as the results of Welch's ANOVA. The results show that there are statistically significant (p < 0.0001) differences in teaching area/pupil between the five literacy and numeracy groups, indicating that a lower mean teaching area (m²)/pupil (overcrowding) is associated with low literacy and numeracy rates (% children 5-12 able to read and be numerate) and, conversely, that performance improves as the space allocated per pupil increases.
Figure 8 illustrates the percentage of schools measured that are overcrowded or meet the minimum size required, overlaid on the poverty indices: (a) poverty headcount ratio at $3.20 (% of total population, state level), (b) consumption poverty headcount (by state), (c) MPI headcount (by state), (d) relative poverty (% of total population, state level).
The poverty rates described by the four indicators have similar trends: Northeastern and Northwestern states are the poorest, while Southwestern and South-Southern states show the lowest poverty rates, with slight differences in consumption poverty (b), probably because our estimates were performed in rural areas and consumption rates are normally higher in urban zones. Overall, schools with lower teaching areas/pupil (<1.2 m²) are associated with the population of that state being poorer. In states such as Sokoto, Zamfara, Bauchi, Gombe, Kaduna, Oyo, Benue, Taraba, Nasarawa, and Kwara, where over 80% of measured schools per state are overcrowded, poverty is also at the highest level. Conversely, the states with lower poverty rates (e.g., Anambra, Enugu, Delta, Cross River, Abia, Ondo, and Osun) generally have less overcrowded schools.

Discussion
This paper presents a novel application of EO satellite data in measuring the teaching floor area of a sample of 1900 rural public primary school buildings across 19 Nigerian States. We relate these measurements to nationally reported pupil enrolment data, thus determining how many of these schools would be deemed overcrowded. This approach identified that 81.4% (±0.2%) of the schools measured appeared to be overcrowded. In order to illustrate the potential value of our results, we performed further exploration of the distribution of the overcrowded schools and their interlinkage with literacy and numeracy rates, and poverty indices.
While measuring floor area could be performed manually on-site by school staff or others, the collection and reporting of such data for the number of rural primary schools in a large and populous country such as Nigeria is a substantial, expensive, and time-consuming administrative task, with potential for miscalculation and data gaps. On the other hand, EO data are readily available, address issues of accessibility in remote areas, are easy to work with (Google Earth is convenient and free to use), and are easy to update over time as schools add more classroom buildings. For example, from the historical satellite imagery incorporated in Google Earth, we were able to observe, while establishing our sample of 1900 schools, that 113 of the schools had been extended and another 130 schools had been demolished between 2011 and 2019. Therefore, we suggest that EO data can provide a reliable, accurate, and convenient means for assessing classroom areas at the national scale, and that this has the potential to be automated via artificial intelligence/machine learning approaches (see Yazdani et al. [50] for identifying rural schools in Liberia). The advantages of this approach are most likely to be realised in the developing world, where issues of accessibility to rural schools are especially challenging. Indeed, we consider that the ability to rapidly and remotely evaluate overcrowding in rural primary schools, as presented in this study, can help government agencies and NGOs to recognise priorities and to target attention and investment. Likewise, the method can be used in other countries where the spatial pattern of the school buildings and the number of students enrolled in each school are understood.
As noted, EO data are convenient and easily available, but some limitations exist. We based our analysis on the total area of the school buildings and assumed that this was primarily dedicated to classrooms. Classrooms are understood to be the major use of the space, but satellite images cannot distinguish other usages within the building (e.g., administration uses, storerooms, etc.). Therefore, for calculation of classroom space per pupil, small buildings (width < 4 m and length < 7 m) were removed from the analysis for schools with more than 2 buildings assuming these are lavatory facilities. Veranda space was also extracted from classroom space based on the building width measurements, detailed exclusion criteria, and uncertainty analysis. All assumptions were established using the trends observed from the on-site measurements. The total remaining school building area obtained for classroom space was further reduced by 15% as per UBEC recommendations for administration uses [6].
Secondly, the EO images cannot determine the quality of the classroom space such as the internal condition of the building, availability of desks, availability of equipment, blackboards, etc., or lack of teachers (e.g., 'ghost' teachers-a type of fraud that often occurs in developing countries) [51]. Thirdly, the approach is not straightforward for schools that have more than one storey (mostly in urban areas). In Nigeria, the majority of rural schools were built to a common single-storey design, but urban schools often have multiple storeys. It may be possible to estimate height using shadow length taken at a particular time of the day, but even so, the assumptions become more complicated.
In our previous work [52,53], we discussed the importance of providing validation information and estimating the uncertainty associated with indicators calculated from EO data sets. Despite the pandemic, we obtained on-site measurements of 21 schools from one state and consider these to be a representative sample of the common single-storey building designs used across rural areas in the country. Hence, the results of this study have been validated by both on-site measurements and statistical analysis using socio-economic indicators.
Overcrowded schools result from increases in student enrolment that are unmatched by school resources. School overcrowding and poverty are two distinct social issues, but the literature shows that they are related through a cause-effect relationship (shown in Figure 9). Insufficient classroom area can indirectly cause poverty, as children drop out of primary school without achieving minimal academic performance [54,55], and it is also reflected in literacy and numeracy results [14,23-25]; conversely, poverty (a state not having enough money to spend on education) causes a lack of school resources [56-58]. Our study has shown that EO approaches can be important, efficient, enabling tools, in this case revealing school overcrowding and helping authorities to make progress on a number of the UN SDGs. Figure 9 provides a conceptual model showing where EO can play a key role in making progress on the critical socio-economic relationships that affect the achievement of both SDG 4 for quality education and promotion of lifelong learning and SDG 1 on poverty alleviation.
This study supports the need for an increased awareness of the value of satellite EO approaches for identifying both the specific example of overcrowded classrooms and, more generally, in supporting the socio-economic SDGs (as well as the environmentally focussed ones). Given the challenges involved in implementing the SDGs, the adaptation of a number of indicators to enable them to make use of readily available EO data presents a valuable opportunity to calibrate progress efficiently and economically.

Conclusions
The following conclusions can be drawn from this research:

•
Overcrowded classrooms with less than 1.2 m 2 /pupil in rural primary schools in Nigeria were readily identified using satellite EO tools in combination with available school enrolment data.

•
Results show that 81.4% (±0.2%) of schools measured are overcrowded, and this is reflected in educational attainment (using literacy and numeracy rates) and poverty levels. The use of satellite images offers cost- and time-efficient data to support improvements to education in Nigeria and elsewhere, particularly for those schools with one floor, and a simple measurement model and Monte Carlo analysis can provide an uncertainty for those satellite estimations that can be used to assess the fitness-for-purpose of the satellite data.

•
Assessing pupil density using satellite EO can provide important information to help progress towards the UN SDGs for quality education and lifelong learning (SDG 4), equal access to opportunities (SDG 10), and poverty reduction (SDG 1). In wider terms, this study has also highlighted how EO-derived information can offer effective and complementary support for sustainable development, including for indicators that are more closely aligned with social dimensions.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su14031408/s1. Table S1. Information about the on-site school measurements; Table S2. On-site measurements compared to satellite image measurements; Table S3. Python code used to calculate school overcrowding and the associated uncertainty; Table S4. Further explanation of differences between original calculations and the Monte Carlo analysis; Table S5. Monte Carlo analysis for individual schools; Figure S1. School number 6 (blue dots) and 36 (grey dots) show the 50 Monte Carlo (MC) runs and the original measurements (brown and green lines) of the teaching area (m²)/pupil; Figure S2. In blue is the standard deviation of the school teaching area per pupil as a function of school teaching area per pupil. Negative values are given (calculated as -1 times the standard deviation) as well, to show the full spread. In orange is the bias, calculated as the difference between the mean of the Monte Carlo output and the teaching area per pupil calculated from the original dataset; Figure S3. In blue is the standard deviation of the school teaching area per pupil as a function of school teaching area per pupil. Negative values are given (calculated as -1 times the standard deviation) as well, to show the full spread. In orange is the bias, calculated as the difference between the mean of the Monte Carlo output and the teaching area per pupil calculated from the original dataset.