Reporting on the Sustainable Development Goals (SDGs) (2015–2030) has become more complicated than reporting on the Millennium Development Goals (MDGs), given the increase in the number of goals, targets, and indicators. SDG 6 [1] focuses on ensuring the availability and sustainable management of water and sanitation for all. It has eight targets and eleven indicators, whereas the MDGs had no separate goal on water and sanitation and only three related indicators under Goal 7 (ensure environmental sustainability). Traditionally, reporting has relied on census data from the National Statistics Office (NSO) and household surveys from ministries. Fritz et al. [2] give an overview of new data sources that have become available for measuring the SDGs. The ongoing digitization of society has led to an exponential increase in the volume of so-called Big Data. Big Data is not only large in volume but is also produced continuously and varies in nature (structured and unstructured data). In addition to Big Data, Small Data is also increasingly being unlocked. Small Data is data from a wide variety of stakeholders, produced in a tightly controlled way using sampling techniques that limit its scope, temporality, size, and variety [3].
In terms of Big Data, See et al. [4] provide an overview of the value of combining remote sensing and geospatial data for more effective monitoring of SDGs. Geospatial data is defined here as data with explicit geographical locations. Walz et al. [5] show how remote sensing data and geostatistical data can be used to monitor the progress of an indicator from a global framework at the municipality level. In this case, the approach is tested on the Sendai framework, but the same approach can, in principle, also be tested for specific SDG indicators. We note that remote sensing imagery also contains geospatial data in terms of the bounding box of the images and the spatial reference system; however, in this case, no precise locations or objects are identified beforehand. Examples of geospatial data are mobile phone or social media data and Volunteered Geographic Information (VGI). ICT platforms have been developed that allow both professionals and citizens to report on water points via mobile devices (mostly phones) [6]. Fraisl et al. [7] mapped citizen science contributions to the UN SDGs and showed that, based on the mapping exercise that the Group on Earth Observations (GEO) undertook [8], citizen science could support 24 of the 29 indicators identified by GEO. The Global Partnership for Sustainable Development Data (GPSDD) advocates for this kind of citizen-generated data [9]. It can complement official data sources, fill in data gaps, and give those who are hard to reach a voice on issues that matter the most to them. Georeferenced tweets or posts on social media can contain information on the functioning of public infrastructure. However, most developing countries where the monitoring of SDG 6 is essential have low internet and social media penetration rates, especially in rural areas. Call detail records can form a proxy for the number of users of a water point but are very hard to get access to and are often biased. Missing Maps [10] is an open VGI collaboration founded by the Humanitarian OpenStreetMap Team (HOTOSM), Médecins Sans Frontières (MSF), and the British and American Red Cross. The objective of this project is to map the most vulnerable places in the developing world so that humanitarian organizations can use these maps and data to better respond to crises. Through the Tasking Manager of Missing Maps, organizations can ‘request’ remote volunteers to trace aerial imagery for a particular area. The created polygons, lines, points, and attribute information are stored as free and open data in the OpenStreetMap (OSM) database and can be accessed like other regular OSM extracts.
In terms of Small Data, humanitarian and development organizations regularly collect data on water and sanitation through household surveys, usually in the areas where they intervene. The WHO/UNICEF Joint Monitoring Programme (JMP) collects household data globally on Water, Sanitation and Hygiene (WASH) through surveys and aggregates these at the country level [11]. However, many other humanitarian and development actors also collect data that relate specifically to, for example, SDG indicator 6.1.1 (proportion of population using safely managed drinking water services). Van den Homberg and Susha [12] developed a framework to characterize a data ecosystem and applied it to water points in Malawi. The framework consists of five dimensions: data infrastructure, data supply, data demand, data governance, and actors. Results show that many governmental and NGO actors are involved in water supply projects with different funding sources and little overall governance. There is a large variety of geospatial data sharing platforms and online accessible information management systems, but their adoption is low due to limited internet connectivity and low data literacy. The framework was also used to characterize the data quality of these data sources and to identify gaps, such as much of the data not being open. Verplanke and Georgiadou [13] describe the complexity of establishing an open database to map all rural water points in an African nation (Tanzania). When bringing together unharmonized data from different sources, measurement errors have to be inventoried and characterized. These errors can be material, observational, conceptual, or discursive [13]. Taking water quality as an example, some measurement methodologies are based on visual inspection or tasting of the water and are thus subjective, whereas other methods rely upon chemical test kits [13].
These new Big and Small Data sources offer opportunities to complement official statistics for reporting on SDGs, especially at the subnational level. Little attention has been devoted to SDG information at the subnational level. The UN SDG 6 report [14] identifies the lack of data in rural areas as a challenge for tracking progress. The recently launched UN-Water SDG 6 Data Portal [15] makes data on SDG 6 available in a user-friendly interface but is limited by the level at which data are available. For many countries, such as Malawi, local data are lacking in the portal, which restricts the usability of the data for decision-makers. UN-Water has introduced so-called data drives, where custodian agencies support the focal points in compiling data from different sources in a variety of ways, such as providing methodologies, helpdesks, webinars, and workshops [14]. In June 2020, Malawi published its first Voluntary National Review report for the SDGs [16]. It describes how, at the subnational level, local councils can coordinate the implementation and monitoring of the SDGs. However, the review admits that tracking progress and reporting on various initiatives need to be strengthened at these local levels. The description of SDG 6.1.1 clearly shows this, as it gives only national-level numbers and only covers progress in the period before 2016.
This paper assesses how Big Data can complement Small Data to improve subnational reporting on SDG 6.1.1. For a case study in Malawi, we compare high- and low-resolution satellite imagery with Unmanned Aerial Vehicle (UAV) imagery to find out which remote sensing imagery has sufficient resolution for water point identification. We assess the added value of combining the UAV imagery, VGI, and field survey data from different data providers by (a) cross-validating and resolving discrepancies in information on water point attributes from these different data providers and (b) enriching information on attributes or assessing whether attributes can be added.
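Step (a), cross-validating attribute information between data providers, can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes that each provider's water points are available as records with WGS84 coordinates, and the field names (`id`, `lat`, `lon`, `type`, `functional`), the 50 m matching threshold, and the sample records are all hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def cross_validate(source_a, source_b, max_dist_m=50.0, attributes=("type", "functional")):
    """Match each record in source_a to its nearest record in source_b.

    Returns (matches, unmatched). Each match lists the attributes on which
    the two providers disagree; unmatched holds ids from source_a that have
    no counterpart within max_dist_m (an assumed, hypothetical threshold).
    """
    matches, unmatched = [], []
    for a in source_a:
        # Brute-force nearest neighbour; adequate for small pilot areas.
        nearest = min(
            source_b,
            key=lambda b: haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]),
            default=None,
        )
        if nearest is None:
            unmatched.append(a["id"])
            continue
        dist = haversine_m(a["lat"], a["lon"], nearest["lat"], nearest["lon"])
        if dist > max_dist_m:
            unmatched.append(a["id"])
            continue
        conflicts = {
            key: (a.get(key), nearest.get(key))
            for key in attributes
            if a.get(key) != nearest.get(key)
        }
        matches.append({"a": a["id"], "b": nearest["id"], "dist_m": dist, "conflicts": conflicts})
    return matches, unmatched


# Hypothetical records from a field survey and from UAV image interpretation.
survey = [
    {"id": "S1", "lat": -15.0000, "lon": 35.0000, "type": "borehole", "functional": True},
    {"id": "S2", "lat": -15.0100, "lon": 35.0100, "type": "tap", "functional": True},
]
uav = [
    {"id": "U1", "lat": -15.0001, "lon": 35.0000, "type": "borehole", "functional": False},
]

matches, unmatched = cross_validate(survey, uav)
for m in matches:
    print(m["a"], "matches", m["b"], "conflicts:", m["conflicts"])
print("unmatched:", unmatched)
```

The matching here is one-directional and allows several records to match the same counterpart, so a real workflow would need a rule for duplicates and a manual check of each reported conflict.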
summarizes how remote sensing, VGI, and field survey data can be combined to get more information on water points and their attributes. The left column is a long list of all the attributes found in the different water point datasets as provided by both governmental and NGO data providers [12]. We added an attribute on the number of users per water point.
UAV images typically have a higher resolution than satellite images and are therefore more suitable for water point detection. The images we were able to collect with UAVs have a resolution of 11 cm, which is sufficient for the identification of water points. Satellite imagery of 30 cm resolution enables identifying the somewhat larger or walled water points, whereas 50 cm satellite imagery does not. The combination with OSM building footprint data is powerful and yields additional insights. However, the combination of OSM and UAV data did not fill all information gaps. Although our pilot shows that UAV imagery is promising for closing some information gaps, field surveys will remain necessary. Water quality, water point management, and whether a water point provides a free or a paid service cannot be determined from UAV imagery. Increased adoption of data collection tools that can capture spatial data, instead of paper or other non-spatial collection methods, contributes to filling information gaps. Nevertheless, combining remote sensing data with field survey data can play an important role, especially if, for example, the remote sensing data are taken at more regular intervals than the field surveys.
Besides information gaps, combining remote sensing, VGI, and field survey data is essential for addressing geographical gaps. Location analysis can be used to identify the spatial deficits in information coverage. Once these are identified, new data can be collected efficiently in target areas. This can be done either by conducting field surveys or by UAV missions as described above. The OSM community can support this by mapping the targeted areas on the UAV imagery.
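One simple form of such a location analysis is a grid overlay: cells that contain mapped buildings but no recorded water point are candidate areas for new data collection. The sketch below illustrates that idea only and is not the analysis performed in this study. The function names, the 0.01 degree cell size, and the sample coordinates are hypothetical.

```python
import math
from collections import Counter


def cell_of(lat, lon, cell_deg=0.01):
    """Return the (row, column) index of the grid cell containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def coverage_gaps(buildings, water_points, cell_deg=0.01, min_buildings=1):
    """List grid cells that hold buildings but no recorded water point.

    buildings and water_points are iterables of (lat, lon) tuples, for
    example OSM building centroids and water point records.
    """
    building_counts = Counter(cell_of(lat, lon, cell_deg) for lat, lon in buildings)
    covered = {cell_of(lat, lon, cell_deg) for lat, lon in water_points}
    return sorted(
        cell
        for cell, count in building_counts.items()
        if count >= min_buildings and cell not in covered
    )


# Hypothetical building centroids and water point locations.
buildings = [(-15.005, 35.005), (-15.006, 35.004), (-15.015, 35.005)]
water_points = [(-15.004, 35.006)]

gaps = coverage_gaps(buildings, water_points)
print(gaps)
```

A cell without a recorded water point may also simply lack a water point, so the output is best read as a list of areas to check rather than a list of confirmed data gaps.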
In terms of the different aerial imagery available, satellite imagery would be the most scalable compared to UAV and aerial photography. Commercial satellite data providers can capture any place on earth at very regular intervals and usually offer resolutions from 30 cm onwards. However, funding issues for high-resolution satellite imagery will need to be resolved. Prices for 50 cm resolution are typically between $20 and $40 per km² ], and hence higher resolution will most likely be more expensive. RCMRD gives the surface area of Malawi as 118,484 km², which would amount to a cost of over $2.3 million. This cost will be an upper limit, as satellite providers probably reduce the cost per km² if one buys imagery for a very large area. RCMRD charged NSO US $1.1 million for technical assistance and the satellite imagery for the whole of Malawi (but for 0.5 m up to 2.5 m resolution) [18]. To compare, Malawian drone companies typically request around US $5 per hectare, which amounts to US $500 per km². Costs for nationwide field surveys are more difficult to determine. According to [2] and [26], sample-based methods such as household surveys cost on average between US $460,000 and US $1.7 million, depending on the type of survey used.
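The arithmetic behind these figures can be checked with a short calculation. The unit prices and the surface area are the ones quoted above. The nationwide UAV total is a purely illustrative linear extrapolation, assuming no volume discounts, and is not a figure reported in this study.

```python
# Figures quoted in the text.
AREA_KM2 = 118_484              # surface area of Malawi according to RCMRD
SAT_PRICE_PER_KM2 = (20, 40)    # US $ per km2 for 50 cm satellite imagery
UAV_PRICE_PER_HA = 5            # US $ per hectare, Malawian drone companies
HA_PER_KM2 = 100                # 1 km2 = 100 hectares

# Nationwide satellite coverage at the low and high unit price.
sat_low = AREA_KM2 * SAT_PRICE_PER_KM2[0]
sat_high = AREA_KM2 * SAT_PRICE_PER_KM2[1]

# UAV unit price converted to km2, then extrapolated linearly (assumption).
uav_per_km2 = UAV_PRICE_PER_HA * HA_PER_KM2
uav_nationwide = AREA_KM2 * uav_per_km2

print(f"Satellite, 50 cm: US ${sat_low:,} to US ${sat_high:,}")
print(f"UAV: US ${uav_per_km2} per km2, US ${uav_nationwide:,} if extrapolated nationwide")
```

At the lower unit price the satellite total is about US $2.37 million, which matches the figure of over $2.3 million above. The UAV unit price is more than ten times the higher satellite price per km², which is consistent with using UAV imagery for small target areas rather than nationwide coverage.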
The main implication of our research is that we have created, for small areas, a cost-effective workflow that can produce an up-to-date and consistent water point dataset. This workflow combines UAV imagery, VGI, and field survey data. Important in combining these heterogeneous datasets is the information on buildings mapped through the open VGI platform OSM, which enables easy and systematic scaling of the mapping to other areas. The resulting water point data fill a gap in the data needs for monitoring water-related SDGs at the subnational level, as they provide more detail on the service level of water points at the household level. Our analysis clarifies the added value per data source, given that the attribute information and quality vary.
Future research will assess how collaboration with other organizations doing UAV analysis in Malawi can potentially enable scaling of the approach tested in this pilot. OpenAerialMap [27] already has some UAV imagery from different areas of Malawi available. The expectation is that the amount of openly available imagery will grow, given that more and more low-cost UAVs are becoming available and more and more governmental and humanitarian organizations are starting to use them. We will also look into using other geospatial data such as the High-Resolution Settlement Layer, as it gives a high-resolution estimate of the number of people living in an area. Digital Elevation Models extracted from UAV imagery can serve to identify which water points will be most at risk of getting flooded and for which return periods. Overall, building data collaboratives [12] will be essential to align the different data collection efforts and facilitate data sharing among the many actors involved in the fragmented WASH sector.