Generating Up-to-Date and Detailed Land Use and Land Cover Maps Using OpenStreetMap and GlobeLand30

With the opening up of the Landsat archive, global high-resolution land cover maps have begun to appear. However, they often have only a small number of high-level land cover classes and they are static products, corresponding to a particular period of time, e.g., the GlobeLand30 (GL30) map for 2010. OpenStreetMap (OSM), in contrast, consists of a very detailed, dynamically updated spatial database of mapped features from around the world, but it suffers from incomplete coverage and from layers of overlapping features that are tagged in a variety of ways. Nevertheless, it clearly has potential for land use and land cover (LULC) mapping. The aim of this paper is therefore to demonstrate how OSM can be converted into a LULC map and how this OSM-derived LULC map can then be used, first, to update the GL30 with more recent information and, second, to enhance the information content of the classes. The technique is demonstrated on two study areas where OSM data are available but authoritative data are lacking, i.e., Kathmandu, Nepal and Dar es Salaam, Tanzania. The GL30 and its updated and enhanced versions are independently validated using a stratified random sample so that the three maps can be compared. The results show that the updated version of GL30 improves in terms of overall accuracy since certain classes were not captured well in the original GL30 (e.g., water in Kathmandu and water/wetlands in Dar es Salaam). In contrast, the enhanced GL30, which contains more detailed urban classes, shows a drop in overall accuracy, possibly due to the increased number of classes, but its advantages include the appearance of more detailed features, such as the road network, which become clearly visible.


Introduction
Land cover is an essential climate variable [1] as it strongly influences current and future climate, particularly with rapid changes in the landscape due to human activities [2]. Baseline information on land use and land cover (LULC) is also a key input to many different types of models

Data Sources
In this section, the two main datasets used in this paper, i.e., OSM and GL30, are described. In addition, a brief description of the Urban Atlas (UA) is provided since its nomenclature is used for deriving the LULC product from OSM.

OpenStreetMap (OSM)
The OSM project was initiated in 2004 based on Steve Coast's vision that a global map of the world could be crowdsourced using the extensive local knowledge of people living and working in the areas to be mapped. Thirteen years after the start of the project, OSM has indeed become the largest, most diverse, most complete and most up-to-date open access geospatial database in the world. Any object that is physically located on the Earth's surface (from forests, lakes and buildings to detailed features such as benches, drinking fountains and manholes) can be added to the OSM database. Rather than producing map products or cartographic outputs, the focus of the OSM project is to maintain a living geospatial database of the world. OSM data are available under the Open Database License (ODbL) [12], an inclusive license allowing anyone to freely copy, distribute, share and adapt the database provided that attribution is made to the OSM project and its contributors and that derived datasets are released under the same license.
The OSM database is comprised of vector data (i.e., the geometry and the associated attributes for each feature). The data model is made up of three kinds of geometric primitives: nodes (which are encoded points), ways (sets of nodes used for encoding linear and polygonal features) and relations (logical collections of two or more nodes, ways or other relations). Each of these primitives is associated with one or more attributes or tags. A tag is comprised of a key and a value. Although OSM contributors are free to use their own tags, there is a quasi-official collection of tags that has been established and agreed upon over the years by the global OSM community. The starting point of this collection is the Map Features Wiki page [13] that links to numerous other detailed pages on the OSM Wiki and explains the likely usage and use-case scenarios of each tag. For example, to identify a river, the tag "waterway = river" should be used, where "waterway" is the key and "river" is the value.
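The data model described above can be illustrated with a minimal Python sketch. The class names, field names, identifiers and the helper has_tag are hypothetical and chosen only for illustration; they do not correspond to any particular OSM library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A tag is a key/value pair, e.g. {"waterway": "river"}.
Tags = Dict[str, str]


@dataclass
class Node:
    """A single encoded point."""
    id: int
    lat: float
    lon: float
    tags: Tags = field(default_factory=dict)


@dataclass
class Way:
    """An ordered set of node ids encoding a linear or polygonal feature."""
    id: int
    node_ids: List[int]
    tags: Tags = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A closed ring needs at least three distinct nodes plus the repeated first node.
        return len(self.node_ids) > 3 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    """A logical collection of nodes, ways or other relations."""
    id: int
    members: List[Tuple[str, int]]  # (member type, member id)
    tags: Tags = field(default_factory=dict)


def has_tag(element, key: str, value: str) -> bool:
    """Return True if the element carries the tag key = value."""
    return element.tags.get(key) == value


# Hypothetical ids, used only to show the structure.
river = Way(id=1, node_ids=[10, 11, 12], tags={"waterway": "river"})
lake = Way(id=2, node_ids=[20, 21, 22, 20], tags={"natural": "water"})
```

Here has_tag(river, "waterway", "river") is true, and only the second way is a closed ring that could encode a polygon.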
Anyone can contribute data to OSM provided that a user account is first created. At the time of writing (February 2017), more than 3 million users had registered in the OSM [14], although some studies have shown that most of the mapping is performed by a very small fraction of volunteers [15]. Users can contribute data to the project in three ways: (1) by digitizing features from satellite imagery (armchair mapping); (2) by inserting/uploading elements that have been physically surveyed in the field, for instance using a GPS (Global Positioning System) receiver (field mapping); and (3) by uploading other datasets that have been released under an open and compatible license. Access to OSM data via application programming interfaces (APIs) has also enabled the development of a wealth of other OSM-based software, websites, services and applications that make direct use of OSM data for a variety of purposes (see [16] for an extensive and up-to-date list).

Humanitarian Applications of OSM
The digital empowerment of both affected communities and volunteer networks is reshaping humanitarian responses in the twenty-first century [17]. Over the last few years, an increase in the mapping and use of OSM data has been associated with humanitarian applications in the less developed countries of the world, which were typically absent or underrepresented in OSM [18]. This is a result of the efforts of the Humanitarian OpenStreetMap Team (HOT), an international, nonprofit organization that leads collaborative mapping in OSM when a disaster strikes anywhere in the world. HOT was formed after the tragic earthquake in Haiti in 2010, when remote mapping by volunteers from around the world became crucial in assisting the humanitarian aid work in the field [19]. Another initiative, the Missing Maps (MM) project [20], is a consortium of organizations (including HOT) that supports the creation of maps in the most vulnerable areas of the world, which are subject to, e.g., epidemics, political crises, natural hazards and other risks, and where maps do not currently exist. Maps are created remotely by volunteers and then enriched by local organizations working in the field.
Remote volunteers often produce OSM data through the organization of so-called mapathons (literally "map marathons"), i.e., social events where experienced and novice contributors meet together in a room (e.g., at a university) and focus their efforts on a specific area that needs armchair mapping [21]. In contrast to mapathons, mapping parties are field mapping events where volunteers aim to improve the local OSM map of an area [22][23][24]. Both types of mapping actions may have humanitarian purposes. Latif et al. [25] describe the challenges of stimulating voluntary mapping efforts in the local community of a disaster-prone country like Bangladesh. Feinmann [26] presents an ambitious project from MM that is being undertaken in collaboration with Médecins Sans Frontières (MSF), HOT, and the British and American Red Cross. The aim is to add 200 M addresses to the OSM world map in two years via numerous mapathons held around the world. The OpenStreetMap Haiti project, which focused on the remote mapping of Haiti after the earthquake, is presented by Shemak [27], while Moeller and Furhmann [18] describe the mapathons organized by the American Red Cross and HOT (both partners of MM) to fight the 2014 Ebola outbreak in Western Africa. Ebrahim et al. [28] describe a humanitarian mapathon for Swaziland that was co-organized with HOT and MM and carried out by 200 ten-year-old children.
Humanitarian mapping in OSM has often proven to provide the only available source of map data and thus becomes a crucial source of information in emergency situations as well as for other preventive or planning needs such as census taking, risk mapping, control of outbreaks, etc. As an example, Figure 1 provides a meaningful comparison between Google Maps and OSM for the Kibera slum in Nairobi (Kenya) obtained using the Map Compare tool from Geofabrik [29]. While this slum is totally absent in the former, the latter shows the impressively high level of detail that has resulted from the work of the local volunteers in the frame of the Map Kibera project [30]. Situations like this one clearly show the potential of using OSM data to derive accurate map products like LULC maps, as discussed in Section 4.

GlobeLand30
GL30 has been produced by the National Geomatics Center of China (NGCC) as one of the first global 30-m land cover products derived from freely available Landsat imagery [5]. Baseline maps for the years 2000 and 2010 were created using more than 10,000 Landsat images to produce maps with 10 high-level land cover classes: water bodies; wetland; artificial surfaces; cultivated land; permanent snow and ice; forest; shrubland; grassland; bareland; and tundra. Pixel- and object-based classification approaches were combined to extract the classes in a hierarchical manner, starting with water bodies and ending with tundra. These classes were then combined with ancillary layers such as other global, regional and national land cover products, OpenStreetMap and very high resolution satellite imagery from Google Earth, among others, via a customized web-based information service in order to label the objects and verify the results with knowledge related to specific geographical areas.
The GL30 product was validated globally for 2010 and was found to have an overall accuracy greater than 80% [5]. For artificial surfaces, it was also compared to other global land cover products for eight test areas around the world and accuracies ranged from 79% to 97%, outperforming both CORINE Land Cover (CLC) and the FROM-GLC product of Gong et al. [4]. A few other studies have independently compared GL30 to authoritative products in Italy [31] and Germany [32], to water bodies in Scandinavian countries [33] and, more recently, to land cover in Iran [34], with high agreements, i.e., greater than 78%. Other than the comparison for Iran, there has been little validation of this product outside Europe, and particularly in less developed countries, which may benefit from such a product if no other land cover maps are available. Table 1 shows the GL30 nomenclature along with the minimum mapping unit (MMU), which varies by class.

Urban Atlas
The Global Monitoring for Environment and Security Urban Atlas, hereafter referred to as the Urban Atlas (UA), is a pan-European product that provides a detailed land cover and land use classification of European cities with a population greater than 100 K in the EU [35], although more cities with populations greater than 50 K were added in 2012. The UA is intended as an input to evidence-based policy making, and allows for comparison between, and benchmarking of, major European cities. For example, studies related to the amount of greenspace or accessibility studies can be undertaken with such a product. More than 300 cities were mapped for the baseline year of 2006 using high resolution satellite imagery together with other reference data such as commercial navigation products; cadastral and zoning data; local city maps; and aerial photographs, where available. Google Earth has also been used for verification of the locational and thematic accuracy of the classes.
The UA has a minimum mapping unit (MMU) of 0.25 ha (0.0025 km²) for artificial classes and 1 ha (0.01 km²) for other land cover classes. The positional accuracy is ±5 m with a minimum thematic accuracy of 85% for artificial surface classes and 80% for other land cover types [35]. The UA is introduced in this paper because of the detailed urban nomenclature that it provides. This nomenclature is used to convert OSM data to a LULC map and then to produce an updated and enhanced version of the GL30 LULC map, as described in Section 4.
The UA is organized into four levels of hierarchy with increasing levels of detail. Table 2 lists the classes of the first three levels.

Kathmandu, Nepal
Between April and May 2015, Nepal was hit by a severe earthquake followed by many strong aftershocks, which caused widespread damage in the capital city of Kathmandu and the surrounding regions. Driven by HOT and locally coordinated by the nonprofit Kathmandu Living Labs (KLL) [36], the humanitarian mapping efforts have seen the participation of thousands of volunteers from around the world. The produced maps, which in many regions were completely missing before the event, were then used to support relief, rescue and rebuilding operations. Thanks to the coordination and training efforts of KLL, impressive field mapping was also performed locally, which has currently made the OSM Kathmandu map highly detailed and up-to-date. The 18 km × 18 km study area, which includes Kathmandu city, is shown in Figure 2.

Dar es Salaam, Tanzania
With a population of over 4 million people, Dar es Salaam in Tanzania is the largest city in East Africa [37]. The city is particularly prone to flooding, which every year during the rainy season causes many deaths and millions of dollars of damage that could be prevented with adequate planning. Since much of the city consists of unplanned and informal settlements, the Dar Ramani Huria community-based mapping project [38] was started after the severe flood event of May 2015 to raise awareness of flood resilience. The project has trained university students and local citizens to create highly accurate OSM maps of the most flood-prone areas of the city, which were previously unmapped [39]. Maps are also delivered in printed form to the local governing bodies. The project, still ongoing, is managed by the Tanzanian Commission for Science and Technology, with partners including the City Council of Dar es Salaam, Ardhi University, University of Dar es Salaam, Buni Innovation Hub and HOT, with support from the Red Cross, the Global Facility for Disaster Reduction and Recovery and the World Bank. As a result of this project, which has also involved remote volunteers through the HOT channels (see e.g., [38]), Dar es Salaam is now one of the most densely mapped areas in the whole OSM database. Figure 3 shows the 18 km by 18 km study area in Dar es Salaam.



Conversion from OSM to LULC
There is a large diversity of tags in OSM, and, as mentioned in Section 2.1, volunteers may even create their own tags. One of the keys available in OSM is "landuse". Some of the values used for this key may have a direct conversion to LULC classes, such as "forest", "vineyard" or "residential". However, a large percentage of other proposed keys may also provide information on LULC, such as the keys "building", "highway" or "amenity". Hence, a methodology was developed to convert OSM features into LULC classes, which is shown schematically in Figure 4. This methodology was implemented for the set of tags listed in the OSM Map Features Wiki page that were considered to be the most relevant for extracting LULC information. However, as contributors are free to create new tags, whenever the conversion is made, an analysis of the available tags needs to be done for that specific study area and time of download, as additional important tags may exist. In a previous study by Fonte et al. [40], this conversion was already made for the UA and CLC nomenclatures.
The implemented methodology requires not only the identification of the nomenclature and the mapping of the OSM features into it, but also the use of rules and parameters that enable, among other aspects: (1) the conversion of linear features, such as roads and waterways (which are represented as polylines in OSM), into polygons; (2) the identification of the most appropriate class to be assigned to tags that may be associated with more than one LULC class, such as grass (which may represent urban gardens, agricultural fields, pastures, or natural vegetation) or water (which may correspond to an urban lake inside a park, a river or the ocean); and (3) the resolution of OSM data inconsistencies, where different non-compatible classes are assigned to the same location.
Finally, as it may be convenient to create LULC maps with a particular MMU, the maps obtained through this process can be generalized, which requires the definition of additional rules. However, generalization was not considered in this paper.
From a technical perspective, the procedure described above was implemented in a Free and Open Source Software for Geospatial (FOSS4G) environment. The technologies used include GRASS GIS (Geographic Resources Analysis Support System) [41], GDAL/OGR (Geospatial Data Abstraction Library) [42], and PostgreSQL [43] with its spatial extension PostGIS [44], as well as specific tools for managing OSM data such as osm2pgsql [45] (for PostgreSQL/PostGIS) and Osmosis [46]. Python was the main language used to integrate and manipulate the spatial processing of the data. How these FOSS4G technologies were used to accomplish the steps shown in Figure 4 is described in detail by Fonte et al. [40].
To be able to compare the LULC maps extracted from OSM with GL30, the tags associated with OSM features were also mapped to the classes of the GL30. Table 3 shows the mapping between the OSM tags considered for the study areas to both the UA and the GL30 nomenclature.
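As a rough illustration of how a tag-to-class mapping and a conflict rule might be expressed, the following Python sketch assigns GL30 codes to OSM tags. The lookup entries, the "*" wildcard convention, the priority order and the function names classify and resolve are all assumptions made for this example; they do not reproduce Table 3 or the rules of Fonte et al. [40].

```python
from typing import Dict, Iterable, Optional, Tuple

# Illustrative (key, value) -> GL30 code lookup (NOT the paper's Table 3).
# "*" is a hypothetical wildcard meaning "any value of this key".
TAG_TO_GL30: Dict[Tuple[str, str], int] = {
    ("landuse", "forest"): 20,       # forest
    ("landuse", "vineyard"): 10,     # cultivated land
    ("landuse", "residential"): 80,  # artificial surfaces
    ("waterway", "river"): 60,       # water bodies
    ("building", "*"): 80,
    ("highway", "*"): 80,
}

# Assumed priority for resolving incompatible classes at one location:
# the first code in this list that is present wins.
PRIORITY = [80, 60, 50, 20, 40, 30, 10]


def resolve(codes: Iterable[int]) -> Optional[int]:
    """Pick one class from several candidates using the assumed priority."""
    present = set(codes)
    for code in PRIORITY:
        if code in present:
            return code
    return None


def classify(tags: Dict[str, str]) -> Optional[int]:
    """Return a GL30 code for one feature, or None if no tag is relevant."""
    candidates = set()
    for key, value in tags.items():
        # Try the exact (key, value) pair first, then the key-level wildcard.
        code = TAG_TO_GL30.get((key, value), TAG_TO_GL30.get((key, "*")))
        if code is not None:
            candidates.add(code)
    return resolve(candidates)
```

With these illustrative entries, a feature tagged only with amenity = bench yields no class, which corresponds to a tag that is not considered relevant for LULC purposes.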

Comparison of OSM-Derived LULC Map with GlobeLand30
The comparison between the LULC map derived from OSM using the GL30 nomenclature and GL30 was made through: (1) computation of the percentage of unmapped regions in the OSM-derived map relative to the study area; and (2) a direct comparison of the maps for those regions with data in both maps (OSM-derived and GL30). In order to undertake this comparison, the vector LULC map derived from OSM data was first rasterized using a pixel size of 30 m to match the resolution of GL30. A confusion matrix was then calculated. The overall agreement of the overlapping regions was computed, as well as the marginal proportions of agreement [47]. In addition, a binary disagreement map was computed, which shows the spatial distribution of the classification agreement/disagreement.
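The comparison steps can be sketched in plain Python as follows. This is a minimal illustration, assuming that both maps are already on the same 30 m grid, that None marks a pixel without data, and that rows of the confusion matrix refer to the OSM-derived map and columns to GL30; the function name compare_maps and the dictionary keys are hypothetical.

```python
from collections import Counter
from typing import List, Optional

Raster = List[List[Optional[int]]]  # None marks a pixel with no data


def compare_maps(osm: Raster, gl30: Raster) -> dict:
    """Compare two co-registered class rasters pixel by pixel."""
    total = 0
    unmapped = 0
    counts = Counter()   # (OSM class, GL30 class) -> number of pixels
    disagreement = []    # 1 = disagree, 0 = agree, None = not comparable
    for osm_row, gl_row in zip(osm, gl30):
        out_row = []
        for a, b in zip(osm_row, gl_row):
            total += 1
            if a is None:
                unmapped += 1
            if a is None or b is None:
                out_row.append(None)
                continue
            counts[(a, b)] += 1
            out_row.append(0 if a == b else 1)
        disagreement.append(out_row)

    n = sum(counts.values())
    classes = sorted({c for pair in counts for c in pair})
    row_tot = {c: sum(counts[(c, k)] for k in classes) for c in classes}
    col_tot = {c: sum(counts[(k, c)] for k in classes) for c in classes}
    return {
        "unmapped_percent": 100.0 * unmapped / total if total else float("nan"),
        "overall_agreement": (
            sum(counts[(c, c)] for c in classes) / n if n else float("nan")
        ),
        # Marginal proportions of agreement per class (rows: OSM, columns: GL30).
        "rmpa": {c: counts[(c, c)] / row_tot[c] for c in classes if row_tot[c]},
        "cmpa": {c: counts[(c, c)] / col_tot[c] for c in classes if col_tot[c]},
        "disagreement": disagreement,
    }
```

Only pixels with data in both maps enter the confusion matrix, while the unmapped percentage is computed relative to the whole grid.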

Update of GlobeLand30 through the OSM-Derived LULC Map
The OSM-derived LULC map (with the GL30 nomenclature) was used to update the original GL30 product, which corresponds to the year 2010. To do this, the OSM-derived LULC map was rasterized to match the same 30 m grid as the GL30 for Kathmandu and Dar es Salaam. To minimize the loss of detail in the OSM data, the rasterization was performed using the prevalence method, i.e., the final 30 m pixel is assigned the class that has the largest vector presence in the pixel. The updated GL30 is then produced using the following rule, which is applied to each pixel:

1. If a pixel contains OSM-derived LULC data, update the GL30 with this information;
2. If a pixel contains no OSM-derived LULC data (i.e., a null value), then retain the original value of the GL30.
The original and the updated versions of GL30 were then compared by analyzing the difference in the relative proportions of the different LULC classes in order to understand the additional contribution of the OSM data to the updated map.
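The per-pixel update and the subsequent comparison of class proportions can be sketched as follows. The sketch assumes that both rasters share the same 30 m grid and that None represents a null value; the function names update_gl30 and class_proportions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Raster = List[List[Optional[int]]]  # None marks a null pixel


def update_gl30(gl30: Raster, osm: Raster) -> Raster:
    """Use the OSM-derived class where available, otherwise keep GL30."""
    return [
        [g if o is None else o for g, o in zip(g_row, o_row)]
        for g_row, o_row in zip(gl30, osm)
    ]


def class_proportions(raster: Raster) -> Dict[int, float]:
    """Percentage of non-null pixels occupied by each class."""
    counts = Counter(v for row in raster for v in row if v is not None)
    n = sum(counts.values())
    return {c: 100.0 * k / n for c, k in counts.items()} if n else {}
```

Calling class_proportions on the original and on the updated raster gives the two distributions whose differences indicate the contribution of the OSM data.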

Enhancing GlobeLand30 through the Use of More Detailed OSM-Derived Data
In this section we go one step further and consider how to enhance the information content of the GL30 by adding detailed information that is available in OSM. The GL30 has 10 high-level classes as shown previously in Table 1, with only one class for urban areas (or artificial surfaces), while the UA nomenclature contains a number of detailed classes on urban areas (Table 2). Hence it is possible to enhance the artificial surfaces class of the GL30 with more detailed classes from the OSM-derived LULC map. The updated GL30 map (generated with the approach described in Section 4.3) is used as the starting layer. The non-urban classes, i.e., cultivated land (10), forest (20), grassland (30), shrubland (40), wetland (50) and water bodies (60), remain unchanged. Four new classes, which correspond to the UA level 2 nomenclature as listed in Table 2, are added to the GL30, replacing the single artificial surfaces class (Table 4). Where a pixel contains no OSM-derived LULC data for the urban classes 80.1 to 80.4 but was classified as artificial surfaces (80) in the updated GL30 map, the enhanced GL30 map pixel is assigned a value of 80.1, corresponding to the urban fabric class.
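The enhancement rule can be sketched per pixel as shown below. This is an illustrative reading of the rule rather than the authors' implementation: it assumes a second raster, here called osm_urban, that holds the detailed OSM-derived urban class (80.1 to 80.4) or None for each pixel, and the function name enhance is hypothetical.

```python
from typing import List, Optional, Union

Code = Union[int, float]
Raster = List[List[Optional[Code]]]  # None marks a null pixel

ARTIFICIAL = 80      # single artificial surfaces class of the updated GL30
URBAN_FABRIC = 80.1  # default detailed class when OSM offers no detail


def enhance(updated: Raster, osm_urban: Raster) -> Raster:
    """Replace class 80 with the detailed urban classes 80.1 to 80.4."""
    enhanced = []
    for u_row, d_row in zip(updated, osm_urban):
        row = []
        for u, d in zip(u_row, d_row):
            if u == ARTIFICIAL:
                # Use the detailed OSM class if present, else fall back to 80.1.
                row.append(d if d is not None else URBAN_FABRIC)
            else:
                # Non-urban classes and null pixels remain unchanged.
                row.append(u)
        enhanced.append(row)
    return enhanced
```

In this sketch the detailed classes are only applied to pixels labelled 80 in the updated map, so the non-urban classes are never overwritten.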

Validation of the Derived Maps
To validate the updated and enhanced GL30 maps and to compare them with the original GL30 LULC map, a stratified random sampling design was implemented to create a reference database of 50 points per class, considering the classes of the enhanced GL30 map (described in the previous section) as strata. The validation was then performed using the LACO-Wiki tool, which is a free, online portal that offers standardized land cover validation [48]. Interpretation in LACO-Wiki was implemented using what is referred to as blind validation, which means that the user is not shown the map class at each location but instead interprets the very high resolution satellite imagery from Google Maps and Bing using the enhanced GL30 classification. For the validation of the GL30 and updated GL30, the four artificial surface classes (80.1 to 80.4) were grouped together. Once the validation was completed in LACO-Wiki, the reference database was downloaded for the calculation of confusion matrices and accuracy indices, which were computed using the methodology proposed by Card [49], where the area occupied by each class is considered in the computation.

Table 6. Confusion matrix and indices of agreement for the Dar es Salaam study area.

Tables 5 and 6 show the confusion matrices between the original GL30 and the OSM-derived LULC map, the overall agreement (OA) and the row and column marginal proportions of agreement (RMPA and CMPA, respectively) for the study areas of Kathmandu and Dar es Salaam, respectively. The results are considerably different for the two study areas.
The overall agreement between the original GL30 and the OSM-derived LULC map is 56% for Kathmandu and 81% for Dar es Salaam. In the former case, the class having the highest RMPA and CMPA is 20 (Forest); class 10 (Cultivated land) shows a low RMPA but a very high CMPA, while the opposite happens for class 80 (Artificial surfaces). The remaining classes are almost absent in both LULC maps. In the case of Dar es Salaam, the highest agreement is found for class 80 (artificial surfaces). Class 20 (forest) and class 50 (wetlands) have a high CMPA and low RMPA (most of them are mapped as artificial surfaces in the original GL30) while class 30 (grassland), which is the one occupying the largest area in the original GL30, is almost absent in the OSM-derived map.
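The area-weighted accuracy computation mentioned in the validation procedure can be sketched as follows. This is a simplified illustration of a Card-type estimator, assuming that the strata coincide with the classes of the map being validated and that the area proportion of each map class is known; the function name area_weighted_accuracy and its argument names are hypothetical.

```python
from typing import Dict, Tuple


def area_weighted_accuracy(
    counts: Dict[Tuple[int, int], int],
    area_share: Dict[int, float],
):
    """Estimate accuracies from a stratified sample.

    counts[(i, j)]: sample points with map class i and reference class j.
    area_share[i]:  proportion of the map area occupied by map class i.
    """
    classes = sorted(area_share)
    n_row = {i: sum(counts.get((i, j), 0) for j in classes) for i in classes}
    # Estimated area proportion of each cell: W_i * n_ij / n_i+
    p = {
        (i, j): area_share[i] * counts.get((i, j), 0) / n_row[i]
        for i in classes
        for j in classes
        if n_row[i]
    }
    overall = sum(p.get((i, i), 0.0) for i in classes)
    users = {i: counts.get((i, i), 0) / n_row[i] for i in classes if n_row[i]}
    col = {j: sum(p.get((i, j), 0.0) for i in classes) for j in classes}
    producers = {j: p.get((j, j), 0.0) / col[j] for j in classes if col[j]}
    return overall, users, producers
```

Because every stratum receives the same number of points, weighting by area prevents small classes from having a disproportionate influence on the overall accuracy.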

Conversion of OSM into LULC Classes
In general, classification disagreement between the pixels of the OSM-derived map and the GL30 could be due to: (1) classification errors in the original GL30 map; (2) LULC changes that occurred between 2010 (date of the original GL30 map) and the date of the OSM derived data, which were captured in the OSM data; (3) tagging errors in the original OSM data from which the LULC map is obtained; and (4) classification errors introduced by the conversion procedure.

Evaluation of the Updated GlobeLand30 Map
Tables 7 and 8 show the relative distribution of classes and the percentage of non-null pixels for the same series of LULC maps, again for Kathmandu and Dar es Salaam, respectively. As shown in Table 8, in the Dar es Salaam study area, there are pixels with null values in the original GL30, and as a consequence also in the updated GL30. This is due to the presence of the sea, which is not mapped in the GL30 LULC classification (see Figure 6). Conversely, for the Kathmandu study area, pixels with null values are only found in the OSM-derived LULC map. These pixels correspond to regions where no OSM data are available or no OSM data were used to generate the LULC map (because the tags were not considered appropriate for LULC purposes). However, it is worth noting that the percentage of non-null pixels in the OSM-derived LULC map is similar for both Kathmandu and Dar es Salaam (71.71% and 62.28%, respectively). In both study areas, the class with the largest percentage in the OSM-derived product is 80 (artificial surfaces), which reflects the fact that the highest mapping efforts in OSM are focused on built-up structures, mainly roads and buildings.
The effects of updating GL30 with the OSM-derived LULC map are also different for the two case studies. For Kathmandu, there is a significant increase in artificial surfaces from 23.42% to 48.64% (see Table 7), which results in an extension of the city area at the expense of cultivated land, as is clearly visible from Figure 5. For Dar es Salaam, the class proportions instead remain largely unchanged, except for a slight increase in class 50 (wetlands) and a decrease in class 30 (grassland). In particular, a visible difference between the original GL30 and the OSM-derived LULC map is the presence in the latter of the wetlands around the Msimbazi River and its tributaries. This area, which is the one typically suffering from flooding, is not mapped in GL30 due either to its MMU of 0.0009 km² for rivers and 0.0729 km² for wetlands or simply to classification errors. In summary, the update of GL30 with the more detailed and small-scale LULC information derived from OSM mainly results in an extension of the urban area of Kathmandu and an increase in wetlands for Dar es Salaam.
Figure 8 shows the enhanced GL30 maps for the two study areas produced after the application of the procedure described in Section 4.4. As is clearly visible from the figure, the main advantage of these maps compared to the updated GL30 (see the lower left part of Figures 5 and 6) is the more detailed characterization of the urban areas according to the UA level 2 nomenclature. In particular, the details of the road network (included in the new class 80.2) are clearly visible. Supplementing Tables 7 and 8, which list the proportions of the single urban class for the Kathmandu and Dar es Salaam study areas, Table 9 shows the relative proportions for the four urban classes of the enhanced GL30. In both study areas, classes 80.3 …
Finally, the binary maps in Figure 9 show the provenance of each pixel of the enhanced GL30 maps, i.e., they describe whether the pixel information comes from the original GL30 or the OSM-derived LULC map. It is easy to see that most of the pixel values (71.71% for Kathmandu and 74.87% for Dar es Salaam) originate from the OSM database. As already mentioned, these pixels correspond, in particular, to the urban and most populated areas, thus reflecting the primary focus of the OSM mapping effort.
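The idea of a provenance map can be illustrated with a simple overlay. The sketch below is not the procedure of Section 4.4, which is not reproduced here. It assumes a plain override rule (an OSM-derived class replaces the GL30 class wherever the OSM-derived map is non-null), a null code of 0, and a provenance share computed over the non-null pixels of the result. The function name `update_with_osm` is hypothetical.

```python
import numpy as np


def update_with_osm(gl30, osm, nodata=0):
    """Overlay an OSM-derived class raster on GL30 and record provenance.

    Returns the merged raster, a boolean mask that is True where the
    value comes from OSM, and the share (%) of non-null merged pixels
    that originate from OSM.
    """
    gl30 = np.asarray(gl30)
    osm = np.asarray(osm)
    from_osm = osm != nodata
    # Assumed rule: OSM wins wherever it has a value, GL30 fills the rest.
    merged = np.where(from_osm, osm, gl30)
    non_null = merged != nodata
    osm_share = from_osm[non_null].mean() * 100 if non_null.any() else float("nan")
    return merged, from_osm, osm_share
```

The boolean mask plays the role of the binary maps in Figure 9, and the returned share corresponds to the kind of percentage quoted in the text.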

Validation
The validation of the three maps was made using the reference database created as described in Section 4.5. Table 10 shows the user's (UAc), producer's (PAc) and overall accuracy (OAc) obtained for the original GL30 and updated GL30 for the two study areas.

In Section 2.2, the accuracy of GL30 was reported to vary between 79% and 97%. The results found here are much lower. Three factors may contribute to this: (i) the methodology used to create the reference database (i.e., the rules used to assign a class to the sample pixels); (ii) the fact that the reference database used in this study was created using only photo interpretation, so that in some cases the differentiation between, for example, agriculture and natural vegetation may be difficult; and (iii) the fact that the aerial and satellite images used for the photo interpretation are more recent than GL30. For example, the Bing imagery used for Dar es Salaam is dated 2013 (or occasionally 2010) while it is dated 2016 for Dar es Salaam. Google Earth imagery is dated 2016 for both cities. Because no imagery with the same date as GL30 is available, it is not possible to estimate the real amount of change, nor to separate the effect of actual change on the accuracy assessment from the influence of the accuracy assessment procedure used.
When comparing the results obtained for both study areas, the OAc increased from the original GL30 to the updated version, from 54% to 65% for Kathmandu (+11 percentage points) and from 61% to 69% for Dar es Salaam (+8 percentage points). Regarding the UAc and PAc, for Kathmandu the largest differences are found for: class 60 (water bodies), since these are practically non-existent in the original GL30 and were mapped in the updated version; class 30 (grassland), where the UAc decreased from 33% to 4%; and class 40 (shrubland), where the opposite occurred, with an increase from 0% in the original GL30 to 38% in the updated version. This may result from actual changes in the vegetation or from interpretation difficulties in the creation of the reference dataset. For Dar es Salaam, classes 50 (wetlands) and 60 (water bodies) are the most affected, as large regions around waterlines are classified in OSM as wetlands. These, however, do not correspond to regions with water in the satellite images, but likely to regions that are frequently flooded, as they also include some urban areas. This explains the relatively low values of UAc and large values of PAc for the updated GL30 map.
Table 11 shows the user's (UAc), producer's (PAc) and overall accuracy (OAc) obtained for the enhanced LULC map for the two study areas. For Kathmandu, the OAc of the enhanced LULC map is lower than the OAc of the updated map, but it is higher than the OAc of the original GL30. For Dar es Salaam, however, the OAc is lower than the OAc of both the original GL30 and its updated version using OSM data. The decrease in accuracy is partly explained by the increase in the number of classes in the enhanced GL30 map: there are now four urban classes instead of one, so the potential for error has increased. What has been gained is a more detailed urban characterization that clearly shows the road patterns in these cities.
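For readers less familiar with the accuracy measures used above, the following sketch shows how UAc, PAc and OAc are conventionally obtained from a confusion matrix of mapped versus reference labels. It is a simplified illustration, not the authors' validation code: the function name `accuracy_metrics` is hypothetical, and the counts are unweighted, whereas estimates from a stratified random sample would normally weight each stratum by its mapped area.

```python
import numpy as np


def accuracy_metrics(reference, mapped, classes):
    """Compute user's, producer's and overall accuracy (all in %).

    The confusion matrix has mapped classes as rows and reference
    classes as columns. Sample counts are unweighted (an assumption).
    """
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    cm = np.zeros((n, n), dtype=np.int64)
    for m, r in zip(mapped, reference):
        cm[index[m], index[r]] += 1

    diag = np.diag(cm).astype(float)
    map_tot = cm.sum(axis=1)  # samples mapped as each class
    ref_tot = cm.sum(axis=0)  # samples whose reference label is each class
    # UAc: share of samples mapped as a class that truly belong to it.
    users = np.divide(diag * 100, map_tot, out=np.full(n, np.nan), where=map_tot > 0)
    # PAc: share of reference samples of a class that the map captured.
    producers = np.divide(diag * 100, ref_tot, out=np.full(n, np.nan), where=ref_tot > 0)
    overall = diag.sum() / cm.sum() * 100
    return users, producers, overall
```

A class that is present in the reference data but never mapped, as with water bodies in the original GL30 for Kathmandu, yields a PAc of 0% and an undefined UAc (NaN in this sketch).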

Conclusions
In this paper a methodology is presented to demonstrate the use of OSM data for updating and enhancing the GL30 LULC product. GL30 exists at the global scale but it only has a small number of high level classes, e.g., only one urban class. In addition, the product is static from the year 2010 although a more recent version is currently being developed. OSM, on the other hand, is a dynamic data source created by citizens that is continually being updated. The results showed that, for the chosen study areas of Kathmandu, Nepal and Dar es Salaam, Tanzania, the conversion of OSM data into the GL30 classes, and the integration of the resultant maps with the original GL30, produced more up-to-date maps, with a higher OAc. A visual analysis of the updated products shows the expansion of urban areas and a more detailed map of water bodies. The richness of OSM data, which enables the use of a more detailed nomenclature, especially for the urban environment, was exploited by replacing the single urban class of GL30 with four more detailed urban classes using the UA level 2 nomenclature to create an enhanced map with a higher number of classes and thus greater detail. More than 70% of the content of the enhanced map for the two study areas is derived from the OSM database. Although the OAc of the resultant map was lower than the OAc of the original GL30 in the case of Dar es Salaam, a visual analysis shows clear identification of the road networks in both study areas.
The creation of hybrid maps, i.e., merging GL30 and OSM, has the advantage of obtaining a more detailed and updated product, especially related to urban development. However, the OSM data are not validated (even though there is a continuous validation made by the crowd), which may lead to some potential errors in the database. For example, there may be different tagging practices for some classes, in particular the natural classes, such as grassland (GL), shrubland (SL) or forest (F), which may create inconsistencies in the resulting LULC classes.
We have considered two "extreme" cases of mapping where many people have contributed, i.e., mainly remote mappers after the earthquake in Nepal or through an internationally funded project to address flooding in Dar es Salaam. However, this does not mean that areas in developing countries "need" to be hit by a disaster or funded by a project to be very well mapped in OSM. On the contrary, there are many areas worldwide that are well mapped simply thanks to local mapping groups or volunteers (i.e., without any contribution from remote mappers); see e.g., Abidjan city in the Ivory Coast (http://osm.org/go/ardJDOX). Thus, the same procedure can be applied anywhere that OSM data are available.
In the future we plan to create a Web service to apply the procedure outlined here in real time to any user defined area (without having to install software and run scripts). This service will be based on the Web processing service (WPS) specifications issued by the Open Geospatial Consortium (OGC). Such a service would be linked to an interface that would allow users to choose the optimal values of the parameters (e.g., the buffer values) so that users familiar with the area to be classified can use their local knowledge; otherwise default values would be used. This would bring an advantage compared to the current OSMLanduse.org service developed at the University of Heidelberg [11], which generates the final OSM-derived LULC map without allowing users to tune the parameter values used by the algorithm. We also plan to improve the algorithm with the use of other VGI, such as geotagged photographs from publicly available repositories such as Flickr and Panoramio (or its replacement) to complement LULC classification in areas where there is no OSM, or where the OSM data are not clearly mappable into a specific LULC target class.