In the past few years, collaborative mapping projects whose main goal is to collect and distribute freely available geodata have attracted significant attention from academia, leading to their integration into research projects in a semantic and meaningful way. A variety of projects are available on the Internet in which mostly volunteers share their expertise and information.
One of the most well-known examples of a User-Generated Content (UGC; [1]) online portal is the free online encyclopedia Wikipedia. Other projects focus on collecting diverse types of data and information specifically about geographic objects. Because this type of information is collected on a voluntary basis, it was initially termed Volunteered Geographic Information (VGI) [2]. In 2007, early questions were raised about “the phenomenon of VGI, and the use of VGI in doing science” [3], especially in the area of Geographic Information Science (GIScience). Since then, UGC and VGI have developed into popular interdisciplinary research topics. Most VGI analyses focus on the OpenStreetMap (OSM) project, which raises a plethora of research questions due to its data diversity. An increasing number of studies have investigated data quality indicators, the motivation and activity spectrum of the community that shares the information, or applications that were developed based on the information collected in OSM. Although these prior articles, which are discussed in more detail in the second section of this paper, analyzed different factors, the analyses usually show a similar pattern: the larger the population in the predefined, analyzed area, the better the data quality of the collaboratively collected information in OSM. To the best of the authors’ knowledge, only minor studies on countries outside Europe have been carried out and reported. More importantly, no comparative investigation of OSM data for different selected world regions has been conducted.
The second major factor for collaborative projects such as OSM, after a well-designed project infrastructure, is the worldwide community. The volunteers build the foundation of the project and guarantee the detailed data contributions and the temporal accuracy of the data; thus, it is important to investigate the different worldwide collaboration efforts. The main objective of this paper is to determine similarities and differences in the patterns of VGI data contributions and user activity spectrums for different worldwide urban areas. We hypothesize that factors such as contributor concentration, population density and socio-economic parameters such as income can influence the data contributions to OSM. The analysis is conducted for 12 world regions, covering at least one urban area for each continent.
The remainder of the paper is organized as follows: The following section gives a brief overview of the OSM project and prior OSM research. The next section introduces the study areas and the data preparation steps applied. The section after that presents the results of the analyses conducted for the selected urban areas between 2005 and 2012. The last two sections summarize and discuss the outcomes and provide an outlook on potential future research.
2. Volunteered Geographic Information: The OpenStreetMap Project
The OSM project is one of the most popular and well-known VGI platforms on the Internet. The main goal of the project since its initiation in 2004 has been to create a freely available database of geographic features [4]. Contributions and edits to OSM can be made by any Internet user who is registered with the project. This open approach to data contribution has allowed the project to gradually attract new members and to grow rapidly in the past few years to more than one million registered members at the time of writing [5]. Aside from registering with the project, which is required to edit the map, volunteers are commonly equipped with GPS-enabled devices that allow them to collect new information in the field. Others prefer to trace new data from aerial imagery on their home computer. The imagery is provided by a variety of sources and allows active users to trace data even for areas that are not close to the editor’s location. In addition to digitized features, attribute information about the created objects can be added to the database at the same time. For a certain number of countries, contributions to the project were achieved through large data imports from commercial or governmental data sources whose licenses are compatible with the OSM license. Examples can be found in the United States, the Netherlands and Austria, where partial or complete representations of the road networks were created through this approach. For France, building information and CORINE (Coordination of Information on the Environment) land cover information were imported into OSM in 2009. There are no strict limitations, rules or standards on the type of information that should be contributed to the OSM project. Merely a de facto standard, represented by the “Map-Features”, helps to guide the contributors in their work [6]. This guide describes the most common elements and objects, and their corresponding attributes, that can be found in the OSM project. The map features are mostly attributed with key and value combinations, also referred to as “tags”. The real-world information collected in the database is represented by three data types: Nodes, which represent any point feature; Ways, which represent lines such as roads and areas such as buildings; and Relations, which contain information about how Nodes and Ways are related to each other.
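The three data types and their tags can be pictured with a minimal Python sketch. The class and field names below are illustrative assumptions and do not reproduce the OSM database schema; the sketch only mirrors the description above, in which Nodes are points, Ways are ordered lists of Node references, and Relations group Nodes and Ways.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A point feature with a position and optional key/value tags."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of Node ids: a line (road) or an area (building)."""
    id: int
    node_refs: list[int]
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A way whose first and last node reference coincide forms a ring.
        return len(self.node_refs) > 2 and self.node_refs[0] == self.node_refs[-1]


@dataclass
class Member:
    """One entry of a Relation: a reference to a Node or a Way."""
    type: str  # "node" or "way" (illustrative; follows the text above)
    ref: int
    role: str = ""


@dataclass
class Relation:
    """Describes how Nodes and Ways are related to each other."""
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


# Hypothetical example objects; ids, coordinates and names are made up.
corner = Node(id=1, lat=52.5200, lon=13.4050)
road = Way(id=10, node_refs=[1, 2, 3], tags={"highway": "residential"})
building = Way(id=11, node_refs=[4, 5, 6, 7, 4], tags={"building": "yes"})
bus_route = Relation(id=100, members=[Member("way", 10)], tags={"type": "route"})
```

In this sketch, a road and a building differ only in whether the Way is closed and in the tags attached to it, which reflects the loose, tag-driven structure described above.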
In recent years, the OSM project, its data and its contributors have been the center of attention for many research disciplines. In 2009, early outcomes showed a density of OSM data in Germany with potential applicability for 3D location-based services [7]. The results also gave first indications of a correlation between data quality and population density, with better quality in more densely populated areas. Similar results were obtained for London and all of England in 2008–2009, highlighting the decrease in data quality when moving away from bigger cities and that: “more affluent areas and urban locations are better covered than deprived or rural locations” [8]. In 2010, findings for Germany resembled the pattern of urban locations found in England [10]: “If coverage is needed only in the densely populated urban areas of Germany, OpenStreetMap may already be an interesting and very cost-efficient-alternative”. A similar statement was made for England [11]: “Most accurate tiles are located in major urban areas such as London, Liverpool, Manchester or Birmingham”. A different analysis for Germany [12] stated that: “At the national level, the quality of OSM is highest regarding relative object completeness” and “quality differs locally, and even in a single town the different aspects of quality may vary”. A study conducted for France showed a heterogeneous OSM data pattern, which “is particularly explained by the coexistence of different data sources, processes of capture, and contributors’ profiles, highlighting the importance of following accepted and well-defined specifications” [13]. Additionally, the analysis revealed that the more volunteers contributed within an area, the more recent the objects were, i.e., the better the temporal quality of the OSM data. Consistent with these findings, an analysis for Ireland showed that the data completeness in OSM loosely correlates with the population density [14]. However, results for the US contradict the pattern found in Europe. In this particular case, urban areas in Florida showed only similar data completeness between OSM and commercial providers, while rural areas were more complete in OSM [15]. This difference was primarily based on the TIGER/Line data import into OSM for the US and not on active data contributions [15].
The heterogeneous pattern of the OSM project is not limited to data completeness and accuracy. The community and its active contributors show a similar distribution. At the beginning of 2012, about 75% of all members who had made at least one change to the database were located in Europe, while the rest were distributed over the remainder of the world [16]. In particular, some countries with large populations such as the USA, China and India show relatively small OSM project communities. Although the project was initiated in the UK, the most active community of the project in recent years can be found in Germany [16]. The latest results showed that about 25% of all active OSM members are located in Germany. Thus, it is not surprising that the road network completeness for this area shows good results, sometimes exceeding that of commercial providers [17]. Only attribute information such as road names, speed limits and turn restrictions is missing for parts of the German dataset [12].
Aside from road network parameters, which are the main focus of most analyses, a few other publications also confirm that OSM data is suitable for 3D city models in urban areas [18]. In summary, almost all prior studies show that urban areas provide better data completeness in OSM than rural areas [20], which is sometimes also referred to as “urban bias in VGI” [22]. However, each individual case study needs to be analyzed for its particular purpose [22]. Chances are that: “When one moves away from large urban centers the major issue for quality becomes one of coverage – in many rural areas there is little or no OSM coverage at all” [24].
While most studies analyze the quantity and quality of the collaboratively collected information in OSM, others focus on the motivational factors of the volunteers who contribute to VGI projects [25]. Possible motivational factors might be the project’s unique ethos or the conviction that geospatial information should be freely available to everyone. For others, learning new technologies, self-expression, relaxation and recreation, or just pure fun can play a major role [28]. Three independent surveys in 2009 [28], 2010 [29] and 2011 [30] gave more insight into demographic aspects of the OSM project. The majority of the contributors to the project, about 97%, were male. Two of the three surveys [28] showed that on average 65% of the respondents were between 20 and 40 years old, and about 23% between 40 and 50. Furthermore, about 56% had a high-school or higher education degree. About 50% of the respondents in one survey considered their current profession to be computer science related [30], and another survey showed that about 50% had some sort of GIS background [28], highlighting that “the OSM community does not constitute with GIS amateurs as is speculated in VGI” [28].
Community-based projects, websites and portals such as OSM are oftentimes affected by so-called “participation inequality”. A 90-9-1 rule can usually be applied to most of these projects [31]. This rule states that about 90% of the members of community-based projects only consume the collaboratively collected information, while 9% contribute occasionally and only 1% are very active contributors. This rule can be applied to projects such as Wikipedia [32] and has also been tested for the OSM project. In 2007, about 28% of the 120,000 members of the project actively contributed any data [28]. In 2011, about 38% of the 500,000 members made at least one change to the dataset [16]. Additionally, only 3% of all members actively contributed to the project each month. However, compared with values from prior years, the number of active contributors is not growing at the same rate as the total number of registered members. At the end of 2012, of the almost 1 million registered OSM members, on average only 18,000 members, less than 2%, actively contributed to the project each month [34].
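The participation figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the rounded numbers stated in the text; the absolute contributor counts it derives are therefore approximations implied by those numbers, not values reported by the cited studies.

```python
def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


# End of 2012: "almost 1 million" members, rounded here to 1,000,000.
registered_2012 = 1_000_000
monthly_active_2012 = 18_000
monthly_share_2012 = share_percent(monthly_active_2012, registered_2012)

# Contributor counts implied by the quoted shares (integer arithmetic).
contributors_2007 = 28 * 120_000 // 100      # 28% of 120,000 members
contributors_2011 = 38 * 500_000 // 100      # 38% of 500,000 members
monthly_active_2011 = 3 * 500_000 // 100     # 3% of 500,000 members

# Growth between 2011 and the end of 2012, based on the rounded figures.
member_growth = registered_2012 / 500_000
active_growth = monthly_active_2012 / monthly_active_2011

print(f"Monthly active share, end of 2012: {monthly_share_2012:.1f}%")
print(f"Implied contributors 2007 / 2011: {contributors_2007} / {contributors_2011}")
print(f"Member growth x{member_growth:.1f}, monthly active growth x{active_growth:.1f}")
```

Under these assumptions, the number of registered members roughly doubles while the number of monthly active members grows by only about one fifth, which is the divergence the paragraph describes.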
The increasing popularity of OSM also comes with caveats such as cases of vandalism, similar to developments seen in Wikipedia. An analysis carried out in 2012 revealed that, within a timeframe of one week, at least one case of vandalism could be detected in the OSM database each day [35]. It needs to be noted, though, that these cases of vandalism can also be created accidentally by new or inexperienced members and are not always intentional.
3. Selected Urban Areas and Data Sources
Several definitions from different sources help to distinguish urban or agglomerated areas from rural areas, which is a crucial point for the analysis presented in this article. Demographia defines an urban area as: “A continuously built up land mass of urban development that is within a labor market (i.e., metropolitan area or metropolitan region), without regard for administrative boundaries (i.e., municipality, city or commune)” [36]. The identification of these areas is usually based on maps and satellite images from which the continuous urbanized area is estimated [36]. It is also important to distinguish between urban areas and metropolitan areas: “A metropolitan area is an urban area plus the satellite cities around the urban area and the agricultural land in between.” Since these factors could potentially influence the results of the analysis, it was decided to use urban areas instead of metropolitan areas to avoid forests, agricultural land and other uninhabited areas in the selected regions.
A variety of online sources allow Internet users to retrieve freely available urban area information. However, these sources are often inconsistent with one another due to different geographical definitions of urban areas [37]. Since none of the available sources, such as Natural Earth Data or CORINE, provided the information needed for a comprehensive comparison of worldwide urban areas, it was decided to trace the urban area boundaries based on Bing satellite imagery. The center of each polygon was primarily based on the location of the city name feature in the standard OSM map. The urban area sizes that were used during the polygon generation and the corresponding population information were retrieved from Demographia [36]. Figure 1 shows a world map highlighting the selected regions for which urban area polygons were generated.
During the urban area selection for the analysis, it was decided to choose at least one large, well-known urban area (city) for each continent to provide worldwide information. In total, 12 urban areas were chosen; their area extent, absolute population and population density are shown in Table 1.
Table 1. Overview of the selected urban areas. Source: Demographia [35].
|Country|City|Population in 2011|Area (km²)|Density (/km²)|
|United States|Los Angeles|14,900,000|6,299|2,365|
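The density column follows directly from the population and area columns. As a minimal sketch, the Los Angeles row shown above can be reproduced as follows; the function name is an illustrative assumption.

```python
def population_density(population: int, area_km2: float) -> float:
    """Inhabitants per square kilometre."""
    return population / area_km2


# Values taken from the Los Angeles row of the table.
la_density = population_density(14_900_000, 6_299)
print(round(la_density))  # 2365, matching the density column
```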
After the areas of interest had been defined and generated and included all information required for the analysis, an OSM history dump file was retrieved from the OSM project [38]. This file includes the entire history (all versions) of all geodata in the worldwide OSM database until October 19, 2012. It enabled us to analyze the development of the datasets for each urban area over the past few years by clipping the information from the worldwide dataset and applying Java-based tools that were specifically developed for this research project.
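The clipping step can be illustrated with a short sketch. It is written in Python rather than Java and does not reproduce the tools developed for this study; it only shows one common way to keep those object versions that fall inside a traced polygon and before a cutoff date. The record layout (dictionaries with `lon`, `lat` and `timestamp` keys) and all function names are assumptions, and the polygon is treated as planar, which is a simplification that is only reasonable for city-sized areas.

```python
from datetime import date


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_versions(versions, polygon, cutoff):
    """Keep node versions inside the polygon and not newer than cutoff."""
    return [
        v for v in versions
        if v["timestamp"] <= cutoff
        and point_in_polygon(v["lon"], v["lat"], polygon)
    ]


# Hypothetical data: a square "urban area" and three node versions.
urban_area = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
history = [
    {"id": 1, "version": 1, "lon": 1.0, "lat": 1.0, "timestamp": date(2008, 5, 1)},
    {"id": 2, "version": 1, "lon": 3.0, "lat": 1.0, "timestamp": date(2010, 3, 9)},
    {"id": 1, "version": 2, "lon": 1.1, "lat": 1.0, "timestamp": date(2012, 11, 2)},
]
clipped = clip_versions(history, urban_area, cutoff=date(2012, 10, 19))
print([(v["id"], v["version"]) for v in clipped])
```

Grouping the clipped versions by year would then give the growth of each urban area's dataset over time.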
5. Conclusions and Future Work
The analyses presented in this article provided detailed information about the concentration of OSM geodata and its contributors for 12 selected worldwide urban areas. The main objective of the article was to determine similarities or significant differences between the selected areas regarding their data growth and the collection efforts of the OSM community. The results showed that the urban areas have significantly different data concentrations in OSM, which can be caused by data imports for selected areas or by large differences between community contributions. The results also highlighted the differences between European and other world regions in OSM. The number of OSM members in particular can differ greatly. With the exception of Istanbul, all tested European areas show higher OSM member concentrations than other areas with high population density values such as Cairo or Seoul. Moscow proved to be a positive example outside of Europe with a large OSM community.
When the OSM contributors are split into groups based on the number of edits they made to the map data, all tested areas show similar patterns. About 7% of the data contributors are very active “Senior Mappers”, while 28% fall into the “Junior Mapper” category with fewer contributions. The largest group of data collectors is the “Nonrecurring Mappers” with 66%. The determination of the active time frames of the members showed that about 16% of all OSM contributors in each area were active within three months, making at least one edit to the map. However, only 3% of the members can be considered very active “Senior Mappers”. The data also revealed that the absolute number of active OSM members has no impact on the activity spectrum of the volunteers. The most active “Senior Mappers” created on average about 90% of the data in the urban areas, worked on about 9 of the 90 tested days and created almost 1500 Nodes, 230 Ways and 4 Relations in total. The temporal data quality proved to be highly influenced by the size of the community in each urban area, which confirms similar findings for France [13]. Smaller communities do not guarantee continuing data collection or correction efforts, so their datasets become outdated.
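The grouping of contributors by edit volume can be sketched as follows. The thresholds separating the three groups are not stated in this section, so the two constants below are placeholder assumptions chosen purely for illustration, as are the function names and the sample data.

```python
from collections import Counter

# Placeholder thresholds (assumptions, not taken from this section).
SENIOR_MIN_NODES = 1000
JUNIOR_MIN_NODES = 10


def classify(nodes_created: int) -> str:
    """Assign a contributor to a group by the number of created Nodes."""
    if nodes_created >= SENIOR_MIN_NODES:
        return "Senior Mapper"
    if nodes_created >= JUNIOR_MIN_NODES:
        return "Junior Mapper"
    return "Nonrecurring Mapper"


def group_shares(node_counts: dict) -> dict:
    """Percentage of contributors per group for one urban area."""
    groups = Counter(classify(n) for n in node_counts.values())
    total = sum(groups.values())
    return {name: 100.0 * count / total for name, count in groups.items()}


# Hypothetical contributors of one urban area and their Node counts.
sample = {"user_a": 5000, "user_b": 50, "user_c": 3, "user_d": 0}
shares = group_shares(sample)
print(shares)
```

Applying such a grouping per urban area yields the kind of percentage breakdown summarized in the paragraph above.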
Further results were gathered by analyzing and comparing local and external data contributors. Urban areas with lower numbers of OSM community members in particular show large (sometimes more than 50%) data contributions by external members. Cairo, Istanbul, Johannesburg and Los Angeles especially rely on these non-local members. In general, this pattern contradicts in certain aspects the main idea behind VGI projects as defined by Goodchild [42], in which “local volunteers” should be the main source of information. However, Neis & Zipf [16] already showed that more than 50% of the worldwide “Senior Mappers” of the OSM project contribute data to two or more countries and do not limit their efforts to local areas. Because population density did not provide enough evidence of an impact on OSM member numbers, other socio-economic factors were taken into consideration. It was hypothesized that income might be a major influential factor. The analysis showed that urban areas with higher income values such as Sydney, Los Angeles, Seoul and Osaka could potentially have larger OSM communities than they currently do, but still show a correlation between income and OSM contributor ratio. Berlin has a slightly lower average income in comparison to other tested areas and a relatively high member density, but can be considered an exceptional case. Overall, the conducted analyses do not completely confirm prior results gathered for England where “more affluent areas and urban locations are better covered than deprived or rural locations” [9]. However, a more comprehensive investigation with additional urban areas, which would increase the sample size, could improve the findings of our analysis and the statistical results presented.
Questions remain about other potential reasons why urban areas such as Los Angeles or Seoul show only small OSM communities and not a success similar to that in Europe. Differences in Internet access, culture, mentality, personal interests or familiarity with the project due to language barriers could possibly play a role. Others would argue that freely available datasets, e.g., those provided by the government such as the TIGER/Line datasets in the US, are slowing down data contribution efforts in OSM in the respective countries. Other influential indicators could most likely only be determined by conducting an extensive survey.
The assessment of the quality of the data collected by external OSM members in comparison to local members was not part of this study. However, it was clearly shown that large data contributions have been made in selected areas by members who may never have collected data locally in person and who lack the “local expertise” [42] that makes VGI projects unique. Based on these findings, future investigations are planned to answer questions such as: Do external or remote members provide better, equal or worse data quality when contributing to the project? An approach similar to the one chosen in the analysis of Wikipedia and “The Roles of Local and Global Contribution Inequality” [43] could provide some meaningful insights. Geometric differences such as inconsistencies in positional accuracy will most likely be limited due to the high-resolution images that the mappers can utilize when tracing data for OSM, as long as the images are not outdated. However, a metadata analysis including street names, street types or turn restrictions could reveal some of the caveats of remote data contributions in OSM.