Exploring Data Model Relations in OpenStreetMap

Abstract: The OpenStreetMap (OSM) geographic data model has three principal object types: nodes (points), ways (polygons and polylines), and relations (logical groupings of all three object types to express real-world geographical relationships). While there has been very significant analysis of OSM over the past decade or so, very little research attention has been given to OSM relations. In this paper, we provide an exploratory overview of relations in OSM for four European cities. In this exploration, we undertake analysis of relations to assess their complexity, composition and flexibility within the OSM data model. We show that some of the patterns discovered by researchers related to OSM nodes and ways also exist in relations. We find some other interesting aspects of relations which we believe can act as a catalyst for a more sustained future research effort on relations in OSM. These aspects include: the potential influence of bulk imports of geographical data to OSM, tagging of relations, and contribution patterns of edits to OSM relations.


Introduction
The OpenStreetMap (OSM) project and database have both received increasing research attention from academics and analysts over the past number of years. The majority of geocomputational research reported has focused on the OSM database and the OSM data model. The OSM geographic data model is reasonably easy to understand. There are three principal data object types: nodes (points), ways (polygons and polylines), and relations (logical grouping of all three object types). At the time of writing in August 2017, in the global OSM database, there are over four billion nodes, over four hundred million ways, and over five million relations [1]. Nodes and ways have been the subject of the most intense research investigation while there has been limited work reported or published on relations in OSM. This is surprising given that, as we shall discuss, relations provide the constructs to model complex real-world features, geographic relationships, and representations within the OSM database. Relations therefore play a very important role within the OSM data model: they provide OSM with a means of implementing complex spatial and topological relationships.
A relation is one of the core data elements. It consists of one or more tags [2] and an ordered list of one or more nodes, ways and/or relations as members, which is used to define logical or geographic relationships between these elements. A member of a relation can optionally have a role which describes the part that a particular feature plays within a relation. A simple example of a relation might be a university campus. The university campus itself contains many ways, representing roads and buildings for example, and nodes representing points of interest around the campus. The relation then expresses the geographic relationship of the campus between these node and way elements. Another very popular type of relation in OSM is called a turn restriction. These relations are usually applied at road intersections, streets, or junctions. Such relations have a set of tags describing the type of turn restriction. Turn restrictions, such as No-left-turn, restriction=only_straight_on and No-U-turn, regulate traffic flow at intersections. The relation then includes the ways and nodes which make up the physical road infrastructure in the junction or intersection. In a similar way to nodes and ways, relations are created and edited by contributors to OSM using software such as JOSM (Java for OpenStreetMap [3]). Routing software and applications which use OSM data then use these relations to create workable optimal routes between locations. We believe that relations play a very important role in modeling complex real-world geographical relationships and phenomena in OSM. Without relations, the OSM database could not be used for applications such as routing [4,5], traffic simulation [6] or Web Map Services (WMS) [7] to name but a few. Indeed, we believe anecdotally that the role of relations is not actually apparent to many external users and consumers of OSM data. To address this apparent gap in the knowledge of Volunteered Geographic Information (VGI), this paper has three research goals which are outlined as follows:

• To carry out an exploration of the types of relations represented in OSM for four large European cities and understand how they model geographical relationships and phenomena in these cities.
• To investigate how the relations are represented, for these four cities, within the OSM database as data objects.
• To explore the research question: are the patterns of contribution, editing and tagging for relations similar to those observed in other researchers' work on OSM nodes and ways?
We close the paper with some general observations and conclusions, along with a number of suggestions for future research work which could be considered.
The remainder of this paper is organized as follows. In Section 2, we discuss some related work on relations and some of the software tools available to work with relations in OSM. In Section 3, we discuss how relations are represented within the OSM database. In Section 4, we outline the analysis we performed on the relations extracted from OSM for four European cities: Madrid, Rome, Vienna and Zurich. In the final section of the paper (Section 5), we draw some conclusions from our experimental analysis and outline a number of suggestions for future research work in this area.

Related Work and Applications
As mentioned in the previous section, there are few published articles from the academic community directly dealing with research on relations in OSM. However, there is some attention drawn to relations in a few papers published in the literature.
Related Academic Work: Barron et al. [8] consider a methodology for intrinsic quality analysis of OSM data. Much of the focus of their proposed methodology is on nodes and ways, with the authors concluding that future work must involve relations (e.g., turn restrictions, bus routes, etc.). Current software, such as software to import the OSM history into a relational database, does not support the import of relations to the database. Consequently, relations are not considered as a part of this proposed methodology. The work of Neis et al. [9] on vandalism detection in OSM certainly mentions the usage of relations. However, the work considers relations only from the viewpoint of how many relations have been created or edited by a particular OSM contributor. Eiter et al. [10] consider the problem of description logics for knowledge representation and reasoning. They mention, without specific implementation or proof, that relations in OSM are spatial objects which have an inherent structure of containment, bordering, and overlapping, which can be exploited to generate topological relations (e.g., contains). The addition of tagging on relations supplies spatial information, allowing the relations to be naturally represented as spatial knowledge concepts. Krajzewicz et al. [6] use OSM as an input data source for their open source microscopic traffic simulation package SUMO (Simulation of Urban MObility) [11]. SUMO uses the relations in OSM to understand traffic flow using turn restrictions, connected routes, street segments, etc. Graser et al. [5] use the relations in OSM to extract explicit turn restriction information for their work on the development of an open source analysis toolbox for street network comparison between geographic datasets. The work by Senaratne et al. [12] provides an up-to-date review of various quality measures and indicators for selected types of VGI and existing quality assessment methods. In addition to this review, the authors also propose a classification of VGI with current methods utilized to assess the quality of selected types of VGI. In a recent edited volume by Capineri et al. [13], there are several chapters outlining recent work on what motivates citizens to provide VGI and the subsequent factors which govern or predict its validity and quality.
To effectively study how crowdsourced geographic data evolves over time, authors such as Harvey [14] suggest that distinguishing contributed crowdsourced data from volunteered crowdsourced data is important in order to start to understand the nature of sources of crowdsourced data of any provenance and to help begin to identify possible biases. Harvey argues that there is a significant difference between contributions provided in an automated way and those provided manually. Quattrone et al. [15] present a model that attempts to leverage the characteristics of urban-based crowdsourcing in order to describe its digital evolution over time. The authors conclude that "digital mapping of spatial urban information is governed by complex dynamics". Geo-demographic factors are of major importance and have great influence over how the early stages of mapping are shaped. Further consideration of how crowdsourced datasets, such as OSM, evolve over time reveals biases within the mapping patterns. Studies, such as those outlined in Quattrone et al. [16], develop methods to quantitatively measure the bias introduced by more established users in comparison to occasional contributors in crowdsourced mapping. The authors found strong bias in terms of the geographical areas being mapped. Hecht and Stephens [17] also consider urban biases in Volunteered Geographic Information. They argue that, while they have conceptualized the notions of rural and urban as binary at a broad level, these categories are more naturally conceived along a continuum. Spielman [18] argues that user-generated maps are complex social systems in which the quality of a contribution is difficult to assess, and consequently that the structure of groups is more important than the intelligence of group members in VGI projects.
Software and web-based applications: Despite the absence of extensive academic inquiry into relations, the OSM community itself has developed a number of software applications for error checking and quality assurance on relations. The Relation Analyzer (RA) [19] analyses OSM relations for several quality-assurance-related aspects, including (amongst others): gaps in the relation, accompanied by a list of the affected segments, a height profile, the types of ways included, and tags. The RA tries to build a linked series of ways using some general rules and then creates a rating based on the type of relation this object corresponds to. Not all relation types are supported yet. The Keepright tool [20] flags the existence of relations which have no type tag and relations which have no tags at all. The Keepright tool can also perform some quality analysis (QA) checking on relations which represent turn restrictions. The JOSM (Java For OpenStreetMap) tool is one of the tools most frequently used by contributors to edit the contents of the OSM database. JOSM is usually used by more experienced contributors to OSM. JOSM provides some support for QA of relations [21] as they are being created or edited. Two of the most popular software tools for the import of raw OSM data into PostgreSQL PostGIS enabled databases [22] are osmconvert [23] and osmosis [24]. Both tools take OSM data and perform extract, transform and load processing into a database with their predefined data model structure. Both osmconvert and osmosis import relations including all members of each relation, all tags and all attribute information. It is then possible, although not always simple, to recreate the geometry of the relation by computing the geometry of all members of the relation and reconstructing the relation structure.

Working with Relations in OSM
Before we detail our experimental analysis, we shall now provide a more formalized definition of an OSM relation. In OSM, a node $n$ is a geographic coordinate with tags $n_t$ where the set of tags can be empty. A way $w$ is an ordered set of nodes $(n_0, n_1, \ldots, n_{N-1})$. If the way $w$ represents a polyline feature such as a road or path then $n_0 \neq n_{N-1}$. However, if the way $w$ represents a closed polygon feature such as a building or a lake then $n_0 = n_{N-1}$. For polyline ways, there should always be a minimum of two nodes while for closed polygon features there should always be a minimum of three nodes. A way can also have a set of tags $w_t$. The set $w_t$ can be empty, but normally it is encouraged that this is not the case. Finally, in OSM a relation $r$ is an element representing the logical relationship between objects. A relation $r$ is represented by a set containing at least one object. A relation can contain any combination of existing nodes, ways and even relations themselves. These are called members. As is the case with nodes and ways, relations can also have a set of tags $r_t$ associated with them. As before, this set can be empty but this is discouraged. The OSM Wiki has a large section dedicated to relations [25] and this is an important source of guidance to contributors to OSM. The documentation around relations recommends that relation objects have no more than about 300 members. If it is necessary to handle more than that number of members, contributors are encouraged to create several relations and combine them with a Super-Relation. The thinking behind this is as follows. The more members that are included in a single relation, the more difficult it is to handle and the more easily logical errors and conflicts can occur. Ultimately, the OSM Wiki outlines that these super relations consume additional resources at the database and server side.
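The definitions above can be mirrored in a small data structure. The following is only an illustrative sketch in Python: the class and field names (Node, Way, Member, Relation, is_closed) are our own assumptions for illustration and are not part of the OSM API or of the software used in this study.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A geographic coordinate with a (possibly empty) set of tags."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node ids with a (possibly empty) set of tags."""
    id: int
    node_ids: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # Closed polygon: first and last node coincide and there are
        # at least three distinct nodes.
        return (len(set(self.node_ids)) >= 3
                and self.node_ids[0] == self.node_ids[-1])


@dataclass
class Member:
    """One member of a relation; the role is optional."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """An ordered list of at least one member plus a set of tags."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption taken from the definition in the text:
        # a relation contains at least one object.
        if not self.members:
            raise ValueError("a relation needs at least one member")


road = Way(1, [10, 11, 12])        # polyline: first node != last node
lake = Way(2, [20, 21, 22, 20])    # closed polygon: first node == last node
area = Relation(3, [Member("way", 2, "outer")], {"type": "multipolygon"})
```

Keeping members as (type, ref, role) triples rather than embedded objects reflects the fact that a relation only refers to existing elements.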
To illustrate the data representation of a relation in OSM, Figure 1 shows the XML (eXtensible Markup Language) representing a relation in Zurich, Switzerland. This relation is of type route and represents a cycle route which is part of the regional cycle network of Switzerland. There

Types of Relations in OSM
According to both the OSM Wiki and TagInfo [26], the five most frequently occurring types of relations (where $r_t$ contains the tag for type) are outlined as follows:
• Multipolygon: A multipolygon relation can have any number of ways, and these ways must form valid rings from which a multipolygon can be built. Multipolygon relations are used to represent complex areas. Generally, the multipolygon relation can be used to build multipolygons in compliance with the OGC Simple Feature standard. Consequently, multipolygons allow for the expression of arbitrarily complex areas within OSM.
• Restriction: Relations with the type=restriction tag allow the modeling of different types of traffic flow restrictions at junctions, intersections, etc. A turn restriction at a junction is represented by a relation that has a set of tags describing the type of turn restriction. This type of relation is not necessarily limited to turns. It can also be used in a number of other situations.
• Route: This type of relation models a regular known line or path of travel. According to the OSM Wiki, routes can consist of paths taken repeatedly by people and vehicles: a ship on a predefined shipping route, a car on a numbered road, a bus on its route or a cyclist on a national route.
• Public Transport: Relations with type=public_transport allow for the description of relations used in the public transport tagging scheme in OSM. This relation corresponds to the description of all types of public transport stops, stations, halts, areas or similar. As stated in the OSM Wiki pages, "A stop area consists of everything regarding the embarkation and disembarkation of a specific public transport vehicle or service" [27]. In the example of a specific railway line, this includes the adjacent platform, services on that platform, buildings, and information describing it (platform number, identification, etc.).
• Route Master: Relations with type=route_master contain all the direction and variant routes and information belonging to a whole route service. Routes or services are represented by vehicles that always run the same way with the same reference number. Each direction of a route should be tagged as a separate relation. If a route has several variants (e.g., a different way at the weekend), these variants should also be in separate relations.
In addition to relations with the type tag key assigned to one of the five values above, there are other types of relations in OSM including boundary, associatedStreet, site, waterway and bridge. However, these types of relations appear much less frequently within the global OSM database and consequently we decided to omit them from our analysis.

Experimental Analysis
In this section, we outline the experimental analysis we performed on the relations extracted from OSM for four European cities: Madrid, Rome, Vienna and Zurich. As this was an exploratory examination of relations in OSM, we did not impose strict criteria when choosing the case study cities. There are many European, or global, cities we could have chosen. Informally, we chose these four locations from our own preferences. We believe these cities have strong OSM contributor communities, good OSM coverage, different sizes of overall geographical area, and the likelihood of a large number of relation objects. We also felt that it would be important to consider cities where bulk imports of geographic data had been performed on the local OSM database. Bulk imports have been identified as having influence on contribution patterns in OSM [28].

Relation Data Extraction
To perform the data extraction, we carried out the following set of steps. The OSM XML for Madrid, Rome, Vienna and Zurich was downloaded from GeoFabrik on 30 June 2017. We downloaded the whole country files for Spain, Italy, Austria and Switzerland and then extracted the cities directly from these. Using a Python script, we extracted all of the relations for each of the cities and stored these in a PostgreSQL PostGIS database. Java software was developed to perform the analysis of the relations within the PostGIS database. R was used for statistical calculations and visualisations. In summary, the number of relations in each city is as follows: Vienna 9177, Zurich 1855, Rome 13,226 and Madrid 6403. We used the OSM relation representing each city boundary in OSM as a means of extracting the relations in each city in an equivalent manner. In cases where an OSM relation crossed the boundary of the city, the entire relation was extracted. All attributes of each relation are stored in the PostGIS database. These include all of the attributes displayed in the XML in Figure 1. We calculate a number of additional attributes including the number of members in each relation, number of tags, etc.
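As an illustration of the extraction step, the sketch below reads relation elements from OSM XML using only the Python standard library and derives the member and tag counts. It is not the script used in this study: the function name extract_relations, the output field names and the sample data (including the user name) are hypothetical, and the sample follows the general layout of OSM XML (relation elements containing member and tag child elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the general layout of OSM XML.
SAMPLE_OSM = """<osm version="0.6">
  <relation id="1" version="3" timestamp="2017-05-01T10:00:00Z"
            user="example_user" uid="42">
    <member type="way" ref="100" role=""/>
    <member type="way" ref="101" role=""/>
    <member type="node" ref="5" role="stop"/>
    <tag k="type" v="route"/>
    <tag k="route" v="bicycle"/>
  </relation>
</osm>"""


def extract_relations(osm_xml):
    """Return one dictionary per relation element found in the XML string."""
    root = ET.fromstring(osm_xml)
    rows = []
    for rel in root.iter("relation"):
        members = [
            (m.get("type"), int(m.get("ref")), m.get("role", ""))
            for m in rel.findall("member")
        ]
        tags = {t.get("k"): t.get("v") for t in rel.findall("tag")}
        rows.append({
            "id": int(rel.get("id")),
            "version": int(rel.get("version")),
            "timestamp": rel.get("timestamp"),
            "user": rel.get("user"),
            "members": members,
            "tags": tags,
            # Derived attributes of the kind mentioned in the text.
            "member_count": len(members),
            "tag_count": len(tags),
        })
    return rows


relations = extract_relations(SAMPLE_OSM)
```

For country-sized files, an incremental parser such as xml.etree.ElementTree.iterparse would be a more memory-friendly choice than loading the whole document at once.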

Relation Analysis
To ensure a consistent approach to the analysis of the relations, we pay particular attention to the five most popular types of relations, as outlined in Section 3.1. Table 1 summarises the five most popular types of relations (based on the type tag key) in all of our case-study cities. These are presented as percentages of the total number of relations in each city. The area in km² extracted for each city is also provided in the third row of the table. The results in this table are reasonably consistent across each of the four cities. There are some notable differences, in particular involving Zurich. Zurich appears to have a smaller proportion of type=multipolygon relations than the other three cities. Vienna has the lowest percentage of type=restriction for turn restrictions. The use of the type=route relation is high in all cities, with low percentages observed for type=public_transport in Rome and Madrid. We find that type=route_master has low representation in all of the four cities. This could be related to the fact that type=route_master represents a public transport service and therefore probably correlates to the number of public transport service lines or routes in a given city.

Composition of Relations
As discussed in Section 3, relations are composed of members. A member can be a node, way or a relation itself. Table 2 summarises the compositions of data types within relations in these four cities. In the city column, the pound sign # indicates the number of relations with the corresponding member composition while the % indicates this number as a percentage of all relations in the city. The column N indicates the number of relations comprised of only node objects, W indicates the number of relations comprised of only way objects, and R indicates the number of relations comprised of only relation objects. The remaining columns represent combinations of compositions of nodes, ways and relations. The most frequently occurring compositions are relations composed of only ways (the W column), followed by relations composed of nodes and ways (the N + W column). These two compositions represent at least 85% of all relations in all of the four cities. We believe that this is a healthy situation. It would appear to indicate that these relations are more efficiently handled, managed and visualized by OSM software tools. It may also indicate that the contributors of these relations are focused on expressing real-world relationships between objects in a simplified way. By far the least popular compositions of data types for members within relations in the four cities involve compositions which include relations themselves. As mentioned above, the OSM Wiki pages on Relations [25] indicate that this recursive definition of relations creates what is known as a super relation. While these super relations are a valid logical construct, their use is discouraged due to the very poor software support for them when processing and using OSM data at this time. We investigated the presence of super relations in all four cities. Taking our guidance from the OSM Wiki pages [25], we considered a relation as a super relation if it had 300 or more members defined within it. Overall, the numbers of such relations are, as expected, very low. Madrid has a total of 33 super relations, Rome has 46 and Zurich has 55. The exception among the four cities is Vienna, with a total of 149 super relations, of which 86 have between 300 and 500 members. Taking Vienna as an example, some of the largest super relations (21 have over 1000 members) represent long distance bus routes, very long hiking trails of hundreds of kilometers, or parts of the International European route network. Only 8 of the identified super relations in Vienna do not have the type=route tag. Even when we consider a wider definition of super relation to include all relations with 100 or more members, we still find the number of relations exhibiting this characteristic to be small. There are 154 (8.30%) in Zurich, 386 (6.20%) in Madrid, 556 (6.05%) in Vienna and 673 (5.08%) in Rome.
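The composition classes and the membership threshold described above can be computed with a few lines of code. The following is a minimal sketch under our own assumptions: the function names composition and is_large are hypothetical, the labels follow the column names described for Table 2 (N, W, R and their combinations), and the toy input is invented for illustration.

```python
from collections import Counter

# Fixed order so that combined labels read "N + W", "N + W + R", etc.
_LABELS = [("node", "N"), ("way", "W"), ("relation", "R")]


def composition(member_types):
    """Label a relation by the kinds of members it contains."""
    present = set(member_types)
    return " + ".join(label for kind, label in _LABELS if kind in present)


def is_large(member_count, threshold=300):
    """True if the relation has `threshold` or more members.

    The default of 300 follows the working definition used in the text;
    passing threshold=100 gives the wider definition.
    """
    return member_count >= threshold


# Toy input: each inner list holds the member types of one relation.
toy_relations = [
    ["way", "way", "way"],
    ["node", "way"],
    ["way", "node", "way"],
    ["relation", "relation"],
    ["node"],
]
summary = Counter(composition(types) for types in toy_relations)
```

Counting the labels with a Counter gives the per-city "#" values directly; dividing by the number of relations gives the "%" values.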

Membership Size of Relations
The previous subsection concentrated on the composition of super-relations. However, in reality, these super-relations and relations with a large number of members are for the most part in the minority. Upon a deeper inspection of the membership size of relations in all of our four cities, we found that the actual distribution of membership size for relations is very heavily skewed towards relations with 10 or fewer members. Considering relations with 10 or fewer members, we find that there are 1377 (74.35%) in Zurich, 5445 (85.03%) in Madrid, 7655 (83.40%) in Vienna and 11,902 (89.81%) in Rome. When we extend this to relations with 20 members or fewer, we observe only a small increase in the relative percentages. These are 1462 (78.92%) in Zurich, 5582 (87.17%) in Madrid, 7911 (86.20%) in Vienna and 12,133 (91.73%) in Rome. This is an important result, as it reinforces the advice from the OSM Wiki which advocates keeping the number of members within any relation to a manageable number. In Figure 2, we provide charts for the distribution of membership size of relations where the number of members is 10 or fewer. The distributions are remarkably similar, with all of the four cities displaying very dense clustering around a membership size of two and three members. However, no strong patterns emerge. We speculate anecdotally that the clustering around a membership size of two or three members may be related in some way to the experience levels of the contributors or the advice from the OSM Wiki on maintaining relations with a manageable number of members. This issue will need to be investigated further.
In each of the four cities, we found a small number of relations containing only a single member. While this is a valid relation in OSM and by our definition in Section 3, it may not be correct in reality and may be the result of an erroneous interpretation of what relations actually represent. There are 48 of these single member relations in Vienna, 61 in Madrid, 69 in Zurich and 199 in Rome. It might be useful for contributors to OSM in these cities to inspect these single member relations in order to ascertain if corrective action is needed. In the case of Zurich, the majority of these relations have type=public_transport, while, in the case of the three other cities, the dominant type of relation has the tag type=multipolygon.
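The membership size figures reported in this subsection are simple cumulative shares. A sketch of how such figures could be derived is given below; the helper names share_at_most and single_member_ids are hypothetical, and the member counts are invented toy values rather than data from the four cities.

```python
def share_at_most(member_counts, limit):
    """Return (count, percentage) of relations with at most `limit` members."""
    if not member_counts:
        raise ValueError("no relations supplied")
    count = sum(1 for c in member_counts if c <= limit)
    return count, 100.0 * count / len(member_counts)


def single_member_ids(member_counts_by_id):
    """Return the ids of relations which contain exactly one member."""
    return sorted(rid for rid, c in member_counts_by_id.items() if c == 1)


# Toy data: relation id -> number of members.
toy_counts = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 8, 7: 12, 8: 25, 9: 300, 10: 40}

n10, pct10 = share_at_most(list(toy_counts.values()), 10)
n20, pct20 = share_at_most(list(toy_counts.values()), 20)
candidates_for_review = single_member_ids(toy_counts)
```

The list of single member relations is the kind of output that local contributors could inspect manually.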

Tagging of Relations
Tagging in OSM has been well studied over the past number of years by many authors. See [29][30][31][32] amongst others for recent treatments of this area. As discussed in Section 3, relations can have tags associated with them in the same way that ways and nodes in OSM have tags. The KeepRight QA tool [20] in OSM flags an error if it finds a relation with no tags. Figure 3 shows the distribution of the number of tags associated with each of the relations in the four cities. The pattern observed here for tags on relations is a familiar one, which has been reported by several authors for nodes and ways in OSM [33][34][35]. We see that over 85% of relation objects in all four cities have 3 tags or fewer. Zurich and Vienna show a higher percentage of relations with more than 3 tags. We believe that this may be related to specific tagging strategies for relations related to public transportation or buildings. However, a closer analysis of these objects is required in order to correctly ascertain why this is the case.
In Table 3, we summarize an analysis of tags associated with the five most frequently occurring types of relations. These relations are identified using the type tag key. They are amongst the most frequently mapped relations in the global OpenStreetMap database [26]. For each relation type, we consider the other tag keys which co-occur with the type key. The table is organised as follows. For each city, the digit 1 following the city name indicates the most frequently occurring tag key with the corresponding type tag key for that city. The digit 2 following the city name indicates the second most frequently occurring tag key with the corresponding type tag key for that city. Table 3 exhibits a very consistent pattern of tagging across all four cities and the five relation types. There is complete agreement in regard to the tagging of type=restriction relations. There are small variations in route, public_transport and route_master. However, all of the co-occurring tag keys are valid tag keys. It is interesting to note the co-occurring tag keys for type=multipolygon. We see agreement in Rome and Vienna on building and landuse, while we see Madrid and Vienna agreeing on building and addr:street. This variation may possibly be explained by the bulk import of openly available geographic data into OSM in Madrid and Vienna. Bulk imports for both areas or regions are reported on the Catalogue of Imports to OSM [36].
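The co-occurrence analysis summarised in Table 3 amounts to counting, for each value of the type key, which other tag keys appear on the same relation. The sketch below shows one way this could be done; the function name cooccurring_keys and the toy tag sets are our own assumptions and do not reproduce the data behind Table 3.

```python
from collections import Counter, defaultdict


def cooccurring_keys(relations_tags, top=2):
    """For each relation type, list the `top` most frequent other tag keys.

    `relations_tags` is an iterable of tag dictionaries, one per relation.
    Relations without a type tag are skipped.
    """
    counts = defaultdict(Counter)
    for tags in relations_tags:
        rel_type = tags.get("type")
        if rel_type is None:
            continue
        counts[rel_type].update(k for k in tags if k != "type")
    return {
        rel_type: [key for key, _ in counter.most_common(top)]
        for rel_type, counter in counts.items()
    }


# Toy tag sets, invented for illustration only.
toy_tags = [
    {"type": "multipolygon", "building": "yes", "addr:street": "Example Street"},
    {"type": "multipolygon", "building": "yes"},
    {"type": "multipolygon", "building": "yes", "addr:street": "Sample Road"},
    {"type": "multipolygon", "landuse": "forest"},
    {"type": "restriction", "restriction": "no_left_turn"},
    {"name": "relation without a type tag"},
]
top_keys = cooccurring_keys(toy_tags)
```

Here the first and second entries of each list correspond to the digits 1 and 2 used after the city names in the table.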

Edit Frequency of Relations
The next aspect of relations we investigated was their edit timestamp. Using the timestamp attribute extracted for each relation, we can obtain an insight into how recently relations have been edited or changed. For each of our four cities, we calculated the Age of each relation by calculating the number of days between the timestamp of each relation and the date upon which we downloaded the OSM data from GeoFabrik (30 June 2017). Figure 4 provides the distribution of the Age of all relations in each city where the bin size is 180 days, which is approximately 6 months. Both Zurich and Vienna show a large number of relations which have been edited in some way in the past 180 days.
The distributions for Rome and Madrid are very different to those for Zurich and Vienna. The majority of relations in Madrid have been edited in some fashion over the previous four years. The distribution of Age in Rome appears to indicate that there are three distinct time periods for editing of relations: within the last 180 days, between 2 and 3 years ago, and also between 5 and 6 years ago. We conducted a review of the change sets for Rome during this time and found that, for relations with ages 540 days to 720 days (around 2 to 3 years ago), over 2000 relation objects were edited, with over 70% of these relations having type=multipolygon. A similar situation occurs for relations with ages 720 days to 900 days, with over 1100 type=multipolygon relations edited. Both situations could be the result of a bulk import of data to OSM or a large-scale organized mapping activity in the form of mapping parties [37,38]. A more detailed investigation of the editing patterns of relations in Rome is required in order to understand the significance of these periods. In the case of Vienna, an investigation of change set activity revealed that in the age period 1620 days to 1800 days there were 1570 relation objects edited, 90% of which are type=multipolygon. Two users are responsible for over 50% of these edits. This potential bulk import explains the spike in the Vienna distribution during this age period. Another aspect of the edit frequency of relations is contained in the edit version number attribute. Each time an object is edited and submitted to the OSM database, the version number attribute of that object is incremented by one. Several authors [5,8,39,40] have considered the version number of objects in OSM as an important attribute in relation to data quality, contributor pattern analysis and general understanding of OSM mapping. In Figure 5, we provide the distribution of edit version number of all relations within the four cities. We see a very similar distribution, with all cities containing a very large percentage of relations with between 1 and 10 versions.
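The Age attribute and its 180-day bins can be derived directly from the timestamp of each relation. The following sketch assumes timestamps in the ISO 8601 form used in OSM XML (for example 2017-05-01T10:00:00Z) and takes midnight UTC on 30 June 2017 as the reference point; the function names and the clamping of negative ages to zero are our own choices for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# Reference date: the day the data was downloaded (midnight UTC is an assumption).
DOWNLOAD_DATE = datetime(2017, 6, 30, tzinfo=timezone.utc)
BIN_DAYS = 180  # approximately six months


def age_in_days(timestamp, reference=DOWNLOAD_DATE):
    """Whole days between a relation timestamp and the reference date."""
    edited = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    edited = edited.replace(tzinfo=timezone.utc)
    # Edits made after the reference instant are treated as age zero.
    return max(0, (reference - edited).days)


def age_bin(age):
    """Index of the 180-day bin: 0 covers ages 0 to 179, 1 covers 180 to 359."""
    return age // BIN_DAYS


# Toy timestamps, invented for illustration only.
toy_timestamps = [
    "2017-06-29T00:00:00Z",
    "2017-01-01T00:00:00Z",
    "2016-06-30T00:00:00Z",
    "2017-06-30T18:00:00Z",
]
histogram = Counter(age_bin(age_in_days(ts)) for ts in toy_timestamps)
```

The resulting Counter maps each bin index to a number of relations, which is the information plotted in a distribution such as Figure 4.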

Contributors to Relations
The analysis presented in this paper is limited to the current version of OSM data for our four case-study cities. Consequently, it is not possible to provide an in-depth analysis of the contribution patterns to relations in these cities without analysis of the edit history. Each relation contains the username and user-id of the OSM contributor who edited the current version of the corresponding relation. To align this work with more formal investigations of contribution patterns in OSM [41][42][43], we observed the edits performed by the ten most frequent contributors to relations in each city. There are no immediately obvious patterns in the absence of a more longitudinal analysis of the entire history of edits. However, it is interesting to see that the top ten contributors certainly provide a high percentage of editing effort. The total percentages of the current version of all relations edited in each city by the top ten most frequent contributors to relations are as follows: Vienna 53.44%, Zurich 57.08%, Madrid 61.13% and Rome 85.23%. Through further investigation of the Rome dataset, we found that one particular user (username = Davlak) was the most frequent contributor, and responsible for 32% of the current versions of relations in Rome. It appears that these relations are mostly part of a bulk import or strategic editing of public transportation related relations in Rome.
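The share of current relation versions attributable to the most frequent contributors can be obtained by counting the user attribute of each relation. A minimal sketch is given below; the function name top_contributor_share and the toy user names are hypothetical and the values do not correspond to any of the four cities.

```python
from collections import Counter


def top_contributor_share(users, top=10):
    """Percentage of relations whose current version was edited by the
    `top` most frequent contributors.

    `users` holds one user name (or user id) per relation.
    """
    if not users:
        raise ValueError("no relations supplied")
    counts = Counter(users)
    top_edits = sum(n for _, n in counts.most_common(top))
    return 100.0 * top_edits / len(users)


# Toy data: one entry per relation, invented for illustration only.
toy_users = ["user_a"] * 5 + ["user_b"] * 3 + ["user_c"] * 2

share_top1 = top_contributor_share(toy_users, top=1)
share_top2 = top_contributor_share(toy_users, top=2)
```

Because only the latest editor of each relation is counted, this measure says nothing about earlier versions; a full history file would be needed for that.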

Conclusions and Future Work
In this paper, we have reported on an analysis of relations in OSM for four cities in Europe. Relations are one of the three core elements in the OSM data model. As discussed in Sections 1 and 2, OSM relations have not been discussed in detail within the literature. While this certainly does not indicate that there is no such research work being carried out, it does highlight a gap in the knowledge of Volunteered Geographic Information (VGI) data and demonstrates the potential need for more research activity in this area. Crowdsourced mapping, such as OSM, opens up fundamental ontological and epistemological questions about the process of mapping according to Dodge and Kitchin [44]. We feel that more in-depth studies of relations in OSM, and of how they are created and edited by the crowd, are necessary and will yield some very interesting outcomes. This paper has attempted to explore similarity or dissimilarity of relations in four cities in order to begin addressing this potential gap in VGI knowledge. As a first step, this paper proposed three clear exploratory research goals which were outlined in Section 1 as follows:
• To explore the types of relations represented in OSM for four large European cities.
• To investigate the similarities between the implementation of relations and their representation in the four cities.
• To explore the research question: are the patterns of contribution, editing and tagging for relations similar to those observed in other researchers' work on OSM nodes and ways?

Summary of Key Findings
Section 4 outlined our analysis of relations in four European cities. There are a number of key findings in this paper and they are summarized as follows:
• Composition of Relations: In our analysis, we found that relations composed of only way objects and relations composed of only node and way objects are by far the most frequently occurring composition arrangements for relations in all four cities. Much smaller, but not insignificant, numbers of relations are comprised of only node objects or only relation objects. More in-depth analysis of the composition of relations is required to better understand the spatial and topological relationships they are trying to represent. Schultz et al. [45] conclude that the composition of relations, the spatial relations they represent and the tags associated with the relations could provide opportunities for applications such as the derivation of Land Cover classes from Land Use map data. Mainzer et al. [46] present a new method to provide local decision makers with tools to assess the remaining roof-mounted photovoltaic (PV) potential within their respective communities. It allows highly detailed analyses without having to rely on 3D city models. OSM is used for the building footprint data. The authors estimate building size from the geographical area of ways and relations with the building tag. However, their results are dependent upon accurate building tagging of ways and relations. It can also be assumed that the topological validity of relations would need to be investigated to ensure that an accurate estimate of building footprint area was being calculated.
• Membership Size of Relations: The distribution of membership size of relations is very heavily skewed towards relations with 10 members or fewer. We describe in Section 4.2.3 that over 70% of relations in all cities have 10 members or fewer. The tendency towards smaller relations follows the advice of the OSM Wiki on maintaining relation objects with a manageable number of members. We did not find specific work reported in the literature related to the membership size of relations. We find that overall the number of members in relations appears relatively small. A wider sample of cities is required to establish if this is a general trend or pattern. Additionally, we feel that the membership size of relations could play an important role in assessing data quality variables such as completeness and coverage. The number of members within a relation may also have an influence on any semantic interpretations extracted from these data.
• Tagging on Relations: Over 85% of relations in all of our four cities have three tags or fewer. The two most frequently co-occurring tag keys with the type tag on relations are very consistently applied across the five most popular type tags in all of the cities. Tagging and the subsequent maintenance of tags on objects in OSM is a well-studied problem. In Quattrone et al. [47], the authors find, in a global study of OSM Points-of-Interest (POI) objects, that only a minority of POI types (fewer than 10% of all types) are frequently maintained (for example, through tag edits), while several hundred POI types receive near-zero maintenance. Bakillah et al. [48] study several major European cities. While they do not provide precise statistics, they remark that there is a lack of tagging observed on way objects representing highways in all of the cities investigated. In an extensive global analysis, Davidovic et al. [30] found that for 40 cities globally there was very often a very low number of tag keys used, with a mean of fewer than 2 (approximately 1.33) additional tags per object for way objects. In Schultz et al. [45], the authors consider which tags and relations in OSM can be used to create Land Use and Land Cover (LULC) classes from the Corine Landcover Classes. The authors convert ways and relations into polygons without mentioning specific characteristics of the relations involved. The focus is on the tags attached to the relations. With the available tags from relations, on OSM data from Germany, accurate LULC classes could be derived.
• Edit Frequency of Relations: We calculate the Age of a relation as the number of days between the date of download and the timestamp on the current version of the relation itself. In all cities, a large number of relations were edited within the last few years. Rome and Madrid show different distributions of the Age of relations. We see that the vast majority of relation objects in all four cities have between 1 and 10 edit versions. Authors such as Mooney and Corcoran [40] and Barron et al. [8] have drawn attention to heavily edited objects in OSM, with a specific focus on those objects with high version numbers, for example those objects with a version number greater than or equal to 15. Quattrone et al. [47] conclude from their global study of OSM that some maintenance actions, such as the addition of new tags to existing spatial objects, are much more frequent than other actions, such as the updating or the removal of existing tags. The skew of the version number distribution towards lower values on relations may indicate reluctance on the part of many contributors to edit relations within the OSM software. In their work on analysis of the history of OSM objects, Mooney and Corcoran [49] find that over 90% of objects in their analysis have three or fewer versions. However, this requires further investigation with regard to relations. The complexity of relation objects may also have a significant role to play in their actual editing by contributors. Efentakis et al. [50] argue that, in the case of turn restrictions for navigation, satellite imagery cannot testify to the existence of restrictions, and that contributing turn restrictions to the OSM dataset, even for a single road, may be extremely time-consuming.
• Relation Size: The size of a relation refers to the number of members contained within the relation itself. In this respect, relations are different to OSM nodes and ways, which are single self-contained objects. In our review of related work, we were unable to find other reported results dealing with the size of relations in OSM. Consequently, as a first step, we decided to investigate the general distribution of relation sizes within the four cities. Eighty per cent of relations in Madrid, Vienna and Rome have 10 members or fewer. Figure 2 visualizes the distribution of membership size when we consider only relations with 10 or fewer members as a sample of the entire population of relations for each city in Section 4.2.3. Super relations (those with more than 300 members) do exist in all of the cities in small numbers. Considering a wider definition of super relations to include those objects with more than 100 members, these only account for between 5% and 8% of all relations in the cities. We feel that the size of a relation may, in some way, be related to its spatial complexity. We speculate that relations of small size will not be representing complex spatial relationships or topologies. However, relation size alone would not indicate the true nature of the complexity of the relation. It would also be necessary to consider the members themselves within the relation.
• Contributors to Relations: In the absence of a full analysis of the history of all edits to relations in each city, it is difficult to draw robust conclusions about contribution patterns. However, it was observed that, in the current version of relations in the OSM database for the four cities, a small number of contributors are responsible for a considerable amount of the current version edits. At this stage of our work, it is difficult to accurately quantify the sociological factors most influencing the obtained results. In the paper we concentrated on an analysis of relations as geographical data objects without considering the geodemographics or socio-demographics of the case-study areas. Quattrone et al. [16] argue that over time the user base in crowdsourced geographic mapping projects changes across a variety of dimensions (e.g., size, demographics, expertise). There is also potential impact on the user base itself, with the authors stating that "some users may lose interest, once they see certain object types have been completely mapped, or when they see 'sufficient' information being mapped". Other studies (such as [51]) find that the majority of contributions to nodes and ways in OSM are undertaken by experienced contributors. Gröchenig et al. [51] find that less than 3% of all contributors in OSM have contributed to the project on more than 100 days.
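The per-relation measures summarized in the findings above (composition, membership size, number of tags, version number and Age in days) can all be read from the OSM XML of a relation. The following is a minimal sketch under stated assumptions, not the exact implementation used in our analysis. The sample document, the download date and the function name summarise_relation are hypothetical, and the timestamp format is assumed to be the UTC form used in OSM XML exports.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical extract with three relations of differing composition.
SAMPLE_OSM = """<osm version="0.6">
  <relation id="1" version="3" timestamp="2017-05-01T10:00:00Z" uid="11" user="alice">
    <member type="way" ref="100" role="outer"/>
    <member type="way" ref="101" role="inner"/>
    <tag k="type" v="multipolygon"/>
    <tag k="building" v="yes"/>
  </relation>
  <relation id="2" version="1" timestamp="2016-02-11T08:30:00Z" uid="22" user="bob">
    <member type="node" ref="200" role="stop"/>
    <member type="way" ref="102" role=""/>
    <tag k="type" v="route"/>
  </relation>
  <relation id="3" version="12" timestamp="2017-07-19T17:45:00Z" uid="22" user="bob">
    <member type="relation" ref="1" role=""/>
    <member type="relation" ref="2" role=""/>
    <tag k="type" v="network"/>
  </relation>
</osm>"""


def summarise_relation(rel, download_date):
    """Return the per-relation measures discussed in the key findings."""
    members = rel.findall("member")
    edited = datetime.strptime(
        rel.get("timestamp"), "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "id": rel.get("id"),
        # Composition: the set of member object types, e.g. "node+way".
        "composition": "+".join(sorted({m.get("type") for m in members})) or "empty",
        "members": len(members),
        "tags": len(rel.findall("tag")),
        "version": int(rel.get("version")),
        # Age: whole days between the last edit and the (assumed) download date.
        "age_days": (download_date - edited).days,
    }


download_date = datetime(2017, 8, 1, tzinfo=timezone.utc)  # assumed download date
root = ET.fromstring(SAMPLE_OSM)
rows = [summarise_relation(rel, download_date) for rel in root.iter("relation")]

# Share of relations with 10 members or fewer, as a percentage.
small_share = 100.0 * sum(r["members"] <= 10 for r in rows) / len(rows)
for row in rows:
    print(row)
print(f"Relations with 10 members or fewer: {small_share:.1f}%")
```

Keeping all of the measures in one record per relation makes it straightforward to tabulate any of the distributions (for example, version number by composition) from a single pass over the data.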

Suggestions for Future Work
This paper is our first exploration of relations in OSM. Many interesting aspects of relations have emerged. Growing from this work, there are a number of interesting areas for future work, which are outlined as follows:
• From our reading of the OSM Wiki [25], more standardised documentation on relations on the OSM Wiki is required. At present, the information on the OSM Wiki is somewhat scattered and disorganized, making it a little difficult to consume. We will consider making edits to these Wiki pages in the future.
• Visualisation of large relations: Visualisation or rendering of many relations with a large number of members is a difficult computational task. This task often times out on the OSM website. The reasons for this are related to the complex nature of large relations, which can also contain relations themselves. The rendering of these types of relations in real-time is computationally difficult.
• Understanding the complexity of relations: As we discussed above, it is a difficult task to effectively assess and understand the type of spatial and topological relationships which are being represented by a relation in OSM. Membership size of relations could be a useful high-level indicator. For example, relations with more members may have more spatial and/or topological complexity. However, we believe further work is required in developing approaches or methodologies to quantitatively assess the complexity of the spatial relations or topological relations expressed by a relation in OSM. This type of work would allow researchers and OSM experts to better assess the situations within OSM where relations are needed.
• Contributors and Relations in OSM: In this paper, we have only considered the current version of all of the relations in the four case-study cities. Researchers [35] have considered longitudinal analysis of the contributors to OSM for ways and nodes. A similar study into the history of edits to relations may indicate the characteristics of the contributors who create and edit relations. Anecdotally, relations are only created and edited by more experienced contributors to OSM. Are all contributors to OSM actually aware of relations? Efentakis et al. [50] use the example of turn restrictions in OSM to state that most users are completely unaware of this type of tagging. Is it the case that only experienced OSM contributors actually work with relations and that the complexity of the relation data object is actually a barrier to wider contributions? Amongst others, Mooney and Corcoran [35] show that in OSM the top 10% of contributors (ranked by their quantity of contributions) perform over 90% of all object creations and edits. Within this top 10% there are what the authors call "isolated" contributors who appear to be working completely on their own without editing the work of other top contributors. Future work is required to investigate if these patterns are also visible within the contribution history on relations in OSM.
• Our analysis considered four European cities. An area for immediate future work would include extending this analysis to include more cities and regions with different sizes, populations, OSM communities, etc. The extension of the research in this paper could also include the full edit history of relations in case-study areas. The edit history for relations is also downloadable from the OSM Wiki. We are aware that our selection of cities for the case-studies is the result of personal selection. To indicate the presence of more general trends or patterns we will need to carefully select cities or urban areas based on their urban characteristics or OSM community structure. Ballatore et al. [52] emphasise the need to ensure that OSM analysis extends beyond the "typical Anglo-American bias". They recommend that analysis considers a diverse set of national and regional OSM communities.
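As one illustration of what a quantitative indicator for the complexity of relations might look like, the sketch below counts the node and way members reachable through nested relations and records the depth of nesting. This is a hypothetical indicator offered for discussion only; it is not a measure used in this paper. The toy data, in which each relation ID maps to a list of (member type, member reference) pairs, and the function name nested_size_and_depth are assumptions.

```python
# Hypothetical toy data: relation id -> list of (member_type, member_ref).
RELATIONS = {
    1: [("way", 100), ("way", 101)],
    2: [("node", 200), ("way", 102)],
    3: [("relation", 1), ("relation", 2), ("way", 103)],
    4: [("relation", 3), ("relation", 4)],  # contains itself, to show cycle handling
}


def nested_size_and_depth(rel_id, relations, visiting=frozenset()):
    """Return (node/way members reachable via nesting, depth of nesting).

    Members shared by several child relations are counted once per path.
    """
    if rel_id in visiting or rel_id not in relations:
        # Either a cycle, or a member relation missing from the extract.
        return 0, 0
    visiting = visiting | {rel_id}
    leaf_count, max_child_depth = 0, 0
    for member_type, ref in relations[rel_id]:
        if member_type == "relation":
            child_count, child_depth = nested_size_and_depth(ref, relations, visiting)
            leaf_count += child_count
            max_child_depth = max(max_child_depth, child_depth)
        else:
            leaf_count += 1
    return leaf_count, 1 + max_child_depth


for rel_id in sorted(RELATIONS):
    print(rel_id, nested_size_and_depth(rel_id, RELATIONS))
```

In this toy example relation 3 has only three direct members but five reachable node and way members at a nesting depth of 2, which shows how direct membership size can understate the structure that a renderer or an analyst has to resolve.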
We believe that this is the first academic paper to report the results of a study into relations in OSM. Relations in OSM provide database representations of the complexity of the real world for infrastructure such as junctions and road intersections, route structures such as hiking and bike trails, public transport lines and routes, and multipolygons representing a plethora of objects from lakes to university campuses. In this way, we feel that it also indicates that the contributors to OSM who work with relations have a very high level of understanding of geographical representations and geocomputation. In closing, we encourage other researchers to undertake work to investigate the properties, characteristics and usages of relations in OSM.
are 14 members of this relation. Two of the members are themselves relations while the remaining 12 are ways representing part of the bicycle route ref 66. The first part of the XML for the relation tag has all of the standard OSM attribute information: the current version number of the relation, the timestamp of the last edit, the user ID and username of the last contributor to edit the relation, the changeset that this relation edit is contained in, and the OSM ID. Each relation, as is the case for nodes and ways, has its own unique identification number (OSM ID) within the OSM database. If one has access to this number, then the OSM website provides a simple way to visualise relations and their history within a single API (Application Programming Interface) call. For example, the Tiergarten suburb of Berlin, Germany, has an OSM ID of 55750. The API call http://www.openstreetmap.org/relation/55750 will display (render) the current version of this relation on a basemap, while the API call http://www.openstreetmap.org/relation/55750/history will also display or render the current version of this relation but also provide access to all of the previous versions, or history, of the relation. In a similar way the API call http://www.openstreetmap.org/relation/55750 provides a visualisation of the relation in Figure 1.
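The attribute information and URL patterns described above can be illustrated with a short sketch. It assumes the standard OSM XML attribute names (id, version, timestamp, changeset, uid, user). Apart from the OSM ID 28044 and the route ref 66, every value in the sample XML (version, timestamp, changeset, user, member references) is invented for illustration, and the function name describe_relation is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, partly invented relation XML; only id=28044 and ref=66
# come from the text, all other values are placeholders.
RELATION_XML = """<relation id="28044" version="5" timestamp="2017-01-15T09:12:00Z"
          changeset="123456" uid="42" user="example_user">
  <member type="relation" ref="900001" role=""/>
  <member type="way" ref="800001" role=""/>
  <tag k="type" v="route"/>
  <tag k="route" v="bicycle"/>
  <tag k="ref" v="66"/>
</relation>"""


def describe_relation(xml_text: str) -> dict:
    """Extract the standard attributes of a relation and build its website URLs."""
    rel = ET.fromstring(xml_text)
    osm_id = rel.get("id")
    return {
        "osm_id": osm_id,
        "version": int(rel.get("version")),
        "timestamp": rel.get("timestamp"),
        "changeset": rel.get("changeset"),
        "uid": rel.get("uid"),
        "user": rel.get("user"),
        "member_types": [m.get("type") for m in rel.findall("member")],
        "tags": {t.get("k"): t.get("v") for t in rel.findall("tag")},
        # URL patterns as given in the text for the current version and history.
        "url": f"http://www.openstreetmap.org/relation/{osm_id}",
        "history_url": f"http://www.openstreetmap.org/relation/{osm_id}/history",
    }


info = describe_relation(RELATION_XML)
print(info["url"])          # http://www.openstreetmap.org/relation/28044
print(info["history_url"])  # http://www.openstreetmap.org/relation/28044/history
```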

Figure 1. An example of a relation in OpenStreetMap XML from Switzerland. This relation represents a bicycle network in Switzerland. The relation can be seen on the OSM website at http://www.openstreetmap.org/relation/28044.

Figure 2. Distribution of the number of members, membership size, in relations for all four cities where the number of members is fewer than or equal to 10.

Figure 3. Distribution of the number of tags assigned to every relation in all four cities.

Figure 4. Distribution of calculated age, in days, of each relation in all four cities.

Figure 5. Distribution of the edit version number of relations within the four cities.

Table 1. Summary of the five most popular types of relations in all of the case study cities. Values are expressed as percentages of the total number of relations.

Table 2. Composition of relations in the four cities based on the data objects of their members.

Table 3. Summary of the most popular tag key combinations with the five most frequently occurring relation types in the four cities.