Interoperability Conﬂicts in Linked Open Statistical Data

: An important part of Open Data is of a statistical nature and describes economic and social indicators monitoring population size, inﬂation, trade, and employment. Combining and analyzing Open Data from multiple datasets and sources enable the performance of advanced data analytics scenarios that could result in valuable services and data products. However, it is still difﬁcult to discover and combine Open Statistical Data that reside in different data portals. Although Linked Open Statistical Data (LOSD) provide standards and approaches to facilitate combining statistics on the Web, various interoperability challenges still exist. In this paper, we propose an Interoperability Framework for LOSD, comprising deﬁnitions of LOSD interoperability conﬂicts as well as modelling practices currently used by six ofﬁcial open government data portals. Towards this end, we combine a top-down approach that studies interoperability conﬂicts in the literature with a bottom-up approach that studies the modelling practices of data portals. We deﬁne two types of LOSD schema-level conﬂicts, namely naming conﬂicts and structural conﬂicts. Naming conﬂicts result from using different URIs. Structural conﬂicts result from different practices of modelling the structure of data cubes. Only two out of the 19 conﬂicts are currently resolved and 11 can be resolved according to literature.


Introduction
Governments, public authorities, and companies are increasingly opening up the data they produce or collect [1].Today, more than 2600 Open Data portals around the globe provide access to Open Data.Statistical data comprise a major part of Open Data.They typically refer to economic and social statistics monitoring the population size, inflation, trade, and employment but also to aggregated governmental data such as the number of government vehicles per public authority.Statistical data are structured as "data cubes" that comprise measures (e.g., unemployment rate) and dimensions describing the measures (e.g., country and year).
Combining data cubes from different data portals may unleash the full potential of Open Data [2][3][4][5][6] based on Business Intelligence analytics [7].Linked data is a promising paradigm towards this direction, since it facilitates data integration on the Web.In data cubes, linked data has the potential to realize the vision of performing advanced data analytics on previously isolated data cubes across the Web [8].
Data cubes can be modelled as RDF graphs using vocabularies such as the RDF data cube (QB) vocabulary [9], a World Wide Web Consortium (W3C) recommendation for modelling Linked Open Statistical Data (LOSD).Other vocabularies include the Simple Knowledge Organization System (SKOS) vocabulary [10], which is also a W3C standard, and the Extended Knowledge Organization System (XKOS) vocabulary [11].SKOS allows defining the structure and contents of code lists.XKOS is an extension of SKOS that enables modelling hierarchical structures with multiple levels.
In practice, however, creating LOSD remains a complex task due to modelling challenges that arise either because the application of existing vocabularies (which we call modelling practice) is not always clear or because of the absence of standard vocabularies [12].As a result, LOSD are not published in a uniform way and remain isolated and non-interoperable.
The problem of data interoperability has been thoroughly studied in the past in the context of databases and data warehouses (e.g., in [3,[13][14][15]).In particular, different types of interoperability conflicts (e.g., schema conflicts and data conflicts) have been defined that inhibit the interoperability of data residing in databases and data warehouses [16].
The aim of this paper is to define a framework comprising (i) interoperability conflicts in LOSD and (ii) modelling practices of portals with LOSD.To this end, we adopt a two-fold approach that (i) theoretically translates interoperability conflicts of databases and data warehouses to LOSD interoperability conflicts (top-down) and (ii) studies the modelling practices of six data portals that result in LOSD interoperability conflicts (bottom-up).
The rest of the paper is organized as follows: Section 2 presents the background knowledge required to understand the contents of this research and Section 3 describes the approach of this research.Section 4 presents details on the six investigated LOSD portals and Section 5 present interoperability conflicts of traditional databases and data warehouses.Thereafter, Section 6 defines LOSD interoperability conflicts and Section 7 evaluates the current state of LOSD interoperability conflicts.Finally, Section 8 discusses the results and identifies open research issues.

Background
This Section presents the theoretical background required to understand this study.In particular, (i) we describe the Data Cube model that is used to structure multidimensional data, and (ii) we present the main concepts of LOSD.

The Data Cube Model
The Data Cube model aims to cover the needs of the Online Analytical Processing (OLAP) and data warehouse systems [17].Although a data cube has been defined in various ways, all definitions agree that it comprises [4,[18][19][20]: (1) measures, which represent numerical values (e.g., unemployment), and (2) dimensions that provide contextual information on the measures (e.g., geospatial or temporal dimension).Dimensions are populated with a set of distinct values (e.g., a temporal dimension can be populated with values "2018", "2019" etc.).Finally, different granularities of a dimension can be represented by hierarchical levels.For instance, a geospatial dimension may have levels such as country, region, city etc..A data cube, for example, that measures unemployment rate in European countries (measure) is described by the time, the country, and the age group (dimensions).The distinct values of the time dimension could be "2017", "2018", and "2019", while of the country dimension "GR", "IT", and "FR".Similarly, the age-group dimension can be populated with values "00-24", "25-49", and "50+".All dimensions have a single hierarchical level.Additional hierarchical levels, however, could be used, for example, in the geospatial dimension by including, apart from countries, values for regions.

Linked Statistical Data
Linked data technologies are based on the Semantic Web philosophy and are mainly about publishing structured data in RDF format using URIs rather than focusing on the ontological level or inferencing [21].
The QB vocabulary [9] is a W3C standard for publishing statistical data on the Web based on the principles of linked data.The core class of the QB vocabulary is qb:DataSet that represents a data cube.A data cube comprises a set of dimensions (qb:DimensionProperty) and measures (qb:MeasureProperty), as well as attributes (qb:AttributeProperty) that are used to represent structural metadata such as the unit of measurement.Finally, the observations of a data cube a represented as instances of qb:Observation.
A common practice for populating the values of the dimensions is to re-use predefined values included in code lists.For example, the values of a geospatial dimension can be populated by values included in a code list defining the geographical or administrative divisions of a country.Code lists can be specified using either the QB vocabulary or the SKOS vocabulary.The values of code lists may have hierarchical relations which can be expressed using the SKOS vocabulary (e.g., using the skos:narrower property), the QB vocabulary (e.g., using the qb:parentChildProperty) or the XKOS vocabulary (e.g., using the xkos:isPartOf property).
Finally, the UK Government Linked Data Working Group [22] has developed a set of common resources such as dimensions (e.g., sdmx:timePeriod, sdmx:refArea, and sdmx:sex), measures sdmx:obsValue, attributes, and code lists that are widely used for creating LOSD.The definitions of these concepts are based on SDMX.

Research Approach
This paper combines a top-down with a bottom-up approach to define interoperability conflicts in LOSD.
The top-down approach translates the interoperability conflicts of traditional databases and data warehouses found in the literature to LOSD interoperability conflicts.To this end, we first studied the two types of interoperability conflicts of databases and data warehouses, namely schema and data conflicts, as well as their sub-types (see Section 5).We decided to exclude data conflicts from our research because they strongly depend on the specific values of the data cubes, hence we cannot create generic definitions.We then transformed the definitions of conflicts to LOSD interoperability conflicts based on the following assumptions:

•
Database attributes can be conceptually mapped to the main QB components of a data cube's structure (i.e., measure, dimension, and attribute properties).For example, assuming that a "Sales" database entity includes a "Country" attribute and, in addition, a "SalesQuantity" attribute to indicate the quantity of sales in each country, "SalesQuantity" can be mapped to an instance of qb:MeasureProperty and "Country" to an instance of qb:DimensionProperty.

•
The names of the database attributes are mapped to RDF properties (i.e., URIs) of the corresponding QB components.In the previous example, "Country" can be mapped to sdmx-dimension:refArea.

•
Database records can be mapped to qb:Observation triples.
The bottom-up approach studies modelling practices of LOSD portals to understand the interoperability conflicts they create.To this end, we selected six data portals i.e., the data portals of (i) the Scottish Government (https://statistics.gov.scot/),(ii) the UK Department for Communities and Local Government (DCLG) (http://opendatacommunities.org/),(iii) the environmental department of the Flemish Government (VLO) (https://id-ontwikkel.milieuinfo.be/),(iv) the Official Statistics in Japan (e-Stat) (http://data.e-stat.go.jp/lodw/), (v) the Italian National Institute of Statistics (ISTAT) that hosts the Italian census 2011 (http://datiopen.istat.it/),and (vi) the Irish CSO data portal that hosts the 2011 Irish Census (http://data.cso.ie/).These data portals were selected because (i) they are official open government data portals, (ii) they publish open government data as LOSD, and (iii) they have published a relatively large amount of datasets (see Section 4).
We visited each data portal several times and executed SPARQL queries to their SPARQL endpoints to identify and understand their modelling practices.In particular, we first used SPARQL queries to understand the general policy used by data portals for publishing data cubes.For example, the results of the SPARQL query in Figure 1 presents the graphs that include datasets in a data portal.Then, we searched data cubes for the components that are linked to the core class of the QB vocabulary namely qb:DataSet.We used the results to create consecutive SPARQL queries until we finish exploring and completely understanding the structure of the data cube.At the same time, we also checked the qb:Observation of the data cubes.For example in order to identify the name (i.e., URI) used for the measure property, we identified the qb:DataStructureDefinition resource of the data cubes and, through this, the corresponding resource that is instance of qb:MeasureProperty (e.g., eg:unemployment).
In addition, we also searched in the observations of the data cube to understand how they use the measure property.
SELECT distinct ?graph ?datasetwhere { graph ?graph { ?dataset a~qb:DataSet } } Figure 1.The SPARQL query used to understand the graphs that include data sets in the data portals.
We labelled LOSD conflicts as Open or Solved.Conflicts are considered to be Solved when all data portals adopt the same practice and as Open when they result from different practices of data portals.Finally, we studied the practices suggested by nine LOSD experts [12] to understand which open conflicts can be currently resolved following a specific practice and which need to be further investigated.

Portals with Linked Open Statistical Data
Today, a large volume of LOSD is provided on the Web through dedicated open government data portals.This subsection describes the datasets of six official data portals launched by organizations and public authorities, namely the data portals of the Scottish and Japan's (e-Stat) governments, the data portal of the environmental department of the Flemish government (VLO), of DCLG in the UK, and the data portals that host the Italian (ISTAT) and Irish (Irish CSO) 2011 censuses.The details of all data portals including number of data cubes, number of observations, and others are presented in Table 1.
The Scottish Government provides official data on "Neighborhood Statistics" as LOSD.In particular, it provides access to 238 data cubes classified into 18 themes such as housing and transport.In addition, DCLG provides LOSD that describe various indicators including local government finance and housing and homelessness.In particular, they provide access to 167 data cubes classified into 14 themes (e.g., homelessness and societal well-being).VLO provides nine data cubes that describe gas emissions in Flanders as LOSD, while the portal site of the Official Statistics in Japan (e-Stat) provides 78 data cubes from seven sources of statistics including a population census, an economic census, and a labor force survey [23].Finally, two censuses have been published as LOSD: (i) the Italian census 2011 published by the Italian National Institute of Statistics (ISTAT) with 8 cubes, and (ii) the Irish Census 2011 with 682 cubes.
The number of measures and dimensions of the data cubes provided by the data portals relates to the number of published data cubes; many cubes usually implies a large number of distinct measures and dimensions.DCLG, for example, uses the largest number of measures (156) and distinct dimensions (205), while the data cubes of e-Stat have only 6 measures and 36 distinct dimensions.A special case is ISTAT that uses the same number (8) of data cubes and measures (each data cube has one distinct measure).
The geospatial and temporal dimensions are the most commonly used dimensions of the data cubes.All data portals use geospatial dimensions of different hierarchical levels (e.g., region, province, district).e-Stat uses the largest number of geospatial values (941,506) followed by ISTAT (426,725).A peculiarity of the Irish Census is that they publish different data cubes for each geographical level.For example, they provide 12 data cubes that measure unemployment, one for each of the 12 geographical levels, hence, resulting in many data cubes.In addition, all data portals except for the Irish Census use a temporal dimension and populate it with values from different levels (e.g., years, quarters).From the two censuses, the Irish CSO does not use a temporal dimension as all data refer to the same time period (i.e., 2011), while ISTAT uses a temporal dimension with a fixed value.All numerical values have been computed using SPARQL queries.For example, the SPARQL query presented in Figure 2 was used to count the number of distinct measure properties.
SELECT (COUNT(DISTINCT ?x) as ?NumberOfMeasures) WHERE { ?x a~qb:MeasureProperty } Finally, a large volume of LOSD are also published by several, unofficial third-party activities.For example, Eurostat's data portal for linked data (http://eurostat.linked-statistics.org/),which was created during a research project, includes more than 5000 cubes.Moreover, few statistical datasets from the European Central Bank, World Bank, UNESCO and other international organizations have been also transformed into linked data using the QB vocabulary in a third-party activity [24].Finally, 2011 census data from Greece [25] and historical censuses from the Netherlands [26] are also available as LOSD.

Interoperability Conflicts of Databases and Data Warehouses
A recent study [16] conducted a systematic literature review of the interoperability conflicts of traditional databases and data warehouses and identified two main types of conflicts that can be also applied to LOSD, namely schema conflicts and data conflicts (Figure 3).Schema conflicts regard the database model while data conflicts the data included in the database tables.Schema conflicts can be further classified to naming conflicts and structural conflicts.Correspondingly, data conflicts can be further classified into data scaling conflicts, data precision conflicts, data representation conflicts, and data value conflicts.
All types of conflicts are defined in the following subsections.In the definitions, we consider two database terms as "semantically similar" when they refer to the same concept, and "semantically unrelated" when they refer to different concepts.To facilitate the understanding of the definitions, we use the example of Figure 4.The example presents the database model and sample data of two datasets that describe a company's sales.The database model of dataset 1 includes three entities, namely "Product", "Sales", and "Date", while the database model of dataset 2 includes two entities, namely "Product" and "Sales".Accordingly, in the data level, dataset 1 includes three tables, while dataset 2 two tables.

Schema Conflicts
Schema conflicts result from using entities, relations, and/or attributes of data (i.e., the main components that can be used to model data according to the Entity Relationship model in different ways).Schema conflicts are classified into naming and structural conflicts.

•
Homonym conflicts.This type of conflict results from using similar names to define semantically different components.In our example, a homonym conflict results from the "weight" attribute of "Product" entity that means total weight in dataset 1 and net weight in dataset 2.

•
Synonyms conflicts.This type of conflict results from using different names to define semantically similar components.For example, "quantity" attribute of the "Sales" entity in dataset 1 and "volume" attribute of the corresponding entity in dataset 2 are different names used for the same concept.Using multiple languages can also result in synonym conflicts.For example, week (in English) and woche (in German) refer to the same concept using different language [31].
Structural conflicts result from modelling semantically similar components in a different way [15].The two types of structural conflicts include [14,27,28,[32][33][34]: • Schematic discrepancies.Re-organizing the logical structure of some attributes and their values to create a different structure in another database schema results in schematic discrepancies [13,28].For example, "date" is an entity in dataset 1 while in dataset 2 an attribute of the "Sale" entity.
A special case of this conflict is when the value of an entity's attribute in one database model is modelled as an attribute in another database model [28].For example, one of the values of the attribute "weight" of the "Product" entity in dataset 1 is "heavy" that corresponds to an attribute of the second dataset's "Product" entity.

•
Schema isomorphism.Defining different number of attributes in semantically similar entities results in schema isomorphism conflicts [13,27,28,33].For example, the "Product" entity of dataset 2 includes an extra attribute (i.e., "heavy") compared to the same entity of dataset 1.

Data Conflicts
Data conflicts result from incompatible or inconsistent data.Data conflicts are classified into data scaling, data precision, data representation, and data value conflicts.
Data scaling conflicts result from populating semantically similar attributes with values described by different units of measure [14,27,28,34,35].For instance, the "amount" attribute of dataset 1 is populated with values in euros while in dataset 2 with values in dollars.
Data precision conflicts result from populating semantically similar attributes with values that have different precisions [14,27,28,33].For example, the "weight" attribute of the "Product" entity in dataset 1 includes values such as "heavy", "medium", and "light" while in dataset 2 the "weight" of the "Product" entity is measured in kilograms (e.g., 40).Moreover, precision conflicts may also result from using different levels of accuracy.For example, the "weight" attribute may be measured with accuracy of milligrams in one database and, alternatively, with grams in another database.
Data representation conflicts result from populating semantically similar attributes with values that have the same unit of measure and precision but different formats resulting in data representation conflicts [13,27,28,33,34].For example, the values of attribute "vat" of the "Sales" entity in dataset 1 are expressed as a percentage (e.g., 23%) while in dataset 2 as a decimal (e.g., 0.23).Different formats are also commonly used when populating date attributes, e.g., "dd/mm/yy" versus "mm/dd/yyyy".
Data value conflicts [4,13,28,30,36] result from wrong or obsolete data or from employing different statistical methods [27].For example, the Television sales for Q1 2016 in dataset 1 are 13,000 e while in dataset 2 are $11,200 (the values are conflicting even after converting euros to dollars).

Interoperability Conflicts of Linked Open Statistical Data
In this Section we define schema interoperability conflicts in LOSD (Figure 5).The conflicts are classified into "Naming conflicts" (Section 6.1), "Schema isomorphism" (Section 6.2), and "Schematic discrepancies" (Section 6.3).For each conflict, we present its definition and the related modelling practices adopted by the six investigated data portals.
In the rest of this Section, we use some standard abbreviations for common linked data namespaces.These are presented in Table 2.  Table 2. Abbreviations for common namespaces used in the paper.

Naming Conflicts
One of the main principles of linked data is to model the components of data cubes (e.g., the measure properties, dimension properties, unit properties, code lists etc.) as RDF properties, i.e., using URIs [37].The W3C Working Group suggests re-using or extending standardized vocabularies whenever possible to facilitate inclusion and expansion of the Web of data, otherwise create new vocabularies [38].In this context, naming conflicts in LOSD result either from re-using values from different vocabularies and/or code lists, or from re-using values from the same vocabularies and/or code lists but in a different manner.For example, the URIs provided by the SDMX vocabulary (e.g., sdmx-dimension:sex) are commonly re-used for the dimension properties.In practice, however, data publishers may also select to define new URIs or create proprietary code lists when creating LOSD, resulting in naming conflicts.The two types of naming conflicts in LOSD are Homonym and Synonym conflicts.

Homonym Conflicts
Homonym conflicts result from naming semantically unrelated components of data cubes using the same URI.For example, a common modelling practice that results in homonym conflicts is to name semantically unrelated measures (e.g., unemployment, poverty, etc.) using the sdmx-measure:obsValue property [12].
Although homonym conflicts exist in the literature, the six data portals that were investigated in this study do not adopt modelling practices that result in homonym conflicts.This proves that in practice, this type of conflict is not critical.

Synonym Conflicts
Synonym conflicts result from naming semantically similar components of data cubes using different URIs.These components include: • Measure properties.For example, in case of a dataset that measures unemployment, the alternatives for modelling the measure property could be either to re-use sdmx-measure:obsValue property or to define a proprietary measure properties (e.g., eg:unemployment).
• Attribute properties.For example, the unit of measure can be represented by the sdmx-attribute:unitMeasure property or, alternatively, by a proprietary property (e.g., eg:unitMeasure).
• Dimension properties.For example, SDMX dimension properties (e.g., sdmx:refArea) are commonly re-used for common dimensions' properties (e.g., temporal, geospatial, gender, and age).An alternative could be to define a new dimension property (e.g., eg:geo) instead.• Code lists.For example, for the unit of the measure property, alternative practices include either to re-use the QUDT vocabulary or to re-use resources of the DBpedia vocabulary.

•
Hierarchical relation properties and levels of hierarchies.For example, hierarchical relations can be expressed using the dcterms:isPartOf and dcterms:hasPart properties or, alternatively, using new URIs.
The following paragraphs facilitate the understanding of LOSD synonym conflicts by elaborating the definition using the different practices adopted by data portals.

C1.1: Naming the Measure Property
The measures of data cubes are commonly modelled as RDF properties (i.e., using URIs).All data portals investigated in this paper define and use a proprietary measure property.As a result, synonym conflicts are created hampering the interoperability of datasets.A practice to address this conflict is to define each proprietary property as sub-property of sdmx-measure:obsValue.This practice is also suggested by the QB vocabulary specification because it facilitates readability and processing of the RDF datasets.However, this practice is also considered to be a redundancy because it does not provide additional semantic value to the measure [12].
Table 3 presents details about the practices used by data portals regarding the name of the measure property.
Table 4 summarizes the practices used for the names of the measure properties that result in synonym conflicts.The unit of the measure defines the quantity or increment used to count or describe the measure of a data cube.The unit of the measure property is commonly represented using URIs.DCLG, VLO, and the Scottish data portals re-use the sdmx-attribute:unitMeasure property for the unit of measure, a practice also suggested by the QB vocabulary.In addition, e-Stat defines and uses a proprietary attribute property (i.e., cd-attribute:unitMeasure), which is, however, linked with the sdmx-attribute:unitMeasure using the dcterms:relation property.The rest of the data portals do not use a unit of measure property.Table 5 presents the practices used by data portals.Finally, Table 6 presents the three distinct practices regarding the names used for the unit of measure property.Using different URIs to express semantically similar unit properties result in synonym conflicts, although e-Stat's practice could partially solve address this interoperability conflict.In addition, the fact that some data portals do not use units of measure may result in schema isomorphism conflicts as semantically similar data cubes will have different number of dimensions.

C1.3: Naming the Common Dimension Properties
The geospatial, temporal, gender, and age dimensions are the most common dimensions used to describe statistical data.Dimension properties in data cubes are commonly named with URIs and are defined in the structure of the data cube (i.e., qb:DataStructureDefinition).A challenge, hence, is to decide on the URIs that will be used for the common dimensions.
Regarding the geospatial dimension, most of the data portals re-use the sdmx-dimension:refArea property.Only VLO and ISTAT define and use proprietary properties (e.g., milieu:referentiegebied).VLO's property is rdfs:subPropertyOf of the sdmx-dimension:refArea.
VLO's property is rdfs:subPropertyOf sdmx-dimension:timePeriod, while e-Stat's property is related to sdmx-dimension:refPeriod as well as to cen:haAnno using the dcterms:relation property.Finally, Irish CSO does not use a temporal dimension because all observations refer to year 2011.
Regarding the gender dimension, most data portals (i.e., the Scottish data portal, ISTAT, and Irish CSO) define and use proprietary properties.e-Stat also defines and uses a proprietary property (i.e., cd-dimension:sex), which is related to sdmx-dimension:sex and cen:haSesso properties using the dcterms:relation property.DCLG re-uses the sdmx-dimension:sex property.VLO does not use a gender dimension.
As with the gender dimension, most data portals (i.e., the Scottish data portal, ISTAT, and Irish CSO) define and use proprietary properties for the age dimension.In particular, ISTAT defines various age properties for different age groups.For example, cen:haClasseEta15Anni property represents an age classification consisting of two categories: lower/higher 15 years, while cen:haClassiEta16Categorie represents 16 age-group categories.This practice allows using code lists that include only values that are used in the data cube.e-Stat also defines and uses a proprietary age property (i.e., cd-dimension:age), which is, however, related to the sdmx-dimension:age using the dcterms:relation property.DCLG re-uses the sdmx-dimension:age property.VLO does not use an age dimension.
All the approaches used by data portals for the geospatial, temporal, gender, and age dimensions are presented in Table 7.
Using different URIs for semantically similar dimensions results in synonym schema conflicts.Table 8 summarizes the four distinct practices used by data portals to represent the geospatial, temporal, gender, and age dimension properties.
Finally, the fact that some of the data portals do not use all the common dimensions also causes schema isomorphism conflicts.Statistical data often include hierarchical structures (e.g., geographical divisions).Hierarchical structures include generalization/specialization relations (e.g., Greece is part of Europe) and hierarchical levels (e.g., country, region, city).The QB vocabulary suggests using skos:narrower property (or define a sub-property of it) to define relationships in hierarchical code lists.However, it also suggests using qb:parentChildProperty in some cases (e.g., when publishers wish to be able to re-use existing code lists).
Nevertheless, some data portals (i.e., DCLG and the Scottish data portals) define and use proprietary properties (e.g., spatial:within, spatial:parent, and spatial:contains) to indicate relations in hierarchical structures.For example, DCLG defines that West Midlands (i.e., an English region) is spatial:within England.This practice, however, that does not use a standard vocabulary makes the interpretation of the data difficult.Moreover, e-Stat uses dcterms:isPartOf and dcterms:hasPart properties.For example, Toyota-Shi (a Japanese city) dcterms:isPartOf Aichi-ken (a Japanese prefecture).ISTAT, VLO, and Irish CSO do not define hierarchical relations.
Table 9 presents the practices of data portals regarding hierarchical relations and structures.Using different URIs for semantically similar hierarchical relations and hierarchical levels result in synonym conflicts.In addition, the lack of URIs in some data portals for the hierarchical relations and hierarchical levels result in schema isomorphism results.Table 10 presents the distinct practices for defining hierarchical relations.The distinct practices for defining hierarchical levels are presented in Table 11.The values of the units of measure are usually URIs extracted either from code lists (e.g., skos:ConceptScheme) or from vocabularies (e.g., the QUDT units Vocabulary (http://qudt.org/)).The QB vocabulary specification recommends re-using common code lists and vocabularies for the values of the unit of the measure.For example, DCLG uses (i) QUDT (e.g., qudt:Percent), (ii) DBpedia resources for currency units (e.g., http://dbpedia.org/resource/Pound_sterling),and also (iii) defines a new code list (i.e., http://opendatacommunities.org/def/concept-scheme/measure-units) with additional units such as http://opendatacommunities.org/def/concept/measure-units/pounds and http://opendatacommunities.org/def/concept/measure-units/pounds-per-hour.At the same time, the Scottish data portal, e-Stat, and VLO define their own code lists (http://statistics.gov.scot/def/concept-scheme/measure-units, cd-attribute:UnitMeasureConceptScheme, and https://id.milieuinfo.be/vocab/imjv/conceptscheme/eenheden#id respectively) with measurement units such as http://statistics.gov.scot/def/concept/measure-units/percentage,cd-attribute:code/unitMeasure-year, and https://id.milieuinfo.be/vocab/imjv/concept/eenheid/Meter#idrespectively.In VLO, some unit values are also related to QUDT using rdfs:SeeAlso.For example, https://id.milieuinfo.be/vocab/imjv/concept/eenheid/Meter#id rdfs:SeeAlso qudt:Meter.ISTAT and Irish CSO do not define units of measure.
Table 12 presents the practices used by data portals.Finally, Table 13 presents the four distinct practices of data portals.Using different URIs to express semantically similar units result in synonym conflicts.The values that are used to populate the temporal dimension can be drawn from a code list or a vocabulary or, alternatively, they can be encoded as data values (e.g., an xsd:dateTime).The QB vocabulary, for example, suggests re-using the reference.data.gov.ukvocabulary and declare this within the data structure definition of the data cube.
DCLG re-uses values from the reference.data.gov.ukvocabulary (e.g., http://reference.data.gov.uk/id/year/2015) while the Scottish data portal defines and uses a proprietary code list for each data set (e.g., http://statistics.gov.scot/def/code-list/house-sales-prices/refPeriod) that re-uses values from the reference.data.gov.ukvocabulary.Proprietary code lists allow defining additional values related to the ones included in existing code lists.On the contrary, ISTAT, e-Stat, and VLO use literal values for the temporal dimension (e.g., "2010"ˆˆxsd:gYear-eStat, "2011"-ISTAT, or 2010-VLO).Finally, Irish CSO does not define a temporal dimension.These different practices, however, may result in synonym conflicts when different values are used for the same temporal value.Table 14 presents the practices used by data portals.
Table 15 presents the distinct practices of the data portals for populating the temporal dimension.As with the other dimensions, the gender dimension is also commonly populated using URIs from code lists.The QB vocabulary recommends re-using directly the sdmx-dimension:sex property for the temporal dimension in order to be able to re-use the default code list for it that includes sdmx-code:sex-F (female), sdmx-code:sex-M (male), sdmx-code:sex-U (undefined), sdmx-code:sex-N (not applicable), and sdmx-code:sex-T (total).
Nevertheless, the Scottish data portal, the Irish CSO, and e-Stat define proprietary code lists (http: //statistics.gov.scot/def/concept-scheme/gender,http://data.cso.ie/census-2011/classification/gender, and cd-code:SexConceptScheme respectively) for the gender dimension.The first one with five values: male, female, all, unknown, and not-specified (e.g., http://statistics.gov.scot/def/concept/gender/not-specified), the second one with three values: male, female and both (e.g., http://data.cso.ie/census-2011/classification/gender/both), and the third one with three values: female, male, and all (e.g., cd-code:sex-all).The latest practice is usually preferred when there is a need for additional values that are not provided by the SDMX vocabulary.However, e-Stat's values are related to sdmx:sex values using skos:closeMatch.Finally, DCLG and ISTAT use the SDMX code list to populate gender dimensions, while VLO does not use a gender dimension.Table 16 presents the practices used by data portals.
Table 17 presents the distinct practices of data portals for populating the gender dimension.Using different URIs for semantically similar gender values results in synonym conflicts.Schema isomorphism conflicts result from defining a different number of components in semantically similar data cubes.The components may include: • Measure parameters.For example, a data portal may define just the measure of the data cube, while an alternative could be to define both the measure and the measure type (e.g., qb:measureType).• Measure properties.For example, defining a single or multiple measures per data cube.

•
Unit of measure parameters.For example, a data cube may define only the unit of the measure while another data cube may additionally define the unit multiplier, which is used to indicate the magnitude in the units of measurements (e.g., hundreds, thousands, tens of thousands etc.).• Units of measures.For example, in case of multiple units of measure, an option is to publish a data cubes for a single unit of measure and another option to publish one data cube with all units of measure.• Hierarchical levels.For example, all hierarchical levels could be defined in one data cube, but another option is to define one data cube per hierarchical level.
Finally, structural conflicts may result from not including a component in a data cube.For example, LOSD often describe census data of a specific year, e.g., ISTAT provides Italian census data only for 2011.In such case, alternative practices could be either to include the temporal dimension as usually, or to consider including the temporal dimension a redundancy and omit it.

C2.1: Modelling the Measure and Its Parameters
A measure of a data cube is a complex concept that can be described using multiple properties.According to the QB vocabulary, a measure is defined as qb:MeasureProperty.In case of multiple measures per data cube, the "measure dimension" approach of the QB vocabulary suggests declaring the measure dimension (i.e., qb:measureType) within the data structure definition.The value of the qb:measureType will denote which particular measure is being conveyed in each observation.On the contrary, the "multi-measure observations" approach of the QB vocabulary recommends attaching multiple measure values to each observation and, hence, the qb:measureType is not necessary.
Data portals adopt different practices regarding the measure and its parameters (Table 18).The Scottish data portal, e-Stat, and DCLG, for example, define both measure and measure type.Measure type is defined using the qb:measureType property.VLO also defines both measure and measure type, although it does not define multiple measures per data cube.However, in VLO, the type of the measure (which is actually the type of the pollutant) is defined using a proprietary property i.e., milieu:substantie.Finally, Irish CSO and ISTAT define only the measure of the data cubes.Irish CSO does not define multiple measures in observation, while ISTAT follows the "multi-measure observations" approach.Sometimes statistical data describe multiple measures that are closely related (e.g., sensor measurements).The QB vocabulary encourages defining multiple measures of similar or different nature in data cubes.Data portals, however, follow different practices in their data cubes (Table 20).In particular, the Scottish data portal, DCLG, ISTAT, and e-Stat define one or more measures per data cube, while Irish CSO and VLO define only one measure per data cube.The unit of the measure may be simple (e.g., count, currency) or derived (e.g., various types of ratios).Additional parameters such as multipliers, percentages, and others can be also described along with the unit of the measure.In particular, a unit can be described with one or more of the following parameters: (i) the unit, which is the quantity or increment by which something is counted or described (e.g., euros), (ii) the unit type, which is the kind of the unit (e.g., currency), and (iii) the unit multiplier, which is used to indicate the magnitude in the units of measurements (e.g., hundreds, thousands, tens of thousands etc.).The unit type enables grouping and associating similar units, while the unit multiplier facilitates the comprehension of data sets.
All data portals that use units of measure define only the unit of the measure except for e-Stat (Table 22).e-Stat defines the unit of the measure and the unit multiplier using the estat:unitMult property.ISTAT and Irish CSO do not use unit of measures.The distinct practices are presented in Table 23.The practices may result in schema isomorphism conflicts.In some cases, a measure may comprise more than one unit.For example, temperature can be measured either in Celsius or in Fahrenheit.In this case, publishers can choose to include all units in the same data cube, or define one data cube or each unit.DCLG and the Scottish data portal provide data cubes with more than one unit per measure (e.g., scottish:people and scottish:percentage-of-people-employed).The rest of the data portals define only one unit per measure, except for ISTAT and Irish CSO that do not use unit of measure.Table 24 presents the practices followed by data portals.
Table 24.Modelling practices of data portals for selecting number of units per measure.

Scottish
More than one unit per measure DCLG More than one unit per measure ISTAT -Irish CSO -VLO One unit per measure e-Stat One unit per measure The distinct practices followed by the data portals are presented in Table 25.
Table 25.Distinct practices of data portals for selecting number of units per measure.

Distinct Publishing Practice
P2.4.1:Publish several data cubes with a single unit per measure P2.4.2:Publish one data cube with multiple units per measure C2.5: Modelling Hierarchical Levels Data cubes usually include hierarchical dimensions (e.g., geographical divisions).Most data portals (i.e., the Scottish portal, DCLG, VLO and ISTAT) provide measurements for all levels of the hierarchical dimension in the same data cube (Table 26).On the contrary, the Irish CSO and e-Stat define separate data cubes for each hierarchical level.For example, the Irish CSO defines 11 data cubes that measure unemployment (one for each of the 11 geographical levels).In the same way, e-Stat, for example, provides two data cubes to measure welfare and social security; one for the city level (dataset http://data.e-stat.go.jp/lod/dataset/g00200502/d00200502jm) and one for the prefecture level (http://data.e-stat.go.jp/lod/dataset/g00200502/d00200502jp).
Table 26.Modelling practices of data portals for hierarchical levels.

Portal
Modelling Practice Scottish Define one data cube for all hierarchical levels DCLG Define one data cube for all hierarchical levels ISTAT Define one data cube for all hierarchical levels Irish CSO Define one data cube per hierarchical level VLO Define one data cube for all hierarchical levels e-Stat Define one data cube per hierarchical level Distinct practices are presented in Table 27.
Table 27.Distinct modelling practices of data portals for hierarchical levels.
Distinct Modelling Practice P2.5.1:Define several data cubes with a single hierarchical level P2.5.2:Define one cube with multiple hierarchical levels

Schematic Discrepancies
Schematic discrepancies result from using different logical constructs to represent the same set of data cube components.In particular, schematic discrepancies result while:

•
Modelling multiple measures.For example, a data portal may define multiple qb:MeasureProperty (i.e., one for each measure) in the qb:DataStructureDefinition, an instance of a single measure component in each observation, and an extra qb:measureType dimension property that expresses the measure used in the observation.Another data portal may define multiple qb:MeasureProperty in the qb:DataStructureDefinition and an instance of each defined measure component in each observation.

•
Defining the unit of the measure.For example, a practice could be to define the unit of measure at the qb:Observation level, or, alternatively, at the qb:MeasureProperty level.

•
Defining single value dimensions (i.e., dimensions with a common value).For example, the single value dimension could be defined at the qb:Dataset level, or, alternatively, at the qb:Slice level.

•
Defining aggregated values (i.e., values describing aggregated statistics).For example, a hierarchy can be defined by the aggregated value on the top level of the hierarchy, or, alternatively, a unique, generic, "total" URI (e.g., sdmx:total) could be used for all kind of aggregated values .

•
Defining the values of the temporal dimension.For example, the values of the temporal dimension could be (optionally) followed by its data type, i.e., "2011"ˆˆxsd:date.

•
Associating dimensions with potential values.For example, a practice could be to define the association using the qb:codeList property, or, alternatively, define the rdfs:range of the qb:DimensionProperty as a skos:Concept.

C3.1: Modelling Multiple Measures
As presented in C2.2, it is a common practice to define more than one measures in a data cube.Data portals use different practices to model multiple measures in the data cubes (Table 28).For example, the Scottish data portal, e-Stat, and DCLG adopt the "Measure dimension" practice of the QB vocabulary and define (i) multiple qb:MeasureProperty components in the qb:DataStructureDefinition of the data cube (one for each measure), (ii) an instance of a single measure component in each observation, and (iii) an extra qb:measureType dimension that denotes the measure used in the observation.On the contrary, ISTAT adopts the "Multi-measure observation" practice of the QB vocabulary and defines (i) multiple qb:MeasureProperty components in the qb:DataStructureDefinition of the data cube and (ii) an instance of each defined measure component in each observation.VLO and Irish CSO define only one measure per data cube, hence they are not related to this conflict.

Distinct Modelling Practice
P3.1.1:"Multi-measure observation" modelling practice P3.1.2:"Measure dimension" modelling practice C3.2: Defining the Unit of Measure in the Structure of the Cube According to the specification of the QB vocabulary, the unit of the measure can be defined at three levels of a data cube's structure i.e., the qb:DataSet level, the qb:MeasureProperty level, and the qb:Observation level.Each approach has its own advantages and disadvantages.For example, defining the unit measure at the qb:DataSet or qb:MeasureProperty level facilitates the identification of the unit of all observations directly from the data cube structure but, in case of multiple units per data cube, it is impossible to identify the unit of each observation.In addition, when the unit is defined at the qb:Observation level, one must iterate through all observations to identify all units used in the data cube.
All data portals that use units of measure define it at the qb:Observation level (Table 30).The only practice for defining the unit of measure is presented in Table 31.Sometimes, (e.g., in census data) all observations may have the same value in a dimension.For example, when the census data refer only to one year, the temporal dimension will have only one value.For these data cubes, the specification of the QB vocabulary allows the definition of a single value at the qb:Dataset level, at the qb:Slice level, or at the qb:Observation level.
Among the data portals we studied, only ISTAT and Irish CSO have published census data.ISTAT, that publishes the Italian 2011 census, follows the qb:Observation level approach and defines in each observation the same value for the temporal dimension (i.e., 2011) because all observations refer to this year.Although this approach makes it easy to identify the time reference of each observation, it loads data cubes with triples with the same information.At the same time, Irish CSO that publishes the Irish 2011 census, does not define a temporal dimension at all.All practices are presented in Table 32.All data portals use specific URIs to express aggregated values of each dimension.For example, ISTAT re-uses sdmx-code:sex-T for the gender dimension and defines http://datiopen.istat.it/odi/ontologia/censimento/StatoCivile5CatTotale to express the total sum of all marital statuses.In addition, VLO defines https://id.milieuinfo.be/vocab/imjv/concept/substantie/totaal_stof#idfor all substances.
The distinct practices, hence, used for aggregated values is presented in Table 34.As already described in C1.6, values for the temporal dimension may be extracted from code list or, alternatively, a data value can be used instead.Different practices may also cause schematic discrepancies conflicts.For example, ISTAT and e-Stat define the value of the temporal dimension along with its data type, i.e., "2011"ˆˆxsd:date or "2014"ˆˆxsd:gYear), VLO only the value of the temporal dimension (e.g., 2011) while all the other portals use URIs from code list e.g., http://reference.data.gov.uk/id/year/2016.Irish CSO does not use a temporal dimension.All practices are presented in Table 35.The distinct practices, hence, used for the values of the temporal dimension are presented in Table 36.The above practices may result in schematic discrepancies conflicts.

C3.6: Associating Dimensions to Code Lists
According to the QB vocabulary, the modelling of the possible values of a dimension is described using the rdfs:range property of the dimension in the usual RDF manner.For example, the values of a temporal dimension might be represented using, e.g., xsd:time or as URIs.URIs can also be grouped into code lists (i.e., skos:ConceptScheme,skos:Collection, orqb:HierarchicalCodeList) or into reference datasets (e.g., http://reference.data.gov.uk to facilitate their re-use.In the case of code lists, it is useful to easily identify the overall code list a value comes from.
Linked data portals use various practices for associating the dimensions of the data cubes with their potential values.ISTAT, for example, uses the qb:codeList property to associate a qb:DimensionProperty with a skos:ConceptScheme that indicates the list of skos:Concept that may be used as values of the dimension.e-Stat and DCLG uses the same approach, but also defines the rdfs:range of the qb:DimensionProperty as a sub-class of skos:Concept.VLO and Irish CSO use a different approach and define the rdfs:range of the qb:DimensionProperty as a skos:Concept.Finally, the Scottish government follows a hybrid approach; it adopts e-Stat's and DCLG's practice for some dimensions and, at the same time, defines the rdfs:range of other dimensions as a sub-class of skos:Concept.Table 37 presents the approaches followed by the data portals.
The different practices can be, hence, summarized in Table 38.
Table 37. Modelling practices of data portals for associating dimensions to code lists.

Status of LOSD Interoperability Conflicts
In this Section we identify and present the status of the 19 interoperability conflicts in LOSD.We consider two statuses of interoperability conflicts, namely (i) Solved and (ii) Open, and label conflicts accordingly.A conflict is labelled as Solved when all data portals adopt the same modelling practice and as Open when they result from different modelling practices of data portals.We also evaluate the resolvability of conflicts against the related opinion of LOSD experts expressed in a recently published study [12].In particular, we consider that a conflict is resolvable when, although it results from different practices of data portals, LOSD experts have already proposed following a specific practice that resolves it.Table 39 presents the conflicts along with their status and resolvable labels.The rest of this Section discusses the status and resolvability of interoperability conflicts.
As already stated in Section 6.1.1,the modelling practices followed by the data portals do not result in homonym conflicts, hence homonym conflicts are considered to be solved.On the contrary, all synonym conflicts (C1.1-C1.7)are still open, mainly because data portals define their proprietary properties and values.According to LOSD experts, some of these conflicts can be resolved while others are currently irresolvable.Regarding the measure property (C1.1), although LOSD experts suggest defining proprietary properties, they also consider that this approach does not fully address all challenges.We, hence, consider this conflict as irresolvable.In addition, LOSD experts suggest using sdmx-attribute:unitMeasure for the unit of measure property (C1.2) to be able to assign values that are not part of a code list.If, however, annotation with additional properties (e.g., label, code list) is required, they suggest defining a proprietary property that is sub-property of sdmx-attribute:unitMeasure.Both suggestions facilitate interoperability, and, hence, this conflict is labelled as resolvable.For the geospatial, age, and temporal dimension properties (C1.3a,C1.3c, C1.3d), LOSD experts suggest defining a proprietary qb:DimensionProperty as rdfs:subPropertyOf the corresponding SDMX dimension (e.g., sdmx-dimension:refArea for the geospatial dimension).For the gender dimension (C1.3b), however, LOSD experts suggest creating a proprietary property along with a code list when more values (e.g., hermaphroditism) are needed apart from those included in the code list linked to existing SDMX property (i.e., sdmx-dimension:sex).C1.3b, hence, cannot be resolved with existing practices.
For the URIs of the hierarchical relations (C1.4),LOSD experts suggest using SKOS to model hierarchical relations in simple hierarchies and XKOS when the hierarchical levels are fully separated, and depth is a meaningful concept.When, publishers need additional relations than the ones offered by SKOS and XKOS, the QB vocabulary's properties (e.g., administeredBy in contrast to within) are suggested.C1.4 is, hence, considered to be resolvable.For the values of the unit of measure (C1.5),LOSD experts suggest re-using QUDT, dbpedia, or other code list when QUDT does not address data needs.In addition, LOSD experts suggest different practices for populating the values of temporal properties describing time periods and points of time (C1.6).In particular, in case of a period of time, they suggest defining rdfs:range interval:Interval and, hence, re-use the values from http://reference.data.gov.uk if it is sufficient, otherwise create a new code list.In case of a specific point of time, they suggest defining rdfs:range xsd:dateTime.As a result, we currently consider C1.6 as irresolvable.Finally, LOSD experts suggest re-using the SDMX vocabulary for the values of the gender dimension (C1.7).However, as already mentioned, these values may not be sufficient, hence, they suggest creating a new code list for the gender dimension.As a result, we also consider C1.7 currently irresolvable.
Structural conflicts (C2.1-2.5 and C3.1-3.6)result from different practices of modelling semantically similar components and, hence, hamper interoperability.Starting with schema isomorphism conflicts, some data portals model only one or both of the measure property and measure type.LOSD experts do not have a suggestion for modelling the measure type in addition to the measure property, hence, this conflict is considered irresolvable.In addition, some data portals create data cubes with a single measure, while others with multiple measures (C2.2).LOSD experts, suggest defining multiple measures in data cubes only when the measures are closely related to a single observational event.However, it is not always clear whether and how independent are the measures and, sometimes, this depends on how someone will use the data.As a result, there is not a clear suggestion on which practice to use for modelling multiple measures and, hence, C2.2 is currently irresolvable.In addition, although data portals use different practices for modelling the unit of measure and its parameters (C2.3),LOSD experts do not have a suggestion and, hence, C2.3 is still irresolvable.Data portals also create data cubes with one or more units per measure (C2.4).LOSD experts, however, suggest defining a single data cube with multiple units to include all related units of the same measure in the same cube.C2.4 is, hence, resolvable.Finally, data portals create data cubes with one or more hierarchical levels (C2.5).No related practice, however, has been suggested by LOSD experts.
In schematic discrepancies, data portals also use various practices to model multiple measures in a data cube (C3.1).LOSD experts believe that the approach to be followed should be decided by the data cube publisher.As a result, C3.1 is currently irresolvable.Regarding C3.2, data portals follow the same practice.In addition, regarding dimensions with a single value (C3.3),LOSD experts suggest always defining single value dimensions in all observations to facilitate adding more observations with different values in the same dataset.All data portals define a proprietary property for aggregated values (C3.4).LOSD experts, however, suggest defining aggregated values on top of hierarchies.Different practices are also used for modelling the values of the temporal dimension (C3.5).As already described in the previous paragraphs (C1.6), LOSD experts suggest different practices for time periods and points of time that could be followed to solve this conflict.As a result, we consider C3.5 resolvable.Finally, data portals adopt various practices for associating dimensions to their code lists (C3.6).LOSD experts suggest that if the code list is modelled as skos:ConceptScheme, qb:HierarchicalCodeList, or skos:Collection, then it should be associated with the qb:DimensionProperty using the qb:codeList property.In addition, the object that is related to the rdfs:range property should be set to skos:Concept.C3.6 is, hence, considered solvable.

Discussion
Interoperability among data cubes is crucial to unleash the full potential of LOSD.For example, it will enable performing combined analytics and visualizations on data published by different governments and offices for national statistics.Currently, although official open government data portals provide their statistical data as LOSD using standard vocabularies such as the QB vocabulary and SKOS, they adopt different practices thus hampering data interoperability.
This paper proposes an Interoperability Framework with definitions of LOSD interoperability conflicts.To this end, we combined a top-down approach (that translated interoperability conflicts of traditional databases and data warehouses to LOSD) with a bottom-up approach (that studied modelling practices of official open government data portals) and defined the two types of LOSD interoperability conflicts, namely naming and structural conflicts along with their sub-types.The sub-types of naming conflicts are homonyms and synonyms, and of structural conflicts isomorphism and schematic discrepancies.We resulted in 19 conflicts classified in the above types.Most of them (11) are structural conflicts and the rest of them (eight) naming.Two of the naming conflicts can also be classified as structural.From the structural conflicts, five are schema isomorphisms and six schema discrepancies.
We studied the conflicts to understand their current state, i.e., which of them have been already addressed and which are still open.Only two of the conflicts are considered to be solved, meaning that all data portals use the same modelling practice.The rest of them are still open.For 11 of the open conflicts, however, LOSD experts have suggested practices that could be used to resolve them.The rest of them need to be further investigated on how to resolve them.
This study focused on facilitating the creation of interoperable data cubes.However, enhancing interoperability is not the only challenge that should be considered when creating LOSD.Additional challenges include, for example, to facilitate easy access of the data cubes, to ensure high performance when accessing the data cubes, or to facilitate the interpretation of data cubes.In this context, some practices may facilitate interoperability, but at the same time, have a negative influence, for example, on the performance of data cubes.For instance, using a common practice for defining single values at the qb:Obervation level may facilitate interoperability, but also increases the number of triples and, hence, influence performance.In addition, some practices that facilitate interoperability, may need to be enhanced to also address other types of challenges.For example, regarding the modelling of aggregated values (C3.4), data portals define proprietary properties.However, it is very important to differentiate these values from other values to avoid inaccurate interpretation of data (e.g., from duplicate the values) [12].Another example regards the conflict related to the definition of the unit of measure.Our study showed that all data portals define the unit of the measure at the qb:Observation level and, hence, we considered this conflict as Solved.However, it is also suggested to additionally define the unit of measure at the "qb:DataSet" level to facilitate the retrieval of the available units in a cube [12].Finally, in some cases, there is no practice that addresses all types of challenges.As a result, a publisher may have to select between addressing a type of challenge in favor of another, e.g., facilitate interoperability but sacrifice performance.
The next step could, hence, be first to identify and define all types of challenges (e.g., performance, accessibility etc.) that should be considered when creating LOSD and, second, to define an overall evaluation framework that could be used to evaluate modelling practices of LOSD based on all types of challenges.

Figure 2 .
Figure 2. The SPARQL query used for counting the distinct number of measure properties.

Figure 3 .
Figure 3. Classification of traditional database and data warehouse conflicts.

Figure 4 .
Figure 4. Example of two database schema along with their data.

7- 8
September 2016, Guimarães, Portugal 13 3rd plenary Meeting -OpenGovIntelligence C1.3 URIs for the geospatial, temporal, gender, & age dimensions C1.5 Code list for the unit of measure C1.2 URI for the unit of the measure property C1.1 URI for measure property C1.4 URIs for hierarchical structures C1.6 Code list for the temporal dimension C1.7 Code list for the gender dimension C2.1 Measure parameters C2.2 Number of measures in a data cube C2.3 Unit parameters C2.4 Number of units per measure C2.5 Hierarchical levels per data cube C3.1 Multiple measures C3.2 Structural definition of the unit of measure C3.3 Dimensions with a single value C3.4 Aggregated values C3.5 Values of the temporal dimension C3.6 Association of dimensions to potential values Measure related Unit of measure related Dimension related Code list related

1 :
Define and use only the measure P2.1.2:Define and use both measure and measure type (qb:measureType) P2.1.3:Define and use both measure and measure type (new property) C2.2: Selecting the Number of Measures

1 :
Define the dimension at the qb:Observation level P3.3.2:Do not define a temporal dimension The above practices result in schematic discrepancies conflicts.C3.4: Modelling Aggregated Values Aggregated values are single values calculated by performing mathematical operations (such as counting or summing) on a group of other values.Aggregated values are often used to populate the dimensions of data cubes.For example, the gender dimension's values may include sdmx-code:sex-T (total) among others.It is important to semantically distinguish aggregated values from the rest of the values used in the dimension to prevent potential errors such as including the aggregated values in new mathematical calculations on top of the observations of the dataset.

)
Defines the rdfs:range as a sub-class of skos:Concept DCLG Uses the qb:codeList property to associate a qb:DimensionProperty with a skos:ConceptScheme and defines the rdfs:range of the qb:DimensionProperty as a sub-class of skos:Concept ISTAT Uses qb:codeList to associate a qb:DimensionProperty with a skos:ConceptScheme Irish CSO Defines the rdfs:range of the qb:DimensionProperty as a skos:Concept VLO Defines the rdfs:range of the qb:DimensionProperty as a skos:Concept e-Stat Uses the qb:codeList property to associate a qb:DimensionProperty with a skos:ConceptScheme and defines the rdfs:range of the qb:DimensionProperty as a sub-class of skos:Concept Table 38.Distinct modelling practices of data portals for associating the dimensions to code lists.Distinct Modelling Practice P3.6.1:Use the qb:codeList property P3.6.2:Use the qb:codeList property & define the rdfs:range of the qb:DimensionProperty as a sub-class of skos:Concept P3.6.3:Define the rdfs:range of the qb:DimensionProperty as a skos:Concept P3.6.4:Use two approaches.(1) Associate the code list using the qb:codeList property.(2) Define the rdfs:range as a sub-class of skos:Concept

Table 1 .
Characteristics of official data portals.

Table 3 .
Modelling practices of data portals for the name of the measure property.

Table 4 .
Distinct modelling practices used for naming the measure property.
C1.2: Naming the Unit of Measure Property

Table 5 .
Modelling practices of data portals for naming the unit property.

Table 6 .
Distinct modelling practices for naming the unit of measure property.Re-use sdmx-attribute:unitMeasure property P1.2.2: Define and use a proprietary attribute that is also linked with the sdmx-attribute:unitMeasure using the dcterms:relation property P1.2.3: Do not use unit of measure

Table 7 .
Modelling practices used by data portals for the URI of the geospatial, temporal, sex and age dimensions.

Table 8 .
Distinct modelling practices for naming the geospatial, temporal, sex and age dimensions.

Table 9 .
Modelling practices used by data portals for hierarchical structures.

Table 10 .
Distinct modelling practices used by data portals for hierarchical relations.

Table 12 .
Modelling practices of data portals regarding the values (code list) of the unit of measure.

Table 14 .
Modelling practices of data portals for populating the temporal dimension.

Table 15 .
Distinct modelling practices of data portals for populating the temporal dimension.

Table 16 .
Modelling practices of data portals for populating the gender dimension.

Table 17 .
Distinct modelling practices of data portals for populating the gender dimension.Structural conflicts are related to modelling the QB components in the structure of the data cubes.The two types of structural conflicts in LOSD include Schema isomorphism and Schematic discrepancies conflicts.

Table 18 .
Modelling practices of data portals for the measure and its parameters.

Table 19
presents the distinct practices the data portals.The practices may result in schema isomorphism conflicts as two conceptually similar data cubes may have different number of components.

Table 19 .
Distinct modelling practices of data portals for the measure and its parameters.

Table 20 .
Modelling practices of data portals for multiple measures.

Table 21
presents the distinct practices of data portals for modelling multiple measures.The different practices result in schema isomorphism conflicts.

Table 21 .
Distinct modelling practices of data portals for the number of measures.

Table 22 .
Publishing modelling practices of data portals for the unit of the measure.

Table 23 .
Distinct modelling practices of data portals regarding the number of measures.

Table 28 .
Modelling practices of data portals for multiple measures in data cubes.

Table 29
presents the distinct practices used by the data portals to model multiple measures in data cubes that result in schematic discrepancies conflicts.

Table 29 .
Distinct modelling practices of data portals for multiple measures in data cubes.

Table 30 .
Modelling practices of data portals regarding the structural definition of the unit of measure.

Table 31 .
Single practice of data portals regarding the structural definition of the unit of measure.Define the unit of measure at the qb:Observation level C3.3: Defining Dimensions with a Single Value in the Structure of the Data Cube

Table 32 .
Modelling practices of data portals for defining dimensions with a single value in the structure of the data cube.

Table 33
presents the distinct practices for defining dimensions with a single value in the structure of the data cube.

Table 33 .
Distinct modelling practices of data portals for defining dimensions with a single value in the structure of the data cube.

Table 34 .
Single modelling practice of data portals regarding aggregated values.

Table 35 .
Modelling practices of data portals regarding the values of the temporal dimension.

Table 36 .
Distinct modelling practices of data portals regarding the values of the temporal dimension.