Semantic Representation and Location Provenance of Cultural Heritage Information: the National Gallery Collection in London

This paper describes a working example of the semantic modelling of cultural heritage information and data from the National Gallery collection in London. The paper discusses the process of semantically representing and enriching the available cultural heritage data, and reveals the challenges of semantically expressing interrelations and groupings among the physical items, the venue and the available digital resources. The paper also highlights the challenges in creating the conceptual model of the National Gallery as a Venue, which aims to i) describe and understand the correlation between the parts of a building and the whole; ii) record and express the semantic relationships between the building components and the building as a whole; and iii) record the accurate location of objects within space and capture their provenance in terms of changes of location. The outcome of this research is the CrossCult Venue ontology, a structure fully compliant with the Conceptual Reference Model of the International Committee for Documentation (CIDOC-CRM), developed in the context of the CrossCult project. The proposed ontology attempts to model the spatial arrangements of the different types of cultural heritage venues considered in the project: from small museums to open-air archaeological sites and whole cities.


Introduction
CrossCult 1, Empowering reuse of digital cultural heritage in context-aware crosscuts of European history, is a three-year H2020 research project that started in March 2016. It brings together 11 European institutions and 14 associated partners from the fields of Computer Science, History and Cultural Heritage. The goal of the project is to spur a change in the way European citizens appraise History, fostering the re-interpretation of what they may have learnt in the light of cross-border interconnections among pieces of cultural heritage, other citizens' viewpoints and physical venues. Four distinct Pilots contribute data to the CrossCult project, covering a unique range of cultural heritage venues across Europe: from the large venue of the National Gallery in London (Pilot 1) to the considerably smaller venue of the Archaeological Museum in Tripolis, Greece (Pilot 3); from the archaeological sites of the Roman healing spas of Lugo in Spain, Chaves in Portugal and Montegrotto Terme in Italy and the Ancient Theatre of Epidaurus in Greece (Pilot 2), to the historical points of interest in the cities of Luxembourg (Luxembourg) and Valletta (Malta) (Pilot 4).
For over a decade, the field of Cultural Heritage has received significant attention in the application of Semantic Web technologies aimed at facilitating harmonised and interoperable access to heterogeneous resources [1]. A fundamental challenge in dealing with Cultural Heritage data is to make the content mutually interoperable, so that it can be searched, linked and presented in a harmonised way across the boundaries of datasets and data silos. The difficulty of finding and relating information in such a heterogeneous environment of content provision and data formats creates an obstacle for end-users of cultural content and a challenge for the organisations and communities producing it. The CrossCult project ingests a wide range of diverse data associated with Cultural Heritage objects, events and subjects that span from antiquity to modern times. Such disparate data entails a wide array of formats, technologies, and management and classification approaches specific to each data provider or source. Hence, modelling such data coherently to enable interoperability among the Pilots requires addressing the diversity of content types, data formats and levels of data detail across the four pilots. Semantic Web technologies ease access to Cultural Heritage content, facilitating new ways for the general public and experts to engage with heritage that go beyond simple interactive engagement. They provide an intelligent integration of resources via machine-readable and human-interpretable representations of a domain of knowledge (i.e., an ontology), enabling retrieval, reasoning, optimal data integration and knowledge reuse across disparate cultural heritage resources. The benefits of Semantic Web technologies to Cultural Heritage are evident in the literature, including a harmonised view of disparate and distributed content, intelligent content aggregation, semantic search, browsing and recommendation, and content enrichment and reuse [2,3]. In this respect, the Conceptual Reference Model (CRM) of the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM), CIDOC-CRM (ISO 21127:2014), provides an object-oriented schema based on real-world concepts and events, implementing data harmonisation based on the relationships between things rather than on artificial generalisations and fixed field schemas.
The model has been gaining popularity and is considered a major standard for the knowledge representation of Cultural Heritage data. Adopted by numerous small- and large-scale projects, it offers rich semantic representation and rigorous definitions, sympathetic to the data and to the different and varied perspectives of the cultural heritage community [4]. The CLAROS (Oxford University) project [5] was one of the first cases (2010) to provide interoperability over a large collection of cultural heritage data (20 million records) using CIDOC-CRM as the underlying semantic layer. Since then, prestigious CH institutions, such as the British Museum (BM) [6] and the American Numismatic Society (ANS) [7], have pursued projects that advance knowledge representation and content provision for their collections using CIDOC-CRM semantics. The BM ResearchSpace 2 is a Semantic Web platform that provides a collaborative research environment for uncovering relationships and connections between CIDOC-CRM harmonised datasets, whereas the ANS Kerameikos initiative proposes the use of CIDOC-CRM semantics for normalising classical pottery databases to facilitate large-scale data aggregation and subsequent analyses. In addition, the EU FP7 ARIADNE infrastructure, aimed at integrating existing, distributed and disparate archaeological data across Europe, has used CIDOC-CRM as the backbone of the ARIADNE Reference Model [8].
Our first step towards achieving interoperability (at the semantic level) was to adopt the CIDOC Conceptual Reference Model 3 as the core conceptual component of the CrossCult Knowledge Base (CCKB), a semantic knowledge base that stores all Pilots' data. The employment of CIDOC-CRM enabled us to integrate the disparate datasets of the four project Pilots and their metadata under a common semantic layer, driving cross-search and inference capabilities. However, CIDOC-CRM, as a formal and generic structure of concepts and relationships, is not tied to any particular vocabulary of types, terms and individuals. To address the vocabulary needs of the project and enable interoperability also at the syntactic level, we developed an additional vocabulary structure connected to CIDOC-CRM, which integrates terms from standard external glossaries and thesauri. A detailed description of the CCKB and its main components can be found in Section 2.
2 https://www.researchspace.org/ Accessed January 29, 2019.
Heritage 2019, 2 650
One of the requirements for the CCKB is to store semantic descriptions not only of the collections of the four Pilots (objects and Points of Interest), but also of the Venues themselves, building on a generic venue description structure. Metadata standards for the documentation of built heritage and archaeological complexes attempt to record the semantics of a building's components, but have often failed to describe the building completely, including the relationships between the parts and the whole. The aim of the conceptual model of Venues in the CCKB is to i) describe and understand the correlation between the parts of a building and the whole; ii) record and express the semantic relationships between the building components and the building as a whole; and iii) record the accurate location of objects within space and capture their provenance in terms of changes of location. A detailed description of the CCKB, of the conceptual model of the proposed CrossCult Venue Ontology and of its main components can be found in Section 2.
The remainder of the paper focuses on Pilot 1, whose aim is to demonstrate how the CrossCult platform can facilitate the discovery and exploration of connections between objects (paintings), subjects depicted, people (painters) and events (painting creation) across European history. In recent years, the National Gallery London (NG) has contributed to a number of collaborative documentation research and development projects. These range from examining search and the Semantic Web in EU projects such as ARTISTE 4 and SCULPTEUR 5, to general information resource building with the Andrew W. Mellon Foundation-funded Raphael Research Resource 6, and to the current H2020 7 projects IPERION CH 8 and CrossCult, which develop the potential of cultural heritage digital documentation. This research has examined and developed a variety of processes and tools to facilitate the gathering, storage, use and presentation of cultural heritage material, and has led to the work presented in this paper: using international standards to interact with large numbers of images, combining separate sources of digital information and mapping the complex semantic relationships that connect them.
During the first two years of the CrossCult project, we focused on aggregating the NG data and developing the semantic definition of the NG collection information (an example of this mapping is available in Section 4). This detailed definition allowed us to describe how we can structure and store the varied and complex relationships and connections between paintings, artists and materials, and map these relationships to the agreed project ontology. We also tried to record the actual location of paintings down to a specific point on a wall, moving beyond the simple room-level location data available in the existing NG dataset. Section 3 describes how we expanded the existing NG data to cover more detailed painting location information and the mechanism for tracking how it changes over time. Section 5 concludes the paper with a summary of the presented work.

The CrossCult Knowledge Base
The CrossCult Knowledge Base (CCKB) [9] is a multi-layered structure of semantics aimed at facilitating interoperable connections between cultural heritage data. Built on the maximum reuse of well-established technologies, it incorporates a set of standard Semantic Web technologies and formats to support the data modelling requirements and objectives of the CrossCult project. The CCKB stack (see Figure 1) illustrates the architecture of the knowledge base, where each section carries different semantics: a) the bottom section carries the semantics of the standard ontological schemas adopted in the CCKB; b) the middle section accommodates the project-specific cultural heritage semantics; c) the side section refers to the complementary CrossCult Classification Scheme (CCCS) vocabulary; and d) the top section refers to the representation of venues and users.
The four schemas of the bottom section constitute the foundation of the architecture, with CIDOC-CRM being the most prominent. The framework is complemented by the semantics of the Simple Knowledge Organization System (SKOS) 9; the Dublin Core Schema, a standard vocabulary for describing web resources; and the FOAF (Friend-Of-A-Friend) 10 ontology, which mediates the semantics between the User Ontology layer and the Upper-Level Ontology layer in describing user-related entities and their interests. The middle layer accommodates the semantics of the Upper-level ontology, which is defined as a generic conceptual structure for accommodating common concepts and relationships across a diverse range of cultural heritage data. To this end, CIDOC-CRM, as the core model of the layer, guarantees the use of well-defined and interoperable semantics, whilst allowing for project-specific specialisations that address the requirements of reflection, holistic understanding and reinterpretation of European history.
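To make the layer assignment concrete, the following minimal Python sketch records which schemas each CCKB section draws on and resolves a prefixed term to its section. The prefixes (`crm:`, `cc:`, `cccs:`, `venue:`, `user:`) and the helper `layer_of` are hypothetical illustrations, not identifiers taken from the CCKB itself.

```python
from typing import Dict, List

# Hypothetical summary of the CCKB stack: schemas per section.
# CIDOC-CRM sits in the bottom section; the middle (Upper-level) ontology
# specialises it rather than redefining it.
CCKB_LAYERS: Dict[str, List[str]] = {
    "bottom": ["CIDOC-CRM", "SKOS", "Dublin Core", "FOAF"],
    "middle": ["Upper-level ontology"],
    "side": ["CCCS"],
    "top": ["Venue ontology", "User ontology"],
}

# Illustrative namespace prefixes; not the CCKB's actual URIs.
PREFIX_TO_SCHEMA: Dict[str, str] = {
    "crm": "CIDOC-CRM",
    "skos": "SKOS",
    "dc": "Dublin Core",
    "foaf": "FOAF",
    "cc": "Upper-level ontology",
    "cccs": "CCCS",
    "venue": "Venue ontology",
    "user": "User ontology",
}


def layer_of(term: str) -> str:
    """Return the CCKB section that a prefixed term such as 'crm:E22' belongs to."""
    prefix = term.split(":", 1)[0]
    schema = PREFIX_TO_SCHEMA[prefix]  # raises KeyError for unknown prefixes
    for layer, schemas in CCKB_LAYERS.items():
        if schema in schemas:
            return layer
    raise KeyError(f"no CCKB section declares schema {schema!r}")


print(layer_of("foaf:Person"))           # bottom
print(layer_of("cccs:Canvas_painting"))  # side
```

Keeping the prefix table separate from the layer table mirrors the stack's intent: a schema can be swapped or added in one place without touching the layering.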
9 https://www.w3.org/2004/02/skos/ Accessed January 29, 2019.
On the other hand, CIDOC-CRM, as a formal and generic structure of concepts and relationships, is not tied to any particular vocabulary of types, terms and individuals. The need for an additional level of vocabulary-based semantics is addressed by the side section, which accommodates the faceted vocabulary structure of the CrossCult Classification Scheme (CCCS). The scheme provides skosified concepts to the middle and top layers of the architecture, which are linked to ontology instances via the P67.refers to or P2.has type properties. The role of CCCS is not to classify objects according to their characteristics, which is handled by the ontology, but to provide a supplementary layer of terminology (subjects, types, etc.) that can be useful during retrieval. Wherever possible, CCCS concepts are linked to external semantic definitions from standard thesauri, such as the Getty Art & Architecture Thesaurus (AAT) 11, EuroVoc 12, the UNESCO Thesaurus and the Library of Congress Subject Authorities (LC) vocabulary 13. The polyhierarchical structure of CCCS also allows concepts to be linked to multiple parents, so one concept may appear in multiple hierarchical views. The CCCS was developed using TemaTres 14, a web application for managing documentation languages, oriented to the development of hierarchical thesauri, on which several editors can work at the same time. It offers both a systematic and an alphabetical list of terms, as well as different search options, such as simple search or expanded search through related or hierarchical terms.
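The effect of a polyhierarchy can be sketched in a few lines of Python: each concept records its `skos:broader` parents, and enumerating the paths up to the top concepts yields one hierarchical view per path. The concept labels below are invented for illustration and are not taken from the CCCS.

```python
from typing import Dict, List

# Hypothetical CCCS fragment: concept -> list of skos:broader parents.
# "Portraits" has two parents, so it appears in two hierarchical views.
BROADER: Dict[str, List[str]] = {
    "Portraits": ["Paintings", "Depictions of people"],
    "Paintings": ["Visual works"],
    "Depictions of people": ["Subjects"],
    "Visual works": [],
    "Subjects": [],
}


def hierarchy_paths(concept: str) -> List[List[str]]:
    """Return every path from a top concept down to `concept`.

    Assumes the broader relation is acyclic, as a well-formed thesaurus
    hierarchy should be.
    """
    parents = BROADER.get(concept, [])
    if not parents:
        return [[concept]]  # a top concept is its own single-element path
    return [path + [concept]
            for parent in parents
            for path in hierarchy_paths(parent)]


for path in hierarchy_paths("Portraits"):
    print(" > ".join(path))
# Visual works > Paintings > Portraits
# Subjects > Depictions of people > Portraits
```

A retrieval interface could offer either path as a browsing route to the same concept, which is the practical benefit of allowing multiple parents.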
The top section of the architecture contains the Venue and User ontologies. The Venue ontology is a fully CIDOC-CRM compliant structure that aims to model the spatial arrangements of the different venues participating in the project. The User ontology is a CrossCult-centric structure aimed at supporting the user modelling requirements of the project with respect to user interests, visit experience, user background and other demographic information. The ontology combines elements from the Friend of a Friend (FOAF) and CIDOC-CRM models while introducing project-specific classes and properties to address particular user modelling requirements, such as fatigue, prior knowledge and behaviour.

CrossCult Upper-level Ontology
The CrossCult Upper-level ontology is a single, generic conceptual structure that acts as a semantic layer of common concepts and relationships across the four pilots of the project. It delivers formalisms and conceptual arrangements that enable augmentation, linking, semantic-based reasoning and retrieval across disparate data resources. The ontology builds on standard Semantic Web technologies and maintains full compatibility with CIDOC-CRM, containing the minimum set of CRM concepts as described in the specification document version 6.2.3. Aimed at maximum reuse of established Semantic Web definitions, the structure is written in OWL2 15, following the Erlangen CRM 16 (version 140220) implementation, and is complemented by SKOS, FOAF (Friend-Of-A-Friend) and Dublin Core 17 semantics. Project-specific entities that address the requirements of reflection, holistic understanding and reinterpretation of European history are also incorporated in the ontology, whilst a selected set of ontology instances is enriched with links to DBpedia concepts 18.
Figure 2 presents the core elements (classes and properties) of the Upper-level ontology and the arrangement of the semantics common to the four project pilots for modelling cultural heritage objects. At the core of the model resides the CIDOC-CRM entity E18.Physical Thing, which comprises all persistent physical items with a relatively stable form, man-made or natural. The entity enables the representation of a vast range of items of interest, such as museum exhibits, gallery paintings, artefacts, monuments and points of interest, whilst providing extensions to specialised entity definitions with targeted semantics for man-made objects, physical objects and physical features. The arrangement benefits from a range of relationships between E18.Physical Thing and a set of entities that describe the static parameters of an item, such as dimension, unique identifier, title and type. The model also allows the description of more complex objects as a composition of individual items (via P46.is_composed_of). Moreover, the well-defined semantics enable rich relationships between the physical item and entities describing it in terms of ownership, production, location and other conceptual associations. The project-specific property reflects enables direct connections between existing concepts and the CrossCult class Reflective Topic.
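As a minimal sketch of the composition pattern, the Python fragment below stores `P46.is_composed_of` links from a whole to its direct parts and collects all parts by following those links transitively. It assumes that a part of a part also counts as a part of the whole; the item identifiers are hypothetical placeholders, not National Gallery records.

```python
from typing import Dict, List, Set

# Hypothetical P46.is_composed_of links: whole -> direct parts.
P46_IS_COMPOSED_OF: Dict[str, List[str]] = {
    "altarpiece": ["main_panel", "predella"],
    "predella": ["predella_scene_1", "predella_scene_2"],
}


def all_parts(item: str) -> Set[str]:
    """Collect direct and indirect parts of `item` via P46 (depth-first)."""
    parts: Set[str] = set()
    stack = list(P46_IS_COMPOSED_OF.get(item, []))
    while stack:
        part = stack.pop()
        if part not in parts:  # guard against revisiting shared parts
            parts.add(part)
            stack.extend(P46_IS_COMPOSED_OF.get(part, []))
    return parts


print(sorted(all_parts("altarpiece")))
# ['main_panel', 'predella', 'predella_scene_1', 'predella_scene_2']
```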
A fully-fledged example of the Upper-level ontology is shown in Figure 3, which presents a detailed modelling view of the National Gallery painting ID NG6576 (Eustache Le Sueur, Alexander and his doctor, about 1648-9). The painting is modelled as an instance of E22.Man-Made Object, uniquely identified by a National Gallery (UK) reference and associated with a skosified type (e.g., Canvas painting). The modelling of typical information about the painting, such as its size, material, medium and support, date of production and ownership, does not differ from the approach proposed by the official CIDOC-CRM tutorial and evident in well-known projects such as the British Museum's ResearchSpace. A unique element of the CrossCult Upper-level ontology is the semantics of the Reflective Topic entity, which encompasses all those connections that can be made to create a network of points of view, to aid reflection and prospective interpretation of a topic, and to enable interconnections between physical or conceptual things of man-made or natural origin. A broader reflective topic can be composed of more specific (narrower) topics, in the same way as an E89.Propositional Object can be composed of other propositional objects, using the P148.has_component property. The core CRM classes of the model are shown in blue and the skosified entities in pink, whereas the ontology individuals are represented in boxes, with DBpedia links shown in bright yellow.
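The NG6576 arrangement can be illustrated as a handful of subject-predicate-object triples. The sketch below uses plain Python tuples instead of an RDF library and loosely follows Erlangen CRM naming; the `ng:`, `cc:` and `cccs:` prefixes, the intermediate node names and the two reflective topics are hypothetical, so it shows the shape of the pattern rather than the actual CCKB data. It follows the production path (P108i, then P14) to the painter, and the P148 link from a narrower reflective topic to its broader one.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples for NG6576; prefixes and node names are illustrative.
TRIPLES: List[Triple] = [
    ("ng:NG6576", "rdf:type", "crm:E22_Man-Made_Object"),
    ("ng:NG6576", "crm:P1_is_identified_by", "ng:NG6576_identifier"),
    ("ng:NG6576", "crm:P2_has_type", "cccs:Canvas_painting"),
    ("ng:NG6576", "crm:P102_has_title", "ng:NG6576_title"),
    ("ng:NG6576_title", "rdfs:label", "Alexander and his doctor"),
    ("ng:NG6576", "crm:P108i_was_produced_by", "ng:NG6576_production"),
    ("ng:NG6576_production", "crm:P14_carried_out_by", "ng:Eustache_Le_Sueur"),
    # Project-specific link from the painting to a (hypothetical) topic.
    ("ng:NG6576", "cc:reflects", "cc:Topic_narrow"),
    # A broader reflective topic composed of the narrower one (P148).
    ("cc:Topic_broad", "crm:P148_has_component", "cc:Topic_narrow"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """All o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> List[str]:
    """All s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def painters_of(painting: str) -> List[str]:
    """Follow P108i (was produced by) and then P14 (carried out by)."""
    return [actor
            for production in objects(painting, "crm:P108i_was_produced_by")
            for actor in objects(production, "crm:P14_carried_out_by")]


def broader_topics_reflected(painting: str) -> List[str]:
    """Broader topics having, via P148, a topic that the painting reflects."""
    return [broad
            for topic in objects(painting, "cc:reflects")
            for broad in subjects("crm:P148_has_component", topic)]


print(painters_of("ng:NG6576"))               # ['ng:Eustache_Le_Sueur']
print(broader_topics_reflected("ng:NG6576"))  # ['cc:Topic_broad']
```

In a real deployment the same traversals would typically be expressed as SPARQL property paths over the knowledge base; the tuple list only keeps the example self-contained.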

CrossCult Venue Ontology
The CrossCult Venue ontology is a fully CIDOC-CRM compliant structure, which aims to provide a simple, generic model of the spatial arrangements of the different venues that participate in the four project pilots, and to capture the provenance of the movement of Points of Interest (POIs). The venues of the four pilots can be broadly clustered as indoor and outdoor "exhibitions" of POIs with similar characteristics: (i) Pilot 1, an indoor gallery with a large multi-thematic collection spread over 66 rooms and 2 floors; (ii) Pilot 2, four open-air archaeological sites whose locations and POIs have changed over various historical periods, starting from the Classical period and Roman times; (iii) Pilot 3, a small museum with dense displays of archaeological exhibits confined to a small number of rooms; and (iv) Pilot 4, two whole cities with dispersed POIs located on façades of buildings, near bridges, at crossroads, near statues, on top of columns, etc. A POI in CrossCult is any physical thing (place or object), either immobile or portable, which is of historical, social or cultural interest, e.g., a painting at the National Gallery, the Asklepieion at Epidaurus or the statue of "The Tall Banker" in Luxembourg.
Although the different venues serve quite different purposes, they share similarities that allow the construction of a common model describing their spatial arrangements.
The semantic representation of a city's structure, conceptualised as an outdoor exhibition, has similar characteristics to the indoor gallery and the small museum. It is composed of sections filled with other elements: for example, buildings are composed of walls, floors and ceilings, which have dimensions and materiality, and of windows and doorways, which are completely void spaces. In all venues, the POIs, whether within a building or outdoors, are also characterised by events: POIs are moved from one location to another to serve, for example, the needs of exhibitions. They are also moved to receive treatment, or when objects are rehung or their display is changed at a specific part of the building's structure. Finally, POIs move as the city's structure changes or as a result of constant alterations over time. Historic buildings and archaeological venues are, in most cases, the result of a series of additions and removals of matter due to construction and destruction activities that modified their appearance over the various historical periods. Identifying these processes, together with analysing the different building techniques and materials used over a structure's existence, gives historians an understanding of the continuity and discontinuity of matter and activities on a built structure. All these strands of information can be used to produce a detailed understanding of the historical development of any building, whether standing or in ruins, and to identify significant phases in the monument's appearance throughout the centuries.
The process of building the Venue ontology involved, first, developing the appropriate underlying conceptual model to support the requirements of the four venues and, second, populating the model with sufficient detail to realise its full potential. We kept the resulting model as generic as possible and then populated it with examples. The data for populating the ontology came from a variety of sources and differed in their underlying structures, accuracy and the level of detail with which places were represented. Therefore, as more data was included in the process, the model was further specialised to meet the specific needs of each Venue.
The proposed CrossCult Venue ontology attempts to address these emerging data modelling requirements and has been inspired by CIDOC-CRMba, an extension of CIDOC CRM that has been proposed for approval by the CIDOC CRM-SIG to support buildings archaeology documentation 19. We chose CIDOC CRM as the integrating framework as a sensible first step on the road to interoperability. From the modelling process outlined above, we concluded that the resulting Venue ontology covers the basic needs and characteristics of the four pilot venues in terms of their spatial arrangements. Should we need to scope the needs of all our indoor and outdoor venues in more detail and cater for additional functionalities (for example, modelling the spatial semantics of the alterations that modified the appearance of buildings over various historical periods), the Venue ontology can be enhanced with additional classes and properties from CIDOC-CRMba 20. CIDOC-CRMba incorporates parts of CRMgeo, a detailed model of generic spatio-temporal topology and geometric description [10]; parts of CRMsci, a model for scientific observation, measurements and processed data in descriptive and empirical sciences (such as biology, geology, geography and cultural heritage conservation); and CRMarcheo, a model developed for the documentation of archaeological excavations.
To address the data modelling requirements discussed in the paragraphs above, we defined the Venue ontology as a subset of CIDOC-CRM. Like the Upper-level ontology, the structure maintains full compatibility with CIDOC-CRM and contains the minimum set of CRM concepts, as described in the latest specification document, version 6.2.3. Figure 4 depicts its graphical representation. The major components of the Venue ontology are E18.Physical thing and its subclasses E19.Physical object, E26.Physical feature and E24.Physical man-made thing, which are used to model physical objects and features as well as man-made structures. Instances of Physical thing and Physical man-made thing include, for example, a 'Building', a 'Room', a 'Floor' or a 'Wall', and they can be combined to form more complex structures. These classes are further related to other ontology classes to model the dimensions, conditions or events of physical and man-made structures. The class E55.Type has also been employed to differentiate between the functions of rooms in a museum, such as 'Gallery', 'Cafe' or 'Temporary exhibition room'.
19 http://icom.museum/resources/publications-database/publication/definition-of-the-crmba-an-extension-of-cidoccrm-to-support-buildings-archaeology-documentation/print/1/ Accessed January 29, 2019.
20 http://www.cidoc-crm.org/crmba/sites/default/files/2016-12-3%23CRMba_v1.4.1_UR.pdf Accessed January 29, 2019.
Heritage 2019, 2 656
Complementary to E19.Physical object and E24.Physical man-made thing is the E53.Place class, which is used to model the different types of venue spaces. Place instances can be combined to form complex spaces, while spatial coordinates and appellations are used to model the details of such spaces.
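As an illustration of how these classes combine, the following sketch encodes a small part of a venue as CIDOC-CRM-style triples in plain Python (no RDF library) and walks the P46 is composed of hierarchy from the building down to a wall. The instance identifiers and the `ng:` prefix are hypothetical, and the use of P46, P2, P53 and P89 is our assumption about how the composition, typing and nesting of places described above could be expressed; it is not taken from the published ontology file.

```python
# Minimal sketch: a venue as CIDOC-CRM-style (subject, predicate, object) triples.
# Instance IDs and the "ng:" prefix are hypothetical; class/property names follow CIDOC-CRM.
from collections import defaultdict

triples = {
    # Building components as E24 Physical Man-Made Thing instances
    ("ng:NationalGallery", "rdf:type", "crm:E24_Physical_Man-Made_Thing"),
    ("ng:Floor1", "rdf:type", "crm:E24_Physical_Man-Made_Thing"),
    ("ng:Room30", "rdf:type", "crm:E24_Physical_Man-Made_Thing"),
    ("ng:Room30_NorthWall", "rdf:type", "crm:E24_Physical_Man-Made_Thing"),
    # P46 is composed of: part-whole links from the building down to a wall
    ("ng:NationalGallery", "crm:P46_is_composed_of", "ng:Floor1"),
    ("ng:Floor1", "crm:P46_is_composed_of", "ng:Room30"),
    ("ng:Room30", "crm:P46_is_composed_of", "ng:Room30_NorthWall"),
    # P2 has type: an E55 Type records the function of the room
    ("ng:type/Gallery", "rdf:type", "crm:E55_Type"),
    ("ng:Room30", "crm:P2_has_type", "ng:type/Gallery"),
    # P53 has former or current location: the room occupies an E53 Place,
    # and P89 falls within nests that place inside the building's place
    ("ng:Room30", "crm:P53_has_former_or_current_location", "ng:place/Room30"),
    ("ng:place/Room30", "rdf:type", "crm:E53_Place"),
    ("ng:place/Room30", "crm:P89_falls_within", "ng:place/NationalGallery"),
}


def parts_of(whole, graph):
    """Return every component reachable from `whole` through P46 (transitive closure)."""
    children = defaultdict(list)
    for s, p, o in graph:
        if p == "crm:P46_is_composed_of":
            children[s].append(o)
    found, stack = [], [whole]
    while stack:
        node = stack.pop()
        for child in sorted(children[node]):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


def things_of_type(type_id, graph):
    """Return the subjects whose P2 has type points at `type_id`."""
    return sorted(s for s, p, o in graph if p == "crm:P2_has_type" and o == type_id)


print(parts_of("ng:NationalGallery", triples))
# → ['ng:Floor1', 'ng:Room30', 'ng:Room30_NorthWall']
```

Keeping part-whole links explicit means a query for "everything inside the building" needs no extra data, while E55 types let the same E24 instances be filtered by function.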
We use the E9.Move class to describe changes in the physical location of instances of E19.Physical object, for example the movement of a painting from one room to another. This class inherits the property P7.took_place_at (witnessed), which has range E53.Place. We use this property to describe the wider area within which a move takes place, whereas the properties P27.moved_from (was_origin_of) and P26.moved_to (was_destination_of) describe only its start and end points. For example, (E9) "Movement of the painting" moved the (E19) "Painting"; (E53) "East wall location" is the origin of the (E9) "Movement of the painting" and (E53) "West wall location" is its destination; and the (E9) "Movement of the painting" took place at (E53) "the location of Room 9". In some cases, we can also use P8.took_place_on_or_within (witnessed), which has range E19.Physical object. This property is in effect a special case of P7.took_place_at, and we can use it to describe, for example, a movement that can be located with respect to the space defined by an E19.Physical object such as a 'Building', a 'Room' or a 'Wall'.
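The worked example above can also be expressed as data. The sketch below is a hypothetical Python representation, not an official CIDOC-CRM serialisation: each E9.Move stores its P25 moved, P27 moved from, P26 moved to and P7 took place at values, and an object's location provenance is derived by ordering its moves in time. The `when` attribute stands in for a time-span (in CIDOC-CRM this would be attached through P4 has time-span); the dates and the second move are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Move:
    """One E9 Move event (hypothetical Python shape, not an official CRM serialisation)."""
    move_id: str
    moved: str          # P25 moved -> E19 Physical Object
    moved_from: str     # P27 moved from -> E53 Place (origin)
    moved_to: str       # P26 moved to -> E53 Place (destination)
    took_place_at: str  # P7 took place at -> E53 Place (the wider area)
    when: date          # stand-in for a P4 has time-span value

    def triples(self):
        """Express the event as CIDOC-CRM-style triples."""
        return [
            (self.move_id, "rdf:type", "crm:E9_Move"),
            (self.move_id, "crm:P25_moved", self.moved),
            (self.move_id, "crm:P27_moved_from", self.moved_from),
            (self.move_id, "crm:P26_moved_to", self.moved_to),
            (self.move_id, "crm:P7_took_place_at", self.took_place_at),
        ]


def location_history(obj, moves):
    """Chronological list of places occupied by `obj`, derived from its recorded moves.

    If a move starts somewhere other than where the previous one ended, its origin is
    appended as well, so an unrecorded move shows up as a jump in the list.
    """
    history = []
    for m in sorted((m for m in moves if m.moved == obj), key=lambda m: m.when):
        if not history or history[-1] != m.moved_from:
            history.append(m.moved_from)
        history.append(m.moved_to)
    return history


moves = [
    Move("move-1", "Painting", "East wall location", "West wall location",
         "the location of Room 9", date(2018, 3, 1)),
    Move("move-2", "Painting", "West wall location", "Conservation studio (hypothetical)",
         "the location of the building (hypothetical)", date(2018, 9, 15)),
]
print(location_history("Painting", moves))
# → ['East wall location', 'West wall location', 'Conservation studio (hypothetical)']
```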

Internal Data Aggregation
Memory institutions have been working to enrich their cultural resources, either by converting them into digital objects or by collecting born-digital ones. Various types of metadata (data about data) are created for these resources, such as bibliographic information, technical and structural features, and preservation information. A characteristic feature of metadata is that it "can be embedded in the body of the digital resource, may be a first-class object as well as a primary resource and may be linked to each other in order to produce a richer environment for users to access the resources over the internet" [11]. We consider metadata as a secondary resource created from a primary resource (a painting, a book, a music performance).
The National Gallery (NG) uses a range of systems to hold and manage information about its primary resources. Most forms of documentation within the NG make direct use of, or reference, these resources, particularly images and metadata. For the CrossCult project, the NG needed to provide dynamic access to a full set of painting images and their core (tombstone) data, retrieved from the collection information held in the NG collection management system (CMS), TMS (The Museum System™).
The existing data consisted of three datasets: the painting details dataset, the artist details dataset and the image details dataset. The painting details dataset includes:
• Painting inventory ID: the accession number, a unique painting ID.
• Painting date(s): relevant dates for the painting, including the date of production, dates of exhibitions and modifications, etc.
• Artist(s): the name of the artist or artists involved in the production of the painting. This also includes details relating to unknown groups of related artists, such as "Workshop of . . ." or "Follower of . . .".
• Group: indicates whether a given painting is part of a defined group of paintings. The paintings in such groups are normally directly physically related rather than merely of a similar type: for example, paintings that used to be part of the same altarpiece, paintings that were all created as part of one installation, or double-sided paintings.
• Painting title: the full title of a painting. Alternative titles may also be available, as well as a shortened version of the title.
• Medium and support: short terms describing the key materials used to create a given painting, for example "Oil on Canvas".
• Painting dimensions: the physical height and width of a given painting in centimetres.
• Credit line: where available, details of specific acquisition credits, including the name and date of a given bequest. This can include details of more than one event and date.
• Public locations: the name or number of the specific Gallery in which the painting is held. All paintings that are not on display are given the generic location "Not on display".
• Inscription summary: textual details describing the presence and location of any specific marks, signatures, dates or more general inscriptions noted on a given painting.
• Classifications and keywords: general type and grouping classification terms, along with more general subject-matter keywords.
• Additional painting details: a Description, i.e. a short textual description of the painting and its history, drawn from the National Gallery public website content management system.
The artist details dataset includes:
• Unique artist ID.
• Artist name: where possible, including known variations and translations of the name.
• Artist date(s): generally the dates of birth and death of an artist, but possibly dates relating to when they were known to be alive or active, or when their work was documented.
• A short biography of the artist, where available.
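To make the two record types above concrete, the following sketch models them as Python dataclasses. The field names are our own hypothetical choices (the actual TMS export format is not described here); only the fields themselves and the "Not on display" convention come from the lists above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NOT_ON_DISPLAY = "Not on display"  # generic location for paintings not on display


@dataclass
class Artist:
    artist_id: str                       # unique artist ID
    names: List[str]                     # preferred name first, then variants/translations
    dates: Optional[str] = None          # birth/death, or 'active'/'documented' dates
    biography: Optional[str] = None      # short biography, where available


@dataclass
class Painting:
    accession_number: str                # painting inventory ID, e.g. "NG6566"
    title: str                           # full title
    short_title: Optional[str] = None
    alternative_titles: List[str] = field(default_factory=list)
    artist_ids: List[str] = field(default_factory=list)  # may include "Workshop of ..." entries
    dates: Dict[str, str] = field(default_factory=dict)  # e.g. {"production": "..."}
    group: Optional[str] = None          # physically related group, e.g. an altarpiece
    medium_and_support: Optional[str] = None
    height_cm: Optional[float] = None
    width_cm: Optional[float] = None
    credit_line: Optional[str] = None
    public_location: str = NOT_ON_DISPLAY
    inscription_summary: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    description: Optional[str] = None

    @property
    def on_display(self) -> bool:
        return self.public_location != NOT_ON_DISPLAY


def artists_for(painting: Painting, artists: Dict[str, Artist]) -> List[str]:
    """Resolve a painting's artist IDs to preferred names, skipping unknown IDs."""
    return [artists[a].names[0] for a in painting.artist_ids if a in artists]
```

Keeping artists as a separate keyed collection mirrors the separation of the two datasets and lets one artist record be shared by many paintings.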
The image details dataset provided to the CrossCult system included a full set of 800-pixel images of almost all of the paintings acquired by the NG (~2300), drawn from an internal bespoke digital asset management system and presented via an IIIF 21 compliant IIP Image server 22.
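For readers unfamiliar with IIIF, the sketch below shows how a client could request one of these images through the IIIF Image API (version 2.x URI syntax: `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`). The server base URL and the image identifier are hypothetical, and we assume that "800 pixel" refers to the longest edge, which the size parameter `!800,800` expresses as a best fit; `800,` would instead fix the width at 800 pixels.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, size="!800,800", region="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x image request URL.

    The identifier is percent-encoded, including any '/', as the specification requires.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the image information document (info.json) describing sizes and tiles."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/iiif/"
print(iiif_image_url(BASE, "ng/NG6566"))
# → https://iiif.example.org/iiif/ng%2FNG6566/full/!800,800/0/default.jpg
```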
In order to dynamically re-use all of these resources, an internal Application Programming Interface (API) was developed to present a single, aggregated view of all of the available data and to allow direct access to structured, linkable data describing the NG Collection (see Figure 5). As a second step, in order to interlink the NG digital information and to share its data unambiguously with external users, the NG established a unique persistent identifier (PID) for every entity referred to by its digital information. A persistent identifier (PI or PID) is a long-lasting generic reference to an image, document, file, web page, or digital description of any physical thing or concept that one might want to describe or discuss. Many such things already have IDs within existing local databases or catalogue systems; the purpose of a PID system is to provide unique generic identifiers that can be used and reused across multiple systems, particularly when publishing information that can be accessed over the Internet. Finally, a subset of the available NG data has been provided in the form of a basic JSON 23 array (see Table 1) and shared externally, using the PIDs, through the NG public beta API 24. Work currently underway aims to fully map the data to the CIDOC CRM and provide a standard semantic representation of the data (an example of this mapping is available in Section 4).
{
  "type": "location",
  "pid": "006-001M-0000",
  "name": "Room 30",
  "title": "Spain",
  "description": "<p>Spanish painting flourished during the 17th century principally in the service of God and King....",
  "objects": {
    "000-00A8-0000": {"pid": "000-00A8-0000", "no": "NG6566"},
    ...
  },
  "example_object": "000-00A8-0000",
  "artists": {"001-01WB-0000": "Italian, Neapolitan", "001-03FC-0000": "Jusepe de Ribera", ...},
  "date_range": {"begin": "1618-01-01", "end": "1684-12-31"},
  "contains": [],
  "keywords": {"00A-0001-0000": "Religion", "00A-0002-0000": "Christianity", ...},
  "license": "https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/",
  "attribution": "This data is licensed ... "
}
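The JSON location record for Room 30 can be consumed with any JSON parser. The sketch below parses a trimmed copy of it (the elided entries are simply left out, not reconstructed), summarises the room, and builds a PID-to-label index so that every entity mentioned in the record can be looked up by its persistent identifier. The function names are hypothetical, and the code does not call the NG public beta API.

```python
import json
from datetime import date

# Trimmed copy of the Room 30 record; elided entries ("...") are omitted, not invented.
RAW = r"""{
  "type": "location", "pid": "006-001M-0000", "name": "Room 30", "title": "Spain",
  "objects": {"000-00A8-0000": {"pid": "000-00A8-0000", "no": "NG6566"}},
  "example_object": "000-00A8-0000",
  "artists": {"001-01WB-0000": "Italian, Neapolitan", "001-03FC-0000": "Jusepe de Ribera"},
  "date_range": {"begin": "1618-01-01", "end": "1684-12-31"},
  "contains": [],
  "keywords": {"00A-0001-0000": "Religion", "00A-0002-0000": "Christianity"},
  "license": "https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"
}"""


def summarise_location(rec):
    """Pull the room name, accession numbers, artists, keywords and year span."""
    begin = date.fromisoformat(rec["date_range"]["begin"])
    end = date.fromisoformat(rec["date_range"]["end"])
    return {
        "room": rec["name"],
        "accession_numbers": sorted(o["no"] for o in rec["objects"].values()),
        "artists": sorted(rec["artists"].values()),
        "keywords": sorted(rec["keywords"].values()),
        "years": (begin.year, end.year),
    }


def pid_index(rec):
    """Map every PID in the record to a human-readable label."""
    index = {rec["pid"]: rec["name"]}
    index.update({pid: o["no"] for pid, o in rec["objects"].items()})
    index.update(rec["artists"])
    index.update(rec["keywords"])
    return index


record = json.loads(RAW)  # JSON decodes the escaped "\/" back to "/"
print(summarise_location(record))
```

Because every entity carries its own PID, the same index can be merged across several location records without name clashes.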

Extending the National Gallery Core Data: Keywords
Because they comply with international IT standards, the Getty vocabularies 25 were chosen by the CrossCult project as the main building block of CCCS, the CrossCult vocabulary: they provide authoritative information for cataloguers, researchers and data providers, and they contain structured terminology for art, architecture, decorative arts, archival materials, visual surrogates, conservation and bibliographic materials. These multilingual, semantically structured thesauri can be powerful tools for enriching knowledge and providing meaningful links between cultural heritage information resources. As Linked Open Data, the Getty vocabularies are expressed as structured, openly reusable, machine-readable data that information systems can interpret and use to create semantically relevant relationships across other linked datasets [12].
For the needs of Pilot 1, we selected a flat list of approximately 500 keywords, identified and aggregated from available internal NG datasets and other resources, such as the keywords of the NG picture library 26. The flat list of NG keywords was first cleansed, then verified against and linked to the external semantic definitions of the Getty authority vocabularies, such as the Getty Art & Architecture Thesaurus (AAT), the Getty Thesaurus of Geographic Names (TGN), the Union List of Artist Names (ULAN) and the Cultural Objects Name Authority (CONA), as well as the Conservation & Art Materials Encyclopaedia (CAMEO). In a second stage, we incorporated the flat list into the CrossCult Classification Scheme (CCCS). The reuse of standardised resources ensured the validity of the CCCS structure and consistency in the use of its terms. The connections between the CCCS and the CrossCult Upper-level ontology introduced an augmented view of cultural heritage information, making it easier to express interrelations and groupings among physical items, venues, digital resources and concepts.
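The cleansing and linking steps described above might look like the following sketch: keywords are normalised (Unicode NFC, whitespace collapsed, trimmed, case-folded), deduplicated, and then linked with `skos:exactMatch` to external concept URIs. The verification against the Getty vocabularies is represented here by a `curated` dictionary supplied by a cataloguer; the `cccs:` prefix and the `EXAMPLE_ID` URIs are hypothetical placeholders, not real Getty identifiers.

```python
import re
import unicodedata


def normalise(term):
    """Cleanse a keyword: Unicode NFC, collapse whitespace, trim, case-fold."""
    term = unicodedata.normalize("NFC", term)
    return re.sub(r"\s+", " ", term).strip().casefold()


def link_keywords(flat_list, curated):
    """Deduplicate a flat keyword list and emit skos:exactMatch links.

    `curated` maps a normalised keyword to an external concept URI; keywords without
    a verified match are returned separately for review.
    """
    links, unmatched, seen = [], [], set()
    for raw in flat_list:
        key = normalise(raw)
        if not key or key in seen:
            continue  # skip empty entries and duplicates
        seen.add(key)
        uri = curated.get(key)
        if uri:
            links.append(("cccs:" + key.replace(" ", "_"), "skos:exactMatch", uri))
        else:
            unmatched.append(raw.strip())
    return links, unmatched


curated = {  # placeholder URIs, not real AAT identifiers
    "religion": "http://vocab.getty.edu/aat/EXAMPLE_ID_1",
    "christianity": "http://vocab.getty.edu/aat/EXAMPLE_ID_2",
}
links, todo = link_keywords(["Religion", "  religion ", "Christianity", "Unmapped term"], curated)
print(links)
print(todo)  # keywords still awaiting a verified match
```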

Extending National Gallery Painting Location Data
At the NG, paintings move from one location to another to serve the needs of exhibitions, to receive treatment, or to allow rehanging and changes to the objects displayed in a specific location within the building. Although the record of an object's location is part of its provenance, the time available for recording detailed location-based information is limited. Our intention during the CrossCult project was to develop administrative tools that would allow the basic room data to be augmented with specific wall- and position-based information, in order to record changes in location and capture the provenance of movements.
The location data available at the outset came from the NG architectural drawings (see Figure 6); the process began by manually extracting the dimensions of the rooms (height and width), the walls (height and width) and, when available, the room's door(s). A relational MySQL database was set up with a series of tables, populated with the existing location data and designed to store the newly generated location data (see Figure 7). At the core of the structure is the ng-location painting_position table, which captures temporal information related to each movement of an object, such as the painting_position_date. Supplementary tables are ng_location painting, which holds the painting dimensions, and ng-location wall_object, which holds location data related to wall objects such as skirting boards, cornicing and doors.
25 http://www.getty.edu/research/tools/vocabularies/ Accessed January 29, 2019.
26 https://www.nationalgalleryimages.co.uk/ Accessed January 29, 2019.
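A minimal sketch of the position-history idea follows, using Python's built-in sqlite3 as a stand-in for MySQL. Only the names painting, painting_position and painting_position_date come from the text; the other columns (wall_id, x_cm, y_cm), the coordinate convention and the painting ID are hypothetical, and the wall_object table is omitted. Because each move adds a new row rather than overwriting the previous one, the table keeps the full movement provenance, and the current position is simply the latest row for a painting.

```python
import sqlite3

SCHEMA = """
CREATE TABLE painting (
    painting_id TEXT PRIMARY KEY,
    height_cm   REAL NOT NULL,
    width_cm    REAL NOT NULL
);
CREATE TABLE painting_position (
    position_id INTEGER PRIMARY KEY,
    painting_id TEXT NOT NULL REFERENCES painting(painting_id),
    wall_id     TEXT NOT NULL,
    x_cm        REAL NOT NULL,  -- left edge, measured from the wall's left end (hypothetical)
    y_cm        REAL NOT NULL,  -- bottom edge, measured from the floor (hypothetical)
    painting_position_date TEXT NOT NULL  -- ISO 8601 date, so text order = time order
);
"""


def record_position(conn, painting_id, wall_id, x_cm, y_cm, when):
    """Append a new position; earlier rows are kept as provenance."""
    conn.execute(
        "INSERT INTO painting_position (painting_id, wall_id, x_cm, y_cm, painting_position_date)"
        " VALUES (?, ?, ?, ?, ?)",
        (painting_id, wall_id, x_cm, y_cm, when),
    )


def current_position(conn, painting_id):
    """Latest (wall_id, x_cm, y_cm, date); ties on date resolved by insertion order."""
    return conn.execute(
        "SELECT wall_id, x_cm, y_cm, painting_position_date FROM painting_position"
        " WHERE painting_id = ?"
        " ORDER BY painting_position_date DESC, position_id DESC LIMIT 1",
        (painting_id,),
    ).fetchone()


def position_history(conn, painting_id):
    """All recorded walls for a painting, oldest first."""
    rows = conn.execute(
        "SELECT wall_id FROM painting_position WHERE painting_id = ?"
        " ORDER BY painting_position_date, position_id",
        (painting_id,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO painting VALUES (?, ?, ?)", ("NG0000", 120.0, 90.0))  # hypothetical
record_position(conn, "NG0000", "room30-wall-east", 150.0, 110.0, "2018-01-10")
record_position(conn, "NG0000", "room30-wall-west", 200.0, 110.0, "2018-06-01")
print(current_position(conn, "NG0000"))
# → ('room30-wall-west', 200.0, 110.0, '2018-06-01')
```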
During the project, we developed a game application that allows the end user to move paintings on a given wall within a virtual space and record their positions; it can be used to quickly capture a more precise location for paintings. "Moving Paintings" was created as a standalone administrative tool to help museum staff accurately record the positions of paintings when they are moved or repositioned. In order for the application to use NG resources (images, room information, etc.), a specific data structure was needed. The system uses an XML 27 structure, automatically generated from internal NG data, in which metadata about rooms (and their contents), walls (and their dimensions), paintings (and their images), artists and categories are stored. In the current prototype [13], the application consumes the XML structure of a room and its paintings, and downloads the images of these paintings to integrate them into the application. At the same time, the data from the game can be used to populate and update the XML structure and be added to the location database.
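The XML structure itself is not reproduced in this paper, so the sketch below only illustrates the round trip with a hypothetical layout (the element and attribute names are our assumptions, not the actual Moving Paintings schema): a room is serialised with its walls and paintings, a new position captured in the game is written back into the XML, and a simple check confirms that each painting still lies within its wall's dimensions. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def room_to_xml(room):
    """Serialise a room dict (walls with dimensions, paintings with positions) to XML text."""
    root = ET.Element("room", id=room["id"], name=room["name"])
    for wall in room["walls"]:
        w = ET.SubElement(root, "wall", id=wall["id"],
                          width=str(wall["width_cm"]), height=str(wall["height_cm"]))
        for p in wall["paintings"]:
            ET.SubElement(w, "painting", id=p["id"], image=p["image"],
                          width=str(p["width_cm"]), height=str(p["height_cm"]),
                          x=str(p["x_cm"]), y=str(p["y_cm"]))
    return ET.tostring(root, encoding="unicode")


def move_painting(xml_text, painting_id, x_cm, y_cm):
    """Write a new position (as captured by the game) back into the XML."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//painting[@id='{painting_id}']")  # IDs assumed to contain no quotes
    if node is None:
        raise KeyError(painting_id)
    node.set("x", str(x_cm))
    node.set("y", str(y_cm))
    return ET.tostring(root, encoding="unicode")


def paintings_off_wall(xml_text):
    """Return IDs of paintings that do not lie fully within their wall."""
    bad = []
    for wall in ET.fromstring(xml_text).iter("wall"):
        ww, wh = float(wall.get("width")), float(wall.get("height"))
        for p in wall.iter("painting"):
            x, y = float(p.get("x")), float(p.get("y"))
            if x < 0 or y < 0 or x + float(p.get("width")) > ww or y + float(p.get("height")) > wh:
                bad.append(p.get("id"))
    return bad


room = {"id": "room-30", "name": "Room 30", "walls": [
    {"id": "wall-1", "width_cm": 800, "height_cm": 500, "paintings": [
        {"id": "p1", "image": "p1.jpg", "width_cm": 100, "height_cm": 150,
         "x_cm": 50, "y_cm": 120}]}]}
xml_text = room_to_xml(room)
print(paintings_off_wall(xml_text))                                  # fits: []
print(paintings_off_wall(move_painting(xml_text, "p1", 750, 120)))   # 750 + 100 > 800: ['p1']
```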

3 http://www.cidoc-crm.org/ Accessed January 29, 2019.

Figure 1. The architecture of the CrossCult Knowledge Base. CCCS: CrossCult Classification Scheme; CIDOC-CRM: International Committee for Documentation Conceptual Reference Model; AAT: Getty Art & Architecture Thesaurus; FOAF: Friend-Of-A-Friend.

Figure 2. Core Elements of the Upper-level Ontology.

Figure 3. A detailed example of the CrossCult Upper-level Ontology and the relationships of ontology individuals. CCCS: CrossCult Classification Scheme.

Figure 4. The conceptual model of the CrossCult Venue ontology, demonstrating the documentation of the movement of paintings and the recording of former or current locations.

Figure 5. Simplified diagram of the major sources of digital information aggregated within the National Gallery to create the Application Programming Interface (API) used within CrossCult and available to other possible users.PID: persistent identifier.

Figure 6. The architectural drawing with the available National Gallery London (NG) location data (wall and room dimensions) of Room 30.
