Towards an Online Database for Archaeological Landscapes. Using the Web Based, Open Source Software OpenAtlas for the Acquisition, Analysis and Dissemination of Archaeological and Historical Data on a Landscape Basis

In this paper, we present the web-based, open source software OpenAtlas, which uses the International Council of Museums' Conceptual Reference Model (CIDOC CRM), and its potential for the acquisition, analysis and dissemination of a wide range of archaeological and historical data on a landscape basis. To this end, we first introduce the ongoing research project The Anthropological and Archaeological Database of Sepultures (THANADOS), built upon OpenAtlas, as well as its data model and interactive web interface/presentation frontend. Subsequently, the article discusses the possible extension of this database of early medieval cemeteries to integrate further archaeological structures (e.g., medieval settlements, fortifications, field systems and traffic routes) and other data, such as historical maps, aerial photographs and airborne laser scanning data. Finally, the paper concludes with the general added value of such a collaborative and web-based approach for future research projects.


Introduction
Archaeological research has seen a constantly increasing integration of new methods, tools and techniques, such as web GIS, online databases and large-scale archaeological prospection in recent years [1][2][3][4][5][6][7][8][9][10]. This has led to countless new discoveries and deeper insights into known archaeological sites, including the consideration of the surrounding landscape on the basis of ever-growing areas and datasets [11][12][13][14][15]. However, this rapid increase in mainly digital research data also makes structured data management and data exchange increasingly urgent. In this respect, web-based applications that are operated via conventional web browsers have several clear advantages. On the one hand, they allow researchers to access and edit their research data anytime and anywhere; on the other hand, they offer a simple and intuitive way to exchange the data within a larger research community or even to present them to a broader public. In order to prepare these data sustainably and properly for unforeseen future applications, however, a clear data structure based on common, internationally recognized data standards is essential.
To ensure this, the online database system OpenAtlas [16] has been developed over the last few years and is constantly being improved [1]. OpenAtlas is a web-based, open source software for working with archaeological, historical and geospatial data. It provides a user interface for data input and directly maps the data onto predefined networks following the International Council of Museums' Conceptual Reference Model (CIDOC CRM).
OpenAtlas is already used for this very purpose by several completed and ongoing international research projects, mainly concerned with archaeological excavation data and historical written sources. Another ongoing project, The Anthropological and Archaeological Database of Sepultures (THANADOS), is also built upon the OpenAtlas database system. Its aim is the online, open access mapping of 500 already-published cemeteries from the Early Middle Ages in Austria and neighboring countries. Besides the data acquisition of these cemeteries, the further development and improvement of the user-side presentation frontend is another focus of the project. During the project, consideration is also given to expanding the software with regard to further sources, data sets and methods, as well as to the inclusion of a wider range of archaeological structures.
In the following, this article discusses the THANADOS project, its data structure and the existing functionalities of its interactive web frontend, as well as the archaeological prospection methods to be integrated. Based on an Austrian case study of a rural medieval landscape, several corresponding data sets and their integration into the program are then presented to illustrate the resulting opportunities and benefits.

The Anthropological and Archaeological Database of Sepultures
THANADOS is funded by the Austrian Academy of Sciences (Go!Digital Next Generation program: GDNG 2018-039) and based at the Natural History Museum Vienna and the Austrian Archaeological Institute of the Academy of Sciences. In the course of this project, the team further developed and used OpenAtlas for data acquisition. To integrate the large amount of different data needed for THANADOS, OpenAtlas uses the CIDOC CRM for archaeological purposes. Archaeological data, such as burial data from individual databases with varying data standards and data models, have previously been mapped to achieve higher interoperability [10]. So far, however, data were rarely recorded directly using the CIDOC CRM; rather, existing data were remapped [17]. In contrast, a large amount of burial data has been collected in recent years within various research projects by "re-engineering" the data from the original, mainly printed publications directly in OpenAtlas. To date, the database comprises about 500 burial sites from the Early Middle Ages, with summaries of the sites themselves and their general interpretation as well as descriptions of each individual grave, the burials documented with anthropological data and all associated finds (Figure 1) [18]. Furthermore, all available classifications, values and digital objects (e.g., scans of figures of the respective entities) were stored digitally. Additionally, spatial information such as the shape and position of all sites and graves was documented or determined. The THANADOS data model (Figure 1) is based on four hierarchical levels: the site/cemetery functions as the parent entity of multiple graves, each grave as the parent entity of one or more burials and, finally, each burial as the parent entity of one or more finds, if applicable.
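The four-level hierarchy can be illustrated with a minimal Python sketch. The class `Node`, the `LEVELS` tuple and the `count` helper are hypothetical names chosen for illustration only; they are not part of OpenAtlas, which stores such parent-child relations as links between entities in the database rather than as nested objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The four hierarchical levels of the THANADOS data model, from top to bottom.
LEVELS = ("site", "grave", "burial", "find")


@dataclass
class Node:
    """Hypothetical in-memory node; not the OpenAtlas implementation."""
    name: str
    level: str
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        # A child must sit exactly one level below its parent
        # (site -> grave -> burial -> find).
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"a {child.level} cannot be a child of a {self.level}")
        self.children.append(child)
        return child


def count(node: Node, level: str) -> int:
    """Count the node itself and all its descendants on the given level."""
    own = 1 if node.level == level else 0
    return own + sum(count(child, level) for child in node.children)


site = Node("Cemetery A", "site")
grave1 = site.add(Node("Grave 1", "grave"))
burial1 = grave1.add(Node("Burial 1", "burial"))
burial1.add(Node("Bead", "find"))
burial1.add(Node("Knife", "find"))
grave2 = site.add(Node("Grave 2", "grave"))
grave2.add(Node("Burial 2", "burial"))
```

The level check mirrors the rule that finds belong to burials and burials to graves; a find attached directly to a site is rejected, whereas a burial without finds is allowed ("if applicable").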
As mentioned, OpenAtlas is a web-based open source software designed especially for the acquisition of archaeological, historical and geospatial data, providing an intuitive user interface. The data are mapped directly onto predefined networks following the CIDOC CRM. The program is intended for use in a multiuser environment in which various specialists, such as historians, archaeologists, anthropologists and other researchers from cultural heritage management and the humanities, can enter, access and analyze their data collaboratively. Users do not require in-depth knowledge of the underlying software and data structure, as the program handles these issues automatically. No specialized CRM extensions were used, only those CRM classes that represent the lowest common denominator for the respective entity, in order to ensure data integrity and future compatibility. These mappings can, of course, be extended later by ontology experts using various CRM extensions if needed. Each entity in OpenAtlas generates a node in the network, which is assigned a certain CIDOC CRM class and linked to other nodes via properties that are also defined by the CIDOC CRM. Technically, this is mainly solved with only two tables in a PostgreSQL/PostGIS database: one containing information about the entities themselves (Table 1) and another containing the links between them (Table 2). For sites, this means that the entity is linked, for example, with the type "burial site" or one of its subtypes (Figure 2). The same procedure was applied to graves, burials and finds using the respective types and subtypes. Additionally, all entities can be connected with various other types to assign them further attributes, which also allows various domain-specific thesauri to be integrated. Physical entities are linked to their spatial attributes via the property P53 (has former or current location) to an entity of the class E53 (Place).
Thus, through the use of other classes and links with the respective properties, further information could be assigned to these entities as well, e.g., chronological, spatial or bibliographical data.
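The two-table principle can be sketched as follows. This is a simplified, hypothetical layout loosely modeled on the description above: the table and column names (`entity`, `link`, `cidoc_class`, `property`) are assumptions, SQLite stands in for PostgreSQL/PostGIS, and the use of P46 (is composed of) for the site-grave relation is likewise an assumption made for this example.

```python
import sqlite3

# One table for entities (nodes), one for links (edges) between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,
    cidoc_class TEXT NOT NULL,   -- e.g. 'E18', 'E53', 'E55'
    name TEXT NOT NULL
);
CREATE TABLE link (
    domain_id INTEGER NOT NULL REFERENCES entity(id),
    property TEXT NOT NULL,      -- e.g. 'P2', 'P46', 'P53'
    range_id INTEGER NOT NULL REFERENCES entity(id)
);
""")

conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", [
    (1, "E18", "Cemetery A"),              # physical thing: the site
    (2, "E18", "Grave 1"),                 # physical thing: a grave
    (3, "E55", "burial site"),             # type
    (4, "E53", "Location of Cemetery A"),  # place
])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    (1, "P2", 3),   # site has type "burial site"
    (1, "P46", 2),  # site is composed of grave (assumed property)
    (1, "P53", 4),  # site has former or current location
])


def outgoing(entity_id):
    """Return (property, class, name) for every link starting at entity_id."""
    return conn.execute(
        """SELECT l.property, e.cidoc_class, e.name
           FROM link AS l JOIN entity AS e ON e.id = l.range_id
           WHERE l.domain_id = ?
           ORDER BY l.property""",
        (entity_id,),
    ).fetchall()


for row in outgoing(1):
    print(row)
```

Because every statement is just a row in `link`, new kinds of information (chronological, spatial or bibliographical) can be added without changing the schema, only by introducing further classes and properties.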
By using timestamps for time spans, OpenAtlas allows for a precise documentation of the temporal attributes of entities [1]. As with the GeoJSON-T format (https://github.com/kgeographer/geojson-t) for spatial extents, each physical entity may also be given a time span for the begin and end of its life span/period of use. In an archaeological context, this represents the temporal range to which archaeologists have dated a certain object, e.g., an early medieval church commonly dated, following the current state of research, from the second half of the 9th century AD to the first half of the 11th century AD. Here, for instance, four temporal nodes were assigned to the church (earliest begin: 850-01-01; latest begin: 899-12-31; earliest end: 1000-01-01; latest end: 1049-12-31). The program uses single days as the smallest temporal unit for documentation (Figure 3). Since uncertain or imprecise chronological and spatial information is common in the documentation of cultural heritage, qualifiers such as "circa", "approximately" or "shortly after", as well as quantifiers such as percentage probabilities for the quality of temporal and spatial entities, were purposely avoided as merely subjective during the development of the data model. Thus, OpenAtlas in theory assumes 100% certainty of time spans and spatial extents, but it allows for variable precision. If the location or dating is not known precisely, the user can define an area or time span within which the respective entity falls spatially or chronologically with 100% certainty. The larger this area or time span, the lower the precision, and vice versa.
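The four-date model and its "certain but variably precise" semantics can be expressed in a short sketch. The `TimeSpan` class and its method names are hypothetical helpers for illustration, not OpenAtlas code; the only assumption about the semantics is the one stated above, namely that the true begin lies between the two begin dates and the true end between the two end dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeSpan:
    """Hypothetical helper holding the four dates that bound a period of use."""
    earliest_begin: date
    latest_begin: date
    earliest_end: date
    latest_end: date

    def __post_init__(self):
        # Basic plausibility checks on the four bounds.
        if not (self.earliest_begin <= self.latest_begin
                and self.earliest_end <= self.latest_end
                and self.earliest_begin <= self.latest_end):
            raise ValueError("inconsistent time span")

    def possibly_in_use(self, day: date) -> bool:
        """True if the day lies anywhere within the outer bounds."""
        return self.earliest_begin <= day <= self.latest_end

    def certainly_in_use(self, day: date) -> bool:
        """True if the day lies between the latest possible begin and the earliest possible end."""
        return self.latest_begin <= day <= self.earliest_end

    def begin_precision_days(self) -> int:
        """Width of the begin interval; a larger value means lower precision."""
        return (self.latest_begin - self.earliest_begin).days

    def end_precision_days(self) -> int:
        """Width of the end interval; a larger value means lower precision."""
        return (self.latest_end - self.earliest_end).days


# The early medieval church from the text.
church = TimeSpan(date(850, 1, 1), date(899, 12, 31), date(1000, 1, 1), date(1049, 12, 31))
```

Both answers are derived without any probability values: a date in the 870s is possible but not certain for the church, while the middle of the 10th century lies within its certain period of use.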
By using the CIDOC CRM as the underlying data model (Figure 4), OpenAtlas also allows written sources to be combined with physical objects, actors and temporal entities. A medieval charter, for example, can be mapped to the database as a network of nodes and links. The charter as a physical object (E84 Information Carrier) "carries" (P128) its content (E33 Linguistic Object), which can have transcriptions, transliterations or translations (further E33 entities linked via P73 "has translation"). The charter itself can have a location (E53 via P53), e.g., the archive where it is kept today. The content refers (P67) to other entities, such as the events or processes described (E7, E8, E9), the actors involved (E21, E74, E40) and places or physical things (E53, E18). Depending on how far these connections reach, this can result either in dense networks linking sources and content beyond the charter itself or in an isolated cluster if no links connect to entities outside the given source.
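Whether a charter forms part of a dense network or an isolated cluster is a question of graph connectivity. The sketch below uses hypothetical example data (charters A to C, "village X", "mill") stored as (domain, property, range) triples and a breadth-first search that ignores link direction; it illustrates the idea and is not an OpenAtlas query.

```python
from collections import defaultdict, deque

# Hypothetical example network: (domain, CIDOC CRM property, range) triples.
TRIPLES = [
    ("charter A", "P128 carries", "text of charter A"),
    ("text of charter A", "P73 has translation", "translation of charter A"),
    ("charter A", "P53 has former or current location", "archive"),
    ("text of charter A", "P67 refers to", "donation"),
    ("text of charter A", "P67 refers to", "donor"),
    ("text of charter A", "P67 refers to", "village X"),
    ("text of charter B", "P67 refers to", "village X"),
    ("text of charter C", "P67 refers to", "mill"),
]


def cluster(start, triples):
    """Return all nodes reachable from start, ignoring link direction (BFS)."""
    neighbours = defaultdict(set)
    for domain, _prop, range_ in triples:
        neighbours[domain].add(range_)
        neighbours[range_].add(domain)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # The set difference is computed once per node, so adding to seen is safe.
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen
```

Because charters A and B both refer to "village X", they end up in one connected network, whereas charter C, whose content only refers to an otherwise unlinked mill, remains an isolated cluster.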

Technical Background
OpenAtlas is primarily a data acquisition software for cultural heritage science that allows researchers to map their information to the CIDOC CRM in a homogeneous way. A user interface provides workflows for humanities researchers, who do not need to be experts in IT or ontologies to record their data; the software takes care of the mappings. Regarding content, each research project that uses this software for its own purposes is responsible for the information it contains and for what happens with it. As the software is open source, the developers (including the authors of this article) have no control over how and to what extent it is used. Although one instance of OpenAtlas can host many different projects, usually one research project uses one instance, at least among the projects known to the authors. In the case of THANADOS, one instance was shared with cooperating partners who work on the same topic but for different regions (CZ, HU, AT).
OpenAtlas provides an API that delivers various JSON-LD outputs (based on the Linked Places format) in order to make the data findable, accessible, interoperable and reusable (e.g., https://demo-dev.openatlas.eu/api/0.1/entity/116293). Data entities can be linked to controlled vocabularies, e.g., GeoNames or Wikidata, as well as to existing (linked) open data, which is also included in the JSON-LD output following the respective standards and annotations. In this way, the data can be embedded into the semantic web. However, the choice of which vocabularies to use, whether to publish the data and which metadata enrich the datasets lies with the project's responsible PI/institution. This additionally depends on whether the respective vocabularies and data endpoints are available through APIs.
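Consuming such output on the client side can be sketched as follows. `SAMPLE` is a hypothetical, heavily simplified document that only follows the general shape of the Linked Places format (a GeoJSON-like FeatureCollection whose features carry `properties`, `geometry` and `links`); it is not real OpenAtlas API output, and the identifiers are placeholders. A live document could instead be loaded with `json.load(urllib.request.urlopen(url))`, assuming the endpoint returns JSON of this shape.

```python
import json

# Hypothetical Linked Places-style document (placeholder values, not real API output).
SAMPLE = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "@id": "https://example.org/entity/1",
      "type": "Feature",
      "properties": {"title": "Cemetery A"},
      "geometry": {"type": "Point", "coordinates": [16.37, 48.21]},
      "links": [
        {"type": "closeMatch", "identifier": "geonames:PLACEHOLDER"},
        {"type": "exactMatch", "identifier": "wikidata:PLACEHOLDER"}
      ]
    }
  ]
}
""")


def summarize(doc):
    """Extract id, title, geometry type and external links from each feature."""
    rows = []
    for feature in doc.get("features", []):
        rows.append({
            "id": feature.get("@id"),
            "title": feature.get("properties", {}).get("title"),
            "geometry": (feature.get("geometry") or {}).get("type"),
            "links": [link["identifier"] for link in feature.get("links", [])],
        })
    return rows


for row in summarize(SAMPLE):
    print(row)
```

The `links` list is where references to controlled vocabularies such as GeoNames or Wikidata would appear, which is what allows the records to be connected to other linked open data.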
OpenAtlas is not an initiative that provides data per se, but only the technical framework to record data and, if desired, the technical means to provide them online in a machine-readable format.
Regarding technical aspects, both OpenAtlas and THANADOS are built entirely upon open source technology and use PostgreSQL with PostGIS as a data backend. On the server side, they use Python 3 with the Flask framework. On the client side, they use common technologies such as JavaScript, HTML5 and CSS along with JS libraries like Leaflet, jQuery and Bootstrap. The system needs to be installed on an Apache2 server in order to be used. This can be a localhost on a single computer if someone intends to use the system offline as a standalone application. However, the software is mainly intended to be used by multiple users at the same time and is therefore best installed on a web server.
Sharing the same data pool among multiple users has the benefit that the database is always up to date and the work can be carried out collaboratively. As we are dealing with complex data of differing levels of detail, and the scope of OpenAtlas is research projects in history and archaeology, a web-based multiuser environment that can be accessed with any common browser seemed most beneficial for this purpose. This, of course, requires a certain technical environment and maintenance. However, once the system is installed, users only need to log in via a browser in order to work with it. An offline database has severe limitations when it comes to sharing or merging data and does not allow for online collaboration per se.
Both systems are open source, and the code along with installation instructions is hosted on GitHub. While OpenAtlas is designed for data acquisition, THANADOS serves as a frontend for disseminating the data of this specific project. It is the first public frontend developed for OpenAtlas data, and it features functionalities specific to the archaeological and anthropological burial data recorded with the OpenAtlas system. However, as discussed in this article, it can serve as a blueprint for further extensions, e.g., for landscape archaeological data.
As THANADOS is an ongoing project that started only in 2019, static GeoJSON files can currently be downloaded per site, and query and search results from both intrasite and intersite searches can be exported as CSV files. In this way, the data are remapped from complex CIDOC CRM networks to relational data and can be used by users who are not ontology experts in common software such as ArcGIS, QGIS, Excel or SPSS. Static maps and charts can be exported as image files. More export features are in development.
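The remapping from a hierarchical network to flat, relational rows can be sketched with Python's standard `csv` module. The nested dictionary `example_site` and the chosen columns are hypothetical and do not reproduce the actual THANADOS export format; the point is only that each row repeats its parent site and grave so that spreadsheet or statistics software can work with it directly.

```python
import csv
import io

# Hypothetical nested data: site -> graves -> burials.
example_site = {
    "site": "Cemetery A",
    "graves": [
        {"grave": "Grave 1", "burials": [{"burial": "Burial 1", "sex": "female"}]},
        {"grave": "Grave 2", "burials": [
            {"burial": "Burial 2", "sex": "male"},
            {"burial": "Burial 3", "sex": "unknown"},
        ]},
    ],
}


def flatten(site):
    """Yield one flat row per burial, repeating the site and grave names."""
    for grave in site["graves"]:
        for burial in grave["burials"]:
            yield {"site": site["site"], "grave": grave["grave"], **burial}


def to_csv(rows, fieldnames):
    """Serialize dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv(flatten(example_site), ["site", "grave", "burial", "sex"])
print(text)
```

Flattening trades the expressiveness of the network for convenience: relations are reduced to repeated columns, which is why the CIDOC CRM network remains the primary data and CSV only an export.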
Regarding interoperability, the OpenAtlas API, as mentioned before, provides live data from the THANADOS database in the form of JSON-LD. THANADOS has an additional API that provides basic GeoJSON for each site (e.g., https://thanados.net/entity/50505/json). Upon completion of the project in 2021, the datasets including metadata will also be referenced in trusted repositories; the respective possibilities and technical implementation are currently being evaluated. The THANADOS datasets are provided under the CC BY 4.0 license. Every section of https://thanados.net that allows for downloading data offers a citation recommendation including the license. As certain elements cannot be licensed in this way (e.g., third-party map tiles or object images), they are excluded from the CC BY license, and their copyright was documented separately.

Landscape Archaeological Data
As outlined above, OpenAtlas and the THANADOS frontend offer an ideal platform for extension to further landscape archaeological data and results. Furthermore, the comprehensive mapping of all known cemeteries of a certain period already provides an excellent data foundation for a general understanding of the historical landscape. As a next step, the gaps need to be filled with data on settlements, centers of power, agricultural use and communication routes. Ideally, this should also include different data sets on further medieval structures based on aerial photographs, airborne laser scanning data and historical maps.
Although geophysical prospection has the potential to provide numerous significant results [19] and could be integrated very well into the concept discussed here, this paper primarily deals with historical maps and remote sensing data. This is mainly because the latter data sets are already accessible online to a large extent for many areas and research questions and require far less technical equipment and financial resources from individual researchers. This reflects the approach advocated here, which relies largely on openly accessible programs and is aimed at a wide range of users.

Historical Maps
Historical maps play an important role in historical and landscape archaeological investigations. The map series, most of which were produced from the 18th century onwards and whose accuracy often reached a high level of quality in the early 19th century [20][21][22][23][24], are often an indispensable source for numerous historical questions, including those concerning earlier periods [25][26][27]. For most areas of Europe, several online applications nowadays allow free access to various historical maps [28][29][30]. For the successor states of the former Habsburg monarchy, these include Mapire [28,31], among others. In addition, numerous archives offer digitized versions of the map sheets they hold, many of which are already georeferenced and can be obtained at reduced prices for research purposes. However, most providers only allow the maps to be viewed online; downloads are usually charged for or not permitted at all. Additionally, maps provided by archives, as mentioned above, often still have to be georeferenced by the researchers themselves, and even with preferential conditions for research and educational purposes, the costs can quickly add up for larger landscapes and research areas. Moreover, even under optimal circumstances, the maps are ultimately available only as georeferenced scans in offline GIS applications. For more in-depth queries and analyses, a further tedious step is necessary in which the maps must be vectorized, either manually or semi-automatically, to obtain the individual paths, landmarks, agricultural land and buildings as polygons [32] (pp. 64-68). Thus, a collaborative approach in which numerous researchers enter their individually evaluated maps into a jointly maintained online database would bring significant advantages.
This is especially true since historical maps of areas already covered would not have to be vectorized again by each new user, and an ever-growing, almost seamless area could be covered, which would also be invaluable for future research. This would also give access to the data to researchers who lack the technical expertise to prepare the data themselves but who have the necessary competence to evaluate the same data in a scientifically productive way.
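Georeferencing a scanned map sheet ultimately means finding a transformation from pixel to map coordinates. The sketch below fits the simplest such transformation, a six-parameter affine transform, exactly from three ground control points; the function names and the example coordinates are hypothetical. In practice, more control points, a least-squares fit or higher-order (e.g., polynomial) transformations are usually needed for distorted historical sheets, so this is an illustration of the principle rather than a recommended workflow.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m @ x = v with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    result = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        result.append(_det3(mc) / d)
    return result


def fit_affine(gcps):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f
    from exactly three ((col, row), (x, y)) control points."""
    m = [[float(col), float(row), 1.0] for (col, row), _ in gcps]
    a, b, c = _solve3(m, [xy[0] for _, xy in gcps])
    d, e, f = _solve3(m, [xy[1] for _, xy in gcps])
    return a, b, c, d, e, f


def pixel_to_map(params, col, row):
    """Apply the affine parameters to a pixel position."""
    a, b, c, d, e, f = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical control points: 0.5 map units per pixel, north-up scan.
params = fit_affine([
    ((0, 0), (600000.0, 5300000.0)),
    ((100, 0), (600050.0, 5300000.0)),
    ((0, 100), (600000.0, 5299950.0)),
])
print(pixel_to_map(params, 200, 300))
```

Once such parameters are stored alongside a scan, every user can reuse them, which is exactly the kind of repeated effort a shared online database would avoid.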

Remote Sensing Data
Similar arguments apply to remote sensing data such as aerial photographs and digital terrain models (DTM) based on airborne laser scanning (ALS) data. ALS data in particular require a large number of complex initial processing steps to make them suitable for archaeological and historical investigations [33] (pp. 65-66). There are numerous, mostly governmental, providers of such data, usually offering different visualizations of the DTM, such as hillshade or slope [34] (pp. 16-19), and sometimes more elaborate visualizations like openness [35], local relief models [36] and sky-view factors [37]. However, these visualizations are only representations of the initial DTM; the raw data are usually not accessible to conventional users. Furthermore, the visualizations are only as good as the filtering methods originally applied to the data, which, in the case of such public providers, are not primarily focused on archaeological needs [33] (pp. 66-67). Aerial photographs likewise require meticulous preparation in order to map archaeological structures accurately and on a large scale [5,8]. Rectification, georeferencing and the documentation of the survey metadata are only three of the most important aspects here. Thus, the joint preparation and evaluation of remote sensing data by different researchers via an online solution would yield significant advantages for archaeological research as well.
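To make concrete why a visualization such as hillshade is only a derived representation of the DTM, the sketch below computes an analytical hillshade from a small elevation grid, following the widely documented Horn slope/aspect formulation used, for example, in ESRI's hillshade description. The function name and defaults (light from azimuth 315°, altitude 45°) are assumptions for illustration; this is not the processing chain of any particular data provider, and edge cells are simply skipped.

```python
import math


def hillshade(dtm, cellsize=1.0, azimuth=315.0, altitude=45.0, z_factor=1.0):
    """Return hillshade values (0-255) for the interior cells of a DTM grid.

    dtm is a list of rows of elevations, row 0 being the northern edge.
    """
    zenith = math.radians(90.0 - altitude)
    # Compass azimuth (clockwise from north) -> math angle (counterclockwise from east).
    azimuth_math = math.radians((360.0 - azimuth + 90.0) % 360.0)
    out = []
    for r in range(1, len(dtm) - 1):
        out_row = []
        for c in range(1, len(dtm[0]) - 1):
            # 3x3 neighbourhood: a b cc / d e f / g h i
            a, b, cc = dtm[r - 1][c - 1:c + 2]
            d, _, f = dtm[r][c - 1:c + 2]
            g, h, i = dtm[r + 1][c - 1:c + 2]
            # Horn's weighted differences.
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cellsize)
            slope = math.atan(z_factor * math.hypot(dzdx, dzdy))
            aspect = math.atan2(dzdy, -dzdx)
            value = 255.0 * (math.cos(zenith) * math.cos(slope)
                             + math.sin(zenith) * math.sin(slope)
                             * math.cos(azimuth_math - aspect))
            out_row.append(max(0.0, value))  # clamp self-shadowed cells to 0
        out.append(out_row)
    return out


flat = hillshade([[5.0] * 3 for _ in range(3)])[0][0]
west_facing = hillshade([[0, 1, 2]] * 3)[0][0]  # slope descends to the west
east_facing = hillshade([[2, 1, 0]] * 3)[0][0]  # slope descends to the east
print(flat, west_facing, east_facing)
```

With light from the northwest, the west-facing slope appears brighter and the east-facing slope darker than flat ground. Subtle earthworks aligned with the light direction can therefore disappear in a single hillshade, one reason why access to the underlying DTM matters.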

Results and Discussion
For the research question posed here, the medieval landscape around the castle and dominion of Scharfeneck in the Leitha Hills in eastern Austria, close to the former Hungarian border, which was recently investigated in the course of a doctoral thesis [32], was selected as a pilot study. In addition, further data on several deserted medieval villages in Lower Austria were used to test the mapping of smaller individual settlements using OpenAtlas.
Prior to entering the data into OpenAtlas, the different historical and archaeological sources needed to be acquired and analyzed individually. In a next step, an integrative interpretation on a landscape basis was needed, combining all complementary data and results, including publications on old excavations and surface finds. This step is commonly carried out by archaeologists offline in a desktop GIS (e.g., ArcGIS, QGIS), since most web GIS solutions do not offer all the necessary tools and algorithms. However, OpenAtlas [1,16] and the THANADOS frontend already allow many of these tasks, such as mapping archaeological structures via polygons or advanced data querying and visualization, to be carried out online via a web browser, specifically tailored to the needs of historical/archaeological research.
As already implemented in THANADOS for the acquisition of cemetery data, written sources and published data on archaeological sites and features can also be summarized, translated and stored as digital text. Additionally, further survey data can be classified (archaeological interpretation, chronology, spatial extent) to allow for statistical analyses and comparison. Their spatial information is stored as PostGIS geometries, and attachments like drawings, photographs, 3D models or bibliographical references are also collected systematically and linked to the respective entities. All data are mapped as a network consisting mainly of four hierarchical clusters: place (e.g., village, lordship, cemetery), feature (e.g., building, agricultural field, grave), structure/stratigraphic unit (e.g., posthole, ditch, burial) and, where applicable, finds/artefacts.
The categorization may also include derived metadata, such as the respective type of desertion, the dimensions of the area in question (which can also be converted into the relevant historical units of area), the respective economic use and, where applicable, a cartographic representation of these data. Each place is connected to other entities such as actors (person, group, legal body), historical events and further places.
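Converting measured areas into historical units is a simple scaling step. The sketch below assumes the Viennese fathom (Wiener Klafter) of about 1.896484 m and the Austrian cadastral yoke (Katastraljoch) of 1,600 square fathoms (about 0.5755 ha); these factors are assumptions to be checked against the metrological sources relevant to the region and period, and the function name is hypothetical.

```python
# Assumed conversion factors (to be verified for the respective region and period).
KLAFTER_M = 1.896484                  # Viennese fathom in metres
QUADRATKLAFTER_M2 = KLAFTER_M ** 2    # square fathom, approx. 3.5967 m2
JOCH_M2 = 1600 * QUADRATKLAFTER_M2    # cadastral yoke, approx. 5754.64 m2


def to_historical_units(area_m2):
    """Express an area in square metres, square fathoms and yokes."""
    return {
        "m2": area_m2,
        "quadratklafter": area_m2 / QUADRATKLAFTER_M2,
        "joch": area_m2 / JOCH_M2,
    }


print(to_historical_units(10_000.0))  # one hectare
```

Keeping the conversion factors as named constants makes it easy to swap in regional variants, since historical units of area differed between territories.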
As the database again uses classes and properties of the international standard CIDOC CRM, it is highly interoperable within cultural heritage documentation. For making the data publicly available, a web interface/presentation frontend based on THANADOS combines a cartographic with a data-centered representation specifically designed for a wider range of archaeological structures. This website is built upon the THANADOS frontend (https://thanados.net), except that its focus is not only on graveyards but on a wider range of archaeological structures and historical sources.
To illustrate this, the Franciscan Cadastre [22,24], a 19th-century cadastral map, was vectorized as polygons for the study area and imported into OpenAtlas. Following the original division of the cadastral map, the individual areas were assigned to categories such as buildings, gardens, vineyards, agricultural fields, meadows, forest, water bodies, quarries and cemeteries (Figure 5). Paths, rivers and borders were mapped as polylines. In addition to representing the different land uses in different colors, the respective toponyms were also documented where indicated.
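Once parcels exist as polygons, area statistics per land-use category are straightforward. The sketch below applies the shoelace formula to polygons in a projected, metric coordinate system (not longitude/latitude) and sums the areas per category; the parcel data and function names are hypothetical. Inside the database itself, PostGIS offers the same calculation through its `ST_Area` function.

```python
from collections import defaultdict


def polygon_area(ring):
    """Planar area of a simple polygon [(x, y), ...] via the shoelace formula.

    Works whether or not the ring is explicitly closed.
    """
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0


def area_by_land_use(parcels):
    """Sum parcel areas per land-use category."""
    totals = defaultdict(float)
    for land_use, ring in parcels:
        totals[land_use] += polygon_area(ring)
    return dict(totals)


# Hypothetical parcels in metres.
parcels = [
    ("vineyard", [(0, 0), (100, 0), (100, 50), (0, 50)]),
    ("meadow", [(0, 0), (10, 0), (0, 10)]),
    ("vineyard", [(0, 0), (10, 0), (10, 10), (0, 10)]),
]
print(area_by_land_use(parcels))
```

The resulting totals can then be converted into historical units of area or compared between categories and sub-regions.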
After vectorization, the cartographic data set of the study area comprised 699 individual polygons (Figure 6). These vectorized parcels could subsequently be used for various area calculations as well as for analyzing land use or for other statistical analyses. Further information contained in historical maps, such as topography, toponyms, cadastral boundaries, meadow enclaves and secondarily acquired fields, can also be displayed digitally in a variety of ways. In addition, the resulting polygons form the basis for further cross-linking of the historical map with other sources and metadata. Apart from a colored representation of the different land uses according to the original maps, much other information derived from these maps can be profitably visualized by means of such an online application. For example, following an approach already pursued by Klaus Schwarz in the 1980s [25,26], a stratigraphic analysis [38] of the agricultural fields and path networks indicated in the map was carried out. Based on the assumption that paths running along adjacent fields (concordant) must be as old as or older than these fields, while paths that intersect fields (discordant) are younger, the development and relative chronology of the path and field system were deduced. This makes it possible to depict different development phases, indicating areas with older and younger fields and paths (Figure 7). This could be used, for instance, to detect deserted settlements by searching the path and field systems for older structures that could indicate former villages or farms, to achieve a deeper understanding of diachronic development processes, or as a tool for general historic landscape characterization [39].
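The concordant/discordant reasoning can be turned into relative phases with a topological sort, similar in spirit to ordering a Harris matrix. In this simplified, hypothetical sketch, each relation is a tuple `(older, younger, strict)`: a concordant path gives a non-strict relation (path not younger than field, phase difference 0), a discordant path a strict one (field older than path, phase difference 1). Each element is assigned the longest weighted distance from the oldest elements. The function name and data are illustrative, and contemporaneous elements expressed as mutual non-strict relations are not handled; they are reported as contradictions.

```python
from collections import defaultdict, deque


def relative_phases(relations):
    """Assign relative phases (0 = oldest) from (older, younger, strict) tuples."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for older, younger, strict in relations:
        succ[older].append((younger, 1 if strict else 0))
        indeg[younger] += 1
        nodes.update((older, younger))
    phase = {n: 0 for n in nodes}
    # Kahn's algorithm: start with elements that have no older predecessor.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt, weight in succ[node]:
            phase[nxt] = max(phase[nxt], phase[node] + weight)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("contradictory relations (cycle)")
    return phase


# Hypothetical map evidence:
# path P1 runs along field F1 (concordant), path P2 cuts F1 (discordant),
# field F2 is bounded by P2 (concordant).
relations = [("P1", "F1", False), ("F1", "P2", True), ("P2", "F2", False)]
print(relative_phases(relations))
```

Elements sharing a phase number are only known not to contradict each other, so the phases express a relative sequence, not absolute dates.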
To further highlight the advantages of such an approach for historical landscape research, information on various abandoned medieval villages was included in the database, as mentioned above. In this context, a publicly accessible aerial photograph showing the suspected remains of an abandoned late medieval village in Lower Austria, emerging as cropmarks (Figure 8a), was examined more closely. The assumed farm plots, the individual gardens adjacent to these plots and a surrounding ditch system recognizable to the north were outlined as polygons (Figure 8b) and added to the database (Figure 9). This step could, however, also be carried out directly in OpenAtlas, paralleling the mapping of cemeteries in THANADOS. Yet the intrinsic added value of such an online database ultimately arises in the further investigation of the village, especially in larger interdisciplinary research projects or several consecutive projects in which a wide variety of specialists must cooperate on a common topic. In the case of deserted villages, this should include archaeologists with experience in archaeological prospection and GIS, who prepare and interpret the remote sensing data, as well as historians, who examine the respective written sources and enter them into the database. In addition, there might be other archaeologists, geologists or botanists who add further information from excavations, corings or surveys, including analyses of soil samples or plant remains. Especially if these researchers are located at different institutions or even in different countries, or if data are collected over a longer period of time, this could facilitate the work process enormously. Regarding this specific case study, three wider areas in Lower Austria are currently under investigation by one of the authors, each comprising dozens of deserted medieval villages located comparatively densely within these areas.
Making the discussed landscape archaeological data available online would not only benefit future research projects but could also become relevant for the numerous historical societies active in these regions and for the local population, in terms of increased, scientifically guided citizen science involvement. In addition, the data could be used when planning new construction projects in order to avoid redundant prospection or to redirect such projects to other areas.
In this instance, the collaboratively collected data would comprise general information on the village itself based on written sources, including their transcription and translation, such as its overall period of use, its possessors and documented inhabitants, as well as historical events involving the village. Archaeological prospection data add decisive new information on the internal structure and layout of the site, while historical maps provide information on the wider agricultural landscape and its development over time; both are documented through polygons, which in turn can be linked to specific information from other sources, depending on the level of detail of the written documents. Building on these data, targeted excavations or corings could be conducted, and the information and research results collected in this way would be stored on a long-term basis, publicly accessible and available at any time for further research and analyses.

Conclusions
In summary, the use of OpenAtlas or similar programs for the preparation and collaborative analysis of different historical and archaeological data sets will generate significant and sustainable added value for future research projects. Considering the constantly growing requirements for the documentation and research of cultural heritage in terms of digital tools and new technical or methodological developments, as well as their mostly inherent demands for Open Access and the FAIR principles, a growing commitment to and further development of the necessary software is practically indispensable.
In this respect, OpenAtlas and the web frontend presented here, with the adaptations introduced above to take into account a wide range of historical sources based on both traditional and novel methods of historical research, represent a significant methodological development. Based on some practical applications, we have attempted to show that a large spectrum of publicly available data is already at hand, whose true potential can only be fully exploited by such a collaborative and web-based approach. Of course, the success of such an endeavor depends heavily on a sufficiently large circle of users and contributing researchers, as well as on a sustainable concept and a future-oriented data structure with continuous further improvement. This aspect has been taken into account extensively, and the authors would like to take this opportunity to actively encourage contact with regard to constructive criticism and possible future cooperation in order to further promote such approaches.
Funding: This research was funded by the Austrian Academy of Sciences, Go!Digital Next Generation program: GDNG 2018-039.