Peer-Review Record

Practices of Linked Open Data in Archaeology and Their Realisation in Wikidata

Digital 2022, 2(3), 333-364; https://doi.org/10.3390/digital2030019

by Sophie C. Schmidt ¹,*, Florian Thiery ² and Martina Trognitz ³

Reviewer 1: Cesar Gonzalez-Perez

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 20 April 2022 / Revised: 2 June 2022 / Accepted: 17 June 2022 / Published: 22 June 2022

(This article belongs to the Special Issue Bridging Digital Approaches and Legacy in Archaeology)

Round 1

Reviewer 1 Report

The article has a very wide and ambiguous title, "Practices of Linked Open Data in Archaeology". The abstract states that the paper presents "how Linked Open Data (LOD) is applied in the archaeological domain", but doesn't say by whom or for what purposes, which contributes to the ambiguity. I suggest the authors rephrase the title and abstract to express more specifically what the paper is about.

Sections 1, 2 and 4 read more like promotional material than a scientific paper. The authors are explicit in that "We would like to encourage the creation and use of LOD", and spend many pages describing how LOD works and why it is useful. However, they fail to persuade the reader that LOD must be used, because:

They don't provide a gap analysis that can show how much LOD improves upon the current state of affairs. In other words, the benefits of using LOD are not clearly measured and presented. For example, how much time, money or effort will LOD save a project? How much time, money and effort will an archaeologist need to spend in order to learn and apply LOD to a project?
They don't consider other options. The authors seem to assume that it is LOD or nothing, and don't even mention other approaches to knowledge representation beyond LOD. For example, how is LOD better than conceptual modelling? In this regard, all the ontologies and vocabularies mentioned on page 6 are LOD-based or LOD-friendly; the authors are not mentioning relevant initiatives that are based on technologies other than LOD, such as CHARM (www.charminfo.org).

Without a clear treatment of these issues, the paper must be taken as the opinion of the authors, as no systematic study of LOD against nothing and against other options is presented. Furthermore, conclusions are not backed by evidence. For example, the authors state that "When LOD is properly implemented, computer-aided processing and inference become possible", but they show no evidence of this. If the authors have evidence that computer-based inference is possible thanks to LOD, they should be explicit about it and describe how.

Regarding the challenges of LOD, I agree with the authors that semantic overloading and technical complexity are important. However, the authors fail to address additional weaknesses of LOD, such as the fact that it cannot organise information into meaningful chunks or structures; the fact that its simple scheme of subject/predicate/object is not enough to represent intransitive verbs or cross-cutting concerns (such as temporality or subjectivity) in an intuitive manner; or the fact that it collapses abstraction levels by mixing up conceptual and implementation issues. In this regard, modularity and layering are crucial aspects of any software engineering approach, and LOD alone is not enough to address them. The authors may want to consider LOD as an implementation technology that is applied downstream in the lifecycle after other technologies (such as conceptual modelling) have been used to model knowledge into a meaningful representation.
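
The point about cross-cutting concerns can be made concrete with a small sketch. This is an editorial illustration, not taken from the paper or the review: all `ex:` identifiers are hypothetical, and plain tuples stand in for an RDF library. A bare subject/predicate/object triple has no slot for a date or an asserting party, so one common workaround is to turn the statement into a node of its own and attach the extra information to that node.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A bare triple: there is no place to say "when" or "according to whom".
simple = Triple("ex:site1", "ex:occupiedBy", "ex:cultureA")

# Workaround (hypothetical vocabulary): an intermediate node carries the context.
qualified = [
    Triple("ex:site1", "ex:hasOccupation", "ex:occupation1"),
    Triple("ex:occupation1", "ex:byCulture", "ex:cultureA"),
    Triple("ex:occupation1", "ex:startYear", "-2500"),
    Triple("ex:occupation1", "ex:assertedBy", "ex:excavatorX"),
]


def describe(node: str, triples: list) -> dict:
    """Collect every predicate/object pair whose subject is the given node."""
    return {t.predicate: t.obj for t in triples if t.subject == node}


print(describe("ex:occupation1", qualified))
```

One statement has become four triples, which is the loss of intuitiveness the reviewer describes.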

Finally, the authors make some technical errors. For example, on pages 5 and 7, the authors present CSV as a "standardised format", and contrast it with XLSX, which they discourage. However, XLSX has been standardised as ISO/IEC 29500, and can be opened and processed by multiple free and open software applications. CSV, on the contrary, and despite RFC 4180 and MIME specifications, lacks proper standardisation. Well-known issues related to CSV's lack of standardisation include ambiguity regarding whether CR, LF or a combination should be used for line ends; how to escape embedded commas; how to detect an initial header row; or how to deal with line breaks inside quoted blocks. Not to mention the fact that XLSX supports strong typing and complex data structures, whereas CSV doesn't.
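
Two of the CSV issues listed above, embedded commas and line breaks inside quoted blocks, can be reproduced in a few lines. The following is an editorial sketch, not part of the review: it assumes RFC 4180-style quoting as produced by the default dialect of Python's standard `csv` module, and the sample record is invented.

```python
import csv
import io

# Invented sample record: one field holds a comma, another a line break.
row = ["Find 17", "bowl, fragment", "layer 3\nnorth trench"]

# Write the record with the csv module's default (quote-when-needed) dialect.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
text = buffer.getvalue()

# A naive reader that splits on line ends and commas gets both counts wrong.
naive_lines = text.splitlines()
naive_fields = naive_lines[0].split(",")

# A quoting-aware reader recovers the original three fields.
parsed = next(csv.reader(io.StringIO(text, newline="")))

print(len(naive_lines), len(naive_fields), parsed == row)
```

The record only survives because writer and reader share the same quoting convention, which is the dependency the reviewer points to.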

Another technical error made by the authors relates to the statement "In order for data to become machine-readable, it has to be annotated (also: marked up or tagged) with metadata". This is clearly incorrect, as most data read and processed by computers is never annotated. Annotation, in any case, may be useful for "semantic" (in the W3C sense, not the linguistic sense) processing. But data, in general, is perfectly machine readable without annotations.

Overall, I don't think the article presents an original research contribution beyond the examples of LOD usage in specific archaeological projects. If the authors want to "encourage the creation and use of LOD" as they say, I encourage them to consider a comprehensive analysis of LOD against nothing and against alternative technologies, and to present the results so that archaeologists can use their guidance to make informed decisions on whether to use LOD or not.

Author Response

Dear Reviewer,

Thank you for your insightful comments on our Linked Open Data paper. We would like to explain here how we improved the paper in reaction to your comments:

Rephrase the title and abstract to express more specifically what the paper is about:

We added a specification to the title (new: Practices of Linked Open Data in Archaeology and their Realisation in Wikidata), amended the abstract and changed the introduction to better reflect our intentions.

Lack of a gap analysis and clear measurements of the benefits of using LOD:

We added a section regarding the benefits and drawbacks of LOD (Chapter 2.2), which links back to the design principles explained earlier, and we are confident that it will enable readers to come to well-informed conclusions on the gains and costs of LOD.
A thorough gap analysis of the human-resource costs of implementing LOD, which we understand would strengthen our point, is sadly out of scope for this paper.

Lack of other options and comparison of LOD with other knowledge representation systems and using other techniques for meaningful representation:

We agree that LOD is to be implemented at the end of a workflow that incorporates the modeling of data semantically and conceptually and have underlined this point in the paper.
We added a paragraph on comparable alternatives to LOD (lines 60-67).
CHARM has been added as another ontology, which can be the basis for LOD. (line 370)
We would like to stress, though, that the main aim of LOD is not knowledge representation but the linking of data.

Lack of evidence that computer-based inference is possible thanks to LOD:

We have added references to this statement and a small elaboration. As further explanations would be very technical, we decided to leave it as is.

Comparing XLSX and CSV:

The topic of standards regarding XLSX and CSV is more complicated than should be discussed in the paper. We therefore removed the mention of XLSX and kept CSV, which has been documented as RFC 4180 (https://datatracker.ietf.org/doc/html/rfc4180). In practice, neither format is always implemented according to its specification, and Microsoft itself does not always apply the ISO standards (see https://en.wikipedia.org/wiki/Office_Open_XML).

Machine-readability of data:

We agree that data may be read without being annotated. In the context of the semantic web, though, annotations enable further reasoning approaches. Therefore we have added information regarding machine-readability in this context.

Consider a comprehensive analysis of LOD against nothing and against alternative technologies, and present the results so that archaeologists can use their guidance to make informed decisions on whether to use LOD or not:

We hope that by adding further information and clarifying some points as described above (no. 3) we now enable readers to see benefits and drawbacks of LOD and decide on the usefulness of the approach.
We have restructured the paper to better follow our intention. The focus lies now more clearly on the use of Wikidata as a relatively easy-to-use linking hub.

We are confident that the added information due to your requests strengthens our paper. Thank you again for your helpful comments and clarifications.

Reviewer 2 Report

This is an excellent and thorough contribution that explains concepts with clear, tangible examples. The bibliography is rich and the references to current programs and links to external sites are really informative and illustrative of the authors' points. This is a very informative and clearly written paper and I recommend publishing it with no revisions.

Author Response

Dear reviewer,

We thank you for your positive comments. In response to requests from the other reviewers, some changes have been made to the paper, but we are confident that it remains true to the spirit of the draft.

Reviewer 3 Report

I approached this with real anticipation, as LOD in archaeology has always been presented in either a highly technical and somewhat opaque manner, or in a basic introductory format without any clear indication of how to use it. This paper promises in its introduction to create a less technically demanding bridge to LOD for users – since LOD has always seemed a very geeky and complex set of tools, anything that makes it more accessible is a good thing. Furthermore, it would be a genuinely novel contribution to the debate of LOD in archaeology. Unfortunately, as the paper currently stands, this isn't followed through as clearly or effectively as it might be. The objective is nevertheless well worth pursuing, as the basics are largely present, if insufficiently focused at the moment.

Several technical terms pop up from time to time – SPARQL for instance – but are not explained.

The introductory section on LD is perhaps overlong – the topic of triples etc. is widely covered elsewhere – the big issue to my mind is how this is implemented in real life, which seems to be what this paper promises to explain? The use of Wikidata is interesting and this is the first such archaeological presentation that I can recall. In this respect, it’s perhaps unfortunate that all the examples and cases described are ongoing/complete – what would be really helpful in this context, rather than a somewhat lengthy discussion of LOD, would be an explanation of how to implement LOD in Wikidata as a starting point for all those case studies? As it stands, there’s a leap from the theoretical description of LD to the set of case studies, and therefore a gap at the critical bridging point, contrary to what is promised in the Introduction. Overall, therefore, there is a tendency to lose sight of this less-technical readership, and sections can come across as quite ‘tekky’.

First paragraph of the Introduction (lines 10-19) is a very compressed summary of the situation and open to debate – the situation, while undoubtedly challenging, is not the same across the world. It seems rather reliant for its picture of the state of archiving on the unpublished NFDI4Objects survey. I’d suggest that a brief overview of the recent collection of national summary papers published under the SEADDA banner in Internet Archaeology vol 58 and the follow-up paper in vol 59 by Geser, Richards, Massara, and Wright (https://doi.org/10.11141/ia.59.2) would provide a much stronger basis for the introductory discussion (to be fair, the introduction to vol 58 is already referenced in the paragraph and there’s reference to the broader volume at line 184).

The Introduction would benefit from a brief summary of what LOD actually is – as it stands, it’s offered as a solution in paragraph 2 (lines 20-23) but there’s no indication as to why this is the case. I’d suggest that moving the first two sentences of section 1.1 (lines 35-37) to here would easily resolve this?

RDF on line 56 should perhaps have a reference? Given the use of W3C resources elsewhere, https://www.w3.org/RDF/ is perhaps the obvious one to use?

The rules for LD (lines 71-76) could usefully be presented as a numbered list, since they are referred to later as numbered rules? They also don’t clearly and unambiguously match up with following text: for example, ‘rule 3’ is described as “useful information should be provided using standards” (line 73) but is later quoted as “LD rule number three states: ‘When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)’” (lines 87-88). In an introductory piece on LOD, this just seems confusing (and is "RDF*" different to "RDF", and what is "SPARQL"?).

The historical perspective of LOD in archaeology seems dealt with in quite a cursory fashion? Lines 156-164 cover a huge amount of work by a range of different parties, and it would be helpful to expand on this somewhat? There were years of work on LOD by Tudhope, Binding, May etc. which is only given one reference [31]. A historical account ought to refer to a broader range of existing work. For instance (and there are doubtless others!):

Binding, C., May, K. and Tudhope, D. (2008) ‘Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM’, in Christensen-Dalsgaard, B. et al. (eds) Research and Advanced Technology for Digital Libraries (12th European Conference Proceedings, Aarhus: 2008). Berlin, Heidelberg: Springer Berlin Heidelberg (Lecture Notes in Computer Science, 5173), pp. 280–290. doi:10.1007/978-3-540-87599-4_30.

May, K., Binding, C. and Tudhope, D. (2010) ‘Following a STAR? Shedding More Light on Semantic Technologies for Archaeological Resources’, in Frischer, B., Crawford, J.W., and Koller, D. (eds) Making History Interactive. Computer Applications and Quantitative Methods in Archaeology (CAA). Proceedings of the 37th International Conference, Williamsburg, Virginia, United States of America, March 22-26 2009. Oxford, UK: Archaeopress (British Archaeological Reports International Series, 2079), pp. 227–233.

May, K., Binding, C. and Tudhope, D. (2015) ‘Barriers and opportunities for Linked Open Data use in archaeology and cultural heritage’, Archäologische Informationen, 38, pp. 173–184. doi:10.11588/ai.2015.1.26162.

Tudhope, D. et al. (2011) ‘A STELLAR role for knowledge organization systems in digital archaeology’, Bulletin of the American Society for Information Science and Technology, 37(4), pp. 15–18. doi:10.1002/bult.2011.1720370405.

Vlachidis, A. et al. (2013) ‘Automatic Metadata Generation in an Archaeological Digital Library: Semantic Annotation of Grey Literature’, in Przepiórkowski, A. et al. (eds) Computational Linguistics. Berlin, Heidelberg: Springer Berlin Heidelberg (Studies in Computational Intelligence, 458), pp. 187–202. doi:10.1007/978-3-642-34399-5_10.

The Archaeology Data Service (which is mentioned in this section) also undertook a number of early attempts to link datasets – for example, very early implementations of Z39.50, the ARCHway, ARENA, HEIRPORT, Common Information Environment (leading to the Archaeotools facetted browser), and Transatlantic Archaeology Gateway projects, all of which preceded the projects referred to in lines 191-193 and arguably provided the foundation, background and experience for those mentioned (see https://archaeologydataservice.ac.uk/research/projects.xhtml for details and links to references etc.). Again, a historical overview might reasonably mention some of this earlier work? I appreciate that Geser’s 2016 ARIADNE report is being relied upon for some of this historical perspective, but it isn’t especially strong in that respect, so there’s perhaps an opportunity being missed here.

Would it not be more logical in this context to use an archaeological example to illustrate the Wikidata data model (p 8 and fig 7) rather than the Douglas Adams example provided? Perhaps continue the Phaistos Disk example, for instance? This might also provide the opportunity to provide that bridging example between the theory of LOD and its practice within Wikidata?

The links associated with the Wikidata projects described in Section 3 could be much clearer and emphasised in some way – there are so many inline links the reader is left wondering where to start, so some guidance might be helpful!

I’d suggest reviewing the set of case studies – there are rather a lot of them! What is the key point or characteristic each is demonstrating, for instance? They aren’t really working hard to clarify how LOD can be easily implemented otherwise. For instance, the final section 3.5 summarises some of the key aspects (lines 801-806) but demonstrating these over and over again in the case studies isn’t perhaps the best approach and a more targeted, explicit presentation would work much better?

The presentation of the tools listed in Section 4 could be clearer. Foregrounding the problems or shortcomings experienced with Wikidata for the less-technical reader would provide a more useful approach to explaining where these tools can be brought to bear, rather than presenting a rather bald list as it currently appears?

Author Response

Dear reviewer,

Thank you for your positive reaction and feedback. We were encouraged to improve upon the paper due to your helpful comments and would like to explain the changes made according to your feedback here:

Missing explanation of technical terms:

We checked again that at the first mention of a technical term it is described at the very least with a reference.

Missing explanation of how to implement LOD in Wikidata:

We added information on how to use Wikidata as a user and link to different sources thereby creating LOD. We also restructured the paper: The introductory section on Wikidata is now a sub-chapter of a section called “Creating and Publishing LOD”.

Lack of diversity regarding the state of the art in archaeological data management:

We edited the introduction regarding archaeological data management (esp. lines 15-23).
We understand that publications regarding data management in archaeology are skewed towards the “global West” and have added further caveats to our overview. For this, we draw on the SEADDA papers in Internet Archaeology vol 58 and Geser, G., Richards, J.D., Massara, F. and Wright, H. 2022 Data Management Policies and Practices of Digital Archaeological Repositories, Internet Archaeology 59.

Introduction would benefit from a brief summary of what LOD is

We have added a definition of LOD to the introduction (lines 25-30).

RDF on line 56 should perhaps have a reference

Thank you for finding this; we have added the reference (lines 76, 210 and 215).

The rules for LD don’t have numbers

Thank you for finding this; we changed this to an enumerated list (lines 102-107).

Is "RDF*" different to "RDF", and what is "SPARQL"?

We added an explanation to both terms (lines 77 and 105).

The historical perspective of LOD in archaeology seems dealt with in quite a cursory fashion

Yes, this is true. The history of LOD in archaeology could be a paper of its own. We thank you for the references provided and added some more information on work done in the STAR, STELLAR, and TAG projects. Most of the mentioned articles were cited by Geser in 2016 and we tried not to repeat too much of the already aggregated published information. An overlong list of projects would tire the reader, which is why we tried to keep it short and concise. We placed the cited studies into some more context regarding how LOD is distributed within the world.

Use an archaeological example to illustrate the Wikidata data model (p 8 and fig 7) rather than the Douglas Adams example provided

This is a good idea and we have done so, providing now the Phaistos disc as an example (Fig. 7, starting line 217).
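
For readers unfamiliar with the data model under discussion, the general shape of a Wikidata entry (an item carrying statements, each with a property, a value, and optional qualifiers and references) might be sketched as follows. This is an editorial illustration and not a reproduction of Fig. 7: the class names are hypothetical, `P31` is Wikidata's "instance of" property, and every other identifier is a labelled placeholder rather than a real Wikidata ID.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item: a property, a value, and optional context."""
    prop: str                                        # property ID, e.g. "P31" (instance of)
    value: str                                       # item ID or literal value
    qualifiers: dict = field(default_factory=dict)   # details that refine the claim
    references: list = field(default_factory=list)   # sources backing the claim


@dataclass
class Item:
    """An entity with an identifier, a human-readable label and statements."""
    qid: str
    label: str
    statements: list = field(default_factory=list)

    def values_of(self, prop: str) -> list:
        """Return the values of all statements that use the given property."""
        return [s.value for s in self.statements if s.prop == prop]


# Placeholder IDs only: look up the real ones on Wikidata before reuse.
disc = Item(qid="Q_DISC_PLACEHOLDER", label="Phaistos Disc")
disc.statements.append(
    Statement(
        prop="P31",
        value="Q_CLASS_PLACEHOLDER",
        qualifiers={"P_QUALIFIER_PLACEHOLDER": "placeholder value"},
        references=["placeholder source"],
    )
)

print(disc.values_of("P31"))
```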

Section 3: there are so many inline links the reader is left wondering where to start, so some guidance might be helpful!

We understand the point. After restructuring and “slimming down” the chapters on the Wikidata projects, we believe it is now a bit easier to navigate than before.

Case Studies: What is the key point or characteristic each is demonstrating, for instance?

This is a valid point. We have re-structured and re-written the case studies to more clearly show the benefit of Wikidata for different stages and types of a project.

The presentation of the tools listed in Section 4 could be clearer.

We have re-written parts of the tool descriptions to better reflect their use for projects and explained how they facilitate the use of Wikidata.

We are very grateful for your useful suggestions and are convinced the paper has been improved by reworking parts in reaction to your comments.

Round 2

Reviewer 1 Report

This second version of the paper has been clearly improved. It is a pity that a proper evaluation of pros and cons of LOD cannot be included, but I am aware of the limitations of research and paper length. I encourage the authors to pursue this line of research in future publications.

Reviewer 3 Report

The issues flagged have been comprehensively dealt with, and the resulting paper is a much stronger and more coherent contribution. A lot of care has clearly gone into restructuring and re-presenting key elements of the paper and the whole now works well barring some very minor spelling/punctuation hiccups.
