An Open System for Collection and Automatic Recognition of Pottery through Neural Network Algorithms

In the last ten years, artificial intelligence (AI) techniques have been applied in archaeology. The ArchAIDE project realised an AI-based application to recognise archaeological pottery. Pottery is of paramount importance for understanding archaeological contexts. However, recognition of ceramics is still a manual, time-consuming activity, reliant on analogue catalogues. The project developed two complementary machine-learning tools to propose identifications based on images captured on-site, for optimising and economising this process, while retaining key decision points necessary to create trusted results. One method relies on the shape of a potsherd; the other is based on decorative features. For the shape-based recognition, a novel deep-learning architecture was employed, integrating shape information from points along the inner and outer profile of a sherd. The decoration classifier is based on relatively standard architectures used in image recognition. In both cases, training the algorithms meant facing challenges related to real-world archaeological data: the scarcity of labelled data; extreme imbalance between instances of different categories; and the need to take note of minute differentiating features. Finally, the creation of a desktop and mobile application that integrates the AI classifiers provides an easy-to-use interface for pottery classification and storing pottery data.


Introduction
Over the last decade, artificial intelligence (AI) has become widespread across science and technology. Born in 1955 [1], the different facets of AI have gone through waves of innovation before becoming ubiquitous. Machine learning (ML) algorithms were developed in the 1980s [2], but their use has become common only within the last decade with the ability to produce huge datasets (Big Data), and with the advent of neural networks. Traditionally, AI addresses tasks such as reasoning, knowledge representation, planning, learning, natural language processing (NLP), perception, and robotics. Methods include statistics, computational intelligence, and symbolic AI (AI with human-readable representations). The tools used in these tasks consist mainly of mathematical optimisation, statistical tools, and artificial neural networks (ANNs).
Within archaeology, the usefulness of AI is now formally explored. Five to ten years ago, ML algorithms and neural networks were concepts unknown to archaeologists; now, there are sessions dedicated to AI at archaeological conferences. AI techniques have been applied in various fields of archaeology, especially for (i) the discovery of archaeological sites; (ii) the recognition and reassembling of archaeological pottery; (iii) the extraction of text and named entity recognition (NER); (iv) the analysis of human remains; (v) murals and graffiti drawings; and (vi) robotics. In general, archaeology benefits from AI when a vast amount of data needs to be analysed, and when complicated, subjective, highly specialised, and time-consuming activities are required (such as in the identification of finds).
Artificial neural networks (ANNs) are used to manage some of the severe problems that manifest in archaeological data: incompleteness, noisiness, messiness, and non-linear relationships between the data. The techniques applied include (i) the multilayer perceptron network (MLP) [3], which provides a supervised learning technique called backpropagation that permits finding the weights of a network [4]; (ii) the probabilistic neural network (PNN) [5], which works with a kernel density estimation; (iii) the convolutional neural network (CNN) [6], a group of neural networks used in computer vision, in which the connection between artificial neurons resembles the structure of the visual cortex; and (iv) the self-organizing feature map (SOM), which employs an unsupervised competitive learning method to obtain dimensionality reduction. Some early AI archaeological implementations focussed on the classification, seriation, and analysis of material culture, such as artistic representations [7,8], use-wear of prehistoric tools [9], historical glass artefacts, and ancient coins [10]. In recent years, the application of ML and deep learning in archaeology has taken a decisive turn towards the detection of archaeological sites. Examples in the detection and exploration of terrestrial and marine archaeological sites come from various projects. The Archäoprognose Brandenburg project [11] adopted a combined PNN and SOM solution to develop archaeological predictive modelling for identifying the possible location of archaeological sites in Brandenburg (Germany). CNNs have been employed to detect Iron Age tombs in the Eurasian steppe in the Dzungaria Landscape project [12], to find archaeological sites and related toponyms in historical cartography [13], and to identify pottery fragments in drone imagery [14]. A random forest algorithm has been used for the detection of archaeological mounds in Cholistan (Pakistan), employing a large-scale collection of multitemporal synthetic-aperture radar and multispectral images [15], and, with aerial laser scanning (ALS, lidar) data, for the identification of megalithic funerary structures in the region of Carnac (France) [16].
Supervised learning approaches (machine and deep learning) for the automated classification of three-dimensional (3D) architectural components (columns, facades, and more) in large datasets have also been recently explored [17]. Arch-I-Scan realised a prototype system for the detection and classification of whole pottery vessels [18].
Virtual reconstruction of artefacts from fragments has been handled in different contexts, such as automatic puzzle solving. Recently, clustering techniques were designed to group fragments for re-building the original image by ordering the pieces identified. An advanced variation of puzzle-solving is the reassembling of archaeological artefacts. Some research teams proposed approaches based on 3D models using the information encapsulated in the thickness of the potsherd [19], or adopting a linear comparison of vectors and surfaces through a purpose-built algorithm (Fragmatch) [20]. The solving of archaeological puzzles using both 3D models of fragments and images has also been explored by the GRAVITATE project [21]. Reconstruction of potsherds and text has been achieved on a group of ostraka with demotic inscriptions, focusing on 2D reconstruction techniques using a specific multilayer architecture of deep neural network (DNN) called a Siamese neural network, which distinguishes similar pairs [22].
Archaeological texts often take the form of epigraphic inscriptions. Frequently, inscriptions are damaged, fragmentary, and illegible, which makes them difficult to process with NLP. Pythia [23] is an automated ancient text restoration system that recovers missing characters from damaged text using a DNN. More generally, NLP techniques have been employed in archaeology since the 1990s [24] to identify the process model from the text [25], in iconographic representation research, for numismatic [26,27] and artwork studies [28], in zooarchaeology [29], and to make grey literature more accessible [30].
AI has been applied to the study of human remains. Bewes et al. [31] developed a neural network for identifying the sex of individuals starting with 3D reconstructions of skulls based on CT scans. The transfer learning technique, based on a pre-trained GoogLeNet, coupled with backpropagation, was applied. Czibula et al. [32] compared two supervised regression models, one based on an ANN and the other on genetic algorithms (GA), to estimate stature from bone measurements. The ANN achieved a better result than the GA.
Geochemical [33] and archaeobotanical [34] research is now setting up different projects to develop automated identification procedures, which can boost traditionally arduous and time-consuming techniques.
ML techniques were used for automated petroglyph image segmentation with interactive classifier fusion [35], in reconstructing fresco segments [36], and with remote sensing in the Mogao Caves [37].
The use of robots has been explored by projects related to underwater explorations and museums. As for the first, the VENUS project [38] used AUVs/ROVs (autonomous underwater/remotely operated vehicles) coupled with data acquisition techniques (sonar and photogrammetry) for underwater exploration of shipwrecks aimed at data collection and extraction of 3D models. A similar approach has been used to map the floor of the Mediterranean Sea around the island of Malta [39]. As for the second, many cultural institutions and museums have proposed AI solutions to engage visitors, using chatbots and robots to understand questions, communicate responses, create paths in the museum to create a more in-depth understanding, and develop software to automate the organisation of exhibitions. Robovie-R ver.2 [40] is a humanoid robot that reproduces the description of artworks with movements akin to those of a human guide, using face recognition and response methods implemented with AI. Minerva software [41] uses a multiagent system developed using distributed AI for grouping artefacts according to the user's criteria and arranges them in the rooms of a museum.
The present paper gives a short overview of the ArchAIDE project (Section 2), explains the methods adopted for developing the shape-based and appearance (decoration)-based recognition of potsherds through one picture taken from a mobile device or a camera (Section 3), and discusses the results obtained and the steps followed for improving the application (Section 4). Section 5 describes the importance of data availability and mostly open access to research data for training the neural networks. It also points out how sharing the AI algorithm as open source code is essential in an open science environment. Section 6 exemplifies the ease of use of the ArchAIDE system through its mobile application. The final section, Section 7, discusses the difficulties encountered and the project's future development.

ArchAIDE Project
Within this scenario, the ArchAIDE project (2016-2019) developed two different deep neural networks (DNNs) devoted to recognising pottery through images using a mobile device. One of the networks is dedicated to image recognition (also called appearance-based recognition, for pottery decorations), the other to shape recognition (for pottery types). ArchAIDE was conceived as a response to well-defined archaeological needs. During archaeological investigations, pottery is the most common type of find, and its analysis and classification yield much information about the archaeological contexts, from chronology to function and social structures. Ceramic identification is a repetitive and time-consuming activity based on the archaeologist's expertise and is usually made by matching potsherds to exemplars in catalogues of archaeological typologies (Figure 1). The ArchAIDE project set out to optimise this identification process, developing a new system that simplifies the practice of pottery recognition in archaeology through an AI approach, without replacing the knowledge of domain specialists. On the contrary, ArchAIDE keeps archaeologists at the centre of the decision-making process within the identification workflow.
To achieve its goals, the ArchAIDE project created (i) a digital comparative collection for pottery types [42], decorations [43], and stamps [44], combining digital collections, digitised paper catalogues, and data acquired through photo campaigns; (ii) a semi-automated system for paper catalogues' digitisation [45]; (iii) a multilingual thesaurus of descriptive pottery terms, mapped to the Getty Art and Architecture Thesaurus, which includes French, German, Spanish, Catalan, Portuguese, English, and Italian [46]; (iv) two distinct neural networks for appearance-based and shape-based recognition (partially discussed here [47,48]); and (v) an app connected to the AI classifiers to support archaeologists in recognising potsherds during excavation and post-excavation analysis, with an easy-to-use interface.
The ArchAIDE system is based on a pipeline where archaeologists take a picture of a potsherd and send it to the specifically trained classifier, which returns five suggested matches from the comparative collections. Once the correct type is identified, the information is linked to the photographed sherd and stored within a database that can be shared online (Figure 2).

Materials and Methods
The set of tools developed by the project addresses two scenarios: (i) when the pottery is undecorated, the identification relies on the shape (i.e., profile's geometry) of the sherd; (ii) if decorations (i.e., colours and patterns) are present, classification is usually based on those, since they can provide a more reliable diagnostic than the shape of the sherd.
The first goal of ArchAIDE was to realise a proof of concept. The selection of pottery classes was based on the need (i) to find types that relied on shape-based and decoration-based characteristics for identification; and (ii) to realise a system that could have a real-world implementation. The decision was made to choose four classes: amphorae manufactured throughout the Roman world between the late 3rd century BCE and the early 7th century CE (Figure 3a); Roman Terra Sigillata manufactured in Italy, Spain, and South Gaul between the 1st century BCE and the 3rd century CE; Majolica produced in Montelupo Fiorentino (Italy) between the 14th and 18th centuries; and medieval and post-medieval Majolica from Barcelona and Valencia (Spain) (Figure 3b).

Shape-Based Recognition
Since the goal was to aid archaeologists in the field, we tackled the classification of a potsherd profile from a single picture. A significant challenge in building the necessary AI tools is that one cannot obtain sufficient real-world samples to train neural networks. Furthermore, given its variability, an archaeological dataset would contain only a small fraction of the possible sherds. Instead, we defined each class (i.e., a pottery type) by two-dimensional drawings of the profile of the complete vessel. Whereas the drawing describes the geometry of the entire vessel's profile, a real potsherd is a part of it (often a tiny one) which contains minimal information about the shape as a whole. Consequently, the recognition tool was designed as a two-phase process, where the classification algorithm was first developed on one dataset and then validated on other datasets for different types of pottery. Separating the datasets avoids overfitting due to multiple hypothesis testing, giving better confidence in the results. The dataset used in the first phase was composed of 435 sketches of Terra Sigillata Italica (TSI), grouped into 65 standardised top-level classes (i.e., the top-level types defined in the Conspectus catalogue [49]). From these drawings, class-balanced synthetic data (i.e., 3D models) were created, while reserving the real-world sherds' outlines to be used solely for testing. The real-world outlines were traced from potsherds photographed in archaeological warehouses throughout Europe using the dedicated ArchAIDE mobile app (see Section 6). The real-world test dataset contained 240 extracted outlines from 29 different top-level classes. Nevertheless, the classifier was trained on all 65 classes.
On the dataset side, 3D models of each pottery type were reconstructed by automatically extracting the profile of the entire vessel from the 2D drawing, rotating the profile around its revolution axis, and shattering it to derive synthetic sherds [45] (Figure 4). To circumvent the computational overhead of 3D reconstruction, we imagined circles going around the vertical axis for each point in the profile, then generated a random 3D plane, and calculated how all the circles intersect the plane, connecting the intersection points from the circles along the profile to generate the fracture face. To create a more realistic synthetic fracture, we reduced its size to match real potsherds' dimensions [48].
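The circle-and-plane construction described above can be illustrated with a short sketch. It is not the project's implementation: the function names, the way the random plane is sampled, and the representation of the profile as (radius, height) pairs are all assumptions made for illustration. A circle of radius r at height z around the vertical axis meets the plane n·p = d where r(n_x cos t + n_y sin t) = d - n_z z, which has a closed-form solution for the angle t.

```python
import math
import random

def circle_plane_intersections(radius, height, normal, offset):
    """Points where the circle (r cos t, r sin t, z) meets the plane n . p = offset.

    Returns a list with 0, 1 or 2 points (x, y, z).
    """
    nx, ny, nz = normal
    amplitude = radius * math.hypot(nx, ny)   # size of the horizontal term
    rhs = offset - nz * height                # value the horizontal term must reach
    if amplitude == 0.0 or abs(rhs) > amplitude:
        return []                             # plane misses the circle (or is horizontal)
    phase = math.atan2(ny, nx)
    delta = math.acos(rhs / amplitude)
    angles = [phase] if delta == 0.0 else [phase + delta, phase - delta]
    return [(radius * math.cos(t), radius * math.sin(t), height) for t in angles]

def fracture_face(profile, normal, offset):
    """For every (radius, height) profile point, find where its circle crosses the plane."""
    return [circle_plane_intersections(r, z, normal, offset) for r, z in profile]

def random_plane(rng, max_offset):
    """Draw a random unit normal and offset (hypothetical sampling scheme)."""
    while True:
        n = [rng.gauss(0.0, 1.0) for _ in range(3)]
        length = math.sqrt(sum(c * c for c in n))
        if length > 1e-9:
            break
    return tuple(c / length for c in n), rng.uniform(-max_offset, max_offset)

# Toy profile: three points of a vessel wall given as (radius, height).
rng = random.Random(0)
normal, offset = random_plane(rng, 1.0)
face = fracture_face([(1.0, 0.0), (1.5, 0.5), (2.0, 1.0)], normal, offset)
```

Because each circle is handled analytically, no mesh of the full vessel is needed; connecting the intersection points along the profile gives one edge of a synthetic fracture.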
The network was trained based on the distinctive characteristics of archaeological profiles, including the requirement to divide the inner and the outer profile of the sherd, the relevance of the position of the points along the profile outline, the intrinsic noise in the tracing procedure, and the requirement to overcome sub-optimal data acquisition processes [48] (Figure 5). The architecture of ArchAIDE's classifier is similar to PointNet [50]; it uses pooling to achieve a representation that is invariant to the order of the elements, following a local computation at each element. Such pooling is the only way to obtain this invariance under mild conditions [50,51]. Novel applications regarding shape classification include PointNet++ [50] and PointCNN [52]. While most of the preceding work has been directed at the identification of 3D point clouds, the ArchAIDE network encoded a 2D outline and took advantage of the information that arises from the position of the points along the outline.
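The idea of a local computation at each element followed by pooling can be shown with a minimal NumPy sketch. It is not the ArchAIDE network: the layer sizes, the random weights, and the four per-point features (x, y, position along the outline, inner/outer flag) are assumptions chosen only to demonstrate that max pooling makes the output independent of the order in which points are listed.

```python
import numpy as np

def encode_outline(points, w1, b1, w2, b2):
    """Shared per-point MLP followed by max pooling over the points.

    points: (n_points, n_features) array, one row per outline point.
    Returns a fixed-length vector regardless of n_points or row order.
    """
    hidden = np.maximum(points @ w1 + b1, 0.0)   # same weights for every point (ReLU)
    local = np.maximum(hidden @ w2 + b2, 0.0)
    return local.max(axis=0)                     # symmetric pooling across points

rng = np.random.default_rng(0)
# Hypothetical features per point: x, y, position along the outline in [0, 1],
# and a flag marking the inner (0) or outer (1) profile.
n_points, n_features = 40, 4
points = rng.normal(size=(n_points, n_features))
w1, b1 = rng.normal(size=(n_features, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

embedding = encode_outline(points, w1, b1, w2, b2)
shuffled = encode_outline(points[rng.permutation(n_points)], w1, b1, w2, b2)
```

Shuffling the rows leaves the embedding unchanged, yet information about where a point sits on the outline survives because, in this sketch, the position is carried as one of the point's own features.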
In the ranking, the OutlineNet's real-world top 2 classification rate was 1.5 times the top 1 classification rate when training the model, suggesting that the classes were easily confused. Ablation experiments (i.e., a method that assesses the performance of the NN by removing specific components, to understand their contribution to the model) showed that separation of inner and outer profiles, angle information, group-hot encoding (i.e., the conversion of categorical data in order to be processed by a NN), and adaptive sampling each add to the overall top-K performance, even when changes in the top 1 accuracy were small. Similarly, augmentation also contributed to the top-K result, without significant impact on the top 1 accuracy. Top-K processing finds a list of K results with the highest scores, assuming that all the K results are independent. In practice, some of the top-K results obtained can be very similar to each other or redundant. A plausible reason is that all these modifications to the model and training are less meaningful for samples that are carefully collected and informative, and mainly impact the accuracy of the lower-quality samples.
This fits with its function as a reference tool for pottery specialists who would be glad to evaluate a shortlist of results as part of the obligatory expert validation but would be disappointed to use a tool where the correct result is often completely omitted.
Following the first phase development on the Terra Sigillata Italica (TSI) dataset, three other datasets were added. The first was a supplementary TSI dataset that includes the profiles of an additional 96 sherds belonging to 11 classes that were not considered during the test; the other two contain Terra Sigillata Hispanica (TSH) and South Gaulish Terra Sigillata (TSSG) data. These also describe Terra Sigillata pottery, but there is no intersection in classes between TSI, TSH, and TSSG.
On the new TSI test set, using the same model from phase I (without any retraining/adaptations), the accuracy values obtained were even better than on the phase I dataset. Additionally, for the datasets containing new typologies, similar or better accuracy (measured relative to the number of classes) was obtained using precisely the same training method, without any adaptations (Figure 6).
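Top-K accuracy, the measure used throughout this section, can be computed as in the following minimal sketch. The scores and labels are made-up toy values, not project data; they are chosen so that the top 2 rate is 1.5 times the top 1 rate, the same ratio reported above.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Toy scores for four sherds over three classes (illustrative values only).
scores = [
    [0.6, 0.3, 0.1],   # true class 0: correct at rank 1
    [0.5, 0.4, 0.1],   # true class 1: correct only at rank 2
    [0.2, 0.7, 0.1],   # true class 2: missed even in the top 2
    [0.1, 0.2, 0.7],   # true class 2: correct at rank 1
]
labels = [0, 1, 2, 2]
top1 = top_k_accuracy(scores, labels, 1)   # 0.5
top2 = top_k_accuracy(scores, labels, 2)   # 0.75
```

A large gap between the two values indicates that the correct class is often ranked just below a similar competitor, which is why a shortlist is more useful to a specialist than a single answer.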

Appearance-Based Recognition
Pottery decorations can be classified based on the presence and combination of colours, the type of patterns, the areas that are decorated, and more. In this case, a transfer-learning technique was applied, as is common in domains characterised by a paucity of data. A version of the ResNet-50 network [53] pretrained on the ImageNet collection [54] was employed. Images were scaled to 224 × 224 pixels to fit the expected input dimensions of the ResNet model. To train the network to work with varying amounts of decorations/background, we added augmented versions of each image to the original dataset, scaling it to four different sizes. On each scaled image, we created three versions: unflipped, horizontally flipped, and vertically flipped. All these images were cropped, leaving just the centre square. As a result, 12 images from each original one were obtained, increasing the dataset from around 8000 images to about 100,000 images.
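The scale, flip, and centre-crop augmentation can be sketched as follows. The four target sizes, the nearest-neighbour resizing, and the function names are assumptions for illustration; the source states only that four sizes and three flip variants were used and that the centre square was kept.

```python
import numpy as np

def resize_shorter_side(image, target):
    """Nearest-neighbour resize so that the shorter side equals `target` pixels."""
    h, w = image.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = max(target, round(h * scale)), max(target, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def centre_crop(image, size):
    """Keep the central size x size square."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def augment(image, sizes=(224, 256, 320, 384), crop=224):
    """Four scales x three flips = 12 centre-cropped views (sizes are hypothetical)."""
    views = []
    for s in sizes:
        scaled = resize_shorter_side(image, s)
        # unflipped, horizontally flipped, vertically flipped
        for version in (scaled, scaled[:, ::-1], scaled[::-1, :]):
            views.append(centre_crop(version, crop))
    return views

image = np.zeros((300, 400, 3), dtype=np.uint8)   # stand-in for a sherd photo
views = augment(image)
```

Larger scales followed by a fixed-size crop show the network less background and more decoration, which matches the stated aim of varying the decoration-to-background ratio. Twelve views per image turn roughly 8000 originals into 96,000 images, consistent with the figure of about 100,000.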
In the first testbeds, the most challenging factor that affected identification was varying illumination. To improve robustness, we simulated different white balance, brightness, and contrast adjustments. The luminosity ("brightness") of all the pixels within each image was multiplied by a randomised factor to simulate different lighting conditions. An analogous random multiplicative factor was applied to each channel in the image to simulate different white balance settings; every red/green/blue channel was multiplied by a different random constant factor, to change the ratio between the colours. Moreover, the imaging conditions (i.e., background and ruler) varied significantly, leading to an inherent bias. To avoid this bias, the foreground was extracted automatically from the training images using the GrabCut algorithm [55] (Figure 7).
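The brightness and white-balance simulation can be sketched in a few lines of NumPy. The factor ranges below are assumptions, since the source does not report the values used, and the contrast adjustment and GrabCut foreground extraction are left out of this sketch.

```python
import numpy as np

def jitter_illumination(image, rng, brightness=(0.7, 1.3), channel=(0.9, 1.1)):
    """Random global brightness factor plus an independent factor per RGB channel.

    The ranges are hypothetical defaults, not values taken from the project.
    """
    pixels = image.astype(np.float32)
    pixels *= rng.uniform(*brightness)        # one factor for the whole image
    pixels *= rng.uniform(*channel, size=3)   # one factor per channel, broadcast over pixels
    return np.clip(pixels, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
photo = np.full((4, 4, 3), 100, dtype=np.uint8)   # stand-in for a sherd photo
jittered = jitter_illumination(photo, rng)
```

Drawing new factors each time an image is loaded exposes the classifier to a different simulated lighting setup on every pass over the data.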

Results
The development of the two neural networks was extremely challenging. In particular, we faced: (i) the paucity of real-world data to train the networks; (ii) the partiality of the potsherd in comparison to the whole object and its high variability due to a random breakage process; (iii) the non-informativeness of a large portion of the sherds, among both the synthetic and real-world data; (iv) the similarity between types which can cause ambiguity in the classification; and (v) the noisiness of the acquisition process due to the procedure for extracting and scaling the profile from potsherd images (shape), and the variability in illumination and background (decorations).
Most neural network loss functions would be prone to sacrificing challenging classes to improve the average accuracy across all classes. Nonetheless, a reference tool is more valuable when it achieves less obvious identifications; i.e., when it can also recognise less common types. To tackle the heterogeneous and unbalanced nature of the data, the algorithm was trained with a novel weighting technique that considers both the error of each ground truth class and the false positives in each class. Ground truth means checking the neural network results for accuracy against the real world. This reweighting scheme addressed both the difficulty of correctly classifying a sample from a given class and the frequency with which a sample is currently assigned to that class. The achieved results show quite good recognition accuracy in the face of these challenges.
The full development of the algorithm was implemented through a two-phase process. In the first phase, the method was applied to one dataset of potsherds of one specific family; in the second, the same method, pipeline, and parameters were applied to three additional datasets. With the Phase-I dataset (composed of 65 classes), the identification of almost 74% of the sherds within the top 10 results was achieved. Adding the three new datasets (the supplementary Terra Sigillata Italica dataset, Terra Sigillata Hispanica (TSH), and South Gaulish Terra Sigillata (TSSG)), without any change to the pipeline, we reached 81%, 68%, and 60% top 10 accuracy for 65, 98, and 94 classes, respectively. The ranking is essential in information retrieval because it establishes the relative order between the classes, ranking classes with a high degree of relevancy higher than those with a low degree of relevancy. Hence, the ArchAIDE system works as a reliable reference tool to be used in the field, narrowing the list of relevant types to be considered for each potsherd.
The evaluation of the shape-based identification was done both on the captured real-world data (used in the testing phase), and in an end-to-end fashion, with users capturing new photos, annotating them, and using the classification algorithm.
The end-to-end evaluation was done using 381 different pictures of sherds of TSI, taken from 42 different types (out of 65 types). Most images were taken with a smartphone or a tablet (as would be the case in the field), with only 25 pictures using a regular camera. The average mobile-app top 5 accuracy was 50.8% and the top 1 accuracy was 18.9%. This is slightly lower than the 22.0% top 1 accuracy and 57.9% top 5 accuracy reported in our evaluation, but these results are still useful for archaeologists.
The results on the testing data are reported in Table 1. The assessment of the decoration recognition was carried out using both the mobile and desktop applications. The results for the classification are reported in Table 2. The evaluation was performed on 49 different genres (out of 84) using more than 820 images taken both on mobile devices (700 by phones and tablets) and with a camera (120). Results show that the accuracy, in both applications, was not affected by the lighting, giving similar results with artificial and natural light.

Open ArchAIDE
As previously discussed, one of the most complex aspects of the practical application of AI is not the development of the algorithms themselves, but the creation of the dataset used to train them. Archaeology is widely digitised, but rarely datafied [56]. Unfortunately, datafication is essential because AI algorithms need data, preferably Big Data, that is also FAIR (findable, accessible, interoperable, and reusable).
The ArchAIDE neural networks also rely on a vast amount of data from digital collections, paper catalogues (necessary for the creation of the digital comparative collections included in the reference database), and photography campaigns (for the creation of training datasets). The project has used two main digital collections: the "Roman Amphorae: a digital resource" [57], created by Simon Keay and David Williams of the University of Southampton and published as open data on the Archaeology Data Service, which includes the principal types of Roman amphorae between the late 3rd century BCE and the early 7th century CE; and the "CERAMALEX" database [58], a proprietary database of the German and French excavations in Alexandria, Schedia, and Marea, available thanks to a partnership with the University of Cologne. In addition to these two collections, printed catalogues in the form of books and papers have been digitised to populate the ArchAIDE database.
To manage correctly the material that falls under copyright or database protection, the EU directives on Copyright (2001/29/EC) and Database protection (96/9/EC) were analysed [59]. The scientific research exception permitted the implementation of the project, to the extent justified by a non-commercial purpose and provided that the source and the authors' names are mentioned. For training the algorithm, multiple photo campaigns were also carried out in several archaeological warehouses. The aim was to obtain a dataset of images for all the chosen ceramic classes. Since it was not possible to collect all the data in one warehouse, this task required a significant effort, involving more than 30 different institutions in Austria, Italy, and Spain. Other images were contributed by project associates, who sent pictures of their assemblages from many countries. Detailed guidelines were prepared to help the consortium partners and project associates take images of sherd profiles suitable for training the neural network. The whole procedure, which included finding, classifying, photographing, and creating digital storage, was very time-consuming, as images of at least ten different potsherds for every ceramic type were needed to provide enough training information for the algorithm. It became apparent that not every top-level type and sub-type could be represented. In some instances, the rarity of certain types and the significant number of unclassified sherds inside the warehouses made it impossible to reach the amount needed. Overall, 3498 sherds were photographed for training the shape-based recognition model. For appearance-based recognition, a dataset of 13,676 pictures was collected through multiple photography campaigns.
Participating in the H2020 open data pilot, ArchAIDE was committed to creating sustainable outputs where the project held the copyright. Unfortunately, not all the collected data could be disseminated as open data. The research exceptions allowed by the EU Directives [59] do not mean that the ArchAIDE project automatically holds the copyright to the newly digitised or remixed data. Negotiation with copyright holders will be necessary to make these data available outside the project. ArchAIDE is able to demonstrate that paper catalogues, once digitised, can be actively reused, even many years after their first publication. This opens the possibility of reaching an agreement with publishers and other data providers for making their resources available in new ways, "with a tangible benefit (seeing their data in use within the app), thus furthering the long-term discourse around making research data open and accessible" [60]. By contrast, data owned by the project, i.e., multilingual vocabularies, videos created by the project, and the 2D and 3D models created from the ADS Roman Amphorae digital resource [57], were made available for download [46] (Figure 8). The ArchAIDE archive contains 2D vector drawings in SVG format and interactive 3D models navigable through a 3DHOP 3D viewer [61], which can also be downloaded for 3D printing (Figure 9). These models exemplify an excellent standard of best-practice reuse. When the Roman Amphorae digital resource was deposited in 2005, creating automated 2D and 3D models for training a neural network could not have been an envisioned use. As 2D and 3D models were produced for each type included in the digital resource, it was possible to link the two archives, amplifying their mutual usefulness.
It was also hoped that the thousands of photos taken by the project for training the algorithms might result in new comparative collections that could be deposited as open research data in the ArchAIDE archive. However, in many European countries, copyright on cultural heritage is very restrictive and did not allow us to make available the images of potsherds taken by ArchAIDE partners in national and regional collections. Showing the usefulness of these data within the ArchAIDE application might help convince national cultural heritage institutions to move towards more open data policies. Finally, the source code and neural network models are publicly available as open source in a GitHub repository [62] to allow re-use and future development by other researchers. Although all the data collected by users are, by definition, private and are not published, and all system components are designed to comply with this privacy statement, the system offers the option to publish the data as open data. In support of the open data philosophy and the EU open data pilot, ArchAIDE suggests that users share their data with the community, leaving each user free to choose whether to do so.

The App
Mobile and desktop applications were developed to make ArchAIDE fully operational. Their functionality was designed around the workflow of pottery analysis, from discovery in the field to post-excavation examination, considering the environmental context in which these activities are performed (warehouses, remote places, etc.) and the related constraints. Continuous feedback from the archaeological companies involved in the consortium and from external associates who collaborated with the project [63] made it possible to collect suggestions on automating this workflow, improve the design, and generate new prototypes. In the end, various needs were taken into consideration, from use as a recognition tool to collecting and storing data in the form of digital assemblages. The design prioritised intuitive access and ease of use (Figure 10). The final result is a digital ecosystem in which mobile and server-side applications interact through an API server mediating all communications and activities.
The ArchAIDE Desktop Web Server and the ArchAIDE mobile application provide search and retrieval tools to access the reference database and the classification functionalities. Liferay 7.1, an open-source portal server technology widely used to build medium and large web portals, was chosen to satisfy this requirement. The reference database and the desktop website implement a single sign-on infrastructure based on CAS (Central Authentication Service) to share the same user archive between the app, the reference database, and the desktop website. The Shape Recognition and Decoration Recognition Model servers implement pottery type prediction as a single service. In the first case, the input is an SVG file representing the outer and inner profile of a sherd fracture. In the second, the input is an image of the sherd surface. The result, returned as a JSON array, is a ranked list of ceramic type (or decoration) identifiers, each paired with a relevance score. The ArchAIDE mobile application also gives access to the "my sites" area, reserved for registered users, where information about sites and assemblages can be stored. The mobile application was designed to work without internet connectivity, as in storehouses or remote rural areas. In these environments, the app permits storing new images of potsherds and browsing the reference database. When offline, the app records the information locally and later saves it to the server once a connection is available (Figure 11).
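As an illustration of how a client might consume the JSON array described above, the following is a minimal sketch rather than the project's actual API. The field names `type_id` and `score`, the function `parse_predictions`, and the sample payload are assumptions made for this example; only the overall shape (a ranked list of identifiers paired with scores, with five results shown to the user) comes from the text.

```python
import json
from typing import List, NamedTuple


class Prediction(NamedTuple):
    type_id: str   # hypothetical field: ceramic type or decoration identifier
    score: float   # hypothetical field: relevance score returned by the model server


def parse_predictions(payload: str, top_k: int = 5) -> List[Prediction]:
    """Parse a JSON array of {type_id, score} objects and return the top_k by score."""
    items = json.loads(payload)
    predictions = [Prediction(str(item["type_id"]), float(item["score"])) for item in items]
    # Sort defensively in case the server response is not already ranked.
    predictions.sort(key=lambda p: p.score, reverse=True)
    return predictions[:top_k]


# Invented payload, used only to exercise the parser.
example_payload = json.dumps([
    {"type_id": "type-A", "score": 0.12},
    {"type_id": "type-B", "score": 0.61},
    {"type_id": "type-C", "score": 0.27},
])
top = parse_predictions(example_payload, top_k=2)
```

Returning a small, typed list keeps the user-facing step simple: the app can display the candidates and let the archaeologist confirm the correct one against the reference database.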
To sum up, access to the reference database and the automatic classification tools is available to all users without registration. Registration is mandatory for storing and managing information about sites, assemblages, and sherds (e.g., classification information obtained from the classifier, or the provenance of a sherd that belongs to an assemblage from a site), which is stored in the local memory of the device and on the ArchAIDE server. The ArchAIDE App is free and available for Android and iOS on the Google Play Store and the Apple App Store, respectively.
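The store-locally-then-synchronise behaviour described for the mobile app can be sketched as a simple pending queue. This is an assumed design for illustration only, not the app's real implementation: the class `OfflineQueue`, its methods, and the `upload` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OfflineQueue:
    upload: Callable[[Dict], bool]  # hypothetical uploader; returns True on success
    pending: List[Dict] = field(default_factory=list)

    def save(self, record: Dict, online: bool) -> None:
        """Always store the record locally first; flush only when online."""
        self.pending.append(record)
        if online:
            self.flush()

    def flush(self) -> int:
        """Upload pending records in order, stopping at the first failure."""
        sent = 0
        while self.pending:
            if not self.upload(self.pending[0]):
                break
            self.pending.pop(0)
            sent += 1
        return sent


# Stand-in for the server, used only to exercise the queue.
server: List[Dict] = []


def fake_upload(record: Dict) -> bool:
    server.append(record)
    return True


queue = OfflineQueue(upload=fake_upload)
queue.save({"sherd": 1}, online=False)  # recorded in the storehouse, no connection
queue.save({"sherd": 2}, online=False)
uploaded = queue.flush()                # connection restored
```

Removing a record from the local queue only after a successful upload means that an interrupted connection leaves unsent records in place for the next attempt.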

Discussion
ArchAIDE has shown the ability of artificial intelligence to identify archaeological pottery, but it has also highlighted some of the challenges that AI applications in archaeology must deal with. The first relates to the amount of data necessary for training neural networks. Despite popular perception, one of the most complex aspects of the practical application of AI is not the development of the algorithm itself, but the creation of the dataset used to train it. Archaeology is widely digitised, but rarely datafied [64], and data availability represents a critical aspect of AI applications. AI algorithms need data, preferably Big Data, that are also FAIR (findable, accessible, interoperable, and reusable), as well as consolidated, persistent digital infrastructures. However, this is not enough, because vast amounts of data are often unavailable in archaeology, and data are frequently unusable due to copyright or legislation. Collections accessible in digital format, both for open re-use and as comparative data for AI applications, such as the open databases of the Samian Research of the Roman-Germanic Central Museum [65], the Roman Open Data [66], or the already mentioned Roman Amphorae: a digital resource [57], are extremely rare. Furthermore, producing the necessary training and comparative data is time-consuming and demanding, and until this can be addressed, the ability of archaeology to use AI to answer research questions will be uneven and will produce low-quality results. In the case of ArchAIDE, this resulted in a massive effort to digitise the paper catalogues and collect primary data through time-consuming photo campaigns. These allowed us to gather around 17,000 pictures in total, with a minimum threshold of 10 real-world potsherd images for each type for the shape-based algorithm and 100 real-world potsherd images for each decoration genre for the appearance-based algorithm.
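The minimum-image thresholds mentioned above amount to a simple filter over per-class counts. The sketch below is illustrative only: the two threshold values come from the text, while the function `trainable_classes` and the sample labels are invented for the example.

```python
from collections import Counter
from typing import Iterable, List

MIN_IMAGES_SHAPE = 10        # per ceramic type, as stated in the text
MIN_IMAGES_APPEARANCE = 100  # per decoration genre, as stated in the text


def trainable_classes(labels: Iterable[str], minimum: int) -> List[str]:
    """Return the classes with at least `minimum` labelled images, sorted by name."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Invented labels: one entry per photographed sherd.
labels = ["type-A"] * 12 + ["type-B"] * 9 + ["type-C"] * 10
kept = trainable_classes(labels, MIN_IMAGES_SHAPE)
```

In this toy dataset "type-B" falls one image short and would be left out of training, which mirrors the situation of rare types that could not reach the required number.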
The second challenge was that it was not reasonable to design an image recognition system that could identify pottery using decoration-based and shape-based characteristics simultaneously. It became evident that two different algorithms had to be developed. If needed, ceramic classes for which both shape data and appearance data are available can be recognised using the two classifiers together to obtain more detailed results. Moreover, the project represents a proof of concept, and new experiments could be conducted with other ceramic classes.
The third challenge was that archaeological classification is not based on shape or decoration alone. Archaeologists, and especially pottery specialists as domain experts, use other considerations, such as location, the composition of the assemblage, fabric, and more, to filter out some classes. At present, these elements are not captured in the ArchAIDE scheme. Fabric cannot be recognised from a picture taken with a mobile device, given current technological and methodological limitations. Nevertheless, fabric and other elements can be used to filter the class ranking predicted by ArchAIDE. Consequently, we can assume that the gap between ArchAIDE and human archaeologists in distinguishing ceramic types based on their shape or decoration is probably much smaller than the achieved error rates suggest. Moreover, the error rates are probably inflated by problems with the correct labelling of potsherds. The labels were gathered from catalogues and established collections, even though, in some cases, mistakes about the exact provenance of the assignment or the ground truth classification are likely to be present.
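Filtering the predicted ranking with expert knowledge, as suggested above, could look like the following sketch. It is not part of the ArchAIDE system: the function `filter_ranking`, the identifiers, and the set of plausible classes are hypothetical, standing in for whatever a specialist knows about fabric, location, or assemblage composition.

```python
from typing import List, Set, Tuple

Ranking = List[Tuple[str, float]]  # (type identifier, score), best first


def filter_ranking(ranking: Ranking, plausible: Set[str]) -> Ranking:
    """Keep only the classes an expert considers plausible, preserving rank order."""
    return [(type_id, score) for type_id, score in ranking if type_id in plausible]


# Invented classifier output and expert constraint.
ranking = [("type-B", 0.61), ("type-C", 0.27), ("type-A", 0.12)]
plausible_for_site = {"type-A", "type-C"}  # hypothetical expert knowledge
filtered = filter_ranking(ranking, plausible_for_site)
```

Because the filter only removes candidates and never reorders them, the classifier's relative preferences among the remaining classes are kept intact.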
ArchAIDE developed a novel data generation technique, a new shape representation scheme, and an original reweighting method to deal with a large set of compounding challenges and a real-world cross-modality matching problem. Thanks to these innovations, ArchAIDE provides a working real-world application and a case study of deep learning applied to real-world data where the "sim2real domain shift" is broad and most conventional assumptions do not hold.
Finally, ArchAIDE has demonstrated that it may be used for a variety of pottery types (and potentially other artefact types as well) if the necessary comparative data can be gathered. To keep the system fully operational and useful to archaeologists, new catalogues must be added to the reference database, along with training datasets that increase the number of recognisable ceramic classes. Since the end of the project (May 2019), the MAPPALab, a research unit of the University of Pisa, has pursued this goal (Figure 12). In this period, decorations and types of Maiolica Arcaica (a medieval tin-glazed ware) produced in Pisa were added. At present, all these data are available to users as a comparative collection. In the coming months, the data collected will be used to train two specific neural networks and potentially test their performance. Bronze Age pottery from central Italy and Roman Common ware are now being implemented by a team of researchers from the Museo delle Civiltà in Rome, the University of Cassino, the Deutsches Archäologisches Institut in Rome, and the Italian General Directorate for Education, Research and Cultural Institutes of the Ministry for Cultural Heritage and Activities and Tourism. These collections will be available to users in the coming months. Work on Bronze Age pottery represents both an opportunity and a challenge for ArchAIDE. The recognition algorithms were developed with standardised pottery productions such as Terra Sigillata, which benefit from a long tradition of classification and analysis [49]. Working with Bronze Age pottery means stress-testing the algorithms to demonstrate that they can also work well with the less standardised pottery productions common in periods other than the Roman period and in non-Mediterranean archaeology. This could lead to an overall improvement, broader collaboration, and wider implementation of the ArchAIDE system.

Figure 1. Archaeologists must spend much time classifying thousands of pottery sherds. ArchAIDE meets archaeologists' needs by creating a portable, user-friendly tool for mobile devices that can be used everywhere, speeding up the classification phase both in the field and during work in the warehouses.

Heritage 2021, 4 FOR PEER REVIEW
Figure 2. The double workflow for appearance-based and shape-based recognition from an input image to top 5 results.


Figure 3. The Roman amphorae (a) and Majolica of Montelupo Fiorentino (b) are two of the main test classes used to train the system, chosen because their distinctive characteristics stress the shape-based and appearance-based recognition algorithms, respectively. Thanks to the collaboration of different institutions, museums, research groups, and colleagues worldwide, it was possible to collect photos of thousands of sherds. The sherds shown in this figure include finds from the Roman site of Spoletino (Viterbo, Italy) and fragments stored in the warehouse of the Museum of Ceramic in Montelupo Fiorentino.

Figure 4. The figure shows the steps from the extraction of inner and outer profiles from 2D drawings to the creation of 3D models ready to be randomly broken to obtain synthetic sherds to train the algorithms [45].

Figure 5. The automated extraction of the outer (green) and inner (red) profiles from a real-world sherd image.

Figure 7. The appearance-based algorithm's continuous improvement from its first release (March 2018) to the final version (February 2019).

Figure 8. The ArchAIDE portal is available at the Archaeology Data Service of the University of York. Multilingual pottery vocabularies, 2D and 3D pottery models, and all the videos produced by ArchAIDE can be freely downloaded.

Figure 10. The figure shows the shape recognition tool at work inside the ArchAIDE app. The app has been designed with a user-friendly interface. For both shape-based and appearance-based recognition, the system offers five results to the user at the end of the recognition process. Each item is linked to the reference database so that the accuracy of the match can be verified. After checking, the user can flag and save the correct one.

Table 2. Comparison between mobile and desktop performances.