An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage

Rinaldi, Antonio M.; Russo, Cristiano; Tommasino, Cristian

doi:10.3390/computers11120172

Open AccessArticle

An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage

by

Antonio M. Rinaldi

^*,†

,

Cristiano Russo

^†

and

Cristian Tommasino

^†

Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Via Claudio, 21, 80125 Napoli, Italy

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Computers 2022, 11(12), 172; https://doi.org/10.3390/computers11120172

Submission received: 4 November 2022 / Revised: 23 November 2022 / Accepted: 27 November 2022 / Published: 30 November 2022

(This article belongs to the Special Issue Computational Science and Its Applications 2022)

Download

Browse Figures

Versions Notes

Abstract

:

In the last few years, the spreading of new technologies, such as augmented reality (AR), has been changing our way of life. Notably, AR technologies have different applications in the cultural heritage realm, improving available information for a user while visiting museums, art exhibits, or generally a city. Moreover, the spread of new and more powerful mobile devices jointly with virtual reality (VR) visors contributes to the spread of AR in cultural heritage. This work presents an augmented reality mobile system based on content-based image analysis techniques and linked open data to improve user knowledge about cultural heritage. In particular, we explore the uses of traditional feature extraction methods and a new way to extract them employing deep learning techniques. Furthermore, we conduct a rigorous experimental analysis to recognize the best method to extract accurate multimedia features for cultural heritage analysis. Eventually, experiments show that our approach achieves good results with respect to different standard measures.

Keywords:

augmented reality; deep learning; linked open data; knowledge graph

1. Introduction

Cultural heritage (CH) often fails to be a successful attraction because it cannot fully capture people’s interests. Due to legal or environmental regulations, the enjoyment of an archaeological site cannot be improved, and it is often hard to recognize relevant details in a cultural landscape or understand their meanings. In recent years, many technologies have allowed excellent results in the cultural heritage (CH) domain. On the one hand, the diffusion of new deep learning models and applications with state-of-the-art performance in many fields have found applications in this domain. Moreover, the distribution of mobile devices, thanks to the decreasing price, the high speed of new network infrastructures, and new devices, such as virtual reality viewers, allow the design and development of new applications for cultural heritage that improve the visiting experiences of cultural sites. These new technologies provide information to users quickly and intuitively. In this context, information has a strategic value in understanding the world around a user. Other new digital devices, joined with new technologies, provide the ability to interact with places or objects in real time [1,2]. Furthermore, cultural heritage goes toward decay, so future generations may not be able to access many artistic works or places. Digital cultural heritage is a set of methodologies and tools that use digital technologies for understanding and preserving cultural, or natural heritage [3]. The digitization of cultural heritage allows the fruition of artwork, from literature to paintings, for current and future generations.

Augmented reality (AR) represents a technique that has changed how art is enjoyable. The advance of information technologies has made it possible to define new ways to describe and integrate natural and digital information [4]. Undoubtedly, virtual reality allows users to be completely immersed in a computer-generated environment, hiding the real world when the device is in use. Augmented reality will enable us to superimpose information around the user without blinding them in their physical environment. Mixed reality overlays information on the natural world. It includes the ability to understand and use the environment around the user to show or hide some of the digital content. Additionally, there are new classifications of the reality–virtuality continuum, such as extended reality, in which the natural and virtual world objects are presented within a single display. Thus, comprehensive reality includes all the previously mentioned categories [5].

Most applications that extensively use multimedia objects, such as digital libraries, sensor networks, bioinformatics, and e-business applications, require effective and efficient data management systems. Due to their complex and heterogeneous nature, managing, storing, and retrieving multimedia information are more demanding than traditional data management, which can be easily stored in commercial (primarily relational) database management systems [6].

A solution to the problem of retrieving multimedia objects is to associate the objects with a specific description [7]. This description allows us to make a retrieval by similarity. How this description occurs depends on the type of object. There are two possible ways to describe an image: through metadata or visual descriptors [6]. Content-based image retrieval (CBIR) systems are a solution to use a visual descriptor. One of its primary purposes is to limit textual descriptions, using the image content to compare images in the retrieval process. Moreover, novel data structures, such as knowledge graphs [8], could combine different data by improving their informative layers [9]. In the realm of the semantic web [10], linked data is a way of publishing structured data that allows data to be linked together. The publication of linked data is based on open web technologies and standards, such as HTTP (hypertext transfer protocol), RDF (resource description framework), and URI (unified resource identifier). The purpose of this data structuring is to allow computers to read and interpret the information on the web directly. The links also make extracting data from various sources through semantic queries. When linked data link publicly accessible data, it is referred to as linked open data (LOD) [11].

In this article, we propose a system for camera picture augmentation on a mobile device that allows users to retrieve real-time information. Our framework is based on CBIR techniques, linked open data, the knowledge graph model, and deep learning tools. The augmentation task superimposes helpful information on the mobile screen. Further, a user is an active actor in our workflow because it can interact with the application to give feedback to improve future users’ experience. Furthermore, we employ deep-learning techniques to extract features from images. In particular, we use pre-trained convolutional neural networks (CNNs) as feature extractors. Moreover, we exclude classification layers from CNNs and applied max or average global pooling to have a one-dimensional feature to implement a vector-based similarity search.

We organize the rest of the paper as follows: in Section 2, we provide a review of the literature related to CBIR and augmented reality for cultural heritage; Section 3 introduces the used approach along with the general architecture of the system; in Section 4, a use case of the proposed system is described and discussed; in Section 5, we present and discuss the experimental strategy and results; eventually, Section 6 is devoted to conclusions and future research.

2. Related Works

Many scholars have extensively studied the idea of applying technology to cultural heritage. According to the UNESCO convention of 1970, “communication, audiovisual techniques, automatic data processing, and other applicable technology, as well as cultural and recreational trends should advance faster than presenting work” [12]. In this regard, augmented reality (AR) is one of the most cutting-edge technologies for the heritage area since it enables the creation of user-focused mobile applications [13]. Researchers argue that, in addition to being entertaining, visitor interaction with AR systems results in a richer CH experience [14], demonstrating experience co-creation in which the user and the provider work together to create a worthwhile result [15,16,17]. Furthermore, as mentioned in [18], AR can enhance cultural heritage sites. Visitors can explore strange environments excitedly and enjoyably [19]. Since many tourists are searching for distinctive and memorable on-trip experiences, this is one of the most significant benefits from the standpoint of supply.

Over the years, one of the technologies that has spread most in the artistic domain is augmented reality. Further, many works studied it from different points of view. In the framework of cultural heritage, the authors in [20] recognized augmented reality’s economic, experiential, social, epistemological, cultural, historical, and educational values.

Additionally, CBIR systems have a wide range of uses in cultural heritage. For example, in [21], the authors used it for 3D reconstruction and e-documentation of monumental assets.

Since most people have smartphones, providing them with this experience is incredibly inexpensive. We highlight [22], where the authors suggest a mobile app based on augmented reality to give users information on natural monuments. To further illustrate the information retrieval process, the authors of [23,24] created a CBIR system that captures images from a camera. They made use of these data to enhance the user’s displayed image. The authors of [25] introduced apps that take advantage of speeded-up robust features (SURF) and virtual reality to view data and 3D representations of cultural heritage monuments. The authors demonstrated the suitability of the features obtained using a scale-invariant feature transform (SIFT) detector for an augmented reality application in [26]. In their paper, Rodriguez et al. developed an ORB-based mobile image recognition-based augmented reality (MIRAR) framework architecture [27]. A similar ORB, a quicker approach for extracting and comparing features, is used in other papers by authors, such as [28]. The authors in [29] discussed the findings of experiments carried out during certain museum visits, together with an application that uses augmented reality and retrieval. These tests examine how such a tool can alter visitors’ educational experiences. The authors in [30] provided a method for developing mobile augmented reality (AR) applications to improve the experience of visiting cultural places by offering details and interpretations about the site’s exciting objects. The study described how different technology advancements, including mobile wireless technologies, augmented reality (AR), multimedia, and game technologies, have been combined in a cultural context to make relevant information fun and exciting.

Haugstvedt et al. [31] applied a technology acceptance model for hedonic systems to examine the determinants of intention to use an augmented reality application with historical information and pictures. The results reported in their work show that perceived enjoyment and usefulness are essential determinants of the intention to use augmented reality applications with historical information and pictures.

A case study from Italy is the “Ducale” research project, which suggests ICT solutions for the Palazzo Ducale museum in Urbino. It offers a mobile application that employs augmented reality and content-based picture retrieval to improve the experience of visiting museums. In this project, the well-known picture “Ideal City” is an example to illustrate a particular method for using digital media to preserve cultural property [32].

The discussed literature inspires our approach, but we address two weaknesses that they have. Firstly, we improved accuracy using the most recent and novel techniques based on deep neural network architectures, and lastly, we used a multimedia knowledge graph that is continuously populated, thanks to the users’ feedback. In particular, in our work, we proposed a novel method, where several descriptors are evaluated and used to achieve the best results in a real scene utilizing a mobile CBIR framework. We remove the classification layers from pre-trained CNNs, extracting deep features, which were reduced from three dimensions to one dimension, using a global pooling operation. Additionally, using augmented reality, linked open data are combined with the acquired photos to retrieve textual data and provide more information to users.

3. The Proposed System

In this section, we describe our approach. We improved a framework presented in our previous work [33], where we proposed an augmented realty process for cultural heritage that uses traditional feature extraction methods. The new novel framework contains some blocks that work offline and others that work online. The offline processes concern the multimedia knowledge graph (MKG) population, while the online ones implement the augmentation task. We populated the multimedia knowledge graph using a focused crawler for the cultural heritage domain [34]. The augmentation task works in real time, beginning with taking a picture of the mobile device camera and, through MKG and LOD, enriching it with the recognized textual details. In addition, we also considered the case where our MKG does not have the required management information with user feedback. With this strategy, the users contribute to adding new knowledge to our system, improving the experience of future ones. As shown in Figure 1, the main blocks are as follows:

Image Loader: It loads images and preprocesses them as required from the feature extractor block with regards to the feature extraction techniques.
Feature Extractor: It extracts features, so it takes in input of a processed image and outputs a feature vector.
Feature Comparator: It compares the feature vector computed by the feature extractor and the feature stored in the multimedia knowledge graph. It puts on output the information used to augment the image.
Augmenter Image: It applies augmentation using the information obtained from the feature comparator.
Result Checker: It collects the user feedback and updates the results using LOD and MKG.
Focused Crawler: It works offline, populating the multimedia knowledge graph and then updating and improving its content.

Furthermore, an essential component of our system is the multimedia knowledge graph [35]. It is based on cultural heritage ontology enriched with multimedia contents retrieved using a focused crawler and information extracted from linked open data. This information is employed in the augmentation task. In this work, we used deep features and ORB as local descriptors. As described above, for deep features, we use pre-trained CNNs. In particular, we chose VGG-net, residual network, inception, and MobileNet.

Concerning the pooling operation, in this work, we explored global average pooling and global max pooling. We consider a tensor of H × W × C, H is height, W is width, and C is the number of channels. The global average pooling computes the average for each matrix H × W, so the output is an array of C elements. Instead, the max average pooling calculates the max of each matrix H × W, obtaining a vector of C elements.

We briefly introduce the used CNN architectures and Oriented FAST and Rotated BRIEF (ORB) in the following.

VGG16

The Visual Geometric Group of Oxford University proposed that VGG-Net [36] is a CNN architecture that achieved good results on the Large-Scale Visual Challenge (ILSVRC). Due to its easy architecture, it is one of the most used CNNs. The architecture involves convolutional layers with receptive filed (3 × 3), ReLU activation, and 2 × 2 max pooling layers after convolution. There are two versions, one with sixteen layers and one with nineteen. In this work, we used VGG16.

Residual Network

Residual network (ResNet) [37] is a CNN architecture designed to mitigate the vanishing gradient effect in deep networks. The main innovation proposed in this architecture is the residual block, which adds shortcut connections. These kinds of connections skip one or more layers performing the identity mapping, and their output is added to the output of the stacked layers. The authors, over the years, proposed the architecture of ResNet with a different number of layers. In this study, we used a typical configuration with fifty layers.

Inception

The authors introduced in [38] inception v1 and inception v2 architecture. Afterward, they proposed inception v3 [39], an improvement of v2 introducing the factorization concept. The main idea is factorizing convolution to reduce the number of connections and parameters without decreasing the network efficiency. The CNN consists of four modules implementing small factorization convolutions, factorization into asymmetric convolutions, and efficient grid size reduction.

MobileNet

MobileNet [40] is a CNN architecture inspired by InceptionNet with the same optimization to work on mobile devices. The main contribution of this architecture is the introduction of depthwise separable convolution, which, working in two steps, firstly applies a single convolution filter for each input channel and then create a linear combination of the output using a pointwise convolution. In this work, we used MobileNetV2 [41], which is an improvement of the first version.

ORB

ORB is a modified BRIEF descriptor based on a FAST keypoint detector. It initially employed FAST to identify the essential points. The top N spots are then determined by using a Harris corner measure. Because FAST is a rotation variation and does not compute orientation, it determines the intensity-weighted centroid of the patch with a corner situated in the middle. The vector’s direction determines the orientation from this corner point to the centroid. In order to improve the rotation invariance, moments are computed. In addition, ORB generates a rotation matrix from the patch’s orientation and then directs the BRIEF descriptors to point in that direction.

Augment Image Process

In Figure 2, we summarize the process of our system. The foremost step performed are image acquisition, feature extraction, image augmentation, visualization of augmented images, and results checking. The user interacts with the application in two cases, firstly to take a picture and lastly in case of wrong or missed matching. The process starts with a user that takes pictures on a mobile device using a mobile application. Then it sends the image to the server that performs prepossessing operations and extracts the features. Afterward, the server computes the similarity between the picture and the features contained in the multimedia knowledge graph. Then it sorts the results by similarity, performs augmentation superimposing textual information on the image, and sends information back to the mobile device. The user can suggest a resource if the server cannot find any machining. Instead, if the augmentation is wrong, the user can send feedback that the server uses to improve the experience of future users.

In particular, to compare input pictures with images in the multimedia knowledge graph, we used cosine similarity defined as in Equation (1):

S_{c} (A, B) = cos θ = \frac{A \cdot B}{∥ A ∥ ∥ B ∥} = \frac{\sum_{i = 1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i = i}^{n} A_{i}^{2}} \sqrt{\sum_{i = i}^{n} B_{i}^{2}}}

(1)

We used Solr, a retrieval system that indexes the feature and the relations with the linked open data, to create our server application. Based on Apache Lucene, Apache Solr is an open-source search engine that offers indexing and searching tools. The targeted crawler fills our multimedia knowledge graph with linked open data. LOD, which depends on industry-standard web technologies including HTTP (hypertext transfer protocol), RDF (resource description framework) [42], and URI, is the publisher’s method of structured data that enables linking the data among them (uniform resource identifier). To obtain data from LOD, we used SPARQL [43]. A knowledge base called DBpedia was created using multiple organized bits of data from Wikipedia. For instance, according to the most recent statistics, the full DBpedia data set contains 38 million labels and abstracts in 125 different languages, 25.2 million links to images, 29.8 million links to other websites, 80.9 million links to Wikipedia, and 41.2 million links to YAGO categories. One of the most important online resources in this work was LOD DBpedia [44].

4. Use Case

In this section, we show a use case of our application. In particular, we describe two use cases: when the augmentation task receives positive feedback from the user (case 1) and negative feedback, but the user finds the correct matching in the first five results (case 2).

4.1. Use Case 1

In Case 1, we used David of Michelangelo Buonarroti as the picture we want to augment. The user takes a photo by mobile application (Figure 3a). It sends a shot to the server that computes a similarity with images contained in Multimedia Knowledge Graph and uses the information related to the more similar to augment the picture (Figure 3b). Then, it sends back information to the mobile application, and the user visualizes the correct information.

4.2. Use Case 2

In Case 2, we used the photo that depicts Apollo and Daphne of Gian Lorenzo Bernini. The user takes a picture using a mobile application (Figure 4a), then it sends a picture to the server that sends back the augmented picture (Figure 4b). The server sends to the mobile application the list of the first five best matchings (Figure 4c). The user can select the correct description from the items, and the application shows the right augmented picture (Figure 4d). The user sends negative feedback because the description does not correspond to the picture.

5. Experimental Results

This section introduces the experimental strategy and shows the results with related comments and discussion. We used a vector space model to retrieve deep descriptors and key point matching for ORB. We computed the precision–recall curve and meant average precision to evaluate our proposed approach. The precision–recall curve is computed as an interpolation of the precision values for 11 standard recall values ranging from 0 to 1 step of 0.1. We used Equation (2) for the interpolation. Equation (3) shows the precision, and Equation (4) shows the recall:

P_{i n t e r p} (r) = max_{r_{i} \geq r} p (r_{i})

(2)

P r e c i s i o n = \frac{| r e l e v a n t d o c u m e n t s \cap r e t r i e v e d d o c u m e n t s |}{| r e t r i e v e d d o c u m e n t s |}

(3)

R e c a l l = \frac{| r e l e v a n t d o c u m e n t s \cap r e t r i e v e d d o c u m e n t s |}{| r e l e v a n t d o c u m e n t s |}

(4)

The mean average precision is computed as shown in Equation (5a):

m A P @ k = \frac{\sum_{q = 1}^{Q} A v e P @ k (q)}{Q}

(5a)

A v r P @ k = \frac{\sum_{k = 1}^{n} (P (k) * r e l (k)}{n u m b e r o f r e l e v a n t d o c u m e n t s}

(5b)

In this work, we considered both the precision-recall curve and mAP. We used the first one to identify the best descriptor in image retrieval and the second one to figure out the best descriptor used as a classifier. It is important because we perform the augmentation using the information related to the first retrieved image. Furthermore, we computed the mAP@5 because we proposed the first five results to the user in case of negative feedback. We used a custom dataset of 100 query images to evaluate our system.

Figure 5 shows the results of the precision–recall curve, where the deep features achieved better results than ORB. Furthermore, the deep features have about the same precision. Due to this, we based our choice on MAP@1 and MAP@5. According to Table 1 and Figure 6, the feature extracted with MobileNetV2 and global average pooling as a reduction method achieved the best mean average precision at 5. Instead, MobileNetV2 and global max pooling as reduction methods achieved the best mean average precision at 1. Therefore, for our application, we chose ḿobilenetv2_avgéven if it has the second best MAP@1 and the best MAP@5.

6. Conclusions and Future Works

This work introduced an application to augment pictures for the cultural heritage domain using deep learning techniques and multimedia knowledge graphs, improving our previous work [33]. The main differences are the introduction of a focused crawler to populate the multimedia knowledge graph and using more accurate feature extraction methods that achieved better results. The results show that the approach based on the deep learning feature extraction method achieved better results. We designed our application in a modular fashion, which is easily extensible with other descriptors or functionalities. Therefore, we proved that using pre-trained CNNs as feature extractors is a good solution for the cultural heritage domain combined with a multimedia knowledge graph to improve the user knowledge. Further, the feedback improves the result over time. In future work, we will improve the focused crawler to have more accurate results in our multimedia knowledge graph using novel techniques based on deep learning. Furthermore, we want to explore new representation learning methods to improve the feature extraction process, and we want to define and implement a strategy to analyze the user experience.

Author Contributions

All authors who contributed substantially to the study’s conception and design were involved in the preparation and review of the manuscript until the approval of the final version. A.M.R., C.R. and C.T. were responsible for the literature search, manuscript development, and testing. Furthermore, A.M.R., C.R. and C.T. actively contributed to all parts of the article, including interpretation of results, review and approval. In addition, all authors contributed to the development of the system for the performance of the system tests. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The study data sets used or analysed are available in the manuscript tables.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

Purificato, E.; Rinaldi, A.M. Multimedia and geographic data integration for cultural heritage information retrieval. Multimedia Tools Appl. 2018, 77, 27447–27469. [Google Scholar] [CrossRef]
Purificato, E.; Rinaldi, A.M. A multimodal approach for cultural heritage information retrieval. In Proceedings of the International Conference on Computational Science and Its Applications, Melbourne, VIC, Australia, 2–5 July 2018; Springer: Cham, Switzerland, 2018; pp. 214–230. [Google Scholar]
Affleck, Y.K.T.K.J. New Heritage: New Media and Cultural Heritage; Routledge: London, UK, 2007. [Google Scholar]
Han, D.I.D.; Weber, J.; Bastiaansen, M.; Mitas, O.; Lub, X. Virtual and augmented reality technologies to enhance the visitor experience in cultural tourism. In Augmented Reality and Virtual Reality; Springer: Cham, Switzerland, 2019; pp. 113–128. [Google Scholar]
Vi, S.; da Silva, T.S.; Maurer, F. User Experience Guidelines for Designing HMD Extended Reality Applications. In Proceedings of the Human-Computer Interaction—INTERACT 2019, Paphos, Cyprus, 2–6 September 2019; Lamas, D., Loizides, F., Nacke, L., Petrie, H., Winckler, M., Zaphiris, P., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 319–341. [Google Scholar]
Candan, K.S.; Sapino, M.L. Data Management for Multimedia Retrieval; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
Rinaldi, A.M.; Russo, C. User-centered information retrieval using semantic multimedia big data. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 2304–2313. [Google Scholar]
Rinaldi, A.M.; Russo, C.; Tommasino, C. A semantic approach for document classification using deep neural networks and multimedia knowledge graph. Expert Syst. Appl. 2021, 169, 114320. [Google Scholar] [CrossRef]
Muscetti, M.; Rinaldi, A.M.; Russo, C.; Tommasino, C. Multimedia ontology population through semantic analysis and hierarchical deep features extraction techniques. Knowl. Inf. Syst. 2022, 64, 1283–1303. [Google Scholar] [CrossRef]
Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Sci. Am. 2001, 284, 34–43. [Google Scholar] [CrossRef]
Yu, L. Linked open data. In A Developer’s Guide to the Semantic Web; Springer: Berlin/Heidelberg, Germany, 2011; pp. 409–466. [Google Scholar]
Titchen, S.M. On the construction of outstanding universal value: UNESCO’s World Heritage Convention (Convention concerning the Protection of the World Cultural and Natural Heritage, 1972) and the identification and assessment of cultural places for inclusion in the World Heritage List. Ph.D. Thesis, The Australian National University, Canberra, Australia, 1995. [Google Scholar]
Rigby, J.; Smith, S.P. Augmented Reality Challenges for Cultural Heritage; Applied Informatics Research Group, University of Newcastle: Callaghan, Australia, 2013. [Google Scholar]
Han, D.; Leue, C.; Jung, T. A tourist experience model for augmented reality applications in the urban heritage context. In Proceedings of the APacCHRIE Conference, Kuala Lumpur, Malaysia, 21–24 May 2014; pp. 21–24. [Google Scholar]
Neuhofer, B.; Buhalis, D.; Ladkin, A. A typology of technology-enhanced tourism experiences. Int. J. Tour. Res. 2014, 16, 340–350. [Google Scholar] [CrossRef]
Vargo, S.L.; Lusch, R.F. From goods to service (s): Divergences and convergences of logics. Ind. Mark. Manag. 2008, 37, 254–259. [Google Scholar] [CrossRef]
Yovcheva, Z.; Buhalis, D.; Gatzidis, C. Engineering augmented tourism experiences. In Information and Communication Technologies in Tourism 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 24–35. [Google Scholar]
Cranmer, E.; Jung, T. Augmented reality (AR): Business models in urban cultural heritage tourist destinations. In Proceedings of the APacCHRIE Conference, Kuala Lumpur, Malaysia, 21–24 May 2014; pp. 21–24. [Google Scholar]
Fritz, F.; Susperregui, A.; Linaza, M.T. Enhancing cultural tourism experiences with augmented reality technologies. In Proceedings of the 6th International Symposium on Virtual Reality, Archaeology and Cultural Heritage VAST, Pisa, Italy, 8–11 November 2005. [Google Scholar]
tom Dieck, M.C.; Jung, T.H. Value of augmented reality at cultural heritage sites: A stakeholder approach. J. Destinat. Mark. Manag. 2017, 6, 110–117. [Google Scholar] [CrossRef]
Makantasis, K.; Doulamis, A.; Doulamis, N.; Ioannides, M.; Matsatsinis, N. Content-based filtering for fast 3D reconstruction from unstructured web-based image data. In Proceedings of the Euro-Mediterranean Conference, Limassol, Cyprus, 3–8 November 2014; Springer: Cham, Switzerland, 2014; pp. 91–101. [Google Scholar]
Shin, C.; Hong, S.H.; Yoon, H. Enriching Natural Monument with User-Generated Mobile Augmented Reality Mashup. J. Multimedia Inf. Syst. 2020, 7, 25–32. [Google Scholar] [CrossRef] [Green Version]
Tam, D.C.C.; Fiala, M. A Real Time Augmented Reality System Using GPU Acceleration. In Proceedings of the 2012 Ninth Conference on Computer and Robot Vision, Toronto, ON, Canada, 28–30 May 2012; pp. 101–108. [Google Scholar] [CrossRef]
Chen, J.; Guo, J.; Wang, Y. Mobile augmented reality system for personal museum tour guide applications. In Proceedings of the IET International Communication Conference on Wireless Mobile and Computing (CCWMC 2011), Shanghai, China, 14–16 November 2011; pp. 262–265. [Google Scholar] [CrossRef]
Han, J.G.; Park, K.W.; Ban, K.J.; Kim, E.K. Cultural Heritage Sites Visualization System based on Outdoor Augmented Reality. AASRI Procedia 2013, 4, 64–71. [Google Scholar] [CrossRef]
Bres, S.; Tellez, B. Localisation and Augmented Reality for Mobile Applications in Culture Heritage; Computer (Long. Beach. Calif.); INSA: Lyon, France, 2006; pp. 1–5. [Google Scholar]
Rodrigues, J.M.F.; Veiga, R.J.M.; Bajireanu, R.; Lam, R.; Pereira, J.A.R.; Sardo, J.D.P.; Cardoso, P.J.S.; Bica, P. Mobile Augmented Reality Framework—MIRAR. In Proceedings of the Universal Access in Human-Computer Interaction, Virtual, Augmented, and Intelligent Environments, Las Vegas, NV, USA, 15–20 July 2018; Antona, M., Stephanidis, C., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 102–121. [Google Scholar]
Ufkes, A.; Fiala, M. A Markerless Augmented Reality System for Mobile Devices. In Proceedings of the 2013 International Conference on Computer and Robot Vision, Washington, DC, USA, 29–31 May 2013; pp. 226–233. [Google Scholar] [CrossRef]
Ghouaiel, N.; Garbaya, S.; Cieutat, J.M.; Jessel, J.P. Mobile augmented reality in museums: Towards enhancing visitor’s learning experience. Int. J. Virtual Real. 2017, 17, 21–31. [Google Scholar] [CrossRef] [Green Version]
Angelopoulou, A.; Economou, D.; Bouki, V.; Psarrou, A.; Jin, L.; Pritchard, C.; Kolyda, F. Mobile augmented reality for cultural heritage. In Proceedings of the International Conference on Mobile Wireless Middleware, Operating Systems, and Applications, London, UK, 22–24 June 2011; Springer: Cham, Switzerland, 2011; pp. 15–22. [Google Scholar]
Haugstvedt, A.C.; Krogstie, J. Mobile augmented reality for cultural heritage: A technology acceptance study. In Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Altanta, GA, USA, 5–8 November 2012; pp. 247–255. [Google Scholar]
Clini, P.; Frontoni, E.; Quattrini, R.; Pierdicca, R. Augmented Reality Experience: From High-Resolution Acquisition to Real Time Augmented Contents. Adv. MultiMedia 2014, 2014, 597476. [Google Scholar] [CrossRef] [Green Version]
Rinaldi, A.M.; Russo, C.; Tommasino, C. An Approach Based on Linked Open Data and Augmented Reality for Cultural Heritage Content-Based Information Retrieval. In Proceedings of the International Conference on Computational Science and Its Applications, Malaga, Spain, 4–7 July 2022; Springer: Cham, Switzerland, 2022; pp. 99–112. [Google Scholar]
Capuano, A.; Rinaldi, A.M.; Russo, C. An ontology-driven multimedia focused crawler based on linked open data and deep learning techniques. Multimedia Tools Appl. 2020, 79, 7577–7598. [Google Scholar] [CrossRef]
Rinaldi, A.M.; Russo, C. A semantic-based model to represent multimedia big data. In Proceedings of the 10th International Conference on Management of Digital EcoSystems, Tokyo, Japan, 25–28 September 2018; pp. 31–38. [Google Scholar]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar]
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with 389 convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Candan, K.S.; Liu, H.; Suvarna, R. Resource description framework: Metadata and its applications. ACM Sigkdd Explor. Newsl. 2001, 3, 6–19. [Google Scholar] [CrossRef]
Della Valle, E.; Ceri, S. Querying the semantic web: SPARQL. In Handbook of Semantic Web Technologies; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. Dbpedia–A large-scale, multilingual knowledge base extracted from wikipedia. Semant. Web 2015, 6, 167–195. [Google Scholar] [CrossRef]

Figure 1. System architecture.

Figure 2. Flow chart.

Figure 3. Use Case 1. (a) Took picture. (b) Augmented picture.

Figure 4. Use Case 2. (a) Took picture. (b) Augmented picture wrong. (c) List of five best matching. (d) Augmented picture.

Figure 5. Precision–recall curve.

Figure 6. Mean average precision at one and five.

Table 1. MAP@5 and MAP@1 for each features.

	MAP@5	MAP@1
inceptionv3_avg	0.8416	0.8716
vgg16_avg	0.8424	0.8824
resnet50_avg	0.8290	0.8590
mobilenetv2_avg	0.8586	0.9086
inceptionv3_max	0.8278	0.8878
vgg16_max	0.8417	0.8917
resnet50_max	0.8327	0.8827
mobilenetv2_max	0.8547	0.9247
orb	0.2949	0.3349

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Rinaldi, A.M.; Russo, C.; Tommasino, C. An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage. Computers 2022, 11, 172. https://doi.org/10.3390/computers11120172

AMA Style

Rinaldi AM, Russo C, Tommasino C. An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage. Computers. 2022; 11(12):172. https://doi.org/10.3390/computers11120172

Chicago/Turabian Style

Rinaldi, Antonio M., Cristiano Russo, and Cristian Tommasino. 2022. "An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage" Computers 11, no. 12: 172. https://doi.org/10.3390/computers11120172

APA Style

Rinaldi, A. M., Russo, C., & Tommasino, C. (2022). An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage. Computers, 11(12), 172. https://doi.org/10.3390/computers11120172

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Augmented Reality CBIR System Based on Multimedia Knowledge Graph and Deep Learning Techniques in Cultural Heritage

Abstract

1. Introduction

2. Related Works

3. The Proposed System

Augment Image Process

4. Use Case

4.1. Use Case 1

4.2. Use Case 2

5. Experimental Results

6. Conclusions and Future Works

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI