MDPI - Publisher of Open Access Journals

35 pages, 7343 KB

Open Access Article

A Hybrid Deep Learning and Knowledge Graph Approach for Intelligent Image Indexing and Retrieval

by Mohamed Hamroun and Damien Sauveron

Appl. Sci. 2025, 15(19), 10591; https://doi.org/10.3390/app151910591 - 30 Sep 2025

Viewed by 1925

Technological advancements have enabled users to digitize and store an unlimited number of multimedia documents, including images and videos. However, the heterogeneous nature of multimedia content poses significant challenges in efficient indexing and retrieval. Traditional approaches primarily focus on visual features, often neglecting the semantic context, which limits retrieval efficiency. This paper proposes a hybrid deep learning and knowledge graph approach for intelligent image indexing and retrieval. By integrating deep learning models such as EfficientNet and Vision Transformer (ViT) with structured knowledge graphs, the proposed framework enhances semantic understanding and retrieval performance. The methodology incorporates feature extraction, concept classification, and hierarchical knowledge graph structuring to facilitate effective multimedia retrieval. Experimental results on benchmark datasets, including TRECVID, Corel, and MSCOCO, demonstrate significant improvements in precision, robustness, and query expansion techniques. The findings highlight the potential of combining deep learning with knowledge graphs to bridge the semantic gap and optimize multimedia indexing and retrieval. Full article

(This article belongs to the Special Issue Application of Deep Learning and Big Data Processing)

25 pages, 2296 KB

Open Access Article

Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation

by Stefan Wagenpfeil

Electronics 2025, 14(12), 2472; https://doi.org/10.3390/electronics14122472 - 18 Jun 2025

Cited by 3 | Viewed by 3150

Abstract

Retrieval-Augmented Generation (RAG) has become a central approach to enhance the factual consistency and domain specificity of large language models (LLMs) by incorporating external context at inference time. However, most existing RAG systems rely on dense vector-based similarity, which fails to capture complex semantic structures, relational dependencies, and multimodal content. In this paper, we introduce Graph Codes—a matrix-based encoding of Multimedia Feature Graphs—as an alternative retrieval paradigm. Graph Codes preserve semantic topology by explicitly encoding entities and their typed relationships from multimodal documents, enabling structure-aware and interpretable retrieval. We evaluate our system in two domains: multimodal scene understanding (200 annotated image-question pairs) and clinical question answering (150 real-world medical queries with 10,000 structured knowledge snippets). Results show that our method outperforms dense retrieval baselines in precision (+9–15%), reduces hallucination rates by over 30%, and yields higher expert-rated answer quality. Theoretically, this work demonstrates that symbolic similarity over typed semantic graphs provides a more faithful alignment mechanism than latent embeddings. Practically, it enables interpretable, modality-agnostic retrieval pipelines deployable in high-stakes domains such as medicine or law. We conclude that Graph Code-based RAG bridges the gap between structured knowledge representation and neural generation, offering a robust and explainable alternative to existing approaches. Full article

(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)

19 pages, 12241 KB

Open Access Article

Geospatial Tool Development for the Management of Historical Hiking Trails—The Case of the Holy Site of Meteora

by Chryssy Potsiou, Charalabos Ioannidis, Sofia Soile, Argyro-Maria Boutsi, Regina Chliverou, Konstantinos Apostolopoulos, Maria Gkeli and Fotis Bourexis

Land 2023, 12(8), 1530; https://doi.org/10.3390/land12081530 - 2 Aug 2023

Cited by 10 | Viewed by 4164

Abstract

This paper presents a holistic guiding methodology for the development of a geospatial tool to be used for the documentation, planning, smart management and dissemination of a country’s network of historic hiking trails. To deal with the challenges and to ensure the sustainability of a historic site, geospatial documentation merging authoritative and crowdsourced data and a WebGIS-based spatial analysis are necessary. Geospatial data collection should include professional field surveys, professional and crowdsourced photographic documentation and video recording of the existing historic walking/hiking trails. A geodatabase, structured using relational model technology, including vector spatial entities (feature classes), mosaics (raster) and tabulated data (geodatabase tables), should be developed on a commercial or open platform; in this case, ArcGIS Pro is used. Entities with embedded descriptive information and metadata for the technical, legal, historical, and administrative context may then be created. An object-oriented data model is needed to connect spatial and descriptive information. Spatial and descriptive queries or correlations between attribute fields of spatial entities must be enabled for specialized information retrieval by either experts or users. Next, a web GIS application is created to present the developed geodatabase in a visually appealing and informative way. It should integrate 2D maps with built-in tools and should support advanced functionalities, such as: (i) pop-ups that display brief information and images about specific spots along the trails; (ii) dynamic visualization of the vertical profile of each trail; (iii) multimedia information about landmarks, natural features and scenic viewpoints. Finally, the tool includes a feedback service and continuous efficiency monitoring and assessment, and enables adjustments, if and where needed.
The tool is tested and used for 10 historical walking/hiking trails of the archaeological and Holy Site of Meteora, Central Greece. This is a UNESCO World Heritage site. The network, with a total length of 35 km, leads to six monasteries, still active since the 12th century, passing by gigantic rocks and beautiful natural landscapes. The site is famous globally and the greater area is continuously overcrowded with visitors. The tool is anticipated to be used for the documentation and management of the whole walking/hiking historic trail network of Greece in the future. Full article

(This article belongs to the Special Issue Preservation, Reuse and Reveal of Cultural Heritage through Sustainable Land Management, Rural and Urban Development II)

18 pages, 4476 KB

Open Access Article

Investigation of Relationships between Discrete and Dimensional Emotion Models in Affective Picture Databases Using Unsupervised Machine Learning

by Marko Horvat, Alan Jović and Kristijan Burnik

Appl. Sci. 2022, 12(15), 7864; https://doi.org/10.3390/app12157864 - 5 Aug 2022

Cited by 12 | Viewed by 4806

Abstract

Digital documents created to evoke emotional responses are intentionally stored in special affective multimedia databases, along with metadata describing their semantics and emotional content. These databases are routinely used in multidisciplinary research on emotion, attention, and related phenomena. Affective dimensions and emotion norms are the most common emotion data models in the field of affective computing, but they are considered separable and not interchangeable. The goal of this study was to determine whether it is possible to statistically infer values of emotionally annotated pictures using the discrete emotion model when the values of the dimensional model are available and vice versa. A positive answer would greatly facilitate stimuli retrieval from affective multimedia databases and the integration of heterogeneous and differently structured affective data sources. In the experiment, we built a statistical model to describe dependencies between discrete and dimensional ratings using the affective picture databases NAPS and NAPS BE with standardized annotations for 1356 and 510 pictures, respectively. Our results show the following: (1) there is a statistically significant correlation between certain pairs of discrete and dimensional emotions in picture stimuli, and (2) robust transformation of picture ratings from the discrete emotion space to well-defined clusters in the dimensional space is possible for some discrete-dimensional emotion pairs. Based on our findings, we conclude that a feasible recommender system for affective dataset retrieval can be developed. The software tool developed for the experiment and the results are freely available for scientific and non-commercial purposes. Full article

(This article belongs to the Special Issue Affective Computing and Recommender Systems)

24 pages, 10116 KB

Open Access Article

Explainable Multimedia Feature Fusion for Medical Applications

by Stefan Wagenpfeil, Paul Mc Kevitt, Abbas Cheddad and Matthias Hemmje

J. Imaging 2022, 8(4), 104; https://doi.org/10.3390/jimaging8040104 - 8 Apr 2022

Cited by 6 | Viewed by 3964

Abstract

Due to the exponential growth of medical information in the form of, e.g., text, images, Electrocardiograms (ECGs), X-rays, and multimedia, the management of a patient’s data has become a huge challenge. In particular, the extraction of features from various different formats and their representation in a homogeneous way are areas of interest in medical applications. Multimedia Information Retrieval (MMIR) frameworks, like the Generic Multimedia Analysis Framework (GMAF), can contribute to solving this problem, when adapted to special requirements and modalities of medical applications. In this paper, we demonstrate how typical multimedia processing techniques can be extended and adapted to medical applications and how these applications benefit from employing a Multimedia Feature Graph (MMFG) and specialized, efficient indexing structures in the form of Graph Codes. These Graph Codes are transformed to feature relevant Graph Codes by employing a modified Term Frequency Inverse Document Frequency (TFIDF) algorithm, which further supports value ranges and Boolean operations required in the medical context. On this basis, various metrics for the calculation of similarity, recommendations, and automated inferencing and reasoning can be applied supporting the field of diagnostics. Finally, the presentation of these new facilities in the form of explainability is introduced and demonstrated. Thus, in this paper, we show how Graph Codes contribute new querying options for diagnosis and how Explainable Graph Codes can help to readily understand medical multimedia formats. Full article

(This article belongs to the Special Issue Intelligent Strategies for Medical Image Analysis)
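The abstract above mentions a modified Term Frequency Inverse Document Frequency (TFIDF) algorithm but does not spell it out. As a point of reference only, the following minimal sketch shows the standard, unmodified TF-IDF weighting over tokenized documents. It is not the authors' method: the function name `tf_idf`, the toy documents, and the choice of the natural logarithm are assumptions made for illustration, and the value-range and Boolean extensions described in the abstract are not covered.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Hypothetical helper for illustration; `documents` is a list of token lists.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency (relative count) times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (invented): each document is a list of feature terms.
docs = [
    ["ecg", "arrhythmia", "ecg"],
    ["xray", "fracture"],
    ["ecg", "xray"],
]
weights = tf_idf(docs)
```

In this toy corpus, a term that occurs in only one document ("arrhythmia") receives a higher weight than a more frequent term shared across documents ("ecg"), which is the property that makes TF-IDF useful for ranking features by relevance.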

20 pages, 1754 KB

Open Access Article

A Knowledge-Driven Multimedia Retrieval System Based on Semantics and Deep Features

by Antonio Maria Rinaldi, Cristiano Russo and Cristian Tommasino

Future Internet 2020, 12(11), 183; https://doi.org/10.3390/fi12110183 - 28 Oct 2020

Cited by 15 | Viewed by 3880

Abstract

In recent years, users' information needs have changed because of the heterogeneity of web content, which increasingly includes multimedia. Although modern search engines support visual queries, it is not easy to find systems that allow searching within a particular domain of interest and that perform such searches by combining text and visual queries. Different approaches have been proposed over the years, and in the semantic research field many authors have proposed techniques based on ontologies. On the other hand, in the context of image retrieval systems, techniques based on deep learning have obtained excellent results. In this paper, we present novel approaches for semantic image retrieval and a possible combination for multimedia document analysis. Several results are presented to show the performance of our approach compared with literature baselines. Full article

(This article belongs to the Special Issue Data Science and Knowledge Discovery)

20 pages, 3092 KB

Open Access Article

Lift Charts-Based Binary Classification in Unsupervised Setting for Concept-Based Retrieval of Emotionally Annotated Images from Affective Multimedia Databases

by Marko Horvat, Alan Jović and Danko Ivošević

Information 2020, 11(9), 429; https://doi.org/10.3390/info11090429 - 3 Sep 2020

Cited by 2 | Viewed by 4605

Abstract

Evaluation of document classification is straightforward if complete information on the documents’ true categories exists. In this case, the rank of each document can be accurately determined and evaluated. However, in an unsupervised setting, where the exact document category is not available, lift charts become an advantageous method for evaluation of the retrieval quality and categorization of ranked documents. We introduce lift charts as binary classifiers of ranked documents and explain how to apply them to the concept-based retrieval of emotionally annotated images as one of the possible retrieval methods for this application. Furthermore, we describe affective multimedia databases on a representative example of the International Affective Picture System (IAPS) dataset, their applications, advantages, and deficiencies, and explain how lift charts may be used as a helpful method for document retrieval in this domain. Optimization of lift charts for recall and precision is also described. A typical scenario of document retrieval is presented on a set of 800 affective pictures labeled with an unsupervised glossary. In the lift charts-based retrieval using the approximate matching method, the highest attained accuracy, precision, and recall were 51.06%, 47.41%, 95.89%, and 81.83%, 99.70%, 33.56%, when optimized for recall and precision, respectively. Full article

(This article belongs to the Section Information Processes)
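The abstract above reports accuracy, precision, and recall for lift charts-based retrieval over ranked documents. As a rough illustration of the quantities involved, the sketch below computes the textbook lift at a cutoff k (precision within the top k divided by the overall share of relevant items) together with precision and recall at the same cutoff. It assumes binary relevance labels are available for the ranked list, which the unsupervised setting described in the abstract does not; the function names and the toy ranking are hypothetical and are not taken from the paper.

```python
def lift_at_k(ranked_labels, k):
    """Lift of the top-k items: precision in the top k divided by the base rate.

    `ranked_labels` is a list of 0/1 relevance labels in ranked order
    (an assumption for illustration).
    """
    if not 0 < k <= len(ranked_labels):
        raise ValueError("k must be between 1 and the number of ranked items")
    base_rate = sum(ranked_labels) / len(ranked_labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no relevant items")
    precision_at_k = sum(ranked_labels[:k]) / k
    return precision_at_k / base_rate


def precision_recall_at_k(ranked_labels, k):
    """Precision and recall when the top-k items are treated as retrieved."""
    retrieved_relevant = sum(ranked_labels[:k])
    total_relevant = sum(ranked_labels)
    precision = retrieved_relevant / k
    recall = retrieved_relevant / total_relevant if total_relevant else 0.0
    return precision, recall


# Toy ranking (invented): 1 = relevant, 0 = not relevant, best-ranked first.
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
lift_top4 = lift_at_k(ranked, 4)
precision_top4, recall_top4 = precision_recall_at_k(ranked, 4)
```

Moving the cutoff k trades precision against recall: a small k tends to favor precision, a large k favors recall, and the lift approaches 1 as k approaches the full list. This mirrors the two optimization targets mentioned in the abstract.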

26 pages, 10390 KB

Open Access Article

An Integrated Approach to 3D Web Visualization of Cultural Heritage Heterogeneous Datasets

by Argyro-Maria Boutsi, Charalabos Ioannidis and Sofia Soile

Remote Sens. 2019, 11(21), 2508; https://doi.org/10.3390/rs11212508 - 26 Oct 2019

Cited by 35 | Viewed by 9139

Abstract

The evolution of high-quality 3D archaeological representations from niche products to integrated online media has not yet been completed. Digital archives in the field often lack multimodal data interoperability, user interaction and intelligibility. A web-based cultural heritage archive that compensates for these issues is presented in this paper. The multi-resolution 3D models constitute the core of the visualization, on top of which supportive documentation data and multimedia content are spatially and logically connected. Our holistic approach focuses on the dynamic manipulation of the 3D scene through the development of advanced navigation mechanisms and information retrieval tools. Users parse the multi-modal content in a geo-referenced way through interactive annotation systems over cultural points of interest and automatic narrative tours. Multiple 3D and 2D viewpoints are enabled in real-time to support data inspection. The implementation exploits front-end programming languages, 3D graphic libraries and visualization frameworks to efficiently handle asynchronous operations and preserve the initial assets’ accuracy. The choice of Greece’s Meteora, a UNESCO World Heritage site, as a case study demonstrates the platform’s applicability to complex geometries and large-scale historical environments. Full article

(This article belongs to the Special Issue 2nd Edition Advances in Remote Sensing for Archaeological Heritage)

12 pages, 1553 KB

Open Access Article

Cloud-Based Image Retrieval Using GPU Platforms

by Sidi Ahmed Mahmoudi, Mohammed Amin Belarbi, El Wardani Dadi, Saïd Mahmoudi and Mohammed Benjelloun

Computers 2019, 8(2), 48; https://doi.org/10.3390/computers8020048 - 14 Jun 2019

Cited by 9 | Viewed by 7866

Abstract

Image retrieval is a useful tool for different domains related to computer vision, such as multimedia retrieval, pattern recognition, medical imaging, video surveillance and movement analysis. Visual characteristics of images such as color, texture and shape are used to identify the content of images. However, the retrieval process becomes very challenging because large databases are hard to manage in terms of storage, computational complexity, temporal performance and similarity representation. In this paper, we propose a cloud-based platform in which we integrate several feature extraction algorithms used for content-based image retrieval (CBIR) systems. Moreover, we propose an efficient combination of SIFT and SURF descriptors that allows image features to be extracted and matched and hence improves the process of image retrieval. The proposed algorithms have been implemented on the CPU and also adapted to fully exploit the power of GPUs. Our platform is presented with a responsive web solution that lets users exploit, test and evaluate image retrieval methods. The platform offers users simple access to different algorithms, such as the SIFT and SURF descriptors, without the need to set up the environment or install anything, while spending minimal effort on preprocessing and configuration. In addition, our cloud-based CPU and GPU implementations are scalable, which means that they can be used even with large databases of multimedia documents. The obtained results showed: 1. Improved retrieval quality in terms of recall and precision; 2. Performance improvement in terms of computation time as a result of exploiting GPUs in parallel; 3. Reduction of energy consumption. Full article

(This article belongs to the Special Issue Advances and Innovations in Cloud Computing Technologies and Applications (CloudTech 2018))

30 pages, 853 KB

Open Access Article

Finding Emotional-Laden Resources on the World Wide Web

by Kathrin Knautz, Diane Rasmussen Neal, Stefanie Schmidt, Tobias Siebenlist and Wolfgang G. Stock

Information 2011, 2(1), 217-246; https://doi.org/10.3390/info2010217 - 2 Mar 2011

Cited by 9 | Viewed by 11246

Abstract

Some content in multimedia resources can depict or evoke certain emotions in users. The aim of Emotional Information Retrieval (EmIR) and of our research is to identify knowledge about emotional-laden documents and to use these findings in a new kind of World Wide Web information service that allows users to search and browse by emotion. Our prototype, called Media EMOtion SEarch (MEMOSE), is largely based on the results of research regarding emotive music pieces, images and videos. In order to index both evoked and depicted emotions in these three media types and to make them searchable, we work with a controlled vocabulary, slide controls to adjust the emotions’ intensities, and broad folksonomies to identify and separate the correct resource-specific emotions. This separation of so-called power tags is based on a tag distribution which follows either an inverse power law (only one emotion was recognized) or an inverse-logistical shape (two or three emotions were recognized). Both distributions are well known in information science. MEMOSE consists of a tool for tagging basic emotions with the help of slide controls, a processing device to separate power tags, a retrieval component consisting of a search interface (for any topic in combination with one or more emotions) and a results screen. The latter shows two separately ranked lists of items for each media type (depicted and felt emotions), displaying thumbnails of resources, ranked by the mean values of intensity. In the evaluation of the MEMOSE prototype, study participants described our EmIR system as an enjoyable Web 2.0 service. Full article

(This article belongs to the Special Issue What Is Information?)

Search Results (10)
