Search Results (37)

Search Parameters:
Keywords = intelligent metadata processing

29 pages, 2299 KB  
Article
A Multi-Dimensional Framework for Data Quality Assurance in Cancer Imaging Repositories
by Olga Tsave, Alexandra Kosvyra, Dimitrios T. Filos, Dimitris Th. Fotopoulos and Ioanna Chouvarda
Cancers 2025, 17(19), 3213; https://doi.org/10.3390/cancers17193213 - 1 Oct 2025
Viewed by 259
Abstract
Background/Objectives: Cancer remains a leading global cause of death, with breast, lung, colorectal, and prostate cancers being among the most prevalent. The integration of Artificial Intelligence (AI) into cancer imaging research offers opportunities for earlier diagnosis and personalized treatment. However, the effectiveness of AI models depends critically on the quality, standardization, and fairness of the input data. The EU-funded INCISIVE project aimed to create a federated, pan-European repository of imaging and clinical data for cancer cases, with a key objective to develop a robust framework for pre-validating data prior to its use in AI development. Methods: We propose a data validation framework to assess clinical (meta)data and imaging data across five dimensions: completeness, validity, consistency, integrity, and fairness. The framework includes procedures for deduplication, annotation verification, DICOM metadata analysis, and anonymization compliance. Results: The pre-validation process identified key data quality issues, such as missing clinical information, inconsistent formatting, and subgroup imbalances, while also demonstrating the added value of structured data entry and standardized protocols. Conclusions: This structured framework addresses common challenges in curating large-scale, multimodal medical data. By applying this approach, the INCISIVE project ensures data quality, interoperability, and equity, providing a transferable model for future health data repositories supporting AI research in oncology. Full article
(This article belongs to the Section Methods and Technologies Development)
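
As a rough sketch of the kind of completeness and validity check such a pre-validation framework might run on DICOM metadata (the required-tag list and the modality rule below are illustrative assumptions using the pydicom library, not the INCISIVE specification):

```python
# Minimal sketch of a DICOM metadata completeness/validity check.
# The required-tag list and the validity rule are illustrative assumptions,
# not the INCISIVE project's actual specification.
import pydicom

REQUIRED_TAGS = ["PatientID", "Modality", "StudyDate", "BodyPartExamined"]

def check_dicom(path: str) -> dict:
    ds = pydicom.dcmread(path, stop_before_pixels=True)  # metadata only
    report = {"completeness": {}, "validity": {}}
    for tag in REQUIRED_TAGS:
        value = getattr(ds, tag, None)
        report["completeness"][tag] = value not in (None, "")
    # Example validity rule: modality must be one of the expected types.
    report["validity"]["Modality"] = getattr(ds, "Modality", "") in {"CT", "MR", "PT", "MG"}
    return report

print(check_dicom("sample.dcm"))
```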

26 pages, 7003 KB  
Article
Agentic Search Engine for Real-Time Internet of Things Data
by Abdelrahman Elewah, Khalid Elgazzar and Said Elnaffar
Sensors 2025, 25(19), 5995; https://doi.org/10.3390/s25195995 - 28 Sep 2025
Viewed by 443
Abstract
The Internet of Things (IoT) has enabled a vast network of devices to communicate over the Internet. However, the fragmentation of IoT systems continues to hinder seamless data sharing and coordinated management across platforms, and there is currently no true search engine for IoT data. Existing IoT search engines are essentially device discovery tools, providing only metadata about devices rather than enabling access to IoT application data. While efforts such as IoTCrawler have striven to support IoT application data, they have largely failed due to the fragmentation of IoT systems and the heterogeneity of IoT data. To address this, we recently introduced SensorsConnect, a unified framework designed to facilitate interoperable content and sensor data sharing among collaborative IoT systems, inspired by how the World Wide Web (WWW) enabled shared and accessible information spaces for humans. This paper presents the IoT Agentic Search Engine (IoT-ASE), a real-time semantic search engine tailored specifically for IoT environments. IoT-ASE leverages large language models (LLMs) and Retrieval-Augmented Generation (RAG) techniques to address the challenges of navigating and searching vast, heterogeneous streams of real-time IoT data. This approach enables the system to process complex natural language queries and return accurate, contextually relevant results in real time. To evaluate its effectiveness, we implemented a hypothetical deployment in the Toronto region, simulating a realistic urban environment using a dataset composed of 500 services and over 37,000 IoT-like data entries. Our evaluation shows that IoT-ASE achieved 92% accuracy in retrieving intent-aligned services and consistently generated concise, relevant, and preference-aware responses, outperforming generalized outputs produced by systems such as Gemini. These results underscore the potential of IoT-ASE to make real-time IoT data both accessible and actionable, supporting intelligent decision-making across diverse application domains. Full article
(This article belongs to the Special Issue Recent Trends in AI-Based Intelligent Sensing Systems and IoTs)
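
A minimal sketch of the retrieval step in a RAG pipeline like the one described, assuming a sentence-transformers embedding model and a toy corpus of IoT service descriptions (the model name and services are illustrative, not IoT-ASE's actual stack):

```python
# Hedged sketch of RAG-style retrieval over IoT service descriptions.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
services = [
    "Air-quality sensor network, downtown Toronto, PM2.5 readings every minute",
    "Parking availability service covering municipal garages",
    "Real-time transit vehicle positions for bus routes",
]
service_vecs = model.encode(services, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)
    scores = service_vecs @ q[0]          # cosine similarity (vectors normalized)
    return [services[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved services would then be passed to an LLM as context.
print(retrieve("Where can I check current air pollution levels?"))
```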

30 pages, 10155 KB  
Article
Interoperable Semantic Systems in Public Administration: AI-Driven Data Mining from Law-Enforcement Reports
by Alexandros Z. Spyropoulos and Vassilis Tsiantos
Computers 2025, 14(9), 376; https://doi.org/10.3390/computers14090376 - 8 Sep 2025
Viewed by 1455
Abstract
The digitisation of law-enforcement archives is examined with the aim of moving from static analogue records to interoperable semantic information systems. A step-by-step framework for optimal digitisation is proposed, grounded in archival best practice and enriched with artificial-intelligence and semantic-web technologies. Emphasis is placed on semantic data representation, which renders information actionable, searchable, interlinked, and automatically processed. As a proof of concept, a large language model—OpenAI ChatGPT, version o3—was applied to a corpus of narrative police reports, extracting and classifying key entities (metadata, persons, addresses, vehicles, incidents, fingerprints, and inter-entity relationships). The output was converted to Resource Description Framework triples and ingested into a triplestore, demonstrating how unstructured text can be transformed into machine-readable, interoperable data with minimal human intervention. The approach’s challenges—technical complexity, data quality assurance, information-security requirements, and staff training—are analysed alongside the opportunities it affords, such as accelerated access to records, cross-agency interoperability, and advanced analytics for investigative and strategic decision-making. Combining systematic digitisation, AI-driven data extraction, and rigorous semantic modelling ultimately delivers a fully interoperable information environment for law-enforcement agencies, enhancing efficiency, transparency, and evidentiary integrity. Full article
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)
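
A hedged illustration of the final conversion step, turning entities extracted from a report into Resource Description Framework triples with rdflib; the namespace, identifiers, and property names are hypothetical:

```python
# Illustrative sketch: converting extracted entities from a police report
# into RDF triples. All names in the EX namespace are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/lea#")
g = Graph()
g.bind("ex", EX)

incident = EX["incident/2024-0042"]
person = EX["person/JDoe"]
g.add((incident, RDF.type, EX.Incident))
g.add((person, RDF.type, EX.Person))
g.add((person, EX.name, Literal("John Doe")))
g.add((incident, EX.involves, person))
g.add((incident, EX.reportedAt, Literal("2024-03-15")))

# Serialized Turtle is what would be ingested into the triplestore.
print(g.serialize(format="turtle"))
```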

33 pages, 112557 KB  
Article
Enhanced Tumor Diagnostics via Cyber-Physical Workflow: Integrating Morphology, Morphometry, and Genomic Multimodal Data Analysis and Visualization in Digital Pathology
by Marianna Dimitrova Kucarov, Nikolett Szakállas, Béla Molnár and Miklos Kozlovszky
Sensors 2025, 25(14), 4465; https://doi.org/10.3390/s25144465 - 17 Jul 2025
Viewed by 656
Abstract
The rapid advancement of genomic technologies has significantly transformed biomedical research and clinical applications, particularly in oncology. Identifying patient-specific genetic mutations has become a crucial tool for early cancer detection and personalized treatment strategies. Detecting tumors at the earliest possible stage provides critical insights beyond traditional tissue analysis. This paper presents a novel cyber-physical system that combines high-resolution tissue scanning, laser microdissection, next-generation sequencing, and genomic analysis to offer a comprehensive solution for early cancer detection. We describe the methodologies for scanning tissue samples, image processing of the morphology of single cells, quantifying morphometric parameters, and generating and analyzing real-time genomic metadata. Additionally, the intelligent system integrates data from open-access genomic databases for gene-specific molecular pathways and drug targets. The developed platform also includes powerful visualization tools, such as colon-specific gene filtering and heatmap generation, to provide detailed insights into genomic heterogeneity and tumor foci. The integration and visualization of multimodal single-cell genomic metadata alongside tissue morphology and morphometry offer a promising approach to precision oncology. Full article
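
A small sketch of what the morphometric-quantification step might look like with scikit-image, labeling a segmented mask and computing per-cell shape parameters; the synthetic mask stands in for a real single-cell segmentation:

```python
# Label segmented "cells" and compute morphometric parameters.
# The toy binary mask is a placeholder for a real segmentation result.
import numpy as np
from skimage.measure import label, regionprops

mask = np.zeros((64, 64), dtype=bool)
mask[10:20, 10:22] = True           # toy cell 1
mask[40:52, 35:44] = True           # toy cell 2

for region in regionprops(label(mask)):
    print(
        f"cell {region.label}: area={region.area}, "
        f"eccentricity={region.eccentricity:.2f}, "
        f"perimeter={region.perimeter:.1f}"
    )
```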

17 pages, 2550 KB  
Article
Solar and Wind 24 H Sequenced Prediction Using L-Transform Component and Deep LSTM Learning in Representation of Spatial Pattern Correlation
by Ladislav Zjavka
Atmosphere 2025, 16(7), 859; https://doi.org/10.3390/atmos16070859 - 15 Jul 2025
Viewed by 429
Abstract
Spatiotemporal correlations between meteo-inputs and wind–solar outputs in an optimal regional scale are crucial for developing robust models, reliable in mid-term prediction time horizons. Modelling border conditions is vital for early recognition of progress in chaotic atmospheric processes at the destination of interest. This approach is used in differential and deep learning; artificial intelligence (AI) techniques allow for reliable pattern representation in long-term uncertainty and regional irregularities. The proposed day-by-day estimation of the renewable energy (RE) production potential is based on first data processing in detecting modelling initialisation times from historical databases, considering correlation distance. Optimal data sampling is crucial for AI training in statistically based predictive modelling. Differential learning (DfL) is a recently developed and biologically inspired strategy that combines numerical derivative solutions with neurocomputing. This hybrid approach is based on the optimal determination of partial differential equations (PDEs) composed at the nodes of gradually expanded binomial trees. It allows for modelling of highly uncertain weather-related physical systems using unstable RE. The main objective is to improve its self-evolution and the resulting computation in prediction time. Representing relevant patterns by their similarity factors in input–output resampling reduces ambiguity in RE forecasting. Node-by-node feature selection and dynamical PDE representation of DfL are evaluated along with long short-term memory (LSTM) recurrent processing of deep learning (DL), capturing complex spatio-temporal patterns. Parametric C++ executable software with one-month spatial metadata records is available to compare additional modelling strategies. Full article
(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)
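
A minimal Keras sketch of the LSTM baseline's shape, mapping a 48-hour window of meteorological inputs to 24 hourly production values; layer sizes and the random training data are placeholders, not the paper's configuration:

```python
# Hedged sketch of a 24-hour-ahead sequenced prediction model.
import numpy as np
from tensorflow import keras

HOURS_IN, HOURS_OUT, N_FEATURES = 48, 24, 6   # e.g. meteorological inputs

model = keras.Sequential([
    keras.layers.Input(shape=(HOURS_IN, N_FEATURES)),
    keras.layers.LSTM(64),
    keras.layers.Dense(HOURS_OUT),            # one output per forecast hour
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, HOURS_IN, N_FEATURES).astype("float32")
y = np.random.rand(256, HOURS_OUT).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]).shape)  # (1, 24): next-day hourly production
```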

17 pages, 3508 KB  
Article
Multimodal Pathological Image Segmentation Using the Integration of Trans MMY Net and Patient Metadata
by Ahmed Muhammad Rehan, Kun Li and Ping Chen
Electronics 2025, 14(12), 2369; https://doi.org/10.3390/electronics14122369 - 10 Jun 2025
Viewed by 901
Abstract
In recent years, the utilization of artificial intelligence methodologies in computer vision has markedly propelled the advancement of intelligent healthcare. A multimodal medical image segmentation algorithm is proposed that combines patient metadata with a segmentation network, improving segmentation performance and the accuracy of the final diagnostic results. A fusion method utilizing a transformer backbone network is presented to enhance the efficacy of fusion processes for various modalities of medical data. A channel-level cross-fusion module (channel trans) is incorporated during the fusion phase of two modalities to mitigate interference from extraneous elements in the integrated information. The SMESwin UNet backbone network combines vision transformers and convolutional neural networks to produce multi-scale semantic features and attention mechanisms. It simultaneously collects information from global and local perspectives while minimizing model parameters. Exceptional experimental results were obtained on two publicly accessible glandular pathology datasets, with the Dice segmentation performance index reaching 91.41% on Dataset A and 80.6% on Dataset B. This indicates that using a channel transformer to merge the two modalities generalizes effectively, and that the combination of convolutional neural networks with vision transformers improves the ability to extract features from medical images. Full article
(This article belongs to the Section Artificial Intelligence)
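
A hedged PyTorch sketch of the general idea of fusing patient metadata with image features via per-channel gating; the dimensions and fusion rule are assumptions for illustration, not the Trans MMY Net or channel-trans architecture itself:

```python
# Illustrative metadata/image fusion via channel reweighting.
import torch
import torch.nn as nn

class MetaFusion(nn.Module):
    def __init__(self, img_channels=256, meta_dim=8):
        super().__init__()
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim, 64), nn.ReLU(),
            nn.Linear(64, img_channels),
        )

    def forward(self, img_feat, meta):
        # img_feat: (B, C, H, W); meta: (B, meta_dim)
        weights = torch.sigmoid(self.meta_mlp(meta))   # per-channel gate
        return img_feat * weights[:, :, None, None]    # channel reweighting

fused = MetaFusion()(torch.randn(2, 256, 32, 32), torch.randn(2, 8))
print(fused.shape)  # torch.Size([2, 256, 32, 32])
```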

25 pages, 1932 KB  
Article
Enhancing Facility Management with Emerging Technologies: A Study on the Application of Blockchain and NFTs
by Andrea Bongini, Marco Sparacino, Luca Marzi and Carlo Biagini
Buildings 2025, 15(11), 1911; https://doi.org/10.3390/buildings15111911 - 1 Jun 2025
Viewed by 762
Abstract
In recent years, Facility Management has undergone significant technological and methodological advancements, primarily driven by Building Information Modelling (BIM), Computer-Aided Facility Management (CAFM), and Computerized Maintenance Management Systems (CMMS). These innovations have improved process efficiency and risk management. However, challenges remain in asset management, maintenance, traceability, and transparency. This study investigates the potential of blockchain technology and non-fungible tokens (NFTs) to address these challenges. By referencing international (ISO, BOMA) and European (EN) standards, the research develops an asset management process model incorporating blockchain and NFTs. The methodology includes evaluating the technical and practical aspects of this model and strategies for metadata utilization. The model ensures an immutable record of transactions and maintenance activities, reducing errors and fraud. Smart contracts automate sub-phases like progress validation and milestone-based payments, increasing operational efficiency. The study’s practical implications are significant, offering advanced solutions for transparent, efficient, and secure Facility Management. It lays the groundwork for future research, emphasizing practical implementations and real-world case studies. Additionally, integrating blockchain with emerging technologies like artificial intelligence and machine learning could further enhance Facility Management processes. Full article
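
A toy Python sketch of the immutability property the blockchain record provides: each maintenance entry is chained to its predecessor by a hash, so any tampering breaks verification. This illustrates the principle only and is not an on-chain smart contract:

```python
# Hash-chained maintenance ledger: a minimal illustration of immutability.
import hashlib, json

def add_record(chain, data):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "prev": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps({"data": data, "prev": prev_hash}, sort_keys=True).encode()
    ).hexdigest()
    chain.append(block)

def verify(chain):
    for i, block in enumerate(chain):
        expected = hashlib.sha256(
            json.dumps({"data": block["data"], "prev": block["prev"]},
                       sort_keys=True).encode()
        ).hexdigest()
        if block["hash"] != expected or (i and block["prev"] != chain[i - 1]["hash"]):
            return False
    return True

ledger = []
add_record(ledger, {"asset": "HVAC-03", "action": "filter replaced"})
add_record(ledger, {"asset": "HVAC-03", "action": "inspection passed"})
print(verify(ledger))  # True; editing any earlier entry makes this False
```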

25 pages, 1339 KB  
Article
Link-State-Aware Proactive Data Delivery in Integrated Satellite–Terrestrial Networks for Multi-Modal Remote Sensing
by Ranshu Peng, Chunjiang Bian, Shi Chen and Min Wu
Remote Sens. 2025, 17(11), 1905; https://doi.org/10.3390/rs17111905 - 30 May 2025
Viewed by 985
Abstract
This paper seeks to address the limitations of conventional remote sensing data dissemination algorithms, particularly their inability to model fine-grained multi-modal heterogeneous feature correlations and adapt to dynamic network topologies under resource constraints. This paper proposes multi-modal-MAPPO, a novel multi-modal deep reinforcement learning (MDRL) framework designed for proactive data push in large-scale integrated satellite–terrestrial networks (ISTNs). By integrating satellite cache states, user cache states, and multi-modal data attributes (including imagery, metadata, and temporal request patterns) into a unified Markov decision process (MDP), our approach pioneers the application of the multi-actor-attention-critic with parameter sharing (MAPPO) algorithm to ISTN push tasks. Central to this framework is a dual-branch actor network architecture that dynamically fuses heterogeneous modalities: a lightweight MobileNet-v3-small backbone extracts semantic features from remote sensing imagery, while parallel branches—a multi-layer perceptron (MLP) for static attributes (e.g., payload specifications, geolocation tags) and a long short-term memory (LSTM) network for temporal user cache patterns—jointly model contextual and historical dependencies. A dynamically weighted attention mechanism further adapts modality-specific contributions to enhance cross-modal correlation modeling in complex, time-varying scenarios. To mitigate the curse of dimensionality in high-dimensional action spaces, we introduce a multi-dimensional discretization strategy that decomposes decisions into hierarchical sub-policies, balancing computational efficiency and decision granularity. Comprehensive experiments against state-of-the-art baselines (MAPPO, MAAC) demonstrate that multi-modal-MAPPO reduces the average content delivery latency by 53.55% and 29.55%, respectively, while improving push hit rates by 0.1718 and 0.4248. These results establish the framework as a scalable and adaptive solution for real-time intelligent data services in next-generation ISTNs, addressing critical challenges in resource-constrained, dynamic satellite–terrestrial environments. Full article
(This article belongs to the Special Issue Advances in Multi-Source Remote Sensing Data Fusion and Analysis)
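
A rough PyTorch sketch of the dual-branch actor idea, with an MLP for static attributes and an LSTM for temporal cache patterns fused by learned attention weights; all sizes are placeholders, and the MobileNet image branch and the MAPPO training loop are omitted:

```python
# Hedged sketch of a dual-branch, attention-fused actor network.
import torch
import torch.nn as nn

class DualBranchActor(nn.Module):
    def __init__(self, static_dim=16, seq_dim=8, hidden=64, n_actions=10):
        super().__init__()
        self.static_branch = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())
        self.temporal_branch = nn.LSTM(seq_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # scores each modality
        self.policy_head = nn.Linear(hidden, n_actions)

    def forward(self, static_x, seq_x):
        s = self.static_branch(static_x)                 # (B, hidden)
        _, (h, _) = self.temporal_branch(seq_x)          # h: (1, B, hidden)
        feats = torch.stack([s, h[-1]], dim=1)           # (B, 2, hidden)
        w = torch.softmax(self.attn(feats), dim=1)       # modality weights
        fused = (w * feats).sum(dim=1)
        return torch.distributions.Categorical(logits=self.policy_head(fused))

dist = DualBranchActor()(torch.randn(4, 16), torch.randn(4, 20, 8))
print(dist.sample())  # one push action per environment in the batch
```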

26 pages, 2363 KB  
Article
Generative Artificial Intelligence-Enabled Facility Layout Design Paradigm
by Fuwen Hu, Chun Wang and Xuefei Wu
Appl. Sci. 2025, 15(10), 5697; https://doi.org/10.3390/app15105697 - 20 May 2025
Cited by 2 | Viewed by 4304
Abstract
Facility layout design (FLD) is critical for optimizing manufacturing efficiency, yet traditional approaches struggle with complexity, dynamic constraints, and fragmented data integration. This study proposes a generative-AI-enabled facility layout design, a novel paradigm aligning with Industry 4.0, to address these challenges by integrating generative artificial intelligence (AI), semantic models, and data-driven optimization. The proposed method evolves from three historical paradigms: experience-based methods, operations research, and simulation-based engineering. The metamodel supporting the generative-AI-enabled facility layout design is the Asset Administration Shell (AAS), which digitizes physical assets and their relationships, enabling interoperability across systems. Domain-specific knowledge graphs, constructed by parsing AAS metadata and enriched by large language models (LLMs), capture multifaceted relationships (e.g., spatial adjacency, process dependencies, safety constraints) to guide layout generation. The convolutional knowledge graph embedding (ConvE) method is employed for link prediction, converting entities and relationships into low-dimensional vectors to infer optimal spatial arrangements while addressing data sparsity through negative sampling. The proposed reference architecture for generative-AI-enabled facility layout design supports end-to-end layout design, featuring a 3D visualization engine, AI-driven optimization, and real-time digital twins. Prototype testing demonstrates the system's end-to-end generation ability from requirement-driven contextual prompts and a greatly reduced complexity of modeling, integration, and optimization. Key innovations include the fusion of AAS with LLM-derived contextual knowledge, dynamic adaptation via big data streams, and a hybrid optimization approach balancing competing objectives. The 3D layout generation results demonstrate a scalable, adaptive solution for storage workshops, bridging gaps between isolated data models and human–AI collaboration. This research establishes a foundational framework for AI-driven facility planning, offering actionable insights for AI-enabled facility layout design adoption and highlighting future directions in the generative design of complex engineering. Full article
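
A minimal PyTorch sketch of ConvE-style link-prediction scoring as used for inferring spatial arrangements: entity and relation embeddings are reshaped into a 2D grid, convolved, and projected back to embedding space; all dimensions are illustrative:

```python
# Hedged sketch of ConvE scoring; sizes are not the paper's settings.
import torch
import torch.nn as nn

class ConvE(nn.Module):
    def __init__(self, n_entities=1000, n_relations=20, dim=200):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.conv = nn.Conv2d(1, 32, kernel_size=3)
        self.fc = nn.Linear(32 * 18 * 18, dim)   # for a 20x20 stacked input

    def forward(self, head, relation):
        h = self.ent(head).view(-1, 1, 10, 20)   # reshape embeddings to 2D
        r = self.rel(relation).view(-1, 1, 10, 20)
        x = torch.relu(self.conv(torch.cat([h, r], dim=2)))  # (B, 32, 18, 18)
        x = self.fc(x.flatten(1))
        return x @ self.ent.weight.t()           # score every candidate tail

scores = ConvE()(torch.tensor([0, 1]), torch.tensor([3, 3]))
print(scores.shape)  # torch.Size([2, 1000])
```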

15 pages, 29428 KB  
Article
Color as a High-Value Quantitative Tool for PET/CT Imaging
by Michail Marinis, Sofia Chatziioannou and Maria Kallergi
Information 2025, 16(5), 352; https://doi.org/10.3390/info16050352 - 27 Apr 2025
Viewed by 1029
Abstract
The successful application of artificial intelligence (AI) techniques for the quantitative analysis of hybrid medical imaging data such as PET/CT is challenged by the differences in the type of information and image quality between the two modalities. The purpose of this work was to develop color-based, pre-processing methodologies for PET/CT data that could yield a better starting point for subsequent diagnosis and image processing and analysis. Two methods are proposed that are based on the encoding of Hounsfield Units (HU) and Standardized Uptake Values (SUVs) in separate transformed .png files as reversible color information, in combination with basic .png metadata derived from DICOM attributes. The proposed methodologies were implemented in Python on Ubuntu Linux and pilot-tested on brain 18F-FDG PET/CT scans acquired with different PET/CT systems. The range of HUs and SUVs was mapped using novel weighted color distribution functions that allowed for a balanced representation of the data and an improved visualization of anatomic and metabolic differences. The pilot application of the proposed mapping codes yielded CT and PET images where it was easier to pinpoint variations in anatomy and metabolic activity and offered a potentially better starting point for the subsequent fully automated quantitative analysis of specific regions of interest or observer evaluation. It should be noted that the output .png files contained all the raw values and may be treated as raw DICOM input data. Full article
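
A hedged sketch of the reversible-encoding idea, packing each Hounsfield Unit into two 8-bit PNG channels so the exact value can be recovered; the offset and channel layout are assumptions, not the paper's weighted mapping functions:

```python
# Reversible HU-to-color packing: high byte in red, low byte in green.
import numpy as np
from PIL import Image

def encode_hu(hu: np.ndarray) -> Image.Image:
    shifted = (hu.astype(np.int32) + 1024).clip(0, 65535).astype(np.uint16)
    rgb = np.zeros((*hu.shape, 3), dtype=np.uint8)
    rgb[..., 0] = shifted >> 8          # high byte in red
    rgb[..., 1] = shifted & 0xFF        # low byte in green
    return Image.fromarray(rgb)

def decode_hu(img: Image.Image) -> np.ndarray:
    rgb = np.asarray(img).astype(np.int32)
    return ((rgb[..., 0] << 8) | rgb[..., 1]) - 1024

hu = np.array([[-1000, 0], [40, 3000]])          # air, water, tissue, bone
assert (decode_hu(encode_hu(hu)) == hu).all()    # lossless round trip
```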

10 pages, 3532 KB  
Proceeding Paper
RoBuCACO: ChatGPT-Based Educational Model for Creative Problem-Solving
by Jung-Suk Hyun and Chan-Jung Park
Eng. Proc. 2025, 89(1), 40; https://doi.org/10.3390/engproc2025089040 - 20 Mar 2025
Viewed by 638
Abstract
Generative artificial intelligence (AI), including ChatGPT-4o, is increasingly used across various sectors such as education. In this article, we introduce a new educational model, RoBuCACO, which integrates the Butterfly Model for creative problem-solving into a ChatGPT-based framework. ChatGPT is trained on the Butterfly Model using Korean patent data to generate patent metadata. Users follow a structured learning process that includes the definition of roles for ChatGPT (Ro), learning the Butterfly Model (Bu), defining problems and contradictions (C), developing both abstract (A) and concrete solutions (C), and refining optimal (O) solutions. Korean patent metadata are used to obtain concrete solutions and collaborate with ChatGPT to iteratively refine optimal solutions. Full article
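
A small sketch of the role-definition (Ro) step expressed as an OpenAI API call; the prompt text is an invented illustration of the model's structure and assumes an OPENAI_API_KEY in the environment:

```python
# Hedged sketch of the Ro step; the prompt wording is illustrative only.
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system",
     "content": "You are a creative problem-solving tutor who applies the "
                "Butterfly Model: identify the contradiction, then propose "
                "abstract and concrete solutions before refining an optimal one."},
    {"role": "user",
     "content": "Problem: a phone case must be rigid for protection yet "
                "flexible for easy installation. Identify the contradiction."},
]
reply = client.chat.completions.create(model="gpt-4o", messages=messages)
print(reply.choices[0].message.content)
```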

17 pages, 1507 KB  
Article
A Data-Driven Decision-Making Support Method for Priority Determination for an Intelligent Road Problem Reporting System
by Woohoon Jeon, Jinguk Kim and Joyoung Lee
Appl. Sci. 2024, 14(23), 10861; https://doi.org/10.3390/app142310861 - 23 Nov 2024
Viewed by 1426
Abstract
This paper presents a new decision support method aimed at prioritizing the processing of reports in an intelligent road problem reporting service. The proposed method uses advanced georeferencing technology to extract the longitude and latitude coordinates from the metadata of photos taken with the smartphone application used to capture the complaint scene. This method not only maps out the processing times, but also applies a spatiotemporal clustering technique to link the complaint types and locations with the actual complaint processing times. A validation study of the frequency of reported locations per priority reveals that the complaint-processing prioritization method developed in this study aligns realistically with actual field complaint processing. Furthermore, recognizing the significance of location in processing complaints, the georeferencing technique appears suitable for identifying complaint locations for each report and incorporating this into the decision-making framework. Full article
(This article belongs to the Special Issue Advances in Intelligent Transportation Systems)
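
A sketch of the georeferencing step, reading GPS coordinates from a photo's EXIF metadata with Pillow; tag 34853 is the standard GPSInfo IFD, and the file name is a placeholder:

```python
# Extract latitude/longitude from photo EXIF metadata.
from PIL import Image

def gps_from_photo(path: str):
    exif = Image.open(path).getexif()
    gps = exif.get_ifd(34853)            # GPSInfo IFD
    if not gps:
        return None
    def to_deg(dms, ref):
        d, m, s = (float(x) for x in dms)
        deg = d + m / 60 + s / 3600
        return -deg if ref in ("S", "W") else deg
    lat = to_deg(gps[2], gps[1])         # GPSLatitude, GPSLatitudeRef
    lon = to_deg(gps[4], gps[3])         # GPSLongitude, GPSLongitudeRef
    return lat, lon

print(gps_from_photo("complaint_photo.jpg"))
```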

30 pages, 3456 KB  
Article
Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology Using Large Language Models—A Case in Optimizing Intermodal Freight Transportation
by Jose Tupayachi, Haowen Xu, Olufemi A. Omitaomu, Mustafa Can Camur, Aliza Sharmin and Xueping Li
Smart Cities 2024, 7(5), 2392-2421; https://doi.org/10.3390/smartcities7050094 - 31 Aug 2024
Cited by 17 | Viewed by 5106
Abstract
The incorporation of Artificial Intelligence (AI) models into various optimization systems is on the rise. However, addressing complex urban and environmental management challenges often demands deep expertise in domain science and informatics. This expertise is essential for deriving data and simulation-driven insights that support informed decision-making. In this context, we investigate the potential of leveraging the pre-trained Large Language Models (LLMs) to create knowledge representations for supporting operations research. By adopting ChatGPT-4 API as the reasoning core, we outline an applied workflow that encompasses natural language processing, Methontology-based prompt tuning, and Generative Pre-trained Transformer (GPT), to automate the construction of scenario-based ontologies using existing research articles and technical manuals of urban datasets and simulations. From these ontologies, knowledge graphs can be derived using widely adopted formats and protocols, guiding various tasks towards data-informed decision support. The performance of our methodology is evaluated through a comparative analysis that contrasts our AI-generated ontology with the widely recognized pizza ontology, commonly used in tutorials for popular ontology software. We conclude with a real-world case study on optimizing the complex system of multi-modal freight transportation. Our approach advances urban decision support systems by enhancing data and metadata modeling, improving data integration and simulation coupling, and guiding the development of decision support strategies and essential software components. Full article
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)
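
An illustrative sketch of the output stage of such a workflow: classes and relations proposed by the LLM (hard-coded stand-ins here for the model's response) are materialized as an OWL ontology with rdflib:

```python
# Hedged sketch: LLM-proposed concepts to an OWL ontology.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/freight#")
g = Graph()
g.bind("ex", EX)

# Stand-in for entities/relations an LLM might extract from domain texts.
llm_output = {
    "classes": ["Terminal", "RailCorridor", "FreightShipment"],
    "relations": [("FreightShipment", "routedThrough", "RailCorridor")],
}

for cls in llm_output["classes"]:
    g.add((EX[cls], RDF.type, OWL.Class))
for dom, prop, rng in llm_output["relations"]:
    g.add((EX[prop], RDF.type, OWL.ObjectProperty))
    g.add((EX[prop], RDFS.domain, EX[dom]))
    g.add((EX[prop], RDFS.range, EX[rng]))

print(g.serialize(format="turtle"))
```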

22 pages, 5745 KB  
Article
GenAI-Assisted Database Deployment for Heterogeneous Indigenous–Native Ethnographic Research Data
by Reen-Cheng Wang, David Yang, Ming-Che Hsieh, Yi-Cheng Chen and Weihsuan Lin
Appl. Sci. 2024, 14(16), 7414; https://doi.org/10.3390/app14167414 - 22 Aug 2024
Viewed by 2019
Abstract
In ethnographic research, data collected through surveys, interviews, or questionnaires in the fields of sociology and anthropology often appear in diverse forms and languages. Building a powerful database system to store and process such data, as well as making good and efficient queries, is very challenging. This paper extensively investigates modern database technology to determine the best technologies for storing these varied and heterogeneous datasets. The study examines several database categories: traditional relational databases, the NoSQL family of key-value databases, graph databases, document databases, object-oriented databases, and vector databases, which are crucial for the latest artificial intelligence solutions. The research shows that for field data the NoSQL family is the most appropriate, especially document and graph databases. The simplicity and flexibility of document databases, together with the ability of graph databases to handle complex queries and rich data relationships, make these two types of NoSQL databases the ideal choice when a large amount of data has to be processed. Advancements in vector databases that embed custom metadata offer new possibilities for detailed analysis and retrieval. However, converting contents into vector data remains challenging, especially in regions with unique oral traditions and languages. Constructing such databases is labor-intensive and requires domain experts to define metadata and relationships, posing a significant burden for research teams with extensive data collections. To this end, this paper proposes using Generative AI (GenAI) to help in the data-transformation process, a recommendation supported by testing in which GenAI has proven itself a strong supplement to document and graph databases. It also discusses two methods of vector database support that are currently viable, each with its own drawbacks and benefits. Full article
(This article belongs to the Topic Innovation, Communication and Engineering)
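
A hedged sketch of the document-database approach for heterogeneous field records, where schema-free documents tolerate the varied shapes of interview and survey data; it assumes a local MongoDB instance on the default port, with illustrative collection and field names:

```python
# Schema-free storage of heterogeneous ethnographic records.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["ethnography"]

db.records.insert_many([
    {"type": "interview", "community": "Amis", "language": "Amis",
     "transcript": "…", "tags": ["oral tradition", "fishing"]},
    {"type": "survey", "community": "Paiwan", "year": 2023,
     "responses": {"q1": 4, "q2": "agree"}},      # different shape, same store
])

# Query across heterogeneous documents by shared metadata fields.
for doc in db.records.find({"tags": "oral tradition"}):
    print(doc["type"], doc["community"])
```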

22 pages, 3698 KB  
Article
An Email Cyber Threat Intelligence Method Using Domain Ontology and Machine Learning
by Algimantas Venčkauskas, Jevgenijus Toldinas, Nerijus Morkevičius and Filippo Sanfilippo
Electronics 2024, 13(14), 2716; https://doi.org/10.3390/electronics13142716 - 11 Jul 2024
Cited by 3 | Viewed by 2445
Abstract
Email is an excellent technique for connecting users at low cost. Spam emails pose the risk of collecting a user’s personal information by fooling them into clicking on a link or engaging in other fraudulent activities. Furthermore, when a spam message is delivered, the user may read the entire message before deciding it is spam and deleting it. Most approaches to email classification proposed by other authors use natural language processing (NLP) methods to analyze the content of email messages. One of the biggest shortcomings of NLP-based methods is their dependence on the language in which a message is written. To construct an effective email cyber threat intelligence (CTI) sharing framework, the privacy of a message’s content must be preserved. This article proposes a novel domain-specific ontology and method for emails that require only the metadata of email messages to be shared to preserve their privacy, making them applicable to solutions for sharing email CTI. To preserve privacy, a new semantic parser was developed for the proposed email domain-specific ontology to populate email metadata and create a dataset. Machine learning algorithms were examined, and experiments were conducted to identify and classify spam messages using the newly created dataset. Feature-ranking algorithms, chi-squared, ANOVA (analysis of variance), and Kruskal–Wallis tests were used. In all experiments, the kernel naïve Bayes model demonstrated acceptable results. The highest accuracy of 92.28% and an F1 score of 95.92% for recognizing spam email messages were obtained using the proposed domain-specific ontology, the newly developed semantic parser, and the created metadata dataset. Full article
(This article belongs to the Special Issue Recent Advances in Intrusion Detection Systems Using Machine Learning)
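
A small sketch of the metadata-only classification idea: numeric features derived from email headers, with no message content, feed a naive Bayes model. GaussianNB stands in for the paper's kernel naive Bayes, and the features and data are invented placeholders:

```python
# Hedged sketch of metadata-based spam classification.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import GaussianNB

# Columns: n_recipients, n_links, subject_length, hour_sent
X = np.array([[1, 0, 24, 9], [45, 12, 60, 3], [2, 1, 30, 14], [80, 9, 72, 4]])
y = np.array([0, 1, 0, 1])               # 0 = legitimate, 1 = spam

X_sel = SelectKBest(chi2, k=2).fit_transform(X, y)   # rank metadata features
clf = GaussianNB().fit(X_sel, y)
print(clf.predict(X_sel))                # predictions on the toy data
```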
