Search Results (10)

Search Parameters:
Keywords = commonsense question and answering

23 pages, 1882 KiB  
Article
Attention Mechanism-Based Cognition-Level Scene Understanding
by Xuejiao Tang and Wenbin Zhang
Information 2025, 16(3), 203; https://doi.org/10.3390/info16030203 - 5 Mar 2025
Viewed by 848
Abstract
Given a question–image input, a visual commonsense reasoning (VCR) model predicts an answer with a corresponding rationale, which requires inference abilities based on real-world knowledge. The VCR task, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge, is a cognition-level scene understanding challenge. The VCR task has attracted researchers’ interest due to its wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to solving the VCR task have generally relied on pre-training or on memory-based models that encode long-term dependency relationships. However, these approaches suffer from a lack of generalizability and a loss of information in long sequences. In this work, we propose a parallel attention-based cognitive VCR network, termed PAVCR, which fuses visual–textual information efficiently and encodes semantic information in parallel, enabling the model to capture rich information for cognition-level inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, the proposed model provides an intuitive interpretation of visual commonsense reasoning.
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)

14 pages, 1443 KiB  
Article
A Dynamic Graph Reasoning Model with an Auxiliary Task for Knowledge Base Question Answering
by Zhichao Wu and Xuan Tian
Electronics 2024, 13(24), 5011; https://doi.org/10.3390/electronics13245011 - 20 Dec 2024
Viewed by 938
Abstract
In the field of question answering (QA), large language models (LLMs) cannot learn vertical domain knowledge during the pre-training stage, leading to low accuracy in domain QA. Conversely, knowledge base question answering (KBQA) can combine a knowledge base (KB) that contains domain knowledge with small language models to achieve high accuracy at low cost. In KBQA, the inference subgraph is composed of entity nodes and the relationships among them that are pertinent to the question, with the final answers being derived from this subgraph. However, two critical problems remain in this field: (i) a fixed or shrinking scope of the inference subgraph over the reasoning process may restrict the knowledge available for KBQA, and (ii) a lack of alignment between the inference subgraph and the question leads to low accuracy. In this work, we propose a dynamic graph reasoning model with an auxiliary task, DGRMWAT, which addresses these challenges through two key innovations: (i) dynamic graph reasoning, whereby we update the scope of the inference subgraph at each reasoning step to obtain more relevant knowledge and to reduce irrelevant knowledge, and (ii) an auxiliary task that enhances the correlation between the inference subgraph and the question by computing the similarity between the inference subgraph and the QA context node. Experiments on two QA benchmark datasets, CommonsenseQA and OpenbookQA, indicate that DGRMWAT improves upon the baseline models and LLMs.

17 pages, 5434 KiB  
Article
Parallel Fusion of Graph and Text with Semantic Enhancement for Commonsense Question Answering
by Jiachuang Zong, Zhao Li, Tong Chen, Liguo Zhang and Yiming Zhan
Electronics 2024, 13(23), 4618; https://doi.org/10.3390/electronics13234618 - 22 Nov 2024
Viewed by 819
Abstract
Commonsense question answering (CSQA) is a challenging task in the field of knowledge graph question answering. It combines the context of the question with relevant knowledge in the knowledge graph to reason out an answer. Existing CSQA models combine pretrained language models and graph neural networks to process the question context and knowledge graph information, respectively, exchanging information during reasoning to improve accuracy. However, existing models do not fully exploit the post-reasoning textual and graph representations when deriving the answer, and they give the edges of the knowledge graph too little semantic representation during reasoning. Therefore, we propose a novel parallel fusion framework for text and knowledge graphs, using the fused global graph information to enhance the semantic information of reasoning answers. In addition, we enhance the relationship embedding by enriching the initial semantics and adjusting the initial weight distribution, thereby improving the reasoning ability of the graph neural network. We conducted experiments on two public datasets, CommonsenseQA and OpenBookQA, and found that our model is competitive with other baseline models. Additionally, we validated the generalizability of our model on the MedQA-USMLE dataset.

14 pages, 793 KiB  
Article
Unlocking Everyday Wisdom: Enhancing Machine Comprehension with Script Knowledge Integration
by Zhihao Zhou, Tianwei Yue, Chen Liang, Xiaoyu Bai, Dachi Chen, Congrui Hetang and Wenping Wang
Appl. Sci. 2023, 13(16), 9461; https://doi.org/10.3390/app13169461 - 21 Aug 2023
Cited by 3 | Viewed by 1601
Abstract
Harnessing commonsense knowledge poses a significant challenge for machine comprehension systems. This paper focuses on incorporating a specific subset of commonsense knowledge: script knowledge, which concerns the sequences of actions that individuals typically perform in everyday life. Our experiments centered on the MCScript dataset, which was the basis of SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge. As a baseline, we utilized our Three-Way Attentive Networks (TriAN) framework to model the interactions among passages, questions, and answers. Building upon TriAN, we proposed to: (1) integrate a pre-trained language model to capture script knowledge; (2) introduce multi-layer attention to facilitate multi-hop reasoning; and (3) incorporate positional embeddings to enhance the model’s capacity for event-ordering reasoning. In this paper, we present our proposed methods and demonstrate their efficacy in improving script knowledge integration and reasoning.
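The third extension above gives the model positional information so it can reason about event order. As an illustration of what such embeddings look like, the sketch below implements the standard sinusoidal positional embedding; this is a common formulation, not necessarily the authors' exact choice, and the function name is illustrative.

```python
# Sinusoidal positional embedding: even dimensions use sin, odd use cos,
# with wavelengths increasing geometrically across the dimensions.
import math

def positional_embedding(pos, dim):
    """Return the `dim`-dimensional embedding for token position `pos`."""
    emb = []
    for i in range(0, dim, 2):
        # Frequency shrinks as the dimension index grows.
        angle = pos / (10000 ** (i / dim))
        emb.append(math.sin(angle))
        if i + 1 < dim:
            emb.append(math.cos(angle))
    return emb
```

Because the embedding depends only on the position index, two events in a passage receive distinct, order-aware vectors that can simply be added to their token embeddings.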
(This article belongs to the Special Issue Text Mining, Machine Learning, and Natural Language Processing)

12 pages, 1840 KiB  
Article
Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering
by Yuchen Sha, Yujian Feng, Miao He, Shangdong Liu and Yimu Ji
Mathematics 2023, 11(15), 3269; https://doi.org/10.3390/math11153269 - 25 Jul 2023
Cited by 10 | Viewed by 4582
Abstract
Existing knowledge graph (KG) models for commonsense question answering face two challenges: (i) existing methods retrieve entities related to the question from the knowledge graph, which may extract noisy and irrelevant nodes, and (ii) there is a lack of interaction representation between questions and graph entities. In this paper, we propose a novel retrieval-augmented knowledge graph (RAKG) model, which solves these issues with two key innovations. First, we leverage the density matrix to make the model reason along the correct knowledge path and to extract an enhanced subgraph of the knowledge graph. Second, we fuse the representations of questions and graph entities through a bidirectional attention strategy, in which the two representations are fused and updated using a graph convolutional network (GCN). To evaluate the performance of our method, we conducted experiments on two widely used benchmark datasets: CommonsenseQA and OpenBookQA. A case study shows that the augmented subgraph provides reasoning along the correct knowledge path for question answering.
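The GCN used in the fusion step above follows a standard message-passing pattern: each node averages its neighbors' features under symmetric normalization, then applies a learned linear map and a nonlinearity. A minimal pure-Python sketch of one such layer is given below; the names and list-based representation are illustrative, not the RAKG implementation.

```python
# One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
import math

def gcn_layer(adj, feats, weight):
    """adj: n x n 0/1 adjacency; feats: n x d features; weight: d x k."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: divide each edge by sqrt(deg_i * deg_j).
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    d = len(feats[0])
    # Aggregate neighbor features: M = A_norm @ H.
    agg = [[sum(a_norm[i][j] * feats[j][c] for j in range(n)) for c in range(d)]
           for i in range(n)]
    k = len(weight[0])
    # Linear transform plus ReLU: H' = max(0, M @ W).
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d)))
             for o in range(k)] for i in range(n)]
```

In a bidirectional-attention setting, the question representation would be attended against the node features before and after each such layer; a real model would of course use a tensor library rather than nested lists.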
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)

20 pages, 4260 KiB  
Article
MKBQA: Question Answering over Knowledge Graph Based on Semantic Analysis and Priority Marking Method
by Xiang Wang, Yanchao Li, Huiyong Wang and Menglong Lv
Appl. Sci. 2023, 13(10), 6104; https://doi.org/10.3390/app13106104 - 16 May 2023
Cited by 5 | Viewed by 2059
Abstract
In the field of knowledge graph-based question answering, the complexity of constructing knowledge graphs means that a domain-specific knowledge graph often lacks common-sense knowledge, making it impossible to answer questions that involve common-sense and domain knowledge at the same time. Therefore, this study proposes a knowledge graph-based question answering method for the computer science domain, which facilitates obtaining complete answers in this domain. To address the difficulty of matching natural language questions with structured knowledge, a series of logic rules is first designed to convert natural language into question triples. Then, a semantic query expansion strategy based on WordNet and a priority marking algorithm are proposed to order the question triples. Finally, when a question triple corresponds to multiple triples in the knowledge graph, the ambiguity is resolved by the proposed SimCSE-based similarity method. The designed logic rules handle each type of question in a targeted manner according to its question words and effectively transform the question text into question triples, while the priority marking algorithm effectively orders them. MKBQA can answer not only computer science-related questions but also extended open-domain questions. In practical applications, answering a domain question often cannot rely on one knowledge graph alone; domain knowledge and common-sense knowledge must be combined. The MKBQA method provides a new idea and can be easily migrated from the field of computer science to other fields. Experimental results on real-world datasets show that, compared to baselines, our method achieves significant improvements in question answering and can combine common-sense and domain-specific knowledge graphs to give more complete answers.
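The final disambiguation step, where one question triple matches several knowledge graph triples, amounts to ranking the candidates by embedding similarity. The paper uses SimCSE sentence embeddings; the sketch below substitutes bag-of-words vectors so it stays self-contained, and all function names are illustrative. Swapping in real SimCSE vectors would change only `embed`.

```python
# Rank candidate KG triples against a question triple by cosine similarity.
import math
from collections import Counter

def embed(text):
    # Stand-in for a sentence encoder: sparse bag-of-words counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(question_triple, candidates):
    """Return the candidate triple most similar to the question triple."""
    q = embed(question_triple)
    return max(candidates, key=lambda c: cosine(q, embed(c)))
```

For example, given a question triple verbalized as "who created python", a candidate mentioning "python" and "created" outranks one that shares only "created".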

12 pages, 1544 KiB  
Article
Semantic Representation Using Sub-Symbolic Knowledge in Commonsense Reasoning
by Dongsuk Oh, Jungwoo Lim, Kinam Park and Heuiseok Lim
Appl. Sci. 2022, 12(18), 9202; https://doi.org/10.3390/app12189202 - 14 Sep 2022
Cited by 2 | Viewed by 2388
Abstract
A commonsense question answering (CSQA) system predicts the right answer based on a comprehensive understanding of the question. Previous research has developed models that use QA pairs, the corresponding evidence, or a knowledge graph as input. Each method executes QA tasks with representations from pre-trained language models. However, whether pre-trained language models comprehend questions completely remains debatable. In this study, adversarial attack experiments were conducted on question understanding. We examined the restrictions on the question-reasoning process of the pre-trained language model and then demonstrated the need for models to use the logical structure of abstract meaning representations (AMRs). Additionally, the experimental results demonstrated that the method performed best when the AMR graph was extended with ConceptNet. With this extension, our proposed method outperformed the baseline in diverse commonsense-reasoning QA tasks.

25 pages, 4727 KiB  
Article
An Interactive Virtual Home Navigation System Based on Home Ontology and Commonsense Reasoning
by Alan Schalkwijk, Motoki Yatsu and Takeshi Morita
Information 2022, 13(6), 287; https://doi.org/10.3390/info13060287 - 6 Jun 2022
Cited by 4 | Viewed by 2966
Abstract
In recent years, researchers from the fields of computer vision, language, graphics, and robotics have tackled Embodied AI research. Embodied AI can learn through interaction with the real world and virtual environments and can perform various tasks in virtual environments using virtual robots. However, many of these are one-way tasks, in which the interaction ends once the user’s question or request has been answered. In this research, we aim to develop a two-way interactive navigation system by introducing knowledge-based reasoning into Embodied AI research. Specifically, the system obtains guidance candidates that are difficult to identify with existing common-sense reasoning alone by reasoning over a constructed home ontology. We then develop a two-way interactive navigation system in which a virtual robot guides the user to the location in the virtual home environment that the user needs, over the course of multiple conversations with the user. We evaluated whether the proposed system could present appropriate guidance locations as candidates based on users’ speech about their home environment. For the evaluation, we prepared speech data extracted from a corpus of daily conversation, speech data created by a subject, and correct-answer data for each, and calculated the precision, recall, and F-measure. The F-measure was 0.47 for the evaluation data extracted from the daily conversation corpus and 0.49 for the evaluation data created by the subject.
(This article belongs to the Special Issue Knowledge Graph Technology and Its Applications)

11 pages, 385 KiB  
Article
Considering Commonsense in Solving QA: Reading Comprehension with Semantic Search and Continual Learning
by Seungwon Jeong, Dongsuk Oh, Kinam Park and Heuiseok Lim
Appl. Sci. 2022, 12(9), 4099; https://doi.org/10.3390/app12094099 - 19 Apr 2022
Viewed by 2177
Abstract
Unlike previous dialogue-based question-answering (QA) datasets, DREAM, a multiple-choice Dialogue-based REAding comprehension exaMination dataset, requires a deep understanding of dialogue. Many of its problems require multi-sentence reasoning, and some require commonsense reasoning. However, most pre-trained language models (PTLMs) do not consider commonsense. In addition, because the maximum number of tokens that a language model (LM) can handle is limited, the entire dialogue history cannot be included, and the resulting information loss adversely affects performance. To address these problems, we propose a Dialogue-based QA model with Common-sense Reasoning (DQACR), a language model that exploits Semantic Search and continual learning. We used Semantic Search to compensate for the information lost from truncated dialogue, and we used Semantic Search and continual learning to improve the PTLM’s commonsense reasoning. Our model achieves an improvement of approximately 1.5% over the baseline method and can thus facilitate QA-related tasks. It contributes not only to dialogue-based QA tasks but also to other forms of QA datasets for future tasks.
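The idea of using semantic search to recover information lost to truncation can be sketched as retrieving the dialogue turns most relevant to the question and re-including them within the token budget. The version below ranks turns by keyword overlap purely as a self-contained stand-in for dense-embedding search; the function name and ranking criterion are illustrative, not the DQACR implementation.

```python
# Pick the k dialogue turns most relevant to the question, so they can be
# appended back after the history has been truncated to fit the LM.
def retrieve_turns(question, turns, k=2):
    """Return the k turns sharing the most words with the question."""
    q = set(question.lower().split())
    scored = sorted(turns,
                    key=lambda t: len(q & set(t.lower().split())),
                    reverse=True)
    return scored[:k]
```

A dense retriever would replace the overlap score with cosine similarity between sentence embeddings, but the control flow, scoring every turn and keeping the top k, stays the same.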

19 pages, 9133 KiB  
Article
Vision–Language–Knowledge Co-Embedding for Visual Commonsense Reasoning
by JaeYun Lee and Incheol Kim
Sensors 2021, 21(9), 2911; https://doi.org/10.3390/s21092911 - 21 Apr 2021
Cited by 7 | Viewed by 5116
Abstract
Visual commonsense reasoning is the task of deciding the most appropriate answer to a question, while providing the rationale for that answer, when an image, a natural language question, and candidate responses are given. For effective visual commonsense reasoning, both the knowledge acquisition problem and the multimodal alignment problem need to be solved. Therefore, we propose a novel Vision–Language–Knowledge Co-embedding (ViLaKC) model that extracts knowledge graphs relevant to the question from an external knowledge base, ConceptNet, and uses them together with the input image to answer the question. The proposed model uses a pretrained vision–language–knowledge embedding module, which co-embeds multimodal data, including images, natural language texts, and knowledge graphs, into a single feature vector. To reflect the structural information of the knowledge graph, the proposed model first embeds the knowledge graph with a graph convolutional neural network layer and then uses multi-head self-attention layers to co-embed it with the image and the natural language question. The effectiveness and performance of the proposed model are experimentally validated on the VCR v1.0 benchmark dataset.
(This article belongs to the Special Issue Human-Computer Interaction in Smart Environments)
