Search Results (23)

Search Parameters:
Keywords = scene graph reasoning

21 pages, 2248 KB  
Article
V-PTP-IC: End-to-End Joint Modeling of Dynamic Scenes and Social Interactions for Pedestrian Trajectory Prediction from Vehicle-Mounted Cameras
by Siqi Bai, Yuwei Fang and Hongbing Li
Sensors 2025, 25(23), 7151; https://doi.org/10.3390/s25237151 - 23 Nov 2025
Viewed by 544
Abstract
Pedestrian trajectory prediction from a vehicle-mounted perspective is essential for autonomous driving in complex urban environments yet remains challenging due to ego-motion jitter, frequent occlusions, and scene variability. Existing approaches, largely developed for static surveillance views, struggle to cope with continuously shifting viewpoints. To address these issues, we propose V-PTP-IC, an end-to-end framework that stabilizes motion, models inter-agent interactions, and fuses multi-modal cues for trajectory prediction. The system integrates Simple Online and Realtime Tracking (SORT)-based tracklet augmentation, Scale-Invariant Feature Transform (SIFT)-assisted ego-motion compensation, graph-based interaction reasoning, and multi-head attention fusion, followed by Long Short-Term Memory (LSTM) decoding. Experiments on the JAAD and PIE datasets demonstrate that V-PTP-IC substantially outperforms existing baselines, reducing ADE by 27.23% and 25.73% and FDE by 33.88% and 32.85%, respectively. This advances dynamic scene understanding for safer autonomous systems. Full article
(This article belongs to the Section Vehicular Sensing)
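The abstract names the final stages of the pipeline: multi-head attention fusion of motion, interaction, and scene cues followed by LSTM decoding. Below is a minimal PyTorch sketch of such a fusion-plus-decoder stage; the module name, feature dimensions, and 12-step horizon are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionLSTMDecoder(nn.Module):
    """Fuse per-pedestrian cue embeddings with multi-head attention,
    then roll out future displacements with an LSTM decoder."""
    def __init__(self, d_model=64, n_heads=4, horizon=12):
        super().__init__()
        self.horizon = horizon
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, 2)  # (dx, dy) per future step

    def forward(self, cues):
        # cues: (batch, n_cues, d_model) -- e.g. motion, interaction, scene tokens
        fused, _ = self.attn(cues, cues, cues)   # cross-cue attention
        ctx = fused.mean(dim=1, keepdim=True)    # pooled context token
        steps = ctx.repeat(1, self.horizon, 1)   # one input per future step
        out, _ = self.decoder(steps)
        return self.head(out)                    # (batch, horizon, 2)

# toy usage: 8 pedestrians, 3 cue tokens each
pred = FusionLSTMDecoder()(torch.randn(8, 3, 64))
print(pred.shape)  # torch.Size([8, 12, 2])
```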

19 pages, 6903 KB  
Article
GT-SRR: A Structured Method for Social Relation Recognition with GGNN-Based Transformer
by Dejiao Huang, Menglei Xia, Ruyi Chang, Xiaohan Kong and Shuai Guo
Sensors 2025, 25(10), 2992; https://doi.org/10.3390/s25102992 - 9 May 2025
Cited by 1 | Viewed by 841
Abstract
Social relationship recognition (SRR) holds significant value in fields such as behavior analysis and intelligent social systems. However, existing methods primarily focus on modeling individual visual traits, interaction patterns, and scene-level contextual cues, often failing to capture the complex dependencies among these features and the hierarchical structure of social groups, which are crucial for effective reasoning. To overcome these limitations, this paper proposes an image-based SRR model that integrates a Gated Graph Neural Network (GGNN) with a Transformer. Specifically, a novel and robust hybrid feature extraction module captures individual characteristics, relative positional information, and group-level cues, which are used to construct relation nodes and group nodes. A modified GGNN is then employed to model the logical dependencies between features. However, GGNN alone lacks the capacity to dynamically adjust feature importance, which may result in ambiguous relationship representations. The Transformer’s multi-head self-attention (MSA) mechanism is therefore integrated to improve feature interaction modeling, allowing the model to capture global context and higher-order dependencies effectively, and the final representation fuses pairwise features, graph-structured features, and group-level information. Experimental results on public datasets such as PISC demonstrate that the proposed approach outperforms comparison models including Dual-Glance, GRM, GRRN, Graph-BERT, and SRT in terms of accuracy and mean average precision (mAP), validating its effectiveness in multi-feature representation learning and global reasoning. Full article
(This article belongs to the Section Sensor Networks)
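A minimal sketch of the GGNN-plus-self-attention combination the abstract describes: gated message passing over relation/group nodes, followed by multi-head self-attention for global context. The dimensions, number of propagation steps, and class count are hypothetical, not the paper's settings.

```python
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    """One GGNN-style propagation step: aggregate neighbor messages,
    then update node states with a GRU cell."""
    def __init__(self, dim=128):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (n_nodes, dim), adj: (n_nodes, n_nodes) row-normalized
        m = adj @ self.msg(h)   # message passing along edges
        return self.gru(m, h)   # gated state update

class RelationReasoner(nn.Module):
    """GGNN propagation followed by Transformer-style self-attention."""
    def __init__(self, dim=128, steps=3, heads=4, n_classes=6):
        super().__init__()
        self.layer = GatedGraphLayer(dim)
        self.steps = steps
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, h, adj):
        for _ in range(self.steps):
            h = self.layer(h, adj)
        h2, _ = self.attn(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return self.cls(h2.squeeze(0))   # per-node relation logits

nodes, adj = torch.randn(5, 128), torch.full((5, 5), 0.2)
print(RelationReasoner()(nodes, adj).shape)  # torch.Size([5, 6])
```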

23 pages, 5309 KB  
Article
Triple Graph Convolutional Network for Hyperspectral Image Feature Fusion and Classification
by Maryam Imani and Daniele Cerra
Remote Sens. 2025, 17(9), 1623; https://doi.org/10.3390/rs17091623 - 3 May 2025
Viewed by 1545
Abstract
Most graph-based networks utilize superpixel generation methods as a preprocessing step, considering superpixels as graph nodes. In the case of hyperspectral images having high variability in spectral features, considering an image region as a graph node may degrade the class discrimination ability of networks for pixel-based classification. Moreover, most graph-based networks focus on global feature extraction, while both local and global information are important for pixel-based classification. To deal with these challenges, superpixel-based graphs are avoided in this work, and a Graph-based Feature Fusion (GF2) method relying on three different graphs is proposed instead. A local patch is considered around each pixel under test, and at the same time, global anchors with the highest informational content are selected from the entire scene. While the first graph explores relationships between neighboring pixels in the local patch and the global anchors, the second and third graphs use the global anchors and the pixels of the local patch as nodes, respectively. These graphs are processed using graph convolutional networks, and their results are fused using a cross-attention mechanism. The experiments on three hyperspectral benchmark datasets show that the GF2 network achieves high classification performance compared to state-of-the-art methods, while requiring a reasonable number of learnable parameters. Full article
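The abstract describes graph-convolution branches whose outputs are fused by cross-attention. The sketch below, assuming toy dimensions and only two of the three branches, illustrates that pattern; it is not the authors' GF2 code.

```python
import torch
import torch.nn as nn

def normalized_adj(a):
    """Symmetrically normalize an adjacency matrix with self-loops."""
    a = a + torch.eye(a.size(0))
    d = a.sum(1).rsqrt()
    return d[:, None] * a * d[None, :]

class GCNBranch(nn.Module):
    """A single graph-convolution branch (one of the three graphs)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w = nn.Linear(d_in, d_out)

    def forward(self, x, adj):
        return torch.relu(normalized_adj(adj) @ self.w(x))

# cross-attention fusion of two branch outputs, as the abstract describes
d = 32
local = GCNBranch(d, d)(torch.randn(9, d), torch.rand(9, 9))    # local-patch graph
anchors = GCNBranch(d, d)(torch.randn(6, d), torch.rand(6, 6))  # global-anchor graph
fuse = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
fused, _ = fuse(local.unsqueeze(0), anchors.unsqueeze(0), anchors.unsqueeze(0))
print(fused.shape)  # torch.Size([1, 9, 32]) -- per-pixel fused features
```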

28 pages, 8967 KB  
Article
Adaptive Global Dense Nested Reasoning Network into Small Target Detection in Large-Scale Hyperspectral Remote Sensing Image
by Siyu Zhan, Yuxuan Yang, Muge Zhong, Guoming Lu and Xinyu Zhou
Remote Sens. 2025, 17(6), 948; https://doi.org/10.3390/rs17060948 - 7 Mar 2025
Cited by 1 | Viewed by 1278
Abstract
Small and dim target detection is a critical challenge in hyperspectral remote sensing, particularly in complex, large-scale scenes where spectral variability across diverse land cover types complicates the detection process. In this paper, we propose a novel target reasoning algorithm named Adaptive Global Dense Nested Reasoning Network (AGDNR). This algorithm integrates spatial, spectral, and domain knowledge to enhance the detection accuracy of small and dim targets in large-scale environments and simultaneously enables reasoning about target categories. The proposed method involves three key innovations. Firstly, we develop a high-dimensional, multi-layer nested U-Net that facilitates cross-layer feature transfer, preserving high-level features of small and dim targets throughout the network. Secondly, we present a novel approach for computing physicochemical parameters, which enhances the spectral characteristics of targets while minimizing environmental interference. Thirdly, we construct a geographic knowledge graph that incorporates both target and environmental information, enabling global target reasoning and more effective detection of small targets across large-scale scenes. Experimental results on three challenging datasets show that our method outperforms state-of-the-art approaches in detection accuracy and achieves successful classification of different small targets. Consequently, the proposed method offers a robust solution for the precise detection of hyperspectral small targets in large-scale scenarios. Full article

17 pages, 81622 KB  
Article
A Hierarchical Spatiotemporal Data Model Based on Knowledge Graphs for Representation and Modeling of Dynamic Landslide Scenes
by Juan Li, Jin Zhang, Li Wang and Ao Zhao
Sustainability 2024, 16(23), 10271; https://doi.org/10.3390/su162310271 - 23 Nov 2024
Viewed by 1491
Abstract
Representation and modeling of dynamic landslide scenes are essential for gaining a comprehensive understanding of them and managing them effectively. Existing models, which focus on a single scale, make it difficult to fully express the complex, multi-scale spatiotemporal processes within landslide scenes. To address these issues, we proposed a hierarchical spatiotemporal data model, named HSDM, to enhance the representation of geographic scenes. Specifically, we introduced a spatiotemporal object model that integrates both the structural and process information of objects. Furthermore, we extended the process definition to capture complex spatiotemporal processes. We sorted out the relationships used in HSDM and defined four types of spatiotemporal correlation relations to represent the connections between spatiotemporal objects. Based on these concepts and relationships, we constructed a three-level graph model of geographic scenes. Finally, we represented and modeled a dynamic landslide scene in Heifangtai using HSDM and implemented complex querying and reasoning with Neo4j’s Cypher language. The experimental results demonstrate our model’s capabilities in modeling and reasoning about complex multi-scale information and spatiotemporal processes within landslide scenes. Our work contributes to landslide knowledge representation, inventory, and dynamic simulation. Full article
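The abstract reports querying and reasoning over the landslide scene graph with Neo4j's Cypher. The snippet below is a hypothetical example using the official neo4j Python driver against a local instance; the node labels, relationship types, and properties (Scene, CONTAINS, SpatiotemporalObject, UNDERGOES, Process) are invented for illustration and are not the paper's actual schema.

```python
from neo4j import GraphDatabase

# Assumes a Neo4j instance running locally; credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Hypothetical query: which objects in the scene undergo a process in [t0, t1]?
query = """
MATCH (s:Scene {name: $scene})-[:CONTAINS]->(o:SpatiotemporalObject)
      -[:UNDERGOES]->(p:Process)
WHERE p.start_time >= $t0 AND p.end_time <= $t1
RETURN o.name AS object, p.type AS process, p.start_time AS start
ORDER BY start
"""

with driver.session() as session:
    for record in session.run(query, scene="Heifangtai", t0=0, t1=100):
        print(record["object"], record["process"], record["start"])
driver.close()
```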

20 pages, 9500 KB  
Article
Image Captioning Based on Semantic Scenes
by Fengzhi Zhao, Zhezhou Yu, Tao Wang and Yi Lv
Entropy 2024, 26(10), 876; https://doi.org/10.3390/e26100876 - 18 Oct 2024
Cited by 3 | Viewed by 3999
Abstract
With the development of artificial intelligence and deep learning technologies, image captioning has become an important research direction at the intersection of computer vision and natural language processing. The purpose of image captioning is to generate corresponding natural language descriptions by understanding the content of images. This technology has broad application prospects in fields such as image retrieval, autonomous driving, and visual question answering. Currently, many researchers have proposed region-based image captioning methods. These methods generate captions by extracting features from different regions of an image. However, they often rely on local features of the image and overlook the understanding of the overall scene, leading to captions that lack coherence and accuracy when dealing with complex scenes. Additionally, image captioning methods are unable to extract complete semantic information from visual data, which may lead to captions with biases and deficiencies. Due to these reasons, existing methods struggle to generate comprehensive and accurate captions. To fill this gap, we propose the Semantic Scenes Encoder (SSE) for image captioning. It first extracts a scene graph from the image and integrates it into the encoding of the image information. Then, it extracts a semantic graph from the captions and preserves semantic information through a learnable attention mechanism, which we refer to as the dictionary. During the generation of captions, it combines the encoded information of the image and the learned semantic information to generate complete and accurate captions. To verify the effectiveness of the SSE, we tested the model on the MSCOCO dataset. The experimental results show that the SSE improves the overall quality of the captions. The improvement in scores across multiple evaluation metrics further demonstrates that the SSE possesses significant advantages when processing identical images. Full article
(This article belongs to the Collection Entropy in Image Analysis)
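One way to read the Semantic Scenes Encoder idea, sketched below under assumed names and dimensions: scene-graph triples are embedded as extra tokens alongside image-region features before a standard Transformer encoder. This is an illustrative interpretation, not the authors' model.

```python
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Embed (subject, predicate, object) triples and append them to the
    image-region tokens before a standard Transformer encoder."""
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, region_feats, triples):
        # region_feats: (B, R, d);  triples: (B, T, 3) token ids
        graph_tokens = self.emb(triples).mean(dim=2)   # one token per triple
        tokens = torch.cat([region_feats, graph_tokens], dim=1)
        return self.enc(tokens)                        # joint visual+graph encoding

regions = torch.randn(2, 36, 64)                # e.g. 36 detected regions
triples = torch.randint(0, 1000, (2, 10, 3))    # e.g. ("man", "riding", "bike") ids
print(SceneGraphEncoder()(regions, triples).shape)  # torch.Size([2, 46, 64])
```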

12 pages, 8185 KB  
Article
Augmented Reality Visualization and Quantification of COVID-19 Infections in the Lungs
by Jiaqing Liu, Liang Lyu, Shurong Chai, Huimin Huang, Fang Wang, Tomoko Tateyama, Lanfen Lin and Yenwei Chen
Electronics 2024, 13(6), 1158; https://doi.org/10.3390/electronics13061158 - 21 Mar 2024
Cited by 3 | Viewed by 2207
Abstract
The ongoing COVID-19 pandemic has had a significant impact globally, and the understanding of the disease’s clinical features and impacts remains insufficient. An important metric to evaluate the severity of pneumonia in COVID-19 is the CT Involvement Score (CTIS), which is determined by assessing the proportion of infections in the lung field region using computed tomography (CT) images. Interactive augmented reality visualization and quantification of COVID-19 infection from CT allow us to augment the traditional diagnostic techniques and current COVID-19 treatment strategies. Thus, in this paper, we present a system that combines augmented reality (AR) hardware, specifically the Microsoft HoloLens, with deep learning algorithms in a user-oriented pipeline to provide medical staff with an intuitive 3D augmented reality visualization of COVID-19 infections in the lungs. The proposed system includes a graph-based pyramid global context reasoning module to segment COVID-19-infected lung regions, which can then be visualized using the HoloLens AR headset. Through segmentation, we can quantitatively evaluate and intuitively visualize which part of the lung is infected. In addition, by evaluating the infection status in each lobe quantitatively, it is possible to assess the infection severity. We also implemented Spectator View and Sharing a Scene functions into the proposed system, which enable medical staff to present the AR content to a wider audience, e.g., radiologists. By providing a 3D perception of the complexity of COVID-19, the augmented reality visualization generated by the proposed system offers an immersive experience in an interactive and cooperative 3D approach. We expect that this will facilitate a better understanding of CT-guided COVID-19 diagnosis and treatment, as well as improved patient outcomes. Full article

19 pages, 4911 KB  
Article
Trajectory Prediction with Attention-Based Spatial–Temporal Graph Convolutional Networks for Autonomous Driving
by Hongbo Li, Yilong Ren, Kaixuan Li and Wenjie Chao
Appl. Sci. 2023, 13(23), 12580; https://doi.org/10.3390/app132312580 - 22 Nov 2023
Cited by 5 | Viewed by 3582
Abstract
Accurate and reliable trajectory prediction is crucial for autonomous vehicles to achieve safe and efficient operation. Vehicles perceive the historical trajectories of moving objects and make predictions of behavioral intentions for a future period of time. With the predicted trajectories of moving objects such as obstacle vehicles, pedestrians, and non-motorized vehicles as inputs, self-driving vehicles can make more rational driving decisions and plan more reasonable and safe vehicle motion behaviors. However, in traffic environments such as intersection scenes with highly interdependent and dynamic attributes, the task of motion anticipation becomes challenging. Existing works focus on the mutual relationships among vehicles while ignoring other potentially essential interactions such as vehicle–traffic rules, and they have not yet deeply explored the interactions among multiple agents, which may result in prediction deviations. To address these issues, we have designed a novel framework, namely trajectory prediction with attention-based spatial–temporal graph convolutional networks (TPASTGCN). In our proposal, the multi-agent interaction mechanisms, including vehicle–vehicle and vehicle–traffic rules, are highlighted and integrated into one homogeneous graph by transferring the time-series data of traffic lights into the spatial–temporal domains. By integrating the attention mechanism into the adjacency matrix, we effectively learn the different strengths of interactive association and improve the model’s ability to capture critical features. Simultaneously, we construct a hierarchical structure employing a spatial GCN and a temporal GCN to extract the spatial dependencies of traffic networks. Profiting from the gated recurrent unit (GRU), the scene context in the temporal dimension is further captured and enhanced by the encoder. In this way, the GCN and GRU networks are fused as a feature extractor module in the proposed framework. Finally, future potential trajectories are generated by another GRU network. Experiments on real-world datasets demonstrate the superior performance of the scheme compared with several baselines. Full article
(This article belongs to the Special Issue Autonomous Driving and Intelligent Transportation)
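The abstract's key device of integrating attention into the adjacency matrix can be sketched as follows: pairwise attention scores, masked by graph connectivity, replace the fixed adjacency used by a spatial graph convolution. Names and sizes are illustrative assumptions, not the TPASTGCN implementation.

```python
import torch
import torch.nn as nn

class AttentionAdjacencyGCN(nn.Module):
    """Spatial graph convolution whose adjacency is re-weighted by learned
    attention scores, so interaction strengths differ per agent pair."""
    def __init__(self, d=32):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.w = nn.Linear(d, d)

    def forward(self, x, adj):
        # x: (n_agents, d), adj: (n_agents, n_agents) binary connectivity
        scores = self.q(x) @ self.k(x).T / x.size(1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn_adj = torch.softmax(scores, dim=-1)    # attention-weighted edges
        return torch.relu(attn_adj @ self.w(x))

agents = torch.randn(6, 32)   # e.g. 5 vehicles + 1 traffic-light node
adj = torch.ones(6, 6)        # fully connected toy graph
print(AttentionAdjacencyGCN()(agents, adj).shape)  # torch.Size([6, 32])
```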

17 pages, 1621 KB  
Article
Symmetric Graph-Based Visual Question Answering Using Neuro-Symbolic Approach
by Jiyoun Moon
Symmetry 2023, 15(9), 1713; https://doi.org/10.3390/sym15091713 - 7 Sep 2023
Cited by 1 | Viewed by 2258
Abstract
As the applications of robots expand across a wide variety of areas, high-level task planning considering human–robot interactions is emerging as a critical issue. Various elements that facilitate flexible responses to humans in an ever-changing environment, such as scene understanding, natural language processing, and task planning, are thus being researched extensively. In this study, a visual question answering (VQA) task was examined in detail from among an array of technologies. By further developing conventional neuro-symbolic approaches, environmental information is stored and utilized in a symmetric graph format, which enables more flexible and complex high-level task planning. We construct a symmetric graph composed of information such as color, size, and position for the objects constituting the environmental scene. VQA, using graphs, largely consists of a part expressing a scene as a graph, a part converting a question into SPARQL, and a part reasoning the answer. The proposed method was verified using a public dataset, CLEVR, with which it successfully performed VQA. We were able to directly confirm the process of inferring answers using SPARQL queries converted from the original queries and environmental symmetric graph information, which is distinct from existing methods that make it difficult to trace the path to finding answers. Full article
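A toy end-to-end illustration of the neuro-symbolic pipeline described above, using rdflib: a CLEVR-style scene is stored as a graph of object attributes and spatial relations, and a question translated to SPARQL is answered by querying it. The namespace, predicates, and query are hypothetical examples, not the paper's code.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/scene#")   # hypothetical namespace
g = Graph()

# Encode a toy CLEVR-style scene: objects with color and a spatial relation.
g.add((EX.obj1, RDF.type, EX.Cube))
g.add((EX.obj1, EX.color, Literal("red")))
g.add((EX.obj2, RDF.type, EX.Sphere))
g.add((EX.obj2, EX.color, Literal("blue")))
g.add((EX.obj2, EX.leftOf, EX.obj1))

# "What color is the object left of the red cube?" translated to SPARQL.
q = """
PREFIX ex: <http://example.org/scene#>
SELECT ?color WHERE {
    ?cube a ex:Cube ; ex:color "red" .
    ?obj ex:leftOf ?cube ; ex:color ?color .
}
"""
for row in g.query(q):
    print(row.color)   # -> blue
```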

15 pages, 2062 KB  
Article
A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure
by Yoonseok Heo and Sangwoo Kang
Mathematics 2023, 11(17), 3751; https://doi.org/10.3390/math11173751 - 31 Aug 2023
Cited by 1 | Viewed by 2760
Abstract
A rapidly expanding multimedia environment in recent years has led to an explosive increase in demand for multimodality that can communicate with humans in various ways. Even though the convergence of vision and language intelligence has shed light on remarkable successes over the last few years, there is still a caveat: it is unknown whether these models truly understand the semantics of the image. More specifically, how they correctly capture relationships between objects represented within the image is still regarded as a black box. To test whether such relationships are well understood, this work mainly focuses on the Graph-structured visual Question Answering (GQA) task, which evaluates the understanding of an image by reasoning over a scene graph describing the structural characteristics of the image in the form of natural language together with the image. Unlike the existing approaches that are accompanied by an additional encoder for scene graphs, we propose a simple yet effective framework using pre-trained multimodal transformers for scene graph reasoning. Inspired by the fact that a scene graph can be regarded as a set of sentences describing two related objects with a relationship, we fuse them into the framework separately from the question. In addition, we propose a multi-task learning method that evaluates the grammatical validity of questions as an auxiliary task to better understand questions with complex structures; it uses the semantic role labels of the question to randomly shuffle its sentence structure. We have conducted extensive experiments to evaluate the effectiveness of the approach in terms of task capabilities, ablation studies, and generalization. Full article
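A minimal sketch of the central idea that a scene graph is a set of short sentences that can be fused with the question for a pre-trained multimodal transformer; the verbalization format and separator token are assumptions for illustration only.

```python
def scene_graph_to_sentences(triples):
    """Verbalize (subject, relation, object) triples, following the idea
    that a scene graph is a set of short relational sentences."""
    return [f"{s} {r} {o}." for s, r, o in triples]

def build_model_input(question, triples, sep="[SEP]"):
    """Fuse the question and the verbalized scene graph into one text
    sequence for a pre-trained multimodal transformer (the separator
    token name is illustrative)."""
    graph_text = " ".join(scene_graph_to_sentences(triples))
    return f"{question} {sep} {graph_text}"

triples = [("man", "holding", "umbrella"), ("umbrella", "above", "dog")]
print(build_model_input("What is above the dog?", triples))
# What is above the dog? [SEP] man holding umbrella. umbrella above dog.
```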

35 pages, 5910 KB  
Review
From SLAM to Situational Awareness: Challenges and Survey
by Hriday Bavle, Jose Luis Sanchez-Lopez, Claudio Cimarelli, Ali Tourani and Holger Voos
Sensors 2023, 23(10), 4849; https://doi.org/10.3390/s23104849 - 17 May 2023
Cited by 43 | Viewed by 9262
Abstract
The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment, namely the situation. Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments. Situational Awareness (SA) is a fundamental capability of humans that has been deeply studied in various fields, such as psychology, military, aerospace, and education. Nevertheless, it has yet to be considered in robotics, which has focused on single compartmentalized concepts such as sensing, spatial perception, sensor fusion, state estimation, and Simultaneous Localization and Mapping (SLAM). Hence, the present research aims to connect the broad multidisciplinary existing knowledge to pave the way for a complete SA system for mobile robotics that we deem paramount for autonomy. To this aim, we define the principal components to structure a robotic SA and their areas of competence. Accordingly, this paper investigates each aspect of SA, surveying the state-of-the-art robotics algorithms that cover them, and discusses their current limitations. Remarkably, essential aspects of SA are still immature, since current algorithmic development restricts their performance to specific environments. Nevertheless, Artificial Intelligence (AI), particularly Deep Learning (DL), has brought new methods to bridge the gap that keeps these fields from being deployed in real-world scenarios. Furthermore, an opportunity has been discovered to interconnect the vastly fragmented space of robotic comprehension algorithms through the mechanism of the Situational Graph (S-Graph), a generalization of the well-known scene graph. Therefore, we finally shape our vision for the future of robotic situational awareness by discussing interesting recent research directions. Full article
(This article belongs to the Special Issue Aerial Robotics: Navigation and Path Planning)

16 pages, 2035 KB  
Article
CGUN-2A: Deep Graph Convolutional Network via Contrastive Learning for Large-Scale Zero-Shot Image Classification
by Liangwei Li, Lin Liu, Xiaohui Du, Xiangzhou Wang, Ziruo Zhang, Jing Zhang, Ping Zhang and Juanxiu Liu
Sensors 2022, 22(24), 9980; https://doi.org/10.3390/s22249980 - 18 Dec 2022
Cited by 3 | Viewed by 3398
Abstract
Taxonomy illustrates that natural creatures can be classified within a hierarchy. The connections between species are explicit and objective and can be organized into a knowledge graph (KG). It is a challenging task to mine features of known categories from a KG and to reason about unknown categories. The Graph Convolutional Network (GCN) has recently been viewed as a potential approach to zero-shot learning. GCN enables knowledge transfer by sharing the statistical strength of nodes in the graph. More layers of graph convolution are stacked in order to aggregate the hierarchical information in the KG. However, the Laplacian over-smoothing problem becomes severe as the number of GCN layers deepens, which causes node features to become similar and degrades the performance of zero-shot image classification tasks. We consider two parts to mitigate the Laplacian over-smoothing problem, namely reducing invalid node aggregation and improving the discriminability among nodes in the deep graph network. We propose a top-k graph pooling method based on the self-attention mechanism to control specific node aggregation, and we additionally introduce a dual structural symmetric knowledge graph to enhance the representation of nodes in the latent space. Finally, we apply these new concepts to the recently widely used contrastive learning framework and propose a novel Contrastive Graph U-Net with two Attention-based graph pooling (Att-gPool) layers, CGUN-2A, which explicitly alleviates the Laplacian over-smoothing problem. To evaluate the performance of the method on complex real-world scenes, we test it on a large-scale zero-shot image classification dataset. Extensive experiments show the positive effect of allowing nodes to perform specific aggregation, as well as of homogeneous graph comparison, in our deep graph network, and show how this significantly boosts zero-shot image classification performance. The Hit@1 accuracy is 17.5% higher, in relative terms, than that of the baseline model on the ImageNet21K dataset. Full article
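A sketch of an attention-based top-k graph pooling layer of the kind the abstract calls Att-gPool: nodes are scored by a learned attention vector, the top-k are kept, and their features are gated by the scores. The exact scoring and gating used in CGUN-2A may differ; this is only an illustrative formulation.

```python
import torch
import torch.nn as nn

class AttentionTopKPool(nn.Module):
    """Attention-based top-k graph pooling: score nodes with a learned
    projection, keep the k highest-scoring nodes, and gate their
    features by the scores."""
    def __init__(self, dim, ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.ratio = ratio

    def forward(self, x, adj):
        # x: (n, dim), adj: (n, n)
        k = max(1, int(self.ratio * x.size(0)))
        s = torch.sigmoid(self.score(x)).squeeze(-1)   # per-node keep score
        idx = torch.topk(s, k).indices
        x_pooled = x[idx] * s[idx].unsqueeze(-1)       # gate kept features
        adj_pooled = adj[idx][:, idx]                  # induced subgraph
        return x_pooled, adj_pooled, idx

x, adj = torch.randn(10, 16), (torch.rand(10, 10) > 0.5).float()
xp, ap, kept = AttentionTopKPool(16)(x, adj)
print(xp.shape, ap.shape)   # torch.Size([5, 16]) torch.Size([5, 5])
```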

13 pages, 1858 KB  
Article
Visual Relationship Detection with Multimodal Fusion and Reasoning
by Shouguan Xiao and Weiping Fu
Sensors 2022, 22(20), 7918; https://doi.org/10.3390/s22207918 - 18 Oct 2022
Cited by 4 | Viewed by 2980
Abstract
Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match the way humans reason: we recognize obvious scene features and infer hidden states using common sense. Therefore, these methods cannot predict some hidden relationships of object pairs in complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common-sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object-pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets. Full article

19 pages, 2475 KB  
Article
Graph-Based Embedding Smoothing Network for Few-Shot Scene Classification of Remote Sensing Images
by Zhengwu Yuan, Wendong Huang, Chan Tang, Aixia Yang and Xiaobo Luo
Remote Sens. 2022, 14(5), 1161; https://doi.org/10.3390/rs14051161 - 26 Feb 2022
Cited by 27 | Viewed by 3673
Abstract
As a fundamental task in the field of remote sensing, scene classification is increasingly attracting attention. The most popular way to solve scene classification is to train a deep neural network with a large-scale remote sensing dataset. However, given a small amount of data, how to train a deep neural network with outstanding performance remains a challenge. Existing methods seek to take advantage of transfer knowledge or meta-knowledge to resolve the scene classification issue of remote sensing images with a handful of labeled samples while ignoring various class-irrelevant noises existing in scene features and the specificity of different tasks. For this reason, in this paper, an end-to-end graph neural network is presented to enhance the performance of scene classification in few-shot scenarios, referred to as the graph-based embedding smoothing network (GES-Net). Specifically, GES-Net adopts an unsupervised non-parametric regularizer, called embedding smoothing, to regularize embedding features. Embedding smoothing can capture high-order feature interactions in an unsupervised manner, which is adopted to remove undesired noises from embedding features and yields smoother embedding features. Moreover, instead of the traditional sample-level relation representation, GES-Net introduces a new task-level relation representation to construct the graph. The task-level relation representation can capture the relations between nodes from the perspective of the whole task rather than only between samples, which can highlight subtle differences between nodes and enhance the discrimination of the relations between nodes. Experimental results on three public remote sensing datasets, UC Merced, WHU-RS19, and NWPU-RESISC45, showed that the proposed GES-Net approach obtained state-of-the-art results in the settings of limited labeled samples. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
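A rough numpy sketch of the embedding-smoothing idea described above: build a k-nearest-neighbor similarity graph over an episode's embeddings and propagate features along it so that class-irrelevant noise is averaged out. The propagation rule and hyper-parameters here are assumptions, not GES-Net's exact regularizer.

```python
import numpy as np

def embedding_smoothing(feats, n_neighbors=3, alpha=0.5, steps=2):
    """Non-parametric embedding smoothing: propagate each embedding's
    features over a k-NN similarity graph built from the episode itself
    (alpha, k, and the number of steps are illustrative)."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)           # ignore self-similarity
    adj = np.zeros_like(sims)
    for i in range(feats.shape[0]):
        adj[i, np.argsort(sims[i])[-n_neighbors:]] = 1.0
    adj = np.maximum(adj, adj.T)              # symmetrize
    adj /= adj.sum(1, keepdims=True)          # row-normalize
    smoothed = feats
    for _ in range(steps):
        smoothed = alpha * (adj @ smoothed) + (1 - alpha) * feats
    return smoothed

episode = np.random.randn(20, 64)             # 20 embeddings in an episode
print(embedding_smoothing(episode).shape)      # (20, 64)
```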

20 pages, 10299 KB  
Article
Ontological Ship Behavior Modeling Based on COLREGs for Knowledge Reasoning
by Shubin Zhong, Yuanqiao Wen, Yamin Huang, Xiaodong Cheng and Liang Huang
J. Mar. Sci. Eng. 2022, 10(2), 203; https://doi.org/10.3390/jmse10020203 - 2 Feb 2022
Cited by 21 | Viewed by 3964
Abstract
Formal expression of ship behavior is the basis for developing autonomous navigation systems, supporting the scene recognition, intention inference, and rule-compliant actions of such systems. The Convention on the International Regulations for Preventing Collisions at Sea (COLREGs) offers experience-based expressions of ship behavior for human beings, helping humans recognize the scene, infer the intention, and choose rule-compliant actions. However, it is still a challenge to teach a machine to interpret the COLREGs. This paper proposes an ontological ship behavior model based on the COLREGs using knowledge graph techniques, which aims at helping the machine interpret the COLREGs rules. In this paper, the ship is treated as a temporal-spatial object, and its behavior is described as the change of object elements across temporal and spatial scales using Resource Description Framework (RDF), function mapping, and set expression methods. To demonstrate the proposed method, the Narrow Channel rule (Rule 9) from the COLREGs is introduced, and the ship objects and the ship behavior expression based on Rule 9 are shown. In brief, this paper lays a theoretical foundation for further constructing a ship behavior knowledge graph from the COLREGs, which is helpful for complete machine reasoning about ship behavior knowledge in the future. Full article
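An illustrative rdflib sketch of expressing a ship as a temporal-spatial object in RDF and checking a Rule-9-style condition with SPARQL; the namespace, predicates, and the simplistic rule check are invented for demonstration and do not reproduce the paper's ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF

SHIP = Namespace("http://example.org/colregs#")   # hypothetical namespace
g = Graph()

# A ship as a temporal-spatial object whose state changes over time steps.
g.add((SHIP.vesselA, RDF.type, SHIP.Ship))
g.add((SHIP.vesselA, SHIP.inScene, SHIP.narrowChannel))
g.add((SHIP.vesselA, SHIP.headingAtT1, Literal(85.0)))   # degrees at t1
g.add((SHIP.vesselA, SHIP.headingAtT2, Literal(92.0)))   # degrees at t2

# A toy Rule-9 style check: which ships are navigating a narrow channel?
q = """
PREFIX c: <http://example.org/colregs#>
SELECT ?ship WHERE {
    ?ship a c:Ship ; c:inScene c:narrowChannel .
}
"""
for row in g.query(q):
    print(f"{row.ship} is in a narrow channel -> Rule 9 obligations apply")
```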
