Search Results (66)

Search Parameters:
Keywords = Caltech-10

22 pages, 1295 KB  
Article
Enhanced Similarity Matrix Learning for Multi-View Clustering
by Dongdong Zhang, Pusheng Wang and Qin Li
Electronics 2025, 14(14), 2845; https://doi.org/10.3390/electronics14142845 - 16 Jul 2025
Viewed by 229
Abstract
Graph-based multi-view clustering is a fundamental analysis method that learns the similarity matrix of multi-view data. Despite its success, it has two main limitations: (1) complementary information is not fully utilized when graphs from different views are directly combined; (2) existing multi-view clustering methods do not adequately address redundancy and noise in the data, which significantly affects performance. To address these issues, we propose Enhanced Similarity Matrix Learning for multi-view clustering (ES-MVC), which dynamically integrates a global graph from all views with local graphs from each view to create an improved similarity matrix. Specifically, the global graph captures cross-view consistency, while the local graphs preserve view-specific geometric patterns. The balance between global and local graphs is controlled through an adaptive weighting strategy in which hyperparameters adjust the relative importance of each graph, effectively capturing complementary information. In this way, our method learns a clustering structure that fully exploits the complementary information in both global and local graphs. Meanwhile, we use a robust similarity matrix initialization to reduce the negative effects of noisy data. For model optimization, we derive an effective optimization algorithm that converges quickly, typically requiring fewer than five iterations on most datasets. Extensive experimental results on diverse real-world datasets demonstrate the superiority of our method over state-of-the-art multi-view clustering methods. On datasets such as MSRC-v1, Caltech101, and HW, our proposed method achieves superior clustering performance with average accuracy (ACC) values of 0.7643, 0.6097, and 0.9745, respectively, outperforming advanced multi-view clustering methods such as OMVFC-LICAG, which yields ACC values of 0.7284, 0.4512, and 0.8372 on the same datasets.
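
The global and local graph fusion at the heart of ES-MVC can be pictured with a minimal sketch; the consensus construction, weighting scheme, and function name below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fuse_similarity_graphs(local_graphs, alpha=0.5, betas=None):
    """Blend a cross-view global graph with per-view local graphs.

    local_graphs: list of (n, n) similarity matrices, one per view.
    alpha: weight on the shared global graph (hypothetical hyperparameter).
    betas: per-view weights for the local graphs; uniform if None.
    """
    n_views = len(local_graphs)
    betas = betas if betas is not None else np.full(n_views, 1.0 / n_views)
    # Global graph: a simple consensus (mean) over all views.
    global_graph = np.mean(local_graphs, axis=0)
    # Enhanced similarity matrix: global consistency + view-specific detail.
    fused = alpha * global_graph + (1 - alpha) * sum(
        b * g for b, g in zip(betas, local_graphs))
    # Symmetrize and zero the diagonal, as is conventional for affinities.
    fused = 0.5 * (fused + fused.T)
    np.fill_diagonal(fused, 0.0)
    return fused
```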

21 pages, 9797 KB  
Article
Artificial Intelligence-Driven Optimal Charging Strategy for Electric Vehicles and Impacts on Electric Power Grid
by Umar Jamil, Raul Jose Alva, Sara Ahmed and Yu-Fang Jin
Electronics 2025, 14(7), 1471; https://doi.org/10.3390/electronics14071471 - 6 Apr 2025
Cited by 2 | Viewed by 2613
Abstract
Electric vehicles (EVs) play a crucial role in achieving sustainability goals, mitigating energy crises, and reducing air pollution. However, their rapid adoption poses significant challenges to the power grid, particularly during peak charging periods, necessitating advanced load management strategies. This study introduces an artificial intelligence (AI)-integrated optimal charging framework designed to facilitate fast charging and mitigate grid stress by smoothing the “duck curve”. Data from Caltech’s Adaptive Charging Network (ACN) at the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory (JPL) site were collected and categorized into day and night patterns to predict charging duration from key features, including start charging time and energy requested. The AI-driven charging strategy optimizes energy management, reduces peak loads, and alleviates grid strain. Additionally, the study evaluates the impact of integrating 1.5 million, 3 million, and 5 million EVs under various AI-based charging strategies, demonstrating the framework’s effectiveness in managing large-scale EV adoption. Peak power consumption reaches around 22,000 MW without EVs and, absent any charging strategy, 25,000 MW for 1.5 million EVs, 28,000 MW for 3 million EVs, and 35,000 MW for 5 million EVs. By implementing an AI-driven optimal charging strategy that considers both early charging and duck-curve smoothing, peak demand is reduced by approximately 16% for 1.5 million EVs, 21.43% for 3 million EVs, and 34.29% for 5 million EVs.
(This article belongs to the Special Issue Recent Advances in Modeling and Control of Electric Energy Systems)
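
As a quick arithmetic check, the reported peak reductions can be applied to the no-strategy peaks quoted in the abstract (only the numbers come from the abstract; the scenario labels are ours):

```python
# Peak loads (MW) without a charging strategy, from the abstract.
peaks_mw = {"1.5M EVs": 25_000, "3M EVs": 28_000, "5M EVs": 35_000}
reductions = {"1.5M EVs": 0.16, "3M EVs": 0.2143, "5M EVs": 0.3429}

for scenario, peak in peaks_mw.items():
    optimized = peak * (1 - reductions[scenario])
    print(f"{scenario}: {peak} MW -> {optimized:.0f} MW "
          f"({reductions[scenario]:.2%} peak reduction)")
```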

20 pages, 4907 KB  
Article
Phenolic and Acidic Compounds in Radiation Fog at Strasbourg Metropolitan
by Dani Khoury, Maurice Millet, Yasmine Jabali and Olivier Delhomme
Atmosphere 2024, 15(10), 1240; https://doi.org/10.3390/atmos15101240 - 17 Oct 2024
Cited by 1 | Viewed by 996
Abstract
Sixty-four phenols, grouped as nitrated, bromo-, amino-, methyl-, and chloro-phenols and cresols, and thirty-eight organic acids, grouped as mono-carboxylic and dicarboxylic, are analyzed in forty-two fog samples collected in the Alsace region between 2015 and 2021 to characterize their atmospheric behavior. Fogwater samples are collected using the Caltech Active Strand Cloudwater Collector (CASCC2), extracted by liquid–liquid extraction (LLE) on a solid cartridge (XTR Chromabond), and then analyzed using gas chromatography coupled with mass spectrometry (GC-MS). The results show the high capability of phenols and acids to be scavenged by fogwater due to their high solubility. Nitro-phenols and mono-carboxylic acids make the highest contributions to the total phenolic and acidic concentrations, respectively. 2,5-Dinitrophenol, 3-methyl-4-nitrophenol, 4-nitrophenol, and 3,4-dinitrophenol have the highest concentrations, originating mainly from vehicular emissions and some photochemical reactions. The top three mono-carboxylic acids are hexadecenoic acid (C16), eicosanoic acid (C18), and dodecanoic acid (C12), whereas succinic acid, suberic acid, sebacic acid, and oxalic acid are the most concentrated dicarboxylic acids, originating either from atmospheric oxidation (mainly secondary organic aerosols (SOAs)) or vehicular transport. Pearson’s correlations show positive correlations between organic acids and previously analyzed metals (p < 0.05), between mono- and dicarboxylic acids (p < 0.001), and among the analyzed acidic compounds (p < 0.001), whereas no correlations are observed with previously analyzed inorganic ions. Total phenolic and acidic fractions are found to be much higher than those observed for pesticides, polycyclic aromatic hydrocarbons (PAHs), and polychlorinated biphenyls (PCBs) measured in the same region, owing to their more efficient scavenging by fogwater.
(This article belongs to the Section Meteorology)
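
The Pearson correlation analysis mentioned above amounts to calls like the following; the concentration series here are placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder concentration series across fog samples; real values would
# come from the GC-MS quantification described in the paper.
mono_acids = np.array([412.0, 380.5, 501.2, 298.7, 455.1, 367.9])
di_acids = np.array([120.3, 101.8, 160.4, 88.2, 141.0, 109.5])

r, p_value = pearsonr(mono_acids, di_acids)
print(f"Pearson r = {r:.3f}, p = {p_value:.4f}")  # significant if p < 0.05
```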

16 pages, 1822 KB  
Article
A Pedestrian Detection Network Based on an Attention Mechanism and Pose Information
by Zhaoyin Jiang, Shucheng Huang and Mingxing Li
Appl. Sci. 2024, 14(18), 8214; https://doi.org/10.3390/app14188214 - 12 Sep 2024
Cited by 1 | Viewed by 1513
Abstract
Pedestrian detection has recently attracted widespread attention as a challenging problem in computer vision. The accuracy of pedestrian detection is affected by differences in gestures, background clutter, local occlusion, differences in scale, pixel blur, and other factors occurring in real scenes, which lead to false and missed detections. In view of these visual description deficiencies, we leveraged pedestrian pose information as a supplementary resource to address the occlusion challenges that arise in pedestrian detection. Because the quality of the pose information is limited by the pose estimation algorithm, an attention mechanism was integrated with the visual information to supplement the pose information. We developed a pedestrian detection method that integrates an attention mechanism with visual and pose information, comprising pedestrian region generation and pedestrian recognition networks, effectively addressing occlusion and false detection issues. The pedestrian region proposal network generates a series of candidate regions that may contain pedestrian targets from the original image. The pedestrian recognition network then judges whether each candidate region contains a pedestrian; it is composed of four parts: visual feature, pedestrian pose, pedestrian attention, and classification modules. The visual feature module extracts visual feature descriptions of candidate regions; the pedestrian pose module extracts pose feature descriptions; the pedestrian attention module extracts attention information; and the classification module fuses visual features and pedestrian pose descriptions with the attention mechanism. The experimental results on the Caltech and CityPersons datasets demonstrate that the proposed method identifies pedestrians substantially more accurately than current state-of-the-art methods.
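
A minimal sketch of the kind of attention-weighted fusion of visual and pose descriptors the classification module performs; the gating form and dimensions are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse visual and pose descriptors with a learned attention gate."""

    def __init__(self, visual_dim=1024, pose_dim=256, hidden=512):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(visual_dim + pose_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, visual_dim + pose_dim),
            nn.Sigmoid(),  # per-channel attention weights in [0, 1]
        )
        self.classifier = nn.Linear(visual_dim + pose_dim, 2)  # pedestrian / not

    def forward(self, visual_feat, pose_feat):
        fused = torch.cat([visual_feat, pose_feat], dim=1)
        weighted = fused * self.attn(fused)  # re-weight before classification
        return self.classifier(weighted)
```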

18 pages, 4382 KB  
Article
Vehicle Classification Algorithm Based on Improved Vision Transformer
by Xinlong Dong, Peicheng Shi, Yueyue Tang, Li Yang, Aixi Yang and Taonian Liang
World Electr. Veh. J. 2024, 15(8), 344; https://doi.org/10.3390/wevj15080344 - 30 Jul 2024
Cited by 7 | Viewed by 1952
Abstract
Vehicle classification technology is one of the foundations of the field of automatic driving. With the development of deep learning technology, vision transformer structures based on attention mechanisms can represent global information quickly and effectively. However, because they segment the image directly, local feature details are lost. To solve this problem, we propose an improved vision transformer vehicle classification network (IND-ViT). Specifically, we first design a CNN-In D branch module to extract local features before image segmentation to make up for the loss of detail information in the vision transformer. Then, to address misdetections caused by the strong similarity of some vehicles, we propose a sparse attention module that screens out the discernible regions in the image and further improves the detailed feature representation ability of the model. Finally, we use a contrastive loss function to further increase the intra-class consistency and inter-class difference of classification features and improve the accuracy of vehicle classification. Experimental results show that the accuracy of the proposed model on the BIT-Vehicles, CIFAR-10, Oxford Flower-102, and Caltech-101 datasets is higher than that of the original vision transformer model by 1.3%, 1.21%, 7.54%, and 3.60%, respectively; at the same time, the model meets real-time requirements, achieving a balance between accuracy and speed.
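
The contrastive objective described above, pulling same-class features together and pushing classes apart, can be sketched generically as follows (a standard supervised contrastive loss, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull same-class features together, push different classes apart.

    features: (N, D) embeddings; labels: (N,) integer class labels.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature                        # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class pairs
    mask.fill_diagonal_(False)                         # exclude self-pairs
    # Row-wise log-softmax, masking out the diagonal with a large negative.
    logits = sim - torch.eye(len(z), device=z.device) * 1e9
    log_prob = F.log_softmax(logits, dim=1)
    # Average log-probability of positive pairs per anchor with >=1 positive.
    pos_counts = mask.sum(dim=1)
    valid = pos_counts > 0
    loss = -(log_prob[valid] * mask[valid]).sum(dim=1) / pos_counts[valid]
    return loss.mean()
```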

13 pages, 4647 KB  
Article
New Estimates of Nitrogen Fixation on Early Earth
by Madeline Christensen, Danica Adams, Michael L. Wong, Patrick Dunn and Yuk L. Yung
Life 2024, 14(5), 601; https://doi.org/10.3390/life14050601 - 8 May 2024
Cited by 3 | Viewed by 1979
Abstract
Fixed nitrogen species generated by the early Earth’s atmosphere are thought to be critical to the emergence of life and the sustenance of early metabolisms. A previous study estimated nitrogen fixation in the Hadean Earth’s N2/CO2-dominated atmosphere; however, it considered only a limited chemical network that produces NOx species (i.e., no HCN formation) via the thermochemical dissociation of N2 and CO2 in lightning flashes, followed by photochemistry. Here, we present an updated model of nitrogen fixation on the Hadean Earth. We use the Chemical Equilibrium with Applications (CEA) thermochemical model to estimate lightning-induced NO and HCN formation and an updated version of KINETICS, the 1-D Caltech/JPL photochemical model, to assess the photochemical production of fixed nitrogen species that rain out into the Earth’s early ocean. Our updated photochemical model contains hydrocarbon and nitrile chemistry, and we use a Geant4 simulation platform to consider nitrogen fixation stimulated by solar energetic particle deposition throughout the atmosphere. We study the impact of a novel reaction pathway for generating HCN via HCN2, inspired by experimental results suggesting that reactions with CH radicals (from CH4 photolysis) may facilitate the incorporation of N into the molecular structure of aerosols. When the HCN2 reactions are added, the HCN rainout rate rises by a factor of five in our 1-bar case and is about the same in our 2- and 12-bar cases. Finally, we estimate the equilibrium concentrations of fixed nitrogen species under a kinetic steady state in the Hadean ocean, considering loss by hydrothermal vent circulation, photoreduction, and hydrolysis. These results inform our understanding of environments that may have been relevant to the formation of life on Earth, as well as processes that could lead to the emergence of life elsewhere in the universe.
(This article belongs to the Special Issue Feature Papers in Origins of Life)

14 pages, 8767 KB  
Article
Enhanced YOLOX with United Attention Head for Road Detection When Driving
by Yuhuan Wu and Yonghong Wu
Mathematics 2024, 12(9), 1331; https://doi.org/10.3390/math12091331 - 27 Apr 2024
Viewed by 1829
Abstract
Object detection plays a crucial role in autonomous driving assistance systems. It requires high prediction accuracy, a small size for deployment on mobile devices, and real-time inference speed to ensure safety. In this paper, we present a compact and efficient algorithm called YOLOX with United Attention Head (UAH-YOLOX) for detection in autonomous driving scenarios. By replacing the backbone network with GhostNet for feature extraction, the model reduces the number of parameters and the computational complexity. By adding a united attention head before the YOLO head, the model effectively detects the scale, position, and contour features of targets. In particular, an attention module called Spatial Self-Attention is designed to extract spatial location information, demonstrating great potential in detection. In our network, the IoU loss (Intersection over Union) has been replaced with the CIoU loss (Complete Intersection over Union). Further experiments demonstrate the effectiveness of our proposed methods on the BDD100k dataset and the Caltech Pedestrian dataset. UAH-YOLOX achieves state-of-the-art results, improving detection accuracy on the BDD100k dataset by 1.70% and increasing processing speed by 3.37 frames per second (FPS). Visualizations provide specific examples in various scenarios.
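
For reference, the CIoU loss mentioned above augments IoU with a center-distance term and an aspect-ratio term; a from-scratch sketch of the standard formulation (not code from the paper):

```python
import math

def ciou_loss(box_pred, box_gt):
    """Complete IoU loss between two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_pred
    bx1, by1, bx2, by2 = box_gt
    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between box centers.
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4
    # Squared diagonal of the smallest enclosing box.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1)) - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```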

14 pages, 1715 KB  
Article
An Integrated Active Deep Learning Approach for Image Classification from Unlabeled Data with Minimal Supervision
by Amira Abdelwahab, Ahmed Afifi and Mohamed Salama
Electronics 2024, 13(1), 169; https://doi.org/10.3390/electronics13010169 - 30 Dec 2023
Cited by 2 | Viewed by 1737
Abstract
The integration of active learning (AL) and deep learning (DL) presents a promising avenue for enhancing the efficiency and performance of deep learning classifiers. This article introduces an approach that seamlessly integrates AL principles into the training process of DL models to build robust image classifiers. The approach employs a unique methodology to select high-confidence unlabeled data points for immediate labeling, reducing the need for human annotation and minimizing annotation costs. Specifically, by combining uncertainty sampling with the pseudo-labeling of confident data, it expands the training set efficiently. The approach uses a hybrid active deep learning model that selects the most informative data points that need labeling based on an uncertainty measure, then iteratively retrains a deep neural network classifier on the newly labeled samples. The model achieves high accuracy with fewer manually labeled samples than traditional supervised deep learning by selecting the most informative samples for labeling and retraining in a loop. Experiments on various image classification datasets demonstrate that the proposed model outperforms conventional approaches in terms of classification accuracy and reduced human annotation requirements. It achieved accuracies of 98.9% and 99.3% on the Cross-Age Celebrity and Caltech Image datasets, respectively, compared with 92.3% and 74.3% for the conventional approach. In summary, this work presents a promising unified active deep learning approach that minimizes the human effort spent manually labeling data while maximizing classification accuracy by strategically labeling only the most valuable samples for the model.
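
The select-label-retrain loop described above might look like the following sketch; `ask_oracle`, the thresholds, and the scikit-learn-style model interface are assumptions for illustration:

```python
import numpy as np

def active_pseudo_label_loop(model, x_labeled, y_labeled, x_pool,
                             n_query=50, confident=0.95, rounds=5):
    """Query uncertain samples for human labels, pseudo-label confident
    ones, and retrain. `model` is any classifier with fit/predict_proba.
    """
    for _ in range(rounds):
        model.fit(x_labeled, y_labeled)
        proba = model.predict_proba(x_pool)
        confidence = proba.max(axis=1)
        # Pseudo-label high-confidence pool samples at no annotation cost.
        sure = confidence >= confident
        # Send the least confident samples to a human oracle.
        unsure_idx = np.argsort(confidence)[:n_query]
        human_labels = ask_oracle(x_pool[unsure_idx])  # hypothetical helper
        x_new = np.vstack([x_pool[sure], x_pool[unsure_idx]])
        y_new = np.concatenate([proba[sure].argmax(axis=1), human_labels])
        x_labeled = np.vstack([x_labeled, x_new])
        y_labeled = np.concatenate([y_labeled, y_new])
        # Drop the newly labeled samples from the pool.
        keep = ~sure
        keep[unsure_idx] = False
        x_pool = x_pool[keep]
    return model
```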

27 pages, 3440 KB  
Article
Sparse Representations Optimization with Coupled Bayesian Dictionary and Dictionary Classifier for Efficient Classification
by Muhammad Riaz-ud-din, Salman Abdul Ghafoor and Faisal Shafait
Appl. Sci. 2024, 14(1), 306; https://doi.org/10.3390/app14010306 - 29 Dec 2023
Viewed by 1895
Abstract
Among the numerous techniques for learning a linear classifier through discriminative dictionary and sparse representation learning of signals, those that learn a nonparametric Bayesian classifier jointly and discriminatively with the dictionary and the corresponding sparse representations have drawn considerable attention from researchers. These techniques jointly learn two sets of sparse representations: one for the training samples over the dictionary and the other for the corresponding labels over the dictionary classifier. At the prediction stage, however, the representations of the test samples computed over the learned dictionary do not truly represent the corresponding labels, exposing a weakness in the joint learning claim of these techniques. We mitigate this problem and strengthen the joint learning by learning a set of weights over the dictionary to represent the training data and further optimizing the same weights over the dictionary classifier to represent the labels of the corresponding classes. Now, at the prediction stage, the representation weights of the test samples computed over the learned dictionary also represent the labels of the corresponding classes, resulting in accurate reconstruction of the class labels by the learned dictionary classifier. A reduction in the size of the Bayesian model’s parameters also improves training time. We analytically and nonparametrically derived the posterior conditional probabilities of the model from its overall joint probability using Bayes’ theorem, and we used the Gibbs sampler to solve the joint probability of the model using the derived conditional probabilities, which also supports our claim of efficient optimization of the coupled/joint dictionaries and the sparse representation parameters. We demonstrated the effectiveness of our approach through experiments on standard datasets: the Extended YaleB and AR face databases for face recognition, the Caltech-101 and Fifteen Scene Category databases for categorization, and the UCF sports action database for action recognition. Compared with state-of-the-art methods in the area, the classification accuracies of our approach on these datasets, i.e., 93.25%, 89.27%, 94.81%, 98.10%, and 95.00%, represent increases of 0.5 to 2% on average. The overall average error margin of the confidence intervals in our approach is 0.24, compared with 0.34 for the second-best approach, JBDC. The AUC–ROC scores of our approach are 0.98 and 0.992, which are better than those of others, i.e., 0.960 and 0.98, respectively. Our approach is also computationally efficient.
(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)
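
The shared-weights idea, the same representation coefficients serving both the dictionary and the dictionary classifier, can be illustrated with ridge regression standing in for the paper's Bayesian sparse inference; all names and dimensions below are hypothetical:

```python
import numpy as np

def predict_with_coupled_dictionary(x, D, W, lam=0.1):
    """Compute representation weights over dictionary D, then reuse the
    same weights with classifier W to reconstruct a label code.
    """
    # a = argmin ||x - D a||^2 + lam ||a||^2  (closed-form ridge solution)
    k = D.shape[1]
    a = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)
    label_scores = W @ a  # reconstruct the label code with the same weights
    return int(np.argmax(label_scores))

# Toy usage: 20-dim signals, 40 atoms, 5 classes.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
W = rng.standard_normal((5, 40))
print(predict_with_coupled_dictionary(rng.standard_normal(20), D, W))
```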

4 pages, 538 KB  
Proceeding Paper
YOLO-Based Fish Detection in Underwater Environments
by Mohammed Yasser Ouis and Moulay Akhloufi
Environ. Sci. Proc. 2024, 29(1), 44; https://doi.org/10.3390/ECRS2023-16315 - 22 Dec 2023
Cited by 2 | Viewed by 3049
Abstract
In this work, we present a comprehensive study on fish detection in underwater environments using sonar images from the Caltech Fish Counting Dataset (CFC). We use the CFC dataset, initially designed for tracking purposes, to optimize and evaluate the performance of the YOLO v7 and YOLO v8 models for fish detection. Our findings demonstrate the high performance of these deep learning models in accurately detecting fish species in sonar images. In our evaluation, YOLO v7 achieved an average precision of 68.3% (AP50) and 62.15% (AP75), while YOLO v8 demonstrated even better performance, with an average precision of 72.47% (AP50) and 66.21% (AP75) across the test dataset of 334,017 images. These high-precision results underscore the effectiveness of these models in fish detection tasks under various underwater conditions. With 162,680 training images and 334,017 test images, our evaluation provides valuable insights into the models’ performance and generalization across diverse underwater conditions. This study contributes to the advancement of underwater fish detection by showcasing the suitability of the CFC dataset and the efficacy of the YOLO v7 and YOLO v8 models. These insights can pave the way for further advancements in fish detection, supporting conservation efforts and sustainable fisheries management.
(This article belongs to the Proceedings of ECRS 2023)
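
AP50 and AP75 differ only in the IoU threshold a detection must clear to count as a true positive; a minimal sketch of that matching rule (not the full COCO-style AP computation; the data structures are assumed):

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def true_positives(predictions, ground_truths, threshold):
    """Greedy one-to-one matching of detections to ground-truth boxes.

    predictions: list of {"box": (x1, y1, x2, y2), "score": float}.
    """
    matched, tp = set(), 0
    for pred in sorted(predictions, key=lambda p: -p["score"]):
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(pred["box"], gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    return tp

# AP50 vs. AP75 simply changes `threshold` from 0.5 to 0.75.
```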

15 pages, 1540 KB  
Article
Few-Shot Image Classification via Mutual Distillation
by Tianshu Zhang, Wenwen Dai, Zhiyu Chen, Sai Yang, Fan Liu and Hao Zheng
Appl. Sci. 2023, 13(24), 13284; https://doi.org/10.3390/app132413284 - 15 Dec 2023
Cited by 1 | Viewed by 1972
Abstract
Due to their compelling performance and appealing simplicity, metric-based meta-learning approaches are gaining increasing attention for addressing the challenges of few-shot image classification. However, many similar methods employ intricate network architectures, which can potentially lead to overfitting when trained with limited samples. To tackle this concern, we propose using mutual distillation to enhance metric-based meta-learning, effectively bolstering model generalization. Specifically, our approach involves two individual metric-based networks, such as prototypical networks and relational networks, mutually supplying each other with a regularization term. This method seamlessly integrates with any metric-based meta-learning approach. We undertake comprehensive experiments on two prevalent few-shot classification benchmarks, namely miniImageNet and Caltech-UCSD Birds-200-2011 (CUB), to demonstrate the effectiveness of our proposed algorithm. The results demonstrate that our method efficiently enhances each metric-based model through mutual distillation.
(This article belongs to the Special Issue Recent Advances in Few-Shot Learning for Computer Vision Tasks)
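
The mutual regularization described above resembles deep mutual learning, where each network matches the other's softened predictions; a generic sketch, not necessarily the paper's exact term:

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_a, logits_b, labels, temperature=4.0):
    """Each network learns from the labels and from its peer's predictions.

    Returns the losses for networks A and B.
    """
    t = temperature
    ce_a = F.cross_entropy(logits_a, labels)
    ce_b = F.cross_entropy(logits_b, labels)
    # KL(peer || self) on softened distributions, scaled by t^2 as usual.
    kl_a = F.kl_div(F.log_softmax(logits_a / t, dim=1),
                    F.softmax(logits_b.detach() / t, dim=1),
                    reduction="batchmean") * t * t
    kl_b = F.kl_div(F.log_softmax(logits_b / t, dim=1),
                    F.softmax(logits_a.detach() / t, dim=1),
                    reduction="batchmean") * t * t
    return ce_a + kl_a, ce_b + kl_b
```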

20 pages, 4300 KB  
Article
AdvRain: Adversarial Raindrops to Attack Camera-Based Smart Vision Systems
by Amira Guesmi, Muhammad Abdullah Hanif and Muhammad Shafique
Information 2023, 14(12), 634; https://doi.org/10.3390/info14120634 - 28 Nov 2023
Cited by 8 | Viewed by 3348
Abstract
Vision-based perception modules are increasingly deployed in many applications, especially autonomous vehicles and intelligent robots. These modules are used to acquire information about the surroundings and identify obstacles. Hence, accurate detection and classification are essential for reaching appropriate decisions and taking appropriate, safe actions at all times. Recent studies have demonstrated that “printed adversarial attacks”, known as physical adversarial attacks, can successfully mislead perception models such as object detectors and image classifiers. However, most of these physical attacks rely on noticeable, eye-catching patterns for the generated perturbations, making them identifiable/detectable by the human eye, in field tests, or in test drives. In this paper, we propose a camera-based inconspicuous adversarial attack (AdvRain) capable of fooling camera-based perception systems over all objects of the same class. Unlike mask-based FakeWeather attacks that require access to the underlying computing hardware or image memory, our attack is based on emulating the effects of a natural weather condition (i.e., raindrops) that can be printed on a translucent sticker, which is externally placed over the lens of a camera whenever an adversary plans to trigger an attack. Such perturbations remain inconspicuous in real-world deployments, and their presence goes unnoticed due to their association with a natural phenomenon. To accomplish this, we develop an iterative process based on a random search that aims to identify critical positions, ensuring that the performed transformation is adversarial for a target classifier. Our transformation is based on blurring predefined parts of the captured image corresponding to the areas covered by the raindrops. We achieve a drop in average model accuracy of more than 45% and 40% on VGG19 for the ImageNet dataset and ResNet34 for the Caltech-101 dataset, respectively, using only 20 raindrops.
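
The random-search procedure sketched in the abstract, blurring candidate raindrop regions and keeping placements that hurt the classifier, might look like this; `classify` and all parameters are assumed for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def advrain_random_search(image, classify, n_drops=20, trials=200, radius=8):
    """Search for raindrop placements that lower the true-class score.

    image: 2D grayscale array; classify: maps an image to the true-class
    probability (assumed API). Each trial blurs `n_drops` square patches
    (emulated raindrops) and keeps the most adversarial placement.
    """
    h, w = image.shape[:2]
    best_img, best_score = image, classify(image)
    for _ in range(trials):
        candidate = image.copy()
        for _ in range(n_drops):
            cy, cx = np.random.randint(h), np.random.randint(w)
            y0, y1 = max(0, cy - radius), min(h, cy + radius)
            x0, x1 = max(0, cx - radius), min(w, cx + radius)
            # Blur the patch to emulate light refracted through a raindrop.
            candidate[y0:y1, x0:x1] = gaussian_filter(
                candidate[y0:y1, x0:x1], sigma=3)
        score = classify(candidate)
        if score < best_score:  # more adversarial: lower true-class score
            best_img, best_score = candidate, score
    return best_img
```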

18 pages, 2218 KB  
Article
Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
by Yuto Watanabe, Ren Togo, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama
Sensors 2023, 23(22), 9287; https://doi.org/10.3390/s23229287 - 20 Nov 2023
Viewed by 2127
Abstract
At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text while preserving text-irrelevant regions. Although there has been extensive research on improving the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric that evaluates the performance of text-guided image manipulation by focusing on changes between the image and text modalities. Specifically, we define MD as the consistency of the changes in images and texts occurring before and after manipulation. By using MD to evaluate text-guided image manipulation, we can comprehensively assess how an image has changed before and after the manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that our calculated MD scores correlated with subjective scores for the manipulated images far better than the existing metrics did.
(This article belongs to the Special Issue Advanced Computer Vision Systems 2023)
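
One plausible reading of MD is the agreement between the change vectors of the two modalities in a shared embedding space (e.g., CLIP); a sketch under that assumption, which may differ from the paper's exact definition:

```python
import numpy as np

def manipulation_direction(img_before, img_after, txt_before, txt_after):
    """Score how well the image change agrees with the text change.

    All arguments are embedding vectors from a shared image-text space.
    Returns the cosine similarity between the two change vectors.
    """
    d_img = img_after - img_before  # change in the image modality
    d_txt = txt_after - txt_before  # change in the text modality
    return float(d_img @ d_txt /
                 (np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-9))
```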

19 pages, 4951 KB  
Article
S2AC: Self-Supervised Attention Correlation Alignment Based on Mahalanobis Distance for Image Recognition
by Zhi-Yong Wang, Dae-Ki Kang and Cui-Ping Zhang
Electronics 2023, 12(21), 4419; https://doi.org/10.3390/electronics12214419 - 26 Oct 2023
Cited by 3 | Viewed by 1527
Abstract
Susceptibility to domain changes hinders the application and development of deep neural networks for image classification. Domain adaptation (DA) makes use of domain-invariant characteristics to improve the performance of a model trained on labeled data from one domain (the source domain) on an unlabeled domain with a different data distribution (the target domain). However, existing DA methods simply use pretrained models (e.g., AlexNet, ResNet) for feature extraction; these convolutional models are trapped in localized features and fail to acquire long-distance dependencies. Furthermore, many approaches depend too heavily on pseudo-labels, which can impair adaptation efficiency and lead to unstable and inconsistent results. In this research, we present S2AC, a novel approach for unsupervised deep domain adaptation that uses a stacked attention architecture as a feature map extractor. Our method reduces the domain discrepancy by minimizing a linear transformation of the second-order statistics (covariances) extended by the p-norm, while simultaneously designing heuristic pretext tasks to improve the generality of the learned representation. In addition, we developed a new trainable relative position embedding that not only reduces the model parameters but also enhances model accuracy and expedites the training process. To illustrate our method’s efficacy and controllability, we designed extensive experiments on the Office31, Office_Caltech_10, and OfficeHome datasets. To the best of our knowledge, the proposed method is the first attempt at incorporating attention-based networks and self-supervised learning for image domain adaptation, and it has shown promising results.
(This article belongs to the Special Issue Artificial Intelligence for Robotics)
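
Covariance (second-order statistics) alignment of the kind described is typified by the CORAL loss; a generic p-norm variant as a sketch, not the paper's exact objective:

```python
import torch

def coral_loss(source_feats, target_feats, p=2):
    """Align the covariances of two (n, d) feature batches.

    p=2 recovers the usual Frobenius-norm CORAL loss; other p values
    correspond to the p-norm extension suggested by the abstract.
    """
    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return x.T @ x / (x.shape[0] - 1)

    d = source_feats.shape[1]
    diff = covariance(source_feats) - covariance(target_feats)
    return torch.norm(diff, p=p) / (d * d)
```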

16 pages, 4517 KB  
Article
An Improved YOLOv5 Algorithm for Vulnerable Road User Detection
by Wei Yang, Xiaolin Tang, Kongming Jiang, Yang Fu and Xinling Zhang
Sensors 2023, 23(18), 7761; https://doi.org/10.3390/s23187761 - 8 Sep 2023
Cited by 9 | Viewed by 2971
Abstract
Vulnerable road users (VRUs), being small and exhibiting highly random movements, increase the difficulty of object detection for the autonomous emergency braking system for vulnerable road users (AEBS-VRU). To overcome existing problems in AEBS-VRU object detection, an enhanced YOLOv5 algorithm is proposed. While the Complete Intersection over Union loss (CIoU-Loss) and Distance Intersection over Union Non-Maximum Suppression (DIoU-NMS) are fused to improve the model’s convergence speed, the algorithm also incorporates a small-object detection layer to increase the performance of VRU detection. A dataset for complex AEBS-VRU scenarios is established based on existing datasets such as Caltech, nuScenes, and Penn-Fudan, and the model is trained using transfer learning based on the PyTorch framework. A number of comparative experiments using models such as YOLOv6, YOLOv7, YOLOv8, and YOLOX are carried out. The results of the comparative evaluation show that the proposed improved YOLOv5 algorithm has the best overall performance in terms of efficiency, accuracy, and timeliness of target detection.
(This article belongs to the Section Vehicular Sensing)
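
DIoU-NMS, mentioned above, suppresses a box only when both its overlap is high and its center is close to the kept box; a from-scratch sketch of the standard rule, not the paper's code:

```python
import numpy as np

def diou_nms(boxes, scores, iou_thresh=0.5):
    """Distance-IoU NMS over boxes given as (x1, y1, x2, y2).

    Suppression uses IoU minus a center-distance penalty, so overlapping
    boxes with distant centers (e.g., two close pedestrians) are more
    likely to survive than under plain NMS.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i, order = order[0], order[1:]
        keep.append(i)
        if not order.size:
            break
        b, rest = boxes[i], boxes[order]
        ix1 = np.maximum(b[0], rest[:, 0])
        iy1 = np.maximum(b[1], rest[:, 1])
        ix2 = np.minimum(b[2], rest[:, 2])
        iy2 = np.minimum(b[3], rest[:, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_b + area_r - inter)
        # Center-distance penalty over the enclosing-box diagonal.
        rho2 = ((b[0] + b[2] - rest[:, 0] - rest[:, 2]) ** 2 +
                (b[1] + b[3] - rest[:, 1] - rest[:, 3]) ** 2) / 4
        c2 = ((np.maximum(b[2], rest[:, 2]) - np.minimum(b[0], rest[:, 0])) ** 2 +
              (np.maximum(b[3], rest[:, 3]) - np.minimum(b[1], rest[:, 1])) ** 2)
        order = order[iou - rho2 / c2 <= iou_thresh]
    return keep
```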
