Search Results (11)

Search Parameters:
Keywords = non-competitive I-O model

35 pages, 7792 KiB  
Article
TWIN-ADAPT: Continuous Learning for Digital Twin-Enabled Online Anomaly Classification in IoT-Driven Smart Labs
by Ragini Gupta, Beitong Tian, Yaohui Wang and Klara Nahrstedt
Future Internet 2024, 16(7), 239; https://doi.org/10.3390/fi16070239 - 4 Jul 2024
Cited by 4 | Viewed by 3069
Abstract
In the rapidly evolving landscape of scientific semiconductor laboratories (commonly known as cleanrooms), integrated with Internet of Things (IoT) technology and Cyber-Physical Systems (CPSs), several factors, including operational changes, sensor aging, software updates, and the introduction of new processes or equipment, can lead to dynamic and non-stationary data distributions in evolving data streams. This phenomenon, known as concept drift, poses a substantial challenge for traditional data-driven digital twin static machine learning (ML) models for anomaly detection and classification. As a result, the drift in normal and anomalous data distributions over time causes model performance to decay, resulting in high false alarm rates and missed anomalies. To address this issue, we present TWIN-ADAPT, a continuous learning model within a digital twin framework designed to dynamically update and optimize its anomaly classification algorithm in response to changing data conditions. This model is evaluated against state-of-the-art concept drift adaptation models and tested under simulated drift scenarios using diverse noise distributions to mimic real-world distribution shifts in anomalies. TWIN-ADAPT is applied to three critical CPS datasets from smart manufacturing labs (cleanrooms): Fumehood, Lithography Unit, and Vacuum Pump. The evaluation results demonstrate that TWIN-ADAPT's continual learning model for optimized and adaptive anomaly classification achieves a high accuracy and F1 score of 96.97% and 0.97, respectively, on the Fumehood CPS dataset, showing an average performance improvement of 0.57% over the offline model. For the Lithography and Vacuum Pump datasets, TWIN-ADAPT achieves an average accuracy of 69.26% and 71.92%, respectively, with performance improvements of 75.60% and 10.42% over the offline model. These significant improvements highlight the efficacy of TWIN-ADAPT's adaptive capabilities. Additionally, TWIN-ADAPT shows very competitive performance when compared with other benchmark drift adaptation algorithms. This demonstrates TWIN-ADAPT's robustness across different modalities and datasets, confirming its suitability for any IoT-driven CPS framework managing diverse data distributions in real-time streams. Its adaptability and effectiveness make it a versatile tool for dynamic industrial settings. Full article
(This article belongs to the Special Issue Digital Twins in Intelligent Manufacturing)
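The abstract does not spell out TWIN-ADAPT's update rule, but the general continual-learning pattern it describes (monitor a rolling performance window, refit the classifier on recent data when accuracy decays) can be sketched as follows; the model choice, window size, and accuracy floor are illustrative assumptions, not the paper's settings:

```python
from collections import deque

import numpy as np
from sklearn.linear_model import SGDClassifier

class DriftAdaptiveClassifier:
    """Minimal monitor-and-adapt loop for anomaly classification under drift.

    Illustrative sketch only: TWIN-ADAPT's actual optimization is more
    elaborate; this shows the generic pattern the abstract describes.
    """

    def __init__(self, window=500, acc_floor=0.85):
        self.model = SGDClassifier(loss="log_loss")
        self.buffer = deque(maxlen=window)  # recent (features, label) pairs
        self.hits = deque(maxlen=window)    # rolling prediction correctness
        self.acc_floor = acc_floor
        self._fitted = False

    def update(self, x, y):
        x = np.asarray(x, dtype=float).reshape(1, -1)
        if self._fitted:
            self.hits.append(int(self.model.predict(x)[0] == y))
        self.buffer.append((x.ravel(), y))
        self.model.partial_fit(x, [y], classes=[0, 1])  # incremental update
        self._fitted = True
        # Drift heuristic: rolling accuracy fell below the floor -> refit
        # from scratch on the recent window (adapt to the new regime).
        if len(self.hits) == self.hits.maxlen and np.mean(self.hits) < self.acc_floor:
            X = np.array([f for f, _ in self.buffer])
            Y = np.array([l for _, l in self.buffer])
            if len(set(Y)) > 1:  # need both classes present to refit
                self.model = SGDClassifier(loss="log_loss").fit(X, Y)
            self.hits.clear()
```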

26 pages, 8779 KiB  
Article
LCV2: A Universal Pretraining-Free Framework for Grounded Visual Question Answering
by Yuhan Chen, Lumei Su, Lihua Chen and Zhiwei Lin
Electronics 2024, 13(11), 2061; https://doi.org/10.3390/electronics13112061 - 25 May 2024
Cited by 1 | Viewed by 1975
Abstract
Grounded Visual Question Answering systems rely heavily on substantial computational power and data resources for pretraining. In response to this challenge, this paper introduces LCV2, a modular approach that utilizes a frozen large language model (LLM) to bridge an off-the-shelf generic visual question answering (VQA) module with a generic visual grounding (VG) module. It leverages the generalizable knowledge of these expert models, avoiding the need for any large-scale pretraining. Innovatively, within the LCV2 framework, question and predicted answer pairs are transformed into descriptive and referring captions, enhancing the clarity of the visual cues directed by the question text for the VG module's grounding. This compensates for the lack of intrinsic text-visual coupling in non-end-to-end frameworks. Comprehensive experiments on benchmark datasets, such as GQA, CLEVR, and VizWiz-VQA-Grounding, were conducted to evaluate the method's performance and compare it with several baseline methods. In particular, it achieved an IoU F1 score of 59.6% on the GQA dataset and 37.4% on the CLEVR dataset, surpassing some baseline results and demonstrating LCV2's competitive performance. Full article
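The pivotal step here, converting a question and predicted answer into a referring caption that an off-the-shelf grounding model can consume, is a pure text transformation. A toy sketch of the pipeline follows; the template and the `vqa_module`/`vg_module` callables are hypothetical stand-ins (LCV2 drives this conversion with a frozen LLM rather than a fixed template):

```python
def to_referring_caption(question: str, answer: str) -> str:
    # Toy template stand-in for LCV2's LLM-based conversion step.
    q = question.rstrip("?").strip().lower()
    return f"the {answer} that answers '{q}'"

def grounded_vqa(image, question, vqa_module, vg_module):
    """Pretraining-free pipeline: a frozen VQA model answers, a caption
    bridges the modules, and a frozen VG model grounds the caption."""
    answer = vqa_module(image, question)             # e.g. "umbrella"
    caption = to_referring_caption(question, answer)
    box = vg_module(image, caption)                  # box for the caption
    return answer, box
```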

20 pages, 20528 KiB  
Article
Multi-Task Visual Perception for Object Detection and Semantic Segmentation in Intelligent Driving
by Jiao Zhan, Jingnan Liu, Yejun Wu and Chi Guo
Remote Sens. 2024, 16(10), 1774; https://doi.org/10.3390/rs16101774 - 16 May 2024
Cited by 5 | Viewed by 3428
Abstract
With the rapid development of intelligent driving vehicles, multi-task visual perception based on deep learning has emerged as a key technological pathway toward safe vehicle navigation in real traffic scenarios. However, due to the high-precision and high-efficiency requirements of intelligent driving vehicles in practical driving environments, multi-task visual perception remains a challenging task. Existing methods typically adopt effective multi-task learning networks to handle multiple tasks concurrently. Although they achieve remarkable results, better performance can be obtained by tackling existing problems such as underutilized high-resolution features and underexploited non-local contextual dependencies. In this work, we propose YOLOPv3, an efficient anchor-based multi-task visual perception network capable of handling traffic object detection, drivable area segmentation, and lane detection simultaneously. Compared to prior works, we make essential improvements. On the one hand, we propose architectural enhancements that utilize multi-scale high-resolution features and non-local contextual dependencies to improve network performance. On the other hand, we propose optimization improvements aimed at enhancing network training, enabling our YOLOPv3 to achieve optimal performance via straightforward end-to-end training. The experimental results on the BDD100K dataset demonstrate that YOLOPv3 sets a new state of the art (SOTA): 96.9% recall and 84.3% mAP50 in traffic object detection, 93.2% mIoU in drivable area segmentation, and 88.3% accuracy and 28.0% IoU in lane detection. In addition, YOLOPv3 maintains competitive inference speed against the lightweight YOLOP. Thus, YOLOPv3 stands as a robust solution for handling multi-task visual perception problems. The code and trained models have been released on GitHub. Full article
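YOLOPv3 itself is anchor-based with a deep backbone and task-specific necks, but the underlying multi-task shape, one shared encoder feeding a detection head and two dense segmentation heads, fits in a few lines of PyTorch; every layer size below is a placeholder, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    """Shared encoder + three task heads (detection, drivable area, lane).
    Structural sketch only; YOLOPv3's real backbone and necks are far deeper."""

    def __init__(self, num_anchors=3, num_classes=1):
        super().__init__()
        self.backbone = nn.Sequential(        # toy encoder, downsamples by 4
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Detection head: per-cell anchor predictions (4 box + 1 obj + classes)
        self.det_head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)
        # Dense heads: 2-class masks upsampled back to input resolution
        self.drivable_head = nn.Sequential(
            nn.Conv2d(64, 2, 1), nn.Upsample(scale_factor=4, mode="bilinear"))
        self.lane_head = nn.Sequential(
            nn.Conv2d(64, 2, 1), nn.Upsample(scale_factor=4, mode="bilinear"))

    def forward(self, x):
        feats = self.backbone(x)               # features shared by all tasks
        return self.det_head(feats), self.drivable_head(feats), self.lane_head(feats)
```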

21 pages, 24678 KiB  
Article
Efficient Vision Transformer YOLOv5 for Accurate and Fast Traffic Sign Detection
by Guang Zeng, Zhizhou Wu, Lipeng Xu and Yunyi Liang
Electronics 2024, 13(5), 880; https://doi.org/10.3390/electronics13050880 - 25 Feb 2024
Cited by 11 | Viewed by 3488
Abstract
Accurate and fast detection of traffic sign information is vital for autonomous driving systems. However, the YOLOv5 algorithm suffers from low accuracy and slow detection when used for traffic sign detection. To address these shortcomings, this paper introduces an accurate and fast traffic sign detection algorithm, YOLOv5-EfficientViT (Efficient Vision Transformer). The algorithm improves both the accuracy and speed of the model by replacing the CSPDarknet backbone of the YOLOv5(s) model with the EfficientViT network. Additionally, the algorithm incorporates the Convolutional Block Attention Module (CBAM) to enhance feature layer information extraction and boost detection accuracy. To mitigate the adverse effects of low-quality labels on gradient generation and enhance the competitiveness of high-quality anchor boxes, a superior gradient gain allocation strategy is employed. Furthermore, the strategy introduces Wise-IoU (WIoU), a bounding box loss with a dynamic non-monotonic focusing mechanism, to further enhance the accuracy and speed of the object detection algorithm. The algorithm's effectiveness is validated through experiments on the 3L-TT100K traffic sign dataset, showcasing a mean average precision (mAP) of 94.1% in traffic sign detection. This mAP surpasses the YOLOv5(s) algorithm by 4.76% and outperforms the baseline algorithm. Additionally, the algorithm achieves a detection speed of 62.50 frames per second, much faster than the baseline algorithm. Full article
(This article belongs to the Special Issue Applications of Computer Vision, 2nd Edition)
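Of the components the abstract names, CBAM is the most self-contained: channel attention computed from pooled descriptors, followed by spatial attention from channel-wise statistics. A compact PyTorch rendering, using the original CBAM paper's default reduction ratio and kernel size rather than anything confirmed for this paper:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention: shared MLP over avg- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: conv over channel-wise mean and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```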

23 pages, 4457 KiB  
Article
A Task Scheduling Optimization Method for Vehicles Serving as Obstacles in Mobile Edge Computing Based IoV Systems
by Mingwei Feng, Haiqing Yao and Jie Li
Entropy 2023, 25(1), 139; https://doi.org/10.3390/e25010139 - 10 Jan 2023
Cited by 5 | Viewed by 2489
Abstract
In recent years, as more and more vehicles request service from roadside units (RSUs), vehicle-to-infrastructure (V2I) communication links have come under tremendous pressure. This paper first proposes a dynamic dense traffic flow model under fading channel conditions. Based on this, reliability is redefined according to the real-time location information of vehicles. The on-board units (OBUs) migrate intensive computing tasks to the appropriate RSU to optimize execution time and computing cost at the same time. In addition, competitive delay is introduced into the execution time model, which describes the channel resource contention and data conflict in dynamic internet of vehicles (IoV) scenes. Next, task scheduling for RSUs is formulated as a multi-objective optimization problem. To solve it, a task scheduling algorithm based on a reliability constraint (TSARC) is proposed to select the optimal RSU for task transmission. Compared with the genetic algorithm (GA), TSARC offers several improvements: first, fast non-dominated sorting is applied to layer the population and reduce complexity. Second, an elite strategy with excellent nonlinear optimization ability is introduced, which ensures the diversity of optimal individuals and provides different preference choices for passengers. Third, a reference point mechanism is introduced to retain individuals that are non-dominated and close to the reference points. TSARC's Pareto-based multi-objective optimization can comprehensively measure the overall state of the system and flexibly schedule system resources. Furthermore, it overcomes defects of the GA method, such as the need to determine linear weight values, the non-uniformity of dimensions among objectives, and poor robustness. Finally, numerical simulation results based on the British Highway Traffic Flow Data Set show that TSARC exhibits better scalability and efficiency than other methods across different numbers of tasks and traffic flow densities, which verifies the preceding theoretical derivation. Full article
(This article belongs to the Section Multidisciplinary Applications)
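The fast non-dominated sorting step that TSARC borrows from the NSGA-II family is standard and worth making concrete; the sketch below layers candidate schedules into Pareto fronts, assuming both objectives (e.g., execution time and calculating cost) are minimized:

```python
def fast_non_dominated_sort(points):
    """NSGA-II-style layering into Pareto fronts (minimization).
    `points` is a list of objective tuples, e.g. (execution_time, cost)."""
    dominates = lambda a, b: all(x <= y for x, y in zip(a, b)) and a != b
    n = len(points)
    S = [[] for _ in range(n)]   # indices each solution dominates
    counts = [0] * n             # how many solutions dominate each one
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                S[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:       # dominated by nobody -> first front
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in S[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]           # drop the trailing empty front

# Example: fast_non_dominated_sort([(1, 5), (2, 2), (3, 3), (0.5, 6)])
# -> [[0, 1, 3], [2]]  (the first three are mutually non-dominated)
```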

24 pages, 7794 KiB  
Article
Mask-Point: Automatic 3D Surface Defects Detection Network for Fiber-Reinforced Resin Matrix Composites
by Helin Li, Bin Lin, Chen Zhang, Liang Xu, Tianyi Sui, Yang Wang, Xinquan Hao, Deyu Lou and Hongyu Li
Polymers 2022, 14(16), 3390; https://doi.org/10.3390/polym14163390 - 19 Aug 2022
Cited by 9 | Viewed by 3326
Abstract
Surface defects of fiber-reinforced resin matrix composites (FRRMCs) adversely affect their appearance and performance. To accurately and efficiently detect the three-dimensional (3D) surface defects of FRRMCs, a novel lightweight, two-stage semantic segmentation network, Mask-Point, is proposed. Stage 1 of Mask-Point comprises the multi-head 3D region proposal extractors (RPEs), generating several 3D regions of interest (ROIs). Stage 2 is the 3D aggregation stage, composed of a shared classifier, a shared filter, and non-maximum suppression (NMS). The two stages work together to detect surface defects. To evaluate the performance of Mask-Point, a new 3D surface defects dataset of FRRMCs containing about 120 million points is produced. Training and test experiments show that the accuracy and mean intersection over union (mIoU) increase as the number of different 3D RPEs in Stage 1 increases, but inference becomes slower. The best accuracy, mIoU, and inference speed of the Mask-Point model reach 0.9997, 0.9402, and 320,000 points/s, respectively. Moreover, comparison experiments show that Mask-Point offers the best segmentation performance among several typical 3D semantic segmentation networks; its mIoU is about 30% higher than that of the second-best network, PointNet. In addition, a distributed surface defects detection system based on Mask-Point is developed. The system is applied to scan real FRRMC products and detect their surface defects, and it achieves the best detection performance in competition with skilled human workers. These experiments demonstrate that the proposed Mask-Point can accurately and efficiently detect 3D surface defects of FRRMCs, and it also provides a new potential solution for 3D surface defect detection in other similar materials. Full article
(This article belongs to the Special Issue Development in Fiber-Reinforced Polymer Composites)
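The NMS stage in Mask-Point's aggregation step follows the usual greedy rule; the 2D-box version below conveys the idea (Mask-Point applies it to 3D ROIs, so treat this as the flavor of the operation rather than the paper's exact routine):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) arrays of [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]   # highest-scoring proposals first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one
    return keep
```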

15 pages, 5152 KiB  
Article
FASSVid: Fast and Accurate Semantic Segmentation for Video Sequences
by Jose Portillo-Portillo, Gabriel Sanchez-Perez, Linda K. Toscano-Medina, Aldo Hernandez-Suarez, Jesus Olivares-Mercado, Hector Perez-Meana, Pablo Velarde-Alvarado, Ana Lucila Sandoval Orozco and Luis Javier García Villalba
Entropy 2022, 24(7), 942; https://doi.org/10.3390/e24070942 - 7 Jul 2022
Cited by 2 | Viewed by 3316
Abstract
Most methods for real-time semantic segmentation do not take temporal information into account when working with video sequences. This is counter-intuitive in real-world scenarios, where the main application of such methods is precisely processing frame sequences as quickly and accurately as possible. In this paper, we address this problem by exploiting the temporal information provided by previous frames of the video stream. Our method leverages a previous input frame as well as the previous output of the network to enhance the prediction accuracy of the current input frame. We develop a module that obtains feature maps rich in change information. Additionally, we incorporate the previous output of the network into all the decoder stages as a way of increasing the attention given to relevant features. Finally, to properly train and evaluate our methods, we introduce CityscapesVid, a dataset specifically designed to benchmark semantic video segmentation networks. Our proposed network, entitled FASSVid, improves mIoU accuracy over a standard non-sequential baseline model. Moreover, FASSVid obtains state-of-the-art inference speed and competitive mIoU results compared to other state-of-the-art lightweight networks, with a significantly lower number of computations. Specifically, we obtain 71% mIoU on our CityscapesVid dataset, running at 114.9 FPS on a single NVIDIA GTX 1080Ti and 31 FPS on the NVIDIA Jetson Nano embedded board with images of size 1024×2048 and 512×1024, respectively. Full article
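The abstract's central idea, conditioning the current prediction on the previous frame and the previous network output, amounts to extra input channels plus a change-information cue. A minimal PyTorch sketch under that reading; FASSVid's actual decoder integration is richer than this:

```python
import torch
import torch.nn as nn

class TemporalSegmenter(nn.Module):
    """Sketch: fuse the current frame, a change cue from the previous frame,
    and the previous prediction into one segmentation forward pass."""

    def __init__(self, num_classes=19):
        super().__init__()
        # Channels: 3 (frame t) + 3 (frame difference) + num_classes (prev output)
        self.net = nn.Sequential(
            nn.Conv2d(6 + num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, frame_t, frame_prev, logits_prev):
        change = frame_t - frame_prev   # crude change-information feature
        x = torch.cat([frame_t, change, logits_prev.softmax(dim=1)], dim=1)
        return self.net(x)
```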

16 pages, 6571 KiB  
Article
Active Mask-Box Scoring R-CNN for Sonar Image Instance Segmentation
by Fangjin Xu, Jianxing Huang, Jie Wu and Longyu Jiang
Electronics 2022, 11(13), 2048; https://doi.org/10.3390/electronics11132048 - 29 Jun 2022
Cited by 11 | Viewed by 2562
Abstract
Instance segmentation of sonar images is an effective method for underwater target recognition. However, the task faces two major problems: the mismatch between positioning accuracy, as measured by boxIoU, and the classification confidence used as the NMS score in current instance segmentation models; and the high annotation cost of sonar images. To tackle these problems, we present a novel instance segmentation method called Mask-Box Scoring R-CNN and embed it in our proposed deep active learning framework. For the mismatch between boxIoU and the NMS score, Mask-Box Scoring R-CNN uses a boxIoU head to predict the quality of the bounding boxes and amends the non-maximum suppression (NMS) score with the predicted boxIoU to preserve high-quality bounding boxes at inference. To deal with the annotation problem, we propose a triplets-measure-based active learning (TBAL) method and a balanced-sampling method applicable to deep learning. The TBAL method evaluates the informativeness of unlabeled samples in terms of classification confidence, positioning accuracy, and mask quality. The balanced-sampling method selects hard samples from the dataset to train the model and improve performance. The experimental results show that Mask-Box Scoring R-CNN achieves improvements of 1% and 1.3% in boxAP on our sonar image dataset compared with Mask Scoring R-CNN and Mask R-CNN, respectively. The active learning framework with TBAL and balanced sampling achieves competitive performance with fewer labeled samples than other frameworks, which can better facilitate underwater target recognition. Full article
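Two of the abstract's ideas reduce to one-liners: re-scoring detections with the predicted boxIoU before NMS, and rating unlabeled samples on a triplet of quality signals. The combination rules below are illustrative guesses, not the paper's formulas:

```python
def amend_nms_score(cls_conf, pred_box_iou):
    # Replace pure classification confidence with an IoU-aware NMS score so
    # well-localized boxes survive suppression. Geometric mean is one common
    # combination; the paper's exact rule may differ.
    return (cls_conf * pred_box_iou) ** 0.5

def triplet_informativeness(cls_conf, pred_box_iou, mask_score):
    # TBAL-style idea: a sample is informative when the model is unsure on any
    # of classification, localization, or mask quality (illustrative weighting).
    uncertainty = (1 - cls_conf, 1 - pred_box_iou, 1 - mask_score)
    return max(uncertainty)   # query the sample if any signal is weak
```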

19 pages, 553 KiB  
Article
Analytical Model of ALOHA and Time- and Frequency-Asynchronous ALOHA with Forward Error Correction for IoT Systems
by Federico Clazzer and Marcel Grec
Sensors 2022, 22(10), 3741; https://doi.org/10.3390/s22103741 - 14 May 2022
Cited by 8 | Viewed by 2843
Abstract
The blooming of internet of things (IoT) services calls for a paradigm shift in the design of communications systems. Short data packets sporadically transmitted by a multitude of low-cost, low-power terminals require a radical change in relevant aspects of the protocol stack. For example, scheduling-based approaches may become inefficient at the medium access (MAC) layer, and alternatives such as uncoordinated access policies may be preferred. In this context, random access (RA) in its simplest form, i.e., additive links on-line Hawaii area (ALOHA), may again become attractive, as evidenced by the number of technologies adopting it. The use of forward error correction (FEC) can improve its performance, yet a comprehensive analytical model including this aspect has been missing. In this paper, we provide a first attempt by deriving exact expressions for the packet loss rate and spectral efficiency of ALOHA with FEC, and we extend the results to time- and frequency-asynchronous ALOHA aided by FEC. We complement our study with extensive evaluations of the expressions for relevant case studies, including an IoT system served by low-Earth orbit (LEO) satellites. Non-trivial outcomes show how time- and frequency-asynchronous ALOHA particularly benefits from the presence of FEC and becomes competitive with ALOHA. Full article
(This article belongs to the Special Issue Massive Machine-Type Communications towards 6G)
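The paper's exact expressions are not reproduced in the abstract, but the textbook baseline is easy to state: in pure (unslotted) ALOHA with normalized load G, a packet survives only if no other transmission starts within its two-packet-duration vulnerability window, so PLR = 1 - e^(-2G). The Monte Carlo sketch below shows how FEC relaxes this, under the simplifying toy assumption that a packet decodes whenever the interfered fraction of its duration stays below the code's erasure margin:

```python
import numpy as np

rng = np.random.default_rng(0)

def plr_pure_aloha(G):
    """Textbook packet loss rate of pure ALOHA without FEC."""
    return 1 - np.exp(-2 * G)

def plr_aloha_fec_mc(G, margin=0.25, n=100_000, T=1.0):
    """Monte Carlo PLR for unslotted ALOHA where FEC recovers a packet if at
    most `margin` of its duration overlaps interferers (toy erasure model,
    not the paper's exact analysis)."""
    lost = 0
    for _ in range(n):
        k = rng.poisson(2 * G)            # arrivals in the vulnerability window
        starts = rng.uniform(-T, T, k)    # interferer starts relative to ours
        # Overlap of each interferer [s, s+T] with our packet [0, T]
        overlap = np.clip(np.minimum(starts + T, T) - np.maximum(starts, 0.0), 0, T)
        # Summing overlaps (instead of merging intervals) is a pessimistic bound
        if overlap.sum() > margin * T:
            lost += 1
    return lost / n
```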

15 pages, 1627 KiB  
Article
MSPNet: Multi-Scale Strip Pooling Network for Road Extraction from Remote Sensing Images
by Shenming Qu, Huafei Zhou, Bo Zhang and Shengbin Liang
Appl. Sci. 2022, 12(8), 4068; https://doi.org/10.3390/app12084068 - 18 Apr 2022
Cited by 9 | Viewed by 2681
Abstract
Extracting roads from remote sensing images can support a range of geo-information applications. However, it is challenging due to factors such as the complex distribution of ground objects and occlusion by buildings, trees, shadows, etc. Pixel-wise classification often fails to predict road connectivity and thus produces fragmented road segments. In this paper, we propose a multi-scale strip pooling network (MSPNet) to learn the linear features of roads. Motivated by the observation that strip pooling aligns well with the long and narrow shape of roads, we develop a multi-scale strip pooling (MSP) module that utilizes pooling layers with long but narrow kernel shapes to capture multi-scale long-range context in the horizontal and vertical directions. The proposed MSP module focuses on establishing relationships along the road region to guarantee road connectivity. Considering the complex distribution of ground objects, spatial pyramid pooling is applied to enhance the learning of complex features in different sub-regions. In addition, to alleviate the imbalanced distribution of road and non-road pixels, we jointly train our deep learning model with binary cross-entropy and dice-coefficient loss functions, and we perform ablation experiments to adjust the loss contributions to suit the road extraction task. Comparative experiments on the popular DeepGlobe benchmark demonstrate that our proposed MSPNet establishes new competitive results in both IoU and F1-score. Full article
(This article belongs to the Topic Machine and Deep Learning)
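Strip pooling, i.e., pooling over windows one pixel wide but spanning a full row or column, is the heart of the MSP module and is compact in PyTorch. The sketch below follows the general strip pooling formulation; MSPNet's multi-scale arrangement and exact fusion are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Pool along full rows and full columns, then fuse the two directions
    into a spatial gate. Single-scale sketch of the idea behind MSP."""

    def __init__(self, channels):
        super().__init__()
        self.conv_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        _, _, h, w = x.shape
        # Row statistics: pool each row to width 1 -> (N, C, H, 1)
        row = self.conv_v(F.adaptive_avg_pool2d(x, (h, 1)))
        # Column statistics: pool each column to height 1 -> (N, C, 1, W)
        col = self.conv_h(F.adaptive_avg_pool2d(x, (1, w)))
        # Broadcasting row + col expands both back to (N, C, H, W)
        attn = torch.sigmoid(self.fuse(row + col))
        return x * attn   # gate the input with long-range strip context
```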

17 pages, 256 KiB  
Article
Decoupling Analysis of China’s Product Sector Output and Its Embodied Carbon Emissions—An Empirical Study Based on Non-Competitive I-O and Tapio Decoupling Model
by Jianbo Hu, Shanshan Gui and Wei Zhang
Sustainability 2017, 9(5), 815; https://doi.org/10.3390/su9050815 - 15 May 2017
Cited by 16 | Viewed by 4411
Abstract
This paper uses the non-competitive I-O model and the Tapio decoupling model to comprehensively analyze the decoupling relationship between the output of China's product sector and its embodied carbon emissions under trade openness, using Chinese input-output data for 2002, 2005, 2007, 2010, and 2012. This approach helps identify the direct mechanism behind China's increased carbon emissions from a micro perspective and provides a new angle for subsequent studies of the low-carbon economy. The empirical results are as follows: (1) From an overall perspective, the decoupling elasticity between the output of the product sector and its embodied carbon emissions decreased. Output and embodied carbon emissions showed a growth link from 2002 to 2005 and a weak decoupling relationship for the rest of the study period. (2) Among the 28 industries in the product sector, output growth in more and more industries was no longer accompanied by large CO2 emissions; the number of industries with strong decoupling between output and embodied carbon emissions increased. (3) From the perspective of the three broad industries, output and embodied carbon emissions in the secondary and tertiary industries exhibited a growth link only from 2002 to 2005; all three industries showed weak or strong decoupling for the rest of the study period. Based on this analysis, the paper suggests reducing the carbon emissions of China's product sector mainly by building ecologically sound low-carbon agriculture, a low-carbon circular industrial system, and an intensive, efficient service industry. Full article
(This article belongs to the Section Energy Sustainability)
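Both building blocks in the title reduce to short formulas: under a non-competitive I-O model, embodied carbon intensity comes from the Leontief inverse of the domestic coefficient matrix (imported intermediate inputs are excluded from A^d), and decoupling status from the Tapio elasticity e = (%ΔC)/(%ΔQ) with the customary 0.8 and 1.2 thresholds. A sketch for the output-growth case the paper analyzes:

```python
import numpy as np

def embodied_intensity(f_direct, A_domestic):
    """Total embodied carbon intensity under a non-competitive I-O model:
    f (I - A^d)^-1, where A^d is the domestic input coefficient matrix."""
    n = A_domestic.shape[0]
    return f_direct @ np.linalg.inv(np.eye(n) - A_domestic)

def tapio_status(c0, c1, q0, q1):
    """Classify decoupling of emissions C from output Q between two years.
    Growth case only (q1 > q0); 0.8 / 1.2 are Tapio's standard thresholds."""
    e = ((c1 - c0) / c0) / ((q1 - q0) / q0)
    if e < 0:
        return e, "strong decoupling"
    if e < 0.8:
        return e, "weak decoupling"
    if e < 1.2:
        return e, "expansive coupling (growth link)"
    return e, "expansive negative decoupling"

# Example: emissions up 2% while output grows 10% -> weak decoupling
# tapio_status(100.0, 102.0, 50.0, 55.0) -> (0.2, "weak decoupling")
```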