Search Results (10)

Search Parameters:
Keywords = multitask cascaded CNN

27 pages, 16570 KB  
Article
Dual-Region Encryption Model Based on a 3D-MNFC Chaotic System and Logistic Map
by Jingyan Li, Yan Niu, Dan Yu, Yiling Wang, Jiaqi Huang and Mingliang Dou
Entropy 2026, 28(2), 132; https://doi.org/10.3390/e28020132 - 23 Jan 2026
Viewed by 141
Abstract
Facial information carries key personal privacy, and it is crucial to ensure its security through encryption. Traditional encryption for portrait images typically processes the entire image, even though most regions contain no sensitive facial information; this approach is notably inefficient and imposes unnecessary computational burdens. To address this inefficiency while maintaining security, we propose a novel dual-region encryption model for portrait images. First, a Multi-task Cascaded Convolutional Network (MTCNN) is adopted to efficiently segment portrait images into two regions: facial and non-facial. Subsequently, given the high sensitivity of facial regions, a robust encryption scheme is designed by integrating a CNN-based key generator, the proposed three-dimensional Multi-module Nonlinear Feedback-coupled Chaotic System (3D-MNFC), DNA encoding, and bit reversal. The 3D-MNFC, which incorporates time-varying parameters, nonlinear terms, state-feedback terms, and coupling mechanisms, is shown to exhibit excellent chaotic performance. For non-facial regions, the Logistic map combined with XOR operations is used to balance efficiency and basic security. Finally, the encrypted image is obtained by restoring the two ciphertext images to their original positions. Comprehensive security analyses confirm the strong performance of the regional model: a large key space (2^536), near-ideal information entropy (7.9995), and NPCR and UACI values of 99.6055% and 33.4599%, respectively. Notably, the model is verified to improve efficiency by at least 37.82%.
(This article belongs to the Section Multidisciplinary Applications)
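For the non-facial branch, the abstract describes a Logistic map paired with XOR. A minimal sketch of that idea (the seed `x0` and parameter `r` below are illustrative choices, not values from the paper):

```python
import numpy as np

def logistic_keystream(x0, r, n):
    """Iterate the logistic map x <- r*x*(1-x) and quantize each state to a byte."""
    x = x0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

def xor_region(region, x0=0.61803, r=3.99):
    """XOR a uint8 image region with the chaotic keystream.
    The same call decrypts, since XOR is an involution."""
    flat = region.ravel()
    ks = logistic_keystream(x0, r, flat.size)
    return (flat ^ ks).reshape(region.shape)

region = np.arange(16, dtype=np.uint8).reshape(4, 4)
cipher = xor_region(region)
assert np.array_equal(xor_region(cipher), region)  # round trip recovers plaintext
```

With r near 4 the map is chaotic, so tiny key (seed) changes yield a completely different keystream; the facial branch adds the heavier 3D-MNFC/DNA machinery on top of this basic scheme.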

33 pages, 2435 KB  
Article
Multi-Task Learning for Ocean-Front Detection and Evolutionary Trend Recognition
by Qi He, Anqi Huang, Lijia Geng, Wei Zhao and Yanling Du
Remote Sens. 2025, 17(23), 3862; https://doi.org/10.3390/rs17233862 - 28 Nov 2025
Viewed by 401
Abstract
Ocean fronts are central to upper-ocean dynamics and ecosystem processes, yet recognizing their evolutionary trends from satellite data remains challenging. We present a 3D U-Net-based multi-task framework that jointly performs ocean-front detection (OFD) and ocean-front evolutionary trend recognition (OFETR) from sea surface temperature gradient heatmaps. Instead of cascading OFD and OFETR in separate stages that pass OFD outputs downstream and can amplify upstream errors, the proposed model shares 3D spatiotemporal features and is trained end-to-end. We construct the Zhejiang–Fujian Coastal Front Mask (ZFCFM) and Evolutionary Trend (ZFCFET) datasets from ESA SST CCI L4 products for 2002–2021 and use them to evaluate the framework against 2D CNN baselines and traditional methods. Multi-task learning improves OFETR compared with single-task training while keeping OFD performance comparable, and the unified design reduces parameter count and daily computational cost. The model outputs daily point-level trend labels aligned with the dataset's temporal resolution, indicating that end-to-end multi-task learning can mitigate error propagation and provide temporally resolved estimates.
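The end-to-end design amounts to one objective driving both heads through the shared encoder. A toy numpy sketch (the 0.5 task weight and the specific loss forms are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for the front-detection (OFD) mask head."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def ce(logits, labels):
    """Softmax cross-entropy for the per-point trend (OFETR) head."""
    z = logits - logits.max(axis=-1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -np.mean(logp[np.arange(len(labels)), labels])

def multitask_loss(mask_pred, mask_gt, trend_logits, trend_gt, w=0.5):
    """Joint objective: both heads backpropagate into the shared 3D encoder,
    so OFETR never consumes a hard, error-prone OFD output."""
    return w * bce(mask_pred, mask_gt) + (1 - w) * ce(trend_logits, trend_gt)

loss = multitask_loss(
    np.array([0.9, 0.1]), np.array([1.0, 0.0]),          # OFD: pred vs. mask
    np.array([[2.0, 0.0], [0.0, 2.0]]), np.array([0, 1]),  # OFETR: logits vs. labels
)
assert loss > 0
```

The contrast with a cascade is that here the trend head sees shared features, not a thresholded front mask, which is how error propagation is avoided.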

22 pages, 1269 KB  
Article
LightFakeDetect: A Lightweight Model for Deepfake Detection in Videos That Focuses on Facial Regions
by Sarab AlMuhaideb, Hessa Alshaya, Layan Almutairi, Danah Alomran and Sarah Turki Alhamed
Mathematics 2025, 13(19), 3088; https://doi.org/10.3390/math13193088 - 25 Sep 2025
Cited by 2 | Viewed by 5308
Abstract
In recent years, the proliferation of forged videos, known as deepfakes, has escalated significantly, primarily due to advancements in technologies such as Generative Adversarial Networks (GANs), diffusion models, and Vision Language Models (VLMs). These deepfakes present substantial risks, threatening political stability, facilitating celebrity impersonation, and enabling tampering with evidence. As the sophistication of deepfake technology increases, detecting these manipulated videos becomes increasingly challenging. Most of the existing deepfake detection methods use Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or Vision Transformers (ViTs), achieving strong accuracy but exhibiting high computational demands. This highlights the need for a lightweight yet effective pipeline for real-time and resource-limited scenarios. This study introduces a lightweight deep learning model for deepfake detection in order to address this emerging threat. The model incorporates three integral components: MobileNet for feature extraction, a Convolutional Block Attention Module (CBAM) for feature enhancement, and a Gated Recurrent Unit (GRU) for temporal analysis. Additionally, a pre-trained Multi-Task Cascaded Convolutional Network (MTCNN) is utilized for face detection and cropping. The model is evaluated using the Deepfake Detection Challenge (DFDC) and Celeb-DF v2 datasets, demonstrating impressive performance, with 98.2% accuracy and a 99.0% F1-score on Celeb-DF v2 and 95.0% accuracy and a 97.2% F1-score on DFDC, achieving a commendable balance between simplicity and effectiveness.
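CBAM's channel-attention step, the "feature enhancement" component above, can be sketched in a few lines (random weights stand in for the learned MLP, and CBAM's second stage, spatial attention, is omitted):

```python
import numpy as np

def channel_attention(feat, reduction=2):
    """CBAM-style channel attention (sketch): squeeze the spatial dims with
    average- and max-pooling, pass both summaries through a shared two-layer
    MLP, and gate the feature map with a sigmoid of their sum."""
    c, h, w = feat.shape
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # squeeze layer
    w2 = rng.standard_normal((c, c // reduction)) * 0.1  # excite layer

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared MLP with ReLU

    avg = feat.mean(axis=(1, 2))             # (c,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))               # (c,) max-pooled descriptor
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # per-channel sigmoid
    return feat * gate[:, None, None]        # reweight channels

x = np.ones((4, 8, 8))
y = channel_attention(x)
assert y.shape == x.shape
```

In the full pipeline this gate sits between MobileNet's feature maps and the GRU, letting the model emphasize channels that carry manipulation artifacts.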

27 pages, 7624 KB  
Article
A Multi-Task Learning Framework with Enhanced Cross-Level Semantic Consistency for Multi-Level Land Cover Classification
by Shilin Tao, Haoyu Fu, Ruiqi Yang and Leiguang Wang
Remote Sens. 2025, 17(14), 2442; https://doi.org/10.3390/rs17142442 - 14 Jul 2025
Viewed by 1489
Abstract
The multi-scale characteristics of remote sensing imagery have an inherent correspondence with the hierarchical structure of land cover classification systems, providing a theoretical foundation for multi-level land cover classification. However, most existing methods treat classification tasks at different semantic levels as independent processes, overlooking the semantic relationships among these levels, which leads to semantic inconsistencies and structural conflicts in classification results. We address this issue with a deep multi-task learning (MTL) framework, named MTL-SCH, which enables collaborative classification across multiple semantic levels. MTL-SCH employs a shared encoder combined with a feature cascade mechanism to promote information sharing and collaborative optimization between two levels. A hierarchical loss function is also embedded that explicitly models the semantic dependencies between levels, enhancing semantic consistency across them. Two new evaluation metrics, Semantic Alignment Deviation (SAD) and Enhancing Semantic Alignment Deviation (ESAD), are proposed to quantify the improvement of MTL-SCH in semantic consistency. In the experiments, MTL-SCH is applied to different network backbones, including CNN, Transformer, and hybrid CNN-Transformer models. The results indicate that MTL-SCH improves classification accuracy in both coarse- and fine-level segmentation tasks, significantly enhancing semantic consistency across levels and outperforming traditional flat segmentation methods.
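The core hierarchical-loss idea, tying fine predictions to their coarse parents, can be sketched with a toy two-level hierarchy (the 4-to-2 class mapping and the KL-divergence form are assumptions for illustration; the paper's exact loss may differ):

```python
import numpy as np

# Assumed toy hierarchy: fine classes 0-3 roll up to coarse classes 0-1.
FINE_TO_COARSE = np.array([0, 0, 1, 1])

def hierarchical_penalty(fine_probs, coarse_probs, eps=1e-7):
    """Cross-level consistency term (sketch): sum each fine-class probability
    into its parent coarse class, then penalize divergence between the
    aggregated distribution and the coarse head's own prediction."""
    agg = np.zeros_like(coarse_probs)
    for f, c in enumerate(FINE_TO_COARSE):
        agg[..., c] += fine_probs[..., f]
    # KL(coarse || aggregated fine): zero when the two levels agree.
    return np.sum(coarse_probs * (np.log(coarse_probs + eps) - np.log(agg + eps)))

consistent = hierarchical_penalty(np.array([0.5, 0.5, 0.0, 0.0]),
                                  np.array([1.0, 0.0]))
conflicting = hierarchical_penalty(np.array([0.5, 0.5, 0.0, 0.0]),
                                   np.array([0.0, 1.0]))
assert consistent < conflicting
```

Added to the per-level classification losses, a term like this is what pushes the two heads toward structurally compatible label maps.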

41 pages, 1802 KB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Cited by 14 | Viewed by 9042
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

14 pages, 4131 KB  
Article
Concurrent Learning Approach for Estimation of Pelvic Tilt from Anterior–Posterior Radiograph
by Ata Jodeiri, Hadi Seyedarabi, Sebelan Danishvar, Seyyed Hossein Shafiei, Jafar Ganjpour Sales, Moein Khoori, Shakiba Rahimi and Seyed Mohammad Javad Mortazavi
Bioengineering 2024, 11(2), 194; https://doi.org/10.3390/bioengineering11020194 - 17 Feb 2024
Cited by 1 | Viewed by 4500
Abstract
Accurate and reliable estimation of pelvic tilt is one of the essential pre-planning factors for total hip arthroplasty, helping to prevent common post-operative complications such as implant impingement and dislocation. Inspired by the latest advances in deep learning-based systems, we present an accurate method for estimating the functional pelvic tilt (PT) from a standing anterior–posterior (AP) radiograph. We introduce an encoder–decoder-style network based on a concurrent learning approach called VGG-UNET (VGG embedded in U-NET), in which the deep fully convolutional network VGG forms the encoder of the U-NET image segmentation network. In the bottleneck of the VGG-UNET, in addition to the decoder path, a second path of light-weight convolutional and fully connected layers combines all feature maps from the final convolution layer of VGG to regress PT. In the test phase, we exclude the decoder path and consider only a single target task, i.e., PT estimation. The absolute errors obtained using VGG-UNET, VGG, and Mask R-CNN are 3.04 ± 2.49, 3.92 ± 2.92, and 4.97 ± 3.87, respectively; VGG-UNET thus yields a more accurate prediction with a lower standard deviation (STD). Our experimental results demonstrate that the proposed multi-task network performs significantly better than the best reported results based on cascaded networks.
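The concurrent-learning setup described above, an auxiliary segmentation decoder at training time and a regression-only path at test time, can be sketched as follows (the Dice/MSE loss combination and the weight `alpha` are assumptions for illustration):

```python
import numpy as np

def train_loss(seg_pred, seg_gt, pt_pred, pt_gt, alpha=1.0):
    """Concurrent-learning objective (sketch): the segmentation decoder acts
    as an auxiliary task shaping the shared VGG encoder, while the
    light-weight bottleneck head regresses pelvic tilt (PT)."""
    dice = 1 - (2 * (seg_pred * seg_gt).sum() + 1) / (seg_pred.sum() + seg_gt.sum() + 1)
    mse = np.mean((pt_pred - pt_gt) ** 2)
    return dice + alpha * mse

def infer_pt(pt_head_output):
    """At test time the decoder path is dropped; only the PT estimate is kept."""
    return pt_head_output

seg = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = train_loss(seg, seg, np.array([12.0]), np.array([10.0]))
assert loss > 0  # perfect segmentation, residual PT error
```

The design choice is that the decoder contributes gradients, not outputs: the PT head never depends on the segmentation at inference, which is why it can be removed without cost.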

22 pages, 2324 KB  
Article
Improving Misfire Fault Diagnosis with Cascading Architectures via Acoustic Vehicle Characterization
by Adam M. Terwilliger and Joshua E. Siegel
Sensors 2022, 22(20), 7736; https://doi.org/10.3390/s22207736 - 12 Oct 2022
Cited by 10 | Viewed by 3148
Abstract
In a world dependent on road-based transportation, it is essential to understand automobiles. We propose an acoustic road vehicle characterization system as an integrated approach for using sound captured by mobile devices to enhance transparency and understanding of vehicles and their condition for non-expert users. We develop and implement novel deep learning cascading architectures, which we define as conditional, multi-level networks that process raw audio to extract highly granular insights for vehicle understanding. To showcase the viability of cascading architectures, we build a multi-task convolutional neural network that predicts and cascades vehicle attributes to enhance misfire fault detection. We train and test these models on a synthesized dataset reflecting more than 40 hours of augmented audio. By cascading fuel type, engine configuration, cylinder count, and aspiration type attributes, our cascading CNN achieves 87.0% test-set accuracy on misfire fault detection, a margin of 8.0% and 1.7% over naïve and parallel CNN baselines, respectively. We explore experimental studies focused on acoustic features, data augmentation, and data reliability. Finally, we conclude with a discussion of broader implications, future directions, and application areas for this work.
(This article belongs to the Special Issue Intelligent Systems for Fault Diagnosis and Prognosis)
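The "cascading" mechanism can be pictured as conditioning the misfire head on the upstream attribute predictions; a minimal sketch (feature and attribute dimensions here are arbitrary illustrations, not the paper's):

```python
import numpy as np

def cascade_inputs(audio_feat, fuel_p, config_p, cyl_p, asp_p):
    """Cascading conditioning (sketch): the softmax outputs of the upstream
    attribute heads (fuel type, engine configuration, cylinder count,
    aspiration) are concatenated onto the acoustic embedding before the
    final misfire classifier, so the fault decision is conditioned on them."""
    return np.concatenate([audio_feat, fuel_p, config_p, cyl_p, asp_p])

emb = np.zeros(8)                                  # acoustic embedding
x = cascade_inputs(emb,
                   np.array([0.9, 0.1]),           # fuel type
                   np.array([0.5, 0.5]),           # engine configuration
                   np.array([0.2, 0.3, 0.5]),      # cylinder count
                   np.array([0.7, 0.3]))           # aspiration
assert x.shape == (17,)
```

A parallel baseline would predict all five outputs from `emb` alone; the cascade's gain comes from the misfire head seeing what kind of engine it is listening to.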

21 pages, 10377 KB  
Article
Implementation of an Intelligent Exam Supervision System Using Deep Learning Algorithms
by Fatima Mahmood, Jehangir Arshad, Mohamed Tahar Ben Othman, Muhammad Faisal Hayat, Naeem Bhatti, Mujtaba Hussain Jaffery, Ateeq Ur Rehman and Habib Hamam
Sensors 2022, 22(17), 6389; https://doi.org/10.3390/s22176389 - 25 Aug 2022
Cited by 16 | Viewed by 8693
Abstract
Examination cheating activities such as whispering, head movements, hand movements, or hand contact are widespread, and such activities undermine the integrity of fair and unbiased examinations. The aim of this research is to develop a model to supervise and control unethical activities in real-time examinations. Exam supervision is fallible because human invigilators have limited ability and capacity to monitor students in examination centers, and these errors can be reduced with the help of an automatic invigilation system. This work presents an automated system for exam invigilation using deep learning approaches, i.e., the Faster Region-based Convolutional Neural Network (Faster RCNN). Faster RCNN is an object detection algorithm implemented here to detect suspicious activities of students during examinations based on their head movements; for student identification, MTCNN (Multi-task Cascaded Convolutional Neural Networks) is used for face detection and recognition. The training accuracy of the proposed model is 99.5% and the testing accuracy is 98.5%. The model is capable of detecting and monitoring more than 100 students in one frame during examinations. Different real-time scenarios are considered to evaluate the performance of the automatic invigilation system. The proposed invigilation model can be implemented in colleges, universities, and schools to detect and monitor suspicious student activities and thereby help prevent cheating.
(This article belongs to the Special Issue Advances in IoMT for Healthcare Systems)

17 pages, 4363 KB  
Article
A Multitask Cascading CNN with MultiScale Infrared Optical Flow Feature Fusion-Based Abnormal Crowd Behavior Monitoring UAV
by Yanhua Shao, Wenfeng Li, Hongyu Chu, Zhiyuan Chang, Xiaoqiang Zhang and Huayi Zhan
Sensors 2020, 20(19), 5550; https://doi.org/10.3390/s20195550 - 28 Sep 2020
Cited by 22 | Viewed by 4020
Abstract
Visual-based object detection and understanding is an important problem in computer vision and signal processing. Owing to their high mobility and easy deployment, unmanned aerial vehicles (UAVs) have become a flexible monitoring platform in recent years. However, visible-light-based methods are often strongly influenced by the environment, and a single type of feature derived from aerial monitoring videos is often insufficient to distinguish different abnormal crowd behaviors. To address this, we propose combining two types of features, namely a multitask cascading CNN (MC-CNN) and multiscale infrared optical flow (MIR-OF), which capture crowd density and the average speed of the crowd, respectively, to characterize the appearance of crowd behaviors. First, an infrared (IR) camera and an Nvidia Jetson TX1 were chosen as the infrared vision system. Since there are no published infrared aerial abnormal-behavior datasets, we provide a new infrared aerial dataset named the IR-flying dataset, which includes sample pictures and videos of different scenes in public areas. Second, the MC-CNN was used to estimate crowd density. Third, the MIR-OF was designed to characterize the average speed of the crowd. Finally, considering two typical abnormal crowd behaviors, crowd aggregating and crowd escaping, the experimental results show that the monitoring UAV system can detect abnormal crowd behaviors in public areas effectively.
(This article belongs to the Special Issue Sensor Fusion for Object Detection, Classification and Tracking)
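The fusion of the two feature types into a behavior decision can be sketched as a simple rule over density and speed (the thresholds and the rule itself are illustrative assumptions, not the paper's classifier):

```python
def crowd_state(density, avg_speed, d_th=0.5, v_th=1.0):
    """Fuse the two feature types (sketch): MC-CNN supplies the crowd
    density estimate, MIR-OF supplies the crowd's average speed."""
    if avg_speed > v_th:
        return "escaping"      # fast, dispersing motion dominates
    if density > d_th:
        return "aggregating"   # dense, slow-moving crowd
    return "normal"

assert crowd_state(0.8, 0.2) == "aggregating"
assert crowd_state(0.3, 2.5) == "escaping"
assert crowd_state(0.2, 0.1) == "normal"
```

The point of the two-feature design is visible even in this toy rule: neither density nor speed alone separates aggregating from escaping crowds.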

17 pages, 6878 KB  
Article
Fast Face Tracking-by-Detection Algorithm for Secure Monitoring
by Jia Su, Lihui Gao, Wei Li, Yu Xia, Ning Cao and Ruichao Wang
Appl. Sci. 2019, 9(18), 3774; https://doi.org/10.3390/app9183774 - 9 Sep 2019
Cited by 9 | Viewed by 6190
Abstract
This work proposes a fast face tracking-by-detection (FFTD) algorithm that can perform tracking, face detection, and discrimination tasks. Using the kernelized correlation filter (KCF) as the basic tracker, multitask cascaded convolutional neural networks (CNNs) are used to detect the face, and a new tracking update strategy is designed. The update strategy uses the tracking result, modified by the detector, to update the filter model. When the tracker drifts or fails, the discriminator module starts the detector to correct the tracking results, which ensures that an object that leaves the field of view can be re-acquired and tracked. Extensive experiments show that the proposed FFTD algorithm has good robustness and real-time performance in video monitoring scenes.
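The update strategy described above can be sketched as a small decision rule over the tracker and detector boxes (the 0.3 overlap gate is an assumed value, not taken from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def update_box(kcf_box, det_box, drifted, gate=0.3):
    """FFTD-style update (sketch): a detection that agrees with the KCF box
    corrects the filter model, while drift or failure re-initializes the
    tracker from the detection."""
    if drifted:
        return det_box  # discriminator flagged drift: restart from detector
    if det_box is not None and iou(kcf_box, det_box) > gate:
        return det_box  # detection refines the tracked box
    return kcf_box      # keep the correlation-filter result

assert update_box((0, 0, 10, 10), (1, 1, 11, 11), drifted=False) == (1, 1, 11, 11)
assert update_box((0, 0, 10, 10), (50, 50, 60, 60), drifted=False) == (0, 0, 10, 10)
```

Running the detector only as a corrector, rather than every frame, is what keeps the pipeline fast enough for real-time monitoring.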