Search Results (949)

Search Parameters:
Keywords = MobilenetV3

28 pages, 6624 KiB  
Article
YoloMal-XAI: Interpretable Android Malware Classification Using RGB Images and YOLO11
by Chaymae El Youssofi and Khalid Chougdali
J. Cybersecur. Priv. 2025, 5(3), 52; https://doi.org/10.3390/jcp5030052 - 1 Aug 2025
Abstract
As Android malware grows increasingly sophisticated, traditional detection methods struggle to keep pace, creating an urgent need for robust, interpretable, and real-time solutions to safeguard mobile ecosystems. This study introduces YoloMal-XAI, a novel deep learning framework that transforms Android application files into RGB image representations by mapping DEX (Dalvik Executable), Manifest.xml, and Resources.arsc files to distinct color channels. Evaluated on the CICMalDroid2020 dataset using YOLO11 pretrained classification models, YoloMal-XAI achieves 99.87% accuracy in binary classification and 99.56% in multi-class classification (Adware, Banking, Riskware, SMS, and Benign). Compared to ResNet-50, GoogLeNet, and MobileNetV2, YOLO11 offers competitive accuracy with at least 7× faster training over 100 epochs. Against YOLOv8, YOLO11 achieves comparable or superior accuracy while reducing training time by up to 3.5×. Cross-corpus validation using Drebin and CICAndMal2017 further confirms the model’s generalization capability on previously unseen malware. An ablation study highlights the value of integrating DEX, Manifest, and Resources components, with the full RGB configuration consistently delivering the best performance. Explainable AI (XAI) techniques—Grad-CAM, Grad-CAM++, Eigen-CAM, and HiRes-CAM—are employed to interpret model decisions, revealing the DEX segment as the most influential component. These results establish YoloMal-XAI as a scalable, efficient, and interpretable framework for Android malware detection, with strong potential for future deployment on resource-constrained mobile devices. Full article
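
The byte-to-channel mapping described above lends itself to a compact sketch. The following is a minimal illustration of the idea, not the authors' code; each APK component is padded or truncated to fill one color plane of a square image, and the 224-pixel side length is an assumed size:

```python
import numpy as np
from PIL import Image

def bytes_to_plane(data: bytes, side: int) -> np.ndarray:
    """Pad/truncate a byte stream and reshape it into a side x side plane."""
    buf = np.frombuffer(data, dtype=np.uint8)[: side * side]
    buf = np.pad(buf, (0, side * side - buf.size))
    return buf.reshape(side, side)

def apk_components_to_rgb(dex: bytes, manifest: bytes, arsc: bytes,
                          side: int = 224) -> Image.Image:
    rgb = np.stack([bytes_to_plane(dex, side),       # R <- classes.dex
                    bytes_to_plane(manifest, side),  # G <- AndroidManifest.xml
                    bytes_to_plane(arsc, side)],     # B <- resources.arsc
                   axis=-1)
    return Image.fromarray(rgb, mode="RGB")
```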

26 pages, 1790 KiB  
Article
A Hybrid Deep Learning Model for Aromatic and Medicinal Plant Species Classification Using a Curated Leaf Image Dataset
by Shareena E. M., D. Abraham Chandy, Shemi P. M. and Alwin Poulose
AgriEngineering 2025, 7(8), 243; https://doi.org/10.3390/agriengineering7080243 - 1 Aug 2025
Abstract
In the era of smart agriculture, accurate identification of plant species is critical for effective crop management, biodiversity monitoring, and the sustainable use of medicinal resources. However, existing deep learning approaches often underperform when applied to fine-grained plant classification tasks due to the lack of domain-specific, high-quality datasets and the limited representational capacity of traditional architectures. This study addresses these challenges by introducing a novel, well-curated leaf image dataset consisting of 39 classes of medicinal and aromatic plants collected from the Aromatic and Medicinal Plant Research Station in Odakkali, Kerala, India. To overcome performance bottlenecks observed with a baseline Convolutional Neural Network (CNN) that achieved only 44.94% accuracy, we progressively enhanced model performance through a series of architectural innovations. These included the use of a pre-trained VGG16 network, data augmentation techniques, and fine-tuning of deeper convolutional layers, followed by the integration of Squeeze-and-Excitation (SE) attention blocks. Ultimately, we propose a hybrid deep learning architecture that combines VGG16 with Batch Normalization, Gated Recurrent Units (GRUs), Transformer modules, and Dilated Convolutions. This final model achieved a peak validation accuracy of 95.24%, significantly outperforming several baseline models, such as the custom CNN (44.94%), VGG-19 (59.49%), VGG-16 before augmentation (71.52%), Xception (85.44%), Inception v3 (87.97%), VGG-16 after data augmentation (89.24%), VGG-16 after fine-tuning (90.51%), MobileNetV2 (93.67%), and VGG16 with SE block (94.94%). These results demonstrate superior capability in capturing both local textures and global morphological features. The proposed solution not only advances the state of the art in plant classification but also contributes a valuable dataset to the research community. Its real-world applicability spans field-based plant identification, biodiversity conservation, and precision agriculture, offering a scalable tool for automated plant recognition in complex ecological and agricultural environments. Full article
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)
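
A hedged Keras sketch of the hybrid design the abstract outlines: VGG16 features are reshaped into a token sequence and passed through a GRU and a Transformer-style self-attention block. Layer widths are illustrative assumptions, and the dilated convolutions and SE blocks of the full model are omitted:

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
x = tf.keras.layers.BatchNormalization()(base.output)        # (7, 7, 512) feature map
tokens = tf.keras.layers.Reshape((49, 512))(x)               # 49 spatial tokens
seq = tf.keras.layers.GRU(256, return_sequences=True)(tokens)
attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(seq, seq)
feat = tf.keras.layers.GlobalAveragePooling1D()(attn)
out = tf.keras.layers.Dense(39, activation="softmax")(feat)  # 39 species classes
model = tf.keras.Model(base.input, out)
```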

35 pages, 4940 KiB  
Article
A Novel Lightweight Facial Expression Recognition Network Based on Deep Shallow Network Fusion and Attention Mechanism
by Qiaohe Yang, Yueshun He, Hongmao Chen, Youyong Wu and Zhihua Rao
Algorithms 2025, 18(8), 473; https://doi.org/10.3390/a18080473 - 30 Jul 2025
Abstract
Facial expression recognition (FER) is a critical research direction in artificial intelligence, which is widely used in intelligent interaction, medical diagnosis, security monitoring, and other domains. These applications highlight its considerable practical value and social significance. Facial expression recognition models often need to run efficiently on mobile or edge devices, so research on lightweight facial expression recognition is particularly important. However, the feature extraction and classification methods in most current lightweight convolutional neural network expression recognition algorithms are not specifically or fully optimized for the characteristics of facial expression images and fail to make full use of the feature information these images contain. To address the lack of facial expression recognition models that are both lightweight and effectively optimized for expression-specific feature extraction, this study proposes a novel network design tailored to the characteristics of facial expressions. In this paper, we take the backbone architecture of the MobileNet V2 network as a reference and design LightExNet, a lightweight convolutional neural network based on the fusion of deep and shallow layers, an attention mechanism, and a joint loss function matched to the characteristics of facial expression features. In the network architecture of LightExNet, firstly, deep and shallow features are fused in order to fully extract the shallow features of the original image, reduce the loss of information, alleviate the problem of vanishing gradients as the number of convolutional layers increases, and achieve multi-scale feature fusion. The MobileNet V2 architecture has also been streamlined to seamlessly integrate the deep and shallow networks. Secondly, drawing on the characteristics of facial expression features, a new channel and spatial attention mechanism is proposed to encode as much feature information from the different expression regions as possible, thereby effectively improving the accuracy of expression recognition. Finally, an improved center loss function is superimposed to further improve the accuracy of facial expression classification, and corresponding measures are taken to significantly reduce the computational cost of the joint loss function. In this paper, LightExNet is tested on three mainstream facial expression datasets: Fer2013, CK+, and RAF-DB. The experimental results show that LightExNet has 3.27M parameters and 298.27M FLOPs, with accuracies of 69.17%, 97.37%, and 85.97% on the three datasets, respectively. The comprehensive performance of LightExNet is better than current mainstream lightweight expression recognition algorithms such as MobileNet V2, IE-DBN, Self-Cure Net, Improved MobileViT, MFN, Ada-CM, Parallel CNN (Convolutional Neural Network), etc. Experimental results confirm that LightExNet effectively improves recognition accuracy and computational efficiency while reducing energy consumption and enhancing deployment flexibility. These advantages underscore its strong potential for real-world applications in lightweight facial expression recognition. Full article
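
For reference, a minimal PyTorch sketch of the standard center-loss term that the joint loss above builds on; the authors' specific improvements and computation-saving measures are not reproduced here:

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """L_c = 1/2 * mean ||x_i - c_{y_i}||^2; total loss = cross-entropy + lambda * L_c."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        diff = feats - self.centers[labels]   # pull each feature toward its class center
        return 0.5 * diff.pow(2).sum(dim=1).mean()
```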

24 pages, 2159 KiB  
Article
Cross-Domain Transfer Learning Architecture for Microcalcification Cluster Detection Using the MEXBreast Multiresolution Mammography Dataset
by Ricardo Salvador Luna Lozoya, Humberto de Jesús Ochoa Domínguez, Juan Humberto Sossa Azuela, Vianey Guadalupe Cruz Sánchez, Osslan Osiris Vergara Villegas and Karina Núñez Barragán
Mathematics 2025, 13(15), 2422; https://doi.org/10.3390/math13152422 - 28 Jul 2025
Abstract
Microcalcification clusters (MCCs) are key indicators of breast cancer, with studies showing that approximately 50% of mammograms with MCCs confirm a cancer diagnosis. Early detection is critical, as it ensures a five-year survival rate of up to 99%. However, MCC detection remains challenging due to their features, such as small size, texture, shape, and impalpability. Convolutional neural networks (CNNs) offer a solution for MCC detection. Nevertheless, CNNs are typically trained on single-resolution images, limiting their generalizability across different image resolutions. We propose a CNN trained on digital mammograms with three common resolutions: 50, 70, and 100 μm. The architecture processes individual 1 cm² patches extracted from the mammograms as input samples and includes a MobileNetV2 backbone, followed by a flattening layer, a dense layer, and a sigmoid activation function. This architecture was trained to detect MCCs using patches extracted from the INbreast database, which has a resolution of 70 μm, and achieved an accuracy of 99.84%. We applied transfer learning (TL) and trained on 50, 70, and 100 μm resolution patches from the MEXBreast database, achieving accuracies of 98.32%, 99.27%, and 89.17%, respectively. For comparison purposes, models trained from scratch, without leveraging knowledge from the pretrained model, achieved 96.07%, 99.20%, and 83.59% accuracy for 50, 70, and 100 μm, respectively. Results demonstrate that TL improves MCC detection across resolutions by reusing pretrained knowledge. Full article
(This article belongs to the Special Issue Mathematical Methods in Artificial Intelligence for Image Processing)
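
The described stack (MobileNetV2 backbone, flatten, dense, sigmoid) is simple enough to sketch directly in Keras. The 224x224 input size is an assumption (actual patch pixel sizes depend on the 50/70/100 μm resolutions), and grayscale patches would need replication to three channels for the pretrained backbone:

```python
import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Flatten(),                       # flattening layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # dense layer + sigmoid: MCC / no MCC
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```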

21 pages, 3448 KiB  
Article
A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning
by Chenbo Shi, Shaojia Yan, Lei Wang, Changsheng Zhu, Yue Yu, Xiangteng Zang, Aiping Liu, Chun Zhang and Xiaobing Feng
Sensors 2025, 25(15), 4656; https://doi.org/10.3390/s25154656 - 27 Jul 2025
Abstract
Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch utilizes the Histogram of Oriented Gradient (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model’s interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. Additionally, it attains a real-time inference speed of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, thus effectively balancing accuracy, speed, and interpretability. Full article
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
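
A minimal PyTorch sketch of the temporal-shift idea the TSM integration relies on, following the original TSM formulation; the 1/8 channel fold is the conventional default, not necessarily the authors' setting:

```python
import torch

def temporal_shift(x: torch.Tensor, n_segment: int, fold_div: int = 8) -> torch.Tensor:
    """x: (N*T, C, H, W) features of T-frame clips; returns the same shape."""
    nt, c, h, w = x.shape
    n = nt // n_segment
    x = x.view(n, n_segment, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift 1/8 of channels back in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift 1/8 forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unshifted
    return out.view(nt, c, h, w)
```

The shift mixes information across neighboring frames before each 2D convolution, adding temporal modeling at essentially zero extra FLOPs.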

27 pages, 4682 KiB  
Article
DERIENet: A Deep Ensemble Learning Approach for High-Performance Detection of Jute Leaf Diseases
by Mst. Tanbin Yasmin Tanny, Tangina Sultana, Md. Emran Biswas, Chanchol Kumar Modok, Arjina Akter, Mohammad Shorif Uddin and Md. Delowar Hossain
Information 2025, 16(8), 638; https://doi.org/10.3390/info16080638 - 27 Jul 2025
Abstract
Jute, a vital lignocellulosic fiber crop with substantial industrial and ecological relevance, continues to suffer considerable yield and quality degradation due to pervasive foliar pathologies. Traditional diagnostic modalities reliant on manual field inspections are inherently constrained by subjectivity, diagnostic latency, and inadequate scalability across geographically distributed agrarian systems. To transcend these limitations, we propose DERIENet, a robust and scalable classification approach within a deep ensemble learning framework. It is meticulously engineered by integrating three high-performing convolutional neural networks—ResNet50, InceptionV3, and EfficientNetB0—along with regularization, batch normalization, and dropout strategies, to accurately classify jute leaves into Cercospora Leaf Spot, Golden Mosaic Virus, and healthy classes. A key methodological contribution is the design of a novel augmentation pipeline, termed Geometric Localized Occlusion and Adaptive Rescaling (GLOAR), which dynamically modulates photometric and geometric distortions based on image entropy and luminance to synthetically upscale a limited dataset (920 images) into a significantly enriched and diverse dataset of 7800 samples, thereby mitigating overfitting and enhancing domain generalizability. Empirical evaluation, utilizing a comprehensive set of performance metrics—accuracy, precision, recall, F1-score, confusion matrices, and ROC curves—demonstrates that DERIENet achieves a state-of-the-art classification accuracy of 99.89%, with macro-averaged and weighted average precision, recall, and F1-score uniformly at 99.89%, and an AUC of 1.0 across all disease categories. The reliability of the model is validated by the confusion matrix, which shows that 899 out of 900 test images were correctly identified, with only one misclassification. Comparative evaluations against ensemble baselines such as DenseNet201, MobileNetV2, and VGG16, as well as the individual base learners, demonstrate that DERIENet performs noticeably better than all baseline models. It provides a highly interpretable, deployment-ready, and computationally efficient architecture that is ideal for integration into edge or mobile platforms to facilitate in situ, real-time disease diagnostics in precision agriculture. Full article
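
A hedged Keras sketch of the three-backbone ensembling idea; head sizes, the dropout rate, and softmax averaging are assumptions about the design, and the GLOAR augmentation pipeline is not reproduced:

```python
import tensorflow as tf

def branch(base):
    """One ensemble member: backbone + regularized classification head."""
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(3, activation="softmax"),  # 3 leaf classes
    ])

apps = tf.keras.applications
inp = tf.keras.Input(shape=(224, 224, 3))
bases = [apps.ResNet50(include_top=False, weights="imagenet"),
         apps.InceptionV3(include_top=False, weights="imagenet"),
         apps.EfficientNetB0(include_top=False, weights="imagenet")]
outputs = [branch(b)(inp) for b in bases]                 # each backbone votes
ensemble = tf.keras.Model(inp, tf.keras.layers.Average()(outputs))
```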

23 pages, 4467 KiB  
Article
Research on Indoor Object Detection and Scene Recognition Algorithm Based on Apriori Algorithm and Mobile-EFSSD Model
by Wenda Zheng, Yibo Ai and Weidong Zhang
Mathematics 2025, 13(15), 2408; https://doi.org/10.3390/math13152408 - 26 Jul 2025
Abstract
With the advancement of computer vision and image processing technologies, scene recognition has gradually become a research hotspot. However, in practical applications, it is necessary to detect the categories and locations of objects in images while recognizing scenes. To address these issues, this paper proposes an indoor object detection and scene recognition algorithm based on the Apriori algorithm and the Mobile-EFSSD model, which can simultaneously obtain object category and location information while recognizing scenes. The specific research contents are as follows: (1) To address complex indoor scenes and occlusion, this paper proposes an improved Mobile-EFSSD object detection algorithm. An optimized MobileNetV3 with ECA attention is used as the backbone. Multi-scale feature maps are fused via FPN. The localization loss includes a hyperparameter, and focal loss replaces confidence loss. Experiments show that the method achieves stable performance, effectively detects occluded objects, and accurately extracts category and location information. (2) To improve classification stability in indoor scene recognition, this paper proposes a naive Bayes-based method. Object detection results are converted into text features, and the Apriori algorithm extracts object associations. Prior probabilities are calculated and fed into a naive Bayes classifier for scene recognition. Evaluated using the ADE20K dataset, the method outperforms existing approaches by achieving a better accuracy–speed trade-off and enhanced classification stability. The proposed algorithm is applied to indoor scene images, enabling the simultaneous acquisition of object categories and location information while recognizing scenes. Moreover, the algorithm has a simple structure, with an object detection average precision of 82.7% and a scene recognition average accuracy of 95.23%, making it suitable for practical detection requirements. Full article
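
The scene-recognition stage can be illustrated with a toy scikit-learn sketch: detected object labels become bag-of-words documents scored by a naive Bayes classifier. The Apriori association mining and prior computation are omitted, and the data here is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Detected object labels per image, flattened into "documents" (toy data).
scenes = ["bed lamp wardrobe", "sofa tv table", "desk chair monitor"]
labels = ["bedroom", "living_room", "office"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(scenes), labels)
print(clf.predict(vec.transform(["tv sofa cushion"])))  # -> ['living_room']
```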

24 pages, 1990 KiB  
Article
Evaluating Skin Tone Fairness in Convolutional Neural Networks for the Classification of Diabetic Foot Ulcers
by Sara Seabra Reis, Luis Pinto-Coelho, Maria Carolina Sousa, Mariana Neto, Marta Silva and Miguela Sequeira
Appl. Sci. 2025, 15(15), 8321; https://doi.org/10.3390/app15158321 - 26 Jul 2025
Abstract
The present paper investigates the application of convolutional neural networks (CNNs) for the classification of diabetic foot ulcers, using VGG16, VGG19 and MobileNetV2 architectures. The primary objective is to develop and compare deep learning models capable of accurately identifying ulcerated regions in clinical images of diabetic feet, thereby aiding in the prevention and effective treatment of foot ulcers. A comprehensive study was conducted using an annotated dataset of medical images, evaluating the performance of the models in terms of accuracy, precision, recall and F1-score. VGG19 achieved the highest accuracy at 97%, demonstrating superior ability to focus activations on relevant lesion areas in complex images. MobileNetV2, while slightly less accurate, excelled in computational efficiency, making it a suitable choice for mobile devices and environments with hardware constraints. The study also highlights the limitations of each architecture, such as increased risk of overfitting in deeper models and the lower capability of MobileNetV2 to capture fine clinical details. These findings suggest that CNNs hold significant potential in computer-aided clinical diagnosis, particularly in the early and precise detection of diabetic foot ulcers, where timely intervention is crucial to prevent amputations. Full article
(This article belongs to the Special Issue Advances and Applications of Machine Learning for Bioinformatics)

22 pages, 1359 KiB  
Article
Fall Detection Using Federated Lightweight CNN Models: A Comparison of Decentralized vs. Centralized Learning
by Qasim Mahdi Haref, Jun Long and Zhan Yang
Appl. Sci. 2025, 15(15), 8315; https://doi.org/10.3390/app15158315 - 25 Jul 2025
Abstract
Fall detection is a critical task in healthcare monitoring systems, especially for elderly populations, for whom timely intervention can significantly reduce morbidity and mortality. This study proposes a privacy-preserving and scalable fall-detection framework that integrates federated learning (FL) with transfer learning (TL) to train deep learning models across decentralized data sources without compromising user privacy. The pipeline begins with data acquisition, in which annotated video-based fall-detection datasets formatted in YOLO are used to extract image crops of human subjects. These images are then preprocessed, resized, normalized, and relabeled into binary classes (fall vs. non-fall). A stratified 80/10/10 split ensures balanced training, validation, and testing. To simulate real-world federated environments, the training data is partitioned across multiple clients, each performing local training using pretrained CNN models including MobileNetV2, VGG16, EfficientNetB0, and ResNet50. Two FL topologies are implemented: a centralized server-coordinated scheme and a ring-based decentralized topology. During each round, only model weights are shared, and federated averaging (FedAvg) is applied for global aggregation. The models were trained using three random seeds to ensure result robustness and stability across varying data partitions. Among all configurations, decentralized MobileNetV2 achieved the best results, with a mean test accuracy of 0.9927, F1-score of 0.9917, and average training time of 111.17 s per round. These findings highlight the model’s strong generalization, low computational burden, and suitability for edge deployment. Future work will extend evaluation to external datasets and address issues such as client drift and adversarial robustness in federated environments. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
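
A minimal sketch of the FedAvg aggregation step applied in each round, assuming PyTorch state dicts with matching keys and weighting by local sample counts:

```python
import copy
from typing import Dict, List
import torch

def fedavg(client_states: List[Dict[str, torch.Tensor]],
           client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model weights (FedAvg)."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg
```

In the centralized topology a server runs this step; in the ring topology each client would aggregate the states it receives from its neighbors.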

21 pages, 4863 KiB  
Article
Detection Model for Cotton Picker Fire Recognition Based on Lightweight Improved YOLOv11
by Zhai Shi, Fangwei Wu, Changjie Han, Dongdong Song and Yi Wu
Agriculture 2025, 15(15), 1608; https://doi.org/10.3390/agriculture15151608 - 25 Jul 2025
Abstract
In response to the limited research on fire detection in cotton pickers and the issue of low detection accuracy in visual inspection, this paper proposes a computer vision-based detection method. The method is optimized according to the structural characteristics of cotton pickers, and a lightweight improved YOLOv11 algorithm is designed for cotton fire detection in cotton pickers. The backbone of the model is replaced with the MobileNetV2 network to achieve effective model lightweighting. In addition, the convolutional layers in the original C3k2 block are optimized using partial convolutions to reduce computational redundancy and improve inference efficiency. Furthermore, a visual attention mechanism named CBAM-ECA (Convolutional Block Attention Module-Efficient Channel Attention) is designed to suit the complex working conditions of cotton pickers. This mechanism aims to enhance the model’s feature extraction capability under challenging environmental conditions, thereby improving overall detection accuracy. To further improve localization performance and accelerate convergence, the loss function is also modified. These improvements enable the model to achieve higher precision in fire detection while ensuring fast and accurate localization. Experimental results demonstrate that the improved model reduces the number of parameters by 38%, increases the frame processing speed (FPS) by 13.2%, and decreases the computational complexity (GFLOPs) by 42.8%, compared to the original model. The detection accuracy for flaming combustion, smoldering combustion, and overall detection is improved by 1.4%, 3%, and 1.9%, respectively, with an increase of 2.4% in mAP (mean average precision). Compared to other models—YOLOv3-tiny, YOLOv5, YOLOv8, and YOLOv10—the proposed method achieves higher detection accuracy by 5.9%, 7%, 5.9%, and 5.3%, respectively, and shows improvements in mAP by 5.4%, 5%, 4.8%, and 6.3%. The improved detection algorithm maintains high accuracy while achieving faster inference speed and fewer model parameters. These improvements lay a solid foundation for fire prevention and suppression in cotton collection boxes on cotton pickers. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
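
A hedged PyTorch sketch of a partial convolution of the kind referenced above: only a fraction of the channels is convolved and the rest pass through unchanged, which is what cuts the redundant computation. The 1/4 split is an assumed default, not necessarily the paper's setting:

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve only the first 1/div of the channels; pass the rest through."""
    def __init__(self, channels: int, div: int = 4):
        super().__init__()
        self.split = channels // div
        self.conv = nn.Conv2d(self.split, self.split, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x[:, :self.split], x[:, self.split:]
        return torch.cat([self.conv(head), tail], dim=1)
```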

15 pages, 2123 KiB  
Article
Multi-Class Visual Cyberbullying Detection Using Deep Neural Networks and the CVID Dataset
by Muhammad Asad Arshed, Zunera Samreen, Arslan Ahmad, Laiba Amjad, Hasnain Muavia, Christine Dewi and Muhammad Kabir
Information 2025, 16(8), 630; https://doi.org/10.3390/info16080630 - 24 Jul 2025
Abstract
In an era where online interactions increasingly shape social dynamics, the pervasive issue of cyberbullying poses a significant threat to the well-being of individuals, particularly among vulnerable groups. Despite extensive research on text-based cyberbullying detection, the rise of visual content on social media platforms necessitates new approaches to address cyberbullying using images. This domain has been largely overlooked. In this paper, we present a novel dataset specifically designed for the detection of visual cyberbullying, encompassing four distinct classes: abuse, curse, discourage, and threat. The initial dataset, the cyberbullying visual indicators dataset (CVID), comprised 664 samples for training and validation, expanded through data augmentation techniques to ensure balanced and accurate results across all classes. We analyzed this dataset using several advanced deep learning models, including VGG16, VGG19, MobileNetV2, and Vision Transformer. The proposed model, based on DenseNet201, achieved the highest test accuracy of 99%, demonstrating its efficacy in identifying the visual cues associated with cyberbullying. To demonstrate the proposed model's generalizability, stratified 5-fold cross-validation was also performed, and the model achieved an average test accuracy of 99%. This work introduces a dataset and highlights the potential of leveraging deep learning models to address the multifaceted challenges of detecting cyberbullying in visual content. Full article
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)
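
A minimal scikit-learn sketch of the stratified 5-fold protocol mentioned above, with placeholder data; the DenseNet201 training loop would run inside each fold:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 8)             # placeholder features (images in practice)
y = np.random.randint(0, 4, size=100)  # abuse / curse / discourage / threat

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # train the DenseNet201-based model on X[train_idx], evaluate on X[test_idx]
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```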

17 pages, 3823 KiB  
Article
Lightweight UAV-Based System for Early Fire-Risk Identification in Wild Forests
by Akmalbek Abdusalomov, Sabina Umirzakova, Alpamis Kutlimuratov, Dilshod Mirzaev, Adilbek Dauletov, Tulkin Botirov, Madina Zakirova, Mukhriddin Mukhiddinov and Young Im Cho
Fire 2025, 8(8), 288; https://doi.org/10.3390/fire8080288 - 23 Jul 2025
Abstract
The escalating occurrence and impacts of wildfires threaten the public, economies, and global ecosystems. Physiologically declining or dead trees account for a large share of ignitions because they ignite more readily and have lower moisture content. Preventing wildfires therefore requires removing hazardous vegetation, and identifying that vegetation early. This work proposes a real-time fire-risk tree detection framework using UAV images, based on lightweight object detection. The model uses a MobileNetV3-Small backbone, optimized for edge deployment, combined with an SSD head. This configuration results in a highly optimized and fast UAV-based inference pipeline. The dataset used in this study comprises over 3000 annotated RGB UAV images of trees in healthy, partially dead, and fully dead conditions, collected from mixed real-world forest scenes and public drone imagery repositories. Thorough evaluation shows that the proposed model outperforms conventional SSD and recent YOLOs in Precision (94.1%), Recall (93.7%), mAP (90.7%), and F1 (91.0%) while being lightweight (8.7 MB) and fast (62.5 FPS on a Jetson Xavier NX). These findings strongly support the model's effectiveness for large-scale continuous forest monitoring to detect health degradation and mitigate wildfire risks proactively. The framework differentiates itself from other UAV-based environmental monitoring systems by treating the balance between detection accuracy, speed, and resource efficiency as a fundamental design principle. Full article
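
For orientation, the closest off-the-shelf analogue to the described detector is torchvision's SSDLite with a MobileNetV3-Large backbone (torchvision does not ship the Small variant). The sketch below is an illustrative stand-in, not the authors' model, with a three-class head plus background:

```python
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large

# 3 tree states (healthy / partially dead / fully dead) + background class
model = ssdlite320_mobilenet_v3_large(num_classes=4)
model.eval()
with torch.no_grad():
    detections = model([torch.rand(3, 320, 320)])  # boxes, labels, scores per image
```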

25 pages, 5142 KiB  
Article
Wheat Powdery Mildew Severity Classification Based on an Improved ResNet34 Model
by Meilin Li, Yufeng Guo, Wei Guo, Hongbo Qiao, Lei Shi, Yang Liu, Guang Zheng, Hui Zhang and Qiang Wang
Agriculture 2025, 15(15), 1580; https://doi.org/10.3390/agriculture15151580 - 23 Jul 2025
Abstract
Crop disease identification is a pivotal research area in smart agriculture, forming the foundation for disease mapping and targeted prevention strategies. Among the most prevalent global wheat diseases, powdery mildew—caused by fungal infection—poses a significant threat to crop yield and quality, making early and accurate detection crucial for effective management. In this study, we present QY-SE-MResNet34, a deep learning-based classification model that builds upon ResNet34 to perform multi-class classification of wheat leaf images and assess powdery mildew severity at the single-leaf level. The proposed methodology begins with dataset construction following the GB/T 17980.22-2000 national standard for powdery mildew severity grading, resulting in a curated collection of 4248 wheat leaf images at the grain-filling stage across six severity levels. To enhance model performance, we integrated transfer learning with ResNet34, leveraging pretrained weights to improve feature extraction and accelerate convergence. Further refinements included embedding a Squeeze-and-Excitation (SE) block to strengthen feature representation while maintaining computational efficiency. The model architecture was also optimized by modifying the first convolutional layer (conv1)—replacing the original 7 × 7 kernel with a 3 × 3 kernel, adjusting the stride to 1, and setting padding to 1—to better capture fine-grained leaf textures and edge features. Subsequently, the optimal training strategy was determined through hyperparameter tuning experiments, and GrabCut-based background processing along with data augmentation were introduced to enhance model robustness. In addition, interpretability techniques such as channel masking and Grad-CAM were employed to visualize the model's decision-making process. Experimental validation demonstrated that QY-SE-MResNet34 achieved 89% classification accuracy, outperforming established models such as ResNet50, VGG16, and MobileNetV2 and surpassing the original ResNet34 by 11 percentage points. This study delivers a high-performance solution for single-leaf wheat powdery mildew severity assessment, offering practical value for intelligent disease monitoring and early warning systems in precision agriculture. Full article
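
The conv1 modification is small enough to show directly. A hedged PyTorch sketch follows; the SE-block insertions and GrabCut preprocessing are not reproduced, and the replaced conv1 starts from random initialization rather than pretrained weights:

```python
import torch.nn as nn
from torchvision.models import resnet34, ResNet34_Weights

model = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
# 7x7/stride-2 stem -> 3x3/stride-1/padding-1 to keep fine leaf texture and edges
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.fc = nn.Linear(model.fc.in_features, 6)  # six severity grades
```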

18 pages, 3102 KiB  
Article
A Multicomponent Face Verification and Identification System
by Athanasios Douklias, Ioannis Zorzos, Evangelos Maltezos, Vasilis Nousis, Spyridon Nektarios Bolierakis, Lazaros Karagiannidis, Eleftherios Ouzounoglou and Angelos Amditis
Appl. Sci. 2025, 15(15), 8161; https://doi.org/10.3390/app15158161 - 22 Jul 2025
Abstract
Face recognition is a biometric technology based on the identification or verification of facial features. Automatic face recognition is an active research field in the context of computer vision and artificial intelligence (AI) that is fundamental for a variety of real-time applications. In this research, the design and implementation of a face verification and identification system with a flexible, modular, secure, and scalable architecture is proposed. The proposed system incorporates several types of components: (i) portable capabilities (a mobile application and mixed reality [MR] glasses), (ii) enhanced monitoring and visualization via a user-friendly Web-based user interface (UI), and (iii) information sharing via middleware to other external systems. The experiments showed that these interconnected and complementary components deliver robust, real-time face identification and verification results. Furthermore, to identify a model with high accuracy, robustness, and speed for face identification and verification tasks, a comprehensive evaluation of multiple pre-trained face recognition models (FaceNet, ArcFace, Dlib, and MobileNetV2) on a curated version of the ID vs. Spot dataset was performed. Among the models evaluated, FaceNet emerged as the preferable choice for real-time tasks due to its balance between accuracy and inference speed for both face identification and verification, achieving an AUC of 0.99, Rank-1 accuracy of 91.8%, Rank-5 accuracy of 95.8%, an FNR of 2%, an FAR of 0.1%, verification accuracy of 98.6%, and an inference speed of 52 ms. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
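
A minimal sketch of the verification decision implied above: two face embeddings (e.g., 512-d FaceNet vectors) are compared by cosine similarity against a tuned threshold. The 0.6 value is an assumption; in practice it would be calibrated on a validation set to hit the reported FNR/FAR operating point:

```python
import numpy as np

def verify(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.6) -> bool:
    """Accept the identity claim if cosine similarity clears the threshold."""
    sim = float(np.dot(emb_a, emb_b)
                / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
    return sim >= threshold
```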

18 pages, 5806 KiB  
Article
Optical Flow Magnification and Cosine Similarity Feature Fusion Network for Micro-Expression Recognition
by Heyou Chang, Jiazheng Yang, Kai Huang, Wei Xu, Jian Zhang and Hao Zheng
Mathematics 2025, 13(15), 2330; https://doi.org/10.3390/math13152330 - 22 Jul 2025
Abstract
Recent advances in deep learning have significantly advanced micro-expression recognition, yet most existing methods process the entire facial region holistically, struggling to capture subtle variations in facial action units, which limits recognition performance. To address this challenge, we propose the Optical Flow Magnification and Cosine Similarity Feature Fusion Network (MCNet). MCNet introduces a multi-facial action optical flow estimation module that integrates global motion-amplified optical flow with localized optical flow from the eye and mouth–nose regions, enabling precise capture of facial expression nuances. Additionally, an enhanced MobileNetV3-based feature extraction module, incorporating Kolmogorov–Arnold networks and convolutional attention mechanisms, effectively captures both global and local features from optical flow images. A novel multi-channel feature fusion module leverages cosine similarity between Query and Key token sequences to optimize feature integration. Extensive evaluations on four public datasets—CASME II, SAMM, SMIC-HS, and MMEW—demonstrate MCNet’s superior performance, achieving state-of-the-art results with 92.88% UF1 and 86.30% UAR on the composite dataset, surpassing the best prior method by 1.77% in UF1 and 6.0% in UAR. Full article
(This article belongs to the Special Issue Representation Learning for Computer Vision and Pattern Recognition)
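
A hedged PyTorch sketch of cosine-similarity-weighted fusion between Query and Key token sequences as the abstract describes; the residual form and softmax normalization are assumptions about the exact fusion rule:

```python
import torch
import torch.nn.functional as F

def cosine_fusion(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    """query, key: (B, N, D) token sequences from two feature branches."""
    sim = F.cosine_similarity(query.unsqueeze(2), key.unsqueeze(1), dim=-1)  # (B, N, N)
    weights = sim.softmax(dim=-1)  # similarity-normalized attention weights
    return query + weights @ key   # residual fusion of attended key tokens
```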
