MDPI - Publisher of Open Access Journals

35 pages, 5529 KB

Open Access Article

Occasion-Based Clothing Classification Using Vision Transformer and Traditional Machine Learning Models

by Hanaa Alzahrani, Maram Almotairi and Arwa Basbrain

Computers 2026, 15(4), 249; https://doi.org/10.3390/computers15040249 - 17 Apr 2026

Viewed by 898

Clothing classification by occasion is an important area in computer vision and artificial intelligence (AI). This task is particularly challenging because of the subtle visual similarities among clothing categories such as formal, party, and casual attire. Variations in color, fabric, patterns, and lighting further increase the complexity of this task. To address this challenge, we used the Fashionpedia dataset to create a balanced subset of 15,000 images. Specifically, we adopted two different methods for labeling these images: automated classification, which relies on category identifications (IDs) and components, and manual labeling performed by human annotators. We then implemented our preprocessing pipeline, which includes several steps: resizing, image normalization, background removal using segmentation masks, and class balancing. We benchmarked traditional models, including artificial neural networks (ANNs), support vector machines (SVMs), and k-nearest neighbors (KNNs), which use a histogram of oriented gradient (HOG) features, as well as deep learning models such as convolutional neural networks (CNNs), the Visual Geometry Group 16 (VGG16) model utilizing transfer learning, and the vision transformer (ViT) model, all evaluated using identical data splits and preprocessing procedures. The traditional models achieved moderate accuracy, ranging from 54% to 66%. In contrast, the ViT model achieved an accuracy of 81.78% with automated classification and 98.09% with manual labeling. This indicates that a higher label accuracy, along with the preprocessing steps used, significantly enhances the performance. Together, these factors improve the effectiveness of ViT in context-aware apparel classification and establish a reliable baseline for future research. Full article

(This article belongs to the Special Issue Machine Learning: Innovation, Implementation, and Impact)

23 pages, 32193 KB

Open Access Article

Object Detection on Road: Vehicle’s Detection Based on Re-Training Models on NVIDIA-Jetson Platform

by Sleiter Ramos-Sanchez, Jinmi Lezama, Ricardo Yauri and Joyce Zevallos

J. Imaging 2026, 12(1), 20; https://doi.org/10.3390/jimaging12010020 - 1 Jan 2026

Cited by 2 | Viewed by 1954

Abstract

The increasing use of artificial intelligence (AI) and deep learning (DL) techniques has driven advances in vehicle classification and detection applications for embedded devices with deployment constraints due to computational cost and response time. In the case of urban environments with high traffic congestion, such as the city of Lima, it is important to determine the trade-off between model accuracy, type of embedded system, and the dataset used. This study was developed using a methodology adapted from the CRISP-DM approach, which included the acquisition of traffic videos in the city of Lima, their segmentation, and manual labeling. Subsequently, three SSD-based detection models (MobileNetV1-SSD, MobileNetV2-SSD-Lite, and VGG16-SSD) were trained on the NVIDIA Jetson Orin NX 16 GB platform. The results show that the VGG16-SSD model achieved the highest average precision (mAP ≈ 90.7%), with a longer training time, while the MobileNetV1-SSD (512 × 512) model achieved comparable performance (mAP ≈ 90.4%) with a shorter time. Additionally, data augmentation through contrast adjustment improved the detection of minority classes such as Tuk-tuk and Motorcycle. The results indicate that, among the evaluated models, MobileNetV1-SSD (512 × 512) achieved the best balance between accuracy and computational load for its implementation in ADAS embedded systems in congested urban environments.

(This article belongs to the Special Issue Advances in Machine Learning for Computer Vision Applications)

20 pages, 4705 KB

Open Access Article

MSA-ResNet: A Neural Network for Fine-Grained Instar Identification of Spodoptera frugiperda Larvae in Smart Agriculture

by Quanyuan Xu, Mingyang Wang, Ying Lu, Dan Feng, Hui Ye and Yonghe Li

Agronomy 2025, 15(12), 2724; https://doi.org/10.3390/agronomy15122724 - 26 Nov 2025

Viewed by 782

Abstract

The Spodoptera frugiperda (fall armyworm), a globally significant agricultural pest, poses severe threats to crop production. Accurate identification of larval instar stages is crucial for implementing precise control measures and reducing pesticide use. However, traditional identification methods suffer from low efficiency and heavy reliance on expert knowledge, while existing deep learning models still face challenges such as insufficient feature extraction and high computational complexity in fine-grained instar classification. To address these issues, this study proposes a novel network model, termed Multi-Scale Improved Self-Attention ResNet (MSA-ResNet), which integrates large convolutional kernels (LCK), atrous spatial pyramid pooling (ASPP), and an improved self-attention (ISA) mechanism into the ResNet50 backbone. These enhancements enable the model to more effectively capture and discriminate subtle morphological details of larvae. Experiments conducted on a self-constructed dataset comprising 24,179 images across six instar stages demonstrate that MSA-ResNet achieves an accuracy of 96.81% on the test set, significantly outperforming mainstream models such as ResNet50, VGG16, and MobileNetV3. In particular, the precision for the first instar increased by 12.94%, while the recall rates for the second and fourth instars improved by 16% and 8.97%, respectively. Ablation studies further validate the effectiveness of each module and the optimal embedding strategy. This research presents a high-precision and efficient intelligent solution for larval instar identification of S. frugiperda, offering a transferable reference for fine-grained image recognition tasks in agricultural pest management. Full article

(This article belongs to the Section Pest and Disease Management)

20 pages, 1535 KB

Open Access Article

ConvNeXt-Driven Detection of Alzheimer’s Disease: A Benchmark Study on Expert-Annotated AlzaSet MRI Dataset Across Anatomical Planes

by Mahdiyeh Basereh, Matthew Alexander Abikenari, Sina Sadeghzadeh, Trae Dunn, René Freichel, Prabha Siddarth, Dara Ghahremani, Helen Lavretsky and Vivek P. Buch

Diagnostics 2025, 15(23), 2997; https://doi.org/10.3390/diagnostics15232997 - 25 Nov 2025

Cited by 4 | Viewed by 1590

Abstract

Background: Alzheimer’s disease (AD) is a leading worldwide cause of cognitive impairment, necessitating accurate, inexpensive diagnostic tools to enable early recognition. Methods: In this study, we present a robust deep learning approach for AD classification from structural MRI scans using ConvNeXt, an emergent convolutional architecture inspired by vision transformers. We introduce AlzaSet, a novel, clinically curated T1-weighted MRI dataset of 79 subjects (63 with Alzheimer’s disease [AD], 16 cognitively normal controls [NC]) acquired on a 1.5 T Siemens Aera in axial, coronal, and sagittal planes (12,947 slices in total). Images are neuroradiologist-labeled. Results are reported per plane, with awareness of the class imbalance at the subject level. Three ConvNeXt sizes (Tiny, Small, Base) were compared and benchmarked against existing state-of-the-art CNN models (VGG16, VGG19, InceptionV3, DenseNet121). Results: ConvNeXt-Base consistently outperformed the other models on coronal slices with an accuracy of 98.37% and an AUC of 0.992. Coronal views were determined to be most diagnostically informative, with emphasis on visualization of the medial temporal lobe. Moreover, comparison with recent ensemble-based techniques showed superior performance with comparable computational efficiency. Conclusions: These results indicate that ConvNeXt-based models applied to clinically curated datasets have strong potential to provide scalable, real-time AD screening in both high-resource and resource-constrained settings.

(This article belongs to the Special Issue AI-Driven Precision Medicine: Innovations in Diagnosis, Prognosis, and Management Response)

15 pages, 3973 KB

Open Access Article

Enhanced Bathymetric Inversion for Tectonic Features via Multi-Gravity-Component DenseNet: A Case Study of Rift Identification in the South China Sea

by Huan Zhang, Houpu Li, Shuai Zhou, Fengshun Zhu, Jingshu Li and Shaofeng Bian

Remote Sens. 2025, 17(20), 3453; https://doi.org/10.3390/rs17203453 - 16 Oct 2025

Viewed by 837

Abstract

Submarine rift systems represent critical tectonic features whose accurate bathymetric characterization remains challenging yet essential for understanding plate boundary dynamics. However, traditional bathymetric inversion methods based on altimetric gravity data exhibit poor performance in resolving rift and steep-slope terrains. To address this limitation and enhance accuracy in complex topographic regions, we propose a multi-gravity-component fusion framework based on an improved DenseNet architecture. By integrating shipborne bathymetry, gravity anomaly (GA), vertical gravity gradient (VGG), vertical deflection components (meridian component ξ and prime vertical component η), and GEBCO_2024, we construct a 16 × 16 × 9 input tensor. The model incorporates adaptive transition layers to preserve fine-scale tectonic features and curvature-based stratification to balance learning across diverse terrains. Validation using 43,035 independent points yields an RMSE of 84.75 m, representing a 47.6% reduction relative to GEBCO_2024. Crucially, in the identified rift targets, errors decreased by 69.3–87.1%. Ablation studies reveal that vertical deflection components (ξ, η) dominate the physical constraints, with their removal increasing the RMSE by 91.08 m (a 107.5% increase relative to the baseline error). Architectural innovations and stratification reduce steep-slope RMSE by 6.1%. These results validate the efficacy of directional gravity derivatives for tectonic feature inversion and demonstrate significant potential for application to mid-ocean ridge systems. Full article
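The error figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the `rmse` helper is a generic textbook definition rather than the authors' evaluation code, and the GEBCO_2024 baseline RMSE is inferred from the stated 47.6% reduction, not reported in the abstract.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equally long depth sequences (m)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Values quoted in the abstract.
model_rmse = 84.75          # m, full model on the independent check points
reduction_vs_gebco = 0.476  # 47.6% reduction relative to GEBCO_2024
ablation_increase = 91.08   # m, RMSE increase when xi and eta are removed

# Inferred (not reported): the GEBCO_2024 RMSE implied by the 47.6% reduction.
implied_gebco_rmse = model_rmse / (1.0 - reduction_vs_gebco)

# Relative increase from the ablation, which the abstract gives as 107.5%.
relative_increase_pct = 100.0 * ablation_increase / model_rmse

# RMSE of the ablated model implied by the two quoted numbers.
ablated_rmse = model_rmse + ablation_increase

print(f"implied GEBCO_2024 RMSE: {implied_gebco_rmse:.1f} m")
print(f"ablation increase: {relative_increase_pct:.1f}%")
print(f"ablated-model RMSE: {ablated_rmse:.2f} m")
```

The ratio 91.08 / 84.75 reproduces the quoted 107.5%, so the two ablation figures are mutually consistent.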

29 pages, 11690 KB

Open Access Article

Enhanced Breast Cancer Diagnosis Using Multimodal Feature Fusion with Radiomics and Transfer Learning

by Nazmul Ahasan Maruf, Abdullah Basuhail and Muhammad Umair Ramzan

Diagnostics 2025, 15(17), 2170; https://doi.org/10.3390/diagnostics15172170 - 28 Aug 2025

Cited by 7 | Viewed by 2972

Abstract

Background: Breast cancer remains a critical public health problem worldwide and is a leading cause of cancer-related mortality. Optimizing clinical outcomes is contingent upon the early and precise detection of malignancies. Advances in medical imaging and artificial intelligence (AI), particularly in the fields of radiomics and deep learning (DL), have contributed to improvements in early detection methodologies. Nonetheless, persistent challenges, including limited data availability, model overfitting, and restricted generalization, continue to hinder performance. Methods: This study aims to overcome existing challenges by improving model accuracy and robustness through enhanced data augmentation and the integration of radiomics and deep learning features from the CBIS-DDSM dataset. To mitigate overfitting and improve model generalization, data augmentation techniques were applied. The PyRadiomics library was used to extract radiomics features, while transfer learning models were employed to derive deep learning features from the augmented training dataset. For radiomics feature selection, we compared multiple supervised feature selection methods, including RFE with random forest and logistic regression, ANOVA F-test, LASSO, and mutual information. Embedded methods with XGBoost, LightGBM, and CatBoost for GPUs were also explored. Finally, we integrated radiomics and deep features to build a unified multimodal feature space for improved classification performance. Based on this integrated set of radiomics and deep learning features, 13 pre-trained transfer learning models were trained and evaluated, including various versions of ResNet (50, 50V2, 101, 101V2, 152, 152V2), DenseNet (121, 169, 201), InceptionV3, MobileNet, and VGG (16, 19). Results: Among the evaluated models, ResNet152 achieved the highest classification accuracy of 97%, demonstrating the potential of this approach to enhance diagnostic precision. 
Other models, including VGG19, ResNet101V2, and ResNet101, achieved 96% accuracy, emphasizing the importance of the selected feature set in achieving robust detection. Conclusions: Future research could build on this work by incorporating Vision Transformer (ViT) architectures and leveraging multimodal data (e.g., clinical data, genomic information, and patient history). This could improve predictive performance and make the model more robust and adaptable to diverse data types. Ultimately, this approach has the potential to transform breast cancer detection, making it more accurate and interpretable. Full article

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

22 pages, 9631 KB

Open Access Article

Automatic Recognition of Commercial Tree Species from the Amazon Flora Using Bark Images and Transfer Learning

by Natally Celestino Gama, Luiz Eduardo Soares Oliveira, Samuel de Pádua Chaves e Carvalho, Alexandre Behling, Pedro Luiz de Paula Filho, Márcia Orie de Sousa Hamada, Eduardo da Silva Leal and Deivison Venicio Souza

Forests 2025, 16(9), 1374; https://doi.org/10.3390/f16091374 - 27 Aug 2025

Cited by 3 | Viewed by 2045

Abstract

The application of artificial intelligence (AI) techniques has improved the accuracy of forest species identification, particularly in timber inventories conducted under Sustainable Forest Management (SFM). This study developed and evaluated machine learning models to recognize 16 Amazonian timber species using digital images of tree bark. Data were collected from three SFM units located in Nova Maringá, Feliz Natal, and Cotriguaçu, in the state of Mato Grosso, Brazil. High-resolution images were processed into sub-images (256 × 256 pixels), and two feature extraction methods were tested: Local Binary Patterns (LBP) and pre-trained Convolutional Neural Networks (ResNet50, VGG16, InceptionV3, MobileNetV2). Four classifiers—Support Vector Machine (SVM), Artificial Neural Networks (ANN), Random Forest (RF), and Linear Discriminant Analysis (LDA)—were used. The best result (95% accuracy) was achieved using ResNet50 with SVM, confirming the effectiveness of transfer learning for species recognition based on bark texture. These findings highlight the potential of AI-based tools to enhance accuracy in forest inventories and support decision-making in tropical forest management. Full article

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

18 pages, 1680 KB

Open Access Article

Multi-Task Deep Learning for Simultaneous Classification and Segmentation of Cancer Pathologies in Diverse Medical Imaging Modalities

by Maryem Rhanoui, Khaoula Alaoui Belghiti and Mounia Mikram

Onco 2025, 5(3), 34; https://doi.org/10.3390/onco5030034 - 11 Jul 2025

Cited by 3 | Viewed by 6540

Abstract

Background: Clinical imaging is an important part of health care, providing physicians with great assistance in patient treatment. In fact, segmentation and grading of tumors can help doctors assess the severity of the cancer at an early stage and increase the chances of cure. Although deep learning for cancer diagnosis has achieved clinically acceptable accuracy, challenging tasks remain, especially in the context of insufficient labeled data and the subsequent need for expensive computational resources. Objective: This paper presents a lightweight classification and segmentation deep learning model to assist in the identification of cancerous tumors with high accuracy despite the scarcity of medical data. Methods: We propose a multi-task architecture for classification and segmentation of cancerous tumors in the brain, skin, prostate, and lungs. The model is based on the UNet architecture with different pre-trained deep learning models (VGG16 and MobileNetV2) as a backbone. The multi-task model is validated on relatively small datasets (slightly exceeding 1200 images) that are diverse in terms of modalities (MRI, X-ray, dermoscopic, and digital histopathology), number of classes, shapes, and sizes of cancer pathologies, using accuracy and the Dice coefficient as statistical metrics. Results: Experiments show that the multi-task approach improves the learning efficiency and the prediction accuracy for the segmentation and classification tasks, compared to training the individual models separately. The multi-task architecture reached a classification accuracy of 86%, 90%, 88%, and 87% for Skin Lesion, Brain Tumor, Prostate Cancer, and Pneumothorax, respectively. For the segmentation tasks, we achieved high precision: 95% and 98% for Skin Lesion and Brain Tumor segmentation, respectively, and 99% for both Prostate Cancer and Pneumothorax. These results indicate that the multi-task solution is more efficient than single-task networks.
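For readers unfamiliar with the two metrics named in this abstract, the following minimal Python sketch shows how accuracy and the Dice coefficient are conventionally computed. It uses the standard definitions on flat binary masks and label lists; it is not taken from the paper, and the convention of returning 1.0 when both masks are empty is an assumption made here.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks of 0/1 values."""
    intersection = sum(p & t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def accuracy(pred_labels, true_labels):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


# Toy segmentation: one of the two predicted pixels overlaps the single true pixel.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])

# Toy classification: three of four labels agree.
example_accuracy = accuracy(["tumor", "normal", "tumor", "tumor"],
                            ["tumor", "normal", "normal", "tumor"])

print(round(example_dice, 4), example_accuracy)
```

In a multi-task setup such as the one described, the Dice coefficient would typically score the segmentation head and accuracy the classification head.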

31 pages, 31711 KB

Open Access Article

On the Usage of Deep Learning Techniques for Unmanned Aerial Vehicle-Based Citrus Crop Health Assessment

by Ana I. Gálvez-Gutiérrez, Frederico Afonso and Juana M. Martínez-Heredia

Remote Sens. 2025, 17(13), 2253; https://doi.org/10.3390/rs17132253 - 30 Jun 2025

Cited by 8 | Viewed by 2457

Abstract

This work proposes an end-to-end solution for leaf segmentation, disease detection, and damage quantification, specifically focusing on citrus crops. The primary motivation behind this research is to enable the early detection of phytosanitary problems, which directly impact the productivity and profitability of Spanish and Portuguese agricultural developments, while ensuring environmentally safe management practices. It integrates an onboard computing module for Unmanned Aerial Vehicles (UAVs) using a Raspberry Pi 4 with Global Positioning System (GPS) and camera modules, allowing the real-time geolocation of images in citrus croplands. To address the lack of public data, a comprehensive database was created and manually labelled at the pixel level to provide accurate training data for a deep learning approach. To reduce annotation effort, we developed a custom automation algorithm for pixel-wise labelling in complex natural backgrounds. A SegNet architecture with a Visual Geometry Group 16 (VGG16) backbone was trained for the semantic, pixel-wise segmentation of citrus foliage. The model was successfully integrated as a modular component within a broader system architecture and was tested with UAV-acquired images, demonstrating accurate disease detection and quantification, even under varied conditions. The developed system provides a robust tool for the efficient monitoring of citrus crops in precision agriculture. Full article

(This article belongs to the Special Issue Application of Satellite and UAV Data in Precision Agriculture)

13 pages, 12530 KB

Open Access Article

Data Augmentation-Driven Improvements in Malignant Lymphoma Image Classification

by Sandi Baressi Šegota, Vedran Mrzljak, Ivan Lorencin and Nikola Anđelić

Computers 2025, 14(7), 252; https://doi.org/10.3390/computers14070252 - 26 Jun 2025

Cited by 1 | Viewed by 1222

Abstract

Artificial intelligence (AI)-based techniques have become increasingly prevalent in the classification of medical images. However, the effectiveness of such methods is often constrained by the limited availability of annotated medical data. To address this challenge, data augmentation is frequently employed. This study investigates the impact of a novel augmentation approach on the classification performance of malignant lymphoma histopathological images. The proposed method involves slicing high-resolution images (1388 × 1040 pixels) into smaller segments (224 × 224 pixels) before applying standard augmentation techniques such as flipping and rotation. The original dataset consists of 374 images, comprising 32.6% mantle cell lymphoma, 30.2% chronic lymphocytic leukemia, and 37.2% follicular lymphoma. Through slicing, the dataset was expanded to 8976 images, and further augmented to 53,856 images. The visual geometry group with 16 layers (VGG16) convolutional neural network (CNN) was trained and evaluated on three datasets: the original, the sliced, and the sliced with augmentation. Performance was assessed using accuracy, AUC, precision, sensitivity, specificity, and F1 score. The results demonstrate a substantial improvement in classification performance when slicing was employed, with additional, albeit smaller, gains achieved through subsequent augmentation. Full article

(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (2nd Edition))
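The dataset sizes quoted in the lymphoma abstract follow from simple tiling arithmetic, sketched below in Python. The sketch assumes a non-overlapping grid in which partial tiles at the right and bottom edges are discarded; that assumption reproduces the reported 8976 slices, but the abstract does not state the exact slicing scheme, nor how the sixfold augmentation factor is composed.

```python
def tiles_per_image(width, height, tile):
    """Number of non-overlapping tile x tile crops that fit fully inside an image."""
    return (width // tile) * (height // tile)


def tile_boxes(width, height, tile):
    """Yield (left, top, right, bottom) boxes for each full tile, row by row."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)


# Figures quoted in the abstract.
ORIGINAL_IMAGES = 374
AUGMENTED_TOTAL = 53_856

per_image = tiles_per_image(1388, 1040, 224)   # 6 columns x 4 rows
sliced_total = ORIGINAL_IMAGES * per_image
augmentation_factor = AUGMENTED_TOTAL // sliced_total

print(per_image, sliced_total, augmentation_factor)
```

Under these assumptions each source image yields 24 slices, giving 374 × 24 = 8976 images, and the augmented set of 53,856 images is exactly six times the sliced set.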

15 pages, 1457 KB

Open Access Article

Benchmarking Accelerometer and CNN-Based Vision Systems for Sleep Posture Classification in Healthcare Applications

by Minh Long Hoang, Guido Matrella, Dalila Giannetto, Paolo Craparo and Paolo Ciampolini

Sensors 2025, 25(12), 3816; https://doi.org/10.3390/s25123816 - 18 Jun 2025

Viewed by 1853

Abstract

Sleep position recognition plays a crucial role in diagnosing and managing various health conditions, such as sleep apnea, pressure ulcers, and musculoskeletal disorders. Accurate monitoring of body posture during sleep can provide valuable insights for clinicians and support the development of intelligent healthcare systems. This research presents a comparative analysis of sleep position recognition using two distinct approaches: image-based deep learning and accelerometer-based classification. There are five classes: prone, supine, right side, left side, and wake up. For the image-based method, the Visual Geometry Group 16 (VGG16) convolutional neural network was fine-tuned with data augmentation strategies including rotation, reflection, scaling, and translation to enhance model generalization. The image-based model achieved an overall accuracy of 93.49%, with perfect precision and recall for “right side” and “wakeup” positions, but slightly lower performance for “left side” and “supine” classes. In contrast, the accelerometer-based method employed a feedforward neural network trained on features extracted from segmented accelerometer data, such as signal sum, standard deviation, maximum, and spike count. This method yielded superior performance, reaching an accuracy exceeding 99.8% across most sleep positions. The “wake up” position was particularly easy to detect due to the absence of body movements such as heartbeat or respiration when the person is no longer in bed. The results demonstrate that while image-based models are effective, accelerometer-based classification offers higher precision and robustness, particularly in real-time and privacy-sensitive scenarios. Further comparisons of the system characteristics, data size, and training time are also carried out to offer crucial insights for selecting the appropriate technology in clinical, in-home, or embedded healthcare monitoring applications. Full article

(This article belongs to the Special Issue Advances in Sensing Technologies for Sleep Monitoring)
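The accelerometer features listed in the sleep posture abstract (signal sum, standard deviation, maximum, and spike count over segmented data) could be extracted roughly as in the Python sketch below. Everything beyond the feature names is an assumption made for illustration: the fixed-size non-overlapping windows, the use of the population standard deviation, and the definition of a spike as a sample deviating from the window mean by more than a hypothetical `spike_threshold`.

```python
import statistics


def segment(signal, size):
    """Split a 1-D signal into consecutive non-overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def window_features(samples, spike_threshold):
    """Return the four summary features for one window of accelerometer samples."""
    mean = statistics.fmean(samples)
    return {
        "sum": sum(samples),
        "std": statistics.pstdev(samples),  # population standard deviation (assumed)
        "max": max(samples),
        # Assumed spike definition: sample further than the threshold from the mean.
        "spikes": sum(1 for s in samples if abs(s - mean) > spike_threshold),
    }


# Toy single-axis signal: one burst of movement followed by stillness.
signal = [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
windows = segment(signal, size=4)
features = [window_features(w, spike_threshold=2.0) for w in windows]

for row in features:
    print(row)
```

Feature vectors of this kind would then be passed to a small feedforward classifier, as the abstract describes; a window with no movement at all (second row) is the sort of pattern that makes an empty bed easy to separate from the sleeping postures.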

30 pages, 4558 KB

Open Access Article

AI-Powered Lung Cancer Detection: Assessing VGG16 and CNN Architectures for CT Scan Image Classification

by Rapeepat Klangbunrueang, Pongsathon Pookduang, Wirapong Chansanam and Tassanee Lunrasri

Informatics 2025, 12(1), 18; https://doi.org/10.3390/informatics12010018 - 11 Feb 2025

Cited by 29 | Viewed by 9019

Abstract

Lung cancer is a leading cause of mortality worldwide, and early detection is crucial in improving treatment outcomes and reducing death rates. However, diagnosing medical images, such as Computed Tomography scans (CT scans), is complex and requires a high level of expertise. This study focuses on developing and evaluating the performance of Convolutional Neural Network (CNN) models, specifically the Visual Geometry Group 16 (VGG16) architecture, to classify lung cancer CT scan images into three categories: Normal, Benign, and Malignant. The dataset used consists of 1097 CT images from 110 patients, categorized according to these severity levels. The research methodology began with data collection and preparation, followed by training and testing the VGG16 model and comparing its performance with other CNN architectures, including Residual Network with 50 layers (ResNet50), Inception Version 3 (InceptionV3), and Mobile Neural Network Version 2 (MobileNetV2). The experimental results indicate that VGG16 achieved the highest classification performance, with a Test Accuracy of 98.18%, surpassing the other models. This accuracy highlights VGG16’s strong potential as a supportive diagnostic tool in medical imaging. However, a limitation of this study is the dataset size, which may reduce model accuracy when applied to new data. Future studies should consider increasing the dataset size, using Data Augmentation techniques, fine-tuning model parameters, and employing advanced models such as 3D CNN or Vision Transformers. Additionally, incorporating Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret model decisions would enhance transparency and reliability. This study confirms the potential of CNNs, particularly VGG16, for classifying lung cancer CT images and provides a foundation for further development in medical applications. Full article

(This article belongs to the Section Medical and Clinical Informatics)

41 pages, 1802 KB

Open Access Review

A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition

by Andisani Nemavhola, Colin Chibaya and Serestina Viriri

Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025

Cited by 29 | Viewed by 10954

Abstract

This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems. Full article

(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

24 pages, 9651 KB

Open Access | Editor’s Choice | Article

Fault Detection in Induction Machines Using Learning Models and Fourier Spectrum Image Analysis

by Kevin Barrera-Llanga, Jordi Burriel-Valencia, Angel Sapena-Bano and Javier Martinez-Roman

Sensors 2025, 25(2), 471; https://doi.org/10.3390/s25020471 - 15 Jan 2025

Cited by 17 | Viewed by 4225

Abstract

Induction motors are essential components in industry due to their efficiency and cost-effectiveness. This study presents an innovative methodology for automatic fault detection by analyzing images generated from the Fourier spectra of current signals using deep learning techniques. A new preprocessing technique incorporating a distinctive background to enhance spectral feature learning is proposed, enabling the detection of four types of faults: healthy motor coupled to a generator with a broken bar (HGB), broken rotor bar (BRB), race bearing fault (RBF), and bearing ball fault (BBF). The dataset was generated from three-phase signals of an induction motor controlled by a Direct Torque Controller under various operating conditions (20–1500 rpm with 0–100% load), resulting in 4251 images. The model, based on a Visual Geometry Group (VGG) architecture with 19 layers, achieved an overall accuracy of 98%, with specific accuracies of 99% for RAF, 100% for BRB, 100% for RBF, and 95% for BBF. Model interpretability was assessed using explainability techniques, which allowed for the identification of specific learning patterns. This analysis introduces a new approach by demonstrating how different convolutional blocks capture particular features: the first convolutional block captures signal shape, while the second identifies background features. Additionally, distinct convolutional layers were associated with each fault type: layer 9 for RAF, layer 13 for BRB, layer 16 for RBF, and layer 14 for BBF. This methodology offers a scalable solution for predictive maintenance in induction motors, effectively combining signal processing, computer vision, and explainability techniques.

(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)

25 pages, 8832 KB

Open Access | Article

3D-CNN with Multi-Scale Fusion for Tree Crown Segmentation and Species Classification

by Jiayao Wang, Zhen Zhen, Yuting Zhao, Ye Ma and Yinghui Zhao

Remote Sens. 2024, 16(23), 4544; https://doi.org/10.3390/rs16234544 - 4 Dec 2024

Cited by 9 | Viewed by 3111

Abstract

Natural secondary forests play a crucial role in global ecological security, climate change mitigation, and biodiversity conservation. However, accurately delineating individual tree crowns and identifying tree species in dense natural secondary forests remains a challenge. This study combines deep learning with traditional image segmentation methods to improve individual tree crown detection and species classification. The approach utilizes hyperspectral data, unmanned aerial vehicle laser scanning data, and ground survey data from Maoershan Forest Farm in Heilongjiang Province, China. The study consists of two main processes: (1) combining semantic segmentation algorithms (U-Net and Deeplab V3 Plus) with watershed transform (WTS) for tree crown detection (U-WTS and D-WTS algorithms); (2) resampling the original images to different pixel densities (16 × 16, 32 × 32, and 64 × 64 pixels) and inputting them into five 3D-CNN models (ResNet10, ResNet18, ResNet34, ResNet50, VGG16). For tree species classification, the MSFB was combined with the CNN models. The results show that the U-WTS algorithm achieved a recall of 0.809, a precision of 0.885, and an F-score of 0.845. ResNet18 with a pixel density of 64 × 64 pixels achieved the highest overall accuracy (OA) of 0.916, an improvement of 0.049 over the original images. After incorporating MSFB, the OA improved by approximately 0.04 across all models, with only a 6% increase in model parameters. Notably, the floating-point operations (FLOPs) of ResNet18 + MSFB were only one-eighth of those of ResNet18 with 64 × 64 pixels, while achieving similar accuracy (OA: 0.912 vs. 0.916). This framework offers a scalable solution for large-scale tree species distribution mapping and forest resource inventories.
