Search Results (3,150)

Search Parameters:
Keywords = vision Transformer

13 pages, 2027 KB  
Article
An Improved Diffusion Model for Generating Images of a Single Category of Food on a Small Dataset
by Zitian Chen, Zhiyong Xiao, Dinghui Wu and Qingbing Sang
Foods 2026, 15(3), 443; https://doi.org/10.3390/foods15030443 - 26 Jan 2026
Abstract
In the era of the digital food economy, high-fidelity food images are critical for applications ranging from visual e-commerce presentation to automated dietary assessment. However, developing robust computer vision systems for food analysis is often hindered by data scarcity for long-tail or regional dishes. To address this challenge, we propose a novel high-fidelity food image synthesis framework as an effective data augmentation tool. Unlike generic generative models, our method introduces an Ingredient-Aware Diffusion Model based on the Masked Diffusion Transformer (MaskDiT) architecture. Specifically, we design a Label and Ingredients Encoding (LIE) module and a Cross-Attention (CA) mechanism to explicitly model the relationship between food composition and visual appearance, simulating the “cooking” process digitally. Furthermore, to stabilize training on limited data samples, we incorporate a linear interpolation strategy into the diffusion process. Extensive experiments on the Food-101 and VireoFood-172 datasets demonstrate that our method achieves state-of-the-art generation quality even in data-scarce scenarios. Crucially, we validate the practical utility of our synthetic images: utilizing them for data augmentation improved the accuracy of downstream food classification tasks from 95.65% to 96.20%. This study provides a cost-effective solution for generating diverse, controllable, and realistic food data to advance smart food systems.
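The LIE and CA modules described above amount to conditioning the diffusion backbone's image tokens on ingredient embeddings through cross-attention. The following is a minimal PyTorch sketch of that idea; the module name, embedding sizes, and ingredient-vocabulary size are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: conditioning noisy-image tokens on ingredient embeddings
# via cross-attention, in the spirit of the LIE + CA modules described above.
import torch
import torch.nn as nn

class IngredientCrossAttention(nn.Module):
    def __init__(self, dim=384, n_heads=6, n_ingredients=353):
        super().__init__()
        # "LIE"-style encoding: learned embeddings for ingredient IDs
        self.ingredient_emb = nn.Embedding(n_ingredients, dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, ingredient_ids):
        # img_tokens: (B, N, dim) tokens from the diffusion backbone
        # ingredient_ids: (B, K) integer ingredient indices for each dish
        cond = self.ingredient_emb(ingredient_ids)           # (B, K, dim)
        out, _ = self.attn(query=self.norm(img_tokens),
                           key=cond, value=cond)
        return img_tokens + out                              # residual update

tokens = torch.randn(2, 196, 384)
ings = torch.randint(0, 353, (2, 5))
print(IngredientCrossAttention()(tokens, ings).shape)  # torch.Size([2, 196, 384])
```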

18 pages, 2599 KB  
Article
C-ViT: An Improved ViT Model for Multi-Label Classification of Bamboo Chopstick Defects
by Waizhong Wang, Wei Peng, Liancheng Zeng, Yue Shen, Chaoyun Zhu and Yingchun Kuang
Sensors 2026, 26(3), 812; https://doi.org/10.3390/s26030812 - 26 Jan 2026
Abstract
The quality of disposable bamboo chopsticks directly affects consumers’ usage experience and health safety. Quality inspection is therefore particularly important, and multi-label classification of defects can better meet the refined demands of actual production. While ViT has made significant progress in visual tasks, it has limitations when dealing with objects of extreme aspect ratio, such as bamboo chopsticks. To address this, this paper proposes an improved ViT model, C-ViT, which introduces a convolutional neural network feature extraction module (CFE) to replace the traditional patch embedding, making the input features better suited to the ViT architecture. Moreover, existing loss functions for multi-label classification focus on optimizing label predictions, making hard examples difficult to learn due to their low gradient contribution. This paper therefore proposes a Hard Examples Contrastive Loss (HCL) function that dynamically selects hard examples and combines label and feature correlations to construct a contrastive learning mechanism, enhancing the model’s ability to model hard examples. Experimental results show that on the self-built bamboo chopstick defect dataset (BCDD), C-ViT improves mAP by 1.2 percentage points to 92.8% compared to the ViT-S model, and reaches 94.3% after adding HCL. In addition, we further verified the effectiveness of the proposed HCL function on the VOC2012 public multi-label classification dataset.
(This article belongs to the Section Sensing and Imaging)
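The CFE idea above replaces ViT's single large-stride patch projection with a stack of convolutions, which suits elongated inputs because every token inherits overlapping local context. A hedged PyTorch sketch follows; the channel counts and strides are assumptions, not the paper's exact configuration.

```python
# Sketch: a small convolutional feature extractor producing ViT tokens,
# standing in for the CFE module that replaces plain patch embedding.
import torch
import torch.nn as nn

class ConvFeatureEmbed(nn.Module):
    def __init__(self, embed_dim=384):
        super().__init__()
        # Three stride-2 conv stages (overall stride 8) instead of one
        # stride-16 patchify conv; gives overlapping receptive fields.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 3, stride=2, padding=1),
        )

    def forward(self, x):                    # x: (B, 3, H, W)
        f = self.stem(x)                     # (B, D, H/8, W/8)
        return f.flatten(2).transpose(1, 2)  # (B, N, D) token sequence

# A very wide, short image, as with a bamboo chopstick:
tokens = ConvFeatureEmbed()(torch.randn(1, 3, 64, 512))
print(tokens.shape)  # torch.Size([1, 512, 384]) -> an 8x64 grid of tokens
```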

21 pages, 1284 KB  
Article
Probabilistic Indoor 3D Object Detection from RGB-D via Gaussian Distribution Estimation
by Hyeong-Geun Kim
Mathematics 2026, 14(3), 421; https://doi.org/10.3390/math14030421 - 26 Jan 2026
Abstract
Conventional object detectors represent each object by a deterministic bounding box, regressing its center and size from RGB images. However, such discrete parameterization ignores the inherent uncertainty in object appearance and geometric projection, which can be more naturally modeled as a probabilistic density field. Recent works have introduced Gaussian-based formulations that treat objects as distributions rather than boxes, yet they remain limited to 2D images or require late fusion between image and depth modalities. In this paper, we propose a unified Gaussian-based framework for direct 3D object detection from RGB-D inputs. Our method is built upon a vision transformer backbone to effectively capture global context. Instead of separately embedding RGB and depth features or refining depth within region proposals, our method takes a full four-channel RGB-D tensor and predicts the mean and covariance of a 3D Gaussian distribution for each object in a single forward pass. We extend a pretrained vision transformer to accept four-channel inputs by augmenting the patch embedding layer while preserving ImageNet-learned representations. This formulation allows the detector to represent both object location and geometric uncertainty in 3D space. By optimizing divergence metrics such as the Kullback–Leibler or Bhattacharyya distances between predicted and target distributions, the network learns a physically consistent probabilistic representation of objects. Experimental results on the SUN RGB-D benchmark demonstrate that our approach achieves competitive performance compared to state-of-the-art point-cloud-based methods while offering uncertainty-aware and geometrically interpretable 3D detections.
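The four-channel extension described above can be implemented by inflating the pretrained patch-embedding convolution and initializing the extra depth channel from the RGB filters. A sketch under that assumption; the paper does not specify the initialization, so the mean-of-RGB choice here is one common heuristic rather than the authors' method.

```python
# Sketch: extend a pretrained 3-channel ViT patch embedding to 4 channels
# (RGB-D) while keeping the ImageNet-learned RGB weights intact.
import torch
import torch.nn as nn

def inflate_patch_embed(conv3: nn.Conv2d) -> nn.Conv2d:
    conv4 = nn.Conv2d(4, conv3.out_channels,
                      kernel_size=conv3.kernel_size,
                      stride=conv3.stride,
                      padding=conv3.padding,
                      bias=conv3.bias is not None)
    with torch.no_grad():
        conv4.weight[:, :3] = conv3.weight            # copy RGB filters
        # init the depth channel as the mean of the RGB filters (assumption)
        conv4.weight[:, 3:] = conv3.weight.mean(dim=1, keepdim=True)
        if conv3.bias is not None:
            conv4.bias.copy_(conv3.bias)
    return conv4

# e.g. a ViT-B/16-style patch embedding
rgb_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
rgbd_embed = inflate_patch_embed(rgb_embed)
print(rgbd_embed(torch.randn(1, 4, 224, 224)).shape)  # (1, 768, 14, 14)
```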

14 pages, 2524 KB  
Article
From Practice to Territory: Experiences of Participatory Agroecology in the AgrEcoMed Project
by Lucia Briamonte, Domenica Ricciardi, Michela Ascani and Maria Assunta D’Oronzio
World 2026, 7(2), 19; https://doi.org/10.3390/world7020019 - 26 Jan 2026
Abstract
The environmental and social crises affecting global agri-food systems highlight the need for a profound transformation of production models and their territorial relations. In this context, agroecology, understood as science, practice, and movement, has emerged as a paradigm capable of integrating ecological sustainability, social equity, and community participation. Within this framework, the work carried out by CREA in the AgrEcoMed project (new agroecological approach for soil fertility and biodiversity restoration to improve economic and social resilience of Mediterranean farming systems), funded by the PRIMA programme, investigates agroecology as a social and political process of territorial regeneration. This process is grounded in co-design with local stakeholders, collective learning, and the construction of multi-actor networks for agroecology in the Mediterranean. The Manifesto co-created within the project functions as a tool for participatory governance and value convergence, aiming to consolidate a shared vision for the Mediterranean agroecological transition. The article reviews the existing literature on the role of agroecological networks and empirically examines the collective co-creation of the Manifesto as a tool for social innovation. The methodology is based on a participatory action-research approach that used local focus groups, World Café, and thematic analysis to identify the needs of the companies involved. The results highlight the formation of a multi-actor network currently comprising around 90 members and confirm the effectiveness of the Manifesto as a boundary object for horizontal governance. This demonstrates how sustainability can emerge from dialogue, cooperation, and the co-production of knowledge among local actors.

18 pages, 14590 KB  
Article
VTC-Net: A Semantic Segmentation Network for Ore Particles Integrating Transformer and Convolutional Block Attention Module (CBAM)
by Yijing Wu, Weinong Liang, Jiandong Fang, Chunxia Zhou and Xiaolu Sun
Sensors 2026, 26(3), 787; https://doi.org/10.3390/s26030787 - 24 Jan 2026
Abstract
In mineral processing, visual-based online particle size analysis systems depend on high-precision image segmentation to accurately quantify ore particle size distribution, thereby optimizing crushing and sorting operations. However, due to multi-scale variations, severe adhesion, and occlusion within ore particle clusters, existing segmentation models often exhibit undersegmentation and misclassification, leading to blurred boundaries and limited generalization. To address these challenges, this paper proposes a novel semantic segmentation model named VTC-Net. The model employs VGG16 as the backbone encoder, integrates Transformer modules in deeper layers to capture global contextual dependencies, and incorporates a Convolutional Block Attention Module (CBAM) at the fourth stage to enhance focus on critical regions such as adhesion edges. BatchNorm layers are used to stabilize training. Experiments on ore image datasets show that VTC-Net outperforms mainstream models such as UNet and DeepLabV3 in key metrics, including MIoU (89.90%) and pixel accuracy (96.80%). Ablation studies confirm the effectiveness and complementary role of each module. Visual analysis further demonstrates that the model identifies ore contours and adhesion areas more accurately, significantly improving segmentation robustness and precision under complex operational conditions.
(This article belongs to the Section Sensing and Imaging)
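For reference, CBAM as used in VTC-Net follows the standard two-stage formulation: a channel gate computed from globally pooled features, then a spatial gate computed from channel-wise statistics. A compact PyTorch sketch; the reduction ratio and spatial kernel size are the usual defaults, assumed here.

```python
# Minimal CBAM sketch: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial attention: conv over channel-wise avg/max maps
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                       # channel gate
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))             # spatial gate

print(CBAM(256)(torch.randn(1, 256, 32, 32)).shape)  # (1, 256, 32, 32)
```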

48 pages, 1184 KB  
Systematic Review
Machine Learning, Neural Networks, and Computer Vision in Addressing Railroad Accidents, Railroad Tracks, and Railway Safety: An Artificial Intelligence Review
by Damian Frej, Lukasz Pawlik and Jacek Lukasz Wilk-Jakubowski
Appl. Sci. 2026, 16(3), 1184; https://doi.org/10.3390/app16031184 - 23 Jan 2026
Abstract
Ensuring robust railway safety is paramount for efficient and reliable transportation systems, a challenge increasingly addressed through advancements in artificial intelligence (AI). This review paper comprehensively explores the burgeoning role of AI in enhancing the safety of railway operations, focusing on key contributions from machine learning, neural networks, and computer vision. We synthesize current research that leverages these sophisticated AI methodologies to mitigate risks associated with railroad accidents and optimize railroad track management. The scope of this review encompasses diverse applications, including real-time monitoring of track conditions, predictive maintenance for infrastructure components, automated defect detection, and intelligent systems for obstacle and intrusion detection. Furthermore, it delves into the use of AI in assessing human factors, improving signaling systems, and analyzing accident/incident reports for proactive risk management. By examining the integration of advanced analytical techniques into various facets of railway operations, this paper highlights how AI is transforming traditional safety paradigms, paving the way for more resilient, efficient, and secure railway networks worldwide.
16 pages, 1428 KB  
Article
StrDiSeg: Adapter-Enhanced DINOv3 for Automated Ischemic Stroke Lesion Segmentation
by Qiong Chen, Donghao Zhang, Yimin Chen, Siyuan Zhang, Yue Sun, Fabiano Reis, Li M. Li, Li Yuan, Huijuan Jin and Wu Qiu
Bioengineering 2026, 13(2), 133; https://doi.org/10.3390/bioengineering13020133 - 23 Jan 2026
Abstract
Deep vision foundation models such as DINOv3 offer strong visual representation capacity, but their direct deployment in medical image segmentation remains difficult due to the limited availability of annotated clinical data and the computational cost of full fine-tuning. This study proposes an adaptation framework called StrDiSeg that integrates lightweight bottleneck adapters between selected transformer layers of DINOv3, enabling task-specific learning while preserving pretrained knowledge. An attention-enhanced U-Net decoder with multi-scale feature fusion further refines the representations. Experiments were performed on two publicly available ischemic stroke lesion segmentation datasets—AISD (non-contrast CT) and ISLES22 (DWI). The proposed method achieved Dice scores of 0.516 on AISD and 0.824 on ISLES22, outperforming baseline models and demonstrating strong robustness across different clinical imaging modalities. These results indicate that adapter-based fine-tuning provides a practical and computationally efficient strategy for leveraging large pretrained vision models in medical image segmentation.
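The bottleneck adapters mentioned above are small residual down-/up-projection blocks trained while the backbone weights stay frozen. A minimal PyTorch sketch, assuming ViT-Base dimensions and using a generic encoder layer as a stand-in for a DINOv3 block:

```python
# Sketch: a lightweight bottleneck adapter applied after a frozen
# transformer layer, as in adapter-based fine-tuning.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                # x: (B, N, dim) token features
        return x + self.up(self.act(self.down(x)))

# Frozen backbone layer + trainable adapter:
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
for p in layer.parameters():
    p.requires_grad = False              # keep pretrained knowledge fixed
adapter = BottleneckAdapter()
x = torch.randn(2, 197, 768)
print(adapter(layer(x)).shape)           # torch.Size([2, 197, 768])
```

Because the up-projection starts at zero, training begins from the unchanged pretrained features, which is a common stabilizing choice for adapters.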

17 pages, 3892 KB  
Article
Transformer-Driven Semi-Supervised Learning for Prostate Cancer Histopathology: A DINOv2–TransUNet Framework
by Rubina Akter Rabeya, Jeong-Wook Seo, Nam Hoon Cho, Hee-Cheol Kim and Heung-Kook Choi
Mach. Learn. Knowl. Extr. 2026, 8(2), 26; https://doi.org/10.3390/make8020026 - 23 Jan 2026
Abstract
Prostate cancer is diagnosed through a comprehensive study of histopathology slides, which takes time and requires professional interpretation. To reduce this workload, we developed a semi-supervised learning technique that combines transformer-based representation learning with a custom TransUNet classifier. To capture a wide range of morphological structures without manual annotation, our method pretrains DINOv2 on 10,000 unlabeled prostate tissue patches. A bespoke CNN-based decoder then takes the transformer-derived features and uses residual upsampling and carefully constructed skip connections to merge information across multiple spatial scales. Expert pathologists labeled only 20% of the patches in the whole dataset; the remaining unlabeled samples were exploited through a consistency-driven learning method that promoted reliable predictions across various augmentations. The model achieved precision and recall scores of 91.81% and 89.02%, respectively, and an accuracy of 93.78% on an additional test set. These results exceed the performance of a conventional U-Net and a baseline encoder–decoder network. Overall, the combination of localized CNN (Convolutional Neural Network) decoding and global transformer attention provides a reliable method for prostate cancer classification in settings with little annotated data.
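The consistency-driven component can be pictured as penalizing disagreement between predictions on two augmented views of the same unlabeled patch. A hedged PyTorch sketch; the KL form, the stop-gradient target, and the noise "augmentation" are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: a consistency loss for unlabeled patches, pushing predictions
# under two augmentations of the same input to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_loss(model, batch_unlabeled, augment):
    # Two independent augmented "views" of the same unlabeled patches
    view1, view2 = augment(batch_unlabeled), augment(batch_unlabeled)
    logits1 = model(view1)
    with torch.no_grad():                      # treat one view as the target
        target = F.softmax(model(view2), dim=1)
    return F.kl_div(F.log_softmax(logits1, dim=1), target,
                    reduction="batchmean")

# Toy demonstration with a stand-in classifier and noise "augmentation":
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))
augment = lambda x: x + 0.05 * torch.randn_like(x)
x_unlabeled = torch.randn(8, 3, 32, 32)
print(consistency_loss(model, x_unlabeled, augment).item())
```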

16 pages, 5308 KB  
Article
Patient-Level Classification of Rotator Cuff Tears on Shoulder MRI Using an Explainable Vision Transformer Framework
by Murat Aşçı, Sergen Aşık, Ahmet Yazıcı and İrfan Okumuşer
J. Clin. Med. 2026, 15(3), 928; https://doi.org/10.3390/jcm15030928 - 23 Jan 2026
Abstract
Background/Objectives: Diagnosing Rotator Cuff Tears (RCTs) via Magnetic Resonance Imaging (MRI) is clinically challenging due to complex 3D anatomy and significant interobserver variability. Traditional slice-centric Convolutional Neural Networks (CNNs) often fail to capture the necessary volumetric context for accurate grading. This study aims to develop and validate the Patient-Aware Vision Transformer (Pa-ViT), an explainable deep-learning framework designed for the automated, patient-level classification of RCTs (Normal, Partial-Thickness, and Full-Thickness). Methods: A large-scale retrospective dataset comprising 2447 T2-weighted coronal shoulder MRI examinations was utilized. The proposed Pa-ViT framework employs a Vision Transformer (ViT-Base) backbone within a Weakly-Supervised Multiple Instance Learning (MIL) paradigm to aggregate slice-level semantic features into a unified patient diagnosis. The model was trained using a weighted cross-entropy loss to address class imbalance and was benchmarked against widely used CNN architectures and traditional machine-learning classifiers. Results: The Pa-ViT model achieved a high overall accuracy of 91% and a macro-averaged F1-score of 0.91, significantly outperforming the standard VGG-16 baseline (87%). Notably, the model demonstrated superior discriminative power for the challenging Partial-Thickness Tear class (ROC AUC: 0.903). Furthermore, Attention Rollout visualizations confirmed the model’s reliance on genuine anatomical features, such as the supraspinatus footprint, rather than artifacts. Conclusions: By effectively modeling long-range dependencies, the Pa-ViT framework provides a robust alternative to traditional CNNs. It offers a clinically viable, explainable decision support tool that enhances diagnostic sensitivity, particularly for subtle partial-thickness tears.
(This article belongs to the Section Orthopedics)
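The MIL aggregation step, in which slice-level ViT features become one patient-level diagnosis, can be sketched with attention pooling. The abstract does not specify the pooling form, so the Ilse-style attention head below is an assumption; the feature dimension matches ViT-Base.

```python
# Sketch: attention-based multiple instance learning pooling that turns a
# bag of per-slice features into a single patient-level prediction.
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):
    def __init__(self, dim=768, n_classes=3):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(),
                                   nn.Linear(128, 1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, slice_feats):          # (num_slices, dim), one patient
        w = torch.softmax(self.score(slice_feats), dim=0)   # (num_slices, 1)
        patient_feat = (w * slice_feats).sum(dim=0)         # weighted pooling
        return self.classifier(patient_feat), w.squeeze(-1)

feats = torch.randn(24, 768)                 # 24 MRI slices, ViT-Base features
logits, attn = AttentionMILHead()(feats)
print(logits.shape, attn.shape)              # torch.Size([3]) torch.Size([24])
```

The per-slice attention weights double as a weak localization signal, which is consistent with the explainability goal described in the abstract.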

26 pages, 4329 KB  
Review
Advanced Sensor Technologies in Cutting Applications: A Review
by Motaz Hassan, Roan Kirwin, Chandra Sekhar Rakurty and Ajay Mahajan
Sensors 2026, 26(3), 762; https://doi.org/10.3390/s26030762 - 23 Jan 2026
Abstract
Advances in sensing technologies are increasingly transforming cutting operations by enabling data-driven condition monitoring, predictive maintenance, and process optimization. This review surveys recent developments in sensing modalities for cutting systems, including vibration sensors, acoustic emission sensors, optical and vision-based systems, eddy-current sensors, force sensors, and emerging hybrid/multi-modal sensing frameworks. Each sensing approach offers unique advantages in capturing mechanical, acoustic, geometric, or electromagnetic signatures related to tool wear, process instability, and fault development, while also showing modality-specific limitations such as noise sensitivity, environmental robustness, and integration complexity. Recent trends show a growing shift toward hybrid and multi-modal sensor fusion, where data from multiple sensors are combined using advanced data analytics and machine learning to improve diagnostic accuracy and reliability under changing cutting conditions. The review also discusses how artificial intelligence, Internet of Things connectivity, and edge computing enable scalable, real-time monitoring solutions, along with the challenges related to data needs, computational costs, and system integration. Future directions highlight the importance of robust fusion architectures, physics-informed and explainable models, digital twin integration, and cost-effective sensor deployment to accelerate adoption across various manufacturing environments. Overall, these advancements position advanced sensing and hybrid monitoring strategies as key drivers of intelligent, Industry 4.0-oriented cutting processes.

27 pages, 11804 KB  
Article
FRAM-ViT: Frequency-Aware and Relation-Enhanced Vision Transformer with Adaptive Margin Contrastive Center Loss for Fine-Grained Classification of Ancient Murals
by Lu Wei, Zhengchao Chang, Jianing Li, Jiehao Cai and Xianlin Peng
Electronics 2026, 15(2), 488; https://doi.org/10.3390/electronics15020488 - 22 Jan 2026
Abstract
Fine-grained visual classification requires recognizing subtle inter-class differences under substantial intra-class variation. Ancient mural recognition poses additional challenges: severe degradation and complex backgrounds introduce noise that obscures discriminative features, limited annotated data restricts model training, and dynasty-specific artistic styles manifest as periodic brushwork patterns and compositional structures that are difficult to capture. Existing spatial-domain methods fail to model the frequency characteristics of textures and the cross-region semantic relationships inherent in mural imagery. To address these limitations, we propose a Vision Transformer (ViT) framework which integrates frequency-domain enhancement, explicit token-relation modeling, adaptive multi-focus inference, and discriminative metric supervision. A Frequency Channel Attention (FreqCA) module applies 2D FFT-based channel gating to emphasize discriminative periodic patterns and textures. A Cross-Token Relation Attention (CTRA) module employs joint global and local gates to strengthen semantically related token interactions across distant regions. An Adaptive Omni-Focus (AOF) block partitions tokens into importance groups for multi-head classification, while Complementary Tokens Integration (CTI) fuses class tokens from multiple transformer layers. Finally, Adaptive Margin Contrastive Center Loss (AMCCL) improves intra-class compactness and inter-class separability with margins adapted to class-center similarities. Experiments on CUB-200-2011, Stanford Dogs, and a Dunhuang mural dataset show accuracies of 91.15%, 94.57%, and 94.27%, outperforming the ACC-ViT baseline by 1.35%, 1.63%, and 2.20%, respectively.
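The FreqCA idea, 2D FFT-based channel gating to emphasize periodic brushwork patterns, can be sketched as follows in PyTorch. The spectral statistic (mean magnitude per channel) and the MLP gate are assumptions; the paper's exact formulation may differ.

```python
# Sketch: gate feature channels by their 2D-FFT spectral energy, so channels
# carrying strong periodic texture responses are emphasized.
import torch
import torch.nn as nn

class FreqChannelGate(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")  # 2D FFT per channel
        energy = spec.abs().mean(dim=(2, 3))     # (B, C) spectral summary
        gate = self.mlp(energy)                  # (B, C) channel weights
        return x * gate[:, :, None, None]

print(FreqChannelGate(64)(torch.randn(2, 64, 28, 28)).shape)  # (2, 64, 28, 28)
```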

24 pages, 9875 KB  
Article
Corn Kernel Segmentation and Damage Detection Using a Hybrid Watershed–Convex Hull Approach
by Yi Shen, Wensheng Wang, Xuanyu Luo, Feiyu Zou and Zhen Yin
Foods 2026, 15(2), 404; https://doi.org/10.3390/foods15020404 - 22 Jan 2026
Abstract
Accurate segmentation of adhered (sticky) corn kernels and reliable damage detection are critical for quality control in corn processing and kernel selection. Traditional watershed algorithms suffer from over-segmentation, whereas deep learning methods require large annotated datasets that are impractical in most industrial settings. This study proposes W&C-SVM, a hybrid computer vision method that integrates an improved watershed algorithm (Sobel gradient and Euclidean distance transform), convex hull defect detection and an SVM classifier trained on only 50 images. On an independent test set, W&C-SVM achieved the highest damage detection accuracy of 94.3%, significantly outperforming traditional watershed SVM (TW + SVM) (74.6%), GrabCut (84.5%) and U-Net trained on the same 50 images (85.7%). The method effectively separates severely adhered kernels and identifies mechanical damage, supporting the selection of intact kernels for quality control. W&C-SVM offers a low-cost, small-sample solution ideally suited for small-to-medium food enterprises and breeding laboratories.
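The improved-watershed stage can be illustrated with standard OpenCV primitives: Otsu binarization, a Euclidean distance transform to seed kernel centers, and marker-based watershed to split touching kernels. The thresholds below are assumptions, and the convex-hull defect and SVM stages of the paper's pipeline are omitted.

```python
# Sketch: distance-transform-seeded watershed that splits touching kernels.
import cv2
import numpy as np

def split_touching_kernels(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Euclidean distance transform; peaks sit near kernel centers
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(binary, sure_fg)
    # Label seeds; reserve 0 for the "unknown" boundary band
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[unknown == 255] = 0
    return cv2.watershed(bgr, markers)   # boundaries are labeled -1

# Two overlapping bright "kernels" on a dark background:
img = np.zeros((120, 160, 3), np.uint8)
cv2.circle(img, (55, 60), 28, (200, 200, 200), -1)
cv2.circle(img, (99, 60), 28, (200, 200, 200), -1)
labels = split_touching_kernels(img)
print(np.unique(labels))  # -1 = boundary, 1 = background, 2+ = kernels
```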

37 pages, 1556 KB  
Article
Leading the Digital Transformation of Education: The Perspective of School Principals
by Bistra Mizova, Yonka Parvanova and Roumiana Peytcheva-Forsyth
Adm. Sci. 2026, 16(1), 57; https://doi.org/10.3390/admsci16010057 - 22 Jan 2026
Abstract
This mixed-methods study investigates the strategic management of digital transformation in Bulgarian schools by analysing principals’ self-reported leadership practices and styles. Using data from a nationally representative sample (N = 349) gathered through the SELFIE tool, complemented by 30 in-depth interviews, the research examines how school leaders understand and enact their roles as digital leaders within a context of fragmented policies and uneven digital capacity. Quantitative results reveal a central paradox: although 89.7% of principals claim to actively support teachers’ digital innovation, only about half report having a formalised digital strategy. This imbalance between strong operational support and weak institutionalisation reflects the dominant approach to school digitalisation in Bulgaria. Qualitative cluster analysis identifies three leadership profiles: (1) a strategic–collaborative profile, characterised by long-term planning, partnerships, and data-driven decisions; (2) a supportive–collaborative profile focused on teacher communities and context-specific professional development but lacking strategic vision; and (3) a balanced–pragmatic profile oriented toward measurable improvements and adaptive responses. Triangulation with national assessment data shows that leadership styles align with institutional contexts: high-performing schools tend to apply strategic–collaborative leadership, while lower-performing schools adopt pragmatic, adaptive approaches. The study argues that digital transformation requires context-sensitive frameworks recognising multiple developmental trajectories, highlighting the need for differentiated policies that support strategic institutionalisation of existing digital innovations while addressing structural inequalities.
(This article belongs to the Section Leadership)

21 pages, 5567 KB  
Article
Classification of Double-Bottom U-Shaped Weld Joints Using Synthetic Images and Image Splitting
by Gyeonghoon Kang and Namkug Ku
J. Mar. Sci. Eng. 2026, 14(2), 224; https://doi.org/10.3390/jmse14020224 - 21 Jan 2026
Abstract
The shipbuilding industry relies heavily on welding, which accounts for approximately 70% of the overall production process. However, the recent decline in skilled workers, together with rising labor costs, has accelerated the automation of shipbuilding operations. In particular, welding activities are concentrated in the double-bottom region of ships, where collaborative robots are increasingly introduced to alleviate workforce shortages. Because these robots must directly recognize U-shaped weld joints, this study proposes an image-based classification system capable of automatically identifying and classifying such joints. In double-bottom structures, U-shaped weld joints can be categorized into 176 types according to combinations of collar plate type, slot, watertight feature, and girder. To distinguish these types, deep learning-based image recognition is employed. To construct a large-scale training dataset, 3D Computer-Aided Design (CAD) models were automatically generated using Open Cascade and subsequently rendered to produce synthetic images. Furthermore, to improve classification performance, the input images were split into left, right, upper, and lower regions for both training and inference. The class definitions for each region were simplified based on the presence or absence of key features. Consequently, the classification accuracy was significantly improved compared with an approach using non-split images.
(This article belongs to the Section Ocean Engineering)
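The splitting step, cutting each joint image into left, right, upper, and lower regions so that each region can be classified against a simplified per-region label set, is straightforward. A small sketch with an assumed 50% crop fraction (the paper's exact crop geometry is not given):

```python
# Sketch: split one joint image into four overlapping regions for
# per-region classification.
import numpy as np

def split_joint_image(img: np.ndarray, frac: float = 0.5):
    """Return left, right, upper, lower crops of an (H, W, C) image."""
    h, w = img.shape[:2]
    return {
        "left":  img[:, : int(w * frac)],
        "right": img[:, int(w * (1 - frac)):],
        "upper": img[: int(h * frac), :],
        "lower": img[int(h * (1 - frac)):, :],
    }

regions = split_joint_image(np.zeros((480, 640, 3), dtype=np.uint8))
for name, crop in regions.items():
    print(name, crop.shape)   # e.g. left (480, 320, 3)
```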

41 pages, 2850 KB  
Article
Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations
by Jack C. Johnson and Yue Rong
Sensors 2026, 26(2), 715; https://doi.org/10.3390/s26020715 - 21 Jan 2026
Abstract
Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, creating the need for a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback-whale-detection system. A collection of audio segments is compiled from publicly available audio repositories and extensively curated by hand, with thorough examination, editing, and clipping to produce a dataset that minimizes bias and categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using the TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom architecture developed through research and iterative refinement. The pre-trained MobileNetV2 model is also included for further analysis. Model performance demonstrates a strong dependence on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (Mel-Frequency Cepstral Coefficients) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01% with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also achieved strong performance, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained with MFCC representations exhibited consistently lower robustness and higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection.
(This article belongs to the Section Sensor Networks)
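The two feature representations the study compares can be reproduced with librosa. The paper's extraction parameters are not given here, so the sample rate, FFT size, and band counts below are assumptions.

```python
# Sketch: mel spectrogram vs. MFCC features from the same audio clip.
import numpy as np
import librosa

sr = 4000                                        # humpback calls sit at low frequencies
y = np.random.randn(sr * 5).astype(np.float32)   # stand-in for a 5 s hydrophone clip

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)    # log-scaled mel spectrogram
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

print(mel_db.shape)   # (64, n_frames) image-like input for a CNN
print(mfcc.shape)     # (20, n_frames) compact cepstral summary
```

The mel spectrogram keeps an image-like time-frequency layout, which is a plausible reason it pairs better with convolutional models than the more heavily compressed MFCCs.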
