Integration of Superpixel Segmentation, Convolutional Neural Networks and Vision Transformers for Automatic Benthic Habitats Classification

Mohamed, Hassan; Nadaoka, Kazuo

doi:10.3390/rs18111711

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Hassan Mohamed ¹,* and Kazuo Nadaoka ²,³

¹ Department of Geomatics Engineering, Shoubra Faculty of Engineering, Benha University, Cairo 11672, Egypt

² Kajima Technical Research Institute (KaTRI), Chofu 182-0036, Japan

³ Institute of Science Tokyo, Meguro-ku, Tokyo 152-8552, Japan

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(11), 1711; https://doi.org/10.3390/rs18111711

Submission received: 24 February 2026 / Revised: 5 May 2026 / Accepted: 21 May 2026 / Published: 26 May 2026

(This article belongs to the Section Ocean Remote Sensing)

Abstract

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have achieved significant success in various computer vision applications, including the classification of high-resolution imagery. However, a notable limitation of these deep learning approaches is that they tend not to preserve the precise edges and shapes of target objects. In contrast, Object-Based Image Analysis (OBIA) preserves object boundaries by segmenting images into meaningful objects. Combining CNNs and ViTs with OBIA leverages the feature extraction capabilities of these deep learning algorithms and the boundary-preserving advantages of OBIA, leading to enhanced classification accuracy and improved delineation of object boundaries in high-resolution images. The main challenge in combining these methods lies in aligning the irregularly shaped image objects produced by OBIA with the regular image patches required by CNN and ViT architectures. In this study, we propose a novel approach that integrates superpixel segmentation with CNNs and ViTs for the automatic classification of benthic habitats using high-resolution orthomosaic images. First, the Simple Linear Iterative Clustering (SLIC) algorithm was applied to segment the high-resolution orthomosaic images into superpixels. Next, the central points of the resulting superpixels were used to generate square image patches. These patches served as inputs to the pre-trained ConvNeXt-Base and EfficientNet-B0 CNNs, which extracted fine-grained features, and to DINOv2 ViTs, which extracted high-level features. Then, a Support Vector Machine (SVM) classifier was trained on these features to classify benthic habitats. Finally, the classification label derived from the SVM defined the class of each superpixel segment. This method achieved an average overall accuracy of 0.96 in classifying benthic habitats. Overall, we demonstrate that combining CNNs, ViTs, and superpixel segmentation is an effective approach to benthic habitat classification, providing accurate high-resolution maps of heterogeneous reef environments.
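
The alignment step described above (superpixel centers to square patches, then one predicted label painted back onto each superpixel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names are hypothetical, a regular block grid stands in for a SLIC label map (which could instead come from a SLIC implementation such as scikit-image's `slic`), and a simple brightness threshold stands in for the CNN/ViT feature extraction and SVM stage.

```python
import numpy as np


def superpixel_centroids(segments):
    """Return {label: (row, col)}: the rounded centroid of each superpixel."""
    centroids = {}
    for label in np.unique(segments):
        rows, cols = np.nonzero(segments == label)
        centroids[int(label)] = (int(round(rows.mean())), int(round(cols.mean())))
    return centroids


def extract_patches(image, centroids, patch_size):
    """Cut a patch_size x patch_size square patch around every centroid.

    The image is reflect-padded so patches near the border keep full size.
    """
    half = patch_size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    patches = {}
    for label, (r, c) in centroids.items():
        # Pixel (r, c) of the original sits at (r + half, c + half) in the
        # padded image, so this slice is centered on the centroid.
        patches[label] = padded[r:r + patch_size, c:c + patch_size]
    return patches


def paint_labels(segments, predictions):
    """Assign every pixel the class predicted for its superpixel."""
    class_map = np.zeros_like(segments)
    for label, cls in predictions.items():
        class_map[segments == label] = cls
    return class_map


# Synthetic 40 x 40 RGB image: dark left half (class 0), bright right half (class 1).
rng = np.random.default_rng(0)
image = np.zeros((40, 40, 3), dtype=np.float32)
image[:, 20:] = 1.0
image += rng.normal(0.0, 0.05, image.shape).astype(np.float32)

# ASSUMPTION: a grid of 10 x 10 blocks stands in for a SLIC superpixel label map.
rows, cols = np.indices((40, 40))
segments = (rows // 10) * 4 + cols // 10

centroids = superpixel_centroids(segments)
patches = extract_patches(image, centroids, patch_size=9)

# ASSUMPTION: thresholding mean patch brightness stands in for the
# CNN/ViT feature extraction followed by an SVM classifier.
predictions = {label: int(patch.mean() > 0.5) for label, patch in patches.items()}

class_map = paint_labels(segments, predictions)
```

Because each patch is classified once and the result is painted onto the whole superpixel, the output map follows the superpixel boundaries rather than the square patch grid, which is the boundary-preserving property the approach relies on.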

Keywords: convolutional neural networks; vision transformers; orthomosaics; OBIA; SLIC; SVM
