Journal of Imaging
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Communication
  • Open Access

15 October 2025

Surgical Instrument Segmentation via Segment-then-Classify Framework with Instance-Level Spatiotemporal Consistency Modeling

School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Image and Video Processing

Abstract

Accurate segmentation of surgical instruments in endoscopic videos is crucial for robot-assisted surgery and intraoperative analysis. This paper presents a Segment-then-Classify framework that decouples mask generation from semantic classification to improve spatial completeness and temporal stability. First, a Mask2Former-based segmentation backbone generates class-agnostic instance masks and region features. Then, a bounding-box-guided, instance-level spatiotemporal modeling module fuses geometric priors and temporal consistency through a lightweight transformer encoder. This design improves interpretability and robustness under occlusion and motion blur. Experiments on the EndoVis 2017 and 2018 datasets demonstrate that our framework achieves mIoU improvements of 3.06%, 2.99%, and 1.67% and mcIoU gains of 2.36%, 2.85%, and 6.06%, respectively, over previous state-of-the-art methods, while maintaining computational efficiency.
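
To make the decoupling concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that class-agnostic instance masks have already been reduced to bounding boxes and that each instance carries a provisional per-frame class label. In place of the paper's lightweight transformer encoder, it substitutes a much simpler stand-in: instances are linked across consecutive frames by greedy bounding-box IoU matching, and each linked track takes the majority label. All function names, the label strings, and the `iou_thresh` parameter are hypothetical.

```python
from collections import Counter


def mask_to_box(mask):
    """Tight box (x0, y0, x1, y1) of a binary mask given as a list of rows."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None  # empty mask has no box
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def link_tracks(frames, iou_thresh=0.5):
    """Greedily link instances across consecutive frames by box overlap.

    frames: list of frames, each a list of (box, label) per instance.
    Returns tracks as lists of (frame_idx, inst_idx, box, label).
    """
    tracks = []
    for t, instances in enumerate(frames):
        for i, (box, label) in enumerate(instances):
            best, best_iou = None, iou_thresh
            for k, track in enumerate(tracks):
                # Only extend tracks that ended in the previous frame.
                if track[-1][0] != t - 1:
                    continue
                iou = box_iou(track[-1][2], box)
                if iou >= best_iou:
                    best, best_iou = k, iou
            if best is None:
                tracks.append([(t, i, box, label)])
            else:
                tracks[best].append((t, i, box, label))
    return tracks


def smooth_labels(frames, iou_thresh=0.5):
    """Replace each instance label with the majority label of its track."""
    out = [[label for _, label in instances] for instances in frames]
    for track in link_tracks(frames, iou_thresh):
        majority = Counter(entry[3] for entry in track).most_common(1)[0][0]
        for t, i, _, _ in track:
            out[t][i] = majority
    return out


# Hypothetical three-frame clip: one instrument, one misclassified frame.
clip = [
    [((10, 10, 50, 50), "forceps")],
    [((12, 11, 52, 51), "scissors")],
    [((14, 12, 54, 52), "forceps")],
]
smoothed = smooth_labels(clip)
```

The sketch only illustrates why separating mask generation from classification helps: once instances are tracked by geometry, a single-frame misclassification can be corrected without touching the masks. A learned temporal model, as described in the abstract, would replace the majority vote.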
