# A Semantic Segmentation Method Based on Image Entropy Weighted Spatio-Temporal Fusion for Blade Attachment Recognition of Marine Current Turbines

## Abstract

## 1. Introduction

- A semantic segmentation (SS) method based on image entropy weighted spatio-temporal fusion (IEWSTF) is proposed for attachment recognition on marine current turbine (MCT) blades, with the aim of improving robustness to variations in the rotational speed of the MCT;
- A spatio-temporal fusion (STF) mechanism is proposed to learn the spatio-temporal (ST) features between adjacent frames, alleviating the degradation of feature extraction caused by motion blur;
- An image entropy weighting (IEW) mechanism is proposed to obtain the optimal fusion features: it measures the degree of difference between adjacent frames and adaptively adjusts the fusion weights accordingly.

## 2. Related Works

#### 2.1. Semantic Segmentation

#### 2.2. Image Entropy

## 3. Data and Methods

#### 3.1. Image Dataset for MCT

#### 3.2. The Image Entropy Weighted Spatio-Temporal Fusion-Based SS Method

#### 3.2.1. Spatio-Temporal Fusion

ViT$_{t}$ will learn the features of the key frame $F(t)$ by iteratively updating the network parameters, and the resulting feature maps serve as the foundation for SS. Furthermore, since the ground truth for segmentation is annotated on the key frame $F(t)$, the feature maps extracted by ViT$_{t-1}$ and ViT$_{t+1}$ from the adjacent frames $F(t-1)$ and $F(t+1)$ will also gradually approach the feature maps of the key frame $F(t)$ as the network parameters are updated. In effect, ViT$_{t-1}$ and ViT$_{t+1}$ learn mappings from the adjacent frames to the key frame features, which can be regarded as the network capturing the ST features of the adjacent frames [29,37]. It is worth noting that the ViT modules above have a structure similar to the Transformer Block in SegFormer. Unlike SegFormer, however, this paper uses three ViT modules, one for each of the three frames. The two additional ViT modules learn the spatio-temporal features of the adjacent frames, which are then fused with the features of the key frame. The process of STF can be formulated as Equation (3):

ViT$_{t}$, ViT$_{t-1}$ and ViT$_{t+1}$, respectively.
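
The three-branch arrangement described in this section can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: each ViT module is replaced by a hypothetical linear patch embedding (`make_encoder`), and the fusion rule (a fixed weighted sum in `fuse`) is an assumption made only to show the data flow, since the exact form of the fusion is given by Equation (3). The patch size, feature dimension, and weights are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_encoder(patch: int, channels: int, dim: int):
    """Hypothetical stand-in for one ViT branch: a linear patch embedding."""
    weight = rng.standard_normal((patch * patch * channels, dim)) * 0.02

    def encode(frame: np.ndarray) -> np.ndarray:
        h, w, c = frame.shape
        gh, gw = h // patch, w // patch
        # Split the frame into non-overlapping patches and flatten each one.
        patches = frame[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
        patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh, gw, -1)
        return patches @ weight  # feature map of shape (gh, gw, dim)

    return encode


# One encoder per frame, each with its own parameters (no weight sharing).
vit_prev, vit_key, vit_next = (make_encoder(4, 3, 8) for _ in range(3))


def fuse(frame_prev, frame_key, frame_next, weights=(0.25, 0.5, 0.25)):
    """Assumed fusion rule: weighted sum of the three branch feature maps."""
    feats = [vit_prev(frame_prev), vit_key(frame_key), vit_next(frame_next)]
    return sum(w * f for w, f in zip(weights, feats))


# Three consecutive RGB frames F(t-1), F(t), F(t+1) of size 32x32 (random data).
frames = [rng.random((32, 32, 3)) for _ in range(3)]
fused = fuse(*frames)
print(fused.shape)  # (8, 8, 8)
```

In a trainable version, the fused feature map would feed a segmentation head whose loss is computed against the key-frame annotation, which is what drives the two adjacent-frame branches toward key-frame features.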

#### 3.2.2. Image Entropy Weighting
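
As background for this section, the image entropy of a grayscale frame is commonly computed as the Shannon entropy of its intensity histogram, $H = -\sum_i p_i \log_2 p_i$. The sketch below implements that standard definition in `image_entropy`. The `fusion_weights` helper is purely hypothetical: it assumes the difference between an adjacent frame and the key frame is scored by the entropy of their absolute difference image, and that a larger difference should yield a smaller fusion weight via a softmax. Neither assumption is taken from the paper.

```python
import numpy as np


def image_entropy(gray: np.ndarray, levels: int = 256) -> float:
    """Shannon entropy (in bits) of an 8-bit grayscale image's histogram."""
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing (0 * log 0 is taken as 0)
    return float(-(p * np.log2(p)).sum())


def fusion_weights(prev: np.ndarray, key: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    """Hypothetical weighting: frames that differ more from the key frame get less weight."""

    def diff_entropy(a: np.ndarray, b: np.ndarray) -> float:
        # Widen to int16 so the subtraction of uint8 values cannot wrap around.
        d = np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
        return image_entropy(d)

    # The key frame has zero difference from itself, so its score is 0.
    scores = np.array([-diff_entropy(prev, key), 0.0, -diff_entropy(nxt, key)])
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()


rng = np.random.default_rng(0)
key = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
blurred = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # unrelated frame
w = fusion_weights(key.copy(), key, blurred)
print(w)  # weights for F(t-1), F(t), F(t+1); they sum to 1
```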

## 4. Results and Discussion

#### 4.1. Configuration of the Training Process

#### 4.2. Evaluation Metrics
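
The results in this paper are reported as mIoU (mean intersection over union). As a reference, a minimal sketch of the standard definition follows: for each class, IoU is the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either assigns it, and mIoU is the average over classes. The function name `mean_iou` and the choice to skip classes that appear in neither map are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection over union across classes for integer label maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skipped (a convention choice)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


# Toy 2x2 example with two classes (e.g. 0 = background, 1 = attachment).
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))  # 0.5833, i.e. (1/2 + 2/3) / 2
```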

#### 4.3. Overall Accuracy Evaluation of SS on MCT Dataset

#### 4.4. Evaluation of the Robustness against the Variations in Rotational Speed

## 5. Conclusions

**Figure 1.** The images of MCT blades and the corresponding GT under different conditions of attachment simulation. The red box in the images indicates the area covered by the attachment. The forms of attachment are: (**a**) healthy; (**b**) single blade sparsely attached; (**c**) single blade densely attached; (**d**) double blades densely attached; (**e**) triple blades densely attached.

| Parameter | Value |
|---|---|
| Optimizer | AdamW |
| Batch size | 4 |
| Initial learning rate | 0.001 |
| Epochs | 100 |
| Shuffle | True |
| Photometric distortion | True |

| Method | mIoU (%) |
|---|---|
| SegFormer-MiT-B0 | 92.95 |
| SETR-Naive-Base | 94.26 |
| IEWSTF (Ours) | 96.99 |

