Abstract
To address the insufficient robustness and limited feature extraction of photovoltaic (PV) module image segmentation in complex scenes, we propose Pv-UNet, a high-precision PV module segmentation model that integrates a Transformer with an improved U-Net architecture. The model introduces a MultiScale Transformer in the encoding path to correlate features across scales and enhance their semantics. It combines a residual structure with a dynamic channel adaptation mechanism in the DoubleConv module to stabilize feature transfer, and it incorporates an Attention Gate module in the decoding path to suppress interference from complex backgrounds. Experimental data were obtained from UAV visible-light images of a photovoltaic power station in Yuezhe Town, Qiubei County, Yunnan Province. Compared with U-Net, BatchNorm-UNet, and Seg-UNet, Pv-UNet achieved higher IoU, Dice, and Precision, reaching 97.69%, 93.88%, and 97.99%, respectively, while reducing the loss value to 0.0393. These results indicate that the proposed method offers clear advantages in both accuracy and robustness for PV module segmentation and can provide technical support for automated inspection and intelligent monitoring of photovoltaic power stations.
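The abstract reports IoU, Dice, and Precision but does not define them. Below is a minimal sketch of the standard pixel-wise definitions for a binary PV-module mask. It is not the paper's evaluation code: the function name `segmentation_metrics`, the nested-list mask format, and the `eps` smoothing term are illustrative assumptions. The abstract also does not say whether the scores are averaged per image or pooled over the whole test set.

```python
def segmentation_metrics(pred, target, eps=1e-7):
    """Pixel-wise IoU, Dice and Precision for two binary masks.

    Hypothetical helper: `pred` and `target` are equal-sized 2-D lists of
    0/1 values, where 1 marks a PV-module pixel. `eps` (an assumed
    smoothing term) avoids division by zero when a mask is empty.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")

    tp = fp = fn = 0  # true positives, false positives, false negatives
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:      # predicted module pixel, but background in ground truth
                fp += 1
            elif t:      # module pixel the model missed
                fn += 1

    iou = tp / (tp + fp + fn + eps)              # |P ∩ T| / |P ∪ T|
    dice = 2 * tp / (2 * tp + fp + fn + eps)     # 2|P ∩ T| / (|P| + |T|)
    precision = tp / (tp + fp + eps)             # share of predicted pixels that are correct
    return {"iou": iou, "dice": dice, "precision": precision}


pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 1, 0],
          [0, 0, 0],
          [0, 1, 0]]
metrics = segmentation_metrics(pred, target)
```

In this toy example there are 2 true positives, 1 false positive and 1 false negative, giving IoU = 2/4, Dice = 4/6 and Precision = 2/3. For any single mask pair the two overlap scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.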