Road extraction is a unique and difficult problem in the field of semantic segmentation because roads have attributes such as slenderness, long span, complexity, and topological connectivity, etc. Therefore, we propose a novel road extraction network, abbreviated HsgNet, based on high-order spatial information global perception network using bilinear pooling. HsgNet, taking the efficient LinkNet as its basic architecture, embeds a Middle Block between the Encoder and Decoder. The Middle Block learns to preserve global-context semantic information, long-distance spatial information and relationships, and different feature channels’ information and dependencies. It is different from other road segmentation methods which lose spatial information, such as those using dilated convolution and multiscale feature fusion to record local-context semantic information. The Middle Block consists of three important steps: (1) forming a feature resource pool to gather high-order global spatial information; (2) selecting a feature weight distribution, enabling each pixel position to obtain complementary features according to its own needs; and (3) inversely mapping the intermediate output feature encoding to the size of the input image by expanding the number of channels of the intermediate output feature. We compared multiple road extraction methods on two open datasets, SpaceNet and DeepGlobe. The results show that compared to the efficient road extraction model D-LinkNet, our model has fewer parameters and better performance: we achieved higher mean intersection over union (71.1%), and the model parameters were reduced in number by about 1/4.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited