Article

A Dual-Resolution Network Based on Orthogonal Components for Building Extraction from VHR PolSAR Images

by Songhao Ni 1,2, Fuhai Zhao 1, Mingjie Zheng 1,*, Zhen Chen 1 and Xiuqing Liu 1

1 Department of Space Microwave Remote Sensing System, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(2), 305; https://doi.org/10.3390/rs18020305
Submission received: 30 November 2025 / Revised: 12 January 2026 / Accepted: 14 January 2026 / Published: 16 January 2026

Highlights

What are the main findings?
  • A systematic analysis reveals that resolution enhancement alters scattering mechanisms, and a high-resolution SLC benchmark dataset is constructed to fill the research gap, verifying that end-to-end learning yields superior segmentation performance.
  • An Orthogonal Dual-Resolution Network (ODRNet) is developed that decomposes complex-valued data into real and imaginary components and incorporates a bilateral fusion mechanism, achieving high accuracy that can serve as a benchmark.
What is the implication of the main finding?
  • The study validates that the orthogonal representation of SLC preserves signal integrity, establishing a more generalizable data representation strategy for deep learning applications.
  • The proposed ODRNet effectively balances semantic context and fine spatial details, offering a high-precision solution for building footprint extraction in complex urban environments.

Abstract

Sub-meter-resolution Polarimetric Synthetic Aperture Radar (PolSAR) imagery enables precise building footprint extraction but introduces complex scattering correlated with fine spatial structures. This change renders both traditional methods, which rely on simplified scattering models, and existing deep learning approaches, which sacrifice spatial detail through multi-looking, inadequate for high-precision extraction tasks. To address this, we propose an Orthogonal Dual-Resolution Network (ODRNet) for end-to-end, precise segmentation directly from single-look complex (SLC) data. Unlike complex-valued neural networks that suffer from high computational cost and optimization difficulties, our approach decomposes complex-valued data into its orthogonal real and imaginary components, which are then concurrently fed into a Dual-Resolution Branch (DRB) with Bilateral Information Fusion (BIF) to effectively balance the trade-off between semantic and spatial details. Crucially, we introduce an auxiliary Polarization Orientation Angle (POA) regression task to enforce physical consistency between the orthogonal branches. To tackle the challenge of diverse building scales, we designed a Multi-scale Aggregation Pyramid Pooling Module (MAPPM) to enhance contextual awareness and a Pixel-attention Fusion (PAF) module to adaptively fuse dual-branch features. Furthermore, we have constructed a VHR PolSAR building footprint segmentation dataset to support related research. Experimental results demonstrate that ODRNet achieves 64.3% IoU and 78.27% F1-score on our dataset, and 73.61% IoU with 84.8% F1-score on a large-scale SLC scene, confirming the method’s significant potential and effectiveness in high-precision building extraction directly from SLC.

1. Introduction

A building footprint is the outline of a building projected onto the ground in a top view, which can be used to infer the geographical location, spatial boundary, and occupied area of buildings. The precise extraction of building footprints is a critical task for urban planning, disaster assessment, and dynamic monitoring [1,2]. Synthetic Aperture Radar (SAR), with its all-weather, all-day imaging capabilities, provides a reliable means for continuous Earth observation. Moreover, employing comprehensive polarimetric SAR data across four channels enables the use of polarimetric coherences for characterizing man-made targets and classifying land cover [3]. Therefore, the extraction of building footprints using PolSAR data is a focal point of research within the PolSAR community [4,5,6].
Building footprint extraction is generally treated as a binary segmentation task, approached primarily via two methods: artificial feature extraction methods (such as statistical distribution and polarimetric decomposition) and data-driven deep learning methods. However, as the resolution of PolSAR imagery improves to the sub-meter level (Very High Resolution, VHR), artificial feature extraction methods face fundamental limitations. These methods primarily rely on simplified physical scattering models. For example, they often use polarimetric decomposition to equate a building’s scattering response to the double-bounce scattering from an ideal corner formed by a wall and the ground. Hu et al. [7] improved the decomposition methodology using a pure volume scattering model, effectively interpreting the overestimation of volume scattering (OVS) and buildings with large orientation angles (LOBs). Furthermore, some methods use man-made features like the RVI/PA plane [5] or Span/PISP plane [8] with statistical thresholds to distinguish buildings. This assumption is acceptable at low resolutions. However, in VHR images the fine structures of buildings, such as roofs, walls and windows, are clearly visible. This results in a complex spatial mix where multiple scattering mechanisms coexist, making a single double-bounce model insufficient for an accurate description. Figure 1 summarizes how building scattering changes as resolution increases.
To overcome the limitations of traditional methods, researchers have turned to data-driven deep learning. Semantic segmentation models, represented by Fully Convolutional Networks (FCNs), U-Net, and their variants [9,10,11,12,13], have achieved great success in building extraction from remote sensing images. Some studies have successfully applied them to SAR images [14,15,16]. For instance, Wu et al. [17] have successfully mapped built-up areas across China utilizing an improved Unet network and single-polarimetric GF-3 images, while researchers [18] created a benchmark and dataset for individual building extraction from single-polarization images. Moreover, recent studies have introduced advanced architectures like the lightweight pyramid transformer to extract buildings in complex scenes such as port regions [19], demonstrating the strong potential of deep learning in SAR interpretation.
Despite this progress, existing deep learning methods still exhibit significant shortcomings when processing VHR PolSAR data, rooted in a lack of analysis of buildings’ scattering in VHR PolSAR images and the failure to design end-to-end, data-driven architectures. At the data representation level, current methods fail to achieve true end-to-end learning. The majority of networks convert the complex-valued data into real-valued data (amplitude, coherency matrix $[T]$ or scattering components) [20]. This conversion, aimed at generating stable polarimetric features and suppressing speckle, requires spatial averaging (i.e., multi-looking). However, as we will demonstrate in Section 2, this operation inevitably blurs the outlines and details of buildings, degrading the valuable spatial information in VHR images. This reliance on preprocessed inputs traps researchers in a continuous cycle of feature selection and optimization, which fundamentally contradicts the core advantage of deep learning: allowing the network to learn features autonomously from raw data. To process complex data directly, one option is to use complex convolution operations to construct a neural network [21]. For example, Kuang et al. [22] proposed a complex-valued diffusion model as a representation learner, Yu et al. [23] added a decoder to extend Complex-Valued Neural Networks (CVNNs) to terrain segmentation tasks, and Xu et al. [24] addressed semantic segmentation by transforming BiSeNetV2 into a complex-valued network, leveraging the polarimetric covariance matrix as input. However, the deployment of CVNNs within modern deep frameworks faces inherent obstacles. Theoretically, it has been shown that complex-valued neurons converge significantly slower than real-valued neurons [25]. In practice, CVNNs necessitate specialized layers and often simulate complex arithmetic using separate real and imaginary tensors, thereby increasing computational cost [26]. Crucially, recent parameter-matched comparisons reveal that RVNNs can match or even outperform CVNNs, offering a more efficient alternative for training and inference [27]. An alternative approach feeds the separated components of the complex-valued data into dual-branch networks [21,23]; Zeng et al. [28] avoided immature complex-valued networks and obtained competitive terrain segmentation results, implying that feature-level fusion of complex information is an effective technical route. However, this polar-coordinate-based representation has its own inherent challenges: amplitude and phase share a complex nonlinear relationship, phase data is prone to wrapping artifacts, and the phase is highly sensitive to noise.
In addition, buildings in urban scenes exhibit substantial scale variation, ranging from small residential units to large industrial complexes, and their footprints are often densely distributed with complex boundaries. Despite the effective use of multi-kernel methods for extracting discriminative features in ship detection [29,30], the challenges persist in pixel-wise segmentation. Furthermore, classic encoder–decoder structures sacrifice spatial resolution in exchange for a larger receptive field and semantic information [31], which is detrimental to extracting buildings with fine outlines and diverse sizes. In VHR PolSAR images, a target’s scattering is intrinsically correlated with its geometric properties; therefore, ensuring the integrity of spatial information becomes a critical requirement.
Inspired by the above discussions, this article first presents an analysis of the unique physical properties of VHR PolSAR images, which directly informs our network design. Based on this analysis, we propose an Orthogonal Dual-Resolution Network (ODRNet) for high-precision building footprint extraction directly from SLC. We decompose the raw data into its orthogonal real and imaginary parts as input, instead of traditional multi-looked features, providing a more robust data foundation for end-to-end learning. This input is then fed into parallel branches: one branch maintains a high resolution to preserve fine spatial structures, while the other extracts multi-scale semantic information via downsampling, with efficient interaction facilitated by our designed Bilateral Information Fusion (BIF) module. The network’s effectiveness was then comprehensively validated on a large-scale SAR dataset.
The main contributions of this article are as follows:
1.
By utilizing the expanded data from the MSAW dataset [32], we have generated building footprints for 202 PolSAR SLC images and formatted them as a semantic segmentation dataset. The created dataset can serve as a standard benchmark for related research and comparison, filling the gap in available research data. Based on this dataset, we analyze the scattering of buildings at sub-meter resolution and evaluate the impact of spatial averaging on the network.
2.
We propose ODRNet, which adopts a Dual-Resolution Branch (DRB) with BIF to effectively integrate the real and imaginary parts of the data with features of different scales. Additionally, the inclusion of an Auxiliary Head for Polarization Orientation Angle (POA) prediction provides explicit physical supervision, ensuring that the network learns meaningful polarimetric representations from the separate inputs while facilitating enhanced optimization and convergence.
3.
Due to the significant scale differences between buildings and the scale-dependent scattering distribution, we propose the Multi-scale Aggregation Pyramid Pooling Module (MAPPM). This module fuses and extracts multi-scale scattering information from the feature maps at different stages of the low-resolution branch. Additionally, to integrate information from the two branches and fully utilize the information of the complex-valued data, we design a Pixel-attention Fusion (PAF) module.
The remainder of this article is structured as follows. Section 2 focuses on the analysis of VHR PolSAR images and the process of feature selection. The proposed method is illustrated in Section 3. Section 4 describes the specific experiments and detailed analysis of the results. Section 5 provides an ablation and discussion. The conclusion of this article is drawn in Section 6.

2. Analysis of High-Resolution PolSAR Image

2.1. Representation of PolSAR Images

In traditional PolSAR processing, the coherency matrix $[T]$ is obtained as the product of the Pauli basis vector $\mathbf{k}_p$ with its own conjugate transpose. In practice, since targets in PolSAR images are coherent, a multi-look operation must be performed on $[T]$. The corresponding mathematical model can be written as
$$[T] = \left\langle \mathbf{k}_p \cdot \mathbf{k}_p^{H} \right\rangle,$$
$$\mathbf{k}_p = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH} + S_{VV} & S_{HH} - S_{VV} & 2 S_{HV} \end{bmatrix}^{T},$$
where $\langle \cdot \rangle$ represents multi-looking (spatial averaging), while the superscripts $T$ and $H$ are the vector transpose operator and the conjugate transposition, respectively.
Considering LOB effects, the POA $\theta$ between the resolution cell and the radar line of sight can be derived from $\partial T_{33}(\theta)/\partial \theta = 0$ [33]:
$$\frac{\partial T_{33}(\theta)}{\partial \theta} = 2\left(T_{22} - T_{33}\right)\sin 4\theta - 4\,\mathrm{Re}\left(T_{23}\right)\cos 4\theta = 0.$$
Thus, the polarimetric orientation angle $\theta$ can be written as
$$\theta = \frac{1}{4}\tan^{-1}\!\left(\frac{2\,\mathrm{Re}(T_{23})}{T_{22} - T_{33}}\right), \quad \left(-\frac{\pi}{4} < \theta < \frac{\pi}{4}\right).$$
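For clarity, the sketch below shows how this orientation angle can be computed with NumPy from coherency-matrix elements. The function name is our own, and the use of arctan2, which resolves the quadrant and keeps the result inside the stated interval, is an illustrative choice rather than part of the released code.

```python
import numpy as np

def polarization_orientation_angle(T22, T33, T23):
    # Illustrative sketch: POA estimate from coherency-matrix elements.
    # T22, T33 are real-valued arrays; T23 is complex-valued; shapes must match.
    # theta = (1/4) * arctan(2*Re(T23) / (T22 - T33)); arctan2 keeps theta
    # within (-pi/4, pi/4].
    return 0.25 * np.arctan2(2.0 * np.real(T23), T22 - T33)
```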

2.2. The Relationship Between Resolution and Scattering

Due to the change in the resolution cell, the scattering components are altered, and the two-dimensional spatial distribution of the scattering also changes. Consequently, the dominant scattering characteristics required for building extraction change. To explore the impact of resolution on the two-dimensional spatial distribution of building scattering, we adopt the method proposed in [34] to simulate PolSAR images at different resolutions. The original image is from the 0.25 m resolution airborne PolSAR image published in the SpaceNet expanded dataset [32]. We use the selected areas from the VHR PolSAR image shown in Figure 2 to display the relationship between resolution and scattering.
First, we present the PolSAR images of Patch A at different resolutions as shown in Figure 3. It can be observed that, as different downsample ratios are used to simulate buildings at different resolutions, the scattering details on the building facades gradually disappear, and changes in the scattering components occur simultaneously. At high resolutions, the two-dimensional distribution of building scattering is strongly correlated with the building’s structure. In contrast, at low resolutions, the details of the buildings are lost and degenerate into a relatively single dominant scattering.
In order to eliminate the influence caused by the rotation angle of the building and demonstrate that the scattering distribution brought about by the improvement in resolution is strongly correlated with the building structure and that buildings cannot be extracted based on a single dominant scattering characteristic, we adopt the building extraction method of Hu [7] based on the scattering model.
The researchers use pure volume scattering to describe the scattering from the tree canopy and add a cross-scattering component to describe oriented buildings. The method then introduces the polarimetric asymmetry (PA) proposed by Lee, together with the helix component, to form an urban revised rate r that redistributes power, yielding surface scattering P_s⁺, double-bounce scattering P_d⁺, volume scattering P_v, helix scattering P_c, and cross-scattering P_cro.
$$r = \left(1 - \frac{\lambda_1 - \lambda_2}{\mathrm{span} - 3\lambda_3}\right) \cdot \frac{P_{cro} + P_c}{\mathrm{mean}\left(P_{cro} + P_c\right) + P_{cro} + P_c},$$
$$P_d^{+} = P_d + \frac{P_d}{P_s + P_d} \cdot r \cdot P_v, \quad P_s^{+} = P_s + \frac{P_s}{P_s + P_d} \cdot r \cdot P_v, \quad P_v = (1 - r) \cdot P_v, \quad P_c = 2\left|\mathrm{Im}(T_{23})\right|, \quad P_{cro} = \frac{30\left(T_{33} - f_v/3 - f_c/2\right)}{15 + \cos 4\theta},$$
where $P_s = f_s(1 + |\beta|^2)$, $P_d = f_d(1 + |\alpha|^2)$, $P_v = f_v$, $P_c = f_c$, and $P_{cro} = f_{cro}$. Here $f_s$, $f_d$, $f_v$, $f_c$, and $f_{cro}$ are the parameters to be estimated, similar to the original Sato model [33].
We conduct a quantitative analysis of low-rise buildings without rotation angles and high-rise buildings with large rotation angles, respectively. We use the areas marked by the red strips in Figure 4 and Figure 5 to illustrate the changes in single-bounce, double-bounce, and volume scattering along with the spatial distribution at different resolutions.
Through observation, it can be seen that, in high-resolution images, the intense double-bounce scattering is strongly correlated with the edges of the buildings and the angles formed between the building facades and the ground. For example, in the low-rise buildings shown in Figure 4, the order is intense single-bounce scattering, weak scattering, and then intense double-bounce scattering. In the high-rise building shown in Figure 5, there is strong double-bounce scattering first. Then, due to the complexity of the building facades, relatively strong volume scattering is exhibited, and subsequently, the dihedral angles formed by the building facades and the ground show intense double-bounce scattering. As the resolution decreases, the distribution of the above-mentioned scattering patterns along with the building structure gradually disappears, and the dominant scattering becomes double-bounce scattering. In this case, the effectiveness of Hu’s method can be observed. However, in high-resolution scenarios, the scattering distribution no longer shows a single dominant double-bounce scattering. Therefore, we introduce deep learning methods to achieve data-driven feature representation learning, aiming to further refine the application from urban area extraction to building extraction.

2.3. The Feature Selection

Prior work [2,35] typically relies on hand-crafted features rooted in the polarimetric coherency matrix [ T ] . Although spatial averaging is used to stabilize polarimetric features while suppressing speckle, it inevitably blurs boundaries and fine details in VHR PolSAR images. Accordingly, we take spatial averaging as an evaluation criterion and leverage an orthogonal representation of the raw complex data to fully exploit the end-to-end capability of deep learning.
Based on previous studies [36,37], spatial averaging can modify the spatial and statistical characteristics of PolSAR images when different numbers of looks are employed, especially for complex-valued PolSAR datasets. In this article, for VHR PolSAR images, an analysis of polarimetric features and spatial correlation is conducted from the eigenvalues of $[T]$ and the Equivalent Number of Looks (ENL) [38] by applying a Boxcar filter with different window sizes as multi-looking. The eigenvalues and eigenvectors of the coherency matrix $[T]$ are calculated by
$$[T] = \sum_{i=1}^{3} \lambda_i \mathbf{u}_i \mathbf{u}_i^{H},$$
where $\lambda_i\ (i = 1, 2, 3)$ are the three eigenvalues of the coherency matrix $[T]$ and $\mathbf{u}_i\ (i = 1, 2, 3)$ are the corresponding unit orthogonal eigenvectors. The ENL can be expressed as
$$\mathrm{ENL} = \left(\frac{0.5227}{\beta}\right)^{2},$$
where $\beta = \frac{\sqrt{\mathrm{Var}(x)}}{\mathrm{E}(x)}$ is the ratio of the standard deviation of the data $x$ to its mean.
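As an illustration, the local ENL can be estimated over sliding windows as sketched below; SciPy's uniform_filter plays the role of the Boxcar filter, and the window size is a placeholder rather than the setting used in the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def enl_map(amplitude, window=7):
    # Illustrative sketch: local ENL estimate, ENL = (0.5227 / beta)^2 with
    # beta = std/mean computed inside a Boxcar (moving-average) window.
    # amplitude: real-valued 2-D float array (e.g. |S_HH|).
    mean = uniform_filter(amplitude, size=window)
    mean_sq = uniform_filter(amplitude ** 2, size=window)
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    beta = np.sqrt(var) / np.maximum(mean, 1e-12)
    return (0.5227 / np.maximum(beta, 1e-12)) ** 2
```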
From Figure 6b, it can be observed that the polarimetric features tend to stabilize as the number of looks increases. This process can obtain a stable dominant scattering while introducing bias [37]. As shown in Figure 6a,c, the changes in the amplitude of the PolSAR data and the assessment by the ENL demonstrate that multi-looking makes the image smoother, reduces the standard deviation and suppresses coherent speckle. However, the side effect of multi-looking is that it blurs the PolSAR images and removes intrinsic information, since scattering is highly correlated with the target structures in sub-meter-resolution PolSAR images; moreover, the ENL in VHR PolSAR images improves only slightly.
A comparison of polarimetric feature inputs in Table 8 reveals that complete spatial information and diverse polarimetric features achieve better metrics and results for the building footprint segmentation task. Automatically extracted features also have stronger representational and generalization capabilities than manually crafted features. Thus, we use the raw data $[S]$ as the input for the neural network in this article, which fully represents the scattering information of targets without losing spatial information.

3. Method

Informed by the analysis in Section 2, which highlights the unique challenges of VHR PolSAR data, this section details the architecture of our proposed ODRNet, illustrated in Figure 7. Our core design is to process SLC data directly, avoiding the information loss associated with traditional preprocessing steps.

3.1. The Overall Structure and the Format of Input Data

Different representations of SAR images are shown in Figure 8. Although the phase can provide more information in previous classification tasks [21,23], the dense segmentation task requires more geometric features and spatial information of the targets. The phase, which follows a uniform distribution and is sensitive to noise, shares a complex nonlinear relationship with amplitude. Conversely, decomposing the complex-valued data into its real and imaginary parts offers a more robust foundation. These components form an orthogonal basis that completely preserves the original signal information without loss, effectively avoiding the aforementioned issues of phase wrapping and nonlinearity. Furthermore, as shown in Figure 8, the real and imaginary components tend to follow a Gaussian distribution in most situations. This statistical property is highly advantageous for modern deep neural networks, as it aligns well with normalization techniques like Batch Normalization, promoting faster convergence and improved training stability. The Gaussian distribution can be expressed as
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^{2}}{2\sigma^{2}}},$$
where f ( x ) represents the frequency of the histogram, x represents the value and σ 2 is the variance. Meanwhile, in the field of communication, IQ signals have been widely used as network inputs for modulation recognition [39,40].
Informed by the statistical analysis of orthogonal components discussed in Section 3.1 and empirically supported by the ablation study in Table 4, the raw data can be written as
$$[S] = \begin{bmatrix} R_{HH} & R_{HV} \\ R_{VH} & R_{VV} \end{bmatrix} + j \begin{bmatrix} I_{HH} & I_{HV} \\ I_{VH} & I_{VV} \end{bmatrix} = [R] + j[I].$$
In the end, the input data is transformed into the form of (10); $[R], [I] \in \mathbb{R}^{H \times W \times 4}$ correspond to the two branches, where $H$ and $W$ are the sizes of the original PolSAR image. To effectively process the parallel real and imaginary parts, our ODRNet architecture is centered around a DRB. This architecture facilitates progressive interaction between its high-resolution and low-resolution pathways via BIF modules. The low-resolution branch, enhanced by a MAPPM, is responsible for capturing rich semantic and multi-scale contextual information, while the high-resolution branch preserves fine-grained spatial details. The representation produced by the encoder is then passed to a lightweight decoder to generate the segmentation result.
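As a minimal sketch of this input preparation (the function name and channel ordering are our own assumptions), a complex SLC patch can be split into the two real-valued tensors fed to the branches as follows; per the ablation in Section 4.3.1, the real part goes to the low-resolution branch and the imaginary part to the high-resolution branch.

```python
import numpy as np
import torch

def slc_to_orthogonal_inputs(slc):
    # Illustrative sketch of Eq. (10): split a 4-channel complex SLC patch
    # into the real tensor [R] and imaginary tensor [I].
    # slc: complex64 array of shape (4, H, W), e.g. ordered (HH, HV, VH, VV).
    real = torch.from_numpy(np.real(slc).astype(np.float32))  # low-resolution branch input
    imag = torch.from_numpy(np.imag(slc).astype(np.float32))  # high-resolution branch input
    return real, imag
```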

3.2. Dual-Resolution Branch

The DRB forms the core of our encoder, which comprises two parallel pathways that process the real and imaginary components, respectively, for balancing the trade-off between semantic and spatial details. In the following, we will detail how its two branches use similar backbones for extracting features, and BIF for interacting and progressively fusing features.

3.2.1. Backbones

The backbones are primarily composed of the blocks of ResNet [41]: Basic block and Bottleneck, which is used as an encoder for feature extraction. The backbones of different branches are similar, but there are differences in the parameters and scales of the feature map to ensure the feature extraction and interaction at the dual resolution.
The input order in this section is that $[R]$ corresponds to the low-resolution branch and $[I]$ corresponds to the other branch; the related ablation experiments are shown in Section 4.3. The Deep Stem module performs feature map reduction and preliminary feature extraction through consecutive convolution blocks. The obtained features $F_s^L, F_s^H \in \mathbb{R}^{H/4 \times W/4 \times C_1}$ can be formulated as follows, where $L$ and $H$ correspond to the low-resolution and high-resolution branches, respectively:
$$F^{n}(\cdot) = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}_{n \times n}(\cdot)\right)\right),$$
$$F_s^{H} = F_{H3}^{3}\!\left(F_{H2}^{3}\!\left(F_{H1}^{3}([I])\right)\right), \quad F_s^{L} = F_{L3}^{3}\!\left(F_{L2}^{3}\!\left(F_{L1}^{3}([R])\right)\right),$$
where $\mathrm{Conv}_{3 \times 3}(\cdot)$ represents the 3 × 3 convolution layer. After convolution, BN denotes the BatchNorm layer, which performs feature normalization to reduce network overfitting and gradient vanishing [42]. Moreover, to improve the nonlinear ability and achieve fast convergence, the rectified linear unit (ReLU) is used as the activation layer [43]. $F^{3}(\cdot)$ represents a complete convolution module with a kernel size of 3.
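A minimal PyTorch sketch of this Deep Stem is given below; the channel widths and stride placement are assumptions chosen only to reproduce the $H/4 \times W/4$ output size, not the authors' released configuration.

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    # One F^3(.) block: 3x3 convolution -> BatchNorm -> ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DeepStem(nn.Module):
    # Three stacked 3x3 conv blocks reducing a 4-channel input to H/4 x W/4.
    def __init__(self, in_ch=4, c1=64):  # c1 is a hypothetical channel width
        super().__init__()
        self.stem = nn.Sequential(
            conv_bn_relu(in_ch, c1 // 2, stride=2),
            conv_bn_relu(c1 // 2, c1 // 2, stride=1),
            conv_bn_relu(c1 // 2, c1, stride=2),
        )

    def forward(self, x):
        return self.stem(x)
```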
The detailed architecture of the backbones for extracting features is shown in Figure 7 and Table 1. Suppose that $F_k^{L} \in \mathbb{R}^{H/S_k^L \times W/S_k^L \times C_k^L}$ are the feature maps at different stages $k$ within the low-resolution branch, where $k = 1, 2, 3, 4$; $S_k^L = 2^{k+2}$ is the corresponding downsampling factor relative to the original input images; and $C_k^L = 2^{k} \times C_1$ is the channel dimension of $F_k^L$. In addition, $F_k^{H} \in \mathbb{R}^{H/S_k^H \times W/S_k^H \times C_k^H}$ keeps the size and channel dimension of the feature maps unchanged, with $S_k^H = 2^{3}$ and $C_k^H = 2 \times C_1$. Note that, for $F_k^H$ in Stage 5, there is a Bottleneck with expansion = 2, so $C_5^H = 4 \times C_1$.
On one hand, the low-resolution branch reduces the feature maps and increases the channel dimensions to aggregate global information and emphasize the extraction of multi-scale information. On the other hand, the high-resolution branch maintains larger feature maps during feature extraction to preserve the detailed features of building footprints and emphasize the extraction of spatial information.

3.2.2. Bilateral Information Fusion

The Bilateral Information Fusion (BIF) module is the key component enabling progressive interaction between the high-resolution and low-resolution branches. As illustrated in Figure 9, BIF facilitates a bidirectional information exchange. The “High-to-Low” path injects detailed spatial information into the low-resolution branch via downsampling convolutions. Conversely, the “Low-to-High” path integrates semantic context into the high-resolution branch using convolutions to compress channel dimensions before upsampling.
Assuming that the high-resolution input is $F_k^H$ and the low-resolution input is $F_k^L$, Figure 9 illustrates the implementation of BIF, which can be represented as follows:
$$F_k^{L\text{-}H} = \mathrm{ReLU}\!\left(F_{k-1}^{H} + \mathrm{Up}_{(H/S_k^H,\, W/S_k^H)}\!\left(W_L^{1}\!\left(F_{k-1}^{L}\right)\right)\right), \quad F_k^{H\text{-}L} = \mathrm{ReLU}\!\left(F_{k-1}^{L} + W_{n \times H}^{3}\!\left(F_{k-1}^{H}\right)\right),$$
where $W_{n \times H}^{3}$ represents $n$ 3 × 3 convolutions with stride 2 that downsample the high-resolution feature map to the same shape as the low-resolution one; if the feature maps of the two branches are at the same scale, the stride is 1. $\mathrm{Up}_{(H/S_k^H,\, W/S_k^H)}$ means upsampling the feature maps to $H/S_k^H \times W/S_k^H$, and $F_k^{L\text{-}H}$ denotes the Low-to-High feature at stage $k$. The BIF module is strategically inserted before the final ReLU activation of each stage’s residual block. This placement ensures consistency with the residual learning paradigm while establishing direct cross-branch communication. Through this mechanism, BIF not only facilitates the fusion of spatial and semantic information across the dual resolutions but also achieves an implicit mixing of the initial real and imaginary data streams at the feature level, significantly enhancing the network’s representational power.
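The following PyTorch sketch illustrates one BIF stage under the formulation above; channel widths, the number of stride-2 convolutions, and the module name are placeholders that depend on the stage being fused.

```python
import torch.nn as nn
import torch.nn.functional as F

class BIF(nn.Module):
    # Illustrative sketch of one Bilateral Information Fusion stage.
    # high_ch / low_ch and n_down (number of stride-2 convs) are assumptions.
    def __init__(self, high_ch, low_ch, n_down=1):
        super().__init__()
        # Low -> High: 1x1 channel compression, then upsampling.
        self.compress = nn.Sequential(nn.Conv2d(low_ch, high_ch, 1, bias=False),
                                      nn.BatchNorm2d(high_ch))
        # High -> Low: n stride-2 3x3 convolutions to match the low-resolution shape.
        downs, ch = [], high_ch
        for _ in range(n_down):
            downs += [nn.Conv2d(ch, low_ch, 3, stride=2, padding=1, bias=False),
                      nn.BatchNorm2d(low_ch)]
            ch = low_ch
        self.down = nn.Sequential(*downs)

    def forward(self, f_high, f_low):
        # Inject semantic context into the high-resolution branch.
        low_to_high = F.relu(f_high + F.interpolate(self.compress(f_low),
                                                    size=f_high.shape[2:],
                                                    mode='bilinear',
                                                    align_corners=False))
        # Inject spatial detail into the low-resolution branch.
        high_to_low = F.relu(f_low + self.down(f_high))
        return low_to_high, high_to_low
```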
Based on the above description of the DRB, the overall encoder works as follows: in the lower stages (Stage 1 and Stage 2), information interaction is achieved using feature maps of the same scale, which fully integrates the information from the real and imaginary parts; in the higher stages (Stage 3 and Stage 4), feature extraction is conducted with a focus on different resolutions. The features from various scales are fused, enabling the generation and extraction of more robust and representative features across the two branches.

3.3. Multi-Scale Aggregation Pyramid Pooling Module

Previous research [44] indicates that remote sensing images have buildings of different scales and the scattering distribution and dominant scattering of buildings vary at different resolutions, with an example shown in Figure 3, Figure 4 and Figure 5. To effectively capture these multi-scale features, the low-resolution branch is augmented with a Multi-scale Aggregation Pyramid Pooling Module (MAPPM), as illustrated in Figure 10.
The MAPPM first employs a modified ConvMixer operator to aggregate features from different stages of the low-resolution branch. Inspired by [45], this operator is adept at mixing spatial and channel information with high computational efficiency by using large-kernel depthwise separable convolutions and residual connection. The formulation is shown below:
$$D^{n}(\cdot) = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{DWConv}_{n \times n}(\cdot)\right)\right),$$
$$F_k^{\mathrm{Mixer}} = M^{7}\!\left(F_k^{L}\right) = F^{1}\!\left(F_k^{L} + D^{7}\!\left(F_k^{L}\right)\right),$$
where $F_k^{\mathrm{Mixer}}$ denotes the aggregated feature map from Stage $k$, and $\mathrm{DWConv}_{n \times n}$ differs from regular convolution in its group setting: each convolution kernel corresponds to one channel, so the channel information is subsequently re-weighted into a new feature map by the pointwise convolution $\mathrm{PWConv} = F^{1}$.
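A minimal PyTorch sketch of this mixer operator is shown below; the channel width is a placeholder, and the exact normalization/activation placement is an assumption.

```python
import torch.nn as nn

class MixerBlock(nn.Module):
    # Illustrative sketch of the modified ConvMixer operator M^7:
    # a 7x7 depthwise convolution with a residual connection, followed by a
    # pointwise (1x1) convolution that re-weights the channels.
    def __init__(self, ch, kernel=7):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(ch, ch, kernel, padding=kernel // 2, groups=ch, bias=False),  # depthwise
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )
        self.pw = nn.Sequential(
            nn.Conv2d(ch, ch, 1, bias=False),  # pointwise channel fusion
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.pw(x + self.dw(x))  # residual spatial mixing, then channel mixing
```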
Subsequently, the aggregated features $F_{\mathrm{con}}$ are fed into a pyramid fusion structure. This structure processes the features through multiple parallel branches, including global pooling and convolutions with varying receptive fields, to create a comprehensive multi-scale contextual representation $F_k^{M}$:
$$F_{\mathrm{con}} = F^{3}\!\left(\mathrm{Concat}\!\left(\mathrm{Up}_{(H/8,\,W/8)}\!\left(F_k^{\mathrm{Mixer}}\right)\right)\right), \quad k \in [1, 4],$$
$$F_k^{M} = \begin{cases} F_k^{3}\!\left(\mathrm{Up}\!\left(F^{1}\!\left(\mathrm{Global}(F_{\mathrm{con}})\right)\right) + F_{k+1}^{M}\right), & k = 1; \\ F_k^{3}\!\left(\mathrm{Up}\!\left(F^{n_k}(F_{\mathrm{con}})\right) + F_{k+1}^{M}\right), & k \in [2, 4]; \\ F_k^{1}(F_{\mathrm{con}}), & k = 5, \end{cases}$$
$$F_{\mathrm{PPM}} = F_1^{1}(F_{\mathrm{con}}) + F_2^{1}\!\left(\mathrm{Concat}\!\left(F_k^{M}\right)\right),$$
where Global represents the global pooling layer, and $k$ is the $k$-th branch from top to bottom. Considering computational efficiency, the MAPPM uses feature maps with a size of 1/8. However, for semantic segmentation, high-resolution feature maps are necessary to preserve spatial information and edge details. Following the same approach as DeepLabv3+, we upsample the output of the pyramid module and concatenate it with lower-level feature maps. Then, a 1 × 1 refinement convolution is applied to obtain the final output $F_D^{L}$:
$$F_D^{L} = F^{1}\!\left(\mathrm{Concat}\!\left(M^{7}\!\left(F_s^{L}\right),\ \mathrm{Up}_{(H/4,\,W/4)}\!\left(F_{\mathrm{PPM}}\right)\right)\right).$$

3.4. Pixel-Attention Fusion Module

Following the encoder, the Pixel-attention Fusion (PAF) module is employed to intelligently integrate the specialized features from the two branches. The high-resolution branch provides fine-grained spatial details, while the low-resolution branch offers rich semantic context. The PAF adaptively merges this complementary information at a pixel level.
The architecture is shown in Figure 11. Suppose that the outputs of the high- and low-resolution branches are $F_5^H$ and $F_D^L$, respectively. In order to maintain consistency with the scale of the low-resolution branch, the high-resolution branch is also concatenated with the low-level spatial information $F_s^H$ from the Stem stage. The output can be represented as
$$F_D^{H} = F^{1}\!\left(\mathrm{Concat}\!\left(M^{7}\!\left(F_s^{H}\right),\ \mathrm{Up}_{(H/4,\,W/4)}\!\left(F_5^{H}\right)\right)\right),$$
$$\delta = \mathrm{Sigmoid}\!\left(W^{1}\!\left(W_H^{1}\!\left(F_D^{H}\right) \cdot W_L^{1}\!\left(F_D^{L}\right)\right)\right),$$
$$F_P = F^{1}\!\left(\delta \cdot F_D^{H} + (1 - \delta) \cdot F_D^{L}\right),$$
where $F_P$ is the output of PAF. By calculating the weight of each pixel in the feature map, the fusion module achieves pixel-attention guidance, which is inspired by the attention mechanism.
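The PAF computation above can be sketched in PyTorch as follows; treating the product in the gate as element-wise and sharing one channel width across both inputs are our own simplifying assumptions.

```python
import torch
import torch.nn as nn

class PixelAttentionFusion(nn.Module):
    # Illustrative sketch: a per-pixel gate delta decides, at every location,
    # how much to trust the detail branch versus the semantic branch.
    def __init__(self, ch):  # ch is a placeholder channel width
        super().__init__()
        self.proj_h = nn.Conv2d(ch, ch, 1, bias=False)  # W_H^1
        self.proj_l = nn.Conv2d(ch, ch, 1, bias=False)  # W_L^1
        self.gate = nn.Conv2d(ch, 1, 1)                 # W^1 producing one weight per pixel
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 1, bias=False),
                                 nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

    def forward(self, f_high, f_low):
        delta = torch.sigmoid(self.gate(self.proj_h(f_high) * self.proj_l(f_low)))
        return self.out(delta * f_high + (1 - delta) * f_low)
```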

3.5. Loss Function

In this article, we add an additional supervision as shown in Figure 7. This auxiliary head is specifically designed to predict the POA. This mechanism serves a critical purpose: it supervises the fusion of the real and imaginary components at the same scale, ensuring that the network learns physically meaningful representations consistent with the target’s orientation:
$$F_{\mathrm{Aux}} = F^{3}\!\left(F_1^{L\text{-}H} + \mathrm{Up}\!\left(F_3^{L\text{-}H}\right)\right),$$
where $F_{\mathrm{Aux}}$ is the output of the Auxiliary Head, and $F_k^{L\text{-}H}$ denotes the Low-to-High feature of the $k$-th stage.
The final loss can be expressed as
$$\mathrm{Loss} = \alpha L_{\mathrm{aux}} + L_{\mathrm{seg}},$$
where L seg represents the standard Cross-Entropy (CE) loss for the main segmentation task. L aux denotes the Mean Squared Error (MSE) loss employed for the POA regression, which quantifies the divergence between the predicted angle and the ground truth:
$$L_{\mathrm{CE}}(y, p) = -\sum_{c=1}^{C} y_c \log\left(p_c\right), \qquad L_{\mathrm{MSE}}\!\left(y^{poa}, \hat{y}^{poa}\right) = \frac{1}{N}\sum_{j=1}^{N}\left(y_j^{poa} - \hat{y}_j^{poa}\right)^{2}.$$
Here, $C$ represents the total number of semantic classes, $p_c$ denotes the predicted probability for class $c$, and $y_c$ serves as the ground truth indicator. For the auxiliary $L_{\mathrm{MSE}}$, $N$ is the total number of pixels, and $y_j^{poa}$ and $\hat{y}_j^{poa}$ correspond to the ground truth and predicted POA values for pixel $j$, respectively. Finally, the weight assigned to the Auxiliary Head is $\alpha = 0.4$. In addition, the Auxiliary Head is discarded in the inference stage, so it enhances accuracy for this task without any additional cost at inference time.
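A minimal sketch of this combined objective is given below; the tensor shapes and the function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def odrnet_loss(seg_logits, seg_target, poa_pred, poa_target, alpha=0.4):
    # Illustrative sketch: cross-entropy for the main segmentation task plus an
    # MSE regression term for the auxiliary POA head, weighted by alpha.
    # seg_logits: (B, 2, H, W); seg_target: (B, H, W) long class indices;
    # poa_pred / poa_target: (B, 1, H, W) orientation-angle maps.
    loss_seg = F.cross_entropy(seg_logits, seg_target)
    loss_aux = F.mse_loss(poa_pred, poa_target)
    return loss_seg + alpha * loss_aux
```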

4. Experiments

In this section, we first introduce the experimental data and environment setup, including parameter settings and evaluation metrics. Then, in Section 4.2, we compare it with other segmentation methods. Finally, in Section 4.3, we design corresponding ablation experiments to validate the effectiveness of the proposed modules.

4.1. Experimental Setting

In order to evaluate the performance of the proposed method, we use strip images from the X-band airborne SAR provided by Capella Space, which include HH, HV, VH and VV polarization channels. The specific experimental data are described as follows:

4.1.1. Dataset Description

Since only the training set of the Multi-Sensor All-Weather Mapping (MSAW) dataset [32] has ground truth, inconsistencies arise in comparisons between different papers. Moreover, the absence of complex-valued data in MSAW has hindered further investigation in related work.
The situation has improved with the release of the SpaceNet expanded dataset, which includes 202 PolSAR SLC images (0.25 m × 0.25 m) with ground truth labels for training, validation and testing. These labels are provided in polygon format, and the dataset is split along a line defined in EPSG:32631. Optical images corresponding to the training and validation areas are also provided. The imagery covers Rotterdam, the Netherlands, and was collected on 15, 17 and 29 September 2019; the coverage area is shown in Figure 12. In order to make the generated dataset more consistent, suitable for further research, and usable as a benchmark for evaluating related methods, we have developed a workflow that ensures the determinism of dataset generation.
Our dataset generation workflow comprises three main stages. First, the raw SLC data is radiometrically calibrated using the provided metadata, resulting in four-channel SLC data. Second, the ground truth polygons are geospatially filtered to match the coverage of the SLC scenes and then rasterized into binary masks. Subsequently, the calibrated imagery and corresponding masks are tiled into non-overlapping 512 × 512 pixel patches. Finally, to mitigate class imbalance, patches containing only background pixels are discarded. This process yields a final dataset consisting of 32,716 patches for training and 22,298 patches for validation. To ensure reproducibility, the source code for this data preparation pipeline will be made publicly available on GitHub.
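As an illustration of the tiling step (not the released pipeline), a calibrated scene and its rasterized mask can be cut into non-overlapping 512 × 512 patches while discarding background-only patches as follows:

```python
import numpy as np

def tile_scene(slc, mask, tile=512):
    # Illustrative sketch: cut an SLC scene and its binary building mask into
    # non-overlapping patches, keeping only patches that contain buildings.
    # slc: (4, H, W) complex array; mask: (H, W) binary array.
    patches = []
    _, H, W = slc.shape
    for r in range(0, H - tile + 1, tile):
        for c in range(0, W - tile + 1, tile):
            m = mask[r:r + tile, c:c + tile]
            if m.any():  # discard background-only patches
                patches.append((slc[:, r:r + tile, c:c + tile], m))
    return patches
```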
In the experiment, we use the dataset called Expanded MSAW generated by a unified workflow to evaluate the performance of the methods. Since the test ground truth is not available for optical images, we only utilize training and validation datasets to ensure consistency in comparisons of multi-source fusion methods. The comparison of datasets before and after processing is shown in Table 2, and the statistical distribution of the dataset is shown in Figure 13.

4.1.2. Implementation Details

All the code and experiments in this article are based on the MMSegmentation code repository; the deep learning framework is PyTorch 1.13.1, and the workstation is equipped with NVIDIA 3090 GPUs. During training, stochastic gradient descent (SGD) [46] is used as the optimizer with a learning rate of 0.01 and weight decay of 0.0005. A polynomial ("poly") learning rate schedule is employed, in which the initial rate is multiplied by $\left(1 - \frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$ at each iteration, with a power of 0.9. The batch size is set to 12, and the total number of training iterations is 160K.
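For reference, the poly schedule described above corresponds to the following rule; the helper name is our own.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    # lr = base_lr * (1 - iter / max_iter) ** power
    return base_lr * (1.0 - iteration / max_iter) ** power

# Example: at iteration 80,000 of 160,000 with base_lr = 0.01,
# poly_lr(0.01, 80_000, 160_000) is roughly 0.0054.
```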

4.1.3. Evaluation Metrics

To assess the performance of different methods under comprehensive conditions, we employ four widely used evaluation metrics for building footprint extraction: IoU, F1, Precision and Recall [16]. Pixels correctly predicted as the building category are denoted as TP, background pixels falsely classified as buildings are denoted as FP, background correctly identified as background is denoted as TN, and building pixels falsely classified as background are denoted as FN. Therefore, the metrics are defined as follows:
$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}},$$
$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$
$$\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
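These metrics can be computed directly from binary masks as in the sketch below (the function name and the small epsilon guard against division by zero are our own):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    # pred and gt are boolean arrays of the same shape (True = building).
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, iou, f1
```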

4.2. Results and Comparison

To verify the superiority of our proposed method, comparison experiments with other segmentation methods are conducted on the Expanded MSAW dataset. The results are shown in Table 3. The comparison methods not only contain several computer vision segmentation methods (FCN, Unet, Ocrnet, DeepLabV3+, Segformer and LSKNet) but also include some multi-branch and PolSAR terrain segmentation methods (LAM-CV-BiSeNetV2, TS-SHES and L-CV-Deeplabv3+). In particular, considering the different input requirements of different networks, we use an eight-channel real-valued representation of the complex SLC for computer vision segmentation methods, which avoids the information loss introduced by amplitude-only inputs. TS-SHES uses both amplitude and phase as input, and LAM-CV-BiSeNetV2 and L-CV-Deeplabv3+ use $[T]$ as input.
From Table 3, it can be seen that the proposed ODRNet achieves the best overall performance on the Expanded MSAW dataset. Compared with computer vision segmentation methods using the same inputs, ODRNet shows a clear margin, with an IoU improvement of at least 4.55%. In addition, we report an input-representation ablation on SegFormer and LSKNet to quantify the effect of input completeness. Using complex-valued data instead of amplitude yields consistent but modest gains (SegFormer: +0.90% IoU; LSKNet: +1.11% IoU), indicating that merely concatenating more channels brings limited improvement. Its superiority is also pronounced against methods tailored for PolSAR data. For instance, it surpasses TS-SHES (which uses amplitude and phase) by 8.1% in IoU and outperforms LAM-CV-BiSeNetV2 (which uses the coherency matrix [T]) by 5.83% in IoU. These results strongly support our core hypothesis. By processing the orthogonal real and imaginary components of SLC, our method avoids the information loss inherent in conventional spatial averaging. Moreover, this orthogonal representation provides a more favorable input for deep networks, enabling a higher performance potential. Crucially, this superior accuracy is achieved with exceptional computational efficiency; at 66.036 GFLOPs, ODRNet is significantly more efficient than most competitors.
This quantitative superiority is substantiated by a qualitative comparison of the segmentation results across several challenging scenarios (depicted in Figure 14). This visual analysis highlights ODRNet’s practical advantages over competing methods.
First, Figure 14(a1) displays a building with irregularly shaped features, which has a relatively uniform angle map and a minor calculated orientation angle. Our method excels in preserving the integrity of irregular building shapes compared with the other methods.
Second, different parts of the buildings in Figure 14(a2,a3) show different scattering characteristics, with their Pauli decomposition illustrated in Figure 14(b2,b3). These buildings’ edges demonstrate typical double-bounce scattering, while their main structures exhibit either subdued scattering or pronounced volume scattering. For the segmentation results shown in Figure 15(b2,b3), ODRNet can accurately distinguish buildings with different scattering characteristics and achieve accurate building area extraction.
Third, the scenarios shown in Figure 14(b4,b5) exhibit the typical LOB issue, with orientation angles of about 45°. According to Figure 14(b4,b5), the calculated orientation angles are above 30° and +20°, respectively, and the calculated angle maps are more confused and inconsistent. The results shown in Figure 15 demonstrate that data-driven automatic feature extraction can solve the LOB problem more effectively. Meanwhile, ODRNet preserves building edges and correctly classifies targets with similar scattering characteristics, such as the cars in Figure 14(a4), which avoids commission errors and improves recall compared with other methods.
In summary, the visual results in Figure 15 clearly show ODRNet’s better performance in handling complex shapes, complex distribution of scattering and different building angles. This proves the benefits of our method, which learns directly from SLC by decomposing into real and imaginary components.

4.3. Ablation Experiments

In this section, we conduct a series of ablation experiments using the dataset mentioned in Section 4.1 to validate the effectiveness of the proposed method and each component: the results for different input formats are shown in Table 4; the results for each component are shown in Table 5; the comparison of MAPPM and PAF is demonstrated in Table 6 and Table 7, respectively.
The Baseline only has the backbones of the DRB in the Encoder, and uses sum operators to fuse the feature for obtaining results through the Decoder Head described in Section 3. The same training settings are used for all ablation experiments.

4.3.1. Analysis of the Input Format

To verify the selection of input data formats, we compared the performance of ODRNet using different input formats. The results shown in Table 4 clearly indicate that the performance of the ODRNet is significantly superior for the real and imaginary input formats compared to the amplitude and phase. Therefore, we can conclude that the fusion of real and imaginary features at the pixel level is more effective than using amplitude and phase in this dataset and task. Additionally, since the overall network architecture is asymmetric, the order of input data also has some influence on the final results. As indicated in Table 4, the best result is achieved when the low-resolution branch inputs real data and the high-resolution branch inputs imaginary data, which improves the IoU by 2.5%, F1 by 1.88%, Precision by 0.39% and Recall by 3.21%. Under the circular complex-Gaussian speckle assumption, the real and imaginary parts are statistically rotationally symmetric in the complex plane and therefore do not imply an intrinsic physical role separation. We attribute the observed order sensitivity to inductive bias and optimization dynamics introduced by the asymmetric dual-branch design, and we adopt this input ordering in all subsequent experiments.

4.3.2. Analysis of the Key Components

The addition and comparison of all components are carried out by adding them to the baseline model for comparison, thereby deriving corresponding conclusions.
To systematically validate our architectural design and the contribution of each component, we conducted a comprehensive ablation study, with the results summarized in Table 5. The study first establishes the fundamental importance of our dual-resolution design. A One-Branch model, which naively stacks the real and imaginary parts as an eight-channel input, achieves a modest IoU of 58.93%. This is considerably lower than our Baseline model (59.93% IoU), which employs a simple dual-branch structure without any advanced interaction. This initial comparison underscores that a parallel processing paradigm is necessary to effectively utilize the distinct information within the complex-valued data.
The BIF module enables progressive interaction between the branches. Without it, the branches operate in isolation, leading to suboptimal fusion: as the Grad-CAM visualization in Figure 16 shows, the high-resolution branch activates indiscriminately on all high-frequency details (including noise), while the low-resolution branch produces only coarse semantic blobs. Introducing BIF yields a significant 1.58% IoU improvement. As visualized by the feature maps in Figure 16, BIF facilitates a crucial exchange: the high-resolution branch receives semantic guidance to refine its focus on building-related edges, while the low-resolution branch incorporates precise spatial cues, effectively suppressing irrelevant activations.
Building upon this interactive framework, we integrated the remaining modules. The MAPPM addresses building scale variance, boosting the IoU by a further 0.69%. The PAF module provides a more robust, attention-guided final fusion, adding another 0.4% IoU. Finally, the Auxiliary Head eases the optimization of the deep network, contributing the final performance leap to our full ODRNet (64.3% IoU). This synergy is evident in the final encoder outputs shown in Figure 16e, where the high-resolution branch preserves sharp edges and the low-resolution branch excels at suppressing interference, leading to a final segmentation result that is both accurate and clean.

4.3.3. Analysis of the MAPPM

Table 5 demonstrates the improvement in segmentation performance achieved by MAPPM. Compared to conventional multi-scale modules, we made modifications to MAPPM by adding feature aggregation and information interaction between different scales. Table 6 clearly indicates that the MAPPM achieves improvements of 1.52% in IoU and 1.13% in F1, compared to ODRNet without MAPPM.
To illustrate the impact of MAPPM more intuitively, some visual segmentation results are generated and presented in Figure 17 using Grad-CAM [50], which utilizes the gradients flowing into the final convolution layer of a network to highlight the parts of the input that are crucial for the inference result. In building footprint extraction, simply merging high- and low-resolution features, as shown in the red box of Figure 17b, leads to several issues: low activation scores for building footprints, missed small buildings, and the presence of erroneous targets. This network fails to adequately capture in-depth features across multiple scales, resulting in insufficient focus on buildings. By introducing the MAPPM in the low-resolution branch, as illustrated in Figure 17c, multi-scale buildings achieve higher activation scores, and disturbing targets are effectively suppressed. The MAPPM enhances the capture of multi-scale building features in PolSAR images by aggregating features across scales and facilitating interactions between different scales.

4.3.4. Analysis of the PAF

To validate the design of our PAF module, we compared its performance against two common baseline strategies: channel-wise concatenation (Concat) and element-wise summation (Sum). As shown in Table 7, while both baseline methods perform reasonably well, PAF consistently outperforms them across all metrics. Specifically, it achieves an IoU improvement of 0.49% over Concat and 0.46% over Sum, with corresponding F1-score gains. This suggests that naive fusion mechanisms are insufficient for optimally integrating the complementary features from the dual branches. In contrast, PAF’s ability to dynamically arbitrate the contributions of spatial-rich and semantic-rich features at a pixel level is crucial for resolving conflicts and producing a more refined segmentation output.

5. Discussion

5.1. Influence of the Auxiliary Head

As illustrated in Figure 18, the segmentation performance exhibits an increase-then-decrease trend, peaking at α = 0.4 with an IoU of 64.3%. This initial improvement validates that the auxiliary POA task provides essential physical supervision, guiding the encoder to capture rotation-invariant and phase-dependent features. Optimizing the POA regression loss imposes an implicit yet rigorous physical constraint, compelling the network to exploit inter-channel phase-dependent correlations rather than relying solely on the $\mathrm{Re}([S])$ and $\mathrm{Im}([S])$ components. However, performance degrades noticeably when α exceeds 0.6. This indicates that an excessive auxiliary weight causes the regression loss to dominate the gradient optimization, distracting the network from the primary segmentation objective. This leads to a “negative transfer” effect, where the model over-prioritizes angle prediction at the expense of pixel-wise classification. Consequently, α = 0.4 is empirically selected as the optimal equilibrium between physical constraints and semantic discrimination.

5.2. Influence of the Polarimetric Feature

In this section, we explore the impact of $[S]$ and varying numbers of looks on polarimetric features. The analysis, as summarized in Table 8, reveals a declining trend in IoU and F1 as the number of looks increases for $[T]$. Notably, $[S]$ demonstrates superior performance, obtaining improvements of at least 16.11% in IoU and 13.23% in F1. We further evaluate non-coherent polarimetric decomposition features, including Cloude–Pottier decomposition and the physical model-based decomposition of Hu et al. [7]. Although these features offer physical interpretability, their segmentation performance remains markedly lower than when using the raw scattering measurements: Hu’s decomposition achieves an IoU of 47.79% and an F1 of 64.67%, and Cloude–Pottier features achieve an IoU of 48.95% and an F1 of 65.73%.
In contrast, directly learning from the raw complex scattering representation [ S ] yields the best performance, reaching 64.30% IoU and 78.27% F1, at least +16.11% IoU and +13.23% F1 compared with the [ T ] as input. This suggests that, in VHR PolSAR imagery, building scattering is tightly coupled with geometric structures (edges, corners and roof/wall layouts), and aggressive spatial averaging can alter local scattering mixtures and remove high-frequency spatial cues. Therefore, preserving spatial integrity and learning features end-to-end from raw complex measurements is advantageous for footprint segmentation.

5.3. Large-Scale VHR PolSAR Image Validation

Finally, we use a full-polarization SLC image with a size of 2341 × 14,885 pixels from Rotterdam, the Netherlands, to validate the effectiveness of ODRNet. A visualization of the results, together with the corresponding optical images, is shown in Figure 19. The SLC contains a variety of scene categories; from left to right are large buildings, building groups of different scales, small building groups, and regular community buildings. Consistent with the previous analysis of different scenarios, our method also achieves good segmentation results for large buildings, multi-scale buildings and community buildings in the SLC. At the same time, although small buildings occupy only a small number of pixels, the evaluation metrics reach 73.61% IoU and 84.8% F1. This demonstrates the great potential of ODRNet when applied to large-scale SLC data.

6. Conclusions

In this article, we first conducted a systematic analysis of VHR PolSAR data, demonstrating that, at sub-meter resolutions, the scattering characteristics of buildings become exceedingly complex, and that conventional spatial averaging critically degrades essential geometric information. Based on these insights, we proposed ODRNet, a novel dual-resolution network designed to process raw SLC data directly. By decomposing the data into orthogonal real and imaginary components, our approach avoids the complexities of complex-valued operators and effectively leverages standard real-valued CNNs. Through a Dual-Resolution Branch (DRB) with Bilateral Information Fusion (BIF), the network progressively integrates rich semantic context with fine-grained spatial details, while the integration of POA supervision further ensures that the split components retain their physical coupling during feature learning.
Comprehensive experiments on our curated benchmark, derived from the Expanded MSAW dataset, validate the superiority of this approach. ODRNet significantly outperforms the state-of-the-art methods, achieving improvements of at least 4.16% in IoU and 3.33% in F1-score, while maintaining high computational efficiency (66.036 GFLOPs). Therefore, this work not only presents a highly accurate and efficient model for all-weather building mapping but also establishes a strong benchmark, underscored by a principled analysis of the underlying data characteristics.

Author Contributions

Conceptualization, S.N.; Methodology, S.N.; Validation, S.N.; Formal analysis, S.N. and X.L.; Resources, Z.C.; Data curation, M.Z.; Writing—original draft, S.N.; Writing—review & editing, X.L.; Supervision, F.Z. and Z.C.; Project administration, F.Z. and M.Z.; Funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Civil Aerospace Pre-Research Project under Grant D010206.

Data Availability Statement

Publicly available datasets were analyzed in this study. The raw Single Look Complex (SLC) data are available from the SpaceNet 6 challenge hosted on AWS (aws s3 ls s3://spacenet-dataset/AOIs/AOI_11_Rotterdam/). The code used for dataset generation, the processing scripts, and the model architecture presented in this study are publicly available at https://github.com/Ni-Songhao/PFFN, accessed on 13 January 2026.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Belgiu, M.; Drǎguţ, L. Comparing Supervised and Unsupervised Multiresolution Segmentation Approaches for Extracting Buildings from Very High Resolution Imagery. ISPRS J. Photogramm. Remote Sens. 2014, 96, 67–75. [Google Scholar] [CrossRef] [PubMed]
  2. Adriano, B.; Yokoya, N.; Xia, J.; Miura, H.; Liu, W.; Matsuoka, M.; Koshimura, S. Learning from Multimodal and Multitemporal Earth Observation Data for Building Damage Mapping. ISPRS J. Photogramm. Remote Sens. 2021, 175, 132–143. [Google Scholar] [CrossRef]
  3. Hu, Y.; Fan, J.; Wang, J. Classification of PolSAR Images Based on Adaptive Nonlocal Stacked Sparse Autoencoder. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1050–1054. [Google Scholar] [CrossRef]
  4. Deng, L.; Wang, C. Improved Building Extraction with Integrated Decomposition of Time-Frequency and Entropy-Alpha Using Polarimetric SAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4058–4068. [Google Scholar] [CrossRef]
  5. Quan, S.; Xiong, B.; Xiang, D.; Zhao, L.; Zhang, S.; Kuang, G. Eigenvalue-Based Urban Area Extraction Using Polarimetric SAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 458–471. [Google Scholar] [CrossRef]
  6. Wang, Y.; Yu, W.; Wang, R.; Wang, L.; Ge, D.; Liu, X.; Wang, C.; Liu, B. An Improved Urban Area Extraction Method for PolSAR Data Using Eigenvalues and Optimal Roll-Invariant Features. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 6455–6467. [Google Scholar] [CrossRef]
  7. Hu, C.; Wang, Y.; Sun, X.; Quan, S.; Xiang, D. Model-Based Polarimetric Target Decomposition with Power Redistribution for Urban Areas. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8795–8808. [Google Scholar] [CrossRef]
  8. Wang, Y.; Yu, W.; Hou, W. Five-Component Decomposition Methods of Polarimetric SAR and Polarimetric SAR Interferometry Using Coupling Scattering Mechanisms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6662–6676. [Google Scholar] [CrossRef]
  9. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  11. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  13. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar] [CrossRef]
  14. Wu, G.; Shao, X.; Guo, Z.; Chen, Q.; Yuan, W.; Shi, X.; Xu, Y.; Shibasaki, R. Automatic Building Segmentation of Aerial Imagery Using Multi-Constraint Fully Convolutional Networks. Remote Sens. 2018, 10, 407. [Google Scholar] [CrossRef]
  15. Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A Fully Convolutional Neural Network for Automatic Building Extraction from High-Resolution Remote Sensing Images. Remote Sens. 2020, 12, 1050. [Google Scholar] [CrossRef]
  16. Wang, L.; Fang, S.; Meng, X.; Li, R. Building Extraction with Vision Transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11. [Google Scholar] [CrossRef]
  17. Wu, F.; Wang, C.; Zhang, H.; Li, J.; Li, L.; Chen, W.; Zhang, B. Built-up Area Mapping in China from GF-3 SAR Imagery Based on the Framework of Deep Learning. Remote Sens. Environ. 2021, 262, 112515. [Google Scholar] [CrossRef]
  18. Xia, J.; Yokoya, N.; Adriano, B.; Zhang, L.; Li, G.; Wang, Z. A Benchmark High-Resolution GaoFen-3 SAR Dataset for Building Semantic Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5950–5963. [Google Scholar] [CrossRef]
  19. Zhang, B.; Wu, Q.; Wu, F.; Huang, J.; Wang, C. A Lightweight Pyramid Transformer for High-Resolution SAR Image-Based Building Classification in Port Regions. Remote Sens. 2024, 16, 3218. [Google Scholar] [CrossRef]
  20. Geng, J.; Zhang, Y.; Jiang, W. Polarimetric SAR Image Classification Based on Hierarchical Scattering-Spatial Interaction Transformer. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
  21. Zhang, Z.; Wang, H.; Xu, F.; Jin, Y.Q. Complex-Valued Convolutional Neural Network and Its Application in Polarimetric SAR Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7177–7188. [Google Scholar] [CrossRef]
  22. Kuang, Z.; Liu, K.; Bi, H.; Li, F. PolSAR Image Classification With Complex-Valued Diffusion Model as Representation Learners. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 12184–12201. [Google Scholar] [CrossRef]
  23. Yu, L.; Zeng, Z.; Liu, A.; Xie, X.; Wang, H.; Xu, F.; Hong, W. A Lightweight Complex-Valued DeepLabv3+ for Semantic Segmentation of PolSAR Image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 930–943. [Google Scholar] [CrossRef]
  24. Xu, R.; Zhang, S.; Dong, C.; Mei, S.; Zhang, J.; Zhao, Q. Lightweight Attention Refined and Complex-Valued BiSeNetV2 for Semantic Segmentation of Polarimetric SAR Image. Remote Sens. 2025, 17, 3527. [Google Scholar] [CrossRef]
  25. Wu, J.H.; Zhang, S.Q.; Jiang, Y.; Zhou, Z.H. Complex-valued neurons can learn more but slower than real-valued neurons via gradient descent. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. NIPS ’23. [Google Scholar]
  26. Trabelsi, C.; Bilaniuk, O.; Zhang, Y.; Serdyuk, D.; Subramanian, S.; Santos, J.F.; Mehri, S.; Rostamzadeh, N.; Bengio, Y.; Pal, C.J. Deep Complex Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  27. Almansoori, M.K.; Telek, M. Performance evaluation of Complex-Valued Neural Networks on real and complex-valued classification and reconstruction tasks. Mach. Learn. Appl. 2025, 22, 100742. [Google Scholar] [CrossRef]
  28. Zeng, X.; Wang, Z.; Feng, K.; Gao, X.; Sun, X. TS-SHES: Terrain Segmentation in Complex-Valued PolSAR Images via Scattering Harmonization and Explicit Supervision. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–20. [Google Scholar] [CrossRef]
  29. Ai, J.; Mao, Y.; Luo, Q.; Jia, L.; Xing, M. SAR Target Classification Using the Multikernel-Size Feature Fusion-Based Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  30. Ai, J.; Tian, R.; Luo, Q.; Jin, J.; Tang, B. Multi-Scale Rotation-Invariant Haar-Like Feature Integrated CNN-Based Ship Detection Algorithm of Multiple-Target Environment in SAR Imagery. IEEE Trans. Geosci. Remote Sens. 2019, 57, 10070–10087. [Google Scholar] [CrossRef]
  31. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  32. Shermeyer, J.; Hogan, D.; Brown, J.; Van Etten, A.; Weir, N.; Pacifici, F.; Hansch, R.; Bastidas, A.; Soenen, S.; Bacastow, T.; et al. SpaceNet 6: Multi-Sensor All Weather Mapping Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 196–197. [Google Scholar]
  33. Sato, A.; Yamaguchi, Y.; Singh, G.; Park, S.-E. Four-Component Scattering Power Decomposition with Extended Volume Scattering Model. IEEE Geosci. Remote Sens. Lett. 2012, 9, 166–170. [Google Scholar] [CrossRef]
  34. Zhao, F.; Mallorqui, J.J.; Lopez-Sanchez, J.M. Impact of SAR Image Resolution on Polarimetric Persistent Scatterer Interferometry with Amplitude Dispersion Optimization. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–10. [Google Scholar] [CrossRef]
  35. Kang, J.; Wang, Z.; Zhu, R.; Xia, J.; Sun, X.; Fernandez-Beltran, R.; Plaza, A. DisOptNet: Distilling Semantic Knowledge from Optical Images for Weather-Independent Building Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  36. Kim, H.; Song, J.; Natsuaki, R.; Hirose, A. Dependence of Polarimetric Characteristics on SAR Resolutions: Experimental Analysis. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan, 28 July–2 August 2019; p. 3289. [Google Scholar] [CrossRef]
  37. Lee, J.S.; Ainsworth, T.L.; Kelly, J.P.; Lopez-Martinez, C. Evaluation and Bias Removal of Multilook Effect on Entropy/Alpha/Anisotropy in Polarimetric SAR Decomposition. IEEE Trans. Geosci. Remote Sens. 2008, 46, 3039–3052. [Google Scholar] [CrossRef]
  38. Lee, J.; Hoppel, K. Noise Modeling and Estimation of Remotely-Sensed Images. In Proceedings of the 12th Canadian Symposium on Remote Sensing Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, BC, Canada, 10–14 July 1989; Volume 2, pp. 1005–1008. [Google Scholar] [CrossRef]
  39. Xu, J.; Luo, C.; Parr, G.; Luo, Y. A Spatiotemporal Multi-Channel Learning Framework for Automatic Modulation Recognition. IEEE Wirel. Commun. Lett. 2020, 9, 1629–1632. [Google Scholar] [CrossRef]
  40. Shao, M.; Li, D.; Hong, S.; Qi, J.; Sun, H. IQFormer: A Novel Transformer-Based Model with Multi-Modality Fusion for Automatic Modulation Recognition. IEEE Trans. Cogn. Commun. Netw. 2025, 11, 1623–1634. [Google Scholar] [CrossRef]
  41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  42. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
  43. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  44. Sun, G.; Huang, H.; Zhang, A.; Li, F.; Zhao, H.; Fu, H. Fusion of Multiscale Convolutional Neural Networks for Building Extraction in Very High-Resolution Images. Remote Sens. 2019, 11, 227. [Google Scholar] [CrossRef]
  45. Trockman, A.; Kolter, J.Z. Patches Are All You Need? arXiv 2022, arXiv:2201.09792. [Google Scholar]
  46. Ruder, S. An Overview of Gradient Descent Optimization Algorithms. arXiv 2017, arXiv:1609.04747. [Google Scholar] [CrossRef]
  47. Yuan, Y.; Chen, X.; Wang, J. Object-Contextual Representations for Semantic Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 173–190. [Google Scholar]
  48. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the Neural Information Processing Systems (NeurIPS), Virtually, 6–14 December 2021. [Google Scholar]
  49. Li, Y.; Li, X.; Dai, Y.; Hou, Q.; Liu, L.; Liu, Y.; Cheng, M.M.; Yang, J. LSKNet: A Foundation Lightweight Backbone for Remote Sensing. Int. J. Comput. Vis. 2024, 133, 1410–1431. [Google Scholar] [CrossRef]
  50. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision ICCV, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar] [CrossRef]
Figure 1. An illustration of how the resolution impacts the scattering distribution and application. Pd (double-bounce scattering), Ps (surface scattering), Pv (volume scattering), Pc (helix scattering), and Pro (cross-scattering) are calculated using Hu's method [7].
Figure 2. Color-coded Pauli image (X-band, 0.25 m resolution) of the training set from the expanded SpaceNet dataset [32]. Patches A, B, and C are used to simulate the effects of different resolutions. Red (double-bounce), Green (volume scattering), Blue (surface scattering).
Figure 3. Simulating the feature changes of buildings as resolution decreases. (a) Original image with 0.25 m resolution. (b) 4× downsample. (c) 8× downsample. (d) 20× downsample. Red (double-bounce), Green (volume scattering), Blue (surface scattering).
Figure 4. (a) Enlarged image of Patch B. (b) Optical image from the Maxar Worldview-2 satellite. (c–f) The scattering power in images with different resolutions.
Figure 5. (a) Enlarged image of Patch C. (b) Optical image from the Maxar Worldview-2 satellite. (c–f) The scattering power in images with different resolutions.
Figure 6. The relationship between spatial averaging and the eigenvalues, computed from pixels within building footprints using the Ground Truth as a mask. (a) Effect of the number of multi-looks on ENL. (b) Effect of the number of multi-looks on the eigenvalues of [T]. (c) Histograms of different polarizations for various multi-look numbers.
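To make the quantities in Figure 6 concrete, the sketch below estimates the equivalent number of looks (ENL) of a homogeneous intensity region with the usual moment-based estimator and computes the eigenvalues of a block-averaged coherency matrix [T]; the estimator and the averaging scheme are standard choices assumed here for illustration, not necessarily the exact procedure used to produce the figure.

```python
import numpy as np

def enl(intensity):
    """Moment-based ENL estimate over a homogeneous region: mean(I)^2 / var(I)."""
    i = np.asarray(intensity, dtype=np.float64).ravel()
    return i.mean() ** 2 / i.var()

def multilook_eigenvalues(T, looks):
    """Average a per-pixel coherency matrix field T of shape (H, W, 3, 3)
    over non-overlapping looks x looks blocks, then return the eigenvalues
    of each averaged matrix in descending order."""
    h, w = T.shape[0] // looks, T.shape[1] // looks
    T_ml = (T[:h * looks, :w * looks]
            .reshape(h, looks, w, looks, 3, 3)
            .mean(axis=(1, 3)))
    lam = np.linalg.eigvalsh(T_ml)      # ascending, real for Hermitian matrices
    return lam[..., ::-1]               # lambda_1 >= lambda_2 >= lambda_3
```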
Figure 7. Framework of the proposed ODRNet. The Auxiliary Head and the Segmentation Head share the Decoder Head architecture, differing only in the number of input channels. PAF is the Pixel-attention Fusion module, and MAPPM is the Multi-scale Aggregation Pyramid Pooling Module.
Figure 8. Different representations of SAR images and their histograms. (a) Amplitude. (b) Phase. (c) Real part. (d) Imaginary part.
Figure 9. Architecture of the proposed BIF module between branches.
Figure 10. Architecture of the proposed MAPPM.
Figure 11. Architecture of the proposed PAF module.
Figure 12. The study area in Rotterdam, located at (51.88°N, 4.39°E) in the EPSG:4326 coordinate system. Red indicates the building footprints of the training set, and blue indicates those of the validation set.
Figure 13. Distribution of size and height of the labeled buildings. (a) Histogram of building footprint size. (b) Histogram of building footprint height.
Figure 14. Selected typical patches. (a1–a5) Optical images. (b1–b5) Color-coded Pauli images. (c1–c5) Ground Truth. (d1–d5) Orientation angle maps.
Figure 15. Building footprint extraction results obtained by different methods. White, red, and blue denote true positives, false positives, and false negatives, respectively. (a1–a5) FCN. (b1–b5) Unet. (c1–c5) LSKNet (Re([S]); Im([S])). (d1–d5) LAM-CV-BiSeNetV2. (e1–e5) TS-SHES. (f1–f5) L-CV-Deeplabv3+. (g1–g5) Ours.
Figure 16. An example of feature maps from ODRNet. (a) Pauli image (Top) and Ground Truth (Bottom). (b) The Stage 2 feature maps of the low- and high-resolution branches, respectively. (c) The feature maps after BIF. (d) The output feature maps after Stage 4. (e) Decoder logits (Top) and the final segmentation map (Bottom).
Figure 17. Visual analysis of the features extracted from various models using Grad-CAM. (a) Corresponding optical images. (b) Model without MAPPM. (c) Model with MAPPM.
Figure 18. Segmentation performance (IoU) with different settings of α.
Figure 19. The segmentation results of a complex-valued PolSAR image. (a) Optical image from the Maxar Worldview-2 satellite. (b) Color-coded Pauli image from SpaceNet expanded dataset. (c) Ground Truth. (d) Segmentation result.
Table 1. Detailed structure of the backbones (without BIF).

| Stage | Low-Resolution | High-Resolution |
| --- | --- | --- |
| Stage 1 | Basic block × 2; Basic block, stride = 2; Basic block | Basic block × 2; Basic block, stride = 2; Basic block |
| Stage 2 | Basic block, stride = 2; Basic block | Basic block × 2 |
| Stage 3 | Basic block, stride = 2; Basic block | Basic block × 2 |
| Stage 4 | Bottleneck, stride = 2 | Bottleneck, stride = 1 |
Table 2. Comparison of datasets.

| Dataset | MSAW | Expanded MSAW |
| --- | --- | --- |
| Sensors | Capella Space's sensor | Capella Space's sensor |
| Data format | Intensity | Single Look Complex |
| Band | X | X |
| Polarization | Full | Full |
| Resolution | 0.5 m × 0.5 m | 0.25 m × 0.25 m |
| Size | 900 × 900 pixels | 512 × 512 pixels |
| Training set | 3401 images | 32,716 images |
| Validation set | — | 22,298 images |
Table 3. Comparison of various segmentation methods on our dataset (bold indicates the optimal value).

| Method | Input | IoU (%) | Precision (%) | Recall (%) | F1 (%) | Params (M) | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FCN [11] | Re([S]); Im([S]) | 59.65 | 76.15 | 73.36 | 74.73 | 47.1 | 198.05 |
| Unet [12] | Re([S]); Im([S]) | 58.55 | 71.86 | 75.98 | 73.86 | 29.0 | 203.22 |
| Ocrnet [47] | Re([S]); Im([S]) | 59.75 | 80.23 | 70.06 | 74.80 | 70.5 | 167.83 |
| Deeplabv3+ [31] | Re([S]); Im([S]) | 59.72 | 79.14 | 70.88 | 74.78 | 42.0 | 176.81 |
| Segformer [48] | Amplitude | 56.55 | 73.65 | 70.52 | 72.05 | 64.1 | 95.7 |
| Segformer [48] | Re([S]); Im([S]) | 57.45 | 74.36 | 71.62 | 72.98 | 64.1 | 95.8 |
| LSKNet [49] | Amplitude | 60.14 | 76.21 | 73.72 | 74.94 | 26.1 | 101.3 |
| LSKNet [49] | Re([S]); Im([S]) | 61.25 | 77.21 | 74.73 | 75.97 | 26.1 | 101.4 |
| LAM-CV-BiSeNetV2 [24] | [T] | 58.47 | 77.10 | 70.69 | 73.76 | 61.7 | 403 |
| TS-SHES [28] | Amplitude; Phase | 56.2 | 69.19 | 76.17 | 71.96 | 86.9 | 191.39 |
| L-CV-Deeplabv3+ [23] | [T] | 44.82 | 61.43 | 62.38 | 61.93 | 8.1 | 112.93 |
| Ours | Re([S]); Im([S]) | 64.3 | 79.88 | 76.73 | 78.27 | 76.57 | 66.036 |
Table 4. Influence of the input format in the proposed method.

| Input Data | IoU (%) | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| Low: Real; High: Imaginary | 64.3 | 79.88 | 76.73 | 78.27 |
| Low: Imaginary; High: Real | 63.95 | 79.49 | 76.59 | 78.01 |
| Low: Amplitude; High: Phase | 62.53 | 78.4 | 75.54 | 76.94 |
| Low: Phase; High: Amplitude | 61.8 | 79.49 | 73.52 | 76.39 |
Table 5. Influence of each component in the proposed method.

| Model | BIF | MAPPM | PAF | Auxiliary | IoU (%) | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | | | | | 59.93 | 74.92 | 74.97 | 74.94 |
| | | | | | 61.51 | 77.1 | 75.26 | 76.17 |
| | | | | | 62.2 | 79.56 | 74.03 | 76.7 |
| | | | | | 62.6 | 78.61 | 75.46 | 77.0 |
| | | | | | 63.38 | 78.73 | 76.73 | 78.27 |
| One-Branch | | | | | 58.93 | 76.65 | 71.83 | 74.16 |
| ODRNet | ✓ | ✓ | ✓ | ✓ | 64.3 | 79.88 | 76.73 | 78.27 |
Table 6. Influence of the multi-scale module.

| MAPPM | IoU (%) | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| w/o | 62.78 | 78.9 | 75.45 | 77.14 |
| w/ | 64.3 | 79.88 | 76.73 | 78.27 |
Table 7. Influence of different fusion methods.

| Method | IoU (%) | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| Concatenate | 63.81 | 79.46 | 76.41 | 77.9 |
| Sum | 63.84 | 79.87 | 76.09 | 77.93 |
| PAF | 64.3 | 79.88 | 76.73 | 78.27 |
Table 8. Influence of the polarimetric feature in the proposed method.

| Data | Multi-Looking | IoU (%) | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| [T] | 3 × 3 | 48.19 | 61.96 | 68.45 | 65.04 |
| [T] | 5 × 5 | 46.4 | 64.5 | 62.32 | 63.39 |
| [T] | 7 × 7 | 43.12 | 77.78 | 49.18 | 60.26 |
| [T] | 9 × 9 | 42.12 | 72.98 | 49.9 | 59.27 |
| H/A/α | 5 × 5 | 48.95 | 63.85 | 67.72 | 65.73 |
| Hu's decomposition [7] | 5 × 5 | 47.79 | 62.50 | 67.00 | 64.67 |
| [S] | 1 × 1 | 64.3 | 79.88 | 76.73 | 78.27 |
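For context on the multi-looked [T] inputs in Table 8, a coherency matrix is conventionally built from the scattering matrix [S] via the Pauli scattering vector and then spatially averaged; the sketch below shows this standard construction under the reciprocity assumption (S_HV = S_VH), with the look window as a free parameter. It is a generic reference implementation, not necessarily the exact preprocessing used for the table.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def coherency_matrix(s_hh, s_hv, s_vv, looks=5):
    """Multi-looked 3x3 coherency matrix [T] from single-look [S].

    Pauli vector: k = (1/sqrt(2)) * [S_HH + S_VV, S_HH - S_VV, 2*S_HV]^T
    [T] = <k k^H>, averaged over a looks x looks boxcar window.
    Returns a complex array of shape (H, W, 3, 3).
    """
    k = np.stack([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], axis=-1) / np.sqrt(2.0)
    outer = k[..., :, None] * np.conj(k[..., None, :])      # (H, W, 3, 3)
    # uniform_filter expects real input, so filter real and imaginary parts separately.
    smooth = lambda x: uniform_filter(x, size=(looks, looks, 1, 1))
    return smooth(outer.real) + 1j * smooth(outer.imag)
```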
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
