Article

A Novel Three-Dimensional Imaging Method for Space Targets Utilizing Optical-ISAR Joint Observation

Graduate School, Space Engineering University, Beijing 101416, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(23), 3881; https://doi.org/10.3390/rs17233881
Submission received: 8 November 2025 / Revised: 24 November 2025 / Accepted: 27 November 2025 / Published: 29 November 2025
(This article belongs to the Section Remote Sensing Image Processing)

Highlights

What are the main findings?
  • Designs two dedicated semantic segmentation networks to automatically extract target regions from optical and ISAR images. The gradient information of observed images is effectively utilized to enhance the accuracy of target region extraction.
  • Proposes an improved 3D reconstruction method that iteratively combines an octree-based space carving technique with a projection-optimization-based region offset correction technique to achieve high-quality 3D reconstruction of targets.
What is the implication of the main finding?
  • Reduces manual intervention in the 3D reconstruction of non-cooperative targets and improves the automation level of target region extraction.
  • Significantly improves the efficiency of 3D reconstruction. The correction of target region offset ensures high-quality 3D imaging and enhances the capability of 3D reconstruction in complex scenarios.

Abstract

Three-dimensional (3D) reconstruction technology for space targets can provide information support such as target structures and dimensions for space missions including on-orbit services and fault diagnosis, which is crucial for maintaining the normal operation of space assets. Optical devices and inverse synthetic aperture radar (ISAR) can provide high-resolution two-dimensional (2D) images of space targets, serving as the primary means of space target observation. However, existing 3D imaging methods for space targets exhibit significant limitations: the fusion of optical and ISAR observation data lacks automation, and factors such as image offset that affect 3D imaging quality are not fully considered. To address these issues, this paper proposes a novel 3D imaging method for space targets utilizing optical-ISAR joint observation. The method first employs semantic segmentation networks to automatically extract target regions from optical and ISAR images. It then applies octree-space carving technology for efficient 3D reconstruction and corrects target region offsets based on projection optimization to achieve high-quality 3D imaging. The method eliminates the need for manual target region extraction, improving the automation level of the algorithm, and the octree-space carving technology greatly enhances the efficiency of 3D reconstruction. Moreover, by correcting target region offsets, the proposed method delivers superior 3D imaging results. Simulation experiments demonstrate that the method achieves significantly superior performance in terms of reconstruction efficiency and imaging quality. Additionally, experiments based on measured data further verify the feasibility and practical application value of the proposed method.

1. Introduction

With the increase in space object populations and the growing complexity of space missions, refined space target observation has become crucial. Currently, ground-based optical telescopes and inverse synthetic aperture radar (ISAR) serve as the primary means for space target observation. While these devices can acquire two-dimensional (2D) high-resolution images of space targets, their inherent imaging principles fundamentally limit direct acquisition of 3D structural information, thereby significantly impeding the progress of related space missions [1,2,3,4]. Consequently, research on 3D imaging technology for space targets has become a current research focus in the field of space target observation.
According to the differences in observation configurations, existing 3D imaging methods for space targets can be categorized into three types: (1) optical-based 3D imaging [5,6,7,8], (2) ISAR-based 3D imaging [9,10,11,12,13], and (3) optical-ISAR fusion-based 3D imaging [14,15]. For optical 3D imaging, conventional approaches typically employ structure from motion (SfM) and multi-view stereo (MVS) techniques to reconstruct target structures [16,17], where SfM recovers camera poses and generates sparse point clouds through feature matching (e.g., SIFT, ORB) and bundle adjustment (BA), whereas MVS further generates dense point clouds via multi-view stereo matching. However, these hardware-free methods depend strongly on accurate feature extraction; consequently, their performance degrades significantly under weak textures, complex lighting, or severe occlusion. Notably, recent advances in deep learning have brought new breakthroughs to optical 3D reconstruction. For instance, Xu et al. [18] proposed a neural radiance field (NeRF)-based method, which implicitly represents 3D scene information without explicit feature extraction. In another study, Chang et al. [19] validated the 3D Gaussian splatting (3DGS) technique through 3D reconstruction of space stations using ground-based amateur telescope observation data. However, deep learning-based techniques still face significant practical challenges: they generally require densely sampled, multi-angle images as input and need extensive training to achieve optimal reconstruction quality. In ground-based space observation, however, constraints such as a narrow field of view and limited observation windows often prevent the collection of sufficiently comprehensive and dense image data. These restrictions significantly hinder the practical deployment of deep learning techniques in real-world space target observation scenarios.
In the field of ISAR 3D imaging, both array ISAR [20,21] and interferometric ISAR (InISAR) [22,23] techniques face inherent limitations. Array ISAR systems require the construction of a complex multi-channel receiving array, while InISAR relies on high-precision multi-antenna observation configurations. These requirements not only increase hardware complexity but also incur prohibitive costs, restricting practical adoption. Consequently, single-antenna ISAR 3D imaging technology has emerged as a research hotspot. The factorization method is one of the most widely studied single-antenna ISAR 3D imaging approaches; it decomposes the position matrix of scattering centers to derive the 3D structure of space targets. However, due to the sparse nature of ISAR images, the accurate extraction and association of scattering centers remain challenging, which restricts its practicality. According to the principles of radar imaging, 2D ISAR imaging can be viewed as the projection of the target’s 3D scattering characteristics onto the imaging projection plane (IPP) [24,25]. Building on this concept, Liu et al. [26] proposed the image sequence energy accumulation (ISEA) method, which reconstructs the target’s 3D structure using the accumulated projection energy in observed images. This approach eliminates the need to extract and associate scattering centers, thereby simplifying the process and enabling dense reconstructions; nevertheless, it is only applicable to three-axis stabilized targets. To tackle these issues, Zhou et al. [27] jointly optimized the spin parameters and 3D structure of the target based on ISEA, achieving 3D imaging of spinning targets. However, this method requires a high signal-to-noise ratio (SNR), and its reconstruction quality is easily compromised by noise.
Due to the inherent differences in imaging principles, the IPPs of optical imaging and ISAR imaging exhibit significant disparities. Specifically, when optical and ISAR devices are co-located, their IPPs are mutually orthogonal. This characteristic not only provides a theoretical foundation for optical-ISAR joint observation systems but also drives research into 3D reconstruction techniques based on optical-ISAR fusion. Zhou et al. [14] proposed a co-located optical-ISAR 3D imaging method, which reconstructs target structures using contours extracted from both optical and ISAR images. Leveraging the orthogonality between the IPPs of optical imaging and ISAR imaging, this method achieves dense 3D reconstruction results. However, it requires manual contour extraction and is only applicable to three-axis stabilized targets. To address these limitations, Long et al. [15] proposed an automated ISAR-optical contour fusion 3D reconstruction method, which jointly estimates attitude and reconstructs the 3D structure. However, for spinning space targets, the single-station optical-ISAR joint observation system is susceptible to attitude estimation failure, which consequently degrades the 3D reconstruction quality. Additionally, existing methods neglect the impact of target region offsets in observation images on 3D imaging quality. In ISAR imaging, range alignment and phase adjustment introduce range and azimuth offsets in the observation images [28,29], while in optical imaging, CCD installation misalignments and calibration errors may cause horizontal and vertical offsets.
To address the aforementioned issues, a novel 3D imaging method for space targets utilizing optical-ISAR joint observation is proposed in this paper. First, an extended optical-ISAR joint observation system is adopted to estimate the attitude of the space target. Subsequently, two semantic segmentation networks are employed to automatically extract target regions from the optical and ISAR images, respectively, providing feature inputs for 3D reconstruction. Finally, projection optimization is applied to correct the offsets of the extracted target regions, achieving high-quality 3D imaging of the space target. To the best of our knowledge, this is the first study to consider the impact of target region offsets in the 3D reconstruction process. Compared with existing works, the main contributions of this study are summarized as follows.
  • Automated High-Integrity Target Region Extraction: Two semantic segmentation networks are employed to extract target regions from observed images without manual intervention, significantly enhancing the automation level of target region extraction. These networks can effectively fuse the gradient information of observed images and leverage the guidance of the binary segmentation branch on the semantic segmentation branch, thereby improving the completeness of target region extraction.
  • High-Efficiency 3D Reconstruction: By combining octree technology with space carving technology, an octree of the space to be reconstructed is constructed. This method only needs to perform projection consistency verification on a small number of key nodes in the octree, replacing the process of sequentially performing projection consistency verification on all voxels to be reconstructed in traditional methods, thus significantly improving the efficiency of 3D reconstruction.
  • Projection-Optimized Target Region Offset Correction: To address the impact of target region offsets on 3D imaging quality, a projection optimization-based offset correction method is proposed. This method adopts the reconstruction reprojection intersection over union (RR-IOU) as the optimization function and achieves target region offset correction through iterative region adjustment and projection optimization.
The rest of the article is organized as follows: Section 2 introduces the optical-ISAR joint observation system, detailing the imaging projection models for both optical imaging and ISAR imaging. Section 3 presents the proposed 3D imaging algorithm for space targets, focusing on the semantic segmentation network for target region extraction, octree-based space carving for 3D reconstruction, and the projection-optimized target region offset correction method. Section 4 validates the effectiveness and robustness of the proposed method through experimental results. Section 5 discusses the performance of the proposed method based on the experimental results. Finally, Section 6 summarizes the paper and provides an outlook for future work.

2. Imaging Geometry of the Optical-ISAR Joint Observation System for Space Targets

With increasing space activities, the on-orbit operating states of various spacecraft have shown a trend of diversification, including complex motions such as active maneuvering, attitude adjustment, and uncontrolled tumbling, posing significant challenges to 3D imaging. To achieve 3D imaging of space targets with complex on-orbit operating states, this paper adopts an extended optical-ISAR joint observation system for observing space targets. As shown in Figure 1, the optical-ISAR joint observation system consists of a ground-based optical device (Optical1), a radar device (ISAR1) co-located with Optical1, and another radar device (ISAR2) deployed at a different location. When the space target enters the common visible area of the three observation stations, they synchronously image it from distinct perspectives.
For consistent characterization of the target’s motion, an orbital coordinate system (OCS) is established, as illustrated in Figure 2. The origin of the OCS is located at the centroid $O_S$ of the space target; the $X_S$-axis is tangent to the target orbit and points in the direction of motion, the $Z_S$-axis points to the Earth’s center, and the $Y_S$-axis is determined by the right-hand rule. Due to the motion of the target in orbit, the lines of sight (LOS) of the optical and ISAR devices change continuously during imaging, forming the relative motion between the space target and the ground stations.
In fact, both optical and ISAR imaging can be viewed as the 2D projection of the target’s 3D information onto the IPP [30,31]. To achieve 3D reconstruction based on this principle, the projection relationships of optical and ISAR imaging must be analyzed separately. To uniformly describe the relative motion between the space target and stations, this paper transforms both relative motions into OCS. As shown in Figure 2, the LOS of both the ISAR and optical devices can be described as follows:
$$\mathbf{r} = \left[ \cos\theta_{\mathrm{LOS}}\cos\varphi_{\mathrm{LOS}},\ \cos\theta_{\mathrm{LOS}}\sin\varphi_{\mathrm{LOS}},\ \sin\theta_{\mathrm{LOS}} \right]^{T}$$
where $\mathbf{r}$ represents the LOS vector, and $\theta_{\mathrm{LOS}}$ and $\varphi_{\mathrm{LOS}}$ denote the elevation and azimuth angles of the LOS in the OCS, respectively.
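To make the notation concrete, the following minimal Python sketch (NumPy; the function name is ours) evaluates the LOS vector from the two angles defined above:

```python
import numpy as np

def los_vector(theta_los, phi_los):
    """Unit LOS vector in the OCS from elevation and azimuth angles (radians)."""
    return np.array([np.cos(theta_los) * np.cos(phi_los),
                     np.cos(theta_los) * np.sin(phi_los),
                     np.sin(theta_los)])
```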
To enhance the algorithm’s applicability in complex space target observation missions, this study focuses on uncontrolled space targets. Having lost attitude control capability, uncontrolled space targets typically exhibit highly nonlinear and unpredictable motion characteristics. Based on existing analyses of Envisat’s motion in [32,33], such uncontrolled targets generally exhibit a fixed-axis spin state during the imaging interval. Consequently, the relative motion between the target and the observation stations comprises two components: the target’s spin motion and the equivalent rotational motion induced by LOS variations. As illustrated in Figure 3, the rotation vector $\boldsymbol{\omega}_{\mathrm{tar}}$ of the space target in the OCS can be derived through vector composition; the relationship among the motion vectors can be expressed as:
$$\boldsymbol{\omega}_{\mathrm{tar}} = \boldsymbol{\omega}_{\mathrm{LOS}} + \boldsymbol{\omega}_{\mathrm{spin}}$$
where $\boldsymbol{\omega}_{\mathrm{LOS}}$ denotes the equivalent rotation vector induced by LOS variations, and $\boldsymbol{\omega}_{\mathrm{spin}}$ denotes the target’s spin vector.
Conventionally, the LOS at the central moment of the imaging observation is referred to as the imaging LOS. According to the radar imaging principle, the range-dimensional projection vector $\mathbf{k}_{\mathrm{range}}$ of ISAR imaging can be expressed as:
$$\mathbf{k}_{\mathrm{range}} = \left[ \cos\theta_{\mathrm{LOS}}\cos\varphi_{\mathrm{LOS}},\ \cos\theta_{\mathrm{LOS}}\sin\varphi_{\mathrm{LOS}},\ \sin\theta_{\mathrm{LOS}} \right]^{T}$$
As for the azimuth-dimensional projection vector $\mathbf{k}_{\mathrm{azimuth}}$ of ISAR imaging, it is jointly determined by the LOS vector and the effective rotation vector (ERV) $\boldsymbol{\omega}_{\mathrm{eff}}$ of the target as follows:
$$\mathbf{k}_{\mathrm{azimuth}} = \frac{2}{\lambda}\, \mathbf{r}_0 \times \boldsymbol{\omega}_{\mathrm{eff}}$$
$$\boldsymbol{\omega}_{\mathrm{eff}} = \boldsymbol{\omega}_{\mathrm{tar}} - \left( \mathbf{r}_0^{T} \boldsymbol{\omega}_{\mathrm{tar}} \right) \mathbf{r}_0 = \boldsymbol{\omega}_{\mathrm{LOS}} + \boldsymbol{\omega}_{\mathrm{spin}} - \left( \mathbf{r}_0^{T} \boldsymbol{\omega}_{\mathrm{spin}} \right) \mathbf{r}_0$$
where $\mathbf{r}_0$ represents the imaging LOS vector, and $\lambda$ is the radar wavelength.
Then, the projection relationship between an arbitrary point $\mathbf{q}$ on the space target and its corresponding position in the ISAR image can be expressed as:
$$\begin{bmatrix} r_q \\ d_q \end{bmatrix} = \begin{bmatrix} \mathbf{k}_{\mathrm{range}} \\ \mathbf{k}_{\mathrm{azimuth}} \end{bmatrix} \mathbf{q} = \mathbf{P}_{\mathrm{ISAR}}\, \mathbf{q}$$
where $r_q$ and $d_q$ represent the range-dimension and azimuth-dimension coordinates of $\mathbf{q}$ in the ISAR image, respectively, and $\mathbf{P}_{\mathrm{ISAR}}$ denotes the ISAR imaging projection matrix.
Unlike ISAR imaging, the observation results of optical imaging depend solely on the relative attitude between the target and the optical device, as well as the intrinsic parameters of the camera, and are independent of the target’s motion state. Consequently, the projection relationship between $\mathbf{q}$ and its corresponding position in the optical image can be expressed as:
$$\begin{bmatrix} h_q \\ v_q \end{bmatrix} = \begin{bmatrix} \mathbf{k}_{\mathrm{horizontal}} \\ \mathbf{k}_{\mathrm{vertical}} \end{bmatrix} \mathbf{q} = \mathbf{P}_{\mathrm{optical}}\, \mathbf{q}$$
where $\mathbf{k}_{\mathrm{horizontal}}$ and $\mathbf{k}_{\mathrm{vertical}}$ represent the horizontal and vertical projection vectors, respectively, which can be derived from the camera intrinsic parameters and the observation distance; $h_q$ and $v_q$ denote the horizontal-dimensional and vertical-dimensional coordinates of $\mathbf{q}$ in the optical image, respectively; and $\mathbf{P}_{\mathrm{optical}}$ is the optical imaging projection matrix.
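As an illustration of how the two projection models are used downstream, the sketch below (NumPy; the function names and example values are ours, and the mapping from metric coordinates to pixel units is omitted) assembles the ISAR projection matrix from the range and azimuth vectors defined above and projects a point:

```python
import numpy as np

def isar_projection_matrix(r0, w_tar, wavelength):
    """2x3 ISAR projection matrix built from the imaging LOS r0 (unit vector
    in the OCS), the target rotation vector w_tar, and the radar wavelength,
    following the k_range and k_azimuth definitions above."""
    k_range = r0
    w_eff = w_tar - (r0 @ w_tar) * r0              # effective rotation vector
    k_azimuth = (2.0 / wavelength) * np.cross(r0, w_eff)
    return np.vstack([k_range, k_azimuth])

def project(P, q):
    """Map a 3-D point q in the OCS to its 2-D image coordinates."""
    return P @ q

# Example with illustrative values
r0 = np.array([0.0, 0.0, 1.0])                     # imaging LOS
w_tar = np.array([0.0, 0.02, 0.0])                 # rotation vector (rad/s)
P_isar = isar_projection_matrix(r0, w_tar, 0.03)   # wavelength in meters
r_q, d_q = project(P_isar, np.array([1.0, 2.0, 0.5]))
```

The optical matrix $\mathbf{P}_{\mathrm{optical}}$ is stacked analogously from $\mathbf{k}_{\mathrm{horizontal}}$ and $\mathbf{k}_{\mathrm{vertical}}$.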
We have now formulated the imaging projection models for both optical and ISAR imaging of space targets. As shown in Figure 4, the co-located optical and ISAR devices share the same LOS. Consequently, the IPP of the optical device ($\mathrm{IPP}_{\mathrm{opt}}$) is perpendicular to the LOS, while the IPP of ISAR1 ($\mathrm{IPP}_{\mathrm{ISAR1}}$) is parallel to the LOS. As a result, $\mathrm{IPP}_{\mathrm{opt}}$ and $\mathrm{IPP}_{\mathrm{ISAR1}}$ are mutually orthogonal. The optical image provides the 2D projection information of the target in the IPP perpendicular to the LOS, whereas the ISAR image provides the 2D projection information in the IPP along the LOS. By continuously observing space targets, acquiring sequential images from multiple perspectives, and leveraging the imaging projection model, precise 3D reconstruction of the target structure can be achieved.
Through joint processing of the optical and ISAR images, we propose a 3D reconstruction method for space targets that leverages the orthogonality of optical-ISAR projections. This method first automatically extracts the target regions from the observed images and then reconstructs the 3D structure of the target based on the orthogonal projection information provided by optical and ISAR imaging. During the reconstruction process, the impact of target region offsets on the 3D reconstruction quality is also fully considered. Specific implementation details of this method are elaborated in Section 3.

3. Space Targets 3D Reconstruction Using Projection Features of Target Regions

Based on the preceding analysis, both optical imaging and ISAR imaging of space targets are essentially projection processes that map 3D target information onto 2D imaging planes. Notably, when optical and ISAR devices are co-located, their IPPs are mutually orthogonal, which provides a foundation for 3D reconstruction. Building upon this analysis, this paper proposes a 3D reconstruction method for space targets based on the projection features of target regions.
The workflow of the proposed method is illustrated in Figure 5, which primarily includes three key steps: target region extraction utilizing semantic segmentation networks, 3D reconstruction based on octree-space carving, and target region offset correction based on projection optimization. Prior to 3D reconstruction, the initialization involving tasks such as time synchronization and data processing can be achieved using the techniques described in [34,35,36,37], which will not be elaborated here.
After initialization, the proposed semantic segmentation networks are applied to extract target regions from the observed image sequences, providing projection feature inputs for 3D reconstruction. Subsequently, octree-space carving and target region offset correction are performed alternately to continuously optimize and correct the target region offset, achieving coarse-to-fine 3D imaging. Detailed descriptions of each step will be provided in subsequent sections.

3.1. Target Region Extraction Using Semantic Segmentation Networks

To achieve 3D reconstruction of space targets, accurate extraction of target regions from observation images must first be accomplished. However, conventional segmentation methods often fail to precisely extract target regions from these two image types due to their distinct characteristics: ISAR images are typically sparse with low signal-to-noise ratios, while optical images are susceptible to lighting and meteorological conditions. Inspired by [38,39,40,41], this paper proposes two semantic segmentation networks, OpticalSegNet and ISARSegNet, designed for target region extraction in optical and ISAR images, respectively. While OpticalSegNet and ISARSegNet share nearly identical backbone architectures, they incorporate subtle structural adaptations tailored to their respective imaging modalities. In the following, OpticalSegNet will be taken as an example to introduce the structural composition of the proposed semantic segmentation networks.
To accurately extract target regions from optical images of space targets, this paper proposes OpticalSegNet, an enhanced network based on the classic Unet architecture [42] with three key improvements: (1) the introduction of an image gradient processing branch (IGPB) that enables dynamic gradient-guided semantic segmentation through feature fusion between image features and gradient features; (2) the incorporation of convolutional block attention modules (CBAM) and Non-Local (NL) modules [43,44] into skip connections at different resolutions to establish long-range dependencies and capture global contextual information, thereby addressing insufficient semantic correlation in deep features; and (3) the addition of a binary segmentation branch (BSB) at the output layer to form a dual-output supervision structure with the semantic segmentation branch for simultaneous optimization of both tasks. The complete architecture of OpticalSegNet is illustrated in Figure 6.

3.1.1. Image Gradient Processing Branch and Feature Fusion Module

The classic Unet consists of an encoder, a decoder, and skip connections, forming a distinctive “U”-shaped structure. The encoder progressively extracts semantic features through convolutional and downsampling operations, while the decoder, in conjunction with skip connections, combines shallow detail features with deep semantic representations to generate a segmentation mask matching the input dimensions. Nevertheless, this architecture exhibits significant limitations in utilizing image gradient information (including edges, contours, and other critical boundary cues). First, the network only implicitly learns gradient features from raw pixels, and max-pooling downsampling operations lead to gradient information loss. Second, conventional convolutional kernels struggle to effectively capture gradient features. As research on remote sensing image segmentation [40,41] has demonstrated, the explicit utilization of gradient information can significantly improve segmentation accuracy. Drawing on this finding, this paper introduces an IGPB into Unet: gradient features are explicitly provided as input, and a feature fusion module (FFM) is designed to achieve dynamic gradient-guided semantic segmentation, effectively enhancing the network’s sensitivity to gradient features while preserving the original architecture.
In OpticalSegNet, we concatenate various gradient features, including the vertical gradient $G_y$, horizontal gradient $G_x$, gradient magnitude $M_{xy}$, and gradient phase $\varphi_{xy}$, as inputs to the IGPB. The gradient magnitude $M_{xy}$ and gradient phase $\varphi_{xy}$ are calculated as follows:
$$M_{xy} = \sqrt{G_x^2 + G_y^2}$$
$$\varphi_{xy} = \arctan\left( G_y / G_x \right)$$
where $G_y$ and $G_x$ can be obtained by convolving the image with Sobel operators. The Sobel operators in the horizontal and vertical directions are as follows:
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
where $S_x$ and $S_y$ are the Sobel operators in the horizontal and vertical directions, respectively.
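The four gradient channels fed to the IGPB can be computed in a few lines; the sketch below uses SciPy convolution and substitutes the quadrant-safe arctan2 for the arctangent in the phase definition (an implementation choice of ours):

```python
import numpy as np
from scipy.ndimage import convolve

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32)

def gradient_features(img):
    """Stack the IGPB input channels: G_y, G_x, magnitude, and phase."""
    img = img.astype(np.float32)
    gx = convolve(img, SOBEL_X)
    gy = convolve(img, SOBEL_Y)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    phase = np.arctan2(gy, gx)        # quadrant-safe version of arctan(Gy/Gx)
    return np.stack([gy, gx, mag, phase], axis=0)   # shape (4, H, W)
```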
To enhance the segmentation network’s capability in utilizing both semantic and boundary information for boundary-guided semantic segmentation, this paper proposes an FFM that integrates feature maps from the image processing branch (IPB) and the IGPB. As illustrated in Figure 7, the FFM is mainly composed of an adaptive channel adjustment module (ACAM) and a CBAM. In the ACAM, image features and gradient features are fused and complementarily enhanced according to adaptive weights, avoiding the noise and information dilution introduced by simple feature exchange. Then, the features after differential initial fusion are input into the CBAM for further screening and enhancement: channel attention screens key fusion features, and spatial attention focuses on target regions, further enhancing effective information and suppressing noise interference. Finally, the optimized features are split along the channel dimension and redistributed to their respective branches for subsequent feature extraction.
In the following, we introduce the structure of the ACAM, which is illustrated in Figure 8. In the ACAM, the input features first undergo global average pooling, followed by a 1D convolution layer, and then pass through a Sigmoid activation to generate the channel-wise weights $\gamma_I$ (for image features) and $\gamma_G$ (for gradient features).
Assuming the input features of the IPB and IGPB in the $s$-th FFM are $x_{I,s} \in \mathbb{R}^{h \times w \times c}$ and $x_{G,s} \in \mathbb{R}^{h \times w \times c}$, respectively, the fused outputs are obtained by normalized channel feature weighting:
$$\tilde{x}_{I,s,c} = \gamma_{I,s,c} \odot x_{I,s,c} + \left( 1 - \gamma_{I,s,c} \right) \odot x_{G,s,c}$$
$$\tilde{x}_{G,s,c} = \gamma_{G,s,c} \odot x_{G,s,c} + \left( 1 - \gamma_{G,s,c} \right) \odot x_{I,s,c}$$
where $\tilde{x}_{I,s,c}$ and $\tilde{x}_{G,s,c}$ are the weighted fusion outputs of $x_{I,s}$ and $x_{G,s}$, respectively; $\odot$ denotes element-wise multiplication; and $c$ indexes the feature channels.
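A compact PyTorch sketch of the ACAM is given below; the 1D-convolution kernel size is an assumption, since the text does not specify it:

```python
import torch
import torch.nn as nn

class ACAM(nn.Module):
    """Adaptive channel adjustment: channel weights from global average
    pooling -> 1-D convolution -> Sigmoid, then cross-branch weighting."""
    def __init__(self, kernel_size=3):
        super().__init__()
        self.conv_i = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)
        self.conv_g = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)

    def _weights(self, x, conv):
        w = x.mean(dim=(2, 3))                  # global average pooling -> (B, C)
        w = conv(w.unsqueeze(1)).squeeze(1)     # 1-D conv across channels
        return torch.sigmoid(w)[..., None, None]

    def forward(self, x_img, x_grad):
        gamma_i = self._weights(x_img, self.conv_i)
        gamma_g = self._weights(x_grad, self.conv_g)
        fused_i = gamma_i * x_img + (1 - gamma_i) * x_grad   # image branch
        fused_g = gamma_g * x_grad + (1 - gamma_g) * x_img   # gradient branch
        return fused_i, fused_g
```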
By leveraging channel-wise feature weighting, the features of IPB and IGPB are effectively fused and enhanced, facilitating mutual interaction and guidance between semantic information and edge information. To further extract effective information and suppress noise interference, the weighted and fused image features and gradient features are first concatenated along the channel dimension, refined through CBAM, and then split again along the channel dimension to restore the feature outputs of the two branches.
After processing by the FFM, the IPB incorporates boundary constraints to avoid semantic localization deviations, while the IGPB, leveraging the semantic guidance provided by the IPB, effectively eliminates interference from irrelevant edges. This ensures that the subsequent network can further utilize both types of features, enhancing the semantic consistency and boundary accuracy of semantic segmentation.

3.1.2. Attention-Based Skip Connection Feature Optimization

The skip connections in the conventional Unet architecture lack optimization for the relevance and effectiveness of cross-layer features, making them susceptible to interference from noise or redundant information, which can lead to blurred segmentation boundaries and segmentation errors. Additionally, due to its reliance on the local receptive fields of convolutional operations, the classic Unet fails to capture long-range dependencies between features and cannot effectively model the holistic geometric characteristics of targets, thereby compromising segmentation accuracy and robustness. To address these limitations, inspired by ref. [45], we introduce attention modules into skip connections at different resolutions for feature optimization. For high-resolution skip branches, we employ the computationally efficient CBAM, which utilizes dual attention mechanisms (channel and spatial) to perform channel-wise feature selection and target region focusing, effectively suppressing background noise while enhancing critical features. For low-resolution skip branches rich in semantic information, we incorporate the NL module, leveraging self-attention mechanisms to model global dependencies among feature pixels. This hierarchical attention mechanism design compensates for the Unet’s deficiency in global modeling and establishes effective global contextual relationships, thereby comprehensively improving the accuracy and robustness of semantic segmentation.
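For reference, a minimal CBAM of the kind used in the high-resolution skip branches is sketched below (the reduction ratio of 16 and the 7×7 spatial kernel follow the common convention rather than settings stated in this paper):

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # channel attention from average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[..., None, None]
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```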

3.1.3. Dual Segmentation Branch Structure

In refs. [45,46], semantic segmentation of targets is cascaded with a binary segmentation network. However, such methods fail to leverage the guidance of binary segmentation for semantic segmentation, which not only requires repeated training but also makes the semantic segmentation results susceptible to the accuracy of binary segmentation. To address these limitations, we augment the semantic segmentation output branch of the classic Unet with an auxiliary binary segmentation output branch for joint training. By simultaneously utilizing semantic segmentation labels and binary segmentation labels for network training, this design avoids redundant training and significantly improves training efficiency. Additionally, the global target information extracted by binary segmentation can be fully utilized to guide semantic segmentation. During the training process, the loss function of the network is defined as:
$$L_{\mathrm{seg}} = k_{\mathrm{bin}} L_{\mathrm{bin}} + k_{\mathrm{sem}} L_{\mathrm{sem}} + k_{\mathrm{guide}} L_{\mathrm{guide}}$$
where $L_{\mathrm{bin}}$, $L_{\mathrm{sem}}$, and $L_{\mathrm{guide}}$ represent the binary segmentation loss, semantic segmentation loss, and guidance loss, respectively, while $k_{\mathrm{bin}}$, $k_{\mathrm{sem}}$, and $k_{\mathrm{guide}}$ denote their corresponding loss weights. These losses are calculated as follows:
$$L_{\mathrm{bin}} = L_{\mathrm{BCE}} + L_{\mathrm{Dice\text{-}bin}}$$
$$L_{\mathrm{sem}} = L_{\mathrm{CCE}} + L_{\mathrm{Dice\text{-}sem}}$$
$$L_{\mathrm{guide}} = L_{\mathrm{MSE}}\left( P_{\mathrm{sem}}, P_{\mathrm{bin}} \right) + L_{\mathrm{Dice\text{-}guide}}\left( P_{\mathrm{sem}}, P_{\mathrm{bin}} \right)$$
where $L_{\mathrm{BCE}}$ and $L_{\mathrm{CCE}}$ represent the binary cross-entropy and multi-class cross-entropy losses, respectively; $L_{\mathrm{Dice\text{-}bin}}$ and $L_{\mathrm{Dice\text{-}sem}}$ denote the binary and multi-class Dice losses, respectively; $L_{\mathrm{MSE}}$ is the mean squared error (MSE); $L_{\mathrm{Dice\text{-}guide}}$ denotes the guided Dice loss; and $P_{\mathrm{sem}}$ and $P_{\mathrm{bin}}$ represent the component segmentation foreground map and the binary segmentation foreground map, obtained by taking the union of all non-background classes from the semantic segmentation output and the binary segmentation output, respectively.
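The combined loss can be sketched as follows in PyTorch; the loss-weight values and the soft-Dice form are assumptions, since the paper does not state them:

```python
import torch
import torch.nn.functional as F

def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss on probability maps of shape (B, C, H, W)."""
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def seg_loss(sem_logits, bin_logits, sem_label, bin_label,
             k_bin=1.0, k_sem=1.0, k_guide=1.0):
    """L_seg = k_bin*L_bin + k_sem*L_sem + k_guide*L_guide.
    sem_logits: (B, C, H, W) with class 0 = background; bin_logits and
    bin_label: (B, 1, H, W); sem_label: (B, H, W) integer class map."""
    bin_prob = torch.sigmoid(bin_logits)
    l_bin = (F.binary_cross_entropy_with_logits(bin_logits, bin_label)
             + dice_loss(bin_prob, bin_label))
    sem_prob = torch.softmax(sem_logits, dim=1)
    sem_onehot = F.one_hot(sem_label, sem_logits.shape[1]).permute(0, 3, 1, 2).float()
    l_sem = F.cross_entropy(sem_logits, sem_label) + dice_loss(sem_prob, sem_onehot)
    p_sem = 1 - sem_prob[:, :1]       # foreground = union of non-background classes
    l_guide = F.mse_loss(p_sem, bin_prob) + dice_loss(p_sem, bin_prob)
    return k_bin * l_bin + k_sem * l_sem + k_guide * l_guide
```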
Compared with natural images, optical observation images of space targets from ground-based devices exhibit significant differences, such as image blur, a lack of color and texture information, and low signal-to-noise ratio (SNR). OpticalSegNet can effectively overcome the impact of these imaging defects by fully utilizing image gradient information to guide semantic segmentation. Ground-based ISAR images, by contrast, are affected by geometric deformations (e.g., rotation, translation, scaling) [47] caused by changes in imaging parameters and target motion, as well as inherently low SNR and blurred boundaries. Therefore, ISARSegNet retains the main framework of OpticalSegNet while specifically replacing the first and second convolutional layers with deformable convolutions (DConv) to enhance the network’s adaptability to geometric deformations. Its overall architecture is shown in Figure 9.
The introduction of DConv enables the network to effectively adapt to ISAR image distortions caused by varying imaging parameters and unknown target motions. By matching the non-uniform distribution of scattering centers in ISAR images, DConv dynamically adjusts both the convolutional kernel morphology and receptive field scope. This adaptive mechanism facilitates more effective feature extraction while significantly reducing segmentation errors induced by target geometric variations.

3.2. Three-Dimensional Reconstruction Based on Octree-Space Carving

The space carving-based 3D reconstruction technique for space targets [48,49] takes the target projection regions and their corresponding imaging projection matrices as inputs, and filters initial voxels through multi-view projection consistency verification: valid target voxels should project within the target regions across multiple image frames, while invalid voxels tend to project outside these regions. By integrating the projection consistency verification results from all observation images, a final projection consistency score can be calculated for each voxel. Setting an appropriate score threshold yields the final 3D point cloud belonging to the target, thereby completing the 3D imaging process. However, traditional space carving-based 3D imaging methods require projection determination for each voxel individually, resulting in high computational complexity. To improve the efficiency of 3D reconstruction, we incorporate octree technology [50,51] and propose an octree-accelerated space carving method for 3D reconstruction. The processing flow of this method is illustrated in Figure 10.
The main steps of the octree-space carving-based 3D reconstruction method for space targets are as follows.
Step 1: Voxel grid initialization. According to the approximate size of the target and actual requirements, the initial voxel grid of the target is constructed. Based on the estimated attitude parameters of the space target, this paper uniformly reconstructs the 3D structure of the target in the OCS, where the initial voxel grid $V_{\mathrm{ini}}$ of the target can be expressed as:
$$V_{\mathrm{ini}} = \mathrm{Grid3d}\left( X_{\mathrm{ini}}, Y_{\mathrm{ini}}, Z_{\mathrm{ini}} \right)$$
where $X_{\mathrm{ini}}$ represents the coordinate grid values of the initial voxels along the X-axis, $X_{\mathrm{ini}} \in \left[ -X_{\max} : \Delta X : X_{\max} \right]$, with $X_{\max}$ and $\Delta X$ denoting the maximum coordinate value and coordinate resolution of the initial voxels along the X-axis, respectively; $Y_{\mathrm{ini}}$ and $Z_{\mathrm{ini}}$ are defined similarly to $X_{\mathrm{ini}}$; and $\mathrm{Grid3d}$ represents the operation of generating a 3D spatial voxel grid according to the coordinate ranges.
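A minimal sketch of the Grid3d operation (NumPy; the function name mirrors the notation above):

```python
import numpy as np

def grid3d(x_max, dx, y_max, dy, z_max, dz):
    """Build the initial voxel grid V_ini in the OCS as an
    (Nx, Ny, Nz, 3) array of voxel coordinates."""
    xs = np.arange(-x_max, x_max + dx / 2, dx)
    ys = np.arange(-y_max, y_max + dy / 2, dy)
    zs = np.arange(-z_max, z_max + dz / 2, dz)
    return np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)
```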
Step 2: Octree construction. Octree construction is a process of spatial partitioning, in which each node represents a spatial region defined by the boundaries $x_{\mathrm{node}}^{\min}$, $x_{\mathrm{node}}^{\max}$, $y_{\mathrm{node}}^{\min}$, $y_{\mathrm{node}}^{\max}$, $z_{\mathrm{node}}^{\min}$, and $z_{\mathrm{node}}^{\max}$. First, initialize the root node. This node spans the entire 3D voxel grid, with its spatial boundaries defined as $x_{\mathrm{node}}^{\min} = -X_{\max}$, $x_{\mathrm{node}}^{\max} = X_{\max}$, $y_{\mathrm{node}}^{\min} = -Y_{\max}$, $y_{\mathrm{node}}^{\max} = Y_{\max}$, $z_{\mathrm{node}}^{\min} = -Z_{\max}$, and $z_{\mathrm{node}}^{\max} = Z_{\max}$. Next, child nodes are partitioned recursively: any parent node containing more than one voxel grid point is split into eight child nodes at the midpoints of the X-axis, Y-axis, and Z-axis. This partitioning continues recursively until every node becomes a leaf node containing exactly one voxel grid point. As shown in Figure 11, once all nodes have been recursively partitioned, the octree construction for the entire voxel grid space is complete.
Step 3: Traverse the octree and perform projection consistency verification. Assume the voxel grid space contained in a certain node is $V_{\mathrm{node}} = \left\{ (i,j,k) \mid V_{\mathrm{ini}}(i,j,k) \right\}$, where $i$, $j$, and $k$ are the voxel indices. Based on the LOS data and station positions, the attitude of the space target can be estimated, and the optical imaging projection matrix $\mathbf{P}_{\mathrm{optical},f}$ and ISAR imaging projection matrix $\mathbf{P}_{\mathrm{ISAR},f}$ for the $f$-th set of optical-ISAR observation images can then be calculated. According to the imaging projection principle, the optical projection region $S_{\mathrm{optical},f}^{\mathrm{node}}$ and ISAR projection region $S_{\mathrm{ISAR},f}^{\mathrm{node}}$ of $V_{\mathrm{node}}$ can be expressed as:
$$S_{\mathrm{ISAR},f}^{\mathrm{node}} = \mathbf{P}_{\mathrm{ISAR},f}\, V_{\mathrm{node}}$$
$$S_{\mathrm{optical},f}^{\mathrm{node}} = \mathbf{P}_{\mathrm{optical},f}\, V_{\mathrm{node}}$$
Based on $S_{\mathrm{optical},f}^{\mathrm{node}}$, $S_{\mathrm{ISAR},f}^{\mathrm{node}}$, $S_{\mathrm{optical},f}$, and $S_{\mathrm{ISAR},f}$, we can determine whether a node belongs to the target 3D structure as follows: (1) if $S_{\mathrm{optical},f}^{\mathrm{node}}$ and $S_{\mathrm{ISAR},f}^{\mathrm{node}}$ have no overlap with $S_{\mathrm{optical},f}$ and $S_{\mathrm{ISAR},f}$ at all, none of the voxel grid points within the node belongs to the target 3D structure, and no further projection judgment is needed for its child nodes; (2) if $S_{\mathrm{optical},f}^{\mathrm{node}}$ and $S_{\mathrm{ISAR},f}^{\mathrm{node}}$ lie entirely inside $S_{\mathrm{optical},f}$ and $S_{\mathrm{ISAR},f}$, all voxel grid points within the node belong to the target 3D structure, and no further projection judgment is required for its child nodes; (3) if $S_{\mathrm{optical},f}^{\mathrm{node}}$ and $S_{\mathrm{ISAR},f}^{\mathrm{node}}$ partially overlap with $S_{\mathrm{optical},f}$ and $S_{\mathrm{ISAR},f}$, recursively perform the projection consistency verification on the eight child nodes of this node until reaching leaf nodes, where the projection consistency verification concludes.
For each voxel grid point, we assign a projection consistency score $e_f(i,j,k)$ for the $f$-th set of images and initialize it to 0. Once the projection consistency verification for the $f$-th set of images is completed and the voxel grid point is determined to belong to the target 3D structure, $e_f(i,j,k)$ is set to 1, and the final projection consistency score $E(i,j,k)$ for this voxel grid point can be expressed as:
$$E(i,j,k) = \sum_{f=1}^{F} e_f(i,j,k)$$
where $F$ is the total number of optical-ISAR image sets.
Step 4: Set the score threshold to obtain the target 3D structure. After completing the projection consistency verification of octree nodes, the final projection consistency score for each node is obtained. The target 3D structure can then be determined by setting an appropriate score threshold as follows:
$$V_{\mathrm{tar}} = \left\{ (i,j,k) \mid E(i,j,k) > \zeta \right\}$$
where $\zeta$ represents the projection consistency score threshold.
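The four steps above can be condensed into the following self-contained sketch. Two simplifications are ours: the node-to-image overlap test uses the 2D bounding box of the projected cube corners, and a single pixel-size factor maps projected metric coordinates to mask indices; a faithful implementation would project the node region exactly.

```python
import numpy as np

def octree_carve(axis, masks, Ps, pix, zeta):
    """Octree-accelerated space carving. axis: shared 1-D voxel coordinate
    array; masks[f]: binary target-region masks of the f-th image set;
    Ps[f]: the matching 2x3 projection matrices; pix: meters per pixel;
    zeta: score threshold. Returns the carved voxel indices."""
    n = len(axis)
    scores = np.zeros((n, n, n), dtype=np.int32)

    def state(lo, hi, P, mask):
        corners = np.array([[axis[i], axis[j], axis[k]]
                            for i in (lo[0], hi[0])
                            for j in (lo[1], hi[1])
                            for k in (lo[2], hi[2])])
        uv = corners @ P.T / pix + np.array(mask.shape) / 2.0
        lo_px = np.floor(uv.min(axis=0)).astype(int)
        hi_px = np.ceil(uv.max(axis=0)).astype(int)
        if (hi_px < 0).any() or (lo_px >= np.array(mask.shape)).any():
            return "outside"                     # bounding box misses the image
        lo_px = np.maximum(lo_px, 0)
        hi_px = np.minimum(hi_px, np.array(mask.shape) - 1)
        patch = mask[lo_px[0]:hi_px[0] + 1, lo_px[1]:hi_px[1] + 1]
        if not patch.any():
            return "outside"
        return "inside" if patch.all() else "partial"

    def octants(lo, hi):
        mid = [(l + h) // 2 for l, h in zip(lo, hi)]
        for d in np.ndindex(2, 2, 2):
            nlo = [lo[i] if d[i] == 0 else mid[i] + 1 for i in range(3)]
            nhi = [mid[i] if d[i] == 0 else hi[i] for i in range(3)]
            if all(a <= b for a, b in zip(nlo, nhi)):
                yield nlo, nhi

    def verify(lo, hi, f):
        states = [state(lo, hi, P, m) for P, m in zip(Ps[f], masks[f])]
        if "outside" in states:
            return                               # case (1): prune the subtree
        if all(s == "inside" for s in states) or lo == hi:
            scores[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1] += 1
            return                               # case (2), or a leaf node
        for nlo, nhi in octants(lo, hi):         # case (3): recurse on children
            verify(nlo, nhi, f)

    for f in range(len(masks)):                  # accumulate scores over F sets
        verify([0, 0, 0], [n - 1, n - 1, n - 1], f)
    return np.argwhere(scores > zeta)            # Step 4: threshold the scores
```

Only nodes whose projections partially overlap the target regions are subdivided, which is what replaces the per-voxel verification of conventional space carving.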

3.3. Target Region Offset Correction Based on Projection Optimization

As mentioned earlier, target regions in optical and ISAR images usually exhibit a certain degree of offset. The core of space carving-based 3D reconstruction lies in the intersection operation of back-projected spatial regions from multi-view observation images. If offsets exist in the target regions, the intersection of back-projected regions from different images will be significantly smaller than that without offsets. Therefore, the offsets of target regions in ISAR and optical images inevitably degrade the final 3D reconstruction quality, and eliminating their impact is a prerequisite for ensuring 3D reconstruction quality.
To mitigate the impact of target region offsets on 3D reconstruction quality, a straightforward approach involves first estimating the offsets, then adjusting the target regions accordingly, and finally performing 3D reconstruction using the corrected regions. Based on this idea, this paper proposes a projection optimization-based target region offset correction method. It employs the intersection over union (IOU) between the reprojected regions of the adjusted 3D reconstruction and the corrected target regions as the optimization objective. The zebra optimization algorithm (ZOA) [52] is utilized to estimate the offsets of the target regions in each frame, after which target region adjustment and 3D reconstruction are completed according to the offset estimates. Finally, region offset estimation and projection optimization are performed iteratively according to the optimization objective until the algorithm converges, ultimately achieving the correction of target region offsets and obtaining high-quality 3D reconstruction results. The detailed steps of the proposed method are as follows.
Step 1: Initialization. Set the estimation ranges of the range-dimensional offset, azimuth-dimensional offset, horizontal-dimensional offset, and vertical-dimensional offset. Initialize the zebra population size $K$, maximum iteration number $T_{\max}$, current iteration number $t$, and initial position $B_k$ of zebra $k$. $B_k = \left[ b_{k,1}, b_{k,2}, b_{k,3}, \cdots, b_{k,m} \right]$ is a vector composed of the target regions' range-dimensional, azimuth-dimensional, horizontal-dimensional, and vertical-dimensional offsets, with vector dimension $m = 4F$, where $F$ is the number of optical-ISAR image sets. Input the target regions $S_{\mathrm{optical},f}$ and $S_{\mathrm{ISAR},f}$, along with the imaging projection matrices $\mathbf{P}_{\mathrm{optical},f}$ and $\mathbf{P}_{\mathrm{ISAR},f}$, $f = 1, \cdots, F$.
Step 2: Region Adjustment and 3D Reconstruction. Adjust the target regions $S_{\mathrm{optical},f}$ and $S_{\mathrm{ISAR},f}$ according to the offset parameters corresponding to the zebra position $B_k$, with the adjusted regions denoted as $S'_{\mathrm{ISAR},f}$ and $S'_{\mathrm{optical},f}$. Subsequently, perform 3D reconstruction of the target using the octree-space carving method to obtain the reconstructed voxels $V_{\mathrm{tar},k}$ for zebra $k$.
Step 3: Fitness calculation. Using the reconstructed voxels $V_{\mathrm{tar},k}$ and the imaging projection matrices $\mathbf{P}_{\mathrm{optical},f}$ and $\mathbf{P}_{\mathrm{ISAR},f}$, calculate the reprojected target regions $\hat{S}_{\mathrm{optical},f}^{k}$ and $\hat{S}_{\mathrm{ISAR},f}^{k}$. In this paper, the RR-IOU is adopted to calculate the fitness score for each zebra, where the RR-IOU is defined as the IOU between the reprojected region of the corrected 3D reconstruction result and the offset-corrected region, calculated as follows:
$$\mathrm{mIOU}_k = \frac{1}{F} \sum_{f=1}^{F} \left( \mathrm{IOU}_{\mathrm{ISAR},f}^{k} + \mathrm{IOU}_{\mathrm{optical},f}^{k} \right)$$
$$\mathrm{IOU}_{\mathrm{ISAR},f}^{k} = \frac{\mathrm{sum}\left( \hat{S}_{\mathrm{ISAR},f}^{k} \cap S'_{\mathrm{ISAR},f} \right)}{\mathrm{sum}\left( \hat{S}_{\mathrm{ISAR},f}^{k} \right) + \mathrm{sum}\left( S'_{\mathrm{ISAR},f} \right) - \mathrm{sum}\left( \hat{S}_{\mathrm{ISAR},f}^{k} \cap S'_{\mathrm{ISAR},f} \right)}$$
$$\mathrm{IOU}_{\mathrm{optical},f}^{k} = \frac{\mathrm{sum}\left( \hat{S}_{\mathrm{optical},f}^{k} \cap S'_{\mathrm{optical},f} \right)}{\mathrm{sum}\left( \hat{S}_{\mathrm{optical},f}^{k} \right) + \mathrm{sum}\left( S'_{\mathrm{optical},f} \right) - \mathrm{sum}\left( \hat{S}_{\mathrm{optical},f}^{k} \cap S'_{\mathrm{optical},f} \right)}$$
where $\mathrm{IOU}_{\mathrm{optical},f}^{k}$ and $\mathrm{IOU}_{\mathrm{ISAR},f}^{k}$ represent the RR-IOU of the $f$-th set of optical and ISAR images, respectively; the superscript $k$ distinguishes different zebras.
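A direct NumPy transcription of the fitness is shown below; averaging over all 2F masks differs from the 1/F normalization above only by a constant factor, which does not change the optimum:

```python
import numpy as np

def rr_iou(reproj_masks, corrected_masks):
    """Mean IOU between the reprojected regions of a candidate
    reconstruction and the offset-corrected target regions; both arguments
    are flat lists of binary masks over all optical and ISAR images."""
    ious = []
    for s_hat, s in zip(reproj_masks, corrected_masks):
        inter = np.logical_and(s_hat, s).sum()
        union = s_hat.sum() + s.sum() - inter
        ious.append(inter / union if union > 0 else 0.0)
    return float(np.mean(ious))
```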
Step 4: Zebra Position Update. In ZOA, the update of zebra positions is divided into two phases: the foraging phase and the defense phase. After calculating the RR-IOU of all zebras in the population, the zebra with the best performance in the population, namely the Pioneer Zebra, can be selected. In the foraging phase, under the leadership of the pioneer zebra, the positions of the remaining zebras are updated as follows:
$$B'_k = B_k + R_1 \odot \left( Z_{\mathrm{pioneer}} - I_1 \odot B_k \right)$$
$$B_k = \begin{cases} B'_k, & \mathrm{mIOU}_k < \mathrm{mIOU}'_k \\ B_k, & \text{else} \end{cases}$$
where $B'_k$ denotes the updated position of zebra $k$ in the foraging phase and $\mathrm{mIOU}'_k$ is its fitness; $Z_{\mathrm{pioneer}}$ represents the position of the Pioneer Zebra; $R_1$ is a random vector with each element being a random number between 0 and 1; and $I_1 = \mathrm{round}\left( 1 + R_{11} \right)$, where the definition of $R_{11}$ is consistent with that of $R_1$.
During the defense phase, zebras adopt distinct defensive strategies based on predator variations, and their positions are updated as follows:
$$B'_k = \begin{cases} B_k + c_1 \left( 2 R_2 - 1 \right) \left( 1 - \dfrac{t}{T_{\max}} \right) \odot B_k, & P_s \leq 0.5 \\ B_k + R_3 \odot \left( Z_{\mathrm{attacked}} - I_2 \odot B_k \right), & \text{else} \end{cases}$$
$$B_k = \begin{cases} B'_k, & \mathrm{mIOU}_k < \mathrm{mIOU}'_k \\ B_k, & \text{else} \end{cases}$$
where $B'_k$ represents the updated position of zebra $k$ during the defense phase; $c_1$ is a constant with the value 0.02; the definitions of $R_2$ and $R_3$ are consistent with that of $R_1$; $Z_{\mathrm{attacked}}$ denotes the position of the attacked zebra; $I_2$ shares the same definition as $I_1$; and $P_s$ indicates the defensive strategy selection probability.
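One ZOA iteration over the offset population can be sketched as follows (NumPy; `fitness` wraps Steps 2-3, i.e., region adjustment, octree-space carving, and RR-IOU evaluation, and the greedy replacement implements the selection rules above):

```python
import numpy as np

def zoa_step(B, scores, fitness, t, T_max, rng, Ps=0.5, c1=0.02):
    """B: (K, m) population of offset vectors; scores: their current
    RR-IOU values; fitness: callable mapping a position to its RR-IOU."""
    K, m = B.shape
    pioneer = B[scores.argmax()]
    for k in range(K):                      # foraging phase
        I1 = np.round(1 + rng.random(m))
        cand = B[k] + rng.random(m) * (pioneer - I1 * B[k])
        c_fit = fitness(cand)
        if c_fit > scores[k]:               # keep only improving moves
            B[k], scores[k] = cand, c_fit
    attacked = B[rng.integers(K)]
    for k in range(K):                      # defense phase
        if rng.random() <= Ps:              # strategy 1: evade the predator
            cand = B[k] + c1 * (2 * rng.random(m) - 1) * (1 - t / T_max) * B[k]
        else:                               # strategy 2: move toward the attacked zebra
            I2 = np.round(1 + rng.random(m))
            cand = B[k] + rng.random(m) * (attacked - I2 * B[k])
        c_fit = fitness(cand)
        if c_fit > scores[k]:
            B[k], scores[k] = cand, c_fit
    return B, scores
```

Because every fitness call entails a full carving pass, the octree acceleration of Section 3.2 is what keeps this optimization loop tractable.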
Step 5: Iteration Termination Condition Judgment. If $t < T_{\max}$, then set $t = t + 1$ and return to Step 2 to continue iterative optimization. If $t = T_{\max}$, terminate the iteration and output the global optimal position $B_{\mathrm{global}}$. Subsequently, perform region adjustment and 3D reconstruction based on $B_{\mathrm{global}}$ to obtain the final high-quality 3D imaging result.
In summary, the workflow of the region offset correction method based on projection optimization is illustrated in Figure 12.

4. Experiments and Results

To verify the effectiveness of the proposed 3D imaging method, a series of experiments were designed and conducted in this paper. Section 4.1 introduces the experimental datasets used in this study. In Section 4.2, the performance of the proposed semantic segmentation method is evaluated. In Section 4.3, an optical-ISAR joint observation system is adopted to analyze the effectiveness and robustness of the proposed 3D imaging method based on simulated data. Finally, Section 4.4 verifies the semantic segmentation performance and overall feasibility of the proposed method based on partial measured data.

4.1. Data Description

Due to the scarcity of measured optical and ISAR data for space targets, this paper adopts simulated data for experimental verification: space target optical images are rendered and generated based on the open-source software Blender (version 4.3) [53], and target radar echoes are simulated using the physical optics method [54], where the 3D target models and TLE orbital data are all obtained from public databases [55,56]. After setting the deployment positions of optical and ISAR devices, the rendering of optical images and the simulation of ISAR echoes are completed in combination with imaging parameters. The processes of optical data acquisition and ISAR data simulation are shown in Figure 13.
TLE data are among the most common data used for orbit simulation of space targets. The first line of a TLE record includes identification and time information, such as the target’s satellite catalog number, international designator, and element set epoch. The second line contains orbital parameters such as the orbit inclination, eccentricity, and right ascension of the ascending node [56]. Based on TLE data, we can accurately calculate the target’s orbital position in space within a given time period. In this paper, six sets of TLE data are utilized for data simulation, and the detailed parameters are listed in Table 1.
Considering that relying solely on co-located optical and ISAR devices may not be sufficient to complete target attitude estimation, this paper adopts the station deployment and observation setup in [36], adding a new ISAR device (ISAR2) in Beijing. The observation station positions used are listed in Table 2. It should be noted that ISAR2 is only used to ensure the accuracy of target attitude estimation, and its observation results are not used for 3D reconstruction. Based on the TLE data and the deployment positions of the observation stations, the first visible arcs of different targets relative to the observation stations are shown in Figure 14. It is worth noting that these arcs in Figure 14 do not occur simultaneously; ARC1-ARC6 represent the first visible arcs of TLE1-TLE6, respectively.
During optical image rendering, the observation geometry between the target and the station in the visible arc is first transformed to the OCS, and then Blender software is used for rendering. To emulate image degradation in real-world observation scenarios, defocus blur, motion blur, Gaussian noise, and salt-and-pepper noise are introduced into the simulated images. These artifacts, respectively, mimic image blurring caused by atmospheric turbulence, target smearing induced by long exposure, thermal noise from the optical system, and pixel defects in the sensor. All degradation parameters are randomized to cover diverse observational scenarios. The position and intensity of light source are likewise randomized to replicate varying illumination conditions. The simulation parameters for optical image rendering are listed in Table 3.
The radar imaging parameters employed in the ISAR echo simulation are listed in Table 4. By utilizing diverse imaging parameters during the echo simulation, the generated echo data exhibit rich characteristics, which can effectively enhance the performance of the feature extraction network, so that our method can be applied to optical-ISAR joint observation scenarios where each ISAR has different imaging parameters.
During the simulation of optical images and ISAR echoes, the theoretical positions of different target components in the IPP were automatically annotated based on imaging projection principles. This study selected eight space targets with distinct structural configurations. By integrating 3D models of these targets with TLE data and incorporating spin motion with randomized angular velocities (0.015–0.03 rad/s) and arbitrary spin-axis orientations, we conducted joint simulations of optical images and ISAR echoes to construct a semantic segmentation dataset. The dataset comprises 24,000 optical-ISAR image sets, with 3000 sets per target category. To meet the requirements of semantic segmentation network training and evaluation, the dataset was partitioned into training and testing sets at an 8:2 ratio for model parameter learning and generalization capability validation, respectively. Figure 15 displays the 3D models of the space targets, while Figure 16 and Figure 17 present representative simulated optical and ISAR images from the semantic segmentation dataset. In the experimental setup, the co-located deployment of optical and ISAR devices resulted in orthogonal and complementary target information characteristics between the two imaging modalities.
Meanwhile, to verify the performance of the proposed 3D imaging method, three arcs (ARC1, ARC3, and ARC5) were selected from the visible arcs in Figure 14. Combined with the 3D models of three targets (Aqua, Meteor, and TG-1), experimental scenarios for 3D reconstruction were established, respectively. In these experimental scenarios, the targets were set to rotate around the Z-axis of the OCS at a constant angular velocity of 0.015 rad/s. Optical image simulation and ISAR echo simulation were conducted based on the imaging geometric relationships among the ground-based optical device, ISAR device, and the targets. After aperture division and imaging processing, optical-ISAR observation images of three targets were obtained, with specific quantities as follows: 35 sets for Aqua, 55 sets for Meteor, and 27 sets for TG-1. This dataset is referred to as the 3D reconstruction dataset.
To investigate the influence of target motion states on 3D imaging, we analyzed the relative motion between the target and the observation stations under different motion states. The LOS and equivalent LOS variations of the observation stations in the OCS for the three-axis stabilized and spinning states are shown in Figure 18.
As can be seen from Figure 18, the spinning motion indeed affects the LOS variation, thereby influencing the ISAR imaging results. Consequently, the target’s spinning motion must be considered during the 3D reconstruction process. The estimation of the target’s spinning motion can be performed using the methods in [34,36], which is not the research focus of this paper and thus will not be further elaborated here.

4.2. Performance Validation of Semantic Segmentation

Binary segmentation can effectively extract the complete foreground region distribution of targets, achieving separation between foreground targets and background environments and thereby providing projection feature constraints for subsequent target 3D reconstruction. In contrast, component segmentation performs fine-grained partitioning of the internal target structure, decomposing the target into distinct independent components; it can intuitively present the spatial distribution and morphological characteristics of each part within the overall target and is conducive to structural and functional analysis of the target. In this section, we verify the performance of the proposed semantic segmentation method through experiments.
First, we train the segmentation networks using the semantic segmentation dataset and then validate the semantic segmentation performance of the proposed method in experimental scenarios using the 3D reconstruction dataset. For the experiments, we select three imaging periods (ARC1-5, ARC3-25, and ARC5-27) representing different target states including approach, overhead, and departure phases, respectively. Due to the extremely fine structures of some target components that are difficult to accurately extract from the overall target region, we categorize the segmentation areas into four classes: background, main body, solar panels, and other components, represented by black, red, green, and blue colors, respectively. The observation images and ground truth segmentation masks for these three imaging periods are shown in Figure 19.
To validate the performance of the proposed method, we conducted both comparative experiments and ablation experiments. In the comparative experiments, we compared our method with existing mainstream semantic segmentation methods to verify its superior performance in target region extraction tasks. For the ablation experiments, by incrementally introducing various structural improvements and analyzing performance changes, we quantitatively assessed the specific contributions of different structural enhancements to segmentation performance, thereby verifying the rationality of the design for each improvement module.

4.2.1. Performance Comparison and Analysis

To validate the effectiveness of the proposed semantic segmentation method, this experiment selected mainstream semantic segmentation methods such as Deeplabv3+ [57], FCN [58], CL-NL-Unet [45], Pix2pixGan [59], Segmenter [60], and Segformer [61] as comparative baselines. Semantic segmentation experiments were then conducted on optical and ISAR images, respectively. All semantic segmentation methods were trained on the semantic segmentation dataset using a personal computer equipped with a single RTX 5000 GPU and a CUDA 11.3 environment. Since the detailed training processes of the comparative methods have been elaborated in the relevant studies, they are not repeated here. The focus is on the training configuration of the proposed method: the Stochastic Gradient Descent (SGD) optimizer was used, with an initial learning rate of 0.01, momentum of 0.9, weight decay of 0.0005, and 300 epochs. Additionally, threshold-based binary segmentation methods [62,63] were employed to segment the observed images. The target region extraction results of the different semantic segmentation methods are shown in Figure 20, Figure 21 and Figure 22.
From the visual comparison of the target region extraction results from the observation images of three types of targets, it is evident that different methods exhibit significant differences in their effectiveness. Taking the optical images of Aqua as an example, methods such as FCN, Pix2pixGAN, Segmenter, and Segformer sometimes misclassify the main body as other components during the extraction of target regions, which may affect subsequent analysis of the target’s component composition. Meanwhile, Deeplabv3+ and FCN suffer from inaccuracies in the edges and contours of the extracted results, mistakenly classifying some background pixels as the target. Traditional threshold-based methods exhibit even more prominent issues: their extraction results not only contain numerous holes but also have discontinuous edge contours. Overall, deep learning-based target region extraction methods demonstrate stronger feature extraction and generalization capabilities, producing target regions that are more continuous and complete, with superior overall performance.
To quantitatively evaluate the performance of different semantic segmentation methods, we adopted metrics such as IOU and mean intersection over union (MIOU) for assessment. The definition of MIOU is as follows:
$$\mathrm{MIOU} = \frac{1}{H} \sum_{h=1}^{H} \mathrm{IOU}_h$$
where $\mathrm{IOU}_h$ represents the IOU of the $h$-th component category, $h = 1, 2, \cdots, H$, and $H$ is the total number of component categories. The calculation of the IOU can refer to Equation (24).
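For clarity, the metric can be computed as follows (NumPy; excluding the background class from the H component categories is our assumption about the convention used):

```python
import numpy as np

def miou(pred, label, num_classes):
    """Mean IOU over component categories; pred and label are integer
    class maps with class 0 reserved for the background."""
    ious = []
    for h in range(1, num_classes):
        inter = np.logical_and(pred == h, label == h).sum()
        union = np.logical_or(pred == h, label == h).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```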
We statistically calculated the average values of MIOU and IOU for the target region extraction results of different semantic segmentation methods in the 3D reconstruction dataset. The detailed results are shown in Table 5, where the values before and after the ‘/’ in each cell of the table correspond to the MIOU and IOU of the target region extraction results for optical and ISAR images, respectively.
As can be seen from Table 5, Deeplabv3+, FCN, and Pix2pixGan exhibit similar semantic segmentation performance, with their MIOU values all below 0.88 and IOU values all below 0.91. The semantic segmentation performances of the Segmenter and Segformer methods are somewhat improved compared with the aforementioned methods, and the MIOU values of optical images can reach more than 0.88. Due to the introduction of the NL module that can obtain global information, CL-NL-Unet has achieved a significant improvement in semantic segmentation performance, with the IOU values of both optical and ISAR images reaching more than 0.91. Meanwhile, the MIOU and IOU values of the proposed method for optical and ISAR images are both higher than those of the comparison methods, indicating that the proposed method achieves more accurate semantic segmentation and more complete extraction of target regions. The proposed method demonstrates superior performance compared to CL-NL-Unet, achieving significant improvements through two key innovations: (1) eliminating error accumulation from the conventional two-stage pipeline (binary segmentation followed by semantic segmentation), and (2) optimizing multi-scale skip connection mechanisms for enhanced feature extraction. In addition, compared with threshold-based segmentation methods, semantic segmentation network-based methods have higher accuracy in target region extraction, with IOU values all above 0.9. This indicates that semantic segmentation networks can effectively suppress sidelobe interference and noise, yielding more continuous and coherent target regions. The high-quality target regions provide a reliable guarantee for subsequent 3D imaging quality.

4.2.2. Ablation Experiment

To analyze the impact of each structural improvement on semantic segmentation performance, we used Unet as the baseline model and sequentially integrated the structural enhancement modules, namely IGPB, skip connection feature optimization (SCFO), and BSB, into the Unet architecture to construct new semantic segmentation networks. We then evaluated the target region extraction performance of each network on the 3D reconstruction dataset. The results are presented in Table 6, where the data before and after the '/' in each cell correspond to the results for optical images and ISAR images, respectively, and the values in parentheses represent the performance change of each network relative to the baseline model.
As can be seen from Table 6, the semantic segmentation performance improves progressively as the structural enhancements IGPB, SCFO, and BSB are introduced into the baseline network: the more structural improvements are applied, the better the segmentation performance of the corresponding network. Moreover, the Unet variants modified with IGPB, SCFO, or BSB individually all outperform the baseline model, verifying the effectiveness of each structural improvement.

4.3. Performance Verification of 3D Reconstruction

To verify the performance of the proposed 3D reconstruction method, a series of experiments are designed in this section. First, the effectiveness of the proposed method is verified. Second, the 3D imaging quality of the proposed method under different configurations is analyzed. Third, the proposed method is compared with existing 3D imaging methods through comparative experiments. Finally, the robustness of the proposed method is analyzed.

4.3.1. Effectiveness Validation

The previous section comprehensively validated the proposed semantic segmentation method; therefore, the target regions in the following experiments are extracted with the proposed semantic segmentation networks. To verify the effectiveness of the proposed 3D reconstruction method, we added offsets (ranging from 0 to 15 cells) to the target regions to simulate the region offsets caused by motion compensation errors and installation deviations.
In the experiments, the Contour Projection Reconstruction (CPR) method [30] and the proposed method were adopted for 3D imaging of Aqua, Meteor, and TG-1. The CPR method observes space targets with an optical-ISAR joint observation system and reconstructs the 3D regions of targets based on optical-ISAR contour fusion; however, it does not account for target region offsets. The proposed method adopts ZOA-based region adjustment and projection optimization to correct target region offsets and optimize the 3D region. The ZOA parameters are set as follows: the zebra population size $K$ is 50 and the maximum number of iterations $T_{\max}$ is 200. The reconstruction region is set as follows: the numbers of grid points along the three coordinate axes ($N_X$, $N_Y$, and $N_Z$) are all 300; for Aqua, the extents of the three axes ($L_X$, $L_Y$, and $L_Z$) are all 20 m, while for Meteor and TG-1 they are all 10 m. The projection consistency score threshold $\zeta$ is set to 95% of the maximum score based on empirical experience [15].
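As a rough illustration of the offset-correction step, the sketch below implements a simplified population-based search over integer per-image offsets that maximizes a projection consistency score. The `score_fn` callable and the move-toward-leader update are illustrative stand-ins for the full ZOA operators and the carving-based fitness used in the paper.

```python
import numpy as np

def zoa_like_offset_search(num_images, score_fn, pop_size=50, max_iter=200,
                           max_offset=15, seed=0):
    """Simplified ZOA-style search for per-image 2D region offsets (in cells).
    `score_fn(offsets)` is assumed to shift each target region, carve the 3D
    volume, and return the projection consistency score to be maximized."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(-max_offset, max_offset + 1, size=(pop_size, num_images, 2))
    fit = np.array([score_fn(z) for z in pop])
    for _ in range(max_iter):
        leader = pop[np.argmax(fit)]                     # best zebra so far
        step = rng.random(pop.shape) * (leader - pop)    # move toward leader
        cand = np.clip(np.rint(pop + step + rng.normal(0, 1, pop.shape)),
                       -max_offset, max_offset).astype(int)
        cand_fit = np.array([score_fn(z) for z in cand])
        better = cand_fit > fit                          # greedy acceptance
        pop[better], fit[better] = cand[better], cand_fit[better]
    return pop[np.argmax(fit)], fit.max()

# Toy check: with a separable score, the search recovers the true offsets.
true = np.array([[3, -2], [5, 1]])
best, _ = zoa_like_offset_search(2, lambda z: -np.abs(z - true).sum())
```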
The 3D imaging results obtained by CPR and the proposed method are shown in Figure 23. To quantitatively evaluate the 3D reconstruction quality of the two methods, we adopt reconstruction accuracy (RA), reconstruction integrity (RI) [27], and voxel IOU (VIOU) [15] as evaluation metrics; the corresponding values are presented in Table 7, where the three values in each cell correspond to RA, RI, and VIOU, respectively.
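With boolean occupancy grids for the reconstruction and the ground truth, the three metrics reduce to a few lines. The definitions below (RA as the fraction of reconstructed voxels that are true, RI as the fraction of true voxels recovered, and VIOU as their intersection over union) are our reading of [15,27] and are consistent with the subset/superset behavior discussed next.

```python
import numpy as np

def voxel_metrics(recon, truth):
    """RA (precision-like), RI (completeness-like) and VIOU over boolean grids."""
    inter = np.logical_and(recon, truth).sum()
    ra = inter / recon.sum()                      # correct fraction of reconstruction
    ri = inter / truth.sum()                      # recovered fraction of ground truth
    viou = inter / np.logical_or(recon, truth).sum()
    return ra, ri, viou
```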
As observed from Figure 23, the CPR method does not account for target region offsets and reconstructs 3D structures from offset contours, which introduces reconstruction deviations and degrades imaging quality. Compared with the ground truth, the 3D structures reconstructed by CPR exhibit varying degrees of structural loss in the solar panels and the main body, which adversely affects subsequent structural and functional analyses. In the reconstruction experiments on the different targets, the VIOU values of the CPR results are all below 55%; the reconstruction of Meteor shows the most severe structural loss, with a VIOU of only 41.05%. In contrast, the proposed method employs ZOA-based region adjustment and projection optimization to effectively correct target region offsets, ensuring high reconstruction accuracy. The reconstructed 3D structures are complete and highly consistent with the ideal 3D structures, achieving VIOU values above 80% for all targets. Meanwhile, Table 7 shows that the CPR reconstructions all achieve an RA above 91% while their RI remains below 56%. This occurs because the offset target regions cause the reconstructed region to approximate a subset of the true target region, yielding a high RA but a low RI.
The iteration curves and the probability distribution of region offset estimation errors during the region adjustment and projection optimization process are presented in Figure 24. As shown in Figure 24a–c, the proposed method consistently converges within 80 iterations, demonstrating the efficient search capability and rapid convergence of ZOA. Additionally, Figure 24d–f show that ZOA accurately estimates region offsets: only a few images fail to achieve precise offset compensation, and even for these the errors remain within 3 cells. These results verify the effectiveness of ZOA for target region offset correction.
Meanwhile, to verify the effectiveness of regional offset correction based on projection optimization, we analyzed the reconstructed reprojection regions of different methods. As shown in Figure 25, the first column presents the real target regions, the second column shows the reconstructed reprojection regions of CPR, and the third column displays the reconstructed reprojection regions of the proposed method.
As shown in Figure 25, target regions without offset correction introduce significant deviations in the 3D reconstructed regions, with structural losses of varying degrees in areas such as solar panels and the main body, resulting in large differences between the reconstructed reprojection regions and the real regions. In contrast, the proposed method optimizes and corrects target region offsets, eliminating deviations in the reconstructed regions, thus achieving extremely high similarity between the reconstructed reprojection regions and the real target regions. To quantitatively analyze the differences between the reprojection regions and the real target regions of different methods, this paper statistically calculated the RR-IOU of each method, with the results presented in Table 8.
As can be seen from the data in Table 8, in the 3D reconstruction experiments of different targets, the RR-IOUs of CPR are all lower than those of the proposed method. Moreover, the RR-IOUs of CPR are generally small, all below 0.55. In contrast, the RR-IOUs of the proposed method are all above 0.81 in all experiments, indicating that the reprojection regions of its reconstruction results have higher similarity with the real target regions. These results not only confirm the effectiveness of the region adjustment and projection optimization approach in correcting target region offsets but also validate the feasibility and superiority of the proposed method.
To verify the efficiency improvement brought by octree technology to 3D reconstruction, we statistically analyzed the processing time of CPR and the proposed method under different reconstructed region settings. The experiments were conducted on a personal computer equipped with an Intel (R) Core (TM) i7-10700 CPU @ 2.90 GHz and running the Ubuntu 20.04 operating system, with the runtime statistics presented in Table 9. It should be noted that this experiment only counted the processing time of the 3D reconstruction part based on octree-space carving, and the analysis of the overall runtime of the proposed method will be conducted in subsequent experiments.
As can be seen from Table 9, the proposed method has a shorter runtime than CPR, primarily due to the introduction of octree technology. The proposed method only needs to perform projection consistency verification on a small number of key nodes in the octree, eliminating the exhaustive traversal and per-voxel projection consistency verification required by CPR, and thus significantly improving 3D reconstruction efficiency. According to the time comparison, the proposed method improves operating efficiency by two orders of magnitude over CPR. In addition, the experimental data reveal a clear trend: the larger the voxel grid to be reconstructed, the more pronounced the advantage of the octree technology.
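The efficiency gain comes from rejecting or accepting whole octree cells with a single test and subdividing only cells that straddle a silhouette boundary. The following recursive sketch assumes a user-supplied `classify(center, half)` oracle that back-projects a cell into every target region and returns 'in', 'out', or 'mixed'; the oracle in the toy usage is a geometric stand-in, not the paper's projection test.

```python
import numpy as np

def carve_octree(center, half, classify, min_half, leaves):
    """Octree-based space carving (illustrative sketch).
    'out' cells are discarded with one test, 'in' cells are kept whole, and
    only 'mixed' cells are split, so consistency checks run on a small number
    of key nodes instead of every voxel of the reconstruction grid."""
    label = classify(center, half)
    if label == 'out':
        return                                    # whole subtree carved away
    if label == 'in' or half <= min_half:
        leaves.append((center, half))             # occupied leaf cell
        return
    for dx in (-0.5, 0.5):                        # subdivide into 8 children
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child = center + half * np.array([dx, dy, dz])
                carve_octree(child, half / 2, classify, min_half, leaves)

# Toy usage: carve a sphere of radius 4 m out of a 20 m cube (depth-8 octree).
def sphere_test(c, h):
    r, d = 4.0, np.linalg.norm(c)
    corner = h * 3 ** 0.5                         # half-diagonal of the cell
    return 'in' if d + corner <= r else 'out' if d - corner > r else 'mixed'

leaves = []
carve_octree(np.zeros(3), 10.0, sphere_test, 10.0 / 256, leaves)
```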
The above experiments have fully validated both the effectiveness and reconstruction efficiency of the proposed 3D imaging method. The results demonstrate that the introduction of octree technology significantly improves 3D reconstruction efficiency. Additionally, the experiments have verified the impact of region offset correction on 3D imaging quality. Based on these findings, subsequent experiments will focus on analyzing the influence of other factors on 3D imaging quality, with region offset factors no longer being considered.

4.3.2. Performance Analysis Under Different Configurations

In the 3D imaging process, factors such as the device configuration, target motion state, and accuracy of target region extraction all affect the final 3D imaging quality. To systematically analyze the impact of these factors, we designed four distinct experimental configurations based on the proposed method. These configurations are detailed in Table 10.
In the experiment, Config1 and Config2 observe space targets with only an ISAR device and only an optical device, respectively. Config3 uses optical-ISAR joint observation, with the target assumed to be in a three-axis stabilized state and no attitude estimation performed. Config4 employs optical-ISAR joint observation and conducts attitude estimation, but extracts the target region using the threshold method. It is worth noting that target region offset correction has been completed for all of the above configurations. The 3D reconstruction results obtained with the four configurations are presented in Figure 26, with the corresponding RA, RI, and VIOU values provided in Table 11.
From the results in Figure 26 and Table 11, it can be observed that the 3D reconstruction quality varies across configurations. Config1 and Config2 observe space targets with a single ISAR device and a single optical device, respectively; although they can coarsely reconstruct the 3D structure of the targets, the reconstruction quality is poor owing to the limited observation perspective of a single device, with all VIOU values below 65%. As shown in Figure 26a–f, significant noise regions appear around structural components such as the solar panels and the main body; these are ambiguous areas that cannot be distinguished from true target regions under limited observation perspectives, and they substantially interfere with structural and dimensional analysis. In contrast, the proposed method (as detailed in the previous section) employs optical-ISAR joint observation and effectively leverages the orthogonality between optical and ISAR observations to separate the true 3D regions of the target from the ambiguous regions. It thereby preserves structural features such as the main body and solar panels and significantly improves 3D imaging quality, with all VIOU values above 80%. Although Config3 also adopts optical-ISAR joint observation, it assumes an idealized three-axis stabilized attitude, resulting in a mismatch between its imaging projection model and the actual projection model of the spinning target. Consequently, Config3 fails to reconstruct the 3D structure effectively and retains only a very small portion of the target regions, which explains why its reconstruction performance is significantly worse than those of Config1 and Config2 despite using joint observation. Config4 also achieves relatively accurate 3D reconstruction; however, its results are relatively sparse, with numerous voids caused by pixel loss in the threshold-based target extraction, which erroneously removes valid spatial regions of the target structure and degrades the 3D imaging quality. As a result, its VIOU values are all lower than those of the proposed method.
Notably, under Config1, the 3D reconstruction of TG-1 achieves an RI exceeding 90% while its RA is only 53.07%. This occurs because the reconstruction approximately encompasses the true 3D structure of the target, leading to a high RI but a low RA. Combining the data in Table 10 and Table 11, it can be observed that, compared with RI and RA, VIOU provides a more objective evaluation of 3D reconstruction quality and better describes the consistency between the reconstructed region and the true target region.
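To make this concrete under the definitions assumed above: if a reconstruction $R$ fully encloses the true volume $T$ with $|R| = 2|T|$, then $\mathrm{RI} = |R \cap T|/|T| = 100\%$ while $\mathrm{RA} = |R \cap T|/|R| = 50\%$; conversely, a subset reconstruction with $|R| = 0.5|T|$ gives $\mathrm{RA} = 100\%$ but $\mathrm{RI} = 50\%$. In both cases $\mathrm{VIOU} = 50\%$, so VIOU penalizes over- and under-reconstruction alike, which is why it tracks reconstruction quality more faithfully than either metric alone.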
Similarly, we statistically analyzed the running time required for 3D reconstruction under different configurations, and the results are shown in Table 12. As can be seen from the table, since Config1 and Config2 utilize ISAR images and optical images for 3D reconstruction, respectively, their runtimes are approximately half of those in the experiments in Section 4.3.1. In contrast, both Config3 and Config4 adopt optical-ISAR joint 3D reconstruction, and their required runtimes are generally comparable to those in the experiment in Section 4.3.1.
Based on the above experimental results, the following conclusions can be drawn. First, the optical-ISAR joint observation system fully exploits the orthogonality of optical and ISAR observations, and its 3D imaging performance is significantly better than that of a single ISAR or optical device. Second, target attitude estimation is a prerequisite for constructing a correct imaging projection model; an incorrect projection relationship significantly degrades 3D imaging quality, making accurate attitude estimation essential for space targets. Moreover, the accuracy of target region extraction directly affects 3D imaging quality: more accurate and complete target region extraction leads to better reconstruction integrity and higher imaging quality.
Comprehensive analysis of the above experimental results demonstrates that the proposed method achieves superior 3D imaging performance. Specifically, the method adopts optical-ISAR joint observation for space targets, extracts target regions based on semantic segmentation networks, and performs target attitude estimation and regional offset correction. Therefore, the proposed method achieves high-quality 3D imaging results, with VIOU values higher than those of comparative methods with other configurations in the reconstruction experiments of the three targets. Through the above experiments, the performance of the proposed method is further verified.

4.3.3. Comparison Analysis

To further verify the performance of the proposed method, we compared it with existing ISAR 3D imaging methods, including the ISEA method [26], the spinning target ISEA (ST-ISEA) method [64], and the orthographic factorization method (OFM) [65]. To enable these methods to utilize optical images, all optical images were converted to ISAR images using an optical-to-ISAR transformation network [66,67]. Meanwhile, target region offset correction was completed for all methods, and it was assumed that all scattering centers of the target in the OFM were accurately extracted and associated. The 3D imaging results of the different space targets are shown in Figure 27, and the corresponding RA, RI, and VIOU values are given in Table 13.
From Figure 27 and Table 13, it can be seen that, similar to Config3 in the previous section, ISEA assumes by default that the target is in a three-axis stabilized state, resulting in a mismatched imaging projection relationship and only partial retention of the target regions. Moreover, since ISEA continuously eliminates the neighborhood of the projected regions of the reconstructed structure during reconstruction, the 3D regions obtained by ISEA are sparser than those of Config3, with VIOU values all somewhat lower than those of Config3. ST-ISEA jointly optimizes the attitude and 3D structure of the space target, enabling fundamentally correct reconstruction of the target's 3D geometry; its results are similar to those of Config4 but sparser, with more voids. This is because ST-ISEA likewise continuously eliminates the neighborhood of the projected regions, erroneously removing some spatial regions belonging to the target structure, so its VIOU values are all lower than those of Config4. For the OFM, the quality of the 3D reconstruction is high, and information such as the target's component structure is well presented. It should be noted that the VIOU of OFM was not calculated in this paper for two reasons: first, its reconstruction relies on the idealized assumption that all scattering centers are perfectly extracted and associated, which is difficult to satisfy in practical scenarios; second, restricted by the inherent characteristics of the OFM, the attitude of its 3D reconstruction is inconsistent with the real attitude, making a direct VIOU comparison invalid unless this discrepancy is accounted for.
Meanwhile, since all methods involve optimization processes, we also statistically analyzed the reconstruction time required by each method. For ISEA, ST-ISEA, and OFM, the statistics include only the duration of 3D reconstruction, excluding the time for image processing, scattering center extraction and matching, target region extraction, and region offset correction; for the proposed method, the complete time consumption is counted. The ISEA and ST-ISEA methods were configured with 80 particles and 1500 points to be reconstructed. The running times of the different methods are shown in Table 14.
As can be seen from Table 14, ISEA, ST-ISEA, and the proposed method all involve optimization processes and thus consume more time. ISEA only optimizes the target region to be reconstructed, resulting in the shortest processing time among these three methods, with the processing time for all three targets not exceeding 3000 s. ST-ISEA must simultaneously optimize the target attitude and the reconstructed region, increasing its time consumption relative to ISEA, with an average running time of 4394.68 s. Since the proposed method performs region adjustment and projection optimization during 3D reconstruction, it has the longest time consumption of the three, with the processing time for all three targets exceeding 5400 s. By comparison, OFM takes the shortest time overall, with an average running time of only 14.62 s; this is because only the factorization-based 3D reconstruction is timed, excluding time-consuming steps such as scattering center extraction and matching.
Based on the comprehensive analysis of the above experimental results, the proposed method is relatively time-consuming due to the involvement of regional adjustment and projection optimization in the entire process. Nevertheless, it ensures the quality of target 3D reconstruction and enables accurate presentation of the structural features of the target components. Compared to obtaining low-quality, unknown-pose 3D imaging results with shorter runtime, acquiring high-quality, pose-consistent 3D imaging results with relatively longer processing time is more meaningful for extracting target 3D structural information.

4.3.4. Statistical Analysis

To verify the robustness of the proposed method, this study adopts Monte Carlo simulation to analyze the impacts of the number of images, image interval, number of offset images, and target region offset value on reconstruction quality. First, to investigate the effects of the number of images and the image interval, two sets of experiments are designed: the first set randomly selects different numbers of images from all observed images of the three targets for 3D imaging, with each image-count setting repeated 50 times; the second set adopts different observation image intervals during the reconstruction of the three targets, with each interval setting likewise repeated 50 times. The 3D reconstruction region settings in both sets of experiments are consistent with those in Section 4.3.1, and VIOU, which evaluates 3D reconstruction quality more objectively, is used as the evaluation metric. The relevant experimental results are shown in Figure 28.
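The two experiment sets follow a standard Monte Carlo protocol; the driver below sketches it, assuming a hypothetical `reconstruct_and_score(images)` routine that runs the full pipeline on the chosen subset and returns VIOU.

```python
import numpy as np

def mc_image_count(images, counts, reconstruct_and_score, trials=50, seed=0):
    """Mean/std of VIOU vs. number of images, each setting repeated `trials` times."""
    rng = np.random.default_rng(seed)
    stats = {}
    for n in counts:
        scores = [reconstruct_and_score([images[i] for i in
                                         rng.choice(len(images), n, replace=False)])
                  for _ in range(trials)]
        stats[n] = (np.mean(scores), np.std(scores))
    return stats

def mc_image_interval(images, intervals, reconstruct_and_score, trials=50, seed=0):
    """Mean/std of VIOU vs. frame interval, with a random starting phase per trial."""
    rng = np.random.default_rng(seed)
    stats = {}
    for k in intervals:
        scores = [reconstruct_and_score(images[rng.integers(0, k)::k])
                  for _ in range(trials)]
        stats[k] = (np.mean(scores), np.std(scores))
    return stats
```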
It can be seen from Figure 28a–c that as the number of images used for reconstruction increases, the VIOU value continuously rises, indicating a steady improvement in 3D imaging quality. This is because more observed images provide richer observation perspectives, which helps distinguish the real structural region of the target from the blurred region and ultimately improves 3D reconstruction quality. Meanwhile, Figure 28d–f reveal that smaller intervals between reconstruction images yield higher-quality 3D imaging. Specifically, when the image interval is 1, the VIOU value of the 3D reconstruction results remains above 0.65. However, when the image interval increases to 6, the imaging quality deteriorates significantly, with the VIOU value dropping below 0.55. This phenomenon can be explained as follows: when the image interval is 1, the images used for 3D reconstruction still have relatively abundant observation perspectives, which can cover most of the observation features required for target 3D reconstruction, with only a small loss of detailed target features. In contrast, as the image interval continues to increase, the number of images involved in 3D reconstruction decreases significantly, resulting in a substantial lack of observation features needed for target 3D reconstruction. This ultimately leads to a significant decline in 3D imaging quality, with the VIOU value being significantly lower than that when the interval is 1.
Meanwhile, to analyze the impacts of factors such as the proportion of offset images and target region offset value on the reconstruction quality of the proposed method, experiments were repeated 50 times under different settings of the proportion of offset images and target region offset values, and the relevant experimental results are shown in Figure 29. In Figure 29, different colors and line styles correspond to different proportions of offset images.
As can be seen from Figure 29, the 3D reconstruction quality of space targets declines as the number of offset images and the target region offset increase. Specifically, when the target region offset is 1 cell, the reconstruction quality degrades only moderately as the offset image ratio grows, maintaining a VIOU above 0.7. However, when the target region offset reaches 15 cells, even a modest 10% offset image ratio causes severe deterioration in 3D imaging quality. From the principle of 3D imaging, this phenomenon can be explained as follows: the reconstructed 3D region is the intersection of the back projections of the target regions in 3D space. When the target region offset is small, only partial edge structural features of the reconstructed region are lost. When the offset increases to 15 cells, however, the intersection of the back projections shrinks substantially, resulting in serious loss of the target's 3D structure and a significant decline in imaging quality. As shown in Section 4.3.1, the proposed method achieves a target region offset correction accuracy of 3 cells, which guarantees good 3D imaging quality.

4.4. Validation of Measured Data

To validate the effectiveness of the proposed method on measured data, we conducted experiments using publicly available measured data. It should be noted that, since these data were acquired through non-simultaneous observations by optical and ISAR devices and provide no observation LOS information, full 3D imaging experiments cannot be performed directly with the proposed method. Therefore, this experiment focuses on verifying the target region extraction performance of the semantic segmentation networks on the measured data, as well as the feasibility of the proposed method for 3D imaging.
Specifically, the optical measured data used in the experiment were acquired by an amateur telescope, with the Chinese Space Station (CSS) as the observed target [19]. The ISAR measured data were obtained from the tracking and imaging radar (TIRA) in Germany, with TG-1 as the imaging target [68]. As shown in Figure 30, to construct the measured dataset, different modules in the CSS observation images were segmented to extract solar panel-body structures similar to those of space targets, yielding an equivalent sequence of optical images of space targets. Similarly, frames were extracted sequentially from the measured video files of TG-1 and then cropped to obtain a sequence of ISAR images. Finally, data augmentation was performed through transformations such as rotation, scaling, and flipping, yielding 1000 optical images and 1000 ISAR images in total. Among these, 900 frames of each image type were used for training and the remaining 100 frames for validation. The component region distributions of the measured data were manually labeled based on experience, with some of the measured observation images presented in Figure 31. The target region extraction results of the proposed method are shown in Figure 32.
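A minimal sketch of the augmentation step is given below, using OpenCV warps applied identically to each image and its label mask; the angle and scale ranges are illustrative assumptions, as the exact ranges used to expand the measured set to 1000 frames are not specified here.

```python
import cv2
import numpy as np

def augment_pair(image, mask, rng):
    """One random rotation + scaling + optional flip applied to an image-mask pair.
    Masks use nearest-neighbour interpolation so component labels stay discrete."""
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2),
                                angle=rng.uniform(-180, 180),
                                scale=rng.uniform(0.8, 1.2))
    img_aug = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    msk_aug = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    if rng.random() < 0.5:                        # random horizontal flip
        img_aug, msk_aug = img_aug[:, ::-1], msk_aug[:, ::-1]
    return img_aug, msk_aug

rng = np.random.default_rng(0)
# expand each labeled source frame into several augmented training samples, e.g.:
# samples = [augment_pair(frame, label, rng) for _ in range(10)]
```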
As can be seen from Figure 32, the proposed method can effectively extract the target region and perform component segmentation, and visual comparison with the ground truth masks shows that the component segmentation results are relatively accurate. To quantitatively evaluate the semantic segmentation performance of the proposed method, target region extraction experiments were conducted on the 100 ISAR images of the validation data, and the average MIOU and IOU values were calculated; the results are shown in Table 15.
As can be seen from Table 15, the proposed method accurately extracts target regions from measured images, with MIOU and IOU reaching above 0.88 and 0.91, respectively. These results demonstrate the effectiveness of the proposed semantic segmentation networks on measured data. At the same time, the segmentation accuracy decreases somewhat compared with the simulated data, which is attributed to the higher complexity of the measured data. However, as more measured data become available for model training, the network will capture the data characteristics more comprehensively, and the segmentation performance is expected to improve.
Since the optical device and the ISAR device observed different targets and the LOS information during observation is unavailable, the proposed method cannot be directly applied for 3D imaging. Therefore, we adopt the observation information in Section 4.3.1 as equivalent observation data for both the optical and ISAR devices to verify the feasibility of the proposed method for 3D reconstruction from measured data. Figure 33 shows the 3D reconstructed regions obtained using the equivalent observation information.
As can be seen from Figure 33, the proposed method can reconstruct the 3D region of the target from the observation data. However, since only a single-frame image is used for reconstruction, the target region exhibits high resolution when observed along the normal direction but relatively low resolution in non-normal directions, resulting in a large number of blurred areas. It is worth noting that if the real observation information of each frame were available, fusing the per-frame reconstructions would effectively exploit the geometric constraints from different perspectives; this would largely eliminate the blurred areas of single-frame reconstruction and ultimately achieve high-quality 3D reconstruction of the target.
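The suggested multi-frame fusion amounts to intersecting the per-frame back-projection volumes once they are expressed in a common target-fixed grid; a minimal sketch, assuming each frame's reconstruction is available as a boolean occupancy array:

```python
import numpy as np

def fuse_frames(volumes):
    """Keep only voxels consistent with every frame's back-projected region,
    pruning the ambiguity left along each single frame's projection direction."""
    fused = volumes[0].copy()
    for v in volumes[1:]:
        np.logical_and(fused, v, out=fused)
    return fused
```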

5. Discussion

Benefiting from the octree technology and the target region offset correction technology based on projection optimization, the proposed 3D reconstruction method can achieve higher-quality 3D imaging results in more general scenarios. The experimental results in the above sections fully verify the effectiveness and robustness of the proposed 3D reconstruction method, indicating that it can achieve high-quality 3D imaging even when target region offsets exist. At the same time, experiments using measured data also fully prove the feasibility of the proposed method in the application of 3D imaging processing for measured data.

6. Conclusions

To acquire the 3D structural information of space targets, this paper proposes a 3D imaging method based on optical-ISAR joint observation. This method ensures the robustness of space target attitude estimation through extended optical-ISAR joint observation and employs the proposed semantic segmentation networks to automatically extract target regions from optical observation images and ISAR images. By combining octree technology and space carving technology, the efficiency of 3D reconstruction is improved, and the ZOA is adopted to achieve effective correction of target region offsets, ultimately achieving high-quality 3D imaging in complex motion states. Relevant experiments have verified the effectiveness and robustness of the proposed method. Compared with existing methods, the proposed method has four major advantages: strong robustness in attitude estimation, high automation in target region extraction, high efficiency in 3D reconstruction, and accurate correction of target region offsets, making it suitable for the 3D reconstruction of most spacecraft in complex motion states. Additionally, the method exhibits good scalability and can be easily extended to more complex observation systems with a larger number of observation devices and richer types of observation platforms.
However, due to the difficulty in obtaining measured data, all 3D imaging experiments of space targets in this paper are carried out based on simulated data. As for measured data, only equivalent verification and feasibility analysis have been performed. In the future, more publicly available optical images and ISAR observation images can be incorporated to conduct 3D imaging experiments of space targets, further verifying and enhancing the practicality of the method.

Author Contributions

Conceptualization, J.L., C.Y. and Y.Z.; Methodology, J.L., C.X. and Q.Z.; Software, J.L., X.Z. and H.F.; Visualization, C.Y., C.X. and H.F.; Writing—original draft, J.L., Y.Z. and H.F.; Writing—review and editing, J.L., H.F. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the editors of Remote Sensing and the anonymous reviewers for their patience, helpful remarks, and useful feedback.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Paolozzi, A.; Porfilio, M.; Currie, D.G.; Dantowitz, R.F. The SPQR experiment: Detecting damage to orbiting spacecraft with ground-based telescopes. Microgravity Sci. Technol. 2007, 19, 65–69. [Google Scholar] [CrossRef]
  2. Graziani, F.; Paolozzi, A.; Sindoni, G.; Felli, F.; Brotzu, A. Improving the imaging of the ISS through the SPQR experiment. In Proceedings of the International Astronautical Federation—56th International Astronautical Congress 2005, Fukuoka, Japan, 17–21 October 2005. [Google Scholar]
  3. Jiang, S.T.; Ren, X.Y.; Wang, C.Y.; Jiang, L.B.; Wang, Z. A Self-Supervised Feature Point Detection Method for ISAR Images of Space Targets. Remote Sens. 2025, 17, 441. [Google Scholar] [CrossRef]
  4. Wang, L.Z.; Wang, L.; Conde, M.H.; Zhu, D.Y. A Novel Enhanced Convolutional Dictionary Learning Method for CS ISAR Imaging. IEEE Geosci. Remote Sens. Lett. 2025, 22, 3502905. [Google Scholar] [CrossRef]
  5. Zhang, H.P.; Wei, Q.M.; Jiang, Z.G. 3D Reconstruction of Space Objects from Multi-Views by a Visible Sensor. Sensors 2017, 17, 1689. [Google Scholar] [CrossRef] [PubMed]
  6. Fu, T.; Zhou, Y.; Wang, Y.; Liu, J.; Zhang, Y.M.; Kong, Q.L.; Chen, B. Neural Field-Based Space Target 3D Reconstruction with Predicted Depth Priors. Aerospace 2024, 11, 997. [Google Scholar] [CrossRef]
  7. Issitt, A.; Mahendrakar, T.; Alvarez, A.; White, R.T.; Sizemore, A. On Optimal Observation Orbits for Learning Gaussian Splatting-Based 3D Models of Unknown Resident Space Objects. In Proceedings of the AIAA SCITECH 2025 Forum, Orlando, FL, USA, 6–10 January 2025. [Google Scholar]
  8. Lu, C.X.; Yu, M.; Cui, H.T. Fast 3D Reconstruction for Unknown Space Targets via Neural Radiance Field. J. Phys. Conf. Ser. 2024, 2762, 012004. [Google Scholar] [CrossRef]
  9. Zhou, C.X.; Jiang, L.B.; Yang, Q.W.; Ren, X.Y.; Wang, Z. High Precision Cross-Range Scaling and 3D Geometry Reconstruction of ISAR Targets Based on Geometrical Analysis. IEEE Access 2020, 8, 132415–132423. [Google Scholar] [CrossRef]
  10. Xu, D.; Bie, B.W.; Sun, G.C.; Xing, M.D.; Pascazio, V. ISAR Image Matching and Three-Dimensional Scattering Imaging Based on Extracted Dominant Scatterers. Remote Sens. 2020, 12, 2699. [Google Scholar] [CrossRef]
  11. Kang, L.; Luo, Y.; Zhang, Q.; Liu, X.W.; Liang, B.S. 3D scattering image reconstruction based on measurement optimization of a radar network. Remote Sens. Lett. 2020, 11, 697–706. [Google Scholar] [CrossRef]
  12. Gong, R.; Wang, L.; Zhu, D.Y. A novel approach for squint InISAR imaging with dual-antenna configuration. Digit. Signal Process. 2022, 127, 103592. [Google Scholar] [CrossRef]
  13. Shao, S.; Yan, H.D.; Liu, H.W. Squint InISAR Imaging of Group Targets with Dual-Optimization-Driven Method for True and False Marine Targets Recognition. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 6399–6416. [Google Scholar] [CrossRef]
  14. Zhou, Y.J.; Zhang, L.; Xing, C.; Xie, P.F.; Cao, Y.H. Target Three-Dimensional Reconstruction From the Multi-View Radar Image Sequence. IEEE Access 2019, 7, 36722–36735. [Google Scholar] [CrossRef]
  15. Long, B.; Tang, P.L.; Wang, F.; Jin, Y.Q. 3-D Reconstruction of Space Target Based on Silhouettes Fused ISAR–Optical Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5213919. [Google Scholar] [CrossRef]
  16. Shen, T.W.; Zhu, S.Y.; Fang, T.; Zhang, R.Z.; Quan, L. Graph-Based Consistent Matching for Structure-from-Motion. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016; pp. 139–155. [Google Scholar]
  17. Furukawa, Y.; Ponce, J. Accurate, Dense, and Robust Multiview Stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1362–1376. [Google Scholar] [CrossRef]
  18. Xu, Q.X.; Hu, M.; Fang, Y.Q.; Zhang, X.T. A neural radiance fields method for 3D reconstruction of space target. Adv. Space Res. 2025, 75, 6924–6943. [Google Scholar] [CrossRef]
  19. Chang, Z.M.; Liu, B.Y.; Xia, Y.F.; Guo, Y.M.; Shi, B.X.; Sun, H. Reconstructing Satellites in 3D from Amateur Telescope Images. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 1–12. [Google Scholar] [CrossRef] [PubMed]
  20. Jiao, Z.K.; Ding, C.B.; Chen, L.Y.; Zhang, F.B. Three-Dimensional Imaging Method for Array ISAR Based on Sparse Bayesian Inference. Sensors 2018, 18, 3563. [Google Scholar] [CrossRef]
  21. Zhou, X.P.; Wei, G.H.; Wu, S.L.; Wang, D.W. Three-Dimensional ISAR Imaging Method for High-Speed Targets in Short-Range Using Impulse Radar Based on SIMO Array. Sensors 2016, 16, 364. [Google Scholar] [CrossRef]
  22. Zhou, P.; Wu, J.C.; Zheng, J.H.; Zhang, Z.H.; Zhang, X.; Liu, G.W.; Zhang, J. 3-D Imaging Method of Ship Target for InISAR Based on Second Selection of Time Window. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 23387–23396. [Google Scholar] [CrossRef]
  23. Hou, K.F.; Fan, H.Y.; Liu, Q.H.; Ren, L.X.; Mao, E.K. Three-Dimensional Reconstruction of Target Based on Phase-Derived Technology. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4102612. [Google Scholar] [CrossRef]
  24. Gong, R.; Wang, L.; Wu, B.; Zhang, G.; Zhu, D.Y. Optimal Space-Borne ISAR Imaging of Space Objects with Co-Maximization of Doppler Spread and Spacecraft Component Area. Remote Sens. 2024, 16, 1037. [Google Scholar] [CrossRef]
  25. Xu, D.; Wang, X.; Wu, Z.X.; Fu, J.X.; Zhang, Y.H.; Chen, J.L.; Xing, M.D. Space Target 3-D Reconstruction Using Votes Accumulation Method of ISAR Image Sequence. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10502–10514. [Google Scholar] [CrossRef]
  26. Liu, L.; Zhou, Z.B.; Zhou, F.; Shi, X.R. A New 3-D Geometry Reconstruction Method of Space Target Utilizing the Scatterer Energy Accumulation of ISAR Image Sequence. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8345–8357. [Google Scholar] [CrossRef]
  27. Zhou, Z.B.; Liu, L.; Du, R.Z.; Zhou, F. Three-Dimensional Geometry Reconstruction Method for Slowly Rotating Space Targets Utilizing ISAR Image Sequence. Remote Sens. 2022, 14, 1144. [Google Scholar] [CrossRef]
  28. Chen, R.D.; Jiang, Y.C.; Liu, Z.T.; Zhang, Y.; Jiang, B. A novel spaceborne ISAR imaging approach for space target with high-order translational motion compensation and spatial variant MTRC correction. Int. J. Remote Sens. 2023, 44, 6549–6578. [Google Scholar] [CrossRef]
  29. Liu, C.; Luo, Y.H.; Yu, Z.J. A Robust Translational Motion Compensation Method for Moving Target ISAR Imaging Based on Phase Difference-Lv’s Distribution and Auto-Cross-Correlation Algorithm. Remote Sens. 2024, 16, 3554. [Google Scholar] [CrossRef]
  30. Zhou, Y.J.; Zhang, L.; Cao, Y.H.; Huang, Y. Optical-and-radar Image Fusion for Dynamic Estimation of Spin Satellites. IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 2020, 29, 2963–2976. [Google Scholar] [CrossRef]
  31. Wang, C.Y.; Jiang, L.B.; Ren, X.L.; Zhong, W.J.; Wang, Z. Automatic Instantaneous Attitude Estimation Framework for Spacecraft Based on Colocated Optical/ISAR Observation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 3502005. [Google Scholar] [CrossRef]
  32. Pittet, J.-N.; Šilha, J.; Schildknecht, T. Spin motion determination of the Envisat satellite through laser ranging measurements from a single pass measured by a single station. Adv. Space Res. 2018, 61, 1121–1131. [Google Scholar] [CrossRef]
  33. Song, C.; Lin, H.Y.; Zhao, C.Y. Analysis of Envisat’s rotation state using epoch method. Adv. Space Res. 2020, 66, 2681–2688. [Google Scholar] [CrossRef]
  34. Li, J.S.; Zhang, Y.S.; Yin, C.B.; Xu, C.; Li, P.J.; He, J. A Novel Joint Motion Compensation Algorithm for ISAR Imaging Based on Entropy Minimization. Sensors 2024, 24, 4332. [Google Scholar] [CrossRef]
  35. Li, J.S.; Yin, C.B.; Xu, C.; He, J.; Li, P.J.; Zhang, Y.S. Attitude Estimation of Spinning Space Targets Utilizing Multistatic ISAR Joint Observation. Remote Sens. 2025, 17, 2263. [Google Scholar] [CrossRef]
  36. Du, R.Z.; Liu, L.; Bai, X.; Zhou, X.; Zhou, F. Instantaneous Attitude Estimation of Spacecraft Utilizing Joint Optical-and-ISAR Observation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5112114. [Google Scholar] [CrossRef]
  37. Wang, C.Y.; Jiang, L.B.; Zhong, W.J.; Ren, X.Y.; Wang, Z. Staring-imaging satellite pointing estimation based on sequential ISAR images. Chin. J. Aeronaut. 2024, 37, 261–276. [Google Scholar] [CrossRef]
  38. Chen, M.; Wang, T.F.; Xu, C.C.; Chen, J.; Chen, E.P.; Pan, Z.S. Gradient Prior Guidance and Image Adaptation Enhancement for Semi-Supervised SAR Ship Instance Segmentation. IEEE Sens. J. 2024, 24, 36216–36229. [Google Scholar] [CrossRef]
  39. Piramanayagam, S.; Cutler, P.J.; Schwartzkopf, W.; Koehler, F.W.; Saber, E. Application of gradient based image segmentation to SAR imagery. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4316–4319. [Google Scholar]
  40. Li, C.K. SAR image segmentation based on different gradients with nonlocal mean filter. In Proceedings of the Thirteenth International Conference on Digital Image Processing (ICDIP 2021), Singapore, 30 June 2021. [Google Scholar]
  41. Pan, S.M.; Tao, Y.L.; Nie, C.C.; Chong, Y.W. PEGNet: Progressive Edge Guidance Network for Semantic Segmentation of Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 637–641. [Google Scholar] [CrossRef]
  42. Pan, P.; Zhang, C.X.; Sun, J.B.; Guo, L.N. Multi-scale conv-attention U-Net for medical image segmentation. Sci. Rep. 2025, 15, 12041. [Google Scholar] [CrossRef]
  43. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8 September 2018; pp. 3–19. [Google Scholar]
  44. Wang, X.L.; Girshick, R.; Gupta, A.; He, K.M. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
  45. Kou, P.; Qiu, X.F.; Liu, Y.X.; Zhao, D.J.; Li, W.J.; Zhang, S.H. ISAR Image Segmentation for Space Target Based on Contrastive Learning and NL-Unet. IEEE Geosci. Remote Sens. Lett. 2023, 20, 3506105. [Google Scholar] [CrossRef]
  46. Zhu, X.L.; Zhang, Y.S.; Lu, W.; Fang, Y.Q.; He, J. An ISAR Image Component Recognition Method Based on Semantic Segmentation and Mask Matching. Sensors 2023, 23, 7955. [Google Scholar] [CrossRef]
  47. Ni, P.S.; Liu, Y.Y.; Pei, H.; Du, H.Z.; Li, H.L.; Xu, G. CLISAR-Net: A Deformation-Robust ISAR Image Classification Network Using Contrastive Learning. Remote Sens. 2023, 15, 33. [Google Scholar] [CrossRef]
  48. Montenegro, A.A.; Carvalho, P.C.P.; Velho, L. Space carving with a hand-held camera. In Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing, Curitiba, Brazil, 17–20 October 2004; pp. 396–403. [Google Scholar]
  49. Wang, C.; Yang, W.M.; Liao, Q.M. A Space Carving Based Reconstruction Method Using Discrete Viewing Edges. In Proceedings of the 2013 Seventh International Conference on Image and Graphics, Qingdao, China, 26–28 July 2013; pp. 607–611. [Google Scholar]
  50. Saftly, W.C.P.; Baes, M.; Gordon, K.D.; Vandewoude, S.; Rahimi, A.; Stalevski, M. Using hierarchical octrees in Monte Carlo radiative transfer simulations. Astron. Astrophys. 2013, 554, A10. [Google Scholar] [CrossRef]
  51. Li, J.Z.; Wen, Z.Y.; Zhang, L.; Hu, J.B.; Hou, F.; Zhang, Z.B.; He, Y. GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting. In Proceedings of the Computer Graphics Forum, Strasbourg, France, 22–26 April 2024. [Google Scholar]
  52. Trojovská, E.; Dehghani, M.; Trojovský, P. Zebra Optimization Algorithm: A New Bio-Inspired Optimization Algorithm for Solving Optimization Problems. IEEE Access 2022, 10, 49445–49473. [Google Scholar] [CrossRef]
  53. Blender-The Free and Open Source 3D Creation Suite. Available online: https://www.blender.org/ (accessed on 27 October 2025).
  54. Boag, A. A fast physical optics (FPO) algorithm for high frequency scattering. IEEE Trans. Antennas Propag. 2004, 52, 197–204. [Google Scholar] [CrossRef]
  55. 3D Resources-NASA Science. Available online: https://nasa3d.arc.nasa.gov/models (accessed on 27 October 2025).
  56. Space-Track.org. Available online: https://www.space-track.org/ (accessed on 27 October 2025).
  57. Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 833–851. [Google Scholar]
  58. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef]
  59. Wang, J.D.; Li, Y.H.; Du, L.; Song, M.; Xing, M.D. Joint Estimation of Satellite Attitude and Size Based on ISAR Image Interpretation and Parametric Optimization. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5103817. [Google Scholar] [CrossRef]
  60. Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for Semantic Segmentation. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar]
  61. Xie, E.Z.; Wang, W.H.; Yu, Z.D.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. In Proceedings of the Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Montreal, QC, Canada, 6–14 December 2021. [Google Scholar]
  62. Javadi, S.H.; Sahli, H.; Bourdoux, A. Rayleigh-based segmentation of ISAR images. Appl. Opt. 2023, 62, F1–F7. [Google Scholar] [CrossRef]
  63. Singh, T.R.; Roy, S.; Singh, O.I.; Sinam, T.; Singh, K.M. A New Local Adaptive Thresholding Technique in Binarization. Int. J. Comput. Sci. 2011, 8, 271–277. [Google Scholar]
  64. Zhou, Z.B.; Jin, X.G.; Liu, L.; Zhou, F. Three-Dimensional Geometry Reconstruction Method from Multi-View ISAR Images Utilizing Deep Learning. Remote Sens. 2023, 15, 1882. [Google Scholar] [CrossRef]
  65. Wang, F.; Xu, F.; Jin, Y.Q. Three-Dimensional Reconstruction From a Multiview Sequence of Sparse ISAR Imaging of a Space Target. IEEE Trans. Geosci. Remote Sens. 2017, 56, 611–620. [Google Scholar] [CrossRef]
  66. Zhao, Y.X.; Liao, H.Z.; Kong, D.R.; Yang, Z.X.; Xia, J.Y. SAIG: Semantic-Aware ISAR Generation via Component-Level Semantic Segmentation. IEEE Geosci. Remote Sens. Lett. 2025, 22, 3504405. [Google Scholar] [CrossRef]
  67. Zhou, Y.J.; Li, W.F.; Wei, S.P.; Wang, G.Y.; Zhang, W.A. Satellite Dynamic Estimation Utilizing Feature Translation Across Optical and Radar Imagery. IEEE Trans. Instrum. Meas. 2025, 74, 8502114. [Google Scholar] [CrossRef]
  68. Monitoring the Re-Entry of the Chinese Space Station Tiangong-1 with TIRA. Available online: https://www.fhr.fraunhofer.de/en/businessunits/space/monitoring-the-re-entry-of-the-chinese-space-station-tiangong-1-with-tira.html (accessed on 1 April 2018).
Figure 1. The imaging geometry of the optical-ISAR joint observation system.
Figure 2. Definition of LOS.
Figure 3. Geometry of motion vectors for space targets.
Figure 4. Joint observation imaging projection geometry for space targets.
Figure 5. Flowchart of the proposed method.
Figure 6. Network structure of OpticalSegNet.
Figure 7. Structure diagram of FFM.
Figure 8. Structure diagram of ACAM.
Figure 9. Structure of ISARSegNet.
Figure 10. Flowchart of the octree-space carving-based 3D reconstruction method.
Figure 11. Flowchart of octree construction.
Figure 12. Flowchart of the region offset correction method based on projection optimization.
Figure 13. Simulation process of experimental data: (a) optical imaging; (b) ISAR imaging.
Figure 14. Geometric relationships between the space targets and observation stations.
Figure 15. Three-Dimensional models of the space targets used: (a) Aqua, (b) CALIPSO, (c) CloudSat, (d) COROT, (e) Jason-1, (f) Meteor, (g) Sentinel, and (h) TG-1.
Figure 16. Partial simulated optical images: (a) Aqua, (b) CALIPSO, (c) CloudSat, (d) COROT, (e) Jason-1, (f) Meteor, (g) Sentinel, and (h) TG-1.
Figure 17. Partial simulated ISAR images: (a) Aqua, (b) CALIPSO, (c) CloudSat, (d) COROT, (e) Jason-1, (f) Meteor, (g) Sentinel, and (h) TG-1.
Figure 18. LOS and equivalent LOS variations in the observation stations: (a–c) LOS variations in ARC1, ARC3 and ARC5; (d–f) equivalent LOS variations in ARC1, ARC3 and ARC5.
Figure 19. Observation images and ground truth semantic segmentation masks: (a–c) optical images of Aqua, Meteor and TG-1; (d–f) ISAR images of Aqua, Meteor and TG-1; (g–i) semantic segmentation masks for (a–c); and (j–l) semantic segmentation masks for (d–f).
Figure 20. Semantic segmentation results of Aqua's optical images (a–h) and ISAR images (i–p) using different methods: (a,i) Deeplabv3+, (b,j) FCN, (c,k) CL-NL-Unet, (d,l) Pix2pixGAN, (e,m) Segmenter, (f,n) Segformer, (g,o) Proposed, and (h,p) Threshold.
Figure 21. Semantic segmentation results of Meteor's optical images (a–h) and ISAR images (i–p) using different methods: (a,i) Deeplabv3+, (b,j) FCN, (c,k) CL-NL-Unet, (d,l) Pix2pixGAN, (e,m) Segmenter, (f,n) Segformer, (g,o) Proposed, and (h,p) Threshold.
Figure 22. Semantic segmentation results of TG-1's optical images (a–h) and ISAR images (i–p) using different methods: (a,i) Deeplabv3+, (b,j) FCN, (c,k) CL-NL-Unet, (d,l) Pix2pixGAN, (e,m) Segmenter, (f,n) Segformer, (g,o) Proposed, and (h,p) Threshold.
Figure 23. Comparison of 3D reconstruction results: (a–c) reconstruction results of the CPR method for Aqua, Meteor and TG-1; (d–f) reconstruction results of the proposed method for Aqua, Meteor and TG-1; (g–i) ideal 3D structures of Aqua, Meteor, and TG-1. The colors represent the elevation of the target voxels.
Figure 24. Results of region offset correction: (a–c) optimization iteration curves of Aqua, Meteor and TG-1; (d–f) offset estimation error distributions of Aqua, Meteor and TG-1.
Figure 25. Comparison of the real target region, the reconstructed reprojection region obtained by CPR, and the reconstructed reprojection region of the proposed method for: (a–c) optical image of Aqua, (d–f) ISAR image of Aqua, (g–i) optical image of Meteor, (j–l) ISAR image of Meteor, (m–o) optical image of TG-1, and (p–r) ISAR image of TG-1.
Figure 26. Three-Dimensional reconstruction results of (a,d,g,j) Aqua, (b,e,h,k) Meteor and (c,f,i,l) TG-1 under Config1, Config2, Config3 and Config4. The colors represent the elevation of the target voxels.
Figure 27. Three-Dimensional reconstruction results of (a,d,g) Aqua, (b,e,h) Meteor and (c,f,i) TG-1 using ISEA, ST-ISEA and OFM, respectively. The colors represent the elevation of the target voxels.
Figure 28. Variation of 3D reconstruction quality with proportion of images used for reconstruction and image interval for (a,d) Aqua, (b,e) Meteor and (c,f) TG-1.
Figure 29. Variation of 3D reconstruction quality with proportion of offset images and target region offset for (a) Aqua, (b) Meteor and (c) TG-1.
Figure 30. The processing flow of the measured data.
Figure 31. Measured images: (a–c) optical images (Frames 1–3); (d–f) ISAR images (Frames 1–3).
Figure 32. Comparative analysis of ground truth masks and semantic segmentation results: (a–c) ground truth masks of optical images (Frames 1–3); (d–f) ground truth masks of ISAR images (Frames 1–3); (g–i) segmentation results corresponding to (a–c); (j–l) segmentation results corresponding to (d–f).
Figure 33. Three-dimensional imaging results: (a–c) 3D reconstruction results of optical images (Frames 1–3); (d–f) views of (a–c) along the normal direction; (g–i) 3D reconstruction results of ISAR images (Frames 1–3); (j–l) views of (g–i) along the normal direction. The colors represent the elevation of the target voxels.
Table 1. Parameters of the TLE data used.
TLE1:
1 39150U 13018A   23092.27047355  .00001617  00000-0  23877-3 0  9995
2 39150  98.0352 163.0146 0022971  76.1631 284.2129 14.76472717535358
TLE2:
1 44310U 19032A   23092.17633197  .00006723  00000-0  52032-3 0  9993
2 44310  44.9832 194.7597 0009778 146.1448 288.1752 15.02106084209691
TLE3:
1 51102U 22004A   23092.58401987  .00000276  00000-0  11131-3 0  9990
2 51102  98.5957 152.8512 0500395 303.1456  52.2629 13.83927097 62353
TLE4:
1 25544U 98067A   23091.10374725  .00020749  00000-0  37896-3 0  9992
2 25544  51.6419 350.4695 0007223 140.3984   6.4707 15.49341873389833
TLE5:
1 48274U 21035A   23091.59408903  .00036645  00000-0  37919-3 0  9995
2 48274  41.4735 262.9651 0005222 273.2786 165.3212 15.63969261109859
TLE6:
1 37820U 11053A   16266.35688463  .00025497  00000-0  24137-3 0  9991
2 37820  42.7662  24.7762 0015742 351.0529 104.2087 15.66280400285808
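The TLE sets above follow the standard NORAD two-line element format and can therefore be propagated directly with the SGP4 model. The following minimal sketch assumes the open-source Python `sgp4` package (the paper does not name its propagation tooling) and turns TLE1 into an inertial position and velocity:

```python
# Minimal sketch: propagate the Table 1 TLE1 set with the SGP4 model.
# Assumes the open-source `sgp4` package (pip install sgp4).
from sgp4.api import Satrec, jday

line1 = "1 39150U 13018A   23092.27047355  .00001617  00000-0  23877-3 0  9995"
line2 = "2 39150  98.0352 163.0146 0022971  76.1631 284.2129 14.76472717535358"

sat = Satrec.twoline2rv(line1, line2)

# Evaluate the orbit at a UTC instant near the TLE epoch (day 92 of 2023).
jd, fr = jday(2023, 4, 2, 6, 30, 0)
err, r_teme, v_teme = sat.sgp4(jd, fr)  # km and km/s in the TEME frame
if err == 0:
    print("position (km):", r_teme)
    print("velocity (km/s):", v_teme)
```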
Table 2. Locations of the observation stations.
Observation Station | Position
ISAR1 | Shanghai (31.10N, 121.36E, 17.70 m)
Optical1 | Shanghai (31.10N, 121.36E, 17.70 m)
ISAR2 | Beijing (39.91N, 116.39E, 35.00 m)
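Combining the TLE sets of Table 1 with the station coordinates above gives the station-to-target observation geometry used in the simulations. The sketch below assumes the `skyfield` package (again a tool choice not stated in the paper) and computes the look angles from the Shanghai station to the TLE1 satellite:

```python
# Minimal sketch: look angles from the Shanghai station (Table 2) to the
# TLE1 satellite (Table 1). Assumes the open-source `skyfield` package.
from skyfield.api import EarthSatellite, load, wgs84

ts = load.timescale()
line1 = "1 39150U 13018A   23092.27047355  .00001617  00000-0  23877-3 0  9995"
line2 = "2 39150  98.0352 163.0146 0022971  76.1631 284.2129 14.76472717535358"
satellite = EarthSatellite(line1, line2, "TLE1", ts)

# ISAR1 / Optical1 site in Shanghai from Table 2.
station = wgs84.latlon(31.10, 121.36, elevation_m=17.70)

t = ts.utc(2023, 4, 2, 6, 30, 0)
alt, az, distance = (satellite - station).at(t).altaz()
print(f"elevation {alt.degrees:.2f} deg, azimuth {az.degrees:.2f} deg, "
      f"slant range {distance.km:.1f} km")
```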
Table 3. Rendering parameters used for optical imaging simulation.
Parameter | Value
Image size (cells × cells) | 512 × 512
Defocus blur: standard deviation σ_blur | σ_blur = 1 ~ 5
Defocus blur: kernel size K_size | K_size = 6σ_blur + 1
Motion blur: motion length L_blur | L_blur = 3 ~ 10
Motion blur: motion angle θ_blur (°) | θ_blur = −50 ~ 50
Gaussian noise: noise standard deviation σ_noise | σ_noise = 2 ~ 15
Salt-and-pepper noise: noise density p_noise (%) | p_noise = 0.1 ~ 5
Salt-and-pepper noise: salt-to-pepper ratio r_noise | r_noise = 1:1 ~ 1:2
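For illustration, the sketch below applies the Table 3 degradations to a rendered grayscale frame with OpenCV and NumPy: a Gaussian defocus blur whose kernel size follows K_size = 6σ_blur + 1, a linear motion blur of length L_blur at angle θ_blur, additive Gaussian noise, and salt-and-pepper noise. It is a simplified stand-in for the paper's rendering pipeline, not a reproduction of it.

```python
import cv2
import numpy as np

def degrade(img, sigma_blur=3, l_blur=6, theta_blur=20.0,
            sigma_noise=8.0, p_noise=0.01, salt_frac=0.5):
    """Apply the Table 3 degradations to a grayscale uint8 image."""
    # Defocus blur: Gaussian kernel, K_size = 6*sigma_blur + 1 (odd by construction).
    k_size = 6 * int(sigma_blur) + 1
    out = cv2.GaussianBlur(img, (k_size, k_size), sigma_blur)

    # Motion blur: average along a line of length L_blur rotated by theta_blur.
    kernel = np.zeros((l_blur, l_blur), np.float32)
    kernel[l_blur // 2, :] = 1.0
    ctr = ((l_blur - 1) / 2.0, (l_blur - 1) / 2.0)
    rot = cv2.getRotationMatrix2D(ctr, theta_blur, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (l_blur, l_blur))
    out = cv2.filter2D(out, -1, kernel / max(kernel.sum(), 1e-6))

    # Additive Gaussian noise with standard deviation sigma_noise.
    noisy = out.astype(np.float32) + np.random.normal(0.0, sigma_noise, out.shape)
    out = np.clip(noisy, 0, 255).astype(np.uint8)

    # Salt-and-pepper noise: density p_noise; salt_frac = 0.5 corresponds
    # to the 1:1 salt-to-pepper ratio in Table 3.
    u = np.random.rand(*out.shape)
    out[u < p_noise * salt_frac] = 255
    out[(u >= p_noise * salt_frac) & (u < p_noise)] = 0
    return out
```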
Table 4. Imaging parameters of the ISAR device.
Parameter | Value
Data size (cells × cells) | 512 × 512
Signal frequency (GHz) | 10, 12 and 14
Bandwidth (GHz) | 2, 3 and 4
Pulse repetition frequency (Hz) | 30
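For reference, the standard relation ρ_r = c/(2B) links the bandwidths above to slant-range resolution, so the 2, 3 and 4 GHz settings correspond to cells of roughly 7.5, 5.0 and 3.7 cm, as the short computation below verifies:

```python
# Slant-range resolution rho_r = c / (2B) for the Table 4 bandwidths.
c = 299_792_458.0  # speed of light (m/s)
for b_ghz in (2, 3, 4):
    rho_r = c / (2.0 * b_ghz * 1e9)
    print(f"B = {b_ghz} GHz -> rho_r = {rho_r * 100:.2f} cm")
# Output: 7.49 cm, 5.00 cm, 3.75 cm
```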
Table 5. Comparison of MIOU and IOU for target region extraction results.
Method | MIOU | IOU
Deeplabv3+ | 0.8632/0.8573 | 0.9078/0.9012
FCN | 0.8561/0.8598 | 0.9037/0.9024
CL-NL-Unet | 0.8921/0.8843 | 0.9256/0.9183
Pix2pixGan | 0.8797/0.8764 | 0.9005/0.8936
Segmenter | 0.8894/0.8792 | 0.9219/0.9015
Segformer | 0.8821/0.8729 | 0.9158/0.9083
Proposed | 0.9063/0.8957 | 0.9340/0.9202
Threshold | – | 0.8391/0.8176
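The IOU and MIOU scores in this table (and in Tables 6 and 15) can be reproduced from binary masks as sketched below; the sketch assumes IOU refers to the target (foreground) class and MIOU to the mean over foreground and background, which matches common segmentation practice:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of the target (foreground) class."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return float(np.logical_and(pred, gt).sum() / union) if union else 1.0

def miou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean IoU over the foreground and background classes."""
    p, g = pred.astype(bool), gt.astype(bool)
    return 0.5 * (iou(p, g) + iou(~p, ~g))
```

Applied to 3D occupancy grids instead of 2D masks, the same intersection-over-union computation yields a volumetric overlap score of the kind reported as VIOU in Tables 7, 11 and 13, under the assumption that VIOU denotes voxel-wise IOU.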
Table 6. Impact of different network architectures on semantic segmentation performance.
Method | MIOU | IOU
Unet | 0.8603/0.8566 | 0.8908/0.8797
Unet + IGPB | 0.8840 (+0.0237)/0.8790 (+0.0224) | 0.9091 (+0.0183)/0.8964 (+0.0167)
Unet + BCFO | 0.8856 (+0.0253)/0.8814 (+0.0248) | 0.9080 (+0.0172)/0.8962 (+0.0165)
Unet + BSB | 0.8727 (+0.0124)/0.8659 (+0.0093) | 0.9162 (+0.0254)/0.9014 (+0.0217)
Unet + IGPB + BCFO | 0.9005 (+0.0402)/0.8961 (+0.0395) | 0.9220 (+0.0312)/0.9095 (+0.0298)
Unet + IGPB + BSB | 0.8975 (+0.0372)/0.8931 (+0.0365) | 0.9259 (+0.0351)/0.9134 (+0.0337)
Unet + BCFO + BSB | 0.8986 (+0.0383)/0.8933 (+0.0367) | 0.9280 (+0.0372)/0.9162 (+0.0365)
Unet + IGPB + BCFO + BSB | 0.9032 (+0.0429)/0.8957 (+0.0391) | 0.9340 (+0.0432)/0.9202 (+0.0405)
Table 7. Comparison of RA, RI and VIOU between the two methods.
Method | Aqua (RA/RI/VIOU) | Meteor (RA/RI/VIOU) | TG-1 (RA/RI/VIOU)
CPR | 93.57/55.19/53.17 | 92.87/42.39/41.05 | 91.15/55.02/52.23
Proposed | 93.53/94.57/88.76 | 93.95/90.86/85.84 | 94.37/85.28/81.15
Table 8. Comparison of RR-IOU across different methods.
Method | Aqua | Meteor | TG-1
CPR | 0.5317 | 0.4105 | 0.5223
Proposed | 0.8876 | 0.8584 | 0.8115
Table 9. Comparison of running time of different methods under different reconstruction region settings.
Target | Reconstruction Region Setting | Time Consumption of the CPR Method (s) | Time Consumption of the Proposed Method (s)
Aqua | 150 × 150 × 150 | 392.6931 | 5.1620
Aqua | 200 × 200 × 200 | 949.3754 | 12.6386
Aqua | 300 × 300 × 300 | 3192.3571 | 40.7313
Meteor | 150 × 150 × 150 | 419.7296 | 6.2699
Meteor | 200 × 200 × 200 | 1005.1388 | 14.5124
Meteor | 300 × 300 × 300 | 3401.8926 | 46.9426
TG-1 | 150 × 150 × 150 | 489.7593 | 7.5549
TG-1 | 200 × 200 × 200 | 1161.9075 | 15.1930
TG-1 | 300 × 300 × 300 | 3915.6283 | 51.8342
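The efficiency gap in Table 9 stems from the coarse-to-fine nature of octree carving: whole cubes that project outside any target silhouette are discarded near the root of the tree, so the cost grows far more slowly with grid size than for a dense per-voxel sweep such as CPR. The sketch below is a minimal, generic octree space-carving routine, with a hypothetical `project` callback standing in for the sensor projection models; it is an illustration of the technique, not the authors' implementation:

```python
import numpy as np

# Corner/octant offsets of a cube, as multiples of its half-width.
OFFS = np.array([[dx, dy, dz] for dx in (-1, 1)
                 for dy in (-1, 1) for dz in (-1, 1)], dtype=float)

def cube_misses(center, half, mask, project):
    """Conservative test: the projected bounding box of the cube's eight
    corners contains no silhouette pixel in this view."""
    uv = np.array([project(center + half * o) for o in OFFS])  # (8, 2) pixels
    h, w = mask.shape
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    u0, v0 = max(u0, 0), max(v0, 0)
    u1, v1 = min(u1, w - 1), min(v1, h - 1)
    if u0 > u1 or v0 > v1:
        return True                        # cube projects wholly off-image
    return not mask[v0:v1 + 1, u0:u1 + 1].any()

def carve(center, half, views, depth, out):
    """Coarse-to-fine carving: cubes inconsistent with any silhouette are
    rejected as whole blocks, so empty space is never visited at full depth."""
    for mask, project in views:
        if cube_misses(center, half, mask, project):
            return                         # carved away in one step
    if depth == 0:
        out.append((center.copy(), half))  # surviving leaf voxel
        return
    for o in OFFS:                         # recurse into the eight octants
        carve(center + 0.5 * half * o, 0.5 * half, views, depth - 1, out)

# Usage sketch: `views` is a list of (binary silhouette mask, projection
# function) pairs; depth=8 refines a cube into up to 256^3 leaf voxels,
# but only along the occupied branches of the octree.
```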
Table 10. Configuration of different setups for the proposed method.
Configuration | Optical Device | ISAR Device | Target Region Extraction | Attitude Estimate | Offset Correction
Config1 | × | √ | OpticalSegNet and ISARSegNet | √ | √
Config2 | √ | × | OpticalSegNet and ISARSegNet | √ | –
Config3 | √ | √ | OpticalSegNet and ISARSegNet | √ | ×
Config4 | √ | √ | Threshold | √ | √
Table 11. Comparison of RA, RI and VIOU under different configurations.
Target | Config1 | Config2 | Config3 | Config4 | Proposed
Aqua | 85.53/68.97/61.76 | 84.73/61.83/55.63 | 55.71/36.53/28.31 | 90.79/79.05/73.18 | 93.53/94.57/88.76
Meteor | 86.92/69.39/62.83 | 84.85/53.41/48.76 | 54.63/39.51/29.75 | 89.37/81.92/74.65 | 93.95/90.86/85.84
TG-1 | 53.07/90.11/50.15 | 87.59/60.52/55.74 | 53.26/49.17/34.35 | 88.25/87.62/78.47 | 94.37/85.28/81.15
(Each cell lists RA/RI/VIOU.)
Table 12. Comparison of running time of different methods under different configurations.
Configuration | Reconstruction Time for Aqua (s) | Reconstruction Time for Meteor (s) | Reconstruction Time for TG-1 (s) | Average Runtime (s)
Config1 | 19.7328 | 24.2067 | 23.4719 | 22.4705
Config2 | 20.4195 | 22.8146 | 26.3211 | 23.1851
Config3 | 38.5421 | 47.2957 | 48.3629 | 44.7336
Config4 | 41.4925 | 43.8162 | 48.9715 | 44.7601
Table 13. Comparison of RA, RI and VIOU for different methods.
Target | ISEA | ST-ISEA | Proposed
Aqua | 54.39/34.45/26.73 | 89.53/70.68/65.28 | 93.53/94.57/88.76
Meteor | 53.26/37.95/28.47 | 88.94/73.28/67.16 | 93.95/90.86/85.84
TG-1 | 51.98/44.93/31.75 | 89.39/71.10/65.57 | 94.37/85.28/81.15
(Each cell lists RA/RI/VIOU.)
Table 14. Comparison of time consumption among different methods.
Method | Reconstruction Time for Aqua (s) | Reconstruction Time for Meteor (s) | Reconstruction Time for TG-1 (s) | Average Runtime (s)
ISEA | 2353.29 | 2676.84 | 2937.41 | 2655.85
ST-ISEA | 4373.62 | 4538.47 | 4271.95 | 4394.68
OFM | 12.56 | 14.87 | 16.42 | 14.62
Proposed | 5485.29 | 5538.56 | 6195.73 | 5739.86
Table 15. Statistics of MIOU and IOU for semantic segmentation on measured data.
Data Type | MIOU | IOU
Optical images | 0.8935 | 0.9265
ISAR images | 0.8870 | 0.9153
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
