1. Introduction
Against the backdrop of an increasingly complex global security landscape and the public’s growing demand for safe travel, efficient, reliable, and privacy-respecting personal security screening has become a key focus for both academia and industry [1,2,3]. Conventional metal detectors and handheld scanners can identify only metallic objects and are ineffective against non-metallic threats such as ceramic knives, plastic explosives, and hazardous liquids. Although active millimeter-wave imaging can penetrate clothing to reveal the outline of the human body [4,5], its use of transmitted electromagnetic waves has raised public concerns regarding potential health effects and personal privacy. Within this context, passive millimeter-wave (PMMW) imaging has gained prominence. As a passive detection system, it only receives the millimeter-wave signals radiated by the human body and concealed objects themselves, without emitting any electromagnetic waves [6,7]. It therefore involves no radiation exposure, alleviates the associated health and privacy concerns, and is regarded as one of the most promising solutions for next-generation human body security screening.
However, the practical application of PMMW imaging still faces a fundamental bottleneck: the low radiation brightness temperature (BT) contrast between the target and the background. Under indoor ambient temperature conditions, many non-metallic contraband objects exhibit radiation characteristics highly similar to those of human skin. The resulting images show extremely weak BT differences from the background and a low signal-to-noise ratio (SNR), which makes reliable detection and identification challenging [7,8]. Moreover, factors such as the material and thickness of clothing, fluctuations in environmental temperature, and the complex curvature of the human body further exacerbate image degradation [9,10], significantly limiting the detection performance and practical utility of traditional single-intensity imaging methods. To break through this bottleneck, researchers have turned their attention to another fundamental physical property of electromagnetic waves: polarization. Polarization information reveals the directional and structural characteristics of a target as it radiates or scatters electromagnetic waves, offering key insights that intensity images alone cannot provide [11]. Therefore, through polarimetric measurements, it becomes possible to extract material and structural “fingerprints” of concealed objects, enabling them to be more clearly distinguished from human skin amidst complex backgrounds.
Current techniques primarily rely on single-polarization images and single-pixel analysis. The information provided by a single polarization is relatively limited, whereas multi-polarization techniques enable the acquisition of richer information [12,13]. Numerous studies have demonstrated the potential of multi-polarization sensing and processing across various applications, such as multi-polarization image enhancement [14,15], target segmentation [16,17], reflection interference suppression [18,19], material type identification [17,20], and wind direction/speed measurement [21,22]. Meanwhile, single-pixel processing methods often produce numerous discrete false positives or missed detections, which adversely affect target detection and segmentation. Consequently, many researchers advocate multi-polarization imaging to expand the dimensionality of BT data, enabling the extraction of richer scene information and enhanced detection performance. Previous studies have indicated that in-depth exploitation of polarization information can significantly improve the detection capability of PMMW imaging [6,23]. To this end, parameters such as the degree of linear polarization (DoLP) [23], passive degree of polarization (PDoP) [11], linear polarization ratio (LPR) [16], and angle of polarization (AoP) [24] have been proposed as quantitative features for target detection in diverse scenarios, serving to measure the extent of target polarization. Such methods predominantly rely on polarization degree information for image segmentation to achieve target recognition. However, the polarization state is influenced not only by the target’s material composition but also by variations in target geometry and incidence angles, both of which can alter the observed polarimetric characteristics. These factors may lead to missed detections or false positives in polarization-based detection methods under specific conditions. Furthermore, the effective fusion of multi-dimensional polarization features with radiative intensity, spatial texture, and other relevant attributes is essential for constructing a robust detection system with low false alarm rates, representing a crucial step toward transitioning this technology from laboratory research to practical application.
To address the aforementioned challenges, this paper proposes a multi-polarization image fusion framework that combines multi-scale edge-preserving decomposition based on Gaussian and weighted average curvature filtering (GWACF) with a gradient-domain pulse-coupled neural network (PCNN). Unlike existing methods that mainly rely on handcrafted features such as Stokes parameters, simple polarization difference, or polarization ratio, our method cascades a weighted average curvature filter (WACF) with Gaussian filtering (GF) to hierarchically decompose an image into a base structural layer (BS), a coarse structural layer (CS), and a fine structural layer (FS); to the best of our knowledge, this is the first such cascade. The distinguishing property of this decomposition is that the WACF simultaneously preserves edges and suppresses noise, while the GF progressively strips away structural information at different scales, so that each layer carries complementary polarization features. Regarding the fusion strategy, we design distinct fusion rules for different layers: for the texture-rich CS and FS layers, a dual-channel PCNN modulated by the multi-scale morphological gradient (MSMG) is adopted; for the BS layer, which carries the dominant background, an energy-attribute weighted fusion scheme is used. To our knowledge, this overall “decomposition–layering–adaptive fusion” pipeline has not previously been reported in the field of PMMW multi-polarization image processing. In comparison, although the sub-region polarization fusion method [25] also exploits multi-polarization information, its fusion is primarily performed on sub-blocks of Stokes parameter images and lacks multi-scale separation of image structures. The Fisher vector polarization method [7] focuses on enhancing region classification through Fisher encoding, which essentially remains feature-level post-processing rather than hierarchical fusion prior to image reconstruction. Deep learning-based methods [26] primarily rely on convolutional neural networks or Transformers for end-to-end object detection, and they have indeed achieved remarkable progress in performance. However, these methods typically require large amounts of annotated data and suffer from limited interpretability. In contrast, our method is a physics-driven image fusion method that does not rely on training data, thus offering better generalization and interpretability. Through edge-preserving decomposition and layer-wise differentiated fusion, it is physically more consistent with the millimeter-wave polarization scattering mechanism. The main contributions of this paper are:
- (1)
This paper designs a multi-scale edge-preserving decomposition model, which integrates Gaussian filtering with weighted average curvature filtering, termed GWACF. The model is designed to hierarchically represent multi-polarization PMMW images by decomposing them into three structural layers: the FS layer, the BS layer, and the CS layer. The FS layer and the CS layer jointly preserve the rich texture and specific structural information of the image, while the BS layer serves as the bottom-level expression of the image, capturing core elements such as primary structure and background content.
- (2)
The FS and CS layers preserve the texture details and partial structural features of the image. Because the gradient-domain PCNN method performs well in fusing fine-textured images, it is employed for the fusion of these two layers. The BS layer contains the primary background information and overall structural contours of the image, so the fusion result at this level largely determines the overall quality of the final image. The energy attribute (EA) fusion method performs well on images with an asymmetric information distribution and is particularly suitable for coarse approximation layers; consequently, it is applied to the fusion of the BS layer.
- (3)
The proposed method was validated through multi-polarization PMMW imaging detection experiments. Experimental results demonstrate that the fusion method produces high-quality and robust imaging of concealed objects on the human body. Qualitative and quantitative comparisons with several competitive baseline methods further highlight its performance advantages.
The remainder of this paper is organized as follows. Section 2 presents the necessary preliminaries. The proposed fusion method is described in Section 3. Experimental results and performance analysis are provided in Section 4 to validate the effectiveness of our method. Finally, conclusions are given in Section 5.
3. Methodology
This section presents the proposed multi-polarization PMMW image fusion method, whose overall workflow is depicted in Figure 3. The framework sequentially comprises five core modules: GWACF-based multi-scale decomposition, fusion of the fine structural (FS) layer, fusion of the coarse structural (CS) layer, fusion of the base structural (BS) layer, and final image reconstruction.
The process begins by acquiring observational data at four linear polarization angles (0°, 45°, 90°, and 135°), which are recorded as TB0, TB45, TB90, and TB135, respectively. These polarization measurements are then linearly averaged in pairs to yield two components, TA and TB, defined as follows.
Subsequently, the GWACF decomposition algorithm is employed to decompose the TA and TB images into three distinct layers: the FS layer, the CS layer, and the BS layer. The two BS layers are fused using the energy attribute (EA) fusion strategy, whereas the FS and CS layers are fused using a gradient-domain PCNN approach. The final fused image is reconstructed by combining the fused layers.
From the perspective of physical rationality, the pairing strategy adopted in this paper—(TB0 and TB90) as one group, and (TB45 and TB135) as the other—exhibits clear completeness in the polarization basis. Each group forms an orthogonal polarization basis, enabling comprehensive characterization of any linear polarization state. These two groups are sensitive to the structural and dielectric properties of the target along the horizontal/vertical and diagonal directions, respectively, thereby providing complementary polarization signatures. Moreover, this pairing effectively suppresses the strong dependence of a single polarization channel on target orientation, significantly enhancing the robustness of concealed target detection while reducing redundancy and noise interference under the premise of preserving complementary polarization information.
As for the direct fusion of all four polarization channels, although theoretically feasible, it suffers from notable limitations. First, there exists informational redundancy among the four polarization angles; directly fusing them introduces substantial repetitive information into the fusion network, which may lead to feature competition and unstable fusion decisions. Second, simultaneously feeding all four polarization channels into the fusion model greatly increases computational complexity and may introduce nonlinear inter-channel interference, resulting in artifacts or edge blurring in the fused image.
In contrast, the proposed method first constructs two complementary components, TA and TB, which physically mitigate signal fluctuations caused by target orientation while preserving polarization anisotropy information. This provides a more robust input for subsequent edge-preserving decomposition and hierarchical fusion. Therefore, direct fusion of all four channels is not considered in this paper.
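The construction of TA and TB described above can be sketched as follows. This is a minimal illustration that assumes each component is the arithmetic mean of one orthogonal pair, with TA built from (TB0, TB90) and TB from (TB45, TB135); the assignment of pairs to TA and TB and the function name `build_pair_components` are assumptions made for this example.

```python
import numpy as np

def build_pair_components(tb0, tb45, tb90, tb135):
    """Average each orthogonal polarization pair (assumed pairing).

    Inputs are brightness-temperature images of identical shape.
    Returns (ta, tb): the 0/90 degree mean and the 45/135 degree mean.
    """
    tb0, tb45, tb90, tb135 = (
        np.asarray(a, dtype=float) for a in (tb0, tb45, tb90, tb135)
    )
    if not (tb0.shape == tb45.shape == tb90.shape == tb135.shape):
        raise ValueError("all four polarization images must share one shape")
    ta = 0.5 * (tb0 + tb90)     # horizontal/vertical pair (assumption)
    tb = 0.5 * (tb45 + tb135)   # diagonal pair (assumption)
    return ta, tb
```

Averaging within a pair halves the number of inputs passed to the decomposition stage, which is the redundancy reduction the text argues for.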
3.1. GWACF Multi-Scale Decomposition
For a 2D image T, its weighted average curvature filtering (WACF) [35] can be expressed as:
where ∇· and ∇ denote the divergence operator and the gradient operator, respectively.
In the case of k = 2, the expression of Equation (15) simplifies to:
where Δ is the isotropic Laplacian operator, Tx and Ty are the first-order partial derivatives along the x and y directions, respectively, and Txx, Txy, and Tyy are the corresponding second-order partial derivatives.
Within a 3 × 3 window, the investigation considers eight possible normalized half-window directions, with corresponding kernels generated for all eight cases:
Furthermore, the eight distance values ri can be computed using the eight kernels, respectively:
where * denotes the convolution operation. The discrete form of Equation (16) can be expressed as:
For the purpose of analysis, the WACF procedure is represented as:
where Tout and Tin represent the filtered output image and the original input image, respectively, and WACF(·) denotes the WACF operation.
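To make the k = 2 case concrete, the sketch below evaluates a curvature term from the quantities named in the text (Δ, Tx, Ty, Txx, Txy, Tyy). It assumes the standard weighted mean curvature form, ΔT − (Tx²·Txx + 2·Tx·Ty·Txy + Ty²·Tyy)/(Tx² + Ty²), and uses plain finite differences rather than the eight half-window kernels of the discrete scheme. The function names, the step size, and the small constant `eps` are illustrative assumptions.

```python
import numpy as np

def weighted_mean_curvature(t, eps=1e-12):
    """Finite-difference estimate of |grad T| * div(grad T / |grad T|).

    Assumed form: lap(T) - (Tx^2 Txx + 2 Tx Ty Txy + Ty^2 Tyy) / (Tx^2 + Ty^2).
    """
    t = np.asarray(t, dtype=float)
    ty, tx = np.gradient(t)       # axis 0 is y (rows), axis 1 is x (columns)
    tyy, _ = np.gradient(ty)      # second derivative along y
    txy, txx = np.gradient(tx)    # mixed derivative and second derivative along x
    laplacian = txx + tyy
    directional = tx**2 * txx + 2.0 * tx * ty * txy + ty**2 * tyy
    return laplacian - directional / (tx**2 + ty**2 + eps)

def curvature_flow_step(t, step=0.1):
    """One explicit smoothing step T <- T + step * curvature (illustrative only)."""
    t = np.asarray(t, dtype=float)
    return t + step * weighted_mean_curvature(t)
```

On a linear ramp the curvature term vanishes, so straight edges and planar regions are left unchanged by the step; this is the edge-preserving behavior the decomposition relies on.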
Based on this, this paper proposes an image decomposition method that integrates Gaussian filtering with weighted average curvature filtering, namely the GWACF method, whose overall workflow is illustrated in Figure 4. In the figure, Tin refers to the input image, the m-th Gaussian-filtered image (m = 1, 2, 3) is the result of the m-th Gaussian filtering operation, and the WACF-filtered image is the result of the WACF operation. The decomposed FS, CS, and BS layers are then given by the following equations:
where the intermediate filtered images are calculated as follows:
where the symbol GF(·) denotes the Gaussian filtering operator.
After decomposition via the GWACF operator, Tin can be expressed as the combination of three distinct hierarchical layers.
Therefore, TA and TB in Figure 3 can each be decomposed into their respective FS, CS, and BS layers through Equations (21) to (23).
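The layering idea can be illustrated with a generic additive three-layer decomposition. This sketch is not the GWACF operator itself: it assumes the layers are successive differences between progressively smoothed images, and it takes the smoothing operators as arguments so that a WACF plus Gaussian cascade could be substituted. The helper `gaussian_blur`, the function `three_layer_decompose`, and the difference structure are assumptions made for illustration.

```python
import numpy as np

def gaussian_blur(img, sigma=20.0, radius=2):
    """Separable Gaussian filter with reflect padding (kernel size 2*radius + 1)."""
    img = np.asarray(img, dtype=float)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def three_layer_decompose(t_in, smooth_fine, smooth_coarse):
    """Split t_in into (fs, cs, bs) such that fs + cs + bs equals t_in.

    smooth_fine and smooth_coarse are callables; in the paper's setting they
    would be edge-preserving (WACF + GF) stages, here they are placeholders.
    """
    t_in = np.asarray(t_in, dtype=float)
    s1 = smooth_fine(t_in)        # first smoothing stage
    s2 = smooth_coarse(s1)        # second, coarser smoothing stage
    fs = t_in - s1                # fine structures removed by stage 1
    cs = s1 - s2                  # coarse structures removed by stage 2
    bs = s2                       # remaining base layer
    return fs, cs, bs
```

Because the layers are differences of successive stages, summing them recovers the input exactly, which matches the statement that Tin is composed of the three layers.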
3.2. Fusion Strategy for Fine and Coarse Structural Layers
The multi-scale morphological gradient (MSMG) is an edge extraction method that integrates multi-scale strategies with mathematical morphological operations. By performing morphological operations at different structural element scales and fusing the results, it effectively enhances and captures image contours and detailed features with improved accuracy. The specific steps are as follows:
- (1)
The multi-scale structural elements are denoted as:
where MS1 denotes the basic structuring element, a 3 × 3 all-ones matrix, i.e., MS1 = [1, 1, 1; 1, 1, 1; 1, 1, 1]; v represents the scale factor, and ⨁ indicates the morphological dilation operation.
- (2)
For an image T, its gradient feature Gv is characterized by the morphological gradient operator as follows:
where ⊖ denotes the morphological erosion operation.
- (3)
The output value ρ of MSMG can be computed as the weighted sum of the gradients across different scales:
where wv denotes the gradient weight at the v-th scale and can be expressed as:
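The three MSMG steps can be sketched in NumPy as follows. The sketch assumes that dilating the 3 × 3 all-ones element with itself v times yields a (2v + 1) × (2v + 1) square window, and it assumes the commonly used weight wv = 1/(2v + 1); the weight formula and the function names are assumptions for this example, not values confirmed by the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_extreme(img, v, reducer):
    """Apply max (dilation) or min (erosion) over a (2v+1) x (2v+1) square window."""
    padded = np.pad(img, v, mode="edge")
    windows = sliding_window_view(padded, (2 * v + 1, 2 * v + 1))
    return reducer(windows, axis=(-2, -1))

def msmg(img, num_scales=3):
    """Multi-scale morphological gradient: weighted sum of (dilation - erosion)."""
    img = np.asarray(img, dtype=float)
    rho = np.zeros_like(img)
    for v in range(1, num_scales + 1):
        dilated = _window_extreme(img, v, np.max)
        eroded = _window_extreme(img, v, np.min)
        weight = 1.0 / (2 * v + 1)          # assumed weight for scale v
        rho += weight * (dilated - eroded)  # gradient feature at scale v
    return rho
```

Flat regions produce a zero response, while pixels near an edge accumulate contributions from every scale whose window crosses that edge.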
Both the CS and FS layers carry the texture properties and partial structural characteristics of the polarized images. The gradient-domain PCNN fusion strategy is well suited to fusing images containing small-scale texture features and is therefore adopted in this paper. Specifically, the MSMG outputs computed on the CS and FS layers of TA and TB are used as the connection strengths of the PCNN, thereby constituting an MSMG-modulated dual-channel PCNN model, as shown in Figure 5. Its mathematical expression is given as follows.
Following [36], and while preserving the biological characteristics of the original model, the receptive field of the PCNN model can be simplified as:
where the two feeding inputs denote the input stimuli received by channel 1 and channel 2, respectively, with their magnitudes corresponding to the pixel values at location (i, j) in the two input images. Lij represents the connection (linking) input, and the connection weight amplification factor VL is set to 1. Wijkl denotes the connection weight matrix between neurons. When each neuron is positioned at the center of a 3 × 3 (or 5 × 5) weight matrix, its adjacent pixels correspond to the neurons within this matrix. The connection weights between neurons are closely related to their spatial distance.
Therefore, we define the connection weight as the inverse square of the Euclidean distance between connected neurons. Specifically, the connection weight between neuron ij and neuron kl is given by:
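The inverse-square rule stated above produces a small fixed kernel, sketched here for an odd window size. Setting the center weight to zero (a neuron does not link to itself) is an assumption, as is the function name.

```python
import numpy as np

def inverse_square_weights(size=3):
    """Connection weights 1 / d^2, where d is the Euclidean distance to the center."""
    if size % 2 == 0 or size < 3:
        raise ValueError("size must be an odd integer >= 3")
    r = size // 2
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    dist_sq = (dx**2 + dy**2).astype(float)
    weights = np.zeros_like(dist_sq)
    neighbours = dist_sq > 0                 # exclude the center neuron (assumption)
    weights[neighbours] = 1.0 / dist_sq[neighbours]
    return weights
```

For the 3 × 3 case the four edge neighbours receive weight 1 and the four diagonal neighbours receive weight 0.5.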
The modulation field is defined as:
where Uij denotes the internal activity of the dual-channel output, and the two channel terms correspond to the internal activities of channel 1 and channel 2, respectively, which can be expressed as:
Here, β1 and β2 represent the connection strengths of channel 1 and channel 2, respectively. As for the pulse generator field of this network, it can be represented by Equations (11) and (12).
Despite the notable advances the PCNN has achieved in image fusion, the method still has a critical limitation: each pixel in its network architecture corresponds to an independent neuron. If the PCNN is applied directly for fusion, pixels representing the same content within an image block may be activated asynchronously, leading to biased fusion decisions. Consequently, the final fused image may exhibit undesirable abrupt pixel-level changes and block artifacts.
As mentioned above, the MSMG exhibits excellent capability in extracting image edge features. Therefore, it is employed as a pre-modulation processing unit for the PCNN to effectively enhance spatial correlations across different layers. In this paper, the images processed by MSMG are used as the connection strength input of the network, and the detailed mathematical representation is given as follows:
where ρ1 and ρ2 are the outputs of the MSMG operator for the two input images, which can be derived from Equation (29).
Then, the fused FS and CS layers are denoted as:
where the terms involved can be obtained from Equations (36) and (37).
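A compact sketch of one common dual-channel PCNN formulation is given below. It is written under several assumptions: the stimulus is the absolute value of each detail layer, the internal activity of channel k is Fk·(1 + βk·L), the threshold follows the usual exponential decay with firing feedback, and the fused pixel is taken from the channel with the larger internal activity after the last iteration. The paper's own pulse generator (Equations (11) and (12)) and fusion rule may differ; all function and parameter names here are illustrative.

```python
import numpy as np

def neighbour_sum(y, w):
    """Weighted sum of neighbouring firing states (zero padding at borders)."""
    r = w.shape[0] // 2
    padded = np.pad(y, r, mode="constant")
    h, wd = y.shape
    out = np.zeros((h, wd), dtype=float)
    for a in range(w.shape[0]):
        for b in range(w.shape[1]):
            out += w[a, b] * padded[a:a + h, b:b + wd]
    return out

def dual_channel_pcnn_fuse(s1, s2, beta1, beta2, n_iter=100,
                           alpha_theta=3.0, v_l=1.0, v_theta=20.0):
    """Fuse two detail layers; beta1/beta2 are connection strengths (e.g. MSMG maps)."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    w = np.array([[0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.5]])
    fired = np.zeros(s1.shape)            # firing state Y from the previous step
    theta = np.ones(s1.shape)             # dynamic threshold (assumed initial value)
    u1 = np.abs(s1)
    u2 = np.abs(s2)
    for _ in range(n_iter):
        link = v_l * neighbour_sum(fired, w)           # linking input L
        u1 = np.abs(s1) * (1.0 + beta1 * link)         # internal activity, channel 1
        u2 = np.abs(s2) * (1.0 + beta2 * link)         # internal activity, channel 2
        u = np.maximum(u1, u2)                         # dual-channel output activity
        theta = np.exp(-alpha_theta) * theta + v_theta * fired
        fired = (u > theta).astype(float)
    return np.where(u1 >= u2, s1, s2)
```

Passing the MSMG maps of the two layers as `beta1` and `beta2` lets strong edges raise the internal activity of their channel, so neighbouring pixels on the same edge tend to be selected from the same source.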
3.3. Fusion Strategy for Base Structural Layer
The BS layer contains the primary information of the polarized image (such as the main texture and the background). Given the information asymmetry inherent in polarization images, an energy-attribute fusion strategy is employed at this level to balance disparities between the two inputs and to achieve information complementarity and enhancement. The implementation of this strategy involves three main steps:
- (1)
Calculate the characteristic values HA and HB of the two BS layers, which are respectively expressed as:
where HA and HB are obtained from the mean and median values of the BS layers of TA and TB, respectively.
- (2)
Calculate the energy attribute functions EA and EB, which are respectively given by:
where EA and EB represent the energy attribute functions of the two BS layers, and αG denotes the gain coefficient, which is set to αG = 4 in this paper.
- (3)
Obtain the final fused result for the BS layer by weighted averaging:
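The three steps can be sketched as follows, under explicit assumptions: the characteristic value is taken as the sum of the mean and the median of each BS layer, the energy attribute function is exp(αG·|B − H|), and the fused layer is the energy-weighted average of the two inputs. These formulas follow a commonly used energy-attribute scheme and are not confirmed by the text; the inputs are also assumed to be normalized to [0, 1] so that the exponential stays well conditioned.

```python
import numpy as np

def energy_attribute_fuse(bs_a, bs_b, alpha_g=4.0):
    """Energy-attribute weighted fusion of two base layers (assumed formulas)."""
    bs_a = np.asarray(bs_a, dtype=float)
    bs_b = np.asarray(bs_b, dtype=float)
    # Step 1: characteristic value of each layer (mean + median, assumption).
    h_a = bs_a.mean() + np.median(bs_a)
    h_b = bs_b.mean() + np.median(bs_b)
    # Step 2: pixel-wise energy attribute functions with gain alpha_g.
    e_a = np.exp(alpha_g * np.abs(bs_a - h_a))
    e_b = np.exp(alpha_g * np.abs(bs_b - h_b))
    # Step 3: weighted average; weights are positive, so the result is a
    # convex combination of the two inputs at every pixel.
    return (e_a * bs_a + e_b * bs_b) / (e_a + e_b)
```

Because the weights are strictly positive, each fused pixel lies between the two input values, so the base layer cannot overshoot either source.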
4. Validation Experiments
Two volunteers were recruited for the experiment, as illustrated in Figure 6. One volunteer participated in Scenario 1, while the other volunteer performed Scenarios 2 and 3. In each scenario, five types of concealed objects were randomly placed at the volunteer’s chest, abdomen, and thigh pockets, with all objects positioned inside the clothing and close to the skin. Imaging was conducted using a W-band focal plane scanning system [37] operating within the 70–110 GHz frequency range (bandwidth: 40 GHz). The radiometer channels featured a noise figure better than 3.5 dB, with an integration time of 280 μs, achieving a radiometric sensitivity better than 0.5 K.
A high-density polyethylene dielectric lens with a diameter of 460 mm was employed to focus the millimeter waves, and the observation distance was 2.5 m. Imaging under different linear polarizations was realized by rotating the radiometer and feed antenna. Polarization channel calibration was performed using the two-point calibration method: a blackbody absorbing material (emissivity: 0.999) was placed in front of the system, with its physical temperature varying between 25 °C and 60 °C, while a cold temperature reference was obtained by immersing the material in liquid nitrogen (approximately 77 K). Calibration was conducted prior to each polarization angle acquisition to compensate for system drift. To assess stability, each volunteer/scenario was independently measured three times. All experiments were carried out indoors at an ambient temperature of approximately 25 °C, and linear polarization images at 0°, 45°, 90°, and 135° were acquired using the W-band PMMW system.
4.1. Concealed Contraband Detection Imaging Results
In this paper, we adopt an MSMG-PCNN model to fuse the FS and CS layers. For the PCNN, the decay time constant αθ is set to 3, and the amplification factors VL and Vθ are 1 and 20, respectively. The number of iterations is fixed at 100, with the stopping rule defined as reaching the maximum iteration count. A 3 × 3 connection kernel is used as the neighborhood size. For the morphological gradient, the scale factor is v = 3, and the basic structural element MS1 is a 3 × 3 all-ones structuring element. The GF is applied with σ = 20 and a kernel size of 5 × 5. In the WACF decomposition, the number of iterations per scale is set to 2. For the EA fusion, the gain coefficient is αG = 4.
As shown in Figure 6, the results of three sets of concealed contraband detection imaging experiments are presented, corresponding to three different scenarios from top to bottom. Figure 6a shows the detection imaging results for Scenario 1, where five concealed objects are annotated with colored boxes: metal pliers (#N1), a utility knife (#N2), an alcohol bottle (#N3), a mobile phone (#N4), and a charging case (#N5). Figure 6b displays the results for Scenario 2, containing a water bottle (#N1), a ceramic knife (#N2), a handgun (#N3), an alcohol bottle (#N4), and a utility knife (#N5). Figure 6c shows the results for Scenario 3, which includes a water bottle (#N1), an alcohol bottle (#N2), a handgun (#N3), glue (#N4), and a ceramic knife (#N5).
As can be observed from the imaging results, all four linear polarization modes, namely horizontal (TB0), 45° (TB45), vertical (TB90), and 135° (TB135), struggle to effectively distinguish concealed contraband from the human body background, which significantly reduces the detection probability. This reveals an inherent limitation of single linear polarization in PMMW security checks: its detection performance is highly dependent on the angle between the orientation of the target and the polarization direction. For instance, a knife exhibits the strongest signal when aligned parallel to the polarization direction, while the response weakens considerably under orthogonal orientation, easily leading to missed detections. Additionally, non-metallic or structurally complex concealed contraband is difficult to identify due to its weak polarimetric signature. Moreover, polarization-dependent scattering caused by the curvature and contour of the human body also introduces interference and increases the difficulty of image interpretation.
To improve the detection capability for concealed objects, we further investigated multi-polarization fusion methods. The results of seven fusion methods, namely polarization summation average (PSA) [25], principal component analysis (PCA) [38], discrete cosine transform (DCT) [39], Laplacian pyramid fusion (LPF) [40], subregion fusion (SF) [25], and two advanced deep learning fusion models, FDFuse [41] and LSRNet [42], are presented in Figure 6 together with those of the proposed fusion method. It can be seen that the proposed fusion strategy suppresses image noise while enhancing the intensity contrast between the concealed object and the human background, and presents a more complete object shape and clearer contour features.
4.2. Performance Analysis
As can be observed from the detection imaging results in Figure 6, the proposed method effectively reconstructs the shape and contour features of concealed contraband. To quantitatively evaluate its reconstruction performance, we introduce entropy [43], the blind/referenceless image spatial quality evaluator (BRISQUE) [44], the natural image quality evaluator (NIQE) [45], the perception-based image quality evaluator (PIQE) [46], and the signal-to-noise ratio (SNR) [25] for comprehensive analysis.
In PMMW imaging, the acquired images inherently suffer from low SNR and pronounced noise interference. Under such conditions, high entropy values often originate not from meaningful target information but from background noise and random fluctuations. Pursuing high entropy alone may instead indicate insufficient noise suppression, which is detrimental to subsequent detection and recognition tasks. The core of the proposed method lies in enhancing the brightness temperature contrast of concealed targets and emphasizing target contours and structural information, rather than preserving all textural details including noise. Consequently, in the fused image, fluctuations in background regions are suppressed, target information is strengthened, and the pixel distribution becomes more concentrated. The moderate reduction in entropy therefore reflects the effectiveness of the method rather than information loss. In PMMW tasks, lower entropy generally corresponds to a clearer background and more salient targets. Hence, entropy should not be interpreted in isolation as “information richness”. Instead, it must be interpreted jointly with task-relevant metrics such as SNR and detection accuracy.
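The behavior of entropy discussed above can be checked with a small histogram-based estimate. The sketch assumes the usual Shannon entropy of the gray-level histogram with 256 bins; the bin count and the function name are choices made for this example.

```python
import numpy as np

def image_entropy(img, bins=256):
    """Shannon entropy (in bits) of the gray-level histogram of an image."""
    img = np.asarray(img, dtype=float)
    hist, _ = np.histogram(img, bins=bins, range=(img.min(), img.max()))
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))
```

A perfectly flat background yields zero entropy, and adding noise spreads the histogram and raises the value, which is why a lower entropy can accompany better noise suppression.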
On the other hand, although no-reference image quality assessment metrics such as BRISQUE, NIQE, and PIQE were originally designed for natural images, their core function is to measure structural integrity, perceptual quality, and the degree of distortion, and they are not strictly confined to natural scenes. These metrics are built upon spatial-domain statistical features, patch-based statistical models, or perceptual features, and they exhibit high sensitivity to blur, noise, block artifacts, and structural distortions—precisely the critical factors that need to be evaluated in PMMW image fusion. In PMMW imaging, quality assessment focuses on edge preservation, structural clarity, and noise suppression, which align closely with the statistical characteristics captured by the aforementioned no-reference metrics. Therefore, employing these metrics to evaluate the fusion performance of the proposed method is both reasonable and valid.
For the three experimental scenarios depicted in Figure 6, the computed results of each evaluation metric (entropy, BRISQUE, NIQE, PIQE) are summarized in Table 1. The data demonstrate that the proposed fusion strategy outperforms all comparative methods across all metrics, achieving the best PMMW imaging reconstruction performance. Furthermore, as evidenced by the SNR value for each concealed object in Table 2, the proposed method achieves the highest SNR value. This enhances the contrast between concealed contraband and the surrounding background, thereby improving target detectability.
4.3. Performance on ROC Curves
To comprehensively assess the detection performance, this paper adopts the ROC curve [47] as an analytical tool. This curve illustrates the trade-off between detection sensitivity and specificity by plotting the true positive rate (TPR) against the false positive rate (FPR) across varying thresholds. The area under the curve (AUC) serves as a measure of the overall performance, where TPR reflects the detection sensitivity and FPR is related to the specificity. Their respective formulas are expressed as follows:
where TP (true positive) denotes pixels that truly belong to the target object and are correctly identified as such. TN (true negative) refers to pixels that truly belong to the background and are correctly classified as background. FP (false positive) represents pixels that truly belong to the background but are incorrectly identified as the target object. FN (false negative) indicates pixels that truly belong to the target object but are incorrectly identified as background.
Regarding the generation of ROC curves, we need to further clarify the following two points. First, we provide a detailed description of the threshold setting protocol: after normalizing the pixel values of the fused images to the range [0, 1], we traverse all possible thresholds at a step size of 0.01, and calculate the TPR and FPR at each step, thus generating a complete ROC curve. Second, regarding the generation of ground-truth masks, three researchers independently performed manual annotation based on the actual positions and contours of concealed objects in the original PMMW images. The final binary masks were determined by majority voting, while background regions were selected from typical areas free of interference around the targets. All methods under comparison (including the proposed method and the seven competing methods) strictly use the identical masks for pixel-wise evaluation, ensuring a consistent and fair basis for computing TP, FP, TN, and FN.
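The threshold protocol described above can be sketched as follows, using the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN). The sketch assumes a pixel is declared a target when its normalized value is at or above the threshold, treats every pixel outside the mask as background, and integrates the curve with the trapezoidal rule after adding the (0, 0) end point; these choices and the function names are assumptions.

```python
import numpy as np

def roc_points(fused, mask, step=0.01):
    """Return (fpr, tpr) arrays, ordered from (0, 0) towards (1, 1)."""
    img = np.asarray(fused, dtype=float)
    span = img.max() - img.min()
    norm = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    truth = np.asarray(mask, dtype=bool)
    thresholds = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)
    fpr, tpr = [], []
    for th in thresholds:
        pred = norm >= th                       # pixels declared as target
        tp = np.sum(pred & truth)
        fn = np.sum(~pred & truth)
        fp = np.sum(pred & ~truth)
        tn = np.sum(~pred & ~truth)
        tpr.append(tp / max(tp + fn, 1))
        fpr.append(fp / max(fp + tn, 1))
    # Raising the threshold shrinks the detections, so both rates fall;
    # reverse to ascending order and prepend the (0, 0) end point.
    fpr = np.concatenate(([0.0], np.array(fpr)[::-1]))
    tpr = np.concatenate(([0.0], np.array(tpr)[::-1]))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Using the same mask for every method, as the protocol requires, means differences in AUC come only from the fused images and not from the evaluation regions.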
As illustrated in Figure 7, the ROC curves provide a visual representation of the detection outcomes for each concealed object across the three experimental scenarios. The ROC curve of the proposed method is consistently closest to the upper-left corner in all detection tasks and achieves the largest AUC values. This indicates that the proposed method maintains a clear advantage in detecting concealed objects, exhibiting superior discriminative ability and higher detection accuracy.
4.4. Ablation Experiments
To further validate the contribution of each key component in the proposed framework, we conducted a series of ablation experiments on the first experimental scenario. Specifically, we evaluated the following variants: (1) Decomposition architecture: replacing GWACF with Gaussian filtering alone or WACF alone; (2) Texture layer fusion: removing the MSMG modulation from PCNN; (3) Background layer fusion: replacing EA fusion with simple weighted averaging; (4) Decomposition layers: using one-layer (BS only) or two-layer (BS + FS) decomposition; and (5) Input construction: using the original four-channel polarization images instead of TA/TB. The quantitative results are shown in Tables 3, 4, 5, 6 and 7. The complete model consistently outperformed all variants across all evaluation metrics, confirming the necessity and effectiveness of each proposed module. In particular, GWACF achieved the best edge-preserving decomposition, MSMG-PCNN enhanced texture fusion quality, EA fusion maintained structural integrity, and the three-level decomposition provided the optimal multi-scale representation.
4.5. Time Complexity Analysis
The proposed method mainly consists of three parts: GWACF multi-scale decomposition, MSMG-PCNN fusion (applied to the FS and CS layers), and energy attribute fusion (applied to the BS layer). Let the size of a single input image be H × W. The number of GWACF decomposition scales is set to S = 3, and during the WACF decomposition process, the number of iterations at each scale is set to L. The number of MSMG scales is denoted as R. The number of PCNN iterations is N, and the size of the neighborhood window is K × K.
- (1) GWACF decomposition: Each layer consists of a GF operation (complexity O(HW)) and a WACF operation. WACF involves convolution with eight directional kernels, and performing L iterations for each convolution yields a complexity of O(K²LHW). Therefore, the complexity of a single layer is O(8K²LHW). Consequently, the total complexity of the three-layer GWACF decomposition is O(3·(HW + 8K²LHW)), which can be further expressed as O(HW + K²LHW).
- (2) MSMG-PCNN fusion: This is used for the FS and CS layers (a total of 2S = 6 layers). For each scale v in MSMG, morphological dilation and erosion operations are performed on the image. The morphological operation for each pixel requires traversing all pixels within the window, so the complexity for a single scale is O(HWv²). Summing over all scales yields O(HWR³). In the PCNN, each iteration involves computing the neighborhood connection weights (complexity O(K²HW)) and updating the internal activity (complexity O(HW)). With a total of N iterations, the complexity for a single layer is O(NK²HW). Consequently, the overall complexity of MSMG-PCNN fusion is O(6·(HWR³ + NK²HW)), which can be further simplified to O(HWR³ + NK²HW).
- (3) Energy attribute fusion (BS layer): This involves only pixel-wise mean and exponential operations, yielding a complexity of O(HW).
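To make the per-scale cost in step (2) concrete, the following sketch computes a multi-scale morphological gradient with a brute-force window traversal. It is an illustration under stated assumptions rather than the authors' code: the structuring element is taken to be a flat (2v + 1) × (2v + 1) square, the scale weights 1/(2v + 1) follow a commonly used MSMG form that the text does not confirm, borders are handled by clipping the window, and the names morph_gradient and msmg are hypothetical.

```python
def morph_gradient(image, v):
    """Dilation minus erosion with a flat (2v+1) x (2v+1) window.

    Each pixel visits every pixel in its window, so the cost per pixel
    grows as v**2, matching the O(HW v^2) term in the text.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - v), min(h, r + v + 1))
                      for cc in range(max(0, c - v), min(w, c + v + 1))]
            out[r][c] = max(window) - min(window)  # dilation - erosion
    return out


def msmg(image, num_scales):
    """Weighted sum of morphological gradients over scales 1..num_scales.

    Assumption: weight 1 / (2v + 1) per scale (not stated in the text).
    """
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    for v in range(1, num_scales + 1):
        weight = 1.0 / (2 * v + 1)
        grad = morph_gradient(image, v)
        for r in range(h):
            for c in range(w):
                total[r][c] += weight * grad[r][c]
    return total
```

On a flat region the gradient is zero at every scale, whereas pixels adjacent to an edge receive a non-zero response, which is the property that lets the gradient modulate the PCNN toward texture and edge detail.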
In summary, the overall complexity of the proposed method is:
O(HW + K²LHW) + O(HWR³ + NK²HW) + O(HW) = O((K²L + R³ + NK²)·HW),
where R = K = 3, L = 2, and N is set to 100, so the overall complexity is linearly related to the size of the image.
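As a rough numerical check, the per-stage complexities listed above can be evaluated with their constant factors kept. The sketch below is an assumption-laden estimate: estimated_ops is a hypothetical helper, the result is an abstract operation count rather than a measured FLOP count, R = 3 is read from the parameter list as "R = K = 3", and the 175 × 360 image size is the resolution used in the timing experiment.

```python
def estimated_ops(h, w, k=3, l=2, r=3, n=100, scales=3):
    """Abstract operation count built from the three stage estimates.

    gwacf : S layers, each GF (HW) plus 8 directional kernels x L iterations
    fusion: 2S layers, each MSMG (HW R^3) plus N PCNN iterations (K^2 HW)
    energy: one pixel-wise pass over the BS layer
    """
    hw = h * w
    gwacf = scales * (hw + 8 * k**2 * l * hw)
    fusion = 2 * scales * (hw * r**3 + n * k**2 * hw)
    energy = hw
    return gwacf + fusion + energy


ops = estimated_ops(175, 360)
print(ops)                  # total abstract operations for one image pair
print(ops // (175 * 360))   # operations per pixel, independent of image size
```

With these parameters the PCNN term (N·K² = 900 per pixel per layer) dominates the per-pixel cost, and doubling the number of pixels doubles the total, consistent with the linear dependence on image size.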
To verify the practical efficiency, we measured the average running time of each algorithm on the same hardware platform (a computer equipped with an i7-12700H CPU and a GeForce RTX 3060 GPU). The experimental results are presented in Table 8. For images with a resolution of 175 × 360, the average processing time of the proposed method is approximately 0.69 s. Although this is longer than the running times of the other fusion methods, the computational cost remains acceptable for practical security inspection scenarios given the improvement in detection accuracy and edge preservation. Higher-performance processors, combined with further work on acceleration algorithms, could potentially enable real-time imaging at video rates in the future.
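An average running time of this kind can be measured with a simple wall-clock harness such as the one sketched below. The measurement procedure actually used is not described in the text, so the warm-up call, the repeat count, and the helper names (average_runtime, required_speedup) are assumptions; the 25 frames-per-second target is likewise only an illustrative reading of "video rate".

```python
import time


def average_runtime(func, *args, repeats=10):
    """Mean wall-clock time of func(*args) over `repeats` runs."""
    func(*args)  # warm-up run, excluded from the average
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


def required_speedup(seconds_per_frame, target_fps):
    """Factor by which processing must accelerate to reach target_fps."""
    return seconds_per_frame * target_fps


# Assumed target of 25 frames per second for the reported 0.69 s per image.
print(required_speedup(0.69, 25))
```

Under that assumed target, the reported 0.69 s per image would need to drop by a factor of roughly 17 to reach video rate.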
5. Conclusions
This paper addresses the insufficient application of multi-polarization technology in PMMW security imaging by proposing a multi-scale edge-preserving image decomposition framework integrated with a gradient-domain PCNN fusion strategy. The method employs a GWACF model to decompose images into three layers (FS, BS, and CS), which are subsequently fused using gradient-domain PCNN and energy-attribute methods. This allows the complementary multi-polarization information to be combined effectively while preserving essential texture and edge details. Experimental results demonstrate that the proposed fusion method outperforms existing mainstream methods across multiple evaluation metrics, enhancing target contour and structural information and providing higher image quality for subsequent detection and segmentation tasks. This study verifies the potential of multi-polarization PMMW technology for detecting objects concealed on the human body. The approach is applicable to scenarios such as non-intrusive security checks at smart transportation hubs, covert security protection in public places, and industrial non-destructive flaw detection.
Future work will focus on two aspects: first, further optimizing the fusion algorithm to improve computational efficiency and achieve real-time imaging at video rate; and second, exploring integration with deep learning-based fusion methods to enhance the extraction and characterization of multi-polarization target features.