1. Introduction
Non-stationary signal detection is a fundamental problem in statistical signal processing and arises widely in radar sensing and wireless communications. Classical detection theory is usually formulated within the framework of binary hypothesis testing and the likelihood-ratio test, and detection performance is commonly evaluated under fixed false-alarm probability constraints [
1]. However, many traditional detection statistics rely on global energy, mean spectral features, template correlation, or a single covariance matrix. When the two classes have similar global low-order statistics, discriminative information may be carried primarily by local time–frequency structures, directional organization, and spatial arrangement. In such cases, global statistics may suppress or discard the local evidence that is most relevant to detection.
Time–frequency analysis provides a finer representation for non-stationary signals and can reveal structural phenomena such as transient energy concentration, local ridges, frequency drift, and fragmented textures [
2,
3]. Existing spectro-temporal patch methods have shown that structural patterns within local time–frequency patches can provide effective information for language identification and acoustic event detection [
4,
5,
6]. Nevertheless, most of these methods still treat spectrogram patches as ordinary intensity maps, template vectors, or inputs to learning models, and they lack explicit geometric modeling of local directional relationships and anisotropy. By contrast, the structure tensor can stably characterize directional energy, directional coupling, and anisotropy through local second-order gradient statistics [
7,
8,
9]. After regularization, it naturally forms a symmetric positive definite (SPD) matrix object. Therefore, transforming a time–frequency patch into an SPD-valued structure field provides a more appropriate representation for detection tasks dominated by local structural information.
In this work, an SPD field is a spatially distributed collection of local symmetric positive definite tensors. Each tensor describes the local directional organization and anisotropy of a time–frequency patch, while the field preserves how such local structures are spatially arranged across the entire observation.
For the purpose of this study, the relevant non-stationary behavior is considered from a structural time–frequency perspective. The signal characteristics of interest are not assumed to be fully described by a time-invariant spectrum, a stationary covariance model, or a global energy statistic. Instead, the class-relevant information may appear as local organization patterns in the time–frequency plane, including local ridges, drifting spectral components, directional continuity, fragmented structures, or spatially localized texture changes. The purpose of introducing a local SPD structure field is therefore to model how these local time–frequency structures are organized, rather than to detect non-stationarity merely as a binary property of a signal.
The SPD manifold and information geometry provide intrinsic metrics for comparing such structural objects [
10,
11,
12,
13,
14]. Previous studies have shown that the affine-invariant Riemannian metric (AIRM), the Karcher mean, the Riemannian geometric mean, and related matrix-manifold methods have been used for covariance modeling and brain–computer interface classification [
15,
16]. In radar detection, information-geometric and matrix information-geometric methods usually represent an observation cell as a single covariance matrix, Hermitian positive definite (HPD) matrix, or cell-level matrix object, and then construct detection statistics using matrix distances, divergences, or geometric filtering [
17,
18,
19,
20,
21,
22,
23]. These methods effectively exploit the non-Euclidean geometry of matrix objects, but most remain at the level of sample-wise or cell-level matrix comparison and preserve little of the spatial distribution of local structures inside a patch.
Recent matrix-manifold studies further show that SPD/HPD matrix representations can be equipped with different intrinsic metrics, alignment operations, and discriminative projections. For example,
-invariant Riemannian metrics provide a broad geometric framework for SPD matrices, Riemannian Procrustes analysis uses geometry-aware transformations for transfer learning on SPD covariance features, and discriminative HPD-manifold projection has been used for matrix information-geometric radar detection in non-Gaussian clutter [
24,
25,
26]. These works reinforce the importance of respecting matrix-manifold geometry when positive-definite matrix objects are used as signal representations.
This leaves a methodological gap. Time–frequency patch methods preserve locality but usually lack SPD geometry-consistent modeling; SPD information-geometric detectors have a mature manifold foundation but often compress each observation into a single matrix object; and structure-tensor methods describe local directional organization but have rarely been formulated as sample-level detection statistics under fixed false-alarm probability () constraints. The present work addresses this gap by modeling each sample as a spatially distributed local SPD structure field, thereby preserving both the local organization of time–frequency structures and the intrinsic geometry of SPD matrices.
Motivated by this gap, this paper proposes an information-geometric detector based on local SPD structure fields for non-stationary signal detection problems in which the discriminative information is mainly reflected in local time–frequency structural organization. The proposed method first represents a time–frequency log-amplitude patch as a local SPD structure field composed of pointwise structure tensors. It then estimates two class-conditional pointwise Karcher mean reference fields under AIRM, and constructs local distance-difference evidence using the pointwise difference between squared AIRM geodesic distances. Finally, discriminative weighting, block-wise robust evidence pooling, and fixed- threshold calibration based on an independent null-hypothesis () calibration set are combined to form a sample-level detection statistic.
The main contributions of this paper are fourfold. First, we introduce an object-level reformulation for non-stationary signal detection by using local symmetric positive definite (SPD) structure fields as detection objects. This reformulation bridges time–frequency patch representations and Riemannian matrix-based detection frameworks, shifting the detection object from conventional scalar statistics or single-matrix summaries to spatially distributed structural objects.
Second, we construct a geometry-consistent relative-closeness detector on the proposed SPD structure-field representation. Class-conditional reference fields are estimated by pointwise Karcher means under the affine-invariant Riemannian metric (AIRM), and local distance-difference evidence is defined through pointwise differences between squared AIRM geodesic distances. This provides a field-level, geometry-consistent reformulation of the classical relative-closeness decision principle on the SPD manifold.
Third, we develop a robust evidence aggregation mechanism for local-structure-dominated detection. Discriminative weighting is used to emphasize stable local structural regions, whereas block-wise robust pooling suppresses low-information areas, local misalignment, and abnormal responses. The resulting aggregation framework integrates spatially nonuniform local distance-difference evidence into a sample-level detection statistic under fixed- calibration.
Fourth, we provide a mechanism-level analysis and validation framework based on a controlled SPD-field structured-locality benchmark, baseline comparisons, ablation studies, structural perturbation experiments, non-Gaussian background experiments, and paired-difference analysis. This analysis separates the roles of the object-layer representation and the Riemannian metric, showing that the local SPD structure-field representation is the primary source of performance gain, while Riemannian geometry provides a complementary consistency constraint and improves stable comparison within the same representation layer.
The remainder of this paper is organized as follows.
Section 2 introduces the theoretical motivation and algorithmic construction of the proposed detector.
Section 3 presents the experimental setup and results.
Section 4 discusses the methodological relationships, applicability limits, and future directions.
Section 5 concludes the paper.
2. Materials and Methods
2.1. Problem Formulation and Information-Geometric Motivation
Classical binary hypothesis testing is usually formulated within the likelihood-ratio test (LRT) framework [
1]. Under an ideal setting in which the statistical models are known, let the observation
x follow the probability models
and
under the null hypothesis
and the alternative hypothesis
, respectively. The classical likelihood-ratio test can then be written as
where
is the decision threshold. For
N independent and identically distributed observations
, the normalized log-likelihood ratio is
As
, the law of large numbers implies that the above expression converges to an expectation under the true distribution
. Combined with the Kullback–Leibler (KL) divergence [
27],
the normalized log-likelihood ratio is equivalent to
Therefore, under ideal large-sample conditions, classical detection can be interpreted as a relative-closeness decision in the sense of KL divergence [
18,
28]: the observation distribution is compared with the two hypothesized models, and the class with the smaller Kullback–Leibler divergence is selected.
This interpretation is used only as theoretical motivation. In the present work, the detection object is not a distribution-level hypothesis model or a single global covariance matrix, but an SPD-valued structure field defined over a local support
:
Here, denotes the set of symmetric positive definite matrices.
Correspondingly, the objects to be compared are no longer the two probability models
and
, but two class-conditional reference structure fields,
This object-layer redefinition is the key starting point that distinguishes the proposed method from the classical detection object level.
On this new object layer, the idea of classical relative closeness can be reformulated at the field level. Let
denote the geodesic distance on the SPD manifold, and let
be a spatial weight. The field-level energy functional relative to the class-
c reference structure field can be defined as
The corresponding field-level relative-closeness statistic is then defined as
A larger value of
indicates that the test structure field is globally closer, in the geometric sense, to
than to
; conversely, a smaller value indicates greater closeness to
. Expanding Equation (
7) gives
This expression shows that field-level relative closeness can be naturally written as a spatially weighted sum of pointwise differences between squared geodesic distances. The local distance-difference evidence and its sample-level aggregation developed below are constructed precisely from this field-level reformulation.
The field-level statistic above is used only to explain the design logic of the proposed method, not as a formal optimality statement for the proposed detector. The proposed method inherits the geometric idea of making a decision according to relative closeness from classical detection [
29,
30,
31], but on the new detection object of local SPD structure fields, it redefines the class-conditional reference centers, the local comparison criterion, and the sample-level evidence aggregation mechanism.
2.2. Local Time–Frequency Structure-Field Representation
Let the observed discrete-time signal be
, where
. To characterize local structural features of non-stationary signals in the joint time–frequency domain, we first use the short-time Fourier transform (STFT) to map the raw observation onto the time–frequency plane [
2,
3]. It is defined as
where
is the analysis window,
R is the frame shift,
m is the time-frame index,
k is the frequency-bin index, and
K is the number of discrete frequency points. The corresponding magnitude spectrum is denoted by
In this paper, the time–frequency representation is not the final decision object, but an intermediate domain for constructing the local structure-field representation. Compared with the raw one-dimensional waveform, the time–frequency representation can more directly reveal transient energy concentration, local ridges, directional continuity, frequency drift, and fragmented textures. These local organizational patterns are often more directly related to the task-relevant discriminative information in complex non-stationary detection tasks than any single global statistic.
Thus, the role of non-stationarity in the proposed pipeline is twofold. First, it motivates the use of a time–frequency representation, because the discriminative structures may change with time and may not be visible in a global spectrum or in raw waveform energy alone. Second, it motivates a local field representation rather than a single pooled statistic, because the useful evidence may be spatially localized, directionally organized, and unevenly distributed over the time–frequency support. The subsequent SPD structure-field construction is designed to preserve this local organization at the object level.
To reduce numerical instability caused by a large amplitude dynamic range and to enhance the separability of weak structural regions, we further use the log-amplitude representation
where
is a small regularization term used to avoid numerical singularities. Local patches are then extracted from
, and the patch support is denoted by
The input object of the subsequent detector is the local structure field defined on . This object choice means that the focus of the paper is not the intensity values themselves, but the directional relationships and spatial organization of local structural units within the patch.
On a local patch, we first compute the local gradient of the log-amplitude map,
where
and
denote the first-order derivatives along the temporal and frequency directions, respectively. The gradient directly reflects the direction and magnitude of local intensity variations. However, a single gradient vector is sensitive to noise, slight misalignment, and isolated outliers, and is therefore not stable enough to represent neighborhood-level structural organization. For this reason, we do not use the gradient itself as the final local feature; instead, we further construct a second-order structure tensor.
The raw second-order tensor is defined as
This tensor encodes local directional energy, directional coupling, and anisotropy into a
symmetric matrix [
7,
8,
9,
32]. To lift this representation from a pointwise gradient outer product to a stable neighborhood-scale second-order structural statistic, we further apply local smoothing to
:
where
is the Gaussian kernel corresponding to the integration scale of the structure tensor, and ∗ denotes convolution. This smoothing step is not merely denoising; rather, it accumulates directional information over a local neighborhood, so that the structure tensor more stably reflects the directional organization pattern inside a patch.
Based on this smoothed tensor, we add a ridge regularization term and define the pointwise structure tensor as
The membership in is ensured by two factors: local smoothing lifts the structural statistic from a pointwise gradient outer product to neighborhood-level second-order information, while further moves the matrix away from the degenerate boundary. This guarantees that subsequent Riemannian distances, Karcher means, and sample-level geometric comparisons are all performed in a valid SPD space.
Therefore, the entire patch can be represented as an SPD-valued structure field defined on
:
This representation is not a lossless transform of the raw patch, but a task-oriented local structural representation designed to preserve detection-relevant second-order geometric information, including directional organization, anisotropy, and spatial organization. All subsequent modeling and comparison steps are built around this SPD structure field.
2.3. Class-Conditional Reference Structure Fields Under AIRM
After obtaining the local SPD structure-field representation, the core training-stage task is to estimate, from the two classes of training samples, class-conditional reference structure fields that represent their local geometric organization patterns. Because the subsequent local comparison uses the affine-invariant Riemannian metric (AIRM), the reference centers must be defined under the same geometry. Otherwise, reference modeling and distance comparison would no longer share a unified intrinsic geometric framework.
Let the structure field of the
i-th training patch in class
be denoted by
For any fixed position
, the training samples from the same class provide a set of SPD matrices,
where
is the number of training samples in class
c. Instead of globally compressing the entire patch first, this paper estimates the class-conditional reference structure tensor separately at each fixed position, thereby preserving the spatial distribution of local geometric relationships. The reference structure field for class
c is therefore defined as
where
is the class-conditional reference structure tensor at position
x. For any
, the AIRM geodesic distance is defined as [
11,
12,
13]
where
denotes the matrix logarithm and
denotes the Frobenius norm. This distance is affine-invariant on the SPD manifold and characterizes intrinsic geometric differences between covariance-like objects. It has therefore been widely used for geometric modeling and classification of SPD matrices.
Accordingly, the reference structure tensor of class
c at position
x is defined as the pointwise Karcher mean under AIRM [
10,
11,
33]:
The two class-conditional reference structure fields are thus given by
This pointwise estimation strategy preserves the spatial distribution of class-conditional local structures and keeps the reference fields structurally aligned with the test structure field. Compared with the arithmetic mean, the Karcher mean is more consistent with the intrinsic geometry of the SPD manifold [
14,
24,
34], and it also keeps reference estimation consistent with the subsequent AIRM-based distance comparison.
2.4. Local Distance-Difference Evidence
After obtaining the two class-conditional reference structure fields
and
, the testing-stage task is to construct local distance-difference evidence by comparing the test structure field with the two reference fields under a unified SPD/AIRM geometry. Let the structure field corresponding to a test sample be
Because the local structure tensor at each position belongs to the SPD space, the local comparison between the test structure field and the reference structure fields should not be reduced to element-wise Euclidean differences or Frobenius differences. Instead, it should be built on the intrinsic geometry that is consistent with reference modeling [
13,
14,
15,
16]. For this reason, AIRM is again adopted as the distance metric for pointwise comparison.
For any position
, the pointwise geodesic distance from the test structure tensor to each class reference structure tensor is defined as
Here,
is the pointwise Karcher mean reference tensor of class
c at position
x. The local distance-difference evidence is then defined pointwise as
The sign of this quantity has a direct local discriminative meaning. When , the test structure tensor at position x is locally closer, in the geometric sense, to than to . When , it is closer to . Thus, forms a spatially distributed local distance-difference evidence map over the entire region .
The key point of this construction is that local comparison and class-conditional reference modeling share the same geometric framework. In
Section 2.3, the class-conditional centers are given by pointwise Karcher means under AIRM; in this section, the local comparison of the test sample with the reference centers is also performed using AIRM. Therefore,
is not a comparison of raw intensity, gradient magnitude, or a single local energy value. It is a comparison of the relative closeness of local second-order structural objects on the SPD manifold. Accordingly,
can be viewed as the pointwise expansion of the field-level relative-closeness statistic introduced in
Section 2.1, characterizing the local geometric tendency of the test structure field toward the two reference fields.
Under this definition,
is not merely an empirical distance-difference quantity; it can also be given an explicit information-geometric interpretation under local approximation conditions [
13,
35].
Proposition 1. Information-geometric interpretation of local distance-difference evidence. Suppose that, for any position , the test structure tensor satisfies a locally isotropic Riemannian Gaussian approximation model in a neighborhood of the class-c reference tensor [35]:where is the squared AIRM geodesic distance and is a shared local perturbation scale for the two classes. If the correlation among local positions is approximately absorbed into a nonnegative spatial weight in the field-level statistic, then the weighted distance-difference statisticdiffers from the approximate log-likelihood ratio only by a positive proportionality factor and an additive constant. Proof of Proposition 1. Under the locally isotropic Riemannian Gaussian approximation above,
When the two classes share the same local perturbation scale and the difference between normalization constants is absorbed into an additive constant, we have
Rearranging gives
that is,
Further taking a spatially weighted sum over
yields
Therefore, under the local approximation conditions above, the field-level weighted distance-difference statistic is monotonically consistent with the approximate log-likelihood ratio. This proposition explains only the information-geometric meaning of the additive distance-difference statistic under a locally isotropic Riemannian Gaussian approximation. The discriminative weighting and block-wise robust pooling introduced below are sample-level aggregation mechanisms built on this local distance-difference evidence. They are intended to enhance stable discriminative regions and reduce the effects of local misalignment, structural fragmentation, and abnormal responses, but they do not constitute an optimality proof in the Neyman–Pearson sense. □
2.5. Discriminative Weighting and Block-Wise Robust Pooling
The local distance-difference evidence map provides spatially distributed geometric information, but different positions do not contribute equally to the final decision. To enhance stable discriminative regions and suppress the effects of local misalignment, structural fragmentation, and abnormal responses, we introduce discriminative weighting and block-wise robust evidence pooling.
Let
and
denote the two class-conditional reference structure tensors at position
x. The local between-class separation is defined as
For class
, the within-class dispersion at position
x is defined as
The unnormalized discriminative weight is then given by
This quantity measures the stable local separability of the geometric structure at position x. Positions with large between-class separation and small within-class dispersion are expected to provide persistent discriminative evidence, whereas positions dominated by high dispersion and low separation are down-weighted.
To make the weights comparable across spatial positions and reduce high-frequency fluctuations caused by finite training samples,
is first min–max normalized,
and then mildly smoothed and clipped to the interval
:
The resulting discriminative weight map emphasizes stable local discriminative regions while preserving spatial continuity in the weight field.
After obtaining
and
, the remaining task is to aggregate the weighted local distance-difference evidence into a sample-level detection statistic. A direct global summation of the weighted evidence,
can still be sensitive to local shifts, missing structures, and isolated abnormal responses. In complex non-stationary scenarios, discriminative evidence is often concentrated in a small number of local structural regions rather than uniformly distributed over the entire patch. This idea is related to multiple-instance learning and set-level pooling, where sample-level decisions are inferred from local instances. However, the proposed method uses non-learned, geometry-driven evidence aggregation rather than trainable attention weights or deep pooling functions [
36,
37,
38].
We define a set of overlapping blocks over
:
For each block
, the weighted block score is computed as follows:
This block-level averaging reduces the influence of pointwise outliers and small local shifts, while preserving the spatial concentration of high-response local structures.
The block scores
are then sorted in descending order. Given a top-block fraction
, the number of selected high-response blocks is
The global mean term and the top-block mean term are defined as
where
denotes the index set of the selected top
K blocks. The final sample-level statistic is
Here,
captures the overall geometric tendency over the field, whereas
emphasizes salient responses from sparse discriminative structures [
39]. The aggregation coefficient
controls the trade-off between global stability and local saliency. This aggregation is not intended as a theoretically optimal mixture; rather, it provides a controlled balance for local-structure detection, where discriminative evidence can be spatially sparse and locally unstable. In the experiments, the top-block fraction
q and the aggregation coefficient
are fixed across all nominal signal-to-noise ratio/signal-to-clutter ratio (SNR/SCR)-like index points, random seeds, and proposed and ablated variants; they are not re-tuned separately for individual operating points. The general selection principles of these and other numerical parameters are summarized in the subsection entitled “Controlled SPD-Field Benchmark Generation”.
Thus, the test sample is mapped from the local distance-difference evidence map to a sample-level statistic T. The discriminative weight map emphasizes stable discriminative positions, block-wise scoring reduces pointwise anomalies and local shifts, and the combination of and balances global stability with local saliency.
2.6. Final Decision Rule and Fixed- Threshold Calibration
Given the sample-level statistic
T, the final decision rule is defined as
Here,
is the detection threshold. A larger value of
T indicates that the test sample is closer to the target-class reference structure field
after discriminative weighting and robust pooling; conversely, a smaller value indicates greater closeness to the null-class reference structure field
. Therefore, the threshold is calibrated using an independent
calibration set [
40,
41,
42].
Let the statistics computed from the independent
calibration set be
. Sorting them in ascending order gives
For a target false-alarm probability
, we use the empirical order-statistic threshold
Because
is estimated from a finite number of
calibration samples, the achieved false-alarm probability does not necessarily equal the nominal target value exactly. We therefore estimate empirical false-alarm probability, referred to as achieved
, using an independent
audit set:
The detection probability is estimated using an independent
test set:
This protocol explicitly separates threshold estimation, detection evaluation, and false-alarm auditing. It avoids calibration/test leakage and ensures that fixed- comparisons among different detectors are performed on a consistent experimental basis. Receiver operating characteristic (ROC) curves are obtained by sweeping the threshold, whereas all fixed- operating-point results use the independent calibration set for threshold determination and the independent audit set for false-alarm validation.
2.7. Algorithmic Summary of the Proposed Detector
The leading idea of the proposed detector is to compare local structural organization rather than raw intensity or a single global statistic. As illustrated in
Figure 1, a waveform or time–frequency patch is first represented as a local SPD structure field. Class-conditional reference fields are then estimated by pointwise Karcher means under AIRM. During testing, the test field is compared with the two reference fields to obtain local distance-difference evidence, which is weighted, robustly pooled, and finally converted into a fixed-
decision using an independently calibrated threshold.
Algorithm 1 summarizes the complete procedure. For raw waveform or spectrogram inputs, the local SPD structure field is constructed using the STFT and structure-tensor procedure described in
Section 2.2. In the controlled benchmark of
Section 3, generated SPD fields are directly used as inputs to the reference-estimation and scoring stages.
| Algorithm 1. Proposed information-geometric detector based on local SPD structure fields |
| Input: |
| Raw training samples or precomputed SPD-field samples , /, ; |
| raw test sample or precomputed SPD-field sample ; |
| independent calibration set ; |
| target false-alarm probability ; |
| local patch support ; |
| overlapping block set ; |
| top-block fraction q; |
| aggregation coefficient . |
| Output: |
| Sample-level detection statistic T and final decision. |
| Training phase: |
- 1:
For each training sample, obtain its local SPD structure field. - 2:
If raw inputs are used, construct the local SPD structure field following Section 2.2. - 3:
If the input is a controlled SPD-field sample, use the generated SPD field directly. - 4:
End for. - 5:
For each position and each class , estimate the pointwise Karcher mean reference tensor according to Equation ( 24). - 6:
Obtain the two class-conditional reference fields and . - 7:
Compute , , and for all . - 8:
Construct the discriminative weight map by normalization, smoothing, and clipping.
|
| Calibration phase: |
- 9:
For each calibration sample , do. - 10:
Compute its local SPD structure field . - 11:
Compute the local distance-difference evidence according to Equation ( 28). - 12:
For each block , compute the weighted block score according to Equation ( 42). - 13:
Sort in descending order and select the top blocks. - 14:
Compute the calibration statistic T according to Equation ( 46). - 15:
End for. - 16:
Sort the calibration statistics and obtain using the empirical order-statistic rule in Section 2.6.
|
| Testing phase: |
- 17:
Compute the local SPD structure field of the test sample X. - 18:
Compute the local distance-difference evidence according to Equation ( 28). - 19:
For each block , compute the weighted block score according to Equation ( 42). - 20:
Sort in descending order and select the top K blocks. - 21:
Compute and according to Equations ( 44) and ( 45). - 22:
Compute the final statistic . - 23:
If , decide ; otherwise, decide .
|
2.8. Computational Complexity
Table 1 summarizes the computational complexity of the proposed detector. Let
denote the total number of training samples,
the number of spatial positions,
the maximum number of Karcher iterations,
L the number of overlapping blocks,
the size of block
,
M the number of calibration samples, and
the cost of evaluating one sample-level statistic.
The main computational cost of the proposed detector comes from pointwise Riemannian reference estimation and repeated AIRM distance evaluation. Although AIRM provides an intrinsic comparison on the SPD manifold, faster alternatives such as Log-Euclidean metrics or tangent-space approximations may reduce the online cost in large-scale settings [
24,
34,
43].
3. Experimental Results
3.1. Experimental Setup and Main Detection Scene
The experiments evaluate the effectiveness, component contributions, and applicability limits of the proposed detector. In all formal experiments, the training, calibration, testing, and audit sets are mutually independent. Receiver operating characteristic (ROC) curves are obtained by directly sweeping the detection threshold. For fixed false-alarm probability operating points, the threshold is determined from the empirical quantile of an independent calibration set matched to the corresponding experimental condition. Two target operating points, and , are reported throughout the experiments. The achieved false-alarm probability is also reported as an empirical calibration check to verify whether the nominal false-alarm target is realized at the expected order of magnitude.
The main experiment uses a structured-locality detection scene. This scene is not designed to create separability through pronounced global energy differences, mean spectral appearance differences, or a single pooled covariance statistic. Instead, the class difference is mainly encoded in local structural orientation, spatial arrangement, and structural stability. From the viewpoint of non-stationary signal analysis, this scene represents the situation in which the informative content is spatially inhomogeneous and locally organized over the time–frequency support. The local structural pattern varies with position, and the class distinction is expressed through localized orientation organization and spatial arrangement rather than through a single global distribution. Therefore, the benchmark is intended to emulate a local-structure-dominated non-stationary detection setting at the SPD-field object layer. Additional structural perturbation and non-Gaussian background experiments are further conducted to examine the robustness and applicability limits of the proposed detector.
In the formal experiments, 20,000 independent samples are used for fixed- threshold calibration, 500 test samples and 500 test samples are used for ROC, area under the curve (AUC), and detection probability () evaluation, and another 20,000 independent audit samples are used only for achieved validation. All evaluated methods share the same data splits, target false-alarm probabilities, calibration protocol, and audit protocol. Therefore, the performance differences are mainly attributable to the detection statistics themselves rather than to threshold implementation or sample reuse.
The experiments are conducted on a controlled SPD-field object-layer benchmark.
Section 2 describes the complete construction, from raw observations or time–frequency patches to local SPD structure fields. In
Section 3, however, samples are generated directly at the SPD-field object layer in order to isolate and evaluate the core detector mechanisms after the representation stage, including reference-field estimation, local distance-difference evidence, discriminative weighting, block-wise robust pooling, and fixed-
calibration. This design allows the experiment to focus on whether the proposed detector can exploit local structural organization once such structures have been represented as SPD fields. Therefore, the experimental conclusions validate the detector mechanism after SPD-field representation, rather than a complete end-to-end waveform-level detection pipeline.
Controlled SPD-Field Benchmark Generation
The main scene generates samples directly at the local SPD structure-field object layer. The patch support is defined as
Each sample is represented as an SPD-valued field,
For an orientation angle
, the directional SPD prototype matrix is defined as
where
is a two-dimensional rotation matrix,
, and
. All generated matrices are symmetrized and constrained by a minimum eigenvalue condition,
The background structure is generated from a shared orientation atlas
. This atlas is constructed from four quadrant-wise base orientations,
,
,
, and
, and is mildly smoothed by a Gaussian kernel to avoid artificial discontinuities. At background positions, the SPD tensor is generated in the log domain as
where
denotes background orientation jitter,
is a symmetric log-domain perturbation, and
denotes SPD correction under the eigenvalue-flooring constraint.
Class-discriminative differences are introduced through stable local structural islands. The class-defining orientation separation is set to
. With the base orientation set to
, the two class-discriminative orientations are
The main scene contains three discriminative local structures centered at
,
, and
, with a radius of
and relative amplitudes of
,
, and
, respectively. The mask of the
k-th local structure is defined as
Inside a discriminative local structure, the local orientation is obtained by interpolating between the background orientation and the class-discriminative orientation:
The discriminative orientation jitter is set to , and the discriminative log-domain perturbation amplitude is set to .
To avoid reducing the scene to a simple template-matching problem, each sample further includes six shared unstable distractors. The center of each distractor is randomly sampled, its radius is set to
, its orientation is randomly selected from
and a rotation jitter of
is applied. Its amplitude is randomly sampled from
. These distractors serve as nuisance local structures and simulate local interference, misalignment, and non-discriminative structural responses. Consequently, each generated sample is a spatially inhomogeneous SPD field rather than a homogeneous field generated from a single position-independent tensor distribution. The background atlas, localized discriminative islands, orientation jitter, and shared unstable distractors jointly create position-dependent local structural variations. These variations are used to represent, at the object layer, the kind of non-stationary time–frequency organization that motivates the proposed detector.
In this benchmark, the nominal index is a background-difficulty control parameter rather than a measured physical signal-to-noise ratio or signal-to-clutter ratio. It does not modify the class-defining structural separation or the amplitudes of the discriminative local structures. Instead, controls nuisance factors, including background perturbation, background orientation jitter, distractor perturbation, and distractor strength. Although the benchmark does not define a physical SCR, the nominal index serves an analogous role by controlling the strength of background nuisance factors.
The perturbation scale is defined as
where
dB,
,
, and
in the main experiments. The exponential form in Equation (
60) is not intended to model a physical SNR or SCR relationship. It is adopted as a convenient monotonic mapping that produces approximately logarithmic changes in nuisance intensity when the nominal index is varied in dB. Therefore, the nominal SNR/SCR-like index sweep evaluates how the detectors behave under different background-difficulty conditions. It facilitates fixed-
performance comparison across nuisance levels, while avoiding artificial separability caused by changing the class-defining local structures themselves.
All experiments use mutually independent training, calibration, testing, and audit sets. The training sets are used to estimate class-conditional pointwise reference fields and the discriminative weight map. The calibration set is used only for fixed- threshold estimation. The testing sets are used for AUC and evaluation, and the independent audit set is used only for achieved validation.
The parameters listed in
Table 2 are not used as free fitting degrees of freedom for individual curves or operating points. They are fixed before the formal comparisons and kept unchanged across nominal SNR/SCR-like index values, random seeds, calibration/test splits, and proposed and ablated variants, unless an experiment explicitly studies a perturbation factor. This fixed-configuration policy is intended to avoid treating the detector as an unconstrained multi-parameter fitting model.
The selection of these parameters follows the principles of numerical stability, local-structure preservation, SPD validity, robust aggregation, and fair fixed- comparison. Specifically, is used as a small stabilization term to prevent logarithmic singularities; sets the local integration scale of the structure tensor; and the eigenvalue floor keep tensors inside the valid SPD cone; and stabilize the discriminative weight map; the block size, stride, and control robust local pooling; q and balance sparse local saliency with global stability; the Karcher tolerance and iteration limit are determined by convergence and computational cost; and the calibration, test, and audit sample sizes are used to separate threshold calibration, detection evaluation, and achieved- auditing. The selected values are not claimed to be universally optimal; rather, they provide a fixed and transparent configuration for comparing detector mechanisms under the same controlled benchmark and calibration protocol.
3.2. Main Results Under the Structured-Locality Detection Scene
This section compares the proposed detector with the uniform-weight variant and the global-pooling variant in the main scene. The nominal SNR/SCR-like background-difficulty index is swept from dB to 0 dB. Both the overall ranking ability and the detection performance at fixed- operating points are evaluated under this controlled nuisance-difficulty sweep.
As shown in
Figure 2, the proposed detector achieves the highest average AUC in the main scene, but the source of this advantage should be interpreted in layers. Compared with the global-pooling variant, the proposed detector shows a more consistent advantage across the nominal SNR/SCR-like index sweep, indicating that purely global aggregation weakens the contribution of local discriminative structures. Compared with the uniform-weight variant, the improvement is smaller but generally stable, suggesting that discriminative weighting provides a supplementary benefit by emphasizing stable local structural regions. Because the benchmark deliberately suppresses global low-order cues, the AUC does not vary strictly monotonically with the nominal SNR/SCR-like index. Local fluctuations between adjacent nominal SNR/SCR-like index points mainly reflect the joint effects of nuisance/background variability, random distractors, and the finite number of random seeds, rather than a simple energy-detection behavior.
Figure 3 presents the fixed-
detection probability curves under different values of the nominal SNR/SCR-like background-difficulty index. These curves are intended to facilitate performance comparison under controlled nuisance levels, in the same spirit as comparing detection probability under different signal-to-noise or signal-to-clutter conditions. Although the benchmark does not define a physical signal-to-clutter ratio, the nominal index plays an analogous role by controlling the intensity of background nuisance factors while keeping the class-defining local structures unchanged.
At fixed-
operating points, the proposed detector performs better overall than the uniform-weight and global-pooling variants. At
, the detection probabilities at
dB,
dB, and 0 dB are 0.3624, 0.4316, and 0.5232, respectively, all of which are higher than those of the two simplified variants. At
, the advantage of the proposed detector is generally retained, and it achieves the highest or joint-highest performance in the mid-to-high nominal SNR/SCR-like index range. This fixed-
advantage should be interpreted together with the paired-difference analysis in
Section 3.3, because the gains over geometry-simplified structure-field baselines are clearly smaller than the gains over global low-order baselines.
3.3. Baseline Comparison and Ablation Analysis
To assess comparability against external baselines and the contribution of individual components, we further conduct baseline comparisons and ablation analyses in the main scene. All methods use the same data splits and the same fixed-
calibration protocol. Therefore, the comparison focuses on the choice of detection object and the construction of the sample-level statistic, rather than on differences in threshold implementation.
Figure 4 summarizes the baseline comparison under the main scene, and
Table 3 reports the corresponding average performance results.
The baseline comparison shows that local structure-field methods clearly outperform the global-energy and pooled-covariance baselines. This indicates that, in the present main scene, the discriminative information mainly resides in local structural organization rather than in global energy or a single pooled covariance statistic. The template-correlation baseline is competitive at fixed- operating points, but its AUC is substantially lower than those of local structure-field methods, suggesting that simple template similarity cannot fully capture the geometric variations in local SPD structure fields. The achieved values of all methods are close to their corresponding target levels, supporting a fair comparison under fixed- conditions.
Within the structure-field baselines, the proposed detector provides only a modest improvement over the Euclidean/Frobenius structure-field baseline. To avoid over-interpreting small mean differences, we further conduct paired-difference analysis. For any metric
given a baseline
b and a paired experimental unit
u, the paired difference is defined as
For the main nominal SNR/SCR-like index sweep, the paired difference is first computed for each seed–index pair and then averaged over all nominal SNR/SCR-like index points within each random seed. Therefore, the independent paired unit for the confidence interval is the random seed rather than an individual seed–index pair. This prevents adjacent nominal SNR/SCR-like index points under the same seed from being treated as fully independent observations and reduces the risk of underestimating the confidence interval.
We report the mean paired difference and its 95% confidence interval. This analysis is used to avoid over-interpreting small mean performance differences.
Table 4 reports the resulting paired performance differences between the proposed detector and the competing methods.
The seed-averaged paired-difference analysis shows that the proposed detector achieves positive AUC gains over both the uniform-weight and global-pooling variants on the main sweep. Its fixed-
gains over the global-pooling variant are also relatively clear. For the uniform-weight variant, the confidence interval of the
difference at
slightly crosses zero after seed-level aggregation; therefore, the improvement at this operating point should be interpreted cautiously. In contrast, the gains over global low-order baselines are much larger, whereas the gains over the Euclidean/Frobenius structure-field baseline are small. The main evidence therefore supports the effectiveness of the local SPD structure-field object layer. AIRM-based geometry-consistent modeling should be interpreted as a smaller supplementary contribution rather than as the sole source of a large performance jump.
Figure 5 presents the ablation analysis under the main scene.
In the ablation analysis, the proposed detector, the uniform-weight variant, and the global-pooling variant share the same local SPD structure-field representation and the same pointwise AIRM distance-difference map. They differ only in whether discriminative weighting and top-block emphasis are used. The results indicate that the proposed detector achieves the best values on all three core metrics. This suggests that useful discriminative information is not uniformly distributed over the whole patch, but is concentrated in several stable local structural regions. Discriminative weighting increases the contribution of these regions, while block-wise pooling reduces the influence of low-information areas and local abnormal responses on the sample-level statistic.
3.4. Robustness Under Structural Perturbations and Non-Gaussian Conditions
This section evaluates robustness under local structural perturbations and non-Gaussian background perturbations. The purpose is to examine whether the performance ordering remains stable when structural strength and background distribution change.
Figure 6 reports the robustness results under local structural perturbations.
In the local structural perturbation experiment, the structural-strength perturbation results show that the gain of the proposed detector over the uniform-weight variant remains generally stable across perturbation settings around the main scene. Under the slightly stronger, main, slightly weaker, and boundary auxiliary conditions, remains positive and stays on the order of . This indicates that, when the orientation separation and stability of local structures undergo small to moderate changes, the ranking advantage of the proposed detector does not reverse. The result suggests that the proposed detector retains its advantage when local structures remain present but their strength and stability are perturbed.
The non-Gaussian background perturbation experiment compares Gaussian, Laplace-distributed, impulsive, and heavy-tailed field-level background perturbations. Perturbations are applied at the SPD-field object layer. For an SPD tensor
at position
x, the tensor is first symmetrized, and a symmetric random perturbation is then added in the log domain:
where
is a symmetric random perturbation matrix, and
denotes SPD correction under the same minimum eigenvalue constraint used in the main scene, namely
. The perturbation matrix is defined as
where
is the noise scale.
Four field-level perturbations are compared. In Gaussian perturbation, the entries of
are independently sampled from the standard normal distribution. The Laplace-distributed perturbation is generated by inverse-transform sampling:
Impulsive perturbation is based on Gaussian perturbation and amplifies the perturbation magnitude by a factor of 4 with probability
:
Heavy-tailed perturbation uses a Student-
t distribution with
degrees of freedom and variance normalization:
The perturbation scales are set to , , , and for Gaussian, Laplace-distributed, impulsive, and heavy-tailed perturbations, respectively. All perturbations are independently generated for each sample and each spatial position. After perturbation, matrices are symmetrized again and SPD correction is applied.
In this experiment, non-Gaussian perturbations are not applied to the training
sets. Therefore, the class-conditional reference fields and the discriminative weight map are still estimated from the main-scene training data. Non-Gaussian perturbations are applied only to the calibration
, test
, test
, and audit
sets. For each noise condition, the fixed-
threshold is re-estimated using the corresponding perturbed independent
calibration set.
Figure 7 reports the robustness results under non-Gaussian background perturbations.
The results indicate that the main performance ordering is not reversed under the four field-level background perturbations. The proposed detector remains higher than the uniform-weight and global-pooling variants in AUC and at both operating points. Since the threshold is recalibrated using the perturbed calibration set under each noise condition, this result indicates robustness to moderate field-level distributional perturbations. It does not imply, however, that a threshold calibrated under one noise condition can be directly transferred to another noise condition without recalibration.
4. Discussion
4.1. Sources of Performance Gain
The experimental results indicate that the primary gain arises from the local SPD-field representation, rather than from the Riemannian metric alone. Compared with the global-energy and pooled-covariance baselines, local structure-field methods preserve the directional organization, anisotropy, and spatial distribution within the patch, and are therefore more suitable for the present local-structure-dominated detection scenario.
Within the local structure-field object layer, AIRM-based comparison and pointwise Karcher mean reference fields provide a geometry-consistent way to define class-conditional reference fields and local distance-difference evidence. However, the empirical gain over the Euclidean/Frobenius structure-field baselines is relatively small. Therefore, the role of AIRM should be understood as a geometry-consistent modeling component within the object-layer framework, rather than as a factor that alone produces a large performance jump.
Discriminative weighting and block-wise robust pooling mainly improve sample-level evidence aggregation. They shift the statistic from a purely global average response toward stable local discriminative regions, thereby reducing the influence of low-information areas, local misalignment, and abnormal responses on the final statistic.
4.2. Relationship to Existing Methods and Future Directions
Compared with existing methods, the core difference of the proposed method is not simply the use of a more complex distance or a more complex pooling strategy, but the change in the detection object layer. Classical detection and constant false-alarm rate (CFAR) methods usually operate on scalar statistics and adaptive thresholds under fixed false-alarm probability constraints. They have clear detection-theoretic interpretations, but their ability to represent local structural organization is limited. Spectrogram patch methods preserve time–frequency locality, but most of them still treat patches as intensity templates or learning inputs. Existing matrix information-geometric detectors place covariance matrices or HPD matrices on SPD/HPD manifolds and compare them using intrinsic matrix geometry, forming a mature route for radar detection. However, such methods usually compress each observation cell into a single matrix object. The proposed method lies at the intersection of these routes: it inherits the locality of time–frequency patch methods, transforms local structures into an SPD-valued field through the structure tensor, and performs reference estimation and local distance-difference evidence construction under AIRM. The relationship between the proposed method and representative methodological families is summarized in
Table 5.
One representative research line in information-geometric signal detection is the series of works by Cheng, Hua, Wang, Wu, Yang, and collaborators on matrix information geometry detection. This line starts from an information-geometric interpretation of signal detection, connects Neyman–Pearson detection, Kullback–Leibler divergence, and radar CFAR detection, and further develops radar target detection methods based on symmetrized KL divergence, total KL divergence, total Jensen–Bregman divergence, and matrix information geometry. More recent work extends this line to weak target detection in heterogeneous clutter and track-before-detect scenarios in range–azimuth measurements. In particular, discriminative HPD-manifold projection and TBD-MIG detectors further show that manifold projection can enhance the discriminative capability of HPD matrix representations for range-spread radar targets in non-Gaussian clutter [
26]. These developments indicate that matrix information geometry has evolved from theoretical interpretation toward more complex radar detection tasks. In contrast to this line, the present work does not continue to construct a detector around a single covariance or HPD matrix object. Instead, it represents each sample as a spatially distributed local SPD structure field in order to preserve the local structural distribution inside a patch.
Another representative line is matrix information geometry for radar processing associated with Barbaresco and related work. This line introduced Cartan/Siegel geometry, SPD/HPD matrix manifolds, Fréchet metric spaces, and related tools into radar covariance matrix processing, and explored robust statistical processing methods such as OS-HDR-CFAR, OS-STAP, and Riemannian means/medians. Along this direction, Ono and Peng recently compared AIRM, Log-Euclidean, and Bures–Wasserstein geometries for HPD matrices in Matrix-CFAR signal detectors, while also analyzing detection performance, outlier robustness, and computational complexity. These developments suggest that an important future direction is not merely to add more detector modules, but to establish a more systematic connection among different SPD/HPD geometries, robust reference estimation, and local structure-field representations.
The additional SPD-manifold studies also help clarify the geometric scope of the present method. General studies of
-invariant metrics emphasize that AIRM is one member of a broader family of meaningful intrinsic geometries on SPD matrices [
24]. Riemannian Procrustes analysis further demonstrates that SPD covariance features can be aligned across domains while respecting manifold geometry [
25]. These studies mainly operate on covariance, SPD, or HPD matrix objects, whereas the present work focuses on a spatially distributed field of local SPD structure tensors, where the spatial organization of local time–frequency structures is itself part of the discriminative information.
4.3. Information-Geometric Interpretation and Its Limits
The design of the proposed method is inspired by a classical relative-closeness decision principle. In classical detection, the likelihood-ratio test can be interpreted, under large-sample conditions, as a comparison between the Kullback–Leibler divergences from the observation distribution to the two hypothesized distributions. This relationship provides an information-geometric motivation for the local distance-difference evidence used in this paper.
However, the proposed detector should not be interpreted as a strict generalization of the classical LRT. The object layer in this paper is the local SPD structure field. The quantity represents local distance-difference evidence under the SPD/AIRM geometry, while discriminative weighting and block-wise robust pooling are sample-level evidence aggregation mechanisms. Under additional approximation assumptions, such as locally isotropic Riemannian Gaussian perturbations, the difference between squared AIRM distances can be monotonically related to an approximate log-likelihood ratio. This relationship should be understood as an explanatory motivation rather than as a strict optimality proof.
4.4. Limitations and Applicability
The proposed detector is most suitable for non-stationary detection tasks in which discriminative information is primarily encoded in local structural organization, directional continuity, and spatial arrangement patterns. In such scenarios, the relevant class differences are not adequately captured by global energy, mean spectral intensity, or a single covariance-scale statistic. Conversely, if the class difference is mainly reflected in global energy, average spectral strength, or a single covariance-scale variation, simpler global statistics may already be sufficient, and the advantage of the proposed method may diminish.
A potential class of physical applications is the analysis of measurement signals whose diagnostic information appears as specific local structures in the time–frequency plane. Examples include radar or wireless sensing signals with drifting spectral components, multipath-induced ridges, fragmented time–frequency textures, or local directional continuity. Related GNSS radio-occultation and reflected-signal studies have shown that direct and reflected signal components can produce structured signatures in spectral or time–frequency representations [
44,
45]. For such problems, the proposed detector may be useful if the class-relevant information can be stably encoded as local ridges, directional organization, or spatially localized texture changes in the time–frequency plane.
However, this applicability is conditional. The present manuscript does not claim direct validation on GNSS radio-occultation data, reflected-signal datasets, or any other specific physical measurement dataset. Applying the proposed method to such problems would require an end-to-end validation pipeline, including raw waveform preprocessing, time–frequency representation design, local SPD structure-field construction, and fixed- calibration under the corresponding physical clutter or noise conditions. Therefore, validation on real non-stationary physical data is an important future direction rather than a conclusion of the current controlled benchmark.
The benefit of the method also depends on the existence of local discriminative structures that can be stably estimated. When discriminative local structures are too few, overly fragmented, or strongly contaminated by local interference, the gains from discriminative weighting and top-block emphasis may not persist. The boundary setting in the structural perturbation experiment also indicates that the method is not unconditionally superior under all local complexity conditions.
In addition, AIRM distance computation, pointwise Karcher mean estimation, and block-wise pooling introduce higher computational cost than global-energy or pooled-covariance baselines. Therefore, the proposed method is more appropriate for scenarios where structural interpretability and local-structure detection performance are prioritized over minimum-cost detection.
Finally, the experiments in this paper are conducted on a controlled SPD-field object-layer benchmark. These experiments mainly validate the detector mechanism after local structures have been represented as SPD fields. They do not constitute a complete end-to-end validation on raw non-stationary waveforms or real physical measurement datasets. Moreover, the nominal SNR/SCR-like index used in the benchmark is a controlled background-difficulty index and should not be regarded as a measured physical signal-to-noise ratio or signal-to-clutter ratio. End-to-end validation on raw waveforms, real public datasets, cross-scene fixed- calibration, and more efficient geometric approximations remain important directions for future work.