Hankel-Structured Graph Learning for Meta-Verified Sylvester Reconstruction in Binary Waring Decomposition

Wang, Wenjie; Liang, Chen-Wei; Wang, Mu-Jiang-Shan; Zhang, Chi

doi:10.3390/sym18061012

Open Access Article


¹ College of Sciences, Northeastern University, Shenyang 110004, China

² School of Mathematics and Statistics, Faculty of Science, University of New South Wales, Sydney, NSW 2052, Australia

³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

⁴ Shenzhen Kaihong Digital Industry Development Co., Ltd., Shenzhen 518000, China

* Author to whom correspondence should be addressed.

Symmetry 2026, 18(6), 1012; https://doi.org/10.3390/sym18061012 (registering DOI)

Submission received: 6 May 2026 / Revised: 8 June 2026 / Accepted: 10 June 2026 / Published: 12 June 2026

(This article belongs to the Section Mathematics)


Abstract

Binary Waring decomposition seeks to express a homogeneous binary form as a minimal sum of powers of linear forms. In the binary setting, Sylvester’s theorem gives a classical algebraic route for rank determination and parameter recovery through structured Hankel/catalecticant matrices. Although this procedure is exact and interpretable in ideal arithmetic, practical rank identification may become unstable when the input coefficients are contaminated by noise or when the underlying roots are close to degenerate configurations. This paper develops a data-driven rank inference framework coupled with certified Sylvester reconstruction for robust binary Waring decomposition. The proposed method first converts the coefficient sequence into a Hankel-aware graph that captures recurrence-induced dependencies among polynomial coefficients. A graph neural network is then used to infer plausible rank candidates from this structured representation. Instead of accepting a single prediction directly, the framework performs explicit Sylvester reconstruction and algebraic residual verification for candidate ranks. To further improve decision reliability, a lightweight meta-verification module integrates reconstruction residuals, model confidence scores, and stability-related indicators to select the most credible rank. Experiments on large-scale synthetic binary forms show that the proposed meta-guided variant improves rank identification and verified reconstruction success relative to the one-shot hybrid solver under low-to-moderate noise while maintaining the transparency and auditability of classical symbolic–numeric computation. Additional stress tests indicate that performance can degrade under shifted sampling regimes; thus, the method should be interpreted as a robust decision layer within the modeled problem class rather than as unconstrained real-world validation.

Keywords:

binary Waring decomposition; Sylvester theorem; Hankel matrix; graph neural networks; rank identification; symbolic–numeric computation

1. Introduction

Binary Waring decomposition asks for a homogeneous binary form to be written as a minimal sum of powers of linear forms. For binary forms, Sylvester’s classical theorem connects this problem to linear recurrences and structured Hankel/catalecticant matrices built from the coefficient sequence [1]. This viewpoint is central in algebraic geometry, symbolic computation, tensor rank theory, and Prony-type recovery methods [2,3,4,5,6].

In exact arithmetic, Sylvester-style solvers are interpretable and certifiable. In numerical settings, however, the rank decision is often made by an incremental search over candidate ranks, repeatedly constructing Hankel blocks and checking for near-kernel structure before attempting reconstruction [7]. These tests can be fragile when coefficients are noisy or when roots are nearly colliding, and residual-based thresholds may change the selected rank under small perturbations [8,9,10,11]. The central question of this paper is therefore not whether learning can replace Sylvester reconstruction, but whether it can make the rank-selection stage more reliable while preserving explicit algebraic verification.

We study a hybrid, learning-assisted Sylvester pipeline. The coefficient sequence is encoded as a sparse graph whose edges reflect local adjacency and repeated Hankel-window reuse. A graph neural network (GNN) predicts plausible Waring ranks and stability-related signals from this solver-aligned representation. The predicted ranks are then passed to a classical Sylvester reconstruction step, and candidate decompositions are accepted only after explicit residual verification. Thus, the neural component proposes and ranks hypotheses, while the final decomposition remains an auditable algebraic object.

The graph representation is chosen because the Hankel constraints are not merely sequential: the same coefficient participates in many overlapping windows and anti-diagonal relations. Message passing provides a compact way to aggregate these distributed recurrence-consistency cues [12,13,14,15]. We also discuss alternative encodings, including direct Hankel-array inputs, local convolutions, and attention-based sequence models, and clarify that the proposed graph should be understood as a domain-specific inductive bias rather than an intrinsically superior architecture for all settings. More broadly, graph-theoretic models have long been used to describe structural dependence, connectivity, robustness, and diagnosability in discrete computational systems, including interconnection networks, Cayley-type graph networks, graph orientation problems, and Hamiltonian digraph structures [16,17,18,19]. These works do not directly provide a method for binary Waring decomposition; rather, they motivate the general viewpoint that carefully designed graph representations can expose structural relations that are not immediately visible from a raw sequential representation. In the present work, this idea is specialized to the Hankel-induced coefficient graph derived from Sylvester recurrence structure.

To improve robustness beyond a one-shot rank prediction, we introduce a Meta-Solver. It evaluates a small set of candidate ranks, reconstructs each candidate algebraically, and uses a lightweight classifier to combine reconstruction residuals with GNN confidence and stability features. This mechanism is especially useful when several ranks give similar residuals under noise. The resulting method is designed primarily as a robust decision layer; the present implementation is not claimed to provide a runtime advantage over optimized classical baselines.

All experiments are based on controlled binary-form generators with known ground truth. This design makes the algebraic verification and error analysis reproducible, but it does not by itself establish performance under arbitrary real-world distribution shift. Accordingly, we report additional stress-test settings and state the remaining distribution-shift limitation explicitly. The intended scope is therefore narrower and more precise: the method is designed for symbolic–numeric and Prony/Sylvester-type workflows in which the input is a coefficient or moment sequence believed to arise from a low-rank binary form, and where every proposed decomposition must be checked by explicit residual verification. For naturally occurring or unlabeled data streams, the framework can provide a verified candidate decomposition when one passes the algebraic tests, but it cannot by itself certify that the data-generating process truly belongs to the binary Waring model class.

The main contributions of this work are summarized as follows:

We propose a learning-assisted Sylvester solver for binary Waring decomposition in which a graph neural network (GNN) predicts plausible Waring ranks directly from Hankel-induced coefficient structure. This learning module serves as a structural front-end that guides the classical Sylvester reconstruction procedure while preserving the exact algebraic verification pipeline.
We introduce a Meta-Solver mechanism that evaluates multiple candidate ranks rather than relying on a single rank prediction with a rigid residual threshold. By combining algebraic reconstruction features with learned confidence signals, the Meta-Solver improves rank selection robustness, particularly in settings where numerical residuals alone are insufficiently discriminative.
We empirically demonstrate that the proposed framework improves rank identification and verified reconstruction success under low-to-moderate coefficient noise, and we report stress tests under shifted sampling regimes. Although the current implementation does not yet provide a runtime advantage over classical baselines, it illustrates how learning-based candidate verification can enhance robustness in algebraic decomposition pipelines.

2. Mathematical Preliminaries

This section reviews the classical algebraic framework underlying binary Waring decomposition and highlights the structured linear-algebraic formulation that forms the basis of modern computational solvers. In particular, we emphasize the role of moment representations and structured matrices, which later allow the learning-assisted components to interface naturally with the classical Sylvester pipeline. Standard references include Sylvester’s original 1851 work, modern treatments of binary form rank and symmetric tensor rank, and symbolic-computational approaches based on polynomial decomposition and moment matrices [1,2,3,4,7,20,21].

2.1. Binary Waring Decomposition

Let $d \geq 1$ and consider a binary form of degree $d$ over a field $\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$, written in the normalized coefficient convention:

f(x, y) = \sum_{i=0}^{d} \binom{d}{i} a_{i} x^{d-i} y^{i}, \quad a_{i} \in \mathbb{F}.

(1)

A (binary) Waring decomposition of length $r$ expresses the polynomial as a sum of powers of linear forms:

f(x, y) = \sum_{k=1}^{r} \lambda_{k} (x + \beta_{k} y)^{d}, \quad \lambda_{k} \in \mathbb{F} \setminus \{0\}, \ \beta_{k} \in \mathbb{F},

(2)

where the linear forms are assumed to be pairwise distinct in the generic case (i.e., $\beta_{i} \neq \beta_{j}$ for $i \neq j$). The smallest such $r$ is called the Waring rank of $f$, denoted $\operatorname{rank}_{W}(f)$.

Expanding (2) and matching coefficients yields the classical moment or exponential-sum representation:

a_{i} = \sum_{k=1}^{r} \lambda_{k} \beta_{k}^{i}, \quad i = 0, 1, \dots, d.

(3)

This representation shows that binary Waring decomposition is equivalent to recovering a sparse exponential model from a finite sequence of moments. Such formulations appear naturally in Prony-type reconstruction problems arising in signal processing, sparse interpolation, and moment inversion [6].
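As a sanity check of (1) to (3), the following Python sketch expands $\sum_{k} \lambda_{k} (x + \beta_{k} y)^{d}$ by repeated polynomial multiplication in exact rational arithmetic, divides out the binomial weights of convention (1), and compares the result with the moment formula (3). The helper names (`poly_mul`, `expand_waring`, `moments`) and the test values are illustrative assumptions, not part of the proposed method.

```python
from fractions import Fraction
from math import comb


def poly_mul(p, q):
    """Multiply two binary forms stored as coefficient lists indexed by the power of y."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def expand_waring(lambdas, betas, d):
    """Raw coefficients of sum_k lambda_k (x + beta_k y)^d; entry i multiplies x^(d-i) y^i."""
    total = [Fraction(0)] * (d + 1)
    for lam, beta in zip(lambdas, betas):
        power = [Fraction(1)]
        for _ in range(d):
            power = poly_mul(power, [Fraction(1), beta])  # multiply by (x + beta y)
        total = [t + lam * c for t, c in zip(total, power)]
    return total


def moments(lambdas, betas, d):
    """Moment representation (3): a_i = sum_k lambda_k * beta_k**i."""
    return [sum(lam * beta**i for lam, beta in zip(lambdas, betas)) for i in range(d + 1)]


lambdas = [Fraction(2), Fraction(-1), Fraction(1, 3)]
betas = [Fraction(1), Fraction(-2), Fraction(1, 2)]
d = 5
raw = expand_waring(lambdas, betas, d)
a = [raw[i] / comb(d, i) for i in range(d + 1)]  # undo the binomial weights of convention (1)
print(a == moments(lambdas, betas, d))  # True
```

Exact `Fraction` arithmetic is used so that the comparison is an identity check rather than a floating-point tolerance test.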

From a computational perspective, Equation (3) induces structured linear relations among the coefficients. These relations can be organized into Hankel or catalecticant matrices, whose kernel structure encodes the unknown parameters $\beta_{k}$. Classical Sylvester-style algorithms exploit this structure by detecting the smallest rank at which the corresponding Hankel system admits a nontrivial null space, after which the decomposition parameters can be recovered through root-finding and linear reconstruction; related symbolic algorithms for symmetric rank computation have also been developed in the broader tensor setting [1,5,7].

Uniqueness regime (informal).

A generic rank-$r$ binary form is identifiable when

2r \leq d + 1,

equivalently, $r < (d + 2)/2$. In particular, for odd degree $d = 2m + 1$, the generic rank is $m + 1$ and the Waring decomposition is unique up to permutation and scaling of the summands. For even degree $d = 2m$, the generic rank is also $m + 1$, which lies outside the strict identifiable range, so the same uniqueness statement does not apply [2,7,22]. This distinction plays an important role in practical solvers, since decomposition stability is closely related to whether the underlying rank lies within the identifiable regime. In the learning-assisted pipeline developed later in this paper, we optionally predict an auxiliary “uniqueness” flag to indicate when reconstruction is expected to be stable.
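The two degree regimes above can be encoded in a few lines. The sketch below uses hypothetical helper names; it returns the generic rank $\lfloor d/2 \rfloor + 1$ and tests the identifiability condition $2r \leq d + 1$, which confirms that the generic rank is identifiable exactly when $d$ is odd.

```python
def generic_rank(d: int) -> int:
    """Generic Waring rank of a degree-d binary form: m + 1 for d = 2m + 1 and for d = 2m."""
    return d // 2 + 1


def in_identifiable_range(r: int, d: int) -> bool:
    """Identifiability condition 2r <= d + 1, equivalently r < (d + 2) / 2 for integer r."""
    return 2 * r <= d + 1


print(generic_rank(7), in_identifiable_range(generic_rank(7), 7))  # 4 True
print(generic_rank(8), in_identifiable_range(generic_rank(8), 8))  # 5 False
```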

2.2. Hankel (Catalecticant) Characterization

Sylvester’s classical approach characterizes Waring decomposition through structured Hankel (or catalecticant) matrices constructed from the coefficient sequence $(a_{i})$ [1]. These matrices encode the linear recurrence relations satisfied by the moment representation (3) and form the algebraic backbone of Sylvester-style decomposition algorithms, as well as more general moment-matrix approaches to polynomial decomposition [21].

For an integer $r \geq 1$, define the $(r + 1) \times (r + 1)$ Hankel matrix

A_{0}^{(r)} = \begin{bmatrix} a_{0} & a_{1} & \cdots & a_{r} \\ a_{1} & a_{2} & \cdots & a_{r+1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r} & a_{r+1} & \cdots & a_{2r} \end{bmatrix},

(4)

and the shifted Hankel matrix

A_{1}^{(r)} = \begin{bmatrix} a_{1} & a_{2} & \cdots & a_{r+1} \\ a_{2} & a_{3} & \cdots & a_{r+2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r+1} & a_{r+2} & \cdots & a_{2r+1} \end{bmatrix},

(5)

whenever the required coefficients exist (for instance, $d \geq 2r + 1$ for (5)). When the polynomial degree is insufficient to build these full blocks, practical implementations simply use the largest available Hankel matrices consistent with the coefficient range.
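A minimal NumPy sketch of (4) and (5), assuming real coefficients stored as a one-dimensional array; `hankel_blocks` is a hypothetical helper, and the rank-2 test sequence is an illustrative choice. The $3 \times 3$ blocks built from a rank-2 moment sequence have numerical rank 2, which is the rank deficiency described in Theorem 1. The explicit tolerance `tol=1e-10` is an assumption suited to this exactly representable test data.

```python
import numpy as np


def hankel_blocks(a, r):
    """Return A_0^(r) from (4) and, if d >= 2r + 1, the shifted block A_1^(r) from (5)."""
    a = np.asarray(a)
    d = len(a) - 1
    if d < 2 * r:
        raise ValueError("A_0^(r) needs the coefficients a_0, ..., a_{2r}")
    idx = np.add.outer(np.arange(r + 1), np.arange(r + 1))  # entry (p, q) holds a_{p+q}
    A0 = a[idx]
    A1 = a[idx + 1] if d >= 2 * r + 1 else None
    return A0, A1


# Moment sequence (3) of an illustrative rank-2 form: beta = (0.5, -1), lambda = (2, 1), d = 7.
betas, lambdas, d = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 7
a = np.array([np.sum(lambdas * betas**i) for i in range(d + 1)])
A0, A1 = hankel_blocks(a, 2)
print(np.linalg.matrix_rank(A0, tol=1e-10), np.linalg.matrix_rank(A1, tol=1e-10))  # 2 2
```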

Theorem 1 (Sylvester-type rank criterion, binary case). Assume $f$ admits a decomposition (2) of length $r$ with pairwise distinct $\beta_{k}$. Then the moment sequence (3) satisfies the following properties:

1. The coefficient sequence $(a_{i})$ obeys a linear recurrence relation of order $r$, and the associated Hankel matrices become rank-deficient once their size exceeds the minimal rank. Equivalently, the minimal Waring rank is the smallest $r$ for which the sequence admits a nontrivial linear recurrence of order $r$ whose characteristic polynomial has $r$ distinct roots.
2. There exists a degree-$r$ polynomial

T(t) = t^{r} + c_{r-1} t^{r-1} + \dots + c_{1} t + c_{0}

(6)

whose roots are precisely $\{\beta_{k}\}_{k=1}^{r}$. The coefficients $(c_{0}, \dots, c_{r-1})$ satisfy a linear system derived from the Hankel structure induced by (3).

Conversely, if coefficients $(c_{0}, \dots, c_{r-1})$ can be found such that the induced recurrence holds for the moment sequence $\{a_{i}\}$ and the polynomial $T$ has $r$ distinct roots, then the binary form $f$ admits a Waring decomposition of length $r$ of the form (2). See [1,2,7] for detailed proofs and algebraic interpretations.

The theorem shows that Waring rank identification can be reduced to detecting the minimal order linear recurrence satisfied by the moment sequence. In classical Sylvester-style solvers, this is typically achieved by incrementally increasing the candidate rank r, constructing the corresponding Hankel matrices, and checking for the emergence of a nontrivial null space. When the coefficients are perturbed by noise, the system is typically solved in a least-squares sense. In structured parameter estimation problems, such formulations often lead to separable nonlinear least-squares models, where efficient numerical methods such as variable projection can be applied to eliminate linear parameters and improve numerical stability [23,24].

2.3. Linear Recurrence and Reconstruction

Recurrence system.

A moment sequence $(a_{i})$ admits a linear recurrence of order $r$ if there exist coefficients $(c_{0}, \dots, c_{r-1})$ such that

a_{i+r} + c_{r-1} a_{i+r-1} + \dots + c_{1} a_{i+1} + c_{0} a_{i} = 0, \quad i = 0, 1, \dots, d - r.

(7)

In the context of binary Waring decomposition, the characteristic polynomial of this recurrence is exactly the polynomial $T(t)$ in (6), whose roots are the unknown parameters $\beta_{k}$ introduced in (2). Consequently, identifying the correct recurrence order (with $T$ having distinct roots) is equivalent to identifying the Waring rank.

Using the Hankel structure introduced in (4), the recurrence coefficients can be obtained (when the system is well-posed) from the linear system

\underbrace{\begin{bmatrix} a_{0} & a_{1} & \cdots & a_{r-1} \\ a_{1} & a_{2} & \cdots & a_{r} \\ \vdots & \vdots & \ddots & \vdots \\ a_{d-r} & a_{d-r+1} & \cdots & a_{d-1} \end{bmatrix}}_{\text{Hankel/Toeplitz-like block from } \{a_{i}\}} \begin{bmatrix} c_{0} \\ c_{1} \\ \vdots \\ c_{r-1} \end{bmatrix} = -\begin{bmatrix} a_{r} \\ a_{r+1} \\ \vdots \\ a_{d} \end{bmatrix}.

(8)

In exact arithmetic and generic settings, a square sub-block is sufficient (often derived from $A_{0}^{(r-1)}$). When the coefficients are perturbed by noise, the system is typically solved in a least-squares sense.
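The least-squares solve of (8) can be sketched with NumPy as follows. The helper `recurrence_coefficients` is hypothetical, and the rank-2 test sequence with $\beta = (0.5, -1)$ and $\lambda = (2, 1)$ is an illustrative assumption. For this sequence $T(t) = (t - 0.5)(t + 1) = t^{2} + 0.5t - 0.5$, so the expected solution is $(c_{0}, c_{1}) = (-0.5, 0.5)$, while order $r = 1$ leaves a clearly nonzero recurrence residual.

```python
import numpy as np


def recurrence_coefficients(a, r):
    """Least-squares solution (c_0, ..., c_{r-1}) of system (8) and its residual norm."""
    a = np.asarray(a, dtype=float)
    d = len(a) - 1
    if d - r + 1 < r:
        raise ValueError("system (8) has fewer rows than unknowns for this r")
    # Row i (i = 0, ..., d - r) is [a_i, ..., a_{i+r-1}]; the right-hand side entry is -a_{i+r}.
    H = np.array([a[i:i + r] for i in range(d - r + 1)])
    rhs = -a[r:]
    c, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return c, float(np.linalg.norm(H @ c - rhs))


betas, lambdas, d = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 7
a = np.array([np.sum(lambdas * betas**i) for i in range(d + 1)])
c, res = recurrence_coefficients(a, 2)
print(c, res)  # c is close to [-0.5, 0.5]; res is at round-off level
```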

Recovering roots and weights.

Once $(c_{0}, \dots, c_{r-1})$ are obtained, one forms the characteristic polynomial $T(t)$ in (6) and computes its roots $\beta_{1}, \dots, \beta_{r}$. These roots correspond to the linear forms appearing in the Waring decomposition.

The weights $\lambda_{k}$ are then recovered from the Vandermonde system implied by (3):

\begin{bmatrix} 1 & 1 & \cdots & 1 \\ \beta_{1} & \beta_{2} & \cdots & \beta_{r} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{1}^{r-1} & \beta_{2}^{r-1} & \cdots & \beta_{r}^{r-1} \end{bmatrix} \begin{bmatrix} \lambda_{1} \\ \lambda_{2} \\ \vdots \\ \lambda_{r} \end{bmatrix} = \begin{bmatrix} a_{0} \\ a_{1} \\ \vdots \\ a_{r-1} \end{bmatrix}.

(9)

Under noisy conditions, one typically employs an overdetermined least-squares formulation using rows $i = 0, \dots, d$, which can be solved using standard numerical techniques for linear least-squares problems [25]. It is well known that Vandermonde systems become severely ill-conditioned when roots are clustered, making this step a significant source of numerical instability in practice [8,26,27].
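Continuing the same illustrative example, the sketch below forms $T(t)$ from $(c_{0}, c_{1})$, extracts its roots with `numpy.roots`, and solves the overdetermined Vandermonde system over all rows $i = 0, \dots, d$ with `numpy.linalg.lstsq`. The helper name `roots_and_weights` is hypothetical, and no safeguards against clustered roots are included.

```python
import numpy as np


def roots_and_weights(a, c):
    """Roots of T(t) in (6) and least-squares weights from rows i = 0, ..., d of (3)."""
    a = np.asarray(a, dtype=float)
    # np.roots expects the leading coefficient first: [1, c_{r-1}, ..., c_1, c_0].
    betas = np.roots(np.concatenate(([1.0], np.asarray(c, dtype=float)[::-1])))
    V = np.vander(betas, N=len(a), increasing=True).T  # V[i, k] = beta_k ** i
    lambdas, *_ = np.linalg.lstsq(V, a.astype(betas.dtype), rcond=None)
    return betas, lambdas


betas_true, lambdas_true, d = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 7
a = np.array([np.sum(lambdas_true * betas_true**i) for i in range(d + 1)])
betas, lambdas = roots_and_weights(a, [-0.5, 0.5])  # T(t) = t^2 + 0.5 t - 0.5
order = np.argsort(betas.real)
print(betas[order], lambdas[order])  # approximately [-1, 0.5] and [1, 2]
```

Because `numpy.roots` does not return the roots in a guaranteed order, the comparison sorts them first. Replacing the test roots with a tightly clustered pair makes `np.linalg.cond(V)` grow quickly, which is the ill-conditioning noted above.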

Verification.

Given a candidate rank $\hat{r}$ and reconstructed parameters $\{\hat{\beta}_{k}, \hat{\lambda}_{k}\}$, the decomposition can be verified by evaluating the coefficient residual

\varepsilon_{\max} = \max_{0 \leq i \leq d} \left| a_{i} - \sum_{k=1}^{\hat{r}} \hat{\lambda}_{k} \hat{\beta}_{k}^{i} \right|,

(10)

or alternatively an $\ell_{2}$ residual normalized by $\|a\|_{2}$. This verification step plays a central role in the hybrid framework developed later: the learning module proposes a candidate rank $\hat{r}$ (and optionally a regime indicator), but the final acceptance of a decomposition is determined by this algebraic residual check.
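A direct transcription of the residual (10) and of the normalized $\ell_{2}$ alternative is sketched below with a hypothetical helper name; the acceptance test $\varepsilon_{\max} \leq \tau$ is then a single comparison against a user-chosen tolerance $\tau$.

```python
import numpy as np


def verification_residuals(a, betas, lambdas):
    """Max-abs residual (10) and the l2 residual normalized by ||a||_2."""
    a = np.asarray(a)
    i = np.arange(len(a))[:, None]  # column of exponents 0, ..., d
    model = (np.asarray(lambdas)[None, :] * np.asarray(betas)[None, :] ** i).sum(axis=1)
    diff = a - model
    eps_max = float(np.max(np.abs(diff)))
    eps_rel = float(np.linalg.norm(diff) / np.linalg.norm(a))
    return eps_max, eps_rel


betas, lambdas, d = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 7
a = np.array([np.sum(lambdas * betas**i) for i in range(d + 1)])
print(verification_residuals(a, betas, lambdas))  # both residuals are zero for exact data
a_noisy = a.copy()
a_noisy[3] += 1e-3  # perturb a single coefficient
print(verification_residuals(a_noisy, betas, lambdas))
```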

2.4. Classical Sylvester Rank Search

A practical implementation of Sylvester’s method typically performs an incremental search over candidate ranks. Starting from $r = 1$, the algorithm tests whether a linear recurrence (7) of order $r$ exists for the coefficient sequence. Equivalently, one checks whether the corresponding Hankel block admits a nontrivial null space. Once such a recurrence is detected, the associated characteristic polynomial is constructed and the decomposition parameters are recovered through root finding and linear reconstruction.

Computational implications.

The dominant cost arises from repeatedly solving linear systems and computing polynomial roots across candidate ranks. Although each individual step is inexpensive for moderate $r$, the cumulative overhead can become significant when the maximal search range $R_{\max}$ is large or when many instances must be processed in batch, reflecting the well-known computational scaling behavior of structured linear-algebraic routines [28,29].

Moreover, the decision at each candidate rank relies on numerical criteria such as singular-value thresholds or residual tolerances. When the input coefficients are perturbed by noise or when the underlying roots are nearly degenerate, these criteria become sensitive to scaling and ill-conditioning [8,9,10]. As a result, the incremental search procedure may either terminate prematurely at an incorrect rank or fail to detect the correct one without careful parameter tuning.

The classical incremental rank-search procedure is summarized in Algorithm 1.

Algorithm 1 Classical Sylvester Rank Search (incremental)

for $r = 1, 2, \dots, R_{\max}$ do
    Build the Hankel block needed for (8)
    Solve for the recurrence coefficients $(c_{0}, \dots, c_{r-1})$
    if no stable/nontrivial solution exists then
        continue
    end if
    Form $T(t) = t^{r} + c_{r-1} t^{r-1} + \dots + c_{0}$ and compute its roots $\{\beta_{k}\}$
    if the roots are not distinct (or root finding fails) then
        continue
    end if
    Solve the Vandermonde system (9) for the weights $\{\lambda_{k}\}$
    Compute the verification residual $\varepsilon_{\max}$ using (10)
    if $\varepsilon_{\max} \leq \tau$ (acceptance tolerance) then
        return rank $r$ and the decomposition parameters
    end if
end for
return failure (or switch to numeric heuristics/regularization)
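For concreteness, Algorithm 1 can be sketched in NumPy as below. This is an illustrative assumption rather than the authors' implementation: the "stable/nontrivial solution" test is reduced to a finiteness check, root distinctness uses an assumed separation threshold `root_sep`, and `tau` is an assumed default tolerance. The function name `sylvester_rank_search` is hypothetical.

```python
import numpy as np


def sylvester_rank_search(a, r_max, tau=1e-8, root_sep=1e-6):
    """Numerical sketch of Algorithm 1: return (r, betas, lambdas, eps_max) or None."""
    a = np.asarray(a, dtype=float)
    d = len(a) - 1
    for r in range(1, r_max + 1):
        if d - r + 1 < r:  # system (8) would be underdetermined
            break
        # Recurrence system (8), solved in the least-squares sense.
        H = np.array([a[k:k + r] for k in range(d - r + 1)])
        c, *_ = np.linalg.lstsq(H, -a[r:], rcond=None)
        if not np.all(np.isfinite(c)):  # stand-in for "no stable/nontrivial solution"
            continue
        # T(t) = t^r + c_{r-1} t^{r-1} + ... + c_0; np.roots wants the leading coefficient first.
        betas = np.roots(np.concatenate(([1.0], c[::-1])))
        if r > 1:
            gaps = np.abs(np.subtract.outer(betas, betas))[np.triu_indices(r, k=1)]
            if gaps.min() < root_sep:  # roots not (numerically) distinct
                continue
        # Overdetermined Vandermonde system from (3), rows i = 0, ..., d.
        V = np.vander(betas, N=d + 1, increasing=True).T
        lambdas, *_ = np.linalg.lstsq(V, a.astype(betas.dtype), rcond=None)
        eps_max = float(np.max(np.abs(a - V @ lambdas)))  # residual (10)
        if eps_max <= tau:
            return r, betas, lambdas, eps_max
    return None  # failure: switch to heuristics or regularization


# Illustrative rank-3 form of degree 7.
beta_true = np.array([0.5, -0.8, 1.5])
lam_true = np.array([1.0, 2.0, -0.5])
a = np.array([np.sum(lam_true * beta_true**i) for i in range(8)])
result = sylvester_rank_search(a, r_max=4)
print(result[0], np.sort(result[1].real))
```

Ranks 1 and 2 are rejected because no two-term exponential sum fits the eight coefficients, and the search stops at the first rank whose reconstruction passes the residual test.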

These observations motivate the learning-assisted strategy proposed in this work. Instead of exhaustively exploring all candidate ranks, a graph neural network (GNN) predicts a plausible rank $\hat{r}$ (and optionally a structural regime indicator) directly from the coefficient structure. The classical Sylvester reconstruction is then executed at the proposed rank, followed by the same algebraic verification step and, when needed, additional candidate checks. In this way, the learning module acts as a proposal mechanism for rank hypotheses while preserving the interpretability and correctness guarantees of the underlying algebraic solver.

3. Graph Representation of Polynomial Structure

The Sylvester framework reduces rank identification and reconstruction for binary Waring decomposition to two coupled ingredients: (i) the existence of a short linear recurrence in the coefficient (moment) sequence, and (ii) structured linear-algebraic computations on Hankel/catalecticant matrices (Section 2) [1,6,7]. A key structural feature of this formulation is that Hankel constraints reuse the same coefficients across many overlapping submatrices: each coefficient $a_{i}$ appears repeatedly in multiple entries of multiple Hankel blocks and their shifted variants. Consequently, local relations among neighboring indices are propagated through the entire sequence via overlapping windows, creating a rich pattern of repeated couplings among $\{a_{i}\}_{i=0}^{d}$.

This repeated coupling pattern can be viewed as an explicit relational structure on the coefficient sequence. Rather than treating $\{a_{i}\}$ as an unstructured vector, we encode these Hankel-induced dependencies as a graph and process the resulting structure with standard message-passing architectures [12,13,30]. Attention-based variants further allow the model to weight different couplings adaptively [31], while the geometric deep learning viewpoint motivates incorporating such algebraic structure and symmetry priors directly into the learning module [14,15].

In this section, we construct a coefficient graph whose nodes correspond to coefficients and whose edges encode Hankel-style index couplings. The graph representation is used only to support a rank-proposal classifier in later sections; the final decomposition remains entirely determined by classical Sylvester reconstruction and is accepted only after explicit algebraic verification (Section 4).

3.1. Coefficient Graph Construction

Let

f(x, y) = \sum_{i=0}^{d} \binom{d}{i} a_{i} x^{d-i} y^{i}.

We associate with $f$ an undirected graph $G = (V, E)$ whose node set $V = \{0, 1, \dots, d\}$ indexes the coefficient sequence. Node $i \in V$ corresponds to the coefficient $a_{i}$.

Node features.

To improve numerical stability and enable generalization across different polynomial degrees, we use normalized and degree-aware node features:

x_{i} = \left[\tilde{a}_{i}, \frac{i}{d}, \log(1 + |\tilde{a}_{i}|)\right] \in \mathbb{R}^{3}, \quad \tilde{a}_{i} = \frac{a_{i}}{\|a\|_{2} + \epsilon}.

(11)

Here, $a = (a_{0}, \dots, a_{d})$ and $\epsilon > 0$ is a small numerical constant (e.g., $10^{-12}$). For complex coefficients, one may instead use $(\Re(\tilde{a}_{i}), \Im(\tilde{a}_{i}), i/d, \log(1 + |\tilde{a}_{i}|))$ as the node feature vector.
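The feature map (11) can be sketched as follows; `node_features` is a hypothetical helper, and $d \geq 1$ is assumed so that $i/d$ is defined. For complex input, the sketch switches to the four-component variant.

```python
import numpy as np


def node_features(a, eps=1e-12):
    """Node features (11): rows [a~_i, i/d, log(1 + |a~_i|)], or the complex variant (d >= 1)."""
    a = np.asarray(a)
    d = len(a) - 1
    a_tilde = a / (np.linalg.norm(a) + eps)  # scale-normalized coefficients
    position = np.arange(d + 1) / d          # degree-aware index feature i/d
    log_mag = np.log1p(np.abs(a_tilde))      # log(1 + |a~_i|)
    if np.iscomplexobj(a):
        return np.stack([a_tilde.real, a_tilde.imag, position, log_mag], axis=1)
    return np.stack([a_tilde, position, log_mag], axis=1)


X = node_features([3.0, 0.0, -4.0])
print(X.shape)  # (3, 3)
```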

Edge design principle.

The Sylvester framework relies on Hankel matrices whose entries repeatedly reuse neighboring coefficients. In particular, each Hankel block involves sliding windows of the sequence $\{a_{i}\}$, and adjacent blocks share large subsets of indices (Section 2). Thus, coefficients that appear together within Hankel windows are algebraically coupled through the recurrence relations.

This observation motivates constructing a sparse graph that reflects these local reuse patterns. The edge set is therefore defined as the union

E = E_{\text{local}} \cup E_{\text{Hankel}},

combining simple index adjacency with Hankel-induced couplings.

Local adjacency edges $E_{\text{local}}$. We connect consecutive indices along the coefficient sequence:

(i, i + 1) \in E_{\text{local}}, \quad i = 0, 1, \dots, d - 1.

(12)

These edges preserve the natural ordering of coefficients and allow for message propagation along the sequence.
Hankel-coupling edges $E_{\text{Hankel}}$. Let $R_{\max}$ denote the maximum candidate rank considered during training. For each $r \in \{1, \dots, R_{\max}\}$, define the Hankel window length

L(r) = 2r + 1.

For every window start $s \in \{0, \dots, d - L(r) + 1\} = \{0, \dots, d - 2r\}$ (so that the window stays inside $V$), we consider the index set

W_{s,r} = \{s, s + 1, \dots, s + 2r\}.

Within each window, we connect pairs of indices that participate in the same Hankel anti-diagonal. Equivalently, we add an edge

(i, j) \in E_{\text{Hankel}}

(13)

whenever there exist $(r, s)$ such that $i, j \in W_{s,r}$ and $i + j = 2s + t$ for some integer $t \in \{0, 1, \dots, 4r\}$.
To keep the graph sparse in practice, we additionally restrict edges to pairs satisfying

|i - j| \leq \Delta,

where $\Delta$ is a small bandwidth parameter (e.g., $\Delta = 6$). This preserves the dominant local Hankel couplings while maintaining efficient message passing [13,14].
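The edge rules (12) and (13) can be sketched by scanning every window $W_{s,r}$ that fits inside $\{0, \dots, d\}$; `coefficient_graph_edges` is a hypothetical helper. Since $i + j$ always lies in $\{2s, \dots, 2s + 4r\}$ when $i, j \in W_{s,r}$, the anti-diagonal condition in (13) holds for every pair of indices in a window, so the sketch enumerates those pairs directly and applies the bandwidth filter $|i - j| \leq \Delta$.

```python
from itertools import combinations


def coefficient_graph_edges(d, r_max, delta):
    """Edge set E = E_local | E_Hankel of Section 3.1, built by scanning Hankel windows."""
    e_local = {(i, i + 1) for i in range(d)}  # rule (12)
    e_hankel = set()
    for r in range(1, r_max + 1):
        for s in range(0, d - 2 * r + 1):  # windows W_{s,r} = {s, ..., s + 2r} inside {0, ..., d}
            window = range(s, s + 2 * r + 1)
            # i + j = 2s + t with t in {0, ..., 4r} holds for every pair in the window.
            for i, j in combinations(window, 2):
                if j - i <= delta:  # bandwidth restriction |i - j| <= delta
                    e_hankel.add((i, j))
    return e_local | e_hankel


E = coefficient_graph_edges(10, r_max=2, delta=6)
print(len(E))  # 34
```

With $d = 10$, $R_{\max} = 2$, and $\Delta = 6$, the window width $2R_{\max} = 4$ rather than $\Delta$ limits the band, giving all $10 + 9 + 8 + 7 = 34$ index pairs with $1 \leq j - i \leq 4$.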

Edge multiplicity as an inductive bias.

A pair of indices $(i, j)$ may satisfy the condition in (13) for multiple triples $(r, s, t)$ corresponding to different Hankel windows and anti-diagonals. This repeated co-occurrence reflects the fact that certain coefficient pairs participate more frequently in the structured linear relations induced by the Sylvester framework.

To encode this structural frequency, we assign an edge multiplicity

w_{ij} = \#\{(r, s, t) : (i, j) \text{ is generated by } (r, s, t)\}.

(14)

The quantity $w_{ij}$ therefore measures how strongly the pair $(i, j)$ is coupled by overlapping Hankel constraints. In the graph representation, $w_{ij}$ is stored as an edge feature, or optionally converted to a normalized weight

\hat{w}_{ij} = \frac{w_{ij}}{\max_{p,q} w_{pq}}.

Such multiplicity features provide a natural inductive bias for message passing by emphasizing coefficient pairs that repeatedly co-occur in the algebraic structure.
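Because $t = i + j - 2s$ is determined by $(i, j, s)$, counting triples $(r, s, t)$ in (14) amounts to counting the windows $W_{s,r}$ that contain both $i$ and $j$. The sketch below (hypothetical helper `edge_multiplicities`, assuming $d \geq 2$ and window starts $s \in \{0, \dots, d - 2r\}$) counts the admissible starts $s \in [\max(0, j - 2r), \min(i, d - 2r)]$ in closed form, which is one way to stay within the $O(d R_{\max} \Delta)$ construction cost quoted in Section 3.3.

```python
def edge_multiplicities(d, r_max, delta):
    """Closed-form multiplicities w_ij of (14) and normalized weights w_ij / max w (d >= 2)."""
    w = {}
    for i in range(d + 1):
        for j in range(i + 1, min(d, i + delta) + 1):
            count = 0
            for r in range(1, r_max + 1):
                # Window starts s with i, j in W_{s,r} = {s, ..., s + 2r} and 0 <= s <= d - 2r.
                lo, hi = max(0, j - 2 * r), min(i, d - 2 * r)
                count += max(0, hi - lo + 1)
            if count:
                w[(i, j)] = count
    w_max = max(w.values())
    w_hat = {edge: value / w_max for edge, value in w.items()}
    return w, w_hat


w, w_hat = edge_multiplicities(10, r_max=2, delta=6)
print(w[(4, 5)], w_hat[(4, 5)])  # 6 1.0
```

For example, with $d = 10$ and $R_{\max} = 2$, the pair $(4, 5)$ lies in two windows of length 3 and four windows of length 5, so $w_{45} = 6$, which is also the largest multiplicity in that graph.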

Practical summary.

The resulting graph representation has three key properties:

It is sparse and approximately banded along the coefficient index line.
It reflects the shift-invariant reuse patterns of Hankel matrices.
Its connectivity is controlled by two parameters: the maximum candidate rank $R_{max}$ and the sparsity radius $Δ$ .

These properties ensure that the graph remains computationally efficient while still preserving the dominant algebraic dependencies in the coefficient sequence. Consequently, it provides a suitable input structure for scalable message-passing architectures and robust learning in the presence of noise.

This design is summarized in Table 1.

Why this graph encoding?

The coefficient graph is not intended merely to encode the natural ordering of the sequence $(a_{i})$. Its purpose is to expose the repeated use of the same coefficients across overlapping Hankel windows, which is the algebraic mechanism behind the recurrence tests in Sylvester reconstruction. This gives the GNN access to relations that are local in Hankel-window coordinates but not always captured by nearest-neighbor sequence adjacency alone.

Table 2 further compares the proposed coefficient graph with other possible structural encodings for rank prediction.

The same idea could be used beyond purely sequential data whenever the observations are governed by structured moment, Hankel, Toeplitz, or catalecticant constraints. For the present binary Waring problem, however, the graph is tied specifically to the Sylvester recurrence structure; extending it to other settings would require rebuilding the edge rules from the corresponding algebraic constraints.

Figure 1 illustrates the Hankel-induced coupling graph used in the proposed coefficient-graph representation.

3.2. Structural Insight

The coefficient graph is constructed to reflect how low Waring rank manifests as a short linear recurrence in the moment sequence (Section 2), which is also the central mechanism underlying classical Prony-type recovery methods [6]. When the Waring rank is small, many overlapping Hankel windows admit recurrence relations with consistent coefficients. Consequently, evidence of the recurrence structure is redundantly distributed across the coefficient sequence.

Message-passing graph neural networks provide a natural mechanism for aggregating such distributed local cues. Each GNN layer updates node representations by combining information from neighboring nodes, and repeated layers allow local recurrence signals to propagate across the entire sequence [13,30]. In this way, the network can infer global rank-related structure from locally repeated Hankel constraints.

Convolutional graph layers such as GCN perform neighborhood averaging that is naturally aligned with the shift-invariant reuse patterns induced by sliding Hankel windows [12]. More general message-passing architectures can incorporate edge features and learn nonlinear aggregation functions that improve robustness under perturbations [13]. Attention-based GNNs further enable the model to assign larger weights to edges corresponding to frequently repeated or more reliable structural constraints, which is particularly beneficial in noisy or near-degenerate regimes [31]. From the geometric deep learning viewpoint, the graph construction injects a domain-specific algebraic structure into the learning process, allowing the network to exploit Hankel symmetries directly rather than rediscovering them implicitly [14].

Importantly, the representation remains solver-aligned. The graph is derived from the same Hankel structures used in classical Sylvester reconstruction, preserving interpretability and consistency with the underlying algebraic theory. Within the proposed pipeline, the GNN operates only as a rank predictor (and optionally a regime classifier), while the final decomposition is always computed and validated through explicit algebraic reconstruction.
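To make the message-passing step concrete, the sketch below applies one layer of the standard GCN propagation rule $\mathrm{ReLU}(D^{-1/2}(A + I)D^{-1/2} H W)$ to a small coefficient graph, using edge weights such as the normalized multiplicities $\hat{w}_{ij}$ as entries of $A$. This is a generic dense-matrix illustration with assumed random features and weights (the helper `gcn_layer` is hypothetical), and it is not meant to reproduce the specific layer type, depth, or edge-feature handling used in the experiments.

```python
import numpy as np


def gcn_layer(H, edges, weights, W):
    """One GCN-style step ReLU(D^{-1/2} (A + I) D^{-1/2} H W) on a weighted undirected graph."""
    n = H.shape[0]
    A = np.eye(n)  # self-loops (the "+ I" term)
    for (i, j), w in zip(edges, weights):
        A[i, j] += w  # undirected: fill both triangles
        A[j, i] += w
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # degrees are >= 1 because of the self-loops
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


rng = np.random.default_rng(0)
d, h = 6, 8
edges = [(i, i + 1) for i in range(d)] + [(i, i + 2) for i in range(d - 1)]
weights = [1.0] * len(edges)  # e.g. normalized multiplicities (nonnegative)
X = rng.normal(size=(d + 1, 3))  # node features as in (11)
W = rng.normal(size=(3, h))  # learnable weights (random here)
H1 = gcn_layer(X, edges, weights, W)
print(H1.shape)  # (7, 8)
```

A dense adjacency matrix is used only for readability; a sparse implementation would realize the $O(|E| h + |V| h^{2})$ per-layer cost discussed in Section 3.3.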

3.3. Computational Complexity and Implementation Notes

The coefficient graph can be constructed by scanning Hankel windows and adding sparse couplings satisfying $|i - j| \leq \Delta$. With this restriction, graph construction requires $O(d R_{\max} \Delta)$ time. The resulting graph contains $|V| = d + 1$ nodes and $|E| = O(d \Delta)$ edges (up to constants depending on $R_{\max}$). Hence, the graph is sparse and approximately banded along the coefficient index axis. For node-feature dimension $F$ and hidden dimension $h$, the graph storage cost is

O(dF + d\Delta),

and the activation memory of an $L$-layer message-passing network is approximately

O(Ldh + Ld\Delta)

up to batching constants. The per-layer arithmetic cost of a GCN-type implementation is

O(|E| h + |V| h^{2}) = O(d \Delta h + d h^{2}).

These bounds show that the graph module scales linearly in $d$ for fixed $R_{\max}$, $\Delta$, $F$, $h$, and $L$. They also make clear that the learning module adds overhead relative to a single algebraic reconstruction attempt. Accordingly, the current implementation is evaluated primarily as a robustness-oriented decision layer rather than as an optimized runtime accelerator.
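The linear growth of $|E|$ can be checked exactly for the construction of Section 3.1. Assuming $d \geq 2$, $\Delta \geq 1$, and window starts $s \in \{0, \dots, d - 2r\}$, every pair with index gap $g \leq B = \min(\Delta, 2\min(R_{\max}, \lfloor d/2 \rfloor))$ lies in some window and no other pair is connected, so

|E| = \sum_{g=1}^{B} (d + 1 - g) = B(d + 1) - \frac{B(B + 1)}{2}.

The short sketch below (hypothetical helper `edge_count`) evaluates this formula.

```python
def edge_count(d: int, r_max: int, delta: int) -> int:
    """Exact |E| for the Section 3.1 graph, assuming d >= 2, delta >= 1 and starts s = 0..d-2r."""
    band = min(delta, 2 * min(r_max, d // 2))  # largest index gap |i - j| that receives an edge
    return band * (d + 1) - band * (band + 1) // 2


for d in (50, 100, 200):
    print(d, edge_count(d, r_max=4, delta=6))  # prints 50 285, then 100 585, then 200 1185
```

With $R_{\max} = 4$ and $\Delta = 6$, each unit increase in $d$ adds exactly $B = 6$ edges, matching the $O(d \Delta)$ bound.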

Table 3 reports an implementation-level profile of the graph-construction stage on the CPU environment used for the revision experiments. The measured node and edge counts confirm the expected sparse linear growth in d.

4. Learning-Assisted Rank Prediction

We now introduce the learning module that predicts plausible Waring ranks from the coefficient graph constructed in Section 3. The goal of the network is not to compute the decomposition itself, but rather to provide rank proposals and confidence signals for the classical solver.

Given a predicted rank $\hat{r}$, the algorithm executes a single Sylvester reconstruction attempt (Section 2). If the reconstruction residual satisfies the acceptance criterion, the decomposition is returned; otherwise, the solver may fall back to alternative ranks or regularized recovery strategies.

This design follows the broader paradigm of combining learned proposal mechanisms with exact downstream verification. In this framework, machine learning components guide the algorithm toward promising structural hypotheses, while the final output remains determined by rigorous algebraic checks [14,15].

4.1. Problem Formulation

Let $f(x, y)$ be a homogeneous polynomial of degree $d$ with coefficient vector

$a = (a_{0}, a_{1}, \dots, a_{d})^{\top} \in \mathbb{R}^{d + 1}.$

From $a$, we construct the coefficient graph $G = (V, E, X, \mathcal{E})$ as described in Section 3, where $|V| = d + 1$, $X = \{x_{i}\}_{i \in V}$ denotes the node feature set, and $\mathcal{E} = \{e_{ij}\}_{(i, j) \in E}$ optionally contains edge features.

The learning task is to predict the Waring rank of $f$ from the graph representation $G$. Formally, we learn a mapping

$\Phi_{\theta} : G \longrightarrow \{1, \dots, R_{\max}\},$

parameterized by $\theta$, where $R_{\max}$ denotes a predefined upper bound on the rank. The mapping $\Phi_{\theta}$ is implemented by a graph neural network that outputs a probability distribution over candidate ranks:

$p_{\theta}(r \mid G), \quad r \in \{1, \dots, R_{\max}\}.$ (15)

The predicted rank is then obtained via

$\hat{r} = \arg\max_{r \in \{1, \dots, R_{\max}\}} p_{\theta}(r \mid G).$

The classifier is trained using the standard cross-entropy loss, which is widely used in probabilistic classification models and statistical learning frameworks [32]:

$\mathcal{L}_{CE}(\theta) = -\sum_{r = 1}^{R_{\max}} \mathbf{1}[r = y] \log p_{\theta}(r \mid G),$ (16)

where $y$ denotes the ground-truth rank of the training instance and $\mathbf{1}[\cdot]$ denotes the indicator function.

Optional auxiliary head (stability regime).

In practical reconstruction scenarios, it is useful to distinguish between well-conditioned instances and near-degenerate cases in which the decomposition may be numerically unstable. To capture this information, we optionally introduce a binary auxiliary label $u \in \{0, 1\}$, where $u = 1$ indicates that the instance belongs to a stable regime and $u = 0$ corresponds to a potentially ill-conditioned case.

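An auxiliary sigmoid output head produces a prediction $\hat{u} \in (0, 1)$, which is trained using binary cross-entropy. The total training objective becomes

$\mathcal{L}(\theta) = \mathcal{L}_{CE}(\theta) + \lambda \mathcal{L}_{aux}(\theta),$ (17)

where $\mathcal{L}_{aux}$ is the binary cross-entropy between $\hat{u}$ and $u$, and $\lambda \geq 0$ controls the relative weight of the auxiliary task.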
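Equations (16) and (17) can be sketched for a single training instance as below. This is a NumPy illustration under stated assumptions, not the training code: it works from raw logits, uses a 1-based rank label `y`, and the function names and the `aux_weight` argument (standing for the weight $\lambda$ in Equation (17)) are hypothetical.

```python
import numpy as np


def rank_training_loss(logits, y, u_hat=None, u=None, aux_weight=0.0, eps=1e-12):
    """Cross-entropy of Eq. (16) plus the optional auxiliary term of Eq. (17).

    logits     : unnormalized scores for ranks 1..R_max (softmax gives p_theta).
    y          : ground-truth rank, 1-based.
    u_hat, u   : predicted stability score in (0, 1) and binary stability label.
    aux_weight : weight lambda >= 0 of the auxiliary binary cross-entropy.
    """
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                        # shift for a numerically stable log-softmax
    log_p = z - np.log(np.exp(z).sum())    # log p_theta(r | G), r = 1..R_max
    loss = -log_p[y - 1]                   # the indicator keeps only the true class
    if u_hat is not None and aux_weight > 0:
        bce = -(u * np.log(u_hat + eps) + (1 - u) * np.log(1 - u_hat + eps))
        loss += aux_weight * bce
    return float(loss)


def predict_rank(logits) -> int:
    """hat r = argmax_r p_theta(r | G); softmax is monotone, so the logits suffice."""
    return int(np.argmax(logits)) + 1
```

Because softmax preserves the ordering of the logits, the prediction step never needs the normalized probabilities; they matter only for the loss and for the confidence signals passed downstream.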

Table 4 summarizes the learning tasks and outputs used by the proposed rank predictor.

4.2. Network Architecture

The mapping $\Phi_{\theta}$ introduced in Section 4.1 is implemented using a message-passing graph neural network. The architecture consists of $L = 4$ graph convolution layers with hidden dimension $h = 128$, followed by a graph-level readout and a lightweight multilayer perceptron (MLP) classifier.

The design follows the general message-passing framework for graph neural networks [13], with convolutional updates similar to graph convolutional networks (GCNs) [12]. A variety of related architectures have been proposed to improve representation power and scalability on graph-structured data. Examples include neighborhood aggregation models such as GraphSAGE [33], expressive graph isomorphism networks that match the discriminative power of the Weisfeiler–Lehman test [34,35], and systematic benchmark studies comparing modern GNN architectures across diverse tasks [36]. Comprehensive surveys further summarize the rapid development of graph neural networks and their applications in structured data analysis [37]. In the present work, we adopt a relatively lightweight message-passing architecture because the coefficient graphs derived from Hankel structure exhibit strong algebraic inductive biases. This architecture allows local structural signals encoded in the coefficient graph to propagate through multiple neighborhoods before producing a global rank prediction.

Message-passing layers.

Let $H^{(0)} = X$ denote the initial node feature matrix. For layers $\ell = 1, \dots, L$, node embeddings are updated according to

$h_{i}^{(\ell)} = \sigma\left(W^{(\ell)} \cdot \mathrm{AGG}\left(\{h_{j}^{(\ell - 1)} : j \in \mathcal{N}(i) \cup \{i\}\}, \{e_{ij}\}\right)\right),$ (18)

where $\mathcal{N}(i)$ denotes the neighbor set of node $i$, $\mathrm{AGG}(\cdot)$ is a permutation-invariant aggregation operator, $W^{(\ell)}$ are learnable weight matrices, and $\sigma(\cdot)$ denotes a nonlinear activation function (ReLU in our implementation).

The aggregation operator performs neighborhood averaging or normalized summation in the spirit of GCN-style convolution [12], although the formulation naturally extends to more general message-passing schemes that incorporate edge features [13]. To improve generalization, dropout is applied after each message-passing layer.

Graph-level representation.

After $L$ layers, node embeddings encode structural information gathered from multi-hop neighborhoods. A graph-level representation is then obtained using global mean pooling:

$g = \mathrm{READOUT}(G) = \frac{1}{|V|} \sum_{i \in V} h_{i}^{(L)}.$ (19)

The resulting embedding $g$ summarizes the structural patterns of the coefficient graph and serves as input to the classifier.

Classification head.

The final rank prediction is produced by a two-layer multilayer perceptron:

$z = \mathrm{MLP}(g), \qquad p_{\theta}(r \mid G) = \mathrm{softmax}(z)_{r}.$ (20)

The predicted rank $\hat{r}$ is then obtained by selecting the class with the largest probability.
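The inference-time forward pass of Equations (18), (19), and (20) can be sketched in NumPy as follows, assuming GCN-style symmetric normalization with self-loops as the aggregation operator, no edge features, and no dropout. Names, shapes, and the random placeholder weights are hypothetical; the trained RankGNN is not reproduced here.

```python
import numpy as np


def gcn_normalized_adjacency(edge_index: np.ndarray, n: int) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2}: GCN-style normalized averaging with self-loops."""
    A = np.eye(n)                               # self-loops add node i to its own neighborhood
    A[edge_index[0], edge_index[1]] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def rank_gnn_forward(X, A_hat, layer_weights, W1, b1, W2, b2):
    """Inference pass: L message-passing layers (18), mean readout (19), 2-layer MLP (20)."""
    H = X                                       # H^(0) = X, shape (d + 1, F)
    for W in layer_weights:
        H = np.maximum(A_hat @ H @ W, 0.0)      # aggregate neighbors, transform, ReLU
    g = H.mean(axis=0)                          # global mean pooling over the nodes
    z = np.maximum(g @ W1 + b1, 0.0) @ W2 + b2  # two-layer MLP producing rank logits
    z = z - z.max()                             # numerically stable softmax
    return np.exp(z) / np.exp(z).sum()          # p_theta(r | G), r = 1..R_max


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, F, h, L, R_max, delta = 50, 4, 128, 4, 10, 3
    pairs = [(i, j) for i in range(d + 1) for j in range(d + 1) if 0 < abs(i - j) <= delta]
    A_hat = gcn_normalized_adjacency(np.array(pairs).T, d + 1)
    X = rng.standard_normal((d + 1, F))         # placeholder node features
    dims = [F] + [h] * L
    layers = [rng.standard_normal((dims[k], dims[k + 1])) / np.sqrt(dims[k]) for k in range(L)]
    W1, b1 = rng.standard_normal((h, h)) / np.sqrt(h), np.zeros(h)
    W2, b2 = rng.standard_normal((h, R_max)) / np.sqrt(h), np.zeros(R_max)
    p = rank_gnn_forward(X, A_hat, layers, W1, b1, W2, b2)
    print("predicted rank:", int(np.argmax(p)) + 1)
```

Mean pooling makes the output independent of the node count, which is one simple way to let a single classifier head serve several degrees $d$.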

Optimization.

Model parameters are optimized using the Adam algorithm [38]. Training uses mini-batches of graphs and early stopping based on validation loss. To improve numerical stability, coefficient vectors are normalized before graph construction, and node features may optionally be standardized across the training set. Dropout regularization is applied within the message-passing stack to mitigate overfitting [39].

The default hyperparameters used for the rank prediction network are summarized in Table 5. Figure 2 illustrates the overall learning-assisted rank prediction pipeline.

4.3. Hybrid Solver

The predicted rank $\hat{r}$ is used to prioritize candidate ranks before Sylvester-style reconstruction. Instead of incrementally testing $r = 1, 2, \dots$, the solver performs a single reconstruction attempt at the predicted rank $\hat{r}$. The candidate decomposition is then validated through an explicit residual check.

This design preserves mathematical auditability. If the predicted rank is correct, the solver terminates after one reconstruction step. If the verification test fails, a conservative fallback strategy is applied by exploring a small neighborhood around $\hat{r}$. In this way, the learning module serves only as a proposal mechanism, while correctness is always certified by algebraic verification.

The computational cost of the hybrid solver can be expressed in terms of the cost $T(r)$ of one Sylvester reconstruction attempt at rank $r$. In the classical incremental strategy, the solver tests ranks sequentially until the correct rank $r^{\star}$ is reached, leading to a total cost of

$\sum_{r = 1}^{r^{\star}} T(r).$

In contrast, the one-shot learning-assisted approach performs reconstruction at the predicted rank $\hat{r}$, followed by a small number of fallback attempts if necessary. If the set $C$ of tested candidate ranks has size $|C|$, the algebraic reconstruction cost becomes

$|C|\, T(\hat{r}) \quad \text{or, more generally,} \quad \sum_{r_{c} \in C} T(r_{c}).$

For the neighbor and top 3 strategies, $|C| \leq 3$; for the All strategy, $|C| = R_{\max}$.

The full wall-clock cost also includes graph construction, GNN inference, and meta-classifier evaluation. Thus, the proposed method can reduce the number of algebraic rank trials, but it need not be faster in a prototype implementation. This distinction is important: the main empirical advantage reported here is improved robustness of the rank decision under perturbation, not runtime acceleration.

Table 6 summarizes the difference between exhaustive Sylvester rank search and the proposed learning-assisted rank guidance. Figure 3 shows the overall hybrid solver pipeline from GNN-based rank proposal to Sylvester reconstruction and residual verification.

4.4. Meta-Solver: Learned Candidate Verification Beyond Hard Residual Thresholding

Algorithm 2 verifies a single GNN-predicted rank using a fixed residual threshold. While effective in many cases, this binary decision rule can become brittle in the presence of noise or near-degenerate polynomial configurations. In particular, the condition “accept if $\varepsilon \leq \tau$” does not exploit additional information already available from the learning module, such as posterior rank confidence or the predicted stability score.

Algorithm 2 Learning-Assisted Sylvester Solver (rank-guided and verifiable)

Build coefficient graph $G$ from $a$ (Section 3)
Predict rank $\hat{r} = \arg\max_{r} p_{\theta}(r \mid G)$
(Optional) Predict stability score $\hat{u}$
Construct Hankel blocks for rank $\hat{r}$
Solve the recurrence system (Section 2)
Recover roots $\{\hat{\beta}_{k}\}_{k = 1}^{\hat{r}}$ and weights $\{\hat{\lambda}_{k}\}_{k = 1}^{\hat{r}}$
Compute verification residual $\varepsilon_{max}$ using Equation (10)
if $\varepsilon_{max} \leq \tau(\hat{u})$ then
    return verified rank $\hat{r}$ and decomposition
else
    fallback: test ranks $r \in \{\hat{r} - 1, \hat{r} + 1\}$ (or a small window)
    apply the same reconstruction and verification procedure
    return the first verified solution; otherwise declare failure
end if
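One reconstruction attempt and the fallback loop of Algorithm 2 can be sketched as follows. This is a Prony-style least-squares illustration under the moment convention $a_i = \sum_k \lambda_k \beta_k^{i}$, not the authors' implementation: it uses a fixed threshold `tau` in place of $\tau(\hat{u})$, plain `numpy.linalg.lstsq` without regularization, and hypothetical function names.

```python
import numpy as np


def sylvester_reconstruct(a, r: int, eps: float = 1e-12):
    """One reconstruction attempt at rank r for a_i = sum_k lam_k * beta_k**i, i = 0..d.

    Sketch: Hankel recurrence -> characteristic roots -> Vandermonde weights -> residuals.
    """
    a = np.asarray(a, dtype=float)
    d = a.size - 1
    if not 1 <= r <= d // 2:
        raise ValueError("this sketch requires 1 <= r <= d // 2")
    # Hankel system H c = rhs with H[i, j] = a[i + j] and rhs[i] = a[i + r].
    H = np.array([a[i:i + r] for i in range(d - r + 1)])
    c, *_ = np.linalg.lstsq(H, a[r:], rcond=None)
    # a_{i+r} = sum_j c_j a_{i+j}  <=>  roots of t^r - c_{r-1} t^{r-1} - ... - c_0.
    beta = np.roots(np.concatenate(([1.0], -c[::-1])))
    # Vandermonde system V lam = a with V[i, k] = beta_k**i (roots may be complex).
    V = np.vander(beta, N=d + 1, increasing=True).T
    lam, *_ = np.linalg.lstsq(V, a.astype(complex), rcond=None)
    a_hat = V @ lam
    eps_max = float(np.max(np.abs(a - a_hat)))
    eps_2 = float(np.linalg.norm(a - a_hat) / (np.linalg.norm(a) + eps))
    return beta, lam, eps_max, eps_2


def rank_guided_solve(a, r_hat: int, tau: float = 1e-8, window: int = 1):
    """Algorithm 2 sketch: try r_hat, then r_hat - 1, r_hat + 1, ... up to `window`."""
    d = len(a) - 1
    order = [r_hat] + [r for k in range(1, window + 1) for r in (r_hat - k, r_hat + k)]
    for r in order:
        if not 1 <= r <= d // 2:
            continue                               # skip ranks outside the valid range
        beta, lam, eps_max, _ = sylvester_reconstruct(a, r)
        if eps_max <= tau:                         # fixed threshold in place of tau(u_hat)
            return r, beta, lam
    return None                                    # declare failure
```

With exact data, a rank above the true one usually also passes the residual test, because the extra root typically receives a near-zero weight; the order in which fallback ranks are tried therefore affects which rank is reported.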

To address this limitation, we introduce a Meta-Solver that replaces rigid thresholding with a lightweight learned verification layer. Importantly, the final decomposition remains algebraic and fully auditable: the meta-model only evaluates candidate ranks using solver-derived features, while the decomposition itself is still produced by the Sylvester reconstruction procedure.

The Meta-Solver preserves the same separation of roles as the hybrid pipeline. The GNN produces a structural proposal in the form of rank probabilities and a stability score, whereas the Sylvester solver remains the exact reconstruction backend that computes recurrence coefficients, roots, weights, and residuals. The key difference lies in the decision rule. Instead of verifying a single candidate rank using a fixed threshold, the Meta-Solver evaluates a candidate set and assigns each candidate a probability of correctness:

$P(\text{correct} \mid \phi_{r}),$

where $\phi_{r}$ is a feature vector combining GNN confidence with algebraic reconstruction quality for candidate rank $r$. Verification thus becomes a calibrated candidate-ranking problem.

Let the GNN posterior be $p_{\theta}(r \mid G)$ and define

$\hat{r}_{gnn} = \arg\max_{r} p_{\theta}(r \mid G).$

Rather than evaluating only $\hat{r}_{gnn}$, we construct a candidate set $C$ using one of three strategies. The neighbor strategy uses the local set

$C = \{\hat{r}_{gnn} - 1, \hat{r}_{gnn}, \hat{r}_{gnn} + 1\} \cap \{1, \dots, R_{\max}\}.$

This local correction policy reflects the empirical observation that rank prediction errors typically occur between adjacent classes. The top 3 strategy uses

$C = \mathrm{Top3}\left(p_{\theta}(r \mid G)\right),$

that is, the three ranks with highest posterior probability under the GNN. The All strategy evaluates

$C = \{1, 2, \dots, R_{\max}\}.$

Although this evaluates all ranks, it differs fundamentally from classical incremental search because candidates are ranked by a learned meta-classifier rather than accepted sequentially by a fixed threshold.

These strategies expose a controllable accuracy–runtime trade-off: neighbor is the most efficient, top 3 offers greater flexibility, and All provides maximal candidate coverage.
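The three candidate-set strategies can be sketched as a small helper, assuming the posterior is given as an array indexed by rank minus one; the function name and the strategy strings are hypothetical.

```python
import numpy as np


def candidate_set(posterior, strategy: str) -> list[int]:
    """Candidate ranks C from a GNN posterior, where posterior[r - 1] = p_theta(r | G)."""
    p = np.asarray(posterior, dtype=float)
    r_max = p.size
    r_gnn = int(np.argmax(p)) + 1
    if strategy == "neighbor":
        # {r_gnn - 1, r_gnn, r_gnn + 1} intersected with {1, ..., R_max}
        return [r for r in (r_gnn - 1, r_gnn, r_gnn + 1) if 1 <= r <= r_max]
    if strategy == "top3":
        # three ranks with the highest posterior probability, most probable first
        return [int(k) + 1 for k in np.argsort(p)[::-1][:3]]
    if strategy == "all":
        return list(range(1, r_max + 1))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The neighbor set shrinks to two ranks at the boundaries $r = 1$ and $r = R_{\max}$, so its cost is at most three reconstruction attempts, the same as top 3.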

For each candidate rank $r_{c} \in C$, the solver performs a single Sylvester reconstruction attempt. This attempt constructs the Hankel recurrence system of order $r_{c}$, solves for the recurrence coefficients, recovers the roots $\{\beta_{k}\}$, solves the Vandermonde system for the weights $\{\lambda_{k}\}$, reconstructs the coefficients, and computes residuals.

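Two residual statistics are computed:

$\varepsilon_{max} = \max_{i} \left|a_{i} - \sum_{k = 1}^{r_{c}} \lambda_{k} \beta_{k}^{i}\right|, \qquad \varepsilon_{2} = \frac{\|a - \hat{a}\|_{2}}{\|a\|_{2} + \epsilon},$

where $\hat{a}$ is the reconstructed coefficient vector and $\epsilon > 0$ is a small constant that prevents division by zero.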

For each candidate rank, we form a feature vector

$\phi_{r_{c}} = \left[\log_{10}(\varepsilon_{max}), \log_{10}(\varepsilon_{2}), s_{gnn}, p_{\theta}(r_{c} \mid G), |r_{c} - \hat{r}_{gnn}|\right],$

where $s_{gnn}$ denotes the GNN stability score. This representation explicitly combines symbolic evidence (residual quality) with learned structural confidence (GNN outputs).

A lightweight binary classifier $h_{\psi}$ evaluates candidate ranks and produces a calibrated probability

$q_{r_{c}} = h_{\psi}(\phi_{r_{c}}) = P(\text{correct} \mid \phi_{r_{c}}).$

In our implementation, $h_{\psi}$ is a logistic regression model that estimates the probability that a candidate rank corresponds to the correct decomposition. Logistic regression provides a simple probabilistic classification model and forms a standard component of many statistical learning pipelines [40]. The resulting probability score can therefore be interpreted as a confidence measure for each candidate reconstruction.

Using probabilistic scores rather than a fixed residual threshold enables a more flexible decision rule. Instead of accepting a candidate solely based on the magnitude of its residual, the meta-classifier integrates multiple signals, including reconstruction residuals, structural confidence from the GNN, and rank proximity, to estimate the likelihood that the candidate rank is correct. Such confidence-aware decision rules are closely related to calibrated classification and selective prediction frameworks, where models are allowed to reject uncertain predictions in order to improve reliability [41,42]. The final rank is then selected by choosing the candidate with the highest predicted probability.

$\hat{r}_{meta} = \arg\max_{r_{c} \in C} q_{r_{c}}.$

A confidence threshold $\tau_{meta}$ (default 0.5) determines whether $\hat{r}_{meta}$ is accepted. If $\max_{r_{c} \in C} q_{r_{c}} \leq \tau_{meta}$, the solver triggers the chosen fallback policy, such as rejection, returning the best candidate, or reverting to $\hat{r}_{gnn}$.
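The scoring and acceptance rule can be sketched as follows, assuming $h_{\psi}$ is a logistic model $q = \sigma(w^{\top}\phi + b)$ with given parameters. The function names are hypothetical, the weights used in the tests are made-up illustrative values rather than fitted ones, and the residual floor that keeps $\log_{10}$ finite when a residual is exactly zero is an added assumption.

```python
import numpy as np


def meta_features(eps_max, eps_2, s_gnn, p_rc, r_c, r_gnn, floor=1e-300):
    """phi_{r_c}; residuals are floored (an assumption) so log10 stays finite at 0."""
    return np.array([
        np.log10(max(eps_max, floor)),
        np.log10(max(eps_2, floor)),
        s_gnn,                                  # GNN stability score
        p_rc,                                   # posterior p_theta(r_c | G)
        abs(r_c - r_gnn),                       # rank distance to the GNN proposal
    ])


def meta_select(phi_by_rank, w, b, tau_meta=0.5):
    """Score each candidate with q = sigmoid(w . phi + b); accept the best if q > tau_meta."""
    q = {r: float(1.0 / (1.0 + np.exp(-(np.dot(w, phi) + b)))) for r, phi in phi_by_rank.items()}
    r_meta = max(q, key=q.get)                  # argmax over the candidate set C
    return r_meta, q[r_meta], q[r_meta] > tau_meta
```

For instance, with $w = (-1, -1, 0.5, 2, -0.5)$ and $b = 0$, a candidate whose residuals are near $10^{-12}$ outscores one whose residuals are near $10^{-2}$; when no score exceeds $\tau_{meta}$, the third return value is `False` and the caller applies its fallback policy.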

Crucially, every accepted solution still corresponds to an explicit algebraic reconstruction and can therefore be verified using the same residual criteria employed throughout this work. The use of logistic regression provides a simple and interpretable probabilistic calibration layer over solver-derived features [43,44]. The meta-classifier is trained on a held-out validation split using features extracted from solver executions. For each validation instance, the frozen GNN first produces the posterior distribution $p_{\theta}(r \mid G)$, the predicted rank $\hat{r}_{gnn}$, and the stability score $s_{gnn}$. We then construct $C$ using one of the candidate strategies above, perform one Sylvester reconstruction attempt for each $r_{c} \in C$, extract $\phi_{r_{c}}$, and assign the binary label

$y_{r_{c}} = \begin{cases} 1, & r_{c} = r^{\star}, \\ 0, & \text{otherwise}, \end{cases}$

where $r^{\star}$ denotes the ground-truth rank. A logistic regression classifier is then fitted on the resulting dataset $\{(\phi_{r_{c}}, y_{r_{c}})\}$.
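The label construction and the logistic fit can be sketched without external learning libraries, as below. Plain gradient descent on standardized features is a stand-in assumption for whatever solver the implementation used; the function names, learning rate, and $\ell_2$ penalty are hypothetical.

```python
import numpy as np


def build_meta_dataset(records):
    """records: iterable of (phi_by_rank, r_true); label y = 1 iff r_c equals r_true."""
    X, y = [], []
    for phi_by_rank, r_true in records:
        for r_c, phi in phi_by_rank.items():
            X.append(phi)
            y.append(1.0 if r_c == r_true else 0.0)
    return np.asarray(X, dtype=float), np.asarray(y)


def fit_logistic(X, y, lr=0.5, epochs=500, l2=1e-4):
    """Gradient-descent logistic regression on standardized features."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)              # leave constant columns unscaled
    Z = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # current P(correct | phi)
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * float(np.mean(p - y))
    # Fold the standardization back so (w_raw, b_raw) act directly on raw phi.
    w_raw = w / sd
    return w_raw, b - float(np.dot(w_raw, mu))
```

Because the GNN is frozen, this fit touches only the five meta-feature weights and the bias, which is what makes retraining for a new degree or noise regime cheap.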

This training procedure is lightweight and modular. The GNN parameters remain fixed, and the meta-classifier can be retrained independently for different polynomial degrees or noise regimes without modifying the underlying network.

The hybrid solver described in Section 4.3 can be interpreted as a special case of the Meta-Solver framework. Specifically, it corresponds to a singleton candidate set centered at $\hat{r}_{gnn}$ (with optional fixed neighbors) and a deterministic threshold-based decision rule applied to the residual.

The Meta-Solver generalizes this strategy by learning a candidate scoring function over multiple ranks. This allows the system to integrate solver-derived evidence and learned structural confidence, improving robustness when residual values alone are ambiguous while preserving the verifiable algebraic backend.

The complete candidate-scored Meta-Solver procedure is summarized in Algorithm 3.

Algorithm 3 GNN + Meta-Solver (candidate-scored and verifiable)

Build coefficient graph $G$ from input coefficients $a$
Run GNN to obtain posterior $p_{\theta}(r \mid G)$, predicted rank $\hat{r}_{gnn}$, and stability score $s_{gnn}$
Construct candidate set $C$ using the chosen strategy (Neighbor/Top-3/All)
for each candidate rank $r_{c} \in C$ do
    Run one-shot Sylvester reconstruction at rank $r_{c}$
    Compute residual statistics $\varepsilon_{max}$ and $\varepsilon_{2}$
    Form meta-feature vector $\phi_{r_{c}} = [\log_{10} \varepsilon_{max}, \log_{10} \varepsilon_{2}, s_{gnn}, p_{\theta}(r_{c} \mid G), |r_{c} - \hat{r}_{gnn}|]$
    Compute meta-score $q_{r_{c}} = h_{\psi}(\phi_{r_{c}})$
end for
Select best candidate $\hat{r}_{meta} = \arg\max_{r_{c} \in C} q_{r_{c}}$
if $\max_{r_{c} \in C} q_{r_{c}} > \tau_{meta}$ then
    return $\hat{r}_{meta}$ and verified decomposition
else
    return fallback result (reject / best candidate / $\hat{r}_{gnn}$)
end if

5. Experimental Setup

This section describes the dataset generation procedure, baseline implementations, and evaluation metrics. All experiments were implemented in Python 3.10, and the full implementation will be released on GitHub upon acceptance to ensure reproducibility.

5.1. Dataset Generation

We construct large-scale synthetic datasets of binary forms with known ground-truth Waring rank. Each polynomial is generated from its decomposition parameters and then expanded into its coefficient sequence. A sample corresponds to a binary form

$f(x, y) = \sum_{k = 1}^{r} \lambda_{k} (x + \beta_{k} y)^{d},$

where $r$ denotes the ground-truth rank. The resulting coefficient sequence $\{a_{i}\}_{i = 0}^{d}$ serves as the input representation for both the learning-based models and the classical algebraic solvers.

Unless otherwise stated, all experiments use degrees $d \in \{50, 100, 200\}$, ranks $r \in \{1, \dots, 10\}$, 100,000 samples per degree, an 80%/10%/10% train/validation/test split, real coefficients, and a fixed random seed for reproducibility.

The evaluation separates controlled ground-truth accuracy from sensitivity to assumption mismatch. Rank accuracy is measured on generated instances for which the true rank and decomposition parameters are known by construction. Distribution-shift stress tests then change the sampling prior while preserving ground-truth labels, using shifted root distributions, clustered-root instances, lognormal weight magnitudes, higher noise levels, and degree/rank regimes outside the default training grid. These tests measure robustness within and near the modeled problem class; they should not be interpreted as confirmation of performance on arbitrary naturally occurring data.

For each polynomial sample with degree $d$, a ground-truth rank is sampled as follows:

$r \sim \mathrm{Uniform}\{1, \dots, R_{\max}\}.$

The decomposition parameters $\beta_{k}$ are sampled independently from either a bounded uniform distribution or a normal distribution,

$\beta_{k} \sim \mathcal{U}[-1, 1] \quad \text{or} \quad \mathcal{N}(0, 1),$

while enforcing a minimum separation condition

$\min_{i \neq j} |\beta_{i} - \beta_{j}| \geq \delta,$

which avoids degenerate root configurations. Weights are sampled as follows:

$\lambda_{k} \sim \mathcal{N}(0, 1),$

with extremely small magnitudes ($|\lambda_{k}| < \lambda_{min}$) rejected to avoid numerical degeneracy. Writing $f(x, y) = \sum_{i = 0}^{d} \binom{d}{i} a_{i} x^{d - i} y^{i}$, the binomially scaled coefficients then satisfy the moment identity

$a_{i} = \sum_{k = 1}^{r} \lambda_{k} \beta_{k}^{i}, \quad i = 0, \dots, d.$

The coefficient vector is then $a = [a_{0}, a_{1}, \dots, a_{d}]$. For learning-based models, we use the normalized representation

$\tilde{a} = \frac{a}{\|a\|_{2} + \epsilon}.$

The raw coefficient vector $a$ is retained for solver-based reconstruction. To evaluate robustness, additive Gaussian noise is introduced as follows:

$a^{(\sigma)} = a + \sigma \eta, \quad \eta \sim \mathcal{N}(0, I).$

Experiments consider noise levels

$\sigma \in \{0, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}.$

Samples are shuffled using a fixed seed and partitioned into training, validation, and test subsets using an 80/10/10 ratio.

The synthetic dataset generation procedure is summarized in Algorithm 4.

Algorithm 4 Synthetic Dataset Generation for Binary Waring Decomposition

Require: Degree set $D$; rank range $\{1, \dots, R_{\max}\}$; samples per degree $N$; noise levels $\Sigma$
Ensure: Dataset $S = \{(\tilde{a}, r, a, d, \sigma)\}$
Initialize $S \leftarrow \emptyset$
for each degree $d \in D$ do
    for $n = 1$ to $N$ do
        Sample rank $r \sim \mathrm{Uniform}\{1, \dots, R_{\max}\}$
        Sample distinct roots $\{\beta_{k}\}_{k = 1}^{r}$
        Sample weights $\{\lambda_{k}\}_{k = 1}^{r}$
        Compute coefficients $a_{i} \leftarrow \sum_{k = 1}^{r} \lambda_{k} \beta_{k}^{i}$, $i = 0, \dots, d$
        $a \leftarrow [a_{0}, \dots, a_{d}]$
        for each noise level $\sigma \in \Sigma$ do
            if $\sigma > 0$ then
                Sample $\eta \sim \mathcal{N}(0, I)$
                $a^{(\sigma)} \leftarrow a + \sigma \eta$
            else
                $a^{(\sigma)} \leftarrow a$
            end if
            Normalize input $\tilde{a} = a^{(\sigma)} / (\|a^{(\sigma)}\|_{2} + \epsilon)$
            Add sample $(\tilde{a}, r, a^{(\sigma)}, d, \sigma)$ to $S$
        end for
    end for
end for
Shuffle $S$ and split into train/validation/test
return $S$
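Algorithm 4 can be sketched in NumPy as follows. The separation $\delta = 0.05$, the weight floor $\lambda_{min} = 0.1$, the seed handling, and the helper names are illustrative assumptions, since the text does not fix them; for $r$ close to $R_{\max}$, the rejection loop for root separation may need several draws.

```python
import numpy as np


def sample_binary_form(d, r_max, rng, delta=0.05, lam_min=0.1, root_dist="uniform"):
    """Draw (r, beta, lam, a) with a_i = sum_k lam_k * beta_k**i for i = 0..d."""
    r = int(rng.integers(1, r_max + 1))              # r ~ Uniform{1, ..., R_max}
    while True:                                      # rejection sampling for separation
        if root_dist == "uniform":
            beta = rng.uniform(-1.0, 1.0, r)
        else:
            beta = rng.normal(0.0, 1.0, r)
        if r == 1 or np.min(np.diff(np.sort(beta))) >= delta:
            break
    lam = rng.normal(0.0, 1.0, r)
    small = np.abs(lam) < lam_min
    while np.any(small):                             # redraw near-zero weights
        lam[small] = rng.normal(0.0, 1.0, int(small.sum()))
        small = np.abs(lam) < lam_min
    powers = beta[None, :] ** np.arange(d + 1)[:, None]  # (d + 1, r), entry beta_k**i
    return r, beta, lam, powers @ lam                # moment identity


def noisy_normalized(a, sigma, rng, eps=1e-12):
    """Return (a_tilde, a_sigma) with a_sigma = a + sigma * eta, eta ~ N(0, I)."""
    a_sigma = a + sigma * rng.standard_normal(a.shape) if sigma > 0 else a.copy()
    return a_sigma / (np.linalg.norm(a_sigma) + eps), a_sigma


def generate_dataset(degrees, r_max, n_per_degree, sigmas, seed=0):
    """Samples (a_tilde, r, a_sigma, d, sigma), shuffled and split 80/10/10."""
    rng = np.random.default_rng(seed)
    samples = []
    for d in degrees:
        for _ in range(n_per_degree):
            r, _, _, a = sample_binary_form(d, r_max, rng)
            for sigma in sigmas:
                a_tilde, a_sigma = noisy_normalized(a, sigma, rng)
                samples.append((a_tilde, r, a_sigma, d, sigma))
    samples = [samples[k] for k in rng.permutation(len(samples))]
    n_tr, n_va = int(0.8 * len(samples)), int(0.1 * len(samples))
    return samples[:n_tr], samples[n_tr:n_tr + n_va], samples[n_tr + n_va:]
```

As in Algorithm 4, each clean form is reused across all noise levels in $\Sigma$, so the noisy variants of one polynomial share the same ground-truth rank and parameters.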

5.2. Baselines

We compare the proposed learning-assisted solver with representative baselines spanning classical algebraic methods and hybrid learning-based approaches. The Classical Sylvester rank search baseline tests candidate ranks sequentially, $r = 1, 2, \dots$, until a valid reconstruction is found, and serves as the primary symbolic baseline. The Hankel-SVD rank estimation baseline estimates rank from the singular value spectrum of the associated Hankel matrix and then performs a single Sylvester reconstruction step. The GNN-guided hybrid solver is the rank-guided method from Section 4.3, where the GNN predicts one rank candidate and verification uses a fixed residual threshold.
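The Hankel-SVD baseline can be sketched as a numerical-rank estimate of a near-square Hankel matrix $H_{ij} = a_{i+j}$. The relative tolerance rule and the clipping to $\{1, \dots, R_{\max}\}$ are assumptions, since the text does not state how singular values are thresholded; the function name is hypothetical.

```python
import numpy as np


def hankel_svd_rank(a, r_max: int, rel_tol: float = 1e-8) -> int:
    """Numerical rank of H[i, j] = a[i + j], clipped to {1, ..., R_max}.

    Counts singular values above rel_tol * sigma_1 (an assumed rule).
    """
    a = np.asarray(a, dtype=float)
    d = a.size - 1
    m = d // 2
    H = np.array([a[i:i + d - m + 1] for i in range(m + 1)])  # (m + 1) x (d - m + 1)
    s = np.linalg.svd(H, compute_uv=False)                     # descending order
    r = int(np.sum(s > rel_tol * s[0]))
    return min(max(r, 1), r_max)
```

Under additive noise, any singular values lifted above `rel_tol * s[0]` are counted, so a fixed relative tolerance of this kind tends to overestimate the rank once the noise level exceeds the tolerance.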

We also evaluate three Meta-Solver variants. Neighbor evaluates $\{\hat{r}_{gnn} - 1, \hat{r}_{gnn}, \hat{r}_{gnn} + 1\}$, top 3 evaluates the three ranks with highest posterior probability, and All evaluates all ranks $1, \dots, R_{\max}$ using the meta-classifier.

These variants allow us to analyze the trade-off between computational cost and robustness of the candidate selection stage.

5.3. Metrics

We evaluate performance along three dimensions: rank identification accuracy, reconstruction quality, and computational efficiency.

Rank identification accuracy: Let r denote the ground-truth rank and $\hat{r}$ the predicted rank. Classification accuracy is defined as follows:

$Acc = \frac{1}{N} \sum_{n = 1}^{N} 1 [{\hat{r}}^{(n)} = r^{(n)}] .$

To account for class imbalance across rank values, we additionally report the macro-F1 score computed over the rank classes.

Reconstruction error: The reconstruction quality is measured using the normalized $ℓ_{2}$ residual:

$ε_{2} = \frac{∥ a - \hat{a} ∥_{2}}{{∥ a ∥}_{2} + ϵ},$

where the reconstructed coefficients are:

${\hat{a}}_{i} = \sum_{k = 1}^{\hat{r}} {\hat{λ}}_{k} {\hat{β}}_{k}^{i} .$

Verified success rate: We report the proportion of instances whose reconstruction satisfies the verification criterion:

$VSR = \frac{1}{N} \sum_{n = 1}^{N} 1 [ε_{2}^{(n)} \leq τ] .$

This metric reflects the fraction of cases in which the solver returns a decomposition that passes the residual-based verification test.
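The four quantities above can be computed as in the following sketch. The function names are hypothetical, and treating a class that is neither predicted nor present as having an F1 score of 0 is an assumption about how empty classes are handled.

```python
import numpy as np


def rank_accuracy(r_hat, r_true) -> float:
    """Acc: fraction of instances with hat r = r."""
    return float(np.mean(np.asarray(r_hat) == np.asarray(r_true)))


def macro_f1(r_hat, r_true, classes) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over the rank classes."""
    r_hat, r_true = np.asarray(r_hat), np.asarray(r_true)
    scores = []
    for c in classes:
        tp = np.sum((r_hat == c) & (r_true == c))
        fp = np.sum((r_hat == c) & (r_true != c))
        fn = np.sum((r_hat != c) & (r_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)  # empty class scored 0
    return float(np.mean(scores))


def normalized_residual(a, a_hat, eps=1e-12) -> float:
    """epsilon_2 = ||a - hat a||_2 / (||a||_2 + eps)."""
    a, a_hat = np.asarray(a, dtype=float), np.asarray(a_hat)
    return float(np.linalg.norm(a - a_hat) / (np.linalg.norm(a) + eps))


def verified_success_rate(eps2_values, tau) -> float:
    """VSR: fraction of instances whose residual passes epsilon_2 <= tau."""
    return float(np.mean(np.asarray(eps2_values) <= tau))
```

Macro-F1 weights every rank class equally, so a solver that collapses onto one or two ranks can keep a moderate accuracy while its macro-F1 falls sharply.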

Runtime: Computational efficiency is evaluated using the average wall-clock runtime per instance (in milliseconds), measured over the full test set under identical hardware and implementation conditions.

6. Results

This section reports the experimental results for rank identification accuracy, verified reconstruction success, runtime, and robustness under coefficient noise. In addition to the classical exhaustive Sylvester baseline and the original hybrid solver, we evaluate the proposed GNN + Meta-Solver variants introduced in Section 4.4, using three candidate-selection strategies (Neighbor, Top-3, and All). All results are computed on the test split under the evaluation protocol and verification criteria defined in Section 5.3.

Before test-time evaluation, a lightweight logistic-regression meta-classifier is trained on a validation-derived meta dataset. The resulting meta-classifier accuracies are 0.6182 (Neighbor), 0.6033 (Top-3), and 0.5960 (All). Although these binary accuracies are moderate, the classifier is used only to score and order the candidate ranks; the final decomposition is always obtained through explicit algebraic reconstruction and verification.

6.1. Rank Prediction Accuracy

Table 7 summarizes rank identification accuracy (Acc), macro-F1 score (F1), and verified success rate (VSR) for polynomial degrees $d \in \{50, 100, 200\}$. Several observations are consistent across all tested degrees.

First, the classical exhaustive Sylvester baseline achieves the highest raw rank identification accuracy in this clean synthetic setting, reaching approximately 70–72% accuracy with 100% VSR across all degrees. This behavior is expected because the classical solver explicitly enumerates candidate ranks until a valid reconstruction is found.

Second, the original hybrid solver (GNN-guided one-shot reconstruction combined with a fixed residual threshold) performs poorly in rank classification, with accuracy around 11–13% and macro-F1 near 3%. Despite this low classification accuracy, the method still maintains a relatively high VSR (∼90%), indicating that algebraic verification prevents catastrophic reconstruction failures. However, the results suggest that a single fixed threshold is insufficient for reliable rank selection.

Third, the proposed Meta-Solver variants consistently improve over the original hybrid solver. Across all degrees, the performance ordering is stable:

Meta (All) > Meta (Top-3) > Meta (Neighbor) > Hybrid (Original).

Among these variants, Meta (All) achieves the strongest performance, with rank accuracies of 49.70%, 45.70%, and 46.60% for $d = 50, 100, 200$, respectively. Importantly, it restores the verified success rate to 100% in all cases.

Compared with the original hybrid solver, this corresponds to an improvement of roughly 34–37 percentage points in rank identification accuracy. These results support the central motivation of Section 4.4: integrating GNN-derived structural confidence with algebraic residual statistics yields a substantially more reliable decision mechanism than a rigid residual threshold while maintaining full algebraic verification.

Table 7 reports the rank identification accuracy, macro-F1 score, and verified success rate on the test set.

6.2. Runtime Comparison

Table 8 reports the average wall-clock runtime per instance (in milliseconds) for polynomial degrees $d = 50, 100, 200$. As expected, runtime increases with degree for all methods due to the growth of Hankel matrices, larger linear systems, and more expensive root and Vandermonde computations.

The classical exhaustive Sylvester baseline is the fastest method in the current implementation (∼0.93 ms at $d = 50$ and $d = 100$, and 1.63 ms at $d = 200$). This behavior is expected because the classical procedure performs a minimal sequence of algebraic checks without constructing auxiliary graph representations or evaluating multiple candidate ranks.

The original hybrid solver introduces additional overhead due to graph construction and GNN inference, resulting in runtimes of 10.77, 18.86, and 34.13 ms for $d = 50, 100, 200$, respectively.

The proposed Meta-Solver variants incur additional computational cost because multiple candidate ranks are evaluated and meta-features are computed for each candidate. The measured runtimes are: neighbor (12.29/21.12/37.99 ms), top 3 (12.32/21.32/38.48 ms), and All (16.35/28.22/49.49 ms) for $d = 50/100/200$, respectively.

Within the Meta-Solver family, the runtime ordering follows the size of the candidate set:

Neighbor ≈ Top-3 < All.

This reflects the additional reconstruction and scoring steps required when evaluating more candidate ranks. While the All strategy provides the strongest rank prediction accuracy (Section 6.1), it also incurs the highest computational cost.

Overall, the results highlight a trade-off between computational cost and decision robustness. The Meta-Solver framework allows this trade-off to be tuned through candidate-set size, enabling practitioners to select a configuration that balances runtime and reconstruction reliability for a given application.

Table 8 reports the average wall-clock runtime per instance for the classical, hybrid, and Meta-Solver variants.

6.3. Architecture and Distribution-Shift Stress Tests

To address the choice of graph message passing more directly, we ran an additional pilot comparison against non-graph sequence encodings. The MLP, CNN, and Transformer baselines were trained on the same small controlled sample set (600 training, 200 validation, and 300 test instances), while the GNN row uses the released pretrained RankGNN checkpoint evaluated on the same test set. These results are intended as architecture stress tests rather than a replacement for the main large-scale experiments.

Table 9 reports the pilot comparison of different coefficient-sequence encodings under the architecture stress-test setting.

The pilot comparison does not show an advantage for the CNN or Transformer encodings in this setting. The Transformer has the largest parameter count, while the graph model is smaller and better aligned with Hankel reuse patterns. At the same time, the graph model is slower at inference in this unoptimized CPU implementation because each instance requires explicit graph construction and message passing.

We also evaluated controlled distribution-shift settings at $d = 200$ and $\sigma = 10^{-4}$ using 200 samples per setting. The shifted settings preserve known ground-truth ranks but alter the sampling assumptions through normal root distributions, lognormal weights, clustered roots, and ill-conditioned weights.

Table 10 reports the distribution-shift stress-test results under the strict verification tolerance.

These stress tests support a more cautious interpretation of the proposed method. The Meta-Solver generally maintains high verified-success rates because incorrect candidates can be rejected by algebraic verification, but rank accuracy degrades under substantial distribution shift, especially for normal-root and near-collision regimes. This confirms that the method is best understood as a robustness-oriented decision layer within the modeled problem class, not as evidence of unconstrained real-world generalization.

6.4. Noise Robustness

We evaluate robustness under coefficient noise over the grid

$\sigma \in \{0, 10^{-6}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}.$

Figure 4 summarizes the trends in rank identification accuracy and verified success rate.

A consistent degradation trend is observed across all methods as the noise level increases. The original hybrid solver deteriorates rapidly under noise and approaches near-zero performance at moderate perturbation levels ($\sigma \geq 10^{-3}$). This behavior indicates that the fixed residual threshold used in the hybrid solver is not sufficiently robust when the moment sequence is perturbed.

The classical exhaustive baseline also exhibits strong sensitivity to coefficient perturbations. Even very small noise levels can cause rank misidentification or verification failure in some cases. This phenomenon is consistent with the known numerical instability of structured low-rank recovery and Vandermonde-type systems, which are highly sensitive to perturbations in moment sequences [6,10,26,27].

In contrast, the proposed Meta-Solver variants are more robust in the low-to-moderate noise regime. Meta (All) achieves the strongest performance among the learning-based approaches at every noise level in Table 11. For example:

At $σ = 10^{-6}$, Hybrid achieves 11.30% Acc and 90.45% VSR, while Meta (All) reaches 34.90% Acc and 100.00% VSR;
At $σ = 10^{-4}$, Hybrid drops to 1.80% Acc and 15.55% VSR, whereas Meta (All) maintains 25.15% Acc and 96.30% VSR;
At $σ = 10^{-3}$, Hybrid fails completely (0.00% Acc, 0.00% VSR), while Meta (All) still achieves 15.70% Acc and 50.25% VSR.

These results support the design principle of the Meta-Solver: when residual magnitudes alone become unreliable under noisy conditions, combining residual information with GNN posterior confidence and stability features allows the decision layer to select more reliable rank candidates.

At larger noise levels ($σ \geq 10^{-2}$), all methods eventually fail. This is expected: severe perturbations corrupt the moment sequence, and the ill-conditioning of the root-finding and Vandermonde recovery steps amplifies these errors [8,9]. In this regime, performance degradation is dominated by downstream numerical conditioning rather than by the quality of the rank proposals alone.
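The root-finding and Vandermonde recovery steps mentioned above can be sketched as follows. This is a generic Sylvester/Prony-style reconstruction. It assumes the coefficients are in moment form $a_i = \sum_k λ_k β_k^i$ and that $2r \le d + 1$. The Hankel layout, the SVD-based kernel choice, and the helper name `sylvester_reconstruct` are illustrative and may differ from the authors' implementation. `np.roots` and the least-squares solve on the Vandermonde matrix are the two steps whose conditioning deteriorates when noise is large or roots cluster.

```python
import numpy as np


def sylvester_reconstruct(a, r):
    """Sketch: recover r roots/weights from moments a_i = sum_k lam_k * beta_k**i."""
    a = np.asarray(a, dtype=float)
    d = len(a) - 1
    # Hankel (catalecticant) matrix H[i, j] = a[i + j], shape (d - r + 1, r + 1).
    H = np.array([[a[i + j] for j in range(r + 1)] for i in range(d - r + 1)])
    # Kernel vector c: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(H)
    c = Vt[-1]
    # Roots of p(t) = sum_j c_j t**j; np.roots expects the highest degree first.
    beta = np.roots(c[::-1]).astype(complex)
    # Weights from the Vandermonde least-squares system V @ lam = a, V[i, k] = beta_k**i.
    V = np.vander(beta, d + 1, increasing=True).T
    lam, *_ = np.linalg.lstsq(V, a.astype(complex), rcond=None)
    resid = a - (V @ lam).real
    return beta, lam, float(np.max(np.abs(resid))), float(np.linalg.norm(resid))


beta_true, lam_true = np.array([0.3, -0.5]), np.array([1.0, 2.0])
a = np.array([(lam_true * beta_true**i).sum() for i in range(11)])
beta, lam, eps_max, eps_2 = sylvester_reconstruct(a, 2)   # exact rank: tiny residual
_, _, eps_max_r1, _ = sylvester_reconstruct(a, 1)         # too-small rank: large residual
```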

Table 11 reports the raw noise-robustness results corresponding to Figure 4.

6.5. Ablation on GNN-Derived Meta-Features

To better understand the contribution of each component in the Meta-Solver, we conduct an ablation study on the meta-feature design. Recall that the meta-classifier operates on a feature vector composed of reconstruction residual statistics and auxiliary GNN-derived signals. In the full configuration, the feature vector includes:

(i) the residual magnitudes $\log_{10}(ε_{max})$ and $\log_{10}(ε_{2})$;
(ii) the GNN stability score $s_{gnn}$;
(iii) the candidate posterior probability $p_{θ}(r_{c} \mid G)$;
(iv) the rank-distance term $|r_{c} - \hat{r}_{gnn}|$.

For the ablation experiment, we remove the GNN-derived features (ii)–(iv), leaving only the residual-based features. This allows us to isolate the effect of learned structural priors provided by the GNN.
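The feature vector and the ablation switch can be sketched as follows. This is a minimal sketch assuming a logistic-regression meta-score σ(wᵀx + b) over candidate features. The classifier used in the paper is not restated in this section. The weights in the usage example are hypothetical numbers chosen to illustrate the mechanism, not fitted values. `meta_features` and `select_rank` are illustrative names. Setting `use_gnn=False` drops features (ii) to (iv), mirroring the reduced configuration.

```python
import numpy as np


def meta_features(eps_max, eps_2, s_gnn, posterior, r_c, r_hat, use_gnn=True):
    """Feature vector for candidate rank r_c; posterior[k] is p(r = k + 1 | G)."""
    tiny = 1e-300  # guard against log10(0) for exact reconstructions
    x = [np.log10(max(eps_max, tiny)), np.log10(max(eps_2, tiny))]  # (i) residuals
    if use_gnn:
        x += [s_gnn, posterior[r_c - 1], abs(r_c - r_hat)]  # (ii) to (iv)
    return np.array(x, dtype=float)


def select_rank(features, w, b=0.0):
    """Score each candidate with a logistic meta-score and return the best rank."""
    scores = {r: 1.0 / (1.0 + np.exp(-(x @ w + b))) for r, x in features.items()}
    return max(scores, key=scores.get), scores


posterior = np.array([0.1, 0.7, 0.2])        # hypothetical GNN posterior over ranks 1..3
resid = {2: (1e-9, 1e-8), 3: (1e-10, 1e-9)}  # rank 3 fits slightly better (overfitting)
w_full = np.array([-0.5, -0.5, 1.0, 8.0, -2.0])  # hypothetical weights

full = {r: meta_features(*resid[r], 0.8, posterior, r, r_hat=2) for r in resid}
print(select_rank(full, w_full)[0])          # 2

reduced = {r: meta_features(*resid[r], 0.8, posterior, r, 2, use_gnn=False) for r in resid}
print(select_rank(reduced, w_full[:2])[0])   # 3
```

With residual features alone, the slightly smaller residual of the over-parameterized rank wins. The posterior and rank-distance terms can outweigh that small residual margin.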

Table 12 reports rank identification accuracy for the Meta-Solver under the three candidate strategies (Neighbor, Top-3, and All). It compares the full feature configuration against the reduced configuration across degrees $d \in \{50, 100, 200\}$.

In every tested setting, removing the GNN-derived features reduces accuracy. This indicates that the Meta-Solver benefits not merely from evaluating multiple candidate ranks but from combining algebraic reconstruction evidence with learned structural priors.

The largest performance drop occurs for the Meta (All) strategy:

- at $d = 50$, accuracy decreases from 51.4% to 37.8% (a drop of 13.6 points);
- at $d = 100$, from 46.1% to 36.5% (a drop of 9.6 points);
- at $d = 200$, from 45.1% to 34.6% (a drop of 10.5 points).

This behavior is expected because the All strategy evaluates the full candidate set $\{1, \dots, R_{max}\}$. In that set, many incorrect ranks can still produce small residuals due to overfitting or numerical artifacts. In this larger search space, the GNN-derived posterior confidence and stability score provide important priors that help suppress implausible candidates.

By contrast, the Neighbor and Top-3 strategies show smaller performance degradation when GNN features are removed. For Neighbor, the drop ranges from 1.5 to 3.3 points; for Top-3, from 3.2 to 6.4 points. This milder degradation is consistent with the fact that these strategies already restrict the candidate set based on the GNN rank prediction. The result is a cleaner candidate pool in which residual information alone is often sufficient.
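The three candidate strategies differ only in which ranks are passed to reconstruction and scoring. The sketch below makes the following assumptions:

- Neighbor means the predicted rank and its immediate neighbors $\hat{r} \pm 1$.
- Top-3 takes the three ranks with the highest GNN posterior.
- All enumerates $\{1, \dots, R_{max}\}$.

The exact neighbor radius is not stated in this section, and `candidate_set` is a hypothetical helper.

```python
import numpy as np


def candidate_set(posterior, strategy, r_max):
    """Hypothetical candidate-set rules; posterior[k] is p(r = k + 1 | G)."""
    r_hat = int(np.argmax(posterior)) + 1
    if strategy == "neighbor":
        # Predicted rank and its immediate neighbors, clipped to [1, r_max].
        return [r for r in (r_hat - 1, r_hat, r_hat + 1) if 1 <= r <= r_max]
    if strategy == "top3":
        # Three highest-posterior ranks, most probable first.
        return [int(k) + 1 for k in np.argsort(posterior)[::-1][:3]]
    if strategy == "all":
        return list(range(1, r_max + 1))
    raise ValueError(f"unknown strategy: {strategy}")


post = np.array([0.05, 0.1, 0.5, 0.2, 0.15])  # hypothetical posterior over ranks 1..5
print(candidate_set(post, "neighbor", 5))      # [2, 3, 4]
print(candidate_set(post, "top3", 5))          # [3, 4, 5]
```

Neighbor and Top-3 both depend on the GNN posterior to decide which ranks are even considered, so their candidate pools are already filtered before residuals are compared.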

Overall, this ablation study demonstrates that the effectiveness of the Meta-Solver arises from feature fusion. Residual features provide explicit reconstruction evidence from algebraic verification, while GNN-derived signals supply global structural confidence about plausible ranks. The combination becomes particularly important when the candidate search space is large.

7. Discussion

Algorithmic aspects of tensor rank and decomposition problems have attracted significant attention in theoretical computer science and numerical analysis. Several works have studied the computational complexity, identifiability conditions, and algorithmic strategies for tensor decomposition problems, highlighting both their expressive power and computational challenges [45].

This work advocates a verifiable hybrid paradigm in which machine learning is used primarily for structural inference—such as predicting plausible ranks or estimating conditioning regimes—while the final output remains an algebraic object that can be certified through explicit residual verification. The overall design reflects a broader research direction that combines data-driven inference with rigorous algorithmic verification. Rather than replacing classical algorithms, neural models act as proposal or ranking mechanisms that guide downstream solvers toward promising hypotheses while preserving mathematical correctness guarantees.

From a machine learning perspective, the proposed system can be viewed as a probabilistic decision layer embedded within an exact computational pipeline. The graph neural network provides structural predictions and confidence signals derived from the coefficient graph, while the meta-classifier combines these signals with solver-derived residual features to estimate the likelihood that a candidate rank is correct. Such probabilistic reasoning and calibrated prediction mechanisms are well studied in statistical learning and probabilistic modeling [40,41]. At the same time, recent developments in graph neural networks have demonstrated that message-passing architectures can effectively capture relational patterns and structural dependencies in graph-structured data [33,34,37]. The present framework combines these ideas with classical algebraic solvers, resulting in a hybrid system that preserves interpretability while improving robustness in noisy or numerically ambiguous regimes.

7.1. Coefficient Graphs as an Inductive Bias

The Hankel/catalecticant structure implies strong dependencies among coefficients. Sliding windows reuse anti-diagonal entries and encode linear recurrence constraints. Representing these dependencies as a graph exposes the rank signal to message passing, which is well suited for capturing relational structure and local-to-global patterns on structured domains [12,13,30,31].
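A minimal sketch of the coefficient-graph construction summarized in Table 1 follows. Several details are assumptions:

- The node-feature normalization is $\tilde{a} = a / \max_i |a_i|$.
- Hankel edges are all index pairs with $1 \le |i - j| \le Δ$, which subsumes the local edges.
- Each edge is stored in both directions.
- The multiplicity $w_{ij}$ is the number of length-$(Δ + 1)$ sliding windows that contain both indices.

Under the edge assumptions, the edge counts equal those in Table 3 (570, 1170, and 2370 for $d = 50, 100, 200$ with $Δ = 6$). This is consistent with, but does not confirm, the authors' construction.

```python
import numpy as np


def build_coefficient_graph(a, delta=6):
    """Return node features x, directed edge_index (2, E), and edge weights w."""
    a = np.asarray(a, dtype=float)
    d = len(a) - 1
    a_tilde = a / max(np.max(np.abs(a)), 1e-12)  # assumed max-abs normalization
    idx = np.arange(d + 1)
    # Node features [a_tilde_i, i / d, log(1 + |a_tilde_i|)] as listed in Table 1.
    x = np.stack([a_tilde, idx / d, np.log1p(np.abs(a_tilde))], axis=1)

    window = delta + 1            # hypothetical sliding-window length
    last_start = d - window + 1   # last admissible window start
    src, dst, w = [], [], []
    for i in range(d + 1):
        for j in range(i + 1, min(i + delta, d) + 1):
            # Windows [s, s + window - 1] containing both i and j: s in [j - window + 1, i].
            mult = min(i, last_start) - max(0, j - window + 1) + 1
            for u, v in ((i, j), (j, i)):  # store both directions for message passing
                src.append(u)
                dst.append(v)
                w.append(max(mult, 0))
    return x, np.array([src, dst]), np.array(w, dtype=float)


x, edge_index, w = build_coefficient_graph(np.ones(51), delta=6)
print(edge_index.shape)  # (2, 570)
```

The graph stays sparse (about $2Δ$ directed edges per node), which is why construction cost grows linearly in $d$ in Table 3.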

From this perspective, the GNN is not learning algebraic rules in a purely black-box fashion. Instead, it is expected to learn compact representations of recurrence-consistency cues induced by Hankel reuse and root separation. This inductive bias may explain why relatively small graph models (67,979 parameters in Table 9) suffice to extract informative rank signals.
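To make the message-passing step concrete, the following NumPy sketch runs a forward pass on a toy chain graph. It follows the default architecture of Table 5:

- four GCN-style mean-aggregation layers;
- global mean pooling;
- a two-layer ReLU MLP.

The hidden size is reduced from 128 to 16 to keep the toy small. The weights are random placeholders rather than trained parameters, and the function names are hypothetical. The point is only to show how local aggregation over the coefficient graph feeds a graph-level rank posterior.

```python
import numpy as np


def mean_aggregate_layer(h, edge_index, W):
    """GCN-style layer: average over incoming neighbors plus self, then linear map and ReLU."""
    src, dst = edge_index
    agg = h.copy()                     # self-loop contribution
    deg = np.ones(h.shape[0])
    np.add.at(agg, dst, h[src])        # sum incoming messages per node
    np.add.at(deg, dst, 1.0)
    return np.maximum(0.0, (agg / deg[:, None]) @ W)


def rank_posterior(x, edge_index, layers, W1, W2):
    """Stacked layers, global mean pooling, 2-layer ReLU MLP, softmax over ranks 1..R_max."""
    h = x
    for W in layers:
        h = mean_aggregate_layer(h, edge_index, W)
    g = h.mean(axis=0)                 # graph-level embedding
    logits = np.maximum(0.0, g @ W1) @ W2
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()


# Toy usage: a chain graph with random (untrained) placeholder weights.
rng = np.random.default_rng(0)
d, hidden, r_max = 20, 16, 10


def init(m, n):
    return rng.standard_normal((m, n)) / np.sqrt(m)


chain = np.arange(d)
edge_index = np.concatenate([np.stack([chain, chain + 1]), np.stack([chain + 1, chain])], axis=1)
x = rng.standard_normal((d + 1, 3))    # placeholder node features
layers = [init(3, hidden)] + [init(hidden, hidden) for _ in range(3)]  # L = 4 layers
p = rank_posterior(x, edge_index, layers, init(hidden, hidden), init(hidden, r_max))
r_hat = int(np.argmax(p)) + 1
```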

7.2. Interpretability and Correctness Guarantees

A key advantage of the proposed approach over purely neural regression methods is that the final output remains auditable. The solver returns $(\hat{r}, \{\hat{β}_{k}, \hat{λ}_{k}\})$, and acceptance is determined by an explicit reconstruction residual. Thus, the learning component cannot silently return a decomposition that fails to reproduce the input coefficients: every accepted output must pass algebraic verification.
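The acceptance rule can be sketched as a direct recomputation of the coefficients from the returned parameters. This sketch makes three assumptions:

- the coefficients are in moment form $a_i = \sum_k λ_k β_k^i$;
- residuals are absolute (unnormalized);
- acceptance is based on $ε_{max}$ with the strict tolerance $τ = 10^{-6}$ from Table 10.

The actual normalization and acceptance criterion may differ, and `verify_decomposition` is a hypothetical name.

```python
import numpy as np


def verify_decomposition(a, beta, lam, tau=1e-6):
    """Recompute a_hat_i = sum_k lam_k * beta_k**i and accept if max |a - a_hat| <= tau."""
    a = np.asarray(a, dtype=float)
    i = np.arange(len(a))
    a_hat = (np.asarray(lam)[None, :] * np.asarray(beta)[None, :] ** i[:, None]).sum(axis=1)
    resid = a - np.real(a_hat)   # imaginary parts of conjugate root pairs cancel
    eps_max = float(np.max(np.abs(resid)))
    eps_2 = float(np.linalg.norm(resid))
    return eps_max <= tau, eps_max, eps_2


beta, lam = np.array([0.3, -0.5]), np.array([1.0, 2.0])
a = np.array([(lam * beta**k).sum() for k in range(11)])
ok, e_max, e_2 = verify_decomposition(a, beta, lam)     # exact parameters: accepted
bad, _, _ = verify_decomposition(a, beta, lam + 1e-3)   # perturbed weights: rejected
```

A decomposition at a higher rank can also pass this check. The check therefore certifies agreement with the input coefficients, not minimality of $\hat{r}$.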

The feature-ablation experiment further clarifies this mechanism. When GNN-derived auxiliary signals (stability score, posterior probability, and rank-distance) are removed from the meta-classifier, rank accuracy consistently decreases. This indicates that the Meta-Solver’s advantage arises from combining algebraic evidence with learned structural priors rather than simply evaluating multiple candidate ranks.

7.3. Efficiency Considerations

Classical Sylvester search evaluates candidate ranks sequentially until verification succeeds. The GNN-guided solver reduces this search by proposing a likely rank, while the Meta-Solver improves reliability by scoring several plausible candidates rather than relying on a single threshold.
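The difference in reconstruction attempts summarized in Table 6 can be sketched with a stub verifier. In this hypothetical example, `try_rank(r)` stands for one Sylvester reconstruction plus residual check. It is modeled to pass for every $r \ge r^{★}$, since a rank above the true rank can typically also reproduce exact coefficients. The Meta-Solver, which scores all candidates instead of stopping at the first success, is not modeled here.

```python
def classical_search(try_rank, r_max):
    """Sequential Sylvester search: r = 1, 2, ... until verification succeeds."""
    for attempts, r in enumerate(range(1, r_max + 1), start=1):
        if try_rank(r):
            return r, attempts
    return None, r_max


def guided_search(try_rank, candidates):
    """Try GNN-proposed candidates in order, stopping at the first verified rank."""
    for attempts, r in enumerate(candidates, start=1):
        if try_rank(r):
            return r, attempts
    return None, len(candidates)


TRUE_RANK = 6


def try_rank(r):
    """Stub: every rank >= the true rank passes verification."""
    return r >= TRUE_RANK


print(classical_search(try_rank, r_max=10))  # (6, 6)
print(guided_search(try_rank, [6, 5, 7]))    # (6, 1)
print(guided_search(try_rank, [7, 6, 5]))    # (7, 1): verified, but not the true rank
```

The last call shows why stopping at the first verified candidate saves attempts but can return a non-minimal rank. This motivates scoring several candidates.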

In the current implementation, however, no runtime improvement is observed, because graph construction, neural inference, and candidate scoring introduce additional overhead. At $d = 200$, for example, Classical averages 1.63 ms per instance, compared with 34.13 ms for Hybrid and 49.49 ms for Meta (All) (Table 8). Consequently, the present system should be viewed primarily as a more robust decision layer rather than an optimized computational accelerator.

7.4. Robustness and Numerical Conditioning

Experiments under coefficient noise show that the Meta-Solver substantially improves robustness relative to the original hybrid solver. By combining residual-based evidence with learned posterior confidence, the decision layer can better distinguish plausible ranks when residual magnitudes alone become unreliable.

Nevertheless, the method remains subject to the intrinsic numerical conditioning limits of Vandermonde-type recovery. When roots are nearly colliding, the underlying linear systems become ill-conditioned and weight recovery may be unstable, a phenomenon widely studied in numerical linear algebra and perturbation analysis of structured matrices [8,11,26,27,29]. Thus, the proposed approach improves the reliability of rank selection but does not eliminate fundamental conditioning challenges.
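The conditioning issue can be seen directly by computing the condition number of the Vandermonde matrix used for weight recovery as two roots approach each other. This small NumPy sketch assumes the layout $V_{ik} = β_k^{i}$, $i = 0, \dots, d$. The roots 0.5 and 0.5 + gap are arbitrary illustrative values, and `vandermonde_cond` is a hypothetical helper.

```python
import numpy as np


def vandermonde_cond(beta, d):
    """2-norm condition number of V with V[i, k] = beta_k**i, i = 0..d."""
    V = np.vander(np.asarray(beta, dtype=float), d + 1, increasing=True).T
    return np.linalg.cond(V)  # ratio of largest to smallest singular value


for gap in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"gap={gap:.0e}  cond(V)={vandermonde_cond([0.5, 0.5 + gap], d=50):.2e}")
```

For two nodes, the condition number grows roughly in proportion to 1/gap. Errors in the recovered weights can therefore be amplified accordingly, regardless of how well the rank was selected.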

7.5. Limitations

Several limitations remain.

Distribution shift: The rank predictor is trained on parameterized synthetic distributions of $(β_{k}, λ_{k})$ . These distributions provide ground truth and reproducibility, but they do not cover all possible data-generating mechanisms. Performance may degrade if deployment data follow substantially different priors, although the verification step preserves correctness by rejecting candidates that do not pass reconstruction checks.
Conditioning sensitivity: Instances with clustered roots remain numerically difficult for both classical and proposed methods.
Inference overhead: Graph construction, neural inference, and meta-candidate scoring introduce runtime and memory overhead in the current prototype implementation.
Meta-feature dependence: The strongest Meta-Solver variants rely on informative GNN-derived confidence cues, as confirmed by the ablation study.

7.6. Learning to Guide Exact Solvers

More broadly, the results support the view that neural models can serve as proposal and ranking mechanisms within exact computational pipelines while preserving formal verifiability. The same principle may apply to other structured decomposition problems where a discrete structural parameter must be identified before applying a downstream exact solver.

8. Conclusions

This paper introduced a learning-assisted Sylvester framework for binary Waring decomposition that integrates graph-based structural inference with exact algebraic reconstruction. By representing coefficient interactions through a coefficient graph derived from Hankel structure, a graph neural network can infer plausible Waring ranks from structured dependencies among polynomial coefficients [12,13,30,31]. The predicted rank then guides a Sylvester-style reconstruction, while correctness is guaranteed by an explicit residual verification step. The results indicate that the primary benefit of the proposed framework lies in improving the robustness of rank identification under noise and ambiguous residual regimes, rather than replacing the classical solver in idealized noiseless settings.

To improve robustness beyond the original GNN-guided one-shot hybrid solver, we further proposed a Meta-Solver that scores multiple candidate ranks using both reconstruction residuals and GNN-derived confidence features. The experimental results show that this meta-guided decision layer substantially improves rank identification accuracy and verified success rates compared with the original hybrid solver, particularly in low-to-moderate noise regimes. Among the tested variants, the Meta (All) strategy provides the strongest overall performance. This indicates that learned candidate ranking can substantially improve reliability when a single-rank decision is insufficient.

An ablation study further shows that the improvement arises from the fusion of symbolic and learned information. Removing GNN-derived auxiliary signals—such as stability scores and posterior confidence—consistently reduces rank accuracy, especially when the candidate search space is large. This confirms that the Meta-Solver benefits from combining explicit algebraic reconstruction evidence with learned structural priors.

Although the current implementation does not yet provide a runtime advantage over the classical exhaustive baseline due to graph construction and inference overhead, the proposed approach improves the robustness of the rank-selection stage relative to the original one-shot hybrid solver in the tested noisy regimes. More broadly, the results support a general paradigm for symbolic–numeric computation: neural models can serve as structural proposal or candidate-ranking mechanisms within exact computational pipelines while preserving formal verifiability [14,15].

Future work will explore extending the framework to multivariate Waring decomposition and higher-order symmetric tensor rank problems [3,4], improving numerical stability for clustered-root regimes [8,26,27], and optimizing the learned front-end to reduce inference overhead and improve practical scalability. To facilitate reproducibility and further research, the implementation of the proposed framework will be publicly released on GitHub upon acceptance of this manuscript.

Author Contributions

Conceptualization, C.Z. and M.-J.-S.W.; methodology, W.W., C.-W.L. and M.-J.-S.W.; software, W.W.; validation, W.W. and C.-W.L.; formal analysis, W.W. and M.-J.-S.W.; investigation, W.W. and C.-W.L.; data curation, W.W.; writing—original draft preparation, W.W.; writing—review and editing, C.-W.L., M.-J.-S.W. and C.Z.; visualization, W.W.; supervision, C.Z. and M.-J.-S.W.; project administration, C.Z.; funding acquisition, C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 12101111. The APC was funded by the same project.

Data Availability Statement

The source code used in this study will be publicly available at: https://github.com/liangchenwei666-ai/gnness (accessed on 9 June 2026) upon publication of this article. All datasets used in the experiments were synthetically generated using the procedures described in this paper.

Acknowledgments

The authors used AI-assisted language tools only for language polishing, grammar checking, and proofreading during the preparation of the manuscript. The core ideas, theoretical analysis, algorithm design, experimental design, data analysis, and scientific conclusions were developed entirely by the authors. All AI-assisted language edits were carefully reviewed and verified by the authors, who take full responsibility for the final content of the manuscript.

Conflicts of Interest

Mujiangshan Wang was employed by Shenzhen Kaihong Digital Industry Development Co., Ltd. The remaining authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of this manuscript; or in the decision to publish the results.

References

1. Sylvester, J.J. An Essay on Canonical Forms, Supplement to a Sketch of a Memoir on Elimination, Transformation and Canonical Forms. In The Collected Mathematical Papers of James Joseph Sylvester, Reprinted as Paper 34; Cambridge University Press: Cambridge, UK, 1851; Volume I, pp. 203–216.
2. Iarrobino, A.; Kanev, V. Power Sums, Gorenstein Algebras, and Determinantal Loci. In Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1999; Volume 1721.
3. Landsberg, J.M.; Teitler, Z. On the Ranks and Border Ranks of Symmetric Tensors. Found. Comput. Math. 2010, 10, 339–366.
4. Comon, P.; Golub, G.H.; Lim, L.H.; Mourrain, B. Symmetric Tensors and Symmetric Tensor Rank. SIAM J. Matrix Anal. Appl. 2008, 30, 1254–1279.
5. Bernardi, A.; Gimigliano, A.; Idà, M. Computing the Symmetric Rank of Symmetric Tensors. J. Symb. Comput. 2011, 46, 34–53.
6. Sauer, T. Prony’s Method: An Old Trick for New Problems. In Snapshots of Modern Mathematics from Oberwolfach; Mathematisches Forschungsinstitut Oberwolfach: Oberwolfach, Germany, 2018.
7. Comas, G.; Seiguer, M. On the Rank of a Binary Form. Found. Comput. Math. 2011, 11, 65–78.
8. Higham, N.J. Accuracy and Stability of Numerical Algorithms, 2nd ed.; SIAM: Philadelphia, PA, USA, 2002.
9. Wilkinson, J.H. Rounding Errors in Algebraic Processes; Prentice-Hall: Englewood Cliffs, NJ, USA, 1963.
10. Markovsky, I. Low Rank Approximation: Algorithms, Implementation, Applications; Springer: Berlin/Heidelberg, Germany, 2018.
11. Stewart, G.W.; Sun, J.G. Matrix Perturbation Theory; Academic Press: Cambridge, MA, USA, 1990.
12. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
13. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; Proceedings of Machine Learning Research, Volume 70, pp. 1263–1272.
14. Bronstein, M.M.; Bruna, J.; Cohen, T.; Veličković, P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv 2021, arXiv:2104.13478.
15. Battaglia, P.W.; Hamrick, J.B.; Bapst, V.; Sanchez-Gonzalez, A.; Zambaldi, V.; Malinowski, M.; Tacchetti, A.; Raposo, D.; Santoro, A.; Faulkner, R.; et al. Relational Inductive Biases, Deep Learning, and Graph Networks. arXiv 2018, arXiv:1806.01261.
16. Wang, S.; Wang, M. The Edge Connectivity of Expanded k-Ary n-Cubes. Discret. Dyn. Nat. Soc. 2018, 2018, 7867342.
17. Wang, M.; Wang, S. Diagnosability of Cayley graph networks generated by transposition trees under the comparison diagnosis model. Ann. Appl. Math. 2016, 32, 166–173.
18. Zhao, L.; Wang, M.; Zhang, X.; Lin, Y.; Wang, S. An algorithm for the orientation of complete bipartite graphs. In Proceedings of the 2017 International Conference on Applied Mathematics, Modelling and Statistics Application (AMMSA 2017); Atlantis Press: Dordrecht, The Netherlands, 2017; pp. 361–364.
19. Wang, M.-J.-S.; Yuan, J.; Lin, S.-W. Ordered and Hamilton Digraphs. Chin. Q. J. Math. 2010, 25, 317–326.
20. Bernardi, A.; Brachat, J.; Mourrain, B. A Comparison of Different Notions of Ranks of Symmetric Tensors. arXiv 2012, arXiv:1210.8169.
21. Bernardi, A.; Brachat, J.; Comon, P.; Mourrain, B. Multihomogeneous Polynomial Decomposition Using Moment Matrices. In Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC), San Jose, CA, USA, 8–11 June 2011.
22. Kruskal, J.B. Three-Way Arrays: Rank and Uniqueness of Trilinear Decompositions, with Application to Arithmetic Complexity and Statistics. Linear Algebra Its Appl. 1977, 18, 95–138.
23. Golub, G.H.; Pereyra, V. The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate. SIAM J. Numer. Anal. 1973, 10, 413–432.
24. Golub, G.H.; Pereyra, V. Separable Nonlinear Least Squares: The Variable Projection Method and its Applications. Inverse Probl. 2003, 19, R1–R26.
25. Björck, Å. Numerical Methods for Least Squares Problems; SIAM: Philadelphia, PA, USA, 1996.
26. Björck, Å.; Pereyra, V. Solution of Vandermonde Systems of Equations. Math. Comput. 1970, 24, 893–903.
27. Gautschi, W.; Inglese, G. Lower Bounds for the Condition Number of Vandermonde Matrices. Numer. Math. 1987, 52, 241–250.
28. Golub, G.H.; Van Loan, C.F. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
29. Demmel, J.W. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
30. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
31. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
32. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
33. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. arXiv 2017, arXiv:1706.02216.
34. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? arXiv 2019, arXiv:1810.00826.
35. Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W.L.; Lenssen, J.E.; Rattan, G.; Grohe, M. Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. Proc. AAAI Conf. Artif. Intell. 2019, 33, 4602–4609.
36. Dwivedi, V.P.; Joshi, C.K.; Luu, A.T.; Laurent, T.; Bengio, Y.; Bresson, X. Benchmarking Graph Neural Networks. J. Mach. Learn. Res. 2020, 24, 1–48.
37. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
38. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
39. Srivastava, N.; Hinton, G.E.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
40. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
41. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. arXiv 2017, arXiv:1706.04599.
42. Chow, C.K. On Optimum Recognition Error and Reject Tradeoff. IEEE Trans. Inf. Theory 1970, 16, 346–347.
43. Cox, D.R. The Regression Analysis of Binary Sequences. J. R. Stat. Soc. Ser. B (Methodol.) 1958, 20, 215–232.
44. Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression, 3rd ed.; Wiley: Hoboken, NJ, USA, 2013.
45. Moitra, A.; Valiant, G. Settling the Polynomial Learnability of Mixtures of Gaussians. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 93–102.

Figure 1. Hankel coupling graph. Nodes represent coefficients $a_{i}$. Colored diagonal bands illustrate index-sum couplings corresponding to Hankel anti-diagonals within sliding windows of the sequence. These couplings reflect the repeated reuse of coefficients in the Hankel matrices underlying the Sylvester reconstruction framework.

Figure 2. Learning-assisted rank prediction pipeline. The coefficient graph is processed by stacked message-passing layers that propagate local structural information across the graph. A global pooling operation produces a graph-level embedding, which is then fed into a multilayer perceptron to output a rank prediction $\hat{r}$.

Figure 3. Hybrid solver pipeline. A graph neural network predicts a candidate rank $\hat{r}$ from the coefficient graph. The Sylvester module then performs algebraic reconstruction at that rank. An explicit residual check certifies correctness; if verification fails, a small fallback search around $\hat{r}$ is performed.

Figure 4. Noise robustness at degree $d = 200$. Rank accuracy and verified success rate as a function of the coefficient noise level $σ$.

Table 1. Graph encoding used for rank identification.

Component	Design
Nodes	One node per coefficient index $i \in \{0, \dots, d\}$
Node features	$x_{i} = [\tilde{a}_{i}, i / d, \log(1 + |\tilde{a}_{i}|)]$
Local edges	$(i, i + 1)$ for $i = 0, \dots, d - 1$
Hankel edges	Pairs in Hankel windows with constant index-sum (restricted by radius $Δ$ )
Edge features	Multiplicity $w_{i j}$ (optionally normalized)
Goal	Encode Hankel-induced coefficient reuse patterns for message passing

Table 2. Comparison of possible structural encodings for rank prediction.

Encoding	Advantage	Limitation in This Setting
Raw coefficient vector	Simple and inexpensive	Does not expose repeated Hankel-window reuse explicitly
Direct Hankel arrays	Close to the algebraic matrices used by the solver	Requires choosing matrix sizes and may duplicate the same coefficient many times
Local 1D convolutions	Efficient for nearby sequential patterns	Captures adjacency well but not all overlapping anti-diagonal couplings
Transformer sequence model	Flexible global interactions	Higher memory cost and weaker built-in Hankel inductive bias for moderate d
Coefficient graph	Sparse, solver-aligned, and able to aggregate local-to-global recurrence cues	Adds graph-construction overhead and is not claimed to be universally optimal

Table 3. Graph construction and storage profile for $R_{max} = 10$ and $Δ = 6$ (50 samples per degree, CPU).

Degree d	Nodes	Edges	Graph Storage (KB)	Build Time (ms)	GNN Parameters
50	51	570	12.01	$4.18 \pm 0.34$	67,979
100	101	1170	24.61	$9.66 \pm 0.66$	67,979
200	201	2370	49.81	$20.03 \pm 0.95$	67,979

Table 4. Learning tasks and outputs used in the proposed rank predictor. CG = coefficient graph; VT = verification threshold; FB = fallback (local search around $\hat{r}$).

	Rank Classification	Auxiliary Stability
Input	CG G	CG G
Output	$\hat{r} \in \{1, \dots, R_{max}\}$	$\hat{u} \in (0, 1)$
Use in solver	determine Hankel size for reconstruction	adjust VT and enable FB if needed

Table 5. Default hyperparameters for the rank prediction network.

Hyperparameter	Value
GNN layers L	4
Hidden dimension h	128
Aggregation	Mean/normalized sum (GCN-style)
Readout	Global mean pooling
Classifier	2-layer MLP (hidden dimension 128, ReLU)
Dropout	$0.2$
Optimizer	Adam
Initial learning rate	$10^{- 3}$
Batch size	256 (graphs)
Training epochs	up to 200 with early stopping

Table 6. Comparison between exhaustive rank search and learning-assisted rank guidance.

Method	Reconstruction Attempts	Typical Cost
Classical Sylvester search	$r^{★}$ attempts	$\sum_{r = 1}^{r^{★}} T (r)$
Learning-assisted solver (ours)	C candidate attempts	$O (d R_{max} Δ) + T_{GNN} + \sum_{r_{c} \in C} T (r_{c})$

Table 7. Rank identification and verification results on the test set. Acc = rank accuracy; F1 = macro-F1; VSR = verified success rate. Best values within each degree block are boldfaced.

Degree d	Method	Acc (%)	F1 (%)	VSR (%)
50	Classical	70.70	67.53	100.00
	Hybrid (Original)	12.55	3.50	93.85
	Meta (Neighbor)	34.35	32.51	83.30
	Meta (Top-3)	38.25	36.00	88.40
	Meta (All)	49.70	49.21	100.00
100	Classical	70.75	68.84	100.00
	Hybrid (Original)	11.20	3.07	92.50
	Meta (Neighbor)	32.05	31.16	86.45
	Meta (Top-3)	36.10	35.07	90.65
	Meta (All)	45.70	46.45	100.00
200	Classical	71.95	71.03	100.00
	Hybrid (Original)	11.60	3.23	89.90
	Meta (Neighbor)	30.70	30.23	83.80
	Meta (Top-3)	33.95	33.50	89.55
	Meta (All)	46.60	47.62	100.00

Table 8. Runtime comparison on the test set (average wall-clock time per instance, ms). The lowest runtime for each degree is boldfaced.

Method	$d = 50$	$d = 100$	$d = 200$
Classical	0.9258	0.9320	1.6268
Hybrid	10.7652	18.8642	34.1328
Meta (Neighbor)	12.2912	21.1179	37.9936
Meta (Top-3)	12.3175	21.3234	38.4829
Meta (All)	16.3518	28.2235	49.4930

Table 9. Pilot comparison of coefficient-sequence encodings (300 test samples, CPU). Best values in each column are boldfaced.

Model	Parameters	Acc (%)	F1 (%)	Runtime (ms)
MLP	95,114	10.00	4.69	0.007
1D CNN	85,386	9.67	1.76	0.070
Transformer	398,346	8.00	1.48	1.069
Pretrained RankGNN	67,979	20.33	16.18	15.484

Table 10. Distribution-shift stress tests at $d = 200$ and $σ = 10^{-4}$ using the strict verification tolerance $τ = 10^{-6}$ (200 samples per setting, CPU). Best Acc and VSR values within each setting are boldfaced.

Setting	Classical Acc	Classical VSR	Hybrid Acc	Hybrid VSR	Meta Acc	Meta VSR
In-distribution	0.00	0.00	1.50	17.50	30.00	94.50
Normal roots	1.00	78.50	7.50	85.00	11.00	99.50
Lognormal weights	0.00	0.00	3.00	14.50	26.50	89.50
Near-collision roots	0.00	0.00	2.50	9.00	7.50	98.00
Ill-conditioned weights	0.00	0.00	3.50	16.50	23.00	92.50

Table 11. Noise robustness results used for Figure 4. Best Acc and VSR values for each noise level are boldfaced.

Method	$σ = 0$	$σ = 10^{- 6}$	$σ = 10^{- 4}$	$σ = 10^{- 3}$
Classical Acc (%)	73.65	0.00	0.00	11.55
Classical VSR (%)	100.00	0.00	0.00	43.15
Hybrid Acc (%)	11.95	11.30	1.80	0.00
Hybrid VSR (%)	91.05	90.45	15.55	0.00
Meta (Neighbor) Acc (%)	32.10	29.45	21.65	10.70
Meta (Neighbor) VSR (%)	84.35	84.95	73.90	24.20
Meta (Top-3) Acc (%)	36.20	31.85	21.95	13.35
Meta (Top-3) VSR (%)	89.25	91.00	80.35	34.30
Meta (All) Acc (%)	46.55	34.90	25.15	15.70
Meta (All) VSR (%)	100.00	100.00	96.30	50.25

Table 12. Ablation study on GNN-derived meta-features. “With GNN features” uses the full meta-feature vector, while “No GNN features” removes the GNN-derived stability score, candidate posterior probability, and rank-distance term. Best accuracies within each degree block are boldfaced.

Degree $d$	Strategy	With GNN Features (%)	No GNN Features (%)	Change (percentage points)
50	Neighbor	33.3	30.0	$-3.3$
	Top-3	38.7	32.3	$-6.4$
	All	51.4	37.8	$-13.6$
100	Neighbor	30.9	28.0	$-2.9$
	Top-3	34.9	31.1	$-3.8$
	All	46.1	36.5	$-9.6$
200	Neighbor	30.0	28.5	$-1.5$
	Top-3	32.4	29.2	$-3.2$
	All	45.1	34.6	$-10.5$
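
The Change column in Table 12 is the accuracy without GNN-derived features minus the accuracy with them, in percentage points. The short Python check below recomputes that column from the tabulated accuracies and reports the strategy with the largest drop at each degree. It is a reader's consistency check, not part of the authors' pipeline, and the dictionary layout and function names are arbitrary choices.

```python
# Table 12 accuracies (%): (degree d, strategy) -> (with GNN features, no GNN features).
TABLE_12 = {
    (50, "Neighbor"): (33.3, 30.0), (50, "Top-3"): (38.7, 32.3), (50, "All"): (51.4, 37.8),
    (100, "Neighbor"): (30.9, 28.0), (100, "Top-3"): (34.9, 31.1), (100, "All"): (46.1, 36.5),
    (200, "Neighbor"): (30.0, 28.5), (200, "Top-3"): (32.4, 29.2), (200, "All"): (45.1, 34.6),
}


def change(with_gnn, no_gnn):
    """Change column of Table 12 in percentage points (no-GNN minus with-GNN), one decimal."""
    return round(no_gnn - with_gnn, 1)


def largest_drop_per_degree(table):
    """Map each degree to (strategy, change) for its most negative change."""
    drops = {}
    for (d, strategy), (with_gnn, no_gnn) in table.items():
        delta = change(with_gnn, no_gnn)
        if d not in drops or delta < drops[d][1]:
            drops[d] = (strategy, delta)
    return drops


print(largest_drop_per_degree(TABLE_12))
# {50: ('All', -13.6), 100: ('All', -9.6), 200: ('All', -10.5)}
```

At every degree the largest drop occurs for the All strategy. A plausible reading is that this strategy must discriminate among the most candidates, so it relies most on the GNN-derived posterior, stability, and rank-distance terms.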

MDPI and ACS Style

Wang, W.; Liang, C.-W.; Wang, M.-J.-S.; Zhang, C. Hankel-Structured Graph Learning for Meta-Verified Sylvester Reconstruction in Binary Waring Decomposition. Symmetry 2026, 18, 1012. https://doi.org/10.3390/sym18061012

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
