Quantum Kernels for Narrative Coherence: An Application to Path Optimization in Document Graphs for Storyline Extraction

Keith-Norambuena, Brian; Canales, Javiera; Araya, Maximiliano; Rojas-Córdova, Carolina; Meneses-Villegas, Claudio; Lam-Esquenazi, Elizabeth; Flores-Bustos, Angélica

doi:10.3390/math14101734


by Brian Keith-Norambuena ¹,*, Javiera Canales ², Maximiliano Araya ², Carolina Rojas-Córdova ³, Claudio Meneses-Villegas ¹, Elizabeth Lam-Esquenazi ⁴ and Angélica Flores-Bustos ¹

¹ Department of Computing and Systems Engineering, Universidad Católica del Norte, Antofagasta 1270709, Chile

² CoreDevX, Santiago 7510838, Chile

³ Department of Industrial Engineering, Universidad Católica del Norte, Antofagasta 1270709, Chile

⁴ Department of Chemical and Environmental Engineering, Universidad Católica del Norte, Antofagasta 1270709, Chile

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(10), 1734; https://doi.org/10.3390/math14101734

Submission received: 10 April 2026 / Revised: 7 May 2026 / Accepted: 14 May 2026 / Published: 18 May 2026

(This article belongs to the Special Issue Applied Mathematics in Artificial Intelligence: Methods, Algorithms, and Applications)


Abstract

Narrative extraction algorithms construct storylines by finding coherent paths through document collections. The Narrative Trails algorithm frames this as maximum-capacity path optimization, where path quality depends on a coherence function measuring document relationships. We introduce quantum kernels as coherence functions for narrative extraction (to the best of our knowledge, the first systematic characterisation of quantum kernel methods for storyline extraction) and compare them against classical baselines on two corpora using a multi-seed protocol. The sweep covers 93 method evaluations: 54 quantum kernels across three encoder families (R_Y+CNOT-ring, IQP/ZZ-feature-map, and a projected quantum kernel) and 39 classical kernels (cosine, RBF, and the cluster-aware Narrative Trails baseline). On 11,215 human navigation paths from Wikispeedia, evaluation metrics divide into two clusters that disagree with each other: alignment-based metrics (length-normalised DTW and per-step DTW similarity) favour methods that produce long alignment-rich paths, while set-overlap metrics (Jaccard and F1) favour methods that produce shorter paths with higher article overlap. On LLM-judged coherence for Cuban news storylines, evaluated under a 12-method × 5-seed × 30-endpoint-pair × 2-judge design (Claude Sonnet 4.5 and GPT-4o, both at $T = 0$ via structured tool calling), the cluster-aware classical baseline is the top method in terms of mean overall coherence; the 5-method quantum-kernel pool and the 7-method classical-kernel pool on matched projection input show no significant differences after Holm correction. Cross-task analysis reveals that LLM coherence rank correlates with alignment-cluster Wikispeedia metrics (Spearman $\rho \approx +0.70$) and anti-correlates with overlap-cluster metrics ($\rho \approx -0.62$). A closed-form theoretical analysis shows that the depth-1 R_Y+CNOT-ring kernel reduces to a classical product-of-cosines kernel order equivalent to RBF, explaining the absence of empirical separation at low depth; deeper encoders break the cancellation but exponentially concentrate kernel values, eroding inter-pair distinguishability. Our results characterise quantum coherence kernels as competitive with classical kernels on the same projected input rather than decisively superior, with the cluster-aware classical baseline retaining a modest advantage attributable to its explicit topical structure.

Keywords:

quantum kernels; narrative extraction; storyline extraction; document coherence; path optimization; quantum machine learning; natural language processing; document graphs

MSC:

68T50; 81P68; 68T05

1. Introduction

Narrative extraction—constructing thematically coherent sequences of documents from a corpus [1]—is a fundamental task in natural language processing with applications in news summarization, knowledge discovery, and information retrieval. The Narrative Trails algorithm [2] addresses this by finding maximum-capacity paths through document graphs where edge weights represent document coherence [3]. This bottleneck formulation optimizes for the weakest transition in a path, making the choice of coherence function consequential.

Quantum machine learning has shown promising results in several application areas—classification benchmarks under encoded data regimes [4,5], learning from physical experimental data with provable advantage [6,7], and variational chemistry-style simulation [8]—yet applications to natural language processing remain relatively unexplored. Quantum kernels—which compute similarity via state overlap in an exponentially large Hilbert space—offer a theoretically motivated approach to measuring document relationships that may capture structure classical methods miss.

We introduce quantum coherence kernels for narrative path extraction and systematically evaluate their performance against classical baselines. The coherence function in the original Narrative Trails algorithm combines angular similarity in UMAP-projected space and topic similarity from HDBSCAN clustering. This design emphasizes local neighborhood structure and topical groupings but may overlook global associative connections that span topic boundaries [9]. We investigate quantum kernels [4] as an alternative, with particular attention to how dimensionality reduction before quantum encoding affects the resulting similarity structure.

The choice of projection method carries theoretical implications. Encoding 1536-dimensional embeddings directly would require hundreds of circuit layers, so some projection is necessary. UMAP transforms geometry to emphasize local manifold structure, potentially discarding global relationships. Random projection, grounded in the Johnson–Lindenstrauss lemma [10,11], preserves pairwise distances from the original embedding space. We hypothesize that random-projection quantum kernels should better preserve associative structure, since human navigation often follows connections based on shared concepts that may not respect topic-cluster boundaries [12].

Why quantum kernels, specifically, rather than simpler similarity measures on randomly projected embeddings? The bottleneck formulation of narrative coherence [2] optimizes for the weakest transition in a path, requiring accurate measurement of all document relationships, including subtle connections that simple similarity measures might miss. Quantum kernels compute similarity via state overlap in an exponentially large Hilbert space [4,5,13], capturing higher-order relationships between embedding dimensions that cosine similarity or RBF kernels cannot efficiently represent [14]. The interference patterns of the quantum circuit [4,15] may encode complex interactions between features that correspond to nuanced semantic relationships. Thus, the theoretical argument combines two elements: (1) random projection approximately preserves the original embedding geometry per the Johnson–Lindenstrauss lemma, maintaining semantic relationships that human navigation exploits, and (2) quantum kernels provide expressive similarity computation that can capture complex feature interactions within this preserved geometry [7].

We systematically compare both approaches against the classical baseline using a sweep that totals 93 methods on Wikispeedia (1 classical Narrative Trails baseline, 30 R_Y+CNOT-ring quantum configurations covering the original factorial design over qubits and depth, 14 IQP/ZZ-feature-map variants, 10 projected quantum-kernel variants, and 38 cosine/RBF baselines on the same projected inputs and on the raw 1536-dim embeddings) and a 12-method, 5-seed, dual-judge protocol on Cuban news. On 11,215 human navigation paths from Wikispeedia [16], the standard evaluation metrics divide into two clusters that rank methods in opposite ways: alignment-based metrics (length-normalised DTW and per-step similarity) favour UMAP-projected quantum kernels and the cluster-aware classical baseline, while set-overlap metrics (Jaccard and F1) favour random-projection methods. On LLM-judged coherence for Cuban news [17], the cluster-aware classical baseline is the top method under both judges, while a pool-vs-pool comparison between the five genuinely quantum methods and the seven classical methods (classical narrative trails plus six cosine/RBF kernels on the same random projection used by the quantum pool) shows no significant difference after Holm correction. Therefore, the two evaluation regimes reward different kernel properties: smooth topical concentration helps both alignment and narrative coherence; sharp pairwise similarity on raw projections helps overlap with short navigation paths.

2. Background and Mathematical Foundations

2.1. Mathematical Preliminaries

2.1.1. Maximum-Capacity Path Optimization

Given a weighted graph $G = (V, E, w)$ with edge weights $w : E \to [0, 1]$, the maximum-capacity path between vertices $s$ and $t$ is the path $P^{*}$ that maximizes the minimum edge weight:

$$P^{*} = \arg\max_{P \in \mathrm{Paths}(s, t)} \min_{e \in P} w(e) \tag{1}$$

This bottleneck formulation differs from shortest-path or maximum-weight problems: a path with nine edges of weight 0.9 and one edge of weight 0.1 scores only 0.1. For narrative extraction, this ensures storylines maintain consistent coherence throughout rather than averaging strong and weak transitions.

The maximum-capacity path can be computed efficiently via the maximum spanning tree (MST). The MST of a weighted, undirected graph is a spanning tree whose total edge weight is maximised; classical algorithms [18] build it greedily by adding edges in decreasing-weight order while avoiding cycles. The relevant property here is that the unique tree path between any two vertices in the MST is also the maximum-capacity path in the original graph: any alternative path must include at least one edge that the greedy procedure rejected, and rejection implies that edge cannot exceed the bottleneck of the tree path. This enables $O(N^{2} \log N)$ MST construction (given edge weights), followed by $O(N)$ path queries; the total preprocessing cost depends on the coherence function used to compute edge weights (see Section 3.7). Concretely, with $N = 4$ documents arranged so that the heaviest edges form a chain (A–B–C with weights of $0.9, 0.9$) and any other edge has a weight $\leq 0.5$, the MST contains the chain, and the maximum-capacity path between A and C is A–B–C with a bottleneck of $0.9$; the alternative direct edge (A–C) would have a bottleneck $\leq 0.5$.
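The MST route to the maximum-capacity path can be sketched in a few lines. The following is a minimal illustration, not the reference implementation: it assumes Kruskal's algorithm with a union-find structure, and the function names (`maximum_spanning_tree`, `max_capacity_path`) and the toy edge weights for the fourth document are hypothetical choices made for this example.

```python
from collections import defaultdict


def maximum_spanning_tree(n, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight.

    edges: iterable of (weight, u, v) with vertices numbered 0..n-1.
    Returns the tree as an adjacency dict {u: [(v, weight), ...]}.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it closes no cycle
            parent[ru] = rv
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def max_capacity_path(tree, s, t):
    """Return (path, bottleneck) along the unique tree path from s to t."""
    stack = [(s, [s], float("inf"))]
    seen = {s}
    while stack:
        node, path, cap = stack.pop()
        if node == t:
            return path, cap
        for nxt, w in tree[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt], min(cap, w)))
    return None, 0.0


# Toy graph: A=0, B=1, C=2, D=3. The chain A-B-C has weights 0.9;
# every other edge has weight <= 0.5 (illustrative values).
edges = [(0.9, 0, 1), (0.9, 1, 2), (0.5, 0, 2),
         (0.4, 2, 3), (0.3, 0, 3), (0.2, 1, 3)]
tree = maximum_spanning_tree(4, edges)
path, bottleneck = max_capacity_path(tree, 0, 2)
```

On this toy graph the tree path from A to C passes through B, so the direct A–C edge of weight 0.5 is never used.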

2.1.2. Quantum Kernels

A quantum kernel measures similarity by encoding classical data into quantum states and computing their overlap. Given an $n$-qubit encoding circuit $U(z)$ that maps input vector $z$ to quantum state $|\psi(z)\rangle = U(z)|0\rangle^{\otimes n}$, the kernel function is the squared state overlap (fidelity):

$$k(z_{u}, z_{v}) = |\langle \psi(z_{u}) | \psi(z_{v}) \rangle|^{2}. \tag{2}$$

Substituting $|\psi(z)\rangle = U(z)|0\rangle^{\otimes n}$ gives $\langle \psi(z_{u}) | \psi(z_{v}) \rangle = \langle 0|^{\otimes n} U^{\dagger}(z_{u}) U(z_{v}) |0\rangle^{\otimes n}$, so

$$k(z_{u}, z_{v}) = \left|\langle 0|^{\otimes n} U^{\dagger}(z_{u}) U(z_{v}) |0\rangle^{\otimes n}\right|^{2}. \tag{3}$$

The right-hand side has a direct measurement interpretation. Preparing $U^{\dagger}(z_{u}) U(z_{v}) |0\rangle^{\otimes n}$ and measuring in the computational basis, the probability of the outcome $|0\rangle^{\otimes n}$ is $\left|\langle 0|^{\otimes n} U^{\dagger}(z_{u}) U(z_{v}) |0\rangle^{\otimes n}\right|^{2}$ according to the Born rule, which is exactly $k(z_{u}, z_{v})$. This is the compute–uncompute (or fidelity) construction: encode $z_{v}$ forward, then apply the adjoint of the encoding for $z_{u}$ and read out $|0\rangle^{\otimes n}$. When $z_{u} = z_{v}$, the adjoint exactly undoes the forward circuit, and the kernel equals 1. The kernel implicitly operates in a $2^{n}$-dimensional Hilbert space [5], potentially capturing nonlinear relationships between features that classical kernels cannot efficiently represent. Whether that expressivity is informative for a particular task depends on the encoding and the data; we return to this question in Section 4.1.
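As a worked single-qubit illustration of Equations (2) and (3), assume (for this example only) that the encoder is a single rotation $U(z) = R_{Y}(z) = \exp(-i z Y / 2)$ acting on $n = 1$ qubit. The overlap then has a closed form:

```latex
\begin{aligned}
|\psi(z)\rangle &= R_{Y}(z)\,|0\rangle = \cos(z/2)\,|0\rangle + \sin(z/2)\,|1\rangle, \\
\langle \psi(z_{u}) | \psi(z_{v}) \rangle
  &= \cos(z_{u}/2)\cos(z_{v}/2) + \sin(z_{u}/2)\sin(z_{v}/2)
   = \cos\!\big((z_{u} - z_{v})/2\big), \\
k(z_{u}, z_{v}) &= \cos^{2}\!\big((z_{u} - z_{v})/2\big).
\end{aligned}
```

In this toy case the kernel equals 1 when $z_{u} = z_{v}$ and falls to 0 when the two angles differ by $\pi$, which matches the compute–uncompute picture: the adjoint rotation undoes the forward one only to the extent that the angles agree.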

2.1.3. Dimensionality Reduction for Quantum Encoding

Document embeddings (1536 dimensions from text-embedding-3-small) must be projected to lower dimensions before quantum encoding—directly encoding high-dimensional data would require intractable circuit depths. We consider two projection approaches that bracket the local-vs-global design space: random Gaussian projection, which preserves pairwise distances per the Johnson–Lindenstrauss lemma, and UMAP, which emphasizes local manifold structure. The choice of projection method determines what geometric relationships are available for the quantum kernel to measure.
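The distance-preserving behaviour of random Gaussian projection can be checked numerically. The sketch below is illustrative only: it uses synthetic Gaussian vectors as stand-ins for document embeddings, a target dimension of 64, and hypothetical helper names; the entries of the projection matrix are drawn with variance $1/d_{out}$ so that squared norms are preserved in expectation.

```python
import math
import random


def gaussian_projection_matrix(d_in, d_out, seed):
    """d_out x d_in matrix with i.i.d. N(0, 1/d_out) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, x):
    """Matrix-vector product: one projected coordinate per matrix row."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Synthetic stand-ins for 1536-dimensional document embeddings (assumption).
D_IN, D_OUT = 1536, 64
data_rng = random.Random(0)
docs = [[data_rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(4)]

R = gaussian_projection_matrix(D_IN, D_OUT, seed=42)
low = [project(R, e) for e in docs]

# Ratio of projected to original distance for every document pair;
# values close to 1 indicate that pairwise distances survived the projection.
ratios = [euclidean(low[i], low[j]) / euclidean(docs[i], docs[j])
          for i in range(len(docs)) for j in range(i + 1, len(docs))]
```

The ratios scatter around 1, with a spread that shrinks as the target dimension grows; UMAP offers no comparable guarantee because it optimises neighbourhood structure rather than distances.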

Why these two and not others? PCA [19] preserves global variance but is linear, so it cannot follow the curved manifold structure that document embeddings often inhabit; Isomap [20] and Locally Linear Embedding [21] preserve manifold geometry in a spirit similar to that of UMAP and have been shown to be dominated by it on high-dimensional embedding inputs in recent comparative studies [22], whereas trained autoencoders would require a downstream supervision signal that the unsupervised storyline extraction setting does not provide. Therefore, UMAP and random projection bracket the relevant design axis (local manifold preservation versus distance preservation), and the empirical contrast between them is informative without an exhaustive sweep over manifold-learning methods that behave qualitatively like UMAP.

2.2. Related Work

A full treatment of quantum machine learning is beyond the scope of this paper; we refer readers to Schuld and Petruccione [23] for theoretical foundations and to Cerezo et al. [8] for an overview of variational quantum algorithms. Here, we focus specifically on quantum kernels as similarity measures.

Quantum natural language processing has explored various approaches to leveraging quantum computation for language tasks [24,25,26,27]. The dominant line—compositional QNLP rooted in the DisCoCat formalism [28]—encodes sentence structure into tensor networks or variational circuits and targets sentence-level classification. Our setting differs in two ways: (i) the unit of computation is a pre-trained 1536-dimensional document embedding rather than a parsed sentence, and (ii) the downstream task is a graph-path optimization (maximum-capacity path through a coherence graph) rather than a per-instance classification. We are therefore not competing with DisCoCat-style models: we use quantum circuits as a similarity operator over already embedded documents, and the contribution lies in whether that similarity yields better paths through the document graph. Quantum similarity learning, more broadly [4,5,7,29,30], has examined kernel-based learning on numerical inputs, including the projected quantum-kernel construction [29] that we evaluate as one of our three quantum families. Quantum concepts have also been applied metaphorically to narrative construction, using superposition to represent character ambiguity and entanglement to model conflict dynamics [31], though this symbolic approach differs from our use of quantum kernels for computation of document similarity. To the best of our knowledge, no prior work has applied quantum kernels to coherence measurement for storyline extraction or any related graph-path task on document embeddings; our contribution is therefore methodological (the first systematic factorial study of quantum kernels in this setting) rather than a claim that quantum kernels beat classical alternatives—which, as our results show, they do not.

The Narrative Trails framework [2] builds on earlier storyline extraction methods [1,3,32,33], distinguishing itself through its maximum-capacity path formulation and efficient MST-based algorithm. Its bottleneck optimization motivates our investigation: replacing the coherence function could change which storylines the algorithm produces.

Quantum kernels have shown promise in various machine learning contexts [4,5], with incremental data uploading [29,34,35] providing a practical encoding scheme for classical data. Random projection as a dimensionality reduction technique dates to Johnson and Lindenstrauss [10,36], who proved that random linear maps approximately preserve distances. Achlioptas [11] showed that simple random matrices suffice, making the technique practical for high-dimensional data.

2.3. Narrative Trails and Bottleneck Optimization

The Narrative Trails framework [2] applies the maximum-capacity path formulation to storyline extraction. Given $N$ documents with embeddings $e_{1}, \dots, e_{N}$ and a coherence function $\theta(i, j) \in [0, 1]$, the algorithm seeks a path $P = (i_{1}, \dots, i_{k})$ from source to target that maximizes $\min_{j} \theta(i_{j}, i_{j + 1})$. Since a single weak transition determines overall path quality, the design of the coherence function directly shapes which storylines the algorithm produces. The classical coherence function combines two components:

$$\theta(u, v) = \sqrt{S(z_{u}, z_{v}) \cdot T(\hat{z}_{u}, \hat{z}_{v})}. \tag{4}$$

The four objects in Equation (4) are defined as follows.

$z_{u} = \pi_{\mathrm{UMAP}}(e_{u})$: The embedding $e_{u} \in \mathbb{R}^{1536}$ is mapped to a low-dimensional vector $z_{u} \in \mathbb{R}^{48}$ by uniform manifold approximation and projection (UMAP) [37], which fits a fuzzy simplicial-set representation of the data manifold and minimises the cross-entropy between the high- and low-dimensional fuzzy-set representations. UMAP preserves local neighbourhood structure but may distort large-scale geometry; we use it because it is the projection used by the original Narrative Trails baseline.
$\hat{z}_{u}$: The soft cluster assignment of document $u$ produced by Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) [38]. HDBSCAN extracts a stable cluster hierarchy from a mutual-reachability-distance MST and converts it to a probability distribution over $K$ topic clusters by aggregating membership scores along the condensed tree branches; $\hat{z}_{u} \in \Delta^{K - 1}$ is the resulting probability vector.
$S(z_{u}, z_{v})$: The rescaled cosine similarity between the UMAP-projected vectors, $S(z_{u}, z_{v}) = \frac{1}{2}\left(1 + \frac{\langle z_{u}, z_{v} \rangle}{\| z_{u} \| \, \| z_{v} \|}\right) \in [0, 1]$, so that orthogonal projections give $0.5$ rather than 0 and antipodal projections give 0.
$T(\hat{z}_{u}, \hat{z}_{v})$: The topic similarity derived from the symmetric Jensen–Shannon divergence [39] between cluster-membership distributions: $T(p, q) = 1 - \mathrm{JSD}(p \| q)$, where $\mathrm{JSD}(p \| q) = \frac{1}{2} \mathrm{KL}(p \| m) + \frac{1}{2} \mathrm{KL}(q \| m)$ and $m = (p + q) / 2$. JSD is bounded in $[0, 1]$ when computed with logarithm base 2, so $T \in [0, 1]$.

The geometric mean requires both spatial proximity (via S) and thematic alignment (via T) for high coherence.
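Equation (4) and the two component definitions above translate directly into code. The sketch below is a minimal illustration under stated assumptions: the projected vectors and soft cluster memberships are passed in as plain lists (the UMAP and HDBSCAN steps themselves are not reproduced), and the function names are hypothetical.

```python
import math


def rescaled_cosine(zu, zv):
    """S in Equation (4): cosine similarity mapped from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(zu, zv))
    norm_u = math.sqrt(sum(a * a for a in zu))
    norm_v = math.sqrt(sum(b * b for b in zv))
    return 0.5 * (1.0 + dot / (norm_u * norm_v))


def kl_base2(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd_base2(p, q):
    """Jensen-Shannon divergence with base-2 logarithm, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_base2(p, m) + 0.5 * kl_base2(q, m)


def coherence(zu, zv, pu, pv):
    """Geometric mean of spatial similarity S and topic similarity T."""
    s = rescaled_cosine(zu, zv)
    t = 1.0 - jsd_base2(pu, pv)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(s, 0.0) * max(t, 0.0))


# Same direction, same topic distribution: maximal coherence.
same = coherence([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
# Same direction but disjoint topics: T = 0 forces the geometric mean to 0.
disjoint_topics = coherence([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Orthogonal directions, identical topics: S = 0.5, T = 1.
orthogonal = coherence([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
```

The second case shows why the geometric mean is stricter than an arithmetic mean: a pair that is spatially identical still scores 0 when the topic distributions do not overlap.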

As noted above, the MST property enables efficient path extraction: the dominant cost is computing all $O(N^{2})$ pairwise coherences, making the approach practical for moderately sized corpora.

2.4. Quantum Kernels as Similarity Measures

Classical kernels like RBF or cosine similarity compute similarity directly from input features. Quantum kernels [5] instead encode data into quantum states and measure their overlap in the resulting $2^{n}$-dimensional Hilbert space, as formalized above.

In practice, this transformation is parameterised by circuit depth (controlling expressivity) and width (controlling the feature-space dimension). Whether the high-dimensional mapping captures useful structure depends on the task—not all problems benefit from the additional expressivity. We use a layered encoding architecture based on incremental data uploading [34] where input features are distributed across successive layers, with entanglement enabling cross-layer interactions.

3. Methodology

3.1. Problem Formulation

Given $N$ documents with embeddings $e_{1}, \dots, e_{N} \in \mathbb{R}^{D}$ ($D = 1536$ from text-embedding-3-small), we seek a coherence function $\theta : \mathbb{R}^{D} \times \mathbb{R}^{D} \to [0, 1]$ such that the maximum-capacity path, i.e.,

$$P^{*} = \arg\max_{P} \min_{(i, j) \in P} \theta(e_{i}, e_{j}), \tag{5}$$

produces storylines that align with human judgments of narrative coherence.

The classical Narrative Trails approach computes $\theta$ as a geometric mean of angular similarity (in UMAP-projected space) and topic similarity (from HDBSCAN clustering). We propose replacing this with a quantum kernel: $\theta(e_{i}, e_{j}) = k(\pi(e_{i}), \pi(e_{j}))$, where $\pi$ projects embeddings to lower dimensions and $k$ computes quantum-state overlap.

To understand which quantum kernel configurations are effective for narrative coherence, we employ a factorial experimental design covering 93 method evaluations: 54 quantum kernels across three encoder families (R_Y+CNOT-ring data re-uploading, IQP/ZZ-feature-map, and the projected quantum kernel of Huang et al. [29]) and 39 classical kernels (cosine, RBF, and the cluster-aware Narrative Trails baseline) on matched projected inputs and on the raw 1536-dimensional embeddings.

3.2. Datasets and Evaluation Metrics

We evaluate on two datasets. The Wikispeedia dataset [16] contains human navigation paths through a subset of Wikipedia: records of paths that people actually traversed when asked to navigate between articles. This provides a supervised ground truth for what constitutes a reasonable inter-article sequence. The dataset contains 3928 articles with 11,215 complete human navigation paths, which we use in full for evaluation. For each method, we report five metrics from the Narrative Trails reference implementation [2], all built on Dynamic Time Warping (DTW) [40] or set overlap. DTW computes the minimum-cost monotone alignment between two sequences $X = (x_{1}, \dots, x_{m})$ and $Y = (y_{1}, \dots, y_{n})$ by filling a cost table $D \in \mathbb{R}^{m \times n}$ with $D_{ij} = c(x_{i}, y_{j}) + \min(D_{i-1,j}, D_{i,j-1}, D_{i-1,j-1})$, where $c$ is the per-element cost; the bottom-right cell $D_{mn}$ is the total alignment cost, and the back-trace gives the alignment of length $\ell \in [\max(m, n), m + n - 1]$. We use $c(x_{i}, y_{j}) = 1 - \cos(\hat{e}_{x_{i}}, \hat{e}_{y_{j}})$, where $\hat{e}$ is the embedding in the original 1536-dim space. We report: (i) the length-normalised DTW distance (nDTW; lower is better; $D_{mn} / \ell$, the per-aligned-step cost), (ii) the nDTW similarity (1 minus the mean per-step cost along the alignment), (iii) the per-step DTW similarity (length-independent mean of $1 - c$ over each transition in the extracted path), (iv) the Jaccard overlap between the extracted-path article set and the human-path article set, and (v) the harmonic-mean F1 of precision and recall on those same sets. As we show in Section 4, these metrics fall into two clusters that rank methods differently, so we report both.
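The DTW recurrence and the set-overlap metrics described above can be sketched as follows. This is an illustrative implementation, not the Narrative Trails reference code: the padded cost table, the tie-breaking rule in the back-trace, and the function names are assumptions made for this example, and the toy two-dimensional vectors stand in for 1536-dimensional embeddings.

```python
import math


def cosine_cost(a, b):
    """Per-element cost c = 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def dtw(X, Y, cost=cosine_cost):
    """Return (total alignment cost D_mn, alignment length l)."""
    m, n = len(X), len(Y)
    inf = float("inf")
    # Table padded with an extra row and column of infinities (1-based cells).
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = cost(X[i - 1], Y[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Back-trace from (m, n) to (1, 1), counting aligned pairs.
    i, j, length = m, n, 1
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        length += 1
    return D[m][n], length


def ndtw(X, Y):
    """Length-normalised DTW distance D_mn / l (lower is better)."""
    total, length = dtw(X, Y)
    return total / length


def jaccard(extracted, human):
    a, b = set(extracted), set(human)
    return len(a & b) / len(a | b)


def f1_overlap(extracted, human):
    a, b = set(extracted), set(human)
    hits = len(a & b)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(a), hits / len(b)
    return 2 * precision * recall / (precision + recall)


# Toy sequences of 2-d "embeddings" (illustrative values only).
path_x = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
total_same, length_same = dtw(path_x, path_x)
total_short, length_short = dtw(path_x, [[1.0, 0.0], [-1.0, 0.0]])
```

The nDTW similarity in item (ii) is then `1 - ndtw(X, Y)`. The two metric families can disagree because DTW rewards embedding-level closeness along the whole alignment, whereas Jaccard and F1 only count exact article matches.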

For unsupervised evaluation, we use a corpus of 418 news articles about Cuban political events from 2020–2021 [17]. Without ground-truth paths, we rely on an LLM-as-a-judge evaluation approach for coherence [41]. Each storyline is scored on four dimensions (logical flow, thematic consistency, temporal coherence, and narrative completeness), each on a 1–10 scale. Two judges, Claude Sonnet 4.5 and GPT-4o, each score every storyline independently at a temperature of $T = 0$ via structured tool/function calling that emits exactly the four integer scores; no free-form justification is generated. Therefore, each cell (method, seed, and endpoint pair) yields one score per dimension per judge. We evaluate 30 spatially distant endpoint pairs, each sampled once and reused across all method–seed–judge combinations to enable paired comparisons. For each method we use 5 random projection seeds (42, 123, 456, 789, and 1024) so that conclusions are not driven by a single projection draw. For supervised evaluation, we report the nDTW, per-step similarity, Jaccard overlap, and F1 score; for unsupervised evaluation, we report the mean overall coherence and per-dimension scores with paired Wilcoxon tests against the classical baseline.

Note that coherence scores from different methods are not directly comparable in absolute terms—each defines its own similarity space. The path extraction algorithm uses only the ranking of document pairs, so what matters is whether a method’s rankings produce paths that align with human behaviour or external quality judgements.

The prompt shown in Figure 1 is used by both judges. The same prompt is sent to each judge, with the only difference being the underlying API endpoint (Anthropic Messages API for Sonnet 4.5; OpenAI Chat Completions API for GPT-4o).

3.3. Quantum Coherence Kernels

We replace the coherence function with a quantum kernel. The classical approach computes coherence as a geometric mean of angular similarity (in UMAP space) and topic similarity (from clustering). Our quantum approach computes coherence directly from kernel similarity: given two documents, we project their embeddings to low dimensions, encode them via a quantum circuit, and measure the overlap of the resulting representations. Higher overlap indicates more similar documents, yielding higher coherence.

The bottleneck path formulation requires accurate similarity measurement for all document pairs, including subtle relationships that simple similarity measures might miss. Quantum kernels map inputs into an exponentially larger feature space through a parameterised nonlinear transformation. Whether this expressivity translates to better coherence measurement is an empirical question our experiments address.

We note that there are practical constraints that shape our approach. Document embeddings from OpenAI’s text-embedding-3-small have 1536 dimensions, but encoding high-dimensional data into quantum circuits is computationally intractable—each dimension adds circuit depth, and simulation of deep circuits scales exponentially. We therefore project embeddings to between 2 and 64 dimensions before computing kernel similarity, with the projection method becoming a key experimental variable.

We use a layered encoding circuit based on incremental data uploading [34], where input features are distributed across successive layers with entanglement between them, increasing the expressivity of the quantum feature map. For $n$ qubits and $L$ layers, the circuit encodes a $d = n \times L$ dimensional input vector $z = (z_{1}, \dots, z_{d})$. Each layer $\ell \in \{1, \dots, L\}$ consists of the following:

Rotation gates: Each qubit $i$ receives an $R_{Y}$ rotation parameterised by input feature $z_{j}$ with $j = (\ell - 1) n + i$, where $R_{Y}(\theta) = \exp(-i \theta Y / 2)$ rotates the qubit state around the Y-axis of the Bloch sphere.
Entangling gates: CNOT gates in a ring topology connect neighbouring qubits, creating entanglement that allows the circuit to represent correlations between features.

For $n$ qubits and a $d$-dimensional input, each layer encodes $n$ distinct features from the input vector, requiring $\lceil d / n \rceil$ layers to encode all $d$ features. The entanglement gates between layers allow the circuit to capture correlations across feature groups.

Before encoding, projected features are normalized globally (per feature dimension) to the $[0, \pi]$ range across all documents. The $[0, \pi]$ range is required by the quantum encoding: the projected coordinates are used directly as $R_{Y}$ rotation angles, and angles outside $[0, \pi]$ alias onto identical states. The classical kernel baselines (cosine and RBF) do not need this normalization, but we apply the same projection-and-normalization pipeline to them so that the quantum-versus-classical comparison is matched on input format and isolates the effect of the kernel rather than the input scale. Global (as opposed to per-document) normalization preserves relative positions between documents in each feature dimension, ensuring the rotation angles reflect meaningful inter-document differences. The kernel is then read out via the compute–uncompute construction of Equations (2) and (3): applied to the layered encoder described above, this means a forward pass of $L$ $R_{Y}$+CNOT-ring layers parameterised by $z_{v}$, followed by the adjoint of the same circuit parameterised by $z_{u}$ (reversed layer order, reversed CNOT order, and negated angles), with the all-zeros bitstring probability serving as the kernel value and, therefore, as the coherence score.
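The normalization step and the compute–uncompute readout can be illustrated with a small statevector simulation. The sketch below is a hedged reconstruction from the description above, not the authors' code: it assumes little-endian qubit indexing, a CNOT ring applied as $q \to (q + 1) \bmod n$ for $q = 0, \dots, n - 1$, and real amplitudes (sufficient because $R_{Y}$ and CNOT have real matrices). All function names are hypothetical.

```python
import math


def apply_ry(state, qubit, theta):
    """Apply R_Y(theta) = exp(-i theta Y / 2) to one qubit."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << qubit
    out = state[:]
    for idx in range(len(state)):
        if not idx & bit:
            a0, a1 = state[idx], state[idx | bit]
            out[idx] = c * a0 - s * a1
            out[idx | bit] = s * a0 + c * a1
    return out


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is set."""
    out = state[:]
    for idx in range(len(state)):
        if idx & (1 << control):
            out[idx ^ (1 << target)] = state[idx]
    return out


def ring(n):
    return [(q, (q + 1) % n) for q in range(n)]


def encode(state, z, n, adjoint=False):
    """Layered R_Y + CNOT-ring encoder (n >= 2); each layer uses n features."""
    layers = [z[k:k + n] for k in range(0, len(z), n)]
    if not adjoint:
        for angles in layers:
            for q, theta in enumerate(angles):
                state = apply_ry(state, q, theta)
            for c, t in ring(n):
                state = apply_cnot(state, c, t)
    else:  # reversed layer order, reversed CNOT order, negated angles
        for angles in reversed(layers):
            for c, t in reversed(ring(n)):
                state = apply_cnot(state, c, t)
            for q, theta in enumerate(angles):
                state = apply_ry(state, q, -theta)
    return state


def zero_state(n):
    return [1.0] + [0.0] * (2 ** n - 1)


def kernel_compute_uncompute(zu, zv, n):
    """Probability of the all-zeros bitstring after U(z_u)^dagger U(z_v)|0>."""
    state = encode(zero_state(n), zv, n)
    state = encode(state, zu, n, adjoint=True)
    return state[0] ** 2


def kernel_overlap(zu, zv, n):
    """Squared overlap of the two encoded states, as in Equation (2)."""
    pu = encode(zero_state(n), zu, n)
    pv = encode(zero_state(n), zv, n)
    return sum(a * b for a, b in zip(pu, pv)) ** 2


def normalize_to_pi(rows):
    """Min-max scale each feature column to [0, pi] across all documents."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[math.pi * (x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in rows]


# Two illustrative 6-feature inputs on n = 3 qubits, hence L = 2 layers.
zu = [0.3, 1.2, 2.5, 0.7, 1.9, 0.1]
zv = [1.0, 0.4, 2.0, 2.2, 0.5, 1.4]
k_cu = kernel_compute_uncompute(zu, zv, 3)
k_ov = kernel_overlap(zu, zv, 3)
angles = normalize_to_pi([[0.0, 2.0], [1.0, 4.0], [0.5, 3.0]])
```

The two kernel routines agree up to floating-point error, which is the content of Equations (2) and (3); on hardware only the compute–uncompute form is directly measurable.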

Figure 2 shows the encoding architecture at three illustrative sizes ($n = 2$, $n = 3$, and a generic $n$). The three panels illustrate how the CNOT-ring entangler closes for any $n$: at $n = 3$, the ring wraps from $q_{3}$ back to $q_{1}$ (panel b), and the same wrap rule applies for an arbitrary $n$ (panel c). The $n = 2$ case (panel a) is drawn schematically as a single CNOT but is degenerate: the sequential sweep and the wrap-around CNOT act on the same wire pair in opposite directions, so the implementation applies the pair $(q_{1} \to q_{2}, q_{2} \to q_{1})$ per layer; the figure abstracts these as one CNOT for visual symmetry with the $n \geq 3$ panels. For an input of dimension $d = n \times L$, each of the $L$ layers consumes $n$ consecutive features as $R_{Y}$ rotation angles, and the CNOT ring after each layer entangles them with the qubits in the next layer’s rotation block. The data re-uploading pattern (rotation block, entangler, rotation block, entangler, …) is the source of the kernel’s non-trivial dependence on the input at depth $L \geq 2$; at depth 1, in contrast, the entangler $G$ appears once on each side of the compute–uncompute product, and the resulting $G^{\dagger} G = I$ collapses the kernel into a per-feature product (this is the source of the depth-1 cancellation we make precise in Section 4.1). The figure is therefore not just a schematic: the visual symmetry between the forward $R_{Y}$+CNOT-ring block and its adjoint is precisely the structural feature that the cancellation theorem exploits.

What does the kernel actually measure? The compute–uncompute readout is operationally simple (the probability of the all-zeros bitstring), but its information content is not opaque. For the depth-1 R_Y+CNOT-ring family, Theorem 1 (Section 4.1) gives the closed form, i.e., $k(u, v) = \prod_{i} \cos^{2}((u_{i} - v_{i}) / 2)$, which is a product of per-feature angle similarities. The kernel is therefore directly interpretable: it is a smooth, monotone function of the per-feature angular displacement, with the product structure encoding “all features must agree” in the same way a Gaussian RBF does. Higher-depth and non-product encodings (IQP and PQK) lift this product structure, but the depth-1 case anchors the family in classical kernel intuition rather than treating the quantum circuit as a black box.
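The link between the product-of-cosines form and a Gaussian RBF can be made explicit with a short expansion. The following is a sketch under the assumption of small per-feature displacements $\delta_{i} = u_{i} - v_{i}$; the effective bandwidth $\gamma = 1/4$ is a consequence of this expansion and is not a value quoted from the paper.

```latex
\begin{aligned}
\ln k(u, v) &= \sum_{i} \ln \cos^{2}(\delta_{i}/2)
             = \sum_{i} \left( -\frac{\delta_{i}^{2}}{4} - \frac{\delta_{i}^{4}}{96} - \dots \right), \\
k(u, v) &\approx \exp\!\left( -\tfrac{1}{4} \, \| u - v \|^{2} \right)
\qquad \text{for } |\delta_{i}| \ll 1 .
\end{aligned}
```

To leading order the depth-1 kernel therefore behaves like an RBF kernel on the rotation angles. The two differ for large displacements: with angles in $[0, \pi]$, a single feature with $|\delta_{i}| = \pi$ drives the product exactly to 0, whereas the Gaussian only decays towards it.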

Two Alternative Kernel Families

Beyond the data re-uploading $R_{Y}$+CNOT-ring kernel just described (henceforth referred to as the q_* family), we also evaluate two architecturally distinct quantum kernels chosen to break specific structural properties of that family.

IQP/ZZ-feature-map kernel (iqpML_*): A single-layer instantaneous quantum polynomial (IQP) encoding [4] in the form of

$$U_{\mathrm{IQP}}(z) = \Big(\prod_{(i, j) \in E} ZZ(z_{i} z_{j})\Big) \cdot \Big(\bigotimes_{i} R_{Z}(z_{i})\Big) \cdot H^{\otimes n}, \tag{6}$$

where $ZZ(\theta) = \exp(-i \theta Z_{i} Z_{j} / 2)$ couples qubits $(i, j)$ along a ring entangler pattern $E$ and $H$ is the Hadamard gate. Unlike $R_{Y}$+CNOT-ring, the $ZZ$ rotation angles depend on the product $z_{i} z_{j}$, so the compute–uncompute product $U^{\dagger}(u) U(v)$ contains phases in the form of $v_{i} v_{j} - u_{i} u_{j}$, which do not factor into a function of $v - u$ alone. We use the multilayer variant (iqpML_random_8q_2L_16d), which stacks two such encodings to enable data re-uploading of $d = 16$ features into $n = 8$ qubits.

Projected quantum kernel (pqk_*): Following Huang et al. [29], we reuse the same

R_{Y}

+CNOT-ring data re-uploading circuit described above and replace the compute–uncompute fidelity readout with a classical kernel on local Pauli expectations. For each input (z), we compute

r (z) = {({〈 X_{i} 〉}_{ψ (z)}, {〈 Y_{i} 〉}_{ψ (z)}, {〈 Z_{i} 〉}_{ψ (z)})}_{i = 1}^{n} \in R^{3 n},

(7)

i.e., the

3 n

-dim vector of single-qubit Pauli expectation values in the encoded state (

| ψ (z) 〉

). The projected quantum kernel is then the classical RBF kernel evaluated on these expectation vectors:

k_{PQK} (u, v) = exp (- γ {∥ r (u) - r (v) ∥}^{2}) .

(8)

The motivation in [29] is that PQK retains some quantum non-classicality (the encoded state (

| ψ (z) 〉

) is genuinely quantum) while avoiding the exponential concentration that plagues fidelity kernels at depth.
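As a hedged illustration of Equation (8), consider the simplest possible case: a single qubit encoded with one R_Y rotation and no entangler. This toy setting is an assumption made only to keep the algebra short; it is not one of the evaluated configurations. The encoded state is cos(z/2)|0〉 + sin(z/2)|1〉, and the projected kernel reduces to a closed form:

```latex
\begin{aligned}
r(z) &= \bigl(\langle X \rangle, \langle Y \rangle, \langle Z \rangle\bigr) = (\sin z,\; 0,\; \cos z), \\
\lVert r(u) - r(v) \rVert^{2} &= (\sin u - \sin v)^{2} + (\cos u - \cos v)^{2}
  = 2 - 2\cos(u - v) = 4 \sin^{2}\!\left(\tfrac{u - v}{2}\right), \\
k_{PQK}(u, v) &= \exp\!\left(-4 \gamma \sin^{2}\!\left(\tfrac{u - v}{2}\right)\right).
\end{aligned}
```

In this toy case the projected kernel is a function of u − v alone. With entanglers and several re-uploading layers, the local expectations mix features, so this reduction need not hold.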

Figure 3 illustrates the IQP encoding at

n = 4

qubits with a single layer: a Hadamard column rotates the qubits onto the X eigenstates, single-qubit

R Z (z_{i})

rotations imprint each input feature, and a ring of

ZZ (z_{i} z_{j})

entanglers couples adjacent qubits using products of features. The product structure of the entangler angles is exactly the property that lets IQP escape the depth-1 cancellation theorem (Proposition 3).

Figure 4 illustrates the projected quantum kernel: the encoding is the same R_Y+CNOT-ring data re-uploading circuit as in Figure 2, but the compute–uncompute readout is replaced with a per-qubit local Pauli measurement of the encoded state (

| ψ (z) 〉

). The resulting

3 n

-dim vector (

r (z)

) is fed into a classical RBF kernel; this side-steps the depth-induced concentration of fidelity readouts (Remark 1) at the cost of computing a classical kernel on top of a quantum representation.

3.4. Projection Methods

We compare two approaches to dimensionality reduction before quantum encoding.

UMAP [37] projection matches the classical baseline, emphasizing local manifold structure. This isolates the quantum kernel’s contribution but applies nonlinear transformations that may reshape similarity structure.

Random Gaussian projection constructs a matrix (

R \in R^{k \times 1536}

) with entries from

N (0, 1 / k)

. The Johnson–Lindenstrauss lemma [10,36] guarantees that projecting to

k = O (log N / ϵ^{2})

dimensions preserves pairwise distances within

(1 \pm ϵ)

. Unlike UMAP, random projection involves no learning—the projected coordinates derive entirely from the original embedding geometry [42].
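A minimal sketch of the random Gaussian projection, assuming plain Python lists and a synthetic stand-in for a 1536-dim embedding. The function names, the seeds, and the choice k = 64 are illustrative, not taken from the authors' code. It checks the squared-norm ratio of one vector, which is the per-vector form of the distance-preservation property.

```python
import math
import random

def gaussian_projection_matrix(k, d, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (variance 1/k)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / k)  # standard deviation giving variance 1/k
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]

def project(matrix, x):
    """Apply the projection y = R x."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, x)) for row in matrix]

def squared_norm(x):
    return sum(t * t for t in x)

# Hypothetical stand-in for a 1536-dim text embedding (synthetic, not real data).
rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(1536)]

R = gaussian_projection_matrix(k=64, d=1536, seed=0)
projected = project(R, embedding)
ratio = squared_norm(projected) / squared_norm(embedding)
print(len(projected), round(ratio, 3))  # the ratio should be close to 1
```

No fitting step is involved: the matrix depends only on the seed, so the same projection can be reused for every document.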

3.5. Factorial Experimental Design

To systematically investigate how projection type, qubit count, and circuit depth affect quantum kernel performance, we employ a full factorial design on the supervised Wikispeedia benchmark. We test the following:

Projection types: UMAP and random Gaussian projection;
Qubit counts: 2, 4, and 8 qubits;
Layer counts: 1, 2, 4, 6, and 8 encoding layers.

The projection dimensionality is determined by qubits × layers, yielding feature dimensions ranging from 2 to 64. This gives 30 R_Y+CNOT-ring quantum configurations (2 projections × 3 qubit counts × 5 layer counts).
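The grid can be enumerated as a short sketch. The label pattern `q_<projection>_<qubits>q_<dim>d` is inferred from method names such as q_umap_4q_4d that appear in the text; it is an assumption about the naming scheme, not a documented convention.

```python
from itertools import product

projections = ["umap", "random"]
qubit_counts = [2, 4, 8]
layer_counts = [1, 2, 4, 6, 8]

configs = []
for proj, n_qubits, n_layers in product(projections, qubit_counts, layer_counts):
    dim = n_qubits * n_layers  # projection dimensionality = qubits x layers
    # Label pattern inferred from names like q_umap_4q_4d (an assumption).
    label = f"q_{proj}_{n_qubits}q_{dim}d"
    configs.append((label, proj, n_qubits, n_layers, dim))

labels = [c[0] for c in configs]
dims = sorted({c[4] for c in configs})
print(len(configs), dims[0], dims[-1])  # prints: 30 2 64
```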

To isolate the contribution of the quantum kernel from input-format effects, we evaluate the same coherence-graph pipeline with classical kernels on identical projected inputs and with two architecturally distinct quantum kernel families. The classical kernel baselines are cosine and RBF applied to Gaussian random projections in nine target dimensions (2, 4, 8, 12, 16, 24, 32, 48, and 64) and to UMAP projections in the same nine dimensions, plus cosine and RBF computed directly on the raw 1536-dim text-embedding-3-small representation. We label the projected variants as cos_random_dd, rbf_random_dd, cos_umap_dd, and rbf_umap_dd; all consume the same projected and

[0, π]

-normalised input as the quantum kernels, with no qubits in the kernel itself. The raw-embedding variants (cos_raw_1536d and rbf_raw_1536d) sit at the other end of the projection axis: because text-embedding-3-small is, itself, a pre-trained transformer encoder, cosine on its raw output is the natural transformer-based “neural similarity” baseline. The two alternative quantum kernel families (IQP/ZZ-feature-map and the projected quantum kernel of Huang et al. [29]) probe whether the empirical findings extend beyond R_Y+CNOT-ring to encoders that escape the depth-1 cancellation theorem proven in Section 4.1. They are evaluated on a subset of the qubit-and-depth grid used for R_Y+CNOT-ring (14 IQP and 10 PQK variants spanning 4 and 8 qubits at 1, 2, 4, and 8 layers; the omitted depths are listed in Section S1 of the Supplementary Material). Together with the cluster-aware classical baseline (classical_NT), the totals are: 1 classical_NT + 30 R_Y+CNOT + 14 IQP + 10 PQK + 36 cosine/RBF on projected inputs + 2 cosine/RBF on raw embeddings = 93 methods.

For the unsupervised LLM evaluation, the same 30 endpoint pairs are scored across a 12-method subset chosen by aggregate rank on the supervised metrics: the classical baseline, the headline UMAP-projected quantum kernel (q_umap_4q_4d), three random-projection quantum kernels (q_random_4q_8d, q_random_8q_16d, and a multi-layer IQP variant iqpML_random_8q_2L_16d), one projected quantum kernel (pqk_random_8q_8L_64d) [29], and three target dimensions each of the cosine and RBF classical kernel baselines on random projection. This subset gives 12 methods × 5 seeds × 30 pairs = 1800 distinct storylines, each scored once by each of the 2 judges, for 3600 LLM calls in total.

3.6. Classical Baseline and Path Extraction

The classical baseline uses the original Narrative Trails coherence function: a geometric mean of angular similarity in 48-dimensional UMAP space [37] and topic similarity from HDBSCAN [38] soft clustering. All quantum configurations eliminate the clustering component entirely, computing coherence directly from quantum-state overlap. Path extraction follows the original algorithm: compute all pairwise coherences, build the maximum spanning tree, and extract paths between specified endpoints.
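The path-extraction step can be sketched as follows, assuming a dense symmetric coherence matrix stored as nested lists. The function names and the 4-document toy matrix are hypothetical. The sketch relies on the standard fact that the tree path between two nodes of a maximum spanning tree is a maximum-capacity (bottleneck-optimal) path in the full graph.

```python
def maximum_spanning_tree(coherence):
    """Kruskal's algorithm on a dense symmetric matrix, heaviest edges first."""
    n = len(coherence)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        ((coherence[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    tree = {i: [] for i in range(n)}
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree[i].append((j, weight))
            tree[j].append((i, weight))
    return tree

def tree_path(tree, source, target):
    """Return the unique tree path and its bottleneck (minimum edge coherence)."""
    stack = [(source, None, [source], float("inf"))]
    while stack:
        node, prev, path, bottleneck = stack.pop()
        if node == target:
            return path, bottleneck
        for neighbour, weight in tree[node]:
            if neighbour != prev:
                stack.append((neighbour, node, path + [neighbour], min(bottleneck, weight)))
    return None, 0.0

# Toy 4-document coherence matrix (hypothetical values).
C = [
    [1.0, 0.9, 0.3, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.3, 0.8, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
path, bottleneck = tree_path(maximum_spanning_tree(C), 0, 3)
print(path, bottleneck)  # [0, 1, 2, 3] 0.7
```

Any coherence function, classical or quantum, can fill the matrix; only the edge ranking matters for the resulting tree.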

3.7. Computational Complexity

The pipeline has complexity:

Projection: $O (N \cdot D \cdot d)$ for random projection [11]; approximately $O (N \cdot D \cdot log N)$ for UMAP [37] with approximate nearest neighbours [43];
Kernel matrix computation: $O (N^{2} \cdot C (d, L))$ , where $C (d, L)$ is the circuit simulation cost;
MST construction: $O (N^{2} log N)$ via Kruskal’s algorithm [18];
Path queries: $O (N)$ per query.

For classical simulation,

C (d, L)

scales exponentially in qubit count but is tractable for

n \leq 8

qubits. All quantum circuits were implemented using PennyLane [44]. We note that these complexities reflect the general algorithm; specific implementations may achieve better performance through sparse graph structures, caching, or parallelization. Bottleneck value queries (finding the minimum edge weight on a path) could be reduced to

O (log N)

via lowest-common-ancestor preprocessing [45]; however, extracting the full path sequence remains

O (path length)

, and such optimizations are orthogonal to our methodological contributions.

4. Results

4.1. Theoretical Characterisation of the Quantum Kernel Family

Before reporting empirical performance, we characterise, in closed form, what the compute–uncompute fidelity kernel actually computes for the encoding family we evaluate. The result narrows the space of plausible quantum advantages for this kernel class and motivates the alternative encodings (IQP and PQK) we add in the empirical study.

Theorem 1

(Depth-1 Cancellation of CNOT-Ring Entanglers). Let

U (z) = G \cdot ⨂_{i = 1}^{n} R_{Y} (z_{i})

be a depth-1 encoding with rotations of

R_{Y} (θ) = exp (- i θ Y / 2)

on each qubit, followed by an arbitrary data-independent unitary G (e.g., a CNOT ring). The compute–uncompute fidelity kernel is

k (u, v) = {|{〈 0 |}^{\otimes n} U^{†} (u) U (v) {| 0 〉}^{\otimes n}|}^{2} = \prod_{i = 1}^{n} {cos}^{2} (\frac{u_{i} - v_{i}}{2}) .

(9)

Proof.

Let

R (z) = ⨂_{i} R_{Y} (z_{i})

. Then,

U^{†} (u) U (v) = R^{†} (u) G^{†} G R (v) = R^{†} (u) R (v),

since

G^{†} G = I

. The single-qubit rotations (

R_{Y} (\cdot)

) on distinct qubits act on disjoint tensor factors and therefore commute, and

\{R_{Y} (θ)\}_{θ \in R}

is a one-parameter group, so

R_{Y} {(u_{i})}^{†} R_{Y} (v_{i}) = R_{Y} (v_{i} - u_{i})

for each i. Hence,

R^{†} (u) R (v) = ⨂_{i = 1}^{n} R_{Y} (v_{i} - u_{i}) .

Using the identity

〈 0 | R_{Y} (θ) | 0 〉 = cos (θ / 2)

, which follows from

R_{Y} (θ) = cos (θ / 2) I - i sin (θ / 2) Y

and

〈 0 | Y | 0 〉 = 0

, the all-zeros amplitude factors over qubits:

{〈 0 |}^{\otimes n} U^{†} (u) U (v) {| 0 〉}^{\otimes n} = \prod_{i = 1}^{n} 〈 0 | R_{Y} (v_{i} - u_{i}) | 0 〉 = \prod_{i = 1}^{n} cos (\frac{v_{i} - u_{i}}{2}) .

Squaring the modulus yields Equation (9). □
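Equation (9) can be checked numerically with a small statevector simulation. The sketch below is a plain-Python stand-in (it does not use PennyLane), with G taken to be a CNOT ring on n = 4 qubits and random angles in [0, π]; the helper names are illustrative.

```python
import math
import random

def apply_ry(state, qubit, theta):
    """Apply R_Y(theta) = exp(-i theta Y / 2) to one qubit of a real statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    out = list(state)
    bit = 1 << qubit
    for idx in range(len(state)):
        if not idx & bit:
            a0, a1 = state[idx], state[idx | bit]
            out[idx] = c * a0 - s * a1
            out[idx | bit] = s * a0 + c * a1
    return out

def apply_cnot(state, control, target):
    """Flip the target bit on every basis state whose control bit is set."""
    cbit, tbit = 1 << control, 1 << target
    return [state[idx ^ tbit] if idx & cbit else state[idx] for idx in range(len(state))]

def encode(z):
    """Depth-1 encoding: R_Y rotations on every qubit, then a CNOT ring."""
    n = len(z)
    state = [1.0] + [0.0] * (2 ** n - 1)
    for q, angle in enumerate(z):
        state = apply_ry(state, q, angle)
    for q in range(n):
        state = apply_cnot(state, q, (q + 1) % n)
    return state

def fidelity_kernel(u, v):
    # All amplitudes are real here, so <psi(u)|psi(v)> is a plain dot product.
    overlap = sum(a * b for a, b in zip(encode(u), encode(v)))
    return overlap ** 2

def closed_form(u, v):
    return math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(u, v))

rng = random.Random(0)
u = [rng.uniform(0, math.pi) for _ in range(4)]
v = [rng.uniform(0, math.pi) for _ in range(4)]
gap = abs(fidelity_kernel(u, v) - closed_form(u, v))
print(gap < 1e-12)  # True
```

Removing the CNOT loop from `encode` leaves the kernel value unchanged, which is the cancellation the theorem states.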

Two consequences follow.

Corollary 1

(Depth-1 R_Y+CNOT-Ring Kernel Is a Classical Product Kernel). Equation (9) factors over qubits and depends only on per-feature differences (

u_{i} - v_{i}

). Furthermore, in the small-angle regime (

| u_{i} - v_{i} | ≪ 1

),

k (u, v) = exp (- \frac{1}{4} {∥ u - v ∥}_{2}^{2}) + O ({∥ u - v ∥}_{4}^{4}),

(10)

i.e., up to fourth-order corrections, the kernel coincides with an isotropic Gaussian RBF with bandwidth parameter

γ = 1 / 4

.

Proof.

The factorisation and dependence on differences immediately follow from Equation (9). For the small-angle expansion, the half-angle identity (

{cos}^{2} (x / 2) = (1 + cos x) / 2

), together with the Taylor series of cos, gives

{cos}^{2} (\frac{x}{2}) = 1 - \frac{x^{2}}{4} + O (x^{4}) .

Applying

log (1 - t) = - t + O (t^{2})

to each factor of Equation (9),

log k (u, v) = \sum_{i = 1}^{n} log (1 - \frac{{(u_{i} - v_{i})}^{2}}{4} + O ({(u_{i} - v_{i})}^{4})) = - \frac{1}{4} {∥ u - v ∥}_{2}^{2} + O ({∥ u - v ∥}_{4}^{4}),

where

{∥ \cdot ∥}_{p}

denotes the

ℓ_{p}

norm. Exponentiating yields Equation (10). □

On our projected Wikispeedia data, the Spearman correlation between

\prod_{i} {cos}^{2} ((u_{i} - v_{i}) / 2)

and the corresponding RBF kernel is

0.9999

, so the two kernels induce essentially identical MST rankings.
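A quick pointwise comparison, assuming n = 8 features and synthetic inputs with equal per-feature displacements (hypothetical values, not the Wikispeedia data), shows how the gap between the two kernels grows with the displacement. It does not reproduce the reported Spearman correlation.

```python
import math

def product_cos2_kernel(u, v):
    """Depth-1 closed form of Equation (9)."""
    return math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.25):
    """Gaussian RBF exp(-gamma * ||u - v||^2) with gamma = 1/4 as in Equation (10)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

n = 8
u = [0.0] * n
for step in (0.05, 0.1, 0.5, 1.0):  # hypothetical per-feature displacements
    v = [step] * n
    gap = abs(product_cos2_kernel(u, v) - rbf_kernel(u, v))
    print(f"step={step}: gap={gap:.2e}")

small_gap = abs(product_cos2_kernel(u, [0.1] * n) - rbf_kernel(u, [0.1] * n))
large_gap = abs(product_cos2_kernel(u, [1.0] * n) - rbf_kernel(u, [1.0] * n))
```

The absolute values drift apart at large displacements, but a maximum spanning tree depends only on how edge weights are ordered, so a high rank correlation is enough for the two kernels to yield nearly the same tree.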

Proposition 1

(Mercer Expansion and Effective RKHS). The depth-1 kernel of Theorem 1 admits the explicit feature map:

ϕ (z) = ⨂_{i = 1}^{n} \frac{1}{\sqrt{2}} (1, cos z_{i}, sin z_{i}) \in R^{3^{n}}, k (u, v) = 〈 ϕ (u), ϕ (v) 〉 .

(11)

Therefore, the reproducing kernel Hilbert space (RKHS) has dimension

3^{n}

—larger than the

2^{n}

Hilbert space of the circuit—but it contains only products of single-feature kernels, with no cross-feature interactions.

Proof.

For each qubit (i), set

ϕ_{i} (z_{i}) = \frac{1}{\sqrt{2}} (1, cos z_{i}, sin z_{i}) \in R^{3}

. Then,

〈 ϕ_{i} (u_{i}), ϕ_{i} (v_{i}) 〉 = \frac{1}{2} (1 + cos u_{i} cos v_{i} + sin u_{i} sin v_{i}) = \frac{1}{2} (1 + cos (u_{i} - v_{i})) = {cos}^{2} (\frac{u_{i} - v_{i}}{2}),

where the second equality uses the angle-difference identity for cosine and the third uses the half-angle identity, i.e.,

{cos}^{2} (x / 2) = (1 + cos x) / 2

. Inner products of tensor products factor over the factors:

〈 ϕ (u), ϕ (v) 〉 = 〈⨂_{i} ϕ_{i} (u_{i}), ⨂_{i} ϕ_{i} (v_{i})〉 = \prod_{i = 1}^{n} 〈 ϕ_{i} (u_{i}), ϕ_{i} (v_{i}) 〉 = \prod_{i = 1}^{n} {cos}^{2} (\frac{u_{i} - v_{i}}{2}) = k (u, v),

using Theorem 1 for the last equality. Each

ϕ_{i} (z_{i})

lies in

R^{3}

, so

ϕ (z) \in R^{3^{n}}

. The feature map is a tensor product of single-feature factors and therefore contains no cross-feature interaction terms. □
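The feature map of Equation (11) can be built explicitly for small n. The sketch below assumes n = 3 and random angles in [0, π] (illustrative choices) and compares the inner product of the flattened 27-dimensional vectors with the closed form.

```python
import math
import random

def single_feature_map(z):
    """phi_i(z) = (1, cos z, sin z) / sqrt(2)."""
    scale = 1.0 / math.sqrt(2.0)
    return [scale, scale * math.cos(z), scale * math.sin(z)]

def feature_map(z):
    """Tensor product of the 3-dim single-feature maps, flattened to 3**n entries."""
    phi = [1.0]
    for z_i in z:
        phi = [a * b for a in phi for b in single_feature_map(z_i)]
    return phi

def closed_form(u, v):
    return math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(u, v))

rng = random.Random(0)
u = [rng.uniform(0, math.pi) for _ in range(3)]
v = [rng.uniform(0, math.pi) for _ in range(3)]
phi_u, phi_v = feature_map(u), feature_map(v)
inner = sum(a * b for a, b in zip(phi_u, phi_v))
print(len(phi_u), abs(inner - closed_form(u, v)) < 1e-12)  # 27 True
```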

These results explain why R_Y+CNOT-ring kernels at depth 1 cannot outperform an appropriately tuned classical RBF kernel in principle and why the empirical pool-vs-pool comparison yields competitive parity rather than separation (see Section 4.2). Two routes break this collapse:

Proposition 2

(Depth ≥ 2 Retains an Inner Entangler). Let

U (z) = G \cdot R_{2} (z_{2}) \cdot G \cdot R_{1} (z_{1})

with

R_{ℓ} (x) = ⨂_{i} R_{Y} (x_{ℓ, i})

and G a data-independent unitary. Then,

U^{†} (u) U (v) = R_{1}^{†} (u_{1}) G^{†} ⨂_{i} R_{Y} (v_{2, i} - u_{2, i}) G R_{1} (v_{1}) .

(12)

If G is not a tensor product of single-qubit unitaries, the resulting kernel (

k_{2}

) does not factor over qubits and is not a function of

v - u

alone.

Proof.

The middle

G^{†} G = I

cancellation gives

U^{†} (u) U (v) = R_{1}^{†} (u_{1}) G^{†} R_{2}^{†} (u_{2}) R_{2} (v_{2}) G R_{1} (v_{1})

, and the one-parameter-group argument from the proof of Theorem 1 applied to the middle factor yields

R_{2}^{†} (u_{2}) R_{2} (v_{2}) = ⨂_{i} R_{Y} (v_{2, i} - u_{2, i})

, establishing Equation (12).

For the second claim, take

n = 2

and

G = {CNOT}_{12}

. A direct computation gives

{CNOT}_{12} (I \otimes R_{Y} (α)) {CNOT}_{12} = exp (- i α Z_{1} Y_{2} / 2),

which is not a tensor product of single-qubit operators for

α \notin π Z

. Hence, the inner conjugation (

G^{†} ⨂_{i} R_{Y} (v_{2, i} - u_{2, i}) G

) in Equation (12) introduces a coupling between qubit 1 and qubit 2 that is absent at depth 1, so

k_{2}

does not factor over qubits.

Furthermore,

R_{1} (v_{1})

and

R_{1}^{†} (u_{1})

are separated in Equation (12) by the (in general, non-trivial) operator (

G^{†} ⨂_{i} R_{Y} (v_{2, i} - u_{2, i}) G

), so they cannot be combined into a single

R_{1} (v_{1} - u_{1})

via the one-parameter group identity. Concretely, fix

u_{2} = v_{2}

(so the middle operator reduces to

G^{†} G = I

); then,

k_{2}

collapses to the depth-1 kernel and depends on

v_{1} - u_{1}

alone. Now, perturb

u_{2} \neq v_{2}

: the middle operator becomes

\neq I

, and varying

u_{1}

at a fixed

v_{1} - u_{1}

generally changes the inner product (

{〈 0 |}^{\otimes n} R_{1}^{†} (u_{1}) [\dots] R_{1} (v_{1}) {| 0 〉}^{\otimes n}

). Hence,

k_{2}

is not a function of

v - u

alone. □
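A concrete instance can be checked with explicit matrices. The sketch assumes n = 2 and G = CNOT with the first qubit as control, as in the proof. The two input pairs are illustrative choices, not taken from the paper: they share the same difference v − u in both layers but start from different first-layer angles.

```python
import math

def ry(theta):
    """2x2 matrix of R_Y(theta) = exp(-i theta Y / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matvec(m, vec):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, vec)) for row in m]

# CNOT with the first qubit as control; basis order |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def encode(layer1, layer2):
    """|psi(z)> = G R_2(z_2) G R_1(z_1) |00> for n = 2 and G = CNOT."""
    state = [1.0, 0.0, 0.0, 0.0]
    for angles in (layer1, layer2):
        state = matvec(kron(ry(angles[0]), ry(angles[1])), state)
        state = matvec(CNOT, state)
    return state

def kernel(u, v):
    overlap = sum(a * b for a, b in zip(encode(*u), encode(*v)))  # real amplitudes
    return overlap ** 2

half_pi = math.pi / 2
# Both pairs have v - u = (0, pi/2) in layer 1 and (0, pi/2) in layer 2.
u_a, v_a = ((0.0, 0.0), (0.0, 0.0)), ((0.0, half_pi), (0.0, half_pi))
u_b, v_b = ((math.pi, 0.0), (0.0, 0.0)), ((math.pi, half_pi), (0.0, half_pi))
k_a, k_b = kernel(u_a, v_a), kernel(u_b, v_b)
print(round(k_a, 6), round(k_b, 6))  # 0.0 1.0
```

The two kernel values differ even though the displacement is identical, so the depth-2 kernel cannot be a function of v − u alone.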

The additional expressivity does not, however, translate into better MST rankings on our data, because depth

\geq 2

also concentrates kernel values around a fixed mean (Remark 1).

Proposition 3

(IQP Encoding Escapes the Cancellation). For the IQP encoding (

U (z) = \prod_{(i, j)} ZZ (z_{i} z_{j}) \cdot ⨂_{i} R Z (z_{i}) \cdot H^{\otimes n}

) from Equation (6),

U^{†} (u) U (v) = H^{\otimes n} \cdot ⨂_{i} R Z (v_{i} - u_{i}) \cdot \prod_{(i, j)} ZZ (v_{i} v_{j} - u_{i} u_{j}) \cdot H^{\otimes n} .

(13)

The kernel (

k_{IQP} (u, v) = {|{〈 0 |}^{\otimes n} U^{†} (u) U (v) {| 0 〉}^{\otimes n}|}^{2}

) does not factor over qubits and is not a function of

v - u

alone.

Proof.

Using

H^{†} = H

,

R Z^{†} (α) = R Z (- α)

, and

{ZZ}^{†} (α) = ZZ (- α)

,

U^{†} (z) = H^{\otimes n} \cdot ⨂_{i} R Z (- z_{i}) \cdot \prod_{(i, j)} ZZ (- z_{i} z_{j}),

so that

U^{†} (u) U (v) = H^{\otimes n} \cdot ⨂_{i} R Z (- u_{i}) \cdot \prod_{(i, j)} ZZ (- u_{i} u_{j}) \cdot \prod_{(i, j)} ZZ (v_{i} v_{j}) \cdot ⨂_{i} R Z (v_{i}) \cdot H^{\otimes n} .

All

R Z

and

ZZ

gates are diagonal in the computational basis and therefore commute pairwise. Combining same-axis rotations using

R Z (α) R Z (β) = R Z (α + β)

and

ZZ (α) ZZ (β) = ZZ (α + β)

yields Equation (13).

For the second claim, take

n = 2

and consider two pairs of inputs with the same difference but different products:

(u, v) = ((0, 0), (1, 1))

and

(u^{'}, v^{'}) = ((1, 0), (2, 1))

. Both satisfy

v - u = v^{'} - u^{'} = (1, 1)

, but

v_{0} v_{1} - u_{0} u_{1} = 1 \neq 2 = v_{0}^{'} v_{1}^{'} - u_{0}^{'} u_{1}^{'}

. Direct evaluation of Equation (13) on these inputs yields

k_{IQP} (u, v) \approx 0.469

and

k_{IQP} (u^{'}, v^{'}) \approx 0.211

, so

k_{IQP}

is not a function of

v - u

alone and, a fortiori, does not factor over qubits as a single-feature product. □
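The two values quoted in the proof can be reproduced from Equation (13) without a circuit simulator. Because every gate between the two Hadamard layers is diagonal, the amplitude is an average of phases over Z-eigenvalue patterns. The sketch assumes that for n = 2 the ring reduces to a single ZZ coupling between the two qubits; the function name is illustrative.

```python
import cmath
from itertools import product

def iqp_kernel(u, v, edges):
    """|<0| U^dagger(u) U(v) |0>|^2 for the single-layer IQP encoding, via Eq. (13)."""
    n = len(u)
    amplitude = 0.0
    # H^{(x)n}|0...0> is the uniform superposition, so the amplitude is the
    # average of the diagonal phases over all Z-eigenvalue patterns s in {+1,-1}^n.
    for s in product((1, -1), repeat=n):
        phase = sum((v[i] - u[i]) * s[i] for i in range(n))
        phase += sum((v[i] * v[j] - u[i] * u[j]) * s[i] * s[j] for i, j in edges)
        amplitude += cmath.exp(-0.5j * phase)
    amplitude /= 2 ** n
    return abs(amplitude) ** 2

edges = [(0, 1)]  # assumption: for n = 2 the ring is a single ZZ coupling
k_first = iqp_kernel((0, 0), (1, 1), edges)
k_second = iqp_kernel((1, 0), (2, 1), edges)
print(round(k_first, 3), round(k_second, 3))  # 0.469 0.211
```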

Remark 1

(Empirical Concentration with Depth). On 1000 cached Wikispeedia document pairs at

n = 8

qubits, the standard deviation of compute–uncompute kernel values across the pair sample collapses with depth:

Layers L	1	2	4	6	8
Std of k	$0.175$	$0.131$	$0.065$	$0.029$	$0.015$

This is consistent with the exponential concentration result of Thanasilp et al. [30]: deep encoders push kernel values toward a fixed mean, eroding the inter-pair distinguishability that the MST construction relies on. Our deepest configuration (8q

\times

8 L) has a kernel-value std of

0.015

, so most pairs are nearly indistinguishable to the path-extraction step, which explains its empirical underperformance.

Together, Theorem 1, Propositions 1–3, and Remark 1 characterise the design space we explore: the natural data re-uploading kernel collapses to a classical product kernel at depth 1; deeper layers escape that collapse but concentrate empirically; non-tensor-product encodings (IQP and PQK) escape both but, as the experiments show, do not produce a decisive empirical advantage for storyline extraction in our setting.

4.2. Supervised Evaluation: Wikispeedia

On 11,215 human navigation paths from Wikispeedia, the standard evaluation metrics partition into two clusters that rank methods in opposite ways. Alignment-based metrics—length-normalised DTW (nDTW), nDTW similarity, and per-step DTW similarity—reward methods that produce long, alignment-rich extractions; set-overlap metrics—Jaccard and F1—reward methods that produce shorter extractions with more verbatim overlap with the human article set. Across the methods we evaluate, the two clusters’ top entries are different.

Table 1 reports the alignment cluster. The two best methods are q_umap_4q_4d (nDTW

1.339 \pm 0.031

across five seeds) and the cos_umap_12d classical baseline on UMAP projection (nDTW

1.352 \pm 0.009

); the cluster-aware classical baseline classical_NT sits at an nDTW of

1.452 \pm 0.018

. Random projection methods—both quantum and classical—occupy the bottom of this cluster (nDTW

\geq 1.65

).

Table 2 reports the overlap cluster. The ordering is essentially reversed: random projection methods (both quantum and classical) achieve the highest Jaccard overlap and F1 score, while classical_NT and q_umap_4q_4d sit at the bottom. The mechanism is visible in the extracted path lengths (Section 4.2): the cluster-aware classical_NT extracts paths of about 20 articles, and UMAP-projected methods extract paths of 25–30 articles, both aligning well with the 8.5-article human path under nDTW but covering few of its specific articles; random projection methods extract paths of ∼14 articles, with more verbatim overlap but worse alignment.

Figure 5 shows the per-method ranking under each cluster for the 12 LLM-evaluated methods (one representative per family from each of the alignment-cluster and overlap-cluster ends), making the disagreement between the two clusters visible at body-figure scale. The complete ranking across all 93 methods is reported in Figure S6 of the Supplementary Material, which confirms that the cluster split is a property of the entire sweep and not an artefact of the 12-method selection.

4.3. Unsupervised Evaluation: Cuban News

Table 3 reports mean overall coherence per method per judge across the full

5 \times 30 = 150

(seed, endpoint pair) units, with 95% bootstrap confidence intervals. Both judges place classical_NT first; the next four methods (q_umap_4q_4d, the cos_random family, the rbf_random family, and q_random_4q_8d) have heavily overlapping intervals and are statistically indistinguishable from each other under both judges.

GPT-4o scores are systematically higher than Sonnet 4.5 scores by ∼0.4–0.5 points, on average, but the two judges agree on the method ranking: across the 12 methods, Spearman

ρ = 0.860

and Kendall

τ = 0.727

. At the cell level (1800 distinct (method, seed, pair) triples), inter-judge agreement is moderate (Spearman

ρ = 0.581

on overall coherence, and ICC(2,1) absolute agreement

= 0.494

); the largest disagreements are on temporal coherence, where GPT-4o is systematically more lenient (mean

7.90

vs.

7.10

,

ρ = 0.415

).

Figure 6 shows family-level mean overall coherence with bootstrap confidence intervals alongside the per-dimension pool comparison; Figure 7 shows the per-method ranking under each judge.

4.4. Statistical Analysis

We analyse the LLM coherence data with two paired Wilcoxon comparisons motivated by what an honest reading of the kernel families demands. First, we ask whether genuinely quantum kernels (5 methods: 3 R_Y+CNOT-ring data-reuploading kernels, 1 multi-layer IQP, and 1 projected quantum kernel) outperform the full set of classical kernel methods in the LLM-evaluated subset (7 methods: classical_NT, plus 3 cosine and 3 RBF kernels on the random Gaussian projection). Six of these seven classical methods are matched on input with the quantum pool (same random projection), and classical_NT is the original cluster-aware baseline included for completeness; including it strengthens the classical pool and therefore makes the quantum pool’s null result more conservative. Second, we ask whether the five quantum kernels outperform the original cluster-aware Narrative Trails baseline (classical_NT, which combines UMAP-projected angular similarity with HDBSCAN-derived topic similarity) on its own; this is the comparison the original manuscript made. We use the within-(seed, pair) mean over each pool as the aggregated value, paired across

n = 5 \times 30 = 150

units, with Holm–Bonferroni correction across the four dimensions and two judges of each comparison. Each Wilcoxon test runs within a single judge—scores are never pooled across judges—so the moderate cell-level inter-judge agreement (ICC(2,1) =

0.494

, Section 4.3) does not enter the test as a noise term. Cross-judge consistency is then a separate qualitative robustness check: the same comparisons are reported under each judge in Table 4 and Table 5, and the conclusions agree in sign and significance pattern across the two.
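The Holm–Bonferroni step can be sketched in a few lines. The raw p-values below are hypothetical placeholders for the eight (dimension, judge) cells of one comparison, not the values behind Table 4 or Table 5, and the Wilcoxon test itself is not shown.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier m, m-1, ..., 1, capped at 1 and kept monotone.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for 4 dimensions x 2 judges (not the paper's values).
raw = [0.001, 0.004, 0.020, 0.030, 0.200, 0.400, 0.600, 0.830]
adjusted = holm_bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print([round(p, 3) for p in adjusted])
print(significant)
```

With eight tests, the smallest raw p-value is multiplied by 8, so a raw value near 0.02 does not survive correction at the 0.05 level.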

Quantum kernel pool vs. classical kernel pool (matched projection input): no significant difference on any of the four dimensions under either judge after Holm correction (Table 4). The largest raw p is

0.83

(GPT-4o, temporal coherence), and the smallest is

0.02

(Sonnet 4.5, thematic consistency); even the smallest does not survive correction. Effect sizes range from

- 0.11 %

to

- 1.01 %

.

Table 4. Per-dimension paired Wilcoxon comparison of mean over 5 quantum kernel methods vs. mean over 7 classical kernel methods (classical_NT plus the 6 cosine/RBF kernels on random projection),

n = 150

(seed, pair) units. Holm–Bonferroni correction across 4 dimensions × 2 judges. No significant differences after correction.


Judge	Dimension	Classical Pool	Quantum Pool	$Δ$ %	p Holm
Sonnet 4.5	Logical Flow	6.77	6.74	$- 0.51$	$0.65$
Sonnet 4.5	Thematic Consistency	7.71	7.63	$- 1.01$	$0.19$
Sonnet 4.5	Temporal Coherence	7.11	7.09	$- 0.33$	$0.60$
Sonnet 4.5	Narrative Completeness	5.76	5.71	$- 0.83$	$0.65$
GPT-4o	Logical Flow	6.86	6.80	$- 0.79$	$0.65$
GPT-4o	Thematic Consistency	7.86	7.79	$- 0.88$	$0.65$
GPT-4o	Temporal Coherence	7.90	7.89	$- 0.11$	$0.83$
GPT-4o	Narrative Completeness	6.49	6.44	$- 0.74$	$0.65$

Quantum kernel pool vs. classical Narrative Trails alone: significant differences are observed in favour of classical_NT on six of the eight cells after Holm correction (Table 5). The two non-significant cells are temporal coherence under each judge, the dimension most directly enforced by the date-constrained pathfinder, which is shared across all methods. Mean differences are 1.1–3.3% per dimension and consistent in sign across the two judges. The original manuscript’s claim that quantum methods significantly outperform classical on three of four dimensions reflected single-seed, single-judge variance: under multi-seed, dual-judge replication with the genuinely quantum subset, the cluster-aware classical baseline modestly but consistently outperforms the quantum pool.

Table 5. Per-dimension paired Wilcoxon comparison of mean over 5 quantum kernel methods vs. classical_NT alone,

n = 150

(seed, pair) units. Holm–Bonferroni correction across 4 dimensions × 2 judges. *

p < 0.05

after correction.


Judge	Dimension	Classical_NT	Quantum Pool	$Δ$ %	p Holm
Sonnet 4.5	Logical Flow	6.89	6.74	$- 2.13$	0.002 *
Sonnet 4.5	Thematic Consistency	7.80	7.63	$- 2.12$	<0.001 *
Sonnet 4.5	Temporal Coherence	7.17	7.09	$- 1.13$	$0.09$
Sonnet 4.5	Narrative Completeness	5.91	5.71	$- 3.34$	<0.001 *
GPT-4o	Logical Flow	6.96	6.80	$- 2.24$	0.01 *
GPT-4o	Thematic Consistency	7.96	7.79	$- 2.18$	0.007 *
GPT-4o	Temporal Coherence	8.00	7.89	$- 1.37$	0.06
GPT-4o	Narrative Completeness	6.64	6.44	$- 2.99$	0.007 *

Cross-task agreement. The 12 LLM-evaluated methods also have five-seed Wikispeedia metrics. Spearman correlation between the Cuba LLM coherence rank (mean over both judges) and each Wikispeedia metric rank (Table 6) reveals that the alignment-cluster Wikispeedia metrics agree with the LLM-judged ranking (

ρ \approx + 0.7

), while the overlap-cluster metrics disagree with it (

ρ \approx - 0.6

). Per-step DTW similarity is uncorrelated with LLM coherence. This rank disagreement is internal to the Wikispeedia metrics: the choice of supervised metric pre-determines which family of methods looks best on the unsupervised task too.

4.5. Cross-Task Scatter Plots

Figure 8 visualises the Spearman correlations of Section 4.4 (Table 6) as scatter plots. Each point is one of the 12 LLM-evaluated methods; the y-axis is the mean overall LLM coherence on the Cuban news corpus (averaged across both judges), and the x-axis cycles through five Wikispeedia metrics. The two alignment-cluster metrics (nDTW and nDTW similarity) line up against LLM coherence with

ρ \approx + 0.7

, the two overlap-cluster metrics (Jaccard, F1) anti-correlate with

ρ \approx - 0.6

, and per-step DTW similarity is uncorrelated. Method colours encode kernel family.

4.6. Computational Performance

Table 7 reports per-pair coherence computation time across all kernel families evaluated in the 93-method sweep, refreshed with the timing script in the supplementary archive (5 seeds × 10 trials = 50 measurements of 100 random pairs per configuration run sequentially on a single host with no concurrent activity). The cluster-aware Narrative Trails coherence (rescaled angular cosine on the UMAP-projected vector blended with a JSD-based topic similarity on the cluster-membership probabilities) sets the classical reference at

0.038

ms per pair; bare cosine and RBF on the same projected inputs are 7–

16 \times

cheaper than classical_NT because they skip the topic term. R_Y+CNOT-ring quantum kernels range from

2.16

ms (2 qubits, averaged across layer counts) to

12.45

ms (8 qubits, 8 layers); the alternative IQP/ZZFeatureMap encoder is slightly cheaper at matched depth (e.g.,

10.86

ms vs

12.45

ms at 8 qubits, 8 layers) because it omits the explicit CNOT ring; the projected quantum kernel is the heaviest of the three quantum families (

15.15

ms at 8 qubits, 8 layers) because the post-projection Gaussian is built on dense

3 n

-dim local Pauli vectors. Within each family, execution time scales approximately linearly with circuit depth at a fixed qubit count; the sub-linear ratio (

5.3 \times

for

8 \times

more layers in the R_Y+CNOT 8-qubit sweep) reflects a fixed per-pair circuit-initialisation overhead. Absolute timings are simulator- and host-dependent; the relative ordering across families reproduces across hardware.

5. Discussion

5.1. Two Wikispeedia Metric Clusters Disagree

The Wikispeedia metrics in widespread use for storyline extraction do not all measure the same thing. Alignment-based metrics (length-normalised DTW and per-step DTW similarity) reward methods that produce long, alignment-rich extractions whose trajectory closely follows the human navigation; set-overlap metrics (Jaccard and F1) reward methods that produce shorter extractions sharing many specific articles with the human path. These two views of “goodness” rank our methods in opposite orders (Table 1 and Table 2), with a rank correlation of

ρ \approx - 0.6

between the two clusters. The disagreement is not a bug but a structural feature: the human Wikispeedia paths are short navigation games rather than constructed narratives, so methods optimised for set overlap with them will tend to truncate to short article subsets, while methods optimised for trajectory alignment will tend to produce longer paths that share the human path’s geometric envelope without necessarily covering its specific nodes. We present both clusters because either alone gives a misleading impression of relative method quality.

5.2. LLM Coherence Rank Aligns with the Alignment Cluster

Cross-task analysis (Table 6) reveals that LLM-judged narrative coherence correlates positively with the Wikispeedia alignment cluster (

ρ \approx + 0.7

) and negatively with the overlap cluster (

ρ \approx - 0.6

). Methods that win on alignment also produce stories LLMs find more coherent; methods that win on overlap produce stories LLMs find less coherent. The mechanism is consistent across both phenomena: smooth topical concentration—supplied either by HDBSCAN cluster membership in classical_NT or by aggressive UMAP-to-low-d bottleneck in q_umap_4q_4d and cos_umap_12d—helps both alignment with human paths and the LLM-judged coherence of the extracted storyline. Sharp pairwise similarity on raw random projections, in contrast, helps overlap with short navigation paths but not narrative coherence.

5.3. Quantum Kernels Are Competitive but Not Decisively Better

On the matched-input pool comparison (Table 4), the genuinely quantum kernels and the classical kernels on the same projection are statistically indistinguishable in every dimension under both judges after Holm correction. The cluster-aware classical_NT baseline is the top method by a small but consistent margin (Table 5); the quantum kernel (q_umap_4q_4d) is its closest competitor in terms of overall coherence and beats it on the alignment cluster of Wikispeedia metrics. The picture is therefore one of competitive parity: quantum kernels work as well as classical kernels on this task, the cluster-aware baseline retains a modest edge attributable to its explicit topical structure, and no kernel choice produces dramatic improvements over any other.

This empirical parity has a theoretical interpretation, given the characterisation in Section 4.1. At depth 1, the R_Y+CNOT-ring kernel is provably a classical product kernel close to an RBF (Theorem 1, Corollary 1); at greater depth, the kernel does become genuinely non-classical (Proposition 2), but exponential concentration sets in (Remark 1), erasing the inter-pair distinguishability the MST construction needs. The non-product encodings (IQP and PQK) escape the depth-1 cancellation but do not show empirical separation either, indicating that the additional expressivity of these encodings is not what storyline extraction on document embeddings is bottlenecked by. Quantum kernels for storyline extraction are therefore best read as alternative classical kernels with an extra cost: the underlying coherence-graph problem is sufficiently well-posed in the projected geometry that simpler kernels exhaust most of the available signal.

5.4. Why `classical_NT` Wins the LLM Comparison

The Narrative Trails baseline combines angular similarity in UMAP-projected space with topic similarity from HDBSCAN soft clustering. The cluster term explicitly encodes “stay on theme” across the path. On Wikispeedia, where humans played a navigation game [16] rather than constructing narratives, the cluster term over-constrains the path—this is why classical_NT loses the alignment cluster to UMAP-based methods that omit it. On Cuba narrative coherence, where a “good” story is, by definition, a thematic unit, the cluster term is exactly what the LLM is grading, and it produces a modest advantage. The same property may hurt or help depending on the task. We therefore avoid framing classical_NT’s LLM coherence win as a generic kernel-quality result: it is task-specific.

5.5. Departures from the Original Manuscript Claims

Earlier reports of this work claimed that quantum kernels significantly outperform the classical baseline on three of four LLM coherence dimensions ($p < 0.05$, paired Wilcoxon) and achieve a 29.6% DTW improvement on Wikispeedia. Both claims need qualification. The DTW improvement was reported using raw accumulated DTW cost rather than the length-normalised nDTW used in the Narrative Trails reference [2]; under nDTW, the gap between the best UMAP-projected method and classical_NT is approximately 7%, not 29.6%. The per-dimension LLM coherence claim used a single random projection seed and a single judge, with the “quantum” pool effectively averaging across configurations of widely varying behaviour; the multi-seed, dual-judge replication reverses the sign of the comparison against classical_NT (Section 4.4). The storyline-set Jaccard overlap between seeds for the same quantum kernel is approximately 0.22, so single-seed effect sizes of 1.5–3.3% are within the seed-driven storyline-set noise.
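
As a worked check of the nDTW gap, the relative difference can be recomputed from the five-seed means reported in Table 1. This is a sketch on those published means only; taking classical_NT as the denominator is an assumption about how the percentage is defined.

```python
# Five-seed mean nDTW values copied from Table 1 (lower is better).
ndtw = {
    "classical_NT": 1.452,
    "q_umap_4q_4d": 1.339,
    "cos_umap_12d": 1.352,
}


def relative_gap(baseline, method):
    """Relative nDTW reduction of `method` with respect to `baseline`."""
    return (ndtw[baseline] - ndtw[method]) / ndtw[baseline]


gap_quantum = relative_gap("classical_NT", "q_umap_4q_4d")
gap_cosine = relative_gap("classical_NT", "cos_umap_12d")
print(f"{gap_quantum:.1%} {gap_cosine:.1%}")
```

Both UMAP-projected rows land between roughly 7% and 8% under this definition, far below the 29.6% obtained from raw accumulated DTW cost.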

5.6. Limitations

Quantum simulation, not hardware. All experiments use classical simulation of quantum circuits. We cannot assess how noise, decoherence, and gate errors on real quantum hardware would affect coherence computation [8,30]. The practical viability of quantum coherence kernels depends on hardware performance that we do not evaluate. Two structural arguments nonetheless suggest that the central conclusions transfer to hardware. First, the depth-1 cancellation result (Theorem 1) holds at the operator level, and symmetric noise on $U (z_{v})$ and $U^{\dagger} (z_{u})$ preserves the per-feature factorisation up to a uniform contrast reduction, leaving unchanged the pair ranking on which the MST construction depends. Second, the exponential concentration phenomenon underlying the deep encoder underperformance (Remark 1, [30]) is reinforced rather than relieved by hardware noise. The competitive-parity verdict is therefore conservative for the hardware case.

Qubit scalability in simulation. Our experiments are limited to 2, 4, and 8 qubits due to the exponential state space of quantum simulation ($2^{n}$ states for n qubits). As shown in Table 7, while coherence computation remains practical (∼1–15 ms per pair across all three quantum families), the slowdown relative to the cluster-aware classical_NT baseline (∼50–400× in lightning.qubit simulation) and the memory required to hold the dense $2^{n} \times 2^{n}$ state vector constrain exploration of higher qubit counts. Investigating whether larger circuits yield further improvements would require quantum hardware or advanced simulation techniques such as tensor networks.
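
To make the scaling concrete, the sketch below tabulates the memory footprint of a dense simulation as the qubit count grows. It assumes double-precision complex amplitudes (16 bytes each), which is an assumption about the simulator configuration rather than a figure taken from Table 7; it reports both the $2^{n}$ amplitudes of a pure state and the $2^{n} \times 2^{n}$ entries of a dense operator on the same register.

```python
BYTES_PER_AMPLITUDE = 16  # assumed: one complex number stored as two 64-bit floats


def statevector_bytes(n_qubits):
    """Memory for the 2^n amplitudes of a pure n-qubit state."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


def dense_operator_bytes(n_qubits):
    """Memory for a dense 2^n x 2^n matrix (unitary or density matrix)."""
    return (2 ** n_qubits) ** 2 * BYTES_PER_AMPLITUDE


for n in (2, 4, 8, 16, 24, 30):
    print(
        f"n={n:2d}  state vector: {statevector_bytes(n):>15,d} B"
        f"  dense operator: {dense_operator_bytes(n):>22,d} B"
    )
```

Under this assumption the 8-qubit case needs only a few kilobytes for the state, whereas 30 qubits already require 16 GiB for the state vector alone, so the exponential growth rather than the absolute size at 8 qubits is what limits exploration.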

LLM-judge scope. The unsupervised evaluation uses two judges (Claude Sonnet 4.5 and GPT-4o) at $T = 0$ via structured tool calling on a 12-method subset of the 93 supervised configurations chosen by aggregate rank. Method-level rank correlation between the two judges is high (Spearman $ρ = 0.86$), but cell-level absolute agreement is moderate (ICC $\approx 0.49$). The qualitative conclusions are not driven by judge choice within this pair, but neither do we claim that all current frontier LLMs would converge on the same ranking; further judges would tighten the inference but are unlikely to overturn the matched-input pool-vs-pool null result.
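
A high rank correlation alongside a moderate absolute-agreement ICC is what one would expect when one judge is systematically more lenient than the other. The sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC(2,1) from its standard ANOVA form. The scores are toy values chosen for illustration and are not the study's ratings.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` holds one row per rated item and one column per judge.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ms_rows = k * sum((r - grand) ** 2 for r in row_means) / (n - 1)
    ms_cols = n * sum((c - grand) ** 2 for c in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy scores: judge B rates every item 0.4 points higher than judge A.
judge_a = [6.8, 7.1, 7.3, 7.6, 7.9]
judge_b = [score + 0.4 for score in judge_a]

icc_identical = icc_2_1(list(zip(judge_a, judge_a)))
icc_offset = icc_2_1(list(zip(judge_a, judge_b)))
print(icc_identical, icc_offset)
```

The constant offset leaves the ordering of items untouched, so a rank correlation between the two toy judges would remain 1, while the absolute-agreement ICC drops from 1 to about 0.70.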

Storyline-set seed sensitivity. Storylines extracted by the same kernel under different random projection seeds share only ∼22% of their articles, on average. The five-seed protocol mitigates but does not eliminate this variance: single-seed effect sizes below ∼5% are within the seed-driven storyline-set noise floor, which is why we average over seeds and report bootstrap intervals.
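
The seed-sensitivity figure is a mean pairwise Jaccard overlap between the article sets that one kernel extracts under different seeds. A minimal sketch follows, with hypothetical article identifiers standing in for real storylines; the helper names are illustrative.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(storylines):
    """Average Jaccard overlap over all unordered pairs of storylines."""
    pairs = list(combinations(storylines, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical storylines for one endpoint pair (a -> d) under three seeds.
storylines_by_seed = [
    ["a", "b", "c", "d"],
    ["a", "e", "f", "d"],
    ["a", "g", "c", "h", "d"],
]
seed_overlap = mean_pairwise_jaccard(storylines_by_seed)
print(round(seed_overlap, 3))
```

Because the endpoints are shared by construction, even storylines with almost disjoint interiors keep a non-zero overlap, which is worth bearing in mind when reading an average of ∼22%.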

Dataset and embedding dependence. Results may depend on the specific datasets and embedding model. Different corpora, domains, or embedding models could yield different relative performance. The hypothesis that random projection preserves useful structure assumes the original embeddings capture semantically meaningful relationships.

Evaluation-metric dependence. The two Wikispeedia metric clusters (alignment and overlap) rank methods in opposite orders. Either cluster taken alone gives a misleading picture of relative kernel quality. We do not take a position on which cluster is the “true” Wikispeedia metric—both have face validity for different aspects of narrative extraction—but we recommend reporting both whenever Wikispeedia is used as a benchmark.
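
The disagreement between the two clusters can be checked directly on the eight methods that appear in both Table 1 and Table 2. The sketch below computes a tie-aware Spearman correlation between mean nDTW (lower is better) and mean Jaccard (higher is better); a strongly positive value means that methods with worse alignment tend to have better overlap. The helper names are illustrative, and the result covers only these eight rows, not the full 93-method sweep.

```python
import math

# (mean nDTW from Table 1, mean Jaccard from Table 2) for methods listed in both.
table_means = {
    "classical_NT": (1.452, 0.116),
    "q_umap_4q_4d": (1.339, 0.096),
    "cos_umap_12d": (1.352, 0.098),
    "q_random_4q_8d": (1.726, 0.138),
    "q_random_8q_16d": (1.726, 0.143),
    "iqpML_random_8q_2L_16d": (1.722, 0.143),
    "pqk_random_8q_8L_64d": (1.735, 0.147),
    "cos_raw_1536d": (1.602, 0.104),
}


def average_ranks(values):
    """1-based ascending ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


ndtw_values = [v[0] for v in table_means.values()]
jaccard_values = [v[1] for v in table_means.values()]
rho = spearman(ndtw_values, jaccard_values)
print(round(rho, 2))
```

On these eight rows the correlation is approximately 0.92, so ranking by alignment quality and ranking by overlap quality are close to mirror images of each other.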

No direct human evaluation. Our evaluation relies on two proxies for narrative quality: alignment with historical human navigation paths and LLM-as-a-judge coherence ratings. Neither constitutes a user study with human readers evaluating extracted storylines for comprehensibility, usefulness, or engagement. Wikispeedia captures how humans navigated Wikipedia under game-like conditions [16], which may differ from how they would evaluate narrative coherence. LLM judgements reflect model-specific notions of coherence that may not align with human preferences. We mitigate this in two ways. First, we use a dual-judge protocol with two architecturally distinct frontier LLMs (Claude Sonnet 4.5 and GPT-4o) at $T = 0$ via structured tool calling and report cell-level absolute agreement ($\mathrm{ICC}_{2,1} \approx 0.49$) alongside method-level rank correlation ($ρ = 0.86$); the qualitative pool-vs-pool conclusion is preserved across both judges. Second, prior LLM-as-a-judge evaluation studies on related text-quality tasks [46,47,48] report substantial Pearson and Spearman correlation ($ρ \gtrsim 0.5$) between frontier-LLM scores and human ratings on coherence-adjacent dimensions, supporting the use of LLM judgement as a coarse proxy when matched-input pool-vs-pool comparisons are the analytical target rather than absolute coherence levels. Future work should nonetheless include human judges rating extracted storylines directly to confirm that the cross-task agreement we observe between the alignment cluster and LLM coherence is not driven by LLM-specific stylistic preferences.

6. Conclusions

We have introduced quantum kernels as coherence functions for narrative path extraction—to the best of our knowledge, the first systematic characterisation of quantum kernel methods applied to storyline extraction—and compared them to classical baselines on Wikispeedia and on a Cuban news corpus under a 12-method, 5-seed, dual-judge LLM evaluation. Our main findings are negative on quantum advantage but positive on methodology.

First, the standard Wikispeedia metrics partition into two clusters that rank methods in opposite orders: alignment-based metrics favour UMAP-projected quantum kernels and the cluster-aware classical baseline; set-overlap metrics favour random projection methods, regardless of whether the kernel is quantum or classical. Either cluster reported alone yields a misleading verdict on relative kernel quality, and we recommend reporting both for any future Wikispeedia-based evaluation. Second, on Cuba LLM-judged coherence, the cluster-aware Narrative Trails baseline is the top method under both judges; a pool-vs-pool comparison between the five genuinely quantum methods and the seven classical methods (classical_NT plus six cosine/RBF kernels on the random projection that the quantum pool also uses) shows no significant differences after Holm correction. Third, the LLM-judged coherence ranking correlates with the alignment cluster of Wikispeedia metrics ($ρ \approx +0.7$) and anti-correlates with the overlap cluster ($ρ \approx -0.6$); the same kernel property, smooth topical concentration, helps both alignment with human paths and LLM-judged narrative coherence.

The practical takeaway for NLP practitioners considering quantum kernels for document similarity is that they are competitive with classical kernels at the same projection budget but do not provide a decisive advantage on this task; the cluster-aware classical baseline retains a small but consistent edge attributable to its explicit topical structure. Future work should examine whether quantum kernels offer advantages on tasks that genuinely reward the higher-order feature interactions they encode, conduct human-judge studies to triangulate the LLM evaluation, and explore hardware implementations whose noise characteristics may differ from classical simulation.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/math14101734/s1: Section S1 (Empirical Companions to the Depth-1 Cancellation Theorem) with Figures S1–S3 (Spearman rank correlation between each quantum kernel and an RBF kernel as a function of circuit depth; exponential concentration of $R_{Y}$+CNOT kernel values with depth; bandwidth-tuning sweep against RBF on representative single-layer configurations); Section S2 (Robustness, Path Geometry, and Full-Sweep Rankings) with Figures S4–S7 (multi-seed dispersion of nDTW for the 12 LLM-evaluated methods; extracted-path length structure across methods; complete dual-cluster ranking across all 93 methods; method-by-metric rank heatmap across eight Wikispeedia metrics for all 93 methods); and Section S3 (Accompanying Code and Data) describing the minimal analysis scripts and the cached experimental outputs distributed alongside this manuscript.

Author Contributions

Conceptualization, B.K.-N.; methodology, B.K.-N., J.C., M.A., C.R.-C. and C.M.-V.; software, B.K.-N., J.C., M.A. and A.F.-B.; validation, B.K.-N., C.R.-C., C.M.-V., J.C. and M.A.; formal analysis, B.K.-N. and C.M.-V.; investigation, B.K.-N., A.F.-B., E.L.-E., J.C. and M.A.; resources, B.K.-N., J.C. and M.A.; data curation, B.K.-N.; writing—original draft preparation, B.K.-N., C.R.-C. and C.M.-V.; writing—review and editing, B.K.-N., C.R.-C., C.M.-V., E.L.-E., A.F.-B., J.C. and M.A.; visualisation, B.K.-N.; supervision, B.K.-N.; project administration, B.K.-N.; funding acquisition, B.K.-N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by ANID FONDEF grant number ID25I10072 “Narrative Panopticon: Intelligent Platform For Mapping And Monitoring Information Narratives From Multi-Source Data Streams”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included as Supplementary Material, comprising the cached experimental outputs (per-method and per-seed Wikispeedia metrics, per-pair and per-method dual-judge LLM scores, kernel-concentration and bandwidth-tuning sweeps, and timing benchmarks) and minimal Python scripts for the three quantum kernels and the per-pair timing sweep, alongside the analysis scripts that reproduce the body tables from the cached outputs. The classical pipeline (UMAP and HDBSCAN preprocessing, link-constrained MST extraction, nDTW evaluation, and Cuban news LLM judging) is unchanged from Narrative Trails [2] and is referenced rather than duplicated. Further inquiries can be directed to the corresponding author. The Wikispeedia dataset is publicly available at https://snap.stanford.edu/data/wikispeedia.html (accessed on 10 April 2026). The Cuban news corpus is available at https://github.com/briankeithn/narrative-maps (accessed on 10 April 2026).

Acknowledgments

During the preparation of this work, the authors used Claude to refine writing and support literature review activities. Additionally, Writefull integrated in Overleaf was used to improve writing quality and readability. After using these tools/services, the authors reviewed and edited the content as needed and take full responsibility for the content of the article.

Conflicts of Interest

The authors have no conflicts of interest to declare. The affiliation of the second and third authors with CoreDevX, and the ANID FONDEF funding, are both transparently disclosed in the manuscript (in the Affiliations and Funding sections, respectively).

Abbreviations

The following abbreviations are used in this manuscript:

DTW	Dynamic Time Warping
HDBSCAN	Hierarchical Density-Based Spatial Clustering of Applications with Noise
LLM	Large Language Model
MST	Maximum Spanning Tree
NLP	Natural Language Processing
QML	Quantum Machine Learning
RBF	Radial Basis Function
UMAP	Uniform Manifold Approximation and Projection

Notation Summary

Symbol	Description
D	Original embedding dimension (1536)
d	Projected dimension (2–64)
n	Number of qubits
N	Number of documents (documents referenced as $1, \dots, N$ )
L	Number of circuit layers
$π$	Projection function $\mathbb{R}^{D} \to \mathbb{R}^{d}$
$z_{i}$	Projected embedding for document i: $z_{i} = π (e_{i})$
${\hat{z}}_{i}$	HDBSCAN soft cluster assignment for document i
$U (z)$	Quantum encoding circuit
$\lvert ψ (z) \rangle$	Encoded quantum state
$k (\cdot, \cdot)$	Quantum kernel function
$θ (\cdot, \cdot)$	Document coherence function
$P^{*}$	Maximum-capacity path

References

1. Keith Norambuena, B.F.; Mitra, T.; North, C. A survey on event-based news narrative extraction. ACM Comput. Surv. 2023, 55, 1–39.
2. German, F.; Keith Norambuena, B.; North, C. Narrative Trails: A Method for Coherent Storyline Extraction via Maximum Capacity Path Optimization. In Proceedings of the Text2Story 2025: Eighth Workshop on Narrative Extraction from Texts (Text2Story@ECIR 2025), Lucca, Italy, 10 April 2025; CEUR Workshop Proceedings; CEUR-WS.org: Aachen, Germany, 2025; Volume 3964, pp. 15–27.
3. Shahaf, D.; Guestrin, C. Connecting the dots between news articles. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 623–632.
4. Havlíček, V.; Córcoles, A.D.; Temme, K.; Harrow, A.W.; Kandala, A.; Chow, J.M.; Gambetta, J.M. Supervised Learning with Quantum-enhanced Feature Spaces. Nature 2019, 567, 209–212.
5. Schuld, M.; Killoran, N. Quantum Machine Learning in Feature Hilbert Spaces. Phys. Rev. Lett. 2019, 122, 040504.
6. Huang, H.Y.; Broughton, M.; Cotler, J.; Chen, S.; Li, J.; Mohseni, M.; Neven, H.; Babbush, R.; Kueng, R.; Preskill, J.; et al. Quantum advantage in learning from experiments. Science 2022, 376, 1182–1186.
7. Liu, Y.; Arunachalam, S.; Temme, K. A rigorous and robust quantum speed-up in supervised machine learning. Nat. Phys. 2021, 17, 1013–1017.
8. Cerezo, M.; Arrasmith, A.; Babbush, R.; Benjamin, S.C.; Endo, S.; Fujii, K.; McClean, J.R.; Mitarai, K.; Yuan, X.; Cincio, L.; et al. Variational Quantum Algorithms. Nat. Rev. Phys. 2021, 3, 625–644.
9. Barzilay, R.; Lapata, M. Modeling local coherence: An entity-based approach. Comput. Linguist. 2008, 34, 1–34.
10. Johnson, W.B.; Lindenstrauss, J. Extensions of Lipschitz Mappings into a Hilbert Space. Contemp. Math. 1984, 26, 189–206.
11. Achlioptas, D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 2003, 66, 671–687.
12. West, R.; Leskovec, J. Human wayfinding in information networks. In Proceedings of the 21st international conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 619–628.
13. Schuld, M.; Petruccione, F. Quantum Models as Kernel Methods. In Machine Learning with Quantum Computers; Quantum Science and Technology; Springer: Cham, Switzerland, 2021; pp. 217–245.
14. Schuld, M.; Sweke, R.; Meyer, J.J. Effect of data encoding on the expressive power of variational quantum-machine-learning models. Phys. Rev. A 2021, 103, 032430.
15. Lloyd, S.; Schuld, M.; Ijaz, A.; Izaac, J.; Killoran, N. Quantum Embeddings for Machine Learning. arXiv 2020, arXiv:2001.03622.
16. West, R.; Pineau, J.; Precup, D. Wikispeedia: An Online Game for Inferring Semantic Distances between Concepts. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, CA, USA, 11–17 July 2009; pp. 1598–1603.
17. Keith Norambuena, B.F.; Mitra, T.; North, C. Mixed multi-model semantic interaction for graph-based narrative visualizations. In Proceedings of the 28th International Conference on Intelligent User Interfaces, Sydney, NSW, Australia, 27–31 March 2023; pp. 866–888.
18. Kruskal, J.B. On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 1956, 7, 48–50.
19. Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2002.
20. Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
21. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
22. Becht, E.; McInnes, L.; Healy, J.; Dutertre, C.A.; Kwok, I.W.; Ng, L.G.; Ginhoux, F.; Newell, E.W. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 2019, 37, 38–44.
23. Schuld, M.; Petruccione, F. Machine Learning with Quantum Computers, 2nd ed.; Springer: Cham, Switzerland, 2021.
24. Meichanetzidis, K.; Gogioso, S.; de Felice, G.; Chiappori, N.; Toumi, A.; Coecke, B. Quantum Natural Language Processing on Near-Term Quantum Computers. Electron. Proc. Theor. Comput. Sci. 2021, 340, 213–229.
25. Kartsaklis, D.; Fan, I.; Yeung, R.; Pearson, A.; Lorenz, R.; Toumi, A.; de Felice, G.; Meichanetzidis, K.; Clark, S.; Coecke, B. lambeq: An Efficient High-Level Python Library for Quantum NLP. arXiv 2021, arXiv:2110.04236.
26. Lorenz, R.; Pearson, A.; Meichanetzidis, K.; Kartsaklis, D.; Coecke, B. QNLP in practice: Running compositional models of meaning on a quantum computer. J. Artif. Intell. Res. 2023, 76, 1305–1342.
27. Widdows, D.; Aboumrad, W.; Kim, D.; Ray, S.; Mei, J. Quantum Natural Language Processing. KI-Künstl. Intell. 2024, 38, 293–310.
28. Coecke, B.; Sadrzadeh, M.; Clark, S. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguist. Anal. 2010, 36, 345–384.
29. Huang, H.Y.; Broughton, M.; Mohseni, M.; Babbush, R.; Boixo, S.; Neven, H.; McClean, J.R. Power of data in quantum machine learning. Nat. Commun. 2021, 12, 2631.
30. Thanasilp, S.; Wang, S.; Cerezo, M.; Holmes, Z. Exponential concentration in quantum kernel methods. Nat. Commun. 2024, 15, 5200.
31. Sgouros, N.M. Embedding and implementation of quantum computational concepts in digital narratives. In Proceedings of the International Conference on Entertainment Computing; Springer: Berlin/Heidelberg, Germany, 2015; pp. 140–154.
32. Zhou, D.; Guo, L.; He, Y. Neural storyline extraction model for storyline generation from news articles. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; Volume 1, pp. 1727–1736.
33. Santana, B.; Campos, R.; Amorim, E.; Jorge, A.; Silvano, P.; Nunes, S. A survey on narrative extraction from textual data. Artif. Intell. Rev. 2023, 56, 8393–8435.
34. Periyasamy, M.; Meyer, N.; Ufrecht, C.; Scherer, D.D.; Plinge, A.; Mutschler, C. Incremental Data-Uploading for Full-Quantum Classification. In Proceedings of the IEEE International Conference on Quantum Computing and Engineering (QCE), Broomfield, CO, USA, 18–23 September 2022; pp. 31–37.
35. Pérez-Salinas, A.; Cervera-Lierta, A.; Gil-Fuster, E.; Latorre, J.I. Data Re-uploading for a Universal Quantum Classifier. Quantum 2020, 4, 226.
36. Larsen, K.G.; Nelson, J. The Optimality of the Johnson-Lindenstrauss Lemma. In Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS); IEEE: Piscataway, NJ, USA, 2017; pp. 633–638.
37. Healy, J.; McInnes, L. Uniform manifold approximation and projection. Nat. Rev. Methods Prim. 2024, 4, 82.
38. McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2017, 2, 205.
39. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
40. Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 43–49.
41. Keith, B. LLM-as-a-Judge Approaches as Proxies for Mathematical Coherence in Narrative Extraction. Electronics 2025, 14, 2735.
42. Shinde, A.R.; Nurminen, J.K. Influence of Data Dimensionality Reduction Methods on the Effectiveness of Quantum Machine Learning Models. In Proceedings of the 2025 IEEE International Conference on Quantum Computing and Engineering (QCE); IEEE: Piscataway, NJ, USA, 2025.
43. Dong, W.; Charikar, M.; Li, K. Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India, 28 March–1 April 2011; pp. 577–586.
44. Bergholm, V.; Izaac, J.; Schuld, M.; Gogolin, C.; Ahmed, S.; Ajith, V.; Alam, M.S.; Alonso-Linaje, G.; AkashNarayanan, B.; Asadi, A.; et al. PennyLane: Automatic Differentiation of Hybrid Quantum-Classical Computations. arXiv 2022, arXiv:1811.04968.
45. Bender, M.A.; Farach-Colton, M. The LCA problem revisited. In Proceedings of the Latin American Symposium on Theoretical Informatics; Springer: Berlin/Heidelberg, Germany, 2000; pp. 88–94.
46. Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023.
47. Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.P.; et al. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 36.
48. Wang, J.; Liang, Y.; Meng, F.; Sun, Z.; Shi, H.; Li, Z.; Xu, J.; Qu, J.; Zhou, J. Is ChatGPT a Good NLG Evaluator? A Preliminary Study. In Proceedings of the 4th New Frontiers in Summarization Workshop, Singapore, 6 December 2023.

Figure 1. Prompt used for LLM-as-a-judge coherence evaluation. Responses are collected via structured tool/function calling that emits exactly the four integer dimension scores. Each storyline is evaluated by two judges independently (Claude Sonnet 4.5 and GPT-4o, both at $T = 0$).

Figure 2. Data re-uploading quantum circuits with $R_{Y}$ rotations and CNOT-ring entanglement. (a) Two-qubit circuit; the entangling step is drawn as a single CNOT for clarity, but the $n = 2$ ring is degenerate: the sequential sweep and the wrap-around closure act on the same pair of wires, so the implementation applies the back-to-back pair $(q_{1} \to q_{2}, q_{2} \to q_{1})$ per layer following the ring-closure rule that the figure draws explicitly at $n \geq 3$. (b) Three-qubit circuit; the third CNOT wraps to close the ring. (c) Generalised n-qubit circuit with the same ring-wrap rule. Each layer applies $R_{Y}$ rotations parameterised by n consecutive input features, followed by the CNOT-ring entangler.

Figure 3. IQP/ZZ-feature-map encoding at $n = 4$ qubits with a single layer: $H^{\otimes n}$ rotates each qubit onto the X eigenstates; then, a layer of single-qubit $R_{Z} (z_{i})$ rotations imprints each feature, and a ring of $ZZ (z_{i} z_{j})$ entanglers couples adjacent qubits using products of features. The kernel is read out via the standard compute–uncompute construction.

Figure 4. Projected quantum kernel readout. The encoding (left) is the same R_Y+CNOT-ring data re-uploading circuit shown in Figure 2, drawn compactly with an ellipsis between layer 1 and layer L. Per-qubit local Pauli expectations ($\langle X_{i} \rangle, \langle Y_{i} \rangle, \langle Z_{i} \rangle$) are assembled into a $3n$-dim vector ($r (z) \in \mathbb{R}^{3 n}$) and fed into a classical RBF kernel (Equation (8)).

Figure 5. Wikispeedia metric ranking by cluster for the 12 LLM-evaluated methods (5-seed mean per method). (a) Alignment cluster (nDTW, lower is better; UMAP-projected methods on top). (b) Overlap cluster (Jaccard, higher is better; random projection methods on top). The two rankings disagree: methods at the top of (a) are at the bottom of (b) and vice versa. The same disagreement holds across all 93 methods (Figure S6 of the Supplementary Material).

Figure 6. LLM-judged coherence on Cuban news—multi-seed dual-judge evaluation. (a) Family-level mean overall coherence with 95% bootstrap confidence intervals; both judges shown side by side. Families are split by projection where applicable, so each label states both the kernel ansatz and the projection it operates on. (b) Per-dimension means for the genuinely quantum kernel pool (5 methods) vs. the classical kernel pool on matched projection input (7 methods, including classical_NT); error bars (omitted here for clarity) are reported in Table 4 and Table 5.

Figure 7. Per-method LLM-judged coherence on Cuban news ($n = 150$ per (method, judge) cell). Both panels share the per-method y-axis. (a) Overall coherence under each judge, with methods sorted in descending order of the Sonnet 4.5 mean (classical_NT on top); 6–8 range on the 1–10 rubric. The two judges produce strongly correlated rankings (Spearman $ρ = 0.86$); GPT-4o is, on average, ∼0.4–0.5 points more lenient, but the rank order is preserved. (b) Four per-dimension strips (logical flow, thematic consistency, temporal coherence, and narrative completeness) using the same dot-pair style as (a), so per-dimension judge agreement and per-dimension method spread are both directly readable. Each strip’s x-limits are rounded to the nearest 0.5; the differing widths reflect the differing dimension levels (thematic ∼7.6–7.9 (highest); narrative completeness ∼6.0–6.5 (lowest)). Within each dimension, the 12 methods cluster within a ∼0.3-point band, consistent with the muted method-vs-method differences in (a).

Figure 8. Cross-task scatter: each panel plots the Cuba LLM’s mean overall coherence against one Wikispeedia metric, computed for the 12 methods evaluated under both protocols. Spearman $ρ$ and p-value are annotated per panel; method colours encode kernel family.

Table 1. Wikispeedia alignment-cluster metrics on 11,215 human paths (mean ± std across 5 random projection seeds). Length-normalised DTW (nDTW, lower is better) divides the raw DTW distance by alignment length; nDTW similarity is the mean cosine similarity along the alignment; per-step DTW similarity is the average over each transition (length-independent). Methods grouped by family. ↓ means lower is better, ↑ means higher is better, bold represents the best value in each column.

Method	nDTW ↓	nDTW Sim. ↑	DTW Sim./Step ↑
`classical_NT`	$1.452 \pm 0.018$	$0.539 \pm 0.016$	$0.227 \pm 0.002$
`q_umap_4q_4d`	$1.339 \pm 0.031$	$0.634 \pm 0.013$	$0.239 \pm 0.003$
`cos_umap_12d`	$1.352 \pm 0.009$	$0.624 \pm 0.013$	$0.236 \pm 0.004$
`q_random_4q_8d`	$1.726 \pm 0.059$	$0.442 \pm 0.014$	$0.232 \pm 0.003$
`q_random_8q_16d`	$1.726 \pm 0.061$	$0.438 \pm 0.027$	$0.234 \pm 0.005$
`iqpML_random_8q_2L_16d`	$1.722 \pm 0.057$	$0.437 \pm 0.022$	$0.231 \pm 0.004$
`pqk_random_8q_8L_64d`	$1.735 \pm 0.062$	$0.422 \pm 0.026$	$0.229 \pm 0.004$
`cos_random_64d`	$1.645 \pm 0.055$	$0.473 \pm 0.041$	$0.245 \pm 0.004$
`rbf_random_64d`	$1.659 \pm 0.063$	$0.486 \pm 0.010$	$0.244 \pm 0.005$
`cos_raw_1536d`^†	$1.602$	$0.556$	$0.260$

^† Cosine on the raw 1536-dim text-embedding-3-small representation; this is the natural transformer-similarity baseline. RBF on the raw embedding gives the same MST and identical metrics. Deterministic: no projection seed; hence, no ± std.
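The two length-normalised quantities in the caption of Table 1 can be illustrated with a minimal sketch. It assumes a Euclidean local cost and toy 2-D vectors standing in for document embeddings; neither is confirmed by the article, and the function names are hypothetical. Per-step DTW similarity is not covered here because its exact definition is not recoverable from the caption alone.

```python
import math

def euclidean(u, v):
    # Assumed local cost; the article's actual DTW cost is not stated here.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dtw_alignment(xs, ys, dist=euclidean):
    """Return (raw DTW distance, alignment as a list of (i, j) index pairs)."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(xs[i - 1], ys[j - 1]) + step
    # Backtrack from the end to recover the optimal warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path

def ndtw_metrics(xs, ys):
    """nDTW = raw distance / alignment length; nDTW sim = mean cosine on the alignment."""
    total, path = dtw_alignment(xs, ys)
    ndtw = total / len(path)
    ndtw_sim = sum(cosine(xs[i], ys[j]) for i, j in path) / len(path)
    return ndtw, ndtw_sim

# Toy sequences (hypothetical): an extracted path of 3 documents vs a human path of 2.
extracted = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
human = [[1.0, 0.0], [0.0, 1.0]]
print(ndtw_metrics(extracted, human))
```

Dividing by the alignment length rather than by either path length keeps the score comparable when the extracted and human paths differ in length.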

Table 2. Wikispeedia overlap-cluster metrics (mean ± std across 5 random projection seeds). Jaccard is the set overlap between the extracted path and the human path; F1 is the harmonic mean of precision and recall on those same sets. ↑: higher is better; bold marks the best value in each column.

Method	Jaccard ↑	F1 ↑
`classical_NT`	$0.116 \pm 0.005$	$0.203 \pm 0.008$
`q_umap_4q_4d`	$0.096 \pm 0.004$	$0.172 \pm 0.006$
`cos_umap_12d`	$0.098 \pm 0.007$	$0.174 \pm 0.011$
`q_random_4q_8d`	$0.138 \pm 0.008$	$0.238 \pm 0.011$
`q_random_8q_16d`	$0.143 \pm 0.011$	$0.246 \pm 0.016$
`iqpML_random_8q_2L_16d`	$0.143 \pm 0.008$	$0.245 \pm 0.012$
`pqk_random_8q_8L_64d`	$0.147 \pm 0.013$	$0.251 \pm 0.019$
`cos_random_12d`	$0.147 \pm 0.009$	$0.252 \pm 0.013$
`rbf_random_16d`	$0.143 \pm 0.011$	$0.246 \pm 0.017$
`cos_raw_1536d`^†	$0.104$	$0.184$

^† Cosine on the raw 1536-dim text-embedding-3-small representation. The transformer-similarity baseline lands in the alignment cluster, between UMAP and random-projection methods on nDTW (1.60), but is at the bottom of the overlap cluster (Jaccard, 0.10; F1, 0.18). The mechanism mirrors classical_NT: high-fidelity coherence on raw embeddings produces longer extracted paths that align well but cover few of the human-path articles.
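The set-overlap metrics of Table 2 can be sketched as follows. The article titles are hypothetical placeholders, and the function name is illustrative rather than taken from the authors' code.

```python
def jaccard_and_f1(extracted_path, human_path):
    """Set-overlap metrics between an extracted path and a human path."""
    extracted, human = set(extracted_path), set(human_path)
    overlap = len(extracted & human)
    union = len(extracted | human)
    jaccard = overlap / union if union else 0.0
    precision = overlap / len(extracted) if extracted else 0.0
    recall = overlap / len(human) if human else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return jaccard, f1

# Hypothetical example: a long extracted path that shares only its endpoints
# with a short human path.
extracted = ["A", "B", "C", "D", "E", "F"]
human = ["A", "X", "F"]
jaccard, f1 = jaccard_and_f1(extracted, human)
print(jaccard, f1)
```

For a single pair of sets, F1 equals 2J/(1 + J), so the two metrics order individual paths identically. The example also shows the mechanism described in the footnote: every extra article on the extracted path that the human did not visit enlarges the union and lowers precision, so longer paths are penalised on both metrics.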

Table 3. Cuban news LLM-as-a-judge coherence scores (1–10 scale, mean overall coherence over four dimensions). Each cell aggregates over 5 random projection seeds and 30 endpoint pairs (n = 150). The 95% confidence intervals are percentile bootstrap intervals computed from 5000 resamples. Methods are sorted by Sonnet 4.5 mean.

Method	Sonnet 4.5 (95% CI)	GPT-4o (95% CI)
`classical_NT`	$6.94$ [6.82, 7.04]	$7.39$ [7.27, 7.52]
`q_umap_4q_4d`	$6.88$ [6.79, 6.97]	$7.39$ [7.27, 7.50]
`cos_random_32d`	$6.87$ [6.76, 6.98]	$7.27$ [7.12, 7.41]
`cos_random_12d`	$6.86$ [6.75, 6.96]	$7.26$ [7.12, 7.39]
`rbf_random_32d`	$6.83$ [6.71, 6.96]	$7.25$ [7.12, 7.38]
`q_random_4q_8d`	$6.83$ [6.72, 6.94]	$7.26$ [7.13, 7.39]
`rbf_random_64d`	$6.80$ [6.68, 6.91]	$7.26$ [7.12, 7.39]
`cos_random_64d`	$6.79$ [6.68, 6.90]	$7.28$ [7.15, 7.41]
`q_random_8q_16d`	$6.79$ [6.69, 6.88]	$7.21$ [7.08, 7.34]
`rbf_random_16d`	$6.78$ [6.68, 6.88]	$7.22$ [7.09, 7.35]
`iqpML_random_8q_2L_16d`	$6.76$ [6.66, 6.85]	$7.19$ [7.06, 7.32]
`pqk_random_8q_8L_64d`	$6.72$ [6.61, 6.82]	$7.12$ [6.98, 7.26]
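A percentile bootstrap interval of the kind reported in Table 3 can be sketched as below. The 150 scores are synthetic (drawn from a clipped Gaussian purely for illustration, not the article's data), and the sketch assumes the 150 seed × endpoint-pair cells are resampled as independent observations, which the caption does not confirm.

```python
import random
import statistics

def percentile_bootstrap_ci(scores, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[int((1.0 - alpha) * n_resamples) - 1]
    return lower, upper

# Synthetic stand-in for one method's n = 150 judge scores on the 1-10 scale.
data_rng = random.Random(1)
scores = [min(10.0, max(1.0, data_rng.gauss(6.9, 0.7))) for _ in range(150)]

lower, upper = percentile_bootstrap_ci(scores)
print(f"mean = {statistics.fmean(scores):.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

If scores from the same seed or endpoint pair are correlated, a clustered resampling scheme would give wider intervals than this independent-cell sketch.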

Table 6. Spearman rank correlation between Cuba LLM coherence (mean over Sonnet 4.5 and GPT-4o) and each Wikispeedia metric, computed for the 12 methods evaluated under both protocols.

Wikispeedia Metric	Spearman $\rho$	p-Value
nDTW (lower is better)	$+0.70$	$0.012$
nDTW similarity	$+0.66$	$0.020$
Jaccard	$-0.62$	$0.033$
F1	$-0.61$	$0.036$
DTW similarity per step	$+0.12$	$0.71$
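The rank correlation in Table 6 can be sketched with a plain-Python Spearman coefficient (Pearson correlation on average ranks, so ties are handled). The inputs below are the 8 methods that appear in both Table 2 and Table 3, not the full set of 12, so the result is illustrative and is not expected to reproduce the Table 6 values. Function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# (Sonnet 4.5 mean, GPT-4o mean, Jaccard) copied from Tables 3 and 2.
rows = {
    "classical_NT": (6.94, 7.39, 0.116),
    "q_umap_4q_4d": (6.88, 7.39, 0.096),
    "q_random_4q_8d": (6.83, 7.26, 0.138),
    "q_random_8q_16d": (6.79, 7.21, 0.143),
    "iqpML_random_8q_2L_16d": (6.76, 7.19, 0.143),
    "pqk_random_8q_8L_64d": (6.72, 7.12, 0.147),
    "cos_random_12d": (6.86, 7.26, 0.147),
    "rbf_random_16d": (6.78, 7.22, 0.143),
}
# Mean over the two judges, rounded so that equal means tie exactly.
coherence = [round((s + g) / 2, 3) for s, g, _ in rows.values()]
jaccard = [j for _, _, j in rows.values()]

rho = spearman_rho(coherence, jaccard)
print(f"Spearman rho (8-method subset) = {rho:.2f}")
```

On this subset the coefficient is negative, in line with the sign reported for Jaccard in Table 6: methods that score higher with the LLM judges tend to have lower set overlap with human paths.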

Table 7. Computational cost of coherence methods (classical simulation on lightning.qubit, 5 seeds × 10 trials of 100 pairs per configuration; reference platform: 4-core Intel Xeon @ 2.1 GHz, 15 GB RAM, Ubuntu 24.04, Python 3.11, PennyLane 0.44.1, pennylane-lightning 0.44.0). Time is per pairwise coherence computation. Slowdown is relative to the cluster-aware classical_NT baseline.

Configuration	Mean Time (ms)	Slowdown	State Space
Classical kernels
`classical_NT` (angular cos + JSD topic, baseline)	0.038	1×	—
cosine (random-projection 64D)	0.005	0.14×	—
cosine (raw 1536D)	0.008	0.20×	—
RBF (random-projection 64D)	0.002	0.06×	—
RBF (raw 1536D)	0.005	0.12×	—
R_Y+CNOT-ring (compute–uncompute)
2 qubits (avg over $\{1, 2, 4, 6, 8\}$ layers)	2.16	57×	$2^{2} = 4$
4 qubits (avg)	3.71	99×	$2^{4} = 16$
8 qubits (avg)	6.83	182×	$2^{8} = 256$
8q × 1 layer (8D)	2.34	62×	256
8q × 2 layers (16D)	3.64	97×	256
8q × 4 layers (32D)	6.44	171×	256
8q × 6 layers (48D)	9.27	247×	256
8q × 8 layers (64D)	12.45	331×	256
IQP/ZZFeatureMap (compute–uncompute)
4 qubits (avg over $\{1, 2, 4, 8\}$ layers)	2.97	79×	16
8 qubits (avg over $\{1, 2, 4, 8\}$ layers)	5.48	146×	256
8q × 8 layers (64D)	10.86	289×	256
Projected quantum kernel (local Paulis + Gaussian)
4 qubits (avg over $\{1, 2\}$ layers)	3.51	94×	16
8 qubits (avg over $\{1, 2, 8\}$ layers)	8.65	230×	256
8q × 8 layers (64D)	15.15	403×	256
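The derived columns of Table 7 follow from simple arithmetic, sketched below using values printed in the table. The helper names are hypothetical. The relation "encoded dimensions = qubits × layers" is read off the row labels (for example, 8q × 2 layers is listed as 16D).

```python
BASELINE_MS = 0.038  # classical_NT mean time per pair, as printed in Table 7

def slowdown(mean_time_ms, baseline_ms=BASELINE_MS):
    """Slowdown factor relative to the classical_NT baseline."""
    return mean_time_ms / baseline_ms

def state_space(n_qubits):
    """Dimension of the simulated state vector: 2^n amplitudes."""
    return 2 ** n_qubits

def encoded_dims(n_qubits, n_layers):
    """Input dimensions consumed by the circuit (one angle per qubit per layer)."""
    return n_qubits * n_layers

# Two R_Y+CNOT-ring rows from Table 7: (qubits, layers, mean time in ms).
for qubits, layers, ms in [(8, 1, 2.34), (8, 8, 12.45)]:
    print(
        f"{qubits}q x {layers}L: {encoded_dims(qubits, layers)}D, "
        f"state space {state_space(qubits)}, slowdown ~{slowdown(ms):.0f}x"
    )
```

Ratios recomputed from the printed times can differ from the printed slowdowns by a few units (for instance, the 8q × 8 layers row), presumably because the table rounds the baseline time to three decimals before display.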

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Keith-Norambuena, B.; Canales, J.; Araya, M.; Rojas-Córdova, C.; Meneses-Villegas, C.; Lam-Esquenazi, E.; Flores-Bustos, A. Quantum Kernels for Narrative Coherence: An Application to Path Optimization in Document Graphs for Storyline Extraction. Mathematics 2026, 14, 1734. https://doi.org/10.3390/math14101734


5.4. Why `classical_NT` Wins the LLM Comparison