Early Prediction of Convergence Outcomes in Parameterized Root-Finding: A Feature-Based Study

Carpentieri, Bruno; Velichko, Andrei; Shams, Mudassir; Lecca, Paola

doi:10.3390/math14122036

Open Access Article

by Bruno Carpentieri ^1,*, Andrei Velichko ^2, Mudassir Shams ^3,4 and Paola Lecca

^1 Faculty of Engineering, Free University of Bozen–Bolzano, 39100 Bolzano, Italy

^2 Institute of Physics and Technology, Petrozavodsk State University, 185910 Petrozavodsk, Russia

^3 Department of Mathematics, Faculty of Arts and Science, Balikesir University, 10145 Balıkesir, Turkey

^4 Department of Mathematics and Statistics, Riphah International University, I-14, Islamabad 44000, Pakistan

^* Author to whom correspondence should be addressed.

Mathematics 2026, 14(12), 2036; https://doi.org/10.3390/math14122036

Submission received: 21 March 2026 / Revised: 12 May 2026 / Accepted: 5 June 2026 / Published: 7 June 2026

(This article belongs to the Special Issue Advancements in Application of Scientific Computing and Numerical Analysis)

Abstract

We study early prediction of convergence outcomes in parameterized root-finding problems. The analysis focuses on whether short prefixes of solver trajectories contain useful information about convergence within prescribed iteration horizons, without modifying the underlying numerical scheme. To avoid relying on profile-derived targets, we use solver-level success fractions $Y_{20}$, $Y_{50}$, and $Y_{100}$, defined as the fractions of trajectories that satisfy the convergence criterion within 20, 50, and 100 iterations, respectively. Within this setting, we compare several families of early solver-derived features, including residual-based, step-based, and trajectory-derived quantities. We also distinguish between direct-prefix predictors, which use the first N values of a feature family, and scalar summary predictors, in which the same prefix is compressed into a single descriptor. Numerical experiments on controlled parameterized root-finding benchmarks show that simple residual- and step-based summaries provide the strongest within-base predictors among the tested feature families. In particular, mean and median log residual and log step summaries are informative even at short prefixes. More elaborate trajectory-derived descriptors, including the kNN–LLE proxy, are less effective as standalone predictors in the present experiments. The results also show that prediction difficulty depends strongly on the benchmark structure: Dataset 1 leads to near-saturated prediction performance at very short prefixes, whereas the oscillatory Dataset 2A provides a more challenging and discriminative case. In this latter setting, compact scalar summaries at $N = 10$ can occasionally match or slightly outperform the corresponding full-prefix predictors, suggesting that scalar compression may suppress part of the transient variability present in the full-prefix representation. Cross-base transfer is substantially more difficult than within-base prediction, indicating that the observed predictive relationships are not uniformly portable across the considered benchmark problems. Overall, the study suggests that, in controlled parameterized settings of the type examined here, convergence outcomes can often be predicted from simple and computationally inexpensive early trajectory summaries. The results should be interpreted as a feature-based early convergence outcome prediction study, rather than as a general diagnostic methodology for nonlinear solvers.

Keywords: nonlinear equations; root-finding methods; early prediction; convergence outcomes; solver trajectories; feature-based prediction; data-driven methods

MSC: 65H05; 65P20; 68T09; 37M99

1. Introduction

Robust and reliable root-finding remains a central component of scientific computing, underpinning nonlinear optimization, inverse problems, parameter estimation, and the numerical solution of discretized PDE models [1,2,3]. In many applications, the practical performance of an iterative solver depends not only on the underlying function f and the initialization, but also on algorithmic hyperparameters (e.g., damping, step control, restart policies) and on the sensitivity of the iteration map to perturbations. This sensitivity is especially pronounced in high-order and parameterized iterative schemes, such as multipoint methods, where the convergence dynamics may change significantly with the choice of internal parameters [4]. As a consequence, there is growing interest in methods for assessing solver reliability across parameter domains and in identifying whether useful information about convergence behavior may already be available during the early stages of the iteration process.

Nonlinear solvers are frequently embedded within broader computational workflows, for instance in parameter estimation, inverse problems, multiphysics simulations, and nonlinear model predictive control. Nonlinear models of practical relevance arise in a wide range of applications, including biomedical and physiological modeling, bioheat transfer in therapeutic contexts, and pharmacological therapy optimization [5,6,7,8]. In such settings, as well as in large-scale steady-state computations in dynamical systems models [9], unreliable solver behavior may affect the robustness and reliability of downstream computations, as recently discussed in the context of inference for differential equation models [10]. While the present work focuses on controlled parameterized root-finding benchmarks, these broader settings motivate the study of whether early solver-derived information can provide useful indications about future convergence behavior before the maximum iteration count is reached.

Recent work has explored the integration of data-driven components into numerical algorithms, with the aim of improving robustness, efficiency, and adaptivity [11,12,13]. In the context of nonlinear solvers, these approaches include learned initialization strategies, surrogate-assisted solvers, and neural network-based root-finding methods [14,15,16]. At the same time, recent studies have also emphasized the importance of carefully assessing the scope and validity of data-driven approaches in scientific computing, particularly in relation to benchmark design, baseline comparisons, and the interpretation of predictive performance [17].

A related line of research has focused on trajectory-derived quantities motivated by dynamical systems theory. Since iterative solvers generate sequences of approximations that can be interpreted as discrete dynamical systems, quantities such as Lyapunov exponents and related finite-time indicators provide one possible way to characterize transient sensitivity and local divergence behavior [18,19,20]. These quantities can provide useful qualitative information about solver dynamics, but they are typically constructed from full trajectories and therefore are less naturally suited to early-stage prediction settings.

In earlier work [21], we introduced a contractivity profiling approach based on a kNN-derived proxy of a largest Lyapunov exponent (LLE), enabling the construction of stability landscapes over parameter domains. While that study focused on trajectory-derived stability profiles computed from complete solver histories, it also motivated a related question: to what extent can information extracted from only the early stages of the iteration process be used to anticipate convergence within prescribed iteration horizons?

The present work revisits this limitation and reformulates the underlying question in a more restricted predictive setting. Rather than asking whether a trajectory-derived quantity can itself be reconstructed from partial information, we investigate what information about later convergence outcomes may already be contained in the early stages of the iteration process. In particular, we study whether short prefixes of solver trajectories contain predictive information about convergence behavior at different iteration horizons.

To examine this question, we consider solver-level targets that measure convergence within prescribed iteration horizons. Specifically, for each parameter configuration, we define quantities such as $Y_{20}$, $Y_{50}$, and $Y_{100}$, which measure the fraction of trajectories that converge within a prescribed number of iterations. These targets provide a hierarchy of prediction tasks, ranging from short-horizon to longer-horizon convergence behavior. Compared with our earlier profile-based setting, this formulation avoids predicting a trajectory-derived quantity from its own partial evolution and instead focuses on convergence outcomes that are directly tied to solver behavior.

Within this setting, we analyze several families of early solver-derived features extracted from the initial iterations, including residual-based, step-based, and trajectory-derived quantities. The objective is not to propose a new machine learning methodology, but rather to examine which types of early-time information appear most predictive of convergence outcomes, and how this depends on the feature representation and on the prediction horizon. To further clarify the positioning of the present study with respect to related approaches, Table 1 summarizes the main differences in terms of targets, feature representations, and methodological objectives. In particular, the present study examines how early solver-derived features relate to solver-level convergence outcomes at multiple finite iteration horizons.

Within the considered benchmark setting, the numerical results indicate several recurring trends across the analyzed datasets. Predictive performance is not uniform across iteration horizons, but depends on both the target horizon and the structure of the retained features. In the present experiments, short-horizon targets (e.g., $Y_{20}$) are more strongly associated with level-type quantities reflecting the current state of the solver trajectory, whereas longer-horizon targets (e.g., $Y_{50}$ and $Y_{100}$) appear to depend more strongly on information related to the evolution of the trajectory, including trends and variability measures. These observations suggest that the predictive content of early solver-derived features is both horizon-dependent and problem-dependent within the considered benchmark problems.

A second observation concerns the relative behavior of different feature families. In the reported experiments, residual-based features generally provide the strongest predictive performance, which is consistent with their direct relation to the convergence quantities used as targets. Step-based features also remain competitive across several settings, while the Lyapunov-based proxy, although theoretically motivated, plays a more limited role as a standalone predictor in the present setting. Taken together, these comparisons help clarify which types of early solver-derived quantities appear most strongly associated with the targets $Y_{H}$ in the considered experiments.

The experiments also show substantial variability across problem instances and parameter regimes. Some benchmark settings exhibit near-saturated predictability already at very short prefixes, whereas others remain comparatively challenging even at longer horizons. In addition, cross-problem transfer is consistently more difficult than within-problem prediction, suggesting that predictive relationships learned in one setting may not generalize uniformly across different solver configurations and benchmark problems. These observations highlight both the potential usefulness and the current limitations of early convergence outcome prediction in controlled parameterized settings.

Taken together, the present study should be viewed primarily as a controlled investigation of how early solver-derived features relate to convergence outcomes within prescribed iteration horizons, rather than as a general diagnostic methodology for nonlinear solvers. The goal is to examine, in a simplified benchmark setting, which types of early-time information appear most predictive, how this depends on the prediction horizon, and to what extent compact feature representations retain useful predictive content. Our study is intentionally restricted to controlled scalar benchmark problems, with the aim of isolating and analyzing the structure of early-time predictive information in a simplified setting. Extension to higher-dimensional nonlinear systems, broader solver classes, and application-driven computational settings remains future work.

The main contributions of this work can be summarized as follows:

A reformulation of the prediction setting based on solver-level convergence targets ($Y_{20}$, $Y_{50}$, $Y_{100}$), designed to study convergence outcomes rather than proxy-derived trajectory scores.
A multi-horizon analysis of early solver-derived information, examining how predictive behavior changes across short-, intermediate-, and longer-iteration convergence targets.
A comparative study of several feature families, including residual-based, step-based, and trajectory-derived quantities, in order to assess which types of early-time information appear most predictive in the considered benchmark problems.
An empirical evaluation of both within-problem prediction and cross-problem transfer, highlighting the variability of predictive performance across different parameter regimes and benchmark settings.
A comparative predictive analysis based on a simple fixed regression model used as a feature-analysis probe, with the goal of examining the relationship between compact early-time feature representations and convergence outcomes.

The remainder of the paper is organized as follows. Section 2 introduces the considered prediction setting, including the iterative scheme, feature construction, and multi-horizon prediction protocol. Section 3 describes the experimental design and benchmark configuration. Section 4 presents the main numerical results, including horizon-dependent behavior and feature comparisons. Section 5 discusses the interpretation and limitations of the observed predictive trends. Finally, Section 6 concludes the paper and outlines directions for future work.

2. Methodology

This section describes the prediction setting and feature construction procedure used to study early solver-derived information in parameterized root-finding problems. In contrast to the earlier profile-based formulation, the present study focuses on solver-level targets that quantify convergence behavior within fixed iteration counts. The goal is to examine how information contained in short prefixes of solver trajectories relates to convergence outcomes at different iteration horizons.

Figure 1 provides a conceptual overview of the workflow. Solver ensembles are generated over a parameter grid, iteration histories are recorded, and several feature families are constructed from residual, step, and trajectory information. These feature families are then restricted to early prefixes and either used directly as prefix vectors or compressed into single scalar summaries. In the present study, the direct-prefix analyses are summarized at $N \in \{5, 10, 15\}$, whereas the scalar summary analyses are evaluated at the fixed prefix lengths $N = 3$ and $N = 10$. A fixed kNN regression model is then used to estimate solver-level targets, and the resulting performance is analyzed as a function of both the prediction horizon and the type of feature used.

2.1. Problem Setting and Iterative Scheme

We consider nonlinear root-finding problems of the form

$$f(z) = 0, \tag{1}$$

where $f : \mathbb{R} \to \mathbb{R}$ in the benchmark problems considered in this study. More generally, the same type of question may arise for systems $f : \mathbb{R}^{d} \to \mathbb{R}^{d}$, although such extensions are not investigated here. The computational backbone is a derivative-free, parallel, two-parameter iterative scheme controlled by parameters $(\alpha, \beta)$. In the present study, the scheme is used as a fixed generator of solver trajectories over the parameter grid, and no modification of the underlying numerical method is introduced. For each fixed $(\alpha, \beta)$, the solver generates a sequence of iterates $\{z_{k}\}_{k \geq 0}$ starting from an initial guess $z_{0}$.

The objective of this work is not to modify or improve the underlying numerical scheme, but rather to study how quantities extracted from the observed iteration dynamics can be used to predict convergence outcomes at prescribed iteration horizons. Auxiliary validation of the solver itself, including sensitivity to initialization and comparison with existing methods, is reported in Appendix A.1.

2.2. Solver Trajectories and Observable Quantities

Each solver run produces a trajectory $\{z_{k}\}_{k \geq 0}$, from which we extract observable quantities that summarize the evolution of the iteration. In this study, we consider the following in particular:

Residual-based quantities, when available;
Step-based quantities, such as $\| z_{k + 1} - z_{k} \|$;
Derived scalar time series constructed from the evolution of these quantities.

One quantity used throughout the analysis is the log step norm

$$y_{k} = \log \left( \| z_{k + 1} - z_{k} \| \right), \quad k = 0, 1, \dots, K - 1, \tag{2}$$

which provides a compact scalar representation of the local motion of the iterates. Step-based quantities are natural solver observables: they measure this motion directly and complement residual-based information.

2.3. Solver-Level Targets and Prediction Horizons

To quantify convergence behavior at selected iteration horizons, we define solver-level targets based on fixed iteration counts. For a given parameter configuration $(\alpha, \beta)$ and a set of initializations, we define

$$Y_{H} = \text{fraction of trajectories that converge within } H \text{ iterations}, \tag{3}$$

where $H \in \{20, 50, 100\}$ in the present study.

These quantities provide a hierarchy of prediction tasks:

$Y_{20}$ describes short-horizon convergence behavior;
$Y_{50}$ describes intermediate-horizon convergence behavior;
$Y_{100}$ describes longer-budget convergence behavior.

This formulation avoids the self-referential aspect of predicting a profile-derived quantity from its own partial trajectory, and instead focuses on quantities that are more directly tied to solver performance. It also makes it possible to examine how predictive behavior changes with the iteration horizon.

2.4. Early-Time Feature Construction

To extract information from the initial phase of the solver dynamics, we construct features from short prefixes of the trajectory. For a given prefix length N, we define the feature vector

$$X_{N} = [x_{1}, x_{2}, \dots, x_{N}], \tag{4}$$

where $x_{k}$ denotes a scalar observable derived from the solver trajectory, such as a residual-based or step-based quantity. Thus, N is the number of early iterations retained from a given feature family. In the main within-base analysis, the dependence on prefix length is visualized through curves in N; for compact reporting, the direct-prefix summaries are reported at $N \in \{5, 10, 15\}$, while the scalar summary analyses are reported at $N = 3$ and $N = 10$.

We consider multiple families of features:

Residual-based features, which describe the evolution of the residual;
Step-based features, which describe the motion of the iterates;
Trajectory-derived features, including quantities constructed from the sequence $\{y_{k}\}$;
Lyapunov-based proxy features, obtained via a kNN-based approximation of local divergence rates.

The kNN–LLE proxy is constructed by applying a delay embedding to the micro-series $y_{k}$ and estimating local divergence through multi-horizon prediction errors. This yields a profile $\lambda_{1}(t)$, which is used here as a trajectory-derived descriptor related to local stability. In the present study, this proxy is treated as one feature family among several, rather than as the central object of the analysis.

2.5. Feature Families and Scalar Summaries

To make the feature comparison more transparent and to isolate the contribution of different types of early-time information, we additionally compress each prefix $X_{N}$ into a single scalar summary $s(X_{N})$. These summaries are used to represent distinct aspects of the first N iterations:

Level: the last value, prefix mean, or prefix median;
Trend: the linear slope, last–first difference, or early–late mean difference;
Variability: the prefix standard deviation or mean absolute first difference;
Curvature: the mean second difference or related quantities.

This decomposition makes it possible to compare two complementary representations: (i) the full-prefix vector $X_{N}$, and (ii) a compressed representation in which only one scalar descriptor $s(X_{N})$ is retained. The latter viewpoint helps assess which type of early information (level, trend, variability, or curvature) is most strongly associated with predictive performance in the considered benchmark problems.
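The four summary types listed above can be illustrated with a minimal Python sketch. The function name `scalar_summaries`, the dictionary keys, and the sign convention for the early–late mean difference (second half minus first half) are assumptions made for illustration; the text does not fix these details.

```python
import numpy as np

def scalar_summaries(prefix):
    """Compress a 1-D prefix X_N into candidate scalar descriptors s(X_N)."""
    x = np.asarray(prefix, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 values to form a second difference")
    t = np.arange(n)
    d1 = np.diff(x)        # first differences
    d2 = np.diff(x, n=2)   # second differences
    half = n // 2          # assumed split point for the early/late comparison
    return {
        # Level
        "last": float(x[-1]),
        "mean": float(x.mean()),
        "median": float(np.median(x)),
        # Trend
        "slope": float(np.polyfit(t, x, 1)[0]),
        "last_minus_first": float(x[-1] - x[0]),
        "late_minus_early": float(x[half:].mean() - x[:half].mean()),
        # Variability
        "std": float(x.std()),
        "mean_abs_diff": float(np.abs(d1).mean()),
        # Curvature
        "mean_second_diff": float(d2.mean()),
    }
```

Each dictionary entry is one candidate $s(X_{N})$; a single-summary predictor would use exactly one of them as its only input.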

2.6. Multi-Horizon Prediction Setting

For each prefix length N and each horizon $H \in \{20, 50, 100\}$, we define two related prediction tasks:

$$X_{N} \longrightarrow Y_{H}, \qquad s(X_{N}) \longrightarrow Y_{H}. \tag{5}$$

The first formulation corresponds to the feature-family curves shown later in the paper, whereas the second corresponds to the single-summary predictors.

This multi-horizon setting makes it possible to examine how predictive performance changes as more early trajectory information is retained and how it depends on the temporal scale of the target. In particular, it allows us to compare short-horizon prediction, which may be more strongly associated with the current level of the trajectory, with longer-horizon prediction, where trend and variability information may become more relevant. It also makes the role of the selected early-prefix lengths explicit: $N = 3$ and $N = 10$ for the scalar summary analyses, and $N = 5$, $N = 10$, and $N = 15$ for the aggregate direct-prefix summaries.

2.7. Benchmark Design, Validation Settings, and Evaluation Metrics

The experimental setup is based on a controlled benchmark including multiple datasets, parameter grids, and feature configurations. The main text reports representative within-base results for Dataset 1 and Dataset 2A, while additional multi-dataset summaries are used to assess transfer behavior across the considered benchmark problems.

Within each dataset, performance is evaluated by a random train/test split, providing an estimate of in-distribution predictive accuracy. In addition, we consider cross-base transfer, in which a model trained on one dataset is tested on another. This setting provides a stricter comparison than within-base prediction and is used to examine how strongly the observed predictive relationships depend on the specific benchmark problem.

All predictive experiments use the same regression model family, namely k-nearest neighbors (kNN), combined with standardization and a small grid search for the neighborhood size. Model performance is evaluated by the coefficient of determination ($R^{2}$), mean absolute error (MAE), and root mean squared error (RMSE) for within-base prediction, and by the weighted absolute percentage error (WAPE) for cross-base transfer. The use of a simple regression model is intentional: the goal is not to optimize machine learning performance or to propose a new learning algorithm, but to use a fixed predictive probe for comparing feature representations and assessing the information contained in early solver trajectories.

3. Experimental Setup

This section describes the experimental design used to evaluate the early prediction of convergence outcomes within prescribed iteration horizons. The setup is designed to examine how predictive performance depends on the prefix length, the prediction horizon, and the type of feature used.

3.1. Benchmark Equations, Parameter Grid, and Computational Budget

The benchmark used in the present study is built from two nonlinear equation classes, treated here as controlled root-finding test problems. Dataset 1 uses the polynomial test problem

$$P(z) = z^{7} + 123 z^{6} - 1293 z^{3} + z^{2} - 1024 = 0, \tag{6}$$

treated with the vector-valued modified two-step scheme used in the original solver setting. Dataset 2A uses a scalar oscillatory benchmark of the form

$$q(x) = x - \eta \cos(x) = 0, \tag{7}$$

with the representative configuration $\eta = 1.8$ and center parameter $x_{0,\mathrm{center}} = -1.2$. In both cases, the solver is controlled by the same method parameters $(\alpha, \beta)$.

All experiments are performed over a two-parameter control grid $(\alpha, \beta)$ defined by

$$\alpha \in [-3, 5], \qquad \beta \in [-2, 4], \tag{8}$$

sampled on a uniform $60 \times 60$ grid over the parameter domain.

For each grid point, we generate an ensemble of $N_{\mathrm{runs}} = 100$ independent solver trajectories corresponding to distinct initializations. Each trajectory is computed for a maximum of $K = 200$ iterations, and a fixed random seed is used to ensure reproducibility across all experiments.
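As a rough illustration of the scale of this setup, the following Python sketch builds the parameter grid and a seeded ensemble of initial guesses. Including both interval endpoints in the grid, the seed value, and the uniform initialization law with `radius = 0.5` around the Dataset 2A center are assumptions; the text states only the ranges, the grid size, the ensemble size, and that a fixed seed is used.

```python
import numpy as np

ALPHA_RANGE = (-3.0, 5.0)
BETA_RANGE = (-2.0, 4.0)
GRID_SIZE = 60      # 60 x 60 uniform grid
N_RUNS = 100        # trajectories per grid point
K_MAX = 200         # maximum iterations per trajectory
SEED = 0            # hypothetical value; only "a fixed seed" is stated

alphas = np.linspace(*ALPHA_RANGE, GRID_SIZE)   # assumes endpoints are included
betas = np.linspace(*BETA_RANGE, GRID_SIZE)
A, B = np.meshgrid(alphas, betas, indexing="ij")
grid = np.column_stack([A.ravel(), B.ravel()])  # one row per (alpha, beta)

rng = np.random.default_rng(SEED)
# Hypothetical initialization law: uniform perturbations around a center.
x0_center, radius = -1.2, 0.5
z0 = x0_center + radius * rng.uniform(-1.0, 1.0, size=(grid.shape[0], N_RUNS))
```

Under these assumptions the grid has 3600 configurations and the ensemble contains 360,000 trajectories per dataset, each capped at `K_MAX` iterations.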

3.2. Trajectory Construction and Observables

From each trajectory $\{z_{k}\}_{k = 0}^{K}$, we extract scalar time series that summarize the evolution of the solver. In particular, we consider the log step norm

$$y_{k} = \log \left( \| z_{k + 1} - z_{k} \| \right), \quad k = 0, \dots, K - 1, \tag{9}$$

which provides a compact scalar representation of the local motion of the iterates.

To ensure numerical robustness, a log-domain tail floor is applied to prevent instability in extremely small step regimes. In addition, a lightweight stabilization mechanism is used to avoid numerical overflow in strongly divergent cases, while preserving the natural dynamics in non-explosive regimes.
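A minimal Python sketch of the log step norm with a log-domain floor is given here for concreteness. The function name and the floor value `log_floor = -30.0` are assumptions; the text does not state the floor, and the overflow stabilization for divergent runs is not reproduced.

```python
import numpy as np

def log_step_norms(z, log_floor=-30.0):
    """Return y_k = log(|z_{k+1} - z_k|), clipped from below at log_floor."""
    z = np.asarray(z)
    steps = np.abs(np.diff(z))             # works for real or complex scalar iterates
    with np.errstate(divide="ignore"):     # log(0) = -inf is handled by the floor
        y = np.log(steps)
    return np.maximum(y, log_floor)
```

A trajectory with `K + 1` iterates yields `K` values, matching the index range $k = 0, \dots, K - 1$.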

3.3. Definition of Solver-Level Targets

To quantify convergence behavior at selected iteration horizons, we define solver-level targets based on fixed iteration counts. For each parameter configuration $(\alpha, \beta)$, we define

$$Y_{H} = \frac{\text{number of trajectories converging within } H \text{ iterations}}{N_{\mathrm{runs}}}, \tag{10}$$

where $H \in \{20, 50, 100\}$.

These targets provide a hierarchy of prediction tasks:

$Y_{20}$ describes short-horizon convergence behavior;
$Y_{50}$ describes intermediate-horizon convergence behavior;
$Y_{100}$ describes longer-budget convergence behavior.

This formulation allows us to assess whether early-time information is associated with convergence outcomes at different temporal scales.
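The targets in Equation (10) can be computed from per-trajectory convergence times, as in the short Python sketch below. It assumes that each run records the first iteration index at which the convergence criterion is met, with `np.inf` for runs that never converge; this bookkeeping convention and the function name are illustrative assumptions.

```python
import numpy as np

def success_fractions(first_hit, horizons=(20, 50, 100)):
    """Fraction of runs whose first convergence index is at most H, for each H."""
    first_hit = np.asarray(first_hit, dtype=float)
    return {H: float(np.mean(first_hit <= H)) for H in horizons}

# Five runs at one (alpha, beta): one of them never converges.
print(success_fractions([5, 20, 21, 60, np.inf]))
```

In this example two of five runs converge within 20 iterations, three within 50, and four within 100.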

3.4. Feature Construction and Prefix Definition

For each trajectory, features are constructed from early prefixes of the observed time series. For a given prefix length N, we define

$$X_{N} = [x_{1}, x_{2}, \dots, x_{N}], \tag{11}$$

where $x_{k}$ denotes a scalar observable derived from the trajectory, such as a residual-based or step-based quantity. Thus, N is the number of early iterations retained from a given feature family. In the main within-base analysis, the dependence on prefix length is visualized through curves in N; for compact reporting, the direct-prefix summaries use $N \in \{5, 10, 15\}$, whereas the scalar summary analyses use $N = 3$ and $N = 10$.

Two complementary representations are studied. First, the prefix vector $X_{N}$ itself is used as the predictor. Second, the same prefix is compressed into a single scalar summary $s(X_{N})$. The latter representation is used to assess which aspect of the first N iterations is most informative when only one scalar descriptor is retained. The candidate summaries include level-type descriptors (e.g., last value, prefix mean, prefix median), trend-type descriptors (e.g., linear slope, last–first difference, early–late mean difference), and variability- and curvature-type descriptors (e.g., prefix standard deviation, mean absolute first difference, mean second difference).

As an additional feature source, we also consider a kNN-based proxy for local divergence rates derived from the micro-series $y_{k}$. This proxy is computed using a delay embedding with look-back length $L = 5$, forecast horizons $h \in \{1, 2, 3, 4, 5\}$, and neighborhood size $k = 3$. The resulting profile $\lambda_{1}(t)$ is used here as a trajectory-derived descriptor related to local stability, and is treated as one feature family among several.
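The exact construction of the proxy is not spelled out in this section, so the following Python sketch shows only one plausible reading of "delay embedding plus multi-horizon prediction errors". It forecasts each window's future from its `k` nearest windows and takes the slope of the mean log error against the forecast horizon as a divergence rate. The function name, the error aggregation, the regularizer `eps`, and the reduction to a single scalar (rather than a profile $\lambda_{1}(t)$) are all assumptions.

```python
import numpy as np

def knn_divergence_proxy(y, L=5, horizons=(1, 2, 3, 4, 5), k=3, eps=1e-12):
    """Hedged sketch: growth rate of kNN forecast error with forecast horizon."""
    y = np.asarray(y, dtype=float)
    h_max = max(horizons)
    n_win = y.size - L - h_max + 1           # windows that have a full future
    if n_win <= k:
        raise ValueError("series too short for the chosen L, horizons, and k")
    # Delay embedding: row i holds y[i], ..., y[i + L - 1].
    emb = np.stack([y[i:i + L] for i in range(n_win)])
    # Pairwise distances between windows, with self-matches excluded.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest windows
    log_err = []
    for h in horizons:
        future = y[np.arange(n_win) + L - 1 + h]   # true value h steps after each window
        pred = future[nbrs].mean(axis=1)           # forecast from the neighbors' futures
        log_err.append(np.log(np.abs(pred - future) + eps).mean())
    # Slope of mean log error versus horizon, used as the divergence proxy.
    return float(np.polyfit(np.asarray(horizons, dtype=float), log_err, 1)[0])
```

Applying such an estimator on sliding windows of the series would give a time-resolved profile; whether that matches the construction used in the study cannot be confirmed from the text.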

3.5. Multi-Horizon Prediction Protocol

For each prefix length N and each target horizon $H \in \{20, 50, 100\}$, we define a prediction problem of the form

$$X_{N} \longrightarrow Y_{H} \quad \text{or} \quad s(X_{N}) \longrightarrow Y_{H}. \tag{12}$$

The first formulation corresponds to the feature-family curves shown later in the paper, whereas the second corresponds to the single-summary predictors. Prefix lengths are scanned as

$$N \in \{1, 1 + \Delta_{N}, 1 + 2 \Delta_{N}, \dots\}, \tag{13}$$

where $\Delta_{N}$ denotes the scan increment, up to the available trajectory length, allowing us to evaluate how prediction accuracy changes as longer early prefixes are retained. In addition to the full curves, the manuscript explicitly discusses scalar summary analyses at $N = 3$ and $N = 10$, together with a direct-prefix reference at $N = 15$, in order to make the notion of “early” information concrete.

3.6. Train/Test Splits and Transfer Settings

Within each dataset, grid points are randomly partitioned into training and test sets with a fixed test fraction of $30\%$, providing an estimate of in-distribution predictive performance. In addition, we consider cross-base transfer, where a model trained on one dataset is tested on another. This setting provides a stricter comparison than within-dataset prediction and is used to examine how strongly the observed predictive relationships depend on the specific benchmark problem.

3.7. Prediction Model and Evaluation Metrics

For each pair $(N, H)$, a k-nearest neighbors (kNN) regressor is trained to predict $Y_{H}$ from either the prefix vector $X_{N}$ or the scalar summary $s(X_{N})$, depending on the representation being evaluated. Standardization is applied, and a small grid search is used to select the neighborhood size. The use of kNN is not intended as a machine learning contribution; rather, it provides a fixed and simple nonparametric probe for comparing feature representations under the same predictive model.
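The probe described above can be sketched in plain NumPy as follows. This is an illustrative stand-in, not the study's implementation: the candidate neighborhood sizes `k_grid`, the inner validation split used to choose `k`, the unweighted Euclidean kNN rule, and the seed are assumptions. In practice the same steps are commonly assembled from scikit-learn's `StandardScaler`, `KNeighborsRegressor`, and `GridSearchCV`.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Unweighted kNN regression with Euclidean distance."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argpartition(d, k - 1, axis=1)[:, :k]   # k nearest training rows
    return y_train[idx].mean(axis=1)

def fit_knn_probe(X, y, test_size=0.30, k_grid=(1, 3, 5, 7, 9), seed=0):
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # rows: grid points
    perm = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_size * len(y)))
    test, train = perm[:n_test], perm[n_test:]
    # Standardize with training statistics only, to avoid test leakage.
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd
    # Choose k on an inner validation split of the training set (assumed scheme).
    n_val = max(1, len(train) // 4)
    val, fit = train[:n_val], train[n_val:]

    def val_mae(k):
        return np.abs(knn_predict(Z[fit], y[fit], Z[val], k) - y[val]).mean()

    best_k = min((k for k in k_grid if k <= len(fit)), key=val_mae)
    err = knn_predict(Z[train], y[train], Z[test], best_k) - y[test]
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return {
        "k": best_k,
        "mae": float(np.abs(err).mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(1.0 - np.sum(err ** 2) / ss_tot),
    }
```

Passing a one-column `X` corresponds to a single-summary predictor $s(X_{N})$, while an `N`-column `X` corresponds to the direct-prefix predictor $X_{N}$; keeping the model fixed means differences in the returned metrics reflect the feature representation.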

Performance is evaluated on held-out test data using the mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination ($R^{2}$) for within-base prediction. For cross-base transfer, we use the weighted absolute percentage error (WAPE), which provides a transparent error measure for comparing transfer behavior across datasets. The analysis focuses not only on aggregate performance, but also on how predictive accuracy depends on the prefix length, the target horizon, and the feature family.
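For reference, the standard forms of these metrics are written out below for test targets $y_{i}$, predictions $\hat{y}_{i}$, and test mean $\bar{y}$ over $n$ held-out grid points. The symbols are introduced here for illustration, and the WAPE shown is the common definition; the text does not state whether a percent scaling or other variant is used.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_{i} - \hat{y}_{i} \rvert, &
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}, \\
R^{2} &= 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}, &
\mathrm{WAPE} &= \frac{\sum_{i=1}^{n} \lvert y_{i} - \hat{y}_{i} \rvert}{\sum_{i=1}^{n} \lvert y_{i} \rvert}.
\end{aligned}
```

Because WAPE normalizes by the total target magnitude rather than by the target variance, it remains well defined when the target distribution of the test dataset differs strongly from that of the training dataset, provided the targets are not all zero.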

4. Results

4.1. Solver-Level Target Landscapes

We begin by examining the structure of the solver-level targets $Y_{20}$, $Y_{50}$, and $Y_{100}$, defined as the fraction of trajectories reaching a prescribed residual tolerance within a fixed number of iterations. Unlike profile-derived quantities, these targets are directly tied to convergence outcomes at prescribed iteration horizons.

Figure 2 shows representative heatmaps for Dataset 1 and Dataset 2A. A basic monotonicity property is visible:

$$Y_{20} \leq Y_{50} \leq Y_{100},$$

which reflects the expansion of the success region as the allowed number of iterations increases.
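The monotonicity follows directly from the definition of the targets. As a short worked argument, let $\tau_{i}$ denote the first iteration at which trajectory $i$ reaches the tolerance, with $\tau_{i} = \infty$ if it never does; this notation, and the assumption that a run counts as converged from its first hit onward, are introduced here for illustration.

```latex
Y_{H} = \frac{1}{N_{\mathrm{runs}}} \sum_{i=1}^{N_{\mathrm{runs}}} \mathbf{1}[\tau_{i} \leq H],
\qquad
\mathbf{1}[\tau_{i} \leq 20] \;\leq\; \mathbf{1}[\tau_{i} \leq 50] \;\leq\; \mathbf{1}[\tau_{i} \leq 100]
\;\;\Longrightarrow\;\;
Y_{20} \leq Y_{50} \leq Y_{100}.
```

The ordering therefore holds at every grid point by construction, so it serves as a consistency check on the computed targets rather than as an empirical finding.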

Across both datasets, the targets exhibit a nonuniform spatial structure and visible transitions between low- and high-success regions. Dataset 1 displays relatively smooth and well-separated regions, while Dataset 2A shows more irregular and fragmented patterns. These observations indicate that the solver-level targets are not uniform over the parameter domain and therefore provide meaningful prediction tasks within the considered benchmark setting.

4.2. Within-Base Prediction of Solver-Level Targets

We next assess within-base prediction performance for the solver-level targets using early prefixes of the solver trajectories.

Figure 3 shows the prediction performance as a function of the prefix length N for the representative target

Y_{50}

. Within the considered benchmark setting, Dataset 1 behaves as an easier case: several features achieve near-saturated performance already at very short prefixes. Dataset 2A is more challenging, with the best-performing features reaching moderate

R^{2}

values, typically in the range 0.70–0.75.

A compact quantitative summary is reported in Table 2, which aggregates performance over early prefixes (

N \in \{5, 10, 15\}

). The table shows a relatively stable ranking across datasets and targets, with the strongest predictors among the considered features coming from residual and step summaries, particularly the mean and median log residual, and mean and median log step. It is important, however, to distinguish the two complementary prediction settings used throughout the paper. In the continuous curves of Figure 3, the predictor is the full-prefix vector

X_{N}

, so increasing N means retaining more coordinates from the same early trajectory. By contrast, in the fixed-prefix scalar summary analyses reported below, the same prefix is compressed into a single scalar summary before regression. Thus, the curves quantify the performance obtained from the full early prefix, whereas the scalar tables quantify how much of that information is retained under a one-scalar compression. To make this distinction explicit, we provide fixed-prefix tables for single-summary predictors at

N = 3

and

N = 10

, together with a direct comparison against the best full-prefix models and the continuous within-base curves that extend to

N = 15

.

Median-based summaries perform particularly well in the more irregular Dataset 2A, while mean-based summaries remain competitive in the smoother Dataset 1. By contrast, the Lyapunov-based feature achieves lower

R^{2}

values than the leading residual- and step-based summaries in both benchmark cases.

To clarify what is meant by a single-summary predictor, Table 3 and Table 4 report the best standalone scalar descriptor for the representative target

Y_{50}

at two fixed early-prefix lengths,

N = 3

and

N = 10

. In each entry, the first N values of one feature family are compressed into a single scalar, such as the last value, a last–first difference, a linear slope, or a variability measure, and this scalar alone is used as the input to the kNN regressor. This makes the construction of the one-scalar predictors explicit.
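A minimal sketch of this compression step is given below. The descriptor names, the least-squares form of the slope, and the use of the population standard deviation as the variability measure are assumptions made for illustration; the exact definitions used in the study may differ.

```python
import statistics

def scalar_summaries(prefix):
    """Compress a prefix of N feature values into candidate scalar descriptors."""
    n = len(prefix)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(prefix)
    # Least-squares slope of the values against the iteration index 0..N-1.
    slope = (
        sum((t - t_mean) * (v - v_mean) for t, v in enumerate(prefix))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    return {
        "last": prefix[-1],
        "last_minus_first": prefix[-1] - prefix[0],
        "slope": slope,
        "std": statistics.pstdev(prefix),
    }

# Hypothetical log10-residual prefix with N = 3.
prefix = [-1.0, -3.0, -5.0]
summaries = scalar_summaries(prefix)
direct_prefix_input = list(prefix)        # full-prefix predictor: N coordinates
scalar_input = [summaries["slope"]]       # one-scalar predictor: 1 coordinate
```

The last two lines show the difference between the two representations: the direct-prefix predictor passes all N values to the regressor, whereas the scalar predictor passes a single number.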

The fixed-prefix view highlights three observations. First, strong performance is already observed at

N = 3

, especially in Dataset 1, indicating that very short prefixes can be highly informative in the easier benchmark regime. Second, moving from

N = 3

to

N = 10

improves and stabilizes the rankings in the more challenging Dataset 2A, while Figure 3 indicates that extending the prefix further to

N = 15

yields only incremental additional gains for the leading direct-prefix models. Third, Table 5 and Table 6 make the distinction between the two representations concrete: in Dataset 1, compressing the prefix to a single scalar at

N = 3

already retains most of the predictive signal, whereas in Dataset 2A the best scalar summary at

N = 10

is slightly stronger than the best direct-prefix result. This latter observation suggests that, in the oscillatory benchmark, the full prefix may contain transient variability that is not always beneficial for prediction, and that scalar compression can sometimes suppress part of this variability. Overall, these comparisons help quantify how much predictive information is available at each early-prefix length and how much of it can be retained under a one-scalar representation.

4.3. Feature-Family Comparison and Early-Time Information

Table 2 indicates that much of the predictive signal in the considered experiments is captured by low-dimensional summaries of the early trajectory. These features can be interpreted as describing the following:

The level of the residual (mean log residual);
The robust level of the residual (median log residual);
The early step behavior (step summaries);
The variability of the early trajectory (standard deviation features).
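The four kinds of descriptors listed above can be computed from an early prefix as in the following sketch. The base-10 logarithm, the floor that guards against zero values, and all names are assumptions introduced for illustration.

```python
import math
import statistics

def early_level_features(residuals, steps, n, floor=1e-300):
    """Level, robust-level, step, and variability summaries of the first n iterations."""
    # The floor avoids log10(0) if a residual or step is exactly zero (assumption).
    log_res = [math.log10(max(r, floor)) for r in residuals[:n]]
    log_step = [math.log10(max(s, floor)) for s in steps[:n]]
    return {
        "mean_log_residual": statistics.fmean(log_res),
        "median_log_residual": statistics.median(log_res),
        "mean_log_step": statistics.fmean(log_step),
        "median_log_step": statistics.median(log_step),
        "std_log_residual": statistics.pstdev(log_res),
    }

# Hypothetical prefix with one transient dip at the third iteration.
residuals = [1e-1, 1e-2, 1e-6, 1e-3, 1e-4]
steps = [1.0, 1e-1, 1e-3, 1e-2, 1e-2]
features = early_level_features(residuals, steps, n=5)
```

In this toy prefix the single dip pulls the mean log residual below the median, which illustrates why the median acts as a more robust level descriptor for irregular trajectories.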

In the reported experiments, level and robust-level descriptors are among the most informative predictors. In particular, median-based summaries rank among the strongest predictors in the more heterogeneous or oscillatory settings considered here.

This interpretation is further supported by the fixed-prefix scalar summary analyses in Table 3 and Table 4, by the direct-prefix versus scalar comparison in Table 5 and Table 6, by the visual comparison in Figure 4, and by the broader synthesis summarized in Table 7. Across the considered targets, level-based and trend-based features are most frequently selected among the best-performing summaries, while higher-order or curvature-based descriptors play a more limited role. Lyapunov-based features also fall into this weaker category in the present benchmark setting, suggesting that the corresponding local-sensitivity descriptor is less effective as a standalone predictor of the targets

Y_{H}

considered here.

4.4. Cross-Base Transfer

We now consider cross-base transfer, where the fixed kNN regression model is trained on one dataset and tested on another. This provides a stricter comparison than within-base prediction and helps assess how sensitive the observed predictive relationships are to the specific benchmark problem.

A representative two-dataset transfer summary at

N = 15

is provided in Table 8. The median log residual gives lower transfer errors than the Lyapunov feature for several of the displayed train/test combinations, although the transfer remains asymmetric between the two datasets. In particular, transfer from Dataset 2A to Dataset 1 is more difficult than transfer in the reverse direction for the retained features. Table 9 then reports a compact two-dataset mean summary for the representative short-horizon target

Y_{20}

, averaging over the

2 \times 2

train/test combinations involving Dataset 1 and Dataset 2A. Cross-base prediction is more difficult than within-base prediction, with all features showing increased error. Nevertheless, some differences between feature families remain visible: median residual and median step features yield the lowest average WAPE values among the retained summaries, whereas the Lyapunov feature results in larger transfer errors.

The values in Table 9 should be interpreted as descriptive aggregate transfer errors rather than as formal statistical significance estimates. Since the table reports mean WAPE values over the

2 \times 2

source/target dataset pairs, lower values indicate a lower average transfer error. The relative behavior of a feature family should therefore be assessed primarily by the stability of its ranking across the early-prefix lengths

N = 5, 10, 15

, rather than by a single isolated value. The table also shows that different feature families have different sensitivity to the prefix length: median-based summaries tend to improve as N increases, whereas mean-based summaries are more stable across the displayed prefix lengths. Finally, these transfer values are influenced by the structure of the underlying nonlinear problem and by the role of each dataset as source or target, which helps account for the gap between cross-base and within-base prediction performance.

Overall, the transfer results suggest that the predictive relationships learned from early solver-derived features are not uniformly portable across the considered benchmark problems. Residual- and step-based summaries retain the lowest average transfer errors among the tested feature families, while the Lyapunov-based feature results in larger errors and stronger variation across dataset pairs. This indicates that, in the present experiments, the Lyapunov-based descriptor is less effective as a standalone transferable feature for predicting the targets

Y_{H}

.

4.5. Summary of Main Findings

The main observations from the results can be summarized as follows:

The solver-level targets $Y_{20}, Y_{50}, Y_{100}$ define nonuniform convergence targets over the parameter domain, as shown by the heatmaps in Figure 2.
Within-dataset prediction can be accurate in the considered benchmarks, particularly for Dataset 1, but prediction becomes more difficult under cross-base transfer (Figure 3 and Table 9).
Among the tested feature families, residual and step summaries provide the strongest within-base predictors in the reported experiments (Table 2).
Median-based summaries are particularly competitive in the more irregular oscillatory benchmark, while mean residual summaries become more relevant for longer-horizon targets.
The Lyapunov-based feature has a weaker standalone predictive performance than the leading residual- and step-based summaries in the considered experiments.
Much of the observed predictive signal is captured by low-dimensional descriptors of the early trajectory, as is also indicated by the scalar prefix analysis in Table 7.

5. Discussion

The results provide a focused view of how early solver-derived information relates to convergence outcomes at prescribed iteration horizons in the considered parameterized root-finding benchmarks. By replacing profile-derived proxy targets with solver-level success fractions

Y_{20}

,

Y_{50}

, and

Y_{100}

, the analysis shifts from predicting a trajectory-derived quantity to examining which types of early information are associated with convergence outcomes at different iteration horizons.

Structure of early predictive information.

Across the considered datasets and targets, the strongest predictors are generally simple scalar summaries of the residual and step sequences, particularly mean and median log residual and log step features (Table 2). The fixed-prefix scalar summary analyses (Table 3, Table 4, Table 5 and Table 6, and Figure 4) make this observation more explicit: a single scalar extracted from the first

N = 3

or

N = 10

iterations can already provide substantial predictive information, and, in the strongest cases, the continuous curves of Figure 3 show only limited additional improvement when extending the direct prefix to

N = 15

.

Within the considered experiments, the relative importance of feature families appears to depend on the target horizon: shorter horizons are more closely associated with early trajectory level, whereas longer horizons appear to benefit from information about the evolution of the trajectory. These observations suggest that early prediction in this benchmark setting is strongly influenced by level information, namely the magnitude and initial decay behavior of the residual. By contrast, variability measures and Lyapunov-based features provide a weaker standalone predictive performance in the present experiments. This indicates that, for the targets considered here, much of the useful predictive signal is captured by relatively simple early trajectory summaries rather than by more elaborate trajectory-derived descriptors.

Dataset-dependent regimes.

The results also show a visible distinction between easier and more challenging prediction regimes within the considered benchmarks. In Dataset 1, the prediction problem is close to saturated: leading features achieve near-perfect

R^{2}

values at very short prefixes (Figure 3, left panel). By contrast, Dataset 2A provides a more challenging case, where prediction performance remains moderate and feature rankings become more discriminative (Figure 3, right panel). The contrast between Dataset 1 and Dataset 2A indicates that predictive difficulty is strongly problem-dependent, even within controlled scalar benchmarks. This supports the importance of including sufficiently heterogeneous benchmark settings when evaluating feature-based early prediction approaches.

Role of robust summaries.

A recurring observation in the reported experiments is the strong performance of median-based summaries, especially for shorter horizons such as

Y_{20}

(Table 2 and Table 7). The fixed-prefix tables further show that, in the more irregular Dataset 2A, simple level-type summaries remain competitive already at

N = 3

, whereas increasing the prefix length to

N = 10

stabilizes the ranking and allows a variability-based step summary to slightly outperform the direct-prefix baseline. Compared with mean-based features, median statistics appear to provide a more stable characterization of early dynamics in the presence of oscillatory or heterogeneous trajectories. As the target horizon increases, a gradual shift toward mean-based residual summaries is observed, suggesting that longer-horizon prediction is more closely associated with the overall trajectory level than with very early transient behavior.

Cross-base transfer.

The cross-base transfer results (Table 8 and Table 9) provide a stricter comparison than within-base prediction. In this setting, residual and step summaries yield the lowest transfer errors among the tested feature families, with the median residual and median step features achieving the lowest WAPE values in the representative two-dataset screen for

Y_{20}

. In contrast, the Lyapunov-based feature results in larger and less stable transfer errors, suggesting that it is less effective as a standalone transferable descriptor in the present benchmark setting. The two-dataset transfer matrices also reveal an asymmetry between source and target datasets: some configurations are more difficult as targets than as sources, indicating that the observed predictive relationships do not transfer uniformly across the considered parameter regimes.

Implications for early convergence outcome prediction.

From a practical perspective, the results suggest that, in controlled parameterized settings of the type considered here, useful early convergence outcome predictors can be built from relatively simple solver-derived quantities. In particular, residual- and step-based summaries are inexpensive to compute and provide transparent information about the early state and evolution of the iteration. This supports the use of simple feature representations as a first screening tool for parameter configurations, without requiring more complex trajectory-derived indicators as standalone predictors.

A natural scenario is repeated parameter exploration, where many related nonlinear solves must be executed over a grid or during calibration loops. In such settings, an early prediction layer could be used to flag parameter configurations that appear likely to converge within a prescribed number of iterations, or conversely configurations that may require further attention before the maximum iteration count is reached. Similar ideas may also be relevant in repeated steady-state computations or inverse-problem workflows, where the same solver is called many times under changing parameters. These examples should be interpreted as plausible application scenarios rather than validated application claims for the present benchmark, and direct domain-specific validation remains an important topic for future work.

Limitations and outlook.

The present study is intentionally limited to controlled scalar benchmark problems and to a fixed parameterized root-finding setting. This restriction makes it possible to isolate feature-level effects, but it also means that the results should not be interpreted as establishing a general diagnostic methodology for nonlinear solvers. In particular, the present analysis does not identify root causes of solver failure, such as ill-conditioning, stiffness, bifurcation effects, or basin boundary phenomena; rather, it studies the early prediction of convergence within prescribed iteration horizons from solver-derived features.

The study also focuses on single-feature predictors and on a simple fixed regression model used as a predictive probe. This provides a controlled setting for comparing feature representations, but it does not address the potential benefits of feature combinations, alternative learning models, or adaptive decision rules. In addition, although the cross-base experiments provide a first indication of transfer behavior within the considered benchmarks, broader validation across different equation classes, higher-dimensional nonlinear systems, PDE-based problems, and solver families remains an important direction for future work. Extending the analysis to more heterogeneous problem settings may further clarify when early solver-derived information can support practical solver selection or parameter screening.

Summary of main observations.

In the considered benchmarks, early prediction is strongly associated with simple level-based features, especially mean and median residual and step summaries.
The fixed-prefix scalar summary analyses at $N = 3$ and $N = 10$ indicate that substantial predictive information can already be present at very short prefixes, with only modest additional gains by $N = 15$ for the leading direct-prefix families.
Dataset structure has a strong effect on predictive difficulty, with the oscillatory benchmark providing a more discriminative test case than the smoother polynomial benchmark.
Median-based summaries are particularly competitive at shorter horizons, while mean residual summaries become more relevant for longer-horizon targets.
Residual- and step-based summaries show lower transfer errors than the Lyapunov-based feature in the considered cross-base experiments.
In the more irregular oscillatory benchmark, compact scalar summaries can occasionally match or slightly outperform the full-prefix representation, suggesting that scalar compression may suppress part of the transient variability present in the full prefix.
In controlled parameterized settings of the type studied here, simple and computationally inexpensive solver-derived features can provide useful early predictors of convergence outcomes.

6. Conclusions

This work studied the early prediction of convergence outcomes in a controlled parameterized root-finding setting. The analysis was based on solver-level targets defined as success fractions at prescribed iteration horizons, namely

Y_{20}

,

Y_{50}

, and

Y_{100}

. This formulation shifts the focus from predicting a profile-derived score to examining how early solver-derived features relate to convergence outcomes at different iteration horizons.

The results show that, within the considered benchmarks, simple scalar summaries of the residual and step sequences are among the most informative predictors. In particular, mean and median log residual and log step features provide strong within-base performance in the reported experiments. The fixed-prefix analysis at

N = 3

and

N = 10

, together with the continuous curves extending to

N = 15

, indicates that substantial predictive information can already be present in very short prefixes. In the more irregular oscillatory benchmark, compact single-summary predictors can occasionally match or slightly outperform the corresponding full-prefix models, suggesting that scalar compression may suppress part of the transient variability present in the full-prefix representation. By contrast, more elaborate trajectory-derived descriptors, including Lyapunov-based features, provide weaker standalone predictive performance in the present experiments. Overall, these findings suggest that, for the targets considered here, much of the useful predictive signal is captured by relatively simple early trajectory summaries.

The analysis also highlights the importance of benchmark structure. While some configurations lead to nearly saturated prediction performance, the more irregular oscillatory benchmark exposes clearer differences between feature families and provides a more discriminative evaluation setting. In addition, the cross-base transfer experiments show that out-of-dataset prediction is more difficult than within-dataset prediction. Residual- and step-based summaries achieve the lowest transfer errors among the tested feature families, whereas the Lyapunov-based feature degrades more substantially. This underscores the gap between within-base predictability and transfer across the considered benchmark problems.

From a practical standpoint, these findings suggest that, in controlled parameterized settings of the type studied here, simple solver-derived features can provide useful early predictors of convergence outcomes. Such features are computationally inexpensive and may support the preliminary screening of parameter configurations before the maximum iteration count is reached. This possible use should be interpreted as a direction for further investigation rather than as a validated deployment claim.

Future work will extend the present analysis in several directions, including the study of multi-feature models, alternative predictive probes, broader cross-problem validation, and adaptive decision rules. A key next step is to assess whether the observed feature rankings and early prediction behavior persist for higher-dimensional nonlinear systems, PDE-based problems, and broader solver families. Another important direction is to move beyond convergence outcome prediction toward diagnostic analyses that can identify the possible causes of solver failure, such as ill-conditioning, stiffness, bifurcation effects, or basin boundary phenomena.

Author Contributions

Conceptualization, B.C., A.V. and M.S.; Methodology, B.C., A.V. and M.S.; Software, A.V. and M.S.; Validation, B.C., A.V. and M.S.; Formal analysis, B.C.; Investigation, B.C., A.V. and M.S.; Data curation, A.V. and M.S.; Writing—original draft preparation, B.C., A.V. and M.S.; Writing—review and editing, B.C., A.V. and P.L.; Visualization, A.V., M.S. and P.L.; Supervision, B.C. and A.V.; Project administration, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

The work reported in Section 3 and Section 4 was supported by the Russian Science Foundation (grant no. 22-11-00055-P, https://rscf.ru/en/project/22-11-00055/, accessed on 10 June 2025) (Andrei Velichko). The material reported in Section 1 and Section 5 was supported by the European Regional Development and Cohesion Funds (ERDF) 2021–2027 under Project AI4AM—EFRE1052 (Bruno Carpentieri). Bruno Carpentieri is also a member of the Gruppo Nazionale per il Calcolo Scientifico (GNCS) of the Istituto Nazionale di Alta Matematica (INdAM).

Data Availability Statement

The data and code supporting the findings of this study are publicly available on Zenodo at https://doi.org/10.5281/zenodo.19023064.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
CPU	Central Processing Unit
kNN	k-Nearest Neighbors
LLE	Largest Lyapunov Exponent
MAE	Mean Absolute Error
ML	Machine Learning
ODE	Ordinary Differential Equation
PDE	Partial Differential Equation
RMSE	Root Mean Squared Error
WAPE	Weighted Absolute Percentage Error

Appendix A. Supplementary Results and Solver Validation

This appendix provides complementary material supporting the early-time prediction analysis, and reports auxiliary information on the numerical behavior of the underlying solver.

Appendix A.1. Auxiliary Solver Validation and Sensitivity to Initialization

We report auxiliary solver-level validation results documenting the numerical behavior of the underlying parallel scheme. While these results are not part of the feature-based prediction study itself, they provide additional context on the convergence behavior of the solver used to generate the trajectories analyzed in the main text.

All experiments are performed at the representative parameter pair

(α, β) = (-0.1, 4.0)

, selected from a stable region of the parameter domain.

To examine sensitivity with respect to initialization, we consider three representative strategies:

Near-root: $z_{k}^{(0)} = α_{k} + 10^{-2} ε_{k}, \quad ε_{k} \in [-1, 1], \quad k = 1, \dots, 7.$
Moderate: $z_{k}^{(0)} = r e^{\frac{2 π i (k - 1)}{7}}, \quad r = 5, \quad k = 1, \dots, 7.$
Random: $z_{k}^{(0)} \sim U([-10, 10] + i [-10, 10]), \quad k = 1, \dots, 7.$

The convergence behavior of simultaneous root-finding methods depends on the quality of the initial approximations. To compare the behavior under different starting conditions, we use the three initialization strategies defined above and monitor the resulting error dynamics. For each case, we record the maximum per-root error at iteration k,

E^{(k)} = \max_{1 \leq i \leq 7} |z_{i}^{(k)} - ξ_{i}|. \qquad (A1)
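The three initialization strategies and the error measure (A1) can be sketched as follows. The benchmark roots are not given in this appendix, so the code uses the seventh roots of unity as hypothetical stand-ins; it also assumes that the near-root strategy perturbs the same exact roots that appear in (A1), and that each approximation is paired with the root of the same index.

```python
import cmath
import random

def near_root_init(roots, rng, scale=1e-2):
    """Exact roots plus a real perturbation scale * eps, eps uniform in [-1, 1]."""
    return [r + scale * rng.uniform(-1.0, 1.0) for r in roots]

def moderate_init(n=7, radius=5.0):
    """n equally spaced points on a circle of the given radius."""
    return [radius * cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

def random_init(n, rng, half_width=10.0):
    """Points drawn uniformly from the square [-10, 10] + i[-10, 10]."""
    return [
        complex(rng.uniform(-half_width, half_width),
                rng.uniform(-half_width, half_width))
        for _ in range(n)
    ]

def max_root_error(z, roots):
    """Maximum per-root error, pairing z[i] with roots[i] as in (A1)."""
    return max(abs(zi - xi) for zi, xi in zip(z, roots))

# Hypothetical roots (seventh roots of unity), used only for illustration.
roots = [cmath.exp(2j * cmath.pi * k / 7) for k in range(7)]
rng = random.Random(0)

err_near = max_root_error(near_root_init(roots, rng), roots)
err_moderate = max_root_error(moderate_init(), roots)
z_random = random_init(7, rng)
```

With these stand-in roots, the moderate initialization lies at distance 4 from every root, while the near-root initialization starts within 10^{-2} of each root.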

The convergence histories are shown in Figure A1, while Table A1 summarizes the early-iteration error decay.

Figure A1. Early-iteration convergence of the proposed parallel scheme under different initialization scenarios, showing

\log_{10} (E^{(k)})

versus the iteration index. Faster entry into the asymptotic regime is observed for near-root initialization, while moderate and random initializations still exhibit rapid error decay in this representative setting.

Table A1. Early-iteration error behavior of the proposed scheme under different initialization scenarios.

Iteration k	Near-Root Init.	Moderate Init.	Random Init.	Order (Approx.)
1	$1.26 \times 10^{- 2}$	$3.35 \times 10^{- 1}$	$8.44 \times 10^{- 1}$	–
2	$2.86 \times 10^{- 7}$	$5.71 \times 10^{- 5}$	$3.47 \times 10^{- 4}$	3.92
3	$4.66 \times 10^{- 21}$	$7.44 \times 10^{- 9}$	$9.78 \times 10^{- 10}$	4.15
4	$2.16 \times 10^{- 59}$	$2.43 \times 10^{- 17}$	$4.72 \times 10^{- 27}$	4.97
5	$6.38 \times 10^{- 75}$	$8.04 \times 10^{- 49}$	$7.01 \times 10^{- 25}$	5.67
6	$9.17 \times 10^{- 191}$	$0.21 \times 10^{- 160}$	$3.47 \times 10^{- 90}$	5.97

The results show that closer initial guesses lead to faster entry into the asymptotic high-order convergence regime. At the same time, moderate and random initializations also exhibit rapid error decay within a few iterations in this representative setting. This behavior provides a consistent numerical basis for analyzing early solver trajectories in the feature-based prediction study.
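A common way to approximate the convergence order from an error history is the three-point estimate p_k ≈ log(E^{(k+1)} / E^{(k)}) / log(E^{(k)} / E^{(k-1)}). The appendix does not state which estimator produced the last column of Table A1, so the sketch below should be read as a generic illustration that need not reproduce those entries. The error sequence is synthetic.

```python
import math

def estimated_orders(errors):
    """Three-point order estimates from successive errors E^(k-1), E^(k), E^(k+1)."""
    logs = [math.log(e) for e in errors]
    return [
        (logs[k + 1] - logs[k]) / (logs[k] - logs[k - 1])
        for k in range(1, len(logs) - 1)
    ]

# Synthetic history in which each error is the square of the previous one.
errors = [1e-1, 1e-2, 1e-4, 1e-8]
orders = estimated_orders(errors)
```

For this synthetic quadratically convergent sequence, both estimates are approximately 2. In double precision the estimate is only usable while the errors stay above the underflow threshold, so very small entries such as those in the last rows of Table A1 would normally require extended-precision arithmetic.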

Comparison with Existing Parallel Schemes

For completeness, we report a compact comparison between the proposed scheme and representative parallel methods from the literature [22,23]. Table A2 summarizes the number of iterations to convergence, CPU time, memory usage, and maximum error, while Table A3 reports the theoretical and observed convergence orders.

Table A2. Computational performance of the proposed solver compared with existing parallel schemes.

Method	Iterations	CPU Time (s)	Memory (MB)	Maximum Error
Proposed Scheme [21]	5	0.012	123.65	$2.05 \times 10^{- 189}$
Existing Scheme [22]	11	0.018	256.76	$5.14 \times 10^{- 45}$
Existing Scheme [23]	10	0.015	254.89	$0.77 \times 10^{- 37}$

Table A3. Theoretical and observed convergence orders of the compared methods.

Method	Order (Theoretical)	Order (Observed)
Proposed Scheme [21]	6	5.97
Existing Scheme [22]	6	4.56
Existing Scheme [23]	6	3.19

In this representative comparison, the proposed scheme reaches the prescribed tolerance in fewer iterations and achieves smaller final errors than the comparison methods, while maintaining an observed convergence order close to the theoretical sixth order. These auxiliary results provide additional context on the numerical behavior of the solver used to generate the trajectory data analyzed in the main manuscript.

Appendix A.2. Additional Prediction Results (Supplementary)

Structure of solver-level targets.

To illustrate the structure of the solver-level targets, we report representative heatmaps of the success fraction quantities

Y_{20}, Y_{50}, Y_{100}

across selected benchmark datasets.

A basic property of these targets is the monotonicity

Y_{20} \leq Y_{50} \leq Y_{100},

which follows from the fact that increasing the allowed number of iterations can only increase the fraction of trajectories that satisfy the convergence criterion. This behavior is reflected in the heatmaps:

Y_{20}

highlights regions of rapid convergence, whereas

Y_{100}

yields broader regions of high success.

Beyond this monotonic structure, it is also useful to examine how the three horizons differ across datasets. In this sense, the targets are not redundant:

Y_{20}

captures stricter early convergence,

Y_{50}

represents an intermediate regime, and

Y_{100}

describes a more permissive notion of solver success. The progression across these targets provides a compact visualization of how convergence regions expand as the allowed number of iterations increases.

Figure A2 and Figure A3 reproduce the solver target heatmaps shown in the main text and are included here for completeness. In Dataset 1, the spatial structure is relatively regular and reflects the underlying parameter sweep

(α, β)

. By contrast, Dataset 2A exhibits more irregular patterns, suggesting a stronger sensitivity to the problem dynamics. Taken together, these visualizations indicate that the resulting prediction tasks are benchmark-dependent and nonuniform over the parameter domain.

Figure A2. Solver fraction heatmaps for Dataset 1 (polynomial benchmark). The axes correspond to the method parameters

(α, β)

, and the panels illustrate increasing iteration horizons.

Figure A3. Solver fraction heatmaps for Dataset 2A (oscillatory benchmark). Compared with Dataset 1, the structure is more irregular, providing a more challenging benchmark within the considered study.

Overall, these visualizations indicate that the solver-level targets are nonuniform over the parameter domains and provide a meaningful basis for the prediction analysis developed in the main text.

Prediction behavior in a more challenging regime.

To complement the within-base results reported in the main text, we briefly emphasize the behavior observed in Dataset 2A, which provides a more challenging prediction setting within the considered benchmarks. In contrast to Dataset 1, where prediction performance is nearly saturated at very short prefix lengths, Dataset 2A exhibits a more gradual increase in accuracy as the prefix length grows.

Within this setting, residual- and step-based summaries are the most informative among the tested feature families, achieving moderate

R^{2}

values, while Lyapunov-based descriptors are less effective as standalone predictors. This indicates that Dataset 2A provides a more discriminative setting for comparing feature families than the smoother Dataset 1 benchmark.

These observations are consistent with the trends reported in the main text and support the use of Dataset 2A as a representative challenging case for assessing early-time predictive performance within the scope of the present study.

Additional transfer results.

To further summarize cross-base transfer behavior within the considered benchmarks, we report a compact summary based on the mean WAPE over the train/test dataset pairs.

Table A4. Cross-base transfer performance (mean WAPE). Lower values indicate lower average transfer error.

Feature	$N = 5$	$N = 10$	$N = 15$
Mean log residual	1.009	1.055	1.077
Mean log step	1.069	1.032	1.034
Median log residual	1.220	0.787	0.737
Median log step	1.062	0.808	0.767
Lyapunov raw	–	1.740	1.796

The results indicate that cross-base transfer is more challenging than within-base prediction in the considered benchmarks. Residual- and step-based summaries achieve lower average transfer errors than the Lyapunov-based descriptor, which is less effective as a standalone transferable feature in this setting.
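For orientation, the sketch below computes WAPE under its common definition, the sum of absolute errors divided by the sum of absolute target values. We assume here that this matches the definition given in the main text; the function name `wape` and the sample values are illustrative only.

```python
def wape(y_true, y_pred):
    """Weighted absolute percentage error: sum|y - y_hat| / sum|y|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    denom = sum(abs(t) for t in y_true)
    if denom == 0:
        raise ValueError("WAPE is undefined when all targets are zero")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / denom


# Hypothetical solver fractions at four grid points and two sets of predictions.
y_true = [0.2, 0.5, 0.8, 1.0]
close_pred = [0.3, 0.5, 0.6, 0.9]  # small errors
poor_pred = [0.6, 1.5, 2.4, 3.0]   # three times the true values

wape_close = wape(y_true, close_pred)
wape_poor = wape(y_true, poor_pred)
print(round(wape_close, 3), round(wape_poor, 3))
```

Under this definition, a value above 1, as in several entries of Table A4, means that the total absolute error exceeds the total magnitude of the targets.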

Scalar prefix analysis (summary).

Additional scalar prefix analyses provide further insight into how different types of early-time information relate to convergence outcomes at different iteration horizons. In particular, the results suggest that short-horizon targets are strongly associated with the current level of the early trajectory, as captured by simple residual-based summaries.

As the prediction horizon increases, information related to the evolution of the trajectory, including trend- and step-based summaries, appears to become more relevant. This shift from level-based to more evolution-sensitive summaries is consistent with the horizon-dependent behavior discussed in the main text.

These observations help explain why simple statistical features perform well in the considered benchmarks and relate the scalar prefix results to the broader feature-family comparisons reported in the main text.
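To make the distinction between level-based and evolution-sensitive summaries concrete, the following sketch computes, from one prefix, the scalar summaries named in Tables 3 and 4 (last value, $Δ$ last–first, linear slope, mean absolute difference, mean second difference, and early–late mean difference). The exact formulas used in the study are not restated in this appendix, so the definitions below are assumptions, in particular the midpoint split and the late-minus-early sign of the early–late difference. The function name `scalar_summaries` is hypothetical.

```python
from statistics import mean


def scalar_summaries(prefix):
    """Compress a prefix (first N values of one feature family) into scalars."""
    n = len(prefix)
    if n < 3:
        raise ValueError("need at least 3 values for second differences")
    diffs = [b - a for a, b in zip(prefix, prefix[1:])]
    second = [b - a for a, b in zip(diffs, diffs[1:])]
    # Least-squares slope of the prefix against the iteration index 0..n-1.
    t_mean = (n - 1) / 2
    x_mean = mean(prefix)
    slope = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(prefix)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    half = n // 2  # assumed split point between "early" and "late"
    return {
        "last_value": prefix[-1],                   # level
        "delta_last_first": prefix[-1] - prefix[0],  # net change
        "linear_slope": slope,                       # trend
        "mean_abs_diff": mean([abs(d) for d in diffs]),
        "mean_second_diff": mean(second),            # curvature
        "early_late_mean_diff": mean(prefix[half:]) - mean(prefix[:half]),
    }


# Hypothetical prefix: a log residual that drops by one unit per iteration.
summaries = scalar_summaries([0.0, -1.0, -2.0, -3.0])
print(summaries)
```

In this example only the last value reflects the current level of the trajectory, whereas the remaining summaries describe how it evolves over the prefix.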

Figure 1. Conceptual pipeline of the feature-based early convergence prediction workflow. Solver ensembles are generated over a parameter grid, producing iteration histories from which multiple feature families are extracted, including residual-based, step-based, and kNN–LLE trajectory-derived descriptors. These families are then restricted to early prefixes and can be interpreted either as prefix vectors or as single scalar summaries. In the present study, the scalar summary analyses focus on $N = 3$ and $N = 10$, while the direct-prefix comparisons are summarized at $N = 5$, $N = 10$, and $N = 15$. The resulting predictors are used to estimate level targets $Y_{H}$, with $H \in \{20, 50, 100\}$, and are evaluated through within-base and cross-base validation.

Figure 2. Solver fraction heatmaps for two representative benchmark configurations. (Top row): Dataset 1 (polynomial benchmark). (Bottom row): Dataset 2A (oscillatory benchmark). Columns correspond to increasing iteration horizons ($Y_{20}$, $Y_{50}$, $Y_{100}$). Both cases exhibit nonuniform spatial patterns and a monotonic expansion as the allowed number of iterations increases.

Figure 3. Within-base prediction performance for the solver target $Y_{50}$ as a function of prefix length $N$ in the direct-prefix setting, where the first $N$ values of one feature family are used jointly as the predictor. The seven displayed curves are colored as follows: black for Lyapunov raw; blue, orange, and green for the mean, median, and standard deviation of the log residual; and red, purple, and brown for the mean, median, and standard deviation of the log step. Dataset 1 (left) exhibits near-saturated performance from very short prefixes, whereas Dataset 2A (right) provides a more challenging case with moderate but structured predictability. In both panels, the Lyapunov curve remains below the leading residual- and step-based curves. The dashed vertical guides at $N = 3$, $N = 10$, and $N = 15$ mark, respectively, the two scalar summary prefix lengths and an additional direct-prefix reference used in the fixed-prefix comparisons reported above.

Figure 4. Heatmap view of the best single-summary prediction performance for the representative target $Y_{50}$ at fixed early-prefix lengths $N = 3$ and $N = 10$. Each row corresponds to a retained feature family, and each cell reports the best achieved $R^{2}$ when the first $N$ values of that family are compressed into a single scalar summary. The figure complements the continuous curves of Figure 3, which show the corresponding direct-prefix analysis up to $N = 15$.

Table 1. Comparison of the present study with related approaches for stability analysis and data-driven prediction in iterative solvers.

Approach	Target	Features	ML role	Objective
Classical stability analysis	Stability indicators (e.g., LLE)	Full trajectories	None	Qualitative or quantitative stability analysis
Data-driven prediction (generic)	Time series outcomes	Raw or embedded signals	Direct prediction	Forecasting future behavior
Profiling-based analysis (prior work)	Proxy-derived metrics	Trajectory-derived signals	Auxiliary prediction	Exploration of trajectory-based stability behavior
Present study	Solver-level targets $Y_{H}$	Early solver-derived features	Simple regression probe	Analysis of early-time predictive information

Table 2. Early-prefix within-base prediction performance ($R^{2}$). For each feature and target, we report the mean $R^{2}$ over early prefixes ($N \in \{5, 10, 15\}$; for Lyapunov, $N \in \{10, 15\}$). Results are shown for a representative easy case (Dataset 1) and a more challenging oscillatory case (Dataset 2A).

	Dataset 1			Dataset 2A
Feature	$Y_{20}$	$Y_{50}$	$Y_{100}$	$Y_{20}$	$Y_{50}$	$Y_{100}$
Mean log residual	0.997	0.998	0.997	0.612	0.632	0.642
Median log residual	0.997	0.998	0.995	0.709	0.690	0.684
Std log residual	0.982	0.971	0.791	0.644	0.666	0.666
Mean log step	0.995	0.996	0.992	0.719	0.715	0.708
Median log step	0.995	0.993	0.993	0.621	0.579	0.572
Std log step	0.983	0.953	0.991	0.662	0.663	0.656
Lyapunov raw	0.428	0.382	0.379	0.205	0.208	0.211

Table 3. Best single-summary predictors for $Y_{50}$ in Dataset 1 at fixed early-prefix lengths. Each cell reports the best scalar summary extracted from the first $N$ values of the corresponding feature family, followed by the achieved $R^{2}$.

Feature Family	$N = 3$ (Best Scalar, $R^{2}$ )	$N = 10$ (Best Scalar, $R^{2}$ )
Mean log residual	$Δ$ (last–first), 0.970	$Δ$ (last–first), 0.949
Median log residual	early–late mean diff., 0.780	linear slope, 0.941
Std log residual	mean $\| Δ x \|$ , 0.843	mean $\| Δ x \|$ , 0.751
Mean log step	mean 2nd diff., 0.833	mean 2nd diff., 0.953
Median log step	last value, 0.776	mean $\| Δ x \|$ , 0.868
Std log step	last value, 0.665	$Δ$ (last–first), 0.562

Table 4. Best single-summary predictors for $Y_{50}$ in Dataset 2A at fixed early-prefix lengths. Each cell reports the best scalar summary extracted from the first $N$ values of the corresponding feature family, followed by the achieved $R^{2}$.

Feature Family	$N = 3$ (Best Scalar, $R^{2}$ )	$N = 10$ (Best Scalar, $R^{2}$ )
Mean log residual	last value, 0.562	last value, 0.752
Median log residual	last value, 0.538	last value, 0.731
Std log residual	last value, 0.366	mean $\| Δ x \|$ , 0.594
Mean log step	last value, 0.467	last value, 0.768
Median log step	last value, 0.498	early–late mean diff., 0.674
Std log step	last value, 0.457	mean $\| Δ x \|$ , 0.772

Table 5. Dataset 1: best direct-prefix and single-summary predictors for the representative target $Y_{50}$ at selected early-prefix lengths. Direct-prefix models use the full-prefix vector $X_{N}$ of one feature family, whereas single-summary models use only one scalar extracted from the same prefix. The scalar summary screen was evaluated at $N = 3$ and $N = 10$, while the direct-prefix curves also extend to $N = 15$.

Prefix Length	Best Direct-Prefix Predictor	Best Single-Summary Predictor
$N = 3$	Mean log residual, $R^{2} = 0.998$	Mean log residual ( $Δ$ last–first), $R^{2} = 0.970$
$N = 10$	Mean log residual, $R^{2} = 0.999$	Mean log step (mean 2nd diff.), $R^{2} = 0.953$
$N = 15$	Mean log residual, $R^{2} = 0.998$	not evaluated in the scalar screen

Table 6. Dataset 2A: best direct-prefix and single-summary predictors for the representative target $Y_{50}$ at selected early-prefix lengths. Direct-prefix models use the full-prefix vector $X_{N}$ of one feature family, whereas single-summary models use only one scalar extracted from the same prefix. The stronger scalar result at $N = 10$ suggests that, in this oscillatory benchmark, scalar compression may suppress part of the transient variability present in the full-prefix representation.

Prefix Length	Best Direct-Prefix Predictor	Best Single-Summary Predictor
$N = 3$	Median log residual, $R^{2} = 0.676$	Mean log residual (last value), $R^{2} = 0.562$
$N = 10$	Mean log step, $R^{2} = 0.754$	Std log step (mean $\| Δ x \|$ ), $R^{2} = 0.772$
$N = 15$	Mean log step, $R^{2} = 0.758$	not evaluated in the scalar screen

Table 7. Interpretive synthesis of the scalar prefix analysis ($N_{\max} = 3$). The labels summarize which types of low-dimensional summaries most often emerge as the best predictors across the considered datasets and horizons.

Feature Family	$Y_{20}$	$Y_{50}$	$Y_{100}$
Level (mean residual)	dominant	dominant	dominant
Robust level (median residual)	dominant	dominant	strong
Trend (step summaries)	strong	strong	moderate
Variability (std)	moderate	moderate	weak
Curvature/higher-order	weak	weak	weak
Lyapunov-based	weak	weak	weak

Table 8. Representative cross-base transfer results (WAPE) at $N = 15$ for the two datasets discussed in the main text. Rows denote the training dataset and columns denote the testing dataset. Smaller values indicate lower transfer error.

Feature	Training Dataset	Test: Dataset 1	Test: Dataset 2A
Median log residual	Dataset 1	0.034	0.512
Median log residual	Dataset 2A	0.842	0.156
Lyapunov raw	Dataset 1	0.325	0.490
Lyapunov raw	Dataset 2A	1.189	0.255

Table 9. Cross-base transfer performance (mean WAPE over the $2 \times 2$ source/target dataset pairs). Smaller values indicate lower average transfer error. The last column reports the early-prefix mean over the displayed prefix lengths. Bold values indicate the best performance within each column.

Feature	$N = 5$	$N = 10$	$N = 15$	Early Mean
Mean log residual	1.009	1.055	1.077	1.047
Mean log step	1.069	1.032	1.034	1.045
Median log residual	1.220	0.787	0.737	0.915
Median log step	1.062	0.808	0.767	0.879
Std log residual	1.237	1.235	1.288	1.253
Std log step	1.134	1.287	1.300	1.240
Lyapunov raw	–	1.740	1.796	1.768
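
The "Early Mean" column of Table 9 can be checked directly from the per-prefix entries. The short sketch below recomputes it as the arithmetic mean over the prefix lengths for which a value is reported, so the Lyapunov row, which has no $N = 5$ entry, is averaged over two values. The variable names are illustrative, and the numbers are copied from the table.

```python
# Per-prefix mean WAPE values from Table 9 (N = 5, 10, 15); None marks "–".
table9 = {
    "Mean log residual": [1.009, 1.055, 1.077],
    "Mean log step": [1.069, 1.032, 1.034],
    "Median log residual": [1.220, 0.787, 0.737],
    "Median log step": [1.062, 0.808, 0.767],
    "Std log residual": [1.237, 1.235, 1.288],
    "Std log step": [1.134, 1.287, 1.300],
    "Lyapunov raw": [None, 1.740, 1.796],
}

# "Early Mean" column as reported in Table 9.
reported = {
    "Mean log residual": 1.047,
    "Mean log step": 1.045,
    "Median log residual": 0.915,
    "Median log step": 0.879,
    "Std log residual": 1.253,
    "Std log step": 1.240,
    "Lyapunov raw": 1.768,
}


def early_mean(values):
    """Mean over the prefix lengths that have a reported value."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


recomputed = {name: early_mean(vals) for name, vals in table9.items()}
# Largest gap between recomputed and reported means (rounding to 3 decimals).
max_gap = max(abs(recomputed[name] - reported[name]) for name in table9)
best_feature = min(recomputed, key=recomputed.get)
print(best_feature, round(max_gap, 4))
```

The recomputed means agree with the reported column to within rounding at three decimals, and the smallest early mean belongs to the median log step summary.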
