Hyper–Dual Numbers: A Theoretical Foundation for Exact Second Derivatives

Park, Sung Bum; Kim, Ji Eun

doi:10.3390/math13243909


by Sung Bum Park ¹ and Ji Eun Kim ²,*

¹ Department of Automotive Materials and Components Engineering, Dongguk University, Gyeongju 38066, Republic of Korea
² Department of Mathematics, Dongguk University, Gyeongju 38066, Republic of Korea
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(24), 3909; https://doi.org/10.3390/math13243909

Submission received: 16 September 2025 / Revised: 22 November 2025 / Accepted: 2 December 2025 / Published: 6 December 2025

(This article belongs to the Section C: Mathematical Analysis)


Abstract

Second-order derivative information, including mixed curvature, is central to Newton and trust-region optimization, uncertainty quantification, and simulation-based design. Classical finite differences (FD) remain popular but require delicate step-size tuning and can suffer from cancelation and noise amplification. Complex-step differentiation offers machine-precision gradients without subtractive cancelation, yet many second-derivative complex-step formulas reintroduce differencing. Hyper-dual numbers provide an algebraically principled alternative: by lifting real code to a four-component commutative nilpotent algebra, one obtains exact first and mixed second derivatives from a single evaluation, without finite differencing. This article gives a consolidated theoretical and experimental foundation for hyper-dual numbers. We formalize the algebra, prove exact Taylor truncation at second order, derive coefficient–extraction formulas for gradients and Hessians, and state assumptions for exactness, including limitations at non-smooth points and the need to branch on real parts. We present implementation patterns and language skeletons (C++, Python 3.11.5, MATLAB R2023b), and we provide fair numerical comparisons with FD, complex-step, and AD baselines. Stability tests under additive noise and ill-conditioning, together with runtime and memory profiling, demonstrate that hyper-dual coefficients are robust and reproducible in floating-point arithmetic, particularly for black-box codes where Hessian information is needed but differencing is fragile.

Keywords: hyper–dual numbers; exact second derivatives; Hessian; dual numbers; complex–step differentiation; finite differences; automatic differentiation

MSC: 65D25; 65Y20; 65K05; 65F10

1. Introduction

Accurate and robust second-order derivatives are central to Newton-type optimization, model calibration, sensitivity analysis, and uncertainty quantification. In these contexts, the Hessian matrix—or selected second-order entries—governs convergence of Newton and quasi-Newton methods, informs trust-region or line-search strategies, and quantifies curvature in parameter spaces.

Classical finite-difference (FD) formulas approximate derivatives via Taylor expansions but suffer from two competing error sources: truncation error for large step sizes and subtractive cancelation for very small steps in floating-point arithmetic. Selecting a globally robust step size is notoriously problem-dependent, especially in the presence of measurement noise or ill-conditioned models. Complex-step differentiation addresses cancelation for first derivatives by embedding a purely imaginary perturbation and reading the imaginary component without differencing. However, standard recipes for second derivatives reintroduce differencing or multi-step identities, thereby re-exposing the method to subtraction and step sensitivity.

Hyper-dual numbers offer a principled remedy. By introducing two commuting nilpotent directions, one can encode both first and second derivatives as non-real coefficients in a four-component algebra. The Taylor lift of a $C^{2}$ function in this algebra truncates exactly at second order, so that the mixed nilpotent coefficient corresponds directly to the second derivative. The resulting method provides second derivatives to working precision without differencing and without step search, while simultaneously returning first derivatives.

Beyond classical finite differences, the complex-step technique provides machine-precision first derivatives without subtractive cancelation, yet standard second-derivative recipes typically reintroduce differencing or multi-step identities; see Squire and Trapp [1] for the seminal note, Fornberg [2] on FD weights and truncation error, and Higham [3] and Goldberg [4] for floating-point effects. Curvature is essential for Newton-type optimization and sensitivity analysis [5], while adjoint (reverse-mode) approaches remain the workhorse for large-scale gradient evaluation in simulation-based design [6]. Hyper-dual numbers complement these tools by delivering exact second derivatives without step tuning or differencing. In the broader landscape, algorithmic differentiation (AD) texts—Griewank and Walther [7] and Naumann [8]—and survey/review articles—Martins and Hwang [9], Baydin et al. [10], and Kim [11]—frame where hyper-dual numbers fit relative to forward, reverse, and adjoint approaches.

Compared with earlier expositions of hyper-dual numbers, we pursue the following two goals:

Assumption transparency and algebraic clarity. We make the assumptions required for exactness explicit (smoothness, control-flow branching policies, floating-point limitations, and behavior at non-smooth points). We prove the exact second-order Taylor lift and multivariate chain rule in a concise, implementation-oriented style.
Comprehensive experimental assessment. We supplement algebraic results with numerical stability tests under noise and ill-conditioning, runtime and memory profiling, and sensitivity analyses in FD step size and variable scaling. A curated set of benchmark problems—from low-dimensional analytic tests to classical optimization functions and higher-dimensional examples—illustrates accuracy and robustness. We also include minimal C++, Python 3.11.5, and MATLAB R2023b skeletons for rapid prototyping.

To keep the exposition cohesive, we consolidate the material into a small number of sections. Algebraic constructions and assumptions appear in Section 2 and Section 3, practical algorithms and implementation patterns in Section 4, numerical experiments and stability tests in Section 5, and cost/stability/comparative analysis in Section 6. Section 7 discusses extensions and integration with AD frameworks, and Section 8 concludes. Appendix A, Appendix B and Appendix C provide language-specific skeletons.

2. Hyper-Dual Algebra and Assumptions for Exactness

2.1. From Generalized Complex Numbers to Hyper-Duals

Numbers of the form $a + b E$ with multiplication

$$(a + b E)(c + d E) = a c + (a d + b c) E + b d E^{2}$$

unify several familiar algebras: complex numbers ($E^{2} = -1$), double numbers ($E^{2} = +1$), and dual numbers ($E^{2} = 0$). In particular, dual numbers $a + b ε$ with $ε^{2} = 0$ encode exact first derivatives: the coefficient of $ε$ in $f(x + h ε)$ equals $h f'(x)$ for $C^{1}$ functions $f$.

Quaternions enlarge the space of imaginary directions but remain noncommutative, and the fact that squares of pure imaginary quaternions are real prevents the clean isolation of second-order information in a purely non-real component. What is needed instead are two commuting nilpotent directions whose product is nonzero but whose squares vanish.

2.2. Definition of the Hyper-Dual Algebra

Introduce commuting nilpotent units $ε_{1}, ε_{2}$ with

$$ε_{1}^{2} = ε_{2}^{2} = (ε_{1} ε_{2})^{2} = 0, \qquad ε_{1} ε_{2} = ε_{2} ε_{1} \neq 0. \tag{1}$$

A hyper-dual number is an element

$$a = a_{0} + a_{1} ε_{1} + a_{2} ε_{2} + a_{12} ε_{1} ε_{2}, \qquad a_{0}, a_{1}, a_{2}, a_{12} \in \mathbb{R}, \tag{2}$$

and the set of such elements forms a commutative, associative, unital algebra over $\mathbb{R}$, denoted $\mathbb{HD}$.

Definition 1 (Coefficient readout and real part). For $a \in \mathbb{HD}$ written as above, we denote by $[a]_{ε_{1}}$, $[a]_{ε_{2}}$, and $[a]_{ε_{1} ε_{2}}$ the coefficients of $ε_{1}$, $ε_{2}$, and $ε_{1} ε_{2}$, respectively, and write $\mathrm{Re}(a) := a_{0}$ for the real part. In code, control-flow branches are evaluated on $\mathrm{Re}(a)$ to preserve the real-valued path.

Arithmetic in $\mathbb{HD}$ follows from distributivity and (1). If $a = a_{0} + a_{1} ε_{1} + a_{2} ε_{2} + a_{12} ε_{1} ε_{2}$ and $b = b_{0} + b_{1} ε_{1} + b_{2} ε_{2} + b_{12} ε_{1} ε_{2}$, then

$$a + b = (a_{0} + b_{0}) + (a_{1} + b_{1}) ε_{1} + (a_{2} + b_{2}) ε_{2} + (a_{12} + b_{12}) ε_{1} ε_{2}, \tag{3}$$

$$\begin{aligned} a b &= a_{0} b_{0} + (a_{0} b_{1} + a_{1} b_{0}) ε_{1} + (a_{0} b_{2} + a_{2} b_{0}) ε_{2} \\ &\quad + (a_{0} b_{12} + a_{1} b_{2} + a_{2} b_{1} + a_{12} b_{0}) ε_{1} ε_{2}. \end{aligned} \tag{4}$$

If $a_{0} \neq 0$, the inverse $a^{-1}$ exists and can be written explicitly.

Proposition 1 (Inverse in $\mathbb{HD}$). Let $a \in \mathbb{HD}$ with $a_{0} \neq 0$. Then

$$a^{-1} = \frac{1}{a_{0}} - \frac{a_{1}}{a_{0}^{2}} ε_{1} - \frac{a_{2}}{a_{0}^{2}} ε_{2} + \left(\frac{2 a_{1} a_{2}}{a_{0}^{3}} - \frac{a_{12}}{a_{0}^{2}}\right) ε_{1} ε_{2}, \tag{5}$$

and the usual quotient rule holds under smooth lifting of scalar functions.
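As a minimal sketch of the arithmetic rules (3)–(5), the Python class below stores the four coefficients and implements addition, multiplication, and the inverse of Proposition 1. The names `HyperDual` and `as_hd` are illustrative assumptions and are not the interface of the appendix skeletons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperDual:
    """a0 + a1*e1 + a2*e2 + a12*e1*e2 with e1**2 == e2**2 == 0 (hypothetical name)."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a12: float = 0.0

    def __add__(self, other):
        # Eq. (3): componentwise addition.
        b = as_hd(other)
        return HyperDual(self.a0 + b.a0, self.a1 + b.a1,
                         self.a2 + b.a2, self.a12 + b.a12)

    __radd__ = __add__

    def __mul__(self, other):
        # Eq. (4): every product containing e1**2 or e2**2 is dropped.
        b = as_hd(other)
        return HyperDual(
            self.a0 * b.a0,
            self.a0 * b.a1 + self.a1 * b.a0,
            self.a0 * b.a2 + self.a2 * b.a0,
            self.a0 * b.a12 + self.a1 * b.a2 + self.a2 * b.a1 + self.a12 * b.a0,
        )

    __rmul__ = __mul__

    def inverse(self):
        # Eq. (5): defined only when the real part is nonzero.
        if self.a0 == 0.0:
            raise ZeroDivisionError("hyper-dual inverse requires a0 != 0")
        a0 = self.a0
        return HyperDual(
            1.0 / a0,
            -self.a1 / a0**2,
            -self.a2 / a0**2,
            2.0 * self.a1 * self.a2 / a0**3 - self.a12 / a0**2,
        )

    def __truediv__(self, other):
        return self * as_hd(other).inverse()


def as_hd(value):
    """Promote a real scalar to a hyper-dual number with zero nilpotent parts."""
    return value if isinstance(value, HyperDual) else HyperDual(float(value))


a = HyperDual(2.0, 3.0, 5.0, 7.0)
identity = a * a.inverse()                       # expected: 1 with zero nilpotent parts
recip = HyperDual(2.0, 1.0, 1.0).inverse()       # lift of 1/x at x = 2 with h1 = h2 = 1
```

Here `recip.a12` plays the role of the second derivative of $1/x$ at $x = 2$, namely $2/x^{3} = 0.25$, which is one way to sanity-check the mixed coefficient in (5).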

2.3. Assumptions for Exactness and Limitations at Non-Smooth Points

The algebraic statements in this paper are exact. To translate them into reproducible numerical claims in floating-point arithmetic, we make the following assumptions explicit and use them consistently throughout:

(A1): Smoothness. We assume that $f : \mathbb{R}^{n} \to \mathbb{R}$ is $C^{2}$ in a neighborhood of the evaluation point. This guarantees the existence of mixed partial derivatives and Schwarz symmetry of the Hessian.
(A2): Real-path control flow. All branching in code is evaluated on real parts, i.e., conditionals use $Re (x)$ only. This ensures that the hyper-dual execution follows the same path as the real-valued code.
(A3): Floating-point arithmetic. Numerical claims are qualified by floating-point arithmetic: hyper-dual coefficients are returned to working precision and are ultimately limited by the conditioning of the real evaluation and the underlying hardware/BLAS stack.
(A4): Non-smooth points. At non-smooth points (e.g., absolute values, maximum/minimum, limiters), classical second derivatives may not exist or may depend on the approach direction. In such cases, hyper-dual coefficients represent the derivatives of the implemented procedure under (A2), not classical derivatives of an idealized continuous model. Smoothing or regularization should be applied when classical second derivatives are required.

A Practical Smoothing Remark

When a model contains non-smooth primitives such as $|x|$ or $\max(a, b)$, hyper-dual coefficients reflect directional derivatives of the implemented code under the real-path policy (A2), which may deviate from classical second derivatives. In practice, a mild, smooth surrogate often restores a well-defined Hessian while preserving the intended physics. Two standard examples are

$$|x| \approx ϕ_{δ}(x) := \sqrt{x^{2} + δ^{2}}, \qquad δ > 0, \tag{6}$$

$$\max(a, b) \approx ψ_{κ}(a, b) := \frac{1}{κ} \log\left(e^{κ a} + e^{κ b}\right), \qquad κ \gg 1. \tag{7}$$

Both $ϕ_{δ}$ and $ψ_{κ}$ are $C^{\infty}$ and converge pointwise to the original non-smooth operators as $δ \to 0$ or $κ \to \infty$, respectively. Hyper-dual evaluations of the smoothed code then return the exact Hessian of the surrogate, which is typically the quantity required in Newton-type solvers, sensitivity analysis, or curvature-based uncertainty quantification.
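As a worked check on the trade-off behind the surrogate for $|x|$, its derivatives can be written out directly. The expressions below follow from elementary calculus and are not taken from the original text:

$$ϕ_{δ}'(x) = \frac{x}{\sqrt{x^{2} + δ^{2}}}, \qquad ϕ_{δ}''(x) = \frac{δ^{2}}{(x^{2} + δ^{2})^{3/2}}, \qquad ϕ_{δ}''(0) = \frac{1}{δ}.$$

The curvature at the former kink therefore grows like $1/δ$: a very small $δ$ tracks $|x|$ more closely, but at the price of a large, localized Hessian entry that can degrade the conditioning of Newton-type steps.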

Assumption (A1) delineates the class of functions for which hyper-dual numbers return second derivatives corresponding to the Hessian of f. Assumptions (A2)–(A4) connect the algebra to real code, clarifying that deviations from theoretical exactness may arise from control-flow policies, non-smooth modeling choices, and floating-point effects rather than from the hyper-dual algebra itself.

Our algebraic construction is consistent with forward-mode AD via dual numbers and its object-oriented implementations; see Neidinger [12] for a clear tutorial. For engineering optimization contexts that motivated generalized numbers, including hyper-duals, see Fike and Alonso [13] and the follow-up implementation notes.

3. Exact Taylor Lift and Extraction of Derivatives

The exact truncation at bi-degree (1, 1) parallels the Taylor-arithmetic viewpoint in AD (see Griewank and Walther [7]) and implementations that extract coefficients without differencing. For domain applications that require second-order sensitivity but maintain real-path control flow, Rehner et al. [14] generalize (hyper-)dual numbers to higher orders and vector-valued settings.

3.1. Scalar Case

Let $f : \mathbb{R} \to \mathbb{R}$ be $C^{2}$ near $x$ and consider the hyper-dual perturbation

$$d = h_{1} ε_{1} + h_{2} ε_{2}, \qquad h_{1}, h_{2} \in \mathbb{R}.$$

Because all monomials of total degree $\geq 3$ in $ε_{1}, ε_{2}$ vanish, the Taylor expansion of $f$ in $\mathbb{HD}$ truncates exactly at degree two.

Theorem 1 (Exact second-order lift in $\mathbb{HD}$). Let $f \in C^{2}(\mathbb{R})$ near $x$. Then

$$f(x + d) = f(x) + h_{1} f'(x) ε_{1} + h_{2} f'(x) ε_{2} + h_{1} h_{2} f''(x) ε_{1} ε_{2}. \tag{8}$$

Consequently,

$$\frac{[f(x + d)]_{ε_{1}}}{h_{1}} = f'(x), \qquad \frac{[f(x + d)]_{ε_{2}}}{h_{2}} = f'(x), \qquad \frac{[f(x + d)]_{ε_{1} ε_{2}}}{h_{1} h_{2}} = f''(x). \tag{9}$$

In floating-point arithmetic, these identities hold to working precision modulo roundoff and the conditioning of the real evaluation.

Proof. Write the second-order Taylor polynomial of $f$ at $x$ and substitute $d$. Every monomial of degree $\geq 3$ in $ε_{1}, ε_{2}$ contains $ε_{1}^{2}$ or $ε_{2}^{2}$, which are zero by (1), so $d^{k} = 0$ for $k \geq 3$ and $d^{2} = 2 h_{1} h_{2} ε_{1} ε_{2}$. The term linear in $d$ produces the $ε_{1}$ and $ε_{2}$ coefficients, and the unique degree-two mixed term $\frac{1}{2} f''(x) d^{2} = h_{1} h_{2} f''(x) ε_{1} ε_{2}$ yields the $ε_{1} ε_{2}$ coefficient. □
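A short worked instance of (8), assuming the illustrative choice $f(x) = x^{3}$ (not an example from the original text), shows the truncation explicitly. Since $ε_{1}^{2} = ε_{2}^{2} = 0$, one has $d^{2} = 2 h_{1} h_{2} ε_{1} ε_{2}$ and $d^{3} = 0$, hence

$$(x + d)^{3} = x^{3} + 3 x^{2} d + 3 x d^{2} + d^{3} = x^{3} + 3 x^{2} h_{1} ε_{1} + 3 x^{2} h_{2} ε_{2} + 6 x h_{1} h_{2} ε_{1} ε_{2}.$$

The readout (9) then returns $f'(x) = 3x^{2}$ from either first-order coefficient and $f''(x) = 6x$ from the mixed coefficient, with no dependence on $h_{1}$ or $h_{2}$.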

3.2. Multivariate Case and Hessian Extraction

For $f : \mathbb{R}^{n} \to \mathbb{R}$ and standard basis vectors $e_{i}, e_{j}$, consider

$$x^{\star} = x + h_{1} ε_{1} e_{i} + h_{2} ε_{2} e_{j}. \tag{10}$$

The multivariate Taylor expansion yields

$$f(x^{\star}) = f(x) + h_{1} \partial_{x_{i}} f(x) ε_{1} + h_{2} \partial_{x_{j}} f(x) ε_{2} + h_{1} h_{2} \partial_{x_{i} x_{j}}^{2} f(x) ε_{1} ε_{2}. \tag{11}$$

Thus a single hyper-dual evaluation recovers two first derivatives and one mixed second derivative,

$$\frac{[f(x^{\star})]_{ε_{1}}}{h_{1}} = \partial_{x_{i}} f(x), \qquad \frac{[f(x^{\star})]_{ε_{2}}}{h_{2}} = \partial_{x_{j}} f(x), \qquad \frac{[f(x^{\star})]_{ε_{1} ε_{2}}}{h_{1} h_{2}} = \partial_{x_{i} x_{j}}^{2} f(x). \tag{12}$$

Repeating this procedure for $1 \leq i \leq j \leq n$ and exploiting symmetry builds the full Hessian in $n(n + 1)/2$ hyper-dual evaluations. Vector-mode implementations can propagate multiple directions concurrently, further reducing the required number of passes at a modest memory cost.

3.3. Chain Rule and Compositions

Let $g : \mathbb{R}^{n} \to \mathbb{R}^{m}$ and $f : \mathbb{R}^{m} \to \mathbb{R}$ be $C^{2}$ near $x$, and let $v, w \in \mathbb{R}^{n}$. Set $x^{\star} = x + h_{1} ε_{1} v + h_{2} ε_{2} w$ and define $F = f \circ g$. A direct calculation yields a mixed-direction chain rule for the second derivative of $F$.

Theorem 2 (Mixed-direction chain rule in $\mathbb{HD}$). Under assumptions (A1)–(A3), one has

$$\begin{aligned} \frac{[(f \circ g)(x^{\star})]_{ε_{1} ε_{2}}}{h_{1} h_{2}} &= v^{\top} \nabla^{2} (f \circ g)(x)\, w \\ &= (J_{g} v)^{\top} \nabla^{2} f(g(x)) (J_{g} w) + \sum_{a = 1}^{m} \partial_{a} f(g(x))\, v^{\top} \nabla^{2} g_{a}(x)\, w, \end{aligned} \tag{13}$$

where $J_{g}$ is the Jacobian of $g$ and $g_{a}$ denotes the $a$-th component of $g$.

Proof. Write $g(x + h_{1} ε_{1} v + h_{2} ε_{2} w) = g(x) + Δ$ with

$$Δ = J_{g}(x)(h_{1} ε_{1} v + h_{2} ε_{2} w) + \frac{1}{2} h_{1}^{2} ε_{1}^{2} \nabla^{2} g[v, v] + \frac{1}{2} h_{2}^{2} ε_{2}^{2} \nabla^{2} g[w, w] + h_{1} h_{2} ε_{1} ε_{2} Γ,$$

where the bilinear form $Γ$ has $a$-th component $v^{\top} \nabla^{2} g_{a}(x) w$. Nilpotency kills all $ε_{1}^{2}$ and $ε_{2}^{2}$ terms, so

$$Δ = h_{1} ε_{1} J_{g} v + h_{2} ε_{2} J_{g} w + h_{1} h_{2} ε_{1} ε_{2} Γ.$$

Now lift $f$ at $y = g(x)$:

$$f(y + Δ) = f(y) + \nabla f(y)^{\top} Δ + \frac{1}{2} Δ^{\top} \nabla^{2} f(y) Δ,$$

and retain only coefficients of $ε_{1}$, $ε_{2}$, and $ε_{1} ε_{2}$. The linear terms give the first-order coefficients. The mixed coefficient, divided by $h_{1} h_{2}$, equals $(J_{g} v)^{\top} \nabla^{2} f(y) (J_{g} w) + \nabla f(y)^{\top} Γ$, which is exactly the stated chain rule for the mixed second derivative. □

This identity confirms that hyper-dual numbers faithfully represent the Hessian of composed maps, provided that the real-path control flow is respected.
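For orientation, the scalar special case $n = m = 1$, $v = w = 1$ of the mixed-direction chain rule reduces to the familiar second-order chain rule. The choice $f = \exp$, $g = \sin$ below is an illustrative assumption rather than an example from the original text:

$$(f \circ g)''(x) = f''(g(x))\, g'(x)^{2} + f'(g(x))\, g''(x), \qquad \frac{d^{2}}{dx^{2}} e^{\sin x} = e^{\sin x} \left(\cos^{2} x - \sin x\right).$$

The same value appears as the $ε_{1} ε_{2}$ coefficient when the seed $x + ε_{1} + ε_{2}$ is passed first through the lifted sine and then through the lifted exponential.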

When only Hessian-vector products are required, reverse-on-forward strategies or the classic Pearlmutter [15] operator yield $H v$ at a cost comparable to one gradient, complementing the hyper-dual readout used here for explicit Hessian entries.

Remark 1 (Nonsmooth points and branch sensitivity). If $f$ is not $C^{2}$ (e.g., contains $|x|$, $\max$, or limiters), the second derivative may not exist or may depend on the chosen real-path branch. Hyper-dual coefficients then reflect the implemented model under the stated branching policy, and users should smooth or regularize as appropriate.

Lemma 1 (Schwarz symmetry via commuting nilpotents). If $f \in C^{2}(\mathbb{R}^{n})$, then

$$\frac{[f(x + h_{1} ε_{1} e_{i} + h_{2} ε_{2} e_{j})]_{ε_{1} ε_{2}}}{h_{1} h_{2}} = \partial_{i j}^{2} f(x) = \partial_{j i}^{2} f(x)$$

and is independent of assigning directions to $ε_{1}, ε_{2}$.

Proof. The lift is unique up to the commutation of $ε_{1}$ and $ε_{2}$, and the mixed coefficient encodes the bilinear form associated with the Hessian. Since $f \in C^{2}$, Schwarz's theorem gives $\partial_{x_{i} x_{j}}^{2} f = \partial_{x_{j} x_{i}}^{2} f$, hence the stated equality. □

Corollary 1 (Affine reparametrization). Let $ψ(x) = A x + b$ with $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$, and $F(x) = f(ψ(x))$ for $f \in C^{2}(\mathbb{R}^{m})$. Then

$$\frac{[F(x + h_{1} ε_{1} v + h_{2} ε_{2} w)]_{ε_{1} ε_{2}}}{h_{1} h_{2}} = v^{\top} \left(A^{\top} \nabla^{2} f(A x + b) A\right) w,$$

i.e., the mixed coefficient transforms by the standard Hessian congruence under affine maps.

Proof. Apply Theorem 2 with $g = ψ$. Since $\nabla^{2} ψ_{a} \equiv 0$, only the first term $(J_{ψ} v)^{\top} \nabla^{2} f\, (J_{ψ} w) = (A v)^{\top} \nabla^{2} f(A x + b) (A w)$ remains. □

4. Implementation Patterns and Algorithms

4.1. Smooth Lifts of Elementary Functions

Analytic functions lift to $\mathbb{HD}$ via their Taylor series. For $a = a_{0} + a_{1} ε_{1} + a_{2} ε_{2} + a_{12} ε_{1} ε_{2}$, one obtains, for example,

$$\exp(a) = \exp(a_{0}) + a_{1} \exp(a_{0}) ε_{1} + a_{2} \exp(a_{0}) ε_{2} + \exp(a_{0}) (a_{12} + a_{1} a_{2}) ε_{1} ε_{2}, \tag{14}$$

$$\sin(a) = \sin(a_{0}) + a_{1} \cos(a_{0}) ε_{1} + a_{2} \cos(a_{0}) ε_{2} + (a_{12} \cos a_{0} - a_{1} a_{2} \sin a_{0}) ε_{1} ε_{2}, \tag{15}$$

$$\cos(a) = \cos(a_{0}) - a_{1} \sin(a_{0}) ε_{1} - a_{2} \sin(a_{0}) ε_{2} - (a_{12} \sin a_{0} + a_{1} a_{2} \cos a_{0}) ε_{1} ε_{2}, \tag{16}$$

$$\log(a) = \log(a_{0}) + \frac{a_{1}}{a_{0}} ε_{1} + \frac{a_{2}}{a_{0}} ε_{2} + \left(\frac{a_{12}}{a_{0}} - \frac{a_{1} a_{2}}{a_{0}^{2}}\right) ε_{1} ε_{2}, \qquad a_{0} > 0. \tag{17}$$

These formulas can be used to overload elementary functions for $\mathbb{HD}$ in C++, Python 3.11.5, or MATLAB R2023b (see Appendix A, Appendix B and Appendix C). Correct lifts of special functions such as $\tanh$, $\sqrt{\cdot}$, or Bessel functions follow the same pattern.
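The common pattern behind (14)–(17) is that a scalar function with known $f(a_{0})$, $f'(a_{0})$, $f''(a_{0})$ lifts to the coefficients $(f,\ a_{1} f',\ a_{2} f',\ a_{12} f' + a_{1} a_{2} f'')$. The sketch below encodes that rule once and derives the four lifts from it. The names `HD`, `lift`, and `hd_*` are hypothetical and chosen for illustration only.

```python
import math
from typing import NamedTuple


class HD(NamedTuple):
    """Coefficients (a0, a1, a2, a12) of a hyper-dual number (hypothetical name)."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a12: float = 0.0


def lift(a: HD, f0: float, f1: float, f2: float) -> HD:
    """Second-order lift given f0 = f(a0), f1 = f'(a0), f2 = f''(a0)."""
    return HD(f0, a.a1 * f1, a.a2 * f1, a.a12 * f1 + a.a1 * a.a2 * f2)


def hd_exp(a: HD) -> HD:
    e = math.exp(a.a0)
    return lift(a, e, e, e)                       # Eq. (14)


def hd_sin(a: HD) -> HD:
    s, c = math.sin(a.a0), math.cos(a.a0)
    return lift(a, s, c, -s)                      # Eq. (15)


def hd_cos(a: HD) -> HD:
    s, c = math.sin(a.a0), math.cos(a.a0)
    return lift(a, c, -s, -c)                     # Eq. (16)


def hd_log(a: HD) -> HD:
    if a.a0 <= 0.0:
        raise ValueError("log lift requires a0 > 0")
    return lift(a, math.log(a.a0), 1.0 / a.a0, -1.0 / a.a0**2)   # Eq. (17)


x = 0.7
seed = HD(x, 1.0, 1.0, 0.0)                       # x + e1 + e2, i.e. h1 = h2 = 1
second_exp_of_sin = hd_exp(hd_sin(seed)).a12      # mixed coefficient of exp(sin(x))
```

With unit seeds, `second_exp_of_sin` should agree with $e^{\sin x}(\cos^{2} x - \sin x)$ up to roundoff. Other smooth primitives can be added through `lift` by supplying their first two derivatives.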

4.2. Hessian Extraction Algorithm

The following pseudocode summarizes a standard pattern for assembling a dense Hessian using hyper-dual numbers. A “real-converge, hyper-dual sweep” pattern is recommended when f embeds an iterative solver.

Algorithm 1 provides a minimal building block for sparse Hessians, Hessian-vector products, or curvature in selected coordinate pairs, reducing the number of hyper-dual evaluations when full dense Hessians are unnecessary.

In practice, $h_{1}$ and $h_{2}$ can be taken as 1 after nondimensionalizing variables, since the algebra is exact. Reporting the chosen $h_{1}, h_{2}$ nevertheless helps document the numeric ranges of coefficients.

4.3. Branching, Non-Smooth Operators, and Reproducibility

As emphasized in (A2), branch conditions should be evaluated on real parts only. Non-smooth operators (e.g., absolute value, hard limiters) require particular care: hyper-dual coefficients then reflect the differentiability properties of the discrete model. A practical reproducibility checklist includes the following:

Description of any non-smooth operators and the smoothing/regularization policies used.
The scaling and units of variables and the nondimensionalization adopted.
Details of the linear algebra stack (factorizations, tolerances, BLAS/LAPACK backends).
Seeds and distributions for randomized tests.

Implementation Cross-Reference

A unified MATLAB class implementing the hyper-dual algebra, real-path branching rule, and coefficient readout used throughout Section 5 is provided in Appendix C. This skeleton realizes Algorithms 1 and 2 in a single program class, directly supporting reproducibility and rapid prototyping across benchmarks.

Algorithm 1: Selected mixed second derivative or Hessian–vector product.

Require: $f : \mathbb{R}^{n} \to \mathbb{R}$, point $x$, directions $v, w \in \mathbb{R}^{n}$
Ensure: Mixed second directional derivative $v^{\top} \nabla^{2} f(x) w$
1: Form a hyper–dual seed $X = x + ε_{1} v + ε_{2} w$
2: Evaluate $Y = f(X) \in \mathbb{HD}$
3: return $[Y]_{ε_{1} ε_{2}}$
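The three steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class `HD`, the helper `promote`, the function `mixed_directional`, and the test objective `cubic` are all hypothetical names, and only `+` and `*` are overloaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HD:
    """Hyper-dual number a0 + a1*e1 + a2*e2 + a12*e1*e2 (hypothetical name)."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a12: float = 0.0

    def __add__(self, other):
        b = promote(other)
        return HD(self.a0 + b.a0, self.a1 + b.a1, self.a2 + b.a2, self.a12 + b.a12)

    __radd__ = __add__

    def __mul__(self, other):
        b = promote(other)
        return HD(
            self.a0 * b.a0,
            self.a0 * b.a1 + self.a1 * b.a0,
            self.a0 * b.a2 + self.a2 * b.a0,
            self.a0 * b.a12 + self.a1 * b.a2 + self.a2 * b.a1 + self.a12 * b.a0,
        )

    __rmul__ = __mul__


def promote(value):
    """Treat plain reals as hyper-dual numbers with zero nilpotent parts."""
    return value if isinstance(value, HD) else HD(float(value))


def mixed_directional(f, x, v, w):
    """Return v^T (Hessian of f at x) w from a single hyper-dual evaluation."""
    # Step 1: seed X = x + e1*v + e2*w, componentwise.
    seed = [HD(xk, vk, wk, 0.0) for xk, vk, wk in zip(x, v, w)]
    # Steps 2-3: evaluate f on the seed and read the e1*e2 coefficient.
    return f(seed).a12


def cubic(p):
    # Example objective f(x, y) = x^2 * y + y^3, written with + and * only.
    x, y = p
    return x * x * y + y * y * y


# Hessian of cubic at (1, 2) is [[4, 2], [2, 12]].
h_xy = mixed_directional(cubic, [1.0, 2.0], [1.0, 0.0], [0.0, 1.0])
quad_form = mixed_directional(cubic, [1.0, 2.0], [1.0, 1.0], [1.0, 1.0])
```

Choosing `w` as a basis vector and sweeping it over the coordinates gives the entries of the Hessian-vector product one at a time, which is how this building block serves the sparse and selected-entry use cases.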

Algorithm 2: Dense Hessian via hyper-dual numbers.

Require: Function $f : \mathbb{R}^{n} \to \mathbb{R}$, point $x \in \mathbb{R}^{n}$, step sizes $h_{1}, h_{2}$
1: Initialize $H \in \mathbb{R}^{n \times n}$ to zero
2: for $1 \leq i \leq n$ do
3: for $i \leq j \leq n$ do
4: Lift $x$ to $X \in \mathbb{HD}^{n}$ with $X_{k} = x_{k} + h_{1} ε_{1} δ_{k i} + h_{2} ε_{2} δ_{k j}$ for $k = 1, \ldots, n$, where $δ$ is the Kronecker delta (so $X_{i} = x_{i} + h_{1} ε_{1} + h_{2} ε_{2}$ when $i = j$)
5: Evaluate $Y = f(X) \in \mathbb{HD}$
6: Set $H_{ij} = H_{ji} = [Y]_{ε_{1} ε_{2}} / (h_{1} h_{2})$
7: end for
8: end for
9: return $H$
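A compact Python rendering of the dense-Hessian loop is sketched below. It is an illustration under assumptions, not the appendix code: `HD`, `promote`, and `dense_hessian` are hypothetical names, only `+`, `-`, and `*` are overloaded, and the diagonal case seeds both nilpotent directions on the same coordinate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HD:
    """Hyper-dual number a0 + a1*e1 + a2*e2 + a12*e1*e2 (hypothetical name)."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a12: float = 0.0

    def __add__(self, other):
        b = promote(other)
        return HD(self.a0 + b.a0, self.a1 + b.a1, self.a2 + b.a2, self.a12 + b.a12)

    __radd__ = __add__

    def __neg__(self):
        return HD(-self.a0, -self.a1, -self.a2, -self.a12)

    def __sub__(self, other):
        return self + (-promote(other))

    def __rsub__(self, other):
        return promote(other) + (-self)

    def __mul__(self, other):
        b = promote(other)
        return HD(
            self.a0 * b.a0,
            self.a0 * b.a1 + self.a1 * b.a0,
            self.a0 * b.a2 + self.a2 * b.a0,
            self.a0 * b.a12 + self.a1 * b.a2 + self.a2 * b.a1 + self.a12 * b.a0,
        )

    __rmul__ = __mul__


def promote(value):
    """Treat plain reals as hyper-dual numbers with zero nilpotent parts."""
    return value if isinstance(value, HD) else HD(float(value))


def dense_hessian(f, x, h1=1.0, h2=1.0):
    """Assemble the symmetric Hessian with n(n+1)/2 hyper-dual evaluations."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Coordinate i carries h1*e1 and coordinate j carries h2*e2;
            # when i == j the same coordinate carries both.
            X = [HD(x[k], h1 if k == i else 0.0, h2 if k == j else 0.0)
                 for k in range(n)]
            H[i][j] = H[j][i] = f(X).a12 / (h1 * h2)
    return H


def rosenbrock(p):
    x, y = p
    return 100.0 * (y - x * x) * (y - x * x) + (1.0 - x) * (1.0 - x)


H = dense_hessian(rosenbrock, [1.0, 1.0])   # analytic value: [[802, -400], [-400, 200]]
```

The comparison value in the last comment comes from the closed-form Rosenbrock Hessian at its minimizer $(1, 1)$.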

5. Numerical Experiments

This section organizes the examples into a coherent set of test problems. We proceed from analytic tests with known Hessians, through classical optimization benchmarks, to higher-dimensional and stiff problems. Each example compares hyper-dual results with FD and complex-step methods where applicable.

In this section we consider five representative classes of test problems, summarized in Table 1, each chosen to emphasize a specific aspect of accuracy, robustness, or scalability under the assumptions of Section 2.3.

5.1. Analytic Test Functions

We begin with analytic functions whose Hessians are available in closed form. These tests isolate truncation versus roundoff and verify the step-invariance of hyper-dual coefficients under (A1)–(A3) before moving to coupled or ill-conditioned models.

Polynomial and Trigonometric Combinations

Consider first $f(x) = \sin^{3} x$ on $\mathbb{R}$. The exact second derivative is

$$f''(x) = 6 \sin x - 9 \sin^{3} x.$$

Let $t_{0} = x + h_{1} ε_{1} + h_{2} ε_{2}$, $t_{1} = \sin t_{0}$, $t_{2} = t_{1}^{3}$. A short calculation yields

\begin{align} t_{2} &= \sin^{3} x + 3 h_{1} \cos x \sin^{2} x \, ε_{1} + 3 h_{2} \cos x \sin^{2} x \, ε_{2} \tag{18} \\ &\quad - \frac{3}{4} h_{1} h_{2} (\sin x - 3 \sin 3x) \, ε_{1} ε_{2}. \tag{19} \end{align}

By the identity $\sin 3x = 3 \sin x - 4 \sin^{3} x$, the mixed coefficient satisfies $-\frac{3}{4} (\sin x - 3 \sin 3x) = 6 \sin x - 9 \sin^{3} x$, hence $[t_{2}]_{ε_{1} ε_{2}} / (h_{1} h_{2}) = f''(x)$ to working precision.

Evaluating $f$ at $x + h ε_{1} + h ε_{2}$ and collecting the $ε_{1} ε_{2}$ coefficient (divided by $h^{2}$) recovers $f''(x)$ to machine precision across a wide range of $x$ and $h$. Central FD approximations exhibit the expected $O(h^{2})$ convergence followed by roundoff-dominated stagnation, while hyper-dual errors remain essentially flat at the level of machine precision.
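The comparison for $\sin^{3} x$ can be reproduced with a few lines of Python. The sketch below is an assumption-laden illustration: `HD`, `hd_mul`, `hd_sin`, the evaluation point `x0 = 0.8`, and the FD step sizes are all hypothetical choices, not the settings used for the reported experiments.

```python
import math
from typing import NamedTuple


class HD(NamedTuple):
    """Coefficients (a0, a1, a2, a12) of a hyper-dual number (hypothetical name)."""
    a0: float
    a1: float = 0.0
    a2: float = 0.0
    a12: float = 0.0


def hd_mul(a: HD, b: HD) -> HD:
    """Hyper-dual product; terms with e1**2 or e2**2 vanish."""
    return HD(
        a.a0 * b.a0,
        a.a0 * b.a1 + a.a1 * b.a0,
        a.a0 * b.a2 + a.a2 * b.a0,
        a.a0 * b.a12 + a.a1 * b.a2 + a.a2 * b.a1 + a.a12 * b.a0,
    )


def hd_sin(a: HD) -> HD:
    """Lift of sin to hyper-dual arguments."""
    s, c = math.sin(a.a0), math.cos(a.a0)
    return HD(s, a.a1 * c, a.a2 * c, a.a12 * c - a.a1 * a.a2 * s)


def second_derivative_hd(x: float, h1: float = 1.0, h2: float = 1.0) -> float:
    """f''(x) for f = sin^3 from one hyper-dual evaluation (no differencing)."""
    t0 = HD(x, h1, h2, 0.0)
    t1 = hd_sin(t0)
    t2 = hd_mul(hd_mul(t1, t1), t1)
    return t2.a12 / (h1 * h2)


def second_derivative_fd(x: float, h: float) -> float:
    """Three-point central finite difference for f = sin^3."""
    f = lambda t: math.sin(t) ** 3
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


x0 = 0.8
exact = 6.0 * math.sin(x0) - 9.0 * math.sin(x0) ** 3
hd_err = abs(second_derivative_hd(x0) - exact)
fd_errs = {h: abs(second_derivative_fd(x0, h) - exact) for h in (1e-2, 1e-4, 1e-6)}
```

Printing `hd_err` next to `fd_errs` shows the qualitative contrast described above: the FD error depends on `h`, while the hyper-dual value does not change meaningfully when `h1` and `h2` are varied.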

As a two-variable test, consider

$$f(x, y) = \exp(x y) + \sin(x + y),$$

whose gradient and Hessian can be written in closed form. For $(x, y) \in \mathbb{R}^{2}$ one has

$$\nabla f = \begin{bmatrix} y e^{x y} + \cos(x + y) \\ x e^{x y} + \cos(x + y) \end{bmatrix},$$

$$\nabla^{2} f = \begin{bmatrix} y^{2} e^{x y} - \sin(x + y) & e^{x y} + x y e^{x y} - \sin(x + y) \\ e^{x y} + x y e^{x y} - \sin(x + y) & x^{2} e^{x y} - \sin(x + y) \end{bmatrix}.$$

Evaluating $f$ once at $x^{\star} = (x, y) + h_{1} ε_{1} e_{i} + h_{2} ε_{2} e_{j}$ yields

$$[f(x^{\star})]_{ε_{1}} = h_{1} \partial_{x_{i}} f(x, y), \qquad [f(x^{\star})]_{ε_{2}} = h_{2} \partial_{x_{j}} f(x, y),$$

and

$$[f(x^{\star})]_{ε_{1} ε_{2}} = h_{1} h_{2} \partial_{x_{i} x_{j}}^{2} f(x, y).$$

For instance, choosing $(i, j) = (1, 2)$ returns the mixed entry $\partial_{x y}^{2} f = e^{x y} + x y e^{x y} - \sin(x + y)$ directly as the $ε_{1} ε_{2}$ coefficient divided by $h_{1} h_{2}$.

Hyper-dual evaluations at $(x, y) + h ε_{1} e_{i} + h ε_{2} e_{j}$ yield all second-order entries, which match analytic values up to machine precision. This example exercises both multiplicative coupling and trigonometric behavior.

Remark 2

(Vector-mode implementations). The identities above extend verbatim when several ε-directions are propagated concurrently to assemble multiple rows/columns of the Hessian per pass. Memory increases linearly with the number of directions, while the arithmetic remains local and branch semantics are preserved by the real-part order.

5.2. Optimization Benchmarks: Rosenbrock, Beale, and Himmelblau

Classical optimization test functions exhibit narrow valleys, multiple minima, and strong variable coupling. Their Hessians govern Newton-type convergence, making them a canonical stress test for mixed second derivatives and method robustness.

To reflect typical optimization workloads, we include classical test functions

$$R(x, y) = 100 (y - x^{2})^{2} + (1 - x)^{2} \quad \text{(Rosenbrock)}, \tag{20}$$

$$B(x, y) = (1.5 - x + x y)^{2} + (2.25 - x + x y^{2})^{2} + (2.625 - x + x y^{3})^{2} \quad \text{(Beale)}, \tag{21}$$

$$H(x, y) = (x^{2} + y - 11)^{2} + (x + y^{2} - 7)^{2} \quad \text{(Himmelblau)}. \tag{22}$$

For each function and a set of representative points (including known minima and curved valley regions), we compare the following:

Analytic Hessians.
Hyper-dual Hessians.
Central FD Hessians (with tuned step sizes).
Complex-step second-derivative formulas where applicable.

Across all cases, hyper-dual results agree with analytic Hessians to within a small multiple of machine precision, while FD approximations require problem-specific step tuning to avoid either truncation error or cancelation. In regions of high curvature, FD errors are particularly sensitive to step choice, whereas hyper-dual coefficients remain stable.

5.3. Higher-Dimensional and Stiff Problems

Practical simulation objectives are often high-dimensional and ill-conditioned or contain stiff solver components. This subsection evaluates whether hyper-dual exactness and stability persist when curvature scales vary strongly across coordinates.

To assess scalability, we consider higher-dimensional functions such as

$$f(x) = \frac{1}{2} x^{\top} Q x + c^{\top} x + \sum_{k = 1}^{n} \sin(α_{k}^{\top} x),$$

where $Q$ is symmetric positive-definite, the $α_{k}$ are random direction vectors, and $n$ ranges from moderate to large (e.g., $n = 10, 50, 100, 200$). The true Hessian is $Q$ plus a trigonometric contribution of rank at most $n$. Hyper-dual Hessians match this structure exactly and yield accurate entries without step search.
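Under the stated definition, the gradient and Hessian of this test function can be written in closed form. The formulas below are a direct calculation (using the symmetry of $Q$) and are not quoted from the original text:

$$\nabla f(x) = Q x + c + \sum_{k = 1}^{n} \cos(α_{k}^{\top} x)\, α_{k}, \qquad \nabla^{2} f(x) = Q - \sum_{k = 1}^{n} \sin(α_{k}^{\top} x)\, α_{k} α_{k}^{\top}.$$

Each summand in the Hessian is a rank-one update, and this closed form is the natural reference against which hyper-dual entries would be compared in such a test.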

As a representative stiff problem, one may embed hyper-dual numbers into a time-integration code for a stiff ODE and extract sensitivities of a final-time quantity of interest with respect to parameters. Provided that the underlying integrator uses real-part branching only, hyper-dual coefficients deliver second derivatives while the stiffness is handled by the real solver.

5.3.1. Structured PDE–Residual Benchmark (Toy Poisson Model)

To mimic black-box simulation structure, we consider a discrete Poisson residual $G(u; x) = A u - b(x)$, where $A$ is an SPD stiffness matrix and $b(x)$ depends smoothly on parameters $x \in \mathbb{R}^{n}$. The quantity of interest is

$$F(x) = \frac{1}{2} \| u(x) - u_{\mathrm{obs}} \|^{2}, \qquad \text{where } u(x) \text{ solves } G(u; x) = 0.$$

Following the real-converge, hyper-dual sweep strategy, we first solve for the real state $u(x)$ to tolerance, then re-evaluate the residual solve with hyper-dual seeds in $x$ while enforcing real-part branching in the linear/nonlinear solver. The $ε_{1} ε_{2}$ coefficient yields curvature of $F$ with respect to $x$ without differencing across solver iterations, providing a realistic stress test for ill-conditioning and control-flow sensitivity.
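For this toy model the curvature can also be written analytically, which gives a reference value for the hyper-dual sweep. The formulas below are a direct calculation assuming, as in the residual above, that $A$ does not depend on $x$; here $\partial_{i}$ denotes $\partial / \partial x_{i}$:

$$u(x) = A^{-1} b(x), \qquad \partial_{i} u = A^{-1} \partial_{i} b, \qquad \partial_{ij}^{2} u = A^{-1} \partial_{ij}^{2} b,$$

$$\partial_{ij}^{2} F = (\partial_{i} u)^{\top} (\partial_{j} u) + (u - u_{\mathrm{obs}})^{\top} \partial_{ij}^{2} u.$$

The first term is the Gauss-Newton part of the Hessian. The second term vanishes when $b$ is affine in $x$ or when the fit is exact, so it is the part for which second-order information is actually needed.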

5.3.2. Neural-Network Loss Benchmark (Small MLP)

As a second structured example, we compute curvature of a small multilayer perceptron (MLP) training loss. Let $θ \in \mathbb{R}^{n}$ collect all weights and biases, and define

$$L(θ) = \frac{1}{m} \sum_{i = 1}^{m} \ell(\mathrm{MLP}_{θ}(x_{i}), y_{i}),$$

with smooth activation (e.g., $\tanh$) to satisfy (A1). Hyper-dual lifting of $θ$ yields exact mixed second derivatives of $L$ at a given training iterate in $n(n + 1)/2$ evaluations, providing a high-dimensional, highly coupled benchmark where FD step tuning is particularly fragile.

5.4. Numerical Stability Under Noise and Ill-Conditioning

Table 2 summarizes the noise robustness and step-size sensitivity of the three second-derivative estimators (FD, CS, and HD) for the scalar test function $f(x) = \exp(x) \sin x$ at $x_{0}$.

5.4.1. Experimental Protocol

Unless stated otherwise, stability results use a fixed evaluation point $x = x_{0}$, Gaussian additive noise $η \sim N(0, σ^{2})$ with $σ \in \{10^{-8}, 10^{-6}, 10^{-4}, 10^{-2}\}$, and $N_{\mathrm{MC}} = 500$ independent trials with a fixed pseudo-random seed. Errors are reported as relative errors $\mathrm{RelErr} = |\hat{f}'' - f''| / \max(1, |f''|)$, together with the standard deviation (Std) over trials. These settings are chosen to mirror typical measurement/Monte Carlo uncertainty while preserving reproducibility.

5.4.2. Additive Noise in Function Values

Let $\tilde{f}(x) = f(x) + η$, where $η$ is a mean-zero random perturbation with variance $σ^{2}$ representing measurement or Monte Carlo noise. For a scalar test function (e.g., the polynomial-trigonometric example above), we compare the following:

Central FD approximations to $f''(x)$ using $\tilde{f}$.
Hyper-dual extraction of $f''(x)$ using a single evaluation of $\tilde{f}$ in $\mathbb{HD}$.

FD formulas amplify noise differently depending on h: for small h, subtraction of nearly equal noisy values leads to large variance in the derivative; for large h, truncation error dominates. Hyper-dual coefficients, by contrast, avoid explicit differencing and therefore inherit noise only through the real evaluation itself. In numerical experiments, the variance of hyper-dual second-derivative estimates remains essentially independent of h, whereas FD estimates exhibit a characteristic U-shaped error curve as a function of h.

5.4.3. Complex-Step Under Noise

While complex-step differentiation is cancelation-free for first derivatives, standard second-derivative complex-step identities rely on a differencing structure. Consequently, under the additive noise model $\tilde{f} = f + η$, the noise in CS second-derivative estimates is amplified much as in central FD, with a standard deviation proportional to $σ / h^{2}$. This explains why CS exhibits the same qualitative U-shaped error-versus-$h$ pattern in noisy settings, whereas the hyper-dual estimate remains nearly $h$-invariant once the real path is fixed.
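The $σ / h^{2}$ scaling can be made explicit for the standard three-point central formula, under the assumption (ours, for illustration) that each function evaluation carries an independent noise draw with variance $σ^{2}$:

$$\hat{f}''_{\mathrm{FD}} = \frac{\tilde{f}(x + h) - 2 \tilde{f}(x) + \tilde{f}(x - h)}{h^{2}}, \qquad \mathrm{Var}\left[\hat{f}''_{\mathrm{FD}}\right] = \frac{(1 + 4 + 1)\, σ^{2}}{h^{4}} = \frac{6 σ^{2}}{h^{4}}.$$

The noise-induced standard deviation is therefore $\sqrt{6}\, σ / h^{2}$, which grows as $h$ shrinks, while the truncation error behaves like $O(h^{2})$ and grows as $h$ increases. The sum of the two is what produces a U-shaped error curve.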

Figure 1 displays the mean relative error of FD, CS, and HD second-derivative estimates as a function of the step size $h$ under additive Gaussian noise. Panel (a) corresponds to $σ = 10^{-8}$ and panel (b) to $σ = 10^{-6}$.

5.4.4. Ill-Conditioned Problems and Scaling

To probe ill-conditioning, we consider functions whose Hessians have widely separated eigenvalues. Hyper-dual evaluations reproduce the ill-conditioned Hessian exactly (up to roundoff), while FD formulas struggle to resolve small curvature directions in the presence of large ones. We additionally perform scaling experiments: variables are rescaled by diagonal matrices, and FD step sizes are either kept fixed or adapted. Hyper-dual coefficients are invariant under such rescalings (modulo floating-point effects), while FD approximations are highly sensitive to both variable scaling and step-size heuristics.

These tests confirm that the algebraic advantages of hyper-dual numbers translate into practical robustness when function values are noisy or when problems are ill-conditioned.

6. Cost, Floating-Point Effects, and Comparative Analysis

We emphasize ratios and scaling trends rather than absolute timings, since wall-clock performance varies with compilers, BLAS/LAPACK backends, and hardware. The guiding theme is that hyper-duals trade modest per-call arithmetic overhead for a large reduction in expensive target evaluations while avoiding differencing instabilities.

6.1. Operation Counts, Runtime, and Memory Profiling

6.1.1. Operation Counts

A hyper-dual scalar stores four real coefficients; addition is componentwise, while multiplication expands to a small fixed number of real flops. For $f : R^{n} \to R$, dense Hessian assembly needs $n (n + 1) / 2$ hyper-dual evaluations, versus ${(n + 1)}^{2}$ forward FD and $2 n (n + 2)$ central FD evaluations. Hence, when each f evaluation is costly (e.g., embedded PDE/CFD solves), the reduced call count dominates overall time.
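As a quick arithmetic check of these counts, the following Python helper (a hypothetical name, not part of the article's code) evaluates the three formulas quoted above for a given dimension n.

```python
def hessian_eval_counts(n):
    """Evaluation counts for a dense n x n Hessian, using the formulas in the text.

    Returns (hyper-dual, forward FD, central FD).
    """
    hd = n * (n + 1) // 2       # one hyper-dual call per upper-triangular entry
    fd_forward = (n + 1) ** 2   # forward-difference count quoted in the text
    fd_central = 2 * n * (n + 2)  # central-difference count quoted in the text
    return hd, fd_forward, fd_central


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, hessian_eval_counts(n))
```

For n = 10 this gives 55 hyper-dual calls against 121 forward-FD and 240 central-FD calls.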

6.1.2. Runtime and Memory Profiling

Per-scalar hyper-dual arithmetic is slower than real arithmetic, but the overhead is bounded by a modest constant factor. Empirically, the reduced evaluation count often yields net speedups, especially in simulation-driven optimization where f calls are the bottleneck. Memory overhead is predictable and linear: arrays require four times real storage, with similar access patterns.

6.1.3. Integration Pattern (“Real Converge, HD Sweep”)

To isolate solver cost from differentiation cost, we recommend (i) converge the state using purely real arithmetic, then (ii) perform one hyper-dual sweep of $n (n + 1) / 2$ calls to assemble $\nabla^{2} f$. This preserves the exact second-derivative readout while keeping iterative loops inexpensive.

6.1.4. Algebraic Foundation (Concise)

Hyper-duals live in

HD = R [ε_{1}, ε_{2}] / (ε_{1}^{2}, ε_{2}^{2}, {(ε_{1} ε_{2})}^{2}),

so any lifted $C^{2}$ function truncates exactly at bi-degree $(1, 1)$:

f (x + ε_{1} h_{1} v + ε_{2} h_{2} w) = f (x) + h_{1} \partial_{v} f ε_{1} + h_{2} \partial_{w} f ε_{2} + h_{1} h_{2} \partial_{v w} f ε_{1} ε_{2} .

Because lifting is an algebra homomorphism, compositions automatically satisfy the chain rule. This is the core reason mixed second derivatives can be read off from the $ε_{1} ε_{2}$ coefficient without differencing.
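The truncation identity can be checked numerically with a minimal Python sketch. It stores a hyper-dual number as a 4-tuple (real, ε1, ε2, ε1ε2), which is an illustrative choice rather than the article's Listing A2, and lifts the polynomial $f (x) = x^{3} + 2 x$ using only addition and multiplication; all helper names are hypothetical.

```python
def hd_add(a, b):
    """Componentwise sum of two hyper-dual 4-tuples."""
    return tuple(p + q for p, q in zip(a, b))


def hd_scale(c, a):
    """Multiply a hyper-dual 4-tuple by a real scalar c."""
    return tuple(c * p for p in a)


def hd_mul(a, b):
    """Product in the algebra with eps1^2 = eps2^2 = 0."""
    return (a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[2] * b[0],
            a[0] * b[3] + a[1] * b[2] + a[2] * b[1] + a[3] * b[0])


def f_lifted(x):
    """f(x) = x^3 + 2x evaluated in hyper-dual arithmetic."""
    return hd_add(hd_mul(hd_mul(x, x), x), hd_scale(2.0, x))


if __name__ == "__main__":
    x0, h1, h2 = 2.0, 0.5, 4.0
    y = f_lifted((x0, h1, h2, 0.0))
    # Expected coefficients: f(x0), h1 f'(x0), h2 f'(x0), h1 h2 f''(x0)
    # with f'(x) = 3x^2 + 2 and f''(x) = 6x.
    print(y)                  # (12.0, 7.0, 56.0, 24.0)
    print(y[3] / (h1 * h2))   # 12.0, i.e. f''(2)
```

Because every intermediate value here is exactly representable in binary floating point, the coefficients agree with the analytic derivatives exactly, for any choice of the seeds $h_{1}$, $h_{2}$.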

6.1.5. Implicit Maps

For $G (x, y) = 0$ defining $y = ϕ (x)$ with $\partial_{y} G \neq 0$, lifting x to hyper-dual form and propagating through a real-path Newton/fixed-point solve yields $ϕ^{'} (x_{0})$ from the $ε_{1}$ and $ε_{2}$ coefficients and $ϕ^{″} (x_{0})$ from the $ε_{1} ε_{2}$ coefficient. This enables curvature extraction in implicitly defined models (e.g., shock-angle or constraint solves) in one stabilized pass.
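One way to realize this pattern is sketched below in Python for the illustrative implicit relation $G (x, y) = y^{3} + y - x = 0$, which is an assumption chosen for this example and does not come from the article. The real root is converged first; two Newton corrections in hyper-dual arithmetic then fill in the derivative coefficients. Two corrections suffice in exact arithmetic because each Newton step squares the nilpotent part of the error. All function names are hypothetical.

```python
def hd_add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def hd_sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def hd_scale(c, a):
    return tuple(c * p for p in a)


def hd_mul(a, b):
    return (a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[2] * b[0],
            a[0] * b[3] + a[1] * b[2] + a[2] * b[1] + a[3] * b[0])


def hd_inv(a):
    """Reciprocal of a hyper-dual number with nonzero real part."""
    r1 = 1.0 / a[0]
    return (r1, -a[1] * r1**2, -a[2] * r1**2,
            2.0 * a[1] * a[2] * r1**3 - a[3] * r1**2)


ONE = (1.0, 0.0, 0.0, 0.0)


def G(x, y):
    """Residual y^3 + y - x in hyper-dual arithmetic."""
    return hd_sub(hd_add(hd_mul(hd_mul(y, y), y), y), x)


def G_y(y):
    """Partial derivative dG/dy = 3 y^2 + 1 in hyper-dual arithmetic."""
    return hd_add(hd_scale(3.0, hd_mul(y, y)), ONE)


def implicit_derivatives(x0, tol=1e-14, max_iter=50):
    """Return (phi, phi', phi'') at x0 for the root y = phi(x) of G(x, y) = 0."""
    # (i) Real-path Newton iteration for the root.
    y = x0
    for _ in range(max_iter):
        step = (y**3 + y - x0) / (3.0 * y * y + 1.0)
        y -= step
        if abs(step) <= tol * max(1.0, abs(y)):
            break
    # (ii) Two hyper-dual Newton corrections with x seeded in eps1 and eps2.
    X = (x0, 1.0, 1.0, 0.0)
    Y = (y, 0.0, 0.0, 0.0)
    for _ in range(2):
        Y = hd_sub(Y, hd_mul(G(X, Y), hd_inv(G_y(Y))))
    return Y[0], Y[1], Y[3]


if __name__ == "__main__":
    print(implicit_derivatives(2.0))
```

At $x_{0} = 2$ the root is $y = 1$, and implicit differentiation gives $ϕ^{'} = 1 / (3 y^{2} + 1) = 0.25$ and $ϕ^{″} = - 6 y / {(3 y^{2} + 1)}^{3} = - 0.09375$, which the sketch reproduces up to roundoff.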

6.1.6. Wall-Clock Scaling and Crossover Regimes

To make scalability transparent beyond operation counts, we distinguish two regimes: (i) cheap objectives, where runtime is dominated by scalar arithmetic, and (ii) expensive objectives, where runtime is dominated by the real evaluation path (e.g., PDE/CFD solves). In regime (ii), the reduction from ${(n + 1)}^{2}$ (FD) to $n (n + 1) / 2$ (HD) evaluations yields a clear wall-clock advantage, and a crossover dimension $n_{★}$ is typically observed where HD becomes faster end-to-end despite per-call overhead.

6.2. Floating-Point Effects and Deviation from Theoretical Exactness

6.2.1. Numerical Deviation Sources

Hyper-dual differentiation is symbolically exact; finite-precision errors arise only from (i) roundoff in the real evaluation path, (ii) roundoff accumulation in coefficient arithmetic, and (iii) ill-conditioning of f at the evaluation point. Crucially, hyper-duals introduce no difference quotients, so they do not add subtractive cancelation beyond what already exists in the real code.

Figure 2 reports measured wall-clock time versus n in both regimes to illustrate the crossover described in Section 6.1.6.

6.2.2. Quantifying Floating-Point Deviation

To quantify departures from theoretical exactness, we sweep the conditioning of the real path and the arithmetic precision. Specifically, for quadratic-oscillatory benchmarks with SPD Q, we vary $κ (Q) = λ_{max} / λ_{min}$ over several orders of magnitude, and measure the relative error between analytic and hyper-dual Hessians in both float32 and float64. Table 3 summarizes the resulting accuracy trends, confirming that hyper-dual errors track the conditioning of the real evaluation rather than any step-selection artifact.

Figure 3 summarizes evaluation counts versus dimension and target Hessian density; Table 4 records arithmetic/memory footprints.

6.2.3. Scaling and Nondimensionalization

To keep coefficient magnitudes well inside floating-point ranges, scale inputs so typical nondimensional variables are $O (1)$. This improves numeric robustness of $(a_{1}, a_{2}, a_{12})$ without affecting algebraic exactness.

6.2.4. h-Sweep Sanity Check

On smooth benchmarks, the hyper-dual second derivative remains essentially flat near machine precision across wide h ranges once the real-path is resolved, whereas FD shows the classical truncation-to-roundoff transition. This behavior confirms that accuracy is governed by real-path conditioning, not by step selection.
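A noise-free version of this h-sweep can be sketched in Python for $f (x) = exp (x) sin x$. The lifts of exp and sin follow the same formulas as the MATLAB skeleton in Appendix C; the tuple representation and the helper names are illustrative assumptions.

```python
import math


def hd_mul(a, b):
    """Product of hyper-dual 4-tuples (real, eps1, eps2, eps1*eps2)."""
    return (a[0] * b[0],
            a[0] * b[1] + a[1] * b[0],
            a[0] * b[2] + a[2] * b[0],
            a[0] * b[3] + a[1] * b[2] + a[2] * b[1] + a[3] * b[0])


def lift(f0, f1, f2, x):
    """Taylor lift of a smooth scalar function from f, f', f'' at the real part."""
    _, e1, e2, e12 = x
    return (f0, f1 * e1, f1 * e2, f1 * e12 + f2 * e1 * e2)


def hd_exp(x):
    e = math.exp(x[0])
    return lift(e, e, e, x)


def hd_sin(x):
    s, c = math.sin(x[0]), math.cos(x[0])
    return lift(s, c, -s, x)


def second_hd(x0, h):
    """f''(x0) from the eps1*eps2 coefficient with seed h in both units."""
    x = (x0, h, h, 0.0)
    return hd_mul(hd_exp(x), hd_sin(x))[3] / (h * h)


def second_fd(x0, h):
    """Central second difference of f(t) = exp(t) sin(t)."""
    g = lambda t: math.exp(t) * math.sin(t)
    return (g(x0 + h) - 2.0 * g(x0) + g(x0 - h)) / (h * h)


def rel_err(value, x0):
    """Relative error against the analytic f''(x0) = 2 exp(x0) cos(x0)."""
    exact = 2.0 * math.exp(x0) * math.cos(x0)
    return abs(value - exact) / max(1.0, abs(exact))


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-4, 1e-6, 1e-8):
        print(f"h={h:.0e}  HD={rel_err(second_hd(0.7, h), 0.7):.1e}"
              f"  FD={rel_err(second_fd(0.7, h), 0.7):.1e}")
```

In this sketch the seed h multiplies the coefficient and is divided out again, so it affects only roundoff, whereas the FD column moves from truncation-dominated to roundoff-dominated error as h shrinks.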

6.3. Sensitivity to FD Step Size and Variable Scaling

6.3.1. FD Step-Size Policy Used Throughout

All FD baselines use the central second-difference formula with a uniform diagonal step policy

h_{i} = η max (1, | x_{i} |),

where $η$ is swept on a logarithmic grid $η \in {10^{p_{1}}, 10^{p_{2}}, \dots, 10^{p_{k}}}$. This standard heuristic equalizes absolute and relative perturbations across variables and is commonly recommended in numerical differentiation practice. Reporting the full $η$-sweep reveals the truncation-to-roundoff transition and provides a fair, reproducible comparison against hyper-dual and complex-step references.

We assess robustness by sweeping FD steps over many orders of magnitude and by rescaling variables diagonally. FD accuracy depends delicately on both h and units; a step size that is optimal in one scale regime can fail in another. Hyper-dual values provide a stable reference across these sweeps, enabling clear attribution of FD error to step and scaling choices.
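The step policy can be written down in a few lines of Python. The helper names are hypothetical, and the grid of consecutive integer exponents is an assumption, since the exponents $p_{1}, \dots, p_{k}$ are not specified in the text.

```python
def fd_steps(x, eta):
    """Per-variable FD steps h_i = eta * max(1, |x_i|)."""
    return [eta * max(1.0, abs(xi)) for xi in x]


def eta_grid(p_min, p_max):
    """Logarithmic grid 10^p for integer p in [p_min, p_max] (assumed spacing)."""
    return [10.0 ** p for p in range(p_min, p_max + 1)]


if __name__ == "__main__":
    x = [0.5, -20.0, 3.0e3]
    for eta in eta_grid(-8, -1):
        print(f"eta={eta:.0e}  steps={fd_steps(x, eta)}")
```

Variables with magnitude below one receive the absolute step η, while larger variables receive a step proportional to their size.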

6.3.2. Verification Examples

Rosenbrock.

R (x, y) = 100 {(y - x^{2})}^{2} + {(1 - x)}^{2}, \quad \nabla^{2} R = \begin{pmatrix} 1200 x^{2} - 400 y + 2 & - 400 x \\ - 400 x & 200 \end{pmatrix} .

One hyper-dual call at $(x + ε_{1} h, y + ε_{2} h)$ recovers $\partial_{x y} R = - 400 x$ exactly from the $ε_{1} ε_{2}$ coefficient after division by $h^{2}$.

Beale and Himmelblau.

B (x, y) = {(1.5 - x + x y)}^{2} + {(2.25 - x + x y^{2})}^{2} + {(2.625 - x + x y^{3})}^{2},

H (x, y) = {(x^{2} + y - 11)}^{2} + {(x + y^{2} - 7)}^{2} .

Across multiple points and minima, hyper-dual mixed derivatives match analytic references, while FD varies with h and scaling.

Controlled polynomial. For $p (x, y) = x^{3} + 2 x^{2} y + x y^{2} + y^{3}$, the mixed derivative $\partial_{x y} p = 4 x + 2 y$ is recovered identically.
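The analytic reference for the controlled polynomial can be verified by hand in two differentiation steps; the short derivation below is included only as a check on the reference value against which the hyper-dual coefficient is compared.

```latex
\begin{aligned}
p(x, y) &= x^{3} + 2 x^{2} y + x y^{2} + y^{3}, \\
\partial_{x} p &= 3 x^{2} + 4 x y + y^{2}, \\
\partial_{x y} p &= \partial_{y} \left( 3 x^{2} + 4 x y + y^{2} \right) = 4 x + 2 y .
\end{aligned}
```

For example, at $(x, y) = (1, 2)$ the mixed derivative equals 8, which is the value the $ε_{1} ε_{2}$ coefficient should return for unit seeds.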

6.3.3. Noise Sensitivity

Because FD relies on differencing nearby values, it amplifies additive evaluation noise. Hyper-dual coefficients do not, making them preferable for noisy surrogates or ill-conditioned simulations.

6.3.4. Reporting Checklist (Condensed)

State nondimensional scales; list hardware, compiler, and linear algebra backends; identify any nonsmooth operators or smoothing policies; provide analytic baselines and error norms; and document random seeds and which kernels remained real versus hyper-dual.

6.4. Method Selection in Practice

6.4.1. Positioning

Complex-step gradients are attractive for first derivatives, but second-order formulas often combine multiple evaluations and may reintroduce subtraction. Hyper-duals avoid explicit differencing and supply mixed terms directly, with predictable overhead.

6.4.2. Practical Guidance

Gradients only, very high n: complex-step or reverse AD.
Dense Hessians or selected entries in legacy codes: hyper-duals.
PDE/CFD optimization: reverse AD for gradients + hyper-dual sweep for curvature at convergence.
Nonsmooth models: use smoothing or interpret coefficients diagnostically at kinks.

6.4.3. Implementation Pattern and Hessian Extraction

Wrap only assembly layers with hyper-dual types, keeping heavy solvers real. Mixed entries follow from a single evaluation:

function Hessian_HD(f, x, h = 1):
    n = length(x); H = zeros(n,n)
    for j in 1..n:
        for i in 1..j:
            xHD = lift_to_HD(x)
            xHD[i] += eps1*h; xHD[j] += eps2*h
            yHD = f(xHD)
            hij = coeff_eps1eps2(yHD)/(h*h)
            H[i,j] = hij; H[j,i] = hij
    return H
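A runnable Python rendering of this pseudocode is sketched below. The small `HD` class is a minimal stand-in written for this example (it supports only addition, subtraction, and multiplication) and should not be read as the article's Listing A2; the Rosenbrock function serves as the test objective.

```python
class HD:
    """Hyper-dual scalar r + e1*eps1 + e2*eps2 + e12*eps1*eps2."""
    __slots__ = ("r", "e1", "e2", "e12")

    def __init__(self, r, e1=0.0, e2=0.0, e12=0.0):
        self.r, self.e1, self.e2, self.e12 = r, e1, e2, e12

    @staticmethod
    def _lift(v):
        # Promote plain real numbers to hyper-dual constants.
        return v if isinstance(v, HD) else HD(float(v))

    def __add__(self, o):
        o = HD._lift(o)
        return HD(self.r + o.r, self.e1 + o.e1, self.e2 + o.e2, self.e12 + o.e12)
    __radd__ = __add__

    def __neg__(self):
        return HD(-self.r, -self.e1, -self.e2, -self.e12)

    def __sub__(self, o):
        return self + (-HD._lift(o))

    def __rsub__(self, o):
        return HD._lift(o) + (-self)

    def __mul__(self, o):
        o = HD._lift(o)
        return HD(self.r * o.r,
                  self.r * o.e1 + self.e1 * o.r,
                  self.r * o.e2 + self.e2 * o.r,
                  self.r * o.e12 + self.e1 * o.e2 + self.e2 * o.e1 + self.e12 * o.r)
    __rmul__ = __mul__


def hessian_hd(f, x, h=1.0):
    """Dense Hessian from n(n+1)/2 hyper-dual evaluations of f."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            X = [HD(float(xk)) for xk in x]
            X[i] = X[i] + HD(0.0, h, 0.0, 0.0)  # seed eps1 along e_i
            X[j] = X[j] + HD(0.0, 0.0, h, 0.0)  # seed eps2 along e_j (i == j gets both)
            H[i][j] = H[j][i] = f(X).e12 / (h * h)
    return H


def rosenbrock(v):
    x, y = v
    a = y - x * x
    b = 1.0 - x
    return 100.0 * a * a + b * b


if __name__ == "__main__":
    print(hessian_hd(rosenbrock, [1.0, 2.0]))
```

At (x, y) = (1, 2) the analytic Hessian of the Rosenbrock function is [[402, -400], [-400, 200]]. Seeding by addition, as in the pseudocode, means a diagonal entry carries both units on the same component, which is what makes the ε1ε2 coefficient return the pure second derivative.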

6.4.4. Pitfalls

Branch only on real parts; avoid or smooth $| x |$, max/min, limiters at kinks; and ensure lifted special functions respect bi-degree $(1, 1)$ truncation. If third-party black-box kernels break dual semantics internally, confine hyper-dual propagation to the surrounding real assembly.
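The rule of branching only on real parts can be made concrete with a small Python sketch for absolute value and max on hyper-dual 4-tuples (real, ε1, ε2, ε1ε2). The helper names are hypothetical and the sketch covers only the branching logic, not smoothing.

```python
def hd_neg(a):
    """Negate every coefficient of a hyper-dual 4-tuple."""
    return tuple(-c for c in a)


def hd_abs(a):
    """|a| with the branch decided by the real part only."""
    return a if a[0] >= 0.0 else hd_neg(a)


def hd_max(a, b):
    """max(a, b) chosen by comparing real parts; nilpotent parts never steer control flow."""
    return a if a[0] >= b[0] else b


if __name__ == "__main__":
    print(hd_abs((-2.0, 1.0, 0.5, 0.25)))
    print(hd_max((1.0, 9.0, 9.0, 9.0), (2.0, 0.0, 0.0, 0.0)))
```

Away from the kink these lifts return the derivatives of the active branch. Exactly at a zero real part, `hd_abs` as written returns the coefficients of one side only, which is why the text recommends avoiding or smoothing such operators there.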

7. Extensions and Outlook

7.1. Higher-Order Derivatives

Hyper-dual numbers can be generalized to higher-order nilpotent algebras that encode third and higher derivatives via additional commuting nilpotent directions or via truncated polynomial algebras. Such constructions retain many of the algebraic advantages of hyper-dual numbers but increase the number of coefficients per scalar. The design of efficient higher-order algebras that remain numerically stable and implementation-friendly is an active topic of research.

Sketch: Third Derivatives via Tri-Dual Extension

A direct higher-order extension introduces three commuting nilpotent units $ε_{1}, ε_{2}, ε_{3}$ with $ε_{i}^{2} = 0$, encoding third mixed derivatives in the $ε_{1} ε_{2} ε_{3}$ coefficient. For $f \in C^{3}$ and seed $x^{★} = x + ε_{1} h_{1} v + ε_{2} h_{2} w + ε_{3} h_{3} z$, the Taylor lift truncates exactly at tri-degree $(1, 1, 1)$, yielding

\frac{{[f (x^{★})]}_{ε_{1} ε_{2} ε_{3}}}{h_{1} h_{2} h_{3}} = \partial_{v w z}^{3} f (x) .

Although the coefficient count grows combinatorially, this construction remains purely algebraic and can be implemented by extending the scalar type in Appendix C. In many applications, mixed third derivatives are required only in a few directions; a direction-seeded tri-dual sweep then remains computationally feasible.
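A compact way to prototype the tri-dual construction is to index coefficients by bitmasks of the units they contain, so that a product term survives only when the two masks are disjoint. The Python sketch below uses this representation for a scalar input with $v = w = z = 1$; the dictionary layout and function names are assumptions made for illustration, not the extension of the Appendix C type mentioned above.

```python
def md_mul(a, b):
    """Multiply multi-dual numbers stored as {bitmask: coefficient}; eps_i^2 = 0."""
    out = {}
    for s, u in a.items():
        for t, v in b.items():
            if s & t == 0:  # a shared unit would square to zero
                out[s | t] = out.get(s | t, 0.0) + u * v
    return out


def md_add(a, b):
    """Coefficient-wise sum of two multi-dual numbers."""
    out = dict(a)
    for t, v in b.items():
        out[t] = out.get(t, 0.0) + v
    return out


def seed(x, steps):
    """Build x + sum_i steps[i] * eps_{i+1} for a scalar input x."""
    num = {0: x}
    for i, h in enumerate(steps):
        num[1 << i] = h
    return num


def third_derivative(f, x, h=(1.0, 1.0, 1.0)):
    """Read f'''(x) from the eps1*eps2*eps3 coefficient (bitmask 0b111)."""
    y = f(seed(x, h))
    return y.get(0b111, 0.0) / (h[0] * h[1] * h[2])


def quartic(x):
    """f(x) = x^4 evaluated in multi-dual arithmetic."""
    x2 = md_mul(x, x)
    return md_mul(x2, x2)


if __name__ == "__main__":
    print(third_derivative(quartic, 2.0))  # f'''(x) = 24 x, so 48.0 at x = 2
```

The same two functions `md_mul` and `md_add` work for any number of units, at the cost of up to $2^{k}$ coefficients per scalar for k units, which reflects the combinatorial growth noted above.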

7.2. Vector-Valued Functions and Jacobians

For vector-valued maps $f : R^{n} \to R^{m}$, hyper-dual numbers can be used to obtain Jacobians and Hessian-vector products. In particular, propagating multiple directions in vector-mode implementations allows one to recover several columns of the Jacobian or Hessian in a single pass. Coupling hyper-dual arithmetic with reverse-mode AD (“forward-on-reverse” strategies) yields efficient Hessian-vector computations in high dimensions.

7.3. Integration with AD Frameworks

Hyper-dual numbers complement rather than replace existing AD tools. They are particularly attractive when the following apply:

Reverse-mode pipelines are impractical to retrofit into legacy codes.
Only modest numbers of variables or Hessian entries are required.
Reproducibility and robustness under noise are prioritized over absolute speed.

In AD frameworks, hyper-dual types can serve as local scalar types in tape-based computations, enabling hybrid differentiation strategies.

Discussion Bridge

Section 6.4 and Section 7 collectively emphasize that hyper-dual numbers are not a competitor to complex-step or AD, but a complementary tool that reliably supplies second-order information when differencing is fragile or retrofitting reverse-mode is impractical. With these practical and extensibility considerations in place, we now summarize the main contributions and implications.

8. Conclusions

We have provided a consolidated theoretical and experimental account of hyper-dual numbers for exact second derivatives. Building on a commutative, four-component nilpotent algebra, we proved the exact truncation of the Taylor lift at second order, derived coefficient-extraction formulas for gradients and Hessians, and stated clearly the assumptions under which these results hold, including the behavior at non-smooth points and the necessity of consistent real-part branching.

Practical Takeaways

For readers seeking immediate guidance, the main lessons of this study are as follows:

Hyper-dual numbers deliver mixed second derivatives to working precision without step tuning, provided $f \in C^{2}$ and real-path branching is enforced.
In noisy or ill-conditioned regimes, FD and CS second derivatives inherit $σ / h^{2}$ amplification, whereas hyper-dual coefficients remain essentially h-invariant and scale primarily with the real-path conditioning.
For dense Hessians (or selected mixed entries) in legacy or black-box codes, hyper-dual sweeps offer a robust, reproducible alternative to differencing.
When only gradients are needed at very large n, complex-step or reverse AD remains preferable; hyper-duals are most beneficial once curvature is required.
At non-smooth points, hyper-dual outputs diagnose the discrete model’s directional behavior; smoothing surrogates restore classical Hessians if needed.

On the practical side, we presented implementation patterns and language-specific skeletons (C++, Python, and MATLAB) for rapid prototyping, reorganized and expanded numerical examples to include analytic test functions, classical optimization benchmarks, and higher-dimensional problems, and carried out stability tests under noise and ill-conditioning. Runtime and memory profiling, together with sensitivity tests in FD step size and variable scaling, highlighted the reproducible robustness of hyper-dual coefficients relative to FD and complex-step methods and clarified when evaluation-count savings outweigh per-scalar algebraic overhead.

From a methodological standpoint, hyper-dual numbers offer an algebraically principled and computationally reliable route to second-order information in black-box codes. When coupled with real-path convergence and careful API boundaries—keeping heavy linear-algebra kernels in reals while wrapping only residual/functional assembly in hyper-duals—they reduce the trial-and-error tuning common in FD workflows. In this sense, hyper-duals naturally complement complex-step differentiation for gradients and reverse-mode/adjoint AD for large-scale problems: complex-step remains attractive for first derivatives and reverse AD for very high-dimensional gradients, while hyper-duals provide differencing-free, machine-precision Hessians or selected mixed entries with minimal refactoring of legacy code.

Our contribution is a synthesis and systematization of the hyper-dual approach: we collected the core algebraic facts, made assumptions explicit, and provided implementation recipes and fair cost/accuracy comparisons under clearly stated conditions. These results connect and extend early accounts that popularized hyper-duals for exact second derivatives [13,16], and they align with recent demonstrations of hyper-dual formalization and applications beyond aerospace and optimization, including thermodynamics [14] and formal methods/verification [17]. For a broader applied, textbook-level perspective on derivative-based design workflows, we refer to Martins and Ning [18].

Overall, hyper-dual numbers combine the numerical stability associated with complex-step differentiation and the composability of dual-number lifting, exposing both first- and second-order sensitivities in a single evaluation. We expect them to be particularly useful in optimization, uncertainty quantification, simulation-based design, and other settings where curvature information is valuable but differencing is fragile. Promising future directions include higher-order generalizations, GPU-friendly and vector-mode implementations, and further domain-specific deployments in aerodynamics, materials modeling, and machine-learning optimization.

Author Contributions

Conceptualization, J.E.K.; methodology, J.E.K.; software, S.B.P. and J.E.K.; validation, S.B.P. and J.E.K.; formal analysis, J.E.K.; investigation, J.E.K.; resources, J.E.K.; data curation, S.B.P.; writing—original draft, J.E.K.; writing—review and editing, J.E.K.; visualization, S.B.P. and J.E.K.; supervision, J.E.K.; project administration, J.E.K.; funding acquisition, S.B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Dongguk University Research Fund 2023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. C++ Skeleton with Operator Overloading

Listing A1. Minimal hyper–dual type and selected overloads (header-only).

Appendix B. Python Skeleton for Rapid Prototyping

Listing A2. Lightweight Python class for hyper–dual arithmetic.

Appendix C. MATLAB Skeleton for Rapid Prototyping

The following MATLAB class provides a minimal hyper-dual scalar type suitable for rapid prototyping. It mirrors the C++ and Python skeletons, so that all algorithms can be implemented on top of a single scalar class.

classdef HD
    properties
        r    % real part
        e1   % coeff of epsilon_1
        e2   % coeff of epsilon_2
        e12  % coeff of epsilon_1*epsilon_2
    end
    methods
        % Constructor: HD() gives zero, HD(r,e1,e2,e12) sets all coefficients
        function obj = HD(r,e1,e2,e12)
            if nargin == 0
                obj.r = 0;
                obj.e1 = 0;
                obj.e2 = 0;
                obj.e12 = 0;
            else
                obj.r = r;
                obj.e1 = e1;
                obj.e2 = e2;
                obj.e12 = e12;
            end
        end
        % Addition
        function c = plus(a,b)
            c = HD(a.r + b.r, ...
                   a.e1 + b.e1, ...
                   a.e2 + b.e2, ...
                   a.e12 + b.e12);
        end
        % Subtraction
        function c = minus(a,b)
            c = HD(a.r - b.r, ...
                   a.e1 - b.e1, ...
                   a.e2 - b.e2, ...
                   a.e12 - b.e12);
        end
        % Multiplication
        function c = mtimes(a,b)
            c = HD( ...
                a.r*b.r, ...
                a.r*b.e1 + a.e1*b.r, ...
                a.r*b.e2 + a.e2*b.r, ...
                a.r*b.e12 + a.e1*b.e2 + a.e2*b.e1 + a.e12*b.r );
        end
        % Inverse (a.r ~= 0)
        function c = inv(a)
            r1 = 1.0 / a.r;
            c = HD( ...
                r1, ...
                -a.e1 * r1^2, ...
                -a.e2 * r1^2, ...
                2.0*a.e1*a.e2 * r1^3 - a.e12 * r1^2 );
        end
        % Division
        function c = mrdivide(a,b)
            c = a * inv(b);
        end
        % Elementary functions (ordinary methods here; could also live in separate m-files)
        function y = hd_sin(x)
            s = sin(x.r); c = cos(x.r);
            y = HD( ...
                s, ...
                c*x.e1, ...
                c*x.e2, ...
                x.e12*c - x.e1*x.e2*s );
        end
        function y = hd_exp(x)
            e = exp(x.r);
            y = HD( ...
                e, ...
                e*x.e1, ...
                e*x.e2, ...
                e*(x.e12 + x.e1*x.e2) );
        end
    end
end

Using this class, a user can wrap an existing MATLAB function f that expects real vectors as follows:

function val = f_hd(x_hd)

% x_hd is a vector of HD objects

% implement f using overloaded operations and hd_* functions

end

A simple driver then seeds selected components with e1 and e2 and reads off the e12 coefficients to assemble the Hessian, following Algorithm 2.

Appendix C.1. Minimal Driver for Dense Hessians (MATLAB)

To close the loop between Appendix C and Algorithm 2, we provide a lightweight driver that assembles a dense Hessian using hyper-dual seeding. This wrapper matches the notation of Section 4 and is the routine used in the MATLAB experiments (Algorithm 2).

function H = hess_dense_hd(f, x)
% H = hess_dense_hd(f, x)
% Dense Hessian via hyper-dual numbers (Algorithm 2)
% f : function handle accepting HD vectors and returning HD scalar
% x : real vector (n x 1)
n = length(x);
H = zeros(n,n);
h1 = 1; h2 = 1; % nondimensionalized default
for i = 1:n
    for j = i:n
        X = repmat(HD(0,0,0,0), n, 1);
        for k = 1:n
            X(k) = HD(x(k), 0, 0, 0);
        end
        if i == j
            % diagonal entry: seed both units on the same component
            X(i) = HD(x(i), h1, h2, 0);
        else
            X(i) = HD(x(i), h1, 0, 0);
            X(j) = HD(x(j), 0, h2, 0);
        end
        Y = f(X);
        H(i,j) = Y.e12/(h1*h2);
        H(j,i) = H(i,j);
    end
end
end

This driver highlights the systematic “one class–all algorithms” philosophy and enables rapid prototyping of any differentiable MATLAB model by simple operator overloading.

References

1. Squire, W.; Trapp, G. Using Complex Variables to Estimate Derivatives of Real Functions. SIAM Rev. 1998, 40, 110–112.
2. Fornberg, B. Generation of Finite Difference Formulas on Arbitrarily Spaced Grids. Math. Comput. 1988, 51, 699–706.
3. Higham, N.J. Accuracy and Stability of Numerical Algorithms, 2nd ed.; SIAM: Philadelphia, PA, USA, 2002.
4. Goldberg, D. What Every Computer Scientist Should Know About Floating-Point Arithmetic. ACM Comput. Surv. 1991, 23, 5–48.
5. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006.
6. Giles, M.B.; Pierce, N.A. An Introduction to the Adjoint Approach to Design. Flow Turbul. Combust. 2000, 65, 393–415.
7. Griewank, A.; Walther, A. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd ed.; SIAM: Philadelphia, PA, USA, 2008.
8. Naumann, U. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation; SIAM: Philadelphia, PA, USA, 2012.
9. Martins, J.R.R.A.; Hwang, J.T. Review and Unification of Methods for Computing Derivatives of Multidisciplinary Computational Models. AIAA J. 2013, 51, 2582–2599.
10. Baydin, A.G.; Pearlmutter, B.A.; Radul, A.A.; Siskind, J.M. Automatic Differentiation in Machine Learning: A Survey. J. Mach. Learn. Res. 2018, 18, 5595–5637.
11. Kim, J.E. Multidual Complex Numbers and the Hyperholomorphicity of Multidual Complex-Valued Functions. Axioms 2025, 14, 683.
12. Neidinger, R.D. Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming. SIAM Rev. 2010, 52, 545–563.
13. Fike, J.A.; Alonso, J.J. The Development of Hyper–Dual Numbers for Exact Second–Derivative Calculations. In Proceedings of the 49th AIAA Aerospace Sciences Meeting, Orlando, FL, USA, 4–7 January 2011. AIAA-2011-886.
14. Rehner, P.; Bauer, G. Application of Generalized (Hyper–) Dual Numbers in Thermodynamics. Front. Chem. Eng. 2021, 3, 758090.
15. Pearlmutter, B.A. Fast Exact Multiplication by the Hessian. Neural Comput. 1994, 6, 147–160.
16. Fike, J.A. Derivative Calculations Using Hyper-Dual Numbers; No. SAND2016-5252PE; Sandia National Lab.(SNL-NM): Albuquerque, NM, USA, 2016.
17. Smola, F. Mechanizing Hyperdual Numbers in Isabelle/HOL. Master’s Thesis, University of Edinburgh, Edinburgh, UK, 2020.
18. Martins, J.R.R.A.; Ning, A. Engineering Design Optimization; Cambridge University Press: Cambridge, UK, 2021.

Figure 1. Mean relative error of second derivatives versus step size h under additive noise. FD and CS show the characteristic U-shaped sensitivity with severe blow-up for small h, whereas hyper-dual (HD) estimates remain nearly h-invariant and scale primarily with $σ$. (a) $σ = 10^{- 8}$. (b) $σ = 10^{- 6}$.


Figure 2. Wall-clock time to assemble a dense Hessian versus dimension n. Solid lines: expensive objective (simulation-dominated); dashed lines: cheap objective (arithmetic-dominated). Hyper-dual methods exhibit a crossover $n_{★}$ beyond which reduced evaluation count dominates.


Figure 3. Heat-map of (FD calls)/(hyper-dual calls) vs. dimension and desired Hessian density.

Table 1. Overview of numerical examples in Section 5. Each example is chosen to highlight a specific aspect of accuracy, robustness, or scalability under the assumptions in Section 2.3.

Example Class	Dimension n	Analytic Hessian?	Purpose
Analytic scalar/bivariate tests	$1, 2$	Yes	Validate exact coefficient readout; step invariance
Rosenbrock/Beale/Himmelblau	2	Yes	Coupling, nonlinearity, benchmark comparability
Quadratic-oscillatory high-n	10–200	Yes	Scaling and ill-conditioning sensitivity
Stiff solver/structured residual	problem-dependent	Partial/reference	Real-path branching and black-box robustness
Noise & step-sensitivity sweeps	1 (representative)	Yes	Quantify FD/CS U-shape vs. HD stability

Table 2. Noise robustness and step-size sensitivity for second-derivative estimation. We report the mean relative error ($\bar{RelErr}$) and standard deviation ($Std$) over $N_{MC}$ Monte Carlo trials for the scalar test $f (x) = exp (x) sin x$ at $x = x_{0}$. Additive noise is modeled by $\tilde{f} (x) = f (x) + η$, $η \sim N (0, σ^{2})$. FD uses the central second-difference formula, CS uses a standard second-derivative complex-step identity (which reintroduces differencing), and HD reads the $ε_{1} ε_{2}$ coefficient from a single hyper-dual evaluation.


Noise level $σ$	Step $h$ (sweep)	FD (central 2nd) $\bar{RelErr}$	FD (central 2nd) $Std$	CS (2nd) $\bar{RelErr}$	CS (2nd) $Std$	HD (coef-based) $\bar{RelErr}$	HD (coef-based) $Std$
$10^{- 8}$	$10^{- 1}$	$3.2 \times 10^{- 4}$	$1.1 \times 10^{- 6}$	$8.4 \times 10^{- 5}$	$3.0 \times 10^{- 6}$	$1.1 \times 10^{- 8}$	$0.7 \times 10^{- 8}$
	$10^{- 3}$	$1.6 \times 10^{- 2}$	$1.0 \times 10^{- 2}$	$5.3 \times 10^{- 3}$	$3.4 \times 10^{- 3}$	$1.2 \times 10^{- 8}$	$0.8 \times 10^{- 8}$
	$10^{- 6}$	$1.2 \times 10^{3}$	$9.0 \times 10^{2}$	$4.1 \times 10^{2}$	$3.0 \times 10^{2}$	$1.2 \times 10^{- 8}$	$0.8 \times 10^{- 8}$
$10^{- 6}$	$10^{- 1}$	$4.4 \times 10^{- 4}$	$1.0 \times 10^{- 4}$	$1.3 \times 10^{- 4}$	$4.1 \times 10^{- 5}$	$1.2 \times 10^{- 6}$	$0.8 \times 10^{- 6}$
	$10^{- 3}$	$8.1 \times 10^{- 1}$	$6.2 \times 10^{- 1}$	$3.1 \times 10^{- 1}$	$2.2 \times 10^{- 1}$	$1.2 \times 10^{- 6}$	$0.8 \times 10^{- 6}$
	$10^{- 6}$	$2.0 \times 10^{5}$	$1.5 \times 10^{5}$	$6.2 \times 10^{4}$	$4.4 \times 10^{4}$	$1.3 \times 10^{- 6}$	$0.9 \times 10^{- 6}$

Protocol. $N_{MC} = 200$ trials, pseudo-random seed fixed to 2025, $x_{0} = 0.7$, and relative error defined as $RelErr = | {\hat{f}}^{″} - f^{″} | / max (1, | f^{″} |)$. The data illustrate the characteristic U-shaped FD/CS sensitivity and the near h-invariance of hyper-dual coefficients under noise, consistent with the stability discussion in Section 5.4.

Table 3. Hessian accuracy versus conditioning and precision for FD, CS, and HD. Entries report the relative Frobenius error $∥ H_{approx} - H_{exact} ∥_{F} / {∥ H_{exact} ∥}_{F}$ for a quadratic-oscillatory benchmark with varying condition number $κ (Q)$.


$κ (Q)$	FD float32	FD float64	CS float32	CS float64	HD float32	HD float64
$10^{2}$	$3.5 \times 10^{- 3}$	$7.0 \times 10^{- 5}$	$8.0 \times 10^{- 4}$	$1.5 \times 10^{- 5}$	$4.0 \times 10^{- 7}$	$2.0 \times 10^{- 14}$
$10^{4}$	$1.2 \times 10^{- 2}$	$4.0 \times 10^{- 4}$	$3.5 \times 10^{- 3}$	$9.0 \times 10^{- 5}$	$6.0 \times 10^{- 7}$	$4.0 \times 10^{- 14}$
$10^{6}$	$4.5 \times 10^{- 2}$	$1.7 \times 10^{- 3}$	$1.2 \times 10^{- 2}$	$3.8 \times 10^{- 4}$	$1.1 \times 10^{- 6}$	$1.0 \times 10^{- 13}$
$10^{8}$	$1.8 \times 10^{- 1}$	$7.0 \times 10^{- 3}$	$4.0 \times 10^{- 2}$	$1.6 \times 10^{- 3}$	$3.0 \times 10^{- 6}$	$3.0 \times 10^{- 13}$

Table 4. Arithmetic/memory footprint for hyper-dual scalars (dense).

Quantity	Real	Hyper-Dual
Scalar storage	1	4
Addition flops	1	4 (componentwise)
Multiplication flops	1	≈9 mult. + 5 add.
Array of size N	N reals	$4 N$ reals

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

