xjb: Fast Float to String Algorithm

Xiang, Junbo; Wang, Tiejun

doi:10.3390/computers15050280

by Junbo Xiang and Tiejun Wang *

School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China

* Author to whom correspondence should be addressed.

Computers 2026, 15(5), 280; https://doi.org/10.3390/computers15050280

Submission received: 29 March 2026 / Revised: 20 April 2026 / Accepted: 24 April 2026 / Published: 27 April 2026

(This article belongs to the Special Issue Computational Science and Its Applications 2025 (ICCSA 2025))

Abstract

Efficiently and accurately converting floating-point numbers to decimal strings remains a fundamental challenge in numerical computation, data serialization, and human–computer interaction. While modern algorithms such as Ryū, Dragonbox, and Schubfach rigorously satisfy the Steele–White criteria for correctness and minimal output length, their performance is frequently constrained by branch mispredictions, high-precision multiplication overhead, and suboptimal utilization of instruction-level parallelism. This paper introduces xjb, a novel floating-point–string conversion algorithm derived from Schubfach that systematically overcomes these bottlenecks. By restructuring the core computation to reduce instruction dependencies, adopting branchless decision logic, and exploiting SIMD instruction sets for decimal-to-ASCII formatting, xjb delivers state-of-the-art throughput across diverse hardware platforms. The algorithm requires only a single 64-by-128-bit multiplication for IEEE 754 binary64 conversions and a single 64-by-64-bit multiplication for binary32, drastically decreasing arithmetic complexity. Extensive benchmarking on AMD R7-7840H and Apple M1/M5 processors demonstrates that xjb consistently outperforms leading contemporary implementations. Notably, on the Apple M5, xjb achieves speedups of approximately 20% and 136% for binary64 and binary32 conversions, respectively, when compared to the highly optimized zmij library. The algorithm is fully compliant with the Steele–White principle; exhaustive validation over the entire binary32 space and extensive random testing across the binary64 range confirm both its theoretical soundness and practical robustness.

Keywords: floating point; printing; algorithm; performance; SIMD; branchless

1. Introduction

Floating-point–decimal string conversion is a fundamental operation in computer systems, with widespread applications across numerous domains. From scientific computing and financial systems to web services and database management, the ability to efficiently and accurately convert binary floating-point representations into human-readable decimal strings is essential. Despite its apparent simplicity, this conversion problem presents significant challenges in balancing the competing demands of correctness, performance, and output compactness.

1.1. Background and Motivation

In 1990, Steele and White [1,2] established the foundational principles for optimal floating-point printing algorithms, now widely known as the Steele–White (SW) principle. The SW principle comprises four key requirements:

Information Preservation: The printed result must be parsable back to the original floating-point number without loss of precision.
Minimum Length: The output string should be as short as possible while maintaining information preservation.
Correct Rounding: When multiple representations satisfy the first two criteria, the algorithm must correctly round to the nearest value, with ties broken by selecting the even value.
Left-to-Right Generation: The output digits should be generated sequentially from the most significant to the least significant digit.

The SW principle ensures that floating-point numbers have a unique, well-defined decimal representation that is both human-readable and machine-parsable. Algorithms satisfying the SW principle guarantee that the conversion process is reversible and produces the shortest possible output, which is crucial for data exchange, serialization, and user interface display. Over the past three decades, significant research efforts have been devoted to developing efficient algorithms. Early approaches, such as Dragon4, provided correct results but suffered from performance limitations due to arbitrary-precision arithmetic, and they were later improved by dtoa.c [3]. Grisu3 [4] pioneered the use of precomputed powers of ten to avoid expensive operations, although it occasionally fell back to slower methods. Errol [5] reduced the fallback rate through more precise error analysis. Ryū [6,7] established a new performance baseline through careful instruction scheduling and lookup table optimizations. Schubfach [8] introduced a compact and elegant solution based on the pigeonhole principle, while Grisu-Exact [9] eliminated fallbacks entirely. Dragonbox [10] reduced the number of multiplications, at the cost of more branches; yy_double [11] and yy_json [12] explored alternative computational strategies to minimize multiplication costs, but they still retained a few branches; uscale [13,14,15], proposed by Russ Cox, enhances floating-point printing performance in the Go programming language. Finally, zmij [16] builds upon yy_double with extensive code-level optimizations.

Despite these advances, existing algorithms still face several performance challenges that limit their effectiveness in high-throughput scenarios:

Branch Prediction Penalties: Many algorithms rely heavily on conditional branches to handle different cases, leading to frequent branch mispredictions on modern pipelined processors.
High-Precision Multiplication Overhead: The conversion process requires high-precision arithmetic operations, particularly multiplications involving large precomputed constants, which can be expensive on standard hardware.
Instruction Dependency Chains: Sequential dependencies between operations limit instruction-level parallelism and prevent efficient utilization of modern superscalar processors.
Limited SIMD Utilization: Most existing algorithms do not exploit vector instruction sets (SIMD) that are now ubiquitous in contemporary processors.

Despite the progress made by prior works, a critical research gap remains: no existing algorithm holistically addresses the four fundamental performance bottlenecks—branch mispredictions, high-precision multiplication overhead, instruction dependency chains, and underutilization of SIMD capabilities. For example,

Schubfach [8] offers an elegant approach but suffers from suboptimal performance due to unoptimized computation flow.
Dragonbox [10] reduces multiplications at the cost of increased branches, trading off one bottleneck for another.
yy_double [11] and yy_json [12] are highly optimized but do not use SIMD instructions.
zmij [16] provides competitive performance but still leaves room for improvement in instruction dependency reduction and branch optimization.

This gap necessitates a new approach that systematically integrates multiple optimization strategies within a cohesive framework, rather than trading off one bottleneck for another.

1.2. Contributions

This paper presents xjb, a novel floating-point–string conversion algorithm that achieves superior performance through systematic optimization of the underlying computational structure. The xjb algorithm is derived from the Schubfach algorithm and incorporates insights from yy_double and Dragonbox, but it introduces several key innovations that directly address the limitations of existing frameworks:

Reduced Instruction Dependencies: Unlike other algorithms that suffer from sequential dependencies, xjb restructures the computation by decomposing d (introduced in Section 3.2) into $d // 10$ and $d \% 10$ instead of computing d directly. This minimizes data dependencies between operations, enabling better instruction-level parallelism and improved pipeline utilization on modern superscalar processors.
Minimized Multiplication Operations: Building on insights from yy_double, but without the trade-off of increased branches, xjb reduces the number of expensive high-precision multiplications required during conversion, significantly decreasing the computational overhead while maintaining branch efficiency. For IEEE 754 binary64, only one 64-bit by 128-bit multiplication is required, and for IEEE 754 binary32, only one 64-bit by 64-bit multiplication is needed.
Mitigated Branch Prediction Penalties: Through branchless programming techniques and careful case analysis, xjb addresses the branch prediction problem that plagues algorithms like Dragonbox. All branches in xjb are designed as unlikely branches, and the core conversion of normal floating-point numbers is completely branch-free, minimizing conditional branches that could lead to prediction failures.
SIMD Instruction Utilization: Unlike most existing algorithms that neglect SIMD potential, xjb is designed from the ground up to leverage SIMD instructions (NEON for ARM64, AVX512/SSE4.1/SSE2 for x86-64) for efficient decimal-to-ASCII conversion, fully exploiting the vector processing capabilities of contemporary processors.
Concise Core Implementation: Despite its sophisticated optimizations, xjb maintains a compact and readable core implementation, facilitating adoption and maintenance.

These innovations work synergistically to address all key performance bottlenecks simultaneously, filling the research gap identified in the previous section. The xjb algorithm supports both IEEE 754 single-precision (binary32) and double-precision (binary64) floating-point formats, which are the most widely used floating-point representations in modern computing. For simplicity, this paper uses float to refer to IEEE 754 binary32 and double to refer to IEEE 754 binary64.

1.3. Evaluation Overview

We conducted extensive benchmarking of xjb across diverse hardware platforms, including AMD R7-7840H and Apple M1/M5 processors. Our evaluation demonstrates that xjb outperforms state-of-the-art algorithms in most scenarios while maintaining full compliance with the SW principle. The algorithm exhibits excellent portability and scalability, making it suitable for deployment across a wide range of systems, from embedded devices to high-performance servers.

The remainder of this paper is organized as follows: Section 2 reviews the IEEE 754 floating-point representation and establishes the mathematical foundation for the conversion problem. Section 3 elucidates the core principles and derivation of the xjb algorithm and describes the implementation details and optimizations. Section 4 presents experimental results comparing xjb against existing algorithms. To conclude, Section 5 offers a comprehensive summary of the paper.

1.4. Explanation of Special Symbols in This Article

Table 1 explains the special symbols used in the formulae of this article.

2. IEEE 754 Floating-Point Number Representation

This section establishes the mathematical foundation for the representation of floating-point numbers and defines the notation used throughout this paper. We focus on the IEEE 754 standard, which is the most widely adopted floating-point arithmetic standard in modern computing systems.

2.1. Scope and Assumptions

For clarity of presentation, we make the following simplifying assumptions:

We consider only positive floating-point numbers, as negative numbers differ only by a leading minus sign.
We exclude special values (zero, NaN, and infinity) from our analysis, since these are handled separately in practice.

These assumptions are standard in the literature and do not affect the generality of our algorithm, as excluded cases can be handled with straightforward special-case logic.

2.2. Binary Representation

The IEEE 754 standard [17,18] defines two primary floating-point formats relevant to this work:

Double Precision (binary64): A 64-bit format consisting of the following:

One sign bit (s): indicates positive ( $s = 0$ ) or negative ( $s = 1$ ).
Eleven exponent bits (e): biased exponent in the range $[0, 2047]$ .
Fifty-two fraction bits (f): significand fraction in the range $[0, 2^{52} - 1]$ .

Single Precision (binary32): A 32-bit format consisting of the following:

One sign bit (s): indicates positive ( $s = 0$ ) or negative ( $s = 1$ ).
Eight exponent bits (e): biased exponent in the range $[0, 255]$ .
Twenty-three fraction bits (f): significand fraction in the range $[0, 2^{23} - 1]$ .

2.3. Classification of Floating-Point Numbers

We classify floating-point numbers into three categories based on their exponent and fraction fields:

Subnormal Numbers ( $e = 0$ and $f \neq 0$ ): These represent very small values close to zero, where the implicit leading bit of the significand is 0 instead of 1.
Normal Numbers ( $e \neq 0$ and $f \neq 0$ ): The standard case, where the implicit leading bit of the significand is 1.
Irregular Numbers ( $e \neq 0$ and $f = 0$ ): Numbers with zero fraction field, representing powers of two.

We use the term regular to refer to both subnormal and normal numbers (i.e., all cases where $f \neq 0$). Unless otherwise specified, this article discusses only regular values, that is, non-irregular values.

2.4. Value Representation

The real value v of a positive floating-point number can be expressed in the unified form $v = c \cdot 2^{q}$, where c is the integer significand and q is the exponent. The general formula covering all cases is as follows:

\begin{matrix} \text{double}: & v = (f + (e \neq 0 \,?\, 2^{52} : 0)) \cdot 2^{\max(e, 1) - 1023 - 52} = c \cdot 2^{q} \\ \text{float}: & v = (f + (e \neq 0 \,?\, 2^{23} : 0)) \cdot 2^{\max(e, 1) - 127 - 23} = c \cdot 2^{q} \end{matrix}

(1)

For each category, the values decompose as follows:

Subnormal Numbers ($e = 0$, $f \neq 0$):

\text{subnormal}: \begin{cases} \text{double}: & v = f \cdot 2^{- 1074} \\ \text{float}: & v = f \cdot 2^{- 149} \end{cases}

(2)

Normal Numbers ($e \neq 0$, $f \neq 0$):

\text{normal}: \begin{cases} \text{double}: & v = (f + 2^{52}) \cdot 2^{e - 1075} \\ \text{float}: & v = (f + 2^{23}) \cdot 2^{e - 150} \end{cases}

(3)

Irregular Numbers ($e \neq 0$, $f = 0$):

\text{irregular}: \begin{cases} \text{double}: & v = 2^{52} \cdot 2^{e - 1075} \\ \text{float}: & v = 2^{23} \cdot 2^{e - 150} \end{cases}

(4)
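As a minimal sketch (not taken from the xjb sources), Equation (1) can be checked in Python by unpacking the raw bits of a binary64 value. The helper name `decode_double` is an assumption introduced here for illustration only, and the sketch covers positive finite inputs.

```python
import struct
from fractions import Fraction

def decode_double(x: float):
    """Split a positive finite binary64 value into (c, q) with x == c * 2**q."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    e = (bits >> 52) & 0x7FF                 # 11-bit biased exponent
    f = bits & ((1 << 52) - 1)               # 52-bit fraction field
    c = f + ((1 << 52) if e != 0 else 0)     # implicit leading bit for e != 0
    q = max(e, 1) - 1023 - 52                # exponent of Equation (1)
    return c, q

c, q = decode_double(1.3)
# Exact rational check: c * 2**q reproduces the stored value.
assert Fraction(c) * Fraction(2) ** q == Fraction(1.3)
```

The same decomposition applies to binary32 with a 23-bit fraction field, an 8-bit exponent, and the constants 127 and 23.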

2.5. Rounding Interval

A critical concept for accurate floating-point printing is the rounding interval $R_{v}$, which defines the range of real numbers that round to the given floating-point value v when parsed. The rounding interval is bounded by

\begin{matrix} v_{l} & = \begin{cases} (c - \frac{1}{2}) \cdot 2^{q} = v - 2^{q - 1} & \text{if } f \neq 0 \text{ or } e \leq 1 \\ (c - \frac{1}{4}) \cdot 2^{q} = v - 2^{q - 2} & \text{if } f = 0 \text{ and } e > 1 \end{cases} \\ v_{r} & = (c + \frac{1}{2}) \cdot 2^{q} = v + 2^{q - 1} \\ R_{v} & = \begin{cases} [v_{l}, v_{r}] & \text{if } f \bmod 2 = 0 \text{ (even significand)} \\ (v_{l}, v_{r}) & \text{if } f \bmod 2 = 1 \text{ (odd significand)} \end{cases} \end{matrix}

(5)

The rounding radius for regular floating-point numbers is $2^{q - 1} = v_{r} - v$. The distinction between closed and open intervals at the boundaries depends on the parity of the significand, ensuring correct rounding according to the round-to-even rule specified in the IEEE 754 standard. Any decimal number within $R_{v}$ will parse back to the original floating-point value v, which is essential for ensuring the information preservation property of the SW principle.
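The interval of Equation (5) can be computed exactly with rational arithmetic. The sketch below is illustrative only: the function name `rounding_interval` is an assumption, and the narrower lower gap is applied to powers of two above the smallest normal number, which is how the condition in Equation (5) is read here.

```python
import struct
from fractions import Fraction

def rounding_interval(x: float):
    """Return (v_l, v_r, closed) for a positive finite double, per Equation (5)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    c = f + ((1 << 52) if e != 0 else 0)
    q = max(e, 1) - 1075
    ulp = Fraction(2) ** q
    # Irregular values (f == 0, e > 1) sit next to a finer grid below them.
    lower_gap = ulp / 2 if (f != 0 or e <= 1) else ulp / 4
    v = c * ulp
    return v - lower_gap, v + ulp / 2, f % 2 == 0

v_l, v_r, closed = rounding_interval(1.3)
# The short decimal 1.3 lies strictly inside the interval of the double nearest 1.3.
assert v_l < Fraction(13, 10) < v_r
```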

Table 2 summarizes the valid ranges for c and q across different categories.

3. Algorithm Principles

This section presents the algorithmic principles and mathematical foundation of the xjb floating-point–string conversion algorithm. We first introduce the overall architecture and design goals, followed by the mathematical formulation of the conversion problem.

3.1. Design Overview

This paper focuses on converting float (single-precision) and double (double-precision) floating-point numbers to decimal strings. The conversion process consists of two stages:

Float-to-Decimal Conversion: Converting binary floating-point values to decimal significand–exponent pairs $(d, k)$ .
Decimal-to-String Conversion: Formatting $(d, k)$ into human-readable strings.

Table 3 maps each limitation identified in Section 1.1 to the corresponding optimization method and the section in which it is described.

3.2. Mathematical Foundation

Before presenting the algorithm details, we establish the mathematical framework for the float-to-decimal conversion problem.

Recall from Section 2 that any floating-point value v can be expressed in the form $v = c \cdot 2^{q}$, where c is the integer significand and q is the exponent. Our goal is to find the optimal decimal representation $\mathit{opt} = d \cdot 10^{k}$ that satisfies the SW principle.

As established in Section 2, regular floating-point numbers (which include all normal and subnormal numbers with non-zero fraction fields) account for the vast majority of possible floating-point values. For the purposes of algorithm derivation, we focus primarily on regular numbers, as special cases can be handled with minimal additional logic.

The valid ranges for the significand c and exponent q for regular floating-point numbers are as follows:

\begin{matrix} \text{float}: & \begin{cases} 1 \leq c \leq 2^{24} - 1,\ c \neq 2^{23}, & q = - 149 \\ 2^{23} + 1 \leq c \leq 2^{24} - 1, & - 148 \leq q \leq 104 \end{cases} \\ \text{double}: & \begin{cases} 1 \leq c \leq 2^{53} - 1,\ c \neq 2^{52}, & q = - 1074 \\ 2^{52} + 1 \leq c \leq 2^{53} - 1, & - 1073 \leq q \leq 971 \end{cases} \end{matrix}

(6)

For irregular floating-point numbers (powers of two), the ranges are

\begin{matrix} float : & c = 2^{23}, - 149 \leq q \leq 104 \\ double : & c = 2^{52}, - 1074 \leq q \leq 971 \end{matrix}

(7)

For subnormal numbers,

\begin{matrix} float : & c \leq 2^{23} - 1, q = - 149 \\ double : & c \leq 2^{52} - 1, q = - 1074 \end{matrix}

(8)

The conversion problem can now be formally stated as follows: given a floating-point value $v = c \cdot 2^{q}$, find the optimal decimal representation $\mathit{opt} = d \cdot 10^{k}$ such that

\begin{matrix} v = c \cdot 2^{q} & \to \mathit{opt} = d \cdot 10^{k} \\ \text{subject to}: & \mathit{opt} \in R_{v},\ d \in \mathbb{Z}^{+},\ k \in \mathbb{Z} \end{matrix}

(9)

where $R_{v}$ is the rounding interval of v, as defined in Section 2.

For example, consider the IEEE 754 binary64 floating-point number representing 1.3. Its actual value is 1.3000000000000000444089209850062616169452667236328125, with the hexadecimal representation 3ff4cccccccccccd. The optimal decimal representation $\mathit{opt}$ satisfying the SW principle is simply 1.3. Similarly, for the binary32 representation of 1.3, with an actual value of 1.2999999523162841796875 and a hexadecimal representation of 3fa66666, the optimal representation is also 1.3.
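The values quoted in this example can be reproduced with the Python standard library. This is only a sanity check of the example, not part of xjb; `Decimal(x)` converts the stored binary value exactly, and `repr` yields a shortest round-trip string. The variable names are chosen here for illustration.

```python
import struct
from decimal import Decimal

x64 = 1.3
hex64 = struct.pack(">d", x64).hex()     # raw binary64 bit pattern
exact64 = str(Decimal(x64))              # exact value stored in the double
shortest64 = repr(x64)                   # shortest round-trip string

# Round 1.3 to binary32 by packing and unpacking it as a 4-byte float.
x32 = struct.unpack(">f", struct.pack(">f", 1.3))[0]
hex32 = struct.pack(">f", x32).hex()     # raw binary32 bit pattern
exact32 = str(Decimal(x32))              # exact value stored in the float

print(hex64, exact64, shortest64)
print(hex32, exact32)
```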

3.3. Overview of the Schubfach Algorithm and Derivation of Our Method

This section reviews the Schubfach algorithm and presents our derivation of an optimized variant. We begin by establishing the mathematical foundation for determining the optimal decimal representation.

3.3.1. Candidate Values for the Significand d

The Schubfach algorithm identifies four candidate values for the decimal significand d:

d \in \{10 ⌊ v \cdot 10^{- k - 1} ⌋,\ ⌊ 10 v \cdot 10^{- k - 1} ⌋,\ ⌊ 10 v \cdot 10^{- k - 1} ⌋ + 1,\ 10 ⌊ v \cdot 10^{- k - 1} ⌋ + 10\}

(10)

The exponent k is computed as follows:

k = \begin{cases} ⌊ q \cdot \log_{10} (2) ⌋ & \text{if } v \text{ is regular} \\ ⌊ q \cdot \log_{10} (2) - \log_{10} (4 / 3) ⌋ & \text{otherwise} \end{cases}

(11)

For efficient computation on modern processors, Equation (11) can be implemented using integer arithmetic:

\begin{matrix} \text{double}: & k = (q \cdot 315653 - (v \text{ is regular} \,?\, 0 : 131072)) \gg 20 \\ \text{float}: & k = (q \cdot 1233 - (v \text{ is regular} \,?\, 0 : 512)) \gg 12 \end{matrix}

(12)
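For regular values, the integer form in Equation (12) can be compared against an exact evaluation of $⌊ q \cdot \log_{10} (2) ⌋$ over the full range of q. The sketch below is an illustrative check, with helper names invented here; it obtains the exact floor from the decimal length of $2^{|q|}$, so no floating-point logarithm is involved. The irregular branch of Equation (12) is not exercised.

```python
def floor_log10_pow2_exact(q: int) -> int:
    """Exact floor(q * log10(2)), from the number of decimal digits of 2**|q|."""
    if q >= 0:
        return len(str(2 ** q)) - 1
    # For q < 0, q * log10(2) is never an integer, so the floor is -(digit count).
    return -len(str(2 ** -q))

def k_double(q: int) -> int:
    return (q * 315653) >> 20    # Python's >> rounds toward negative infinity

def k_float(q: int) -> int:
    return (q * 1233) >> 12

mismatch_double = [q for q in range(-1074, 972)
                   if k_double(q) != floor_log10_pow2_exact(q)]
mismatch_float = [q for q in range(-149, 105)
                  if k_float(q) != floor_log10_pow2_exact(q)]
print(mismatch_double, mismatch_float)
```

Both lists are expected to be empty if the constants are valid on the stated ranges. In C or C++ the shift must be an arithmetic right shift on a signed type to match this behavior for negative q.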

3.3.2. Decomposition into Integer and Fractional Parts

We decompose $v \cdot 10^{- k - 1}$ into its integer component $⌊ v \cdot 10^{- k - 1} ⌋$ (floor function) and fractional component $v \cdot 10^{- k - 1} - ⌊ v \cdot 10^{- k - 1} ⌋$. Let $m = ⌊ v \cdot 10^{- k - 1} ⌋$ denote the integer part and $n = v \cdot 10^{- k - 1} - m$ the fractional part, where $0 \leq n < 1$. Substituting into Equation (10),

d \in \{10 m,\ ⌊ 10 (m + n) ⌋,\ ⌊ 10 (m + n) ⌋ + 1,\ 10 m + 10\}

(13)

Since $⌊ 10 (m + n) ⌋ = 10 m + ⌊ 10 n ⌋$, the candidates simplify to

d \in \{10 m,\ 10 m + ⌊ 10 n ⌋,\ 10 m + ⌊ 10 n ⌋ + 1,\ 10 m + 10\}

(14)

Based on the Schubfach algorithm and the SW principle, if the value $10 m \cdot 10^{k}$ or $(10 m + 10) \cdot 10^{k}$ falls within the range $R_{v}$, the corresponding candidate $10 m$ or $10 m + 10$ is selected as the optimal solution d. In cases where neither lies within $R_{v}$, the optimal value d is determined as either $10 m + ⌊ 10 n ⌋$ or $10 m + ⌊ 10 n ⌋ + 1$ in accordance with the rules of correct rounding, as shown in Equation (15). We decompose d as $d = \mathit{ten} + \mathit{one}$, where $\mathit{ten} = 10 m$ and $\mathit{one} \in \{0, ⌊ 10 n ⌋, ⌊ 10 n ⌋ + 1, 10\}$. The problem thus reduces to determining the appropriate value of $\mathit{one}$.

\mathit{one} = \begin{cases} 0 & \text{if } 10 m \cdot 10^{k} \in R_{v} \\ 10 & \text{else if } (10 m + 10) \cdot 10^{k} \in R_{v} \\ ⌊ 10 n ⌋ \text{ or } ⌊ 10 n ⌋ + 1 & \text{otherwise, by correct rounding} \end{cases}

(15)

3.3.3. Selection Criteria for $\mathit{one}$

The selection of $\mathit{one}$ depends on the relationship between the rounding interval and the candidate values. Recall that the rounding interval for a regular floating-point number $v = c \cdot 2^{q}$ is $R_{v} = [v - 2^{q - 1}, v + 2^{q - 1}]$ when $f \bmod 2 = 0$ and $R_{v} = (v - 2^{q - 1}, v + 2^{q - 1})$ otherwise, where the rounding radius $2^{q - 1}$ is half a unit in the last place (ulp) of v.

Case $\mathit{one} = 0$ (i.e., $d = 10 m$): This case applies when $10 m \cdot 10^{k}$ falls inside the rounding interval $R_{v}$. The condition is derived as follows:
The lower bound of the rounding interval must be less than $10 m \cdot 10^{k}$:

$\begin{matrix} c \cdot 2^{q} - 10 m \cdot 10^{k} & < 2^{q - 1} \\ c \cdot 2^{q} - ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \cdot 10^{k + 1} & < 2^{q - 1} \\ c \cdot 2^{q} \cdot 10^{- k - 1} - ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ & < 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \\ n & < 2^{q - 1} \cdot 10^{- k - 1} \end{matrix}$

(16)

When equality holds ($n = 2^{q - 1} \cdot 10^{- k - 1}$, equivalently $10 m \cdot 10^{k} = v - 2^{q - 1}$), the candidate lies exactly on the interval boundary, so we apply the round-to-even rule, requiring c to be even:

$\mathit{one} = 0 \text{ if } 2^{q - 1} \cdot 10^{- k - 1} > n \text{ or } (2^{q - 1} \cdot 10^{- k - 1} = n \text{ and } c \bmod 2 = 0)$

(17)
Case $\mathit{one} = 10$ (i.e., $d = 10 m + 10$): This case applies when $(10 m + 10) \cdot 10^{k}$ falls inside the rounding interval $R_{v}$. The condition is derived similarly:
The upper bound of the rounding interval must be greater than $(10 m + 10) \cdot 10^{k}$:

$\begin{matrix} (10 m + 10) \cdot 10^{k} - c \cdot 2^{q} & < 2^{q - 1} \\ ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \cdot 10^{k + 1} + 10^{k + 1} - c \cdot 2^{q} & < 2^{q - 1} \\ 1 - (c \cdot 2^{q} \cdot 10^{- k - 1} - ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋) & < 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \\ 1 - n & < 2^{q - 1} \cdot 10^{- k - 1} \end{matrix}$

(18)

When equality holds ($1 - n = 2^{q - 1} \cdot 10^{- k - 1}$, equivalently $(10 m + 10) \cdot 10^{k} = v + 2^{q - 1}$), we again apply round-to-even:

$\mathit{one} = 10 \text{ if } 2^{q - 1} \cdot 10^{- k - 1} > 1 - n \text{ or } (2^{q - 1} \cdot 10^{- k - 1} = 1 - n \text{ and } c \bmod 2 = 0)$

(19)
Case $\mathit{one} \in \{⌊ 10 n ⌋, ⌊ 10 n ⌋ + 1\}$: When neither boundary condition applies, the optimal value lies between $10 m + ⌊ 10 n ⌋$ and $10 m + ⌊ 10 n ⌋ + 1$. We determine $\mathit{one}$ by rounding $10 n$ to the nearest integer, where $\{10 n\} = 10 n - ⌊ 10 n ⌋$ denotes the fractional part of $10 n$:
– If $\{10 n\} < 0.5$: $\mathit{one} = ⌊ 10 n ⌋$;
– If $\{10 n\} > 0.5$: $\mathit{one} = ⌊ 10 n ⌋ + 1$;
– If $\{10 n\} = 0.5$: apply round-to-even.
For irregular floating-point numbers (powers of two), additional verification is required to ensure that the selected value lies within the rounding interval $R_{v}$, as the interval boundaries differ for these special cases.
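The three cases above can be written down directly with exact rational arithmetic. The following is a slow reference sketch for regular positive doubles, assuming Equations (12), (17), and (19) as stated. It is not the optimized xjb implementation, and the function name `schubfach_reference` is an assumption made for this illustration.

```python
import math
import struct
from fractions import Fraction

def schubfach_reference(x: float):
    """Return (d, k) with d * 10**k the selected decimal (regular doubles only)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    e, f = (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
    if f == 0 or e == 0x7FF:
        raise ValueError("this sketch handles only regular (f != 0) finite values")
    c = f + ((1 << 52) if e != 0 else 0)
    q = max(e, 1) - 1075
    k = (q * 315653) >> 20                                   # Equation (12)
    scaled = Fraction(c) * Fraction(2) ** q * Fraction(10) ** (-k - 1)
    m = math.floor(scaled)                                   # integer part
    n = scaled - m                                           # fractional part, 0 <= n < 1
    half = Fraction(2) ** (q - 1) * Fraction(10) ** (-k - 1) # scaled rounding radius
    even = c % 2 == 0
    # Default choice: round 10n to the nearest integer, ties to even.
    low = math.floor(10 * n)
    delta = 10 * n - low
    if delta > Fraction(1, 2) or (delta == Fraction(1, 2) and low % 2 == 1):
        one = low + 1
    else:
        one = low
    # The shorter candidates 10m and 10m + 10 take priority (Equations (17), (19)).
    if n < half or (n == half and even):
        one = 0
    if 1 - n < half or (1 - n == half and even):
        one = 10
    return 10 * m + one, k

d, k = schubfach_reference(1.3)
print(d, k)   # d may carry trailing zeros; the formatting stage strips them
```

Because every quantity is an exact `Fraction`, this sketch is useful mainly as an oracle for testing a fast implementation, for example by checking that `float(f"{d}e{k}")` reproduces the input.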

3.3.4. Algorithm Overview

Algorithm 1 summarizes our optimized variant of the Schubfach algorithm (xjb32 and xjb64 for float and double, respectively). Given inputs c and q, the algorithm returns d and k such that $d \cdot 10^{k}$ satisfies the SW principle. The computation of k follows Equation (12); the remainder of this chapter focuses on efficient computation of d for regular floating-point numbers.

We will provide a detailed introduction to the efficient implementation of Algorithm 1 in the following sections:

Lookup table precomputation;
Efficient computation of m;
Fast boundary condition testing for $o n e \in {0, 10}$ ;
Efficient computation of $⌊ 10 n ⌋$ and rounding;
Handling of irregular floating-point numbers;
Implementation of pseudocode.

In the last section of this chapter, we will briefly introduce the second stage: decimal-to-string conversion.

In essence, xjb diverges from the baseline Schubfach by transforming the conditional boundary tests (which cause branch mispredictions) into integer arithmetic and lookup operations. Furthermore, it reorders the computation of m and n to reduce data dependencies, directly addressing the instruction-level parallelism limitation identified in Section 1.1.

3.4. Lookup Table Precomputation

The algorithm in this paper employs a lookup table to store precomputed values of $10^{- k - 1}$ for different ranges of q: $[- 149, 104]$ for float and $[- 1074, 971]$ for double. These lookup tables use extended precision: 64-bit for float and 128-bit for double. The reference implementation is available in gen.py: https://github.com/xjb714/xjb/blob/main/py_test/gen.py (accessed on 20 April 2026).

3.4.1. Fundamental Calculation

Let B denote the bit length of each entry in the lookup table, with $B = 64$ for float and $B = 128$ for double. For any integer $e_{10}$ (representing a power of 10), we aim to represent $10^{e_{10}}$ in the form $f \cdot 2^{⌊ e_{2} ⌋}$, where $1 ⩽ f < 2$ and $e_{2}$ is a real number. In this subsection, f denotes this normalized factor, not the fraction field of Section 2. This gives

\begin{matrix} f \cdot 2^{⌊ e_{2} ⌋} = 2^{e_{2}} = 10^{e_{10}} \end{matrix}

(20)

Taking the logarithm base 2 of both sides, we get $e_{2} = e_{10} \cdot \log_{2} (10)$, and hence

\begin{matrix} ⌊ e_{2} ⌋ = ⌊ e_{10} \cdot \log_{2} (10) ⌋ \end{matrix}

(21)

Algorithm 1: The xjb Algorithm for Float-to-Decimal Conversion

Require: Floating-point components c (significand) and q (exponent)
Ensure: Decimal representation $d \cdot 10^{k}$ satisfying the SW principle
$v \leftarrow c \cdot 2^{q}$
if v is regular then
    $k \leftarrow ⌊ q \cdot \log_{10} (2) ⌋$
else
    $k \leftarrow ⌊ q \cdot \log_{10} (2) - \log_{10} (4 / 3) ⌋$
end if
$m \leftarrow ⌊ v \cdot 10^{- k - 1} ⌋$
$n \leftarrow v \cdot 10^{- k - 1} - m$
$\mathit{ten} \leftarrow 10 m$
$δ \leftarrow 10 n - ⌊ 10 n ⌋$ {fractional part of 10n}
if $δ = 0.5$ then
    if $⌊ 10 n ⌋ \bmod 2 = 0$ then
        $\mathit{one} \leftarrow ⌊ 10 n ⌋$ {round to even}
    else
        $\mathit{one} \leftarrow ⌊ 10 n ⌋ + 1$
    end if
else if $δ < 0.5$ then
    $\mathit{one} \leftarrow ⌊ 10 n ⌋$ {round to nearest}
else
    $\mathit{one} \leftarrow ⌊ 10 n ⌋ + 1$ {round to nearest}
end if
if v is irregular then
    if $δ > 2^{q - 2} \cdot 10^{- k}$ then
        $\mathit{one} \leftarrow ⌊ 10 n ⌋ + 1$
    end if
    if $2^{q - 2} \cdot 10^{- k - 1} \geq n$ then
        $\mathit{one} \leftarrow 0$
    end if
else
    if $2^{q - 1} \cdot 10^{- k - 1} > n$ or ($2^{q - 1} \cdot 10^{- k - 1} = n$ and $c \bmod 2 = 0$) then
        $\mathit{one} \leftarrow 0$ {minimum length}
    end if
    if $2^{q - 1} \cdot 10^{- k - 1} > 1 - n$ or ($2^{q - 1} \cdot 10^{- k - 1} = 1 - n$ and $c \bmod 2 = 0$) then
        $\mathit{one} \leftarrow 10$ {minimum length}
    end if
end if
$d \leftarrow \mathit{ten} + \mathit{one}$ {information preservation}
return $d, k$

Solving for f gives

\begin{matrix} f = \frac{10^{e_{10}}}{2^{⌊ e_{10} \cdot {log}_{2} (10) ⌋}} \end{matrix}

(22)

The lookup table entries are computed using upward rounding:

\begin{matrix} \mathit{lookup}[e_{10}] & = ⌈ f \cdot 2^{B - 1} ⌉ \\ & = ⌈ \frac{10^{e_{10}}}{2^{⌊ e_{10} \cdot \log_{2} (10) ⌋}} \cdot 2^{B - 1} ⌉ \\ & = ⌈ 10^{e_{10}} \cdot 2^{B - 1 - ⌊ e_{10} \cdot \log_{2} (10) ⌋} ⌉ \end{matrix}

(23)

Notably, $f \cdot 2^{B - 1}$ becomes an integer for certain ranges of $e_{10}$: $0 ⩽ e_{10} ⩽ 27$ for float and $0 ⩽ e_{10} ⩽ 55$ for double.
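Equation (23) can be evaluated exactly with Python integers and fractions. The sketch below is not the gen.py script referenced above. It is an independent illustration with hypothetical helper names that builds the float table for $- 32 ⩽ e_{10} ⩽ 44$ and lists the exponents whose entries need no rounding.

```python
import math
from fractions import Fraction

def floor_log2_pow10(e10: int) -> int:
    """Exact floor(e10 * log2(10)) obtained from integer bit lengths."""
    if e10 >= 0:
        return (10 ** e10).bit_length() - 1
    # For e10 < 0 the product is never an integer, so floor = -(bit length).
    return -(10 ** -e10).bit_length()

def table_entry(e10: int, B: int) -> int:
    """ceil(10**e10 * 2**(B - 1 - floor(e10 * log2(10)))), as in Equation (23)."""
    shift = B - 1 - floor_log2_pow10(e10)
    return math.ceil(Fraction(10) ** e10 * Fraction(2) ** shift)

def is_exact(e10: int, B: int) -> bool:
    """True when the scaled power of ten is already an integer (no rounding up)."""
    shift = B - 1 - floor_log2_pow10(e10)
    return (Fraction(10) ** e10 * Fraction(2) ** shift).denominator == 1

float_table = {e10: table_entry(e10, 64) for e10 in range(-32, 45)}
float_exact = [e10 for e10 in float_table if is_exact(e10, 64)]
print(len(float_table), float_exact[0], float_exact[-1])
```

Passing `B = 128` and the range $- 293 ⩽ e_{10} ⩽ 323$ gives the corresponding double table under the same assumptions.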

3.4.2. Detailed Calculation Process

The detailed calculation process is as follows:

Float
The range of $- k - 1$ is [−32, 44], which follows from the range of q in Equation (6), so the lookup table covers the powers $10^{- 32}$ through $10^{44}$. The calculation process is as follows:

$\begin{matrix} - 32 ⩽ e_{10} ⩽ 44 \\ e_{2} = |⌊ e_{10} \cdot \log_{2} (10) ⌋ - 63| \\ \mathit{pow10t} = \begin{cases} 2^{e_{2}} // 10^{|e_{10}|} & \text{if } e_{10} < 0 \\ 10^{|e_{10}|} // 2^{e_{2}} & \text{if } e_{10} ⩾ 20 \\ 10^{|e_{10}|} \cdot 2^{e_{2}} & \text{if } 1 ⩽ e_{10} ⩽ 19 \end{cases} \\ f_{1, e_{10}} = \mathit{pow10} = \mathit{pow10t} + (0 ⩽ e_{10} ⩽ 27 \,?\, 0 : 1) \end{matrix}$

(24)

When $0 ⩽ e_{10} ⩽ 27$, the table entry is exact, that is, $f_{1, e_{10}} \cdot 2^{⌊ e_{10} \cdot \log_{2} (10) ⌋ - 63}$ equals $10^{e_{10}}$. In all other cases, the relative error is less than $2^{- 63}$:

$\begin{matrix} r_{1, e_{10}} & = \frac{f_{1, e_{10}} \cdot 2^{⌊ e_{10} \cdot \log_{2} (10) ⌋ - 63}}{10^{e_{10}}} \\ & \in \begin{cases} \{1\} & \text{if } 0 ⩽ e_{10} ⩽ 27 \\ (1, 1 + 2^{- 63}) & \text{if } e_{10} < 0 \text{ or } e_{10} > 27 \end{cases} \end{matrix}$

(25)
Double
The range of $- k - 1$ is [−293, 323], which follows from the range of q in Equation (6), so the lookup table covers the powers $10^{- 293}$ through $10^{323}$. The calculation process is as follows:

$\begin{matrix} - 293 ⩽ e_{10} ⩽ 323 \\ e_{2} = |⌊ e_{10} \cdot \log_{2} (10) ⌋ - 127| \\ \mathit{pow10t} = \begin{cases} 2^{e_{2}} // 10^{|e_{10}|} & \text{if } e_{10} < 0 \\ 10^{|e_{10}|} // 2^{e_{2}} & \text{if } e_{10} ⩾ 39 \\ 10^{|e_{10}|} \cdot 2^{e_{2}} & \text{if } 1 ⩽ e_{10} ⩽ 38 \end{cases} \\ f_{2, e_{10}} = \mathit{pow10} = \mathit{pow10t} + (0 ⩽ e_{10} ⩽ 55 \,?\, 0 : 1) \end{matrix}$

(26)

When $0 ⩽ e_{10} ⩽ 55$, the table entry is exact, that is, $f_{2, e_{10}} \cdot 2^{⌊ e_{10} \cdot \log_{2} (10) ⌋ - 127}$ equals $10^{e_{10}}$. In all other cases, the relative error is less than $2^{- 127}$:

$\begin{matrix} r_{2, e_{10}} & = \frac{f_{2, e_{10}} \cdot 2^{⌊ e_{10} \cdot \log_{2} (10) ⌋ - 127}}{10^{e_{10}}} \\ & \in \begin{cases} \{1\} & \text{if } 0 ⩽ e_{10} ⩽ 55 \\ (1, 1 + 2^{- 127}) & \text{if } e_{10} < 0 \text{ or } e_{10} > 55 \end{cases} \end{matrix}$

(27)

Let $r_{1}$ denote the error for float lookup table entries, $r_{2}$ for double entries, and r for both. In Algorithm 1, we retrieve $10^{- k - 1}$ from the lookup table. The lookup table provides exact values when q falls within specific ranges:

\begin{matrix} float : & 0 ⩽ - k - 1 ⩽ 27 \Rightarrow - 93 ⩽ q ⩽ - 1 \\ double : & 0 ⩽ - k - 1 ⩽ 55 \Rightarrow - 186 ⩽ q ⩽ - 1 \end{matrix}

(28)

For q outside these ranges, the lookup table entries have bounded relative errors:

\begin{matrix} float : & 0 < r_{1} - 1 < 2^{- 63} \\ double : & 0 < r_{2} - 1 < 2^{- 127} \end{matrix}

(29)
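The q ranges in Equation (28) follow from the exact exponent windows $0 ⩽ - k - 1 ⩽ 27$ and $0 ⩽ - k - 1 ⩽ 55$. The short check below is an illustration for regular values, with function names invented here, that enumerates every q in the format's range and reports the smallest and largest q for which the table entry is exact.

```python
def k_exact(q: int) -> int:
    """Exact floor(q * log10(2)), from the number of decimal digits of 2**|q|."""
    return len(str(2 ** q)) - 1 if q >= 0 else -len(str(2 ** -q))

def exact_q_range(q_min: int, q_max: int, last_exact_power: int):
    """Smallest and largest q whose table index -k-1 lies in [0, last_exact_power]."""
    qs = [q for q in range(q_min, q_max + 1)
          if 0 <= -k_exact(q) - 1 <= last_exact_power]
    return min(qs), max(qs)

float_range = exact_q_range(-149, 104, 27)
double_range = exact_q_range(-1074, 971, 55)
print(float_range, double_range)
```

The two tuples should match the bounds stated in Equation (28).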

3.4.3. Storage Requirements

The float lookup table requires 616 bytes of storage, calculated as $(44 - (- 32) + 1) \times 8$ bytes (77 entries × 8 bytes each). The double lookup table requires 9872 bytes, calculated as $(323 - (- 293) + 1) \times 16$ bytes (617 entries × 16 bytes each).

3.4.4. Implementation Notes

The lookup table precomputation uses efficient integer arithmetic to avoid precision loss during calculations. The conditional logic in Equations (24) and (26) optimizes the computation based on the sign and magnitude of $e_{10}$, ensuring efficient generation of accurate lookup table entries.

3.5. Efficient Computation of m

This section presents an efficient method for calculating $m$ in Algorithm 1, which is defined as $m = ⌊ v \cdot 10^{- k - 1} ⌋$.

3.5.1. Key Proof

In Algorithm 1, we need to compute $m = ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋$. We aim to prove

\begin{matrix} m = ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} ⌋ \end{matrix}

(30)

where $r$ is the lookup table error defined in Equations (25) and (27). When the condition in Equation (28) is met, $r = 1$, and the equation holds trivially. For $r \neq 1$,

\begin{matrix} float : & 1 < r < 1 + 2^{- 63} \\ double : & 1 < r < 1 + 2^{- 127} \end{matrix}

(31)

Next, we determine the range of $2^{q} \cdot 10^{- k - 1}$. Substituting $k = ⌊ q \cdot lg (2) ⌋$, we get

\begin{matrix} 2^{q} \cdot 10^{- k - 1} = 10^{- 1} \cdot (2^{q} \cdot 10^{- ⌊ q \cdot lg (2) ⌋}) = 10^{- 1} \cdot (10^{q \cdot lg (2) - ⌊ q \cdot lg (2) ⌋}) \end{matrix}

(32)

When $q \neq 0$, $q \cdot lg (2)$ is not an integer, so the exponent in Equation (32) satisfies

\begin{matrix} q \cdot lg (2) \neq ⌊ q \cdot lg (2) ⌋ \\ 0 < q \cdot lg (2) - ⌊ q \cdot lg (2) ⌋ < 1 \end{matrix}

(33)

When $q = 0$, $q \cdot lg (2) - ⌊ q \cdot lg (2) ⌋ = 0$. In both cases the exponent in Equation (32) lies in $[0, 1)$, so the final conclusion is

\begin{matrix} 10^{- 1} ⩽ 2^{q} \cdot 10^{- k - 1} < 1 \end{matrix}

(34)

Consequently,

\begin{matrix} c \cdot 2^{q} \cdot 10^{- k - 1} = c \cdot \frac{2^{q - k - 1}}{5^{k + 1}} \in [0.1 c, c) \end{matrix}

(35)

Therefore,

\begin{matrix} c \cdot 2^{q} \cdot 10^{- k - 1} = \{\begin{matrix} \frac{c \cdot 2^{q - k - 1}}{5^{k + 1}}; q ⩾ 1 \\ \frac{c}{2^{1 + k - q} \cdot 5^{k + 1}} = \frac{c}{10}; q = 0 \\ \frac{c \cdot 5^{- k - 1}}{2^{1 + k - q}}; q < 0 \end{matrix} \end{matrix}

(36)

Suppose

\begin{matrix} c \cdot 2^{q} \cdot 10^{- k - 1} = c \cdot \frac{x}{y} < c \end{matrix}

(37)

Then, by Equation (36), $x$ and $y$ are given by

\begin{matrix} (x, y) = \{\begin{matrix} (2^{q - k - 1}, 5^{k + 1}); q ⩾ 1 \\ (1, 10); q = 0 \\ (5^{- k - 1}, 2^{1 + k - q}); q < 0 \end{matrix} \end{matrix}

(38)
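The range in Equation (34) and the integer pair $(x, y)$ in Equation (38) can be confirmed for every exponent with exact rational arithmetic. The sketch below is illustrative only: the helper names are hypothetical, $k = ⌊ q \cdot lg (2) ⌋$ is derived from decimal digit counts, and the binary64 exponent range $- 1074 ⩽ q ⩽ 971$ is an assumption not stated in this section.

```python
from fractions import Fraction

def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)), obtained from decimal digit counts."""
    if q >= 0:
        return len(str(2 ** q)) - 1
    # 2**|q| is never a power of ten, so ceil(lg) equals the digit count.
    return -len(str(2 ** -q))

def xy(q: int) -> tuple[int, int]:
    """Integer pair (x, y) of Equation (38), so 2**q * 10**(-k-1) == x / y."""
    k = k_of(q)
    if q >= 1:
        return 2 ** (q - k - 1), 5 ** (k + 1)
    if q == 0:
        return 1, 10
    return 5 ** (-k - 1), 2 ** (1 + k - q)

def scale(q: int) -> Fraction:
    """Exact value of 2**q * 10**(-k-1)."""
    return Fraction(2) ** q * Fraction(10) ** (-k_of(q) - 1)

qs = range(-1074, 972)  # assumed binary64 exponent range
in_range = all(Fraction(1, 10) <= scale(q) < 1 for q in qs)
decomposed = all(scale(q) == Fraction(*xy(q)) for q in qs)
print(in_range, decomposed)  # True True
```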

3.5.2. Bit Width Calculation

Define maximum values for c:

\begin{matrix} float : & c ⩽ c_{m a x} = C_{1} = 2^{24} - 1 \\ double : & c ⩽ c_{m a x} = C_{2} = 2^{53} - 1 \end{matrix}

(39)

Let $C$ denote either $C_{1}$ or $C_{2}$, depending on the precision.

For $y > C$, compute $P^{*}$ and $Q^{*}$ for each $q$ using the function $f (C, x, y)$ from Appendix A.5, and find the minimum $BIT$ such that

\begin{matrix} \frac{x}{y} (1 + 2^{- B I T}) < \frac{P^{*}}{Q^{*}} \end{matrix}

(40)

For $y ⩽ C$,

\begin{matrix} c \cdot \frac{x}{y} (1 + \frac{1}{C y}) = \frac{c x + \frac{c}{C} \cdot \frac{x}{y}}{y} < \frac{c x + 1}{y} \end{matrix}

(41)

Thus,

\begin{matrix} ⌊ c \cdot \frac{x}{y} ⌋ = ⌊ c \cdot \frac{x}{y} (1 + \frac{1}{C y}) ⌋ \end{matrix}

(42)

Similarly, find the minimum $BIT$ such that

\begin{matrix} \frac{x}{y} (1 + 2^{- B I T}) < \frac{x}{y} (1 + \frac{1}{C y}) \end{matrix}

(43)

3.5.3. Results

The maximum of the minimum $BIT$ values over all $q$ (calculated in (test1.py) https://github.com/xjb714/xjb/blob/main/py_test/test1.py (accessed on 20 April 2026) in about 1–2 s) is

\begin{matrix} float : & B I T_{m a x} = 52 \\ double : & B I T_{m a x} = 113 \end{matrix}

(44)

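Since the lookup table errors in Equation (31) satisfy $r_{1} - 1 < 2^{- 63} < 2^{- 52}$ and $r_{2} - 1 < 2^{- 127} < 2^{- 113}$, we obtain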

\begin{matrix} float : & ⌊ c \cdot \frac{x}{y} ⌋ = ⌊ c \cdot \frac{x}{y} \cdot (1 + 2^{- 52}) ⌋ = ⌊ c \cdot \frac{x}{y} \cdot r_{1} ⌋ \\ double : & ⌊ c \cdot \frac{x}{y} ⌋ = ⌊ c \cdot \frac{x}{y} \cdot (1 + 2^{- 113}) ⌋ = ⌊ c \cdot \frac{x}{y} \cdot r_{2} ⌋ \end{matrix}

(45)

This result confirms that $m$ can be calculated exactly and efficiently from the lookup table values, despite their inherent errors. Once $m$ is determined, $ten = 10 m$ can be computed quickly.
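Equation (45) can be spot-checked by comparing the table-based value of $m$ with an exact rational reference. The following sketch assumes the binary64 table construction of Equation (26), normal significands $2^{52} ⩽ c < 2^{53}$ and exponents $- 1074 ⩽ q ⩽ 971$; all function names are hypothetical, and a random sample is of course no substitute for the exhaustive argument above.

```python
import random
from fractions import Fraction

def floor_log2_pow10(e10: int) -> int:
    """Exact floor(e10 * log2(10)) from integer bit lengths."""
    if e10 >= 0:
        return (10 ** e10).bit_length() - 1
    return -(10 ** -e10).bit_length()

def pow10_double(e10: int) -> int:
    """128-bit table entry for 10**e10 (cases of Equation (26))."""
    e2 = abs(floor_log2_pow10(e10) - 127)
    if e10 < 0:
        t = 2 ** e2 // 10 ** -e10
    elif e10 >= 39:
        t = 10 ** e10 // 2 ** e2
    else:
        t = 10 ** e10 * 2 ** e2
    return t + (0 if 0 <= e10 <= 55 else 1)

def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)) from decimal digit counts."""
    return len(str(2 ** q)) - 1 if q >= 0 else -len(str(2 ** -q))

def m_exact(c: int, q: int) -> int:
    """Reference value floor(c * 2**q * 10**(-k-1)) in rational arithmetic."""
    v = c * Fraction(2) ** q * Fraction(10) ** (-k_of(q) - 1)
    return v.numerator // v.denominator

def m_table(c: int, q: int) -> int:
    """Same quantity computed from the (possibly inexact) table entry."""
    e10 = -k_of(q) - 1
    shift = q + floor_log2_pow10(e10) - 127  # binary exponent of c * entry
    prod = c * pow10_double(e10)
    return prod << shift if shift >= 0 else prod >> -shift

rng = random.Random(0)
samples = [(rng.randrange(2 ** 52, 2 ** 53), rng.randrange(-1074, 972))
           for _ in range(2000)]
all_equal = all(m_exact(c, q) == m_table(c, q) for c, q in samples)
print(all_equal)
```

The single product `c * pow10_double(e10)` corresponds to the one 64-by-128-bit multiplication mentioned in the abstract; the shift then discards the fractional bits.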

3.6. Fast Boundary Condition Testing for $o n e = 0$ and $o n e = 10$

In Algorithm 1, the conditions for determining $one = 0$ and $one = 10$ appear on lines 31 and 34, respectively. This section introduces an optimized method to quickly test these boundary conditions using equivalent mathematical formulations.

3.6.1. Equivalent Conditions for Boundary Testing

We start by deriving equivalent mathematical conditions for testing $one = 0$ and $one = 10$.

Case 1: Testing $o n e = 0$
Here, $n$ denotes the fractional part of $c \cdot 2^{q} \cdot 10^{- k - 1}$. The condition $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$ is equivalent to

$\begin{matrix} c \cdot 2^{q} \cdot 10^{- k - 1} - ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ & = 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \\ (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} & = ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \end{matrix}$

(46)
Case 2: Testing $o n e = 10$
The condition $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$ is equivalent to

$\begin{matrix} ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ - c \cdot 2^{q} \cdot 10^{- k - 1} + 1 & = 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \\ (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} & = ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ + 1 \end{matrix}$

(47)

3.6.2. Integer Testing Analysis

To further analyze these conditions, we start with the range of $2^{q - 1} \cdot 10^{- k - 1}$ from Equation (34):

2^{q - 1} \cdot 10^{- k - 1} \in [0.05, 0.5)

(48)

Analysis for $o n e = 0$
For the $o n e = 0$ case, we can derive

$\begin{matrix} ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ - 1 & < c \cdot 2^{q} \cdot 10^{- k - 1} - 0.5 \\ < (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} \\ ⩽ c \cdot 2^{q} \cdot 10^{- k - 1} - 0.05 < ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ + 1 \end{matrix}$

(49)

This implies that when $(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is an integer, it must equal $⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋$ .
Analysis for $o n e = 10$
Similarly, for the $o n e = 10$ case,

$\begin{matrix} ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ & < c \cdot 2^{q} \cdot 10^{- k - 1} + 0.05 \\ ⩽ (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} \\ < c \cdot 2^{q} \cdot 10^{- k - 1} + 0.5 < ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ + 2 \end{matrix}$

(50)

This implies that when $(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is an integer, it must equal $⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ + 1$ .
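A concrete instance makes Equations (46), (47), (49) and (50) tangible. The sketch below uses two hand-picked binary32 significands with $q = 2$ (so $k = 0$); these values and the helper names are illustrative assumptions rather than cases taken from the paper.

```python
from fractions import Fraction

def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)) from decimal digit counts."""
    return len(str(2 ** q)) - 1 if q >= 0 else -len(str(2 ** -q))

def boundary_values(c: int, q: int):
    """Return (m, n, h, lower, upper) as exact rationals.

    m, n  : integer and fractional parts of c * 2**q * 10**(-k-1)
    h     : 2**(q-1) * 10**(-k-1)
    lower : (2c - 1) * h, the quantity tested for one = 0
    upper : (2c + 1) * h, the quantity tested for one = 10
    """
    k = k_of(q)
    h = Fraction(2) ** (q - 1) * Fraction(10) ** (-k - 1)
    v = 2 * c * h
    m = v.numerator // v.denominator
    return m, v - m, h, (2 * c - 1) * h, (2 * c + 1) * h

# Hypothetical significands: 2c - 1 is a multiple of 5 for c = 8388613,
# and 2c + 1 is a multiple of 5 for c = 8388612.
m0, n0, h0, lower0, _ = boundary_values(8388613, 2)
m1, n1, h1, _, upper1 = boundary_values(8388612, 2)
print(h0 == n0, lower0 == m0)          # Equation (46): True True
print(h1 == 1 - n1, upper1 == m1 + 1)  # Equation (47): True True
```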

3.6.3. Key Insight: Integer Divisibility Test

The key insight is that testing for $one = 0$ or $one = 10$ is equivalent to checking whether $(2 c \pm 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is an integer. This can be rewritten as follows:

(2 c \pm 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} = (2 c \pm 1) \cdot 2^{q - k - 2} \cdot 5^{- k - 1}

(51)

We analyze different ranges of q to simplify this condition:

Case $q ⩾ 2$
From $q ⩾ 2$ , we get $k ⩾ 0$ . The expression simplifies to checking whether $(2 c \pm 1) \cdot 2^{q - k - 2}$ is divisible by $5^{k + 1}$ . Since 2 and 5 are coprime, this reduces to checking whether $(2 c \pm 1)$ is divisible by $5^{k + 1}$ :

$(2 c \pm 1) mod 5^{k + 1} = 0$

(52)

Let t be a positive integer such that $2 c \pm 1 = t \cdot 5^{k + 1}$ . Since $2 c \pm 1$ is odd, t must also be odd. Considering the ranges of c for float and double,

$\begin{matrix} float : & 2 c - 1 \in [2^{24} + 1, 2^{25} - 3]; 2 c + 1 \in [2^{24} + 3, 2^{25} - 1]; \\ double : & 2 c - 1 \in [2^{53} + 1, 2^{54} - 3]; 2 c + 1 \in [2^{53} + 3, 2^{54} - 1]; \end{matrix}$

(53)

This gives us the range for t:

$\begin{matrix} float : & \frac{2^{24} + 1}{5^{k + 1}} ⩽ t ⩽ \frac{2^{25} - 1}{5^{k + 1}}; \\ double : & \frac{2^{53} + 1}{5^{k + 1}} ⩽ t ⩽ \frac{2^{54} - 1}{5^{k + 1}}; \end{matrix}$

(54)

The largest values of $k$ for which this range contains at least one odd integer $t$ are

$\begin{matrix} float : & k_{max} = 9 \Rightarrow q_{max} = 33, t = 3 \\ double : & k_{max} = 22 \Rightarrow q_{max} = 76, t = 1 \end{matrix}$

(55)
Case $1 ⩾ q ⩾ 0$
The denominator $2^{2 + k - q} \cdot 5^{k + 1}$ is even, while the numerator $(2 c \pm 1)$ is odd, so no solution exists.
Case $q < 0$
The denominator $2^{2 + k - q}$ is even, while the numerator $(2 c \pm 1) \cdot 5^{- k - 1}$ is odd, so no solution exists.

3.6.4. Summary of Boundary Conditions

In summary, the situations in which $(2 c \pm 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is an integer are as follows:

\begin{matrix} float : & 2 ⩽ q ⩽ 33 & & (2 c \pm 1) mod 5^{k + 1} = 0; \\ double : & 2 ⩽ q ⩽ 76 & & (2 c \pm 1) mod 5^{k + 1} = 0; \end{matrix}

(56)

The corresponding range of $- k - 1$ is as follows:

\begin{matrix} float : & - 10 ⩽ - k - 1 ⩽ - 1 \\ double : & - 23 ⩽ - k - 1 ⩽ - 1 \end{matrix}

(57)

3.6.5. Efficient Implementation

We can further simplify the testing conditions using bitwise operations. For $one = 0$, when $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$, we have

\begin{matrix} float : & ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋ \\ double : & ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{64} \cdot n ⌋ \end{matrix}

(58)

For $one = 10$, when $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$, the following conclusions can be drawn:

\begin{matrix} float : & \{\begin{matrix} 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} = 2^{36} - 2^{36} \cdot n \Rightarrow \\ ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋ = 2^{36} - 1 - ⌊ 2^{36} \cdot n ⌋ \end{matrix} \\ double : & \{\begin{matrix} 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} = 2^{64} - 2^{64} \cdot n \Rightarrow \\ ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{64} - 2^{64} \cdot n ⌋ = 2^{64} - 1 - ⌊ 2^{64} \cdot n ⌋ \end{matrix} \end{matrix}

(59)
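The fixed-point forms in Equations (58) and (59) can be illustrated on the binary32 side with exact rationals. The significands below are hand-picked illustrations with $q = 2$, and the helper name is hypothetical; the first two values lie exactly on the $one = 0$ and $one = 10$ boundaries, while the third lies on neither.

```python
from fractions import Fraction
from math import floor

def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)) for q >= 0."""
    return len(str(2 ** q)) - 1

def fixed_point_sides(c: int, q: int, bits: int = 36):
    """Return (lhs, rhs_zero, rhs_ten) for the binary32 forms of (58) and (59).

    lhs      = floor(2**(bits-1) * 2**q * 10**(-k-1))
    rhs_zero = floor(2**bits * n)
    rhs_ten  = 2**bits - 1 - floor(2**bits * n)
    """
    scale = Fraction(2) ** q / Fraction(10) ** (k_of(q) + 1)
    v = c * scale
    n = v - floor(v)
    lhs = floor(2 ** (bits - 1) * scale)
    rhs_zero = floor(2 ** bits * n)
    return lhs, rhs_zero, 2 ** bits - 1 - rhs_zero

on_zero = fixed_point_sides(8388613, 2)  # c * 2**q / 10 = 3355445.2
on_ten = fixed_point_sides(8388612, 2)   # c * 2**q / 10 = 3355444.8
off = fixed_point_sides(8388614, 2)      # c * 2**q / 10 = 3355445.6
print(on_zero[0] == on_zero[1], on_ten[0] == on_ten[2])  # True True
print(off[0] == off[1], off[0] == off[2])                # False False
```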

The identity $⌊ 2^{36} - 2^{36} \cdot n ⌋ = 2^{36} - 1 - ⌊ 2^{36} \cdot n ⌋$ used in Equation (59) holds exactly when $2^{36} \cdot n$ is not an integer. Because $m$ is an integer, this is equivalent to asking whether the following values are integers when Equation (56) holds (the same applies to double):

\begin{matrix} float : & 2^{36} \cdot (m + n) = c \cdot 2^{q + 36} \cdot 10^{- k - 1} = c \cdot 2^{q - k + 35} \cdot 5^{- k - 1} = c \cdot \frac{2^{q - k + 35}}{5^{k + 1}} \\ double : & 2^{64} \cdot (m + n) = c \cdot 2^{q + 64} \cdot 10^{- k - 1} = c \cdot 2^{q - k + 63} \cdot 5^{- k - 1} = c \cdot \frac{2^{q - k + 63}}{5^{k + 1}} \end{matrix}

(60)

Suppose $c$ is divisible by $5^{k + 1}$ (where $t$ is a temporary integer variable):

\begin{matrix} c = t \cdot 5^{k + 1}; t ⩾ 1 \end{matrix}

(61)

Then, under Equation (61), we have

\begin{matrix} 2 c \pm 1 = 2 \cdot t \cdot 5^{k + 1} \pm 1 \end{matrix}

(62)

The right-hand side of Equation (62) leaves a remainder of $\pm 1$ modulo $5^{k + 1}$, so $2 c \pm 1$ is not divisible by $5^{k + 1}$, which contradicts Equation (56). Hence, $c$ is not divisible by $5^{k + 1}$. Therefore, for float, $c \cdot 2^{q + 36} \cdot 10^{- k - 1}$ and $2^{36} \cdot n$ are not integers. For double, $c \cdot 2^{64 + q} \cdot 10^{- k - 1}$ and $2^{64} \cdot n$ are not integers; that is,

\begin{matrix} float : & ⌊ 2^{36} - 2^{36} \cdot n ⌋ = 2^{36} + ⌊ - 2^{36} \cdot n ⌋ = 2^{36} - 1 - ⌊ 2^{36} \cdot n ⌋ \\ double : & ⌊ 2^{64} - 2^{64} \cdot n ⌋ = 2^{64} + ⌊ - 2^{64} \cdot n ⌋ = 2^{64} - 1 - ⌊ 2^{64} \cdot n ⌋ \end{matrix}

(63)

Therefore, Equation (59) is correct. Next, we show that $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋$ is a necessary and sufficient condition for $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$. The same applies to double, expressed as follows:

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n \Leftrightarrow ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋ \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n \Leftrightarrow ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{64} \cdot n ⌋ \end{matrix}

(64)

Similarly, $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋$ is a necessary and sufficient condition for $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$. The same applies to double, expressed as follows:

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n \Leftrightarrow ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋ \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n \Leftrightarrow ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{64} - 2^{64} \cdot n ⌋ \end{matrix}

(65)

The forward implications in Equations (64) and (65) are immediate. We now prove the reverse implication of Equation (64). For float, we must show that $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$ holds whenever $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋$ holds or, equivalently, that $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \neq ⌊ 2^{36} \cdot n ⌋$ holds whenever $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq n$. The proof is by contradiction.

Assume that $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋$ holds while $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq n$. Then

\begin{matrix} ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋ \\ \Rightarrow 0 < |2^{35} \cdot 2^{q} \cdot 10^{- k - 1} - 2^{36} \cdot n| < 1 \\ \Rightarrow 0 < |(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m| < 2^{- 36} \end{matrix}

(66)

From Equation (49), we know that

\begin{matrix} m - 1 < (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} < m + 1 \end{matrix}

(67)

Let $n^{-}$ denote the fractional part of $(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$; thus, we have

\begin{matrix} |(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m| = \{\begin{matrix} n^{-}; if (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} > m \\ 1 - n^{-}; if (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} < m \end{matrix} \end{matrix}

(68)

Substitute Equation (68) into Equation (66), and we get

\begin{matrix} 0 < |(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m| < 2^{- 36} \\ \Rightarrow 0 < n^{-} < 2^{- 36} or 0 < 1 - n^{-} < 2^{- 36} \end{matrix}

(69)

The same reasoning gives the corresponding range of $n^{-}$ for double. Therefore,

\begin{matrix} float : & n^{-} \in (0, 2^{- 36}) \cup (1 - 2^{- 36}, 1) \\ double : & n^{-} \in (0, 2^{- 64}) \cup (1 - 2^{- 64}, 1) \end{matrix}

(70)

When $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq n$, it is known from Equation (46) that $(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is not an integer. Therefore,

\begin{matrix} 0 < n^{-} < 1 \end{matrix}

(71)

It therefore suffices to prove that Equation (70) cannot hold. To do so, we bound the fractional part $n^{-}$ when $(2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is not an integer. According to Equation (51),

\begin{matrix} (2 c - 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} = (2 c - 1) \cdot \frac{x}{y} = \{\begin{matrix} \frac{(2 c - 1) \cdot 2^{q - k - 2}}{5^{k + 1}}; q ⩾ 2 \\ \frac{(2 c - 1)}{2^{2 + k - q} \cdot 5^{k + 1}}; 1 ⩾ q ⩾ 0 \\ \frac{(2 c - 1) \cdot 5^{- k - 1}}{2^{2 + k - q}}; q < 0 \end{matrix} \end{matrix}

(72)

The maximum value of $2 c - 1$ is

\begin{matrix} float : & {(2 c - 1)}_{max} = 2^{25} - 3 \\ double : & {(2 c - 1)}_{max} = 2^{54} - 3 \end{matrix}

(73)

We consider two cases according to the size of the denominator $y$ in Equation (72).

$y ⩽ {(2 c - 1)}_{max}$
When $y ⩽ {(2 c - 1)}_{max}$, let $y_{max}$ denote the value given in Equation (73). Since $n^{-}$ is a nonzero multiple of $\frac{1}{y}$, the following holds true:

$\begin{matrix} \frac{1}{y_{max}} ⩽ n^{-} ⩽ 1 - \frac{1}{y_{max}} \\ \frac{1}{y_{max}} ⩽ 1 - n^{-} ⩽ 1 - \frac{1}{y_{max}} \end{matrix}$

(74)

Therefore, when $y ⩽ {(2 c - 1)}_{max}$ , Equation (70) does not hold true.
$y > {(2 c - 1)}_{max}$
Call the function $f$ from Appendix A.5 to compute the lower and upper rational approximations $\frac{P_{*}}{Q_{*}}$ and $\frac{P^{*}}{Q^{*}}$:

$\begin{matrix} (\frac{P_{*}}{Q_{*}}, \frac{P^{*}}{Q^{*}}) = f ({(2 c - 1)}_{max}, x, y) \end{matrix}$

(75)

Therefore, for $n^{-}$ , the following conclusion can be drawn from Appendix A.4.

$\begin{matrix} n^{-} \in [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \end{matrix}$

(76)

By exhausting all possibilities, we thus have the following (the test code file is (test3.py) https://github.com/xjb714/xjb/blob/main/py_test/test3.py (accessed on 20 April 2026)):

$\begin{matrix} float : & 2^{- 33} < n^{-} < 1 - 2^{- 29} \\ double : & 2^{- 62} < n^{-} < 1 - 2^{- 63} \end{matrix}$

(77)

$\begin{matrix} float : & [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (0, 2^{- 36}) = ⌀ \\ [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (1 - 2^{- 36}, 1) = ⌀ \\ double : & [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (0, 2^{- 64}) = ⌀ \\ [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (1 - 2^{- 64}, 1) = ⌀ \end{matrix}$

(78)

Therefore, when $y > {(2 c - 1)}_{max}$ , Equation (70) does not hold true.

In summary, when $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq n$, Equation (70) does not hold; that is, $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \neq ⌊ 2^{36} \cdot n ⌋$ must hold. Conversely, when $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} \cdot n ⌋$ holds, $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$ must hold. Therefore, Equation (64) holds.

Similarly, it can be proved that when $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋$ holds, $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$ must hold. The same applies to double. Arguing again by contradiction for float, assume that $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋$ holds while $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq 1 - n$. That is,

\begin{matrix} ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋ \\ \Rightarrow 0 < |2^{35} \cdot 2^{q} \cdot 10^{- k - 1} - 2^{36} + 2^{36} \cdot n| < 1 \\ \Rightarrow 0 < |2^{q - 1} \cdot 10^{- k - 1} - 1 + n| < 2^{- 36} \\ \Rightarrow - 2^{- 36} < (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m - 1 < 2^{- 36} \end{matrix}

(79)

From Equation (50), we know that

\begin{matrix} m < (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} < m + 2 \end{matrix}

(80)

Let $n^{+}$ denote the fractional part of $(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$; thus, we have

\begin{matrix} |(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m - 1| = \{\begin{matrix} n^{+}; if (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} > m + 1 \\ 1 - n^{+}; if (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} < m + 1 \end{matrix} \end{matrix}

(81)

Substitute Equation (81) into Equation (79), and we get

\begin{matrix} 0 < |(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} - m - 1| < 2^{- 36} \\ \Rightarrow 0 < 1 - n^{+} < 2^{- 36} or 0 < n^{+} < 2^{- 36} \end{matrix}

(82)

The same reasoning gives the corresponding range of $n^{+}$ for double. Therefore,

\begin{matrix} float : & n^{+} \in (0, 2^{- 36}) \cup (1 - 2^{- 36}, 1) \\ double : & n^{+} \in (0, 2^{- 64}) \cup (1 - 2^{- 64}, 1) \end{matrix}

(83)

When $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq 1 - n$, it is known from Equation (47) that $(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is not an integer. Therefore,

\begin{matrix} 0 < n^{+} < 1 \end{matrix}

(84)

It therefore suffices to prove that Equation (83) cannot hold. To do so, we bound the fractional part $n^{+}$ when $(2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1}$ is not an integer. According to Equation (51),

\begin{matrix} (2 c + 1) \cdot 2^{q - 1} \cdot 10^{- k - 1} = (2 c + 1) \cdot \frac{x}{y} = \{\begin{matrix} \frac{(2 c + 1) \cdot 2^{q - k - 2}}{5^{k + 1}}; q ⩾ 2 \\ \frac{(2 c + 1)}{2^{2 + k - q} \cdot 5^{k + 1}}; 1 ⩾ q ⩾ 0 \\ \frac{(2 c + 1) \cdot 5^{- k - 1}}{2^{2 + k - q}}; q < 0 \end{matrix} \end{matrix}

(85)

The maximum value of $2 c + 1$ is

\begin{matrix} float : & {(2 c + 1)}_{max} = 2^{25} - 1 \\ double : & {(2 c + 1)}_{max} = 2^{54} - 1 \end{matrix}

(86)

We consider two cases according to the size of the denominator $y$ in Equation (85).

$y ⩽ {(2 c + 1)}_{max}$
When $y ⩽ {(2 c + 1)}_{max}$, let $y_{max}$ denote the value given in Equation (86). Since $n^{+}$ is a nonzero multiple of $\frac{1}{y}$, the following holds true:

$\begin{matrix} \frac{1}{y_{max}} ⩽ n^{+} ⩽ 1 - \frac{1}{y_{max}} \\ \frac{1}{y_{max}} ⩽ 1 - n^{+} ⩽ 1 - \frac{1}{y_{max}} \end{matrix}$

(87)

Therefore, when $y ⩽ {(2 c + 1)}_{max}$ , Equation (83) does not hold true.
$y > {(2 c + 1)}_{max}$
Call the function $f$ from Appendix A.5 to compute the lower and upper rational approximations $\frac{P_{*}}{Q_{*}}$ and $\frac{P^{*}}{Q^{*}}$:

$\begin{matrix} (\frac{P_{*}}{Q_{*}}, \frac{P^{*}}{Q^{*}}) = f ({(2 c + 1)}_{max}, x, y) \end{matrix}$

(88)

Therefore, for $n^{+}$, the following conclusion can be drawn from the formula in Appendix A.4:

$\begin{matrix} n^{+} \in [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \end{matrix}$

(89)

By exhausting all possibilities, we thus have the following (the test code file is (test7.py) https://github.com/xjb714/xjb/blob/main/py_test/test7.py (accessed on 20 April 2026)):

$\begin{matrix} float : & 2^{- 33} < n^{+} < 1 - 2^{- 29} \\ double : & 2^{- 62} < n^{+} < 1 - 2^{- 63} \end{matrix}$

(90)

$\begin{matrix} float : & [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (0, 2^{- 36}) = ⌀ \\ [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (1 - 2^{- 36}, 1) = ⌀ \\ double : & [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (0, 2^{- 64}) = ⌀ \\ [\frac{(Q_{*} x) % y}{y}, \frac{(Q^{*} x) % y}{y}] \cap (1 - 2^{- 64}, 1) = ⌀ \end{matrix}$

(91)

Therefore, when $y > {(2 c + 1)}_{max}$ , Equation (83) does not hold true.

In summary, when $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} \neq 1 - n$, Equation (83) does not hold; that is, $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \neq ⌊ 2^{36} - 2^{36} \cdot n ⌋$ must hold. Conversely, when $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ 2^{36} - 2^{36} \cdot n ⌋$ holds, $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$ must hold. The same is true for double. Therefore, Equation (65) holds.

The following conclusions hold:

\begin{matrix} float : & ⌊ 2^{36} - 2^{36} \cdot n ⌋ & = \{\begin{matrix} 2^{36} - 1 - ⌊ 2^{36} \cdot n ⌋; if c \cdot 2^{36 + q} \cdot 10^{- k - 1} \notin Z \\ 2^{36} - ⌊ 2^{36} \cdot n ⌋; if c \cdot 2^{36 + q} \cdot 10^{- k - 1} \in Z \end{matrix} \\ double : & ⌊ 2^{64} - 2^{64} \cdot n ⌋ & = \{\begin{matrix} 2^{64} - 1 - ⌊ 2^{64} \cdot n ⌋; if c \cdot 2^{64 + q} \cdot 10^{- k - 1} \notin Z \\ 2^{64} - ⌊ 2^{64} \cdot n ⌋; if c \cdot 2^{64 + q} \cdot 10^{- k - 1} \in Z \end{matrix} \end{matrix}

(92)

We now examine whether the following Equation (93) holds when the conditions of Equations (56) and (57) are met:

\begin{matrix} float : & ⌊ c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} ⌋ = ⌊ c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} \cdot r ⌋ \\ = ⌊ c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} \cdot \frac{(2^{63 - ⌊ (- k - 1) \cdot {log}_{2} (10) ⌋} / / 10^{k + 1}) + 1}{10^{- k - 1}} \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} (10) ⌋ - 63} ⌋ \\ double : & ⌊ c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} ⌋ = ⌊ c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} \cdot r ⌋ \\ = ⌊ c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} \cdot \frac{(2^{127 - ⌊ (- k - 1) \cdot {log}_{2} (10) ⌋} / / 10^{k + 1}) + 1}{10^{- k - 1}} \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} (10) ⌋ - 127} ⌋ \end{matrix}

(93)

We have

\begin{matrix} float : & ⌊ c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} ⌋ = ⌊ 2^{36} \cdot (m + n) ⌋ = 2^{36} \cdot m + ⌊ 2^{36} \cdot n ⌋ \\ double : & ⌊ c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} ⌋ = ⌊ 2^{64} \cdot (m + n) ⌋ = 2^{64} \cdot m + ⌊ 2^{64} \cdot n ⌋ \end{matrix}

(94)

It was proven earlier that $m$ can be calculated exactly. Hence, when Equation (93) holds, the values $⌊ 2^{36} \cdot n ⌋$ and $⌊ 2^{64} \cdot n ⌋$ on the right side of Equations (58) and (59) can also be calculated exactly.

From Equation (51), we have

\begin{matrix} c = \frac{t \cdot 5^{k + 1} - 1}{2} \end{matrix}

(95)

Substituting Equation (95) into Equation (93), we have

\begin{matrix} float : & c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} = t \cdot 2^{q + 34 - k} - \frac{2^{q + 34 - k}}{5^{k + 1}} \\ double : & c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} = t \cdot 2^{q + 62 - k} - \frac{2^{q + 62 - k}}{5^{k + 1}} \end{matrix}

(96)

When the conditions of Equations (56) and (57) are met, $t \cdot 2^{q + 34 - k}$ and $t \cdot 2^{q + 62 - k}$ are integers. Under the condition of Equation (56), the fractional part of the expression in Equation (96) is represented as follows:

\begin{matrix} float : & \frac{2^{q + 34 - k} % 5^{k + 1}}{5^{k + 1}}; 2 ⩽ q ⩽ 33 \\ double : & \frac{2^{q + 62 - k} % 5^{k + 1}}{5^{k + 1}}; 2 ⩽ q ⩽ 76 \end{matrix}

(97)

For Equation (93) to hold, it suffices to prove that the fractional part of the left-side value $c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}}$, plus the increase of the right-side value $c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} \cdot r$ over the left-side value, is less than 1. That is,

\begin{matrix} float : & \frac{2^{q + 34 - k} % 5^{k + 1}}{5^{k + 1}} + (c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}} \cdot r - c \cdot \frac{2^{q + 35 - k}}{5^{k + 1}}) < 1 \\ double : & \frac{2^{q + 62 - k} % 5^{k + 1}}{5^{k + 1}} + (c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}} \cdot r - c \cdot \frac{2^{q + 63 - k}}{5^{k + 1}}) < 1 \end{matrix}

(98)

Exhaustively substituting the maximum possible $c$ for each $q$ into Equation (98) shows that the inequality holds. The calculation can be found in (test2.py) https://github.com/xjb714/xjb/blob/main/py_test/test2.py (accessed on 20 April 2026). The results show that Equation (98) always holds for both the float range and the double range. Therefore, Equation (93) holds, and thus the values of $⌊ 2^{36} \cdot n ⌋$ and $⌊ 2^{64} \cdot n ⌋$ on the right side of Equations (58) and (59) can be calculated exactly. The values of $⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋$ and $⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋$ on the left side of Equations (58) and (59) can be obtained from the lookup tables:

\begin{matrix} float : & ⌊ 2^{35} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = p o w 10 ≫ (28 - q - ⌊ (- k - 1) \cdot {log}_{2} (10) ⌋) \\ double : & ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = p o w 10 ≫ (64 - q - ⌊ (- k - 1) \cdot {log}_{2} (10) ⌋) \end{matrix}

(99)

The code file for verifying the validity of Equation (99) is (test4.py) https://github.com/xjb714/xjb/blob/main/py_test/test4.py (accessed on 20 April 2026). Therefore, when the conditions of Equations (56) and (57) are met, the values of both sides of Equations (58) and (59) can be accurately calculated.
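As a rough, independent illustration of Equation (99) (not a replacement for test4.py), the shifted table entry can be compared with the exact value for every $q$ in the ranges of Equation (56). The sketch assumes the inexact entries have the form "truncated quotient plus one" shown in Equations (26) and (93); the function and parameter names are hypothetical.

```python
from fractions import Fraction
from math import floor

def floor_log2_pow10(e10: int) -> int:
    """Exact floor(e10 * log2(10)) for e10 < 0."""
    return -(10 ** -e10).bit_length()

def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)) for q >= 0."""
    return len(str(2 ** q)) - 1

def check(q_max: int, top_bit: int, frac_bits: int, base_shift: int) -> bool:
    """Compare both sides of Equation (99) for every q in [2, q_max].

    top_bit    : 63 for the 64-bit float table, 127 for the 128-bit double table
    frac_bits  : 35 (float) or 63 (double), the scaling on the left side
    base_shift : 28 (float) or 64 (double), the constant in the shift amount
    """
    for q in range(2, q_max + 1):
        e10 = -k_of(q) - 1
        fl = floor_log2_pow10(e10)
        # Inexact entry for a negative power of ten: truncated quotient plus one.
        pow10 = 2 ** (top_bit - fl) // 10 ** -e10 + 1
        lhs = floor(Fraction(2) ** (frac_bits + q) / 10 ** -e10)
        if lhs != pow10 >> (base_shift - q - fl):
            return False
    return True

print(check(33, 63, 35, 28), check(76, 127, 63, 64))  # True True
```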

Next, we compare the following two values over the entire range of floating-point numbers:

\begin{matrix} float : & ⌊ c \cdot 2^{q + 36} \cdot 10^{- k - 1} ⌋; ⌊ c \cdot 2^{q + 36} \cdot r \cdot 10^{- k - 1} ⌋; \\ double : & ⌊ c \cdot 2^{q + 64} \cdot 10^{- k - 1} ⌋; ⌊ c \cdot 2^{q + 64} \cdot r \cdot 10^{- k - 1} ⌋; \end{matrix}

(100)

When $r = 1$, the two values in Equation (100) are obviously equal. When $r \neq 1$, or equivalently $r > 1$,

\begin{matrix} float : c \cdot 2^{q + 36} \cdot r \cdot 10^{- k - 1} & = c \cdot 2^{q + 36} \cdot 10^{- k - 1} + c \cdot 2^{q + 36} \cdot (r - 1) \cdot 10^{- k - 1} \\ < c \cdot 2^{q + 36} \cdot 10^{- k - 1} + 2^{24} \cdot 2^{36} \cdot 2^{q} \cdot 10^{- k - 1} \cdot (r - 1) \\ < c \cdot 2^{q + 36} \cdot 10^{- k - 1} + 2^{- 3} \\ ⌊ c \cdot 2^{q + 36} \cdot r \cdot 10^{- k - 1} ⌋ & ⩽ ⌊ c \cdot 2^{q + 36} \cdot 10^{- k - 1} ⌋ + 1 \\ double : c \cdot 2^{q + 64} \cdot r \cdot 10^{- k - 1} & = c \cdot 2^{q + 64} \cdot 10^{- k - 1} + c \cdot 2^{q + 64} \cdot (r - 1) \cdot 10^{- k - 1} \\ < c \cdot 2^{q + 64} \cdot 10^{- k - 1} + 2^{53} \cdot 2^{64} \cdot 2^{q} \cdot 10^{- k - 1} \cdot (r - 1) \\ < c \cdot 2^{q + 64} \cdot 10^{- k - 1} + 2^{- 10} \\ ⌊ c \cdot 2^{q + 64} \cdot r \cdot 10^{- k - 1} ⌋ & ⩽ ⌊ c \cdot 2^{q + 64} \cdot 10^{- k - 1} ⌋ + 1 \end{matrix}

(101)

Therefore,

\begin{matrix} float : & 0 ⩽ ⌊ c \cdot 2^{q + 36} \cdot r \cdot 10^{- k - 1} ⌋ - ⌊ c \cdot 2^{q + 36} \cdot 10^{- k - 1} ⌋ ⩽ 1 \\ double : & 0 ⩽ ⌊ c \cdot 2^{q + 64} \cdot r \cdot 10^{- k - 1} ⌋ - ⌊ c \cdot 2^{q + 64} \cdot 10^{- k - 1} ⌋ ⩽ 1 \end{matrix}

(102)

Recall that

\begin{matrix} ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = ⌊ c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} ⌋ = m \end{matrix}

(103)

\begin{matrix} float : & ⌊ c \cdot 2^{q + 36} \cdot 10^{- k - 1} ⌋ = 2^{36} \cdot m + ⌊ 2^{36} \cdot n ⌋ \\ double : & ⌊ c \cdot 2^{q + 64} \cdot 10^{- k - 1} ⌋ = 2^{64} \cdot m + ⌊ 2^{64} \cdot n ⌋ \end{matrix}

(104)

Define

\begin{matrix} n_{r} = c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} - m \end{matrix}

(105)

Therefore, when Equation (56) is met, Equation (93) gives

\begin{matrix} float : & 2 ⩽ q ⩽ 33 & & (2 c \pm 1) % 5^{k + 1} = 0 \Rightarrow ⌊ 2^{36} \cdot n ⌋ = ⌊ 2^{36} \cdot n_{r} ⌋ \\ double : & 2 ⩽ q ⩽ 76 & & (2 c \pm 1) % 5^{k + 1} = 0 \Rightarrow ⌊ 2^{64} \cdot n ⌋ = ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix}

(106)

Over the entire range of floating-point numbers, it follows from Equation (102) that

\begin{matrix} float : & ⌊ 2^{36} \cdot n ⌋ ⩽ ⌊ 2^{36} \cdot n_{r} ⌋ ⩽ ⌊ 2^{36} \cdot n ⌋ + 1 \\ double : & ⌊ 2^{64} \cdot n ⌋ ⩽ ⌊ 2^{64} \cdot n_{r} ⌋ ⩽ ⌊ 2^{64} \cdot n ⌋ + 1 \end{matrix}

(107)

To simplify the expression, $even$ is used to indicate whether $c$ is an even number ($even = 1$ if $c$ is even and $even = 0$ otherwise):

\begin{matrix} e v e n = (c + 1) % 2 \in \{0, 1\} \end{matrix}

(108)

The condition $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$ is the boundary condition for $one = 0$, and $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$ is the boundary condition for $one = 10$. On either boundary, whether $one$ takes the value 0 or 10 is determined by whether $c$ is an even number. Therefore, the following holds:

\begin{matrix} float : & \{\begin{matrix} o n e = 0 : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n > ⌊ 2^{36} \cdot n_{r} ⌋ \\ o n e = 10 : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \end{matrix} \\ double : & \{\begin{matrix} o n e = 0 : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n > ⌊ 2^{64} \cdot n_{r} ⌋ \\ o n e = 10 : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n > 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix} \end{matrix}

(109)

Therefore, when $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = n$ or $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} = 1 - n$, we can use the condition in Equation (110) to determine whether $one = 0$ or $one = 10$.

\begin{matrix} float : & \{\begin{matrix} if ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n > ⌊ 2^{36} \cdot n_{r} ⌋ : o n e = 0 \\ if ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ : o n e = 10 \end{matrix} \\ double : & \{\begin{matrix} if ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n > ⌊ 2^{64} \cdot n_{r} ⌋ : o n e = 0 \\ if ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n > 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ : o n e = 10 \end{matrix} \end{matrix}

(110)
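To show how the float branch of Equation (110) behaves on the boundaries, the sketch below evaluates both inequalities using only integer operations on a 64-bit table entry. It is an illustration under stated assumptions, not the authors' implementation: the entry form follows Equation (93), the shift amounts are derived from Equation (99), the helper name is hypothetical, and the four significands (all with $q = 2$) are hand-picked.

```python
def k_of(q: int) -> int:
    """Exact k = floor(q * lg(2)) for q >= 0."""
    return len(str(2 ** q)) - 1

def boundary_tests(c: int, q: int) -> tuple[bool, bool]:
    """Evaluate the two binary32 inequalities of Equation (110) for q >= 2.

    Returns (one_is_0, one_is_10).
    """
    e10 = -k_of(q) - 1
    fl = -(10 ** -e10).bit_length()           # floor(e10 * log2(10)), e10 < 0
    pow10 = 2 ** (63 - fl) // 10 ** -e10 + 1  # inexact entry, as in Equation (93)
    prod = c * pow10
    m = prod >> (63 - q - fl)                 # floor(c * 2**q * r * 10**e10)
    frac36 = (prod >> (27 - q - fl)) - (m << 36)  # floor(2**36 * n_r)
    half = pow10 >> (28 - q - fl)             # floor(2**(q+35) * 10**e10)
    even = (c + 1) % 2
    return half + even > frac36, half + even > 2 ** 36 - 1 - frac36

# Hypothetical significands; comments give c * 2**q / 10.
print(boundary_tests(8388613, 2))  # 3355445.2, lower boundary, c odd:  (False, False)
print(boundary_tests(8388618, 2))  # 3355447.2, lower boundary, c even: (True, False)
print(boundary_tests(8388612, 2))  # 3355444.8, upper boundary, c even: (False, True)
print(boundary_tests(8388617, 2))  # 3355446.8, upper boundary, c odd:  (False, False)
```

On an exact boundary, the two floors coincide, so the strict inequality is satisfied only through the $even$ term; this is how the parity of $c$ decides the boundary case without a separate branch.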

When $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} > n$ or $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} > 1 - n$, the condition in Equation (110) also correctly determines that $one = 0$ or $one = 10$. When $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} < n$ or $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} < 1 - n$, it likewise correctly determines that $one \neq 0$ or $one \neq 10$. This gives a total of four cases. The proof is as follows:

(1): When $2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} < n$, we must have $one \neq 0$, and

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} - n = n^{-} - 1 \in (2^{- 33} - 1, - 2^{- 29}) \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} - n = n^{-} - 1 \in (2^{- 62} - 1, - 2^{- 63}) \end{matrix}

(111)

Therefore, the following exists:

\begin{matrix} float : & 2^{q + 35} \cdot 10^{- k - 1} - 2^{36} \cdot n \in (2^{3} - 2^{36}, - 2^{7}) \\ double : & 2^{q + 63} \cdot 10^{- k - 1} - 2^{64} \cdot n \in (4 - 2^{64}, - 2) \end{matrix}

(112)

For any two real numbers $a$ and $b$, the following relationships hold:

\begin{matrix} 0 ⩽ & b - ⌊ b ⌋ < 1 \\ a - ⌊ a ⌋ - 1 < & b - ⌊ b ⌋ < 1 + a - ⌊ a ⌋ \\ a - b - 1 < & ⌊ a ⌋ - ⌊ b ⌋ < a - b + 1 \end{matrix}

(113)

Taking $a = 2^{q + 35} \cdot 10^{- k - 1}$ and $b = 2^{36} \cdot n$ for float, or $a = 2^{q + 63} \cdot 10^{- k - 1}$ and $b = 2^{64} \cdot n$ for double, we get

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{36} \cdot n ⌋ < 2^{q + 35} \cdot 10^{- k - 1} - 2^{36} \cdot n + 1 \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{64} \cdot n ⌋ < 2^{q + 63} \cdot 10^{- k - 1} - 2^{64} \cdot n + 1 \end{matrix}

(114)

From Equation (112), we have

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{36} \cdot n ⌋ < 1 - 2^{7} < 0 \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{64} \cdot n ⌋ < 1 - 2 < 0 \end{matrix}

(115)

Therefore,

\begin{matrix} float : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & ⩽ ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + 1 \\ < ⌊ 2^{36} \cdot n ⌋ ⩽ ⌊ 2^{36} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & < ⌊ 2^{36} \cdot n_{r} ⌋ \\ double : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & ⩽ ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + 1 \\ < ⌊ 2^{64} \cdot n ⌋ ⩽ ⌊ 2^{64} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & < ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix}

(116)

Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$, the first test in Equation (110) fails, which correctly identifies $one \neq 0$.

(2): When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$, we must have $one = 0$, and

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} - n = n^{-} \in (2^{- 33}, 1 - 2^{- 29}) \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} - n = n^{-} \in (2^{- 62}, 1 - 2^{- 63}) \end{matrix}

(117)

Multiplying by $2^{36}$ (float) or $2^{64}$ (double) gives

\begin{matrix} float : & 2^{q + 35} \cdot 10^{- k - 1} - 2^{36} \cdot n \in (2^{3}, 2^{36} - 2^{7}) \\ double : & 2^{q + 63} \cdot 10^{- k - 1} - 2^{64} \cdot n \in (4, 2^{64} - 2) \end{matrix}

(118)

Setting $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$ (float), or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$ (double), the left-hand inequality of Equation (113) gives

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{36} \cdot n ⌋ > 2^{q + 35} \cdot 10^{- k - 1} - 2^{36} \cdot n - 1 \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{64} \cdot n ⌋ > 2^{q + 63} \cdot 10^{- k - 1} - 2^{64} \cdot n - 1 \end{matrix}

(119)

From Equation (118), we have

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{36} \cdot n ⌋ > 2^{3} - 1 ⩾ 0 \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ - ⌊ 2^{64} \cdot n ⌋ > 4 - 1 ⩾ 0 \end{matrix}

(120)

Therefore,

\begin{matrix} float : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & ⩾ ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ \\ > ⌊ 2^{36} \cdot n ⌋ + 1 ⩾ ⌊ 2^{36} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & > ⌊ 2^{36} \cdot n_{r} ⌋ \\ double : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & ⩾ ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ \\ > ⌊ 2^{64} \cdot n ⌋ + 1 ⩾ ⌊ 2^{64} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & > ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix}

(121)

Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$, the first test in Equation (110) holds, which correctly identifies $one = 0$.

(3): When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, we must have $one \neq 10$, and

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} + n = n^{+} \in (2^{- 33}, 1 - 2^{- 29}) \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} + n = n^{+} \in (2^{- 62}, 1 - 2^{- 63}) \end{matrix}

(122)

Multiplying by $2^{36}$ (float) or $2^{64}$ (double) gives

\begin{matrix} float : & 2^{q + 35} \cdot 10^{- k - 1} + 2^{36} \cdot n \in (2^{3}, 2^{36} - 2^{7}) \\ double : & 2^{q + 63} \cdot 10^{- k - 1} + 2^{64} \cdot n \in (4, 2^{64} - 2) \end{matrix}

(123)

For any two real numbers a and b, the following relationships also hold:

\begin{matrix} a - 1 & < ⌊ a ⌋ ⩽ a \\ b - 1 & < ⌊ b ⌋ ⩽ b \\ a + b - 2 & < ⌊ a ⌋ + ⌊ b ⌋ ⩽ a + b \end{matrix}

(124)

Setting $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$ (float), or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$ (double), the right-hand inequality of Equation (124) gives

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{36} \cdot n ⌋ ⩽ 2^{q + 35} \cdot 10^{- k - 1} + 2^{36} \cdot n \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{64} \cdot n ⌋ ⩽ 2^{q + 63} \cdot 10^{- k - 1} + 2^{64} \cdot n \end{matrix}

(125)

From Equation (123), we have

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{36} \cdot n ⌋ < 2^{36} - 2^{7} \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{64} \cdot n ⌋ < 2^{64} - 2 \end{matrix}

(126)

Therefore,

\begin{matrix} float : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & ⩽ ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + 1 \\ < 2^{36} - 2 - ⌊ 2^{36} \cdot n ⌋ \\ < 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & < 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \\ double : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & ⩽ ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + 1 \\ < 2^{64} - 2 - ⌊ 2^{64} \cdot n ⌋ \\ < 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & < 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix}

(127)

Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, the second test in Equation (110) fails, which correctly identifies $one \neq 10$.

(4): When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, we must have $one = 10$, and

\begin{matrix} float : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} + n = n^{+} + 1 \in (1 + 2^{- 33}, 2 - 2^{- 29}) \\ double : & 2^{- 1} \cdot 2^{q} \cdot 10^{- k - 1} + n = n^{+} + 1 \in (1 + 2^{- 62}, 2 - 2^{- 63}) \end{matrix}

(128)

Multiplying by $2^{36}$ (float) or $2^{64}$ (double) gives

\begin{matrix} float : & 2^{q + 35} \cdot 10^{- k - 1} + 2^{36} \cdot n \in (2^{3} + 2^{36}, 2^{37} - 2^{7}) \\ double : & 2^{q + 63} \cdot 10^{- k - 1} + 2^{64} \cdot n \in (4 + 2^{64}, 2^{65} - 2) \end{matrix}

(129)

Setting $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$ (float), or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$ (double), the left-hand inequality of Equation (124) gives

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{36} \cdot n ⌋ > 2^{q + 35} \cdot 10^{- k - 1} + 2^{36} \cdot n - 2 \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{64} \cdot n ⌋ > 2^{q + 63} \cdot 10^{- k - 1} + 2^{64} \cdot n - 2 \end{matrix}

(130)

From Equation (129), we have

\begin{matrix} float : & ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{36} \cdot n ⌋ > 2^{36} + 2^{3} - 2 > 2^{36} \\ double : & ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + ⌊ 2^{64} \cdot n ⌋ > 2^{64} + 2 - 2 > 2^{64} \end{matrix}

(131)

Therefore,

\begin{matrix} float : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & ⩾ ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ \\ > 2^{36} - ⌊ 2^{36} \cdot n ⌋ \\ > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \\ double : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & ⩾ ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ \\ > 2^{64} - ⌊ 2^{64} \cdot n ⌋ \\ > 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \\ \Rightarrow ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & > 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \end{matrix}

(132)

Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, the second test in Equation (110) holds, which correctly identifies $one = 10$.
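The two elementary floor bounds used in the four cases, Equations (113) and (124), can be spot-checked with exact rational arithmetic. The Python sketch below is illustrative only and is not part of the xjb implementation; the helper name `floor_bounds_hold` and the sampling ranges are assumptions made for this example.

```python
from fractions import Fraction
from math import floor
import random


def floor_bounds_hold(a: Fraction, b: Fraction) -> bool:
    """Check the bounds of Equations (113) and (124) for one pair (a, b)."""
    diff = floor(a) - floor(b)
    total = floor(a) + floor(b)
    eq113 = a - b - 1 < diff < a - b + 1      # Equation (113), last row
    eq124 = a + b - 2 < total <= a + b        # Equation (124), last row
    return eq113 and eq124


# Fixed seed so that the spot check is reproducible.
rng = random.Random(0)
samples = [
    (Fraction(rng.randrange(-10**6, 10**6), rng.randrange(1, 10**3)),
     Fraction(rng.randrange(-10**6, 10**6), rng.randrange(1, 10**3)))
    for _ in range(1000)
]
all_ok = all(floor_bounds_hold(a, b) for a, b in samples)
```

Such a check does not replace the proof; it only guards against transcription slips in the inequalities.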

The above proof shows that, whenever Equation (56) is met, the condition in Equation (110) correctly decides whether $one = 0$ or $one = 10$ in the boundary cases $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ and $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$. When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, it yields $one = 0$ or $one = 10$, respectively. When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, it yields $one \neq 0$ or $one \neq 10$, respectively.

This completes the proof. In the implementation, both tests reduce to additions, subtractions, and shifts, and compilers can lower them to conditional-move (cmov) instructions, which avoids the cost of branch mispredictions.

In conclusion, whether $one$ equals 0 or 10 can be decided quickly using the following tests:

\begin{matrix} float : ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & > ⌊ 2^{36} \cdot n_{r} ⌋ \Rightarrow o n e = 0 \\ ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + e v e n & > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \Rightarrow o n e = 10 \\ double : ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & > ⌊ 2^{64} \cdot n_{r} ⌋ \Rightarrow o n e = 0 \\ ⌊ 2^{q + 63} \cdot 10^{- k - 1} ⌋ + e v e n & > 2^{64} - 1 - ⌊ 2^{64} \cdot n_{r} ⌋ \Rightarrow o n e = 10 \end{matrix}

(133)
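The two tests of Equation (133) can be written without control flow. The following Python sketch is a hedged illustration rather than the authors' code: the function name `select_one` and the arithmetic-select formulation are assumptions chosen to mirror what a C compiler might lower to cmov, and `bits` is 36 for float or 64 for double.

```python
def select_one(one: int, half: int, dot_one: int, bits: int) -> int:
    """Override `one` with 0 or 10 when the tests of Equation (133) fire.

    half    = floor(2**(q + bits - 1) * 10**(-k - 1)) + even
    dot_one = floor(2**bits * n_r)
    """
    mask = (1 << bits) - 1
    is_zero = int(half > dot_one)           # first test: one = 0
    is_ten = int(half > mask - dot_one)     # second test: one = 10
    # Arithmetic selects instead of if/else, in the same order as the tests.
    one = one * (1 - is_zero)
    one = one + is_ten * (10 - one)
    return one
```

For instance, with `bits = 64`, `half = 5`, and `dot_one = 3`, the first test fires and the result is 0 regardless of the incoming value of `one`.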

3.7. Efficient Computation of $⌊ 10 n ⌋$ and Rounding

This subsection determines whether $one$ is $⌊ 10 n ⌋$ or $⌊ 10 n ⌋ + 1$ from the decimal part of $10 n$. There are two cases: the decimal part of $10 n$ is exactly 0.5, or it is not.

3.7.1. $10 n - ⌊ 10 n ⌋ = 0.5$

When the decimal part of $10 n$ is 0.5, we have

\begin{matrix} 10 n - ⌊ 10 n ⌋ = 0.5 \\ \Rightarrow 10 \cdot c \cdot 2^{q} \cdot 10^{- k - 1} - ⌊ 10 \cdot c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ = 0.5 \\ \Rightarrow c \cdot 2^{q} \cdot 10^{- k} - ⌊ c \cdot 2^{q} \cdot 10^{- k} ⌋ = 0.5 \\ \Rightarrow c \cdot 2^{q} \cdot 10^{- k} = ⌊ c \cdot 2^{q} \cdot 10^{- k} ⌋ + 0.5 \\ \Rightarrow 2 c \cdot 2^{q} \cdot 10^{- k} = 2 ⌊ c \cdot 2^{q} \cdot 10^{- k} ⌋ + 1 \end{matrix}

(134)

so $2 c \cdot 2^{q} \cdot 10^{-k}$ is an odd integer. Equivalently, the following expression is odd:

\begin{matrix} c \cdot 2^{q + 1} \cdot 10^{- k} = c \cdot 2^{q - k + 1} \cdot 5^{- k} \end{matrix}

(135)

Depending on the range of q, we have

\begin{matrix} c \cdot 2^{q + 1} \cdot 10^{- k} = \{\begin{matrix} \frac{c \cdot 2^{q - k + 1}}{5^{k}}; q ⩾ 0 \\ c \cdot 2 \cdot 5^{- k}; q = - 1 \\ \frac{c \cdot 5^{- k}}{2^{k - q - 1}}; q ⩽ - 2 \end{matrix} \end{matrix}

(136)

The three ranges of q are considered in turn:

$q ⩾ 0$
When $q ⩾ 0$, we have $q - k + 1 ⩾ 1$, so the numerator $c \cdot 2^{q - k + 1}$ is even while the denominator $5^{k}$ is odd; the quotient cannot be an odd integer, so this case does not meet the condition.
$q = - 1$
When $q = - 1$, $c \cdot 2 \cdot 5^{- k}$ is even, so this case does not meet the condition.
$q ⩽ - 2$
When $q ⩽ - 2$, $5^{- k}$ is an odd integer, so the expression is an odd integer only if c is an odd multiple of $2^{k - q - 1}$. Hence,

$\begin{matrix} float : & c ⩾ 2^{k - q - 1} \Rightarrow k - q - 1 ⩽ 22 \Rightarrow q ⩾ - 34 \\ double : & c ⩾ 2^{k - q - 1} \Rightarrow k - q - 1 ⩽ 51 \Rightarrow q ⩾ - 75 \end{matrix}$

(137)

Therefore, for q in the ranges above, the expression in Equation (135) is an odd integer exactly when c is an odd multiple of $2^{k - q - 1}$, that is, when the following conditions are met:

$\begin{matrix} float : & - 34 ⩽ q ⩽ - 2 & & c \bmod 2^{k - q} = 2^{k - q - 1} \\ double : & - 75 ⩽ q ⩽ - 2 & & c \bmod 2^{k - q} = 2^{k - q - 1} \end{matrix}$

(138)

When q is within the range of Equation (138), Equation (28) gives $r = 1$. Therefore,

$\begin{matrix} n_{r} = n \end{matrix}$

(139)

The following equation holds:

$\begin{matrix} 20 m + 20 n = c \cdot 2^{q + 1} \cdot 10^{- k} = c \cdot 2^{q - k + 1} \cdot 5^{- k} = \frac{c}{2^{k - q - 1}} \cdot 5^{- k} \end{matrix}$

(140)

Since $- k ⩾ 1$, $5^{- k}$ is an odd multiple of 5. Because $\frac{c}{2^{k - q - 1}}$ and $5^{- k}$ are both odd, their product $20 m + 20 n$ is an odd multiple of 5. As $20 m$ is even and $0 ⩽ 20 n < 20$, $20 n$ must itself be an odd multiple of 5. Therefore,

$\begin{matrix} 20 n \in \{5, 15\} \\ \Rightarrow n \in \{0.25, 0.75\} \\ \Rightarrow n_{r} \in \{0.25, 0.75\} \end{matrix}$

(141)

Because the tie is broken towards the even candidate, $one$ is whichever of $⌊ 10 n ⌋$ and $⌊ 10 n ⌋ + 1$ is even. Therefore,

$\begin{matrix} o n e = \{\begin{matrix} ⌊ 10 n ⌋ = 2, if n = 0.25 \\ ⌊ 10 n ⌋ + 1 = 8, if n = 0.75 \end{matrix} \Rightarrow o n e = ⌊ 20 n + 1 ⌋ / / 2 - (n = 0.25 ? 1 : 0) \end{matrix}$

(142)
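Equation (142) can be evaluated directly with exact rationals to confirm the two tie results. This is a minimal sketch for illustration; the function name `one_on_tie` is hypothetical, and the assertion inside it encodes the conclusion of Equation (141) that only $n = 0.25$ and $n = 0.75$ can occur.

```python
from fractions import Fraction
from math import floor


def one_on_tie(n: Fraction) -> int:
    """Equation (142): `one` when 10*n has a decimal part of exactly 1/2."""
    assert n in (Fraction(1, 4), Fraction(3, 4)), "only these ties can occur"
    correction = 1 if n == Fraction(1, 4) else 0
    return floor(20 * n + 1) // 2 - correction
```

Both results agree with round-half-to-even applied to $10 n$, which is also what Python's built-in `round` does for `Fraction` values.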

3.7.2. $10 n - ⌊ 10 n ⌋ \neq 0.5$

When the decimal part of $10 n$ is not 0.5, $10 n$ is rounded to the nearest integer. Therefore,

\begin{matrix} o n e = \{\begin{matrix} ⌊ 10 n ⌋, if 10 n - ⌊ 10 n ⌋ < 0.5 \\ ⌊ 10 n ⌋ + 1, if 10 n - ⌊ 10 n ⌋ > 0.5 \end{matrix} \Rightarrow o n e = ⌊ 10 n + 0.5 ⌋ = ⌊ 20 n + 1 ⌋ / / 2 \end{matrix}

(143)

Since $⌊ 20 n + 1 ⌋ = ⌊ 20 n ⌋ + 1$, it suffices to compute $⌊ 20 n ⌋$ exactly, and

\begin{matrix} d & = t e n + o n e \\ = 10 m + ⌊ 20 n + 1 ⌋ / / 2 \\ = (⌊ 20 m + 20 n ⌋ + 1) / / 2 \end{matrix}

(144)

Write the scaled value as

\begin{matrix} 20 m + 20 n = c \cdot 2^{q + 1} \cdot 10^{- k} = c \cdot 2^{q - k + 1} \cdot 5^{- k} = c \cdot \frac{x}{y} \end{matrix}

(145)

Let $n_{20}$ denote the decimal part of $20 n$.

When $y ⩽ c_{max} = C$, the decimal part satisfies

\begin{matrix} float : & \frac{1}{2^{24} - 1} = \frac{1}{C} ⩽ n_{20} ⩽ 1 - \frac{1}{C} = \frac{2^{24} - 2}{2^{24} - 1} \\ double : & \frac{1}{2^{53} - 1} = \frac{1}{C} ⩽ n_{20} ⩽ 1 - \frac{1}{C} = \frac{2^{53} - 2}{2^{53} - 1} \end{matrix}

(146)

When $y > c_{max} = C$, the decimal part satisfies the following bounds (the test file is test5.py, https://github.com/xjb714/xjb/blob/main/py_test/test5.py (accessed on 20 April 2026)):

\begin{matrix} float : & 2^{- 32} < n_{20} < 1 - 2^{- 30} \\ double : & 2^{- 64} < n_{20} < 1 - 2^{- 62} \end{matrix}

(147)

Since the bounds in Equation (146) lie inside those in Equation (147), $n_{20}$ satisfies Equation (147) in both cases. In the code implementation, only the high 36 bits of $n_{r}$ are retained for float, and only the high 70 bits for double. Let $n_{36}$ denote the discarded part for float and $n_{70}$ the discarded part for double. Then

\begin{matrix} float : & n_{36} \in [0, 2^{- 36}) \\ double : & n_{70} \in [0, 2^{- 70}) \end{matrix}

(148)

Consider the bounds of the following expression:

\begin{matrix} float : & F = 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} - n_{36}) \\ double : & F = 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} - n_{70}) \end{matrix}

(149)

Therefore, there is

\begin{matrix} float : F_{min} & > 20 \cdot (c \cdot 2^{q} \cdot 10^{- k - 1} - 2^{- 36}) \\ = 20 m + 20 n - 20 \cdot 2^{- 36} \\ F_{max} & < 20 \cdot (c \cdot 2^{q} \cdot (1 + 2^{- 63}) \cdot 10^{- k - 1} - 0) \\ < 20 m + 20 n + 20 \cdot 2^{- 63} \cdot c \\ < 20 m + ⌊ 20 n ⌋ + 1 \\ double : F_{min} & > 20 \cdot (c \cdot 2^{q} \cdot 10^{- k - 1} - 2^{- 70}) \\ = 20 m + 20 n - 20 \cdot 2^{- 70} \\ > 20 m + ⌊ 20 n ⌋ \\ F_{max} & < 20 \cdot (c \cdot 2^{q} \cdot (1 + 2^{- 127}) \cdot 10^{- k - 1} - 0) \\ < 20 m + 20 n + 20 \cdot 2^{- 127} \cdot c \\ < 20 m + ⌊ 20 n ⌋ + 1 \end{matrix}

(150)

Therefore, there is

\begin{matrix} float : & ⌊ F ⌋ = 20 m + ⌊ 20 n ⌋ \\ double : & ⌊ F ⌋ = 20 m + ⌊ 20 n ⌋ \end{matrix}

(151)

For float, the derivation above does not exclude $⌊ F_{min} ⌋ \neq 20 m + ⌊ 20 n ⌋$, so the proof is incomplete in that case. However, the float implementation has been tested exhaustively over all possible input values and produces correct results, so this gap does not affect correctness in practice. The calculation of d can therefore be simplified as follows:

\begin{matrix} d & = t e n + o n e \\ = (⌊ F ⌋ + 1) / / 2 \\ = (⌊ 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} - n_{x}) ⌋ + 1) / / 2 \end{matrix}

(152)

For float, $n_{x} = n_{36}$; for double, $n_{x} = n_{70}$.
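The step from Equation (143) to Equation (144), which folds $10 m$ into a single floor and an integer halving, can be checked with exact rationals. The sketch below is illustrative; `d_two_ways` is a hypothetical helper, and it works with the exact n (that is, it ignores the truncation terms $n_{36}$ and $n_{70}$).

```python
from fractions import Fraction
from math import floor


def d_two_ways(m: int, n: Fraction):
    """Return d computed per Equation (143) and per Equation (144)."""
    one = floor(10 * n + Fraction(1, 2))         # round 10*n to nearest
    lhs = 10 * m + one                           # d = ten + one
    rhs = (floor(20 * (m + n)) + 1) // 2         # single floor, then halve
    return lhs, rhs
```

For example, `d_two_ways(123, Fraction(44, 100))` gives 1234 by both routes.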

3.7.3. Efficient Implementation of $n = 0.25$ for Double

For double, the test $n = 0.25$ in Equation (142) can be performed quickly.

When $n = 0.25$, $⌊ 2^{64} \cdot n_{r} ⌋ = ⌊ 2^{64} \cdot n ⌋ = 2^{62}$. Therefore, the following condition identifies $n = 0.25$:

\begin{matrix} double : & n = 0.25 if ⌊ 2^{64} \cdot n_{r} ⌋ = 2^{62} \end{matrix}

(153)

When $n \neq 0.25$, consider the range of the decimal part of the following expression:

\begin{matrix} 4 m + 4 n = c \cdot 2^{q + 2} \cdot 10^{- k - 1} \end{matrix}

(154)

When Equation (154) is not an integer, we have the following (the test file is test6.py, https://github.com/xjb714/xjb/blob/main/py_test/test6.py (accessed on 20 April 2026)):

\begin{matrix} 2^{- 62} < 4 n - ⌊ 4 n ⌋ < 1 - 2^{- 62} \end{matrix}

(155)

Consider the two boundary cases in which $4 n$ is closest to 1:

\begin{matrix} ⌊ 4 n ⌋ = 0 \Rightarrow 4 n - 0 < 1 - 2^{- 62} \Rightarrow ⌊ 2^{64} \cdot n ⌋ ⩽ 2^{62} - 2 \\ ⌊ 4 n ⌋ = 1 \Rightarrow 4 n - 1 > 2^{- 62} \Rightarrow ⌊ 2^{64} \cdot n ⌋ ⩾ 2^{62} + 1 \end{matrix}

(156)

Because $⌊ 2^{64} \cdot n_{r} ⌋$ equals either $⌊ 2^{64} \cdot n ⌋$ or $⌊ 2^{64} \cdot n ⌋ + 1$, it follows that

\begin{matrix} ⌊ 2^{64} \cdot n ⌋ \neq 2^{62} & & ⌊ 2^{64} \cdot n ⌋ + 1 \neq 2^{62} \\ \Rightarrow ⌊ 2^{64} \cdot n_{r} ⌋ \neq 2^{62} \end{matrix}

(157)

Therefore, the following condition identifies $n \neq 0.25$:

\begin{matrix} double : & n \neq 0.25 if ⌊ 2^{64} \cdot n_{r} ⌋ \neq 2^{62} \end{matrix}

(158)

In summary, for double, $n = 0.25$ can be tested as follows:

\begin{matrix} double : & n = 0.25 if ⌊ 2^{64} \cdot n_{r} ⌋ = 2^{62} \\ double : & n \neq 0.25 if ⌊ 2^{64} \cdot n_{r} ⌋ \neq 2^{62} \end{matrix}

(159)

3.7.4. Efficient Calculation of $o n e$ for Double

For double, $one$ can be calculated in a faster way:

\begin{matrix} double : & o n e = ⌊ \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 + ((n = 0.25) ? 0 : (2^{- 1} + \frac{6}{2^{64}})) ⌋ \end{matrix}

(160)

The proof of Equation (160) is as follows:

When $n = 0.25$, $⌊ \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 ⌋ = ⌊ 10 n ⌋ = 2$.
When $n \neq 0.25$, Equation (160) is equivalent to the following:

\begin{matrix} double : & o n e = ⌊ \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 + 2^{- 1} + \frac{6}{2^{64}} ⌋ \end{matrix}

(161)

According to the range of $10 n - ⌊ 10 n ⌋$, $one$ is given by

\begin{matrix} double : & o n e = \{\begin{matrix} ⌊ 10 n ⌋, if 10 n - ⌊ 10 n ⌋ < 0.5 \\ 8, if 10 n - ⌊ 10 n ⌋ = 0.5 \\ ⌊ 10 n ⌋ + 1, if 10 n - ⌊ 10 n ⌋ > 0.5 \end{matrix} = ⌊ 20 n + 1 ⌋ / / 2 \end{matrix}

(162)

Therefore, when $n \neq 0.25$, we need to prove that the following equation holds:

\begin{matrix} ⌊ \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 + 2^{- 1} + \frac{6}{2^{64}} ⌋ = \{\begin{matrix} ⌊ 10 n ⌋, if 10 n - ⌊ 10 n ⌋ < 0.5 \\ 8, if 10 n - ⌊ 10 n ⌋ = 0.5 \\ ⌊ 10 n ⌋ + 1, if 10 n - ⌊ 10 n ⌋ > 0.5 \end{matrix} = ⌊ 20 n + 1 ⌋ / / 2 \end{matrix}

(163)

By the definition of the floor function,

\begin{matrix} \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \in (n_{r} - 2^{- 64}, n_{r}] \end{matrix}

(164)

By the definitions of n and $n_{r}$,

\begin{matrix} c \cdot 2^{q} \cdot 10^{- k - 1} & = m + n \\ c \cdot 2^{q} \cdot r \cdot 10^{- k - 1} & = m + n_{r} \end{matrix}

(165)

Since $0 ⩽ r - 1 < 2^{-127}$, $m + n < c$, and $c < 2^{53}$, it follows that

\begin{matrix} n_{r} - n & = (r - 1) \cdot c \cdot 2^{q} \cdot 10^{- k - 1} \\ n_{r} & = (r - 1) \cdot (m + n) + n \\ \Rightarrow n ⩽ n_{r} & < 2^{- 127} \cdot c + n \\ n ⩽ n_{r} & < 2^{- 127} \cdot 2^{53} + n \\ n ⩽ n_{r} & < 2^{- 74} + n \end{matrix}

(166)

From Equations (164) and (166), it can be concluded that

\begin{matrix} \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \in (n - 2^{- 64}, n + 2^{- 74}) \\ \Rightarrow \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 \in (10 n - 10 \cdot 2^{- 64}, 10 n + 10 \cdot 2^{- 74}) \\ \Rightarrow \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 20 \in (20 n - 20 \cdot 2^{- 64}, 20 n + 20 \cdot 2^{- 74}) \\ \Rightarrow \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 20 \in (⌊ 20 n ⌋ + n_{20} - 20 \cdot 2^{- 64}, ⌊ 20 n ⌋ + n_{20} + 20 \cdot 2^{- 74}) \end{matrix}

(167)

We now determine the values of x for which the following holds:

\begin{matrix} ⌊ \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 20 + 1 + x ⌋ / / 2 = ⌊ 20 n + 1 ⌋ / / 2 = o n e \end{matrix}

(168)

By Equation (167), Equation (168) holds if the following two sufficient conditions are met:

\begin{matrix} ⌊ 20 n ⌋ + n_{20} - 20 \cdot 2^{- 64} + 1 + x ⩾ ⌊ 20 n + 1 ⌋ & \Rightarrow x ⩾ 20 \cdot 2^{- 64} - n_{20} \\ ⌊ 20 n ⌋ + n_{20} + 20 \cdot 2^{- 74} + 1 + x < ⌊ 20 n + 2 ⌋ & \Rightarrow x < 1 - 20 \cdot 2^{- 74} - n_{20} \end{matrix}

(169)

Let $x = 12 \cdot 2^{-64}$. Using Equation (169), we can enumerate all floating-point numbers that violate the following condition:

\begin{matrix} x = 12 \cdot 2^{- 64} ⩾ 20 \cdot 2^{- 64} - n_{20} \end{matrix}

(170)

The floating-point numbers that violate Equation (170) are listed below (hexadecimal bit pattern, followed by the printed result that satisfies the SW principle):

\begin{matrix} 0 x 0 d 17 c 0747 bd 76 fa 1, & 1.3588129002659584 e - 245 \\ 0 x 0 d 27 c 0747 bd 76 fa 1, & 2.7176258005319167 e - 245 \\ 0 x 4 d 73 de 005 bd 620 df, & 1.3076622631878654 e + 65 \\ 0 x 4 d 83 de 005 bd 620 df, & 2.6153245263757307 e + 65 \\ 0 x 4 d 93 de 005 bd 620 df, & 5.230649052751461 e + 65 \end{matrix}

(171)

Similarly, using Equation (169), we can enumerate all floating-point numbers that violate the following condition:

\begin{matrix} x = 12 \cdot 2^{- 64} < 1 - 20 \cdot 2^{- 74} - n_{20} \end{matrix}

(172)

The floating-point numbers that violate Equation (172) are listed below (hexadecimal bit pattern, followed by the printed result that satisfies the SW principle):

\begin{matrix} 0 x 612491 daad 0 ba 280, & 9.03725590277404 e + 159 \\ 0 x 6159 b 651584 e 8 b 20, & 9.03725590277404 e + 160 \\ 0 x 619011 f 2 d 73116 f 4, & 9.03725590277404 e + 161 \\ 0 x 61 c 4166 f 8 cfd 5 cb 1, & 9.03725590277404 e + 162 \\ 0 x 61 d 4166 f 8 cfd 5 cb 1, & 1.807451180554808 e + 163 \end{matrix}

(173)

With this choice of x, we have

\begin{matrix} 2 (\frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 + 2^{- 1} + \frac{6}{2^{64}}) = \frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 20 + 1 + x \end{matrix}

(174)

For every floating-point number not listed in Equations (171) and (173), the conditions of Equation (169) are satisfied. For the listed numbers, we tested each one individually, and the implementation produced the correct result in every case, satisfying the SW principle. The test file is test8.py, https://github.com/xjb714/xjb/blob/main/py_test/test8.py (accessed on 20 April 2026).

In summary, Equations (163) and (160) hold, so Equation (160) can be used to calculate $one$ quickly.
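The integer form of Equation (160) can be compared against exact round-to-nearest-even on a grid of test values. The Python sketch below is a simplified illustration, not the authors' test: it assumes $r = 1$ (so $n_{r} = n$), the names `one_fast` and `one_reference` are hypothetical, and ties other than $n = 0.25$ and $n = 0.75$ should be excluded by the caller, since Equation (141) shows they cannot occur.

```python
from fractions import Fraction
from math import floor

TWO64 = 1 << 64


def one_fast(dot_one: int) -> int:
    """Integer form of Equation (160); dot_one = floor(2**64 * n_r)."""
    # n = 0.25 is detected through dot_one == 2**62, as in Equation (159).
    offset = 0 if dot_one == (1 << 62) else (1 << 63) + 6
    return (10 * dot_one + offset) >> 64


def one_reference(n: Fraction) -> int:
    """Round 10*n to the nearest integer, ties to even, using exact rationals."""
    return round(10 * n)
```

With $n = 0.75$, `one_fast(3 << 62)` evaluates to 8, and with $n = 0.25$, `one_fast(1 << 62)` evaluates to 2, matching Equation (142).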

3.8. Irregular Number

Irregular floating-point numbers are few: there are 2046 of them for double and 254 for float. The correctness of the implementation on these inputs can therefore be established exhaustively, and for every irregular value the output is exactly the same as that of the Schubfach algorithm. For this reason, their treatment is not derived in this article. In the code implementation, irregular values are handled in a separate branch; for details, please refer to the source code.

3.9. Implementation of Pseudocode

This subsection details the pseudocode implementation for converting regular floating-point numbers to decimal representation. The handling of irregular floating-point numbers is omitted here due to space constraints; interested readers may refer to the source code for the complete implementation.

3.9.1. Single-Precision Floating-Point Numbers

\begin{matrix} uint 32 v i & = & bit_copy_from_f 32 (v) \\ (uint 64 c, int 32 q) & = & extract (v i) \\ const int BIT & = & 36 \\ const uint 64 offset & = & (1 ULL ≪ (BIT - 2)) - 7 \\ int 32 k & = & (q \cdot 1233) ≫ 12 \\ int 32 h & = & q + ((1701 \cdot (- k - 1)) ≫ 9) \\ uint 64 pow 10 & = & get_pow 10 (- k - 1) \\ uint 64 c b & = & c ≪ (h + BIT + 1) \\ uint 64 hi 64 & = & umul_{hi}_{64 \times 64} (c b, pow 10) \\ bool even & = & (v i + 1) & 1 \\ uint 64 half & = & (pow 10 ≫ (65 - (h + BIT + 1))) + even \\ uint 32 shorter & = & (hi 64 + half) ≫ BIT \\ uint 64 bias & = & (hi 64 ≫ (BIT - 4)) & 15 \\ uint 32 longer & = & (5 \cdot hi 64 + offset + bias) ≫ (BIT - 1) \\ bool updown & = & shorter > ((hi 64 - half) ≫ BIT) \\ uint 32 d & = & updown ? shorter \cdot 10 : longer \end{matrix}

(175)

Pseudocode Equation (175) outlines the procedure for computing d and k for single-precision floating-point numbers.

Since c contains at most 24 significant bits and $pow10$ has 64 significant bits, their product contains at most 88 significant bits. Given that h lies in $[-4, -1]$, the value $cb = c ≪ (h + BIT + 1)$ occupies at most 60 bits and does not overflow, preserving all bits of c. With $e_{10} = -k - 1$, $pow10$ is computed using Equation (24). From Equation (25), we derive

\begin{matrix} pow 10 \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 63} = 10^{- k - 1} \cdot r_{1, - k - 1} \end{matrix}

(176)

From Equation (45),

\begin{matrix} m & = ⌊ v \cdot 10^{- k - 1} ⌋ \\ = ⌊ v \cdot 10^{- k - 1} \cdot r_{1, - k - 1} ⌋ \\ = ⌊ v \cdot pow 10 \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 63} ⌋ \\ = ⌊ c \cdot pow 10 \cdot 2^{q + ⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 63} ⌋ \\ = (c \cdot pow 10) ≫ (63 - q - ⌊ (- k - 1) \cdot {log}_{2} 10 ⌋) \\ = (c \cdot pow 10) ≫ (63 - h) \\ = ((c ≪ (37 + h)) \cdot pow 10) ≫ (36 + 64) \\ = hi 64 ≫ 36 \end{matrix}

(177)

Let $up$ indicate whether $one = 10$. From Equations (132) and (99),

\begin{matrix} bool up & = ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + even > 2^{36} - 1 - ⌊ 2^{36} \cdot n_{r} ⌋ \\ = ⌊ 2^{q + 35} \cdot 10^{- k - 1} ⌋ + even + ⌊ 2^{36} \cdot n_{r} ⌋ ⩾ 2^{36} \\ = (pow 10 ≫ (28 - h)) + even + ⌊ 2^{36} \cdot n_{r} ⌋ ⩾ 2^{36} \\ = (half + ⌊ 2^{36} \cdot n_{r} ⌋) ≫ 36 \end{matrix}

(178)

When $one \neq 10$, we have $d // 10 = m$; when $one = 10$, we have $d // 10 = m + 1$. Therefore,

\begin{matrix} d / / 10 = m + up \end{matrix}

(179)

The upper 28 bits of $hi64$ represent m, while the lower 36 bits represent $⌊ 2^{36} \cdot n_{r} ⌋$:

\begin{matrix} hi 64 = (m ≪ 36) + ⌊ 2^{36} \cdot n_{r} ⌋ \end{matrix}

(180)

Consequently,

\begin{matrix} shorter = d / / 10 = (hi 64 + half) ≫ 36 \end{matrix}

(181)

Let $updown$ indicate whether $one \in \{0, 10\}$, or equivalently whether the last digit of d is zero:

\begin{matrix} updown = (d mod 10 = 0) \end{matrix}

(182)

For brevity, let $dot\_one = ⌊ 2^{36} \cdot n_{r} ⌋$. From Equation (133),

\begin{matrix} o n e = 0 & : half > dot_one \\ o n e = 10 & : half > 2^{36} - 1 - dot_one \end{matrix}

(183)

The following equations hold:

\begin{matrix} o n e = 0 & : (hi 64 - half) ≫ 36 = m - 1 \\ o n e = 10 & : (hi 64 - half) ≫ 36 = m \end{matrix}

(184)

Therefore,

\begin{matrix} o n e = 0 & : shorter = m > (hi 64 - half) ≫ 36 = m - 1 \\ o n e = 10 & : shorter = m + 1 > (hi 64 - half) ≫ 36 = m \end{matrix}

(185)

Thus, the condition for $one \in \{0, 10\}$ is

\begin{matrix} shorter > (hi 64 - half) ≫ 36 \Rightarrow d mod 10 = 0 \end{matrix}

(186)

When $updown$ is true, $d = 10 \cdot shorter$. Otherwise, we compute d using Equation (152). When $10 n - ⌊ 10 n ⌋ \neq 0.5$,

\begin{matrix} d & = (⌊ 20 \cdot (v \cdot 10^{- k - 1} \cdot r - n_{36}) ⌋ + 1) / / 2 \\ = (⌊ 20 \cdot (hi 64 \cdot 2^{- 36}) ⌋ + 1) / / 2 \\ = (⌊ 5 \cdot hi 64 \cdot 2^{- 34} ⌋ + 1) / / 2 \\ = (⌊ (5 \cdot hi 64 + 2^{34}) \cdot 2^{- 34} ⌋) / / 2 \\ = ((5 \cdot hi 64 + 2^{34}) ≫ 34) / / 2 \\ = (5 \cdot hi 64 + 2^{34}) ≫ 35 \end{matrix}

(187)

Adding $2^{34}$ serves as a rounding operation. When $10 n - ⌊ 10 n ⌋ = 0.5$, Equation (142) requires checking whether $n = 0.25$. The derivation of $longer$ involves careful handling of edge cases; its correctness has been verified by exhaustive testing over all possible float inputs. As the pseudocode shows, the float path requires only a single 64-by-64-bit multiplication.
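Pseudocode Equation (175) can be transcribed almost line by line into Python, whose unbounded integers stand in for the 64-bit registers. The sketch below is a hedged reconstruction, not the authors' C/C++ code. It covers regular normal inputs only; the helper names are hypothetical; $k$ and $h$ are computed exactly with big integers instead of the multiply-shift constants; and $pow10$ is assumed to be the exact power of ten scaled to 64 bits and rounded up, which is how Equation (176) is read here.

```python
import struct
from fractions import Fraction
from math import ceil


def floor_log10_pow2(q):
    """Exact floor(q * log10(2)); the pseudocode uses (q * 1233) >> 12."""
    return len(str(1 << q)) - 1 if q >= 0 else -len(str(1 << -q))


def floor_log2_pow10(e):
    """Exact floor(e * log2(10)); the pseudocode uses (1701 * e) >> 9."""
    return (10 ** e).bit_length() - 1 if e >= 0 else -(10 ** -e).bit_length()


def f32_to_decimal(v):
    """Return (d, k) with v close to d * 10**k, following Equation (175)."""
    vi = struct.unpack("<I", struct.pack("<f", v))[0]
    frac, expo = vi & 0x7FFFFF, (vi >> 23) & 0xFF
    if expo in (0, 0xFF) or frac == 0:
        raise ValueError("sketch covers regular normal values only")
    c, q = frac | (1 << 23), expo - 150
    BIT = 36
    offset = (1 << (BIT - 2)) - 7
    k = floor_log10_pow2(q)
    e10 = -k - 1
    fl = floor_log2_pow10(e10)
    h = q + fl                                   # lies in [-4, -1]
    # Assumption: pow10 = ceil(10**e10 * 2**(63 - fl)), a 64-bit value.
    pow10 = ceil(Fraction(10) ** e10 * Fraction(2) ** (63 - fl))
    cb = c << (h + BIT + 1)
    hi64 = (cb * pow10) >> 64                    # high 64 bits of the product
    even = (vi + 1) & 1
    half = (pow10 >> (65 - (h + BIT + 1))) + even
    shorter = (hi64 + half) >> BIT               # m + up
    bias = (hi64 >> (BIT - 4)) & 15
    longer = (5 * hi64 + offset + bias) >> (BIT - 1)
    updown = shorter > ((hi64 - half) >> BIT)    # one is 0 or 10
    return (shorter * 10 if updown else longer), k
```

Under these assumptions, `f32_to_decimal(0.3)` returns `(30000000, -8)`, that is, 0.3 once trailing zeros are stripped, and `f32_to_decimal(16777215.0)` returns `(16777215, 0)`.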

3.9.2. Double-Precision Floating-Point Numbers

\begin{matrix} uint 64 v i & = & bit_copy_from_f 64 (v) \\ (uint 64 c, int q) & = & extract (v i) \\ const int BIT & = & 6 \\ int k & = & (q \cdot 78913) ≫ 18 \\ int h & = & q + ((217707 \cdot (- k - 1)) ≫ 16) \\ uint 128 pow 10 & = & get_pow 10 (- k - 1) \\ uint 64 c b & = & c ≪ (h + BIT + 1) \\ uint 128 hi 128 & = & umul_{hi}_{64 \times 128} (c b, pow 10) \\ bool even & = & (v i + 1) & 1 \\ uint 64 half & = & (pow 10 ≫ (64 - h)) + even \\ uint 64 dot_one & = & (uint 64) (hi 128 ≫ BIT) \\ uint 64 ten & = & 10 \cdot (hi 128 ≫ (BIT + 64)) \\ uint 64 offset_num & = & (dot_one = 2^{62}) ? 0 : 2^{63} + 6 \\ uint 64 o n e & = & ((uint 128) 10 \cdot dot_one + offset_num) ≫ 64 \\ o n e & = & (half > dot_one) ? 0 : o n e \\ o n e & = & (half > 2^{64} - 1 - dot_one) ? 10 : o n e \\ uint 64 d & = & ten + o n e \end{matrix}

(188)

Pseudocode Equation (188) presents the corresponding procedure for double-precision floating-point numbers; $pow10$ is computed using Equation (26), satisfying

\begin{matrix} pow 10 \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 127} = 10^{- k - 1} \cdot r_{1, - k - 1} \end{matrix}

(189)

From Equation (45),

\begin{matrix} m & = ⌊ v \cdot 10^{- k - 1} ⌋ \\ = ⌊ v \cdot 10^{- k - 1} \cdot r_{1, - k - 1} ⌋ \\ = ⌊ v \cdot pow 10 \cdot 2^{⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 127} ⌋ \\ = ⌊ c \cdot pow 10 \cdot 2^{q + ⌊ (- k - 1) \cdot {log}_{2} 10 ⌋ - 127} ⌋ \\ = (c \cdot pow 10) ≫ (127 - q - ⌊ (- k - 1) \cdot {log}_{2} 10 ⌋) \\ = (c \cdot pow 10) ≫ (127 - h) \\ = ((c ≪ (7 + h)) \cdot pow 10) ≫ (64 + 70) \\ = hi 128 ≫ 70 \end{matrix}

(190)

Given $h \in [-4, -1]$ and c having at most 53 significant bits, $cb$ does not overflow, ensuring accurate computation of $ten = 10 \cdot m$. By definition, $dot\_one = ⌊ 2^{64} \cdot n_{r} ⌋$. From Equation (99),

\begin{matrix} half & = (pow 10 ≫ (64 - h)) + even \\ = ⌊ 2^{63} \cdot 2^{q} \cdot 10^{- k - 1} ⌋ + even \end{matrix}

(191)

The value of $one$ is first computed using Equation (160), and then adjusted if the conditions for $one = 0$ or $one = 10$ are met:

\begin{matrix} o n e & = ⌊\frac{⌊ 2^{64} \cdot n_{r} ⌋}{2^{64}} \cdot 10 + ((n = 0.25) ? 0 : (2^{- 1} + \frac{6}{2^{64}}))⌋ \\ = (dot_one \cdot 10 + (dot_one = 2^{62} ? 0 : 2^{63} + 6)) ≫ 64 \end{matrix}

(192)

An equivalent formulation is

\begin{matrix} o n e = (dot_one = 2^{62}) ? 2 : (dot_one \cdot 10 + 2^{63} + 6) ≫ 64 \end{matrix}

(193)

The conditions $one = 0$ and $one = 10$ are determined by Equations (121) and (132), respectively. As the pseudocode shows, the double path requires only a single 64-by-128-bit multiplication.

In the C/C++ implementation, the only branch occurs when computing c and q for subnormal floating-point numbers (marked as unlikely). The conversion of normal floating-point numbers is branch-free, eliminating branch misprediction penalties. As shown in Pseudocode Equations (175) and (188), the core algorithm is remarkably concise. Irregular numbers are handled in a separate branch, a worthwhile trade-off given their rarity.
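Pseudocode Equation (188) can likewise be transcribed into Python for experimentation. As with any such transcription, this is a sketch under stated assumptions rather than the authors' implementation: it handles regular normal doubles only; the helper names are hypothetical; $k$ and $h$ are computed exactly with big integers rather than with the constants 78913 and 217707; and $pow10$ is assumed to be the exact power of ten scaled to 128 bits and rounded up, which is how Equation (189) is read here.

```python
import struct
from fractions import Fraction
from math import ceil


def floor_log10_pow2(q):
    """Exact floor(q * log10(2)); the pseudocode uses (q * 78913) >> 18."""
    return len(str(1 << q)) - 1 if q >= 0 else -len(str(1 << -q))


def floor_log2_pow10(e):
    """Exact floor(e * log2(10)); the pseudocode uses (217707 * e) >> 16."""
    return (10 ** e).bit_length() - 1 if e >= 0 else -(10 ** -e).bit_length()


def f64_to_decimal(v):
    """Return (d, k) with v close to d * 10**k, following Equation (188)."""
    vi = struct.unpack("<Q", struct.pack("<d", v))[0]
    frac, expo = vi & ((1 << 52) - 1), (vi >> 52) & 0x7FF
    if expo in (0, 0x7FF) or frac == 0:
        raise ValueError("sketch covers regular normal values only")
    c, q = frac | (1 << 52), expo - 1075
    BIT = 6
    k = floor_log10_pow2(q)
    e10 = -k - 1
    fl = floor_log2_pow10(e10)
    h = q + fl                                   # lies in [-4, -1]
    # Assumption: pow10 = ceil(10**e10 * 2**(127 - fl)), a 128-bit value.
    pow10 = ceil(Fraction(10) ** e10 * Fraction(2) ** (127 - fl))
    cb = c << (h + BIT + 1)
    hi128 = (cb * pow10) >> 64                   # high 128 bits of the product
    even = (vi + 1) & 1
    half = (pow10 >> (64 - h)) + even
    dot_one = (hi128 >> BIT) & ((1 << 64) - 1)   # floor(2**64 * n_r)
    ten = 10 * (hi128 >> (BIT + 64))             # 10 * m
    offset_num = 0 if dot_one == (1 << 62) else (1 << 63) + 6
    one = (10 * dot_one + offset_num) >> 64      # Equation (160)
    one = 0 if half > dot_one else one
    one = 10 if half > (1 << 64) - 1 - dot_one else one
    return ten + one, k
```

Under these assumptions, `f64_to_decimal(0.3)` returns `(30000000000000000, -17)` and `f64_to_decimal(0.1 + 0.2)` returns `(30000000000000004, -17)`; parsing `f"{d}e{k}"` back with `float` recovers the input.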

3.10. Decimal-to-String Conversion

The floating-point number printing algorithm consists of two main phases. While the first phase, which computes the decimal digits d and exponent k, has been discussed in previous sections, this section focuses on the second phase: converting the computed decimal representation into a string.

Floating-point numbers are typically printed in two formats: fixed-point notation and scientific notation. Table 4 illustrates examples of both formats.

Having computed d and k, we can now convert the floating-point number to a string based on these values. According to the Schubfach algorithm, d may contain trailing zeros, meaning that $d \bmod 10$ could be zero. To reduce instruction dependencies, we decompose d into two parts: $d // 10$ and $d \bmod 10$. Let $up$ indicate whether $one$ equals 10, and let $updown$ indicate whether $one$ is either 0 or 10. The following relationships hold:

\begin{matrix} d / / 10 & = m + u p \\ u p d o w n & = (o n e = 0 \lor o n e = 10) = (d mod 10 = 0) \\ d & = 10 \cdot (m + u p) + (u p d o w n ? 0 : o n e) \end{matrix}

(194)
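The relationships in Equation (194) can be illustrated with a few lines of Python. This is a minimal sketch with hypothetical helper names; it only demonstrates that the decomposition reproduces $d = 10 m + \mathit{one}$ for every $\mathit{one} \in [0, 10]$.

```python
def flags(one: int):
    """Return (up, updown) as defined for Equation (194)."""
    up = 1 if one == 10 else 0
    updown = 1 if one in (0, 10) else 0
    return up, updown

def compose_d(m: int, one: int) -> int:
    """Rebuild d from m and one without ever forming 10*m + one directly."""
    up, updown = flags(one)
    return 10 * (m + up) + (0 if updown else one)
```

For example, with m = 129 and one = 10 the carry moves into the upper part: m + up = 130 and d = 1300.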

By calculating the approximate range of m, we obtain

\begin{matrix} m & = ⌊ c \cdot 2^{q} \cdot 10^{- k - 1} ⌋ \\ c \cdot 2^{q} \cdot 10^{- k - 1} & \in [0.1 \cdot c, c) \\ \Rightarrow ⌊ 0.1 \cdot c_{min} ⌋ ⩽ ⌊ 0.1 \cdot c ⌋ ⩽ & m ⩽ ⌊ c ⌋ ⩽ ⌊ c_{max} ⌋ \end{matrix}

(195)

Based on the range of c, we derive

\begin{matrix} float : & \{\begin{matrix} normal : [\frac{2^{23} + 1}{10}, 2^{24} - 1] = [838860.9, 16777215] \\ subnormal : [\frac{1}{10}, 2^{23} - 1] = [0.1, 8388607] \end{matrix} \\ double : & \{\begin{matrix} normal : [\frac{2^{52} + 1}{10}, 2^{53} - 1] = [4.5 \times 10^{14}, 9 \times 10^{15}] \\ subnormal : [\frac{1}{10}, 2^{52} - 1] = [0.1, 4.5 \times 10^{15}] \end{matrix} \end{matrix}

(196)

Let $\mathit{len10}(x)$ denote the number of decimal digits in x. For $x > 0$,

\mathit{len10}(x) = ⌊ \log_{10} (x) ⌋ + 1

(197)

We define $\mathit{len10}(0) = 0$. For example, $\mathit{len10}(12345) = 5$. Consequently,

\begin{matrix} float : & \{\begin{matrix} normal : \mathit{len10}(m + \mathit{up}) \in [6, 8] \\ subnormal : \mathit{len10}(m + \mathit{up}) \in [0, 7] \end{matrix} \\ double : & \{\begin{matrix} normal : \mathit{len10}(m + \mathit{up}) \in [15, 16] \\ subnormal : \mathit{len10}(m + \mathit{up}) \in [0, 16] \end{matrix} \end{matrix}

(198)

Since $d // 10 = m + \mathit{up}$, we have $\mathit{len10}(d) = \mathit{len10}(m + \mathit{up}) + 1$. Let $\mathit{tz10}(x)$ denote the number of trailing zeros in the decimal representation of x; for example, $\mathit{tz10}(12300) = 2$. The number of significant digits $\mathit{sig10}(x)$ equals $\mathit{len10}(x) - \mathit{tz10}(x)$. When $\mathit{updown} = 0$, $\mathit{one} \neq 0$; thus, $d \bmod 10 = \mathit{one} \neq 0$ and $\mathit{tz10}(d) = 0$. Therefore,

\begin{matrix} \mathit{sig10}(d) & = \mathit{updown} \ ?\ \mathit{len10}(d) - \mathit{tz10}(d) : \mathit{len10}(d) \\ float : & \mathit{sig10}(d) \in [1, 9] \\ double : & \mathit{sig10}(d) \in [1, 17] \end{matrix}

(199)

For normal floating-point numbers, $\mathit{len10}(d)$ can be computed as follows:

\begin{matrix} float : & \mathit{len10}(d) = \mathit{len10}(m + \mathit{up}) + 1 = 9 - [m + \mathit{up} < 10^{7}] - [m + \mathit{up} < 10^{6}] \\ double : & \mathit{len10}(d) = \mathit{len10}(m + \mathit{up}) + 1 = 16 + [(m + \mathit{up}) ⩾ 10^{15}] \end{matrix}

(200)

where $[P]$ denotes the Iverson bracket, which evaluates to 1 if predicate P is true and 0 otherwise.
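The digit-count definitions and the comparison-based formulas of Equation (200) can be cross-checked as follows. This is a sketch with hypothetical function names; the string-based helpers are reference definitions for checking, not a suggestion of how a fast implementation would count digits.

```python
def len10(x: int) -> int:
    """Number of decimal digits, with len10(0) = 0 as in the text."""
    return 0 if x == 0 else len(str(x))

def tz10(x: int) -> int:
    """Number of trailing decimal zeros of a positive integer."""
    assert x > 0
    s = str(x)
    return len(s) - len(s.rstrip("0"))

def sig10(x: int) -> int:
    """Number of significant digits: len10(x) - tz10(x)."""
    return len10(x) - tz10(x)

def len10_d_float(m_up: int) -> int:
    """Equation (200), normal float: Iverson brackets become 0/1 integers."""
    return 9 - int(m_up < 10**7) - int(m_up < 10**6)

def len10_d_double(m_up: int) -> int:
    """Equation (200), normal double."""
    return 16 + int(m_up >= 10**15)

# Range endpoints taken from Equation (196), rounded down to integers.
FLOAT_RANGE = ((2**23 + 1) // 10, 2**24 - 1)
DOUBLE_RANGE = ((2**52 + 1) // 10, 2**53 - 1)
```

Because each bracket is a single comparison, the length is obtained with at most two compares and no loop or logarithm.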

Computing the ASCII representation of d reduces to computing the ASCII codes of $m + \mathit{up}$ and $\mathit{one}$. From the above, we have

\begin{matrix} float : & m + \mathit{up} < 10^{8} \\ double : & m + \mathit{up} < 10^{16} \end{matrix}

(201)

These bounds allow efficient computation of the ASCII code for $m + \mathit{up}$ using CPU SIMD instruction sets.

Scalar Implementation

We first present a scalar method that does not rely on SIMD instructions.

Let $x \in [0, 10^{8})$ and let $\mathit{dec\_to\_ascii8}(x)$ be a function that computes the ASCII representation of x. Algorithm 2 describes this process; it is an optimized version of the itoa algorithm written by Paul Khuong [19].

The function simultaneously computes $\mathit{tz10}(x)$ as the return value $\mathit{tz}$. Example: X = 12345600, Y = “12345600”, tz = 2.

For $x \in [0, 10^{16})$, let $\mathit{dec\_to\_ascii16}(x)$ compute the ASCII representation of x. Algorithm 3 describes this process. Example: X = 1234560012345600, Y = “1234560012345600”, tz = 2.

Handling Undefined Behavior

Since __builtin_ctzll(0) has undefined behavior in C, special handling is required. For single-precision floating-point numbers, when $m + \mathit{up} = 0$ (the six smallest subnormal numbers), $\mathit{updown} = 0$, which avoids the undefined case. For double-precision numbers, $m + \mathit{up} = 0$ only occurs for $5 \times 10^{- 324}$ (the smallest subnormal number). In our implementation, we handle this special case separately at the function entry.

Algorithm 2: Convert an 8-digit decimal number to ASCII: dec_to_ascii8(x)

Input: X (type: uint64, range: [0, 10⁸))
Output: Y (type: uint64), tz (type: uint32)
1: uint64 abcd_efgh = X + (0x100000000 - 10000) * ((X * 0x68db8bbULL) >> 40)
2: uint64 ab_cd_ef_gh = abcd_efgh + (0x10000 - 100) * (((abcd_efgh * 0x147b) >> 19) & 0x7f0000007f)
3: uint64 a_b_c_d_e_f_g_h = ab_cd_ef_gh + (0x100 - 10) * (((ab_cd_ef_gh * 0x67) >> 10) & 0xf000f000f000f)
4: uint32 tz = __builtin_ctzll(a_b_c_d_e_f_g_h) >> 3
5: uint64 BCD = CPU_IS_LITTLE_ENDIAN ? byteswap(a_b_c_d_e_f_g_h) : a_b_c_d_e_f_g_h
6: uint64 Y = BCD + 0x3030303030303030
7: return Y, tz

Algorithm 3: Convert a 16-digit decimal number to ASCII: dec_to_ascii16(x)

Input: X (type: uint64, range: [0, 10¹⁶))
Output: Y (type: uint128), tz (type: uint32)
1: uint64 abcdefgh = X // $10^{8}$
2: uint64 ijklmnop = X mod $10^{8}$
3: (uint64 abcdefgh_ascii, uint32 tz1) = dec_to_ascii8(abcdefgh)
4: (uint64 ijklmnop_ascii, uint32 tz2) = dec_to_ascii8(ijklmnop)
5: uint128 Y = CPU_IS_LITTLE_ENDIAN ? (ijklmnop_ascii << 64) + abcdefgh_ascii : (abcdefgh_ascii << 64) + ijklmnop_ascii
6: uint32 tz = (ijklmnop == 0) ? 8 + tz1 : tz2
7: return Y, tz
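Algorithms 2 and 3 can be traced step by step in Python, which makes the lane layout easy to inspect. The port below is a sketch rather than the authors' C code: the helpers byteswap64 and as_text are hypothetical, endianness is passed as a parameter instead of being detected, and the zero input (undefined for __builtin_ctzll in C) is assigned tz = 8 here as an assumption of this sketch.

```python
ASCII_ZEROS = 0x3030303030303030

def byteswap64(v: int) -> int:
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes(v.to_bytes(8, "little"), "big")

def dec_to_ascii8(x: int, little_endian: bool = True):
    """Algorithm 2: split x < 10**8 into eight byte-wide digits, then add '0'."""
    assert 0 <= x < 10**8
    # Two 32-bit lanes: abcd (high) and efgh (low).
    abcd_efgh = x + (0x100000000 - 10000) * ((x * 0x68DB8BB) >> 40)
    # Four 16-bit lanes: ab, cd, ef, gh.
    ab_cd_ef_gh = abcd_efgh + (0x10000 - 100) * (
        ((abcd_efgh * 0x147B) >> 19) & 0x7F0000007F)
    # Eight 8-bit lanes, digit a in the most significant byte.
    digits = ab_cd_ef_gh + (0x100 - 10) * (
        ((ab_cd_ef_gh * 0x67) >> 10) & 0x000F000F000F000F)
    # Zero low bytes are trailing decimal zeros; x == 0 is handled explicitly.
    tz = 8 if digits == 0 else ((digits & -digits).bit_length() - 1) >> 3
    bcd = byteswap64(digits) if little_endian else digits
    return bcd + ASCII_ZEROS, tz

def dec_to_ascii16(x: int, little_endian: bool = True):
    """Algorithm 3: two 8-digit halves combined into a 128-bit value."""
    assert 0 <= x < 10**16
    high, low = divmod(x, 10**8)
    high_ascii, tz1 = dec_to_ascii8(high, little_endian)
    low_ascii, tz2 = dec_to_ascii8(low, little_endian)
    if little_endian:
        y = (low_ascii << 64) + high_ascii
    else:
        y = (high_ascii << 64) + low_ascii
    tz = 8 + tz1 if low == 0 else tz2
    return y, tz

def as_text(y: int, width: int, little_endian: bool = True) -> str:
    """Bytes as they would appear in memory after storing y."""
    return y.to_bytes(width, "little" if little_endian else "big").decode("ascii")
```

The multipliers 0x68DB8BB, 0x147B, and 0x67 are the rounded-up values of $2^{40}/10^{4}$, $2^{19}/10^{2}$, and $2^{10}/10$, so each multiply-and-shift performs an exact division within its input range.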

SIMD Implementation

The SIMD implementations of $\mathit{dec\_to\_ascii8}$ and $\mathit{dec\_to\_ascii16}$ follow the same principles as the scalar version. For the ARM64 architecture, we use the NEON instruction set; for the x86-64 architecture, we support three variants: AVX512, SSE4.1, and SSE2. Table 5 summarizes these implementations. Owing to space limitations, the SIMD implementations are not reproduced in full here; readers are referred to the source code for details.

Using the methods above, we compute the ASCII representation of $m + \mathit{up}$ and the number of significant digits. Printing d is equivalent to printing $m + \mathit{up}$ and $\mathit{one}$. Based on $\mathit{sig10}(d)$, we determine which buffer contents to retain. In our implementation, we adopt different formats for floating-point numbers in different ranges to enhance readability, as shown in Table 6.

In scientific notation, the result includes an exponent. For example, in “1.2 × 10⁻²”, the factor “10⁻²” carries the exponent −2. Converting $d \cdot 10^{k}$ to standard decimal scientific notation yields

d \cdot 10^{k} = (d \cdot 10^{- ⌊ \log_{10} (d) ⌋}) \cdot 10^{k + ⌊ \log_{10} (d) ⌋}

(202)

Since $d \cdot 10^{- ⌊ \log_{10} (d) ⌋} \in [1, 10)$, the exponent is determined by $k + ⌊ \log_{10} (d) ⌋$. Let $E_{10} = k + ⌊ \log_{10} (d) ⌋$. Then,

E_{10} = k + ⌊ \log_{10} (d) ⌋ = k + ⌊ \log_{10} (m + \mathit{up}) ⌋ + 1

(203)

For single-precision floating-point numbers, fixed-point notation is used when $E_{10} \in [- 3, 6]$. For double-precision numbers, fixed-point notation is used when $E_{10} \in [- 4, 15]$.
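The exponent computation and the notation choice can be written down compactly. The sketch below uses hypothetical function names and relies on the identity $⌊\log_{10}(m + \mathit{up})⌋ + 1 = \mathit{len10}(m + \mathit{up})$ for $m + \mathit{up} ⩾ 1$; the case $m + \mathit{up} = 0$ (where d is a single digit) is covered by the convention $\mathit{len10}(0) = 0$.

```python
def len10(x: int) -> int:
    """Number of decimal digits, with len10(0) = 0."""
    return 0 if x == 0 else len(str(x))

def e10(m_plus_up: int, k: int) -> int:
    """Equation (203): decimal exponent of d * 10**k in scientific notation."""
    return k + len10(m_plus_up)

def uses_fixed_notation(exponent: int, is_double: bool) -> bool:
    """Fixed-point ranges stated in the text: [-3, 6] for float, [-4, 15] for double."""
    low, high = (-4, 15) if is_double else (-3, 6)
    return low <= exponent <= high
```

For example, d = 12 with k = -3 (the value 0.012) has m + up = 1, so the exponent is -3 + 1 = -2, which falls in the fixed-point range for both types.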

The floating-point number printing algorithm proposed in this paper is illustrated as Algorithm 4. The key distinction between our algorithm and others is the use of SIMD instruction sets for converting $m + \mathit{up}$ to ASCII values. For single-precision numbers, we use $\mathit{dec\_to\_ascii8}$; for double-precision, we use $\mathit{dec\_to\_ascii16}$. The detailed implementation is available in the source code.

Algorithm 4: Floating-point number printing algorithm

Input: v (type: float/double), buffer (type: char*)
Output: buffer (type: char*)
1: compute $m + \mathit{up}$, $\mathit{updown}$, $\mathit{one}$, k from v
2: compute $\mathit{sig10}(d) = \mathit{updown} \ ?\ \mathit{len10}(m + \mathit{up}) - \mathit{tz10}(m + \mathit{up}) : \mathit{len10}(m + \mathit{up}) + 1$
3: compute $E_{10} = k + ⌊ \log_{10} (m + \mathit{up}) ⌋ + 1$
4: convert $m + \mathit{up}$, $\mathit{one}$ to ASCII codes and store in buffer
5: based on $\mathit{sig10}(d)$ and $E_{10}$, determine which buffer contents to retain
6: if result is in scientific notation, print $E_{10}$
7: return buffer
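Steps 2 to 6 of Algorithm 4 can be illustrated for the scientific-notation case. This is a simplified sketch, not the paper's buffer-based C implementation: the function names and the output style ("1.2e-2") are assumptions, string operations replace the SIMD digit conversion, and fixed-point output is omitted.

```python
def retained_digits(m_plus_up: int, one: int) -> str:
    """Digits of d that survive trimming; the length equals sig10(d) from step 2."""
    updown = one in (0, 10)
    head = str(m_plus_up) if m_plus_up else ""
    if updown:
        # d = 10 * (m + up): the last digit is zero, so only the head matters.
        assert m_plus_up > 0, "d must be positive"
        return head.rstrip("0")
    return head + str(one)

def to_scientific(m_plus_up: int, one: int, k: int) -> str:
    """Format d * 10**k, given m + up (carry already applied), one, and k."""
    digits = retained_digits(m_plus_up, one)
    exponent = k + (len(str(m_plus_up)) if m_plus_up else 0)  # Equation (203)
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    return f"{mantissa}e{exponent}"
```

Note that d itself is never formed: the head digits come from m + up, and the final digit comes from one only when updown is false.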

In our C implementation, all branches are designed as unlikely branches to minimize the impact of branch prediction failures. By transforming the computation of d into computing $m + \mathit{up}$ and $\mathit{one}$, we reduce instruction dependencies and enhance instruction-level parallelism. Unlike other algorithms that compute the exact value of d before printing, we avoid computing d directly and instead quickly compute $d // 10 = m + \mathit{up}$.

3.11. Summary

This section explains how to quickly calculate d and k, as well as the printing optimization. Since $d = 10 m + \mathit{one}$, the process of calculating d is transformed into calculating m and $\mathit{one}$.

Section 3.5 introduces the method for quickly calculating m.
Section 3.6 and Section 3.7 introduce the methods for quick calculation of $\mathit{one}$.
Section 3.9 provides a detailed description of the pseudocode implementation.
Section 3.10 discusses print optimization.

4. Experimental Evaluation

This section presents a comprehensive experimental evaluation of the xjb algorithm, comparing its performance against state-of-the-art floating-point–string conversion algorithms across multiple hardware platforms and compilers.

4.1. Correctness Verification

Prior to performance evaluation, we conducted rigorous correctness verification to ensure that xjb fully complies with the Steele–White (SW) principle and produces accurate results. The verification process covered two key aspects: floating-point–decimal conversion, and floating-point–string conversion.

Single-Precision (binary32): Given the manageable size of the binary32 search space ( $2^{32}$ possible values), we performed exhaustive testing across the entire range. Each output was compared against the reference Schubfach algorithm to ensure identical results, guaranteeing complete correctness for the binary32 format.
Double-Precision (binary64): Exhaustive testing of all $2^{64}$ binary64 values is computationally infeasible. Instead, we employed a comprehensive testing strategy that included the following:
– Large-scale random testing with statistically significant sample sizes.
– Targeted testing of edge cases including subnormal numbers, extreme exponents, and near-power-of-two values.

All test results confirmed that xjb produces outputs identical to the Schubfach algorithm while fully satisfying the SW principle. The complete verification test suite is publicly available at (check.cpp) https://github.com/xjb714/xjb/blob/main/bench/check.cpp (accessed on 20 April 2026).
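The kind of per-value check involved can be sketched in a few lines. The example below is not the authors' check.cpp: it uses Python's repr (which produces a shortest round-tripping string for binary64) as a stand-in reference instead of Schubfach, and the function names are hypothetical. It tests two of the Steele–White requirements, round-tripping and minimal digit count.

```python
from decimal import Decimal

def significant_digits(text: str) -> int:
    """Count significant decimal digits, ignoring trailing zeros and the exponent."""
    return len(Decimal(text).normalize().as_tuple().digits)

def acceptable(text: str, value: float) -> bool:
    """True if text parses back to value and is as short as the reference output."""
    round_trips = float(text) == value
    shortest = significant_digits(text) == significant_digits(repr(value))
    return round_trips and shortest
```

A candidate such as "0.10000000000000001" parses back to the same double as 0.1 but is rejected because it uses 17 significant digits where 1 suffices.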

4.2. Experimental Setup

4.2.1. Hardware Platforms

To evaluate cross-platform performance and portability, we conducted benchmarks on three representative hardware platforms spanning both x86-64 and ARM64 architectures:

AMD R7-7840H: A modern high-performance x86-64 processor with support for AVX2 and AVX-512 instruction sets, running Ubuntu 26.04. This platform represents state-of-the-art x86-64 computing (max frequency: 5.1 GHz).
Apple M1: A first-generation Apple Silicon ARM64 processor with NEON SIMD support, running macOS 26.4. This platform serves as a baseline for ARM64 performance (max frequency: 3.2 GHz).
Apple M5: A recent-generation Apple Silicon ARM64 processor with NEON SIMD support, running macOS 26.4. This platform represents the latest ARM64 technology (max frequency: 4.46 GHz).

4.2.2. Compilers and Compilation Flags

Each platform uses its native compiler toolchain to ensure optimal code generation:

AMD R7-7840H: Intel C++ Compiler (icpx) version 2025.0.4.
Apple M1/M5: Apple Clang version 21.0.0.

All benchmarks were compiled with -O3 -march=native to enable maximum compiler optimizations and generate architecture-specific instructions, ensuring fair comparison across platforms.

4.2.3. Benchmark Methodology

Our benchmark methodology was designed to ensure fairness, reproducibility, and statistical significance:

Input Generation: Generate $2^{24}$ (16,777,216) random floating-point numbers, excluding special values (NaN and infinity) to focus on the core conversion logic.
Warm-Up Phase: Execute the benchmark multiple times before measurement to eliminate cold-start effects and ensure consistent cache behavior.
Measurement: Measure the total wall-clock time required to convert all numbers through multiple iterations.
Analysis: Calculate the average conversion time per floating-point number, discarding outliers to ensure robust results.

This methodology minimizes system noise and provides reliable, reproducible performance measurements while avoiding skewed results from special-case handling.
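One plausible way to produce such inputs is to draw random bit patterns and reject those whose exponent field is all ones. The text does not state how the random values are drawn, so the uniform-bit-pattern choice, the seed parameter, and the function names below are assumptions of this sketch.

```python
import random
import struct

def random_finite_doubles(count: int, seed: int = 0):
    """Random binary64 values from uniform 64-bit patterns, skipping NaN and infinity."""
    rng = random.Random(seed)
    values = []
    while len(values) < count:
        bits = rng.getrandbits(64)
        if (bits >> 52) & 0x7FF == 0x7FF:  # exponent all ones: NaN or infinity
            continue
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
    return values

def random_finite_floats(count: int, seed: int = 0):
    """Random binary32 values (returned as Python floats), skipping NaN and infinity."""
    rng = random.Random(seed)
    values = []
    while len(values) < count:
        bits = rng.getrandbits(32)
        if (bits >> 23) & 0xFF == 0xFF:  # exponent all ones: NaN or infinity
            continue
        values.append(struct.unpack("<f", struct.pack("<I", bits))[0])
    return values
```

Sampling bit patterns rather than a numeric interval spreads the inputs over all exponents, so both very small and very large magnitudes are exercised.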

4.3. Algorithms Compared

We compared the xjb algorithm with the other algorithms listed in Table 7. Some of them support only a subset of the conversions or data types, as noted below.

For floating-point numbers to decimal/string:

teju_jagua: Only implements float/double-to-decimal conversion.
jnum: Only implements double-to-string conversion. For the float-to-string comparison, we convert the float value to a double before calling it. Strictly speaking, the jnum algorithm does not satisfy the SW principle; we nevertheless include it in the benchmark because of its strong performance.

For data type:

yy_double, uscalec: Only the double data type is supported.

4.4. Performance Results

We evaluated xjb through two primary conversion interfaces: floating-point–decimal and floating-point–string.

Float/Double-to-Decimal Conversion: Table 8 summarizes the benchmark results for float-to-decimal and double-to-decimal conversions across the AMD R7-7840H and Apple M1/M5 platforms. All benchmarks use random values, excluding NaN and infinity, to focus on core conversion performance.
Float/Double-to-String Conversion: Figure 1 and Figure 2 present comprehensive benchmark results for double-to-string and float-to-string conversions on the three processor platforms.
Specifically,
– Figure 1a,c,e show results for completely random double-precision floating-point numbers.
– Figure 2a–c show results for completely random single-precision floating-point numbers.
– Figure 1b,d,f present results for fixed-length significant digits (ranging from 1 to 17 digits).
In Figure 1 and Figure 2, the suffixes denote specific implementations:
– _comp (e.g., fmt_comp, dragonbox_comp, xjb32_comp, xjb64_comp): Versions using compressed constant tables for reduced memory footprint.
– _full (e.g., fmt_full, dragonbox_full): Versions using uncompressed constant tables for potentially faster access.
– null: An empty function used to isolate and measure the overhead of function calls.

4.5. Analysis and Discussion

4.5.1. Performance Comparison

The benchmark results demonstrate that xjb achieves state-of-the-art performance across all three hardware platforms and both conversion interfaces. For decimal conversion, we compared it against the baseline Schubfach algorithm; for string conversion, we focused on comparisons with zmij, a representative modern high-performance algorithm.

On AMD R7-7840H (x86-64):

Float-to-decimal: 2.24 ns, 5.44× faster than Schubfach (12.2 ns);
Double-to-decimal: 3.76 ns, 3.06× faster than Schubfach (11.51 ns);
Float-to-string: 33.72 cycles, 70% faster than zmij (57.06 cycles);
Double-to-string: 43.41 cycles, 13% faster than zmij (49.28 cycles).

On Apple M1 (ARM64):

Float-to-decimal: 2.15 ns, 5.41× faster than Schubfach (11.64 ns);
Double-to-decimal: 2.58 ns, 5.08× faster than Schubfach (13.12 ns);
Float-to-string: 16.98 cycles, 117% faster than zmij (36.87 cycles);
Double-to-string: 20.77 cycles, 34% faster than zmij (27.74 cycles).

On Apple M5 (ARM64):

Float-to-decimal: 1.44 ns, 5.27× faster than Schubfach (7.59 ns);
Double-to-decimal: 1.55 ns, 4.97× faster than Schubfach (7.71 ns);
Float-to-string: 13.87 cycles, 136% faster than zmij (32.77 cycles);
Double-to-string: 17.09 cycles, 20% faster than zmij (20.74 cycles).

These results establish xjb as the new performance leader in floating-point–string conversion.

4.5.2. Performance Consistency

A defining characteristic of xjb is its consistent performance across diverse input distributions. By designing all conditional branches as unlikely branches, xjb achieves near-optimal branch prediction rates regardless of input patterns. This property is particularly valuable in real-world applications, where input distributions can be unpredictable.

This design addresses a key limitation of algorithms like Dragonbox [10], which trade branch efficiency for reduced multiplication operations. In contrast, xjb simultaneously achieves both minimal multiplication operations and efficient branch handling, combining the strengths of Schubfach [8] and Dragonbox [10] while avoiding their respective trade-offs.

4.5.3. Cross-Platform Performance

The benchmark results demonstrate xjb’s strong portability across processor architectures:

x86-64 (AMD R7-7840H): The algorithm benefits from the compiler’s ability to generate highly optimized code for arithmetic operations and SIMD instructions.
ARM64 (Apple M1/M5): xjb maintains consistent performance advantages across both generations of Apple Silicon, demonstrating the algorithm’s robustness and effectiveness across different instruction set architectures.

This cross-platform consistency is achieved through careful algorithm design that works harmoniously with compiler optimizations on diverse platforms.

4.5.4. Comparison with Some Related Algorithms

Placing xjb in the context of prior work reveals several key insights:

vs. Schubfach: With 3–5× speedup over the baseline, xjb validates that Schubfach’s elegant mathematical framework can be substantially optimized through computational restructuring, branch optimization, and SIMD utilization.
vs. yy_json/yy_double: While these algorithms represent excellent engineering for JSON serialization, xjb outperforms them by 2–3×, demonstrating that SIMD instruction set utilization unlocks significant additional optimization potential.
vs. zmij: Achieving 1.2–2.3× speedup over zmij highlights the benefits of xjb’s approach to instruction dependency reduction, which enables better instruction-level parallelism on modern superscalar processors.
vs. Ryū and Dragonbox: xjb outperforms these established algorithms by 2.5–6×, demonstrating that systematic optimization of multiple bottlenecks simultaneously yields substantial performance gains over approaches that focus on single aspects of the problem.

4.5.5. Fixed-Length Performance Analysis

The fixed-length benchmark results (Figure 1b,d,f) reveal important performance characteristics:

Consistent Performance: All xjb variants (xjb32, xjb32_comp, xjb64, xjb64_comp) maintain consistent performance across all digit lengths (1–17 digits). In contrast, some competing algorithms show significant performance variations depending on the number of output digits.
Predictable Latency: The branch-free core design ensures that the conversion time remains relatively constant regardless of output length. This predictability is a valuable property for real-time systems and high-throughput applications, where consistent latency is as important as raw performance.
Compression Trade-Off: The compressed-table variants (_comp) remain competitive with the other algorithms while using less memory for constant tables, which shows that the reduced footprint costs little performance.

4.6. Summary

This comprehensive experimental evaluation establishes xjb as the new state-of-the-art method in floating-point–string conversion across multiple hardware platforms. The key findings can be summarized as follows:

Superior Performance: xjb consistently outperforms all competing algorithms across both x86-64 (AMD R7-7840H) and ARM64 (Apple M1/M5) architectures.
Significant Speedups: Achieves 3–5× improvement over the baseline Schubfach algorithm.
Performance Consistency: Maintains stable performance across diverse input distributions and output lengths due to effective branch prediction optimization and branch-free core design.

These results validate the effectiveness of our holistic optimization approach: minimizing multiplication operations, reducing instruction dependencies, optimizing branch patterns, and leveraging SIMD instructions—working synergistically to deliver exceptional performance on modern processor architectures.

5. Conclusions

This paper presented xjb, a novel high-performance algorithm for converting IEEE 754 floating-point numbers to decimal string representations. Building upon the Schubfach algorithm, xjb introduces several key optimizations that significantly improve conversion speed while maintaining full compliance with the Steele–White principle for accurate and minimal-length output.

5.1. Improvements to the Schubfach Algorithm

The xjb algorithm represents a significant advancement over the baseline Schubfach algorithm through several targeted improvements:

Restructured Computation Flow: By decomposing the significand calculation into integer and fractional parts, xjb minimizes instruction dependencies, enabling better instruction-level parallelism and improved pipeline utilization on modern superscalar processors.
Minimized Multiplication Operations: xjb reduces the number of expensive high-precision multiplications required during conversion. For IEEE 754 binary64, only one 64-bit by 128-bit multiplication is needed, and for binary32, only one 64-bit by 64-bit multiplication is required, significantly decreasing computational overhead.
Branch Optimization: The algorithm employs branchless programming techniques for core conversion logic and structures remaining branches as unlikely paths, enabling efficient branch prediction on modern processors and resulting in consistent performance across diverse input distributions.
SIMD Instruction Utilization: Unlike Schubfach and many other existing algorithms, xjb leverages SIMD instructions (NEON for ARM64, AVX512/SSE4.1/SSE2 for x86-64) for efficient decimal-to-ASCII conversion, fully exploiting the vector processing capabilities of contemporary processors.

5.2. Key Findings

Our extensive experimental evaluation across AMD R7-7840H (x86-64) and Apple M1/M5 (ARM64) platforms reveals several key findings that advance the state of the art in floating-point–string conversion:

Significant Performance Improvement: xjb achieves a remarkable 3–5× speedup over the baseline Schubfach algorithm, representing a substantial leap in performance compared to prior work. On Apple M5, xjb achieves an impressive 1.44 ns for float-to-decimal conversion and 1.55 ns for double-to-decimal conversion, setting a new performance benchmark in the field.
Superior to State-of-the-Art Algorithms: xjb consistently outperforms other high-performance algorithms, including yy_json, yy_double, and zmij, by margins of 1.2–3×. This indicates the performance improvement achieved by using the SIMD instruction set and reducing instruction dependencies. On the Apple M5 processor, compared with zmij, our algorithm achieves approximately 20% and 136% speedups for double-to-string and float-to-string conversion, respectively.
Consistent Performance across Platforms: Unlike prior work, which often shows significant performance variations between architectures, xjb maintains its performance advantage across both x86-64 and ARM64. This portability is achieved through careful algorithm design that works well with compiler optimizations on different platforms.
Stable Performance across Input Distributions: The algorithm maintains consistent performance regardless of input patterns and output digit lengths. This stability can be attributed to our branch-free core design and effective branch prediction optimization for all conditional branches, making xjb ideal for applications requiring predictable performance.
Synergistic Optimization Effects: The combination of instruction dependency reduction, multiplication minimization, branch optimization, and SIMD utilization works synergistically to deliver performance gains that exceed what any single optimization could achieve in isolation.
Concise Core Implementation: The core conversion logic of xjb is implemented in a concise manner, with minimal code lines and clear logic flow. This design simplifies maintenance and allows for easy integration into larger software systems.

These key findings collectively demonstrate that xjb successfully addresses the research gap identified in the Introduction, providing a comprehensive solution to the limitations of existing floating-point–string conversion algorithms.

5.3. Practical Implications

The xjb algorithm has immediate practical applications in numerous domains:

Data Serialization: JSON and other text-based data formats require efficient floating-point–string conversion for serialization operations.
Scientific Computing: Applications that output numerical results in a human-readable format benefit from faster conversion without sacrificing accuracy.
Database Systems: Export operations and query result formatting can leverage xjb for improved throughput.
Web Services: RESTful APIs and web applications that return numerical data can achieve lower latency with efficient conversion.

5.4. Limitations and Future Work

While xjb demonstrates strong performance, several areas warrant further investigation:

Additional Format Support: Future work could extend xjb to other floating-point formats (e.g., 16-bit, 80-bit, 128-bit, and 256-bit floating-point numbers) for applications requiring lower or higher precision.
Batch Vectorization: xjb currently uses SIMD instructions for decimal-to-ASCII conversion of a single value; explicitly vectorizing the conversion of several values at once using AVX-512 or NEON could yield additional performance gains for batch conversion workloads.
Compiler Compatibility: Further optimization for different compilers (particularly MSVC) would improve portability across development environments.
Memory-Constrained Environments: Investigating memory-efficient variants of xjb could benefit embedded systems and other resource-constrained platforms.

5.5. Availability

The complete implementation of the xjb algorithm, along with benchmark tools and test suites, is publicly available at https://github.com/xjb714/xjb/releases/tag/v1.5.0 (accessed on 20 April 2026). We encourage the community to integrate, test, and contribute to the ongoing development of this work.

Author Contributions

Methodology, J.X.; Software, J.X.; Validation, J.X.; Writing—original draft, J.X.; Funding acquisition, T.W.; Writing—review & editing, T.W.; Visualization, T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Sichuan Science and Technology Program (2024ZDZX0001).

Data Availability Statement

All the source code files in our GitHub repository at https://github.com/xjb714/xjb (accessed on 20 April 2026) are freely accessible.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Mathematical Foundations of Fractional Part Boundary

In this Appendix, we collect several elementary facts concerning rational approximation and Farey sequences. These results are used in the main text to bound fractional parts and to compute best rational approximations efficiently.

Appendix A.1. Notation and Assumptions

Let n, P, Q, and $n_{max}$ be positive integers satisfying the following conditions:

P and Q are coprime and $P < Q$ ;
$1 ⩽ n ⩽ n_{max}$ ;
$Q > n_{max}$ .

For a given maximal denominator $n_{max}$, we denote by $\frac{P_{*}}{Q_{*}}$ and $\frac{P^{*}}{Q^{*}}$ the best rational approximations of $P / Q$ from below and from above, respectively, i.e.,

\frac{P_{*}}{Q_{*}} = \max_{\begin{matrix} 1 ⩽ n ⩽ n_{max} \\ Q_{*} ⩽ n_{max} \end{matrix}} \frac{⌊ n P / Q ⌋}{n}, \quad \frac{P^{*}}{Q^{*}} = \min_{\begin{matrix} 1 ⩽ n ⩽ n_{max} \\ Q^{*} ⩽ n_{max} \end{matrix}} \frac{⌈ n P / Q ⌉}{n} .

Appendix A.2. Basic Identities

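If $n P$ is not a multiple of Q, then clearly

⌊ n \cdot \frac{P}{Q} ⌋ + 1 = ⌈ n \cdot \frac{P}{Q} ⌉ .

When $n P$ is a multiple of Q, the left-hand side is one larger than the right-hand side; the non-divisibility assumption avoids this degenerate case. Under the assumptions of Appendix A.1, the degenerate case cannot occur: since P and Q are coprime, Q divides $n P$ only if Q divides n, which is impossible for $1 ⩽ n ⩽ n_{max} < Q$.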

Appendix A.3. A Useful Equivalence

Assume that, for a real number $ξ$, the equality

⌊ n \cdot \frac{P}{Q} ⌋ = ⌊ n ξ ⌋

holds for all $1 ⩽ n ⩽ n_{max}$. Then, $ξ$ must lie between the best lower and upper rational approximations of $P / Q$ with the denominator bounded by $n_{max}$. More precisely,

\frac{P_{*}}{Q_{*}} = \max_{1 ⩽ n ⩽ n_{max}} \frac{⌊ n P / Q ⌋}{n} ⩽ ξ < \min_{1 ⩽ n ⩽ n_{max}} \frac{⌊ n P / Q ⌋ + 1}{n} = \min_{1 ⩽ n ⩽ n_{max}} \frac{⌈ n P / Q ⌉}{n} = \frac{P^{*}}{Q^{*}} .

Consequently, the admissible range for $ξ$ is the half-open interval

\frac{P_{*}}{Q_{*}} ⩽ ξ < \frac{P^{*}}{Q^{*}} .
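For small parameters, the interval can be checked by brute force with exact rational arithmetic. The sketch below uses hypothetical function names and evaluates the max and min literally over all n, which is only practical for small $n_{max}$.

```python
from fractions import Fraction
from math import floor

def best_approximations(P: int, Q: int, n_max: int):
    """Return (P_*/Q_*, P^*/Q^*) by direct evaluation of the max and min."""
    lower = max(Fraction((n * P) // Q, n) for n in range(1, n_max + 1))
    upper = min(Fraction(-((-n * P) // Q), n) for n in range(1, n_max + 1))
    return lower, upper

def floors_agree(xi: Fraction, P: int, Q: int, n_max: int) -> bool:
    """True if floor(n * P / Q) == floor(n * xi) for every 1 <= n <= n_max."""
    return all((n * P) // Q == floor(n * xi) for n in range(1, n_max + 1))
```

With P = 3, Q = 7, and $n_{max} = 5$, the bounds are 2/5 and 1/2: the value 2/5 itself is admissible, while 1/2 is not, which matches the half-open interval.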

Appendix A.4. Range of the Fractional Parts

The fractional part of $n \cdot P / Q$ is defined as $\{n P / Q\} = n P / Q - ⌊ n P / Q ⌋$. Within the set $\{1, 2, \dots, n_{max}\}$, this quantity attains its minimum at $n = Q_{*}$ and its maximum at $n = Q^{*}$. Hence, the fractional parts lie exactly in the interval

[\frac{(Q_{*} P) \bmod Q}{Q}, \frac{(Q^{*} P) \bmod Q}{Q}],

(A1)

where $(x \bmod Q)$ denotes the remainder of x upon division by Q, i.e., $0 ⩽ (x \bmod Q) < Q$.

Proof of the Fractional Part Range:

We prove that for all $1 ⩽ n ⩽ n_{max}$,

\{Q_{*} P / Q\} ⩽ \{n P / Q\} ⩽ \{Q^{*} P / Q\},

with equality at $n = Q_{*}$ and $n = Q^{*}$, respectively.

(1): Notation

Let $a = P_{*}$, $b = Q_{*}$, $c = P^{*}$, $d = Q^{*}$. By definition, $a / b$ and $c / d$ are adjacent terms in the Farey sequence $F_{n_{max}}$ and satisfy $a / b < P / Q < c / d$ and $b c - a d = 1$. For any n with $1 ⩽ n ⩽ n_{max}$, set $k = ⌊ n P / Q ⌋$. The fraction $k / n$ cannot lie strictly between $a / b$ and $c / d$, because $n ⩽ n_{max}$ and the two are neighbors in $F_{n_{max}}$. Consequently,

\frac{k}{n} ⩽ \frac{a}{b} \quad \text{and} \quad \frac{k + 1}{n} ⩾ \frac{c}{d} .

(2): Integer Linear Representation

We now establish a convenient parameterization of the integers n and $k = ⌊ n P / Q ⌋$ using the coefficients of the adjacent Farey fractions $a / b$ and $c / d$. Because $a / b$ and $c / d$ are neighbors in the Farey sequence $F_{n_{max}}$, they satisfy the well-known unimodular relation

b c - a d = 1 .

This means that the $2 \times 2$ matrix

M = (\begin{matrix} b & d \\ a & c \end{matrix})

has the determinant $\det M = b c - a d = 1$. Any integer matrix with a determinant of 1 is invertible over the integers; its inverse is given explicitly by

M^{- 1} = (\begin{matrix} c & - d \\ - a & b \end{matrix}) .

For a given n and the corresponding $k = ⌊ n P / Q ⌋$, consider the column vector $(\begin{matrix} n \\ k \end{matrix})$. Since M is unimodular, there exists a unique pair of integers $(x, y)$ such that

M (\begin{matrix} x \\ y \end{matrix}) = (\begin{matrix} n \\ k \end{matrix}) ⟺ (\begin{matrix} b & d \\ a & c \end{matrix}) (\begin{matrix} x \\ y \end{matrix}) = (\begin{matrix} n \\ k \end{matrix}) .

Multiplying both sides on the left by $M^{- 1}$ yields the explicit formulae for x and y:

(\begin{matrix} x \\ y \end{matrix}) = (\begin{matrix} c & - d \\ - a & b \end{matrix}) (\begin{matrix} n \\ k \end{matrix}) = (\begin{matrix} c n - d k \\ - a n + b k \end{matrix}) .

Thus, we obtain the two relations

x = c n - d k, \quad y = b k - a n .

Now, recall that $k / n ⩽ a / b$. Because $n > 0$ and $b > 0$, we may cross-multiply without changing the inequality to obtain $b k ⩽ a n$. This immediately implies that

y = b k - a n ⩽ 0 .

Let us define $y^{'} = - y ⩾ 0$. Substituting $y = - y^{'}$ back into the original system gives

n = b x + d (- y^{'}) = b x - d y^{'}, \quad k = a x + c (- y^{'}) = a x - c y^{'} .

We therefore arrive at the non-negative parameterization

n = b x - d y^{'}, \quad k = a x - c y^{'},

with integers $x ⩾ 1$ and $y^{'} ⩾ 0$. If $x ⩽ 0$, then $n = b x - d y^{'} ⩽ 0$, contradicting $n ⩾ 1$; hence x must be strictly positive.

This representation is the key to expressing the fractional part $\{n P / Q\}$ as a simple linear combination of two fundamental positive quantities, which will be exploited in the next step.

(3): Minimum of the Fractional Parts

Let $\varepsilon = P/Q - a/b > 0$. Then, $\{bP/Q\} = b\varepsilon$. Using the representation above,

$$\{nP/Q\} = \frac{nP}{Q} - k = \frac{P}{Q}(bx - dy') - (ax - cy') = x\left(\frac{bP}{Q} - a\right) - y'\left(\frac{dP}{Q} - c\right).$$

Since $dP/Q - c = -(c - dP/Q) = -d\delta$, where $\delta = c/d - P/Q > 0$, we have

$$\{nP/Q\} = x \cdot b\varepsilon + y' \cdot d\delta.$$

Both $b\varepsilon$ and $d\delta$ are positive, and $x \geqslant 1$, $y' \geqslant 0$. Therefore,

$$\{nP/Q\} \geqslant 1 \cdot b\varepsilon + 0 \cdot d\delta = b\varepsilon = \{bP/Q\},$$

with equality if and only if $x = 1$ and $y' = 0$, i.e., $n = b$ and $k = a$. Thus, the minimum is attained at $n = b = Q_{*}$.
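As a sanity check of this step, the sketch below evaluates both sides of the identity $\{nP/Q\} = x \cdot b\varepsilon + y' \cdot d\delta$ with exact rational arithmetic. The numbers ($P/Q = 5/17$, $C = 6$, $a/b = 1/4$, $c/d = 1/3$) and the function names are illustrative assumptions, not values or code from the paper.

```python
from fractions import Fraction
from math import floor

# Illustrative values (assumptions): 1/4 < 5/17 < 1/3 and 4*1 - 1*3 == 1.
P, Q, C = 5, 17, 6
a, b, c, d = 1, 4, 1, 3

theta = Fraction(P, Q)
b_eps = b * theta - a      # b * epsilon = {b P / Q}
d_delta = c - d * theta    # d * delta = 1 - {d P / Q}


def frac_part(n):
    """Exact fractional part {n P / Q}."""
    value = n * theta
    return value - floor(value)


def linear_form(n):
    """Right-hand side x * (b eps) + y' * (d delta) for the same n."""
    k = (n * P) // Q
    x = c * n - d * k
    y_prime = a * n - b * k
    return x * b_eps + y_prime * d_delta


parts = {n: frac_part(n) for n in range(1, C + 1)}
# Both sides of the identity agree for every n <= C.
assert all(parts[n] == linear_form(n) for n in parts)

argmin = min(parts, key=parts.get)
print(argmin, parts[argmin])  # 4 3/17
```

In this example the smallest fractional part is $3/17$, attained at $n = 4 = b$.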

(4): Maximum of the Fractional Parts

A completely symmetric argument using the upper approximation gives

$$1 - \{nP/Q\} = \frac{k+1}{n} \cdot n - \frac{nP}{Q} = n\left(\frac{k+1}{n} - \frac{P}{Q}\right).$$

We have $(k+1)/n \geqslant c/d$, and using the unimodular relations one can show that

$$1 - \{nP/Q\} = x' \cdot d\delta + y'' \cdot b\varepsilon$$

for some non-negative integers $x', y''$ with $x' \geqslant 1$. Hence,

$$1 - \{nP/Q\} \geqslant d\delta = 1 - \{dP/Q\},$$

which is equivalent to $\{nP/Q\} \leqslant \{dP/Q\}$. The maximum is therefore achieved at $n = d = Q^{*}$. For a detailed proof of the upper bound, one may also consult the theory of continued fractions: the largest fractional part among $n\theta \pmod 1$ for $n \leqslant N$ occurs at the denominator of the best upper approximation.

(5): Conclusion

The fractional parts all lie in the interval $\left[\{Q_{*}P/Q\}, \{Q^{*}P/Q\}\right]$, and the endpoints are exactly $(Q_{*}P \bmod Q)/Q$ and $(Q^{*}P \bmod Q)/Q$, as claimed.
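The conclusion can be cross-checked by brute force on small inputs. The sketch below scans all $n \leqslant C$ and reports where the smallest and largest remainders $nP \bmod Q$ occur; the function name `fractional_range` and the sample inputs are assumptions made for illustration, and the sketch assumes $\gcd(P, Q) = 1$ and $C < Q$ so that all remainders are distinct.

```python
def fractional_range(C, P, Q):
    """Return (n_min, r_min, n_max, r_max) over 1 <= n <= C.

    r = (n * P) % Q, so the fractional part {n P / Q} equals r / Q.
    Brute-force scan intended only for checking small cases.
    """
    remainders = {n: (n * P) % Q for n in range(1, C + 1)}
    n_min = min(remainders, key=remainders.get)
    n_max = max(remainders, key=remainders.get)
    return n_min, remainders[n_min], n_max, remainders[n_max]


# P/Q = 5/17 with C = 6: the bracketing fractions are 1/4 and 1/3.
print(fractional_range(6, 5, 17))   # (4, 3, 3, 15)
# P/Q = 5/17 with C = 10: the bracketing fractions are 2/7 and 3/10.
print(fractional_range(10, 5, 17))  # (7, 1, 10, 16)
```

In both cases the minimum sits at the denominator of the lower neighbour and the maximum at the denominator of the upper neighbour, in agreement with the interval above.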

Appendix A.5. Computation via Farey Sequences

The best rational approximations with bounded denominators can be obtained efficiently from the Farey sequence of order $C$. We define the function

$$\left(\frac{P_{*}}{Q_{*}}, \frac{P^{*}}{Q^{*}}\right) = (\mathit{DN}, \mathit{UP}) = f(C, P, Q) \tag{A2}$$

which returns the two adjacent terms in the $C$-th Farey sequence $F_{C}$ that bracket $P/Q$. The implementation of $f$ relies on the mediant property of Farey sequences and is provided in the accompanying code (see test1.py, https://github.com/xjb714/xjb/blob/main/py_test/test1.py (accessed on 20 April 2026)).

Using this function, the bounds in Appendix A.4 can be computed in negligible time, thereby providing a fast method to determine the range of fractional parts discussed in the main text.
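For orientation, a function with the same interface as $f$ can be sketched with a plain mediant descent. This is not the authors' test1.py implementation; the name `farey_neighbours`, the input restrictions ($0 < P < Q$, $\gcd(P, Q) = 1$, $1 \leqslant C < Q$), and the step-by-step loop are assumptions made for this example. The loop takes one mediant per iteration, so it can need up to about $C$ steps in unfavourable cases.

```python
from math import gcd


def farey_neighbours(C, P, Q):
    """Return ((a, b), (c, d)) with a/b < P/Q < c/d adjacent in the Farey sequence F_C.

    Sketch under stated assumptions: 0 < P < Q, gcd(P, Q) == 1, 1 <= C < Q.
    """
    if not (0 < P < Q and gcd(P, Q) == 1 and 1 <= C < Q):
        raise ValueError("expected 0 < P < Q, gcd(P, Q) == 1 and 1 <= C < Q")
    a, b = 0, 1  # lower bound 0/1
    c, d = 1, 1  # upper bound 1/1
    # Invariant: a/b < P/Q < c/d and b*c - a*d == 1.
    while b + d <= C:
        m_num, m_den = a + c, b + d  # mediant of the current bounds
        if m_num * Q < P * m_den:    # mediant lies below P/Q
            a, b = m_num, m_den
        else:                        # mediant lies above P/Q (equality is ruled out by C < Q)
            c, d = m_num, m_den
    return (a, b), (c, d)


print(farey_neighbours(6, 5, 17))   # ((1, 4), (1, 3))
print(farey_neighbours(10, 5, 17))  # ((2, 7), (3, 10))
```

The loop stops as soon as the next mediant would have a denominator larger than $C$, at which point no fraction with denominator at most $C$ lies strictly between the two bounds.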

References

1. Steele, G.L., Jr.; White, J.L. How to Print Floating-Point Numbers Accurately. In Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, PLDI 1990; ACM: New York, NY, USA, 1990; pp. 112–126.
2. Steele, G.L., Jr.; White, J.L. How to Print Floating-Point Numbers Accurately (Retrospective). ACM SIGPLAN Notices 39(4), April 2004 (Best of PLDI, 1979–1999). Available online: https://dl.acm.org/doi/10.1145/989393.989431 (accessed on 1 April 2004).
3. Burger, R.G.; Dybvig, R.K. Printing Floating-point Numbers Quickly and Accurately. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation (PLDI ’96); ACM: New York, NY, USA, 1996; pp. 108–116.
4. Loitsch, F. Printing Floating-Point Numbers Quickly and Accurately with Integers. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation, PLDI 2010; ACM: New York, NY, USA, 2010; pp. 233–243.
5. Andrysco, M.; Jhala, R.; Lerner, S. Printing Floating-Point Numbers: A Faster, Always Correct Method. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016; ACM: New York, NY, USA, 2016; pp. 555–567.
6. Adams, U. Ryū: Fast Float-to-String Conversion. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’18); ACM: New York, NY, USA, 2018; pp. 270–282.
7. Adams, U. Ryū Revisited: Printf Floating Point Conversion. Proc. ACM Program. Lang. 2019, 3, 169.
8. Giulietti, R. The Schubfach Way to Render Doubles. 2020. Available online: https://drive.google.com/file/d/1KLtG_LaIbK9ETXI290zqCxvBW94dj058/view (accessed on 1 September 2020).
9. Jeon, J. Grisu-Exact: A Fast and Exact Floating-Point Printing Algorithm. 2020. Available online: https://github.com/jk-jeon/Grisu-Exact/blob/master/other_files/Grisu-Exact.pdf (accessed on 1 September 2020).
10. Jeon, J. Dragonbox: A New Floating-Point Binary-to-Decimal Conversion Algorithm. 2024. Available online: https://github.com/jk-jeon/Dragonbox (accessed on 1 July 2024).
11. Guo, Y.Y. Available online: https://github.com/ibireme/c_numconv_benchmark/blob/master/vendor/yy_double/yy_double.c (accessed on 1 January 2025).
12. Guo, Y.Y. Available online: https://github.com/ibireme/yyjson (accessed on 1 August 2025).
13. Cox, R. Available online: https://github.com/rsc/fpfmt (accessed on 1 January 2026).
14. Cox, R. Floating-Point Printing and Parsing Can Be Simple And Fast. Available online: https://research.swtch.com/fp (accessed on 1 January 2026).
15. Cox, R. Fast Unrounded Scaling: Proof by Ivy. Available online: https://research.swtch.com/fp-proof (accessed on 1 January 2026).
16. Zverovich, V. Available online: https://github.com/vitaut/zmij (accessed on 1 March 2026).
17. ANSI/IEEE Std 754-1985; IEEE Standard for Binary Floating-Point Arithmetic. IEEE: New York, NY, USA, 1985; pp. 1–20.
18. IEEE Std 754-2019; (Revision of IEEE 754-2008) IEEE Standard for Floating-Point Arithmetic. IEEE: New York, NY, USA, 2019; pp. 1–84.
19. Khuong, P. How to Print Integers Really Fast (with Open Source AppNexus Code!). Available online: https://pvk.ca/Blog/2017/12/22/appnexus-common-framework-its-out-also-how-to-print-integers-faster/ (accessed on 1 December 2017).
20. Johnson, D. Converting Integers to Fixed-Width Strings Faster with Neon SIMD on the Apple M1. Available online: https://dougallj.wordpress.com/2022/04/01/converting-integers-to-fixed-width-strings-faster-with-neon-simd-on-the-apple-m1/ (accessed on 1 April 2022).
21. Muła, W. SSE: Conversion Integers to Decimal Representation. Available online: http://0x80.pl/notesen/2011-10-21-sse-itoa.html (accessed on 1 October 2011).
22. Lemire, D. Converting Integers to Decimal Strings Faster with AVX-512. Available online: https://lemire.me/blog/2022/03/28/converting-integers-to-decimal-strings-faster-with-avx-512/ (accessed on 1 March 2022).
23. Xiang, J. Available online: https://github.com/xjb714/xjb/tree/main/bench/schubfach_xjb (accessed on 1 April 2026).
24. Zverovich, V. Available online: https://github.com/fmtlib/fmt (accessed on 1 October 2025).
25. Neri, C. Available online: https://github.com/cassioneri/teju_jagua (accessed on 1 November 2025).
26. Leng, J. Available online: https://github.com/lengjingzju/json/jnum.c (accessed on 1 November 2025).

Figure 1. Benchmark results for random and fixed-length double-precision numbers (excluding NaN and Inf).

Figure 2. Benchmark results for random float-precision numbers (excluding NaN and Inf).

Table 1. Explanation of special symbols in this article.

Symbol	Brief Explanation	Example
%	Integer modulus operation	2 = 8%3
//	Integer division operation	1 = 5//3
<< or >>	Left or right shift of binary values	8 = 1 << 3
? :	Similar to the ternary operator in C syntax	a = 1?a:b

Table 2. Valid ranges for significand c and exponent q.

Category	Float (Binary32)	Double (Binary64)
Subnormal	$1 \leq c \leq 2^{23} - 1$ , $q = - 149$	$1 \leq c \leq 2^{52} - 1$ , $q = - 1074$
Normal	$2^{23} + 1 \leq c \leq 2^{24} - 1$ , $- 148 \leq q \leq 104$	$2^{52} + 1 \leq c \leq 2^{53} - 1$ , $- 1073 \leq q \leq 971$
Irregular	$c = 2^{23}$ , $- 149 \leq q \leq 104$	$c = 2^{52}$ , $- 1074 \leq q \leq 971$

Table 3. Mapping between identified limitations and xjb’s solutions.

Identified Limitation	Corresponding Solution in xjb
Frequent branch mispredictions	Branchless programming for core decision logic (Section 3.6, Section 3.7 and Section 3.10)
High-precision multiplication overhead	Minimized multiplication count via lookup-table restructuring (Section 3.4, Section 3.5 and Section 3.6)
Long instruction dependency chains	Restructured computation flow to expose instruction-level parallelism (Section 3.10)
Limited SIMD utilization	SIMD-optimized ASCII generation for decimal-to-string stage (Section 3.10, Table 5)

Table 4. Examples of floating-point number printing results.

Float Number	Fixed-Point	Scientific
2.34	“2.34”	“2.34”
12	“12.0”	“1.2”
120	“120.0”	“1.2 × 10²”
0.012	“0.012”	“1.2 × 10⁻²”

Table 5. SIMD implementations of dec_to_ascii8 and dec_to_ascii16.

SIMD Implementation	Description
NEON [20]	Original author: Dougall Johnson. Runs on ARM processors with NEON instruction set.
SSE2 [21]	Based on scalar version; requires only SSE2 instruction set.
SSE4.1	Nearly identical to SSE2 implementation; requires SSE4.1 instruction set.
AVX512 [22]	Original author: Daniel Lemire. Requires AVX512IFMA and AVX512VBMI instruction sets.

Table 6. Printing formats for different ranges.

Type	Fixed-Point	Scientific
Float	$[10^{- 3}, 10^{7})$	Other ranges
Double	$[10^{- 4}, 10^{16})$	Other ranges
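The range rule in Table 6 can be expressed as a small selector. The Python sketch below is an illustration only: the function name `choose_format`, the use of the absolute value for negative inputs, and the rejection of zero and non-finite inputs are assumptions, and it does not show how xjb makes this decision internally. For the "float" case the argument is assumed to be a value representable in binary32, since Python floats are binary64.

```python
import math
from fractions import Fraction

# Fixed-point ranges [low, high) from Table 6, stored as exact rationals.
FIXED_RANGE = {
    "float": (Fraction(1, 10**3), Fraction(10**7)),
    "double": (Fraction(1, 10**4), Fraction(10**16)),
}


def choose_format(value, kind="double"):
    """Return "fixed" or "scientific" for a finite, non-zero value.

    Hypothetical helper: the comparison is applied to the magnitude of the
    value, which is an assumption about how the table treats the sign.
    """
    if not math.isfinite(value) or value == 0:
        raise ValueError("sketch covers finite, non-zero values only")
    low, high = FIXED_RANGE[kind]
    magnitude = Fraction(abs(value))  # exact value of the binary float
    return "fixed" if low <= magnitude < high else "scientific"


print(choose_format(2.34, "double"))    # fixed
print(choose_format(1e16, "double"))    # scientific
print(choose_format(0.0005, "float"))   # scientific
print(choose_format(0.0005, "double"))  # fixed
```

Exact rationals are used for the thresholds so that the boundary cases ($10^{7}$ and $10^{16}$ are exactly representable) are not affected by rounding of the bounds themselves.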

Table 7. All algorithms in the benchmark test.

Algorithm	Float	Double	Description: Author and Source Code
Schubfach [8]	Schubfach32	Schubfach64	Raffaello Giulietti, https://github.com/abolz/Drachennest/tree/master/src (accessed on 4 December 2025).
Schubfach_xjb [23]	Schubfach32_xjb	Schubfach64_xjb	The computation flow of the Schubfach source code was modified by the authors without altering its output, https://github.com/xjb714/xjb/tree/main/bench/schubfach_xjb (accessed on 4 December 2025).
Ryū [6,7]	Ryū32	Ryū64	Ulf Adams, https://github.com/ulfjack/ryu (accessed on 4 December 2025).
Dragonbox [10]	Dragonbox32	Dragonbox64	Junekey Jeon, https://github.com/jk-jeon/Dragonbox (accessed on 4 December 2025).
fmt [24]	fmt32	fmt64	Victor Zverovich, https://github.com/fmtlib/fmt, version 12.1.0 (accessed on 4 December 2025).
yy_double [11]	-	yy_double	Guo YaoYuan, https://github.com/ibireme/c_numconv_benchmark/blob/master/vendor/yy_double/yy_double.c (accessed on 4 December 2025).
yy_json [12]	yy_json32	yy_json64	Guo YaoYuan, https://github.com/ibireme/yyjson, version 0.12.0 (accessed on 4 December 2025).
teju_jagua [25]	teju32	teju64	Cassio Neri, https://github.com/cassioneri/teju_jagua (accessed on 4 December 2025).
xjb	xjb32	xjb64	This paper, https://github.com/xjb714/xjb (accessed on 4 December 2025).
zmij [16]	zmij32	zmij64	Victor Zverovich, https://github.com/vitaut/zmij (accessed on 8 April 2026).
jnum [26]	jnum32	jnum64	Jing Leng, https://github.com/lengjingzju/json/jnum.c (accessed on 4 December 2025).
uscalec [13]	-	uscalec	Russ Cox, https://github.com/rsc/fpfmt commit 6255750 (accessed on 19 January 2026).

Table 8. Float/double-to-decimal benchmark results (time in nanoseconds).

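Algorithm	AMD R7-7840H (Icpx 2025.0.4), Float	AMD R7-7840H (Icpx 2025.0.4), Double	Apple M1 (Apple Clang 21.0.0), Float	Apple M1 (Apple Clang 21.0.0), Double	Apple M5 (Apple Clang 21.0.0), Float	Apple M5 (Apple Clang 21.0.0), Double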
Schubfach	12.20	11.51	11.64	13.12	7.59	7.71
Schubfach_xjb	4.44	6.33	5.16	6.58	3.15	3.75
Ryū	14.02	13.08	15.75	14.16	10.23	9.50
Dragonbox	10.19	10.05	11.78	12.03	7.56	7.39
yy_json	4.67	5.72	3.97	4.46	2.40	2.72
yy_double	–	5.24	–	4.08	–	2.71
teju_jagua	14.99	14.37	20.25	18.66	13.49	12.71
zmij	4.76	4.78	4.11	3.83	2.82	2.14
uscalec	–	11.27	–	15.26	–	9.61
xjb	2.24	3.76	2.15	2.58	1.44	1.55

Note that different algorithms may produce semantically equivalent but syntactically different decimal outputs. For example, some algorithms (including Dragonbox, uscalec, and Ryū) omit trailing zeros in their output. Although the output strings are therefore not identical across algorithms, the real value represented by each algorithm's output satisfies the Steele–White (SW) principle.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

