Article

xjb: Fast Float to String Algorithm

by
Junbo Xiang
and
Tiejun Wang
*
School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
*
Author to whom correspondence should be addressed.
Computers 2026, 15(5), 280; https://doi.org/10.3390/computers15050280
Submission received: 29 March 2026 / Revised: 20 April 2026 / Accepted: 24 April 2026 / Published: 27 April 2026
(This article belongs to the Special Issue Computational Science and Its Applications 2025 (ICCSA 2025))

Abstract

Efficiently and accurately converting floating-point numbers to decimal strings remains a fundamental challenge in numerical computation, data serialization, and human–computer interaction. While modern algorithms such as Ryū, Dragonbox, and Schubfach rigorously satisfy the Steele–White criteria for correctness and minimal output length, their performance is frequently constrained by branch mispredictions, high-precision multiplication overhead, and suboptimal utilization of instruction-level parallelism. This paper introduces xjb, a novel floating-point–string conversion algorithm derived from Schubfach that systematically overcomes these bottlenecks. By restructuring the core computation to reduce instruction dependencies, adopting branchless decision logic, and exploiting SIMD instruction sets for decimal-to-ASCII formatting, xjb delivers state-of-the-art throughput across diverse hardware platforms. The algorithm requires only a single 64-by-128-bit multiplication for IEEE 754 binary64 conversions and a single 64-by-64-bit multiplication for binary32, drastically decreasing arithmetic complexity. Extensive benchmarking on AMD R7-7840H and Apple M1/M5 processors demonstrates that xjb consistently outperforms leading contemporary implementations. Notably, on the Apple M5, xjb achieves speedups of approximately 20% and 136% for binary64 and binary32 conversions, respectively, when compared to the highly optimized zmij library. The algorithm is fully compliant with the Steele–White principle; exhaustive validation over the entire binary32 space and extensive random testing across the binary64 range confirm both its theoretical soundness and practical robustness.

1. Introduction

Floating-point–decimal string conversion is a fundamental operation in computer systems, with widespread applications across numerous domains. From scientific computing and financial systems to web services and database management, the ability to efficiently and accurately convert binary floating-point representations into human-readable decimal strings is essential. Despite its apparent simplicity, this conversion problem presents significant challenges in balancing the competing demands of correctness, performance, and output compactness.

1.1. Background and Motivation

In 1990, Steele and White [1,2] established the foundational principles for optimal floating-point printing algorithms, now widely known as the Steele–White (SW) principle. The SW principle comprises four key requirements:
  • Information Preservation: The printed result must be parsable back to the original floating-point number without loss of precision.
  • Minimum Length: The output string should be as short as possible while maintaining information preservation.
  • Correct Rounding: When multiple representations satisfy the first two criteria, the algorithm must correctly round to the nearest value, with ties broken by selecting the even value.
  • Left-to-Right Generation: The output digits should be generated sequentially from the most significant to the least significant digit.
The SW principle ensures that floating-point numbers have a unique, well-defined decimal representation that is both human-readable and machine-parsable. Algorithms satisfying the SW principle guarantee that the conversion process is reversible and produces the shortest possible output, which is crucial for data exchange, serialization, and user interface display. Over the past three decades, significant research efforts have been devoted to developing efficient algorithms. Early approaches, such as Dragon4, provided correct results but suffered from performance limitations due to arbitrary-precision arithmetic, and they were later improved by dtoa.c [3]. Grisu3 [4] pioneered the use of precomputed powers of ten to avoid expensive operations, although it occasionally fell back to slower methods. Errol [5] reduced the fallback rate through more precise error analysis. Ryū [6,7] established a new performance baseline through careful instruction scheduling and lookup table optimizations. Schubfach [8] introduced a compact and elegant solution based on the pigeonhole principle, while Grisu-Exact [9] eliminated fallbacks entirely. Dragonbox [10] reduced the number of multiplications, at the cost of more branches; yy_double [11] and yy_json [12] explored alternative computational strategies to minimize multiplication costs, but they still retained a few branches; uscale [13,14,15], proposed by Russ Cox, enhances floating-point printing performance in the Go programming language. Finally, zmij [16] builds upon yy_double with extensive code-level optimizations.
Despite these advances, existing algorithms still face several performance challenges that limit their effectiveness in high-throughput scenarios:
  • Branch Prediction Penalties: Many algorithms rely heavily on conditional branches to handle different cases, leading to frequent branch mispredictions on modern pipelined processors.
  • High-Precision Multiplication Overhead: The conversion process requires high-precision arithmetic operations, particularly multiplications involving large precomputed constants, which can be expensive on standard hardware.
  • Instruction Dependency Chains: Sequential dependencies between operations limit instruction-level parallelism and prevent efficient utilization of modern superscalar processors.
  • Limited SIMD Utilization: Most existing algorithms do not exploit vector instruction sets (SIMD) that are now ubiquitous in contemporary processors.
Despite the progress made by prior works, a critical research gap remains: no existing algorithm holistically addresses the four fundamental performance bottlenecks—branch mispredictions, high-precision multiplication overhead, instruction dependency chains, and underutilization of SIMD capabilities. For example,
  • Schubfach [8] offers an elegant approach but suffers from suboptimal performance due to unoptimized computation flow.
  • Dragonbox [10] reduces multiplications at the cost of increased branches, trading off one bottleneck for another.
  • yy_double [11] and yy_json [12] are highly optimized but do not use SIMD instructions.
  • zmij [16] provides competitive performance but still leaves room for improvement in instruction dependency reduction and branch optimization.
This gap necessitates a new approach that systematically integrates multiple optimization strategies within a cohesive framework, rather than trading off one bottleneck for another.

1.2. Contributions

This paper presents xjb, a novel floating-point–string conversion algorithm that achieves superior performance through systematic optimization of the underlying computational structure. The xjb algorithm is derived from the Schubfach algorithm and incorporates insights from yy_double and Dragonbox, but it introduces several key innovations that directly address the limitations of existing frameworks:
  • Reduced Instruction Dependencies: Unlike other algorithms that suffer from sequential dependencies, xjb restructures the computation by decomposing d (introduced in Section 3.2) into d // 10 and d % 10 instead of computing d directly. This minimizes data dependencies between operations, enabling better instruction-level parallelism and improved pipeline utilization on modern superscalar processors.
  • Minimized Multiplication Operations: Building on insights from yy_double, but without the trade-off of increased branches, xjb reduces the number of expensive high-precision multiplications required during conversion, significantly decreasing the computational overhead while maintaining branch efficiency. For IEEE 754 binary64, only one 64-bit by 128-bit multiplication is required, and for IEEE 754 binary32, only one 64-bit by 64-bit multiplication is needed.
  • Mitigated Branch Prediction Penalties: Through branchless programming techniques and careful case analysis, xjb addresses the branch prediction problem that plagues algorithms like Dragonbox. All branches in xjb are designed as unlikely branches, and the core conversion of normal floating-point numbers is completely branch-free, minimizing conditional branches that could lead to prediction failures.
  • SIMD Instruction Utilization: Unlike most existing algorithms that neglect SIMD potential, xjb is designed from the ground up to leverage SIMD instructions (NEON for ARM64, AVX512/SSE4.1/SSE2 for x86-64) for efficient decimal-to-ASCII conversion, fully exploiting the vector processing capabilities of contemporary processors.
  • Concise Core Implementation: Despite its sophisticated optimizations, xjb maintains a compact and readable core implementation, facilitating adoption and maintenance.
These innovations work synergistically to address all key performance bottlenecks simultaneously, filling the research gap identified in the previous section. The xjb algorithm supports both IEEE 754 single-precision (binary32) and double-precision (binary64) floating-point formats, which are the most widely used floating-point representations in modern computing. For simplicity, this paper uses float to refer to IEEE 754 binary32 and double to refer to IEEE 754 binary64.

1.3. Evaluation Overview

We conducted extensive benchmarking of xjb across diverse hardware platforms, including AMD R7-7840H and Apple M1/M5 processors. Our evaluation demonstrates that xjb outperforms state-of-the-art algorithms in most scenarios while maintaining full compliance with the SW principle. The algorithm exhibits excellent portability and scalability, making it suitable for deployment across a wide range of systems, from embedded devices to high-performance servers.
The remainder of this paper is organized as follows: Section 2 reviews the IEEE 754 floating-point representation and establishes the mathematical foundation for the conversion problem. Section 3 elucidates the core principles and derivation of the xjb algorithm and describes the implementation details and optimizations. Section 4 presents experimental results comparing xjb against existing algorithms. To conclude, Section 5 offers a comprehensive summary of the paper.

1.4. Explanation of Special Symbols in This Article

Table 1 explains the special symbols used in the formulae of this article.

2. IEEE 754 Floating-Point Number Representation

This section establishes the mathematical foundation for the representation of floating-point numbers and defines the notation used throughout this paper. We focus on the IEEE 754 standard, which is the most widely adopted floating-point arithmetic standard in modern computing systems.

2.1. Scope and Assumptions

For clarity of presentation, we make the following simplifying assumptions:
  • We consider only positive floating-point numbers, as negative numbers differ only by a leading minus sign.
  • We exclude special values (zero, NaN, and infinity) from our analysis, since these are handled separately in practice.
These assumptions are standard in the literature and do not affect the generality of our algorithm, as excluded cases can be handled with straightforward special-case logic.

2.2. Binary Representation

The IEEE 754 standard [17,18] defines two primary floating-point formats relevant to this work:
Double Precision (binary64): A 64-bit format consisting of the following:
  • One sign bit (s): indicates positive ($s = 0$) or negative ($s = 1$).
  • Eleven exponent bits (e): biased exponent in the range $[0, 2047]$.
  • Fifty-two fraction bits (f): significand fraction in the range $[0, 2^{52} - 1]$.
Single Precision (binary32): A 32-bit format consisting of the following:
  • One sign bit (s): indicates positive ($s = 0$) or negative ($s = 1$).
  • Eight exponent bits (e): biased exponent in the range $[0, 255]$.
  • Twenty-three fraction bits (f): significand fraction in the range $[0, 2^{23} - 1]$.

2.3. Classification of Floating-Point Numbers

We classify floating-point numbers into three categories based on their exponent and fraction fields:
  • Subnormal Numbers ($e = 0$ and $f \neq 0$): These represent very small values close to zero, where the implicit leading bit of the significand is 0 instead of 1.
  • Normal Numbers ($e \neq 0$ and $f \neq 0$): The standard case, where the implicit leading bit of the significand is 1.
  • Irregular Numbers ($e \neq 0$ and $f = 0$): Numbers with zero fraction field, representing powers of two.
We use the term regular to refer to both subnormal and normal numbers (i.e., all cases where $f \neq 0$). Unless otherwise specified, this article discusses only regular values, that is, non-irregular values.

2.4. Value Representation

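The real value v of a positive floating-point number can be expressed in the unified form $v = c \cdot 2^q$, where c is the integer significand and q is the exponent. The general formula covering all cases is as follows:
$$
\begin{aligned}
\text{double}: \quad v &= \bigl(f + (e \neq 0 \;?\; 2^{52} : 0)\bigr) \cdot 2^{\max(e,1) - 1023 - 52} = c \cdot 2^q \\
\text{float}: \quad v &= \bigl(f + (e \neq 0 \;?\; 2^{23} : 0)\bigr) \cdot 2^{\max(e,1) - 127 - 23} = c \cdot 2^q
\end{aligned} \tag{1}
$$
For each category, the values decompose as follows:
Subnormal Numbers ($e = 0$, $f \neq 0$):
$$
\text{subnormal}: \quad \text{double}: \; v = f \cdot 2^{-1074} \qquad \text{float}: \; v = f \cdot 2^{-149} \tag{2}
$$
Normal Numbers ($e \neq 0$, $f \neq 0$):
$$
\text{normal}: \quad \text{double}: \; v = (f + 2^{52}) \cdot 2^{e - 1075} \qquad \text{float}: \; v = (f + 2^{23}) \cdot 2^{e - 150} \tag{3}
$$
Irregular Numbers ($e \neq 0$, $f = 0$):
$$
\text{irregular}: \quad \text{double}: \; v = 2^{52} \cdot 2^{e - 1075} \qquad \text{float}: \; v = 2^{23} \cdot 2^{e - 150} \tag{4}
$$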
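As a hedged illustration of the unified form $v = c \cdot 2^q$, the following Python sketch extracts $(c, q)$ from the raw bit pattern. The helper names `decode_double` and `decode_float` are hypothetical and are not taken from the xjb source; sign handling and special values are deliberately left out, in line with the assumptions of Section 2.1.

```python
import struct
from fractions import Fraction

def decode_double(x: float) -> tuple[int, int]:
    """Return (c, q) with x == c * 2**q for a positive, finite, non-zero binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e = (bits >> 52) & 0x7FF            # 11-bit biased exponent
    f = bits & ((1 << 52) - 1)          # 52-bit fraction field
    c = f + ((1 << 52) if e != 0 else 0)
    q = max(e, 1) - 1023 - 52
    return c, q

def decode_float(x: float) -> tuple[int, int]:
    """Same decomposition for binary32 (x is first rounded to binary32 by struct)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (bits >> 23) & 0xFF             # 8-bit biased exponent
    f = bits & ((1 << 23) - 1)          # 23-bit fraction field
    c = f + ((1 << 23) if e != 0 else 0)
    q = max(e, 1) - 127 - 23
    return c, q

def exact_value(c: int, q: int) -> Fraction:
    """Exact rational value of c * 2**q."""
    return c * Fraction(2) ** q
```

For instance, `decode_double(1.0)` yields `(2**52, -52)`, an irregular value, while the smallest subnormal double decodes to `(1, -1074)`.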

2.5. Rounding Interval

A critical concept for accurate floating-point printing is the rounding interval $R_v$, which defines the range of real numbers that round to the given floating-point value v when parsed. The rounding interval is bounded by
$$
\begin{aligned}
v_l &= \begin{cases} \left(c - \tfrac{1}{2}\right) \cdot 2^q = v - 2^{q-1} & \text{if } f \neq 0 \text{ or } e \le 1 \\ \left(c - \tfrac{1}{4}\right) \cdot 2^q = v - 2^{q-2} & \text{if } f = 0 \end{cases} \\
v_r &= \left(c + \tfrac{1}{2}\right) \cdot 2^q = v + 2^{q-1} \\
R_v &= \begin{cases} [v_l, v_r] & \text{if } f \bmod 2 = 0 \ (\text{even significand}) \\ (v_l, v_r) & \text{if } f \bmod 2 = 1 \ (\text{odd significand}) \end{cases}
\end{aligned} \tag{5}
$$
The rounding radius for regular floating-point numbers is $2^{q-1} = v_r - v$. The distinction between closed and open intervals at the boundaries depends on the parity of the significand, ensuring correct rounding according to the round-to-even rule specified in the IEEE 754 standard. Any decimal number within $R_v$ will parse back to the original floating-point value v, which is essential for ensuring the information preservation property of the SW principle.
Table 2 summarizes the valid ranges for c and q across different categories.
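The rounding interval can be checked with exact rational arithmetic. The sketch below is only an illustration: the function name `rounding_interval` is hypothetical, and the condition `f != 0 or e <= 1` for the wider lower radius is our reading of the interval definition above.

```python
from fractions import Fraction

def rounding_interval(c: int, q: int, f: int, e: int):
    """Return (v_l, v_r, closed) for v = c * 2**q.

    f and e are the raw fraction and biased exponent fields; `closed`
    tells whether the end points themselves belong to the interval.
    """
    two_q = Fraction(2) ** q
    v = c * two_q
    if f != 0 or e <= 1:
        v_l = v - two_q / 2      # lower radius 2**(q-1)
    else:
        v_l = v - two_q / 4      # irregular: lower radius 2**(q-2)
    v_r = v + two_q / 2          # upper radius is always 2**(q-1)
    closed = (f % 2 == 0)        # even significand: closed interval
    return v_l, v_r, closed

# binary64 value nearest to 1.3: fields taken from the bit pattern 3ff4cccccccccccd
F_13, E_13 = 0x4CCCCCCCCCCCD, 0x3FF
C_13, Q_13 = F_13 + (1 << 52), E_13 - 1075
```

With these fields, the decimal 1.3 lies strictly inside the (open) interval, whereas 1.3000000000000003 lies outside it and therefore parses to a different double.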

3. Algorithm Principles

This section presents the algorithmic principles and mathematical foundation of the xjb floating-point–string conversion algorithm. We first introduce the overall architecture and design goals, followed by the mathematical formulation of the conversion problem.

3.1. Design Overview

This paper focuses on converting float (single-precision) and double (double-precision) floating-point numbers to decimal strings. The conversion process consists of two stages:
  • Float-to-Decimal Conversion: Converting binary floating-point values to decimal significand–exponent pairs ( d , k ) .
  • Decimal-to-String Conversion: Formatting ( d , k ) into human-readable strings.
Table 3 lists the optimization method applied to each identified limitation, together with the section in which it is described.

3.2. Mathematical Foundation

Before presenting the algorithm details, we establish the mathematical framework for the float-to-decimal conversion problem.
Recall from Section 2 that any floating-point value v can be expressed in the form $v = c \cdot 2^q$, where c is the integer significand and q is the exponent. Our goal is to find the optimal decimal representation $\mathit{opt} = d \cdot 10^k$ that satisfies the SW principle.
As established in Section 2, regular floating-point numbers (which include all normal and subnormal numbers with non-zero fraction fields) account for the vast majority of possible floating-point values. For the purposes of algorithm derivation, we focus primarily on regular numbers, as special cases can be handled with minimal additional logic.
The valid ranges for the significand c and exponent q for regular floating-point numbers are as follows:
$$
\begin{aligned}
\text{float}: \quad & \begin{cases} 1 \le c \le 2^{24} - 1, \; c \neq 2^{23}, & q = -149 \\ 2^{23} + 1 \le c \le 2^{24} - 1, & -148 \le q \le 104 \end{cases} \\
\text{double}: \quad & \begin{cases} 1 \le c \le 2^{53} - 1, \; c \neq 2^{52}, & q = -1074 \\ 2^{52} + 1 \le c \le 2^{53} - 1, & -1073 \le q \le 971 \end{cases}
\end{aligned} \tag{6}
$$
For irregular floating-point numbers (powers of two), the ranges are
$$
\begin{aligned}
\text{float}: \quad & c = 2^{23}, \; -149 \le q \le 104 \\
\text{double}: \quad & c = 2^{52}, \; -1074 \le q \le 971
\end{aligned} \tag{7}
$$
For subnormal numbers,
$$
\begin{aligned}
\text{float}: \quad & c \le 2^{23} - 1, \; q = -149 \\
\text{double}: \quad & c \le 2^{52} - 1, \; q = -1074
\end{aligned} \tag{8}
$$
The conversion problem can now be formally stated as follows: given a floating-point value $v = c \cdot 2^q$, find the optimal decimal representation $\mathit{opt} = d \cdot 10^k$ such that
$$
v = c \cdot 2^q \;\rightarrow\; \mathit{opt} = d \cdot 10^k \qquad \text{subject to: } \mathit{opt} \in R_v, \; d \in \mathbb{Z}^+, \; k \in \mathbb{Z} \tag{9}
$$
where $R_v$ is the rounding interval of v, as defined in Section 2.
For example, consider the IEEE 754 binary64 floating-point number representing 1.3. Its actual value is 1.3000000000000000444089209850062616169452667236328125, with the hexadecimal representation 3ff4cccccccccccd. The optimal decimal representation $\mathit{opt}$ satisfying the SW principle is simply 1.3. Similarly, for the binary32 representation of 1.3, with an actual value of 1.2999999523162841796875 and a hexadecimal representation of 3fa66666, the optimal representation is also 1.3.
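The figures quoted in this example can be reproduced with the Python standard library. The helper names below are hypothetical and serve only this illustration; Python's own `repr` is used as an independent shortest-round-trip printer for binary64.

```python
import struct
from decimal import Decimal

def hex64(x: float) -> str:
    """Big-endian hexadecimal bit pattern of x as binary64."""
    return struct.pack(">d", x).hex()

def hex32(x: float) -> str:
    """Big-endian hexadecimal bit pattern of x rounded to binary32."""
    return struct.pack(">f", x).hex()

def as_float32(x: float) -> float:
    """Round x to binary32 and widen back to a Python float (exact)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def exact_decimal(x: float) -> str:
    """Exact decimal expansion of the binary value stored in x."""
    return str(Decimal(x))
```

Here `exact_decimal(1.3)` gives the 53-digit expansion quoted above, while `repr(1.3)` returns the short form `'1.3'`.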

3.3. Overview of the Schubfach Algorithm and Derivation of Our Method

This section reviews the Schubfach algorithm and presents our derivation of an optimized variant. We begin by establishing the mathematical foundation for determining the optimal decimal representation.

3.3.1. Candidate Values for the Significand d

The Schubfach algorithm identifies four candidate values for the decimal significand d:
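$$
d \in \left\{ 10 \lfloor v \cdot 10^{-k-1} \rfloor, \; \lfloor 10 \cdot v \cdot 10^{-k-1} \rfloor, \; \lfloor 10 \cdot v \cdot 10^{-k-1} \rfloor + 1, \; 10 \lfloor v \cdot 10^{-k-1} \rfloor + 10 \right\} \tag{10}
$$
The exponent k is computed as follows:
$$
k = \begin{cases} \lfloor q \cdot \log_{10}(2) \rfloor & \text{if } v \text{ is regular} \\ \lfloor q \cdot \log_{10}(2) - \log_{10}(4/3) \rfloor & \text{otherwise} \end{cases} \tag{11}
$$
For efficient computation on modern processors, Equation (11) can be implemented using integer arithmetic:
$$
\begin{aligned}
\text{double}: \quad k &= \bigl(q \cdot 315653 - (v \text{ is regular} \;?\; 0 : 131072)\bigr) \gg 20 \\
\text{float}: \quad k &= \bigl(q \cdot 1233 - (v \text{ is regular} \;?\; 0 : 512)\bigr) \gg 12
\end{aligned} \tag{12}
$$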
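For the regular case, the multiply-and-shift form of Equation (12) can be compared against an exact big-integer evaluation of $\lfloor q \cdot \log_{10}(2) \rfloor$. The sketch below is a minimal check under that restriction; the function names are hypothetical, and the irregular-case constants are not exercised here.

```python
def floor_log10_pow2_exact(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from decimal digit counts of 2**|q|."""
    if q >= 0:
        return len(str(1 << q)) - 1      # 10**k <= 2**q < 10**(k+1)
    return -len(str(1 << -q))            # 2**|q| is never a power of ten for |q| >= 1

def k_regular_double(q: int) -> int:
    """Integer form for binary64; Python's >> is an arithmetic (flooring) shift."""
    return (q * 315653) >> 20

def k_regular_float(q: int) -> int:
    """Integer form for binary32."""
    return (q * 1233) >> 12
```

Both integer forms are only compared over the exponent ranges of Equation (6); outside those ranges the constants may not be precise enough.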

3.3.2. Decomposition into Integer and Fractional Parts

We decompose $v \cdot 10^{-k-1}$ into its integer component $\lfloor v \cdot 10^{-k-1} \rfloor$ (floor function) and fractional component $v \cdot 10^{-k-1} - \lfloor v \cdot 10^{-k-1} \rfloor$. Let $m = \lfloor v \cdot 10^{-k-1} \rfloor$ denote the integer part and $n = v \cdot 10^{-k-1} - m$ the fractional part, where $0 \le n < 1$. Substituting into Equation (10),
$$
d \in \{ 10m, \; \lfloor 10(m + n) \rfloor, \; \lfloor 10(m + n) \rfloor + 1, \; 10m + 10 \} \tag{13}
$$
Since $\lfloor 10(m + n) \rfloor = 10m + \lfloor 10n \rfloor$, the candidates simplify to:
$$
d \in \{ 10m, \; 10m + \lfloor 10n \rfloor, \; 10m + \lfloor 10n \rfloor + 1, \; 10m + 10 \} \tag{14}
$$
Based on the Schubfach algorithm and the SW principle, if the value $10m$ or $10m + 10$ falls within the range $R_v$, it is selected as the optimal solution d. In cases where neither $10m$ nor $10m + 10$ lies within $R_v$, the optimal value d is determined as either $10m + \lfloor 10n \rfloor$ or $10m + \lfloor 10n \rfloor + 1$ in accordance with the rules of correct rounding, as shown in Equation (15). We decompose d as $d = \mathit{ten} + \mathit{one}$, where $\mathit{ten} = 10m$ and $\mathit{one} \in \{0, \lfloor 10n \rfloor, \lfloor 10n \rfloor + 1, 10\}$. The problem thus reduces to determining the appropriate value of $\mathit{one}$.
$$
\mathit{one} = \begin{cases} 0 & \text{if } 10m \in R_v \\ 10 & \text{else if } 10m + 10 \in R_v \\ \lfloor 10n \rfloor \text{ or } \lfloor 10n \rfloor + 1 & \text{else apply correct rounding} \end{cases} \tag{15}
$$

3.3.3. Selection Criteria for $\mathit{one}$

The selection of $\mathit{one}$ depends on the relationship between the rounding interval and the candidate values. Recall that the rounding interval for a regular floating-point number $v = c \cdot 2^q$ is $R_v = (f \mathbin{\%} 2 = 0) \;?\; [v - 2^{q-1}, v + 2^{q-1}] : (v - 2^{q-1}, v + 2^{q-1})$, where the rounding radius $2^{q-1}$ represents half a unit in the last place (ulp) of v.
  • Case $\mathit{one} = 0$ (i.e., $d = 10m$): This case applies when $10m \cdot 10^k$ falls inside the rounding interval $R_v$. The condition is derived as follows:
    The lower bound of the rounding interval must be less than $10m \cdot 10^k$:
    $$
    \begin{aligned}
    c \cdot 2^q - 10m \cdot 10^k &< 2^{q-1} \\
    c \cdot 2^q - \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor \cdot 10^{k+1} &< 2^{q-1} \\
    c \cdot 2^q \cdot 10^{-k-1} - \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor &< 2^{-1} \cdot 2^q \cdot 10^{-k-1} \\
    n &< 2^{q-1} \cdot 10^{-k-1}
    \end{aligned} \tag{16}
    $$
    When equality holds ($n = 2^{q-1} \cdot 10^{-k-1}$, or equivalently $10m \cdot 10^k = v - 2^{q-1}$), we apply the round-to-even rule, requiring c to be even:
    $$
    \mathit{one} = 0 \quad \text{if } 2^{q-1} \cdot 10^{-k-1} > n \ \text{ or } \ \bigl(2^{q-1} \cdot 10^{-k-1} = n \text{ and } c \bmod 2 = 0\bigr) \tag{17}
    $$
  • Case $\mathit{one} = 10$ (i.e., $d = 10m + 10$): This case applies when $(10m + 10) \cdot 10^k$ falls inside the rounding interval $R_v$. The condition is derived similarly:
    The upper bound of the rounding interval must be greater than $(10m + 10) \cdot 10^k$:
    $$
    \begin{aligned}
    (10m + 10) \cdot 10^k - c \cdot 2^q &< 2^{q-1} \\
    \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor \cdot 10^{k+1} + 10^{k+1} - c \cdot 2^q &< 2^{q-1} \\
    1 - \bigl(c \cdot 2^q \cdot 10^{-k-1} - \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor\bigr) &< 2^{-1} \cdot 2^q \cdot 10^{-k-1} \\
    1 - n &< 2^{q-1} \cdot 10^{-k-1}
    \end{aligned} \tag{18}
    $$
    When equality holds ($1 - n = 2^{q-1} \cdot 10^{-k-1}$, or equivalently $(10m + 10) \cdot 10^k = v + 2^{q-1}$), we again apply round-to-even:
    $$
    \mathit{one} = 10 \quad \text{if } 2^{q-1} \cdot 10^{-k-1} > 1 - n \ \text{ or } \ \bigl(2^{q-1} \cdot 10^{-k-1} = 1 - n \text{ and } c \bmod 2 = 0\bigr) \tag{19}
    $$
  • Case $\mathit{one} \in \{\lfloor 10n \rfloor, \lfloor 10n \rfloor + 1\}$: When neither boundary condition applies, the optimal value lies between $10m + \lfloor 10n \rfloor$ and $10m + \lfloor 10n \rfloor + 1$. We determine $\mathit{one}$ by rounding $10n$ to the nearest integer:
    If the fractional part $\{10n\} < 0.5$: $\mathit{one} = \lfloor 10n \rfloor$;
    If the fractional part $\{10n\} > 0.5$: $\mathit{one} = \lfloor 10n \rfloor + 1$;
    If the fractional part $\{10n\} = 0.5$: apply round-to-even.
    For irregular floating-point numbers (powers of two), additional verification is required to ensure that the selected value lies within the rounding interval $R_v$, as the interval boundaries differ for these special cases.

3.3.4. Algorithm Overview

Algorithm 1 summarizes our optimized variant of the Schubfach algorithm (xjb32 and xjb64 for float and double, respectively). Given inputs c and q, the algorithm returns d and k such that $d \cdot 10^k$ satisfies the SW principle. The computation of k follows Equation (12); the remainder of this section focuses on efficient computation of d for regular floating-point numbers.
The following sections describe the efficient implementation of Algorithm 1 in detail:
  • Lookup table precomputation;
  • Efficient computation of m;
  • Fast boundary condition testing for $\mathit{one} \in \{0, 10\}$;
  • Efficient computation of $10n$ and rounding;
  • Handling of irregular floating-point numbers;
  • Implementation of pseudocode.
The last part of this section briefly introduces the second stage: decimal-to-string conversion.
In essence, xjb diverges from the baseline Schubfach by transforming the conditional boundary tests (which cause branch mispredictions) into integer arithmetic and lookup operations. Furthermore, it reorders the computation of m and n to reduce data dependencies, directly addressing the instruction-level parallelism limitation identified in Section 1.1.
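To make the selection rules of Section 3.3 concrete, the following sketch evaluates the regular-number path of Algorithm 1 with exact rational arithmetic. It is an illustrative reference only, not the optimized xjb implementation: it uses Python's `Fraction` where xjb uses fixed-width integers and lookup tables, it ignores irregular inputs, and all function names are hypothetical.

```python
import struct
from fractions import Fraction
from math import floor

def decode_double(x: float) -> tuple[int, int]:
    """(c, q) with x == c * 2**q for a positive finite binary64 value."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e, f = (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
    return f + ((1 << 52) if e else 0), max(e, 1) - 1075

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2))."""
    return len(str(1 << q)) - 1 if q >= 0 else -len(str(1 << -q))

def algorithm1_regular(c: int, q: int) -> tuple[int, int]:
    """Return (d, k) for a regular value c * 2**q, following Algorithm 1."""
    k = floor_log10_pow2(q)
    scaled = c * Fraction(2) ** q * Fraction(10) ** (-k - 1)
    m = floor(scaled)
    n = scaled - m                              # fractional part, 0 <= n < 1
    ten = 10 * m
    fl = floor(10 * n)
    delta = 10 * n - fl                         # fractional part of 10n
    if delta == Fraction(1, 2):
        one = fl if fl % 2 == 0 else fl + 1     # round half to even
    else:
        one = fl if delta < Fraction(1, 2) else fl + 1
    radius = Fraction(2) ** (q - 1) * Fraction(10) ** (-k - 1)
    even = (c % 2 == 0)
    if radius > n or (radius == n and even):
        one = 0                                 # 10m lies in the rounding interval
    if radius > 1 - n or (radius == 1 - n and even):
        one = 10                                # 10m + 10 lies in the rounding interval
    return ten + one, k

def to_fraction(d: int, k: int) -> Fraction:
    """Exact value of d * 10**k."""
    return d * Fraction(10) ** k
```

For the double nearest to 1.3 this returns `(13000000000000000, -16)`; the trailing zeros of d are removed later, in the decimal-to-string stage.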

3.4. Lookup Table Precomputation

The algorithm in this paper employs a lookup table to store precomputed values of $10^{-k-1}$ for the full range of q: $[-149, 104]$ for float and $[-1074, 971]$ for double. These lookup tables use extended precision: 64-bit for float and 128-bit for double. The reference implementation is available in gen.py (https://github.com/xjb714/xjb/blob/main/py_test/gen.py, accessed on 20 April 2026).

3.4.1. Fundamental Calculation

Let B denote the bit length of each entry in the lookup table, with $B = 64$ for float and $B = 128$ for double. For any integer $e_{10}$ (representing a power of 10), we aim to represent $10^{e_{10}}$ in the form $f \cdot 2^{\lfloor e_2 \rfloor}$, where $1 \le f < 2$ and $e_2$ is a real number. This gives
$$
f \cdot 2^{\lfloor e_2 \rfloor} = 2^{e_2} = 10^{e_{10}} \tag{20}
$$
Taking the logarithm base 2 of both sides, we get
$$
e_2 = e_{10} \cdot \log_2(10) \tag{21}
$$
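Algorithm 1: The xjb Algorithm for Float-to-Decimal Conversion
Require: Floating-point components c (significand) and q (exponent)
Ensure: Decimal representation $d \cdot 10^k$ satisfying the SW principle
  1: $v \leftarrow c \cdot 2^q$
  2: if v is regular then
  3:    $k \leftarrow \lfloor q \cdot \log_{10}(2) \rfloor$
  4: else
  5:    $k \leftarrow \lfloor q \cdot \log_{10}(2) - \log_{10}(4/3) \rfloor$
  6: end if
  7: $m \leftarrow \lfloor v \cdot 10^{-k-1} \rfloor$
  8: $n \leftarrow v \cdot 10^{-k-1} - m$
  9: $\mathit{ten} \leftarrow 10m$
 10: $\delta \leftarrow 10n - \lfloor 10n \rfloor$ {fractional part of 10n}
 11: if $\delta = 0.5$ then
 12:    if $\lfloor 10n \rfloor \bmod 2 = 0$ then
 13:       $\mathit{one} \leftarrow \lfloor 10n \rfloor$ {round to even}
 14:    else
 15:       $\mathit{one} \leftarrow \lfloor 10n \rfloor + 1$
 16:    end if
 17: else if $\delta < 0.5$ then
 18:    $\mathit{one} \leftarrow \lfloor 10n \rfloor$ {round to nearest}
 19: else
 20:    $\mathit{one} \leftarrow \lfloor 10n \rfloor + 1$ {round to nearest}
 21: end if
 22: if v is irregular then
 23:    if $\delta > 2^{q-2} \cdot 10^{-k}$ then
 24:       $\mathit{one} \leftarrow \lfloor 10n \rfloor + 1$
 25:    end if
 26:    if $2^{q-2} \cdot 10^{-k-1} \ge n$ then
 27:       $\mathit{one} \leftarrow 0$
 28:    end if
 29: else
 30:    if $2^{q-1} \cdot 10^{-k-1} > n$ or ($2^{q-1} \cdot 10^{-k-1} = n$ and $c \bmod 2 = 0$) then
 31:       $\mathit{one} \leftarrow 0$ {minimum length}
 32:    end if
 33:    if $2^{q-1} \cdot 10^{-k-1} > 1 - n$ or ($2^{q-1} \cdot 10^{-k-1} = 1 - n$ and $c \bmod 2 = 0$) then
 34:       $\mathit{one} \leftarrow 10$ {minimum length}
 35:    end if
 36: end if
 37: $d \leftarrow \mathit{ten} + \mathit{one}$ {information preservation}
 38:
 39: return $d, k$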
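Solving for f gives
$$
f = \frac{10^{e_{10}}}{2^{\lfloor e_{10} \cdot \log_2(10) \rfloor}} \tag{22}
$$
The lookup table entries are computed using upward rounding:
$$
\mathit{lookup}[e_{10}] = \lceil f \cdot 2^{B-1} \rceil = \left\lceil \frac{10^{e_{10}}}{2^{\lfloor e_{10} \cdot \log_2(10) \rfloor}} \cdot 2^{B-1} \right\rceil = \left\lceil 10^{e_{10}} \cdot 2^{B - 1 - \lfloor e_{10} \cdot \log_2(10) \rfloor} \right\rceil \tag{23}
$$
Notably, $f \cdot 2^{B-1}$ is an integer for certain ranges of $e_{10}$: $0 \le e_{10} \le 27$ for float and $0 \le e_{10} \le 55$ for double.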

3.4.2. Detailed Calculation Process

The detailed calculation process is as follows:
  • Float
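    The range of $-k-1$ is $[-32, 44]$, which follows from the range of q in Equation (6), so the lookup table covers the powers of ten from $10^{-32}$ to $10^{44}$. The calculation process is as follows:
    $$
    \begin{aligned}
    & -32 \le e_{10} \le 44 \\
    & e_2 = \lfloor e_{10} \cdot \log_2(10) \rfloor - 63 \\
    & \mathit{pow10t} = \begin{cases} 2^{-e_2} \mathbin{//} 10^{-e_{10}} & \text{if } e_{10} < 0 \\ 10^{e_{10}} \mathbin{//} 2^{e_2} & \text{if } e_{10} \ge 20 \\ 10^{e_{10}} \cdot 2^{-e_2} & \text{if } 0 \le e_{10} \le 19 \end{cases} \\
    & f_{1,e_{10}} = \mathit{pow10} = \mathit{pow10t} + (0 \le e_{10} \le 27 \;?\; 0 : 1)
    \end{aligned} \tag{24}
    $$
    When $0 \le e_{10} \le 27$, the lookup table entry is exact, that is, $f_{1,e_{10}} \cdot 2^{\lfloor e_{10} \cdot \log_2(10) \rfloor - 63}$ and $10^{e_{10}}$ are equal. In other cases, the relative error is less than $2^{-63}$, expressed as follows:
    $$
    r_{1,e_{10}} = \frac{f_{1,e_{10}} \cdot 2^{\lfloor e_{10} \cdot \log_2(10) \rfloor - 63}}{10^{e_{10}}} \; \begin{cases} = 1 & \text{if } 0 \le e_{10} \le 27 \\ \in (1, 1 + 2^{-63}) & \text{if } e_{10} < 0 \text{ or } e_{10} > 27 \end{cases} \tag{25}
    $$
  • Double
    The range of $-k-1$ is $[-293, 323]$, which follows from the range of q in Equation (6), so the lookup table covers the powers of ten from $10^{-293}$ to $10^{323}$. The calculation process is as follows:
    $$
    \begin{aligned}
    & -293 \le e_{10} \le 323 \\
    & e_2 = \lfloor e_{10} \cdot \log_2(10) \rfloor - 127 \\
    & \mathit{pow10t} = \begin{cases} 2^{-e_2} \mathbin{//} 10^{-e_{10}} & \text{if } e_{10} < 0 \\ 10^{e_{10}} \mathbin{//} 2^{e_2} & \text{if } e_{10} \ge 39 \\ 10^{e_{10}} \cdot 2^{-e_2} & \text{if } 0 \le e_{10} \le 38 \end{cases} \\
    & f_{2,e_{10}} = \mathit{pow10} = \mathit{pow10t} + (0 \le e_{10} \le 55 \;?\; 0 : 1)
    \end{aligned} \tag{26}
    $$
    When $0 \le e_{10} \le 55$, the lookup table entry is exact, that is, $f_{2,e_{10}} \cdot 2^{\lfloor e_{10} \cdot \log_2(10) \rfloor - 127}$ and $10^{e_{10}}$ are equal. In other cases, the relative error is less than $2^{-127}$, expressed as follows:
    $$
    r_{2,e_{10}} = \frac{f_{2,e_{10}} \cdot 2^{\lfloor e_{10} \cdot \log_2(10) \rfloor - 127}}{10^{e_{10}}} \; \begin{cases} = 1 & \text{if } 0 \le e_{10} \le 55 \\ \in (1, 1 + 2^{-127}) & \text{if } e_{10} < 0 \text{ or } e_{10} > 55 \end{cases} \tag{27}
    $$
Let $r_1$ denote the error for float lookup table entries, $r_2$ for double entries, and r for both. In Algorithm 1, we retrieve $10^{-k-1}$ from the lookup table. The lookup table provides exact values when q falls within specific ranges:
$$
\begin{aligned}
\text{float}: \quad & 0 \le -k-1 \le 27 \iff -93 \le q \le -1 \\
\text{double}: \quad & 0 \le -k-1 \le 55 \iff -186 \le q \le -1
\end{aligned} \tag{28}
$$
For q outside these ranges, the lookup table entries have bounded relative errors:
$$
\begin{aligned}
\text{float}: \quad & 0 < r_1 - 1 < 2^{-63} \\
\text{double}: \quad & 0 < r_2 - 1 < 2^{-127}
\end{aligned} \tag{29}
$$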

3.4.3. Storage Requirements

The float lookup table requires 616 bytes of storage, calculated as $(44 - (-32) + 1) \times 8$ bytes (77 entries × 8 bytes each). The double lookup table requires 9872 bytes, calculated as $(323 - (-293) + 1) \times 16$ bytes (617 entries × 16 bytes each).
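The table construction and the storage figures can be reproduced in a few lines of big-integer Python. This is a sketch written for illustration and is not the authors' gen.py; the names `floor_log2_pow10` and `build_table` are hypothetical, and the entries are obtained with a single ceiling division rather than the three-way case split of Equations (24) and (26), which yields the same upward-rounded values.

```python
from fractions import Fraction

def floor_log2_pow10(e10: int) -> int:
    """Exact floor(e10 * log2(10)) from bit lengths of 10**|e10|."""
    if e10 >= 0:
        return (10 ** e10).bit_length() - 1
    return -(10 ** -e10).bit_length()    # 10**|e10| is never a power of two

def build_table(lo: int, hi: int, bits: int) -> dict[int, int]:
    """Map e10 -> ceil(10**e10 / 2**e2), with e2 = floor(e10*log2(10)) - (bits - 1)."""
    table = {}
    for e10 in range(lo, hi + 1):
        e2 = floor_log2_pow10(e10) - (bits - 1)
        if e10 < 0:
            num, den = 1 << -e2, 10 ** -e10
        elif e2 > 0:
            num, den = 10 ** e10, 1 << e2
        else:
            num, den = (10 ** e10) << -e2, 1
        table[e10] = -(-num // den)      # ceiling division on integers
    return table

def relative_error(entry: int, e10: int, bits: int) -> Fraction:
    """r = entry * 2**e2 / 10**e10, the ratio bounded in Equations (25) and (27)."""
    e2 = floor_log2_pow10(e10) - (bits - 1)
    return entry * Fraction(2) ** e2 / Fraction(10) ** e10

TABLE32 = build_table(-32, 44, 64)
TABLE64 = build_table(-293, 323, 128)
```

Every entry has its top bit set, so it fits exactly in one 64-bit word for float or in two 64-bit words for double.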

3.4.4. Implementation Notes

The lookup table precomputation uses efficient integer arithmetic to avoid precision loss during calculations. The conditional logic in Equations (24) and (26) optimizes the computation based on the sign and magnitude of e 10 , ensuring efficient generation of accurate lookup table entries.

3.5. Efficient Computation of m

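This section presents an efficient method for calculating m in Algorithm 1, which is defined as $m = \lfloor v \cdot 10^{-k-1} \rfloor$.

3.5.1. Key Proof

In Algorithm 1, we need to compute $m = \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor$. We aim to prove
$$
m = \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor = \lfloor c \cdot 2^q \cdot r \cdot 10^{-k-1} \rfloor \tag{30}
$$
where r is the lookup table error defined in Equations (25) and (27). When the condition in Equation (28) is met, $r = 1$, and the equation holds trivially. For $r \neq 1$,
$$
\begin{aligned}
\text{float}: \quad & 1 < r < 1 + 2^{-63} \\
\text{double}: \quad & 1 < r < 1 + 2^{-127}
\end{aligned} \tag{31}
$$
Computing the range of $2^q \cdot 10^{-k-1}$, we get
$$
2^q \cdot 10^{-k-1} = 10^{-1} \cdot 2^q \cdot 10^{-\lfloor q \cdot \lg(2) \rfloor} = 10^{-1} \cdot 10^{q \cdot \lg(2) - \lfloor q \cdot \lg(2) \rfloor} \tag{32}
$$
When q is not 0, $q \cdot \lg(2)$ is not an integer, so the following holds:
$$
q \cdot \lg(2) \neq \lfloor q \cdot \lg(2) \rfloor \;\Rightarrow\; 0 < q \cdot \lg(2) - \lfloor q \cdot \lg(2) \rfloor < 1 \tag{33}
$$
When q is 0, $q \cdot \lg(2) - \lfloor q \cdot \lg(2) \rfloor = 0$, so the final conclusion is
$$
10^{-1} \le 2^q \cdot 10^{-k-1} < 1 \tag{34}
$$
Consequently,
$$
c \cdot 2^q \cdot 10^{-k-1} = c \cdot \frac{2^{q-k-1}}{5^{k+1}} \in [0.1c, c) \tag{35}
$$
Therefore,
$$
c \cdot 2^q \cdot 10^{-k-1} = \begin{cases} c \cdot \dfrac{2^{q-k-1}}{5^{k+1}} & q \ge 1 \\ \dfrac{c}{2^{1+k-q} \cdot 5^{k+1}} = \dfrac{c}{10} & q = 0 \\ c \cdot \dfrac{5^{-k-1}}{2^{1+k-q}} & q < 0 \end{cases} \tag{36}
$$
Write
$$
c \cdot 2^q \cdot 10^{-k-1} = c \cdot \frac{x}{y} < c \tag{37}
$$
Then
$$
(x, y) = \begin{cases} (2^{q-k-1}, \; 5^{k+1}) & q \ge 1 \\ (1, \; 10) & q = 0 \\ (5^{-k-1}, \; 2^{1+k-q}) & q < 0 \end{cases} \tag{38}
$$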
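The case split for $(x, y)$ can be checked exhaustively over the exponent range of doubles. The sketch below uses exact integers and `Fraction`; the function names are hypothetical, and k is taken as the regular-case value $\lfloor q \cdot \lg(2) \rfloor$.

```python
from fractions import Fraction

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2))."""
    return len(str(1 << q)) - 1 if q >= 0 else -len(str(1 << -q))

def xy(q: int) -> tuple[int, int]:
    """Integer pair (x, y) with x / y == 2**q * 10**(-k-1) for regular values."""
    k = floor_log10_pow2(q)
    if q >= 1:
        return 2 ** (q - k - 1), 5 ** (k + 1)
    if q == 0:
        return 1, 10
    return 5 ** (-k - 1), 2 ** (1 + k - q)

def scale_factor(q: int) -> Fraction:
    """Exact value of 2**q * 10**(-k-1), computed directly."""
    k = floor_log10_pow2(q)
    return Fraction(2) ** q * Fraction(10) ** (-k - 1)
```

All exponents passed to `**` in `xy` are non-negative in each branch, so both components stay integers.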

3.5.2. Bit Width Calculation

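Define maximum values for c:
$$
\begin{aligned}
\text{float}: \quad & c \le c_{max} = C_1 = 2^{24} - 1 \\
\text{double}: \quad & c \le c_{max} = C_2 = 2^{53} - 1
\end{aligned} \tag{39}
$$
Let C denote either $C_1$ or $C_2$, depending on the precision.
For $y > C$, compute $P^*$ and $Q^*$ for each q using the function $f(C, x, y)$ of Appendix A.5, and find the minimum $\mathit{BIT}$ such that
$$
\frac{x}{y} \left(1 + 2^{-\mathit{BIT}}\right) < \frac{P^*}{Q^*} \tag{40}
$$
For $y \le C$,
$$
c \cdot \frac{x}{y} \left(1 + \frac{1}{C \cdot y}\right) = \frac{c \cdot x + \frac{c}{C} \cdot \frac{x}{y}}{y} < \frac{c \cdot x + 1}{y} \tag{41}
$$
Thus,
$$
\left\lfloor c \cdot \frac{x}{y} \right\rfloor = \left\lfloor c \cdot \frac{x}{y} \left(1 + \frac{1}{C \cdot y}\right) \right\rfloor \tag{42}
$$
Similarly, find the minimum $\mathit{BIT}$ such that
$$
\frac{x}{y} \left(1 + 2^{-\mathit{BIT}}\right) < \frac{x}{y} \left(1 + \frac{1}{C \cdot y}\right) \tag{43}
$$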

3.5.3. Results

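The maximum of the minimum $\mathit{BIT}$ values over all q, computed in about 1–2 s by test1.py (https://github.com/xjb714/xjb/blob/main/py_test/test1.py, accessed on 20 April 2026), is
$$
\begin{aligned}
\text{float}: \quad & \mathit{BIT}_{max} = 52 \\
\text{double}: \quad & \mathit{BIT}_{max} = 113
\end{aligned} \tag{44}
$$
Thus,
$$
\begin{aligned}
\text{float}: \quad & \left\lfloor c \cdot \frac{x}{y} \right\rfloor = \left\lfloor c \cdot \frac{x}{y} \cdot (1 + 2^{-52}) \right\rfloor = \left\lfloor c \cdot \frac{x}{y} \cdot r_1 \right\rfloor \\
\text{double}: \quad & \left\lfloor c \cdot \frac{x}{y} \right\rfloor = \left\lfloor c \cdot \frac{x}{y} \cdot (1 + 2^{-113}) \right\rfloor = \left\lfloor c \cdot \frac{x}{y} \cdot r_2 \right\rfloor
\end{aligned} \tag{45}
$$
This result confirms that m can be calculated efficiently using the lookup table values, even with their inherent errors. Once m is determined, $\mathit{ten} = 10m$ can be computed quickly.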
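The practical consequence for binary32 is that m reduces to one integer multiplication by a 64-bit table entry followed by a right shift. The sketch below spot-checks this against exact rational arithmetic. It is an illustration under stated assumptions: the table entry is recomputed on the fly with a ceiling (as in Section 3.4) instead of being read from a stored table, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import ceil, floor

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2))."""
    return len(str(1 << q)) - 1 if q >= 0 else -len(str(1 << -q))

def floor_log2_pow10(e10: int) -> int:
    """Exact floor(e10 * log2(10))."""
    return (10 ** e10).bit_length() - 1 if e10 >= 0 else -(10 ** -e10).bit_length()

def table_entry(e10: int, bits: int = 64) -> tuple[int, int]:
    """Return (entry, e2): the upward-rounded significand of 10**e10 and its binary exponent."""
    e2 = floor_log2_pow10(e10) - (bits - 1)
    return ceil(Fraction(10) ** e10 / Fraction(2) ** e2), e2

def m_via_table(c: int, q: int) -> int:
    """m computed as (c * entry) >> shift, using only integer operations."""
    k = floor_log10_pow2(q)
    entry, e2 = table_entry(-k - 1)
    return (c * entry) >> -(q + e2)      # q + e2 is negative over the float range

def m_exact(c: int, q: int) -> int:
    """m computed as floor(c * 2**q * 10**(-k-1)) in exact arithmetic."""
    k = floor_log10_pow2(q)
    return floor(c * Fraction(2) ** q * Fraction(10) ** (-k - 1))

def spot_check(samples_per_q: int = 20, seed: int = 0) -> bool:
    """Compare both computations for every float exponent and a few significands."""
    rng = random.Random(seed)
    for q in range(-149, 105):
        lo = 1 if q == -149 else (1 << 23)
        cs = [lo, (1 << 24) - 1] + [rng.randrange(lo, 1 << 24) for _ in range(samples_per_q)]
        if any(m_via_table(c, q) != m_exact(c, q) for c in cs):
            return False
    return True
```

A random spot check is not a proof; it only illustrates the statement of Equation (45), whose justification is the bit-width analysis above.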

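3.6. Fast Boundary Condition Testing for $\mathit{one} = 0$ and $\mathit{one} = 10$

In Algorithm 1, the conditions for $\mathit{one} = 0$ and $\mathit{one} = 10$ are tested on lines 30 and 33, with the corresponding assignments on lines 31 and 34. This section introduces an optimized method to quickly test these boundary conditions using equivalent mathematical formulations.

3.6.1. Equivalent Conditions for Boundary Testing

We start by deriving equivalent mathematical conditions for testing $\mathit{one} = 0$ and $\mathit{one} = 10$.
  • Case 1: Testing $\mathit{one} = 0$
    When $2^{-1} \cdot 2^q \cdot 10^{-k-1} = n$, this is equivalent to
    $$
    \begin{aligned}
    c \cdot 2^q \cdot 10^{-k-1} - \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor &= 2^{-1} \cdot 2^q \cdot 10^{-k-1} \\
    (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} &= \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor
    \end{aligned} \tag{46}
    $$
  • Case 2: Testing $\mathit{one} = 10$
    When $2^{-1} \cdot 2^q \cdot 10^{-k-1} = 1 - n$, this is equivalent to
    $$
    \begin{aligned}
    \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor - c \cdot 2^q \cdot 10^{-k-1} + 1 &= 2^{-1} \cdot 2^q \cdot 10^{-k-1} \\
    (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} &= \lfloor c \cdot 2^q \cdot 10^{-k-1} \rfloor + 1
    \end{aligned} \tag{47}
    $$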

3.6.2. Integer Testing Analysis

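To further analyze these conditions, we start with the range of $2^{q-1} \cdot 10^{-k-1}$ from Equation (34):
$$2^{q-1} \cdot 10^{-k-1} \in [0.05,\ 0.5) \tag{48}$$
  • Analysis for $one = 0$
    For the $one = 0$ case, we can derive
    $$\lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor - 1 < c \cdot 2^{q} \cdot 10^{-k-1} - 0.5 < (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} \le c \cdot 2^{q} \cdot 10^{-k-1} - 0.05 < \lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor + 1 \tag{49}$$
    This implies that when $(2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is an integer, it must equal $\lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor$.
  • Analysis for $one = 10$
    Similarly, for the $one = 10$ case,
    $$\lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor < c \cdot 2^{q} \cdot 10^{-k-1} + 0.05 \le (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} < c \cdot 2^{q} \cdot 10^{-k-1} + 0.5 < \lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor + 2 \tag{50}$$
    This implies that when $(2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is an integer, it must equal $\lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor + 1$.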

3.6.3. Key Insight: Integer Divisibility Test

The key insight is that testing for $one = 0$ or $one = 10$ is equivalent to checking whether $(2c \pm 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is an integer. This can be rewritten as follows:
$$(2c \pm 1) \cdot 2^{q-1} \cdot 10^{-k-1} = (2c \pm 1) \cdot 2^{q-k-2} \cdot 5^{-k-1} \tag{51}$$
We analyze different ranges of $q$ to simplify this condition:
  • Case $q \ge 2$
    From $q \ge 2$, we get $k \ge 0$. The expression simplifies to checking whether $(2c \pm 1) \cdot 2^{q-k-2}$ is divisible by $5^{k+1}$. Since 2 and 5 are coprime, this reduces to checking whether $(2c \pm 1)$ is divisible by $5^{k+1}$:
    $$(2c \pm 1) \bmod 5^{k+1} = 0 \tag{52}$$
    Let $t$ be a positive integer such that $2c \pm 1 = t \cdot 5^{k+1}$. Since $2c \pm 1$ is odd, $t$ must also be odd. Considering the ranges of $c$ for float and double,
    $$\begin{aligned} \text{float: } & 2c - 1 \in [2^{24} + 1,\ 2^{25} - 3]; \quad 2c + 1 \in [2^{24} + 3,\ 2^{25} - 1] \\ \text{double: } & 2c - 1 \in [2^{53} + 1,\ 2^{54} - 3]; \quad 2c + 1 \in [2^{53} + 3,\ 2^{54} - 1] \end{aligned} \tag{53}$$
    This gives us the range for $t$:
    $$\begin{aligned} \text{float: } & \frac{2^{24} + 1}{5^{k+1}} \le t \le \frac{2^{25} - 1}{5^{k+1}} \\ \text{double: } & \frac{2^{53} + 1}{5^{k+1}} \le t \le \frac{2^{54} - 1}{5^{k+1}} \end{aligned} \tag{54}$$
    The maximum values of $k$ for which at least one odd integer $t$ exists are
    $$\begin{aligned} \text{float: } & k_{max} = 9 \Rightarrow q_{max} = 33,\ t = 3 \\ \text{double: } & k_{max} = 22 \Rightarrow q_{max} = 76,\ t = 1 \end{aligned} \tag{55}$$
  • Case $1 \ge q \ge 0$
    The denominator $2^{2+k-q} \cdot 5^{k+1}$ is even, while the numerator $(2c \pm 1)$ is odd, so no solution exists.
  • Case $q < 0$
    The denominator $2^{2+k-q}$ is even, while the numerator $(2c \pm 1) \cdot 5^{-k-1}$ is odd, so no solution exists.
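The maximum values of $k$, $q$, and $t$ in the $q \ge 2$ case can be reproduced by a direct search. The sketch below is illustrative only; the function names are hypothetical, and `p` denotes the significand width (24 for float, 53 for double).

```python
def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def smallest_odd_t(p: int, k: int):
    """Smallest odd t with 2**p + 1 <= t * 5**(k+1) <= 2**(p+1) - 1, else None."""
    lo, hi = (1 << p) + 1, (1 << (p + 1)) - 1
    d = 5 ** (k + 1)
    t = -(-lo // d) | 1          # ceil(lo / d), then round up to an odd value
    return t if t * d <= hi else None

def k_q_t_max(p: int):
    """Return (k_max, q_max, t) for a significand of p bits."""
    hi = (1 << (p + 1)) - 1
    k_max, t_at_max = None, None
    k = 0
    while 5 ** (k + 1) <= hi:
        t = smallest_odd_t(p, k)
        if t is not None:
            k_max, t_at_max = k, t
        k += 1
    q = 0
    while floor_log10_pow2(q + 1) <= k_max:   # largest q whose k equals k_max
        q += 1
    return k_max, q, t_at_max
```

For example, `k_q_t_max(24)` yields `(9, 33, 3)` and `k_q_t_max(53)` yields `(22, 76, 1)`, matching the values stated above.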

3.6.4. Summary of Boundary Conditions

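In summary, $(2c \pm 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is an integer exactly in the following situations:
$$\begin{aligned} \text{float: } & 2 \le q \le 33 \ \&\&\ (2c \pm 1) \bmod 5^{k+1} = 0 \\ \text{double: } & 2 \le q \le 76 \ \&\&\ (2c \pm 1) \bmod 5^{k+1} = 0 \end{aligned} \tag{56}$$
The range of $-k-1$ is as follows:
$$\text{float: } -10 \le -k-1 \le -1 \qquad \text{double: } -23 \le -k-1 \le -1 \tag{57}$$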
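The divisibility rule summarized above can be compared against an exact rational evaluation. This is a minimal sketch for the float case; the function names are hypothetical, `sign` is +1 or -1 for $2c + 1$ or $2c - 1$, and the significand is assumed to satisfy $2^{23} < c < 2^{24}$.

```python
from fractions import Fraction

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def is_integer_exact(c: int, q: int, sign: int) -> bool:
    """Evaluate (2c + sign) * 2**(q-1) * 10**(-k-1) exactly and test integrality."""
    k = floor_log10_pow2(q)
    value = Fraction(2 * c + sign) * Fraction(2) ** (q - 1) * Fraction(10) ** (-k - 1)
    return value.denominator == 1

def is_integer_fast(c: int, q: int, sign: int, q_max: int = 33) -> bool:
    """The summarized rule: 2 <= q <= q_max and (2c + sign) mod 5**(k+1) == 0."""
    if not 2 <= q <= q_max:
        return False
    k = floor_log10_pow2(q)
    return (2 * c + sign) % 5 ** (k + 1) == 0

def rules_agree(cs, q_range=range(-149, 105)) -> bool:
    """Check that both predicates agree for the given significands and exponents."""
    return all(
        is_integer_exact(c, q, s) == is_integer_fast(c, q, s)
        for c in cs for q in q_range for s in (-1, 1)
    )
```

With $2c + 1 = 3 \cdot 5^{10}$, that is $c = 14648437$, both predicates report a boundary candidate at $q = 33$; the double case would use `q_max=76` and a 53-bit significand.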

3.6.5. Efficient Implementation

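We can further simplify the testing conditions using bitwise operations. For $one = 0$, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$,
$$\begin{aligned} \text{float: } & \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor \\ \text{double: } & \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{64} \cdot n \rfloor \end{aligned} \tag{58}$$
For $one = 10$, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$, the following conclusions can be drawn:
$$\begin{aligned} \text{float: } & \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor \;\Rightarrow\; \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor = 2^{36} - 1 - \lfloor 2^{36} \cdot n \rfloor \\ \text{double: } & \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{64} - 2^{64} \cdot n \rfloor \;\Rightarrow\; \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{64} - 2^{64} \cdot n \rfloor = 2^{64} - 1 - \lfloor 2^{64} \cdot n \rfloor \end{aligned} \tag{59}$$
Whether $\lfloor 2^{36} - 2^{36} \cdot n \rfloor = 2^{36} - 1 - \lfloor 2^{36} \cdot n \rfloor$ holds in Equation (59) depends on whether $2^{36} \cdot n$ is an integer: the identity holds exactly when it is not. This is equivalent to asking whether the following values are integers when Equation (56) holds (the same applies to double):
$$\begin{aligned} \text{float: } & 2^{36} \cdot (m + n) = c \cdot 2^{q+36} \cdot 10^{-k-1} = c \cdot 2^{q-k+35} \cdot 5^{-k-1} = \frac{c \cdot 2^{q-k+35}}{5^{k+1}} \\ \text{double: } & 2^{64} \cdot (m + n) = c \cdot 2^{q+64} \cdot 10^{-k-1} = c \cdot 2^{q-k+63} \cdot 5^{-k-1} = \frac{c \cdot 2^{q-k+63}}{5^{k+1}} \end{aligned} \tag{60}$$
Suppose that $c$ is divisible by $5^{k+1}$ (where $t$ is a temporary integer variable):
$$c = t \cdot 5^{k+1}; \quad t \ge 1 \tag{61}$$
Then, under Equation (61),
$$2c \pm 1 = 2 \cdot t \cdot 5^{k+1} \pm 1 \tag{62}$$
The expression in Equation (62) is not divisible by $5^{k+1}$, which contradicts Equation (56), so $c$ is not divisible by $5^{k+1}$. Therefore, for float, $c \cdot 2^{q+36} \cdot 10^{-k-1}$ and $2^{36} \cdot n$ are not integers. For double, $c \cdot 2^{64+q} \cdot 10^{-k-1}$ and $2^{64} \cdot n$ are not integers; that is,
$$\begin{aligned} \text{float: } & \lfloor 2^{36} - 2^{36} \cdot n \rfloor = 2^{36} + \lfloor -2^{36} \cdot n \rfloor = 2^{36} - 1 - \lfloor 2^{36} \cdot n \rfloor \\ \text{double: } & \lfloor 2^{64} - 2^{64} \cdot n \rfloor = 2^{64} + \lfloor -2^{64} \cdot n \rfloor = 2^{64} - 1 - \lfloor 2^{64} \cdot n \rfloor \end{aligned} \tag{63}$$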
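Therefore, Equation (59) is correct. Next, we show that $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ is a necessary and sufficient condition for $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor$. The same applies to double, expressed as follows:
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n \;\Leftrightarrow\; \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n \;\Leftrightarrow\; \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{64} \cdot n \rfloor \end{aligned} \tag{64}$$
Similarly, $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$ is a necessary and sufficient condition for $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor$. The same applies to double, expressed as follows:
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n \;\Leftrightarrow\; \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n \;\Leftrightarrow\; \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{64} - 2^{64} \cdot n \rfloor \end{aligned} \tag{65}$$
The forward implications in Equations (64) and (65) are immediate. We now prove Equation (64). For float, only the reverse implication needs to be shown; that is, $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ must hold whenever $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor$ holds, or equivalently, $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor \ne \lfloor 2^{36} \cdot n \rfloor$ must hold whenever $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne n$. The proof is by contradiction.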
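Assume that $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor$ holds when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne n$. Then
$$\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor \;\Rightarrow\; 0 < \left| 2^{35} \cdot 2^{q} \cdot 10^{-k-1} - 2^{36} \cdot n \right| < 1 \;\Rightarrow\; 0 < \left| (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m \right| < 2^{-36} \tag{66}$$
From Equation (49),
$$m - 1 < (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} < m + 1 \tag{67}$$
Let $n_{-}$ denote the decimal part of $(2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1}$; thus, we have
$$\left| (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m \right| = \begin{cases} n_{-}, & \text{if } (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} > m \\ 1 - n_{-}, & \text{if } (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} < m \end{cases} \tag{68}$$
Substituting Equation (68) into Equation (66), we get
$$0 < \left| (2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m \right| < 2^{-36} \;\Rightarrow\; 0 < n_{-} < 2^{-36} \ \text{ or } \ 0 < 1 - n_{-} < 2^{-36} \tag{69}$$
The range of $n_{-}$ for double is obtained in the same way. Therefore,
$$\begin{aligned} \text{float: } & n_{-} \in (0,\ 2^{-36}) \cup (1 - 2^{-36},\ 1) \\ \text{double: } & n_{-} \in (0,\ 2^{-64}) \cup (1 - 2^{-64},\ 1) \end{aligned} \tag{70}$$
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne n$, it is known from Equation (46) that $(2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is not an integer. Therefore,
$$0 < n_{-} < 1 \tag{71}$$
It is only necessary to prove that Equation (70) does not hold. We therefore examine the range of the decimal part $n_{-}$ when $(2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is not an integer. According to Equation (51),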
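$$(2c - 1) \cdot 2^{q-1} \cdot 10^{-k-1} = (2c - 1) \cdot \frac{x}{y} = \begin{cases} \dfrac{(2c - 1) \cdot 2^{q-k-2}}{5^{k+1}}, & q \ge 2 \\ \dfrac{2c - 1}{2^{2+k-q} \cdot 5^{k+1}}, & 1 \ge q \ge 0 \\ \dfrac{(2c - 1) \cdot 5^{-k-1}}{2^{2+k-q}}, & q < 0 \end{cases} \tag{72}$$
The maximum value of $2c - 1$ is
$$\text{float: } (2c - 1)_{max} = 2^{25} - 3 \qquad \text{double: } (2c - 1)_{max} = 2^{54} - 3 \tag{73}$$
The discussion proceeds according to the range of the denominator in Equation (72).
  • $y \le (2c - 1)_{max}$
    When $y \le (2c - 1)_{max}$, with $y_{max}$ given by Equation (73), the following holds:
    $$\frac{1}{y_{max}} \le n_{-} \le 1 - \frac{1}{y_{max}} \qquad \frac{1}{y_{max}} \le 1 - n_{-} \le 1 - \frac{1}{y_{max}} \tag{74}$$
    Therefore, when $y \le (2c - 1)_{max}$, Equation (70) does not hold.
  • $y > (2c - 1)_{max}$
    Call the function of Appendix A.5 to calculate the upper and lower rational approximations $\frac{P^{*}}{Q^{*}}$ and $\frac{P_{*}}{Q_{*}}$:
    $$\left( \frac{P^{*}}{Q^{*}},\ \frac{P_{*}}{Q_{*}} \right) = f\left( (2c - 1)_{max},\ x,\ y \right) \tag{75}$$
    Therefore, for $n_{-}$, the following conclusion can be drawn from Appendix A.4.
    $$n_{-} \in \left[ \frac{Q_{*} x \,\%\, y}{y},\ \frac{Q^{*} x \,\%\, y}{y} \right] \tag{76}$$
    By exhausting all possibilities, we have the following (the test code file is test3.py, https://github.com/xjb714/xjb/blob/main/py_test/test3.py, accessed on 20 April 2026):
    $$\text{float: } 2^{-33} < n_{-} < 1 - 2^{-29} \qquad \text{double: } 2^{-62} < n_{-} < 1 - 2^{-63} \tag{77}$$
    $$\begin{aligned} \text{float: } & \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (0,\ 2^{-36}) = \emptyset, \quad \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (1 - 2^{-36},\ 1) = \emptyset \\ \text{double: } & \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (0,\ 2^{-64}) = \emptyset, \quad \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (1 - 2^{-64},\ 1) = \emptyset \end{aligned} \tag{78}$$
    Therefore, when $y > (2c - 1)_{max}$, Equation (70) does not hold.
In summary, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne n$, Equation (70) does not hold; that is, $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor \ne \lfloor 2^{36} \cdot n \rfloor$ must hold. Consequently, when $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} \cdot n \rfloor$ holds, $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ must hold. Therefore, Equation (64) holds.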
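Similarly, it can be proved that when $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor$ holds, $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$ must hold. The same applies to double. Again by contradiction, for float, assume that $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor$ holds when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne 1 - n$. That is,
$$\begin{aligned} & \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor \;\Rightarrow\; 0 < \left| 2^{35} \cdot 2^{q} \cdot 10^{-k-1} - 2^{36} + 2^{36} \cdot n \right| < 1 \\ & \Rightarrow\; 0 < \left| 2^{q-1} \cdot 10^{-k-1} - 1 + n \right| < 2^{-36} \;\Rightarrow\; -2^{-36} < (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m - 1 < 2^{-36} \end{aligned} \tag{79}$$
From Equation (50),
$$m < (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} < m + 2 \tag{80}$$
Let $n_{+}$ denote the decimal part of $(2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1}$; thus, we have
$$\left| (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m - 1 \right| = \begin{cases} n_{+}, & \text{if } (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} > m + 1 \\ 1 - n_{+}, & \text{if } (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} < m + 1 \end{cases} \tag{81}$$
Substituting Equation (81) into Equation (79), we get
$$0 < \left| (2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} - m - 1 \right| < 2^{-36} \;\Rightarrow\; 0 < 1 - n_{+} < 2^{-36} \ \text{ or } \ 0 < n_{+} < 2^{-36} \tag{82}$$
The range of $n_{+}$ for double is obtained in the same way. Therefore,
$$\begin{aligned} \text{float: } & n_{+} \in (0,\ 2^{-36}) \cup (1 - 2^{-36},\ 1) \\ \text{double: } & n_{+} \in (0,\ 2^{-64}) \cup (1 - 2^{-64},\ 1) \end{aligned} \tag{83}$$
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne 1 - n$, it is known from Equation (47) that $(2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is not an integer. Therefore,
$$0 < n_{+} < 1 \tag{84}$$
It is only necessary to prove that Equation (83) does not hold. We therefore examine the range of the decimal part $n_{+}$ when $(2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1}$ is not an integer. According to Equation (51),
$$(2c + 1) \cdot 2^{q-1} \cdot 10^{-k-1} = (2c + 1) \cdot \frac{x}{y} = \begin{cases} \dfrac{(2c + 1) \cdot 2^{q-k-2}}{5^{k+1}}, & q \ge 2 \\ \dfrac{2c + 1}{2^{2+k-q} \cdot 5^{k+1}}, & 1 \ge q \ge 0 \\ \dfrac{(2c + 1) \cdot 5^{-k-1}}{2^{2+k-q}}, & q < 0 \end{cases} \tag{85}$$
The maximum value of $2c + 1$ is
$$\text{float: } (2c + 1)_{max} = 2^{25} - 1 \qquad \text{double: } (2c + 1)_{max} = 2^{54} - 1 \tag{86}$$
The discussion proceeds according to the range of the denominator in Equation (85).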
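  • $y \le (2c + 1)_{max}$
    When $y \le (2c + 1)_{max}$, with $y_{max}$ given by Equation (86), the following holds:
    $$\frac{1}{y_{max}} \le n_{+} \le 1 - \frac{1}{y_{max}} \qquad \frac{1}{y_{max}} \le 1 - n_{+} \le 1 - \frac{1}{y_{max}} \tag{87}$$
    Therefore, when $y \le (2c + 1)_{max}$, Equation (83) does not hold.
  • $y > (2c + 1)_{max}$
    Call the function of Appendix A.5 to calculate the upper and lower rational approximations $\frac{P^{*}}{Q^{*}}$ and $\frac{P_{*}}{Q_{*}}$:
    $$\left( \frac{P^{*}}{Q^{*}},\ \frac{P_{*}}{Q_{*}} \right) = f\left( (2c + 1)_{max},\ x,\ y \right) \tag{88}$$
    Therefore, for $n_{+}$, the following conclusion can be drawn from the formula in Appendix A.4.
    $$n_{+} \in \left[ \frac{Q_{*} x \,\%\, y}{y},\ \frac{Q^{*} x \,\%\, y}{y} \right] \tag{89}$$
    By exhausting all possibilities, we have the following (the test code file is test7.py, https://github.com/xjb714/xjb/blob/main/py_test/test7.py, accessed on 20 April 2026):
    $$\text{float: } 2^{-33} < n_{+} < 1 - 2^{-29} \qquad \text{double: } 2^{-62} < n_{+} < 1 - 2^{-63} \tag{90}$$
    $$\begin{aligned} \text{float: } & \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (0,\ 2^{-36}) = \emptyset, \quad \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (1 - 2^{-36},\ 1) = \emptyset \\ \text{double: } & \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (0,\ 2^{-64}) = \emptyset, \quad \left[ \tfrac{Q_{*} x \,\%\, y}{y},\ \tfrac{Q^{*} x \,\%\, y}{y} \right] \cap (1 - 2^{-64},\ 1) = \emptyset \end{aligned} \tag{91}$$
    Therefore, when $y > (2c + 1)_{max}$, Equation (83) does not hold.
In summary, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} \ne 1 - n$, Equation (83) does not hold; that is, $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor \ne \lfloor 2^{36} - 2^{36} \cdot n \rfloor$ must hold. Consequently, when $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor 2^{36} - 2^{36} \cdot n \rfloor$ holds, $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$ must hold. The same is true for double. Therefore, Equation (65) holds.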
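The two equivalences (64) and (65) can be illustrated for float with exact rational arithmetic. The sketch below is not one of the authors' test scripts; `boundary_flags` and `sampled_equivalence` are hypothetical names, the exponent range $[-149, 104]$ is an assumption, and random sampling only illustrates the statement that the proof establishes.

```python
from fractions import Fraction
from math import floor
import random

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def boundary_flags(c: int, q: int, bits: int = 36):
    """Return ((exact, floored) for one = 0, (exact, floored) for one = 10)."""
    k = floor_log10_pow2(q)
    a = Fraction(2) ** q * Fraction(10) ** (-k - 1)   # 2^q * 10^(-k-1)
    v = c * a                                         # m + n
    n = v - floor(v)
    lhs = floor(a * 2 ** (bits - 1))                  # floor(2^(bits-1) * 2^q * 10^(-k-1))
    zero = (a / 2 == n, lhs == floor(n * 2 ** bits))
    ten = (a / 2 == 1 - n, lhs == floor(2 ** bits - n * 2 ** bits))
    return zero, ten

def sampled_equivalence(samples: int = 2000, seed: int = 1) -> bool:
    """Both sides of (64) and of (65) should agree on every sampled input."""
    rng = random.Random(seed)
    for _ in range(samples):
        c = rng.randrange((1 << 23) + 1, 1 << 24)
        q = rng.randrange(-149, 105)
        zero, ten = boundary_flags(c, q)
        if zero[0] != zero[1] or ten[0] != ten[1]:
            return False
    return True
```

For $c = 14648438$ and $q = 33$ (where $2c - 1 = 3 \cdot 5^{10}$), the $one = 0$ pair is `(True, True)`; for $c = 14648437$ the $one = 10$ pair is `(True, True)`.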
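The following conclusions hold:
$$\begin{aligned} \text{float: } & \lfloor 2^{36} - 2^{36} \cdot n \rfloor = \begin{cases} 2^{36} - 1 - \lfloor 2^{36} \cdot n \rfloor, & \text{if } c \cdot 2^{36+q} \cdot 10^{-k-1} \notin \mathbb{Z} \\ 2^{36} - \lfloor 2^{36} \cdot n \rfloor, & \text{if } c \cdot 2^{36+q} \cdot 10^{-k-1} \in \mathbb{Z} \end{cases} \\ \text{double: } & \lfloor 2^{64} - 2^{64} \cdot n \rfloor = \begin{cases} 2^{64} - 1 - \lfloor 2^{64} \cdot n \rfloor, & \text{if } c \cdot 2^{64+q} \cdot 10^{-k-1} \notin \mathbb{Z} \\ 2^{64} - \lfloor 2^{64} \cdot n \rfloor, & \text{if } c \cdot 2^{64+q} \cdot 10^{-k-1} \in \mathbb{Z} \end{cases} \end{aligned} \tag{92}$$
We now examine whether the following Equation (93) holds when the conditions of Equations (56) and (57) are met:
$$\begin{aligned} \text{float: } & \left\lfloor \frac{c \cdot 2^{q+35-k}}{5^{k+1}} \right\rfloor = \left\lfloor \frac{c \cdot 2^{q+35-k}}{5^{k+1}} \cdot r \right\rfloor, \quad r = \frac{2^{63 - \lfloor (-k-1) \cdot \log_2(10) \rfloor} \,//\, 10^{k+1} + 1}{10^{-k-1}} \cdot 2^{\lfloor (-k-1) \cdot \log_2(10) \rfloor - 63} \\ \text{double: } & \left\lfloor \frac{c \cdot 2^{q+63-k}}{5^{k+1}} \right\rfloor = \left\lfloor \frac{c \cdot 2^{q+63-k}}{5^{k+1}} \cdot r \right\rfloor, \quad r = \frac{2^{127 - \lfloor (-k-1) \cdot \log_2(10) \rfloor} \,//\, 10^{k+1} + 1}{10^{-k-1}} \cdot 2^{\lfloor (-k-1) \cdot \log_2(10) \rfloor - 127} \end{aligned} \tag{93}$$
We have
$$\begin{aligned} \text{float: } & \frac{c \cdot 2^{q+35-k}}{5^{k+1}} = 2^{36} \cdot (m + n) = 2^{36} \cdot m + 2^{36} \cdot n \\ \text{double: } & \frac{c \cdot 2^{q+63-k}}{5^{k+1}} = 2^{64} \cdot (m + n) = 2^{64} \cdot m + 2^{64} \cdot n \end{aligned} \tag{94}$$
It has been proven earlier that $m$ can be calculated accurately. Then, when Equation (93) holds, the values $\lfloor 2^{36} \cdot n \rfloor$ and $\lfloor 2^{64} \cdot n \rfloor$ on the right side of Equations (58) and (59) can be calculated accurately.
From Equation (51), we have
$$c = \frac{t \cdot 5^{k+1} \mp 1}{2} \tag{95}$$
Substituting Equation (95) into Equation (93), we have
$$\begin{aligned} \text{float: } & \frac{c \cdot 2^{q+35-k}}{5^{k+1}} = t \cdot 2^{q+34-k} \mp \frac{2^{q+34-k}}{5^{k+1}} \\ \text{double: } & \frac{c \cdot 2^{q+63-k}}{5^{k+1}} = t \cdot 2^{q+62-k} \mp \frac{2^{q+62-k}}{5^{k+1}} \end{aligned} \tag{96}$$
When the conditions of Equations (56) and (57) are met, $t \cdot 2^{q+34-k}$ and $t \cdot 2^{q+62-k}$ are integers. Under the condition of Equation (56), the decimal part of the expression in Equation (96) is represented as follows:
$$\begin{aligned} \text{float: } & \frac{2^{q+34-k} \,\%\, 5^{k+1}}{5^{k+1}}; \quad 2 \le q \le 33 \\ \text{double: } & \frac{2^{q+62-k} \,\%\, 5^{k+1}}{5^{k+1}}; \quad 2 \le q \le 76 \end{aligned} \tag{97}$$
For Equation (93) to hold, it suffices to prove that the increase of the right-hand value $\frac{c \cdot 2^{q+35-k}}{5^{k+1}} \cdot r$ over the left-hand value $\frac{c \cdot 2^{q+35-k}}{5^{k+1}}$, plus the decimal part of the left-hand value, is less than 1. That is,
$$\begin{aligned} \text{float: } & \frac{2^{q+34-k} \,\%\, 5^{k+1}}{5^{k+1}} + \frac{c \cdot 2^{q+35-k}}{5^{k+1}} \cdot r - \frac{c \cdot 2^{q+35-k}}{5^{k+1}} < 1 \\ \text{double: } & \frac{2^{q+62-k} \,\%\, 5^{k+1}}{5^{k+1}} + \frac{c \cdot 2^{q+63-k}}{5^{k+1}} \cdot r - \frac{c \cdot 2^{q+63-k}}{5^{k+1}} < 1 \end{aligned} \tag{98}$$
Equation (98) is checked by exhaustively taking the maximum possible value of $c$ for each $q$ and substituting it into the inequality. The calculation can be found in test2.py (https://github.com/xjb714/xjb/blob/main/py_test/test2.py, accessed on 20 April 2026). The results show that Equation (98) always holds for both the float range and the double range. Therefore, Equation (93) holds, and thus the values of $\lfloor 2^{36} \cdot n \rfloor$ and $\lfloor 2^{64} \cdot n \rfloor$ on the right side of Equations (58) and (59) can be calculated accurately. The values of $\lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor$ and $\lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor$ on the left side of Equations (58) and (59) can be calculated through lookup tables.
$$\begin{aligned} \text{float: } & \lfloor 2^{35} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = pow10 \gg \left( 28 - q - \lfloor (-k-1) \cdot \log_2(10) \rfloor \right) \\ \text{double: } & \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor = pow10 \gg \left( 64 - q - \lfloor (-k-1) \cdot \log_2(10) \rfloor \right) \end{aligned} \tag{99}$$
The code file for verifying the validity of Equation (99) is test4.py (https://github.com/xjb714/xjb/blob/main/py_test/test4.py, accessed on 20 April 2026). Therefore, when the conditions of Equations (56) and (57) are met, the values of both sides of Equations (58) and (59) can be calculated accurately.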
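Next, consider the relationship between the following two values over the entire range of floating-point numbers:
$$\begin{aligned} \text{float: } & \lfloor c \cdot 2^{q+36} \cdot 10^{-k-1} \rfloor; \quad \lfloor c \cdot 2^{q+36} \cdot r \cdot 10^{-k-1} \rfloor \\ \text{double: } & \lfloor c \cdot 2^{q+64} \cdot 10^{-k-1} \rfloor; \quad \lfloor c \cdot 2^{q+64} \cdot r \cdot 10^{-k-1} \rfloor \end{aligned} \tag{100}$$
When $r = 1$, the two values in Equation (100) are obviously equal. When $r \ne 1$, or equivalently $r > 1$,
$$\begin{aligned} \text{float: } & c \cdot 2^{q+36} \cdot r \cdot 10^{-k-1} = c \cdot 2^{q+36} \cdot 10^{-k-1} + c \cdot 2^{q+36} \cdot (r - 1) \cdot 10^{-k-1} \\ & < c \cdot 2^{q+36} \cdot 10^{-k-1} + 2^{24} \cdot 2^{36} \cdot 2^{q} \cdot 10^{-k-1} \cdot (r - 1) < c \cdot 2^{q+36} \cdot 10^{-k-1} + 2^{-3} \\ & \Rightarrow\; \lfloor c \cdot 2^{q+36} \cdot r \cdot 10^{-k-1} \rfloor \le \lfloor c \cdot 2^{q+36} \cdot 10^{-k-1} \rfloor + 1 \\ \text{double: } & c \cdot 2^{q+64} \cdot r \cdot 10^{-k-1} = c \cdot 2^{q+64} \cdot 10^{-k-1} + c \cdot 2^{q+64} \cdot (r - 1) \cdot 10^{-k-1} \\ & < c \cdot 2^{q+64} \cdot 10^{-k-1} + 2^{53} \cdot 2^{64} \cdot 2^{q} \cdot 10^{-k-1} \cdot (r - 1) < c \cdot 2^{q+64} \cdot 10^{-k-1} + 2^{-10} \\ & \Rightarrow\; \lfloor c \cdot 2^{q+64} \cdot r \cdot 10^{-k-1} \rfloor \le \lfloor c \cdot 2^{q+64} \cdot 10^{-k-1} \rfloor + 1 \end{aligned} \tag{101}$$
Therefore,
$$\begin{aligned} \text{float: } & 0 \le \lfloor c \cdot 2^{q+36} \cdot r \cdot 10^{-k-1} \rfloor - \lfloor c \cdot 2^{q+36} \cdot 10^{-k-1} \rfloor \le 1 \\ \text{double: } & 0 \le \lfloor c \cdot 2^{q+64} \cdot r \cdot 10^{-k-1} \rfloor - \lfloor c \cdot 2^{q+64} \cdot 10^{-k-1} \rfloor \le 1 \end{aligned} \tag{102}$$
because
$$\lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor = \lfloor c \cdot 2^{q} \cdot r \cdot 10^{-k-1} \rfloor = m \tag{103}$$
$$\begin{aligned} \text{float: } & \lfloor c \cdot 2^{q+36} \cdot 10^{-k-1} \rfloor = 2^{36} \cdot m + \lfloor 2^{36} \cdot n \rfloor \\ \text{double: } & \lfloor c \cdot 2^{q+64} \cdot 10^{-k-1} \rfloor = 2^{64} \cdot m + \lfloor 2^{64} \cdot n \rfloor \end{aligned} \tag{104}$$
Define
$$n_r = c \cdot 2^{q} \cdot r \cdot 10^{-k-1} - m \tag{105}$$
Therefore, the following conclusion can be drawn: when Equation (56) is met, from Equation (93), we have
$$\begin{aligned} \text{float: } & 2 \le q \le 33 \ \&\&\ (2c \pm 1) \,\%\, 5^{k+1} = 0 \;\Rightarrow\; \lfloor 2^{36} \cdot n \rfloor = \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & 2 \le q \le 76 \ \&\&\ (2c \pm 1) \,\%\, 5^{k+1} = 0 \;\Rightarrow\; \lfloor 2^{64} \cdot n \rfloor = \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{106}$$
Over the entire range of floating-point numbers,
$$\begin{aligned} \text{float: } & \lfloor 2^{36} \cdot n \rfloor \le \lfloor 2^{36} \cdot n_r \rfloor \le \lfloor 2^{36} \cdot n \rfloor + 1 \\ \text{double: } & \lfloor 2^{64} \cdot n \rfloor \le \lfloor 2^{64} \cdot n_r \rfloor \le \lfloor 2^{64} \cdot n \rfloor + 1 \end{aligned} \tag{107}$$
To simplify the expression, $even$ is used to indicate whether $c$ is an even number:
$$even = (c + 1) \,\%\, 2 \in \{0, 1\} \tag{108}$$
The equality $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ is the boundary condition for $one = 0$, and $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$ is the boundary condition for $one = 10$. In either boundary case, whether $one$ is 0 or 10 is determined by whether $c$ is an even number. Therefore, the following holds:
$$\begin{aligned} \text{float: } & one = 0: \ \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{36} \cdot n_r \rfloor \\ & one = 10: \ \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & one = 0: \ \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{64} \cdot n_r \rfloor \\ & one = 10: \ \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{109}$$
Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$, we can use the condition in Equation (110) to determine whether $one = 0$ or $one = 10$.
$$\begin{aligned} \text{float: } & \text{if } \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{36} \cdot n_r \rfloor: \ one = 0 \\ & \text{if } \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor: \ one = 10 \\ \text{double: } & \text{if } \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{64} \cdot n_r \rfloor: \ one = 0 \\ & \text{if } \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor: \ one = 10 \end{aligned} \tag{110}$$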
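When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, the condition in Equation (110) can also be used to determine that $one = 0$ or $one = 10$. When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, it can also be used to determine that $one \ne 0$ or $one \ne 10$. There are four situations in total. The proof is as follows:
(1)
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$, we must have $one \ne 0$, and
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} - n = n_{-} - 1 \in (2^{-33} - 1,\ -2^{-29}) \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} - n = n_{-} - 1 \in (2^{-62} - 1,\ -2^{-63}) \end{aligned} \tag{111}$$
Therefore,
$$\begin{aligned} \text{float: } & 2^{q+35} \cdot 10^{-k-1} - 2^{36} \cdot n \in (2^{3} - 2^{36},\ -2^{7}) \\ \text{double: } & 2^{q+63} \cdot 10^{-k-1} - 2^{64} \cdot n \in (4 - 2^{64},\ -2) \end{aligned} \tag{112}$$
For any two real numbers $a$ and $b$, the following relationship holds:
$$0 \le b - \lfloor b \rfloor < 1 \;\Rightarrow\; a - \lfloor a \rfloor - 1 < b - \lfloor b \rfloor < 1 + a - \lfloor a \rfloor \;\Rightarrow\; a - b - 1 < \lfloor a \rfloor - \lfloor b \rfloor < a - b + 1 \tag{113}$$
With $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$, or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$, we obtain
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor - \lfloor 2^{36} \cdot n \rfloor < 2^{q+35} \cdot 10^{-k-1} - 2^{36} \cdot n + 1 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor - \lfloor 2^{64} \cdot n \rfloor < 2^{q+63} \cdot 10^{-k-1} - 2^{64} \cdot n + 1 \end{aligned} \tag{114}$$
From Equation (112), we have
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor - \lfloor 2^{36} \cdot n \rfloor < 1 - 2^{7} < 0 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor - \lfloor 2^{64} \cdot n \rfloor < 1 - 2 < 0 \end{aligned} \tag{115}$$
Therefore,
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even \le \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + 1 < \lfloor 2^{36} \cdot n \rfloor \le \lfloor 2^{36} \cdot n_r \rfloor \;\Rightarrow\; \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even < \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even \le \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + 1 < \lfloor 2^{64} \cdot n \rfloor \le \lfloor 2^{64} \cdot n_r \rfloor \;\Rightarrow\; \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even < \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{116}$$
Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$, the condition in Equation (110) can be used to determine that $one \ne 0$.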
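(2)
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$, we must have $one = 0$, and
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} - n = n_{-} \in (2^{-33},\ 1 - 2^{-29}) \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} - n = n_{-} \in (2^{-62},\ 1 - 2^{-63}) \end{aligned} \tag{117}$$
Therefore,
$$\begin{aligned} \text{float: } & 2^{q+35} \cdot 10^{-k-1} - 2^{36} \cdot n \in (2^{3},\ 2^{36} - 2^{7}) \\ \text{double: } & 2^{q+63} \cdot 10^{-k-1} - 2^{64} \cdot n \in (4,\ 2^{64} - 2) \end{aligned} \tag{118}$$
With $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$, or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$, Equation (113) gives
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor - \lfloor 2^{36} \cdot n \rfloor > 2^{q+35} \cdot 10^{-k-1} - 2^{36} \cdot n - 1 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor - \lfloor 2^{64} \cdot n \rfloor > 2^{q+63} \cdot 10^{-k-1} - 2^{64} \cdot n - 1 \end{aligned} \tag{119}$$
From Equation (118), we have
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor - \lfloor 2^{36} \cdot n \rfloor > 2^{3} - 1 > 0 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor - \lfloor 2^{64} \cdot n \rfloor > 4 - 1 > 0 \end{aligned} \tag{120}$$
Therefore,
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even \ge \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor > \lfloor 2^{36} \cdot n \rfloor + 1 \ge \lfloor 2^{36} \cdot n_r \rfloor \;\Rightarrow\; \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even \ge \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor > \lfloor 2^{64} \cdot n \rfloor + 1 \ge \lfloor 2^{64} \cdot n_r \rfloor \;\Rightarrow\; \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{121}$$
Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$, the condition in Equation (110) can be used to determine that $one = 0$.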
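(3)
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, we must have $one \ne 10$, and
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} + n = n_{+} \in (2^{-33},\ 1 - 2^{-29}) \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} + n = n_{+} \in (2^{-62},\ 1 - 2^{-63}) \end{aligned} \tag{122}$$
Therefore,
$$\begin{aligned} \text{float: } & 2^{q+35} \cdot 10^{-k-1} + 2^{36} \cdot n \in (2^{3},\ 2^{36} - 2^{7}) \\ \text{double: } & 2^{q+63} \cdot 10^{-k-1} + 2^{64} \cdot n \in (4,\ 2^{64} - 2) \end{aligned} \tag{123}$$
For any two real numbers $a$ and $b$, the following relationship holds:
$$a - 1 < \lfloor a \rfloor \le a, \quad b - 1 < \lfloor b \rfloor \le b \;\Rightarrow\; a + b - 2 < \lfloor a \rfloor + \lfloor b \rfloor \le a + b \tag{124}$$
With $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$, or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$, we obtain
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \lfloor 2^{36} \cdot n \rfloor \le 2^{q+35} \cdot 10^{-k-1} + 2^{36} \cdot n \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + \lfloor 2^{64} \cdot n \rfloor \le 2^{q+63} \cdot 10^{-k-1} + 2^{64} \cdot n \end{aligned} \tag{125}$$
From Equation (123), we have
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \lfloor 2^{36} \cdot n \rfloor < 2^{36} - 2^{7} \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + \lfloor 2^{64} \cdot n \rfloor < 2^{64} - 2 \end{aligned} \tag{126}$$
Therefore,
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even \le \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + 1 < 2^{36} - 2 - \lfloor 2^{36} \cdot n \rfloor < 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \\ & \Rightarrow\; \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even < 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even \le \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + 1 < 2^{64} - 2 - \lfloor 2^{64} \cdot n \rfloor < 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \\ & \Rightarrow\; \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even < 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{127}$$
Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, the condition in Equation (110) can be used to determine that $one \ne 10$.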
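(4)
When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, we must have $one = 10$, and
$$\begin{aligned} \text{float: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} + n = n_{+} + 1 \in (1 + 2^{-33},\ 2 - 2^{-29}) \\ \text{double: } & 2^{-1} \cdot 2^{q} \cdot 10^{-k-1} + n = n_{+} + 1 \in (1 + 2^{-62},\ 2 - 2^{-63}) \end{aligned} \tag{128}$$
Therefore,
$$\begin{aligned} \text{float: } & 2^{q+35} \cdot 10^{-k-1} + 2^{36} \cdot n \in (2^{3} + 2^{36},\ 2^{37} - 2^{7}) \\ \text{double: } & 2^{q+63} \cdot 10^{-k-1} + 2^{64} \cdot n \in (4 + 2^{64},\ 2^{65} - 2) \end{aligned} \tag{129}$$
With $a = 2^{q+35} \cdot 10^{-k-1}$ and $b = 2^{36} \cdot n$, or $a = 2^{q+63} \cdot 10^{-k-1}$ and $b = 2^{64} \cdot n$, Equation (124) gives
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \lfloor 2^{36} \cdot n \rfloor > 2^{q+35} \cdot 10^{-k-1} + 2^{36} \cdot n - 2 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + \lfloor 2^{64} \cdot n \rfloor > 2^{q+63} \cdot 10^{-k-1} + 2^{64} \cdot n - 2 \end{aligned} \tag{130}$$
From Equation (129), we have
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \lfloor 2^{36} \cdot n \rfloor > 2^{36} + 2^{3} - 2 > 2^{36} \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + \lfloor 2^{64} \cdot n \rfloor > 2^{64} + 2^{2} - 2 > 2^{64} \end{aligned} \tag{131}$$
Therefore,
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even \ge \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor > 2^{36} - \lfloor 2^{36} \cdot n \rfloor > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \\ & \Rightarrow\; \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even \ge \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor > 2^{64} - \lfloor 2^{64} \cdot n \rfloor > 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \\ & \Rightarrow\; \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \end{aligned} \tag{132}$$
Therefore, when $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, the condition in Equation (110) can be used to determine that $one = 10$.
From the above proof, when Equation (56) is met and $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} = 1 - n$, the condition in Equation (110) determines whether $one = 0$ or $one = 10$. When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} > 1 - n$, it determines that $one = 0$ or $one = 10$. When $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < n$ or $2^{-1} \cdot 2^{q} \cdot 10^{-k-1} < 1 - n$, it determines that $one \ne 0$ or $one \ne 10$.
This completes the proof for this section. In the code implementation, the two conditions can be evaluated quickly using addition, subtraction, and shift operations, and the compiler can translate them into cmov instructions, thereby reducing the performance impact of branch mispredictions.
In conclusion, whether $one$ equals 0 or 10 can be determined quickly as follows:
$$\begin{aligned} \text{float: } & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{36} \cdot n_r \rfloor \;\Rightarrow\; one = 0 \\ & \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + even > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \;\Rightarrow\; one = 10 \\ \text{double: } & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > \lfloor 2^{64} \cdot n_r \rfloor \;\Rightarrow\; one = 0 \\ & \lfloor 2^{q+63} \cdot 10^{-k-1} \rfloor + even > 2^{64} - 1 - \lfloor 2^{64} \cdot n_r \rfloor \;\Rightarrow\; one = 10 \end{aligned} \tag{133}$$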

3.7. Efficient Computation of 10 n and Rounding

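Whether $one$ is $\lfloor 10n \rfloor$ or $\lfloor 10n \rfloor + 1$ is determined by the decimal part of $10n$. There are two cases: the decimal part of $10n$ is 0.5, or it is not 0.5.

3.7.1. $10n - \lfloor 10n \rfloor = 0.5$

When the decimal part of $10n$ is 0.5, we must have
$$\begin{aligned} & 10n - \lfloor 10n \rfloor = 0.5 \;\Rightarrow\; 10 \cdot c \cdot 2^{q} \cdot 10^{-k-1} - \lfloor 10 \cdot c \cdot 2^{q} \cdot 10^{-k-1} \rfloor = 0.5 \\ & \Rightarrow\; c \cdot 2^{q} \cdot 10^{-k} - \lfloor c \cdot 2^{q} \cdot 10^{-k} \rfloor = 0.5 \;\Rightarrow\; c \cdot 2^{q} \cdot 10^{-k} = \lfloor c \cdot 2^{q} \cdot 10^{-k} \rfloor + 0.5 \\ & \Rightarrow\; 2c \cdot 2^{q} \cdot 10^{-k} = 2 \lfloor c \cdot 2^{q} \cdot 10^{-k} \rfloor + 1 \end{aligned} \tag{134}$$
so $2c \cdot 2^{q} \cdot 10^{-k}$ is an odd number. Then, the following expression is odd:
$$c \cdot 2^{q+1} \cdot 10^{-k} = c \cdot 2^{q-k+1} \cdot 5^{-k} \tag{135}$$
According to the range of $q$,
$$c \cdot 2^{q+1} \cdot 10^{-k} = \begin{cases} \dfrac{c \cdot 2^{q-k+1}}{5^{k}}, & q \ge 0 \\ c \cdot 2 \cdot 5^{-k}, & q = -1 \\ \dfrac{c \cdot 5^{-k}}{2^{k-q-1}}, & q \le -2 \end{cases} \tag{136}$$
The following situations are discussed according to the range of $q$:
  • $q \ge 0$
    When $q \ge 0$, it can be concluded that $q - k + 1 \ge 1$, the numerator $c \cdot 2^{q-k+1}$ is even, and the denominator $5^{k}$ is odd, which does not meet the condition.
  • $q = -1$
    When $q = -1$, it can be concluded that $c \cdot 2 \cdot 5^{-k}$ is even, which does not meet the condition.
  • $q \le -2$
    $5^{-k}$ is an odd number, and $c$ is an odd multiple of $2^{k-q-1}$, so
    $$\begin{aligned} \text{float: } & c \ge 2^{k-q-1} \;\Rightarrow\; k - q - 1 \le 22 \;\Rightarrow\; q \ge -34 \\ \text{double: } & c \ge 2^{k-q-1} \;\Rightarrow\; k - q - 1 \le 51 \;\Rightarrow\; q \ge -75 \end{aligned} \tag{137}$$
    Therefore, when $q$ meets the above conditions, $c$ must be an odd multiple of $2^{k-q-1}$. The expression in Equation (135) is therefore an odd number when the following conditions are met:
    $$\begin{aligned} \text{float: } & -34 \le q \le -2 \ \&\&\ c \,\%\, 2^{k-q} = 2^{k-q-1} \\ \text{double: } & -75 \le q \le -2 \ \&\&\ c \,\%\, 2^{k-q} = 2^{k-q-1} \end{aligned} \tag{138}$$
    When $q$ is within the range of Equation (138), $r = 1$ follows from Equation (28). Therefore,
    $$n_r = n \tag{139}$$
    The following equation holds:
    $$20m + 20n = c \cdot 2^{q+1} \cdot 10^{-k} = c \cdot 2^{q-k+1} \cdot 5^{-k} = \frac{c}{2^{k-q-1}} \cdot 5^{-k} \tag{140}$$
    Since $k \le -1$, $5^{-k}$ is an odd multiple of 5. Since $\frac{c}{2^{k-q-1}}$ and $5^{-k}$ are both odd numbers, $20m$ is an even number, and $20n$ is an odd multiple of 5. Therefore,
    $$20n \in \{5, 15\} \;\Rightarrow\; n \in \{0.25, 0.75\} \;\Rightarrow\; n_r \in \{0.25, 0.75\} \tag{141}$$
    The result $one$ is the even number among $\lfloor 10n \rfloor$ and $\lfloor 10n \rfloor + 1$. Therefore,
    $$one = \begin{cases} \lfloor 10n \rfloor = 2, & \text{if } n = 0.25 \\ \lfloor 10n \rfloor + 1 = 8, & \text{if } n = 0.75 \end{cases} \;\Rightarrow\; one = (\lfloor 20n \rfloor + 1) \,//\, 2 - (n = 0.25 \ ? \ 1 : 0) \tag{142}$$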
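The tie condition and the resulting even digit can be reproduced with exact fractions. This is an illustrative sketch for float; the function names are hypothetical, and the significand is assumed to satisfy $2^{23} < c < 2^{24}$.

```python
from fractions import Fraction
from math import floor

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def fractional_n(c: int, q: int) -> Fraction:
    """n = fractional part of c * 2**q * 10**(-k-1), computed exactly."""
    k = floor_log10_pow2(q)
    v = Fraction(c) * Fraction(2) ** q * Fraction(10) ** (-k - 1)
    return v - floor(v)

def is_tie_exact(c: int, q: int) -> bool:
    """True if the decimal part of 10n is exactly 0.5."""
    ten_n = 10 * fractional_n(c, q)
    return ten_n - floor(ten_n) == Fraction(1, 2)

def is_tie_fast(c: int, q: int, q_min: int = -34) -> bool:
    """Float form of the tie condition: q_min <= q <= -2 and c % 2**(k-q) == 2**(k-q-1)."""
    if not q_min <= q <= -2:
        return False
    k = floor_log10_pow2(q)
    return (c % (1 << (k - q))) == (1 << (k - q - 1))

def one_for_tie(n: Fraction) -> int:
    """Even digit for a tie: (floor(20n) + 1) // 2 - (1 if n == 0.25 else 0)."""
    return (floor(20 * n) + 1) // 2 - (1 if n == Fraction(1, 4) else 0)
```

For $q = -2$ the product is simply $c / 4$, so an odd $c$ gives $n = 0.25$ or $n = 0.75$ and `one_for_tie` returns 2 or 8, respectively.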

3.7.2. $10n - \lfloor 10n \rfloor \ne 0.5$

When the decimal part of $10n$ is not 0.5, $10n$ is rounded to the nearest integer based on its decimal part. Therefore,
$$one = \begin{cases} \lfloor 10n \rfloor, & \text{if } 10n - \lfloor 10n \rfloor < 0.5 \\ \lfloor 10n \rfloor + 1, & \text{if } 10n - \lfloor 10n \rfloor > 0.5 \end{cases} \;\Rightarrow\; one = \lfloor 10n + 0.5 \rfloor = (\lfloor 20n \rfloor + 1) \,//\, 2 \tag{143}$$
Since $\lfloor 20n + 1 \rfloor = \lfloor 20n \rfloor + 1$, it is only necessary to calculate the value of $\lfloor 20n \rfloor$ accurately, and
$$d = ten + one = 10m + (\lfloor 20n \rfloor + 1) \,//\, 2 = (20m + \lfloor 20n \rfloor + 1) \,//\, 2 \tag{144}$$
Write
$$20m + 20n = c \cdot 2^{q+1} \cdot 10^{-k} = c \cdot 2^{q-k+1} \cdot 5^{-k} = c \cdot \frac{x}{y} \tag{145}$$
Let $n_{20}$ denote the decimal part of $20n$.
When $y \le c_{max} = C$, the decimal part satisfies
$$\begin{aligned} \text{float: } & \frac{1}{2^{24} - 1} = \frac{1}{C} \le n_{20} \le 1 - \frac{1}{C} = \frac{2^{24} - 2}{2^{24} - 1} \\ \text{double: } & \frac{1}{2^{53} - 1} = \frac{1}{C} \le n_{20} \le 1 - \frac{1}{C} = \frac{2^{53} - 2}{2^{53} - 1} \end{aligned} \tag{146}$$
When $y > c_{max} = C$, the decimal part satisfies (the test file is test5.py, https://github.com/xjb714/xjb/blob/main/py_test/test5.py, accessed on 20 April 2026):
$$\text{float: } 2^{-32} < n_{20} < 1 - 2^{-30} \qquad \text{double: } 2^{-64} < n_{20} < 1 - 2^{-62} \tag{147}$$
Therefore, the range of $n_{20}$ satisfies Equation (147). In the code implementation, only the high 36 bits of $n_r$ are retained for float, and only the high 70 bits of $n_r$ are retained for double. Let $n_{36}$ denote the discarded part for float and $n_{70}$ the discarded part for double. Therefore,
$$\text{float: } n_{36} \in [0,\ 2^{-36}) \qquad \text{double: } n_{70} \in [0,\ 2^{-70}) \tag{148}$$
Consider the bounds of the following expression:
$$\begin{aligned} \text{float: } & F = 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{-k-1} - n_{36}) \\ \text{double: } & F = 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{-k-1} - n_{70}) \end{aligned} \tag{149}$$
Therefore,
$$\begin{aligned} \text{float: } & F_{min} > 20 \cdot (c \cdot 2^{q} \cdot 10^{-k-1} - 2^{-36}) = 20m + 20n - 20 \cdot 2^{-36} \\ & F_{max} < 20 \cdot (c \cdot 2^{q} \cdot (1 + 2^{-63}) \cdot 10^{-k-1} - 0) < 20m + 20n + 20 \cdot 2^{-63} \cdot c < 20m + \lfloor 20n \rfloor + 1 \\ \text{double: } & F_{min} > 20 \cdot (c \cdot 2^{q} \cdot 10^{-k-1} - 2^{-70}) = 20m + 20n - 20 \cdot 2^{-70} > 20m + \lfloor 20n \rfloor \\ & F_{max} < 20 \cdot (c \cdot 2^{q} \cdot (1 + 2^{-127}) \cdot 10^{-k-1} - 0) < 20m + 20n + 20 \cdot 2^{-127} \cdot c < 20m + \lfloor 20n \rfloor + 1 \end{aligned} \tag{150}$$
Therefore,
$$\text{float: } \lfloor F \rfloor = 20m + \lfloor 20n \rfloor \qquad \text{double: } \lfloor F \rfloor = 20m + \lfloor 20n \rfloor \tag{151}$$
Strictly speaking, the bound above does not exclude $F_{min} \le 20m + \lfloor 20n \rfloor$ for float. However, the float implementation has passed an exhaustive test over all possible input values, so this gap in the proof does not affect correctness. Therefore, the calculation of $d$ can be simplified as follows:
$$d = ten + one = (\lfloor F \rfloor + 1) \,//\, 2 = (\lfloor 20 \cdot (c \cdot 2^{q} \cdot r \cdot 10^{-k-1} - n_x) \rfloor + 1) \,//\, 2 \tag{152}$$
For the float range, $n_x = n_{36}$; for the double range, $n_x = n_{70}$.
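The rounding identity behind $d$ can be checked exactly when $r = 1$ and nothing is discarded. The sketch below is a simplified illustration under those assumptions (it does not model the lookup-table error $r$ or the truncated bits $n_x$); `d_via_floor20` and `d_reference` are hypothetical names.

```python
from fractions import Fraction
from math import floor

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def split_m_n(c: int, q: int):
    """Return (m, n) with m + n == c * 2**q * 10**(-k-1), 0 <= n < 1."""
    k = floor_log10_pow2(q)
    v = Fraction(c) * Fraction(2) ** q * Fraction(10) ** (-k - 1)
    m = floor(v)
    return m, v - m

def d_via_floor20(c: int, q: int) -> int:
    """d = (20m + floor(20n) + 1) // 2."""
    m, n = split_m_n(c, q)
    return (20 * m + floor(20 * n) + 1) // 2

def d_reference(c: int, q: int) -> int:
    """d = 10m + floor(10n + 0.5), i.e. ten plus the rounded last digit."""
    m, n = split_m_n(c, q)
    return 10 * m + floor(10 * n + Fraction(1, 2))
```

For $c = 2^{23} + 1$ and $q = 0$, the scaled value is $838860.9$, so both functions return 8388609.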

3.7.3. Efficient Implementation of n = 0.25 for Double

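For double, this subsection shows how to determine quickly whether $n = 0.25$ in Equation (142).
When $n = 0.25$, $\lfloor 2^{64} \cdot n_r \rfloor = \lfloor 2^{64} \cdot n \rfloor = 2^{62}$. Therefore, the following condition can be used to quickly determine whether $n = 0.25$:
$$\text{double: } n = 0.25 \ \text{ if } \ \lfloor 2^{64} \cdot n_r \rfloor = 2^{62} \tag{153}$$
When $n \ne 0.25$, calculate the range of the decimal part of the following expression:
$$4m + 4n = c \cdot 2^{q+2} \cdot 10^{-k-1} \tag{154}$$
When Equation (154) is not an integer, we have the following (the test file is test6.py, https://github.com/xjb714/xjb/blob/main/py_test/test6.py, accessed on 20 April 2026):
$$2^{-62} < 4n - \lfloor 4n \rfloor < 1 - 2^{-62} \tag{155}$$
Consider the two boundary cases in which $4n$ is closest to 1:
$$\begin{aligned} & \lfloor 4n \rfloor = 0 \;\Rightarrow\; 4n - 0 < 1 - 2^{-62} \;\Rightarrow\; \lfloor 2^{64} \cdot n \rfloor \le 2^{62} - 2 \\ & \lfloor 4n \rfloor = 1 \;\Rightarrow\; 4n - 1 > 2^{-62} \;\Rightarrow\; \lfloor 2^{64} \cdot n \rfloor \ge 2^{62} + 1 \end{aligned} \tag{156}$$
Then,
$$\lfloor 2^{64} \cdot n \rfloor \ne 2^{62} \ \&\&\ \lfloor 2^{64} \cdot n \rfloor + 1 \ne 2^{62} \;\Rightarrow\; \lfloor 2^{64} \cdot n_r \rfloor \ne 2^{62} \tag{157}$$
Therefore, the following condition can be used to quickly determine whether $n \ne 0.25$:
$$\text{double: } n \ne 0.25 \ \text{ if } \ \lfloor 2^{64} \cdot n_r \rfloor \ne 2^{62} \tag{158}$$
In summary, for double, the following condition can be used to quickly determine whether $n = 0.25$:
$$\begin{aligned} \text{double: } & n = 0.25 \ \text{ if } \ \lfloor 2^{64} \cdot n_r \rfloor = 2^{62} \\ \text{double: } & n \ne 0.25 \ \text{ if } \ \lfloor 2^{64} \cdot n_r \rfloor \ne 2^{62} \end{aligned} \tag{159}$$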
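The quarter test can be illustrated for double inputs in the tie range, where $r = 1$ and hence $n_r = n$. The following sketch relies on that simplification and uses exact fractions instead of the lookup table; `scaled_n_bits` and `is_quarter` are hypothetical names.

```python
from fractions import Fraction
from math import floor

def floor_log10_pow2(q: int) -> int:
    """Exact floor(q * log10(2)), obtained from big-integer digit counts."""
    if q >= 0:
        return len(str(1 << q)) - 1
    return -len(str(1 << -q))

def scaled_n_bits(c: int, q: int, bits: int = 64) -> int:
    """floor(2**bits * n) for the exact fractional part n (assumes r = 1)."""
    k = floor_log10_pow2(q)
    v = Fraction(c) * Fraction(2) ** q * Fraction(10) ** (-k - 1)
    n = v - floor(v)
    return floor(n * 2 ** bits)

def is_quarter(c: int, q: int) -> bool:
    """The fast test: n == 0.25 exactly when floor(2**64 * n_r) == 2**62."""
    return scaled_n_bits(c, q) == 1 << 62
```

With $q = -2$ the scaled value is $c / 4$, so $c = 2^{52} + 1$ gives $n = 0.25$ and $c = 2^{52} + 3$ gives $n = 0.75$; only the former passes `is_quarter`.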

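3.7.4. Efficient Calculation of $one$ for Double

In the double range, there is another, faster way to calculate $one$:
$$\text{double: } one = \left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 + \left( n = 0.25 \ ? \ 0 : 2^{-1} + \frac{6}{2^{64}} \right) \right\rfloor \tag{160}$$
The proof of Equation (160) is as follows:
  • when $n = 0.25$, $\left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 \right\rfloor = \lfloor 10n \rfloor = 2$;
  • when $n \ne 0.25$, Equation (160) is equivalent to the following:
$$\text{double: } one = \left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 + 2^{-1} + \frac{6}{2^{64}} \right\rfloor \tag{161}$$
According to the value of $10n - \lfloor 10n \rfloor$, $one$ is represented as follows:
$$\text{double: } one = \begin{cases} \lfloor 10n \rfloor, & \text{if } 10n - \lfloor 10n \rfloor < 0.5 \\ 8, & \text{if } 10n - \lfloor 10n \rfloor = 0.5 \\ \lfloor 10n \rfloor + 1, & \text{if } 10n - \lfloor 10n \rfloor > 0.5 \end{cases} = (\lfloor 20n \rfloor + 1) \,//\, 2 \tag{162}$$
Therefore, when $n \ne 0.25$, we need to prove that the following equation holds:
$$\left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 + 2^{-1} + \frac{6}{2^{64}} \right\rfloor = \begin{cases} \lfloor 10n \rfloor, & \text{if } 10n - \lfloor 10n \rfloor < 0.5 \\ 8, & \text{if } 10n - \lfloor 10n \rfloor = 0.5 \\ \lfloor 10n \rfloor + 1, & \text{if } 10n - \lfloor 10n \rfloor > 0.5 \end{cases} = (\lfloor 20n \rfloor + 1) \,//\, 2 \tag{163}$$
From the definition of the floor function,
$$\frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \in (n_r - 2^{-64},\ n_r] \tag{164}$$
because the following conditions hold:
$$c \cdot 2^{q} \cdot 10^{-k-1} = m + n \qquad c \cdot 2^{q} \cdot r \cdot 10^{-k-1} = m + n_r \tag{165}$$
Therefore, the following relationship can be concluded:
$$\begin{aligned} & n_r - n = (r - 1) \cdot c \cdot 2^{q} \cdot 10^{-k-1} \;\Rightarrow\; n_r = (r - 1) \cdot (m + n) + n \;\Rightarrow\; n \le n_r < 2^{-127} \cdot c + n \\ & \Rightarrow\; n \le n_r < 2^{-127} \cdot 2^{53} + n \;\Rightarrow\; n \le n_r < 2^{-74} + n \end{aligned} \tag{166}$$
From Equations (164) and (166), it can be concluded that
$$\begin{aligned} & \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \in (n - 2^{-64},\ n + 2^{-74}) \;\Rightarrow\; \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 \in (10n - 10 \cdot 2^{-64},\ 10n + 10 \cdot 2^{-74}) \\ & \Rightarrow\; \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 20 \in (20n - 20 \cdot 2^{-64},\ 20n + 20 \cdot 2^{-74}) \\ & \Rightarrow\; \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 20 \in (\lfloor 20n \rfloor + n_{20} - 20 \cdot 2^{-64},\ \lfloor 20n \rfloor + n_{20} + 20 \cdot 2^{-74}) \end{aligned} \tag{167}$$
Consider the range of values of $x$ for which the following condition is met:
$$\left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 20 + 1 + x \right\rfloor //\, 2 = (\lfloor 20n \rfloor + 1) \,//\, 2 = one \tag{168}$$
Therefore, the following conclusions can be drawn:
$$\begin{aligned} & \lfloor 20n \rfloor + n_{20} - 20 \cdot 2^{-64} + 1 + x \ge \lfloor 20n \rfloor + 1 \;\Rightarrow\; x \ge 20 \cdot 2^{-64} - n_{20} \\ & \lfloor 20n \rfloor + n_{20} + 20 \cdot 2^{-74} + 1 + x < \lfloor 20n \rfloor + 2 \;\Rightarrow\; x < 1 - 20 \cdot 2^{-74} - n_{20} \end{aligned} \tag{169}$$
Let $x = 12 \cdot 2^{-64}$. From Equation (169), all floating-point numbers that do not meet the following condition can be obtained:
$$x = 12 \cdot 2^{-64} \ge 20 \cdot 2^{-64} - n_{20} \tag{170}$$
All floating-point numbers that do not meet the condition of Equation (170) are as follows (hexadecimal bit pattern and the printed result that satisfies the SW principle):
$$\begin{aligned} & \texttt{0x0d17c0747bd76fa1},\ \texttt{1.3588129002659584e-245} \\ & \texttt{0x0d27c0747bd76fa1},\ \texttt{2.7176258005319167e-245} \\ & \texttt{0x4d73de005bd620df},\ \texttt{1.3076622631878654e+65} \\ & \texttt{0x4d83de005bd620df},\ \texttt{2.6153245263757307e+65} \\ & \texttt{0x4d93de005bd620df},\ \texttt{5.230649052751461e+65} \end{aligned} \tag{171}$$
From Equation (169), all floating-point numbers that do not meet the following condition can be obtained:
$$x = 12 \cdot 2^{-64} < 1 - 20 \cdot 2^{-74} - n_{20} \tag{172}$$
All floating-point numbers that do not meet the condition of Equation (172) are as follows (hexadecimal bit pattern and the printed result that satisfies the SW principle):
$$\begin{aligned} & \texttt{0x612491daad0ba280},\ \texttt{9.03725590277404e+159} \\ & \texttt{0x6159b651584e8b20},\ \texttt{9.03725590277404e+160} \\ & \texttt{0x619011f2d73116f4},\ \texttt{9.03725590277404e+161} \\ & \texttt{0x61c4166f8cfd5cb1},\ \texttt{9.03725590277404e+162} \\ & \texttt{0x61d4166f8cfd5cb1},\ \texttt{1.807451180554808e+163} \end{aligned} \tag{173}$$
We have
$$2 \cdot \left( \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 + 2^{-1} + \frac{6}{2^{64}} \right) = \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 20 + 1 + x \tag{174}$$
When a floating-point number is not among those listed in Equations (171) and (173), the condition of Equation (169) is satisfied. We have tested all floating-point numbers listed in Equations (171) and (173), and the implementation outputs the correct result for each of them; that is, it satisfies the SW principle. The test file is test8.py (https://github.com/xjb714/xjb/blob/main/py_test/test8.py, accessed on 20 April 2026).
In summary, Equations (163) and (160) hold. Therefore, Equation (160) can be used to quickly calculate $one$.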
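The exceptional bit patterns listed in Equations (171) and (173) can be decoded with the standard library to inspect the corresponding double values. This is a small illustrative sketch, not the authors' test8.py; `bits_to_double` and `EXCEPTIONAL_BITS` are hypothetical names, and the comparison uses Python's `repr`, which also prints a shortest round-trip decimal string.

```python
import struct

# Bit patterns as listed in the two exception lists above.
EXCEPTIONAL_BITS = [
    0x0D17C0747BD76FA1, 0x0D27C0747BD76FA1,
    0x4D73DE005BD620DF, 0x4D83DE005BD620DF, 0x4D93DE005BD620DF,
    0x612491DAAD0BA280, 0x6159B651584E8B20, 0x619011F2D73116F4,
    0x61C4166F8CFD5CB1, 0x61D4166F8CFD5CB1,
]

def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def double_to_bits(value: float) -> int:
    """Inverse of bits_to_double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def shortest_strings():
    """Shortest round-trip strings for the listed inputs, via Python's repr."""
    return [repr(bits_to_double(b)) for b in EXCEPTIONAL_BITS]
```

Printing `shortest_strings()` gives the decimal strings that a conforming shortest-output conversion is expected to produce for these inputs, which can then be compared with the listed values.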

3.8. Irregular Number

Irregular floating-point numbers are few: there are 2046 irregular double values and 254 irregular float values in total. The correctness of the implementation on these inputs can therefore be established by exhaustive testing. For every irregular value, the output is identical to that of the Schubfach algorithm, so the derivation is not presented in this article. In the code, irregular values are computed in a separate branch; for the specific implementation, please refer to the source code.

3.9. Implementation of Pseudocode

This subsection details the pseudocode implementation for converting regular floating-point numbers to decimal representation. The handling of irregular floating-point numbers is omitted here due to space constraints; interested readers may refer to the source code for the complete implementation.

3.9.1. Single-Precision Floating-Point Numbers

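$$\begin{array}{l}
\mathrm{uint32}\ v_i = \mathrm{bit\_copy\_from\_f32}(v) \\
(\mathrm{uint64}\ c,\ \mathrm{int32}\ q) = \mathrm{extract}(v_i) \\
\mathrm{const\ int\ BIT} = 36 \\
\mathrm{const\ uint64\ offset} = (1\mathrm{ULL} \ll (\mathrm{BIT} - 2)) - 7 \\
\mathrm{int32}\ k = (q \cdot 1233) \gg 12 \\
\mathrm{int32}\ h = q + ((1701 \cdot (-k - 1)) \gg 9) \\
\mathrm{uint64\ pow10} = \mathrm{get\_pow10}(-k - 1) \\
\mathrm{uint64}\ \mathit{cb} = c \ll (h + \mathrm{BIT} + 1) \\
\mathrm{uint64\ hi64} = \mathrm{umul\_hi64{\times}64}(\mathit{cb}, \mathrm{pow10}) \\
\mathrm{bool\ even} = (v_i + 1)\ \&\ 1 \\
\mathrm{uint64\ half} = (\mathrm{pow10} \gg (65 - (h + \mathrm{BIT} + 1))) + \mathrm{even} \\
\mathrm{uint32\ shorter} = (\mathrm{hi64} + \mathrm{half}) \gg \mathrm{BIT} \\
\mathrm{uint64\ bias} = (\mathrm{hi64} \gg (\mathrm{BIT} - 4))\ \&\ 15 \\
\mathrm{uint32\ longer} = (5 \cdot \mathrm{hi64} + \mathrm{offset} + \mathrm{bias}) \gg (\mathrm{BIT} - 1) \\
\mathrm{bool\ updown} = \mathrm{shorter} > ((\mathrm{hi64} - \mathrm{half}) \gg \mathrm{BIT}) \\
\mathrm{uint32}\ d = \mathrm{updown}\ ?\ \mathrm{shorter} \cdot 10 : \mathrm{longer}
\end{array} \tag{175}$$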
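Pseudocode Equation (175) outlines the procedure for computing d and k for single-precision floating-point numbers.
Since c contains at most 24 significant bits and $\mathrm{pow10}$ has 64 significant bits, their product contains at most 88 significant bits. Given that the range of h is $[-4, -1]$, the value $\mathit{cb} = c \ll (h + \mathrm{BIT} + 1)$ does not overflow, preserving all bits of c. When $e_{10} = -k - 1$, $\mathrm{pow10}$ is computed using Equation (24). From Equation (25), we derive
$$\mathrm{pow10} \cdot 2^{\lfloor (-k-1)\cdot \log_2 10 \rfloor - 63} = 10^{-k-1} \cdot r_{1,\,-k-1}$$
From Equation (45),
$$\begin{aligned}
m &= \lfloor v \cdot 10^{-k-1} \rfloor = \lfloor v \cdot 10^{-k-1} \cdot r_{1,\,-k-1} \rfloor \\
&= \lfloor v \cdot \mathrm{pow10} \cdot 2^{\lfloor (-k-1)\cdot \log_2 10 \rfloor - 63} \rfloor \\
&= \lfloor c \cdot \mathrm{pow10} \cdot 2^{q + \lfloor (-k-1)\cdot \log_2 10 \rfloor - 63} \rfloor \\
&= (c \cdot \mathrm{pow10}) \gg (63 - q - \lfloor (-k-1)\cdot \log_2 10 \rfloor) \\
&= (c \cdot \mathrm{pow10}) \gg (63 - h) \\
&= ((c \ll (37 + h)) \cdot \mathrm{pow10}) \gg (36 + 64) \\
&= \mathrm{hi64} \gg 36
\end{aligned}$$
Let up indicate whether $\mathit{one} = 10$. From Equations (132) and (99),
$$\begin{aligned}
\mathrm{bool\ up} &= \left( \lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \mathrm{even} > 2^{36} - 1 - \lfloor 2^{36} \cdot n_r \rfloor \right) \\
&= \left\lfloor \frac{\lfloor 2^{q+35} \cdot 10^{-k-1} \rfloor + \mathrm{even} + \lfloor 2^{36} \cdot n_r \rfloor}{2^{36}} \right\rfloor \\
&= \left\lfloor \frac{(\mathrm{pow10} \gg (28 - h)) + \mathrm{even} + \lfloor 2^{36} \cdot n_r \rfloor}{2^{36}} \right\rfloor \\
&= (\mathrm{half} + \lfloor 2^{36} \cdot n_r \rfloor) \gg 36
\end{aligned}$$
When $\mathit{one} \neq 10$, we have $d \,//\, 10 = m$; when $\mathit{one} = 10$, we have $d \,//\, 10 = m + 1$. Therefore,
$$d \,//\, 10 = m + \mathrm{up}$$
The upper 28 bits of $\mathrm{hi64}$ represent m, while the lower 36 bits represent $\lfloor 2^{36} \cdot n_r \rfloor$:
$$\mathrm{hi64} = (m \ll 36) + \lfloor 2^{36} \cdot n_r \rfloor$$
Consequently,
$$\mathrm{shorter} = d \,//\, 10 = (\mathrm{hi64} + \mathrm{half}) \gg 36$$
Let updown indicate whether $\mathit{one} \in \{0, 10\}$, or equivalently whether the last digit of d is zero:
$$\mathrm{updown} = (d \bmod 10 = 0)$$
For brevity, let $\mathrm{dot\_one} = \lfloor 2^{36} \cdot n_r \rfloor$. From Equations (128) and (134),
$$\begin{aligned}
\mathit{one} = 0 &: \ \mathrm{half} > \mathrm{dot\_one} \\
\mathit{one} = 10 &: \ \mathrm{half} > 2^{36} - 1 - \mathrm{dot\_one}
\end{aligned}$$
The following equations hold:
$$\begin{aligned}
\mathit{one} = 0 &: \ (\mathrm{hi64} - \mathrm{half}) \gg 36 = m - 1 \\
\mathit{one} = 10 &: \ (\mathrm{hi64} - \mathrm{half}) \gg 36 = m
\end{aligned}$$
Therefore,
$$\begin{aligned}
\mathit{one} = 0 &: \ \mathrm{shorter} = m > (\mathrm{hi64} - \mathrm{half}) \gg 36 = m - 1 \\
\mathit{one} = 10 &: \ \mathrm{shorter} = m + 1 > (\mathrm{hi64} - \mathrm{half}) \gg 36 = m
\end{aligned}$$
Thus, the condition for $\mathit{one} \in \{0, 10\}$ is
$$\mathrm{shorter} > (\mathrm{hi64} - \mathrm{half}) \gg 36 \iff d \bmod 10 = 0$$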
When updown is true, $d = 10 \cdot \mathrm{shorter}$. Otherwise, we compute d using Equation (152). When $10n - \lfloor 10n \rfloor \neq 0.5$,
d = ( 20 · ( v · 10 k 1 · r n 36 ) + 1 ) / / 2 = ( 20 · ( hi 64 · 2 36 ) + 1 ) / / 2 = ( 5 · hi 64 · 2 34 + 1 ) / / 2 = ( ( 5 · hi 64 + 2 34 ) · 2 34 ) / / 2 = ( ( 5 · hi 64 + 2 34 ) 34 ) / / 2 = ( 5 · hi 64 + 2 34 ) 35
Adding $2^{34}$ serves as a rounding operation. When $10n - \lfloor 10n \rfloor = 0.5$, Equation (142) requires checking whether $n = 0.25$. The derivation of longer involves careful handling of edge cases; the correctness of this computation has been verified through exhaustive testing, and for the float type our implementation passes the test for every possible input value. As the pseudocode shows, the float type requires only one 64-bit by 64-bit integer multiplication.
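The last four lines of Pseudocode Equation (175) can be exercised in isolation. The following is a minimal C sketch, not the authors' code: the function names `f32_select_d` and `f32_make_hi64` are hypothetical, and the inputs are synthetic values of hi64 and half rather than values produced by a real multiplication with a power-of-ten table.

```c
#include <assert.h>
#include <stdint.h>

enum { BIT = 36 };

/* Final stage of the binary32 pseudocode: pick the candidate ending in 0
   ("shorter") when it lies in the rounding interval, otherwise the
   correctly rounded digit string ("longer").
   hi64 = (m << 36) + floor(2^36 * n_r); half is the scaled half-ulp term. */
uint32_t f32_select_d(uint64_t hi64, uint64_t half)
{
    const uint64_t offset = (1ULL << (BIT - 2)) - 7;
    uint32_t shorter = (uint32_t)((hi64 + half) >> BIT);
    uint64_t bias    = (hi64 >> (BIT - 4)) & 15;
    uint32_t longer  = (uint32_t)((5 * hi64 + offset + bias) >> (BIT - 1));
    int updown       = shorter > (uint32_t)((hi64 - half) >> BIT);
    return updown ? shorter * 10 : longer;
}

/* Hypothetical helper for experiments: build hi64 from an integer part m
   and a 36-bit fraction frac36 = floor(2^36 * n_r). */
uint64_t f32_make_hi64(uint32_t m, uint64_t frac36)
{
    assert(m < (1u << 28));
    assert(frac36 < (1ULL << BIT));
    return ((uint64_t)m << BIT) + frac36;
}
```

With m = 123 and a small half term, a fraction of 0.5 yields 1235, while the tie fractions 0.25 and 0.75 yield 1232 and 1238, which is the round-half-to-even behavior that the offset and bias terms are meant to provide.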

3.9.2. Double-Precision Floating-Point Numbers

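$$\begin{array}{l}
\mathrm{uint64}\ v_i = \mathrm{bit\_copy\_from\_f64}(v) \\
(\mathrm{uint64}\ c,\ \mathrm{int}\ q) = \mathrm{extract}(v_i) \\
\mathrm{const\ int\ BIT} = 6 \\
\mathrm{int}\ k = (q \cdot 78913) \gg 18 \\
\mathrm{int}\ h = q + ((217707 \cdot (-k - 1)) \gg 16) \\
\mathrm{uint128\ pow10} = \mathrm{get\_pow10}(-k - 1) \\
\mathrm{uint64}\ \mathit{cb} = c \ll (h + \mathrm{BIT} + 1) \\
\mathrm{uint128\ hi128} = \mathrm{umul\_hi64{\times}128}(\mathit{cb}, \mathrm{pow10}) \\
\mathrm{bool\ even} = (v_i + 1)\ \&\ 1 \\
\mathrm{uint64\ half} = (\mathrm{pow10} \gg (64 - h)) + \mathrm{even} \\
\mathrm{uint64\ dot\_one} = (\mathrm{uint64})(\mathrm{hi128} \gg \mathrm{BIT}) \\
\mathrm{uint64\ ten} = 10 \cdot (\mathrm{hi128} \gg (\mathrm{BIT} + 64)) \\
\mathrm{uint64\ offset\_num} = (\mathrm{dot\_one} = 2^{62})\ ?\ 0 : 2^{63} + 6 \\
\mathrm{uint64}\ \mathit{one} = ((\mathrm{uint128})\,10 \cdot \mathrm{dot\_one} + \mathrm{offset\_num}) \gg 64 \\
\mathit{one} = (\mathrm{half} > \mathrm{dot\_one})\ ?\ 0 : \mathit{one} \\
\mathit{one} = (\mathrm{half} > 2^{64} - 1 - \mathrm{dot\_one})\ ?\ 10 : \mathit{one} \\
\mathrm{uint64}\ d = \mathrm{ten} + \mathit{one}
\end{array} \tag{188}$$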
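Pseudocode Equation (188) presents the corresponding procedure for double-precision floating-point numbers; $\mathrm{pow10}$ is computed using Equation (26), satisfying
$$\mathrm{pow10} \cdot 2^{\lfloor (-k-1)\cdot \log_2 10 \rfloor - 127} = 10^{-k-1} \cdot r_{1,\,-k-1}$$
From Equation (45),
$$\begin{aligned}
m &= \lfloor v \cdot 10^{-k-1} \rfloor = \lfloor v \cdot 10^{-k-1} \cdot r_{1,\,-k-1} \rfloor \\
&= \lfloor v \cdot \mathrm{pow10} \cdot 2^{\lfloor (-k-1)\cdot \log_2 10 \rfloor - 127} \rfloor \\
&= \lfloor c \cdot \mathrm{pow10} \cdot 2^{q + \lfloor (-k-1)\cdot \log_2 10 \rfloor - 127} \rfloor \\
&= (c \cdot \mathrm{pow10}) \gg (127 - q - \lfloor (-k-1)\cdot \log_2 10 \rfloor) \\
&= (c \cdot \mathrm{pow10}) \gg (127 - h) \\
&= ((c \ll (7 + h)) \cdot \mathrm{pow10}) \gg (64 + 70) \\
&= \mathrm{hi128} \gg 70
\end{aligned}$$
Given $h \in [-4, -1]$ and c having at most 53 significant bits, $\mathit{cb}$ does not overflow, ensuring accurate computation of $\mathrm{ten} = 10 \cdot m$. By definition, $\mathrm{dot\_one} = \lfloor 2^{64} \cdot n_r \rfloor$. From Equation (99),
$$\mathrm{half} = (\mathrm{pow10} \gg (64 - h)) + \mathrm{even} = \lfloor 2^{63} \cdot 2^{q} \cdot 10^{-k-1} \rfloor + \mathrm{even}$$
The value of $\mathit{one}$ is first computed using Equation (160), and then adjusted if the conditions for $\mathit{one} = 0$ or $\mathit{one} = 10$ are met:
$$\begin{aligned}
\mathit{one} &= \left\lfloor \frac{\lfloor 2^{64} \cdot n_r \rfloor}{2^{64}} \cdot 10 + \left( (n = 0.25)\ ?\ 0 : \left( 2^{-1} + \frac{6}{2^{64}} \right) \right) \right\rfloor \\
&= (\mathrm{dot\_one} \cdot 10 + (\mathrm{dot\_one} = 2^{62}\ ?\ 0 : 2^{63} + 6)) \gg 64
\end{aligned}$$
An equivalent formulation is
$$\mathit{one} = (\mathrm{dot\_one} = 2^{62})\ ?\ 2 : (\mathrm{dot\_one} \cdot 10 + 2^{63} + 6) \gg 64$$
The conditions $\mathit{one} = 0$ and $\mathit{one} = 10$ are determined by Equations (121) and (132), respectively. As the pseudocode shows, the double type requires only one 64-bit by 128-bit multiplication.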
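The digit-selection tail of Pseudocode Equation (188) can be sketched in C as follows. This is an illustration under stated assumptions, not the authors' implementation: it relies on the `unsigned __int128` extension of GCC and Clang, the names `f64_select_d` and `f64_make_hi128` are hypothetical, and the inputs are synthetic rather than the result of a real 64-by-128-bit multiplication.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, assumed available */

/* Compute d = 10*m + one from hi128 and half, following the last lines of
   the binary64 pseudocode. Both adjustments are conditional selects, so a
   compiler can emit them without branches. */
uint64_t f64_select_d(u128 hi128, uint64_t half)
{
    const int BIT = 6;
    uint64_t dot_one = (uint64_t)(hi128 >> BIT);              /* floor(2^64 * n_r) */
    uint64_t ten     = 10 * (uint64_t)(hi128 >> (BIT + 64));  /* 10 * m */
    uint64_t offset_num = (dot_one == (1ULL << 62)) ? 0 : (1ULL << 63) + 6;
    uint64_t one = (uint64_t)(((u128)10 * dot_one + offset_num) >> 64);
    one = (half > dot_one) ? 0 : one;                /* one = 0 case  */
    one = (half > UINT64_MAX - dot_one) ? 10 : one;  /* one = 10 case */
    return ten + one;
}

/* Hypothetical helper for experiments: place m above bit 70 and the
   64-bit fraction frac64 = floor(2^64 * n_r) above bit 6. */
u128 f64_make_hi128(uint64_t m, uint64_t frac64)
{
    assert(m < (1ULL << 58));
    return ((u128)m << 70) | ((u128)frac64 << 6);
}
```

For m = 123, the fractions 0.5, 0.25, and 0.75 give 1235, 1232, and 1238, and a fraction within half of either end of the unit interval gives 1230 or 1240.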
In the C/C++ implementation, the only branch occurs when computing c and q for subnormal floating-point numbers (marked as unlikely). The conversion of normal floating-point numbers is branch-free, eliminating branch misprediction penalties. As shown in Pseudocode Equations (175) and (188), the core algorithm is remarkably concise. Irregular numbers are handled in a separate branch, which is a worthwhile trade-off given their rarity.
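The integer constants used for k and h in Pseudocode Equations (175) and (188) are fixed-point approximations of $\log_{10} 2$ and $\log_2 10$. The sketch below evaluates them in C and checks the stated range $h \in [-4, -1]$. The function names are hypothetical, the exponent ranges $q \in [-149, 104]$ (binary32) and $q \in [-1074, 971]$ (binary64) are the usual ranges for $v = c \cdot 2^{q}$ with integer c and are an assumption here, and the code assumes that right-shifting a negative int is an arithmetic shift, which is implementation-defined in C but holds for GCC, Clang, and MSVC.

```c
#include <assert.h>
#include <stdint.h>

/* k = floor(q * log10(2)) and h = q + floor((-k-1) * log2(10)),
   evaluated with the multiply-and-shift constants from the pseudocode. */
int f32_k(int q) { return (q * 1233) >> 12; }
int f32_h(int q) { int k = f32_k(q); return q + ((1701 * (-k - 1)) >> 9); }
int f64_k(int q) { return (q * 78913) >> 18; }
int f64_h(int q) { int k = f64_k(q); return q + ((217707 * (-k - 1)) >> 16); }

/* Returns 1 if h(q) lies in [-4, -1] for every q in [q_min, q_max]. */
int h_in_range(int (*h)(int), int q_min, int q_max)
{
    assert(q_min <= q_max);
    for (int q = q_min; q <= q_max; ++q) {
        int v = h(q);
        if (v < -4 || v > -1)
            return 0;
    }
    return 1;
}
```

Because h never leaves $[-4, -1]$, the left shift $c \ll (h + \mathrm{BIT} + 1)$ moves c by 33 to 36 bits for binary32 and by 3 to 6 bits for binary64, which is why $\mathit{cb}$ fits in 64 bits.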

3.10. Decimal-to-String Conversion

The floating-point number printing algorithm consists of two main phases. While the first phase, which computes the decimal digits d and exponent k, has been discussed in previous sections, this section focuses on the second phase: converting the computed decimal representation into a string.
Floating-point numbers are typically printed in two formats: fixed-point notation and scientific notation. Table 4 illustrates examples of both formats.
Having computed d and k, we can now convert the floating-point number to a string based on these values. According to the Schubfach algorithm, d may contain trailing zeros, meaning $d \bmod 10$ could be zero. In order to reduce instruction dependencies, we decompose d into two parts: $d \,//\, 10$ and $d \bmod 10$. Let $\mathit{up}$ indicate whether $\mathit{one}$ equals 10, while $\mathit{updown}$ represents the cases where $\mathit{one}$ is either 0 or 10. The following relationships hold:
$$\begin{aligned}
d \,//\, 10 &= m + \mathit{up} \\
\mathit{updown} &= (\mathit{one} = 0 \lor \mathit{one} = 10) = (d \bmod 10 = 0) \\
d &= 10 \cdot (m + \mathit{up}) + (\mathit{updown}\ ?\ 0 : \mathit{one})
\end{aligned}$$
By calculating the approximate range of m, we obtain
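$$m = \lfloor c \cdot 2^{q} \cdot 10^{-k-1} \rfloor \approx c \cdot 2^{q} \cdot 10^{-k-1} \in [0.1 \cdot c,\ c) \;\Rightarrow\; 0.1 \cdot c_{\min} \le 0.1 \cdot c \le m \le c \le c_{\max}$$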
Based on the range of c, we derive
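$$\begin{array}{lll}
\text{float:} & \text{normal:} & \left[ \frac{2^{23}+1}{10},\ 2^{24}-1 \right] = [838860.9,\ 16777215] \\
 & \text{subnormal:} & \left[ \frac{1}{10},\ 2^{23}-1 \right] = [0.1,\ 8388607] \\
\text{double:} & \text{normal:} & \left[ \frac{2^{52}+1}{10},\ 2^{53}-1 \right] \approx [4.5 \times 10^{14},\ 9 \times 10^{15}] \\
 & \text{subnormal:} & \left[ \frac{1}{10},\ 2^{52}-1 \right] \approx [0.1,\ 4.5 \times 10^{15}]
\end{array}$$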
Let $\mathit{len}_{10}(x)$ denote the number of decimal digits in x. For $x > 0$,
$$\mathit{len}_{10}(x) = \lfloor \log_{10}(x) \rfloor + 1$$
We define $\mathit{len}_{10}(0) = 0$. For example, $\mathit{len}_{10}(12345) = 5$. Consequently,
$$\begin{array}{lll}
\text{float:} & \text{normal:} & \mathit{len}_{10}(m + \mathit{up}) \in [6, 8] \\
 & \text{subnormal:} & \mathit{len}_{10}(m + \mathit{up}) \in [0, 7] \\
\text{double:} & \text{normal:} & \mathit{len}_{10}(m + \mathit{up}) \in [15, 16] \\
 & \text{subnormal:} & \mathit{len}_{10}(m + \mathit{up}) \in [0, 16]
\end{array}$$
Since $d \,//\, 10 = m + \mathit{up}$, we have $\mathit{len}_{10}(d) = \mathit{len}_{10}(m + \mathit{up}) + 1$. Let $\mathit{tz}_{10}(x)$ denote the number of trailing zeros in the decimal representation of x; for example, $\mathit{tz}_{10}(12300) = 2$. The number of significant digits $\mathit{sig}_{10}(x)$ equals $\mathit{len}_{10}(x) - \mathit{tz}_{10}(x)$. When $\mathit{updown} = 0$, $\mathit{one} \neq 0$; thus, $d \bmod 10 = \mathit{one} \neq 0$ and $\mathit{tz}_{10}(d) = 0$. Therefore,
$$\begin{aligned}
\mathit{sig}_{10}(d) &= \mathit{updown}\ ?\ \mathit{len}_{10}(d) - \mathit{tz}_{10}(d) : \mathit{len}_{10}(d) \\
\text{float:}\quad \mathit{sig}_{10}(d) &\in [1, 9] \\
\text{double:}\quad \mathit{sig}_{10}(d) &\in [1, 17]
\end{aligned}$$
For normal floating-point numbers, l e n 10 ( d ) can be computed as follows:
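$$\begin{aligned}
\text{float:}\quad \mathit{len}_{10}(d) &= \mathit{len}_{10}(m + \mathit{up}) + 1 = 9 - [m + \mathit{up} < 10^{7}] - [m + \mathit{up} < 10^{6}] \\
\text{double:}\quad \mathit{len}_{10}(d) &= \mathit{len}_{10}(m + \mathit{up}) + 1 = 16 + [(m + \mathit{up}) \ge 10^{15}]
\end{aligned}$$
where $[P]$ denotes the Iverson bracket, which evaluates to 1 if predicate P is true and 0 otherwise.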
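The Iverson-bracket formulas above map directly onto C comparisons, which evaluate to 0 or 1. The following sketch assumes normal inputs only; the function names are hypothetical, and the assertions encode the digit-count ranges stated earlier for normal numbers.

```c
#include <assert.h>
#include <stdint.h>

/* len10(d) for a normal float, where m_up = m + up has 6 to 8 digits. */
int f32_len10_d(uint32_t m_up)
{
    assert(m_up >= 100000u && m_up < 100000000u);
    return 9 - (m_up < 10000000u) - (m_up < 1000000u);
}

/* len10(d) for a normal double, where m_up = m + up has 15 or 16 digits. */
int f64_len10_d(uint64_t m_up)
{
    assert(m_up >= 100000000000000ULL && m_up < 10000000000000000ULL);
    return 16 + (m_up >= 1000000000000000ULL);
}
```

Both functions compile to comparisons and additions only, so the digit count is obtained without a loop or a branch.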
Computing the ASCII representation of d reduces to computing the ASCII codes of $m + \mathit{up}$ and $\mathit{one}$. From the above, we have
$$\begin{aligned}
\text{float:}\quad m + \mathit{up} &< 10^{8} \\
\text{double:}\quad m + \mathit{up} &< 10^{16}
\end{aligned}$$
These bounds allow efficient computation of the ASCII code for $m + \mathit{up}$ using CPU SIMD instruction sets.
  • Scalar Implementation
We first present a scalar method that does not rely on SIMD instructions.
Let $x \in [0, 10^{8})$ and let $\mathit{dec\_to\_ascii8}(x)$ be a function that computes the ASCII representation of x. Algorithm 2 describes this process. It is an optimized variant of the itoa algorithm written by Paul Khuong [19].
The function simultaneously computes $\mathit{tz}_{10}(x)$ as the return value tz. Example: X = 12345600, Y = “12345600”, tz = 2.
For $x \in [0, 10^{16})$, let $\mathit{dec\_to\_ascii16}(x)$ compute the ASCII representation of x. Algorithm 3 describes this process. Example: X = 1234560012345600, Y = “1234560012345600”, tz = 2.
  • Handling Undefined Behavior
Since __builtin_ctzll(0) is undefined behavior in the C language, special handling is required. For single-precision floating-point numbers, when $m + \mathit{up} = 0$ (the six smallest subnormal numbers), $\mathit{updown} = 0$, avoiding the undefined case. For double-precision numbers, $m + \mathit{up} = 0$ only occurs for $5 \times 10^{-324}$ (the smallest subnormal number). In our implementation, we handle this special case separately at the function entry.
Algorithm 2: Convert an 8-digit decimal number to ASCII: dec_to_ascii8(x)
Input: X (type: uint64, range: $[0, 10^{8})$)
Output: Y (type: uint64), tz (type: uint32)
  1: uint64 abcd_efgh = X + (0x100000000 - 10000) * ((X * 0x68db8bbULL) >> 40)
  2: uint64 ab_cd_ef_gh = abcd_efgh + (0x10000 - 100) * (((abcd_efgh * 0x147b) >> 19) & 0x7f0000007f)
  3: uint64 a_b_c_d_e_f_g_h = ab_cd_ef_gh + (0x100 - 10) * (((ab_cd_ef_gh * 0x67) >> 10) & 0xf000f000f000f)
  4: uint32 tz = __builtin_ctzll(a_b_c_d_e_f_g_h) >> 3
  5: uint64 BCD = CPU_IS_LITTLE_ENDIAN ? byteswap(a_b_c_d_e_f_g_h) : a_b_c_d_e_f_g_h
  6: uint64 Y = BCD + 0x3030303030303030
  7: return Y, tz
Algorithm 3: Convert a 16-digit decimal number to ASCII: dec_to_ascii16(x)
Input: X (type: uint64, range: $[0, 10^{16})$)
Output: Y (type: uint128), tz (type: uint32)
  1: uint64 abcdefgh = X // $10^{8}$
  2: uint64 ijklmnop = X mod $10^{8}$
  3: (uint64 abcdefgh_ascii, uint tz1) = dec_to_ascii8(abcdefgh)
  4: (uint64 ijklmnop_ascii, uint tz2) = dec_to_ascii8(ijklmnop)
  5: uint128 Y = CPU_IS_LITTLE_ENDIAN ? (ijklmnop_ascii << 64) + abcdefgh_ascii : (abcdefgh_ascii << 64) + ijklmnop_ascii
  6: uint32 tz = (ijklmnop == 0) ? 8 + tz1 : tz2
  7: return Y, tz
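Algorithms 2 and 3 can be transcribed into C almost line by line. The sketch below is one possible transcription, not the authors' code. It departs from the listings in three labeled ways: the result is written to a caller-supplied character buffer instead of being returned as an integer, an all-zero input returns tz = 8 rather than calling `__builtin_ctzll(0)`, and byte order is detected with the `__BYTE_ORDER__` macro of GCC and Clang (little-endian is assumed when the macro is absent).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes exactly 8 zero-padded ASCII digits of x to out (no terminator)
   and returns the number of trailing decimal zeros (8 if x == 0). */
unsigned dec_to_ascii8(uint64_t x, char out[8])
{
    assert(x < 100000000ULL);
    /* Split into two 4-digit halves, one per 32-bit lane. */
    uint64_t abcd_efgh = x + (0x100000000ULL - 10000) * ((x * 0x68db8bbULL) >> 40);
    /* Split each half into two 2-digit values, one per 16-bit lane. */
    uint64_t ab_cd_ef_gh = abcd_efgh
        + (0x10000ULL - 100) * (((abcd_efgh * 0x147bULL) >> 19) & 0x7f0000007fULL);
    /* Split each pair into single digits, one per byte. */
    uint64_t digits = ab_cd_ef_gh
        + (0x100ULL - 10) * (((ab_cd_ef_gh * 0x67ULL) >> 10) & 0x000f000f000f000fULL);
    /* The last digit sits in the lowest byte, so zero bytes at the bottom
       are trailing zeros. The guard avoids the undefined ctz(0). */
    unsigned tz = digits ? (unsigned)__builtin_ctzll(digits) >> 3 : 8;
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    uint64_t bcd = digits;
#else
    uint64_t bcd = __builtin_bswap64(digits);   /* first digit to lowest address */
#endif
    uint64_t ascii = bcd + 0x3030303030303030ULL;
    memcpy(out, &ascii, 8);
    return tz;
}

/* Writes exactly 16 zero-padded ASCII digits of x to out. */
unsigned dec_to_ascii16(uint64_t x, char out[16])
{
    assert(x < 10000000000000000ULL);
    uint64_t hi = x / 100000000ULL, lo = x % 100000000ULL;
    unsigned tz1 = dec_to_ascii8(hi, out);
    unsigned tz2 = dec_to_ascii8(lo, out + 8);
    return (lo == 0) ? 8 + tz1 : tz2;
}
```

Each of the three splitting steps replaces a division by a multiplication and a shift, and processes all lanes of the 64-bit word at once, which is the same idea that the SIMD variants apply to wider registers.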
  • SIMD Implementation
The SIMD implementations of $\mathit{dec\_to\_ascii8}$ and $\mathit{dec\_to\_ascii16}$ follow the same principles as the scalar version. For the ARM64 architecture, we use the NEON instruction set; for the x86-64 architecture, we support three variants: AVX512, SSE4.1, and SSE2. Table 5 summarizes these implementations. Owing to space constraints, the SIMD implementations are not reproduced here; readers may refer to the source code.
Using the methods above, we compute the ASCII representation of $m + \mathit{up}$ and the number of significant digits. Printing d is equivalent to printing $m + \mathit{up}$ and $\mathit{one}$. Based on $\mathit{sig}_{10}(d)$, we determine which buffer contents to retain. In our implementation, we adopt different formats for floating-point numbers in different ranges to enhance readability, as shown in Table 6.
In scientific notation, the result includes an exponent. For example, in “$1.2 \times 10^{-2}$”, “$10^{-2}$” represents the exponent. Converting $d \cdot 10^{k}$ to standard decimal scientific notation yields
$$d \cdot 10^{k} = d \cdot 10^{-\lfloor \log_{10}(d) \rfloor} \cdot 10^{k + \lfloor \log_{10}(d) \rfloor}$$
Since $d \cdot 10^{-\lfloor \log_{10}(d) \rfloor} \in [1, 10)$, the exponent is determined by $k + \lfloor \log_{10}(d) \rfloor$. Let $E_{10} = k + \lfloor \log_{10}(d) \rfloor$. Then,
$$E_{10} = k + \lfloor \log_{10}(d) \rfloor = k + \lfloor \log_{10}(m + \mathit{up}) \rfloor + 1$$
For single-precision floating-point numbers, fixed-point notation is used when $E_{10} \in [-3, 6]$. For double-precision numbers, fixed-point notation is used when $E_{10} \in [-4, 15]$.
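As a worked instance of the exponent formula, take the illustrative values $d = 12345$ and $k = -7$ (chosen here only for the example), so that $m + \mathit{up} = 1234$ and $\mathit{one} = 5$:

$$E_{10} = k + \lfloor \log_{10}(1234) \rfloor + 1 = -7 + 3 + 1 = -3, \qquad d \cdot 10^{k} = 12345 \times 10^{-7} = 1.2345 \times 10^{-3}$$

The exponent is thus obtained from $m + \mathit{up}$ alone, without forming d first.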
The floating-point number printing algorithm proposed in this paper is illustrated as Algorithm 4. The key distinction between our algorithm and others is the use of SIMD instruction sets for converting $m + \mathit{up}$ to ASCII values. For single-precision numbers, we use $\mathit{dec\_to\_ascii8}$; for double-precision, we use $\mathit{dec\_to\_ascii16}$. The detailed implementation is available in the source code.
Algorithm 4: Floating-point number printing algorithm
Input: v (type: float/double), buffer (type: char*)
Output: buffer (type: char*)
  1: compute $m + \mathit{up}$, $\mathit{updown}$, $\mathit{one}$, $k$ from v
  2: compute $\mathit{sig}_{10}(d) = \mathit{updown}\ ?\ \mathit{len}_{10}(m + \mathit{up}) - \mathit{tz}_{10}(m + \mathit{up}) : \mathit{len}_{10}(m + \mathit{up}) + 1$
  3: compute $E_{10} = k + \lfloor \log_{10}(m + \mathit{up}) \rfloor + 1$
  4: convert $m + \mathit{up}$, $\mathit{one}$ to ASCII codes and store in buffer
  5: based on $\mathit{sig}_{10}(d)$ and $E_{10}$, determine which buffer contents to retain
  6: if result is in scientific notation, print $E_{10}$
  7: return buffer
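Steps 2 to 6 of Algorithm 4 can be illustrated with a deliberately simple scalar sketch. It is not the authors' implementation: it always emits scientific notation, uses `snprintf` and digit loops instead of the SIMD routines, writes the exponent in a bare `e-3` style chosen only for this example, and the name `format_sci` and its parameters are hypothetical. It assumes $m + \mathit{up} > 0$.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int len10_u64(uint64_t x) { int n = 0; while (x) { x /= 10; ++n; } return n; }
static int tz10_u64(uint64_t x)  { int n = 0; while (x && x % 10 == 0) { x /= 10; ++n; } return n; }

/* Formats d * 10^k, given m_up = m + up, updown and one, as "D.DDDeE".
   Returns the number of characters written, excluding the terminator. */
int format_sci(uint64_t m_up, int updown, unsigned one, int k, char *out, size_t cap)
{
    assert(m_up > 0 && m_up < 10000000000000000ULL && one <= 10);
    int len = len10_u64(m_up);
    int sig = updown ? len - tz10_u64(m_up) : len + 1;   /* step 2 */
    int e10 = k + len;   /* step 3: floor(log10(m_up)) + 1 equals len10(m_up) */
    char digits[24];
    uint64_t d = 10 * m_up + (updown ? 0 : one);          /* step 4 */
    snprintf(digits, sizeof digits, "%llu", (unsigned long long)d);
    digits[sig] = '\0';                                   /* step 5: keep sig digits */
    if (sig == 1)
        return snprintf(out, cap, "%ce%d", digits[0], e10);
    return snprintf(out, cap, "%c.%se%d", digits[0], digits + 1, e10);  /* step 6 */
}
```

For example, $m + \mathit{up} = 1234$, $\mathit{one} = 5$, $k = -7$ produces `1.2345e-3`, and $m + \mathit{up} = 10^{15}$ with $\mathit{updown}$ set and $k = -14$ produces `1e2`, since all trailing zeros are dropped.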
In our C implementation, all branches are designed as unlikely branches to minimize the impact of branch mispredictions. By transforming the computation of d into computing $m + \mathit{up}$ and $\mathit{one}$, we reduce instruction dependencies and enhance instruction-level parallelism. Unlike other algorithms that compute the exact value of d before printing, we avoid computing d directly and instead quickly compute $d \,//\, 10 = m + \mathit{up}$.

3.11. Summary

This section explains how to quickly calculate d and k, as well as the printing optimization. Since $d = 10m + \mathit{one}$, the process of calculating d is transformed into calculating m and $\mathit{one}$.

4. Experimental Evaluation

This section presents a comprehensive experimental evaluation of the xjb algorithm, comparing its performance against state-of-the-art floating-point–string conversion algorithms across multiple hardware platforms and compilers.

4.1. Correctness Verification

Prior to performance evaluation, we conducted rigorous correctness verification to ensure that xjb fully complies with the Steele–White (SW) principle and produces accurate results. The verification process covered two key aspects: floating-point–decimal conversion and floating-point–string conversion.
  • Single-Precision (binary32): Given the manageable size of the binary32 search space ($2^{32}$ possible values), we performed exhaustive testing across the entire range. Each output was compared against the reference Schubfach algorithm to ensure identical results, guaranteeing complete correctness for the binary32 format.
  • Double-Precision (binary64): Exhaustive testing of all $2^{64}$ binary64 values is computationally infeasible. Instead, we employed a comprehensive testing strategy that included the following:
    Large-scale random testing with statistically significant sample sizes.
    Targeted testing of edge cases including subnormal numbers, extreme exponents, and near-power-of-two values.
All test results confirmed that xjb produces outputs identical to the Schubfach algorithm while fully satisfying the SW principle. The complete verification test suite is publicly available at (check.cpp) https://github.com/xjb714/xjb/blob/main/bench/check.cpp (accessed on 20 April 2026).

4.2. Experimental Setup

4.2.1. Hardware Platforms

To evaluate cross-platform performance and portability, we conducted benchmarks on three representative hardware platforms spanning both x86-64 and ARM64 architectures:
  • AMD R7-7840H: A modern high-performance x86-64 processor with support for AVX2 and AVX-512 instruction sets, running Ubuntu 26.04. This platform represents state-of-the-art x86-64 computing (max frequency: 5.1 GHz).
  • Apple M1: A first-generation Apple Silicon ARM64 processor with NEON SIMD support, running macOS 26.4. This platform serves as a baseline for ARM64 performance (max frequency: 3.2 GHz).
  • Apple M5: A recent-generation Apple Silicon ARM64 processor with NEON SIMD support, running macOS 26.4. This platform represents the latest ARM64 technology (max frequency: 4.46 GHz).

4.2.2. Compilers and Compilation Flags

Each platform uses its native compiler toolchain to ensure optimal code generation:
  • AMD R7-7840H: Intel C++ Compiler (icpx) version 2025.0.4.
  • Apple M1/M5: Apple Clang version 21.0.0.
All benchmarks were compiled with -O3 -march=native to enable maximum compiler optimizations and generate architecture-specific instructions, ensuring fair comparison across platforms.

4.2.3. Benchmark Methodology

Our benchmark methodology was designed to ensure fairness, reproducibility, and statistical significance:
  • Input Generation: Generate $2^{24}$ (16,777,216) random floating-point numbers, excluding special values (NaN and infinity), to focus on the core conversion logic.
  • Warm-Up Phase: Execute the benchmark multiple times before measurement to eliminate cold-start effects and ensure consistent cache behavior.
  • Measurement: Measure the total wall-clock time required to convert all numbers through multiple iterations.
  • Analysis: Calculate the average conversion time per floating-point number, discarding outliers to ensure robust results.
This methodology minimizes system noise and provides reliable, reproducible performance measurements while avoiding skewed results from special-case handling.
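The input-generation step can be sketched as follows. The paper does not state which random number generator or distribution the benchmark uses, so this example makes its own assumptions: it draws uniformly random 64-bit patterns from a xorshift64 generator and rejects those whose exponent field is all ones (NaN and infinity). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Marsaglia xorshift64 step; the state must be nonzero. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Returns a double with a random bit pattern, never NaN or infinity. */
double random_finite_double(uint64_t *state)
{
    assert(*state != 0);
    for (;;) {
        uint64_t bits = xorshift64(state);
        if (((bits >> 52) & 0x7ff) != 0x7ff) {   /* exponent all ones: NaN or inf */
            double v;
            memcpy(&v, &bits, sizeof v);
            return v;
        }
    }
}

/* Fills dst with n finite doubles; the same seed gives the same sequence. */
void fill_random_doubles(double *dst, size_t n, uint64_t seed)
{
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;   /* avoid the zero state */
    for (size_t i = 0; i < n; ++i)
        dst[i] = random_finite_double(&state);
}
```

Generating the inputs once from a fixed seed and reusing the same array for every algorithm keeps the comparison reproducible and ensures that all implementations see identical data.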

4.3. Algorithms Compared

We compared the xjb algorithm with the other algorithms listed in Table 7. A few special cases are noted first.
For floating-point-to-decimal/string conversion:
  • teju_jagua: Implements only float/double-to-decimal conversion.
  • jnum: Implements only double-to-string conversion. When comparing float to string, we convert the double value to a float value. Strictly speaking, the jnum algorithm does not satisfy the SW principle; however, because its performance is excellent, we still included it in the benchmark.
For data type:
  • yy_double, uscalec: Only the double data type is supported.

4.4. Performance Results

We evaluated xjb through two primary conversion interfaces: floating-point–decimal and floating-point–string.
  • Float/Double-to-Decimal Conversion: Table 8 summarizes the benchmark results for float-to-decimal and double-to-decimal conversions across the AMD R7-7840H and Apple M1/M5 platforms. All benchmarks use random values, excluding NaN and infinity, to focus on core conversion performance.
  • Float/Double-to-String Conversion: Figure 1 and Figure 2 present comprehensive benchmark results for double-to-string and float-to-string conversions on the three processor platforms.
    Specifically,
    Figure 1a,c,e show results for completely random double-precision floating-point numbers.
    Figure 2a–c show results for completely random single-precision floating-point numbers.
    Figure 1b,d,f present results for fixed-length significant digits (ranging from 1 to 17 digits).
    In Figure 1 and Figure 2, the suffixes denote specific implementations:
    _comp (e.g., fmt_comp, dragonbox_comp, xjb32_comp, xjb64_comp): Versions using compressed constant tables for reduced memory footprint.
    _full (e.g., fmt_full, dragonbox_full): Versions using uncompressed constant tables for potentially faster access.
    null: An empty function used to isolate and measure the overhead of function calls.

4.5. Analysis and Discussion

4.5.1. Performance Comparison

The benchmark results demonstrate that xjb achieves state-of-the-art performance across all three hardware platforms and both conversion interfaces. For decimal conversion, we compared it against the baseline Schubfach algorithm; for string conversion, we focused on comparisons with zmij, a representative modern high-performance algorithm.
On AMD R7-7840H (x86-64):
  • Float-to-decimal: 2.24 ns, 5.44× faster than Schubfach (12.2 ns);
  • Double-to-decimal: 3.76 ns, 3.06× faster than Schubfach (11.51 ns);
  • Float-to-string: 33.72 cycles, 70% faster than zmij (57.06 cycles);
  • Double-to-string: 43.41 cycles, 13% faster than zmij (49.28 cycles).
On Apple M1 (ARM64):
  • Float-to-decimal: 2.15 ns, 5.41× faster than Schubfach (11.64 ns);
  • Double-to-decimal: 2.58 ns, 5.08× faster than Schubfach (13.12 ns);
  • Float-to-string: 16.98 cycles, 117% faster than zmij (36.87 cycles);
  • Double-to-string: 20.77 cycles, 25% faster than zmij (27.74 cycles).
On Apple M5 (ARM64):
  • Float-to-decimal: 1.44 ns, 5.27× faster than Schubfach (7.59 ns);
  • Double-to-decimal: 1.55 ns, 4.97× faster than Schubfach (7.71 ns);
  • Float-to-string: 13.87 cycles, 136% faster than zmij (32.77 cycles);
  • Double-to-string: 17.09 cycles, 20% faster than zmij (20.74 cycles).
These results establish xjb as the new performance leader in floating-point–string conversion.

4.5.2. Performance Consistency

A defining characteristic of xjb is its consistent performance across diverse input distributions. By designing all conditional branches as unlikely branches, xjb achieves near-optimal branch prediction rates regardless of input patterns. This property is particularly valuable in real-world applications, where input distributions can be unpredictable.
This design addresses a key limitation of algorithms like Dragonbox [10], which trade branch efficiency for reduced multiplication operations. In contrast, xjb simultaneously achieves both minimal multiplication operations and efficient branch handling, combining the strengths of Schubfach [8] and Dragonbox [10] while avoiding their respective trade-offs.

4.5.3. Cross-Platform Performance

The benchmark results demonstrate xjb’s strong portability across processor architectures:
  • x86-64 (AMD R7-7840H): The algorithm benefits from the compiler’s ability to generate highly optimized code for arithmetic operations and SIMD instructions.
  • ARM64 (Apple M1/M5): xjb maintains consistent performance advantages across both generations of Apple Silicon, demonstrating the algorithm’s robustness and effectiveness across different instruction set architectures.
This cross-platform consistency is achieved through careful algorithm design that works harmoniously with compiler optimizations on diverse platforms.

4.5.4. Comparison with Some Related Algorithms

Placing xjb in the context of prior work reveals several key insights:
  • vs. Schubfach: With 3–5× speedup over the baseline, xjb validates that Schubfach’s elegant mathematical framework can be substantially optimized through computational restructuring, branch optimization, and SIMD utilization.
  • vs. yy_json/yy_double: While these algorithms represent excellent engineering for JSON serialization, xjb outperforms them by 2–3×, demonstrating that SIMD instruction set utilization unlocks significant additional optimization potential.
  • vs. zmij: Achieving 1.2–2.3× speedup over zmij highlights the benefits of xjb’s approach to instruction dependency reduction, which enables better instruction-level parallelism on modern superscalar processors.
  • vs. Ryū and Dragonbox: xjb outperforms these established algorithms by 2.5–6×, demonstrating that systematic optimization of multiple bottlenecks simultaneously yields substantial performance gains over approaches that focus on single aspects of the problem.

4.5.5. Fixed-Length Performance Analysis

The fixed-length benchmark results (Figure 1b,d,f) reveal important performance characteristics:
  • Consistent Performance: All xjb variants (xjb32, xjb32_comp, xjb64, xjb64_comp) maintain consistent performance across all digit lengths (1–17 digits). In contrast, some competing algorithms show significant performance variations depending on the number of output digits.
  • Predictable Latency: The branch-free core design ensures that the conversion time remains relatively constant regardless of output length. This predictability is a valuable property for real-time systems and high-throughput applications, where consistent latency is as important as raw performance.
  • Compression Trade-Off: The compressed table variant (_comp) remains competitive with the other algorithms while reducing memory usage, which demonstrates the memory efficiency of xjb.

4.6. Summary

This comprehensive experimental evaluation establishes xjb as the new state-of-the-art method in floating-point–string conversion across multiple hardware platforms. The key findings can be summarized as follows:
  • Superior Performance: xjb consistently outperforms all competing algorithms across both x86-64 (AMD R7-7840H) and ARM64 (Apple M1/M5) architectures.
  • Significant Speedups: Achieves 3–5× improvement over the baseline Schubfach algorithm.
  • Performance Consistency: Maintains stable performance across diverse input distributions and output lengths due to effective branch prediction optimization and branch-free core design.
These results validate the effectiveness of our holistic optimization approach: minimizing multiplication operations, reducing instruction dependencies, optimizing branch patterns, and leveraging SIMD instructions—working synergistically to deliver exceptional performance on modern processor architectures.

5. Conclusions

This paper presented xjb, a novel high-performance algorithm for converting IEEE 754 floating-point numbers to decimal string representations. Building upon the Schubfach algorithm, xjb introduces several key optimizations that significantly improve conversion speed while maintaining full compliance with the Steele–White principle for accurate and minimal-length output.

5.1. Improvements to the Schubfach Algorithm

The xjb algorithm represents a significant advancement over the baseline Schubfach algorithm through several targeted improvements:
  • Restructured Computation Flow: By decomposing the significand calculation into integer and fractional parts, xjb minimizes instruction dependencies, enabling better instruction-level parallelism and improved pipeline utilization on modern superscalar processors.
  • Minimized Multiplication Operations: xjb reduces the number of expensive high-precision multiplications required during conversion. For IEEE 754 binary64, only one 64-bit by 128-bit multiplication is needed, and for binary32, only one 64-bit by 64-bit multiplication is required, significantly decreasing computational overhead.
  • Branch Optimization: The algorithm employs branchless programming techniques for core conversion logic and structures remaining branches as unlikely paths, enabling efficient branch prediction on modern processors and resulting in consistent performance across diverse input distributions.
  • SIMD Instruction Utilization: Unlike Schubfach and many other existing algorithms, xjb leverages SIMD instructions (NEON for ARM64, AVX512/SSE4.1/SSE2 for x86-64) for efficient decimal-to-ASCII conversion, fully exploiting the vector processing capabilities of contemporary processors.

5.2. Key Findings

Our extensive experimental evaluation across AMD R7-7840H (x86-64) and Apple M1/M5 (ARM64) platforms reveals several key findings that advance the state of the art in floating-point–string conversion:
  • Significant Performance Improvement: xjb achieves a remarkable 3–5× speedup over the baseline Schubfach algorithm, representing a substantial leap in performance compared to prior work. On Apple M5, xjb achieves an impressive 1.44 ns for float-to-decimal conversion and 1.55 ns for double-to-decimal conversion, setting a new performance benchmark in the field.
  • Superior to State-of-the-Art Algorithms: xjb consistently outperforms other high-performance algorithms, including yy_json, yy_double, and zmij, by margins of 1.2–3×. This indicates the performance improvement achieved by using the SIMD instruction set and reducing instruction dependencies. On the Apple M5 processor, compared with zmij, our algorithm achieves approximately 20% and 136% speedups for double-to-string and float-to-string conversion, respectively.
  • Consistent Performance across Platforms: Unlike prior work, which often shows significant performance variations between architectures, xjb maintains its performance advantage across both x86-64 and ARM64. This portability is achieved through careful algorithm design that works well with compiler optimizations on different platforms.
  • Stable Performance across Input Distributions: The algorithm maintains consistent performance regardless of input patterns and output digit lengths. This stability can be attributed to our branch-free core design and effective branch prediction optimization for all conditional branches, making xjb ideal for applications requiring predictable performance.
  • Synergistic Optimization Effects: The combination of instruction dependency reduction, multiplication minimization, branch optimization, and SIMD utilization works synergistically to deliver performance gains that exceed what any single optimization could achieve in isolation.
  • Concise Core Implementation: The core conversion logic of xjb is implemented in a concise manner, with minimal code lines and clear logic flow. This design simplifies maintenance and allows for easy integration into larger software systems.
These key findings collectively demonstrate that xjb successfully addresses the research gap identified in the Introduction, providing a comprehensive solution to the limitations of existing floating-point–string conversion algorithms.

5.3. Practical Implications

The xjb algorithm has immediate practical applications in numerous domains:
  • Data Serialization: JSON and other text-based data formats require efficient floating-point–string conversion for serialization operations.
  • Scientific Computing: Applications that output numerical results in a human-readable format benefit from faster conversion without sacrificing accuracy.
  • Database Systems: Export operations and query result formatting can leverage xjb for improved throughput.
  • Web Services: RESTful APIs and web applications that return numerical data can achieve lower latency with efficient conversion.

5.4. Limitations and Future Work

While xjb demonstrates strong performance, several areas warrant further investigation:
  • Extended Precision Support: Future work could extend xjb to other floating-point formats (e.g., 16-bit, 80-bit, 128-bit, and 256-bit floating-point numbers) for applications that require a different precision.
  • SIMD Vectorization: Although xjb is designed to be SIMD-friendly, explicit vectorization using AVX-512 or NEON could yield additional performance gains for batch conversion workloads.
  • Compiler Compatibility: Further optimization for different compilers (particularly MSVC) would improve portability across development environments.
  • Memory-Constrained Environments: Investigating memory-efficient variants of xjb could benefit embedded systems and other resource-constrained platforms.

5.5. Availability

The complete implementation of the xjb algorithm, along with benchmark tools and test suites, is publicly available at https://github.com/xjb714/xjb/releases/tag/v1.5.0 (accessed on 20 April 2026). We encourage the community to integrate, test, and contribute to the ongoing development of this work.

Author Contributions

Methodology, J.X.; Software, J.X.; Validation, J.X.; Writing—original draft, J.X.; Funding acquisition, T.W.; Writing—review & editing, T.W.; Visualization, T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Sichuan Science and Technology Program (2024ZDZX0001).

Data Availability Statement

All the source code files in our GitHub repository at https://github.com/xjb714/xjb (accessed on 20 April 2026) are freely accessible.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Mathematical Foundations of Fractional Part Boundary

In this Appendix, we collect several elementary facts concerning rational approximation and Farey sequences. These results are used in the main text to bound fractional parts and to compute best rational approximations efficiently.

Appendix A.1. Notation and Assumptions

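Let $n$, $P$, $Q$, and $n_{\max}$ be positive integers satisfying the following conditions:
  • $P$ and $Q$ are coprime and $P < Q$;
  • $1 \le n \le n_{\max}$;
  • $Q > n_{\max}$.
For a given maximal denominator $n_{\max}$, we denote by
$$\frac{P_*}{Q_*} \quad \text{and} \quad \frac{P^*}{Q^*}$$
the best rational approximations of $P/Q$ from below and from above, respectively, i.e.,
$$\frac{P_*}{Q_*} = \max_{\substack{1 \le n \le n_{\max} \\ Q_* \le n_{\max}}} \frac{\lfloor nP/Q \rfloor}{n}, \qquad \frac{P^*}{Q^*} = \min_{\substack{1 \le n \le n_{\max} \\ Q^* \le n_{\max}}} \frac{\lceil nP/Q \rceil}{n}.$$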

Appendix A.2. Basic Identities

If $nP$ is not a multiple of $Q$, then clearly
$$\left\lfloor n \cdot \frac{P}{Q} \right\rfloor + 1 = \left\lceil n \cdot \frac{P}{Q} \right\rceil.$$
When n P is a multiple of Q, the left-hand side is one larger than the right-hand side; the non-divisibility assumption avoids this degenerate case.

Appendix A.3. A Useful Equivalence

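Assume that, for a real number $\xi$, the equality
$$\left\lfloor n \cdot \frac{P}{Q} \right\rfloor = \lfloor n \xi \rfloor$$
holds for all $1 \le n \le n_{\max}$. Then, $\xi$ must lie between the best lower and upper rational approximations of $P/Q$ with the denominator bounded by $n_{\max}$. More precisely,
$$\frac{P_*}{Q_*} = \max_{1 \le n \le n_{\max}} \frac{\lfloor nP/Q \rfloor}{n} \le \xi < \min_{1 \le n \le n_{\max}} \frac{\lfloor nP/Q \rfloor + 1}{n} = \min_{1 \le n \le n_{\max}} \frac{\lceil nP/Q \rceil}{n} = \frac{P^*}{Q^*}.$$
Consequently, the admissible range for $\xi$ is the half-open interval
$$\frac{P_*}{Q_*} \le \xi < \frac{P^*}{Q^*}.$$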
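The equivalence above can be checked numerically on a small case. The following is a minimal brute-force sketch, not the authors' code: the function names `best_approximations` and `floors_agree` and the example values $P/Q = 3/7$, $n_{\max} = 5$ are chosen here purely for illustration.

```python
from fractions import Fraction
from math import floor


def best_approximations(P, Q, n_max):
    """Brute-force best lower and upper approximations of P/Q
    over all denominators 1..n_max (hypothetical helper)."""
    ns = range(1, n_max + 1)
    lower = max(Fraction((n * P) // Q, n) for n in ns)       # floor(nP/Q) / n
    upper = min(Fraction(-((-n * P) // Q), n) for n in ns)   # ceil(nP/Q) / n
    return lower, upper


def floors_agree(P, Q, n_max, xi):
    """True if floor(n*P/Q) == floor(n*xi) for every 1 <= n <= n_max."""
    return all((n * P) // Q == floor(n * xi) for n in range(1, n_max + 1))


P, Q, n_max = 3, 7, 5
lower, upper = best_approximations(P, Q, n_max)
print(lower, upper)                                # 2/5 1/2
print(floors_agree(P, Q, n_max, lower))            # True: left endpoint is admissible
print(floors_agree(P, Q, n_max, upper))            # False: right endpoint is excluded
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the endpoint tests are not disturbed by floating-point rounding.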

Appendix A.4. Range of the Fractional Parts

The fractional part of $n \cdot P/Q$ is defined as $\{nP/Q\} = nP/Q - \lfloor nP/Q \rfloor$. Within the set $\{1, 2, \ldots, n_{\max}\}$, this quantity attains its minimum at $n = Q_*$ and its maximum at $n = Q^*$. Hence, the fractional parts lie exactly in the interval
$$\left[ \frac{(Q_* P) \bmod Q}{Q},\; \frac{(Q^* P) \bmod Q}{Q} \right],$$
where $(x \bmod Q)$ denotes the remainder of $x$ upon division by $Q$, i.e., $0 \le (x \bmod Q) < Q$.
  • Proof of the Fractional Part Range:
We prove that for all $1 \le n \le n_{\max}$,
$$\{Q_* P/Q\} \le \{nP/Q\} \le \{Q^* P/Q\},$$
with equality at $n = Q_*$ and $n = Q^*$, respectively.
(1) Notation
Let $a = P_*$, $b = Q_*$, $c = P^*$, $d = Q^*$. By definition, $a/b$ and $c/d$ are adjacent terms in the Farey sequence $F_{n_{\max}}$ and satisfy $a/b < P/Q < c/d$ and $bc - ad = 1$. For any $n$ with $1 \le n \le n_{\max}$, set $k = \lfloor nP/Q \rfloor$. The fraction $k/n$ cannot lie strictly between $a/b$ and $c/d$, because $n \le n_{\max}$ and the two are neighbors in $F_{n_{\max}}$. Consequently,
$$\frac{k}{n} \le \frac{a}{b} \quad \text{and} \quad \frac{k+1}{n} \ge \frac{c}{d}.$$
(2) Integer Linear Representation
We now establish a convenient parameterization of the integers $n$ and $k = \lfloor nP/Q \rfloor$ using the coefficients of the adjacent Farey fractions $a/b$ and $c/d$. Because $a/b$ and $c/d$ are neighbors in the Farey sequence $F_{n_{\max}}$, they satisfy the well-known unimodular relation
$$bc - ad = 1.$$
This means that the $2 \times 2$ matrix
$$M = \begin{pmatrix} b & d \\ a & c \end{pmatrix}$$
has the determinant $\det M = bc - ad = 1$. Any integer matrix with a determinant of 1 is invertible over the integers; its inverse is given explicitly by
$$M^{-1} = \begin{pmatrix} c & -d \\ -a & b \end{pmatrix}.$$
For a given $n$ and the corresponding $k = \lfloor nP/Q \rfloor$, consider the column vector $\begin{pmatrix} n \\ k \end{pmatrix}$. Since $M$ is unimodular, there exists a unique pair of integers $(x, y)$ such that
$$M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} n \\ k \end{pmatrix} \iff \begin{pmatrix} b & d \\ a & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} n \\ k \end{pmatrix}.$$
Multiplying both sides on the left by $M^{-1}$ yields the explicit formulae for $x$ and $y$:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c & -d \\ -a & b \end{pmatrix} \begin{pmatrix} n \\ k \end{pmatrix} = \begin{pmatrix} cn - dk \\ -an + bk \end{pmatrix}.$$
Thus, we obtain the two relations
$$x = cn - dk, \qquad y = bk - an.$$
Now, recall that $k/n \le a/b$. Because $n > 0$ and $b > 0$, we may cross-multiply without changing the inequality to obtain $bk \le an$. This immediately implies that
$$y = bk - an \le 0.$$
Let us define $y' = -y \ge 0$. Substituting $y = -y'$ back into the original system gives
$$n = bx + d(-y') = bx - dy', \qquad k = ax + c(-y') = ax - cy'.$$
We therefore arrive at the non-negative parameterization
$$n = bx - dy', \qquad k = ax - cy',$$
with integers $x \ge 1$ and $y' \ge 0$. If $x \le 0$, then $n = bx - dy' \le 0$, contradicting $n \ge 1$; hence $x$ must be strictly positive.
This representation is the key to expressing the fractional part $\{nP/Q\}$ as a simple linear combination of two fundamental positive quantities, which will be exploited in the next step.
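As a concrete illustration of this parameterization, the sketch below tabulates the coefficients for $P/Q = 3/7$ and $n_{\max} = 5$, whose Farey neighbors are $a/b = 2/5$ and $c/d = 1/2$. The helper name `farey_parameters` and the example values are assumptions made here for illustration; `y_neg` stands for the negated second coefficient, which the text requires to be non-negative.

```python
def farey_parameters(n, P, Q, a, b, c, d):
    """Return (k, x, y_neg) with n = b*x - d*y_neg and k = a*x - c*y_neg,
    where k = floor(n*P/Q) and a/b < P/Q < c/d are Farey neighbors."""
    k = (n * P) // Q
    x = c * n - d * k          # first row of the inverse matrix applied to (n, k)
    y_neg = a * n - b * k      # negated second row, so it is non-negative
    return k, x, y_neg


P, Q, n_max = 3, 7, 5
a, b, c, d = 2, 5, 1, 2        # 2/5 < 3/7 < 1/2, neighbors in the Farey sequence of order 5
assert b * c - a * d == 1      # unimodular relation

rows = []
for n in range(1, n_max + 1):
    k, x, y_neg = farey_parameters(n, P, Q, a, b, c, d)
    assert x >= 1 and y_neg >= 0
    assert n == b * x - d * y_neg and k == a * x - c * y_neg
    rows.append((n, k, x, y_neg))

print(rows)
# [(1, 0, 1, 2), (2, 0, 2, 4), (3, 1, 1, 1), (4, 1, 2, 3), (5, 2, 1, 0)]
```

The last row, with first coefficient 1 and second coefficient 0, corresponds to $n = b$ and $k = a$, which is the case singled out in the minimum argument.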
(3) Minimum of the Fractional Parts
Let $\varepsilon = P/Q - a/b > 0$. Then, $\{bP/Q\} = b\varepsilon$. Using the representation above,
$$\{nP/Q\} = n\frac{P}{Q} - k = \frac{P}{Q}(bx - dy') - (ax - cy') = x\left(b\frac{P}{Q} - a\right) - y'\left(d\frac{P}{Q} - c\right).$$
Since $dP/Q - c = -(c - dP/Q) = -d\delta$, where $\delta = c/d - P/Q > 0$, we have
$$\{nP/Q\} = x \cdot b\varepsilon + y' \cdot d\delta.$$
Both $b\varepsilon$ and $d\delta$ are positive, and $x \ge 1$, $y' \ge 0$. Therefore,
$$\{nP/Q\} \ge 1 \cdot b\varepsilon + 0 \cdot d\delta = b\varepsilon = \{bP/Q\},$$
with equality if and only if $x = 1$ and $y' = 0$, i.e., $n = b$ and $k = a$. Thus, the minimum is attained at $n = Q_*$.
(4) Maximum of the Fractional Parts
A completely symmetric argument using the upper approximation gives
$$1 - \{nP/Q\} = \frac{k+1}{n} \cdot n - n\frac{P}{Q} = n\left(\frac{k+1}{n} - \frac{P}{Q}\right).$$
We have $(k+1)/n \ge c/d$, and using the unimodular relations one can show that
$$1 - \{nP/Q\} = \tilde{x} \cdot d\delta + \tilde{y} \cdot b\varepsilon$$
for some non-negative integers $\tilde{x}, \tilde{y}$ with $\tilde{x} \ge 1$. Hence,
$$1 - \{nP/Q\} \ge d\delta = 1 - \{dP/Q\},$$
which is equivalent to $\{nP/Q\} \le \{dP/Q\}$. The maximum is therefore achieved at $n = d = Q^*$. For a detailed proof of the upper bound, one may also consult the theory of continued fractions: the largest fractional part among $n\theta \pmod 1$ for $n \le N$ occurs at the denominator of the best upper approximation.
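The step summarized as "one can show" can be filled in by repeating the construction of step (2) for the vector $(n, k+1)$ instead of $(n, k)$. The following is a sketch of that derivation; the auxiliary integers $s$ and $t$ are introduced here and do not appear in the original text.

```latex
\begin{aligned}
s &= c\,n - d\,(k+1) \le 0
  &&\text{since } \tfrac{k+1}{n} \ge \tfrac{c}{d},\\
t &= b\,(k+1) - a\,n \ge 1
  &&\text{since } \tfrac{k+1}{n} > \tfrac{P}{Q} > \tfrac{a}{b},\\
n &= b\,s + d\,t, \qquad k + 1 = a\,s + c\,t
  &&\text{using } bc - ad = 1,\\
1 - \{nP/Q\} &= (k+1) - n\,\tfrac{P}{Q}
   = s\left(a - b\,\tfrac{P}{Q}\right) + t\left(c - d\,\tfrac{P}{Q}\right)
   = t \cdot d\delta + (-s) \cdot b\varepsilon .
\end{aligned}
```

The coefficient of $d\delta$ is $t \ge 1$ and the coefficient of $b\varepsilon$ is $-s \ge 0$, so $1 - \{nP/Q\} \ge d\delta$, with equality exactly when $t = 1$ and $s = 0$, i.e., $n = d$.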
(5) Conclusion
The fractional parts all lie in the interval $[\{Q_* P/Q\}, \{Q^* P/Q\}]$, and the endpoints are exactly $(Q_* P \bmod Q)/Q$ and $(Q^* P \bmod Q)/Q$, as claimed.
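The result can also be confirmed by exhaustive search on small inputs. The sketch below is a brute-force check written for this purpose, with hypothetical function names; it compares the positions of the smallest and largest fractional parts with the denominators of the best lower and upper approximations.

```python
from fractions import Fraction
from math import gcd


def best_denominators(P, Q, n_max):
    """Denominators of the best lower and upper approximations of P/Q
    among denominators 1..n_max, found by brute force."""
    ns = range(1, n_max + 1)
    lower = max(Fraction((n * P) // Q, n) for n in ns)
    # n*P is never a multiple of Q here (gcd(P, Q) = 1, n < Q), so ceil = floor + 1.
    upper = min(Fraction((n * P) // Q + 1, n) for n in ns)
    return lower.denominator, upper.denominator


def extreme_fractional_parts(P, Q, n_max):
    """Return (argmin, min, argmax, max) of {n*P/Q} over 1 <= n <= n_max."""
    frac = {n: Fraction((n * P) % Q, Q) for n in range(1, n_max + 1)}
    n_lo = min(frac, key=frac.get)
    n_hi = max(frac, key=frac.get)
    return n_lo, frac[n_lo], n_hi, frac[n_hi]


def claim_holds(P, Q, n_max):
    """True if the extremes occur at the best-approximation denominators."""
    n_lo, _, n_hi, _ = extreme_fractional_parts(P, Q, n_max)
    return (n_lo, n_hi) == best_denominators(P, Q, n_max)


print(extreme_fractional_parts(3, 7, 5))
# (5, Fraction(1, 7), 2, Fraction(6, 7))

checked = all(
    claim_holds(P, Q, n_max)
    for Q in range(2, 40)
    for P in range(1, Q) if gcd(P, Q) == 1
    for n_max in range(1, Q)
)
print(checked)
```

For $P/Q = 3/7$ and $n_{\max} = 5$, the minimum $1/7$ occurs at $n = 5$ and the maximum $6/7$ at $n = 2$, matching the neighbors $2/5$ and $1/2$.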

Appendix A.5. Computation via Farey Sequences

The best rational approximations with bounded denominators can be obtained efficiently from the Farey sequence of order $C$. We define the function
$$\left( \frac{P_*}{Q_*},\; \frac{P^*}{Q^*} \right) = (DN, UP) = f(C, P, Q),$$
which returns the two adjacent terms in the $C$-th Farey sequence $F_C$ that bracket $P/Q$. The implementation of $f$ relies on the mediant property of Farey sequences and is provided in the accompanying code (see test1.py, https://github.com/xjb714/xjb/blob/main/py_test/test1.py (accessed on 20 April 2026)).
Using this function, the bounds in Appendix A.4 can be computed in negligible time, thereby providing a fast method to determine the range of fractional parts discussed in the main text.
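One way to realize such a function is a mediant walk that starts from $0/1$ and $1/1$. The sketch below is an independent illustration and is not taken from test1.py; the name `farey_neighbors` is hypothetical, and the code assumes $0 < P < Q$, $\gcd(P, Q) = 1$, and $Q > C$.

```python
def farey_neighbors(C, P, Q):
    """Return ((a, b), (c, d)) with a/b < P/Q < c/d adjacent in the
    Farey sequence of order C. Assumes 0 < P < Q, gcd(P, Q) = 1, Q > C."""
    a, b = 0, 1   # current lower bound a/b
    c, d = 1, 1   # current upper bound c/d
    while b + d <= C:
        m_num, m_den = a + c, b + d          # mediant of the two bounds
        if m_num * Q < P * m_den:            # mediant lies below P/Q
            a, b = m_num, m_den
        else:                                # mediant lies above P/Q;
            c, d = m_num, m_den              # equality is impossible since m_den <= C < Q
    return (a, b), (c, d)


print(farey_neighbors(5, 3, 7))   # ((2, 5), (1, 2))
print(farey_neighbors(4, 2, 5))   # ((1, 3), (1, 2))
```

This version advances one mediant at a time, so it can take on the order of $C$ iterations when $P/Q$ is very close to one of the bounds; taking several mediant steps at once, as in the continued-fraction expansion, would reduce that cost.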

References

  1. Steele, G.L., Jr.; White, J.L. How to Print Floating-Point Numbers Accurately. In Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation, PLDI 1990; ACM: New York, NY, USA, 1990; pp. 112–126. [Google Scholar] [CrossRef]
  2. Steele, G.L., Jr.; White, J.L. How to Print Floating-Point Numbers Accurately (Retrospective). ACM SIGPLAN Notices 39(4), April 2004 (Best of PLDI, 1979–1999). Available online: https://dl.acm.org/doi/10.1145/989393.989431 (accessed on 1 April 2004).
  3. Burger, R.G.; Dybvig, R.K. Printing Floating-Point Numbers Quickly and Accurately. In Proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation (PLDI ’96); ACM: New York, NY, USA, 1996; pp. 108–116. [Google Scholar] [CrossRef]
  4. Loitsch, F. Printing Floating-Point Numbers Quickly and Accurately with Integers. In Proceedings of the ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation, PLDI 2010; ACM: New York, NY, USA, 2010; pp. 233–243. [Google Scholar] [CrossRef]
  5. Andrysco, M.; Jhala, R.; Lerner, S. Printing Floating-Point Numbers: A Faster, Always Correct Method. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016; ACM: New York, NY, USA, 2016; pp. 555–567. [Google Scholar] [CrossRef]
  6. Adams, U. Ryū: Fast Float-to-String Conversion. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’18); ACM: New York, NY, USA, 2018; pp. 270–282. [Google Scholar] [CrossRef]
  7. Adams, U. Ryū Revisited: Printf Floating Point Conversion. Proc. ACM Program. Lang. 2019, 3, 169. [Google Scholar] [CrossRef] [PubMed]
  8. Giulietti, R. The Schubfach Way to Render Doubles. 2020. Available online: https://drive.google.com/file/d/1KLtG_LaIbK9ETXI290zqCxvBW94dj058/view (accessed on 1 September 2020).
  9. Jeon, J. Grisu-Exact: A Fast and Exact Floating-Point Printing Algorithm. 2020. Available online: https://github.com/jk-jeon/Grisu-Exact/blob/master/other_files/Grisu-Exact.pdf (accessed on 1 September 2020).
  10. Jeon, J. Dragonbox: A New Floating-Point Binary-to-Decimal Conversion Algorithm. 2024. Available online: https://github.com/jk-jeon/Dragonbox (accessed on 1 July 2024).
  11. Guo, Y.Y. Available online: https://github.com/ibireme/c_numconv_benchmark/blob/master/vendor/yy_double/yy_double.c (accessed on 1 January 2025).
  12. Guo, Y.Y. Available online: https://github.com/ibireme/yyjson (accessed on 1 August 2025).
  13. Cox, R. Available online: https://github.com/rsc/fpfmt (accessed on 1 January 2026).
  14. Cox, R. Floating-Point Printing and Parsing Can Be Simple And Fast. Available online: https://research.swtch.com/fp (accessed on 1 January 2026).
  15. Cox, R. Fast Unrounded Scaling: Proof by Ivy. Available online: https://research.swtch.com/fp-proof (accessed on 1 January 2026).
  16. Zverovich, V. Available online: https://github.com/vitaut/zmij (accessed on 1 March 2026).
  17. ANSI/IEEE Std 754-1985; IEEE Standard for Binary Floating-Point Arithmetic. IEEE: New York, NY, USA, 1985; pp. 1–20. [CrossRef]
  18. IEEE Std 754-2019; (Revision of IEEE 754-2008) IEEE Standard for Floating-Point Arithmetic. IEEE: New York, NY, USA, 2019; pp. 1–84. [CrossRef]
  19. Khuong, P. How to Print Integers Really Fast (with Open Source AppNexus Code!). Available online: https://pvk.ca/Blog/2017/12/22/appnexus-common-framework-its-out-also-how-to-print-integers-faster/ (accessed on 1 December 2017).
  20. Johnson, D. Converting Integers to Fixed-Width Strings Faster with Neon SIMD on the Apple M1. Available online: https://dougallj.wordpress.com/2022/04/01/converting-integers-to-fixed-width-strings-faster-with-neon-simd-on-the-apple-m1/ (accessed on 1 April 2022).
  21. Muła, W. SSE: Conversion Integers to Decimal Representation. Available online: http://0x80.pl/notesen/2011-10-21-sse-itoa.html (accessed on 1 October 2011).
  22. Lemire, D. Converting Integers to Decimal Strings Faster with AVX-512. Available online: https://lemire.me/blog/2022/03/28/converting-integers-to-decimal-strings-faster-with-avx-512/ (accessed on 1 March 2022).
  23. Xiang, J. Available online: https://github.com/xjb714/xjb/tree/main/bench/schubfach_xjb (accessed on 1 April 2026).
  24. Zverovich, V. Available online: https://github.com/fmtlib/fmt (accessed on 1 October 2025).
  25. Neri, C. Available online: https://github.com/cassioneri/teju_jagua (accessed on 1 November 2025).
  26. Leng, J. Available online: https://github.com/lengjingzju/json/jnum.c (accessed on 1 November 2025).
Figure 1. Benchmark results for random and fixed-length double-precision numbers (excluding NaN and Inf).
Figure 2. Benchmark results for random float-precision numbers (excluding NaN and Inf).
Table 1. Explanation of special symbols in this article.
| Symbol | Brief Explanation | Example |
| --- | --- | --- |
| % | Integer modulus operation | 2 = 8 % 3 |
| // | Integer division operation | 1 = 5 // 3 |
| << or >> | Left or right shift of binary values | 8 = 1 << 3 |
| ? : | Similar to the ternary operator in C syntax | a = 1 ? a : b |
Table 2. Valid ranges for significand $c$ and exponent $q$.
| Category | Float (Binary32) | Double (Binary64) |
| --- | --- | --- |
| Subnormal | $1 \le c \le 2^{23} - 1$, $q = -149$ | $1 \le c \le 2^{52} - 1$, $q = -1074$ |
| Normal | $2^{23} + 1 \le c \le 2^{24} - 1$, $-148 \le q \le 104$ | $2^{52} + 1 \le c \le 2^{53} - 1$, $-1073 \le q \le 971$ |
| Irregular | $c = 2^{23}$, $-149 \le q \le 104$ | $c = 2^{52}$, $-1074 \le q \le 971$ |
Table 3. Mapping between identified limitations and xjb’s solutions.
| Identified Limitation | Corresponding Solution in xjb |
| --- | --- |
| Frequent branch mispredictions | Branchless programming for core decision logic (Section 3.6, Section 3.7 and Section 3.10) |
| High-precision multiplication overhead | Minimized multiplication count via lookup-table restructuring (Section 3.4, Section 3.5 and Section 3.6) |
| Long instruction dependency chains | Restructured computation flow to expose instruction-level parallelism (Section 3.10) |
| Limited SIMD utilization | SIMD-optimized ASCII generation for decimal-to-string stage (Section 3.10, Table 5) |
Table 4. Examples of floating-point number printing results.
| Float Number | Fixed-Point | Scientific |
| --- | --- | --- |
| 2.34 | “2.34” | “2.34” |
| 12 | “12.0” | “1.2” |
| 120 | “120.0” | “1.2 × $10^{2}$” |
| 0.012 | “0.012” | “1.2 × $10^{-2}$” |
Table 5. SIMD implementations of dec_to_ascii8 and dec_to_ascii16.
| SIMD Implementation | Description |
| --- | --- |
| NEON [20] | Original author: Dougall Johnson. Runs on ARM processors with NEON instruction set. |
| SSE2 [21] | Based on scalar version; requires only SSE2 instruction set. |
| SSE4.1 | Nearly identical to SSE2 implementation; requires SSE4.1 instruction set. |
| AVX512 [22] | Original author: Daniel Lemire. Requires AVX512IFMA and AVX512VBMI instruction sets. |
Table 6. Printing formats for different ranges.
| Type | Fixed-Point | Scientific |
| --- | --- | --- |
| Float | $[10^{-3}, 10^{7})$ | Other ranges |
| Double | $[10^{-4}, 10^{16})$ | Other ranges |
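The range rule of Table 6 can be sketched as a small selector. This is an illustration only, not the xjb implementation: the function name `choose_format` is hypothetical, the fixed-point ranges are assumed to be $[10^{-3}, 10^{7})$ for float and $[10^{-4}, 10^{16})$ for double, and the test is applied to the magnitude of a finite, nonzero input.

```python
def choose_format(x, is_double=True):
    """Return 'fixed' or 'scientific' for a finite, nonzero x.

    Assumed ranges (from Table 6): fixed-point is used for
    |x| in [1e-4, 1e16) for double and [1e-3, 1e7) for float.
    """
    lo, hi = (1e-4, 1e16) if is_double else (1e-3, 1e7)
    return "fixed" if lo <= abs(x) < hi else "scientific"


print(choose_format(2.34))                     # fixed
print(choose_format(1e16))                     # scientific
print(choose_format(0.012, is_double=False))   # fixed
print(choose_format(1e7, is_double=False))     # scientific
```

The comparison here is made on binary floating-point values, so inputs that lie within rounding distance of a range boundary may be classified differently from an implementation that decides on the decimal exponent of the shortest output.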
Table 7. All algorithms in the benchmark test.
| Algorithm | Float | Double | Description: Author and Source Code |
| --- | --- | --- | --- |
| Schubfach [8] | Schubfach32 | Schubfach64 | Raffaello Giulietti, https://github.com/abolz/Drachennest/tree/master/src (accessed on 4 December 2025). |
| Schubfach_xjb [23] | Schubfach32_xjb | Schubfach64_xjb | The computation flow in the Schubfach source code has been modified by us, without altering the original output results, https://github.com/xjb714/xjb/tree/main/bench/schubfach_xjb (accessed on 4 December 2025). |
| Ryū [6,7] | Ryū32 | Ryū64 | Ulf Adams, https://github.com/ulfjack/ryu (accessed on 4 December 2025). |
| Dragonbox [10] | Dragonbox32 | Dragonbox64 | Junekey Jeon, https://github.com/jk-jeon/Dragonbox (accessed on 4 December 2025). |
| fmt [24] | fmt32 | fmt64 | Victor Zverovich, https://github.com/fmtlib/fmt, version 12.1.0 (accessed on 4 December 2025). |
| yy_double [11] | - | yy_double | Guo YaoYuan, https://github.com/ibireme/c_numconv_benchmark/blob/master/vendor/yy_double/yy_double.c (accessed on 4 December 2025). |
| yy_json [12] | yy_json32 | yy_json64 | Guo YaoYuan, https://github.com/ibireme/yyjson, version 0.12.0 (accessed on 4 December 2025). |
| teju_jagua [25] | teju32 | teju64 | Cassio Neri, https://github.com/cassioneri/teju_jagua (accessed on 4 December 2025). |
| xjb | xjb32 | xjb64 | This paper, https://github.com/xjb714/xjb (accessed on 4 December 2025). |
| zmij [16] | zmij32 | zmij64 | Victor Zverovich, https://github.com/vitaut/zmij (accessed on 8 April 2026). |
| jnum [26] | jnum32 | jnum64 | Jing Leng, https://github.com/lengjingzju/json/jnum.c (accessed on 4 December 2025). |
| uscalec [13] | - | uscalec | Russ Cox, https://github.com/rsc/fpfmt, commit 6255750 (accessed on 19 January 2026). |
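Table 8. Float/double-to-decimal benchmark results (time in nanoseconds). Compilers: Icpx 2025.0.4 on the AMD R7-7840H; Apple Clang 21.0.0 on the Apple M1 and Apple M5.
| Algorithm | AMD R7-7840H, Float | AMD R7-7840H, Double | Apple M1, Float | Apple M1, Double | Apple M5, Float | Apple M5, Double |
| --- | --- | --- | --- | --- | --- | --- |
| Schubfach | 12.20 | 11.51 | 11.64 | 13.12 | 7.59 | 7.71 |
| Schubfach_xjb | 4.44 | 6.33 | 5.16 | 6.58 | 3.15 | 3.75 |
| Ryū | 14.02 | 13.08 | 15.75 | 14.16 | 10.23 | 9.50 |
| Dragonbox | 10.19 | 10.05 | 11.78 | 12.03 | 7.56 | 7.39 |
| yy_json | 4.67 | 5.72 | 3.97 | 4.46 | 2.40 | 2.72 |
| yy_double | - | 5.24 | - | 4.08 | - | 2.71 |
| teju_jagua | 14.99 | 14.37 | 20.25 | 18.66 | 13.49 | 12.71 |
| zmij | 4.76 | 4.78 | 4.11 | 3.83 | 2.82 | 2.14 |
| uscalec | - | 11.27 | - | 15.26 | - | 9.61 |
| xjb | 2.24 | 3.76 | 2.15 | 2.58 | 1.44 | 1.55 |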
Note that different algorithms may produce semantically equivalent but syntactically different decimal outputs. For example, some algorithms (including Dragonbox, uscalec, and Ryū) omit trailing zeros in their output representations. Although the outputs are therefore not identical across algorithms, the real value represented by every output satisfies the Steele–White (SW) principle.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xiang, J.; Wang, T. xjb: Fast Float to String Algorithm. Computers 2026, 15, 280. https://doi.org/10.3390/computers15050280
