1. Introduction
Floating-point arithmetic underpins modern digital computation, supporting platforms ranging from specialized Application-Specific Integrated Circuits (ASICs) and reconfigurable Field-Programmable Gate Arrays (FPGAs) to high-performance CPUs and GPUs, all of which now address performance and efficiency demands beyond the capabilities of traditional processors [
1,
2,
3]. As embedded systems, IoT devices, and edge-computing platforms increasingly require high-performance and high-dynamic-range signal processing, floating-point operations—such as division, inverse square root, square root, exponentiation, and logarithms—have become indispensable across scientific computing, signal processing, control, graphics, and machine learning [
4,
5,
6]. In these domains, integer arithmetic cannot provide the necessary dynamic-range performance, yet implementing floating-point operations efficiently remains challenging, particularly under constraints on area, power, and latency [
7,
8,
9,
10,
11].
To overcome these limitations, numerous approximation strategies have been explored. Among them, Taylor-series-based approximation has emerged as a compelling approach for implementing complex mathematical functions in hardware [
12,
13]. A Taylor series constructs a polynomial determined by the derivatives of a function at a chosen expansion point [
14], and truncating the series enables designers to balance accuracy and computational cost [
15,
16]. Its polynomial structure is especially attractive for hardware realization, as it relies only on multiplication and addition—operations well supported in CPUs, GPUs, and custom hardware [
17,
18,
19].
However, directly applying the Taylor series to floating-point functions introduces challenges. Convergence behavior varies widely across the function domain [
20,
21]; for example, the Taylor expansion of ln(x) converges rapidly near the expansion point but poorly elsewhere [
22,
23]. Techniques such as interval partitioning and interpolation [
24], exponent decomposition [
25], and LUT-optimized or CORDIC-based methods [
26,
27] have been proposed to improve convergence and reduce hardware cost.
This review summarizes recent advances in floating-point algorithms for division, inverse square root, square root, exponentiation, and logarithmic functions developed using our Taylor-series-based approach with mantissa-region division [
13,
28,
29,
30,
31]. We further clarify the common structural principles underlying these techniques and examine the limits of their applicability.
After this introduction, the discussion unfolds in several stages. We begin by examining and reviewing prior works in floating-point computing arithmetic in
Section 2. Building on this foundation,
Section 3 clarifies the role of the proposed Taylor series expansion method with mantissa-region division, while
Section 4 presents the key theoretical and technical foundations that support our approach, including the Taylor series expansion, floating-point number representation, and the mantissa-division technique used in the expansion.
Section 5,
Section 6,
Section 7,
Section 8 and
Section 9 present floating-point computation algorithms based on Taylor series expansion with mantissa division, covering division, inverse square root, square root, exponentiation, and logarithm calculations, respectively; the parallel structure of these sections reflects that the same underlying method is shared by all of these arithmetic algorithms.
Section 10 examines the characteristics of the Taylor series expansion with the mantissa region division method and compares it with other floating-point computation techniques. Finally,
Section 11 concludes the paper.
We emphasize that although many studies have investigated polynomial approximation combined with region division for efficient floating-point computation, to the best of our knowledge, research on Taylor series expansion with mantissa-based region division has been reported only by our group. The purpose of this review is two-fold: (i) to clarify and consolidate the defining characteristics of this method, and (ii) to review various floating-point computation methods in order to situate our method within the broader context. While broad in scope, this paper consolidates and systematizes the authors’ method within its wider context, rather than presenting a fully systematic review of the entire state of the art in floating-point arithmetic algorithms.
The papers referenced for purpose (i) are our previous publications, together with additional considerations. Those for purpose (ii) were collected from major scholarly databases, including IEEE Xplore, the ACM Digital Library, and ScienceDirect, while the reference books were selected from reputable publishers such as MIT Press, A K Peters/CRC Press, and Springer Nature. Our survey primarily focuses on recent advances from 2019 to 2025, encompassing both classical approaches and emerging trends. Each selected work satisfies at least one of the following criteria:
- Proposes novel algorithms for floating-point function evaluation.
- Improves accuracy, performance, or hardware efficiency.
- Introduces new numerical formats or rounding techniques.
This structured methodology enables us to present a balanced and comprehensive overview of the current state of the art. Here, the floating-point computation methods are compared in terms of computational complexity, hardware cost, speed, and accuracy.
2. Various Floating-Point Calculation Methods
A wide range of techniques has been proposed to accelerate floating-point function evaluation, including polynomial approximations, lookup-table (LUT)-based schemes, and iterative algorithms [
21,
32,
33,
34,
35]. Traditional floating-point units rely on well-established approaches such as digit-recurrence algorithms—for example, Sweeney–Robertson–Tocher (SRT) division and square root [
36]—and iterative refinement methods such as Newton–Raphson and Goldschmidt iterations for division, square root, and reciprocal square root [
37,
38,
39]. Although these methods provide high numerical precision, they typically require multiple computational cycles and complex control logic, which limits their suitability for high-speed or resource-constrained systems, particularly in modern energy-efficient accelerators [
40,
41].
Digit-recurrence (SRT) and CORDIC-based algorithms have the advantage of eliminating multipliers in hardware implementations, significantly reducing area and power consumption. This makes them attractive for embedded systems, though their control circuitry is often intricate and their multi-cycle latency restricts applicability in high-performance scientific computing [
42]. In contrast, polynomial-approximation methods—such as Taylor series expansions and Chebyshev (min–max) polynomials—can exploit the abundant floating-point adders, multipliers, and fused multiply–add (FMA) units available in modern CPUs and GPUs, enabling highly parallel and pipelined execution.
The Newton–Raphson method remains appealing for embedded applications despite requiring a multiplier, as it offers rapid convergence when supplied with a sufficiently accurate initial estimate. Recent research has therefore focused on generating high-quality initial values to reduce iteration count and latency [
43].
Polynomial-based approaches provide an alternative to iterative or digit-wise computation by replacing them with direct polynomial evaluation [
12,
32]. Using Horner’s rule [
44] or related schemes, polynomial evaluation can be mapped efficiently onto simple combinational circuits or low-latency pipelines [
14,
18]. Recent studies show that polynomial approximations can substantially reduce latency and hardware complexity in floating-point units and application-specific accelerators [
45,
46], making Taylor-series-based methods particularly attractive for embedded processors and hardware accelerators.
In recent years, polynomial-approximation-based floating-point algorithms have gained significant attention due to their favorable balance among accuracy, latency, and hardware efficiency in both general-purpose and application-specific processors [
42,
47,
48,
49]. These works emphasize optimizing trade-offs among arithmetic operations (additions, subtractions, multiplications) and LUT size, which is crucial for minimizing area and power consumption while maintaining adequate numerical accuracy in modern floating-point units [
50,
51]. Such algorithms aim to deliver compact, low-power solutions suitable for embedded and portable systems with limited hardware resources, while also scaling effectively to high-performance CPUs and GPUs that rely on large numbers of floating-point operators and high memory bandwidth [
51].
Chebyshev polynomial (min–max) approximations can achieve a given accuracy with fewer mantissa segments and lower polynomial order than Taylor expansions [
52]. However, their coefficient design is more involved; for example, any change in the number of interval partitions requires recomputing the coefficients.
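To make the coefficient-design point concrete, the following stdlib-Python sketch interpolates 1/x on [1, 2] at Chebyshev nodes and evaluates the series with the Clenshaw recurrence. The helper names `cheb_coeffs` and `cheb_eval` are ours, not from any cited work; note that changing the interval (i.e., the partitioning) forces the coefficient computation to be redone, exactly as stated above.

```python
import math

def cheb_coeffs(f, deg, lo, hi):
    """Chebyshev series coefficients interpolating f at deg+1 Chebyshev nodes on [lo, hi]."""
    n = deg + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    xs = [0.5 * (hi + lo) + 0.5 * (hi - lo) * math.cos(t) for t in thetas]
    fs = [f(x) for x in xs]
    coeffs = [2.0 / n * sum(fs[j] * math.cos(k * thetas[j]) for j in range(n))
              for k in range(n)]
    coeffs[0] *= 0.5          # the constant term carries a factor 1/2
    return coeffs

def cheb_eval(coeffs, x, lo, hi):
    """Evaluate sum of coeffs[k] * T_k(t) via the Clenshaw recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)   # map [lo, hi] -> [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]
```

A degree-6 fit on the full mantissa range already stays well below 10^-4 error, fewer terms than a comparable Taylor expansion about x = 1.5 would need for the same interval.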
Table 1 summarizes the characteristics and trade-offs of the major floating-point computation methods discussed above.
3. Taylor Series Expansion Method in Floating-Point Arithmetic
3.1. Motivation for Taylor-Series-Based Floating-Point Arithmetic
Recent studies have advanced the use of Taylor series expansion in floating-point arithmetic to achieve high precision while reducing hardware resource requirements. When applying Taylor-series-based methods to high-precision floating-point computation, designers must carefully balance several competing factors: computational accuracy (minimizing the number of polynomial terms), hardware efficiency (reducing LUT size), and implementation complexity. This review analyzes these design trade-offs and introduces a divide-and-conquer implementation strategy that enhances the efficiency of Taylor-series-based floating-point approximation.
A direct application of the Taylor series is limited by its convergence behavior, which depends strongly on the input domain [
22]. For instance, a Taylor expansion centered at
x = 1.0 yields an excellent approximation of ln(x) only when
x remains close to the expansion point, and its accuracy deteriorates rapidly as
x deviates from that region [
23]. To address this issue, domain segmentation and range-reduction techniques have been explored to maintain accuracy across the full input range while keeping computational cost low [
53]. Hence, we present a quantitative analysis of these techniques across several floating-point computations, including the required LUT size, and assess their applicability and limitations.
3.2. Mantissa Region Division Technique
To extend the applicability of the Taylor series to floating-point arithmetic, mantissa region division methods are described in this review. This technique divides the mantissa domain into smaller subregions where the approximation error remains within acceptable bounds using low-order polynomials.
3.3. Balancing LUT Size and Arithmetic Complexity
A key advantage of Taylor-series-based methods is their potential to reduce reliance on large LUTs. Traditional LUT-based methods for transcendental functions often consume significant memory, particularly when high precision is required across a broad domain. By using domain segmentation and low-order polynomials, Taylor-series-based designs can achieve similar precision with smaller LUTs that only store coefficients or domain-specific parameters. Also note that the control circuitry for LUT addressing is highly straightforward and requires only minimal logic.
At the same time, designers must carefully balance LUT size with the number of arithmetic operations required. A lower-order Taylor polynomial reduces multiplications but may require more domain partitions (and thus more LUT entries). Conversely, a higher-order polynomial reduces the need for domain segmentation but increases the number of arithmetic operations. Exploring these trade-offs is essential for achieving optimal designs, especially in power- and area-constrained environments.
Among these strategies, Taylor series expansion has emerged as a promising approach for approximating complex mathematical functions in floating-point computation [
12,
13]. At the heart of local function approximation lies the Taylor series, which constructs an infinite polynomial whose coefficients are determined by the function’s derivatives at a chosen expansion point [
14]. By truncating the series after a limited number of terms, designers can approximate functions with a controlled balance between accuracy and computational effort [
15,
16]. The elegance of the Taylor series lies in its polynomial form, enabling efficient realization using elementary basic calculations such as multiplication and addition, which are well-supported in hardware design such as CPUs and GPUs [
17,
18,
19].
Despite the apparent simplicity of Taylor series-based approximations, applying them directly to floating-point functions presents several challenges. The convergence of a Taylor series can vary significantly across the domain of the target function [
20,
21]. For instance, a Taylor series approximation of the natural logarithm, ln(x), converges quickly near the expansion point but poorly at values further away [
22,
23]. To address this, methods such as interpolation based on Taylor series approximation can be used to divide the domain into multiple intervals [
24]. Additionally, exponent decomposition can be applied, separating the integer exponent contribution from the mantissa in floating-point arithmetic [
25]. These techniques transform or partition the input domain into smaller subregions where the Taylor expansion achieves faster convergence with fewer terms. For LUT optimization, some researchers employ deep neural networks and CORDIC-based integer arithmetic methods, which reduce the number of required arithmetic operations without multiplication and minimize the size of auxiliary lookup LUTs needed for storing coefficients or intermediate values [
26,
27].
3.4. Comparative Analysis and Hardware Trade-Offs
The reviewed algorithms demonstrate that Taylor series expansion, when combined with smart domain partitioning and LUT optimization, can achieve high precision with reduced hardware complexity. The primary trade-offs include the following:
- LUT size vs. arithmetic operations: Larger LUTs reduce polynomial order but increase memory usage.
- Number of segments vs. approximation error: Finer segmentation improves accuracy but requires more comparators and control logic.
- Multiplier complexity vs. latency: Higher-order polynomials need more multipliers but may reduce iteration counts.
These insights are crucial for hardware designers aiming to implement energy-efficient floating-point units in ASICs or FPGAs.
We also discuss how this method is consolidated across the various types of floating-point arithmetic.
4. Preparation for Taylor Series Expansion with Mantissa Region Division
4.1. Taylor Series Expansion
Consider a function f(x) that is infinitely differentiable (smooth) on its domain. Its Taylor series expansion about the point x = a is given by the following:

f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + … = Σ_{n=0}^{∞} (f⁽ⁿ⁾(a)/n!)(x − a)ⁿ
A classic illustration is the Taylor series expansions of elementary smooth functions centered at a fixed expansion point, which converge to their respective functions for all x in the stated domain as the number of retained terms n increases (
Figure 1).
In this paper, we review digital floating-point arithmetic algorithms used to compute division, inverse square roots, square roots, exponential functions, and logarithms, based on Taylor series expansion.
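As a quick numerical illustration of truncation, the sketch below approximates the exponential function by its first n Taylor terms about 0 (using exp as a representative smooth function is our choice for this sketch; the paper's Figure 1 examples are not reproduced here). The error shrinks rapidly as n grows.

```python
import math

def taylor_exp(x, n):
    """Approximate e**x by the first n terms of its Taylor series about 0."""
    total, term = 0.0, 1.0          # term holds x**k / k!
    for k in range(n):
        total += term
        term *= x / (k + 1)         # next term: x**(k+1) / (k+1)!
    return total
```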
4.2. Representation of Floating-Point Numbers
Consider the binary representation of floating-point numbers:
Mantissa: M
Exponent: E
Binary representation: X = M × 2^E
Mantissa: M = 1.m1m2m3… (here each mi is 0 or 1)
Note that the binary point is placed such that 1 ≤
M < 2. Following the IEEE-754 specification (
Figure 2), the encoding of a floating-point number involves the assignment of three distinct binary segments: a single sign bit (S), a biased exponent (E), and a normalized mantissa (M).
The IEEE 754 standard provides multiple precision formats—including 16-bit (half), 32-bit (single), and 64-bit (double) [
1]. To simplify the forthcoming derivation, we limit it to floating-point numbers with a positive sign bit. In this context, a given number X is primarily characterized by two fields: the exponent E and the mantissa M. Their fundamental relationship is captured by the following equation:

X = M × 2^E

In this context, the floating-point mantissa is expressed as M = 1.m1m2m3…, where m1, m2, etc., are binary digits (each either 0 or 1).
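For readers who want to experiment, this field decomposition can be reproduced in a few lines of Python (the helper name `decode_float32` is ours; it handles normalized single-precision values only, not subnormals or special values):

```python
import struct

def decode_float32(value):
    """Split a (normalized) float into IEEE-754 single-precision fields:
    sign, unbiased exponent E, and mantissa M with 1 <= M < 2,
    so that value == (-1)**sign * M * 2**E."""
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31
    E = ((bits >> 23) & 0xFF) - 127       # remove the exponent bias of 127
    M = 1.0 + (bits & 0x7FFFFF) / 2**23   # implicit leading 1: M = 1.m1m2m3...
    return sign, E, M
```

For example, 6.5 = 1.625 × 2², so it decodes to sign 0, E = 2, M = 1.625.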
4.3. Mantissa Division Method for Taylor Series Expansion
Building on the floating-point representation outlined earlier, this subsection details our uniform mantissa segmentation technique for efficient Taylor series expansion. We demonstrate that increasing segmentation reduces the required number of series terms for a target accuracy.
The efficacy of the proposed segmentation technique was evaluated through numerical simulations designed to ascertain the necessary number of Taylor series expansion terms n for a target function f(x) across a spectrum of precision requirements and evaluation domains, while adhering to the specified condition:

|f(x) − Tn(x)| < ε

Here, f(x) represents the ideal value of functions such as the reciprocal, inverse square root, square root, exponentiation, or logarithm; Tn(x) denotes the Taylor expansion approximation using n terms; and ε indicates the target accuracy for all x within the specified region.
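A minimal sketch of such a simulation, specialized to f(x) = 1/x (whose Taylor coefficients are known in closed form), is shown below. The function names and the sampling grid are our assumptions, not the paper's exact procedure, but the trend matches: halving the region lowers the term count needed for a given accuracy.

```python
def recip_taylor(x, a, n):
    """n-term Taylor expansion of 1/x about a: sum of (-1)**k (x-a)**k / a**(k+1)."""
    t = (x - a) / a
    total, term = 0.0, 1.0 / a
    for _ in range(n):
        total += term
        term *= -t          # next term picks up a factor of -(x - a)/a
    return total

def min_terms(a, lo, hi, eps, samples=1000, max_n=64):
    """Smallest n with |1/x - recip_taylor(x, a, n)| < eps on a grid over [lo, hi)."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples)]
    for n in range(1, max_n + 1):
        if all(abs(1.0 / x - recip_taylor(x, a, n)) < eps for x in xs):
            return n
    return None

# No division vs. two uniform sub-intervals, at ~20-bit accuracy
n_full = min_terms(1.5, 1.0, 2.0, 2**-20)
n_half = max(min_terms(1.25, 1.0, 1.5, 2**-20),
             min_terms(1.75, 1.5, 2.0, 2**-20))
```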
4.3.1. No Mantissa Region Division for Taylor Series Expansion
There is no division in the region 1 ≤ x < 2. The central value
a is defined as 1.5, as shown in
Table 2.
4.3.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
The mantissa domain 1 ≤ x < 2 is uniformly segmented into two sub-intervals. Based on the value of the leading mantissa fraction bit m1, a specific sub-interval is selected (as shown in
Table 3), and the function is approximated via a Taylor series expansion about its midpoint a.
4.3.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
The mantissa domain 1 ≤ x < 2 is uniformly segmented into four sub-intervals. Based on the value of the leading mantissa fraction bits m1m2, a specific sub-interval is selected (as shown in
Table 4), and the function is approximated via a Taylor series expansion about its midpoint a.
4.3.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2
The mantissa domain 1 ≤ x < 2 is uniformly segmented into eight sub-intervals. Based on the value of the leading mantissa fraction bits m1m2m3, a specific sub-interval is selected (as shown in
Table 5), and the function is approximated via a Taylor series expansion about its midpoint a.
In the subsequent sections, we apply the above segmentation method to division, inverse square root, square root, exponential, and logarithmic algorithms using Taylor series expansion, in order to determine the minimum number of expansion terms required to achieve a given precision. Moreover, we emphasize the trade-offs involved in the number of multiplications and additions/subtractions, as well as the size of the LUTs required in hardware. These factors are analyzed to identify the most suitable algorithms for engineering applications and to support their practical implementation.
5. Division Algorithm Using Taylor Series Expansion
5.1. Introduction
This section introduces floating-point division algorithms using the Taylor series expansion with uniform mantissa division [
28].
Typical applications of floating-point division in digital processors are as follows: (i) Computer Graphics (Rendering [
54]): Division is used in pixel color calculations and light reflection models, where ratios and normalization require it. (ii) Digital Signal Processing (DSP): Audio and image filtering often involve normalization coefficients or gain adjustments that rely on division. (iii) Machine Learning and AI: Neural network training uses division for learning rate adjustments and normalization, especially in batch normalization and probability distribution calculations. (iv) Physics Simulations (Games and Scientific Computing): Fundamental formulas like velocity = distance ÷ time or density = mass ÷ volume depend on division. (v) Financial Calculations: Ratios such as exchange rates or interest rates require division. Example: profit margin = profit ÷ investment. (vi) Image Processing (Computer Vision): Histogram normalization and brightness correction often divide pixel values by maximum or average values. (vii) Numerical Analysis in Engineering: Division is essential in linear algebra (matrix inversion, Gaussian elimination) and in simulations like finite element methods or fluid dynamics.
In short, floating-point division is indispensable whenever ratios, normalization, or distribution calculations are needed—spanning graphics, AI, physics, finance, vision, and engineering.
5.2. Problem Formulation
We define a division algorithm to compute Q = N/D, where N, D, and Q are binary floating-point numbers with the following representations:

N = MN × 2^EN, D = MD × 2^ED, Q = MQ × 2^EQ

This design decomposes the division into a reciprocal evaluation followed by a standard multiplication, thereby leveraging the efficiency of optimized digital multipliers to complete the operation Q = N × (1/D).
Since 1/D = (1/MD) × 2^(−ED), calculating the reciprocal of the mantissa (1/MD) is the key step in obtaining 1/D. Therefore, to realize the division algorithm outlined above, the key subproblem of computing 1/MD is addressed by our novel algorithm. This algorithm employs a uniform mantissa division technique to enable an efficient and accurate Taylor series approximation of 1/MD to a specified precision.
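The decomposition can be sketched as follows (illustrative Python; the helper `fp_divide` and its renormalization step are our assumptions, with any mantissa-reciprocal routine supplied as `recip`):

```python
def fp_divide(n_m, n_e, d_m, d_e, recip):
    """Divide N = n_m * 2**n_e by D = d_m * 2**d_e as Q = N * (1/D),
    where `recip` approximates the mantissa reciprocal 1/m on [1, 2)."""
    q_m = n_m * recip(d_m)     # product of [1,2) and (0.5,1] lies in (0.5, 2)
    q_e = n_e - d_e
    if q_m < 1.0:              # renormalize the mantissa into [1, 2)
        q_m *= 2.0
        q_e -= 1
    return q_m, q_e
```

For example, dividing 3.0 = 1.5 × 2¹ by 1.25 = 1.25 × 2⁰ with an exact reciprocal yields 1.2 × 2¹ = 2.4.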
5.3. Reciprocal Calculation by Taylor Series Expansion
Let us consider using the Taylor series expansion of 1/x centered at x = a (1 ≤ a < 2) to compute 1/M (1 ≤ M < 2) and determine the minimum number of expansion terms required to achieve the desired accuracy. The Taylor expansion of 1/x around x = a is given as follows:

1/x = (1/a)(1 − t + t² − t³ + …) = Σ_{k=0}^{∞} (−1)^k (x − a)^k / a^(k+1)

Here, t = (x − a)/a.
It is noteworthy that all coefficients of t^k in the expansion are either +1 or −1. This structure simplifies the arithmetic logic by replacing coefficient multiplication with straightforward sign manipulation, thereby enhancing computational efficiency.
5.4. Numerical Simulation Results
In this subsection, simulations are employed to determine the number of terms (n) required for the Taylor series expansion of 1/x under various conditions of given accuracies and regions. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
5.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)
Based on numerical simulations, the minimum required expansion order
to achieve various accuracy levels is summarized in
Table 6.
5.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into two sub-intervals: 1 ≤ x < 1.5 and 1.5 ≤ x < 2. First, a specific region is selected using the leading mantissa fraction bit m1 (see
Table 3). The efficacy of this strategy is confirmed by the simulation data presented in
Table 7.
5.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into four sub-intervals: 1 ≤ x < 1.25, 1.25 ≤ x < 1.5, 1.5 ≤ x < 1.75, and 1.75 ≤ x < 2. First, a specific region is selected using the leading mantissa fraction bits m1m2 (see
Table 4). The efficacy of this strategy is confirmed by the simulation data presented in
Table 8.
5.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we examine a method based on dividing the domain into eight uniform sub-intervals of width 0.125. First, a specific region is selected using the leading mantissa fraction bits m1m2m3 (see
Table 5). The efficacy of this strategy is confirmed by the simulation data presented in
Table 9.
5.5. Considerations for Hardware Implementation
The complexity is quantified by determining the total count and breakdown of fundamental arithmetic operations (such as multiplications, additions, subtractions, and comparisons) required for each Taylor series expansion computation.
5.5.1. Required Arithmetic Operations for Taylor Series Evaluation
Case Study 1: Taylor Series Expansion with n = 5 Terms. Here, a five-term expansion of 1/x is used to demonstrate the trade-offs of the algorithm.
We will compute the following expression using a combined approach of the basic algorithm and a LUT.
For constant a and variable x, let the LUT store the precomputed value 1/a. Introducing two auxiliary variables and substituting them into Equation (8) yields the following:
Equation (9) implies that a five-term Taylor series expansion of 1/x necessitates exactly four multiplications and four addition/subtraction operations.
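A plain Horner arrangement of the five-term series can be checked numerically as below. This is an illustrative sketch: the generic factoring shown here does not reproduce the paper's exact auxiliary-variable scheme or its reported operation count, but it does use only the LUT value 1/a, multiplications, and subtractions.

```python
def recip_series5(x, a):
    """Direct five-term sum: 1/x ~ (1/a)(1 - t + t**2 - t**3 + t**4), t = (x-a)/a."""
    t = (x - a) / a
    return (1.0 / a) * (1.0 - t + t**2 - t**3 + t**4)

def recip_horner5(x, a, inv_a):
    """The same five terms in nested form, using the LUT value inv_a = 1/a;
    all series coefficients are +1 or -1, so no coefficient multiplies appear."""
    t = (x - a) * inv_a        # t = (x - a)/a
    p = 1.0 - t                # degree 1
    for _ in range(3):         # three nested steps -> degree 4
        p = 1.0 - t * p
    return inv_a * p
```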
Case Study 2: Taylor Series Expansion with n = 7 Terms. Here, a seven-term expansion of 1/x is used to demonstrate the trade-offs of the algorithm.
By applying the same operational steps detailed in Case Study 1, we derive Equations (10) and (11):
Equation (11) implies that a seven-term Taylor series expansion of 1/x necessitates exactly five multiplications and five addition/subtraction operations. Applying the methods from Case Studies 1 and 2 to compute 1/x with different numbers of Taylor series terms yields
Table 10.
A comparison of
Table 9 and
Table 10 indicates that, to achieve a 20-bit precision, the required operation counts are four multiplications and four additions/subtractions.
5.5.2. LUT Contents and Size
The value of 1/a is precomputed and stored in memory for later use during the Taylor series calculation. A four-segment division necessitates a compact four-word LUT (
Table 11). During operation, the address bits m1m2 derived from the mantissa M (where M = 1.m1m2m3…) are used to index and fetch the corresponding precomputed value. It should be noted that LUT addressing is straightforward since an address decoder is not needed.
5.6. Summary of Division Algorithm Part
This section has described a floating-point division algorithm that utilizes the Taylor series expansion of and is further enhanced by a mantissa division technique. The design investigates critical hardware trade-offs: achieving higher division accuracy while reducing the count of arithmetic operations (multiplications and additions/subtractions) inevitably demands a larger LUT, and vice versa. Numerical calculation results demonstrate that refining the granularity of mantissa division reduces the required number of arithmetic operations at the expense of increased LUT memory overhead. This offers a tunable design parameter to meet diverse precision and resource requirements in digital division applications. Also, note the following:
- (i)
The coefficients of the Taylor series expansion of 1/x are +1 or −1, which reduces the number of multiplications.
- (ii)
The LUT-address control logic is straightforward, as it directly maps the upper mantissa bits to the address.
6. Inverse Square Root Algorithm Using Taylor Series Expansion
6.1. Introduction
This section introduces floating-point inverse square root algorithms using the Taylor series expansion with mantissa region division [
29].
Typical applications of floating-point inverse square root are as follows: (i) Computer Graphics and 3D Rendering: It is used for vector normalization (e.g., calculating unit vectors for lighting, shading, and camera transformations). (ii) Physics Simulations in Games: It is essential for computing distances and normalizing velocity vectors quickly, improving real-time performance. (iii) Machine Learning and AI: It appears in optimization algorithms and normalization steps where inverse square roots stabilize variance scaling. (iv) Robotics and Control Systems: It is used in orientation calculations (e.g., quaternion normalization [
55]) for stable motion control. (v) DSP: It is applied in Root Mean Square (RMS) normalization and energy scaling, where inverse square roots help adjust signal magnitudes efficiently. (vi) Computer Vision and Image Processing: It is used in feature extraction and normalization of gradient vectors (e.g., in edge detection or Scale-Invariant Feature Transform (SIFT) descriptor [
56]). (vii) Scientific and Engineering Computations: It appears in numerical methods requiring normalized vectors, such as finite element analysis, fluid dynamics, and electromagnetics.
In short, the inverse square root is a performance-critical shortcut for normalization tasks across graphics, physics, AI, robotics, DSP, vision, and engineering.
6.2. Representation and Computation of Floating-Point Inverse Square Roots
Building upon the preceding division framework, we extend the methodology to calculate the inverse square root 1/√X of a binary floating-point number X = M × 2^E.
To handle the square root in the exponent for 1/√X, we must consider the parity of the exponent E. For the even case (E = 2k), the derivation simplifies to the following form:

1/√X = (1/√M) × 2^(−k)

This process isolates the exponent part as 2^(−k) and the mantissa part as 1/√M. Applying floating-point normalization to the mantissa then yields the following form:

1/√X = (2/√M) × 2^(−k−1)

Within the domain 1 ≤ M < 2, we have 1/√2 < 1/√M ≤ 1 and thus √2 < 2/√M ≤ 2. In this representation, 2/√M denotes the mantissa, and −k − 1 denotes the exponent of the inverse square root result 1/√X.
For the odd case (E = 2k + 1), a similar derivation yields the following equation:

1/√X = (1/√(2M)) × 2^(−k)

Normalizing the expression 1/√(2M) to a standard floating-point format results in the following equation:

1/√X = √(2/M) × 2^(−k−1)

Given 1 ≤ M < 2, we have 2 ≤ 2M < 4. This implies 1/2 < 1/√(2M) ≤ 1/√2, and, doubling the mantissa while decrementing the exponent, we obtain 1 < √(2/M) ≤ √2. In this representation, √(2/M) denotes the mantissa, and −k − 1 denotes the exponent of the inverse square root result 1/√X.
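Both parity cases can be checked numerically with the sketch below (the helper name `inv_sqrt_parts` is ours; it mirrors the two derivations above, with the M = 1, even-E boundary giving a mantissa of exactly 2):

```python
import math

def inv_sqrt_parts(M, E):
    """Decompose 1/sqrt(M * 2**E), with 1 <= M < 2, into (M_out, E_out)
    such that 1/sqrt(M * 2**E) == M_out * 2**E_out and M_out is normalized."""
    if E % 2 == 0:                  # even exponent: E = 2k
        k = E // 2
        M_out = 2.0 / math.sqrt(M)  # sqrt(2) < M_out <= 2
    else:                           # odd exponent: E = 2k + 1
        k = (E - 1) // 2
        M_out = math.sqrt(2.0 / M)  # 1 < M_out <= sqrt(2)
    return M_out, -k - 1
```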
6.3. Taylor Series Expansion of Inverse Square Root
The Taylor series expansion of 1/√x about the point x = a is given by the following equation:

1/√x = (1/√a)(1 − (1/2)t + (3/8)t² − (5/16)t³ + (35/128)t⁴ − …), where t = (x − a)/a
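The series can be evaluated to arbitrary order with the binomial-coefficient recurrence C(−1/2, k+1) = C(−1/2, k)·(−1/2 − k)/(k + 1); the sketch below is illustrative and the helper name is ours:

```python
import math

def inv_sqrt_taylor(x, a, n):
    """n-term Taylor expansion of 1/sqrt(x) about a, via the binomial series
    (1 + t)**-0.5 = sum_k C(-1/2, k) t**k with t = (x - a)/a."""
    t = (x - a) / a
    term = 1.0 / math.sqrt(a)   # k = 0 term: 1/sqrt(a)
    total = 0.0
    for k in range(n):
        total += term
        # advance the binomial coefficient and fold in one power of t
        term *= (-0.5 - k) / (k + 1) * t
    return total
```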
6.4. Numerical Simulation Results
Now we discuss how to achieve a specified precision using the minimal number of Taylor series expansion terms. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
6.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)
Based on numerical simulations, the minimum required expansion order
to achieve various accuracy levels is summarized in
Table 12.
6.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into two sub-intervals: 1 ≤ x < 1.5 and 1.5 ≤ x < 2. First, a specific region is selected using the leading mantissa fraction bit m1 (see
Table 3). The efficacy of this strategy is confirmed by the simulation data presented in
Table 13.
6.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into four sub-intervals: 1 ≤ x < 1.25, 1.25 ≤ x < 1.5, 1.5 ≤ x < 1.75, and 1.75 ≤ x < 2. First, a specific region is selected using the leading mantissa fraction bits m1m2 (see
Table 4). The efficacy of this strategy is confirmed by the simulation data presented in
Table 14.
6.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into eight uniform sub-intervals of width 0.125. First, a specific region is selected using the leading mantissa fraction bits m1m2m3 (see
Table 5). The efficacy of this strategy is confirmed by the simulation data presented in
Table 15.
6.5. Considerations for Hardware Implementation
This section analyzes the hardware implementation complexity of the inverse square root algorithm. The analysis quantifies the arithmetic cost—specifically the counts of multiplications, additions, and subtractions—across different implementation scenarios based on Taylor series approximations.
As a concrete illustration, consider the case of a five-term Taylor series expansion (n = 5) for 1/√x about a chosen center point a. The resulting expression and its associated operation count are as follows:
- (1)
For the even case (E = 2k), the derivation simplifies to the following form:
Let,
and
.
where
.
- (2)
For the odd case (E = 2k + 1), the derivation simplifies to the following form: 1/√X = (1/√(2M)) · 2^(-k), so the mantissa term to be approximated becomes 1/√(2M).
Equation (22) is explicitly defined with the coefficients Ci for i = 0, 1, …, 4, where x is a variable and a is a constant. All necessary coefficients Ci and the center a are precomputed and stored in a LUT, from which they are fetched during runtime computation. Subsequently, letting y = x − a, we obtain the following equation:
Equation (23) implies that a five-term Taylor series expansion of 1/√x necessitates exactly five multiplications and five addition/subtraction operations. Applying the same methodology to compute 1/√x with different numbers of Taylor series terms yields Table 16.
Table 15 and Table 16 show that by dividing the inverse square root domain into eight regions, the inverse square root of the mantissa can be computed with 24-bit precision using only five multiplications and five additions/subtractions.
The required LUT size scales linearly with the number of regions N; with five stored words per region, it is 5N words. In the case of a mantissa M = 1.αβγ…, the most significant fractional bits (αβγ for eight regions) are used directly to address the corresponding LUT entry. As an example, Table 17 illustrates the configuration for eight regions, which requires an LUT of 40 words. Note that the addressing scheme is straightforward and does not require a dedicated address decoder.
6.6. Summary of Inverse Square Root Algorithm Part
This section has reviewed inverse square root algorithms utilizing region-segmented Taylor series expansions, quantifying the implementation trade-offs regarding accuracy, arithmetic complexity, and memory footprint. This framework thereby offers adaptable support for meeting varied precision and resource constraints. Thus, the insights presented herein equip designers to conceptualize and implement specialized hardware for efficient inverse square root calculation.
7. Square Root Algorithm Using Taylor Series Expansion
7.1. Introduction
This section introduces floating-point square root algorithms using Taylor series expansion with uniform mantissa division [
30].
Typical applications of floating-point square root operations are as follows: (i) Computer Graphics and 3D Rendering: Square roots are used in vector normalization (e.g., calculating unit vectors for lighting and shading). (ii) DSP: In audio and image processing, square roots appear in RMS calculations for measuring signal strength. (iii) Machine Learning and AI: Algorithms like gradient descent often use square roots in optimization methods (e.g., Root Mean Square Propagation (RMSProp), Adam optimizers [
57]). (iv) Physics Simulations and Games: Square roots are needed for distance calculations (e.g., Euclidean distance in 3D space) and solving equations of motion. (v) Statistics and Data Analysis: Standard deviation and variance calculations require square roots to measure data spread. (vi) Engineering and Scientific Computing: Square roots are essential in solving quadratic equations, wave equations, and numerical methods in structural or fluid analysis. (vii) Cryptography and Security: Certain algorithms (e.g., modular arithmetic, elliptic curve cryptography [
58]) involve square root operations in finite fields.
In short, square root operations are indispensable in geometry, optimization, signal measurement, statistical analysis, and cryptography—making them one of the most widely used floating-point functions in digital processors.
7.2. Problem Formulation
A floating-point number is typically composed of a mantissa, a sign bit, and an exponent. In this section, we focus on cases where the mantissa is positive. For clarity, we denote the binary mantissa field, the exponent field, and the overall binary floating-point representation as
M,
E, and
X, respectively, as follows:
In Equation (24), the mantissa field is represented as M = 1.αβγ…, where each of α, β, γ, etc., is either 1 or 0. For a normalized binary floating-point representation, the mantissa satisfies 1 ≤ M < 2.
The square root computation in binary floating-point arithmetic can be formulated as follows:
To handle the square root in the exponent for X = M · 2^E, we must consider the parity of the exponent E. For the even case (E = 2k), the derivation simplifies to the following form:
This computation yields a square root representation partitioned into an integer exponent part 2^k and a fractional mantissa part √M. By normalizing √M, we obtain the following expression:
Within the domain 1 ≤ M < 2, we have 1 ≤ √M < √2, so the mantissa part √M is already normalized, yielding the following form:
We obtain the exponent part k and the mantissa part √M of the final expression for the square root Sr.
When the exponent E of a floating-point number is odd, let E = 2k + 1. It can then be expressed as follows:
In this representation, √(2M) denotes the mantissa, and k denotes the exponent of the square root result Sr.
For the odd case (E = 2k + 1), a similar derivation yields the following expression:
Let the factor 2M be used to normalize the floating-point number, resulting in the following expression:
Since 1 ≤ M < 2, it follows that 2 ≤ 2M < 4, which implies √2 ≤ √(2M) < 2. Based on this, we derive the following:
In this representation, √(2M) denotes the mantissa, and k denotes the exponent of the square root result Sr.
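The exponent-parity handling above can be expressed directly in code. In this sketch, Python's math.frexp supplies the mantissa/exponent split, and the library square root stands in for the Taylor approximation of the mantissa part (the function name sqrt_fp is ours):

```python
import math

def sqrt_fp(x):
    """sqrt(x) for x > 0 via the parity of the exponent E, with x = M * 2**E."""
    m, e = math.frexp(x)        # x = m * 2**e, 0.5 <= m < 1
    M, E = 2.0 * m, e - 1       # renormalize so that 1 <= M < 2
    k, r = divmod(E, 2)
    if r == 0:                  # even E = 2k:      sqrt(x) = sqrt(M)  * 2**k
        mant = math.sqrt(M)     # Taylor-approximated in hardware
    else:                       # odd  E = 2k + 1:  sqrt(x) = sqrt(2M) * 2**k
        mant = math.sqrt(2.0 * M)
    return math.ldexp(mant, k)  # exact scaling by 2**k
```

In both branches the resulting mantissa stays in [1, 2), so no renormalization step is needed; because Python's divmod keeps the remainder non-negative, the same code also covers negative exponents.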
7.3. Taylor Series Expansion of Square Root
The Taylor series expansion of √x about the point a is given by the following equation:
7.4. Numerical Simulation Results
In the following, we examine the relationship between the function √x, the number of terms used in its Taylor series expansion, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
7.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)
Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 18.
7.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into two sub-intervals: 1 ≤ x < 1.5 and 1.5 ≤ x < 2. First, the applicable region is selected using the most significant fractional bit of the mantissa (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 19.
7.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into four sub-intervals: 1 ≤ x < 1.25, 1.25 ≤ x < 1.5, 1.5 ≤ x < 1.75, and 1.75 ≤ x < 2. First, the applicable region is selected using the two most significant fractional bits of the mantissa (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 20.
7.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into eight sub-intervals of width 1/8: 1 ≤ x < 1.125, 1.125 ≤ x < 1.25, …, 1.875 ≤ x < 2. First, the applicable region is selected using the three most significant fractional bits of the mantissa (see Table 5). The efficacy of this strategy is confirmed by the simulation data presented in Table 21.
7.5. Considerations for Hardware Implementation
We examine the hardware complexity of the algorithm described here. Consider the equation for √x, and evaluate the number of subtractions, multiplications, and additions required when using the Taylor series expansion. Consider the five-term Taylor series expansion (n = 5) for √x centered at a. Its expression is as follows:
where
We also define the following:
Here,
Depending on the parity of E, the square root takes one of two forms:
even: with E = 2k, √X = 2^k · √M;
odd: with E = 2k + 1, √X = 2^k · √(2M).
In Equation (34), a is a constant and x is a variable. The coefficients Ci and a are individually stored in the LUT and can be retrieved as needed. We then compute y = x − a and z = y², which yields the following:
The final Equation (36) is derived by algebraic transformation of √x using a five-term Taylor series expansion. This equation consists of five multiplications and five subtractions/additions. Taylor series expansions of √x with varying numbers of terms result in different computational requirements, and the corresponding numbers of additions, subtractions, and multiplications are summarized in Table 22.
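The operation count for the five-term case can be verified with the y = x − a, z = y² grouping. The following sketch is our reconstruction (sqrt_coeffs and sqrt_taylor5 are hypothetical names); it evaluates the expansion with exactly five multiplications and five additions/subtractions, as counted in the comments:

```python
import math

def sqrt_coeffs(a, n=5):
    """Taylor coefficients c_k = f^(k)(a)/k! of f(x) = sqrt(x)."""
    c, out = math.sqrt(a), []
    for k in range(n):
        out.append(c)
        c *= (0.5 - k) / ((k + 1) * a)   # ratio c_{k+1}/c_k
    return out

def sqrt_taylor5(x, a):
    """Five-term expansion of sqrt(x) about a: 5 multiplies, 5 adds/subtracts."""
    c0, c1, c2, c3, c4 = sqrt_coeffs(a)
    y = x - a                      # 1 subtraction
    z = y * y                      # 1 multiplication
    even = c0 + z * (c2 + c4 * z)  # 2 multiplications, 2 additions
    odd = c1 + c3 * z              # 1 multiplication, 1 addition
    return even + odd * y          # 1 multiplication, 1 addition
```

Splitting the polynomial into even and odd powers of y lets the two halves be evaluated in parallel, while keeping the total at five multiplications and five additions/subtractions.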
In the case of dividing the mantissa into four regions, referencing Table 22 and Table 23 shows that the square root calculation with 20-bit precision requires five multiplications and five subtractions/additions.
For LUT size estimation, the size can be calculated as N × 8 words (N: the number of partition segments), and the first and second MSBs of M = 1.αβ… are used as address inputs. For example, with a four-region division, a 32-word LUT is sufficient, according to the results calculated in Table 23.
7.6. Summary of Square Root Algorithm Part
This section has described a square root algorithm for floating-point numbers that combines mantissa region segmentation with Taylor series expansion. It also discussed the required LUT size, the number of arithmetic operations (multiplications and additions/subtractions), and the trade-offs involved in achieving arithmetic accuracy. The described method enables designers to construct dedicated hardware architectures tailored to specific square root computation requirements.
8. Exponentiation Algorithm Using Taylor Series Expansion
8.1. Introduction
This section introduces floating-point exponentiation algorithms using Taylor series expansion with uniform mantissa region division [
31].
Typical applications of floating-point exponentiation are as follows: (i) Computer Graphics and Animation: Exponentiation is used in lighting models (e.g., specular highlights in the Phong reflection model [
59]) and gamma correction for color adjustment. (ii) DSP: Power functions are applied in audio compression, dynamic range control, and modeling nonlinear systems. (iii) Machine Learning and AI: Exponentiation is crucial in activation functions (e.g., softmax uses exp(x) [57]), probability distributions, and exponential moving averages in optimizers. (iv) Physics and Engineering Simulations: Many physical laws involve powers, such as inverse-square laws (gravity, electromagnetism) or exponential decay in radioactive processes. (v) Finance and Economics: Compound interest and growth models rely on exponentiation. (vi) Statistics and Data Analysis: Exponential functions appear in probability distributions (e.g., the exponential distribution and the Gaussian distribution with density proportional to exp(−x²/2)). (vii) Cryptography and Security: Modular exponentiation is a core operation in RSA and Diffie–Hellman key exchange [
60], making exponentiation vital for secure communications. It is also widely used in finance, statistics, and cryptography, making it one of the most computationally intensive but essential floating-point operations in digital processors.
8.2. Problem Formulation
We now turn to the floating-point algorithm for computing the exponential function (Exp), derived from its binary floating-point representation.
We consider the case where the exponent E is zero or a positive integer; the extension to negative E is straightforward. For example, 2^E = 1, 2, 4, 8, 16, 32, … for E = 0, 1, 2, 3, 4, 5, ….
We observe that once exp(M) is obtained, exp(M · 2^E) = (exp(M))^(2^E) can be derived by raising it to an integer power, i.e., by squaring the result E times.
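This squaring scheme is easy to demonstrate. In the sketch below (illustrative; exp_fp is our name, and math.exp stands in for the Taylor-approximated exp(M)), the result for exp(M · 2^E) is obtained from exp(M) by E successive squarings, i.e., E multiplications:

```python
import math

def exp_fp(M, E):
    """exp(M * 2**E) for a mantissa M and integer E >= 0:
    compute exp(M) once, then square the result E times."""
    r = math.exp(M)      # in hardware: Taylor expansion over the mantissa range
    for _ in range(E):
        r *= r           # (exp(M))**(2**E) via E squarings
    return r
```

Note that the cost grows only linearly in E, while the computed power 2^E grows exponentially; this is why the mantissa part dominates the approximation effort.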
8.3. Taylor Series Expansion of Exponential Function
We compute exp(x) (with 1 ≤ x < 2) via its Taylor series expansion about a chosen center a (where 1 ≤ a < 2). The expansion of exp(x) at a is as follows:
Equation (38) gives the sixth-order Taylor series expansion of the exponential function about the center a.
8.4. Numerical Simulation Results
In the following, we examine the relationship between the function exp(x), the number of expansion terms in its Taylor series expansion, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
8.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)
Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 24.
8.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into two sub-intervals: 1 ≤ x < 1.5 and 1.5 ≤ x < 2. First, the applicable region is selected using the most significant fractional bit of the mantissa (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 25.
8.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into four sub-intervals: 1 ≤ x < 1.25, 1.25 ≤ x < 1.5, 1.5 ≤ x < 1.75, and 1.75 ≤ x < 2. First, the applicable region is selected using the two most significant fractional bits of the mantissa (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 26.
Likewise, the interval 1 ≤ x < 2 may be partitioned into finer sub-intervals, for example into 8, 16, or 32 parts.
8.5. Considerations for Hardware Implementation
We assess hardware complexity through the metrics calculated below. For instance, for a Taylor series with n terms, this assessment yields the specific LUT size and the number of addition, subtraction, and multiplication operations.
For the constant a and the variable x, let the LUT store the precomputed value exp(a). Introducing the auxiliary variables y = x − a and z = y², and substituting them into Equation (39) yields the following:
The five-term Taylor expansion requires five additions/subtractions and six multiplications, achieving the precision reported in Table 26. This specific case exemplifies the general trend documented in Table 27, which maps the expansion order n to the required counts of arithmetic operations (multiplications, additions, and subtractions) for computing exp(x).
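The auxiliary-variable evaluation can be written out explicitly. This sketch is our illustration (exp_taylor5 is a hypothetical name): it fetches exp(a) as the LUT would, then combines the even powers through z = y² and the odd powers through y, for a total of six multiplications and five additions/subtractions:

```python
import math

def exp_taylor5(x, a):
    """Five-term expansion exp(x) ~= exp(a)*(1 + y + y^2/2! + y^3/3! + y^4/4!)
    with y = x - a and z = y**2 grouping even and odd powers."""
    ea = math.exp(a)                   # fetched from the LUT in hardware
    y = x - a                          # 1 subtraction
    z = y * y                          # 1 multiplication
    even = 1.0 + z * (0.5 + z / 24.0)  # 1 + y^2/2! + y^4/4!: 2 mult, 2 add
    odd = y * (1.0 + z / 6.0)          # y + y^3/3!:          2 mult, 1 add
    return ea * (even + odd)           # 1 multiplication, 1 addition
```

The extra multiplication relative to the other functions comes from the final scaling by the stored value exp(a).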
LUT Construction and Addressing: The LUT is populated with precomputed values of exp(a). As outlined in Table 28, segmenting the mantissa region into four parts requires a corresponding four-word LUT. The same principle can be extended to any number of partitioned regions, enabling an estimate of the total LUT size. Access is direct: for a mantissa M = 1.αβ…, the address bits αβ point to the corresponding entry. Consequently, as the number of partitions grows, the required LUT size increases correspondingly.
8.6. Summary of Exponentiation Algorithm Part
This section analyzes how the convergence range and precision of the exponential function’s Taylor expansion are governed by two key parameters under the mantissa division technique: the number of series terms (which can be controlled by adjusting the number of partitions) and the expansion center value. It further investigates the trade-offs among LUT size, computational precision, and fundamental arithmetic operations, offering design insights that designers can leverage to develop efficient digital algorithms. Notably, this technique concentrates on the mantissa of the domain variable; as the exponent component increases, the number of multiplications required for the overall exponentiation function also grows.
9. Logarithm Algorithm Using Taylor Series Expansion
9.1. Introduction
This section introduces floating-point logarithm algorithms using the Taylor series expansion with uniform mantissa region division [
13].
Typical applications of floating-point logarithm operations are as follows: (i) Computer Graphics and Imaging: Logarithms are used in tone mapping [
61] and dynamic range compression, helping to represent brightness levels more naturally. (ii) DSP: Logarithmic scales measure sound intensity (decibels) and frequency response, making audio analysis more accurate. (iii) Machine Learning and AI: Logarithms are essential in loss functions (e.g., cross-entropy [
57]), probability calculations, and log-likelihood estimation. (iv) Data Compression: Algorithms like JPEG and MP3 use logarithmic transformations to model human perception of sound and light more efficiently. (v) Statistics and Probability: Logarithms are used in statistical models, especially for log-normal distributions and hypothesis testing. (vi) Finance and Economics: Logarithmic returns are widely used to measure investment performance and volatility over time. (vii) Scientific and Engineering Computations: Logarithms appear in exponential growth/decay models, chemical reaction rates, and complexity analysis in algorithms.
In short, logarithm operations are indispensable in graphics, DSP, AI, compression, statistics, finance, and scientific modeling, making them one of the most versatile floating-point functions in digital processors.
9.2. Problem Formulation
We now turn to the floating-point algorithm for computing the base-2 logarithm function (log2(x)), derived from its binary floating-point representation.
Here, x = M · 2^E.
We observe that once log2(M) is determined, log2(x) = log2(M) + E can be obtained simply by adding the exponent E. Therefore, our focus is on calculating log2(M).
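This decomposition is a single addition once the mantissa and exponent are separated. In the sketch below (our illustration; log2_fp is a hypothetical name, and math.log2 stands in for the Taylor approximation of log2(M)):

```python
import math

def log2_fp(x):
    """log2(x) for x > 0 via log2(M * 2**E) = log2(M) + E."""
    m, e = math.frexp(x)     # x = m * 2**e with 0.5 <= m < 1
    M, E = 2.0 * m, e - 1    # renormalize so that 1 <= M < 2
    return math.log2(M) + E  # only log2(M) needs a Taylor approximation
```

Because log2(M) lies in [0, 1), the exponent contributes the integer part directly and no further normalization is required.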
9.3. Taylor Series Expansion of Mantissa Part for Base-2 Logarithm
We compute log2(x) (with 1 ≤ x < 2) via its Taylor series expansion about a chosen center a (where 1 ≤ a < 2). The expansion of log2(x) at a is as follows:
Equation (42) gives the Taylor series expansion of the base-2 logarithm function about the center a.
9.4. Numerical Simulation Results
In the following, we examine the relationship between the function log2(x), the number of expansion terms in its Taylor series expansion, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
9.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)
Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 29.
9.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into two sub-intervals: 1 ≤ x < 1.5 and 1.5 ≤ x < 2. First, the applicable region is selected using the most significant fractional bit of the mantissa (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 30.
9.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2
To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain into four sub-intervals: 1 ≤ x < 1.25, 1.25 ≤ x < 1.5, 1.5 ≤ x < 1.75, and 1.75 ≤ x < 2. First, the applicable region is selected using the two most significant fractional bits of the mantissa (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 31.
9.5. Considerations for Hardware Implementation
We assess hardware complexity through the metrics calculated below. For instance, for a Taylor series with n terms, this assessment yields the specific LUT size and the number of multiplication, addition, and subtraction operations.
Here, C0, C1, C2, C3, and C4 denote the Taylor coefficients of log2(x) at the center a.
Here, x and a represent a variable and a constant, respectively. The coefficients Ci are precomputed and stored in an LUT; they are retrieved at runtime during the calculation. We then compute y = x − a, followed by z = y², and proceed as follows:
The five-term expansion (n = 5) requires five additions/subtractions and five multiplications. This specific case exemplifies the general trend documented in Table 32, which maps the expansion order n to the required counts of arithmetic operations (multiplications, additions, and subtractions) for computing log2(x).
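The coefficient generation and the five-term evaluation for the mantissa logarithm can be sketched as follows (our reconstruction; log2_coeffs and log2_taylor5 are hypothetical names). The coefficients come from differentiating log2(x): c0 = log2(a) and c_k = (−1)^(k+1)/(k · a^k · ln 2) for k ≥ 1:

```python
import math

INV_LN2 = 1.0 / math.log(2.0)

def log2_coeffs(a, n=5):
    """Taylor coefficients c_k = f^(k)(a)/k! of f(x) = log2(x)."""
    out = [math.log2(a)]
    for k in range(1, n):
        out.append((-1.0) ** (k + 1) * INV_LN2 / (k * a ** k))
    return out

def log2_taylor5(x, a):
    """Five-term expansion: 5 multiplications, 5 additions/subtractions."""
    c0, c1, c2, c3, c4 = log2_coeffs(a)
    y = x - a                      # 1 subtraction
    z = y * y                      # 1 multiplication
    even = c0 + z * (c2 + c4 * z)  # 2 multiplications, 2 additions
    odd = y * (c1 + c3 * z)        # 2 multiplications, 1 addition
    return even + odd              # 1 addition
```

In a hardware realization, log2_coeffs would be evaluated offline per region and its outputs stored in the LUT; only log2_taylor5 runs at computation time.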
The data in
Table 31 and
Table 32 demonstrate that a four-region partition (e.g., domains R4-3, R4-4) achieves 22-bit accuracy in computing the mantissa logarithm at a cost of seven multiplications and seven additions/subtractions. This efficiency stems from a scalable architecture where, for an n-term expansion over N regions, the LUT size is given by nN words. Addressing is direct: the two most significant fractional bits αβ of the mantissa M = 1.αβ… serve as the lookup address.
Table 33 instantiates this model for n = 5 and N = 4, requiring a 20-word LUT. This partitioning strategy, generalizable as shown in Table 33, delivers a key advantage: by confining the approximation to smaller sub-domains, it reduces both the required Taylor series order n for a target accuracy and, consequently, the number of runtime LUT accesses, enhancing overall efficiency.
9.6. Summary of Logarithm Algorithm Part
This work has developed and analyzed floating-point logarithm algorithms that integrate Taylor series expansion with a mantissa region division technique. Our analysis explicitly quantifies the key hardware implementation trade-offs—among simulation accuracy, arithmetic operation counts (multiplications, additions/subtractions), and LUT size—to provide a configurable design framework adaptable to diverse digital system requirements.
10. Discussion
In this section, we examine the common structural features shared by our method across several floating-point arithmetic computations, as well as the limitations of its applicability.
- (1)
We have examined floating-point calculations of functions f(x) such as 1/x, 1/√x, √x, exp(x), and log2(x) in the previous sections.
(A) For 1/x, 1/√x, and √x, the function f(x) can be expressed as follows:
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, while the evaluation of h(E) is straightforward.
(B) For exp(
x), the function
f(
x) can be expressed as follows:
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, whereas the effect of the exponent part is apparent.
(C) For log2(x), the function f(x) can be expressed as follows:
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, while the evaluation of h(E) is straightforward (h(E) = E).
We therefore consider the Taylor series expansion technique combined with mantissa-region division to be effective for functions f(x) in which the contribution of the exponent part is explicit. Accordingly, we emphasize that the applicability of the mantissa region division method to a given function f(x) depends on whether the contribution of the exponent part of the input x to f(x) can be obtained with ease. In contrast, the method is difficult to apply directly to certain functions that do not exhibit these properties; this is the limitation of the proposed method.
- (2)
It has been shown for these functions that as the number of mantissa divisions increases, the number of additions/subtractions and multiplications decreases, while the LUT size increases. However, LUT addressing remains simple because no address decoder is required, and the additional LUT resources are negligible in some modern LSI technologies. Therefore, it is worthwhile to pursue larger division numbers and to determine the optimal division number in future work.
- (3)
Table 34 compares the required orders of the Taylor series to achieve a specified accuracy for each arithmetic operation. We observe that the required orders are similar, mainly because the domain 1 ≤ x < 2 is relatively narrow. We also note that the orders for 1/√x and 1/x are higher, as these functions vary more significantly over the interval 1 ≤ x < 2.
- (4)
Note that the LUT-address control logic is straightforward, as it directly maps the upper mantissa bits to the address. Also, observe that these algorithms share a common region-division scheme, since it is based on partitioning the mantissa interval 1 ≤ M < 2. These characteristics make them well-suited for a unified hardware implementation.
- (5)
Chebyshev polynomial (min–max) approximations can achieve the required accuracy with coarser mantissa segments and lower polynomial order than the Taylor series expansion [
52]. However, designing their coefficients is more involved. In contrast, coefficient generation for the Taylor series expansion is straightforward, since the coefficients are obtained simply by differentiating the target function. As the number of region divisions increases, the difference in effectiveness between the Taylor series expansion and the Chebyshev polynomial (min–max) approximations becomes smaller. Moreover, when the function varies rapidly, a Taylor series expansion with fine mantissa segmentation can be advantageous, because it provides high accuracy in the vicinity of the expansion point.
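The shared structure noted in point (1) can be made concrete with two of the cases. In the sketch below (our illustration; decompose, recip, and log2_ are hypothetical names, and the library calls stand in for the Taylor-approximated mantissa parts g(M)), case (A) combines g(M) and h(E) multiplicatively and case (C) additively:

```python
import math

def decompose(x):
    """x = M * 2**E with 1 <= M < 2 and integer E."""
    m, e = math.frexp(x)
    return 2.0 * m, e - 1

# Case (A): f(x) = g(M) * h(E), with trivial h(E) = 2**-E (reciprocal).
def recip(x):
    M, E = decompose(x)
    return (1.0 / M) * math.ldexp(1.0, -E)

# Case (C): f(x) = g(M) + h(E), with trivial h(E) = E (base-2 logarithm).
def log2_(x):
    M, E = decompose(x)
    return math.log2(M) + E
```

Only the mantissa-part function g and the combining operation differ between the two cases, which is what makes a unified, program-switchable hardware datapath plausible.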
11. Conclusions
This study has examined floating-point computations of fundamental functions—division, inverse square root, square root, exponentiation, and logarithm—through the application of Taylor series expansion with the mantissa region division. Their hardware implementations were also investigated: they can be realized using adders/subtractors, multipliers, and look-up tables, and a common hardware architecture can be employed by switching through simple programming. A comprehensive explanation is provided for each algorithm: inverse, inverse square root, square root, exponential, and logarithmic functions. We also discussed the applicability of this method to a given function f(x); it depends on how easily the contribution of the exponential part of x to f(x) can be computed.
These findings highlight the efficiency and versatility of the Taylor series and mantissa-region-division-based approach, offering a practical foundation for the design of high-performance arithmetic units in modern computing systems [
57,
58]. The method is particularly effective for high-performance CPUs and GPUs equipped with abundant floating-point adders, subtractors, and multipliers, as well as sufficient main memory, where polynomial-based arithmetic can be efficiently mapped onto deeply pipelined and massively parallel execution units [
19,
62].
Future work includes the following directions:
- (i)
Identifying additional functions that are well-suited to Taylor series expansion with mantissa region division, especially for transcendental and composite functions that are frequently used in signal processing, graphics, and machine learning workloads [
18].
- (ii)
Developing hybrid methods that combine our Taylor approximation with iterative refinement.
- (iii)
Optimizing these implementations for parallel architectures and investigating their integration into domain-specific accelerators, where recent studies have shown that approximation-based arithmetic can significantly improve performance–energy efficiency trade-offs [
20].
- (iv)
Investigating their hardware-software co-design for optimized implementation. We have demonstrated that our method shares a common structure with several floating-point arithmetic computations, and that a unified co-design of these computations can be highly effective.
- (v)
Applying the Taylor series expansion with mantissa region division to floating-point computation in the AI-oriented numerical formats (posit, bfloat16, and TensorFloat32 (TF32)) [
21,
25,
63,
64,
65,
66].
- (vi)
Investigating “correct rounding” when applying the proposed method [
67].