Review

Review of Floating-Point Arithmetic Algorithms Using Taylor Series Expansion and Mantissa Region Division Techniques

by Jianglin Wei 1 and Haruo Kobayashi 2,*
1 School of Electronic Information and Engineering, Yibin University, Yibin 644000, China
2 Faculty of Science and Technology, Gunma University, Kiryu 376-8515, Gunma, Japan
* Author to whom correspondence should be addressed.
Electronics 2026, 15(5), 1106; https://doi.org/10.3390/electronics15051106
Submission received: 2 January 2026 / Revised: 4 March 2026 / Accepted: 4 March 2026 / Published: 6 March 2026

Abstract

This paper presents a comprehensive review of digital floating-point arithmetic algorithms that utilize Taylor series expansion in combination with mantissa-region division techniques, and it further demonstrates their generalization and applicability based on the findings of our research. While the discussion is broad in scope, this paper consolidates and systematizes the authors’ method within a broader contextual discussion, rather than presenting a fully systematic review of the entire state of the art in floating-point arithmetic algorithms. In many scientific computing applications, compact and low-power hardware implementations are essential. To address these requirements, this review presents algorithms specifically designed to operate under such constraints. The focus is placed on efficient floating-point operations—including division, inverse square root, square root, exponentiation, and logarithmic functions—all realized through Taylor series expansion with mantissa region division techniques. Furthermore, the trade-offs are examined in detail, covering factors such as the required numbers of additions, subtractions, and multiplications, along with the look-up table (LUT) size. The study further identifies the environments and application domains where the Taylor series expansion method combined with mantissa-region division is most effective, based on comparisons with various other floating-point computation algorithms and their corresponding hardware implementations. Overall, the review underscores the value of this unified framework in enabling efficient and adaptable floating-point computation across a wide range of hardware-constrained environments.

1. Introduction

Floating-point arithmetic underpins modern digital computation, supporting platforms ranging from specialized Application-Specific Integrated Circuits (ASICs) and reconfigurable Field-Programmable Gate Arrays (FPGAs) to high-performance CPUs and GPUs, all of which now address performance and efficiency demands beyond the capabilities of traditional processors [1,2,3]. As embedded systems, IoT devices, and edge-computing platforms increasingly require high-performance and high-dynamic-range signal processing, floating-point operations—such as division, inverse square root, square root, exponentiation, and logarithms—have become indispensable across scientific computing, signal processing, control, graphics, and machine learning [4,5,6]. In these domains, integer arithmetic cannot provide the necessary dynamic-range performance, yet implementing floating-point operations efficiently remains challenging, particularly under constraints on area, power, and latency [7,8,9,10,11].
To overcome these limitations, numerous approximation strategies have been explored. Among them, Taylor-series-based approximation has emerged as a compelling approach for implementing complex mathematical functions in hardware [12,13]. A Taylor series constructs a polynomial determined by the derivatives of a function at a chosen expansion point [14], and truncating the series enables designers to balance accuracy and computational cost [15,16]. Its polynomial structure is especially attractive for hardware realization, as it relies only on multiplication and addition—operations well supported in CPUs, GPUs, and custom hardware [17,18,19].
However, directly applying the Taylor series to floating-point functions introduces challenges. Convergence behavior varies widely across the function domain [20,21]; for example, the Taylor expansion of ln(x) converges rapidly near the expansion point but poorly elsewhere [22,23]. Techniques such as interval partitioning and interpolation [24], exponent decomposition [25], and LUT-optimized or CORDIC-based methods [26,27] have been proposed to improve convergence and reduce hardware cost.
This review summarizes recent advances in floating-point algorithms for division, inverse square root, square root, exponentiation, and logarithmic functions developed using our Taylor-series-based approach with mantissa-region division [13,28,29,30,31]. We further clarify the common structural principles underlying these techniques and examine the limits of their applicability.
After this introduction, the discussion unfolds in several stages. Section 2 examines and reviews prior work on floating-point computation arithmetic. Building on this foundation, Section 3 clarifies the role of the proposed Taylor series expansion method with mantissa-region division, while Section 4 presents the key theoretical and technical foundations that support our approach, including the Taylor series expansion, floating-point number representation, and the mantissa-division technique used in the expansion. Sections 5 through 9 present floating-point computation algorithms based on Taylor series expansion with mantissa division, covering division, inverse square root, square root, exponentiation, and logarithm calculations, respectively; similar derivations recur across these sections, reflecting the common structure that the method imposes on all of these arithmetic algorithms. Section 10 examines the characteristics of the Taylor series expansion with the mantissa region division method and compares it with other floating-point computation techniques. Finally, Section 11 concludes the paper.
We emphasize that although many studies have investigated polynomial approximation combined with region division for efficient floating-point computation, to the best of our knowledge, research on Taylor series expansion with mantissa-based region division has been reported only by our group. The purpose of this review is two-fold: (i) to clarify and consolidate the defining characteristics of this method, and (ii) to review various floating-point computation methods in order to situate our method within the broader context. While the discussion is broad in scope, this paper consolidates and systematizes the authors’ method within a broader contextual discussion, rather than presenting a fully systematic review of the entire state of the art in floating-point arithmetic algorithms.
The references supporting purpose (i) are drawn from our previous publications, together with additional considerations. Those supporting purpose (ii) were collected from major scholarly databases, including IEEE Xplore, the ACM Digital Library, and ScienceDirect, while the reference books were selected from reputable publishers such as MIT Press, A K Peters/CRC Press, and Springer Nature. Our survey primarily focuses on recent advances from 2019 to 2025, encompassing both classical approaches and emerging trends. Each selected work satisfies at least one of the following criteria:
- Proposes novel algorithms for floating-point function evaluation.
- Improves accuracy, performance, or hardware efficiency.
- Introduces new numerical formats or rounding techniques.
This structured methodology enables us to present a balanced and comprehensive overview of the current state of the art. Here, the floating-point computation methods are compared in terms of computational complexity, hardware cost, speed, and accuracy.

2. Various Floating-Point Calculation Methods

A wide range of techniques has been proposed to accelerate floating-point function evaluation, including polynomial approximations, lookup-table (LUT)-based schemes, and iterative algorithms [21,32,33,34,35]. Traditional floating-point units rely on well-established approaches such as digit-recurrence algorithms—for example, Sweeney–Robertson–Tocher (SRT) division and square root [36]—and iterative refinement methods such as Newton–Raphson and Goldschmidt iterations for division, square root, and reciprocal square root [37,38,39]. Although these methods provide high numerical precision, they typically require multiple computational cycles and complex control logic, which limits their suitability for high-speed or resource-constrained systems, particularly in modern energy-efficient accelerators [40,41].
Digit-recurrence (SRT) and CORDIC-based algorithms have the advantage of eliminating multipliers in hardware implementations, significantly reducing area and power consumption. This makes them attractive for embedded systems, though their control circuitry is often intricate and their multi-cycle latency restricts applicability in high-performance scientific computing [42]. In contrast, polynomial-approximation methods—such as Taylor series expansions and Chebyshev (min–max) polynomials—can exploit the abundant floating-point adders, multipliers, and fused multiply–add (FMA) units available in modern CPUs and GPUs, enabling highly parallel and pipelined execution.
The Newton–Raphson method remains appealing for embedded applications despite requiring a multiplier, as it offers rapid convergence when supplied with a sufficiently accurate initial estimate. Recent research has therefore focused on generating high-quality initial values to reduce iteration count and latency [43].
Polynomial-based approaches provide an alternative to iterative or digit-wise computation by replacing them with direct polynomial evaluation [12,32]. Using Horner’s rule [44] or related schemes, polynomial evaluation can be mapped efficiently onto simple combinational circuits or low-latency pipelines [14,18]. Recent studies show that polynomial approximations can substantially reduce latency and hardware complexity in floating-point units and application-specific accelerators [45,46], making Taylor-series-based methods particularly attractive for embedded processors and hardware accelerators.
In recent years, polynomial-approximation-based floating-point algorithms have gained significant attention due to their favorable balance among accuracy, latency, and hardware efficiency in both general-purpose and application-specific processors [42,47,48,49]. These works emphasize optimizing trade-offs among arithmetic operations (additions, subtractions, multiplications) and LUT size, which is crucial for minimizing area and power consumption while maintaining adequate numerical accuracy in modern floating-point units [50,51]. Such algorithms aim to deliver compact, low-power solutions suitable for embedded and portable systems with limited hardware resources, while also scaling effectively to high-performance CPUs and GPUs that rely on large numbers of floating-point operators and high memory bandwidth [51].
Chebyshev polynomial (min–max) approximations can achieve a given accuracy with fewer mantissa segments and lower polynomial order than Taylor expansions [52]. However, their coefficient design is more involved; for example, any change in the number of interval partitions requires recomputing the coefficients.
Table 1 summarizes the characteristics and trade-offs of the major floating-point computation methods discussed above.

3. Taylor Series Expansion Method in Floating-Point Arithmetic

3.1. Motivation for Taylor-Series-Based Floating-Point Arithmetic

Recent studies have advanced the use of Taylor series expansion in floating-point arithmetic to achieve high precision while reducing hardware resource requirements. When applying Taylor-series-based methods to high-precision floating-point computation, designers must carefully balance several competing factors: computational accuracy (minimizing the number of polynomial terms), hardware efficiency (reducing LUT size), and implementation complexity. This review analyzes these design trade-offs and introduces a divide-and-conquer implementation strategy that enhances the efficiency of Taylor-series-based floating-point approximation.
A direct application of the Taylor series is limited by its convergence behavior, which depends strongly on the input domain [22]. For instance, a Taylor expansion centered at x = 1.0 yields an excellent approximation of ln(x) only when x remains close to the expansion point, and its accuracy deteriorates rapidly as x deviates from that region [23]. To address this issue, domain segmentation and range-reduction techniques have been explored to maintain accuracy across the full input range while keeping computational cost low [53]. Hence, we present a quantitative analysis of these techniques across several floating-point computation algorithms, including the required LUT size, and assess their applicability and limitations.

3.2. Mantissa Region Division Technique

To extend the applicability of the Taylor series to floating-point arithmetic, mantissa region division methods are described in this review. This technique divides the mantissa domain into smaller subregions where the approximation error remains within acceptable bounds using low-order polynomials.

3.3. Balancing LUT Size and Arithmetic Complexity

A key advantage of Taylor-series-based methods is their potential to reduce reliance on large LUTs. Traditional LUT-based methods for transcendental functions often consume significant memory, particularly when high precision is required across a broad domain. By using domain segmentation and low-order polynomials, Taylor-series-based designs can achieve similar precision with smaller LUTs that only store coefficients or domain-specific parameters. Also note that the control circuitry for LUT addressing is highly straightforward and requires only minimal logic.
At the same time, designers must carefully balance LUT size with the number of arithmetic operations required. A lower-order Taylor polynomial reduces multiplications but may require more domain partitions (and thus more LUT entries). Conversely, a higher-order polynomial reduces the need for domain segmentation but increases the number of arithmetic operations. Exploring these trade-offs is essential for achieving optimal designs, especially in power- and area-constrained environments.
Among these strategies, Taylor series expansion has emerged as a promising approach for approximating complex mathematical functions in floating-point computation [12,13]. At the heart of local function approximation lies the Taylor series, which constructs an infinite polynomial whose coefficients are determined by the function’s derivatives at a chosen expansion point [14]. By truncating the series after a limited number of terms, designers can approximate functions with a controlled balance between accuracy and computational effort [15,16]. The elegance of the Taylor series lies in its polynomial form, enabling efficient realization using elementary operations such as multiplication and addition, which are well supported in hardware platforms such as CPUs and GPUs [17,18,19].
Despite the apparent simplicity of Taylor-series-based approximations, applying them directly to floating-point functions presents several challenges. The convergence of a Taylor series can vary significantly across the domain of the target function [20,21]. For instance, a Taylor series approximation of the natural logarithm, ln(x), converges quickly near the expansion point but poorly at values further away [22,23]. To address this, methods such as interpolation based on Taylor series approximation can be used to divide the input range into multiple intervals [24]. Additionally, exponent decomposition can be applied, so that the integer exponent part is handled separately in floating-point arithmetic [25]. These techniques transform or partition the input domain into smaller subregions where the Taylor expansion achieves faster convergence with fewer terms. For LUT optimization, some researchers employ deep neural networks and CORDIC-based integer arithmetic methods, which reduce the number of required arithmetic operations without multiplication and minimize the size of auxiliary LUTs needed for storing coefficients or intermediate values [26,27].

3.4. Comparative Analysis and Hardware Trade-Offs

The reviewed algorithms demonstrate that Taylor series expansion, when combined with smart domain partitioning and LUT optimization, can achieve high precision with reduced hardware complexity. The primary trade-offs include the following:
  • LUT size vs. arithmetic operations: Larger LUTs reduce polynomial order but increase memory usage.
  • Number of segments vs. approximation error: Finer segmentation improves accuracy but requires more comparators and control logic.
  • Multiplier complexity vs. latency: Higher-order polynomials need more multipliers but may reduce iteration counts.
These insights are crucial for hardware designers aiming to implement energy-efficient floating-point units in ASICs or FPGAs.
In addition, we discuss the consolidation of this method across the various types of floating-point arithmetic.

4. Preparation for Taylor Series Expansion with Mantissa Region Division

4.1. Taylor Series Expansion

Consider a function f ( x ) that is infinitely differentiable (smooth) on its domain. Its Taylor series expansion about the point x = a is given by the following:
f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)^2 + ⋯ + (f^(n)(a)/n!)(x − a)^n + ⋯
A classic illustration is the Taylor series expansions of sin(x) and cos(x) centered at a = 0, which converge to their respective functions for all x (i.e., −∞ < x < ∞) as n increases (Figure 1).
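This convergence behavior is easy to check numerically. The following Python sketch (illustrative only, not code from the reviewed papers) sums the truncated Taylor series of sin(x) about a = 0 and shows the error shrinking as more terms are kept:

```python
import math

def sin_taylor(x, n_terms):
    """Truncated Taylor series of sin(x) about a = 0:
    sin(x) ~ sum_{k=0}^{n_terms-1} (-1)^k x^(2k+1) / (2k+1)!"""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

# The approximation improves as n increases, even away from the expansion point.
for n in (3, 6, 12):
    approx = sin_taylor(4.0, n)
    print(n, approx, abs(approx - math.sin(4.0)))
```

For x = 4.0, three terms leave an error larger than 1, while twelve terms reach roughly double-precision rounding level, consistent with the all-x convergence noted above.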
In this paper, we review digital floating-point arithmetic algorithms used to compute division, inverse square roots, square roots, exponential functions, and logarithms, based on Taylor series expansion.

4.2. Representation of Floating-Point Numbers

Consider the binary representation of floating-point numbers.
Mantissa: M
Exponent: E
Binary representation: (−1)^S × M × 2^E
Mantissa M = 1.αβγ⋯ (here, each of α, β, γ, ⋯ is 0 or 1)
Note that the binary point is placed such that 1 ≤ M < 2. Following the IEEE-754 specification (Figure 2), the encoding of a floating-point number involves the assignment of three distinct binary segments: a single sign bit ( S ), a biased exponent ( E ), and a normalized mantissa ( M ).
The IEEE 754 standard provides multiple precision formats—including 16-bit (half), 32-bit (single), and 64-bit (double) [1]. To simplify the forthcoming derivation, we limit it to floating-point numbers with a positive sign bit. In this context, a given number X is primarily characterized by two fields: the exponent E and the mantissa M . Their fundamental relationship is captured by the following equation:
X = M × 2^E
In this context, the floating-point mantissa is expressed as M = 1.αβγ⋯, where α, β, γ, etc., are binary digits (each either 0 or 1).
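As a concrete illustration of this encoding, the short Python sketch below (using the standard `struct` module; names are illustrative and this is not part of the reviewed hardware designs) unpacks a single-precision value into its sign S, unbiased exponent E, and normalized mantissa M:

```python
import struct

def decompose_float32(value):
    """Split an IEEE-754 single-precision number into (S, E, M),
    where value = (-1)^S * M * 2^E and 1 <= M < 2 for normalized inputs."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    exponent = biased_exp - 127          # remove the IEEE-754 single-precision bias
    mantissa = 1.0 + frac / 2 ** 23      # restore the implicit leading 1: M = 1.abc...
    return sign, exponent, mantissa

s, e, m = decompose_float32(6.5)   # 6.5 = 1.625 * 2^2
print(s, e, m)                     # 0 2 1.625
```

The upper fraction bits of M are exactly the bits α, β, γ, … used later for sub-interval selection.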

4.3. Mantissa Division Method for Taylor Series Expansion

Building on the floating-point representation outlined earlier, this subsection details our uniform mantissa segmentation technique for efficient Taylor series expansion. We demonstrate that increasing segmentation reduces the required number of series terms n for a target accuracy.
The efficacy of the proposed segmentation technique was evaluated through numerical simulations designed to ascertain the necessary Taylor series expansion terms n for f ( x ) across a spectrum of precision requirements and evaluation domains, while adhering to the specified conditions:
|t(n) − f(x)| ≤ p
Here, f(x) represents the ideal value of functions such as the reciprocal, inverse square root, square root, exponentiation, or logarithm; t(n) denotes the Taylor expansion approximation using n terms, and p indicates the target accuracy for all x within the specified region.
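The simulation procedure just described can be sketched as follows (Python; the function names, sampling density, and the choice of f(x) = 1/x as the worked example are illustrative assumptions, not the exact code behind the paper's tables):

```python
def min_terms(f, taylor_term, a, lo, hi, p, samples=1000, n_max=64):
    """Smallest n such that |t(n) - f(x)| <= p for all sampled x in [lo, hi),
    where t(n) sums the first n Taylor terms about x = a."""
    xs = [lo + (hi - lo) * i / samples for i in range(samples)]
    for n in range(1, n_max + 1):
        if all(abs(sum(taylor_term(k, a, x) for k in range(n)) - f(x)) <= p
               for x in xs):
            return n
    return None

# Example: f(x) = 1/x, whose k-th Taylor term about a is (-1)^k (x - a)^k / a^(k+1).
term = lambda k, a, x: (-1) ** k * (x - a) ** k / a ** (k + 1)
print(min_terms(lambda x: 1 / x, term, 1.5, 1.0, 2.0, 2 ** -10))
```

Running the same search on a narrower region (e.g., [1, 1.5) about a = 1.25) returns a smaller n, which is exactly the effect the mantissa-region division exploits.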

4.3.1. No Mantissa Region Division for Taylor Series Expansion

There is no division in the region 1 ≤ x < 2. The central value a is defined as 1.5, as shown in Table 2.

4.3.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

The mantissa domain 1 ≤ x < 2 is uniformly segmented into two sub-intervals. Based on the value of M , a specific sub-interval is selected (as shown in Table 3), and the function is approximated via a Taylor series expansion about its midpoint a.

4.3.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

The mantissa domain 1 ≤ x < 2 is uniformly segmented into four sub-intervals. Based on the value of M , a specific sub-interval is selected (as shown in Table 4), and the function is approximated via a Taylor series expansion about its midpoint a.

4.3.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2

The mantissa domain 1 ≤ x < 2 is uniformly segmented into eight sub-intervals. Based on the value of M , a specific sub-interval is selected (as shown in Table 5), and the function is approximated via a Taylor series expansion about its midpoint a.
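The uniform segmentations of Tables 2 through 5 can be generated mechanically. The small Python helper below (names illustrative) lists each sub-interval of [1, 2) together with its midpoint a:

```python
def segment_midpoints(num_segments):
    """Uniformly divide the mantissa region [1, 2) and return each
    sub-interval as a tuple (lower, upper, midpoint a)."""
    width = 1.0 / num_segments
    return [(1.0 + i * width,
             1.0 + (i + 1) * width,
             1.0 + (i + 0.5) * width) for i in range(num_segments)]

# One segment reproduces the undivided case with central value a = 1.5.
for n in (1, 2, 4, 8):
    print(n, segment_midpoints(n))
```

Because the widths are powers of two, every boundary and midpoint is exactly representable in binary, which keeps sub-interval selection a simple inspection of the leading mantissa bits.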
In the subsequent sections, we apply the above segmentation method to division, inverse square root, square root, exponential, and logarithmic algorithms using Taylor series expansion, in order to determine the minimum number of expansion terms required to achieve a given precision. Moreover, we emphasize the trade-offs involved in the number of multiplications and additions/subtractions, as well as the size of the LUTs required in hardware. These factors are analyzed to identify the most suitable algorithms for engineering applications and to support their practical implementation.

5. Division Algorithm Using Taylor Series Expansion

5.1. Introduction

This section introduces floating-point division algorithms using the Taylor series expansion with uniform mantissa division [28].
Typical applications of floating-point division in digital processors are as follows: (i) Computer Graphics (Rendering [54]): Division is used in pixel color calculations and light reflection models, where ratios and normalization require it. (ii) Digital Signal Processing (DSP): Audio and image filtering often involve normalization coefficients or gain adjustments that rely on division. (iii) Machine Learning and AI: Neural network training uses division for learning rate adjustments and normalization, especially in batch normalization and probability distribution calculations. (iv) Physics Simulations (Games and Scientific Computing): Fundamental formulas like velocity = distance ÷ time or density = mass ÷ volume depend on division. (v) Financial Calculations: Ratios such as exchange rates or interest rates require division. Example: profit margin = profit ÷ investment. (vi) Image Processing (Computer Vision): Histogram normalization and brightness correction often divide pixel values by maximum or average values. (vii) Numerical Analysis in Engineering: Division is essential in linear algebra (matrix inversion, Gaussian elimination) and in simulations like finite element methods or fluid dynamics.
In short, floating-point division is indispensable whenever ratios, normalization, or distribution calculations are needed—spanning graphics, AI, physics, finance, vision, and engineering.

5.2. Problem Formulation

We define a division algorithm to compute A = N/D, where A, N, and D are binary floating-point numbers with the following representations:
A = M_A × 2^(E_A)
N = M_N × 2^(E_N)
D = M_D × 2^(E_D)
This design decomposes the division into a reciprocal evaluation followed by a standard multiplication, thereby leveraging the efficiency of optimized digital multipliers to complete the operation A = N × ( 1 / D ) .
Since 1/D = (1/M_D) × 2^(−E_D), calculating the reciprocal of the mantissa (1/M_D) is the key step in obtaining 1/D. Therefore, to realize the division algorithm outlined above, the key subproblem of computing 1/M_D is addressed by our algorithm, which employs a uniform mantissa division technique to enable an efficient and accurate Taylor series approximation of f(x) = 1/x to a specified precision.
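The decomposition A = N × (1/D) separates cleanly into mantissa and exponent operations. The sketch below (Python; the `recip` callback is a placeholder for the mantissa-reciprocal step developed in the next subsection, and all names are illustrative) shows the surrounding bookkeeping, including the final renormalization:

```python
def fp_divide(mn, en, md, ed, recip):
    """Compute N/D for N = mn * 2^en and D = md * 2^ed (1 <= mantissas < 2)
    as N * (1/D), where recip(md) approximates the divisor-mantissa reciprocal."""
    m = mn * recip(md)      # mantissa quotient, lies in (0.5, 2)
    e = en - ed             # exponent of 1/D is -ed
    if m < 1.0:             # renormalize so the result mantissa is in [1, 2)
        m *= 2.0
        e -= 1
    return m, e

# Using an exact reciprocal as a stand-in: 12 / 2.4 = 5 = 1.25 * 2^2
m, e = fp_divide(1.5, 3, 1.2, 1, lambda x: 1.0 / x)
print(m, e)
```

Only the `recip` step involves approximation; the exponent handling and renormalization are exact.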

5.3. Reciprocal Calculation by Taylor Series Expansion

Let us consider using the Taylor series expansion of f(x) = 1/x centered at x = a (1 ≤ a < 2) to compute 1/M_D (1 ≤ M_D < 2) and determine the minimum number of expansion terms required to achieve the desired accuracy. The Taylor expansion of f(x) = 1/x around x = a is given as follows:
f(x) = 1/a − (x − a)/a^2 + (x − a)^2/a^3 − (x − a)^3/a^4 + (x − a)^4/a^5 − (x − a)^5/a^6 + (x − a)^6/a^7 − ⋯ + (−1)^n (x − a)^n/a^(n+1) + ⋯ = (1/a)[1 − y + y^2 − y^3 + y^4 − y^5 + y^6 − ⋯]
Here, y = (x − a)/a = (1/a)x − 1.
It is noteworthy that all coefficients for y in the expansion are either +1 or −1. This structure simplifies the arithmetic logic by replacing multiplication with straightforward sign manipulation, thereby enhancing computational efficiency.
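A compact software model of this expansion (Python; the segment and term counts are parameters, and all names are illustrative) selects the sub-interval from the mantissa, takes its midpoint a, and evaluates the ±1-coefficient polynomial in Horner form:

```python
def reciprocal_mantissa(m, num_segments=4, n_terms=5):
    """Approximate 1/m for 1 <= m < 2 using the Taylor series of 1/x
    about the midpoint a of the sub-interval containing m:
    1/x = (1/a)(1 - y + y^2 - y^3 + ...), with y = x/a - 1."""
    assert 1.0 <= m < 2.0
    width = 1.0 / num_segments
    idx = int((m - 1.0) / width)       # sub-interval index read off the mantissa
    a = 1.0 + (idx + 0.5) * width      # expansion midpoint (1/a would sit in a LUT)
    y = m / a - 1.0
    acc = 0.0
    for _ in range(n_terms):           # Horner form; all coefficients are +/-1
        acc = acc * (-y) + 1.0
    return acc / a

print(abs(reciprocal_mantissa(1.3) - 1 / 1.3))
```

Because every polynomial coefficient is ±1, the Horner recurrence needs no coefficient multiplications at all, mirroring the sign-manipulation advantage noted above.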

5.4. Numerical Simulation Results

In this subsection, numerical simulations are employed to determine the minimum number of terms n required in the Taylor series expansion of f(x) = 1/x to reach various given accuracies over each region, with the expansion performed at the center point a of the selected region.

5.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)

Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 6.

5.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain 1 ≤ x < 2 into two sub-intervals. First, a specific sub-interval is selected using M_D (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 7.

5.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we propose a method based on dividing the domain 1 ≤ x < 2 into four sub-intervals. First, a specific sub-interval is selected using M_D (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 8.

5.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we examine a method based on dividing the domain 1 ≤ x < 2 into eight sub-intervals. First, a specific sub-interval is selected using M_D (see Table 5). The efficacy of this strategy is confirmed by the simulation data presented in Table 9.

5.5. Considerations for Hardware Implementation

The complexity is quantified by determining the total count and breakdown of fundamental arithmetic operations (such as multiplications, additions, subtractions, and comparisons) required for each Taylor series expansion computation.

5.5.1. Required Arithmetic Operations for Taylor Series Evaluation

Case Study 1: Taylor Series Expansion with n = 5 Terms. Here, an expansion with n = 5 terms is used to demonstrate the trade-offs of the algorithm.
We will compute the following expression using a combined approach of the basic algorithm and a LUT.
f_5 = 1/a − (x − a)/a^2 + (x − a)^2/a^3 − (x − a)^3/a^4 + (x − a)^4/a^5
For constant a and variable x, let the LUT store the precomputed value 1/a. Introducing the auxiliary variables y = (1/a)x − 1 and z = y^2, and substituting them into Equation (8) yields:
f_5 = (1/a)[1 − y + y^2 − y^3 + y^4] = (1/a)[(1 − y) + z((1 − y) + z)]
Equation (9) implies that a five-term Taylor series expansion of 1/x necessitates exactly four multiplications and four addition/subtraction operations.
Case Study 2: Taylor Series Expansion with n = 7 Terms. Here, an expansion with n = 7 terms is used to demonstrate the trade-offs of the algorithm.
By applying the same operational steps detailed in Case Study 1, we derive Equations (10) and (11):
f_7 = 1/a − (x − a)/a^2 + (x − a)^2/a^3 − (x − a)^3/a^4 + (x − a)^4/a^5 − (x − a)^5/a^6 + (x − a)^6/a^7
f_7 = (1/a)[1 − y + y^2 − y^3 + y^4 − y^5 + y^6] = (1/a)[(1 − y) + z((1 − y) + z((1 − y) + z))]
Equation (11) implies that a seven-term Taylor series expansion of 1/x necessitates exactly five multiplications and five addition/subtraction operations. Applying the methods from Case Studies 1 and 2 to compute 1/x with different numbers of Taylor series terms yields Table 10.
A comparison of Table 9 and Table 10 indicates that, to achieve a 20-bit precision, the required operation counts are four multiplications and four additions/subtractions.
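The two factorizations can be checked in software. The Python sketch below (illustrative; `inv_a` plays the role of the LUT entry 1/a) evaluates the five- and seven-term expansions with exactly the operation counts stated above:

```python
def recip_taylor5(x, inv_a):
    """Five-term expansion of 1/x about a, factored so that exactly four
    multiplications and four additions/subtractions are performed."""
    y = inv_a * x - 1.0            # mult 1, sub 1
    z = y * y                      # mult 2
    w = 1.0 - y                    # sub 2
    inner = w + z * (w + z)        # add 3, mult 3, add 4
    return inv_a * inner           # mult 4

def recip_taylor7(x, inv_a):
    """Seven-term expansion: five multiplications, five additions/subtractions."""
    y = inv_a * x - 1.0
    z = y * y
    w = 1.0 - y
    inner = w + z * (w + z * (w + z))
    return inv_a * inner

print(abs(recip_taylor7(1.4, 1 / 1.5) - 1 / 1.4))
```

Expanding the nested form confirms it reproduces 1 − y + y² − y³ + y⁴ (and the degree-6 analogue) term by term, since y² = z, y³ = yz, y⁴ = z², y⁵ = yz², and y⁶ = z³.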

5.5.2. LUT Contents and Size

The value of 1/a is precomputed and stored in memory for later use during the Taylor series calculation. A four-segment division necessitates a compact four-word LUT (Table 11). During operation, the address bits αβ derived from M_D (where M_D = 1.αβ⋯) are used to index and fetch the corresponding precomputed value. It should be noted that LUT addressing is straightforward, since an address decoder is not needed.
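In a software model (Python, names illustrative), the LUT and its decoder-free addressing reduce to a list indexed directly by the top mantissa fraction bits:

```python
def build_inv_a_lut(num_segments=4):
    """Precompute 1/a for each sub-interval midpoint a of [1, 2);
    the LUT is indexed directly by the top mantissa bits."""
    width = 1.0 / num_segments
    return [1.0 / (1.0 + (i + 0.5) * width) for i in range(num_segments)]

def lut_index(mantissa, bits=2):
    """Extract the top `bits` fraction bits (alpha, beta, ...) of
    M = 1.xxxx... and use them as the LUT address."""
    return int((mantissa - 1.0) * (1 << bits))

lut = build_inv_a_lut()
m = 1.40625                       # binary 1.01101, so address bits are 01 -> 1
print(lut_index(m), lut[lut_index(m)])
```

In hardware, `lut_index` is not an arithmetic step at all: the address is simply the wiring of the upper mantissa bits to the LUT, which is why no decoder logic is required.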

5.6. Summary of Division Algorithm Part

This section has described a floating-point division algorithm that utilizes the Taylor series expansion of f ( x ) = 1 / x and is further enhanced by a mantissa division technique. The design investigates critical hardware trade-offs: achieving higher division accuracy while reducing the count of arithmetic operations (multiplications and additions/subtractions) inevitably demands a larger LUT, and vice versa. Numerical calculation results demonstrate that refining the granularity of mantissa division reduces the required number of arithmetic operations at the expense of increased LUT memory overhead. This offers a tunable design parameter to meet diverse precision and resource requirements in digital division applications. Also, note the following:
(i) The coefficients of the Taylor series expansion of f(x) = 1/x are +1 or −1, which reduces the number of multiplications.
(ii) The LUT-address control logic is straightforward, as it directly maps the upper mantissa bits to the address.

6. Inverse Square Root Algorithm Using Taylor Series Expansion

6.1. Introduction

This section introduces floating-point inverse square root algorithms using the Taylor series expansion with mantissa region division [29].
Typical applications of floating-point inverse square root are as follows: (i) Computer Graphics and 3D Rendering: It is used for vector normalization (e.g., calculating unit vectors for lighting, shading, and camera transformations). (ii) Physics Simulations in Games: It is essential for computing distances and normalizing velocity vectors quickly, improving real-time performance. (iii) Machine Learning and AI: It appears in optimization algorithms and normalization steps where inverse square roots stabilize variance scaling. (iv) Robotics and Control Systems: It is used in orientation calculations (e.g., quaternion normalization [55]) for stable motion control. (v) DSP: It is applied in Root Mean Square (RMS) normalization and energy scaling, where inverse square roots help adjust signal magnitudes efficiently. (vi) Computer Vision and Image Processing: It is used in feature extraction and normalization of gradient vectors (e.g., in edge detection or Scale-Invariant Feature Transform (SIFT) descriptor [56]). (vii) Scientific and Engineering Computations: It appears in numerical methods requiring normalized vectors, such as finite element analysis, fluid dynamics, and electromagnetics.
In short, the inverse square root is a performance-critical shortcut for normalization tasks across graphics, physics, AI, robotics, DSP, vision, and engineering.

6.2. Representation and Computation of Floating-Point Inverse Square Roots

Building upon the preceding division framework, we extend the methodology to calculate the inverse square root I_sq of a binary floating-point number X:
I_sq = 1/√X = (1/√M) × (1/√(2^E))
To handle the square root of the exponent term in 1/√X, we must consider the parity of the exponent E. For the even case (E = 2k), the derivation simplifies to the following form:
I_sq = 1/√X = (1/√M) × (1/√(2^E))
1/√X = (1/√M) × 2^(−k)
For the even-exponent case (E = 2k), this process isolates the exponent factor 2^(−k) and the mantissa part 1/√M. Applying floating-point normalization to the mantissa then yields the following form:
1/√M = M₁ × 2^(k₁)
Within the domain 1 ≤ M < 2, we have 1/√2 < 1/√M ≤ 1 and thus M₁ = 1/√M with k₁ = 0, yielding the following form:
I_sq = 1/√X = M₁ × 2^(−k)
In this representation, M₁ denotes the mantissa, and −k denotes the exponent of the inverse square root result I_sq.
For the odd case ( E = 2 k + 1 ), a similar derivation yields the following equation:
1/√(2^E) = (1/√2) × 2^(−k)
Normalizing the expression (1/√2) × (1/√M) to a standard floating-point format results in the following equation:
(1/√2) × (1/√M) = M₂ × 2^(k₂)
Given 1 ≤ M < 2, we have 1/2 < (1/√2) × (1/√M) ≤ 1/√2. This implies 1/2 < M₂ ≤ 1/√2 and k₂ = 0, so we obtain the following equation:
I_sq = 1/√X = M₂ × 2^(−k)
In this representation, M₂ denotes the mantissa, and −k denotes the exponent of the inverse square root result I_sq.
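As a concrete illustration of the even/odd-exponent cases above, the decomposition can be sketched in a few lines of Python. The function name is ours, and math.sqrt stands in for the Taylor-based evaluation of the mantissa part described in the next subsections:

```python
import math

def isqrt_decompose(M, E):
    """Split I_sq = 1/sqrt(M * 2^E) into a mantissa part and a power of two,
    following the even/odd-exponent derivation above (1 <= M < 2)."""
    k = E // 2                               # E = 2k (even) or E = 2k + 1 (odd)
    if E % 2 == 0:
        M_out = 1.0 / math.sqrt(M)           # M1, in (1/sqrt(2), 1]
    else:
        M_out = 1.0 / math.sqrt(2.0 * M)     # M2, in (1/2, 1/sqrt(2)]
    return M_out, -k                         # I_sq = M_out * 2^(-k)

# X = 1.5 * 2^7 = 192
M_out, e = isqrt_decompose(1.5, 7)
assert abs(M_out * 2.0 ** e - 1.0 / math.sqrt(192.0)) < 1e-12
```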

6.3. Taylor Series Expansion of Inverse Square Root

The Taylor series expansion of f ( x ) = 1 / x about the point x = a is given by the following equation:
f(x) = (1/√a) × [1 − (x − a)/(2a) + 3(x − a)²/(8a²) − 5(x − a)³/(16a³) + 35(x − a)⁴/(128a⁴) − …]

6.4. Numerical Simulation Results

We now discuss how to achieve a specified precision with the minimal number of Taylor series expansion terms. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.
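This search can be sketched as a small numerical experiment. The following is a minimal illustration under our own naming, which checks the worst-case error on a uniform grid over the region (rather than exhaustively over all mantissa bit patterns):

```python
def taylor_inv_sqrt(x, a, n):
    """n-term Taylor polynomial of f(x) = 1/sqrt(x) about the center a."""
    term = a ** -0.5                         # first term, 1/sqrt(a)
    total = term
    for i in range(1, n):
        # successive-term ratio for x^(-1/2): -(2i - 1) / (2i) * (x - a) / a
        term *= -(2 * i - 1) / (2.0 * i) * (x - a) / a
        total += term
    return total

def min_terms(lo, hi, target, max_n=40):
    """Smallest n whose worst-case error on [lo, hi) is below target."""
    a = 0.5 * (lo + hi)                      # expand about the region center
    xs = [lo + (hi - lo) * i / 1000.0 for i in range(1000)]
    for n in range(1, max_n):
        if max(abs(taylor_inv_sqrt(x, a, n) - x ** -0.5) for x in xs) < target:
            return n
    return None
```

Narrowing the region (for example from [1, 2) to [1, 1.5)) reduces the returned n, which is exactly the effect exploited by the mantissa division technique in the following subsections.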

6.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)

Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 12.

6.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into two sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 13.

6.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into four sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 14.

6.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into eight sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 5). The efficacy of this strategy is confirmed by the simulation data presented in Table 15.

6.5. Considerations for Hardware Implementation

This section analyzes the hardware implementation complexity of the inverse square root algorithm I_sq = 1/√X = (1/√M) × (1/√(2^E)). The analysis quantifies the arithmetic cost (specifically the counts of multiplications, additions, and subtractions) across different implementation scenarios based on Taylor series approximations.
As a concrete illustration, consider the five-term Taylor series expansion f₅(x) of f(x) = 1/√x about a chosen center point a. The resulting expression and its associated operation count are as follows:
(1)
For the even case ( E = 2 k ), the derivation simplifies to the following form:
Let I_sq = M₁ × 2^(−k) and M₁ = f₅(M).
f₅(x) = (1/√a) × [1 − (x − a)/(2a) + 3(x − a)²/(8a²) − 5(x − a)³/(16a³) + 35(x − a)⁴/(128a⁴)] = α₀ − α₁(x − a) + α₂(x − a)² − α₃(x − a)³ + α₄(x − a)⁴
where α₀ = 1/√a, α₁ = 1/(2√(a³)), α₂ = 3/(8√(a⁵)), α₃ = 5/(16√(a⁷)), α₄ = 35/(128√(a⁹)).
(2)
For the odd case ( E = 2 k + 1 ), the derivation simplifies to the following form:
Let I_sq = M₂ × 2^(−k) and M₂ = (1/√2) × f₅(M) = g₅(M):
g₅(x) = (1/√2) × f₅(x) = β₀ − β₁(x − a) + β₂(x − a)² − β₃(x − a)³ + β₄(x − a)⁴
Equation (22) uses the coefficients βₖ = αₖ/√2 for k = 0, 1, 2, 3, 4, where x is a variable and a is a constant. All necessary coefficients α₀, …, α₄ and β₀, …, β₄ are precomputed and stored in a LUT, from which they are fetched during runtime computation.
Subsequently, let y = x − a and z = y²; we obtain the following equation:
f₅(x) = α₀ − α₁y + z(α₂ − α₃y + α₄z)
Equation (23) implies that a five-term Taylor series expansion of 1/√x requires exactly five multiplications and five addition/subtraction operations. Applying the same methodology to compute 1/√x with different numbers of Taylor series terms yields Table 16. Table 15 and Table 16 show that by dividing the mantissa interval into eight regions, the inverse square root of the mantissa can be computed with 24-bit precision using only five multiplications and five additions/subtractions.
The required LUT size scales linearly with the number of regions and is given by N × 10 words. For a mantissa M = 1.αβ…, the two most significant fraction bits (αβ) are used directly to address the corresponding LUT entry. As an example, Table 17 illustrates the configuration for N = 4 regions, which requires a LUT of 40 words. Note that the addressing scheme is straightforward and does not require a dedicated address decoder.
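Combining the coefficient LUT, the direct addressing by the upper mantissa bits, and the factored polynomial, a behavioral model of the whole scheme can be sketched as follows for N = 4 regions. Names and the region-center choice are our own, and math.sqrt appears only in the offline precomputation of the LUT constants:

```python
import math

N = 4  # number of mantissa regions on [1, 2)

def region_coeffs(a):
    """alpha_0..alpha_4 of the five-term expansion of 1/sqrt(x) about a,
    and beta_k = alpha_k / sqrt(2) for the odd-exponent case."""
    alphas = (a ** -0.5,
              1.0 / (2.0 * a ** 1.5),
              3.0 / (8.0 * a ** 2.5),
              5.0 / (16.0 * a ** 3.5),
              35.0 / (128.0 * a ** 4.5))
    betas = tuple(c / math.sqrt(2.0) for c in alphas)
    return alphas, betas

CENTERS = [1.0 + (i + 0.5) / N for i in range(N)]   # region midpoints
LUT = [region_coeffs(a) for a in CENTERS]           # N x 10 stored words

def inv_sqrt(M, E):
    """Approximate 1/sqrt(M * 2^E) with the region-divided expansion."""
    idx = int((M - 1.0) * N)        # upper mantissa bits act as the address
    alphas, betas = LUT[idx]
    c0, c1, c2, c3, c4 = alphas if E % 2 == 0 else betas
    y = M - CENTERS[idx]
    z = y * y
    mantissa = c0 - c1 * y + z * (c2 - c3 * y + c4 * z)   # factored f5 / g5
    return mantissa * 2.0 ** (-(E // 2))
```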

6.6. Summary of Inverse Square Root Algorithm Part

This section has reviewed inverse square root algorithms utilizing region-segmented Taylor series expansions, quantifying the implementation trade-offs regarding accuracy, arithmetic complexity, and memory footprint. This framework thereby offers adaptable support for meeting varied precision and resource constraints. Thus, the insights presented herein equip designers to conceptualize and implement specialized hardware for efficient inverse square root calculation.

7. Square Root Algorithm Using Taylor Series Expansion

7.1. Introduction

This section introduces floating-point square root algorithms using Taylor series expansion with uniform mantissa division [30].
Typical applications of floating-point square root operations are as follows: (i) Computer Graphics and 3D Rendering: Square roots are used in vector normalization (e.g., calculating unit vectors for lighting and shading). (ii) DSP: In audio and image processing, square roots appear in RMS calculations for measuring signal strength. (iii) Machine Learning and AI: Algorithms like gradient descent often use square roots in optimization methods (e.g., Root Mean Square Propagation (RMSProp), Adam optimizers [57]). (iv) Physics Simulations and Games: Square roots are needed for distance calculations (e.g., Euclidean distance in 3D space) and solving equations of motion. (v) Statistics and Data Analysis: Standard deviation and variance calculations require square roots to measure data spread. (vi) Engineering and Scientific Computing: Square roots are essential in solving quadratic equations, wave equations, and numerical methods in structural or fluid analysis. (vii) Cryptography and Security: Certain algorithms (e.g., modular arithmetic, elliptic curve cryptography [58]) involve square root operations in finite fields.
In short, square root operations are indispensable in geometry, optimization, signal measurement, statistical analysis, and cryptography—making them one of the most widely used floating-point functions in digital processors.

7.2. Problem Formulation

A floating-point number is typically composed of a mantissa, a sign bit, and an exponent. In this section, we focus on cases where the mantissa is positive. For clarity, we denote the binary mantissa field, the exponent field, and the overall binary floating-point representation as M, E, and X, respectively, as follows:
X = M × 2^E
In Equation (24), the mantissa field is represented as M = 1.αβγ…, where each of α, β, γ, … is either 0 or 1. For a normalized binary floating-point representation, the mantissa satisfies 1 ≤ M < 2.
The square root computation in binary floating-point arithmetic can be formulated as follows:
S_r = √X = √M × √(2^E)
To handle the square root of the exponent term in √X, we must consider the parity of the exponent E. For the even case (E = 2k), the derivation simplifies to the following form:
√(2^E) = 2^k
S_r = √M × 2^k
This computation yields a square root representation partitioned into an exponent part k and a mantissa part √M. By normalizing √M, we obtain the following expression:
√M = M₁ × 2^(k₁)
Within the domain 1 ≤ M < 2, we have 1 ≤ √M < √2 and thus M₁ = √M with k₁ = 0, yielding the following form:
S_r = M₁ × 2^k
We thus obtain the exponent part k and the mantissa part M₁ of the final expression for S_r. In this representation, M₁ denotes the mantissa, and k denotes the exponent of the square root result S_r.
For the odd case (E = 2k + 1), a similar derivation yields the following expression:
√(2^E) = √2 × 2^k
Let √2 × √M be normalized as a floating-point number, resulting in the following expression:
√2 × √M = M₂ × 2^(k₂)
Since 1 ≤ M < 2, it follows that √2 ≤ √2 × √M < 2, which implies √2 ≤ M₂ < 2 and k₂ = 0. Based on this, we derive the following:
S_r = M₂ × 2^k
In this representation, M₂ denotes the mantissa, and k denotes the exponent of the square root result S_r.

7.3. Taylor Series Expansion of Square Root

The Taylor series expansion of f ( x ) = x about the point x = a is given by the following equation:
f(x) = √a × [1 + (x − a)/(2a) − (x − a)²/(8a²) + (x − a)³/(16a³) − 5(x − a)⁴/(128a⁴) + 7(x − a)⁵/(256a⁵) − …]

7.4. Numerical Simulation Results

In the following, we examine the relationship between the function f(x) = √x, the number of terms used in its Taylor series expansion, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.

7.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)

Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 18.

7.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into two sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 19.

7.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into four sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 20.

7.4.4. Eight Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into eight sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 5). The efficacy of this strategy is confirmed by the simulation data presented in Table 21.

7.5. Considerations for Hardware Implementation

We examine the hardware complexity of the algorithm described here. Consider the equation S_r = √X = √M × √(2^E), and evaluate the number of subtractions, multiplications, and additions required when using the Taylor series expansion. Consider the five-term Taylor series expansion f₅(x) of √x centered at a. Its expression is as follows:
f₅(x) = √a × [1 + (x − a)/(2a) − (x − a)²/(8a²) + (x − a)³/(16a³) − 5(x − a)⁴/(128a⁴)] = √a + α₁(x − a) − α₂(x − a)² + α₃(x − a)³ − α₄(x − a)⁴
where α₁ = √a/(2a), α₂ = √a/(8a²), α₃ = √a/(16a³), α₄ = 5√a/(128a⁴).
We also define the following:
g₅(x) = √2 × f₅(x) = β₀ + β₁(x − a) − β₂(x − a)² + β₃(x − a)³ − β₄(x − a)⁴
Here, β₀ = √(2a), β₁ = √2 × α₁, β₂ = √2 × α₂, β₃ = √2 × α₃, β₄ = √2 × α₄.
Depending on the parity of E, the square root S_r takes one of two forms:
E even: with E = 2k, S_r = f₅(M) × 2^k.
E odd: with E = 2k + 1, S_r = √2 × f₅(M) × 2^k = g₅(M) × 2^k.
In Equation (34), a is a constant and x is a variable. The coefficients √a, α₁, …, α₄ and β₀, …, β₄ are individually stored in the LUT and can be retrieved as needed. We then compute y = x − a and z = y², which yields the following:
f₅(x) = √a + α₁y − z(α₂ − α₃y + α₄z)
The final Equation (36) is derived by algebraic transformation of the five-term Taylor series expansion of f(x) = √x. It consists of five multiplications and five additions/subtractions. Taylor series expansions of √x with varying numbers of terms result in different computational requirements; the corresponding numbers of additions, subtractions, and multiplications are summarized in Table 22.
When the mantissa is divided into four regions, Table 22 and Table 23 show that a square root calculation with 20-bit precision requires five multiplications and five additions/subtractions.
For LUT size estimation, the size can be calculated as N × 8 words (N: the number of partition segments), and the first and second MSBs of M = 1.αβ… are used as address inputs. For example, with a four-region division, a 32-word LUT is sufficient, according to the results calculated in Table 23.
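The f₅/g₅ evaluation and the exponent-parity handling can be sketched together as follows. For brevity this uses a single expansion center a = 1.5 rather than a region LUT; the function name is ours:

```python
import math

def sqrt5(M, E, a=1.5):
    """Five-term Taylor approximation of sqrt(M * 2^E) about the center a,
    with the even/odd-exponent handling described above (1 <= M < 2)."""
    sa = math.sqrt(a)                        # would be read from the LUT
    a1 = sa / (2.0 * a)
    a2 = sa / (8.0 * a ** 2)
    a3 = sa / (16.0 * a ** 3)
    a4 = 5.0 * sa / (128.0 * a ** 4)
    y = M - a
    z = y * y
    f5 = sa + a1 * y - z * (a2 - a3 * y + a4 * z)   # factored form
    if E % 2:                                # odd exponent: g5 = sqrt(2) * f5
        f5 *= math.sqrt(2.0)
    return f5 * 2.0 ** (E // 2)
```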

7.6. Summary of Square Root Algorithm Part

This section has described a square root algorithm for floating-point numbers that synergizes mantissa region segmentation with Taylor series expansion. It also discussed the required LUT size, the number of arithmetic operations (multiplications and additions/subtractions), and the trade-offs involved in achieving arithmetic accuracy. The described method enables designers to construct dedicated hardware architectures tailored to specific square root computation requirements.

8. Exponentiation Algorithm Using Taylor Series Expansion

8.1. Introduction

This section introduces floating-point exponentiation algorithms using Taylor series expansion with uniform mantissa region division [31].
Typical applications of floating-point exponentiation are as follows: (i) Computer Graphics and Animation: Exponentiation is used in lighting models (e.g., specular highlights in the Phong reflection model [59]) and gamma correction for color adjustment. (ii) DSP: Power functions are applied in audio compression, dynamic range control, and modeling nonlinear systems. (iii) Machine Learning and AI: Exponentiation is crucial in activation functions (e.g., softmax uses e^x [57]), probability distributions, and exponential moving averages in optimizers. (iv) Physics and Engineering Simulations: Many physical laws involve powers, such as inverse-square laws (gravity, electromagnetism) or exponential decay in radioactive processes. (v) Finance and Economics: Compound interest and growth models rely on exponentiation. (vi) Statistics and Data Analysis: Exponential functions appear in probability distributions (e.g., the exponential distribution, or the Gaussian distribution with its e^(−x²) factor). (vii) Cryptography and Security: Modular exponentiation is a core operation in RSA and Diffie–Hellman key exchange [60], making exponentiation vital for secure communications. It is also widely used in finance, statistics, and cryptography, making it one of the most computationally intensive but essential floating-point operations in digital processors.

8.2. Problem Formulation

We now turn to the floating-point algorithm for computing the exponential function (Exp), derived from its binary floating-point representation.
Exp = exp(M × 2^E) = (exp(M))^(2^E)
We consider the case where M and E are non-negative; the extension to negative values of M and E is straightforward. For example, 2^E = 1, 2, 4, 8, 16, 32, … for E = 0, 1, 2, 3, 4, 5, …
We observe that once exp(M) is obtained, Exp can be derived by raising it to the integer power 2^E, which requires only E successive squarings (multiplications).
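As a minimal sketch of this step (the function name is ours), raising exp(M) to the power 2^E by E successive squarings looks as follows:

```python
import math

def exp_pow2(exp_M, E):
    """Compute (exp(M))^(2^E) from exp(M) using E successive squarings."""
    r = exp_M
    for _ in range(E):
        r *= r
    return r

# X = 1.25 * 2^3 = 10.0: exp(10) from exp(1.25) with three squarings
assert abs(exp_pow2(math.exp(1.25), 3) - math.exp(10.0)) / math.exp(10.0) < 1e-12
```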

8.3. Taylor Series Expansion of Exponential Function

We compute exp(M) (with 1 ≤ M < 2) via its Taylor series expansion about a chosen center a (where 1 ≤ a < 2). The expansion of f(x) = exp(x) at x = a is as follows:
f(x) = exp(a) × [1 + q + q²/2 + q³/6 + q⁴/24 + q⁵/120 + q⁶/720 + …] = exp(a) × [1 + q(1 + q(1/2 + q(1/6 + q(1/24 + q(1/120 + q/720)))))]
Equation (38) gives the 6th-order Taylor series expansion of the exponential function, where q = x − a.

8.4. Numerical Simulation Results

In the following, we examine the relationship between the function f(x) = exp(x), the number of expansion terms in its Taylor series expansion, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.

8.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)

Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 24.

8.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into two sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 25.

8.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into four sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 26.
Likewise, the interval 1 ≤ x < 2 may be partitioned into finer sub-intervals, for example into 8, 16, or 32 parts.

8.5. Considerations for Hardware Implementation

We assess hardware complexity through the metrics calculated below. For instance, for a Taylor series with n = 5 terms, this assessment yields the specific LUT size and the number of addition, subtraction, and multiplication operations.
f₅ = exp(a) × [1 + (x − a) + (x − a)²/2 + (x − a)³/6 + (x − a)⁴/24]
For constant a and variable x, let the LUT store the precomputed value exp(a). Introducing the auxiliary variables y = x − a and z = y², and substituting them into Equation (39), yields the following:
f₅ = exp(a) × (1 + y + y²/2 + y³/6 + y⁴/24) = exp(a) × [1 + y + (z/2) × (1 + y/3 + z/12)]
The five-term Taylor expansion requires five additions/subtractions and six multiplications, achieving a precision of 1/2²⁰ (see Table 26). This specific case exemplifies the general trend documented in Table 27, which maps the expansion order n to the required counts of arithmetic operations (multiplications, additions, subtractions) for computing f(x) = exp(x).
LUT Construction and Addressing: The LUT is populated with precomputed values of exp(a). As outlined in Table 28, segmenting the mantissa region into four parts requires a corresponding four-word LUT. The same principle can be extended to any number of partitioned regions, enabling an estimate of the total LUT size. Access is direct: for a mantissa M = 1.αβ…, the address bits αβ point to the corresponding entry. Consequently, as the number of partitions grows, the required LUT size increases correspondingly.
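The factored evaluation can be sketched as follows; exp5 is an illustrative name, and math.exp(a) stands in for the precomputed LUT entry:

```python
import math

def exp5(M, a):
    """Five-term Taylor approximation of exp(M) about the center a,
    in the factored form exp(a) * (1 + y + (z/2)(1 + y/3 + z/12))."""
    y = M - a                                 # 1 subtraction
    z = y * y                                 # 1 multiplication
    inner = 1.0 + y / 3.0 + z / 12.0          # 2 multiplications, 2 additions
    return math.exp(a) * (1.0 + y + z / 2.0 * inner)   # 3 mult., 2 add.
```

The operation count matches the text: six multiplications and five additions/subtractions in total.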

8.6. Summary of Exponentiation Algorithm Part

This section analyzes how the convergence range and precision of the exponential function’s Taylor expansion are governed by two key parameters under the mantissa division technique: the number of series terms (which can be controlled by adjusting the number of partitions) and the expansion center value. It further investigates the trade-offs among LUT size, computational precision, and fundamental arithmetic operations, offering design insights that designers can leverage to develop efficient digital algorithms. Notably, this technique concentrates on the mantissa of the domain variable; as the exponent component increases, the number of multiplications required for the overall exponentiation function also grows.

9. Logarithm Algorithm Using Taylor Series Expansion

9.1. Introduction

This section introduces floating-point logarithm algorithms using the Taylor series expansion with uniform mantissa region division [13].
Typical applications of floating-point logarithm operations are as follows: (i) Computer Graphics and Imaging: Logarithms are used in tone mapping [61] and dynamic range compression, helping to represent brightness levels more naturally. (ii) DSP: Logarithmic scales measure sound intensity (decibels) and frequency response, making audio analysis more accurate. (iii) Machine Learning and AI: Logarithms are essential in loss functions (e.g., cross-entropy [57]), probability calculations, and log-likelihood estimation. (iv) Data Compression: Algorithms like JPEG and MP3 use logarithmic transformations to model human perception of sound and light more efficiently. (v) Statistics and Probability: Logarithms are used in statistical models, especially for log-normal distributions and hypothesis testing. (vi) Finance and Economics: Logarithmic returns are widely used to measure investment performance and volatility over time. (vii) Scientific and Engineering Computations: Logarithms appear in exponential growth/decay models, chemical reaction rates, and complexity analysis in algorithms.
In short, logarithm operations are indispensable in graphics, DSP, AI, compression, statistics, finance, and scientific modeling, making them one of the most versatile floating-point functions in digital processors.

9.2. Problem Formulation

We now turn to the floating-point algorithm for computing the base-2 logarithm (log₂ x), derived from its binary floating-point representation.
log₂ x = log₂ M + E
Here, x = M × 2^E.
We observe that once log₂ M is determined, log₂ x can be obtained simply by adding E. Therefore, our focus is on calculating log₂ M.

9.3. Taylor Series Expansion of Mantissa Part for Base-2 Logarithm

We compute log₂ M (with 1 ≤ M < 2) via its Taylor series expansion about a chosen center a (where 1 ≤ a < 2). The expansion of f(x) = log₂ x at x = a is as follows:
f(x) = (1/ln 2) × [ln a + p/a − p²/(2a²) + p³/(3a³) − p⁴/(4a⁴) + …] = (1/ln 2) × [ln a + (p/a)(1 − (p/a)(1/2 − (p/a)(1/3 − (p/a)(1/4 − …))))]
Equation (42) gives the Taylor series expansion of the base-2 logarithm, where p = x − a.

9.4. Numerical Simulation Results

In the following, we examine the relationship between the function f(x) = log₂ x, the expansion center a, the number of expansion terms in its Taylor series, and the resulting approximation accuracy. The Taylor expansion is performed at the center point a of the selected region. Based on numerical simulations, we have determined the minimum number of expansion terms n needed to achieve various accuracy levels.

9.4.1. No Mantissa Division Within the Range of 1 ≤ x < 2 (Table 2)

Based on numerical simulations, the minimum required expansion order n to achieve various accuracy levels is summarized in Table 29.

9.4.2. Two Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into two sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 3). The efficacy of this strategy is confirmed by the simulation data presented in Table 30.

9.4.3. Four Sub-Intervals of the Mantissa Region 1 ≤ x < 2

To reduce the number of terms required in the Taylor series expansion, we divide the domain 1 ≤ x < 2 into four sub-intervals. First, a specific region is selected using the upper bits of the mantissa M (see Table 4). The efficacy of this strategy is confirmed by the simulation data presented in Table 31.

9.5. Considerations for Hardware Implementation

We assess hardware complexity through the metrics calculated below. For instance, for a Taylor series with n = 5 terms, this assessment yields the specific LUT size and the number of multiplication, addition, and subtraction operations.
f₅(x) = (1/ln 2) × [ln a + p/a − p²/(2a²) + p³/(3a³) − p⁴/(4a⁴)] = α₀ + α₁p − α₂p² + α₃p³ − α₄p⁴
Here, α₀ = ln a / ln 2, α₁ = 1/(a ln 2), α₂ = 1/(2a² ln 2), α₃ = 1/(3a³ ln 2), α₄ = 1/(4a⁴ ln 2).
Here, x and a represent a variable and a constant, respectively. The coefficients α₀, …, α₄ are precomputed and stored in an LUT; they are retrieved at runtime during the calculation. We then compute p = y = x − a, followed by z = y², and proceed as follows:
f₅(x) = α₀ + α₁y − z(α₂ − α₃y + α₄z)
The five-term expansion ( f 5 ) requires five additions/subtractions and five multiplications. This specific case exemplifies the general trend documented in Table 32, which maps the expansion order n to the required counts of arithmetic operations (multiplications, additions, subtractions) for computing f ( x ) = l o g 2 x .
The data in Table 31 and Table 32 demonstrate that a four-region partition (e.g., domains R4-3, R4-4) achieves 22-bit accuracy in computing the mantissa logarithm at a cost of seven multiplications and seven additions/subtractions. This efficiency stems from a scalable architecture where, for an n-term expansion over N regions, the LUT size is given by (n + 1) × N words. Addressing is direct: the two most significant fraction bits αβ of the mantissa M = 1.αβ… serve as the lookup address.
Table 33 instantiates this model for n = 4 and N = 4 , requiring a 20-word LUT. This partitioning strategy, generalizable as shown in Table 33, delivers a key advantage: by confining the approximation to smaller sub-domains, it reduces both the required Taylor series order n for a target accuracy and, consequently, the number of runtime LUT accesses, enhancing overall efficiency.
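A compact behavioral sketch of this scheme for N = 4 regions (naming is ours; math.log appears only in the offline precomputation of the LUT constants):

```python
import math

LN2 = math.log(2.0)
N = 4                                        # mantissa regions on [1, 2)
CENTERS = [1.0 + (i + 0.5) / N for i in range(N)]
LUT = [(math.log(a) / LN2,                   # alpha_0 .. alpha_4 per region
        1.0 / (a * LN2),
        1.0 / (2.0 * a ** 2 * LN2),
        1.0 / (3.0 * a ** 3 * LN2),
        1.0 / (4.0 * a ** 4 * LN2)) for a in CENTERS]

def log2_mantissa(M):
    """Five-term, region-divided approximation of log2(M), 1 <= M < 2."""
    idx = int((M - 1.0) * N)                 # two MSBs of the fraction field
    a0, a1, a2, a3, a4 = LUT[idx]
    y = M - CENTERS[idx]
    z = y * y
    return a0 + a1 * y - z * (a2 - a3 * y + a4 * z)

def log2_fp(M, E):
    """log2(M * 2^E) = log2(M) + E."""
    return log2_mantissa(M) + E
```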

9.6. Summary of Logarithm Algorithm Part

This work has developed and analyzed floating-point logarithm algorithms that integrate Taylor series expansion with a mantissa region division technique. Our analysis explicitly quantifies the key hardware implementation trade-offs—among simulation accuracy, arithmetic operation counts (multiplications, additions/subtractions), and LUT size—to provide a configurable design framework adaptable to diverse digital system requirements.

10. Discussion

In this section, we examine the common structural features shared by our method across several floating-point arithmetic computations, as well as the limitations of its applicability.
(1)
We have examined floating-point calculations of functions f(x) such as 1/x, 1/√x, √x, exp(x), and log₂(x) in the previous sections.
(A) For 1/x, 1/√x, and √x, the function f(x) can be expressed as follows:
f(x) = g(M) × h(E)
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, while the evaluation of h(E) is straightforward.
(B) For exp(x), the function f(x) can be expressed as follows:
f(x) = (g(M))^(2^E)
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, whereas the effect of the exponent part is apparent.
(C) For l o g 2 ( x ) , the function f(x) can be expressed as follows:
f(x) = g(M) + h(E)
The term g(M) is computed using a Taylor series expansion with the mantissa division technique, while the evaluation of h (E) is straightforward (h(E) = E).
We therefore consider the Taylor series expansion technique combined with mantissa-region division to be effective for functions f(x) in which the contribution of the exponent part is explicit. Accordingly, the applicability of the mantissa region division method to a given function f(x) depends on whether the contribution of the exponent part of the input x to f(x) can be obtained easily. The method is difficult to apply directly to functions that do not exhibit this property; this is the main limitation of the proposed approach.
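The three structural forms (A), (B), and (C) can be summarized in a few lines; here the math.* calls stand in for the Taylor-evaluated mantissa part g(M), and the function names are ours:

```python
import math

# (A) f(x) = g(M) * h(E): division and square root
def f_div(M, E):
    return (1.0 / M) * 2.0 ** (-E)

def f_sqrt(M, E):                            # parity of E handled as in Sec. 7
    g = math.sqrt(2.0 * M) if E % 2 else math.sqrt(M)
    return g * 2.0 ** (E // 2)

# (B) f(x) = (g(M))^(2^E): exponentiation via E squarings
def f_exp(M, E):
    r = math.exp(M)
    for _ in range(E):
        r *= r
    return r

# (C) f(x) = g(M) + h(E): logarithm, with h(E) = E
def f_log2(M, E):
    return math.log2(M) + E
```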
(2)
It has been shown for 1/x, 1/√x, √x, exp(x), and log₂(x) that as the number of mantissa divisions increases, the number of additions/subtractions and multiplications decreases, while the LUT size increases. However, LUT addressing remains simple because no address decoder is required, and the additional LUT resources are negligible in some modern LSI technologies. Therefore, it is worthwhile to pursue increasing the division number and to determine the optimal division number as future work.
(3)
Table 34 compares the required orders of the Taylor series to achieve an accuracy of 1/2¹⁶ for each arithmetic operation. We observe that the required orders are similar, mainly because the domain 1 ≤ x < 2 is relatively narrow. We also note that the orders for log₂ x and 1/x are higher, as these functions vary more significantly over the interval 1 ≤ x < 2.
(4)
Note that the LUT-address control logic is straightforward, as it directly maps the upper mantissa bits to the address. Also, observe that these algorithms share a common region-division scheme, since it is based on partitioning the mantissa interval 1 x < 2 . These characteristics make them well-suited for a unified hardware implementation.
(5)
Chebyshev polynomial (min–max) approximations can achieve the required accuracy with coarser mantissa segments and lower polynomial order than the Taylor series expansion [52]. However, designing their coefficients is more involved. In contrast, coefficient generation for the Taylor series expansion is straightforward, since the coefficients are obtained simply by differentiating the target function. As the number of region divisions increases, the difference in effectiveness between the Taylor series expansion and the Chebyshev polynomial (min–max) approximations becomes smaller. Moreover, when the function varies rapidly, a Taylor series expansion with fine mantissa segmentation can be advantageous, because it provides high accuracy in the vicinity of the expansion point.

11. Conclusions

This study has examined floating-point computations of fundamental functions—division, inverse square root, square root, exponentiation, and logarithm—through the application of Taylor series expansion with mantissa region division. Their hardware implementations were also investigated: they can be realized using adders/subtractors, multipliers, and look-up tables, and a common hardware architecture can be employed by switching through simple programming. A comprehensive explanation is provided for each algorithm: inverse, inverse square root, square root, exponential, and logarithmic functions. We also discussed the applicability of this method to a given function f(x); it depends on how easily the contribution of the exponent part of x to f(x) can be computed.
These findings highlight the efficiency and versatility of the Taylor series and mantissa-region-division-based approach, offering a practical foundation for the design of high-performance arithmetic units in modern computing systems [57,58]. The method is particularly effective for high-performance CPUs and GPUs equipped with abundant floating-point adders, subtractors, and multipliers, as well as sufficient main memory, where polynomial-based arithmetic can be mapped efficiently onto deeply pipelined and massively parallel execution units [19,62].
Future work includes the following directions:
(i)
Identifying additional functions that are well-suited to Taylor series expansion with mantissa region division, especially for transcendental and composite functions that are frequently used in signal processing, graphics, and machine learning workloads [18].
(ii)
Developing hybrid methods that combine the Taylor series approximation with iterative refinement.
(iii)
Optimizing these implementations for parallel architectures and investigating their integration into domain-specific accelerators, where recent studies have shown that approximation-based arithmetic can significantly improve performance–energy efficiency trade-offs [20].
(iv)
Investigating their hardware-software co-design for optimized implementation. We have demonstrated that our method shares a common structure with several floating-point arithmetic computations, and that a unified co-design of these computations can be highly effective.
(v)
Applying the Taylor series expansion with mantissa region division to floating-point computation in the AI-oriented numerical formats (posit, bfloat16, and TensorFloat32 (TF32)) [21,25,63,64,65,66].
(vi)
Investigating “correct rounding” when applying the proposed method [67].

Author Contributions

Conceptualization, J.W. and H.K.; methodology, J.W. and H.K.; software, J.W. and H.K.; validation, J.W. and H.K.; formal analysis, J.W. and H.K.; investigation, J.W. and H.K.; resources, J.W. and H.K.; data curation, J.W. and H.K.; writing—original draft preparation, J.W. and H.K.; writing—review and editing, J.W. and H.K.; visualization, J.W. and H.K.; supervision, H.K.; project administration, H.K.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Yibin University Scientific Research Startup Project, grant number 2023QH27.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. IEEE Std 754-2019; IEEE Standard for Floating-Point Arithmetic. IEEE Computer Society: Piscataway, NJ, USA, 2019.
  2. Muller, J.M.; Brunie, N.; De Dinechin, F.; Jeannerod, C.-P.; Joldes, M.; Lefèvre, V.; Melquiond, G.; Revol, N.; Torres, S. Handbook of Floating-Point Arithmetic; Birkhäuser: Basel, Switzerland, 2018. [Google Scholar]
  3. Koren, I. Computer Arithmetic Algorithms, 2nd ed.; A K Peters/CRC Press: Natick, MA, USA, 2001. [Google Scholar]
  4. Chen, C.; Li, Z.; Zhang, Y.; Zhang, S.; Hou, J.; Zhang, H. Low-Power FPGA Implementation of Convolution Neural Network Accelerator for Pulse Waveform Classification. Algorithms 2020, 13, 213. [Google Scholar] [CrossRef]
  5. Lin, J.; Li, Y. Efficient Floating-Point Implementation of Model Predictive Control on an Embedded FPGA. IEEE Trans. Control. Syst. Technol. 2021, 29, 1473–1486. [Google Scholar]
  6. Chen, C.; Li, Z.; Zhang, Y.; Zhang, S.; Hou, J.; Zhang, H. A 3D Wrist Pulse Signal Acquisition System for Width Information of Pulse Wave. Sensors 2020, 20, 11. [Google Scholar] [CrossRef]
  7. Zhou, J.; Liu, Z.; Song, X. Constructing High-Radix Quotient Digit Selection Tables for SRT Division and Square Root. IEEE Trans. Comput. 2023, 72, 987–1001. [Google Scholar]
  8. Wang, X.; Yu, Z.; Gao, B.; Wu, H. An In-Memory Computing Architecture Based on a Duplex Two-Dimensional Material Structure for In-Situ Machine Learning. Nat. Nanotechnol. 2023, 18, 456–465. [Google Scholar]
  9. Yang, C.; Xiang, S.; Wang, J.; Liang, L. A High Performance and Full Utilization Hardware Implementation of Floating Point Arithmetic Units. In Proceedings of the 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), Dubai, United Arab Emirates, 28 November–1 December 2021. [Google Scholar]
  10. Lai, S.; He, X. Design of the Vector Floating-Point Unit with High Area Efficiency. J. Phys. Conf. Ser. 2023, 2524, 012027. [Google Scholar] [CrossRef]
  11. Sivaprasad, P.; Anandi, V.; Murthy, S. Design of High Throughput and Low Latency Double Precision Floating Point Arithmetic Unit for Space Signal Applications. Int. J. Comput. Sci. Netw. Secur. 2022, 22, 808–815. [Google Scholar]
  12. Kwon, T.-J.; Sondeen, J.; Draper, J. Floating-Point Division and Square Root using a Taylor-Series Expansion Algorithm. In Proceedings of the 50th Midwest Symposium on Circuits and Systems, Montreal, QC, Canada, 5–8 August 2007. [Google Scholar]
  13. Wei, J.; Kuwana, A.; Kobayashi, H.; Kubo, K. IEEE754 Binary32 Floating-Point Logarithmic Algorithms Based on Taylor-Series Expansion with Mantissa Region Conversion and Division. IEICE Trans. Fundam. 2022, E105-A, 1020–1027. [Google Scholar] [CrossRef]
  14. Donisi, A.; Di Benedetto, L.; Liguori, R.; Licciardo, G.D.; Rubino, A. A FPGA Hardware Architecture for AZSPWM Based on a Taylor Series Decomposition. In Applications in Electronics Pervading Industry, Environment and Society; Berta, R., De Gloria, A., Eds.; Lecture Notes in Electrical Engineering; Springer: Cham, Switzerland, 2022; Volume 1036. [Google Scholar]
  15. Vazquez-Leal, H.; Benhammouda, B.; Filobello-Nino, U.A.; Sarmiento-Reyes, A.; Jimenez-Fernandez, V.M.; Marin-Hernandez, A.; Herrera-May, A.L.; Diaz-Sanchez, A.; Huerta-Chua, J. Modified Taylor Series Method for Solving Nonlinear Differential Equations with Mixed Boundary Conditions Defined on Finite Intervals. SpringerPlus 2014, 3, 160. [Google Scholar] [CrossRef] [PubMed]
  16. Chopde, A.; Bodas, S.; Deshmukh, V.; Bramhekar, S. Fast Inverse Square Root Using FPGA. In Advancements in Communication and Systems; Soft Computing Research Society (SCRS): Delhi, India, 2024; pp. 231–239. Available online: https://www.publications.scrs.in/chapter/978-81-955020-7-3/21 (accessed on 3 March 2026).
  17. Moroz, L.V.; Samotyy, V.V.; Horyachyy, O.Y. Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants. Computers 2021, 9, 21. [Google Scholar] [CrossRef]
  18. Li, P.; Jin, H.; Xi, W.; Xu, C.; Yao, H.; Huang, K. Reconfigurable Hardware Architecture for Miscellaneous Floating-Point Transcendental Functions. Electronics 2023, 12, 233. [Google Scholar] [CrossRef]
  19. Aamodt, T.M.; Fung, W.W.L.; Rogers, T.G. General-Purpose Graphics Processor Architectures; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  20. Bandil, L.; Nagar, B.C. Hardware Implementation of Unsigned Approximate Hybrid Square Rooters for Error-Resilient Applications. IEEE Trans. Comput. 2024, 73, 2734–2746. [Google Scholar] [CrossRef]
  21. Kim, S.; Norris, C.J.; Oelund, J.I.; Rutenbar, R.A. Area-Efficient Iterative Logarithmic Approximate Multipliers for IEEE 754 and Posit Numbers. IEEE Trans. Very Large Scale Integr. Syst. 2024, 32, 455–467. [Google Scholar] [CrossRef]
  22. Haselman, M.; Beauchamp, M.; Wood, A.; Hauck, S.; Underwood, K.; Hemmert, K.S. A Comparison of Floating point and Logarithmic Number Systems for FPGAs. In Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’05), Napa, CA, USA, 18–20 April 2005; pp. 181–190. [Google Scholar]
  23. Park, S.; Yoo, Y. A New Fast Logarithm Algorithm Using Advanced Exponent Bit Extraction for Software-based Ultrasound Imaging Systems. Electronics 2022, 12, 170. [Google Scholar] [CrossRef]
  24. Palomäki, K.I.; Nurmi, J. Taylor Series Interpolation-Based Direct Digital Frequency Synthesizer with High Memory Compression Ratio. Sensors 2025, 25, 2403. [Google Scholar] [CrossRef] [PubMed]
  25. Gustafsson, O.; Hellman, N. Approximate Floating-Point Operations With Integer Units by Processing in the Logarithmic Domain. In Proceedings of the IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, Denmark, 14–16 June 2021; pp. 45–52. [Google Scholar]
  26. Kim, S.Y.; Kim, C.H.; Lee, W.J.; Park, I.; Kim, S.W. Low-Overhead Inverted LUT Design for Bounded DNN Activation Functions on Floating-point Vector ALUs. Microprocess. Microsyst. 2022, 93, 104592. [Google Scholar] [CrossRef]
  27. Węgrzyn, M.; Voytusik, S.; Gavkalova, N. FPGA-based Low Latency Square Root CORDIC Algorithm. J. Telecommun. Inf. Technol. 2025, 99, 21–29. [Google Scholar] [CrossRef]
  28. Wei, J.; Kuwana, A.; Kobayashi, H.; Kubo, K. Revisit to Floating-Point Division Algorithm Based on Taylor-Series Expansion. In Proceedings of the 16th IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Ha Long Bay, Vietnam, 8–10 December 2020. [Google Scholar]
  29. Wei, J.; Kuwana, A.; Kobayashi, H.; Kubo, K.; Tanaka, Y. Floating-Point Inverse Square Root Algorithm Based on Taylor-Series Expansion. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 2640–2644. [Google Scholar] [CrossRef]
  30. Wei, J.; Kuwana, A.; Kobayashi, H.; Kubo, K.; Tanaka, Y. Floating-Point Square Root Calculation Algorithm Based on Taylor-Series Expansion and Region Division. In Proceedings of the IEEE 64th International Midwest Symposium on Circuits and Systems (MWSCAS2021), Fully Virtual and Online, 8–11 August 2021. [Google Scholar]
  31. Wei, J.; Kuwana, A.; Kobayashi, H.; Kubo, K. Divide and Conquer: Floating-Point Exponential Calculation Based on Taylor-Series Expansion. In Proceedings of the IEEE 14th International Conference on ASIC (ASICON 2021), Online Virtual, 26–29 October 2021. [Google Scholar]
  32. Kwon, T.J.; Draper, J. Floating-Point Division and Square Root Using a Taylor-series Expansion Algorithm. Microelectron. J. 2009, 40, 1601–1605. [Google Scholar] [CrossRef]
  33. Yao, D.; Xin, H.; Li, X.; Guo, Y. FPGA-Based Hardware Efficient Approximate Floating-Point Multiplier with LUT-Oriented Unit. In Proceedings of the 5th International Conference on Electronics, Circuits and Information Engineering, Guangzhou, China, 23–25 May 2025. [Google Scholar]
  34. Dinechin, F.; Kumm, M. Polynomial-Based Architectures for Function Evaluation. In Application-Specific Arithmetic; Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
  35. Muñoz, D.M.; Sánchez, D.F.; Llanos, C.H.; Ayala-Rincon, M. Tradeoff of FPGA Design of a Floating-Point Library for Arithmetic Operators. J. Integr. Circuits Syst. 2010, 5, 42–52. [Google Scholar] [CrossRef]
  36. Parhami, B. Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed.; The Oxford Series in Electrical and Computer Engineering; Oxford Press: Oxford, UK, 2009. [Google Scholar]
  37. Burden, R.L.; Faires, J.D.; Burden, A.M. Numerical Analysis, 10th ed.; Cengage Learning: Mason, OH, USA, 2015. [Google Scholar]
  38. Ercegovac, M.D.; Imbert, L.; Matula, D.W.; Muller, J.-M.; Wei, G. Improving Goldschmidt Division, Square Root, and Square Root Reciprocal. IEEE Trans. Comput. 2000, 49, 759–763. [Google Scholar] [CrossRef]
  39. Jiang, H.; Hernandez Santiago, F.J.; Mo, H.; Liu, L.; Han, J. Approximate Arithmetic Circuits: A Survey, Characterization and Recent Applications. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2021, 40, 1685–1700. [Google Scholar] [CrossRef]
  40. Ercegovac, M.D.; Lang, T. Digital Arithmetic: Algorithms and Hardware, 2nd ed.; Morgan Kaufmann: Boston, MA, USA, 2020. [Google Scholar]
  41. Kim, S.; Lee, J.; Park, I.C. Energy-Efficient Floating-Point Unit Design for Embedded Processors. Electronics 2021, 10, 2471. [Google Scholar]
  42. Muller, J.-M.; Brunie, N.; de Dinechin, F. Approximation-Based Arithmetic for Floating-Point Computing: Recent Advances and Trends. IEEE Trans. Comput. 2020, 69, 1403–1416. [Google Scholar]
  43. Kornerup, P.; Muller, J.-M. Choosing Starting Values for Certain Newton–Raphson Iterations. Theor. Comput. Sci. 2006, 351, 101–110. [Google Scholar] [CrossRef]
  44. Donald, E. Seminumerical Algorithms. In The Art of Computer Programming, 3rd ed.; Addison Wesley Longman Publishing: Redwood, CA, USA, 1997; Volume 2, pp. 485–515. [Google Scholar]
  45. Muller, J.-M.; Brunie, N.; de Dinechin, F. Approximation-Based Arithmetic: A Survey of Polynomial and Table-Based Methods. IEEE Trans. Comput. 2020, 69, 1271–1288. [Google Scholar]
  46. Li, Y.; Zhang, H.; Chen, D. Low-Latency Polynomial Approximation for Floating-Point Functions in Hardware Accelerators. Sensors 2022, 22, 6143. [Google Scholar]
  47. Lee, D.; Burgess, N.; Constantinides, G.A. Polynomial Approximation Techniques for High-Performance Floating-Point Units. IEEE Trans. Very Large Scale Integr. Syst. 2021, 29, 1489–1502. [Google Scholar]
  48. Li, Y.; Zhang, H.; Chen, D. Efficient Polynomial-Based Floating-Point Function Evaluation for Embedded and Accelerator-Oriented Systems. Electronics 2022, 11, 3621. [Google Scholar]
  49. Wang, Y.; Luo, Y.; Xie, Y. Hardware-Efficient Design of Floating-Point Elementary Functions Using Taylor and Minimax Approximations. Sensors 2023, 23, 4187. [Google Scholar]
  50. Kim, J.; Park, S.; Lee, J. Trade-Off Analysis of LUT Size and Arithmetic Complexity in Approximation-Based Floating-Point Units. Electronics 2021, 10, 2854. [Google Scholar]
  51. Zhou, Q.; Xu, J.; Chen, Z. Polynomial Approximation Methods for Floating-Point Arithmetic on Heterogeneous CPU–GPU Platforms. Appl. Sci. 2024, 14, 1129. [Google Scholar]
  52. Hrycak, T.; Schmutzhard, S. Accurate Evaluation of Chebyshev Polynomials in Floating-Point Arithmetic. BIT Numer. Math. 2019, 59, 403–416. [Google Scholar] [CrossRef]
  53. Wang, X.; Xu, J.; Chen, Z. Domain Segmentation and Range Reduction Techniques for Efficient Hardware Implementation of Elementary Functions. Electronics 2023, 12, 1894. [Google Scholar]
  54. Pharr, M.; Jakob, W.; Humphreys, G. Physically Based Rendering, Fourth Edition: From Theory to Implementation; The MIT Press: Cambridge, MA, USA, 2023. [Google Scholar]
  55. Corke, P.; Jachimczyk, W.; Pillat, R. Robotics, Vision and Control: Fundamental Algorithms in MATLAB; Springer Nature: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
  56. Szeliski, R. Computer Vision—Algorithms and Applications, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  57. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning Series; The MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  58. Hankerson, D.; Menezes, A.J.; Vanstone, S. Guide to Elliptic Curve Cryptography; Springer Professional Computing; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
  59. Marschner, S.; Shirley, P. Fundamentals of Computer Graphics; A K Peters/CRC Press: Natick, MA, USA, 2021. [Google Scholar]
  60. Hou, B. Number Theory Based Modern Cryptography: RSA and Diffie-Hellman Algorithms. Theor. Nat. Sci. 2024, 51, 107–113. [Google Scholar] [CrossRef]
  61. Ikebe, M.; Ambalathankandy, P.; Ou, Y. HDR Tone Mapping: System Implementations and Benchmarking. ITE Trans. Media Technol. Appl. 2022, 10, 27–51. [Google Scholar] [CrossRef]
  62. Yu, H.; Yuan, G.; Kong, D.; Lei, L.; He, Y. An Optimized Method for Nonlinear Function Approximation Based on Multiplierless Piecewise Linear Approximation. Appl. Sci. 2022, 12, 10616. [Google Scholar] [CrossRef]
  63. Burgess, N.; Milanovic, J.; Stephens, N.; Monachopoulos, K.; Mansell, D. Bfloat16 Processing for Neural Networks. In Proceedings of the IEEE 26th Symposium on Computer Arithmetic (ARITH), Kyoto, Japan, 10–12 June 2019. [Google Scholar]
  64. Kumar, S.R.; Sk, N.M. Preferential Fault-Tolerant based TF32 Floating Point Adder for Mission Critical Systems. In Proceedings of the IEEE 33rd Asian Test Symposium (ATS), Ahmedabad, India, 17–20 December 2024. [Google Scholar]
  65. Pennestrì, P.; Huang, Y.; Alachiotis, N. A Novel Approximation Scheme for Floating-Point Square Root and Inverse Square Root for FPGAs. In Proceedings of the 11th International Conference on Modern Circuits and Systems Technologies (MOCAST), Bremen, Germany, 8–10 June 2022. [Google Scholar]
  66. Larsson-Edefors, P. Energy-Efficient Computation of TensorFloat32 Numbers on an FP32 Multiplier. In Proceedings of the IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC), Puerto Varas, Chile, 12–15 October 2025. [Google Scholar]
  67. Borges, C.F.; Jeannerod, C.-P.; Muller, J.-M. High-Level Algorithms for Correctly-Rounded Reciprocal Square Roots. In Proceedings of the IEEE 29th Symposium on Computer Arithmetic (ARITH), Lyon, France, 12–14 September 2022. [Google Scholar]
Figure 1. The waveforms generated by the Taylor series expansions of sin(x) and cos(x), centered at a = 0, when truncated at up to 25 terms. Subfigures (a) and (b) illustrate the results for the sine and cosine functions, respectively.
Figure 2. IEEE-754 single-precision floating-point format.
Table 1. Comparison of representative floating-point computation methods.

Method | Advantages | Disadvantages
Taylor series expansion | Simple polynomial method; easy coefficient generation; broad functional applicability | Slow convergence in some regions; high-order rounding errors; needs interval reduction
Chebyshev polynomial approximation | Minimizes maximum approximation error; lower degree for a given accuracy; efficient polynomial evaluation | Complex coefficient computation; poor accuracy outside the interval; requires domain transformation
Newton–Raphson | Very fast quadratic convergence; high precision with few iterations; suitable for division/sqrt | Needs a good initial estimate; multiple multiplications required; possible divergence issues
Goldschmidt | Highly parallelizable operations; fast convergence like Newton; high throughput for division | Requires many multipliers; sensitive to initial accuracy; latency is not always minimal
Digit recurrence | Stable digit-by-digit iteration; no initial estimate needed; simple hardware structure | Slow convergence rate; complex control with tables; limited pipelining capability
CORDIC | Uses shifts and additions; supports many math functions; small, low-power hardware | Many iterations required; higher latency for precision; needs scaling compensation
LUT method | Extremely fast constant time; very simple hardware logic; predictable fixed latency | Large memory footprint; limited precision per entry; poor scalability for range
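For instance, the Newton–Raphson row of Table 1 can be illustrated with the classic reciprocal iteration, which roughly doubles the number of correct bits per step but depends on the quality of the seed (a sketch; in hardware the seed would come from a small LUT rather than a fixed constant):

```python
def nr_recip_step(d, y):
    # One Newton-Raphson step for 1/d: y <- y * (2 - d*y).
    return y * (2.0 - d * y)

d = 1.7
y = 0.5                          # deliberately crude seed
for i in range(5):
    y = nr_recip_step(d, y)
    print(i, abs(y - 1.0 / d))   # the error roughly squares each iteration
```

The printed errors shrink quadratically, which is the convergence behavior the table contrasts with the linear, term-by-term accuracy gain of a Taylor series.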
Table 2. Scheme without mantissa division.

Number | Mantissa Region | Center Value a
R1-1 | M = 1.xxxxxx⋯ (1.0 ≤ M < 2.0) | 1.5
Table 3. Two-region division scheme (two sub-intervals).

Number | Mantissa Region | Center Value a
R2-1 | M = 1.0xxxxx⋯ (1.0 ≤ M < 1.5) | 1.25
R2-2 | M = 1.1xxxxx⋯ (1.5 ≤ M < 2.0) | 1.75
Table 4. Four-region division scheme (four sub-intervals).

Number | Mantissa Region | Center Value a
R4-1 | M = 1.00xxxxx⋯ (1.00 ≤ M < 1.25) | 1.125
R4-2 | M = 1.01xxxxx⋯ (1.25 ≤ M < 1.50) | 1.375
R4-3 | M = 1.10xxxxx⋯ (1.50 ≤ M < 1.75) | 1.625
R4-4 | M = 1.11xxxxx⋯ (1.75 ≤ M < 2.00) | 1.875
Table 5. Eight-region division scheme (eight sub-intervals).

Number | Mantissa Region | Center Value a
R8-1 | M = 1.000xxxx⋯ (1.000 ≤ M < 1.125) | 1.0625
R8-2 | M = 1.001xxxx⋯ (1.125 ≤ M < 1.250) | 1.1875
R8-3 | M = 1.010xxxx⋯ (1.250 ≤ M < 1.375) | 1.3125
R8-4 | M = 1.011xxxx⋯ (1.375 ≤ M < 1.500) | 1.4375
R8-5 | M = 1.100xxxx⋯ (1.500 ≤ M < 1.625) | 1.5625
R8-6 | M = 1.101xxxx⋯ (1.625 ≤ M < 1.750) | 1.6875
R8-7 | M = 1.110xxxx⋯ (1.750 ≤ M < 1.875) | 1.8125
R8-8 | M = 1.111xxxx⋯ (1.875 ≤ M < 2.000) | 1.9375
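The region index and center value in Tables 2–5 follow directly from the leading fraction bits of the mantissa; in software the mapping can be sketched as (hypothetical helper name):

```python
def region_center(M, regions):
    # Map a mantissa M in [1, 2) to its sub-interval index and center value a.
    idx = int((M - 1.0) * regions)    # equals the top log2(regions) fraction bits
    a = 1.0 + (idx + 0.5) / regions
    return idx, a

print(region_center(1.3, 4))   # -> (1, 1.375), row R4-2 of Table 4
print(region_center(1.9, 8))   # -> (7, 1.9375), row R8-8 of Table 5
```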
Table 6. Required expansion terms n vs. target accuracy (non-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R1-1 | 6 | 11 | 13 | 16 | 21
Table 7. Required expansion terms n vs. target accuracy (two divisions).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R2-1 | 4 | 7 | 9 | 11 | 14
R2-2 | 3 | 6 | 8 | 9 | 12
Table 8. Required expansion terms n vs. target accuracy (four divisions).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R4-1 | 3 | 6 | 7 | 8 | 11
R4-2 | 3 | 5 | 6 | 7 | 10
R4-3 | 3 | 5 | 6 | 7 | 9
R4-4 | 3 | 5 | 6 | 7 | 9
Table 9. Required expansion terms n vs. target accuracy (eight divisions).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R8-1 | 2 | 4 | 5 | 6 | 8
R8-2 | 2 | 4 | 5 | 6 | 8
R8-3 | 2 | 4 | 5 | 6 | 8
R8-4 | 2 | 4 | 5 | 6 | 8
R8-5 | 2 | 4 | 5 | 6 | 7
R8-6 | 2 | 4 | 5 | 6 | 7
R8-7 | 2 | 4 | 5 | 5 | 7
R8-8 | 2 | 4 | 5 | 5 | 7
Table 10. Operation counts for the n-term Taylor series expansion.

# of Taylor Series Expansion Terms | # of Multiplications | # of Subtractions and Additions
3 | 3 | 3
4 | 4 | 4
5 | 4 | 4
6 | 5 | 5
7 | 5 | 5
8 | 6 | 6
Table 11. LUT for four-segment region division.

Address (α β) | LUT Data
11 | 1/a for a = 1.875
10 | 1/a for a = 1.625
01 | 1/a for a = 1.375
00 | 1/a for a = 1.125
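Putting Tables 4, 8, and 11 together, the whole 1/x datapath can be sketched in a few lines (a software model with hypothetical names; the normalization loop stands in for reading the binary32 exponent field directly, and the 6-term series matches the 1/2^16 column of Table 8):

```python
REGIONS = 4
LUT = [1.0 / (1.0 + (i + 0.5) / REGIONS) for i in range(REGIONS)]   # 1/a per region

def recip(x, terms=6):
    # Assumes x > 0; normalize so that x = M * 2^E with 1 <= M < 2.
    E = 0
    while x >= 2.0: x /= 2.0; E += 1
    while x < 1.0: x *= 2.0; E -= 1
    idx = int((x - 1.0) * REGIONS)      # upper mantissa bits -> LUT address
    inv_a = LUT[idx]
    t = x * inv_a - 1.0                 # (M - a)/a, formed without a division
    s = 0.0
    for _ in range(terms):              # Horner form of 1 - t + t^2 - ...
        s = 1.0 - t * s
    return inv_a * s * 2.0 ** (-E)

print(recip(7.3), 1.0 / 7.3)            # agrees to better than 2^-16
```

Only multiplications, additions/subtractions, and one LUT read appear in the mantissa path, which is the operation profile tabulated in Table 10.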
Table 12. Required expansion terms n vs. target accuracy (non-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R1-1 | 5 | 9 | 12 | 14 | 19
Table 13. Required expansion terms n vs. target accuracy (two-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R2-1 | 3 | 7 | 8 | 10 | 13
R2-2 | 3 | 6 | 7 | 8 | 9
Table 14. Required expansion terms n vs. target accuracy (four-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R4-1 | 3 | 5 | 6 | 7 | 10
R4-2 | 2 | 5 | 6 | 7 | 9
R4-3 | 2 | 4 | 5 | 6 | 8
R4-4 | 2 | 4 | 5 | 6 | 8
Table 15. Required expansion terms n vs. target accuracy (eight-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R8-1 | 2 | 4 | 5 | 6 | 8
R8-2 | 2 | 4 | 5 | 6 | 7
R8-3 | 2 | 4 | 5 | 6 | 7
R8-4 | 2 | 4 | 5 | 5 | 7
R8-5 | 2 | 4 | 4 | 5 | 7
R8-6 | 2 | 4 | 4 | 5 | 7
R8-7 | 2 | 3 | 4 | 5 | 7
R8-8 | 2 | 3 | 4 | 5 | 7
Table 16. Operation counts for n-term Taylor series expansion.

# of Taylor Series Expansion Terms | # of Multiplications | # of Additions and Subtractions | # of LUT Words for N Regions
3 | 3 | 3 | 8N
4 | 4 | 4 | 10N
5 | 4 | 4 | 12N
6 | 5 | 5 | 14N
7 | 5 | 5 | 16N
8 | 6 | 6 | 18N
Table 17. LUT for four-segment region division (10 × 4 = 40 words).

Address (α β) | LUT Data
11 | β0, …, β4; α0, …, α4 for a = 1.875
10 | β0, …, β4; α0, …, α4 for a = 1.625
01 | β0, …, β4; α0, …, α4 for a = 1.375
00 | β0, …, β4; α0, …, α4 for a = 1.125
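The same skeleton serves 1/√x: the LUT now holds 1/√a per region, and the series coefficients are the binomial coefficients of (1 + t)^(-1/2). A sketch with five terms, the 1/2^16 column of Table 14 (hypothetical helper name; an odd exponent folds in a 1/√2 factor):

```python
import math

REGIONS = 4
LUT = [1.0 / math.sqrt(1.0 + (i + 0.5) / REGIONS) for i in range(REGIONS)]

def inv_sqrt(x, terms=5):
    # Assumes x > 0; normalize so that x = M * 2^E with 1 <= M < 2.
    E = 0
    while x >= 2.0: x /= 2.0; E += 1
    while x < 1.0: x *= 2.0; E -= 1
    idx = int((x - 1.0) * REGIONS)
    a = 1.0 + (idx + 0.5) / REGIONS
    t = (x - a) / a
    s, c = 1.0, 1.0
    for k in range(1, terms):           # (1+t)^(-1/2) = sum_k C(-1/2, k) t^k
        c *= (-0.5 - (k - 1)) / k       # next binomial coefficient
        s += c * t ** k
    r = LUT[idx] * s
    if E % 2:                           # odd exponent: fold in 1/sqrt(2)
        r /= math.sqrt(2.0)
        E -= 1
    return r * 2.0 ** (-E // 2)

print(inv_sqrt(0.8), 1.0 / math.sqrt(0.8))
```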
Table 18. Required expansion terms n vs. target accuracy (non-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R1-1 | 3 | 7 | 9 | 12 | 15
Table 19. Required expansion terms n vs. target accuracy (two-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R2-1 | 3 | 5 | 7 | 8 | 11
R2-2 | 2 | 5 | 6 | 7 | 10
Table 20. Required expansion terms n vs. target accuracy (four-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R4-1 | 2 | 4 | 5 | 6 | 9
R4-2 | 2 | 4 | 5 | 6 | 8
R4-3 | 2 | 4 | 5 | 6 | 8
R4-4 | 2 | 4 | 5 | 5 | 7
Table 21. Required expansion terms n vs. target accuracy (eight-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R8-1 | 2 | 3 | 4 | 5 | 7
R8-2 | 2 | 3 | 4 | 5 | 7
R8-3 | 2 | 3 | 4 | 5 | 7
R8-4 | 2 | 3 | 4 | 5 | 6
R8-5 | 2 | 3 | 4 | 5 | 6
R8-6 | 2 | 3 | 4 | 5 | 6
R8-7 | 2 | 3 | 4 | 4 | 6
R8-8 | 2 | 3 | 4 | 4 | 6
Table 22. Operation counts for n-term Taylor series expansion.

# of Taylor Series Expansion Terms | # of Multiplications | # of Additions and Subtractions
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
6 | 6 | 6
7 | 7 | 7
8 | 8 | 8
Table 23. LUT for four-segment region division.

Address (α β) | LUT Data
00 | α1, …, α4 and β0, …, β4 for a = 1.125
01 | α1, …, α4 and β0, …, β4 for a = 1.375
10 | α1, …, α4 and β0, …, β4 for a = 1.625
11 | α1, …, α4 and β0, …, β4 for a = 1.875
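For √x the mantissa path is analogous to the previous cases, so the sketch below shows only the exponent handling that distinguishes it (math.sqrt stands in for the LUT-plus-series mantissa computation): √(M · 2^E) = √M · 2^(E/2), with an odd E contributing an extra √2 factor.

```python
import math

def sqrt_split(x):
    # sqrt(M * 2^E) = sqrt(M) * 2^(E/2); an odd E folds in a sqrt(2) factor.
    E = 0
    while x >= 2.0: x /= 2.0; E += 1
    while x < 1.0: x *= 2.0; E -= 1
    m = math.sqrt(x)            # stand-in for the Taylor-series mantissa path
    if E % 2:
        m *= math.sqrt(2.0)
        E -= 1
    return m * 2.0 ** (E // 2)

print(sqrt_split(18.0))   # -> 4.2426..., i.e. sqrt(18)
```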
Table 24. Required expansion terms n vs. target accuracy (non-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R1-1 | 4 | 7 | 8 | 9 | 11
Table 25. Required expansion terms n vs. target accuracy (two-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R2-1 | 3 | 5 | 6 | 7 | 9
R2-2 | 3 | 5 | 6 | 7 | 9
Table 26. Required expansion terms n vs. target accuracy (four-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R4-1 | 3 | 4 | 5 | 5 | 7
R4-2 | 3 | 4 | 5 | 5 | 7
R4-3 | 3 | 4 | 5 | 5 | 7
R4-4 | 3 | 4 | 5 | 5 | 7
Table 27. Operation counts for n-term Taylor series expansion.

# of Taylor Series Expansion Terms | # of Multiplications | # of Additions and Subtractions
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
6 | 6 | 6
7 | 7 | 7
8 | 8 | 8
Table 28. LUT for four-segment region division.

Address (α β) | LUT Data
00 | Exp(a) for a = 1.125
01 | Exp(a) for a = 1.375
10 | Exp(a) for a = 1.625
11 | Exp(a) for a = 1.875
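For the exponential function the exponent part is the hard case, which is why the conclusions single it out: e^(M · 2^E) = (e^M)^(2^E), so the mantissa-path result must be squared E times (a sketch for E ≥ 0 only; math.exp stands in for the Table 28 LUT plus the Taylor series, and negative exponents would need a different reduction):

```python
import math

def exp_fp(x):
    # e^(M * 2^E) = (e^M)^(2^E): square the mantissa-path result E times.
    E = 0
    while x >= 2.0: x /= 2.0; E += 1
    while x < 1.0: x *= 2.0; E -= 1
    r = math.exp(x)             # stand-in for LUT[Exp(a)] * Taylor series
    for _ in range(E):          # assumes E >= 0, i.e. the input was >= 1
        r *= r
    return r

print(exp_fp(5.5), math.exp(5.5))
```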
Table 29. Required expansion terms n vs. target accuracy (non-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R1-1 | 13 | 15 | 18 | 22 | 23
Table 30. Required expansion terms n vs. target accuracy (two-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R2-1 | 10 | 11 | 13 | 16 | 16
R2-2 | 4 | 5 | 7 | 9 | 10
Table 31. Required expansion terms n vs. target accuracy (four-division).

Taylor Series Expansion Region | 1/2^8 | 1/2^16 | 1/2^20 | 1/2^24 | 1/2^32
R4-1 | 8 | 8 | 10 | 12 | 12
R4-2 | 4 | 5 | 6 | 8 | 8
R4-3 | 4 | 4 | 6 | 7 | 8
R4-4 | 4 | 4 | 5 | 7 | 7
Table 32. Operation counts for n-term Taylor series expansion.

# of Taylor Series Expansion Terms | # of Multiplications | # of Additions and Subtractions
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
6 | 6 | 6
7 | 7 | 7
8 | 8 | 8
Table 33. LUT for four-segment region division.

Address (α β) | LUT Data
00 | α0, …, α4 for a = 1.125
01 | α0, …, α4 for a = 1.375
10 | α0, …, α4 for a = 1.625
11 | α0, …, α4 for a = 1.875
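The logarithm is the easy case for the exponent part: log2(M · 2^E) = E + log2(M), a single addition (a sketch; math.log2 stands in for the Table 33 coefficients and the Taylor series over the mantissa):

```python
import math

def log2_fp(x):
    # log2(M * 2^E) = E + log2(M): the exponent contributes just an addition.
    E = 0
    while x >= 2.0: x /= 2.0; E += 1
    while x < 1.0: x *= 2.0; E -= 1
    return E + math.log2(x)     # stand-in for the series-plus-LUT mantissa path

print(log2_fp(40.0))   # -> 5.3219..., i.e. log2(40)
```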
Table 34. Required orders of Taylor series for 1/2^16 accuracy with the mantissa region divided into four parts.

Arithmetic | 1/x | 1/√x | √x | exp x | log2 x
Required order (four-region division) | 6 | 5 | 4 | 4 | 8
