1. Introduction
The foundational principle of stochastic computing (SC) is the representation of a number $p$ as a Bernoulli($p$) process, which is an infinite sequence of statistically independent bits where $p$ is the probability that any given bit is '1'. The representation is motivated by the facts that arithmetical operations on such bit streams are typically very hardware- and power-efficient, and the represented values are highly tolerant of bit-flip errors [1,2,3,4,5,6]. Potential future applications of SC include extremely low-power edge processing, sensor conditioning for embedded/wearable thin-film devices, molecular computing, and sensing/telemetry in extreme environments where errors are frequent. Even without naming specific applications, which may be yet to be discovered, the motivation for the ability to model arbitrary mathematical functions is self-evident.
The goal is to closely approximate a real function $f$ over some domain using stochastic computing methods. Approximation using partial sums of basis functions $\phi_n$ is considered,

$$f_N(x) = \sum_{n=0}^{N} a_n \phi_n(x), \quad (1)$$

which converges at least pointwise to $f$, i.e.,

$$\lim_{N \to \infty} f_N(x) = f(x). \quad (2)$$

The function $f$ is then completely defined by the set of coefficients $\{a_n\}$. In Section 4, Chebyshev polynomials are employed as the basis functions of interest.
The method developed in the paper is compared to the Bernstein polynomial approach of [7,8], which is the only existing method of arbitrary function approximation using stochastic computing. Performance results show that our approach is substantially more accurate for a given circuit complexity.
In order to extend the available mathematical machinery, the ability to represent negative numbers is required. A bipolar value $x$ can be represented using a Bernoulli($p$) process as follows. Given a parameter $L > 0$ and the constraint that $|x| \le L$, the representation may be achieved by the following linear transformation [9]:

$$x = L(2p - 1). \quad (3)$$

The inverse transform is

$$p = \frac{x + L}{2L}. \quad (4)$$
In the remainder of the work, a shorthand notation is used to refer to the bipolar process representing a given value.
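To make the mapping concrete, the following sketch (our own illustration in Python; the function and variable names are not from the paper) generates a finite bipolar bit stream for a value $x$ and recovers an estimate of it via the transforms (3) and (4):

```python
import numpy as np

def bipolar_stream(x, L, n_bits, rng):
    """Generate n_bits of a Bernoulli(p) stream representing x, with |x| <= L."""
    p = (x + L) / (2 * L)              # inverse transform (4)
    return (rng.random(n_bits) < p).astype(np.uint8)

def bipolar_decode(bits, L):
    """Estimate the represented value from the empirical '1' frequency."""
    p_hat = bits.mean()
    return L * (2 * p_hat - 1)         # forward transform (3)

rng = np.random.default_rng(0)
bits = bipolar_stream(-1.3, L=2.0, n_bits=100_000, rng=rng)
print(bipolar_decode(bits, L=2.0))     # approximately -1.3
```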
The paper is organised as follows. In
Section 2, prior approaches to stochastic computing-based function approximation are reviewed. In
Section 3, an algorithm that implements the inner product of two vectors is presented, namely the stochastic dot product (SDP). The SDP is generally useful; however, in the context of this work, it is employed as a building block of the more general function approximation design, presented in detail in
Section 4. Having defined the algorithm,
Section 5 is used to demonstrate the utility of the design, investigate its behaviour in various situations, and compare its performance with existing methods.
Section 6 is a discussion comparing the relative merit of this work with the Bernstein polynomial approach, and
Section 7 concludes the work.
3. Stochastic Dot Product
In this section, an algorithm to compute the vector dot product $\mathbf{w}^{\top}\mathbf{x} = \sum_{k=1}^{K} w_k x_k$ is described, where $\mathbf{w}$ is a length-$K$ vector of pre-defined integers, and $\mathbf{x}$ is an arbitrary length-$K$ vector with the constraint that $|x_k| \le L$. The constraint on $\mathbf{x}$ allows for implementation as a bipolar stochastic circuit, with output $y$ defined as

$$y = \min\left(L, \max\left(-L, \sum_{k=1}^{K} w_k x_k\right)\right). \quad (6)$$

In other words, $y$ is the desired dot product saturated at $\pm L$, as imposed by the stochastic bipolar representation. Note that the integers $w_k$ are unconstrained; only $|x_k| \le L$ is required. A special case of the SDP is the integer scaling of a single bipolar bit stream, enabled by setting $K = 1$.
We allow a deviation from classical stochastic computing techniques by introducing some deterministic operations, the most complex of which is a sum of two integers.
The SDP requires selection of an even sample period $M$. At each time $m \in \{1, \dots, M\}$ within the sample period, an output is the bit $y[m]$, and an input is a set of $K$ bits $\{x_k[m]\}_{k=1}^{K}$, with bit $x_k[m]$ being drawn from the bipolar process representing $x_k$.
The SDP has a delay of
M samples, where a set of
M outputs is computed based on the previous
M inputs. The SDP works for a general
M; however, as will be shown in
Section 3.6, the architecture benefits greatly in terms of efficiency if
M is as large as possible, and also if
M is selected to be a power of 2.
As a reference for the detailed description of each algorithm component in the remainder of the section, the complete SDP algorithm is listed in Algorithm 1.
Algorithm 1: Stochastic Dot Product Algorithm
3.1. Output Bit Stream
The output bit stream is required to be a Bernoulli($p$) process. In order to approximate the correct distribution after receiving $M$ inputs, a Bernoulli($\hat{p}$) distribution is imposed on the subsequent $M$ output bits. Parameter $\hat{p}$ is the local empirical estimate of $p$, with the constraint that $\hat{p}M$ is an integer. The output is given the desired distribution using one of the following options.
Option 1: Emit $M$ samples from a Bernoulli($\hat{p}$) distribution; each output bit is produced by comparing $\hat{p}$ against a sample from a uniform binary distribution.

Option 2: Emit '1' for each of the first $\hat{p}M$ outputs and '0' for the remaining outputs.
In this work, we exclusively used Option 2, which has a simpler implementation, being deterministic given $\hat{p}$. The drawback of Option 2 is the introduction of correlations into the output bit sequence.
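A minimal sketch of the two emission options (our illustration; `c` denotes the integer count $\hat{p}M$):

```python
import numpy as np

def emit_option1(c, M, rng):
    """Option 1: M independent samples from Bernoulli(c/M)."""
    return (rng.random(M) < c / M).astype(np.uint8)

def emit_option2(c, M):
    """Option 2: deterministic burst of c ones, then M - c zeros."""
    out = np.zeros(M, dtype=np.uint8)
    out[:c] = 1
    return out
```

Both options emit bits with overall '1' frequency $c/M$; Option 2 trades independence within the period for a trivially simple circuit.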
3.2. Empirical Distribution
It remains to determine $\hat{p}$ or, equivalently, the integer count $\hat{p}M$.
At the beginning of each $M$-input period, initialise integer $u \leftarrow u_0$, where

$$u_0 = \frac{M}{2}\left(1 - \sum_{k=1}^{K} w_k\right), \quad (7)$$

which is a constant that can be computed offline and stored at circuit creation (the requirement that $M$ be even ensures that $u_0$ is an integer). At each input time $m$, for each of the $K$ inputs $x_k[m]$, update $u$ according to

$$u \leftarrow u + w_k\, x_k[m], \quad (8)$$

i.e., increment $u$ by the constant $w_k$ when the input bit corresponding to $x_k$ is 1.

After receiving $M$ inputs, the mean $\mathbb{E}[u]$ is precisely the quantity of interest, namely $M$ times the unipolar representation of the dot product (6). To verify, expand the mean as

$$\mathbb{E}[u] = \frac{M}{2}\left(1 - \sum_k w_k\right) + M\sum_k w_k p_k = M\left(\frac{1}{2} + \frac{1}{2}\sum_k w_k (2p_k - 1)\right). \quad (9)$$

The corresponding bipolar output is

$$L\left(\frac{2\,\mathbb{E}[u]}{M} - 1\right) = L\sum_k w_k (2p_k - 1) = \sum_k w_k x_k, \quad (10)$$

where the substitution from (3) in the last line yields the desired result.
3.3. Bit Stream Correlations
Note that the order of summation in (
8) does not affect the result, making the SDP impervious to correlations in each
M-input bit sequence. In
Section 4, we cascade an arbitrary number of SDPs in series; this observation allows the output bit stream of each to be assigned according to Option 2 in
Section 3.1. The output of the final SDP can be assigned according to Option 1 if downstream correlations need to be avoided.
3.4. Averaging Across Sample Periods
Recall that $\hat{p}$ is the local estimate of the output distribution. In practice, the circuit measures $u$, which is an unconstrained estimate that may not actually be implementable unless $0 \le u \le M$. This unconstrained estimate $\tilde{u}$ is computed as

$$\tilde{u} = u + t, \quad (11)$$

where the integer $t$ is defined below. Using the clamping function (defined here for integer arguments)

$$\mathrm{clamp}(v) = \min(M, \max(0, v)), \quad (12)$$

an implementable value can be obtained using

$$\hat{p}M = \mathrm{clamp}(\tilde{u}). \quad (13)$$

The truncated value $\hat{p}$ is then used to generate the desired output distribution according to Section 3.1.

Finally, in order to account for the nonlinearity of the clamping function, the SDP maintains an accumulator

$$t \leftarrow \tilde{u} - \hat{p}M, \quad (14)$$

which is updated after each measurement of $u$.
The rationale for the accumulator is that if an instantaneous value of $u$ cannot be represented explicitly, the "excess" value is stored until it can be represented in the output bit stream at some point in the future. Considering the ratio of total '1's to total '0's over many samples of $u$, the accumulator $t$ ensures that the overall output distribution is correct, even if it is locally skewed by the clamping function. Each time $t$ crosses zero, the distribution of all previous outputs in totality is just the unbiased empirical estimate of $p$.
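Assembling Sections 3.1–3.4, one sample period of the SDP can be sketched as follows. This is our own reconstruction in Python under the symbol conventions above, using Option 2 emission; it is a sketch, not the paper's Algorithm 1 listing:

```python
import numpy as np

class SDP:
    """Sketch of one stochastic dot-product unit (one M-bit sample period at a time)."""

    def __init__(self, w, M, L):
        self.w = np.asarray(w, dtype=int)       # pre-defined integer constants w_k
        self.M = M                              # even sample period
        self.L = L                              # bipolar scale
        self.t = 0                              # excess accumulator (Section 3.4)
        self.u0 = M * (1 - self.w.sum()) // 2   # constant initial count (7)

    def period(self, x_bits):
        """x_bits: (M, K) array of input bits; returns the next M output bits."""
        u = self.u0 + int(x_bits.sum(axis=0) @ self.w)  # increments (8)
        u_tilde = u + self.t                            # fold in stored excess (11)
        c = min(max(u_tilde, 0), self.M)                # clamp to [0, M] (12)-(13)
        self.t = u_tilde - c                            # accumulator update (14)
        out = np.zeros(self.M, dtype=np.uint8)          # Option 2 emission:
        out[:c] = 1                                     # c ones, then M - c zeros
        return out

# Usage: the decoded output should approximate the saturated dot product (6).
rng = np.random.default_rng(1)
w, L, M = [1, 2, -1], 4.0, 64
x = np.array([0.5, -1.0, 2.0])                          # |x_k| <= L
p = (x + L) / (2 * L)                                   # input probabilities (4)
sdp = SDP(w, M, L)
out = np.concatenate([sdp.period((rng.random((M, 3)) < p).astype(np.uint8))
                      for _ in range(20_000)])
print(L * (2 * out.mean() - 1))                         # ~ w . x = -3.5
```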
3.5. Accumulator Stability
It is instructive to consider the accumulator behaviour when $\tilde{u} < 0$. In this case, $\hat{p}M = 0$ and $t \leftarrow \tilde{u}$ and, hence, $t$ increases over time at a rate proportional to $\mathbb{E}[u]$ while the condition remains true. Similarly, if $\tilde{u} > M$, then $\hat{p}M = M$ and $t \leftarrow \tilde{u} - M$, which decreases over time. This negative feedback ensures that the value of $t$ will always eventually return to zero.

While the accumulator is stable in expectation, the zeroing effect is relatively slow, and the accumulator is prone to large local perturbations if the variance of $u$ is significantly greater than $M$. The effect on $t$ of a large sample where $u \gg M$ will take a minimum of $(u - M)/M$ sample periods to remove, during which time the output bit stream will be all '1's.
The variance can be quantified as

$$\mathrm{Var}(u) = M \sum_{k=1}^{K} w_k^2\, p_k (1 - p_k), \quad (15)$$

which is maximized when the values of $x_k$ are all equal to 0 (equivalently, $p_k = \tfrac{1}{2}$), giving $\frac{M}{4}\sum_k w_k^2$. Assuming the values of $p_k$ are uniformly distributed, their influence can be averaged out to obtain

$$\mathrm{Var}(u) \approx \frac{M}{6} \sum_{k=1}^{K} w_k^2. \quad (16)$$

The above expression demonstrates that when algorithms are designed with the SDP as a building block, the magnitude of the constants $w_k$ critically affects the stability of the circuit.
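A quick Monte Carlo check of this expression (our illustration, under the symbol conventions above):

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.array([1, 2, -1])
M, p = 64, np.array([0.5625, 0.375, 0.75])

# Empirical variance of the per-period count u over many periods
# (the constant offset u0 does not affect the variance).
trials = 20_000
bits = rng.random((trials, M, 3)) < p        # (trials, M, K) input bits
u = bits.sum(axis=1) @ w                     # per-period weighted counts
print(u.var())                               # empirical
print(M * (w**2 * p * (1 - p)).sum())        # M * sum_k w_k^2 p_k (1 - p_k)
```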
3.6. SDP Complexity
If an SC architecture such as the SDP finds widespread practical utility, it will likely be associated with a newer technology than classical CMOS. Rather than introducing the various assumptions required to compute die area and gate counts, the complexity discussion will remain as general as possible.
In the main algorithm implementation presented in Section 4, the elements of vector $\mathbf{w}$ are small constant integers (with magnitudes no greater than 2), and as such the increment (8) has an efficient hardware implementation.

If $M$ is chosen as a power of 2, Equation (14) is trivial when $0 \le \tilde{u} \le M$ and, otherwise, is performed by simply flipping bits from bit $\log_2 M$ towards the most-significant bit (MSB) until a '1' is found. Similarly, the function (13) has a very simple implementation.
The bulk of the hardware complexity resides in the general addition (11), which is performed once every $M$ input samples. Notwithstanding the discussion in Section 3.4 regarding the accumulator behaviour, the magnitude of $t$ (and, hence, the size of the register required to contain it) is theoretically unbounded. For the simulations in Section 5, 16 bits was more than enough to safely represent $t$, and further optimization is likely possible.
Finally, given that the operations discussed here are only performed once every M bits, it is prudent to consider using larger values of M. Experiments showed that M could be varied between 2 and 1024 without affecting the accuracy or precision.
3.7. Alternative Dot-Product Methods
While the SDP was developed specifically for the application described in Section 4, other dot-product approaches, or techniques that could be applied to such, have been described in the literature [12,14,17,18,19,20,21]. These were not considered here for the following reasons.
In our application, one of the dot product inputs is a vector of integers, which is known at circuit creation. We wish to exploit this fact in order to reduce noise.
It is unclear how to adapt existing methods to represent unconstrained integer inputs, or to produce a saturated bipolar output of the form (6).
On the other hand, a drawback of the proposed SDP is the use of deterministic binary additions. An attempt to adapt existing dot-product methods to this application in order to eliminate additions would be a worthwhile future endeavour.
4. Chebyshev Function Approximation
The goal is to approximate an arbitrary function $f$ on $[-1, 1]$ as the partial sum

$$f_N(x) = \sum_{n=0}^{N} a_n T_n(x), \quad (17)$$

where $T_n$ is the $n$th Chebyshev polynomial of the first kind [22,23]. Chebyshev polynomials are an excellent basis for function approximation due to their simplicity and low error for a given $N$, and as such, they are frequently applied in numerical computing applications [24,25]. The computation of the coefficients $a_n$ was performed offline and is not considered here [26].
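For instance, the offline computation might use standard numerical tools; a sketch using NumPy's Chebyshev utilities (our illustration, not the procedure of [26]):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Offline: fit an order-N Chebyshev partial sum to f on the domain of interest.
f, N = np.tanh, 11
cheb = C.Chebyshev.interpolate(f, deg=N, domain=[-1, 1])
a = cheb.coef                              # coefficients a_0, ..., a_N of (17)

xs = np.linspace(-1, 1, 1000)
print(np.max(np.abs(cheb(xs) - f(xs))))    # worst-case approximation error
```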
Since $T_n(x)$ can be expressed as a degree-$n$ polynomial in $x$ with integer coefficients, in principle, it could be computed directly using the SDP along with a circuit that raises $x$ to an integer power (a sequence of $n$ XNOR gates in series implements $x^{n+1}$). Unfortunately, the coefficients of $T_n$ increase rapidly in magnitude with $n$, leading to large accumulator excursions in the SDP implementation. To illustrate, since the leading coefficient of $T_n$ is $2^{n-1}$, the variance (16) for a direct SDP implementation of $T_n$ grows exponentially with $n$.
4.1. Stochastic Clenshaw Algorithm
To obtain a more stable and efficient implementation, the Clenshaw algorithm [27] can be applied to compute (17). Stating the algorithm directly, the partial sum can be expressed as

$$f_N(x) = a_0 + x\, b_1(x) - b_2(x), \quad (18)$$

where the $b_n(x)$ are defined by the recurrence relation

$$b_n(x) = a_n + 2x\, b_{n+1}(x) - b_{n+2}(x), \quad (19)$$

initialised by

$$b_{N+1}(x) = b_{N+2}(x) = 0. \quad (20)$$

The algorithm suggests a direct stochastic implementation as $N + 1$ cascaded repeating circuit stages, with each circuit implementing one stage of the recurrence (19).
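For reference, a floating-point version of (18)–(20) is sketched below (a standard Clenshaw formulation; the stochastic circuit of this section replaces each pass through this loop with an SDP stage):

```python
def clenshaw(a, x):
    """Evaluate sum_{n=0}^{N} a[n] * T_n(x) via the Clenshaw recurrence."""
    b1, b2 = 0.0, 0.0                        # b_{N+1} = b_{N+2} = 0   (20)
    for an in reversed(a[1:]):               # n = N, ..., 1
        b1, b2 = an + 2 * x * b1 - b2, b1    # recurrence (19)
    return a[0] + x * b1 - b2                # final combination (18)

# T_0 - 0.5 * T_2 at x = 0.3: 1 - 0.5 * (2 * 0.3**2 - 1) = 1.41
print(clenshaw([1.0, 0.0, -0.5], 0.3))
```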
4.1.1. Clenshaw Stage
Equation (19) can be expressed as the dot product of the constant vector $(1, 2, -1)$ and the vector $(a_n,\ x\,b_{n+1},\ b_{n+2})$, which immediately suggests the use of an SDP. The stochastic representation of the product $x\,b_{n+1}$ can be computed with an XNOR gate on the output bits of the processes representing $x$ and $b_{n+1}$, which yields the bipolar representation of the product [9,28]. The combined circuit representing one stage of the stochastic Clenshaw algorithm is shown in Figure 1.
In order to account for the $1/L$ scale factor introduced by the XNOR gate, the corresponding SDP constant could be scaled as $2 \to 2L$ (assuming $L$ is an integer). Given the expression for variance (16), however, and the fact that many such stages will be cascaded, scaling the constant in this way significantly increases the noise level. Since $x$ is only defined within $[-1, 1]$, a better solution is to scale the input by $L$. This could be performed as part of generating the input Bernoulli process or online via a single-term SDP with $w_1 = L$.
4.1.2. Complete Clenshaw Circuit
Having defined a single stage, the entire algorithm can be implemented using $N + 1$ stages in series, connected as shown in Figure 2. This is a direct implementation of Equations (18)–(20).

Noting the edge cases in the algorithm definition (the zero initialisation (20) and the single factor of $x$ in (18)), the constant vector for the SDP in stage $n$ is $(1, 2, -1)$ for $1 \le n \le N$, with the constants corresponding to the unavailable $b_{N+1}$ and $b_{N+2}$ inputs of the first stages set to zero, and $(1, 1, -1)$ for stage $n = 0$, so the output of stage $n = 0$ is the desired partial sum (17).
Once an $(N+1)$-stage dedicated circuit implementing Figure 2 is created, it can model any function $f$ with an order-$N$ partial sum simply by providing the appropriate input processes representing the coefficients $a_n$.
One useful optimization is that in the case $a_n = 0$, the corresponding input process can be ignored, or equivalently the corresponding integer constant can be set to zero. This is useful to reduce the variance (16) when modelling odd or even functions, in which case $a_n = 0$ for every other $n$.
Parallel implementations of the Clenshaw algorithm are known [29]; however, in this application, the serial implementation can be fully pipelined with no loss of throughput. Each unbiased output sample from the algorithm requires only a single sample from the process representing $x$, along with a single sample from each of the coefficient processes.
5. Numerical Investigation
For each of the experimental results presented in this section, $S$ input bits were presented to the stochastic circuit in order to obtain the measurement at the output. The output estimator measures the empirical frequency of '1's at the output, which can be implemented using techniques in, e.g., [18]. The variance of this estimator scales as $1/S$. While the value of $S$ used here is almost certainly impractically large in real applications, for the purposes of Section 5, we seek to investigate systematic errors and biases which might otherwise be obscured by sampling noise if a smaller $S$ were used.
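Concretely, the measurement is the empirical '1' frequency mapped back through (3), and its variance shrinks as $1/S$; a sketch (our notation):

```python
import numpy as np

def measure(out_bits, L):
    """Estimate the circuit output value from S observed output bits."""
    return L * (2 * np.mean(out_bits) - 1)

# Repeated measurements of a Bernoulli(0.6) stream illustrate the 1/S scaling.
rng = np.random.default_rng(3)
for S in (10**3, 10**5, 10**7):
    est = [measure(rng.random(S) < 0.6, L=1.0) for _ in range(20)]
    print(S, np.var(est))          # roughly 0.96 / S for p = 0.6, L = 1
```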
Axis labels are omitted from plots in order to preserve page space; the horizontal axis always represents the independent variable $x$, and the vertical axis represents the function value.
5.1. Examples
Some selected examples of simple modelled functions are shown in
Figure 3 along with the stochastic circuit simulation results. The function definitions are listed in the caption along with the number of terms
N in the partial sum.
In Figure 3a,b are shown the degree-3 polynomials in the examples from [13] and [7], respectively, extended into the bipolar domain. Since any degree-$d$ polynomial can always be represented exactly as the sum (17) with $N = d$ [22], only four stochastic Clenshaw stages are required in each case.
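Converting a polynomial's power-basis coefficients into the Chebyshev coefficients of (17) is a standard operation; a sketch using NumPy (the cubic below is an assumed example, not the one in Figure 3):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Power-basis coefficients of p(x) = 0.2 + 0.1x - 0.4x^2 + 0.6x^3,
# ordered from degree 0 upwards.
poly = [0.2, 0.1, -0.4, 0.6]
a = C.poly2cheb(poly)        # exact Chebyshev coefficients, N = d = 3
print(a)
print(C.chebval(0.5, a), np.polynomial.polynomial.polyval(0.5, poly))  # equal
```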
Figure 3c shows a tanh function modelled with 12 Clenshaw stages. This is the curve from Example 3 of [12], shifted and scaled to be centred in the bipolar regime. In this case, the partial sum is not an exact representation but a truncated infinite sum. This represents an alternative implementation of an activation function for artificial neural networks, given recent interest in using SC for such applications [30,31].
The final subplot of
Figure 3 shows the sum of two Chebyshev polynomials. This is an exact representation with only two non-zero terms appearing in (
17).
Further examples are presented in
Figure 4, this time for general functions with no limitation on complexity.
Figure 4a,b show selected oscillatory functions with 21 terms. These are examples of transcendental functions that are complicated to implement using traditional logic.
Figure 4c shows a step function modelled with $N = 50$. The Gibbs phenomenon can be seen causing oscillations due to the nonlinearity at the step. Even with 51 stochastic Clenshaw stages configured in series, however, the stochastic circuit faithfully models the partial sum with low noise.
Finally,
Figure 4d shows an extreme example where the method fails. The Gamma function is modelled with 61 terms over a domain containing multiple singularities and discontinuities. The right-hand side of the plot shows a somewhat accurate approximation (albeit with some oscillation), but in the left-hand side of the plot, the approximation completely breaks down due to noise propagation through the Clenshaw stages.
5.2. Choosing the Scale Parameter
The setting of parameter $L$ in the bipolar representation (3) is a design decision to be addressed. The representation is limited to $|x| \le L$, so $L$ should be at least as large as the expected largest value, including any intermediate values within the Clenshaw stages. If $L$ is much larger than necessary, however, the domain of interest is mapped via (4) onto only a portion of the available unipolar space. For example, if $x$ is in $[-1, 1]$ and $L = 10$, the inverse map (4) restricts $p$ to the interval $[0.45, 0.55]$, ignoring 90% of the available range of values. The experiment shown in Figure 5 explores this effect.
In the top left sub-figure, the scale is set to the smallest value tested. The noise appears quite small; however, the stochastic circuit is completely unable to represent the function over part of the domain. The reason is that intermediate values within the Clenshaw stages violate the condition $|b_n| \le L$. These intermediate values are clipped before propagating through the circuit, and as a result, the final value is inaccurate.

In contrast, when $L$ is set to the largest value tested (shown in the bottom right sub-figure), the output shows no evidence of clipping, but the high level of noise produces a poor approximation.
In general, for optimal performance, $L$ should be set as small as possible, but large enough that any clipping is avoided. The appropriate value of $L$ can be quickly investigated offline using a floating-point version of the Clenshaw algorithm, with the maximum absolute intermediate value recorded. For the function shown in Figure 5, the optimal value of $L$ is evidently between 2 and 4.
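A possible offline procedure, reusing the floating-point Clenshaw sketch from Section 4.1 (our illustration; the coefficient list is an arbitrary example):

```python
import numpy as np

def max_intermediate(a, xs):
    """Largest magnitude reached by any b_n (or the output) over sample points xs."""
    worst = 0.0
    for x in xs:
        b1, b2 = 0.0, 0.0
        for an in reversed(a[1:]):
            b1, b2 = an + 2 * x * b1 - b2, b1   # recurrence (19)
            worst = max(worst, abs(b1))
        worst = max(worst, abs(a[0] + x * b1 - b2))
    return worst

# Choose L just above the recorded maximum (e.g. the next power of 2).
a = [1.0, 0.0, -0.5, 0.25]
peak = max_intermediate(a, np.linspace(-1, 1, 1000))
L = 2 ** int(np.ceil(np.log2(peak)))
print(peak, L)
```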
5.3. Comparison with Bernstein Approximation
Of the various function approximation methods introduced in Section 2, only the Bernstein approximation [7,8] is capable of representing general functions.
Figure 6 shows the measured squared error between a given function and the two universal approximators, as a function of the number of stages $N$. The squared error is averaged over 200 values of $x$, evenly spaced across the domain of interest. The very long sequences of input bits minimise sampling noise and allow for an investigation of the inherent error of each design.
In Figure 6a, the function modelled in Figure 4b is reinvestigated. As shown in Figure 4, the Chebyshev approximation gives a good fit with 21 terms, corresponding to the mean squared error floor seen in Figure 6a. In contrast, even with many times more stages, the Bernstein approximation cannot reach this level of accuracy.

In Figure 6b, a much more complicated function is modelled, as specified in the figure caption. The Chebyshev approximator requires more stages to achieve a good fit. In contrast, the Bernstein approximator requires roughly an order of magnitude more to even approach this level of accuracy.
The examples discussed above were not cherry-picked; the relative difference in the required number of stages seems consistent for all functions of interest.
Interestingly, for very small values of
N, the Bernstein architecture outperforms the proposed method in terms of accuracy, as can be seen in
Figure 6. For such small values of
N, however, the approximation remains very poor for both architectures.
6. Discussion
The examples in
Section 5 clearly demonstrate the utility and performance of the proposed function approximation circuit. Some of the limitations and drawbacks are in turn discussed here.
The most notable drawback is the complexity of the $N + 1$ SDPs. The deterministic operations discussed in Section 3.6 are performed $N + 1$ times per $M$ output bits, the most significant of which is a signed integer addition. Noting from Section 5 that the most complicated functions can require upwards of 50 stages, and setting $M = 1024$, the signed addition is required around once every 20 output bits in the worst case.
Another source of complexity is the $N + 1$ Bernoulli source processes representing the coefficients $a_n$. Whether consumed from some high-entropy source or pseudo-randomly produced within the circuit itself, the Bernoulli processes represent significant resources. The Chebyshev circuit is not unique in this regard; the Bernstein circuit also requires such processes representing its coefficients, and in that case, $N$ can be at least an order of magnitude larger, as observed in Section 5.3.
In the following, we refer to the circuit proposed in this work as "Chebyshev" and the circuit proposed in [7] as "Bernstein". It is not possible to definitively rank the two circuit designs without considering the application, the technology used to implement the circuit, and the design constraints.
If the application is rate-limited, where as many output samples as possible are required in a given time, Chebyshev is clearly the best.
In a power-limited scenario where the source processes must be generated or conditioned in some way, Chebyshev likely consumes less power due to the greatly reduced number of Bernoulli processes representing coefficients. This will be offset somewhat by the relatively expensive processing required by Chebyshev every $M$ inputs.
If the goal is to minimise the circuit size, the answer, again, depends on the availability of the source processes and is similarly murky. The SDPs used to implement Chebyshev obviously require more logic gates than the simple 1-bit adders composing Bernstein. However, the smallest overall circuit depends on the relative size of the source processes and the fact that Bernstein requires so many more of them.
In requiring some deterministic addition operations, both approximations lose some of the tolerance to bit-flip errors characteristic of stochastic computing. For example, an error in the MSB of the accumulator in one of the SDPs would be far more damaging to accuracy than an error in one of the input bits. If the circuit were to be used in an environment or application where bit-flip errors are common, the deterministic portions of the circuit may require additional protection of some kind.
7. Conclusions
In this work, a new stochastic computing architecture for function approximation is proposed based on a partial sum Chebyshev expansion. The method can model arbitrarily complicated functions on $[-1, 1]$ with high accuracy using relatively few terms in the partial sum. Unlike other proposals in the literature, the method extends the domain over the negative numbers and has an adjustable range. The architecture is full-rate, in that an output sample is produced for every input sample.
As a core building block of the new architecture, a stochastic circuit is presented that computes the dot product of a bipolar vector with an arbitrary constant integer vector. This dot-product core has useful applications in and of itself.