Skew-Circulant-Matrix-Based Harmonic-Canceling Synthesizer for BIST Applications

Testing is an important part of the design flow in the semiconductor industry. Unfortunately, it also consumes up to half of the production cost. On-silicon stimulus generators and response analyzers can be integrated with the Device-Under-Test (DUT) to reduce production costs with a minimum increment in power and area consumption. This practice is known as the Built-In Self-Test (BIST). This work presents a single-tone generator for BIST applications that is based on the Harmonic-Canceling (HC) technique. The main idea is to cancel or filter out the harmonics of a square-wave signal in order to obtain a highly pure sine wave. The design challenges of this technique are the precise implementation of irrational coefficients in silicon and the strong dependence of the output’s linearity on the coefficients’ precision. In order to reduce this dependence, this work introduces an irrational coefficient generator that is based on the recursive use of special matrices called skew-circulant matrices (SCMs). A complete study of the SCM-based HC synthesizer, its properties, and the proposed implementation in 180 nm CMOS technology are presented. The measured results show that the proposed HC synthesizer is able to filter out up to the 47th harmonic of a given square wave and to generate signals from 0.8 to 100 MHz with a maximum Spurious-Free Dynamic Range (SFDR) of 66 dB.


Introduction
The semiconductor industry has evolved significantly since its creation in the 1950s. Nowadays, testing has proven to be a decisive stage of the production flow. However, testing can consume as much as 55% of the production cost [1]. Consequently, adding on-chip, self-testing capabilities to the Device-Under-Test (DUT), provided by signal generators and response analyzers, has become a practical solution known as the Built-In Self-Test (BIST) approach. In order to make this an efficient solution, the required circuitry must be small in area and consume low power relative to the DUT. A block diagram of a BIST system and the complementary optimization system is shown in Figure 1. The BIST system consists of the stimulus generator, the response analyzer, and an Analog-to-Digital converter (ADC). In order to characterize the DUT, several stimuli can be made available, such as sine wave (single-tone) generators [2][3][4][5][6][7][8][9][10][11][12][13][14][15], two-tone generators [16,17], etc. Complementarily, in order to study the DUT response, several on-chip analyzers have been proposed such as spectrum analyzers [18][19][20][21], linearity analyzers [22][23][24], etc. Based on the BIST path output, the optimization path is able to make a decision and feed back the corresponding tuning signals into the BIST path.
This work focuses on the stimulus generator block, specifically on the single-tone generator. In addition, it is an expanded version of a previous work [2]. For BIST applications, besides the low area and power requirements, this block's output must present high linearity. For instance, in order to characterize a 10-bit ADC, a sine wave with Total Harmonic [...]
This work proposes a programmable, high-order HC synthesizer that presents an irrational coefficient generator that ideally produces high-precision coefficients with no calibration scheme. This coefficient generator exploits the properties of a special family of matrices called skew-circulant matrices (SCMs) in a recursive approach. Its programmability allows the user to select the position of the non-cancelable harmonics, which are intrinsic to any HC synthesizer, in order to meet different linearity requirements. In addition, its high order reduces the complexity of the required additional low-pass filter (LPF) [12,13,15].
The document is organized as follows. Section 2 presents the mathematical background and classification of the HC synthesizer. Section 3 shows the relationship between the HC synthesizer and the SCMs. In addition, it presents the proposed SCM-based HCF and its properties. Next, a detailed circuit implementation is shown in Section 4. Sections 5-7 show the measurement results of the fabricated synthesizer, discussion, and conclusions, respectively.

Harmonic-Canceling Filter
The main concept behind this type of filter is the rejection of the harmonics of a specific input signal in order to obtain a highly pure sine wave at its output; hence, they can be used as single-tone generators. Due to their frequency behavior, digital nature, and relatively simple implementation, square waves (SWs) are considered as the filter's input in this work. Figure 3a presents the operation of an ideal HCF when it is driven by a 50% duty cycle SW with fundamental angular frequency $\omega_0 = 2\pi f_0$. The ideal output corresponds to a pure single-tone signal with period $T = 1/f_0$. Based on the Fourier series theory, any periodic signal $f(t)$ can be expressed as
$$f(t) = A_0 + \sum_{k=1}^{\infty}\left[A_k\cos(k\omega_0 t) + B_k\sin(k\omega_0 t)\right] \tag{1}$$
where $A_k$ and $B_k$ are the Fourier coefficients, and $\omega_0$ is the fundamental angular frequency of $f(t)$.
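If $M$ periodic signals $f(t)$ with weight $\alpha_i$, delay $d_i = \theta_i/\omega_0$, and no DC component are considered, the Fourier series of the resultant signal $f_{eq}(t)$ is
$$f_{eq}(t) = \sum_{i=1}^{M}\alpha_i f(t - d_i) = \sum_{k=1}^{\infty}\left[X_k\cos(k\omega_0 t) + Y_k\sin(k\omega_0 t)\right] \tag{2}$$
where its Fourier coefficients are
$$X_k = \sum_{i=1}^{M}\alpha_i\left[A_k\cos(k\theta_i) - B_k\sin(k\theta_i)\right] \tag{3}$$
$$Y_k = \sum_{i=1}^{M}\alpha_i\left[A_k\sin(k\theta_i) + B_k\cos(k\theta_i)\right] \tag{4}$$
The goal of an HCF is to eliminate $X_k$ and $Y_k$ for $k \geq 2$. In order to achieve this, from (3) and (4), there are two available degrees of freedom: $\alpha_i$ and $\theta_i$. Depending on which one is fixed, there are two approaches to implement an HCF, which are the constant-amplitude HCF and the constant-delay HCF. Figure 3b shows a generic block diagram of an HCF, which resembles a Finite Impulse Response (FIR) filter.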

Constant-Amplitude HCF
The basic implementation and transfer function $|H(f)|$ of the constant-amplitude or time-mode HCF are shown in Figure 4a,b, respectively. Its transfer function is equal to
$$|H(f)| = \left|1 + e^{-j2\pi f\tau_D}\right| = 2\left|\cos(\pi f\tau_D)\right| \tag{5}$$
where $f$ is the frequency in Hz. Interestingly, with only one delay element and a summer, the filter's transfer function presents nulls at odd multiples of $1/(2\tau_D)$. Therefore, considering the input $x(t)$ with period $T$, and setting $\tau_D = T/(2k)$, it is possible to cancel the odd multiples of the input's $k$-th harmonic. Consequently, by adding several time delays in a specific manner, more harmonics can be canceled. For example, if the 3rd and 5th harmonics are to be suppressed, the corresponding HCF transfer function is
$$|H(f)| = 4\left|\cos(\pi f T/6)\cos(\pi f T/10)\right| \tag{6}$$
Figure 4c,d show the block diagram and transfer function of this HCF, respectively. As expected, the odd multiples of the 3rd and 5th harmonics are canceled.
Unfortunately, the number of harmonics to be canceled is inversely proportional to the size of the required delay unit. For instance, a delay unit of T/1890 is needed to suppress the odd multiples of the 3rd, 5th, and 9th harmonics. This trade-off turns the constant-amplitude HCFs into an impractical solution for high-speed applications. Nonetheless, some solutions have combined constant-amplitude HCFs with passive filters and optimization algorithms to tackle this problem [10].
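The null positions described above can be checked numerically. The following is a minimal sketch, assuming each delay-and-sum stage has the response $1 + e^{-j2\pi f\tau_D}$ with $\tau_D = T/(2k)$; the function name `const_amp_hcf_gain` and the normalization $T = 1$ are illustrative choices, not part of the original design.

```python
import numpy as np

def const_amp_hcf_gain(f, T, ks):
    """|H(f)| of cascaded delay-and-sum stages, one stage per harmonic order in ks."""
    f = np.asarray(f, dtype=float)
    gain = np.ones_like(f)
    for k in ks:
        tau = T / (2 * k)                       # delay that nulls odd multiples of the k-th harmonic
        gain = gain * np.abs(1 + np.exp(-2j * np.pi * f * tau))
    return gain

T = 1.0                                         # normalized input period (assumption)
harmonics = np.arange(1, 32, 2) / T             # odd harmonics 1, 3, ..., 31
gain = const_amp_hcf_gain(harmonics, T, ks=(3, 5))
cancelled = [int(round(h * T)) for h, g in zip(harmonics, gain) if g < 1e-9]
print(cancelled)
```

Only the odd multiples of the 3rd and 5th harmonics fall on nulls; the fundamental and the 7th, 11th, 13th, ... harmonics pass with nonzero gain.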

Constant-Delay HCF
This type of filter is based on the concept of half-sine impulse response filters, which is shown in Figure 5a and was first proposed by [25]. Its transfer function is expressed as
$$|H(f)| = \frac{T}{\pi}\left|\frac{\cos(\pi f T/2)}{1 - (fT)^2}\right| \tag{7}$$
and is plotted in Figure 5b. This filter is able to suppress all the odd harmonics of the fundamental frequency $f_0 = 1/T$ of the SW input $x(t)$ with period $T$, providing a highly pure tone as its output. Recent publications have proposed practical implementations of this type of filter that use sampled versions of the half-sine impulse response [11][12][13][14][15][16]. If $n$ samples of the impulse response are taken every $\tau_d = T/(2n)$, the filter is able to suppress all the input's odd harmonics except those located at $(2ln \pm 1) f_0$ for $l = 1, 2, \ldots$. Every sample corresponds to a tap coefficient $\alpha_k$ expressed as
$$\alpha_k = \sin\left(\frac{k\pi}{n}\right), \quad k = 0, 1, \ldots, n-1 \tag{8}$$
This filter is also known as the n-tap HCF. Its transfer function is equal to
$$|H(f)| = \left|\sum_{k=0}^{n-1}\alpha_k e^{-j2\pi f k\tau_d}\right| \tag{9}$$
Figure 5c,d illustrate the sampled impulse response and the transfer function of the 4-tap HCF. It is clear that the transfer function is periodic with a period of $2n f_0 = 8 f_0$. Furthermore, Figure 5e shows its block diagram, SW input, and staircase sine-wave output. Since $\alpha_0 = 0$, only three coefficients and two delay units are required. Note that an irrational coefficient is used, and the 7th and 9th harmonics are non-cancelable due to the sampling operation. If the non-cancelable harmonics are required to be pushed to higher frequencies, it is necessary to increase the number of taps. At this point, a simple passive filter can attenuate them.
As discussed in this section, the sampled half-sine or constant-delay HCFs present advantages with respect to the constant-amplitude HCFs. For comparison purposes, an HCF that suppresses the 3rd and 5th harmonics is considered. On the one hand, a constant-delay 4-tap HCF requires a time step of $T/8$ and two unique coefficients. On the other hand, a constant-amplitude HCF requires a time step of $T/30$.
It is clear that the former can achieve the same performance with a larger time delay. However, this comes with the challenge of implementing irrational coefficients. Considering BIST applications that use moderate to high frequency ranges in the order of MHz, this work focuses on the constant-delay HCFs. In the next section, a recursive approach to implement this filter is presented.
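The location of the non-cancelable harmonics of an n-tap HCF can be illustrated with a short numerical sketch. It assumes an ideal 50% duty-cycle square wave (odd harmonics of amplitude $4/(\pi h)$), ideal taps $\alpha_k = \sin(k\pi/n)$, and ideal delays of $T/(2n)$; the function name `ntap_hcf_harmonics` is hypothetical.

```python
import numpy as np

def ntap_hcf_harmonics(n, kmax):
    """Output amplitude of each odd harmonic (up to kmax) for an ideal n-tap HCF."""
    k = np.arange(n)
    taps = np.sin(k * np.pi / n)                # alpha_k = sin(k*pi/n)
    out = {}
    for h in range(1, kmax + 1, 2):
        # tap k is delayed by k*T/(2n), i.e. a phase shift of h*k*pi/n at harmonic h
        filt = np.abs(np.sum(taps * np.exp(-1j * h * k * np.pi / n)))
        out[h] = filt * 4 / (np.pi * h)         # square-wave harmonic amplitude 4/(pi*h)
    return out

amps = ntap_hcf_harmonics(4, 19)
surviving = [h for h, a in amps.items() if a > 1e-9 * amps[1]]
print(surviving)
```

For the 4-tap case, the only harmonics that survive besides the fundamental are those at $(2ln \pm 1) f_0$, starting with the 7th and 9th.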

Matrix Representation of the HCF
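From this point on, a sampled half-sine HCF or constant-delay HCF is simply referred to as HCF. As presented in previous sections, an n-tap HCF requires $n$ input SWs and $n$ tap coefficients. Considering a 50% duty-cycle SW $\phi_i(t)$ with period $T$, the n-tap HCF needs $n$ versions of $\phi_i(t)$ with a delay of $\tau_D = T/(2n)$ with respect to each other. These are referred to as the input phases and can be expressed as
$$\phi_{i,k}(t) = \phi_i(t - k\tau_D), \quad k = 0, 1, \ldots, n-1 \tag{10}$$
Note that this set of SWs is periodic and odd symmetric. Hence, $\phi_{i,k+2n} = \phi_{i,k}$ and
$$\phi_{i,k+n} = -\phi_{i,k} \tag{11}$$
On the other hand, the tap coefficients $\alpha_k$ are given by (8). For an even $n$, it holds that $\alpha_0 = 0$, $\alpha_{n/2} = 1$ and $\alpha_k = \alpha_{n-k}$. In other words, the HCF is a linear-phase FIR filter; i.e., it provides a constant input-to-output group delay of $\tau_D \cdot (n/2)$. For this specific case, the HCF's output $\phi_{o,n/2}$ can be defined as
$$\phi_{o,n/2} = \sum_{k=0}^{n-1}\alpha_k\,\phi_{i,k} = \sum_{k=0}^{n-1}\sin\left(\frac{k\pi}{n}\right)\phi_{i,k} \tag{12}$$
Assuming that $n$ outputs with a group delay ranging from 0 to $(n-1)$ (in units of $\tau_D$) are required, the system can be expressed in matrix form as
$$\begin{bmatrix}\phi_{o,0}\\ \phi_{o,1}\\ \vdots\\ \phi_{o,n-1}\end{bmatrix} = \begin{bmatrix} s_0 & s_1 & \cdots & s_{n-1}\\ -s_{n-1} & s_0 & \cdots & s_{n-2}\\ \vdots & \vdots & \ddots & \vdots\\ -s_1 & -s_2 & \cdots & s_0\end{bmatrix}\begin{bmatrix}\phi_{i,0}\\ \phi_{i,1}\\ \vdots\\ \phi_{i,n-1}\end{bmatrix}, \quad s_k = \cos\left(\frac{k\pi}{n}\right) \tag{13}$$
or in compact notation, $\Phi_o = A_i\Phi_i$, where $\Phi_i$, $\Phi_o$, and $A_i$ are the input phase vector, output phase vector, and the coefficients matrix, respectively. The element of $A_i$ in row $m$ and column $k$ is $\cos((k-m)\pi/n)$, so row $m = n/2$ reproduces (12). Interestingly, $A_i$ corresponds to a special matrix type called Skew-Circulant Matrix (SCM). An $n \times n$ SCM $S_n$ is a matrix in which each row is a right cyclic shift of the previous one and the elements that wrap around below the main diagonal change sign [26]. Consequently, it is completely defined by the elements of its first row as $S_n = \mathrm{scirc}(s_0, s_1, \ldots, s_{n-1})$. Another feature of the SCMs is that their eigenvectors $y_m$ only depend on their order $n$ and can be expressed as
$$y_m = \frac{1}{\sqrt{n}}\left[1, \sigma_m, \sigma_m^2, \ldots, \sigma_m^{n-1}\right]^T, \quad \sigma_m = e^{j\pi(2m+1)/n}, \quad m = 0, 1, \ldots, n-1 \tag{14}$$
where $j$ is the unit imaginary number and $T$ is the transpose operator. In addition, the eigenvalues $\lambda_m$ of $S_n$ are
$$\lambda_m = \sum_{k=0}^{n-1}s_k\,\sigma_m^k = \sum_{k=0}^{n-1}s_k\,e^{j\pi(2m+1)k/n} \tag{15}$$
Considering the eigenvalues and eigenvectors of $S_n$, its eigen decomposition is expressed as $S_n = U\Lambda U^*$, where $U = [y_0 | y_1 | \ldots | y_{n-1}]$, $\Lambda = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{n-1})$ and $U^*$ is the conjugate transpose of $U$. Based on these properties, all SCMs of the same order $n$ share the same eigenvectors; hence, the same matrix $U$.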
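The SCM structure and its eigen decomposition can be verified with a few lines of NumPy. This is a sketch under stated assumptions: the helper names `scirc` and `scm_eig` are illustrative, and the eigenvector convention $\sigma_m = e^{j\pi(2m+1)/n}$ is one valid choice (the complex-conjugate convention works equally well).

```python
import numpy as np

def scirc(first_row):
    """Skew-circulant matrix: each row is a right cyclic shift, wrapped entries are negated."""
    s = np.asarray(first_row, dtype=float)
    n = len(s)
    return np.array([[s[c - r] if c >= r else -s[n + c - r] for c in range(n)]
                     for r in range(n)])

def scm_eig(first_row):
    """Closed-form eigenvectors (columns of U) and eigenvalues of scirc(first_row)."""
    s = np.asarray(first_row, dtype=float)
    n = len(s)
    k = np.arange(n)
    sigma = np.exp(1j * np.pi * (2 * np.arange(n) + 1) / n)   # odd 2n-th roots of unity
    powers = sigma[None, :] ** k[:, None]                     # powers[k, m] = sigma_m**k
    U = powers / np.sqrt(n)
    lam = (s[:, None] * powers).sum(axis=0)
    return U, lam

n = 8
row = np.cos(np.arange(n) * np.pi / n)          # first row of the n-tap HCF matrix A_i
A_i = scirc(row)
U, lam = scm_eig(row)
print(np.round(lam.real, 6))
```

For the HCF matrix, only the first and last eigenvalues are nonzero (both equal to $n/2$), which is the property exploited by the coefficient generator.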

HCF with Multi-Stage Open-Loop SCM-Based Coefficient Generator
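As shown in (13), an n-tap HCF can be represented by an SCM $A_i$ such that
$$A_i = \mathrm{scirc}(s_0, s_1, \ldots, s_{n-1}) \tag{16}$$
where $s_k = \cos(k\pi/n)$ for $k = 0, 1, \ldots, n-1$. For this case, it is proven in Appendix A that the eigenvalues of $A_i$ are equal to
$$\lambda_m = \begin{cases} n/2, & m = 0,\ n-1\\ 0, & \text{otherwise}\end{cases} \tag{17}$$
Consider the normalized, even-order $n$ SCM $[A_i]$, and its eigen decomposition
$$[A_i] = \frac{A_i}{\|A_i\|} = U\Lambda_i U^*, \quad \Lambda_i = \mathrm{diag}(1, 0, \ldots, 0, 1) \tag{18}$$
where $\|A_i\|$ is the Euclidean norm of $A_i$. Furthermore, from (17), it follows that matrix $[A_i]$ is idempotent; i.e., $[A_i]^M = [A_i]$ for any integer $M \geq 1$.
For practical implementations, the main drawback of $[A_i]$ is that its elements can be irrational numbers. In order to avoid this, matrix $A$ is defined such that
$$A = \mathrm{scirc}(\hat{s}_0, \hat{s}_1, \ldots, \hat{s}_{n-1}) \tag{19}$$
where $\hat{s}_k = \mathrm{sgn}(s_k)$, and $\mathrm{sgn}(x)$ is the sign function. In this fashion, $A$ is an integer-coefficient SCM. In Appendix B, it is proven that the eigenvalues of $A$ are given by
$$\lambda_m = (-1)^m\cot\left(\frac{(2m+1)\pi}{2n}\right) \tag{20}$$
Its normalized version $[A]$ presents an eigen decomposition equal to
$$[A] = \frac{A}{\|A\|} = U\Lambda U^* \tag{21}$$
Interestingly, using (20), $\Lambda = \mathrm{diag}(1, \epsilon_1, \ldots, \epsilon_{n-2}, 1)$ where $\epsilon_m = \lambda_m/\max(|\lambda_m|)$ and $|\epsilon_m| < 1$. Based on this property, and recalling that all SCMs of the same order $n$ share the same eigenvectors, if $M$ replicas of $[A]$ are cascaded, then
$$[A]^M = U\Lambda^M U^* \tag{22}$$
where
$$\Lambda^M = \mathrm{diag}(1, \epsilon_1^M, \ldots, \epsilon_{n-2}^M, 1) \rightarrow \Lambda_i \quad \text{as } M \rightarrow \infty \tag{23}$$
Therefore, a cascade of $M$ normalized, even-order $n$, integer-coefficient SCMs $[A]$ can be used to approximate an irrational-coefficient SCM $[A_i]$, as shown in Figure 6a. In addition, Figure 6b shows the eigenvalues of the resultant SCM for different values of $M$ and $n = 6$. Note that the intermediate eigenvalues decrease as $M$ increases. In other words, these intermediate eigenvalues can be considered as the error of the integer-coefficient SCM. It is important to note that the reason for using normalized matrices is that the outputs are bounded to the absolute magnitude of the input phases.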
Since only one HCF's output is required, the system architecture can be modified as shown in Figure 6c where the coefficients and phases generation processes are independent from each other. This improved approach allows that coefficients can be generated from a vector of DC signals C 0 = [1, 0, . . . , 0] T and the phases present a faster path to the output, reducing potential phase errors. Nonetheless, this comes with the need for a combiner block.
Note that even if the challenge of using an irrational-coefficient-based SCM is met, it appears to be moved to the norm $\|A\|$, since it can now be an irrational number. It can be proven that $\|A_n\|^{-1} = \tan(\pi/(2n))$. However, since this value scales the complete matrix $A$, it does not affect the coefficients' ratios relative to each other; i.e., it can be considered as a gain error. In this work, the approximation $\|A_n\|^{-1} \approx 8/(5n)$ is used.
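The convergence of a cascade of integer-coefficient SCMs toward the irrational-coefficient SCM can be observed numerically. The sketch below assumes $n = 6$, the ideal norm (computed as the spectral norm), and an illustrative helper `scirc`; it is a system-level check, not a model of the circuit.

```python
import numpy as np

def scirc(first_row):
    """Skew-circulant matrix: each row is a right cyclic shift, wrapped entries are negated."""
    s = np.asarray(first_row, dtype=float)
    n = len(s)
    return np.array([[s[c - r] if c >= r else -s[n + c - r] for c in range(n)]
                     for r in range(n)])

n = 6
k = np.arange(n)
s = np.cos(k * np.pi / n)
A_i = scirc(s)                                  # irrational-coefficient SCM
A = scirc(np.sign(np.round(s, 12)))             # integer SCM; rounding maps cos(pi/2) to exactly 0

A_i_norm = A_i / np.linalg.norm(A_i, 2)         # spectral (Euclidean) norm
A_norm = A / np.linalg.norm(A, 2)

# largest element-wise deviation between [A]^M and [A_i] for M = 1..5
errors = [np.max(np.abs(np.linalg.matrix_power(A_norm, M) - A_i_norm))
          for M in range(1, 6)]
print(np.linalg.norm(A, 2), 1 / np.tan(np.pi / (2 * n)))
print(errors)
```

The first printed pair shows that the norm of the integer matrix matches $1/\tan(\pi/(2n))$, and the error list shrinks with every additional stage.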

HCF with Single-Stage Closed-Loop SCM-Based Coefficient Generator
From (23), it is implied that if M→∞, the outputs of [A i ] and the cascade of [A] M are similar. This suggests the concept of the closed-loop SCM-based coefficient generator, which is presented in Figure 7a. Using the improved approach and at steady-state, the output vector C cl of the closed-loop coefficient generator is expressed as: where A f b = diag(0, 1, 1, . . . , 1), and C 0 = [1, 0, . . . , 0] T . This is correct only if the ideal matrix norm A is used. The use of the approximation A n −1 ≈ 8/5n affects the coefficients' relative ratio between each other; hence, it generates a systematic error.
In order to compare the performance of the multi-stage open-loop and single-stage closed-loop approaches, the spurious-free dynamic range (SFDR) of the filter's output is evaluated using a system-level model. The SFDR is calculated as the ratio of the power of the fundamental frequency and the strongest cancelable harmonic up to the $(2n-1)$-th harmonic. Figure 7b shows the values of SFDR for different n-tap SCM-based HCFs using $M$ open-loop stages and the closed-loop approach. It is observed that the closed-loop coefficient generator (CG) with $\|A\|^{-1} = 8/(5n)$ is capable of achieving similar SFDR values as a 5-stage open-loop CG for $n > 6$. Thus, the closed-loop CG with a non-ideal norm represents a less complex solution in comparison with the straightforward M-stage open-loop CG approach.
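A system-level SFDR estimate for the open-loop case can be sketched as follows. The model is an assumption made for illustration: ideal square-wave phases, coefficients taken as $[A]^M C_0$ with the ideal norm, and spurs searched over the odd cancelable harmonics from the 3rd to the $(2n-3)$-th. The closed-loop generator and the $8/(5n)$ approximation are not modeled, and `open_loop_sfdr_db` is a hypothetical name.

```python
import numpy as np

def scirc(first_row):
    """Skew-circulant matrix: each row is a right cyclic shift, wrapped entries are negated."""
    s = np.asarray(first_row, dtype=float)
    n = len(s)
    return np.array([[s[c - r] if c >= r else -s[n + c - r] for c in range(n)]
                     for r in range(n)])

def open_loop_sfdr_db(n, stages):
    """SFDR (dB) of an n-tap HCF whose coefficients come from `stages` cascaded [A] blocks."""
    k = np.arange(n)
    A = scirc(np.sign(np.round(np.cos(k * np.pi / n), 12)))
    A_norm = A * np.tan(np.pi / (2 * n))        # ideal norm, ||A||^-1 = tan(pi/(2n))
    c0 = np.zeros(n)
    c0[0] = 1.0                                 # DC input vector C_0 = [1, 0, ..., 0]^T
    coeff = np.linalg.matrix_power(A_norm, stages) @ c0

    def amp(h):
        # harmonic h of the square wave (4/(pi*h)) weighted by the tap response
        return 4 / (np.pi * h) * abs(np.sum(coeff * np.exp(-1j * h * k * np.pi / n)))

    worst_spur = max(amp(h) for h in range(3, 2 * n - 1, 2))
    return 20 * np.log10(amp(1) / worst_spur)

sfdr = [open_loop_sfdr_db(6, M) for M in range(1, 6)]
print(np.round(sfdr, 2))
```

Under these assumptions, each added stage attenuates the residual harmonics by a fixed factor, so the ideal SFDR grows linearly in dB with the number of stages.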

High-Order HCF
As introduced in [16], a high-order n-tap HCF can be implemented by cascading lower-order n 1 -tap and n 2 -tap HCFs (Figure 8). A formal proof is shown in this section.
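In order to use both HCFs, $n$ input phases equally spaced by $\pi/n$ are required such that $n = \mathrm{lcm}(n_1, n_2)$, where $\mathrm{lcm}(\cdot)$ is the least common multiple operator. For the first stage to properly operate, $n/n_1$ parallel $n_1$-tap HCFs are needed. The phases are distributed based on a perfect shuffle permutation $P_{n/n_1}$, where
$$P_r = I_n([1{:}r{:}n,\ 2{:}r{:}n,\ \ldots,\ r{:}r{:}n],\ :) \tag{25}$$
$n = s \times r$, and $I_n$ is the $n \times n$ identity matrix. The MATLAB colon notation to designate submatrices is used. At the output of the $n_1$-tap HCFs, a perfect shuffle permutator $P_{n_1}$ is required to reorganize the output phases back to their original order. A similar process is done for the $n_2$-tap HCF. For each stage, these operations can be expressed as
$$\Phi_a = P_{n_1}\left(I_{n/n_1} \otimes A_{n_1 \times n_1}\right)P_{n/n_1}\Phi_i, \quad \Phi_o = P_{n_2}\left(I_{n/n_2} \otimes A_{n_2 \times n_2}\right)P_{n/n_2}\Phi_a \tag{26}$$
where $\otimes$ is the Kronecker product operator. For $X_{m \times n} = (x_{ij})_{i=1,\ldots,m;\,j=1,\ldots,n}$ and $Y_{p \times q} = (y_{hk})_{h=1,\ldots,p;\,k=1,\ldots,q}$, their Kronecker product is the $mp \times nq$ matrix given by
$$X \otimes Y = \begin{bmatrix} x_{11}Y & \cdots & x_{1n}Y\\ \vdots & \ddots & \vdots\\ x_{m1}Y & \cdots & x_{mn}Y\end{bmatrix} \tag{27}$$
Based on the properties of the Kronecker product, (26) can be simplified to:
$$\Phi_o = \left(A_{n_2 \times n_2} \otimes I_{n/n_2}\right)\left(A_{n_1 \times n_1} \otimes I_{n/n_1}\right)\Phi_i \tag{28}$$
As derived in Appendix C, matrix $(A_{n_2 \times n_2} \otimes I_{n/n_2})(A_{n_1 \times n_1} \otimes I_{n/n_1})$ is simply a scaled version of $A_{n \times n}$ if and only if $\gcd(n_1, n_2) > 1$, and it is equal to
$$\left(A_{n_2 \times n_2} \otimes I_{n/n_2}\right)\left(A_{n_1 \times n_1} \otimes I_{n/n_1}\right) = \frac{\gcd(n_1, n_2)}{2}A_{n \times n} \tag{29}$$
where $\gcd(\cdot)$ is the greatest common divisor operator. Hence, the cascade of the $n_1$-tap and $n_2$-tap HCFs is equivalent to an HCF of order $n = \mathrm{lcm}(n_1, n_2)$ if and only if $n_1$ and $n_2$ have a common factor; i.e., $\gcd(n_1, n_2) > 1$.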
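The cascade property can be checked numerically for the 8-tap and 6-tap pair used later in the implementation. The sketch below assumes the HCF matrix has elements $\cos((k-m)\pi/n)$ and uses a zero-based stride permutation as the perfect shuffle; the helper names `hcf_matrix`, `perfect_shuffle`, and `cascade` are illustrative.

```python
import numpy as np
from math import gcd

def hcf_matrix(n):
    """n-tap HCF matrix with element (m, k) equal to cos((k - m) * pi / n)."""
    idx = np.arange(n)
    return np.cos((idx[None, :] - idx[:, None]) * np.pi / n)

def perfect_shuffle(n, r):
    """Stride-r permutation matrix: rows of I_n taken in the order 0:r:n, 1:r:n, ..."""
    order = np.concatenate([np.arange(b, n, r) for b in range(r)])
    return np.eye(n)[order, :]

def cascade(n1, n2):
    """Matrix of an n1-tap stage followed by an n2-tap stage on n = lcm(n1, n2) phases."""
    n = n1 * n2 // gcd(n1, n2)
    stage1 = np.kron(hcf_matrix(n1), np.eye(n // n1))
    stage2 = np.kron(hcf_matrix(n2), np.eye(n // n2))
    return stage2 @ stage1

# 8-tap and 6-tap stages share a common factor, so the cascade is a scaled 24-tap HCF
ok_24 = np.allclose(cascade(8, 6), gcd(8, 6) / 2 * hcf_matrix(24))
# coprime orders (3 and 4) do not produce a scaled 12-tap HCF
ok_12 = np.allclose(cascade(3, 4), gcd(3, 4) / 2 * hcf_matrix(12))
print(ok_24, ok_12)
```

The same helpers also confirm the shuffle identity behind the simplification: `perfect_shuffle(n, n1) @ np.kron(np.eye(n // n1), A) @ perfect_shuffle(n, n // n1)` equals `np.kron(A, np.eye(n // n1))`.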

Band-Pass HCF
As shown in Section 2, the objective of the half-sine HCF is to filter out all the harmonics of the input SW except its fundamental frequency. Nonetheless, it is possible to select the input's $m$-th harmonic, which gives rise to the band-pass HCF. Its impulse response $h_m(t)$ is given by
$$h_m(t) = \begin{cases}\sin(m\omega_0 t), & 0 \leq t \leq T/2\\ 0, & \text{otherwise}\end{cases} \tag{30}$$
Figure 9a shows a comparison between the basic and band-pass HCFs. If the $m$-th harmonic is to be bypassed to the output, then the HCF's impulse response presents $m$ half-sine segments.
For practical implementation, the impulse response is sampled at T/2n, where n > m to satisfy the Nyquist sampling theorem. Thus, for a given n-tap HCF, several band-pass HCFs can be obtained. Moreover, the sampled values h m [0, 1, . . . , n/2] are all different if m and 2n are relatively prime, i.e., their greatest common divisor is 1. Figure 9b shows several band-pass HCFs for n = 8. Note that h m [k] = sin(mkπ/n) is symmetric around k = n/2, and that the coefficients are similar for all the filters except that they present different orders and signs. Hence, assuming that the tap coefficients are available, it is possible to implement different band-pass HCFs by rearranging the tap coefficients accordingly.
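The claim that the band-pass taps are a reordered, sign-changed copy of the low-pass taps can be made concrete with a short sketch. It assumes ideal taps $h_m[k] = \sin(mk\pi/n)$; the helper names `bandpass_taps` and `as_signed_permutation` are hypothetical.

```python
import numpy as np

def bandpass_taps(n, m):
    """Sampled band-pass impulse response h_m[k] = sin(m*k*pi/n), k = 0..n-1."""
    return np.sin(m * np.arange(n) * np.pi / n)

def as_signed_permutation(n, m):
    """For each tap k of h_m, return (index, sign) of the matching tap of h_1."""
    mapping = []
    for k in range(n):
        j = (m * k) % (2 * n)                   # sin(j*pi/n) has period 2n in j
        sign = 1
        if j >= n:                              # sin(x + pi) = -sin(x)
            j, sign = j - n, -1
        mapping.append((j, sign))
    return mapping

n = 8
h1 = bandpass_taps(n, 1)
for m in (3, 5, 7):
    mapping = as_signed_permutation(n, m)
    rebuilt = np.array([sign * h1[j] for j, sign in mapping])
    print(m, np.allclose(rebuilt, bandpass_taps(n, m)), [j for j, _ in mapping])
```

For $n = 8$ and odd $m$, every low-pass tap index appears exactly once in the mapping, so no new coefficient values are needed.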

System Architecture
In this work, a reconfigurable, SCM-based, 24-tap HCF is implemented. This filter is able to cancel up to the 47th harmonic of the SW signal $\phi(t)$ with frequency $f_{CLK}/48$. In other words, this HCF is used as a single-tone generator that produces a stepwise sine-wave differential current signal with frequency $f_o = \max(f_{CLK})/48$. Figure 10a shows its impulse response $h(t)$, which corresponds to a cosine function $\cos(\pi k/24)$. It is noted that the coefficients related to $\phi_{2r-2}$, $r = 1, 2, \ldots, 12$ and $\phi_{4r-4}$, $r = 1, 2, \ldots, 6$ correspond to the 12-tap and 6-tap HCFs, respectively. Thus, by selecting specific phases, the 24-tap, 12-tap, and 6-tap HCFs are available. This feature allows extending the maximum frequency of the output signal to $f_o = \max(f_{CLK})/12$. Figure 10b shows the block diagram of the complete system, which is divided into four main blocks: the frequency divider, the phase scrambler, the retimer and buffer, and the 24-tap HCF core. The frequency divider generates the 24 equally spaced phases $\phi_d[0{:}23]$ from a clock signal CLK with programmable frequency division ratios in order to select between the 24-tap, 12-tap, and 6-tap HCFs. The phase scrambler allows for the rearrangement of the phases such that it can bypass the fundamental or the 5th harmonic of the input to its output. The 24-tap HCF core is divided into the CG and the combiner. In order to achieve the required SCM order, 8-tap and 6-tap SCM-based CGs are used in cascade. All the required coefficients are generated using only one input DC current $I_{in}$. By means of a combiner, the system produces the differential output current $I_o$, which is converted to voltage by the load resistors $R_L$. Each block is presented in detail in the next subsections.

Frequency Divider
The frequency divider (FD) is shown in Figure 11a. The 24 equally spaced phases are generated from the input clock signal CLK by a variable-length ring counter, which is based on a cascade of D flip-flops (DFFs). The outputs of this counter are $Q_k$, for $k = 0, 1, \ldots, 23$. Depending on the value of the input DIV ∈ {1, 2, 3}, the output $Q_5$, $Q_{11}$, or $Q_{23}$ is fed back to the input of the first DFF by an inverting feedback multiplexer, providing a frequency division ratio of 12, 24, or 48, respectively.
The bus signal $Q$ is connected to a phase selector with output $\Phi_d$. Depending on the value of DIV, each signal $\phi_d[k]$ is connected to $Q_{\lfloor k/4 \rfloor}$, $Q_{\lfloor k/2 \rfloor}$, or $Q_k$. Figure 11b shows the FD's output phase pattern for each value of DIV. For example, for DIV = 2, every two consecutive phases are connected; i.e., the corresponding coefficients are connected in parallel. In this fashion, the number of tap coefficients is kept constant for all available HCFs; hence, all the HCFs present the same output peak-to-peak amplitude.
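The division ratios and the phase-selector mapping can be illustrated with a behavioral model. This is a sketch only: it assumes an inverting-feedback (twisted-ring) shift register that starts from the all-zero state, and it assumes the selector index is the integer part of $k/4$ or $k/2$; the names `ring_counter_period`, `ACTIVE_DFFS`, and `phase_selector` are hypothetical.

```python
def ring_counter_period(length, cycles=200):
    """Period (in CLK cycles) of Q0 for a shift register with inverting feedback."""
    state = [0] * length
    q0 = []
    for _ in range(cycles):
        # the inverted output of the last active DFF feeds the first DFF
        state = [1 - state[-1]] + state[:-1]
        q0.append(state[0])
    rising = [i for i in range(1, cycles) if q0[i - 1] == 0 and q0[i] == 1]
    return rising[1] - rising[0]

# DIV -> number of active DFFs (feedback taken from Q5, Q11, or Q23)
ACTIVE_DFFS = {1: 6, 2: 12, 3: 24}

def phase_selector(div):
    """Index of the counter output Q driving each of the 24 selector outputs."""
    shift = 3 - div                              # DIV=1: k//4, DIV=2: k//2, DIV=3: k
    return [k >> shift for k in range(24)]

for div, length in ACTIVE_DFFS.items():
    print(div, ring_counter_period(length), phase_selector(div)[:8])
```

With this mapping, DIV = 1 only uses $Q_0$ to $Q_5$ and DIV = 2 only uses $Q_0$ to $Q_{11}$, which matches the number of active DFFs in each mode.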

Phase Scrambler
As shown in Section 3.5, the proposed HCF can be configured to bypass an input signal's harmonic different from the fundamental frequency by rearranging its coefficients or phases. The latter approach is chosen due to its lower implementation complexity based on digital multiplexers. Figure 12a shows the implementation of the phase scrambler (PS). Depending on the value of H ∈ {0, 1}, the fundamental frequency or the 5th harmonic of φ d [k] are bypassed to the filter's output, respectively. Note that 5 is coprime with 2n for the three available HCFs. Then, it is true that the tap coefficients of the bandpass HCF h 5 [k] = cos(5kπ/n) are similar to those of the low-pass HCF h 1 [k] = cos(kπ/n) but with a different order and sign. Figure 12b presents the input-to-output connections.
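One way to see why rearranging phases selects the 5th harmonic is the following sketch. It is an assumption-based model, not the wiring of Figure 12b: coefficient $j$ (value $\cos(j\pi/n)$) is connected to the phase with index $5^{-1} j \bmod 2n$, where indices $n$ to $2n-1$ denote the inverted phases. The names `scrambled_phase_index` and `harmonic_gain` are hypothetical, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
import numpy as np

def scrambled_phase_index(j, n, harmonic=5):
    """Phase index (0..2n-1) wired to coefficient j so that `harmonic` is selected."""
    inv = pow(harmonic, -1, 2 * n)              # modular inverse; needs gcd(harmonic, 2n) == 1
    return (inv * j) % (2 * n)

def harmonic_gain(n, h, harmonic=5):
    """Tap response seen by input harmonic h after the phases are rearranged."""
    j = np.arange(n)
    coeff = np.cos(j * np.pi / n)
    idx = np.array([scrambled_phase_index(x, n, harmonic) for x in j])
    # phase idx is the square wave delayed by idx*T/(2n): phase shift h*idx*pi/n at harmonic h
    return abs(np.sum(coeff * np.exp(-1j * h * idx * np.pi / n)))

for n in (6, 12, 24):
    print(n, [round(harmonic_gain(n, h), 6) for h in (1, 3, 5)])
```

Under this model, the fundamental and the 3rd harmonic are nulled while the 5th harmonic passes with a gain of $n/2$ for all three filter orders.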

Retimer and Buffer
The required routing and operation of the phase selector and phase scrambler introduce phase errors. These are reduced by sampling the phase scrambler outputs φ s [k] at the rising edge of the input clock CLK. This is done by an array of DFFs. Each of them provides an inverted version of each phase. The output of the retimer and buffer (R&B) is the bus Φ, where each signal φ[k] = − φ[k + 24] for k = 0, 1, . . . , 23. This work does not present any additional phase calibration scheme.

24-Tap HCF Core
The required tap coefficients of the 24-tap HCF are generated by cascading the 8-tap and 6-tap CGs. Once these coefficients are available, they need to be combined with the phases accordingly in order to produce the system's output. These operations are performed by the 24-tap HCF core.
The quarter-wave symmetry of the cosine function is used to reduce the implementation complexity of the CGs. In other words, by taking advantage of the SCM's symmetry around $\alpha_{n/2} = 1$, any given even-order $n \times n$ SCM $[A_n]$ can be expressed as an $n/2 \times n/2$ SCM $[A_{nr}]$. This reduced matrix contains the information related to only one quadrant of the cosine function. Using this property, matrix $[A_8] = \|A_8\|^{-1}\,\mathrm{scirc}(1, 1, 1, 1, 0, -1, -1, -1)$ can be reduced to a $4 \times 4$ matrix $[A_{8r}]$. In addition, $[A_6] = \|A_6\|^{-1}\,\mathrm{scirc}(1, 1, 1, 0, -1, -1)$ can be reduced even further to $[A_{6r}]$, considering that it produces the coefficients 0.5 and 1 (= 0.5 × 2).
Figure 13a shows the 24-tap HCF core block diagram. Based on the improved implementation presented in Section 3.3, input vector $C_0 = (1, 0, \ldots, 0)$ is used; i.e., a single input current $I_{in}$ is required to generate all the current-mode coefficients. The 8-tap CG implements the reduced SCM $[A_{8r}]$. It produces four output currents whose relative ratios with respect to each other correspond to the coefficients 0.5, 0.923, 0.707, and 0.382. Each of these outputs is connected to four 6-tap CGs, which in turn implement the SCM $[A_{6r}]$ and produce eight replicas of the currents $I_a$ and $I_b$ such that $I_a : I_b = 1 : 0.866$.
The connection between the phases and coefficients is shown in Figure 13b. The absolute value and sign of the coefficients related to the 6-tap CG are color-coded. Each of them is scaled in the shown order by the 8-tap CG coefficients associated with each row. Moreover, each row shows the order of the phases connected to each 6-tap combiner unit. It is important to mention that the time delay between two consecutive combiner subcells of each row is $4T/48 = T/12$, that is, the unit delay of the 6-tap HCF, whereas the time delay between each row and the one below is $3T/48 = T/16$, which is the unit delay of the 8-tap HCF. In this way, all the phases present the same load, which reduces the systematic phase mismatch that limits the filter's performance. Next, the resultant coefficient $\alpha_k$ corresponding to the sum of elements of the $k$-th column is multiplied by the corresponding phase. Finally, the output $I_o$ is equal to the sum of all $\alpha_k\phi_k$ products.
The circuit-level implementation of the 6-tap CG is shown in Figure 14a. It implements a cascade of three stages of matrix $[A_{6r}]$ along with its norm $\|A_6\|$ based on NMOS current mirrors (CMs). As presented in Section 3.3, the first stage is connected in a closed loop in order to achieve a filter's output with SFDR > 70 dB. In this work, the number of SCM stages is set to three due to a trade-off between the coefficient accuracy and area overhead. The PMOS CMs are used to transport the currents from stage to stage. The last PMOS CM provides eight copies of currents $I_a$ and $I_b$. The same approach is used to implement the 8-tap CG, as shown in Figure 14b. The implementation of the combiner unit is shown in Figure 14c. It is divided into twelve differential pairs and uses four copies of $I_a$ and $I_b$ that are connected as tail currents. In addition, six phases $CK_{0:5}$, each with its corresponding inverted version, are used to steer the input currents according to the pattern presented in Figure 13b. If a negative sign is required, the differential clock is connected in opposite polarity. In this way, each section of the combiner inside the colored rectangles corresponds to one of the 6-tap coefficients; i.e., 0.5, 0.866, 1, 0.866, 0.5, and 0.
Figure 14. Circuit-level implementation of (a) 6-tap CG, (b) 8-tap CG, and (c) 6-tap combiner unit.

Measurement Results
The proposed single-tone generator is fabricated in 180 nm CMOS technology, operates with a supply voltage of 1.8 V, and occupies an area of 0.505 mm 2 . Its micrograph is shown in Figure 15 along with the area occupied by each sector and its corresponding percentage with respect to the total area. The CGs occupy around 70% of the total area, since they are composed of a large amount of CMs. Furthermore, these CMs use large transistors in order to reduce their current-ratio mismatch, i.e., to improve the coefficients' precision. In a CMOS process, the mismatch between two nominally identical transistors is inversely proportional to their channel length. Furthermore, recall that due to the recursive nature of the proposed solution, several identical blocks are required in order to obtain a specific SFDR, increasing the occupied area even further. In addition, the uncoupling of the phase generator from the coefficient generator contributes to the area cost. As presented in Section 4, the system incorporates six HCFs, which are selectable based on the value of the inputs n ∈ {6, 12, 24} and H ∈ {0, 1}. The former selects between the 6-tap, 12-tap, or 24-tap HCFs, and the latter selects between the fundamental or 5th harmonic of the SW signal φ(t) with frequency f CLK /2n. Figure 16 shows the measurement setup. The clock signal CLK with frequency f CLK is provided by an Agilent E8267D vector signal generator. The input current I in is set by a variable resistor. The differential output current I o is converted to voltage by the off-chip load resistors R L . Next, this signal is buffered and converted to single-ended by the LTC6417 and TC1-1TX+, respectively. Finally, the resulting signal is analyzed using the Agilent DSA91304A Infiniium digital signal analyzer.  Figure 17a shows the measured power consumption of each block versus the output frequency f o of the 24-tap HCF when the fundamental frequency of φ(t), f CLK /48, is of interest or H = 0. 
Since the CGs only carry DC currents, their power consumption is independent of frequency. Furthermore, these currents are fed to the unit combiners, which steer them according to the pattern shown in Figure 13; hence, the combiner's power consumption is also constant. Due to their digital nature, the FD, PS, and R&B blocks consume power proportional to the output frequency. In addition, Figure 17b shows the total power consumption of the 6-tap, 12-tap, and 24-tap HCFs versus the output frequency when H = 0. These results show that the slopes of the curves are proportional to the filter's order. This difference is mainly dictated by the fully digital blocks FD, PS, and R&B, especially the former, which enables only the required $n$ DFFs. The SFDR versus output frequency is shown in Figure 17c,d, for H = 0 and H = 1, respectively. It is noted that the SFDR decreases as the output frequency increases. This is due to the increasing phase error from the FD that causes even harmonics to show at the output [14]. Only the waveforms that present even harmonics with lower power than the odd cancelable harmonics are considered. Since the working frequency of the FD is greater for H = 0 than for H = 1, smaller SFDR values are obtained for H = 1. Figure 18a,b show the output's waveform and power spectral density (PSD) of the 24-tap HCF, respectively, when H = 0. The obtained staircase sine-wave waveform presents the first pair of non-cancelable harmonics at $47 f_o$ and $49 f_o$, which can be suppressed with a low-order passive LPF [12,13,15]. In addition, Figure 18c [...]
Table 1 summarizes the performance of the six HCFs proposed in this work and compares them to previous works. The Figure of Merit (FoM) used in this work is given by

where f o,max is the maximum output frequency, SFDR best is the highest measured SFDR, AF is the number of available filters, FNCH is the first non-cancelable harmonic, P total is the maximum total power consumption, and A is the area. This FoM is based on the one used by [13,14] with the addition that it accounts for the programmability and the harmonic-canceling range of the system. In this fashion, the number of implemented HCFs in the same area, i.e., the system's area efficiency, is included in the FoM. On the other hand, recall that an external LPF is still required at the output of the HC-based generators due to the presence of the non-cancelable harmonics at (2n ± 1) f o . The order (and therefore, the complexity and power consumption) of the required external LPF is inversely proportional to the order n of the HCF. For this reason, it is relevant to include the FNCH in the FoM. In summary, this work presents the only programmable HCF and the highest-order HCF. The 24-tap HCF allows the cancellation up to the 47th harmonic of the SW signal φ(t), which is the highest FNCH reported to the best knowledge of the authors. It also implements the first band-pass HCF. The proposed SCM-based HCFs provide SFDR and power consumption values comparable to previous works that use calibration techniques. For this work, the calculated FoM only includes the three HCFs when H = 0. Considering the FoM values, this work performs better than most of the previous works except [13] only after it uses calibration.

Discussion
In the presented analysis, only ideal SCM elements and equally spaced SWs are considered. Therefore, it does not include non-idealities such as coefficient mismatch due to variations during fabrication or phase errors produced by the FD, PS, and R&B blocks. Under ideal conditions, as shown in Figure 7, the SFDR of the output signal increases as the number of SCM stages, $M$, increases, for a given HCF order $n$. Unfortunately, as presented in [16], non-idealities set a maximum limit for the output linearity. In other words, it is expected that the SFDR saturates and remains constant regardless of the number of SCM stages. This is reflected in the measured SFDR values, which are lower than expected from the ideal analysis. For this reason, a statistical analysis is required to optimize the HCF design in a future work. For instance, a model of the proposed HCF that considers the standard deviation of the CMs and the phase errors can be used to evaluate the trade-off between phase error, coefficient precision, and SFDR.
The use of a first-order approximation of the matrix norm, $\|A\|^{-1} = 8/(5n)$, is another source of SFDR limitation. Nonetheless, a better approximation requires a ratio of larger integers. For instance, consider the HCF of order $n = 6$. Its ideal norm $\|A\|_i^{-1} = \tan(\pi/12) \approx 0.2679$ is approximated as $\|A\|^{-1} = 8/30 \approx 0.2667$. The next set of integer numbers, the ratio of which is closer to $\|A\|_i^{-1}$, is $15/56 \approx 0.2678$. The use of 15 and 56 in the matrix norm implementation implies the use of more unit transistors and a more complex device layout, i.e., more error sources that affect the SFDR.
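The size of the gain error introduced by each rational approximation can be quantified with a short calculation. This is a minimal sketch for $n = 6$; the function name `norm_inverse_error` is illustrative.

```python
import math
from fractions import Fraction

def norm_inverse_error(n, approx):
    """Relative error of a rational approximation to tan(pi/(2n))."""
    ideal = math.tan(math.pi / (2 * n))
    return abs(float(approx) - ideal) / ideal

n = 6
first_order = Fraction(8, 5 * n)                # 8/(5n), reduces to 4/15 for n = 6
finer = Fraction(15, 56)

err_first = norm_inverse_error(n, first_order)
err_finer = norm_inverse_error(n, finer)
print(first_order, f"{err_first:.2e}")
print(finer, f"{err_finer:.2e}")
```

The first-order ratio is off by roughly half a percent, while 15/56 reduces the error by more than an order of magnitude at the cost of larger integer ratios in the current mirrors.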
In order to increase the output frequency range, the phase error produced by the FD, PS, and R&B blocks must be reduced. Note that these blocks operate at 2n f o . This is the main reason for the difference between the frequency ranges of the 6-tap, 12-tap, and 24-tap HCFs. In order to reduce the phase error in a future work, a delay error correction mechanism would be required. This can be provided by a Delay-Locked Loop (DLL) that generates the required phases with a negative feedback loop.

Conclusions
In this work, a harmonic-canceling single-tone synthesizer that uses an SCM-based coefficient generator for BIST applications is proposed. This coefficient generator produces irrational coefficients from integer numbers in a recursive approach with no calibration scheme. Measured SFDR values prove the effectiveness of the proposed SCM-based coefficient generator architecture, since they are comparable with those of previous works that use calibration. The selectable 24-tap, 12-tap, and 6-tap HCFs are implemented along with their band-pass versions. They cover a frequency range from 0.8 to 100 MHz and provide the highest number of operation modes and the highest first non-cancelable harmonic reported.

Acknowledgments: The authors would like to thank Edgar Sanchez-Sinencio for his inspiration and contribution to this work. His legacy will transcend in his family, friends and students. In addition, the authors would like to thank Silicon Labs and MOSIS for their contribution to this work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

BIST
Built-In Self