A New, Fast Pseudo-Random Pattern Generator for Advanced Logic Built-In Self-Test Structures

Abstract: Digital cores that are currently incorporated into advanced Systems on Chip (SoC) frequently include Logic Built-In Self-Test (LBIST) modules with the Self-Test Using MISR/Parallel Shift Register Sequence Generator (STUMPS) architecture. Such a solution always comprises a Pseudo-Random Pattern Generator (PRPG), usually designed as a Linear Feedback Shift Register (LFSR) with a phase shifter attached to the register and arranged as a network of XOR gates. This study discloses an original and innovative structure of such a PRPG unit referred to as the DT-LFSR-TPG module that needs no phase shifter. The module is designed as a set of identical linear registers of the DT-LFSR type with the same primitive polynomial. Each register has a form of a ring made up exclusively of D and T flip-flops. This study is focused on the investigation of those parameters of DT-LFSR registers that are essential to use these registers as components of PRPG modules. The investigated parameters include phase shifts and the correlation between sequences of bits appearing at outputs of T flip-flops, implementation cost, and the maximum frequency of the register operation. It is demonstrated that PRPG modules of the DT-LFSR-TPG type enable much higher phase shifts and substantially higher operation frequencies as compared to competitive solutions. Such modules can also drive significantly more scan paths than other PRPGs described in reference studies and based on phase shifters. However, the cost of the foregoing advantages of DT-LFSR-TPG modules is the larger hardware overhead associated with the implementation of the solution proposed.


Introduction
Digital cores that are currently incorporated into advanced Systems on Chip (SoC) usually include Logic Built-In Self-Test (Logic BIST, LBIST) modules. A very popular LBIST architecture operated in the test-per-scan mode is STUMPS (Self-Test Using MISR/Parallel Shift register sequence generator) [1,2] (pp. 288-290), [3] (pp. 176-177), [4] (pp. 522-525), or various versions of it [5][6][7][8][9]. The structural diagram of a typical LBIST module with the STUMPS architecture, depicted in Figure 1, includes two key components: a generator of pseudo-random tests, or more specifically a Pseudo-Random Pattern Generator (PRPG), and a Multi-Input Signature Register (MISR). The PRPG module is intended to supply pseudo-random sequences of bits simultaneously to all scan paths of a circuit to be tested, and these sequences are then delivered as parallel test vectors to inputs of combinational logic blocks of such a Circuit Under Test (CUT). A PRPG module is usually designed as a Linear Feedback Shift Register (LFSR) [3] (pp. 65-71, 176-177), although Cellular Automata (CA) can also be used for this purpose [3] (pp. 72-75), [10][11][12][13]. Some specific solutions may also use other types of linear registers as PRPG modules, for instance the Galois Linear Feedback Shift Register (GLFSR) [14][15][16]. Responses to test vectors gathered from outputs of a CUT are loaded in the parallel mode into scan paths, and then scan paths are switched to the serial mode and information from all scan paths is supplied to inputs of the MISR module, where it is subject to compaction (lossy compression). The final content of the MISR is referred to as the signature, which provides information on whether the CUT is faulty or not.
The substantial advantage of LBIST structures with the STUMPS architecture is that such solutions offer much shorter testing times as compared to conventional built-in self-test solutions operated in the test-per-scan mode, since pseudo-random sequences of bits are supplied simultaneously to a plurality of scan paths [3] (pp. 176-177). Although the STUMPS architecture appeared as early as the 1980s, it is still quite popular and constantly in use. There are numerous studies where researchers make efforts to improve the solution in terms of hardware overhead, testing time, power consumption, fault coverage, diagnosability, etc. [17][18][19][20][21][22][23][24][25][26]. Some variations of the STUMPS architecture are implemented in hybrid LBIST, where pseudo-random testing is combined with deterministic testing or test data compression [27][28][29][30][31][32].
The remainder of this study is structured as follows. Section 2 deals with some problems with fault coverage that may affect basic variants of the LBIST structure with the STUMPS architecture and explains the reasons for such problems. Section 3 outlines how PRPG modules are usually designed to remedy the aforementioned problems. New concepts for the implementation of PRPG modules in LBIST circuits with the STUMPS architecture are shown in Section 4, where the use of specific DT-LFSR ring linear registers is proposed. The innovative structure, operation principle, and key properties of DT-LFSR Test Pattern Generators (DT-LFSR-TPG), designed according to the concept shown in previous sections, are brought together in Section 5. The logic synthesis results for the new type of pseudo-random pattern generators are presented in Section 6. The study ends in Section 7 with a discussion of the results obtained, a recapitulation, and key conclusions.

Motivation
Requirements imposed on any Test Pattern Generator (TPG) usually include, on the one hand, the need to guarantee high coverage of faults that may occur in a CUT and, on the other hand, an acceptable testing time. Pseudo-random TPGs are frequently designed as conventional Linear Feedback Shift Registers (LFSR) based on primitive polynomials. The main cycle of the state diagram of such an n-bit register contains $2^n - 1$ states, i.e., all states except for the vector <00...0>. In addition, every LFSR register produces identical sequences of bits at each of its outputs, and these sequences are merely mutually shifted in phase. Every single sequence is of pseudo-random nature, i.e., meets the requirements stipulated in [33] for pseudo-random binary sequences. Unfortunately, in spite of the foregoing properties attributable to conventional LFSRs, using them as generators of pseudo-random test patterns may entail some problems with the fault coverage, which are strongly manifested for the STUMPS-type LBIST architecture. Namely, when a conventional LFSR with an external feedback loop (type I LFSR [34]) is implemented as a TPG in the mentioned LBIST architecture, some faults in the CUT may remain undetected due to an insufficient phase shift between bit sequences supplied to scan paths. The foregoing problem is illustrated by the example below.
Appl. Sci. 2021, 11, 9476
Example 1. Figure 2 depicts a fragment of a circuit to be tested by means of five scan paths. Serial inputs of these scan paths are fed from parallel outputs of a five-bit type I LFSR designed for the primitive polynomial $p(x) = x^5 + x^4 + x^2 + x + 1$. Outputs of that register produce identical pseudo-random sequences of bits but with a sequential phase shift. The figure depicts an example of content in the scan paths and the register, whilst dotted lines indicate interrelationships between contents of individual scan paths. Outputs of these scan path cells feed inputs of exemplary gates included in the CUT (gates G1-G4). To keep the picture clear, the remaining part of the CUT, where outputs of the mentioned gates are wired, is not shown in the figure.
The foregoing example demonstrates that the following faults of the CUT are not detectable: stuck-at-1 at the output of the OR (G2) gate, stuck-at-0 at the output of the XOR (G4) gate, a short (bridging fault) between inputs of the AND (G3) gate, as well as an open break at the input of the p-channel CMOS transistor (input A) within the NOR (G1) gate.
The reason why those faults are undetectable lies in the insufficient phase shift, actually equal to 1, between binary sequences produced at adjacent outputs of the TPG. Due to such an insufficient phase shift, the content of the (j + k)th cell in the scan path wired to the ith output of the TPG and the content of the jth cell in the scan path wired to the (i + k)th output are always equal to one another (i = 0, 1, ..., n − 2; j = 0, 1, ..., L − 2; k = 1, 2, ...; i + k < n; j + k < L), where n stands for the length of the TPG and L is the length of scan paths (for the example in question n = 5, L = 8). Therefore, the "00" pair of bits never appears at the A and B inputs of gate G2, although it is mandatory to switch the gate output over to the state that is opposite to the fault, i.e., to zero. Likewise, it is also infeasible to deliver such combinations as AB = 01 or AB = 10 to inputs of the G4 gate, which is necessary to set the gate output to "1". Moreover, both inputs of the G3 gate are always equal, so they behave as if they were shorted, and that makes it infeasible to detect an actual short between them. In turn, the detection of the fault affecting the G1 gate requires delivering the test vector sequence "00,10" to the AB inputs. Since inputs of that gate also always adopt the same state during the entire testing process, the aforementioned sequence never occurs.
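The equality between scan cells described above can be checked with a short simulation. The following Python sketch is only an illustration under stated assumptions: the tap placement `TAPS` is one possible realization of a type I LFSR for the polynomial of Example 1 (the exact wiring of Figure 2 is not reproduced here), and the helper names `lfsr_type1`, `load_scan_paths`, and `mismatches` are hypothetical.

```python
from itertools import islice

N, L = 5, 8            # register length and scan path length from Example 1
TAPS = (0, 1, 3, 4)    # assumed tap placement for x^5 + x^4 + x^2 + x + 1

def lfsr_type1(taps, state):
    """Yield successive states of a type I (external XOR) LFSR.

    Assumed convention: the new q0 is the XOR of the tapped cells,
    and every other cell copies its left neighbour.
    """
    state = list(state)
    while True:
        yield tuple(state)
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        state = [feedback] + state[:-1]

def load_scan_paths(states, length):
    """Shift `length` consecutive LFSR states into one scan path per output."""
    n = len(states[0])
    paths = [[0] * length for _ in range(n)]
    for state in states[:length]:
        for i in range(n):
            # cell 0 is the cell next to the serial input of the path
            paths[i] = [state[i]] + paths[i][:-1]
    return paths

def mismatches(paths):
    """List (i, j, k) for which cell j+k of path i differs from cell j of path i+k."""
    n, length = len(paths), len(paths[0])
    return [(i, j, k)
            for i in range(n)
            for k in range(1, n - i)          # i + k < n
            for j in range(length - k)        # j + k < L
            if paths[i][j + k] != paths[i + k][j]]

generator = lfsr_type1(TAPS, (1, 0, 0, 0, 0))
all_mismatches = []
for _ in range(31):                           # 31 consecutive scan loads
    states = list(islice(generator, L))
    all_mismatches += mismatches(load_scan_paths(states, L))

print("mismatching cell pairs:", len(all_mismatches))
```

Because every output of this register is the neighbouring output delayed by one clock cycle, no scan load can produce a pair of cells that violates the relation, which is why gates fed from such pairs never see the input combinations needed to expose the faults listed in the example.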
The issue disclosed in Example 1 can be formulated in a more general way; i.e., a low phase shift between test bit sequences appearing at neighboring outputs of a pseudo-random TPG leads to a strong correlation between test vectors supplied to scan paths fed from these outputs. This is why these vectors may no longer be considered as pseudo-random ones, which adversely affects the fault coverage in digital CUTs with the STUMPS-type LBIST architecture [3] (pp. 177-181), [12], [34][35][36][37]. In addition, page 322 in [38] shows an example to demonstrate that test vector sequences generated by a type I LFSR and feeding parallel inputs of a circuit tested in the test-per-clock mode are incapable of detecting some faults in the digital circuit under test.
The concept of output channels from a Test Pattern Generator (TPG) frequently appears in reference studies. These channels are actually numerous scan paths fed from outputs of the TPG. The phase shift between binary bitstreams supplied to individual scan paths is also referred to as the separation factor between data channels or between scan paths' content [3] (pp. 184-188), [13,14,34,37,39].
The low phase shift between bit sequences appearing at outputs of the type I LFSR register and their strong mutual correlation is also confirmed by the State-Time Diagram (STD) that depicts the sequence of test vectors produced by such a shift register [10]. For such an illustration of test vectors, each "0" appearing at any output of the register is represented by a white dot, whilst black dots stand for high ("1") values of outputs. Figure 3a shows an example of a 20-bit type I LFSR associated with the primitive polynomial $p(x) = 1 + x^3 + x^4 + x^{19} + x^{20}$, and a fragment of the STD graph for that register is depicted in Figure 3b. The $Q_0$ output of the register corresponds to dots at the top of the graph, while the $Q_{19}$ output is associated with dots at the bottom of the graph. The time axis for the register operation extends horizontally.
The register starts its operation from the state <1000...00>. The STD graph from Figure 3b has a fairly regular shape, since it takes the form of oblique white and black stripes. One can easily spot that sequences of bits appearing at subsequent outputs of the LFSR register are gradually shifted by only one bit. Likewise, subsequent vectors appearing at outputs of that register are also gradually shifted by only one bit (except for bit number 0). These properties indicate a strong correlation both between binary sequences and between subsequent vectors appearing at outputs of the LFSR register [10,12,35]. As a consequence, sequences of test vectors produced by such a register cannot be considered as fully pseudo-random ones, both for the test-per-scan and test-per-clock testing strategies [10,40]. Similar observations are reported in [38] with respect to poor randomness of test vector sequences delivered at outputs of a conventional LFSR.
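The striped appearance of such a diagram can be reproduced in text form. The sketch below is an illustration only: the tap positions in `TAPS` are an assumed type I realization of the polynomial quoted for Figure 3a, the characters `#` and `.` stand in for black and white dots, and the function names are hypothetical.

```python
def lfsr_type1_states(taps, state, steps):
    """Return `steps` successive states of a type I (external XOR) LFSR."""
    state = list(state)
    history = []
    for _ in range(steps):
        history.append(tuple(state))
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        state = [feedback] + state[:-1]   # q0 takes the feedback, the rest shift
    return history

def state_time_diagram(history):
    """One text row per register output (Q0 first), one column per clock cycle."""
    n = len(history[0])
    return ["".join("#" if state[i] else "." for state in history)
            for i in range(n)]

TAPS = (2, 3, 18, 19)            # assumed taps for 1 + x^3 + x^4 + x^19 + x^20
START = (1,) + (0,) * 19         # the state <1000...00>

rows = state_time_diagram(lfsr_type1_states(TAPS, START, 60))
print("\n".join(rows))
```

Each printed row is the row above it moved one column to the right, which is the text counterpart of the oblique stripes described for the STD graph.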
The strong correlation between sequences of test bits appearing at outputs of the type I LFSR is also confirmed by the graph plotted in Figure 4 for the correlation coefficient.
The correlation coefficient $C_{i,j}(\tau)$ for binary sequences is calculated from the formula [33]:
$$C_{i,j}(\tau) = \frac{1}{T} \sum_{t=0}^{T-1} q_i(t)\, q_j(t+\tau) \qquad (1)$$
where $q_i(t)$ and $q_j(t)$ stand for the values of the ith and jth outputs of the register at the moment t, $i, j = 0, 1, \ldots, n-1$, and T is the length of the binary sequences used to calculate the correlation coefficient. The parameter $\tau$ represents the time offset (phase shift) between binary sequences at the outputs $q_i$ and $q_j$ where test sequences are observed, and the difference between indexes, $j - i$, is considered as the space distance between these two sequences. When $j = i$, the function $C_{i,i}(\tau)$ calculates the autocorrelation coefficient for the binary stream appearing at the $q_i$ output. For all other cases, i.e., when $i \neq j$, the function $C_{i,j}(\tau)$ calculates the coefficient of cross correlation between binary sequences produced at the outputs $q_i$ and $q_j$. That coefficient can be calculated either for the entire period of the register operation, in which case $T = 2^n - 1$ in Equation (1), or for the actual length of the test sequence. It must also be kept in mind that for computation of the correlation coefficient by means of Equation (1), all values $q_i(t) = 0$ shall be substituted with the value "−1" [33].
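The computation can be sketched in a few lines of Python. This is a minimal illustration, assuming that the coefficient is the normalized sum of products of the two sequences after the 0 to −1 substitution, with cyclic indexing over one full period; the five-bit register and its tap placement are assumptions chosen to keep the run short, and the function names are hypothetical.

```python
def lfsr_type1_outputs(taps, state, steps):
    """Return one bit list per output of a type I (external XOR) LFSR."""
    state = list(state)
    outputs = [[] for _ in state]
    for _ in range(steps):
        for i, bit in enumerate(state):
            outputs[i].append(bit)
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        state = [feedback] + state[:-1]
    return outputs

def correlation(seq_i, seq_j, tau):
    """Correlation of seq_i(t) and seq_j(t + tau), bits mapped 0 -> -1, 1 -> +1."""
    length = len(seq_i)
    total = 0
    for t in range(length):
        a = 1 if seq_i[t] else -1
        b = 1 if seq_j[(t + tau) % length] else -1   # cyclic over the period
        total += a * b
    return total / length

TAPS = (0, 1, 3, 4)      # assumed taps for a primitive polynomial of degree 5
PERIOD = 2 ** 5 - 1      # T = 2^n - 1 = 31

q = lfsr_type1_outputs(TAPS, (1, 0, 0, 0, 0), PERIOD)
for i in range(5):
    row = [round(correlation(q[0], q[i], tau), 3) for tau in range(6)]
    print("output", i, row)
```

For each output the coefficient reaches 1 only at the time offset equal to the output number and stays close to zero elsewhere, which mirrors the ridge-shaped behaviour the text attributes to a conventional type I LFSR.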
In the graph depicted in Figure 4, the horizontal axes show the time offset $\tau$ expressed in the number of clock cycles (the axis designated as "time offset τ") as well as the space distance, understood as the number of the register output (the axis designated as "output no"). In turn, the vertical axis ("correlation coefficient") is graduated with the values of correlation coefficients. The coefficient is calculated with respect to the binary sequence appearing at the $q_0$ output with no time offset. The graph presented in Figure 4 demonstrates that in the case of a conventional type I LFSR register, the coefficient of correlation between the binary sequence appearing at the $q_i$ output and the corresponding sequence at the $q_0$ output but delayed by i clock cycles adopts the maximum value, i.e., equals 1. On the other hand, the correlation coefficient between a bit sequence at any of the $q_i$ outputs and the bit sequence at the $q_0$ output delayed by j clock cycles, where $j \neq i$, adopts really small values and ranges from −0.02 to 0.02, which is shown in Figure 5.
[Figure 5: correlation coefficient for the register from Figure 3a, zoomed-in picture for the correlation coefficient ranging from −0.03 to +0.05. The correlation coefficient was calculated for the test sequence with the length of 10,000 vectors.]
As mentioned before, not only the LFSR register can be applied as a PRPG with the LBIST arrangement and the STUMPS architecture, but linear registers of the CA or GLFSR can be used as well. The most frequent design option of the CA register is the one-dimensional Linear Hybrid Cellular Automata (LHCA) with null boundary conditions, made up of cells with the rules 90 and 150 [10,12,[41][42][43][44]. In a later part of this study, such a design option of the CA register shall be designated as LHCA 90/150. The reference literature also reports the application of two-dimensional LHCA to LBIST solutions, where a portion of cells is interconnected with at least three neighboring cells [45][46][47][48]. Additionally, the concept of the TPGs design as an LHCA with the hierarchical structure was developed [49]. However, all the aforementioned solutions suffer from some drawbacks that are discussed below.
Study [12] reports that the separation coefficient between output channels of the LHCA 90/150 adopts widely varying values, and some of these values are very low. In addition, information about the configuration of the LHCA 90/150 register alone is insufficient for an unambiguous indication of the register outputs affected by the mentioned problem. The foregoing observation is also confirmed in [14], with the additional conclusion that the problem of insufficient separation between a portion of output channels affects as much as about 75% of structures developed for LHCA 90/150 registers associated with primitive polynomials. Similarly, two-dimensional Tree-structured Linear Cellular Automata (TLCA) also exhibit very low values of the phase shift between a portion of bit sequences appearing at outputs of the register [46].
In the case of two-dimensional LHCAs, at least some cells are interconnected with outputs of more than two neighboring cells [45,47,48]. Therefore, these cells comprise at least two XOR gates that are arranged in series. The need to use multi-layered networks of XOR gates is also a drawback of CA solutions with hierarchical structures [49]. Likewise, a portion of wires in the linear feedback of a GLFSR register requires a serial arrangement of XOR gates used as so-called multiplying units and other XOR gates, which are the so-called adding units [16]. As a result, the need to use at least two-layered networks of XOR gates in feedback lines of the aforementioned linear registers reduces the maximum operation frequency of such solutions.
The use of CA or GLFSR structures as PRPGs in LBIST circuits with the STUMPS architecture requires the determination of phase shifts between individual bit sequences appearing at outputs of such registers. Some methods that have been applied for that purpose are based on simulation of CA or GLFSR modules by means of matrix operations and seeking adequate states of the register where single bits are set to "1" [43,50]. However, such methods are very time-consuming, which is confirmed in [12], where the algorithm developed in [50] is applied. Due to the very long computation time necessary to find phase shifts between binary sequences at outputs of the LHCA 90/150 structures, the overall length of CA registers investigated in the mentioned study had to be limited. In turn, studies [11,51] report the application of an algebraic method based on discrete logarithms [52,53] (pp. 97-111) to compute the phase shifts. Unfortunately, all algorithms that are suitable for the calculation of discrete logarithms [53] (pp. 97-111) are very complex, and the application of the mentioned methods is practically limited to linear registers of a relatively short length.

Classical Method to Design LBIST Circuits with the STUMPS Architecture
The previous section outlines reasons that prevent high coverage of faults in CUTs that are provided with LBIST circuits with the STUMPS architecture. These problems stem from the low phase shift between binary sequences appearing at outputs of pseudo-random TPGs that are designed exclusively on the basis of a conventional LFSR structure associated with a primitive polynomial. This is why a phase shifter [3] (pp. 184-188), [54] is usually inserted between outputs of the LFSR [34][35][36][37][55] or LHCA 90/150 [12,37,39,55,56] and serial inputs of scan paths incorporated into the CUT. Such a modified STUMPS architecture is shown in Figure 6. In a later part of this study, the TPG provided with a phase shifter at its outputs shall be referred to as a test pattern generator based on LFSR or CA with Phase Shifter at the outputs and designated as LFSR + PS and CA + PS, respectively. The phase shifter itself is a network of XOR gates appropriately wired to outputs of the LFSR or CA register. Its application not only makes it possible to provide a sufficient phase shift between binary sequences supplied to serial inputs of scan paths but also enables the use of LFSR structures of much smaller size than the number of scan paths ($n \ll m$ in Figure 6). For instance, in [34], the authors used LFSR registers with lengths of 24, 32, 48, 64, 80, 96, and 128 bits to produce test sequences supplied to multiple scan paths, where the number of scan paths fed from a single TPG ranged from several dozen to more than 8000. However, the use of a phase shifter inevitably entails a delay introduced into the hardware, which is a drawback of this solution.
The time of signal propagation throughout the phase shifter is several times longer than the propagation time of multiplexers incorporated into scan path cells [34], and, in addition, that time is added to the propagation time of the multiplexer embedded into the first cell of a scan path. These two factors restrict the maximum operation frequency of the test facilities made up of TPG and scan paths. It eventually leads to an extended total time of test execution.
In addition, when a single LFSR feeds a plurality of scan paths, things get worse due to the need to route interconnections within a sophisticated network of XOR gates incorporated into a phase shifter. Typically, the desired fan-out factor is achieved by incorporation of additional buffers (logic amplifiers) wired to the flip-flops' outputs of the LFSR and to some XOR gates outputs of the phase shifter. These two factors lead to further growth of propagation time for signals passing the phase shifter and make the engineering process for test pattern generators much more complicated.
The studies referenced disclose several methods to design phase shifters with the aim to ensure the assumed value of the phase shift between output data channels. Some of these methods are based on operations on digital matrices that describe structures of linear registers and phase shifters connected to the LFSR or CA outputs [37,57]. Other solutions employ simulation of a linear register operation, where the simulated linear register structure reflects the layout of an original linear register combined with a phase shifter. However, it is an approach that may prove time-consuming in the case of long linear registers [34][35][36]39,56]. Additionally, other methods have been developed intending to find out such structures of both an LFSR register and a phase shifter that are capable of generating the desired set of deterministic test vectors [22,58].
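The simulation-based way of determining phase shifts can be illustrated on a toy case. The sketch below is not any of the methods from the cited studies: it is a brute-force search, feasible only for very short registers, that compares each phase shifter output with the reference sequence at output q0 over one full period. The five-bit register, its tap placement, and the three XOR channels are assumptions made for the illustration.

```python
def lfsr_type1_outputs(taps, state, steps):
    """Return one bit list per output of a type I (external XOR) LFSR."""
    state = list(state)
    outputs = [[] for _ in state]
    for _ in range(steps):
        for i, bit in enumerate(state):
            outputs[i].append(bit)
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        state = [feedback] + state[:-1]
    return outputs

def xor_channel(outputs, indexes):
    """One phase shifter output: the XOR of the selected register outputs."""
    channel = [0] * len(outputs[0])
    for i in indexes:
        channel = [a ^ b for a, b in zip(channel, outputs[i])]
    return channel

def phase_shift(reference, sequence):
    """Smallest delay s with sequence[t] == reference[t - s] (cyclically), or None."""
    length = len(reference)
    for s in range(length):
        if all(sequence[t] == reference[(t - s) % length] for t in range(length)):
            return s
    return None

TAPS = (0, 1, 3, 4)                        # assumed taps, primitive polynomial of degree 5
q = lfsr_type1_outputs(TAPS, (1, 0, 0, 0, 0), 2 ** 5 - 1)

plain = [phase_shift(q[0], q[i]) for i in range(5)]
channels = [(0, 1), (0, 3), (1, 2, 4)]     # hypothetical XOR networks
shifted = [phase_shift(q[0], xor_channel(q, c)) for c in channels]
print("register outputs:", plain)
print("phase shifter outputs:", shifted)
```

The search cost grows with the square of the period, i.e., exponentially with the register length, which is consistent with the remark above that simulation may prove time-consuming for long linear registers.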

New Approach to Engineering of Pseudo-Random Pattern Generators for LBIST Circuits with the STUMPS Architecture
The engineering concept disclosed in this section is based on the application of DT-LFSR (Linear Feedback Shift Register composed of D- and T-type flip-flops) shift registers with a linear feedback [38,59] to new types of pseudo-random TPGs for LBIST circuits with the STUMPS architecture. However, before the structure of the innovative TPG is revealed, let us outline some selected properties of DT-LFSR registers.

Layout of a DT-LFSR Register
Any DT-LFSR is designed as a closed ring of D and T flip-flops connected in series. The principal schematic of an n-bit DT-LFSR register comprising k flip-flops of the D type and t = n − k flip-flops of the T type is depicted in Figure 7. In this study, such a structure of a DT-LFSR register is symbolically designated as D k T t . Note that such a register contains no XOR gates in the main loop of the linear feedback. Instead, it contains XOR gates in the internal linear feedback loops inside the T flip-flops, since a T-type flip-flop is actually a D-type flip-flop with an XOR gate in a feedback loop. Therefore, the characteristic polynomial p(x) that corresponds to the linear feedback of the n-bit DT-LFSR from Figure 7 adopts the following form:

$$p(x) = 1 + x^{k} \cdot (1 + x)^{t},$$

where k + t = n.
Reordering the D and T flip-flops within the ring has no effect on the characteristic polynomial p(x) of the register. Relocation of T-type flip-flops in such registers can be considered as a sequence of left shift operations applied to local linear feedback loops. Such a procedure was named the EL transformation in [60]. According to Theorem 1 disclosed in that study, such modifications of linear register structures do not change the characteristic polynomial of the register.
It is also worth mentioning that a DT-LFSR register is a specific variation of Linear Hybrid Cellular Automata (LHCA) with cyclic boundary conditions, made up of cells with the rules 60 and 240. Each T flip-flop is a CA cell with the rule 60, while each D flip-flop is a CA cell with the rule 240. A table summarizing all possible structures of DT-LFSR registers with the length n ∈ [15, 100] and a linear feedback loop described by a primitive polynomial can be found in some past papers, including [38].
The table content shows that, for a specific length n of a DT-LFSR register, only some combinations of the k and t parameters lead to a D k T t structure that corresponds to a primitive characteristic polynomial. Each of these structures contains at least one D-type flip-flop (i.e., k > 0 in every case). In addition, for some specific lengths n, no DT-LFSR structure associated with a primitive characteristic polynomial may exist.
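The ring structure described above can be checked with a short behavioural simulation. The following is only a minimal sketch in Python, not the model used in this study; it assumes that cell i is driven by cell i − 1, that D cells precede T cells in the ring, and the function names `dt_lfsr_step` and `period` are introduced here purely for illustration.

```python
def dt_lfsr_step(state, kinds):
    """Advance a ring of D/T flip-flops by one clock cycle.

    state[i] is the output Q_i and kinds[i] is "D" or "T".
    Cell i is driven by cell i - 1; cell 0 is driven by the last cell.
    """
    nxt = []
    for i, kind in enumerate(kinds):
        prev = state[i - 1]              # index -1 closes the ring
        if kind == "D":
            nxt.append(prev)             # D flip-flop: Q+ = input
        else:
            nxt.append(state[i] ^ prev)  # T flip-flop: Q+ = Q xor input
    return nxt


def period(kinds):
    """Cycles needed to return to the seed 10...0, or None if it never recurs."""
    seed = [1] + [0] * (len(kinds) - 1)
    state = seed
    for cycles in range(1, 2 ** len(kinds)):
        state = dt_lfsr_step(state, kinds)
        if state == seed:
            return cycles
    return None


print(period("DTT"))  # D 1 T 2 ring (n = 3): prints 7 = 2**3 - 1
```

A period of 2^n − 1 is what a primitive characteristic polynomial implies; for long registers this brute-force loop becomes impractical and serves only as an illustration.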

Correlations between Binary Sequences at Outputs of T-type Flip-Flops Incorporated into DT-LFSRs
Section 2 discloses the STD graph for the sequence of test vectors and the graph of the correlation coefficient for binary sequences appearing at outputs of a conventional 20-bit type I LFSR register. These parameters are deemed the key quality indices for a register operated as a generator of pseudo-random sequences of test vectors. The same two graphs for the DT-LFSR register with the D 3 T 17 structure (see Figure 8a), associated with the primitive polynomial $p(x) = 1 + x^{3} \cdot (1 + x)^{17} = 1 + x^{3} + x^{4} + x^{19} + x^{20}$, are depicted in Figures 8b and 9, respectively.
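The expansion of the characteristic polynomial over GF(2) can be reproduced mechanically. The sketch below assumes the form p(x) = 1 + x^k · (1 + x)^t stated for D k T t registers and stores a polynomial as a Python integer whose bit i is the coefficient of x^i; the helper name `dt_lfsr_polynomial` is hypothetical.

```python
def dt_lfsr_polynomial(k, t):
    """Exponents with coefficient 1 in p(x) = 1 + x^k (1 + x)^t over GF(2)."""
    poly = 1                     # the constant polynomial 1
    for _ in range(t):
        poly ^= poly << 1        # carry-less multiplication by (1 + x)
    poly = 1 ^ (poly << k)       # multiply by x^k, then add 1 (XOR)
    return [i for i in range(poly.bit_length()) if (poly >> i) & 1]


print(dt_lfsr_polynomial(3, 17))  # [0, 3, 4, 19, 20]
print(dt_lfsr_polynomial(1, 2))   # [0, 1, 3]
```

The first result corresponds to 1 + x^3 + x^4 + x^19 + x^20 for the D 3 T 17 register; the code only expands the polynomial and does not test whether it is primitive.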

The STD graph for the aforementioned register is less regular than the graph for a conventional LFSR structure, since oblique stripes, typical for conventional LFSR registers, are seen only in the areas that correspond to D-type flip-flops. Instead, in areas corresponding to T-type flip-flops, white triangle-shaped patterns can be seen at random locations. The foregoing observations are consistent with the findings reported in [38] with regard to test vector sequences generated by linear feedback registers with T-type flip-flops.
In turn, the graphs for correlation coefficients clearly indicate that only the binary sequences appearing at outputs of D-type flip-flops are strongly correlated. On the other hand, the correlation coefficient for binary sequences produced at outputs of T-type flip-flops ranges from −0.02 to 0.02 (see Figure 10); i.e., it is extremely low.
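To make the notion of correlation between output sequences concrete, the sketch below computes a Pearson correlation coefficient between bitstreams of a simulated DT-LFSR. This is an assumption made for illustration: the exact definition of the coefficient and the window length used for Figures 9 and 10 are given in Section 2 and are not reproduced here. The function names are hypothetical.

```python
from math import sqrt


def dt_lfsr_step(state, kinds):
    """One clock cycle of a D/T ring; cell i is driven by cell i - 1."""
    return [state[i - 1] if kind == "D" else state[i] ^ state[i - 1]
            for i, kind in enumerate(kinds)]


def output_streams(kinds, cycles):
    """Bitstreams observed at every flip-flop output, starting from 10...0."""
    state = [1] + [0] * (len(kinds) - 1)
    streams = [[] for _ in kinds]
    for _ in range(cycles):
        for stream, bit in zip(streams, state):
            stream.append(bit)
        state = dt_lfsr_step(state, kinds)
    return streams


def correlation(a, b):
    """Pearson correlation coefficient of two equally long bit sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


q0, q1, q2 = output_streams("DTT", 7)   # one full period of a D 1 T 2 ring
print(round(correlation(q1, q2), 4))    # -0.1667
```

For the 3-bit toy register the value over a full period is −1/6. The same functions can be applied to `"DDD" + "T" * 17` with a longer window, but the result depends on the seed and window length and is not claimed to reproduce the figures of this study.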

Phase Shifts between Binary Sequences at Outputs of T-type Flip-Flops Incorporated into DT-LFSRs
This section is dedicated to investigating phase shifts between bitstreams at the outputs of D- and T-type flip-flops in ring registers of the DT-LFSR type. Let us consider an n-bit register with the D k T n−k structure, associated with the primitive polynomial $p(x) = 1 + x^{k} \cdot (1 + x)^{n-k}$. Let us also denote the phase shifts introduced by D-type and T-type flip-flops present in that register as $\Phi_D$ and $\Phi_T$, respectively. The analysis carried out in [38] makes it possible to find out that $\Phi_D = 1$, while the phase shift $\Phi_T$ can be calculated from the following congruence:

$$k + (n - k) \cdot \Phi_T \equiv 0 \pmod{2^{n} - 1}. \tag{3}$$

Please note that the phase shift $\Phi_D$ is identical for all D-type flip-flops incorporated into the DT-LFSR structure; likewise, the phase shift $\Phi_T$ is the same for all T-type flip-flops within that register. In addition, the congruence (3) indicates that $\Phi_T \neq 1$ for every n > 1. According to Corollary 2 in [53] (p. 20), when $\gcd(n - k, 2^{n} - 1) = 1$, the solution $\Phi_T$ of Equation (3) always exists and can be found by means of $O(\log^{3}(2^{n} - 1))$ binary operations. In practice, for n = 100 and k = 63, the solution $\Phi_T$ of Equation (3) was found in less than 4 s with the use of a calculator available online [61]. The n and k parameters provided in that example correspond to D 63 T 37 , the only structure of a DT-LFSR register with the length of 100 bits that is associated with a primitive polynomial.

Example 2
Figure 11 shows a sequence of test vectors produced by a register of the DT-LFSR type with the D 1 T 2 structure. That register is associated with the primitive polynomial $p(x) = 1 + x + x^{3} = 1 + x \cdot (1 + x)^{2}$. Therefore, the respective coefficients are n = 3 and k = 1. For the example in question, the congruence (3) adopts the following form:

$$1 + 2 \cdot \Phi_T \equiv 0 \pmod{7}, \tag{4}$$

which indicates that the phase shift introduced by the T-type flip-flops is $\Phi_T = 3$. Obviously, the phase shift introduced by the D-type flip-flop is $\Phi_D = 1$.
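The phase shift of a T-type flip-flop can also be obtained with a modular inverse instead of an online calculator. The sketch below assumes that congruence (3) reads k + (n − k) · Φ_T ≡ 0 (mod 2^n − 1), which is the reading that yields Φ_T = 3 for the D 1 T 2 register; the function name `phase_shift_t` is hypothetical, and `pow` with a negative exponent requires Python 3.8 or later.

```python
from math import gcd


def phase_shift_t(n, k):
    """Solve k + (n - k) * phi == 0 (mod 2**n - 1) for phi (assumed form of (3))."""
    omega = 2 ** n - 1
    t = n - k
    if gcd(t, omega) != 1:
        raise ValueError("n - k and 2**n - 1 are not coprime")
    # pow(t, -1, omega) is the modular inverse of t
    return (-k * pow(t, -1, omega)) % omega


print(phase_shift_t(3, 1))      # 3, the D 1 T 2 example

phi = phase_shift_t(100, 63)    # the D 63 T 37 register
print((63 + 37 * phi) % (2 ** 100 - 1) == 0)  # True: phi satisfies the congruence
```

Because Python integers have arbitrary precision, the 100-bit case is solved essentially instantly.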
The magnitude of the phase shift between binary sequences produced at the adjacent outputs of the foregoing DT-LFSR register with the D 1 T 2 structure is explained in Figure 11, where arrows indicate corresponding bits in mutually shifted binary sequences. To make the picture clearer, the binary sequence appearing at the $Q_2$ output is repeated at the left-hand side of the binary sequence produced at the $Q_0$ output. In addition, the numbering of clock cycles is shown in the leftmost part of the figure for easy calculation of the phase shift.
The congruence (3) also enabled calculation of the phase shift $\Phi_T$ for DT-LFSRs with a length exceeding 24 bits and containing a considerable number of T-type flip-flops (more than 80% of the total register length). Results of these calculations are summarized in Table 1, where the number of T-type flip-flops within each register and the total length of the register are provided in columns 2 and 3, respectively, column 4 specifies the duration (length) $\Omega$ of the operation cycle of each register, and column 5 contains the $\Phi_T$ parameter, i.e., the phase shift between binary sequences appearing at the output and the input of a T-type flip-flop within the DT-LFSR register in question.
Now let us assume that an n-bit register of the DT-LFSR type contains t flip-flops of the T type that are deployed in register cells with the numbers n − t, n − t + 1, . . . , n − 1. Let us also denote $\Phi_{T_{i,j}}$ as the phase shift between binary sequences produced at the outputs i and j of the register in question, where i, j ∈ {n − t, n − t + 1, . . . , n − 1}; i.e., it is a phase shift between binary sequences appearing at outputs of two selected T-type flip-flops within the register. The phase shift $\Phi_{T_{i,j}}$ is then

$$\Phi_{T_{i,j}} = \left[(j - i) \cdot \Phi_T\right] \bmod \Omega, \tag{5}$$

where $\Omega = 2^{n} - 1$ stands for the aforementioned duration of the operation cycle of the DT-LFSR register. For negative numbers, the operation mod appearing in (5) is calculated according to the formula $a < 0 \Rightarrow a \bmod \Omega = \Omega - (|a| \bmod \Omega)$. Please note that the phase shift $\Phi_{T_{i,j}}$ is calculated modulo $\Omega$, and thus it can be less than $\Phi_T$. Therefore, let us define the minimum phase shift $\Phi_{T\min}$ between binary sequences produced by any two T-type flip-flops of the DT-LFSR register. Such a minimum phase shift is defined by the following equation:

$$\Phi_{T\min} = \min_{\substack{i,j \in \{n-t,\ldots,n-1\} \\ i \neq j}} \Phi_{T_{i,j}}. \tag{6}$$

The values of the minimum phase shift $\Phi_{T\min}$ for some selected DT-LFSR registers are summarized in column 6 of Table 1.
The foregoing deliberations take account of binary sequences merely at outputs of T-type flip-flops, since only these flip-flops introduce a phase shift that is different from 1. Therefore, only the outputs of T-type flip-flops should be used as outputs of a Test Pattern Generator.
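The pairwise and minimum phase shifts between T-type outputs can be enumerated directly for short registers. The sketch below assumes that the shift between outputs i and j equals (j − i) · Φ_T reduced modulo Ω = 2^n − 1, which is one reading of Equation (5); the function names are hypothetical. Python's `%` operator already returns a non-negative result for a positive modulus, which matches the rule given for negative numbers.

```python
def pairwise_phase_shift(i, j, phi_t, omega):
    """Phase shift between T-type outputs i and j (assumed form of (5))."""
    return ((j - i) * phi_t) % omega    # non-negative even when j < i


def min_phase_shift(n, t, phi_t):
    """Minimum shift over all ordered pairs of distinct T-type outputs."""
    if t < 2:
        raise ValueError("at least two T-type flip-flops are required")
    omega = 2 ** n - 1
    cells = range(n - t, n)             # T cells occupy positions n-t .. n-1
    return min(pairwise_phase_shift(i, j, phi_t, omega)
               for i in cells for j in cells if i != j)


print(pairwise_phase_shift(1, 2, 3, 7))  # 3
print(pairwise_phase_shift(2, 1, 3, 7))  # 4
print(min_phase_shift(3, 2, 3))          # 3
```

The double loop is quadratic in t, which is adequate for the register lengths considered here; only the difference j − i matters, so a single loop over differences would also suffice.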

Pseudo-Random Pattern Generator DT-LFSR-TPG for LBIST Circuits with the STUMPS Architecture
Let us denote $\Psi_{\min}$ as the minimum guaranteed phase shift between binary sequences produced at outputs of any TPG applicable for the STUMPS architecture. Results from Table 1 demonstrate that a single DT-LFSR register used as a pseudo-random TPG provides a phase shift that is much higher than the one achieved by means of the methods disclosed in [13,34,35,39].
The number of scan paths implemented in LBIST structures with the STUMPS architecture and incorporated into modern integrated circuits is as high as several thousand [25]. Implementing a PRPG as a single DT-LFSR register of such a length is a demanding engineering challenge. The first difficulty lies in developing the register topology, the network of interconnections between its cells, and the network of interconnections between the register outputs and the inputs of scan paths so as to achieve the desired operational frequency of such a TPG. The next problem is finding an appropriate D k T t structure associated with a primitive polynomial of a degree of several thousand. The verification of whether the characteristic polynomial of such a register is really a primitive one can be carried out by means of the efficient algorithm proposed in [39,62]. However, such a solution needs prime factorization of the very large number $2^{n} - 1$, where n stands for the length of the DT-LFSR register.
The aforementioned problems can be resolved by means of the original, innovative method outlined in this paper, in which the PRPG is implemented as a set of DT-LFSR registers operated in the parallel mode. The STUMPS architecture with a TPG designed according to the foregoing concept is depicted in Figure 12. All DT-LFSR registers of such a test pattern generator are of identical design, are synchronized by means of a common clock signal, and their operation is controlled simultaneously by the same control signals. Outputs of T-type flip-flops incorporated into each single register deliver sequences of test vectors to a separate group of scan paths, whilst outputs of D-type flip-flops remain unused, as indicated in the figure. A Test Pattern Generator with such a structure shall be further referred to as DT-LFSR-TPG.
According to [34,35], the minimum phase shift $\Theta_{\min}$ between binary sequences produced at any two outputs of a future TPG is imposed a priori, before the engineering of a pseudo-random TPG for the STUMPS architecture is commenced. A sufficient level of that phase shift is deemed a guarantee that the sequence of test vectors supplied to inputs of the CUT is of a pseudo-random nature. Any properly designed TPG must meet the condition $\Psi_{\min} \geq \Theta_{\min}$; i.e., the minimum phase shift between binary sequences appearing at outputs of such a TPG must be not less than the required limit imposed by the circuit designer. To meet the foregoing requirement, each DT-LFSR register incorporated into the DT-LFSR-TPG structure must start its operation from a different seeding vector (i.e., from a different initial state).
Let us assume that the seeding content $S_1$ of the DT-LFSR1 register is adopted as the reference point, and that this seeding content appears in the sequence of vectors produced at outputs of that register in clock cycle No. 0. The seeding content $S_i$ of any DT-LFSRi register should then be the vector that appears in that sequence in clock cycle $d_i$, where $d_i$ is given by Formula (7). If so, binary sequences produced at the rth outputs of the DT-LFSR1 and DT-LFSRi registers, where r = 0, 1, . . . , n − 1, are mutually shifted by $d_i$ clock cycles.
The maximum number of scan paths $m_{\max}$ that can be wired to outputs of a TPG with the specific minimum limits $\Theta_{\min}$ and $\Phi_{T\min}$ is defined by Equation (8).
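The relation between seeds and mutual phase shifts of identical registers can be illustrated on the D 1 T 2 ring. In the sketch below the offset `d_i = 2` is an arbitrary, hypothetical value chosen for the demonstration, not a value computed from the formula for d_i; the function names are likewise introduced only for illustration.

```python
def dt_lfsr_step(state, kinds):
    """One clock cycle of a D/T ring; cell i is driven by cell i - 1."""
    return [state[i - 1] if kind == "D" else state[i] ^ state[i - 1]
            for i, kind in enumerate(kinds)]


def advance(state, kinds, cycles):
    """State of the register after the given number of clock cycles."""
    for _ in range(cycles):
        state = dt_lfsr_step(state, kinds)
    return state


def output_stream(seed, kinds, r, length):
    """Bits observed at output r during `length` cycles, starting from seed."""
    bits, state = [], list(seed)
    for _ in range(length):
        bits.append(state[r])
        state = dt_lfsr_step(state, kinds)
    return bits


kinds = "DTT"
s_1 = [1, 0, 0]                   # seed of the reference register
d_i = 2                           # hypothetical offset, for illustration only
s_i = advance(s_1, kinds, d_i)    # seed of register i: state of register 1 at cycle d_i
print(s_i)                        # [0, 1, 1]

# Output r of register i at cycle c equals output r of register 1 at cycle c + d_i.
for r in range(len(kinds)):
    assert output_stream(s_i, kinds, r, 7) == output_stream(s_1, kinds, r, 7 + d_i)[d_i:]
```

Stepping the register cycle by cycle is feasible only for small offsets; for offsets of the magnitude discussed in this section, the seed would have to be obtained by other means, for example by exponentiation of the transition matrix.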
The maximum numbers $m_{\max}$ calculated from the foregoing equation are summarized in Table 2, where columns 2 and 3 contain parameters of the DT-LFSR registers incorporated into the TPG. The number of scan paths for various $\Theta_{\min}$ is provided in columns 4 to 9. The $\Theta_{\min}$ values are selected so as to cover the range assumed in [34,39]. When the required offset $\Theta_{\min}$ is known, Table 2 makes it possible to find the n and t parameters of a DT-LFSR register suitable for designing a TPG that is able to supply test sequences to any number of scan paths, or at least to as many as practicable.
For comparison, Table 3 summarizes the maximum numbers of scan paths that were obtained in [34,39] for phase shifts $\Theta_{\min}$ similar to those included in Table 2. The LFSR I and LFSR II symbols in the first column of the table correspond to the test pattern generator of the LFSR + PS type as outlined in [34], where the phase shifter is wired to outputs of LFSR registers of types I and II, respectively. In turn, the LHCA 90/150 symbol stands for the generator of the CA + PS type, which comprises an LHCA 90/150 register and a phase shifter. In addition, Table 3 comprises results for two lengths of the LFSR and LHCA 90/150 registers that drive the phase shifters, namely n = 24 and n = 32 bits.
The comparison between Tables 2 and 3 clearly indicates that, for the same separation between channels, the DT-LFSR-TPG is able to feed more, or even significantly more, scan paths than the solutions disclosed in [34,39]. It is only necessary to properly select the n and t parameters of the DT-LFSR register that is then used to design the DT-LFSR-TPG structure. For instance, all DT-LFSR registers with the n and t parameters listed in lines 5 to 24 of Table 2 make it possible to design a PRPG of the DT-LFSR-TPG type with $m_{\max}$ parameters that exceed the ones provided in Table 3.
To emphasize the advantages of DT-LFSR-TPG structures, Table 4 summarizes the maximum number $m_{\max}$ of scan paths that can be fed from a DT-LFSR-TPG structure for much higher phase shifts $\Theta_{\min}$ (separation offsets between channels) than the ones considered in Tables 2 and 3. Please note that even when the phase shift $\Theta_{\min}$ is as high as tens of trillions of clock cycles, it is still possible to find a DT-LFSR register with appropriate n and t parameters and use it as the basis to design a DT-LFSR-TPG structure capable of feeding any number of scan paths.

Table 4. Maximum numbers of output channels for test pattern generators of the DT-LFSR-TPG type when the $\Theta_{\min}$ separation between output channels is high.

| No. | t | n | $\Theta_{\min} = 10^{5}$ | $10^{6}$ | $10^{7}$ | $10^{8}$ | $10^{9}$ | $10^{10}$ | $10^{11}$ | $10^{12}$ | $10^{13}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 22 | 25 | 15 | 1 | - | - | - | - | - | - | - |
| 2 | 25 | 31 | 858 | 85 | 8 | - | - | - | - | - | - |
| 3 | 27 | 29 | 198 | 19 | 1 | - | - | - | - | - | - |
| 4 | 28 | 31 | 766 | 76 | 7 | - | - | - | - | - | - |
| 5 | 33 | 35 | 10,412 | 1041 | 104 | 10 | 1 | - | - | - | - |
| 6 | 38 | … |

Not only is $m_{\max}$ essential for the engineering of a test pattern generator, but so is the minimum phase shift $\Psi_{\min}$ between binary sequences supplied by the TPG to inputs of a predefined number m of scan paths. The minimum phase shift is expressed by (9). Table 5 summarizes the $\Psi_{\min}$ parameters for various numbers of scan paths fed from the TPG, where the numbers of scan paths are almost identical to the ones in [34,39]. The analysis of the table contents makes it possible to conclude that, for a specific number of scan paths, a suitable DT-LFSR register can be selected so that the required phase shift is assured. The maximum limit of the phase shift taken into account in studies [34,39] is $4096 \times 10^{4}$. In contrast, the figures from Table 5 indicate that the phase shift (separation factor for output channels) enabled by the DT-LFSR Test Pattern Generator (DT-LFSR-TPG) for a specific number of scan chains can be higher by several orders of magnitude than the values offered by the LFSR + PS and CA + PS solutions disclosed in the aforementioned studies.

Results of Logic Synthesis
Other parameters that are essential for any test pattern generator include the maximum operational frequency and the cost of hardware implementation, expressed, for instance, as the area of silicon surface occupied by the TPG or as the number of equivalent two-input NAND gates. The foregoing properties of the innovative DT-LFSR-TPG pseudo-random pattern generators were established from the logic synthesis of selected DT-LFSR registers. For that purpose, a parameterized model of a DT-LFSR register was developed in the VHDL language; the general layout of the register is shown in Figure 13a. Besides D-type flip-flops and a 2-to-1 multiplexer, the register also comprises configurable D/T flip-flops with the internal structure shown in Figure 13b. Depending on the input signal M, such a circuit performs as a D-type flip-flop (for M = 0) or a T-type flip-flop (for M = 1).
Parameters of the VHDL model make it possible to set up both the register length and the number of configurable D/T flip-flops included in the register, while the MODE signal shown in Figure 13a controls the operation mode of the entire DT-LFSR register module. For MODE = 1, the module operates as a DT-LFSR linear shift register, whilst setting the MODE signal to zero switches the module to operation as a plain shift register, whose initial content can be seeded by feeding a corresponding series of bits to its serial input SI. The module also has a serial output SO, which enables serial chaining of the DT-LFSR registers included in a large DT-LFSR-TPG unit: the SO output of the ith register is connected to the SI input of the register with the number (i + 1), where i = 1, 2, . . . , Z − 1, and Z stands for the total number of DT-LFSR registers included in the test pattern generator. When MODE = 0, all DT-LFSR modules make up a single long shift register, which makes it possible to seed any initial content of these registers.
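The seeding and testing modes described above can be mimicked by a small behavioural model. The sketch below is written in Python rather than VHDL and rests on two assumptions that Figure 13 would have to confirm: the MODE signal also drives the M inputs of the configurable D/T flip-flops, and the 2-to-1 multiplexer sits in front of cell 0, selecting between SI and the ring feedback. The class `DtLfsrModule` and the helper `seed_chain` are hypothetical names.

```python
class DtLfsrModule:
    """Behavioural model of one DT-LFSR module with MODE, SI and SO."""

    def __init__(self, kinds):
        self.kinds = kinds                  # e.g. "DTT" for D 1 T 2
        self.q = [0] * len(kinds)

    @property
    def so(self):
        return self.q[-1]                   # serial output = last cell

    def clock(self, mode, si=0):
        feed = self.q[-1] if mode else si   # 2-to-1 multiplexer (assumed position)
        inputs = [feed] + self.q[:-1]
        self.q = [(q ^ d) if (mode and kind == "T") else d
                  for q, d, kind in zip(self.q, inputs, self.kinds)]


def seed_chain(modules, bits):
    """MODE = 0: shift bits through the chain SI -> module 1 -> ... -> module Z."""
    for bit in bits:
        si = bit
        for module in modules:
            next_si = module.so             # sample SO before the clock edge
            module.clock(0, si)
            si = next_si


regs = [DtLfsrModule("DTT"), DtLfsrModule("DTT")]
seed_chain(regs, [1, 0, 0, 1, 1, 0])
seeds = [list(r.q) for r in regs]
print(seeds)                                # [[0, 1, 1], [0, 0, 1]]

for _ in range(7):                          # MODE = 1: one full period of D 1 T 2
    for r in regs:
        r.clock(1)
print([r.q for r in regs] == seeds)         # True
```

The first bit shifted in ends up in the last cell of the last module, so the seed bits must be supplied in reverse order of the cells they are meant to occupy.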
The logic synthesis of DT-LFSR registers was carried out with the use of the commercial software LeonardoSpectrum from Mentor Graphics, Wilsonville, OR, US (currently Siemens EDA), as well as the TSMC 0.25 µm standard cell library included in the ASIC Design Kit rev. 3.1, also made available by Mentor Graphics, Wilsonville, OR, US. The decision to use this particular standard cell library was triggered by the fact that the same technology, with 0.25 µm, was used in [34] for synthesis of phase shifters investigated in that study. It enables a reliable comparison of experimental results disclosed in the further part of this paper against the ones discussed in [34].
Nevertheless, it should be mentioned that the structure of the DT-LFSR register is ring-shaped; therefore, it enables substantial reduction of the length of all interconnections between the register cells and, in consequence, of the delays caused by these interconnections. To achieve the best results for the DT-LFSR structure, it is recommended that the so-called interleaving technique be applied [38,63,64]. A topology example of a D 3 T 17 register together with the network of interconnections between its cells, obtained by means of the interleaving technique, is shown in Figure 14. The rectangles designated in the picture with the letters "D" and "T" stand for standard cells of D flip-flops and configurable D/T flip-flops, respectively, whereas the rectangle designated with "M" corresponds to the 2-to-1 multiplexer. Inputs and outputs of standard cells are deployed on the bottom and top edges of these cells. Directions of signal flows are indicated with an arrow for each interconnection presented in the picture. For better clarity of the picture, the line of the clock signal is omitted. The only long interconnections in the DT-LFSR register shown in Figure 14 are the line of the MODE signal and the line between the output of flip-flop No. 19 and the SO output of the entire module. Nevertheless, the propagation time along these interconnections has no impact on the maximum operational frequency of the register in the testing mode.
Synthesis results summarized in Table 6 served as the basis to calculate hardware costs of the DT-LFSR Test Pattern Generator (DT-LFSR-TPG) dedicated to LBIST structures with the STUMPS architecture and with 128, 256, 512, 1024, 2048, and 4096 scan paths within the structure. The numbers of scan paths selected for experiments were exactly the same as in Table IX in [34] to simplify the comparison of costs and maximum operational frequencies between the test generators investigated in this study and the ones shown in [34]. The results of these calculations are provided in Table 7, where columns 2 and 3 specify the t and n parameters of the DT-LFSR registers with the D n−t T t structure incorporated into the DT-LFSR-TPG. Corresponding implementation costs for each test pattern generator feeding the specific numbers of scan paths are listed in columns 4 to 9. Similarly to Table 6, these costs are expressed as the number of equivalent two-input NAND gates. The grey background highlights the solutions with the lowest cost for the specific number of scan paths.
For instance, when an LBIST circuit with the STUMPS architecture contains 1024 scan paths, the cheapest test pattern generator of the DT-LFSR-TPG type is made up of 17 DT-LFSR registers and each of these registers can be designated as D 1 T 62 . The overall cost of such a solution equals 7837 equivalent two-input NAND gates. Some supplementary details, separated with commas and related to structures of DT-LFSR-TPG units feeding the specific number of scan paths, are provided in Table 8, which has exactly the same layout as Table 7 but slightly different content. These details are the Z number of DT-LFSR registers with the D n−t T t structure incorporated into the test pattern generator as well as the number of "redundant" flip-flops of the D and T types present in the structure of the DT-LFSR-TPG unit. The term "redundant" flip-flops refers to the flip-flops of both the D and T types with outputs that are not connected to any scan paths (not used to feed any scan path). On the other hand, these flip-flops are indispensable to design Z properly operating DT-LFSR registers with the D n−t T t structures and incorporated as components of the DT-LFSR-TPG unit.
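The number of registers and of "redundant" flip-flops in such a unit follows from simple arithmetic. The sketch below assumes that each scan path is fed by exactly one T-type output, that Z is the smallest number of registers providing enough T-type outputs, and that every D-type flip-flop counts as redundant because its output is unused; the function name `tpg_structure` is hypothetical.

```python
def tpg_structure(scan_paths, n, t):
    """Register count and unused flip-flops of a DT-LFSR-TPG (assumed reading)."""
    z = -(-scan_paths // t)                 # ceiling division: registers needed
    return {
        "registers": z,
        "redundant_T": z * t - scan_paths,  # T outputs left unconnected
        "redundant_D": z * (n - t),         # D outputs are never connected
    }


print(tpg_structure(1024, 63, 62))
# {'registers': 17, 'redundant_T': 30, 'redundant_D': 17}
```

For 1024 scan paths and D 1 T 62 registers this gives the 17 registers quoted above; the remaining figures are derived under the stated assumptions and are not taken from Table 8.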
The maximum operational frequency of the DT-LFSR-TPG unit in the testing mode is the same as for the DT-LFSR registers included as components of the unit and equals 994 MHz. However, the interleaving technique may be necessary in the topology of the unit to achieve the maximum possible operating frequency of the DT-LFSR registers (see Figure 14). This, in turn, leads to long interconnections in the layout of the DT-LFSR-TPG unit between the SO outputs of preceding DT-LFSR registers and the SI inputs of subsequent registers. This is why the operating frequency of the TPG unit may be slightly lower in the seeding mode (serial shift) than in the testing mode.
Figures provided in Table IX of [34] were used to calculate implementation costs and maximum operational frequencies for test pattern generators of the LFSR + PS type, where each test pattern generator of that type is made up of an LFSR register with a phase shifter connected to its outputs. The calculation results are summarized in Table 9. The investigations were dedicated to LFSR + PS test pattern generators with LFSR registers, where feedback loops of these registers are defined by means of the following primitive polynomials [34]:
Each polynomial was investigated for two options of a linear register designed according to that polynomial, namely an LFSR register with an external feedback loop (type I LFSR [34]) and an LFSR register with an internal feedback loop (type II LFSR [34]). The numbers of scan paths fed from a test pattern generator of the LFSR + PS type are listed in the first column of Table 9. For each test pattern generator, the table contains two numbers, one above the other. The first number is the implementation cost of an LFSR + PS test pattern generator, expressed as the number of equivalent two-input NAND gates; it comprises both the cost of the phase shifter and that of the LFSR register. The second number is the maximum operating frequency of the TPG unit expressed in megahertz (MHz).
The frequency was calculated taking into account the delay contributed by the tree of XOR gates in the phase shifter as well as the propagation time of the 2-to-1 multiplexer and the setup time of the D-type flip-flop that are incorporated into the first cell of each scan path. In addition, the asterisk "*" indicates those maximum operational frequencies of LFSR + PS units that are limited not by the operation of the phase shifter but by the time of signal propagation along the feedback loop of the LFSR register.

Table 9. Implementation costs and maximum operation frequencies for test pattern generators of the LFSR + PS type disclosed in [34].
Columns: Number of Scan Paths; Type I LFSRs; Type II LFSRs.

Both parameters discussed above were compared between test pattern generators of the DT-LFSR-TPG type and the LFSR + PS type, and the comparison results are summarized in Table 10, which has the same layout as Table 9 but different content. The upper number in each cell of the table gives the percentage by which the hardware overhead contributed by the DT-LFSR-TPG unit exceeds the corresponding overhead attributable to the TPG unit of the LFSR + PS type. In turn, the lower number in the table cell gives the percentage by which the maximum operating frequency of the DT-LFSR Test Pattern Generator (DT-LFSR-TPG) exceeds the frequency offered by the corresponding LFSR + PS solution. The comparison was carried out for those DT-LFSR-TPG units that offer the lowest cost for the specific number of scan paths fed from the TPG unit (according to Table 7).

Table 10. Comparison of implementation costs and maximum operating frequencies between test pattern generators of the LFSR + PS and DT-LFSR-TPG types.
Columns: Number of Scan Paths; Type I LFSRs; Type II LFSRs.

The maximum operating frequency of DT-LFSR-TPG units exceeds the corresponding frequency for the LFSR + PS solutions given in [34] by 20% to 191%. The gap between the operating frequencies of the two kinds of test pattern generators tends to grow with the number of scan paths fed by these TPG units. The higher operating frequency of a TPG unit enables the reduction of testing time, which, in turn, contributes to a reduction in expenses for the entire testing process of integrated circuits. However, this benefit comes at the price of a much higher hardware overhead associated with the use of test pattern generators of the DT-LFSR-TPG type, which may be 17% to 223% higher than that of the LFSR + PS modules given in [34].
However, one has to keep in mind that the silicon area taken by the network of interconnections between the flip-flops and gates included in these circuits is not taken into consideration for either kind of test pattern generator. The delay (propagation time) introduced by these interconnections is also ignored. For pseudo-random pattern generators of the DT-LFSR-TPG type, this is permissible since the layout of the interconnection networks in such TPGs is regular and the networks are of a local nature, in particular when the topology of the components that make up the circuits is carefully designed. This is why only a slight impact of these interconnections on the overall circuit area and the maximum operating frequency can be assumed. In contrast, the network of interconnections in test pattern generators of the LFSR + PS type is much more irregular, with numerous branches. This applies particularly to LFSR + PS modules with a large number of output channels, where the interconnections occupy a significant portion of the entire area assigned to the test pattern generator. Consequently, propagation times along these interconnections also have a substantial impact on the maximum operating frequencies of such TPGs.
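The way the maximum operating frequency of an LFSR + PS unit follows from the delay components named above (XOR tree of the phase shifter, 2-to-1 multiplexer, flip-flop setup time) can be illustrated with a minimal sketch. All delay values below are hypothetical placeholders, not figures from this study or from [34], and the clock-to-output delay of the launching flip-flop is an additional assumption of the sketch. The helper percent_excess merely shows how percentages of the kind reported in Table 10 are formed.

```python
def max_frequency_mhz(delays_ns):
    """Maximum clock frequency in MHz for a path whose stage delays are given in ns."""
    return 1000.0 / sum(delays_ns)

def percent_excess(value, reference):
    """By how many percent `value` exceeds `reference`."""
    return 100.0 * (value - reference) / reference

# HYPOTHETICAL delays in nanoseconds (placeholders, not data from the paper).
T_CLK_TO_Q = 0.30  # assumed clock-to-output delay of the launching flip-flop
T_XOR = 0.20       # assumed delay of one XOR level in the phase shifter
T_MUX = 0.25       # assumed propagation time of the 2-to-1 multiplexer
T_SETUP = 0.15     # assumed setup time of the D flip-flop in the scan cell

def lfsr_ps_frequency_mhz(xor_levels):
    """Frequency limit of an LFSR + PS unit with `xor_levels` XOR levels in the phase shifter."""
    return max_frequency_mhz([T_CLK_TO_Q, xor_levels * T_XOR, T_MUX, T_SETUP])

for levels in (1, 2, 3, 4):
    f = lfsr_ps_frequency_mhz(levels)
    print(levels, round(f, 1), round(percent_excess(994.0, f), 1))
```

With any such set of delays, each additional XOR level lengthens the critical path and lowers the frequency limit, which is consistent with the observation that the frequency gap grows with the number of scan paths.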

Discussion and Conclusions
This study presents a new concept for the engineering of a Test Pattern Generator (TPG) capable of producing pseudo-random test vectors for scan path testing in the STUMPS architecture. The innovative structure assumes the implementation of multiple Linear Feedback Shift Registers with D-type and T-type flip-flops (DT-LFSR) with identical layouts. Such a TPG, designated herein as DT-LFSR-TPG, enables a substantial increase in phase shift between binary sequences appearing at the TPG outputs compared to phase shifters described in [34,39]. What is also important in such TPGs is that the acceptable separation factor (phase shift) between output channels is guaranteed for any pair of the TPG outputs, unlike in other solutions, for instance, the one disclosed in [13], where such an offset is guaranteed only for adjacent outputs. Moreover, for a given minimum phase shift, the DT-LFSR-TPG circuits also make it possible to wire more scan paths to the TPG outputs as compared to the numbers achieved in [34,39].
The DT-LFSR-TPG modules also outperform the test pattern generators of the LFSR + PS type proposed in [34] in terms of the maximum operating frequency. The difference between the operating frequencies becomes even more significant when the number of TPG output channels increases. The high generation frequency of pseudo-random test patterns achieved by DT-LFSR-TPG solutions enables a substantial reduction of the testing time and, in consequence, a reduction of testing expenses. However, the higher hardware overhead, as compared to the solutions outlined in [34], is a drawback of the TPGs investigated in this study. The DT-LFSR-TPG structures also achieve higher operating frequencies than two-dimensional and hierarchical CA and GLFSR registers, which comprise at least two layers of XOR gates in their linear feedback paths. In contrast, only a single XOR gate is included in each local feedback loop (i.e., the internal loop of a T-type flip-flop) when DT-LFSR registers are used as components of test pattern generators of the DT-LFSR-TPG type.
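The single-XOR property can be made concrete with a minimal behavioural sketch of a ring of D and T flip-flops. The model is an assumption of this sketch rather than a description taken from the synthesis results: each flip-flop is fed by the output of its predecessor in the ring, a D flip-flop copies that input, and a T flip-flop toggles when that input is 1, i.e., its next state is its current state XOR the input. The three-stage ring used here is a toy example, not one of the registers from Table 6.

```python
def step(state, kinds):
    """Advance a ring of D/T flip-flops by one clock cycle.

    state[i] is the output of flip-flop i; its input is the output of
    flip-flop i-1, and flip-flop 0 is fed by the last one (closing the ring).
    """
    nxt = []
    for i, kind in enumerate(kinds):
        prev = state[i - 1]              # index -1 wraps to the last flip-flop
        if kind == "D":
            nxt.append(prev)             # D flip-flop: plain shift
        else:
            nxt.append(state[i] ^ prev)  # T flip-flop: one XOR in its local loop
    return tuple(nxt)

def period(seed, kinds):
    """Number of clock cycles until the ring returns to `seed`."""
    limit = 2 ** len(kinds)              # guard against seeds that never recur
    state, count = step(seed, kinds), 1
    while state != seed:
        if count >= limit:
            raise ValueError("seed does not lie on a cycle")
        state, count = step(state, kinds), count + 1
    return count

print(period((1, 0, 0), "DTT"))  # prints 7
```

In this toy case the ring with one D and two T flip-flops visits all 7 non-zero states, and every next-state function contains at most one XOR regardless of the ring length.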
Pseudo-random pattern generators of the DT-LFSR-TPG type seem to be an attractive solution for designers who still have spare hardware resources in their projects. Integrated circuits with a large number of input and output lines but quite low utilization of internal resources are a good example. At least for some of these circuits, the overall area occupied on a silicon substrate is determined by the dimensions of the numerous pads for input and output lines. If so, a substantial part of the silicon area available for the internal logic of the IC remains unused and can be devoted to the LBIST structure.
The DT-LFSR-TPG structures of pseudo-random test pattern generators are easy to design, which is a further advantage of such solutions. The design reduces to the selection of an appropriate D k T t structure of the DT-LFSR register (based on Tables 1 and 6) suitable for the test pattern generator. The selection must guarantee that the required separation between output channels of the DT-LFSR-TPG module is preserved while keeping the cost of the test pattern generator at the lowest possible level. Determining the content of both of the aforementioned tables is also a minor problem. As mentioned in Section 4.3, the calculation of phase shifts between binary sequences at the outputs of a DT-LFSR register with a length of up to 100 bits takes no more than several seconds. In turn, the logic synthesis of all DT-LFSR registers listed in Table 6 took less than 1 min. In contrast, the time necessary to design a test pattern generator of the LFSR + PS type with the methods given in [34] was sometimes as long as several hours. It is worth mentioning that the very short time necessary to find the separation coefficient between output channels of the DT-LFSR-TPG structure is also a distinctive property that sets this solution apart from PRPGs built from linear registers of the LHCA 90/150 type, two-dimensional and hierarchical CA, or GLFSR ones.
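The selection step described above can be sketched as a simple search over candidate register structures. The candidate list below is entirely hypothetical (it stands in for the data of Tables 1 and 6, which are not reproduced in the sketch), the function name is illustrative, and the sketch again assumes that each scan path is fed from one T flip-flop output.

```python
from math import ceil

# HYPOTHETICAL candidates: (n, t, guaranteed minimum phase shift, cost per register
# in equivalent two-input NAND gates). Placeholder values, not data from the paper.
CANDIDATES = [
    (20, 18, 500, 150),
    (40, 38, 5_000, 300),
    (60, 58, 100_000, 440),
]

def cheapest_structure(scan_paths, required_shift, candidates=CANDIDATES):
    """Return (total cost, n, t) of the cheapest structure meeting the separation
    requirement, or None when no candidate qualifies.

    ASSUMES one T flip-flop output per scan path, so ceil(scan_paths / t)
    registers are needed.
    """
    feasible = [
        (ceil(scan_paths / t) * cost, n, t)
        for n, t, shift, cost in candidates
        if shift >= required_shift
    ]
    return min(feasible) if feasible else None

print(cheapest_structure(1024, 1_000))  # (7920, 60, 58)
print(cheapest_structure(128, 1_000))   # (1200, 40, 38)
```

The example shows why the cheapest structure depends on the number of scan paths: a longer register may need fewer instances, while a shorter one wastes fewer flip-flops when only a few outputs are required.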
Further investigations of pseudo-random pattern generators of the DT-LFSR-TPG type will be dedicated to the reduction of the hardware overhead introduced by these modules. The figures provided in Table 8 indicate that such an overhead is caused, for instance, by a high number of redundant D and T flip-flops that are not connected to any scan path. Therefore, one way of reducing the overhead, in particular for TPGs with a large number of output channels, may consist in seeking DT-LFSR registers designed according to primitive polynomials but with a length n much greater than the ones investigated in this paper. Hopefully, suitable DT-LFSRs will be found that enable the construction of DT-LFSR-TPG modules with only a small number of redundant D and T flip-flops. Another opportunity to significantly reduce the hardware overhead introduced by a DT-LFSR Test Pattern Generator lies in the application of purposefully designed T-type flip-flops with a low implementation cost (i.e., occupying a small area on a silicon substrate), similarly to the solution disclosed in [65]. Then, the original seeding method can be applied to T-type flip-flops, as shown in [65,66].
Another way to seek PRPGs offering a compromise between the maximum operating frequency and the implementation cost lies in further improvements of LFSR + PS and CA + PS test pattern generators. One has to be aware that the key factor limiting the maximum operating frequency is the time of signal propagation through the phase shifter wired to the outputs of the LFSR or CA registers. Logic structures of that type can be described in a very intuitive manner by means of the Reed-Muller (RM) algebra [67] (pp. 44-46), [68][69][70][71]. Logic circuits based on the RM logic are well investigated, and numerous methods for the optimization of propagation time have already been developed for them; one such method is disclosed in [72]. The author expects that some of these methods can be used to draw up a new method for the engineering of phase shifters with propagation times much shorter than the ones reported in [34,39].

Conflicts of Interest:
The author declares no conflict of interest.