FATE: A Flexible FPGA-Based Automatic Test Equipment for Digital ICs

Abstract: The limits of chip technology are constantly being pushed by the continuous development of integrated circuit manufacturing processes and equipment. Chips now contain billions, and even tens of billions, of transistors, making chip testing increasingly challenging. Verifying very large-scale integrated (VLSI) circuits requires specialized automatic test equipment (ATE), but its cost and size significantly limit its applicability. Existing FPGA-based ATE has limited scalability and supports only a few test channels and short test vectors. As a result, it is suitable only for testing specific small-scale chips and cannot be used to test VLSI circuits. This paper proposes a low-cost hardware and software solution for testing digital integrated circuits based on on-chip design for testability (DFT), which enables functional and performance tests of the chip. The proposed solution uses the resources within the FPGA effectively to provide additional test channels. Furthermore, its round-robin data transmission mode supports test vectors of any length, and the dynamic configuration of each test channel accommodates different types of chip test projects. In the experiment, the system successfully tested a digital signal processor (DSP) chip with 72 scan test pins (it theoretically supports 160 test pins). Compared with our previous work, the work in this paper increases the number of test channels by four times while reducing resource utilization per channel by 37.5%, demonstrating good scalability and versatility.


Introduction
With the continuous advancement of integrated circuit technology, the integration density and complexity of transistors in chips continue to increase. Currently, the number of transistors in a chip has reached tens of billions [1]. At the same time, competition among chip manufacturers is becoming more intense, with more powerful chips being introduced every six months or a year to capture the market. The shortened production cycles and increased complexity of chips have made chip verification and testing a bottleneck in the microelectronics production process. Chip testing matters because it identifies problems promptly after manufacturing and weeds out defective chips from batch production, preventing them from entering the market and significantly increasing after-sales costs.
During the final stage of post-silicon validation for product testing, specialized ATE is often necessary. ATE can test various types of chips, including digital, radio frequency (RF), analog, power, and memory, and is widely used in the production testing of integrated circuits [2]. However, acquiring or renting such specialized ATE can be costly. For example, testing equipment such as Advantest's V93000 [3] from Japan and Teradyne's J750 [4] from the United States can cost millions of dollars. Moreover, these devices are bulky and require significant time for technical personnel to learn and master, all of which contributes to the high cost of chip testing. With the rapid development of VLSI and the intensification of market competition, it is crucial for small chip design companies to validate their designs within a limited budget by using cost-effective testing solutions. For digital chips in particular, many ATE functions are redundant. Functional and parametric tests, including connectivity, static characteristic parameters, dynamic characteristic parameters, and switch characteristic tests [5], can identify most chip defects, thereby fulfilling the testing requirements for small-batch chip production.
Furthermore, in certain scenarios, such as presenting to clients at conferences or conducting joint debugging across different teams, renting dedicated automatic test equipment would require a significant amount of preparation time. Therefore, a low-cost, reconfigurable, portable digital chip testing device is crucial.
Because FPGA devices are widely used in ATE testing and board-level application testing, a large body of work attempts to test chips with an FPGA in place of some ATE functions. This can greatly reduce the size and cost of the test equipment and allows failed chips to be retested, or small batches of chips to be manufacturing-tested, quickly. However, most of this research has only conducted functional testing on specific chips to verify whether the system meets the design requirements. For ultra-large-scale high-speed chips, it is also necessary to detect internal delay faults to determine whether the chip can operate at the set frequency. Furthermore, obtaining the chip's maximum operating frequency for performance evaluation and design optimization is crucial for the reliability of high-speed chips. Existing FPGA-based testing systems must be custom-developed for different chip test items and therefore lack flexibility. Additionally, their number of test channels is limited, making them unsuitable for ultra-large-scale high-speed chips with many pins.
To solve the above problems, this paper proposes a flexible FPGA-based automatic test equipment (FATE), which can be applied to all chips with internal DFT. It features strong scalability and high flexibility. Compared with existing FPGA-based automatic test equipment, the FATE has three main advantages:
1. We designed a configurable test channel that supports different chip test items, detects functional faults, delay faults, and other faults, and provides more test channels to accommodate more chips.
2. With a limited number of interfaces, the system supports test vectors of any length by round-robin scheduling of interface transmissions, thereby improving fault coverage.
3. Based on the structure of the chip's design for testability, the frequency scanning test method can quickly obtain the limit performance of the chip for the relevant application scenario.
The remainder of this paper is organized as follows: the second part introduces the main approach of DFT and related work, and the third part describes the overall structure of the system, as well as its specific hardware and software design. The fourth part validates the system through actual chip testing and analyzes the system's performance limits. Finally, the fifth part summarizes this paper and outlines future optimization directions.

Scan Test Design
Design for testability refers to the practice of inserting additional hardware logic into the chip design to improve the testability of the chip and reduce the time required for testing. Scan test design is an important DFT method, in which the chip's internal D flip-flops are replaced with multiplexer-equipped scan flip-flops (SFFs), and these are connected in series with the combinational logic on the timing path to form a scan chain [6][7][8].
The scan test can be divided into two states: shift and launch-capture. During test preparation, when Scan Enable (SE) is set to 1, the test vector is shifted into the corresponding SFFs under the driving of the test clock (TCK). Then, when SE is set to 0, TCK provides two clock cycles with adjustable frequency. The first clock cycle launches the test vector into the combinational logic for fault activation, and the second clock cycle lets the next SFF capture the test response. Finally, SE is set back to 1, and the test responses are shifted out through the scan chain and compared with the expected results to determine whether the chip has any faults.
Based on the types of faults in the chip and the clock frequency used for launch-capture, the scan test can be categorized into normal scan test, at-speed test [9], and faster-than-at-speed test [10]. The normal scan test employs a lower clock frequency and aims to verify the correctness of the chip's logic functionality. In contrast, the at-speed test uses a testing frequency equal to the chip's working frequency to detect internal delay faults [11], ensuring normal operation at the set frequency. To enhance test quality and detect small delay faults, the faster-than-at-speed test uses a testing frequency higher than the chip's working frequency. By conducting the faster-than-at-speed test, the frequency at which the chip begins to fail can be obtained, providing insight into the chip's limit performance.
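The shift and launch-capture states described above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the names `scan_test` and `logic` are hypothetical, the two launch-capture pulses are collapsed into a single evaluation of the combinational logic, and timing (and therefore delay faults) is not modeled.

```python
from typing import Callable, List


def scan_test(chain_len: int,
              logic: Callable[[List[int]], List[int]],
              vector: List[int]) -> List[int]:
    """Model one shift-in / launch-capture / shift-out sequence of a scan chain."""
    if len(vector) != chain_len:
        raise ValueError("vector length must equal the scan chain length")
    sff = [0] * chain_len

    # Shift state (SE = 1): one bit enters the chain on every TCK cycle.
    for bit in vector:
        sff = [bit] + sff[:-1]

    # Launch-capture state (SE = 0): the loaded values drive the combinational
    # logic and the flip-flops capture its response (simplified to one step).
    sff = logic(sff)

    # Shift state again (SE = 1): the response leaves from the last flip-flop.
    response = []
    for _ in range(chain_len):
        response.append(sff[-1])
        sff = [0] + sff[:-1]
    return response


def invert(state: List[int]) -> List[int]:
    """Toy combinational logic used only for illustration."""
    return [1 - b for b in state]


def stuck_at_0(state: List[int]) -> List[int]:
    """The same toy logic with its last output stuck at 0."""
    out = invert(state)
    out[-1] = 0
    return out


vector = [0, 0, 1, 1]
expected = scan_test(4, invert, vector)
observed = scan_test(4, stuck_at_0, vector)
fault_detected = observed != expected
```

Comparing `observed` with `expected` mirrors the final comparison step of the scan test: any mismatch in the shifted-out response flags the chip as faulty.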

Related Works
Current FPGA-based test equipment primarily focuses on verifying the functionality of chips. For example, Nawarathna [12] designed a test device with 16 digital I/O interfaces that supports testing of NAND gate circuits. Rabakavi [13] used an FPGA to conduct functional tests on simple chips such as the 7408 and 74138. Vanitha [14] proposed an FPGA-based automatic test device for detecting behavioral faults in digital components such as flip-flops, multiplexers, and full adders. Bayrakci [15] developed a low-cost embedded device and designed software for testing the functionality and transmission delay of MCU chips. Che [16] proposed an FPGA-based memory chip testing system, designing a memory function testing module capable of completing basic read and write commands and common memory testing algorithms. Wang [17] designed an automatic testing system for an ARM core SoC and, using timer capture function verification and MLDO power calibration as examples, introduced the process of using this system for hardware-software co-verification and FT (Functional Testing) tests. The aforementioned research focuses on functional testing of specific chips. Although such tests can verify the correctness of a chip's functions, they are unable to detect structural faults within the chip. Moreover, the test vectors required for functional testing are relatively short and can be stored directly within the resources of an FPGA. However, the test vectors for large-scale integrated circuits generally span tens of millions of bits, making these testing devices unsuitable for large-scale integrated circuits.
Additionally, Carvalho [18] designed automatic test equipment for SAMPA (a gas detection chip), which used the chip's DFT to detect internal stuck-at faults. In our previous work [19], we implemented an ATE that uses the scan chain to detect stuck-at faults and delay faults inside the chip. Although these efforts have begun using scan chains to detect structural faults within chips, each type of fault detection requires custom development because the test pin configurations of chips vary, resulting in a lack of versatility. These studies have also overlooked the potential of using scan chains to determine a chip's maximum frequency for performance testing. Furthermore, these ATEs support few test channels and rely on the limited test vectors stored in the FPGA's internal resources, so they are insufficiently versatile and unable to test VLSI chips.

Overview
The FATE consists of four components: FPGA, Master DSP, personal computer (PC), and device under test (DUT). These components are divided into the user layer, control layer, and hardware layer based on the different functionalities of the system, with data transmission between layers achieved through interfaces. The architecture of the system is illustrated in Figure 2.
The hardware layer consists of the FPGA and the DUT. An FPGA is a programmable, versatile digital chip based on a matrix of configurable logic blocks (CLBs), which allows it to be programmed and reconfigured at any time. Moreover, an FPGA has numerous external I/O pins supporting various voltage standards, making it highly suitable for connecting to the test pins of the DUT and providing test vectors with different timing sequences.
The control layer, through the Master DSP, manages the entire testing process. This DSP is a high-performance chip independently developed by the research group, featuring eight cores with a working frequency of 1 GHz. It can operate normally at temperatures ranging from −55 to 125 degrees Celsius and has significant computational capability, making it suitable for high-speed real-time signal processing in fields such as radar, communications, and electronic warfare.
The user layer provides users with a visual operating interface that converts user inputs into operations or commands, enabling intuitive interaction with the FATE system.
The FATE reduces system coupling through its layered design, enabling system modules to be ported across different platforms. FPGA storage resources are primarily allocated for caching a small amount of data, so that as many test channels as possible can be provided to meet different testing requirements and enhance system scalability. Validating test results on the Master DSP conserves PC storage space and improves system utilization.

Hardware Design of FATE

Test Channel
The test channel is the fundamental part of the system. Each test channel is directly connected to a test pin of the DUT. Based on whether the DUT pin is an input or an output, the module either provides test vectors to the DUT or captures the test responses. Due to the limited storage capacity of internal FPGA resources, each test channel of the system needs to cache data through FIFOs. As shown in Figure 3, each test channel is composed of two FIFOs, used for transmitting test vectors to the DUT and capturing test responses, respectively.
Each test channel uses two FIFOs because the data width of the interface between the Master DSP and FPGA is 64 bits. With a 64-bit write width, Xilinx's FIFO IP core can output a minimum of 8 bits, whereas transmitting test vectors to the DUT is a single-bit transfer. Therefore, two FIFOs are needed to complete the width conversion and the asynchronous clock-domain crossing. The first FIFO has a capacity of 72 Kb and is used for data caching, and the second FIFO, with a capacity of 18 Kb, is primarily used for width conversion. It should be noted that the capacity of the FIFOs does not represent the maximum test vector length that the system can support. The threshold flags of all input and output channels form the corresponding threshold registers. The Master DSP reads these registers through the EMC interface and uses their values to decide when to supply more test vectors and read out test responses. With this method, the test vector length is not constrained by the FIFO depth, and longer test vectors can improve the chip's fault coverage.
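The two-stage width conversion described above (64 bits to 8 bits in the first FIFO, 8 bits to 1 bit in the second) can be sketched in software as follows. The function names are hypothetical, and the most-significant-first byte and bit order is an assumption, since the text does not state the ordering used by the hardware.

```python
from typing import Iterable, Iterator, List


def words_to_bytes(words: Iterable[int]) -> Iterator[int]:
    """First stage: 64-bit words in, 8-bit values out (MSB-first order assumed)."""
    for word in words:
        for shift in range(56, -8, -8):
            yield (word >> shift) & 0xFF


def bytes_to_bits(data: Iterable[int]) -> Iterator[int]:
    """Second stage: 8-bit values in, single bits out (MSB-first order assumed)."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def serialize(words: Iterable[int]) -> List[int]:
    """Chain both stages to turn 64-bit interface words into a DUT bit stream."""
    return list(bytes_to_bits(words_to_bytes(words)))


bits = serialize([0x8000000000000001])
```

Each 64-bit word written by the Master DSP therefore yields 64 single-bit transfers toward the DUT pin.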

SRIO Controller
This paper selects the serial RapidIO (SRIO) interface for data transmission between the Master DSP and the FPGA [20]. SRIO is a high-speed serial protocol whose interface supports three link widths: 1×, 2×, and 4×. Each lane supports four transmission rates: 1.25, 2.5, 3.125, and 5.0 Gbaud. Compared with other data transmission interfaces such as Ethernet and PCIe, the SRIO interface better meets the requirements of high-performance embedded systems for high-speed data transmission between multiple devices. An SRIO data packet consists of a header, a data payload, and a 16-bit CRC checksum. Since the SRIO interface adopts the HELLO (Header Encoded Logical Layer Optimized) format, each SRIO transmission can carry up to 256 bytes, with 8 bytes being the header containing transmission information. The packet efficiency is the ratio of the effective data length to the total packet length. In contrast, when transmitting the same 256 bytes, Ethernet requires encapsulation using protocols such as IP and MAC, resulting in a lower encapsulation efficiency. Table 1 compares the three interfaces.
SRIO adopts a three-layer hierarchical architecture. The logical layer defines the format of packets; the transport layer, positioned in the middle, defines the SRIO address space and routing information; and the physical layer, at the bottom of the hierarchy, primarily governs the transmission mode. Xilinx FPGAs integrate high-speed serial transceivers such as GTP, GTX, or GTH. In the Xilinx development environment, the Serial RapidIO Gen2 IP [21] can be instantiated directly to implement the three-layer structure of SRIO. In the system described in this paper, the IP is set to Initiator/Target mode, the link width is set to 4×, and each lane's transmission rate is 5 Gbaud.
The SRIO controller needs to convert signals between the SRIO interface and the FIFO. Taking writing test vectors into the source FIFO as an example, this module is shown in Figure 4, and each signal in the SRIO2FIFO module is described in Table 2.
Scan tests require connecting every test pin of the chip to a corresponding test channel, but different types of tests may use different test pins. To give the system the flexibility to meet various chip testing requirements, the test channels must be configurable according to the actual testing needs. SRIO can transfer data between different devices, each of which has its own device ID, and using different IDs ensures that data are delivered to the correct device. In this system, we treat the different pins of the chip under test as different devices and assign a unique 8-bit ID to each pin. The allocation of test pin IDs is shown in Table 3. After assigning the IDs, we transfer data to the corresponding data channels based on these IDs. To validate the entire data transmission process, we simulated the whole system, taking data writing as an example. The simulated waveform is shown in Figure 5.
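The ID-based routing of data to test channels can be sketched as a simple lookup from the 8-bit destination ID to a per-pin FIFO. The class name `ChannelRouter` and the example ID values are hypothetical; the actual ID allocation is the one given in Table 3, which is not reproduced here.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class ChannelRouter:
    """Route incoming payload words to the source FIFO of the addressed test pin."""

    def __init__(self, pin_ids: Iterable[int]) -> None:
        # One FIFO per configured test pin, keyed by its 8-bit ID.
        self.fifos: Dict[int, Deque[int]] = {pid: deque() for pid in pin_ids}

    def write(self, dest_id: int, payload_words: Iterable[int]) -> None:
        if not 0 <= dest_id <= 0xFF:
            raise ValueError("destination ID must fit in 8 bits")
        if dest_id not in self.fifos:
            raise KeyError(f"no test channel configured for ID {dest_id:#04x}")
        self.fifos[dest_id].extend(payload_words)


# Example IDs chosen for illustration only.
router = ChannelRouter([0x01, 0x02])
router.write(0x02, [0xAA, 0xBB])
```

Because the mapping is just a table, reconfiguring the system for a different test amounts to building the router with a different set of pin IDs.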

Clock Management Module
The clock management module is responsible for generating the clocks required for the test process. Different chips have different requirements for the test clock. In this system, the main clock of the FPGA is 25 MHz, and the test clock required by the chip under test, DFT_TEST_CLK, is also 25 MHz with a duty cycle of 25%. The test vectors in the test channel need to be synchronized with this timing. For example, the test vector value should be output while the test clock is high; otherwise, the output should be low. To ensure synchronization between the clock and data, the actual output of the test channel is generated jointly from the test vectors and the test clock. To capture test responses accurately, PROBE_CLK must be generated according to the characteristics of the output pins. In this system, the capture clock designed for the chip under test has the same frequency as DFT_TEST_CLK but a different phase. Figure 6 depicts the relationship between the various clocks and outputs; the actual output is obtained by performing an AND operation between DFT_TEST_CLK and the test vector.
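The AND relationship between DFT_TEST_CLK and the test vector can be illustrated with a sampled waveform model. This is a sketch under assumptions: each clock period is divided into four samples so that one high sample gives the 25% duty cycle, and the high phase is placed at the start of the period, which the text does not specify. The function names are hypothetical.

```python
from typing import List


def dft_test_clk(periods: int, samples_per_period: int = 4,
                 high_samples: int = 1) -> List[int]:
    """Sampled test clock: high for high_samples out of samples_per_period."""
    one_period = [1] * high_samples + [0] * (samples_per_period - high_samples)
    return one_period * periods


def channel_output(vector: List[int], samples_per_period: int = 4,
                   high_samples: int = 1) -> List[int]:
    """Actual pin output: the test clock ANDed with the current vector bit."""
    clk = dft_test_clk(len(vector), samples_per_period, high_samples)
    return [c & vector[n // samples_per_period] for n, c in enumerate(clk)]


waveform = channel_output([1, 0, 1])
```

A vector bit of 1 thus appears as a pulse during the high phase of the clock, while a 0 keeps the pin low for the whole period.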

The hardware part of the FATE implements the basic testing functionality. The software system of the FATE adopts a modular architecture and a hierarchical design to facilitate functional expansion. The software functional architecture of the FATE is illustrated in Figure 7.
The software architecture consists of four layers: the view layer, service layer, control layer, and data layer. The view layer provides a user-friendly human-computer interaction interface for managing and maintaining operations, including test configuration, result display, and multiple operable interfaces. The service layer starts from the test files and generates binary files that the hardware can process. It also exchanges data with the Master DSP through a network interface and provides test instructions that make chip testing more convenient. The core function of the control layer is to execute the specific test process, controlling the system to complete the testing of the DUT based on the system's state. The data layer performs overall analysis and storage of test result data using visualization tools and databases.

Chip Performance Test
Chip performance testing is achieved through the faster-than-at-speed test of the scan chain, aiming to obtain the minimum failure frequency of the internal timing paths within the chip. This testing method sweeps the frequency of the faster-than-at-speed test until the chip fails at a certain frequency, which yields the chip's maximum operating frequency.
However, a straightforward sweep requires test vectors for a different frequency in each test, and generating the test vectors for a chip takes several hours, leading to low testing efficiency. The two-pulse high-frequency clock for launch-capture in the scan chain is generally produced by inserting an on-chip clock controller (OCC) into the chip's control logic, which switches between the low-speed shift clock provided by the test equipment and the high-speed operating clock generated internally by the chip's phase-locked loop (PLL). The logic structure of the PLL generating the high-speed clock for the scan test is shown in Figure 8. The test vectors are shifted into the corresponding SFFs and latched under the control of Scan Enable, and the generated high-speed clock is then selected for output through the OCC. According to the DFT of the chip under test, the first 32 bits of the input test vector on one of the test pins are used to scan in the multiplication factor and division factor of the PLL, which sets the test frequency for the scan test. By configuring these 32 bits with binary values representing the desired multiplication and division factors, the frequency-sweeping faster-than-at-speed test can be performed without regenerating test vectors, thereby obtaining the chip's maximum performance. Since the system achieves different test frequencies by changing the multiplication factor, the current resolution of the frequency sweep equals the source clock frequency of the chip under test, which is 25 MHz.
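The frequency sweep can be sketched as a loop that rewrites only the first 32 bits of the vector before each run. Several details here are assumptions made for illustration: the split of the 32 bits into a 16-bit multiplication factor followed by a 16-bit division factor, the MSB-first bit order, and the `run_test` callback that stands in for an actual test run on the hardware. Only the 25 MHz source clock and the 32-bit prefix come from the text.

```python
from typing import Callable, List, Optional

SOURCE_CLK_MHZ = 25  # source clock of the chip under test, from the text


def pll_prefix(mult: int, div: int) -> List[int]:
    """Encode the PLL factors as 32 bits (hypothetical 16 + 16 layout, MSB first)."""
    if not (0 < mult < 1 << 16 and 0 < div < 1 << 16):
        raise ValueError("factors must be positive and fit in 16 bits")
    word = (mult << 16) | div
    return [(word >> shift) & 1 for shift in range(31, -1, -1)]


def with_frequency(vector: List[int], mult: int, div: int = 1) -> List[int]:
    """Reuse the same test vector, replacing only its first 32 bits."""
    return pll_prefix(mult, div) + list(vector[32:])


def sweep(vector: List[int], run_test: Callable[[List[int]], bool],
          start_mult: int, stop_mult: int) -> Optional[int]:
    """Return the first failing frequency in MHz, or None if every step passes."""
    for mult in range(start_mult, stop_mult + 1):
        if not run_test(with_frequency(vector, mult)):
            return mult * SOURCE_CLK_MHZ
    return None
```

With a division factor of 1, each step raises the test frequency by 25 MHz, so the highest passing frequency is one step below the value returned by `sweep`.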

Round-Robin Scheduling Transmit
Each test channel has only a small-capacity FIFO for data buffering. If the Source FIFO runs empty or the Capture FIFO overflows, the test is no longer seamless, which leads to inaccurate test results and prevents real chip faults from being diagnosed correctly. Because the interface between the FPGA and the Master DSP is limited, the test channels cannot all transmit in parallel when there are many of them. We therefore designed a round-robin scheduling algorithm driven by the threshold flags of the test channels, in which each test channel invokes the SRIO interface for data transmission in turn. The specific data transmission process is illustrated in Algorithm 1.
1: for (i = 1 to X) do
2:     Master DSP transmits a vector of length H to input channel i.
3: end for
4: if T then
5:     [...]
6:     for (i = 1 to Y) do
7:         FPGA transmits the test response of length Q from the output channel i to the Master DSP.
8:         [...]
9:     end for
10:     for (i = 1 to X) do
11:         Master DSP transmits a test vector of length Q to input channel i.
12:     end for
13: end if
14: [...]
15: for (i = 1 to X) do
16:     Master DSP transmits a test vector of length (L − P) to the input channel i.
17: end for
18: for (i = 1 to Y) do
19:     FPGA transmits the remaining test responses from the output channel i to the Master DSP.
20: end for
This algorithm allows test vectors of arbitrary length to be transmitted through a specified number of test channels, and it keeps the test valid by preventing the Source FIFO from running empty and the Capture FIFO from overflowing.
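The order in which the channels take turns on the link can be sketched in a few lines of Python. This is a simplified model under stated assumptions rather than the firmware itself: the threshold flag T is assumed to rise once per Q bits consumed, the counter P is assumed to track the bits of each vector already sent, each test response is assumed to be as long as its test vector, and the function name `round_robin_transfers` is hypothetical.

```python
def round_robin_transfers(X, Y, Q, H, L):
    """Return the ordered list of (direction, channel, bits) link transfers.

    X, Y: number of input / output channels; Q: threshold length;
    H: FIFO depth; L: length of each test vector (and, by assumption,
    of each test response).
    """
    if not (0 < Q <= H <= L):
        raise ValueError("expected 0 < Q <= H <= L")
    ops = []
    # Initial fill: every Source FIFO receives H bits.
    for i in range(1, X + 1):
        ops.append(("dsp->fpga", i, H))
    P = H      # bits of each vector sent so far (assumed bookkeeping)
    read = 0   # bits of each response read back so far
    # Steady state. Assumption: the threshold flag rises once per Q bits,
    # so each round moves Q bits per channel in each direction.
    while L - P >= Q:
        for i in range(1, Y + 1):
            ops.append(("fpga->dsp", i, Q))
        for i in range(1, X + 1):
            ops.append(("dsp->fpga", i, Q))
        P += Q
        read += Q
    # Tail: send the rest of each vector, then drain the remaining responses.
    if L > P:
        for i in range(1, X + 1):
            ops.append(("dsp->fpga", i, L - P))
    for i in range(1, Y + 1):
        ops.append(("fpga->dsp", i, L - read))
    return ops
```

Because every round moves only Q bits per channel, the total vector length L never has to fit into a FIFO, which is why the scheme places no upper bound on L.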

Test Control and Configuration
To realize a flexible ATE, we use the Master DSP to control the entire testing process. The Master DSP is responsible not only for receiving test vectors and configuration information from the PC but also for driving the hardware. Different types of tests use different pins and test vectors, so the system must be reconfigured for each test. The configuration information includes the test pins and test vector lengths, which are obtained by parsing the WGL test files generated by EDA tools. Furthermore, because the chip requires frequency-sweep tests, the configuration also includes a custom frequency range and the number of tests. All configuration content is converted into binary files for transmission.
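How such configuration content might be serialized into a binary blob can be sketched with Python's `struct` module. The field order, field widths, little-endian byte order, and the names `ScanConfig`, `pack_config`, and `unpack_config` are illustrative assumptions; the actual file layout used by the system is not described in the text.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class ScanConfig:
    pins: List[int]        # indices of the test pins in use
    vector_length: int     # length of each test vector in bits
    freq_min_mhz: int      # start of the custom frequency range
    freq_max_mhz: int      # end of the custom frequency range
    num_tests: int         # number of tests in the sweep


# Assumed header: uint32 vector length, then four uint16 fields
# (freq_min, freq_max, num_tests, pin count), little-endian, no padding.
_HEADER = "<IHHHH"


def pack_config(cfg: ScanConfig) -> bytes:
    """Serialize the configuration into a compact binary blob."""
    header = struct.pack(_HEADER, cfg.vector_length, cfg.freq_min_mhz,
                         cfg.freq_max_mhz, cfg.num_tests, len(cfg.pins))
    return header + struct.pack(f"<{len(cfg.pins)}H", *cfg.pins)


def unpack_config(blob: bytes) -> ScanConfig:
    """Inverse of pack_config."""
    size = struct.calcsize(_HEADER)
    vlen, fmin, fmax, ntests, count = struct.unpack(_HEADER, blob[:size])
    pins = list(struct.unpack(f"<{count}H", blob[size:size + 2 * count]))
    return ScanConfig(pins, vlen, fmin, fmax, ntests)
```

A fixed-width layout of this kind is easy to parse on an embedded target, which is one plausible reason for sending the configuration in binary rather than as text.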
Communication between the Master DSP and the PC uses the open-source LWIP stack, a lightweight TCP/IP implementation known for its small footprint and reliability. After transmitting the configuration information, the PC sends the test vectors. Once the Master DSP has received them, it begins transmitting them to the FPGA for testing. These events occur in sequence according to the test procedure; hence, a state machine is designed in the Master DSP to ensure the continuity and integrity of the testing. The state machine of the Master DSP is shown in Figure 9.
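The sequencing described above (configuration first, then test vectors, then testing) can be illustrated with a small table-driven state machine. The state and event names below are hypothetical and chosen only to mirror the prose; the actual states are those of Figure 9.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECV_CONFIG = auto()
    RECV_VECTORS = auto()
    TESTING = auto()
    REPORT = auto()


# Allowed (state, event) -> next state transitions (assumed for illustration).
TRANSITIONS = {
    (State.IDLE, "connect"): State.RECV_CONFIG,
    (State.RECV_CONFIG, "config_done"): State.RECV_VECTORS,
    (State.RECV_VECTORS, "vectors_done"): State.TESTING,
    (State.TESTING, "test_done"): State.REPORT,
    (State.REPORT, "results_sent"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Advance one transition; reject events that arrive out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None


def run(events, state: State = State.IDLE) -> State:
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Rejecting out-of-order events is what gives the continuity guarantee: test vectors cannot be forwarded to the FPGA before a configuration has been received.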

The Development Board
The FATE uses a development board that incorporates a Xilinx Kintex-7 XC7K325T and a Master DSP as its core components. Each of these connects to the computer through its own JTAG interface. The board also has 2 GB of DDR memory connected directly to the Master DSP. To make testing easier, the main board is connected to a subordinate card that carries a socket for the DSP chip under test (the DUT). This ensures that all necessary connections, including the scan chain inputs and outputs, the scan control signals, and the ports required for the at-speed and faster-than-at-speed tests, are wired to the FPGA in accordance with the chip's DFT. Additionally, the board provides a variety of interfaces, such as an SD card slot, a PCIe interface, and a network port, all of which can serve as sources of test vectors.
Figure 10 shows the actual development board we used. We selected a self-developed DSP chip for testing. This DSP has a high level of integration and intricate circuit functionality. It comprises approximately 816 million transistors, over 1.1 million flip-flops, and 72 scan test pins, which are connected to the test channels.

Test Progress
The collaborative design of hardware and software enables flexible testing of the chip. The testing process is the same for different types of test items; the main differences lie in the test vectors and the pins used for testing. Figure 11 shows a complete test and the workflow of the PC, the Master DSP, and the FPGA. Note that when a cross-component operation is required, priority is given to entering the workflow of the other component.

Test Vectors and Results
To verify the accuracy of the system, 10 actual chips were selected and tested. To cover the chip's faults comprehensively, three different sets of test vectors must be generated, based on different fault models. Take the generation of test vectors for the stuck-at fault model as an example: detecting a stuck-at fault requires driving the fault point to the value opposite to the stuck-at value, which activates the fault. Figure 12 illustrates a stuck-at-1 fault; to set the value of the fault point to 0, the input at port A must be 1. For the value of the fault point to propagate to the output port, ports B, C, and D must take specific values. When the input ports ABCD receive the test vector "1011" and the value detected at output port O is 1, the chip has an internal defect. Here, A, B, C, and D represent scan flip-flops in the scan chain. We used professional electronic design automation (EDA) tools to generate test vectors for the different fault models. The specific test vectors and their functions are shown in Table 4.
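The activate-and-propagate reasoning can be checked on a toy circuit. The netlist below is hypothetical and is not the circuit of Figure 12; it is merely one small circuit for which A = 1 drives the fault site to 0 and "1011" is the detecting vector, so the same argument can be executed step by step.

```python
from itertools import product


def circuit(a: int, b: int, c: int, d: int, stuck_at_1: bool = False) -> int:
    """Hypothetical 4-input circuit; node n is the fault site."""
    n = 1 - a          # inverter: A = 1 drives the fault site to 0 (activation)
    if stuck_at_1:
        n = 1          # defect: the node is forced to 1 regardless of A
    m = n | b          # OR gate: B = 0 lets the value of n pass through
    return m & c & d   # AND gate: C = D = 1 lets m reach output O


def detects(vector: str) -> bool:
    """A vector detects the fault if good and faulty outputs differ."""
    bits = [int(ch) for ch in vector]
    return circuit(*bits) != circuit(*bits, stuck_at_1=True)


# Enumerate all 16 input combinations and keep the detecting ones.
detecting = ["".join(v) for v in product("01", repeat=4) if detects("".join(v))]
```

For this toy circuit the fault-free output under "1011" is 0 and the faulty output is 1, so observing 1 at O indicates the defect, in line with the description above.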

The DC test vectors have 10 million bits and the AC test vectors have 40 million bits, providing more than 96% fault coverage across the chip. An amount of 1100 test vectors have 100,000 bits because they only require direct inputs and outputs. The test results are shown in Table 5. In Figure 13, green dots indicate passing tests at a given frequency, while red dots indicate failing tests at that frequency. The fault count is the number of bit positions at which the test response differs from the standard result in a bit-by-bit comparison.

To validate that faster-than-at-speed testing yields the chip's maximum performance, the system verifies the tested DSP chip. The validation criterion is that when a chip fails the faster-than-at-speed test at a certain frequency, its operational limit frequency should be close to that frequency. The process of obtaining the maximum operational frequency at which the chip can function properly is depicted in Figure 14. The limit frequencies obtained by the two methods are shown in Table 6. According to the test results, the maximum frequencies obtained by the two testing methods are consistent, which validates the effectiveness of the system in determining the chip's maximum performance through the faster-than-at-speed test. The designed ATE not only detects chip faults but also helps to select chips of varying performance for different application scenarios.
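The bit-by-bit fault count and the reading of a limit frequency from a sweep can be sketched as follows. The function names and the rule "highest frequency reached before the first failing one" are assumptions made for illustration, and the frequencies in the usage note are invented, not measured results.

```python
from typing import Dict, Optional


def fault_count(response: str, expected: str) -> int:
    """Number of bit positions where the response differs from the standard result."""
    if len(response) != len(expected):
        raise ValueError("response and expected result must have equal length")
    return sum(r != e for r, e in zip(response, expected))


def limit_frequency(sweep: Dict[int, int]) -> Optional[int]:
    """Highest passing frequency below the first failure.

    `sweep` maps a test frequency in MHz to the fault count observed there.
    Returns None if the lowest frequency already fails.
    """
    best = None
    for freq in sorted(sweep):
        if sweep[freq] != 0:
            break
        best = freq
    return best
```

For example, a sweep of `{800: 0, 825: 0, 850: 3, 875: 0}` yields 825 under this rule: the isolated pass at 875 is ignored because the chip has already failed at a lower frequency.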

Performance of the FATE
Increasing the number of test pins broadens the range of chip types that can be supported; however, the number of test pins the system can provide is limited by the FPGA's internal resources. Each test pin requires a FIFO for data buffering, and each FIFO occupies Block RAM within the FPGA, so the maximum number of test pins that FPGA-based automatic test equipment can provide is bounded. Furthermore, because the Master DSP and the FPGA transfer data on a round-robin schedule, the chip under test keeps consuming test vectors and generating test responses while a transfer is in progress, which may cause a test channel's Source FIFO to run empty or its Capture FIFO to overflow. The interface transfer rate between the FPGA and the Master DSP therefore also limits the number of test channels the system can support.
Therefore, to provide more test channels while keeping the tests valid, we carried out a modeling analysis of the system. When the test chip has only one input pin and one output pin, the system model is as shown in Figure 15. The Source FIFO and the Capture FIFO have the same depth, and the rate at which the test chip consumes test vectors equals the rate at which it generates test responses. We use V to represent the depth of the Source FIFO, N to represent the threshold depth, and R to represent the length of the test response generated during data transmission. According to the conditions for effective testing, we can obtain the following:

$$R \le N \tag{1}$$

In the worst situation, where R equals N, the Capture FIFO already holds N bits when the transfer starts and receives a further R bits before it is read, so we have the following:

$$V \ge N + R \tag{2}$$

$$V \ge 2N \tag{3}$$

The FIFO's minimum capacity must therefore be double the threshold. The transfer takes longest when the Master DSP continuously reads out test responses and supplements test vectors of threshold length, and this case can easily lead to an invalid test. When the test chip has $P_{input}$ input pins and $P_{output}$ output pins, the length of data transmitted during this process is $(P_{input} + P_{output}) \times N$, and the time required to complete the transmission is expressed as follows:

$$T = \frac{N \cdot P_{input}}{W_{in}} + \frac{N \cdot P_{output}}{W_{out}} \tag{4}$$

where T represents the transmission time, $W_{in}$ represents the bandwidth at which the Master DSP supplements test vectors, and $W_{out}$ represents the bandwidth for reading out test responses. When the test rate of the test chip is $W_{test}$, the length of the test responses generated during this time is as follows:

$$R = W_{test} \cdot T \tag{5}$$

Once N is determined, by calculating the corresponding R and then referring to Equations (1) and (2), it can be determined whether the system is capable of conducting valid testing.
The system sets the threshold N to 32 Kb. With the SRIO interface configured for a rate of 20 Gbps, a loopback test on the SRIO interface shows that the actual effective data transfer rate is approximately 16 Gbps. The test rate of the test chip is 25 Mbps. Xilinx's XC7K325T chip contains 445 Block RAMs [22] with a capacity of 36 Kb each, of which 400 are used for data buffering of test pins, with the remainder used for debugging and other purposes. Because the system requires two stages of FIFOs for bit-width conversion, each test channel consumes 2.5 Block RAMs. Based on the resources within the FPGA, the maximum number of test pins supported is therefore 160. Both test vectors and test responses are transmitted through the SRIO interface between the FPGA and the Master DSP, so they share the same bandwidth. According to Equation (4), the longest transmission time for a single complete operation is 305 µs. According to Equation (5), the test responses generated in that time amount to approximately 7.8 Kb, which meets the conditions for valid testing.
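As a rough cross-check of this arithmetic, the sketch below plugs round values into Equation (4) and into the relation that the responses generated during a transfer equal the test rate multiplied by the transfer time. The assumptions are that 1 Kb is 1024 bits, that all 160 channels move a full threshold of data, and that both directions run at exactly 16 Gbps; the function names are hypothetical. Under these assumptions the transfer time comes out near 328 µs and the generated responses near 8 Kb, the same order as the reported 305 µs and 7.8 Kb, which presumably rest on the measured rate rather than on round numbers.

```python
KB = 1024  # assumed number of bits per Kb


def max_channels(brams_for_buffering: int, brams_per_channel: float) -> int:
    """Channels that fit into the Block RAMs reserved for data buffering."""
    return int(brams_for_buffering / brams_per_channel)


def transfer_time(n_bits: int, p_in: int, p_out: int,
                  w_in: float, w_out: float) -> float:
    """Equation (4): time to refill p_in and drain p_out channels by n_bits each."""
    return n_bits * p_in / w_in + n_bits * p_out / w_out


def responses_during(t_seconds: float, w_test: float) -> float:
    """Bits produced by the chip under test while the transfer is running."""
    return w_test * t_seconds


channels = max_channels(400, 2.5)   # 160 channels
N = 32 * KB                         # threshold in bits
# With equal bandwidth in both directions the input/output split does not
# change T, so an even split is used here purely for illustration.
T = transfer_time(N, channels // 2, channels // 2, 16e9, 16e9)
R = responses_during(T, 25e6)
valid = R <= N                      # condition of Equation (1)
```

The margin is large: even at the full 160 channels, the responses generated during one transfer are roughly a quarter of the threshold.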
Current FPGA-based test equipment uses a variety of FPGA chips and system architectures. Furthermore, the existing literature lacks detailed design parameters, which makes it difficult to find clear standards for comparison with the work presented in this paper. Therefore, this paper selects four parameters commonly mentioned in the literature for comparison: test channels, test functionality, supported test vector length, and the types of chips tested. Papers that do not provide detailed parameters on these aspects are not used for comparison. The results of the comparison with other works are shown in Table 7. Table 8 shows a detailed comparison with our prior work [19], including experimental details. Table 8 indicates that, with a 37.5% reduction in the resources used per test channel, the number of supported test channels in this work has increased to four times that of the previous work. Furthermore, the average transmission rate between the FPGA and the Master DSP has increased to approximately 18 times that of the previous work. Although a significant increase in transmission rate can reduce the test duration to some extent, the overall time required to complete the test is constrained by the test frequency of the chip itself. However, the

Figure 1. Structure of the scan chain.

Figure 4. Block diagram of the SRIO2FIFO module.

Figure 5. The simulated waveform graph of writing test vectors to the test channel.

3.2.4. Clock Management Module
The clock management module is responsible for generating the clocks required for the test process. Different chips have different requirements for the test clock. In this system, the main clock of the FPGA is 25 MHz, and the test clock required by the test chip, DFT_TEST_CLK, is also 25 MHz with a duty cycle of 25%. The test vectors in the test

Figure 6. Relationship between clock and output.

Electronics 2024, 13, x FOR PEER REVIEW

Figure 8. The PLL generates clocks at different frequencies.

Algorithm 1. Round-robin scheduling algorithm.
Input: number of input pins X, number of output pins Y, threshold value Q, depth of the FIFO H, length of the test vector L, length of completed transmission of test vectors P, threshold flags T.
Initial: P = 0.

Figure 9. The state machine of the Master DSP.

Figure 11. Test flow of automatic test equipment.

Figure 13. The screenshot of the system showing the chip performance test result.

Figure 14. Procedure for obtaining the limit frequency of a chip.

Figure 15. Simple system model.

The conditions for the system to conduct valid tests are as follows:
• Before the test vectors in the Source FIFO are completely consumed, new test vectors can be supplied for the next round;
• Before the Capture FIFO overflows, the Master DSP can read the test responses from the FIFO;
• During the data transmission between the Master DSP and FPGA, both the test vectors consumed by the test chip and the test responses generated do not exceed the set threshold depth.

Table 1. Comparison of three interfaces.

Table 2. Port descriptions of the SRIO2FIFO module.

Table 3. Description of pin assignment.

Table 4. Functional description table for test vectors.

Table 6. Test results of the two methods.

Table 7. Comparison with other works.

Table 8. Comparison with our previous work.