1. Introduction
As the complexity and scale of today’s System-on-Chip (SoC) designs continue to increase, the time and resources dedicated to functional verification have expanded proportionally. Functional verification ensures that the SoC operates correctly according to its specifications, which is critical in fields such as automotive, aviation, medical, and aerospace [1,2]. This laborious process involves creating a verification plan, developing a verification environment, and implementing tests to achieve maximum coverage. System Verilog, along with the Universal Verification Methodology (UVM), is commonly used to build these environments [3]. Despite the effectiveness of UVM, writing assertions and creating comprehensive test scenarios for large designs can be challenging.
The verification environment is usually written in a hardware verification language, such as System Verilog, coupled with a methodology, the most prevalent being the UVM [4]. The environment, the Design Under Test (DUT), and the test scenarios are compiled and executed using simulators such as Cadence Incisive, Mentor Questa Sim, and the Synopsys Verilog Compiler Simulator. The UVM framework mandates several classes, such as item, sequencer, driver, monitor, model, sequence, and test, each playing an essential role in ensuring thorough verification [5].
Despite the structured approach provided by UVM, the sheer size and complexity of modern designs can make writing and managing assertions particularly challenging. Assertions are code constructs used to verify that the design behaves as expected, making them integral to the verification process. However, creating comprehensive assertions for large designs can be burdensome and prone to human error. This is where the advent of artificial intelligence (AI), particularly Large Language Models (LLMs), such as ChatGPT-4.0, presents an exciting opportunity to revolutionize the verification landscape [6].
This study aims to demonstrate the effectiveness and efficiency of using LLMs in place of the traditional approach, in which an engineer writes the code manually, thereby freeing the engineer’s time for other critical tasks. The research explores the implementation of an Advanced Peripheral Bus (APB) verification environment in System Verilog using UVM, enhanced by ChatGPT, with the goal of streamlining the verification workflow, reducing manual effort, and improving overall efficiency. Specifically, the use of ChatGPT-4.0 for generating System Verilog assertions through text-based prompts and image-fed inputs was investigated, and its impact on the verification process was evaluated. Our findings indicate that integrating AI into verification accelerates the process and enhances the accuracy and reliability of the results, paving the way for more efficient SoC development.
This paper is structured to explore the use of ChatGPT in enhancing the functional verification of SoC designs through System Verilog and UVM. The remainder of the paper is organized as follows:
Section 2 provides a comprehensive discussion of the materials and methods, detailing the implementation of the APB verification environment and the integration of ChatGPT for assertion generation.
Section 3 presents the results obtained from applying this methodology, including the observed improvements in effectiveness and efficiency.
Section 4 discusses the findings in relation to other relevant studies. Finally,
Section 5 concludes the paper with a summary of the findings, functional verification implications, and potential future research directions.
2. Materials and Methods
The verification environment for ensuring a project’s correctness and functionality includes several key classes, each with specific roles and responsibilities. First, the item class represents the foundation of the verification environment by encapsulating the variables that represent the protocol rules; it defines the properties and behaviors of the elements checked in the environment. Second, the sequencer class manages the handshake between the driver and the sequence to ensure a seamless flow of data during verification, which is essential in coordinating the interactions between the various components of the verification environment. Third, the driver class generates and drives the stimuli necessary to exercise the design under test (DUT) and verify its functionality [7].
Four other classes are necessary for the verification system’s operation. The monitor class collects data from the bus and sends it through a UVM analysis port for subsequent evaluation, enabling efficient monitoring and analysis of the data exchanged in the verification environment. Serving as a reference point for comparison, the model class compares the expected data with the actual data; any differences identified during this process trigger a UVM_ERROR, signaling a potential problem in the verification process. The sequence class creates transactions or initiates other necessary transactions as part of the verification process, helping define the sequence of operations and transactions to be executed during verification. The test class instantiates and leverages various sequences and scenarios to execute comprehensive verification processes; it provides the flexibility to override sequences from the base test class, enabling the creation of extensive and specific verification scenarios tailored to the design under test [8,9].
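As an illustration of the item class described above, a minimal APB transaction item might look as follows. This is only a sketch with assumed field names, not the code of the paper’s actual testbench:

```systemverilog
`include "uvm_macros.svh"
import uvm_pkg::*;

// Hypothetical APB sequence item; field names are illustrative.
class apb_item extends uvm_sequence_item;
  rand bit [31:0] paddr;   // transfer address
  rand bit [31:0] pwdata;  // write data
  rand bit        pwrite;  // 1 = write transfer, 0 = read transfer
  bit      [31:0] prdata;  // read data captured by the monitor
  bit             pslverr; // slave error response

  `uvm_object_utils_begin(apb_item)
    `uvm_field_int(paddr,   UVM_ALL_ON)
    `uvm_field_int(pwdata,  UVM_ALL_ON)
    `uvm_field_int(pwrite,  UVM_ALL_ON)
    `uvm_field_int(prdata,  UVM_ALL_ON)
    `uvm_field_int(pslverr, UVM_ALL_ON)
  `uvm_object_utils_end

  function new(string name = "apb_item");
    super.new(name);
  endfunction
endclass
```

The `uvm_field_*` macros give the item automatic print, compare, and copy support, which the model class relies on when comparing expected against actual data.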
Figure 1 illustrates the verification activities: the verification engineer studies the specification, plans and implements the verification environment, reports bugs, and works toward 100% coverage.
Functional verification can be conducted at various levels, each with specific methodologies and objectives. Individual modules are verified at the unit or module level using a verification environment developed in System Verilog with UVM, with verification deemed complete once all regression tests pass and 100% coverage is achieved. Moving to the cluster level, DPI-C (Direct Programming Interface for C) is commonly utilized alongside System Verilog and UVM, allowing the emulation of processor behavior without the actual processor being involved. At the system level, tests are primarily written in C, with some checkers in System Verilog, and include components such as the processor, memories, and peripherals [11]. Verification at this level is considered complete when all regression tests pass, and coverage is often implemented for critical scenarios. Finally, analog models are incorporated at the SoC level, and tests are conducted using C and Assembly languages. Verification at this level is concluded when the SoC regression tests pass, ensuring the comprehensive functionality of the entire chip [12].
AI is being considered as a means to aid the verification process and reduce the time spent on it; it can assist in several ways.
One method involves clustering failing tests according to their specific failures. In coverage-driven constrained random verification with a randomly chosen seed, it is not uncommon to encounter the same RTL bug in multiple tests. This approach can be automated using two distinct methods: rule-based (creating a set of rules corresponding to the features of the failure cause) and model-based (building a model of the system under debugging). Research indicates that the best-performing clustering algorithm for this task was DBSCAN combined with PCA dimensionality reduction [13,14].
Another area where AI can contribute is in stimulus and test generation. By utilizing supervised and reinforcement machine learning algorithms [15], AI can help achieve the planned coverage for the DUT [16]. Recent research combines supervised and unsupervised machine learning algorithms, such as neural networks, random forests, and support vector machines, to reduce the number of test cases required to reach maximum coverage [17,18,19,20]. A supervised Artificial Neural Network (ANN) combined with Cocotb as a platform has demonstrated improved iterations to achieve maximum coverage [21].
AI also has potential in document classification. It can assist in classifying protocol specifications to highlight the critical features for implementation versus those that may never be used or will not be available to clients in future releases. This application of AI can streamline the development process by focusing effort on the most relevant aspects of the documentation [22].
Generative AI, a branch of AI, has many applications, from image creation to natural language processing and music generation [23], and it can also generate different testbench classes. Interest in using Generative AI for educational purposes has surged recently, particularly following the introduction of ChatGPT-3.0 in November 2022 [24]. LLMs are being considered for completing various laborious tasks. Notably, an entire chip design flow was produced using ChatGPT, marking a significant milestone: the first chip written with an LLM was designed and taped out [25].
Lastly, AI can aid in assertion generation for various protocols, such as the APB and the Advanced High-Performance Bus (AHB), using LLMs such as ChatGPT [26]. In System Verilog, there are two types of assertions: immediate and concurrent. Immediate assertions function like if–else statements, while concurrent assertions are triggered by an event and remain active throughout the simulation [27]. These assertions can also be integrated into other verification environments, highlighting the importance of writing assertion code effectively.
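The distinction can be illustrated with a short sketch; the signal names below are illustrative and do not come from any particular testbench:

```systemverilog
// Immediate assertion: evaluated procedurally, like an if-else check,
// only at the moment the enclosing statement executes.
always @(posedge pclk)
  if (psel && penable)
    assert (!$isunknown(paddr))
      else $error("PADDR has unknown bits during the access phase");

// Concurrent assertion: sampled on every clock edge for the whole
// simulation, expressing a temporal protocol rule.
property p_penable_follows_psel;
  @(posedge pclk) disable iff (!presetn)
    $rose(psel) |=> penable;
endproperty
a_penable_follows_psel: assert property (p_penable_follows_psel)
  else $error("PENABLE did not assert one cycle after PSEL rose");
```

The `|=>` operator defers the consequent by one clock cycle, which is how the setup-to-access phase transition of a bus protocol is typically encoded.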
This research used a Large Language Model, ChatGPT-4.0, to generate protocol assertions for integration into an existing System Verilog Advanced Peripheral Bus protocol verification environment. These assertions were generated by directly asking the model or by providing it with an image. The methodology used for this verification process is the UVM, currently the most widely used in the industry. The verification environment developed for this study comprises several essential classes: APB Sequence Item (or APB Transaction), APB Driver, APB Sequencer, APB Monitor, APB Agent, APB Environment, APB Sequence, and APB Core Test [28]. These classes work together to ensure comprehensive and effective verification of the APB protocol, with the generated assertions increasing the robustness and efficiency of the verification process.
Figure 2 represents a basic APB-UVM environment without the scoreboard class.
In this environment, assertions were added to further increase the verification quality. Due to their greater complexity and usefulness for the testbench, only concurrent assertions were introduced. Concurrent assertions are pieces of code that run for the whole simulation, similar to a while (1) loop or a forever construct. Some protocol rules can be implemented using these assertions, vastly simplifying the signal-checking portion of the verification environment. If written correctly, concurrent assertions can be used at the unit, system, and even SoC levels. They are therefore an essential part of the testbench itself.
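One common way to achieve this reuse across levels is to package the assertions in a checker module and bind it to the DUT. The sketch below assumes a hypothetical DUT module name (`apb_slave`) and port names:

```systemverilog
// Hypothetical checker module; the DUT name "apb_slave" and the port
// names are assumptions, not taken from the paper's testbench.
module apb_sva_checker (input logic pclk, presetn, psel, penable);
  a_psel_during_access: assert property (
    @(posedge pclk) disable iff (!presetn) penable |-> psel)
    else $error("PENABLE high without PSEL");
endmodule

// bind attaches the checker to every instance of the DUT, so the same
// assertions run unchanged at the unit, system, and SoC levels.
bind apb_slave apb_sva_checker u_apb_sva (
  .pclk(PCLK), .presetn(PRESETN), .psel(PSEL), .penable(PENABLE));
```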
In the standard procedure, the verification engineer thoroughly studies the protocol rules and waveforms defined in the specification and implements the needed concurrent assertions, often manually. Depending on the complexity of the protocol, it usually takes several hours to implement and debug the assertions when this is performed manually and correctly. In some cases, assertions are written toward the end of the verification environment implementation, making the code more susceptible to human error due to project time constraints and engineer workload. This approach is often time-consuming and laborious, making it a strong candidate for automation wherever possible.
This study explores two approaches for generating assertions with the LLM: a text-prompt approach and a waveform-image-based approach. The specific waveforms used in this study are available online and are part of the Advanced Microcontroller Bus Architecture (AMBA) and Universal Serial Bus (USB) specifications.
To generate the results in this study, ChatGPT was prompted to produce System Verilog assertions for an APB-UVM testbench.
Figure 3 shows how the LLM was asked to provide the results. In this case, the Advanced Microcontroller Bus Architecture (AMBA) Advanced Peripheral Bus (APB) documentation is available online, so the LLM was asked to generate a specific assertion regarding the PENABLE, PSEL, PREADY, and PSLVERR signals. The assertion must take the protocol rules into consideration, meaning in this case that PSLVERR is only valid during the last phase of an APB transfer, either write or read, when PENABLE, PSEL, and PREADY are all asserted.
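A concurrent assertion capturing this rule might look like the following sketch; the signal and property names are illustrative and would need to match the testbench interface:

```systemverilog
// PSLVERR is only meaningful in the final phase of a transfer, when
// PSEL, PENABLE, and PREADY are all asserted.
property p_pslverr_only_at_transfer_end;
  @(posedge pclk) disable iff (!presetn)
    pslverr |-> (psel && penable && pready);
endproperty
a_pslverr: assert property (p_pslverr_only_at_transfer_end)
  else $error("PSLVERR asserted outside the final transfer phase");
```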
In this approach, it is necessary to analyze the protocol and specify precisely which assertion is needed. Here, the assertion must be written for the PSLVERR signal, which is an error signal, not for PREADY, PENABLE, or any other signal mentioned in the text-based prompt. If the request is unclear, as in Figure 4, the resulting code is unpredictable, as the LLM implements whatever it infers might be needed. Extra care must therefore be taken to be very specific when requesting parts of the code; otherwise, the LLM will implement something different from what the engineer initially envisioned, and more work is required to fix it.
The text-based prompt approach is very practical, but the prompt must be detailed for the LLM to generate the necessary code correctly. In the APB verification environment developed by the authors, the speed-up was considerable. The engineer would otherwise have to write various assertions for most of the approximately ten signals usually used in protocol implementations; for an engineer with nine years of experience who does not focus exclusively on System Verilog assertions, this would take approximately four hours to implement and debug. Using the LLM with the text-based approach reduced the time spent on assertion writing and debugging to around two and a half hours (a 37.5% speed-up), with most of that time spent on debugging. This speed-up is even more valuable in larger and more complex SoC projects, as the time saved can be dedicated to test writing, coverage closure, and reviews.
Another way of using the LLM is to provide a protocol waveform to let ChatGPT interpret the waveform rules and provide the needed assertions.
Figure 5 presents the write transfer waveform of the APB protocol that was fed into the LLM, with ChatGPT asked to output the assertions without the protocol or its rules being defined. The LLM’s task was to interpret the waveform and generate the assertions according to the rules implied by it.
In this approach, being fed only the waveform image of the protocol, the LLM must write all the central assertions for the write transfer on its own. At first glance, this approach might seem less error-prone, since the text approach requires the engineer to be very specific about what is needed. Here, the LLM can suggest any assertion code, but the engineer must supervise it carefully and assess which parts are necessary for the respective verification environment. In this case, the speed-up from the waveform-based approach is higher, at 75% compared with writing the assertions manually, as the generated assertions are simply dumped and integrated into the verification environment. In large SoC projects, where time is critical because simulations last days and sometimes weeks, the speed-up provided by this waveform image approach is valuable, as the engineer can focus on other project issues and implementations.
Using these two approaches, the results highlight how quickly an LLM can generate critical code, significantly reducing the time spent on verification. Of course, the verification engineer has to supervise the LLM and systematically analyze whether the model’s output is correct. If errors are found, the engineer must correct the code manually or ask the LLM to regenerate it.
The verification environment and the Register Transfer Level (RTL) implementation were then developed. These two components were run on an open-access site using the Cadence Incisive simulator.
3. Results
Using the first approach, the LLM was asked via text to provide assertions for different signals of the APB protocol. The output of the LLM for an assertion with the condition ‘PENABLE signal not active during reset’ is shown in Figure 6.
With this text-prompt-based approach, no corrections to the assertion were necessary, as the researcher was very specific about the goal and the expected output. The LLM-generated code was introduced in the testbench’s top.sv file, where the top module resides, for easier integration. The verification engineer corrected the signal names to match the interface used in the testbench. ChatGPT-4.0 generated the code in Figure 7 regarding the PENABLE and PRESETN signals.
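The exact code in Figure 7 is not reproduced here, but an assertion for this rule might resemble the following sketch (illustrative signal names, active-low reset assumed):

```systemverilog
// PENABLE must remain low while the active-low reset is asserted.
property p_penable_inactive_in_reset;
  @(posedge pclk) !presetn |-> !penable;
endproperty
a_penable_reset: assert property (p_penable_inactive_in_reset)
  else $error("PENABLE active during reset");
```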
A test containing different write transfers was run with the code in Figure 7 active. The test failed because the ChatGPT-generated assertion was triggered: the PENABLE signal was high during reset, which is incorrect according to the protocol rules. This reinforces the positive result achieved by the LLM and validates its correct understanding of the APB protocol. Figure 8 presents the failure log of the respective test, where it can be clearly seen that the ChatGPT-4.0 assertion correctly indicated the error in the test.
Figure 9 shows a timing diagram from a typical System Verilog simulation developed in the context of UVM-based functional verification. A key observation is that the PENABLE signal was raised during a reset condition, which typically raises concerns in functional verification scenarios.
When the LLM was asked via a text prompt for basic assertions for the AMBA APB protocol, its output was as presented in Figure 10. The figure shows the LLM providing the assertions with explanations for each point and carefully structured code.
The assertions provided by the LLM were integrated into the APB verification environment.
Figure 11 presents all the basic AMBA APB System Verilog assertions integrated in the top.sv file. The LLM suggested five basic assertions: PSEL should be high when a transfer is in progress, PENABLE should be asserted only after PSEL is asserted, PADDR should be stable when PSEL and PENABLE are high, PWRITE should be stable when PSEL and PENABLE are high, and PWDATA should be stable during the enable phase of the transfer.
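The five rules could be expressed as concurrent assertions along the following lines. This is a sketch with illustrative signal names, not the code from Figure 11:

```systemverilog
// 1. PSEL must be high while a transfer is in progress (PENABLE high).
a_psel_active: assert property (@(posedge pclk) disable iff (!presetn)
  penable |-> psel);

// 2. PENABLE may only assert in the cycle after PSEL has asserted.
a_penable_after_psel: assert property (@(posedge pclk) disable iff (!presetn)
  $rose(penable) |-> $past(psel));

// 3. PADDR must be stable while PSEL and PENABLE are high.
a_paddr_stable: assert property (@(posedge pclk) disable iff (!presetn)
  (psel && penable) |-> $stable(paddr));

// 4. PWRITE must be stable while PSEL and PENABLE are high.
a_pwrite_stable: assert property (@(posedge pclk) disable iff (!presetn)
  (psel && penable) |-> $stable(pwrite));

// 5. PWDATA must be stable during the enable phase of a write transfer.
a_pwdata_stable: assert property (@(posedge pclk) disable iff (!presetn)
  (psel && penable && pwrite) |-> $stable(pwdata));
```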
The signal names PCLK, PENABLE, etc., were renamed to clk, vif.penable, etc., to resolve the simulation issues; this corrected the variables so that they point to the testbench’s APB virtual interface. The test was then run, with errors triggered by the assertions suggested by ChatGPT-4.0.
Despite the good results obtained and promising future industry implementations, as mentioned in the previous section, the engineer must oversee the assertion generation closely. If the request is formulated incorrectly with respect to the protocol rules, as in Figure 12 (with the resulting incorrect code in Figure 13), the LLM cannot detect and correct the error introduced by the request.
The LLM is more prone to errors stemming from the request than from the generated code or its syntax. Thus, the engineer must thoroughly supervise the process and assess which assertions were requested and generated correctly.
In this paper’s second approach, the LLM was fed an AMBA APB write transfer image and an AMBA APB read transfer image. No further details were added, just the images themselves and the request to generate System Verilog assertions.
Figure 14a,b show how the LLM was asked to provide the code and the images fed into it.
For the write transfer, ChatGPT-4.0 generated System Verilog assertions that check the functionality of PADDR (should be valid when PSEL is high), PWRITE (should be set to 1'b1), PENABLE (should assert after PSEL), PWDATA (should be valid when PENABLE is high), and PREADY (should be high after PENABLE is high). Similar assertion ideas apply to the read transfer, the only difference being that PWRITE should be low, indicating a read transfer. ChatGPT correctly generated the code for the assertions, although a few modifications are sometimes needed, mostly for integration into the existing testbench.
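Two of these write-transfer checks could be sketched as follows; the signal names are illustrative, and the unbounded PREADY check is a simplification of a real completion rule:

```systemverilog
// During the access phase of a write, PWRITE must be high and the
// address and write data must carry known (non-X/Z) values.
a_write_signals_known: assert property (
  @(posedge pclk) disable iff (!presetn)
  (psel && penable && pwrite) |-> !$isunknown({paddr, pwdata}));

// After the access phase begins, the slave must eventually assert PREADY.
a_pready_returns: assert property (
  @(posedge pclk) disable iff (!presetn)
  $rose(penable) |-> ##[0:$] pready);
```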
While these two approaches generate fast and quite accurate results, the verification engineer must supervise the model’s output and correct it where needed. Regardless of how the LLM is asked for the assertions, it can sometimes provide wrong results, or some of them may need further adjustment. For instance, when asked to provide a System Verilog assertion for the USB protocol, ChatGPT-4.0 produced results that were more suggestive than directly integrable into the verification environment. For simple protocols or waveforms of moderate complexity, however, ChatGPT is very effective; the results presented above show that the assertions are reliable and easy to integrate. In Figure 15, the LLM was asked for a USB protocol assertion using a text prompt. The result was more of a suggestion and may serve as a starting point for the implementation needed in the testbench; here, the LLM acts more as a co-pilot.
The verification engineer can rely on the LLM to generate the protocol assertions using both approaches. Nonetheless, it is the verification engineer’s task to ensure that the assertions are correct and respect the protocol rules; the engineer’s role shifts toward supervising the LLM and integrating its output into the testbench. Using these two approaches, the text prompt (asking ChatGPT-4.0 for what is needed while specifying the language, methodology, and protocol) and the image prompt (providing the LLM with just the waveforms and asking for the output), the time spent on functional verification was reduced by approximately 37.5% for the text-based prompt approach and by 75% for the waveform image-based approach (the latter by not implementing a dedicated scoreboarding class and relying only on assertions).
5. Conclusions
In the growing semiconductor industry, the functional verification process is critical. In fields such as medicine, automotive, and aviation, having a SoC that functions according to the specification is essential. As the time spent on functional verification increases each year, so does the need for various solutions, chief among them AI. In functional verification, the engineer is tasked with writing the verification plan, architecting the testbench, implementing the testbench and the defined tests, debugging regression failures, and achieving 100% functional coverage. The task is thus laborious, complex, and time-consuming. Even for small testbenches, all these steps take considerable time to complete correctly and efficiently, ensuring that the DUT behaves as specified in the documentation.
There are several ways in which AI can improve the process of functional verification:
AI may be used to cluster the regression failure errors, thus enabling easier debugging.
AI can dynamically change the stimuli from the test to achieve a high coverage percentage when running tests. This aids the verification task because the process is completed when 100% coverage is hit.
Classification of features from the documentation, highlighting the critical functionalities so that features and information considered redundant or unused in the chip can be dropped.
Testbench class generation using Generative AI, which may be able to successfully generate testbench components from scratch for various protocols.
Generating assertions using Generative AI is an essential aspect of System Verilog UVM verification testbenches. Assertions can be immediate or concurrent, with the latter being more crucial, as they validate the protocol’s or DUT’s behavior as described in the documentation. Concurrent assertions can enhance code readability and reduce the need for extensive checking code in a specific class. Moreover, these assertions can be seamlessly integrated into a higher-level SoC environment, facilitating reusability.
This paper discussed the impact of using ChatGPT-4.0 to generate System Verilog assertions and integrate them into an existing APB-UVM testbench. These assertions were generated using two approaches: a text prompt approach (where the LLM was told specifically which language, methodology, and kinds of assertions were needed) and an image approach, where the LLM was only fed the image, with no other knowledge provided. Both approaches produced good results, finding issues in the testbench and in how some signals were implemented. Using these assertions, there was no need for a scoreboard class, thus simplifying the environment. By generating assertions, the time spent on functional verification was reduced by about 37.5% (text-based approach) and 75% (waveform image-based approach) for the APB-UVM testbench presented in this paper, by not implementing a dedicated scoreboard class and relying only on assertions. The environment was run on a free platform, which made it harder to develop than in a dedicated simulation environment. Considering these findings, Generative AI is a strong candidate for greatly reducing the time spent on functional verification through assertion generation.
This research is particularly beneficial to verification engineers, researchers, and developers in the semiconductor industry, especially those working on complex SoC designs. By integrating AI-driven tools, such as ChatGPT-4.0, into the verification process, these professionals can significantly reduce the time and effort required to generate protocol assertions, thereby enhancing efficiency and accuracy. Additionally, academic researchers exploring the intersection of AI and hardware verification will find valuable insights and practical applications in our methodology, fostering further innovation and development in automated verification techniques.
Despite the promising results, this research also has some limitations. A significant constraint was the dependence on the accuracy and reliability of ChatGPT-4.0. While the model has demonstrated the ability to generate useful protocol assertions, it occasionally produces incorrect or suboptimal code that requires manual review and correction by verification engineers. Additionally, the study was limited to the APB protocol and may not generalize to other protocols or verification environments without substantial adjustments. The research also relied on open-access platforms for simulation, which may not offer the same level of performance or features as industry-standard tools, potentially affecting the robustness of the verification process. Moreover, while AI can significantly reduce the time spent on verification tasks, it does not eliminate the need for skilled engineers to oversee and validate AI-generated results, ensuring that they adhere to specified protocols and standards. These limitations highlight the need for further research to refine the integration of AI into verification processes and expand its applicability across protocols and environments.
Future research directions should focus on developing more sophisticated algorithms and training models on various protocols to improve generalization. Also, future research could explore integrating AI with industry-standard simulation tools to ensure robust performance and compatibility. Finally, investigating the use of AI in the verification of complex protocols and SoC designs, along with the creation of standardized benchmarks for evaluating the effectiveness of AI in functional verification, would provide valuable insights and drive progress in this field.