A Systematic Method to Generate Effective STLs for the In-Field Test of CAN Bus Controllers
Round 1
Reviewer 1 Report
The paper describes a method to generate STLs for in-field test of CAN bus. The paper is extremely well written and the experimental results are appropriate. I have just one comment:
Can you provide a comparison of your approach with state-of-the-art STL generation methodologies?
Author Response
Thank you very much for your question, which really helps put our work into perspective with respect to state-of-the-art methodologies.
To take it into account, the following paragraph has been added to the article (section V, line 610).
“The approach presented in this article is a mix of manual STL development, based on ad hoc methods targeting specific modules within the CAN Controller, and ATPG-based solutions for the incoming message filter. For some modules (e.g., the Register Interface), we described in the paper a new method to test them. The results are comparable to those typically achieved on processors and simpler peripherals, with a fault coverage typically above 80%, proving the effectiveness of the approach [30,36,37].”
Reviewer 2 Report
The paper is devoted to the testing of a CAN bus controller. A purely software-based testing procedure is used that can be incorporated into the normal operation of the system, provided the given requirements on the controllers and the implemented software are met. The proposed testing methodology is very useful for in-field testing without the need for additional test equipment.
The authors were able to improve the number of detected faults by using interesting test development techniques. A model of the controller is used; however, the paper does not state clearly what type of model it is. Based on the results presented in Table 1 (simulation time), I assume it is one of the HDL models.
Several questions arise concerning functional coverage and the methodology employed for its measurement.
1. Clearly describe your approach for the generation of a test program using the CAN bus controller HDL model. In order to be able to model faults, this must be at least a model utilizing mapped primitive components (post-synthesis using a technology library, or post-implementation).
2. The fault injection model is not described. At least a brief reference to the generation of faults in the model is required.
3. The test pattern generation is based on the model. Is backtracing employed for building a test sequence? How is the test sequence developed?
4. There are faults that result in a response failure. How does one cope with writing a test where no response arrives due to a fault in the system? An infinite wait (deadlock) appears in the testing procedure. An explanation is required for such faults from the point of view of the embedded test.
The formal description of the test sets requires attention. A commonly used naming convention reserves capital letters for sets, while items of sets are marked using lowercase letters. In the case of a test set, the formal concept should be used as follows:
1. The test set T = {t0, t1, ..., tn}.
2. The test ti is a tuple ti = (CFGi, Xi, Yi, si).
3. Here, CFGi is the set of configuration vectors/words, Xi is the set of input (data) vectors, and Yi denotes the set of valid responses, which can be replaced by the signature si calculated for test ti.
4. When using a signature, it is essential to show/briefly explain the test compactor architecture (possibly an LFSR or another means of computing a unique abbreviation of a data sequence) and to determine possible fault masking by the signature calculation process.
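The convention above could be typeset, for instance, as follows (a sketch mirroring items 1–3; the symbols are exactly those proposed in the review):

```latex
% Uppercase letters denote sets, lowercase letters their elements:
T = \{\, t_0, t_1, \ldots, t_n \,\}
% Each test t_i is a tuple of configuration vectors/words CFG_i,
% input (data) vectors X_i, and valid responses Y_i, which may be
% replaced by the signature s_i computed for test t_i:
t_i = \left( \mathit{CFG}_i,\; X_i,\; Y_i,\; s_i \right)
```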
There are other technical and formal remarks in the attached manuscript; please check it for comments. I hope this helps you improve the presentation of your ideas.
Comments for author File: Comments.pdf
Author Response
Thank you very much for your detailed comments and reviews, which were extremely helpful in strengthening the method and results description sections.
Starting with the attached manuscript you provided, we have incorporated your observations in the introduction section. We clarified the statement on DfT power consumption: “Testing a device through DfT means requires paying special attention to the power consumed while testing the module, as it may exceed that of functional operations” (section 1, line 48); the statement on the usage of SBST solutions in the literature and by companies: “Developing SBST solutions for CPUs [5–8] and peripherals [9–13] is a well-known topic in literature. In addition to that, several companies currently provide STLs for their own products [14–20]” (section 1, line 62); and the test of the modules’ wiring: “This latter configuration is also relevant to test the wiring of modules. Such wirings might be affected, in the case of industrial installations, by problems related to normal mechanical operating conditions, e.g., vibration or electromagnetic noise” (section 1, line 75).
Moving on to the background, we rewrote some portions that were not so clear, better describing our earlier article and remarking how this one aims at improving it: “The systematic approach proposed by our group [10], finally, introduces a systematic methodology to test CAN controllers on modern SoCs. The presented approach relies on SBST means, showing how to build STLs capable of testing all different functional modes on a generic CAN bus peripheral implementation. The test procedure described in [10] consists of an early on-chip stage and system-based communication. Moreover, with such a methodology, diagnosis across multiple nodes is possible. Such a test procedure requires the presence of two nodes, both being able to transmit and receive messages. Experimental results show that the methodology presented in [10] is capable of reaching high fault coverages, but it does not consider constraints coming from the application run by the system into which the devices are embedded. In this article we tackle this shortcoming by proposing a testing methodology based on two different configurations: either the device under test performs a self-testing routine, thus not affecting the other nodes on the CAN bus in any way, or the whole network goes into test mode, with some nodes actively partaking in the testing routine while others do not interact” (section 2, line 237).
In the STL development strategies section, besides addressing the questions you posed, we have reworked the algorithms for the CAN submodules, improving their formal description and clarifying details of the FIFO testing routine. In section 3.1, line 322, we added the following paragraph:
“FIFOs can have different implementations, typically in the form of circular buffers or shift registers with pointers to the head and tail of the memory that are updated anytime a read or write operation is performed. Such pointers are used to avoid overflow and underflow conditions and can be accompanied by almost full and almost empty signals to better coordinate memory access. There are, however, some details and issues that are implementation dependent, e.g., the FIFO fill level at which the almost empty/full flags rise. For this reason, specific solutions to such implementation-dependent problems can easily be added to our test routine.”
The FIFO algorithm has been edited, adding details on the empty condition too, and another paragraph has been added in section 3.1, line 335:
“The test consists of two bulks of write and read operations. First, we fill the FIFO with one half of the checkerboard patterns, generating an overrun condition corresponding to a full FIFO; we then completely empty the memory, checking its empty flag. Next, once the full and empty flags have been tested, we proceed with the remaining half of the checkerboard patterns. If the almost full and almost empty conditions need to be tested, the test engineer can interrupt the bulk writing/reading operations to check the corresponding flags, resuming them afterwards.”
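As an illustration of the routine quoted above, the following is a minimal sketch in C. The software FIFO model, DEPTH, the flag names, and the 0xAA/0x55 checkerboard values are illustrative assumptions standing in for the memory-mapped peripheral, not the actual device registers.

```c
#include <stdint.h>
#include <stdbool.h>

#define DEPTH 8  /* illustrative FIFO depth */

/* Stand-in for the peripheral FIFO and its status flags. */
typedef struct {
    uint8_t mem[DEPTH];
    int head, tail, count;
    bool overrun;          /* raised when writing into a full FIFO */
} fifo_t;

static bool fifo_full(const fifo_t *f)  { return f->count == DEPTH; }
static bool fifo_empty(const fifo_t *f) { return f->count == 0; }

static void fifo_write(fifo_t *f, uint8_t v) {
    if (fifo_full(f)) { f->overrun = true; return; }
    f->mem[f->tail] = v;
    f->tail = (f->tail + 1) % DEPTH;
    f->count++;
}

static uint8_t fifo_read(fifo_t *f) {
    uint8_t v = f->mem[f->head];
    f->head = (f->head + 1) % DEPTH;
    f->count--;
    return v;
}

/* Two bulks of writes/reads: each bulk fills the FIFO with one half of
 * the checkerboard patterns (one extra write forces the overrun/full
 * condition), then drains it completely and checks the empty flag. */
static bool fifo_checkerboard_test(fifo_t *f) {
    for (int half = 0; half < 2; half++) {
        for (int i = 0; i <= DEPTH; i++)   /* i == DEPTH -> overrun */
            fifo_write(f, (uint8_t)(((i + half) & 1) ? 0x55 : 0xAA));
        if (!fifo_full(f) || !f->overrun)
            return false;
        f->overrun = false;
        for (int i = 0; i < DEPTH; i++)    /* drain and verify data */
            if (fifo_read(f) != (uint8_t)(((i + half) & 1) ? 0x55 : 0xAA))
                return false;
        if (!fifo_empty(f))
            return false;
    }
    return true;
}
```

In an actual STL, the reads/writes would target the peripheral's data register, and the almost full/almost empty checks would be interleaved in the bulk loops as described in the paragraph.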
In the Experimental Results section, we have added more details on the fault injection process we adopted in this article: “All STLs have been evaluated by a fault injection mechanism based on a commercial functional fault simulation tool by Synopsys that runs fault-parallel simulations. We performed the fault simulation on the gate-level netlist of the DUT. Test patterns were obtained while performing a logic simulation of the DUT running the STL, recording all input ports of the DUT at each clock cycle. In the same logic simulation, we also recorded the values at the output ports at each clock cycle, thus generating the set of fault-free responses, also referred to as golden responses, which were then provided to the fault simulation tool” (section 5, line 609).
Other edits and typo fixes have been made as well and are reported in the revised article.
In the following, we report our replies to your questions.
Q: Clearly describe your approach for the generation of a test program using the CAN bus controller HDL model. To be able to model faults, this must be at least modeled utilizing mapped primitive components (post-synthesis using technology library or post-implementation).
A: “The outlined STL strategies are independent of the specific CAN controller implementation and target all stuck-at faults found in the post-synthesis netlist obtained from the CAN HDL description.”
We have added this paragraph in section III, line 250. We then specify how we make use of the post-synthesis model in the fault injection process.
Q: The fault injection model is not described. At least a brief reference to the generation of faults in the model is needed
A: As pinpointed by the reviewer, the fault injection mechanism was not explained thoroughly in the article. Describing the fault injection model is crucial not only for the methodology section, but also when presenting results. For this reason, this answer has been split into two parts.
The first one has been integrated into section III, line 204:
“Once the STL has been generated, a fault injection mechanism is used to estimate its effectiveness. Fault injections can be carried out in several diverse ways, e.g., by using ad-hoc commercial fault simulation tools, logic simulators, or hardware-based fault injection via FPGA. Regardless of the method, test vectors for the fault injection are obtained from the stimuli applied at the primary inputs of the DUT during the execution of the STL. These vectors are then applied to the synthesized netlist of the DUT where stuck-at faults are injected. In this way, it is possible to assess the achieved test coverage.”
The second one refers to the experimental setup we adopted, and is found in section V, line 554:
“All STLs have been evaluated by a fault injection mechanism based on a commercial functional fault simulation tool by Synopsys that runs fault-parallel simulations. We performed the fault simulation on the gate-level netlist of the DUT. Test patterns were obtained while performing a logic simulation of the DUT running the STL, recording all input ports of the DUT at each clock cycle. In the same logic simulation, we also recorded the values at the output ports at each clock cycle, thus generating the set of fault-free responses, also referred to as golden responses, that have been then provided to the fault simulation tool.
Q: The test pattern generation is based on the model. Is back tracing employed for building a test sequence? How is the test sequence developed?
A: This topic is strictly related to the fault injection process, as back tracing depends on how well the STL performs. For this reason, the following comment has been added right after the one on fault injection in section III, line 299.
“Such a process [fault injection] is also used as a means of back tracing during the STL generation process. Results obtained during the fault simulation can be arranged so that details on the submodule coverages are obtained, providing insights into which areas of the peripheral need to be tested better (if the achieved Fault Coverage is not sufficient) once the basic algorithm proposed for each module has been transformed into test code. In this way, we can also easily identify which portions of the STL need to be improved. Refining the STL can be done in an iterative fashion, with as many cycles of code refinement and fault simulation as required until the desired coverage is achieved. If required, test engineers can finally apply post-processing techniques to reduce the final STL size by removing redundant portions of code [31-34].”
Q: There are faults that result in a response failure. How does one cope with writing a test where no response arrives due to a fault in the system? An infinite wait (deadlock) appears in the testing procedure. An explanation is required for such faults from the point of view of the embedded test.
A: Thank you for the observation. In our experience, faults in the CAN Controller leading to a deadlock, or affecting only its timing behavior, are relatively rare. However, to take your comment into account, the following paragraph has been added in section III, line 287.
“It is noted that, if enabled, the CAN peripheral can issue interrupts; hence, the test engineer should consider writing appropriate Interrupt Service Routines to handle such cases. This, however, could also lead to a situation where an interrupt never occurs because of the presence of faults. More generally, deviations from the usual execution of the STL may occur, e.g., deadlocks or exceptions from other modules. Works that tackle these issues can be found in the literature (e.g., [30]), providing techniques that make the STL more robust. As an example, to avoid deadlocks, a watchdog timer that brings the system into a safe state in case of infinite waits can be employed. Since in this work we assumed not to be allowed to modify the hardware or add modules, we did not consider this option, which could clearly increase the achieved Fault Coverage.”
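Under the no-added-hardware assumption stated above, a purely software alternative is to bound every wait in the STL. The sketch below illustrates the idea; the poll callback and the MAX_POLLS budget are hypothetical placeholders for reading a peripheral status flag and for a budget derived from the fault-free worst-case response time.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative wait budget: in a real STL this would be derived from
 * the worst-case response time of the fault-free device. */
#define MAX_POLLS 1000u

typedef bool (*poll_fn)(void);

/* Bounded wait: returns true if the expected event was observed within
 * the budget, false otherwise. A false result lets the STL report a
 * failed test instead of hanging in an infinite wait (deadlock). */
static bool wait_with_timeout(poll_fn poll) {
    for (uint32_t i = 0; i < MAX_POLLS; i++) {
        if (poll())
            return true;   /* expected response arrived */
    }
    return false;          /* budget expired: fault suspected */
}

/* Example polls for demonstration only (hypothetical): */
static uint32_t calls;
static bool poll_ok(void)    { return ++calls >= 3; }  /* event on 3rd poll */
static bool poll_never(void) { return false; }         /* faulty: no event */
```

This keeps the test self-contained in software: a fault that suppresses the response is converted into a detected failure rather than a hang.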
Q: It is essential when using a signature to show/briefly explain the test compactor architecture (an LFSR or other derivative of calculating unique abbreviation for data sequence) and determine any possible fault masking by the signature calculation process.
A: We agree with your comment, and we added the following paragraph to the article (section III, line 277).
“The signature computation process is a crucial step in the execution of an SBST test procedure, and several articles focus on this topic (e.g., [24-26]). Typically, this is performed either by making use of special hardware structures, e.g., MISR modules within the DUT that can be directly accessed and fed with data through code, or by emulating such hardware in software through arithmetic and logic operations, e.g., by means of sums with carry and xor operations. Previous papers [27-29] showed that when dealing with STLs and suitably implementing and operating the software-implemented MISRs, the aliasing probability can be reduced to very low values.”
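The software emulation mentioned in the added paragraph can be sketched as follows. The 16-bit width and the 0x8005 feedback polynomial are illustrative choices for the example, not the configuration used in the article.

```c
#include <stdint.h>

/* One step of a software-emulated 16-bit MISR: XOR the new response
 * word into the state, then advance the underlying LFSR. */
static uint16_t misr_step(uint16_t state, uint16_t data) {
    state ^= data;                          /* inject response word */
    uint16_t msb = state >> 15;             /* bit shifted out */
    state = (uint16_t)(state << 1);
    if (msb)
        state ^= 0x8005;                    /* illustrative feedback taps */
    return state;
}

/* Compress a sequence of test responses into a single signature,
 * which the STL then compares against the precomputed golden value. */
static uint16_t misr_signature(const uint16_t *resp, int n) {
    uint16_t s = 0;
    for (int i = 0; i < n; i++)
        s = misr_step(s, resp[i]);
    return s;
}
```

Since each step is an invertible linear map, any single differing response word yields a different signature; aliasing can only occur when several errors cancel out, which is the fault-masking effect the reviewer asks to assess.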
Round 2
Reviewer 2 Report
After the introduced revisions, the paper addresses all remarks included in the review. The paper now clearly shows the link between test development and the use of the gate-level netlist for analysing fault detection coverage.
The formal side of the test notation has also been improved, which is important.
The presentation of the physical properties of the CAN bus and hardware properties influencing the test construction has been significantly improved.
Minor problems have been observed, such as repeated words in neighbouring statements, e.g., in lines 238/239 "systematic" is used twice. Similarly, lines 518 and 521, where both statements begin with "The adopted CAN".