Impact of Fabrication Defects on FPGA Logic Using Memristor-Based Memory Cells
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript presents a detailed study on the impact of fabrication-induced memristor defects on FPGA logic mapping, evaluating various LUT architectures and proposing defect-aware synthesis enhancements. The topic is relevant and timely, especially as memristor-based FPGAs gain attention. The experimental setup is well-constructed and the integration with VTR/VPR is technically valuable.
However, several important issues must be addressed before publication.
- Please discuss the impact of realistic defect densities (e.g., 100–500 DPM random defects) on final device-level yield across different FPGA architectures.
  While the manuscript provides results for defect percentages, it does not analyze how wafer-level defect density translates into LUT failure probability and system-level mapping yield. Adding a quantitative relationship (e.g., yield curves, tables, or sensitivity analysis) will substantially improve the practical value of the work.
- Although 6-input LUTs exhibit lower defect tolerance, their higher sensitivity may actually make them useful as screening structures for BEOL memristor process monitoring.
  Please discuss whether highly defect-sensitive LUTs could serve as sentinel circuits or process monitors to detect contamination, forming variability, or cluster defects during early process development.
- Please expand the discussion on reliability aspects beyond manufacturing defects.
  Topics that deserve inclusion:
  - endurance and retention degradation of memristors,
  - drift and variability during operation,
  - the impact of reliability failures on mapping hardness,
  - implications for standard memory cell, or other various applications (e.g., short-/long-term memory behavior, PIM or neuromorphic functions).
  Including these points will significantly strengthen the completeness of the analysis.
- Please provide a benchmark study in this work as compared to other research works.
Due to the above comments, this referee would like to put the manuscript status as "Major Revision" in the current phase.
Author Response
Thank you for your review of our work. We appreciate the constructive comments and suggestions and have updated our manuscript to reflect them. All changes are highlighted in blue in the revised version of the work. Please find our responses to each comment below.
Comment 1:
Please discuss the impact of realistic defect densities (e.g., 100–500 DPM random defects) on final device-level yield across different FPGA architectures. While the manuscript provides results for defect percentages, it does not analyze how wafer-level defect density translates into LUT failure probability and system-level mapping yield. Adding a quantitative relationship (e.g., yield curves, tables, or sensitivity analysis) will substantially improve the practical value of the work.
Response 1:
Thank you for making this point. Many defect-mitigation techniques from the literature assume failure rates to be much lower (e.g. due to mature fabrication processes). The presented approach for defect-tolerant mapping is explicitly designed for large memory cell error rates. The comparison with other defect-tolerant mapping approaches shows that the presented approach excels at handling defect rates in the 1% to 5% range.
To allow a comparison with other approaches, we have extended our manuscript with data representing error rates between 0.01% and 0.05%, reflecting your suggested defect rates. This discussion was added to Sections 3.2 and 3.6 (pages 12-13 and 17-18). In this regime, the presented approach performs comparably to existing defect mitigation techniques, partially due to the overprovisioning of the FPGA resources allowing a certain headroom already.
Defining yield metrics for defect-affected FPGA chips, on the other hand, is not straightforward, since the hardware itself is reconfigurable. A defect that would render a classical chip unusable (e.g., a defect in the multiplier of an ALU) most likely does not eliminate the entire functionality of the affected element on an FPGA. Logic elements on the FPGA can tolerate a certain number of errors. The allowed number of errors depends on the concrete functionality that should be realized on the chip. The static definition of yield must therefore be replaced with a case-by-case study of whether the given functionality can be realized on the given chip.
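The case-by-case view described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every defective memory cell is stuck at a fixed value and that a function fits a LUT if its truth table agrees with every stuck cell. The names `fits_lut` and `defects` are hypothetical and are not taken from the manuscript's framework.

```python
from typing import Dict, Sequence


def fits_lut(truth_table: Sequence[int], defects: Dict[int, int]) -> bool:
    """Return True if the function can be stored despite stuck cells.

    truth_table: desired output bit for each LUT address (length 2**k).
    defects: hypothetical defect map, address -> stuck value (0 or 1).
    """
    return all(truth_table[addr] == stuck for addr, stuck in defects.items())


# 2-input LUT with the cell at address 3 stuck at 1 (assumed defect).
defects = {3: 1}
and2 = [0, 0, 0, 1]  # AND needs a 1 at address 3, so the defect is harmless
xor2 = [0, 1, 1, 0]  # XOR needs a 0 at address 3, so it cannot be stored
print(fits_lut(and2, defects), fits_lut(xor2, defects))
```

The same defective LUT is therefore usable for one function and unusable for another, which is why a single static yield figure does not capture the situation.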
The translation between wafer-level defect rates and LUT failure rates likewise isn’t trivial, since it is highly dependent on the actual defect pattern and device architecture. For this work, reasonable assumptions for the defect patterns (random and particle contamination patches) were made. However, since every fabrication process is different, the underlying software framework allows for arbitrary defect pattern generation.
Additionally, we investigated how much the functionality of a LUT is impaired when a given number of defective memory cells is present. This depends on the size of the LUT and can only be understood as a statistical value, since a given error configuration can allow some functions to be mapped successfully while others cannot. For a 6-input LUT, every additional error reduces the share of functions that can be mapped successfully by 7 percentage points on average. However, the average number of defective memory cells per LUT that contains at least one error depends on the error distribution. For the clustered distribution, the number of LUTs containing no errors is larger; however, the LUTs that are affected contain significantly more errors.
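As a rough illustration of how a per-cell defect rate relates to the share of affected LUTs, the sketch below assumes independent, uniformly random cell defects (the clustered case discussed above would not follow this formula). A k-input LUT has 2^k memory cells, so the probability of at least one defective cell is 1 - (1 - p)^(2^k). All function names are hypothetical.

```python
def dpm_to_rate(dpm: float) -> float:
    """Convert defects per million (DPM) to a per-cell defect probability."""
    return dpm / 1_000_000


def p_lut_affected(p_cell: float, k: int) -> float:
    """Probability that a k-input LUT has at least one defective cell,
    assuming independent defects with probability p_cell per cell."""
    cells = 2 ** k
    return 1.0 - (1.0 - p_cell) ** cells


def mean_defects_if_affected(p_cell: float, k: int) -> float:
    """Expected number of defective cells in a LUT, given it has at least one."""
    return (2 ** k) * p_cell / p_lut_affected(p_cell, k)


for dpm in (100, 500, 10_000, 50_000):  # 0.01%, 0.05%, 1%, 5%
    p = dpm_to_rate(dpm)
    print(f"{dpm:>6} DPM: P(6-LUT affected) = {p_lut_affected(p, 6):.4f}, "
          f"mean defects if affected = {mean_defects_if_affected(p, 6):.2f}")
```

Under these assumptions, only a small fraction of 6-input LUTs is affected in the 100 to 500 DPM regime, whereas at a 5% cell defect rate nearly every 6-input LUT contains at least one defective cell.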
Comment 2:
Although 6-input LUTs exhibit lower defect tolerance, their higher sensitivity may actually make them useful as screening structures for BEOL memristor process monitoring. Please discuss whether highly defect-sensitive LUTs could serve as sentinel circuits or process monitors to detect contamination, forming variability, or cluster defects during early process development.
Response 2:
Thank you for your recommendation to include screening structures with higher sensitivity. Currently, the geometrical information that is available as a side product of the presented approach is not yet fully utilized. With the assumed FPGA structure, it is feasible to consider using defect information of neighboring sensitive structures to extrapolate whether the defect pattern is more clustered or more uniform. This information could be used to classify potential issues of the fabrication process. Although not very common, FPGAs with multiple LUT sizes are commercially available. The more sensitive LUTs could be used as an early-stage sorting criterion during mapping to estimate whether the same logic block is likely to show further defects. While this is not done yet, we have extended Section 4.3 (Future Work, page 20) to comment on this as an avenue for further research.
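One possible way to make the clustered-versus-uniform distinction concrete is an index of dispersion (variance-to-mean ratio) over per-LUT defect counts. This is an illustrative technique chosen here, not the method of the manuscript: values around or below 1 suggest roughly random or evenly spread defects, while values well above 1 suggest clustering. Any decision threshold is an assumption and would have to be calibrated per process.

```python
from statistics import mean, pvariance
from typing import Sequence


def dispersion_index(defects_per_lut: Sequence[int]) -> float:
    """Variance-to-mean ratio of defect counts; > 1 hints at clustering."""
    m = mean(defects_per_lut)
    if m == 0:
        return 0.0  # no defects observed, nothing to classify
    return pvariance(defects_per_lut) / m


# Hypothetical defect counts for eight neighbouring LUTs (8 defects in total).
spread_out = [1, 0, 2, 1, 1, 0, 2, 1]
clustered = [0, 0, 0, 8, 0, 0, 0, 0]
print(dispersion_index(spread_out), dispersion_index(clustered))
```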
Comment 3:
Please expand the discussion on reliability aspects beyond manufacturing defects.
Topics that deserve inclusion:
- endurance and retention degradation of memristors,
- drift and variability during operation,
- the impact of reliability failures on mapping hardness,
- implications for standard memory cell, or other various applications (e.g., short-/long-term memory behavior, PIM or neuromorphic functions).
Including these points will significantly strengthen the completeness of the analysis.
Response 3:
That is a good point. Our discussion of other cell read/write failures was limited in the initial version of the manuscript.
For the material systems we consider using in NV-FPGAs, such as HfOx or yttrium oxide, we observed no retention or read disturbance issues. Memristors made at TU Darmstadt maintain a very stable state even under harsh environmental conditions (temperatures of up to 150 °C) for in excess of 1e14 readouts. The measured resistance levels did not show degradation over time. Even if there is slight to medium degradation over time, the material system can be tuned to achieve a very high LRS to HRS ratio (several orders of magnitude), while a stable digital readout is possible even at a ratio of 1:10. The higher energy required to change the state of such an engineered memristor makes accidental switching due to the applied readout voltage during normal operation unlikely. Together with the inherent drift resistance of the 1T2R cells utilized in this work, this means that state retention of the memristors is not a direct concern for the intended structures.
Likewise, read variability is less of an issue here, since the cells are read out using the ratio between their two memristors, leading to a good tolerance for variations in the actual resistance values. Temporary failures when setting the devices can be mitigated by reading back the written bitstream and attempting to rewrite bits that did not get set correctly, updating the defect map as needed if cells remain stuck.
We have extended Section 2.4 (page 6) to discuss reliability aspects and the influence of the chosen 1T2R cell design on them in more detail.
Comment 4:
Please provide a benchmark study in this work as compared to other research works.
Response 4:
Defect-aware and tolerant mapping approaches have indeed been studied previously. They can be categorized into two classes. The first class of approaches uses device-level tolerance mechanisms. These approaches propose additional spare resources that can be used in case of an error, without involving external processing. These approaches include adding extra rows or blocks and using additional resources to enable shifting functionality from defective elements to spare ones.
The second class of approaches requires configuration-level changes, aided by external processing, while not requiring architectural-level changes to the device. Most of these approaches avoid defective elements altogether. Different approaches differ in the granularity of the elements they avoid. The granularity can range from LUTs to entire logic blocks, up to large structures such as Slices.
We have run additional simulations and extended the provided data with synthesis approaches that exclude defective LUTs and logic blocks entirely as soon as defects are detected. The manuscript was extended to show the results in Sections 3.4 (Architecture Comparison, page 15) and 3.5 (Improvements, pages 15-17), and we added Section 3.6 (Comparison to other approaches, pages 17-18), which discusses other approaches and compares their strategy against ours.
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript investigates the impact of fabrication defects on FPGA logic using memristor-based memory cells. The study is interesting. I recommend it for publication in Micromachines after appropriate revisions.
- The Introduction section is overly simplistic. At a minimum, a brief description of the memristor should be included, such as its primary working mechanisms, including ion migration (e.g., Mater. 2009, 21, 2632-2663) and carrier trapping/detrapping (e.g., Adv. Funct. Mater. 2021, 31, 2005582).
- The authors state: “Since all defects investigated in this work are caused by fabrication issues, it is only possible to get a complete picture of all defects after fabrication.” However, concerning memristive devices based on ion migration, each switching cycle between HRS and LRS induces changes in material microstructures and defect states. Only for purely electronic devices based on carrier trapping/detrapping are all defects caused during the fabrication process.
- The authors are advised to clarify why they employ a 1T2R architecture for the memory cell.
Author Response
Thank you for your review of our work. We appreciate the constructive comments and suggestions and have updated our manuscript to reflect them. All changes are highlighted in blue in the revised version of the work. Please find our responses to each comment below.
Comment 1:
The Introduction section is overly simplistic. At a minimum, a brief description of the memristor should be included, such as its primary working mechanisms, including ion migration (e.g., Mater. 2009, 21, 2632-2663) and carrier trapping/detrapping (e.g., Adv. Funct. Mater. 2021, 31, 2005582).
Response 1:
Thank you for pointing this out. The introduction of our work did indeed not introduce memristors in detail. We have supplemented the introduction with a description of memristors (l. 27-36), their basic working principles (l. 37-54) and their suitability for constructing non-volatile memory cells (l. 55-63). Additionally, we have added commentary on the influence of the less desirable properties of memristors to the introduction of the selected memory cell type in Section 2.3 (page 5).
Comment 2:
The authors state: “Since all defects investigated in this work are caused by fabrication issues, it is only possible to get a complete picture of all defects after fabrication.” However, concerning memristive devices based on ion migration, each switching cycle between HRS and LRS induces changes in material microstructures and defect states. Only for purely electronic devices based on carrier trapping/detrapping are all defects caused during the fabrication process.
Response 2:
Indeed, cycle-to-cycle variations and other effects during write cycles of the memristors are a major topic in the development of memristor material systems and circuits utilizing memristors. For NV-FPGAs, we benefit from the low number of write cycles typically seen in configuration memory, making this a much smaller concern.
During normal operation of an FPGA, the memristors are not frequently reprogrammed. This eliminates the risk of newly occurring defects due to material microstructure changes during normal operation. During reprogramming, additional defects may occur. Most of these write failures are temporary, which allows mitigation through Write-Modify writing methods. The basic structure of FPGAs already allows for bit-accurate readout of the actual memory cell values, which is normally done anyway to ensure the correct programming of the device. Most erratic temporary bit failures can be handled this way. If additional permanently stuck cells have appeared, they can be added to the existing defect map of the device and subsequently treated the same way as fabrication errors.
Regarding read endurance, in-house research shows very stable states of hafnium- and yttrium-based devices even under harsh environmental conditions (temperatures up to 150 °C) for at least 1e14 cycles. Leaving the memory cells under constant read-safe voltage levels is therefore not problematic for state retention.
We have extended the discussion of failure modes in Section 2.4 (Modeling defective memory cells, l. 206-214) and added an explanation of how the features of FPGAs allow us to extract a defect location map and treat temporary write errors in l. 224-249.
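The read-back procedure described above can be sketched as a simple write-and-verify loop. The `FakeConfigMemory` class below is a hypothetical stand-in for the device interface (it is not an actual FPGA programming API), and the retry limit `max_retries` is an assumed parameter.

```python
from typing import Dict, List


class FakeConfigMemory:
    """Hypothetical memory model in which some cells are permanently stuck."""

    def __init__(self, size: int, stuck: Dict[int, int]):
        self.cells = [0] * size
        self.stuck = stuck  # address -> stuck value

    def write(self, addr: int, bit: int) -> None:
        # A stuck cell ignores the written value and keeps its stuck value.
        self.cells[addr] = self.stuck.get(addr, bit)

    def read(self, addr: int) -> int:
        return self.cells[addr]


def program_with_verify(mem, bitstream: List[int], defect_map: Dict[int, int],
                        max_retries: int = 3) -> Dict[int, int]:
    """Write the bitstream, read each bit back, and retry mismatching bits.
    Cells that still mismatch after max_retries are added to defect_map."""
    for addr, bit in enumerate(bitstream):
        for _ in range(1 + max_retries):
            mem.write(addr, bit)
            if mem.read(addr) == bit:
                break
        else:
            defect_map[addr] = mem.read(addr)  # record the stuck value
    return defect_map


mem = FakeConfigMemory(4, stuck={2: 1})
print(program_with_verify(mem, [1, 0, 0, 1], defect_map={}))
```

In this sketch, a stuck cell is only detected when the desired bit differs from its stuck value; the updated defect map would then be passed back to the defect-aware mapping step.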
Comment 3:
The authors are advised to clarify why they employ a 1T2R architecture for the memory cell.
Response 3:
Thank you for pointing out this oversight on our part. The reason for selecting a 1T2R architecture comes down to the requirements of our specific application.
FPGAs need a large number of configuration memory cells (easily on the order of 1e5 to 1e6 for medium-sized chips). Building small memory cells is therefore crucial for area efficiency. At the same time, the static power consumption of the cells needs to be kept as small as possible, since memory cells in active LUTs need to be under constant supply voltage for the FPGA to function correctly. This means that the cell must have a sufficiently high resistance between VDD and GND. Additionally, the constant voltage supply means that some memristor material technologies may exhibit drift of their resistance value. 1T2R uses the ratio between the two memristors instead of their specific values, offering good tolerance against cycle-to-cycle variability and drift during normal operation of the cell.
For the application as LUT configuration memory, the 1T2R cell is a good compromise. Smaller cells like 1T1R exhibit too high a static power consumption for this purpose and are much less tolerant of variations in the resistance of the memristor.
We have added this discussion to the manuscript in Section 2.3 (l. 171-187) and additionally commented on the option of tuning the material system specifically for the properties that benefit this cell arrangement.
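The benefit of a ratio-based readout can be illustrated with a small numerical sketch. It assumes, as a simplification, that the two memristors form a resistive divider between VDD and GND and that both devices drift by the same factor. All resistance and voltage values are hypothetical and are not measured data.

```python
def divider_output(vdd: float, r_top: float, r_bottom: float) -> float:
    """Output voltage of a two-resistor divider between VDD and GND."""
    return vdd * r_bottom / (r_top + r_bottom)


def static_power(vdd: float, r_top: float, r_bottom: float) -> float:
    """Static power drawn through the series path (P = V^2 / R)."""
    return vdd ** 2 / (r_top + r_bottom)


VDD = 1.0                # volts, assumed
R_LRS, R_HRS = 1e5, 1e6  # ohms, assumed 1:10 ratio

nominal = divider_output(VDD, R_LRS, R_HRS)
# Both devices drift by the same factor: the ratio and the output are unchanged.
drifted = divider_output(VDD, 1.3 * R_LRS, 1.3 * R_HRS)
print(nominal, drifted, static_power(VDD, R_LRS, R_HRS))
```

Because the output depends only on the ratio of the two resistances, a common drift cancels out, while the series resistance of both devices sets the static power between VDD and GND.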
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
My comments have been satisfactorily addressed. The revised manuscript has been greatly enhanced. I recommend it for acceptance following minor revisions.
Specifically, the reference list should be reformatted and several entries corrected. Two examples are given below:
- “2. L. Chua, "Memristor-The missing circuit element," in IEEE Transactions on Circuit Theory, vol. 18, no. 5, pp. 507-519, September 1971.”
should be formatted as:
“2. Chua, L. Memristor-The missing circuit element. IEEE Trans. Circuit Theory 1971, 18, 507-519.”
- “5. Hu, L., Yang, J., Wang, J., Cheng, P., Chua, L.O. and Zhuge, F. (2021), Optoelectronic Neuromorphic Computing: All-Optically Controlled Memristor for Optoelectronic Neuromorphic Computing (Adv. Funct. Mater. 4/2021). Adv. Funct. Mater., 31: 2170027.”
should be replaced with:
“5. Hu, L.; Yang, J.; Wang, J.; Cheng, P.; Chua, L.O.; Zhuge, F. All-optically controlled memristor for optoelectronic neuromorphic computing. Adv. Funct. Mater. 2021, 31, 2005582.”
