Figure 1.
Overall architecture of the proposed Arm Cortex-M0 SoC platform. Blue-highlighted ports indicate the I/O boundary. I2C Channel 1 and the AHB Slave port face inward, designated for on-chip IP integration.
Figure 2.
Detailed address map of the dual I2C interfaces within the APB subsystem. I2C Channel 0 (0x4000_9000) is designated for external devices; Channel 1 (0x4000_A000) is reserved for on-chip IP integration. Both channels share an identical hardware implementation.
Figure 3.
I2C device driver and usage example. Red boxes highlight the key memory-mapped register access operations. The driver applies to both Channel 0 and Channel 1 by changing only the base address argument.
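The idea that one driver serves both channels through a base-address argument can be sketched as follows. The two base addresses come from the address map in Figure 2; the register offsets (`I2C_REG_DATA`, `I2C_REG_STATUS`), the busy bit, and the function name are hypothetical, since the register layout is not given in this caption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Channel base addresses as listed in the address map (Figure 2). */
#define I2C0_BASE 0x40009000u
#define I2C1_BASE 0x4000A000u

/* Hypothetical register word offsets and status bit, for illustration only. */
#define I2C_REG_DATA    0u
#define I2C_REG_STATUS  1u
#define I2C_STATUS_BUSY 0x1u

/* Write one byte through whichever channel `base` points at.
 * Returns 0 on success, -1 if the channel reports busy. */
static int i2c_write_byte(volatile uint32_t *base, uint8_t value)
{
    assert(base != NULL);
    if (base[I2C_REG_STATUS] & I2C_STATUS_BUSY) {
        return -1;               /* channel busy; the caller may retry */
    }
    base[I2C_REG_DATA] = value;  /* memory-mapped register write */
    return 0;
}
```

On the target, the same function would be called with `(volatile uint32_t *)I2C0_BASE` or `(volatile uint32_t *)I2C1_BASE`; taking the base as a pointer also lets the routine be exercised on a host against an ordinary array.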
Figure 4.
The proposed FPGA-to-silicon verification methodology. The dashed boundary denotes the scope of the methodology. Purple and teal boxes represent the RTL/firmware starting point and FPGA verification steps, blue boxes represent ASIC implementation steps, and the amber box represents the functional equivalence check, which is the primary verification step. Yellow boxes indicate shared software artifacts applied to both platforms without modification.
Figure 5.
FPGA implementation result of the Arm Cortex-M0 SoC on a Terasic DE2-115 board (Intel Cyclone IV EP4CE115F29) using Quartus Prime 18.0. The design meets setup and hold timing constraints at 50 MHz, including the ROM Writer and all peripherals.
Figure 6.
Internal architecture of the ROM Writer. HEX code is received at 115,200 bps via UART, buffered in a FIFO, reassembled into 32-bit little-endian words, and written sequentially to ROM. Upon completion, the ROM Writer gates its own clock and transfers control to the Cortex-M0 core.
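The word-reassembly step can be illustrated in software. This is a minimal sketch that assumes the FIFO delivers already-decoded bytes in arrival order; the function names and the choice to drop trailing bytes that do not fill a word are illustrative and do not describe the actual RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four received bytes into one 32-bit little-endian word:
 * the first byte received becomes the least significant byte. */
static uint32_t pack_le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Write complete words to `rom` sequentially and return the number written.
 * Trailing bytes that do not fill a whole word are ignored in this sketch. */
static size_t rom_write_stream(const uint8_t *bytes, size_t n_bytes,
                               uint32_t *rom, size_t rom_words)
{
    assert(bytes != NULL && rom != NULL);
    size_t words = n_bytes / 4;
    if (words > rom_words) {
        words = rom_words;       /* never write past the end of the ROM */
    }
    for (size_t i = 0; i < words; i++) {
        rom[i] = pack_le32(&bytes[4 * i]);
    }
    return words;
}
```

For example, the byte stream `68 03 00 20` is stored as the word `0x20000368`.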
Figure 7.
UART terminal output showing ROM write completion and Cortex-M0 core startup after system reset. Following HEX code transfer via the ROM Writer, the core acquires ROM control and begins program execution immediately after reset.
Figure 8.
HW/SW integration verification result on the FPGA platform. UART communication, interrupt handling, timer, and watchdog timer functions all operate correctly, establishing the verified FPGA baseline for the subsequent FPGA-to-silicon equivalence check.
Figure 9.
UART terminal output confirming I2C EEPROM write and read operations on the FPGA platform (DE2-115 on-board 24LC08). Values 0–9 are written and read back correctly, establishing software-level I2C correctness prior to the FPGA-to-silicon equivalence check.
Figure 10.
Logic analyzer waveform showing simultaneous operation of I2C Channel 0 and Channel 1 on the FPGA platform. This is the sole waveform evidence for dual-channel concurrent I2C operation in the FPGA verification stage.
Figure 11.
Complete SoC platform validated through FPGA implementation, encompassing RTL timing closure, ROM Writer-based code loading, HW/SW integration verification, and I2C peripheral testing. This flow constitutes the FPGA-side foundation of the proposed FPGA-to-silicon verification methodology.
Figure 12.
ASIC digital design flow for Samsung 28 nm LPP CMOS, following the foundry-recommended Topographical Synthesis approach with MCMM analysis and OCV derating. EDA tool details for each stage are listed in Table 2.
Figure 13.
Memory Wrapper implementation: issue identification (left), intermediate correction (center), and final resolved result (right). Routing concentrated at pin centers and clock shield VSS patterns penetrating the memory interior are corrected. A single Wrapper instance is reused for both ROM and RAM, which share an identical SRAM16384×32 macro.
Figure 14.
AHB bus and Samsung on-chip SRAM timing before (top) and after (bottom) the negative-edge clock optimization in AHB_ROM. Without the optimization, the SRAM presents valid data at T3, one cycle after the required T2, causing incorrect SoC operation. Generating the control signals on the negative clock edge advances the SRAM output by half a cycle, resolving the mismatch with no area or power overhead.
Figure 15.
Post-layout simulation waveform of the AHB-to-memory read path with the Samsung 28 nm on-chip SRAM model. HRDATA presents valid data (0x2000_0368) at T2, two clock cycles after the instruction fetch request at T0 (HADDR = 0x0, HWRITE = 0), confirming correct timing achieved by the negative-edge clock implementation in AHB_ROM.
Figure 16.
Post-layout STA result after ECO, generated by Synopsys PrimeTime. All 131,021 timing checks (setup, hold, recovery, removal) pass with zero violations under both MAX and MIN corners at the 4 ns (250 MHz) constraint with OCV derating applied.
Figure 17.
Final hierarchical design views with I/O cells (left) and without I/O cells (right). The SoC core block (653 µm × 769 µm) is placed on the right side of the 3958 µm × 3958 µm die, with the left side reserved as open space for designers integrating the platform as a hard macro.
Figure 18.
Final layout view after GDS merge with Samsung 28 nm LPP foundry cells in Cadence Virtuoso. The two SRAM macros (ROM and RAM) are placed symmetrically. The total chip size is 4000 µm × 4000 µm including the seal ring.
Figure 19.
FPGA-to-silicon I2C SDA/SCL waveform comparison confirming HW/SW functional equivalence. Both platforms produce the same 200 kHz SCL and the same SDA data sequence (0xA0, 0x00, 0x00) using the same firmware source code.
Figure 20.
Bare die (left) and LQFP-208 packaged chip (right) fabricated in Samsung 28 nm LPP CMOS. The die size is 4000 µm × 4000 µm including the seal ring. Limited visibility of metal layers in the bare die photograph is due to the passivation layer applied during fabrication.
Figure 21.
Socket module and PCB test board for chip measurement. The board supplies 0.9–1.6 V to the core and 1.8 V to the I/O cells, and connects to a pulse generator (81130A), a Saleae Logic Pro, a Tektronix TDS3052 oscilloscope, and a PC via a USB-to-UART converter.
Figure 22.
Power efficiency across the measured voltage range (higher is better). The optimal point at 1.0 V (yellow) achieves 491 MOPS/mW. Efficiency degrades above 1.4 V (blue) as leakage power becomes dominant.
Figure 23.
Power consumption and energy efficiency (pJ/cycle) across the measured voltage range (lower is better). The dashed line represents theoretical V² dynamic power scaling. Measured values follow the trend up to 1.2 V, then deviate significantly above 1.4 V due to leakage-dominated operation.
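The quadratic voltage trend can be reproduced with the standard dynamic-power relation, in which power scales with the square of the supply voltage and linearly with clock frequency. The sketch below anchors the model at the measured 1.0 V, 125 MHz point from Table 5 (17.5 mW); that anchor and the function name are assumptions, since the caption does not state how the dashed line was fitted.

```c
#include <assert.h>

/* Dynamic power scaled from a reference operating point, assuming
 * P is proportional to V^2 * f (leakage is deliberately not modelled). */
static double scaled_dynamic_power_mw(double p_ref_mw,
                                      double v_ref, double f_ref_mhz,
                                      double v, double f_mhz)
{
    assert(v_ref > 0.0 && f_ref_mhz > 0.0);
    double v_ratio = v / v_ref;
    return p_ref_mw * v_ratio * v_ratio * (f_mhz / f_ref_mhz);
}
```

Under these assumptions the model gives about 25.2 mW at 1.2 V and 125 MHz, close to the measured 24.0 mW, but only about 48.4 mW at 1.6 V and 135 MHz against the measured 61.0 mW, which matches the deviation described in the caption.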
Figure 24.
Verification of I2C Channel 0 SDA/SCL waveforms on the fabricated ASIC, compared against firmware behavior. SCL operates at 200 kHz and SDA correctly drives the target sequence (0xA0, 0x00, 0x00). Since Channel 1 uses identical hardware, these results confirm reliable operation of both I2C channels.
Table 1.
Memory map of the proposed SoC platform. Address space is organized in 4 KB-aligned segments. I2C Channel 0 and Channel 1 sub-ranges are highlighted in red.
| Name | Address Range |
|---|---|
| ROM (Booting Codes) | 0x0000_0000~0x0000_FFFF |
| SRAM | 0x2000_0000~0x2000_FFFF |
| APB Peripherals | 0x4000_0000~0x4000_FFFF |
| (I2C Channel 0) | 0x4000_9000~0x4000_9FFF |
| (I2C Channel 1) | 0x4000_A000~0x4000_AFFF |
| AHB Peripherals | 0x4001_0000~0x4001_FFFF |
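Because the map is organized in 4 KB-aligned segments, an address can be decoded with a range check and a 12-bit shift. The following is a minimal sketch using the ranges from Table 1; the function names are hypothetical, and the hardware decoder itself is not described here.

```c
#include <assert.h>
#include <stdint.h>

#define APB_BASE 0x40000000u
#define APB_END  0x4000FFFFu

/* Index of the 4 KB APB slot containing `addr` (0..15). */
static unsigned apb_slot(uint32_t addr)
{
    assert(addr >= APB_BASE && addr <= APB_END);
    return (unsigned)((addr - APB_BASE) >> 12);  /* 4 KB = 2^12 bytes */
}

/* Top-level region for an address, following the ranges in Table 1. */
static const char *region_name(uint32_t addr)
{
    if (addr <= 0x0000FFFFu) return "ROM";
    if (addr >= 0x20000000u && addr <= 0x2000FFFFu) return "SRAM";
    if (addr >= APB_BASE && addr <= APB_END) return "APB";
    if (addr >= 0x40010000u && addr <= 0x4001FFFFu) return "AHB";
    return "unmapped";
}
```

With this decoding, I2C Channel 0 at 0x4000_9000 falls in APB slot 9 and Channel 1 at 0x4000_A000 in slot 10.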
Table 2.
EDA tools applied at each stage of the ASIC design flow. All Synopsys tools except VCS+Verdi (2021.09) are from the 2021.06 release, giving a consistent sign-off environment across synthesis, timing, power, and parasitic extraction.
| Stage | Tool | Version |
|---|---|---|
| Logic Synthesis | Synopsys Design Compiler | 2021.06-SP4 |
| Place-and-Route | Synopsys ICC2 | 2021.06-SP4 |
| Static Timing Analysis | Synopsys PrimeTime | 2021.06 |
| Dynamic Timing Simulation | Synopsys VCS+Verdi | 2021.09 |
| Power Consumption Analysis | Synopsys PrimePower | 2021.06-SP5 |
| Net Parasitic Extraction | Synopsys StarRCXT | 2021.06-SP2 |
| Equivalence Check | Synopsys Formality | 2021.06 |
| Physical Verification | Siemens Calibre | aoi_cal_2014.1 |
| Merge & Layout Patterning | Cadence Virtuoso | IC617_ISR23 |
| On Chip Memory Generation | Samsung Memory Compiler | SRAM16384×32 |
Table 3.
Library corners and operating conditions applied for MCMM static timing analysis with OCV derating (1.036 late/0.964 early). FF (BC) at −40 °C is used for hold checks; SS (WC) at +125 °C is used for setup checks.
| Corner | Temperature | STD | IO | Memory | OCV Derate |
|---|---|---|---|---|---|
| FF (BC) | −40 °C | 1.1 V | 1.95 V | 1.1 V | 1.036 |
| SS (WC) | +125 °C | 0.90 V | 1.65 V | 0.95 V | 0.964 |
Table 4.
Chip design results obtained from EDA tools. The operating frequency (250 MHz) is the design target, and the power value (5.592 mW) is a pre-silicon PrimePower estimate. Measured results are reported separately in Table 5.
| Items | Result |
|---|---|
| Process | Samsung 28 nm LPP CMOS |
| Bare Die Chip Size | 4000 µm × 4000 µm |
| Digital Design Area | 3958 µm × 3958 µm |
| SoC Area | 653 µm × 769 µm |
| Memory Area | 2 × (276.5 µm × 769 µm) |
| Core Area (Except Memory) | 100 µm × 769 µm |
| Gate Count | 2296 Gates |
| Memory Instance | SRAM 2 × (16,384 × 32) bits |
| Operating Frequency | Cortex-M0 SoC: 250 MHz (design target) |
| Power Consumption (Matrix Mult.) | 5.592 mW (averaged) |
| Process Reference Voltage | Core: 1.0 V, I/O: 1.8 V, Memory: 1.0 V |
| Package Type | LQFP 208 Type |
Table 5.
Matrix multiplication benchmark results measured on the fabricated ASIC at room temperature (+25 °C), averaged over ten runs. The optimal operating point, at 1.0 V, is highlighted in red. Execution time is consistent at 111 s across all conditions.
| Clock Frequency (Hz) | Core Voltage (V) | Power Consumption (mW) | Energy Efficiency (pJ/Cycle) | Power Efficiency (MOPS/mW) | Remark |
|---|---|---|---|---|---|
| 125 MHz | 0.90 | 32.4 | 259 | 265 | Near Threshold |
| 125 MHz | 1.00 | 17.5 | 140 | 491 | Optimal |
| 125 MHz | 1.02 | 21.8 | 174 | 394 | Normal |
| 125 MHz | 1.10 | 22.0 | 176 | 391 | Normal |
| 125 MHz | 1.20 | 24.0 | 192 | 358 | Normal |
| 130 MHz | 1.40 | 44.0 | 338 | 195 | High Voltage |
| 135 MHz | 1.60 | 61.0 | 453 | 141 | High Voltage |
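The energy-per-cycle column appears consistent with dividing the measured power by the clock frequency. The sketch below shows that arithmetic; the function name is hypothetical, and the relation is inferred from the tabulated values rather than stated by the source.

```c
#include <assert.h>

/* Energy per clock cycle in pJ from power (mW) and clock frequency (MHz).
 * mW / MHz yields nJ per cycle, so multiply by 1000 to obtain pJ. */
static double energy_pj_per_cycle(double power_mw, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return power_mw / clock_mhz * 1000.0;
}
```

For the 1.0 V row, 17.5 mW at 125 MHz gives 140 pJ/cycle, matching the table. The 1.6 V row computes to about 452 pJ/cycle against the listed 453, which may reflect rounding of the displayed power value.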
Table 6.
Pre-silicon power breakdown estimated by PrimePower at 1.0 V, +25 °C, 125 MHz (matrix multiplication benchmark).
| Power Group | Internal (mW) | Switching (mW) | Leakage (mW) | Total (mW) | Ratio (%) |
|---|---|---|---|---|---|
| Clock Network | 1.401 | 0.264 | <0.001 | 1.665 | 43.06 |
| I/O Cells | 1.461 | 0.009 | 0.680 | 2.150 | 55.61 |
| Memory | 0.008 | 0.000 | 0.036 | 0.044 | 1.14 |
| Core Logic | 0.001 | <0.001 | 0.006 | 0.007 | 0.19 |
| Total | 2.872 | 0.273 | 0.722 | 3.867 | 100.0 |
Table 7.
Comparison of platform-oriented SoC works in terms of methodology and extensibility. Works focused primarily on energy minimization through subthreshold operation are compared separately, as their design objectives differ fundamentally from the platform-oriented goals of this work.
| Feature | CHIPKIT [17] | Tiny Tapeout [20] | OQPSK [21] | This Work |
|---|---|---|---|---|
| Silicon fabricated | Yes (16 nm) | Yes (130 nm) a | Yes (180 nm) a | Yes (28 nm) |
| Commercial foundry process | Yes (TSMC) | No | No | Yes (Samsung) |
| Post-silicon power measured | No | No | No | Yes (17.5 mW) |
| Voltage characterization | No | No | No | Yes (0.9–1.6 V) |
| FPGA-to-ASIC identical FW/driver | Partial c | No | No | Yes |
| Documented reproducible methodology | Partial | No | No | Yes |
| Standard AMBA bus (AHB + APB) | Yes | No b | No | Yes |
| Dual low-speed interface (I2C × 2) | No | No | No | Yes |
| SW driver + memory map template | No | No | No | Yes |
| Hard macro delivery | No | No | No | Yes |