3.1. Proposed Classification Scheme for TCAM Table
The proposed design uses bits extracted from specific bit positions of the TCAM words to classify the words of the TCAM table into groups called TCAM subtables. In the proposed partitioning scheme, we extract
$\log_{2}M$ classification bits from the specified bit positions of the TCAM words to produce
M subtables. For example, suppose two bits are used to classify the sample TCAM table of size
$6\times 6$ presented in
Table 2. This table is classified using two different sets of bit positions,
${S}_{1}=\{{b}_{0},{b}_{1}\}$ and
${S}_{2}=\{{b}_{1},{b}_{3}\}$ as shown in
Figure 2a,b, respectively. The subtables constructed based on the bit values (00, 01, 10, and 11) of bit positions
$\{{b}_{0},{b}_{1}\}$ are shown in
Figure 2a. The number of TCAM words in the constructed subtables
$S{T}_{0}$,
$S{T}_{1}$,
$S{T}_{2}$ and
$S{T}_{3}$ varies based on the pattern of bits in the bit positions
$\{{b}_{0},{b}_{1}\}$ selected for the classification of
Table 2. TCAM words with ‘x’ as bit value in the classification bit positions are stored in more than one subtable. For example, the TCAM word at address 2 has the bit values of ‘x1’ at
$\{{b}_{0},{b}_{1}\}$, and is thus stored in both subtables
$S{T}_{1}$ and
$S{T}_{3}$. This redundancy expands the resultant TCAM subtables. The classification of any realistic dataset based on a specific set of bit positions may not necessarily produce subtables of the same size.
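The duplication caused by don't-care bits can be illustrated with a minimal Python sketch. The six words below are hypothetical (they are not the contents of Table 2), the function name `classify` is illustrative, and the sketch assumes that $b_0$ is the leftmost character of each word and that the classification bits are read in the listed order to form the subtable index.

```python
from itertools import product

def classify(words, positions):
    """Group ternary words into 2**len(positions) subtables.

    Returns one list of word addresses per subtable. A word with 'x' at a
    classification position is stored in every subtable it can match.
    """
    subtables = [[] for _ in range(2 ** len(positions))]
    for addr, word in enumerate(words):
        # An 'x' at a classification position expands to both 0 and 1.
        choices = ["01" if word[p] == "x" else word[p] for p in positions]
        for bits in product(*choices):
            subtables[int("".join(bits), 2)].append(addr)
    return subtables

# Hypothetical 6 x 6 ternary table (assumption, not Table 2 of the paper).
words = ["00x101", "010x10", "x11001", "10x011", "1100x0", "x10110"]

subtables = classify(words, positions=(0, 1))
sizes = [len(st) for st in subtables]
print(subtables, sizes)
```

In this hypothetical table, the words at addresses 2 and 5 have 'x1' at the classification positions and therefore appear in two subtables each, so the subtables hold eight entries in total for six TCAM words.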
The proposed solution is based on the observation that, for a specific dataset, different sets of bit positions differ in how effectively they classify the table when the goal is to construct balanced-size subtables. The classification example of the TCAM table in
Table 2 using two different sets of bit positions is illustrated in
Figure 2. It shows that the classification using
${S}_{2}=\{{b}_{1},{b}_{3}\}$ is more effective for the TCAM table presented in
Table 2, as the constructed subtables are of balanced size (2), when compared with the unbalanced size subtables constructed for bit positions
${S}_{1}=\{{b}_{0},{b}_{1}\}$ (2, 3, 1, and 3) as explained above.
The constructed TCAM subtables are further mapped to the distinct rows of SRAM blocks in the proposed design.
Figure 3 shows the mapping of the subtables constructed by the proposed classification scheme to SRAM memory. The contents of the four remaining bit positions
$\{{b}_{2},{b}_{3},{b}_{4},{b}_{5}\}$ of the unbalanced-size subtables shown in
Figure 2a and those of the remaining bit positions
$\{{b}_{0},{b}_{2},{b}_{4},{b}_{5}\}$ of the balanced-size subtables shown in
Figure 2b are vertically partitioned into subwords of width two and then mapped to the SRAMs with depth
$D=4$ shown in
Figure 3a,b, respectively. It clearly illustrates that the SRAM memory requirement for storing balanced size TCAM subtables constructed for bit positions
${S}_{2}=\{{b}_{1},{b}_{3}\}$ is lower than that of unbalanced size subtables constructed for bit positions
${S}_{1}=\{{b}_{0},{b}_{1}\}$. The SRAM memory utilization overhead for pre-classifying the TCAM table contents in the proposed approach is minimal, as balanced-size subtables are constructed based on effective classification bits.
Algorithm 1 describes the proposed classification scheme. It classifies the TCAM words of the $D\times W$ TCAM table into M subtables by comparing each subtable address with the $\log_{2}M$-bit value extracted from specific bit positions of each word. The resultant subtables are tested against the maximum depth bound (MDB) of $\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}$, where ${R}_{W}$ is the width of the configured SRAM blocks of the design on FPGA and $\alpha $ is an integer scaling factor with $\alpha \ge 1$. If the number of TCAM words in any resultant subtable exceeds the MDB, the next set of bit positions is used to classify the TCAM table. In the worst-case scenario, all subsets of $\log_{2}M$ bit positions from the W bit positions are tried. The worst-case classification complexity of Algorithm 1 is reduced by using a relaxed MDB for the construction of subtables: increasing $\alpha $ by one increases the MDB of the subtables by ${R}_{W}$. A relaxed MDB, however, increases RAM usage, as the resultant subtables are mapped to the SRAM blocks of the proposed design. The value of $\alpha $ thus provides a trade-off between the time complexity of the classification algorithm and the overall RAM usage of the design.
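The effect of $\alpha$ on the MDB can be checked with a few lines of arithmetic. The sketch below evaluates the MDB formula from the text; the configuration values ($D=512$, $M=4$, ${R}_{W}=36$) and the function name `max_depth_bound` are assumptions chosen only for illustration.

```python
def max_depth_bound(D, M, R_W, alpha=1):
    """MDB = (floor(D / (M * R_W)) + alpha) * R_W, as defined in the text."""
    if alpha < 1:
        raise ValueError("alpha must be an integer >= 1")
    return (D // (M * R_W) + alpha) * R_W

# Hypothetical configuration (assumption): 512 words, 4 subtables,
# SRAM blocks that are 36 bits wide.
D, M, R_W = 512, 4, 36
bounds = {alpha: max_depth_bound(D, M, R_W, alpha) for alpha in (1, 2, 3)}
print(bounds)
```

Each increment of $\alpha$ raises the bound by exactly ${R}_{W}$ words, which is the relaxation described above.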
The words of the M TCAM subtables are mapped to the M rows of the SRAM blocks of the architecture, and the corresponding classification bit positions are used to configure the pre-classifier bit positions in the proposed architecture.
Algorithm 1 Algorithm for the classification of the TCAM table into M subtables.
INPUT: D ternary words of W bits: ${T}_{i,j}$, where ${T}_{i,j}\in \{0,1,\mathrm{x}\}^{W}$, $i=0,1,\dots ,D-1$.
All possible subsets of $\log_{2}M$ bit positions from W bit positions: ${S}_{u,v}$, where $u=1,2,\dots ,\binom{W}{\log_{2}M}$, $v=0,1,\dots ,\log_{2}M-1$.
OUTPUT: M subtables (STs) with identification addresses of ${A}_{M}=0,1,2,\dots ,M-1$, and each ST of $\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}$ ternary words of W bits: $S{T}_{i,j}$, where $S{T}_{i,j}\in \{0,1,\mathrm{x}\}^{W}$, $i=0,1,\dots ,\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}-1$.
for $u=1,2,\dots ,\binom{W}{\log_{2}M}$ do
    for $i=0,1,\dots ,D-1$ do
        // Check for the maximum depth bound
        if ($\mathit{Size}_{{A}_{M}}==\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}$) then
            $\mathbf{break}$
        else
            // Extraction of classification bits & construction of subtables
            ${C}_{bits}\leftarrow Extract({S}_{u,v},{T}_{i,j})$
            if (${A}_{M}=={C}_{bits}$) then
                $Add\_ST({A}_{M},{T}_{i,j})$
                $\mathit{Size}_{{A}_{M}}\leftarrow \mathit{Size}_{{A}_{M}}+1$
            end if
        end if
    end for
end for
${C}_{bits}$: Extracted classification bits
$\mathit{Size}_{{A}_{M}}$: Size of constructed subtables
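One way to read Algorithm 1 is as a search over candidate sets of bit positions that stops at the first set whose subtables all fit within the MDB. The following Python sketch is an interpretation under stated assumptions, not the authors' implementation: the function names and the sample words are hypothetical, $b_0$ is taken to be the leftmost character, M is assumed to be a power of two, and words with 'x' at a classification position are duplicated across subtables as described in Section 3.1.

```python
from itertools import combinations, product

def classify_bounded(words, positions, bound):
    """Build subtables of word addresses; return None if any exceeds bound."""
    subtables = [[] for _ in range(2 ** len(positions))]
    for addr, word in enumerate(words):
        # An 'x' at a classification position matches both 0 and 1.
        choices = ["01" if word[p] == "x" else word[p] for p in positions]
        for bits in product(*choices):
            st = subtables[int("".join(bits), 2)]
            if len(st) == bound:      # maximum depth bound reached
                return None
            st.append(addr)
    return subtables

def find_classification(words, M, R_W, alpha=1):
    """Try subsets of log2(M) bit positions until one satisfies the MDB."""
    D, W = len(words), len(words[0])
    k = M.bit_length() - 1            # log2(M), assuming M is a power of two
    bound = (D // (M * R_W) + alpha) * R_W
    for positions in combinations(range(W), k):
        subtables = classify_bounded(words, positions, bound)
        if subtables is not None:
            return positions, subtables
    return None                       # no subset fits; a larger alpha is needed

# Hypothetical 6 x 6 table (assumption). R_W = 1 keeps the bound small.
words = ["x01001", "x00110", "x10x11", "011100", "110010", "101101"]
result = find_classification(words, M=4, R_W=1, alpha=1)
print(result)
```

In this example every candidate set containing $b_0$ is rejected, because three words carry 'x' at $b_0$ and the resulting nine entries cannot fit into four subtables of depth two. The search then accepts $\{b_1, b_2\}$. Raising `alpha` would let an earlier candidate pass at the cost of deeper subtables.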

3.2. Proposed Architecture
The TCAM table of size $D\times W$ is classified into M subtables using the proposed classification scheme in Algorithm 1. The W-bit TCAM words of the M constructed subtables are further divided into V subwords of $\log_{2}{R}_{D}$ bits. The resultant $M\times V$ subpartitions of the TCAM table are mapped to the M rows of the V SRAM blocks in the proposed architecture, as shown in Figure 5. Each TCAM table subpartition of size $\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}\times \log_{2}{R}_{D}$ is implemented using an SRAM block, which is a cascade of $\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha $ SRAM blocks of size ${R}_{D}\times {R}_{W}$.
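The block count implied by this organization can be estimated with a short calculation. The sketch below is illustrative only: the configuration values and the function name `sram_block_count` are assumptions, ${R}_{D}$ is assumed to be a power of two, and rounding V up when W is not a multiple of $\log_{2}{R}_{D}$ is an assumption not stated in the text.

```python
def sram_block_count(D, W, M, R_D, R_W, alpha=1):
    """Return (V, cascade, total) for the M x V grid of subpartitions."""
    s = R_D.bit_length() - 1          # subword width log2(R_D), R_D a power of two
    V = (W + s - 1) // s              # subwords per TCAM word (rounded up: assumption)
    cascade = D // (M * R_W) + alpha  # R_D x R_W blocks cascaded per subpartition
    return V, cascade, M * V * cascade

# Hypothetical configuration (assumption): 512 x 36 table, 4 subtables,
# SRAM blocks that are 512 deep and 36 bits wide.
V, cascade, total = sram_block_count(D=512, W=36, M=4, R_D=512, R_W=36, alpha=1)
print(V, cascade, total)
```

The total is the product of the M rows, the V SRAM blocks per row, and the cascade length of each block.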
The proposed architecture comprises a pre-classifier unit and an SRAM-based TCAM. The pre-classifier unit of the proposed architecture is shown in
Figure 4. Algorithm 1 provides the
$\log_{2}M$ bit positions of the pre-classifier bits. The pre-classifier bits are extracted from these
$\log_{2}M$ bit positions of the incoming TCAM words using the select lines of
$\log_{2}M$ W-to-1 multiplexers, as shown in
Figure 4. The extracted
$\log_{2}M$ bits are then decoded into an
M-bit control signal that selectively activates at most one row of SRAM blocks of the proposed architecture.
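The behavior of the pre-classifier unit can be modeled in software as a bit selection followed by a one-hot decode. The sketch below is a behavioral model under assumptions, not a description of the hardware: the function name `preclassify` is hypothetical, the search key is a binary string with $b_0$ as its leftmost character, and the selected bits are read in the listed order.

```python
def preclassify(search_key, positions):
    """Return the M-bit one-hot row-enable vector for a binary search key."""
    index = 0
    for p in positions:
        # Each iteration models one W-to-1 multiplexer selecting bit p.
        index = (index << 1) | int(search_key[p])
    M = 1 << len(positions)
    # Decoder: exactly one of the M rows of SRAM blocks is enabled.
    return [1 if row == index else 0 for row in range(M)]

# Hypothetical key and classification positions {b1, b2} (assumption).
enable = preclassify("101101", positions=(1, 2))
print(enable)
```

Only the enabled row of SRAM blocks needs to be read for this key, while the remaining rows stay inactive.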
The proposed SRAM-based TCAM architecture is shown in
Figure 5. The incoming
W-bit TCAM word is divided into
V subwords of
$\log_{2}{R}_{D}$ bits. The
V subwords are provided as addresses to the selected row of
V SRAM blocks in parallel, and
V SRAM words are read. The
V SRAM words undergo a bitwise
AND operation, and the resultant matching-information bit vector is provided to the associated PE. The PE encodes the highest-priority bit position that is set high as the matching address.
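The lookup within one selected row can be sketched as follows. This is a behavioral model under assumptions rather than the authors' design: each SRAM word is taken to hold one match bit per stored TCAM word, the lowest address is assumed to have the highest priority, and the function names, subtable contents, and subword width are hypothetical.

```python
from itertools import product

def build_sram(subtable, v, s):
    """Contents of SRAM block v: one match bit vector per s-bit address."""
    sram = []
    for addr_bits in product("01", repeat=s):
        vec = 0
        for k, word in enumerate(subtable):
            sub = word[v * s:(v + 1) * s]
            # A ternary subword matches if every bit is 'x' or equals the address bit.
            if all(t in ("x", a) for t, a in zip(sub, addr_bits)):
                vec |= 1 << k
        sram.append(vec)
    return sram

def lookup(srams, key, s):
    """Read V SRAM words, AND them, and priority-encode the result."""
    match = -1                                # all ones
    for v, sram in enumerate(srams):
        address = int(key[v * s:(v + 1) * s], 2)
        match &= sram[address]                # bitwise AND of the V SRAM words
    if match == 0:
        return None                           # no stored word matches
    return (match & -match).bit_length() - 1  # lowest set bit wins (assumption)

# Hypothetical subtable of 4-bit ternary words, split into V = 2 subwords of 2 bits.
subtable = ["01x0", "0110", "1xx1"]
s = 2
srams = [build_sram(subtable, v, s) for v in range(2)]
print(lookup(srams, "0110", s), lookup(srams, "1101", s), lookup(srams, "0000", s))
```

The key `0110` matches both of the first two stored words, and the priority encoding returns the lower address.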
3.3. Update Operation
The proposed TCAM design maps a new TCAM table dataset of the same size to the SRAM blocks of the configured architecture on FPGA. The SRAM blocks of the architecture have storage space for $\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}$ TCAM subwords of $\log_{2}{R}_{D}$ bits. Algorithm 1 finds a set of classification bit positions that yields TCAM subtables within the maximum depth limit of the SRAM blocks configured on the FPGA. The updated TCAM subtables are mapped to the SRAM blocks of the design on FPGA, and the corresponding classification bit positions from Algorithm 1 configure the pre-classifier unit. The classification bits are then extracted from the updated set of bit positions for the incoming TCAM words. The proposed solution reconfigures the hardware design on FPGA in two cases: first, when the number of TCAM words in the subtables constructed by Algorithm 1 exceeds the storage space of the configured SRAM blocks, so that a relaxed MDB is needed for the updated TCAM subtables; and second, when a TCAM table of a different size is implemented in the proposed design.
During runtime, updating a TCAM word in the proposed design involves first writing the updated word to the respective TCAM subtable and then writing the updated subtable to the corresponding row of SRAM blocks in the proposed architecture. The number of TCAM words in the updated subtables is tested against the MDB of
$\left(\left\lfloor \frac{D}{M{R}_{W}}\right\rfloor +\alpha \right){R}_{W}$, as this is the storage capacity of a row of configured SRAM blocks in the proposed architecture. Owing to the presence of don’t-care bits (x) in the TCAM words, EETCAM in the worst case writes the entire used SRAM memory to complete the update of a TCAM word. The partitioned subtables are written in parallel to the corresponding SRAM blocks in the proposed architecture, and the depth
${R}_{D}$ of the configured BRAMs determines the update latency of the EETCAM design. The update latency of EETCAM is 513 cycles. Native TCAMs have a comparable worst-case write time of O(N) for updating a TCAM word, where N is the number of words in the TCAM table [28,29,30,31].