Electronics
  • Article
  • Open Access

18 March 2022

Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits

1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University, 600-richya str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
This article belongs to the Special Issue Feature Papers in Circuit and Signal Processing

Abstract

Reducing the amount of hardware in implemented circuits is one of the most important problems in FPGA-based design. In this paper, we discuss the implementation of Mealy finite state machines (FSMs) by circuits consisting of look-up tables (LUTs). A method is proposed to reduce the LUT count of three-block circuits of Mealy FSMs. The method is based on finding a partition of the set of internal states by classes of compatible states. To reduce the LUT count, we propose a special kind of state code, named complex state codes. A complex code includes two parts. The first part is the binary code of a state as an element of some partition class. The second part is the code of the corresponding partition class. Using complex state codes allows obtaining FPGA-based FSM circuits with exactly four logic blocks. If some conditions hold, then any FSM function from the first and second blocks is implemented by a single LUT. The third level is represented as a network of multiplexers. These multiplexers generate either additional variables encoding collections of outputs or input memory functions. The fourth level generates FSM outputs. An example of synthesis and experimental results are shown and discussed. The experiments show that the proposed approach reduces hardware compared to such methods as Auto and one-hot of Vivado, and JEDI. Further, the proposed approach produces circuits with fewer LUTs than three-level Mealy FSMs based on the joint use of several methods of structural decomposition. The experiments show that our approach reduces the LUT counts by 11 to 77 percent on average. As the complexity of an FSM increases, the gain from the application of the proposed method grows; the same is true for both the FSM performance and power consumption.

1. Introduction

The behavior of a sequential device can be represented by the model of a Mealy finite state machine (FSM) [,]. This stimulates the constant development of various methods of designing Mealy FSM logic circuits [,]. As a rule, these methods are aimed at optimizing one or more basic characteristics of the resulting FSM circuits []. There are three basic characteristics: (1) the chip area occupied by an FSM circuit, (2) the operating frequency, and (3) the power consumption. As a rule, however, it is impossible to optimize these three characteristics at the same time. For example, a decrease in the required internal resources (the required chip area) is often associated with a decrease in the maximum operating frequency []. As is known, the occupied chip area significantly affects the other characteristics of an FSM circuit []. At the same time, it is important that reducing the area increases the delay time of the circuit as little as possible. As is known [], the major challenge in LUT-based FSM design is developing a low-area circuit without compromising the FSM performance. In this paper, we propose a method to create Mealy FSMs whose three-level circuits are implemented using internal resources of field-programmable gate arrays (FPGAs) [,]. The proposed approach belongs to the methods of structural decomposition [].
Recently, more and more digital systems have been implemented using FPGA chips []. An analysis of the VLSI market shows that Xilinx [] is the largest manufacturer of FPGA chips. This fact explains the orientation of our article to Xilinx FPGAs. We discuss a case when an FSM circuit is implemented using internal resources of FPGAs such as look-up table (LUT) elements, programmable flip-flops, inter-slice multiplexers, programmable interconnects, a synchronization tree, and programmable input–outputs.
Our current article is devoted to improving the LUT count of three-block LUT-based Mealy FSM circuits obtained with the simultaneous use of the replacement of FSM inputs and the encoding of collections of FSM outputs []. The resulting FSM circuits have three blocks of LUTs; each block has a unique system of inputs and outputs. When certain conditions are met, the circuits of some (or even all) logic blocks are synthesized using the methods of functional decomposition [,]. Such blocks are represented by circuits having several levels of LUTs. This leads to a significant decrease in the FSM operating frequency. Moreover, the interconnection system of a multi-level block becomes dramatically more complex, which leads to a further decrease in the FSM performance. This is why it is so important to reduce the number of levels in each logic block of FSM circuits.
The main contribution of this paper is a novel design method aimed at reducing the number of LUTs and their levels in circuits of three-block LUT-based Mealy FSMs. The reduction diminishes the total number of LUTs in an FSM circuit compared to this number for equivalent FSMs based on the functional decomposition. To apply our method, it is necessary to construct classes of compatible states. This in turn leads to an increase in the number of state variables compared to their minimum number. To reduce the number of state variables, we propose a new type of state codes. We name them complex state codes (CSC). A CSC of any state includes two parts. The first part is a code of a class of compatible states including the particular state. The second part is a code of this state as an element of a particular class. Our method produces four-block FSM circuits. In the best case, each block is represented by a single-level LUT-based circuit. As experimental results show, the proposed approach also provides the performance at the level of three-block FSMs and reduces the power consumption. These phenomena are additional positive qualities of the proposed method.
The further text of the paper includes five sections. Section 2 shows the background of LUT-based Mealy FSMs. Section 3 analyses the related works. The main idea of the proposed method is shown in Section 4. An example of a CSC-based FSM synthesis is shown in Section 5. Section 6 analyses the results of experiments. The paper ends with a short conclusion.

2. Basic Information

A Mealy FSM logic circuit can be represented by two systems of Boolean functions (SBFs) []. One of these SBFs represents FSM outputs connected with operational units of a particular digital system. The second SBF represents input memory functions (IMFs). The arguments of these SBFs are external FSM inputs and internal state variables. The inputs form a set $X = \{x_1, \ldots, x_L\}$; the IMFs form a set $\Phi = \{D_1, \ldots, D_R\}$. An FSM circuit is represented by the following SBFs:
$$Y = Y(T, X); \tag{1}$$
$$\Phi = \Phi(T, X). \tag{2}$$
The state variables $T_r \in T$ encode internal states from a set $A = \{a_1, \ldots, a_M\}$. To encode $M$ states, the minimum number of state variables is determined as []
$$R = \lceil \log_2 M \rceil. \tag{3}$$
Each state $a_m \in A$ is represented by a binary code $K(a_m)$ having $R$ bits. These codes are kept in the state code register (RG). In this article, we discuss a case when the RG has informational inputs of D type. This is the most common case []. The systems (1) and (2) determine the so-called P Mealy FSM [] shown in Figure 1.
Figure 1. Structural diagram of P Mealy FSM.
In Figure 1, the block of IMFs is implemented using SBF (2); the block of outputs is based on SBF (1). The state register has $R$ D flip-flops. The $r$-th flip-flop keeps the state variable $T_r \in T$. The pulse Start clears the content of RG by loading the code of the initial state $a_1 \in A$ into RG. As a rule, the code $K(a_1)$ consists of zeros. The pulse Clock marks the instant when the RG content can be changed by the current IMFs.
As a rule, an FSM is represented by either a state transition table (STT) [] or a state transition graph (STG) []. To obtain the systems (1) and (2), it is necessary to form an FSM direct structure table (DST) []. In this article, we start from the STG. Next, this graph is transformed into the equivalent STT. Using the STT, we construct the DST.
An STG is a directed graph whose nodes correspond to FSM states. Interstate transitions are represented by edges of the STG. Each edge is marked by a combination of inputs causing a particular transition and by the collection of outputs (CO) generated during this transition. An STT is a representation of the STG as a list of interstate transitions. An STT includes five columns: the current state $a_m$; the state of transition $a_T$; the input signal $X_h$, which is a conjunction of some inputs (or their complements) determining this particular transition; the CO $Y_h$ generated during this transition; and the number $h$ of the transition ($h \in \{1, \ldots, H\}$) [].
A DST includes the columns with state codes and IMFs []. These columns are: the code of the current state $K(a_m)$, the code of the next state $K(a_T)$, and a collection of IMFs $D_h \subseteq \Phi$ equal to 1 to load the code of the next state into the state register RG.
In this paper, we consider a case when internal resources of FPGA chips are used for implementing SBFs (1) and (2). An FSM circuit is implemented using configurable logic blocks (CLBs) of FPGAs produced by Xilinx []. A circuit is represented as a network of CLBs connected with the help of a programmable routing matrix []. In this paper, we discuss a case when CLBs include LUTs, multiplexers, and programmable flip-flops. Using the notation [], we denote as $I_L$-LUT a single-output LUT with $I_L$ inputs. If a Boolean function depends on up to $I_L$ arguments, then it is represented by a single-LUT logic circuit. If the number of LUT inputs is less than the number of arguments, then the circuit has more than a single level of LUTs. To implement multi-level circuits, the methods of functional decomposition (FD) are used [,]. As a rule, FD-based circuits have complicated systems of “spaghetti-type” interconnections [].
We discuss a case when each CLB is a part of a slice []. The slice includes internal multiplexers. They can be used for changing the number of LUT inputs within one slice. The internal multiplexers are connected with LUTs by a system of fast intra-slice interconnections. Due to this, the delay time for 6-, 7-, and 8-input LUTs is practically the same for the SLICEL of Virtex-7 [,]. This approach makes it possible to flexibly adapt the LUT parameters to the characteristics of the function being implemented. For example, the SLICEL of Virtex-7 includes four 6-LUTs, eight flip-flops, and 27 multiplexers []. Each 6-LUT can be used as two 5-LUTs with shared inputs. This explains the presence of eight flip-flops in each SLICEL. Using internal multiplexers allows combining two 6-LUTs into a single 7-LUT. Next, four 6-LUTs can be combined into a single 8-LUT. The control inputs of the multiplexers can be used as inputs of 7- and 8-LUTs. Each SLICEL possesses special carry chains used for the organization of fast multi-bit adders. It is worth noting that these circuits can be used to implement arbitrary logic circuits [,].
In this paper, we use multiplexers to generate functions (1) and (2). We denote a multiplexer having $K$ data inputs as $K$MX. Using a single 6-LUT, we can implement the circuit of a 4MX. It has two control inputs and four data inputs. Further, we can organize an 8MX with the help of two 6-LUTs. Its circuit has only a slightly bigger delay than the circuit of a 4MX []. This is possible due to the fast interconnections inside a slice. If a 16MX has the control inputs $T_1$–$T_4$, then its circuit includes four 6-LUTs controlled by $T_3$ and $T_4$. To implement a 32MX, two slices and inter-slice interconnections are used. As a result, a 32MX is much slower than a 16MX.
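The input budget behind these statements can be checked with a short sketch. It assumes that a $K$MX needs $K$ data inputs plus $\lceil \log_2 K \rceil$ control inputs; the function names `mux_inputs` and `fits_single_lut` are hypothetical and are not taken from the article or from any vendor tool.

```python
def mux_inputs(k: int) -> int:
    """Number of inputs of a K-input multiplexer: K data inputs plus
    ceil(log2 K) control inputs (assumption stated in the lead-in)."""
    control = (k - 1).bit_length()  # integer form of ceil(log2 k) for k >= 1
    return k + control


def fits_single_lut(k: int, lut_inputs: int = 6) -> bool:
    """True if a K-input multiplexer fits into one LUT with lut_inputs inputs."""
    return mux_inputs(k) <= lut_inputs


if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k, mux_inputs(k), fits_single_lut(k))
```

Under this assumption, a 4MX uses all six inputs of a 6-LUT, while an 8MX already exceeds them, which agrees with the text's use of two 6-LUTs for an 8MX.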
In LUT-based FSMs, the flip-flops of RG are distributed among the LUTs generating functions (2). Due to this, the RG is “hidden” inside the slices where the IMFs are generated. There are two blocks in an LUT-based P Mealy FSM (Figure 2).
Figure 2. Structural diagram of LUT-based P Mealy FSM.
In this paper, we denote as LUTer a logic block consisting of LUT-based CLBs. In the P Mealy FSM (Figure 2), LUTerY implements SBF (1) and LUTerT implements SBF (2). To control the RG, the pulses Start and Clock are used.
If each function from systems (1) and (2) depends on not more than $I_L$ arguments, then both blocks of the LUT-based P Mealy FSM (Figure 2) are represented by single-level circuits. For Xilinx-based solutions, an LUT has 6 inputs []. There is no point in increasing this value, because $I_L = 6$ provides the best balance for such LUT characteristics as the occupied chip area, performance, and consumed power []. However, even for FSMs of average complexity [], there can be up to 40 arguments in functions (1) and (2). Obviously, there is a distinct imbalance between such a big number of arguments in the SBFs representing FSM circuits and the fairly small number of LUT inputs. This imbalance requires improving the synthesis methods of LUT-based FSMs.
Denote as $NA(f_i)$ the number of literals in a sum-of-products of a function $f_i \in \Phi \cup Y$. If the condition
$$NA(f_i) > I_L \tag{4}$$
holds, then it is impossible to represent the function $f_i \in \Phi \cup Y$ by a single-level circuit. In this case, it is very important to optimize the system of connections between different slices of an FSM circuit. This follows from the fact that more than 70% of the power consumption is due to the interconnections []. Moreover, time delays of the interconnection system are starting to play a major role in comparison with CLB delays []. The results of research [] show that the optimization of interconnections leads to increasing the maximum operating frequency and reducing the power consumption of LUT-based FSM circuits. This can be performed, for example, using various methods of structural decomposition [].

4. Main Idea of the Proposed Method

The proposed method is based on finding a partition $\Pi_C = \{A_1, \ldots, A_{J_C}\}$ of the set $A$ by $J_C$ classes of compatible states. The same state variables are used for encoding states from different compatibility classes. The states are encoded by codes $C(a_m)$ using $R_A$ state variables:
$$R_A = \max(\lceil \log_2 M_1 \rceil, \ldots, \lceil \log_2 M_{J_C} \rceil). \tag{12}$$
In (12), $M_j$ denotes the number of states in the class $A_j \in \Pi_C$. A code $C(a_m)$ determines the state $a_m \in A$ as an element of a particular class of $\Pi_C$. The classes $A_j \in \Pi_C$ are encoded by class codes $K(A_j)$. These codes include $R_C$ bits:
$$R_C = \lceil \log_2 J_C \rceil. \tag{13}$$
We propose to represent FSM states $a_m \in A$ by the complex state codes denoted as $CSC(a_m)$. For any state $a_m \in A_j$, a CSC is a concatenation of the class code $K(A_j)$ and the state code $C(a_m)$:
$$CSC(a_m) = K(A_j) * C(a_m). \tag{14}$$
In (14), the sign “*” denotes the concatenation of the codes. There are $R_B$ state variables in CSCs. The value of $R_B$ is determined as
$$R_B = R_C + R_A. \tag{15}$$
To encode the classes, we use the variables from the set $T_C = \{T_1, \ldots, T_{R_C}\}$. To encode states as elements of classes $A_j \in \Pi_C$, we use $R_A$ variables from the set $T_A = \{T_{R_C+1}, \ldots, T_{R_B}\}$. Together, these sets form a set $T = T_C \cup T_A$ having $R_B$ elements.
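Formulas (12)–(15) can be illustrated with a minimal sketch. It assumes that classes and the states inside each class are numbered in natural binary order, which is only one of many admissible assignments; the names `bits_for` and `complex_state_codes` are hypothetical.

```python
def bits_for(n: int) -> int:
    """Integer form of ceil(log2 n) for n >= 1."""
    return (n - 1).bit_length()


def complex_state_codes(partition):
    """Build complex state codes for a partition of the state set.

    partition: list of classes, each class a list of state names.
    Returns (R_C, R_A, R_B, codes), where codes maps a state name to the
    bit string K(A_j) * C(a_m). Natural binary numbering is an assumption.
    """
    r_c = bits_for(len(partition))                      # formula (13)
    r_a = max(bits_for(len(cls)) for cls in partition)  # formula (12)
    codes = {}
    for j, cls in enumerate(partition):
        class_code = format(j, f"0{r_c}b") if r_c else ""
        for i, state in enumerate(cls):
            state_code = format(i, f"0{r_a}b") if r_a else ""
            codes[state] = class_code + state_code      # formula (14)
    return r_c, r_a, r_c + r_a, codes                   # R_B by formula (15)


if __name__ == "__main__":
    pi_c = [["a1", "a2", "a3", "a4"], ["a5", "a6", "a7", "a8"]]
    print(complex_state_codes(pi_c))
```

For a partition with two classes of four states each, the sketch gives $R_C = 1$, $R_A = 2$, and $R_B = 3$.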
The proposed method of state assignment is aimed at reducing the LUT count for LUT-based circuits of $MPY$ FSMs. The method is based on the joint application of: (1) the replacement of FSM inputs; (2) the encoding of collections of outputs; (3) the encoding of states by complex state codes. As a result, we propose to replace $MPY$ FSMs by $MP_CY$ FSMs. The subscript “C” means that the complex state codes are used in the $MPY$ FSM. The structural diagram of the $MP_CY$ FSM is shown in Figure 4.
Figure 4. Structural diagram of LUT-based $MP_CY$ Mealy FSM.
There are four levels of logic blocks in $MP_CY$ FSMs. The first level is represented by LUTerP. This block implements the SBF (5).
The second level includes $J_C$ blocks LUTer$j$, where $j \in \{1, \ldots, J_C\}$. A class $A_j \in \Pi_C$ determines three sets of variables. The set $P_j \subseteq P$ includes additional variables $p_g \in P$ determining transitions from the states $a_m \in A_j$. The set $\Phi_j$ contains IMFs generated during the transitions from the states $a_m \in A_j$. The set $Z_j$ consists of the variables $z_r \in Z$ equal to 1 in codes of COs produced during the transitions from the states $a_m \in A_j$. Each block LUTer$j$ produces the following partial functions:
$$\Phi_j = \Phi_j(T_A, P_j); \tag{16}$$
$$Z_j = Z_j(T_A, P_j). \tag{17}$$
The block LUTerTZ represents the third logic level. It consists of $R_Y + R_B$ multiplexers generating IMFs $D_r \in \Phi$ and additional variables $z_r \in Z$. The data inputs of these multiplexers are the partial functions (16) and (17). To select a particular partial function, we use the class variables $T_r \in T_C$. So, the multiplexers generate the following SBFs:
$$D_r = D_r(T_C, D_r^1, \ldots, D_r^{J_C}) \quad (r \in \{1, \ldots, R_B\}); \tag{18}$$
$$z_r = z_r(T_C, z_r^1, \ldots, z_r^{J_C}) \quad (r \in \{1, \ldots, R_Y\}). \tag{19}$$
The functions (18) enter the inputs of the flip-flops that make up the hidden register RG. Due to this, the control signals Clock and Start enter this block.
The fourth logic level is represented by the block LUTerY. It implements the SBF (7).
So, there are four levels of logic blocks in the circuits of $MP_CY$ Mealy FSMs. In the best case, each block is represented by a single-level LUT-based circuit.
In this paper, we propose a synthesis method for LUT-based $MP_CY$ Mealy FSMs. We start the synthesis process from an FSM state transition graph. The proposed method includes the following steps:
(1) Creating the state transition table of the Mealy FSM.
(2) Constructing the partition $\Pi_C$ of the set of states by classes of compatible states.
(3) Encoding the FSM states by complex state codes $CSC(a_m)$.
(4) Executing the replacement of FSM inputs by additional variables $p_g \in P$.
(5) Creating SBF (5) representing LUTerP.
(6) Encoding the collections of outputs by codes $K(Y_q)$.
(7) Creating SBF (7) representing LUTerY.
(8) Creating the direct structure table of the $MP_CY$ Mealy FSM.
(9) Creating the tables of the blocks of partial functions LUTer1–LUTer$J_C$.
(10) Creating SBFs (16) and (17) representing the second level of the $MP_CY$ Mealy FSM logic circuit.
(11) Creating the table of LUTerTZ.
(12) Creating SBFs (18) and (19) representing the third level of the logic circuit.
(13) Implementing the LUT-based circuit of the $MP_CY$ Mealy FSM using internal resources of a particular FPGA chip.
The partition $\Pi_C$ is created using the method []. This approach allows minimizing LUT counts in the resulting Mealy FSM circuits. If possible, each class of compatible states should include the maximum possible number of states. This helps to minimize the number of classes (and the number of blocks of the second level of logic). In turn, this optimizes the number of LUTs in the circuit of LUTerTZ. Any multiplexer from this block is implemented as a single LUT if the following condition holds:
$$R_C + J_C \le I_L. \tag{20}$$
Even if condition (20) is violated, the multiplexers can still be implemented as single-level circuits. This is possible if the number of partial functions for a given function $f_i \in \Phi \cup Y$ does not exceed the value $I_L - R_C$. Otherwise, the internal multiplexers of CLBs are used for generating functions (18) and (19).
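Condition (20) and its relaxed form are easy to check numerically. The sketch below is only an illustration: the helper names are hypothetical, and $R_C$ is computed from $J_C$ by formula (13).

```python
def bits_for(n: int) -> int:
    """Integer form of ceil(log2 n) for n >= 1."""
    return (n - 1).bit_length()


def mux_is_single_lut(j_c: int, i_l: int) -> bool:
    """Condition (20): R_C + J_C <= I_L, with R_C = ceil(log2 J_C)."""
    return bits_for(j_c) + j_c <= i_l


def function_mux_is_single_lut(n_partial: int, j_c: int, i_l: int) -> bool:
    """Relaxed check for one function: its number of partial functions
    must not exceed I_L - R_C."""
    return n_partial <= i_l - bits_for(j_c)


if __name__ == "__main__":
    print(mux_is_single_lut(2, 5))              # two classes, 5-LUTs
    print(mux_is_single_lut(5, 6))              # five classes, 6-LUTs
    print(function_mux_is_single_lut(3, 5, 6))  # 3 partial functions, J_C = 5
```

With these definitions, two classes fit into a 5-LUT multiplexer, five classes violate (20) for 6-LUTs, and a function with only three partial functions still fits in that case.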

5. Example of Synthesis

We use the symbol $MP_CY(S_a)$ to show that the model of $MP_CY$ Mealy FSM (Figure 4) is used to implement the circuit of an FSM $S_a$. Consider an FSM $S_0$ represented by its STG (Figure 5). Let us synthesize the circuit of Mealy FSM $MP_CY(S_0)$ using 5-LUTs.
Figure 5. The state transition graph of Mealy FSM $S_0$.
Step 1. The $h$-th edge of an STG is transformed into a row of an STT []. There are 19 edges in the STG (Figure 5). So, there should be $H = 19$ rows in the corresponding STT. The transformation is executed in a trivial way []. Table 1 is the resulting STT of FSM $S_0$. The following sets can be derived from Table 1: the set of states $A = \{a_1, \ldots, a_8\}$, the set of inputs $X = \{x_1, \ldots, x_{10}\}$, and the set of outputs $Y = \{y_1, \ldots, y_7\}$. This gives the following parameters: $M = 8$, $L = 10$, and $N = 7$.
Table 1. State transition table of Mealy FSM $S_0$.
Step 2. Using the methods [], we can obtain the partition $\Pi_C = \{A_1, A_2\}$ with $J_C = 2$. The classes of this partition are $A_1 = \{a_1, \ldots, a_4\}$ and $A_2 = \{a_5, \ldots, a_8\}$. So, $M_1 = M_2 = 4$. Using (12), we obtain the value $R_A = \max(\lceil \log_2 M_1 \rceil, \lceil \log_2 M_2 \rceil) = 2$. Using (13), we obtain the value $R_C = 1$. Now, we have the sets $T = \{T_1, T_2, T_3\}$, $T_C = \{T_1\}$, and $T_A = \{T_2, T_3\}$.
Step 3. As is known [], the state codes do not affect the number of LUTs in circuits of FSMs based on twofold or extended state codes []. So, the states can be encoded in an arbitrary way. For our example, one of the possible outcomes of the state assignment is shown in Figure 6.
Figure 6. Complex state codes of Mealy FSM $MP_CY(S_0)$.
The following class and state codes can be found from Figure 6: $K(A_1) = 0$ and $K(A_2) = 1$, $C(a_1) = C(a_5) = 00$, …, $C(a_4) = C(a_8) = 11$. Using the codes of classes of compatible states $K(A_j)$ and state codes $C(a_m)$ gives the following complex state codes: $CSC(a_1) = 000$, $CSC(a_2) = 001$, …, $CSC(a_4) = 011$, …, and $CSC(a_8) = 111$.
Step 4. To execute the replacement, we should find the minimum number of additional variables, $G$. To do this, we use the methods from []. It is necessary to analyze the sets $X(a_m) \subseteq X$ including the FSM inputs which determine the transitions from states $a_m \in A$ []. These sets can be found using either the STG (Figure 5) or the STT (Table 1). In the discussed case, there are the following sets: $X(a_1) = \{x_1, x_2\}$, $X(a_2) = \{x_3, x_4\}$, $X(a_3) = \{x_6\}$, $X(a_4) = \{x_5\}$, $X(a_5) = \{x_5, x_7\}$, $X(a_6) = \emptyset$, $X(a_7) = \{x_8, x_9\}$, and $X(a_8) = \{x_{10}\}$. If $L(a_m)$ is the number of elements in the set $X(a_m) \subseteq X$, then $L(a_1) = L(a_2) = L(a_5) = L(a_7) = 2$, $L(a_3) = L(a_4) = L(a_8) = 1$, and $L(a_6) = 0$.
The value of $G$ is equal to the maximum value of $L(a_m)$. Obviously, $G = 2$. So, $G = 2$ additional variables are enough to replace $L = 10$ inputs: $P = \{p_1, p_2\}$.
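The computation of $G$ for this example can be reproduced with a few lines. The sets $X(a_m)$ are copied from the text above; the variable names are illustrative only.

```python
# Sets X(a_m) of inputs determining the transitions from each state,
# as listed in Step 4 of the example.
X_SETS = {
    "a1": {"x1", "x2"},
    "a2": {"x3", "x4"},
    "a3": {"x6"},
    "a4": {"x5"},
    "a5": {"x5", "x7"},
    "a6": set(),
    "a7": {"x8", "x9"},
    "a8": {"x10"},
}

# L(a_m) is the size of X(a_m); G is the maximum over all states.
L_OF_STATE = {state: len(inputs) for state, inputs in X_SETS.items()}
G = max(L_OF_STATE.values())

# The union of all X(a_m) gives the number of distinct FSM inputs.
ALL_INPUTS = set().union(*X_SETS.values())

if __name__ == "__main__":
    print(L_OF_STATE)
    print("G =", G, "replaces", len(ALL_INPUTS), "inputs")
```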
The columns of the table of replacement of inputs are marked by FSM states $a_m \in A$; the rows are marked by additional variables $p_g \in P$. If an input $x_l \in X$ is replaced by a variable $p_g \in P$ in a state $a_m \in A$, then this input is written at the intersection of the corresponding column and row. Using the methods from [] gives the table of replacement (Table 2).
Table 2. Replacement of inputs for Mealy FSM $MP_CY(S_0)$.
Step 5. Table 2 is the base for finding SBF (5). The following SBF can be derived from Table 2:
$$p_1 = A_1 x_1 \vee A_2 x_3 \vee A_3 x_6 \vee A_5 x_7 \vee A_7 x_8 \vee A_8 x_{10};$$
$$p_2 = A_1 x_2 \vee A_2 x_4 \vee A_4 x_5 \vee A_5 x_5 \vee A_7 x_9. \tag{21}$$
In (21), the symbol $A_m$ stands for the conjunction of state variables corresponding to the state $a_m \in A$. Obviously, each equation of (21) can be implemented as an 8MX.
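System (21) behaves like a pair of multiplexers selected by the current state. The following sketch models it at the behavioral level; the dictionary is read off the terms of (21), and the names `REPLACEMENT` and `replace_inputs` are hypothetical.

```python
# For every state: (input replaced by p1, input replaced by p2).
# None means that the additional variable is not used in this state.
# The pairs are read off the product terms of system (21).
REPLACEMENT = {
    "a1": ("x1", "x2"),
    "a2": ("x3", "x4"),
    "a3": ("x6", None),
    "a4": (None, "x5"),
    "a5": ("x7", "x5"),
    "a6": (None, None),
    "a7": ("x8", "x9"),
    "a8": ("x10", None),
}


def replace_inputs(state, inputs):
    """Return (p1, p2) for the current state.

    inputs maps input names to 0/1; inputs that are not listed count as 0.
    """
    return tuple(
        int(bool(inputs.get(name, 0))) if name is not None else 0
        for name in REPLACEMENT[state]
    )


if __name__ == "__main__":
    print(replace_inputs("a1", {"x1": 1, "x2": 0}))
    print(replace_inputs("a5", {"x5": 1, "x7": 0}))
```

In state $a_6$, both additional variables are zero regardless of the inputs, because no product term of (21) contains $A_6$.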
Step 6. There are $Q = 9$ different collections of outputs in the STT (Table 1). They are the following: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_5\}$, $Y_4 = \{y_4\}$, $Y_5 = \{y_3, y_6\}$, $Y_6 = \{y_2, y_5\}$, $Y_7 = \{y_4, y_7\}$, $Y_8 = \{y_3\}$, $Y_9 = \{y_3, y_7\}$.
To optimize the circuit of LUTerY, it is necessary to encode the COs in a way minimizing the total number of literals in SBF (7) []. Each literal determines an interconnection between LUTerTZ and LUTerY. Using the approach from [], we can encode the COs as shown in Figure 7.
Figure 7. Outcome of encoding of COs for Mealy FSM $MP_CY(S_0)$.
Step 7. Using the contents of the COs and their codes (Figure 7) gives the following SBF:
$$y_1 = Y_2 = z_2 \bar{z}_3;$$
$$y_2 = Y_2 \vee Y_6 = \bar{z}_3 z_4;$$
$$y_3 = Y_5 \vee Y_8 \vee Y_9 = z_3 \bar{z}_4;$$
$$y_4 = Y_4 \vee Y_7 = z_1 z_4;$$
$$y_5 = Y_3 \vee Y_6 = \bar{z}_1 \bar{z}_2 z_4;$$
$$y_6 = Y_5 = \bar{z}_1 z_2 z_3;$$
$$y_7 = Y_7 \vee Y_9 = z_1 \bar{z}_2. \tag{22}$$
There are 16 literals in (22). The maximum number of literals is equal to $N \cdot R_Y = 7 \cdot 4 = 28$. So, due to the encoding shown in Figure 7, the number of literals (and interconnections) has almost halved.
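The literal count of (22) can be verified mechanically. In the sketch below, each output is a single product term stored as a mapping from a variable $z_r$ to its required value (1 for $z_r$, 0 for $\bar{z}_r$); the names `SBF_22`, `literal_count`, and `outputs_for` are hypothetical. The code $K(Y_2) = 0101$ used in the check is the one quoted in Step 8 of the example.

```python
# System (22): every output is one product term over z1..z4.
SBF_22 = {
    "y1": {"z2": 1, "z3": 0},
    "y2": {"z3": 0, "z4": 1},
    "y3": {"z3": 1, "z4": 0},
    "y4": {"z1": 1, "z4": 1},
    "y5": {"z1": 0, "z2": 0, "z4": 1},
    "y6": {"z1": 0, "z2": 1, "z3": 1},
    "y7": {"z1": 1, "z2": 0},
}


def literal_count(sbf):
    """Total number of literals over all product terms."""
    return sum(len(term) for term in sbf.values())


def outputs_for(code, sbf=SBF_22):
    """Set of outputs equal to 1 for a CO code given as a string z1z2z3z4."""
    z = {f"z{i + 1}": int(bit) for i, bit in enumerate(code)}
    return {
        y for y, term in sbf.items()
        if all(z[var] == value for var, value in term.items())
    }


if __name__ == "__main__":
    n, r_y = len(SBF_22), 4
    print(literal_count(SBF_22), "of at most", n * r_y)
    print(sorted(outputs_for("0101")))
```

Decoding 0101 activates exactly $y_1$ and $y_2$, that is, the collection $Y_2 = \{y_1, y_2\}$.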
Step 8. The DST of the $MP_CY$ Mealy FSM is constructed using the initial STT, the codes of states and COs, and the table of replacement of inputs. The DST includes the following columns: $a_m$, $K(a_m)$, $a_T$, $K(a_T)$, $P_h$, $\Phi_h$, $Z_h$, $h$. The columns of state codes include the codes from Figure 6. The column $P_h$ is constructed using the initial STT and the table of replacement of inputs (Table 2). The column $\Phi_h$ includes the IMFs equal to 1 for loading the code $K(a_T)$ into the state register. The column $Z_h$ includes the variables $z_r \in Z$ equal to 1 in the code $K(Y_q)$ of the CO written in the $h$-th row of the STT. This column is constructed using the initial STT and the codes of COs (Figure 7).
In the discussed case, the DST is represented by Table 3. Let us analyze the first row of Table 3. There is the input $x_1$ in this row of Table 1. As follows from Table 2, the input $x_1$ is replaced by the additional variable $p_1$ in the state $a_1$. For this row, $a_T = a_4$. As follows from Figure 6, $K(a_4) = 011$. Due to this, column $\Phi_h$ of Table 3 contains $D_2 = D_3 = 1$ in row $h = 1$. In row 1 of Table 1, there is the CO $Y_2$ in column $Y_h$. As follows from Figure 7, $K(Y_2) = 0101$. Due to this, column $Z_h$ of Table 3 contains $z_2 = z_4 = 1$ in row $h = 1$. All other rows of Table 3 are constructed in the same way.
Table 3. Direct structure table of Mealy FSM $MP_CY(S_0)$.
Step 9. The tables of the blocks of partial functions are constructed using the classes $A_j \in \Pi_C$, the DST of the $MP_CY$ Mealy FSM, and the codes $C(a_m)$ and $CSC(a_m)$. For the discussed example, $J_C = 2$. So, there are two blocks (LUTer1 and LUTer2) generating the partial functions (16) and (17). The transitions from the states of the class $A_1 \in \Pi_C$ are represented by Table 4, and those of the class $A_2 \in \Pi_C$ by Table 5.
Table 4. Table of LUTer1 of Mealy FSM $MP_CY(S_0)$.
Table 5. Table of LUTer2 of Mealy FSM $MP_CY(S_0)$.
There is a transparent correspondence between Table 3, on the one hand, and the tables of LUTer1 (Table 4) and LUTer2 (Table 5), on the other hand. There are $H_1 = 10$ rows in Table 4 and $H_2 = 9$ rows in Table 5. Obviously, the following equality holds: $H_1 + H_2 = H = 19$.
Step 10. The following sets can be found from Table 4 and Table 5: $P_1 = P_2 = P$, $\Phi_1 = \Phi_2 = \Phi$, and $Z_1 = Z_2 = Z$. It means that each LUTer$j$ contains $R_B + R_Y = 7$ 5-LUTs. Together, this gives 14 5-LUTs in the joint circuit of LUTer1 and LUTer2.
The functions (16) and (17) are constructed in a trivial way. For example, the following SBF of partial functions $D_1^1$, $D_1^2$, $z_1^1$, and $z_1^2$ can be derived from Table 4 and Table 5:
$$D_1^1 = \bar{T}_2 T_3 p_1 \vee \bar{T}_2 T_3 \bar{p}_1 p_2 \vee T_2 \bar{T}_3 p_1; \quad D_1^2 = \bar{T}_2 \bar{T}_3 \bar{p}_2 \vee \bar{T}_2 T_3 \vee T_2 \bar{T}_3 \vee T_2 T_3 p_1.$$
$$z_1^1 = \bar{T}_2 \bar{T}_3 \bar{p}_1 \bar{p}_2 \vee T_2 \bar{T}_3 p_1; \quad z_1^2 = \bar{T}_2 \bar{T}_3 \bar{p}_2 \vee T_2 T_3 p_2.$$
All other partial functions are created in the same manner.
Step 11. The table of LUTerTZ is constructed using the sets $\Phi_j$ and $Z_j$, where $j \in \{1, \ldots, J_C\}$. The table contains the columns "Function" and "j". For our example, this block is represented by Table 6.
Table 6. Table of LUTerTZ of Mealy FSM $MP_CY(S_0)$.
For example, the IMF $D_1$ appears in both tables (Table 4 and Table 5). Due to this, there are ones in the columns with $j = 1$ and $j = 2$. All other rows are filled using a similar analysis.
Step 12. The SBFs (18) and (19) representing the third level of the logic circuit are constructed in a trivial way. They include two components: (1) conjunctions of variables $T_r \in T_C$ corresponding to class codes and (2) the corresponding partial functions. For example, the functions $D_1$ and $z_1$ are represented by the following SBF:
$$D_1 = \bar{T}_1 D_1^1 \vee T_1 D_1^2; \quad z_1 = \bar{T}_1 z_1^1 \vee T_1 z_1^2.$$
All other functions (18) and (19) are constructed in the same manner.
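The third-level functions are multiplexers controlled by the class code. The sketch below compares a generic selection by class code with the sum-of-products form of $D_1$ for $J_C = 2$, assuming the class codes $K(A_1) = 0$ and $K(A_2) = 1$ from the example; the function names are hypothetical.

```python
from itertools import product


def select_partial(class_bits, partials):
    """Generic form of (18)/(19): the class code selects one partial function.

    class_bits: values of the class variables, most significant bit first.
    partials: partial function values ordered by class code.
    """
    index = 0
    for bit in class_bits:
        index = (index << 1) | bit
    return partials[index]


def d1_sum_of_products(t1, d1_class1, d1_class2):
    """D1 = (not T1) D1^1 or T1 D1^2 for two classes."""
    return ((1 - t1) & d1_class1) | (t1 & d1_class2)


if __name__ == "__main__":
    # Exhaustive comparison of both forms over all input combinations.
    for t1, a, b in product((0, 1), repeat=3):
        assert select_partial([t1], [a, b]) == d1_sum_of_products(t1, a, b)
    print("multiplexer and sum-of-products forms agree")
```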
Step 13. To implement the LUT-based circuit of Mealy FSM $MP_CY(S_0)$, it is necessary to use some CAD tools. In the case of Virtex-7 FPGAs, the system Vivado [] should be used; for our simple example, we can design this circuit manually.
As follows from (21), there are nine literals in the sum-of-products of $p_1$ and eight literals in the sum-of-products of $p_2$. The circuit should be implemented using LUTs with $I_L = 5$ inputs. So, the condition (4) holds. To implement the circuit of LUTerP, it is necessary to apply the methods of FD [,]. As a result, we obtain a two-level circuit of LUTerP including six LUTs.
In the discussed case, each function $f_i \in Z \cup \Phi$ is represented by $J_C = 2$ partial functions. Further, the condition (11) holds for the blocks of $\Pi_C$. Due to this, $2(R_B + R_Y) = 14$ 5-LUTs are enough for implementing the circuits of LUTer1 and LUTer2. Since $R_B + R_Y = 7$, there are seven LUTs in the circuit of LUTerTZ. As follows from (22), there are seven LUTs in the circuit of LUTerY.
So, there are 34 5-LUTs in the circuit of Mealy FSM $MP_CY(S_0)$. This circuit has five levels of LUTs (Figure 8).
Figure 8. Logic circuit of Mealy FSM $MP_CY(S_0)$.
In this circuit, LUTerP is represented by LUT1–LUT6. This block has two levels of LUTs, as shown in Figure 8. The BusXT delivers the inputs $x_l \in X$ and state variables $T_r \in T_A$ for generating the additional variables $p_g \in P$. These variables enter BusPT to be transformed into the partial functions (16) and (17). The transformation is executed by LUTer1 and LUTer2. These blocks include the elements LUT7–LUT20. The fourth level of the FSM circuit is represented by LUT21–LUT27. The IMFs are generated by LUT21–LUT23. The outputs of these LUTs are connected with the flip-flops implementing the register RG. The flip-flops are controlled by the pulses Start and Clock. The variables $z_r \in Z$ are generated by LUT24–LUT27. The outputs of LUTerTZ form the BusTZ. Finally, level five consists of seven LUTs (LUT28–LUT34) creating the circuit of LUTerY.
We compared the characteristics of the 5-LUT-based circuits of $MP_CY(S_0)$ and $MPY(S_0)$ FSMs. In both cases, there is the same number of flip-flops in the state register ($R = R_B = 3$). In the case of $MPY(S_0)$, there are six LUTs in LUTerP and seven LUTs in LUTerY. There are two levels of LUTs in the circuit of LUTerP. So, these subcircuits are the same for $MP_CY(S_0)$ and $MPY(S_0)$. There are two levels of LUTs in the circuit of LUTerTZ of $MPY(S_0)$. This block’s circuit includes 24 LUTs. So, there are $6 + 24 + 7 = 37$ LUTs in the circuit of $MPY(S_0)$.
Thus, for the FSM $S_0$, the transition from model $MPY(S_0)$ to model $MP_CY(S_0)$ reduces the LUT count by a factor of 1.088. Note that both circuits have the same number of logic levels; therefore, the model proposed in this article reduces the number of LUTs without reducing the operating frequency compared to the circuit of the equivalent $MPY(S_0)$ FSM. In the next Section, we compare some FSM models with the one proposed in this article.
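The LUT counts of this example can be recomputed from the block parameters. The per-block figures below are taken from the text (six LUTs in LUTerP, seven in LUTerY, 24 in the LUTerTZ of the $MPY$ model); the variable names are illustrative.

```python
# Parameters of the example FSM S0.
R_B, R_Y, J_C = 3, 4, 2

# Four-block model with complex state codes.
luts_p = 6                         # LUTerP, two levels (from the text)
luts_partial = J_C * (R_B + R_Y)   # LUTer1 and LUTer2
luts_mux = R_B + R_Y               # LUTerTZ, one multiplexer per function
luts_y = 7                         # LUTerY, one LUT per output (from the text)
total_with_csc = luts_p + luts_partial + luts_mux + luts_y

# Three-block model: LUTerP, LUTerTZ with 24 LUTs, LUTerY (from the text).
total_three_block = 6 + 24 + 7

gain = total_three_block / total_with_csc

if __name__ == "__main__":
    print(total_with_csc, total_three_block, round(gain, 3))
```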

6. Experimental Results

In this Section, we show the results of experiments which have been conducted to compare the characteristics of $MP_CY$ Mealy FSMs with the characteristics of FSM circuits based on some other models. To conduct the experiments, we use: (1) the internal resources of Virtex-7; (2) the benchmark FSMs from the library []; (3) the industrial package Vivado []. The library [] includes 48 benchmarks represented in the KISS2 format. The benchmarks have a wide range of basic characteristics (numbers of states, inputs, and outputs). They are used very often by different researchers to compare area and time characteristics of FSMs obtained using various synthesis methods. The characteristics of the benchmarks are shown in Table 7.
Table 7. Characteristics of benchmark Mealy FSMs [].
We executed the experiments using a personal computer with the following characteristics: CPU: Intel Core i7 6700 K 4.2@4.4 GHz, Memory: 32 GB RAM 2400 MHz CL15. Further, we used the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [] and CAD tool Vivado v2019.1 (64-bit) []. The LUTs of Virtex-7 FPGAs have $I_L = 6$ inputs. To obtain the results of experiments, the reports produced by Vivado are used. To enter Vivado, we use the CAD tool K2F [].
We compared three basic characteristics of the resulting FSM circuits. These parameters are: (1) the LUT count; (2) the cycle time; (3) the power consumption. In addition, two integral characteristics were investigated, namely: (1) the area-time products and (2) the area-time-power products. To conduct the experiments, five FSM models were used. They are: (1) Auto of Vivado (it uses binary state codes); (2) one-hot of Vivado; (3) JEDI; (4) MPY-based FSMs; (5) MPCY-based FSMs proposed in this article. Obviously, the first three methods are based on the model of P FSM shown in Figure 2.
Based on the methodology [], we divide the benchmark FSMs [] into five categories. To divide the benchmarks, we use the relation between the values of $R + L$ and $I_L$. Since $I_L = 6$ for LUTs of Virtex-7, we use this value to assign the benchmarks to the categories.
The benchmarks belong to the category of trivial FSMs (category 0) if the following condition holds: $R + L \leq 6$. This category includes the following 11 benchmarks: bbtas, dk17, dk27, dk512, ex3, ex5, lion, lion9, mc, modulo12, and shiftreg. The benchmarks belong to the category of simple FSMs (category 1) if $R + L \leq 12$. Category 1 consists of the benchmarks bbara, bbsse, beecount, cse, dk14, dk15, dk16, donfile, ex2, ex4, ex6, ex7, keyb, mark1, opus, s27, s386, s840, and sse. The benchmarks belong to the category of average FSMs (category 2) if $R + L \leq 18$. Category 2 contains the benchmarks ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, styr, and tma. The benchmarks belong to the category of big FSMs (category 3) if the following condition holds: $R + L \leq 24$. Category 3 includes only the benchmark sand. The category of very big FSMs (category 4) includes benchmarks satisfying the relation $R + L > 24$. The benchmarks s420, s510, s820, and s832 belong to this category.
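The category rule above can be sketched as a small function. This is a minimal Python illustration, not the authors' tooling: the function name and arguments are hypothetical, $R$ is taken to be the number of state variables and $L$ the number of FSM inputs, and writing the thresholds 6, 12, 18, 24 as multiples of $I_L$ is an assumption that matches the text only for $I_L = 6$.

```python
def fsm_category(num_state_vars, num_inputs, lut_inputs=6):
    """Return the category (0..4) of an FSM from R + L and I_L.

    num_state_vars -- R, number of state variables (assumed meaning)
    num_inputs     -- L, number of FSM inputs (assumed meaning)
    lut_inputs     -- I_L, number of LUT inputs (6 for Virtex-7)
    """
    total = num_state_vars + num_inputs  # R + L
    # Categories 0..3: smallest k such that R + L <= (k + 1) * I_L.
    for category in range(4):
        if total <= (category + 1) * lut_inputs:
            return category
    # Category 4: R + L > 4 * I_L (i.e., > 24 for I_L = 6).
    return 4


# Hypothetical (R, L) pairs, not taken from Table 7.
for r, l in [(3, 2), (4, 7), (6, 11), (5, 19), (5, 20)]:
    print(r, l, fsm_category(r, l))
```

Each benchmark falls into exactly one category, because the thresholds are tested in ascending order and the first satisfied condition wins.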
The results of experiments are shown in Table 8, Table 9, Table 10, Table 11 and Table 12. All these tables are organized in the same way. The investigated methods are listed in the table columns. The table rows contain the names of benchmarks. Inside each table, the benchmarks are sorted by ascending category number and, within each category, listed in alphabetical order. The row “Total” contains the sum of values for each column. The row “Percentage” gives the totals for FSM circuits produced by the other methods as a percentage of the totals for MPCY-based FSMs. We use the model of P Mealy FSM as a starting point for methods Auto, one-hot, and JEDI. The basic data (the LUT count, time, and power consumption) are taken from reports of Vivado. These data were then used to obtain the integral characteristics.
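The “Total” and “Percentage” rows described above can be illustrated with a short Python sketch. The function names are hypothetical and the totals are placeholders, not values from Table 8. Reading the gain as the “Percentage” value minus 100 is an assumption, although it is consistent with gains above 100% being reported for the integral characteristics.

```python
from typing import Dict


def percentage_row(totals: Dict[str, float], reference: str = "MPCY") -> Dict[str, float]:
    """Express each column total as a percentage of the reference method's total."""
    ref_total = totals[reference]
    return {method: round(100.0 * total / ref_total, 2)
            for method, total in totals.items()}


def gain_percent(percentage: float) -> float:
    """Gain of the reference method over another method (assumed: percentage - 100)."""
    return round(percentage - 100.0, 2)


# Placeholder column totals for illustration only (NOT data from the article).
placeholder_totals = {"Auto": 150.0, "one-hot": 180.0, "JEDI": 125.0,
                      "MPY": 110.0, "MPCY": 100.0}

row = percentage_row(placeholder_totals)
for method, pct in row.items():
    print(method, pct, gain_percent(pct))
```

Under this reading, a negative gain for some category corresponds to a loss of the MPCY-based circuits relative to the other method.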
Table 8. Experimental results (LUT counts).
Table 9. Experimental results (the latency time for all benchmarks, nsec).
Table 10. Experimental results (Total On-Chip Power, Watts).
Table 11. Experimental results (area-time products).
Table 12. Experimental results (area-time-power products).
Let us analyze the experimental data taken from reports of Vivado. These tables contain the following data: (1) the LUT counts (Table 8); (2) the minimum time of cycle (Table 9); (3) the total On-Chip Power (Table 10); (4) the area-time products (Table 11); (5) the area-time-power products (Table 12). In addition, we compared each of the characteristics for each category; however, in order to avoid a significant increase in the size of the article, we did not show the corresponding tables. We just showed the results of these comparisons.
As follows from Table 8, the MPCY-based FSMs require fewer LUTs than the other investigated counterparts. Using the proposed approach, we can obtain circuits having 52.19% fewer 6-LUTs than equivalent Auto-based FSMs; 77.1% fewer 6-LUTs than equivalent one-hot-based FSMs; 25.34% fewer 6-LUTs than equivalent JEDI-based FSMs. Our approach produces circuits having on average 11.36% fewer 6-LUTs than the circuits of MPY FSMs.
Using Table 8, we can compare LUT counts for different categories of benchmark FSMs. Comparing the results for category 0 shows that both multi-level approaches (MPY and MPCY) lose out to the other methods. This loss is 30.4% compared to auto-based FSMs, 3.4% compared to one-hot-based FSMs, and 31.5% compared to JEDI-based FSMs. We explain this by the fact that condition (4) is not satisfied for benchmark FSMs of category 0. This means that only a single LUT is needed to implement any function for systems (1) and (2). Obviously, for category 0, the replacement of inputs should not be performed for both MPY and MPCY FSMs; however, the encoding of output collections is always performed for these multi-level FSMs. Due to this, for category 0, the multi-level FSMs have higher LUT counts than the other investigated design methods. Let us point out that equivalent MPY and MPCY FSMs have the same LUT counts for this category.
Starting from category 1, the condition (4) is met. In this case, it makes sense to use structural reduction methods instead of methods of functional decomposition. For this category, using the complex state codes in MPY FSMs allows obtaining FSM circuits with fewer LUTs than the other methods used in our experiments. This gain is 40.0% compared to auto-based FSMs, 81.1% compared to one-hot-based FSMs, 16.2% compared to JEDI-based FSMs, and 11.0% compared to MPY FSMs.
As follows from this part of the research, the gain grows as the category number increases. The gain in LUTs increases up to 65.64% (for categories 2–4) compared to auto-based FSMs. The gain increases up to 65.64% (for categories 2–4) compared to one-hot-based FSMs. Comparison with JEDI-based FSMs shows that the gain increases up to 34.86% (for categories 2–4). Finally, compared to MPY-based FSMs, the gain increases from 8.44% (for categories 0–1) to 12.73% (for categories 2–4).
As follows from Table 9, the MPCY-based FSMs are faster than their investigated counterparts. They require a cycle time 9.39% less than the equivalent auto-based FSMs, 10.24% less than one-hot-based FSMs, and 1.08% less than the equivalent JEDI-based FSMs. They also marginally benefit (0.31%) in relation to MPY FSMs. It follows from this that our approach allows reducing the number of LUTs without losing performance. As we have already noted, this is the greatest challenge associated with the optimization of chip area occupied by an FSM circuit. So, our approach allows overcoming this obstacle.
Using Table 9, we have compared time characteristics for different categories of benchmark FSMs. Comparing the results for category 0 shows that MPCY-based FSMs lose out to the other methods. This loss is 3.23% compared to auto-based FSMs, 0.2% compared to one-hot-based FSMs, 3.68% compared to JEDI-based FSMs, and 1.13% compared to MPY-based FSMs. As with the LUT counts, we explain this by the fact that condition (4) is not satisfied for benchmarks of this category. Starting from category 1, the condition (4) is met. This allows obtaining some gain compared to FSMs based on both Auto (3.48%) and one-hot (3.37%); however, other models provide better performance than our approach (3.84% compared to JEDI and 1.93% compared to MPY).
Starting from category 2, our approach gives better results compared to all other investigated methods. The gain for category 2 is the following: (1) 24.17% compared to Auto; (2) 25.96% compared to one-hot; (3) 8.88% compared to JEDI; (4) 3.78% compared to MPY FSMs. For category 3, the gain increases. It is the following: (1) 31.49% compared to Auto; (2) 31.49% compared to one-hot; (3) 20.24% compared to JEDI; (4) 4.76% compared to MPY FSMs. Further, there is a gain for category 4; however, the gain is less than for category 3. It is the following: (1) 18.73% compared to Auto; (2) 16.48% compared to one-hot; (3) 7.96% compared to JEDI; (4) 3.07% compared to MPY FSMs. We explain this decrease in gain by an increase in the number of levels in the circuits of MPCY-based FSMs compared to their number for category 3; however, the following conclusion can be drawn: the proposed approach allows obtaining faster LUT-based FSM circuits starting from category 2.
Vivado provides information about the total on-chip power. We combine these reports in Table 10. As follows from Table 10, the MPCY-based FSMs consume less power than their investigated counterparts. On average, they provide the following gain in power consumption: (1) 47.02% compared to auto-based FSMs; (2) 59.17% compared to one-hot-based FSMs; (3) 23.96% compared to JEDI-based FSMs; (4) 5.44% compared to MPY Mealy FSMs.
Using Table 10, we have compared total on-chip power for each category. Comparing the results for category 0 shows that MPCY-based FSMs lose out to the other methods. This loss is 19.37% compared to auto-based FSMs, 17.6% compared to one-hot-based FSMs and 21.08% compared to JEDI-based FSMs. The same data are correct for MPY FSMs; however, starting from category 1, our approach allows designing circuits consuming less power. The gain grows as the category number grows. With respect to auto-based FSMs, our method provides the following gain: (1) 33.95% for category 1; (2) 85.68% for category 2; (3) 106.28% for category 3; (4) 124.46% for category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 47.22% for category 1; (2) 98.26% for category 2; (3) 106.28% for category 3; (4) 163.44% for category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 19.69% for category 1; (2) 43.98% for category 2; (3) 77.38% for category 3; (4) 80.97% for category 4. Further, there is the following gain compared to MPY-based FSMs: (1) 5.20% for category 1; (2) 7.58% for category 2; (3) 10.77% for category 3; (4) 12.36% for category 4. So, the proposed organization of the FSM circuit allows reducing the power consumption, starting with simple FSMs (category 1).
Using data from Table 8, Table 9 and Table 10, we can calculate the values for two integral characteristics. One of them is an area-time product [,], the second is an area-time-power product. The smaller the values of these products, the better the quality of the resulting FSM circuit []. As it is the case in many articles [,], we estimate the area of an FSM circuit by its LUT count.
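The two integral characteristics can be written out explicitly. The Python sketch below assumes, as stated above, that the area is estimated by the LUT count; the function names and the sample values are hypothetical placeholders and are not taken from Tables 8–10.

```python
def area_time(lut_count: int, cycle_time_ns: float) -> float:
    """Area-time product, with area estimated by the LUT count (LUTs x ns)."""
    return lut_count * cycle_time_ns


def area_time_power(lut_count: int, cycle_time_ns: float, power_w: float) -> float:
    """Area-time-power product (LUTs x ns x W)."""
    return area_time(lut_count, cycle_time_ns) * power_w


# Placeholder characteristics of two hypothetical circuits (NOT measured data).
circuit_a = {"luts": 34, "cycle_ns": 5.0, "power_w": 0.5}
circuit_b = {"luts": 37, "cycle_ns": 5.0, "power_w": 0.5}

at_a = area_time(circuit_a["luts"], circuit_a["cycle_ns"])
atp_a = area_time_power(circuit_a["luts"], circuit_a["cycle_ns"], circuit_a["power_w"])
at_b = area_time(circuit_b["luts"], circuit_b["cycle_ns"])

# Smaller products indicate a better circuit.
print(at_a, atp_a, at_a < at_b)
```

Because both products are multiplicative, a gain in any single basic characteristic carries over to the integral ones when the other factors are unchanged.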
As follows from Table 11, the MPCY-based FSMs have better area-time characteristics than their investigated counterparts. On average, they provide the following gain: (1) 84.13% compared to auto-based FSMs; (2) 113.34% compared to one-hot-based FSMs; (3) 33.41% compared to JEDI-based FSMs; (4) 13.53% compared to MPY Mealy FSMs. Using Table 11, we have compared area-time characteristics for each category of benchmark FSMs. As in the previous cases, for category 0 our approach gives the worst results; however, starting from category 1, the benefits of our approach are steadily increasing.
Comparing the results for category 0 shows that MPCY-based FSMs lose out to the other methods. This loss is 31.8% compared to auto-based FSMs, 2.88% compared to one-hot-based FSMs, 33.33% compared to JEDI-based FSMs, and 1.1% compared to MPY FSMs; however, starting from category 1, our approach allows designing circuits having smaller values of area-time products than all other approaches. With respect to auto-based FSMs, our method provides the following gain: (1) 46.49% for category 1; (2) 107.13% for category 2; (3) 90.73% for category 3; (4) 141.9% for category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 88.39% for category 1; (2) 139.36% for category 2; (3) 90.73% for category 3; (4) 148.74% for category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 11.42% for category 1; (2) 45.42% for category 2; (3) 50.63% for category 3; (4) 61.03% for category 4. Further, there is the following gain compared to MPY-based FSMs: (1) 8.74% for category 1; (2) 16.92% for category 2; (3) 13.88% for category 3; (4) 18.75% for category 4. So, the proposed organization of the FSM circuit allows reducing the values of area-time products, starting with simple FSMs (category 1).
As follows from Table 12, the MPCY-based FSMs have much smaller values of area-time-power products than their investigated counterparts. On average, they provide the following gain: (1) 254.69% compared to auto-based FSMs; (2) 325.06% compared to one-hot-based FSMs; (3) 96.36% compared to JEDI-based FSMs; (4) 22.75% compared to MPY Mealy FSMs. Using Table 12, we have compared area-time-power products for each category of benchmark FSMs. As in the previous cases, for category 0 our approach gives the worst results; however, starting from category 1, the benefits of our approach are steadily increasing.
Comparing the results for category 0 shows that MPCY-based FSMs lose out to the other methods. This loss is 45.51% compared to auto-based FSMs, 10.12% compared to one-hot-based FSMs, 48.69% compared to JEDI-based FSMs, and 1.13% compared to MPY FSMs; however, starting from category 1, our approach allows designing circuits having smaller values of area-time-power products than all other approaches. With respect to auto-based FSMs, our method provides the following gain: (1) 104.39% for category 1; (2) 301.21% for category 2; (3) 293.45% for category 3; (4) 502.86% for category 4. With respect to one-hot-based FSMs, our method provides the following gain: (1) 194.34% for category 1; (2) 376.56% for category 2; (3) 293.45% for category 3; (4) 528.13% for category 4. With respect to JEDI-based FSMs, the proposed method provides the following gain: (1) 36.51% for category 1; (2) 112.16% for category 2; (3) 167.19% for category 3; (4) 213.43% for category 4. Further, there is the following gain compared to MPY-based FSMs: (1) 15.65% for category 1; (2) 25.71% for category 2; (3) 26.14% for category 3; (4) 33.91% for category 4. So, the proposed organization of the FSM circuit allows reducing the values of area-time-power products, starting with simple FSMs (category 1).
The main goal of the proposed approach is reducing LUT counts in FPGA-based circuits of Mealy FSMs. The results of experiments (Table 8) show that this goal has been achieved. Obviously, this gain is achieved by using complex state codes in MPY FSMs. Using these codes leads to introducing an additional level of LUTs forming the partial functions. It was natural to expect that the introduction of this additional level would lead to a decrease in performance; however, as follows from Table 9, our approach leads to slower FSM circuits only for FSMs from categories 0–1. As the complexity of FSMs increases, our approach begins to give a gain in terms of minimum cycle time. Moreover, the proposed approach allows reducing the power consumption of resulting FSM circuits (starting from category 1). The same is true for the integral characteristics of FSM circuits (the area-time and area-time-power products). These phenomena are positive side effects associated with our approach.
So, the results of our experiments show that the proposed approach can be used instead of other models starting from the simple FSMs (category 1). Our approach allows improving LUT counts starting from the simple FSMs. The same is true for the power consumption. Further, starting from category 2, the proposed method allows improving the minimum cycle time compared with other investigated methods. In our research, we use the chip xc7vx690tffg1761-2 by Virtex-7 (Xilinx); however, the CLB architecture of this chip is not unique: the same CLB architecture is used in all chips of the 7th generation of Xilinx chips. Due to this, the results of our experiments show that the proposed approach can be used for improving LUT counts for designs based on any FPGA chip of the 7th generation. Moreover, all Xilinx FPGA families have one fundamental property in common: an extremely limited number of LUT inputs. This leads to the need to develop FSM synthesis methods aimed at reducing the influence of this factor on the characteristics of the LUT-based FSM circuits. The results of our research show that the proposed method allows solving this problem better than some well-known methods combining various approaches of state assignment (auto, one-hot, and JEDI) and functional decomposition, as well as our previous method based on structural decomposition (MPY FSMs).

7. Conclusions

Modern FPGA chips include more than 7.5 billion transistors []. They have proved to be a very effective means of implementing a variety of digital systems. There is a serious drawback inherent in FPGAs, namely, a rather small number of LUT inputs. This leads to the need to use various methods of functional decomposition in the design of LUT-based FSM circuits. As a result, the LUT-based circuits of rather complex FSMs are presented in the form of multi-level networks with a complex system of spaghetti-type interconnections []. This disadvantage can be overcome by applying methods of structural decomposition [,]. This leads to FSM circuits with a predictable number of levels and regular systems of interconnections [].
Our research [,] shows that SD-based Mealy FSM circuits have better characteristics than their FD-based counterparts. As a rule, the combined use of methods of SD allows obtaining a greater gain in the number of LUTs than from the use of each of these methods separately []. In [], we proposed to use two methods of SD, namely, the replacement of inputs and encoding of collections of outputs; however, even in this case, some parts of the resulting FSM circuits can have more than a single level of LUTs. In this article, we discuss just such a case.
To diminish the number of LUTs and their levels, we use the ideas from []. These methods are based on finding a partition of states by classes of compatible states; however, in contrast to the known methods [], we have replaced known extended state codes by the complex state codes, which have not been known before. The complex state codes are represented by concatenations of class codes and the codes of FSM states as elements of these classes. This approach leads to four-level FSM circuits, which require fewer LUTs than their counterparts based on methods []. There is a gain in the LUT count around 11.36% relative to three-level MPY FSM circuits []. Moreover, our approach provides a very small increase in the FSM performance (on average, only 0.31%) and a decrease in the power consumption (on average by 5.79%) for the benchmarks from the library []. In our opinion, the proposed method can be applied instead of LUT-based MPY Mealy FSMs.

Author Contributions

Conceptualization, A.B., L.T., and K.K.; methodology, A.B., L.T., K.K., and S.S.; software, A.B., L.T., and K.K.; validation, A.B., L.T., and K.K.; formal analysis, A.B., L.T., K.K., and S.S.; investigation, A.B., L.T., and K.K.; writing—original draft preparation, A.B., L.T., K.K., and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB: configurable logic block
CO: collection of outputs
CSC: complex state codes
DST: direct structure table
ESC: extended state code
FD: functional decomposition
FPGA: field-programmable gate array
FSM: finite state machine
IMF: input memory function
LUT: look-up table
RG: state code register
SBF: systems of Boolean functions
SD: structural decomposition
STG: state transition graph
STT: state transition table

References

  1. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
  2. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174.
  3. Kubica, M.; Kania, D. Technology Mapping of FSM Oriented to LUT-Based FPGA. Appl. Sci. 2020, 10, 3926.
  4. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices. In Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013; Volume 231.
  5. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of fsms based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  6. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
  7. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  8. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
  9. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63.
  10. Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 18 December 2021).
  11. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  12. Kubica, M.; Opara, A.; Kania, D. Technology mapping for LUT-based FSMs. In Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2021; Volume 713, p. 216.
  13. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
  14. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
  15. Sklarova, D.; Sklarov, V.A.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  16. Kuon, I.; Tessier, R.; Rose, J. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
  17. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristics of LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759.
  18. Kubica, M.; Kania, D. Decomposition of multi-level functions oriented to configurability of logic blocks. Bull. Pol. Acad. Sci. 2017, 67, 317–331.
  19. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
  20. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources. Application Note. 2012. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.5300&rep=rep1&type=pdf (accessed on 17 March 2022).
  21. Sasao, T.; Mishchenko, A. LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations. Proc. IWLS. 2009, pp. 310–316. Available online: http://www.lsi-cad.com/sasao/Papers/files/IWLS2009_sasao_mis.pdf (accessed on 17 March 2022).
  22. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27.
  23. Kim, J.H.; Anderson, J. Post-LUT-Mapping Implementation of General Logic on Carry Chains Via a MIG-Based Circuit Representation. In Proceedings of the 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), Dresden, Germany, 30 August–3 September 2021; pp. 334–340.
  24. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
  25. Das, N. Reset: A Reconfigurable state encoding technique for FSM to achieve security and hardware optimality. Microprocess. Microsyst. 2020, 77, 103196.
  26. Tao, Y.; Zhang, Y.; Wang, Q.; Cao, J. MPGA: An evolutionary state assignment for dynamic and leakage power reduction in FSM synthesis. IET Comput. Digit. Tech. 2018, 12, 111–120.
  27. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285.
  28. Sentovich, E.M.; Singh, K.J.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkeley, CA, USA, 1992.
  29. ABC System. Available online: https://people.eecs.berkeley.edu/~alanmi/abc/ (accessed on 18 December 2021).
  30. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification (Berlin, Heidelberg, 2010); Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40.
  31. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 18 December 2021).
  32. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 18 December 2021).
  33. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011.
  34. Sklyarov, V. Synthesis and implementation of RAM-based finite state machines in FPGAs. In International Workshop on Field Programmable Logic and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 718–727.
  35. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs. Electronics 2020, 9, 1859.
  36. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
  37. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019.
  38. Islam, M.M.; Hossain, M.S.; Hasan, M.K.; Shahjalal, M.; Jang, Y.M. FPGA implementation of high-speed area-efficient processor for elliptic curve point multiplication over prime field. IEEE Access 2019, 7, 178811–178826.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
