Article

Improving the Spatial Characteristics of Three-Level LUT-Based Mealy FSM Circuits

by Alexander Barkalov 1,2,*,†, Larysa Titarenko 1,3,†, Małgorzata Mazurkiewicz 4,*,† and Kazimierz Krzywicki 5,†

1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University, 600-richya Str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Institute of Control & Computation Engineering, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
5 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2023, 12(5), 1133; https://doi.org/10.3390/electronics12051133
Submission received: 5 February 2023 / Revised: 23 February 2023 / Accepted: 24 February 2023 / Published: 26 February 2023
(This article belongs to the Special Issue Feature Papers in Circuit and Signal Processing)

Abstract

The main purpose of the method proposed in this article is to reduce the number of look-up-table (LUT) elements in logic circuits of sequential devices. The devices are represented by models of Mealy finite state machines (FSMs). These are so-called MPY FSMs based on two methods of structural decomposition (the replacement of inputs and encoding of output collections). The main idea is to use two types of state codes for implementing systems of partial Boolean functions. Some functions are based on maximum binary codes; other functions depend on extended state codes. The reduction in LUT counts is based on using the method of twofold state assignment. The proposed method makes it possible to obtain FPGA-based FSM circuits with four logic levels. Only one LUT is required to implement the circuit corresponding to any partial function. An example of FSM synthesis using the proposed method is shown. The results of the conducted experiments show that the proposed approach produces LUT-based FSM circuits with better area-temporal characteristics than circuits produced using such methods as Auto and One-hot of Vivado, JEDI, and MPY FSMs. Compared to MPY FSMs, the values of LUT counts are improved. On average, this improvement is 8.98%, but the gain reaches 13.65% for fairly complex FSMs. The maximum operating frequency is slightly improved compared with the circuits of MPY FSMs (up to 0.64%). For both LUT counts and frequency, the gain increases together with the growth in the numbers of FSM inputs, outputs and states.

1. Introduction

To represent various sequential blocks, a model of a Mealy finite state machine (FSM) [1] can be applied. There are many examples of using this model in the implementation of various digital systems [2]. In this paper, we consider FSM circuits implemented using field-programmable gate arrays (FPGAs) [3,4]. This choice is due to the wide use of FPGAs in the implementation of a wide variety of projects [4,5]. Leading experts are confident that FPGAs will continue to dominate logic design for at least the next twenty years [6].
When using any logic basis for the implementation of FSM circuits, a number of optimization problems always arise [7,8]. One of the most important tasks is to obtain a circuit that is optimal in terms of hardware costs. By optimal, we mean a circuit that consumes the minimum possible amount of chip resources while simultaneously providing the required level of performance and power consumption. In the case of FPGA-based circuits [9], the optimization strategy significantly depends on the types of configurable logic blocks (CLBs) used [10]. In this paper, we discuss the most common CLBs which include look-up table (LUT) elements, programmable flip-flops, and dedicated multiplexers [10,11]. To combine these CLBs into an FSM circuit, the following chip resources are used: the synchronization tree, programmable interconnections, and programmable input-outputs [12,13]. The method proposed in this paper is aimed at reducing the number of LUTs (LUT count) in a resulting FSM circuit.
It is generally accepted that reducing LUT count leads to improving the spatial characteristics of FSM circuits (reducing the occupied chip areas) [14,15]. Area reduction can be achieved by applying structural decomposition (SD) methods [9] leading to multi-level FSM circuits. However, such a reduction may have an overhead [9]. This overhead consists of a significant performance degradation compared to equivalent single-level FSM circuits [14,16]. However, performance has to be sacrificed if the criterion of design optimality is the minimum occupied chip area.
The best LUT counts can be obtained for three-level FSM circuits when the methods of replacing FSM inputs and encoding collections of FSM outputs [17] are used together. However, for sufficiently complex FSMs, some of the logic blocks (or even all three blocks) may have a multilevel structure. This leads to an increase in the number of logical levels and interconnections. In turn, this leads to an increase in the occupied area, power consumption and delay time of the FSM circuit. In this paper, we propose a method to reduce the LUT counts of three-level FSM circuits. The proposed method is based on using twofold state assignment [18]. This approach leads to a decrease in the number of LUTs and their levels in the resulting LUT-based FSM circuits.
Several leading companies produce FPGA chips. The largest producer is AMD Xilinx [19]. As follows from [4], FPGAs from AMD Xilinx are widely used in various projects. For this reason, we structured our approach according to the FPGA families [19] by AMD Xilinx. In our research, we use FPGAs from the Virtex-7 family [10].
The article contains several new scientific results. Firstly, a new architecture of an LUT-based Mealy FSM circuit is proposed. Secondly, methods for the uniform distribution of inputs and state encoding are proposed, which make it possible to reduce the number of LUTs in the circuit of the input replacement block in comparison with the known methods for implementing this block. Thirdly, a new method for stabilizing FSM outputs is proposed, in which the input register is replaced by a register of output collection codes. The noted new approaches led to the main contribution of the article, which is a novel design method aimed at hardware reduction in the multilevel circuits of LUT-based Mealy FSMs. The hardware reduction is achieved due to the use of two types of state codes. The maximum binary state codes are used to replace the FSM inputs. Other partial Boolean functions depend on extended state codes. The proposed approach leads to four-level FSM circuits where any partial function is represented by a single LUT. The conducted experiments show that the resulting FSM circuits include fewer LUTs compared to equivalent three-level circuits [17]. It is very important that the hardware reduction does not lead to the significant deterioration of temporal characteristics.
The rest of the paper is organized as follows. Section 2 shows the peculiarities of the LUT-based Mealy FSM design. The analysis of related works is discussed in Section 3. Section 4 presents the main idea of our method. In Section 5, we include a step-by-step example showing how to apply the proposed method. Section 6 includes the experimental results. The last part of the article is a short conclusion.

2. Peculiarities of LUT-Based Mealy FSM Design

The law of the behaviour of a Mealy FSM can be represented using three sets and two functions [20]. These sets are the following: a set of internal states $S = \{s_1, \ldots, s_M\}$, a set of inputs $X = \{x_1, \ldots, x_L\}$, and a set of outputs $Y = \{y_1, \ldots, y_N\}$. The interstate transitions are represented by a function of transitions. An output function shows the FSM outputs generated during these transitions. In this article, we use a state transition graph (STG) [1] as an initial tool for FSM design. An STG consists of vertices representing FSM states. The vertices are connected by arcs corresponding to interstate transitions. Each arc is marked by an input signal (the conjunction of inputs leading to a particular transition) and a collection of outputs associated with this transition [1]. To synthesize the FSM circuit, we transform this STG into the equivalent state transition table (STT) [1].
To design an FSM circuit, it is necessary to replace abstract states $s_m \in S$ with binary codes $K(s_m)$. This is the state-assignment step [1]. To minimize the number of state variables and input memory functions (IMFs), it is necessary to minimize the bitness of state codes. The minimum possible number $R_{MB}$ of state-code bits corresponds to a maximum state assignment [20]. This number is determined as
$$R_{MB} = \lceil \log_2 M \rceil. \tag{1}$$
To encode states, state variables creating a set $T = \{T_1, \ldots, T_{R_{MB}}\}$ are used. To keep the state codes, a special register, RG, consisting of $R_{MB}$ flip-flops is used as a part of the FSM circuit.
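As a quick check of formula (1), the following Python sketch computes the minimum number of state-code bits. The function name `min_code_bits` is an illustrative assumption and does not come from the article; an integer `bit_length` computation is used in place of a floating-point logarithm.

```python
def min_code_bits(count: int) -> int:
    """Return ceil(log2(count)), the fewest bits that give `count` distinct codes."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    # For count >= 1, (count - 1).bit_length() equals ceil(log2(count)).
    return (count - 1).bit_length()


# Six states, as in the example FSM discussed later in the article.
print(min_code_bits(6))  # prints 3
```

The same helper applies to any quantity encoded with a minimum-length binary code, such as the collections of outputs introduced in Section 3.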
In most practical cases [9], synchronous D flip-flops are used as elements of the state register. Each state variable is represented by a unique flip-flop. The input of the $r$-th flip-flop is connected with an IMF $D_r \in D$, where $D = \{D_1, \ldots, D_{R_{MB}}\}$ is a set of IMFs. The initial state code is forcibly loaded into RG. To do this, a special initialization pulse, Start, is used. Set $D$ determines a state code loaded into RG. To load a code $K(s_m)$, the synchronization pulse Clock is used.
Using either STG or STT, a direct structure table (DST) [20] can be constructed. There are six columns in the DST [20]: $s_C$, $s_T$, $X_h$, $Y_h$, $D_h$, and $h$. The data from these columns have the following meaning: $s_C$ is an initial state for a given transition; $s_T$ is a final state for this transition; $X_h$ is a conjunction of FSM inputs determining the transition $\langle s_C, s_T \rangle$; $Y_h$ is a collection of outputs (CO) produced during the transition $\langle s_C, s_T \rangle$; $D_h$ is a set of IMFs equal to 1 to execute the $h$-th transition (to load the code $K(s_T)$ into RG); and $h$ is the transition number ($h \in \{1, \ldots, H\}$). The DST is a base for constructing the following systems of Boolean functions (SBFs) [21]:
$$D = D(T, X); \tag{2}$$
$$Y = Y(T, X). \tag{3}$$
The SBFs (2) and (3) are a base for implementing the so-called P Mealy FSM [9]. In FPGA-based FSMs, the flip-flops of RG are distributed among the CLBs, including LUTs, generating the functions (2). Thus, the distributed state-code register is hidden. As a result, there are only two blocks in the structural diagram of LUT-based P Mealy FSM (Figure 1).
The LUTs of a block LT implement IMFs (2). The memory elements of LT create the RG. This explains why the pulses Start and Clock enter LT. Obviously, the state variables $T_r \in T$ come out of the block LT. The block LY generates functions (3) representing the outputs $y_n \in Y$. Each LUT has $S_L$ inputs.
The functions (2) and (3) are represented by their sums of products (SOPs) [1]. An SOP of a Boolean function $f_i \in D \cup Y$ has $NI(f_i)$ literals. For rather complex FSMs, the following condition may hold:
$$NI(f_i) > S_L. \tag{4}$$
If (4) takes place, then the circuit of P Mealy FSM is multi-level. It is known [9] that multi-level circuits are less efficient than the equivalent single-level circuits (the former are much slower and require more power than the latter). The same is true for the numbers of interconnections in the equivalent single-level and multi-level circuits. The growth in interconnections leads to the further growth in the values of both time of cycle and power consumption. The use of SD-based methods can lead to a significant improvement in the overall circuit quality [9,17].
There are two types of literals in SOPs of functions (2) and (3): external inputs $x_l \in X$ and elements of the set $T$ (the variables $T_r \in T$). Each function $f_i \in D \cup Y$ depends on $R_i \le R_{MB}$ state variables and $L_i \le L$ inputs. There is only one LUT in the circuit corresponding to the function $f_i \in D \cup Y$ if the following condition is true:
$$R_i + L_i \le S_L. \tag{5}$$
If condition (5) holds, then the values of function $f_i \in D \cup Y$ are generated by a single-LUT circuit. If condition (5) holds for all $R_{MB} + N$ functions, then the circuit of P Mealy FSM is single-level. A single-level circuit has the best possible values of the required chip area, power consumption and maximum operating frequency.
However, there are FSMs with around 500 states and 30 inputs [2]. In this case, $R_{MB} = 9$, so each function $f_i \in D \cup Y$ may depend on up to 39 arguments. Thus, their SOPs can include up to 39 literals. Of course, these SOPs cannot be implemented using only a single LUT with $S_L = 6$ inputs. Thus, the corresponding circuits will be multi-level with spaghetti-type interconnecting systems. To improve the characteristics of multi-level circuits, various optimization methods should be applied. In this paper, we propose an approach which allows reducing the chip area occupied by the LUT-based FSM circuit when the condition (5) is violated.
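The arithmetic behind this example can be sketched in Python. The function names are illustrative assumptions. The level estimate is not taken from the article: it is the standard fan-in argument that a network of LUTs with a given number of inputs and depth d can reach at most that number raised to the power d distinct arguments, so it gives only a lower bound, not the depth a synthesis tool would actually produce.

```python
def fits_single_lut(state_vars: int, inputs: int, lut_inputs: int) -> bool:
    """Condition (5): all arguments of the function fit into one LUT."""
    return state_vars + inputs <= lut_inputs


def min_lut_levels(arguments: int, lut_inputs: int) -> int:
    """Lower bound on logic levels for a function depending on all `arguments`.

    Depth d can reach at most lut_inputs ** d arguments, so the smallest d
    with lut_inputs ** d >= arguments is returned.
    """
    if arguments < 1 or lut_inputs < 2:
        raise ValueError("need arguments >= 1 and lut_inputs >= 2")
    levels, reach = 1, lut_inputs
    while reach < arguments:
        reach *= lut_inputs
        levels += 1
    return levels


state_vars = (500 - 1).bit_length()  # ceil(log2(500)) = 9 state variables
inputs = 30
print(fits_single_lut(state_vars, inputs, lut_inputs=6))  # False: 39 > 6
print(min_lut_levels(state_vars + inputs, lut_inputs=6))  # 3, since 6**2 = 36 < 39
```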

3. Brief Analysis of Related Works

The problem of area reduction is discussed in thousands of monographs and articles. For example, various methods for solving this problem are proposed in the following works (to name but a few): [14,22,23,24,25,26,27,28]. As follows from [23], reducing the required chip area is connected with reducing the LUT count for a corresponding circuit. To achieve this goal, three groups of methods can be used: a proper state assignment, a functional decomposition (FD) of Boolean functions, and SD-based approaches [9].
The proper state assignment leads to the elimination of some literals from SOPs (2) and (3) [20]. If the elimination of literals results in the fulfilment of condition (5) for SOPs of all functions (2) and (3), then the resulting FSM circuit is single-level. This can be achieved using, for example, the state assignment method JEDI distributed with the CAD system SIS [29]. JEDI-based optimization is achieved by creating adjacent codes for states whose transitions depend on the same FSM inputs $x_l \in X$. As shown in [30], this allows elimination of up to 3 literals from SOPs representing benchmark FSMs from the library LGSynth93 [31]. Thus, JEDI can solve the optimization problem if the relation $NI(f_i) - S_L \le 3$ holds. However, this relation only holds for rather simple FSMs [9].
As follows from various research [32,33,34,35], there is no best universal state-assignment approach. For example, optimization success depends on how many variables $x_l \in X$ the transitions from each state depend on. For different FSMs, the same state-assignment method may either improve or deteriorate the quality of resulting circuits. In addition, the optimization strategy depends strongly on the peculiarities of the logic elements used [33]. If LUTs are used, the spatial improvement can be achieved due to an increase in the state-code length [36]. In the extreme case, the number of bits is equal to $M$. This is a one-hot state assignment [1], when the RG includes $M$ flip-flops. The results of research reported in [32] show that the one-hot state assignment can improve the FSM characteristics if $M > 16$. However, it is necessary to take into account the number of FSM inputs [34]. As shown in [32], using maximum binary codes (MBCs) improves FSM quality if $L > 10$ (compared to FSMs with one-hot codes). This situation stimulates the development of new types of state codes and encoding strategies.
If no state-assignment method allows the implementation of a single-level circuit for a given FSM, then decomposition methods should be applied. In this case, the initial functions (2) and (3) are represented as a composition of partial Boolean functions (PBFs). The decomposition is executed until the relation $NI(f_i) \le S_L$ holds for each partial function, that is, until condition (4) is no longer true for any of them. Any kind of decomposition leads to a multi-level FSM circuit.
In the case of FD-based FSM circuits, CLBs are connected by complicated systems of “spaghetti-type” interconnections [11]. Such circuits have much lower clock rates compared to equivalent single-level solutions. This is connected with the fact that, now, “...wires delay has come to dominate logic delay” [37]. In addition, compared to single-level circuits, FD-based circuits are more power-consuming. This phenomenon is due to the fact that the interconnections absorb up to 70% of the total power consumed by an FPGA-based FSM circuit [37]. However, the advantage of FD is that it is applicable to the implementation of Boolean functions of any practical complexity. Therefore, FD-based algorithms are used in all industrial CAD systems aimed at the implementation of FPGA-based digital systems [38,39,40,41].
In many cases, the methods of structural decomposition [9] allow the production of FSM circuits with better space-time-energy characteristics than their FD-based counterparts. The SD-based FSM circuits can be viewed as a composition of large logic blocks with unique input-output systems. Such an approach leads to the regularization of interconnections compared to FD-based FSM circuits [16]. Different methods of SD can be used together. Due to this, the number of blocks can vary from 2 to 4, depending on how many methods are used. The methods of SD and FD can be used together [9].
Two methods of SD are most commonly used. One of them is the replacement of inputs (RI) with some additional variables [9]. The second method is the encoding of COs [9]. Below is a brief description of these methods.
The process of RI comes down to replacing inputs $x_l \in X$ with the additional variables from a set $B = \{b_1, \ldots, b_G\}$. The replacement makes sense if $L \gg G$ [9]. As a result, the SBFs (2) and (3) are replaced by the systems
$$B = B(T, X); \tag{6}$$
$$D = D(T, B); \tag{7}$$
$$Y = Y(T, B). \tag{8}$$
The system (6) is represented by a block with inputs $x_l \in X$ and $T_r \in T$. In the following text, we denote this block with the symbol LB. Obviously, the circuit of LB consumes some chip resources. The systems (7) and (8) are implemented by block LTY. This approach makes sense if the SOPs (7) and (8) include much fewer literals than the SOPs (2) and (3) [9]. In this case, the LUT counts in the circuit of P FSM significantly exceed the total number of LUTs necessary to implement SBFs (6)–(8).
During the interstate transitions, $Q$ different COs $Y_q \subseteq Y$ are generated. Each CO can be represented by a code $K(Y_q)$. This code includes $R_{CO}$ bits [9]:
$$R_{CO} = \lceil \log_2 Q \rceil. \tag{9}$$
The COs are encoded using some additional variables creating a set $Z = \{z_1, \ldots, z_{R_{CO}}\}$. If this approach is applied together with the RI, then the SBF (3) is replaced with the following SBFs:
$$Z = Z(T, B); \tag{10}$$
$$Y = Y(Z). \tag{11}$$
The system (10) depends on the same variables as the system (7). Thus, these two SBFs are implemented using the same block, LTZ. To implement SBF (11), block LY is used. Using these two methods together turns the original P FSM (Figure 1) into MPY FSM (Figure 2).
In MPY FSM, the block LB generates the additional variables (6). The block LTZ generates IMFs represented by (7) and additional variables (10). The block LY generates the FSM outputs (11). As shown in [17], the transition from P FSM to MPY FSM allows the reduction in LUT counts in equivalent FSM circuits. Of course, this area reduction leads to a decrease in the value of maximum operating frequency. This decrease can be viewed as the area-reducing overhead.
To obtain SBF (6), a table of RI should be constructed [20]. Its columns are marked by states $s_m \in S$, whereas additional variables $b_g \in B$ mark its rows. There is a symbol $x_l$ written at the intersection of a row $b_g \in B$ and column $s_m \in S$ if the variable $b_g \in B$ replaces the input $x_l \in X$ for the state $s_m \in S$. In fact, the block LB is a multiplexer, the information inputs of which are connected to inputs $x_l \in X$ and the control inputs are connected to state variables $T_r \in T$.
To obtain SBFs (7) and (10), it is necessary to create a transformed DST. In the transformed DST, the column $X_h$ is replaced by a column $B_h$, whereas the column $Y_h$ is replaced by a column $Z_h$. These new columns are filled in as follows. For example, the first row of DST includes a CO $Y_2$ generated during a transition $\langle s_1, s_2 \rangle$ caused by the input signal $X_1 = x_1 x_2$. Let the following relations hold for the state $s_1 \in S$: $x_1 = b_1$ and $x_2 = b_2$. In this case, the input signal $X_1 = x_1 x_2$ is replaced by the conjunction $B_1 = b_1 b_2$ written in the column $B_h$. If $K(Y_2) = 101$, then the additional variables $z_1, z_3 \in Z$ are written in the column $Z_h$. All other rows of the transformed DST are filled in the same manner.
To obtain SBF (11), it is necessary to create the Karnaugh map whose cells are marked by the variables $z_r \in Z$. The symbols $Y_q$ are written inside the cells. Using this map, the minimized SOPs (11) are constructed. The minimization makes sense if some literals are eliminated from all product terms of a SOP representing a function $y_n \in Y$ [9].
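The encoding of COs can be illustrated with a short Python sketch. Only the code K(Y2) = 101 comes from the text; the other collections, their codes, the output names, and both function names are hypothetical and serve only to show how the column of Z variables and the unminimized functions of system (11) could be derived.

```python
def active_code_variables(code: str) -> list:
    """List the variables z_r that are equal to 1 in a CO code such as '101'."""
    return [f"z{r}" for r, bit in enumerate(code, start=1) if bit == "1"]


def output_on_sets(collections: dict, codes: dict) -> dict:
    """For each output y_n, collect the CO codes for which y_n = 1.

    Each returned set lists the minterms over Z of one function of system (11)
    before any Karnaugh-map minimization.
    """
    on_sets = {}
    for name, outputs in collections.items():
        for output in outputs:
            on_sets.setdefault(output, set()).add(codes[name])
    return on_sets


# K(Y2) = 101 is taken from the text: z1 and z3 are written for this CO.
print(active_code_variables("101"))  # ['z1', 'z3']

# Hypothetical collections of outputs and codes, for illustration only.
collections = {"Y1": {"y1", "y2"}, "Y2": {"y2", "y3"}, "Y3": {"y3"}}
codes = {"Y1": "001", "Y2": "101", "Y3": "100"}
for output, minterms in sorted(output_on_sets(collections, codes).items()):
    print(output, sorted(minterms))
```

Minimizing each on-set (for example, with a Karnaugh map, as the text describes) would then give the SOPs implemented by the block LY.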
The application of this approach is most efficient if the relation $NI(f_i) \le S_L$ holds for all functions $f_i \in B \cup D \cup Z \cup Y$ [9], that is, if condition (4) is true for none of them. Otherwise, there will be more than a single LUT in the circuits for functions for which condition (4) holds. Moreover, this leads to the multi-levelness of the corresponding blocks, which further reduces the MPY FSM performance. To implement these multi-level circuits, the methods of FD should be applied.
To overcome this shortcoming of MPY FSM, we propose to transform its structural diagram using the method of two-fold state assignment (TSA) [18]. This idea is discussed in the next section.

4. Main Idea of Proposed Method

To execute the TSA, it is necessary to create a partition $\pi_S = \{S^1, \ldots, S^K\}$ of the set of states. As a result, each state $s_m \in S$ has two codes. The maximum binary code $K(s_m)$ has $R_{MB}$ bits. This code represents a state as some element of the set $S$. The partial code $C(s_m)$ represents a state as some element of a class $S^k \in \pi_S$. This class includes $M_k$ elements. To encode them, $R_k$ bits are sufficient:
$$R_k = \lceil \log_2 (M_k + 1) \rceil. \tag{12}$$
In (12), the value of $M_k$ is incremented to encode the relation $s_m \notin S^k$. We use the code with all zeroes to encode this relation. This code represents the state $s_m \in S^k$ for all classes other than $S^k$.
The codes $C(s_m)$ for all classes $S^k \in \pi_S$ form an extended state code (ESC) of the state $s_m \in S^k$. Each ESC includes $R_S$ bits, where
$$R_S = R_1 + \ldots + R_K. \tag{13}$$
To create ESCs, the additional variables are used. These variables are elements of a set $V = V^1 \cup V^2 \cup \ldots \cup V^K$. The variables $v_r \in V^k$ create the codes $C(s_m)$ for the states $s_m \in S^k$. To generate ESCs, it is necessary to transform state codes $K(s_m)$ into codes $C(s_m)$ for all states $s_m \in S$. To transform the codes, it is necessary to create the following SBF:
$$V = V(T). \tag{14}$$
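The construction of extended state codes can be sketched in Python as follows. The partition used here is hypothetical, and the choice of numbering the states of each class 1, 2, ... in binary is an assumption made for illustration; the article does not prescribe a particular assignment of partial codes at this point. The sketch only reflects formulas (12) and (13) and the rule that the all-zero field marks a class the state does not belong to.

```python
def partial_code_bits(class_size: int) -> int:
    """Formula (12): ceil(log2(M_k + 1)), which equals M_k.bit_length()."""
    if class_size < 1:
        raise ValueError("a class must contain at least one state")
    return class_size.bit_length()


def extended_state_codes(partition):
    """Return (widths, codes): per-class field widths and one ESC per state.

    The field of the class containing the state holds a non-zero partial code;
    the fields of all other classes are filled with zeroes.
    """
    widths = [partial_code_bits(len(cls)) for cls in partition]
    codes = {}
    for k, cls in enumerate(partition):
        for index, state in enumerate(cls, start=1):
            fields = ["0" * width for width in widths]
            fields[k] = format(index, f"0{widths[k]}b")
            codes[state] = "".join(fields)
    return widths, codes


# Hypothetical partition of six states into K = 2 classes.
partition = [["s1", "s2", "s3"], ["s4", "s5", "s6"]]
widths, codes = extended_state_codes(partition)
print(widths, sum(widths))  # [2, 2] 4, so R_S = 4 by formula (13)
print(codes["s1"], codes["s6"])  # 0100 0011
```

In this sketch the block LV would be the combinational mapping from each maximum binary code to the corresponding entry of `codes`.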
We discuss a case wherein both the replacement of inputs and encoding of COs are executed. In this case, each class $S^k \in \pi_S$ determines three sets. A set $B^k \subseteq B$ includes variables $b_g \in B$ determining transitions from the states $s_m \in S^k$. A set of additional variables $Z^k \subseteq Z$ includes elements determining COs generated during transitions from the states $s_m \in S^k$. Finally, a set $D^k \subseteq D$ includes IMFs equal to 1 in the codes of the next states for transitions from the states $s_m \in S^k$. Each class $S^k \in \pi_S$ determines the following systems of PBFs:
$$D^k = D^k(V^k, B^k); \tag{15}$$
$$Z^k = Z^k(V^k, B^k). \tag{16}$$
To obtain the final values of functions $D_r \in D$ and $z_r \in Z$, it is necessary to create the following SBFs:
$$D = D(D^1, \ldots, D^K); \tag{17}$$
$$Z = Z(Z^1, \ldots, Z^K). \tag{18}$$
The functions $f_i \in D \cup Z$ are disjunctions of corresponding PBFs.
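A minimal Python sketch of this disjunction is given below. It assumes, as the all-zero partial code suggests, that the blocks of classes not containing the current state produce zero values, so the bitwise OR passes through the values of the single active block. The function name and the sample values are illustrative assumptions.

```python
def combine_partial(partials):
    """Bitwise OR of equal-length 0/1 tuples, one tuple per class block."""
    if not partials:
        raise ValueError("at least one partial result is required")
    width = len(partials[0])
    if any(len(p) != width for p in partials):
        raise ValueError("all partial results must have the same width")
    return tuple(int(any(p[i] for p in partials)) for i in range(width))


# Hypothetical partial values of (D1, D2, D3) from K = 2 blocks:
# the first block is inactive, the second one holds the current state.
print(combine_partial([(0, 0, 0), (1, 0, 1)]))  # (1, 0, 1)
```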
The combined use of these three methods of SD leads to $MP_TY$ Mealy FSMs. The subscript “T” shows that the two-fold state assignment is used. Its structural diagram consists of four logic levels (Figure 3).
In $MP_TY$ Mealy FSM, the block LB generates functions (6) to replace FSM inputs using additional variables. The second logic level consists of blocks LB1, …, LBK. Each block LBk implements systems of PBFs (15) and (16). These functions are transformed into functions $f_i \in D \cup Z$ by the block LTZ. This block represents the third logic level of the FSM circuit. The block LTZ includes two distributed registers. One of them is the state code register RG. The RG outputs are used as a feedback for the input transformation. In addition, they enter a block LV to create ESCs. The second register (a register RZ) keeps the codes of COs. We discuss the necessity of RZ later. Both registers are zeroed by the pulse Start and synchronized by the pulse Clock. The fourth logic level includes two blocks. The block LY generates FSM outputs represented by (11). The block LV transforms the maximum state codes $K(s_m)$ into extended state codes $C(s_m)$. This block implements SBF (14).
To reduce the chip area occupied by the LUT-based circuit of MP T Y Mealy FSM, we propose two new approaches. One of them allows the reduction of the number of LUTs and their levels in the circuit of LB. The second method aims to reduce the number of flip-flops necessary for the stabilization of the FSM operation.
We use the symbol $X(b_g)$ for a set of FSM inputs replaced by an additional variable $b_g \in B$. As a rule, the RI is executed in the following way [20]: the number of FSM inputs in different sets $X(b_g)$ should be maximal. At best, identical inputs $x_l \in X$ should be replaced by the same variable $b_g \in B$. Such an approach allows minimization of the chip area if an FSM circuit is implemented using programmable logic arrays (PLAs) [9]. However, PLAs have a lot of inputs, whereas this number is very limited for LUTs. Thus, we propose distributing inputs $x_l \in X$ in a way which satisfies the following condition for the maximum possible number of sets $X(b_g)$:
$$|X(b_g)| + R_{MB} = S_L. \tag{19}$$
Obviously, if (19) holds for the set $X(b_g)$, then a circuit generating the function $b_g \in B$ includes only one element. If (19) holds for all sets $X(b_g)$, then the block LB includes $G$ elements. In addition, this circuit is single-level.
To increase the value of $|X(b_g)|$, we propose to encode the states in a way that decreases the number of state variables in functions (6). Let $S(b_g) \subseteq S$ be a set of states whose transitions depend on the inputs $x_l \in X(b_g)$. We propose to encode the states $s_m \in S(b_g)$ in such a way that their codes create the minimum possible number of generalized cubes of $R_{MB}$-dimensional Boolean space. This approach allows excluding some state variables as literals of SOPs (6).
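The effect of placing state codes into one generalized cube can be sketched in Python. The function name is an illustrative assumption, and the sketch ignores don't-care codes, which the article's method can additionally exploit. The codes 000 and 010 are the ones assigned to states s1 and s4 in the example of Section 5.

```python
def covering_cube(codes):
    """Return (cube, exact) for a collection of equal-length binary codes.

    `cube` has '-' in every position where the codes differ. `exact` is True
    when the codes fill the cube completely, so the variables marked '-'
    drop out of the corresponding product term.
    """
    codes = set(codes)
    if not codes:
        raise ValueError("at least one code is required")
    width = len(next(iter(codes)))
    if any(len(code) != width for code in codes):
        raise ValueError("all codes must have the same length")
    cube = "".join(
        column[0] if len(set(column)) == 1 else "-"
        for column in zip(*codes)
    )
    exact = len(codes) == 2 ** cube.count("-")
    return cube, exact


print(covering_cube(["000", "010"]))  # ('0-0', True): T2 is eliminated
print(covering_cube(["000", "011"]))  # ('0--', False): the codes are not adjacent
```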
As a rule, FSMs are not stand-alone units. They are used as parts of a digital system. Because of this, the stability of the outputs is one of the most important problems in FSM circuit design [13,42,43]. If an FSM is a part of some digital system, then the FSM outputs are inputs of other blocks of the system. It is known [1,20] that outputs of Mealy FSMs are unstable: input fluctuations may lead to output fluctuations. In turn, these fluctuations of FSM outputs may cause failure in some blocks of a digital system. It is possible to avoid such failures by stabilizing the FSM inputs. To do this, it is necessary to introduce a synchronous register of inputs (RI) [20]. This changes the FSM operation mode.
De facto, the set of inputs $X = \{x_1, \ldots, x_L\}$ consists of outputs of various system blocks. These outputs enter the flip-flops of RI. While these outputs are still in transition, the synchronization signal of RI is not active. Due to this, the FSM is disconnected from other blocks. Thus, the RI keeps the values of FSM inputs registered in the previous cycle. After the stabilization of system outputs, they are loaded into the RI using the required edge of synchronization. Thus, eliminating the dependence of the stability of the inputs on the stability of system outputs leads to additional area costs and reduces overall performance. This is an overhead of stability (additional LUTs, flip-flops, interconnections, power consumption and delay). Thus, it makes sense to reduce this overhead.
In our paper, we propose to include a register RZ into block LTZ. There is a flip-flop in each CLB generating a function $z_r \in Z$. Thus, to organize the RZ, there is no need for additional LUTs. In addition, these flip-flops could be controlled by already-existing pulses Start and Clock. Obviously, the proposed approach does not require additional CLBs. This means that it does not require additional chip area (compared to an FSM architecture which uses either a registration of inputs or a registration of outputs).
A method for the synthesis of $MP_TY$ Mealy FSMs is proposed in this paper. We start the design from an STG [1]. To create tables representing the blocks of the FSM circuit, the STG is transformed into the equivalent STT [1]. The proposed method includes the following steps:
  1. Creating STT of Mealy FSM.
  2. Executing replacement of FSM inputs.
  3. Assignment of maximum binary state codes $K(s_m)$ optimizing SBF (6).
  4. Creating SBF (6) representing the block LB.
  5. Finding the partition $\pi_S$ with the minimum cardinality number.
  6. Assignment of partial codes $C(s_m)$ to states $s_m \in S^k$.
  7. Encoding of COs $Y_q \subseteq Y$ using maximum binary codes.
  8. Creating SBF (11) representing the block LY.
  9. Constructing tables of LB1–LBK and creating SBFs (15) and (16).
  10. Constructing the table of LTZ and creating systems (17) and (18).
  11. Constructing table of LV and deriving the system (14).
  12. Implementing LUT-based circuit of $MP_TY$ FSM.
If an FSM A is synthesized using the model of $MP_TY$ Mealy FSM, then we denote such a situation by the symbol $MP_TY(A)$. Next, we discuss an example of $MP_TY$ FSM synthesis.

5. Example of Synthesis of $MP_TY$ Mealy FSM Logic Circuit

We discuss the synthesis of Mealy FSM MP T Y(A1) using LUTs with S L = 5 inputs. The STG (Figure 4) represents the FSM A 1 .
Using STG (Figure 4), we can derive the sets S = { s 1 , , s 6 } (each vertex of STG corresponds to a state); X = { x 1 , , x 8 } (these inputs are shown above the STG arcs); and Y = { y 1 , , y 9 } (these outputs are written above the STG arcs). This gives the following values: M = 6 , L = 8 , and N = 9 . There are H = 17 arcs connecting the vertices of STG (Figure 4). Obviously, there are H = 17 rows in the equivalent STT. As follows from (1), R M B = 3 is necessary to execute the maximum binary state assignment. This gives the sets T = { T 1 , T 2 , T 3 } and D = { D 1 , D 2 , D 3 } .
Step 1. The procedure of transformation is executed using the approach shown in [1]. Each arc of STG determines a row of STT. Each row includes a current state s C , a transition state s T , an input signal X h which determines the transition from s C into s T , an output collection Y h , and the row number, h. In the discussed example, the STG (Figure 4) is transformed into STT (Table 1). This table includes an additional column q containing the subscripts of COs written in each row of the column Y h .
Step 2. The interstate transitions from s m S depend on inputs creating the set X ( s m ) X with N I m elements. To find the number, G, of additional variables b g B , it is necessary to use the following formula [20]:
G = m a x ( N I 1 , , N I M ) .
As follows from Table 1, the sets $X(s_m) \subseteq X$ have the following cardinalities: $NI_1 = NI_2 = NI_4 = 3$, $NI_5 = NI_6 = 2$, and $NI_3 = 0$. Using (20) gives $G = 3$ and $B = \{b_1, b_2, b_3\}$.
Thus, $S_L = 5$ and $R_{MB} = 3$. Using (19) gives $|X(b_g)| = S_L - R_{MB} = 2$. Thus, the IR should be executed so that the relation $|X(b_g)| = 2$ holds for the maximum possible number of sets $X(b_g)$. Using the proposed approach gives the distribution of inputs shown in Table 2.
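The arithmetic of Step 2 can be retraced with a few lines of Python. This is only a sketch: the dictionary `NI` restates the cardinalities quoted above, and the expression used for $R_{MB}$ assumes that formula (1) is the usual $\lceil \log_2 M \rceil$, which is not reproduced in this section.

```python
from math import ceil, log2

# Values read from the example: M states, LUT inputs S_L, and the numbers
# NI_m of inputs that influence the transitions from each state s_1..s_6.
M = 6
S_L = 5
NI = {"s1": 3, "s2": 3, "s3": 0, "s4": 3, "s5": 2, "s6": 2}

R_MB = ceil(log2(M))        # assumed form of (1): bits of a maximum binary code
G = max(NI.values())        # formula (20): number of additional variables b_g
inputs_per_b = S_L - R_MB   # LUT inputs left for each set X(b_g)

print(R_MB, G, inputs_per_b)
```

With these values, each LUT of the block LB can take the three state variables plus at most two FSM inputs.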
Step 3. States $s_m \in S$ should be encoded in a way that minimizes the number of literals in SBF (6). We denote by $S(b_g)$ the set of states in which FSM inputs $x_l \in X$ are replaced by the additional variable $b_g \in B$. To optimize SBF (6), we propose placing the codes of states $s_m \in S(b_g)$ in the same rows of an $R_{MB}$-dimensional Karnaugh map. If an input $x_l \in X$ is replaced by a variable $b_g \in B$ for states $s_m, s_i \in S(b_g)$, then we propose placing these states into adjacent cells of the map. To optimize the SOP of $b_g \in B$, we can use three types of insignificant assignments: (1) the states with unconditional transitions; (2) the states which do not belong to a particular set $S(b_g)$; and (3) the combinations of state variables which are not used as state codes. For the discussed example, the Karnaugh map (Figure 5) includes the state codes.
Let us explain how this map was created. There are the sets $S(b_1) = \{s_1, s_2, s_4\}$ and $X(b_1) = \{x_1, x_4\}$. As follows from Figure 5, these states are placed in the same row of the map. For states $s_1$ and $s_4$, the same input $x_1$ is replaced, so these states have the adjacent codes 000 and 010. The code 001 (state $s_3$) can be treated as insignificant because the transition from this state is unconditional. The code 011 (state $s_5$) can be treated as insignificant because there is no input symbol in the row $b_1$ (the transition from this state is unconditional). To optimize the term depending on $s_2$, we can use the state assignments 110 (no state), 111 (the symbol “–” in the row $b_1$) and 101 (no state). As a result, the following Boolean equation is obtained: $b_1 = x_1 \bar{T}_1 \vee x_4 T_1$.
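The minimized equation for $b_1$ can be checked against the state codes mentioned in the text. The sketch below assumes the codes $K(s_m)$ read off from this section (000 for $s_1$, 100 for $s_2$, 001 for $s_3$, 010 for $s_4$, 011 for $s_5$, 111 for $s_6$); the helper names are illustrative only.

```python
# State codes K(s_m) = (T1, T2, T3), as read from the text around Figure 5.
CODES = {"s1": (0, 0, 0), "s2": (1, 0, 0), "s3": (0, 0, 1),
         "s4": (0, 1, 0), "s5": (0, 1, 1), "s6": (1, 1, 1)}

# Input that b1 has to reproduce in each state of S(b1).
REPLACED_BY_B1 = {"s1": "x1", "s4": "x1", "s2": "x4"}


def b1(x1, x4, t1):
    """Minimized equation: b1 = x1 * not(T1) OR x4 * T1."""
    return bool((x1 and not t1) or (x4 and t1))


def check_b1():
    """Return True if b1 equals the replaced input in every state of S(b1)."""
    for state, wanted in REPLACED_BY_B1.items():
        t1 = CODES[state][0]
        for x1 in (0, 1):
            for x4 in (0, 1):
                expected = x1 if wanted == "x1" else x4
                if b1(x1, x4, t1) != bool(expected):
                    return False
    return True


print(check_b1())
```

Only $T_1$ is needed to tell the states of $S(b_1)$ apart, which is why the insignificant assignments let the equation drop $T_2$ and $T_3$.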
Step 4. Using the approach discussed above, we can obtain the following SBF:
$$\begin{aligned} b_1 &= x_1 (A_1 \vee A_4) \vee x_4 A_2 = x_1 \bar{T}_1 \vee x_4 T_1; \\ b_2 &= x_2 A_1 \vee x_5 A_2 \vee x_7 (A_4 \vee A_5) = x_2 \bar{T}_1 \bar{T}_2 \vee x_7 T_2; \\ b_3 &= x_3 (A_3 \vee A_6) \vee x_2 A_2 \vee x_8 A_4 = x_3 \bar{T}_1 \bar{T}_2 T_3 \vee x_3 T_1 T_2 \vee x_2 T_1 \bar{T}_2 \vee x_8 x_3 \bar{T}_2 \bar{T}_3. \end{aligned} \qquad (21)$$
The analysis of SBF (21) shows that the circuits implementing its equations include four LUTs. The circuit for $b_1$ includes a single LUT, as does the circuit for $b_2$. The two-level circuit generating $b_3$ includes two LUTs. Thus, in the discussed case, the circuit of LB has four LUTs and two levels of logic.
Step 5. We use the approach proposed in [18] to create the partition $\pi_S$. Using the method [18] gives the following sets: $\pi_S = \{S^1, S^2\}$, $S^1 = \{s_1, s_2, s_4\}$ and $S^2 = \{s_3, s_5, s_6\}$. Thus, $K = 2$.
Step 6. As follows from the analysis of classes $S^k \in \pi_S$, each class includes $M_k = 3$ states. Using (12) and (13) gives the following: $R_1 = R_2 = 2$, $R_S = 4$, $V^1 = \{v_1, v_2\}$, $V^2 = \{v_3, v_4\}$ and $V = \{v_1, \ldots, v_4\}$. It is known that the partial state codes do not affect the number of LUTs in the circuits of LBk [18]. Thus, we can assign them in a trivial way: within each class, the codes are assigned in order of increasing state subscript, and each code is the binary form of the step number at which it is assigned. This approach gives the following codes: $C(s_1) = C(s_3) = 01$, $C(s_2) = C(s_5) = 10$, and $C(s_4) = C(s_6) = 11$.
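The trivial assignment of partial state codes can be sketched as follows. The code width $\lceil \log_2(M_k + 1) \rceil$ is an assumption about the form of (12), chosen because it leaves the all-zero code free to mean that the current state lies outside the class; the function name `partial_codes` is hypothetical.

```python
from math import ceil, log2


def partial_codes(classes):
    """Assign the binary codes 1..M_k to the states of each class.

    The all-zero code is deliberately left unused (assumption: it marks
    "the current state does not belong to this class").
    """
    codes = {}
    for states in classes:
        width = ceil(log2(len(states) + 1))  # assumed form of R_k in (12)
        for step, state in enumerate(states, start=1):
            codes[state] = format(step, f"0{width}b")
    return codes


# Partition pi_S from Step 5, states listed by increasing subscript.
PI_S = [["s1", "s2", "s4"], ["s3", "s5", "s6"]]
C = partial_codes(PI_S)
print(C)
```

For the partition of Step 5 this reproduces the codes listed above, with two bits per class and $R_S = 4$ bits in total.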
Step 7. As follows from Table 1, during the operation of the FSM $A_1$, the following COs are generated: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_7\}$, $Y_3 = \{y_2\}$, $Y_4 = \{y_1, y_2\}$, $Y_5 = \{y_3, y_6\}$, $Y_6 = \{y_1, y_3, y_7\}$, $Y_7 = \{y_4, y_9\}$, $Y_8 = \{y_4, y_5\}$, $Y_9 = \{y_5, y_8\}$, $Y_{10} = \{y_6, y_8, y_9\}$. Thus, there are $Q = 10$ collections of outputs generated during the interstate transitions of FSM $A_1$. Using (9) gives $R_{CO} = 4$ and the set $Z = \{z_1, \ldots, z_4\}$.
The encoding is executed in such a way as to reduce the total number of literals in SOPs (11). This can be carried out using, for example, the approach from [44]. One of the possible outcomes is shown in Figure 6.
Step 8. Using the codes $K(Y_q)$ and insignificant input assignments [1], we can obtain the following SBF:
$$\begin{aligned} y_1 &= Y_2 \vee Y_4 \vee Y_6 = \bar{z}_1 z_2; & y_2 &= Y_3 \vee Y_4 = \bar{z}_1 z_4; & y_3 &= Y_5 \vee Y_6 = \bar{z}_1 z_3; \\ y_4 &= Y_7 \vee Y_8 = z_1 \bar{z}_3; & y_5 &= Y_8 \vee Y_9 = z_1 z_4; & y_6 &= Y_5 \vee Y_{10} = \bar{z}_2 z_3; \\ y_7 &= Y_2 \vee Y_6 = z_2 \bar{z}_4; & y_8 &= Y_9 \vee Y_{10} = z_1 z_3; & y_9 &= Y_7 \vee Y_{10} = z_1 \bar{z}_4. \end{aligned} \qquad (22)$$
The SBF (22) represents the circuit of block LY. Thus, it corresponds to SBF (11). The maximum number of literals in the SOPs of (11) is determined as $N \times R_{CO}$. In the discussed case, this number is equal to $9 \times 4 = 36$. The SBF (22) contains 18 literals. Thus, using the approach [44] allows a reduction in the number of literals by a factor of 2.0 compared to its maximum possible value. Each literal corresponds to an interconnection between the blocks LTZ and LY. Thus, reducing the number of literals results in reducing the number of interconnections. This is a positive factor because interconnections significantly influence the chip area used, power consumption and performance.
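The literal count and the consistency of SBF (22) with the collections of outputs from Step 7 can be checked mechanically. In the sketch below, the codes in `K` are an assumption: they are one encoding that is consistent with SBF (22), derived for illustration, and are not copied from Figure 6. The expression for $R_{CO}$ likewise assumes that (9) is $\lceil \log_2 Q \rceil$.

```python
from math import ceil, log2

# Collections of outputs from Step 7 (Y1 is the empty collection).
CO = {1: set(), 2: {1, 7}, 3: {2}, 4: {1, 2}, 5: {3, 6}, 6: {1, 3, 7},
      7: {4, 9}, 8: {4, 5}, 9: {5, 8}, 10: {6, 8, 9}}

# ASSUMPTION: codes z1 z2 z3 z4 chosen to agree with SBF (22);
# they are illustrative and not taken from Figure 6.
K = {1: "0000", 2: "0100", 3: "0001", 4: "0101", 5: "0010",
     6: "0110", 7: "1000", 8: "1001", 9: "1111", 10: "1010"}

# SBF (22): output index -> list of literals (index of z_r, required value).
SBF22 = {1: [(1, 0), (2, 1)], 2: [(1, 0), (4, 1)], 3: [(1, 0), (3, 1)],
         4: [(1, 1), (3, 0)], 5: [(1, 1), (4, 1)], 6: [(2, 0), (3, 1)],
         7: [(2, 1), (4, 0)], 8: [(1, 1), (3, 1)], 9: [(1, 1), (4, 0)]}


def outputs_for(code):
    """Return the set of outputs y_n that SBF (22) asserts for a CO code."""
    return {n for n, literals in SBF22.items()
            if all(int(code[r - 1]) == value for r, value in literals)}


R_CO = ceil(log2(len(CO)))                       # assumed form of (9)
literals = sum(len(lits) for lits in SBF22.values())
max_literals = len(SBF22) * R_CO                 # N x R_CO
consistent = all(outputs_for(K[q]) == CO[q] for q in CO)

print(R_CO, literals, max_literals, consistent)
```

Any other encoding could be substituted for `K`; the check only confirms that at least one 4-bit encoding makes every two-literal product in (22) generate exactly the listed collections.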
Step 9. To create a table of LBk, it is necessary to use the STT rows representing transitions from states $s_m \in S^k$. For example, to create the table representing LB1, we should choose the rows 1–8 and 10–13 of Table 1. The column $X_h$ should be replaced by the column $B_h^1$. This column includes the conjunctions of variables $b_g \in B$ corresponding to the conjunctions of replaced inputs $x_l \in X$. The column $Y_h$ is replaced by the column $Z_h^1$. This column includes the variables $z_r \in Z$ equal to 1 in the codes $K(Y_q)$ of the COs shown in the corresponding rows of the STT.
In addition, this table includes the columns $C(s_C)$ (the partial code of the current state), $K(s_T)$ (the MBC of the next state), and $D_h^1$ (IMFs equal to 1 to load the code $K(s_T)$ into RG). In the discussed case, this table contains $H_1 = 12$ rows (Table 3).
For example, the second row of Table 3 is created in the following manner. This row is constructed using the second row of Table 1, which describes the transition $\langle s_1, s_2 \rangle$ executed when the relation $\bar{x}_1 x_2 = 1$ holds. During this transition, the CO $Y_2 = \{y_1, y_7\}$ is produced. From the outcome of step 6, we have the code $C(s_1) = 01$. This code should be placed in the column $C(s_C)$. Using the Karnaugh map (Figure 5) gives the state code $K(s_T) = 100$. This code should be placed in the column $K(s_T)$. It determines the presence of the symbol $D_1$ in the column $D_h^1$ ($h = 2$) of Table 3. As follows from the column $s_1$ of Table 2, the input $x_1$ is replaced by the variable $b_1$ and the input $x_2$ is replaced by the variable $b_2$. Thus, the conjunction $\bar{x}_1 x_2$ is replaced by the conjunction $\bar{b}_1 b_2$ written in the column $B_h^1$ ($h = 2$) of Table 3.
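The construction of a single row of an LBk table can be sketched as a small transformation. The dictionaries below hold only the facts quoted for this one row, and the function `lb_row` with its field names is hypothetical. The column $Z_h^1$ is omitted because the codes of Figure 6 are not reproduced in this section.

```python
# Facts used for the second row of Table 3, as quoted in the text.
PARTIAL_CODE = {"s1": "01"}                     # C(s_m) of the current state
MBC = {"s2": "100"}                             # K(s_m) of the next state
REPLACEMENT = {"s1": {"x1": "b1", "x2": "b2"}}  # input -> additional variable


def lb_row(current, nxt, condition):
    """Build the C(s_C), K(s_T), D_h and B_h fields of one LBk table row.

    `condition` is a list of (input, value) literals; for example,
    [("x1", 0), ("x2", 1)] stands for the conjunction not(x1) * x2.
    """
    code = MBC[nxt]
    # An IMF D_r appears in the row when bit r of the next-state code is 1.
    d = [f"D{r}" for r, bit in enumerate(code, start=1) if bit == "1"]
    # Each input literal is rewritten with the variable that replaces it.
    b = [(REPLACEMENT[current][x], value) for x, value in condition]
    return {"C": PARTIAL_CODE[current], "K": code, "D": d, "B": b}


row2 = lb_row("s1", "s2", [("x1", 0), ("x2", 1)])
print(row2)
```

The result carries the partial code 01, the next-state code 100, the single IMF $D_1$, and the conjunction $\bar{b}_1 b_2$, matching the description of the row.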
A similar approach is used to create all the rows of Table 3 (block LB1) and Table 4 (block LB2). These tables represent SBFs (15) and (16). There are examples of some SOPs shown below:
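$$z_1^1 = v_1 \bar{v}_2 \bar{b}_1 \bar{b}_2 \vee v_1 v_2 b_1 \vee v_1 v_2 \bar{b}_2 \bar{b}_3; \qquad D_3^1 = \bar{v}_1 v_2 \bar{b}_1 \bar{b}_2 \vee v_1 \bar{v}_2 \bar{b}_1 \bar{b}_2 \vee v_1 v_2 \bar{b}_1 b_2.$$
$$z_1^2 = \bar{v}_3 v_4 \vee v_3 \bar{v}_4 b_2 \vee v_3 v_4 \bar{b}_3; \qquad D_3^2 = \bar{v}_3 v_4 \vee v_3 \bar{v}_4.$$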
Step 10. The table of block LTZ includes the following columns: “Function” (this column includes the symbols $D_r \in D$ and $z_r \in Z$), LB1, LB2. If a PBF is generated by the block LBk ($k \in \{0, 1, \ldots, K\}$), then the intersection of the row with this function and the column LBk is marked by 1. Otherwise, this intersection contains zero. The block LTZ is represented by Table 5.
To fill the columns LB1 and LB2, we use Table 3 and Table 4, respectively. In the discussed case, Table 5 determines SBFs (17) and (18). For example, the following disjunctions may be derived from Table 5:
$$z_1 = z_1^1 \vee z_1^2; \qquad D_3 = D_3^1 \vee D_3^2.$$
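The disjunction performed by LTZ works because a block LBk produces zeros whenever the current state lies outside its class. The sketch below illustrates this with the example SOPs for $z_1^1$ and $z_1^2$. The term boundaries are read from the flattened source text, so they should be treated as an assumption, and the function names are hypothetical.

```python
from itertools import product


def z1_lb1(v1, v2, b1, b2, b3):
    """Partial function z_1^1 generated by LB1 (example SOP)."""
    return bool((v1 and not v2 and not b1 and not b2)
                or (v1 and v2 and b1)
                or (v1 and v2 and not b2 and not b3))


def z1_lb2(v3, v4, b2, b3):
    """Partial function z_1^2 generated by LB2 (example SOP)."""
    return bool((not v3 and v4)
                or (v3 and not v4 and b2)
                or (v3 and v4 and not b3))


def z1(v, b):
    """LTZ: z1 is the disjunction of the partial functions."""
    v1, v2, v3, v4 = v
    b1, b2, b3 = b
    return z1_lb1(v1, v2, b1, b2, b3) or z1_lb2(v3, v4, b2, b3)


# With the partial code 00, a block contributes nothing to the disjunction.
lb1_silent = all(not z1_lb1(0, 0, *b) for b in product((0, 1), repeat=3))
lb2_silent = all(not z1_lb2(0, 0, *b) for b in product((0, 1), repeat=2))
print(lb1_silent, lb2_silent)
```

Since each partial function depends on at most two variables $v_r$ and three variables $b_g$, it fits into a single LUT with $S_L = 5$ inputs.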
Step 11. The block LV converts the MBCs $K(s_m)$ into the partial state codes $C(s_m)$. The conversion is executed for all states. The table of LV includes the columns $s_m$, $K(s_m)$, $C(s_m)$, $V_m$. If $v_r = 1$ for a particular code $C(s_m)$, then the symbol $v_r$ is written in the column $V_m$ (Table 6).
Using Table 6, it is possible to create SBF (14) represented by its perfect SOPs. To minimize these SOPs, we can create a multi-functional Karnaugh map, as shown in Figure 7.
This Karnaugh map is created using the codes from Figure 5. In Figure 7, the symbols of states $s_m \in S$ are replaced by the symbols of additional variables $v_r \in V$. This is performed in the following way: if a particular cell of Figure 5 includes a state $s_m \in S^k$, then the symbols $v_r \in V^k$ are rewritten into the corresponding cell of Figure 7. Using Figure 7 gives the following SBF, which determines the contents of LUTs from the block LV:
$$v_1 = A_2 \vee A_4 = T_1 \bar{T}_3 \vee T_2 \bar{T}_3; \quad v_2 = A_1 \vee A_4 = \bar{T}_1 \bar{T}_3; \quad v_3 = A_5 \vee A_6 = T_2 T_3; \quad v_4 = A_3 \vee A_6 = \bar{T}_2 T_3 \vee T_1 T_3.$$
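The equations of the block LV can be verified against the state codes and the partial codes of Step 6. The sketch below assumes the codes $K(s_m)$ mentioned earlier in this section; the helper names are illustrative.

```python
# K(s_m) = (T1, T2, T3) as read from the text; C(s_m) and classes from Steps 5-6.
CODES = {"s1": (0, 0, 0), "s2": (1, 0, 0), "s3": (0, 0, 1),
         "s4": (0, 1, 0), "s5": (0, 1, 1), "s6": (1, 1, 1)}
PARTIAL = {"s1": "01", "s2": "10", "s4": "11",
           "s3": "01", "s5": "10", "s6": "11"}
CLASS = {"s1": 1, "s2": 1, "s4": 1, "s3": 2, "s5": 2, "s6": 2}


def lv(t1, t2, t3):
    """Evaluate the minimized SOPs for v1..v4."""
    v1 = (t1 and not t3) or (t2 and not t3)
    v2 = (not t1) and (not t3)
    v3 = t2 and t3
    v4 = ((not t2) and t3) or (t1 and t3)
    return tuple(int(bool(v)) for v in (v1, v2, v3, v4))


def expected(state):
    """Extended code: C(s_m) in the field of its class, zeros elsewhere."""
    c = tuple(int(ch) for ch in PARTIAL[state])
    return c + (0, 0) if CLASS[state] == 1 else (0, 0) + c


lv_ok = all(lv(*CODES[s]) == expected(s) for s in CODES)
print(lv_ok)
```

The unused codes 101 and 110 are treated as don't-care cells by the minimized equations, so they are not part of the check.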
Step 12. Using the obtained SOPs, we can estimate how many LUTs are necessary to implement the circuit of MP T Y(A1). As follows from SBF (21), condition (19) holds for the functions $b_1, b_2 \in B$. Thus, each of these functions is implemented using a single LUT with $S_L = 5$. There are six literals in the SOP of $b_3 \in B$. Thus, this SOP should be decomposed. As a result, the corresponding circuit includes two LUTs connected in series. Due to this, the circuit of LB includes four LUTs and has two levels of logic (Figure 8).
Each of the blocks LB1, LB2 (the second level of blocks) and LTZ (the third level of blocks) has a circuit with seven LUTs. Each of these circuits is single-level. The fourth level of blocks consists of the circuits for blocks LY (nine LUTs) and LV (four LUTs).
Thus, because the circuit of LB has two levels of LUTs, the resulting circuit has five levels of LUTs and includes 4 + 3 × 7 + 9 + 4 = 38 LUTs. Our analysis of Mealy FSM MPY(A1) shows the following. The blocks LB and LY have the same LUT counts in equivalent MPY and MP T Y FSMs. Thus, in the discussed case, these blocks include 4 + 9 = 13 LUTs. There are $R_{MB} + G = 6$ literals in the SOPs of SBFs (7) and (10). Using LUTs with five inputs leads to the functional decomposition of these SOPs. As a result, there are three LUTs in a two-level circuit implementing any function from SBFs (7) and (10). There are $R_{MB} + R_{CO} = 7$ functions generated by the LTZ of Mealy FSM MPY(A1). Thus, there are 3 × 7 = 21 LUTs in this circuit. This calculation gives 13 + 21 = 34 LUTs in the circuit of Mealy FSM MPY(A1). The circuit has five levels of LUTs.
Thus, there is the same number of levels in the circuits of FSMs MPY(A1) and MP T Y(A1). However, the circuit of Mealy FSM MPY(A1) includes fewer LUTs. It is possible to obtain the same LUT count for both circuits if we change the approach for the encoding of states and COs [16]. However, we do not discuss this approach in our current paper.
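The LUT counts of the two circuits can be tallied block by block. The per-block numbers below are those stated in Step 12 for $S_L = 5$; the dictionary names are illustrative.

```python
# LUT counts per block of MP T Y(A1), as stated in Step 12.
MPTY_BLOCKS = {"LB": 4, "LB1": 7, "LB2": 7, "LTZ": 7, "LY": 9, "LV": 4}

# MPY(A1): LB and LY are unchanged; each of the R_MB + R_CO functions of
# LTZ needs three LUTs after functional decomposition.
R_MB, R_CO = 3, 4
MPY_BLOCKS = {"LB": 4, "LTZ": 3 * (R_MB + R_CO), "LY": 9}

mpty_total = sum(MPTY_BLOCKS.values())
mpy_total = sum(MPY_BLOCKS.values())
print(mpty_total, mpy_total)
```

For this small example with five-input LUTs, the extra blocks LB1, LB2 and LV outweigh the saving in LTZ, which is consistent with the remark that the MPY circuit is smaller here.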
Our example is rather simple. To compare equivalent FSMs based on various approaches, it is necessary to use benchmarks with a wide range of characteristics. Such a comparison is given in the next section. This comparison is executed for FPGAs produced by AMD Xilinx. Due to this, the industrial package Vivado [39] is applied to carry out all the necessary steps of technology mapping [7,26,45].

6. Experimental Results

To compare the LUT-based circuits produced by our proposed method with circuits obtained using some known design methods, we use the 48 benchmarks of the library LGSynth93 [31]. These benchmarks cover a wide range of the main characteristics, such as the numbers of transitions, internal states, input variables, output functions, and collections of FSM outputs. The benchmarks are represented by STTs in the KISS2 format. We chose this library because many FSM designers use it to compare their results with the main characteristics of known FSM circuits [27,36,37,46,47,48]. The characteristics of the benchmark FSMs can be found, for example, in our previous articles. Due to this, we do not show them in our current paper.
To conduct the experiments, we use the Virtex-7 VC709 platform [49] based on the FPGA chip xc7vx690tffg1761-2 (AMD Xilinx). The CLBs of this chip include LUTs with six address inputs. To obtain the FSM circuits, we use the industrial package Vivado v2019.1 (64-bit) [39] produced by AMD Xilinx. To process the benchmarks, we use their VHDL-based models. To transform the KISS2-based benchmark files into VHDL code, the CAD tool K2F [50] is applied.
For each benchmark, we use Vivado reports to find the LUT counts and performance (the values of cycle time and maximum operating frequency). We compare the proposed FSM model with four different FSM models. Three of these models are P FSMs based on: (1) Auto of Vivado (P Mealy FSMs with MBCs); (2) One-hot of Vivado (one-hot-based P Mealy FSMs); (3) JEDI (P Mealy FSMs with MBCs). As the fourth model, we investigate the MPY Mealy FSMs.
In our research, we take into account the fact that FSMs are not stand-alone units. To achieve the stability of the outputs, we use an additional synchronous register. In the case of P FSMs, the inputs are loaded into this register. Thus, it consists of $L$ flip-flops. Obviously, to implement this register, it is necessary to use $L$ additional LUTs. In the case of both MPY and MP T Y FSMs, this register keeps the codes of COs. Thus, it has $R_{CO}$ flip-flops and does not require additional LUTs. In addition, it does not require an additional synchronization pulse. This simplifies the synchronization circuit compared with equivalent P FSMs.
The results of experiments [16,17] show that practically all the characteristics of LUT-based FSM circuits strongly depend on the relation between the value of $L + R_{MB}$, on the one hand, and $S_L$, on the other hand. In the experiments, we use Virtex-7 FPGAs for which $S_L = 6$. We divided the set of benchmarks into classes of complexity (CC). If the symbol CCP ($P = 1, 2, \ldots$) denotes a class number, then the benchmarks belonging to a certain class are determined by the expression
$$\mathrm{CCP} = \lceil (L + R_{MB}) / S_L \rceil - 1.$$
For the library used, there are five classes of complexity (CC0-CC4). In each of the following tables, the benchmarks belonging to a certain class are shown in the column “Class of complexity”. The class CC0 includes trivial FSMs. The class CC1 includes simple FSMs. The class CC2 includes average FSMs. The class CC3 includes big FSMs. Finally, the class CC4 includes very big FSMs.
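The class of complexity can be computed with a one-line function. This sketch assumes the expression is a ceiling of $(L + R_{MB})/S_L$ minus one, which is how the formula reads once the lost brackets are restored; the function name `complexity_class` is hypothetical.

```python
def complexity_class(num_inputs, r_mb, s_l=6):
    """Return P for the class CCP, assuming P = ceil((L + R_MB) / S_L) - 1.

    Integer arithmetic is used so that no floating-point rounding occurs.
    """
    total = num_inputs + r_mb
    return (total + s_l - 1) // s_l - 1


# The FSM A1 from Section 5 (L = 8, R_MB = 3) is used only as an
# illustration; it is not one of the library benchmarks.
print(complexity_class(8, 3))
```

Under this reading, an FSM falls into CC0 exactly when $L + R_{MB} \le S_L$, so that a single LUT can take all inputs and state variables.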
Tables 7–16 contain the results of the experiments conducted. Table 7 includes the numbers of LUTs necessary to implement the circuit for a given benchmark. All benchmarks are represented in this table. Table 8 contains the LUT counts for classes CC0–CC1. Table 9 contains the LUT counts for classes CC2–CC4. The negative influence of the number of FSM inputs is shown in Table 10. Table 11 contains the values of the minimum cycle times for each benchmark. The data for these tables are taken from the Vivado reports. In addition, we show cycle times separately for classes CC0–CC1 (Table 12) and CC2–CC4 (Table 13). The values of the maximum operating frequencies are shown in Table 14. These values are calculated from the data in Table 11. In addition, we show the frequencies separately for classes CC0–CC1 (Table 15) and CC2–CC4 (Table 16).
Each table is organized in the same manner. The first column includes the benchmarks’ names, the row “Total” and the row “Percentage”. The names of the investigated methods are shown in the next five columns. The classes of complexity are shown in the last column. The row “Total” shows the sum of the values in a particular column. Finally, the row “Percentage” includes the percentage of the summarized characteristics of various FSM circuits in relation to the summarized characteristics of MP T Y FSMs. We start the discussion of the results with Table 7.
As follows from Table 7, compared to the other investigated methods, the circuits of MP T Y-based FSMs consist of the minimum number of LUTs. There is the following gain: (1) 56.99% compared to Auto-based FSMs; (2) 79.13% compared to One-hot-based FSMs; (3) 33.13% compared to JEDI-based FSMs; and (4) 8.98% compared to MPY-based FSMs. In second place in terms of LUT counts are MPY-based FSMs. We think this gain is associated with two factors. First, for rather complex FSMs, SD-based circuits always have fewer LUTs than equivalent FD-based circuits [9]. Second, there are an additional $L$ LUTs in the circuits of FD-based FSMs required to stabilize their operation. In the case of both MPY- and MP T Y-based FSMs, the stabilization is achieved by registering the codes of COs. To produce these codes, LUTs of LTZ are used. The outputs of these LUTs are connected with the $R_{CO}$ flip-flops creating the additional register. Thus, there is no need for additional LUTs. Of course, the gain is also associated with replacing FSM inputs with additional variables. We think that this diminishes the number of partial functions compared to equivalent FD-based FSMs.
It is interesting to show how the gain changes with the change in FSM complexity. Using Table 7, we created two additional tables. Table 8 shows LUT counts for trivial and simple FSMs. Table 9 contains information about LUT counts for average, big and very big FSMs.
Analysis of Table 8 shows that the proposed approach provides the same LUT counts as for equivalent MPY FSMs. All P-based models require more LUTs. Our approach gives the following gain: (1) 24.89% compared to Auto-based FSMs; (2) 56.11% compared to One-hot-based FSMs; and (3) 9.61% compared to JEDI-based FSMs. We think that this gain is connected to the different stabilization methods used in SD- and FD-based FSMs. The input register of FD-based FSMs requires more LUTs than the output register of SD-based FSMs. However, both MPY- and MP T Y-based FSMs require more LUTs for trivial FSMs (the complexity class CC0). We think this has a very simple explanation. Namely, for trivial FSMs, the condition (5) holds. Thus, there is no need to apply the SD-based methods. However, these methods are always used during the synthesis of both MPY- and MP T Y-based FSMs. In this case, it is necessary to implement circuits of blocks LB and LY. It is the presence of these absolutely redundant blocks that determines the marked loss of SD-based methods.
The next phenomenon comes from Table 8: for the class CC0, the circuits of equivalent MPY- and MP T Y-based FSMs have equal numbers of LUTs. We think this is connected with the fact that the partition $\pi_S$ consists of one class. Due to this, there is no need to use the blocks LB1–LBK. This means that MP T Y FSMs turn into MPY FSMs. Obviously, these FSM circuits should have equal values for all the other characteristics. This, once again, indicates that it is advisable to use different FSM models for different conditions. Thus, it makes no sense to apply SD-based methods when condition (5) is met.
Now, we are going to discuss the temporal characteristics of FSM circuits. First of all, we show the negative influence of the input register. In all P-based FSMs, the stabilization of operation is achieved by loading FSM inputs into the additional register. Thus, this approach leads to the use of $L$ additional LUTs and flip-flops. Obviously, the cycle time increases due to the presence of the chain <input LUTs–flip-flops–LUTs of LB>. In addition, this increases the consumed power. We explored how the number of inputs affects the time and power characteristics of the resulting circuits. This information is shown in Table 10.
As follows from Table 10, the number of inputs significantly affects the timing and energy characteristics of LUT-based FSM circuits. The more inputs the FSM has, the greater their negative impact. In the case of the investigated SD-based FSMs, the stabilization is achieved by registering the codes of COs. In this case, the number of additional flip-flops is equal to $R_{CO}$. Moreover, there is no need for additional LUTs because the codes of COs are generated by the LUTs of LTZ. For the studied benchmarks, the following relation holds: $R_{CO} < L$. The validity of this relation determines the gain in time characteristics obtained due to the transition from FD-based FSMs to SD-based FSMs. This gain is shown in Table 11.
As follows from Table 11, the SD-based FSMs have the best values of cycle time. Our proposed method produces FSM circuits which are a bit slower than the circuits of MPY-based FSMs (the average loss is 0.76%). However, our method has the following average gain compared to other FSMs: (1) 70.65% compared to Auto-based FSMs; (2) 71.08% compared to One-hot-based FSMs; and (3) 62.13% compared to JEDI-based FSMs. This gain for the SD-based FSMs is explained by the difference in the methods used for stabilizing the FSM outputs, as discussed before.
To show the influence of FSM complexity, we create two additional tables. Table 12 includes information about the cycle times for trivial and simple FSMs. Table 13 includes information about the cycle times for average, big and very big FSMs.
As follows from Table 12, the time characteristics are equal for SD-based trivial and simple FSMs. They have the following gain: (1) 65.63% compared with both Auto- and One-hot-based FSMs and (2) 59.60% compared with JEDI-based FSMs. The reasons for this situation are as discussed before.
As follows from Table 13, starting from the complexity CC2, our approach wins in performance. There is the following gain: (1) 78.93% compared with Auto-based FSMs; (2) 79.72% compared with One-hot-based FSMs; (3) 66.3% compared with JEDI-based FSMs and (4) 2.0% compared with equivalent MPY FSMs. We think that the superiority of SD-based FSMs is due to the fact that they generate fewer partial Boolean functions. Due to this, their circuits have fewer logic levels and interconnections. In turn, they are faster.
The slight superiority of MP T Y FSMs (2%) in relation to MPY FSMs is due to the fact that MP T Y FSMs have fewer interconnections. This is connected with different approaches of stabilization. Since interconnections significantly affect the timing characteristics, our approach produces faster circuits for FSMs from the classes CC2-CC4. Apparently, equivalent SD-based FSMs have the same number of logic levels (the number of series-connected LUTs). Thus, with respect to the other methods under study, the performance of MP T Y FSMs improves as their complexity increases.
We did not obtain the values of maximum operating frequencies from Vivado reports. However, we calculated them using the values of cycle times. The frequency comparison is represented by Table 14.
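The conversion from cycle time to maximum operating frequency is a simple reciprocal. The sketch below assumes that cycle times are given in nanoseconds and frequencies in megahertz; the sample value is hypothetical and is not taken from Table 11.

```python
def max_frequency_mhz(cycle_time_ns):
    """Convert a minimum cycle time in ns into a maximum frequency in MHz."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns


# Hypothetical cycle time, used only to illustrate the conversion.
print(max_frequency_mhz(5.0))
```

Because the frequency is a reciprocal, a sum of frequencies over all benchmarks is not the reciprocal of the sum of cycle times, so the percentage gains in the two sets of tables need not coincide.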
As follows from Table 14, on average, the circuits of MP T Y-based FSMs are faster than those of all other models. There is the following gain: (1) 58.79% compared to Auto-based FSMs; (2) 58.7% compared to One-hot-based FSMs; (3) 61.65% compared to JEDI-based FSMs; and (4) 0.64% compared to MPY-based FSMs. Obviously, the reasons for this gain are the same as the ones discussed for the cycle times. We will not repeat them.
Naturally, the change in the gain in frequency has the same tendencies as the change in the gain in cycle time. This statement is justified by information from Table 15 and Table 16.
It should be noted that the gain in operating frequency for our method begins to appear from the complexity CC2. At the same time, the gain grows in the process of the transition to the highest categories of complexity.
Thus, if FSMs belong to the classes CC0-CC1, then equivalent MP T Y and MPY FSMs have the same values of LUT counts, cycle time and maximum operating frequency. For more complex FSMs, MP T Y FSMs require fewer LUTs than equivalent MPY FSMs. In addition, as the complexity increases, the temporal characteristics of the MP T Y FSMs gradually become slightly better than they are for equivalent MPY FSMs. This gain is rather small; however, the very fact that a decrease in the number of LUTs does not lead to performance degradation is important. The results of the experiments allow us to draw the following conclusion: MP T Y FSMs can replace MPY FSMs for average, big and very big sequential devices. For a more visual assessment of the results, we built a diagram (Figure 9). This diagram shows a comparison of percentages for the main characteristics of the studied methods.
To construct the charts (Figure 9), we used the tables in which the results are shown for all benchmarks, and not for their individual categories. To show the results for LUT counts, we used Table 7. The cycle times are taken from Table 11. Finally, the results for the values of maximum operating frequencies are derived from Table 14. It clearly follows from Figure 9 that the proposed method improves the spatial characteristics of circuits without degrading their temporal characteristics.

7. Conclusions

Modern FPGAs are widely used in digital design [2]. These chips are very powerful: today, a single chip may implement a circuit with very complicated blocks [4]. Being universal, these chips have a significant drawback: they include a huge number of LUT elements with an extremely small number of inputs [3,4]. This phenomenon leads to the need to use extremely sophisticated methods for optimizing the FSM-based logic circuits. It is this shortcoming that necessitates the use of various methods of functional decomposition to obtain the resulting circuit. As a result of functional decomposition, the implemented circuits are multi-level. These circuits are slower and less energy efficient than the equivalent single-level solutions.
The use of structural decomposition methods improves the main characteristics of multi-level FSM circuits [9]. The analysis of the work [9] leads to the conclusion that in the vast majority of cases, the SD-based FSM circuits are significantly better than their FD-based counterparts. In the paper [17], the decrease in LUT counts is achieved due to the joint use of such SD-based methods as the replacement of inputs and the encoding of output collections. As follows from [17], this approach yields MPY FSMs whose circuits have better characteristics than equivalent FD-based circuits.
To reduce the LUT count in the circuits of MPY-based FSMs, we propose to replace the maximum binary state codes with extended state codes. The proposed approach is based on using twofold state assignment [18]. As follows from the experiments, the proposed approach reduces LUT counts without the degradation of temporal characteristics as compared to equivalent MPY-based FSMs. We hope the proposed method can be used in FPGA-based designs.

Author Contributions

Conceptualization, A.B., L.T., M.M. and K.K.; Methodology, A.B., L.T., M.M. and K.K.; Software, A.B., L.T., M.M. and K.K.; Validation, A.B., L.T., M.M. and K.K.; Formal analysis, A.B., L.T., M.M. and K.K.; Investigation, A.B., L.T., M.M. and K.K.; Writing—original draft preparation, A.B., L.T., M.M. and K.K.; Supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB: configurable logic block
CO: collection of outputs
DST: direct structure table
FD: functional decomposition
FPGA: field-programmable gate array
FSM: finite state machine
IMF: input memory function
LUT: look-up table
RG: state-code register
SBF: system of Boolean functions
SD: structural decomposition
STG: state-transition graph
STT: state-transition table

References

  1. De Micheli, G. Synthesis and Optimization of Digital Circuits; McGraw–Hill: New York, NY, USA, 1994; p. 578. [Google Scholar]
  2. Baranov, S. High-Level Synthesis of Digital Systems: For Data-Path and Control Dominated Systems; Amazon: Seattle, WA, USA, 2018; p. 207. [Google Scholar]
  3. Trimberg, S.M. Three ages of FPGA: A Retrospective on the First Thirty Years of FPGA Technology. IEEE Proc. 2015, 103, 318–331. [Google Scholar] [CrossRef]
  4. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63. [Google Scholar] [CrossRef] [Green Version]
  5. Grout, I. Digital systems design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011. [Google Scholar]
  6. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  7. Kubica, M.; Opara, A.; Kania, D. Technology Mapping for LUT-Based FPGA; Lecture Notes in Electrical Engineering; Springer: Cham, Switzerland, 2021; p. 216. [Google Scholar]
  8. Ling, A.; Singh, D.P.; Brown, S.D. FPGA Technology Mapping: A Study of Optimality. In Proceedings of the 42nd Annual Design Automation Conference, Anaheim, CA, USA, 13–17 June 2005; Association for Computing Machinery: New York, NY, USA, 2005; pp. 427–432. [Google Scholar] [CrossRef]
  9. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174. [Google Scholar] [CrossRef]
  10. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources; Xilinx All Programmable: Santa Clara, CA, USA, 2014; pp. 1–32.
  11. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797.
  12. Sasao, T.; Mishchenko, A. LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations. In Proceedings of the International Workshop on Logic Synthesis, Berkeley, CA, USA, 31 July–2 August 2009; pp. 310–316.
  13. Gazi, O.; Arli, A.C. State Machines Using VHDL: FPGA Implementation of Serial Communication and Display Protocols; Springer: Berlin/Heidelberg, Germany, 2021; p. 326.
  14. Kubica, M.; Kania, D.; Kulisz, J. A Technology Mapping of FSMs Based on a Graph of Excitations and Outputs. IEEE Access 2019, 7, 16123–16131.
  15. Zgheib, G.; Ouaiss, I. Enhanced Technology Mapping for FPGAs with Exploration of Cell Configurations. J. Circuits Syst. Comput. 2015, 24, 1550039.
  16. Barkalov, A.; Titarenko, L.; Krzywicki, K. Using a Double-Core Structure to Reduce the LUT Count in FPGA-Based Mealy FSMs. Electronics 2022, 11, 3089.
  17. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  18. Barkalov, A.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607.
  19. AMD Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 31 January 2023).
  20. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994; p. 312.
  21. Kubica, M.; Kania, D. Technology Mapping of FSM Oriented to LUT-Based FPGA. Appl. Sci. 2020, 10, 3926.
  22. Jóźwiak, L.; Ślusarczyk, A.; Chojnacki, A. Fast and compact sequential circuits for the FPGA-based reconfigurable systems. J. Syst. Archit. 2003, 49, 227–246.
  23. Islam, M.M.; Hossain, M.S.; Shahjalal, M.; Hasan, M.K.; Jang, Y.M. Area-Time Efficient Hardware Implementation of Modular Multiplication for Elliptic Curve Cryptography. IEEE Access 2020, 8, 73898–73906.
  24. Mishchenko, A.; Brayton, R.; Jiang, J.H.R.; Jang, S. Scalable Don’t-Care-Based Logic Optimization and Resynthesis. ACM Trans. Reconfig. Technol. Syst. 2011, 4, 1–23.
  25. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27.
  26. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. Tech. Sci. 2019, 67, 947–956.
  27. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285.
  28. Salauyou, V.; Ostapczuk, M. State Assignment of Finite-State Machines by Using the Values of Output Variables. In Theory and Applications of Dependable Computer Systems, Proceedings of the Fifteenth International Conference on Dependability of Computer Systems DepCoS-RELCOMEX, Brunow, Poland, 29 June–3 July 2020; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Springer: Cham, Switzerland, 2020; Volume 1173, pp. 543–553.
  29. Sentovich, E.; Singh, K.J.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A.L. SIS: A System for Sequential Circuit Synthesis; Technical Report; University of California: Berkeley, CA, USA, 1992.
  30. Tatalov, E. Synthesis of Compositional Microprogram Control Units for Programmable Devices. Master’s Thesis, Donetsk National Technical University, Donetsk, Ukraine, 2011.
  31. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
  32. Skliarova, I.; Sklyarov, V.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  33. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011.
  34. Das, N.; Panchanathan, A. ReSET: A Reconfigurable State Encoding Technique for FSM to achieve Security and Hardware optimality. Microprocess. Microsyst. 2020, 77, 103196.
  35. Tao, Y.; Zhang, Y.; Qinyu, W.; Jian, C. MPGA: An Evolutionary State Assignment for Dynamic and Leakage Power reduction at FSM synthesis. IET Comput. Digit. Tech. 2018, 12, 111–120.
  36. El-Maleh, A.H. A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization. Integration 2017, 56, 32–43.
  37. Mishchenko, A.; Chatterjee, S.; Brayton, R.K. Improvements to Technology Mapping for LUT-Based FPGAs. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2007, 26, 240–253.
  38. ABC System. Available online: https://people.eecs.berkeley.edu/~alanmi/abc/ (accessed on 31 January 2023).
  39. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 31 January 2023).
  40. Xilinx Vitis. Available online: https://www.xilinx.com/products/design-tools/vitis/vitis-platform.html (accessed on 31 January 2023).
  41. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 31 January 2023).
  42. Gajski, D.; Gerstlauer, A.; Abdi, S.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer: New York, NY, USA, 2009; p. 352.
  43. Baranov, S. Finite State Machines and Algorithmic State Machines: Fast and Simple Design of Complex Finite State Machines; Amazon: Seattle, WA, USA, 2018; p. 185.
  44. Achasova, S. Synthesis Algorithms for Automata with PLAs; Soviet Radio: Moscow, Russia, 1987. (In Russian)
  45. Soloviev, V. Architecture of the FILM of the Firm Xilinx: CPLD and FPGA of the 7th Series; Hot-line Telecom: Moscow, Russia, 2016; p. 392. (In Russian)
  46. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013; p. 231.
  47. Benini, L.; Bogliolo, A.; De Micheli, G. A survey of design techniques for system-level dynamic power management. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2000, 8, 299–316.
  48. De Micheli, G.; Brayton, R.K.; Sangiovanni-Vincentelli, A. Optimal State Assignment for Finite State Machines. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2006, 4, 269–285.
  49. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019.
  50. Barkalov, A.; Titarenko, L.; Mielcarek, K.; Chmielewski, S. Logic Synthesis for FPGA-Based Control Units—Structural Decomposition in Logic Design; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2020; Volume 636.
Figure 1. Structural diagram of LUT-based P Mealy FSM.
Figure 2. Structural diagram of MPY Mealy FSM.
Figure 3. Structural diagram of $MP_TY$ Mealy FSM.
Figure 4. State transition graph of Mealy FSM $A_1$.
Figure 5. Outcome of the maximum binary state assignment.
Figure 6. Codes of output collections.
Figure 7. Multi-functional map of LV.
Figure 8. Circuit of block LB for Mealy FSM $MP_TY(A_1)$.
Figure 9. Comparison of percentages for the main characteristics of the studied methods.
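Table 1. State transition table of FSM $A_1$.

| $S_c$ | $S_T$ | $X_h$ | $Y_h$ | $q$ | $h$ |
|---|---|---|---|---|---|
| $s_1$ | $s_1$ | $x_1$ | - | 1 | 1 |
| $s_1$ | $s_2$ | $\bar{x}_1 x_2$ | $y_1 y_7$ | 2 | 2 |
| $s_1$ | $s_5$ | $\bar{x}_1 \bar{x}_2 x_3$ | $y_2$ | 3 | 3 |
| $s_1$ | $s_3$ | $\bar{x}_1 \bar{x}_2 \bar{x}_3$ | $y_1 y_2$ | 4 | 4 |
| $s_2$ | $s_2$ | $x_4$ | $y_3 y_6$ | 5 | 5 |
| $s_2$ | $s_4$ | $\bar{x}_4 x_5$ | $y_1 y_3 y_7$ | 6 | 6 |
| $s_2$ | $s_6$ | $\bar{x}_4 \bar{x}_5 x_6$ | $y_4 y_9$ | 7 | 7 |
| $s_2$ | $s_5$ | $\bar{x}_4 \bar{x}_5 \bar{x}_6$ | $y_4 y_5$ | 8 | 8 |
| $s_3$ | $s_6$ | 1 | $y_5 y_8$ | 9 | 9 |
| $s_4$ | $s_4$ | $x_1$ | $y_5 y_8$ | 10 | 10 |
| $s_4$ | $s_6$ | $\bar{x}_1 x_7$ | $y_2$ | 3 | 11 |
| $s_4$ | $s_1$ | $\bar{x}_1 \bar{x}_7 x_8$ | - | 1 | 12 |
| $s_4$ | $s_2$ | $\bar{x}_1 \bar{x}_7 \bar{x}_8$ | $y_3 y_6$ | 5 | 13 |
| $s_5$ | $s_6$ | $x_7$ | $y_4 y_9$ | 7 | 14 |
| $s_5$ | $s_3$ | $\bar{x}_7$ | $y_1 y_2$ | 4 | 15 |
| $s_6$ | $s_4$ | $x_3$ | $y_1 y_7$ | 2 | 16 |
| $s_6$ | $s_1$ | $\bar{x}_3$ | $y_4 y_5$ | 8 | 17 |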
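Table 2. Table of RI for FSM $A_1$.

| $B$ / $S$ | $S_1$ | $S_2$ | $S_3$ | $S_4$ | $S_5$ | $S_6$ |
|---|---|---|---|---|---|---|
| $b_1$ | $x_1$ | $x_4$ | - | $x_1$ | - | - |
| $b_2$ | $x_2$ | $x_5$ | - | $x_7$ | $x_7$ | - |
| $b_3$ | $x_3$ | $x_6$ | - | $x_8$ | - | $x_3$ |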
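
Table 2 can be read as a per-state multiplexer configuration: in each state, every additional variable $b_1$, $b_2$, $b_3$ takes the value of at most one FSM input. The following is a minimal Python sketch of that lookup, not the authors' implementation. The names `RI_TABLE` and `replace_inputs` are hypothetical, and driving the "-" entries to 0 is an assumption made only for illustration, since those positions are left unassigned in Table 2.

```python
from typing import Dict, Optional, Tuple

# Table 2 transcribed: for each state, the FSM input routed to (b1, b2, b3).
# None marks a "-" entry (no input is assigned to that variable in that state).
RI_TABLE: Dict[str, Tuple[Optional[str], Optional[str], Optional[str]]] = {
    "s1": ("x1", "x2", "x3"),
    "s2": ("x4", "x5", "x6"),
    "s3": (None, None, None),
    "s4": ("x1", "x7", "x8"),
    "s5": (None, "x7", None),
    "s6": (None, None, "x3"),
}


def replace_inputs(state: str, x: Dict[str, int]) -> Tuple[int, ...]:
    """Return (b1, b2, b3) for the given state and input values.

    Unassigned positions are driven to 0 here; this is an assumption
    for the sketch, not something stated in the source table.
    """
    return tuple(0 if name is None else x[name] for name in RI_TABLE[state])


if __name__ == "__main__":
    inputs = {f"x{i}": 0 for i in range(1, 9)}
    inputs["x7"] = 1
    # In state s5 only b2 is connected (to x7).
    print(replace_inputs("s5", inputs))
```

With this reading, the eight inputs $x_1, \ldots, x_8$ of $A_1$ are compressed to three variables before they reach the transition logic, which is the purpose of the replacement of inputs.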
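Table 3. Table of block LB1.

| $S_c$ | $C(S_c)$ | $S_T$ | $K(S_T)$ | $B_h^1$ | $Z_h^1$ | $D_h^1$ | $h$ |
|---|---|---|---|---|---|---|---|
| $s_1$ | 01 | $s_1$ | 000 | $b_1$ | - | - | 1 |
| $s_1$ | 01 | $s_2$ | 100 | $\bar{b}_1 b_2$ | $z_2$ | $D_1$ | 2 |
| $s_1$ | 01 | $s_5$ | 011 | $\bar{b}_1 \bar{b}_2 b_3$ | $z_4$ | $D_2 D_3$ | 3 |
| $s_1$ | 01 | $s_3$ | 001 | $\bar{b}_1 \bar{b}_2 \bar{b}_3$ | $z_2 z_4$ | $D_3$ | 4 |
| $s_2$ | 10 | $s_2$ | 100 | $b_1$ | $z_3$ | $D_1$ | 5 |
| $s_2$ | 10 | $s_4$ | 010 | $\bar{b}_1 b_2$ | $z_2 z_3$ | $D_2$ | 6 |
| $s_2$ | 10 | $s_6$ | 111 | $\bar{b}_1 \bar{b}_2 b_3$ | $z_1$ | $D_1 D_2 D_3$ | 7 |
| $s_2$ | 10 | $s_5$ | 011 | $\bar{b}_1 \bar{b}_2 \bar{b}_3$ | $z_1 z_4$ | $D_2 D_3$ | 8 |
| $s_4$ | 11 | $s_4$ | 010 | $b_1$ | $z_1 z_3$ | $D_2$ | 9 |
| $s_4$ | 11 | $s_6$ | 111 | $\bar{b}_1 b_2$ | $z_4$ | $D_1 D_2 D_3$ | 10 |
| $s_4$ | 11 | $s_1$ | 000 | $\bar{b}_1 \bar{b}_2 b_3$ | - | - | 11 |
| $s_4$ | 11 | $s_2$ | 100 | $\bar{b}_1 \bar{b}_2 \bar{b}_3$ | $z_1$ | $D_1$ | 12 |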
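Table 4. Table of block LB2.

| $S_c$ | $C(S_c)$ | $S_T$ | $K(S_T)$ | $B_h^2$ | $Z_h^2$ | $D_h^2$ | $h$ |
|---|---|---|---|---|---|---|---|
| $s_3$ | 01 | $s_6$ | 111 | 1 | $z_1 z_2 z_3 z_4$ | $D_1 D_2 D_3$ | 1 |
| $s_5$ | 10 | $s_6$ | 111 | $b_1$ | $z_3$ | $D_1 D_2 D_3$ | 2 |
| $s_5$ | 10 | $s_3$ | 001 | $\bar{b}_2$ | $z_2 z_4$ | $D_3$ | 3 |
| $s_6$ | 11 | $s_4$ | 010 | $b_3$ | $z_2 z_4$ | $D_2$ | 4 |
| $s_6$ | 11 | $s_6$ | 000 | $\bar{b}_3$ | $z_1 z_7$ | - | 5 |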
Table 5. Table of LTZ.

| Function | LB1 | LB2 |
|---|---|---|
| $D_1$ | 1 | 1 |
| $D_2$ | 1 | 1 |
| $D_3$ | 1 | 1 |
| $z_1$ | 1 | 1 |
| $z_2$ | 1 | 1 |
| $z_3$ | 1 | 1 |
| $z_4$ | 1 | 1 |
Table 6. Table of block LV.

| $S_m$ | $K(S_m)$ | $C(S_m)$ | $V_m$ |
|---|---|---|---|
| $s_1$ | 000 | 0100 | $v_2$ |
| $s_2$ | 100 | 1000 | $v_1$ |
| $s_3$ | 001 | 0001 | $v_4$ |
| $s_4$ | 010 | 1100 | $v_1 v_2$ |
| $s_5$ | 011 | 0010 | $v_3$ |
| $s_6$ | 111 | 0011 | $v_3 v_4$ |
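
Table 6 describes block LV as a code converter from the maximum binary state code $K(S_m)$ to the extended state code $C(S_m)$. As a hedged illustration, the sketch below models it as a plain lookup table in Python. The names `LV_TABLE`, `lv_block`, `active_variables` and `partial_codes` are hypothetical, and reading $C(S_m)$ as the bit string $v_1 v_2 v_3 v_4$, with $v_1 v_2$ feeding LB1 and $v_3 v_4$ feeding LB2, is an interpretation inferred from the $C(S_c)$ columns of Tables 3 and 4.

```python
from typing import Dict, List, Tuple

# Table 6 transcribed: state -> (K(S_m), C(S_m)), with C(S_m) read as v1 v2 v3 v4.
LV_TABLE: Dict[str, Tuple[str, str]] = {
    "s1": ("000", "0100"),
    "s2": ("100", "1000"),
    "s3": ("001", "0001"),
    "s4": ("010", "1100"),
    "s5": ("011", "0010"),
    "s6": ("111", "0011"),
}

# Index the same rows by the maximum binary code, as a LUT would be addressed.
K_TO_C: Dict[str, str] = {k: c for k, c in LV_TABLE.values()}


def lv_block(k_code: str) -> str:
    """Convert a maximum binary state code into the extended state code."""
    return K_TO_C[k_code]


def active_variables(c_code: str) -> List[str]:
    """List the variables v_i equal to 1 (the V_m column of Table 6)."""
    return [f"v{i}" for i, bit in enumerate(c_code, start=1) if bit == "1"]


def partial_codes(c_code: str) -> Tuple[str, str]:
    """Split C(S_m) into the parts assumed to feed LB1 (v1 v2) and LB2 (v3 v4)."""
    return c_code[:2], c_code[2:]


if __name__ == "__main__":
    c = lv_block("010")  # state s4
    print(c, active_variables(c), partial_codes(c))
```

Under this reading, each block sees only two state variables plus $b_1$, $b_2$, $b_3$, which is consistent with the abstract's claim that a single LUT suffices for each partial function.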
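Table 7. Experimental results (the LUT counts).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 21 | 21 | 14 | 12 | 12 | CC1 |
| bbsse | 40 | 44 | 31 | 14 | 14 | CC1 |
| bbtas | 7 | 7 | 7 | 9 | 9 | CC0 |
| beecount | 22 | 22 | 17 | 13 | 13 | CC1 |
| cse | 47 | 73 | 43 | 18 | 18 | CC1 |
| dk14 | 19 | 30 | 13 | 12 | 12 | CC1 |
| dk15 | 18 | 19 | 15 | 11 | 11 | CC1 |
| dk16 | 17 | 36 | 14 | 14 | 14 | CC1 |
| dk17 | 7 | 14 | 7 | 9 | 9 | CC0 |
| dk27 | 4 | 6 | 5 | 8 | 8 | CC0 |
| dk512 | 11 | 11 | 10 | 14 | 14 | CC0 |
| donfile | 33 | 33 | 26 | 21 | 21 | CC1 |
| ex1 | 79 | 83 | 62 | 28 | 24 | CC2 |
| ex2 | 11 | 11 | 10 | 11 | 11 | CC1 |
| ex3 | 11 | 11 | 11 | 16 | 16 | CC0 |
| ex4 | 21 | 19 | 18 | 12 | 12 | CC1 |
| ex5 | 11 | 11 | 11 | 15 | 15 | CC0 |
| ex6 | 29 | 41 | 27 | 21 | 21 | CC1 |
| ex7 | 6 | 7 | 6 | 10 | 10 | CC1 |
| keyb | 50 | 68 | 47 | 28 | 28 | CC1 |
| kirkman | 54 | 70 | 51 | 28 | 22 | CC2 |
| lion | 4 | 7 | 4 | 10 | 10 | CC0 |
| lion9 | 8 | 13 | 7 | 12 | 12 | CC0 |
| mark1 | 28 | 28 | 25 | 22 | 22 | CC1 |
| mc | 7 | 10 | 7 | 12 | 12 | CC0 |
| modulo12 | 8 | 8 | 8 | 11 | 11 | CC0 |
| opus | 33 | 33 | 27 | 20 | 20 | CC1 |
| planet | 138 | 138 | 95 | 76 | 68 | CC2 |
| planet1 | 138 | 138 | 95 | 76 | 68 | CC2 |
| pma | 102 | 102 | 94 | 74 | 62 | CC2 |
| s1 | 73 | 107 | 69 | 52 | 48 | CC2 |
| s1488 | 132 | 139 | 116 | 86 | 79 | CC2 |
| s1494 | 134 | 140 | 118 | 92 | 83 | CC2 |
| s1a | 57 | 89 | 51 | 42 | 35 | CC2 |
| s208 | 23 | 42 | 21 | 20 | 18 | CC2 |
| s27 | 10 | 22 | 10 | 12 | 12 | CC1 |
| s386 | 33 | 46 | 29 | 31 | 31 | CC1 |
| s420 | 29 | 50 | 28 | 24 | 20 | CC4 |
| s510 | 67 | 67 | 51 | 42 | 36 | CC4 |
| s820 | 13 | 13 | 13 | 14 | 14 | CC1 |
| s832 | 106 | 100 | 86 | 70 | 62 | CC4 |
| s840 | 98 | 97 | 80 | 68 | 56 | CC4 |
| sand | 143 | 143 | 125 | 99 | 83 | CC3 |
| shiftreg | 3 | 7 | 3 | 8 | 8 | CC0 |
| sse | 40 | 44 | 37 | 38 | 38 | CC1 |
| styr | 102 | 129 | 90 | 81 | 79 | CC2 |
| tma | 52 | 46 | 46 | 41 | 36 | CC2 |
| Total | 2099 | 2395 | 1780 | 1457 | 1337 | |
| Percentage, % | 156.99 | 179.13 | 133.13 | 108.98 | 100.00 | |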
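
The "Percentage, %" row of Table 7 appears to be each method's total LUT count divided by the $MP_TY$ total. The short Python sketch below reproduces that arithmetic from the totals in the table; the names `TOTALS` and `relative_percent` are hypothetical, and taking $MP_TY$ as the 100% baseline is inferred from the table rather than stated next to it.

```python
from typing import Dict

# "Total" row of Table 7 (LUT counts summed over all benchmarks).
TOTALS: Dict[str, int] = {
    "Auto": 2099,
    "One-Hot": 2395,
    "JEDI": 1780,
    "MPY": 1457,
    "MPTY": 1337,
}


def relative_percent(totals: Dict[str, int], baseline: str = "MPTY") -> Dict[str, float]:
    """Express every total as a percentage of the baseline total, to 2 decimals."""
    base = totals[baseline]
    return {name: round(100.0 * value / base, 2) for name, value in totals.items()}


if __name__ == "__main__":
    for name, pct in relative_percent(TOTALS).items():
        print(f"{name:8s} {pct:7.2f}")
```

The MPY entry exceeds the baseline by 8.98 percentage points, which matches the average LUT-count gain over MPY FSMs quoted in the abstract; the same function applied to the MPY and $MP_TY$ totals of Table 9 (999 and 879) gives the 13.65% figure for the more complex benchmarks.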
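Table 8. Experimental results (the LUT counts for classes CC0-CC1).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 21 | 21 | 14 | 12 | 12 | CC1 |
| bbsse | 40 | 44 | 31 | 14 | 14 | CC1 |
| bbtas | 7 | 7 | 7 | 9 | 9 | CC0 |
| beecount | 22 | 22 | 17 | 13 | 13 | CC1 |
| cse | 47 | 73 | 43 | 18 | 18 | CC1 |
| dk14 | 19 | 30 | 13 | 12 | 12 | CC1 |
| dk15 | 18 | 19 | 15 | 11 | 11 | CC1 |
| dk16 | 17 | 36 | 14 | 14 | 14 | CC1 |
| dk17 | 7 | 14 | 7 | 9 | 9 | CC0 |
| dk27 | 4 | 6 | 5 | 8 | 8 | CC0 |
| dk512 | 11 | 11 | 10 | 14 | 14 | CC0 |
| donfile | 33 | 33 | 26 | 21 | 21 | CC1 |
| ex2 | 11 | 11 | 10 | 11 | 11 | CC1 |
| ex3 | 11 | 11 | 11 | 16 | 16 | CC0 |
| ex4 | 21 | 19 | 18 | 12 | 12 | CC1 |
| ex5 | 11 | 11 | 11 | 15 | 15 | CC0 |
| ex6 | 29 | 41 | 27 | 21 | 21 | CC1 |
| ex7 | 6 | 7 | 6 | 10 | 10 | CC1 |
| keyb | 50 | 68 | 47 | 28 | 28 | CC1 |
| lion | 4 | 7 | 4 | 10 | 10 | CC0 |
| lion9 | 8 | 13 | 7 | 12 | 12 | CC0 |
| mark1 | 28 | 28 | 25 | 22 | 22 | CC1 |
| mc | 7 | 10 | 7 | 12 | 12 | CC0 |
| modulo12 | 8 | 8 | 8 | 11 | 11 | CC0 |
| opus | 33 | 33 | 27 | 20 | 20 | CC1 |
| s27 | 10 | 22 | 10 | 12 | 12 | CC1 |
| s386 | 33 | 46 | 29 | 31 | 31 | CC1 |
| s820 | 13 | 13 | 13 | 14 | 14 | CC1 |
| shiftreg | 3 | 7 | 3 | 8 | 8 | CC0 |
| sse | 40 | 44 | 37 | 38 | 38 | CC1 |
| Total | 572 | 715 | 502 | 458 | 458 | |
| Percentage, % | 124.89 | 156.11 | 109.61 | 100.00 | 100.00 | |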
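Table 9. Experimental results (the LUT counts for classes CC2-CC4).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| ex1 | 79 | 83 | 62 | 28 | 24 | CC2 |
| kirkman | 54 | 70 | 51 | 28 | 22 | CC2 |
| planet | 138 | 138 | 95 | 76 | 68 | CC2 |
| planet1 | 138 | 138 | 95 | 76 | 68 | CC2 |
| pma | 102 | 102 | 94 | 74 | 62 | CC2 |
| s1 | 73 | 107 | 69 | 52 | 48 | CC2 |
| s1488 | 132 | 139 | 116 | 86 | 79 | CC2 |
| s1494 | 134 | 140 | 118 | 92 | 83 | CC2 |
| s1a | 57 | 89 | 51 | 42 | 35 | CC2 |
| s208 | 23 | 42 | 21 | 20 | 18 | CC2 |
| s420 | 29 | 50 | 28 | 24 | 20 | CC4 |
| s510 | 67 | 67 | 51 | 42 | 36 | CC4 |
| s832 | 106 | 100 | 86 | 70 | 62 | CC4 |
| s840 | 98 | 97 | 80 | 68 | 56 | CC4 |
| sand | 143 | 143 | 125 | 99 | 83 | CC3 |
| styr | 102 | 129 | 90 | 81 | 79 | CC2 |
| tma | 52 | 46 | 46 | 41 | 36 | CC2 |
| Total | 1527 | 1680 | 1278 | 999 | 879 | |
| Percentage, % | 173.72 | 191.13 | 145.39 | 113.65 | 100.00 | |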
Table 10. Influence of input register on cycle time and consumed power.

| $L$ | Power [W] | Data Path Delay [ns] |
|---|---|---|
| 1 | 0.356 | 3.471 |
| 2 | 0.367 | 3.599 |
| 3 | 0.380 | 3.603 |
| 4 | 0.392 | 3.640 |
| 5 | 0.406 | 3.667 |
| 6 | 0.418 | 3.688 |
| 7 | 0.431 | 3.729 |
| 8 | 0.448 | 3.793 |
| 9 | 0.462 | 3.800 |
| 10 | 0.477 | 3.705 |
| 11 | 0.491 | 3.767 |
| 12 | 0.511 | 3.898 |
| 18 | 0.608 | 4.112 |
| 19 | 0.623 | 4.113 |
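Table 11. Experimental results (the cycle time, nanoseconds).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 8.811 | 8.811 | 8.352 | 5.214 | 5.214 | CC1 |
| bbsse | 10.096 | 9.642 | 9.213 | 5.226 | 5.226 | CC1 |
| bbtas | 8.497 | 8.497 | 8.451 | 5.308 | 5.308 | CC0 |
| beecount | 9.605 | 9.605 | 8.941 | 5.373 | 5.373 | CC1 |
| cse | 10.558 | 9.840 | 9.343 | 5.453 | 5.453 | CC1 |
| dk14 | 8.821 | 9.395 | 8.762 | 5.839 | 5.839 | CC1 |
| dk15 | 8.797 | 8.998 | 8.735 | 5.219 | 5.219 | CC1 |
| dk16 | 9.491 | 9.320 | 8.672 | 5.245 | 5.245 | CC1 |
| dk17 | 8.617 | 9.587 | 8.617 | 5.400 | 5.400 | CC0 |
| dk27 | 8.325 | 8.424 | 8.369 | 5.195 | 5.195 | CC0 |
| dk512 | 8.566 | 8.566 | 8.477 | 4.119 | 4.119 | CC0 |
| donfile | 9.033 | 9.034 | 8.509 | 5.168 | 5.168 | CC1 |
| ex1 | 10.425 | 10.955 | 9.454 | 5.821 | 5.741 | CC2 |
| ex2 | 8.635 | 8.635 | 8.596 | 5.624 | 5.624 | CC1 |
| ex3 | 8.731 | 8.731 | 8.707 | 5.931 | 5.931 | CC0 |
| ex4 | 9.214 | 9.315 | 8.874 | 5.481 | 5.481 | CC1 |
| ex5 | 9.147 | 9.147 | 9.119 | 5.425 | 5.425 | CC0 |
| ex6 | 9.564 | 9.772 | 9.330 | 5.369 | 5.369 | CC1 |
| ex7 | 8.598 | 8.578 | 8.584 | 5.200 | 5.200 | CC1 |
| keyb | 10.121 | 10.699 | 9.666 | 5.265 | 5.265 | CC1 |
| kirkman | 10.971 | 10.392 | 10.280 | 5.612 | 5.482 | CC2 |
| lion | 8.539 | 8.501 | 8.541 | 6.062 | 6.062 | CC0 |
| lion9 | 8.470 | 8.998 | 8.444 | 5.270 | 5.270 | CC0 |
| mark1 | 9.825 | 9.825 | 9.343 | 6.395 | 6.395 | CC1 |
| mc | 8.688 | 8.719 | 8.682 | 6.099 | 6.099 | CC0 |
| modulo12 | 8.302 | 8.302 | 8.299 | 5.928 | 5.928 | CC0 |
| opus | 9.684 | 9.684 | 9.275 | 5.322 | 5.322 | CC1 |
| planet | 11.264 | 11.264 | 9.073 | 6.018 | 5.878 | CC2 |
| planet1 | 11.264 | 11.264 | 9.073 | 6.018 | 5.834 | CC2 |
| pma | 10.634 | 10.634 | 9.681 | 6.101 | 6.101 | CC2 |
| s1 | 10.623 | 11.154 | 10.156 | 5.830 | 5.707 | CC2 |
| s1488 | 11.013 | 11.372 | 10.155 | 6.432 | 6.206 | CC2 |
| s1494 | 10.487 | 10.654 | 9.878 | 5.723 | 5.511 | CC2 |
| s1a | 10.313 | 9.462 | 9.704 | 5.689 | 5.511 | CC2 |
| s208 | 9.503 | 9.434 | 9.361 | 6.125 | 5.835 | CC2 |
| s27 | 8.672 | 8.862 | 8.662 | 6.387 | 6.387 | CC1 |
| s386 | 9.676 | 9.494 | 9.311 | 6.164 | 6.164 | CC1 |
| s420 | 9.864 | 9.780 | 9.755 | 5.868 | 6.028 | CC4 |
| s510 | 9.742 | 9.742 | 9.155 | 5.324 | 5.834 | CC4 |
| s820 | 10.691 | 10.641 | 9.775 | 5.726 | 5.726 | CC1 |
| s832 | 10.975 | 10.638 | 9.866 | 6.724 | 6.401 | CC4 |
| s840 | 9.195 | 9.228 | 9.158 | 6.232 | 5.882 | CC4 |
| sand | 12.390 | 12.390 | 11.652 | 7.221 | 7.087 | CC3 |
| shiftreg | 8.302 | 7.265 | 7.091 | 5.564 | 5.564 | CC0 |
| sse | 10.096 | 9.642 | 9.455 | 5.561 | 5.561 | CC1 |
| styr | 11.067 | 11.497 | 10.666 | 5.921 | 5.719 | CC2 |
| tma | 9.831 | 10.495 | 9.821 | 5.702 | 5.596 | CC2 |
| Total | 453.73 | 454.88 | 431.08 | 267.89 | 265.88 | |
| Percentage, % | 170.65 | 171.08 | 162.13 | 100.76 | 100.00 | |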
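Table 12. Cycle times for classes CC0-CC1 (nanoseconds).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 8.811 | 8.811 | 8.352 | 5.214 | 5.214 | CC1 |
| bbsse | 10.096 | 9.642 | 9.213 | 5.226 | 5.226 | CC1 |
| bbtas | 8.497 | 8.497 | 8.451 | 5.308 | 5.308 | CC0 |
| beecount | 9.605 | 9.605 | 8.941 | 5.373 | 5.373 | CC1 |
| cse | 10.558 | 9.840 | 9.343 | 5.453 | 5.453 | CC1 |
| dk14 | 8.821 | 9.395 | 8.762 | 5.839 | 5.839 | CC1 |
| dk15 | 8.797 | 8.998 | 8.735 | 5.219 | 5.219 | CC1 |
| dk16 | 9.491 | 9.320 | 8.672 | 5.245 | 5.245 | CC1 |
| dk17 | 8.617 | 9.587 | 8.617 | 5.400 | 5.400 | CC0 |
| dk27 | 8.325 | 8.424 | 8.369 | 5.195 | 5.195 | CC0 |
| dk512 | 8.566 | 8.566 | 8.477 | 4.119 | 4.119 | CC0 |
| donfile | 9.033 | 9.034 | 8.509 | 5.168 | 5.168 | CC1 |
| ex2 | 8.635 | 8.635 | 8.596 | 5.624 | 5.624 | CC1 |
| ex3 | 8.731 | 8.731 | 8.707 | 5.931 | 5.931 | CC0 |
| ex4 | 9.214 | 9.315 | 8.874 | 5.481 | 5.481 | CC1 |
| ex5 | 9.147 | 9.147 | 9.119 | 5.425 | 5.425 | CC0 |
| ex6 | 9.564 | 9.772 | 9.330 | 5.369 | 5.369 | CC1 |
| ex7 | 8.598 | 8.578 | 8.584 | 5.200 | 5.200 | CC1 |
| keyb | 10.121 | 10.699 | 9.666 | 5.265 | 5.265 | CC1 |
| lion | 8.539 | 8.501 | 8.541 | 6.062 | 6.062 | CC0 |
| lion9 | 8.470 | 8.998 | 8.444 | 5.270 | 5.270 | CC0 |
| mark1 | 9.825 | 9.825 | 9.343 | 6.395 | 6.395 | CC1 |
| mc | 8.688 | 8.719 | 8.682 | 6.099 | 6.099 | CC0 |
| modulo12 | 8.302 | 8.302 | 8.299 | 5.928 | 5.928 | CC0 |
| opus | 9.684 | 9.684 | 9.275 | 5.322 | 5.322 | CC1 |
| s27 | 8.672 | 8.862 | 8.662 | 6.387 | 6.387 | CC1 |
| s386 | 9.676 | 9.494 | 9.311 | 6.164 | 6.164 | CC1 |
| s820 | 10.691 | 10.641 | 9.775 | 5.726 | 5.726 | CC1 |
| shiftreg | 8.302 | 7.265 | 7.091 | 5.564 | 5.564 | CC0 |
| sse | 10.096 | 9.642 | 9.455 | 5.561 | 5.561 | CC1 |
| Total | 274.17 | 274.53 | 264.20 | 165.53 | 165.53 | |
| Percentage, % | 165.63 | 165.85 | 159.60 | 100.00 | 100.00 | |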
Table 13. Cycle times for classes CC2-CC4 (nanoseconds).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| ex1 | 10.425 | 10.955 | 9.454 | 5.821 | 5.741 | CC2 |
| kirkman | 10.971 | 10.392 | 10.280 | 5.612 | 5.482 | CC2 |
| planet | 11.264 | 11.264 | 9.073 | 6.018 | 5.878 | CC2 |
| planet1 | 11.264 | 11.264 | 9.073 | 6.018 | 5.834 | CC2 |
| pma | 10.634 | 10.634 | 9.681 | 6.101 | 6.101 | CC2 |
| s1 | 10.623 | 11.154 | 10.156 | 5.830 | 5.707 | CC2 |
| s1488 | 11.013 | 11.372 | 10.155 | 6.432 | 6.206 | CC2 |
| s1494 | 10.487 | 10.654 | 9.878 | 5.723 | 5.511 | CC2 |
| s1a | 10.313 | 9.462 | 9.704 | 5.689 | 5.511 | CC2 |
| s208 | 9.503 | 9.434 | 9.361 | 6.125 | 5.835 | CC2 |
| s420 | 9.864 | 9.780 | 9.755 | 5.868 | 6.028 | CC4 |
| s510 | 9.742 | 9.742 | 9.155 | 5.324 | 5.834 | CC4 |
| s832 | 10.975 | 10.638 | 9.866 | 6.724 | 6.401 | CC4 |
| s840 | 9.195 | 9.228 | 9.158 | 6.232 | 5.882 | CC4 |
| sand | 12.390 | 12.390 | 11.652 | 7.221 | 7.087 | CC3 |
| styr | 11.067 | 11.497 | 10.666 | 5.921 | 5.719 | CC2 |
| tma | 9.831 | 10.495 | 9.821 | 5.702 | 5.596 | CC2 |
| Total | 179.56 | 180.36 | 166.89 | 102.36 | 100.35 | |
| Percentage, % | 178.93 | 179.72 | 166.30 | 102.00 | 100.00 | |
Table 14. Experimental results (the maximum operating frequency, MHz).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 113.496 | 113.496 | 119.727 | 191.809 | 191.809 | CC1 |
| bbsse | 99.049 | 103.713 | 108.539 | 191.342 | 191.342 | CC1 |
| bbtas | 117.687 | 117.687 | 118.336 | 188.389 | 188.389 | CC0 |
| beecount | 104.112 | 104.112 | 111.839 | 186.111 | 186.111 | CC1 |
| cse | 94.713 | 101.626 | 107.030 | 183.399 | 183.399 | CC1 |
| dk14 | 113.364 | 106.439 | 114.134 | 171.260 | 171.260 | CC1 |
| dk15 | 113.675 | 111.137 | 114.487 | 191.626 | 191.626 | CC1 |
| dk16 | 105.362 | 107.294 | 115.316 | 190.654 | 190.654 | CC1 |
| dk17 | 116.049 | 104.308 | 116.049 | 185.192 | 185.192 | CC0 |
| dk27 | 120.122 | 118.709 | 119.494 | 192.487 | 192.487 | CC0 |
| dk512 | 116.740 | 116.740 | 117.963 | 242.792 | 242.792 | CC0 |
| donfile | 110.706 | 110.696 | 117.517 | 193.504 | 193.504 | CC1 |
| ex1 | 95.922 | 91.281 | 105.777 | 171.796 | 174.190 | CC2 |
| ex2 | 115.808 | 115.808 | 116.340 | 177.799 | 177.799 | CC1 |
| ex3 | 114.536 | 114.536 | 114.846 | 168.594 | 168.594 | CC0 |
| ex4 | 108.530 | 107.352 | 112.690 | 182.443 | 182.443 | CC1 |
| ex5 | 109.327 | 109.327 | 109.661 | 184.328 | 184.328 | CC0 |
| ex6 | 104.556 | 102.333 | 107.183 | 186.268 | 186.268 | CC1 |
| ex7 | 116.306 | 116.576 | 116.495 | 192.304 | 192.304 | CC1 |
| keyb | 98.806 | 93.466 | 103.453 | 189.921 | 189.921 | CC1 |
| kirkman | 91.148 | 96.232 | 97.272 | 178.181 | 182.406 | CC2 |
| lion | 117.110 | 117.634 | 117.083 | 164.969 | 164.969 | CC0 |
| lion9 | 118.065 | 111.136 | 118.421 | 189.756 | 189.756 | CC0 |
| mark1 | 101.781 | 101.781 | 107.032 | 156.361 | 156.361 | CC1 |
| mc | 115.102 | 114.694 | 115.174 | 163.958 | 163.958 | CC0 |
| modulo12 | 120.454 | 120.454 | 120.498 | 168.696 | 168.696 | CC0 |
| opus | 103.265 | 103.265 | 107.818 | 187.911 | 187.911 | CC1 |
| planet | 88.777 | 88.777 | 110.222 | 166.182 | 170.140 | CC2 |
| planet1 | 88.777 | 88.777 | 110.222 | 166.159 | 171.417 | CC2 |
| pma | 94.039 | 94.039 | 103.293 | 163.902 | 163.902 | CC2 |
| s1 | 94.134 | 89.653 | 98.465 | 171.535 | 175.215 | CC2 |
| s1488 | 90.800 | 87.934 | 98.472 | 155.481 | 161.143 | CC2 |
| s1494 | 95.357 | 93.861 | 101.236 | 174.744 | 181.467 | CC2 |
| s1a | 96.963 | 105.687 | 103.048 | 175.776 | 181.467 | CC2 |
| s208 | 105.231 | 106.000 | 106.825 | 163.266 | 171.380 | CC2 |
| s27 | 115.314 | 112.842 | 115.449 | 156.566 | 156.566 | CC1 |
| s386 | 103.348 | 105.329 | 107.401 | 162.231 | 162.231 | CC1 |
| s420 | 101.378 | 102.249 | 102.514 | 170.420 | 165.897 | CC4 |
| s510 | 102.648 | 102.648 | 109.226 | 187.816 | 171.398 | CC4 |
| s820 | 93.537 | 93.975 | 102.300 | 174.643 | 174.643 | CC1 |
| s832 | 91.117 | 94.001 | 101.354 | 148.725 | 156.231 | CC4 |
| s840 | 108.755 | 108.364 | 109.196 | 160.471 | 170.020 | CC4 |
| sand | 80.711 | 80.711 | 85.821 | 138.478 | 141.096 | CC3 |
| shiftreg | 120.454 | 137.645 | 141.028 | 179.726 | 179.726 | CC0 |
| sse | 99.049 | 103.713 | 105.760 | 179.809 | 179.809 | CC1 |
| styr | 90.359 | 86.979 | 93.754 | 168.899 | 174.865 | CC2 |
| tma | 101.719 | 95.284 | 101.819 | 175.381 | 178.703 | CC2 |
| Total | 4918.26 | 4910.30 | 5157.58 | 8312.06 | 8365.78 | |
| Percentage, % | 58.79 | 58.70 | 61.65 | 99.36 | 100.00 | |
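
The frequencies in Table 14 are consistent with the reciprocal of the cycle times in Table 11, $f\,[\mathrm{MHz}] = 1000 / T\,[\mathrm{ns}]$. The Python sketch below checks this on a few rows; it is an illustration under that assumption, and the names `max_frequency_mhz` and `SAMPLES` are hypothetical. Because the cycle times are reported to three decimals, the recomputed values agree with the table only to within rounding.

```python
from typing import List, Tuple


def max_frequency_mhz(cycle_time_ns: float) -> float:
    """Convert a cycle time in nanoseconds to a frequency in MHz."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns


# (benchmark, method, cycle time from Table 11 [ns], frequency from Table 14 [MHz])
SAMPLES: List[Tuple[str, str, float, float]] = [
    ("bbara", "Auto", 8.811, 113.496),
    ("bbara", "MPTY", 5.214, 191.809),
    ("dk512", "MPTY", 4.119, 242.792),
    ("sand", "Auto", 12.390, 80.711),
    ("shiftreg", "One-Hot", 7.265, 137.645),
]

if __name__ == "__main__":
    for name, method, t_ns, f_table in SAMPLES:
        f_calc = max_frequency_mhz(t_ns)
        print(f"{name:9s} {method:8s} computed {f_calc:8.3f} MHz, table {f_table:8.3f} MHz")
```

The small residual differences are what one would expect if the tabulated frequencies were derived from unrounded cycle times.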
Table 15. Experimental results (the frequencies for classes CC0-CC1, MHz).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| bbara | 113.496 | 113.496 | 119.727 | 191.809 | 191.809 | CC1 |
| bbsse | 99.049 | 103.713 | 108.539 | 191.342 | 191.342 | CC1 |
| bbtas | 117.687 | 117.687 | 118.336 | 188.389 | 188.389 | CC0 |
| beecount | 104.112 | 104.112 | 111.839 | 186.111 | 186.111 | CC1 |
| cse | 94.713 | 101.626 | 107.030 | 183.399 | 183.399 | CC1 |
| dk14 | 113.364 | 106.439 | 114.134 | 171.260 | 171.260 | CC1 |
| dk15 | 113.675 | 111.137 | 114.487 | 191.626 | 191.626 | CC1 |
| dk16 | 105.362 | 107.294 | 115.316 | 190.654 | 190.654 | CC1 |
| dk17 | 116.049 | 104.308 | 116.049 | 185.192 | 185.192 | CC0 |
| dk27 | 120.122 | 118.709 | 119.494 | 192.487 | 192.487 | CC0 |
| dk512 | 116.740 | 116.740 | 117.963 | 242.792 | 242.792 | CC0 |
| donfile | 110.706 | 110.696 | 117.517 | 193.504 | 193.504 | CC1 |
| ex2 | 115.808 | 115.808 | 116.340 | 177.799 | 177.799 | CC1 |
| ex3 | 114.536 | 114.536 | 114.846 | 168.594 | 168.594 | CC0 |
| ex4 | 108.530 | 107.352 | 112.690 | 182.443 | 182.443 | CC1 |
| ex5 | 109.327 | 109.327 | 109.661 | 184.328 | 184.328 | CC0 |
| ex6 | 104.556 | 102.333 | 107.183 | 186.268 | 186.268 | CC1 |
| ex7 | 116.306 | 116.576 | 116.495 | 192.304 | 192.304 | CC1 |
| keyb | 98.806 | 93.466 | 103.453 | 189.921 | 189.921 | CC1 |
| lion | 117.110 | 117.634 | 117.083 | 164.969 | 164.969 | CC0 |
| lion9 | 118.065 | 111.136 | 118.421 | 189.756 | 189.756 | CC0 |
| mark1 | 101.781 | 101.781 | 107.032 | 156.361 | 156.361 | CC1 |
| mc | 115.102 | 114.694 | 115.174 | 163.958 | 163.958 | CC0 |
| modulo12 | 120.454 | 120.454 | 120.498 | 168.696 | 168.696 | CC0 |
| opus | 103.265 | 103.265 | 107.818 | 187.911 | 187.911 | CC1 |
| s27 | 115.314 | 112.842 | 115.449 | 156.566 | 156.566 | CC1 |
| s386 | 103.348 | 105.329 | 107.401 | 162.231 | 162.231 | CC1 |
| s820 | 93.537 | 93.975 | 102.300 | 174.643 | 174.643 | CC1 |
| shiftreg | 120.454 | 137.645 | 141.028 | 179.726 | 179.726 | CC0 |
| sse | 99.049 | 103.713 | 105.760 | 179.809 | 179.809 | CC1 |
| Total | 3300.42 | 3297.82 | 3419.06 | 5474.85 | 5474.85 | |
| Percentage, % | 60.28 | 60.24 | 62.45 | 100.00 | 100.00 | |
Table 16. Experimental results (the frequencies for classes CC2-CC4, MHz).

| Benchmark | Auto | One-Hot | JEDI | MPY | $MP_TY$ | Class of Complexity |
|---|---|---|---|---|---|---|
| ex1 | 95.922 | 91.281 | 105.777 | 171.796 | 174.190 | CC2 |
| kirkman | 91.148 | 96.232 | 97.272 | 178.181 | 182.406 | CC2 |
| planet | 88.777 | 88.777 | 110.222 | 166.182 | 170.140 | CC2 |
| planet1 | 88.777 | 88.777 | 110.222 | 166.159 | 171.417 | CC2 |
| pma | 94.039 | 94.039 | 103.293 | 163.902 | 163.902 | CC2 |
| s1 | 94.134 | 89.653 | 98.465 | 171.535 | 175.215 | CC2 |
| s1488 | 90.800 | 87.934 | 98.472 | 155.481 | 161.143 | CC2 |
| s1494 | 95.357 | 93.861 | 101.236 | 174.744 | 181.467 | CC2 |
| s1a | 96.963 | 105.687 | 103.048 | 175.776 | 181.467 | CC2 |
| s208 | 105.231 | 106.000 | 106.825 | 163.266 | 171.380 | CC2 |
| s420 | 101.378 | 102.249 | 102.514 | 170.420 | 165.897 | CC4 |
| s510 | 102.648 | 102.648 | 109.226 | 187.816 | 171.398 | CC4 |
| s832 | 91.117 | 94.001 | 101.354 | 148.725 | 156.231 | CC4 |
| s840 | 108.755 | 108.364 | 109.196 | 160.471 | 170.020 | CC4 |
| sand | 80.711 | 80.711 | 85.821 | 138.478 | 141.096 | CC3 |
| styr | 90.359 | 86.979 | 93.754 | 168.899 | 174.865 | CC2 |
| tma | 101.719 | 95.284 | 101.819 | 175.381 | 178.703 | CC2 |
| Total | 1617.83 | 1612.48 | 1738.51 | 2837.21 | 2890.94 | |
| Percentage, % | 55.96 | 55.78 | 60.14 | 98.14 | 100.00 | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Barkalov, A.; Titarenko, L.; Mazurkiewicz, M.; Krzywicki, K. Improving the Spatial Characteristics of Three-Level LUT-Based Mealy FSM Circuits. Electronics 2023, 12, 1133. https://doi.org/10.3390/electronics12051133