Article

Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs

1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Mathematics and Information Technology, Vasyl’ Stus Donetsk National University, 21, 600-richya str., 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
* Author to whom correspondence should be addressed.
Electronics 2020, 9(11), 1859; https://doi.org/10.3390/electronics9111859
Submission received: 9 September 2020 / Revised: 14 October 2020 / Accepted: 2 November 2020 / Published: 5 November 2020
(This article belongs to the Section Circuit and Signal Processing)

Abstract

Contemporary digital systems include many varying sequential blocks. In this article, we discuss the case when Mealy finite state machines (FSMs) describe the behavior of sequential blocks. In many cases, performance is the most important characteristic of an FSM circuit. We propose a method for increasing the operating frequency of multi-level look-up table (LUT)-based Mealy FSMs. The main idea of the proposed approach is to use two methods of structural decomposition together: (1) the known method of transforming codes of collections of outputs into FSM state codes and (2) a new method of extending state codes. The proposed approach produces FPGA-based FSMs having three levels of logic combined through a system of regular interconnections. Each function at every level of logic is implemented using a single LUT. An example of synthesizing a Mealy FSM with the proposed architecture is shown. The effectiveness of the proposed method was confirmed by experimental studies based on standard benchmark FSMs. The results show that FSM circuits based on the proposed approach have a higher operating frequency than can be obtained using the other investigated methods. The maximum operating frequency is improved by an average of 3.18 to 12.57 percent. These improvements are accompanied by a small increase in LUT count.

1. Introduction

Digital systems are widely used in our daily life [1]. They can be viewed as combinations of various sequential and combinational blocks [2,3]. To implement the circuit of a sequential block, it is necessary to formally describe its behavior. Very often, models of finite state machines (FSMs) [4,5] are used for this purpose. The quality of an FSM circuit is determined by a combination of characteristics such as the chip area occupied by the circuit, the maximum operating frequency and the power consumption. As follows from [6], these characteristics are directly related. To reduce the occupied chip area, various methods of structural decomposition can be applied [7]. These methods produce circuits with multiple levels of logic, which are significantly slower than their single-level counterparts.
However, performance is very often a critical factor for a digital system. This is true, for example, for real-time embedded systems [8,9]. If a multi-level circuit does not provide the required performance, then the number of levels should be decreased. This conversion must be performed with as small an increase in the amount of used resources as possible. In this paper, we propose a solution to this problem for the case in which circuits of Mealy FSMs are implemented using field programmable gate arrays (FPGAs).
There are two models of FSMs, namely, Mealy and Moore FSMs [4,5]. Problems related to the synthesis of FSM circuits are discussed in a huge number of scientific articles and books. These works are mainly devoted to the synthesis and design of Mealy automata. This determined our choice of Mealy FSMs in the current research.
To optimize the characteristics of FSM circuits, a designer should use the main features of the context in which these circuits are implemented [2,10]. In this article, we consider methods of implementing FSM circuits in the context of field programmable gate arrays [11,12,13]. These chips are very popular devices for implementing digital systems [2,14,15,16,17,18], which explains our choice of FPGA-based Mealy FSMs as the research object. The current article deals with FSM circuits implemented using the look-up table (LUT) elements, flip-flops and programmable interconnections of FPGAs. Since Xilinx is the largest manufacturer of FPGA chips [13], we focus our research on its solutions.
A LUT is a single-output block having $S_L$ inputs [19,20]. If a Boolean function depends on up to $S_L$ Boolean variables, then its logic circuit includes only one LUT. However, a LUT has a very small number of inputs [11,13]. At the same time, FSMs can be represented by very complex systems of Boolean functions (SBFs) having dozens of arguments [4]. For LUT-based FSMs, this contradiction leads to the necessity of functional decomposition of initial SBFs [21]. In turn, the functional decomposition gives rise to FSM circuits having many logic levels and very complex interconnections [22,23].
To implement a LUT-based FSM circuit, it is necessary to execute the step of technology mapping [24,25,26,27]. The technology mapping is a very important stage of the FPGA-based design process [28]. Its outcome significantly determines the characteristics of a resulting FSM circuit.
As a rule, LUT-based circuits of sequential blocks use five components of the FPGA fabric: LUTs, synchronous memory elements (flip-flops), programmable interconnections, synchronization circuits and input–output blocks. Our current article is devoted to the synthesis of multi-level LUT-based circuits of Mealy FSMs obtained using the methods of structural decomposition. As follows from [24,29], it is very important to optimize the system of interconnections between the different elements of a circuit. The article [24] notes that the time delays of the interconnection system are starting to play a major role in comparison with logic delays. Additionally, more than 70% of the power dissipation is due to the interconnections [29]. Thus, optimizing the interconnections improves the main characteristics of LUT-based FSM circuits. This can be done, for example, by encoding the collections of outputs.
The main goal of our article is to increase the operating frequency of LUT-based Mealy FSM circuits. To achieve this goal, we try to reduce the number of levels of LUTs between the FSM inputs and FSM outputs. We define the number of levels of LUTs as the number of LUT elements connected in series on the longest path from the FSM inputs to the FSM outputs. Reducing the number of levels reduces the number of interconnections in the FSM circuit [24]. Since interconnections significantly affect performance [29], a simultaneous decrease in the number of levels of LUTs and the number of interconnections leads to a significant increase in frequency.
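The level count defined above is the number of LUTs on the longest combinational path. The following Python sketch computes it for a netlist stored as a dictionary. The netlist format and the name `lut_levels` are hypothetical, and flip-flop outputs (state variables) are assumed to be listed among the primary inputs so that the combinational graph is acyclic.

```python
from functools import lru_cache


def lut_levels(netlist, primary_inputs):
    """Number of LUTs on the longest path from primary inputs to any LUT output.

    netlist: dict mapping a LUT name to the list of signals driving its inputs.
    A signal is either a primary input (FSM input or state variable) or the
    name of another LUT. The combinational graph is assumed to be acyclic.
    """

    @lru_cache(maxsize=None)
    def depth(signal):
        if signal in primary_inputs:
            return 0  # inputs and flip-flop outputs start a path
        # one LUT plus the deepest of its driving signals
        return 1 + max(depth(s) for s in netlist[signal])

    return max(depth(lut) for lut in netlist)


# Hypothetical netlist with the chain g1 -> g2 -> y1
netlist = {
    "g1": ["x1", "x2", "x3"],
    "g2": ["g1", "x4"],
    "y1": ["g2", "T1"],
    "D1": ["x1", "T1"],
}
print(lut_levels(netlist, frozenset({"x1", "x2", "x3", "x4", "T1"})))
```

For this example the path x1 → g1 → g2 → y1 passes through three LUTs, so the circuit has three levels even though D1 needs only one.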
Research [19,20] has shown that there is no point in increasing the number of LUT inputs. If the number of inputs exceeds six, it violates the balance between the main characteristics of a LUT circuit. However, the increasing complexity of modern digital systems is accompanied by an increase in the number of arguments in representing FSM functions. Therefore, there is a need for new methods and improvements to existing methods of LUT-based FSM design.
The methods of structural decomposition [7] are designed to reduce the numbers of LUTs in FSM circuits. As a rule, FSM circuits with three levels of logic blocks require the smallest numbers of LUTs. However, three-level FSMs have a much lower operating frequency compared to their single-level counterparts. FSM circuits with two levels of logic blocks represent a compromise between the number of LUTs and the operating frequency. The main contribution of this paper is a novel design method aimed at increasing the operating frequency of two-level LUT-based Mealy FSMs. The main idea of the proposed approach is to use two methods of structural decomposition together: (1) the known method of transforming codes of collections of outputs into FSM state codes and (2) a new method of extending state codes. As a result, there are exactly three levels of LUTs in the part of the FSM circuit implementing the system of outputs. Additionally, the method produces FSM circuits with a regular system of interconnections, where each level of logic has its own unique systems of inputs and outputs. The proposed method yields FSM circuits that have slightly more LUTs and a higher operating frequency than their three-level counterparts [30]. The experimental results presented in the article show that the advantage of the proposed approach grows as the number of FSM inputs increases.
The rest of the article is organized in five sections. Section 2 presents the background of single-level LUT-based Mealy FSMs. Section 3 discusses the methods currently used in the design of FPGA-based FSMs. The main idea of our method is considered in Section 4. In Section 5, we discuss an example of synthesis and the main ways of improving the characteristics of the resulting FSM circuit. In Section 6, we present the results of research on the effectiveness of the proposed method for the benchmark FSMs from [31]. The article ends with a brief summary.

2. Single-Level LUT-Based Mealy FSMs

As follows from [13], FPGAs manufactured by Xilinx are based on “island-style” architecture [19,20]. The configurable logic blocks (CLBs) are “islands” surrounded by a “sea” of programmable interconnections that form a general routing matrix [13]. In this paper, we discuss a case of CLBs including LUTs and programmable flip-flops. The flip-flops are used to organize hidden distributed registers keeping FSM state codes [2]. A LUT-based CLB includes a LUT, a flip-flop and a multiplexer (Figure 1).
A LUT can implement any function $f_c$ that depends on up to $S_L$ arguments. A LUT is a combinational block; thus, the value of $f_c$ changes whenever the values of its arguments change. On the pulse of the synchronization clock, the current value of $f_c$ is written into the D flip-flop. The output of the flip-flop represents a registered function $f_R$. The multiplexer MX selects the appropriate form of the CLB output: the output $f_{CLB}$ is either combinational ($f_0 = 0$) or registered ($f_0 = 1$).
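To make the behavior of the CLB in Figure 1 concrete, the following Python sketch models a single CLB at the logic level. It is a simplification under our own assumptions: timing and all other resources of a real CLB are ignored, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CLB:
    """Simplified CLB of Figure 1: a LUT, a D flip-flop and a multiplexer MX."""

    s_l: int                    # number of LUT inputs, S_L
    truth_table: Sequence[int]  # 2**S_L entries: value of f_c per input combination
    f0: int = 0                 # MX select: 0 -> combinational output, 1 -> registered
    q: int = 0                  # flip-flop content, i.e. the registered function f_R

    def __post_init__(self):
        if len(self.truth_table) != 2 ** self.s_l:
            raise ValueError("truth table must have 2**S_L entries")

    def f_c(self, inputs: Sequence[int]) -> int:
        """Combinational LUT output; the first input is the most significant address bit."""
        if len(inputs) != self.s_l:
            raise ValueError("a LUT has exactly S_L inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.truth_table[address]

    def clock(self, inputs: Sequence[int]) -> None:
        """Clock pulse: the current value of f_c is written into the flip-flop."""
        self.q = self.f_c(inputs)

    def f_clb(self, inputs: Sequence[int]) -> int:
        """CLB output selected by MX."""
        return self.q if self.f0 else self.f_c(inputs)


# A 2-input XOR used once combinationally and once as a registered function.
comb = CLB(s_l=2, truth_table=[0, 1, 1, 0], f0=0)
reg = CLB(s_l=2, truth_table=[0, 1, 1, 0], f0=1)
print(comb.f_clb([1, 0]))  # follows the inputs immediately
reg.clock([1, 0])
print(reg.f_clb([0, 0]))   # keeps the value stored at the clock pulse
```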
An FSM circuit is represented by some SBF. For practical digital systems, an SBF can include around 50–70 literals [3,4]. However, a LUT has at most six inputs. This limitation makes it necessary to transform the SBFs representing FSM circuits. The transformation is executed using different methods of functional decomposition (FD) [32]. The FD-based transformation leads to FSM circuits with many levels of LUT-based CLBs and unordered (irregular) systems of interconnections, often called “spaghetti-type” interconnections [33].
A Mealy FSM is represented as a six-component vector $S = \langle X, Y, A, \delta, \lambda, a_1 \rangle$ [34]. The vector S includes a set of inputs $X = \{x_1, \ldots, x_L\}$, a set of outputs $Y = \{y_1, \ldots, y_N\}$, a set of internal states $A = \{a_1, \ldots, a_M\}$, a transition function $\delta$, an output function $\lambda$ and an initial state $a_1 \in A$. Various tools can be applied to represent the vector S. The most commonly used are graph-schemes of algorithms [3,34], binary decision diagrams [35,36], state transition graphs [4], as well as and-inverter graphs [37]. In this article, we use state transition tables (STTs) to represent Mealy FSMs.
An STT includes the following columns [4]: a current state $a_m$; a state of transition (a next state) $a_s$; an input signal $X_h$ (it determines a transition from $a_m$ to $a_s$); a collection of outputs $Y_h$ (it is generated during the transition from the current state into the next state). The column h includes the numbers of transitions ($h \in \{1, \ldots, H\}$). For example, a Mealy FSM $S_0$ is represented by the STT (Table 1).
As follows from Table 1, the FSM $S_0$ has two inputs, four outputs, three states and five transitions. From Table 1 we can find, for example, that $\delta(a_1, x_1) = a_2$ and $\lambda(a_1, x_1) = y_1$ (these formulae follow from the first row of Table 1). The following steps should be executed to construct the SBFs describing the logic circuits of FSMs [3,34]: (1) encoding the FSM states $a_m \in A$ by binary codes $K(a_m)$; (2) constructing the sets of state variables $T = \{T_1, \ldots, T_R\}$ and input memory functions (IMFs) $\Phi = \{D_1, \ldots, D_R\}$; and (3) constructing a direct structure table (DST). To encode the states $a_m \in A$, the step of state assignment should be executed [2].
In this paper, we use binary state assignment, where the number of state variables, R, is determined as
$$R = \lceil \log_2 M \rceil. \qquad (1)$$
The binary state assignment is used, for example, in the system SIS [38]. The number of bits of the state code can vary from the minimum value determined by (1) to the number of states, M. If $R = M$, then the corresponding state codes are one-hot codes. This style is used, for example, by the academic system ABC [37] from Berkeley.
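Formula (1) and the one-hot alternative can be compared with a short Python sketch. The ceiling of $\log_2 M$ is computed exactly with integer arithmetic, and at least one bit is returned for $M = 1$ (our assumption). The sequential assignment in `binary_codes` is only a hypothetical placeholder; SIS, ABC, JEDI and Vivado apply their own assignment heuristics.

```python
def state_bits(m: int, style: str = "binary") -> int:
    """Number of state variables R for an FSM with M states."""
    if m < 1:
        raise ValueError("an FSM has at least one state")
    if style == "binary":
        # R = ceil(log2 M), computed exactly with integers; at least one bit
        return max(1, (m - 1).bit_length())
    if style == "one-hot":
        return m  # R = M: one flip-flop per state
    raise ValueError(f"unknown encoding style: {style}")


def binary_codes(m: int) -> dict:
    """Straightforward binary state assignment: K(a_i) is i - 1 written in R bits."""
    r = state_bits(m)
    return {f"a{i}": format(i - 1, f"0{r}b") for i in range(1, m + 1)}


print(state_bits(6), state_bits(6, "one-hot"))  # 3 6
print(binary_codes(3))  # {'a1': '00', 'a2': '01', 'a3': '10'}
```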
A special state register (RG) keeps the FSM state codes. It is controlled by two internal pulses. The pulse Start loads the code of the initial state into RG. The pulse Clock determines the moment when the content of RG can be changed. For CLB-based FSMs, state registers are constructed from D flip-flops [2]. In this article, we also use state registers based on D flip-flops. On the pulse Clock, the functions $D_r \in \Phi$ change the content of RG.
After the state assignment, each state $a_m \in A$ is represented by its code $K(a_m)$. The Boolean systems representing an FSM circuit can be derived from a DST. Compared to the initial STT, a DST includes three additional columns: $K(a_m)$, $K(a_s)$ and $\Phi_h$. The column $\Phi_h$ includes the symbols $D_r \in \Phi$ corresponding to 1s in the code of the state $a_s$ from row h of the DST. A DST is the basis for finding the following SBFs:
$$\Phi = \Phi(T, X); \qquad (2)$$
$$Y = Y(T, X). \qquad (3)$$
These SBFs define the architecture of a Mealy FSM $U_1$, which is shown in Figure 2.
Let us analyze this architecture. The SBF (2) is implemented by Blockδ. This block includes the distributed state register RG, which is controlled by the IMFs (2) and by the common synchronization and reset pulses. The SBF (3) is implemented by Blockλ. Both blocks are implemented with CLBs (Figure 1).
Analysis of the systems $\Phi$ and $Y$ shows that they depend on the same variables. This is the main peculiarity of Mealy FSMs. Many design methods [7,39] exploit this feature to reduce the numbers of LUTs in circuits represented by SBFs (2) and (3).
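Turning an STT into a DST is mechanical once the state codes are known. The Python sketch below shows the idea on a hypothetical three-state FSM (the contents of Table 1 are not reproduced here); each STT row is assumed to be a tuple (a_m, a_s, X_h, Y_h), the leftmost code bit corresponds to $T_1$, and the helper name `build_dst` is our own.

```python
def build_dst(stt, codes):
    """Extend STT rows (a_m, a_s, X_h, Y_h) with K(a_m), K(a_s) and Phi_h.

    codes maps each state to its code string; the leftmost bit is T1, and
    D_r appears in Phi_h exactly when bit r of K(a_s) equals 1.
    """
    dst = []
    for h, (a_m, a_s, x_h, y_h) in enumerate(stt, start=1):
        k_as = codes[a_s]
        phi_h = [f"D{r}" for r, bit in enumerate(k_as, start=1) if bit == "1"]
        dst.append({"h": h, "a_m": a_m, "K(a_m)": codes[a_m], "a_s": a_s,
                    "K(a_s)": k_as, "X_h": x_h, "Y_h": y_h, "Phi_h": phi_h})
    return dst


# Hypothetical three-state FSM ("~" marks an inverted input, "1" an unconditional transition).
codes = {"a1": "00", "a2": "01", "a3": "10"}
stt = [("a1", "a2", "x1", ["y1"]),
       ("a1", "a3", "~x1", ["y2"]),
       ("a2", "a3", "1", ["y3"]),
       ("a3", "a1", "x2", []),
       ("a3", "a3", "~x2", ["y1", "y4"])]
for row in build_dst(stt, codes):
    print(row["h"], row["K(a_m)"], row["X_h"], row["Phi_h"], row["Y_h"])
```

Each printed row yields one product term for the SBFs (2) and (3): the conjunction of the literals of $K(a_m)$ and $X_h$ enters every $D_r$ listed in $\Phi_h$ and every output listed in $Y_h$.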

3. State-Of-The-Art

As a rule, the process of designing digital systems involves solving some optimization problems [2,4]. In the case of FPGA-based sequential blocks, these problems are the following [2,24]: (1) reducing the chip resources required to implement a LUT-based circuit; (2) decreasing the propagation time (increasing the maximum operating frequency); and (3) reducing the power consumption. Our current article is devoted to improving the maximum operating frequency of LUT-based Mealy FSMs.
The characteristics of FPGA-based FSM circuits can be improved due to optimal state assignment [2,8,37,38,39,40,41,42]. Additionally, this can be done using embedded memory blocks (EMBs) instead of LUT-based CLBs [43,44,45,46,47,48,49,50]. Let us analyze these approaches.
We call state codes optimal if they reduce the numbers of arguments in SBFs (2) and (3). For example, the number of arguments is significantly reduced by the algorithm JEDI [38], which is one of the best state assignment algorithms [2]. For this reason, we chose JEDI-based FSMs for comparison with FSMs based on our proposed approach.
Modern industrial CAD tools include various state assignment strategies. For example, the following state assignment methods are used in the Xilinx design tool Vivado [40]: automatic state assignment (auto), sequential encoding, one-hot encoding, Gray encoding and Johnson encoding. The same methods can be found in the Xilinx package XST [51].
One-hot state assignment is very popular in LUT-based design [41] because FPGAs include many programmable flip-flops. It increases the number of input memory functions compared with the value given by (1); however, these IMFs are much simpler than in the case of binary state assignment [2]. As follows from [41], it is better to use one-hot codes if an FSM has more than 16 states. However, the characteristics of LUT-based FSM circuits significantly depend on the number of inputs [2]. As follows from [42], binary state encoding produces better FSM circuits if $L \geq 10$. Since each approach is good under certain conditions, we compare both of these encoding styles with our proposed method. The binary state assignment method auto of Vivado is used as the baseline for comparison with the proposed method.
To reduce power consumption, it is very important to diminish the number of interconnections inside an FSM circuit. This, in turn, requires minimizing the numbers of arguments in SBFs (2) and (3) [2]. Thus, it is always useful to apply optimal state assignment to improve the characteristics of FSM circuits.
The second approach to optimizing CLB-based FSMs is to use EMBs instead of LUTs [47]. There are many design methods targeting EMB-based FSMs [47,48,49,52,53,54,55,56,57]. A survey of different methods of EMB-based design can be found in [47]. In the best case, only a single EMB is necessary to implement an FSM circuit [49]. However, if the number of arguments in systems (2) and (3) exceeds the maximum possible number of EMB address inputs, then an FSM is represented by a network of EMBs. To diminish the number of EMBs in such a network, it is necessary to implement some functions using LUTs [2,49].
Thus, an FSM circuit can be implemented as a network of EMBs, a network of LUTs, or a joint network of LUTs and EMBs. In this article, we discuss the second case, when FSM circuits are implemented using LUT-based CLBs. This approach makes sense if (1) all EMBs are used to implement other parts of a digital system or (2) the number of arguments in SBFs (2) and (3) exceeds 15 (the maximum possible number of address inputs of modern EMBs [11,12,13]).
Denote by $N_L(f_i)$ the number of literals [4] in the sum-of-products (SOP) of a function $f_i$ from systems (2) and (3). If the condition
$$N_L(f_i) \le S_L \quad (i \in \{1, \ldots, N + R\}) \qquad (4)$$
takes place, then the logic circuit of any function $f_i \in \Phi \cup Y$ is represented by exactly one LUT. If $N_L(f_i) > S_L$, then the corresponding logic circuit can be obtained using various methods of FD [21,23,27,35,36,48,58,59]. The FD can be viewed as a process during which decomposed functions are broken down into smaller and smaller components. When every component depends on no more than $S_L$ arguments, the process of FD for a given function is completed. Of course, this results in multi-level LUT-based circuits. For these circuits, it is typical that the same inputs $x_l \in X$ or state variables $T_r \in T$ appear on several logic levels. This significantly complicates the system of interconnections between the LUTs of FD-based FSM circuits, with all the ensuing consequences.
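Checking condition (4) is mechanical once the SOPs are known. In the Python sketch below, $N_L(f_i)$ is interpreted as the number of distinct arguments of the SOP, that is, the quantity that has to fit into the LUT inputs. This interpretation, the string format of the terms and the function names are our assumptions.

```python
def arguments(sop):
    """Distinct variables of an SOP given as a list of product terms.

    Each term is a string of literals separated by spaces, e.g. "T1 ~T2 x3",
    where "~" marks a negated literal.
    """
    return {literal.lstrip("~") for term in sop for literal in term.split()}


def fits_single_lut(sop, s_l=6):
    """Condition (4): the function can be implemented by exactly one LUT."""
    return len(arguments(sop)) <= s_l


# Hypothetical functions for a LUT with S_L = 6 inputs.
d1 = ["T1 ~T2 x1", "~T1 T2 x2", "T1 T2 ~x3"]
y1 = ["T1 ~T2 x1 x4", "~T1 T3 x5 ~x6", "T2 x7"]
print(sorted(arguments(d1)), fits_single_lut(d1))  # 5 arguments: one LUT suffices
print(len(arguments(y1)), fits_single_lut(y1))     # 8 arguments: FD is required
```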
In the best case, the LUT count of an FSM circuit is equal to $N + R$, that is, to the total number of outputs and input memory functions. However, if condition (4) is violated, the LUT count increases by $|\Psi|$, where $\Psi$ is a set of additional functions different from (2) and (3). These additional functions are the components of functions (2) and (3) produced during the process of FD. We do not discuss these methods in our article.
LUT counts in circuits of Mealy FSMs can be reduced using various methods of structural decomposition [7,39]. These methods eliminate the direct dependence of the functions $y_n \in Y$ and $D_r \in \Phi$ on the inputs $x_l \in X$. The methods of structural decomposition also introduce new functions $f_i \in \Psi$, which depend on the variables $x_l \in X$ and $T_r \in T$. The structural decomposition reduces LUT counts if
$$|\Psi| \ll N + R. \qquad (5)$$
These new functions are divided into subsystems having unique input and output variables. Each subsystem determines a separate LUT-based block of logic. When condition (5) takes place, the total LUT count for a decomposed FSM is significantly less than it is for the equivalent FSM $U_1$. The new functions are arguments of functions (2) and (3). If the condition
$$|\Psi| \ll L + R \qquad (6)$$
takes place, then the total LUT count of a decomposed FSM circuit is significantly less than it is for an equivalent multi-level circuit. A survey of different methods of structural decomposition is given in [7].
In this article, we discuss three known methods of structural decomposition [7,34]: replacement of inputs, encoding of outputs and transformation of codes of collections of outputs into state codes. Consider these approaches.
To reduce the LUT count, the inputs $x_l \in X$ can be replaced by additional variables $p_g \in P = \{p_1, \ldots, p_G\}$, where $G \ll L$ [34]. As a rule, the value of G is determined as [34]:
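$$G = \max(|X(a_1)|, \ldots, |X(a_M)|), \qquad (7)$$
where $X(a_m) \subseteq X$ is the set of inputs that determine transitions from the state $a_m \in A$.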
The system of additional variables $p_g \in P$ is represented by the SBF
$$P = P(T, X). \qquad (8)$$
The functions $f_i \in \Phi \cup Y$ are represented by the following SBFs:
$$\Phi = \Phi(T, P); \qquad (9)$$
$$Y = Y(T, P). \qquad (10)$$
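Formula (7) can be evaluated directly from an STT. The Python sketch below assumes that $X_h$ is stored as a space-separated conjunction of (possibly negated) inputs, with "1" marking an unconditional transition; the STT and the helper names are hypothetical. For this example, G = 2 while L = 3, so the replacement of inputs shrinks the argument sets.

```python
def inputs_per_state(stt):
    """X(a_m): inputs appearing in the conditions X_h of transitions from a_m."""
    result = {}
    for a_m, _a_s, x_h, _y_h in stt:
        literals = x_h.split() if x_h != "1" else []  # "1" = unconditional transition
        result.setdefault(a_m, set()).update(lit.lstrip("~") for lit in literals)
    return result


def replacement_width(stt):
    """G from formula (7): the largest |X(a_m)| over all states."""
    return max(len(xs) for xs in inputs_per_state(stt).values())


# Hypothetical STT rows (a_m, a_s, X_h, Y_h).
stt = [("a1", "a2", "x1 x2", ["y1"]), ("a1", "a3", "x1 ~x2", ["y2"]),
       ("a1", "a1", "~x1", []), ("a2", "a3", "x3", ["y3"]),
       ("a2", "a1", "~x3", ["y1"]), ("a3", "a1", "1", ["y4"])]
x_of = inputs_per_state(stt)
print({state: sorted(xs) for state, xs in x_of.items()}, replacement_width(stt))
```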
Collections of outputs (COs) $Y_q \subseteq Y$ ($q \in \{1, \ldots, Q\}$) include the outputs $y_n \in Y$ generated simultaneously. To synthesize an FSM circuit, it is necessary to represent each CO $Y_q \subseteq Y$ by a binary code $K(Y_q)$. As a rule, the number of bits in these codes is determined as
$$R_Q = \lceil \log_2 Q \rceil. \qquad (11)$$
To create the codes $K(Y_q)$, it is necessary to use additional variables $z_r \in Z = \{z_1, \ldots, z_{R_Q}\}$. This allows representing the FSM outputs as
$$Y = Y(Z). \qquad (12)$$
The additional variables $z_r \in Z$ are represented by the following system:
$$Z = Z(T, X). \qquad (13)$$
To generate functions (13), an additional block of logic should be used.
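The encoding of COs and the resulting structure of system (12) can be sketched in Python as follows. For simplicity, the sketch assigns sequential codes; as noted later in the method, a real flow should choose the codes $K(Y_q)$ so as to minimize the number of literals in (12). The function name and the example COs are hypothetical.

```python
def encode_collections(collections):
    """Sequentially encode COs with R_Q = ceil(log2 Q) bits (formula (11)).

    collections: list of COs Y_1..Y_Q, each a set of output names.
    Returns the code length, the codes K(Y_q) and, for every output y_n,
    the codes of the COs that contain it (the product terms of system (12)).
    """
    q = len(collections)
    r_q = max(1, (q - 1).bit_length())
    codes = {f"Y{i}": format(i - 1, f"0{r_q}b") for i in range(1, q + 1)}
    terms = {}
    for i, co in enumerate(collections, start=1):
        for y in sorted(co):
            terms.setdefault(y, []).append(codes[f"Y{i}"])
    return r_q, codes, terms


r_q, codes, terms = encode_collections([set(), {"y1", "y2"}, {"y2", "y3"}])
print(r_q, codes)
print(terms)  # y2 is the disjunction of the conjunctions for the codes of Y2 and Y3
```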
In [30], two known methods of structural decomposition are used to reduce the LUT count of FPGA-based Mealy FSMs. The result is the Mealy FSM $U_2$ shown in Figure 3.
The logic circuit of Mealy FSM $U_2$ has three logic levels. BlockP replaces the inputs $x_l \in X$ by additional variables $p_g \in P = \{p_1, \ldots, p_G\}$ and implements the SBF (8). Blockδ generates the input memory functions (9) and the additional variables $z_r \in Z$ used for encoding the collections of outputs $Y_q \subseteq Y$ ($q \in \{1, \ldots, Q\}$). This block includes a distributed register keeping the state codes. To generate the variables $z_r \in Z$, it is necessary to implement the system
$$Z = Z(T, P). \qquad (14)$$
Blockλ implements the system (12), which depends on the additional variables $z_r \in Z$.
As our investigations [30] show, this approach allows significantly reducing the LUT count compared to the equivalent FSM $U_1$. However, this solution has a serious drawback: the performance of FSM $U_2$ is always lower than that of the equivalent Mealy FSM $U_1$.
In [36], different models of Mealy FSMs based on the transformation of object codes are discussed. One of the typical methods from this group is the transformation of codes $K(Y_q)$ into state codes $K(a_m)$.
The main idea of this approach is the following. Suppose, for example, that some CO $Y_3$ is generated during transitions into the states $a_4$ and $a_6$. The CO $Y_3$ alone cannot distinguish these states; to do so, the identifiers $I_1$ and $I_2$ are used. Two pairs $\langle \text{collection of outputs}, \text{identifier} \rangle$ give the following representation of these states of transition: $a_4 \leftrightarrow \langle Y_3, I_1 \rangle$ and $a_6 \leftrightarrow \langle Y_3, I_2 \rangle$. Thus, each state $a_m \in A$ can be represented by one or more pairs $\langle Y_q, I_{np} \rangle$. To create the set of identifiers $I = \{I_1, \ldots, I_{NP}\}$, it is necessary to find the maximum number of pairs (NP) including the same CO $Y_q \subseteq Y$.
Each identifier $I_{np} \in I$ is represented by a binary code $K(I_{np})$ having $R_I$ bits, where
$$R_I = \lceil \log_2 NP \rceil. \qquad (15)$$
To encode the identifiers, the elements of the set $V = \{v_1, \ldots, v_{R_I}\}$ are used.
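The construction of the pairs $\langle Y_q, I_{np} \rangle$, of NP and of $R_I$ can be sketched in Python as follows. The input reuses the example with $Y_3$, $a_4$ and $a_6$ from the text plus two hypothetical transitions. Identifiers are numbered in order of appearance, which is one arbitrary choice among many; the function name is our own.

```python
def assign_identifiers(transitions):
    """Represent every next state a_s by pairs <Y_q, I_np>.

    transitions: iterable of (a_s, Y_q) pairs taken from the STT rows.
    Returns the pairs, NP (the maximum number of states sharing one CO)
    and R_I = ceil(log2 NP) from formula (15).
    """
    states_of = {}
    for a_s, y_q in transitions:
        states = states_of.setdefault(y_q, [])
        if a_s not in states:
            states.append(a_s)
    pairs = {}
    for y_q, states in states_of.items():
        for n, a_s in enumerate(states, start=1):
            pairs.setdefault(a_s, []).append((y_q, f"I{n}"))
    np_ = max(len(states) for states in states_of.values())
    r_i = (np_ - 1).bit_length()  # ceil(log2 NP); 0 bits when NP = 1
    return pairs, np_, r_i


pairs, np_, r_i = assign_identifiers(
    [("a4", "Y3"), ("a6", "Y3"), ("a2", "Y1"), ("a4", "Y2")])
print(pairs)
print(np_, r_i)  # 2 1
```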
This allows representing the IMFs by the following system:
$$\Phi = \Phi(Z, V). \qquad (16)$$
The variables $v_r \in V$ are represented by the following system:
$$V = V(T, X). \qquad (17)$$
Thus, an FSM based on this principle implements systems (12), (13), (16) and (17). This is the FSM $U_3$ shown in Figure 4.
In FSM $U_3$, BlockZV implements systems (13) and (17), Blockδ implements the input memory functions represented as (16), and Blockλ implements the system (12). Thus, there are only two levels of logic between the inputs and outputs of FSM $U_3$. As follows from Figure 3, there are three levels of logic between the inputs and outputs of FSM $U_2$.
This property of FSM $U_3$ can be used to accelerate a digital system. As is known [2], the outputs (3) of a Mealy FSM are not stable: if the inputs change during a clock cycle, the outputs (3) may also change. This may cause the digital system as a whole to malfunction. To prevent failures, incorrect outputs (3) must not reach the digital system. To ensure this, a special register SRG is introduced (Figure 5).
If all transients in the FSM circuit are completed and the values of the outputs are stable, then a synchronization pulse $C_1$ is generated. It loads the outputs $y_n \in Y$ into SRG. Next, the registered outputs $Y^R$ enter the digital system. The system executes the corresponding operations and generates the values of the inputs $x_l \in X$. Such an interaction should be organized for any model of Mealy FSM.
Thus, in the case of FSM $U_3$, the pulse $C_1$ may be generated as soon as the outputs of two blocks (BlockZV and Blockλ) are correct. In the case of FSM $U_2$, the outputs are correct only after all three blocks have switched sequentially. Thus, the model $U_3$ can provide better performance than the model $U_2$.
FSM $U_3$ has one very serious disadvantage compared to the equivalent FSM $U_2$. If the relation
$$G < R_I + R_Q \qquad (18)$$
is true, then the number of LUTs (and possibly their levels) in BlockZV is significantly greater than in BlockP of the equivalent FSM $U_2$. In this article, we propose a method that allows reducing the number of LUTs in FSM $U_3$.

4. Main Idea of the Proposed Method

In this article, we discuss the case when condition (4) is violated for some functions $f_i \in Z \cup V$. This leads to a multi-level circuit of BlockZV with an irregular system of interconnections, which obviously degrades the performance of FSM $U_3$. To diminish the number of levels of LUTs in the circuit of BlockZV, we propose the following approach.
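As in the case of two-fold state assignment [7,60], we propose to construct a partition $\Pi_A = \{A_1, \ldots, A_J\}$ of the set A such that the following condition holds:
$$R_j + L_j \le S_L \quad (j \in \{1, \ldots, J\}). \qquad (19)$$
Here, $R_j$ is the number of state variables used to encode the states of the class $A_j$, and $L_j$ is the number of inputs determining transitions from these states. The methods of [7,60] make it possible to create the required partition $\Pi_A$ with the minimum possible number of classes, J.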
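If a class $A_j \in \Pi_A$ includes $M_j$ states $a_m \in A$ and
$$R_j = \lceil \log_2 (M_j + 1) \rceil, \qquad (20)$$
then there are enough state variables to encode the states $a_m \in A_j$. To do this, the state variables $T_r \in T_j \subseteq T$ are used. The sets T and $\Phi$ contain $R_O$ elements each:
$$R_O = \sum_{j=1}^{J} R_j. \qquad (21)$$
If $a_m \notin A_j$, then $T_r = 0$ for all $T_r \in T_j$. This explains the presence of 1 in (20): the all-zero code of the variables $T_j$ is reserved for the states outside $A_j$.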
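One simple way to obtain a partition satisfying (19) is a first-fit greedy heuristic, sketched below in Python. This is our own illustrative heuristic, not the methods of [7,60], and it does not guarantee the minimum number of classes J. It assumes that the input set $X(a_m)$ of every state is known, takes $L_j$ as the size of their union over the class, and computes $R_j$ by formula (20); the function names are hypothetical.

```python
def class_bits(m_j: int) -> int:
    """R_j = ceil(log2(M_j + 1)), formula (20); equals M_j.bit_length() for M_j >= 1."""
    return m_j.bit_length()


def partition_states(inputs_of, s_l=6):
    """First-fit greedy partition of the states satisfying R_j + L_j <= S_L (19).

    inputs_of: dict mapping each state a_m to the set X(a_m) of its inputs.
    Returns a list of classes, each a pair (states, X_j).
    """
    classes = []
    for state, xs in inputs_of.items():
        for states, x_j in classes:
            if class_bits(len(states) + 1) + len(x_j | xs) <= s_l:
                states.append(state)
                x_j.update(xs)
                break
        else:  # no existing class can absorb the state: open a new one
            if class_bits(1) + len(xs) > s_l:
                raise ValueError(f"{state} alone violates condition (19)")
            classes.append(([state], set(xs)))
    return classes


inputs_of = {"a1": {"x1", "x2"}, "a2": {"x3"}, "a3": {"x4", "x5"}, "a4": {"x1"}}
for j, (states, x_j) in enumerate(partition_states(inputs_of, s_l=4), start=1):
    print(f"A{j} = {states}, X{j} = {sorted(x_j)}, R{j} = {class_bits(len(states))}")
```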
Now, we can encode each state $a_m \in A_j$ by a code $C(a_m)$ having $R_O$ bits. In this code, $R_O - R_j$ variables are equal to zero. Only the variables $T_r \in T_j$ identify a state $a_m \in A$ as an element of $A_j \in \Pi_A$.
As $R_O > R$, the codes $C(a_m)$ are extended state codes [7]. However, only $R_j < R$ state variables are used to represent the functions that depend on the states $a_m \in A_j$.
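The extended codes $C(a_m)$ can be produced from a given partition as in the following Python sketch (a hypothetical helper with an arbitrary ordering of states). The states of class $A_j$ receive the values 1, 2, … in their own $R_j$-bit field, the all-zero field is reserved for states outside $A_j$, and the fields of all classes are concatenated into $R_O$ bits.

```python
def extended_codes(classes):
    """Assign R_O-bit extended codes C(a_m) for a partition Pi_A.

    classes: list of classes A_1..A_J, each a list of state names.
    """
    widths = [len(states).bit_length() for states in classes]  # R_j, formula (20)
    r_o = sum(widths)                                         # R_O, formula (21)
    codes, offset = {}, 0
    for states, r_j in zip(classes, widths):
        for k, state in enumerate(states, start=1):  # k = 0 means "not in A_j"
            field = format(k, f"0{r_j}b")
            codes[state] = "0" * offset + field + "0" * (r_o - offset - r_j)
        offset += r_j
    return r_o, codes


r_o, codes = extended_codes([["a1", "a4"], ["a2"], ["a3", "a5", "a6"]])
print(r_o)    # 5, whereas formula (1) gives R = 3 for M = 6
print(codes)
```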
To find SBFs (13) and (17), it is necessary to construct a table of BlockZV ($T_{ZV}$). It includes the columns $a_m$, $C(a_m)$, $a_s$, $Y_q$, $I_{np}$, $X_h$, $K(Y_q)$, $K(I_{np})$, $Z_h$, $V_h$ and h.
A class $A_j \in \Pi_A$ determines a table $T_{ZV}^j$, which is a subtable of $T_{ZV}$. A table $T_{ZV}^j$ determines the sets $X_j \subseteq X$, $Z_j \subseteq Z$ and $V_j \subseteq V$. These variables are written in the columns $X_h$, $Z_h$ and $V_h$ of $T_{ZV}^j$, respectively. Additionally, a table $T_{ZV}^j$ determines the SBFs
$$Z_j = Z_j(T_j, X_j); \qquad (22)$$
$$V_j = V_j(T_j, X_j). \qquad (23)$$
Using this preliminary information, we propose an architecture of Mealy FSM $U_4$ (Figure 6).
In FSM $U_4$, each Blockj implements functions (22) and (23). Due to (19), each Blockj has only a single level of LUTs.
BlockOR implements the functions $z_r \in Z$ and $v_r \in V$ as disjunctions:
$$z_r = z_r^1 \lor z_r^2 \lor \cdots \lor z_r^J; \qquad (24)$$
$$v_r = v_r^1 \lor v_r^2 \lor \cdots \lor v_r^J. \qquad (25)$$
In (24) and (25), the superscript j means that the corresponding function is generated by Blockj.
If $J \le S_L$, then there is only a single level of LUTs in the circuit of BlockOR. Otherwise, it is a multi-level block.
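The behavior of BlockOR and the role of the condition $J \le S_L$ can be illustrated with the Python sketch below. It assumes that a multi-level BlockOR is built as a tree of OR functions, each implemented in one $S_L$-input LUT; the function names are hypothetical. Because the variables $T_j$ are all zero for states outside $A_j$, only the block of the class containing the current state can produce ones, which the example reflects.

```python
def or_levels(j: int, s_l: int = 6) -> int:
    """LUT levels needed to OR together J partial functions with S_L-input LUTs."""
    levels = 0
    while j > 1:
        j = -(-j // s_l)  # ceil(J / S_L): each LUT merges up to S_L signals
        levels += 1
    return levels


def block_or(partials):
    """Formulas (24) and (25): z_r = z_r^1 v ... v z_r^J (the same for v_r).

    partials: one dict per Block j, mapping variable names to 0/1;
    a variable not produced by Block j counts as 0.
    """
    names = set().union(*partials)
    return {name: int(any(p.get(name, 0) for p in partials)) for name in sorted(names)}


print(or_levels(4), or_levels(6), or_levels(7))  # 1 1 2
print(block_or([{"z1": 1, "v1": 1}, {"z1": 0, "z2": 0}, {"v1": 0}]))
```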
Blockλ and Blockδ execute the same functions as the corresponding blocks of FSM $U_3$: Blockλ generates the functions (12), and Blockδ generates the functions (16). If $R_Q \le S_L$, then Blockλ includes only a single level of LUTs.
Thus, in the best case, there are three levels of LUTs between the inputs $x_l \in X$ and the outputs $y_n \in Y$. If condition (4) is violated for the equivalent FSM $U_3$, then FSM $U_4$ provides a higher operating frequency.
Comparison of Figure 4 and Figure 6 shows that (1) BlockZV of $U_3$ is replaced by Block1, …, BlockJ and BlockOR, and (2) Blockδ of $U_4$ has $R_O > R$ outputs. These two features are the main specifics of FSM $U_4$.
In this paper, we propose a method for synthesizing the FSM $U_4$. If an FSM is represented by an STT, then the method includes the following steps:
  1. Representing the states $a_m \in A$ by pairs $P(m, q)$.
  2. Encoding the collections of outputs and the identifiers. Constructing SBF (12) representing Blockλ.
  3. Constructing the partition $\Pi_A$ of the set A.
  4. Creating the tables $T_{ZV}^j$ determining Block1, …, BlockJ.
  5. Constructing the SBFs representing Block1, …, BlockJ.
  6. Constructing SBFs (24) and (25) representing BlockOR.
  7. Constructing SBF (16) representing Blockδ.
  8. Implementing the logic circuit of FSM $U_4$.
The first step is executed using the initial STT. If CO $Y_q \subseteq Y$ is generated during transitions into $m_q$ different states $a_s \in A$, then there are $m_q$ identifiers. Each identifier determines a unique state represented by $Y_q$. The cardinality of the set I is determined as
$$NP = \max(m_1, \ldots, m_Q). \qquad (26)$$
Step 2 is executed on the basis of STT. The COs should be encoded in a way optimizing the number of literals in SBF (12). Identifiers can be encoded in the trivial way.
The partition $\Pi_A$ is constructed using the methods from [7,43]. After finding the classes $A_j \in \Pi_A$, we can encode the states $a_m \in A_j$. This gives the sets $T_j \subseteq T = \{T_1, \ldots, T_{R_O}\}$ and $\Phi = \{D_1, \ldots, D_{R_O}\}$.
A table of Blockj has the following columns: $a_m$, $C(a_m)$, $X_h^j$, $Z_h^j$, $V_h^j$ and h. The states $a_m \in A_j$ are written in the column $a_m$. As $T_r = 0$ for $T_r \notin T_j$, we can write only the parts of $C(a_m)$ formed by the state variables $T_r \in T_j$. The column $Z_h^j$ includes the variables $z_r^j \in Z_j$, and the column $V_h^j$ the variables $v_r^j \in V_j$. The outcome of step 4 is the set of tables of Block1, …, BlockJ.
A table $T_{ZV}^j$ is the basis for deriving the SBFs (22) and (23). The terms of the corresponding SOPs are conjunctions $A_m \cdot X_h$, where $A_m$ is a conjunction of the variables $T_r \in T_j$. All other state variables are treated as insignificant. The SBFs (22) and (23) are used to implement the circuits of Block1, …, BlockJ.
Step 6 is executed trivially. If $J \le S_L$, then $Block_{OR}$ has a single level of LUTs. In this case, its circuit includes exactly $R_Q + R_I$ LUTs.
To find SBF (16), it is necessary to construct a table of $Block_\delta$. This table includes the following columns: $Y_q$, $K(Y_q)$, $I_p$, $K(I_p)$, $a_s$, $C(a_s)$, $\Phi_h$, $h$. Each row of this table corresponds to a pair $\langle Y_q, I_p \rangle$ determining the state $a_s \in A$. The terms of SOPs (16) are conjunctions of the variables $z_r \in Z$ and $v_r \in V$; the corresponding literals are determined by the codes $K(Y_q)$ and $K(I_p)$.
The last step is executed using standard CAD tools. It is based on program tools that translate the initial STT into the required SBFs. These SBFs are used in the VHDL models of the FSMs.
Now, we show the difference between the two-fold state assignment [60] and the proposed method. In the first case, there are two sets of state variables. The set $T = \{T_1, \ldots, T_R\}$ is used to encode the states $a_m \in A$ as elements of the set $A$. The set $\tau = \{\tau_1, \ldots, \tau_{R_0}\}$ is used to encode the states $a_m \in A_j$ as elements of the sets $A_j$ ($j = \overline{1, J}$). Because of this, two levels of logic are needed to create the inputs of $Block_1, \ldots, Block_J$. In the proposed approach, the inputs of these blocks are generated by $Block_\delta$. Thus, the proposed approach leads to faster FSMs than the two-fold state assignment.

5. Example of Synthesis

In this article, we use the symbol $U_i(S_j)$ to show that the FSM model $U_i$ is used to synthesize the FSM $S_j$. An example of the synthesis of the Mealy FSM $U_4(S_1)$ is shown in this section. The Mealy FSM $S_1$ is represented by Table 2.
The following characteristics of $S_1$ follow from Table 2: the number of states $M = 6$, the number of transitions $H = 15$, the number of inputs $L = 6$ and the number of outputs $N = 8$. Additionally, the following collections of outputs can be found from Table 2: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_3\}$, $Y_4 = \{y_2, y_4\}$, $Y_5 = \{y_5, y_6\}$, $Y_6 = \{y_5, y_7\}$ and $Y_7 = \{y_3, y_8\}$. Thus, $Q = 7$.
1. Representing states by pairs $P(m, q)$.
Using the STT (Table 2), it is possible to find the pairs $\langle Y_q, I_p \rangle$ representing the states $a_m \in A$. For example, the CO $Y_2$ is written in rows 1, 9, 12 and 13. These rows include the states of transitions $a_2$ (rows 1 and 12) and $a_6$ (rows 9 and 13). Thus, two identifiers ($I_1$, $I_2$) are needed to distinguish these states: $a_2 \rightarrow \langle Y_2, I_1 \rangle$ and $a_6 \rightarrow \langle Y_2, I_2 \rangle$.
Using the same approach, we can find all pairs $\langle Y_q, I_p \rangle$ for the given example. The process is shown in Figure 7. Using (26) gives $N_P = 2$ and $I = \{I_1, I_2\}$.
In the discussed case, $H_P = 12$, where $H_P$ is the number of pairs $P(m, q)$. Thus, $Block_\delta$ is represented by a table with 12 rows.
2. Encoding of COs $Y_q \in Y$ and identifiers $I_p \in I$. Here $Q = 7$ and $N_P = 2$. Using (11) gives $R_Q = 3$ and the set $Z = \{z_1, z_2, z_3\}$. Using (15) gives $R_I = 1$ and the set $V = \{v_1\}$.
Since $R_Q + R_I = 4 < S_L$, each equation of SBF (16) is implemented using a single LUT. Thus, there is no need to encode the COs in a way that optimizes (16). Instead, we encode the COs $Y_q \in Y$ in a way that optimizes SBF (12).
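The code widths used in step 2 can be checked with a short sketch. Equations (11) and (15) are not reproduced in this section; the sketch assumes they have the usual form $R_Q = \lceil \log_2 Q \rceil$ and $R_I = \lceil \log_2 N_P \rceil$, which reproduces $R_Q = 3$ and $R_I = 1$. The helper name `ceil_log2` is hypothetical.

```python
def ceil_log2(count: int) -> int:
    """Return ceil(log2(count)) for count >= 1 using exact integer arithmetic."""
    if count < 1:
        raise ValueError("count must be a positive integer")
    return (count - 1).bit_length()


Q, N_P, S_L = 7, 2, 5      # values of the example FSM S1
R_Q = ceil_log2(Q)         # bits in CO codes K(Y_q), assumed form of (11)
R_I = ceil_log2(N_P)       # bits in identifier codes K(I_p), assumed form of (15)

# Every function D_r of (16) depends on at most R_Q + R_I variables,
# so it fits into a single LUT when R_Q + R_I <= S_L.
one_lut_per_function = R_Q + R_I <= S_L
print(R_Q, R_I, one_lut_per_function)  # 3 1 True
```

Using `int.bit_length` instead of `math.log2` avoids floating-point rounding for large counts.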
Using the contents of the COs, the following SBF can be obtained:
$$y_1 = Y_2;\ y_2 = Y_2 \lor Y_4;\ y_3 = Y_3 \lor Y_7;\ y_4 = Y_4;\ y_5 = Y_5 \lor Y_6;\ y_6 = Y_5;\ y_7 = Y_6;\ y_8 = Y_7. \tag{27}$$
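SBF (27) can be derived mechanically from the collections of outputs of $S_1$: each output $y_n$ is the disjunction of the COs that contain it. A minimal sketch; the dictionary layout and the function name are illustrative only.

```python
# Collections of outputs of FSM S1 (Y1 is the empty collection).
COLLECTIONS = {
    "Y1": set(),
    "Y2": {"y1", "y2"},
    "Y3": {"y3"},
    "Y4": {"y2", "y4"},
    "Y5": {"y5", "y6"},
    "Y6": {"y5", "y7"},
    "Y7": {"y3", "y8"},
}
OUTPUTS = [f"y{n}" for n in range(1, 9)]  # N = 8


def outputs_as_disjunctions(collections, outputs):
    """For every output y_n, list the COs Y_q with y_n in Y_q (SBF (27))."""
    return {y: [q for q, co in collections.items() if y in co] for y in outputs}


for y, cos in outputs_as_disjunctions(COLLECTIONS, OUTPUTS).items():
    print(f"{y} = {' v '.join(cos)}")
```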
To diminish the number of interconnections between $Block_{OR}$ and $Block_\lambda$, it is necessary to reduce the number of literals in functions (12). This can be done using the approach from [61]. One of the possible solutions is shown in Figure 8.
Using the codes from Figure 8 and the rules of minimization [4], we can transform SBF (27) into the following system:
$$y_1 = \bar{z}_1 z_2 \bar{z}_3;\ y_2 = z_2 \bar{z}_3;\ y_3 = \bar{z}_1 z_3;\ y_4 = z_1 z_2;\ y_5 = z_1 \bar{z}_2;\ y_6 = z_1 \bar{z}_2 \bar{z}_3;\ y_7 = z_1 z_3;\ y_8 = z_2 z_3. \tag{28}$$
System (28) represents $Block_\lambda$ of $U_4(S_1)$. This block has 18 interconnections with $Block_{OR}$. In the general case, there are $N \cdot R_Q = 8 \times 3 = 24$ literals (and 24 interconnections). Thus, the encoding of COs shown in Figure 8 reduces the number of interconnections by a factor of $24/18 \approx 1.33$.
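The claims about system (28) can be verified by brute force. Figure 8 is not reproduced in this section, so the CO codes below are an assumption: they are inferred as the codes consistent with (28), read with $y_2 = z_2\bar{z}_3$ (the form required because $K(Y_2) = 010$ and $y_2 \in Y_2$), and with $K(Y_2) = 010$ as quoted for Table 3; the unused code 111 is a don't-care. The sketch checks that each function of (28) is 1 exactly for the COs containing that output and counts the literals.

```python
# Assumed CO codes K(Y_q) = z1 z2 z3 (inferred, see lead-in; 111 is unused).
CO_CODES = {"Y1": "000", "Y2": "010", "Y3": "001", "Y4": "110",
            "Y5": "100", "Y6": "101", "Y7": "011"}
COLLECTIONS = {"Y1": set(), "Y2": {"y1", "y2"}, "Y3": {"y3"}, "Y4": {"y2", "y4"},
               "Y5": {"y5", "y6"}, "Y6": {"y5", "y7"}, "Y7": {"y3", "y8"}}
# SBF (28): each y_n is a single product term, written as a cube over z1 z2 z3
# ('1' = z_r, '0' = negated z_r, '-' = variable absent).
SBF_28 = {"y1": "010", "y2": "-10", "y3": "0-1", "y4": "11-",
          "y5": "10-", "y6": "100", "y7": "1-1", "y8": "-11"}


def covers(cube: str, code: str) -> bool:
    """True if the product term `cube` evaluates to 1 on the code `code`."""
    return all(c == "-" or c == b for c, b in zip(cube, code))


def realizes(sbf, codes, collections) -> bool:
    """Check that y_n(K(Y_q)) = 1 exactly when y_n belongs to Y_q."""
    return all(covers(cube, codes[q]) == (y in collections[q])
               for q in codes for y, cube in sbf.items())


literals = sum(len(c) - c.count("-") for c in SBF_28.values())
unminimized = len(SBF_28) * len(CO_CODES["Y1"])  # N * R_Q
print(realizes(SBF_28, CO_CODES, COLLECTIONS), literals, unminimized)  # True 18 24
```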
The identifiers can be encoded in a trivial way: $K(I_1) = 0$ and $K(I_2) = 1$. Thus, the identifier $I_1$ is determined by $\bar{v}_1$, and $I_2$ by $v_1$.
3. Constructing the partition of the set $A$. In the discussed example, $S_L = 5$. This means that each class $A_j \in \Pi_A$ should satisfy the condition $L_j + R_j \le 5$.
This step is very important because it significantly determines the characteristics of FSM $U_4$ [60]. We do not discuss this step in detail. Instead, we use the approach from [60] to create the partition $\Pi_A = \{A_1, A_2\}$ with the classes $A_1 = \{a_1, a_3, a_6\}$ and $A_2 = \{a_2, a_4, a_5\}$. Using Table 2 gives the sets $X^1 = \{x_1, x_2, x_5\}$ and $X^2 = \{x_3, x_4, x_6\}$.
Using (20) gives $R_1 = R_2 = 2$, $R_0 = 4$, $T = \{T_1, \ldots, T_4\}$, $T^1 = \{T_1, T_2\}$ and $T^2 = \{T_3, T_4\}$. Here $L_1 = L_2 = 3$, which means that $L_1 + R_1 = L_2 + R_2 = 5 = S_L$. Thus, the found partition satisfies condition (19).
Therefore, the state codes $C(a_m)$ do not affect the number of LUTs in the circuits of $Block_1$ and $Block_2$. We can encode the states in the following way: $C(a_1) = 0100$, $C(a_2) = 0001$, $C(a_3) = 1000$, $C(a_4) = 0010$, $C(a_5) = 0011$ and $C(a_6) = 1100$.
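The partition conditions of step 3 and the chosen state codes can be checked with a small sketch. Equation (20) is not reproduced here; the helper `bits_for_class` assumes $R_j = \lceil \log_2(|A_j| + 1) \rceil$, reserving the all-zero sub-code for states outside $A_j$ (which is what allows $T_r = 0$ for $T_r \notin T^j$). With $|A_j| = 3$ this gives $R_1 = R_2 = 2$, as in the text. All names are illustrative.

```python
CODES = {"a1": "0100", "a2": "0001", "a3": "1000",
         "a4": "0010", "a5": "0011", "a6": "1100"}        # C(a_m) = T1 T2 T3 T4
CLASSES = {1: ["a1", "a3", "a6"], 2: ["a2", "a4", "a5"]}  # Pi_A
CLASS_VARS = {1: [0, 1], 2: [2, 3]}                      # positions of T^1, T^2
CLASS_INPUTS = {1: {"x1", "x2", "x5"}, 2: {"x3", "x4", "x6"}}  # X^1, X^2
S_L = 5


def bits_for_class(size: int) -> int:
    """Assumed form of (20): ceil(log2(size + 1)); the all-zero code is reserved."""
    return size.bit_length()  # equals ceil(log2(size + 1)) for size >= 0


def violations(codes, classes, class_vars, class_inputs, s_l):
    """Return a list of problems; an empty list means the encoding is valid."""
    problems = []
    if len(set(codes.values())) != len(codes):
        problems.append("state codes are not unique")
    for j, states in classes.items():
        r_j = bits_for_class(len(states))
        if len(class_vars[j]) != r_j:
            problems.append(f"class {j}: expected R_{j} = {r_j}")
        if len(class_inputs[j]) + r_j > s_l:  # condition L_j + R_j <= S_L
            problems.append(f"class {j}: L_j + R_j > S_L")
        for a in states:
            inside = [codes[a][i] for i in class_vars[j]]
            outside = [b for i, b in enumerate(codes[a]) if i not in class_vars[j]]
            if "1" in outside:
                problems.append(f"{a}: T_r = 1 outside T^{j}")
            if "1" not in inside:
                problems.append(f"{a}: all-zero code inside T^{j}")
    return problems


print(violations(CODES, CLASSES, CLASS_VARS, CLASS_INPUTS, S_L))  # []
```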
4. Creating tables of $Block_1$ and $Block_2$. To do this, we first construct the table of $Block_{ZV}$ of the equivalent FSM $U_3(S_1)$. Next, this table is divided into two tables using the classes $A_j \in \Pi_A$ and the codes $C(a_m)$.
The table of $Block_{ZV}$ is constructed using the initial STT. To do this, the states of transitions are replaced by the corresponding pairs $P(m, q)$. Additionally, the codes $K(Y_q)$, $K(I_p)$ and the columns $Z_h$, $V_h$ are introduced instead of the column $Y_h$ of the STT. In the discussed example, $Block_{ZV}$ is represented by Table 3.
In Table 3, we used the codes $K(Y_q)$ from Figure 8. The pairs $\langle Y_q, I_p \rangle$ were taken from Figure 7. To design the circuits of $Block_1, \ldots, Block_J$, Table 3 should be transformed into a set of tables representing the blocks of the first level of logic.
Consider the row $h = 1$ of Table 3. It corresponds to the pair $P(2, 2)$. Thus, the column $Y_q$ contains $Y_2$ and the column $I_p$ contains $I_1$. The column $K(Y_q)$ contains $K(Y_2) = 010$, and the column $K(I_p)$ contains the code $K(I_1) = 0$. This explains the contents of the columns $Z_h$ and $V_h$ of row 1. The column $X_h$ is the same as in the initial STT (Table 2). All other rows are filled in the same way.
To create the table of $Block_j$, we should: (1) choose the states $a_m \in A_j$ and (2) take the rows of the table of $Block_{ZV}$ for these states. In this case, $Block_1$ is represented by Table 4 and $Block_2$ by Table 5. In Table 4 and Table 5, the superscripts 1 and 2 mean that the corresponding functions are implemented by $Block_1$ or $Block_2$, respectively.
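In Table 4 and Table 5, only the part of $C(a_m)$ formed by the variables of $T^j$ is kept, and it becomes the conjunction $A_m$ in the terms $A_m \cdot X_h$. A minimal sketch of this projection for the example codes; `state_conjunction` is a hypothetical helper and `~` marks a negated variable.

```python
CODES = {"a1": "0100", "a2": "0001", "a3": "1000",
         "a4": "0010", "a5": "0011", "a6": "1100"}   # C(a_m) = T1 T2 T3 T4
CLASS_OF = {"a1": 1, "a3": 1, "a6": 1, "a2": 2, "a4": 2, "a5": 2}
CLASS_VARS = {1: [1, 2], 2: [3, 4]}                  # indices r of T_r in T^j


def state_conjunction(state):
    """Return the literals of A_m, built only from the variables T_r in T^j."""
    j = CLASS_OF[state]
    code = CODES[state]
    return [f"T{r}" if code[r - 1] == "1" else f"~T{r}" for r in CLASS_VARS[j]]


for state in sorted(CODES):
    print(state, " ".join(state_conjunction(state)))
```

For instance, $a_1$ yields $\bar{T}_1 T_2$ and $a_5$ yields $T_3 T_4$; these are the state parts of the product terms in the systems of $Block_1$ and $Block_2$.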
5. Constructing systems representing the blocks of the first level. These systems are constructed using Table 4 and Table 5. Each system includes $R_Q + R_I = 4$ equations.
$Block_1$ is represented by the following SBF:
$$z_1^1 = T_1 \bar{T}_2 x_5;\ z_2^1 = \bar{T}_1 T_2 x_1 \lor \bar{T}_1 T_2 \bar{x}_2 \lor T_1 \bar{x}_5;\ z_3^1 = \bar{T}_1 T_2 \bar{x}_1 x_2 \lor T_1 \bar{T}_2 \bar{x}_5 x_1 \lor T_1 T_2 \bar{x}_5;\ v_1^1 = \bar{T}_1 T_2 \bar{x}_1 x_2 \lor T_1 \bar{T}_2. \tag{29}$$
$Block_2$ is represented by the following SBF:
$$z_1^2 = \bar{T}_3 T_4 x_4 \lor \bar{T}_3 T_4 \bar{x}_3 \lor T_3 \bar{T}_4 x_6 \bar{x}_3;\ z_2^2 = T_3 \bar{T}_4 \bar{x}_3 \lor T_3 \bar{T}_4 \bar{x}_6 \lor T_3 T_4;\ z_3^2 = \bar{T}_3 T_4 \bar{x}_4 \lor \bar{T}_3 T_4 \bar{x}_3 \lor T_3 \bar{T}_4 x_6 x_3;\ v_1^2 = T_3 \bar{T}_4 x_6 \lor T_3 T_4. \tag{30}$$
6. Constructing the system for $Block_{OR}$. This system is constructed in a trivial way: each function $f_i \in Z \cup V$ is represented by a disjunction of the functions of the same name with different superscripts. In the discussed case, this gives the following SBF:
$$z_1 = z_1^1 \lor z_1^2;\ z_2 = z_2^1 \lor z_2^2;\ z_3 = z_3^1 \lor z_3^2;\ v_1 = v_1^1 \lor v_1^2. \tag{31}$$
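To see how the first two levels cooperate, the sketch below evaluates SBFs (29), (30) and (31) for a given state code and input vector. It is an illustration under assumptions: each SBF is read as a disjunction of the product terms shown, with a bar denoting negation, and the function names are hypothetical. Because the code of a state outside $A_j$ is zero on $T^j$, the block of the other class outputs all zeros, so $Block_{OR}$ simply passes on the result of the active block.

```python
from itertools import product


def neg(b: int) -> int:
    return 1 - b


def block1(T, x):
    """SBF (29): (z1^1, z2^1, z3^1, v1^1) from T1, T2 and X^1 = {x1, x2, x5}."""
    T1, T2 = T[0], T[1]
    x1, x2, x5 = x["x1"], x["x2"], x["x5"]
    z1 = T1 & neg(T2) & x5
    z2 = (neg(T1) & T2 & x1) | (neg(T1) & T2 & neg(x2)) | (T1 & neg(x5))
    z3 = (neg(T1) & T2 & neg(x1) & x2) | (T1 & neg(T2) & neg(x5) & x1) | (T1 & T2 & neg(x5))
    v1 = (neg(T1) & T2 & neg(x1) & x2) | (T1 & neg(T2))
    return (z1, z2, z3, v1)


def block2(T, x):
    """SBF (30): (z1^2, z2^2, z3^2, v1^2) from T3, T4 and X^2 = {x3, x4, x6}."""
    T3, T4 = T[2], T[3]
    x3, x4, x6 = x["x3"], x["x4"], x["x6"]
    z1 = (neg(T3) & T4 & x4) | (neg(T3) & T4 & neg(x3)) | (T3 & neg(T4) & x6 & neg(x3))
    z2 = (T3 & neg(T4) & neg(x3)) | (T3 & neg(T4) & neg(x6)) | (T3 & T4)
    z3 = (neg(T3) & T4 & neg(x4)) | (neg(T3) & T4 & neg(x3)) | (T3 & neg(T4) & x6 & x3)
    v1 = (T3 & neg(T4) & x6) | (T3 & T4)
    return (z1, z2, z3, v1)


def block_or(T, x):
    """SBF (31): each output is the OR of its Block1 and Block2 versions."""
    return tuple(a | b for a, b in zip(block1(T, x), block2(T, x)))


def bits(code: str):
    return tuple(int(c) for c in code)


ALL_INPUTS = [dict(zip(["x1", "x2", "x3", "x4", "x5", "x6"], v))
              for v in product((0, 1), repeat=6)]

# a5 (code 0011) yields z1 z2 z3 v1 = 0101, i.e. K(Y2) = 010 and K(I2) = 1,
# for every input vector.
print({block_or(bits("0011"), x) for x in ALL_INPUTS})  # {(0, 1, 0, 1)}
```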
7. Constructing the system for $Block_\delta$. To find system (16), it is necessary to create the table of $Block_\delta$. It is constructed using the pairs $P(m, q)$ and the codes $K(Y_q)$, $K(I_p)$ and $C(a_s)$. In the discussed case, this is Table 6, which uses data from Figure 7 and Figure 8. The following SBF is derived from Table 6:
$$D_1 = \bar{z}_1 z_2 \bar{z}_3 v_1 \lor \bar{z}_1 \bar{z}_2 z_3 v_1 \lor z_1 \bar{z}_2 \bar{z}_3 \bar{v}_1 \lor \bar{z}_1 z_2 z_3 \bar{v}_1;\ D_2 = \bar{z}_1 \bar{z}_2 \bar{z}_3 \lor \bar{z}_1 z_2 \bar{z}_3 v_1;\ D_3 = z_1 z_2 \bar{z}_3 \lor z_1 \bar{z}_2 \bar{z}_3 v_1 \lor z_1 \bar{z}_2 z_3 \lor \bar{z}_1 z_2 z_3 v_1;\ D_4 = \bar{z}_1 z_2 \bar{z}_3 \bar{v}_1 \lor \bar{z}_1 \bar{z}_2 z_3 \bar{v}_1 \lor z_1 z_2 \bar{z}_3 v_1 \lor z_1 \bar{z}_2 z_3 \lor \bar{z}_1 z_2 z_3 v_1. \tag{32}$$
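SBF (32) can be evaluated directly. In the sketch below each $D_r$ is stored as a list of cubes over $z_1 z_2 z_3 v_1$ ('-' marks a variable absent from the term), transcribed from (32) with each function read as a disjunction of its product terms. As a check, the pair $\langle Y_2, I_1 \rangle$ ($K(Y_2) = 010$, $K(I_1) = 0$) should give $C(a_2) = 0001$, and $\langle Y_2, I_2 \rangle$ ($K(I_2) = 1$) should give $C(a_6) = 1100$. The helper names are hypothetical.

```python
# SBF (32): cubes over the variables z1 z2 z3 v1.
SBF_32 = {
    "D1": ["0101", "0011", "1000", "0110"],
    "D2": ["000-", "0101"],
    "D3": ["110-", "1001", "101-", "0111"],
    "D4": ["0100", "0010", "1101", "101-", "0111"],
}


def covers(cube: str, bits: str) -> bool:
    """True if the product term `cube` is 1 for the assignment `bits`."""
    return all(c == "-" or c == b for c, b in zip(cube, bits))


def block_delta(k_y: str, k_i: str) -> str:
    """Return D1 D2 D3 D4, i.e. the code C(a_s) loaded into the state register."""
    bits = k_y + k_i
    return "".join("1" if any(covers(c, bits) for c in SBF_32[d]) else "0"
                   for d in ("D1", "D2", "D3", "D4"))


print(block_delta("010", "0"), block_delta("010", "1"))  # 0001 1100
```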
Now we have the systems for each block of FSM $U_4(S_1)$. The next step is the implementation of the logic circuit.
8. Implementing the logic circuit of FSM $U_4(S_1)$. This step is executed using special synthesis tools, e.g., Quartus Prime [50] or Vivado by Xilinx [40]. During this step, each LUT is represented by its truth table, and complex tasks such as mapping, placement and routing are executed [6]. We focus only on finding the number of LUTs in the circuit and do not discuss this step in detail for our example.
$Block_1$ is represented by SBF (29); the corresponding circuit includes four LUTs. $Block_2$ is represented by SBF (30); its circuit also includes four LUTs. Thus, the first level of logic includes eight LUTs with $S_L = 5$.
$Block_{OR}$ is represented by SBF (31); four LUTs are enough to implement its circuit. $Block_\lambda$ is represented by SBF (28); its circuit consists of eight LUTs. Finally, system (32) represents $Block_\delta$; its circuit has four LUTs.
Thus, the circuit of FSM $U_4(S_1)$ includes 24 LUTs. There are three levels of LUTs between the inputs $x_l \in X$ and the outputs $y_n \in Y$. The same is true for the inputs and the input memory functions $D_r \in \Phi$.
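The LUT count can be reproduced by listing the support (set of arguments) of every function of the example. The sketch assumes that a function with at most $S_L$ arguments occupies exactly one LUT, with no sharing or packing of LUTs (a real mapper may do better); the supports are read off from SBFs (28) to (32), and the names are illustrative.

```python
S_L = 5

# Supports of the functions of each block, read off from SBFs (28)-(32).
SUPPORTS = {
    "Block1": [{"T1", "T2", "x5"}, {"T1", "T2", "x1", "x2", "x5"},
               {"T1", "T2", "x1", "x2", "x5"}, {"T1", "T2", "x1", "x2"}],
    "Block2": [{"T3", "T4", "x3", "x4", "x6"}, {"T3", "T4", "x3", "x6"},
               {"T3", "T4", "x3", "x4", "x6"}, {"T3", "T4", "x6"}],
    "BlockOR": [{f"{f}^1", f"{f}^2"} for f in ("z1", "z2", "z3", "v1")],
    "Block_lambda": [{"z1", "z2", "z3"}, {"z2", "z3"}, {"z1", "z3"}, {"z1", "z2"},
                     {"z1", "z2"}, {"z1", "z2", "z3"}, {"z1", "z3"}, {"z2", "z3"}],
    "Block_delta": [{"z1", "z2", "z3", "v1"}] * 4,
}


def lut_count(supports, s_l):
    """Count one LUT per function; raise if a function needs more than s_l inputs."""
    per_block = {}
    for block, funcs in supports.items():
        if any(len(s) > s_l for s in funcs):
            raise ValueError(f"{block}: a function needs functional decomposition")
        per_block[block] = len(funcs)
    return per_block, sum(per_block.values())


per_block, total = lut_count(SUPPORTS, S_L)
print(per_block, total)
```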
This example is very simple; we show it to explain all steps of the proposed method. The next section shows the results of experiments with more complex FSMs.

6. Experimental Results

In this section, we show the results of experiments based on benchmark FSMs from the library [31]. There are 48 benchmarks in the library. They are very often used to compare the outcomes of different design methods. The benchmark Mealy FSMs are represented in the KISS2 format. We do not show the characteristics of these benchmarks in this article; they can be found, for example, in [30].
To implement FPGA-based FSMs, we used VHDL-based FSM models. Our CAD tool K2F [2] translated the benchmarks into VHDL-based FSM models. The synthesis and simulation of the FSMs were executed in the Active-HDL environment. As the target platform, we used the Xilinx VC709 Evaluation Board (Virtex 7, XC7VX690T-2FFG1761C) [62]. This chip includes LUTs with $S_L = 6$. To execute the technology mapping and produce reports with the characteristics of the resulting FSM circuits, we used the Xilinx CAD tool Vivado (version 2019.1) [40].
When we investigated FSM $U_2$ [30], we found that this model allows producing circuits with less area and lower power consumption if $R + L > S_L$. In [30], we divided the benchmarks into five groups using the values of $L + R$ and $S_L$. If $L + R \le 6$, a benchmark belongs to group 0 (trivial FSMs); if $L + R \le 12$, to group 1 (simple FSMs); if $L + R \le 18$, to group 2 (average FSMs); if $L + R \le 24$, to group 3 (big FSMs); otherwise, it belongs to group 4 (very big FSMs). As our research [30] shows, the larger the group number, the bigger the gain from using our method. We use the same division of benchmarks in this article.
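The grouping rule can be written as a small function. It reads the thresholds as upper bounds ($L + R \le 6$ for group 0, and so on up to 24) and assumes $R = \lceil \log_2 M \rceil$, the minimal number of state variables used later for Table 15. The function names are hypothetical.

```python
def state_bits(m: int) -> int:
    """R = ceil(log2 M), the minimal number of state variables for M >= 1 states."""
    return (m - 1).bit_length()


def fsm_group(l: int, m: int) -> int:
    """Return the benchmark group 0..4 from the number of inputs L and states M."""
    size = l + state_bits(m)
    for group, bound in enumerate((6, 12, 18, 24)):
        if size <= bound:
            return group
    return 4


print(fsm_group(6, 6))  # example FSM S1: L + R = 6 + 3 = 9, so group 1
```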
Group 0 includes the following benchmarks: bbtas, dk17, dk27, dk512, ex3, ex5, lion, lion9, mc, modulo12 and shiftreg. Group 1 contains the most benchmarks. They are the following: bbara, bbsse, beecount, cse, dk14, dk15, dk16, donfile, ex2, ex4, ex6, ex7, keyb, mark1, opus, s27, s386, s840 and sse. Group 2 consists of the following 12 benchmarks: ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, styr and tma. Group 3 contains only a single benchmark, sand. Group 4 includes the following benchmarks: s420, s510, s820 and s832.
In the section State-of-the-art, we justified the choice of three methods for comparison with our approach. We chose the method auto of Vivado as a method based on binary state codes. Additionally, we used the method one-hot of Vivado. Due to its high reputation, we also chose JEDI-based FSMs as a basis for comparison. Our approach is a competitor to the method from [30]. Thus, we chose $U_2$-based FSMs with three levels of logic blocks as the fourth method used in the experiments. The results of the experiments are shown in Table 7 (the number of LUTs) and Table 8 (the maximum operating frequency). These results were taken from reports generated by Vivado.
Table 7 and Table 8 have the same organization. Their rows are marked by the names of benchmarks and their columns by the investigated design methods. The row "Total" contains the sums of the corresponding values. The summarized characteristics of our approach ($U_4$-based FSMs) were taken as 100%. The row "Percentage" shows the summarized characteristics of the FSM circuits implemented by the other methods as percentages of those of our approach. Note that the model $U_1$ was used for the designs with auto, one-hot and JEDI.
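The "Percentage" row can be computed as shown below: each method's total is divided by the total of the $U_4$-based FSMs. The totals in the usage line are made-up numbers for illustration only, not values from Table 7 or Table 8. For LUT counts, a percentage above 100 means that a method uses more LUTs than our approach; for example, 124.86% corresponds to the 24.86% gain reported for auto.

```python
def percentage_row(totals, reference="U4"):
    """Express each method's total as a percentage of the reference total."""
    base = totals[reference]
    return {method: round(100.0 * total / base, 2) for method, total in totals.items()}


# Hypothetical totals, for illustration only.
print(percentage_row({"U4": 200, "auto": 250, "U2": 180}))
# {'U4': 100.0, 'auto': 125.0, 'U2': 90.0}
```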
As follows from Table 7, the $U_2$-based FSMs require fewer LUTs than the other investigated methods. Our approach produces circuits with 8.84% more LUTs than equivalent $U_2$-based FSMs. However, our approach requires fewer LUTs than auto (24.86% gain), one-hot (45.3% gain) and JEDI-based FSMs (2.83% gain). The higher the group, the greater the gain in LUTs with respect to auto, one-hot and JEDI-based FSMs. We show these results in Figure 9.
Analysis of Table 8 shows that the $U_4$-based FSMs have the highest operating frequency among the investigated methods. Our method gives a 9.85% advantage over auto. The one-hot method of Vivado loses 10.48% to our approach. The $U_4$-based FSMs provide a 3.18% gain compared to JEDI-based FSMs. Finally, the $U_2$-based FSMs have, on average, a 12.57% lower frequency than FSMs based on our approach. These results are shown in Figure 10.
To clarify how the gain in LUTs depends on the FSM group, we have created Table 9 (gain in LUTs for group 0), Table 10 (gain in LUTs for group 1) and Table 11 (gain in LUTs for groups 2–4). Additionally, we present these results as graphs in Figure 11, Figure 12 and Figure 13, respectively. To clarify how the gain in frequency depends on the FSM group, we have created Table 12 (gain in frequency for group 0), Table 13 (gain in frequency for group 1) and Table 14 (gain in frequency for groups 2–4). Additionally, we present these results as graphs in Figure 14, Figure 15 and Figure 16, respectively.
Analysis of Table 9 and Figure 11 shows that the $U_4$-based FSMs of group 0 use more LUTs than the other investigated methods. Our method has the following losses: 44.14% compared to auto, 22.52% compared to one-hot, 45.05% compared to JEDI-based FSMs and 19.82% compared to $U_2$-based FSMs. Thus, this method is not suitable for small FSMs.
As follows from Table 10 and Figure 12, the $U_4$-based FSMs of group 1 require fewer LUTs than FSMs based on auto (11.54% gain) and one-hot (44.23% gain). However, they still lose to the JEDI-based FSMs (7.42% loss) and the $U_2$-based FSMs (12.36% loss). Note that the loss is smaller than for group 0.
As follows from Table 11 and Figure 13, the $U_4$-based FSMs of groups 2–4 require fewer LUTs than FSMs based on auto (37.72% gain), one-hot (53.44% gain) and JEDI-based FSMs (12.13% gain). Only the $U_2$-based FSMs have better results, and our approach has a 6.27% loss. Note that the loss is smaller than for group 1. Thus, starting from average FSMs, our approach loses only to the $U_2$-based FSMs.
As follows from Table 12 and Figure 14, the $U_4$-based FSMs of group 0 are faster than the $U_2$-based FSMs (5.38% gain). In this group, the best results belong to JEDI-based FSMs, which have the following gains: (1) 0.9% regarding auto; (2) 3.57% regarding one-hot; (3) 12.73% regarding $U_2$-based FSMs; and (4) 7.35% regarding our approach. Thus, for group 0, there is no sense in applying our approach. However, starting from group 1, our method produces faster circuits than the other investigated methods.
The proposed approach produces the best results for FSMs from group 1 (Table 13 and Figure 15). There are the following gains: (1) 9.32% regarding auto; (2) 9.43% regarding one-hot; (3) 2.88% regarding JEDI-based FSMs; and (4) 14.25% regarding $U_2$-based FSMs. Our approach provides even better results (Table 14 and Figure 16) for FSMs from groups 2–4. The gain increases and amounts to: (1) 20.92% regarding auto; (2) 21.13% regarding one-hot; (3) 10.49% regarding JEDI-based FSMs; and (4) 15.39% regarding $U_2$-based FSMs.
As can be seen from Table 7, the $U_2$-based FSMs require fewer LUTs than the other methods. Analysis of Table 8 shows that the $U_4$-based FSMs have the highest maximum operating frequency among the compared methods. The overall design quality can be estimated by the product of the used resources (for example, the chip area occupied by a circuit) and the latency time [63]. As in [63], we use the number of LUTs to compare the areas required by FSM circuits based on different models (auto, one-hot, JEDI, $U_2$ and $U_4$). As a rule, an FSM is only a part of a digital system, and we do not know how many cycles a system needs to perform a required task. Thus, we cannot find absolute values of latency times. However, for a relative evaluation of different models, it is sufficient to know only the cycle time.
In this article, we have performed a generalized comparison of the models used in the experiments. As a generalized assessment, we used the product of the number of LUTs in an FSM circuit and its cycle time. The numbers of LUTs are taken from Table 7. To calculate the cycle times in nanoseconds, we used the operating frequencies from Table 8. The area-time products, measured in LUTs × ns, are shown in Table 16.
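The area-time product of one circuit follows directly: with the frequency $f$ in MHz, the cycle time is $1000/f$ ns, so the product is the LUT count times $1000/f$. A minimal sketch with made-up values (not taken from Table 7 or Table 8); the function name is hypothetical.

```python
def area_time(luts: int, f_mhz: float) -> float:
    """Area-time product in LUTs x ns: LUT count times the cycle time 1000 / f[MHz]."""
    if f_mhz <= 0:
        raise ValueError("frequency must be positive")
    return luts * (1000.0 / f_mhz)


print(area_time(100, 250.0))  # 400.0 (a 4 ns cycle)
```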
To better evaluate the chip resources used by the FSM circuits, we have created Table 15. It contains the numbers of flip-flops required to implement the state registers. As follows from Table 15, the FSMs obtained using auto, JEDI and $U_2$ have the same number of flip-flops in their registers. They use the least number of flip-flops, determined as $R = \lceil \log_2 M \rceil$. The largest number of flip-flops is consumed by FSMs based on one-hot state assignment (eight times more than, for example, $U_2$-based FSMs and 4.97 times more than $U_4$-based FSMs). Our approach gives a gain of 397% compared to one-hot-based FSMs but loses 37% to the other investigated methods. If we compute the difference between, for example, the numbers of flip-flops in the registers of $U_2$- and $U_4$-based FSMs, we can see that the difference decreases as the group number decreases.
As follows from Table 16, our approach produces FSM circuits with better area-time products than those of the other investigated methods. Our approach gives the following gains: (1) 55.24% regarding auto; (2) 79.87% regarding one-hot; (3) 12.28% regarding JEDI-based FSMs; and (4) 8.6% regarding $U_2$-based FSMs. Comparing the results for different groups, we can draw the following conclusions. Our approach loses to all other models for group 0. For group 1, $U_4$-based FSMs lose only to JEDI-based FSMs (4.46% loss). However, our approach provides significantly better area-time products for FSMs from groups 2–4. In this case, our approach gives the following gains: (1) 76.79% regarding auto; (2) 97.55% regarding one-hot; (3) 24.71% regarding JEDI-based FSMs; and (4) 12.63% regarding $U_2$-based FSMs.
The results of our experiments show that the proposed approach can be used instead of other models starting from simple FSMs. The $U_2$-based FSMs have fewer LUTs than the other models. However, starting from average FSMs, our approach produces circuits with slightly more LUTs but significantly higher maximum operating frequencies. Additionally, our approach provides better area-time products starting from average FSMs. Thus, it has good potential for implementing FPGA-based Mealy FSMs.

7. Conclusions

Modern FPGA chips have reached such a level that quite complex systems can be implemented using only a single chip. At the same time, significant parts of digital systems are implemented using LUTs with rather small numbers of inputs. The value $S_L = 6$ is considered optimal [19,20], but it is too small compared to the number of inputs and outputs of FSMs in modern digital systems. To design these complex FSMs with such simple elements, it is necessary to apply methods of functional decomposition. As a rule, functional decomposition results in LUT-based FSM circuits with many logic levels and very complicated systems of interconnections.
Different methods of structural decomposition can be used to optimize the characteristics of FPGA-based FSM circuits. Our research [30,60] shows that the FSM circuits based on structural decomposition possess significantly better characteristics (fewer LUTs, higher maximum operating frequency, lower power consumption) than their counterparts based on functional decomposition. It is very important that the FSM circuits based on structural decomposition have regular systems of interconnections and predicted numbers of levels of logic. In the best case, each logic block of an FSM circuit has only a single level of LUTs.
In this paper, we propose a novel approach aimed at the optimization of LUT-based Mealy FSMs. The proposed method leads to the Mealy FSM $U_4$. Two methods of structural decomposition are the cornerstones of our approach: (1) the transformation of codes of collections of outputs into state codes and (2) the extension of state codes. The second method is new and is proposed in this paper. To increase the maximum operating frequency, we encode the FSM states using more than the minimum number of state variables determined by (1). Our approach leads to Mealy FSM circuits with three levels of LUTs and regular systems of interconnections. As in single-level FSMs $U_1$, the FSM outputs are generated simultaneously with the input memory functions. As a result, our approach increases the maximum operating frequency at the cost of a small increase in the number of LUTs compared to equivalent three-level FSMs.
The results of our experiments clearly show that the proposed approach can be used instead of other models starting from simple FSMs. The $U_2$-based FSMs have fewer LUTs than the other models. However, starting from average FSMs, our approach produces circuits with slightly more LUTs but significantly higher maximum operating frequency. Additionally, our approach provides better area-time products starting from average FSMs. Thus, our approach can be used if either the performance or the area-time product is the dominant characteristic of a digital system.
We are currently considering several areas of research. We intend to explore the possibility of applying the proposed approach to FPGA chips of Intel (Altera). We will also try to adapt this approach for optimizing characteristics of Moore finite state machines.

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T., K.K. and S.S.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T., K.K. and S.S.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T., K.K. and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BIMF: block of input memory functions
CLB: configurable logic block
COF: collection of output functions
CO: collection of outputs
DST: direct structure table
EMB: embedded memory block
FD: functional decomposition
FSM: finite state machine
FPGA: field-programmable gate array
LUT: look-up table
SBF: system of Boolean functions
SOP: sum-of-products
STT: state transition table

References

  1. Baillieul, J.; Samad, T. (Eds.) Encyclopaedia of Systems and Control; Springer: London, UK, 2015; p. 1554.
  2. Sklyarov, V.; Skliarova, I.; Barkalov, A.; Titarenko, L. Synthesis and Optimization of FPGA-Based Systems; Volume 294 of Lecture Notes in Electrical Engineering; Springer: Berlin, Germany, 2014.
  3. Baranov, S. Logic and System Design of Digital Systems; TUTPress: Tallinn, Estonia, 2008.
  4. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
  5. Minns, P.; Elliot, I. FSM-Based Digital Design Using Verilog HDL; John Wiley and Sons: Hoboken, NJ, USA, 2008.
  6. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  7. Barkalov, A.; Titarenko, L.; Mielcarek, K.; Chmielewski, S. Logic Synthesis for FPGA-Based Control Units: Structural Decomposition in Logic Design; Volume 636 of Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2020.
  8. Gajski, D.D.; Abdi, S.; Gerstlauer, A.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
  9. Krzywicki, K.; Barkalov, A.; Andrzejewski, G.; Titarenko, L.; Kolopienczyk, M. SoC research and development platform for distributed embedded systems. Przegląd Elektrotechniczny 2016, 92, 262–265.
  10. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Volume 231 of Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013.
  11. Intel FPGAs and Programmable Devices. Available online: https://www.intel.pl/content/www/pl/pl/products/programmable.html (accessed on 9 September 2020).
  12. Altera. Cyclone IV Device Handbook. Available online: http://www.altera.com/literature/hb/cyclone-iv/cyclone4-handbook.pdf (accessed on 9 September 2020).
  13. Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 9 September 2020).
  14. Sass, R.; Schmidt, A. Embedded System Design with Platform FPGAs: Principles and Practices; Morgan Kaufmann Publishers: Amsterdam, The Netherlands, 2010; p. 409.
  15. Branco, S.; Ferreira, A.G.; Cabral, J. Machine Learning in Resource-Scarce Embedded Systems, FPGAs, and End-Devices: A Survey. Electronics 2019, 8, 1289.
  16. Cheng, Q.; Zhao, X.; Wen, M.; Shen, J.; Tang, M.; Zhang, C. SAPTM: Towards High-Throughput Per-Flow Traffic Measurement with a Systolic Array-Like Architecture on FPGA. Electronics 2020, 9, 1160.
  17. Wang, Z.; Tang, Q.; Guo, B.; Wei, J.-B.; Wang, L. Resource Partitioning and Application Scheduling with Module Merging on Dynamically and Partially Reconfigurable FPGAs. Electronics 2020, 9, 1461.
  18. Salauyou, V.; Ostapczuk, M. State Assignment of Finite-State Machines by Using the Values of Output Variables. In Theory and Applications of Dependable Computer Systems. DepCoS-RELCOMEX 2020. Advances in Intelligent Systems and Computing; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Springer: Cham, Switzerland, 2020; Volume 1173, pp. 543–553.
  19. Kilts, S. Advanced FPGA Design: Architecture, Implementation, and Optimization; Wiley-IEEE Press: Hoboken, NJ, USA, 2007.
  20. Kuon, I.; Tessier, R.; Rose, J. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
  21. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
  22. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
  23. Kubica, M.; Kania, D. Decomposition of multi-level functions oriented to configurability of logic blocks. Bull. Pol. Acad. Sci. 2017, 67, 317–331.
  24. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
  25. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of FSMs based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  26. Kubica, M.; Kania, D. Area-oriented technology mapping for LUT-based logic blocks. Int. J. Appl. Math. Comput. Sci. 2017, 27, 207–222.
  27. Machado, L.; Cortadella, J. Support-Reducing Decomposition for FPGA Mapping. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2020, 39, 213–224.
  28. Mishchenko, A.; Brayton, R.; Jiang, J.-H.R.; Jang, S. Scalable don't-care-based logic optimization and resynthesis. ACM Trans. Reconfigurable Technol. Syst. 2011, 4, 4.
  29. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA Performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'18), Monterey, CA, USA, 25–27 February 2018; p. 6.
  30. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  31. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
  32. Rawski, M.; Łuba, T.; Jachna, Z.; Tomaszewicz, P. The Influence of Functional Decomposition on Modern Digital Design Process. In Design of Embedded Control Systems; Springer: Boston, MA, USA, 2005; pp. 193–203.
  33. Dahl, O.; Dijkstra, E.; Hoare, C. (Eds.) Structured Programming; Academic Press: London, UK, 1972; p. 234.
  34. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
  35. Opara, A.; Kubica, M.; Kania, D. Strategy of Logic Synthesis using MTBDD dedicated to FPGA. Integr. VLSI J. 2018, 62, 142–158.
  36. Kubica, M.; Opara, A.; Kania, D. Logic synthesis for FPGAs based on cutting of BDD. Microprocess. Microsyst. 2017, 52, 173–187.
  37. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification (Berlin, Heidelberg, 2010); Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40.
  38. Sentovich, E.; Singh, K.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkeley, CA, USA, 1992.
  39. Barkalov, A.; Titarenko, L.; Barkalov, A., Jr. Structural decomposition as a tool for the optimization of an FPGA-based implementation of a Mealy FSM. Cybern. Syst. Anal. 2012, 48, 313–322.
  40. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 9 September 2020).
  41. De Micheli, G.; Brayton, R.K.; Sangiovanni-Vincentelli, A. Optimal state assignment for finite state machines. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2006, 4, 269–285.
  42. Sutter, G.; Todorovich, E.; López-Buedo, S.; Boemo, E. Low-power FSMs in FPGA: Encoding alternatives. In Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation; Springer: Berlin/Heidelberg, Germany, 2002; pp. 363–370.
  43. Klimovich, A.S.; Solovev, V.V. Minimization of Mealy finite-state machines by internal states gluing. J. Comput. Syst. Sci. Int. 2012, 51, 244–255.
  44. Zając, W.; Andrzejewski, G.; Krzywicki, K.; Królikowski, T. Finite State Machine Based Modelling of Discrete Control Algorithm in LAD Diagram Language With Use of New Generation Engineering Software. Procedia Comput. Sci. 2019, 159, 2560–2569.
  45. El-Maleh, A.H. A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization. Integr. VLSI J. 2017, 56, 32–43.
  46. Park, S.; Cho, S.; Yang, S.; Ciesielski, M. A new state assignment technique for testing and low power. In Proceedings of the 41st Annual Design Automation Conference (2004), San Diego, CA, USA, 7–11 June 2004; pp. 510–513.
  47. Garcia-Vargas, I.; Senhadji-Navarro, R. Finite state machines with input multiplexing: A performance study. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 867–871.
  48. Rawski, M.; Selvaraj, H.; Łuba, T. An application of functional decomposition in ROM-based FSM implementation in FPGA devices. J. Syst. Archit. 2005, 51, 423–434.
  49. Kołopienczyk, M.; Titarenko, L.; Barkalov, A. Design of EMB-based Moore FSMs. J. Circuits Syst. Comput. 2017, 26, 1–23.
  50. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 9 September 2020).
  51. Xilinx. XST User Guide. V.11.3. Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx11/xst.pdf (accessed on 9 September 2020).
  52. Rafla, N.I.; Gauba, I. A reconfigurable pattern matching hardware implementation using on-chip RAM-based FSM. In Proceedings of the 53rd IEEE International Midwest Symposium on Circuits and Systems, Seattle, WA, USA, 1–4 August 2010; pp. 49–52. [Google Scholar]
  53. Senhadji-Navarro, R.; Garcia-Vargas, I.; Jiménez-Moreno, G.; Civit-Balcells, A.; Guerra-Gutierrez, P. ROM based FSM implementation using input multiplexing in FPGA devices. Electron. Lett. 2004, 40, 1249–1251. [Google Scholar] [CrossRef]
  54. Garcia-Vargas, I.; Senhadji-Navarro, R.; Jiménez-Moreno, G.; Civit-Balcells, A.; Guerra-Gutierrez, P. ROM-based finite state machine implementation in low cost FPGAs. In Proceedings of the IEEE International Symposium on Industrial Electronics ISIE 2007, Vigo, Spain, 4–7 June 2007; pp. 2342–2347. [Google Scholar]
  55. Senhadji-Navaro, R.; Garcia-Vargas, I. High-Speed and Area-Efficient Reconfigurable Multiplexer Bank for RAM-Based Finite State Machine Implementations. J. Circuits Syst. Comput. 2015, 24, 7. [Google Scholar] [CrossRef]
  56. Barkalov, A.; Titarenko, L.; Mazurkiewicz, M.; Krzywicki, K. Encoding of terms in EMB-based Mealy FSMs. Appl. Sci. 2020, 10, 2762. [Google Scholar] [CrossRef]
  57. Senhadji, N.; Garcia-Vargas, I. High-Performance Architecture for Binary-Tree-Based Finite State Machines. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 37, 796–805. [Google Scholar] [CrossRef] [Green Version]
  58. Selvaraj, H.; Nowicka, M.; Luba, T. Non-Disjoint Decomposition Strategy in Decomposition-Based Algorithms & Tools. In Proceedings of the International Conference on Computational Intelligence and Multimedia Application, Gippsland, Australia, 2 July–2 October 1998; pp. 34–42. [Google Scholar]
  59. Michalski, T.; Kokosiński, Z. Functional decomposition of combinational logic circuits with PKmin. Czas. Tech. 2016, 2016, 191–202. [Google Scholar]
  60. Barkalov, O.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607. [Google Scholar] [CrossRef] [Green Version]
  61. Achasova, S. Synthesis Algorithms for Automata with PLAs; Soviet Radio: Moscow, Russia, 1987. [Google Scholar]
  62. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019.
  63. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-Time Efficient Hardware Implementation of Modular Multiplication for Elliptic Curve Cryptography. IEEE Access 2020, 8, 73898–73906. [Google Scholar] [CrossRef]
Figure 1. Architecture of a look-up table (LUT)-based configurable logic block (CLB).
Figure 2. Architecture of LUT-based Mealy FSM $U_1$.
Figure 3. Architecture of Mealy FSM $U_2$.
Figure 4. Architecture of Mealy FSM $U_3$.
Figure 5. Interaction of FSM with the rest of a digital system.
Figure 6. Architecture of Mealy FSM $U_4$.
Figure 7. Representation of states by pairs $P(m, q)$.
Figure 8. Outcome of encoding of collections of outputs (COs).
Figure 9. Total gain in LUTs relative to our approach (LUT count—total percentage).
Figure 10. Total gain in frequency relative to our approach (the operating frequency—total percentage).
Figure 11. Gain in LUTs for group 0 (LUT count—total percentage).
Figure 12. Gain in LUTs for group 1 (LUT count—total percentage).
Figure 13. Gain in LUTs for groups 2–4 (LUT count—total percentage).
Figure 14. Gain in frequency for group 0 (the operating frequency—total percentage).
Figure 15. Gain in frequency for group 1 (the operating frequency—total percentage).
Figure 16. Gain in frequency for groups 2–4 (the operating frequency—total percentage).
Table 1. The state transition table (STT) of Mealy FSM $S_0$.

| $a_m$ | $a_s$ | $X_h$ | $Y_h$ | $h$ |
|---|---|---|---|---|
| $a_1$ | $a_2$ | $x_1$ | $y_1$ | 1 |
| | $a_3$ | $\bar{x}_1$ | $y_2 y_3$ | 2 |
| $a_2$ | $a_3$ | $x_2$ | $y_4$ | 3 |
| | $a_1$ | $\bar{x}_2$ | $y_2$ | 4 |
| $a_3$ | $a_1$ | 1 | – | 5 |
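As an illustration of how an STT such as Table 1 is read, the Python sketch below stores each row as a tuple (current state $a_m$, input condition $X_h$, next state $a_s$, outputs $Y_h$) and steps $S_0$ through a sequence of input vectors. The row encoding and the functions `step` and `run` are our own illustrative choices, not part of the article's method; the sketch only shows that the outputs of a Mealy FSM depend on both the current state and the current inputs.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# One STT row: (a_m, condition X_h, a_s, outputs Y_h).
# Conditions are encoded as predicates over a dict of input values (illustrative encoding).
Row = Tuple[str, Callable[[Dict[str, int]], bool], str, FrozenSet[str]]

STT_S0: List[Row] = [
    ("a1", lambda x: x["x1"] == 1, "a2", frozenset({"y1"})),        # h = 1
    ("a1", lambda x: x["x1"] == 0, "a3", frozenset({"y2", "y3"})),  # h = 2
    ("a2", lambda x: x["x2"] == 1, "a3", frozenset({"y4"})),        # h = 3
    ("a2", lambda x: x["x2"] == 0, "a1", frozenset({"y2"})),        # h = 4
    ("a3", lambda x: True, "a1", frozenset()),                      # h = 5, unconditional
]


def step(state: str, inputs: Dict[str, int]) -> Tuple[str, FrozenSet[str]]:
    """Return (next state, Mealy outputs) for one clock cycle."""
    for a_m, condition, a_s, outputs in STT_S0:
        if a_m == state and condition(inputs):
            return a_s, outputs
    raise ValueError(f"no transition from {state} for inputs {inputs}")


def run(start: str, sequence: List[Dict[str, int]]) -> List[Tuple[str, FrozenSet[str]]]:
    """Apply a sequence of input vectors and record (next state, outputs) per cycle."""
    trace, state = [], start
    for inputs in sequence:
        state, outputs = step(state, inputs)
        trace.append((state, outputs))
    return trace
```

For example, `run("a1", [{"x1": 1, "x2": 0}, {"x1": 0, "x2": 1}, {"x1": 0, "x2": 0}])` visits $a_2$, $a_3$ and $a_1$, producing $y_1$, then $y_4$, then no outputs.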
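Table 2. STT of Mealy FSM $S_1$.

| $a_m$ | $a_s$ | $X_h$ | $Y_h$ | $h$ |
|---|---|---|---|---|
| $a_1$ | $a_2$ | $x_1$ | $y_1 y_2$ | 1 |
| | $a_3$ | $\bar{x}_1 x_2$ | $y_3$ | 2 |
| | $a_4$ | $\bar{x}_1 \bar{x}_2$ | $y_2 y_4$ | 3 |
| $a_2$ | $a_2$ | $x_3 x_4$ | $y_3$ | 4 |
| | $a_3$ | $x_3 \bar{x}_4$ | $y_5 y_6$ | 5 |
| | $a_5$ | $\bar{x}_3$ | $y_5 y_7$ | 6 |
| $a_3$ | $a_4$ | $x_5$ | $y_5 y_6$ | 7 |
| | $a_5$ | $\bar{x}_5 x_1$ | $y_3 y_8$ | 8 |
| | $a_6$ | $\bar{x}_5 \bar{x}_1$ | $y_1 y_2$ | 9 |
| $a_4$ | $a_3$ | $x_6 x_3$ | $y_3$ | 10 |
| | $a_5$ | $x_6 \bar{x}_3$ | $y_2 y_4$ | 11 |
| | $a_2$ | $\bar{x}_6$ | $y_1 y_2$ | 12 |
| $a_5$ | $a_6$ | 1 | $y_1 y_2$ | 13 |
| $a_6$ | $a_1$ | $x_5$ | – | 14 |
| | $a_3$ | $\bar{x}_5$ | $y_3 y_8$ | 15 |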
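Tables 3 and 6 replace the output column of Table 2 by collections of outputs (COs) $Y_q$, i.e., the distinct sets of outputs produced in one transition, and encode them with codes $K(Y_q)$. A short Python sketch, based on a hand transcription of Table 2, finds the $Q = 7$ COs ($Y_1 = \emptyset$, $Y_2 = \{y_1, y_2\}$, $Y_3 = \{y_3\}$, $Y_4 = \{y_2, y_4\}$, $Y_5 = \{y_5, y_6\}$, $Y_6 = \{y_5, y_7\}$, $Y_7 = \{y_3, y_8\}$) and the minimal code width $\lceil \log_2 Q \rceil = 3$, which matches the 3-bit codes $K(Y_q)$ in Table 6. The ordering rule in the sketch is our assumption, used only to reproduce the article's numbering.

```python
from math import ceil, log2
from typing import FrozenSet, Iterable, List, Set

# Column Y_h of Table 2, rows h = 1..15 (hand transcription).
Y_H: List[Set[str]] = [
    {"y1", "y2"}, {"y3"}, {"y2", "y4"}, {"y3"}, {"y5", "y6"},
    {"y5", "y7"}, {"y5", "y6"}, {"y3", "y8"}, {"y1", "y2"}, {"y3"},
    {"y2", "y4"}, {"y1", "y2"}, {"y1", "y2"}, set(), {"y3", "y8"},
]


def collections_of_outputs(rows: Iterable[Set[str]]) -> List[FrozenSet[str]]:
    """Distinct output sets: the empty one (if present) first, the rest by first appearance.
    ASSUMPTION: this ordering is chosen only to reproduce the numbering Y1..Y7 of Tables 3 and 6."""
    distinct: List[FrozenSet[str]] = []
    for row in rows:
        co = frozenset(row)
        if co not in distinct:
            distinct.append(co)
    distinct.sort(key=lambda co: len(co) > 0)  # stable sort moves the empty set to the front
    return distinct


COS = collections_of_outputs(Y_H)
Q = len(COS)          # number of collections of outputs
R_Y = ceil(log2(Q))   # minimum number of bits for the codes K(Y_q)
```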
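Table 3. Table of BlockZV of Mealy FSM $U_3(S_1)$.

| $a_m$ | $C(a_m)$ | $a_s$ | $Y_q$ | $I_{np}$ | $X_h$ | $K(Y_q)$ | $K(I_{np})$ | $Z_h$ | $V_h$ | $h$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $a_1$ | 0100 | $a_2$ | $Y_2$ | $I_1$ | $x_1$ | 010 | 0 | $z_2$ | – | 1 |
| | | $a_3$ | $Y_3$ | $I_2$ | $\bar{x}_1 x_2$ | 001 | 1 | $z_3$ | $v_1$ | 2 |
| | | $a_4$ | $Y_4$ | $I_1$ | $\bar{x}_1 \bar{x}_2$ | 110 | 0 | $z_1 z_2$ | – | 3 |
| $a_2$ | 0001 | $a_2$ | $Y_3$ | $I_1$ | $x_3 x_4$ | 001 | 0 | $z_3$ | – | 4 |
| | | $a_3$ | $Y_5$ | $I_1$ | $x_3 \bar{x}_4$ | 100 | 0 | $z_1$ | – | 5 |
| | | $a_5$ | $Y_6$ | – | $\bar{x}_3$ | 101 | – | $z_1 z_3$ | – | 6 |
| $a_3$ | 1000 | $a_4$ | $Y_5$ | $I_2$ | $x_5$ | 100 | 1 | $z_1$ | $v_1$ | 7 |
| | | $a_5$ | $Y_7$ | $I_2$ | $\bar{x}_5 x_1$ | 011 | 1 | $z_2 z_3$ | $v_1$ | 8 |
| | | $a_6$ | $Y_2$ | $I_2$ | $\bar{x}_5 \bar{x}_1$ | 010 | 1 | $z_2$ | $v_1$ | 9 |
| $a_4$ | 0010 | $a_3$ | $Y_3$ | $I_2$ | $x_6 x_3$ | 001 | 1 | $z_3$ | $v_1$ | 10 |
| | | $a_5$ | $Y_4$ | $I_2$ | $x_6 \bar{x}_3$ | 110 | 1 | $z_1 z_2$ | $v_1$ | 11 |
| | | $a_2$ | $Y_2$ | $I_1$ | $\bar{x}_6$ | 010 | 0 | $z_2$ | – | 12 |
| $a_5$ | 0011 | $a_6$ | $Y_2$ | $I_2$ | 1 | 010 | 1 | $z_2$ | $v_1$ | 13 |
| $a_6$ | 1100 | $a_1$ | $Y_1$ | – | $x_5$ | 000 | – | – | – | 14 |
| | | $a_3$ | $Y_7$ | $I_1$ | $\bar{x}_5$ | 011 | 0 | $z_2 z_3$ | – | 15 |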
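The columns $Z_h$ and $V_h$ of Table 3 follow directly from the codes: $Z_h$ contains each $z_r$ whose bit in $K(Y_q)$ equals 1, and $V_h$ contains $v_1$ when $K(I_{np}) = 1$. The short Python sketch below applies this rule so that any row of the table can be cross-checked; the helper names `ones` and `zv_row` are hypothetical.

```python
from typing import List, Optional, Tuple

# Codes of the collections of outputs K(Y_q) and of the identifiers K(I_np), as in Tables 3 and 6.
K_Y = {"Y1": "000", "Y2": "010", "Y3": "001", "Y4": "110",
       "Y5": "100", "Y6": "101", "Y7": "011"}
K_I = {"I1": "0", "I2": "1"}


def ones(code: str, prefix: str) -> List[str]:
    """Name the variables prefix1, prefix2, ... whose bit in `code` equals 1."""
    return [f"{prefix}{r}" for r, bit in enumerate(code, start=1) if bit == "1"]


def zv_row(y_q: str, i_np: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Return (Z_h, V_h) for a row of Table 3; i_np is None where no identifier is used."""
    z = ones(K_Y[y_q], "z")
    v = ones(K_I[i_np], "v") if i_np is not None else []
    return z, v
```

For instance, row 8 of Table 3 ($Y_7$, $I_2$) gives `zv_row("Y7", "I2") == (["z2", "z3"], ["v1"])`.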
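Table 4. Table of Block1 of Mealy FSM $U_4(S_1)$.

| $a_m$ | $C(a_m)$ | $X_h^1$ | $Z_h^1$ | $V_h^1$ | $h$ |
|---|---|---|---|---|---|
| $a_1$ | 01 | $x_1$ | $z_2^1$ | – | 1 |
| | | $\bar{x}_1 x_2$ | $z_3^1$ | $v_1^1$ | 2 |
| | | $\bar{x}_1 \bar{x}_2$ | $z_1^1 z_2^1$ | – | 3 |
| $a_3$ | 10 | $x_5$ | $z_1^1$ | $v_1^1$ | 4 |
| | | $\bar{x}_5 x_1$ | $z_2^1 z_3^1$ | $v_1^1$ | 5 |
| | | $\bar{x}_5 \bar{x}_1$ | $z_2^1$ | $v_1^1$ | 6 |
| $a_6$ | 11 | $x_5$ | – | – | 7 |
| | | $\bar{x}_5$ | $z_2^1 z_3^1$ | – | 8 |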
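Table 5. Table of Block2 of Mealy FSM $U_4(S_1)$.

| $a_m$ | $C(a_m)$ | $X_h^2$ | $Z_h^2$ | $V_h^2$ | $h$ |
|---|---|---|---|---|---|
| $a_2$ | 01 | $x_3 x_4$ | $z_1^2$ | – | 1 |
| | | $x_3 \bar{x}_4$ | $z_3^2$ | – | 2 |
| | | $\bar{x}_3$ | $z_1^2 z_3^2$ | – | 3 |
| $a_4$ | 10 | $x_6 x_3$ | $z_3^2$ | $v_1^2$ | 4 |
| | | $x_6 \bar{x}_3$ | $z_1^2 z_2^2$ | $v_1^2$ | 5 |
| | | $\bar{x}_6$ | $z_2^2$ | – | 6 |
| $a_5$ | 11 | 1 | $z_2^2$ | $v_1^2$ | 7 |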
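Tables 4 and 5 split the states of $S_1$ between two blocks: Block1 handles $a_1$, $a_3$, $a_6$ and Block2 handles $a_2$, $a_4$, $a_5$, each state having a 2-bit code $C(a_m)$ inside its block. The Python sketch below transcribes a subset of these rows and merges the outputs of the two blocks. It rests on an assumption that the tables themselves do not state: that the final variables are formed as $z_r = z_r^1 \lor z_r^2$ and $v_1 = v_1^1 \lor v_1^2$, which works because only the block containing the current state produces ones. Under that assumption, the merged values for the states transcribed here coincide with the corresponding rows of Table 3. All names in the code are hypothetical.

```python
from typing import Dict, List, Tuple

Condition = Dict[str, int]                          # required input values, e.g. {"x5": 0, "x1": 1}
Block = Dict[str, List[Tuple[Condition, List[str]]]]

# Subset of Table 4 (Block1) and Table 5 (Block2); "z2^1" stands for z_2^1, and so on.
BLOCK1: Block = {
    "a1": [({"x1": 1}, ["z2^1"]),
           ({"x1": 0, "x2": 1}, ["z3^1", "v1^1"]),
           ({"x1": 0, "x2": 0}, ["z1^1", "z2^1"])],
    "a3": [({"x5": 1}, ["z1^1", "v1^1"]),
           ({"x5": 0, "x1": 1}, ["z2^1", "z3^1", "v1^1"]),
           ({"x5": 0, "x1": 0}, ["z2^1", "v1^1"])],
}
BLOCK2: Block = {
    "a4": [({"x6": 1, "x3": 1}, ["z3^2", "v1^2"]),
           ({"x6": 1, "x3": 0}, ["z1^2", "z2^2", "v1^2"]),
           ({"x6": 0}, ["z2^2"])],
    "a5": [({}, ["z2^2", "v1^2"])],                 # unconditional row (X_h = 1)
}


def block_outputs(block: Block, state: str, inputs: Condition) -> List[str]:
    """Variables set to 1 by one block; empty when the state is not in the block's class."""
    for condition, outputs in block.get(state, []):
        if all(inputs[name] == value for name, value in condition.items()):
            return outputs
    return []


def merged_outputs(state: str, inputs: Condition) -> List[str]:
    """ASSUMPTION: z_r = z_r^1 OR z_r^2 and v_1 = v_1^1 OR v_1^2 (block superscript dropped)."""
    merged = set()
    for block in (BLOCK1, BLOCK2):
        merged.update(name.split("^")[0] for name in block_outputs(block, state, inputs))
    return sorted(merged)
```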
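Table 6. Table of Blockδ of Mealy FSM $U_4(S_1)$.

| $Y_q$ | $K(Y_q)$ | $I_{np}$ | $K(I_{np})$ | $a_s$ | $C(a_s)$ | $\Phi_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $Y_1$ | 000 | – | – | $a_1$ | 0100 | $D_2$ | 1 |
| $Y_2$ | 010 | $I_1$ | 0 | $a_2$ | 0001 | $D_4$ | 2 |
| | | $I_2$ | 1 | $a_6$ | 1100 | $D_1 D_2$ | 3 |
| $Y_3$ | 001 | $I_1$ | 0 | $a_2$ | 0001 | $D_4$ | 4 |
| | | $I_2$ | 1 | $a_3$ | 1000 | $D_1$ | 5 |
| $Y_4$ | 110 | $I_1$ | 0 | $a_4$ | 0010 | $D_3$ | 6 |
| | | $I_2$ | 1 | $a_5$ | 0011 | $D_3 D_4$ | 7 |
| $Y_5$ | 100 | $I_1$ | 0 | $a_3$ | 1000 | $D_1$ | 8 |
| | | $I_2$ | 1 | $a_4$ | 0010 | $D_3$ | 9 |
| $Y_6$ | 101 | – | – | $a_5$ | 0011 | $D_3 D_4$ | 10 |
| $Y_7$ | 011 | $I_1$ | 0 | $a_3$ | 1000 | $D_1$ | 11 |
| | | $I_2$ | 1 | $a_5$ | 0011 | $D_3 D_4$ | 12 |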
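Table 6 shows that the next-state code $C(a_s)$ is determined by $K(Y_q)$ and, where needed, $K(I_{np})$ alone, i.e., by at most the four variables $z_1$, $z_2$, $z_3$, $v_1$. With D flip-flops, $\Phi_h$ simply lists $D_r$ for every 1-bit of $C(a_s)$. A minimal lookup-table sketch in Python, with the dictionary `DELTA` and the function `excitations` as hypothetical names:

```python
from typing import Dict, List, Optional, Tuple

# Table 6: (K(Y_q), K(I_np)) -> C(a_s). None marks rows without an identifier (Y1 and Y6).
DELTA: Dict[Tuple[str, Optional[str]], str] = {
    ("000", None): "0100",                       # Y1 -> a1
    ("010", "0"): "0001", ("010", "1"): "1100",  # Y2 -> a2 / a6
    ("001", "0"): "0001", ("001", "1"): "1000",  # Y3 -> a2 / a3
    ("110", "0"): "0010", ("110", "1"): "0011",  # Y4 -> a4 / a5
    ("100", "0"): "1000", ("100", "1"): "0010",  # Y5 -> a3 / a4
    ("101", None): "0011",                       # Y6 -> a5
    ("011", "0"): "1000", ("011", "1"): "0011",  # Y7 -> a3 / a5
}


def excitations(k_y: str, k_i: Optional[str] = None) -> List[str]:
    """Phi_h: the D flip-flop inputs equal to 1, i.e. the 1-bits of the next-state code C(a_s)."""
    next_code = DELTA[(k_y, k_i)]
    return [f"D{r}" for r, bit in enumerate(next_code, start=1) if bit == "1"]
```

Because $Y_1$ and $Y_6$ fix the next state without an identifier, the sketch keys them with `None`; in a circuit, $v_1$ could be treated as a don't-care for those two codes.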
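Table 7. Results of experiments (LUT count).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 17 | 17 | 10 | 10 | 14 | 1 |
| bbsse | 33 | 37 | 24 | 26 | 29 | 1 |
| bbtas | 5 | 5 | 5 | 8 | 9 | 0 |
| beecount | 19 | 19 | 14 | 14 | 16 | 1 |
| cse | 40 | 66 | 36 | 33 | 35 | 1 |
| dk14 | 16 | 27 | 10 | 12 | 14 | 1 |
| dk15 | 15 | 16 | 12 | 8 | 11 | 1 |
| dk16 | 15 | 34 | 12 | 11 | 13 | 1 |
| dk17 | 5 | 12 | 5 | 8 | 10 | 0 |
| dk27 | 3 | 5 | 4 | 7 | 9 | 0 |
| dk512 | 10 | 10 | 9 | 12 | 14 | 0 |
| donfile | 31 | 31 | 24 | 21 | 24 | 1 |
| ex1 | 70 | 74 | 53 | 40 | 44 | 2 |
| ex2 | 9 | 9 | 8 | 8 | 10 | 1 |
| ex3 | 9 | 9 | 9 | 11 | 14 | 0 |
| ex4 | 15 | 13 | 12 | 11 | 13 | 1 |
| ex5 | 9 | 9 | 9 | 10 | 12 | 0 |
| ex6 | 24 | 36 | 22 | 21 | 23 | 1 |
| ex7 | 4 | 5 | 4 | 6 | 8 | 1 |
| keyb | 43 | 61 | 40 | 37 | 40 | 1 |
| kirkman | 42 | 58 | 39 | 33 | 35 | 2 |
| lion | 2 | 5 | 2 | 6 | 8 | 0 |
| lion9 | 6 | 11 | 5 | 8 | 10 | 0 |
| mark1 | 23 | 23 | 20 | 19 | 21 | 1 |
| mc | 4 | 7 | 4 | 6 | 8 | 0 |
| modulo12 | 7 | 7 | 7 | 9 | 11 | 0 |
| opus | 28 | 28 | 22 | 21 | 23 | 1 |
| planet | 131 | 131 | 88 | 78 | 82 | 2 |
| planet1 | 131 | 131 | 88 | 78 | 82 | 2 |
| pma | 94 | 94 | 86 | 72 | 76 | 2 |
| s1 | 65 | 99 | 61 | 54 | 58 | 2 |
| s1488 | 124 | 131 | 108 | 89 | 93 | 2 |
| s1494 | 126 | 132 | 110 | 90 | 94 | 2 |
| s1a | 49 | 81 | 43 | 38 | 42 | 2 |
| s208 | 12 | 31 | 10 | 9 | 11 | 2 |
| s27 | 6 | 18 | 6 | 6 | 8 | 1 |
| s386 | 26 | 39 | 22 | 20 | 22 | 1 |
| s420 | 10 | 31 | 9 | 8 | 10 | 4 |
| s510 | 48 | 48 | 32 | 22 | 23 | 4 |
| s8 | 9 | 9 | 9 | 9 | 11 | 1 |
| s820 | 88 | 82 | 68 | 52 | 56 | 4 |
| s832 | 80 | 79 | 62 | 50 | 52 | 4 |
| sand | 132 | 132 | 114 | 99 | 103 | 3 |
| shiftreg | 2 | 6 | 2 | 4 | 6 | 0 |
| sse | 33 | 37 | 30 | 26 | 29 | 1 |
| styr | 93 | 120 | 81 | 70 | 78 | 2 |
| tma | 45 | 39 | 39 | 30 | 34 | 2 |
| Total | 1808 | 2104 | 1489 | 1320 | 1448 | |
| Percentage, % | 124.86 | 145.30 | 102.83 | 91.16 | 100 | |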
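The "Percentage, %" row relates each column total to the total of Our Approach, so values above 100 mean more LUTs than the proposed method and values below 100 mean fewer. A few lines of Python reproduce the row from the totals of Table 7; the names `LUT_TOTALS` and `percentage_row` are our own.

```python
from typing import Dict

# Column totals of Table 7 (LUT count).
LUT_TOTALS: Dict[str, float] = {
    "Auto": 1808, "One-Hot": 2104, "JEDI": 1489, "U2": 1320, "Our Approach": 1448,
}


def percentage_row(totals: Dict[str, float], reference: str = "Our Approach") -> Dict[str, float]:
    """Express every total as a percentage of the reference column, rounded to 2 decimals."""
    ref = totals[reference]
    return {method: round(100 * total / ref, 2) for method, total in totals.items()}
```

Applied to the subtotals of Table 11 (1340, 1493, 1091, 912 and 973), the same function yields that table's percentage row as well.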
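Table 8. Results of experiments (the operating frequency, MHz).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 193.39 | 193.39 | 212.21 | 183.32 | 210.21 | 1 |
| bbsse | 157.06 | 169.12 | 182.34 | 159.24 | 193.43 | 1 |
| bbtas | 204.16 | 204.16 | 206.12 | 194.43 | 201.47 | 0 |
| beecount | 166.61 | 166.61 | 187.32 | 156.72 | 194.47 | 1 |
| cse | 146.43 | 163.64 | 178.12 | 153.24 | 182.62 | 1 |
| dk14 | 191.64 | 172.65 | 193.85 | 162.78 | 201.39 | 1 |
| dk15 | 192.53 | 185.36 | 194.87 | 175.42 | 206.74 | 1 |
| dk16 | 169.72 | 174.79 | 197.13 | 164.16 | 199.14 | 1 |
| dk17 | 199.28 | 167.00 | 199.39 | 147.22 | 172.99 | 0 |
| dk27 | 206.02 | 201.90 | 204.18 | 181.73 | 190.32 | 0 |
| dk512 | 196.27 | 196.27 | 199.75 | 175.63 | 187.45 | 0 |
| donfile | 184.03 | 184.00 | 203.65 | 174.28 | 206.83 | 1 |
| ex1 | 150.94 | 139.76 | 176.87 | 164.32 | 180.72 | 2 |
| ex2 | 198.57 | 198.57 | 200.14 | 188.95 | 196.58 | 1 |
| ex3 | 194.86 | 194.86 | 195.76 | 174.44 | 187.26 | 0 |
| ex4 | 180.96 | 177.71 | 192.83 | 168.39 | 196.18 | 1 |
| ex5 | 180.25 | 180.25 | 181.16 | 162.56 | 162.56 | 0 |
| ex6 | 169.57 | 163.80 | 176.59 | 156.42 | 187.53 | 1 |
| ex7 | 200.04 | 200.84 | 200.60 | 191.43 | 204.16 | 1 |
| keyb | 156.45 | 143.47 | 168.43 | 136.49 | 178.59 | 1 |
| kirkman | 141.38 | 154.00 | 156.68 | 155.36 | 184.62 | 2 |
| lion | 202.43 | 204.00 | 202.35 | 185.74 | 195.73 | 0 |
| lion9 | 205.30 | 185.22 | 206.38 | 167.28 | 183.45 | 0 |
| mark1 | 162.39 | 162.39 | 176.18 | 153.48 | 182.37 | 1 |
| mc | 196.66 | 195.47 | 196.87 | 178.02 | 182.95 | 0 |
| modulo12 | 207.00 | 207.00 | 207.13 | 189.70 | 201.74 | 0 |
| opus | 166.20 | 166.20 | 178.32 | 157.42 | 186.34 | 1 |
| planet | 132.71 | 132.71 | 187.14 | 174.68 | 212.45 | 2 |
| planet1 | 132.71 | 132.71 | 187.14 | 173.29 | 212.45 | 2 |
| pma | 146.18 | 146.18 | 169.83 | 156.12 | 192.43 | 2 |
| s1 | 146.41 | 135.85 | 157.16 | 145.32 | 145.32 | 2 |
| s1488 | 138.50 | 131.94 | 157.18 | 141.27 | 182.14 | 2 |
| s1494 | 149.39 | 145.75 | 164.34 | 155.63 | 186.49 | 2 |
| s1a | 153.37 | 176.40 | 169.17 | 166.36 | 188.92 | 2 |
| s208 | 174.34 | 176.46 | 178.76 | 166.42 | 192.15 | 2 |
| s27 | 198.73 | 191.50 | 199.13 | 185.15 | 201.26 | 1 |
| s386 | 168.15 | 173.46 | 179.15 | 164.65 | 192.34 | 1 |
| s420 | 173.88 | 176.46 | 177.25 | 186.35 | 218.62 | 4 |
| s510 | 177.65 | 177.65 | 198.32 | 199.05 | 221.19 | 4 |
| s8 | 180.02 | 178.95 | 181.23 | 168.32 | 191.32 | 1 |
| s820 | 152.00 | 153.16 | 176.58 | 175.69 | 195.73 | 4 |
| s832 | 145.71 | 153.23 | 173.78 | 174.39 | 199.18 | 4 |
| sand | 115.97 | 115.97 | 126.82 | 120.07 | 143.14 | 3 |
| shiftreg | 262.67 | 263.57 | 276.26 | 248.79 | 253.72 | 0 |
| sse | 157.06 | 169.12 | 174.63 | 158.14 | 171.18 | 1 |
| styr | 137.61 | 129.92 | 145.64 | 118.02 | 164.52 | 2 |
| tma | 163.88 | 147.80 | 164.14 | 137.48 | 182.72 | 2 |
| Total | 8127.08 | 8061.22 | 8718.87 | 7873.36 | 9005.11 | |
| Percentage, % | 90.25 | 89.52 | 96.82 | 87.43 | 100 | |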
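The same relation explains the frequency figures quoted in the abstract: subtracting the percentages in the last row of Table 8 from 100 gives 9.75 (Auto), 10.48 (One-Hot), 3.18 (JEDI) and 12.57 ($U_2$), so the reported range of 3.18 to 12.57 percent spans the JEDI and $U_2$ columns. A Python sketch of this arithmetic, with `FREQ_TOTALS` and `frequency_gaps` as illustrative names:

```python
from typing import Dict

# Column totals of the operating frequency (MHz) from Table 8.
FREQ_TOTALS: Dict[str, float] = {
    "Auto": 8127.08, "One-Hot": 8061.22, "JEDI": 8718.87,
    "U2": 7873.36, "Our Approach": 9005.11,
}


def frequency_gaps(totals: Dict[str, float], reference: str = "Our Approach") -> Dict[str, float]:
    """For each method, 100 minus its total expressed as a percentage of the reference total."""
    ref = totals[reference]
    gaps = {}
    for method, total in totals.items():
        if method == reference:
            continue
        percentage = round(100 * total / ref, 2)   # value in the "Percentage, %" row
        gaps[method] = round(100 - percentage, 2)  # gap in percentage points
    return gaps
```

These gaps are percentage points of the proposed method's total. Measured instead relative to each competitor's own total, the JEDI case would read 9005.11 / 8718.87 − 1 ≈ 3.28%, so the basis of comparison matters when such figures are quoted.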
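Table 9. Gain in LUTs for group 0 (LUT count).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 5 | 5 | 5 | 8 | 9 | 0 |
| dk17 | 5 | 12 | 5 | 8 | 10 | 0 |
| dk27 | 3 | 5 | 4 | 7 | 9 | 0 |
| dk512 | 10 | 10 | 9 | 12 | 14 | 0 |
| ex3 | 9 | 9 | 9 | 11 | 14 | 0 |
| ex5 | 9 | 9 | 9 | 10 | 12 | 0 |
| lion | 2 | 5 | 2 | 6 | 8 | 0 |
| lion9 | 6 | 11 | 5 | 8 | 10 | 0 |
| mc | 4 | 7 | 4 | 6 | 8 | 0 |
| modulo12 | 7 | 7 | 7 | 9 | 11 | 0 |
| shiftreg | 2 | 6 | 2 | 4 | 6 | 0 |
| Total | 62 | 86 | 61 | 89 | 111 | |
| Percentage, % | 55.86 | 77.48 | 54.95 | 80.18 | 100 | |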
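Table 10. Gain in LUTs for group 1 (LUT count).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 17 | 17 | 10 | 10 | 14 | 1 |
| bbsse | 33 | 37 | 24 | 26 | 29 | 1 |
| beecount | 19 | 19 | 14 | 14 | 16 | 1 |
| cse | 40 | 66 | 36 | 33 | 35 | 1 |
| dk14 | 16 | 27 | 10 | 12 | 14 | 1 |
| dk15 | 15 | 16 | 12 | 8 | 11 | 1 |
| dk16 | 15 | 34 | 12 | 11 | 13 | 1 |
| donfile | 31 | 31 | 24 | 21 | 24 | 1 |
| ex2 | 9 | 9 | 8 | 8 | 10 | 1 |
| ex4 | 15 | 13 | 12 | 11 | 13 | 1 |
| ex6 | 24 | 36 | 22 | 21 | 23 | 1 |
| ex7 | 4 | 5 | 4 | 6 | 8 | 1 |
| keyb | 43 | 61 | 40 | 37 | 40 | 1 |
| mark1 | 23 | 23 | 20 | 19 | 21 | 1 |
| opus | 28 | 28 | 22 | 21 | 23 | 1 |
| s27 | 6 | 18 | 6 | 6 | 8 | 1 |
| s386 | 26 | 39 | 22 | 20 | 22 | 1 |
| s8 | 9 | 9 | 9 | 9 | 11 | 1 |
| sse | 33 | 37 | 30 | 26 | 29 | 1 |
| Total | 406 | 525 | 337 | 319 | 364 | |
| Percentage, % | 111.54 | 144.23 | 92.58 | 87.64 | 100 | |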
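Table 11. Gain in LUTs for groups 2–4 (LUT count).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| ex1 | 70 | 74 | 53 | 40 | 44 | 2 |
| kirkman | 42 | 58 | 39 | 33 | 35 | 2 |
| planet | 131 | 131 | 88 | 78 | 82 | 2 |
| planet1 | 131 | 131 | 88 | 78 | 82 | 2 |
| pma | 94 | 94 | 86 | 72 | 76 | 2 |
| s1 | 65 | 99 | 61 | 54 | 58 | 2 |
| s1488 | 124 | 131 | 108 | 89 | 93 | 2 |
| s1494 | 126 | 132 | 110 | 90 | 94 | 2 |
| s1a | 49 | 81 | 43 | 38 | 42 | 2 |
| s208 | 12 | 31 | 10 | 9 | 11 | 2 |
| styr | 93 | 120 | 81 | 70 | 78 | 2 |
| tma | 45 | 39 | 39 | 30 | 34 | 2 |
| sand | 132 | 132 | 114 | 99 | 103 | 3 |
| s420 | 10 | 31 | 9 | 8 | 10 | 4 |
| s510 | 48 | 48 | 32 | 22 | 23 | 4 |
| s820 | 88 | 82 | 68 | 52 | 56 | 4 |
| s832 | 80 | 79 | 62 | 50 | 52 | 4 |
| Total | 1340 | 1493 | 1091 | 912 | 973 | |
| Percentage, % | 137.72 | 153.44 | 112.13 | 93.73 | 100.00 | |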
Table 12. Gain in frequency for group 0 (the operating frequency, MHz).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 204.16 | 204.16 | 206.12 | 194.43 | 201.47 | 0 |
| dk17 | 199.28 | 167.00 | 199.39 | 147.22 | 172.99 | 0 |
| dk27 | 206.02 | 201.90 | 204.18 | 181.73 | 190.32 | 0 |
| dk512 | 196.27 | 196.27 | 199.75 | 175.63 | 187.45 | 0 |
| ex3 | 194.86 | 194.86 | 195.76 | 174.44 | 187.26 | 0 |
| ex5 | 180.25 | 180.25 | 181.16 | 162.56 | 162.56 | 0 |
| lion | 202.43 | 204.00 | 202.35 | 185.74 | 195.73 | 0 |
| lion9 | 205.30 | 185.22 | 206.38 | 167.28 | 183.45 | 0 |
| mc | 196.66 | 195.47 | 196.87 | 178.02 | 182.95 | 0 |
| modulo12 | 207.00 | 207.00 | 207.13 | 189.70 | 201.74 | 0 |
| shiftreg | 262.67 | 263.57 | 276.26 | 248.79 | 253.72 | 0 |
| Total | 2254.90 | 2199.70 | 2275.35 | 2005.54 | 2119.64 | |
| Percentage, % | 106.38 | 103.78 | 107.35 | 94.62 | 100.00 | |
Table 13. Gain in frequency for group 1 (the operating frequency, MHz).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 193.39 | 193.39 | 212.21 | 183.32 | 210.21 | 1 |
| bbsse | 157.06 | 169.12 | 182.34 | 159.24 | 193.43 | 1 |
| beecount | 166.61 | 166.61 | 187.32 | 156.72 | 194.47 | 1 |
| cse | 146.43 | 163.64 | 178.12 | 153.24 | 182.62 | 1 |
| dk14 | 191.64 | 172.65 | 193.85 | 162.78 | 201.39 | 1 |
| dk15 | 192.53 | 185.36 | 194.87 | 175.42 | 206.74 | 1 |
| dk16 | 169.72 | 174.79 | 197.13 | 164.16 | 199.14 | 1 |
| donfile | 184.03 | 184.00 | 203.65 | 174.28 | 206.83 | 1 |
| ex2 | 198.57 | 198.57 | 200.14 | 188.95 | 196.58 | 1 |
| ex4 | 180.96 | 177.71 | 192.83 | 168.39 | 196.18 | 1 |
| ex6 | 169.57 | 163.80 | 176.59 | 156.42 | 187.53 | 1 |
| ex7 | 200.04 | 200.84 | 200.60 | 191.43 | 204.16 | 1 |
| keyb | 156.45 | 143.47 | 168.43 | 136.49 | 178.59 | 1 |
| mark1 | 162.39 | 162.39 | 176.18 | 153.48 | 182.37 | 1 |
| opus | 166.20 | 166.20 | 178.32 | 157.42 | 186.34 | 1 |
| s27 | 198.73 | 191.50 | 199.13 | 185.15 | 201.26 | 1 |
| s386 | 168.15 | 173.46 | 179.15 | 164.65 | 192.34 | 1 |
| s8 | 180.02 | 178.95 | 181.23 | 168.32 | 191.32 | 1 |
| sse | 157.06 | 169.12 | 174.63 | 158.14 | 171.18 | 1 |
| Total | 3339.55 | 3335.57 | 3576.72 | 3158.00 | 3682.68 | |
| Percentage, % | 90.68 | 90.57 | 97.12 | 85.75 | 100.00 | |
Table 14. Gain in frequency for groups 2–4 (the operating frequency, MHz).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| ex1 | 150.94 | 139.76 | 176.87 | 164.32 | 180.72 | 2 |
| kirkman | 141.38 | 154.00 | 156.68 | 155.36 | 184.62 | 2 |
| planet | 132.71 | 132.71 | 187.14 | 174.68 | 212.45 | 2 |
| planet1 | 132.71 | 132.71 | 187.14 | 173.29 | 212.45 | 2 |
| pma | 146.18 | 146.18 | 169.83 | 156.12 | 192.43 | 2 |
| s1 | 146.41 | 135.85 | 157.16 | 145.32 | 145.32 | 2 |
| s1488 | 138.50 | 131.94 | 157.18 | 141.27 | 182.14 | 2 |
| s1494 | 149.39 | 145.75 | 164.34 | 155.63 | 186.49 | 2 |
| s1a | 153.37 | 176.40 | 169.17 | 166.36 | 188.92 | 2 |
| s208 | 174.34 | 176.46 | 178.76 | 166.42 | 192.15 | 2 |
| styr | 137.61 | 129.92 | 145.64 | 118.02 | 164.52 | 2 |
| tma | 163.88 | 147.80 | 164.14 | 137.48 | 182.72 | 2 |
| sand | 115.97 | 115.97 | 126.82 | 120.07 | 143.14 | 3 |
| s420 | 173.88 | 176.46 | 177.25 | 186.35 | 218.62 | 4 |
| s510 | 177.65 | 177.65 | 198.32 | 199.05 | 221.19 | 4 |
| s820 | 152.00 | 153.16 | 176.58 | 175.69 | 195.73 | 4 |
| s832 | 145.71 | 153.23 | 173.78 | 174.39 | 199.18 | 4 |
| Total | 2532.63 | 2525.95 | 2866.80 | 2709.82 | 3202.79 | |
| Percentage, % | 79.08 | 78.87 | 89.51 | 84.61 | 100.00 | |
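Table 15. Results of experiments (FF count).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 4 | 12 | 4 | 4 | 6 | 1 |
| bbsse | 5 | 26 | 5 | 5 | 6 | 1 |
| bbtas | 4 | 9 | 4 | 4 | 4 | 0 |
| beecount | 4 | 10 | 4 | 4 | 6 | 1 |
| cse | 5 | 32 | 5 | 5 | 7 | 1 |
| dk14 | 5 | 26 | 5 | 5 | 8 | 1 |
| dk15 | 5 | 17 | 5 | 5 | 7 | 1 |
| dk16 | 7 | 75 | 7 | 7 | 10 | 1 |
| dk17 | 4 | 16 | 4 | 4 | 4 | 0 |
| dk27 | 4 | 10 | 4 | 4 | 4 | 0 |
| dk512 | 5 | 24 | 5 | 5 | 4 | 0 |
| donfile | 5 | 24 | 5 | 5 | 6 | 1 |
| ex1 | 7 | 80 | 7 | 7 | 11 | 2 |
| ex2 | 5 | 25 | 5 | 5 | 7 | 1 |
| ex3 | 4 | 14 | 4 | 4 | 4 | 0 |
| ex4 | 5 | 18 | 5 | 5 | 7 | 1 |
| ex5 | 4 | 16 | 4 | 4 | 4 | 0 |
| ex6 | 4 | 14 | 4 | 4 | 6 | 1 |
| ex7 | 5 | 17 | 5 | 5 | 7 | 1 |
| keyb | 5 | 22 | 5 | 5 | 6 | 1 |
| kirkman | 6 | 48 | 6 | 6 | 10 | 2 |
| lion | 3 | 5 | 3 | 3 | 3 | 0 |
| lion9 | 4 | 11 | 4 | 4 | 4 | 0 |
| mark1 | 5 | 22 | 5 | 5 | 7 | 1 |
| mc | 3 | 8 | 3 | 3 | 3 | 0 |
| modulo12 | 4 | 12 | 4 | 4 | 4 | 0 |
| opus | 5 | 18 | 5 | 5 | 7 | 1 |
| planet | 7 | 86 | 7 | 7 | 12 | 2 |
| planet1 | 7 | 86 | 7 | 7 | 12 | 2 |
| pma | 6 | 49 | 6 | 6 | 11 | 2 |
| s1 | 6 | 54 | 6 | 6 | 10 | 2 |
| s1488 | 7 | 112 | 7 | 7 | 16 | 2 |
| s1494 | 7 | 118 | 7 | 7 | 16 | 2 |
| s1a | 7 | 86 | 7 | 7 | 14 | 2 |
| s208 | 6 | 37 | 6 | 6 | 11 | 2 |
| s27 | 4 | 11 | 4 | 4 | 5 | 1 |
| s386 | 5 | 23 | 5 | 5 | 7 | 1 |
| s420 | 8 | 137 | 8 | 8 | 18 | 4 |
| s510 | 8 | 172 | 8 | 8 | 21 | 4 |
| s8 | 4 | 15 | 4 | 4 | 5 | 1 |
| s820 | 7 | 78 | 7 | 7 | 16 | 4 |
| s832 | 7 | 76 | 7 | 7 | 17 | 4 |
| sand | 7 | 88 | 7 | 7 | 14 | 3 |
| shiftreg | 4 | 16 | 4 | 4 | 4 | 0 |
| sse | 5 | 26 | 5 | 5 | 8 | 1 |
| styr | 7 | 67 | 7 | 7 | 13 | 2 |
| tma | 6 | 63 | 6 | 6 | 12 | 2 |
| Total | 251 | 2011 | 251 | 251 | 404 | |
| Percentage, % | 62.13 | 497.77 | 62.13 | 62.13 | 100 | |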
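Table 16. Results of experiments (the generalized assessments, LUTs × ns).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbara | 87.91 | 87.91 | 47.12 | 54.55 | 66.60 | 1 |
| bbsse | 210.11 | 218.78 | 131.62 | 163.28 | 149.93 | 1 |
| bbtas | 24.49 | 24.49 | 24.26 | 41.15 | 44.67 | 0 |
| beecount | 114.04 | 114.04 | 74.74 | 89.33 | 82.27 | 1 |
| cse | 273.17 | 403.32 | 202.11 | 215.35 | 191.65 | 1 |
| dk14 | 83.49 | 156.39 | 51.59 | 73.72 | 69.52 | 1 |
| dk15 | 77.91 | 86.32 | 61.58 | 45.60 | 53.21 | 1 |
| dk16 | 88.38 | 194.52 | 60.87 | 67.01 | 65.28 | 1 |
| dk17 | 25.09 | 71.86 | 25.08 | 54.34 | 57.81 | 0 |
| dk27 | 14.56 | 24.76 | 19.59 | 38.52 | 47.29 | 0 |
| dk512 | 50.95 | 50.95 | 45.06 | 68.33 | 74.69 | 0 |
| donfile | 168.45 | 168.48 | 117.85 | 120.50 | 116.04 | 1 |
| ex1 | 463.76 | 529.48 | 299.66 | 243.43 | 243.47 | 2 |
| ex2 | 45.32 | 45.32 | 39.97 | 42.34 | 50.87 | 1 |
| ex3 | 46.19 | 46.19 | 45.97 | 63.06 | 74.76 | 0 |
| ex4 | 82.89 | 73.15 | 62.23 | 65.32 | 66.27 | 1 |
| ex5 | 49.93 | 49.93 | 49.68 | 61.52 | 73.82 | 0 |
| ex6 | 141.53 | 219.78 | 124.58 | 134.25 | 122.65 | 1 |
| ex7 | 20.00 | 24.90 | 19.94 | 31.34 | 39.18 | 1 |
| keyb | 274.85 | 425.18 | 237.49 | 271.08 | 223.98 | 1 |
| kirkman | 297.07 | 376.62 | 248.91 | 212.41 | 189.58 | 2 |
| lion | 9.88 | 24.51 | 9.88 | 32.30 | 40.87 | 0 |
| lion9 | 29.23 | 59.39 | 24.23 | 47.82 | 54.51 | 0 |
| mark1 | 141.63 | 141.63 | 113.52 | 123.79 | 115.15 | 1 |
| mc | 20.34 | 35.81 | 20.32 | 33.70 | 43.73 | 0 |
| modulo12 | 33.82 | 33.82 | 33.80 | 47.44 | 54.53 | 0 |
| opus | 168.47 | 168.47 | 123.37 | 133.40 | 123.43 | 1 |
| planet | 987.11 | 987.11 | 470.24 | 446.53 | 385.97 | 2 |
| planet1 | 987.11 | 987.11 | 470.24 | 450.11 | 385.97 | 2 |
| pma | 643.04 | 643.04 | 506.39 | 461.18 | 394.95 | 2 |
| s1 | 443.96 | 728.74 | 388.14 | 371.59 | 399.12 | 2 |
| s1488 | 895.31 | 992.88 | 687.11 | 630.00 | 510.60 | 2 |
| s1494 | 843.43 | 905.66 | 669.34 | 578.29 | 504.05 | 2 |
| s1a | 319.49 | 459.18 | 254.18 | 228.42 | 222.32 | 2 |
| s208 | 68.83 | 175.68 | 55.94 | 54.08 | 57.25 | 2 |
| s27 | 30.19 | 93.99 | 30.13 | 32.41 | 39.75 | 1 |
| s386 | 154.62 | 224.84 | 122.80 | 121.47 | 114.38 | 1 |
| s420 | 57.51 | 175.68 | 50.78 | 42.93 | 45.74 | 4 |
| s510 | 270.19 | 270.19 | 161.36 | 110.52 | 103.98 | 4 |
| s8 | 49.99 | 50.29 | 49.66 | 53.47 | 57.50 | 1 |
| s820 | 578.95 | 535.39 | 385.09 | 295.98 | 286.11 | 4 |
| s832 | 549.04 | 515.56 | 356.77 | 286.71 | 261.07 | 4 |
| sand | 1138.23 | 1138.23 | 898.91 | 824.52 | 719.58 | 3 |
| shiftreg | 7.61 | 22.76 | 7.24 | 16.08 | 23.65 | 0 |
| sse | 210.11 | 218.78 | 171.79 | 164.41 | 169.41 | 1 |
| styr | 675.82 | 923.65 | 556.17 | 593.12 | 474.11 | 2 |
| tma | 274.59 | 263.87 | 237.60 | 218.21 | 186.08 | 2 |
| Total | 12,228.61 | 14,168.64 | 8844.90 | 8554.93 | 7877.31 | |
| Percentage, % | 155.24 | 179.87 | 112.28 | 108.60 | 100 | |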
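Table 16 combines the two previous metrics: each cell is the LUT count from Table 7 multiplied by the clock period $1000/f$ ns derived from the frequency $f$ (MHz) in Table 8, a kind of area-delay product for which smaller values are better. For example, for bbara with our approach, $14 \times 1000 / 210.21 \approx 66.60$. A minimal Python check of a few cells follows; the names `generalized_assessment` and `SAMPLES` are illustrative.

```python
from typing import Dict, Tuple


def generalized_assessment(luts: int, f_mhz: float) -> float:
    """LUT count times the clock period in ns (period = 1000 / f for f in MHz)."""
    return luts * 1000.0 / f_mhz


# (LUT count from Table 7, frequency in MHz from Table 8) for a few cells of Table 16.
SAMPLES: Dict[str, Tuple[int, float]] = {
    "bbara / Our Approach": (14, 210.21),
    "dk15 / Our Approach": (11, 206.74),
    "s8 / Auto": (9, 180.02),
}

for cell, (luts, f_mhz) in SAMPLES.items():
    # prints 66.60, 53.21 and 49.99, matching the corresponding cells of Table 16
    print(f"{cell}: {generalized_assessment(luts, f_mhz):.2f} LUTs x ns")
```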
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs. Electronics 2020, 9, 1859. https://doi.org/10.3390/electronics9111859
