Article

Improving Temporal Characteristics of Mealy FSM with Composite State Codes

by Alexander Barkalov 1,*, Larysa Titarenko 1,2, Kazimierz Krzywicki 3,* and Svetlana Saburova 2
1 Institute of Metrology, Electronics and Computer Science, University of Zielona Gora, ul. Licealna 9, 65-417 Zielona Gora, Poland
2 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
3 Department of Technology, The Jacob of Paradies University, ul. Fryderyka Chopina 52/b.7, 66-400 Gorzow Wielkopolski, Poland
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(7), 1406; https://doi.org/10.3390/electronics14071406
Submission received: 17 March 2025 / Revised: 27 March 2025 / Accepted: 28 March 2025 / Published: 31 March 2025

Abstract

In this paper, we propose a new state assignment method focusing on Mealy finite state machines (FSMs). The method makes it possible to improve the temporal characteristics of FSM circuits whose internal states are encoded by composite state codes (CSCs). These codes consist of class codes and partial state codes. In known CSC-based FSMs, both class and partial state codes are maximum binary codes. We propose to encode the classes by one-hot codes. The main goal of the method is to improve the FSM cycle time without any significant degradation of spatial characteristics. The method can be applied if FSM circuits are implemented using look-up table (LUT) elements of field-programmable gate arrays (FPGAs). The resulting FSM circuit includes two logic blocks. The first block generates partial input memory functions and FSM outputs depending on maximum binary state codes and one-hot class codes. The choice of partial codes allows minimizing the systems of partial functions. This allows generating most partial functions by single-LUT circuits. Some partial functions require using dedicated multiplexers. The second block generates the final values of input memory functions and FSM outputs. Unlike in CSC-based FSMs, this block does not require class codes to generate functions. The proposed approach allows reducing the number of series-connected LUTs in comparison with CSC-based FSMs. Due to this reduction, the temporal characteristics are improved. The paper includes an example of FSM synthesis using the proposed method. The experiments are conducted using standard benchmark FSMs. The results of the experiments show that the proposed method improves the temporal characteristics (by an average of 9.15%). In relation to CSC-based FSMs, the number of LUTs increases by an average of 10.03%, and the power consumption increases by an average of 7.63%.

1. Introduction

The model of a Mealy finite state machine (FSM) [1] represents the behavior of sequential devices [2,3]. In this case, the synthesis of a sequential device is reduced to the synthesis of an FSM circuit. In this paper, we discuss the case when field-programmable gate arrays (FPGAs) [4] are used for implementing FSM circuits. This choice is determined by the fact that FPGAs are widely used for the implementation of digital systems of varying complexity [5]. The development of new FPGA-oriented design methods makes sense because, according to experts [6], FPGAs will be widely used in the next few decades.
FPGA-based design is associated with the need to solve a number of interrelated optimization problems. Most often, it is necessary to obtain a circuit in which the number of used internal resources, the performance, or the power consumption is optimal [7,8,9,10]. We discuss the case when the dominant optimization problem is increasing the performance (the FSM operating frequency). To implement a Mealy FSM circuit, the following FPGA internal resources are used: look-up table (LUT) elements combined with flip-flops, internal multiplexers, a synchronization tree, programmable interconnections and input–outputs [11,12,13]. As the starting object, we chose Mealy FSMs whose states are encoded by composite state codes (CSCs) [14]. Using CSCs is based on creating classes of compatible states. Both classes and states are encoded with a minimum number of bits. We propose a method for improving performance that does not lead to a significant overhead (in terms of the LUT count and consumed power). The required chip area is estimated as the LUT count [15].
Currently, there is an increasing amount of information processed by various digital systems [16]. This requires increasing the system performance. For example, this issue is very important for real-time embedded systems [17,18,19,20]. Very often, the performance of the digital system becomes one of the decisive factors of commercial success. This is perfectly true for various FPGA-based systems. In the case of FPGA-based systems, it is very important to increase the system performance without a significant overhead [7]. The overhead may be represented by an increased number of LUTs and interconnections or increased power consumption [7]. Very often, FSMs are used as a control unit of a digital system. Such FSMs generate control signals in each cycle of the system operation. Obviously, the cycle time of this control FSM significantly affects the performance of the controlled digital system [21].
Increasing the performance of FSMs is possible due to a state assignment that reduces the number of logic levels in the FPGA-based FSM circuit [7,22]. In this paper, we propose a state assignment method that decreases the FSM cycle time without a significant overhead in internal resources (LUTs, flip-flops, interconnections). The need to solve this problem is the motivation of our current research.
The main contribution of this paper is a novel method improving the temporal characteristics of LUT-based Mealy FSM circuits with CSCs. The improvement is based on the use of one-hot codes (OHCs) for encoding classes of compatible states. For encoding the compatible states, the maximum binary codes (MBCs) are used. This approach leads to an FSM circuit architecture that is different from the architecture where OHCs are not used (only MBCs) [14]. The performance improvement is related to the absence of the need to use class codes when converting partial functions to FSM outputs and input memory functions. As follows from our experimental study, the proposed approach allows the performance to be improved for rather complex FSMs. At the same time, there is no significant increase in the values of both LUT count and power consumption.
The article includes seven sections, the first of which is this brief introduction. The basic information related to FSM design is presented in Section 2. Section 3 includes the analysis of CSC-based FSMs. The essence of the proposed method is considered in Section 4. Section 5 includes an example of FSM synthesis using the proposed method. The outcomes of the conducted experiments are analyzed in Section 6. Section 7 is a short conclusion ending the paper.

2. Basic Necessary Information

The behavior of a Mealy FSM can be represented by either a state transition graph (STG) or a state transition table (STT) [1]. Using these tools, we can find the sets of FSM states $A = \{a_1, \ldots, a_M\}$, inputs $X = \{x_1, \ldots, x_L\}$, and outputs $Y = \{y_1, \ldots, y_N\}$. So, an FSM has $M$ internal states, $L$ external inputs and $N$ outputs. For each state $a_m \in A$, an STG (an STT) shows which outputs are generated under the influence of FSM inputs [1,23].
The design process starts from the step of state assignment [1]. During this step, the internal states $a_m \in A$ are encoded by binary codes $C(a_m)$ having $R$ bits. To create these codes, we use state variables. These variables are combined into a set $T = \{T_1, \ldots, T_R\}$. Next, it is necessary to obtain systems of Boolean functions (SBFs) representing the combinational part of the FSM circuit. A special $R$-bit register $RG$ keeps the codes $C(a_m)$. The register is connected with the combinational part. So-called maximum binary codes have the minimum possible number of bits, $R_{MB}$. The value of $R_{MB}$ can be found using the following expression [7,24]:
$$R_{MB} = \lceil \log_2 M \rceil. \tag{1}$$
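As a quick numerical check of (1), the width of a maximum binary code can be computed as follows. This is a minimal Python sketch for illustration only; the function name `mbc_width` is our own and is not taken from the cited works.

```python
def mbc_width(num_states: int) -> int:
    """Return R_MB = ceil(log2(M)), the bit width of a maximum binary code."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (M - 1).bit_length() equals ceil(log2(M)) for M >= 1
    # and avoids floating-point rounding issues.
    return (num_states - 1).bit_length()


# An FSM with M = 11 states needs four state variables.
print(mbc_width(11))  # 4
```

Integer arithmetic is used here instead of `math.log2` so that the result stays exact when $M$ is a power of two.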
The content of $RG$ can be changed using input memory functions (IMFs) together with the pulses Start and Clock. If the FSM circuit is implemented with LUTs, then the state register includes master–slave flip-flops having inputs of type D [2]. Because of this, the IMFs are elements of a set $D = \{D_1, \ldots, D_R\}$. If the pulse Start arises, then the code of the initial state $a_1 \in A$ is written into $RG$. The IMFs determine the code of the next state to be written into $RG$ when a certain edge of the pulse Clock arises.
The following SBFs are used for implementing the combinational part of a Mealy FSM [1,23]:
$$D = D(T, X); \tag{2}$$
$$Y = Y(T, X). \tag{3}$$
These systems determine the MB-FSM [25], where the symbol MB shows that maximum binary codes are used for state assignment. Its architecture is largely determined by the peculiarities of the logic elements used [7].
This paper is devoted to the design of LUT-based FSMs. Modern FPGAs include a lot of configurable logic blocks (CLBs), which can be interconnected using programmable interconnections [26]. We target FPGAs manufactured by AMD Xilinx [12]. In this case, each configurable block consists of LUTs, dedicated multiplexers and programmable flip-flops. Using multiplexers, it is possible to obtain a super-LUT having more inputs than the basic LUT. For example, there are $S_L = 6$ inputs in the basic LUT of the Virtex-7 family of FPGAs [12,27]. To show the number of inputs, we denote a LUT as an $S_L$-LUT. Inside a single CLB, using dedicated multiplexers allows obtaining either two 7-LUTs or one 8-LUT. These super-LUTs (SLUTs) are practically as fast as the basic 6-LUT [28]. To create a super-LUT with more than eight inputs, it is necessary to use several CLBs. However, even a 9-LUT is significantly slower than either a 7-LUT or an 8-LUT [28]. This is due to the fact that external inter-CLB connections are much slower than intra-CLB connections [28]. We use the symbol CLBer to denote that a particular FSM block is implemented using internal CLB resources. In CLB-based FSMs, there is no separate block for the state register $RG$. The elements of $RG$ are distributed among the LUTs generating functions (2). As a result, the architecture of the MB-FSM includes only two blocks (Figure 1).
A block CLBerT implements SBF (2). This is accomplished using LUTs and flip-flops. The pulses Start and Clock enter CLBerT to control the flip-flops of $RG$. The functions of SBF (3) are generated by the LUTs of a block CLBerY.
Modern LUTs have a serious drawback: the number of basic LUT inputs ($S_L$) is extremely limited [13]. This drawback manifests itself in the implementation of LUT-based circuits for functions whose number of arguments exceeds the value $S_L$. If a Boolean function $\beta_j \in D \cup Y$ has $NA_j$ arguments, then the corresponding LUT-based circuit requires using super-LUTs if the following condition takes place:
$$NA_j > S_L. \tag{4}$$
The SBFs (2) and (3) are derived from a direct structure table (DST) [23]. To construct a DST, it is possible to use a state transition graph (STG) [1]. The STG includes M nodes corresponding to FSM states. These nodes are connected by arcs corresponding to interstate transitions.
There are $H$ arcs in an STG. If the $h$-th arc is directed from $a_m$ to $a_s$, then it determines a transition $\langle a_m, a_s \rangle$. The arc number $h \in \{1, \ldots, H\}$ is marked by symbols $X_h$ and $Y_h$. The symbol $X_h$ stands for an input signal determining the transition $\langle a_m, a_s \rangle$. The input signal is a conjunction of inputs $x_l \in X$ (or their complements). The symbol $Y_h$ denotes a collection of outputs (CO) generated during the transition $\langle a_m, a_s \rangle$.
To form a DST, we should transform the graph into an equivalent state transition table. An STT includes columns denoted as $a_m$ (a current state), $a_s$ (a state of transition), $X_h$ (an FSM input), $Y_h$ (a collection of outputs), and $h$ (the number of a table line).
To construct a DST, it is necessary to encode FSM states $a_m \in A$ by binary codes $C(a_m)$. Compared to an STT, a DST includes the columns with codes $C(a_m)$ and $C(a_s)$ as well as a column with a collection of IMFs $D_h \subseteq D$. These IMFs determine the next state code [23].
The most common ways to encode states are the maximum binary and one-hot state assignments. The MBC-based approach provides the minimum width of state codes determined by (1). But the resulting functions (2) and (3) are very complex [7]. In the case of the OHC-based approach, the codes have the maximum number of bits ($R_{OH} = M$). In this case, it is necessary to use a lot of flip-flops, but the functions (2) and (3) are rather simple. MBC- and OHC-based FSMs are compared in [29]. The results of the comparison show that a one-hot state assignment provides better results if the number of states exceeds 16. However, in addition to the number of states, the characteristics of LUT-based circuits are also affected by the number of FSM inputs [30]. The experimental results shown in [31] indicate the following: MBC-based FSMs have better characteristics (as compared with OHC-based FSMs) if the number of FSM inputs exceeds 10.
If the relation (4) holds for some function $\beta_j \in D \cup Y$, then this function should be represented as a disjunction of partial functions. This can be accomplished using methods of functional decomposition (FD) [30]. Each partial function must not satisfy the condition (4), that is, it must have no more than $S_L$ arguments. In this case, such a partial function is generated using a single LUT. Next, these LUTs create a mutual multi-level network. So, FD produces FSM circuits having a lot of logic levels and a very complicated network of “spaghetti-type” interconnections [32,33].
The FSM performance may be improved by reducing the number of LUT levels in an FSM circuit. Reducing the number of literals in the sum-of-products (SOP) forms of functions (2) and (3) can solve this problem. A proper state assignment may optimize SBFs (2) and (3) [32]. An example of an efficient approach of this kind is the algorithm JEDI [34]. Using JEDI leads to creating generalized cubes covering the codes of states for which transitions depend on the same input signals $X_h$. Using adjacent codes for these states leads to the minimization of SOPs. In turn, this optimizes the corresponding circuits generating SBFs (2) and (3): these circuits have fewer LUTs as well as fewer levels and interconnections than the equivalent FSM circuits obtained using both maximum binary and one-hot state codes. As follows from [35], using JEDI can improve all three basic characteristics of LUT-based FSM circuits.

3. Analysis of CSC-Based Finite State Machines

As the research results [14] have shown, both temporal and spatial characteristics of MB-FSM circuits can be improved using composite state codes. As follows from [14], CSC-based FSM circuits require less chip area and provide better performance compared with MBC-, OHC- and JEDI-based equivalents.
The method proposed in [14] is based on splitting the set of states $A$ into $K$ classes of compatible states $A^k \subseteq A$. As a result, a partition $\Pi_A = \{A^1, \ldots, A^K\}$ is created. This partition consists of the minimum possible number of elements (classes). The codes $C(A^k)$ of classes $A^k \in \Pi_A$ have $R_C$ bits:
$$R_C = \lceil \log_2 K \rceil. \tag{5}$$
To create the MBCs of classes, the elements of the set $C = \{c_1, \ldots, c_{R_C}\}$ are used.
Inside each class, the states $a_m \in A^k$ are encoded by partial state codes $PC(a_m)$. These codes are maximum binary codes, too. To create the partial codes for a particular class $A^k \in \Pi_A$, it is necessary to have at least $R_k$ bits, where
$$R_k = \lceil \log_2 |A^k| \rceil. \tag{6}$$
To create partial state codes for any class $A^k \in \Pi_A$, we use the variables that form a set $S = \{s_1, \ldots, s_{R_S}\}$, where
$$R_S = \max(R_1, \ldots, R_K). \tag{7}$$
For a state $a_m \in A^k$, the composite state code $CC(a_m)$ is a concatenation of the codes $C(A^k)$ and $PC(a_m)$:
$$CC(a_m) = C(A^k) * PC(a_m). \tag{8}$$
In (8), the symbol “∗” is a concatenation sign.
Three sets of variables are determined by each class $A^k \in \Pi_A$. The first one is a partial set of inputs $X^k \subseteq X$ having $L_k$ elements. The input signals consisting of variables $x_l \in X^k$ cause transitions from the states $a_m \in A^k$. The partial outputs form the set $Y^k \subseteq Y$. This set consists of FSM outputs $y_n \in Y$ produced under the influence of inputs $x_l \in X^k$. The third set is a partial set of IMFs $D^k \subseteq D$. The impact of the inputs $x_l \in X^k$ leads to the generation of outputs $y_n \in Y^k$ and IMFs $D_r \in D^k$.
Each class $A^k \in \Pi_A$ satisfies the following condition:
$$R_S + L_k \leq S_L. \tag{9}$$
A greedy algorithm creating the partition $\Pi_A$ with the minimum possible number of classes, $K$, is proposed in [36].
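To make the role of condition (9) more tangible, the sketch below builds a partition with a simple first-fit heuristic. It is not the algorithm from [36], whose details are not reproduced here, and it does not guarantee the minimum $K$; the function names and the input data are hypothetical and serve only as an illustration.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed with integers."""
    return (n - 1).bit_length()


def condition_9_holds(classes, s_l):
    """Check R_S + L_k <= S_L for every class; classes = [(states, inputs), ...]."""
    r_s = max(ceil_log2(len(states)) for states, _ in classes)  # (6), (7)
    return all(r_s + len(inputs) <= s_l for _, inputs in classes)


def first_fit_partition(state_inputs, s_l):
    """Hypothetical first-fit heuristic (an assumption, not the method of [36]).

    state_inputs maps a state name to the set of inputs that determine
    the transitions from this state.
    """
    classes = []
    for state, inputs in state_inputs.items():
        placed = False
        for idx, (states, class_inputs) in enumerate(classes):
            trial = list(classes)
            trial[idx] = (states + [state], class_inputs | inputs)
            if condition_9_holds(trial, s_l):
                classes, placed = trial, True
                break
        if not placed:
            trial = classes + [([state], set(inputs))]
            if not condition_9_holds(trial, s_l):
                raise ValueError(f"state {state} cannot be placed for S_L = {s_l}")
            classes = trial
    return classes


# Hypothetical data: five states and 4-input LUTs.
demo = {"a1": {"x1"}, "a2": {"x1", "x2"}, "a3": {"x3"},
        "a4": {"x3", "x4"}, "a5": set()}
partition = first_fit_partition(demo, s_l=4)
print([states for states, _ in partition])  # [['a1', 'a2', 'a5'], ['a3', 'a4']]
```

Because $R_S$ is shared by all classes, the check is repeated for the whole tentative partition each time a state is added, not only for the class being extended.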
The same variables $s_r \in S$ represent the bits of partial codes for any class $A^k \in \Pi_A$. So, the total number of used variables ($R_{CC}$) is determined as
$$R_{CC} = R_S + R_C. \tag{10}$$
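Expressions (5)–(8) and (10) can be illustrated by a short Python sketch. The partition used below is hypothetical, and the order in which codes are assigned inside a class (the listing order) is our assumption; any other assignment of distinct codes would satisfy the same formulae.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def bits(value: int, width: int) -> str:
    """Binary string of the given width (empty string if width is 0)."""
    return format(value, f"0{width}b") if width else ""


def csc_widths(classes):
    """Return (R_C, R_S, R_CC) for a partition given as a list of state lists."""
    r_c = ceil_log2(len(classes))                     # (5)
    r_s = max(ceil_log2(len(c)) for c in classes)     # (6) and (7)
    return r_c, r_s, r_c + r_s                        # (10)


def composite_codes(classes):
    """Build CC(a_m) = C(A^k) * PC(a_m) as bit strings, as in (8)."""
    r_c, r_s, _ = csc_widths(classes)
    codes = {}
    for k, states in enumerate(classes):
        for i, state in enumerate(states):
            codes[state] = bits(k, r_c) + bits(i, r_s)
    return codes


# Hypothetical partition with K = 3 classes.
classes = [["a1", "a2", "a3"], ["a4", "a5"], ["a6", "a7", "a8", "a9"]]
print(csc_widths(classes))              # (2, 2, 4)
print(composite_codes(classes)["a9"])   # 1011
```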
The partial functions are represented by the following SBFs:
$$D^k = D^k(S, X^k); \tag{11}$$
$$Y^k = Y^k(S, X^k). \tag{12}$$
Functions (11) and (12) should be transformed into the final values of the corresponding functions $\beta_j \in D \cup Y$. These resulting values are represented as
$$D = D(C, D^1, \ldots, D^K); \tag{13}$$
$$Y = Y(C, Y^1, \ldots, Y^K). \tag{14}$$
The functions (13) and (14) can be viewed as multiplexing functions [7] where each conjunction of variables $c_r \in C$ determines which partial function must be passed to the multiplexer output. Obviously, the dedicated multiplexers of CLBs [28,37] could be used for implementing the circuits generating functions (13) and (14).
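The multiplexing view of (13) and (14) can be sketched for a single function as follows. This is only a behavioral model in Python; we assume that the class with the zero-based index $k$ has the binary code of $k$, which is one possible class encoding and not necessarily the one used in [14].

```python
def select_by_class(class_bits, partial_values):
    """Pass the partial value chosen by the maximum binary class code.

    class_bits     -- tuple of R_C bits (c_1, ..., c_RC), most significant first
    partial_values -- values of the K partial functions of one function beta_j
    """
    k = int("".join(str(b) for b in class_bits), 2)
    if k >= len(partial_values):
        raise ValueError("unused class code")
    return partial_values[k]


# K = 3 classes, R_C = 2: the code 10 selects the third partial function.
print(select_by_class((1, 0), [0, 0, 1]))  # 1
```

In this model, the class variables are arguments of the final function, which is why they enter the second logic level in CSC-based FSMs.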
The SBFs (11)–(14) determine the architecture of the $P_C$ Mealy FSM. This architecture (Figure 2) is proposed in [14].
In the $P_C$ FSM, the CLB-based blocks CLBer1–CLBerK represent the first logic level of the FSM circuit. The block CLBerk generates the partial functions having the superscript $k$. The block CLBerF generates the FSM outputs (14). Also, this block includes the hidden distributed register $RG$ having $R_{CC}$ flip-flops. Thus, the block CLBerF implements the IMFs (13) entering the inputs of the flip-flops. The outputs of the flip-flops are the variables $c_r \in C$ and $s_r \in S$. In this way, both class and partial state codes are created.
The article [14] presents the results of experimental studies aimed at comparing the characteristics of $P_C$ FSMs with the characteristics of equivalent FSMs based on MBCs and OHCs. These results show that CSC-based FSMs have significantly better spatial characteristics than those of the other models studied. It is very important that improving the spatial characteristics does not worsen the temporal characteristics.
As follows from [14], CSC-based FSMs could be used if the following condition holds:
$$R_{MB} + L \geq 2 S_L. \tag{15}$$
If the condition
$$R_C + K \leq S_L \tag{16}$$
is true, this leads to a single-level circuit of CLBerF. In this case, there are exactly two levels of LUTs in the circuits of $P_C$ FSMs. This is the best situation for using the composite state codes.
If condition (16) is violated, then the circuit of CLBerF is multi-level. This increases the number of LUTs and their interconnections in comparison with the equivalent single-level circuit. Thus, the violation of condition (16) leads to an increase in the chip area occupied by the FPGA-based circuit of an FSM using CSCs.
Note that the circuit of CLBerF can be single-level even if the condition (16) is violated. Let us explain this phenomenon. Any function $\beta_j \in D \cup Y$ is represented by $NP_j \leq K$ partial functions. If the condition (16) is violated, then the following condition may take place:
$$R_C + NP_j \leq S_L. \tag{17}$$
If condition (17) holds, then there is a single-LUT circuit implementing the corresponding function. If condition (17) is true for all functions $\beta_j \in D \cup Y$, then the circuit of the block CLBerF is single-level.
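The interplay of conditions (16) and (17) can be expressed as a small check. The Python sketch below is an illustration under our own naming; the numbers of partial functions $NP_j$ in the example are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def clberf_is_single_level(num_classes, partial_counts, s_l):
    """Return True if every function of CLBerF fits into a single LUT.

    partial_counts maps a function name to NP_j, the number of its
    partial functions (NP_j <= K).
    """
    r_c = ceil_log2(num_classes)                      # (5)
    if r_c + num_classes <= s_l:                      # condition (16)
        return True
    # Condition (17) must hold for each function separately.
    return all(r_c + np_j <= s_l for np_j in partial_counts.values())


# K = 5 and S_L = 6: (16) is violated (3 + 5 > 6), but (17) may still hold.
print(clberf_is_single_level(5, {"y1": 2, "y2": 3, "D1": 3}, 6))  # True
print(clberf_is_single_level(5, {"y1": 2, "D2": 4}, 6))           # False
```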
If (17) is violated, then the circuit of CLBerF is multi-level. But even in this case, the CSC-based FSMs have better characteristics than their counterparts based on functional decomposition [14]. If the CSC-based FSM is the control device of a digital system, then decreasing the number of logic levels in the circuit of CLBerF will increase the performance of this digital system. This is also very important for accelerating, for example, the real-time analysis of defect detection in composite materials. Faster FSMs can improve the industrial quality control [38].
To diminish the number of logic levels in a CSC-based FSM, it makes sense to decrease the number of arguments in SOPs (13) and (14). One of the possible ways of solving this problem is the elimination of class variables from these SOPs. In this article, we propose such an approach.

4. The Essence of Proposed Method

In this paper, we propose a method that eliminates the entry of class variables $c_r \in C$ into the block CLBerF. In this case, the classes $A^k \in \Pi_A$ are encoded by one-hot codes. The states are still encoded by maximum binary partial codes. Now, the set $C$ includes $K$ elements: $C = \{c_1, \ldots, c_K\}$. These variables enter the LUTs of the first level of the FSM circuit. The variable $c_k \in C$ enters all LUTs of the block CLBerk. The method is focused on the situation when the following condition is violated:
$$R_C + NP_j \leq S_L + S_0. \tag{18}$$
The value $S_L + S_0$ is equal to the maximum number of SLUT inputs that is possible within a single CLB. For example, $S_0 = 2$ for CLBs of the Virtex-7 family [27]. If condition (18) is violated, then there are at least two levels of CLBs in the circuit of CLBerF. So, there are at least three levels of CLBs in the circuit of the $P_C$ FSM. The proposed method allows obtaining a circuit having exactly two levels of CLBs. Therefore, if condition (18) is violated, there should be a performance gain from the transition to the proposed class coding method.
Because the variables $c_k \in C$ enter the first logic level, the SBFs (11) and (12) should be changed. Now, the following SBFs are generated by the blocks CLBer1–CLBerK:
$$D^k = D^k(S, c_k, X^k); \tag{19}$$
$$Y^k = Y^k(S, c_k, X^k). \tag{20}$$
In this case, there are no connections between the variables $c_r \in C$ and CLBerF. This means that SBFs (13) and (14) are transformed in the following way:
$$D = D(D^1, \ldots, D^K); \tag{21}$$
$$Y = Y(Y^1, \ldots, Y^K). \tag{22}$$
As before, the IMFs (21) are transformed into class variables $c_k \in C$ and state variables $s_r \in S$. The transition from SBFs (11)–(14) to SBFs (19)–(22) transforms the initial $P_C$ FSM into the $P_{COH}$ FSM. The letters “OH” in the subscript “COH” mean that the classes $A^k \in \Pi_A$ are encoded by OHCs. The proposed architecture of the $P_{COH}$ FSM is shown in Figure 3.
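The reason why class variables are no longer needed in the second level can be checked exhaustively for small $K$. In the Python sketch below (our illustration, with hypothetical function names), each partial value is gated by its one-hot class variable as in (19) and (20), and the final value is a plain disjunction as in (21) and (22). The loop confirms that this disjunction always equals the partial value of the active class.

```python
from itertools import product


def gate(partial_values, class_one_hot):
    """First level: multiply each partial value by its class variable c_k."""
    return [p & c for p, c in zip(partial_values, class_one_hot)]


def final_value(gated_partials):
    """Second level: disjunction of gated partial values, no class variables."""
    return int(any(gated_partials))


K = 3
for active in range(K):
    one_hot = [int(i == active) for i in range(K)]
    for partials in product((0, 1), repeat=K):
        # Only the partial function of the active class can reach the output.
        assert final_value(gate(partials, one_hot)) == partials[active]

print(final_value(gate([1, 0, 1], [0, 1, 0])))  # 0
```

The price of this simplification is one additional input, $c_k$, for every LUT of the first level.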
In the $P_{COH}$ FSM, SBFs (19) and (20) are generated by LUTs of the first logic level. These LUTs create the blocks CLBerk ($k \in \{1, \ldots, K\}$). The LUTs of the block CLBerF generate the final values of the FSM outputs $y_n \in Y$, class variables $c_k \in C$ and state variables $s_r \in S$. The pulses Start and Clock enter the control inputs of the flip-flops creating the hidden register $RG$. This register consists of $R_{COH}$ flip-flops:
$$R_{COH} = K + R_S. \tag{23}$$
As follows from the comparison of formulae (10) and (23), the $RG$ of a $P_{COH}$ FSM has more flip-flops than the $RG$ of the equivalent $P_C$ FSM. The difference is equal to $K - \lceil \log_2 K \rceil$.
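The flip-flop overhead following from (10) and (23) can be tabulated with a few lines of Python. This is a minimal sketch; the function name `register_sizes` and the sample values of $K$ and $R_S$ are chosen by us for illustration.

```python
def register_sizes(num_classes: int, r_s: int):
    """Return (R_CC, R_COH, difference) for K classes and R_S partial-code bits."""
    r_c = (num_classes - 1).bit_length()   # R_C = ceil(log2 K), see (5)
    r_cc = r_s + r_c                       # (10): maximum binary class codes
    r_coh = num_classes + r_s              # (23): one-hot class codes
    return r_cc, r_coh, r_coh - r_cc


print(register_sizes(3, 2))  # (4, 5, 1)
print(register_sizes(8, 2))  # (5, 10, 5)
```

The difference grows with $K$, so the overhead in flip-flops is small only when the partition has few classes.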
To create the partition $\Pi_A$, we use the greedy algorithm proposed in [39]. The condition (9) still determines the compatibility of states. So, equivalent $P_{COH}$ and $P_C$ FSMs have the same number of blocks CLBerk. But the LUTs of a $P_{COH}$ FSM require an additional input for the variable $c_k \in C$. Obviously, this can lead to the necessity of using more than a single basic LUT for implementing the circuit of some partial Boolean function.
For each class of the partition $\Pi_A$, each partial function $\beta_j^k \in D^k \cup Y^k$ is represented by an SOP having $NA_j^k$ literals. If the condition
$$NA_j^k \leq S_L - 1 \tag{24}$$
holds, then including the variable $c_k \in C$ in this SOP still leads to a single-LUT circuit implementing this partial function. If (24) is violated, then the function $\beta_j^k$ is represented by a multi-LUT circuit. As long as the condition
$$NA_j^k \leq S_L + S_0 - 1 \tag{25}$$
holds, there are enough resources in a single CLB to implement the circuit generating the partial function $\beta_j^k$. This circuit is still fast, because the performances of super-LUTs and basic LUTs are practically the same [28].
If (25) is violated, then it is necessary to combine several CLBs to obtain the circuit for the function $\beta_j^k$. This leads to slow circuits with complicated systems of interconnections [28]. If the condition (24) is violated, it makes sense to encode some states $a_m \in A^k$ by adjacent codes. To accomplish such a state assignment, we can use the algorithm JEDI [34].
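Conditions (24) and (25) split partial functions into three groups. The following Python sketch classifies a partial function by the number of its literals before the class variable is added; the function name and the returned labels are our own.

```python
def partial_function_resources(na_jk: int, s_l: int, s_0: int) -> str:
    """Classify a partial function with NA_j^k literals (c_k not yet included)."""
    if na_jk <= s_l - 1:             # condition (24)
        return "single LUT"
    if na_jk <= s_l + s_0 - 1:       # condition (25)
        return "super-LUT within one CLB"
    return "several CLBs"


# Virtex-7-like parameters from the text: S_L = 6, S_0 = 2.
for literals in (5, 6, 7, 8):
    print(literals, partial_function_resources(literals, 6, 2))
```

For $S_L = 6$ and $S_0 = 2$, functions with up to five literals fit into a basic LUT, functions with six or seven literals fit into one CLB, and only functions with eight or more literals need several CLBs.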
Consider the following situation. A class $A^2 = \{a_2, a_4, a_5, a_7\}$ belongs to the partition $\Pi_A$ created for a $P_C$ FSM. This class determines a set $X^2 = \{x_1, x_2\}$. Using (6) gives the value $R_S = 2$ and the set $S = \{s_1, s_2\}$. Now, we are going to design the equivalent $P_{COH}$ FSM using basic LUTs having $S_L = 4$. We can implement a super-LUT having five inputs using two basic LUTs and a single dedicated multiplexer. Furthermore, we use the symbol $A_m$ to denote the conjunction of state variables corresponding to the partial code $PC(a_m)$.
Consider the following SOP created for some partial function $y_3^2 \in Y^2$:
$$y_3^2 = A_2 x_1 \bar{x}_2 \vee A_7 x_1 \bar{x}_2. \tag{26}$$
Because $R_S = 2$, the expression (26) is characterized by the value $NA_3^2 = 4$. Obviously, the corresponding circuit is implemented by a single basic 4-LUT. Let the states $a_m \in A^2$ be encoded in the following way: $PC(a_2) = 00$, $PC(a_4) = 01$, $PC(a_5) = 10$, and $PC(a_7) = 11$. In this case, Equation (26) is transformed into the following equation:
$$y_3^2 = \bar{s}_1 \bar{s}_2 x_1 \bar{x}_2 \vee s_1 s_2 x_1 \bar{x}_2. \tag{27}$$
When switching to the equivalent $P_{COH}$ FSM, the expression (27) must be logically multiplied by the variable $c_2$ determining the class $A^2 \in \Pi_A$. This gives the expression
$$y_3^2 = (\bar{s}_1 \bar{s}_2 x_1 \bar{x}_2 \vee s_1 s_2 x_1 \bar{x}_2) c_2. \tag{28}$$
There is $NA_3^2 = 5$ for SOP (28). So, the corresponding circuit includes two 4-LUTs and a dedicated multiplexer (Figure 4a).
To optimize the circuit (Figure 4a), we propose to use a JEDI-based approach. In this case, the partial codes of states $a_2, a_7 \in A^2$ should be adjacent. One of the possible outcomes is the following: $PC(a_2) = 00$, $PC(a_4) = 10$, $PC(a_5) = 11$, and $PC(a_7) = 01$. Now, the partial codes of states $a_2, a_7 \in A^2$ are covered by the generalized cube 0x. Using these codes turns (28) into the following expression:
$$y_3^2 = \bar{s}_1 x_1 \bar{x}_2 c_2. \tag{29}$$
Now, there is $NA_3^2 = 4$ for the SOP of $y_3^2$. As a result, there is a single 4-LUT in the corresponding circuit (Figure 4b). Obviously, inside each class $A^k \in \Pi_A$, states should be encoded using the JEDI-based style of assignment.
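The effect of the two encodings on the number of arguments of $y_3^2$ can be verified by a truth-table enumeration. The Python sketch below models the function as "the current state is $a_2$ or $a_7$, $x_1 = 1$, $x_2 = 0$, and $c_2 = 1$" and lists the variables on which it really depends; the helper names are our own.

```python
from itertools import product

VARIABLES = ("s1", "s2", "x1", "x2", "c2")


def y32(partial_codes, s1, s2, x1, x2, c2):
    """y_3^2 = (A_2 x1 ~x2 v A_7 x1 ~x2) c2 for the given partial state codes."""
    in_a2_or_a7 = (s1, s2) in (partial_codes["a2"], partial_codes["a7"])
    return int(in_a2_or_a7 and x1 == 1 and x2 == 0 and c2 == 1)


def essential_variables(partial_codes):
    """Variables whose change can alter the function value (LUT inputs needed)."""
    essential = []
    for index, name in enumerate(VARIABLES):
        for values in product((0, 1), repeat=len(VARIABLES)):
            flipped = list(values)
            flipped[index] ^= 1
            if y32(partial_codes, *values) != y32(partial_codes, *flipped):
                essential.append(name)
                break
    return essential


binary_codes = {"a2": (0, 0), "a4": (0, 1), "a5": (1, 0), "a7": (1, 1)}
adjacent_codes = {"a2": (0, 0), "a4": (1, 0), "a5": (1, 1), "a7": (0, 1)}

print(essential_variables(binary_codes))    # ['s1', 's2', 'x1', 'x2', 'c2']
print(essential_variables(adjacent_codes))  # ['s1', 'x1', 'x2', 'c2']
```

With the adjacent codes, the variable $s_2$ drops out, so the function fits into a single 4-LUT, in agreement with (29).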
Let us discuss the proposed method for designing LUT-based $P_{COH}$ Mealy FSMs. We assume that an FSM is represented by its STG. The method includes the following steps:
  1. Transforming the initial STG into a state transition table.
  2. Creating the partition $\Pi_A$ of the set $A$ with the minimum number of classes, $K$.
  3. Encoding of the classes $A^k \in \Pi_A$ by one-hot codes $C(A^k)$.
  4. Encoding of the states $a_m \in A^k$ by partial maximum binary codes $PC(a_m)$.
  5. Creating the composite state codes $CC(a_m)$.
  6. Creating the tables of blocks CLBer1–CLBerK.
  7. Deriving SBFs (19) and (20) representing the first logic level of the FSM circuit.
  8. Creating the table of block CLBerF.
  9. Deriving SBFs (21) and (22) representing the second logic level of the FSM circuit.
  10. Implementing the LUT-based circuit of the $P_{COH}$ FSM.
Obviously, if an FSM is represented by its STT, then there is no need to execute step 1. The partition $\Pi_A$ may be obtained using the approach from the article [39]. This greedy approach minimizes the value of $K$. Now, we will discuss an example of synthesis using the proposed method.

5. Example of Synthesis of $P_{COH}$ Mealy FSM

Consider an STG (Figure 5) representing a Mealy FSM $F_1$. This section gives a synthesis example for the $P_{COH}$ Mealy FSM $F_1$. The FSM circuit is implemented using basic LUTs having $S_L = 5$.
Step 1. The following three sets can be derived from Figure 5: $A = \{a_1, \ldots, a_{11}\}$, $X = \{x_1, \ldots, x_7\}$, and $Y = \{y_1, \ldots, y_8\}$. These sets are characterized by the following cardinality numbers: $M = 11$, $L = 7$ and $N = 8$, respectively. The STG has 24 arcs. So, the corresponding STT has $H = 24$ rows (Table 1).
Each row of an STT corresponds to some STG arc. The following notation is used in Table 1: $a_m$ is a current state corresponding to the vertex from which the $h$-th arc leaves; $a_s$ is a next state corresponding to the vertex which the $h$-th arc enters; $X_h$ is an input signal (a conjunction of variables $x_l \in X$) written above the $h$-th arc; $Y_h$ is a collection of FSM outputs written above the $h$-th arc ($Y_h \subseteq Y$); $h$ is the number of the transition ($h \in \{1, \ldots, H\}$). The transition from an STG to an equivalent STT is transparent and straightforward [1].
Step 2. This step is executed using the greedy algorithm proposed in [39]. There is $S_L = 5$. This transforms condition (9) in the following way: $R_S + L_k \leq 5$. Applying the method from [39] gives the partition $\Pi_A = \{A^1, A^2, A^3\}$ with $K = 3$.
The states $a_m \in A$ are distributed in the following way: $A^1 = \{a_1, a_3, a_4, a_6\}$, $A^2 = \{a_2, a_5, a_9, a_{10}\}$, and $A^3 = \{a_7, a_8, a_{11}\}$. These classes correspond to the following sets of inputs and outputs: $X^1 = \{x_1, x_2, x_4\}$, $X^2 = \{x_3, x_5, x_6\}$, $X^3 = \{x_3, x_4, x_7\}$, $Y^1 = \{y_1, y_2, y_3, y_5, y_6, y_7\}$, $Y^2 = \{y_3, y_4, y_6, y_7, y_8\}$ and $Y^3 = \{y_1, \ldots, y_6, y_8\}$. For each class, using (6) gives $R_k = 2$. Using (7) gives $R_S = 2$. There is $L_k = 3$ for each class $A^k \subseteq A$. So, a single LUT having five inputs is enough to generate any partial function.
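The values obtained in Step 2 can be rechecked against (6), (7) and (9) with a short script. The Python sketch below only uses the classes and input sets listed above; the variable names are our own.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


S_L = 5
classes = {
    "A1": ["a1", "a3", "a4", "a6"],
    "A2": ["a2", "a5", "a9", "a10"],
    "A3": ["a7", "a8", "a11"],
}
inputs = {
    "A1": {"x1", "x2", "x4"},
    "A2": {"x3", "x5", "x6"},
    "A3": {"x3", "x4", "x7"},
}

r_k = {name: ceil_log2(len(states)) for name, states in classes.items()}  # (6)
r_s = max(r_k.values())                                                   # (7)
fits = {name: r_s + len(inputs[name]) <= S_L for name in classes}         # (9)

print(r_k)   # {'A1': 2, 'A2': 2, 'A3': 2}
print(r_s)   # 2
print(fits)  # {'A1': True, 'A2': True, 'A3': True}
```

Every class meets condition (9) with equality ($2 + 3 = 5$), so no further state or input could be added to any class for $S_L = 5$ without violating it.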
Step 3. There is K = 3 . This gives the value R C = 3 and the set C = { c 1 , c 2 , c 3 } . There is no influence of class codes on the FPGA area required for implementing the circuits of CLBer1CLBerK. Due to it, we can encode the classes in the following way: C ( A 1 ) = 100 , C ( A 2 ) = 010 , and C ( A 3 ) = 001 .
Step 4. As we found earlier, the set of partial state variables contains R S = 2 elements: S = { s 1 , s 2 } . The same variables s r S are used to encode states for any class A k Π A .
To select partial state codes, it is necessary to split the source STT into K subtables, each of which corresponds to a certain class A k Π A . Next, we should use the algorithm JEDI for optimal state encoding. Naturally, this makes it possible to optimize only partial SOPs for functions y n k Y k .
Using this approach, we can assign the following partial codes. For the class A 1 Π A , there are the following codes: P C ( a 1 ) = 00 , P C ( a 3 ) = 01 , P C ( a 4 ) = 10 , and P C ( a 6 ) = 11 . For the class A 2 Π A , there are the following codes: P C ( a 2 ) = 00 , P C ( a 9 ) = 01 , P C ( a 10 ) = 10 , and P C ( a 5 ) = 11 . For the class A 3 Π A , there are the following codes: P C ( a 7 ) = 00 , P C ( a 8 ) = 01 , and P C ( a 11 ) = 10 .
Obviously, there are R C C elements in the set of IMFs. In the discussed case, there is a set D = { D 1 , , D 5 } . The IMFs D 1 , D 2 , D 3 D determine the class codes. The other three IMFs determine the partial state codes.
Step 5. Using (10) gives $R_{CC} = 5$. As follows from (8), to create composite state codes, we should use both class and partial state codes. For the discussed example, these codes are the following: $CC(a_1) = 10000$, $CC(a_2) = 01000$, $CC(a_3) = 10001$, $CC(a_4) = 10010$, $CC(a_5) = 01011$, $CC(a_6) = 10011$, $CC(a_7) = 00100$, $CC(a_8) = 00101$, $CC(a_9) = 01001$, $CC(a_{10}) = 01010$, and $CC(a_{11}) = 00110$.
In these codes, the class code is represented by the first three digits, and the partial state code is represented by the last two digits. These codes are used in the tables of CLBer1–CLBer3.
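The construction of the composite codes can be sketched as a plain concatenation of the one-hot class code and the partial state code. The dictionaries below copy the codes from Steps 3 and 4; the function name `composite_codes` is hypothetical and used only for illustration.

```python
# One-hot class codes and partial state codes, copied from Steps 3 and 4.
class_code = {1: "100", 2: "010", 3: "001"}
partial_code = {
    1: {"a1": "00", "a3": "01", "a4": "10", "a6": "11"},
    2: {"a2": "00", "a9": "01", "a10": "10", "a5": "11"},
    3: {"a7": "00", "a8": "01", "a11": "10"},
}


def composite_codes(class_code, partial_code):
    """Concatenate the class code and the partial state code of every state."""
    cc = {}
    for k, codes in partial_code.items():
        # Partial codes must be unique inside a class; they may repeat across classes.
        assert len(set(codes.values())) == len(codes)
        for state, pc in codes.items():
            cc[state] = class_code[k] + pc
    return cc


CC = composite_codes(class_code, partial_code)
for state in sorted(CC, key=lambda s: int(s[1:])):
    print(state, CC[state])
```

Because the class part is one-hot, two states from different classes always differ in their first three bits, so the same partial codes can be reused in every class.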
Step 6. In the discussed case, CLBer1 is represented by Table 2, CLBer2 by Table 3, and CLBer3 by Table 4. These tables have the following columns: $a_m$, $PC(a_m)$, $a_s$, $CC(a_s)$, $X_h$, $Y_h$, $D_h$, and $h$. For each state, the columns $a_m$, $a_s$, $X_h$, and $Y_h$ are the same as they are in Table 1. The column $D_h$ is filled using the code $CC(a_s)$.
Table 2 is constructed using lines 1–3, 6–10, 12–13 of the initial STT (Table 1). Table 3 is constructed using lines 4–5, 11–21 of Table 1. Table 4 is constructed using lines 14–16 and 23–24 of the STT (Table 1).
Step 7. Table 2, Table 3 and Table 4 are used for deriving SBFs (10)–(20). This is accomplished in two stages. Stage 1 is reduced to creating partial SOPs using partial state codes and FSM inputs. During stage 2, the final values of SOPs are created by multiplying the outcomes of stage 1 by the corresponding class variable.
For our example, we show only the SOPs of the partial functions $y_1^1$, $D_1^1$ (derived from Table 2), $y_1^2$, $D_1^2$ (derived from Table 3), and $y_1^3$, $D_1^3$ (derived from Table 4). These partial SOPs are represented by the following equations:
$$y_1^1 = [F_1^1 \vee F_4^1]\, c_1 = \bar{T}_4 x_1 c_1; \quad y_1^2 = 0; \quad y_1^3 = [F_1^3 \vee F_4^3]\, c_3 = (\bar{T}_4 \bar{T}_5 x_4 \vee T_4 \bar{T}_5 x_3)\, c_3. \tag{30}$$
$$D_1^1 = ([F_2^1 \vee F_3^1] \vee F_5^1 \vee [F_7^1 \vee F_8^1])\, c_1 = (\bar{T}_4 \bar{T}_5 \bar{x}_1 \vee \bar{T}_4 T_5 \bar{x}_1 x_4 \vee T_4 \bar{T}_5)\, c_1; \quad D_1^2 = ([F_1^2 \vee F_2^2] \vee F_5^2 \vee F_7^2)\, c_2 = (\bar{T}_4 \bar{T}_5 \vee \bar{T}_4 T_5 \bar{x}_3 x_5 \vee T_4 \bar{T}_5 x_6)\, c_2; \quad D_1^3 = [F_5^3 \vee F_6^3]\, c_3 = T_4 \bar{x}_3 c_3. \tag{31}$$
In these systems, the variables $F_h^k$ represent product terms corresponding to the lines of the table of CLBerk. We combine some groups of terms and place them into square brackets to show that these terms can be represented by a single conjunction [1]. For example, the terms $F_7^1$ and $F_8^1$ both belong to the state $a_4$ with $PC(a_4) = 10$, so they merge into $T_4 \bar{T}_5$. The connection between SBFs (30) and (31) and Table 2, Table 3 and Table 4, respectively, should be transparent.
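The bracketed groups rely on ordinary term merging. As a minimal check, the sketch below verifies by exhaustive enumeration that the two terms behind $y_1^1$ collapse into one conjunction. It assumes that $F_1^1$ comes from line 1 of Table 1 (state $a_1$, code 00, input $x_1$) and $F_4^1$ from line 6 (state $a_3$, code 01, input $x_1$); the function names are illustrative.

```python
from itertools import product


def f1_1(T4, T5, x1):
    # Assumed term for state a1 (partial code 00) under input x1.
    return (not T4) and (not T5) and x1


def f4_1(T4, T5, x1):
    # Assumed term for state a3 (partial code 01) under input x1.
    return (not T4) and T5 and x1


def merged(T4, T5, x1):
    # Single conjunction written in place of the bracketed group.
    return (not T4) and x1


# Compare both forms on all eight assignments of (T4, T5, x1).
equivalent = all(
    bool(f1_1(*v) or f4_1(*v)) == bool(merged(*v))
    for v in product([False, True], repeat=3)
)
print(equivalent)  # True
```

The variable $T_5$ drops out because the two codes differ only in that bit, which is the kind of adjacency the partial state encoding tries to create.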
Step 8. The table of CLBerF shows which partial functions should be connected using logical disjunction. Using this information, the final values of FSM outputs and IMFs are created. This table has the following columns: “F”, $1, \ldots, K$. The row number $j$ corresponds to a function $\beta_j \in D \cup Y$. These functions are written in the column “F”. The column number $k$ corresponds to the partial function $\beta_j^k \in D^k \cup Y^k$. This column shows whether a particular partial function participates in forming the final value of the function $\beta_j \in D \cup Y$. If a function $\beta_j \in D \cup Y$ is generated by an LUT of CLBerk, then there is a sign “+” at the intersection of the column $k$ and the row $j$. In the discussed case, CLBerF is represented by Table 5.
Step 9. Using Table 5 allows obtaining a system of disjunctions representing the functions $\beta_j \in D \cup Y$. In the discussed case, the functions $D_1$ and $y_1$ are represented by SBF (32):
$$y_1 = y_1^1 \vee y_1^3; \quad D_1 = D_1^1 \vee D_1^2 \vee D_1^3. \tag{32}$$
Using a similar approach, we can obtain the SOPs of all functions belonging to SBFs (19) and (20). The IMFs enter the inputs of flip-flops. This results in generating the class variables $c_k \in C$ and the state variables $s_r \in S$ on the outputs of CLBerF.
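To make the two-level structure concrete, the following sketch evaluates $y_1$ and $D_1$ from the partial SOPs and the final disjunctions. It is an illustration under assumptions: the third term of $D_1^1$ is read as $T_4 \bar{T}_5$ (the partial code 10 of state $a_4$), and the helper names `inputs`, `y1` and `d1` are hypothetical.

```python
def inputs(**active):
    """Build an input vector x1..x7 with all inputs at 0 except the named ones."""
    x = {f"x{i}": 0 for i in range(1, 8)}
    x.update(active)
    return x


def y1(c, T4, T5, x):
    """Output y1: partial functions of CLBer1 and CLBer3 joined by OR."""
    c1, c2, c3 = c
    y1_1 = (not T4) and x["x1"] and c1
    y1_3 = (((not T4) and (not T5) and x["x4"])
            or (T4 and (not T5) and x["x3"])) and c3
    return bool(y1_1 or y1_3)  # y1^2 = 0, so CLBer2 contributes nothing


def d1(c, T4, T5, x):
    """IMF D1: partial functions of all three blocks joined by OR."""
    c1, c2, c3 = c
    d1_1 = (((not T4) and (not T5) and (not x["x1"]))
            or ((not T4) and T5 and (not x["x1"]) and x["x4"])
            or (T4 and (not T5))) and c1  # third term assumed to be T4 * not T5
    d1_2 = (((not T4) and (not T5))
            or ((not T4) and T5 and (not x["x3"]) and x["x5"])
            or (T4 and (not T5) and x["x6"])) and c2
    d1_3 = (T4 and (not x["x3"])) and c3
    return bool(d1_1 or d1_2 or d1_3)


# State a1 (class code 100, partial code 00) with x1 = 1: transition a1 -> a2, output y1.
print(y1((1, 0, 0), 0, 0, inputs(x1=1)), d1((1, 0, 0), 0, 0, inputs(x1=1)))
# State a2 (class code 010, partial code 00) with x3 = 1: transition a2 -> a1.
print(d1((0, 1, 0), 0, 0, inputs(x3=1)))
```

In the first case $D_1 = 0$ because the next state $a_2$ belongs to the class $A^2$; in the second case $D_1 = 1$ because $a_1$ belongs to $A^1$. Since the class code is one-hot, at most one block contributes a nonzero partial value at any time.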
Step 10. To implement the FPGA-based circuit, it is necessary to apply some industrial CAD tools. These tools execute the technology mapping [30,40]. In the case of Virtex-based design, we should use the package Vivado [41]. We do not discuss this step for our example.
Now, we are going to evaluate the temporal and spatial characteristics of the designed LUT-based FSM circuit. For a rough estimate of the cycle time, we should find the number of CLB levels in the circuit. The spatial characteristic is represented by an LUT count. To design the circuit, we use five-input LUTs.
As follows from Table 2, the LUTs of CLBer1 generate six output functions and five input memory functions. We analyzed the minimized SBFs (19) and (20) for this block; the minimization was executed by the JEDI algorithm. The analysis of the output functions $y_n^1 \in Y^1$ showed that seven LUTs and one dedicated multiplexer are enough to generate them. To implement a circuit generating the partial function $y_2^1$, two 5-LUTs are combined into a super-LUT by means of a dedicated multiplexer. The analysis of the IMFs $D_r^1 \in D^1$ showed that eight LUTs and three dedicated multiplexers are enough to generate them. Thus, the circuit of CLBer1 includes 15 basic LUTs and four dedicated multiplexers, and it has a single level of CLBs. To implement circuits for some functions $\beta_j^1 \in D^1 \cup Y^1$, it is necessary to combine two LUTs. But this is accomplished using the fast internal interconnections of a CLB [28]. So this combining has practically no effect on the timing characteristics (compared with single-LUT circuits).
A similar analysis was performed for the blocks CLBer2 and CLBer3. As a result, it was found that: (1) the circuit of block CLBer2 includes 14 LUTs and four multiplexers, and (2) the circuit of block CLBer3 includes 13 LUTs and one multiplexer. Both circuits include one level of CLBs, but some functions are generated by super-LUTs.
So, there are 42 LUTs and nine multiplexers on the first logic level of the designed $P_{COH}$ FSM circuit. The circuit has one level of CLBs, although some functions are generated using super-LUTs.
As follows from Table 5, there are 12 basic LUTs in the circuit of CLBerF. The output $y_5 \in Y$ is generated only by an LUT of CLBer1. Therefore, there is no need to use an LUT of CLBerF to generate the output $y_5 \in Y$. To generate any other function, it is enough to use either two or three inputs of basic 5-LUTs. So the circuit of CLBerF is single-level. Moreover, there is no need to use dedicated multiplexers (each function is represented by a single-LUT circuit).
To summarize, the designed FSM circuit consists of 54 basic LUTs (42 on the first level and 12 in CLBerF) and nine dedicated multiplexers. There are two levels of CLBs in this circuit. Each partial function is implemented by a circuit having almost the same latency as a single-LUT circuit (this effect is achieved through the use of intra-CLB dedicated multiplexers).
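The resource summary is simple bookkeeping over the per-block figures quoted above. A small sketch, with the block counts copied from the text and the dictionary layout chosen only for illustration:

```python
# LUT and dedicated-multiplexer counts per block, as quoted in the text.
blocks = {
    "CLBer1": {"luts": 15, "muxes": 4},
    "CLBer2": {"luts": 14, "muxes": 4},
    "CLBer3": {"luts": 13, "muxes": 1},
    "CLBerF": {"luts": 12, "muxes": 0},
}

# The first logic level consists of the blocks generating partial functions.
first_level = ["CLBer1", "CLBer2", "CLBer3"]
first_level_luts = sum(blocks[name]["luts"] for name in first_level)
total_luts = sum(block["luts"] for block in blocks.values())
total_muxes = sum(block["muxes"] for block in blocks.values())

print(first_level_luts, total_luts, total_muxes)  # 42 54 9
```

The sum of the first-level blocks and CLBerF gives the overall LUT count of the two-level circuit.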
To analyze the efficiency of the proposed approach against other known methods, we conducted a series of experiments. We compared the characteristics of $P_{COH}$ FSMs with the characteristics of their counterparts based on MBCs, OHCs and CSCs. The results of the experiments are shown in Section 6.

6. Experimental Results

We conducted experiments to compare the temporal (maximum operating frequencies) and spatial (LUT counts) characteristics of $P_{COH}$ FSM circuits with those of FSM circuits based on several known state encoding methods. As an example of an MBC-based method, we use the method Auto of Vivado [41]. The one-hot method of Vivado [41] is used as an example of an OHC-based state assignment. We also compared our approach with the algorithm JEDI [34], which is an MBC-based algorithm creating cubes covering the codes of some states [30], and with $P_C$ FSMs [14].
In the experiments, we use benchmark FSMs from the well-known library LGSynth93 [42]. There are 48 Mealy FSMs (benchmarks) in the library. The benchmarks are represented in the KISS2 format. The benchmark FSMs have a wide range of basic characteristics (numbers of states, inputs, outputs, and transitions). Many researchers have used these benchmarks as a basis for comparing various design methods [43,44,45]. The characteristics of the benchmarks are shown in Table 6. The last column of this table contains the sum of the number of FSM inputs and the number of bits in the MBCs ($L + R_{MB}$). We explain a bit later why this column is added.
To conduct the experiments, we use a platform including an FPGA chip of the Virtex-7 family. This is the VC709 Evaluation Platform (xc7vx690tffg1761-2) [27]. The CLBs used include basic LUTs with six address inputs ($S_L = 6$). To implement a CLB-based circuit of CLBerF, the dedicated multiplexers can be used. Each CLB includes four basic LUTs and three dedicated multiplexers. We use the industrial CAD package Vivado v2019.1 (64-bit) [41] to execute the step of technology mapping. Using the Vivado reports, we created the tables showing the results of the conducted experiments.
Each Boolean function $\beta_j \in D \cup Y$ depends on $NA_j$ arguments. If condition (4) is violated for all IMFs and FSM outputs, then there are exactly $R_{MB} + N$ LUTs in the circuit of MB-FSM. Also, such a circuit is single-level. Obviously, if condition (4) is violated, then the LUT-based MB-FSM circuit has the best possible characteristics: the minimum values of LUT count and cycle time. In this case, there is no need to optimize such a circuit, so it makes sense just to use the model of MB-FSM.
As shown in [14], condition (4) holds if a benchmark from the library [42] satisfies the following relation:
$$L + R_{MB} > 2 S_L. \tag{33}$$
If condition (33) holds, then we can replace MBC-based FSMs with FSMs with composite state codes. Thus, it makes sense to use either CSC- or MSC-based models for those FSMs where relation (33) holds. Here, $S_L = 6$. So, we show experimental results only for the benchmarks satisfying the condition $L + R_{MB} > 12$. The experimental results are shown in Table 7 (cycle times, nanoseconds), Table 8 (maximum operating frequency, MHz), Table 9 (LUT counts), Table 10 (area-time products), and Table 11 (power consumption).
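Relation (33) is easy to apply as a filter. The sketch below assumes that $R_{MB} = \lceil \log_2 M \rceil$ for an FSM with $M$ states (the standard width of a maximum binary code, not restated in this section); the function names are illustrative. The example values for FSM $F_1$ (7 inputs, 11 states) are read off the sets listed in Section 5.

```python
from math import ceil, log2


def r_mb(num_states):
    """Number of bits in a maximum binary state code (assumed ceil(log2 M))."""
    return ceil(log2(num_states))


def satisfies_relation_33(num_inputs, num_states, lut_inputs):
    """Relation (33): L + R_MB > 2 * S_L."""
    return num_inputs + r_mb(num_states) > 2 * lut_inputs


# FSM F1 from the example: inputs x1..x7, states a1..a11, five-input LUTs.
print(r_mb(11), satisfies_relation_33(7, 11, 5))  # 4 True
# The same FSM mapped to six-input LUTs does not pass the filter.
print(satisfies_relation_33(7, 11, 6))  # False
```

This is why the experimental tables list only benchmarks with $L + R_{MB} > 12$ for the six-input LUTs of the Virtex-7 chip.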
Table 7, Table 8, Table 9, Table 10 and Table 11 have the following columns: FSM (names of benchmark FSMs); MB (experimental results for FSMs with maximum binary state codes); OH (experimental results for FSMs with one-hot state codes); JEDI (experimental results for FSMs whose states are encoded using the JEDI algorithm); $P_C$ (experimental results for FSMs with composite state codes); $P_{COH}$ (our new approach); $L + R_{MB}$. The sums of the values from the corresponding columns are shown in the row “Total”. The row “Percentage” shows the totals of the investigated FSM circuits as a percentage of the total for $P_{COH}$ FSMs.
The main goal of the proposed method is to increase the performance of LUT-based FSM circuits in relation to the circuits of $P_C$-based FSMs. For this reason, we start the analysis with the cycle times (Table 7).
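Table 7. Results of experiments (cycle time, ns).

| FSM | MB | OH | JEDI | $P_C$ | $P_{COH}$ | $L + R_{MB}$ |
|---|---|---|---|---|---|---|
| ex1 | 6.625 | 7.155 | 5.654 | 3.620 | 3.456 | 16 |
| kirkman | 7.073 | 6.494 | 6.382 | 4.359 | 3.956 | 18 |
| planet | 7.535 | 7.535 | 5.344 | 3.491 | 3.342 | 14 |
| planet1 | 7.535 | 7.535 | 5.344 | 3.491 | 3.342 | 14 |
| pma | 6.841 | 6.841 | 5.888 | 4.080 | 3.762 | 14 |
| s1 | 6.830 | 7.361 | 6.363 | 4.233 | 3.948 | 14 |
| s1488 | 7.220 | 7.579 | 6.362 | 4.384 | 4.181 | 15 |
| s1494 | 6.694 | 6.861 | 6.085 | 4.420 | 4.141 | 15 |
| s1a | 6.520 | 5.669 | 5.911 | 4.521 | 4.196 | 15 |
| s510 | 5.629 | 5.629 | 5.512 | 4.581 | 3.627 | 27 |
| s820 | 6.579 | 6.529 | 5.663 | 4.419 | 3.605 | 25 |
| s832 | 6.863 | 6.526 | 5.754 | 4.553 | 3.871 | 25 |
| sand | 8.623 | 8.623 | 7.885 | 4.424 | 4.094 | 18 |
| styr | 7.267 | 7.697 | 6.866 | 3.719 | 3.537 | 16 |
| tma | 6.102 | 6.766 | 6.092 | 3.557 | 3.608 | 13 |
| Total | 103.937 | 104.800 | 91.106 | 61.851 | 56.666 | |
| Percentage, % | 183.42 | 184.94 | 160.78 | 109.15 | 100.00 | |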
As follows from Table 7, the proposed method of creating composite state codes makes it possible to obtain LUT-based FSM circuits with shorter cycle times than those of the other studied FSMs. As can be seen from Table 7, $P_{COH}$-based FSMs have the following gain: (1) 83.42% compared with MBC-based FSMs; (2) 84.94% compared with OHC-based FSMs; (3) 60.78% compared with JEDI-based FSMs and (4) 9.15% compared with $P_C$-based FSMs.
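The “Percentage” row and the quoted gains follow directly from the “Total” row. A minimal sketch, with the totals copied from Table 7 and the function name `percentage_row` chosen for illustration:

```python
# Totals of cycle times (ns) from the row "Total" of Table 7.
totals_ns = {"MB": 103.937, "OH": 104.800, "JEDI": 91.106, "P_C": 61.851, "P_COH": 56.666}


def percentage_row(totals, baseline="P_COH"):
    """Express every total as a percentage of the baseline total."""
    return {name: round(100.0 * value / totals[baseline], 2) for name, value in totals.items()}


row = percentage_row(totals_ns)
# The gain of P_COH over a method is the excess of that method's percentage over 100.
gains = {name: round(pct - 100.0, 2) for name, pct in row.items() if name != "P_COH"}

print(row)
print(gains)
```

Note that the gains are expressed relative to the $P_{COH}$ total, which is why the cycle-time gain over $P_C$ reads as 9.15%.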
Now, we compare the temporal characteristics of equivalent $P_C$ and $P_{COH}$ FSMs. Consider the results for the benchmark tma. As follows from Table 7, our method produces a slightly slower circuit than the one produced using the approach [14]. The loss is around 1.5%. However, our method gives a payoff starting from the FSMs for which $L + R_{MB} = 14$ (planet, planet1, pma, s1). The gain is 26.3% for the most complex benchmark s510, which has the number 27 in the last column. The gain is 22.5% for s820 and 17.7% for s832. Both benchmarks have $L + R_{MB} = 25$. So, the gain from applying the proposed method increases with the growth of the value $L + R_{MB}$.
We think that this phenomenon is related to the difference in the number of CLBs connected in series in the CLBerF circuit. As the value of $L + R_{MB}$ increases, the number of consecutive CLBs grows faster for $P_C$ FSMs than for equivalent $P_{COH}$ FSMs. From this, we can conclude that the proposed method leads to LUT-based FSM circuits with better temporal characteristics compared with those of equivalent $P_C$ FSM circuits.
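The per-benchmark comparison can be recomputed from Table 7. The sketch below assumes the gain is measured as the relative excess of the $P_C$ cycle time over the $P_{COH}$ cycle time; this definition is inferred from the figures quoted in the text, and the results can differ from the quoted ones in the last digit because the table values are rounded.

```python
# (L + R_MB, P_C cycle time, P_COH cycle time) for selected benchmarks of Table 7.
benchmarks = {
    "tma": (13, 3.557, 3.608),
    "planet": (14, 3.491, 3.342),
    "s832": (25, 4.553, 3.871),
    "s820": (25, 4.419, 3.605),
    "s510": (27, 4.581, 3.627),
}


def gain_percent(t_pc, t_pcoh):
    """Relative cycle-time gain of P_COH over P_C, in percent (negative means a loss)."""
    return 100.0 * (t_pc / t_pcoh - 1.0)


for name, (size, t_pc, t_pcoh) in sorted(benchmarks.items(), key=lambda kv: kv[1][0]):
    print(f"{name:8s} L+R_MB={size:2d} gain={gain_percent(t_pc, t_pcoh):6.2f}%")
```

Sorting by $L + R_{MB}$ makes the trend visible: a small loss for tma, a few percent for planet, and double-digit gains for the largest benchmarks.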
The values of cycle times are taken directly from the Vivado reports. Using the values from Table 7, we can obtain the values of the maximum operating frequencies. These results are shown in Table 8.
Table 8. Results of experiments (maximum operating frequency, MHz).

| FSM | MB | OH | JEDI | $P_C$ | $P_{COH}$ | $L + R_{MB}$ |
|---|---|---|---|---|---|---|
| ex1 | 150.94 | 139.76 | 176.87 | 276.23 | 289.31 | 16 |
| kirkman | 141.38 | 154.00 | 156.68 | 229.41 | 252.77 | 18 |
| planet | 132.71 | 132.71 | 187.14 | 286.48 | 299.26 | 14 |
| planet1 | 132.71 | 132.71 | 187.14 | 286.48 | 299.26 | 14 |
| pma | 146.18 | 146.18 | 169.83 | 245.12 | 265.83 | 14 |
| s1 | 146.41 | 135.85 | 157.16 | 236.24 | 253.29 | 14 |
| s1488 | 138.50 | 131.94 | 157.18 | 228.12 | 239.18 | 15 |
| s1494 | 149.39 | 145.75 | 164.34 | 226.23 | 241.48 | 15 |
| s1a | 153.37 | 176.40 | 169.17 | 221.18 | 238.32 | 15 |
| s510 | 177.65 | 177.65 | 181.42 | 218.31 | 275.72 | 27 |
| s820 | 152.00 | 153.16 | 176.58 | 226.28 | 277.41 | 25 |
| s832 | 145.71 | 153.23 | 173.78 | 219.65 | 258.32 | 25 |
| sand | 115.97 | 115.97 | 126.82 | 226.03 | 244.24 | 18 |
| styr | 137.61 | 129.92 | 145.64 | 268.92 | 282.74 | 16 |
| tma | 163.88 | 147.80 | 164.14 | 281.14 | 277.16 | 13 |
| Total | 2184.41 | 2173.03 | 2493.89 | 3675.82 | 3994.29 | |
| Percentage, % | 54.69 | 54.40 | 62.44 | 92.03 | 100.00 | |
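The relation between cycle time and maximum operating frequency is the usual reciprocal. A minimal sketch (the function name is illustrative); small differences from the tabulated frequencies are possible because the tabulated cycle times are rounded to three decimals:

```python
def max_frequency_mhz(cycle_time_ns):
    """Maximum operating frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_time_ns


# MB-based ex1: a cycle time of 6.625 ns corresponds to about 150.94 MHz.
print(round(max_frequency_mhz(6.625), 2))  # 150.94
```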
As follows from Table 8, the proposed method produces FPGA-based FSM circuits with higher frequencies than the other studied FSMs. As can be seen from Table 8, $P_{COH}$-based FSMs have the following gain: (1) 45.31% compared with MBC-based FSMs; (2) 45.60% compared with OHC-based FSMs; (3) 37.56% compared with JEDI-based FSMs and (4) 7.97% compared with $P_C$-based FSMs. The reasons for these results have already been considered in the analysis of the data from Table 7, so we do not repeat them.
The main goal of the proposed approach is to improve the temporal characteristics of LUT-based FSM circuits in relation to circuits of equivalent $P_C$-based FSMs. However, it is very important that the increase in performance does not lead to a significant increase in the number of LUTs used. A comparison of the spatial characteristics of different FSM models is shown in Table 9.
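Table 9. Results of experiments (LUT count).

| FSM | MB | OH | JEDI | $P_C$ | $P_{COH}$ | $L + R_{MB}$ |
|---|---|---|---|---|---|---|
| ex1 | 70 | 74 | 53 | 40 | 48 | 16 |
| kirkman | 42 | 58 | 39 | 31 | 42 | 18 |
| planet | 131 | 131 | 88 | 77 | 82 | 14 |
| planet1 | 131 | 131 | 88 | 77 | 82 | 14 |
| pma | 94 | 94 | 86 | 72 | 78 | 14 |
| s1 | 65 | 99 | 61 | 55 | 61 | 14 |
| s1488 | 124 | 131 | 108 | 87 | 93 | 15 |
| s1494 | 126 | 132 | 110 | 85 | 91 | 15 |
| s1a | 49 | 81 | 43 | 43 | 52 | 15 |
| s510 | 48 | 48 | 32 | 26 | 34 | 27 |
| s820 | 88 | 82 | 68 | 48 | 51 | 25 |
| s832 | 80 | 79 | 62 | 50 | 54 | 25 |
| sand | 132 | 132 | 114 | 91 | 102 | 18 |
| styr | 93 | 120 | 81 | 71 | 81 | 16 |
| tma | 45 | 39 | 39 | 31 | 37 | 13 |
| Total | 1318 | 1431 | 1072 | 884 | 988 | |
| Percentage, % | 133.40 | 144.84 | 108.50 | 89.47 | 100.00 | |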
As follows from Table 7 and Table 9, an increase in performance by an average of 9% leads to an increase in the LUT counts by an average of 10%. We consider this ratio to be quite acceptable. Note that the proposed approach improves the spatial characteristics in comparison with all methods except for the method of maximum encoding of classes used in $P_C$ FSMs. Our method gains the following on average: (1) 33.4% compared with MB-based FSMs; (2) 44.84% compared with OHC-based FSMs and (3) 8.5% compared with JEDI-based FSMs.
At the same time, for the most complex benchmark FSM s510, the loss reaches around 25%. However, for simpler benchmarks (planet, planet1), the loss is about 6.5%. Thus, an increase in the value of the sum $L + R_{MB}$ leads to an increase in the loss of the proposed approach in relation to equivalent $P_C$ FSMs.
There are several integral evaluations of the quality of a digital circuit. One of the main integral characteristics is the product of the area of the circuit and its performance (area-time product) [15]. In the case of LUT-based circuits, the area is estimated using such a characteristic as LUT count. Naturally, performance is measured in terms of cycle time. The smaller the value of this integral characteristic is, the higher the quality of the circuit. Using Vivado reports, we created Table 10 including values of area-time products for circuits of benchmark FSMs.
Table 10. Results of experiments (area-time products).

| FSM | MB | OH | JEDI | $P_C$ | $P_{COH}$ | $L + R_{MB}$ |
|---|---|---|---|---|---|---|
| ex1 | 463.76 | 529.48 | 299.66 | 144.81 | 165.91 | 16 |
| kirkman | 297.07 | 376.62 | 248.91 | 135.13 | 166.16 | 18 |
| planet | 987.11 | 987.11 | 470.24 | 268.78 | 274.01 | 14 |
| planet1 | 987.11 | 987.11 | 470.24 | 268.78 | 274.01 | 14 |
| pma | 643.04 | 643.04 | 506.39 | 293.73 | 293.42 | 14 |
| s1 | 443.96 | 728.74 | 388.14 | 232.81 | 240.83 | 14 |
| s1488 | 895.31 | 992.88 | 687.11 | 381.38 | 388.83 | 15 |
| s1494 | 843.43 | 905.66 | 669.34 | 375.72 | 376.84 | 15 |
| s1a | 319.49 | 459.18 | 254.18 | 194.41 | 218.19 | 15 |
| s510 | 270.19 | 270.19 | 176.39 | 119.10 | 123.31 | 27 |
| s820 | 578.95 | 535.39 | 385.09 | 212.13 | 183.84 | 25 |
| s832 | 549.04 | 515.56 | 356.77 | 227.63 | 209.04 | 25 |
| sand | 1138.23 | 1138.23 | 898.91 | 402.60 | 417.62 | 18 |
| styr | 675.82 | 923.65 | 556.17 | 264.02 | 286.48 | 16 |
| tma | 274.59 | 263.87 | 237.60 | 110.27 | 133.50 | 13 |
| Total | 9367 | 10257 | 6605 | 3631 | 3752 | |
| Percentage, % | 249.66 | 273.37 | 176.04 | 96.78 | 100.00 | |
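The area-time product combines the two previous tables: it is the LUT count multiplied by the cycle time. A minimal sketch using values from Table 7 and Table 9 (the function name is illustrative); the tabulated products can differ in the second decimal, presumably because they were computed from unrounded cycle times:

```python
def area_time_product(lut_count, cycle_time_ns):
    """Area-time product: LUT count multiplied by cycle time (LUTs x ns)."""
    return lut_count * cycle_time_ns


# MB-based ex1: 70 LUTs and a cycle time of 6.625 ns.
print(round(area_time_product(70, 6.625), 2))
# s820: P_C uses 48 LUTs at 4.419 ns, P_COH uses 51 LUTs at 3.605 ns.
print(round(area_time_product(48, 4.419), 2), round(area_time_product(51, 3.605), 2))
```

For s820 the shorter cycle time of the $P_{COH}$ circuit outweighs its three extra LUTs, so its product is the smaller of the two.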
As follows from Table 10, the proposed method produces LUT-based FSM circuits with slightly larger area-time products than those of their $P_C$-based counterparts. The $P_{COH}$ FSM circuits lose in relation to equivalent CSC-based FSMs (the loss is 3.22%). This loss is very small. However, our method provides significantly better results than the other methods studied. The gain is 149.66% in relation to MB-based FSMs, 173.37% in relation to OH-based FSMs, and 76.04% in relation to JEDI-based FSMs.
It is interesting to note that our method loses in almost all cases where $R_{MB} + L < 25$ (the only exception is pma, where the two products are practically equal). At the same time, there is a gain when $R_{MB} + L = 25$ (benchmarks s820 and s832) but a loss when $R_{MB} + L = 27$ (benchmark s510). It follows that the gain or loss depends on how many levels of CLBs are needed to generate partial functions. This, in turn, depends on the number of arguments in the respective SOPs.
One of the very important issues is the power consumption [46]. This factor significantly affects the lifetime of mobile and autonomous devices [46,47]. As can be seen from Table 9, our method leads to an increase in the value of LUT count compared to equivalent CSC-based FSMs. This can lead to an increase in power consumption. To check the overhead, we used reports of Vivado. The results are shown in Table 11.
As follows from Table 11, the proposed method produces LUT-based FSM circuits consuming more power than the equivalent $P_C$-based FSMs. The $P_{COH}$ FSM circuits lose in relation to equivalent CSC-based FSMs (the loss is 7.63%). This overhead is connected with the growth of LUT counts and the number of interconnections. However, our method provides significantly better results than the methods based on both maximum binary codes (the gain is 27.62%) and one-hot state codes (the gain is 29.42%). Also, our method gives a small advantage in relation to JEDI-based FSMs (the gain is 3.03%).
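Table 11. Results of experiments (power consumption, Watts).

| FSM | MB | OH | JEDI | (unlabeled) | $P_C$ | $P_{COH}$ | $L + R_{MB}$ |
|---|---|---|---|---|---|---|---|
| ex1 | 4.564 | 3.430 | 2.804 | 2.612 | 2.351 | 2.539 | 16 |
| kirkman | 2.204 | 2.355 | 1.950 | 1.854 | 1.669 | 1.802 | 18 |
| planet | 4.553 | 4.553 | 2.887 | 2.914 | 2.623 | 2.832 | 14 |
| planet1 | 4.553 | 4.553 | 2.887 | 2.914 | 2.623 | 2.832 | 14 |
| pma | 1.818 | 1.818 | 1.701 | 1.726 | 1.553 | 1.678 | 14 |
| s1 | 3.133 | 3.578 | 2.966 | 3.089 | 2.780 | 3.003 | 14 |
| s1488 | 4.430 | 4.544 | 3.996 | 4.001 | 3.601 | 3.889 | 15 |
| s1494 | 3.527 | 3.626 | 3.430 | 3.523 | 3.171 | 3.424 | 15 |
| s1a | 1.770 | 2.458 | 1.656 | 1.672 | 1.505 | 1.625 | 15 |
| s510 | 2.166 | 2.166 | 1.714 | 1.643 | 1.479 | 1.627 | 27 |
| s820 | 1.128 | 1.197 | 1.124 | 1.142 | 1.028 | 1.131 | 25 |
| s832 | 2.662 | 2.409 | 2.071 | 1.915 | 1.724 | 1.896 | 25 |
| sand | 1.640 | 1.640 | 1.479 | 1.421 | 1.279 | 1.381 | 18 |
| styr | 4.506 | 5.233 | 3.649 | 3.721 | 3.349 | 3.617 | 16 |
| tma | 2.020 | 1.745 | 1.752 | 1.781 | 1.603 | 1.731 | 13 |
| Total | 44.67 | 45.31 | 36.07 | 35.93 | 32.34 | 35.01 | |
| Percentage, % | 127.62 | 129.42 | 103.03 | 102.63 | 92.37 | 100.00 | |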
It should be noted that the impact of increased FSM performance can eliminate the shortcomings associated with increased power consumption. As the performance of the control FSM increases, the system performs the necessary calculations faster. Reducing the calculation time leads to less power consumption of the system as a whole. Thus, our method can lead to a decrease in power consumption by the system as a whole despite the increase in consumption by the FSM circuit.
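One rough way to illustrate this argument is to multiply the reported power by the cycle time, which gives an energy-per-cycle proxy in nanojoules. This is only a sketch under strong assumptions that are not part of the reported experiments: the power figures are taken to apply at the maximum operating frequency, and the work done is taken to be proportional to the number of FSM cycles. The values come from Table 7 and Table 11; the function name is hypothetical.

```python
def energy_per_cycle_nj(power_w, cycle_time_ns):
    """Energy proxy: power (W) multiplied by cycle time (ns) gives nanojoules per cycle."""
    return power_w * cycle_time_ns


# (power in W, cycle time in ns) for equivalent P_C and P_COH circuits.
s510 = {"P_C": (1.479, 4.581), "P_COH": (1.627, 3.627)}
tma = {"P_C": (1.603, 3.557), "P_COH": (1.731, 3.608)}

for name, data in (("s510", s510), ("tma", tma)):
    proxy = {model: round(energy_per_cycle_nj(*values), 3) for model, values in data.items()}
    print(name, proxy)
```

Under these assumptions the proxy is lower for the $P_{COH}$ circuit of s510, where the cycle-time gain is large, and higher for tma, where there is no cycle-time gain to offset the extra power.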
An analysis of the results of the studies carried out allows drawing the following conclusion regarding the equivalent $P_{COH}$- and $P_C$-based FSMs. If condition (33) holds, then the circuits of $P_{COH}$ FSMs have better performance, which is not accompanied by a significant deterioration in spatial characteristics (the LUT counts). In this case, the equivalent $P_{COH}$- and $P_C$-based FSMs have almost the same values of area-time products. This means that our approach gains in performance almost as much as it loses in the area occupied by FSM circuits. So, the proposed method can be used if the main criterion for the FSM circuit optimality is the maximum performance.

7. Discussion

One of the basic problems associated with LUT-based FSM design is the problem of improving the temporal characteristics of produced circuits. Under certain conditions, the best spatial characteristics are provided if the composite state assignment is used [14]. This method is based on finding a partition of the set of states by the minimum number of classes of compatible states. In [14], there is the minimum possible number of bits in the class codes. Inside each class, states are encoded by partial maximum binary codes. To decrease the value of cycle time of CSC-based FSMs, we propose to encode classes by one-hot codes.
As the conducted experiments show, the proposed method reduces the cycle time compared with equivalent CSC-based FSMs. The reduction is possible for rather complex FSMs for which condition (33) holds. So, the proposed method can be applied if the value of $L + R_{MB}$ exceeds twice the number of inputs of the base LUT. For example, for FSMs whose circuits are implemented using 6-input LUTs, the temporal improvement is possible if $L + R_{MB} > 12$. In this case, the increase in performance (the increase in maximum operating frequency) is accompanied by an increase in the LUT count (the number of LUTs required to implement an FSM circuit). Interestingly, in percentage terms, these gains and losses are almost the same.
Thus, the proposed state assignment method allows increasing the FSM performance without the significant deterioration of spatial characteristics of FSM circuits. So, the proposed method can be used for designing rather complex FSMs if the main optimality criterion for FSM circuit is the maximum performance.

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T., K.K. and S.S.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T., K.K. and S.S.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T., K.K. and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB: configurable logic block
CO: collection of outputs
CSC: composite state codes
DST: direct structure table
FD: functional decomposition
FPGA: field-programmable gate array
FSM: finite state machine
IMF: input memory function
LUT: look-up table
MBC: maximum binary codes
OHC: one-hot codes
SBF: systems of Boolean functions
SOP: sum-of-products
STG: state transition graph
STT: state transition table

References

  1. De Micheli, G. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
  2. Baranov, S. Finite State Machines and Algorithmic State Machines: Fast and Simple Design of Complex Finite State Machines; Amazon: Seattle, WA, USA, 2018; p. 185.
  3. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
  4. Trimberger, S.M.S. Three ages of fpgas: A retrospective on the first thirty years of fpga technology: This paper reflects on how moore’s law has driven the design of fpgas through three epochs: The age of invention, the age of expansion, and the age of accumulation. IEEE Solid-State Circuits Mag. 2018, 10, 16–29.
  5. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications: A Scientometric Review. Computation 2019, 7, 63.
  6. Gazi, O.; Arli, A. State Machines Using VHDL: FPGA Implementation of Serial Communication and Display Protocols; Springer: Berlin/Heidelberg, Germany, 2021; p. 326.
  7. Barkalov, A.; Titarenko, L.; Krzywicki, K. Logic Synthesis for FPGA-Based Mealy Finite State Machines: Structural Decomposition in Logic Design; CRC Press, Taylor & Francis Group: Boca Raton, FL, USA, 2024; ISBN 9781003536734.
  8. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Outputs and State Encoding Bits to Outputs Using Multiplexers in Finite State Machine Implementation. Electronics 2023, 12, 502.
  9. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27.
  10. Senhadji-Navarro, R.; Garcia-Vargas, I. High-performance architecture for binary-tree-based finite state machines. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2017, 37, 796–805.
  11. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  12. Kuon, I.; Tessier, R.; Rose, J. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
  13. Amagasaki, M.; Shibata, Y. FPGA Structure. Principles and Structures of FPGAs; Springer: Singapore, 2018; pp. 47–86.
  14. Barkalov, A.; Titarenko, L.; Krzywicki, K. Improving Characteristics of LUT-Based Sequential Blocks for Cyber-Physical Systems. Energies 2022, 15, 2636.
  15. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
  16. Mazur, P.; Czerwinski, R.; Chmiel, M. PLC implementation in the form of a System-on-a-Chip. Bull. Pol. Acad. Sci. Tech. Sci. 2020, 68, 1263–1273.
  17. Marwedel, P. Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems, and the Internet of Things; Springer Nature: Cham, Switzerland, 2021.
  18. Ashjaei, M.; Bello, L.L.; Daneshtalab, M.; Patti, G.; Saponara, S.; Mubeen, S. Time-Sensitive Networking in automotive embedded systems: State of the art and research opportunities. J. Syst. Archit. 2021, 117, 102137.
  19. Bayılmış, C.; Ebleme, M.A.; Çavuşoğlu, Ü.; Küçük, K.; Sevin, A. A survey on communication protocols and performance evaluations for Internet of Things. Digit. Commun. Netw. 2022, 8, 1094–1104.
  20. Kopetz, H.; Steiner, W. Real-Time Systems: Design Principles for Distributed Embedded Applications; Springer Nature: Cham, Switzerland, 2022.
  21. Salauyou, V. Area and Performance Estimates of Finite State Machines in Reconfigurable Systems. Appl. Sci. 2024, 14, 11833.
  22. Salauyou, V.; Bułatow, W. Optimized Sequential State Encoding Methods for Finite-State Machines in Field-Programmable Gate Array Implementations. Appl. Sci. 2024, 14, 5594.
  23. Baranov, S. High Level Synthesis of Digital Systems: For Data Path and Control Dominated Systems; ISBN Canada: Ottawa, ON, Canada, 2018; ISBN 1775091716.
  24. Salauyou, V.; Klimowicz, A.; Grzes, T. High-Performance Digital Devices Design by the ASMD-FSMD Technique for Implementation in FPGA. Appl. Sci. 2025, 15, 410.
  25. Baranov, S. From Algorithm to Digital System: HLS and RTL Tool Synthagate in Digital System Design; Amazon: Seattle, WA, USA, 2020; p. 76.
  26. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
  27. Xilinx, Inc. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019; Available online: https://docs.amd.com/v/u/en-US/ug887-vc709-eval-board-v7-fpga (accessed on 17 March 2025).
  28. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources. Application Note. 2014. Available online: https://docs.amd.com/v/u/en-US/xapp522-mux-design-techniques (accessed on 17 March 2025).
  29. Sklarova, D.; Sklarov, V.A.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  30. Kubica, M.; Opara, A.; Kania, D. Technology mapping for LUT-based FSMs. In Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2021; Volume 713, p. 216.
  31. Sklyarov, V. Synthesis and implementation of RAM-based finite state machines in FPGAs. In International Workshop on Field Programmable Logic and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 718–727.
  32. Park, J.; Yoo, H. Area-efficient fault tolerance encoding for Finite State Machines. Electronics 2020, 9, 1110.
  33. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Mealy FSMs with Twofold State Assignment. Electronics 2021, 10, 901.
  34. Sentovich, E.; Singh, K.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkeley, CA, USA, 1992.
  35. Tatalov, E. Synthesis of Compositional Microprogram Control Units for Programmable Devices. Master’s Thesis, Donetsk National Technical University, Donetsk, Ukraine, 2011.
  36. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristics of LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759.
  37. Sasao, T.; Mishchenko, A. LUTMIN: FPGA logic synthesis with MUX-based and cascade realizations. Proc. IWLS 2009, 310–316.
  38. Versaci, M.; Laganà, F.; Morabito, F.C.; Palumbo, A.; Angiulli, G. Adaptation of an Eddy Current Model for Characterizing Subsurface Defects in CFRP Plates Using FEM Analysis Based on Energy Functional. Mathematics 2024, 12, 2854.
  39. Barkalov, O.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607.
  40. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
  41. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://docs.amd.com/v/u/2019.1-English/ug901-vivado-synthesis (accessed on 17 March 2025).
  42. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
  43. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA Performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’18), Monterey, CA, USA, 25–27 February 2018; p. 6.
  44. Kubica, M.; Opara, A.; Kania, D. Logic synthesis for FPGAs based on cutting of BDD. Microprocess. Microsyst. 2017, 52, 173–187.
  45. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of fsms based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  46. Lee, J.; Kang, S.; Lee, J.; Shin, D.; Han, D.; Yoo, H.J. The hardware and algorithm co-design for energy-efficient DNN processor on edge/mobile devices. IEEE Trans. Circuits Syst. Regul. Pap. 2020, 67, 3458–3470.
  47. Li, Y.; Ibanez-Guzman, J. Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems. IEEE Signal Process. Mag. 2020, 37, 50–61.
Figure 1. Architecture of LUT-based MB-FSM.
Figure 2. Architecture of LUT-based $P_C$ FSM.
Figure 3. Architecture of $P_{COH}$ FSM.
Figure 4. Implementing function $y_3^2$ using arbitrary (a) and JEDI-based (b) partial state codes.
Figure 5. State transition graph of Mealy FSM $F_1$.
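Table 1. State transition table of FSM $F_1$.

| $a_m$ | $a_s$ | $X_h$ | $Y_h$ | $h$ |
|---|---|---|---|---|
| $a_1$ | $a_2$ | $x_1$ | $y_1$ | 1 |
| $a_1$ | $a_3$ | $\bar{x}_1 x_2$ | $y_2 y_5$ | 2 |
| $a_1$ | $a_4$ | $\bar{x}_1 \bar{x}_2$ | $y_2 y_5$ | 3 |
| $a_2$ | $a_1$ | $x_3$ | $y_4$ | 4 |
| $a_2$ | $a_3$ | $\bar{x}_3$ | $y_6$ | 5 |
| $a_3$ | $a_2$ | $x_1$ | $y_1$ | 6 |
| $a_3$ | $a_4$ | $\bar{x}_1 x_4$ | $y_2 y_5$ | 7 |
| $a_3$ | $a_5$ | $\bar{x}_1 \bar{x}_4$ | $y_3 y_5$ | 8 |
| $a_4$ | $a_4$ | $x_2$ | $y_3 y_6$ | 9 |
| $a_4$ | $a_6$ | $\bar{x}_2$ | $y_7$ | 10 |
| $a_5$ | $a_2$ | 1 | $y_7$ | 11 |
| $a_6$ | $a_5$ | $x_2$ | $y_3$ | 12 |
| $a_6$ | $a_7$ | $\bar{x}_2$ | $y_2 y_6$ | 13 |
| $a_7$ | $a_8$ | $x_4$ | $y_1 y_5$ | 14 |
| $a_7$ | $a_9$ | $\bar{x}_4$ | $y_2$ | 15 |
| $a_8$ | $a_7$ | 1 | $y_5 y_8$ | 16 |
| $a_9$ | $a_8$ | $x_3$ | $y_4$ | 17 |
| $a_9$ | $a_6$ | $\bar{x}_3 x_5$ | $y_6 y_7$ | 18 |
| $a_9$ | $a_{10}$ | $\bar{x}_3 \bar{x}_5$ | $y_6$ | 19 |
| $a_{10}$ | $a_6$ | $x_6$ | $y_3 y_8$ | 20 |
| $a_{10}$ | $a_{11}$ | $\bar{x}_6$ | $y_7$ | 21 |
| $a_{11}$ | $a_{10}$ | $x_3$ | $y_1$ | 22 |
| $a_{11}$ | $a_4$ | $\bar{x}_3 x_7$ | $y_2 y_6$ | 23 |
| $a_{11}$ | $a_1$ | $\bar{x}_3 \bar{x}_7$ | $y_3 y_4$ | 24 |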
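
To make the reading of Table 1 concrete, the following minimal Python sketch encodes a few of its rows and performs one Mealy transition. The names `ROWS` and `step`, and the dictionary encoding of the input conditions, are illustrative assumptions and are not part of the paper.

```python
# Hypothetical encoding of a few rows of Table 1:
# (current state a_m, next state a_s, input condition X_h, outputs Y_h).
# A condition maps an input index to its required value;
# an empty condition stands for the unconditional transition "1".
ROWS = [
    ("a1", "a2", {1: 1}, {"y1"}),              # h = 1
    ("a1", "a3", {1: 0, 2: 1}, {"y2", "y5"}),  # h = 2
    ("a1", "a4", {1: 0, 2: 0}, {"y2", "y5"}),  # h = 3
    ("a2", "a1", {3: 1}, {"y4"}),              # h = 4
    ("a2", "a3", {3: 0}, {"y6"}),              # h = 5
    ("a5", "a2", {}, {"y7"}),                  # h = 11
]


def step(state, inputs):
    """Return (next_state, outputs) for one Mealy transition.

    `inputs` maps an input index to 0 or 1; missing inputs are read as 0.
    """
    for a_m, a_s, cond, outs in ROWS:
        if a_m == state and all(inputs.get(i, 0) == v for i, v in cond.items()):
            return a_s, outs
    raise KeyError(f"no transition encoded for state {state}")
```

For example, `step("a1", {1: 0, 2: 1})` follows row 2 of the table: the outputs depend on both the current state and the inputs, which is what distinguishes a Mealy FSM from a Moore FSM.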
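Table 2. CLBer1 of Mealy FSM $F_1$.

| $a_m$ | $PC(a_m)$ | $a_s$ | $CC(a_s)$ | $X_h$ | $Y_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 00 | $a_2$ | 01000 | $x_1$ | $y_1$ | $D_2$ | 1 |
| $a_1$ | 00 | $a_3$ | 10001 | $\bar{x}_1 x_2$ | $y_2 y_5$ | $D_1 D_5$ | 2 |
| $a_1$ | 00 | $a_4$ | 10010 | $\bar{x}_1 \bar{x}_2$ | $y_2 y_5$ | $D_1 D_4$ | 3 |
| $a_3$ | 01 | $a_2$ | 01000 | $x_1$ | $y_1$ | $D_2$ | 4 |
| $a_3$ | 01 | $a_4$ | 10010 | $\bar{x}_1 x_4$ | $y_2 y_5$ | $D_1 D_4$ | 5 |
| $a_3$ | 01 | $a_5$ | 10011 | $\bar{x}_1 \bar{x}_4$ | $y_3 y_5$ | $D_2 D_4 D_5$ | 6 |
| $a_4$ | 10 | $a_4$ | 10010 | $x_2$ | $y_3 y_6$ | $D_1 D_4$ | 7 |
| $a_4$ | 10 | $a_6$ | 10011 | $\bar{x}_2$ | $y_7$ | $D_1 D_4 D_5$ | 8 |
| $a_6$ | 11 | $a_5$ | 01011 | $x_2$ | $y_3$ | $D_2 D_4 D_5$ | 9 |
| $a_6$ | 11 | $a_7$ | 001 | $\bar{x}_2$ | $y_2 y_6$ | $D_3$ | 10 |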
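
The relation between the columns $CC(a_s)$ and $D_h$ can be sketched in a few lines of Python. The sketch assumes, as read off the table rows rather than stated here, that the first three bits of a composite code are the one-hot class code and the last two bits are the partial state code, and that $D_r$ is listed exactly when bit $r$ of the code equals 1. The function names are hypothetical.

```python
def composite_code(class_index, n_classes, partial_code):
    """Concatenate a one-hot class code with a partial state code.

    Assumption: classes are numbered from 1 and the class bits come first.
    """
    one_hot = "".join("1" if k == class_index else "0"
                      for k in range(1, n_classes + 1))
    return one_hot + partial_code


def excitation_functions(code):
    """List the input memory functions D_r that equal 1 for a given code."""
    return [f"D{r}" for r, bit in enumerate(code, start=1) if bit == "1"]
```

With these assumptions, `composite_code(1, 3, "01")` gives `"10001"`, the code shown for $a_3$, and `excitation_functions("10001")` gives `["D1", "D5"]`, matching row 2 of Table 2.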
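Table 3. CLBer2 of Mealy FSM $F_1$.

| $a_m$ | $PC(a_m)$ | $a_s$ | $CC(a_s)$ | $X_h$ | $Y_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_2$ | 00 | $a_1$ | 10000 | $x_3$ | $y_4$ | $D_1$ | 1 |
| $a_2$ | 00 | $a_3$ | 10001 | $\bar{x}_3$ | $y_6$ | $D_1 D_5$ | 2 |
| $a_5$ | 11 | $a_2$ | 01000 | 1 | $y_7$ | $D_2$ | 3 |
| $a_9$ | 01 | $a_8$ | 001 | $x_3$ | $y_4$ | $D_3$ | 4 |
| $a_9$ | 01 | $a_6$ | 10011 | $\bar{x}_3 x_5$ | $y_6 y_7$ | $D_1 D_4 D_5$ | 5 |
| $a_9$ | 01 | $a_{10}$ | 010 | $\bar{x}_3 \bar{x}_5$ | $y_6$ | $D_2$ | 6 |
| $a_{10}$ | 10 | $a_6$ | 10011 | $x_6$ | $y_3 y_8$ | $D_1 D_4 D_5$ | 7 |
| $a_{10}$ | 10 | $a_{11}$ | 001 | $\bar{x}_6$ | $y_7$ | $D_3$ | 8 |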
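Table 4. CLBer3 of Mealy FSM $F_1$.

| $a_m$ | $PC(a_m)$ | $a_s$ | $CC(a_s)$ | $X_h$ | $Y_h$ | $D_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_7$ | 00 | $a_8$ | 00101 | $x_4$ | $y_1 y_5$ | $D_3 D_5$ | 1 |
| $a_7$ | 00 | $a_9$ | 01001 | $\bar{x}_4$ | $y_2$ | $D_2 D_5$ | 2 |
| $a_8$ | 01 | $a_7$ | 00100 | 1 | $y_5 y_8$ | $D_3$ | 3 |
| $a_{11}$ | 10 | $a_{10}$ | 01010 | $x_3$ | $y_1$ | $D_2 D_4$ | 4 |
| $a_{11}$ | 10 | $a_4$ | 10010 | $\bar{x}_3 x_7$ | $y_2 y_6$ | $D_1 D_4$ | 5 |
| $a_{11}$ | 10 | $a_1$ | 10000 | $\bar{x}_3 \bar{x}_7$ | $y_3 y_4$ | $D_1$ | 6 |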
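Table 5. CLBerF of Mealy FSM $F_1$.

| F | 1 | 2 | 3 |
|---|---|---|---|
| $y_1$ | + | - | + |
| $y_2$ | + | - | + |
| $y_3$ | + | + | + |
| $y_4$ | - | + | + |
| $y_5$ | + | - | - |
| $y_6$ | + | + | + |
| $y_7$ | + | + | + |
| $y_8$ | - | + | + |
| $D_1$ | + | + | + |
| $D_2$ | + | + | + |
| $D_3$ | + | + | + |
| $D_4$ | + | + | + |
| $D_5$ | + | + | + |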
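Table 6. Basic characteristics of benchmarks from library [42].

| Benchmark | $L$ | $N$ | $H$ | $M$ | $L + R_{MB}$ |
|---|---|---|---|---|---|
| bbara | 4 | 2 | 60 | 10 | 8 |
| bbsse | 7 | 7 | 56 | 16 | 11 |
| bbtas | 2 | 2 | 24 | 6 | 5 |
| beecount | 3 | 4 | 28 | 7 | 6 |
| cse | 7 | 7 | 91 | 16 | 11 |
| dk14 | 3 | 5 | 56 | 7 | 6 |
| dk15 | 3 | 5 | 32 | 4 | 5 |
| dk16 | 2 | 3 | 108 | 27 | 7 |
| dk17 | 2 | 3 | 32 | 8 | 5 |
| dk27 | 1 | 2 | 14 | 7 | 4 |
| dk512 | 1 | 3 | 15 | 15 | 5 |
| donfile | 2 | 1 | 96 | 24 | 7 |
| ex1 | 9 | 19 | 138 | 20 | 14 |
| ex2 | 2 | 2 | 72 | 19 | 7 |
| ex3 | 2 | 2 | 36 | 10 | 6 |
| ex4 | 6 | 9 | 21 | 14 | 10 |
| ex5 | 2 | 2 | 32 | 9 | 6 |
| ex6 | 5 | 8 | 34 | 8 | 8 |
| ex7 | 2 | 2 | 36 | 10 | 6 |
| keyb | 7 | 7 | 170 | 19 | 12 |
| kirkman | 12 | 6 | 370 | 16 | 16 |
| lion | 2 | 1 | 11 | 4 | 4 |
| lion9 | 2 | 1 | 25 | 9 | 6 |
| mark1 | 5 | 16 | 22 | 15 | 9 |
| mc | 3 | 5 | 10 | 4 | 5 |
| modulo12 | 1 | 1 | 24 | 12 | 5 |
| opus | 5 | 6 | 22 | 10 | 9 |
| planet | 7 | 19 | 115 | 48 | 13 |
| planet1 | 7 | 19 | 115 | 48 | 13 |
| pma | 8 | 8 | 73 | 24 | 13 |
| s1 | 8 | 7 | 106 | 20 | 13 |
| s1488 | 8 | 19 | 251 | 48 | 14 |
| s1494 | 8 | 19 | 250 | 48 | 14 |
| s1a | 8 | 6 | 107 | 20 | 13 |
| s27 | 4 | 1 | 34 | 6 | 7 |
| s298 | 3 | 6 | 1096 | 218 | 11 |
| s386 | 7 | 7 | 64 | 13 | 11 |
| s510 | 19 | 7 | 77 | 47 | 25 |
| s8 | 4 | 1 | 20 | 5 | 7 |
| s820 | 18 | 19 | 232 | 25 | 23 |
| s832 | 18 | 19 | 245 | 25 | 23 |
| sand | 11 | 9 | 184 | 32 | 16 |
| shiftreg | 1 | 1 | 16 | 8 | 4 |
| sse | 7 | 7 | 56 | 16 | 11 |
| styr | 9 | 10 | 166 | 30 | 14 |
| tma | 8 | 9 | 44 | 20 | 13 |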
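
The last column of Table 6 can be reproduced from the columns $L$ and $M$, assuming that $R_{MB}$ is the minimum number of bits of a maximum binary state code, $R_{MB} = \lceil \log_2 M \rceil$. The short Python sketch below checks this for a few benchmarks; the helper names and the selection of rows are illustrative only.

```python
def r_mb(num_states):
    """Bits needed by a maximum binary code for num_states states.

    Integer arithmetic is used instead of log2 to avoid rounding issues.
    """
    return max(1, (num_states - 1).bit_length())


# (L, M) pairs copied from Table 6 for a few benchmarks.
BENCHMARKS = {
    "bbara": (4, 10),
    "lion": (2, 4),
    "s298": (3, 218),
    "s510": (19, 47),
}


def l_plus_r(name):
    """Return L + R_MB for a benchmark listed in BENCHMARKS."""
    num_inputs, num_states = BENCHMARKS[name]
    return num_inputs + r_mb(num_states)
```

For bbara, $M = 10$ gives $R_{MB} = 4$, so `l_plus_r("bbara")` returns 8, in agreement with the table. This sum is the number of arguments of the functions generated by an MB-FSM, which is why it is compared against the number of LUT inputs.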
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
