1. Introduction
Modern digital systems include a wide variety of sequential blocks [1]. Very often, these blocks are used as control units [2,3,4,5,6,7]. Many other examples could be added [8,9,10,11,12,13,14,15,16,17]. For example, efficient digital control is very important in advanced energy systems [18]. The behavior of sequential blocks can be described using the Moore finite state machine (FSM) model [19,20]. When an FSM is used as a control unit, the quality of its implementation strongly affects the quality of the overall system. In this paper, we propose a method for optimizing spatial characteristics of Moore FSMs [21,22] implemented using field-programmable gate arrays (FPGAs) [23,24].
Our choice is supported by the fact that FPGAs are the most popular modern tools for implementing digital systems [23,25]. Leading logic design experts predict that FPGAs will remain in use for at least the next three decades [26]. In this paper, we discuss a case where logic circuits of Moore FSMs are implemented with look-up table (LUT) elements, programmable flip-flops, dedicated multiplexers, programmable interconnections, and a synchronization tree.
A LUT is a single-output logic element with a fixed number of inputs [24]. It contains SRAM cells that hold the truth table of an arbitrary Boolean function of at most that many arguments [22,27]. A key feature of a LUT is its very small number of inputs (typically about six) [28,29,30]. This necessitates the use of functional decomposition (FD) methods in LUT-based design. These methods transform initial systems of Boolean functions (SBFs) into compositions of partial functions. As a result, FSM implementations often become multi-level circuits with highly complex interconnection structures [1].
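To make the LUT model concrete, the following minimal Python sketch treats a LUT as a stored truth table addressed by its input bits. The class name `Lut` and its interface are illustrative assumptions and do not correspond to any vendor primitive.

```python
from itertools import product


class Lut:
    """Hypothetical model of a k-input LUT: 2**k stored bits, one output."""

    def __init__(self, k, func):
        self.k = k
        # Fill the truth table by evaluating func on every input combination.
        self.table = [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

    def __call__(self, *bits):
        if len(bits) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(bits)}")
        index = 0
        for b in bits:  # the first input is the most significant address bit
            index = (index << 1) | (b & 1)
        return self.table[index]


# A 3-input majority function fits into one LUT with 8 stored bits.
majority = Lut(3, lambda a, b, c: (a + b + c) >= 2)
```

A function with more arguments than the LUT has inputs cannot be stored in a single table of this kind, which is why decomposition becomes necessary.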
Among the key challenges in LUT-oriented FSM synthesis is the reduction of the hardware resources required for circuit implementation, in particular the chip area occupied by the resulting controller [31,32]. This issue is closely related to power efficiency, since smaller implementations are generally associated with lower energy demand [33]. Such optimization is especially relevant in embedded and autonomous applications, where both silicon resources and energy budgets are limited [34]. In this work, we focus on a synthesis method that decreases the implementation cost of Moore FSMs while preserving the clock-cycle time. This objective is important because aggressive area optimization often causes noticeable performance degradation [35].
For LUT-based FSM circuits, the implementation cost is influenced not only by the number of LUTs, but also by the organization of signal routing between them. Hence, area reduction should be considered together with interconnection optimization. As pointed out in [27], the routing structure has a strong impact on the quality of the final circuit, since signal propagation through interconnects frequently contributes more to delay than the logic itself [36,37]. In addition, routing resources may be responsible for a dominant share of total energy usage, reaching about 70% in some cases [37]. For this reason, a synthesis method that simultaneously lowers LUT demand and simplifies routing can improve both energy efficiency and operating speed.
As shown in [25], a LUT size of about six inputs is optimal: it provides the best trade-off between the occupied chip area, the performance, and the consumed energy of such an element. Therefore, this value is unlikely to increase in the near future. However, modern FPGA-based projects are becoming more and more complex [16,38]. This requires the continuous development of new, more efficient logic design methods aimed at improving the basic characteristics of LUT-based FSM circuits.
In this paper, we propose a novel design method aimed at reducing LUT counts in circuits of LUT-based Moore FSMs with twofold state assignment [1]. The method exploits the presence of pseudoequivalent states, which is a characteristic feature of Moore FSMs.
Its main idea is to use two cores of partial input memory functions (IMFs). One core contains partial Boolean functions (PBFs) that depend on state variables and FSM inputs, whereas the second core is based on structural decomposition with partial state codes. The experimental results show that the proposed double-core architecture requires fewer LUTs than architectures based solely on twofold state assignment, while maintaining the maximum operating frequency.
The rest of the paper is organized as follows. Section 2 provides the necessary background. Section 3 discusses related work. The main ideas of the proposed method are presented in Section 4. Section 5 contains an example of FSM synthesis using the proposed method. Section 6 presents and analyzes the experimental results. A brief conclusion summarizes the results obtained in the paper.
2. FPGA-Based Design of Moore FSMs
Among other internal resources, modern FPGAs include configurable logic blocks (CLBs) and a matrix of programmable interconnections [24,28,29,39]. To design an FSM circuit, a designer may use CLBs including such internal resources as LUTs, dedicated multiplexers, and programmable flip-flops. The LUT output is permanently connected to the input of a flip-flop. Using multiplexers allows the CLB output to be made either combinational (the output of the LUT) or registered (the output of the flip-flop). The flip-flops are combined into the state code register of a Moore FSM [19].
As noted above, the main feature of a LUT is its limited number of inputs. In modern FPGAs, this number does not exceed 6 [24,28,29,39]. Therefore, functional decomposition methods must be used for Boolean functions that depend on more arguments than a single LUT has inputs [22]. Using FD-based methods results in multi-level FSM circuits with complex systems of “spaghetti-type” interconnections [1].
The abstract Moore FSM is represented by a vector $S = \langle A, X, Y, \delta, \lambda, a_1 \rangle$ [19], where $A = \{a_1, \ldots, a_M\}$ is a set of internal states, $X = \{x_1, \ldots, x_L\}$ is a set of FSM inputs, $Y = \{y_1, \ldots, y_N\}$ is a set of outputs, $\delta$ is a transition function, $\lambda$ is an output function, and $a_1 \in A$ is an initial state. All these sets are finite. A Moore FSM can be represented using a variety of tools [40]. In this paper, we use the apparatus of state transition tables (STTs) [19].
An STT can be viewed as a tabular representation of the corresponding state-transition graph. Each row of an STT represents a single interstate transition. An STT has the following columns [19]: the current state, the next state, the input signal determining the transition between them, and $h$ (the transition number, where $h = 1, \ldots, H$). The input signal is represented by a conjunction of inputs (or their complements) determining a particular transition. The collection of outputs generated in a state is shown next to that state in the current-state column.
To design an FSM circuit, a designer should execute three preliminary steps [
19]: 1) the state assignment; 2) the construction of a direct structure table (DST); 3) the derivation of SBFs representing the FSM circuit.
The state assignment is reduced to encoding each state by a binary code. One of the most popular encoding schemes is the maximum binary code (MBC) [41,42,43,44]. Such codes are created using $R$ state variables. The value of $R$ is determined as

$$R = \lceil \log_2 M \rceil. \quad (1)$$
To transform an STT into a DST, it is necessary to determine the set of state variables and the set of input memory functions (IMFs). The DST consists of all columns of the STT and three additional columns: the code of the current state, the code of the next state, and the IMFs that are equal to 1 for the transition. An IMF is written in the last of these columns if the corresponding bit of the next-state code is equal to 1. State codes are stored in a state register (RG). This register consists of one flip-flop per state variable, with common synchronization (Clock) and reset (Start) inputs. In FPGA-based design, D flip-flops are used in LUT-based FSMs [1]. The input memory functions may change the state code stored in RG.
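The two quantities used when building a DST can be sketched in a few lines of Python. This is an illustration only: the helper names `mbc_bits` and `excited_imfs`, and the IMF labels `D1`, `D2`, …, are assumptions introduced here, and the code assignment at the end is arbitrary rather than the one used later in the example.

```python
def mbc_bits(num_states):
    """Number of state variables for a maximum binary code: ceil(log2 M)."""
    # (M - 1).bit_length() equals ceil(log2 M) for M >= 2 without float rounding.
    return max(1, (num_states - 1).bit_length())


def excited_imfs(next_state_code):
    """IMFs equal to 1 for a transition, assuming D flip-flops:
    D_r = 1 exactly when bit r of the next-state code is 1."""
    return [f"D{r}" for r, bit in enumerate(next_state_code, start=1) if bit == "1"]


R = mbc_bits(12)  # an FSM with 12 states needs 4 state variables
codes = {m: format(m, f"0{R}b") for m in range(12)}  # one arbitrary MBC assignment
```

For instance, a transition into a state with code 0101 would list the second and fourth IMFs in the corresponding DST row.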
The DST is a base for deriving the following SBFs:

$$D_r = D_r(T, X), \quad r = 1, \ldots, R; \quad (2)$$

$$y_n = y_n(T), \quad n = 1, \ldots, N. \quad (3)$$

Here, $T$ denotes the set of state variables and $X$ denotes the set of FSM inputs. SBFs (2) and (3) are used for designing a logic circuit of a Moore FSM. In the simplest case, this circuit includes two combinational blocks and the state register [45]. One block generates IMFs, another block generates FSM outputs. If LUTs are used for implementing the logic circuits of these blocks, then the blocks are named
and
, correspondingly [
1]. In the case of LUT-based design, the state register is distributed among the CLBs of
. So, the state register is hidden.
The model of Moore FSMs possesses two important properties [46]. First, the set of states includes classes of pseudoequivalent states (PESs). The pseudoequivalent states have the same systems of interstate transitions but different collections of outputs. The second property follows from (3): the FSM outputs do not directly depend on the FSM inputs. The first property allows for minimization of the number of literals in the sum-of-products (SOP) representing SBF (2). The second property allows for minimization of the number of literals in the SOP representing SBF (3). Using these properties may reduce the number of LUTs and the number of logic levels in the circuits of both blocks.
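The first property can be illustrated by grouping states whose outgoing transitions coincide. The sketch below is a minimal illustration, assuming an STT given as (current state, next state, input conjunction) rows; the function name `pes_classes` and the toy table are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pes_classes(stt):
    """Group states whose systems of outgoing transitions are identical.

    stt: iterable of (current_state, next_state, input_conjunction) rows.
    """
    outgoing = defaultdict(set)
    for current, nxt, cond in stt:
        outgoing[current].add((cond, nxt))
    classes = defaultdict(list)
    for state in sorted(outgoing):
        classes[frozenset(outgoing[state])].append(state)
    return sorted(classes.values())


# Toy STT (hypothetical): a2 and a3 have the same transitions, so they
# fall into one class even if they generate different outputs.
toy_stt = [
    ("a1", "a2", "x1"), ("a1", "a3", "~x1"),
    ("a2", "a4", "1"),
    ("a3", "a4", "1"),
    ("a4", "a1", "1"),
]
```

Outputs are deliberately ignored in the grouping, since pseudoequivalent states may differ in the outputs they generate.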
3. Related Works
At the technology-mapping stage, the designer must account for several competing implementation objectives [47]. For LUT-based FSM circuits, these objectives mainly concern the silicon area, achievable clock frequency, and power consumption of the final design [1]. In the present work, we concentrate on reducing the hardware overhead in Moore FSM implementations, with particular emphasis on lowering the number of LUTs and limiting inter-CLB routing. These two factors have a major influence on the implementation cost of LUT-based circuits [1]. In addition, a reduction in area-related overhead may also lead to lower power dissipation [35].
Let
be the number of literals [
40] in a SOP of some function
. This value influences the number of levels in LUT-based circuits. Let the following condition hold for at least a single function
:
In this case, to implement an FSM circuit, the methods of FD are used [2,22,48]. The idea of FD is as follows. If condition (4) holds, then the function is decomposed into smaller sub-functions containing fewer literals than the initial SOP. These sub-functions are partial Boolean functions (PBFs). The decomposition process ends when each PBF depends on no more arguments than a LUT has inputs. Functional decomposition is a powerful tool in FPGA-based technology mapping [48,49]; however, it usually leads to multi-level circuits.
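A rough way to see when decomposition is needed is to compare the number of arguments a function really depends on with the number of LUT inputs, and to split the function when it does not fit. The sketch below is an illustration under stated assumptions: it uses the support size as the fitting criterion and a plain Shannon expansion as the splitting step, which is only the simplest form of decomposition and not the specific FD methods cited above. All function names are hypothetical.

```python
from itertools import product


def support(func, n):
    """Indices of the arguments that func (of n Boolean arguments) depends on."""
    deps = set()
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if func(*bits) != func(*flipped):
                deps.add(i)
    return sorted(deps)


def needs_decomposition(func, n, lut_inputs):
    """True if func cannot be mapped onto a single LUT with lut_inputs inputs."""
    return len(support(func, n)) > lut_inputs


def shannon_cofactors(func, i):
    """Split func on argument i, so that f = ~x_i & f0 | x_i & f1."""
    def f0(*bits):
        return func(*bits[:i], 0, *bits[i:])

    def f1(*bits):
        return func(*bits[:i], 1, *bits[i:])

    return f0, f1


def f(a, b, c, d):
    """Example function of four arguments."""
    return (a & b) | (c & d)
```

With 3-input LUTs, `f` does not fit into one element, whereas each of its cofactors with respect to `a` depends on at most three arguments. The price is an extra level that recombines the cofactors, which is how multi-level circuits arise.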
In multi-level circuits, the same FSM inputs and state variables may appear at more than one logic level [1]. This complicates the interconnection structure and results in circuits with spaghetti-type interconnections. To optimize the spatial characteristics of such a circuit, it is necessary to regularize the interconnections. As shown in [50], circuits with a regular interconnection structure consume less power than their counterparts with spaghetti-type interconnections.
To obtain a regular interconnection structure, methods of structural decomposition (SD) may be applied [1,50,51]. These methods are based on eliminating the direct dependence of the implemented functions on the FSM inputs. To eliminate the direct dependence, some additional functions are introduced. Each system of new functions determines a separate block possessing unique sets of input and output variables. The classical SD methods are the replacement of logical conditions and the encoding of the collections of outputs [19]. These methods are thoroughly discussed, for example, in [51].
In Moore FSMs, area reduction can be achieved by exploiting pseudoequivalent states [
50]. Two states
and
are pseudoequivalent when
for
. This relation makes it possible to partition
A into classes of pseudoequivalent states,
. To optimize SBF (
2), the state assignment should be chosen so that each class
is represented by the minimum possible number of generalized intervals in
-dimensional Boolean space [
1]. Such an assignment can be executed using, for example, the methods in [
52]. However, minimizing SBF (
3) requires a different assignment. Therefore, separate state variables are needed to optimize both systems simultaneously.
This can be achieved using the twofold state assignment (TSA) [
50]. This approach can be used if the following condition holds for all FSM states:
In (
5), the symbol
stands for the number of inputs determining transitions from the state
.
The term “twofold state assignment” means that each state is encoded by two codes: the maximum binary code and the partial state code. To create partial state codes, additional variables are used.
The TSA is based on finding a partition
of the set
A by the classes of compatible states. A class
includes
states. Each state is encoded by codes
and
. The code
includes
bits, where
To create partial codes of states
, the variables
are used. The variables
are combined into a single set
containing
elements. The variables
create extended state codes (ESCs)
. The value of
is determined as
Each class
determines a block
, generating the PBFs
In (
8), the symbol
stands for a set of partial IMFs generated during the transitions from the states
, and the symbol
stands for a set of FSM inputs causing transitions from the states
. These
K blocks create the first level of the FSM circuit.
A block
creates the final values of the IMFs:
This block is a functional assembler forming the second logic level.
The third logic level includes the blocks
LUTerY and
. The first of these generates FSM outputs represented by SBF (
3). The second block transforms maximum binary codes into extended state codes. Therefore, block
implements SBF (
3) and
The architecture of Moore FSM
is shown in
Figure 1.
The block includes master–slave flip-flops combined into the state register. The flip-flops of this register are distributed among the CLBs of . The pulses Start and Clock control the operation of RG.
A comparison of SBFs (
2), (
8) and (
9) shows that the state variables
are replaced by partial state variables
. The partial IMFs
are used as inputs of the block
. De facto, these functions are used as additional variables replacing inputs
.
Now, it is possible to create codes
, minimizing the numbers of literals in SBF (
3). The partial codes
are created in a way that minimizes the number of literals in SBFs (
8). Now, different variables are used for optimizing SBFs (
3) and (
8). Therefore, the contradiction mentioned above is eliminated.
The greedy algorithm [
50] creates the classes of compatible states. Let the symbol
stand for the number of elements in the set
. As shown in [
1], the compatible states
satisfy the condition
As shown in [
1], this approach allows for designing FSM circuits with better characteristics than their FD-based counterparts. This model cannot be applied if condition (
5) is violated for at least a single state
.
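The greedy construction of compatible classes can be sketched as follows. The exact compatibility condition is not reproduced here; the sketch assumes it has the form "partial-code bits plus class inputs must not exceed the number of LUT inputs", with one partial code reserved for states outside the class. Both this condition and all names in the code are assumptions made for illustration.

```python
def partial_code_bits(class_size):
    """Bits of a partial state code, assuming one extra code marks
    'state does not belong to this class': ceil(log2(class_size + 1))."""
    return class_size.bit_length()


def greedy_classes(state_inputs, lut_inputs):
    """state_inputs: dict mapping a state to the set of inputs that decide
    its transitions. Returns a list of (states, inputs) classes."""
    classes = []
    order = sorted(state_inputs, key=lambda s: (-len(state_inputs[s]), s))
    for state in order:
        inputs = set(state_inputs[state])
        if partial_code_bits(1) + len(inputs) > lut_inputs:
            raise ValueError(f"{state} does not fit into any class")
        best = None
        for idx, (members, used) in enumerate(classes):
            merged = used | inputs
            fits = partial_code_bits(len(members) + 1) + len(merged) <= lut_inputs
            growth = len(merged) - len(used)
            # Prefer the class whose input set grows the least.
            if fits and (best is None or growth < best[0]):
                best = (growth, idx, merged)
        if best is None:
            classes.append(([state], inputs))
        else:
            _, idx, merged = best
            classes[idx] = (classes[idx][0] + [state], merged)
    return classes
```

The `ValueError` branch corresponds to the situation in which the twofold state assignment cannot be applied because a single state already needs too many inputs.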
In this paper, we discuss a case where condition (5) holds for all states of a particular Moore FSM. We propose a design method which allows the use of two cores of PBFs. One core is based on the twofold state assignment for some part of the FSM circuit. The second core generates partial IMFs depending on the state variables creating the maximum binary state codes. We denote these cores as SDC and MCC, respectively.
4. The Essence of the Proposed Method
The core MCC exists if there is a set
satisfying the following condition:
In (
12), the symbol
stands for the number of FSM inputs determining transitions from states
. The symbol
stands for the number of state variables representing states
. The following condition holds:
The relation
takes place if states
are encoded in such a way that some state variables
are insignificant. This can be achieved using the approach discussed in [
52]. To satisfy (
13), it is necessary to find the partition
, where
is a class of PES.
Obviously, the set
includes all states for which
. These states should be encoded in a way that minimizes the number of generalized intervals covering their codes. After that, the value of
can be determined. Next, states with
should be considered for inclusion in the set
. All states from a particular class
should be added together. If all such states from all classes have been included in the set
, then states with
should be considered. This process is terminated when no state can be included in the set
without violating condition (
12).
It should be noted that it is possible to create
sets satisfying condition (
12). Therefore, the MCC core may include several LUT blocks. However, the case discussed in this paper is limited to
. Other cases require additional research. In this paper, we only aim to show the main idea of the proposed method without considering all possible variants. The set
determines the sets
and
. The first set includes FSM inputs determining transitions from states
. The second set consists of state variables used for encoding states
, where
.
Once the set
has been determined, the set
A can be partitioned into two disjoint subsets:
and
(
). Next, we should execute TSA for states
. As a result, the partition
of the set
into
K classes of compatible states is obtained. This can be achieved using the greedy algorithm discussed in [
50]. The states
are encoded by partial codes using elements of the set
. The set
determines the core SDC with the set of inputs
.
Next, it is necessary to create a table of the core MCC. This table includes the same columns as any DST. Using it, we can derive the following SBF:
After executing the partial state assignment, it is necessary to create tables for each block
. Using these tables, it is possible to find SBF (
8). Next, the partial IMFs should be combined into their final forms. This is achieved by a functional assembler generating the following SBF:
Finally, we should find an SBF representing the dependence of variables
on the state variables
. SBF (
10) gives this dependence.
Systems (
8)–(
10), (
14) and (
15) determine the architecture of Moore FSM
with two cores of partial Boolean functions. The LUT-based architecture of FSM
is shown in
Figure 2.
The architecture (
Figure 2) includes three levels of logic blocks. The first level includes two cores of PBFs. The core
is represented by
. This block generates partial IMFs (
14). The LUTs of the core
generate partial IMFs (
8). This core is represented by blocks
, …,
. Both cores are represented by single-level circuits. So, each PBF is generated by a single LUT.
The second logic level consists of the functional assembler
. Its LUTs generate functions (
15). Each function
is represented by
partial functions. The corresponding circuit is single-level if the following condition holds for each IMF:
If (
16) is violated, then this circuit includes at least two levels of LUTs. The block hides a distributed register RG. Therefore, the pulses
Start and
Clock enter the functional assembler.
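The role of the functional assembler and the single-level condition can be sketched as follows. The sketch assumes that the condition amounts to all partial functions of one IMF fitting into a single LUT; the function names are hypothetical.

```python
def assemble(partials):
    """Final IMF value as the disjunction of its partial IMFs."""
    return int(any(partials))


def lut_levels(fan_in, lut_inputs):
    """Levels of lut_inputs-input LUTs needed to OR fan_in signals together."""
    if fan_in < 1 or lut_inputs < 2:
        raise ValueError("fan_in must be >= 1 and lut_inputs >= 2")
    levels, capacity = 1, lut_inputs
    while capacity < fan_in:  # a tree of d levels combines up to lut_inputs**d signals
        capacity *= lut_inputs
        levels += 1
    return levels
```

For example, three partial functions (one from the core MCC and two from the core SDC) fit into one 6-input LUT, whereas seven partial functions would require a second level.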
The code transformer
represents the third logic level. Its LUTs generate SBFs (
3) and (
10). This block is single-level if the following condition holds:
In this paper, we propose a synthesis method for Moore FSMs with the double-core architecture. We assume that an FSM is represented by its STT. If an FSM is represented in some other form, then this form must first be transformed into the equivalent STT. The proposed method includes the following steps:
1. Finding the partition of the set of states into classes of pseudoequivalent states.
2. Dividing the set of states into the subset assigned to the core MCC and the subset assigned to the core SDC.
3. Encoding the states in a way that minimizes SBF (3).
4. Splitting the subset assigned to the core SDC into K classes of compatible states.
5. Encoding the states of these classes by partial state codes.
6. Constructing the table of the core MCC and finding SBF (14).
7. Constructing the tables of the blocks of the core SDC and finding SBFs (8).
8. Constructing the table of the functional assembler and finding SBF (15).
9. Constructing the table of the code transformer and finding SBFs (3) and (10).
10. Implementing the FSM logic circuit using the internal resources of a particular chip.
5. Example of Synthesis
Let us discuss an example of synthesis of some FSM E1 using the model
. The FSM is represented by
Table 1. To implement the FSM circuit, we could use LUTs with
.
Analysis of Table 1 shows that Moore FSM E1 is characterized by the sets $A = \{a_1, \ldots, a_{12}\}$, $X = \{x_1, \ldots, x_8\}$, and $Y = \{y_1, \ldots, y_7\}$. This gives the following values: M = 12, L = 8, and N = 7. This STT consists of H = 28 rows. Using (1) gives the number of bits in MBCs: $R = \lceil \log_2 12 \rceil = 4$. This value determines the sets of four state variables and four IMFs. Using Table 1 and the number of LUT inputs, we should divide the set A into two disjoint sets. We should start by finding the partition of A into classes of PES. This can be achieved using the interstate transitions shown in Table 1.
Step 1. Using the definition of PESs [
1], we can find the partition
with eight classes of PES:
,
,
,
,
,
,
, and
. Thus, there is
.
Step 2. We start this step by finding the set
. There is
. The difference
shows that the set
may include states with
and
.
Table 1 does not include states with
. It includes 8 states with
. These states are candidates to be included into the set
. We start from the state
. This gives the set
and
. Obviously, adding states from the class
does not change the set
.
The state
is the initial FSM state [
19]. So, its code should include only zeros:
. Accordingly, this state is placed in the cell 0000 (
Figure 3a). To reduce the value of
, we treat the assignment 0001 as insignificant. Therefore, the symbol “*” is placed in the corresponding cell. Thus, three state variables are sufficient to identify state
, and the variable
is insignificant.
The transitions from states
depend on input
. So, we can include these states in the set
. This leads to the set
. We should place these states in some generalized interval of 4-dimensional Boolean space with the insignificant variable
. One of the possible variants is shown in
Figure 3b.
There is
. So, there is
. Thus, we can include in
some other states with
. Let us choose the class
. Including these states leads to the set
. The state codes for states
are shown in
Figure 3c. Obviously, including states
in the set
does not change the set
. But now, there is the set
.
Analysis of
Table 1 shows that it is not possible to include more states in the set
without violating (
12). Thus, the set
has been determined. To obtain the set
, it is necessary to compute the set difference of
A and
. Obviously, this difference yields
.
Step 3. The states
are encoded in a way that minimizes SBF (
14). The codes of states
should be selected in a way that minimizes SBF (
3). To achieve this, the FSM outputs should be represented by the minimum possible number of generalized intervals in
-dimensional Boolean space. Obviously, to create the proper state codes it is necessary to use “free” state assignments. Using the approach from [
52], the codes shown in
Figure 4 are obtained.
Step 4. To find the partition
, we can use the method discussed in [
1]. The algorithm in [
1] is based on the greedy algorithm proposed in [
50]. This algorithm tries to include as many states as possible into each class of
. The main rule of this method is the following: a state could be included into a particular class
if it leads to a minimal increase in the number of elements in the set
.
As follows from [
1], there are the same transitions from all PESs
, where
. So, in the case of Moore FSMs, all pseudoequivalent states
should be placed into the same class
.
In the discussed example, all classes of PES have been shown during the discussion of Step 1. Using greedy algorithm [
50] gives the following partition of the set
:
with K = 2. This partition includes the classes
and
. So, the class
includes classes of PES
; the class
includes PESs from the classes
.
Step 5. In the discussed case, we can find that
. Using (
7) gives the numbers of bits in partial state codes:
These values determine the following sets:
,
. In turn, this gives the set
. The variables from the set
create partial state codes for states
The variables from the set
create partial state codes for states
To eliminate some state variables from SOPs of functions (
8), we propose placing codes
, where
and
, into the same generalized interval of
-dimensional Boolean space. This can be achieved using, for example, the approach discussed in [
52]. Using this approach, we can create the partial state codes shown in the Karnaugh maps in
Figure 5 and
Figure 6. In these maps, the symbol “∉” in cell 000 is reserved for states
.
As follows from
Figure 5 and
Figure 6, each class of PES is represented by a single generalized interval. Let us analyze the Karnaugh map (
Figure 5). The state
is represented by the interval 01* (the symbol “*” means that the corresponding state variable is insignificant). The states
are represented by the interval 10*. Finally, the states
are represented by the interval 11*. So, the state variable
is insignificant. Therefore, this variable is eliminated from the SOPs representing the circuit of
. As a result, the number of LUTs in the circuit of
is reduced.
Step 6. Table of
has the following columns:
(the current state belonging to the set
),
,
(state of transition),
,
,
, and
h. There is no information about FSM outputs in the first column of this table. This information can be taken from the initial STT (
Table 1).
In the discussed case, the table of the block
is constructed using information from rows 1, 2, and 5–12 of the STT (
Table 1). The corresponding state codes are taken from
Figure 4. In the discussed case, the block
is represented by
Table 2.
In
Table 2, the partial IMFs shown in column
should have the superscript “0”. We did not show this superscript to simplify the table. The same is done for all other tables. The table of
is the basis for deriving (
14). Using
Table 2, we can derive the following sum-of-products:
Step 7. The tables of blocks
have almost the same columns as
Table 2. However, column
is replaced by column
. To construct the table of
(
Table 3), we use rows 3, 4, 7–9, 13–18, and 22–25 of
Table 1. The partial state codes are taken from
Figure 5.
To construct the table of
(
Table 4), the rows 19–21 and 26–28 (
Table 1) are used. The partial state codes are taken from
Figure 6.
Using
Table 3 and generalized intervals from the Karnaugh map (
Figure 5), the following SBF is constructed:
Analysis of the SOPs in (
19) shows that they do not include variable
. This result follows from the state assignment used for PESs.
Using
Table 4 and generalized intervals from the Karnaugh map (
Figure 6), the following SBF is constructed:
SBF (
19) is the basis for constructing the circuit of block
. In turn, SBF (
20) is the basis for constructing the circuit of block
. Obviously, a single LUT is sufficient to implement each SOP from systems (
19) and (
20).
Step 8. The functional assembler (block
) performs disjunctions of partial IMFs and produces the final values of the IMFs
. In the general case, this block is represented by a table with
columns. The first column contains the symbols
, other columns are marked by the numbers 0, 1, 2, … The column 0 corresponds to the block
. The intersections of rows and columns are marked with the signs “+” or “−”. If the IMF
is not equal to zero, then there is “+” on intersection of the row
and the column
k(
). Otherwise, the sign “−” is used. Analysis of SBFs (
18)–(
20) shows that each block generates all partial IMFs. This leads to
Table 5.
Table 5 is a base for creating the final values of IMFs represented by (
15). The following SBF is derived from
Table 5:
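Because every block contributes to every IMF, each row of the assembler table holds only “+” signs. The following sketch shows how such a table could be turned into disjunctions of partial IMFs. The labels `D1`…`D4`, the `^k` superscript notation, and the function name are assumptions introduced for illustration; the table contents follow only from the statement that all three blocks generate all four partial IMFs.

```python
def assembler_sops(table):
    """table: dict mapping an IMF name to a list of '+'/'-' marks,
    one mark per block 0, 1, 2, ... Returns the disjunction for each IMF."""
    sops = {}
    for imf, marks in table.items():
        terms = [f"{imf}^{k}" for k, mark in enumerate(marks) if mark == "+"]
        sops[imf] = " | ".join(terms) if terms else "0"
    return sops


# Four IMFs (four-bit state codes) and three blocks: block 0 for the
# core MCC and blocks 1 and 2 for the two classes of the core SDC.
table5_like = {f"D{r}": ["+", "+", "+"] for r in range(1, 5)}
```

A “−” mark simply drops the corresponding term, so a sparser table would yield LUTs with fewer inputs in the assembler.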
Step 9. The table of
shows the code-transformation rule. It has the following columns:
,
,
,
, and
. Column
contains the symbols of partial state variables equal to 1 in the third column of the table. Column
contains the symbols of FSM outputs generated in the state written in column
. In the discussed case,
Table 6 represents the block
.
In this table, the codes
are taken from the Karnaugh map (
Figure 4). The partial state codes
are taken from the maps shown in
Figure 5 (class
) and
Figure 6 (class
). The following SBFs are derived from
Table 6:
Analysis of SBF (
22) shows that there is no need to generate state variable
. Thus, using PESs for partial state encoding eliminates about 20% of the LUTs used to generate partial state variables.
SBF (
23) is optimized using generalized intervals covering state codes (
Figure 4). As can be seen, almost all outputs depend on fewer than
variables. In total, SBF (23) includes 20 literals. In the general case, it includes $N \cdot R$ literals; in the discussed case, $N \cdot R = 7 \cdot 4 = 28$. Each literal corresponds to an interconnection wire. Therefore, the adopted state-assignment method reduces the number of interconnections from 28 to 20, that is, by about 29% (equivalently, the unoptimized system would need 40% more wires).
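The interconnection estimate can be checked with a few lines of arithmetic. The sketch assumes that, without optimization, each of the 7 outputs depends on all 4 state variables of the maximum binary code; the function name is illustrative.

```python
def wire_reduction(optimized, baseline):
    """Fraction of interconnections saved relative to the baseline."""
    return (baseline - optimized) / baseline


num_outputs, code_bits = 7, 4
baseline = num_outputs * code_bits   # one wire per literal in the general case
optimized = 20                       # literals counted in the optimized system
saving = wire_reduction(optimized, baseline)
overhead = baseline / optimized - 1  # extra wires of the baseline relative to the optimized system
```

Here `saving` is about 0.286 and `overhead` is about 0.4, so the two percentages describe the same difference of 8 wires measured against different reference values.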
Step 10. To implement an FPGA-based FSM circuit, it is necessary to use some industrial package. In the case of chips produced by AMD Xilinx, we should use the CAD tool Vivado [43]. We do not show the outcome for our example. This is due to the fact that Vivado operates with LUTs with six inputs.