Reducing LUT Counts in Moore FSMs with Twofold State Assignment

Barkalov, Alexander; Titarenko, Larysa; Krzywicki, Kazimierz

doi:10.3390/app16073540


by Alexander Barkalov ¹,*, Larysa Titarenko ¹,² and Kazimierz Krzywicki ³,*

¹ Institute of Metrology, Electronics and Computer Science, University of Zielona Gora, ul. Licealna 9, 65-417 Zielona Gora, Poland

² Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine

³ Department of Technology, The Jacob of Paradies University, ul. Fryderyka Chopina 52/b.7, 66-400 Gorzow Wielkopolski, Poland

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(7), 3540; https://doi.org/10.3390/app16073540

Submission received: 1 March 2026 / Revised: 23 March 2026 / Accepted: 2 April 2026 / Published: 4 April 2026

(This article belongs to the Special Issue Feature Papers Collection in the Section ‘Electrical, Electronics and Communications Engineering’, Second Edition)


Abstract

In this paper, we propose a new synthesis method for LUT-based Moore finite state machines (FSMs) with twofold state assignment (TSA). The method introduces an additional core of partial input memory functions (IMFs), resulting in an architecture with two IMF cores. The first core is based on structural decomposition using additional partial state variables, whereas the second uses maximum binary state codes. Both cores are implemented as single-level circuits. We formulate the conditions under which the proposed method can be applied and show that it improves both the area and timing characteristics of the resulting FSM circuits. The method exploits pseudoequivalent state classes to reduce the number of literals in sum-of-products describing partial IMFs. The developed FSM architecture is organized into three logic stages. At the first stage, two dedicated blocks generate partial IMFs. At the next stage, these intermediate functions are merged and used to form the maximum binary state code. The final stage produces both the output signals and the partial state encoding. The proposed method is illustrated by a synthesis example and validated using standard benchmark FSMs. The obtained results indicate that the method is particularly suitable for larger and more complex Moore FSM implementations.

Keywords:

Moore FSM; LUT; structural decomposition; core of partial functions; synthesis; pseudoequivalent states

1. Introduction

Modern digital systems include a wide variety of sequential blocks [1]. Very often, these blocks are used as control units [2,3,4,5,6,7], and many other applications have been reported [8,9,10,11,12,13,14,15,16,17]. For example, efficient digital control is very important in advanced energy systems [18]. The behavior of sequential blocks can be described using the Moore finite state machine (FSM) model [19,20]. When an FSM is used as a control unit, the quality of its implementation strongly affects the quality of the overall system. In this paper, we propose a method for optimizing the spatial characteristics of Moore FSMs [21,22] implemented using field-programmable gate arrays (FPGAs) [23,24].

Our choice is supported by the fact that FPGAs are the most popular modern tools for implementing digital systems [23,25]. Leading logic design experts predict that FPGAs will remain in use for at least the next three decades [26]. In this article, we discuss a case where logic circuits of Moore FSMs are implemented with look-up table (LUT) elements, programmable flip-flops, dedicated multiplexers, programmable interconnections, and a synchronization tree.

A LUT is a single-output logic element with $S_{L}$ inputs [24]. It contains SRAM cells that store the truth table of an arbitrary Boolean function with up to $S_{L}$ arguments [22,27]. A key feature of a LUT is its very small number of inputs (typically about six) [28,29,30]. This necessitates the use of functional decomposition (FD) methods in LUT-based design. These methods transform initial systems of Boolean functions (SBFs) into compositions of partial functions. As a result, FSM implementations often become multi-level circuits with highly complex interconnection structures [1].

Among the key challenges in LUT-oriented FSM synthesis is the reduction of the hardware resources required for circuit implementation, in particular the chip area occupied by the resulting controller [31,32]. This issue is closely related to power efficiency, since smaller implementations are generally associated with lower energy demand [33]. Such optimization is especially relevant in embedded and autonomous applications, where both silicon resources and energy budgets are limited [34]. In this work, we focus on a synthesis method that decreases the implementation cost of Moore FSMs while preserving the clock-cycle time. This objective is important because aggressive area optimization often causes noticeable performance degradation [35].

For LUT-based FSM circuits, the implementation cost is influenced not only by the number of LUTs, but also by the organization of signal routing between them. Hence, area reduction should be considered together with interconnection optimization. As pointed out in [27], the routing structure has a strong impact on the quality of the final circuit, since signal propagation through interconnects frequently contributes more to delay than the logic itself [36,37]. In addition, routing resources may be responsible for a dominant share of total energy usage, reaching about 70% in some cases [37]. For this reason, a synthesis method that simultaneously lowers LUT demand and simplifies routing can improve both energy efficiency and operating speed.

As shown in [25], the value $S_{L} = 6$ is optimal. It provides the best trade-off between the occupied chip area, the performance, and the energy consumption of such an element. Therefore, this value is unlikely to increase in the near future. However, modern FPGA-based projects are becoming increasingly complex [16,38]. This requires the continual development of new, more efficient logic design methods aimed at improving the basic characteristics of LUT-based FSM circuits.

In this paper, we propose a novel design method aimed at reducing LUT counts in circuits of LUT-based Moore FSMs with twofold state assignment [1]. The method exploits the presence of pseudoequivalent states, which is a characteristic feature of Moore FSMs. Its main idea is to use two cores of partial input memory functions (IMFs). One core contains partial Boolean functions (PBFs) that depend on state variables and FSM inputs, whereas the second core is based on structural decomposition with partial state codes. The experimental results show that the proposed double-core architecture requires fewer LUTs than architectures based solely on twofold state assignment, while maintaining the maximum operating frequency.

The rest of the paper is organized as follows. Section 2 includes necessary background information. Section 3 is devoted to the discussion of related works. The main ideas of the proposed method are shown in Section 4. Section 5 contains an example of FSM synthesis using the proposed method. Section 6 includes the presentation and analysis of results of conducted experiments. A brief conclusion sums up the results obtained in the paper.

2. FPGA-Based Design of Moore FSMs

Among other internal resources, modern FPGAs include configurable logic blocks (CLBs) and a matrix of programmable interconnections [24,28,29,39]. To design an FSM circuit, a designer may use CLBs including such internal resources as LUTs, dedicated multiplexers, and programmable flip-flops. The LUT output is permanently connected with the input of a flip-flop. Using multiplexers allows the CLB output to be made either combinational (the output of the LUT) or registered (the output of the flip-flop). The flip-flops are combined into a state code register of Moore FSM [19].

As noted above, the main feature of a LUT is its limited number of inputs, $S_{L}$. In modern FPGAs, $S_{L}$ does not exceed 6 [24,28,29,39]. Therefore, functional decomposition methods must be used for Boolean functions that depend on at least $S_{L} + 1$ arguments [22]. Using FD-based methods results in multi-level FSM circuits with complex systems of “spaghetti-type” interconnections [1].

The abstract Moore FSM is represented by a vector $S = \langle A, X, Y, δ, λ, a_{1} \rangle$ [19], where $A = \{a_{1}, \dots, a_{M}\}$ is a set of internal states, $X = \{x_{1}, \dots, x_{L}\}$ is a set of FSM inputs, $Y = \{y_{1}, \dots, y_{N}\}$ is a set of outputs, $δ$ is a transition function, $λ$ is an output function, and $a_{1} \in A$ is an initial state. All these sets are finite. A Moore FSM can be represented using a variety of tools [40]. In this paper, we use the apparatus of state transition tables (STTs) [19].

An STT can be viewed as a tabular representation of the corresponding state-transition graph. Each row of an STT represents a single interstate transition. An STT has the following columns [19]: $a_{m}$ (the current state), $a_{s}$ (the next state), $X_{h}$ (the input signal determining the transition $\langle a_{m}, a_{s} \rangle$), and $h$ (the transition number, where $h \in \{1, \dots, H\}$). The input signal $X_{h}$ is represented by a conjunction of inputs (or their complements) determining a particular transition. The collection of outputs $Y_{q} \subseteq Y$ ($q \in Q = \{1, \dots, Q\}$) generated in state $a_{m} \in A$ is shown in the current-state column ($a_{m}$).

To design an FSM circuit, a designer should execute three preliminary steps [19]: 1) the state assignment; 2) the construction of a direct structure table (DST); 3) the derivation of SBFs representing the FSM circuit.

The state assignment is reduced to encoding each state $a_{m} \in A$ by a binary code $K(a_{m})$. One of the most popular encoding schemes is the maximum binary code (MBC) [41,42,43,44]. Such codes are created using $R_{A}$ state variables. The value of $R_{A}$ is determined as

$$R_{A} = \lceil \log_{2} M \rceil. \tag{1}$$
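As a quick illustration of (1), the following minimal Python sketch computes $R_{A}$ for a given number of states. The function name `mbc_width` is an assumption introduced here for illustration only; integer arithmetic is used instead of a floating-point logarithm to avoid rounding issues.

```python
def mbc_width(num_states: int) -> int:
    """Return R_A = ceil(log2(M)), the number of state variables in Eq. (1)."""
    if num_states < 1:
        raise ValueError("an FSM must have at least one state")
    # (M - 1).bit_length() equals ceil(log2(M)) for every integer M >= 1.
    return (num_states - 1).bit_length()


# An FSM with M = 12 states needs four state variables.
print(mbc_width(12))
```

For a machine with twelve states, the sketch yields four state variables, while eight states fit into three.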

To transform an STT into a DST, it is necessary to determine the sets of state variables $T = \{T_{1}, \dots, T_{R_{A}}\}$ and input memory functions (IMFs) $D = \{D_{1}, \dots, D_{R_{A}}\}$. The DST consists of all columns of the STT and the additional columns $K(a_{m})$, $K(a_{s})$, and $D_{h}$. In column $D_{h}$, the symbol $D_{r}$ is written if $T_{r} = 1$ ($r \in \{1, \dots, R_{A}\}$) in code $K(a_{s})$. State codes are stored in a state register (RG). This register consists of $R_{A}$ flip-flops with common synchronization (Clock) and reset (Start) inputs. In FPGA-based design, D flip-flops are used in LUT-based FSMs [1]. Input memory functions $D_{r} \in D$ may change the state code stored in RG.
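The rule for filling column $D_{h}$ can be sketched in a few lines of Python. This is only an illustration: the function name `imf_column`, the bit-string representation of $K(a_{s})$, and the left-to-right ordering $T_{1}, \dots, T_{R_{A}}$ are assumptions made here, and the code `0101` is a hypothetical value, not taken from any table of the paper.

```python
def imf_column(next_state_code: str) -> list[str]:
    """List the IMFs D_r written in column D_h for one DST row.

    next_state_code is K(a_s) as a bit string ordered T_1 ... T_RA
    (an assumed convention for this sketch).
    """
    return [f"D{r}" for r, bit in enumerate(next_state_code, start=1) if bit == "1"]


# Hypothetical next-state code 0101: T_2 = 1 and T_4 = 1, so D_2 and D_4 are listed.
print(imf_column("0101"))
```

A transition into a state with the all-zero code produces an empty entry, because no flip-flop has to be set.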

The DST is a base for deriving the following SBFs:

$$D = D(T, X); \tag{2}$$

$$Y = Y(T). \tag{3}$$

SBFs (2) and (3) are used for designing a logic circuit of a Moore FSM. In the simplest case, this circuit includes two combinational blocks and the state register [45]. One block generates IMFs; the other generates FSM outputs. If LUTs are used for implementing the logic circuits of these blocks, then the blocks are named $LUTerD$ and $LUTerY$, respectively [1]. In the case of LUT-based design, the state register is distributed among the CLBs of $LUTerD$. So, the state register is hidden.

The model of Moore FSMs possesses two important properties [46]. First of all, the set of states includes classes of pseudoequivalent states (PESs). The pseudoequivalent states have the same systems of interstate transitions but different collections of outputs. The second property follows from (3): the FSM outputs do not directly depend on the FSM inputs. The first property allows for minimization of the number of literals in the sum-of-products (SOP) representing SBF (2). The second property allows for minimization of the number of literals in the SOP representing SBF (3). Using these properties may reduce both the number of LUTs and the number of logic levels in the circuits of blocks $LUTerD$ and $LUTerY$.

3. Related Works

At the technology-mapping stage, the designer must account for several competing implementation objectives [47]. For LUT-based FSM circuits, these objectives mainly concern the silicon area, achievable clock frequency, and power consumption of the final design [1]. In the present work, we concentrate on reducing the hardware overhead in Moore FSM implementations, with particular emphasis on lowering the number of LUTs and limiting inter-CLB routing. These two factors have a major influence on the implementation cost of LUT-based circuits [1]. In addition, a reduction in area-related overhead may also lead to lower power dissipation [35].

Let $NL(f_{i})$ be the number of literals [40] in a SOP of some function $f_{i} \in D \cup Y$. This value influences the number of levels in LUT-based circuits. Let the following condition hold for at least a single function $f_{i} \in D \cup Y$:

$$NL(f_{i}) \geq S_{L}. \tag{4}$$

In this case, to implement an FSM circuit, the methods of FD are used [2,22,48]. The idea of FD is as follows. If condition (4) holds, then a function $f_{i} \in D \cup Y$ is decomposed into smaller sub-functions containing fewer literals than the initial SOP. These sub-functions are partial Boolean functions (PBFs). The decomposition process ends when each PBF depends on no more than $S_{L}$ arguments. Functional decomposition is a powerful tool in FPGA-based technology mapping [48,49]; however, it usually leads to multi-level circuits.

In multi-level circuits, the same inputs $x_{l} \in X$ and state variables $T_{r} \in T$ may appear at more than one logic level [1]. This complicates the interconnection structure and results in circuits with spaghetti-type interconnections. To optimize the spatial characteristics of such a circuit, it is necessary to regularize the interconnections. As shown in [50], circuits with a regular interconnection structure consume less power than their counterparts with spaghetti-type interconnections.

To obtain a regular interconnection structure [1], methods of structural decomposition (SD) may be applied [1,50,51]. These methods are based on eliminating the direct dependence of functions $f_{i} \in D \cup Y$ on inputs $x_{l} \in X$. To eliminate the direct dependence, some additional functions are introduced. Each system of new functions determines a separate block $LUTer$ possessing unique sets of input and output variables. The classical SD methods [19] are the replacement of logical conditions and the encoding of the collections of outputs. These methods are thoroughly discussed, for example, in [51].

In Moore FSMs, area reduction can be achieved by exploiting pseudoequivalent states [50]. Two states $a_{m}$ and $a_{s}$ are pseudoequivalent when $δ(a_{m}, X_{h}) = δ(a_{s}, X_{h})$ for $h \in \{1, \dots, H\}$. This relation makes it possible to partition $A$ into classes of pseudoequivalent states, $Π_{A} = \{B_{1}, \dots, B_{I}\}$. To optimize SBF (2), the state assignment should be chosen so that each class $B_{i} \in Π_{A}$ is represented by the minimum possible number of generalized intervals in $R_{A}$-dimensional Boolean space [1]. Such an assignment can be executed using, for example, the methods in [52]. However, minimizing SBF (3) requires a different assignment. Therefore, separate state variables are needed to optimize both systems simultaneously.
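The partition $Π_{A}$ can be obtained by grouping states whose outgoing transitions coincide. The Python sketch below illustrates this under stated assumptions: the dictionary layout, the function name `pes_classes`, and the condition labels (`"x1"`, `"~x1"`, and `"1"` for an unconditional transition) are hypothetical, and the small STT is a toy example rather than any table of the paper.

```python
from collections import defaultdict


def pes_classes(transitions: dict[str, dict[str, str]]) -> list[list[str]]:
    """Group states that have identical (input signal -> next state) maps."""
    groups = defaultdict(list)
    for state, moves in transitions.items():
        # Two states are pseudoequivalent when their transition maps coincide.
        groups[frozenset(moves.items())].append(state)
    # Classes are returned in order of first appearance in the STT.
    return [sorted(members) for members in groups.values()]


# Toy STT (hypothetical): a2 and a3 both move unconditionally to a4.
toy_stt = {
    "a1": {"x1": "a2", "~x1": "a3"},
    "a2": {"1": "a4"},
    "a3": {"1": "a4"},
    "a4": {"x2": "a1", "~x2": "a4"},
}
print(pes_classes(toy_stt))
```

In this toy machine the states `a2` and `a3` form one class, so a suitable state assignment could cover both with a single generalized interval.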

This can be achieved using the twofold state assignment (TSA) [50]. This approach can be used if the following condition holds for all FSM states:

$$L(a_{m}) < S_{L}. \tag{5}$$

In (5), the symbol $L(a_{m})$ stands for the number of inputs determining transitions from the state $a_{m} \in A$.

The term “twofold state assignment” means that each state $a_{m} \in A$ is encoded by two codes: the maximum binary code $K(a_{m})$ and the partial state code $PC(a_{m})$. To create partial state codes, the additional variables $τ_{r} \in τ$ are used.

The TSA is based on finding a partition $Π_{AT} = \{A^{1}, \dots, A^{K}\}$ of the set $A$ into classes of compatible states. A class $A^{k} \in Π_{AT}$ includes $M_{k}$ states. Each state is encoded by codes $K(a_{m})$ and $PC(a_{m})$. The code $PC(a_{m})$ includes $R_{k}$ bits, where

$$R_{k} = \lceil \log_{2} (M_{k} + 1) \rceil. \tag{6}$$

To create partial codes of states $a_{m} \in A^{k}$, the variables $τ_{r} \in τ^{k} = \{τ_{1}, \dots, τ_{R_{k}}\}$ are used. The variables $τ_{r} \in τ^{k}$ are combined into a single set $τ$ containing $R_{TF}$ elements. The variables $τ_{r} \in τ$ create extended state codes (ESCs) $EC(a_{m})$. The value of $R_{TF}$ is determined as

$$R_{TF} = R_{1} + R_{2} + \dots + R_{K}. \tag{7}$$
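Formulas (6) and (7) are easy to check numerically. The following Python sketch is an illustration only; the function names `partial_code_width` and `extended_code_width` and the class sizes 4 and 1 are hypothetical. Note that the `+ 1` in (6) leaves one code word beyond the $M_{k}$ states of the class.

```python
def partial_code_width(class_size: int) -> int:
    """Return R_k = ceil(log2(M_k + 1)) from Eq. (6)."""
    if class_size < 1:
        raise ValueError("a class must contain at least one state")
    # M_k.bit_length() equals ceil(log2(M_k + 1)) for every integer M_k >= 1.
    return class_size.bit_length()


def extended_code_width(class_sizes: list[int]) -> int:
    """Return R_TF, the sum of all R_k, from Eq. (7)."""
    return sum(partial_code_width(m_k) for m_k in class_sizes)


# Hypothetical partition with classes of 4 and 1 states.
print([partial_code_width(m) for m in (4, 1)], extended_code_width([4, 1]))
```

A class of four states needs three bits rather than two, because the fifth code word must also be representable.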

Each class $A^{k} \in Π_{AT}$ determines a block $LUTerk$, generating the PBFs

$$D^{k} = D^{k}(τ^{k}, X^{k}). \tag{8}$$

In (8), the symbol $D^{k}$ stands for a set of partial IMFs generated during the transitions from the states $a_{m} \in A^{k}$, and the symbol $X^{k}$ stands for a set of FSM inputs causing transitions from the states $a_{m} \in A^{k}$. These $K$ blocks create the first level of the FSM circuit.

A block $LUTerD$ creates the final values of the IMFs:

$$D = D(D^{1}, \dots, D^{K}). \tag{9}$$

This block is a functional assembler forming the second logic level.

The third logic level includes the blocks $LUTerY$ and $LUTerτ$. The first of these generates FSM outputs represented by SBF (3). The second block transforms maximum binary codes into extended state codes. Therefore, block $LUTerYτ$ implements SBF (3) and

$$τ = τ(T). \tag{10}$$

The architecture of Moore FSM $U_{1}$ is shown in Figure 1.

The block $LUTerD$ includes $R_{A}$ master–slave flip-flops combined into the state register. The flip-flops of this register are distributed among the CLBs of $LUTerD$. The pulses Start and Clock control the operation of RG.

A comparison of SBFs (2), (8) and (9) shows that the state variables $T_{r} \in T$ are replaced by partial state variables $τ_{r} \in τ$. The partial IMFs $D_{r}^{k}$ are used as inputs of the block $LUTerT$. De facto, these functions are used as additional variables replacing inputs $x_{l} \in X$.

Now, it is possible to create codes $K(a_{m})$ that minimize the number of literals in SBF (3). The partial codes $PC(a_{m})$ are created in a way that minimizes the number of literals in SBFs (8). Thus, different variables are used for optimizing SBFs (3) and (8), and the contradiction mentioned above is eliminated.

The greedy algorithm [50] creates the classes of compatible states. Let the symbol $L_{k}$ stand for the number of elements in the set $X^{k}$. As shown in [1], the compatible states $a_{m} \in A^{k}$ satisfy the condition

$$R_{k} + L_{k} \leq S_{L}. \tag{11}$$

As shown in [1], this approach allows for designing FSM circuits with better characteristics than their FD-based counterparts. This model cannot be applied if condition (5) is violated for at least a single state $a_{m} \in A$.
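To make condition (11) concrete, the sketch below builds classes of compatible states with a simple greedy loop. It is not the algorithm of [50], whose details are not reproduced here; it is a simplified stand-in that adds, at each step, the state causing the smallest growth of $X^{k}$ while (11) still holds. The function name `greedy_classes`, the input layout, and the toy data are assumptions, and the grouping of pseudoequivalent states is deliberately omitted.

```python
def greedy_classes(state_inputs: dict[str, set[str]], s_l: int) -> list[list[str]]:
    """Partition states into classes satisfying R_k + L_k <= S_L (condition (11))."""
    remaining = dict(state_inputs)
    classes = []
    while remaining:
        members, used_inputs = [], set()
        while True:
            best = None
            # R_k from Eq. (6) if one more state joins the class.
            r_k = (len(members) + 1).bit_length()
            for state, inputs in remaining.items():
                merged = used_inputs | inputs
                if r_k + len(merged) > s_l:
                    continue  # adding this state would violate (11)
                growth = len(merged) - len(used_inputs)
                if best is None or growth < best[0]:
                    best = (growth, state)  # ties keep the earlier state
            if best is None:
                break
            used_inputs |= remaining.pop(best[1])
            members.append(best[1])
        if not members:
            raise ValueError("some state violates condition (11) on its own")
        classes.append(members)
    return classes


# Toy data (hypothetical): inputs that determine transitions from each state.
toy_inputs = {"a1": {"x1"}, "a2": {"x1", "x2"}, "a3": {"x3"}, "a4": {"x3", "x4"}}
print(greedy_classes(toy_inputs, s_l=4))
```

With four-input LUTs, the toy states split into two classes, each using two inputs and two partial state variables, so every partial function fits into a single LUT.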

In this paper, we discuss a case where condition (5) holds for all states of a particular Moore FSM. We propose a design method which allows the use of two cores of PBFs. One core is based on the twofold state assignment for some part of the FSM circuit. The second core generates partial IMFs depending on the state variables creating the maximum binary state codes. We denote these cores as SDC and MCC, respectively.

4. The Essence of the Proposed Method

The core MCC exists if there is a set $A_{MC}$ satisfying the following condition:

$$L_{MC} + R_{MC} \leq S_{L}. \tag{12}$$

In (12), the symbol $L_{MC}$ stands for the number of FSM inputs determining transitions from states $a_{m} \in A_{MC}$. The symbol $R_{MC}$ stands for the number of state variables representing states $a_{m} \in A_{MC}$. The following condition holds:

$$R_{MC} \leq R_{A}. \tag{13}$$

The relation $R_{MC} < R_{A}$ takes place if states $a_{m} \in A_{MC}$ are encoded in such a way that some state variables $T_{r} \in T$ are insignificant. This can be achieved using the approach discussed in [52]. To satisfy (13), it is necessary to find the partition $Π_{A} = \{B_{1}, \dots, B_{I}\}$, where $B_{i} \in Π_{A}$ is a class of PES.

Obviously, the set $A_{MC}$ includes all states for which $L(a_{m}) = 0$. These states should be encoded in a way that minimizes the number of generalized intervals covering their codes. After that, the value of $R_{MC}$ can be determined. Next, states with $L(a_{m}) = 1$ should be considered for inclusion in the set $A_{MC}$. All states from a particular class $B_{i} \in Π_{A}$ should be added together. If all such states from all classes have been included in the set $A_{MC}$, then states with $L(a_{m}) = 2$ should be considered. This process is terminated when no state can be included in the set $A_{MC}$ without violating condition (12).
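The procedure above can be sketched as follows. This is a simplified illustration under strong assumptions: $R_{MC}$ is passed in as a fixed number (in the method it results from the state encoding), the capacity of the chosen generalized interval is ignored, and the function name `build_a_mc` and the toy data are hypothetical.

```python
def build_a_mc(pes_classes, class_inputs, r_mc, s_l):
    """Grow A_MC class by class while L_MC + R_MC <= S_L (condition (12)).

    pes_classes:  list of PES classes, each a list of state names
    class_inputs: list of input sets, one per class (same order)
    r_mc:         assumed fixed number of significant state variables
    """
    # Visit classes in order of increasing L(a_m): 0 inputs, then 1, then 2, ...
    order = sorted(range(len(pes_classes)), key=lambda i: len(class_inputs[i]))
    a_mc, x_mc = [], set()
    for i in order:
        merged = x_mc | class_inputs[i]
        if len(merged) + r_mc <= s_l:
            a_mc.extend(pes_classes[i])  # a whole PES class is added together
            x_mc = merged
    return a_mc, x_mc


# Toy data (hypothetical), with S_L = 5 and an assumed R_MC = 3.
toy_classes = [["a1"], ["a2", "a3"], ["a4"], ["a5"]]
toy_class_inputs = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x1"}]
print(build_a_mc(toy_classes, toy_class_inputs, r_mc=3, s_l=5))
```

In the toy run, the class depending on `x3` and `x4` is rejected because it would raise $L_{MC}$ to four, while the class that reuses `x1` is accepted at no cost.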

It should be noted that it is possible to create $J_{MC}$ sets satisfying condition (12). Therefore, the MCC core may include several LUT blocks. However, the case discussed in this paper is limited to $J_{MC} = 1$. Other cases require additional research. In this paper, we only aim to show the main idea of the proposed method without considering all possible variants. The set $A_{MC}$ determines the sets $X_{MC}$ and $T_{MC}$. The first set includes FSM inputs determining transitions from states $a_{m} \in A_{MC}$. The second set consists of state variables used for encoding states $a_{m} \in A_{MC}$, where $A_{MC} \subset A$.

Once the set $A_{MC}$ has been determined, the set $A$ can be partitioned into two disjoint subsets: $A_{MC}$ and $A_{SD}$ ($A_{MC} \cup A_{SD} = A$; $|A_{SD} \cap A_{MC}| = 0$). Next, we should execute TSA for states $a_{m} \in A_{SD}$. As a result, the partition $Π_{ASD} = \{A^{1}, \dots, A^{K}\}$ of the set $A_{SD}$ into $K$ classes of compatible states is obtained. This can be achieved using the greedy algorithm discussed in [50]. The states $a_{m} \in A^{k}$ are encoded by partial codes using elements of the set $τ^{k} \subseteq τ = τ^{1} \cup τ^{2} \cup \dots \cup τ^{K}$. The set $A_{SD}$ determines the core SDC with the set of inputs $X_{SD}$.

Next, it is necessary to create a table of the core MCC. This table includes the same columns as any DST. Using it, we can derive the following SBF:

$$D_{MC} = D_{MC}(T_{MC}, X_{MC}). \tag{14}$$

After executing the partial state assignment, it is necessary to create tables for each block $LUTerk$. Using these tables, it is possible to find SBF (8). Next, the partial IMFs should be combined into their final forms. This is achieved by a functional assembler generating the following SBF:

$$D = D(D_{MC}, D^{1}, \dots, D^{K}). \tag{15}$$

Finally, we should find an SBF representing the dependence of variables $τ_{r} \in τ$ on the state variables $T_{r} \in T$. SBF (10) gives this dependence.

Systems (8)–(10), (14) and (15) determine the architecture of Moore FSM $U_{2}$ with two cores of partial Boolean functions. The LUT-based architecture of FSM $U_{2}$ is shown in Figure 2.

The architecture (Figure 2) includes three levels of logic blocks. The first level includes two cores of PBFs. The core $MCC$ is represented by $LUTerMC$. This block generates partial IMFs (14). The LUTs of the core $SDC$ generate partial IMFs (8). This core is represented by blocks $LUTer1$, …, $LUTerK$. Both cores are represented by single-level circuits. So, each PBF is generated by a single LUT.

The second logic level consists of the functional assembler $LUTerT$. Its LUTs generate functions (15). Each function $D_{r} \in D$ is represented by $NF(D_{r})$ partial functions. The corresponding circuit is single-level if the following condition holds for each IMF:

$$NF(D_{r}) \leq S_{L}. \tag{16}$$

If (16) is violated, then this circuit includes at least two levels of LUTs. The block hides a distributed register RG. Therefore, the pulses Start and Clock enter the functional assembler.

The code transformer $LUTerτY$ represents the third logic level. Its LUTs generate SBFs (3) and (10). This block is single-level if the following condition holds:

$$R_{A} \leq S_{L}. \tag{17}$$
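Conditions (16) and (17) are simple inequalities, so they can be checked mechanically before technology mapping. The sketch below is illustrative only; the function names and the sample values of $NF(D_{r})$ are hypothetical.

```python
def assembler_is_single_level(nf_per_imf: list[int], s_l: int) -> bool:
    """Condition (16): every IMF D_r is assembled from at most S_L partial functions."""
    return all(nf <= s_l for nf in nf_per_imf)


def transformer_is_single_level(r_a: int, s_l: int) -> bool:
    """Condition (17): the code transformer sees at most S_L state variables."""
    return r_a <= s_l


# Hypothetical case: four IMFs, each built from up to three partial functions.
print(assembler_is_single_level([3, 2, 3, 3], s_l=5),
      transformer_is_single_level(r_a=4, s_l=5))
```

If either check fails, the corresponding block needs at least two LUT levels, which lengthens the critical path of the circuit.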

In this paper, we propose a synthesis method for Moore FSM $U_{2}$. We assume that an FSM is represented by its STT. If an FSM is represented using some other form, then it is necessary to transform this form into the equivalent STT. The proposed method includes the following steps:

1. Finding the partition $Π_{A} = \{B_{1}, \dots, B_{I}\}$.
2. Dividing the set of states into the subsets $A_{MC}$ and $A_{SD}$.
3. Encoding of states $a_{m} \in A$ in a way that minimizes SBF (3).
4. Splitting the set $A_{SD}$ into $K$ classes of compatible states.
5. Encoding of states $a_{m} \in A^{k}$ by partial state codes $PC(a_{m})$.
6. Constructing the table of $LUTerMC$ and finding SBF (14).
7. Constructing the tables of blocks $LUTerk$ and finding SBFs (8).
8. Constructing the table of the functional assembler and finding SBF (15).
9. Constructing the table of $LUTerτY$ and finding SBFs (3) and (10).
10. Implementing the FSM logic circuit using the internal resources of a particular chip.

5. Example of Synthesis

Let us discuss an example of synthesis of some FSM E1 using the model $U_{2}$. The FSM is represented by Table 1. To implement the FSM circuit, we use LUTs with $S_{L} = 5$.

Analysis of Table 1 shows that Moore FSM E1 is characterized by the sets $A = \{a_{1}, \dots, a_{12}\}$, $X = \{x_{1}, \dots, x_{8}\}$, and $Y = \{y_{1}, \dots, y_{7}\}$. This gives the following values: $M = 12$, $L = 8$, and $N = 7$. This STT consists of $H = 28$ rows. Using (1) gives the number of bits in MBCs: $R_{A} = 4$. This value determines the sets $T = \{T_{1}, \dots, T_{4}\}$ and $D = \{D_{1}, \dots, D_{4}\}$. Using Table 1 and the value $S_{L} = 5$, we should divide the set $A$ into two disjoint sets. We start by finding the partition $Π_{A} = \{B_{1}, \dots, B_{I}\}$. This can be achieved using the interstate transitions shown in Table 1.

Step 1. Using the definition of PESs [1], we can find the partition $Π_{A}$ with eight classes of PES: $B_{1} = \{a_{1}\}$, $B_{2} = \{a_{2}\}$, $B_{3} = \{a_{3}, a_{4}\}$, $B_{4} = \{a_{5}, a_{6}\}$, $B_{5} = \{a_{7}, a_{8}\}$, $B_{6} = \{a_{9}\}$, $B_{7} = \{a_{10}, a_{11}\}$, and $B_{8} = \{a_{12}\}$. Thus, $I = 8$.

Step 2. We start this step by finding the set $A_{MC}$. Here, $R_{A} = 4$. The difference $S_{L} - R_{A} = 1$ shows that the set $A_{MC}$ may include states with $L(a_{m}) = 0$ and $L(a_{m}) = 1$. Table 1 does not include states with $L(a_{m}) = 0$. It includes 8 states with $L(a_{m}) = 1$. These states are candidates for inclusion in the set $A_{MC}$. We start from the state $a_{1}$. This gives the sets $A_{MC} = \{a_{1}\}$ and $X_{MC} = \{x_{1}\}$. Obviously, adding states from the class $B_{4} = \{a_{5}, a_{6}\}$ does not change the set $X_{MC} = \{x_{1}\}$.

The state $a_{1}$ is the initial FSM state [19]. So, its code should include only zeros: $K(a_{1}) = 0000$. Accordingly, this state is placed in the cell 0000 (Figure 3a). To reduce the value of $R_{MC}$, we treat the assignment 0001 as insignificant. Therefore, the symbol “*” is placed in the corresponding cell. Thus, three state variables are sufficient to identify state $a_{1}$, and the variable $T_{4}$ is insignificant.

The transitions from states $a_{5}, a_{6} \in B_{4}$ depend on input $x_{1}$. So, we can include these states in the set $A_{MC}$. This leads to the set $A_{MC} = \{a_{1}, a_{5}, a_{6}\}$. We should place these states in some generalized interval of 4-dimensional Boolean space with the insignificant variable $T_{4}$. One of the possible variants is shown in Figure 3b.

Here, $T_{MC} = \{T_{1}, T_{2}, T_{3}\}$, so $R_{MC} = 3$ and $S_{L} - R_{MC} = 2$. Thus, we can include in $A_{MC}$ some other states with $L(a_{m}) = 1$. Let us choose the class $B_{3} = \{a_{3}, a_{4}\}$. Including these states leads to the set $A_{MC} = \{a_{1}, a_{3}, a_{4}, a_{5}, a_{6}\}$. The state codes for states $a_{3}, a_{4} \in B_{3}$ are shown in Figure 3c. Obviously, including states $a_{3}, a_{4} \in B_{3}$ in the set $A_{MC}$ does not change the set $T_{MC}$. But now, $X_{MC} = \{x_{1}, x_{2}\}$.

Analysis of Table 1 shows that it is not possible to include more states in the set $A_{MC}$ without violating (12). Thus, the set $A_{MC}$ has been determined. To obtain the set $A_{SD}$, it is necessary to compute the set difference of $A$ and $A_{MC}$. This difference yields $A_{SD} = \{a_{2}, a_{7}, \dots, a_{12}\}$.
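The outcome of Step 2 can be checked with a few lines of Python. The sets below are taken from the text of the example; only the variable names are introduced here for illustration.

```python
s_l = 5                                    # LUT inputs in the example
states = [f"a{m}" for m in range(1, 13)]   # A = {a1, ..., a12}
a_mc = {"a1", "a3", "a4", "a5", "a6"}
x_mc = {"x1", "x2"}
t_mc = {"T1", "T2", "T3"}

# Condition (12): L_MC + R_MC <= S_L.
condition_12_holds = len(x_mc) + len(t_mc) <= s_l

# A_SD is the set difference of A and A_MC, kept in state order.
a_sd = [a for a in states if a not in a_mc]
print(condition_12_holds, a_sd)
```

Condition (12) holds with equality here (2 + 3 = 5), so any state that brought a third input into $X_{MC}$ would violate it.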

Step 3. The states $a_{m} \in A_{MC}$ are encoded in a way that minimizes SBF (14). The codes of states $a_{m} \in A_{SD}$ should be selected in a way that minimizes SBF (3). To achieve this, the FSM outputs should be represented by the minimum possible number of generalized intervals in $R_{A}$-dimensional Boolean space. Obviously, to create the proper state codes it is necessary to use “free” state assignments. Using the approach from [52], the codes shown in Figure 4 are obtained.

Step 4. To find the partition $Π_{AT} = \{A^{1}, \dots, A^{K}\}$, we can use the method discussed in [1]. The algorithm in [1] is based on the greedy algorithm proposed in [50]. This algorithm tries to include as many states as possible into each class of $Π_{AT}$. The main rule of this method is the following: a state can be included in a particular class $A^{k}$ if it leads to a minimal increase in the number of elements in the set $X^{k}$.

As follows from [1], there are the same transitions from all PESs $a_{m} \in B_{i}$, where $B_{i} \in Π_{A}$. So, in the case of Moore FSMs, all pseudoequivalent states $a_{m} \in B_{i}$ should be placed into the same class $A^{k} \in Π_{AT}$.

In the discussed example, all classes of PES have been shown during the discussion of Step 1. Using the greedy algorithm [50] gives the following partition of the set $A_{SD}$: $Π_{AT} = \{A^{1}, A^{2}\}$ with $K = 2$. This partition includes the classes $A^{1} = \{a_{2}, a_{7}, a_{8}, a_{10}, a_{11}\}$ and $A^{2} = \{a_{9}, a_{12}\}$. So, the class $A^{1} \in Π_{AT}$ includes the classes of PES $B_{2}, B_{5}, B_{7}$; the class $A^{2} \in Π_{AT}$ includes PESs from the classes $B_{6}, B_{8}$.

Step 5. In the discussed case, we can find that $M_{1} = 5$ and $M_{2} = 2$. Using (6) gives the numbers of bits in partial state codes: $R_{1} = 3$ and $R_{2} = 2$. These values determine the following sets: $τ^{1} = \{τ_{1}, τ_{2}, τ_{3}\}$, $τ^{2} = \{τ_{4}, τ_{5}\}$. In turn, this gives the set $τ = \{τ_{1}, \dots, τ_{5}\}$. The variables from the set $τ^{1}$ create partial state codes for states $a_{m} \in A^{1}$. The variables from the set $τ^{2}$ create partial state codes for states $a_{m} \in A^{2}$.
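The arithmetic behind these values can be written out explicitly. Formula (6) gives the width of each partial code, and formula (7) gives the total number of partial state variables:

$$\begin{aligned} R_{1} &= \lceil \log_{2} (5 + 1) \rceil = \lceil 2.585 \rceil = 3, \\ R_{2} &= \lceil \log_{2} (2 + 1) \rceil = \lceil 1.585 \rceil = 2, \\ R_{TF} &= R_{1} + R_{2} = 3 + 2 = 5. \end{aligned}$$

The total of five variables agrees with the set $τ = \{τ_{1}, \dots, τ_{5}\}$.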

To eliminate some state variables from SOPs of functions (8), we propose placing codes

P C (a_{m})

, where

a_{m} \in B_{i}

and

a_{m} \in A^{k}

, into the same generalized interval of

R_{k}

-dimensional Boolean space. This can be achieved using, for example, the approach discussed in [52]. Using this approach, we can create the partial state codes shown in the Karnaugh maps in Figure 5 and Figure 6. In these maps, the symbol “∉” in cell 000 is reserved for states

a_{m} \notin A^{k}

.

As follows from Figure 5 and Figure 6, each class of PES is represented by a single generalized interval. Let us analyze the Karnaugh map in Figure 5. The state $a_{2} \in B_{2}$ is represented by the interval 01* (the symbol “*” means that the corresponding state variable is insignificant). The states $a_{7}, a_{8} \in B_{5}$ are represented by the interval 10*. Finally, the states $a_{10}, a_{11} \in B_{7}$ are represented by the interval 11*. So, the state variable $τ_{3}$ is insignificant. Therefore, this variable is eliminated from the SOPs representing the circuit of LUTer1. As a result, the number of LUTs in the circuit of LUTerτY is reduced.
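The reasoning about insignificant variables can be checked mechanically. The sketch below takes the three intervals quoted in the text and reports which positions are “*” in every interval. The individual codes used for $a_{7}$ and $a_{8}$ are hypothetical (Figure 5 is not reproduced here); they are chosen only so that they fall into the interval 10*.

```python
def cube_of(codes):
    """Smallest generalized interval (cube) covering the given binary codes."""
    return "".join(
        bits[0] if len(set(bits)) == 1 else "*" for bits in zip(*codes)
    )


def insignificant_variables(intervals):
    """1-based positions that are '*' in every interval."""
    width = len(intervals[0])
    return [
        i + 1 for i in range(width) if all(iv[i] == "*" for iv in intervals)
    ]


# Intervals stated in the text for the classes B2, B5, and B7 of A^1.
INTERVALS = ["01*", "10*", "11*"]

# Hypothetical codes for a7 and a8, consistent with the interval 10*.
B5_CODES = ["100", "101"]
```

Here `cube_of(B5_CODES)` returns the interval 10*, and `insignificant_variables(INTERVALS)` returns position 3, which corresponds to $τ_{3}$.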

Step 6. The table of LUTerMC has the following columns: $a_{m}$ (the current state belonging to the set $A_{MC}$), $K(a_{m})$, $a_{s}$ (the state of transition), $K(a_{s})$, $X_{h}$, $D_{h}$, and $h$. There is no information about FSM outputs in the first column of this table. This information can be taken from the initial STT (Table 1).

In the discussed case, the table of the block LUTerMC (Table 2) is constructed using information from rows 1, 2, and 5–12 of the STT (Table 1). The corresponding state codes are taken from Figure 4.

In Table 2, the partial IMFs shown in column $D_{h}$ should have the superscript “0”. We omit this superscript to simplify the table. The same is done for all other tables. The table of LUTerMC is the basis for deriving (14). Using Table 2, we can derive the following sum-of-products:

\begin{array}{l} D_{1}^{0} = \bar{T_{1}} \bar{T_{2}} \bar{T_{3}} x_{1} \lor \bar{T_{1}} T_{2} \bar{T_{3}} \bar{x_{1}}; \\ D_{2}^{0} = \bar{T_{1}} \bar{T_{2}} \bar{T_{3}} x_{1} \lor \bar{T_{1}} \bar{T_{2}} \bar{T_{3}} \bar{x_{2}} \lor \bar{T_{1}} T_{2} \bar{T_{3}} x_{1}; \\ D_{3}^{0} = \bar{T_{1}} \bar{T_{2}} \bar{T_{3}} \bar{x_{1}} \lor \bar{T_{1}} \bar{T_{2}} T_{3} x_{2}; \\ D_{4}^{0} = \bar{T_{1}} \bar{T_{2}} \bar{T_{3}} x_{1} \lor \bar{T_{1}} \bar{T_{2}} T_{3} x_{2} \lor \bar{T_{1}} T_{2} \bar{T_{3}} . \end{array}

(18)

Step 7. The tables of blocks LUTer1–LUTer2 have almost the same columns as Table 2. However, column $K(a_{m})$ is replaced by column $PC(a_{m})$, because these blocks receive the partial code of the current state as their input. To construct the table of LUTer1 (Table 3), we use rows 3, 4, 13–18, and 22–25 of Table 1, that is, the rows whose current state belongs to $A^{1}$. The partial state codes are taken from Figure 5.

To construct the table of LUTer2 (Table 4), rows 19–21 and 26–28 of Table 1 are used. The partial state codes are taken from Figure 6.
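The row selection for each block follows directly from the class to which the current state belongs. The sketch below transcribes the row numbers $h$ of Table 1 per current state and selects the rows for each block. The membership of $A_{MC}$ is inferred from the rows quoted for LUTerMC in Step 6, and the helper name is hypothetical.

```python
# Row numbers h of Table 1, grouped by the current state a_m.
ROWS = {
    "a1": [1, 2], "a2": [3, 4], "a3": [5, 6], "a4": [7, 8],
    "a5": [9, 10], "a6": [11, 12], "a7": [13, 14, 15],
    "a8": [16, 17, 18], "a9": [19, 20, 21], "a10": [22, 23],
    "a11": [24, 25], "a12": [26, 27, 28],
}

A_MC = ["a1", "a3", "a4", "a5", "a6"]      # inferred from rows 1, 2, 5-12
A_1 = ["a2", "a7", "a8", "a10", "a11"]     # class A^1 from Step 4
A_2 = ["a9", "a12"]                        # class A^2 from Step 4


def rows_for(state_class):
    """STT rows whose current state belongs to the given class."""
    return sorted(h for state in state_class for h in ROWS[state])


ROWS_MC = rows_for(A_MC)
ROWS_1 = rows_for(A_1)
ROWS_2 = rows_for(A_2)
```

The three row sets are disjoint and together cover all 28 rows of Table 1, so every transition is implemented by exactly one block of the first logic stage.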

Using Table 3 and generalized intervals from the Karnaugh map (Figure 5), the following SBF is constructed:

\begin{array}{l} D_{1}^{1} = τ_{1} \bar{τ_{2}} \bar{x_{3}} x_{4} \lor τ_{1} τ_{2} \bar{x_{3}}; \\ D_{2}^{1} = τ_{1} \bar{τ_{2}} x_{3} \lor τ_{1} τ_{2} \bar{x_{3}}; \\ D_{3}^{1} = \bar{τ_{1}} τ_{2} \lor τ_{1} \bar{τ_{2}}; \\ D_{4}^{1} = \bar{τ_{1}} τ_{2} \bar{x_{3}} \lor τ_{1} \bar{τ_{2}} x_{3} \lor τ_{1} \bar{τ_{2}} x_{4} . \end{array}

(19)

Analysis of the SOPs in (19) shows that they do not include the variable $τ_{3}$. This result follows from the state assignment used for PESs.

Using Table 4 and generalized intervals from the Karnaugh map (Figure 6), the following SBF is constructed:

\begin{array}{l} D_{1}^{2} = τ_{4} x_{6} \lor τ_{5} x_{7} \lor τ_{5} x_{8}; \\ D_{2}^{2} = τ_{4} x_{6} \lor τ_{4} x_{7}; \\ D_{3}^{2} = τ_{4} \bar{x_{6}} x_{7} \lor τ_{5} x_{7}; \\ D_{4}^{2} = τ_{5} \bar{x_{7}} x_{8} . \end{array}

(20)

SBF (19) is the basis for constructing the circuit of block LUTer1. In turn, SBF (20) is the basis for constructing the circuit of block LUTer2. Obviously, a single LUT is sufficient to implement each SOP from systems (19) and (20).
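Whether one LUT suffices depends on how many distinct variables each SOP uses. The sketch below counts the support of every function in (19) and (20). The textual encoding (`t` for τ, `!` for negation, `|` for disjunction) and the function names are illustrative assumptions; the LUT input count $S_{L}$ adopted for the example is defined earlier in the paper and is not restated here.

```python
import re

# SOPs of (19) and (20); "t" stands for tau, "!" for negation.
SBF_19 = {
    "D1": "t1 !t2 !x3 x4 | t1 t2 !x3",
    "D2": "t1 !t2 x3 | t1 t2 !x3",
    "D3": "!t1 t2 | t1 !t2",
    "D4": "!t1 t2 !x3 | t1 !t2 x3 | t1 !t2 x4",
}
SBF_20 = {
    "D1": "t4 x6 | t5 x7 | t5 x8",
    "D2": "t4 x6 | t4 x7",
    "D3": "t4 !x6 x7 | t5 x7",
    "D4": "t5 !x7 x8",
}


def support(sop):
    """Distinct variables appearing in an SOP, regardless of polarity."""
    return sorted(set(re.findall(r"[tx]\d+", sop)))


def max_support(sbf):
    """Largest number of distinct variables used by any function."""
    return max(len(support(sop)) for sop in sbf.values())


MAX_19 = max_support(SBF_19)
MAX_20 = max_support(SBF_20)
```

The widest function in (19) uses four variables and the widest function in (20) uses five, so the single-LUT claim holds for any LUT with at least five inputs.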

Step 8. The functional assembler (block LUTerT) performs disjunctions of partial IMFs and produces the final values of the IMFs $D_{r} \in D$. In the general case, this block is represented by a table with $K + 2$ columns. The first column contains the symbols $D_{r} \in D$; the other columns are marked by the numbers 0, 1, 2, …, $K$. Column 0 corresponds to the block LUTerMC. The intersections of rows and columns are marked with the signs “+” or “−”. If the partial IMF $D_{r}^{k}$ is not equal to zero, then there is “+” at the intersection of the row $D_{r}$ and the column $k$ ($k \in \{0, 1, \dots, K\}$). Otherwise, the sign “−” is used. Analysis of SBFs (18)–(20) shows that each block generates all partial IMFs. This leads to Table 5.

Table 5 is the basis for creating the final values of IMFs represented by (15). The following SBF is derived from Table 5:

\begin{array}{l} D_{1} = D_{1}^{0} \lor D_{1}^{1} \lor D_{1}^{2}; \\ D_{2} = D_{2}^{0} \lor D_{2}^{1} \lor D_{2}^{2}; \\ D_{3} = D_{3}^{0} \lor D_{3}^{1} \lor D_{3}^{2}; \\ D_{4} = D_{4}^{0} \lor D_{4}^{1} \lor D_{4}^{2} . \end{array}

(21)
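The behavior of the functional assembler described by (21) amounts to a bitwise OR over the outputs of the first-stage blocks. A minimal sketch follows; the function name and the sample bit vectors are hypothetical and serve only to illustrate the operation.

```python
def assemble(partials):
    """Bitwise OR of the partial IMF vectors produced by blocks 0..K.

    partials: list of K + 1 equal-length lists of 0/1 values, where the
    entry with index k holds D_1^k, ..., D_R^k.
    """
    return [int(any(bits)) for bits in zip(*partials)]


# Hypothetical snapshot: only LUTer1 produces nonzero partial IMFs.
PARTIALS = [
    [0, 0, 0, 0],  # LUTerMC (k = 0)
    [1, 0, 1, 1],  # LUTer1  (k = 1)
    [0, 0, 0, 0],  # LUTer2  (k = 2)
]
D = assemble(PARTIALS)
```

Each final IMF has $K + 1 = 3$ inputs in this example, so one LUT per function $D_{r}$ is enough for the assembler.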

Step 9. The table of LUTerτY shows the code-transformation rule. It has the following columns: $a_{m}$, $K(a_{m})$, $PC(a_{m})$, $τ_{m}$, and $Y_{m}$. Column $τ_{m}$ contains the symbols of partial state variables equal to 1 in the third column of the table. Column $Y_{m}$ contains the symbols of FSM outputs generated in the state written in column $a_{m}$. In the discussed case, Table 6 represents the block LUTerτY.

In this table, the codes $K(a_{m})$ are taken from the Karnaugh map (Figure 4). The partial state codes $PC(a_{m})$ are taken from the maps shown in Figure 5 (class $A^{1}$) and Figure 6 (class $A^{2}$). The following SBFs are derived from Table 6:

\begin{array}{l} τ_{1} = A_{7} \lor A_{8} \lor A_{10} \lor A_{11} = f_{1} (T_{1}, \dots, T_{4}); \\ τ_{2} = A_{2} \lor A_{10} \lor A_{11} = f_{2} (T_{1}, \dots, T_{4}); \\ τ_{4} = A_{9} = T_{1} T_{3} T_{4}; \\ τ_{5} = A_{12} = T_{1} T_{3} \bar{T_{4}} . \end{array}

(22)

\begin{array}{l} y_{1} = A_{2} \lor A_{5} \lor A_{7}; \\ y_{2} = A_{3} \lor A_{12} = \bar{T_{2}} T_{3} \bar{T_{4}}; \\ y_{3} = A_{2} \lor A_{7} \lor A_{9} = T_{1} T_{4}; \\ y_{4} = A_{4} \lor A_{8} \lor A_{9} = T_{3} T_{4}; \\ y_{5} = A_{3} \lor A_{11} = \bar{T_{1}} T_{3} \bar{T_{4}}; \\ y_{6} = A_{6} = \bar{T_{1}} \bar{T_{3}} T_{4}; \\ y_{7} = A_{5} \lor A_{10} = T_{2} \bar{T_{3}} \bar{T_{4}} . \end{array}

(23)

Analysis of SBF (22) shows that there is no need to generate the state variable $τ_{3}$. Thus, using PESs for partial state encoding eliminates one of the five partial state variables, that is, about 20% of the LUTs used to generate them.

SBF (23) is optimized using generalized intervals covering state codes (Figure 4). As can be seen, almost all outputs depend on fewer than $R_{A}$ variables. In total, SBF (23) includes 20 literals. In the general case, it includes $N R_{A}$ literals. In the discussed case, $N R_{A} = 28$. Each literal corresponds to an interconnection wire. Therefore, the adopted state-assignment method reduces the number of interconnections from 28 to 20, that is, by about 29% (a factor of 1.4).
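The literal count can be verified by listing the state variables on which each output in (23) depends. In the sketch below, the support of $y_{1}$ is assumed to be all four state variables, because (23) gives no minimized form for it; the remaining supports are read from the minimized products. The variable names are illustrative.

```python
R_A = 4  # number of state variables T1..T4

# State variables used by each output function of (23).
OUTPUT_SUPPORT = {
    "y1": ["T1", "T2", "T3", "T4"],  # assumption: no minimized form given
    "y2": ["T2", "T3", "T4"],
    "y3": ["T1", "T4"],
    "y4": ["T3", "T4"],
    "y5": ["T1", "T3", "T4"],
    "y6": ["T1", "T3", "T4"],
    "y7": ["T2", "T3", "T4"],
}

used_literals = sum(len(v) for v in OUTPUT_SUPPORT.values())
worst_case_literals = len(OUTPUT_SUPPORT) * R_A
relative_saving = (worst_case_literals - used_literals) / worst_case_literals
ratio = worst_case_literals / used_literals
```

Under these assumptions the count is 20 literals against a worst case of 28. The ratio 28/20 equals 1.4, while the relative saving (28 − 20)/28 is about 28.6%.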

Step 10. To implement an FPGA-based FSM circuit, it is necessary to use an industrial CAD package. In the case of chips produced by AMD Xilinx, this is the CAD tool Vivado [43]. We do not show the outcome for our example because Vivado operates with six-input LUTs.

6. Experimental Results and Discussion

To investigate the characteristics of FSMs obtained by using the proposed method, we use standard benchmark FSMs described in [53]. This is a library including 53 benchmark Mealy FSMs. The benchmarks are parts of real projects. These benchmarks are often used by researchers to compare the characteristics of FSMs produced by various design methods; some examples can be found in [48,54,55,56]. The state transition tables of the benchmark FSMs [53] are represented in the KISS2 format. Our current paper is devoted to Moore FSMs. Therefore, the initial files were transformed to represent benchmark Moore FSMs. The transformation is based on a known approach [57]. The characteristics of the resulting Moore FSMs are represented in Table 7. The table includes the numbers of inputs, outputs, states, transitions, and state variables ($R_{A}$), as well as the total number of inputs and state variables ($L + R_{A}$).

FPGA implementations of the considered Moore FSMs were obtained from VHDL descriptions generated on the basis of the original KISS2 files. The transformation from KISS2 specifications to VHDL models was carried out using our K2F CAD tool [50]. Synthesis and simulation were performed in the Active-HDL environment, whereas hardware implementation was completed in AMD Xilinx Vivado v2025.1 [43] (San Jose, CA, USA). The target device was the AMD Xilinx Virtex UltraScale+ FPGA xcvu29p-fsga2577-2L-e [24,30]. This device is built from six-input base LUTs, and each CLB (SLICEL) contains eight such elements. By combining them with dedicated multiplexers, the architecture supports super-LUT structures with seven, eight, or nine inputs. In particular, a seven-input super-LUT requires one F7 MUX, an eight-input super-LUT requires two F7 MUXes and one F8 MUX, whereas a nine-input super-LUT is formed using four F7 MUXes, two F8 MUXes, and one F9 MUX.
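The multiplexer counts quoted for super-LUTs follow a simple doubling pattern, which the sketch below reproduces. The number of base LUTs per super-LUT is not stated in the text; it is derived here from the fact that each additional input doubles the number of six-input LUTs being multiplexed. The function name is hypothetical.

```python
def super_lut_resources(n_inputs):
    """Base LUTs and dedicated MUXes needed for one n-input super-LUT."""
    if not 6 <= n_inputs <= 9:
        raise ValueError("supported widths are 6 to 9 inputs")
    extra = n_inputs - 6
    resources = {"LUT6": 2 ** extra}
    # Each extra input adds one multiplexer level (F7, then F8, then F9).
    for level in range(1, extra + 1):
        resources[f"F{6 + level}"] = 2 ** (extra - level)
    return resources


SEVEN = super_lut_resources(7)
EIGHT = super_lut_resources(8)
NINE = super_lut_resources(9)
```

A nine-input super-LUT therefore uses all eight base LUTs of one CLB, which is why no wider single-CLB function is available.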

6.1. Experimental Results

As an initial point of reference, we evaluated the proposed approach against three state-assignment strategies available in Vivado, namely, auto, one-hot, and gray. These are general-purpose solutions applicable to arbitrary FSM implementations. For the benchmark set considered in this study, however, the variants obtained with auto consistently exhibited the least favorable characteristics, and for this reason they are omitted from further discussion. The comparison is therefore presented in four separate tables: Table 8 for one-hot-based FSMs, Table 9 for gray-based FSMs, Table 10 for $U_{1}$-based FSMs, and Table 11 for $U_{2}$-based FSMs.

All four tables use the same set of eight performance indicators. The column “Benchmark” identifies the FSM under consideration. “LUTs” gives the number of base LUTs used in the implementation. “F7 MUX” and “F8 MUX” denote the numbers of multiplexers employed to construct 7-input and 8-input super-LUTs, respectively. “Registers” specifies the number of D flip-flops forming the state register. “Delay” represents the clock-cycle time in nanoseconds, whereas “Freq.” gives the corresponding maximum operating frequency in MHz. Finally, “Power” reports the total power consumption in watts. According to the obtained results, F9 MUX resources are not used for the benchmark circuits from [53]; consequently, no separate “F9 MUX” column is included.

As we found before [1], the decomposition should be executed only if the following condition is violated:

L + R_{A} \leq 2 S_{L} .

(24)

If (24) holds, then each function from SBFs (2) and (3) is generated by a single-LUT circuit. Such FSMs belong to a group of simple FSMs. If condition (24) is violated but condition (5) holds, then it is necessary to use some decomposition approach. Such FSMs form a group of complex FSMs. If condition (5) is violated, then only functional decomposition can be used. Such FSMs are very complex. Analysis of Table 7 shows that it includes FSMs from all three groups: 6 benchmarks are very complex, 14 benchmarks are complex, and 33 benchmarks are simple.

The following benchmarks belong to the group of complex FSMs: ex1, kirkman, planet, planet1, pma, s1, s1488, s1494, s1a, s208, s298, styr, and tma. The group of very complex FSMs includes the benchmarks s420, s510, s820, s832, sand, and scf.

Let us explain how the columns of Table 10 and Table 11 are filled. Our goal is to create circuits with the minimum number of LUTs. For simple FSMs and for very complex FSMs, the rows of Table 10 and Table 11 repeat the better of the two Vivado results: if the one-hot-based circuit of a particular benchmark has the minimum LUT count, then all columns for this benchmark are taken from Table 8; otherwise, they are taken from Table 9. For complex FSMs, the information is taken from Vivado reports. Let us point out that the values of maximum operating frequencies are calculated from the cycle times available in Vivado reports.
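The conversion from cycle time to maximum operating frequency is the usual reciprocal relation, with the factor 1000 accounting for nanoseconds and megahertz. A minimal sketch, with a hypothetical function name and an illustrative input value that is not taken from the tables:

```python
def max_frequency_mhz(cycle_time_ns):
    """Maximum operating frequency in MHz for a cycle time in ns."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns


EXAMPLE_MHZ = max_frequency_mhz(5.0)  # illustrative value only
```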

Using Table 8, Table 9, Table 10 and Table 11, we created two tables for comparing the spatial (Table 12) and temporal (Table 13) characteristics of FSM circuits based on different architectures. These tables have the same structure. Their rows are marked by the names of the benchmarks and their columns by the design methods (One-hot, Gray, $U_{1}$, and $U_{2}$). The row “Total” contains the sums of the values in the corresponding columns. The results obtained for $U_{2}$-based FSMs are taken as 100%. The row “Percentage” shows each total as a percentage of the total for the $U_{2}$-based benchmarks.
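The “Percentage” row described here is a normalization of the column totals by the total of the reference method. The sketch below shows the computation; the totals are hypothetical placeholders and are not the values of Table 12 or Table 13.

```python
def percentage_row(totals, reference="U2"):
    """Express every column total as a percentage of the reference total."""
    ref = totals[reference]
    if ref == 0:
        raise ValueError("reference total must be nonzero")
    return {name: round(100.0 * value / ref, 2) for name, value in totals.items()}


# Hypothetical totals, used only to illustrate the normalization.
HYPOTHETICAL_TOTALS = {"One-hot": 150, "Gray": 130, "U1": 110, "U2": 100}
PERCENTAGES = percentage_row(HYPOTHETICAL_TOTALS)
```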

6.2. Discussion

As follows from Table 12, the circuits of $U_{2}$-based FSMs require fewer LUTs than their counterparts used for comparison. The gain in LUTs is the following: (1) 39.60% with respect to one-hot-based FSMs; (2) 30.13% with respect to gray-based FSMs; and (3) 7.5% with respect to $U_{1}$-based FSMs. An analysis of Table 12 shows that the gain is obtained only for complex FSMs. Obviously, the gain is connected to including the core of partial IMFs based on maximum binary codes. The second factor is the use of PESs to reduce the first level of the FSM circuit.

As we can see from Table 13, the proposed approach does not deteriorate the temporal characteristics of the FSM circuits. Moreover, the proposed method leads to circuits with higher maximum operating frequencies than the other investigated methods. There are the following gains in frequency: (1) 9.92% with respect to one-hot-based FSMs; (2) 4.58% with respect to gray-based FSMs; and (3) 0.75% with respect to $U_{1}$-based FSMs. As follows from Table 13, the proposed approach gives better results for FSMs where condition (24) is violated and condition (5) holds, that is, for complex FSMs.

Obviously, the proposed method is suitable for implementing LUT-based circuits of complex FSMs. To check the influence of our approach on the characteristics of complex FSMs, we created Table 14, which includes the results of implementation only for complex FSMs. This table shows summarized values for four characteristics obtained from Vivado reports (the total numbers of LUTs, flip-flops, summarized frequency, and power consumption). For all these characteristics, we show the percentage of summarized characteristics with respect to the corresponding characteristics of the $U_{2}$-based benchmarks.

As follows from Table 14, the proposed approach allows for production of LUT-based circuits with better spatial and temporal characteristics than the other methods investigated. An analysis of Table 14 shows the following gains in the values of the LUT counts: (1) 57.9% with respect to one-hot-based FSMs; (2) 39.73% with respect to gray-based FSMs; and (3) 12.7% with respect to $U_{1}$-based FSMs. Moreover, there are the following gains in frequency: (1) 26.06% with respect to one-hot-based FSMs; (2) 16.94% with respect to gray-based FSMs; and (3) 2.84% with respect to $U_{1}$-based FSMs. The circuits based on the proposed approach consume more power, but this is connected with the fact that they are faster than their counterparts. Using Vivado, it is impossible to compare the values of power consumption for the same operating frequency.

So, three different approaches could be used for synthesis of FPGA-based Moore FSMs with twofold state assignment. To make the choice of the approach to be used, we worked out a method to choose which model is the best for some Moore FSM. The flowchart of this method is shown in Figure 7.

Let us explain this method. First of all, it is necessary to check whether twofold state assignment can be used. To do this, the validity of condition (5) should be checked for each state $a_{m} \in A$. The check starts from the state $a_{1}$ (block 1). Condition (5) is checked in block 2. If the condition is violated (the output No from block 2), then twofold state assignment is impossible, and the FSM should be synthesized using the methods of functional decomposition (block 3). This finishes the choice of the synthesis method (the transition to the block End).

If condition (5) holds for a particular state (the output Yes from block 2), then it is necessary to check the next FSM state. So, the value of $m$ (the subscript of a state) is incremented (block 4). If some states are not yet checked (the output No from block 5), then the next state is checked (the transition to block 2). If condition (5) is valid for all states (the output Yes from block 5), then it is necessary to create the set $A_{MC}$ and to find the values of both $R_{MC}$ and $L_{MC}$ (block 6).

Next, we should check whether the MC-based core can be used for a particular FSM. This is determined by condition (12), which is checked in block 7. If condition (12) holds (the output Yes from block 7), then it is possible to use the model $U_{2}$ proposed in the current paper (block 8). Otherwise (the output No from block 7), we should use the model $U_{1}$ based on twofold state assignment (block 9). The outputs of both blocks 8 and 9 are connected with the block End. So, the choice is finished.
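The selection procedure of Figure 7 can be summarized as a short function. Conditions (5) and (12) are defined earlier in the paper and are not restated in this section, so the sketch below takes them as caller-supplied predicates. The function name, the predicate signatures, and the returned labels are hypothetical.

```python
from typing import Callable, Iterable


def choose_model(
    states: Iterable[str],
    cond5: Callable[[str], bool],
    cond12: Callable[[], bool],
) -> str:
    """Return "FD", "U1", or "U2" following the flowchart of Figure 7."""
    # Blocks 1, 2, 4, 5: check condition (5) for every state in turn.
    for a_m in states:
        if not cond5(a_m):
            return "FD"  # block 3: only functional decomposition applies
    # Block 6 (building A_MC, R_MC, L_MC) is assumed to be done inside
    # cond12, which then evaluates condition (12) in block 7.
    return "U2" if cond12() else "U1"  # blocks 8 and 9


STATES = [f"a{m}" for m in range(1, 13)]
ALL_OK_U2 = choose_model(STATES, lambda a: True, lambda: True)
ALL_OK_U1 = choose_model(STATES, lambda a: True, lambda: False)
FAILS_5 = choose_model(STATES, lambda a: a != "a9", lambda: True)
```

Returning as soon as condition (5) fails mirrors the direct transition from block 3 to the block End, so condition (12) is never evaluated for an FSM that cannot use twofold state assignment.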

So, our approach allows for the improvement of both spatial and temporal characteristics of the LUT-based circuits of complex FSMs. Let us point out that the obtained values of gain are valid only for the benchmarks [58] and the device of the AMD Virtex UltraScale+ 56G PAM4 VCU129 FPGA Evaluation Kit (Virtex UltraScale+ xcvu29p-fsga2577-2L-e) [24,30]. Nevertheless, the results of the conducted experiments show that the proposed design method may improve both the spatial (the values of LUT count) and temporal (the maximum value of operating frequency) characteristics of complex FSMs.

7. Conclusions

One of the most important problems associated with FPGA-based FSM design is the problem of reducing the chip areas occupied by FSM circuits. In the case of LUT-based design, the best solution decreases the value of the LUT count. It is very important to find a solution with the minimum LUT count that does not significantly increase the value of the cycle time. To obtain a LUT-based circuit of complex FSMs, it is necessary to use various decomposition methods. One of these methods is twofold state assignment. It belongs to the methods of structural decomposition.

In this paper, we proposed a new architecture for a LUT-based Moore FSM circuit. The method is connected with the introduction of an additional core of partial input memory functions in the architecture based on twofold state assignment. This new core is based on maximum binary codes. A comparison with single-core Moore FSMs shows the following:

1. Using the second core reduces the number of blocks based on partial state codes as well as the number of LUTs required for producing these codes.
2. The proposed method does not increase the number of blocks generating the partial input memory functions.

As follows from the experimental part of the paper, using two cores of functions produces Moore FSM circuits that consume fewer LUTs compared with equivalent FSMs based on twofold state assignment. Moreover, instead of performance deterioration, the proposed method obtains circuits with higher values of maximum operating frequency. We think that, for LUT-based design, the proposed method may be used for improving the spatial and temporal characteristics of complex Moore FSMs. The main limitation of this work is that the proposed method can be applied only when the FSM states are encoded using twofold state assignment. This determines the future direction of our research: we are going to adapt the proposed method to take into account the peculiarities of other known methods of state assignment.

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T. and K.K.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T. and K.K.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T. and K.K.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CLB	Configurable logic block
DST	Direct structure table
FD	Functional decomposition
FPGA	Field-programmable gate array
FSM	Finite state machine
IMF	Input memory function
LUT	Look-up table
MBC	Maximum binary code
SBF	System of Boolean functions
SD	Structural decomposition
SOP	Sum-of-products
STT	State transition table
TSA	Twofold state assignment

References

1. Barkalov, A.; Titarenko, L.; Krzywicki, K. Logic Synthesis for FPGA-Based Mealy Finite State Machines: Structural Decomposition in Logic Design; Taylor & Francis Group; CRC Press: Boca Raton, FL, USA, 2024; p. 332.
2. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Lecture Notes in Electrical Engineering; Springer: Berlin, Germany, 2013; Volume 231, p. 172.
3. Sklyarov, V.; Skliarova, I.; Barkalov, A.; Titarenko, L. Synthesis and Optimization of FPGA-Based Systems; Lecture Notes in Electrical Engineering; Springer International Publishing: Cham, Switzerland, 2014; Volume 294, p. 432.
4. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis for FPGAs Based on Cutting of BDD. Microprocess. Microsyst. 2017, 52, 173–187.
5. Kubica, M.; Kania, D.; Kulisz, J. A Technology Mapping of FSMs Based on a Graph of Excitations and Outputs. IEEE Access 2019, 7, 16123–16131.
6. Opara, A.; Kubica, M.; Kania, D. Methods of Improving Time Efficiency of Decomposition Dedicated at FPGA Structures and Using BDD in the Process of Cyber-Physical Synthesis. IEEE Access 2019, 7, 20619–20631.
7. Kubica, M.; Kania, D. Area-oriented technology mapping for LUT-based logic blocks. Int. J. Appl. Math. Comput. Sci. 2017, 27, 207–222.
8. Brown, B.D.; Card, H.C. Stochastic neural computation. I. Computational elements. IEEE Trans. Comput. 2001, 50, 891–905.
9. Li, P.; Lilja, D.J.; Qian, W.; Riedel, M.D.; Bazargan, K. Logical Computation on Stochastic Bit Streams with Linear Finite-State Machines. IEEE Trans. Comput. 2014, 63, 1474–1486.
10. Ardakani, A.; Leduc-Primeau, F.; Onizawa, N.; Hanyu, T.; Gross, W.J. VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 2688–2699.
11. Barkalov, O.; Titarenko, L.; Mazurkiewicz, M. Foundations of Embedded Systems; Studies in Systems, Decision and Control; Springer International Publishing: Cham, Switzerland, 2019; Volume 195, p. 167.
12. Esper, K.; Wildermann, S.; Teich, J. Automatic Synthesis of FSMs for Enforcing Non-functional Requirements on MPSoCs Using Multi-objective Evolutionary Algorithms. ACM Trans. Des. Autom. Electron. Syst. 2023, 28, 98.
13. Ganewattha, C.; Khan, Z.; Lehtomaki, J. Hardware-accelerated Real-time Drift-awareness for Robust Deep Learning on Wireless RF Data. ACM Trans. Reconfig. Technol. Syst. 2023, 16, 19.
14. Jiang, P.; Yang, X.; Gagan, A. Combining SIMD and Many/Multi-core Parallelism for Finite-state machines with Enumerative Speculation. ACM Trans. Parallel Comput. 2020, 7, 15.
15. Hazzazi, M.; Budaraju, R.; Bassfar, Z.; Albakri, A.; Mishra, S. A Finite State Machine-Based Improved Cryptographic Technique. Mathematics 2023, 11, 2225.
16. Garcia-Vargas, I.; Senhadji-Navarro, R. A New Approach for Implementing Finite State Machines with Input Multiplexing. Electronics 2023, 12, 3763.
17. Trifan, M.; Ionescu, B.; Ionescu, D. A Combined Finite State Machine and PlantUML Approach to Machine Learning Applications. In Proceedings of the 2023 IEEE 17th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 23–26 May 2023; pp. 631–636.
18. Wu, X.; Wang, L.; Li, A.; Wu, G.; Hu, Z.; Fei, F.Y.; Mori, T. High Conversion Efficiency in Intrinsic High Power-Density Mg2Sn-GeTe Thermoelectric Generator. Adv. Sci. 2025, 12, e06997.
19. Baranov, S. Logic and System Design of Digital Systems; TUT Press: Tallinn, Estonia, 2008; p. 276.
20. Minns, P.; Elliott, I. FSM-Based Digital Design Using Verilog HDL; John Wiley and Sons: Chichester, UK, 2008; p. 391.
21. Rawski, M.; Łuba, T.; Jachna, Z.; Tomaszewicz, P. Design of Embedded Control Systems; Chapter The Influence of Functional Decomposition on Modern Digital Design Process; Springer: Boston, MA, USA, 2005; pp. 193–203.
22. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
23. Boutros, A.; Betz, V. FPGA Architecture: Principles and Progression. IEEE Circuits Syst. Mag. 2021, 21, 4–29.
24. AMD Virtex UltraScale+ 56G PAM4 VCU129 FPGA Evaluation Kit. Available online: https://www.amd.com/en/products/adaptive-socs-and-fpgas/evaluation-boards/vcu129.html (accessed on 23 March 2026).
25. Kuon, I.; Tessier, R.; Rose, J. FPGA Architecture: Survey and Challenges. Found. Trends Electron. Des. Autom. 2008, 2, 135–253.
26. Trimberger, S. Three ages of FPGA: A Retrospective on the First Thirty Years of FPGA Technology. Proc. IEEE 2015, 103, 318–331.
27. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011; p. 718.
28. Altera. Intel Corporation (Since 2016). Available online: https://www.intel.com/content/www/us/en/products/programmable.html (accessed on 23 March 2026).
29. AMD Xilinx. Available online: https://www.amd.com/en.html (accessed on 23 March 2026).
30. UltraScale Architecture and Product Data Sheet: Overview (DS890). Available online: https://docs.amd.com/v/u/en-US/ds890-ultrascale-overview (accessed on 23 March 2026).
31. Lu, S.; Shang, L.; Qu, Q.; Jung, S.; Liang, Q.; Pan, C. An Efficient Multi-Output LUT Mapping Technique for Field-Programmable Gate Arrays. Electronics 2025, 14, 1782.
32. Wang, F.; Zhu, L.; Zhang, J.; Li, L.; Zhang, Y.; Luo, G. Dual-output LUT merging during FPGA technology mapping. In Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA, 2–5 November 2020; pp. 1–9.
33. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources; Xilinx All Programmable, 2014; pp. 1–32. Available online: https://docs.amd.com/api/khub/documents/9pOn~3NV8ApAbwglqp6MJQ/content (accessed on 23 March 2026).
34. Liu, Y.; Peng, Y.; Wang, B.; Yao, S.; Liu, Z. Review on cyber-physical systems. IEEE/CAA J. Autom. Sin. 2017, 4, 27–40.
35. Barkalov, O.; Titarenko, L.; Mielcarek, K. Improving characteristics of LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759.
36. Kilts, S. Advanced FPGA Design: Architecture, Implementation, and Optimization; Wiley-IEEE Press: Hoboken, NJ, USA, 2007; p. 312.
37. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA Performance with a S44 LUT Structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays; FPGA ’18; ACM: New York, NY, USA, 2018; pp. 61–66.
38. Murray, K.E.; Petelin, O.; Zhong, S.; Wang, J.M.; Eldafrawy, M.; Legault, J.-P.; Sha, E.; Graham, A.G.; Wu, J.; Walker, M.J.P.; et al. VTR 8: High-performance CAD and customizable FPGA architecture modelling. ACM Trans. Reconfig. Technol. Syst. 2020, 13, 9.
39. Atmel. Microchip Technology (Since 2016). Available online: https://www.microchip.com (accessed on 23 March 2026).
40. De Micheli, G. Synthesis and Optimization of Digital Circuits; McGraw–Hill: New York, NY, USA, 1994; p. 578.
41. Sentovich, E.; Singh, K.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.; Brayton, R.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; Technical Report; University of California: Berkeley, CA, USA, 1992.
42. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification, Proceedings of the 22nd International Conference; Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40.
43. Xilinx. Vivado Design Suite User Guide: Synthesis; UG901 (v2025.1). 2026. Available online: https://docs.amd.com/r/en-US/ug901-vivado-synthesis/Introduction (accessed on 23 March 2026).
44. Quartus Prime Design Software. 2026. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 23 March 2026).
45. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Mealy FSMs with Twofold State Assignment. Electronics 2021, 10, 901.
46. Barkalov, A.; Titarenko, L.; Chmielewski, S. Improving characteristics of LUT-based Moore FSMs. IEEE Access 2020, 8, 155306–155318.
47. Kubica, M.; Opara, A.; Kania, D. Technology Mapping for LUT-Based FPGA; Springer: Berlin/Heidelberg, Germany, 2021.
48. Opara, A.; Kubica, M.; Kania, D. Decomposition Approaches for Power Reduction. IEEE Access 2023, 11, 29417–29429.
49. Testa, E.; Amaru, L.; Soeken, M.; Mishchenko, A.; Vuillod, P.; Luo, J.; Casares, C.; Gaillardon, P.; Micheli, G.D. Scalable Boolean Methods in a Modern Synthesis Flow. In Proceedings of the 2019 Design, Automation Test in Europe Conference Exhibition (DATE), Florence, Italy, 25–29 March 2019; pp. 1643–1648.
50. Barkalov, A.; Titarenko, L.; Mielcarek, K.; Chmielewski, S. Logic Synthesis for FPGA-Based Control Units—Structural Decomposition in Logic Design; Lecture Notes in Electrical Engineering; Springer: Berlin, Germany, 2020; Volume 636.
51. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174.
52. Achasova, S. Synthesis Algorithms for Automata with PLAs; M-Soviet Radio: Moscow, Russia, 1987; p. 135. (In Russian)
53. Barkalov, A.; Titarenko, L.; Krzywicki, K. Improving Characteristics of LUT-Based Sequential Blocks for Cyber-Physical Systems. Energies 2022, 15, 2636.
54. Opara, A.; Kubica, M. Technology mapping of multi-output functions leading to the reduction of dynamic power consumption in FPGAs. Int. J. Appl. Math. Comput. Sci. 2023, 33, 267–284.
55. Salauyou, V. Area and Performance Estimates of Finite State Machines in Reconfigurable Systems. Appl. Sci. 2024, 14, 11833.
56. Salauyou, V. Fault Detection of Moore Finite State Machines by Structural Models. In Proceedings of the Computer Information Systems and Industrial Management: 22nd International Conference, CISIM 2023, Tokyo, Japan, 22–24 September 2023; pp. 394–409.
57. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Boston, MA, USA, 1994; p. 312.
58. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.

Figure 1. Structural diagram of Moore FSM $U_{1}$.

Figure 2. LUT-based architecture of FSM $U_{2}$.

Figure 3. State assignment for set $A_{MC}$: (a) initial step, (b) second step, (c) final step.

Figure 4. Outcome of state assignment for Moore FSM E1.

Figure 5. Partial state codes for LUTer1.

Figure 6. Partial state codes for LUTer2.

Figure 7. Choice of a model used for synthesis.

Table 1. State transition table of Moore FSM E1.

$a_{m}$	$a_{s}$	$X_{h}$	h
$a_{1}$	$a_{2}$	$x_{1}$	1
	$a_{3}$	$\bar{x_{1}}$	2
$a_{2} (y_{1} y_{2})$	$a_{3}$	$x_{3}$	3
	$a_{4}$	$\bar{x_{3}}$	4
$a_{3} (y_{2} y_{5})$	$a_{4}$	$x_{2}$	5
	$a_{5}$	$\bar{x_{2}}$	6
$a_{4} (y_{4})$	$a_{4}$	$x_{2}$	7
	$a_{5}$	$\bar{x_{2}}$	8
$a_{5} (y_{1} y_{7})$	$a_{6}$	$x_{1}$	9
	$a_{7}$	$\bar{x_{1}}$	10
$a_{6} (y_{6})$	$a_{6}$	$x_{1}$	11
	$a_{7}$	$\bar{x_{1}}$	12
$a_{7} (y_{1} y_{3})$	$a_{8}$	$x_{3}$	13
	$a_{9}$	$\bar{x_{3}} x_{4}$	14
	$a_{3}$	$\bar{x_{3}} \bar{x_{4}}$	15
$a_{8} (y_{4})$	$a_{8}$	$x_{3}$	16
	$a_{9}$	$\bar{x_{3}} x_{4}$	17
	$a_{3}$	$\bar{x_{3}} \bar{x_{4}}$	18
$a_{9} (y_{3} y_{4})$	$a_{10}$	$x_{6}$	19
	$a_{11}$	$\bar{x_{6}} x_{7}$	20
	$a_{1}$	$\bar{x_{6}} \bar{x_{7}}$	21
$a_{10} (y_{7})$	$a_{1}$	$x_{3}$	22
	$a_{12}$	$\bar{x_{3}}$	23
$a_{11} (y_{5})$	$a_{1}$	$x_{3}$	24
	$a_{12}$	$\bar{x_{3}}$	25
$a_{12} (y_{2})$	$a_{12}$	$x_{7}$	26
	$a_{7}$	$\bar{x_{7}} x_{8}$	27
	$a_{1}$	$\bar{x_{7}} \bar{x_{8}}$	28
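
One way to read Table 1 is to group the states whose outgoing transitions coincide, that is, the same next state under the same input condition. The abstract refers to such groups as pseudoequivalent state classes. The Python sketch below performs this grouping on a hand transcription of Table 1. The dictionary encoding, the `~` notation for negation, and the helper name `pseudoequivalent_classes` are assumptions made for this illustration, not the authors' tooling.

```python
from collections import defaultdict

# Transitions of Moore FSM E1, transcribed from Table 1.
# Each state maps to a set of (input condition, next state) pairs;
# "~" denotes a negated input variable.
TRANSITIONS = {
    "a1": {("x1", "a2"), ("~x1", "a3")},
    "a2": {("x3", "a3"), ("~x3", "a4")},
    "a3": {("x2", "a4"), ("~x2", "a5")},
    "a4": {("x2", "a4"), ("~x2", "a5")},
    "a5": {("x1", "a6"), ("~x1", "a7")},
    "a6": {("x1", "a6"), ("~x1", "a7")},
    "a7": {("x3", "a8"), ("~x3 x4", "a9"), ("~x3 ~x4", "a3")},
    "a8": {("x3", "a8"), ("~x3 x4", "a9"), ("~x3 ~x4", "a3")},
    "a9": {("x6", "a10"), ("~x6 x7", "a11"), ("~x6 ~x7", "a1")},
    "a10": {("x3", "a1"), ("~x3", "a12")},
    "a11": {("x3", "a1"), ("~x3", "a12")},
    "a12": {("x7", "a12"), ("~x7 x8", "a7"), ("~x7 ~x8", "a1")},
}


def pseudoequivalent_classes(transitions):
    """Group states that share an identical set of outgoing transitions."""
    groups = defaultdict(list)
    for state, arcs in transitions.items():
        # A frozenset is hashable, so it can serve as the grouping key.
        groups[frozenset(arcs)].append(state)
    return list(groups.values())


CLASSES = pseudoequivalent_classes(TRANSITIONS)
for members in CLASSES:
    print(members)
```

Under this reading, the twelve states of E1 fall into eight groups, four of which contain two states each.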

Table 2. Table of LUTerMC.

$a_{m}$	$K (a_{m})$	$a_{s}$	$K (a_{s})$	$X_{h}$	$D_{h}$	h
$a_{1}$	0000	$a_{2}$	1101	$x_{1}$	$D_{1} D_{2} D_{4}$	1
		$a_{3}$	0010	$\bar{x_{1}}$	$D_{3}$	2
$a_{3}$	0010	$a_{4}$	0011	$x_{2}$	$D_{3} D_{4}$	3
		$a_{5}$	0100	$\bar{x_{2}}$	$D_{2}$	4
$a_{4}$	0011	$a_{4}$	0011	$x_{2}$	$D_{3} D_{4}$	5
		$a_{5}$	0100	$\bar{x_{2}}$	$D_{2}$	6
$a_{5}$	0100	$a_{6}$	0101	$x_{1}$	$D_{2} D_{4}$	7
		$a_{7}$	1001	$\bar{x_{1}}$	$D_{1} D_{4}$	8
$a_{6}$	0101	$a_{6}$	0101	$x_{1}$	$D_{2} D_{4}$	9
		$a_{7}$	1001	$\bar{x_{1}}$	$D_{1} D_{4}$	10
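
The $D_{h}$ column of Table 2 can be read off the code of the next state: with D flip-flops, $D_{r}$ is listed exactly when bit r of $K(a_{s})$ equals 1. The short Python sketch below illustrates this rule. The function name `excitation_functions` and the string representation of codes are assumptions made for the example.

```python
def excitation_functions(next_state_code):
    """Return the D flip-flop inputs that must be 1 to load the given code.

    Bits are numbered from 1, left to right, matching D1..D4 in Table 2.
    """
    return [
        f"D{position}"
        for position, bit in enumerate(next_state_code, start=1)
        if bit == "1"
    ]


# Next-state codes taken from Table 2.
print(excitation_functions("1101"))  # K(a2)
print(excitation_functions("0010"))  # K(a3)
print(excitation_functions("1001"))  # K(a7)
```

For $K(a_{2}) = 1101$ the sketch yields D1, D2 and D4, which matches row 1 of Table 2.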

Table 3. Table of LUTer1.

$a_{m}$	$PC (a_{m})$	$a_{s}$	$K (a_{s})$	$X_{h}$	$D_{h}$	h
$a_{2}$	010	$a_{3}$	0010	$x_{3}$	$D_{3}$	1
		$a_{4}$	0011	$\bar{x_{3}}$	$D_{3} D_{4}$	2
$a_{7}$	100	$a_{8}$	0111	$x_{3}$	$D_{2} D_{3} D_{4}$	3
		$a_{9}$	1011	$\bar{x_{3}} x_{4}$	$D_{1} D_{3} D_{4}$	4
		$a_{3}$	0010	$\bar{x_{3}} \bar{x_{4}}$	$D_{3}$	5
$a_{8}$	101	$a_{8}$	0111	$x_{3}$	$D_{2} D_{3} D_{4}$	6
		$a_{9}$	1011	$\bar{x_{3}} x_{4}$	$D_{1} D_{3} D_{4}$	7
		$a_{3}$	0010	$\bar{x_{3}} \bar{x_{4}}$	$D_{3}$	8
$a_{10}$	110	$a_{1}$	0000	$x_{3}$	–	9
		$a_{12}$	1010	$\bar{x_{3}}$	$D_{1} D_{3}$	10
$a_{11}$	111	$a_{1}$	0000	$x_{3}$	–	11
		$a_{12}$	1010	$\bar{x_{3}}$	$D_{1} D_{3}$	12

Table 4. Table of LUTer2.

$a_{m}$	$PC (a_{m})$	$a_{s}$	$K (a_{s})$	$X_{h}$	$D_{h}$	h
$a_{9}$	1*	$a_{10}$	1100	$x_{6}$	$D_{1} D_{2}$	1
		$a_{11}$	0110	$\bar{x_{6}} x_{7}$	$D_{2} D_{3}$	2
		$a_{1}$	0000	$\bar{x_{6}} \bar{x_{7}}$	–	3
$a_{12}$	*1	$a_{12}$	1010	$x_{7}$	$D_{1} D_{3}$	4
		$a_{7}$	1001	$\bar{x_{7}} x_{8}$	$D_{1} D_{4}$	5
		$a_{1}$	0000	$\bar{x_{7}} \bar{x_{8}}$	–	6

Table 5. Table of LUTerT.

$D_{M}$	0	1	2
$D_{1}$	+	+	+
$D_{2}$	+	+	+
$D_{3}$	+	+	+
$D_{4}$	+	+	+

Table 6. Table of LUTerτY.

$a_{m}$	$K (a_{m})$	$PC (a_{m})$	$τ_{m}$	$Y_{m}$
$a_{1}$	0000	–	–	–
$a_{2}$	1101	010	$τ_{2}$	$y_{1} y_{2}$
$a_{3}$	0010	–	–	$y_{2} y_{5}$
$a_{4}$	0011	–	–	$y_{4}$
$a_{5}$	0100	–	–	$y_{1} y_{7}$
$a_{6}$	0101	–	–	$y_{6}$
$a_{7}$	1001	100	$τ_{1}$	$y_{1} y_{3}$
$a_{8}$	0111	101	$τ_{1} τ_{3}$	$y_{4}$
$a_{9}$	1011	1*	$τ_{4}$	$y_{3} y_{4}$
$a_{10}$	1100	110	$τ_{1} τ_{2}$	$y_{7}$
$a_{11}$	0110	111	$τ_{1} τ_{2} τ_{3}$	$y_{5}$
$a_{12}$	1010	*1	$τ_{5}$	$y_{2}$
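
The $τ_{m}$ column of Table 6 follows the same pattern as the excitation functions: a variable is listed for every 1 in the partial code $PC(a_{m})$, and a "*" position contributes nothing. Judging from the table, the three-bit codes of LUTer1 map to $τ_{1}$ to $τ_{3}$ and the two-bit codes of LUTer2 map to $τ_{4}$ and $τ_{5}$. The Python sketch below reproduces this reading. The helper name `tau_variables` and the `first_index` parameter are assumptions introduced for illustration.

```python
def tau_variables(partial_code, first_index):
    """List the partial state variables equal to 1 for a partial code.

    partial_code: string over {"0", "1", "*"}; "*" marks a don't-care bit.
    first_index: index of the variable tied to the leftmost bit
                 (1 for LUTer1 codes, 4 for LUTer2 codes, as read from Table 6).
    """
    return [
        f"tau{first_index + offset}"
        for offset, bit in enumerate(partial_code)
        if bit == "1"
    ]


# Partial codes taken from Table 6.
print(tau_variables("101", 1))  # a8, encoded for LUTer1
print(tau_variables("1*", 4))   # a9, encoded for LUTer2
print(tau_variables("*1", 4))   # a12, encoded for LUTer2
```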

Table 7. Characteristics of benchmark Moore FSMs.

Benchmark	L	N	M	H	$R_{A}$	$L + R_{A}$
bbara	4	2	12	72	4	8
bbsse	7	7	23	104	5	12
bbtas	2	2	9	36	4	6
beecount	3	4	10	40	4	7
cse	7	7	32	183	5	12
dk14	3	5	26	208	5	8
dk15	3	5	17	136	5	8
dk16	2	3	75	300	7	9
dk17	2	3	16	64	4	6
dk27	1	2	10	20	4	5
dk512	1	3	23	48	5	6
donfile	2	1	25	100	5	7
ex1	9	19	80	637	7	16
ex2	2	2	25	92	5	7
ex3	2	2	14	44	4	6
ex4	6	9	17	28	5	11
ex5	2	2	16	52	4	6
ex6	5	8	13	61	4	9
ex7	2	2	17	56	5	7
keyb	7	2	22	193	5	12
kirkman	12	6	139	3817	8	20
lion	2	1	5	14	3	5
lion9	2	1	11	31	4	6
mark1	5	16	21	29	5	10
mc	3	5	8	20	3	6
modulo12	1	1	13	26	4	5
opus	5	6	10	22	4	9
planet	7	19	103	248	7	14
planet1	7	19	103	248	7	14
pma	8	8	49	132	6	14
s1	8	6	20	107	5	13
s1488	8	19	168	912	8	16
s1494	8	19	168	1030	8	16
s1a	8	6	21	115	5	13
s208	11	2	36	309	6	17
s27	4	1	6	34	3	7
s298	3	6	332	1669	9	12
s386	7	7	23	127	5	12
s420	19	2	36	282	6	25
s510	19	7	73	133	7	26
s8	4	1	6	24	3	7
s820	18	19	70	613	7	25
s832	18	19	70	707	7	25
sand	11	9	88	654	7	18
scf	27	56	138	190	8	35
shiftreg	1	1	16	32	4	5
sse	7	7	23	104	5	12
styr	9	10	57	366	6	15
tav	4	4	27	322	5	9
tbk	6	3	60	2942	6	12
tma	7	6	38	72	6	13
train11	2	1	14	34	4	6
train4	2	1	6	21	3	5
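
The $R_{A}$ values in Table 7 agree with the minimum number of bits needed to encode M states, $R_{A} = \lceil \log_{2} M \rceil$, and the last column adds the number of inputs L. The Python sketch below recomputes both columns for a few rows. The function name `min_state_bits` is an assumption, and integer arithmetic is used in place of a floating-point logarithm to avoid rounding issues.

```python
def min_state_bits(num_states):
    """Smallest R such that 2**R >= num_states (at least one bit)."""
    return max(1, (num_states - 1).bit_length())


# (L, M) pairs copied from Table 7 for a few benchmarks.
ROWS = {
    "bbara": (4, 12),
    "mc": (3, 8),
    "kirkman": (12, 139),
    "s298": (3, 332),
}

# benchmark -> (R_A, L + R_A)
SUMMARY = {
    name: (min_state_bits(m), inputs + min_state_bits(m))
    for name, (inputs, m) in ROWS.items()
}

for name, (r_a, total) in SUMMARY.items():
    print(f"{name}: R_A = {r_a}, L + R_A = {total}")
```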

Table 8. Experimental results for one-hot-based FSMs.

Benchmark	LUTs	F7 MUX	Registers	Delay (ns)	Freq. (MHz)	Power (W)
bbara	17	0	12	2.494	400.96	2.723
bbsse	27	0	23	2.560	390.63	3.712
bbtas	9	0	9	2.531	395.10	2.805
beecount	18	0	10	3.554	281.37	3.705
cse	55	0	32	2.774	360.49	3.568
dk14	31	0	26	3.355	298.06	5.732
dk15	20	0	17	2.788	358.68	5.052
dk16	84	0	75	3.169	315.56	6.961
dk17	18	0	16	2.927	341.65	5.409
dk27	7	0	10	3.051	327.76	4.116
dk512	15	0	21	3.017	331.46	4.752
donfile	31	0	25	2.870	348.43	3.222
ex1	143	0	80	4.121	242.66	4.490
ex2	11	1	14	2.562	390.32	2.690
ex3	12	1	13	2.542	393.39	2.667
ex4	18	0	17	2.753	363.24	3.378
ex5	12	0	14	2.599	384.76	2.798
ex6	23	0	13	3.074	325.31	7.555
ex7	6	0	7	2.666	375.09	2.584
keyb	36	0	22	3.669	272.55	3.364
kirkman	267	0	139	4.777	209.34	5.048
lion	5	0	5	2.640	378.79	2.784
lion9	12	0	11	3.206	311.92	2.927
mark1	19	0	20	3.743	267.17	3.040
mc	10	0	8	2.736	365.50	3.735
modulo12	7	0	13	2.283	438.02	2.720
opus	16	0	10	3.463	288.77	3.900
planet	147	0	103	4.037	247.71	13.386
planet1	147	0	103	4.037	247.71	13.386
pma	83	0	49	3.325	300.75	5.049
s1	42	0	20	3.558	281.06	6.741
s1488	232	0	168	5.113	195.58	7.025
s1494	232	0	168	5.847	171.03	7.043
s1a	36	0	21	3.051	327.76	2.868
s208	61	0	36	2.873	348.07	3.343
s27	10	0	6	2.446	408.83	3.256
s298	1143	14	341	9.276	107.81	11.008
s386	29	0	23	2.684	372.58	3.661
s420	62	0	36	3.920	255.10	3.331
s510	64	0	73	4.597	217.53	5.757
s8	11	0	6	2.446	408.83	2.787
s820	103	0	70	3.665	272.85	4.908
s832	104	0	70	3.531	283.21	4.918
sand	136	0	88	4.387	227.95	7.000
scf	111	0	135	6.787	147.34	4.952
shiftreg	11	0	16	2.565	389.86	3.397
sse	27	0	23	2.560	390.63	3.712
styr	91	0	57	3.629	275.56	4.056
tav	31	0	27	3.569	280.19	3.722
tbk	152	0	60	3.445	290.28	3.618
tma	59	0	38	4.009	249.44	3.482
train11	14	0	14	3.196	312.89	2.879
train4	5	0	6	2.671	374.39	2.828
Total	4072	16	2419	183.148	16,541.89	243.500
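
The Delay and Freq. columns of Table 8 appear to be related by the usual conversion between a period in nanoseconds and a frequency in megahertz, f = 1000 / t. The Python sketch below checks this on three rows. The function name `max_frequency_mhz` is an assumption made for the example.

```python
def max_frequency_mhz(delay_ns):
    """Convert a critical-path delay in nanoseconds to a frequency in MHz."""
    return 1000.0 / delay_ns


# Delay values (ns) copied from Table 8.
DELAYS_NS = {"bbara": 2.494, "ex1": 4.121, "kirkman": 4.777}

FREQUENCIES = {
    name: round(max_frequency_mhz(delay), 2)
    for name, delay in DELAYS_NS.items()
}

for name, freq in FREQUENCIES.items():
    print(f"{name}: {freq} MHz")
```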

Table 9. Experimental results for Gray-based FSMs.

Benchmark	LUTs	F7 MUX	F8 MUX	Registers	Delay (ns)	Freq. (MHz)	Power (W)
bbara	18	0	0	4	2.716	368.19	3.300
bbsse	36	3	0	5	2.636	379.36	7.750
bbtas	6	0	0	4	2.583	387.15	3.600
beecount	12	4	0	4	2.855	350.26	5.899
cse	55	10	0	5	3.059	326.90	5.155
dk14	25	10	5	5	2.854	350.39	9.755
dk15	20	0	0	5	3.453	289.60	8.376
dk16	70	22	0	7	3.758	266.10	7.730
dk17	7	0	0	4	2.537	394.17	6.233
dk27	4	0	0	4	2.725	366.97	3.897
dk512	8	0	0	5	2.612	382.85	4.473
donfile	11	5	0	5	3.077	324.99	3.579
ex1	177	27	0	7	3.526	283.61	18.548
ex2	5	0	0	4	2.720	367.65	2.953
ex3	6	0	0	4	2.517	397.30	4.521
ex4	22	0	0	5	2.687	372.16	5.824
ex5	5	0	0	4	2.823	354.23	4.232
ex6	19	2	0	4	2.664	375.38	12.036
ex7	5	0	0	4	2.823	354.23	3.593
keyb	42	5	1	5	3.221	310.46	4.324
kirkman	377	53	9	8	5.990	166.94	14.522
lion	3	0	0	3	2.638	379.08	2.754
lion9	5	0	0	4	2.549	392.31	2.775
mark1	29	0	0	5	2.845	351.49	8.345
mc	6	0	0	3	2.659	376.08	5.522
modulo12	3	0	0	4	2.362	423.37	2.630
opus	18	0	0	4	2.863	349.28	4.560
planet	129	33	0	7	3.411	293.17	20.657
planet1	129	33	0	7	3.411	293.17	20.657
pma	73	6	0	6	2.874	347.95	12.260
s1	53	7	0	5	3.089	323.73	10.869
s1488	291	33	2	10	5.280	189.39	26.114
s1494	293	31	1	10	5.670	176.37	26.202
s1a	49	9	1	5	3.257	307.03	5.447
s208	45	5	0	6	2.791	358.29	5.388
s27	7	3	0	3	2.456	407.17	3.696
s298	607	198	79	39	4.373	228.68	21.491
s386	36	4	0	5	2.642	378.50	7.201
s420	60	12	3	6	2.972	336.47	4.179
s510	51	4	1	7	4.080	245.10	8.502
s8	7	3	0	3	2.453	407.66	3.594
s820	151	27	7	7	3.662	273.07	10.973
s832	145	24	5	7	4.168	239.92	10.298
sand	125	19	0	7	3.278	305.06	11.072
scf	194	19	1	8	8.929	111.99	25.147
shiftreg	3	0	0	4	2.362	423.37	3.989
sse	36	3	0	5	2.636	379.36	7.750
styr	97	14	1	6	3.026	330.47	10.220
tav	25	5	0	5	3.094	323.21	7.392
tbk	136	14	0	6	3.812	262.33	6.286
tma	53	7	0	6	2.772	360.75	9.646
train11	5	0	0	4	2.584	387.00	2.748
train4	2	0	0	3	2.441	409.67	2.923
Total	3796	654	116	312	171.275	17,539.40	451.587

Table 10. Experimental results for $U_{1}$-based FSMs.

Benchmark	LUTs	F7 MUX	F8 MUX	Registers	Delay (ns)	Freq. (MHz)	Power (W)
bbara	17	0	0	4	2.716	368.19	2.723
bbsse	27	0	0	5	2.636	379.36	3.712
bbtas	6	0	0	4	2.583	387.15	3.600
beecount	12	4	0	4	2.855	350.26	5.899
cse	55	10	0	5	3.059	326.90	5.155
dk14	25	10	5	5	2.854	350.39	9.755
dk15	20	0	0	5	3.453	289.60	8.376
dk16	70	22	0	7	3.758	266.10	7.730
dk17	7	0	0	4	2.537	394.17	6.233
dk27	4	0	0	4	2.725	366.97	3.897
dk512	8	0	0	5	2.612	382.85	4.473
donfile	11	5	0	5	3.077	324.99	3.579
ex1	130	0	0	7	3.338	299.58	4.939
ex2	5	0	0	4	2.720	367.65	2.953
ex3	6	0	0	4	2.517	397.30	4.521
ex4	22	0	0	5	2.687	372.16	3.378
ex5	5	0	0	4	2.823	354.23	4.232
ex6	19	2	0	4	2.664	375.38	12.036
ex7	5	0	0	4	2.823	354.23	3.593
keyb	36	0	0	5	3.221	310.46	3.090
kirkman	240	4	0	8	3.822	261.64	5.553
lion	3	0	0	3	2.638	379.08	2.754
lion9	5	0	0	4	2.549	392.31	2.775
mark1	19	0	0	5	2.845	351.49	3.040
mc	6	0	0	3	2.659	376.08	5.522
modulo12	3	0	0	4	2.362	423.37	2.630
opus	18	0	0	4	2.863	349.28	3.900
planet	131	7	0	9	3.310	302.11	14.725
planet1	131	7	0	9	3.310	302.11	14.725
pma	75	0	0	6	2.727	366.70	5.554
s1	38	2	0	5	2.882	346.98	7.415
s1488	211	6	0	8	4.090	244.50	7.727
s1494	209	8	0	8	4.678	213.77	7.677
s1a	34	0	0	5	2.471	404.69	3.140
s208	41	3	0	6	2.289	436.87	3.677
s27	7	3	0	3	2.456	407.17	3.696
s298	546	12	12	9	3.585	278.94	12.021
s386	29	0	0	5	2.642	378.50	7.201
s420	60	12	0	6	2.972	336.47	4.179
s510	51	4	0	7	4.080	245.10	8.502
s8	7	3	0	3	2.453	407.66	3.594
s820	103	27	0	7	3.662	273.07	4.908
s832	104	24	0	7	4.168	239.92	4.918
sand	136	19	0	7	3.278	305.06	7.000
scf	111	19	0	8	8.929	111.99	4.952
shiftreg	3	0	0	4	2.362	423.37	3.989
sse	25	0	0	5	2.074	482.16	4.076
styr	83	0	0	6	2.903	344.47	4.461
tav	25	5	0	5	3.094	323.21	7.392
tbk	136	14	0	6	3.812	262.33	6.286
tma	49	3	0	6	2.273	439.95	3.830
train11	5	0	0	4	2.584	387.00	2.748
train4	2	0	0	3	2.441	409.67	2.923
Total	3136	235	17	282	162.921	18,224.98	291.364

Table 11. Experimental results for $U_{2}$-based FSMs.

Benchmark	LUTs	F7 MUX	F8 MUX	Registers	Delay (ns)	Freq. (MHz)	Power (W)
bbara	17	0	0	4	2.716	368.19	2.723
bbsse	27	0	0	5	2.636	379.36	3.712
bbtas	6	0	0	4	2.583	387.15	3.600
beecount	12	4	0	4	2.855	350.26	5.899
cse	55	10	0	5	3.059	326.90	5.155
dk14	25	10	5	5	2.854	350.39	9.755
dk15	20	0	0	5	3.453	289.60	8.376
dk16	70	22	0	7	3.758	266.10	7.730
dk17	7	0	0	4	2.537	394.17	6.233
dk27	4	0	0	4	2.725	366.97	3.897
dk512	8	0	0	5	2.612	382.85	4.473
donfile	11	5	0	5	3.077	324.99	3.579
ex1	113	0	0	7	3.271	305.72	5.235
ex2	5	0	0	4	2.720	367.65	2.953
ex3	6	0	0	4	2.517	397.30	4.521
ex4	22	0	0	5	2.687	372.16	3.378
ex5	5	0	0	4	2.823	354.23	4.232
ex6	19	2	0	4	2.664	375.38	12.036
ex7	5	0	0	4	2.823	354.23	3.593
keyb	36	0	0	5	3.221	310.46	3.090
kirkman	211	3	0	8	3.707	269.76	5.942
lion	3	0	0	3	2.638	379.08	2.754
lion9	5	0	0	4	2.549	392.31	2.775
mark1	19	0	0	5	2.845	351.49	3.040
mc	6	0	0	3	2.659	376.08	5.522
modulo12	3	0	0	4	2.362	423.37	2.630
opus	18	0	0	4	2.863	349.28	3.900
planet	117	5	0	9	3.277	305.16	15.314
planet1	117	5	0	9	3.277	305.16	15.314
pma	68	0	0	6	2.672	374.25	5.932
s1	34	2	0	5	2.824	354.11	7.859
s1488	186	4	0	8	3.967	252.08	8.036
s1494	187	6	0	8	4.584	218.15	7.907
s1a	31	0	0	5	2.347	426.08	3.359
s208	37	3	0	6	2.243	445.83	3.751
s27	7	3	0	3	2.456	407.17	3.696
s298	481	10	9	9	3.442	290.53	12.838
s386	29	0	0	5	2.642	378.50	7.201
s420	60	12	0	6	2.972	336.47	4.179
s510	51	4	0	7	4.080	245.10	8.502
s8	7	3	0	3	2.453	407.66	3.594
s820	103	27	0	7	3.662	273.07	4.908
s832	104	24	0	7	4.168	239.92	4.918
sand	136	19	0	7	3.278	305.06	7.000
scf	111	19	0	8	8.929	111.99	4.952
shiftreg	3	0	0	4	2.362	423.37	3.989
sse	23	0	0	5	1.991	502.26	4.369
styr	74	0	0	6	2.816	355.11	4.585
tav	25	5	0	5	3.094	323.21	7.392
tbk	136	14	0	6	3.812	262.33	6.286
tma	45	2	0	6	2.182	458.30	4.037
train11	5	0	0	4	2.584	387.00	2.748
train4	2	0	0	3	2.441	409.67	2.923
Total	2917	223	14	282	161.769	18,362.97	296.322

Table 12. Experimental results (LUT count).

Benchmark	One-Hot	Gray	$U_{1}$	$U_{2}$
bbara	17	18	17	17
bbsse	27	36	27	27
bbtas	9	6	6	6
beecount	18	12	12	12
cse	55	55	55	55
dk14	31	25	25	25
dk15	20	20	20	20
dk16	84	70	70	70
dk17	18	7	7	7
dk27	7	4	4	4
dk512	15	8	8	8
donfile	31	11	11	11
ex1	143	177	130	113
ex2	11	5	5	5
ex3	12	6	6	6
ex4	18	22	22	22
ex5	12	5	5	5
ex6	23	19	19	19
ex7	6	5	5	5
keyb	36	42	36	36
kirkman	267	377	240	211
lion	5	3	3	3
lion9	12	5	5	5
mark1	19	29	19	19
mc	10	6	6	6
modulo12	7	3	3	3
opus	16	18	18	18
planet	147	129	131	117
planet1	147	129	131	117
pma	83	73	75	68
s1	42	53	38	34
s1488	232	291	211	186
s1494	232	293	209	187
s1a	36	49	34	31
s208	61	45	41	37
s27	10	7	7	7
s298	1143	607	546	481
s386	29	36	29	29
s420	62	60	60	60
s510	64	51	51	51
s8	11	7	7	7
s820	103	151	103	103
s832	104	145	104	104
sand	136	125	136	136
scf	111	194	111	111
shiftreg	11	3	3	3
sse	27	36	25	23
styr	91	97	83	74
tav	31	25	25	25
tbk	152	136	136	136
tma	59	53	49	45
train11	14	5	5	5
train4	5	2	2	2
Total	4072	3796	3136	2917
Percentage, %	139.60	130.13	107.51	100.00
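
The percentage row of Table 12 normalizes each total to the $U_{2}$ total, which is taken as 100%. The Python sketch below reproduces this normalization from the totals in the table. The dictionary keys and the function name `relative_percent` are assumptions chosen for the illustration.

```python
# Total LUT counts copied from the "Total" row of Table 12.
TOTAL_LUTS = {"One-Hot": 4072, "Gray": 3796, "U1": 3136, "U2": 2917}


def relative_percent(totals, baseline):
    """Express every total as a percentage of the baseline entry."""
    base = totals[baseline]
    return {name: round(100.0 * value / base, 2) for name, value in totals.items()}


PERCENTAGES = relative_percent(TOTAL_LUTS, "U2")
for name, pct in PERCENTAGES.items():
    print(f"{name}: {pct:.2f}%")
```

The same normalization, applied to the totals of the other tables, would give their percentage rows as well.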

Table 13. Experimental results (maximum operating frequency, MHz).

Benchmark	One-Hot	Gray	$U_{1}$	$U_{2}$
bbara	400.96	368.19	368.19	368.19
bbsse	390.63	379.36	379.36	379.36
bbtas	395.10	387.15	387.15	387.15
beecount	281.37	350.26	350.26	350.26
cse	360.49	326.90	326.90	326.90
dk14	298.06	350.39	350.39	350.39
dk15	358.68	289.60	289.60	289.60
dk16	315.56	266.10	266.10	266.10
dk17	341.65	394.17	394.17	394.17
dk27	327.76	366.97	366.97	366.97
dk512	331.46	382.85	382.85	382.85
donfile	348.43	324.99	324.99	324.99
ex1	242.66	283.61	299.58	305.72
ex2	390.32	367.65	367.65	367.65
ex3	393.39	397.30	397.30	397.30
ex4	363.24	372.16	372.16	372.16
ex5	384.76	354.23	354.23	354.23
ex6	325.31	375.38	375.38	375.38
ex7	375.09	354.23	354.23	354.23
keyb	272.55	310.46	310.46	310.46
kirkman	209.34	166.94	261.64	269.76
lion	378.79	379.08	379.08	379.08
lion9	311.92	392.31	392.31	392.31
mark1	267.17	351.49	351.49	351.49
mc	365.50	376.08	376.08	376.08
modulo12	438.02	423.37	423.37	423.37
opus	288.77	349.28	349.28	349.28
planet	247.71	293.17	302.11	305.16
planet1	247.71	293.17	302.11	305.16
pma	300.75	347.95	366.70	374.25
s1	281.06	323.73	346.98	354.11
s1488	195.58	189.39	244.50	252.08
s1494	171.03	176.37	213.77	218.15
s1a	327.76	307.03	404.69	426.08
s208	348.07	358.29	436.87	445.83
s27	408.83	407.17	407.17	407.17
s298	107.81	228.68	278.94	290.53
s386	372.58	378.50	378.50	378.50
s420	255.10	336.47	336.47	336.47
s510	217.53	245.10	245.10	245.10
s8	408.83	407.66	407.66	407.66
s820	272.85	273.07	273.07	273.07
s832	283.21	239.92	239.92	239.92
sand	227.95	305.06	305.06	305.06
scf	147.34	111.99	111.99	111.99
shiftreg	389.86	423.37	423.37	423.37
sse	390.63	379.36	482.16	502.26
styr	275.56	330.47	344.47	355.11
tav	280.19	323.21	323.21	323.21
tbk	290.28	262.33	262.33	262.33
tma	249.44	360.75	439.95	458.30
train11	312.89	387.00	387.00	387.00
train4	374.39	409.67	409.67	409.67
Total	16,541.89	17,539.40	18,224.98	18,362.97
Percentage, %	90.08	95.52	99.25	100.00

Table 14. Comparison of characteristics for complex FSMs.

	One-Hot	Gray	$U_{1}$	$U_{2}$
Total LUTs	2710	2409	1943	1724
Percentage LUTs, %	157.19	139.73	112.70	100.00
Total Registers	1346	127	97	97
Percentage Registers, %	1387.63	130.93	100.00	100.00
Total Frequency	3595.09	4038.91	4724.49	4862.49
Percentage Frequency, %	73.94	83.06	97.16	100.00
Total Power	90.637	209.771	99.52	104.478
Percentage Power, %	86.75	200.78	95.25	100.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Counts in Moore FSMs with Twofold State Assignment. Appl. Sci. 2026, 16, 3540. https://doi.org/10.3390/app16073540


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
