Abstract
This work proposes a method for reducing hardware in the circuits of Mealy finite state machines (FSMs) implemented as networks of interconnected look-up table (LUT) elements. We discuss FSMs with twofold state assignment and encoding of output collections. The method uses two LUT-based cores to implement systems of partial Boolean functions. The first core uses only maximum binary state codes, while the second core uses extended state codes. The hardware reduction comes from decreasing the number of maximum binary codes that must be transformed. This leads to FPGA-based FSM circuits with three levels of logic blocks, each of which has a single level of LUTs. As a result, each partial function is represented by a single-LUT circuit. The article shows a step-by-step procedure for the transition from the initial form of the FSM representation to its logic circuit (a network of programmable look-up table elements, flip-flops, and interconnects). Experiments conducted with standard benchmarks show that the proposed approach produces LUT-based FSM circuits with significantly better area characteristics than circuits produced by the Auto and One-Hot methods of Vivado, JEDI, and twofold state assignment. Compared to these methods, the number of LUTs is reduced by between 9.44% and 69.98%. Additionally, the maximum operating frequency is slightly improved (by up to 0.6%) compared with FSM circuits based on twofold state assignment. The price of these improvements is an increase in power consumption; however, this increase is very small (up to 1.56%). The gain from applying the proposed method grows as the values of the FSM's main characteristics grow. The conditions for applying the proposed method are determined.
A generalized architecture consisting of three blocks of partial functions and a method for synthesizing an FSM with this architecture are proposed. A method for selecting one of the seven architectures generated by the generalized architecture is also proposed.
1. Introduction
Our world is characterized by the widespread use of various cyber-physical systems (CPSs) in all spheres of human activity [1,2,3]. Currently, intensive research is being carried out on designing CPSs and ensuring their safe operation [4,5,6,7,8,9]. As the name suggests, these systems include digital (cybernetic) parts interacting with physical objects [10,11,12]. Very often, these digital parts include various sequential blocks [3,11]. These blocks can implement, for example, various security algorithms [13]. To improve the overall quality of a cybernetic part, it is necessary to optimize the characteristics of its sequential blocks. In the current paper, we discuss the case where the sequential blocks of digital parts are represented by finite state machines (FSMs) [14].
Mealy FSM models are very often used to specify sequential blocks [14,15]. FSM design requires balancing the occupied chip area, the maximum operating frequency, and the power consumption [16,17]. We discuss the case where FSM circuits are implemented with field-programmable gate arrays (FPGAs), in which look-up table (LUT) elements are the basic elements used for implementing FSM circuits. As follows from [18,19], the circuit area has the greatest influence on the values of the other characteristics. The area can be reduced by jointly applying various methods of structural decomposition. In our paper [20], we proposed an optimization method based on jointly applying twofold state assignment (TSA) and encoding of output collections. As a result, LUT-based FSM circuits have exactly three logic levels. Let us point out that FPGAs are very popular in modern digital system design [5,7,8].
In this paper, we focus on FPGA chips produced by AMD Xilinx [21], because this corporation is the largest manufacturer of FPGA chips. To implement an FSM circuit, we use configurable logic blocks (CLBs) that include four main components: LUTs, programmable flip-flops, dedicated multiplexers, and fast interconnections. To obtain a multi-CLB circuit, the system of programmable inter-CLB interconnects must be used. The proposed method reduces the LUT counts of multi-level circuits of Mealy FSMs.
The main principle of TSA-based FSMs is the use of two types of internal state codes [20]: each state is represented by both a maximum binary state code (MBC) and an extended state code (ESC) [20]. This approach allows for reducing FSM hardware compared to methods based solely on MBCs. However, the approach in [20] introduces some overhead: an additional state transformer block must convert MBCs into ESCs, and this converter consumes additional LUTs and interconnections. In this paper, we show how to reduce this overhead.
The main contributions of this paper are the following. We propose: (1) a novel design method aimed at reducing the LUT counts in the circuits of FPGA-based Mealy FSMs with twofold state assignment and encoding of output collections; (2) a generalized FSM architecture including three blocks of partial Boolean functions (PBFs); (3) a method of choosing one of seven possible FSM architectures based on the generalized architecture. To reduce hardware, we propose using at least two cores of logic [22]. The first core generates PBFs based on MBCs; the second core uses ESCs for this purpose. This reduces the hardware of the state transformer circuit, because now only a part of the MBCs is transformed into ESCs. The scientific novelty of the proposed approach also includes an improvement of the known method of encoding output collections by additional variables. The encoding is done so that some additional variables are generated by only one of the cores. Thanks to this approach, the number of LUTs generating additional variables is reduced. Our current research shows that the joint use of these two approaches leads to FSM circuits having fewer LUTs than FSM circuits based on the approach in [20]. The experimental results show that the proposed approach does not significantly deteriorate the temporal characteristics of FSMs.
The remainder of the article is organized as follows. Section 2 presents background information on FPGA-based Mealy FSM design. Section 3 analyzes related work. Section 4 presents the main idea of the proposed method. An example of FSM synthesis is discussed in Section 5. The conducted experiments are analyzed in Section 6. A generalized FSM architecture is discussed in Section 7. Finally, Section 8 is a short conclusion summarizing the results.
2. Background Information for FPGA-Based Mealy FSMs
A Mealy FSM has M internal states, L external inputs, and N outputs used by other blocks of a CPS. To organize interstate transitions, special internal objects are used: R1 state variables and R1 input memory functions (IMFs). These objects are combined into the sets S, I, O, SV, and D [14], respectively: $S = \{s_1, \dots, s_M\}$, $I = \{i_1, \dots, i_L\}$, $O = \{o_1, \dots, o_N\}$, $SV = \{sv_1, \dots, sv_{R_1}\}$, and $D = \{D_1, \dots, D_{R_1}\}$. The sets S, I, and O follow uniquely from, for example, the FSM state transition graph (STG) [23]. However, the value of the parameter R1 is chosen by the circuit designer during the state assignment stage [23].
In the case of MBCs [24], the value of R1 is determined by the following formula:
$$R_1 = \lceil \log_2 M \rceil. \qquad (1)$$
Formula (1) determines the number of bits of MBCs (this is the minimum possible number for the given number of states). In the case of one-hot state assignment [24], the value of R1 is equal to the number of states ($R_1 = M$).
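Formula (1) and the one-hot alternative can be checked with a few lines of Python. This is a minimal sketch of our own, not part of the original method; the function names are hypothetical. It uses the integer identity $(M-1)$.bit_length() $= \lceil \log_2 M \rceil$ for $M \ge 1$, which avoids floating-point rounding in `math.log2`.

```python
def mbc_width(num_states: int) -> int:
    """R1 for maximum binary codes, Formula (1): R1 = ceil(log2 M)."""
    if num_states < 1:
        raise ValueError("an FSM has at least one state")
    # For M >= 1, (M - 1).bit_length() equals ceil(log2 M), computed exactly on integers.
    return (num_states - 1).bit_length()


def one_hot_width(num_states: int) -> int:
    """R1 for one-hot state assignment: one flip-flop per state (R1 = M)."""
    if num_states < 1:
        raise ValueError("an FSM has at least one state")
    return num_states


if __name__ == "__main__":
    for m in (2, 9, 16, 17):
        print(m, mbc_width(m), one_hot_width(m))
```

The gap between the two widths grows quickly with M, which is why MBCs minimize the number of flip-flops, while one-hot codes trade flip-flops for simpler next-state logic.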
The state variables create so-called full state codes. Each bit of a state code corresponds to a flip-flop of the register RG. The register is controlled by the IMFs and two special pulses, Res and Clk [25]. The pulse Res initializes FSM operation: it sets the FSM into the initial state $s_1$. The pulse Clk determines the instant at which a state code is loaded into RG. The r-th bit of the loaded code is determined by the value of the IMF $D_r$. Like the vast majority of researchers, we use D flip-flops to organize the register RG [26].
The following internal resources of FPGA fabric are involved in implementing an FSM circuit: LUTs, flip-flops, programmable interconnections, a synchronization tree, and programmable input–outputs [27,28]. In this paper, we consider a case where FPGAs of AMD Xilinx [25] are used.
A LUT is a functional generator having $N_L$ inputs and a single output [24,29]. A LUT can store the truth table of any Boolean function of up to $N_L$ arguments. Nowadays, the value of $N_L$ does not exceed 6. However, using dedicated multiplexers, the number of inputs can be increased to 8 (within a single CLB) [27]. If the number of Boolean arguments exceeds 8, then the corresponding function is implemented by a multi-CLB circuit. This makes it necessary to minimize the number of LUTs and their levels in the resulting circuit [30,31]. In this paper, we denote by the symbol LUTer a block consisting of LUTs, multiplexers, flip-flops, and interconnections. All these elements are programmable [32].
Two systems of Boolean functions (SBFs) represent an FSM logic circuit [17]:
$$D = D(SV, I); \qquad (2)$$
$$O = O(SV, I). \qquad (3)$$
These SBFs define a so-called P Mealy FSM whose architecture is shown in Figure 1 [14].
Figure 1.
Architecture of P Mealy FSM.
In Figure 1, the block LUTerSV implements IMFs (2). The IMFs determine the next state code (a code of the state of transition). The flip-flops of register RG are distributed among the elements of LUTerSV. The pulses Clk and Res control the operation of flip-flops. The block LUTerOF generates output functions (3).
The analysis of SBFs (2) and (3) shows that their functions depend on the state variables $sv_r \in SV$ and the inputs $i_l \in I$. Let a function $f_j$ depend on $R(f_j)$ state variables and $L(f_j)$ inputs. The number of LUT levels in the corresponding circuit depends on the following condition:
$$R(f_j) + L(f_j) \le N_L. \qquad (4)$$
If (4) holds, then there is a single LUT in the corresponding logic circuit. The FSM circuit is single-level if condition (4) holds for each function belonging to SBFs (2) and (3). In this case, the resulting FSM circuit is characterized by the best possible values of its main characteristics. This means that this circuit requires the minimum possible chip area, that it consumes the minimum possible power, and that it represents the fastest possible solution.
Even medium-sized FSMs can have up to 10 state variables and 30 inputs [14]. Therefore, each function belonging to (2) and (3) may have up to 40 arguments. However, the number of LUT inputs is very small ($N_L \le 6$). Therefore, condition (4) is very likely to be violated. In this case, various optimization methods are used to improve the characteristics of an FSM circuit. In this paper, we discuss the case where condition (4) is violated.
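Checking condition (4) only requires the support of each function, that is, the sets of state variables and inputs it depends on. The following minimal Python sketch assumes these supports are already known; the function names and the example supports are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Support of a Boolean function: (state variables, inputs) it actually depends on.
Support = Tuple[Set[str], Set[str]]


def fits_single_lut(support: Support, n_l: int) -> bool:
    """Condition (4): R(f_j) + L(f_j) <= N_L."""
    state_vars, inputs = support
    return len(state_vars) + len(inputs) <= n_l


def multi_lut_functions(sbf: Dict[str, Support], n_l: int) -> List[str]:
    """Names of the functions violating condition (4), i.e., needing more than one LUT."""
    return sorted(name for name, sup in sbf.items() if not fits_single_lut(sup, n_l))


if __name__ == "__main__":
    # Hypothetical supports of two functions belonging to SBFs (2) and (3).
    sbf = {
        "D1": ({"sv1", "sv2"}, {"i1", "i2"}),
        "o1": ({"sv1", "sv2", "sv3", "sv4"}, {"i1", "i2", "i3"}),
    }
    print(multi_lut_functions(sbf, n_l=6))  # ['o1']
```

If the returned list is empty, the whole P Mealy FSM is single-level; otherwise, the listed functions are the ones that decomposition has to address.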
3. Analysis of Related Work
Methods for improving the spatial characteristics of FSM circuits are discussed in a large number of scientific works, for example, [18,19,25,30,32,33,34,35,36,37,38]. To estimate the chip area required for a LUT-based circuit, designers use LUT counts [18]. Therefore, reducing the LUT count decreases the area occupied by the circuit. This goal can be achieved using: 1. optimal state assignment; 2. functional decomposition (FD) of SBFs (2) and (3); and 3. structural decomposition (SD) of the FSM logic circuit [19].
Optimal state assignment excludes some literals from the sums-of-products (SOPs) of functions (2) and (3) [39]. In the best case, this allows for implementing a single-level Mealy FSM circuit. One of the best state assignment methods is JEDI, which is distributed together with the CAD tool SIS [40]. The work [41] shows the results of applying JEDI to FSMs from the library LGSynth93 [42]. These results show that JEDI allows for excluding up to 3 literals from the SOPs (2) and (3) representing the benchmark FSMs. Therefore, JEDI can turn multi-level circuits into single-level ones only for rather simple FSMs [32].
Using either FD or SD leads to representing SBFs (2) and (3) by systems of partial Boolean functions [34,43]. Each PBF should depend on no more than $N_L$ arguments; in this case, each PBF is represented by a single-LUT circuit. Applying any type of decomposition produces multi-level FSM circuits. However, different decomposition methods produce fundamentally different interconnection systems [19]. Applying functional decomposition leads to FSM circuits with a “spaghetti-type” irregular interconnect system, in which the same inputs and state variables may appear anywhere in the circuit. In contrast, the system of interconnections is regular in SD-based FSM circuits. An SD-based FSM circuit consists of large blocks [19]. Each block has its own unique systems of input variables and output functions, which can differ from the FSM inputs $i_l \in I$ and state variables $sv_r \in SV$. Due to this, SD-based circuits have better quality than the equivalent FD-based circuits [19].
One such SD method is the encoding of FSM output collections (OCs) [19]. An OC is a set of outputs generated simultaneously during the same interstate transition. If an STG has H interstate transitions, then the number of OCs, Q, ranges from 1 to H [19].
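To encode Q OCs by maximum binary codes, R2 variables are enough:
$$R_2 = \lceil \log_2 Q \rceil. \qquad (5)$$
These additional variables create the set $AV = \{av_1, \dots, av_{R_2}\}$. There are two SBFs representing the system of FSM outputs [19]:
$$AV = AV(SV, I); \qquad (6)$$
$$O = O(AV). \qquad (7)$$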
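The two steps behind SBFs (6) and (7), namely giving each OC a code of R2 bits and then expressing every output through the codes of the OCs that contain it, can be sketched in Python. This is a minimal illustration under our own assumptions: OCs are given as distinct frozensets of output names, codes are assigned in list order (the paper instead chooses codes that minimize literals), and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def encode_ocs(ocs: List[FrozenSet[str]]) -> Dict[FrozenSet[str], str]:
    """Give each (distinct) OC a maximum binary code of R2 = ceil(log2 Q) bits, Formula (5).

    Codes are assigned in list order; this plain assignment does not try to
    minimize the number of literals in SBF (7).
    """
    r2 = max(1, (len(ocs) - 1).bit_length())  # at least one bit so that codes are printable
    return {oc: format(q, f"0{r2}b") for q, oc in enumerate(ocs)}


def ones_in_code(code: str) -> List[str]:
    """Additional variables av_r equal to 1 in a code (av1 is the leftmost bit)."""
    return [f"av{r + 1}" for r, bit in enumerate(code) if bit == "1"]


def sbf7_minterms(codes: Dict[FrozenSet[str], str]) -> Dict[str, List[str]]:
    """SBF (7) in minterm form: o_n = 1 for the codes of all OCs containing o_n."""
    table: Dict[str, List[str]] = {}
    for oc, code in codes.items():
        for out in oc:
            table.setdefault(out, []).append(code)
    return {out: sorted(cs) for out, cs in sorted(table.items())}


if __name__ == "__main__":
    # Hypothetical OCs; the empty set stands for a transition without outputs.
    ocs = [frozenset(), frozenset({"o1", "o2"}), frozenset({"o3"}), frozenset({"o1", "o3"})]
    codes = encode_ocs(ocs)
    print(sbf7_minterms(codes))
```

Each transition of the STT then sets exactly the variables returned by `ones_in_code` for its OC, which is what SBF (6) has to produce.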
Applying this approach turns P Mealy FSM into PY Mealy FSM (Figure 2).
Figure 2.
Architecture of PY Mealy FSM.
In the LUT-based PY Mealy FSM, the block LUTerSV implements SBF (2). The block LUTerAV generates the additional variables represented by SBF (6). The block LUTerOF produces the FSM outputs represented by SBF (7).
As follows from [44], this approach allows for reducing the chip area needed to generate FSM outputs compared to representing the outputs by SBF (3). However, this gain comes at the cost of a lower maximum operating frequency compared to an equivalent P Mealy FSM. To optimize the characteristics of PY Mealy FSMs, the encoding of OCs may be combined with twofold state assignment [20], leading to Mealy FSMs. We discuss them below.
To execute the TSA, we should find a partition of the set S into K classes of compatible states. States $s_i, s_j \in S$ are compatible if including them in the same class does not lead to the following phenomenon: the required number of LUT inputs exceeds the number of LUT inputs $N_L$. The reason why this can happen becomes clear below. Three sets characterize any class $S^k$: 1. the inputs determining transitions from the states $s_m \in S^k$ (a set $I^k$ including $L_k$ elements); 2. the outputs produced during the transitions from these states (a set $O^k$); and 3. the IMFs determining the MBCs of the states of transition (a set $D^k$). If the encoding of OCs is used, then the set $O^k$ is replaced by a set $AV^k$, which includes the additional variables equal to 1 in the codes of the OCs generated during the transitions from the states $s_m \in S^k$.
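Each class $S^k$ includes $M_k$ compatible states. Inside each class, the states are encoded by partial codes having $r_k$ bits:
$$r_k = \lceil \log_2 (M_k + 1) \rceil. \qquad (8)$$
One code combination is added to the $M_k$ state codes because it is reserved to show that a state does not belong to $S^k$ (see Step 8 of the synthesis example).
To create the partial codes, a set ASV of additional state variables is introduced. The states $s_m \in S^k$ are encoded using the $r_k$ variables of a set $ASV^k \subseteq ASV$. The sets $ASV^1, \dots, ASV^K$ create the set ASV, which includes R3 elements:
$$R_3 = r_1 + r_2 + \dots + r_K. \qquad (9)$$
If a state $s_m$ is compatible with the states $s_j \in S^k$, then including $s_m$ into $S^k$ satisfies the following condition, where $L_k$ and $r_k$ are computed after the inclusion:
$$L_k + r_k \le N_L. \qquad (10)$$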
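The compatibility rule (10) suggests a simple way to build the partition. The sketch below is a hypothetical first-fit heuristic of our own, not the partitioning algorithm of [20]: it places each state into the first class that still satisfies $L_k + r_k \le N_L$ after the inclusion, with $r_k$ taken from Formula (8) as reconstructed above, and then computes R3 by Formula (9).

```python
from typing import Dict, List, Set


def partial_code_width(class_size: int) -> int:
    """Formula (8): r_k = ceil(log2(M_k + 1)); one code means 'state not in S^k'."""
    return class_size.bit_length()  # equals ceil(log2(M_k + 1)) for M_k >= 0


def greedy_partition(state_inputs: Dict[str, Set[str]], n_l: int) -> List[List[str]]:
    """Hypothetical first-fit heuristic: put each state into the first class
    for which condition (10), L_k + r_k <= N_L, still holds after adding it."""
    classes: List[List[str]] = []
    class_inputs: List[Set[str]] = []
    # Place states with many inputs first (a common bin-packing style choice).
    for s in sorted(state_inputs, key=lambda st: (-len(state_inputs[st]), st)):
        ins = state_inputs[s]
        for k, members in enumerate(classes):
            merged = class_inputs[k] | ins
            if len(merged) + partial_code_width(len(members) + 1) <= n_l:
                members.append(s)
                class_inputs[k] = merged
                break
        else:  # no existing class can take s: open a new one
            if len(ins) + partial_code_width(1) > n_l:
                raise ValueError(f"state {s} violates (10) even in a class of its own")
            classes.append([s])
            class_inputs.append(set(ins))
    return classes


def additional_state_vars(classes: List[List[str]]) -> int:
    """Formula (9): R3 is the sum of r_k over all classes."""
    return sum(partial_code_width(len(c)) for c in classes)
```

For example, with $N_L = 5$ and hypothetical input sets s1: {i1, i2}, s2: {i2, i3}, s3: {i1, i3}, s4: {i4, i5}, the first three states share three inputs and a 2-bit partial code, while s4 needs its own class, giving R3 = 3. A first-fit heuristic is easy to follow but does not guarantee the minimum number of classes.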
This approach leads to Mealy FSMs in which each state has two codes: a maximum binary full state code and a partial state code. The second code identifies a particular state as an element of a particular class.
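Each class $S^k$ determines the following two systems of PBFs:
$$AV^k = AV^k(ASV^k, I^k); \qquad (11)$$
$$D^k = D^k(ASV^k, I^k). \qquad (12)$$
To obtain the final values of the additional variables and IMFs, the following SBFs should be created:
$$av_r = \bigvee_{k=1}^{K} av_r^k \quad (r = 1, \dots, R_2); \qquad (13)$$
$$D_r = \bigvee_{k=1}^{K} D_r^k \quad (r = 1, \dots, R_1). \qquad (14)$$
Next, the codes of the OCs should be transformed into FSM outputs, which are represented by SBF (7). Additionally, the full state codes should be transformed into the corresponding partial codes. This transformation is represented by the following SBF:
$$ASV = ASV(SV). \qquad (15)$$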
SBFs (11) and (12) define the first level of a Mealy FSM circuit. SBFs (13) and (14) determine its second level. Finally, SBFs (7) and (15) represent the third circuit level. The architecture of a Mealy FSM is shown in Figure 3.
Figure 3.
Architecture of Mealy FSM.
In this architecture, the block LUTerk generates PBFs (11) and (12). The block LUTerPF implements the system of disjunctions (13) and (14). This block includes the distributed RG controlled by the pulses Clk and Res. The block LUTerOF implements the outputs represented by SBF (7). The block LUTerASV implements SBF (15). Therefore, it executes the transformation of state codes.
Our previous research [20] shows that the LUT-based circuits of FSMs have better characteristics than the circuits of equivalent PY FSMs. If the conditions
hold, then the circuits of FSMs are three-level and are faster than the equivalent PY Mealy FSMs.
Let us represent the circuit in Figure 3 as a combination of a core of partial functions (CorePF) and a functional transformer. The core includes the blocks LUTer1–LUTerK; the functional transformer includes all other blocks shown in Figure 3. This leads to the generalized diagram of an FSM shown in Figure 4.
Figure 4.
Generalized diagram of a Mealy FSM.
Analysis of the generalized diagram reveals the following peculiarity: the transformation of full codes into partial codes is executed for all FSM states. However, there are cases where this transformation is unnecessary. If condition (4) holds for the PBFs generated during the transitions from some state, then all these PBFs are represented by single-LUT circuits even when full state codes are used. Taking this property into account, we can reduce the number of classes in the partition. Additionally, the number of additional state variables R3 can be reduced compared to its value for the equivalent FSM. In this paper, we propose a method that takes this property into account.
4. Analysis of Our Current Approach
The transitions from a state $s_m \in S$ are determined by the elements of a set $I(s_m) \subseteq I$ containing $L_m$ elements. If the condition
$$L_m + R_1 \le N_L \qquad (18)$$
holds, then a single LUT is enough to implement any PBF generated during the transitions from $s_m$. Therefore, for such states, it makes sense to use the full state codes for generating PBFs. If condition (18) is violated, then the corresponding codes should be transformed into partial codes. This allows for creating a class of states whose maximum binary codes do not require transformation. Therefore, the partition based on (10) should be constructed only for the states violating condition (18).
Based on the above statement, we propose using the ideas from our paper [22]. First, we divide the set S into two disjoint sets. If a state satisfies condition (18), then it is included in the first set; these states create the block CoreFC. Otherwise, the state belongs to the set S1; these states form the block CorePC. Obviously, only the codes of the states belonging to S1 should be transformed.
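The split of S by condition (18) is a single pass over the states. The minimal Python sketch below assumes that the input set of every state has already been extracted from the STT; the function name and example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_states(
    state_inputs: Dict[str, Set[str]], r1: int, n_l: int
) -> Tuple[List[str], List[str]]:
    """Split S by condition (18): L_m + R1 <= N_L.

    Returns (states kept with full codes for CoreFC, states of S1 for CorePC).
    """
    full = sorted(s for s, ins in state_inputs.items() if len(ins) + r1 <= n_l)
    partial = sorted(s for s in state_inputs if s not in full)
    return full, partial


if __name__ == "__main__":
    # Hypothetical FSM fragment with R1 = 4 and N_L = 5.
    print(split_states({"s1": {"i1"}, "s2": set(), "s3": {"i1", "i2"}}, r1=4, n_l=5))
```

Only the second list has to be partitioned by (10) and served by the code transformer, which is the source of the savings discussed in this section.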
CoreFC determines the sets , , and . The input causes the transitions from states creating the CoreFC. The set AV1 consists of additional variables produced only during the transitions from states creating the CoreFC. The set consists of the additional variables produced by both FSM cores. The set includes functions produced during the transitions creating the CoreFC. Therefore, the circuit of CoreFC is determined by the following SBFs:
To synthesize CorePC, it is necessary to create the partition of the set S1. This can be done using the same approach as the one creating . CorePC determines the sets and . Their purpose is clear from the previous analysis.
Three sets (, , ) are determined by each class of the partition . Their meaning follows from the previous text. The state variables from the set ASV2 encode the states . The codes of states are created from elements of the set . There are R4 elements in the set . The following SBFs determine the circuit of CorePC:
To generate the final values of additional variables, FSM outputs, and state variables, we should use the functional transformer. This block is similar to the one used in the FSM (Figure 3). Using this information, we propose to transform FSMs into Mealy FSMs (Figure 5).
Figure 5.
Architecture of the Mealy FSM.
In the proposed two-core FSM, the block CoreFC implements SBFs (19)–(21). The block CorePC implements SBFs (22) and (23). The block LUTerFA is a functional assembler implementing the following disjunctions:
The block LUTerFA includes a distributed full state code register whose informational inputs are connected with IMFs (24). The register is controlled by pulses Clk and Res. The block LUTerOF implements SBF (7) where . The block LUTerASV2 implements SBF:
Let us analyze the proposed solution. The partition has J classes. Obviously, the following conditions take place:
Due to the validity of condition (27), we can state that the circuit of the Mealy FSM in Figure 5 is not slower than the circuit of the equivalent FSM in Figure 3. Due to the validity of condition (28), we can state that the circuit of CorePC should require fewer LUTs than the block CorePF of the equivalent FSM. The same is true for the blocks LUTerASV of the equivalent FSMs. Therefore, we can expect the circuit of the FSM in Figure 5 to require a smaller area than, and to be no slower than, the circuit of the equivalent FSM in Figure 3. These expectations have been confirmed by the conducted studies, whose results are given in Section 6.
Let us compare our method with the methods proposed in [20,22]. In [20], we discussed FSMs with twofold state assignment and encoding of output collections. The proposed FSMs differ from them in two ways. First, in the FSMs of [20], the codes of all states are converted, while in the proposed FSMs, only some of the state codes are converted. This allows for optimizing the code converter circuit compared to the one used in the FSMs of [20]. Second, the use of two cores allows us to encode OCs so that some additional variables are generated only by the LUTs of CoreFC. This reduces the number of LUTs generating output signals compared to the FSMs of [20]. In [22], we discussed FSMs where two cores of LUTs are used; however, those FSMs are based on one-hot encoding of outputs. In the proposed FSMs, we use maximum binary codes of output collections, which reduces the number of LUTs generating output signals compared to the FSMs of [22].
In this paper, we propose a synthesis method aimed at LUT-based Mealy FSMs. The synthesis process starts from the FSM state transition graph [17], which is then transformed into an equivalent state transition table (STT) [17]. The proposed method consists of the following steps:
- Creating an STT of P Mealy FSM.
- Pre-formation of sets and S1.
- Pre-formation of partition of set S1.
- Final formation of sets and S1 and partition .
- Creating full state codes.
- Encoding of output collections and finding SBF (7).
- Encoding of states by partial state codes.
- Creating a table of LUTerASV and systems (26).
- Creating the Mealy FSM circuit.
To show that the model of FSM is used to synthesize FSM A, we use the symbol . Let us explain how to execute the steps of the proposed design method.
5. Synthesis Example
We discuss a synthesis example for Mealy FSM A1 (Figure 6). To implement the FSM circuit, we use LUTs with .
Figure 6.
Initial STG.
The FSM states correspond to the STG vertices [17]. To show interstate transitions, the vertices are connected by arcs. An STG includes H arcs. The h-th arc is marked by a pair . In this pair, the symbol stands for a conjunction of either FSM inputs or their complements. This is an input signal. The set includes FSM outputs generated during the transition number h.
The STG (Figure 6) determines the following sets: , , and . Therefore, the FSM A1 is characterized by , . There are 22 arcs in the initial STG. This gives 22 transitions among the states of FSM A1.
Step 1. This step is omitted if an FSM is represented by an STT. Otherwise, the STG is transformed into an STT in the following way [14]. The STT includes H lines, each corresponding to an STG arc. Each transition is characterized by its current state, its next state, its input signal (the conjunction marking the h-th arc), its outputs (the OC marking the h-th arc), and its number h. Therefore, each arc determines the five columns of the STT: the current state, the next state, the input signal, the OC, and h. Table 1 is the STT of A1.
Table 1.
STT of FSM A1.
This table uniquely corresponds to the STG (Figure 6). We add the column q to Table 1 to show the subscripts of the output collections.
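The STG-to-STT transformation of Step 1, including the extra column q, can be sketched as follows. This is a minimal Python illustration under our own assumptions: arcs are listed in the order in which they should become STT lines, and OCs are numbered in order of first appearance (the numbering used in Table 1 may differ). The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Arc:
    current: str             # state the arc leaves
    next_state: str          # state of transition
    condition: str           # conjunction of inputs or their complements, e.g. "i1 & ~i2"
    outputs: FrozenSet[str]  # OC produced during the transition


@dataclass(frozen=True)
class SttRow:
    current: str
    next_state: str
    condition: str
    outputs: FrozenSet[str]
    h: int  # line (arc) number
    q: int  # subscript of the output collection


def stg_to_stt(arcs: List[Arc]) -> List[SttRow]:
    """One STT line per STG arc; equal OCs share the same subscript q."""
    oc_index: Dict[FrozenSet[str], int] = {}
    rows: List[SttRow] = []
    for h, arc in enumerate(arcs, start=1):
        # OCs are numbered in order of first appearance (an assumption of this sketch).
        q = oc_index.setdefault(arc.outputs, len(oc_index) + 1)
        rows.append(SttRow(arc.current, arc.next_state, arc.condition, arc.outputs, h, q))
    return rows
```

The number of distinct values of q obtained this way is the parameter Q used in Formula (5).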
Step 2. The following values of can be found from the analysis of Table 1: for states , , , , ; for states , , , . Additionally, . Using (1) gives . As follows from the initial conditions of the example, . Therefore, condition (18) takes place for states with . Thus, the following sets can be created: and . As follows from our analysis, some states may be transferred from to S2. Thus, the elements of these sets can be changed. From Table 1, we can find the sets and .
Step 3. Using known approach [20], we can find the partition of the set S1. It includes the classes and . Because the set S1 is a preliminary one, this partition is also preliminary. Each class includes elements. Using (8) gives the following relation: . Using (9) gives . Therefore, there is a set of state variables .
Step 4. The classes determine the following sets of inputs: and . Therefore, we have . This means we cannot add new inputs in these sets due to violation of condition (10). Each set can include up to 3 elements without violation of (10). Therefore, one additional state can be added to each of the sets .
The method of state redistribution is discussed in detail in the paper [22]. In our current paper, we just show the result of redistribution, which is the following: and . The redistribution gives the following classes: and . Now, we obtain . Using these values and Formula (8), we can see that and . Therefore, the total number of state variables does not change, but now the set includes fewer elements. Now we can expect a decrease in the value of the LUT count for the circuit of CoreFC.
Step 5. Using (1) for the M states of the set S gives R1 = 4. This value determines the sets $SV = \{sv_1, \dots, sv_4\}$ and $D = \{D_1, \dots, D_4\}$. As shown in [17], the states from the same class should be covered by the minimum possible number of generalized cubes of the R1-dimensional Boolean space. This decreases the number of literals in functions (19)–(21). One of the possible outcomes is shown in Figure 7. To encode the states by MBCs, we used the algorithm JEDI [40].
As we can see from the analysis of the resulting Karnaugh map (Figure 7), the states are covered by the generalized cube 00xx. The states are represented by the generalized cube x100. The cube 1x00 covers the states . Therefore, for our example, each class is placed into a single generalized cube.
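Whether a class fits into one generalized cube can be checked by computing the smallest cube containing all codes of the class: equal bit positions are kept, and differing positions become x. The Python sketch below is our own illustration with hypothetical codes, since the exact state-to-code mapping of Figure 7 is not reproduced here. Whether such a cube also contains codes of states from other classes must be checked separately.

```python
from typing import List


def in_cube(code: str, cube: str) -> bool:
    """True if a binary code lies in a generalized cube written with 0, 1, and x."""
    return len(code) == len(cube) and all(c in ("x", b) for b, c in zip(code, cube))


def smallest_cube(codes: List[str]) -> str:
    """Smallest generalized cube containing all codes: keep equal bits, put x elsewhere."""
    if not codes:
        raise ValueError("need at least one code")
    return "".join(col[0] if len(set(col)) == 1 else "x" for col in zip(*codes))


if __name__ == "__main__":
    # Hypothetical codes of four states placed into one class.
    print(smallest_cube(["0000", "0001", "0010", "0011"]))  # 00xx
```

A cube with fewer x positions corresponds to a product term with more literals but covering fewer codes, so the goal stated in Step 5 is to make each class fit a cube that does not waste positions.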
Figure 7.
Maximum binary state codes for FSM A1.
Step 6. The analysis of Table 1 gives Q = 10 output collections. Using (5) gives R2 = 4 and the set $AV = \{av_1, \dots, av_4\}$.
Each literal in the sum-of-products (SOP) of a Boolean function corresponds to an interconnection between the source of this literal and the corresponding LUT. To reduce the number of interconnections, the number of literals in SOPs should be decreased. To encode the output collections, we used the methods presented in the classical work [17], which give the codes shown in Figure 8.
Figure 8.
Codes of output collections for Mealy FSM A1.
We encoded the OCs in a way where the variable is generated only by one LUT of CoreFC. To do this, we have analyzed Table 1. The analysis of Table 1 shows that the following OCs are generated during the transitions from states : , , , and . Therefore, we have divided the Karnaugh map (Figure 8) into two parts. The first part corresponds to , and the second part corresponds to . We have placed the OCs , , , and into the second part. Now, we can obtain the following system of functions:
SBF (29) determines the circuit of LUTerOF. The function is represented by a corresponding output of LUTerFA. Therefore, the circuit of LUTerOF consists of 6 LUTs. Analysis of system (29) shows that there are 12 literals in the SOPs of the implemented functions. This determines 12 interconnections between LUTerOF and other circuit blocks. Using the results of [19] gives the maximum number of interconnections, which in our case is equal to 28. Thus, due to the proposed approach, the number of interconnections is reduced by 2.33 times (28/12).
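Counting interconnections this way amounts to counting literals: every position of a product term that is not x is one wire into a LUT. The Python sketch below is our own illustration; the SOPs in it are hypothetical cubes over $av_1, \dots, av_4$, not system (29). The reduction factor is the maximum number of interconnections divided by the counted literals.

```python
from typing import Dict, List


def literals_in_sop(cubes: List[str]) -> int:
    """Each non-x position of a product term (cube) is one literal."""
    return sum(ch != "x" for cube in cubes for ch in cube)


def total_literals(sbf: Dict[str, List[str]]) -> int:
    """Literals over all SOPs, i.e., interconnections feeding the LUTs."""
    return sum(literals_in_sop(cubes) for cubes in sbf.values())


def reduction_factor(max_interconnects: int, sbf: Dict[str, List[str]]) -> float:
    """How many times fewer interconnections the minimized SOPs need."""
    return max_interconnects / total_literals(sbf)


if __name__ == "__main__":
    # Hypothetical SOPs over av1..av4 written as cubes.
    sops = {"o1": ["1x0x"], "o2": ["01xx", "x011"]}
    print(total_literals(sops))  # 7
```

Because literals are shared inputs of individual LUTs, this count is a proxy for routing resources rather than for LUTs themselves.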
Step 7. To construct the table of CoreFC, it is necessary to select the lines of STT with transitions from states . In the discussed case, we should select lines 1–2 and 6–9 of STT (Table 1). The table of CoreFC includes 5 additional columns (compared to the baseline STT). These columns are: , , , , and . There is a self-explanatory meaning of columns and . The column includes IMFs creating the code (to load it into the code register). The column includes the additional variables equal to 1 in codes of generated OCs. These variables are also produced by some blocks of LUTerPC. The column includes the additional variables generated only by the block CoreFC. Obviously, these variables are not produced by any block of LUTerPC. Table 2 represents the block CoreFC for the given example.
Table 2.
Table of CoreFC for Mealy FSM A1.
The columns and are created in the following manner. For example, there is an OC written in line 1 of Table 1. Analysis of Figure 8 gives the code . This code determines the variables , , and . Therefore, the first line of Table 2 includes the variables and in the column , as well as the variable in the column . All other lines of Table 2 are created using a similar approach.
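The rule used to fill the two additional-variable columns of Table 2 can be expressed compactly: take the variables equal to 1 in the OC code and separate those produced only by CoreFC from those that CorePC also produces. In this minimal Python sketch, the code "1101" and the choice of av4 as the CoreFC-only variable are hypothetical.

```python
from typing import List, Set, Tuple


def split_av_columns(code: str, corefc_only: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the variables equal to 1 in an OC code between the two av columns of Table 2.

    Returns (variables also produced by CorePC, variables produced only by CoreFC).
    """
    ones = [f"av{r + 1}" for r, bit in enumerate(code) if bit == "1"]
    shared = [v for v in ones if v not in corefc_only]
    exclusive = [v for v in ones if v in corefc_only]
    return shared, exclusive


if __name__ == "__main__":
    print(split_av_columns("1101", {"av4"}))
```

Applied to every line of the CoreFC table, this yields the two columns from which SBFs (19) and (20) are read.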
Using Table 2, we can obtain SBFs (19)–(21). For example, the function is represented as the following:
The block CoreFC determines the set . We will show a bit later the SOPs for functions and .
Step 8. The codes for states use the variables . The codes for states are based on the variables , . The code combination indicates that a particular state belongs to a class other than . The code combination indicates that a particular state belongs to a class other than . Due to the fulfillment of condition (10), the codes do not affect the number of LUTs in the circuit of CorePC. Therefore, the partial state codes can be arbitrary. We have chosen the following approach: the smaller the subscript (m) of a state, the more nulls its partial code contains. The obtained partial state codes are shown in Figure 9.
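The code-assignment rule of Step 8 (reserve the all-zero combination for "not in this class" and give states with smaller subscripts codes containing more zeros) can be sketched as follows. The sketch assumes that state names have the form s followed by a number and uses the reconstructed Formula (8) for the code width; the function name is hypothetical.

```python
from typing import Dict, List


def assign_partial_codes(class_states: List[str]) -> Dict[str, str]:
    """Partial codes for one class S^k.

    r_k follows Formula (8); the all-zero code is reserved for "state not in S^k".
    States with smaller subscripts get codes with more zeros.
    """
    r_k = len(class_states).bit_length()  # equals ceil(log2(M_k + 1))
    # Non-zero codes ordered by number of ones, then by value.
    codes = sorted(range(1, 2 ** r_k), key=lambda v: (bin(v).count("1"), v))
    # Assumes state names of the form "s<number>".
    ordered = sorted(class_states, key=lambda s: int(s[1:]))
    return {s: format(v, f"0{r_k}b") for s, v in zip(ordered, codes)}


if __name__ == "__main__":
    print(assign_partial_codes(["s5", "s3", "s4"]))
```

Since $2^{r_k} - 1 \ge M_k$ by construction, every state receives a distinct non-zero code.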
Figure 9.
Partial state codes for Mealy FSM A1.
Using Figure 9, we can obtain the following partial codes: , , and . Using them allows for creating tables representing CorePC.
Step 9. The block CorePC includes two blocks of LUTs. The block corresponds to the set , whereas the block corresponds to the set . The table of ) (Table 3) is based on lines 3–5, 10–12, and 15–16 (Table 1). Table 4 represents the block . The table is constructed using the lines 13–14 and 17–22 of the initial STT.
Table 3.
Table of CorePC().
Table 4.
Table of CorePC().
In these tables, the current states are represented by their partial codes ; the states of transition are represented by their full codes . The column of STT is replaced by the columns and , respectively. These columns include additional variables equal to 1 in the codes of the OCs.
The transparent approach is used to construct SBFs (22) and (23). For example, the functions , , and are represented as:
In the same way, we can obtain the following SOPs:
Step 10. The block LUTerFA is based on Table 5. Table 5 includes the following columns: Function (a full function assembled by LUTerFA), CoreFC, and CorePC. If some function belonging to the set is generated by a LUT of the block CoreFC, then the intersection of the row containing this function and the column CoreFC contains 1; otherwise, it contains 0. The column CorePC is divided into J subcolumns corresponding to the classes . The same rule is used to place either 1 or 0 in the rows of this part of Table 5.
Table 5.
Table of LUTerFA.
We use Table 2 to fill the rows of column CoreFC of Table 5. To fill the rows of subcolumn , we use Table 3 (Table 4).
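The construction of Table 5 and of the disjunctions it determines can be sketched as below. This assumes that the functions produced by CoreFC and by each class block of CorePC are given as sets of names; the names D1, D2, y1 and the notation f^FC, f^j for partial functions are hypothetical stand-ins for the paper's symbols.

```python
from typing import Dict, List, Set


def luterfa_table(core_fc: Set[str], core_pc: List[Set[str]]) -> Dict[str, List[int]]:
    """Rows of Table 5: [CoreFC bit, class-1 bit, ..., class-J bit] per function."""
    functions = sorted(core_fc.union(*core_pc))
    return {f: [int(f in core_fc)] + [int(f in cls) for cls in core_pc]
            for f in functions}


def disjunctions(table: Dict[str, List[int]]) -> Dict[str, str]:
    """One disjunction of partial functions per row of Table 5."""
    result = {}
    for f, row in table.items():
        labels = ["FC"] + [str(j) for j in range(1, len(row))]
        result[f] = " ∨ ".join(f"{f}^{lab}" for lab, bit in zip(labels, row) if bit)
    return result


# Hypothetical sets: CoreFC produces D1 and y1; the two classes produce {D1, D2} and {y1}.
table = luterfa_table({"D1", "y1"}, [{"D1", "D2"}, {"y1"}])
print(table)                      # {'D1': [1, 1, 0], 'D2': [0, 1, 0], 'y1': [1, 0, 1]}
print(disjunctions(table)["D1"])  # D1^FC ∨ D1^1
```

A row with a single 1 (such as D2 here) yields a one-term "disjunction", i.e., the partial function already is the full function.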
Table 5 determines the R1 + R2 disjunctions of partial Boolean functions. The following disjunctions represent the circuit of the block LUTerFA:
Step 11. The block LUTerASV transforms the full codes into the partial state codes . This transformation is not executed for the states . The table of LUTerASV includes the following columns: , , , and . The last column includes the symbols of additional variables equal to 1 in the codes . In the discussed case, the full state codes are taken from Figure 7; the partial state codes are taken from Figure 9. Using these codes, we can create Table 6.
Table 6.
Table of LUTerASV.
Obviously, using Table 6 gives us the perfect SOPs [17] of SBF (12). To minimize the number of interconnections between the blocks LUTerFA and LUTerASV, we transform Table 6 into a multi-functional Karnaugh map (Figure 10).
Figure 10.
Karnaugh map for SBF ASV2(SV).
Figure 10 is based on Figure 7. This transformation is done in an obvious way. We have simply replaced the symbols of states from Figure 7 with symbols of corresponding additional variables. Additionally, the codes of states are “do not care” code combinations. Using Figure 10 gives us the following SBF:
There are 10 literals in SBF (34). If each function from (12) is represented by its perfect SOP, then these SOPs have literals. Therefore, using the multi-functional Karnaugh map reduces the number of interconnections by a factor of 1.6. As shown in [31], the fewer interconnections a circuit has, the less power it consumes.
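The passage from Table 6 to the multi-functional Karnaugh map can be sketched as a mapping from every full state code to the additional variables equal to 1 in the corresponding partial code. The sketch assumes that codes of states not transformed by LUTerASV, as well as unused code combinations, are treated as "do not care" (None); the states, codes, and variable names are hypothetical, and the subsequent minimization of the map is not shown.

```python
from typing import Dict, List, Optional


def kmap_cells(full_codes: Dict[str, str], partial_vars: Dict[str, List[str]],
               n_bits: int) -> Dict[str, Optional[List[str]]]:
    """Cell contents of a multi-functional Karnaugh map over the full state codes.

    None marks a "do not care" cell: either the code is unused or its state
    is not transformed by LUTerASV (it has no entry in partial_vars).
    """
    cells: Dict[str, Optional[List[str]]] = {
        format(v, f"0{n_bits}b"): None for v in range(2 ** n_bits)}
    for state, code in full_codes.items():
        if state in partial_vars:
            cells[code] = partial_vars[state]
    return cells


# Hypothetical: a1 is not transformed; a2 and a3 are; code 11 is unused.
cells = kmap_cells({"a1": "00", "a2": "01", "a3": "10"},
                   {"a2": ["v1"], "a3": []}, 2)
print(cells)  # {'00': None, '01': ['v1'], '10': [], '11': None}
```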
Step 12. During this step, various technology mapping procedures should be executed [45,46]. If the FPGA chip used is produced by AMD Xilinx, then the Vivado CAD tool [47] should be used to implement the FSM circuit. In the next section, we show some results obtained by implementing FSM circuits with this CAD package. These experiments allow us to compare the effectiveness of the proposed method with that of some known methods.
At the end of this section, we show how to estimate the hardware amount in the circuits of FSMs and . We start with FSM . To find the LUT counts for the circuits of CoreFC, CorePC (the first logic level), and LUTerFA (the second logic level), it is necessary to analyze Table 5 (the table of LUTerFA). Each symbol “1” in this table corresponds to a LUT of the first logic level. There are 21 “1” symbols in the table; therefore, the first-level circuits consist of 21 LUTs. If a row of the table includes more than a single 1, then this row corresponds to a LUT of the second logic level. As follows from Table 5, there are 7 such rows; therefore, the circuit of LUTerFA includes 7 LUTs. To find the LUT counts for the blocks LUTerOF and LUTerASV, which form the third logic level, we should analyze SOPs (29) and (34), respectively. If an SOP includes at least two literals, then it determines a LUT of the third logic level. As follows from (29), there are 6 such SOPs. The analysis of (34) shows that this system includes 4 such SOPs. Therefore, the third logic level includes 10 LUTs. Summing up the numbers of LUTs for different levels, we see that the circuit of FSM includes LUTs.
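The three counting rules above can be written as a small function. It is a sketch that assumes every partial function fits into a single LUT (the precondition of the method), that Table 5 is given as a list of 0/1 rows, and that each SOP of the third level is summarized by its number of literals; the example data are hypothetical, not those of Table 5 or SBFs (29) and (34).

```python
from typing import List, Tuple


def estimate_lut_count(luterfa_rows: List[List[int]],
                       sop_literals: List[int]) -> Tuple[int, int, int, int]:
    """Return the (level 1, level 2, level 3, total) LUT counts."""
    level1 = sum(sum(row) for row in luterfa_rows)          # one LUT per symbol "1"
    level2 = sum(1 for row in luterfa_rows if sum(row) > 1)  # rows needing a disjunction
    level3 = sum(1 for n in sop_literals if n >= 2)          # one-literal SOPs are wires
    return level1, level2, level3, level1 + level2 + level3


# Hypothetical table with three rows and four third-level SOPs.
print(estimate_lut_count([[1, 1, 0], [0, 1, 0], [1, 0, 1]], [1, 2, 3, 1]))
# (5, 2, 2, 9)
```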
To estimate the number of LUTs in the circuit of FSM , it is necessary to find the compatibility classes for the set of states. Using the approach [20] gives the partition with . There are the following relations between the classes of and : , , and . This means that the table of LUTerPF (FSM ) is the same as Table 5. This gives 21 LUTs for the first logic level consisting of the blocks LUTer1–LUTer3. Also, there are 7 LUTs in the circuit of LUTerPF. The blocks LUTerOF are the same for both FSMs (each of which includes 6 LUTs). However, there is . This gives 6 LUTs in the block LUTerASV. In total, 12 LUTs create the third logic level of FSM . Summing up the number of LUTs for different levels, we see that there are LUTs in the circuit of .
Therefore, for such a simple FSM, we see a gain of 5.3% due to the transition from to . For more complex FSMs, the gain can be much higher. This statement is confirmed by the results of the research shown in the next section.
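The totals and the 5.3% gain follow directly from the per-level counts given above. Writing N for the LUT count, with the subscripts "proposed" and "TSA" used here only as our own labels for the two FSMs of this example (the first one uses LUTerFA, the second one uses LUTerPF), the arithmetic is:

```latex
\begin{aligned}
N_{\text{proposed}} &= 21 + 7 + (6 + 4) = 38,\\
N_{\text{TSA}} &= 21 + 7 + (6 + 6) = 40,\\
\frac{N_{\text{TSA}} - N_{\text{proposed}}}{N_{\text{proposed}}} \cdot 100\%
  &= \frac{2}{38} \cdot 100\% \approx 5.3\%.
\end{aligned}
```

Taking the proposed circuit as the 100% baseline is what reproduces the stated 5.3%; with the other circuit as the baseline, the same difference would give 5%.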
6. Experimental Results
As a basis for comparing the efficiency of different synthesis methods, we use the benchmark FSMs from the library [42]. The library includes 48 benchmarks of varying complexity (numbers of states, inputs, outputs, output collections, and interstate transitions). The STTs of the benchmark FSMs are represented in the KISS2 format. These benchmarks have been used by different designers as a representative sample to compare the main characteristics of proposed and known FSM circuits [33,34,36]. The characteristics of these benchmarks, which give an idea of their complexity, can be found in [19,42].
As a rule, FSMs are considered in research as stand-alone units. In this case, the stability of the output signals is not one of the main design problems. However, in the current paper, we consider Mealy FSMs as parts of digital systems. As follows, for example, from [14], Mealy FSMs are unstable: input fluctuations result in output fluctuations. The output fluctuations can cause operation failure in a digital system. Output stabilization can be achieved by using a synchronous input register (AIR) [19]. Figure 11 shows the principle of interaction between an FSM and other blocks of a digital system.
Figure 11.
Interaction of FSM with other system blocks.
The system outputs are treated as FSM inputs forming the set I. As long as there are transients in the digital system, the synchronization signal Clk1 is equal to zero. This actually disconnects the FSM from other system blocks. When the system outputs are stable, they are loaded into the AIR. Due to this, fluctuations in the system outputs do not affect the FSM output values. Of course, this approach has some overhead: the AIR consumes additional resources of the FPGA fabric, consumes some additional power, and increases the FSM cycle time. We took this overhead into account in our research.
In experiments, we use the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [38]. Its FPGA chip xc7vx690tffg1761-2, produced by AMD Xilinx, is used to implement the FSM circuits. For LUTs of this chip, there is . The step of technology mapping is executed by the CAD tool Vivado v2019.1 (64-bit) [47]. To create tables with experimental results, we use data from the reports produced by Vivado. VHDL models of the FSMs are used to connect the benchmarks with Vivado. We use the CAD tool K2F [10] to create VHDL code corresponding to the initial KISS2-based benchmark files.
From the Vivado reports, we have derived the following characteristics of Mealy FSM circuits: the number of LUTs (LUT count), the cycle time, the maximum operating frequency, and the power consumption. As a basis for comparison, we have chosen four types of FSMs. They are the following: 1. P Mealy FSMs with MBCs produced by the Auto method of Vivado; 2. P Mealy FSMs with one-hot state codes produced by the One-Hot method of Vivado; 3. JEDI-based P Mealy FSMs; and 4. -based FSMs with twofold state assignment [20]. We did not compare and PY Mealy FSMs. This is because FSMs have better characteristics than equivalent PY Mealy FSMs [20]. Therefore, if the proposed approach allows for improving characteristics compared to , then the results obtained will obviously be better than the results for equivalent PY Mealy FSMs.
As follows from [19,42], the values of LUT counts and other characteristics of LUT-based FSM circuits strongly depend on the relation between the values of and . In the discussed case, there is . The benchmarks used have 5 complexity levels (C0–C4), defined as follows. The benchmarks have the level C0 if . The level C0 determines trivial FSMs. The benchmarks have the level C1 if . The level C1 determines simple FSMs. The benchmarks have the level C2 if . The level C2 determines average FSMs. The benchmarks have the level C3 if . The level C3 determines big FSMs. The benchmarks have the level C4 if . The level C4 determines very big FSMs.
The results of experiments are shown in Table 7 (the LUT counts), Table 8 (the minimum cycle times), Table 9 (the maximum operating frequencies), and Table 10 (the consumed power). All these tables are organized in the same way. Benchmark names are in the table rows. The investigated methods are shown in the table columns. The complexity of a particular benchmark is shown in the last column. The row “Sum” includes the results of summation for the corresponding columns. In the row “Percentage”, we show the percentage of the summarized characteristics of various FSM circuits with respect to -based FSMs.
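The rows “Sum” and “Percentage” can be computed as sketched below. We assume that “Percentage” is the ratio of each column sum to the sum of the reference (proposed) column, so that the reference column shows 100% and the gain over another method equals its percentage minus 100. The two benchmarks and their values are hypothetical and are not taken from Table 7.

```python
from typing import Dict, Tuple


def summary_rows(results: Dict[str, Dict[str, float]],
                 reference: str) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return the "Sum" row and the "Percentage" row (reference column = 100%)."""
    methods = list(next(iter(results.values())))
    sums = {m: sum(row[m] for row in results.values()) for m in methods}
    percentage = {m: 100.0 * sums[m] / sums[reference] for m in methods}
    return sums, percentage


# Hypothetical LUT counts for two benchmarks (not taken from Table 7).
toy = {"bench_a": {"Auto": 10, "Proposed": 8},
       "bench_b": {"Auto": 20, "Proposed": 16}}
sums, pct = summary_rows(toy, "Proposed")
print(sums)  # {'Auto': 30, 'Proposed': 24}
print(pct)   # {'Auto': 125.0, 'Proposed': 100.0}
```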
Table 7.
Experimental results (LUT counts).
Table 8.
Experimental results (cycle time in nanoseconds).
Table 9.
Experimental results (maximum operating frequency in MHz).
Table 10.
Experimental results (consumed power in watts).
From Table 7, we can find that, compared to the other investigated methods, the circuits of -based FSMs consume the fewest LUTs. The proposed approach provides the following gain: 1. 48.97% regarding the Auto-based FSMs; 2. 69.98% regarding the One-Hot-based FSMs; 3. 26.33% regarding the JEDI-based FSMs; and 4. 9.44% regarding the -based FSMs. In our opinion, this gain is associated with a decrease in the number of transformed state codes compared to -based FSMs. Due to this, the LUT count of LUTerASV is lower than that of the code transformer of equivalent -based FSMs. Additionally, the gain can be achieved by reducing the cardinality of the partition of states. The fulfillment of the condition decreases the required number of LUT inputs for the elements of LUTerFA compared to that of the LUTs of LUTerPF. This phenomenon can lead to a decrease in the LUT count.
The following phenomenon is clear from Table 7: if an FSM has the complexity C0, then equivalent FSMs based on collection encoding have the same LUT counts. Moreover, in this case, the other FSMs have lower LUT counts than - and -based FSMs. We can explain this in the following way. If an FSM has the complexity C0, then condition (4) holds. In this case, each SOP (2) and (3) is implemented by a single LUT. Therefore, in this case, there is no need to use any structural decomposition methods. However, regardless of the validity of condition (4), output collections are encoded for both - and -based FSMs. As a result, the block LUTerOF is used. This block consumes additional LUTs compared to the other researched methods. Since (4) holds, there are no partial functions for FSMs having the complexity C0. As a result, the assembling blocks (LUTerFA and LUTerPF) are not needed. This means that both and FSMs degenerate into equivalent PY FSMs.
Now, let us analyze the temporal characteristics of FSM circuits. They are represented in Table 8 (the cycle time measured in nanoseconds) and Table 9 (the maximum operating frequency measured in megahertz).
Analysis of Table 8 shows that JEDI-based FSMs are the fastest. It also shows that -based FSMs are marginally slower than circuits of -based FSMs (the average loss is 0.56%). At the same time, the proposed approach generates circuits with worse time characteristics than the circuits of P FSMs. The Auto-based FSMs are 0.09% faster than the -based FSMs. The One-Hot-based FSMs are 0.73% faster than the -based FSMs. Finally, JEDI-based FSMs are 5.93% faster than -based FSMs. If the FSM complexity exceeds C0, then both - and -based FSMs have three-level circuits. At the same time, it is difficult to estimate a priori the number of logic levels in circuits of P FSMs, since it depends on the numbers of literals in the implemented SOPs.
As follows from Table 8, if the FSM complexity is equal to C0, then the cycle times are the same for equivalent - and -based FSMs. This phenomenon takes place because, in this case, both - and -based FSMs turn into PY FSMs. However, for the most complex FSMs (complexity C4), the proposed method produces the fastest circuits. Thus, the relative performance of FSMs improves as the synthesized FSMs become more complex.
As follows from Table 9, on average, the circuits of FSMs are slower than the circuits of P-based FSMs. Our approach loses 1.6% to Auto-based FSMs and 1.43% to One-Hot-based FSMs. The JEDI-based FSMs have the greatest advantage (6.54%). Only -based FSMs are a bit slower than -based FSMs. Obviously, the reasons for the loss in frequency are the same as the reasons for the loss in cycle time. Additionally, analysis of Table 9 shows that, starting with complexity level C2, our method produces faster circuits than the other methods under study.
It is known [48] that power consumption is one of the most important characteristics of FSM circuits. In particular, it is important in the case of mobile and autonomous cyber-physical systems [49]. Very often, a designer has to trade off the area and timing characteristics of a particular device against its power consumption. The values of power consumption can be taken from the Vivado reports. The power consumption is measured at the maximum possible operating frequency. We show the experimental results for power consumption in Table 10.
The proposed method reduces the numbers of LUTs in FSM circuits compared with this characteristic of equivalent -based FSMs. Very often, such improvement results in an increase in power consumption [19]. This phenomenon takes place for our method. However, as follows from comparison of - and -based FSMs (Table 10), FSMs have a very small gain in power consumption. Compared to -based FSMs, the loss in power consumption averages 1.55%. Additionally, JEDI-based FSMs require less power than equivalent FSMs. The proposed approach allows for obtaining FSM circuits with less power consumption than for both Auto-based FSMs (11.95% of gain) and One-Hot-based FSMs (19.29% of gain).
If FSMs have complexity C0, then both and FSMs have equal values of power consumption. If the FSM complexity exceeds C0, then FSMs always require less power than equivalent FSMs. We see the following reason for this situation. In FSMs, the state variables enter only block LUTerASV. In contrast to this, in FSMs, the outputs of LUTerFA are connected with two blocks (LUTerASV and CoreFC). It is known [31] that interconnections account for up to 70% of the consumed power. Therefore, the more interconnections a circuit has, the more power it consumes.
Let us sum up some results of the comparison of equivalent and FSMs. If FSMs have complexity C0, then both models have the same values of the basic characteristics. For other levels of complexity, FSMs have better spatial characteristics (the required FPGA chip area) than their single-core counterparts based on twofold state assignment. For rather simple FSMs, FSMs have better temporal characteristics. However, as the complexity increases, the cycle times (and maximum operating frequencies) of FSMs gradually become better than those of their single-core counterparts. The FSM circuits based on the proposed method always require more power. However, this loss is very small (it does not exceed 2% on average). This comparison leads to the following conclusion: FSMs should be used instead of FSMs if the required chip area is the main optimality criterion of the designed LUT-based circuits. This conclusion is supported by the diagrams shown in Figure 12.
Figure 12.
Percent summary of results.
Under certain conditions, the proposed method can be applied to implement the LUT-based circuit of any sequential block. In this case, neither the algorithm for the functioning of this block nor the scope of the digital system in which this block operates is important. The possibility of applying the model of FSM depends on the distribution of inputs between the states . If this distribution leads to the fulfillment of condition (4), then there is no need for optimization (because the circuit of P FSM has the best possible characteristics). If condition (4) is violated but the distribution leads to the fulfillment of condition (10), then the method can be applied. Otherwise, it is impossible to find a partition of the set of states for which each partial function is represented by a single-LUT circuit. The proposed method can be applied only if condition (18) is satisfied for some states . In this case, the corresponding partial functions depending on the state variables are implemented using single-LUT circuits. The more states that satisfy condition (18), the greater the gain from applying our method compared to using FSMs. However, if condition (18) is satisfied for all states, then there is no point in applying either or FSMs. In this case, both of these models degenerate into a PY FSM. Thus, it is advisable to use the proposed method only if condition (18) is satisfied for a number of states (but not for all M states), and condition (10) for the rest.
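The applicability rules of this paragraph can be condensed into a small decision function. It is a sketch that treats conditions (4), (18), and (10) as already evaluated Boolean flags (their formulas are not re-derived here): `cond18` holds one flag per state, and `cond10_for_rest` states whether the states violating (18) can be distributed so that (10) holds. The returned labels are our own.

```python
from typing import List


def advise_model(cond4: bool, cond18: List[bool], cond10_for_rest: bool) -> str:
    """Encode the applicability rules stated for the proposed method.

    cond4           -- condition (4) holds for the FSM
    cond18          -- per state: condition (18) holds
    cond10_for_rest -- the states violating (18) can be distributed so that (10) holds
    """
    if cond4:
        return "P FSM"                    # each SOP already fits a single LUT
    if all(cond18):
        return "PY FSM"                   # both two-core models degenerate
    if not cond10_for_rest:
        return "not applicable"           # some partial function needs more than one LUT
    if not any(cond18):
        return "no state satisfies (18)"  # CoreFC would be empty, no gain expected
    return "proposed method"


print(advise_model(False, [True, False, True], True))  # proposed method
```

The "not applicable" outcome corresponds to the situation treated by the generalized architecture in Section 7.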
7. Generalized FSM Architecture
Unfortunately, there is a condition under which the proposed method cannot be applied. For a given FSM, let the set of states include at least one state for which the following condition is satisfied:
It is obvious that the state satisfying condition (35) cannot be included in either set or set S1. To obtain partial functions generated during transitions from this state, it is necessary to apply the methods of functional decomposition. Thus, to take into account the presence of such states, it is necessary to introduce a CoreFD based on functional decomposition into the architecture of FSM shown in Figure 5.
We propose to split the set S into three disjoint sets (, , ). The set includes states satisfying condition (18). The set includes states satisfying condition (35). The set includes the rest of the states, i.e., . The transitions from states are determined by FSM inputs creating the set I3. To encode these states, it is necessary to create the set of state variables ASV3. This set includes its own unique state variables. Three sets of PBFs are generated by LUTs of CoreFD: (IMFs generated during the transitions from states ); (additional variables encoding the OCs generated during the transitions from states ); and AV3 (unique additional variables encoding the OCs generated during the transitions from states ). Therefore, the following partial SBFs are generated by LUTs of CoreFD:
We denote as the proposed generalized architecture of the LUT-based FSM circuit. Here, the letter “F” means the presence of the block CoreFD. The proposed generalized architecture is shown in Figure 13.
Figure 13.
Generalized architecture of Mealy FSM.
The generalized architecture (Figure 13) includes three cores of PBFs. CoreFC generates PBFs for states satisfying condition (18). CoreFD generates PBFs for states satisfying condition (35). CorePC generates PBFs for the rest of the states.
In the FSM, LUTerFA generates the full functions represented by the following systems of disjunctions:
LUTerOF implements SBF (7). However, now the set AV is represented in the following form: . To encode the states of an FSM, the set is used, where . Therefore, LUTerASV generates the SBF:
The proposed architecture is universal. In this paper, we propose the following method for synthesizing an FSM with the generalized architecture:
- Creating an STT of a P Mealy FSM.
- Pre-formation of sets , S2, and S3.
- Pre-formation of partition of set S2.
- Final formation of sets and S2 and partition .
- Creating full state codes for states .
- Encoding of output collections and finding SBF (7).
- Encoding of states by partial state codes .
- Encoding of states by partial state codes .
- Creating a table of LUTerASV and system (41).
- Implementing a Mealy FSM circuit using internal resources of a particular FPGA chip.
We hope that all the presented steps of this method are clear from the previous text. We do not consider this method in detail here; this will be the subject of a separate study. Now we show that the generalized architecture (Figure 13) gives rise to 6 other architectures. Three conditions are used for this purpose. The fulfillment of condition (18) indicates the presence of the block CoreFC in the FSM circuit architecture. This means that the set contains at least one element. The fulfillment of condition (35) indicates the presence of the block CoreFD. In this case, the set S3 contains at least one element. Finally, the fulfillment of the condition
indicates the presence of the block CorePC. In this case, the set S2 contains at least one element. We show the possible FSM models in Table 11. Additionally, the table rows contain the conditions (or their conjunctions) under which a particular architecture should be used.
Table 11.
Possible FSM models.
The first three columns of the table contain the names of the sets (, , ) and corresponding architectural blocks (CoreFC, CorePC, CoreFD). The fourth column contains the model designation. The fifth column shows which combination of conditions leads to the model from a particular row. If there is a zero (one) at the intersection of the column with the block and the row with the model, then this block is not included (is included) in the FSM architecture corresponding to this row.
For example, if all states satisfy condition (35), then the architecture includes only CoreFD. We denote this architecture by the symbol . This is the first row of Table 11. If some states satisfy condition (35) and others satisfy condition (42), then the architecture includes blocks CorePC and CoreFD (row 3). This leads to FSMs, and so on. The last row corresponds to the generalized FSM architecture, including three cores of partial functions.
Using Table 11 and generalized architecture, we can obtain the architecture for any model represented by this table. Obviously, it is possible to transform the design method for FSMs into a design method for any other model. In this case, of particular interest is the implementation of Step 2 of the proposed method and the definition of a model corresponding to its outcome. We have presented the algorithm for performing these steps in Figure 14.
Figure 14.
Selection of FSM model.
Let us consider this algorithm. Block 1 shows the initial information (the FSM is represented by its STG, and the FPGA chip by the number of inputs of its LUT elements). Next, this STG is converted into the equivalent STT (block 2).
The distribution of states over the sets is performed in a cycle including blocks 3–7. The distribution starts from the first state (block 3). In block 4, condition (18) is checked. If this condition is met (output “Yes” from block 4), then the state is placed in set (block FC). If this condition is violated (output “No” from block 4), then condition (35) is checked (block 5). If this condition is met (output “Yes” from block 5), then the state is placed in set S3 (block FD). If this condition is violated (output “No” from block 5), then the state is placed in set S2 (block PC). Then the analysis of the next state begins (block 6). If all states are distributed (output “Yes” from block 7), then the FSM architecture selection begins (transition to block 8). Otherwise (output “No” from block 7), the analysis continues (transition to block 4).
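The distribution loop (blocks 3–7) can be sketched as follows, following the definitions of the three sets given with the generalized architecture (the CoreFC set collects the states satisfying (18), S3 the states satisfying (35), and S2 the rest). The two predicates are hypothetical stand-ins for checking conditions (18) and (35) on an STT, and `s_fc` is our own name for the set served by CoreFC.

```python
from typing import Callable, List, Tuple


def distribute_states(states: List[str],
                      satisfies_18: Callable[[str], bool],
                      satisfies_35: Callable[[str], bool]
                      ) -> Tuple[List[str], List[str], List[str]]:
    """Blocks 3-7: return (s_fc, s2, s3) in the order the states are analyzed."""
    s_fc: List[str] = []
    s2: List[str] = []
    s3: List[str] = []
    for state in states:              # blocks 3, 6 and 7: visit every state once
        if satisfies_18(state):       # block 4
            s_fc.append(state)        # block FC
        elif satisfies_35(state):     # block 5
            s3.append(state)          # block FD
        else:
            s2.append(state)          # block PC
    return s_fc, s2, s3


# Hypothetical predicates: only a1 satisfies (18), only a3 satisfies (35).
print(distribute_states(["a1", "a2", "a3", "a4"],
                        lambda s: s == "a1", lambda s: s == "a3"))
# (['a1'], ['a2', 'a4'], ['a3'])
```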
To choose an architecture, we analyze whether empty sets are obtained in the process of distributing states. The analysis begins with checking the set (block 8). As follows from Table 11, if the set is empty (output “Yes” from block 8), then the choice is made among three architectures (, , ). Set S2 is analyzed (block 9). If it is empty (output “Yes” from block 9), then the FSM is selected (block 11). If set S2 is not empty (output “No” from block 9), then set S3 is analyzed (block 12). If it is empty (output “Yes” from block 12), then the FSM is selected (block 15). If set S3 is not empty (output “No” from block 12), then the FSM is selected (block 16).
If the set is not empty (output “No” from block 8), then the choice is made among four architectures (, , , ). Set S2 is analyzed (block 10). If it is empty (output “Yes” from block 10), then set S3 is analyzed (block 13). If S3 is empty (output “Yes” from block 13), then the FSM PY is selected (block 17). If S3 is not empty (output “No” from block 13), then the FSM is selected (block 18). If set S2 is not empty (output “No” from block 10), then set S3 is analyzed (block 14). If S3 is empty (output “Yes” from block 14), then the FSM is selected (block 19). If S3 is not empty (output “No” from block 14), then the FSM is selected (block 20).
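The selection part of the algorithm (blocks 8–20) can be transcribed as nested tests on the emptiness of the three sets. Because the model symbols of Table 11 are not reproduced in this text, the sketch returns hypothetical labels listing the cores present in the selected model; only "PY" (block 17) is named explicitly in the description.

```python
from typing import List


def select_architecture(s_fc: List[str], s2: List[str], s3: List[str]) -> str:
    """Blocks 8-20: label the cores present in the selected model."""
    if not (s_fc or s2 or s3):
        raise ValueError("the FSM must have at least one state")
    if not s_fc:                              # block 8: "Yes"
        if not s2:                            # block 9: "Yes"
            return "CoreFD"                   # block 11
        if not s3:                            # block 12: "Yes"
            return "CorePC"                   # block 15
        return "CorePC+CoreFD"                # block 16
    if not s2:                                # block 10: "Yes"
        if not s3:                            # block 13: "Yes"
            return "PY"                       # block 17
        return "CoreFC+CoreFD"                # block 18
    if not s3:                                # block 14: "Yes"
        return "CoreFC+CorePC"                # block 19
    return "CoreFC+CorePC+CoreFD"             # block 20


print(select_architecture(["a1"], ["a2", "a4"], ["a3"]))  # CoreFC+CorePC+CoreFD
```

The seven reachable labels match the seven models of Table 11: three single-core variants (one of which degenerates into PY), three dual-core variants, and the generalized three-core architecture.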
8. Conclusions
Modern FPGAs are widely used in the design of cyber-physical systems [13]. These chips are very powerful: a single FPGA chip is enough to implement practically any block (either combinational or sequential) of modern CPSs [50]. The flip side of FPGA universality is the extremely small number of LUT inputs [21,51]. This is a serious drawback that significantly complicates the design process. As a result, various methods of functional decomposition have to be applied in the step of technology mapping. It is known that FD-based circuits are multi-level. The disadvantages of multi-level circuits are well known: they are slower and less energy efficient than their single-level counterparts.
Better results can be obtained by replacing the functional decomposition with the structural one [10]. This is shown, for example, in [19]. In [20], the FSM circuit optimization is achieved by using the twofold state assignment and encoding of output collections. The resulting FSM circuits have better LUT counts than their FD-based counterparts. However, the twofold state assignment requires transforming maximum binary state codes into their extended equivalents. As a result, a code transformer is needed, which consumes some additional resources of the FPGA fabric.
To reduce the LUT count in the circuits of -based FSMs, we propose to use two LUT-based blocks (cores). To do this, we use the main ideas from [22]. Both cores generate systems of partial Boolean functions. This leads to FSMs with the following peculiarity: one of the cores uses the MBCs, whereas the second core uses the partial state codes. Our approach reduces LUT counts and slightly improves temporal characteristics as compared to equivalent -based FSMs. The overhead of the proposed method is a rather insignificant increase in power consumption (up to 1.56% on average). We hope the proposed FSMs can serve as an efficient tool for implementing FPGA-based sequential devices in modern cyber-physical systems.
The conducted experiments have shown that, under certain conditions, the proposed method produces better results than methods based entirely on either maximum binary or one-hot state codes. If some partial functions are implemented using a single LUT, then our method allows for improving the spatial, temporal, and energy characteristics of the LUT-based circuits of sequential blocks. We think that our method can be modified to take into account state assignment methods other than the twofold one. We see this as a direction for further development of the proposed method.
Under certain conditions, the transition from the proposed model to other models is possible (Table 11). In the most general case, the FSM architecture consists of three cores of partial functions. There are also three dual-core architectures. One of the directions of our further research is the development of synthesis methods and the study of the characteristics of LUT-based FSM circuits based on these two- and three-core models.
Author Contributions
Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T., K.K. and S.S.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T., K.K. and S.S.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T., K.K. and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available in the article.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| CLB | configurable logic block |
| CPS | cyber-physical system |
| ESC | extended state code |
| FD | functional decomposition |
| FPGA | field-programmable gate array |
| FSM | finite state machine |
| IMF | input memory function |
| LUT | look-up table |
| MBC | maximum binary code |
| OC | output collection |
| PBF | partial Boolean function |
| SBF | system of Boolean functions |
| SD | structural decomposition |
| SOP | sum-of-products |
| STG | state transition graph |
| STT | state transition table |
| TSA | twofold state assignment |
References
- Alur, R. Principles of Cyber-Physical Systems; MIT Press: Cambridge, MA, USA, 2015.
- Suh, S.C.; Tanik, U.J.; Carbone, J.N.; Eroglu, A. Applied Cyber-Physical Systems; Springer: New York, NY, USA, 2014.
- Marwedel, P. Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems, and the Internet of Things, 3rd ed.; Springer International Publishing: New York, NY, USA, 2018.
- Kovtun, V.; Izonin, I.; Gregus, M. Reliability model of the security subsystem countering to the impact of typed cyber-physical attacks. Sci. Rep. 2022, 12, 12849.
- Wojnakowski, M.; Wisniewski, R.; Bazydlo, G.; Poplawski, M. Analysis of safeness in a Petri net-based specification of the control part of cyber-physical systems. Int. J. Appl. Math. Comput. Sci. 2021, 31, 647–657.
- Wisniewski, R.; Bazydlo, G.; Gomes, L.; Costa, A.; Wojnakowski, M. Analysis and design automation of cyber-physical system with hippo and IOPT-tools. In Proceedings of the IECON 2019—45th Annual Conference of the IEEE Industrial Electronics Society, Lisbon, Portugal, 14–17 October 2019; Volume 1, pp. 5843–5848.
- Bazydlo, G.; Costa, A.; Gomes, L. Integrating different modelling formalisms supporting co-design development of controllers for cyber-physical systems—A case study. In Proceedings of the 2022 IEEE 9th International Conference on e-Learning in Industrial Electronics (ICELIE), Brussels, Belgium, 17–20 October 2022; pp. 1–6.
- Wisniewski, R.; Wojnakowski, M.; Li, Z. Design and Verification of Petri-Net-Based Cyber-Physical Systems Oriented toward Implementation in Field-Programmable Gate Arrays—A Case Study Example. Energies 2023, 16, 67.
- Wisniewski, R.; Benysek, G.; Gomes, L.; Kania, D.; Simos, T.; Zhou, M. IEEE Access Special Section: Cyber-Physical Systems. IEEE Access 2019, 7, 157688–157692.
- Barkalov, A.; Titarenko, L.; Mazurkiewicz, M. Foundations of Embedded Systems; Springer International Publishing: New York, NY, USA, 2019.
- Gajski, D.D.; Abdi, S.; Gerstlauer, A.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
- Gazi, O.; Arli, A. State Machines Using VHDL: FPGA Implementation of Serial Communication and Display Protocols; Springer: Berlin, Germany, 2021; p. 326.
- Bhattacharjya, A.; Wisniewski, R.; Nidumolu, V. Holistic Research on Blockchain’s Consensus Protocol Mechanisms with Security and Concurrency Analysis Aspects of CPS. Electronics 2022, 11, 2760.
- Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
- Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013; Volume 231.
- Baranov, S. Finite State Machines and Algorithmic State Machines: Fast and Simple Design of Complex Finite State Machines; Amazon: Seattle, WA, USA, 2018; p. 185.
- De Micheli, G. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
- Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
- Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174.
- Barkalov, O.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607.
- AMD Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 1 March 2023).
- Barkalov, A.; Titarenko, L.; Krzywicki, K. Using a Double-Core Structure to Reduce the LUT Count in FPGA-Based Mealy FSMs. Electronics 2022, 11, 3089.
- Baranov, S. High-Level Synthesis of Digital Systems: For Data-Path and Control Dominated Systems; Amazon: Seattle, WA, USA, 2018; p. 207.
- Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797.
- Zhao, X.; He, Y.; Chen, X.; Liu, Z. Human-Robot Collaborative Assembly Based on Eye-Hand and a Finite State Machine in a Virtual Environment. Appl. Sci. 2021, 11, 5754.
- Koo, B.; Bae, J.; Kim, S.; Park, K.; Kim, H. Test case generation method for increasing software reliability in Safety-Critical Embedded Systems. Electronics 2020, 9, 797.
- Senhadji-Navarro, R.; Garcia-Vargas, I. Methodology for Distributed-ROM-based Implementation of Finite State Machines. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2020, 40, 2411–2415.
- Skliarova, I. A Survey of Network-Based Hardware Accelerators. Electronics 2022, 11, 1029.
- Mishchenko, A.; Brayton, R.; Jiang, J.H.; Jang, S. Scalable don’t-care-based logic optimization and resynthesis. ACM Trans. Reconfigurable Technol. Syst. 2011, 4, 1–23.
- El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285.
- Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 25–27 February 2018; pp. 61–66.
- Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources. Application Note. 2012. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.5300&rep=rep1&type=pdf (accessed on 1 March 2023).
- Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27.
- Kubica, M.; Opara, A.; Kania, D. Technology Mapping for LUT-Based FPGA; Springer: Berlin, Germany, 2021; p. 208.
- Solov’ev, V.V. Implementation of finite-state machines based on programmable logic ICs with the help of the merged model of Mealy and Moore machines. J. Commun. Technol. Electron. 2013, 58, 172–177.
- Park, J.; Yoo, H. Area-efficient fault tolerance encoding for Finite State Machines. Electronics 2020, 9, 1110.
- Baranov, S. From Algorithm to Digital System: HLS and RTL Tool Synthagate in Digital System Design; Amazon: Seattle, WA, USA, 2020; p. 76.
- Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits. Electronics 2022, 11, 950.
- Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011.
- Sentovich, E.; Singh, K.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A.L. SIS: A System for Sequential Circuit Synthesis; Technical Report; University of California, Berkeley: Berkeley, CA, USA, 1992.
- Tatalov, E. Synthesis of Compositional Microprogram Control Units for Programmable Devices. Master’s Thesis, Donetsk National Technical University, Donetsk, Ukraine, 2011.
- McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
- Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
- Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
- Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
- Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
- Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 1 March 2023).
- Tiwari, A.; Tomko, K.A. Saving power by mapping finite-state machines into embedded memory blocks in FPGAs. Proc. Des. Autom. Test Eur. Conf. Exhib. 2004, 2, 916–921.
- Lucía, Ó.; Monmasson, E.; Navarro, D.; Barragán, L.A.; Urriza, I.; Artigas, J.I. Modern control architectures and implementation. Control Power Electron. Convert. Syst. 2018, 2, 477–502.
- Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63.
- Altera. Cyclone IV Device Handbook. Available online: http://www.altera.com/literature/hb/cyclone-iv/cyclone4-handbook.pdf (accessed on 1 March 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).