Figure 1.
Block-based Neural Network layout.
Figure 2.
Processing Element (PE) schemes considered for the Block-based Neural Network.
Figure 3.
Inner feedback loops of a Block-based Neural Network (BbNN) configuration.
Figure 4.
(a) Scalable architecture using multiple isolated reconfigurable regions connected to the static system. (b) Scalable architecture using multiple reconfigurable regions arranged in a slot style, where one reconfigurable module (RM) can span multiple reconfigurable regions (RRs).
Figure 5.
Scalable architectures using reconfigurable interconnections.
Figure 6.
Fixed-point representation used in this work.
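Figure 6 defines the fixed-point format used by the architecture. The captions do not restate the exact word and fraction widths, so the sketch below assumes a Q8.8 format (16 bits, 8 fractional) purely for illustration of how such a representation behaves:

```python
# Illustrative fixed-point helpers. The Q8.8 format (8 fractional
# bits) is an assumption, not the format shown in Figure 6.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS  # 256

def to_fixed(x: float) -> int:
    """Quantize a real value to a signed fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Recover the real value represented by a fixed-point integer."""
    return q / SCALE

def fx_mul(a: int, b: int) -> int:
    """Fixed-point multiply: the raw product carries 2*FRAC_BITS
    fractional bits, so shift right to renormalize."""
    return (a * b) >> FRAC_BITS
```

For example, `to_float(fx_mul(to_fixed(1.5), to_fixed(2.0)))` yields `3.0`, since both operands are exactly representable in this format.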
Figure 7.
Comparison of the approximate sigmoid function with the exact function.
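The approximation compared in Figure 7 avoids DSP resources (see Table 4), but its exact form is not reproduced in these captions. As a stand-in, the sketch below compares the exact sigmoid against a generic piecewise-linear approximation of the kind commonly used in hardware; the segment choice is an assumption, not the paper's implementation:

```python
import math

def sigmoid(x: float) -> float:
    """Exact logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_pwl(x: float) -> float:
    """Hypothetical piecewise-linear sigmoid (NOT the paper's):
    saturate outside [-4, 4], unit-width linear segments inside."""
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    # Linear interpolation between sigmoid samples at integer points.
    x0 = math.floor(x)
    y0, y1 = sigmoid(x0), sigmoid(x0 + 1.0)
    return y0 + (y1 - y0) * (x - x0)
```

With these segments the worst-case error (at the saturation boundaries) stays below 0.02, which gives a feel for why such approximations suffice for evolved networks.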
Figure 8.
Proposed structure for the BbNN processing element. (a) shows the interface and internal connections of one possible type of processing element (PE); (b) shows the internal blocks of a generic PE.
Figure 9.
Values of the selection parameter register for the subscribed neuron type and read sequence.
Figure 10.
Block-based Neural Network Intellectual Property (IP) with fine-grain reconfigurable elements in each PE.
Figure 11.
Configurations with the same dimensions and selected output but different latency. (a) shows a configuration with 10 latency cycles, while (b) shows a configuration with 18 latency cycles.
Figure 12.
BbNN configuration encoded in the chromosome structure. (a) presents the representation of the chromosome structure; (b) shows an example of a dataflow configuration derived from the bits in E_param and N_param.
Figure 13.
(a) Empty reconfigurable region. (b) Reconfigurable region with a 3 × 3 BbNN. (c) Reconfigurable region with a 4 × 4 BbNN showing LUT-based constants grouped in columns.
Figure 14.
Connecting the external edges of the BbNN using reconfigurable interconnections.
Figure 15.
(a) A BbNN static system implementation that can allocate up to 3 × 5 PEs. (b) Neuron blocks arranged inside the reconfigurable region at run-time to form a 1 × 5 BbNN.
Figure 16.
Progression of the fitness value during the XOR training for a 2 × 2 BbNN.
Figure 17.
Solution for the XOR problem.
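Figures 16 and 17 track a fitness value toward a target near 0.95 for the XOR problem (cf. Table 7). The exact fitness definition is not given in these captions; a plausible form, shown below only as an assumption, is one minus the mean absolute output error over the four truth-table cases:

```python
# Hypothetical XOR fitness: 1 minus the mean absolute output error
# over the four truth-table cases. This is NOT necessarily the
# paper's exact fitness function.
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_fitness(net) -> float:
    """`net` is any callable mapping two binary inputs to an
    output in [0, 1]; a perfect network scores 1.0."""
    err = sum(abs(net(a, b) - target) for (a, b), target in XOR_CASES)
    return 1.0 - err / len(XOR_CASES)
```

Under this definition a constant-output network scores 0.5, so the 0.95 target clearly separates trained configurations from trivial ones.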
Figure 18.
Influence of the BbNN size on the convergence of the algorithm for the XOR problem. Each graph shows data from 100 executions of the EA. The number of generations needed to reach a solution is segmented into intervals from 0 to 1000 generations; executions exceeding 1000 generations are stopped. Convergence is analyzed for different BbNN sizes: 2 × 2 (a), 3 × 2 (b), 4 × 2 (c), and 5 × 2 (d).
Figure 19.
Solving the XOR problem using the dynamic scalability feature. At generation 211 the Evolutionary Algorithm (EA) increases the BbNN size and resets evolution. At generation 249 the EA converges towards a solution.
Figure 20.
OpenAI Mountain Car environment and coordinate system used to determine the position of the car. The hills are generated with the sin(3x) function.
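The caption of Figure 20 states the Mountain Car hills follow sin(3x); the terrain height at a given x coordinate can thus be sketched as below (any scaling or offset applied by the actual OpenAI Gym renderer is omitted here):

```python
import math

def hill_height(x: float) -> float:
    """Terrain profile of the Mountain Car environment as described
    in the caption: height = sin(3x). Renderer scaling is ignored."""
    return math.sin(3.0 * x)
```

For instance, the crest nearest the origin sits at x = π/6, where `hill_height` evaluates to 1.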
Figure 21.
Example of progression of the fitness value during the Mountain Car training for a 2 × 2 BbNN.
Figure 22.
Influence of the BbNN size on the convergence of the algorithm for the Mountain Car problem. Each graph shows data from 100 executions of the EA. The number of generations needed to reach a solution is segmented into intervals from 0 to 1000 generations; executions exceeding 1000 generations are stopped. Convergence is analyzed for different BbNN sizes: 1 × 2 (a), 2 × 2 (b), 3 × 2 (c), and 4 × 2 (d).
Figure 23.
Cart Pole environment.
Figure 24.
Example of progression of the fitness value during the Cart Pole training for a 3 × 4 BbNN.
Figure 25.
Influence of the BbNN size on the convergence of the algorithm for the Cart Pole problem. Each graph shows data from 100 executions of the EA. The number of generations needed to reach a solution is segmented into intervals from 0 to 1000 generations; executions exceeding 1000 generations are stopped. Convergence is analyzed for different BbNN sizes: 1 × 4 (a), 2 × 4 (b), 3 × 4 (c), and 4 × 4 (d).
Figure 26.
BbNN solution for the Cart Pole problem.
Figure 27.
Cart Pole training and re-training for a 3 × 4 BbNN.
Figure 28.
Mountain Car training and re-training for a 2 × 2 BbNN.
Table 1.
Signal coding for parameter selection.
SelX | Input | SelW | Weight | SelB | Bias | SelY | Output |
---|---|---|---|---|---|---|---|
00 | | 00 | | 0 | | 0001 | |
01 | | 01 | | 1 | | 0010 | |
10 | | 10 | | - | - | 0100 | |
11 | | 11 | | - | - | 1000 | |
- | - | - | - | - | - | 0000 | Reset acc. |
Table 2.
Parameters of the Evolutionary Algorithm.
Parameter | Type | Value | Functionality |
---|---|---|---|
TargetFitness | Float (0, 1.0) | Application dependent | Desired fitness |
Pop-size | Int | 15 | Number of chromosomes in the population |
N-offspring | Int | 10 | Number of mutated copies from one chromosome |
MaxAge | Int | 7 | Maximum number of stalled generations |
ExtinctionFreq | Int | 5 | Generations between extinctions |
MutationRate | Float (0, 1.0) | 0.3 | Fraction of data altered in the mutation |
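The parameters in Table 2 can be wired into a minimal mutation-only evolutionary loop, sketched below. The selection scheme, chromosome encoding, extinction mechanism, and chromosome length are simplified assumptions, not the paper's implementation:

```python
import random

# Parameters taken from Table 2; everything else here is an assumption.
POP_SIZE = 15
N_OFFSPRING = 10
MAX_AGE = 7
EXTINCTION_FREQ = 5
MUTATION_RATE = 0.3
TARGET_FITNESS = 0.95   # application dependent
CHROM_LEN = 32          # assumed chromosome length

def random_chrom():
    return [random.randint(0, 1) for _ in range(CHROM_LEN)]

def mutate(chrom):
    """Flip roughly MUTATION_RATE of the bits."""
    return [b ^ (random.random() < MUTATION_RATE) for b in chrom]

def evolve(fitness, max_generations=1000):
    pop = [random_chrom() for _ in range(POP_SIZE)]
    best, best_fit, age = pop[0], fitness(pop[0]), 0
    for gen in range(1, max_generations + 1):
        # Each chromosome produces N_OFFSPRING mutated copies.
        offspring = [mutate(c) for c in pop for _ in range(N_OFFSPRING)]
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:POP_SIZE]
        if fitness(pop[0]) > best_fit:
            best, best_fit, age = pop[0], fitness(pop[0]), 0
        else:
            age += 1
        # Periodic extinction: restart everything except the best.
        if gen % EXTINCTION_FREQ == 0:
            pop = [best] + [random_chrom() for _ in range(POP_SIZE - 1)]
        # A MaxAge stall could trigger the dynamic size increase of
        # Figure 19; in this sketch we simply stop.
        if best_fit >= TARGET_FITNESS or age > MAX_AGE:
            break
    return best, best_fit
```

For example, `evolve(lambda c: sum(c) / len(c))` climbs toward an all-ones chromosome.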
Table 3.
Resource utilization of the BbNN implemented on a Zynq XC7Z020.
Resource Type | Static System | Individual PE |
---|---|---|
LUTs | 7966 | 473 (95.84%) * |
FFs | 7939 | 163 (16.98%) * |
DSPs | 0 | 1 (25%) * |
BRAMs | 2 | 0 |
Table 4.
Resource utilization per individual PE in comparison with other works in the state-of-the-art.
Work | Platform | Logic Elements * | Memory Elements | DSP Elements | Activation Function |
---|---|---|---|---|---|
Proposed architecture | Zynq XC7Z020 | 473 | 163 FFs | 1 | Sigmoid (no DSP) |
Nambiar [21] | Stratix III | 231 | 276 FFs | 2 | Tanh (piecewise) |
Jewajinda [49] | Virtex V | 263 | 341 FFs | 1 | Sigmoid (LUT-based) |
Merchant [19] | Virtex-II Pro | 338 | 4 BRAMs | 1 | Sigmoid (LUT-based) |
Lee and Hamagami [18] ** | Stratix IV | 186 | 40 FFs | 8 | Linear |
Table 5.
Time breakdown for a 3 × 3 BbNN.
Task | Inference | Training |
---|---|---|
BbNN computation | <0.63 µs | <0.63 µs |
Input data transfer | 6.1 µs | 6.1 µs |
Output data transfer | 4.3 µs | 4.3 µs |
BbNN configuration | - | 41.7 µs |
Fitness computation | - | Application dependent |
Throughput | 90.66 KOPS | Application dependent |
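The inference throughput in Table 5 follows directly from the per-sample time budget, reading the times as microseconds (consistent with the reported 90.66 KOPS) and assuming the computation and the two transfers are fully serialized:

```python
# Sanity check of Table 5's throughput figure.
# Times read as microseconds; serialization is assumed.
compute_us = 0.63
input_us = 6.1
output_us = 4.3

period_us = compute_us + input_us + output_us  # 11.03 µs per inference
throughput_kops = 1e6 / period_us / 1e3        # thousands of inferences/s

print(round(throughput_kops, 2))  # 90.66
```

The match confirms the three tasks dominate the inference loop, with data transfer, not computation, as the main cost.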
Table 6.
Performance comparison for different BbNN sizes.
BbNN Size | Computation (µs) | Input Data, Fine-Grain (µs) | Input Data, AXI (µs) | Output Data, Fine-Grain (µs) | Output Data, AXI (µs) | Configuration, Fine-Grain (µs) | Configuration, AXI (µs) |
---|---|---|---|---|---|---|---|
1 × 3 BbNN | <0.21 | 6.1 | 4.5 | - | 4.3 | 15.8 | 19.7 |
3 × 3 BbNN | <0.63 | 6.1 | 4.5 | - | 4.3 | 41.7 | 58.9 |
3 × 5 BbNN | <1.05 | 6.6 | 7.7 | - | 7 | 56 | 94.1 |
Table 7.
Influence of the BbNN size on the training process for the XOR problem. Average stats from 100 convergent training processes.
Performance Indicator | 2 × 2 BbNN | 3 × 2 BbNN | 4 × 2 BbNN | 5 × 2 BbNN |
---|---|---|---|---|
Best fitness | 0.95 | 0.97 | 0.98 | 0.95 |
Average tested configurations | 19,954 | 13,434 | 13,036 | 20,090 |
Average generations | 133 | 91 | 87 | 140 |
Table 8.
Influence of the BbNN size on the training process for the Mountain Car problem. Average statistics from 100 training processes.
Performance Indicator | 1 × 2 BbNN | 2 × 2 BbNN | 3 × 2 BbNN | 4 × 2 BbNN |
---|---|---|---|---|
Best fitness | 0.41 | 0.46 | 0.45 | 0.43 |
Average tested configurations | 323 | 1,156 | 3,899 | 984 |
Average generations | 3 | 8 | 26 | 7 |
Table 9.
Influence of the BbNN size on the training process for the Cart Pole problem. Average statistics from 100 training processes.
Performance Indicator | 1 × 4 BbNN | 2 × 4 BbNN | 3 × 4 BbNN | 4 × 4 BbNN |
---|---|---|---|---|
Best fitness | 0.977 | 0.973 | 0.970 | 0.978 |
Average tested configurations | 518 | 2101 | 8651 | 2051 |
Average generations | 4 | 14 | 58 | 14 |
Table 10.
Initial and modified conditions for online training.
Problem | Parameter | Normal Value | Modified Value |
---|---|---|---|
Cart Pole | Gravity | 9.8 m/s² | 20.0 m/s² |
| Pole length | 0.5 m | 0.1 m |
| Cart mass | 1.0 kg | 1.5 kg |
Mountain Car | Engine power | 0.001 | 0.0008 |