This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
This article presents a methodology for designing an artificial neural network as an equalizer for a binary signal. Firstly, the system is modelled in floating point format using Matlab. Afterward, the design is described for a Field Programmable Gate Array (FPGA) using fixed point format. The FPGA design is based on the System Generator from Xilinx, which is a design tool that runs on top of Matlab Simulink. System Generator allows one to design in a fast and flexible way. It handles the circuits with a low level of detail, so the functionality of the system can be fully tested. System Generator can be used to check the architecture and to analyse the effect of the number of bits on the system performance. Finally, the System Generator design is compiled for the Xilinx Integrated Software Environment (ISE) and the system is described using a hardware description language. In ISE the circuits are handled with a high level of detail and the physical performance figures are obtained. In the Conclusions section, some modifications are proposed to improve the methodology and to ensure portability across FPGA manufacturers.
Artificial Neural Networks (ANNs) have been widely used as identifiers of patterns [
The FPGA design is based on the System Generator from Xilinx [
In this regard there have been several studies on ANN over FPGA for real time processing. Some of them focused on baseband signals, and are used as receptors [
When the rate of the input signal increases, an ANN implemented on a computer in floating point format cannot operate in real time. To decrease the response time, the ANN should be moved to a digital circuit, normally in fixed point format. The reason is that floating point arithmetic in a digital device needs a lot of hardware resources and power, without substantial improvement in speed. In addition, the digital device reduces the volume and the power consumption.
One alternative is to use an Application Specific Integrated Circuit (ASIC). The ASIC has low area occupation, low power consumption and high speed, but its disadvantages are: high price, difficult debugging and verification, long time to market, no reprogrammability and high nonrecurring engineering costs. For these reasons, an ASIC is undesirable for developing prototypes where the number of units to be produced is small.
On the other hand, Digital Signal Processors (DSPs) can be used, which are cheaper than ASICs. DSPs reach higher clock frequencies, but the data rate that can be processed is limited because the parallelism of the data, the size and format of the data, and the pipeline are fixed. All this is imposed by their predetermined architecture.
Finally, the use of Field Programmable Gate Arrays (FPGA) has several advantages: low price, no nonrecurring engineering costs, minimum development time, ease of debugging and verification, short time to market, high data parallelism, flexible data format and flexible pipelined structure. Although the clock frequency is not as high as in DSPs, with the above characteristics an increase in the data rate can be achieved. Moreover FPGAs have higher power consumption, but they are appropriate for individual prototypes because FPGAs can be reprogrammed by the designer.
This study focuses on a binary unipolar NRZ signal in which the digital zero (“0”) and digital one (“1”) have the same probability (
In summary, it is assumed that the signal has been transmitted over a channel with infinite bandwidth which adds AWGN. The received signal is sampled each
One objective of this study is to check whether a Time Delay Neural Network (TDNN) can be used as a preamplifier or equalizer, increasing the output Signal to Noise Ratio (SNR). Furthermore, a design methodology for implementing the TDNN on an FPGA is proposed. The tools used to simulate the system in floating point format were Matlab and Simulink, and in particular the Neural Network Toolbox [
Initially the bit rate (
The question is whether this TDNN will improve the SNR of the sampled signal. This TDNN is trained with its input noisy sampled signal and the target is the original data signal. The signal received at the input of the sampler is called
Initially to train, validate and test the neural network a sequence of 1,000 random bits with +10 dB of SNR was used. Only one hidden layer with five neurons was used, so the neural network size can be denoted as 10/5/1.
Typically the number of neurons in the intermediate layer is initially the geometric or arithmetic average between the inputs and outputs. As transfer function the “logsig” type was used in all neurons (
Secondly, the neural network was tested with another 1,000 random bits for the same SNR (+10 dB). It can be said that the testing SNR was +10 dB. Original data, sampler output and TDNN output are shown in
The TDNN restricts the output signal to the (0,1) interval because the output neuron uses a “logsig” transfer function. The error in the output is therefore bounded by 1. It should be noted that the signal power at the input and at the output has the same value. This is due to the waveform obtained at the output, which follows the target values specified in training.
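The bounding effect of “logsig” can be illustrated with a minimal sketch of the transfer functions named in this article. The Python function names mirror the Matlab Neural Network Toolbox names; the code is an illustration written for this text, not the toolbox implementation.

```python
import math


def logsig(n):
    """Log-sigmoid: the output always lies strictly inside (0, 1)."""
    # Two branches avoid overflow of exp() for large |n|.
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def tansig(n):
    """Hyperbolic tangent sigmoid: output inside (-1, 1)."""
    return math.tanh(n)


def satlin(n):
    """Saturating linear: identity on [0, 1], clipped outside."""
    return min(1.0, max(0.0, n))


def satlins(n):
    """Symmetric saturating linear: identity on [-1, 1], clipped outside."""
    return min(1.0, max(-1.0, n))


def purelin(n):
    """Linear (identity) transfer function."""
    return n
```

Because the output of `logsig` never leaves (0, 1) and the targets are 0 or 1, the absolute output error cannot exceed 1, which is the bound stated above.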
Finally, the TDNN trained with +10 dB of input SNR, was tested with different values of the input SNR. The testing SNR was varied from −5 dB to +25 dB in 0.5 dB increments. For each testing SNR 1,000 random bits were simulated.
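The test signals described above can be sketched as follows. The sketch assumes a unipolar NRZ amplitude of 1 (so the signal power is 0.5 for equiprobable bits) and measures the SNR as the ratio of the clean signal power to the power of the deviation from it; the function names and the exact SNR estimator are assumptions made for illustration and are not taken from the article.

```python
import math
import random

SAMPLES_PER_BIT = 10  # from the text: 10 samples per bit


def noise_power(snr_db, signal_power=0.5):
    """Noise variance that yields the requested SNR (in dB)."""
    return signal_power / (10.0 ** (snr_db / 10.0))


def make_signals(n_bits, snr_db, rng):
    """Return (clean, noisy) sampled unipolar NRZ signals with AWGN."""
    sigma = math.sqrt(noise_power(snr_db))
    clean, noisy = [], []
    for _ in range(n_bits):
        bit = float(rng.getrandbits(1))  # 0.0 or 1.0 with equal probability
        for _ in range(SAMPLES_PER_BIT):
            clean.append(bit)
            noisy.append(bit + rng.gauss(0.0, sigma))
    return clean, noisy


def measured_snr_db(clean, observed):
    """SNR estimate: clean signal power over error power, in dB."""
    p_signal = sum(c * c for c in clean)
    p_error = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10.0 * math.log10(p_signal / p_error)


def testing_snr_values(start=-5.0, stop=25.0, step=0.5):
    """SNR sweep used for testing: -5 dB to +25 dB in 0.5 dB steps."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean, noisy = make_signals(1000, 10.0, rng)
    print(round(measured_snr_db(clean, noisy), 2))
```

Applying `measured_snr_db` to the equalizer output instead of the sampler output would give one point of an output SNR curve per testing SNR value.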
The high number of parameters involved in the TDNN design should be emphasized. First of all, other neural network architectures are possible, some of them have signal feedbacks from the output to the input and besides the number of feedback samples can be varied. This study is focused on a TDNN without feedback. Once the architecture of the neural network has been fixed the design parameters are the number of intermediate layers, the number of neurons in each layer, the transfer function used in neurons, the training algorithm, the error function used in training, the observation interval duration, the training SNR, the testing SNR, the duration of the signal used for training, validation and testing, and the method of splitting the sampled signal for training, validation and testing.
Hereafter, the effects of these parameters are analysed. To observe the effect of the number of intermediate layers, configurations with two and three hidden layers were tested, while the rest of the parameters remained as in the previous section. None of them trained successfully. Therefore, the architecture is fixed with a single intermediate layer. Then the effect of the number of neurons in this layer was analysed; for this purpose it was varied from 1 to 20.
For high SNR in the input (>15 dB), one neuron in the middle layer gets high values of the output SNR (>30 dB). For low input SNR the curves converge. At this point the designer must establish a design criterion. It can be set to one neuron in the hidden layer if SNR values are good enough for the application; if not, it should be increased. For describing the method one neuron is fixed in the hidden layer. This is the smallest architecture, therefore it minimizes the area, the response time and power consumption. It must be emphasized that the results depend on the training, but these are the standard curves obtained. Henceforth, the system will have a single neuron in its intermediate layer.
At this point the effect of different transfer functions can be checked. The other parameters were held as in the previous configuration. The transfer functions must be increasing ones and differentiable except at specific points. The function chosen can have impact in the hardware implementation. In the intermediate layer the transfer functions shown in
The combinations shown in the
In order to facilitate the design of the transfer function on a FPGA the cases in which the intermediate layer has a LogSigmoid or TanSigmoid Transfer Function were rejected (the first two rows of the
When the intermediate layer has a Satlin or Satlins Transfer Function (third and fourth row of the
In the last row of the
For the above conditions, twenty training algorithms were tested. Conjugate gradient backpropagation with Polak-Ribière updates was selected as the training algorithm because of the output SNR obtained, the limited number of iterations and the speed of convergence.
So far, the Mean Squared Error function has been used for the training algorithms. Under the conditions of the previous section, including the mentioned training algorithm, the effect of the error function was analysed. There are four functions available: mean squared error, mean absolute error, sum squared error and sum absolute error. The error function is not critical for the training; despite this, the Mean Squared Error function is used for the small improvement for low input SNR.
In this section the effect of the size of the observation interval is analysed. The rest of the parameters have the same configuration of the previous section. For this purpose 10 samples per bit are taken, the sampling frequency is 10 kHz and the training SNR is +10 dB. The number of samples processed by the TDNN is varied between 1 and 20. The best result is for 10 processed samples.
These simulations justify the use of an observation interval of
If the observation interval is longer than
The noise power is one of the most important parameters in the TDNN design. It was tested by sweeping the training SNR from −5 dB to +20 dB in 1 dB steps. Afterwards, the testing SNR was varied from −5 to +20 dB in 0.5 dB steps for each training SNR. In
Training with −5 dB causes the best testing SNR for −5 dB,
Training with 0 dB causes the best testing SNR for 0 dB,
Training with +5 dB causes the best testing SNR for +5 dB,
Training with +10 dB causes the best testing SNR for +10 dB,
Training with +15 dB causes the best testing SNR for +15 dB,
But training with +20 dB does not cause the best testing SNR for +20 dB; in fact, the best testing SNR for +20 dB is for +15 dB training SNR.
If the training SNR is less than −5 dB, the training does not converge because there is too much noise. Above +15 dB of training SNR, the training relaxes because there is very little noise. A criterion could be set for fixing the training SNR, namely the value which produces the greatest output SNR for a given input SNR or for an input SNR range. Hereafter, we look for the optimal training SNR with the criterion of maximum output SNR at low input SNR; furthermore, the output SNR must be greater than the input SNR over the whole range.
The objective is to design an equalizer using a TDNN for:
a binary data rate of 1 kbit/s,
the signal received is unipolar NRZ with AWGN,
10 samples per bit are taken, so the sample frequency is 10 kHz.
The TDNN has been designed with the following parameters:
+7 dB for the training SNR,
an observation interval of 10 samples,
one hidden layer with one neuron,
Linear Transfer Function in the hidden layer,
one neuron in the output layer,
Satlin Transfer Function for the output neuron,
initially, 1,000 bits were used (80% for training, 10% validation and 10% for testing),
conjugate gradient backpropagation with Polak-Ribière updates as the training algorithm,
Mean Squared Error function for training.
Finally, the TDNN was checked:
with testing SNR from −5 dB to +20 dB in 0.5 dB steps,
1,000 bits were simulated for each SNR value.
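The network defined by the parameters listed above (10 inputs, one hidden neuron with a Linear Transfer Function, one output neuron with a Satlin Transfer Function) can be sketched as a forward pass. The coefficient values below are placeholders chosen only for illustration (a plain average of the 10 samples); they are not the trained weights and bias of the article.

```python
def satlin(n):
    """Saturating linear transfer function: clip to [0, 1]."""
    return min(1.0, max(0.0, n))


def tdnn_output(window, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 10/1/1 TDNN for one observation window."""
    # Hidden neuron: weighted sum plus bias, linear transfer function.
    hidden = sum(w * x for w, x in zip(w_hidden, window)) + b_hidden
    # Output neuron: one weight, one bias, Satlin transfer function.
    return satlin(w_out * hidden + b_out)


# Hypothetical coefficients, NOT the trained values from the article.
W_HIDDEN = [0.1] * 10
B_HIDDEN = 0.0
W_OUT = 1.0
B_OUT = 0.0

if __name__ == "__main__":
    noisy_one = [1.2, 0.7, 1.1, 0.9, 1.4, 0.6, 1.0, 0.8, 1.3, 1.0]
    print(tdnn_output(noisy_one, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

Since the hidden neuron is linear, the whole network reduces to an affine combination of the 10 samples followed by a saturation to [0, 1].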
The solution obtained was subjected to a second round of parameter variations to check whether it is only a local solution. The sweep was done in the same order and in reverse order, checking whether changing any parameter could improve the system. Varying the parameters led to the same solution.
The TDNN architecture and its associated parameters have now been fixed, but each training run yields different coefficients (weights and bias), which produce similar but not identical curves. That is, variations may occur between TDNNs obtained with the same training SNR. This is because each training session initializes the algorithm differently. In addition, every training session uses different data and noise signals; although, the values of its parameters: power, statistics features,
For this purpose, 100 training sessions were carried out. The curves whose output SNR was less than the input SNR were discarded. There were 94 successful training sessions; in
The calculation that a neural network performs can be fully or partly sequentialized; fewer hardware resources imply a longer delay. Alternatively, the system can be completely parallelized, so that the maximum amount of resources yields the minimum delay. The level of parallelization chosen depends on the maximum delay allowed and the hardware resources available for the design. For high sample rates the fully parallelized architecture must be used to reduce the response time.
The TDNN was designed in two's complement fixed point format for an FPGA using the Xilinx System Generator. The Simulink block diagram of the TDNN is shown in
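The trade-off between resources and delay can be made concrete with a deliberately simplified model. It assumes that each multiplier performs one product per clock cycle and ignores the latency of the adders and of the transfer function; the function name and the model itself are illustrative assumptions, not results from the article.

```python
import math


def cycles_for_products(n_products, n_multipliers):
    """Clock cycles needed to compute n_products with n_multipliers in parallel."""
    if not 1 <= n_multipliers <= n_products:
        raise ValueError("n_multipliers must be between 1 and n_products")
    return math.ceil(n_products / n_multipliers)


if __name__ == "__main__":
    # The hidden neuron of a TDNN with 10 inputs needs 10 products.
    for multipliers in (1, 2, 5, 10):
        print(multipliers, cycles_for_products(10, multipliers))
```

Under this model a fully sequential neuron with 10 inputs needs 10 cycles with a single multiplier, whereas the fully parallel version needs one cycle with 10 multipliers.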
The System Generator blocks used are described next. The Gateway In block is the input bus to the FPGA; the signals would come from an Analog to Digital Converter. If the noise power were zero, the signal would be 0 or +1 and a single unsigned bit would be enough for its representation; in that case, an equalizer would not be necessary. The noisy signal is bipolar and needs a sign bit. The range of the input signal must be covered for the worst case of −5 dB of input SNR. In that situation the noise power (σ^{2}) is 1.58, and the noise standard deviation is σ = 1.26. The values
Then the samples are found with a probability greater than 99.7% between the limits given by
To set the number of decimal bits (
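The worst-case figures quoted above can be reproduced with a short calculation. The sketch assumes a signal amplitude of 1 (signal power 0.5 for equiprobable bits) and takes the limits as three standard deviations below the “0” level and above the “1” level; this three-sigma choice is an assumption consistent with the 99.7% probability mentioned in the text, not a formula recovered from it.

```python
import math


def input_range(snr_db, amplitude=1.0, n_sigma=3.0):
    """Return (low, high, sigma) for a unipolar NRZ signal with AWGN."""
    signal_power = amplitude ** 2 / 2.0          # equiprobable 0 and 1
    sigma = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return -n_sigma * sigma, amplitude + n_sigma * sigma, sigma


def integer_bits(low, high):
    """Integer bits (sign bit excluded) needed to cover [low, high]."""
    magnitude = max(abs(low), abs(high))
    if magnitude < 1.0:
        return 0
    return math.floor(math.log2(magnitude)) + 1


if __name__ == "__main__":
    low, high, sigma = input_range(-5.0)
    print(round(sigma ** 2, 2), round(sigma, 2))   # noise power and sigma
    print(round(low, 2), round(high, 2), integer_bits(low, high))
```

For −5 dB this gives limits of roughly −3.77 and +4.77, which fit in a sign bit plus three integer bits, in agreement with the input format reported later in the article.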
Nine delay cells are used, each delaying the signal by one clock cycle; the frequency of this clock is the sampling frequency. The 10 samples of the input signal are grouped together using a Bus Creator Simulink block, which facilitates the connection to the next stage. Afterwards, Goto and From Simulink blocks are used for wiring. This technique is useful in complex systems with many neurons, where there are many electrical connections.
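The behaviour of the delay chain can be modelled in software as a tapped delay line. In the sketch below the class name is hypothetical and the cells are assumed to start at zero; the current sample plus the nine delayed ones form the 10-sample window delivered to the hidden layer.

```python
from collections import deque


class TappedDelayLine:
    """Software model of a chain of delay cells clocked at the sampling rate."""

    def __init__(self, taps=10):
        # taps = current sample + (taps - 1) delay cells, all reset to zero.
        self._cells = deque([0.0] * taps, maxlen=taps)

    def push(self, sample):
        """Clock in one sample and return the window, newest sample first."""
        self._cells.appendleft(sample)  # the oldest sample drops off the end
        return list(self._cells)


if __name__ == "__main__":
    line = TappedDelayLine()
    for k in range(1, 13):
        window = line.push(float(k))
    print(window)
```

After twelve samples the window holds the ten most recent ones, from the newest to the oldest.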
The hidden layer is formed by a single neuron. In the first stage the delay cells outputs are multiplied by weights and the bias value is added. The second stage is the transfer function.
Coefficient values are taken from a Matlab array in floating point format; these values come from the training process. The number of bits and the binary point position are set to cover the range and to represent each value within a given maximum error. Initially, the output precision of the constant multipliers was set to full. In the same way, the adders are configured with zero latency and full output resolution. Finally, the bias is added; this value is stored in a Constant block configured similarly to the constant multiplier block.
The transfer function for this neuron is the identity function (Linear Transfer Function in Matlab). It could be omitted by joining the blocks with a direct connection. A block with an internal connection is used instead as a reminder that, in other cases, a block that performs the transfer function must be inserted. Obviously, this adds no penalty in area, power consumption or delay time.
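One possible way of choosing the binary point position for a coefficient is sketched below: the number of fractional bits is increased until the quantized value lies within the allowed relative error. Rounding to the nearest representable value and the relative-error criterion are assumptions made for this sketch, and the coefficient in the example is a hypothetical value, not a trained weight.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fractional_bits_for(value, max_rel_error=0.01, limit=32):
    """Smallest number of fractional bits meeting the relative error bound."""
    for frac_bits in range(limit + 1):
        if abs(quantize(value, frac_bits) - value) <= max_rel_error * abs(value):
            return frac_bits
    raise ValueError("error bound not reachable within the bit limit")


if __name__ == "__main__":
    weight = 0.1234  # hypothetical coefficient
    bits = fractional_bits_for(weight)
    print(bits, quantize(weight, bits))
```

The integer part of the format is then chosen to cover the range of the coefficient, and a sign bit is added when negative values occur.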
The output of the hidden layer is connected to the output layer, which has only one neuron. The first stage in
The output precision is set by the multiplexer. Initially, it was set to full resolution. The input from the previous stage is signed; for this reason, the multiplexer has a signed output. In fact, the Satlin function output is unsigned, between 0 and +1, so the sign bit could be eliminated; this adjustment was made later.
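In fixed point arithmetic the Satlin function needs only two comparisons and a selection, which is the role a multiplexer plays in hardware. The following sketch models that behaviour on integer words with a given number of fractional bits; the helper names are hypothetical and the structure is an illustration, not a description of the exact System Generator blocks.

```python
def to_fixed(value, frac_bits):
    """Convert a real number to an integer word with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def satlin_fixed(raw, frac_bits):
    """Satlin on a signed fixed point word: select 0, the input, or +1."""
    one = 1 << frac_bits          # fixed point representation of +1
    if raw < 0:                   # comparator 1: negative input
        return 0
    if raw > one:                 # comparator 2: input above +1
        return one
    return raw                    # pass the input through unchanged


if __name__ == "__main__":
    for x in (-1.3, 0.5, 2.75):
        print(x, satlin_fixed(to_fixed(x, 7), 7) / 128.0)
```

The result is never negative, which is why the sign bit of the output can be dropped, and representing +1 exactly requires one integer bit.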
Under these conditions, the coefficients (weights and bias) are represented with a maximum error of 1%. The peak to peak value (
Finally, the testing SNR was varied from −5 dB to +25 dB in 1 dB increments; for each testing SNR, 1,000 random bits were simulated.
The number of bits can be reduced for the representation of weights, bias and input samples; this implies saving area, power and less delay time. Decreasing the number of bits produces growth in the representation error and the performance of the system gets worse. In other words, the reduction of number of bits causes degradation of output SNR
The maximum error must be set through some criterion, for example, that the curve of the fixed point model must deviate less than 1 dB with respect to the curve of the floating point model. With this criterion, the maximum error was set to 1%. In this case, the input signal format has 11 bits: one sign bit, three integer bits and seven decimal bits. With full resolution in all operators the output of the TDNN in the FPGA has 40 bits, including 25 decimal bits and a sign bit. Some adjustments are then possible in the output format; this corresponds to the optimized model. First, the sign bit can be removed from the output. Moreover, the number of decimal bits can be reduced to seven without degrading the SNR curve, which can be checked experimentally. Finally, one bit is used for the integer part, so +1 can be represented without error. Given this reduction in the number of bits in the output signal, the number of bits of the different stages can be reduced through the system from the output back to the input.
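The relation between the maximum error and the word length can be sketched with a simple rule: take the smallest number of decimal bits whose least significant bit does not exceed the error. This rule is an assumption made here for illustration; it is not stated in the surviving text, although it reproduces the 11-bit input and the 8-bit optimized output quoted above.

```python
import math


def decimal_bits(max_error):
    """Smallest n such that 2**-n <= max_error."""
    return math.ceil(-math.log2(max_error))


def word_length(max_error, integer_bits=3, signed=True):
    """Total bits: optional sign bit + integer bits + decimal bits."""
    return (1 if signed else 0) + integer_bits + decimal_bits(max_error)


if __name__ == "__main__":
    print(word_length(0.01))                                  # input format
    print(word_length(0.01, integer_bits=1, signed=False))    # optimized output
```

With a 1% error the rule gives seven decimal bits, hence 1 + 3 + 7 = 11 bits for the signed input and 1 + 7 = 8 bits for the unsigned optimized output.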
At this point the model and the architecture of the system have been fixed; besides, the full functionality has been checked. Then the design is compiled with System Generator. For the compilation process the FPGA device must be chosen, in this case the Xilinx Spartan3E family, device xc3s500e, package fg320, and −5 for speed grade was used. Besides, for System Generator compilation a standard Hardware Description Language (HDL) must be chosen, these languages are Verilog and Very High Speed Integrated Circuit Hardware Description Language (VHDL) [
The ISE software is the Xilinx standard tool for FPGA design. The syntax of the HDL files can be checked, and synthesis and behavioral simulation of the TDNN can be executed. After that, the design implementation permits the timing simulation of the system. The simulation for 1 kbit/s is illustrated in
Xilinx ISE software manages FPGA circuits with a high level of detail. For this reason, the physical performances can be determined more accurately.
It should be emphasized that the ISE simulator models FPGA circuits with a high level of detail; this makes simulations more accurate, but much slower. Only short duration signals can be simulated, unlike System Generator simulations. In this environment the full functionality of the system cannot be tested, but timing details can be analysed.
A design methodology for an equalizer using a neural network on an FPGA is presented. Three phases can be distinguished in the design; the first two are supported by Matlab. In the first phase the Matlab Neural Network Toolbox is used to fix the floating point architecture, the parameters and the performance of the neural network; the information obtained can be called the “golden rule”.
In the second phase the Xilinx System Generator is used, which operates on Matlab Simulink. In this phase the system is designed in fixed point format according to the golden rule. In System Generator the circuits are handled with a low level of detail; for this reason the simulations are very fast and the functionality of the system can be checked completely. During this phase only a rough estimate of the area is calculated, and neither the power consumption nor the speed of the circuit is evaluated. Moreover, the effect of the number of bits in different parts of the design can be tested. The fixed point format has implications for the functionality of the system and the hardware resources occupied.
In the third phase, the system description obtained with System Generator is used by the Xilinx Integrated Software Environment. This design tool handles the circuits with a high level of detail, which allows estimation of the physical performance: hardware resources, power consumption and maximum clock frequency.
It should be noted that the description of the system obtained by System Generator is not portable to other manufacturers. The reason is that System Generator calls primitives and specific blocks of Xilinx. The design could have been done for Altera, the second FPGA manufacturer in importance. Altera offers DSP Builder, which is a similar tool over Simulink. In the same way these designs are only valid for Altera FPGAs. As a future line of investigation, Matlab HDL Coder could be used, whose files are portable to all manufacturers. The HDL Coder designs can be compared with the designs obtained with the FPGA manufacturers' tools. The results will depend on the compilers. Providing portability by hand coding in a hardware description language is not a good alternative: designing complex systems directly in a hardware description language is long and tedious, and inflexible to changes.
Obviously, increasing the sampling frequency can improve the system performance, but this may be a design restriction or be limited by the technology. In other words, given the sampling frequency it is possible to improve the system by varying other parameters of the neural network.
The output SNR curve obtained is not rigid, and among other parameters it depends on the architecture and training SNR. The neural network can be trained for other scenarios, for instance if the signal suffers from distortion or other noise model. Another advantage is that the architecture and parameters can be changed to fit the new channel. Other neural network architectures are available, even some of them with feedback signals.
The same error was assumed in the representation of the input and coefficients in the two layers. In general the effect of different errors should be analysed for input and each layer coefficients. The conclusions should focus on the functionality and physical performance of the system. This study should be automated with a Matlab program for executing the models designed with System Generator. In this case, given the shape of the transfer functions its effect is not considered. That is, these functions do not produce approximation errors between input and output. In general, with other nonlinear functions it is necessary to consider the approximation error introduced by the implementation.
During system design it is convenient to keep the operators at full resolution. Reducing the number of bits is sometimes possible at the final output. In that case the binary representation can then be reduced back toward the circuit inputs. This process could also be automated with a Matlab program. The low rates used in the initial simulations do not affect the method or the conclusions, which generalize to higher frequencies, as high as the available technology allows.
Proposed model.
Time delay neural network.
Logsigmoid transfer function.
Output SNR
Effect of the number of neurons in the hidden layer on the output SNR.
Transfer functions tested in the intermediate layer.
Transfer functions tested in output neuron.
Transfer functions checked in the neural network.
Matlab Neural Network Toolbox training window. Matlab has a bug in the output neuron drawing the unipolar Satlin Transfer Function as bipolar.
Effect of the observation interval.
Effect of the training SNR from −5 dB to +20 dB.
Effect of the training SNR from +6 dB to +10 dB.
System Generator block diagram of the TDNN.
First stage of the hidden layer.
Simulink simulation of the TDNN: (
Output SNR
Xilinx ISE timing simulations: (
Effect of the errors on the system.
Floating point: o  Fixed point: *
0.1%  14.47 dB  Slices: 1026  Input: signed, 14 bits  Output: signed, 50 bits
1%  14.47 dB  Slices: 693  Input: signed, 11 bits  Output: signed, 40 bits
1% (optimized)  14.46 dB  Slices: 678  Input: signed, 11 bits  Output: unsigned, 8 bits
2%  14.46 dB  Slices: 623  Input: signed, 10 bits  Output: signed, 37 bits
Results in area, power and maximum clock frequency for both HDLs.
VHDL  Number of Slices: 602 out of 4656: 12%  Quiescent: 0.085  Maximum clock frequency: 276.167 MHz
Number of Slice Flip Flops: 100 out of 9312: 1%  
Number of 4 input LUTs: 961 out of 9312: 10%  
Number of IOs: 21  
Number of bonded IOBs: 20 out of 232: 8%  
Number of GCLKs: 1 out of 24: 4%  
Verilog  Number of Slices: 602 out of 4656: 12%  Quiescent: 0.085  Maximum clock frequency: 273.523 MHz
Number of Slice Flip Flops: 100 out of 9312: 1%  
Number of 4 input LUTs: 963 out of 9312: 10%  
Number of IOs: 21  
Number of bonded IOBs: 20 out of 232: 8%  
Number of GCLKs: 1 out of 24: 4% 