A Finite State Machine Approach to Algorithmic Lateral Inhibition for Real-Time Motion Detection †

Many researchers have explored the relationship between recurrent neural networks and finite state machines. Finite state machines constitute the best-characterized computational model, whereas artificial neural networks have become a very successful tool for modeling and problem solving. The neurally-inspired lateral inhibition method, and its application to motion detection tasks, have been successfully implemented in recent years. In this paper, control knowledge of the algorithmic lateral inhibition (ALI) method is described and applied by means of finite state machines, in which the state space is constituted from the set of distinguishable cases of accumulated charge in a local memory. The article describes an ALI implementation for a motion detection task. For the implementation, we have chosen to use one of the members of the 16-nm Kintex UltraScale+ family of Xilinx FPGAs. FPGAs provide the necessary accuracy, resolution, and precision to run neural algorithms alongside current sensor technologies. The results offered in this paper demonstrate that this implementation provides accurate object tracking performance on several datasets, obtaining a high F-score value (0.86) for the most complex sequence used. Moreover, it outperforms implementations of a complete ALI algorithm and a simplified version of the ALI algorithm—named “accumulative computation”—which was run about ten years ago, now reaching real-time processing times that were simply not achievable at that time for ALI.


Introduction
Over recent decades, many researchers have explored the relationship between discrete-time recurrent neural networks and finite state machines, either by showing their computational equivalence or by training the former to perform as finite state recognizers [1]. The relationship between discrete-time recurrent neural networks and finite state machines has very deep roots [2,3]. Firstly, consider that finite state machines constitute the best-characterized computational model, whereas artificial neural networks have become a very successful tool for modeling and problem solving. Indeed, the fields of neural networks and finite state computation emerged simultaneously. A McCulloch-Pitts net really is a finite state machine of interconnected McCulloch-Pitts neurons, each of them in two possible states: firing and not firing [3]. Kleene formalized the sets of input sequences that led a McCulloch-Pitts network to a given state, and later, Minsky showed that any finite state machine can be simulated by a discrete-time recurrent neural net using McCulloch-Pitts units [2].

The Algorithmic Lateral Inhibition Method
Computational neuroscience is characterized by the desire to fulfill two clear objectives [20], namely: (1) the construction of computational models of neurons and neural networks as a valuable tool to understand the nervous system; and (2) the use of neural models of biological inspiration as methods for solving problems in a wide range of domains, such as vision, character recognition, time series prediction, planning, and control, where symbolic methods have proven to be inadequate or insufficient.
We use biology as a source of inspiration to obtain methods and procedures useful in engineering and computation. Concretely, for artificial vision and motion-related problems, we look for inspiration in neural networks that repeat along the whole visual pathway. If we had to choose one neural circuit that insistently repeats itself in the superior vertebrates' visual path as inspiration for the development of artificial neural networks useful in computer vision, there is no doubt that it would be the lateral inhibition (LI) circuit. LI refers to the inhibitory effect that neighboring neurons in brain pathways have upon each other. More precisely, LI is the capacity of an excited neuron to reduce the activity of its neighbors. Such a structure of neural calculation, in its non-recurrent (guided by data) as well as in its recurrent (guided by local results) versions, appears along the whole visual pathway. Firstly, it appears among cones and rods in bipolar, amacrine and ganglion cells. It also appears in the lateral geniculate body, and finally, among columns in the cerebral cortex. Thus, it is reasonable to think of the very special value of LIs in the process of constructing an internal representation of a visual scene [21].
In the field of neural computation, LIs have essentially been used in two kinds of tasks: in filtering tasks to detect spatial-temporal contrasts, and in preprocessing tasks in learning networks. In the latter, before beginning to modify the weight values, the "winner neuron" is selected as the one that responds with greater intensity to a given configuration of stimuli. This is performed by soft-competition methods, with a strongly accentuated difference kernel, or by hard-competition methods ("winner take all", WTA) [22]. In a linear formulation of the LI, a convolution operator with a difference kernel is used in such a way that the geometry of the kernel (symmetry, orientation, etc.) defines an important part of the calculation. Thus, in this same sense of recursive or non-recursive linear filters, the LI is also used in digital image processing to detect spatial, temporal or spatial-temporal contrasts.
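As a minimal illustration of the linear formulation, the following Python sketch applies a difference kernel to a one-dimensional signal. The kernel weights (-1, 2, -1) and the function name `lateral_inhibition_1d` are assumptions chosen for this example and are not taken from the cited works. Because the assumed kernel is symmetric, sliding it over the signal is equivalent to a convolution.

```python
def lateral_inhibition_1d(signal, kernel=(-1, 2, -1)):
    """Apply a difference kernel to the interior samples of `signal`.

    The center weight excites and the two side weights inhibit, so the
    response is zero on uniform regions and non-zero at contrasts.
    """
    half = len(kernel) // 2
    output = []
    for center in range(half, len(signal) - half):
        window = signal[center - half:center + half + 1]
        output.append(sum(w * s for w, s in zip(kernel, window)))
    return output


# A step from 1 to 5: only the two samples next to the step respond.
contrast = lateral_inhibition_1d([1, 1, 1, 5, 5, 5])
print(contrast)
```

The geometry of the kernel determines which contrasts are detected; an asymmetric or oriented kernel would respond differently to the same input.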
Our proposal to increase the calculation capacity of lateral inhibition circuits is to maintain the relational structure, that is to say, to maintain the skeletal model of the LI, but to substitute the usual analytic operators in the linear models by others which are logical-relational in nature, of a greater calculation capacity [23,24]. In this way, the concept of lateral inhibition is extended to embrace a wider group of operators than the linear and non-linear analytic ones. In a computational sense, we are speaking of lateral inhibition algorithms, with non-linearity of "if-then" type, local memory and sequential control.
Finally, one more step in the generalization of the LI mechanism is to abstract the structure that underlies the anatomic circuits of the superior vertebrates' visual path up to the knowledge level. The LI turns into a procedure to break up the subtasks where expressions are evaluated in the central part, the same or other expressions are evaluated over the data of the periphery, and there is a "dialogue" comparing the results of both evaluations of the central and the peripheral part [19]. In this work (and some previous ones) we use an abstract representation of the LI anatomic-physiological processes to build a method of an inferential nature at the knowledge level, and to test its usefulness within the context of computer vision. Algorithmic lateral inhibition (ALI) is the symbolic/inferential version of LI in which analytical operators are replaced by rules [24].
In this work, we show some temporal non-recurrent and spatial recurrent ALI processes, as described in the previously referenced work [24]. We look at the results of accumulations in the central and peripheral parts, and the later competitions (usually recursively calculating a consensus value between central and peripheral accumulations), and its usefulness in the construction of an internal representation of the moving pixels that are present in a video sequence. From now on, this paper will focus on the formal model for the ALI applied to motion detection in video sequences. The article shows how to implement ALI in motion detection by means of a formal model described as finite state machines. These are concretely called ALI Temporal Motion Detecting, ALI Spatial-Temporal Recharging and ALI Spatial-Temporal Homogenization.

Formal Model of ALI for Motion Detection
The control knowledge of the ALI method is described extensively in Sections 3.1-3.3 by means of finite state machines in which the state space is constituted from the set of distinguishable cases in the state of accumulated charge in a local memory [10]. The general ALI method can be broadly described as follows:

• A scalar quantization into N levels of accumulated charges is performed on the input images.
• For each level, if an image pixel at time t does not belong to the level, the charge at that pixel and that level is discharged down to minimum value $v_{dis}$.
• For each level, if an image pixel at time t belongs to the level and did not belong to it at previous time $t - \Delta t$, the charge value is loaded to the maximum saturation value $v_{sat}$.
• For each level, if an image pixel belongs to the level at time t and $t - \Delta t$, the charge is decremented by value $v_{dm}$ (discharge value due to motion detection). Of course, the charge value cannot fall below minimum value $v_{dis}$. The discharge of a pixel by quantity $v_{dm}$ is the way to stop paying attention to a pixel of the image through time.
• For each level, a pixel not directly or indirectly linked by means of lateral inhibition mechanisms to a maximally charged pixel ($v_{sat}$) decreases to total discharge $v_{dis}$ with time. Therefore, an extra charge $v_{rv}$ (charge value due to neighborhood) is added to the charge in those image pixels that receive a recharge stimulus from any of the four neighboring pixels.

Let us suppose, without loss of generality, that it is enough to distinguish eight levels of accumulated charge (N = 8). Consequently, we can use an 8-state automaton ($S_0, S_1, \ldots, S_7$), where $S_0$ corresponds to $v_{dis}$ and $S_7$ to $v_{sat}$, as a formal model describing the data flow corresponding to the calculation of the subtasks. Let us also suppose that discharge and recharge initially take the values corresponding to the descent of two states ($v_{dm} = 64$), and to the ascent of one state ($v_{rv} = 32$). This way, the state transition diagram corresponds to a kind of reversible counter ("up-down") which is controlled by the result of lateral inhibition (dialogue among neighbors).
To complete the description of the states, together with the accumulated charge value, $v$ ($v_{dis} \leq v \leq v_{sat}$), it is necessary to include a binary variable, $A_C \in \{0, 1\}$. When $A_C = 1$, a pixel tells its neighbors that it has detected a moving object, or that some neighbor has told it that it has detected such a moving object. This is the label that informs of the presence of a moving object in the receptive field (in the central part or in the periphery). Thus, state $S(t)$ is a tuple $S(t) = [v(t), A_C(t)]$. Figure 1 anticipates the different phases of the ALI algorithm applied to motion detection. The three phases, to which the sequence of input video images is subjected, are explained in detail in the following sections. Note that ALI Spatial-Temporal Homogenization is the single phase with a clear lateral inhibition inspiration, which makes it a far more computationally expensive one. The simplification denominated accumulative computation (AC) covers only the first and second phases.
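The reversible counter and the state tuple described above can be sketched in Python as follows. This is only an illustrative sketch: the names `State`, `count_down` and `count_up` are hypothetical, and the mapping from state index to charge assumes $v_{dis} = 0$ and equally spaced charges of 32 units per state, which is one choice consistent with $v_{dm} = 64$ (two states) and $v_{rv} = 32$ (one state).

```python
from typing import NamedTuple

N_STATES = 8            # S_0 ... S_7, as in the text (N = 8)
DOWN_STATES = 2         # v_dm = 64 corresponds to a descent of two states
UP_STATES = 1           # v_rv = 32 corresponds to an ascent of one state
CHARGE_PER_STATE = 32   # assumption: v_dis = 0 and equally spaced charges


class State(NamedTuple):
    index: int  # i in S_i
    a_c: int    # label A_C, 0 or 1


def charge(state: State) -> int:
    """Charge value v associated with a state, under the assumed spacing."""
    return state.index * CHARGE_PER_STATE


def count_down(state: State) -> State:
    """Discharge by v_dm: two states down, never below S_0."""
    return state._replace(index=max(state.index - DOWN_STATES, 0))


def count_up(state: State) -> State:
    """Recharge by v_rv: one state up, never above S_7."""
    return state._replace(index=min(state.index + UP_STATES, N_STATES - 1))


saturated = State(index=N_STATES - 1, a_c=1)
print(charge(saturated), count_down(saturated), count_up(saturated))
```

Working on state indices rather than raw charge values keeps the counter behavior explicit; the label $A_C$ is carried along unchanged here because its updates belong to the individual phases.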

ALI Temporal Motion Detecting
The aim of this phase is to detect the temporal and local (pixel to pixel) contrasts of pairs of consecutive binarized images at gray level k. The phase firstly gets the values of the L = 256 gray level input pixels $I(i, j; t)$ as input data, and generates N = 8 binary images, $x_k(i, j; t)$, corresponding to N levels obtained through scalar quantization. The output space has a memory with two levels: one for the current value, the other for the value of the previous instant. Thus, for N levels, there are 2N = 16 binary values for each input pixel; at each level, there is the current value $x_k(i, j; t)$ and the previous value $x_k(i, j; t - \Delta t)$, such that $x_k(i, j; t) = 1$ when the gray level $I(i, j; t)$ belongs to level k and $x_k(i, j; t) = 0$ otherwise, where $k = 0, \ldots, N - 1$ is the level index. Thus, the first step is a scalar quantization algorithm called multilevel thresholding, which segments the image into N equally spaced gray levels.
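The multilevel thresholding step can be sketched as follows. The sketch assumes that "equally spaced" means N bins of width L/N = 32 gray values each, so that level k covers the gray values from 32k to 32k + 31; the function names `quantize` and `quantize_image` are hypothetical.

```python
def quantize(gray, n_levels=8, n_gray=256):
    """Return the n_levels binary values x_k for one gray value.

    Assumption: level k covers an equal-width bin of n_gray // n_levels
    gray values, so exactly one x_k is 1 for any valid input.
    """
    if not 0 <= gray < n_gray:
        raise ValueError("gray value out of range")
    width = n_gray // n_levels
    level = gray // width
    return [1 if k == level else 0 for k in range(n_levels)]


def quantize_image(image, n_levels=8, n_gray=256):
    """Split a gray image (list of rows) into n_levels binary images."""
    planes = []
    for k in range(n_levels):
        planes.append([[quantize(p, n_levels, n_gray)[k] for p in row]
                       for row in image])
    return planes


planes = quantize_image([[0, 100], [200, 255]])
print(planes[3])  # binary image for level k = 3
```

Keeping the binary image of the previous frame next to the current one yields the pair of values per level that the temporal phase takes as input.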
A pair of binarized values at each level, $x_k(i, j; t)$ and $x_k(i, j; t - \Delta t)$, constitutes the input space to the temporal non-recurrent ALI. The output space is the result of the individual calculation phase in each element and the current charge value that initially is $v_{dis}$ at state $S_0$. It is formed by potential values $v_{dis}$, $v_{sat}$ and $\max\{y_k(i, j; t - \Delta t) - v_{dm}, v_{dis}\}$, where $v_{dm}$ is the decrement value, $v_{dis}$ is the minimum charge value and $v_{sat}$ is the maximum charge value. Value $v_{sat}$ is obtained either when an object just enters the receptive field, or when movement has been detected by any of the pixel's neighbors.
Thus, the output of phase ALI Temporal Motion Detecting is the accumulated charge value, $y_k(i, j; t)$, in association with label $A_C$. Remember that $A_C = 1$ denotes the fact that a movement has been locally detected by this pixel.
The following transitions can be observed:

1. $x_k(i, j; t - \Delta t) \in \{0, 1\}$, $x_k(i, j; t) = 0$. In this case the calculation element (i, j) has not detected any contrast with respect to the input of a moving object in that level ($x_k(i, j; t) = 0$). It may have detected it (or not) in the previous interval ($x_k(i, j; t - \Delta t) = 1$, $x_k(i, j; t) = 0$). In any case, the element passes to state $S_0[v = v_{dis}, A_C = 0]$, the state of complete discharge, independently of the initial state.
2. $x_k(i, j; t - \Delta t) = 0$, $x_k(i, j; t) = 1$. The calculation element has detected a contrast in its level ($x_k(i, j; t) = 1$) in t, and it did not in the previous interval ($x_k(i, j; t - \Delta t) = 0$). It passes to state $S_7[v = v_{sat}, A_C = 1]$, the state of total charge, independently of the previous state. Also, $A_C$ passes to 1, to tell its potential dialogue neighbors that this pixel has detected a moving object. This fact will be used later during phase ALI Spatial-Temporal Recharging. Figure 2 shows, in first place (300 to 400 ns), the evolution of the automata states when motion is detected in a pixel where previously no motion was detected ($x_k(i, j; t - \Delta t) = 0$, $x_k(i, j; t) = 1$). Notice that CLK and t show the $\Delta t$ and $t$ time clock intervals, respectively. V and AC_out represent $v$ and $A_C$.
3. $x_k(i, j; t - \Delta t) = 1$, $x_k(i, j; t) = 1$. The calculation element has detected the presence of an object in its level ($x_k(i, j; t) = 1$), and it also detected it in the previous interval ($x_k(i, j; t - \Delta t) = 1$). In this case, it diminishes its charge value by $v_{dm}$, corresponding to two states. This partial discharge can proceed from an initial state of saturation, $S_7$. This partial discharge due to the persistence of the object in that position and in that level is described by means of a transition from $S_7$ to an intermediate state, $S_{int}$. The descent in the element's state is equivalent to the descent in the pixel's charge, such that (as you may appreciate in Figure 2, starting around 670 ns) only descending transitions of two states are allowed, that is, from $S_i$ to $S_{i-2}$, with $S_0$ as the lower limit.
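The three transitions of this phase can be sketched as a small transition function over state indices. This is a sketch under stated assumptions, not the VHDL design: the names `temporal_step` and `trace` are hypothetical, and the value $A_C = 0$ returned after a partial discharge is an assumption, since the label of intermediate states is settled later by the dialogue with the neighbors.

```python
S_DIS, S_SAT = 0, 7   # indices of S_0 (v_dis) and S_7 (v_sat)
DOWN_STATES = 2       # v_dm corresponds to a descent of two states


def temporal_step(x_prev, x_curr, state_index):
    """Return (new_state_index, A_C) for one pixel at one level."""
    if x_curr == 0:
        # Transition 1: no object at this level, complete discharge.
        return S_DIS, 0
    if x_prev == 0:
        # Transition 2: object just entered, saturate and raise A_C.
        return S_SAT, 1
    # Transition 3: object persists, descend two states (floor at S_0).
    # Assumption: A_C is reset to 0 until the dialogue phase decides.
    return max(state_index - DOWN_STATES, S_DIS), 0


def trace(bits, start=S_DIS):
    """State indices visited when `bits` is the sequence x_k over time."""
    state, x_prev, visited = start, 0, []
    for x_curr in bits:
        state, _ = temporal_step(x_prev, x_curr, state)
        visited.append(state)
        x_prev = x_curr
    return visited


# An object enters the level and then stays still for several frames.
print(trace([1, 1, 1, 1, 1, 0]))
```

The trace shows how a pixel that keeps seeing the same level gradually loses charge, which is how the method stops paying attention to objects that have stopped moving.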

ALI Spatial-Temporal Recharging
In the previous phase, ALI Temporal Motion Detecting, we have obtained the individual "opinion" of each computation element. However, our aim is also to consider the "opinions" of the neighbors. The reason is that an individual element should stop paying attention to motion detected in the past, but before making that decision, there should be a communication in the form of lateral inhibition with its neighbors, to see if any of them are in state $S_7$ ($v_{sat}$, maximum charge). Otherwise, it will be discharging down to $S_0$ ($v_{dis}$, minimum charge), because that pixel is not bound to a pixel that has detected motion. In other words, the aim of this phase is to focus on those pixels charged with an intermediate accumulated charge value, but directly or indirectly connected to saturated pixels (in state $S_7$), by incrementing their charge.
These "motion values" of the previous layer constitute the input space, whereas the output is formed by charge value $z_k(i, j; t)$ after dialogue with neighboring pixels. The values of accumulated charge before dialogue are written in the central part of the output space of each pixel that now enters the dialogue phase according to the recurrent ALI scheme instantiated for this task. The data in the periphery of the receptive field in the output space of each pixel now contain the individual calculations of its neighbors.
Let $v_C(t) = y_k(i, j; t)$ be the initial charge value at this phase. Each pixel considers the set of individual calculations, $v_C(t + l \cdot \Delta\tau)$ and $A_j$, by means of a logical union of labels $A_j$:

$$A_P^* = \bigcup_{j} A_j$$

This result, $A_P^*$, is now used to output the new consensus charge value after dialogue, $z_k(i, j; t + \Delta t)$, with $\Delta t = m \cdot \Delta\tau$, where $m \geq l$ is the number of iterations in the dialogue phase, a function of the size of the receptive field. The whole dialogue process is executed with clock $\tau$, during m intervals $\Delta\tau$. It starts when clock t detects the configuration $x_k(i, j; t - \Delta t) = x_k(i, j; t) = 1$ and terminates at the end of $\Delta t$, when a new image is considered.
At each dialogue step (in other words, at each interval of clock $\Delta\tau$), the calculation element only considers values $x_k(i, j; t - \Delta t)$, $x_k(i, j; t)$ and $A_C$ present at that moment in its receptive field. To diffuse or to use more distant information, new dialogue steps are necessary. In other words, new inhibitions in $l \cdot \Delta\tau$ ($1 < l \leq m$) are required. This only affects state variable $A_C(\tau)$, as $x_k(i, j; t - \Delta t)$ and $x_k(i, j; t)$ values remain constant during the intervals of clock $\tau$ used to diffuse information and reach consensus on the different partial results obtained by the calculation elements.
Note that the recharge may only be performed once during all the dialogue steps. That is why $A_C = 0$ when a recharge takes place. Lastly, the output is the consensus charge value $z_k(i, j; t + \Delta t)$ obtained at the end of the dialogue. Figure 3 shows the simplified state transition diagram, where the following transitions are distinguished:

1. $x_k(i, j; t - \Delta t) \in \{0, 1\}$, $x_k(i, j; t) = 0$. In any case, independently of the pixel's dialogue with the neighbors (see Figure 4), at the end of $\Delta t$ the pixel passes to state $S_0[v = v_{dis}, A_C = 0]$.
2. $x_k(i, j; t - \Delta t) = 0$, $x_k(i, j; t) = 1$. Again, independently of the dialogue step, the pixel's state will be $S_7[v = v_{sat}, A_C = 1]$.
3. $x_k(i, j; t - \Delta t) = 1$, $x_k(i, j; t) = 1$. The result depends on the state of the local memory:
   a. Local memory is in $S_0[v = v_{dis}, A_C = 0]$. Pixels in state $S_0$ are not affected by recharge due to motion detection in their periphery. Thus, the pixel maintains the same state $S_0$.
   b. Local memory is in $S_7[v = v_{sat}, A_C = 1]$. Pixels in state $S_7$ are maximally charged. Therefore, they cannot be recharged. They also remain in the same state.
   c. Local memory is in $S_{int}$. Depending on its four neighbors, the pixel can stay in $S_{int}$ if all neighbors have variable $A_j = 0$, or transit up to $S_7$ if it finds some neighbor with variable $A_j = 1$.
      i. Transit from $S_i$ to $S_{i+1}$. After recharge, the calculation element is now in $S_{i+1}$. It sends $A_C = 1$ and waits until the end of $\Delta t$. In a second clock cycle, $\Delta\tau$, $A_C = 1$ is potentially used by its neighbors to increment their charge values. Thus, the dialogue extends in steps of the size of the receptive field. Pixels with $A_C = 1$ are said to be "transparent" if they allow information on motion detection by some neighbor (in state $S_7$) of their receptive field to cross them.
      ii. Remain in $S_i$. If none of its neighbors has transmitted $A_j = 1$, the pixel stays in $S_i$, without recharging in the first $\Delta\tau$. In this case, it maintains its own $A_C^* = 0$, and its behavior is called "opaque". However, if in a later $\Delta\tau$ and inside the dialogue interval it does receive any $A_j = 1$, it will pass to $S_{i+1}$. Figure 4 illustrates this diffusion mechanism through "opaque" and "transparent" pixels of the receptive field.
Moreover, Figure 4 offers, in more detail, an example of a dialogue among j, j + 1 and j + 2. Pixels j + 1 and j + 2 are neighbors of pixel j. More concretely, Figure 4 shows the automata's evolution when there is motion in both neighboring pixels. The control automaton receives inputs $x_k(i, j; t - \Delta t)$ and $x_k(i, j; t)$, and produces three outputs, coincident with its three distinguishable charge states ($S_0 = v_{dis}$, $S_7 = v_{sat}$, and $\{v_{int}\}$).

Figure 4. Detail of the dialogue where diffusion of motion detection is shown through "transparent" pixels (j + 2 and j + 1), while pixel j shows an "opaque" behavior. Dialogue at (a) pixel j, (b) pixel j + 1, and (c) pixel j + 2, respectively.
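The dialogue of this phase can be sketched as a synchronous diffusion over a grid of state indices for one level. The sketch assumes that every pixel of the grid is in the configuration $x_k(i, j; t - \Delta t) = x_k(i, j; t) = 1$, that only saturated pixels start with label 1, and that a recharged pixel becomes "transparent" by sending label 1 from the next dialogue step on; the function name `recharge_dialogue` and the `recharged` flag that enforces the single recharge are hypothetical.

```python
S_DIS, S_SAT = 0, 7  # indices of S_0 and S_7


def recharge_dialogue(states, steps):
    """Run `steps` dialogue steps on a grid of state indices."""
    rows, cols = len(states), len(states[0])
    out = [row[:] for row in states]
    # Only saturated pixels announce motion at the start of the dialogue.
    label = [[1 if s == S_SAT else 0 for s in row] for row in states]
    recharged = [[False] * cols for _ in range(rows)]
    for _ in range(steps):
        prev = [row[:] for row in label]  # labels seen during this step
        for i in range(rows):
            for j in range(cols):
                # S_0 and S_7 pixels are unaffected; recharge happens once.
                if out[i][j] in (S_DIS, S_SAT) or recharged[i][j]:
                    continue
                neighbors = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if any(0 <= a < rows and 0 <= b < cols and prev[a][b] == 1
                       for a, b in neighbors):
                    out[i][j] += 1          # ascent of one state (v_rv)
                    recharged[i][j] = True
                    label[i][j] = 1         # pixel becomes "transparent"
    return out


row = [[7, 3, 3, 0, 3]]
print(recharge_dialogue(row, steps=1))
print(recharge_dialogue(row, steps=3))
```

In this example the recharge reaches the second intermediate pixel only in the second step, through its transparent neighbor, while the discharged pixel blocks the diffusion towards the last one.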

ALI Spatial-Temporal Homogenization
The aim of this third phase is to obtain all moving patches present in the scene. The phase considers the union of pixels that are physically together and at a same gray level to be a component of an object. A set of recurrent lateral inhibition processes are performed to distribute the charge among all neighbors that are not fully discharged ($z_k(i, j; t)$ of the previous phase); those pixels are in states $S_1$ to $S_7$, and are physically connected. A double objective is pursued:

1. To dilute the charge due to the image background motion among other pixels of their own background, so that only moving objects are detected. To dilute the charge due to the image background motion does not mean that we are dealing with moving cameras. Instead, we are facing the problem of false motion detected where moving objects are just leaving pixels that now belong to the background.
2. To obtain a parameter common to all pixels of the object, those belonging to the same gray level (simple classification task).
Charge values, $z_k(i, j; t + \Delta t)$, offered by the previous phase, are now evaluated in the center and in the periphery. Now, let $v(t + \Delta t) = z_k(i, j; t + \Delta t)$ be the initial charge value at this phase. In this last phase, we have the average of those neighbors that hold charge values greater than a threshold value $\theta_{min}$.

$$v_C = \max\{v_C, \theta_{min}\} \qquad (8)$$

We compare the result of the individual value in the center (C) with the mean value in the periphery (P), and produce a discrepancy class according to threshold $\theta_{min}$, and pass the mean charge values that overcome that threshold. After this, the result is again compared with a second threshold, namely $\theta_{max}$, eliminating noisy pixels pertaining to non-moving objects.
The dialogue scheme and the description of the control automaton, in which the transitions between the initial state $S_i(t)$ and the final state $S_i(t + \Delta t)$ are carried out, follow an averaging rule in which the sum on sub-index j extends to all neighbors, $v_j$, belonging to the subset of the two-dimensional receptive field, $RF_k$, such that their state is different from $S_0$, and $N_k$ is the number of neighbors with state different from $S_0$.
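To make the idea of charge homogenization more concrete, the following Python sketch implements one possible reading of the averaging step. It should not be taken as the exact rule of the method: it assumes a four-neighbor receptive field, assumes that the center value takes part in the mean together with the neighbors whose state differs from $S_0$, and leaves out the thresholds $\theta_{min}$ and $\theta_{max}$. The function name `homogenize` is hypothetical.

```python
def homogenize(charges, iterations=1):
    """Repeatedly average each non-discharged pixel with its
    non-discharged four-neighbors (assumed reading of the rule)."""
    rows, cols = len(charges), len(charges[0])
    current = [[float(v) for v in row] for row in charges]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for i in range(rows):
            for j in range(cols):
                if current[i][j] == 0.0:
                    continue  # pixels in S_0 take no part in the dialogue
                values = [current[i][j]]
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= a < rows and 0 <= b < cols and current[a][b] > 0.0:
                        values.append(current[a][b])
                updated[i][j] = sum(values) / len(values)
        current = updated
    return current


# Two connected charged pixels, a discharged one, and an isolated one.
print(homogenize([[224, 96, 0, 64]]))
```

Under these assumptions, physically connected pixels converge towards a common charge value, which matches the second objective of the phase, while discharged pixels separate the patches from each other.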

Hardware Implementation of ALI for Motion Detection
In this section, we depict and analyze the current implementation of the ALI algorithm. We also estimate the increase in speed with respect to the previous implementations of the ALI [9] and AC [10] methods, by focusing on the time required by each design to process each video frame. Figure 5 summarizes the characteristics of the implementations considered in this study. As with the previous ones, our current implementation is based on reconfigurable hardware (see Figure 5c). More specifically, we have considered state-of-the-art Xilinx FPGAs [25], and accordingly, we have used the Xilinx Vivado [26] tool for the definition (in VHDL), synthesis and implementation of our design. The VHDL code is available to anyone interested in it for research purposes. Firstly, we have defined an "ALI module", which is able to process each image pixel exactly as described in previous sections. Then, starting from this design, we have implemented the complete system, composed of 16 ALI modules and able to process each 4 × 4-pixel image block. The schematics corresponding to both the ALI module and the complete system are provided as Supplementary Material to this paper.
For the implementation, we have considered one of the members of the 16-nm Kintex UltraScale+ family of Xilinx FPGAs. Specifically, we have used an XCKU3P device (the xcku3p-ffvb676-1LV-I model), since the number of IOBs (Input/Output Blocks) provided satisfies the requirements of our design. We have assumed a clock rate of 50 MHz, obtaining the timing parameters relative to maximum delay paths shown in Table 1 after the implementation step. Therefore, the minimum clock period would be 20.000 − 13.070 = 6.930 ns, that is, the maximum clock rate would be 144.3 MHz. From the previous data, we may also estimate the time required to process each image or frame in a video sequence. Assuming a 320 × 240-pixel image, which is composed of 4800 4 × 4-pixel blocks, and considering that the maximum data path delay for our implementation is 6.810 ns, the processing would take 0.033 ms at most, which means that the ALI method is capable of processing at least 30 frames per second (fps). Assuming a common video frame rate of 24 fps, we consider that this performance enables real-time sequence processing. Furthermore, we have compared these timing results with respect to those obtained in our previous FPGA implementation [10], where the AC algorithm was synthesized and implemented in a Xilinx Virtex-5 FPGA (more specifically, the 5vfx30tff665-1 model). For that implementation, which could process 8-pixel image blocks, the maximum combinational data path was 4.348 ns. Therefore, to process a 320 × 240-pixel image, composed of 9600 8-pixel blocks, the previous implementation would require 0.042 ms. This involves an increase in speed of approximately 27% for the current implementation. Similarly, we have computed the performance of the ALI implementation performed in [9] (over a Virtex-4 FPGA), obtaining a frame processing time of 1.24 ms. Note that this is more than one order of magnitude (roughly 38 times) longer than the frame processing time of the implementation presented in the current work.
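The timing estimates above follow from simple arithmetic, which the next Python sketch reproduces. All input figures (image size, block sizes, path delays, clock period and the 1.24 ms frame time) are taken from the text; the function name `frame_time_ms` is hypothetical, and the model simply assumes that blocks are processed one after another, each taking one maximum data path delay.

```python
def frame_time_ms(width, height, pixels_per_block, path_delay_ns):
    """Time to process one frame, one block per maximum path delay."""
    blocks = (width * height) // pixels_per_block
    return blocks * path_delay_ns * 1e-6  # ns to ms


current_ali = frame_time_ms(320, 240, 16, 6.810)   # 4800 blocks of 4 x 4
previous_ac = frame_time_ms(320, 240, 8, 4.348)    # 9600 blocks of 8 pixels
previous_ali = 1.24                                # ms, as reported for [9]

max_clock_mhz = 1000.0 / (20.000 - 13.070)         # period in ns to MHz

print(f"current ALI frame time: {current_ali:.3f} ms")
print(f"previous AC frame time: {previous_ac:.3f} ms")
print(f"AC / ALI time ratio:    {previous_ac / current_ali:.2f}")
print(f"old / new ALI ratio:    {previous_ali / current_ali:.1f}")
print(f"maximum clock rate:     {max_clock_mhz:.1f} MHz")
```

With unrounded frame times the ratio between the previous AC design and the current one is slightly above 1.27, in line with the reported speed increase of approximately 27%.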
Table 2 summarizes the FPGA utilization results for our current implementation. The device utilization rates are very similar to those presented in [10].  Table 3 summarizes the main results provided by the power analysis performed in our implementation.

Results
This section includes all the relevant details on the evaluation process carried out to check the performance of the implemented algorithm, which was undertaken using FPGAs to reduce the execution time of the sequential moving object detection algorithm. FPGA data were introduced in the previous section, together with the results of the corresponding analysis.
Three different video sequences were used in this work. These sequences were selected from the ChangeDetection.NET (CDNET) website [27,28]. More concretely, the employed datasets are Corridor, Highway, and wetSnow. The datasets were chosen due to the variable complexity in the motion detection tasks [29][30][31]. In addition, these three benchmarks were chosen to demonstrate that ALI can detect movement in a variety of relatively complex situations.
Firstly, Highway belongs to the 2012 DATASET, and is the simplest dataset of the three that were used. It pertains to the Baseline Category, which represents a mixture of moderate challenges. Some videos have subtle background motion, others have isolated shadows, some have an abandoned object, and others have pedestrians that stop for a short moment and then move away. These videos are fairly easy, but not trivial, to process, and are provided mainly as references. Corridor is a dataset belonging to the 2012 DATASET Thermal Category. The Thermal Category includes videos that have been captured using far-infrared cameras. These videos contain typical thermal artifacts such as heat stamps (e.g., bright spots left on a seat after a person gets up and leaves), heat reflection on floors and windows, and camouflage effects, when a moving object is of the same temperature as the surrounding regions. Lastly, in the 2014 DATASET, we find the Challenging Weather Category that includes outdoor videos captured in challenging winter weather conditions, i.e., snow storm, snow on the ground, fog. We have selected wetSnow, one of the videos belonging to this category. Figure 6 shows the results of applying the ALI method to the three previously described datasets. From top to bottom of the figure, we show the results for Highway (a), Corridor (b) and wetSnow (c). As can be appreciated in the figure, on the right side the results of the ALI method are shown for one of the images from each sequence, along with boxes containing the detected moving objects. The left side of the figure shows the input image and the boxes surrounding the moving objects. Readers interested in intermediate results of the several phases of the ALI algorithm, presented as images, are invited to consult a paper from the same authors [32]. This paper shows different input image sequences, and their step-by-step outputs, by varying the most important parameters of the ALI algorithm.
ALI behaves excellently when used with the Highway dataset (see Figure 6a). Three cars are perfectly detected and tracked accordingly. A fourth car entering the scene is still not considered, as it is not shown completely. In the case of the Corridor sequence (see Figure 6b), the segmentation by the ALI method has also performed in an outstanding manner. In the image presented, a reflection, which is common in thermal images, is included in the surrounding box. Lastly, the most challenging dataset, wetSnow (see Figure 6c), introduces some unwanted movement in several zones of the image. Table 4 provides more information on the performance metrics of the application of the ALI method to the three datasets. Starting from true positives (TP), false positives (FP) and false negatives (FN), specificity, sensitivity and F-score are shown. These quantitative metrics agree with the qualitative results shown in Figure 6 and the brief explanation provided. Let us highlight that these performance metrics are in line with previous results of the same authors when applying the ALI method to other datasets, such as MOVI Image Base (http://www.irisa.fr/texmex/ressources/bases/base_images_movi/index.html) [24], Ettlinger-Tor in Karlsruhe (http://i21www.ira.uka.de/image_sequences/) [24,33], TwoWalkNew (University of Maryland) [32,33].
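For reference, the standard definitions that relate TP, FP and FN to sensitivity and F-score can be sketched as follows. The counts used in the example are hypothetical and are not taken from Table 4; the function names are likewise hypothetical. Specificity is not computed here because it additionally requires the number of true negatives.

```python
def precision(tp, fp):
    """Fraction of detections that are correct."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Fraction of true moving objects that are detected (recall)."""
    return tp / (tp + fn)


def f_score(tp, fp, fn):
    """Harmonic mean of precision and sensitivity."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts, chosen only to illustrate the formulas.
tp, fp, fn = 8, 2, 2
print(precision(tp, fp), sensitivity(tp, fn), f_score(tp, fp, fn))
```

Because the F-score combines false positives and false negatives in a single figure, it penalizes both the unwanted movement detected in noisy sequences and any moving object that is missed.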
The performance of the ALI method can be broadly compared to other approaches thanks to a recent work [34]. The performance for tasks directly related to motion detection ranges from 24 to 42 fps. More concretely, we have 42 fps for background detection [35], 30 fps for object detection [34], 25 fps for surveillance [36] and video segmentation [37] respectively, and finally, 24 fps for denoising [38]. An objective comparison is quite difficult, as the tasks do not all have the same complexity, and image sizes are also different. However, the previous figures show that our ALI algorithm, implemented in current FPGAs, is competitive for most motion-detection-based computer vision applications.

Conclusions
In recent years, our research team has been working with the accumulative computation (AC) and algorithmic lateral inhibition (ALI) methods to accurately detect moving objects in video sequences. Moreover, real-time processing of the video images has also been a major issue in all computer vision applications. Unfortunately, the ALI method is computationally intensive, which makes it necessary to resort to the latest FPGAs to speed up real-time video processing.
To address this problem, the present paper has made three main contributions. Firstly, the formal model of finite state machines that simplifies the general neurally-inspired ALI algorithm has been reproduced. This was the first step towards reducing the computation time. Secondly, the formal model was implemented in up-to-date Xilinx FPGA technology, to continue reducing processing time for the reduced ALI algorithm. Lastly, a comparison between FPGA-based implementations of AC and ALI (about ten years ago) and ALI (to date) has been performed.
We have concluded that the current FPGA-based implementation of ALI achieves excellent performance in terms of F-score (0.98 and 0.86 for simple and complex datasets respectively) as expected, and outperforms the processing times of the AC and ALI implementations performed about ten years ago (27% and 3,658% faster respectively). Current FPGA technology has demonstrated that it is possible to maintain excellent motion detection accuracy whilst implementing more sophisticated biologically-inspired computer vision algorithms.
In the different phases of the ALI algorithm, certain pixel-based processing is performed for each image as it is received, and for the intermediate images generated throughout the processing. In most cases, pixel computation could be performed simultaneously on all pixels, since there are no dependences among these computations. GPUs are well-suited to this parallelism. Therefore, we are planning to translate the ALI algorithm to a GPU-based computing platform. In this way, it will also be possible to compare the current FPGA-based performance with that of a GPU.