This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The neurally inspired accumulative computation (AC) method and its application to motion detection have been introduced in recent years. This paper revisits the relationship, explored by many researchers, between neural networks and finite state machines. Indeed, finite state machines constitute the best characterized computational model, whereas artificial neural networks have become a very successful tool for modeling and problem solving. The article shows how to reach real-time performance once the model is described as a finite state machine. Two steps are taken in that direction: (a) the general AC method is simplified by formally transforming it into a finite state machine; (b) the resulting AC module, as well as an 8-AC motion detector, is implemented in hardware on an FPGA, with promising performance results. We also offer two case studies of the use of AC motion detectors in surveillance applications, namely infrared-based people segmentation and color-based people tracking.

Motion analysis in image sequences is a constantly growing discipline due to the great number of applications in which it plays a key role. Moreover, optical flow in monocular video can serve as a key for recognizing and tracking moving objects, as flow data contain richer information and have been shown in experiments to track difficult sequences successfully [

In this sense, many researchers have explored the relation between discrete-time neural networks and finite state machines, either by showing their computational equivalence or by training them to perform as finite state recognizers from example [

Our experience to date has shown that most applications in computer vision, and more specifically in motion detection through AC, offer good results with the same values of the model parameters. The article shows how to reach real-time performance once the model is described as a finite state machine. Two steps are taken in that direction: (a) the general AC method is simplified by formally transforming it into a finite state machine; (b) the resulting AC module, as well as an 8-AC motion detector, is implemented in hardware, with promising performance results. The rest of the paper is structured as follows. Section 2 revisits the AC method in motion detection. Section 3 introduces the simplified model for AC in the form of a finite state automaton. Section 4 describes the real-time hardware implementation of the motion-detection AC modules obtained from the previous formal model. Lastly, Sections 5 and 6 present the data and results and the conclusions, respectively.

The two main problems in motion analysis in image sequences are the correspondence and the aperture problem. The correspondence problem, well exposed by Duda and Hart [

Models based on local motion detection face the correspondence problem considering that a pixel in time

Correlation based models [

There are also models based on the uniformity restriction. These impose that the velocity fields of moving objects vary uniformly, since objects usually have uniform surfaces. They analyze local velocity fields to obtain information about the real velocity of the objects. Some examples are the visual motion measurement model of Hildreth [

The method proposed, based on the effect called permanency [

The AC approach is neurally inspired. Usually the time evolution of the neuron membrane potential is modeled by a first order differential equation known as the “leaky integrator model”. A different way of modeling the time evolution of the membrane potential is to consider the membrane as a local working memory in which neither the triggering conditions nor the way in which the potential returns to its input-free equilibrium value need to be restricted to thresholds and exponential increases and decays. This type of working memory is characterized by the possibility of controlling its charge and discharge dynamics in terms of:

The presence of specific spatio-temporal features with values over a certain threshold.

The persistency in the presence of these features.

The increment or decrement values (±

The control and learning mechanisms.
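For reference, the leaky integrator model mentioned before the list above is commonly written in the following form. The symbols are assumptions introduced here for illustration and are not notation taken from this paper: V(t) is the membrane potential, V_rest its input-free equilibrium value, τ the membrane time constant, R the membrane resistance and I(t) the input current.

```latex
\tau \frac{dV(t)}{dt} = -\bigl(V(t) - V_{\mathrm{rest}}\bigr) + R\, I(t)
```

With no input, V(t) decays exponentially toward V_rest. The working memory view described above drops exactly this restriction, so that charge and discharge can follow other, controllable dynamics.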

The upper part of

This formula assigns pixel (

The charge value at pixel (

The control knowledge is described extensively by means of a finite automaton in which the state space is constituted from the set of distinguishable situations in the state of accumulated charge in a local memory [_{0}, _{1}, …, _{N}_{0} is the state corresponding to the totally discharged local memory (_{N}_{int}

Let us suppose, without loss of generality, that it is enough to distinguish eight levels of accumulated charge (S_{0}, S_{1}, …, S_{7}), where S_{0} corresponds to the totally discharged state and S_{7} to the state of total charge.

Now, the aim is to detect the temporal and local (pixel to pixel) contrasts of pairs of consecutive binarised images at gray level _{k}_{k}_{k}

_{k}_{k}

In this case the calculation element (_{k}_{k}_{k}_{0}, the state of complete discharge, independently of which was the initial state.

_{k}_{k}

The calculation element has detected in _{k}_{k}_{7}, the state of total charge, independently of which was the previous state.

_{k}_{k}

The calculation element has detected the presence of an object in its band (_{k}_{k}_{7}, or from some intermediate state (_{6}, …, _{1}). This partial discharge due to the persistence of the object in that position and in that band, is described by means of a transition from _{7} to an intermediate state, _{int}_{0}. The descent in the element's state is equivalent to the descent in the pixel's charge, as you may appreciate on
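The three situations described above can be sketched as a small state machine. This is a minimal sketch under stated assumptions, not the authors' implementation: the states are numbered 0 to 7 (0 for complete discharge, 7 for total charge), the two Boolean inputs `in_band_now` and `in_band_before` are hypothetical names for the binarised band membership of a pixel in two consecutive frames, and the discharge of one state per frame (`STEP`) is an assumed value.

```python
V_DIS, V_SAT = 0, 7   # state of complete discharge and state of total charge
STEP = 1              # assumption: persistence discharges one state per frame


def next_state(state, in_band_now, in_band_before):
    """Return the next charge state (0..7) of one AC element."""
    if not in_band_now:
        # Nothing detected in this band: complete discharge,
        # independently of the previous state.
        return V_DIS
    if not in_band_before:
        # The object has just entered the band: total charge,
        # independently of the previous state.
        return V_SAT
    # The object persists in the band: partial discharge toward state 0.
    return max(state - STEP, V_DIS)


def run(inputs, state=V_DIS):
    """Feed a sequence of band-membership flags and return the state trace."""
    trace = []
    previous = False
    for current in inputs:
        state = next_state(state, current, previous)
        trace.append(state)
        previous = current
    return trace


if __name__ == "__main__":
    print(run([False, True, True, True, False]))
```

An object that enters the band and then stays still produces a descending sequence of states, which is the descent in the pixel's charge discussed above.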

The presented scheme performs poorly when the gray level of a pixel lies on the border between two bands. In this situation, a small amount of noise makes the pixel switch back and forth between two adjacent bands, which activates the stimuli sequence; consequently, motion is detected when there is no real motion in the scene.
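The border effect can be illustrated with a few lines of code. The sketch assumes 256 gray levels split into eight equal-width bands numbered 0 to 7; both the band layout and the sample values are illustrative assumptions, not data from the paper.

```python
N_BANDS = 8
BAND_WIDTH = 256 // N_BANDS   # assumption: 32 gray levels per band


def band_of(gray):
    """Return the band index (0..7) of a gray level in 0..255."""
    return gray // BAND_WIDTH


# A static pixel whose mean value sits on the border between bands 2 and 3,
# disturbed by a little noise (hypothetical sample values).
samples = [95, 96, 95, 97, 94]
bands = [band_of(g) for g in samples]

# Every band change looks like an object entering a band, that is, like motion.
changes = sum(a != b for a, b in zip(bands, bands[1:]))

if __name__ == "__main__":
    print(bands, changes)
```

Although the pixel value varies by only three gray levels, its band changes at every frame, so each AC element involved would repeatedly jump to total charge.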

However, the scheme can be slightly modified to overcome this problem by taking into account a hysteresis cycle defined through

In this case the accumulated charge for band
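One common way to realize a hysteresis cycle on band membership is sketched below. It is only an illustration under stated assumptions and does not reproduce the exact definition used in this paper: the pixel keeps its previous band unless its gray level leaves that band by more than a hypothetical `margin`, and the bands are assumed to be eight equal-width intervals over 256 gray levels.

```python
BAND_WIDTH = 32   # assumption: 256 gray levels split into 8 equal bands


def band_with_hysteresis(gray, prev_band, margin=4):
    """Keep the previous band unless gray leaves it by more than margin."""
    low = prev_band * BAND_WIDTH - margin
    high = (prev_band + 1) * BAND_WIDTH - 1 + margin
    if low <= gray <= high:
        return prev_band
    return gray // BAND_WIDTH


def track(samples, margin=4):
    """Return the band sequence of one pixel over time."""
    band = samples[0] // BAND_WIDTH
    result = []
    for gray in samples:
        band = band_with_hysteresis(gray, band, margin)
        result.append(band)
    return result


if __name__ == "__main__":
    print(track([95, 96, 95, 97, 94]))   # noisy static pixel near a border
    print(track([95, 110, 96]))          # genuine change of gray level
```

With the margin in place, small fluctuations around a border no longer change the band, while a larger change of gray level still does.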

In order to accelerate their performance, and hence to obtain real-time processing rates, many applications use reconfigurable hardware. More specifically, they are programmed on field programmable gate arrays (FPGAs) [

Some of the most recently used FPGA families are Xilinx Virtex-II [

We also highlight a recent paper [

In this section, we show how a single AC module, as well as its expansion to an 8-module, starting from the description as a finite state machine, has been implemented (see

In

The output

The same

Each one of the 8

Now, for the implementation of an 8-module, using the same FPGA (the 5vfx30tff665-1), the results obtained are shown in

As the maximum combinational path delay is 4.348 ns, when working with 648 × 480 pixel images, which need 38880 8-AC modules, the results are obtained after 0.167 ms. This performance is excellent and enables real-time operation.
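The order of magnitude of this figure can be checked with a short calculation. The sketch assumes that the modules are evaluated one after another, each taking one maximum combinational path delay; the frame rate of 25 frames per second is an assumption added here only to give a real-time budget for comparison.

```python
WIDTH, HEIGHT = 648, 480   # image size quoted in the text
MODULES = 38880            # number of 8-AC modules quoted in the text
DELAY_NS = 4.348           # maximum combinational path delay quoted in the text
FRAME_RATE = 25            # assumption: frames per second of the video source


def total_latency_ms(modules, delay_ns):
    """Latency in milliseconds if the modules are evaluated sequentially."""
    return modules * delay_ns * 1e-6


latency_ms = total_latency_ms(MODULES, DELAY_NS)
frame_budget_ms = 1000 / FRAME_RATE

if __name__ == "__main__":
    print(f"{latency_ms:.3f} ms per frame, budget {frame_budget_ms:.0f} ms")
```

Under these assumptions the product is roughly 0.17 ms, the same order as the value reported above and far below the time available per frame.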

In order to validate the usefulness of the AC modules described previously, this section introduces two case studies of the use of AC motion detectors in surveillance applications, namely infrared-based people segmentation and color-based people tracking. These cases show only a few of the many possible uses of our approach.

We have used an infrared surveillance sequence captured by our research team, where different persons appear and disappear in the scene.

Notice that motion not detected in one band is detected in another one. Notice also that the background motion is mainly obtained at bands 2 and 3, whereas the foreground is obtained at bands 4 to 7. Bands 1 and 7 do not offer much information, neither on foreground nor on background motion. A closer look at the figure shows some interesting results. A rough conclusion is that band 4 mostly captures the contours of the foreground moving elements (people, in this case), whereas bands 5 and 6 show the main parts of the moving bodies. This is why, in this particular case, it seems reasonable to add bands 5 and 6 to obtain moving people in infrared imagery. Now,
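The addition of two band outputs can be sketched as follows. The charge maps are tiny hypothetical examples written as nested lists, and thresholding the sum at zero to obtain a binary people mask is an assumption made for illustration.

```python
def add_bands(charge_a, charge_b):
    """Add two per-pixel charge maps of identical size, element by element."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(charge_a, charge_b)]


def to_mask(charge, threshold=0):
    """Mark pixels whose accumulated charge exceeds the threshold."""
    return [[value > threshold for value in row] for row in charge]


# Hypothetical 2 x 3 charge maps (values 0..7) for bands 5 and 6.
band5 = [[0, 7, 0],
         [0, 6, 0]]
band6 = [[0, 0, 5],
         [0, 0, 0]]

combined = add_bands(band5, band6)
mask = to_mask(combined)

if __name__ == "__main__":
    print(combined)
    print(mask)
```

Since a pixel belongs to a single gray level band at a time, the two maps are not expected to be charged at the same pixel simultaneously, so the sum simply merges the regions found by each band.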

In this case study, we have used a data set containing 1109 frames captured in an office room.

As you may appreciate in

The sequence has been analyzed using the hysteresis modification proposed with different settings. In this case the charge of all bands
_{i}_{T}_{i}_{i}

_{T}_{R}_{G}_{B}_{R}_{G}_{B}_{T}_{R}_{G}_{B}_{R}_{G}_{B}_{T}_{R}_{G}_{B}_{R}_{G}_{B}

From the results offered it can be easily seen that, when the number of bands _{i}_{i}_{T}_{i}_{i}

This paper starts from previous works in computer vision, where our accumulative computation method applied to motion detection has proven to be quite efficient. We have shown in this article how the AC model, based on neural networks, has been modeled by means of finite state automata, seeking real-time performance through an implementation in FPGA-based reconfigurable hardware. Two steps have been taken in that direction: (a) a simplification of the general AC method by formally transforming it into a finite state machine; (b) a hardware implementation of such AC modules.

Design by means of programmable logic enables a systematic and efficient transition from the functional specification of a sequential system to its equivalent description in terms of a finite state automaton. Starting from this point, a hardware implementation by means of programmable logic is straightforward. This kind of design is especially interesting in those application domains where the response time is crucial (e.g., monitoring and diagnosing tasks in visual surveillance and security).

In this paper, the results obtained after implementing AC modules in hardware on programmable logic, specifically on Virtex-5 FPGAs, have been shown. These results build on previously validated research on moving object detection, which unfortunately did not reach real-time performance. Prior to the implementation, the model has been simplified into an 8-state finite automaton. The procedure is easily extended to all functions of bounded complexity that can be described in a clear and precise manner with a moderate number of states.

Two case studies of real interest in surveillance applications have been introduced. These examples have demonstrated the versatility of the motion detectors, which can be inserted into any high-level computer vision task.

This work was partially supported by the Spanish Ministerio de Ciencia e Innovación under projects TIN2007-67586-C02-02 and TEC2008-0277/TEC, and by the Spanish Junta de Comunidades de Castilla-La Mancha under projects PII2I09-0069-0994, PII2I09-0071-3947 and PEII09-0054-9581.

The AC working memory model (upper part) and an example of the temporal evolution of the accumulated persistency state,

Control automaton that receives inputs _{k}_{k}_{0} = _{7} = _{int}

Layout of a motion-detection AC module.

AC module automata for band 7.

Layout of an 8-AC motion detector.

Result of AC detection modules for each gray level band.

Addition of AC detection modules corresponding to bands 5 and 6 for efficient infrared-based people segmentation.

Result of AC detection modules for color-based people tracking. (a) Input image. From top to bottom, bands 0 to 7, result of AC on the (b)

ROC curve associated to the color video sequence.

Total charge for _{R}_{G}_{B}_{R}_{G}_{B}

Total charge for _{R}_{G}_{B}_{R}_{G}_{B}

Total charge for _{R}_{G}_{B}_{R}_{G}_{B}

Temporal results for the AC module.

| Parameter | Value |
| --- | --- |
| Minimum period | 1.287 ns |
| Maximum frequency | 777.001 MHz |
| Minimum input required time before clock | 2.738 ns |
| Maximum output delay after clock | 3.271 ns |

Device utilization summary for the AC module.

| Resource | Usage |
| --- | --- |
| Slice Logic Utilization: | |
| Number of Slice Registers | 24 out of 20480 (0%) |
| Number of Slice LUTs | 40 out of 20480 (0%) |
| Number used as Logic | 40 out of 20480 (0%) |
| Slice Logic Distribution: | |
| Number of LUT Flip Flop pairs used | 40 |
| Number with an unused Flip Flop | 16 out of 40 (40%) |
| Number with an unused LUT | 0 out of 40 (0%) |
| Number of fully used LUT-FF pairs | 24 out of 40 (60%) |
| Number of unique control sets | 1 |
| IO Utilization: | |
| Number of IOs | 32 |
| Number of bonded IOBs | 32 out of 360 |

Temporal results for the 8-AC motion detector.

| Parameter | Value |
| --- | --- |
| Minimum period | 2.736 ns |
| Maximum frequency | 365.497 MHz |
| Minimum input required time before clock | 2.834 ns |
| Maximum output delay after clock | 3.271 ns |
| Maximum combinational path delay | 4.348 ns |

Device utilization summary for the 8-AC motion detector.

| Resource | Usage |
| --- | --- |
| Slice Logic Utilization: | |
| Number of Slice Registers | 248 out of 20480 (1%) |
| Number of Slice LUTs | 467 out of 20480 (2%) |
| Number used as Logic | 467 out of 20480 (2%) |
| Slice Logic Distribution: | |
| Number of LUT Flip Flop pairs used | 492 |
| Number with an unused Flip Flop | 244 out of 492 (49%) |
| Number with an unused LUT | 25 out of 492 (5%) |
| Number of fully used LUT-FF pairs | 223 out of 492 (45%) |
| Number of unique control sets | 2 |
| IO Utilization: | |
| Number of IOs | 260 |
| Number of bonded IOBs | 260 out of 360 (72%) |
| Number of BUFG/BUFGCTRLs | 1 out of 32 (3%) |

Algorithm performance statistics for the color video sequence.

| Statistic | Value |
| --- | --- |
| Number of Cases | 1109 |
| Number of Correct Cases | 1066 |
| Accuracy | 96.1% |
| Sensitivity | 95.8% |
| Specificity | 96.4% |
| Positive Cases Missed | 24 |
| Negative Cases Missed | 19 |
| Fitted ROC Area | 0.968 |
| Empiric ROC Area | 0.964 |
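The accuracy figure in the table above follows directly from the reported counts, as the short check below shows. Only values listed in the table are used; the variable names are illustrative.

```python
total_cases = 1109
correct_cases = 1066
positive_missed = 24
negative_missed = 19

# Accuracy is the fraction of correctly classified cases.
accuracy = correct_cases / total_cases

# The missed cases should account for every incorrectly classified case.
errors = total_cases - correct_cases

if __name__ == "__main__":
    print(f"accuracy = {100 * accuracy:.1f}%")
    print(f"errors = {errors}, missed = {positive_missed + negative_missed}")
```

The two kinds of missed cases add up to the number of incorrect cases, so the counts in the table are mutually consistent.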