
This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license.

The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal-plane processing resources, and co-processing resources on a general purpose embedded processor, all implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection onto the camera plane of the 3-D motion information present in the world scenario. This motion representation is widely known and applied in the scientific community to solve a wide variety of problems. Most applications based on motion estimation require real-time operation; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal-plane processor combined on-chip with a 32-bit NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and a digital (NIOS II) processor. The system is fully functional and is organized in different stages, where the early processing (focal-plane) stage is mainly focused on pre-processing the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal-plane processing capabilities of the system. The final outcome is a low-cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used in very diverse application domains.

The term Optical Flow refers to the visual phenomenon due to the apparent movement perceived when we move through a scene and/or when objects move within it. It represents the projection of the 3-D motion present in the scene onto the 2-D plane of the image sensor or the retina. Note that, as a consequence of this projection, depth information is partially lost, and the estimation of the 3-D scene structure and motion from the available 2-D field is a very complex task. Optical flow has been extensively studied in the computer vision community (see for instance [

Different approaches have been proposed in the literature to estimate the optical flow field. The most widely used ones are the gradient-based methods, which rely on the constant-brightness assumption. A widespread model is the well-known local method proposed by Lucas and Kanade [

In addition to the model choice used to compute the optical flow, its performance and computing resource demands are key elements to develop an embedded system for real-world applications. In the framework of real-time computing approaches, Díaz

Following the results of [

The rest of the paper is organized as follows: Section 2 provides a brief overview of the Eye-RIS™ system which will be the target device to implement the optical flow sensor. Section 3 presents an introduction to the optical flow constraint equation of the Lucas and Kanade method used in this paper. In Section 4, we suggest a number of approaches to enhance the performance of the algorithm implemented in the Eye-RIS™ platform as well as the co-design strategy used to carry out the implementation in this system. The evaluation of the different approaches is described in Section 5. Finally, our experimental results are presented in Section 6 while our conclusions and directions for future research are summarized in Section 7.

In this paper, we make use of a commercial smart camera designed by AnaFocus, named the Eye-RIS™ v1.2 [

One unique characteristic of the Eye-RIS™ vision systems compared to other commercial solutions is that image acquisition and early-processing take place at the sensor, which is actually a Smart Image Sensor (SIS). In this device, image acquisition and pre-processing is performed simultaneously in all pixels of the SIS. Consequently, images do not need to be downloaded from the sensor for the initial stages of the processing. This concept of concurrent sensing-processing extends the Image Sensor concept to the Smart Image Sensor one. The Smart Camera integrates a SIS called Q-Eye, which is a quarter CIF (aka QCIF, 176 × 144) resolution fully-programmable SIS. It consists of an array of 176 × 144 cells plus a surrounding global circuitry. Each cell comprises multi-mode optical sensors, pixel memories, and linear and non-linear analog processors and binary processors. Each cell is interconnected in several ways with its 8 neighboring cells, allowing for highly flexible, programmable, efficient, real-time image acquisition and spatial processing operations. In Smart Image Sensors, each local processor is merged with an optical sensor. This means that each pixel can both sense the corresponding spatial sample of the image and process this data in close interaction and cooperation with other pixels.

Eye-RIS™ v1.2 allows ultra-high processing speed beyond 1,000 frames per second (fps) thanks to the incorporation of mixed-signal processing at the pixel level (enough light is assumed so that exposure time does not become a bottleneck). Processing speed is also application-dependent. Applications with intensive post-processing algorithms might present slower frame rates, since the performance may be constrained by the processing power of the embedded processor (NIOS II).

On the other hand, the Eye-RIS™ Vision System is not conceived for implementing intensive, iterative gray-level processing tasks. This kind of model can be implemented using the embedded micro-processor, but its limited computational power highly restricts the complexity of the vision models that can be processed in real time. For this reason, it is necessary to take advantage of the resources available in the architecture to develop the approaches proposed in this paper to estimate optical flow.

The Q-Eye must be seen as a powerful resource for further processing,

The presented architecture has several advantages compared to conventional smart cameras, but imposes some restrictions in programming, due to the analog nature of the SIS Q-Eye, that shall be understood and taken into consideration by application developers.

This section introduces the basics needed to understand the concept of optical flow and the method used in this paper. An ordered sequence of images allows the estimation of apparent motion. The optical flow vector can be defined as the temporal variation of the image coordinates across time, usually denoted as (u, v). Under the constant-brightness assumption, it satisfies the optical flow constraint equation f_x u + f_y v + f_t = 0, where f_x, f_y, and f_t are the spatial and temporal derivatives of the image intensity.

The objective is to minimize the error of the constraint equation over a local neighborhood, with f_x, f_y, and f_t computed from the image sequence.
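The local least-squares solution can be illustrated with a minimal sketch (not the Eye-RIS implementation; names and window handling are illustrative): the derivatives sampled inside an integration window yield a 2 × 2 linear system whose solution is the flow vector.

```python
# Minimal sketch of the Lucas-Kanade least-squares solution at one pixel.
# fx, fy, ft are lists of the spatial/temporal derivatives sampled inside
# the integration window around the pixel of interest.

def lucas_kanade_pixel(fx, fy, ft):
    # Accumulate the entries of the 2x2 structure matrix and the
    # right-hand side: A = [[Sxx, Sxy], [Sxy, Syy]], b = [-Sxt, -Syt].
    sxx = sum(x * x for x in fx)
    sxy = sum(x * y for x, y in zip(fx, fy))
    syy = sum(y * y for y in fy)
    sxt = sum(x * t for x, t in zip(fx, ft))
    syt = sum(y * t for y, t in zip(fy, ft))

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:          # poorly conditioned window: no reliable flow
        return None
    # Cramer's rule for (u, v) in A [u, v]^T = [-Sxt, -Syt]^T.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

For derivatives consistent with a uniform displacement (f_t = −(f_x u + f_y v)), the window recovers that displacement exactly; when the local contrast structure is degenerate the determinant vanishes, which is the low-confidence case discussed later.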

The Lucas and Kanade method has been chosen for two main reasons. First, this method has been ranked with a very good accuracy

The next section describes how this model is simplified and optimized (in terms of processing speed) for its implementation in a NIOS II soft-core processor with the focal plane co-processing capability.

One important problem in optical flow methods is the amount of memory accesses and the massive number of multiplications computed by the model. For this reason, thorough optimization becomes necessary to obtain a reasonable system performance.

In order to speed up the computation of the Lucas and Kanade model, a Sparse Integration Block (SIB) approach is used in (4), as shown in

We apply the principle of vicinity, which assumes that any point in the image will have a value similar to those in its neighborhood. This principle is used in the optical flow estimation to compute only a quarter of the pixels by 4:1 subsampling. Hence, a calculated optical flow vector will be propagated to its neighborhood as shown in
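The 4:1 propagation can be sketched as follows (an illustrative sketch, not the NIOS II code): the flow is estimated only at one pixel of every 2 × 2 block and copied to the three remaining neighbors; `compute_flow` is a stand-in for any per-pixel estimator.

```python
# Sketch of the 4:1 subsampling with neighborhood propagation: one flow
# estimation per 2x2 block, propagated to the block's other three pixels.

def propagate_flow(height, width, compute_flow):
    flow = [[None] * width for _ in range(height)]
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            v = compute_flow(y, x)        # one estimation per 2x2 block
            for dy in (0, 1):             # copy it to the neighborhood
                for dx in (0, 1):
                    if y + dy < height and x + dx < width:
                        flow[y + dy][x + dx] = v
    return flow
```

This reduces the number of least-squares solves by a factor of four while keeping a dense output field, at the cost of the small accuracy loss quantified in the evaluation section.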

Once the implementation is detailed, we evaluate the system performance with the different approaches. On one hand, the method was implemented in C with two different SIBs; on the other hand, the same implementation was optimized in assembler with different SIBs. The assembler optimization avoids RAW dependencies, optimizes memory accesses in the pipelined data path, avoids unnecessary stack accesses usually introduced by the C compiler, gives absolute control of the registers, unrolls loops,

Next, a performance study is carried out to evaluate the optimization evolution.

Therefore, with these approaches, we come to the conclusion that the global gain obtained in the 9 × 9 version amounts to a factor of 40.47, as shown in

It is convenient to extrapolate the performance results to a regular PC processor, for instance the Intel Core 2 Duo, to make clear the constraints we have in the proposed architecture. For a comparative evaluation between the NIOS II and a desktop processor we make use of Dhrystone 2.1 [ while an Intel Core 2 Duo 2.00 GHz processor obtains 4,240 DMIPS (using only one of the processor cores). If we compare both processors, the Intel Core 2 Duo obtains a gain factor of 59.7. Furthermore, the Intel Core 2 Duo uses a superscalar architecture with two cores and supports SIMD instructions (MMX and SSE), while the NIOS II is a basic processor with a scalar architecture and a reduced instruction set (add, sub, mul, jmp,

The boundaries in an image are areas where optical flow can be estimated with more confidence (unless they correspond to 3-D object borders, where occlusion problems are very common, though this case is less probable). These regions are rich in features; hence, the resulting estimation is more accurate than in areas with poor contrast structure. This is so because the Lucas and Kanade model collects weighted spatio-temporal structural information: if the local contrast structure is poor, the confidence of the optical flow estimation will be low. Instead of computing all the points and discarding unreliable ones in a second stage, we can avoid the calculation of low-confidence optical flow estimates by discarding these points a priori (using local contrast structure estimates). In order to take advantage of this, we make use of the Roberts Cross operator to localize the edges (local contrast maxima). The kernels used are shown in

The sum of the absolute value of each convolution provides edge estimations, where each convolution operation obtains the maximum response when the edge angle reaches 45° (
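A minimal sketch of this edge estimation (illustrative; the threshold value and border handling are assumptions, and on the Eye-RIS it runs in the analog focal plane rather than in software):

```python
# Roberts Cross edge estimation: two 2x2 convolutions whose absolute
# responses are summed and thresholded into a binary edge map.
# 'img' is a list of rows of gray-level values.

def roberts_edges(img, threshold):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            g1 = img[y][x] - img[y + 1][x + 1]      # kernel [[1, 0], [0, -1]]
            g2 = img[y][x + 1] - img[y + 1][x]      # kernel [[0, 1], [-1, 0]]
            if abs(g1) + abs(g2) >= threshold:
                edges[y][x] = 1
    return edges
```

Each diagonal difference responds maximally to edges at 45°, so their combined magnitude gives a cheap, orientation-tolerant contrast measure for pre-selecting points of interest.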

Low contrast areas will not provide significant edges. This problem can be solved or reduced by locally performing modifications on the image intensity histograms, for instance by applying a 3 × 3 Laplacian convolution (aka Sharpen filter) which emphasizes the low-contrast areas.
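As a sketch of this pre-filter (the classic sharpen kernel below is an assumption, since the paper does not list its coefficients):

```python
# 3x3 sharpen (Laplacian-based) pre-filter that amplifies local contrast
# before edge detection. Border pixels are left unchanged for simplicity.

SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]

def sharpen(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += SHARPEN[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out
```

Uniform regions pass through unchanged while small intensity differences are amplified, which raises weak edges above the detection threshold.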

It can be observed that a higher edge density is obtained by applying the sharpen filter. In this example,

Once we have reached this point, we integrate the edge binary map with the optical flow estimation to optimize the computation time. Since the points of interest are previously selected in the focal plane, this process is carried out on the fly without affecting the global performance.

Analyzing the obtained results, and taking into consideration that normal scenes have an edge density between 30% and 40%, with 35% as the mean value, we can deduce that, making use of a 5 × 5 SIB, the frame rate can oscillate between 56 (red mark in

We can conclude that the proposed approach is suitable for real-time computation beyond video-rate (25 fps). The main advantage of computing optical flow at points of interest (in our case, pre-selected edges) is the increase in performance. The gain obtained is around 2 in the worst cases, corresponding to scenes with 50% edge density. Considering the best cases, scenes with 30% density, the gain rises up to 2.6. As commented before, a common scene usually contains 35% density. Hence, we can conclude that the mean speed-up gain is 2.3. This gain can be used to compute flow at a higher frame-rate (and therefore, improve the optical flow accuracy [

It is important to remark that the NIOS II handles 0.044 GOPS to compute optical flow, while the focal plane processes 4.1 GOPS to smooth the image, obtain points of interest, and compute the optical flow regularization. To estimate the number of operations carried out in the focal plane, we computed the workload of a digital processor (NIOS II) performing the same functionality. Due to the amount of operations involved in the early stage (focal plane) and the final stage (processor) to obtain optical flow, we can conclude that the optical flow estimation could not be implemented in this architecture without the focal plane assistance.

In previous sections, we defined how to estimate optical flow as well as how the model has been implemented to speed up the optical flow estimations. In this section, we give a global description of how the motion vector components are estimated in the Eye-RIS™ system. As described before, the Q-Eye sensor is able to process in the same physical layer where the image is captured (focal-plane computation). For this reason, at the same time that the system captures the image, we can process it and send it to memory, where digital post-processing takes place.

After the image capture is finished, the focal plane processes the image with the proposed method to select edges, as described in the previous section, and applies a linear diffusion filter [

An important factor to take into account is the exposure time when the image is captured. Adopting a sequential strategy, in which image acquisition and NIOS processing are done one after the other, is not convenient because it does not take advantage of the pipelined processing capabilities of the system. The focal plane (Q-Eye) is able to work asynchronously with the processor, so the system can capture and process in the focal plane at the same time that we are making use of the digital processor. To accomplish this, the exposure time, the focal-plane processing time, and the processor computing time must be taken into account.

Furthermore, the analog internal memories in the focal plane cannot retain the images for a long time due to leakage: the mean value of an image stored in an internal focal-plane memory decreases by around 0.8 LSBs every 40 ms, so prolonged storage leads to significant degradation due to transistor leakage. In order to reduce this signal degradation as well as to maintain a constant sampling period, we must meet the following constraint, as indicated in expression (6):
T_{Processor} ≥ T_{Focal Plane}, i.e., the processor computing time must cover the focal-plane stage (exposure plus analog processing). In the first stage, the I_{t} image is captured by the focal plane at the same time as the partial optical flow estimation (half of the resolution) of a previously captured pair, I_{t−2} and I_{t−1}, is carried out on the processor. The second stage captures the I_{t+1} image and processes the unfinished optical flow calculation of the previous stage. The last stage transfers the optical flow components to the focal plane to apply the post-processing linear diffusion filter, which acts as an isotropic Gaussian filter. Note that we do not have a continuous acquisition process where the time between frames is fixed. On the contrary, we handle the acquisition process (according to the scheme of the figure): we compute flow between frames I_{t−2} and I_{t−1} and between frames I_{t} and I_{t+1}, but we do not compute the flow between frames I_{t−1} and I_{t} because the time interval can be different. This is because, after computing the optical flow, further processing algorithms could be applied.
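The benefit of the pipelined schedule can be sketched with a small timing model (all times are hypothetical values in milliseconds; the functions only illustrate the constraint, not the Eye-RIS scheduler):

```python
# Illustrative timing model of the co-design schedule: in the pipelined
# scheme the frame period is set by the processor stage, provided the
# focal-plane work (exposure + analog processing) finishes within it, so
# that analog memories are read back before leakage degrades them.

def pipeline_frame_period(t_exposure, t_focal_plane, t_processor):
    if t_exposure + t_focal_plane > t_processor:
        raise ValueError("focal-plane stage does not fit in the pipeline")
    return t_processor

def sequential_frame_period(t_exposure, t_focal_plane, t_processor):
    # Naive scheme: acquisition, analog processing, and digital
    # processing happen strictly one after the other.
    return t_exposure + t_focal_plane + t_processor
```

With the hypothetical values below, the pipelined period equals the processor time alone, while the sequential scheme pays for all three stages on every frame.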

The purpose of this section is to evaluate and validate the approaches suggested in the previous section. In this paper, the error measure (7) will be the same as the one used by Barron et al.: the angular error ψ_{E} between the correct velocity v_{c} and the estimated velocity v_{e}.

To evaluate the angular error (7) in a sequence, the real optical flow must be known. The Yosemite sequence, created by Lynn Quam [
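Barron's angular error embeds each velocity as a 3-D unit vector (u, v, 1)/‖(u, v, 1)‖ and measures the angle between the correct and estimated vectors; a direct sketch:

```python
import math

# Barron's angular error between a correct flow (u_c, v_c) and an
# estimated flow (u_e, v_e), in degrees.

def angular_error_deg(u_c, v_c, u_e, v_e):
    dot = u_c * u_e + v_c * v_e + 1.0
    norm = math.sqrt((u_c**2 + v_c**2 + 1.0) * (u_e**2 + v_e**2 + 1.0))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

The extra third component makes the measure well defined even for zero velocities, penalizing both direction and magnitude errors.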

Once the angular error estimation procedure is defined, we carry out an angular error study on the simplified Lucas and Kanade approach described in this work. As a first step, we will evaluate the original model implementation in software, with the different proposed approaches and densities (100% and 48.5%) as shown in

To compare our approaches, we will measure the error in the Lucas and Kanade implementation proposed by Barron [

As we stated in Section 4, the main idea is to propagate the optical flow estimation to the neighborhood. The results obtained above reveal that, making use of this performance optimization, we reach results quite similar to those of the non-propagated version. Hence, we can say that this approach is fully valid, since the loss in accuracy is insignificant in both the sparse and the non-sparse integration block approaches.

In the last test, we evaluate the optical flow regularization (spatial coherence). In this way, we can correct small errors by weighting them with the neighborhood. The most common smoothing filter is the Gaussian convolution; hence, we apply this filter to the estimated optical flow and then run the same test scheme as before. Analyzing this experiment, we can deduce that the obtained results are better if we apply the smoothing convolution filter to the optical flow components. The angular error reduction is higher for small integration blocks, both sparse and non-sparse. With larger masks the error is lower than with the smaller ones (5 × 5). This is due to the fact that the information collected in small blocks is weighted worse by the model than in the case of masks with larger neighborhoods. Hence, applying the regularization to the result helps to re-weight the vicinity, so the approaches based on small masks benefit the most. If we compare our results after the regularization step with Barron’s implementation, we obtain quite similar angular errors but reduce the standard deviation by more than a half.
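The regularization step can be sketched as a small separable smoothing of each flow component (the binomial (1, 2, 1)/4 kernel below approximates a narrow Gaussian; the actual kernel width is an assumption, since the paper evaluates several):

```python
# Separable smoothing of one flow component with the binomial kernel
# (1, 2, 1)/4 applied along rows and then along columns. Border samples
# are left unchanged for simplicity.

def smooth_rows(field):
    out = [row[:] for row in field]
    for y in range(len(field)):
        for x in range(1, len(field[0]) - 1):
            out[y][x] = (field[y][x - 1] + 2 * field[y][x] + field[y][x + 1]) / 4
    return out

def regularize(field):
    rows_done = smooth_rows(field)
    transposed = [list(r) for r in zip(*rows_done)]
    return [list(r) for r in zip(*smooth_rows(transposed))]
```

A uniform field passes through unchanged, while an isolated outlier vector is pulled toward its neighborhood, which is exactly the spatial-coherence effect exploited above.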

All the previous simplified approaches have been evaluated using Matlab code, with double-precision floating point data representation, to illustrate the effect of the successive approaches. To obtain a realistic evaluation, we estimate the angular error in the Eye-RIS™ system. In order to carry out these measurements, different integration blocks and post-smoothing filters (regularization) are taken into account. In this evaluation, we assume as valid all the approaches and simplifications evaluated before. Note that here a new filter is used in the regularization step. This filter, a linear diffusion filter [

In

We conclude that, after performing the angular error measurements of the optical flow, taking into account different integration blocks as well as linear diffusion filters (regularization), the best result is obtained with the 9 × 9 SIB and a linear diffusion filter with

In this section, we present our experimental results with real sequences. To evaluate the optical flow results, we have decided to use a traffic sequence where the cars move through the scene. In this sequence, the optical flow has a clear interpretation and therefore, a qualitative evaluation can be done. The original sequence, Ettlinger-Tor, can be obtained from [

In the results shown below, we can observe that the optical flow increases as we increase the sparse integration block (5 × 5 SIB and 9 × 9 SIB). To estimate the optical flow in

In Section 4.1, we proposed a method to obtain the image edge response in the focal plane. With this approach, we are able to obtain a mean gain of 2.3. Once the edge estimation is computed in the focal plane, morphological dilation and erosion operations are applied to the binary map (two dilations and one erosion with a 3 × 3 kernel) to bring the optical flow results closer to those obtained without the sparse estimation. In
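This morphological cleanup can be sketched as follows (an illustrative software sketch; on the Eye-RIS these operations run in the focal plane, and the clipped-window border handling is an assumption):

```python
# Morphological processing of the binary edge map: two dilations followed
# by one erosion with a 3x3 structuring element, which thickens and closes
# the regions where flow will be estimated.

def morph(binary, op):
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [binary[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = op(window)   # max -> dilation, min -> erosion
    return out

def dilate(binary):
    return morph(binary, max)

def erode(binary):
    return morph(binary, min)

def close_edges(binary):
    return erode(dilate(dilate(binary)))
```

Two dilations grow each edge pixel into a wider support region and the final erosion trims the excess, so isolated edge responses end up covering the neighborhood over which flow is actually computed.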

After the estimation of the points of interest, the motion vectors are calculated on the processor and post smoothed on the Q-Eye. The results of this procedure are shown in

We can conclude that, when using points of interest, the risk of missing edges in areas of low contrast must be taken into consideration, as shown in

In this paper, we propose an approach to compute optical flow through the Lucas and Kanade method on a hybrid architecture, with analog and digital processing, based on a computing focal plane and a digital processor. Early image processing is carried out in the analog device, where image acquisition and processing are executed in the same physical layer (taking advantage of pixel-wise processing parallelism). Once this early image processing is done, the processor is used to estimate the motion vector components with the different proposed simplifications and optimizations. This co-design strategy allows us to improve the input image SNR and, at the same time, to focus our attention on the relevant image features. This strategy enhances the system accuracy and performance in terms of computing speed.

In this contribution, we show the different model modifications towards an embedded architecture where computing resources are significantly constrained. The originality and challenge of this work lie in the way the approach was implemented in this architecture, which has low computational power for digital processing: to obtain reasonable results, it is necessary to take full advantage of the powerful capabilities of the analog processor. The presented optical flow implementation, on a platform that integrates analog focal-plane processing capabilities and digital processing resources, takes full advantage of both computational paradigms. Furthermore, different simplifications and optimizations (such as the post-processing filters) are adapted to better match the computing architecture. The development of vision models on this kind of platform requires an efficient management of the available processing resources.

Focal-plane computation allows pixel-wise processing parallelism. Taking full advantage of these parallelism capabilities is not straightforward and also requires evaluating the signal degradation due to analog processing and storage at the focal-plane resources. We have carried out a performance evaluation in terms of processing speed and accuracy, as well as an evaluation of the different simplifications and optimizations, estimating their impact on the final performance rate and accuracy. The experimental results provide an empirical validation of the proposed scheme. We can conclude that the obtained implementation (and its performance results) validates the proposed approach as a high-complexity model implemented on a low-cost sensor. This article may be useful for those who face restrictions similar to the ones exposed here (addressing approaches on hybrid analog-digital platforms) or for those who need to speed up their models with an affordable loss in accuracy using focal-plane analog computing. Future research will focus on the automotive sector, in particular on car-overtaking detection, where optical flow is the main cue.

The authors would like to thank Jarno Ralli for his help in this project. This work was supported by the company Anafocus (Innovaciones Microelectrónicas S.L.) and the Spanish Grants DINAM-VISION (DPI2007-61683) and Andalusia regional projects (P06-TIC-02007 and TIC-3873).

5 × 5 Sparse Integration Block (SIB) representation.

Neighborhood propagation illustration.

Comparison of the gains obtained with the different optimization strategies. Two different implementations are evaluated here, 5 × 5 (red bars) and 9 × 9 (blue bars). (Left to right) The first group of columns represents the Sparse Integration Block (SIB) gain factor; the second group shows the gain obtained after applying the optical flow 4:1 propagation. The third column group shows the gain when the method is optimized in assembler, while the last column shows the total gain factor obtained after all the approaches are combined.

Roberts Cross convolution filter.

(a) Image edge detection in the original image. (b) Image edge detection with a sharpen pre-filtering.

(a) System performance using pre-selected points of interest and 5 × 5 SIB. (b) System performance using pre-selected points of interest and 9 × 9 SIB. Colored marks illustrate three typical scenarios with different image edge densities. The green mark refers to the best performance cases (scenes with low edge density), the blue one to the most common density values (normal scenes), and the red mark to the worst cases (scenes with high edge density). The error bar represents the standard deviation of the results for 10 trials.

Initial Co-Design scheme.

Different stages to estimate optical flow in Eye-RIS™ system.

Optical flow representation. The color corresponds with the direction of the optical flow vector while the magnitude is encoded as the color intensity.

Optical flow estimation in a traffic sequence. The flow field is overlaid on the original frame. In the first row (a–c), the optical flow is estimated using a 9 × 9 SIB. In the second row (d–f), the optical flow is estimated using a 5 × 5 SIB.

Binary edge map processed, with the proposed method in Section 4.1, in the focal plane.

Optical flow estimation, on edges, in a traffic sequence. The average edge density in these images is 42.5%. The flow field is overlaid on the original frame. In the first row (a–c), the optical flow is estimated using a 9 × 9 SIB. In the second row (d–f), the optical flow is estimated using a 5 × 5 SIB.

System performance evaluation obtained with a 176 × 144 spatial resolution.

L&K C 9 × 9 Integration Block | 0.3 |
L&K C 9 × 9 SIB | 0.9 |
L&K C 9 × 9 SIB and propagation | 3.6 |
L&K C 5 × 5 Integration Block | 1.3 |
L&K C 5 × 5 SIB | 3.3 |
L&K C 5 × 5 SIB and propagation | 8.9 |

Average angular error (AAE) and standard deviation (STD), on the Yosemite sequence (without clouds), with the different proposed approaches and densities (100% and 48.5%) on Matlab.

5 × 5 Barron’s implementation | 11.01° | 17.14° | 10.32° | 17.40° |
5 × 5 | 20.68° | 21.75° | 20.38° | 20.57° |
9 × 9 | 14.75° | 14.85° | 14.63° | 14.16° |
5 × 5 SIB | 19.64° | 20.72° | 19.34° | 19.54° |
9 × 9 SIB | 14.09° | 13.99° | 13.98° | 13.33° |
5 × 5 Propagation | 20.74° | 21.97° | 20.43° | 20.84° |
9 × 9 Propagation | 14.73° | 15.04° | 14.56° | 14.32° |
5 × 5 SIB + Propagation | 19.71° | 20.98° | 19.39° | 19.86° |
9 × 9 SIB + Propagation | 14.08° | 14.21° | 13.92° | 13.51° |
5 × 5 SIB Propagation + Regularization | 12.30° | 12.18° | 12.10° | 11.29° |
9 × 9 SIB Propagation + Regularization | 10.51° | 8.46° | 10.51° | 8.12° |
5 × 5 SIB Propagation + Regularization | 11.33° | 10.26° | 11.17° | 9.53° |
9 × 9 SIB Propagation + Regularization | 9.89° | 7.11° | 9.86° | 6.75° |

Average angular error (AAE) and standard deviation (STD), on the Yosemite sequence (without clouds), with the different proposed approaches and densities on the Eye-RIS system.

5 × 5 SIB | 24.79° | 20.12° | 23.45° | 18.55° |
9 × 9 SIB | 17.10° | 14.02° | 15.88° | 12.59° |
5 × 5 SIB Propagation | 24.84° | 20.09° | 23.25° | 18.41° |
9 × 9 SIB Propagation | 17.14° | 13.88° | 15.94° | 12.55° |
5 × 5 SIB Propagation + Regularization | 15.61° | 12.01° | 13.12° | 10.53° |
9 × 9 SIB Propagation + Regularization | 13.06° | 10.11° | 11.25° | 8.86° |
5 × 5 SIB Propagation + Regularization | 14.61° | 10.63° | 12.24° | 9.46° |
9 × 9 SIB Propagation + Regularization | 13.09° | 9.34° | 10.44° | 7.87° |