This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from long execution times and heavy resource requirements. Field Programmable Gate Arrays (FPGAs) provide a competitive alternative for hardware acceleration with high computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and an associated FPGA architecture for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to improve computational accuracy by refining the minimum computational step. Moreover, the FPGA-based multi-level pipelined PHT architecture, optimized by spatial parallelism, ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on an ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it calculates straight line parameters in 15.59 ms on average for one frame. Qualitative and quantitative evaluation results validate the system performance regarding data throughput, memory bandwidth, resource usage, speed and robustness.

Straight line detection is an important step in several embedded vision applications, and most current research is based on the Hough Transform (HT) [

Current trends in complex computing architectures integrate Field Programmable Gate Arrays (FPGAs) as a competitive alternative, because their parallelism accelerates computation for embedded vision applications. FPGA implementations have been studied in many areas, such as lane detection [

In the proposed real-time straight line detection system, detection of edge pixels is a basic task that has a significant influence on the performance of the follow-up calculation. Currently, edge detection algorithms mainly follow two approaches: (1) Classical gradient differential operators, such as the Sobel operator and the Canny operator [

The classical HT converts the edge feature image into a new domain called the Hough parameter space as a popular method for straight line detection. We can obtain the computational result in Hough space by mapping the parameter points to image space. Each point in Hough space corresponds to a line in the initial image. Reasonable use of peak information in the Hough space is the common denominator of all HT-related detection methods [

Duda

Furthermore, acceleration solutions based on different hardware architectures have been proposed to increase processing speed for real-time performance while avoiding high development costs. Examples include graphics processing unit (GPU) [

The attempts in the literature mentioned above detect straight lines in low-definition and still images for some vision applications. Many researchers rely on increasing the signal processing frequency for a pre-stored edge image to raise computational speed. From the performance and resource consumption perspective, few refine these approaches by jointly optimizing the HT software algorithm and the hardware architecture for real-time high-definition video sequences. So in this paper, we extend the work in [

We primarily present a novel PHT-based straight line detection algorithm and the corresponding architecture implementation on FPGA to deal with the acceleration and accuracy problems of classical HT. As

The block diagram of the proposed straight line detection architecture is depicted in

The resource-optimized edge detection algorithm is an extension of our previous work [

In the classical Canny algorithm, it is difficult to detect accurate edge features in complex backgrounds, because Gaussian filtering with manually chosen parameters can smooth too much (losing edge information) or too little (failing to remove noise). Moreover, insufficient non-maximum suppression conditions result in false edges. In the proposed method, a parallel fast median filter is selected to suppress grain noise; it requires no parameter setting, which makes the FPGA implementation flexible [

In this new edge detection method, the digital image is first smoothed by a median filter to reduce noise. In this paper, a 3 × 3 template is used to achieve parallel fast median filtering, and the diagram is shown in

After the nine template pixel values are obtained, we design a new parallel sorting and comparing architecture based on the parallelism of the FPGA, as shown in

In the first level, we divide the nine values into three groups, and three three-value comparators operate in parallel, with a two-clock delay, to determine the maximal value (Max), middle value (Mid), and minimal value (Min) of each group. In the second level, the three maximal values, the three middle values and the three minimal values are assigned to respective groups. In the group of three maximal values, only the minimal value can be the median of the nine pixels, since each of the other two is greater than or equal to at least six of the nine values. So in this group, the Min is selected for the third-level comparison. In the group of middle values, only the middle value can be the median. Similarly, in the group of minimal values, only the maximal value can be the median. Finally, in the third level, the median of these three candidates is decided by a three-value comparator. Therefore, the fast median filtering completes after a six-clock delay.
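The three-level comparison can be sketched in software as follows. This is a minimal Python sketch of the data flow only, not the FPGA implementation: the function names `sort3` and `median9` are illustrative assumptions, and clock delays are not modeled.

```python
def sort3(a, b, c):
    """Three-value comparator: return (min, mid, max) of the inputs."""
    lo, hi = (a, b) if a <= b else (b, a)
    if c <= lo:
        return c, lo, hi
    if c >= hi:
        return lo, hi, c
    return lo, c, hi


def median9(window):
    """Median of nine pixel values (a 3 x 3 template given as a flat list)."""
    # Level 1: sort each group of three values independently.
    groups = [sort3(*window[i:i + 3]) for i in (0, 3, 6)]
    mins, mids, maxs = zip(*groups)
    # Level 2: keep one candidate from each group of like-ranked values.
    from_max = sort3(*maxs)[0]   # minimum of the three maxima
    from_mid = sort3(*mids)[1]   # middle of the three middle values
    from_min = sort3(*mins)[2]   # maximum of the three minima
    # Level 3: the median of the three candidates is the median of all nine.
    return sort3(from_max, from_mid, from_min)[1]
```

For example, `median9([9, 1, 8, 2, 7, 3, 6, 4, 5])` returns 5, the same value as sorting all nine inputs and taking the fifth.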

_{x}_{y}

To reduce the computational complexity in parameter coordinates, we also need to record whether the gradient is positive or negative, as well as its absolute value. As

As expressed in

In this proposed algorithm, we divide the gradient directions into eight regions as

For example, if gradient direction of one pixel is at the range of 0° to 45° or 180° to 225°, the gradient value of this pixel along these two directions can be measured by
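One way to realize an eight-region test without an arctangent is to use only the signs and absolute values of the two gradient components. The sketch below is an assumption about how such a classifier could look, not the circuit of this paper: the function names `gradient_region` and `paired_direction`, the region numbering (region k covers angles from 45k° up to, but not including, 45(k + 1)°) and the y-axis-up orientation are all hypothetical. Regions k and k + 4 lie along the same direction, which matches the pairing of 0° to 45° with 180° to 225° in the example above.

```python
def gradient_region(gx, gy):
    """Return k in 0..7 so that the gradient angle lies in [45k, 45(k+1)) degrees.

    Only sign tests and one magnitude comparison are used (no arctangent).
    Returns None for a zero gradient, which has no direction.
    """
    ax, ay = abs(gx), abs(gy)
    if gx > 0 and gy >= 0:        # first quadrant, 0..90 degrees
        return 0 if ay < ax else 1
    if gx <= 0 and gy > 0:        # second quadrant, 90..180 degrees
        return 2 if ax < ay else 3
    if gx < 0 and gy <= 0:        # third quadrant, 180..270 degrees
        return 4 if ay < ax else 5
    if gx >= 0 and gy < 0:        # fourth quadrant, 270..360 degrees
        return 6 if ax < ay else 7
    return None                   # gx == 0 and gy == 0


def paired_direction(region):
    """Fold opposite regions together: k and k + 4 share one comparison direction."""
    return region % 4
```

Because the test needs only comparators, it maps naturally onto the sign and absolute-value signals described earlier; the exact comparison used in the hardware may differ.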

Finally, our dual-threshold detection implementation is based on the verification mechanism proposed by Mondal

HT is a well-known and effective method for straight line detection in digital images. This transform converts the binary edge feature image from Cartesian coordinates (

Through accumulating the intersection points, we can find the specific parameter (
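For reference, the classical voting procedure can be sketched as below. This is a plain software illustration of the standard normal-form HT, rho = x cos(theta) + y sin(theta), and not the accelerated PHT proposed in this paper. The function name `hough_peak`, the 1° angle step and the integer rounding of rho are assumptions made for the example.

```python
import math
from collections import Counter


def hough_peak(points, n_theta=180):
    """Vote in (rho, theta) space and return (rho, theta_deg, votes) of the peak.

    points  : iterable of (x, y) edge-pixel coordinates
    n_theta : number of angles tested, spaced 1 degree apart from 0 degrees
    """
    points = list(points)
    best = (None, None, 0)
    for t in range(n_theta):
        theta = math.radians(t)
        c, s = math.cos(theta), math.sin(theta)
        # Each edge pixel votes for the rho bin of the line through it at this angle.
        votes = Counter(round(x * c + y * s) for x, y in points)
        for rho, count in votes.most_common(1):
            if count > best[2]:      # strict '>' keeps the first peak on ties
                best = (rho, t, count)
    return best
```

For 50 pixels on the vertical line x = 5, `hough_peak([(5, y) for y in range(50)])` returns `(5, 0, 50)`. Every pixel is visited once per angle, which is the cost that the parallel architecture is meant to hide.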

Most HT implementations on FPGA platforms depend on increasing the processing frequency for low-definition and still images pre-stored in off-chip memory. The sin

In this paper, referring to [_{a}_{b}

Now, we further define

After the definition as

On the assumption that the minimal step of

When

Next, when

Obviously, if we traverse all the possible discrete values of ^{6}, then Δ

Through

In this multi-level pipelined PHT architecture, the straight line parameters can be calculated through 101 pipeline units (

For each pipeline unit, the hardware architecture in FPGA needs two registers, two 6-bit shifts, one adder, and one subtractor. The initial inputs _{a0}_{b0}_{ai}_{bi}
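The component list above (two registers, two 6-bit shifts, one adder, one subtractor) is consistent with a small-angle rotation step in which multiplication by the angle increment is replaced by a right shift. The sketch below models one such chain in Python fixed-point arithmetic. It is a reconstruction under stated assumptions, not the exact equations of this paper: the step size of 2^-6 rad, the approximations cos(step) ≈ 1 and sin(step) ≈ step, the initial values a_0 = x and b_0 = y, the 15 fractional bits, and the names `pht_pipeline` and `exact_rho` are all assumptions made for illustration.

```python
import math

SHIFT = 6          # assumed: angle step of 2**-6 rad, applied as a 6-bit right shift
FRAC_BITS = 15     # assumed fixed-point fractional width
N_UNITS = 101      # number of pipeline units, as stated in the text


def pht_pipeline(x, y, n_units=N_UNITS):
    """Return lists (rho_a, rho_b) with one value per pipeline unit.

    x and y are integer pixel coordinates. Unit i approximates, at the
    angle theta_i = i * 2**-SHIFT:
        rho_a =  x*cos(theta_i) + y*sin(theta_i)
        rho_b = -x*sin(theta_i) + y*cos(theta_i)
    """
    a = x << FRAC_BITS          # register a, assumed initial value a_0 = x
    b = y << FRAC_BITS          # register b, assumed initial value b_0 = y
    scale = 1 << FRAC_BITS
    rho_a, rho_b = [], []
    for _ in range(n_units):
        rho_a.append(a / scale)
        rho_b.append(b / scale)
        # One unit: two shifts, one adder, one subtractor.
        a, b = a + (b >> SHIFT), b - (a >> SHIFT)
    return rho_a, rho_b


def exact_rho(x, y, theta):
    """Floating-point reference for rho_a, used only for comparison."""
    return x * math.cos(theta) + y * math.sin(theta)
```

With these assumptions, 101 steps of 2^-6 rad cover about 1.56 rad, just under π/2, so the two outputs together would span angles from 0 to π. Because cos(step) is taken as 1, the values in this sketch drift from the exact ones by roughly 1.2% after 100 steps; the sketch does not attempt to compensate for that drift.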

After parameter estimation for all of the candidate edge pixels in one image with multi-level pipelined PHT architecture, RAM_a(i) and RAM_b(i) will contain all the candidate accumulating results. Current work is to find the peak accumulating information in these two inner RAMs for all of intersection points, and this peak corresponds to the candidate longest straight line. To RAM_a(i) or RAM_b(i), through the multi-value comparators, it is easy to find the maximum for FPGA implementation respectively. The maximum search results in RAM_a(i) are the three parameters: the accumulating maximum value _{ai}_{ai}_{(max)}, and the step number of the accumulating maximum value _{ai}_{(max)}. There is a similar operation to RAM_b(i) to get the corresponding three parameters. In the follow-up straight line parameter computation module, a two-value comparator is used to determine the final maximum between _{ai}_{(max)} and _{bi}_{(max)} in both RAM_a(i) and RAM_b(i). Referring to the initial HT definition, we can obtain the specific parameter (_{(max)} in RAM_a(i) or RAM_b(i), and _{(max)} multiplied by) Δ
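The maximum search described above can be sketched as follows. This is a software illustration under assumptions, not the comparator tree of the architecture: each accumulator memory is modeled as a list of rows (one row per pipeline step, one entry per rho bin), the angle step is taken as 2^-6 rad, and the second memory is assumed to cover angles offset by π/2, which follows from the identity -x sin(theta) + y cos(theta) = x cos(theta + π/2) + y sin(theta + π/2). The names `find_peak` and `line_parameters` are hypothetical, and the mapping from a rho bin index to a distance in pixels is left out because it depends on the memory addressing.

```python
import math

DELTA_THETA = 2 ** -6   # assumed angle step in radians


def find_peak(acc):
    """Return (votes, rho_bin, step) of the largest entry in acc[step][rho_bin]."""
    best = (-1, None, None)
    for step, row in enumerate(acc):
        for rho_bin, votes in enumerate(row):
            if votes > best[0]:
                best = (votes, rho_bin, step)
    return best


def line_parameters(acc_a, acc_b):
    """Compare the two peaks and return (rho_bin, theta) of the overall winner."""
    votes_a, rho_a, step_a = find_peak(acc_a)
    votes_b, rho_b, step_b = find_peak(acc_b)
    if votes_a >= votes_b:
        return rho_a, step_a * DELTA_THETA
    # Assumed: the second memory holds the angles shifted by pi/2.
    return rho_b, step_b * DELTA_THETA + math.pi / 2
```

For example, with `acc_a = [[0, 1, 0], [0, 5, 2]]` and `acc_b = [[3, 0, 0], [0, 0, 4]]`, the peak of 5 votes in the first memory wins and `line_parameters(acc_a, acc_b)` returns `(1, 0.015625)`.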

The integral multi-level pipelined PHT based straight line detection algorithm is shown in

In this section, we implement the embedded vision system on an FPGA evaluation platform. We evaluate the performance of the FPGA-based straight line detection in terms of throughput, maximum error, memory access bandwidth, and computational time. In addition, we present qualitative and quantitative experimental results for the accuracy and robustness of the proposed algorithm.

The proposed architecture has been evaluated on the ALTERA DE2-115 platform with a Cyclone IV EP4CE115F29 FPGA and the Quartus II version 10.0 synthesis tool, at a maximum operating frequency of 200 MHz, as

In our hardware architecture and implementation, the fraction part ^{−(F +1)}, where

If the fractional part is 15, the maximum error from Chen

As a PE-based resource-efficient FPGA architecture, the approach in [

In our proposed multi-level pipelined PHT architecture based on spatial angle-level optimization, after clock delay, it can calculate 101 angles per cycle for one edge feature pixel and get the straight line parameters through peak value search for current pixel. So, in our proposed architecture, the memory address width is 16, and the required memory access bandwidth usage is much larger than [

In Chen

From the work of Fowers

The qualitative experimental results above show that the proposed PHT algorithm correctly detects a single straight line in a complex background. In this subsection, we present quantitative experimental results to show the accuracy and robustness of the algorithm and hardware architecture. In

In this experiment, we defined the average testing deviation rate of every angle situation (_{i}_{i}_{i}_{i}

In this paper, we have presented a novel PHT algorithm and its FPGA implementation architecture for real-time straight line detection in high-definition video sequences. To obtain fewer but more accurate candidate edge pixels, we enhance the non-maximum suppression conditions in a resource-optimized Canny edge detection algorithm. For real-time straight line detection in high-definition video sequences, a novel spatial angle-level PHT algorithm and the corresponding multi-level pipelined PHT hardware architecture are proposed. This gives an advantage over existing methods, which rely on increasing the processing frequency.

The proposed algorithm and architecture have been evaluated on the ALTERA DE2-115 evaluation platform with a Cyclone IV EP4CE115F29 FPGA. Quantitative results, including throughput, maximum error, memory access bandwidth, and computational time, on 1,024 × 768 resolution videos are presented and compared with four representative algorithms on different hardware platforms. Because the PHT algorithm and its hardware architecture are optimized jointly, the system is not limited to fast and accurate estimation of straight line parameters in high-definition video sequences. This robust and effective embedded vision system has potential applications in various pattern recognition tasks based on high-definition images. Future work consists of exploring spatial and temporal parallelism across the sequence of frames to further reduce the computational load.

This work was supported by NSFC (61221001), 973 Program (2010CB731401), the 111 project (B07022) and the Shanghai Key Laboratory of Digital Media Processing and Transmissions.

The authors declare no conflict of interest.

PHT straight line detection algorithm flow.

The block diagram of embedded straight line detection system.

Flow chart of the resource-optimized Canny edge detection algorithm.

Median filtering diagram.

Architecture of a 3 × 3 pixel template cache.

Calculation architecture of parallel fast median filtering.

Parallel gradient computation architecture.

Eight gradient directions division diagram.

HT representation (

Multi-level pipelined PHT architecture.

Block diagram of the maximum search scheme for parallel pipeline units.

Integral PHT algorithm diagram.

FPGA based embedded straight line detection vision system.

The experimental comparisons of LSM and PHT method: (a-1)–(a-4) are the original images; (b-1)–(b-4) are the results of LSM of Ji

Accuracy and robustness testing samples.

The deviations of tested angles. (

Performance comparison among different approaches.

Zhou | 0.177 | 2 | 384 MHz | 768 | 256 × 256 |

Mayasandra | 0.125 | 1/9 | 500 MHz | 56 | 256 × 256 |

Chern | 0.125 | 1 | 387 MHz | 387 | 512 × 480 |

Chen | 0.012 | 16 | 333 MHz | 5,328 | 512 × 512 |

Proposed method | 0.027 | 101 | 200 MHz | 20,200 | 1,024 × 768 |

Memory and bandwidth comparison results.

Chen | 512 × 512 | 3,270,032 | 223,360 | 1,172,880 |

Proposed | 1,024 × 768 | 0 | 3,052,544 | 2,674,480 |

Calculation time comparison.

LSM of Ji | | 15.57 ms | 1,024 × 768 |

Chen | | 2.07–3.61 ms | 512 × 512 |

Proposed Method on FPGA | | 15.59 ms | 1,024 × 768 |

Direct HT Computation on PC | (a-1) | 0.93 s | 1,024 × 768 |

Direct HT Computation on PC | (a-2) | 1.26 s | 1,024 × 768 |

Direct HT Computation on PC | (a-3) | 1.62 s | 1,024 × 768 |

Direct HT Computation on PC | (a-4) | 1.45 s | 1,024 × 768 |

FPGA implementation resource consumption.

Combinational LE with no register | 15,704 | 114,480 | 13.72% |

Sequential LE | 1,839 | 114,480 | 1.61% |

Combinational LE with a register | 11,888 | 114,480 | 10.38% |

Dedicated logic registers | 13,727 | 117,053 | 11.73% |

LABs | 2,589 | 7,155 | 36.18% |

M9Ks | 377 | 432 | 87.27% |

Block memory bits | 3,052,544 | 3,981,312 | 76.67% |

Embedded Multiplier 9-bit elements | 8 | 532 | 1.50% |

PLLs | 1 | 4 | 25.00% |

FPGA implementation power consumption.

Total thermal power dissipation | 640.89 mW |

Core dynamic thermal power dissipation | 414.83 mW |

Core static thermal power dissipation | 105.40 mW |

I/O thermal power dissipation | 120.66 mW |

Quantitative experimental results for straight line detection angle deviation.

_{i} |
0.753% | 0.854% | 1.016% | 1.492% | 1.508% | 1.791% |

_{i} |
0.764% | 3.056% | 1.528% | 2.948% | 1.530% | 2.984% |

_{i} |
0.728% | 0.072% | 0.461% | 0.036% | 1.456% | 0.995% |

1.236% |