This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license.

Plenoptic cameras are a new type of sensor that extends the possibilities of current commercial cameras, allowing 3D refocusing or the capture of 3D depths. One of the limitations of plenoptic cameras is their limited spatial resolution. In this paper we describe a fast, specialized hardware implementation of a super-resolution algorithm for plenoptic cameras. The algorithm has been designed for field-programmable gate array (FPGA) devices using VHDL (very high speed integrated circuit (VHSIC) hardware description language). With this technology, we obtain an acceleration of several orders of magnitude thanks to the extremely high-performance signal processing capabilities of FPGAs, exploited through parallelism and a pipeline architecture. The system has been developed using generics of the VHDL language, which makes it very versatile and parameterizable. The user can easily modify parameters such as the data width, the number of microlenses of the plenoptic camera, their size and shape, and the super-resolution factor. The speed of the algorithm on the FPGA has been successfully compared with its execution on a conventional computer for several image sizes and different 3D refocusing planes.

Human beings live in a world of images, a stream that surrounds us every day, so it is difficult to imagine life without them. Over the last centuries there is evidence of the continuous evolution of image-capture devices in their design, size, and optical components. Simultaneously, new and sophisticated techniques have been developed for image processing, leading to a better experience for the user and to more realistic results. Precisely at the junction between photography and computers appears Computational Photography [

Unlike their conventional counterparts, plenoptic cameras capture light beams instead of dots or pixels of information. This means that when we capture a photograph, we obtain different perspectives or views of the world, allowing the user to refocus the image after the shot, to obtain 3D information, or to select between different views of the image.

An important drawback of plenoptic cameras is the generation of low-resolution images due to their spatio-angular lightfield sampling [

This work is structured in five sections including this Introduction. First, we describe the theoretical background of the super-resolution algorithm. Then, Section 3 describes the design of the architecture using FPGA. Section 4 explains the experimental results and finally conclusions and future work are presented in Section 5.

Conventional 2D photographs in a plenoptic camera are obtained theoretically using the photography transform. This transform takes a 4D lightfield as its input and generates a photograph focused on a determined plane. The lightfield L_F represents the radiance along the ray that travels from position u = (u1, u2)′ (the apostrophe denotes transpose) on the lens plane to position x = (x1, x2)′ on the sensor plane.

The lightfield L_F is captured by the plenoptic camera with the sensor plane at a distance F from the lens plane. The photography operator generates the photograph E_αF focused at a distance αF from the lens:

E_αF(x′) = (1/(α²F²)) ∫∫ L_F(u(1 − 1/α) + x′/α, u) du1 du2

This equation explains how to compute a photograph E_αF refocused at a distance αF from the lens, starting from the captured lightfield L_F.

Without loss of generality, and to simplify the exposition, we will absorb the constant term 1/(α²F²) into the lightfield L_F. Also, in order to make the discretization easier, we reparametrize the focusing distance using q = 1 − 1/α, so that the photograph becomes:

E_q(x′) = ∫∫ L_F(qu + (1 − q)x′, u) du1 du2

In practice, the continuous formulation of the photography operator cannot be used since a plenoptic camera only captures discrete samples of the lightfield (

A simple discretization of E_q is obtained by sampling the coordinates x and u with steps Δx and Δu, so that x = k_x Δx and u = k_u Δu for integer index vectors k_x = (k_x1, k_x2) and k_u = (k_u1, k_u2), with k_x taking n_x values and k_u taking n_u values along each axis. The integral is then replaced by a sum over the discrete lightfield samples, and each continuous output position is assigned to a pixel of the super-resolved image using a nearest-neighbor (NN) interpolation kernel, yielding the discrete photograph E_q^NN.

When the super-resolution factor d is greater than one, the output grid is d times finer than that of a conventionally refocused photograph, so several lightfield samples can contribute to the same output pixel and each pixel must be normalized by its number of contributions.

The FPGA implementation of the algorithm is based on the Matlab-code of Algorithm 1:

% d is the super-resolution factor
% qp is the 3-D focusing plane
% np+1 is the center of the super-resolved output image
% num is the numerator of

The Matlab-code loops over each (k_x, k_u) sample of the lightfield.

In the innermost part of the algorithm, the contribution of each incoming datum to the output image is computed by adding its value to the data already accumulated at the corresponding position of the output array.
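The accumulate-and-normalize scheme described above can be sketched in Python (our own illustrative translation: the function name, the centered nearest-neighbor position formula, and the automatic output sizing are assumptions, not the original Matlab-code):

```python
import numpy as np

def refocus_sr(L, d, q):
    """Accumulate-then-normalize super-resolved refocusing (illustrative).

    L is a 4-D lightfield indexed as L[kx1, kx2, ku1, ku2]; d is the
    super-resolution factor and q = 1 - 1/alpha selects the focal plane.
    Returns the nearest-neighbor super-resolved photograph.
    """
    nx1, nx2, nu1, nu2 = L.shape
    # Centered coordinates for the x (sensor) and u (lens) planes
    cx1 = np.arange(nx1) - (nx1 - 1) / 2
    cx2 = np.arange(nx2) - (nx2 - 1) / 2
    cu1 = np.arange(nu1) - (nu1 - 1) / 2
    cu2 = np.arange(nu2) - (nu2 - 1) / 2
    # Worst-case half-extent of the rounded output positions
    half = int(np.ceil(d * (abs(1 - q) * max(abs(cx1).max(), abs(cx2).max())
                            + abs(q) * max(abs(cu1).max(), abs(cu2).max())))) + 1
    npix = 2 * half + 1
    num = np.zeros((npix, npix))  # accumulated pixel values
    den = np.zeros((npix, npix))  # number of contributions per pixel
    for i1 in range(nx1):
        for i2 in range(nx2):
            for j1 in range(nu1):
                for j2 in range(nu2):
                    # Nearest-neighbor output position of this sample
                    p1 = int(np.round(d * ((1 - q) * cx1[i1] + q * cu1[j1]))) + half
                    p2 = int(np.round(d * ((1 - q) * cx2[i2] + q * cu2[j2]))) + half
                    num[p1, p2] += L[i1, i2, j1, j2]
                    den[p1, p2] += 1
    # Stage 2: normalize each pixel by its number of contributions
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

With a constant lightfield, every output pixel that receives contributions normalizes back to that constant, which is a quick sanity check of the num/den scheme.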

The algorithm can be accelerated by exploiting the parallel processing power of FPGAs instead of other classical technology platforms [

Arithmetic computations are performed in a pipeline and with as much parallelism as possible.

The algorithm is implemented fully in parallel for each color component.

We have used a hardware divider that performs the division operation for the normalization task in a single clock cycle.

Furthermore, the algorithm implementation using FPGA offers the following additional advantages:

Independence of the technology. The algorithm is fully implemented in VHDL. Thus, it can be implemented on any FPGA from any vendor, provided that it meets the hardware requirements.

The implemented system is modular and scalable because we have used VHDL generics and packages, which make it easily reconfigurable. We can easily change the image size, the width of the input data and of the intermediate calculations, the slope, and the number of microlenses of the sensor.

The image is introduced into the FPGA by rows, as a conventional image. This involves rearranging the nested loops in Algorithm 1, so the order of the indices has to be changed accordingly.
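For an ideal sensor where every microlens image occupies a square block of nx × nx pixels (an idealization; the actual design uses the microlens mask to handle varying size and shape), the row-wise pixel stream maps to lightfield indices as follows (illustrative Python, not the VHDL):

```python
def stream_to_indices(row, col, nx):
    """Map a row-major sensor pixel (row, col) to its microlens index (ku)
    and its intra-microlens pixel index (kx), assuming an ideal grid of
    square microlens images of nx x nx pixels."""
    ku = (row // nx, col // nx)   # which microlens the pixel belongs to
    kx = (row % nx, col % nx)     # position of the pixel under that microlens
    return ku, kx
```

For the 341 × 341 pixel image with 31 × 31 microlenses used in the experiments, nx = 11 and the last pixel (340, 340) falls under microlens (30, 30).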

The functional architecture of the design has five sub-modules and a package in which the mask of microlenses used by the plenoptic sensor is implemented. This mask takes into account that microlenses vary in size and shape. The name of this component is


There are three such modules, one for each color component. The Contribution Accumulator unit computes the den array of Algorithm 2; it has a structure similar to the above-mentioned module. Finally, the Divider module implements the normalization of each pixel of the output image by the number of contributions received; again, there is one divider for each color component.

The implementation of this module is depicted in

The module contains four nested counters, according to the Matlab-code of Table 3. Pixels of the lightfield are read by rows (

In addition, the Position Estimator module acts as the global controller of the FPGA design. In this sense, it is responsible for switching the operation of the accumulation modules according to the value of the

This block is responsible for generating the memory addresses from the pre-computed positions, both for reading and writing during the accumulation stage and during the memory-emptying stage (Stage 2). In Stage 2 the addresses are generated by a counter that goes from 0 to the size of the re-focused output image. The block diagram of the module is shown in
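Assuming a row-major layout of the output memory (our assumption for illustration), the address generation of both stages reduces to a multiply-accumulate and a plain counter:

```python
def addr_from_position(y, x, width):
    """Row-major address of output pixel (y, x); in hardware this is a
    single multiply-accumulate per clock cycle."""
    return y * width + x

def stage2_addresses(out_size):
    """Stage 2 empties the memory with a simple counter that goes from 0
    to the size of the re-focused output image."""
    return range(out_size)
```

For the 69 × 69 super-resolved output of the experiments, pixel (2, 3) would map to address 2 × 69 + 3 = 141.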

This block is the key element of the architecture, since it performs the sum of the contributions that form the output image. The schematic of this module is shown in

The most important blocks of this module are the intermediate storage memory and the two accumulators in series that precede it. The input datum (Image_in6) is delayed five clock cycles to be synchronized with the calculation of the output positions (

An important hardware problem appears when two or more consecutive data have to be accumulated in the same output pixel. These data have associated the same values of

For this reason, an accumulator is implemented to store these data, which is called: “
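The role of such a pre-accumulator can be modeled behaviorally: consecutive samples that target the same address are merged before reaching the memory, so only one read-modify-write per run of equal addresses is needed. A Python sketch of this idea (our own model, not the VHDL implementation):

```python
def coalesce_stream(stream):
    """Combine consecutive (address, value) pairs that target the same
    address, so the memory only sees one read-modify-write per run of
    equal addresses (behavioral model of a pre-accumulator)."""
    out = []
    for addr, val in stream:
        if out and out[-1][0] == addr:
            out[-1] = (addr, out[-1][1] + val)  # merge with the pending entry
        else:
            out.append((addr, val))             # start a new pending entry
    return out
```

Note that only consecutive repetitions are merged; a later sample for a previously seen address still produces a normal memory access.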

After accumulating all the lightfield data, it is necessary to dump the data of the super-resolved image in a process called

If the

The accumulator module for the contributions to the output pixels has an architecture similar to the previous accumulator. The principal difference is that its memory is smaller, since the data to be stored require fewer bits.

This is a pipelined divider that performs the division in nine clock cycles. The module divides A by B, of XBits and YBits respectively, with the only restriction XBITS ≥ YBITS (A and B being natural numbers), obtaining after nine clock cycles the quotient (
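Pipelined hardware dividers of this kind are commonly built from shift-subtract (restoring) stages, one per quotient bit. A behavioral Python sketch of restoring division for unsigned operands (an illustration of the general technique, not the actual VHDL module):

```python
def restoring_divide(a, b, xbits):
    """Unsigned restoring division: computes a // b and a % b using one
    shift-subtract step per quotient bit, as a hardware divider array would.
    Requires b > 0 and 0 <= a < 2**xbits."""
    assert b > 0 and 0 <= a < (1 << xbits)
    remainder, quotient = 0, 0
    for i in range(xbits - 1, -1, -1):
        # Shift in the next dividend bit (one array stage in hardware)
        remainder = (remainder << 1) | ((a >> i) & 1)
        if remainder >= b:           # trial subtraction succeeds
            remainder -= b
            quotient |= (1 << i)     # set this quotient bit
    return quotient, remainder
```

In hardware, pipelining these stages yields one result per clock cycle after the initial latency, which matches the throughput reported later for the divider.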

It is arranged as three combinational divider modules, one for each color channel of the image, that operate when Stage 2 of the process begins. This stage consists of dividing each output of the Red-Green-Blue (RGB) memories by the common contribution memory (

This module is a ROM that stores a mask with the geometry of the microlenses in the lightfield. The content of this ROM is defined in a package inside the VHDL design. In this way, it is easy to modify this ROM to match the size and shape of the microlenses defined by the user.

The design for an input image of 341 × 341 pixels (31 × 31 microlenses) has been implemented using the hardware description language VHDL and the Xilinx Synthesis Technology (XST) synthesizer on a Digilent Atlys development board with a Spartan-6 XC6SLX45 FPGA. The complete system has been checked using the ModelSim simulation software (Mentor Graphics Inc., Wilsonville, OR, USA) and hardware co-simulation through Matlab-Simulink Xilinx System Generator (MathWorks, Natick, MA, USA). The maximum frequency is 149.591 MHz. However, the prototype currently operates at 100 MHz, the frequency provided by the Atlys board. The critical path is located inside the Position Estimation module (count_y block, 6.685 ns).

The architecture of the design allows selecting the 3D plane (slope) where we want to re-focus the image. Furthermore, it allows choosing the super-resolution factor which is applied to the original image.

In

The implemented architecture is a pipeline that permits continuous data streaming. Considering this, and defining the throughput as the number of frames produced per unit of time, the throughput is 859.99 frames per second (116,281 cycles at 100 MHz for an input image of 341 × 341 pixels and 31 × 31 microlenses) and 9.76 frames per second (10,246,401 cycles at 100 MHz for an input image of 3201 × 3201 pixels and 291 × 291 microlenses). The use of internal memory allows simultaneous accesses to the data of each color component. Furthermore, the implemented pipeline divider allows us to obtain one division per clock cycle after the latency. The method and the technology used in this implementation are useful for mobile applications where GPUs may not be available, for instance embedded in a plenoptic camera, allowing real-time super-resolution on the camera without having to transfer the image to a host machine. Thus, the transfer from the PC to the FPGA has not been considered in the computational time analysis. Taking these considerations and the definition of the super-resolution algorithm into account, the cycles for the operation of the module are:

Independently of the plenoptic image dimensions, there is a delay of 5 cycles to calculate the values of
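The throughput figures reported above follow directly from the cycle counts, assuming one lightfield pixel enters the pipeline per clock cycle (which the reported counts of width × height cycles per frame suggest):

```python
# Throughput check for the two image sizes reported in the paper,
# assuming one lightfield pixel enters the pipeline per clock cycle.
CLOCK_HZ = 100e6  # the prototype runs at 100 MHz

def fps(cycles, clock_hz=CLOCK_HZ):
    """Frames per second for a pipeline that needs `cycles` per frame."""
    return clock_hz / cycles

small = 341 * 341      # 116,281 cycles for the 341 x 341 image
large = 3201 * 3201    # 10,246,401 cycles for the 3201 x 3201 image

print(round(fps(small), 2))  # ≈ 859.99 frames per second
print(round(fps(large), 2))  # ≈ 9.76 frames per second
```

The constant per-frame latency (pipeline fill, five-cycle position delay) is negligible compared with these cycle counts.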

In

For the computational time analysis in Matlab we have used a Lenovo Z580 computer (Lenovo Group Ltd., Beijing, China) with the following characteristics: Windows 7 Professional 64 bits, Intel Core i7 3612QM 2.10 GHz, 8 GB RAM DDR3, NVIDIA GeForce GT 630M. For the time analysis in C++ we have used a Toshiba Satellite A200-1DY (Toshiba Corporation, Tokyo, Japan) with the following characteristics: Xubuntu 12.04 32 bits, Intel Core 2 Duo T7100 1.8 GHz, 3 GB RAM DDR2, OpenCV 2.3.

Results show the improvement in computational time of the FPGA implementation over the Matlab and C++ implementations. The time reduction factor has been 6,350 for Images 1 and 2 (341 × 341 pixels each) and 5,415 for Image 3 (3,201 × 3,201 pixels) in comparison with the Matlab solution, and 47 and 52, respectively, in comparison with the C++ solution. The reduction factor is approximately constant, since the computational complexity of the super-resolution algorithm is linear in the size of the plenoptic image. This can be verified by noting that true super-resolution is only possible if the size of the output image
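The claimed linear complexity can also be cross-checked against the reported FPGA Stage 1 times: the ratio of processing times should match the ratio of input pixel counts.

```python
# Linearity check using the Stage 1 FPGA times reported in the paper.
pixels_small = 341 * 341       # Image 1/2 input size
pixels_large = 3201 * 3201     # Image 3 input size
t_small_ms = 1.162             # Stage 1 time for the 341 x 341 image
t_large_ms = 102.4641          # Stage 1 time for the 3201 x 3201 image

pixel_ratio = pixels_large / pixels_small   # ≈ 88.1
time_ratio = t_large_ms / t_small_ms        # ≈ 88.2
```

The two ratios agree to within about 0.1%, consistent with one input pixel processed per clock cycle.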

Block RAMs are the critical resource for the implementation of the system in an FPGA device.

This paper presents the first FPGA implementation of a super-resolution algorithm for plenoptic sensors. The main contribution of this work is the use of FPGA technology for processing the huge amount of data from the plenoptic sensor. The algorithm execution is significantly faster than the equivalent solution on a conventional computer. This result is due to the extremely high-performance signal processing and conditioning capabilities through parallelism based on FPGA slices and arithmetic circuits and its highly flexible interconnection possibilities. Furthermore, the use of a single FPGA can meet the size requirements for a portable commercial camera.

The design of the super-resolution algorithm was developed using the VHDL hardware description language at the functional level and is technology-independent. Thus, the system can be implemented on any FPGA independently of its size and vendor. The design also makes it possible to adapt the hardware to different sizes and shapes of the microlenses.

As future improvements, we are planning to implement the algorithm on larger FPGAs in order to process larger plenoptic images. The current bottleneck of the implementation is the sequential input of the plenoptic data from the charge-coupled device (CCD) to the FPGA. We are planning to use CCDs that allow parallel transfer of image data blocks. Taking advantage of the inherent parallelism of the FPGA, and using a device large enough, the system could obtain images focused at different distances simultaneously, sharing the input data. We are also considering the implementation of other algorithms based on different super-resolution interpolation kernels.

This work has been partially supported by “Ayudas al Fomento de Nuevos Proyectos de Investigación” (Project 2013/0001339) of the University of La Laguna.

The authors declare no conflict of interest.

Two plane parameterization of the lightfield.

Plenoptic image and details from the white square in the plenoptic images [

Architecture of the designed system.

Diagram of the positions estimator module.

Diagram of the addresses generator.

Diagram of the data accumulator.

Switching between modes of accumulation and dump/erase of the memory.

Original plenoptic image.

(

Super-resolved (

Execution time of the algorithm in Matlab.

 | Image 1 | Image 2 | Image 3 |
Plenoptic input image resolution (pixels) | 341 × 341 | 341 × 341 | 3201 × 3201 |
Number of microlenses | 31 × 31 | 31 × 31 | 291 × 291 |
Super-resolved output image resolution (pixels) | 69 × 69 | 69 × 69 | 589 × 589 |
Slope | 0.4 | −0.4 | −0.4 |
Super-resolution factor (d) | 2 | 2 | 2 |
Time Stage 1 (FPGA) | 1.162 ms | 1.162 ms | 102.4641 ms |
Time Stage 2 (FPGA) | 47.71 μs | 47.71 μs | 3.4693 ms |

FPGA internal memory resources. BRAM: Block RAM.

Device | Memory size | BRAM type | BRAM usage |
XC6SLX45 Spartan 6 | 65 × 65 × 4 | BRAM 18 Kb | 28/116 (28%) |
XC6VCX75T Virtex 6 | 65 × 65 × 4 | BRAM 18 Kb | 28/312 (8%) |
XC6VCX75T Virtex 6 | 205 × 205 × 4 | BRAM 36 Kb | 112/156 (71%) |