This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

A novel low power CMOS imaging system with smart image capture and adaptive complexity 2D-Discrete Cosine Transform (DCT) is proposed. Compared with the existing imaging systems, it involves the smart image capture and image processing stages cooperating together and is very efficient. The type of each 8 × 8 block is determined during the image capture stage, and then input into the DCT block, along with the pixel values. The 2D-DCT calculation has adaptive computation complexity according to block types. Since the block type prediction has been moved to the front end, no extra time or calculation is needed during image processing or image capturing for prediction. The image sensor with block type decision circuit is implemented in TSMC 0.18 µm CMOS technology. The adaptive complexity 2D-DCT compression is implemented based on Cyclone EP1C20F400C8 device. The performance including the image quality of the reconstructed picture and the power consumption of the imaging system are compared to those of traditional CMOS imaging systems to show the benefit of the proposed low power algorithm. According to simulation, up to 46% of power consumption can be saved during 2D DCT calculation without extra loss of image quality for the reconstructed pictures compared with the conventional compression methods.

Wide utilization of portable battery-operated devices in multimedia applications, such as cell phones, portable digital assistants (PDAs) and smart toys, has triggered a demand for ultra low-power image system. CMOS imaging technology has recently become a very attractive solution for these applications as they consume less power, and operate at higher speeds compared with CCD imaging technology [

Many low power designs for image capture were reported during the last decade [

After the image capture, energy-aware data compression is usually performed for efficient transmission. The Discrete Cosine Transform (DCT), and in particular the DCT-II, is often used in image/video processing such as JPEG still image compression, MJPEG, MPEG video compression due to its good energy compaction. However the DCT itself contains very computationally intensive matrix multiplications and therefore is power consuming. Numerous algorithms have been proposed attempting to minimize the number of additions and multiplications such as the Loeffler DCT [

In all the existing CMOS imaging systems, the image capture stage and the compression stage are simply concatenated together. The DCT architectures without multiplier are usually time consuming and hardware-expensive. Some DCT designs subsample the area where pixels change less, but all the block type predictions are made during the digital image processing and require extra processing time and also extra memory space to store the image data during prediction. However, in our imaging system, these two stages are intertwined together. The block type prediction is moved to the front end, that is, the block type is estimated during image capture in an analog way (at the mean time during read out). It is more efficient in terms of power. In addition, no extra time is needed during image processing or image capturing for block type prediction. The 2D DCT calculation has adaptive data format and computation complexity depending on the block type.

The paper is organized as follows. In

In a traditional CMOS imaging system, the image is first captured by the imager, and then the analog image data go through an Analog-to-Digital (ADC) conversion and then digital processing, such as image compression. The general diagram of a basic traditional digital camera system with 2D-DCT based compression is shown in

A traditional digital imaging system with 2D-Discrete Cosine Transform (DCT) calculation.

In hardware the N × N 2D-DCT can be realized by storing the output of the first 1-D DCT in a memory buffer line after line, then applying a second 1-D DCT transform on the columns of the results [

The architecture of the novel low power digital camera system with adaptive complexity 2D-DCT based calculation is shown in

The proposed low power digital CMOS imaging system with smart image capture and adaptive complexity 2D-DCT calculation.

As shown in

Example of the proposed low power algorithm.

Using vector form, the 8 × 8 DCT transform becomes ^{T}^{T}

For background type blocks, the spatial resolution unit will be considered as 2 × 2 during calculation, therefore the row DCT transform becomes ^{T}^{T}

^{'}_{8x4 = }_{8x8}_{8x4}, and the subscriptions indicate the sizes of the matrixes.

So the matrix sizes during computation can be reduced and the times of multiplication and addition are also reduced. In hardware, it is realized by skipping part of the calculations for background type blocks. Similarly, during column DCT, part of the calculation can also be skipped since half of the inputs in each column are the same.

The analog data are read out from imager row by row. After ADC, the data needs to be stored in a memory temporarily for reordering before performing DCT. If the block type prediction is done during image processing, since the digital data is read out from the memory one by one, at least 64 additional clocks are needed in order to make a prediction for an 8 × 8 block. Also, additional memory is required to keep the data during prediction before DCT is performed. However, in our case, since the block type is decided in the imager during the image capture stage rather than during the image processing stage, it does not need additional processing time. Also the idea of adaptive complexity 2D-DCT can be combined with the dual analog Vdd algorithm proposed in [

The proposed imager for smart image capture is mainly composed of a pixel array, row and column decoders, block decision block, and readout circuits, as shown in

It works in a rolling shutter mode, the signals do not need to be sampled to in-pixel capacitors as required in [

The conventional 3T CMOS APS pixel is used in the pixel array [

Block diagram of the proposed imager for smart image capture.

The block type decision unit is shared by each 8 columns and computes the type for each block according to the estimated voltage variance values _{Max}_{Min}

A chip for image capture and block type prediction is implemented based on TSMC 0.18 µm process, as shown in

Layout of the image sensor with block type decision circuitry.

Chip Attributes.

Technology | TSMC 0.18 µm |
---|---|

Voltage supply | 1.8 V |

Pixel array size | 128 × 128 |

Pitch width | 5 µm |

Chip size | 2 mm × 2 mm |

Fill factor | 26% |

Estimated power (whole chip) | 0.5 mW @ 30 FPS |

Estimated power (decision logic) | 7 µW @ 30 FPS |

2D DCT can be done by running a 1D DCT over every row and then every column. Vector processing using parallel multipliers is a method used for implementation of DCT. The advantages of vector processing method are regular structure, simple control and interconnect and good balance between performance and complexity of implementation. The complexity of the 2D-DCT depends on the block type decided by the image sensor during image capture. Two optimizations are performed for the small variance blocks to save power.

For small variance blocks, only the differential part of the pixel values _{pixel}_{DC}

_{first_b} from the pixel values, and then input to the DCT block. The DC values should be compensated to the DC coefficients to generate the final DC coefficients. Another benefit brought by this optimization is a small increment in image quality of the reconstructed picture. The reason is that because fewer bits are performed for DCT coefficients calculation, less information is lost during the truncation stage. There are less values toggling during DCT coefficient calculation and therefore less power is consumed.

Schematic of adaptive input format according to block types.

For small variance blocks, the spatial resolution for these blocks can be reduced while not affecting the image quality much. Consequently part of the calculations can be skipped to save power consumption during DCT. For small variance blocks, the calculation for the second row is just the same as the first row, therefore we can skip the row DCT alternatively. In addition, since half of the inputs of the non-skipped row DCT and column DCT are the same, half of the calculation groups can be skipped during calculation for small variance blocks, as shown in

For now, the adaptive complexity 2D-DCT is implemented based on FPGA (Cyclone EP1C20F400C8) first. Later we are planning to integrate the imager for capture and the compression on the same chip. According to the synthesis report given by Quartus II, there is about 10% hardware increase than a conventional 2D-DCT with unique complexity. The maximum frequency is 100 MHz.

Schematic of adaptive spatial resolution according to block types. (a) Normal case—large variance blocks; (b) Low-complexity case (dashed paths are disabled)—Small variance blocks.

In order to show the benefit of the proposed low power algorithm, the proposed 2D-DCT based computation and a reference 2D-DCT core released by Xilinx [

As shown in

Test images (

Simulations about the PSNR

(

(

The power is estimated by Quartus II based on the Voltage Change Dump (VCD) files from post layout netlist simulation and the PSNR analysis is done by Matlab. The clock frequency used here is 0.5 MHz corresponding to a 128 × 128 array working at 30 FPS. It can be increased up to 100 MHz for imagers with larger array size and higher frame rate. The power savings varies from image to image. It is more efficient for predominant background images with up to about 46% power saving while no extra image quality degradation is observed compared with traditional compression. Extra image quality degradation is small because optimizations are performed only for background blocks. Power saving and the image quality of the reconstructed picture depend on the variance threshold. It is a tradeoff between image quality and power, and can be easily controlled by the threshold according to different applications. As shown in

Power saved During 2D-DCT Calculation.

Images | Background ratio (th = 30) | Percentage of power saving | Extra image quality degradation |
---|---|---|---|

Garden | 0.8% | 0.5% | None |

Cameraman | 50% | 24% | None |

Plane | 90% | 46% | None |

The power of the whole imaging system is the sum of the image sensor, ADC and the compression. For the proposed image sensor, the power estimation of the block type prediction is 7.0 µW. This adds only about 1.4% to the total imager power, and 0.7% to the system. Therefore, we conclude that the expected power savings outperform the extra power caused by the block type decision circuitry and result in significant power savings for the system.

A novel low power CMOS imaging system with smart image capture and adaptive complexity 2D-DCT calculation is proposed, simulated and implemented. The complexity of the 2D-DCT calculation is controlled by the block types which are estimated during image capture stage. It does not add additional processing time or memory space for block type prediction.

The imager is more efficient when the picture has predominant background. By choosing appropriate threshold, up to 46% of the power consumption can be saved during 2D-DCT calculation for images having predominant background, while no extra image quality degradation occurs for the reconstructed pictures compared with traditional compressions. For typical scenarios, up to about 23% of power can be saved for the whole imaging system. The idea of smart image capture and adaptive complexity according to block types can be extended to other 2D DCT architectures.

The authors would like to thank CMC for the support with Cadence tools and chip fabrication.

The authors declare no conflict of interest.