Article

Multi-Focus Fusion Technique on Low-Cost Camera Images for Canola Phenotyping

1 Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
2 Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C5, Canada
* Author to whom correspondence should be addressed.
Sensors 2018, 18(6), 1887; https://doi.org/10.3390/s18061887
Submission received: 19 March 2018 / Revised: 4 June 2018 / Accepted: 5 June 2018 / Published: 8 June 2018
(This article belongs to the Collection Multi-Sensor Information Fusion)

Abstract

To meet the high demand for supporting and accelerating progress in the breeding of novel traits, plant scientists and breeders have to measure a large number of plants and their characteristics accurately. Imaging methodologies are being deployed to acquire data for quantitative studies of complex traits. Images are not always of good quality, particularly when they are obtained in the field. Image fusion techniques can help plant breeders access plant characteristics more conveniently by improving the definition and resolution of color images. In this work, multi-focus images were loaded and the similarity of visual saliency, gradient, and color distortion was measured to obtain weight maps. The maps were refined by a modified guided filter before the images were reconstructed. Canola images were obtained with a custom-built mobile platform for field phenotyping and, together with images from public databases, were used for testing. The proposed method was also compared with five common image fusion methods in terms of quality and speed. Experimental results show that the proposed technique produces well-reconstructed images both subjectively and objectively. The findings contribute a new multi-focus image fusion method, based on visual saliency maps and the gradient domain fast guided filter, that exhibits competitive performance and outperforms some other state-of-the-art methods. The proposed fusion technique can be extended to other fields, such as remote sensing and medical image fusion applications.

1. Introduction

The sharp increase in global food demand has raised public awareness, especially among agricultural scientists, of global food security. To meet the high demand for food in 2050, agriculture will need to produce almost 50 percent more food than was produced in 2012 [1]. There are many ways to improve yields for canola and other crops. One of the solutions is to increase breeding efficiency. In the past decade, advances in genetic technologies, such as next-generation DNA sequencing, have provided new methods to improve plant breeding techniques. However, the lack of knowledge of phenotyping capabilities limits the ability to analyze the genetics of quantitative traits related to plant growth, crop yield, and adaptation to stress [2]. Phenotyping creates opportunities not only for functional research on genes, but also for the development of new crops with beneficial features. Image-based phenotyping methods are integrated approaches that can greatly enhance plant researchers’ ability to characterize many different plant traits. Modern advanced imaging methods provide high-resolution images and enable the visualization of multi-dimensional data. The basics of image processing have been thoroughly studied and published. Readers can find useful information on image fusion in the textbooks by Starck or Florack [3,4]. These methods allow plant breeders and researchers to obtain accurate data, speed up image analysis, provide high-throughput and high-dimensional phenotype data for modeling, and estimate plant growth and structural development during the plant life cycle. However, phenotyping with low-cost sensors faces the obstacle of low-resolution images. To cope with this issue, image fusion is a valuable choice.
Image fusion is a technique that combines many different images to generate a fused image containing more informative and reliable information. There are several image fusion types, such as multi-modal and multi-sensor image fusion. In multi-modal image fusion, two different kinds of images are fused, for example, combining a high-resolution image with a high-color image. In multi-sensor image fusion, images from different types of sensors are combined, for example, combining an image from a depth sensor with an image from a digital sensor. Image fusion methods can be divided into three levels depending on the processing: pixel level, feature level, and decision level [5,6]. Image fusion at the pixel level refers to an imaging process that occurs in a pixel-by-pixel manner in which each new pixel of the fused image obtains a new value. At a higher level than the pixel level, feature-level image fusion first extracts the relevant key features from each of the source images and then combines them for image-classification purposes such as edge detection. Decision-level image fusion (also known as interpretation-level or symbol-level image fusion) is the highest level of image fusion. Decision-level image fusion refers to a type of fusion in which the decision is made based on the information separately extracted from several image sources.
Over the last two decades, image fusion techniques have been widely applied in many areas: medicine, mathematics, physics, and engineering. In plant science, many image fusion techniques are being deployed to improve the classification accuracy for determining plant features, detecting plant diseases, and measuring crop diversification. Fan et al. [7] implemented a Kalman filtering fusion to improve the accuracy of citrus maturity prediction. In a related work, a feature-level fusion technique [8] was developed to detect some types of leaf disease with excellent results. In other similar research, apple fruit diseases were detected by using feature-level fusion in which two or more color and texture features were combined [9]. Decision-level fusion techniques have been deployed to detect crop contamination and plague [10]. Dimov et al. [11] have also implemented the Ehlers fusion algorithm (decision level) to measure the diversification of three critical cropping systems with the highest classification accuracy. These findings suggest that image fusion techniques at many levels are broadly applied in the plant science sector because they offer the highest classification accuracy.
Currently, many multi-focus image fusion techniques have been developed [12,13]. The techniques can be categorized into two classes: spatial domain methods and frequency domain methods. In spatial domain methods, the source images are fused directly, using the image pixel information without any pre-processing or post-processing. In frequency domain methods, the source images are transformed into the frequency domain and then combined. Therefore, frequency domain methods are more complicated and time-consuming than spatial domain methods. Many studies have investigated multi-focus image fusion techniques in the spatial and frequency domains to improve the outcomes. Wan et al. [14] proposed a method based on robust principal component analysis in the spatial domain. They developed this method to form a robust fusion technique to distinguish focused and defocused areas. The method outperforms wavelet-based fusion methods and provides better visual perception; however, it has a high computation cost. Also in the spatial domain, a region-based multi-focus image fusion method [15] was developed that offers smaller distortion and better reflects the edge information and importance of the source image. Similarly, Liu et al. [16] investigated a fusion technique based on the dense scale invariant feature transform (SIFT) in the spatial domain. The method performs better than other techniques in terms of visual perception and performance evaluation, but it requires a high amount of memory. In the frequency domain, Phamila and Amutha [17] developed a method based on the Discrete Cosine Transform (DCT). The process selects the 8 × 8 DCT blocks with the highest variance to reconstruct the fused image. However, the method suffers from undesirable side effects such as blurring and artifacts. In a recently published article, the authors reviewed works using sparse representation (SR)-based methods on multi-sensor systems [18]. Based on sparse representation, the same authors also developed an image fusion method for multi-focus and multi-modality images [19]. Because this SR method learns an over-complete dictionary from a set of training images, it may result in a large increase in computational complexity.
To deal with the obstacles mentioned above, a new multi-focus image fusion method based on image quality assessment (IQA) metrics is proposed in this paper. The proposed fusion method is developed based on crucial IQA metrics and a gradient domain fast guided image filter (GDFGIF). This approach is motivated by the fact that visual saliency maps, including visual saliency, gradient similarity, and chrominance similarity maps, outperform most state-of-the-art IQA metrics in terms of prediction accuracy [20]. According to Reference [20], visual saliency similarity, gradient similarity, and chrominance maps are vital metrics in accounting for the visual quality of image fusion techniques. In most cases, changes in the visual saliency (VS) map are a good indicator of distortion degree, and thus the VS map is used as a local weight map. However, the VS map does not work well for contrast-change distortion. Fortunately, the image gradient can be used as an additional feature to compensate for the lack of contrast sensitivity of the VS map. In addition, the VS map does not work well for changes in color saturation. This color distortion cannot be well represented by the gradient either, since the gradient is usually computed from the luminance channel of images. To deal with this color distortion, two chrominance channels are used as features to represent the quality degradation caused by color distortion. These IQA metrics have been proven to be stable and to have the best performance [20]. In addition, the gradient domain guided filter (GDGIF) [21] and the fast guided filter (FGF) [22] are adopted in this work because their combination can offer faster and better fused results, especially near edges, where halo artifacts appear with the original guided image filter. This study focuses on how to fuse multi-focus color images to enhance the resolution and quality of the fused image using a low-cost camera. The proposed image fusion method was developed and compared with other state-of-the-art image fusion methods. In the proposed multi-focus image fusion, two or more images captured by the same sensor from the same visual angle but with a different focus are combined to obtain a more informative image. For example, a fused image with clearer canola seedpods can be produced by fusing many different images of a canola plant acquired by the same Pi camera at the same angle with many different focal lengths.

2. Methodology

2.1. Data Acquisition System

This image fusion work is part of the development of a low-cost, high-throughput phenotyping mobile system for canola in which a low-cost Raspberry Pi camera is used as a source of image acquisition. The system includes a 3D Time-of-Flight camera, a Pi camera, a Raspberry Pi 3 (RP3), and appropriate power supplies for the cameras and the mini computer (Raspberry Pi 3). A built-in remote control allows the user to start and stop image recording as desired. Figure 1 shows the various components of the data acquisition system. Data are recorded on the SD card of the RP3 and retrieved using a USB connection to a laptop before the images are processed. The Kuman for Raspberry Pi 3 Camera Module with adjustable focus is used in this system. This camera is connected to the Raspberry Pi using the dedicated CSI interface. The Pi camera is equipped with a 5-megapixel OV5647 sensor. It is capable of capturing 2592 × 1944 pixel static images and supports video capture at 1080p/30 fps, 720p/60 fps, and 640 × 480 at 60/90 fps.
The testing subjects were canola plants at different growing stages. The plants were grown in a controlled environment and also in the field. To capture images of the canola, the plants were placed directly underneath the Pi camera, which was fixed on a tripod at a distance of 1000 mm (Figure 1). Each canola plant was recorded at 10 fps for 3 s. The time between each change of the focal length was 10 s. Only frame number 20 of each video stream acquired from the Pi camera was extracted and stored in the database for later use. The reason for selecting the 20th frame is that the plants and the camera are required to be stable before the images are captured and processed. Only the regions containing the plant in the selected images were cropped and used for the multi-focus image fusion methods.
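As an illustration of this frame-extraction step, the following Python/OpenCV sketch grabs a single frame (here the 20th) from a recorded video clip. The file name and the use of OpenCV are assumptions for illustration only; this is a sketch under stated assumptions, not the authors' acquisition code.

```python
import cv2

def extract_frame(video_path, frame_index=20):
    """Grab one frame (e.g., the 20th) from a recorded Pi-camera video clip."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)  # jump to the desired frame (0-based)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"Could not read frame {frame_index} from {video_path}")
    return frame

# Hypothetical usage: store the 20th frame of each clip for later fusion
frame = extract_frame("canola_focus_1.mp4", 20)
cv2.imwrite("canola_focus_1_frame20.png", frame)
```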

2.2. Image Fusion Algorithm

In the proposed fusion approach, three image quality assessment (IQA) metrics are measured to obtain their weight maps: visual saliency similarity, gradient similarity, and chrominance similarity (or color distortion). These weight maps are then refined by a gradient domain fast guided filter, in which the gradient domain guided filter proposed in Reference [21] and the fast guided filter proposed in Reference [22] are combined. The workflow of the proposed multi-focus image fusion algorithm is illustrated in Figure 2. The details of the proposed algorithm are described as follows.
First, each input image is decomposed into a base and a detail component, which contain the large-scale and small-scale variations in intensity, respectively. A Gaussian filter is applied to each source image to obtain its base component, and the detail component is obtained by subtracting the base component from the input image, as given by:
$$B_n = I_n * G_{r,\sigma}$$
$$D_n = I_n - B_n$$
where $B_n$ and $D_n$ are the base and detail components of the $n$-th input image, respectively, $*$ denotes the convolution operator, and $G_{r,\sigma}$ is a 2-D Gaussian smoothing filter.
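A minimal Python sketch of this two-scale decomposition is given below; the Gaussian kernel size and sigma are illustrative placeholders rather than the values used in the experiments.

```python
import cv2
import numpy as np

def decompose(image, ksize=31, sigma=5.0):
    """Split an image into base (large-scale) and detail (small-scale) layers.

    B_n = I_n * G_{r,sigma}  (Gaussian-blurred base layer)
    D_n = I_n - B_n          (detail layer)
    Kernel size and sigma are illustrative, not the paper's settings.
    """
    img = image.astype(np.float32)
    base = cv2.GaussianBlur(img, (ksize, ksize), sigma)
    detail = img - base
    return base, detail
```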
Several measures were used to obtain weight maps for image fusion. As noted in Section 1, visual saliency similarity, gradient similarity, and chrominance maps are vital metrics in accounting for the visual quality of image fusion techniques [20]. In most cases, changes in the visual saliency (VS) map are a good indicator of distortion degree, and thus the VS map is used as a local weight map. However, the VS map does not work very well for contrast-change distortion; the gradient modulus is therefore used as an additional feature to compensate for its lack of contrast sensitivity. The VS map also does not work well for changes in color saturation, and this color distortion cannot be well represented by the gradient either, since the gradient is usually computed from the luminance channel of images. To deal with color distortion, two chrominance channels are used as features to represent the quality degradation it causes. Motivated by these metrics, an image fusion method is designed based on the measurement of these three key visual features of the input images.

2.2.1. Visual Saliency Similarity Maps

A saliency similarity detection algorithm proposed by [23] is adopted to calculate visual saliency similarity due to its higher accuracy and low computational complexity. This algorithm is constructed by combining three simple priors: frequency, color, and location. The visual saliency similarity maps are calculated as
$$VS_n(k) = S_{F_n}(k) \cdot S_{C_n}(k) \cdot S_{D_n}(k)$$
where $S_{F_n}(k)$, $S_{C_n}(k)$, and $S_{D_n}(k)$ are the saliency values at pixel $k$ under the frequency, color, and location priors, respectively. $S_{F_n}(k)$ is calculated by
$$S_{F_n}(k) = \left[ \big( (I_{L_n} * g)(k) \big)^2 + \big( (I_{a_n} * g)(k) \big)^2 + \big( (I_{b_n} * g)(k) \big)^2 \right]^{1/2}$$
where $I_{L_n}$, $I_{a_n}$, and $I_{b_n}$ are the three channels obtained by transforming the given RGB input image $I_n$ into the CIE L*a*b* space, and $*$ denotes the convolution operation. CIE L*a*b* is an opponent color system in which the a* channel represents green-red information and the b* channel represents blue-yellow information. If a pixel has a smaller (greater) a* value, it appears greenish (reddish); if a pixel has a smaller (greater) b* value, it appears blueish (yellowish). Thus, a pixel with a higher a* or b* value appears warmer; otherwise, it appears colder. The color saliency $S_{C_n}$ at pixel $k$ is calculated using
$$S_{C_n}(k) = 1 - \exp\left( -\frac{I_{a_n}'(k)^2 + I_{b_n}'(k)^2}{\sigma_C^2} \right)$$
where $\sigma_C$ is a parameter, $I_{a_n}'(k) = \dfrac{I_{a_n}(k) - min_a}{max_a - min_a}$, $I_{b_n}'(k) = \dfrac{I_{b_n}(k) - min_b}{max_b - min_b}$, $min_a$ ($max_a$) is the minimum (maximum) value of $I_{a_n}$, and $min_b$ ($max_b$) is the minimum (maximum) value of $I_{b_n}$.
Many studies found that the regions near the image center are more attractive to human visual perception [23]. It can thus be suggested that regions near the center of the image will be more likely to be “salient” than the ones far away from the center. The location saliency at pixel k under the location prior can be formulated by
$$S_{D_n}(k) = \exp\left( -\frac{\lVert k - c \rVert^2}{\sigma_D^2} \right)$$
where $\sigma_D$ is a parameter and $c$ is the center of the input image $I_n$. Then, the visual saliency is used to construct the visual saliency (VS) maps, given by
$$VS_m = VS * G_{r,\sigma}$$
where $G_{r,\sigma}$ is a Gaussian filter.
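The following Python sketch illustrates how the three priors could be combined into an SDSP-style saliency map. It substitutes a simple difference-of-Gaussians band-pass for the log-Gabor frequency prior of Reference [23], and the sigma values are illustrative assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def visual_saliency(bgr, sigma_c=0.25, sigma_d=114.0):
    """Rough sketch of an SDSP-like saliency map (frequency, color, location priors)."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    L, a, b = cv2.split(lab)

    # Frequency prior: band-pass each CIE L*a*b* channel, then take the L2 norm
    def bandpass(ch):
        return cv2.GaussianBlur(ch, (0, 0), 1.0) - cv2.GaussianBlur(ch, (0, 0), 8.0)
    s_f = np.sqrt(bandpass(L) ** 2 + bandpass(a) ** 2 + bandpass(b) ** 2)

    # Color prior: "warm" pixels (high normalized a*/b*) are more salient
    def norm01(ch):
        return (ch - ch.min()) / (ch.max() - ch.min() + 1e-12)
    an, bn = norm01(a), norm01(b)
    s_c = 1.0 - np.exp(-(an ** 2 + bn ** 2) / sigma_c ** 2)

    # Location prior: pixels near the image centre are more salient
    h, w = L.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - h / 2.0) ** 2 + (xs - w / 2.0) ** 2
    s_d = np.exp(-dist2 / sigma_d ** 2)

    return norm01(s_f * s_c * s_d)
```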

2.2.2. Gradient Magnitude Similarity

According to Zhang et al. [24], the gradient magnitude is calculated as the root mean square of image directional gradients along two orthogonal directions. The gradient is usually computed by convolving an image with a linear filter such as the classic Sobel, Prewitt and Scharr filters. The gradient magnitude similarity algorithm proposed by Reference [24] is adopted in this study. This algorithm uses a Scharr gradient operator, which could achieve slightly better performance than Sobel and Prewitt operators [25]. With the Scharr gradient operator, the partial derivatives G M x n k and G M y n k of an input image I n are calculated as:
$$GM_{x_n}(k) = \frac{1}{16} \begin{bmatrix} 3 & 0 & -3 \\ 10 & 0 & -10 \\ 3 & 0 & -3 \end{bmatrix} * I_n(k), \qquad GM_{y_n}(k) = \frac{1}{16} \begin{bmatrix} 3 & 10 & 3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{bmatrix} * I_n(k)$$
The gradient modulus of the image $I_n$ is calculated by
$$GM_n = \sqrt{GM_{x_n}^2 + GM_{y_n}^2}$$
The gradient is computed from the luminance channel of the input images, which will be introduced in the next section. Similar to the visual saliency maps, the gradient magnitude (GM) maps are constructed as
$$GM_m = GM * G_{r,\sigma}$$
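A short sketch of the Scharr-based gradient magnitude computation, using OpenCV's built-in Scharr operator on the luminance channel; the 1/16 normalization follows the kernels quoted above.

```python
import cv2
import numpy as np

def gradient_magnitude(luminance):
    """Gradient magnitude using the Scharr operator (1/16-normalized kernels)."""
    lum = luminance.astype(np.float32)
    gx = cv2.Scharr(lum, cv2.CV_32F, 1, 0) / 16.0   # horizontal derivative
    gy = cv2.Scharr(lum, cv2.CV_32F, 0, 1) / 16.0   # vertical derivative
    return np.sqrt(gx ** 2 + gy ** 2)
```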

2.2.3. Chrominance Similarity

The RGB input images are transformed into an opponent color space, given by
$$\begin{bmatrix} L \\ M \\ N \end{bmatrix} = \begin{bmatrix} 0.06 & 0.63 & 0.27 \\ 0.30 & 0.04 & -0.35 \\ 0.34 & -0.60 & 0.17 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
The L channel is used to compute the gradients introduced in the previous section. The M and N (chrominance) channels are used to calculate the color distortion saliency, given by
$$M_n = 0.30R + 0.04G - 0.35B$$
$$N_n = 0.34R - 0.60G + 0.17B$$
$$C_n = M_n \cdot N_n$$
Finally, the chrominance similarity or color distortion saliency (CD) maps are calculated by
$$CD_m = C * G_{r,\sigma}$$
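The sketch below applies the LMN opponent transform and forms the chrominance map $C_n = M_n \cdot N_n$; taking the absolute value of the product is an assumption added here so the map stays non-negative, and is not stated in the text.

```python
import numpy as np

def chrominance_saliency(rgb):
    """Transform RGB to the LMN opponent space and build the chrominance map."""
    rgb = rgb.astype(np.float32) / 255.0
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    L = 0.06 * R + 0.63 * G + 0.27 * B      # luminance channel (used for gradients)
    M = 0.30 * R + 0.04 * G - 0.35 * B      # chrominance channel 1
    N = 0.34 * R - 0.60 * G + 0.17 * B      # chrominance channel 2
    C = np.abs(M * N)                        # chrominance (color distortion) saliency; abs() is an assumption
    return L, M, N, C
```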

2.2.4. Weight Maps

Using three measured metrics above, the weight maps are computed as given by
$$W_n = (VS_m)^{\alpha} \cdot (GM_m)^{\beta} \cdot (CD_m)^{\gamma}$$
where $\alpha$, $\beta$, and $\gamma$ are parameters used to control the relative importance of the visual saliency (VS), gradient saliency (GM), and color distortion saliency (CD) measures. From these weight maps, the binary weight map of each input image is obtained by comparing the weights $W$ at each location $k$:
$$W_n(k) = \begin{cases} 1, & \text{if } W_n(k) = \max\left( W_1(k), W_2(k), \ldots, W_N(k) \right) \\ 0, & \text{otherwise} \end{cases}$$
where $N$ is the number of input images and $W_n(k)$ is the weight value of pixel $k$ in the $n$-th image. The proposed weight maps are then determined by normalizing the saliency maps as follows:
$$W_n(k) = \frac{W_n(k)}{\sum_{n=1}^{N} W_n(k)}, \quad n = 1, 2, \ldots, N$$
These weight maps are then refined by a gradient domain guided filter described in the next section.
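A compact sketch of the weight-map construction described above; the default exponents are the values reported later in Section 3.1, and the input saliency maps are assumed to be pre-computed, non-negative arrays.

```python
import numpy as np

def initial_weight_maps(vs_maps, gm_maps, cd_maps, alpha=1.0, beta=0.89, gamma=0.31):
    """Combine the three saliency maps into per-image weights, pick the per-pixel
    winner, and normalise so the weights sum to one at every pixel.

    vs_maps, gm_maps, cd_maps: lists of 2-D arrays, one per source image.
    """
    W = np.stack([(vs ** alpha) * (gm ** beta) * (cd ** gamma)
                  for vs, gm, cd in zip(vs_maps, gm_maps, cd_maps)])   # shape (N, H, W)
    winner = np.argmax(W, axis=0)                                       # index of the best image per pixel
    binary = (winner[None, ...] == np.arange(W.shape[0])[:, None, None]).astype(np.float32)
    return binary / np.maximum(binary.sum(axis=0, keepdims=True), 1e-12)
```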

2.2.5. Gradient Domain Fast Guided Filter

The gradient domain guided filter proposed in Reference [21] is adopted to optimize the initial weight maps. By using this filter, halo artifacts can be suppressed more effectively. It is also less sensitive to its parameters while retaining the same complexity as the guided filter. The gradient domain guided filter has good edge-preserving smoothing properties, like the bilateral filter, but it does not suffer from gradient reversal artifacts. The filtering output is a local linear model of the guidance image, and it is one of the fastest edge-preserving filters. Therefore, the gradient domain guided filter can be applied in image smoothing to avoid ringing artifacts.
It is assumed that the filtering output $Q$ is a linear transform of the guidance image $G$ in a local window $w_k$ centered at pixel $k$:
$$Q_i = a_k G_i + b_k, \quad \forall i \in w_k$$
where $(a_k, b_k)$ are linear coefficients assumed to be constant in the local window $w_k$, whose size is $(2\zeta_1 + 1) \times (2\zeta_1 + 1)$. The linear coefficients $(a_k, b_k)$ can be estimated by minimizing the following cost function in the window $w_k$ between the output image $Q$ and the input image $P$:
$$E(a_k, b_k) = \sum_{i \in w_k} \left[ (Q_i - P_i)^2 + \frac{\lambda}{\hat{\Gamma}_G(k)} (a_k - \gamma_k)^2 \right]$$
where $\gamma_k$ is defined as
$$\gamma_k = 1 - \frac{1}{1 + e^{\eta \left( \chi(k) - \mu_\chi \right)}}$$
$\mu_\chi$ is the mean value of all $\chi(k)$, and $\eta$ is calculated as $4 / \left( \mu_\chi - \min(\chi(k)) \right)$. $\hat{\Gamma}_G(k)$ is a new edge-aware weighting used to measure the importance of pixel $k$ with respect to the whole guidance image. It is defined using the local variances of $3 \times 3$ windows and $(2\zeta_1 + 1) \times (2\zeta_1 + 1)$ windows of all pixels as
$$\hat{\Gamma}_G(k) = \frac{1}{N} \sum_{i=1}^{N} \frac{\chi(k) + \varepsilon}{\chi(i) + \varepsilon}$$
where $\chi(k) = \sigma_{G,1}(k)\,\sigma_{G,\zeta_1}(k)$ and $\zeta_1$ is the window size of the filter.
The optimal values of $a_k$ and $b_k$ are computed by
$$a_k = \frac{\mu_{G \odot X, \zeta_1}(k) - \mu_{G, \zeta_1}(k)\,\mu_{X, \zeta_1}(k) + \frac{\lambda}{\hat{\Gamma}_G(k)}\,\gamma_k}{\sigma_{G, \zeta_1}^2(k) + \frac{\lambda}{\hat{\Gamma}_G(k)}}$$
$$b_k = \mu_{X, \zeta_1}(k) - a_k\,\mu_{G, \zeta_1}(k)$$
The final filtering output $\hat{Q}_i$ is calculated by
$$\hat{Q}_i = \bar{a}_k G_i + \bar{b}_k$$
where $\bar{a}_k$ and $\bar{b}_k$ are the mean values of $a_k$ and $b_k$ in the window, respectively, computed by
$$\bar{a}_k = \frac{1}{|w_{\zeta_1}(k)|} \sum_{i \in w_{\zeta_1}(k)} a_i$$
$$\bar{b}_k = \frac{1}{|w_{\zeta_1}(k)|} \sum_{i \in w_{\zeta_1}(k)} b_i$$
where $|w_{\zeta_1}(k)|$ is the cardinality of $w_{\zeta_1}(k)$.
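For illustration, a simplified, box-filter-based sketch of the gradient domain guided filtering step is given below. It follows the equations above (with the regularization term read as $\lambda / \hat{\Gamma}_G(k)$), but it is a didactic approximation, not the authors' implementation or the reference code of [21].

```python
import cv2
import numpy as np

def box(img, r):
    """Mean filter over a (2r+1)x(2r+1) window."""
    return cv2.boxFilter(img, -1, (2 * r + 1, 2 * r + 1))

def gradient_domain_guided_filter(G, P, r=4, lam=1e-6, eps=1e-6):
    """Simplified sketch of the gradient domain guided filter.

    G: guidance image, P: input (the weight map to refine), both float32 in [0, 1].
    """
    G, P = G.astype(np.float32), P.astype(np.float32)

    # chi(k): product of local standard deviations of G at radii 1 and r
    local_var = lambda img, rad: box(img * img, rad) - box(img, rad) ** 2
    chi = np.sqrt(np.maximum(local_var(G, 1), 0)) * np.sqrt(np.maximum(local_var(G, r), 0))

    # Edge-aware weighting Gamma_G(k) and edge indicator gamma_k
    Gamma = (chi + eps) * np.mean(1.0 / (chi + eps))
    eta = 4.0 / (chi.mean() - chi.min() + 1e-12)
    gamma_k = 1.0 - 1.0 / (1.0 + np.exp(eta * (chi - chi.mean())))

    # Linear coefficients a_k, b_k minimising the regularised cost
    mean_G, mean_P = box(G, r), box(P, r)
    cov_GP = box(G * P, r) - mean_G * mean_P
    var_G = box(G * G, r) - mean_G ** 2
    reg = lam / Gamma
    a = (cov_GP + reg * gamma_k) / (var_G + reg)
    b = mean_P - a * mean_G

    # Average the coefficients over the window and form the filtered output
    return box(a, r) * G + box(b, r)
```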

2.2.6. Refining Weight Maps by Gradient Domain Guided Filter

Because these weight maps are noisy and not well aligned with object boundaries, the proposed approach deploys a gradient domain guided filter to refine them. The gradient domain guided filter is applied to each weight map $W_n$ with the corresponding input image $I_n$ as the guidance image; the detail weight map $W_{D_n}$ is then obtained by filtering the refined base weight map $W_{B_n}$, as given by
$$W_{B_n} = G_{r_1, \varepsilon_1}(W_n, I_n)$$
$$W_{D_n} = G_{r_2, \varepsilon_2}(W_{B_n}, I_n)$$
where $r_1$, $\varepsilon_1$, $r_2$, and $\varepsilon_2$ are the parameters of the guided filter, and $W_{B_n}$ and $W_{D_n}$ are the refined weight maps of the base and detail layers, respectively. Mathematical morphology techniques are applied to both weight maps $W_{B_n}$ and $W_{D_n}$ to remove small holes and unwanted regions in the focused and defocused regions. The morphology operations are described below:
mask = W_n < threshold
temp1 = imfill(mask, 'holes')
temp2 = 1 - temp1
temp3 = imfill(temp2, 'holes')
W_n(refined) = bwareaopen(temp3, threshold)
Then, the values of the N refined weight maps are normalized such that they sum to one at each pixel k. Finally, the fused base and detail layer images are calculated and blended to fuse the input images, as given by
$$\bar{B} = \sum_{n=1}^{N} W_{B_n}\, B_n$$
$$\bar{D} = \sum_{n=1}^{N} W_{D_n}\, D_n$$
$$Fused = \bar{B} + \bar{D}$$
The fast guided filter proposed in Reference [22] is adopted to reduce the time complexity of the gradient domain guided filter. Before the gradient domain guided filtering, the rough transmission map and the guidance image are down-sampled using nearest-neighbor interpolation. After filtering, the output image is up-sampled using bilinear interpolation to obtain the refined transmission map. With this acceleration, the gradient domain guided filter performs faster than the original one. Therefore, the resulting filter is named the gradient domain fast guided filter.
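A sketch of the fast variant described above: the inputs are down-sampled with nearest-neighbor interpolation, filtered, and the result is up-sampled with bilinear interpolation. It reuses the gradient_domain_guided_filter() sketch from Section 2.2.5 and the subsampling factor s = 4 from Section 3.1; this wrapper is an illustration, not the authors' code.

```python
import cv2

def fast_gdgf(G, P, r=4, lam=1e-6, s=4):
    """Fast variant: run the gradient domain guided filter on down-sampled inputs."""
    h, w = G.shape[:2]
    small = (w // s, h // s)
    G_s = cv2.resize(G, small, interpolation=cv2.INTER_NEAREST)   # down-sample guidance
    P_s = cv2.resize(P, small, interpolation=cv2.INTER_NEAREST)   # down-sample weight map
    Q_s = gradient_domain_guided_filter(G_s, P_s, r=max(r // s, 1), lam=lam)
    return cv2.resize(Q_s, (w, h), interpolation=cv2.INTER_LINEAR)  # up-sample refined map
```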

3. Results and Discussion

3.1. Multi-Focus Image Fusion

This section describes the comprehensive experiments conducted to evaluate and verify the performance of the proposed approach. The proposed algorithm was developed to fit many types of multi-focus images that are captured by any digital camera or Pi camera. The proposed method was also compared with five multi-focus image fusion techniques: the multi-scale weighted gradient based method (MWGF) [26], the DCT based Laplacian pyramid fusion technique (DCTLP) [27], the image fusion with guided filtering (GFF) [28], the gradient domain-based fusion combined with a pixel-based fusion (GDPB) [29], and the image matting (IM)-based fusion algorithm [30]. The codes of these methods were downloaded and run on the same computer to compare to the proposed method.
The MWGF method is based on image structure saliency and two scales to solve the fusion problems raised by anisotropic blur and mis-registration. The image structure saliency is used because it reflects the saliency of local edge and corner structures. The large-scale measure is used to reduce the impacts of anisotropic blur and mis-registration on the focused region detection, while the small-scale measure is used to determine the boundaries of the focused regions. The DCTLP method presents an image fusion approach using a Discrete Cosine Transform based Laplacian pyramid in the frequency domain; the higher the level of pyramidal decomposition, the better the quality of the fused image. The GFF method is based on fusing two-scale layers by using a guided filter-based weighted average method. This method measures pixel saliency and spatial consistency at two scales to construct weight maps for the fusion process. The GDPB method fuses the luminance and chrominance channels separately. The luminance channel is fused by using a wavelet-based gradient integration algorithm coupled with a Poisson solver at each resolution to attenuate the artifacts. The chrominance channels are fused based on a weighted sum of the chrominance channels of the input images. The image matting (IM) fusion method is based on three steps: obtaining the focus information of each source image by morphological filtering, applying an image matting technique to achieve accurate focused regions of each source image, and combining these fused regions to construct the fused image.
All methods used the same input images as the ones applied in the proposed technique. Ten multi-focused image sequences were used in the experiments. Four of them were canola images captured by keeping the Pi camera well focused and manually changing its focal length; the others were selected from general public datasets used for many image fusion techniques. These datasets are available in References [31,32]. Of the first four canola sets, three were artificial multi-focus images obtained using the LunaPic tool [33], and one was a multi-focus image set acquired directly from the Pi camera after cropping the region of interest as described in Section 2.1.
The empirical parameters of the gradient domain fast guided filter and of the VS metrics were adjusted to obtain the best outputs. The parameters of the gradient domain fast guided filter consisted of the window size of the filter ($\zeta_1$), a small positive constant ($\varepsilon$), the subsampling factor of the fast guided filter ($s$), and the dynamic range of the input images ($L$). The parameters of the VS maps, namely α, β, and γ, were used to control the visual saliency, gradient similarity, and color distortion measures, respectively. The parameters of the gradient domain fast guided filter were experimentally set as s = 4, L = 9, and two pairs of $\zeta_1^{(1)} = 4$, $\varepsilon^{(1)} = 10^{-6}$ and $\zeta_1^{(2)} = 4$, $\varepsilon^{(2)} = 10^{-6}$ for optimizing the base and detail weight maps. The other empirical parameters, those of the VS maps, were set as α = 1, β = 0.89, and γ = 0.31.
Surprisingly, when these VS map parameters were changed to, for example, α = 0.31, β = 1, and γ = 0.31, the fused results had a quality similar to that of the first parameter settings. It can thus be concluded that, to obtain focused regions, both the visual saliency and the gradient magnitude similarity can be used as the main saliencies. In addition, the chrominance colors (M and N) also contributed to the quality of the fused results; for example, when the parameters of M and N were increased, blurred regions appeared in the fused results. Figure 3 shows the outputs of the proposed algorithm, including the visual saliency, gradient magnitude similarity, and chrominance colors. The red oval denotes the defocused region of the input image (Figure 3a).

3.2. Comparison with Other Multi-Fusion Methods

In this section, a comprehensive assessment, including both subjective and objective assessment, is used to evaluate the quality of fused images obtained from the proposed and other methods. Subjective assessments are the methods used to evaluate the quality of an image through many factors, including viewing distance, display device, lighting condition, vision ability, etc. However, subjective assessments are expensive and time consuming. Therefore, objective assessments—mathematical models—are designed to predict the quality of an image accurately and automatically.
For subjective or perceptual assessment, comparisons of the fused images are shown in Figure 4, Figure 5, Figure 6 and Figure 7. The figures show the fused results of the “Canola 1”, “Canola 2”, “Canola 4” and “Rose flower” image sets. In these examples, (a) and (b) are the two source multi-focus images, and (c), (d), (e), (f), (g), and (h) are the fused images obtained with the MWGF, DCTLP, GFF, GDPB, IM, and proposed methods, respectively. In almost all cases, the MWGF method offers quite good fused images; however, it sometimes fails to deal with the focused regions. For example, blurred regions remain in the fused image, as marked by the red circle in Figure 4c. The DCTLP method offers fused images as good as the MWGF method but causes blurring of the fused images in all examples. The IM method also provides quite good results; however, ghost artifacts remain in the fused images, as shown in Figure 4g, Figure 6g, and Figure 7g. Although the fused results of the GFF method reveal good visual effects at first glance, small blurred regions still remain at the edge regions (the boundary between focused and defocused regions) of the fused results. This blurring of edge regions can be seen in the “Rose flower” fused images in Figure 7e. The fused images of the GDPB method have unnatural colors and too much brightness. The fused results of the GDPB method also suffer from ghost artifacts on the edge regions and on the boundary between the focused and defocused regions. It can be clearly seen that the proposed algorithm obtains clearer fused images and better visual quality and contrast than the other algorithms, due to its combination of the gradient domain fast guided filter and VS maps. The proposed algorithm offers fused images with fewer block artifacts and blurred edges.
In addition to subjective assessments, an objective assessment without the reference image was also conducted. Three objective metrics, including mutual information (MI) [34], structural similarity (QY) [35], and the edge information-based metric Q(AB/F) [36] were used to evaluate the fusion performance of different multi-focus fusion methods.
The mutual information (MI) measures the amount of information transferred from both source images into the resulting fused image. It is calculated by
$$MI = 2 \left( \frac{I(X, F)}{H(F) + H(X)} + \frac{I(Y, F)}{H(F) + H(Y)} \right)$$
where $I(X, F)$ is the mutual information between the input image X and the fused image F, $I(Y, F)$ is the mutual information between the input image Y and the fused image F, and $H(X)$, $H(Y)$, and $H(F)$ denote the entropies of the input images X and Y and the fused image F, respectively.
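A histogram-based Python sketch of the MI metric for 8-bit grayscale images is shown below; the 256-bin histograms are an implementation assumption.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy of an 8-bit image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, f, bins=256):
    """I(X, F) from the joint histogram of a source image and the fused image."""
    joint, _, _ = np.histogram2d(x.ravel(), f.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]], density=True)
    joint = joint[joint > 0]
    h_joint = -np.sum(joint * np.log2(joint))
    return entropy(x) + entropy(f) - h_joint

def q_mi(x, y, f):
    """MI metric: 2 * ( I(X,F)/(H(X)+H(F)) + I(Y,F)/(H(Y)+H(F)) )."""
    return 2.0 * (mutual_information(x, f) / (entropy(x) + entropy(f)) +
                  mutual_information(y, f) / (entropy(y) + entropy(f)))
```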
The structural similarity (QY) measures the corresponding regions in the reference original image x and the test image y. It is defined as
$$Q(x, y, f \mid w) = \begin{cases} \lambda(w)\, SSIM(x, f \mid w) + \big(1 - \lambda(w)\big)\, SSIM(y, f \mid w), & \text{for } SSIM(x, y \mid w) \geq 0.75 \\ \max\left\{ SSIM(x, f \mid w),\ SSIM(y, f \mid w) \right\}, & \text{for } SSIM(x, y \mid w) < 0.75 \end{cases}$$
where $\lambda(w) = \dfrac{s(x \mid w)}{s(x \mid w) + s(y \mid w)}$ is the local weight, and $s(x \mid w)$ and $s(y \mid w)$ are the variances of x and y within the window $w$, respectively.
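The sketch below follows the piecewise rule above, using scikit-image for the local SSIM maps and a uniform filter for the local variances; it is an approximation of the Yang et al. metric [35], not the reference implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.metrics import structural_similarity

def q_y(x, y, f, win=7):
    """Sketch of the structural-similarity fusion metric Q_Y."""
    x, y, f = (im.astype(np.float64) for im in (x, y, f))
    _, s_xf = structural_similarity(x, f, win_size=win, data_range=255, full=True)
    _, s_yf = structural_similarity(y, f, win_size=win, data_range=255, full=True)
    _, s_xy = structural_similarity(x, y, win_size=win, data_range=255, full=True)

    local_var = lambda im: uniform_filter(im ** 2, win) - uniform_filter(im, win) ** 2
    vx, vy = local_var(x), local_var(y)
    lam = vx / (vx + vy + 1e-12)                      # local weight lambda(w)

    q = np.where(s_xy >= 0.75,
                 lam * s_xf + (1 - lam) * s_yf,       # sources consistent in this window
                 np.maximum(s_xf, s_yf))              # sources inconsistent: keep the best
    return q.mean()
```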
The edge information-based metric $Q^{AB/F}$ measures the amount of edge information that is transferred from the input images to the fused image. For the fusion of source images A and B resulting in a fused image F, the gradient strength $g(n, m)$ and orientation $\alpha(n, m)$ are extracted at each pixel $(n, m)$ of an input image, as given by
$$g_A(n, m) = \sqrt{s_A^x(n, m)^2 + s_A^y(n, m)^2}$$
$$\alpha_A(n, m) = \tan^{-1}\left( \frac{s_A^y(n, m)}{s_A^x(n, m)} \right)$$
where $s_A^x(n, m)$ and $s_A^y(n, m)$ are the outputs of the horizontal and vertical Sobel templates centered on pixel $p_A(n, m)$ and convolved with the corresponding pixels of input image A. The relative strength $G^{AF}(n, m)$ and orientation $A^{AF}(n, m)$ of the input image A with respect to the fused image F are calculated by
$$G^{AF}(n, m) = \begin{cases} \dfrac{g_F(n, m)}{g_A(n, m)}, & \text{if } g_A(n, m) > g_F(n, m) \\[2ex] \dfrac{g_A(n, m)}{g_F(n, m)}, & \text{otherwise} \end{cases}$$
$$A^{AF}(n, m) = 1 - \frac{\left| \alpha_A(n, m) - \alpha_F(n, m) \right|}{\pi / 2}$$
From these values, the edge strength and orientation values are derived, as given by
$$Q_g^{AF}(n, m) = \frac{\Gamma_g}{1 + e^{K_g \left( G^{AF}(n, m) - \sigma_g \right)}}$$
$$Q_\alpha^{AF}(n, m) = \frac{\Gamma_\alpha}{1 + e^{K_\alpha \left( A^{AF}(n, m) - \sigma_\alpha \right)}}$$
$Q_g^{AF}(n, m)$ and $Q_\alpha^{AF}(n, m)$ model the information loss between the input image A and the fused image F. The constants $\Gamma_g$, $K_g$, $\sigma_g$ and $\Gamma_\alpha$, $K_\alpha$, $\sigma_\alpha$ determine the exact shape of the sigmoid functions used to form the edge strength and orientation preservation values (the two equations above). The edge information preservation values are formed by
$$Q^{AF}(n, m) = Q_g^{AF}(n, m)\, Q_\alpha^{AF}(n, m)$$
with $0 \leq Q^{AF}(n, m) \leq 1$. The higher the value of $Q^{AF}(n, m)$, the less the loss of information in the fused image.
The fusion performance $Q^{AB/F}$ is evaluated as a weighted sum of the local information preservation estimates between each of the input images and the fused image. It is defined as
$$Q^{AB/F} = \frac{\sum_{n=1}^{N} \sum_{m=1}^{M} \left[ Q^{AF}(n, m)\, w_A(n, m) + Q^{BF}(n, m)\, w_B(n, m) \right]}{\sum_{i=1}^{N} \sum_{j=1}^{M} \left[ w_A(i, j) + w_B(i, j) \right]}$$
where $Q^{AF}(n, m)$ and $Q^{BF}(n, m)$ are the edge information preservation values, weighted by $w_A(n, m)$ and $w_B(n, m)$, respectively.
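A sketch of the $Q^{AB/F}$ computation is given below, following the gradient strength and orientation definitions above. The gradient strengths are used as the weights $w_A$ and $w_B$, and the sigmoid constants are illustrative placeholders; Reference [36] specifies its own tuned values.

```python
import cv2
import numpy as np

def q_abf(A, B, F, gamma=1.0, k_g=-10.0, sig_g=0.5, k_a=-20.0, sig_a=0.75):
    """Sketch of the edge-preservation fusion metric Q^{AB/F}."""
    def grad(img):
        img = img.astype(np.float32)
        sx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        sy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        return np.sqrt(sx ** 2 + sy ** 2), np.arctan(sy / (sx + 1e-12))

    def edge_preservation(gS, aS, gF, aF):
        # Relative gradient strength and orientation between a source and the fused image
        Gsf = np.where(gS > gF, gF / (gS + 1e-12), gS / (gF + 1e-12))
        Asf = 1.0 - np.abs(aS - aF) / (np.pi / 2)
        Qg = gamma / (1.0 + np.exp(k_g * (Gsf - sig_g)))   # strength preservation
        Qa = gamma / (1.0 + np.exp(k_a * (Asf - sig_a)))   # orientation preservation
        return Qg * Qa

    gA, aA = grad(A)
    gB, aB = grad(B)
    gF, aF = grad(F)
    Qaf = edge_preservation(gA, aA, gF, aF)
    Qbf = edge_preservation(gB, aB, gF, aF)
    wA, wB = gA, gB                                        # weights: source gradient strengths
    return (Qaf * wA + Qbf * wB).sum() / ((wA + wB).sum() + 1e-12)
```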
Table 1 presents the quantitative assessment values of the five different multi-focus fusion methods and the proposed method. The larger the value of these metrics, the better the image quality; the highest value in each row indicates the best-performing method. From Table 1, it can be seen that the proposed method produces the highest quality scores for all three objective metrics, except for QY on the “Canola 2” dataset and QAB/F on “Book” (extra images were also run to test the performance). These quality scores imply that the proposed method performs well, stably, and reliably. Overall, it can be concluded that the proposed method shows competitive performance compared with previous multi-focus fusion methods in both visual perception and objective metrics. Table 2 ranks the proposed method against the others based on the quality of the fused images; the performance (including the quality of the images and the processing time) is scaled from 1 to 6. The results show that the proposed technique outperforms the other previously published techniques.

4. Summary and Conclusions

To improve the detail and quality of images, especially images acquired with a digital camera or the Pi camera for canola phenotyping, an image fusion method is necessary. A new multi-focus image fusion method was proposed based on the combination of VS maps and the gradient domain fast guided filter. In the proposed algorithm, the VS maps were first deployed to obtain the visual saliency, gradient magnitude similarity saliency, and chrominance saliency (or color distortion); then the initial weight map was constructed from a mix of the three metrics. Next, the final decision weight maps were obtained by optimizing the initial weight map with a gradient domain fast guided filter on two components. Finally, the fused results were retrieved by combining the two-component weight maps and the two-component source images, which represent the large-scale and small-scale variations in intensity. The proposed method was compared with five representative fusion methods in both subjective and objective evaluations. Based on the experimental results, the proposed fusion method performs competitively with, or outperforms, some state-of-the-art methods, owing to the VS map measures and the gradient domain fast guided filter. The proposed method can use digital images captured by either a high-end or a low-end camera, especially the low-cost Pi camera. This fusion method can be used to improve images for trait identification in the phenotyping of canola or other species.
On the other hand, some limitations of the proposed multi-focus image fusion, such as small-blurred regions in the boundaries between the focused and defocused regions and computational cost, are worthwhile to investigate. Morphological techniques and optimizing the multi-focus fusion algorithm are also recommended for further study.
Furthermore, 3D modeling from enhancing depth images and image fusion techniques should be investigated. The proposed fusion technique can be implemented in the phenotyping system which has multiple sensors, such as thermal, LiDAR, or high-resolution sensors to acquire multi-dimensional images to improve the quality or resolution of the 2D and 3D images. The proposed system and fusion techniques can be applied in plant phenotyping, remote sensing, robotics, surveillance, and medical applications.

Author Contributions

T.C. and K.P. conceived, designed, and performed the experiments; T.C. also analyzed the data; A.D., K.W., S.V. contributed revising, reagents, materials, analysis tools; T.C., A.D., and K.W. wrote the paper.

Funding

The work is supported by The Canada First Research Excellence Fund through the Global Institute for Food Security, University of Saskatchewan, Canada.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Food and Agriculture Organization of the United Nations. The Future of Food and Agriculture—Trends and Challenges. Available online: http://www.fao.org/3/a-i6583e.pdf/ (accessed on 28 February 2018).
  2. Li, L.; Zhang, Q.; Huang, D. A review of imaging techniques for plant phenotyping. Sensors 2014, 14, 20078–20111. [Google Scholar] [CrossRef] [PubMed]
  3. Starck, J.L.; Fionn, S.; Bijaoui, A. Image Processing and Data Analysis: The Multiscale Approach; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
  4. Florack, L. Image Structure; Springer: Dordrecht, Netherlands, 1997. [Google Scholar]
  5. Nirmala, D.E.; Vaidehi, V. Comparison of Pixel-level and feature level image fusion methods. In Proceedings of the 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 11–13 March 2015; pp. 743–748. [Google Scholar]
  6. Olding, W.C.; Olivier, J.C.; Salmon, B.P. A Markov Random Field model for decision level fusion of multi-source image segments. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 2385–2388. [Google Scholar]
  7. Liang, F.; Xie, J.; Zou, C. Method of Prediction on Vegetables Maturity Based on Kalman Filtering Fusion and Improved Neural Network. In Proceedings of the 2015 8th International Conference on Intelligent Networks and Intelligent Systems (ICINIS), Tianjin, China, 1–3 November 2015. [Google Scholar] [CrossRef]
  8. Padol, P.B.; Sawant, S.D. Fusion classification technique used to detect downy and Powdery Mildew grape leaf diseases. In Proceedings of the 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC), Jalgaon, India, 22–24 December 2016. [Google Scholar] [CrossRef]
  9. Samajpati, B.J.; Degadwala, S.D. Hybrid approach for apple fruit diseases detection and classification using random forest classifier. In Proceedings of the 2016 International Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, India, 6–8 April 2016. [Google Scholar] [CrossRef]
  10. West, T.; Prasad, S.; Bruce, L.M.; Reynolds, D.; Irby, T. Rapid detection of agricultural food crop contamination via hyperspectral remote sensing. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2009), Cape Town, South Africa, 12–17 July 2009. [Google Scholar] [CrossRef]
  11. Dimov, D.; Kuhn, J.; Conrad, C. Assessment of cropping system diversity in the Fergana Valley through image fusion of Landsat 8 and Sentinel-1. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, III-7, 173–180. [Google Scholar] [CrossRef]
  12. Kaur, G.; Kaur, P. Survey on multifocus image fusion techniques. In Proceedings of the International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 3–5 March 2016. [Google Scholar] [CrossRef]
  13. Sun, J.; Han, Q.; Kou, L.; Zhang, L.; Zhang, K.; Jin, Z. Multi-focus image fusion algorithm based on Laplacian pyramids. J. Opt. Soc. Am. 2018, 35, 480–490. [Google Scholar] [CrossRef] [PubMed]
  14. Wan, T.; Zhu, C.; Qin, Z. Multifocus image fusion based on robust principal component analysis. Pattern Recognit. Lett. 2013, 34, 1001–1008. [Google Scholar] [CrossRef]
  15. Li, X.; Wang, M. Research of multi-focus image fusion algorithm based on Gabor filter bank. In Proceedings of the 2014 12th International Conference on Signal Processing (ICSP), Hangzhou, China, 19–23 October 2014. [Google Scholar] [CrossRef]
  16. Liu, Y.; Liu, S.; Wang, Z. Multi-focus image fusion with dense SIFT. Inf. Fusion 2015, 23, 139–155. [Google Scholar] [CrossRef]
  17. Phamila, Y.A.V.; Amutha, R. Discrete Cosine Transform based fusion of multi-focus images for visual sensor networks. Signal Process. 2014, 95, 161–170. [Google Scholar] [CrossRef]
  18. Zhang, Q.; Liu, Y.; Blum, R.S.; Han, J.; Tao, D. Sparse Representation based Multi-sensor Image Fusion: A Review. arXiv, 2017; arXiv:1702.03515. [Google Scholar]
  19. Zhang, Q.; Liu, Y.; Blum, R.S.; Han, J.; Tao, D. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images. J. Inf. Fusion 2018, 40, 57–75. [Google Scholar] [CrossRef]
  20. Zhou, Q.; Liu, X.; Zhang, L.; Zhao, W.; Chen, Y. Saliency-based image quality assessment metric. In Proceedings of the 2016 3rd International Conference on Systems and Informatics (ICSAI), Shanghai, China, 19–21 November 2016. [Google Scholar] [CrossRef]
  21. Kou, F.; Chen, W.; Wen, C.; Li, Z. Gradient Domain Guided Image Filtering. IEEE Trans. Image Process. 2015, 24, 4528–4539. [Google Scholar] [CrossRef] [PubMed]
  22. Zhang, Q.; Li, X. Fast image dehazing using guided filter. In Proceedings of the 2015 IEEE 16th International Conference on Communication Technology (ICCT), Hangzhou, China, 18–20 October 2015. [Google Scholar] [CrossRef]
  23. Zhang, L.; Gu, Z.; Li, H. SDSP: A novel saliency detection method by combining simple priors. In Proceedings of the 2013 20th IEEE International Conference on Image Processing (ICIP), Melbourne, VIC, Australia, 15–18 September 2013. [Google Scholar] [CrossRef]
  24. Zhang, L.; Shen, Y.; Li, H. VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  26. Zhou, Z.; Li, S.; Wang, B. Multi-scale weighted gradient-based fusion for multi-focus images. Inf. Fusion 2014, 20, 60–72. [Google Scholar] [CrossRef]
  27. Naidu, V.; Elias, B. A novel image fusion technique using dct based laplacian pyramid. Int. J. Inven. Eng. Sci. 2013, 1, 1–9. [Google Scholar]
  28. Li, S.; Kang, X.; Hu, J. Image Fusion with Guided Filtering. IEEE Trans. Image Process. 2013, 22, 2864–2875. [Google Scholar] [CrossRef] [PubMed]
  29. Paul, S.; Sevcenco, I.S.; Agathoklis, P. Multi-Exposure and Multi-Focus Image Fusion in Gradient Domain. J. Circuits Syst. Comput. 2016, 25, 1650123. [Google Scholar] [CrossRef] [Green Version]
  30. Li, S.; Kang, X.; Hu, J.; Yang, B. Image matting for fusion of multi-focus images in dynamic scenes. Inf. Fusion 2013, 14, 147–162. [Google Scholar] [CrossRef]
  31. Saeedi, J.; Faez, K. Multi Focus Image Dataset. Available online: https://www.researchgate.net/profile/Jamal_Saeedi/publication/273000238_multifocus_image_dataset/data/54f489b80cf2ba6150635697/multi-focus-image-dataset.rar (accessed on 5 February 2018).
  32. Nejati, M.; Samavi, S.; Shiraniv, S. Lytro Multi Focus Dataset. Available online: http://mansournejati.ece.iut.ac.ir/content/lytro-multi-focus-dataset (accessed on 5 February 2018).
  33. LunaPic Tool. Available online: https://www.lunapic.com (accessed on 5 February 2018).
  34. Hossny, M.; Nahavandi, S.; Creighton, D. Comments on ‘Information measure for performance of image fusion’. Electron. Lett. 2008, 44, 1066. [Google Scholar] [CrossRef]
  35. Yang, C.; Zhang, J.; Wang, X.; Liu, X. A novel similarity based quality metric for image fusion. Inf. Fusion 2008, 9, 156–160. [Google Scholar] [CrossRef]
  36. Xydeas, C.; Petrović, V. Objective image fusion performance measure. Electron. Lett. 2000, 36, 308–309. [Google Scholar] [CrossRef]
Figure 1. (a) Low-cost mobile phenotyping system; (b) Adjustable focus Pi camera; (c) System mounted on a tripod looking down to the canola plants.
Figure 2. Workflow of the proposed multi-focus image fusion algorithm.
Figure 3. An example of a source image and its saliencies and weight maps: (a) A source image; (b) visual saliency; (c) gradient saliency; (d) chrominance color (M); (e) chrominance color (N); (f) weight maps; (g) refined base weight map; (h) refined detail weight map.
Figure 4. Results: (a) Source image 1 of Canola 1; (b) Source image 2 of canola 1; (c) Reconstructed image using MWGF; (d) DCTLP; (e) GFF; (f) GDPB; (g) IM; (h) proposed method.
Figure 5. (a) Source image 1 of Canola 2; (b) Source image 2 of Canola 2; (c) MWGF; (d) DCTLP; (e) GFF; (f) GDPB; (g) IM; (h) proposed method.
Figure 6. (a) Source image 1 of Canola 4; (b) Source image 2 of Canola 4; (c) MWGF; (d) DCTLP; (e) GFF; (f) GDPB; (g) IM; (h) proposed method.
Figure 7. Results: (a) Source image 1 of Rose; (b) Source image 2 of Rose; (c) MWGF; (d) DCTLP; (e) GFF; (f) GDPB; (g) IM; (h) proposed method.
Table 1. Comparison of the proposed method with other methods.
Index | Source Images | MWGF | DCTLP | IM | GFF | GDPB | Proposed Algorithm
QMI | Canola 1 | 1.224 | 1.042 | 1.124 | 1.190 | 0.656 | 1.288
QMI | Canola 2 | 1.220 | 0.946 | 1.164 | 1.147 | 0.611 | 1.230
QMI | Canola 3 | 1.165 | 0.981 | 1.043 | 1.148 | 0.573 | 1.212
QMI | Canola 4 | 1.320 | 0.943 | 1.400 | 1.060 | 0.570 | 1.400
QMI | Doll | 0.664 | 0.918 | 0.881 | 0.310 | 0.732 | 1.011
QMI | Rose | 1.049 | 1.133 | 1.002 | 0.440 | 0.736 | 1.147
QMI | Jug | 1.065 | 1.085 | 0.974 | 0.347 | 0.742 | 1.094
QMI | Diver | 1.168 | 1.207 | 1.190 | 0.515 | 0.910 | 1.210
QMI | Book | 0.957 | 1.188 | 1.152 | 0.487 | 0.900 | 1.234
QMI | Notebook | 1.118 | 1.181 | 1.141 | 0.463 | 0.745 | 1.190
QY | Canola 1 | 0.958 | 0.851 | 0.812 | 0.948 | 0.755 | 0.970
QY | Canola 2 | 0.981 | 0.859 | 0.901 | 0.967 | 0.762 | 0.980
QY | Canola 3 | 0.961 | 0.856 | 0.752 | 0.955 | 0.737 | 0.970
QY | Canola 4 | 0.777 | 0.799 | 0.980 | 0.913 | 0.700 | 0.980
QY | Doll | 0.902 | 0.950 | 0.960 | 0.800 | 0.872 | 0.980
QY | Rose | 0.973 | 0.979 | 0.973 | 0.829 | 0.901 | 0.980
QY | Jug | 0.995 | 0.990 | 0.970 | 0.970 | 0.779 | 0.995
QY | Diver | 0.975 | 0.971 | 0.976 | 0.744 | 0.918 | 0.976
QY | Book | 0.952 | 0.956 | 0.959 | 0.647 | 0.850 | 0.977
QY | Notebook | 0.987 | 0.983 | 0.991 | 0.844 | 0.816 | 0.992
QAB/F | Canola 1 | 0.958 | 0.885 | 0.938 | 0.930 | 0.883 | 0.974
QAB/F | Canola 2 | 0.987 | 0.987 | 0.981 | 0.987 | 0.987 | 0.987
QAB/F | Canola 3 | 0.955 | 0.621 | 0.937 | 0.841 | 0.607 | 0.970
QAB/F | Canola 4 | 0.906 | 0.492 | 0.915 | 0.529 | 0.481 | 0.915
QAB/F | Doll | 0.987 | 0.986 | 0.987 | 0.986 | 0.987 | 0.987
QAB/F | Rose | 0.987 | 0.987 | 0.987 | 0.986 | 0.987 | 0.987
QAB/F | Jug | 0.987 | 0.987 | 0.987 | 0.986 | 0.987 | 0.987
QAB/F | Diver | 0.986 | 0.986 | 0.986 | 0.986 | 0.986 | 0.986
QAB/F | Book | 0.984 | 0.980 | 0.984 | 0.984 | 0.983 | 0.984
QAB/F | Notebook | 0.986 | 0.987 | 0.987 | 0.986 | 0.987 | 0.987
Table 2. Ranking the performance of fused images of the proposed method with other methods based on the results from Table 1.
Source Images | MWGF | DCTLP | IM | GFF | GDPB | Proposed Algorithm
Canola 1 | 2 | 5 | 4 | 3 | 6 | 1
Canola 2 | 2 | 5 | 4 | 3 | 6 | 1
Canola 3 | 2 | 5 | 4 | 3 | 6 | 1
Canola 4 | 2 | 4 | 1 | 3 | 5 | 1
Doll | 5 | 2 | 3 | 6 | 4 | 1
Rose | 3 | 2 | 4 | 5 | 6 | 1
Jug | 3 | 2 | 4 | 6 | 5 | 1
Diver | 4 | 2 | 3 | 6 | 5 | 1
Book | 4 | 2 | 3 | 6 | 5 | 1
Notebook | 4 | 3 | 2 | 6 | 5 | 1
