Comparison Between Three Registration Methods in the Case of Non-Georeferenced Close-Range Multispectral Images

This study evaluated three geometric transformations in an image registration method applied to non-georeferenced multispectral images acquired at close range over greenhouse cucumber plants with a Micasense® RedEdge camera. The detection of matching points was performed using SURF features, and outlier matching points were removed using the MSAC algorithm. For each geometric transformation (affine, similarity, and projective), we mapped the matching points of the blue, green, red, and NIR band images into the red-edge band space and computed the root mean square error (RMSE, in pixels) to estimate the accuracy of each transformation. We achieved an RMSE of less than 1 pixel with the similarity and affine transformations and of less than 2 pixels with the projective transformation, regardless of the band image. We determined that the affine transformation was the best because it produced RMSEs that were less than 1 pixel and followed a Gaussian distribution.


INTRODUCTION
Powdery mildew, which is caused by the fungus Podosphaera xanthii, is a major disease in cucumber greenhouses and may lead to yield losses of 30 to 50% of the total production [1]. Powdery mildew develops haustoria that cause internal structural damage to the colonized cell walls of leaves, petioles, and stems [2]. Such changes in the cell walls can be detected using near-infrared imagery [3]. Infected plants are also subject to chlorophyll degradation, which can be detected in the visible bands [4]. Acquiring imagery in the visible and near-infrared regions of the electromagnetic spectrum requires the use of multispectral cameras. Under greenhouse conditions, studies on the detection of cucumber powdery mildew have mainly used RGB images acquired over single leaves with visible symptoms [5]. RGB cameras have the advantage of using a single sensor with filters to produce the red, green, and blue images. As a result, the three images are automatically well aligned.
However, in the case of multispectral imagery acquired with a robot in a greenhouse, the close-range imagery is not georeferenced, and an image registration method is needed to properly align the band images. Such alignment is even more critical for close-range images because of the very small pixel size. For such a registration, the images that need to be registered (known as the moving images) are registered against a reference image (known as the fixed image) [6]. The registration process involves detecting features and their locations on the moving and fixed images and then geometrically transforming the moving image into the fixed image space [7].
This study aims to evaluate three geometric transformations in an image registration method applied to non-georeferenced multispectral images acquired at close range over greenhouse cucumber plants with a Micasense® RedEdge (Micasense, Inc., Seattle, Washington, USA) camera. The camera was attached to a mechanical extension of a mobile cart. The three transformations use the blue, green, red, and NIR bands as moving images and the red-edge band as the fixed reference image because the red-edge sensor has a central position on the camera.

MATERIALS AND METHODS

Image Acquisition
Forty-five multispectral images were collected with a Micasense® RedEdge camera over healthy and infected cucumber plants located in a greenhouse belonging to Great Lakes Greenhouses Inc., a horticultural company in Leamington, Canada (42°04ʹ27ʹʹN 82°35ʹ15ʹʹW). The Micasense® RedEdge camera has five bands (Table 1) and a field of view of 47.2°. The camera was attached to a metal structure mounted on a cart with wheels to facilitate its movement inside the greenhouse aisles. The camera was positioned at close range, 1.5 m above the top of the cucumber plants. As a result, the pixel size of the images was 0.10 cm.
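The reported pixel size follows directly from the camera geometry. The short check below assumes a horizontal image width of 1280 pixels for the RedEdge sensor (not stated above) and treats the 47.2° value as the horizontal field of view:

```python
import math

FOV_DEG = 47.2          # horizontal field of view from the text
HEIGHT_M = 1.5          # camera height above the canopy
SENSOR_WIDTH_PX = 1280  # assumed horizontal resolution of the RedEdge imager

# Ground width covered by the image at the canopy, then per-pixel size
swath_m = 2 * HEIGHT_M * math.tan(math.radians(FOV_DEG / 2))
pixel_cm = 100 * swath_m / SENSOR_WIDTH_PX
print(f"{pixel_cm:.2f} cm")  # → 0.10 cm, matching the value reported above
```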

Image Registration
All data were processed using MATLAB R2020a (MathWorks, Inc., Natick, Massachusetts, USA). The information related to the MATLAB R2020a functions and their parameters used in this study was obtained from www.mathworks.com. Figure 1 presents the flowchart of the image processing used in this study. The collected images and their respective bands were imported into the MATLAB workspace and converted from the uint16 to the uint8 format using the im2uint8 function. Then, the first 700 columns were removed from each band image because this image region corresponds to the greenhouse aisle. The images were then subjected to the image registration process, which includes (i) SURF feature detection and matching; (ii) geometric transformation; (iii) image warping; and (iv) computation of the RMSE (in pixels) of the positions of the inlier matching points between the fixed and transformed moving images. In this study, we used raw DN images without converting them into reflectance values.

Figure 1. Flowchart of the image registration methodology used in this study to register close-range non-georeferenced images acquired with the Micasense® RedEdge multispectral camera over cucumber plants.
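The preprocessing steps above can be sketched outside MATLAB as follows. This NumPy version assumes im2uint8 performs a linear 16-to-8-bit rescaling (which is its documented behavior for full-range uint16 data) and uses the 700-column crop described above:

```python
import numpy as np

def preprocess_band(band_u16, crop_cols=700):
    """Convert a uint16 band image to uint8 and drop the first columns.

    Mirrors the steps described in the text: an im2uint8-style linear
    rescaling (uint16 / 257 -> uint8), then removal of the leading
    columns that image the greenhouse aisle.
    """
    band_u8 = (band_u16.astype(np.float64) / 65535.0 * 255.0).round().astype(np.uint8)
    return band_u8[:, crop_cols:]

# Synthetic 960 x 1280 band image standing in for one RedEdge band
band = np.random.default_rng(1).integers(0, 65536, size=(960, 1280), dtype=np.uint16)
out = preprocess_band(band)
print(out.dtype, out.shape)  # → uint8 (960, 580)
```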
The detection of matching points was performed using the detectSURFFeatures function, which uses a multiscale analysis of the Hessian matrix to detect blob-like structures (SURF features) that are stored in SURFPoints objects [8]. The function parameters were modified from their default values to obtain the highest possible number of blobs. Once the SURFPoints objects were obtained, the extractFeatures function was used to obtain the extracted feature vectors (descriptors) and their corresponding locations on both the moving and fixed images. The matchFeatures function was then used to match the extracted features from the moving and fixed images. The image registration process involves a geometric transformation that transforms the moving image into the Red-Edge band space. It is based on the matching points of the moving and fixed images. Such a transformation corrects the image distortions and allows band alignment. The ideal geometric transformation will remove only the spatial distortions between images [9]. The geometric transformation was done as follows. First, the M-estimator Sample Consensus (MSAC) algorithm embedded in the estimateGeometricTransform function was used to exclude outlier matching points [10]. The MSAC algorithm is a faster variant of the Random Sample Consensus (RANSAC) algorithm [11].
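Descriptor matching of the kind performed by matchFeatures can be sketched generically. The example below uses nearest-neighbour matching with Lowe's ratio test on synthetic descriptors; the ratio test is a common heuristic for limiting ambiguous matches, not necessarily the exact criterion used by matchFeatures:

```python
import numpy as np

def match_descriptors(desc_moving, desc_fixed, ratio=0.7):
    """Match each moving descriptor to its nearest fixed descriptor.

    A match is kept only if the best distance is clearly smaller than
    the second-best distance (Lowe's ratio test), which suppresses
    ambiguous correspondences before robust transform estimation.
    """
    matches = []
    for i, d in enumerate(desc_moving):
        dists = np.linalg.norm(desc_fixed - d, axis=1)
        j, k = np.argsort(dists)[:2]  # nearest and second-nearest
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches

# Synthetic test: moving descriptors are the fixed ones, reversed and noisy
rng = np.random.default_rng(2)
desc_fixed = rng.random((50, 64))
desc_moving = desc_fixed[::-1] + rng.normal(0, 0.01, (50, 64))
matches = match_descriptors(desc_moving, desc_fixed)
print(len(matches))  # → 50, each moving descriptor i matched to fixed 49 - i
```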
Once the outliers are removed, the estimateGeometricTransform function creates a 2D geometric transformation object containing the transformation (T) matrix that defines the geometric transformation type. We considered all three available geometric transformations in this study, i.e., affine, similarity, and projective. Details can be found in [12,13] for the similarity and affine transformations and in [13,14] for the projective transformation. To apply the geometric transformation to each moving image (blue, green, red, and NIR bands), we used the imwarp function with the respective T matrix. The imwarp function returns the moving image transformed into the Red-Edge band space. No computational load issues were observed because each method was computed separately and required modest computational resources.
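The MSAC step can be illustrated with a minimal sketch for the affine case. This is not MATLAB's estimateGeometricTransform; it is a NumPy reimplementation of the idea, where MSAC scores each hypothesis by a truncated residual sum rather than a plain inlier count (the feature that distinguishes it from RANSAC):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ≈ [x y 1] @ A.T, with A a 2x3 matrix."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T

def msac_affine(src, dst, n_iter=200, tol=2.0, seed=0):
    """MSAC-style robust affine estimation from point correspondences."""
    rng = np.random.default_rng(seed)
    ones = np.ones((len(src), 1))
    best_cost, best_inliers = np.inf, None
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal affine sample
        A = fit_affine(src[idx], dst[idx])
        res = np.linalg.norm(np.hstack([src, ones]) @ A.T - dst, axis=1)
        cost = np.minimum(res, tol).sum()  # truncated loss (MSAC scoring)
        if cost < best_cost:
            best_cost, best_inliers = cost, res < tol
    # Refit on the inliers of the best hypothesis
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic correspondences: a known affine map, small noise, 5 gross outliers
rng = np.random.default_rng(1)
A_true = np.array([[1.01, 0.02, 3.0], [-0.01, 0.99, -2.0]])
src = rng.uniform(0, 100, (40, 2))
dst = np.hstack([src, np.ones((40, 1))]) @ A_true.T + rng.normal(0, 0.05, (40, 2))
dst[-5:] += 50.0  # corrupt the last 5 correspondences
A_est, inliers = msac_affine(src, dst)
print(inliers.sum())
```

The recovered matrix plays the role of the T matrix above; the excluded correspondences correspond to the outlier matching points removed before warping.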

RMSE Computation
The inlier matching-point positions on both the fixed and transformed moving images were used to compute a root mean square error (RMSE) to assess the registration accuracy quantitatively. First, we used the transformPointsForward function to determine the pixel coordinates (in image rows and columns) of each inlier matching point on both images. The RMSE (in pixels) between the positions of the inlier matching points on the fixed and transformed moving images was then computed as follows (Equation 1):

RMSE = √[ (1/n) Σᵢ₌₁ⁿ ( (x̂ᵢ − xᵢ)² + (ŷᵢ − yᵢ)² ) ]  (Equation 1)

Where:
• x̂ᵢ = column number of the inlier matching point in the fixed (Red-Edge) image;
• xᵢ = column number of the inlier matching point in the transformed moving image (blue, green, red, or NIR image);
• ŷᵢ = row number of the inlier matching point in the fixed (Red-Edge) image;
• yᵢ = row number of the inlier matching point in the transformed moving image (blue, green, red, or NIR image);
• n = number of inlier matching points;
• RMSE = root mean square error over all the inlier matching points of the moving image (in pixels).
The resulting RMSEs were then plotted using boxplots to compare the performance of each transformation. We also plotted the distribution of the RMSEs because RMSEs should follow a Gaussian rather than a uniform distribution [15].
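Equation 1 translates directly into code. The sketch below computes the registration RMSE from two arrays of inlier matching-point coordinates (one row per point, columns being image column and row), using illustrative values:

```python
import numpy as np

def registration_rmse(fixed_pts, moving_pts):
    """RMSE (in pixels) between inlier matching-point positions (Equation 1)."""
    # Per-point squared displacement: (x̂ᵢ − xᵢ)² + (ŷᵢ − yᵢ)²
    d2 = ((fixed_pts - moving_pts) ** 2).sum(axis=1)
    return float(np.sqrt(d2.mean()))

# Three illustrative inlier points, each off by half a pixel in one axis
fixed_pts = np.array([[10.0, 20.0], [30.0, 40.0], [52.0, 61.0]])
moving_pts = np.array([[10.5, 20.0], [30.0, 39.5], [52.5, 61.0]])
print(registration_rmse(fixed_pts, moving_pts))  # → 0.5
```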

RESULTS AND DISCUSSION
The minimum, maximum, mean, standard deviation, and standard error of the number of matching points obtained using the SURF features method between the blue, green, red, or NIR band images (moving images) and the Red-Edge band image (fixed image) are given in Table 2. Table 3 gives the same descriptive statistics for the number of inlier matching points obtained by applying the MSAC algorithm for each geometric transformation. The total number (and percentage) of matching points was reduced more strongly with the projective transformation than with the two other transformations, regardless of the band. The RMSEs between the positions of the inlier matching points on the fixed and transformed moving images were plotted using boxplots (Figure 2). The projective transformation had the highest RMSEs for all the band images, with RMSE values near 1.5 pixels (Figure 2). The corresponding RMSEs were less than 1 pixel with the two other transformations (Figure 2). Our RMSEs were lower than those (higher than 2.5 pixels) obtained by Haddadi and Leblon [16], who performed image registration and band alignment on the same non-georeferenced multispectral Micasense® RedEdge images as our study. There are several differences between our study and that of Haddadi and Leblon [16]. They located the matching points with the Harris corner detector [17] and the Scale Invariant Feature Transform (SIFT) algorithm [18], while we used the SURF feature detector and the matchFeatures function of MATLAB R2020a. The best image registration method should produce RMSEs that have a Gaussian distribution because model errors are likely to follow a Gaussian rather than a uniform distribution [15]. Figure 3 presents the distribution of the RMSEs as a function of the band and geometric transformation. The RMSEs followed a Gaussian distribution for the affine transformation for all the band images except the blue band.

CONCLUSIONS
We compared three geometric transformations for registering 45 non-georeferenced multispectral images that were acquired at close range over mature cucumber plants with a Micasense® RedEdge camera. The best transformation was the affine transformation because its RMSEs were less than 1 pixel and followed a Gaussian distribution for all the band images except the blue band.
Our study shows that the green and NIR band images yield more matching points because they correspond to high-reflectance bands and are very sensitive to the canopy and leaf area. Future research under greenhouse conditions should investigate whether these factors influence the accuracy of the image registration and band alignment. Such a test could be done using younger cucumber plants that have a lower leaf area. While the results of this study are promising, they were obtained from a limited number of images. Further work is needed to test the method on a broader sample of images and to make the algorithm more robust.