Reflectance Estimation from Multispectral Linescan Acquisitions under Varying Illumination—Application to Outdoor Weed Identification †

To reduce the amount of herbicides used to eradicate weeds and ensure crop yields, precision spraying can effectively detect and locate weeds in the field thanks to imaging systems. Because weeds are visually similar to crops, color information is not sufficient for effectively detecting them. Multispectral cameras provide radiance images with a high spectral resolution, thus the ability to investigate vegetated surfaces in several narrow spectral bands. Spectral reflectance has to be estimated in order to make weed detection robust against illumination variation. However, this is a challenge when the image is assembled from successive frames that are acquired under varying illumination conditions. In this study, we present an original image formation model that considers illumination variation during radiance image acquisition with a linescan camera. From this model, we deduce a new reflectance estimation method that takes illumination at the frame level into account. We experimentally show that our method is more robust against illumination variation than state-of-the-art methods. We also show that the reflectance features based on our method are more discriminant for outdoor weed detection and identification.


Introduction
Nowadays, one of the biggest topics in precision farming is to increase yield production while reducing the quantity of chemicals. In order to optimize the application of herbicides in crop fields, less toxic and expensive weed control alternatives can be considered due to recent advances in imaging devices. During the last decade, sophisticated multispectral sensors have been manufactured and deployed in crop fields, leading to weed detection [1][2][3].
Multispectral cameras collect data over a wide spectral range and they provide the ability to investigate the spectral responses of soils and vegetated surfaces in narrow spectral bands. Two main categories of devices can be distinguished in multispectral image acquisition. "Snapshot" (multi-sensor or filter array-based) devices build the image from a single shot [4]. Although this technology provides multispectral images at a video frame rate, the few acquired channels and low spatial resolution may not be sufficient for fully exploring the vegetation spectral signatures. "Multishot" (tunable filter or illuminationbased, push-broom, and spatio-spectral linescan) devices build the image from several and successive frame acquisitions [5][6][7]. Despite being restricted to still scenes, they provide images with a high spectral and spatial resolution. We use a multishot camera, called the "Snapscan" to acquire outdoor multispectral radiance images of plant parcels in a greenhouse under skylight [8]. From this radiance information, reflectance is estimated as an illumination-invariant spectral signature of each species. Several methods have been proposed for computing reflectance thanks to prior knowledge regarding cameras or illumination conditions [9][10][11]. In field conditions, the typical methods first estimate the illumination by including a reference device (a white diffuser or a color-checker chart) in the scene [2,[12][13][14]. Subsequently, reflectance is estimated at each pixel p by the channel-wise division of the value of the radiance image at p by the pixel values that characterize the white diffuser or the color-checker white patch. In [15], an extension to the multispectral domain of four algorithms traditionally applied to RGB images is proposed for estimating the illumination. In [16], a Bragg-grating-based multispectral camera acquires outdoor radiance images and reflectance is estimated from two white diffusers, one along the image bottom border and another fully visible one. A coarse reflectance is computed first, and then rescaled using illumination-based scaling factors. Finally, the resulting reflectance image is normalized channel-wise by the average reflectance value that is computed over pixels of the second white diffuser that is present in the scene. In [17], a multispectral camera is used in conjunction with a skyward pointing spectrometer to estimate the reflectance from the acquired scene radiance.
These methods require additional devices and knowledge regarding the spectral sensitivity functions (SSFs) of the sensor filters. They also often assume constant incident illumination throughout a few seconds. However, in outdoor conditions, illumination may vary significantly during the successive frame acquisitions (scans) that last for several seconds.
In this paper, we propose a reflectance estimation method that is robust to illumination variations during the multispectral image acquisition that was performed by the Snapscan camera. In Section 2, we provide details regarding multispectral radiance image acquisition with this device and propose an original model for such image formation. In Section 3, this model is first used to study reflectance estimation under constant illumination. We also show that, when outdoor illumination varies during the frame acquisitions, the assumption about spatio-spectral correlation does not hold. Based on this model, we then propose a new method for estimating the scene reflectance from a multispectral radiance image acquired in uncontrolled and varying illumination conditions (see Section 4). Section 5 presents an experimental evaluation of the proposed reflectance estimation, and Section 6 shows the results of weed/crop segmentation using the estimated reflectance.

Multispectral Radiance Image Acquisition by Snapscan Camera
In this section, we first detail how the Snapscan camera achieves radiance measurement and frame acquisition. Subsequently, we explain how a multispectral image is obtained from the successively acquired frames. We propose an original multispectral image formation model that handles how illumination is associated to both the considered band and pixel because the radiance that is associated to a given spectral band at a pixel is measured in a frame that is acquired at a specific time. From this new model, we show that the spatio-spectral correlation assumptions do not hold when illumination varies during the frame acquisitions.

Radiance Measurement
The Snapscan is a multispectral camera manufactured by IMEC that embeds a single matrix sensor that is covered by a series of narrow stripes of Fabry-Perot integrated filters. It contains B = 192 optical filters whose central wavelengths range from λ 0 = 475.1 nm to λ B−1 = 901.7 nm with a variable center step (from 0.5 nm to 5 nm). Specifically, each filter of index b ∈ [0, B − 1] is associated with five adjacent rows of 2048 pixels that form a filter stripe, and it samples a band from the visible or near infra-red spectral domain according to its SSF T b (λ) with a full width at half maximum between 2 nm and 10 nm.
The Snapscan camera acquires a sequence of frames to provide a multispectral image. During frame acquisitions, the object and camera both remain static while the sensor moves and illumination may change. Therefore, the measurement of the radiance that is reflected by a given lambertian surface element s of the scene varies according to the frame acquisition time t, although s is projected at a fixed point q of the image plane. Let us denote, as E t (λ) ∈ [0, 1], the relative spectral power distribution (RSPD) of the illumination at t and assume that it homogeneously illuminates all of the surface elements of the scene. The radiance that is reflected by s and refracted by the camera lens projects onto the image plane at q as a stimulus L t,q (λ): where R q (λ) ∈ [0, 1] is the spectral reflectance of the surface element s that is observed by q, and A q (λ) ∈ [0, 1] is the optical attenuation of the camera lens at q. All of these functions depend on the wavelength λ. The sensor moves forward on the image plane according to the direction perpendicular to the filter stripes (see Figure 1a). Between two successive frame acquisitions, it moves by a constant step v = 5 (in pixels) that is equal to the number of rows in each stripe. Therefore, the radiance that is measured at q is filtered by a different Fabry-Perot filter of index b t,q (0 ≤ b t,q < B) at each acquisition time t. The radiance at q is fully sampled over N frame acquisitions, provided that each of them measures the radiance there, i.e., N ≥ B. Let the coordinates of point q be (x q , y q ) C in the camera 2D coordinate system (O, x, y) whose origin O corresponds to the intersection between the optical axis and image plane. The unit vectors of x and y are given by the photo-sensitive element size (i.e., axis units match with pixels), and y is oriented opposite to the sensor movement. At a given point q, the filter index b t,q can then be expressed as: where y 0 is the coordinate along y of the first filter row at first acquisition time t = 0. Note that the light stimulus L t,q is only associated to a filter at a given point q when t 0 q ≤ t < t B q . The lower bound t 0 q = (y 0 − y q )/v is the acquisition time at which the first optical filter of the sensor observes L t,q . The upper bound t B q = (y 0 − y q )/v + B is the time at which all of the sensor filters have observed L t,q .
Besides , at a given time t, the coordinate y q of point q that is associated to a photosensitive element of the sensor satisfies: since 0 ≤ b t,q < B. Given these restrictions, the radiance S t,q that is then measured at q by the sensor at acquisition time t is expressed as: where Q is the quantization function according to the camera bit depth, τ is the integration time of the frames, and Ω is the working spectral domain. Note that τ is set to the highest possible value that provides no saturated pixel.

Frame Acquisition
The radiance that is measured at q is stored by the camera as a pixel value f t,q = S t,q in frame f t (see Figure 1b). We define the coordinate system (O , x , y ) attached to the sensor, such that origin O is the first (top-left) photo-sensitive element location, axis y corresponds to y, and x is parallel to x, in order to compute the coordinates of q relative to the frame. In this frame system, the coordinates of q are (x q , y q ) F = (x q , y q − y 0 + t · v) F . Note that Equation (3) allows us to check that 0 ≤ y q < B · v.
Snapscan camera  Conversely , any given pixel p(x p , y p ) F of a frame f t is mapped to the coordinates in the camera coordinate system as: at which the stimulus L t,p (λ) of a surface element radiance is filtered by the filter of index b p = y p /v . From this point of view, each frame pixel value is, therefore, also expressed as: Before the frame acquisitions, the Snapscan uses its internal shutter to acquire a dark frame f dark whose values are subtracted pixel-wise from the acquired frames. Therefore, we assume that the pixel value that is expressed by Equation (6) is free from thermal noise.
Let us also point out that, at two (e.g., successive) acquisition times t 1 and t 2 , the sensor is at different locations. Therefore it acquires the values f t 1 ,p and f t 2 ,p from the stimuli L t 1 ,p and L t 2 ,p of two different surface elements at a given pixel p whose coordinate y p in the camera system is time-dependent (see Equation (5)). Besides, the stimuli L t 1 ,p and L t 2 ,p are filtered by the same filter whose index only depends on the pixel coordinate y p in the frame system. Equations (4) and (6) model the radiance that is measured at a given point in the image plane and stored at a given pixel of a frame, respectively. Both of the equations take account of illumination variation during the frame sequence acquisition, but differently take the sensor movement into account. Indeed, the filter index changes at a given point of the image plane during the frame acquisition (see Equation (4)), whereas the observed surface element changes at a given pixel in the successive frames (see Equation (6)).

Stripe Assembly
We now determine the first and last acquisition times of the frame sequence that is required to capture an object of interest whose projection points on the image plane are bounded along the y axis by q a (x q a , y q a ) C and q m (x q m , y q m ) C , with y q a > y q m . Given the initial coordinate y 0 of the sensor along y, we can compute the first and last frame acquisition times t 0 q a and t B−1 q m , so that the measured radiances at the points between q a and q m are consecutively filtered by the B sensor filters (see the top part of Figure 2). The acquisition of the multispectral image from the frame sequence { f t } t B−1 qm t=t 0 qa takes account of the spatial and spectral organizations of each frame. A frame f t is spatially organized as juxtaposed stripes of v adjacent pixel rows. A stripe f b t , b = 0, . . . , B − 1, of v adjacent pixel rows contains the spectral information of the scene radiance that is filtered according to the SSF T b (λ) of filter b centered at wavelength λ b . All of the stripes that are associated with filter b in the acquired frames are stacked by the assembly function to provide a stripe assembly defined as: The size of each stripe assembly is 2048 pixels in width and N · v pixels in height, where N = t B−1 q m − t 0 q a /∆ + 1 is the number of acquired frames and ∆ is the frame acquisition period.
To form the multispectral image I (B) = {I b } B−1 b=0 of the object of interest, only the scene part that is common to all stripe assemblies is considered by the camera (see the bottom part of Figure 2). Specifically, the retained stripes in the b-th assembly are acquired between t b q a and t b q m to form each channel I b : The multispectral image I (B) has its own coordinate system. For convenience, in the sequel, we denote a pixel as p(x, y) in this system, since the camera and frame coordinate systems are not used any longer.

Formation Model of a Multispectral Image Acquired by Snapscan Camera
We can now infer an image formation model for multishot linescan cameras, such as the Snapscan. At any pixel p, the radiance value (8)). It results from the light stimulus L t b p ,p that was filtered according to T b (whose index dependence upon p is dropped by stripe assembly step), and is therefore defined from Equations (1) and (6), as: The term E t b p (λ) shown in Equation (9) points out that illumination is associated to both a channel index and a pixel. These dependencies may weaken the spatio-spectral correlation assumptions of the measured scene radiance.
Spectral correlation relies on the assumption that the SSFs that are associated to adjacent spectral channels strongly overlap. Thus, radiance measures at a given pixel in these channels should be very similar (or correlated). Let us consider the radiance values in two channels b 1 and b 2 at a given pixel p. Even if the SSFs T b 1 (λ) and T b 2 (λ) strongly overlap (and are equal in the extreme case), the illumination conditions at t b 1 p . Spatial correlation relies on the assumption that the reflectance across locally close surface elements of a scene does (almost) not change. Thus, under the same illumination, the radiance measures at their associated pixels within a channel are correlated. Let us consider two pixels, p 1 (x p 1 , y p 1 ) and p 2 (x p 2 , y p 2 ), which observe surface elements of a scene with the same reflectance R p 1 (λ) = R p 2 (λ) for all λ ∈ Ω. If |y p 1 − y p 2 | ≥ v, then the radiances at p 1 and p 2 are acquired at different times t b p 1 and t b p 2 associated to different illumination conditions E t b p1 and E t b p2 , hence I b p 1 = I b p 2 . Therefore, the spatio-spectral correlation assumption does not hold in the image formation model of the Snapscan camera when illumination varies.

Reflectance Estimation with a White Diffuser under Constant Illumination
This short section introduces how to estimate reflectance by a classical (white diffuserbased) method and how the result should be post-processed to ensure its consistency.

Reflectance Estimation
In order to estimate spectral reflectance from radiance images that were acquired under an illumination that is almost constant over time, one classically uses the image I (B) [WD] of a white diffuser acquired in full field beforehand and assumes that: (i) The illumination is spatially uniform and it does not vary during the frame acquisitions, and p ∈ I (B) , and Equation (9) becomes: Note that the quantization function Q is omitted here, since the different terms are considered as being already quantized.
(ii) Each of the Fabry-Perot filters has an ideal SSF 0 otherwise, such that Equation (10) becomes: Reflectance is then derived for any pixel p that is associated to a spectral band centered at λ b as: The white diffuser is supposed to be perfectly diffuse and reflect the incident light with a constant diffuse reflection factor ρ wd . Hence, for I (B) [WD], we can write: where τ wd is the frame integration time of I (B) [WD]. Plugging Equation (13) into (12) yields the reflectance image that is estimated from a B-channel radiance image I (B) : This reflectance estimation model implicitly compensates the vignetting effect, since the white diffuser and object (scene of interest) occupy the same (full) field of view. Accordingly, I b p and I b p [WD] are affected by the same optical attenuation whose effect vanishes after division.
The estimated B-channel reflectance imageR (B) should then undergo two postprocessing steps: spectral correction and negative value removal.

Spectral Correction
Each of the Snapscan Fabry-Perot filters is designed to sample a specific spectral band from the spectrum according to its SSF T b (λ). However, because of the SSFs and optical properties of some filters (angular dependence [18], high-energy harmonics), several spectral bands are redundant, which limits the accuracy of the spectral imaging system. This leads to redundancy in spectral bands and introduces spectral information bias. Therefore, the reflectance image with B = 192 spectral channels is spectrally corrected and only K = 141 channels are kept in practice.
The spectral correction ofR (B) provides a spectrally corrected K-channel reflectance imageR (K) that is expressed at each pixel p as: where M is the sparse K × B correction matrix that is provided by the calibration file of our Snapscan camera. The linear combinations of the channel values ofR for the bands (referred to as "virtual" bands by IMEC) that are associated to the image channels, but the spectral working domain Ω = [475.1 nm, 901.7 nm] is unchanged.

Negative Value Removal
The acquired radiance image contains negative values due to dark frame subtraction, when the value of a dark frame pixel is higher than the measured radiance at this pixel. This generally occurs in low-dynamics channels, where the central wavelengths are in the range [475.1 nm, 560.4 nm] (before spectral correction). These negative values may lay on vegetation pixels and corrupt reflectance estimation at these pixels. Because we intend to classify vegetation pixels, this could lead to unexpected prediction errors. Negative values also occur-for even more pixels-in the spectrally-corrected reflectance imageR (K) (see Equation (15)), because the correction matrix M contains negative coefficients. Negative values have no physical meaning and they must be discarded. Because our images mostly contain smooth textures (vegetation, reference panels, soil), we consider that, unlike radiance, reflectance values are highly correlated over close surface elements.
Thus, we propose correcting negative values in imageR (K) by conditionally using a 3 × 3 median filter, as:R whereR k ref,p is the final reflectance value at pixel p for channel k. Because we consider the reflectance that is estimated by this model (Equations (14)-(16)) as a reference, it is denoted asR (K) ref .

Outdoor Reflectance Estimation with Reference Devices in the Scene
Because illumination varies during the acquisitions of outdoor scene images, the reflectance estimation method that is described by Equation (14) is not adapted to linescan cameras, such as the Snapscan. In such a case, one solution is to use several reference devices [16].
As a first reference device, we use a white diffuser tile mounted on the acquisition system, so that the sensor vertically observes a portion of it (see Figure 3a). Therefore, the pixel subset W D contains (about 10%) right border pixels that represent the white diffuser, as shown in Figure 3b. Because W D spans all the image rows, we further extract a small white square W S that represents a sample of this reference device. Each acquired image also contains a GretagMacbeth ™ ColorChecker that is principally used to assess the performances that are reached by reflectance estimation methods. The pixel subset W P representing the ColorChecker white patch is used as a second reference device by the double white diffuser (dwd) method [16] that we have adapted to our Snapscan acquisitions, as described in Appendix A.
Although the vignetting effect only depends on the intrinsic camera properties, this method corrects it in each acquired image. In Section 4.1, we propose performing this correction by the analysis of the white diffuser image I (B) [WD]. Subsequently, we present state-of-the art estimation reflectance methods that only require one reference device in the scene, but assume that the illumination is constant during the frame acquisition. We finally propose a single-reference method to estimate reflectance in the case of varying illumination.

Vignetting Correction
The dwd method acquires a full-field white diffuser image before each scene image acquisition in order to correct the vignetting effect [16]. However, this procedure can be cumbersome, since it requires an external intervention in order to place/remove the full-field white diffuser. Other methods that are presented in the following only require correcting it only once. Because we consider that vignetting only depends on the intrinsic geometric properties of the camera, we propose correcting it thanks to the analysis of a single full-field white diffuser image I (B) [WD] acquired in a laboratory under controlled illumination conditions. The vignetting effect refers to a loss in the intensity values from the image center to its borders due to the geometry of the sensor optics. To highlight how the vignetting effect would affect radiance measurements, let us rewrite Equation (9) under the Dirac SSF assumption as: In order to compensate for the spatial variation of A p (λ b ), we compute a correction factor at p, because it requires no knowledge regarding the optical device behavior [19]. Being deduced from the full-field white diffuser image I (B) [WD], the correction factor is channel-wise and pixel-wise computed as: where I b [WD] is the median value of the m pixels (m = 11 in our experiments) with the highest values over I b p [WD], which discards saturated or defective pixel values. The correction factors are stored in a B-channel multispectral image, denoted as C.
Because C is deduced from a single white diffuser image, it would be corrupted by noise (even after thermal noise removal during the frame acquisitions). Thus, we propose to directly denoise C by convolving each of its channels C b with an 11 × 11 averaging filter H: The vignetting effect in the B-channel radiance image I (B) is corrected channel-wise and pixel-wise using the smoothed correction factors: where I b p andĨ b p are the intensity values before and after vignetting correction. This procedure should reduce noise while preserving image textures. We assume that the attenuation is spatially uniform after vignetting correction (i.e., A p (λ b ) ·C b p = α b ∈ R for any given channel index b and pixel p), such that each value of the vignetting-free radiance image is expressed from Equation (17) as:

Reflectance Estimation with One Reference Device under Constant Illumination
The illumination E t b p (λ b ) that is associated to p can be determined using the radiance measured at a white diffuser pixel p WD ∈ W D. To determine illumination thanks to a single white diffuser as reference device, the methods in the literature often assume that illumination is constant, i.e., (21) then becomes , since the white diffuser has a homogeneous diffuse reflection R p WD (λ b ) = ρ wd (95% in our case). The reflectance at p is then deduced fromĨ b p andĨ b p WD as: To be robust against spatial noise, the white-average (wa) method [2,20] averages all of the values over the white diffuser pixel subset W S (see Figure 3b) and estimates the reflectance at each image pixel as: where | · | is the set cardinal. Similarly, the max-spectral (ms) method [15] assumes that the pixel with maximum value within each channel can be considered to be a white diffuser pixel for estimating the illumination. While ignoring the diffuse reflection factor, reflectance is estimated at each pixel in each channel by the ms method, as: where X contains all of the image pixels, except W D, and those of the ColorChecker. The wa and ms-based B-channel reflectance images undergo spectral correction and negative value removal (see Equations (15) and (16)) to provide the final K-channel reflectance imagesR

Reflectance Estimation with One Reference Device under Varying Illumination
In varying illumination conditions, the Snapscan acquires each row at a given time, hence under a specific illumination (see Section 2.4). Hence, reflectance can no longer be estimated, as in Equation (14). Instead, we propose determining the illumination that is associated to each row of the vignetting-free imageĨ (B) from the white diffuser pixel set W D [21]. The underlying assumption is that illumination is spatially uniform over each row at both the white diffuser and scene pixels (that may be not verified in the case of shadows). Based on this row uniformity assumption for illumination, we estimate reflectance fromĨ (B) in a row-wise manner, as follows. At pixel p with spatial coordinates x p and y p , Equation (21) can be rewritten as: To determine the illumination E t b yp (λ b ) that is associated to the row of p for channel index b, we use a white diffuser pixel r WD ∈ W D located on the same row as p. At r WD , the reflectance is equal to the white diffuser reflection factor ρ wd , and Equation (21) provides the vignetting-free radiance as: Because p and r WD are located on the same row, t b according to the assumption regarding the spatial uniformity over each row. Therefore, Equation (26) can be rewritten as: which can be considered to be an estimation of the illumination that is associated to pixel p.
For robustness sake, we propose computing it from the median valueĨ b W D,y p of the m highest pixel values that represent the white diffuser subset W D in y p , rather than from a single valueĨ b r WD . Plugging Equation (27) in (25) yields our row-wise (rw) reflectance estimation at pixel p for channel index b: In practice, setting m = 11 pixels is a good compromise for accurately estimating the illumination for each row and each channel.

Experiments about Outdoor Reflectance Estimation
We now present the experimental setup and metrics that were used to objectively evaluate the estimated reflectance. The accuracy results are obtained and discussed for the previously described estimation methods as well as for the extra training-based method described in the present section.

Experimental Setup
An acquisition campaign that was conducted in a greenhouse under skylight (see Figure 4a) provided 109 radiance images of 2048 × 2048 pixels × 192 channels of 10-bit depth. Among the targeted plants are crops (e.g., beet) and weeds (e.g., thistle and goosefoot). The images were acquired at different dates of May and June 2019, and different day times (see Figure 4b). Figures 6a and 7a show a RGB rendering of two of them with the D65 illuminant.
where |P j | is the number of pixels that characterize the considered patch. Among the 24 color patches of the ColorChecker chart, we use a learning subset P l of 12 patches for the learning procedure and the remaining 12 test patches P t for testing the quality of reflectance estimation (see Figure 5). The learning patches of P l are selected using an exhaustive search.
Among the 2,704,156 tested combinations, we retain the one that provides the lowest (mean absolute) reflectance estimation error (see Section 5.4).
The test subset P t is used to assess the performances that are reached by reflectance estimation methods, and the learning one P l is fed into a training-based reflectance estimation method, as described in the following.

Training-Based Reflectance Estimation
The linear Wiener (wn) estimation technique can be applied to estimate reflectance thanks to a learning procedure [23]. It is based on a matrix G that transforms radiance spectra into reflectance. From any radiance image I (B) in the database, we compute the spectrally-corrected vignetting-free radiance imageĨ (K) while using Equations (15) and (20), and then estimate the K-channel reflectance image as: To compute G, we use the spectra of the ColorChecker learning patches (P l subset) that are represented in each of our images. The estimation matrix G that is associated to each input radiance image is determined as: where T re f and T rad are the K × 12 matrices that are formed by horizontally stacking the centered and transposed reference reflectance vectors (fromR ) and radiance vectors (from the current imageĨ (K) ) of the learning patches, and denotes the transpose.

Evaluation Metrics
To evaluate the accuracy of reflectance estimation, we use the patches of the Col-orChecker test subset P t (see Figure 5). LetR (K) * ,P t j , * ∈ {rw, wa, ms, dwd, wn} denote the reflectance image that is estimated for patch P t j ∈ P t by either the proposed rw method (see Equation (28)) or the four implemented state-of-art methods (see Equations (23)-(30)).  Black (24) Orange−yellow (12) Yellow (16) Red (16) Neutral8 (20) White (19) Cyan (18) Purple (10) Neutral65 (21) Blue − sky (3) Light − skin Bluish − green (6) Magenta (17) Blue − flower (5) Moderated − red (9) Green (14) Orange (7) Neutral35 (23) Yellow − green (11) Neutral5 (22) Foliage (4) Purple − Blue (8) Dark − skin (1) (b) Reflectance spectra of learning patches. (c) Reflectance spectra of test patches. This vector is compared to the reference reflectanceR [CC] of the same patch computed according to Equation (29). The spectra of the ColorChecker patches should be similar (and ideally superposed) to their laboratory counterparts when outdoor reflectance is well estimated. We objectively assess each estimated reflectance image thanks to the mean absolute error (MAE) and angular error ∆θ of each test patch P t j ∈ P t given by: and: where · 2 is the Euclidean norm. When ∆θ between two vectors (spectra in our case) is equal to zero, it means that these two vectors are collinear.

Results
We compute the mean absolute error MAE * and angular error ∆θ * averaged over all of the test patches of all reflectance images estimated from the whole database to obtain aggregated metrics. Table 1 presents the results for the five tested methods. The MAE and ∆θ are complementary metrics and they, respectively, highlight two important properties: the scale and shape of the estimated spectra. Indeed, while MAE is mainly sensitive to the scale of the estimated spectra, ∆θ especially focuses on the shape of the spectra, because it is a scale-insensitive measure. Consequently, there might be no correlation between the results that were obtained by the MAE measure and those obtained by ∆θ.
No method provides the best results according to the two metrics, as we can see from Table 1. Indeed, the wn and dwd methods provide better results than rw and wa according to the MAE, but the rw and wa methods provide better results in terms of ∆θ.
The ms method provides the worst results, because it only analyzes pixels of background and vegetation that strongly absorb the incident light in the visible domain. Hence, the biased illumination estimation in this domain affects the performance of ms method. It is worthwhile to mention that the wn method performance might also be biased, since it uses some of the ColorChecker patches as training references (to build estimation matrix G), while the other patches of the same chart are used to evaluate the reflectance estimation quality.
Among illumination-based methods that analyze a single reference device, rw provides similar results to wa in terms of ∆θ, as well as better MAE results. This shows that taking account of the illumination variation during the frame acquisitions improves the reflectance estimation quality.

Multispectral Image Segmentation
Now, we evaluate the contribution of our proposed rw-based reflectance estimation method for supervised crop/weed detection and identification. For this experiment, we focus on the beet (crop) that must be distinguished from thistle and goose-foot (weeds). First, vegetation pixels are detected and ground truth (labels) regarding vegetation pixels is provided by an expert in agronomy (Section 6.1). In order to evaluate the robustness of each considered feature against illumination conditions, we use a data set composed of 37 radiance (13 single-species and 24 mixed) images that we split into a learning and test set, denoted as S learn (23 images) and S test (14 images) (Section 6.2). The illumination conditions are various in the two sets and S test mostly includes images that are acquired on different days from those of S learn (see Figure 4). Note that, as a consequence, vegetation in the learning and test image sets may not be exactly at the same growth stages. We first compare the discrimination power of reflectance features provided by our rw method against radiance features to assess each reflectance estimation method for crop/weed identification and detection. Subsequently, we compare it with reflectance features that are estimated using each of the four considered state-of-the-art methods (wa, ms, wn, and dwd) (Sections 6.3-6.5).

Vegetation Pixel Extraction and Labelling
Only vegetation pixels are analyzed because we aim to detect/identify crops and weeds. They are distinguished from the background (white diffuser, ColorChecker, and soil pixels) using the normalized difference vegetation index (NDVI) [24]. We compute the NDVI values from the rw-based reflectance imageR (K) rw , since the rw method considers illumination variation, but the images provided by any other reflectance estimation method should yield similar vegetation pixel detection results. We consider p to be a vegetation pixel if its NVDI value is greater than a threshold γ: with the Snapscan "virtual" band centers λ 67 = 678.2 nm and λ 139 = 899.2 nm. Setting γ = 0.45 experimentally provides a good compromise between under-and oversegmentation of vegetation pixels. Noisy vegetation pixels are filtered out as much as possible by morphological opening. The vegetation pixels are then manually labelled by an expert in agronomy to build the segmentation ground truth for each multispectral image.

Learning and Test Vegetation Pixels
From the learning set S learn , we randomly extract N learning pixels per class. For a given class C i , i = [0, . . . , N C − 1], the number of extracted learning pixels per image depends on the number of images where class C i is represented in S learn (occurrences). Among the 23 learning images, the beet (crop) class appears in 17 images, thistle in nine images, and goosefoot in 12 images.
In the test set S test , beet, thistle, and goosefoot are represented, respectively, in 12, 10, and four images. For the weed detection task, we extract 2N learning pixels, half for crop and half for weed class. Because we merge thistle and goosefoot prototype pixels to build a single weed class, we extract N /2 learning pixels for thistle and N /2 for goosefoot. Each pixel is characterized by a K-dimensional (K = 141) feature vector of reflectance (or radiance) values. The reflectance/radiance images are averaged channel-wise over a 5 × 5 pixel window to reduce noise and within-class variability. Table 2 shows the number of learning and test pixels per class for weed detection and beet/thistle/goosefoot identification. All of the available pixels in S test are used to assess the generalization power of a supervised classifier.

Evaluation Metrics
The classical accuracy score can be a misleading measure to evaluate a classifier performance when the number of test pixels that are associated to each class is highly skewed (like in Table 2) [25] (p. 114). A classification model that predicts the majority class for all test pixels reaches a high classification accuracy. However, this model can also be considered as weak when misclassifying pixels of the minority classes is worse than missing pixels from the majority classes. In order to overcome this so-called "accuracy paradox", the performance of a classification model for imbalanced datasets should be summarized with appropriate metrics, such as precision/recall curve [25] (pp. 53-56, 114). Although some metrics may be more meaningful and easy to interpret, there is no consensus in the literature for choosing a single optimal metric. In our case, we want to correctly detect weed pixels without over-detection, because this would imply spraying crops with herbicides. Therefore, the performance of our classification model on both crop and weed detection/identification should be comparable. For this purpose, we use the per-class accuracy score and the weighted overall accuracy score. We also compute the F1-score that combines the precision and recall measures. These three measures should summarize the classification performance of imbalanced sets of test pixels well.
Let us denote the true test pixel labels as y and the set of predicted labels asŷ. The per-class accuracy score for class C i , i = [0, . . . , N C − 1], is: where y C ij andŷ C ij are the true and predicted labels for the j-th test pixel of class C i , respectively, and |y C i | is the number of test pixels of class C i . The weighted overall accuracy for binary and multiclass classifications is defined as: where ω C i = 1/|y C i | is the weight that is associated to class C i and computed as the inverse of its size, so as to handle imbalanced classes. Because the F1-score privileges the classification of true positives pixels (weed pixels in our case), we compute the overall F1-score as the population-weighted F1-score, so that the performances over all classes are considered: The F1-score of class C i is computed as: where and

Classification Results
The parametric LightGBM (LGBM) and non-parametric Quadratic Discriminant Analysis (QDA) classifiers are applied for supervised weed detection and identification problems. The choice of these two non-linear classifiers is motivated by their processing time during the learning and prediction procedures and their fundamentally different decision rules. Indeed, LGBM is a parametric tree-based classifier that requires a learning procedure to model a complex classification rule, whereas QDA is a simple non-parametric classifier that is based on Bayes' theorem to perform predictions. For LGBM, we retain the default parameter values (learning rate of 0.05, 150 leaves) and use the log loss function as the learning evaluation metric.
LGBM uses a histogram-based algorithm to bucket the features into discrete bins, which drastically reduces the memory and time consumption. The number of bins is set to 255 and the number of boosting operations to 100. Additionally, the feature fraction and bagging fraction parameters are set to 0.8 to increase LGBM speed and avoid over-fitting. Table 3 shows the classification results that were obtained with LGBM and QDA classifiers for each considered feature. Figures 6 and 7 show the color-coded vegetation pixel classification of two test images using the LGBM classifier in weed detection and identification tasks, respectively.
Let us first compare the classification performance of reflectance against radiance features for the weed detection task. From the results that are given in Table 3, we can see that reflectance features estimated by illumination-based methods (rw, wa, ms, and dwd) provide better classification results than radiance features in terms of the average F1 and accuracy scores, whatever the classifier. The worst classification results are obtained with reflectance features that are estimated using the wn method. Training-based methods, such as wn, can provide an accurate reflectance estimation of scene objects whose optical properties are close to those of the training samples. In our case, the optical properties of vegetation are very different from that of the training ColorChecker patches. Thus, wn provides inaccurate reflectance estimations at vegetation pixels, which affects its classification performance. Figure 6 illustrates the satisfying Accuracy and F1 scores that are obtained thanks to the analysis of illumination-based reflectance features by LGBM. Indeed, this figure shows that weed is globally well detected by these methods.
For weed identification, the classification performances of all the features are degraded, because they provide weak performances on the goosefoot class (see Figure 7). This lack of generalization might be due to the high within-class dispersion (since we consider vegetation at various growth stages) and/or the physiological vegetation changes. Let us now compare the classification performances of the reflectance features. The best overall classification results are obtained by our proposed rw method that performs well with both classifiers and reaches the highest average F1 and accuracy scores with LGBM for weed detection (85.4% and 86.1%, respectively). The wa method provides good classification results, better than those that were obtained by dwd, although the latter accounts for illumination variation during the frame acquisitions. The computation of illumination scaling factors to compensate for illumination variation may explain this poor performance, as well as the loss of spectral information (saturated reflectance values) in the near infra-red domain that is caused by illumination normalization (see Equation (A4)).

Experimental Conclusion
The experiments with this outdoor image database allow us to compare the performances of different reflectance features according to the estimation quality and pixel classification. The evaluation results are summarized by separately studying weed detection and identification. Tables 4 and 5 show the rank Rank , * obtained by each reflectance estimation method * according to each evaluation criterion used in Tables 1 and 3. The method with the lowest total rank is considered to be the best one, since it satisfies several criteria.  The total ranks of ms and wn methods are the highest ones, because they provide the worst results for either estimation quality (ms) or classification performance (wn). On the one hand, the dwd method that uses two reference devices to cope with illumination variation provides the second best total rank for weed detection (see Table 4). On the other hand, the wa method that uses one reference device, but assumes that illumination is constant, gives the second best total rank for weed identification (see Table 5). Our rw method, which row-wise analyzes one single reference device in order to take account of illumination variation, reaches the best total ranks for both weed detection and identification problems. These experiments suggest that rw-based reflectance features are relevant for weed identification under variable illumination conditions. Their performance should also be confirmed with other crop species, such as bean and wheat.

Conclusions
This paper first proposes an original image formation model of linescan multispectral cameras, like the Snapscan. It shows how illumination variation during the multispectral image acquisition by this device impacts the measured radiance that is provided by a Lambertian surface element. Our model is versatile and it can be adapted to model the outdoor image acquisition of several multispectral cameras, such as the HySpex VNIR-1800 [26] or the V-EOS Bragg-grating camera used in [16]. From this model, we propose a reflectance estimation method that copes with illumination variation. Because such varying conditions may affect the reflectance estimation quality, we estimate illumination at the frame level using a row-wise (rw) approach. We experimentally show that the rw method is more robust against illumination variation than the state-of-the-art methods. We also show that rw-based features are more discriminant to target outdoor supervised weed detection and identification, and they provide the best classification results. The accuracy of weed recognition systems and their robustness against illumination can be improved using reflectance features. This allows for precision spraying techniques to be considered in order to get rid of weeds using fewer quantities of chemicals. This study enables to make a step towards sustainable agriculture. As future work, segmentation will be extended to other plant species (such as wheat and bean) and growth stages. Spectral feature selection and texture features extraction will also be studied to improve the crop/weed identification performance.
Each term in this equation is the average value over the row of p within the white diffuser subset W D in channel I b of either the full-field white diffuser image or the scene image. • Finally, the values ofR are normalized channel-wise to provide the dwd reflectance estimation as: where β b W P is the average value over the white patch subset W P in channelR b , and ρ b W D is the diffuse reflection factor of the white patch for the spectral band centered at λ b measured by a spectroradiometer in laboratory.
The B-channel reflectance imageR (B) dwd undergoes spectral correction and negative value removal (see Equations (15) and (16)) to provide the final K-channel reflectance imageR (K) dwd .