Supervised Image Classification by Scattering Transform with Application to Weed Detection in Culture Crops of High Density

Abstract: In this article, we assess the interest of the recently introduced multiscale scattering transform for texture classification, applied for the first time in plant science. The scattering transform is shown to outperform monoscale approaches (gray-level co-occurrence matrix, local binary patterns) but also multiscale approaches (wavelet decomposition) which do not include combinatory steps. The regime in which the scatter transform also outperforms a standard CNN architecture in terms of data-set size is evaluated ($10^4$ instances). An approach to optimally design the scatter transform based on energy contrast is provided. This is illustrated on the hard and open problem of weed detection in culture crops of high density from the top view in intensity images. An annotated synthetic data-set, available in the form of a data challenge, and a simulator are proposed for reproducible science. The scatter transform trained only on synthetic data shows an accuracy of 85% when tested on real data.


Introduction
Deep learning is currently tested world-wide in almost all application domains of computer vision as an alternative to purely handcrafted image analysis [1]. When inspecting the convolutional coefficients in the first layers of deep neural networks, one finds that they are very similar to Gabor wavelets. While promoting a universal framework, deep neural networks seem to systematically converge toward tools that humans have been studying for decades. This empirical fact is used by computer scientists in so-called transfer learning, where the first layers of an already trained network are re-used [2]. This has also triggered the interest of mathematicians in revisiting the use of wavelets to produce universal machine-learning architectures. This interdisciplinary cross-talk resulted in the proposal of the so-called scatter transform [3], which is roughly a cascade of wavelet decompositions followed by non-linear and pooling operators. Although this deep architecture bears some similarity with standard deep learning, it does not include the time-consuming feed-forward propagation algorithm. However, it has proved to be comparably efficient to deep learning while offering a very rational way of choosing the parameters of the network, compared to the rather empirical current art of tuning neural networks.
Despite its intrinsic interest for addressing multiscale problems compared to deep learning, the scatter transform has, since its introduction in 2013, been applied only to a relatively small variety of pattern recognition problems in computer vision, notably including iris recognition [4], rainfall classification in radar images [5], cell-scale characterization [6,7], and face recognition [8]. Also, in these applications the scatter transform has shown its efficiency, but it was not systematically compared with other techniques in a comprehensive way. We propose to extend the scope of investigation of the applicability of the scatter transform algorithm to plant science with a problem of weed detection in a background of culture crops of high density. This plant science problem is important for field robotics, where the mechanical extraction of weeds is a current challenge to be addressed to avoid the use of phytochemical products. From a methodological point of view, this classification problem will also serve as a use case to assess the potential of the scatter transform when compared with other single-scale and multiscale techniques.
Remote Sens. 2019, 11, 249
A large variety of platforms, sensors, and data-processing methods already exist to monitor weeds at various temporal and spatial scales. From remote sensing supported by satellites to cameras located on unmanned aerial vehicles (UAVs) or on ground-based platforms, many systems have been described and compared for weed monitoring in arable culture crops [9-11]. Related to the observation scale of our use case, i.e., the imaging scales of UAVs and ground-based platforms, some studies exploiting RGB data have addressed crop/weed classification with a large variety of machine-learning approaches. The problem of segmenting crop fields from typical weeds, performing vegetation detection, plant-tailored feature extraction, and classification to estimate the distribution of crops and weeds, has recently been solved with convolutional neural networks in the field [12,13] and in real time [14]. Earlier, Aitkenhead et al. [15] evaluated weed detection in fields of crop seedlings using simple morphological shape characteristic extraction and a self-organizing neural network. A Bayesian classifier was used in [16] for plant and weed discrimination. Shape and texture features [12,17-19] or wavelet transforms [20,21], coupled with various classifiers including support vector machine (SVM), relevance vector machine (RVM), fuzzy classifier, or random forests, were also shown to provide successful pipelines to discriminate between plants and weeds.
The above list of references is of course not exhaustive, and new pipelines will continue to appear because of the large variety of crop shapes and imaging platforms. In this context, the scatter transform constitutes a candidate of possible interest, worth assessing on a plant/weed classification problem. Also, in the existing work on weed detection, the computer vision community has focused on relatively low densities of crops and weeds, where the soil constitutes a background to be classified in addition to crop and weed. In this paper, we consider the case of culture crops of high density, i.e., where the soil is not visible from the top view. In this case, the culture is the background and the objects to be detected are weeds of wild type. The contrast in color between the background and the weed is, in this case, obviously very low by comparison with lower-density cultures.

Material and Methods
We start by introducing the computer vision problem considered, the data-set, the expected scales included in these images and the algorithms tested for comparison with the multiscale scatter transform algorithm.

Images and Challenges
We consider the situation of a high-density culture crop (mache salad) with the undesired presence of some weeds. Images were acquired with the imaging system fixed on a robot as displayed in Figure 1. Acquisition trials, as visible in Figure 1, were done under plastic tunnels without additional light. Some sample images are given in Figure 2. Examples of weeds detected in such images are shown in Figure 3 to illustrate the variability of shapes among these wild types of weeds. The computer vision task considered in this article consists in detecting the weeds from the top view, as shown in the ten real-world images of Figure 2. This is challenging since the intensity or color contrast between weed and crop is very weak. Also, due to the lighting conditions during acquisition, the global intensity may vary from one image to another. The contrast between weeds and plants lies rather in texture, since the shape of the plant considered is rather round while the weeds included in the data-set (Figure 3) are much more indented. Therefore, this computer vision problem is well adapted to test the scatter transform, which is a texture-based technique. A ground truth of the position of the weeds in the ten images of Figure 2 was produced in the form of finely segmented weeds and bounding-box patches including these weeds. The total number of weeds being relatively low (21), we decided to generate a larger data-set with synthetic images. To simulate images similar to the real images acquired, we created a simulator which places weeds (among the 21 found in real images) from the annotated weed data-set in images of plants originally free from any weed, following the pipeline shown in Figure 4.
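The compositing step of such a simulator can be sketched as follows. This is only an illustrative sketch in Python with NumPy, not the simulator distributed with the article: the function name `paste_weed`, the representation of a segmented weed as a crop plus a boolean mask, and the uniform random placement are all assumptions.

```python
import numpy as np

def paste_weed(plant_img, weed_img, weed_mask, rng):
    """Paste a segmented weed at a random position in a weed-free image.

    plant_img : (H, W) array, image containing only plants
    weed_img  : (h, w) array, bounding-box crop around one weed
    weed_mask : (h, w) boolean array, True on weed pixels (fine segmentation)
    Returns the composite image and the bounding box (top, left, h, w).
    """
    H, W = plant_img.shape
    h, w = weed_img.shape
    top = int(rng.integers(0, H - h + 1))
    left = int(rng.integers(0, W - w + 1))
    out = plant_img.copy()
    region = out[top:top + h, left:left + w]   # view into the output image
    region[weed_mask] = weed_img[weed_mask]    # copy weed pixels only
    return out, (top, left, h, w)

# Toy example: a black "plant" image and a white square "weed".
rng = np.random.default_rng(0)
plants = np.zeros((60, 80))
weed = np.ones((10, 12))
mask = np.zeros((10, 12), dtype=bool)
mask[2:8, 3:9] = True
composite, box = paste_weed(plants, weed, mask, rng)
```

The returned bounding box plays the role of the ground-truth annotation from which weed and plant patches can then be labeled.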

Scales
With a spatial resolution of 5120 by 3840 pixels in the images of our data-set, and as illustrated in Figure 5, multiple anatomical structures of the dense weed/plant culture are accessible in our images. From tiny to coarse sizes, i.e., scales, this includes texture in the limb, the veins, and the leaf. There are possibly discriminant features between the two classes (weed/plant) to be found in these three scales, either taken individually or combined with each other. To offer the possibility of a multiple scale analysis, together with a reasonably small computation time, classification is done at the scale of patches chosen as double the typical size of leaves, $2 \times \max\{S_w, S_p\}$, with rectangles of 250 by 325 pixels, where $S_w = 163$ pixels and $S_p = 157$ pixels on average. With this constraint, we also keep for the patch the same ratio between height and width as in the original image for a periodic patch grid.
Figure 5. ($W_1$, $P_1$) points toward the texture of the limb, ($W_2$, $P_2$) indicates the typical size of leaflet and ($W_3$, $P_3$) stands for the width of the veins. $S_w$ and $S_p$ show the size of a leaf of weed and plant, respectively. The classification of weed and plant is done at the scale of a patch taken as $2 \times \max(S_p, S_w)$, in agreement with a Shannon-like criterion.
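The patch-size rule and the resulting patch grid can be checked with a few lines of Python. The sketch below assumes a non-overlapping grid in which incomplete patches at the image border are discarded, and assumes that the 325-pixel side runs along the 5120-pixel side of the image; neither choice is stated explicitly in the text, and the function names are hypothetical.

```python
def patch_size(s_w, s_p):
    """Patch side chosen as twice the largest typical leaf size."""
    return 2 * max(s_w, s_p)

def patch_grid(img_w, img_h, patch_w, patch_h):
    """Top-left corners of the non-overlapping patches that fit in the image."""
    return [(x, y)
            for y in range(0, img_h - patch_h + 1, patch_h)
            for x in range(0, img_w - patch_w + 1, patch_w)]

side = patch_size(163, 157)                  # 326, close to the 325 pixels used
corners = patch_grid(5120, 3840, 325, 250)   # assumed orientation of the patch
n_cols = len({x for x, _ in corners})
n_rows = len({y for _, y in corners})
```

Under these assumptions, the same number of patches fits along both image axes, which is one way to read the requirement that the patch keeps the aspect ratio of the original image.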

Data-Set
With the simulator of Figure 4, we produced a total of 3292 patches containing weed and 3292 patches containing only plants. The binary classification (weed/plant) is performed on these patches. This balanced data-set serves both for the training and the testing stages to assess the performance of different machine-learning tools. The data sets together with the simulator are provided as supplementary material in the form of a free executable and a set of images (https://uabox.univ-angers.fr/index.php/s/iuj0knyzOUgsUV9).

Classifiers
In this section, we describe how we apply the scatter transform [3] to the weed detection problem introduced in the previous section. For comparison, we then propose a set of alternative techniques. This paper uses independent k-fold cross-validation to measure the performance of the scatter transform coupled to the classifier depicted in Figure 6, and to compare it with other feature extractors coupled to the same classifier. The performance of these classifiers is measured by the accuracy of correct classification, defined by
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$
where TP indicates that the prediction is positive and the actual value is positive, FP indicates that the prediction is positive but the actual value is negative, TN indicates that the prediction is negative and the actual value is negative, and FN indicates that the prediction is negative but the actual value is positive.
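As a small worked example, the accuracy can be computed from the four confusion counts, i.e., as the number of correct predictions (TP + TN) divided by the total number of predictions. The helper names below are hypothetical, and the label convention (1 for weed, 0 for plant) is an assumption made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined without samples")
    return (tp + tn) / total

# Toy labels: 1 = weed patch, 0 = plant patch (assumed convention).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
```

In a k-fold protocol, this quantity would be computed on each held-out fold and then averaged over the folds.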

Scatter Transform
A scattering transform defines a signal representation which is invariant to translations and potentially to other groups of transformations such as rotations or scaling. It is also stable to deformations and is thus well adapted to image and audio signal classification. A scattering transform is implemented with a convolutional network architecture, iterating over wavelet decompositions and complex modulus. Figure 6 shows a schematic view of a scatter transform network working as a feature extractor and coupled to a classifier after dimension reduction. The scatter vectors $Z_m$ at the output of the first three layers $m = 1, 2, 3$ for an input image $f$ are defined by
$$Z_1 f = \{ f \star \phi \}, \quad Z_2 f = \{ |f \star \psi_{j,\theta}| \star \phi \}, \quad Z_3 f = \{ ||f \star \psi_{j,\theta}| \star \psi_{j',\theta'}| \star \phi \} \qquad (2)$$
where the symbol $\star$ denotes the spatial convolution, $|\cdot|$ stands for the $L^1$ norm, $\phi$ is an averaging operator, and $\psi_{j,\theta}$ is a wavelet dilated by $2^j$ and rotated by $\theta$. The range of scales $j = \{0, 1, \ldots, J\}$ and the orientations $\theta = \{0, \pi/L, \ldots, \pi(L-1)/L\}$ are fixed by the integers $J$ and $L$. The layer index ranges from $m = 1$ to $m = M$. In our case, we considered as mother wavelet the Gabor filter, with the MATLAB implementation of the scatter transform provided at https://www.di.ens.fr/data/scattering/.
The scatter transform differs from a pure wavelet decomposition because of the non-linear modulus operator. With this nonlinearity, the decomposition of the image is not done on a pure orthogonal basis (whether the wavelet basis is orthogonal or not), and this opens the way to a possible benefit from the concatenation of several layers with a combination of wavelet decompositions at different scales. Interestingly, these specific properties of the scatter transform match the intrinsic multiscale textural nature of our weed detection problem, which therefore constitutes an appropriate use case to assess the potential of the scatter transform in practice. A visualization of output images for various filter scales $j$ at $m = 2$ for a given orientation is shown in Figure 7. It clearly appears in Figure 7 that the various scales (texture of the limb and veins at $j = 3$, border shape at $j = 4$ and global leaf shape at $j = 8$) presented in Section 2.2 can be captured with the different scaling factors applied to the wavelet. In our study, we empirically picked $L = 8$ orientations and investigated up to $J = 8$ scales since there are no other anatomical items larger than the leaf itself. The number of layers tested was up to $M = 4$, as proposed in [3], since the energy after some layers, although non-zero, is logically vanishing.
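The cascade of wavelet filtering, modulus and averaging can be illustrated with a minimal NumPy sketch. It is not the MATLAB implementation used in the article. The Gabor parameters (`xi`, `sigma0`), the scale indexing $j = 0, \ldots, J-1$, the circular boundary handling and the replacement of the averaging operator $\phi$ by a global spatial mean are all simplifying assumptions. The restriction of the second wavelet to coarser scales than the first follows common practice in scattering networks.

```python
import numpy as np

def gabor(scale, theta, xi=3 * np.pi / 4, sigma0=0.8):
    """Complex Gabor wavelet dilated by 2**scale and rotated by theta."""
    sigma = sigma0 * 2 ** scale
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    wavelet = envelope * np.exp(1j * (xi / 2 ** scale) * x_rot)
    return wavelet / np.abs(wavelet).sum()

def circ_conv(img, kernel):
    """Circular 2-D convolution computed with the FFT."""
    padded = np.zeros(img.shape, dtype=complex)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel center to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded))

def scatter(img, J=2, L=4):
    """Return (Z1, Z2, Z3), with phi simplified to a global spatial mean."""
    filters = {(j, l): gabor(j, np.pi * l / L)
               for j in range(J) for l in range(L)}
    z1 = np.array([img.mean()])
    z2, z3 = [], []
    for (j1, _), psi1 in filters.items():
        u1 = np.abs(circ_conv(img, psi1))           # |f * psi_1|
        z2.append(u1.mean())
        for (j2, _), psi2 in filters.items():
            if j2 > j1:                             # coarser second scale only
                u2 = np.abs(circ_conv(u1, psi2))    # ||f * psi_1| * psi_2|
                z3.append(u2.mean())
    return z1, np.array(z2), np.array(z3)

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
z1, z2, z3 = scatter(patch)
# Same patch circularly shifted: the coefficients should not change.
z1s, z2s, z3s = scatter(np.roll(patch, (5, 3), axis=(0, 1)))
```

With circular convolutions and a global mean, the coefficients of a shifted patch coincide with those of the original one, which illustrates the translation invariance mentioned above in its simplest form.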
In the applications of the scatter transform to classification found in the literature so far, the optimization of the architecture was done a posteriori, after supervised learning. This is rather time-consuming. We investigated the possibility of selecting a priori the best architecture by analyzing the distribution of the relative energy $E_m$ at the output of each layer, as given by
$$E_m = \frac{\|Z_m f\|^2}{\|f\|^2} \qquad (3)$$
We computed these energies for the whole data-set, as given in Table 1. As noticed in [3], the relative energy progressively vanishes when the number of layers increases. This observation advocates for the use of a limited number of layers. However, these energies are computed on the whole population of patches, including both plants and weeds, and therefore they tell nothing about where to find the discriminant energy between the classes throughout the feature space produced by the scatter transform. Tables 2 and 3 show the average relative energy for the weeds' patches data-set, $E_m^w$, and the plants' patches data-set, $E_m^p$, for different layers $m$ and various maximum scales $J$.
To show this discriminant energy between each class, various criteria could be proposed. We tested the percentage of energy similarity, $Q_m$, between the two classes, computed from the class-wise relative energies $E_m^w$ and $E_m^p$ (Equation (4)). According to this criterion, the best architecture of the scatter transform can be chosen at the point $\eta$ where the minimum of $Q_m$ between the classes is found as a function of $J$, i.e., $\eta = \arg\min_J Q_m(J)$. The energy similarity $Q_m(J)$ is represented in Figure 8, and this clearly demonstrates that the contrast between classes is more pronounced on coefficients with small relative energy. This observation, not stressed in the original work of [3], indicates that it should be possible to draw benefit from the contribution of these small discriminative coefficients, and thus demonstrates the interest of the combinatory step of the scatter transform.
Figure 8. Energy similarity $Q_m(J)$ between the two classes, based on Tables 2 and 3.
Also, from the observation of Figure 8, our approach indicates that a priori the best discriminant energy between each class is to be expected with a scatter architecture corresponding to M = 4 and J = 4 which provides the minimum energy similarity, η, between the energy of images of the weeds' class and the plants' class.
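The a priori selection procedure can be sketched as follows. The relative energy is taken here as the squared norm of the layer output divided by the squared norm of the image. The similarity measure (ratio of the smaller to the larger class energy, in percent) is an assumption chosen for illustration and may differ from the definition of $Q_m$ used in the article. The function names and the numerical values are placeholders, not data from Tables 2 and 3.

```python
import numpy as np

def relative_energy(z_m, f):
    """Energy of the layer-m scatter coefficients relative to the image energy."""
    return float(np.sum(np.abs(z_m) ** 2) / np.sum(np.abs(f) ** 2))

def similarity_percent(e_w, e_p):
    """Assumed similarity measure: 100 when both class energies are equal."""
    return 100.0 * min(e_w, e_p) / max(e_w, e_p)

def select_scale(e_weed, e_plant):
    """Return the J minimizing the similarity, and all similarities.

    e_weed, e_plant : dicts mapping J to the mean relative energy of a class.
    """
    q = {J: similarity_percent(e_weed[J], e_plant[J]) for J in e_weed}
    return min(q, key=q.get), q

# Placeholder energies for illustration only (hypothetical values).
e_weed = {1: 0.50, 2: 0.40, 3: 0.30}
e_plant = {1: 0.48, 2: 0.30, 3: 0.29}
eta, q = select_scale(e_weed, e_plant)
```

The point of the procedure is that it only requires forward computations of the scatter coefficients on each class, without training any classifier.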

Other Methods
To assess the possible interest of the scatter transform in our weed detection problem, we consider several alternative feature extractor algorithms. First, since the scatter transform by construction works on a feature space which includes multiple scales, it is expected to perform better than any state-of-the-art monoscale method, i.e., one working on a feature space tuned to a single size, when applied to a multiscale problem (such as the one we have here with veins, limb, and leaf). Second, since the scatter transform works on a combination of wavelet decompositions between scales, it should perform slightly better than a pure wavelet decomposition chosen on the same wavelet basis but without the use of the non-linear operator or the combination of scales. Finally, because the scatter transform shares some similarities with convolutional neural networks, it should also be compared with the performance obtained with a deep learning algorithm. Based on this rationale, we propose the following alternative feature extractors for comparison with the feature extractor of the scatter transform, where the same PCA followed by a linear SVM is used for the classification.
Local binary pattern: In the original form of [22], as used in this article, for a pixel positioned at $(x, y)$, the local binary pattern (LBP) is the sequence of binary comparisons of its value with those of its eight neighbors. In other words, the LBP value assigned to each neighbor is either 0 or 1, if its value is smaller or greater than that of the pixel placed at the center of the mask, respectively. The decimal form of the resulting 8-bit word representing the LBP code can be expressed as follows
$$\mathrm{LBP}(x, y) = \sum_{n=0}^{7} 2^n \, \xi(i_n - i_{x,y}) \qquad (5)$$
where $i_{x,y}$ corresponds to the gray value of the center pixel, and $i_n$ denotes that of the $n$th neighboring one. Besides, the function $\xi(x)$ is defined as follows
$$\xi(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (6)$$
The LBP operator remains unaffected by any monotonic gray-scale transformation which preserves the pixel intensity order in a local neighborhood. It is worth noticing that all the bits of the LBP code hold the same significance level, whereas two successive bit values may have different implications. The process of Equation (5) is carried out at the scale of the patch defined in the previous section. The LBP(x, y) values of each pixel inside this patch are concatenated to create a fingerprint of the local texture around the pixel at the center of the patch. Equations (5) and (6) are applied to all patches of an image.
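A minimal NumPy sketch of the basic 8-neighbor LBP code is given below. It sets bit $n$ to 1 when the $n$th neighbor is greater than or equal to the center pixel. The ordering of the neighbors (which neighbor carries which bit) is a convention chosen here for illustration, and border pixels are simply skipped.

```python
import numpy as np

# Offsets (dy, dx) of the eight neighbors; the ordering n = 0..7 is a convention.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(img):
    """8-neighbor LBP code of every interior pixel of a 2-D gray-level array."""
    img = np.asarray(img)
    center = img[1:-1, 1:-1]
    h, w = center.shape
    codes = np.zeros((h, w), dtype=np.int64)
    for n, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # Bit n is set when the neighbor is at least as bright as the center.
        codes += (neighbor >= center).astype(np.int64) << n
    return codes

flat = lbp(np.full((3, 3), 5))                        # all neighbors equal
peak = lbp([[0, 0, 0], [0, 9, 0], [0, 0, 0]])         # center is the maximum
corner = lbp([[9, 0, 0], [0, 5, 0], [0, 0, 0]])       # only neighbor 0 is larger

def patch_fingerprint(patch):
    """Concatenate the LBP codes of a patch into one feature vector."""
    return lbp(patch).ravel()
```

Because only the sign of the differences is used, adding a constant to the patch or applying any increasing gray-level mapping leaves the codes unchanged.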
Gray-Level Co-Occurrence Matrix: A statistical approach that can well describe the second-order statistics of a texture image is provided by the so-called gray-level co-occurrence matrix (GLCM). The GLCM was first introduced by Haralick et al. [23]. A GLCM is essentially a two-dimensional histogram in which the $(i, j)$th element is the frequency of event $i$ co-occurring with event $j$. A co-occurrence matrix is specified by the relative frequencies $C(i, j, d, \theta)$ with which two pixels, separated by a distance $d$, occur in a direction specified by the angle $\theta$, one with gray-level $i$ and the other with gray-level $j$. A co-occurrence matrix is therefore a function of the distance $d$, the angle $\theta$ and the gray levels $i$ and $j$.
In our study, as perceptible in the images of Figure 2, the weed-plant structures are isotropic, meaning that they show no specific predominant orientation. As a logical consequence, and as already stated for similar weed classification problems using GLCM [24-26], choosing multiple orientations $\theta$ would not improve the classification performance. We therefore arbitrarily chose a fixed $\theta = 0$, which makes it possible to probe, on average, leaves positioned in all directions. The distance is taken as $d = 2$ pixels, which corresponds to a displacement capable of probing the presence of edges, veins, and structures in the limb.
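For illustration, a co-occurrence matrix with the parameters retained above ($\theta = 0$, $d = 2$) can be computed with NumPy as sketched below. The non-symmetric counting, the prior quantization of the image to integer gray levels, and the use of the contrast as an example of a feature derived from the matrix are assumptions. The specific features extracted in the article are not detailed here.

```python
import numpy as np

def glcm(img, levels, d=2):
    """Normalized co-occurrence matrix for a horizontal offset d (theta = 0).

    img must contain integer gray levels in the range [0, levels - 1].
    """
    img = np.asarray(img)
    first = img[:, :-d].ravel()     # gray level i at position (x, y)
    second = img[:, d:].ravel()     # gray level j at position (x + d, y)
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (first, second), 1.0)   # accumulate repeated pairs
    return counts / counts.sum()

def contrast(p):
    """Contrast of a normalized co-occurrence matrix (one Haralick-type feature)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))

# Toy image with a vertical edge: every pair at distance 2 goes from level 0 to 1.
toy = [[0, 0, 1, 1],
       [0, 0, 1, 1]]
p = glcm(toy, levels=2)
c = contrast(p)
```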
Gabor filter: The same Gabor filters as in the scatter transform were applied to the images to produce a feature space. By contrast with the scatter transform, no non-linearities are included in this process and only one layer of filters is applied. For a fair comparison in this experiment, the scale range $J$ and the number of orientations $L$ of the Gabor filter bank are set to the same values as in the scatter transform.
Deep learning: Representation learning, or deep learning, aims at jointly learning feature representations with the required prediction models. We chose the predominant approach in computer vision, namely deep convolutional neural networks [27]. The baseline approach resorts to standard supervised training of the prediction model (the neural network) on the target training data. No additional data sources are used. In particular, given a training set comprised of $K$ pairs of images $f_i$ and labels $\hat{y}_i$, we train the parameters $\theta$ of the network $r$ using stochastic gradient descent to minimize the empirical risk:
$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{K} \mathcal{L}(\hat{y}_i, r(f_i, \theta)) \qquad (7)$$
where $\mathcal{L}$ denotes the loss function, which is cross-entropy in our case. The minimization is carried out using the ADAM optimizer [28] with a learning rate of 0.001. The architecture of the network $r(\cdot, \cdot)$, shown in Figure 9, has been optimized on a hold-out set and is given as follows: five convolutional layers with filters of size 3 × 3 and respective numbers of filters 64, 64, 128, 128, 256, each followed by ReLU activations and 2 × 2 max pooling; a fully connected layer with 1024 units, ReLU activation and dropout (0.5); and a fully connected output layer for 2 classes (weeds, plants) with SoftMax activation. Given the current huge interest in deep learning, many other architectures could be tested and could possibly provide better results. As a disclaimer, we stress that the architecture proposed in Figure 9 is of course not expected to provide the best performance achievable with any neural network architecture. Here the tested CNN serves as a simple reference, with a level of complexity of the architecture adapted to the size of the input image and of the training data sets.
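To get a sense of the size of this reference network, the feature-map dimensions and parameter counts can be worked out with a short script. Several details needed for the arithmetic are not given in the text and are assumed here: a single-channel (intensity) input patch of 250 × 325 pixels, convolutions that preserve the spatial size, and max pooling with stride 2 that discards odd borders. The function name is hypothetical.

```python
def cnn_summary(height=250, width=325, in_channels=1,
                filters=(64, 64, 128, 128, 256), fc_units=1024, n_classes=2):
    """Feature-map sizes and parameter counts under the stated assumptions."""
    layers, c_in, h, w = [], in_channels, height, width
    for c_out in filters:
        params = 3 * 3 * c_in * c_out + c_out    # 3x3 weights plus biases
        h, w = h // 2, w // 2                    # 2x2 max pooling, stride 2
        layers.append(("conv3x3+relu+pool", (h, w, c_out), params))
        c_in = c_out
    flat = h * w * c_in                          # flattened feature vector
    layers.append(("fc+relu+dropout", (fc_units,), flat * fc_units + fc_units))
    layers.append(("fc+softmax", (n_classes,), fc_units * n_classes + n_classes))
    return layers

summary = cnn_summary()
total_params = sum(params for _, _, params in summary)
for name, shape, params in summary:
    print(f"{name:20s} output={shape} parameters={params}")
```

Under these assumptions, most of the parameters sit in the first fully connected layer, which is consistent with the use of dropout at that stage.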

Results
In this section, we provide experimental results using the experimental protocol for the assessment of scatter transform (Section 2.4) as well as the different alternative feature extraction techniques chosen for comparison in Section 2.4.2.
The scatter transform produces a data vector containing the $Z_m f$ of Equation (2), whose dimension is reduced by a standard PCA and which is then fed to a linear-kernel SVM. To compare the performance of different structures of the scatter transform on the database, we used different combinations of filter scales, $j$, and numbers of layers, $m$, to determine which structure best fits our data. Table 4 shows the classification accuracy of these structures, where a 10-fold cross-validation approach is used for classification. The best weed/plant classification results with the scatter transform are obtained for $J = 4$ and $m = 4$. This a posteriori result exactly corresponds to the prediction made a priori with the energy-based approach presented in the method section.
Table 4. Percentage of correct classification for 10-fold cross-validation classification on simulation data with scatter transform for various values of $m$ and $J$.
We considered this optimal scatter transform structure with $J = 4$ and $m = 4$ and compared it with all alternative methods described in Section 2.4. Table 5 shows the recognition rates of weed detection on the data, where a k-fold cross-validation approach of SVM classification with different numbers of folds is used. The scatter transform appears to outperform all compared handcrafted methods. This demonstrates the interest of the multiscale and combinatory feature space produced by the scatter transform. It is important to notice that, to have a fair comparison of these alternative methods, we adapted the feature spaces of all algorithms to the same size. The smallest of all the feature spaces is selected, and the feature spaces of the other algorithms are reduced to that specific size. Among our techniques, the smallest feature space belongs to the GLCM method, which has a size of $N \times 19$, where $N$ represents the number of samples. The PCA algorithm is used to reduce the dimensions of the feature spaces generated by the other techniques to the size of $N \times 19$.
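The reduction of a feature space to $N \times 19$ can be sketched with a PCA computed from a singular value decomposition. This is a generic NumPy sketch rather than the code used for the experiments; the function name and the random matrix standing in for a feature space are hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components=19):
    """Project an (N, D) feature matrix onto its first principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))     # stand-in for a larger feature space
X19 = pca_reduce(X)                # reduced to the size N x 19
variances = X19.var(axis=0)
```

In a cross-validation setting, the projection would normally be estimated on the training folds only and then applied to the held-out fold, so that no information from the test samples leaks into the features.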
As shown in Table 5 and Figure 10, when compared with the CNN, the scatter transform, like most handcrafted methods, performs better for small data sets. The data-set size at which the CNN and the scatter transform perform equally is found to be $10^4$ instances on the weed detection problem, as shown in Figure 10. This demonstrates the interest of the scatter transform in the case of rather small data sets. It should, however, be noticed that an intrinsic limitation of the scatter transform is that it works only with patches to perform a classification, while some convolutional neural network architectures are also capable of performing segmentation directly in the whole image (see for instance U-Net [29]).

Discussion
So far, we focused in this article on the detection of weeds in fields by the scatter transform algorithm, in comparison with other machine-learning techniques, all trained and tested on synthetic images produced by the simulator of Figure 4. Our experimental results show that a good recognition rate for weed detection (approximately 95%) can be achieved by the scatter transform algorithm. On the other hand, the alternative methods also work well for this problem, with a minimum recognition rate around 85%. These experiments show that texture-based algorithms can be useful for weed detection in culture crops of high density.
One may wonder how these classification results compare with the literature on weed detection in less dense cultures cited in the introduction section [12-21]. The performance in this literature varies from 75% to 99% of good detection of weeds. It is, however, difficult to provide a fair comparison since, in addition to the main difference of the absence of soil, the observation scales and the acquisition conditions vary from one study to another.
One may wonder how these algorithms trained on synthetic data behave when they are applied to real images including plant backgrounds and weeds not included in the synthetic data sets. We therefore tested our scatter transform classifier, trained on synthetic data, on the real images of Figure 2. On average over all 10 real images, the accuracy found is 85.64%. Although this already constitutes an interesting result, it indicates a bias between simulated data and real data. One direction could be to improve the realism of the simulator. In the version proposed here, weeds were not necessarily acquired in the same lighting conditions as the plants. A simple upgrade could be to adapt the average intensity of the weed and of the plant to compensate for this artifact or, since plants and weeds can indeed be of various intensities, to perform data augmentation with various contrasts. However, simulators never exactly reproduce reality. Another approach to improve the performance of the training based on simulated data would be to add a step of domain adaptation after the scatter transform [30].
The best and worst results obtained so far with the scatter transform are given in Figure 11. A possible interpretation for the rather low performance in Figure 11b is the following. The density of weeds in Figure 11b is very high compared to the other images in the training data-set. Consequently, the local texture in the patch may be very different from the one obtained when weeds appear as outliers. This indicates that the proposed algorithm, trained on synthetic data, is appropriate for low densities of weeds at an observation scale such as the one chosen for the patch, where the plant serves as a systematic background.
These performances could be improved in several ways. First, a large variety of weeds can be found in nature, and it would be important to include more of this variability in the training data sets. Also, weeds are fast-growing plants capable of winning the competition for light. Therefore, a high percentage of the surface covered by weeds is expected to come with taller weeds than a very low percentage. This fact, illustrated in Figure 11, is not included in the simulator, where weeds of a fixed size are randomly picked. Such enrichments of the training data-set and of the simulator could be tested easily by following the global methodology presented in this article to assess the scatter transform. Finally, we did not put much effort into denoising the data. The proposed data have been acquired with a camera fixed on an unmanned vehicle. Compensation for variations of illumination in the data-set or inside the images themselves, or compensation for the possible optical aberrations of the camera used, could also constitute directions of investigation to improve the weed/plant detection. All the methods presented in this paper (including the scatter transform) are able to be robust to global variations of light intensity; however, the variation of light direction during the day may impact the captured textures. Increasing the data-set to include images acquired at all hours of a working day, or adding a lighting cabinet on the robot used, would make the results even more robust [14,31-33].
Weed detection in culture crops of high density is an open problem in agriculture which, we believe, deserves the organization of a challenge similar to the one organized on Arabidopsis in controlled conditions [34] for the biology community. Such challenges contribute to improving the state of the art, as recently illustrated in machine learning by the use of simulated Arabidopsis data to boost and speed up training [35]. This challenge is now open on the CodaLab platform (https://competitions.codalab.org/competitions/20075), together with the real data and the simulator (https://uabox.univ-angers.fr/index.php/s/iuj0knyzOUgsUV9) developed for this article. These additional materials therefore contribute to opening the problem of weed detection in culture crops of high density to a wider computer vision community.

Conclusions and Perspectives
In this article, we proposed the first application of the scatter transform algorithm to plant sciences, with the problem of weed detection in a background of culture crops of high density. This open plant science problem is important for field robotics, where the mechanical extraction of weeds is a current challenge that must be addressed to avoid the use of phytochemical products.
We assessed the potential of the scatter transform algorithm in comparison with single-scale and multiscale techniques such as LBP, GLCM, Gabor filters, and a convolutional neural network. Experimental results showed the superiority of the scatter transform algorithm, with a weed detection accuracy of approximately 95%, over the other single-scale and multiscale techniques on this application. Although the comparison was not intended to be exhaustive given the huge literature on texture analysis, the variety of tested techniques helps confirm the effectiveness of the scatter transform algorithm as a valuable multiscale technique for weed detection and opens an interesting approach for similar problems in plant sciences. Finally, an optimization method based on the energy at the output of the scatter transform was successfully proposed to select a priori the best scatter transform architecture for a classification problem.
Concerning weed/plant detection, our optimal solution with the scatter transform can serve as a first reference of performance, and other machine-learning techniques can now be tested in the framework of the data challenge that we launched for this article (https://competitions.codalab.org/competitions/20075). As a possible perspective, one could further optimize the scatter transform classifier proposed in this paper. For instance, the size of the grid could be fine-tuned, or hyperparameters could be added by using non-linear kernels in the SVM step. Also, weed/plant detection was treated here as a binary classification, since no distinction was made between the different weeds. In another direction, one could extend this work to a multi-class problem distinguishing several types of weeds if more data were included.

Figure 1. Global view of the imaging system fixed on a robot moving above mache salads of high density. RGB images are captured by a 20-megapixel camera manufactured by JAI, with a spatial resolution of 5120 × 3840 pixels, mounted with a 35 mm objective. The typical distance from the plants to the camera is 1 m.

Figure 2. Set of 10 top-view RGB images used as the testing data-set in this study for the detection of weeds among plants.

Figure 3. Illustration of different types of weeds used for the experiment.

Figure 4. Simulation pipeline for the creation of images of plants containing the weeds of Figure 3, similar to the images presented in Figure 2.

Figure 5. Anatomical scales, where $(W_i, P_i)$ denote the scales of weeds and plants, respectively: $(W_1, P_1)$ points toward the texture of the limb, $(W_2, P_2)$ indicates the typical size of a leaflet, and $(W_3, P_3)$ stands for the width of the veins. $S_w$ and $S_p$ show the size of a leaf of weed and plant, respectively. The classification of weed and plant is done at the scale of a patch taken as $2 \times \max(S_p, S_w)$, in agreement with a Shannon-like criterion.

Figure 6. Schematic layout of the weed/plant classifier based on the scattering transform with three layers. The feature vector transmitted to the principal component analysis (PCA) step consists of the scatter vector $Z_m f$ of the last layer of Equation (2) after transposition.

Figure 7. Output images for each class (weed on left and plant on right) and for each layer m of the scatter transform.

Figure 9. Architecture of the deep network optimized for the classification task.

Figure 10. Comparison of the recognition accuracy between scatter transform and deep learning as the number of samples increases.

Figure 11. Visual comparison of the best and the worst recognition of weeds and plants by scatter transform: (a) Image 2 (97.27%); (b) Image 9 (69.45%).

Table 1. Average percentage of energy of scattering coefficients $E_m$ on frequency-decreasing paths of length $m$ (scatter layers), with $L = 8$ orientations and various filter scale ranges $J$, for the whole database of plant and weed patches.

Table 2. Average percentage of energy of scattering coefficients $E_m$ on frequency-decreasing paths of length $m$ (scatter layers), depending on the maximum scale $J$, with $L = 8$ filter orientations, for the weed class patches.

Table 5. Percentage of correct classification using k-fold cross-validation on simulated data.