Sentinel-2 Remote Sensed Image Classification with Patchwise Trained ConvNets for Grassland Habitat Discrimination

Abstract: The present study focuses on the use of Convolutional Neural Networks (CNN or ConvNet) to classify a multi-seasonal dataset of Sentinel-2 images to discriminate four grassland habitats in the “Murgia Alta” protected site. To this end, we compared two approaches differing only by the first layer machinery, which, in one case, is instantiated as a fully-connected layer and, in the other case, results in a ConvNet equipped with kernels covering the whole input (wide-kernel ConvNet). A patchwise approach, tessellating training reference data in square patches, was adopted. Besides assessing the effectiveness of ConvNets with patched multispectral data, we analyzed how the information needed for classification spreads to patterns over convex sets of pixels. Our results show that: (a) with an F1-score of around 97% (5 × 5 patch size), ConvNets provide an excellent tool for patch-based pattern recognition with multispectral input data without requiring special feature extraction; (b) the information spreads beyond the limit of a single pixel: the performance of the network increases until 5 × 5 patch sizes are used and then ConvNet performance starts decreasing.


Introduction
An increased interest in classification methods has emerged in the last three decades, receiving considerable research attention especially in Earth Observation (EO) applications through land cover and habitat mapping by means of both optical and radar satellite images.
To this end, a large number of studies have been carried out over the years in an attempt to improve classification accuracy, as extensively summarized in [1], and a variety of satellite data have become available under a free access policy.
One of the most challenging applications of remote sensing is the monitoring of natural and semi-natural grassland ecosystems, representing one of the largest landscape units in the terrestrial system [2]. EO data and automatic classification techniques can support the mapping of grassland ecosystems. However, the spectral signature of such ecosystems can be rather complex due to the heterogeneous nature of the habitats composing them [2,3] and, despite recent successful attempts, grasslands mapping is still regarded as challenging [4].
Some studies have already addressed natural and semi-natural grassland ecosystems monitoring at medium-high spatial resolution, exploiting both optical and SAR data by using machine learning approaches [5][6][7][8][9][10][11][12][13]. In particular, a Support Vector Machine (SVM) classifier was used to analyze, both separately and in combination, a series of four optical (5 to 30 m) and five SAR images (12 m), for discriminating natural grasslands from croplands in areas affected by frequent clouds [5]. SVM was also used to distinguish seven [...]
Although efforts have been made on the automatic mapping of grassland habitats by means of machine learning techniques, to the best of our knowledge there are no publications that have explored the use of CNNs for this purpose.
Thus, the goal of the present work is to assess the performance of a patchwise-trained ConvNet, using multispectral data as input, for a grassland habitats discrimination problem. Two experiments were carried out: (a) evaluating how the information needed for classification by a ConvNet spreads over convex sets of reference pixels, split into patches, by varying the size of the square input patches; (b) assessing the quality of a patchwise-trained, ConvNet-based pattern detector fed with multispectral data and applied to grassland habitats mapping. Furthermore, the performance of a ConvNet was compared with that of a corresponding fully connected architecture network.

Study Area and Grassland Habitats Characterization
The study area is located in the Mediterranean basin within the Apulia region, Southern Italy. The area (red boundary, Figure 1) covers nearly 800 km² within the Natura 2000 "Murgia Alta" protected area (IT9120007) (black boundary, Figure 1). This is a Site of Community Importance and, in addition, a Special Protection Area that has been included in a National Park since 2004. The altitude of the area ranges from 285 to 680 meters above sea level. The site is characterized by a typical Mediterranean agro-pastoral landscape with a millennial land-use history, mainly occupied by semi-natural rocky dry grasslands, traditionally used as extensive pastures [29]. In "Murgia Alta", the semi-natural grassland ecosystem hosts numerous endemic, rare or trans-Adriatic distribution species [30]. This area is considered of crucial importance for the conservation of wildlife and priority species [31].
During the last three decades, this unique ecosystem has been exposed to tremendous impacts on local biodiversity and to accelerated processes of habitat degradation, fragmentation and biotic contamination (i.e., woody encroachment), both within and next to its borders, due to both agricultural intensification (transformation of grassland pastures into cereal crops) and land abandonment. Furthermore, the long-term below-average rainfall (climate change), the increase in both legal and illegal mining activities, wind farm infrastructures and arson [32,33], and the spread of invasive species contribute to the threat to the ecosystem, which is in danger of destruction [34,35].
The four grassland habitat types considered in the study area are listed and described in Table 1.

Type_1
Semi-natural and natural dry grasslands and scrubland facies on calcareous substrates (Festuco-Brometalia). This habitat in Murgia Alta is limited to small, highly fragmented patches located at higher elevations, where agriculture and pasture have been abandoned.

Type_2
62A0/E1.55 Eastern sub-Mediterranean dry grasslands (Scorzoneratalia villosae). This habitat is the most widespread and dominant habitat in the study area and is characterized by the endemic feather grass Stipa austroitalica, which constitutes perennial prairies with a rocky nature.

Type_3
6220*/E1.434 (where * indicates priority habitat) Pseudo-steppe with grasses and annuals of the Thero-Brachypodietea. In Murgia Alta, this habitat consists of different types of grasslands, both annual and perennial. Annual communities resulting in small patches of less than 10 meters are not considered in the present study. Only Hyparrhenia hirta perennial communities are considered in this work.

Type_4
No code in Annex I (X)/E1.61-E1.C2-E1.C4 Mediterranean subnitrophilous grass communities, thistle fields and giant fennel (Ferula) stands. In the study area, such a grassland type consists of both annual and perennial communities. These grassland communities are generally found in lower-elevation areas. Since these areas are easier to access, they have been cultivated and used for sheep grazing. The listed grassland communities include EUNIS taxonomy codes E1.61-E1.C2-E1.C4.

Ground Truth
To obtain a set of reference polygons, georeferenced surveys of the vegetation were carried out using the phytosociological method of the Zurich-Montpellier Sigmatist School [38], based on the complete floristic composition of the plant community investigated. This approach is recognized at the EU level [39,40] since it allows a precise diagnosis for many habitats of the Directive and in particular for grassland habitats. For our work, the sampling was first stratified, i.e., the relevés were carried out randomly in areas previously identified on the basis of their physiognomic and structural homogeneity. Then, after a multivariate numerical classification using the coverage values transformed according to the scale proposed by van der Maarel (1979) [41], the different plant community types were identified and consequently attributed to the habitats of both the EU Directive [39], based on the "Interpretation Manual of European Union habitats", and the "Manuale Italiano di Interpretazione degli Habitats" [42,43]. Thereafter, a polygon related to a homogeneous area around each relevé was identified by visual interpretation of the available orthophoto (2018). The data reported in Table 1 are related to four different grassland habitat classes, which correspond to the same Land Cover (LC) class of semi-natural grasslands.
Due to the highly time-consuming process involved in the recognition and collection of the ground data, the reference polygon dataset has a rather small cardinality and is highly unbalanced among the classes, with a lower presence of habitats characterized by small and highly fragmented patches. For each class, 75% of the available ground truth samples (54,725 pixels) were randomly selected for training and testing. Validation of the classifier was performed using 100% of the ground truth data (73,386 pixels) in a k-fold procedure. Figure 3 shows the distribution of ground reference samples for each grassland habitat class.

Satellite Data
For the year 2018, four multi-seasonal Sentinel-2A images were freely downloaded from the United States Geological Survey (USGS) EarthExplorer portal [44], selecting images with less than 10% cloud cover on the study area. Table 2 reports the date of acquisition for each image. The entire study area was covered by the tile 33TXF and the orbit R036 was considered. Level-2A surface reflectance products were downloaded for the images.

Algorithm for Habitat Mapping
To discriminate between the four grassland habitats listed in Table 1, a CNN classifier was adopted, in order to investigate its performance on an application characterized by a complex reference dataset, as detailed in Section 2.2.1.

CNN Classifier Configuration
CNNs are feedforward neural networks and are characterized by sparse connectivity: neurons of adjacent layers are connected only by local connections. In addition, each neuron in a layer shares the same weights and bias. CNNs consist of a series of convolutional, pooling, and nonlinear activation layers. They select features automatically by applying multiple filters (called convolutional kernels) on the input images in the form of multidimensional arrays (i.e., image patches), and learn to select the ones that are necessary for the images' proper classification [45]. Features in a rectangular neighborhood are then aggregated into one feature by the pooling layer [46]. CNN parameters are divided into hyper-parameters and non-hyper-parameters: the former include input size, convolution kernel size, number of convolution kernels, pooling kernel size, and learning rate; the latter refer to the weights of the hidden layers, which are adjusted during training by using a Back Propagation (BP) algorithm [47]. However, no definite rules have been codified for the optimization of the CNN parameters, so the choice of their setting depends on the user's experience [48,49]. CNNs need large training image datasets [50], which are usually not available; hence, to correctly train large architectures, pre-trained CNNs [51][52][53][54][55] or techniques such as transfer learning [56,57] and active learning [58] are used to overcome this problem.
The most widespread strategy in land cover classification is based on the use of patches by applying moving windows with a fixed size on each pixel [59][60][61][62][63]. By varying input and output network configurations through the patch sizes, classification accuracy and computational cost can be adjusted according to the scenario that has to be classified.
CNNs are most frequently used to classify image data with high spatial resolution (especially up to 10 m) thanks to their ability to extract high-level feature information using a single image scene or multisource remote-sensing data [26]. Thus, a CNN was adopted for our study due to the 10 m spatial resolution of multi-temporal Sentinel-2 data considered for the grassland habitat mapping and the possibility to exploit the contextual information contribution by using a per patch approach.
A basic ConvNet for image classification relies on the following architecture: input-conv-pool-Fully Connected (FC) [64], where the main purpose of the pool layer is to reduce the size of the input data [65]. Because the patch size used in these experiments is small compared to the convolutional kernel sizes usually chosen with ConvNets, the pool layer was removed.
Consequently, the patch-tailored ConvNet architecture adopted in this study is described in Table 3.
Table 3 (final row, as recoverable): FC layer with num_classes (4) output neurons and softmax activation [67].
In Table 3:
• The input layer specifies the size of the patches, which in our case varies between 1 × 1 and 6 × 6, while num_bands (in our case equal to 40) refers to the depth of the multispectral Sentinel data;
• kernel_size is an integer specifying the height and width of the 2D convolution window, whereas depth (equal to 32 in our case) is the dimensionality of the output space (i.e., the number of output filters in the convolution). Such value was determined in the meta-parameter tuning procedure as an effective compromise between the architecture complexity and the learning curve convergence;
• output_size (equal to 128 in our case) is the number of output neurons of the first FC layer;
• num_classes (equal to 4 in our case) is the number of output neurons of the last FC layer.
In order to prevent the network from overfitting, a dropout layer [68] was added before each of the two FC layers.
In addition to the settings in Table 3, for all layers, the model used biases initialized at zero and a Glorot uniform weight initializer [69]. This method was implemented according to Keras [70]. Figure 4 shows the adopted network architecture in terms of units and connections for the case with input patches of 5 × 5 size. The architecture of our network has been designed to comply with our goals: (1) show the effectiveness of a CNN-based approach with our data; (2) show how the information spreads over multiple multispectral pixels. The selected number of units represents a compromise between the computational effectiveness and performance of the network.
The input multispectral dataset includes 40 bands as fully detailed in Section 2.2.2.
To generate the patch-based dataset, the set of polygons was tessellated, embedding the on-site verified pixels in square patches. Patches of multispectral pixels with 1 × 1, 2 × 2, 3 × 3, 4 × 4, 5 × 5, and 6 × 6 sizes were generated by using the procedure described above, as shown in Figure 5. In Figure 5, the blue areas highlight the masked pixels (i.e., pixels set to zero to prevent their contribution), the light purple areas show the unmasked pixels, the green areas show the closed sets delimiting the ground truth, the light yellow lines highlight overlapping patches, and the black thick box represents a 4 × 4 patch size. A pixel is considered as part of a patch if its center lies inside a ground truth polygon. The x and y axes show the pixel coordinates.
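The masking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `extract_masked_patch`, the nested-list image layout (rows × columns × bands) and the boolean grid `inside` (true where a pixel center falls inside a ground truth polygon) are all hypothetical.

```python
def extract_masked_patch(image, inside, row, col, size):
    """Cut a size x size patch whose top-left corner is (row, col).

    image  : nested list [rows][cols][bands] of reflectance values
    inside : nested list [rows][cols] of booleans, True where the pixel
             center lies inside a ground truth polygon (assumed input)
    Pixels outside the polygon are masked: every band is set to zero,
    so they cannot contribute to the classification.
    """
    num_bands = len(image[0][0])
    patch = []
    for r in range(row, row + size):
        patch_row = []
        for c in range(col, col + size):
            if inside[r][c]:
                patch_row.append(list(image[r][c]))   # unmasked pixel
            else:
                patch_row.append([0.0] * num_bands)   # masked pixel
        patch.append(patch_row)
    return patch


# Toy 3 x 3 scene with 2 bands; only the left column is inside a polygon.
image = [[[0.1 * (r + 1), 0.2 * (c + 1)] for c in range(3)] for r in range(3)]
inside = [[c == 0 for c in range(3)] for r in range(3)]
patch = extract_masked_patch(image, inside, 0, 0, 2)
```

In the study the patches carry 40 bands and sides from 1 to 6 pixels; the toy scene uses 2 bands only to keep the example readable.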

Experimental Setting
As stated above, besides generally assessing the effectiveness of ConvNets with patched multispectral data, the purpose of the present study was to determine how the information needed for classification spreads to patterns over convex sets of pixels. To this end, six datasets of square patches with 1 × 1, 2 × 2, 3 × 3, 4 × 4, 5 × 5, and 6 × 6 sizes were created. Due to the spatial layout of the data, the 6 × 6 size is the maximum one allowed by the dataset in order to avoid the Type_1 patches disappearing. Table 4 shows the patch distribution with respect to the classes. The data in Table 4 represent the number of patches generated with different sizes for each grassland habitat class. Every pixel within the patches covers an area of 10 × 10 meters on the ground.
As evident by inspecting Table 4, the dataset grows more and more unbalanced with the patch size. To restore balance, the classes have been weighted in the loss function according to Equation (1), where freq_i and w_i are, respectively, the frequency and weight of the i-th class.
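As an illustration of frequency-based class weighting, the sketch below uses plain inverse-frequency weights normalized by the number of classes. This particular form is an assumption made for the example and is not necessarily the paper's Equation (1); the function name and the patch counts are hypothetical as well.

```python
def inverse_frequency_weights(class_counts):
    """Return one loss weight per class, larger for rarer classes.

    Assumed form (not taken from the paper): w_i = 1 / (C * freq_i),
    where freq_i is the relative frequency of class i and C is the
    number of classes, so that sum_i freq_i * w_i = 1.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    weights = {}
    for name, count in class_counts.items():
        freq = count / total
        weights[name] = 1.0 / (num_classes * freq)
    return weights


# Hypothetical patch counts, chosen only to mimic a dominant Type_2 class.
counts = {"Type_1": 50, "Type_2": 700, "Type_3": 100, "Type_4": 150}
weights = inverse_frequency_weights(counts)
```

A dictionary of this kind, keyed by integer class index rather than by name, is the shape that the `class_weight` argument of the Keras `fit` method expects.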
To assess the quality of a ConvNet in a grassland habitats discrimination problem, two experiments were carried out:
A. Evaluating how the information needed for classification spreads over multiple multispectral pixels by varying the size of the square input patches to the ConvNet (Information Localization). In detail, in our setting the kernel size of the ConvNet (kernel_size × kernel_size) was grown linearly with the patch size. As no padding was set up, the FC part of the network remained unchanged while the convolution kernel increased and took charge of the pattern recognition task.
B. Comparing the performance of a ConvNet with that of a corresponding FC architecture network. For a fair comparison, the FC was set up by leaving the original ConvNet untouched with the exception of the kernel size of the convolutional layer, which, in this second instance, was kept at a 1 × 1 size.
Our CNN settings include:
• A total of 120 epochs with a batch size of 32 and 1000 steps per epoch;
• A kernel size equal to the size of the input patches;
• An Adadelta optimizer with: (a) 0.001 as learning rate; (b) a decay rate of 0.95; (c) a stability factor of 1 × 10^−7.
The Keras [70] framework was adopted, and the experiments were performed on a Lenovo ThinkStation P520 running Ubuntu 18.04. Figure 6 shows the flowchart of the different steps implemented in our experiment.
Figure 6. Flowchart of the algorithm implemented for the grassland habitat mapping.

Accuracy Assessment
Due to the small size, the scarcity and the unbalanced character of the dataset, tessellating our ground truth polygons has proven progressively challenging when dealing with patches of increasing size. Therefore, at first, we split the dataset into two sets, one for training and testing (to perform hyperparameters tuning) and one for validation.
Once the hyperparameters had been tuned, we performed a final validation by stratified k-fold (with k = 3) using the whole dataset: the dataset was therefore randomly split three times into a training set and a test set, and the final test score was the average of the performance over the three combined training and test procedures. Stratified k-fold provides train/validation indices to split data into train/validation sets [71]. This cross-validation object is a variation of k-fold that preserves the percentage of samples for each class. As our class distribution was strongly uneven, the performance of our system was measured with the F1-score metric, as detailed in the following formulas:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2\,\frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively.
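The validation scheme can be illustrated with a standard-library sketch: a stratified assignment of samples to k folds, and a per-class F1-score computed from true positives, false positives and false negatives. The function names and the toy labels are hypothetical, and this stand-in is not the library routine cited in [71].

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Assign every sample index to one of k folds, class by class,
    so that each fold keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin to the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def f1_per_class(y_true, y_pred, label):
    """F1-score of one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, deliberately unbalanced labels (hypothetical).
labels = ["Type_2"] * 9 + ["Type_4"] * 6 + ["Type_1"] * 3
folds = stratified_folds(labels, k=3)
```

Each fold serves once as the test set while the remaining two are used for training, and the three test scores are then averaged.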

Grassland Habitats Characterization
Surface reflectances represent the input features of the classification procedure. They were, therefore, initially analyzed to provide grassland habitats characterization in terms of their spectral and seasonal behaviors. The surface reflectance values were analyzed separately for each season and habitat, resulting in 16 spectral signatures (Figure 7).
It can be noticed that all of the classes exhibited the highest photosynthetic activity in spring, except for Type_3, for which winter represented the peak of the biomass season. This can be explained considering that this habitat is typically located on south-facing slopes. The availability of water, due to precipitation, together with relatively mild temperatures, allowed this habitat to reach its photosynthetic peak earlier than the other types. Although Type_1 showed the same seasonality as both Type_2 and Type_4, it differed from them in terms of absolute values of surface reflectance in the red-edge and NIR bands in spring. Type_2 and Type_4 habitats showed similar spectral responses for the different seasons. Some differences could be observed only for the surface reflectances of the SWIR bands in autumn.
The spatial characterization of the different grassland habitats can be observed hereafter. In Figure 8, the yellow 6 × 6 patch, overlaid on samples of the different classes of grassland habitat, highlights the higher heterogeneity of the Type_4 class on the ground compared to the other classes. Hence, as the patch size increased, this heterogeneity reduced the ability of the CNN to discriminate that class.

Information Localization
The findings obtained in the first experiment (Section 2.3.2, Experiment A) are reported in Table 5 and shown in Figures 9 and 10. Figure 9 reports the F1-score values computed for each class and for each input patch size. The averaged F1-score (Table 5) increased as the patch size increased up to the 5 × 5 size and then showed a slight decrease with the 6 × 6 patch size. The explanation of this behavior is twofold: on the one hand, the dataset grows more and more asymmetric with the patch size, which makes it harder to re-balance; on the other hand, the Type_4 class, which is non-specific as it collects all the vegetation types not belonging to the other types, grows noisier with the patch size and eventually drives the overall behavior. In other words, unlike the other cases, increasing the patch size augments the amount of noise present in the Type_4 patch instead of the amount of information. This fact is highlighted by the per-type F1-score graph (Figure 9).
In Figure 10, the accuracy estimators, i.e., precision (Figure 10a) and recall (Figure 10b), are plotted for the different classes against the input patch size.
It is interesting to note that Type_4 displayed a decreasing trend that was slightly different from the other types. This is especially evident in the precision graph, while the decrease in the recall graph is slightly less noticeable. However, the F1-score being the harmonic mean of precision and recall, its decreasing trend is clearly biased by the precision contribution, which, in turn, is determined by a significant variation of FP over the FN. An ideal compromise appears to be suggested by the 5 × 5 patch size with the four habitat types showing similar values both in precision and recall.
The results of the grassland habitat mappings and the percentage of pixels mapped as the patch size varies can be seen in Figures 11 and 12, respectively. From the prior knowledge provided by ecologists, it is known that the study area is characterized by the presence of 70-80% of the dominant class Type_2, followed by the class Type_4 and then by a minority of Type_1 and Type_3 classes. Such a distribution can also be evaluated from the ground truth samples considered for training (Figure 3). To handle the asymmetric nature of our dataset, we adopted a twofold strategy: first, we balanced the contribution of each class type in the loss function with respect to their frequency; second, we chose the F1-score as our metric to balance precision and recall.
Mappings obtained at larger patch sizes (i.e., 5 × 5 or 6 × 6) can be considered more reliable with respect to the expected distribution of grassland habitats, whereas at smaller patch sizes an overestimation emerged, mainly of the Type_3 class followed by Type_1. The overestimated pixels are misclassified pixels of the dominant Type_2 class. The specific distribution of the mapped pixels as the patch size varies can be seen in Figure 12.
The effect of the patch size growing is to increase the percentage of grassland assigned to Type_2 (Figure 12), in agreement with the expected spatial distribution of the four habitats in the study site.

ConvNet vs. FC Network Performance
Figure 13 shows the results of the second experiment (Section 2.3.2, Experiment B). The fully connected network corresponds to the case of a ConvNet with a 1 × 1 kernel size. As seen in Figure 13, the ConvNet-based architecture outperformed the corresponding FC when the size of the input patches grew past 3 × 3. This result is more relevant when the numbers of parameters used in the two approaches are also compared (Figure 14). As shown in Figure 14, the number of parameters needed for the FC architecture to approximately match the ConvNets in the 5 × 5 and 6 × 6 cases was more than two times larger.
To further evaluate this aspect, we performed an additional experiment reducing the number of parameters of the FC architecture to a size similar to the ConvNet case. To this end, we reduced the depth of the convolution layer from 32 to 12 and fed it with the patches exhibiting the best performance (5 × 5 input patches). This resulted in an architecture with 39,536 parameters for this FC case, to be compared with the 36,772 parameters of the corresponding ConvNet with a filter kernel of 5 × 5 size. Figure 15 shows the results. In Figure 15, the F1-scores of the two architectures are further compared with the ConvNet architecture with 3 × 3 patches as input. The FC architecture using 5 × 5 patches (middle column) was outperformed in terms of F1-score by the ConvNet using 5 × 5 patches, which relied on a similar number of parameters (left column). Moreover, it performed slightly worse than the ConvNet case using 3 × 3 patches (right column), which relied on a smaller number of parameters.
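The parameter counts quoted above can be checked with a short calculation. The sketch below assumes the layer sequence described in the text (one convolution without padding, then an FC layer with 128 units and an FC layer with 4 units, all with biases) and a stride of 1; the stride and the function name `count_parameters` are assumptions of this example.

```python
def count_parameters(patch_size, kernel_size, depth,
                     num_bands=40, fc_units=128, num_classes=4):
    """Trainable parameters of conv -> FC(fc_units) -> FC(num_classes)."""
    # No padding, stride 1 (assumed): output side = patch - kernel + 1.
    out_side = patch_size - kernel_size + 1
    # One weight per kernel cell, band and filter, plus one bias per filter.
    conv = kernel_size * kernel_size * num_bands * depth + depth
    # The convolution output is flattened before the first FC layer.
    flattened = out_side * out_side * depth
    fc1 = flattened * fc_units + fc_units
    fc2 = fc_units * num_classes + num_classes
    return conv + fc1 + fc2


wide_kernel_5x5 = count_parameters(5, 5, depth=32)   # ConvNet, 5 x 5 kernel
reduced_fc_5x5 = count_parameters(5, 1, depth=12)    # FC case, depth 12
full_fc_5x5 = count_parameters(5, 1, depth=32)       # FC case, depth 32
```

Under these assumptions the function returns 36,772 for the wide-kernel ConvNet and 39,536 for the reduced FC case, matching the figures in the text, and 104,356 for the FC case with the original depth of 32, i.e., roughly 2.8 times the ConvNet count.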
This experiment highlights the typical ConvNet ability to leverage the spatial correlation among the pixels: a multilayer perceptron (FC) fed with 5 × 5 patches, with a kernel size of 1 × 1, needs ~3 times the number of parameters of a ConvNet to achieve similar (although still slightly inferior) performance. Indeed, single-pixel fed networks are also able to exploit spatial correlation; however, unlike ConvNets, they are not specifically designed for it, and therefore they need more parameters and more training (Figure 15).

Discussion
The present work aimed at assessing the performance of a ConvNet classifier for a grassland habitats discrimination problem. The Mediterranean grassland habitats considered require a highly time-consuming process for the recognition and collection of the ground truth data to be used for training and validation of the classifier. This implies that the reference polygon dataset is characterized by a rather small cardinality and a high asymmetry, mainly for those grassland habitat classes with lower presence on the ground that are characterized by small and highly fragmented patches. Due to these peculiarities of the specific application, obtaining a reliable mapping for these grassland habitats represents a challenge.
Our work exploited the effectiveness of a CNN in the detection of grassland habitats and its suitability to manage such a problem with a training dataset characterized by low cardinality and asymmetry. These limitations were approached by tessellating the input dataset with square patches centered around the reference data for each sample. As the patch size was enlarged, an increase in accuracy in terms of F1-score was registered for all the classes except Type_4. This result can be explained considering the essential role of contextual information around each training sample, due to the correlation among pixels. The use of such a per patch approach can be assimilated to an automatic spatial feature extraction. The patch size cannot be increased indefinitely because of the risk, observable in Table 4, of losing the Type_1 class. Another limitation of increasing the patch size was the worsening of the accuracy for the Type_4 class. This can be due to the specific heterogeneity of the Type_4 class (Figure 8), which is composed of different grassland communities: presumably, the larger the patch size, the noisier the informative contribution associated with the patch, which can cause misclassification.
However, observing the overall habitat mapping obtained by using different patch sizes, a reduction of the overestimated areas covered by the Type_3 class, followed by Type_1, can be noticed in favor of the dominant habitat Type_2 (Figures 11 and 12). Patch size growth results in a distribution of grassland habitat areas that is in good agreement with the expected spatial distribution in the study site. Indeed, the "Murgia Alta" site is characterized by a large presence of Type_2 habitat of almost 70-80% and a lower presence of the remaining types. Hence, the use of a variable patch size can be considered a useful approach to take into account the edaphic conditions of the grasslands ecosystem in the study site. Moreover, the combined use of multi-seasonal and multispectral information derived from the four selected satellite images as input to the CNN seems to provide encouraging results in grassland habitat discrimination.
It is well-known that ConvNets are able to exploit possible correlations among closely located data on a 2D grid by extracting information in a way similar to that in which 2D nonseparable finite impulse response (FIR) filters extract frequency content.
From a theoretical standpoint, the difference between the two approaches examined in this work (1 × 1 patches and N × N patches with N > 1) is highlighted by Equation (3). In Equation (3), the top equation refers to the 1 × 1 patch case (spk, single pixel kernel) while the bottom one refers to ConvNets equipped with wide-kernels (wk) covering a whole patch (with patch size bigger than 1 × 1). In this representation, biases have been neglected.
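One plausible way to write the two cases, offered here only as a sketch and not as the original Equation (3), is the following. The notation is an assumption of this example: x_b(i, j) is the value of band b at pixel (i, j) of an N × N patch with B bands, w denotes the weights of the k-th kernel, and F is the activation function; biases are neglected as in the text.

```latex
% Single pixel kernel (spk): one output per pixel, mixing only the bands.
y^{\mathrm{spk}}_{k}(i,j) = F\!\left( \sum_{b=1}^{B} w_{k,b}\, x_{b}(i,j) \right)

% Wide kernel (wk): one output per patch, mixing bands and pixel positions.
y^{\mathrm{wk}}_{k} = F\!\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{b=1}^{B} w_{k,b}(i,j)\, x_{b}(i,j) \right)
```

In this form the activation is applied per pixel in the spk case, before any spatial mixing takes place, whereas in the wk case it is applied once to a weighted sum over the whole patch.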
In our case, the chosen F is a Rectified Linear Unit (ReLU), which is compliant with the standard ConvNet setting. Assuming our input data are made up of positive values, as in the considered surface reflectance product, the consequence of this choice is that the 32 wide-kernels (wk) of the convolution layer are allowed relevance by the activation function only when their gain is positive: after training, such a convolution layer is able to cover the whole input spectrum up to the Nyquist frequency, therefore fully exploiting the informative frequency content present in the input. The above described machinery is absent in the "spk" case: without a bias, the ReLU would cut the negative weights, causing the first layers to be trained (and behave) as a set of low pass FIR filters. This fact would prevent the elaboration from exploiting the full frequency content of the input data, as high frequencies would be excluded from further processing. A bias can alleviate this problem by assuming positive values, but this becomes a requirement of the architecture rather than a range learned from the dataset. This explains why, to achieve decent results (Figure 15), the "spk" approach requires more complexity (i.e., more parameters).

Conclusions
The aim of the present study was to investigate the improvements that can be obtained by applying CNN techniques for mapping grassland habitats. Specifically, we have analyzed the effectiveness of ConvNets with a multi-seasonal dataset of four Sentinel-2 images. To this end, we compared two approaches differing only by the first layer machinery, which was instantiated as a fully connected layer (fully-connected case) and as a ConvNet equipped with kernels covering the whole input (wide-kernel ConvNet).
Our results show that: (a) with an F1-score of around 97% (5 × 5 patches), ConvNets provided an excellent tool for patch-based pattern recognition with multispectral data without requiring special feature extraction; (b) the information spreads beyond the limit of a single pixel: the performance of the network increased up to the 5 × 5 patch size and then started decreasing for larger patches. This decrease in performance is probably ascribable to: (a) overfitting caused by the increasing size of the parameter set; (b) the dataset becoming extremely asymmetric and no longer balanceable with Equation (1); (c) the information residing near the patch boundaries being no longer relevant and possibly misleading; (d) the decrease of the samples available for training (Table 4), which can become insufficient. Further studies will be necessary to assess the exact nature of this phenomenon.