# Vineyard Gap Detection by Convolutional Neural Networks Fed by Multi-Spectral Images


## Abstract


## 1. Introduction

- NDWI is used for remote sensing of vegetation liquid water from space. This type of monitoring is used in agriculture and in forest monitoring for fire-risk evaluation, and is particularly relevant in the context of climate change. NDWI is responsive to changes in the water content of leaves in vegetation canopies and is less sensitive than NDVI to atmospheric effects [24].
- CIR uses the near-infrared (NIR) portion of the electromagnetic spectrum. This type of imagery is particularly useful for distinguishing plant species, since hue variations are more pronounced than in the visible light spectrum. CIR can also be used to detect changes in soil moisture [25].
- RGB is the most common image data available, as it reproduces images in the visible light spectrum. RGB is an additive color model in which three colors, red, green, and blue, are combined to produce a wider color gamut. GS images can be created from RGB with standard image processing techniques; however, since several GS images were already provided in this work, the RGB images were not additionally converted to GS.
- NDRE combines NIR and a band between visible red and NIR. This index is very similar to NDVI but is more sensitive to the different stages of crop maturation, and is more suitable than NDVI for late crop seasons, after the vegetation has accumulated a greater concentration of chlorophyll. This makes NDRE better suited to the entire cultivation season [26].
- NDVI is the oldest remote sensing technique used for vegetation monitoring. By observing different wavelengths (visible and non-visible light), one can determine the density of green vegetation on a patch of land. The chlorophyll pigment in plant leaves absorbs visible light (0.4 to 0.7 µm) during photosynthesis, while the cell structure of the leaves reflects NIR. This index is well suited to estimating how much of an area is covered by plants, and can be very effective at detecting gaps in green crops [27].
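The indices above are simple normalized band ratios. As an illustrative sketch (the function names and the assumption that band reflectances are floats in [0, 1] are ours, not from the original text), NDVI and NDRE can be computed per pixel as:

```python
# Per-pixel spectral indices described above, as a minimal sketch.
# Band values are assumed to be reflectances in [0, 1].

def normalized_difference(a: float, b: float) -> float:
    """Generic normalized difference (a - b) / (a + b), guarding against a + b == 0."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(nir: float, red: float) -> float:
    # NDVI contrasts NIR (reflected by leaf cell structure) with visible red
    # (absorbed by chlorophyll), as described in the text [27].
    return normalized_difference(nir, red)

def ndre(nir: float, red_edge: float) -> float:
    # NDRE swaps the red band for a red-edge band between visible red and NIR [26].
    return normalized_difference(nir, red_edge)

if __name__ == "__main__":
    # Dense green vegetation reflects much more NIR than red, so NDVI is high.
    print(round(ndvi(nir=0.50, red=0.10), 3))  # 0.667
    # Bare soil reflects red and NIR similarly, so NDVI is near zero.
    print(round(ndvi(nir=0.30, red=0.25), 3))
```

The same normalized-difference pattern applies to NDWI, with the second band replaced by the appropriate water-sensitive band [24].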

## 2. Related Works

## 3. Materials and Methods

- Stage 1: Training all regression-based networks. For this, the networks from the repository [59] were trained with the following configuration: input images of 406 × 406 pixels, a batch size of 32 with 4 subdivisions, 3 channels, a momentum of 0.9, a decay rate of 0.0005, saturation and exposure of 1.5, a hue of 0.1, a learning rate of 0.001, and a maximum of 4000 batches. The maximum number of batches determines how long the network trains for. The network parameters were left at the repository defaults at every stage of training.
- Stage 2: Training the ‘head’ layers of Mask-RCNN for 30 epochs. These layers comprise the region proposal network (RPN), the mask head layers, and the classifier layers. We used the Mask-RCNN implementation from the repository [60], which is built on a feature pyramid network (FPN) with a ResNet101 backbone. The weights resulting from Stage 1 were fed as starting weights for Mask-RCNN. The configurations were left at the defaults in the repository configuration files.
- Stage 3: Continuation of Mask-RCNN training, this time with ‘all’ layers trained for 50 epochs. The same repository and configurations as in Stage 2 were used, with the resulting weights of Stage 2 as starting weights.
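The Stage 1 hyperparameters listed above map directly onto fields of a Darknet `.cfg` file [59]. The fragment below restates them in that syntax as an illustrative sketch (only the `[net]` block is shown, reproducing the values from the text rather than the exact file used):

```ini
[net]
# Input image dimensions and channels
width=406
height=406
channels=3
# Batching: 32 images per batch, processed in 4 subdivisions
batch=32
subdivisions=4
# Optimizer settings
momentum=0.9
decay=0.0005
learning_rate=0.001
max_batches=4000
# Data augmentation
saturation=1.5
exposure=1.5
hue=0.1
```

In Stages 2 and 3, the corresponding knobs are the `layers` and `epochs` arguments of the Matterport repository's `model.train(...)` call [60], invoked with `layers='heads'` for 30 epochs and then `layers='all'` for 50 epochs.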

The hardware used was an Intel® Core™ i7-9750H CPU @ 2.60 GHz × 12 and an NVIDIA GeForce GTX 1660 Ti/PCIe/SSE2 graphics card, and the languages and frameworks used were Python, TensorFlow, Keras, and OpenCV.

#### 3.1. The Images and Datasets

**Algorithm 1** Crop images with overlap.

```
Find every X and Y that defines the borders of the new images
for every y_value in Y do
    for every x_value in X do
        Split the image according to the coordinates
        if the cropped image only contains alpha values then
            Skip image
        end if
        Save the cropped image as a new image
    end for
end for
```
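Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation only: the tile size, overlap fraction, and the alpha-only predicate are assumptions, and the image is represented as a plain height × width nested list rather than an image file.

```python
def crop_coords(length: int, tile: int, overlap: float) -> list[int]:
    """Start offsets along one axis so that consecutive tiles overlap by `overlap`."""
    step = max(1, int(tile * (1 - overlap)))
    coords = list(range(0, max(length - tile, 0) + 1, step))
    # Ensure the final tile reaches the image border.
    if length > tile and coords[-1] != length - tile:
        coords.append(length - tile)
    return coords

def crop_with_overlap(image, tile, overlap, is_alpha_only):
    """Yield (x, y, crop) tuples, skipping crops that contain only alpha values."""
    height, width = len(image), len(image[0])
    for y in crop_coords(height, tile, overlap):
        for x in crop_coords(width, tile, overlap):
            crop = [row[x:x + tile] for row in image[y:y + tile]]
            if is_alpha_only(crop):
                continue  # "Skip image" branch of Algorithm 1
            yield x, y, crop
```

For example, `crop_coords(10, 4, 0.5)` produces the start offsets `[0, 2, 4, 6]`, i.e., 4-pixel tiles with 50% overlap covering the full axis.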

**Algorithm 2** Separate images into train batch and test batch.

```
Create a parent folder with each image format separated into different folders
percentage = fraction of images going into the test batch
for every folder in directory do
    for every file in folder do
        Choose a random number in an interval
        if number < percentage then
            Move image to test batch folder
        else
            Move image to train batch folder
        end if
    end for
end for
```
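The random split in Algorithm 2 can be sketched as below; for illustration, moving files between folders is replaced by returning two lists, and the `seed` parameter is our addition for reproducibility.

```python
import random

def split_dataset(files, test_fraction, seed=None):
    """Randomly assign each file to the train or test batch (Algorithm 2)."""
    rng = random.Random(seed)
    train, test = [], []
    for name in files:
        # "Choose a random number in an interval; if number < percentage,
        # move to test batch, else move to train batch."
        (test if rng.random() < test_fraction else train).append(name)
    return train, test
```

Note that this draws one random number per file, so the realized test fraction only approximates `test_fraction` for small datasets.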

#### 3.2. The Code and Repositories

## 4. Results

#### 4.1. Results of Training Regression-Based Networks

#### 4.2. Results of Training Mask-RCNN ‘head’ Layers

The hardware used was an Intel® Core™ i7-9750H CPU @ 2.60 GHz × 12 processor and an NVIDIA GeForce GTX 1660 Ti/PCIe/SSE2 graphics card, and Google Colab [70] was used for time management purposes.

#### 4.3. Results of Training Mask-RCNN ‘all’ Layers

The hardware used for this stage was an Intel® Core™ i7-8700 CPU @ 3.20 GHz × 12 and a GeForce RTX 2070 SUPER/PCIe/SSE2 graphics card. The computer was changed because the previous GPU could not complete the training and crashed.

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

CNN | Convolutional Neural Networks |

RCNN | Regional Convolutional Neural Networks |

YOLO | You Only Look Once |

SSD | Single Shot MultiBox Detection |

SVM | Multiclass Support Vector Machine |

MLR | Multinomial Logistic Regression |

SGD | Stochastic Gradient Descent |

ReLU | Rectified Linear Unit |

mAP | Mean Average Precision |

fps | Frames per Second |

FPN | Feature Pyramid Network |

RoI | Region of Interest |

RPN | Region Proposal Network |

IoU | Intersection over Union |

FCN | Fully Convolutional Network |

AR | Average Recall |

SS | Selective Search |

NDWI | Normalized Difference Water Index |

CIR | Color-Infrared |

NIR | Near-Infrared |

RGB | Red-Green-Blue |

GS | Gray-Scale |

NDRE | Normalized Difference Red-Edge |

NDVI | Normalized Difference Vegetation Index |

UAV | Unmanned Aerial Vehicle |

GDP | Gross Domestic Product |


PAN | Path Aggregation Network |

SPP | Spatial Pyramid Pooling |

NMS | Non-Maximum Suppression |

## Appendix A. Network Architectures

Layer | Filters | Size | Input | Output |
---|---|---|---|---|

0 conv | 16 | $3\times 3$ | $416\times 416\times 3$ | $416\times 416\times 16$ |

1 max | $2\times 2$ | $416\times 416\times 16$ | $208\times 208\times 16$ | |

2 conv | 32 | $3\times 3$ | $208\times 208\times 16$ | $208\times 208\times 32$ |

3 max | $2\times 2$ | $208\times 208\times 32$ | $104\times 104\times 32$ | |

4 conv | 64 | $3\times 3$ | $104\times 104\times 32$ | $104\times 104\times 64$ |

5 max | $2\times 2$ | $104\times 104\times 64$ | $52\times 52\times 64$ | |

6 conv | 128 | $3\times 3$ | $52\times 52\times 64$ | $52\times 52\times 128$ |

7 max | $2\times 2$ | $52\times 52\times 128$ | $26\times 26\times 128$ | |

8 conv | 256 | $3\times 3$ | $26\times 26\times 128$ | $26\times 26\times 256$ |

9 max | $2\times 2$ | $26\times 26\times 256$ | $13\times 13\times 256$ | |

10 conv | 512 | $3\times 3$ | $13\times 13\times 256$ | $13\times 13\times 512$ |

11 max | $2\times 2$ | $13\times 13\times 512$ | $13\times 13\times 512$ | |

12 conv | 1024 | $3\times 3$ | $13\times 13\times 512$ | $13\times 13\times 1024$ |

13 conv | 1024 | $3\times 3$ | $13\times 13\times 1024$ | $13\times 13\times 1024$ |

14 conv | 30 | $1\times 1$ | $13\times 13\times 1024$ | $13\times 13\times 30$ |

15 detection |
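The spatial dimensions in the table above follow two simple rules: a 3×3 convolution with stride 1 and 'same' padding preserves height and width, while a 2×2 max-pool with stride 2 halves them. The final 2×2 max-pool (layer 11) keeps the 13×13 size, which corresponds to a stride of 1 with padding; that stride-1 convention is assumed from Darknet's behavior. A sketch of the arithmetic:

```python
def conv_same(hw: int) -> int:
    """3x3 convolution, stride 1, 'same' padding: spatial size unchanged."""
    return hw

def maxpool(hw: int, stride: int = 2) -> int:
    """2x2 max-pool; stride 2 halves the size, stride 1 (with padding) keeps it."""
    return hw // stride if stride > 1 else hw

size = 416
for stride in (2, 2, 2, 2, 2, 1):  # the six max-pool layers in the table above
    size = maxpool(conv_same(size), stride)
print(size)  # 13
```

This reproduces the 416 → 208 → 104 → 52 → 26 → 13 → 13 progression of the table's Input/Output columns.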

Layer | Filters | Size | Input | Output |
---|---|---|---|---|

0 conv | 32 | $3\times 3$ | $416\times 416\times 3$ | $416\times 416\times 32$ |

1 max | $2\times 2$ | $416\times 416\times 32$ | $208\times 208\times 32$ | |

2 conv | 64 | $3\times 3$ | $208\times 208\times 32$ | $208\times 208\times 64$ |

3 max | $2\times 2$ | $208\times 208\times 64$ | $104\times 104\times 64$ | |

4 conv | 128 | $3\times 3$ | $104\times 104\times 64$ | $104\times 104\times 128$ |

5 conv | 64 | $1\times 1$ | $104\times 104\times 128$ | $104\times 104\times 64$ |

6 conv | 128 | $3\times 3$ | $104\times 104\times 64$ | $104\times 104\times 128$ |

7 max | $2\times 2$ | $104\times 104\times 128$ | $52\times 52\times 128$ | |

8 conv | 256 | $3\times 3$ | $52\times 52\times 128$ | $52\times 52\times 256$ |

9 conv | 128 | $1\times 1$ | $52\times 52\times 256$ | $52\times 52\times 128$ |

10 conv | 256 | $3\times 3$ | $52\times 52\times 128$ | $52\times 52\times 256$ |

11 max | $2\times 2$ | $52\times 52\times 256$ | $26\times 26\times 256$ | |

12 conv | 512 | $3\times 3$ | $26\times 26\times 256$ | $26\times 26\times 512$ |

13 conv | 256 | $1\times 1$ | $26\times 26\times 512$ | $26\times 26\times 256$ |

14 conv | 512 | $3\times 3$ | $26\times 26\times 256$ | $26\times 26\times 512$ |

15 conv | 256 | $1\times 1$ | $26\times 26\times 512$ | $26\times 26\times 256$ |

16 conv | 512 | $3\times 3$ | $26\times 26\times 256$ | $26\times 26\times 512$ |

17 max | $2\times 2$ | $26\times 26\times 512$ | $13\times 13\times 512$ | |

18 conv | 1024 | $3\times 3$ | $13\times 13\times 512$ | $13\times 13\times 1024$ |

19 conv | 512 | $1\times 1$ | $13\times 13\times 1024$ | $13\times 13\times 512$ |

20 conv | 1024 | $3\times 3$ | $13\times 13\times 512$ | $13\times 13\times 1024$ |

21 conv | 512 | $1\times 1$ | $13\times 13\times 1024$ | $13\times 13\times 512$ |

22 conv | 1024 | $3\times 3$ | $13\times 13\times 512$ | $13\times 13\times 1024$ |

23 conv | 1024 | $3\times 3$ | $13\times 13\times 1024$ | $13\times 13\times 1024$ |

24 conv | 1024 | $3\times 3$ | $13\times 13\times 1024$ | $13\times 13\times 1024$ |

25 route | 16 | $26\times 26\times 512$ | ||

26 conv | 64 | $1\times 1$ | $26\times 26\times 512$ | $26\times 26\times 64$ |

27 reorg_old | /2 | $26\times 26\times 64$ | $13\times 13\times 256$ | |

28 route | 27 24 | $13\times 13\times 1280$ | ||

29 conv | 1024 | $3\times 3$ | $13\times 13\times 1280$ | $13\times 13\times 1024$ |

30 conv | 30 | $1\times 1$ | $13\times 13\times 1024$ | $13\times 13\times 30$ |

31 detection |

Layer | Filters | Size | Input | Output |
---|---|---|---|---|

0 conv | 16 | $3\times 3$ | $416\times 416\times 3$ | $416\times 416\times 16$ |

1 max | $2\times 2$ | $416\times 416\times 16$ | $208\times 208\times 16$ | |

2 conv | 32 | $3\times 3$ | $208\times 208\times 16$ | $208\times 208\times 32$ |

3 max | $2\times 2$ | $208\times 208\times 32$ | $104\times 104\times 32$ | |

4 conv | 64 | $3\times 3$ | $104\times 104\times 32$ | $104\times 104\times 64$ |

5 max | $2\times 2$ | $104\times 104\times 64$ | $52\times 52\times 64$ | |

6 conv | 128 | $3\times 3$ | $52\times 52\times 64$ | $52\times 52\times 128$ |

7 max | $2\times 2$ | $52\times 52\times 128$ | $26\times 26\times 128$ | |

8 conv | 256 | $3\times 3$ | $26\times 26\times 128$ | $26\times 26\times 256$ |

9 max | $2\times 2$ | $26\times 26\times 256$ | $13\times 13\times 256$ | |

10 conv | 512 | $3\times 3$ | $13\times 13\times 256$ | $13\times 13\times 512$ |

11 max | $2\times 2$ | $13\times 13\times 512$ | $13\times 13\times 512$ | |

12 conv | 1024 | $3\times 3$ | $13\times 13\times 512$ | $13\times 13\times 1024$ |

13 conv | 512 | $3\times 3$ | $13\times 13\times 1024$ | $13\times 13\times 512$ |

14 conv | 30 | $1\times 1$ | $13\times 13\times 512$ | $13\times 13\times 30$ |

15 detection |

**Table A4.** Tiny YOLOv3 architecture. Layer 16: [yolo] params: iou loss: mse, iou_norm: 0.75, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.00.

Layer | Filters | Size | Input | Output |
---|---|---|---|---|

0 conv | 16 | $3\times 3$ | $608\times 608\times 3$ | $608\times 608\times 16$ |

1 max | $2\times 2$ | $608\times 608\times 16$ | $304\times 304\times 16$ | |

2 conv | 32 | $3\times 3$ | $304\times 304\times 16$ | $304\times 304\times 32$ |

3 max | $2\times 2$ | $304\times 304\times 32$ | $152\times 152\times 32$ | |

4 conv | 64 | $3\times 3$ | $152\times 152\times 32$ | $152\times 152\times 64$ |

5 max | $2\times 2$ | $152\times 152\times 64$ | $76\times 76\times 64$ | |

6 conv | 128 | $3\times 3$ | $76\times 76\times 64$ | $76\times 76\times 128$ |

7 max | $2\times 2$ | $76\times 76\times 128$ | $38\times 38\times 128$ | |

8 conv | 256 | $3\times 3$ | $38\times 38\times 128$ | $38\times 38\times 256$ |

9 max | $2\times 2$ | $38\times 38\times 256$ | $19\times 19\times 256$ | |

10 conv | 512 | $3\times 3$ | $19\times 19\times 256$ | $19\times 19\times 512$ |

11 max | $2\times 2$ | $19\times 19\times 512$ | $19\times 19\times 512$ | |

12 conv | 1024 | $3\times 3$ | $19\times 19\times 512$ | $19\times 19\times 1024$ |

13 conv | 256 | $1\times 1$ | $19\times 19\times 1024$ | $19\times 19\times 256$ |

14 conv | 512 | $3\times 3$ | $19\times 19\times 256$ | $19\times 19\times 512$ |

15 conv | 18 | $1\times 1$ | $19\times 19\times 512$ | $19\times 19\times 18$ |

16 yolo | ||||

17 route | 13 | $19\times 19\times 256$ | ||

18 conv | 128 | $1\times 1$ | $19\times 19\times 256$ | $19\times 19\times 128$ |

19 upsample | $2\times $ | $19\times 19\times 128$ | $38\times 38\times 128$ | |

20 route | 19 8 | $3\times 3$ | $38\times 38\times 384$ | $38\times 38\times 384$ |

21 conv | 256 | $1\times 1$ | $38\times 38\times 256$ | $38\times 38\times 256$ |

22 conv | 18 | $38\times 38\times 18$ | ||

23 yolo |

**Table A5.** Tiny YOLOv4 architecture. Layer 30: [yolo] params: iou loss: ciou, iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05, nms_kind: greedynms, beta = 0.600000. Layer 37: [yolo] params: iou loss: ciou, iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05, nms_kind: greedynms, beta = 0.600000.

Layer | Filters | Size | Input | Output |
---|---|---|---|---|

0 conv | 32 | $3\times 3$ | $608\times 608\times 3$ | $304\times 304\times 32$ |

1 conv | 64 | $3\times 3$ | $304\times 304\times 32$ | $152\times 152\times 64$ |

2 conv | 64 | $3\times 3$ | $152\times 152\times 64$ | $152\times 152\times 64$ |

3 route | 2 | 1/2 | $152\times 152\times 32$ | |

4 conv | 32 | $3\times 3$ | $152\times 152\times 32$ | $152\times 152\times 32$ |

5 conv | 32 | $3\times 3$ | $152\times 152\times 32$ | $152\times 152\times 32$ |

6 route | 5 4 | $152\times 152\times 64$ | ||

7 conv | 64 | $1\times 1$ | $152\times 152\times 64$ | $152\times 152\times 64$ |

8 route | 2 7 | $152\times 152\times 128$ | ||

9 max | $2\times 2$ | $152\times 152\times 128$ | $76\times 76\times 128$ | |

10 conv | 128 | $3\times 3$ | $76\times 76\times 128$ | $76\times 76\times 128$ |

11 route | 10 | 1/2 | $76\times 76\times 64$ | |

12 conv | 64 | $3\times 3$ | $76\times 76\times 64$ | $76\times 76\times 64$ |

13 conv | 64 | $3\times 3$ | $76\times 76\times 64$ | $76\times 76\times 64$ |

14 route | 13 12 | $76\times 76\times 128$ | ||

15 conv | 128 | $1\times 1$ | $76\times 76\times 128$ | $76\times 76\times 128$ |

16 route | 10 15 | $76\times 76\times 256$ | ||

17 max | $2\times 2$ | $76\times 76\times 256$ | $38\times 38\times 256$ | |

18 conv | 256 | $3\times 3$ | $38\times 38\times 256$ | $38\times 38\times 256$ |

19 route | 18 | 1/2 | $38\times 38\times 128$ | |

20 conv | 128 | $3\times 3$ | $38\times 38\times 128$ | $38\times 38\times 128$ |

21 conv | 128 | $3\times 3$ | $38\times 38\times 128$ | $38\times 38\times 128$ |

22 route | 21 20 | $38\times 38\times 256$ | ||

23 conv | 256 | $1\times 1$ | $38\times 38\times 256$ | $38\times 38\times 256$ |

24 route | 18 23 | $38\times 38\times 512$ | ||

25 max | $2\times 2$ | $38\times 38\times 512$ | $19\times 19\times 512$ | |

26 conv | 512 | $3\times 3$ | $19\times 19\times 512$ | $19\times 19\times 512$ |

27 conv | 256 | $1\times 1$ | $19\times 19\times 512$ | $19\times 19\times 256$ |

28 conv | 512 | $3\times 3$ | $19\times 19\times 256$ | $19\times 19\times 512$ |

29 conv | 18 | $1\times 1$ | $19\times 19\times 512$ | $19\times 19\times 18$ |

30 yolo | ||||

31 route | 27 | $19\times 19\times 256$ | ||

32 conv | 128 | $1\times 1$ | $19\times 19\times 256$ | $19\times 19\times 128$ |

33 upsample | $2\times $ | $19\times 19\times 128$ | $38\times 38\times 128$ | |

34 route | 33 23 | $38\times 38\times 384$ | ||

35 conv | 256 | $3\times 3$ | $38\times 38\times 384$ | $38\times 38\times 256$ |

36 conv | 18 | $1\times 1$ | $38\times 38\times 256$ | $38\times 38\times 18$ |

37 yolo |

## References

- Kummu, M.; Taka, M.; Guillaume, J.H. Gridded Global Datasets for Gross Domestic Product and Human Development Index over 1990–2015; Nature Publishing Group: Berlin, Germany, 2018; Volume 5, pp. 1–15. [Google Scholar] [CrossRef] [Green Version]
- Tang, Y.; Luan, X.; Sun, J.; Zhao, J.; Yin, Y.; Wang, Y.; Sun, S. Impact assessment of climate change and human activities on GHG emissions and agricultural water use. Agric. For. Meteorol.
**2021**, 296, 108218. [Google Scholar] [CrossRef] - Jensen, C.; Ørum, J.E.; Pedersen, S.; Andersen, M.; Plauborg, F.; Liu, F.; Jacobsen, S.E. A Short Overview of Measures for Securing Water Resources for Irrigated Crop Production. J. Agron. Crop Sci.
**2014**, 200, 333–343. [Google Scholar] [CrossRef] - Mestre, G.; Matos-Carvalho, J.P.; Tavares, R.M. Irrigation Management System using Artificial Intelligence Algorithms. In Proceedings of the 2022 International Young Engineers Forum (YEF-ECE), Lisbon, Portugal, 1 July 2022; pp. 69–74. [Google Scholar] [CrossRef]
- Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A review on UAV-based applications for precision agriculture. Information
**2019**, 10, 349. [Google Scholar] [CrossRef] [Green Version] - Merz, M.; Pedro, D.; Skliros, V.; Bergenhem, C.; Himanka, M.; Houge, T.; Matos-Carvalho, J.P.; Lundkvist, H.; Cürüklü, B.; Hamrén, R.; et al. Autonomous UAS-Based Agriculture Applications: General Overview and Relevant European Case Studies. Drones
**2022**, 6, 128. [Google Scholar] [CrossRef] - Pedro, D.; Lousã, P.; Ramos, Á.; Matos-Carvalho, J.P.; Azevedo, F.; Campos, L. HEIFU—Hexa Exterior Intelligent Flying Unit. In Proceedings of the Computer Safety, Reliability, and Security, SAFECOMP 2021 Workshops, York, UK, 7 September 2021; Habli, I., Sujan, M., Gerasimou, S., Schoitsch, E., Bitsch, F., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 89–104. [Google Scholar]
- Correia, S.; Realinho, V.; Braga, R.; Turégano, J.; Miranda, A.; Gañan, J. Development of a Monitoring System for Efficient Management of Agricultural Resources. In Proceedings of the VIII International Congress on Project Engineering, Bilbao, Spain, 7–8 October 2004; pp. 1215–1222. [Google Scholar]
- Torky, M.; Hassanein, A.E. Integrating blockchain and the internet of things in precision agriculture: Analysis, opportunities, and challenges. Comput. Electron. Agric.
**2020**, 178, 105476. [Google Scholar] [CrossRef] - Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of remote sensing in precision agriculture: A review. Remote Sens.
**2020**, 12, 3136. [Google Scholar] [CrossRef] - Charu, C.A. Neural Networks and Deep Learning: A Textbook; Determination Press: San Francisco, CA, USA, 2018. [Google Scholar]
- Henrique, A.S.; Fernandes, A.M.R.; Rodrigo, L.; Leithardt, V.R.Q.; Correia, S.D.; Crocker, P.; Scaranto Dazzi, R.L. Classifying Garments from Fashion-MNIST Dataset Through CNNs. Adv. Sci. Technol. Eng. Syst. J.
**2021**, 6, 989–994. [Google Scholar] [CrossRef] - Matos-Carvalho, J.P.; Santos, R.; Tomic, S.; Beko, M. GTRS-Based Algorithm for UAV Navigation in Indoor Environments Employing Range Measurements and Odometry. IEEE Access
**2021**, 9, 89120–89132. [Google Scholar] [CrossRef] - Santos, R.; Matos-Carvalho, J.P.; Tomic, S.; Beko, M.; Correia, S.D. Applying Deep Neural Networks to Improve UAV Navigation in Satellite-less Environments. In Proceedings of the 2022 International Young Engineers Forum (YEF-ECE), Lisbon, Portugal, 1 July 2022; pp. 63–68. [Google Scholar] [CrossRef]
- Santos, R.; Matos-Carvalho, J.P.; Tomic, S.; Beko, M. WLS algorithm for UAV navigation in satellite-less environments. IET Wirel. Sens. Syst.
**2022**, 12, 93–102. [Google Scholar] [CrossRef] - Salazar, L.H.A.; Leithardt, V.R.; Parreira, W.D.; da Rocha Fernandes, A.M.; Barbosa, J.L.V.; Correia, S.D. Application of Machine Learning Techniques to Predict a Patient’s No-Show in the Healthcare Sector. Future Internet
**2022**, 14, 3. [Google Scholar] [CrossRef] - Ramesh, N.V.K.; B, M.R.; B, B.D.; Suresh, N.; Rao, K.R.; Reddy, B.N.K. Identification of Tomato Crop Diseases Using Neural Networks-CNN. In Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 6–8 July 2021; pp. 1–5. [Google Scholar] [CrossRef]
- Narvekar, C.; Rao, M. Flower classification using CNN and transfer learning in CNN- Agriculture Perspective. In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 660–664. [Google Scholar] [CrossRef]
- Salvado, A.B.; Mendonça, R.; Lourenço, A.; Marques, F.; Matos-Carvalho, J.P.; Miguel Campos, L.; Barata, J. Semantic Navigation Mapping from Aerial Multispectral Imagery. In Proceedings of the 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, BC, Canada, 12–14 June 2019; pp. 1192–1197. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef] [Green Version]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst.
**2015**, 28, 1–14. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Gao, B.C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ.
**1996**, 58, 257–266. [Google Scholar] [CrossRef] - Mozgeris, G.; Gadal, S.; Jonikavičius, D.; Straigytė, L.; Ouerghemmi, W.; Juodkienė, V. Hyperspectral and color-infrared imaging from ultralight aircraft: Potential to recognize tree species in urban environments. In Proceedings of the 2016 8th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Los Angeles, CA, USA, 21–24 August 2016; pp. 1–5. [Google Scholar]
- Boiarskii, B.; Hasegawa, H. Comparison of NDVI and NDRE Indices to Detect Differences in Vegetation and Chlorophyll Content. J. Mech. Contin. Math. Sci.
**2019**, spl1. [Google Scholar] [CrossRef] - Yagci, A.L.; Di, L.; Deng, M. The influence of land cover-related changes on the NDVI-based satellite agricultural drought indices. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 2054–2057. [Google Scholar]
- Subba Rao, V.P.; Rao, G.S. Design and Modelling of anAffordable UAV Based Pesticide Sprayer in Agriculture Applications. In Proceedings of the 2019 Fifth International Conference on Electrical Energy Systems (ICEES), Chennai, India, 21–22 February 2019; pp. 1–4. [Google Scholar] [CrossRef]
- Zheng, H.; Zhou, X.; Cheng, T.; Yao, X.; Tian, Y.; Cao, W.; Zhu, Y. Evaluation of a UAV-based hyperspectral frame camera for monitoring the leaf nitrogen concentration in rice. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 7350–7353. [Google Scholar] [CrossRef]
- Li, D.; Zheng, H.; Xu, X.; Lu, N.; Yao, X.; Jiang, J.; Wang, X.; Tian, Y.; Zhu, Y.; Cao, W.; et al. BRDF Effect on the Estimation of Canopy Chlorophyll Content in Paddy Rice from UAV-Based Hyperspectral Imagery. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 6464–6467. [Google Scholar] [CrossRef]
- Matos-Carvalho, J.P.; Pedro, D.; Campos, L.M.; Fonseca, J.M.; Mora, A. Terrain Classification Using W-K Filter and 3D Navigation with Static Collision Avoidance. In Proceedings of the SAI Intelligent Systems Conference, London, UK, 5–6 September 2019; Bi, Y., Bhatia, R., Kapoor, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 1122–1137. [Google Scholar]
- Vardhini, P.; Asritha, S.; Devi, Y. Efficient Disease Detection of Paddy Crop using CNN. In Proceedings of the 2020 International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), Bengaluru, India, 9–10 October 2020; pp. 116–119. [Google Scholar] [CrossRef]
- Feng, Q.; Chen, J.; Li, X.; Li, C.; Wang, X. Multi-spectral Image Fusion Method for Identifying Similar-colored Tomato Organs. In Proceedings of the 2019 IEEE International Conference on Unmanned Systems and Artificial Intelligence (ICUSAI), Shenzhen, China, 21–23 April 2019; pp. 142–145. [Google Scholar] [CrossRef]
- Zhou, Z.; Li, S.; Shao, Y. Crops Classification from Sentinel-2A Multi-spectral Remote Sensing Images Based on Convolutional Neural Networks. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 5300–5303. [Google Scholar] [CrossRef]
- Hossain, M.I.; Paul, B.; Sattar, A.; Islam, M.M. A Convolutional Neural Network Approach to Recognize the Insect: A Perspective in Bangladesh. In Proceedings of the 2019 8th International Conference System Modeling and Advancement in Research Trends (SMART), Moradabad, India, 22–23 November 2019; pp. 384–389. [Google Scholar] [CrossRef]
- Murata, K.; Ito, A.; Takahashi, Y.; Hatano, H. A Study on Growth Stage Classification of Paddy Rice by CNN using NDVI Images. In Proceedings of the 2019 Cybersecurity and Cyberforensics Conference (CCC), Melbourne, Australia, 8–9 May 2019; pp. 85–90. [Google Scholar] [CrossRef]
- Habibie, M.I.; Ahamed, T.; Noguchi, R.; Matsushita, S. Deep Learning Algorithms to determine Drought prone Areas Using Remote Sensing and GIS. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Geoscience, Electronics and Remote Sensing Technology (AGERS), Jakarta, Indonesia, 7–8 December 2020; pp. 69–73. [Google Scholar] [CrossRef]
- Sobayo, R.; Wu, H.H.; Ray, R.; Qian, L. Integration of Convolutional Neural Network and Thermal Images into Soil Moisture Estimation. In Proceedings of the 2018 1st International Conference on Data Intelligence and Security (ICDIS), South Padre Island, TX, USA, 8–10 April 2018; pp. 207–210. [Google Scholar] [CrossRef]
- Liu, Z.; Wu, J.; Fu, L.; Majeed, Y.; Feng, Y.; Li, R.; Cui, Y. Improved Kiwifruit Detection Using Pre-Trained VGG16 With RGB and NIR Information Fusion. IEEE Access
**2020**, 8, 2327–2336. [Google Scholar] [CrossRef] - Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef] [Green Version]
- Wang, X.; Zhuang, X.; Zhang, W.; Chen, Y.; Li, Y. Lightweight Real-time Object Detection Model for UAV Platform. In Proceedings of the 2021 International Conference on Computer Communication and Artificial Intelligence (CCAI), Guangzhou, China, 7–9 May 2021; pp. 20–24. [Google Scholar] [CrossRef]
- Gotthans, J.; Gotthans, T.; Marsalek, R. Prediction of Object Position from Aerial Images Utilising Neural Networks. In Proceedings of the 2021 31st International Conference Radioelektronika (RADIOELEKTRONIKA), Brno, Czech Republic, 19–21 April 2021; pp. 1–5. [Google Scholar] [CrossRef]
- Ding, Y.; Qu, Y.; Zhang, Q.; Tong, J.; Yang, X.; Sun, J. Research on UAV Detection Technology of Gm-APD Lidar Based on YOLO Model. In Proceedings of the 2021 IEEE International Conference on Unmanned Systems (ICUS), Beijing, China, 22–24 October 2021; pp. 105–109. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef] [Green Version]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv
**2018**, arXiv:1804.02767. [Google Scholar] - Bochkovskiy, A.; Wang, C.; Liao, H.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. CoRR
**2020**. Available online: http://xxx.lanl.gov/abs/2004.10934 (accessed on 1 October 2021). - Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. pp. 1440–1448.
- Srinivas, R.; Nithyanandan, L.; Umadevi, G.; Rao, P.V.V.S.; Kumar, P.N. Design and implementation of S-band Multi-mission satellite positioning data simulator for IRS satellites. In Proceedings of the 2011 IEEE Applied Electromagnetics Conference (AEMC), Kolkata, India, 18–22 December 2011; pp. 1–4. [Google Scholar] [CrossRef]
- Weidong, Z.; Chun, W.; Jing, H. Development of agriculture machinery aided guidance system based on GPS and GIS. In Proceedings of the 2010 World Automation Congress, Kobe, Japan, 19–23 September 2010; pp. 313–317. [Google Scholar]
- Yu, H.; Liu, Y.; Yang, G.; Yang, X. Quick image processing method of HJ satellites applied in agriculture monitoring. In Proceedings of the 2016 World Automation Congress (WAC), Rio Grande, Puerto Rico, 31 July–4 August 2016; pp. 1–5. [Google Scholar] [CrossRef]
- Murugan, D.; Garg, A.; Ahmed, T.; Singh, D. Fusion of drone and satellite data for precision agriculture monitoring. In Proceedings of the 2016 11th International Conference on Industrial and Information Systems (ICIIS), Roorkee, India, 3–4 December 2016; pp. 910–914. [Google Scholar] [CrossRef]
- Bansod, B.; Singh, R.; Thakur, R.; Singhal, G. A comparison between satellite based and drone based remote sensing technology to achieve sustainable development: A review. J. Agric. Environ. Int. Dev. (JAEID) **2017**, 111, 383–407. [Google Scholar]
- Shao, L.; Zhu, F.; Li, X. Transfer Learning for Visual Categorization: A Survey. IEEE Trans. Neural Netw. Learn. Syst. **2015**, 26, 1019–1034. [Google Scholar] [CrossRef]
- Chiba, S.; Sasaoka, H. Basic Study for Transfer Learning for Autonomous Driving in Car Race of Model Car. In Proceedings of the 2021 6th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand, 20–21 May 2021; pp. 138–141. [Google Scholar] [CrossRef]
- Shenavarmasouleh, F.; Arabnia, H.R. DRDr: Automatic Masking of Exudates and Microaneurysms Caused by Diabetic Retinopathy Using Mask R-CNN and Transfer Learning. In Advances in Computer Vision and Computational Biology; Arabnia, H.R., Deligiannidis, L., Shouno, H., Tinetti, F.G., Tran, Q.N., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 307–318. [Google Scholar]
- Khan, M.A.; Akram, T.; Zhang, Y.D.; Sharif, M. Attributes based skin lesion detection and recognition: A mask RCNN and transfer learning-based deep learning framework. Pattern Recognit. Lett. **2021**, 143, 58–66. [Google Scholar] [CrossRef]
- Wani, M.A.; Afzal, S. A New Framework for Fine Tuning of Deep Networks. In Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 359–363. [Google Scholar] [CrossRef]
- Too, E.C.; Yujian, L.; Njuki, S.; Yingchun, L. A comparative study of fine-tuning deep learning models for plant disease identification. Comput. Electron. Agric. **2019**, 161, 272–279. [Google Scholar] [CrossRef]
- AlexeyAB. Darknet. Github Repository. Available online: https://github.com/AlexeyAB/darknet (accessed on 1 November 2021).
- matterport. Mask RCNN. Github Repository. Available online: https://github.com/matterport/Mask_RCNN (accessed on 1 November 2021).
- RedEdge-MX Integration Guide. 2022. Available online: https://support.micasense.com/hc/en-us/articles/360011389334-RedEdge-MX-Integration-Guide (accessed on 1 November 2022).
- Pino, M.; Matos-Carvalho, J.P.; Pedro, D.; Campos, L.M.; Costa Seco, J. UAV Cloud Platform for Precision Farming. In Proceedings of the 2020 12th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP), Porto, Portugal, 20–22 July 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Vong, A.; Matos-Carvalho, J.P.; Toffanin, P.; Pedro, D.; Azevedo, F.; Moutinho, F.; Garcia, N.C.; Mora, A. How to Build a 2D and 3D Aerial Multispectral Map?—All Steps Deeply Explained. Remote Sens. **2021**, 13, 3227. [Google Scholar] [CrossRef]
- AlexeyAB. Yolo Mark. Github Repository. Available online: https://github.com/AlexeyAB/Yolo_mark (accessed on 1 November 2021).
- Dutta, A.; Zisserman, A. The VIA Annotation Software for Images, Audio and Video. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; ACM: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
- Michelucci, U. Advanced Applied Deep Learning: Convolutional Neural Networks and Object Detection; Apress: Pune, India, 2019. [Google Scholar] [CrossRef]
- Allanzelener. YAD2K: Yet Another Darknet 2 Keras. Github Repository. Available online: https://github.com/allanzelener/YAD2K (accessed on 1 November 2021).
- Xiaochus. YOLOv3. Github Repository. Available online: https://github.com/xiaochus/YOLOv3 (accessed on 1 November 2021).
- Runist. YOLOv4. Github Repository. Available online: https://github.com/Runist/YOLOv4.git (accessed on 1 November 2021).
- Google. Google CoLaboratory. Available online: https://colab.research.google.com/drive/151805XTDg–dgHb3-AXJCpnWaqRhop_2#scrollTo=ojGuEt8MpJhA (accessed on 1 November 2021).
- Casamitjana, M.; Torres-Madroñero, M.C.; Bernal-Riobo, J.; Varga, D. Soil Moisture Analysis by Means of Multispectral Images According to Land Use and Spatial Resolution on Andosols in the Colombian Andes. Appl. Sci. **2020**, 10, 5540. [Google Scholar] [CrossRef]

**Figure 1.** Different image types of the same map. This figure includes the multispectral indices used (figures (**a**,**d**,**e**)) and the color image formats (figures (**b**,**c**)). (**a**) NDWI, (**b**) CIR, (**c**) RGB, (**d**) NDRE, and (**e**) NDVI.

**Figure 2.** Manual identification of plantation gaps. As observed, we included parts of the vegetation inside the ground truth boxes (dashed line) and polygons (continuous line and colored areas).

**Figure 3.** Example of cropped images from the same map in different multi-spectral indices. All 3 images share the same name in their respective datasets. (**a**) CIR, (**b**) NDWI, (**c**) RGB.

**Figure 4.** Examples of regression-based networks’ results with predicted boxes from the respective networks, and confidence values for each object identified. Dataset: RGB; (**a**): Tiny-YOLO; (**b**): YOLO; (**c**): Tiny-YOLOv2; (**d**): Tiny-YOLOv3; (**e**): Tiny-YOLOv4.

**Figure 5.** Examples of Mask-RCNN, after training ‘head’ layers for 30 epochs, detecting too many objects. Dataset: CIR; Weights: Tiny-YOLO; (**a**): Original; (**b**): Detected.

**Figure 6.** Examples of Mask-RCNN, after training ‘head’ layers for 30 epochs, miscategorizing the path. Dataset: GS; Weights: Tiny-YOLO; (**a**): Original; (**b**): Detected.

**Figure 7.** Examples of Mask-RCNN, after training ‘head’ layers for 30 epochs. Dataset: NDRE; Weights: (**a**,**b**) YOLO; (**c**,**d**) Tiny-YOLOv2; (**a**,**c**): Original; (**b**,**d**): Detected. The detected images (**b**,**d**) show the outputs of two different weight initializations on the same dataset. The image in (**b**) failed to detect the gap in (**a**), and the image in (**d**) predicted too many gaps in (**c**).

**Figure 8.** Examples of Mask-RCNN, after training ‘head’ layers for 30 epochs, failing to detect diagonal objects. Dataset: RGB; Weights: Tiny-YOLOv2; (**a**): Original; (**b**): Detected.

**Figure 9.** Examples of Mask-RCNN, after training ‘all’ layers for 50 epochs, failing to detect objects. Dataset: NDRE; Weights: YOLO; (**a**): Original; (**b**): Detected.

**Figure 10.** Examples of Mask-RCNN, after training ‘all’ layers for 50 epochs, miscategorizing the path. Dataset: CIR; Weights: Tiny-YOLOv4; (**a**): Original; (**b**): Detected.

**Table 1.** Number of images per multi-spectral index. First dataset used for training. Total: total number of images; Train: total number of images used in training; Test: total number of images used for testing. Approximately 10% of Total was used for testing.

Multi-Spectral Index | Total | Train | Perc. (%) Train | Test | Perc. (%) Test |
---|---|---|---|---|---|
RGB | 1696 | 1590 | 94% | 106 | 6% |
CIR | 1575 | 1414 | 90% | 161 | 10% |
NDWI | 1575 | 1418 | 90% | 157 | 10% |
NDRE | 1352 | 1226 | 91% | 126 | 9% |
NDVI | 327 | 281 | 86% | 41 | 14% |
GS | 650 | 583 | 90% | 67 | 10% |

**Table 2.** Number of images per multi-spectral index. Second dataset used for training. Total: total number of images; Train: total number of images used in training; Test: total number of images used for testing. Approximately 20% of Total was used for testing.

Multi-Spectral Index | Total | Train | Perc. (%) Train | Test | Perc. (%) Test |
---|---|---|---|---|---|
RGB | 2429 | 1906 | 78% | 523 | 22% |
CIR | 2018 | 1586 | 79% | 432 | 21% |
NDWI | 2018 | 1615 | 80% | 403 | 20% |
NDRE | 1795 | 1397 | 78% | 398 | 22% |
NDVI | 665 | 554 | 83% | 111 | 17% |
GS | 650 | 505 | 78% | 145 | 22% |

**Table 3.** Regression-based network training. $A{P}_{50}$: average precision with an IoU of 0.5; $A{P}_{75}$: average precision with an IoU of 0.75; Train: dataset used in training; Method: network used for training.

Method | Train | ${\mathbf{AP}}_{50}$ | ${\mathbf{AP}}_{75}$ |
---|---|---|---|
YOLO | RGB | 41.61% | 2.07% |
Tiny-YOLO | RGB | 26.52% | 0.30% |
Tiny-YOLOv2 | RGB | 38.67% | 6.65% |
Tiny-YOLOv3 | RGB | 56.73% | 11.82% |
Tiny-YOLOv4 | RGB | 61.47% | 19.62% |
YOLO | CIR | 31.72% | 2.79% |
Tiny-YOLO | CIR | 21.93% | 0.49% |
Tiny-YOLOv2 | CIR | 28.58% | 1.16% |
Tiny-YOLOv3 | CIR | 49.82% | 4.92% |
Tiny-YOLOv4 | CIR | 52.87% | 12.39% |
YOLO | NDRE | 11.10% | 1.34% |
Tiny-YOLO | NDRE | 4.37% | 0.01% |
Tiny-YOLOv2 | NDRE | 16.65% | 0.76% |
Tiny-YOLOv3 | NDRE | 46.20% | 6.57% |
Tiny-YOLOv4 | NDRE | 50.86% | 11.96% |
YOLO | NDWI | 4.42% | 0.00% |
Tiny-YOLO | NDWI | 0.67% | 0.00% |
Tiny-YOLOv2 | NDWI | 17.16% | 0.57% |
Tiny-YOLOv3 | NDWI | 42.97% | 3.23% |
Tiny-YOLOv4 | NDWI | 46.34% | 4.09% |
YOLO | GS | 20.09% | 0.30% |
Tiny-YOLO | GS | 11.99% | 0.44% |
Tiny-YOLOv2 | GS | 14.51% | 1.89% |
Tiny-YOLOv3 | GS | 29.61% | 8.55% |
Tiny-YOLOv4 | GS | 32.28% | 10.04% |
YOLO | NDVI | 21.63% | 3.33% |
Tiny-YOLO | NDVI | 18.61% | 0.26% |
Tiny-YOLOv2 | NDVI | 18.13% | 0.11% |
Tiny-YOLOv3 | NDVI | 27.79% | 3.00% |
Tiny-YOLOv4 | NDVI | 21.25% | 6.11% |
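The $A{P}_{50}$ and $A{P}_{75}$ columns in the tables above and below depend on the intersection over union (IoU) between a predicted and a ground-truth box. As a minimal sketch of how IoU is typically computed for axis-aligned corner-format boxes (a generic helper for illustration, not the authors' evaluation code):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection counts as a true positive at $A{P}_{50}$ only when its IoU with a ground-truth box reaches 0.5; the stricter 0.75 threshold explains why the $A{P}_{75}$ values reported here are consistently lower.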

**Table 4.** Mask-RCNN ‘head’ layers training for 30 epochs. Method: weights used + Mask-RCNN; Train: dataset used in training; $A{P}_{50}$: average precision with an IoU of 0.5; $A{P}_{75}$: average precision with an IoU of 0.75.

Method | Train | ${\mathbf{AP}}_{50}$ | ${\mathbf{AP}}_{75}$ |
---|---|---|---|
YOLO + Mask-RCNN | RGB | 21.42% | 1.52% |
Tiny-YOLO + Mask-RCNN | RGB | 36.30% | 4.03% |
Tiny-YOLOv2 + Mask-RCNN | RGB | 33.72% | 2.17% |
Tiny-YOLOv3 + Mask-RCNN | RGB | 31.71% | 3.78% |
Tiny-YOLOv4 + Mask-RCNN | RGB | 7.12% | 0.12% |
YOLO + Mask-RCNN | CIR | 24.48% | 1.66% |
Tiny-YOLO + Mask-RCNN | CIR | 36.67% | 3.34% |
Tiny-YOLOv2 + Mask-RCNN | CIR | 23.96% | 0.93% |
Tiny-YOLOv3 + Mask-RCNN | CIR | 36.02% | 4.50% |
Tiny-YOLOv4 + Mask-RCNN | CIR | 19.30% | 1.16% |
YOLO + Mask-RCNN | NDRE | 4.55% | 0.47% |
Tiny-YOLO + Mask-RCNN | NDRE | 0.11% | 0.00% |
Tiny-YOLOv2 + Mask-RCNN | NDRE | 0.02% | 0.00% |
Tiny-YOLOv3 + Mask-RCNN | NDRE | 5.68% | 1.94% |
Tiny-YOLOv4 + Mask-RCNN | NDRE | 0.00% | 0.00% |
YOLO + Mask-RCNN | NDWI | 2.22% | 0.00% |
Tiny-YOLO + Mask-RCNN | NDWI | 1.68% | 0.00% |
Tiny-YOLOv2 + Mask-RCNN | NDWI | 11.69% | 0.61% |
Tiny-YOLOv3 + Mask-RCNN | NDWI | 9.14% | 0.00% |
Tiny-YOLOv4 + Mask-RCNN | NDWI | 1.20% | 0.00% |
YOLO + Mask-RCNN | NDVI | 45.65% | 6.88% |
Tiny-YOLO + Mask-RCNN | NDVI | 23.41% | 1.61% |
Tiny-YOLOv2 + Mask-RCNN | NDVI | 37.95% | 6.76% |
Tiny-YOLOv3 + Mask-RCNN | NDVI | 0.07% | 0.00% |
Tiny-YOLOv4 + Mask-RCNN | NDVI | 9.87% | 0.00% |
YOLO + Mask-RCNN | GS | 35.32% | 6.46% |
Tiny-YOLO + Mask-RCNN | GS | 23.32% | 1.75% |
Tiny-YOLOv2 + Mask-RCNN | GS | 18.91% | 3.20% |
Tiny-YOLOv3 + Mask-RCNN | GS | 46.48% | 6.48% |
Tiny-YOLOv4 + Mask-RCNN | GS | 15.38% | 0.04% |

**Table 5.** Mask-RCNN ‘all’ layers training for 50 epochs. Method: weights used + Mask-RCNN; Train: dataset used in training; $A{P}_{10}$: average precision with an IoU of 0.10; $A{P}_{50}$: average precision with an IoU of 0.5; $A{P}_{75}$: average precision with an IoU of 0.75.

Method | Train | ${\mathbf{AP}}_{10}$ | ${\mathbf{AP}}_{50}$ | ${\mathbf{AP}}_{75}$ |
---|---|---|---|---|
YOLO + Mask-RCNN | RGB | 66.12% | 56.55% | 9.64% |
Tiny-YOLO + Mask-RCNN | RGB | 67.98% | 60.88% | 12.81% |
Tiny-YOLOv2 + Mask-RCNN | RGB | 66.42% | 59.10% | 15.33% |
Tiny-YOLOv3 + Mask-RCNN | RGB | 63.25% | 57.46% | 11.50% |
Tiny-YOLOv4 + Mask-RCNN | RGB | 64.07% | 57.69% | 9.18% |
YOLO + Mask-RCNN | CIR | 65.37% | 52.56% | 12.38% |
Tiny-YOLO + Mask-RCNN | CIR | 59.92% | 53.69% | 13.90% |
Tiny-YOLOv2 + Mask-RCNN | CIR | 61.90% | 52.74% | 8.46% |
Tiny-YOLOv3 + Mask-RCNN | CIR | 62.83% | 52.14% | 11.01% |
Tiny-YOLOv4 + Mask-RCNN | CIR | 63.53% | 52.88% | 5.86% |
YOLO + Mask-RCNN | NDRE | 43.76% | 24.16% | 2.07% |
Tiny-YOLO + Mask-RCNN | NDRE | 42.67% | 23.60% | 2.02% |
Tiny-YOLOv2 + Mask-RCNN | NDRE | 44.77% | 25.22% | 3.27% |
Tiny-YOLOv3 + Mask-RCNN | NDRE | 35.89% | 21.17% | 0.16% |
Tiny-YOLOv4 + Mask-RCNN | NDRE | 42.17% | 24.19% | 1.17% |
YOLO + Mask-RCNN | NDWI | 56.24% | 34.98% | 2.98% |
Tiny-YOLO + Mask-RCNN | NDWI | 58.21% | 40.93% | 7.57% |
Tiny-YOLOv2 + Mask-RCNN | NDWI | 52.19% | 40.43% | 2.38% |
Tiny-YOLOv3 + Mask-RCNN | NDWI | 59.69% | 50.98% | 8.30% |
Tiny-YOLOv4 + Mask-RCNN | NDWI | 55.05% | 43.10% | 7.17% |
YOLO + Mask-RCNN | NDVI | 51.06% | 40.49% | 5.43% |
Tiny-YOLO + Mask-RCNN | NDVI | 55.12% | 41.92% | 5.39% |
Tiny-YOLOv2 + Mask-RCNN | NDVI | 58.69% | 47.08% | 3.91% |
Tiny-YOLOv3 + Mask-RCNN | NDVI | 57.50% | 48.31% | 3.26% |
Tiny-YOLOv4 + Mask-RCNN | NDVI | 60.63% | 49.64% | 10.46% |
YOLO + Mask-RCNN | GS | 56.43% | 53.75% | 15.08% |
Tiny-YOLO + Mask-RCNN | GS | 60.30% | 56.67% | 12.40% |
Tiny-YOLOv2 + Mask-RCNN | GS | 54.90% | 46.00% | 5.20% |
Tiny-YOLOv3 + Mask-RCNN | GS | 56.94% | 46.45% | 7.72% |
Tiny-YOLOv4 + Mask-RCNN | GS | 61.93% | 57.94% | 13.28% |

Method | Train | ${\mathit{AP}}_{50}$ | ${\mathit{AP}}_{75}$ |
---|---|---|---|
Tiny-YOLOv4 | RGB | 61.47% | 19.62% |
YOLO + Mask-RCNN (just ‘head’ layers) | NDVI | 45.65% | 6.88% |
Tiny-YOLOv3 + Mask-RCNN (just ‘head’ layers) | GS | 46.48% | 6.48% |
Tiny-YOLO + Mask-RCNN (full) | RGB | 60.88% | 12.81% |
Tiny-YOLOv2 + Mask-RCNN (full) | RGB | 59.10% | 15.33% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sulemane, S.; Matos-Carvalho, J.P.; Pedro, D.; Moutinho, F.; Correia, S.D.
Vineyard Gap Detection by Convolutional Neural Networks Fed by Multi-Spectral Images. *Algorithms* **2022**, *15*, 440.
https://doi.org/10.3390/a15120440
