Static and Dynamic Algorithms for Terrain Classiﬁcation in UAV Aerial Imagery

: The ability to precisely classify different types of terrain is extremely important for Unmanned Aerial Vehicles (UAVs). There are multiple situations in which terrain classiﬁcation is fundamental for achieving a UAV’s mission success, such as emergency landing, aerial mapping, decision making, and cooperation between UAVs in autonomous navigation. Previous research works describe different terrain classiﬁcation approaches mainly using static features from RGB images taken onboard UAVs. In these works, the terrain is classiﬁed from each image taken as a whole, not divided into blocks; this approach has an obvious drawback when applied to images with multiple terrain types. This paper proposes a robust computer vision system to classify terrain types using three main algorithms, which extract features from UAV’s downwash effect: Static textures-Gray-Level Co-Occurrence Matrix (GLCM), Gray-Level Run Length Matrix (GLRLM) and Dynamic textures-Optical Flow method. This system has been fully implemented using the OpenCV library, and the GLCM algorithm has also been partially speciﬁed in a Hardware Description Language (VHDL) and implemented in a Field Programmable Gate Array (FPGA)-based platform. In addition to these feature extraction algorithms, a neural network was designed with the aim of classifying the terrain into one of four classes. Lastly, in order to store and access all the classiﬁed terrain information, a dynamic map, with this information was generated. The system was validated using videos acquired onboard a UAV with an RGB camera.


Introduction
Unmanned Aerial Vehicles (UAVs) are a topic of interest in several areas.Their use is expected to have a great impact on society [1] in the near future.Computer vision is an example of one such area, and which aims to perform the same complex cognitive processes and to perform the same tasks (example, identify objects, processing the data, and then making a decision of how to respond) with at least the same level of efficiency as humans.One very important application of Computer Vision and UAVs is to help Unmanned Surface Vehicles (USV), giving them proper information about the terrain types, allowing them to identify where they can navigate, making them autonomous robots.Thus, the UAVs must be able to capture, process and analyze environment images in order to classify the terrain type under its flight area.
Several methods for terrain classification have been proposed in different works.Texture algorithms, such as those proposed in [2][3][4][5], have been widely recommended to emphasize the high and low frequencies of the images, supporting image classification.Other algorithms use color information to classify terrains, such as presented in [6], which is able to distinguish four different terrain types within an image.During this process, each channel's pixel is divided by the square root of its own three channels intensity.The final result will emphasize the color that most represents the terrain type (eg, blue for water).Additionally, frequency domain [7,8], segmentation [6,9,10], bayesian network [11], and Hyperspectal Images [12] can also be used in terrain classification.
Other types of sensors such as LiDAR [13][14][15][16] can complement the classification decision.Algorithms that use laser scanners proved to be qualified to accurately distinguish between water and non-water terrains [13][14][15][16].However, shallow water terrains increase the decision error due to laser reflection, which leads to a misclassification as non-water terrain.
Although prior research work has proposed many good solutions for terrain classification, there is still a gap regarding the study of dynamic terrain.The previously mentioned algorithms suffer from a high sensitivity to changes in the environment, mainly due to changes in brightness, color and texture.Recent works [17,18] proposed the use of downwash effect to overcome these limitations.
The downwash effect caused by UAV propellers, shown in Figure 1, might have a major impact on terrain classification, because each terrain, when subject to this effect, present different behaviours.Take the example of water-type terrains, where there is no color variation, i.e, no motion presented, these are wrongly classified by algorithms prepared to evaluate static textures, as shown in [2][3][4][5].This happens because in water it is not possible to observe any texture under these conditions.Using the downwash effect, it adds motion in the environment, and circular textures can be observed, something that normally occurs on water-like terrain.Until now, only two recent publications [17,18] were known to use the concept of downwash effect for terrain classification.Ricardo Pombeiro [17] uses the optical flow concept to extract the dynamic part of an image using the Lukas Kanade method [17], from an onboard RGB camera.This work presents a method to determine if the terrain under study was water type or not.However, the algorithm's extremely long processing time is an issue, since it needs at least four seconds to classify if the terrain under study is water type or not.That means that the UAV needs to stand still for at least four seconds to know which type of terrain it is flying over.
The second most recent work also using the downwash effect to classify terrain types, beyond the use of dynamic properties, also takes advantage of static properties of an image.Unlike in [17], the work presented in [18] not only can identify water, but also sand and vegetation terrains and can do so in near real-time (in less than 30 ms).To classify the terrain type, the system from [18] uses the gabor concept in order to extract the static terrain texture; the optical flow concept, is used under the Farneback algorithm, which obtains the dynamic terrain properties.However, the systems from [17] and [18] only identify one type of terrain per frame, which clearly becomes a disadvantage when there are several types of terrain in a single frame.
The execution of algorithms for terrain classification is computationally heavy and can lead to lower performance than desired.Field Programmable Gate Array (FPGA) implementations have been used to accelerate the execution of algorithms due to their maximization of parallel processing and lower energy consumption [19].This features allow FPGA implementations to achieve faster execution times when compared to computer vision software libraries, such as OpenCV, or high performance interactive software for numerical computation, such as MATLAB [20,21].An FPGA-based real-time tree crown detection approach for large-scale satellite images was proposed in [22] and the results showed a speedup of 18.75 times for a satellite image with a size of 12,188 × 12,576 pixels when compared to a 12-core CPU.An ortho-rectification technique based on an FPGA was presented in [23].Compared to a PC-based platform, to process the same remotely sensed images, the FPGA-based platform was 4.3 times faster.An onboard georeferencing method for remotely sensed imagery was proposed in [24].The experimental results showed that using FPGA is 8 times faster than using a PC.An FPGA-based method for onboard detection and matching was presented in [25].The proposed implementation execution speed is 27 times higher when compared to a PC-based implementation.A pattern recognition architecture based on content addressable memory was implemented in [26] and the results show that the worst-case execution time to recognize a pattern is less than a few microseconds.This is 37.12% less then the previous implemented system based on software pattern recognition algorithms [26].Hardware accelerators, containing content addressable memory and index 8-bit and 16-bit in parallel with each clock cycle, were implemented in [27].This system consumes as low as 6.76% and 3.28% of energy compared to CPU and GPU-based designs respectively.In the literature, many other articles can be found, such as [28][29][30], describing the usage of FPGAs to support image processing, in real time.The execution times and/or the energy consumption, may justify the use of FPGAs to implement terrain classification and feature extraction algorithms.
This paper proposes a computer vision system that uses static and dynamic features to classify different types of terrain.It uses the downwash effect created by UAV propellers (visible at low altitudes, as shown in Figure 1) to improve the pattern recognition process.The developed system improves the terrain classification accuracy while at the same time, supporting the identification of multiple terrains (water, vegetation, asphalt and sand) in each image.The developed system prototype uses the OpenCV library and it was implemented in a general-purpose computer.It was also partially specified in a Hardware Description Language and implemented in an FPGA based platform.
This paper is organized in seven sections.Section 2 presents the information regarding the UAV used in this paper, whereas Section 2.2 identifies the terrain under study as well as their location.In Section 2.3 we describe the proposed system model of this paper.Sections 2.4.1 and 2.4.2 contain information about the static texture features, whereas Section 2.5 explains the dynamic extraction to improve the system accuracy to sort terrain.The implementation is described in Section 3. Section 4 shows how the results will be provided using a dynamic map.Section 5 describes and discusses the experimental results, including a comparison with other published results.Conclusions and future work to improve our system are discussed in Sections 6 and 7, respectively.

UAV Platform Design
The aerial vehicle used in this work was the Bebop2 [31], a four-rotor UAV from Parrot, shown in Figure 2, with the following characteristics: A hardware device to control the low and high level operations.This hardware contains the Global Positioning System (GPS) and the Inertial Measurement Unit (IMU) to detect the UAV position and orientation, respectively.An Wi-Fi receiver is connected to this hardware controller, in order to control UAV motors; • An RGB camera with a gimbal to stabilize imaging capture.Since the camera technical specifications are known, it is possible to obtain the Field of View (FoV) and therefore, to know the distance (in meters) per pixel.The selected resolution of each frame was 640 × 480 with a rate of 30 Hz; • A mini controller to stabilize the UAV when it is in flight mode.The sensor used for this purpose was a mini camera pointed to the ground where an optical flow algorithm runs to determine how much the UAV has traveled and to determine the unwanted motion of the UAV and thus reverse the process; • For autonomous UAV landing, it features a sonar sensor pointed at the ground, in order to detect any obstacle within a six-meter range allowing it to land in an unobstructed zone.

Data Gathering
In this paper, four different terrain types (water, vegetation, asphalt and sand) were analyzed, as shown in Figure 3.
These terrain types were studied at four locations in Portugal as shown in Figure 4: At these locations, it was collected several data in different conditions.Data was collected at an altitude between one and two meters, between 8 a.m. to 6 p.m. Regarding environmental conditions, the authors collected data during both clear and cloudy skies.To avoid overfitting and make the proposed algorithms more robust, older data (with and without the downwash effect) from other locations (e.g., agriculture field at Spain) and from other UAVs (from PDMFC) was also used.Besides, for improving generalization, a technique called "Early Stopping" was used that uses two different data sets: the training set, to update the weights and biases, and the validation set, to stop training when the network begins to overfit the data.Regarding the cross-validation, this paper used the "k-fold" technique where the data was divided into k randomly chosen subsets of roughly equal size (in this paper k = 5 was chosen).
The complete data set collected is composed by 513.840 images where 161.525 are of water, 95.403 are of vegetation, 31.093 are of asphalt and 225.819 are of sand.

Proposed System Model
The model of the proposed system, for terrain classification, is presented in Figure 5.As previously mentioned, two main measurements were used in this proposal: Dynamic and Static Textures.Since these two techniques are independent from each other, it is possible to operate it in parallel mode, in order to decrease the time required to process images.As shown in Figure 5, five main processes were identified in the architecture, namely: • Rectified Image: It is necessary to calibrate the camera before any process to make the proposed algorithms universal.Only then is possible to work with all RGB cameras regardless of their resolution; • Static Texture: To extract the terrain's static textures, Gray-Level Co-Occurrence Matrix (GLCM) and Gray-Level Run Length Matrix (GLRLM) were used to calculate features capable of providing information to classify terrain types; • Dynamic Texture: To identify the movement of each terrain type, an extraction of dynamic textures was performed using the Optical flow concept; • Classification: The outputs generated by the static and dynamic extraction phases are turned into inputs for a Neural Network (NN) [32] tasked with classifying the terrain the UAV is flying over.Machine learning techniques have already been proven to be efficient for terrain classification [33][34][35].In this work, an NN was used, namely a Multilayer Perceptron (MLP) architecture.The Neural Network inputs are the output values from static and dynamic feature extraction algorithms.The Neural Network model consists of three layers: the hidden layer contains 10 neurons, whereas the third layer corresponds to the system output and has four neurons, e.g., four possible outputs (water, vegetation, asphalt and sand).Each neuron uses a sigmoidal function to calculate its output.They are connected as a Fully Connected Feed Forward Neural Network.During the training stage, 70% of the total data was used for training, 15% for testing and 15% for validation; • Frames > n: "n" is the total number of frames required to increase the algorithm's accuracy.This threshold value was chosen empirically by the authors.

Static Textures
Knowing how to take advantage of textures in order to classify any type of terrain is very important for a variety of areas, as already mentioned in Section 1.In this section, we will discuss two types of static texture algorithms: Gray-Level Co-Occurrence Matrix (GLCM); Gray-Level Run Length Matrix (GLRLM).

Gray-Level Co-Occurrence Matrix
In this section, it is presented the first static texture algorithm-Gray-Level Co-Occurrence Matrix, known as GLCM [36,37].This algorithm extracts statistical texture features, that represents textures based on the relation and the distribution between pixels of a given frame.The algorithms that evaluate the texture of a frame can be classified as first, second or higher statistical texture orders.While the first order only calculates properties for individual pixels (such as mean and variance from the original image) neglecting the spatial relationship between pixels, the second and higher texture orders calculate properties of an image using a spatial relationship between two or more pixels, such as GLCM [38,39].
Before building the GLCM of this paper, it is necessary to understand these three parameters: 1.The distance d between i and j pixels; 2. The angular orientation θ chosen; 3. The Symmetric matrix decision.
After understanding the parameters described above, it is possible to create our matrix for the terrain texture classification.
As a first step, it is necessary to create the GLCM N × N matrix.In this paper the matrix dimensions are 256 × 256 because the input images are defined between 0 and 255 levels (256 gray levels).Figure 6 is an example of a 4 × 4 GLCM matrix.To simplify and make the explanation more intuitive, the following figures will be related to a 4 × 4 GLCM matrix with five gray intensity levels.As it can be seen in Figure 6a, the GLCM matrix was designed to have the distance d, between pixels, equal to 1 with an angular orientation θ equal to 0 (the distance d and the θ values were chosen only for the concept description).The GLCM matrix can also be normalized as shown in Figure 6b, by dividing each element by the sum of all image pixels.Lastly, the GLCM can have up to eight different angular orientations (0, 45, 135, 180, 225, 270 and 315 degrees).
After having the GLCM matrix defined, the next step is to determine its transpose in order to become a GLCM symmetric matrix: In this paper, a d = 100, θ = 0 and a symmetric matrix were chosen in order to have a trade-off between noise and system speed.These values were obtained by trial and error.
With a GLCM matrix defined, 6 out of 14 textural features by Haralick [36] were extracted to classify the terrain: where: p(i, j) = (i, j) th entry in a GLCM normalized matrix as shown in Figure 6b. ) Equations ( 2)-( 7) were used in this paper and have different meanings: • Contrast: Used to return the intensity contrast between a pixel and its neighbor throughout the entire image; • Correlation: This method is important to define how a pixel is correlated with its neighbor throughout the entire image; • Energy: Also known as angular second moment, the goal is to evaluate how constant is an image; • Homogeneity: Know as Inverse Difference Moment, this equation returns 1 when the GLCM is uniform (diagonal matrix); • Entropy: This feature measures the randomness of intensity distribution.The greater the information's heterogeneity in an image, the greater the entropy value is.However, when homogeneity increases, the entropy tends to 0; • Variance: Represents the measure of the dispersion of the values around the mean.
Section 2.4.2 will explain the second static texture algorithm and how to send the feature results to the NN.

Gray-Level Run Length Matrix
The Gray-Level Run Length Matrix, known as GLRLM, is the second static texture algorithm proposed in this paper.The GLRLM was introduced in [40] to define various texture features.Like the GLCM, GLRLM also evaluates the distribution of gray levels in an image or multiple images.As explained in Section 2.4.1, whereas GLCM evaluates the gray levels within neighbour pixels (taking into account the distance d and the angle θ), GLRLM evaluates run lengths.A run length is defined as the length of a consecutive sequence of pixels with the same gray level intensity along direction t.
As with Section 2.4.1, a 4 × 4 matrix (which represents an image of a terrain type) was also created to demonstrate the algorithm.As shown in Figure 7a, the length (columns) of each gray level intensity are calculated for each line.The GLRLM may also have up to eight different angular orientations, t, (0, 45, 135, 180, 225, 270 and 315 degrees).In this work, a 256 × 256 GLRLM matrix with an angular orientation of t = 0 was used.After the GLRLM matrix was created, the six features mentioned in Section 2.4.1 are extracted and information is sent to the NN to classify the terrain.Figure 8 shows an example of an NN response that received information from GLCM and GLRLM, as mentioned in Section 2.4, where the blue area represents water and the green area represents vegetation.For each NN output, a color was used for visualization and validation purposes:

Dynamic Textures
The ability to detect static textures for terrain classification is important, as described in Sections 2.4.1 and 2.4.2.However, the ability to distinguish different terrains using motion analysis increases the accuracy of the system.It is therefore necessary to study several features of different terrains to support the terrain classification.
In this paper the downwash effect from the UAV propellers was studied to extract dynamic textures from water (circular movements), which can be used to differentiate itself from any other type of terrains (vegetation, asphalt and sand).
As mentioned in Section 2.2, different types of terrain create different behaviors when affected by the UAV's downwash effect.Whereas in water (in this paper lake and pool terrains) it makes a circular pattern, vegetation terrain originates a linear movement and asphalt/sand are almost static.This movement can be detected using a well known concept, namely, Optical Flow.
Optical flow is an image processing method capable of detecting terrain movement in a sequence of frames.There are two well-known algorithms that use the concept of optical flow:
In this work, in order to detect the texture movement, the Farneback algorithm was used [42].One of the advantages of using the Farneback algorithm is that it provides the flow displacement, F d , given by the difference in the features between two frames.The flow displacement F d between features in frame n and n − 1 can be obtained from Equation (13): where S n and S n−1 are the sample pixels between two consecutive sequence of frames.
The obtained flows are shown in Figure 9 (red arrows).To determine their density, all linear equations (Figure 10a) were calculated for each flow.In this paper, all line equations are divided by a factor of 20 (this value was obtained by a set of iterations) to make the density of the flows more compact.In Figure 10b,c it is possible to observe the difference between a flow density without dividing by 20 and when dividing by 20, respectively.After the flow density calculation, the next step is to find the maximum value (e.g., the highest pixel value), which is the downwash center.Knowing the center and going back to Figure 9, it is necessary to check, for each flow (arrows in red), whether it is pointing in or out of the downwash center.In this water terrain type example, it is expected that all the flows originated by the Farneback algorithm point out from the center of the downwash (the circular movement goes from the center of the downwash outwards).Thus, there are three possible outputs, when creating a result image:  Lastly, the output information in Figure 11 is sent to the NN to classify the terrain.

Software
This section describes and explains (using pseudo-code) the software implementation (GLCM, GLRLM and Optical Flow) using OpenCV's libraries and ROS (Robot Operating System) framework:

• Open Source Computer Vision Library (OpenCV):
Being an open source library, OpenCV provides a variety of functions used for image processing; • Robot Operating System (ROS): The Robot Operating System is a "robot application development platform that provides various features such as message passing, distributed computing, code reusing, [...]" [43], as well as an integrated software and hardware applied to robotics applications.ROS also provides support for a wide range of programming languages; the possibility of visualizing data (e.g., the content of topics, nodes, packages, coordinate systems graphs and sensor data); and the possibility to write and execute code in a modular way, increasing robustness and also contributing to the standardization of this framework.
The pseudo-code shown in Algorithm 1, describes how the terrain classification system was implemented.
In Algorithm 1, it is possible to identify the main processes of the system represented by the functions: DynamicTexture, StaticTexture, Features and NeuralNetwork.Regarding the gray scale conversion, the system merges the three channels (red, green and blue) using CCIR 601 standard: Gray scale = 0.299 Red + 0.587 Green + 0.114 Blue (14) Using Equation ( 14), the dynamic textures (with Optical Flow) and static textures (GLCM and GLRLM) are calculated.The dynamic textures return each pixel flow, as mentioned in Section 2.5.Regarding the static textures, these will return the calculated features from GLCM and GLRLM (i.e., contrast, correlation, energy, homogeneity, entropy and variance) as mentioned in Sections 2.4.1 and 2.4.2.
The last step is to input these values to the NN and obtain the selected terrain (water, vegetation, asphalt or sand).

Hardware
The GLCM algorithm was also partially specified in VHDL (VHSIC Hardware Description Language) code and implemented in hardware, in an FPGA-based platform, which calculates the GLCM matrix and some of image features.This hardware implementation can be integrated with the software implementation as presented in Figure 12.It consists of a camera providing images to an FPGA.The features (contrast, energy) can be determined by applying the GLCM algorithm.The determined features are then sent to the computer, to be processed in software.The implementation of image processing algorithms in hardware can result into smaller execution time, as described in Section 1.To validate the implemented algorithm, a platform with the architecture presented in Figure 13 was used.It consists of an FPGA, a DDR2 RAM memory and a Computer.The FPGA is used to receive and process the frames, whereas the DDR2 is used to store these frames and the calculated matrices.After processing the images, the FPGA can sends the matrices and their features to the computer.The Avnet Spartan-3A DSP 1800A Video Kit board, which includes a Xilinx Spartan-3A DSP 1800A FPGA and a 128 MB (32 M × 32) DDR2 SDRAM, was used.The block diagram, of the hardware prototype implemented in the FPGA, is presented in Figure 14.This block diagram is composed by five main blocks, UART RX, the Memory Interface, the algorithm, the Convert to Gray, and the UART TX.The presented multiplexers and demultiplexers are used to control which addresses are accessed and which data is written or read from the DDR2 memory.The Memory Interface includes a custom interface that interacts with another interface that was generated using the Core Generator tool available in the Xilinx ISE.It stores and reads the images and their respective features, each in different addresses.The image features are represented by the GLCM matrix, which has a 256 × 256 dimension.The generated DDR2 memory interface has 2 main operations, write and read.The write operation writes data, whereas the read operation returns stored data.These operations can access and give data to the DDR2 in bursts of 4 × 32 bits, which totals 128 bits, or 16 bytes.
The images are transferred from the computer to the FPGA, with the UART RX block being used to accomplish this.It writes pixels with 24 bits each, in the DDR2.When the image is completely stored in the DDR2, the GLCM algorithm can be applied.After the GLCM algorithm executes, the UART TX block sends the GLCM matrices and associated features to the computer.
To execute the algorithm, the gray levels of the image must be calculated.The block that performs these calculations is the Convert to gray block, as shown in Figure 15.It receives stored pixels from the DDR2 and applies the grayscale algorithm.The output is the gray level of each pixel, represented by 8 bits (256 different gray levels).The output is used by the algorithm block in order to implement the GLCM algorithm.The implemented grayscale algorithm calculates a weighted average of the Red, Green, and Blue components, using Equation ( 14).To calculate this, the Red is multiplied by 306 (0.299 × 1024), the Green is multiplied by 601 (0.587 × 1024), and the Blue is multiplied by 117 (0.114 × 1024).Then, the number 512 is added to these three values, to ensure that the final value is properly rounded.Lastly, to obtain the final value, the sum value is shifted to the right 10 bits (divided by 1024 and truncated).The Algorithm block diagram, which implements the GLCM algorithm, is displayed in Figure 16.It creates the GLCM matrix, calculates a pair of features, and sends the matrix and the features to the UART TX block.To fill the GLCM matrix with the gray levels, the algorithm block receives, from the convert to gray block, sets of 4 pixels (8 bits per pixel) and saves them in the registers Pixel1 to Pixel4.For each pair of pixels ((PixelN, PixelN + 1) : N ∈ {1, 2, 3}), one or two addresses to access the matrix are built with the concatenation of one pixel (PixelN, the X in matrix) with the next pixel from the same line (PixelN + 1, the Y in matrix).If X is different from Y a second address is built with the concatenation of Y with X.When X is different from Y, for each of the two addresses, the matrix position is read, the value is incremented, and the value is saved.However, if X is equal to Y, only the first address is used, and the value is incremented twice.After processing these four pixels, Pixel4 is saved in Pixel0, and the next four pixels are received.If the next four pixels are not from the beginning of a line, the pair (Pixel0, Pixel1) is also processed.It is important to note that the resulting matrix stored in the DDR2 memory, is already the sum of the matrix with its transpose (Equation ( 1)), the GLCM symmetric matrix.The next step in the algorithm is the calculation of the contrast and energy, which are calculated in parallel.For each position of the matrix: • the difference between X and Y is multiplied by itself and by the value in the position (X, Y) of the matrix, and the result is accumulated in the contrast register; • the value in the position (X, Y) of the matrix is multiplied by itself, and the result is accumulated in the energy register.
An FPGA was used to implement the GLCM algorithm in hardware.This device is based on configurable logic blocks, and can be programmed using hardware description languages, such as VHDL (VHSIC Hardware Description Language).The selected FPGA was the Spartan-3A DSP XC3SD1800A, and to program it, the Xilinx ISE 14.7 was used.The VHDL code was synthesized and the circuit implemented.The correspondent device use summary is displayed in Table 1.Only the number of bounded IOBs (pins), DCMs (digital clock managers), and DSP48As (multipliers) surpass 10% of the available resources.

Aerial Mapping
An aerial vehicle's autonomy is possible when its route planning and decision making abilities are accurate enough to enable a safe and self-controlled navigation.Furthermore, aerial mapping of the classified terrains using UAVs can be very useful and implemented for many different application kinds such as autonomous navigation, precision agriculture, emergency landings and rescue missions [44].
The fusion between both the techniques applied for terrain classification: static textures (GLCM and GLRLM) and dynamic textures; through MLP, result into several images where each pixel represents one of the four studied terrain types (water, vegetation, asphalt and sand).This system's mapping procedure is similar to the Aerial Semantic Mapping algorithm described for a Precision Agriculture experiment [10].The framework's pipeline uses a dynamic ROS grid map composed by four layers (one for each type of terrain, see Figure 17).These layers are converted into the OccupancyGrid ROS message type which saves a georeferenced map with all the collected and classified images.The layered grid map cells have value 0 (white) when the pixel presents the corresponding layer's terrain type, otherwise the cell is 1 (black).
Equation ( 15) computes the real dimensions of the captured area, where FOV is measured in degrees ( • ) and the AspectRatio is the image's height:width proportion factor.The Parrot Bebop2 (Base_link) uses a counter-clockwise IMU sensor, which is in line with ROS REP:103 conventions [47] followed by the World and Map frames.On the other hand, OpenCV uses a clockwise system, which means that when x-axis points to right, then y-axis is pointing down, Figure 19.The precision implied by computer vision systems and ROS image mapping also requires caution when it comes to implement the proper rotation and translation transformations.As illustrated in the schematic of Figure 19, it is considered the standard 90 • counter-clockwise rotation to deal with the world→map transformation in Equation (16).
StandardRotation : 90 o Rotated : The following equation determines pixels rotation motion according to a rotation matrix M by θ (Equation ( 17)).
Then, according to Figure 19, the first captured image center is positioned at (E o , N o ).The subsequent images will be mapped in line with a pivot rotation, as explained in the following steps, where P(px, py) is the position of each image pixel: Equation ( 18), as explained in [10], describes the pivot rotation applied in mapping procedures, where: Based on the previous pivot rotation formula, it was possible to successfully compute a dynamic mapping algorithm (Equation (19)) that accurately builds an aerial view over the UAV's flown area.
The classified terrain information obtained during this experiment, was all georeferenced and accurately mapped using the ROS grid map layers, as illustrated in Figure 20 and can be visualized using the RViz tool (3D visualization tool for ROS).

Results
The results provided by Sections 2.4.1, 2.4.2 and 2.5, serve as input to the classification algorithm that converts the selected features into output labels.As previously mentioned in Section 2.2, a Multilayer Perceptron (MLP), was used in our experiments.The MLP receives the output values from the static and dynamic algorithms and returns one of four possible terrains (water, vegetation, asphalt or sand).
As mentioned before, the images used in our experiments were taken at an altitude between one to two meters.At lower altitudes the images had poor quality; whereas, above two meters, the downwash effect of the used UAV, which is not a powerful UAV, was almost nonexistent.
The presented results were obtained with a three-layer MLP, with ten neurons on the hidden layer and four neurons for the output layer.Multiple tests were performed with NNs using up to four layers and 100 neurons; however, the differences in the results were insignificant.It is important to mention that this work's main goal was to identify the best features that can improve the automatic terrain classification system.
In Table 2 it is possible to observe the confusion matrix table, from the classification results using static (GLCM and GLRLM) and dynamic (Optical flow) algorithms.The left column represents the labeled terrain, whereas the for each line, each cell represents the percentage of times the NN identified a certain type of terrain.For example, for the first line, when the terrain is water, the NN succeeds 94.60% in identifying the terrain as water, and fails 5.40% out of the 161.525 considered images.The 5.40% of times it failed, classifies the terrain as follows: 1.88% as vegetation, 1.70% as asphalt and 1.82% as sand.This paper proposals are outlined in Section 5.2, with a comparison between the present work and other approaches.The four terrain types' classification results are analyzed below.

Comparison and Discussion of Results
This subsection compares the results obtained by our proposed system, with results obtained by five related works.The comparison is shown in Table 3.For each work we show the respective successful rates for each terrain type as well an overall system average (in the column on the right).The average does not consider always these four terrain types, but each work's terrain types, since in some of the related works the proposed algorithms do not use all four terrain types.As illustrated in Table 3, the developed system presents the best accuracy in all four terrain types classification: water, vegetation, asphalt, and sand.One of the main advantages of this work, when compared to [10,12,16], is the ability to obtain a better classification because the UAV is only 1 to 2 meters away from the ground.In [10,12,16], there are images taken by satellites or at a distance above 60 meters from the ground, which increases the error in the classification of the studied terrain types.In [8,18], the UAV was also used at a short distance from the ground, as in this article, also providing images in high resolution.However, ref. [8,18] only classify one terrain which clearly becomes a disadvantage when there may be several terrain types to be captured by the on-board camera.This work's developed system has the ability to classify different terrain types in the same image.
Another major advantage studied in the current work, when it comes to terrain classification, is the terrain's dynamic part usage.This action clearly shows an improved system accuracy, since each terrain behaves differently when affected by the UAV's downwash effect.According to the works cited in Table 3, only [18] used the dynamic terrain feature, observing that the proposed system's accuracy is greater than in [8,10,12,16] articles and enhancing the terrains dynamism when it comes to terrain classification.Compared to [18], this work implements different static and dynamic algorithms, as well as classifies different terrain types with increased accuracy.
In summary, the system presented in this paper allows a more accurate terrain classification than with the other works mentioned in Table 3.This is due to the usage of both dynamic (from the Optical Flow concept) and static (from the GLCM and GLRLM) parts of the terrain, as previously mentioned, at 1 to 2 meters height from the ground, in order to have images with higher resolution.The current work, as well as [10], have datasets from the same Portuguese region.When compared to [8,18], the current work uses an extended and more heterogeneous dataset.To allow a direct comparison with these two works (those that use images taken at the same height from the ground), the system developed in this work was also tested with the datasets used in [8,18].The results are presented in Table 4. Compared to the results presented in Table 3, the system had higher success rates, as expected, due to the smaller and more homogeneous datasets.

Conclusions
This paper presents a complete solution for terrain classification, using the downwash effect by the UAV's rotors.The system implements two static algorithms: Gray-Level Co-Occurrence Matrix; Gray-Level Run Length Matrix; and one dynamic texture algorithm: Optical Flow.Regarding the terrain classification experimental results, the proposed solution achieves a 95.14% average success rate in differentiating among four terrain types (water, vegetation, asphalt and sand).
The developed FPGA-based prototype validates the use of hardware to implement parts of the proposed system.The integration of this hardware prototype into the system substantially reduces the software processing time and accelerate terrain classification.The developed hardware prototype will be extended in the near future to provide additional image features.
The obtained classification accuracy, the ability to classify more than one terrain in the same image, and a dynamic map generation (with the classification results), provide a set of advantages in the area of cooperative autonomous robots with fully autonomous navigation (e.g., an Unmanned Surface Vehicle (USV) needs to know where the water is to be able to navigate and an Unmanned Ground Vehicle (UGV) needs to avoid it).
The developed system had to overcome a major obstacle between software and hardware with UAVs, due to the limited existing work done in this area.A main goal of this paper work was the development of an open source system that merges ROS, OpenCV, and FPGAs, for UAVs.This system will be the basis for further research into cooperative autonomous robots.

Future Work
Despite the good results obtained so far, there are several areas in which improvements are foreseen.We plan to further investigate in the near future the following areas: • Perform a deeper study on changing of the environment colors, to improve the robustness of the algorithm.Environment color variation, regarding to the different time of the day, could influence the results and therefore false positives/negatives.Although the dynamic texture does not suffer much from these changes, due to the fact that the terrain movement is similar, the static texture could be highly affected; • Since the algorithm was designed for a UAV flying at an altitude between one and two meters, it is likely that at this altitude the UAV may collide with objects in the environment.The authors of this article developed an algorithm [48] to avoid obstacles, from images taken from a depth camera.However, currently it is only working with static objects.An improved version of this algorithm could be used to support the autonomous navigation of the proposed system; • It is also important to make a deeper study regarding different camera resolutions in order to improve the robustness of the proposed system, however, the system speed may be affected with higher resolutions.

•
Orange circle: Costa da Caparica; • Blue circle: Faculty of Sciences and Technology of NOVA University of Lisbon; • Red circle: Parque da Paz at Almada; • Green circle: Portuguese Navy's Naval Base at Alfeite;

Figure 4 .
Figure 4. Data collection localization in Portugal.The circles show the areas where the data set was obtained.

•Figure 8 .
Figure 8. Neural Network response using static features (GLCM and GLRLM).(a) Two terrains in one frame (vegetation and water); (b) NN results (green is vegetation and blue is water).

Figure 9 .
Figure 9. Example of a Optical Flow concept using Farneback algorithm in water (pool).

•
Blue Pixels: If the flow is pointing out of the calculated center (downwash center) from the flow density; • Green Pixels: If the flow is pointing into the calculated center (downwash center) from the flow density; • Gray Pixels: If the size of the flow is below a certain given treshold.The result of this example is shown in Figure 11.

Figure 11 .
Figure 11.Result of the dynamic algorithm from the concept of optical flow.

•
P (px ; py ) is the new P(px; py) point pivot rotation result; • M(θ) is the counter-clockwise 90 • rotation matrix (Equation (16)); • T(E T ; N T ) is the Map-to-Base_link translation matrix, where E T and N T are respectively the easting and northing translation from Figure 19; • θ is the counter-clockwise Base_link IMU rotation angle (rad).

Table 1 .
Device use summary.

Table 2 .
Results based on GLCM, GLRLM and Optical flow algorithms.

Table 3 .
Comparison between existing Terrain Classification Systems-Sorted by success rate.

Table 4 .
Comparison between existing Terrain Classification Systems-Sorted by success rate.