1. Introduction
In community ecology there are several theories to explain the distribution of species across environmental gradients. For instance, niche theory describes an n-dimensional hyper-volume of conditions and resources in which populations have a positive growth rate [1]. Understanding species distributions along environmental gradients using niche theory is challenging, however, because a niche is partitioned into conditions (e.g., habitat characteristics) and resources (e.g., food availability), and results will vary according to the methods used to quantify the habitat characteristics [2].
Remote sensing provides opportunities to characterize aquatic habitat structures with repeatability and robust quantitative metrics. At large spatial scales, freely available satellite imagery has been used to map river networks at scales from individual rivers or basins to the entire globe [3,4]. However, at small scales, satellite imagery with fine spatial resolution (<5 m) is expensive and not always available for specific sites on the optimal date(s) due to cloud cover (especially in the tropics) or satellite revisit times. The recent developments in unmanned aerial vehicle (UAV) platforms for ecological applications [5,6,7] are revolutionizing the way ecological variables can be mapped at the spatial and temporal scales needed for community ecology and environmental studies.
In terrestrial environments, UAV based photography [8,9,10] and Structure from Motion (SfM) with multi-view stereo (MVS) photogrammetry to produce 3D reconstructions of landscapes are increasingly popular [11,12]. In a strict sense, the term as we use it refers to an analytical workflow that combines both SfM and MVS photogrammetry, often followed by interpolation and in some cases textured mesh generation, as described in Reference [13]; we use SfM-MVS throughout for brevity. When correctly implemented, UAV SfM-MVS products have high spatial accuracy, even to within the error of differential GPS measurements [13,14,15,16,17]. The landscape reconstructions from SfM-MVS also have a much finer spatial resolution (<1–5 cm) than conventional satellite imagery used to assess land cover change. Recently, SfM-MVS from underwater photography and videography has been used to model coral specimens [18], reefs [19,20,21,22,23] and vertical wall marine environments [24] with relatively high accuracies [21]. It has also been shown to be a powerful analytical approach for deep sea applications of terrain and object reconstruction [25]. In freshwater fluvial ecosystems, UAV SfM-MVS has been used to derive submerged digital elevation models and reach-scale morphology [26,27]. However, to the best of our knowledge, SfM-MVS (from UAV or underwater photography) has not yet been explored for freshwater fish habitat complexity characterization.
Biodiversity is strongly related to habitat structural complexity in both marine and freshwater environments [28,29,30,31,32,33,34]. While several definitions of habitat complexity have been used interchangeably with heterogeneity and diversity [35], in this study, we refer to habitat diversity as the variety of substrate types (e.g., sand, pebbles, boulders, bedrock) and heterogeneity as the magnitude of substrate diversity within a study site, such as patches of sand interspersed with rocks versus homogeneous sand. We further refer to complexity as the heterogeneity in the arrangement of substrate types within the habitat [36,37]. Metrics for quantifying aquatic habitat complexity and diversity include rugosity [32,38,39], spatial autocorrelation [40], fractal geometry [23,35,41], grain size [34], corrugation [42], current velocity [43], optical intensity [32], and slope and aspect [44,45].
In freshwater ecology, most studies use either low-tech methods (e.g., chain-and-tape) or proxies (e.g., abundance of macrophytes, submerged vegetation) to infer habitat complexity [37,46,47,48,49]. The simple and subjective sampling methods (e.g., chain-and-tape) are time consuming, labour intensive and, importantly, not spatially extensive [23,50]. With the chain-and-tape method, a rope or chain is draped over the substrate profile and its length is measured and compared to the linear distance measured over the transect to produce a ‘substrate rugosity index’ (SRI) [51]. It is one of the most commonly used methods to quantify aquatic habitat complexity despite its well-known weaknesses [37,51]. Because of these weaknesses, in marine environments the SRI has been shown to be less accurate than digital 3D reconstructions of the habitat structures [18,23,50]. In freshwater ecosystems, distinguishing complexity and diversity among substrate profiles (a weakness of the SRI) is of fundamental importance for aquatic species that inhabit only certain habitat types (e.g., large boulders vs. sand).
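The chain-and-tape calculation described above can be sketched in a few lines. This is a minimal illustration, not the procedure of Reference [51]: the function name and the two profiles are hypothetical, and the profile is assumed to be sampled as (distance, height) pairs in metres.

```python
import math

def substrate_rugosity_index(distances, heights):
    """Chain length draped over a profile divided by the straight tape length."""
    if len(distances) != len(heights) or len(distances) < 2:
        raise ValueError("need at least two matching distance/height samples")
    # Sum the straight-line length of every segment along the profile.
    chain = sum(
        math.hypot(distances[i + 1] - distances[i], heights[i + 1] - heights[i])
        for i in range(len(distances) - 1)
    )
    tape = distances[-1] - distances[0]
    return chain / tape

# Hypothetical profiles (metres): two small boulders vs. one large boulder.
sri_small = substrate_rugosity_index([0, 1, 2, 3, 4], [0, 1, 0, 1, 0])
sri_large = substrate_rugosity_index([0, 2, 4], [0, 2, 0])
print(round(sri_small, 3), round(sri_large, 3))  # 1.414 1.414
```

Both profiles return the same index although one contains two small structures and the other a single large one, which is the kind of ambiguity between substrate profiles that the SRI cannot resolve.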
In this study, our goals are to illustrate the utility of SfM-MVS from UAV and underwater photography to quantify the habitat complexity in freshwater ecosystems and to map the area of habitat classes important for endemic and/or specialized ichthyofauna. Because SfM-MVS is still in the early stages of adoption in the freshwater ecological community [52], there is a lack of good practice guidelines (e.g., sensor, lens choice, resolution, lighting effects). Our discussion addresses these aspects as a means to provide an initial set of recommendations for freshwater habitat SfM-MVS studies.
3. Results
We found a strong significant relationship between the UAV based submerged DSM (GSD = 0.65 cm) and the in-situ measurements of the distances between the targets and the surface of the water (R² = 0.994, Sy.x = 6.205, p < 0.001, F = 1449). While these data were collected under controlled conditions, they represent the range of depths at the Jatobá study site.
Table 3 illustrates the results of the underwater SfM-MVS measurements of distance between features in the dense 3D point cloud and measurements taken in-situ. The differences between the distances measured in the point cloud and those measured in-situ underwater with the tape measure range from 1.4 to 2.5 cm (Table 3, Figure A1). For the dimensions of the lock nuts, the differences between the digital calliper and the dense 3D point cloud range from 0.01 to 0.04 cm.
From the Jatobá site, Figure 7 illustrates the finer spatial resolution obtainable from UAV photography in contrast to the best available satellite imagery collected close in time to the UAV data (9 days difference). In the pansharpened GeoEye satellite image (50 cm panchromatic, 2 m multispectral), the location of the largest rapids can be seen but substrate classes cannot be determined (Figure 7a). The glare off the surface of the water also prevents determining other information from the subsurface, such as the presence of aquatic vegetation. From the UAV photograph (Figure 7b), locations of shallow and deep water can be inferred as well as the outlines of the largest boulders underwater. The flexibility of the UAV operation allows for multiple view angles to minimize glare and optimize the spatial information in each frame. A single aerial frame (Figure 7b) is a flat 2D representation of the surface; lens distortions, variations in depth of field and a lack of coordinates for each pixel of the photograph prevent accurate measurements of area, distance or volume. However, once the full set of photographs for a site is processed through the SfM-MVS workflow (Figure 6), measurements of distance and orientation between objects can be made. Furthermore, the 3D point cloud (Figure 7c) can be rotated on the screen to view the landscape from various perspectives. The 3D representation of the landscape and subsequent products (e.g., DSM, orthomosaic) (Figure 8) are also located in real-world geographical coordinates.
While the underwater substrate topography is not immediately evident from the full area DSM with a large range of elevation values (~25 m) (Figure 8a), when a subsection of the DSM representing a 7 × 7 m area of the aquatic substrate is extracted (Figure 9), it is possible to visualize the topography underwater. The values of the submerged DSM (Figure 9a), following the refractive index correction, represent the distance from the surface (1.52–2.39 m), with brighter pixels indicating structures closer to the surface. The RMS height (standard deviation of the height) was found to be 0.1. Higher values indicate larger variations in height and have therefore been interpreted to mean a potentially rougher substrate [23]. Higher values of slope (Figure 9b) represent the edges of boulders. The rugosity (surface to planar ratio) (Figure 9c) is similar, except the overall slope of the riverbed has been removed and pixels with high values of rugosity are found around and between the largest boulders. Due to the GSD of 1.2 cm, features smaller than that cannot be resolved. The length of the topographic correlation was found to be 54.7 cm (E-W) and 52.3 cm (N-S).
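As a rough illustration of the two DSM metrics named above, the sketch below computes the RMS height and a surface-to-planar ratio for a small gridded DSM. It is an assumption-laden simplification: the function names are hypothetical, each grid cell is split into two triangles to approximate the surface area, a single ratio is returned for the whole grid rather than a per-pixel map, and the overall riverbed slope is not removed.

```python
import math
from statistics import pstdev

def rms_height(dsm):
    """Standard deviation of all heights in a DSM given as a list of rows."""
    return pstdev([z for row in dsm for z in row])

def _triangle_area(p, q, r):
    # Half the magnitude of the cross product of two edge vectors.
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def rugosity(dsm, gsd):
    """Surface area of the triangulated DSM divided by its planar area."""
    rows, cols = len(dsm), len(dsm[0])
    surface = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            a = (0.0, 0.0, dsm[i][j])
            b = (gsd, 0.0, dsm[i][j + 1])
            c = (0.0, gsd, dsm[i + 1][j])
            d = (gsd, gsd, dsm[i + 1][j + 1])
            surface += _triangle_area(a, b, c) + _triangle_area(b, d, c)
    planar = (rows - 1) * (cols - 1) * gsd * gsd
    return surface / planar

# Hypothetical 3 x 3 grids: a flat bed and a smooth bed tilted at 45 degrees.
flat = [[0.0, 0.0, 0.0] for _ in range(3)]
tilted = [[0.0, 1.0, 2.0] for _ in range(3)]
flat_ratio = rugosity(flat, gsd=0.012)
tilted_ratio = rugosity(tilted, gsd=1.0)
```

The tilted but otherwise smooth grid yields a ratio well above 1, which illustrates why removing the overall slope of the riverbed matters before the ratio is read as roughness.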
From the underwater SfM-MVS a finer GSD (1 mm) was achieved, allowing for a more precise definition of the edges of the underwater structures (e.g., boulders) as well as the location of aquatic vegetation and the presence of green filamentous algae (Figure 10a,b). From aerial photographs it can be difficult to distinguish between underwater vegetation and patches of filamentous algae but the two are clearly separable in the underwater orthomosaic (Figure 10a). The map of rugosity (Figure 10c) illustrates that the most complex areas are at the boulder-sand interface. Small holes/depressions in the boulders are also distinguishable. The RMS height was found to be 0.15. The length of the topographic correlation was found to be 27 cm (E-W) and 13.5 cm (N-S). When the DSM and rugosity maps are spatially degraded to 3 and 15 cm GSD, increasingly generalized, larger scale patterns of topography and overall complexity are seen. The range of rugosity values also decreases with increasing GSD from a maximum of 262 at 1 mm (Figure 10c) to 1.9 at 15 cm (Figure 10g). The three digital transects (Figure 10h) of SRI resulted in values of 2.44 (A-A’), 2.18 (B-B’) and 2.05 (C-C’), respectively.
The low water of the dry season allowed for the 3D reconstruction of habitat classes in portions of the river that are normally submerged in the wet season (Figure 11). The four sites encompass six of the habitat classes from Table 1 as well as illustrate an example of the white water and deep/turbid classes. The Iriri site (Figure 11a) exhibits the largest heterogeneity of habitat classes including cobbles, solid rock, medium and large boulders. In contrast, the Retroculus site (Figure 11b) and Culuene (Figure 11c) are the most homogeneous in terms of the number of habitat classes present.
Based on the confusion matrices (Figure A2), the neural networks separated the habitat classes with a high level of accuracy and minimal potential overfitting. The overall accuracy for all four sites was over 90%. Individual class user and producer accuracies were all > 90% with two exceptions, a shallow water class (UA = 83.3%, PA = 85.6%) at Culuene and the shadow class at Iriri (UA = 63.2%, PA = 32.2%). The results of the neural network classification (Figure 12 and Figure 13) reveal that only the bedrock and shallow water classes are consistently found at the four sites. The shallow water class represents water less than 30 cm deep with the sand and/or gravel assemblage classes mixed together. The classes Podostemaceae and Wet Bedrock, while not included in Table 1, are only found in certain areas of the Xingu basin where there are rapids with a splash zone on the surrounding boulders. Among our sites, they are found at Iriri. The class tree/shrub illustrates areas that are not flooded on a permanent basis. In the wet season when these areas are submerged they provide critical habitat for a range of species (Figure 4). Also evident from the proportions (Figure 12) and spatial distribution (Figure 13) of the classes is the variability between locations. Consistent with the overall complexity of the site (Figure 14), the substrate classifications (Figure 11 and Figure 12) reveal that Iriri has the highest diversity of habitat classes (9 classes) and Culuene the least (5 classes). The ‘shadow’ class is not included in this total because it represents areas with topographic features such as the spaces between boulders that are not observable from the SfM-MVS products.
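For readers less familiar with the accuracy terms used above, the following sketch shows how overall, user (UA) and producer (PA) accuracies are conventionally derived from a confusion matrix. The matrix values and class labels are invented for illustration and are not those of Figure A2; rows are assumed to hold reference samples and columns the predicted classes.

```python
def accuracy_report(matrix, labels):
    """Overall, producer and user accuracy from a square confusion matrix.

    Assumed layout: matrix[i][j] counts reference class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"overall": correct / total, "classes": {}}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])                       # row sum
        predicted_total = sum(matrix[r][i] for r in range(n))  # column sum
        report["classes"][label] = {
            # Producer accuracy: share of reference samples correctly mapped.
            "producer": matrix[i][i] / reference_total if reference_total else float("nan"),
            # User accuracy: share of mapped samples that are correct.
            "user": matrix[i][i] / predicted_total if predicted_total else float("nan"),
        }
    return report

# Hypothetical three-class example.
labels = ["bedrock", "shallow water", "shadow"]
matrix = [
    [48, 2, 0],
    [3, 45, 2],
    [5, 5, 10],
]
report = accuracy_report(matrix, labels)
```

In this invented example the shadow class has a producer accuracy of 0.5 because half of its reference samples are assigned to other classes, the same pattern of low PA reported for the shadow class at Iriri.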
The 3D fractal dimension for the different habitat classes, as calculated from the substrate exposed in the dry season (Figure 14), reveals a natural variability in complexity not only between sites but also within the same class at different sites. For example, the solid rock class from Culuene has a low value of 1.27, whereas the same class from Iriri is more complex with a value of 1.7 due to fissures in the bedrock that are not present in Culuene. Overall, Iriri is the site with the greatest habitat complexity (avg = 1.8), whereas Culuene was found to be the least complex (avg = 1.26).
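The estimator used to obtain the fractal dimensions in Figure 14 is not described in this section. As a generic illustration only, the sketch below uses box counting, a common way to estimate a fractal dimension from a 3D point cloud: occupied boxes are counted at several box sizes and the dimension is taken as the slope of log(count) against log(1/size). The function names and the test surface are hypothetical, and the values returned are not expected to be on the same scale as those reported above.

```python
import math

def box_count(points, size):
    """Number of cubic boxes of edge length `size` that contain a point."""
    return len({
        (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        for x, y, z in points
    })

def box_counting_dimension(points, sizes):
    """Least-squares slope of log(box count) versus log(1 / box size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical check: a densely sampled flat unit square should give ~2.
plane = [((i + 0.5) / 64, (j + 0.5) / 64, 0.0) for i in range(64) for j in range(64)]
plane_dimension = box_counting_dimension(plane, sizes=[1 / 2, 1 / 4, 1 / 8, 1 / 16])
```

The box sizes must stay well above the point spacing; otherwise the count saturates at the number of points and the slope is biased downwards.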
4. Discussion
We demonstrated that ultrafine resolution spatially explicit information about habitat complexity can be generated from both UAV and underwater photographs with a SfM-MVS workflow. The complexity metrics calculated from the SfM-MVS products are critical analytical tools for gaining insight into the habitat classes necessary for conservation of the local species. As for coral reefs [67], for many freshwater fishes in the Xingu river basin, fine resolution satellite imagery (<2 m) (Figure 7a) is spatially too coarse to extract meaningful habitat complexity information. Furthermore, the orbits of most satellites restrict the revisit times and the time of day images are collected at a given location. Cloud cover is a further challenge when relying on optical satellite imagery in the humid tropics. The low altitude from which UAVs are operated not only improves the spatial resolution of the data but also increases the range of conditions under which photographs can be taken (e.g., uniform cloud cover). Underwater SfM-MVS has additional flexibility because varying the camera settings (e.g., ISO) or the use of artificial illumination such as dive lights can produce high quality photographs even under less than ideal natural illumination conditions [24]. In high energy or dangerous aquatic systems such as rivers with strong current, data collection can be extremely challenging. Both UAVs for areas with shallow water and cameras operated from a boat or by an experienced diver/snorkeler could mitigate these challenges [23].
Based on Table 3, the measurements of distance within the dense 3D point cloud were more similar to those from the digital calliper than to those from the tape measure because use of the tape measure underwater is prone to human error. A minimum of two snorkelers were required to use the tape measure and record the values. The strong current added to the difficulties of taking accurate measurements underwater.
From the SfM-MVS workflow, the spatial variability in complexity at individual sites is captured due to the 2D and 3D nature of the products (Figure 9, Figure 10 and Figure 11). In comparison, the SRI is less robust and more subjective. The theoretical SRI from the DSM ranges from 2.05 to 2.44. It does not capture the differences in the cross-section profiles. There is also a high degree of subjectivity depending on where the transect is placed [50]. The 2D map of rugosity is more robust because the entire variability can be summarized by the range of values (1–262).
Similar to Ref. [50], we found that as the GSD decreases, the values of rugosity increase in the SfM-MVS products, indicating that at finer GSD more of the structural complexity can actually be measured. This scalability is one of the strengths of SfM-MVS; from products generated at a fine GSD, generalizations of the habitats to coarser scales can be readily achieved. Such scaling is not possible from SRI measured in the field because the chain-and-tape measurement is made at a fixed resolution (i.e., chain-link size) [23].
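The scale dependence described above can be illustrated on a single digital transect. In this sketch, which is only a simplified stand-in for degrading a DSM, a hypothetical height profile sampled at 1 cm spacing is block-averaged to 2 cm spacing and the chain-to-tape ratio is recomputed at each resolution. The helper names and the profile values are assumptions made for the example.

```python
import math

def chain_to_tape(heights, spacing):
    """Along-profile length divided by horizontal length for evenly spaced heights."""
    chain = sum(math.hypot(spacing, b - a) for a, b in zip(heights, heights[1:]))
    return chain / (spacing * (len(heights) - 1))

def block_mean(heights, factor):
    """Average consecutive groups of `factor` samples (coarser sampling)."""
    usable = len(heights) - len(heights) % factor
    return [sum(heights[i:i + factor]) / factor for i in range(0, usable, factor)]

# Hypothetical transect heights (cm) sampled every 1 cm.
profile_cm = [0, 2, 1, 3, 2, 4, 3, 5]
fine = chain_to_tape(profile_cm, spacing=1.0)
coarse = chain_to_tape(block_mean(profile_cm, 2), spacing=2.0)
```

The coarser profile retains the overall trend but loses the small-scale relief, so its ratio is lower. A field chain cannot be resampled in this way after the fact, whereas a digital product can be degraded to any coarser spacing.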
With terrestrial laser scanner reconstructions of terrain, highly accurate classifications have been achieved for geomorphological classes using multi-scale 3D point cloud classification [68]. For classification of the exposed habitats from the SfM-MVS products we found that colour, topographic and texture metrics combined as predictors in the shallow neural network resulted in high accuracy of the classification (Figure 12, Figure 13 and Figure A2). Location and proportion of the habitat classes can provide important information about the ichthyofauna. For example, in-situ observations indicated that at all sites different species preferentially inhabit the different classes. At Iriri, for example, the payara (Hydrolycus armatus) is only found in the main white-water channel. The splash zone on the large boulders next to the main rapids allows for growth of riverweeds (Podostemaceae spp.) that adhere to solid surfaces in high flow environments. The endemic parrot pacu (Ossubtus xinguensis) has specialized dentition to feed on them. The wet bedrock and Podostemaceae classes were important to differentiate from the surroundings because not only are the riverweeds a food source for fish but the wet rocks also represent the splash zone from the rapids where Podostemaceae dormant from the previous year may regenerate provided there is a consistent water source. The trunks and lower branches of the vegetation class may become submerged in the wet season, when they provide important habitat for fry and juvenile fishes (e.g., Cichla melaniae) as well as spawning and feeding grounds for species such as Leporinus frederici, Farlowella amazona, Geophagus cf altifrons, Aequidens mikaeli and so forth. The boulder class represents habitat for anostomids (e.g., S. respectus and Synaptolaemus latofasciatus), as well as loricariids (e.g., Pseudacanthicus pirarara) and cichlids (e.g., C. dandara) among others. The gravel assemblage class represents habitat and feeding areas for several species including cichlids (e.g., Crenicichla sp. 1, G. argyrostictus, R. xinguensis), anostomids (e.g., Leporinus maculatus) and serrasalmids (e.g., Myleus setiger). The sand class is further important for stingrays (e.g., P. orbignyi), loricariids (e.g., Limatulichthys spp.) and cichlids (e.g., Teleocichla monogramma) among others.
The 3D fractal dimension is a powerful metric, which allows for quantitative comparisons between habitat classes at multiple sites as well as between sites. This metric is scale invariant and relates complexity, spatial patterns and scale [23]. In order for this metric to be calculated, fine spatial resolution, spatially extensive, digital 3D data are required. From an ultrafine resolution (mm pixel size) underwater photography SfM-MVS derived DSM of an inter-tidal reef, Ref. [23] found that fractal dimension better characterized the roughness of the reef than other conventional measures. Our results reinforce this finding (Figure 14).
Despite the strengths of the SfM-MVS framework presented here, there remain limitations. Both the UAV and underwater photography were collected with a nadir view of the substrate; therefore, elements of the substrate such as the underside of overhangs, caves, crevices, or tunnels inside boulders or other structures underneath or between boulders are only partially represented (Figure A1). These elements, especially caves within the piles of boulders or holes inside rocks, are important features of the habitat for certain fishes. For example, small loricariids such as Leporacanthicus heterodon are preferentially found in crevices and caves between boulders to avoid predation. Ref. [19] also point out this limitation for SfM-MVS reconstruction of coral colonies. For the UAV based submerged DSM, more sophisticated corrections such as the fluid lensing approach [67] for the individual frames used in the SfM-MVS workflow have the potential to further improve the reconstruction of the submerged topography.
Both UAV and underwater photographs can provide a baseline for assessing freshwater habitat complexity. However, it is not an automated process and, because of the complex nature of the analysis, a thorough understanding of the camera system is necessary to produce reliable and repeatable digital representations of the habitats. Errors can result in the SfM-MVS products if the most important factors are not considered [19]. A number of factors influence the accuracy of the photogrammetric models and subsequent products (DSM, orthomosaic and metrics of complexity). The various software implementations of the SIFT (or SIFT-like) algorithms rely on distinctive key points (i.e., invariant features) in the photographs. Following steps aimed at improving the initial set of candidate key points and filtering to reject points with low contrast and a strong edge response, the retained key points are robust to varying illumination conditions, view angle, pixel noise and so forth [69].
Similar to terrestrial environments, best practices for photographic data collection for underwater SfM-MVS involve a few major considerations: GSD, depth of field, shutter speed and overlap. For a given sensor, the GSD (the distance between two consecutive pixel centres as measured on the ground) is determined by the distance of the camera to the ground/substrate and the focal length of the lens [12]. At a given focal length, increasing the distance between the camera and the substrate will result in a coarser GSD and larger imaging area (Figure 15). Conversely, at a given distance between the camera and substrate, increasing the focal length will result in a finer GSD because each pixel will capture a smaller area, and the imaging area will also be smaller. Overlap in both the along track (direction of travel) and across track (neighbouring lines) directions can be optimized once the GSD and imaging area have been determined. For underwater scenes a high overlap in both directions (~80%) is recommended.
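The relationships between distance, focal length, GSD, imaging area and overlap can be sketched with the usual pinhole approximation. The functions below are illustrative only; the sensor width (17.3 mm) and image width (5280 pixels) are assumed nominal values for a micro 4/3 camera and are not taken from the study, while the 15 mm focal length, 30 m distance and 80% overlap follow values mentioned in the text.

```python
def ground_sample_distance(distance_m, focal_mm, sensor_width_mm, image_width_px):
    """GSD in metres: pixel pitch scaled by the distance-to-focal-length ratio."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return distance_m * pixel_pitch_mm / focal_mm

def exposure_spacing(distance_m, focal_mm, sensor_width_mm, overlap):
    """Distance between exposures (metres) that yields the requested overlap."""
    footprint_m = distance_m * sensor_width_mm / focal_mm
    return footprint_m * (1.0 - overlap)

# Assumed camera: 17.3 mm wide sensor, 5280 px across, 15 mm lens, 30 m away.
gsd_m = ground_sample_distance(30.0, 15.0, 17.3, 5280)
spacing_m = exposure_spacing(30.0, 15.0, 17.3, overlap=0.8)
print(f"GSD = {gsd_m * 100:.2f} cm, spacing = {spacing_m:.2f} m")
```

Under these assumptions the GSD is roughly 0.66 cm and exposures would be spaced about 6.9 m apart along the sensor's long axis. Doubling the distance doubles both values, whereas doubling the focal length halves them, in line with the trade-off described above.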
Depth of field (DoF), the zone within which the photograph is in focus, depends on the aperture used (e.g., f/8), the distance to the substrate and the focal length of the lens. Narrower apertures result in greater DoF (Figure 16).
In addition to overlap, the DoF is a critical element to maximize photograph quality for underwater SfM-MVS because when as much of the subject (i.e., substrate) in the frames as possible is in focus (i.e., maximizing DoF), a greater number of keypoints can be generated and retained. A wider angle lens and small aperture will increase DoF. An increase in the ISO (i.e., sensitivity to light) will result in a faster shutter speed and, if the aperture cannot be set manually, a narrower aperture (Figure 17). Under consistent illumination conditions (clear sky with direct solar illumination, not under overhanging vegetation), the photographs from the stream survey (Figure 10) were collected with a narrow range of f-number (µ = 5.7 ± 0.4, range 5.0–7.1) and shutter speed (µ = 138.3 ± 22.1, range 100–200). For the underwater model of the rock with the hexagonal lock nuts (Figure A1), overhanging vegetation with patches of clear views to the sky resulted in a greater range of both f-number (µ = 5.6 ± 1.7, range 3.5–10.0) and shutter speed (µ = 143.8 ± 90.1, range 50–400) used. For the UAV photographs (Figure 17b) the X3 camera consistently used a fixed aperture of f/2.8. An increase in shutter speed due to the glare off the surface of the water at the Retroculus site can be seen, with the widest range of shutter speeds. For the X5S camera the aperture varied from f/4.0–f/6.3 at Culuene and f/4.0–f/5.6 at Jatobá.
Given the sensor size of the camera (e.g., full frame vs. micro 4/3), the focal length of the lens and the aperture chosen, the hyperfocal distance can be calculated. This value is the closest focus distance at which the furthest distance in the photograph remains acceptably sharp. For example, on a full frame sensor with f/5.6 and a 24 mm focal length, the hyperfocal distance is 3.4 m. When the lens is focused at the hyperfocal distance, the foreground closer than about half that distance will be out of focus. With UAV based photographs, generally given altitudes of tens of meters above the tallest features in the landscape (e.g., trees), the entire scene will lie beyond the hyperfocal distance. For example, with a micro 4/3 sensor (such as the X5S), f/5.6 and a 24 mm lens, the hyperfocal distance is 6.9 m. With the 15 mm lens used at Jatobá, the hyperfocal distance ranged from 2.7–3.7 m. At a flight altitude of 30 m AGL (Figure 5a), the entire landscape was beyond the hyperfocal distance. Underwater, however, if the water is shallow and/or the operator is close to the substrate, care must be taken to retain an acceptable focus for as much of each frame as possible. This can be achieved by calculating the DoF given a particular focal length and aperture (Figure 16). The lens can be focused on a section of the substrate that would centre and maximize the DoF over the range of topography.
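The hyperfocal distance and DoF limits can be estimated with the standard thin-lens formulas, sketched below. The circle of confusion values (0.030 mm for full frame, 0.015 mm for micro 4/3) are conventional assumptions rather than values stated in the text, the function names are hypothetical, and the calculation is for a lens in air, so the effect of an underwater housing port is ignored.

```python
import math

def hyperfocal_m(focal_mm, f_number, coc_mm):
    """Hyperfocal distance in metres: H = f^2 / (N * c) + f."""
    return (focal_mm ** 2 / (f_number * coc_mm) + focal_mm) / 1000.0

def dof_limits_m(focal_mm, f_number, coc_mm, focus_m):
    """Near and far limits of acceptable sharpness (metres) for a focus distance."""
    f = focal_mm / 1000.0
    h = hyperfocal_m(focal_mm, f_number, coc_mm)
    near = focus_m * (h - f) / (h + focus_m - 2 * f)
    # At or beyond the hyperfocal distance the far limit extends to infinity.
    far = math.inf if focus_m >= h else focus_m * (h - f) / (h - focus_m)
    return near, far

# Assumed circles of confusion: 0.030 mm (full frame), 0.015 mm (micro 4/3).
h_full_frame = hyperfocal_m(24, 5.6, 0.030)
h_micro43 = hyperfocal_m(24, 5.6, 0.015)
near_1m, far_1m = dof_limits_m(24, 5.6, 0.030, focus_m=1.0)
```

With these assumptions the two hyperfocal distances come out within about 0.1 m of the 3.4 m and 6.9 m quoted above. For a full frame camera focused at 1 m, the zone of acceptable sharpness spans only about 0.8 to 1.4 m, which shows why the focus point needs to be chosen with the range of substrate topography in mind when working close to the bed.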
The shutter speed should be fast enough to freeze motion; this is fundamentally important to ensure the frames are sharp. A shutter speed of 1/125 s or faster should be used for underwater photography whenever possible. Photographs should be taken under the brightest illumination conditions available, taking into consideration water depth and clarity. If possible, flash should be avoided because shadows from the objects or structures created by the flash will change with each perspective from which the frames are taken and reduce the likelihood of matching keypoints. If strobes must be used, diffusing the flash can help to mitigate the harsh shadows. Lastly, variable focal length lenses could result in deterioration of the SfM-MVS model compared with a fixed focal length lens. However, with the use of a lens with high quality optics, high sharpness, low distortion and low chromatic aberration the difference should be minimal.