Following the design and testing of Reef Rover Version 1, Version 2 was tested at the same West Bay, Grand Cayman location in November 2017. The second field campaign proved more productive: the new design simplified setup, dismantling, transport, and overall mission execution, resulting in more accurate products.
3.1. Results of Testing Version 1
Our initial objective in building Version 1 was to compare SfM models produced using two different camera systems: a smaller YI™ 88001 16MP Action Camera and a larger mirrorless Sony a6000 with a 16mm lens. Unlike many action cameras on the market, the 16MP Yi camera allows image acquisition parameters, such as ISO and shutter speed, to be set manually, although doing so requires uploading third-party firmware to the camera. The Sony a6000 has a 24 MP APS-C format sensor. The Yi camera ($150 USD) is much less expensive than the larger mirrorless a6000 ($600 USD); however, both cameras allow the shutter speed to be controlled, which is critical for minimizing image blur when taking underwater photos. No attempt was made to use artificial illumination when taking the photos; instead, we relied on available sunlight through the water column.
When testing the cameras to collect rapid sequential images, each camera was set to the fastest interval it could sustain for the length of the planned mission, which was typically between 20–30 min. For both camera models, the maximum rate was once every 1.5 s. In practice, for both models, the write buffer would occasionally fill (once every several minutes), resulting in an image interval longer than 1.5 s. Both cameras were tested at various depths within the same reef area, knowing that shallower areas would require closer track lines to achieve adequate overlap. We used the mission planning Android app, Tower, to specify the lens and automate the calculation of the distance between mission lines required for a desired overlap (e.g., 70%) at an average depth of the features below the camera (e.g., the reef). Tower has a preset lens configuration for the Sony 16mm lens, which is used to determine the proximity of the tracks based on a given “flying height”, for which we input the depth (i.e., the software is designed for aerial platforms). Although a lens configuration was not available for the Yi camera, the corresponding parameters were set up manually in the software. However, maintaining a consistent depth proved problematic, especially in areas where the reef came near the water surface (i.e., <2 m).
Although the Tower app computes the necessary distance between survey lines to achieve acceptable overlap at an expected depth, we often found it useful to calculate these numbers manually for planning custom missions, such as the double grid patterns described earlier in Figure 4. Each dimension of an individual camera footprint is calculated as twice the depth times the tangent of one half the field of view (FOV) angle, i.e., footprint = 2 × depth × tan(FOV/2). Even though the lens is round, the sensor is not, so a different FOV angle is associated with the width and the length of an image. The FOV angles for an APS-C sensor using a 16mm lens are 73 degrees for the wider dimension (width) and 52 degrees for the narrower dimension (length). Once the width and length are calculated for a given depth, the distance between survey lines can be calculated as one minus the desired overlap fraction (e.g., 0.7) times the wider dimension, i.e., spacing = (1 − overlap) × width; we only tested orienting our cameras with the wider dimension perpendicular to the survey tracks. The nominal ground sample distance (GSD) for a given depth is the footprint dimension divided by the number of pixels in the corresponding dimension.
Table 4 presents these calculations for the Sony camera with its 16mm lens. All units in the table are given in meters (m) except ground sample distance (GSD), which is listed in millimeters (mm).
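To illustrate, these footprint, line spacing, and GSD calculations can be expressed in a few lines of code. The following is a minimal sketch in Python; the 73°/52° FOV angles and the 6000 × 4000 pixel image size correspond to the Sony a6000 with a 16mm lens as described above, and the 3 m depth is an arbitrary example value.

```python
import math

def footprint(depth_m, fov_deg):
    """One footprint dimension (m): 2 * depth * tan(FOV / 2)."""
    return 2 * depth_m * math.tan(math.radians(fov_deg) / 2)

def line_spacing(width_m, overlap=0.7):
    """Distance between survey lines for the desired side overlap."""
    return (1 - overlap) * width_m

def gsd_mm(dimension_m, pixels):
    """Nominal ground sample distance (mm/pixel) along one dimension."""
    return dimension_m / pixels * 1000

# Example: a6000 (6000 x 4000 pixels) with a 16mm lens at 3 m depth
width, length = footprint(3.0, 73.0), footprint(3.0, 52.0)
print(f"Footprint: {width:.2f} m x {length:.2f} m")                # ~4.44 x 2.93
print(f"Line spacing (70% overlap): {line_spacing(width):.2f} m")  # ~1.33
print(f"GSD: {gsd_mm(width, 6000):.2f} mm/pixel")                  # ~0.74
```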
Images were collected over the same reef area using both cameras and copied directly from the memory cards to a PC to build the SfM models using photogrammetry software. Neither camera model has an onboard GPS, and we did not attempt to geotag the images for our first test comparison. Geotagging involves assigning a coordinate to each image based on the GPS location reported by the autopilot at the time the image was acquired, and it can facilitate the construction of the SfM model. Our initial attempts to create SfM products from these datasets were not successful. However, we explored several different data acquisition parameters and processing options and learned many important lessons through trial and error, which guided future efforts and may assist others attempting similar research.
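For readers implementing geotagging themselves, the core of the process is matching each image's timestamp to the nearest autopilot GPS fix. Below is a minimal Python sketch of that matching step; it assumes the autopilot log has already been exported to a simple CSV of (ISO time, latitude, longitude) rows and that the camera clock is synchronized with the autopilot, and it writes a plain image-geolocation table rather than embedding EXIF tags (tools such as Mission Planner, used later in this work, handle the full workflow automatically).

```python
import csv
from bisect import bisect_left
from datetime import datetime

def load_gps_log(path):
    """Parse a GPS log into sorted (timestamp, lat, lon) tuples.

    Assumes a simple CSV export (ISO time, lat, lon); real autopilot
    logs (e.g., Pixhawk .bin/.tlog files) need a format-specific parser.
    """
    fixes = []
    with open(path) as f:
        for row in csv.reader(f):
            t = datetime.fromisoformat(row[0]).timestamp()
            fixes.append((t, float(row[1]), float(row[2])))
    fixes.sort()
    return fixes

def nearest_fix(fixes, image_time):
    """Return the GPS fix closest in time to an image timestamp."""
    times = [t for t, _, _ in fixes]
    i = bisect_left(times, image_time)
    candidates = fixes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda fix: abs(fix[0] - image_time))

def write_geotags(images, fixes, out_path):
    """Write an image-geolocation CSV (image name, lat, lon)."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, t in images:  # (filename, unix timestamp) pairs
            _, lat, lon = nearest_fix(fixes, t)
            writer.writerow([name, lat, lon])
```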
3.1.1. Sun Angle Relative to Acquisition
The sun angle relative to the water surface proved to be a critical image collection parameter. When the sun is at higher angles relative to the horizon (>25°), the action of the waves, even if very calm, will produce an unpredictable, constantly changing light pattern across the underwater surface that is being imaged. This presents an insurmountable issue for the SfM software, as it relies on identifying similar patterns (i.e., keypoints) that represent the same objects in overlapping images. The differences in light patterns from one image to its neighbor prevent this recognition from occurring and/or produce false positives in areas without significant distinguishing features (e.g., sand). Others have noted this same issue in terrestrial or aerial SfM work [13,26,27].
The only acceptable solution to this issue was to acquire imagery when the sun angle was low (i.e., <25° from the horizon), either in the early morning or late afternoon. Alternatively, one could acquire images on an overcast day with diffuse light conditions that minimize the changing light patterns caused by high-angle, direct sun rays. However, the availability of light is diminished underwater, and acquiring images during overcast days or at lower sun angles can often prove problematic, particularly when using the action cameras (discussed below).
3.1.2. Depth of Underwater Features
Another topic of our research was determining how deep the camera system could map and model the reef environment. We were aware that our cameras, being mounted to the vehicle at or slightly below the water surface, would be limited in their ability to capture recognizable images of deeper features as visibility through the water column diminishes. Pure water, although mostly transparent, does absorb light, and ocean water always contains other materials, both dissolved and suspended particulate matter. Due to the nature of SfM, it is impossible to create a model if there are no distinguishable features. Although no attempts were made to quantitatively measure water clarity at the test sites, the clear waters of the Cayman Islands provided an environment where we could collect data at depths of up to 10–15 m that could effectively be used to create SfM models. We did not attempt any tests beyond this depth range.
In addition to the light penetration issue, depth variability of benthic features was also a problem, particularly at the Grand Cayman site. In our primary test location, the reef structures varied from about 8 m deep at the deepest locations (mostly sandy areas) to coral reef heads within 1 m of the water surface. This presented the problem of motion blur in the images (e.g., shutter speed too slow to capture detail at close proximity) as well as insufficient image overlap to adequately represent features in shallow areas. Since building SfM models relies on computer vision, images acquired closer to the subject require more overlap than those of subjects that are further away (given the same camera FOV or focal length). This is because a feature closer to the camera appears to change more with slight changes in perspective than an object that is further from the camera.
Depth also complicates motion blur issues. Motion blur is often confused with poor focus because it can look similar in an image, but motion blur is created when an object moves within the image frame during exposure. In the context of the Reef Rover, motion blur is more of an issue in the near field, as the apparent motion of a close object is greater than that of an object further away (i.e., at greater depth). For example, it is easy to take a picture with little apparent motion blur from an aircraft moving hundreds of miles per hour at 3000 m above ground, but it would be nearly impossible to take a picture of a stationary object without motion blur if one were flying only a meter or so above the ground. Taking a reliable image in 1–2 m of water from a vehicle moving at multiple meters per second without much light was a challenge. We found that a shutter speed faster than 1/1000 s eliminated motion blur at this depth. To compensate for the shorter exposure, we increased the effective ISO sensitivity by allowing the camera to automatically select any ISO value at or below a set threshold (ISO < 1000) on each exposure. When using the APS-C format Sony cameras, this allowed acceptable images to be acquired; however, when using the action cameras, the images became very grainy or blurred. When imaging deeper areas, particularly where no imaged surface was less than 2 m below the water, a slower shutter speed (1/640 s) produced acceptable images using the same ISO settings.
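The interaction between depth, speed, and shutter speed can be made concrete with a simple estimate: the blur in pixels is the distance the vehicle travels during the exposure divided by the GSD at that depth. A minimal sketch follows, assuming the a6000 geometry used above and an illustrative 2 m/s vehicle speed (not a measured value).

```python
import math

def blur_pixels(speed_ms, shutter_s, depth_m,
                fov_wide_deg=73.0, pixels_wide=6000):
    """Approximate motion blur in pixels for a moving camera.

    Blur = distance traveled during exposure, divided by the ground
    sample distance at the given depth. Assumes motion along the
    image's wide dimension and a flat subject.
    """
    gsd_m = 2 * depth_m * math.tan(math.radians(fov_wide_deg) / 2) / pixels_wide
    return speed_ms * shutter_s / gsd_m

# The same vehicle speed and shutter blur far more, in pixel terms,
# over a shallow reef head than over a deeper bottom:
for depth in (1.0, 2.0, 5.0):
    print(f"{depth} m: {blur_pixels(2.0, 1/1000, depth):.1f} px")
```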
When considering focus and depth of field settings, the action cameras are fixed focus with a wide (i.e., fisheye) lens and a large depth of field. This can have advantages, but image quality across varying environments generally suffers from having one fixed setting. After testing the Sony cameras, the optimal setting was automatic aperture with AF-C mode, meaning the camera attempts to search and focus continually. Occasionally, this yields an out-of-focus image if the camera is hunting for a focus point while the shutter is open. Depth of field was only an issue in cases where the subject was very close to the camera lens (<~1.5 m), particularly since wider apertures (smaller f-numbers) were required in the lower light conditions.
3.1.3. Action Camera versus Larger Format Camera
Overall, the image quality acquired from the action cameras (both the Yi™ and the GoPro™) was much poorer than that acquired with the Sony a6000. We were not able to create suitable SfM models from any of the images collected by action cameras when mounted to either version of the vehicle. Like many action cameras, both utilize a 1/2.3” sensor (crop factor of 5.62 compared to full frame), which is over 10 times smaller in area than the APS-C sensor in the Sony a6000 (1.5 crop factor). This means the individual detectors are smaller, so less light reaches each one. Therefore, the APS-C sensor will have less noise (and subsequently provide richer detail) when using similar camera settings (particularly ISO).
Another important consideration is the type of shutter used in the camera. The action cameras operate using a rolling shutter, whereas the Sony cameras use a mechanical shutter (when taking still pictures rather than video). A mechanical shutter captures the entire scene at a single instant in time, similar to the way film cameras work. A rolling shutter camera operates by scanning across the scene rapidly, either vertically or horizontally, much like a television or computer monitor reproduces an image on screen. This concept is different from shutter speed, which in the case of a rolling shutter refers to the amount of time a single detector on the sensor is exposed to light for a particular image. When using a rolling shutter, the image is not acquired all at the same time, and if the camera is moving quickly relative to the scan rate, there will be distortion in the image. This effect is often referred to as the “rolling shutter effect” and can be difficult for SfM software to handle as it attempts to build a model. In cases where the effect is mostly caused by motion in a constant direction, it can typically be accounted for in the software as a camera option. However, in the images collected using the action cameras, oscillations were created both by the electric motors themselves and by the autopilot constantly correcting the position of the vehicle.
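The magnitude of the distortion can be estimated in the same way as motion blur: the skew is the distance the vehicle travels during the sensor readout, expressed in ground pixels. The sketch below assumes an illustrative 30 ms readout time (a typical order of magnitude for small CMOS sensors, not a measured value for these cameras) and 4000 pixels across a 120° action camera FOV.

```python
import math

def rolling_shutter_skew_px(speed_ms, readout_s, depth_m,
                            fov_wide_deg, pixels_wide):
    """Apparent skew (pixels): camera travel between reading the
    first and last sensor row, divided by the GSD at this depth."""
    gsd_m = (2 * depth_m * math.tan(math.radians(fov_wide_deg) / 2)
             / pixels_wide)
    return speed_ms * readout_s / gsd_m

# e.g., 2 m/s over a 3 m deep reef with an assumed 30 ms readout
print(f"{rolling_shutter_skew_px(2.0, 0.030, 3.0, 120.0, 4000):.0f} px")
```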
Upon inspection of the images acquired using both action cameras, we noticed evidence of the rolling shutter effect, which was likely the reason these data did not produce an acceptable SfM model. Although others have reported success using action cameras [10,14,21], this is likely due to the ability of a diver to hold a position and acquire a more deliberate still image. We did not attempt to stabilize the image through mechanical techniques (i.e., a powered gimbal), as it would have added significantly to the cost and, given the budget constraints, was not practical in the underwater setting in which we were working.
3.1.4. Camera Capture Interval and Heating Issues
One of the primary camera issues tested on the Reef Rover Version 1 was insufficient image overlap, caused by the vehicle speed required to avoid being pushed too far off course by ocean currents. We initially attempted to compensate by adding closely spaced waypoints along each track; however, the slow interval of one frame every 1.5 s ultimately did not provide enough overlap. We then addressed this problem in two ways. First, the camera needed to be capable of acquiring images at a more rapid interval, and second, a camera needed to be mounted on each side of the vehicle to ensure sufficient overlap between images. The subsequent design of the Reef Rover Version 2 upgraded the system to a pair of Sony a6300 cameras capable of acquiring up to three frames per second in continuous shooting mode. When the camera fires at this rapid interval, the memory buffer fills up and often changes the rate of continuous capture to match the maximum rate at which the images can be written to the memory card. After examining our data from the a6300 cameras, we found the effective capture rate varied from approximately 100–170 images per minute (i.e., the capture interval was between 0.35–0.6 s when averaged over a minute). We suspect that the effective rate is determined by a number of factors, including the quality of the SD card, the camera settings, and the complexity of the imaged scene (particularly when capturing JPEGs). For all data missions we used a SanDisk Extreme Plus SD card (90 MB/s, U3, Class 10).
Considering that the Sony cameras are not waterproof, a Miekon waterproof housing was mounted on the bottom of the vehicle frame. While firing at the most rapid interval during a mapping mission, the cameras inside the housing would eventually overheat and report to the user a need to cool down. Testing the camera outside the housing did not result in overheating. Overheating would delay the mission for approximately five minutes until the camera cooled sufficiently to resume normal operation. When using the a6000 shooting at a longer interval (1 frame/1.5 s), overheating occurred after 50–60 min of continuous shooting. However, the a6300 would overheat after about 30 min of continuous shooting since it was shooting at a much faster interval. While this may not be a problem for mapping smaller areas (<50 m × 50 m, with depths >2 m), the overheating issue presents a practical barrier for mapping relatively larger areas that would take longer than 30 min to acquire.
3.1.5. Structure from Motion (SfM) Processing and Data Volume Issues
Data post-processing was completed using Pix4Dmapper, a widely adopted photogrammetry software package originally designed to create drone-based SfM models. Using this software, users can specify a large number of processing parameters as well as the types and formats of the outputs generated. We present how changing these processing parameters affected our ability to create SfM models successfully. Although we utilized Pix4Dmapper, other software programs operate in a similar manner and have similar processing parameters, including Agisoft Metashape (commercial license) and OpenDroneMap (open source license).
For each set of images acquired during a data collection effort, we created a 3D point cloud, 3D mesh, digital surface model, and orthophoto mosaic using Pix4D’s “3D Maps” template settings. First, a user must specify the camera settings (e.g., sensor size, pixel width and height, focal length, etc.). These settings are typically contained within the EXIF metadata for each image and are automatically identified by Pix4D; if necessary, they can be specified manually. Once the camera settings have been entered, the software uses computer vision techniques to identify the same object across multiple overlapping images. These identified object locations are referred to as “keypoints” and are used to infer the relative position of each “camera” or image using the camera parameters—a process referred to as camera calibration. Camera calibration can be aided if the GPS position of each camera is known, because the software can eliminate pairs of images that would not share any keypoints because they do not overlap. Without the coordinate information, the software must check each image against every other image for corresponding keypoints—a process that is computationally intensive and very slow. As an alternative, Pix4D can use other strategies to speed processing and minimize image search time by eliminating pairs of images that are not neighbors, including using the time stamp of each image and comparing overall image similarity. Since the image data collected with Reef Rover Version 1 were not geotagged, both of these options were selected in an attempt to speed the selection of keypoints.
Another processing parameter we adjusted in relation to keypoint identification is the scale of the images used in the search process. The user has the option of using the full resolution images or versions resampled at specified intervals (i.e., half the pixels, one quarter of the pixels, etc.). Using the images at full resolution takes longer to process; however, the locations of the keypoints will be more precise. In addition, a different number of keypoints is often identified by the software when processing at varying resolutions. When there are not enough keypoints, the camera positions cannot be inferred (i.e., calibrated) by the software. In an extreme case, if not enough camera positions are inferred, the software cannot find a solution and stops processing the images. After the cameras have been calibrated, a second attempt is made to find a greater number of matching points, using the camera positions to aid this process. This results in a dense point cloud from which a 3D mesh is created by connecting the matching points. A resulting orthophoto mosaic is created by projecting the point cloud onto a 2D plane.
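The effect of the image scale parameter on keypoint identification can be reproduced outside of Pix4D. The sketch below uses OpenCV's SIFT detector as an illustrative stand-in (Pix4D's internal keypoint algorithm is proprietary, and the file name here is hypothetical).

```python
import cv2

def keypoints_at_scale(image_path, scale=1.0):
    """Detect keypoints on a resampled copy of an image. Smaller
    scales process faster but localize keypoints less precisely
    and often change how many keypoints are found."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if scale != 1.0:
        img = cv2.resize(img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors

# Compare keypoint counts at full, half, and quarter resolution
for s in (1.0, 0.5, 0.25):
    kp, _ = keypoints_at_scale("reef_image.jpg", s)
    print(f"scale {s}: {len(kp)} keypoints")
```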
Throughout this process, an important issue to keep in mind is the computer processing power required to handle the large data volumes created by SfM missions. Our missions using the Reef Rover Version 1 were planned for time periods ranging from 25–40 min and collected images every 1.5 s. Each mission produced approximately 1000–2000 images, and processing these data became very time and resource consuming, depending on the processing power of the computer. For example, on a desktop machine with an Intel Core i7 3.4 GHz CPU, 64 GB of RAM, and a 4 GB graphics card with 1024 Compute Unified Device Architecture (CUDA) cores, a mission dataset with roughly 2000 images took several days of processing time. In comparison, processing the same number of aerial drone images usually takes 5–8 h on a similar machine. The primary reason the processing took much longer is that the images were not geotagged, so the processing software could not take advantage of their geolocation.
3.1.6. Version 1 SfM Model
A successful SfM model was created from a dataset of 1674 images acquired with the Sony a6000 on the Reef Rover Version 1 in the late afternoon (around 4 pm local time) at the West Bay, Grand Cayman site in January 2017. As previously mentioned, this mission was only partially automated, with a swimmer guiding the vehicle to image areas for supplemental coverage. However, all the images were acquired from the Reef Rover platform using the camera mounted in the waterproof case in the same position as under normal operation.
Figure 6 presents several renderings of the outputs centered over an Elkhorn Coral (Acropora palmata) colony approximately 1.5 m in diameter.
3.2. Results of Testing Version 2
Following the setup and parameter tuning of the redesigned Reef Rover Version 2, several mapping missions were conducted using the GoPro™ Hero 6 Black cameras in the early morning and late afternoon over a 3-day period at the West Bay, Grand Cayman site in November of 2017. Our objective was to test whether adding multiple cameras and upgrading the action camera (i.e., better sensor and faster shooting interval) would provide better quality and sufficiently overlapping images to successfully process the SfM models. Compared to the single Yi camera test, four GoPro cameras were mounted and spaced evenly 30 cm apart (Table 2b and Figure 3, upper left) to test whether additional cameras taking more simultaneous overlapping images would aid in the processing of SfM models.
The GoPro Hero 6 Black camera has a setting called “Protune”, which allows a user to set a few additional parameters such as white balance, ISO, and sharpness. We set the Protune “White Balance” setting to “Native” since this option yields a minimally processed data file. We experimented with a range of shutter and ISO settings on the GoPro camera over the duration of our tests. As with the tests of the Yi action camera, we were not able to produce any successful underwater SfM products with the GoPro images using any of the Pix4D processing techniques discussed in the previous section. Although the image quality of the GoPro did appear better than that of the Yi, it was still not as sharp as the images later captured using a Sony a6300 camera. The software simply did not find adequate keypoints, due to both the limited clarity of the images and insufficient image overlap. The shallowness of some of the reef features near the water surface also contributed to insufficient overlap and image blur.
Our test of the Reef Rover Version 2 in the Dominican Republic in May 2018 proved to be the most successful, applying all the knowledge we had gained from the previous tests. We executed several fully automated missions over a few different types of coral reefs off the northeastern shore of Catalina Island. The average depth at the site varied between 5–8 m, which was deeper than the study area we had tested on Grand Cayman, which averaged around 2–4 m. We found that the deeper water produced larger image footprints and therefore sufficient overlap for mapping, and missions could be similar to those depicted in Figure 4c, with tracks 1 to 1.5 m apart. At this location, a 30 × 30 m square area was mapped with the Reef Rover traveling at an average speed of 3 m/s in approximately 12 min, covering a total distance of approximately 2000 m. Even though the vehicle was set to cruise at 5 m/s, it slows upon approaching each turn waypoint and rarely reaches full cruising speed before arriving at the next waypoint. In practice, the vehicle would be under power for closer to 15–18 min in a 30 × 30 m scenario, considering launch and initial testing time plus travel from the base to the start of the mission. Such a mission would yield approximately 1100 images on each camera, or 2200 images total when using two cameras.
Over a larger 50 × 50 m square area, a similarly planned mission would take approximately 25 min, traveling at a slightly faster average speed. However, if deployment times start to exceed 30 min, the camera overheating problem may interrupt the mission. Consequently, we found that 30 min of mapping time represents the largest area that could be practically covered in a single non-interrupted mission using Version 2. A 30-min mission would produce approximately 2500 images for each camera, or 5000 images in total. When planning a mission, the operator should also consider environmental parameters such as ocean currents and wind that may increase mapping time.
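These timing and image count figures can be approximated with a simple estimator. The sketch below counts only the straight track legs of a square double grid pattern and assumes an effective capture rate of 2 frames/s (within the 100–170 images/min range reported above); it deliberately underestimates real missions, where turn slowdowns and transit from the launch point add several minutes.

```python
def mission_estimate(side_m, track_spacing_m, speed_ms=3.0,
                     frames_per_s=2.0, double_grid=True):
    """Rough duration (min) and per-camera image count for a square
    lawnmower survey. Counts straight track legs only, ignoring the
    deceleration at each turn waypoint and transit to the start."""
    n_tracks = int(side_m / track_spacing_m) + 1
    passes = 2 if double_grid else 1  # double grid = two crossed passes
    distance_m = passes * n_tracks * side_m
    duration_s = distance_m / speed_ms
    return duration_s / 60, int(duration_s * frames_per_s)

minutes, images = mission_estimate(30, 1.25)
print(f"~{minutes:.0f} min under power, ~{images} images per camera")
```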
At the Catalina Island test site, we exclusively utilized the a6300 cameras because of their faster shooting interval and superior image quality. Our first deployment in May 2018 used a single camera, but in August 2018 we added a second camera to increase image overlap. In each case, the cameras were set to acquire at 1/640 or 1/800 s shutter speed with an ISO of 400 and automatic aperture (i.e., shutter priority mode). Although the a6000 cameras have an automatic underwater setting that we used when testing them, the a6300 cameras lack this setting. Prior to acquiring images with the a6300 cameras, we manually held one camera underwater, simulating the conditions during actual data collection, and adjusted the white balance to a visually pleasing value using the color temperature setting. The images from the automated missions were geotagged based on the GPS log from the Pixhawk autopilot using a software program called Mission Planner.
During the May 2018 test at Catalina Island, a single a6300 camera was used to map an area of approximately 15 × 20 m that was largely dominated by Boulder Star Coral (Orbicella annularis). Several automated missions were completed over this area; however, the stronger ocean current at the site tended to push the vehicle off course, resulting in the need to revisit some of the areas immediately after the automated mission finished. The track of the vehicle can be observed in real time using the Tower app on the tablet, so any resulting gaps can later be covered manually. In general, it is easier for the vehicle to follow tracks that run parallel to the ocean current rather than perpendicular to it, where drift is more likely to occur. Mapping this area produced approximately 2000 images, which we used as input to Pix4D to generate a SfM model. The geotagged images initially failed to yield a SfM model despite several attempts to adjust the image scale during keypoint generation and the assumed accuracy of the initial camera positions. Pix4D was able to generate the orthomosaic shown in Figure 7 using only image collection time and image similarity to generate the keypoints.
On our return trip to Catalina Island in August 2018, we focused on imaging areas with more coral cover and rugosity. We imaged three areas that were roughly the same size as one another and slightly smaller than the area imaged in Figure 7 (May 2018). We decided to forgo geotagging the images until we could retrofit the system with a higher accuracy GPS unit. The mapped areas had depths similar to the previously imaged area (i.e., 5–8 m deep) and included an area dominated by Boulder Star Coral (Orbicella annularis) and Pillar Coral (Dendrogyra cylindrus), which can be seen in Figure 8. A video flythrough animation of the pillar coral 3D mesh for this location is available as Supplement #S2.
Recent storm damage was observed in the other two coral areas that were mapped. In Figure 9, a toppled Elkhorn Coral (Acropora palmata) can be seen, a result of the Category 1 Hurricane Beryl that had passed near the island a month prior. The average depth of these areas was slightly shallower (i.e., 3–6 m). The documentation of the damaged coral demonstrates the potential of the Reef Rover to monitor changes in coral reefs over time and to study the impacts of other hazards, such as anchor damage, disease, bleaching, or other effects of climate change.