1. Introduction
There is an increasing need to replace or supplement diver- and snorkeler-based marine field sampling with sampling based on autonomous platforms, such as Autonomous Surface Vessels (ASVs) or Autonomous Underwater Vehicles (AUVs). This is driven by the need to sample in deep areas (below dive-able depth, around 30 m [1,2]), to sample at night or in murky conditions, or to work in areas inhabited by potentially hazardous animals such as sharks or crocodiles [3,4].
For many autonomous platforms, visual imaging systems are the main form of information collection [5]. These include standard digital cameras, hyperspectral cameras, and three-dimensional (3D) and 360-degree rigs, using a range of lighting from ambient to wavelength-specific light sources (such as UV) [6]. In designing cameras for new platforms there is an implied assumption that these cameras can capture the same level of information as a diver or snorkeler can with their eyes, with the benefit that the camera produces a permanent record that can be analyzed by human or machine vision systems. This assumption needs to be tested by defining what a human observer can perceive and then benchmarking this against what a camera system can achieve. The camera system can be benchmarked on what a human can interpret from the captured images (machine capture + human interpretation) or against an automated image analysis system (machine capture + machine interpretation).
This study provides an initial comparison of the visual acuity of two snorkelers and a modern camera system as applied to the manta-tow sampling method, a common method for observing shallow-water coral reefs [7]. The manta-tow method involves an observer on snorkel being towed on the surface behind a small vessel and assessing the condition of the benthos or reef over a series of two-minute tows. At the end of each two minutes, a summary of the reef just seen is recorded on an underwater slate before the next tow is started. This method is quick and simple but requires at least three people (snorkeler, boat driver and safety observer) and does not create a permanent record of what is seen, apart from the summary record. Other coral reef and shallow-water visual assessment methods, such as line intercept transects [8] and fish visual surveys [9,10,11], are also amenable to being replaced by autonomous platforms using still or video imaging systems [2,6,7]. Note that this was an initial study using one site, at one time, with two observers; we hope to extend this work to other sites and water types as field opportunities allow.
2. Experimental Section
The manta-tow method uses a small floating wooden board, approximately 70 cm × 50 cm, which a snorkeler holds onto while being towed along the surface around the reef. The board holds a sheet of underwater paper for recording observations. For this study, an existing manta-tow board was re-purposed by attaching a Sony® A7r 36-megapixel full-frame camera (Tokyo, Japan) in a Nauticam® NA-A7 underwater housing (Hong Kong, China) with a Nikon UW-Nikkor® 20 mm f/2.8 lens (Tokyo, Japan) to the lower side of the board, giving a view of the bottom equivalent to what a snorkeler would see. The underwater camera housing was adapted to allow for external power and camera control. On the upper side of the board a series of sensors were mounted, including a sonar altimeter (Tritech® PA500, Aberdeen, Scotland), a roll/tilt sensor (LORD MicroStrain® 3DM-GX4-25, Williston, VT, USA) and a GPS unit, all controlled by a Raspberry Pi® microcomputer (Caldecote, UK) housed in a waterproof instrument pod.
Figure 1 shows the setup and the location of the main components.
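As an illustration of the logging arrangement, the sketch below shows how such an instrument pod could poll the altimeter and write GPS-timestamped records. The serial port, baud rate and message format here are assumptions made for the sketch, not specifications from this rig.

```python
# Minimal logging sketch for the instrument pod. Assumptions: the altimeter
# streams ASCII range strings (e.g. "4.32m") over a serial line at 9600 baud
# on /dev/ttyUSB0, and the Pi's system clock is disciplined to GPS time; the
# device path and message format are illustrative, not from the paper.
import csv
import time

import serial  # pyserial

PORT = "/dev/ttyUSB0"  # hypothetical serial port for the sonar altimeter
BAUD = 9600

with serial.Serial(PORT, BAUD, timeout=1.0) as alt, \
        open("manta_log.csv", "w", newline="") as f:
    log = csv.writer(f)
    log.writerow(["utc_time", "altitude_m"])
    while True:
        line = alt.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue  # read timed out with no data; try again
        try:
            altitude = float(line.rstrip("m"))  # "4.32m" -> 4.32
        except ValueError:
            continue  # skip malformed messages
        # The GPS-disciplined system clock provides the common timestamp
        log.writerow([time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                      altitude])
        f.flush()
        time.sleep(0.2)  # ~5 Hz polling
```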
The manta board rig allowed the collection of position data via GPS (using a small aerial that remained above the water during tows), distance to the bottom via the sonar altimeter, and roll/tilt/acceleration data, all synchronized by the GPS time signal. The camera was synchronized to the other data streams by photographing a second GPS unit and then correcting the camera's internal clock to the common GPS time. This allowed the precise location, altitude (equivalent to bottom depth), speed, direction and orientation of the camera to be recorded for each image taken. The snorkeler held the manta board rig just under the surface with the camera facing directly down.
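The clock correction itself is a simple offset calculation, sketched below; the timestamps are illustrative, with the GPS time read off the photographed display and the camera time taken from that same frame's EXIF data.

```python
# Sketch of the camera-to-GPS time correction. The date and times below are
# illustrative, not values from the study.
from datetime import datetime

# Time shown on the GPS unit in the reference photograph
gps_time = datetime(2015, 9, 14, 10, 2, 37)
# EXIF capture time recorded by the camera for that same photograph
camera_time = datetime(2015, 9, 14, 10, 1, 58)

offset = gps_time - camera_time  # camera clock runs 39 s behind GPS here

def to_gps_time(exif_time: datetime) -> datetime:
    """Convert a camera EXIF timestamp to the common GPS time base."""
    return exif_time + offset

# Any image timestamp can now be matched against the GPS/altimeter log
print(to_gps_time(datetime(2015, 9, 14, 10, 15, 30)))  # -> 10:16:09
```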
A number of visual targets were printed on A3 paper (297 × 420 mm), laminated and weighted down to stay on the bottom. Two 1 m steel rulers were used to provide distance measures. A small depth sensor (Schlumberger® CTD Diver, Houston, TX, USA) was attached to one of the targets to give an independent measure of depth. The visual targets included A3-sized print-outs of the standard Snellen eye chart [12], a color chart for white-balance adjustments, and print-outs with lines of varying size to check what level of detail could be resolved by the snorkelers and the camera. The visual targets are shown in Figure 2.
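For reference, standard Snellen geometry (a 6/6 optotype subtends 5 arcminutes at the viewing distance) indicates the letter sizes involved at the roughly 4 m viewing range used here. The sketch below applies the simple in-air geometry and ignores the magnifying effect of refraction through a flat mask or housing port.

```python
# Worked example of Snellen letter geometry (standard optometric convention:
# a 6/6 optotype subtends 5 arcminutes at the viewing distance). The 4.3 m
# range is the nominal board-to-target distance from this deployment; the
# in-water refraction through a flat port is ignored in this sketch.
import math

ARCMIN = math.radians(1 / 60)  # one arcminute in radians

def letter_height(distance_m: float, snellen_ratio: float = 1.0) -> float:
    """Height (mm) of the smallest Snellen letter readable at a given
    distance by an observer with acuity snellen_ratio (1.0 = 6/6)."""
    return distance_m * math.tan(5 * ARCMIN) / snellen_ratio * 1000

print(f"6/6 at 4.3 m:  {letter_height(4.3):.1f} mm")       # ~6.3 mm letters
print(f"6/12 at 4.3 m: {letter_height(4.3, 0.5):.1f} mm")  # ~12.5 mm letters
```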
The visual targets were positioned in approximately four-meter-deep water in a sandy area off Wistari Reef in the southern Great Barrier Reef, Australia. The two steel rulers were arranged in a "T" shape at the start of a flat sandy area, with the other visual targets arranged along a line over which the snorkelers were towed (see Figure 3). Conditions for the sampling were excellent, with clear skies, calm seas and very clear water. The area chosen was a flat sandy region, around 4–5 m deep, in a sheltered back-reef area. All runs were completed over a period of about 30 min, mid-morning, at the same location and under effectively the same conditions.
The snorkelers were towed at 2 knots over the targets (which they had not previously seen) three times: once to determine visual acuity using the Snellen chart, a second time to read off the ruler markings, and a third to assess resolving power using the left-most chart in Figure 2. Each observer was given only one pass per target to ensure that they had no prior knowledge of what they were seeing. For each observer run the camera was fired once a second along the tow to capture the same information. The best images were then used for the comparison; in particular, images with the targets at the center of the frame, where lenses typically perform best [13], were selected, although the uniformity of the sampling conditions meant that the images were all of similar quality (sharpness, detail and lighting).
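Frame selection for this study was done by eye. As a sketch of how the once-per-second frames could instead be triaged automatically, the snippet below ranks frames with a variance-of-Laplacian focus metric; this is a common sharpness measure, not the method used here, and the file pattern is hypothetical.

```python
# Sketch of automated frame triage using a variance-of-Laplacian sharpness
# score (a standard focus metric; higher variance means sharper fine detail).
# The directory and file pattern are hypothetical.
import glob

import cv2

def sharpness(path: str) -> float:
    """Return the variance of the Laplacian of the grayscale image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

frames = sorted(glob.glob("tow_run_01/*.jpg"), key=sharpness, reverse=True)
print("Sharpest frames:", frames[:5])
```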
The camera images were recorded as raw files (Sony® ARW files, Tokyo, Japan) and converted to Adobe® Digital Negative (DNG) files using the Adobe® DNG Converter (version 9.2, San Jose, CA, USA); from there they were imported into Adobe Lightroom® (version 4.4, San Jose, CA, USA) and processed in Adobe Photoshop® (version CS5.1, San Jose, CA, USA). For all of the field images presented here, the details are: Sony A7r camera, Nikon 20 mm f/2.8 UW-Nikkor lens, depth 4.3 m, exposure f/8.0, ISO 100, shutter 1/320 s, white balance 7900 K, camera speed 2 knots moving right to left.
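From these parameters a back-of-envelope ground footprint and per-pixel resolution can be estimated, as sketched below. The flat housing port is approximated as uniformly narrowing the in-water field of view by the refractive index of seawater (~1.34); this is an idealization for illustration, not a calibration performed in this study.

```python
# Back-of-envelope ground footprint and pixel resolution for the stated rig:
# 20 mm lens on a full-frame 36 x 24 mm, 7360 x 4912 px sensor (Sony A7r)
# at 4.3 m range. The flat-port refraction is approximated as a uniform
# magnification by the refractive index of seawater (~1.34).
SENSOR_W_MM, SENSOR_H_MM = 36.0, 24.0
PIXELS_W, PIXELS_H = 7360, 4912
FOCAL_MM = 20.0
RANGE_M = 4.3
N_WATER = 1.34  # effective magnification behind a flat port (assumed)

# In-air footprint via similar triangles, shrunk by the flat-port refraction
footprint_w = SENSOR_W_MM / FOCAL_MM * RANGE_M / N_WATER
footprint_h = SENSOR_H_MM / FOCAL_MM * RANGE_M / N_WATER
gsd_mm = footprint_w / PIXELS_W * 1000  # ground sample distance per pixel

print(f"Footprint: {footprint_w:.2f} m x {footprint_h:.2f} m")  # ~5.8 x 3.9 m
print(f"GSD: {gsd_mm:.2f} mm/pixel")                            # ~0.78 mm
```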
5. Conclusions
This short study showed that currently available cameras and lenses cannot match a human observer for shallow-water coral reef observations where the goal of the program is to identify particular targets (for example, a particular type of coral or an organism such as a crown-of-thorns starfish). In this scenario, trained observers automatically use the highest-resolving part of their eyes (actively looking by moving their head and eyes to scan a scene), giving them not only greater optical acuity but, through scanning, a greater effective field of view. This increased field of view and visual acuity, coupled with the ability of the human mind to recognize shapes and objects [21], puts the human observer ahead for this type of observing. Where the observing goal involves more passive observation, such as estimating the cover of benthic forms where the entire scene needs to be scanned and analyzed, cameras should be able to match what humans can do. The ability of a camera to create a permanent record is an important advantage, allowing change detection by comparing images taken at different times, as well as future analysis of particular targets or benthic types.
The implication for autonomous platforms is that, to match human capability, the imaging systems need to be optimized in terms of sensors, lenses, lighting, image formats, processing and deployment. There is a need to move beyond consumer-grade equipment and exploit advances in machine vision systems, with the explicit goal of producing images suitable for automated image analysis so as to accommodate future advances in this area. For studies looking to identify particular targets (such as pest species), cameras will need to be located closer to the subject, at about half the distance a human observer would need, with a corresponding reduction in field of view. The camera needs to record images in raw format to allow color correction and sharpening in post-processing and to ensure that compression artifacts, found in JPEG and other compressed formats, do not interfere with automated image analysis routines.
Positioning cameras closer to complex terrain, such as that found on a coral reef, requires more sophisticated navigation and collision-avoidance capabilities than are needed if the system can operate further from the target. Reducing the operating height also reduces the field of view, requiring more passes over an area to capture the entire scene. Autonomous platforms will therefore need to operate lower and longer, and these requirements need to be part of the design specifications for platforms under development.
The limited nature of this study (one site and only two observers) highlights the need to better understand human and imaging-system performance under a range of other conditions, especially in turbid water and low light. It also highlights that the type of observing (active target recognition or passive scene analysis) will change the relative strengths of human and camera-based observing. A full understanding of the environment and goals of the observing program is needed first, to inform the specifications of the imaging system deployed. There is a need to move away from generic consumer-based solutions to targeted systems using industrial vision equipment tailored to deliver the project outcomes.
A full understanding of the imaging requirements and their limitations is needed to develop platform mission profiles that will deliver the required outcomes. This in turn must link into the fundamental platform design to ensure the platform can support the operational imaging requirements. These linkages between platform design, mission planning and image outcomes need to be fully understood so that new platforms can support the operational capabilities of current imaging systems while delivering the required science outcomes.