1. Introduction
To minimize the transport and introduction of non-indigenous aquatic species by commercial ships, discharged ballast water must meet limits for organism concentrations set by the Ballast Water Management Convention (International Maritime Organization [IMO] [1]) or similar U.S. standards [2]. These limits are based upon numerical concentrations of organisms within defined size ranges. For organisms with minimum dimensions ≥50 µm, the discharged ballast water must have <10 individuals per m³. For organisms ≥10 and <50 µm, concentrations must be <10 individuals per mL. Additionally, discharged ballast water must meet limits on the concentrations of indicator microbes. To meet the discharge requirements, most ships will use a Ballast Water Management System (BWMS). Many of these BWMS employ oxidant-based treatment technologies, where biocidal chemicals (e.g., sodium hypochlorite, chlorine dioxide, and ozone) are introduced into ballast water as it is brought onboard [3]. However, ships using reactive oxidants to kill planktonic organisms must also comply with national or international limits for oxidants in discharges upon release of treated ballast water. The IMO and U.S. Environmental Protection Agency (EPA) have an established discharge limit of ≤0.1 mg L−1 of total residual oxidant (TRO) for BWMS using oxidants such as chlorine or ozone ([1] and [2], respectively) because it is well established that TRO > 0.1 mg L−1 can be toxic to aquatic life. Also, TRO reacts with natural organic matter in ballast water, leading to the formation of harmful disinfection by-products (DBPs), which are released during ballast water discharge [4,5,6].
Reliable and effective shipboard analytical instruments for quantifying TRO (hereafter, “TRO instruments”) in-line are critical for ensuring appropriate initial treatment dose and environmental protection when oxidant-treated ballast water is discharged. In addition to monitoring and logging TRO concentrations, TRO instruments may interface with the BWMS control systems, providing feedback to control dosing, oxidant production, and oxidant neutralization (e.g., sodium thiosulfate) before the release of ballast water. Given the importance of in-line TRO instruments for monitoring and control, it is necessary to determine the performance of these instruments relative to the standard and widely accepted colorimetric method using DPD (N,N-diethyl-p-phenylenediamine) for measuring TRO [7]. In the current testing, we used hand-held DPD-based colorimeters for reference TRO measurements. More broadly, it is necessary to understand the limitations of shipboard TRO instruments, given the broad range of temperatures, salinities, and water characteristics of surface waters used as ship ballast.
Here, we describe an initial laboratory evaluation of in-line TRO instruments designed for shipboard use in BWMS, which implements methods and concepts from existing independent testing frameworks. The study aimed to test multiple instruments concurrently in a pipe loop under tightly constrained conditions. Three different commercially available TRO instruments were installed in a piping system with flowing water to simulate the manner of use in shipboard environments. Two instruments were based on the previously described DPD method, while the third TRO instrument was based on amperometry, which determines TRO from the electrical current produced by oxidants; this current is proportional to the concentration of TRO. The test instruments were exposed to TRO concentrations within four ranges representing concentrations from untreated to treated water: <1 mg L−1, 2–3 mg L−1, 4–6 mg L−1, and 8–10 mg L−1. Individual instruments were only compared to the standard reference method (not to each other), and in this report, data from the instruments are anonymized. While evaluation results for specific TRO instruments are instructive, the focus of this work is the development and implementation of the test approach. Consequently, test methods are described in detail, because the methods established here will outlast the outcomes of any particular evaluation. As new models, alternative configurations, and firmware updates are released, a standard procedure for evaluation, such as described herein, will allow end-users (e.g., shipowners, regulatory authorities, and BWMS manufacturers) to ensure the TRO instruments meet their performance requirements.
  2. Methods
  2.1. Overview
Evaluations of the TRO instruments were designed to quantify accuracy and precision. The complete set of trials included:
- three, day-long trials measuring accuracy; 
- one partial-day trial measuring precision; 
- one day-long trial where sodium dichloroisocyanurate dihydrate (NaDCC; Sigma-Aldrich, Inc.; St. Louis, MO, USA) was used as an alternative to sodium hypochlorite (NaOCl; Sigma-Aldrich), and 
- one day-long trial using ambient water collected from the Patuxent River, a tributary to the Chesapeake Bay. 
Additional test characteristics are described in the sections ahead. Testing occurred at the Chesapeake Biological Laboratory (CBL), part of the University of Maryland Center for Environmental Science (UMCES), located in Solomons, Maryland. An indoor workspace housed all test structures, instruments, equipment, and analytical stations. Trials occurred in September–October of 2019. The time of year is mostly relevant for the ambient water trial, but it also affected workspace temperature, which (although partially temperature-controlled) was subject to afternoon heating and temperature fluctuations. The following sections describe these components, the test design, and the experimental logistics, including some minor details and design characteristics to support advance planning by laboratories implementing this protocol.
The assessment of TRO instrument reliability during the testing period included (a) the documentation of failures, faults, and other instrument-specific operational issues, and (b) a comparison of the expected total number of test instrument collected data points versus the actual number produced. While this is an accepted approach for the quantification of instrument reliability, the individual reliability results are not included in this publication.
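The data-completeness comparison in (b) reduces to a simple ratio. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not taken from the evaluation.

```python
def data_completeness(expected_points: int, actual_points: int) -> float:
    """Return the percentage of expected data points that were actually recorded."""
    if expected_points <= 0:
        raise ValueError("expected_points must be positive")
    return 100.0 * actual_points / expected_points


# Hypothetical example: 15 readings expected from one instrument, 12 recorded.
print(f"{data_completeness(15, 12):.1f}% complete")
```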
  2.2. Test Instruments
Each of three manufacturers provided three replicate TRO instruments (same make and model) for independent testing. All instruments were commercially available and in use for monitoring BWMS operations. Two of the instrument types were in-line automated DPD-based TRO analyzers, and the third was an amperometric TRO sensor. Each of the three manufacturers provided all associated operations manuals and expendable supplies (e.g., reagents, if needed), assisted in the training on instrument use, and advised on the appropriate instrument testing setup. Again, the focus of this effort was to design and implement a scientifically sound evaluation of in-line ballast water TRO instruments. Thus, instruments and resulting test data are anonymized.
  2.3. Instrument Racks
Figure 1 shows an overview of the workspace, which was designed to simultaneously evaluate three instruments, each at three temperatures. Racks for instruments, power supplies, piping, and waste handling were constructed specifically for this evaluation. Panels were created with wood framing lumber (~4 × 9 cm; 1.5″ × 3.5″) and plywood sheets. Instruments were mounted on the plywood sheets facing away from the tanks, but Figure 1 shows the tank-facing sides with guides and mounts for cabling, piping, compressed air lines, and drainage tubes. The two DPD-based instruments were mounted on the vertical plywood sheets. These instruments both produced a waste stream of DPD and test water, which was collected in 20 L containers (visible on the floor beneath the instrument racks, Figure 1). The third instrument measured TRO via an amperometric approach. This instrument was plumbed into a 2-inch (5 cm) diameter pipe on the tank recirculation loop, but its power supply and data logger were mounted on the instrument racks.
   2.4. Power Supply
The power demands of the chillers, recirculation pumps, air compressor, instruments, and cooling fans were significant, so an additional 100 Ampere (A) power supply was added to the workspace specifically for this testing. The main power source supplied several circuits, each with a dedicated circuit breaker. Power demands on each of the circuits were kept well below the rating of the individual circuit breaker, and the overall power usage was maintained safely below the maximum amperage. Power supplies for each of the three tanks and their platforms were routed over the workspace to keep the power cords off the floor and away from water. Power supplied to each station was connected to a station shutoff switch and a ground fault circuit interrupter. From this, power was distributed to a multiple-outlet, power strip with a surge protector and an on/off switch. One of the DPD-based TRO instruments required compressed air, and an air compressor (which was kept outside, but adjacent to the workspace) was powered on its own dedicated 20 A circuit.
  2.5. Test Water
To assure homogeneity of test water, we used Type II deionized (DI) water as a basis for all test water, except for the trial using ambient water from the Patuxent River. The planned experiments required large volumes of DI water, and such volumes would exceed the production capacity of laboratory-sized water purifiers. As a solution, we procured five “Deionized Water Totes” (Serv-a-Pure Co., Bay City, MI, USA), each with 220 gallons (~830 L) of DI water. DI water for rinsing and filling tanks was transferred from the totes to the tanks with a dedicated impeller pump and flexible tubing. These pumps and tubing were only used for unamended DI water to avoid contamination. Prior to filling, we rinsed tanks with DI water (20–50 L, typically), which was later collected with a wet-dry shop vacuum. After rinsing and emptying the rinse water, we filled the tanks with 340 L of DI water, as measured using the depth gauge described below.
  2.6. Tank Filling and Circulation
We used polymer-walled tanks as reservoirs for the flow loop, which supplied test water for the TRO instruments and the reference method. Accuracy trials required concurrent use of three tanks, each with a set temperature range. Figure 2A shows the three tanks and support systems used for simultaneous experimental manipulations. The top wall of the tank was completely cut away, allowing access for piping and instrumentation, and easing the process of filling and cleaning (Figure 2B). The tanks, measuring 79 × 86 × 55 cm (31″ × 34″ × 22″) in length, width, and height, respectively, held a maximum of 380 L (~100 gallons) of test water. Given the dimensions of the tanks, each cm of water height represented ~7 L. From this relationship, we marked a pole with gradations representing 20 L. For verification, we added exactly 20 L in a partially filled tank to check that the water depth increased by the distance marked by one gradation.
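The depth-gauge relationship follows from the tank footprint. The short sketch below reproduces the arithmetic, assuming an idealized rectangular tank with vertical walls; the function names are illustrative and not part of the test protocol.

```python
LENGTH_CM = 79.0  # tank length reported in the text
WIDTH_CM = 86.0   # tank width reported in the text


def litres_per_cm(length_cm: float, width_cm: float) -> float:
    """Volume (L) added per cm of water depth; 1 L = 1000 cm^3."""
    return length_cm * width_cm / 1000.0


def depth_for_volume(volume_l: float,
                     length_cm: float = LENGTH_CM,
                     width_cm: float = WIDTH_CM) -> float:
    """Water depth (cm) corresponding to a given volume (L)."""
    return volume_l / litres_per_cm(length_cm, width_cm)


print(round(litres_per_cm(LENGTH_CM, WIDTH_CM), 2))  # ~6.79 L per cm, i.e. ~7 L
print(round(depth_for_volume(20), 2))                # spacing of one 20 L gradation, cm
print(round(depth_for_volume(340), 1))               # depth at the 340 L fill volume, cm
```

Under these assumptions, one 20 L gradation spans just under 3 cm, and the 340 L fill corresponds to a depth of about 50 cm.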
Once filled, tanks recirculated water through a pipe loop, driven by a centrifugal pump (Hayward Power-Flo II, SP1750 0.5 HP; Hayward Industries, Charlotte, NC, USA), at a rate of 150 to 180 L min−1 (~40 to 50 gal. min−1). Two-inch (5 cm) diameter PVC pipes were used for the main trunk line, which connected to 0.75-inch (~2 cm) diameter PVC pipe tank influent and effluent lines, located on opposite sides of the tank, near the tanks’ bottoms. Given the flow rate, the residence time of water in the tank was 2–3 min, and the entire volume was quickly mixed and homogenized. We verified this using a dye tracer study conducted prior to actual testing. For this preliminary test, a tank was filled with municipal water. Approximately 5 mL of the fluorescent dye fluorescein isothiocyanate was added to the center surface of the tank while water was circulating. The dye was homogenized almost immediately (i.e., within ~20 s of addition). In addition to the flow loop, we positioned two small submersible pumps to increase circulation and limit the settling of particles on the tanks’ bottoms.
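The stated residence time can be checked as volume divided by flow rate. This is a minimal sketch using the fill volume (340 L), the maximum volume (380 L), and the reported flow range; it assumes a well-mixed tank and ignores the volume held in the piping.

```python
def residence_time_min(volume_l: float, flow_l_per_min: float) -> float:
    """Nominal residence time (min) of water in a well-mixed tank."""
    return volume_l / flow_l_per_min


for volume_l in (340, 380):
    for flow in (150, 180):
        minutes = residence_time_min(volume_l, flow)
        print(f"{volume_l} L at {flow} L/min: {minutes:.1f} min")
```

The resulting values (roughly 1.9 to 2.5 min) are consistent with the 2–3 min residence time given above.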
  2.7. Temperature Manipulation
For trials measuring accuracy, we set water temperatures to be cold (7–10 °C), moderate (15–18 °C), or warm (24–27 °C). Recirculating chillers (e.g., Figure 1) maintained set water temperatures throughout day-long or partial-day trials. Insulation helped reach and maintain set temperatures: Sides of each tank were covered with foam board insulation (2″ [~5 cm] thickness). Tubing used as part of the chiller and coil circuit was covered in foam insulation to minimize exchange with the atmosphere. Tops of the tanks with the exposed water surfaces were covered with sheets of flexible insulation, consisting of two layers of air-bubble plastic film draped with a 1 cm-thick vinyl foam mat. Additionally, laboratory windows were shaded to limit solar radiation and heating of the workspace. When not in active use, tanks were covered with air-bubble plastic film and covered by a heavy insulator sheet (see Figure 1). The chillers circulated DI water in a closed loop from the chiller reservoir through custom-manufactured, stainless steel, heat exchanger coils (visible in Figure 2A,B), then back to the chiller reservoir. The tank dedicated to water temperatures 7–10 °C contained two coils, one for each of two recirculating chillers. The other two tanks both used just one chiller and one coil each. Tanks were filled the evening before the tests to allow the water to equilibrate to its target temperature overnight. The number of chillers (and chiller capacity) and the insulation were sufficient to keep the water temperatures within the specified range during the day-long trials, even as outdoor, ambient temperatures increased throughout the day.
  2.8. Salinity and Other Water Characteristics
The evening before the tests, after filling the tanks with DI water, sea salts were added (conforming to ASTM D1141-98 [8]) to achieve salinities of either 0.2, 15, or 30 practical salinity units (PSU). Sea salts were pre-weighed in a carboy or another sealable container for mixing (Figure 2B). DI water from the tank was added to the container with the dry salts to between 25 and 50% of the container’s capacity. Then, the salts were suspended by mixing, and the slurry was added to the tank. Four rinses were sufficient to dissolve and mix the granules into the tank water. After mixing for 30 min, we measured the salinity using a hand-held refractometer to verify that it was within the expected range. Salinity was checked again the next morning before the start of the test.
On the morning of the test, we added lignosulfonic acid calcium salt (Sigma Aldrich, St. Louis, MO, USA; CAS: 8601-52-7) and micromate (micronized humates, Mesa Verde Humates, Huma, Inc., Gilbert, AZ, USA) as surrogates for natural dissolved and particulate organic matter (DOM and POM), respectively. The added materials targeted 6 and 4 mg L−1 of DOM and POM, respectively, which are minimum concentrations required for certification testing of BWMS (US EPA, 2010) [9].
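The quantities of salt and organic-matter surrogates scale directly with tank volume. The sketch below estimates the masses for a 340 L fill under two loudly stated assumptions that are not from the protocol: (1) 1 PSU is treated as roughly 1 g of sea salt per litre, ignoring density changes and the water content of the salt mix, and (2) each surrogate is assumed to count fully toward its target concentration. Actual additions were verified by measurement (refractometer for salinity), so these figures are starting estimates only.

```python
TANK_VOLUME_L = 340.0  # fill volume reported in the text


def additive_mass_g(target_mg_per_l: float, volume_l: float = TANK_VOLUME_L) -> float:
    """Mass (g) of an additive needed to reach a target concentration (mg/L)."""
    return target_mg_per_l * volume_l / 1000.0


def sea_salt_mass_g(target_salinity: float, volume_l: float = TANK_VOLUME_L) -> float:
    """Approximate sea-salt mass (g); ASSUMES 1 PSU ~ 1 g salt per litre."""
    return target_salinity * volume_l


print(f"DOM surrogate: {additive_mass_g(6):.2f} g")  # 6 mg/L target
print(f"POM surrogate: {additive_mass_g(4):.2f} g")  # 4 mg/L target
for salinity in (0.2, 15, 30):
    print(f"{salinity} PSU: ~{sea_salt_mass_g(salinity) / 1000:.2f} kg sea salt")
```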
Before starting the dosing-sampling experiment, pH was measured using a digital pH meter (Orion Star A214; Thermo Fisher Scientific, Waltham, MA, USA) that was calibrated each day using three standard buffer solutions (pH 4, 7, and 10). Based on TRO instrument specifications, a range of acceptable pH values from 6.8 to 7.6 was set. Tank water outside this range was treated with relatively small volumes (0.1 to 1.3 L) of dilute acid or base (0.1 M HCl or 0.1 M NaOH), and then pH was measured to verify it was within the target range (ASTM D1141-98).
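The acceptance ranges for temperature (Section 2.7) and pH can be encoded as a simple pre-dose check. This is a minimal sketch: the function and dictionary names are hypothetical, and salinity is omitted because no numerical tolerance is stated for it.

```python
TEMPERATURE_BANDS_C = {
    "cold": (7.0, 10.0),
    "moderate": (15.0, 18.0),
    "warm": (24.0, 27.0),
}
PH_RANGE = (6.8, 7.6)


def in_range(value: float, bounds: tuple) -> bool:
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def ready_to_dose(temperature_c: float, ph: float, band: str) -> bool:
    """True if both temperature and pH are inside their target ranges."""
    return in_range(temperature_c, TEMPERATURE_BANDS_C[band]) and in_range(ph, PH_RANGE)


print(ready_to_dose(8.5, 7.2, "cold"))  # both parameters in range
print(ready_to_dose(8.5, 7.9, "cold"))  # pH too high; adjust with dilute acid first
```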
  2.9. Sampling Method
Each tank’s pipe loop supplied all the TRO instruments, which were operating continuously throughout the experiments. The pipe loop also included a port that fed tank water through a semi-rigid plastic tube used for sampling (visible in Figure 2A–C images of tank sampling). For the reference method, we collected water from these tubes in pre-labeled, 250 mL amber glass bottles. The three tanks were sampled simultaneously. Bottles were first rinsed and emptied three times (rinse water was discarded, not returned to the source tank). After rinsing, the sample tube opening was kept submerged but near (~2 cm) the surface-water interface. The bottle was overfilled, capped, rinsed in deionized water (to rinse the overflow water), then dried. This procedure is typically used for analysis of dissolved gases, which may be lost at air-water interfaces (e.g., in bubbles or water surfaces, if turbulent). The sampling procedure minimized turbulence, bubble formation, and headspace in the bottles. Samples were immediately carried to one of three analysis stations, each dedicated to a single tank. For each sampling event, three replicate sample bottles were collected (paired with three instrument readings) at 5 min intervals.
  2.10. Preliminary Sampling and Dosing
After the addition of sea salts and other additives, we verified that the salinity, temperature, and pH were within acceptable ranges. Once all parameters were in range, we collected a preliminary “no dose” sample using the procedure described above. Following this, a series of “doses” to escalate TRO in the tanks over time was started. The “doses” were prepared the morning of the test using stock solutions of sodium hypochlorite for all but one trial. For this trial, we added NaDCC as the “alternative oxidant” as this compound is used in at least one BWMS [10]. Oxidant solutions were dispensed into amber glass bottles, which were labeled with the oxidant volume and the identification of the designated tank. Upon the designated start time, the solution was poured into the center of the tank, and the bottle was rinsed three times with water from the sample port (pouring the rinse water into the center of the tank). Once dosing was completed, tanks were re-covered with insulation and were periodically monitored to track TRO concentrations using the reference method for TRO (see below, Section 2.12).
The first dose was held the longest (~3 h) prior to sampling. This time was needed to allow initial oxidant reactions with organic matter to subside, and TRO to stabilize at the first dose level (“D1”, <1 mg L−1 TRO). Once tanks were in target TRO ranges, sampling for the first dose occurred. Following this, another dose was added to each tank, allowed to equilibrate at 2–3 mg L−1 (“D2”), and the second sample set was collected from all three tanks. Following this pattern, the final two doses were added to achieve concentrations 4–6 mg L−1 (“D3”) and then 8–10 mg L−1 (“D4”).
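A first estimate of each dose volume follows from a mass balance (C_stock × V_stock = ΔC × V_tank). The sketch below is not the dosing procedure used in the study: the stock concentration is a hypothetical value, the added stock volume is neglected relative to the tank volume, and oxidant demand from organic matter is ignored, which is why TRO still has to be confirmed with the reference method before sampling.

```python
def stock_volume_ml(current_mg_per_l: float,
                    target_mg_per_l: float,
                    tank_volume_l: float,
                    stock_mg_per_l: float) -> float:
    """Volume (mL) of oxidant stock needed to raise TRO from current to target.

    Mass balance only; ignores oxidant demand and the volume of stock added.
    """
    if target_mg_per_l < current_mg_per_l:
        raise ValueError("target must be at or above the current concentration")
    oxidant_mass_mg = (target_mg_per_l - current_mg_per_l) * tank_volume_l
    return 1000.0 * oxidant_mass_mg / stock_mg_per_l


# Hypothetical example: raise a 340 L tank from 0.5 to 2.5 mg/L (D1 to D2)
# using an ASSUMED stock of 10,000 mg/L oxidant.
print(f"{stock_volume_ml(0.5, 2.5, 340.0, 10_000.0):.0f} mL of stock")
```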
  2.11. Workflow
Trials occurred on six separate days: For the first three days, we examined one salinity level at three different temperatures, each in separate tanks. The remaining three trials each used a single tank per day (Table 1).
Nine people participated in the accuracy trials, as multiple temperature treatments were evaluated on a single day. Personnel conducted specific tasks at designated workstations to avoid crowding, cross-contamination, duplication of efforts, or accidental omission of tasks. Designated workspaces included three stations for TRO analyses (one dedicated to each tank), each staffed by an individual analyst. A designated analyst prepared doses and monitored TRO concentrations in the tanks using the reference method. At the time of sampling, three individuals simultaneously collected samples from the three tanks while another individual photographed the instruments’ displays with a digital camera to record the concurrent TRO measurements. One individual processed samples for water quality analysis (dissolved and particulate organic matter). Finally, one individual oversaw the entire process and assisted others as needed.
  2.12. TRO Measurements
Instruments under evaluation were set to continuously measure TRO and display readings. The accepted Reference Method for TRO measurements is the EPA-certified Standard Method for measuring Total Chlorine (equivalent to total oxidizing capacity of the sample expressed as TRO) using the DPD method (4500-C Chlorine, residual; APHA 2023). We used instruments, equipment, and reagents designed for rapid, manual measurements (Hach colorimeters, Hach Co., Loveland, CO, USA). The method used (low, mid, or high range) varied with the expected range of TRO (Table 2). In all cases, we used an electronic “colorimeter” to record measurements of TRO in mg L−1. The handheld colorimeters (the Hach DR300 and the Hach Pocket Colorimeter II) were checked daily with certified standards appropriate for low, medium, and high range measurements.
The instruments and method performance are described in the documents in Table 2. In general, precision (95% confidence intervals) was within 10% of measured values. For low ranges, the sensitivity of the instrument was 0.02 mg L−1 TRO. Procedures for checking calibration and analyzing samples followed these protocols, which are described briefly below:
- Daily, before sample analysis, we measured a blank (DI water) and three gel standards with concentrations along the detection range of each instrument (Hach, Hach Co.; Loveland, CO, USA). 
- For analysis, a specified amount of sampled water (10 mL for Low and Mid-Ranges; 5 mL for High Range) was transferred from the center of the bottle to a sample cell using volumetric pipettes with disposable tips. 
- Packets of pre-measured, dry powder pillows of DPD reagent for Total Chlorine (Hach # 2105669) were opened, then added to the vial. 
- After mixing for 20 s, the solution sat for at least three minutes (but no more than six minutes); all times were verified with a digital clock with both a stopwatch and a countdown timer. 
- Then, analysts placed the sample cell in the instrument and collected the reading (reported in mg L−1). 
We used two different instruments to optically measure the quantity of the indicator color: the Pocket Colorimeter II for low-range TRO concentrations and the DR300 Pocket Colorimeter for medium and high TRO concentrations. This procedure was repeated two additional times, so that each sample had three discrete readings. We used the average of the three readings as the overall reading of the sample bottle, and the variation among replicate readings was used to indicate data quality.
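The replicate handling described above can be sketched as follows. The mean is the reported value for the bottle, as stated; expressing the spread as a standard deviation and coefficient of variation is an assumption for illustration, since the text does not specify which measure of variation was used. The example readings are hypothetical.

```python
from statistics import mean, stdev


def summarize_replicates(readings):
    """Return (mean, sample standard deviation, CV in percent) of replicate readings."""
    avg = mean(readings)
    sd = stdev(readings)
    cv_percent = 100.0 * sd / avg if avg else float("nan")
    return avg, sd, cv_percent


# Hypothetical triplicate DPD readings (mg/L) from one sample bottle.
avg, sd, cv = summarize_replicates([2.41, 2.38, 2.45])
print(f"mean = {avg:.2f} mg/L, SD = {sd:.3f} mg/L, CV = {cv:.1f}%")
```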
  2.13. Water Quality Monitoring
Temperature, conductivity, and salinity were measured using a multiparameter probe (YSI Pro DSS; YSI, Inc., Yellow Springs, OH, USA). Readings were collected from each tank at the start of the day and at the time of each sampling event (rinsing the probe between tanks). Ancillary measurements were performed to monitor water characteristics over the course of the experiments; short method descriptions and references are shown in Table 2. Dissolved and total organic carbon (DOC and TOC, respectively) and total suspended solids (TSS) concentrations were measured by collecting 4 L grab samples from each tank at the start and end of the trial day. Briefly, we measured TOC and DOC (following filtration) via high-temperature combustion and infrared detection (SM5310 [11]). We used gravimetry to measure TSS (EPA-NREL 160.2 [12]). Details of both methods are available through the laboratory performing the analyses (http://www.umces.edu/nutrient-analytical-services-laboratory, last accessed on 9 May 2025).
  2.14. Test Quality Control
All test activities were pre-planned, documented, and distributed to all test personnel before initiating the evaluation. The representatives from each TRO instrument manufacturer reviewed and then approved the test plan, as did a technical advisory committee and all personnel involved in testing. An independent, technical systems auditor also reviewed the test plan and all documentation produced for these experiments. The auditor observed several days of testing to verify that the tests adhered to documented plans and procedures. Several analyses and operations were defined in standard operating procedures, and personnel were trained in these procedures, as required. Each of the TRO instrument vendors sent representatives to the test site, and during their visit, the representatives verified that the instruments were operating and integrated into the test platform. Vendors provided a brief, in-person training and demonstrations to the test personnel to ensure that the instruments were maintained and used as designed.
To standardize sampling, we produced a “Work Instruction” that described the sampling process in detail, such that those involved in collecting water samples or instrument readings did so in the same manner as others. Prior to the actual tests, the test team practiced the procedures in several preliminary experiments (“mock trials”). For these mock trials, we conducted tests following the protocol for measuring accuracy but used municipal water rather than DI water. Data from these trials were not used except to verify that TRO measurements from multiple analysts agreed and that variation among repeated measurements fell within acceptable limits. We considered the completeness of data collected (e.g., the frequency of missing measurements) as an indicator of data quality.
Missing data occurred only when one instrument did not appear to be responsive to changing TRO concentrations. In this case, we performed troubleshooting following the manufacturer’s recommendations. Changing the reagents solved the issue. Nevertheless, some trial data were not available for this instrument.
  2.15. Data Analyses
For all trials and treatments measuring accuracy, linear regression was used to compare sets of measurements from the instruments and the reference method. Calculations were performed using statistical software (SigmaPlot V12.5; San Jose, CA, USA), which used an iterative, least-squares approach to determine the optimal slope and intercept values for the data. The equation
$$\mathrm{TRO}_{\mathrm{instrument}} = m \cdot \mathrm{TRO}_{\mathrm{reference}} + y\text{-int}$$
was not forced through the origin, where both the TRO instrument and the reference method would measure 0 mg L−1 TRO. Regression results yielded both the slope (m) and the intercept (y-intercept, or y-int) of a line-of-best-fit as well as the standard error of each. A perfect linear relationship would yield m = 1, so a one-sample t-test (two-tailed; α = 0.05) was used to determine whether the calculated m differed significantly from the hypothesized value, where the null (H0) and alternative (HA) hypotheses are defined below:
$$H_0: m = 1 \qquad H_A: m \neq 1$$
The y-int indicates the offset between the observed relationship and a predicted relationship where the regression line passes through the origin (0 mg L−1). Likewise, a one-sample t-test using the y-intercept and its standard error evaluated the hypotheses:
$$H_0: y\text{-int} = 0 \qquad H_A: y\text{-int} \neq 0$$
We calculated the magnitude of the offset in cases where the y-int differed significantly from zero. The relative importance of the offset is best observed when expected concentrations are 0 mg L−1, as measured in TRO-free controls. For example, when the y-intercept is positive and significantly greater than 0 mg L−1, the TRO instrument predicts TRO is detectable in cases where TRO is not present (a false positive). Additional analyses included analysis of variance (ANOVA), which we used to compare whether temperature or salinity significantly affected the slope of the regression equations.
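The regression and t-tests described in this section can be sketched in a few lines. This is not the SigmaPlot procedure used in the study: it substitutes the closed-form ordinary least-squares solution, uses hypothetical data, and compares the t statistics against a tabulated critical value (3.182 for 3 degrees of freedom, two-tailed, α = 0.05) rather than computing p-values.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = m*x + b with standard errors and R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    se_m = math.sqrt(s2 / sxx)
    se_b = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return {"m": m, "b": b, "se_m": se_m, "se_b": se_b,
            "r2": 1.0 - sse / sst, "df": n - 2}


def t_statistic(estimate, hypothesized, standard_error):
    """One-sample t statistic for a regression coefficient."""
    return (estimate - hypothesized) / standard_error


# Hypothetical paired readings (mg/L): reference method (x) vs. TRO instrument (y).
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
instrument = [0.1, 0.9, 1.7, 2.3, 3.0]
fit = linear_fit(reference, instrument)

T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, df = 3
t_slope = t_statistic(fit["m"], 1.0, fit["se_m"])      # H0: m = 1
t_intercept = t_statistic(fit["b"], 0.0, fit["se_b"])  # H0: y-int = 0
print(f"m = {fit['m']:.3f}, y-int = {fit['b']:.3f}, R^2 = {fit['r2']:.4f}")
print("slope differs from 1:", abs(t_slope) > T_CRITICAL)
print("intercept differs from 0:", abs(t_intercept) > T_CRITICAL)
```

With these made-up values, the slope (0.72) differs significantly from 1 while the intercept (0.16) does not differ significantly from zero, which mirrors the kind of outcome discussed in the results.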
As part of our plan to assure test quality, we collected samples to measure the total suspended solids (TSS), organic carbon, both dissolved (DOC) and total (TOC), and pH. We also kept water quality sondes in the tanks throughout the day of testing.
  3. Results and Discussion
  3.1. Accuracy
Details of the test results, including results from specific vendors, are found elsewhere (Alliance for Coastal Technologies [ACT], Technology Evaluations, https://www.act-us.info/evaluations.php; last accessed 1 April 2025). These include ancillary data not addressed here. As noted above, we omit the names of the instruments and avoid comprehensive comparisons among the TRO sensors. Rather, our comparisons focus on the metrics for comparing a TRO sensor to a standard reference method. Like our description of this study’s methods, we include justifications for our analysis of results with the aim of aiding other test laboratories that adopt and implement these methods. For accuracy trials, DPD values within the method detection limits ranged from (in mg L−1): 0.01 to 0.07 (No Dose); 0.14 to 0.53 (Dose 1); 1.42 to 3.39 (Dose 2); 3.7 to 6.7 (Dose 3); and 7.5 to 9.9 (Dose 4).
Figure 3 shows results from one instrument for accuracy trials performed at various temperatures and salinities. The solid lines on the plots show the lines of best fit resulting from a linear regression analysis. For quantifying accuracy, linear regression analyses yield several important metrics:
- The coefficient of determination (R²) indicates the strength of the relationship between measurements from the TRO sensor under test and the reference method. The line fit improves as R² approaches 1.0, the perfect fit of a linear relationship to the data. 
- The slope (m) of the line-of-best-fit indicates proportional changes in the measurements of the reference method (the independent variable) to the TRO sensor (the dependent variable). Slopes indicate whether the TRO sensor will overestimate (m > 1) or underestimate (m < 1) the readings from the reference method, and the deviation increases as the TRO concentration increases. 
- The y-intercept of the line-of-best-fit indicates the offset, or the value reported from the TRO sensor when the reference method measures 0 mg L−1 TRO. 
Overall, linear regression analysis yielded R² values > 0.98. However, in one case, an instrument’s readings appeared to be unresponsive to changes in TRO concentrations. Following the troubleshooting guidance in the sensor’s manual, we determined that the reagent supply for that instrument was contaminated or degraded. The instrument, when resupplied with new reagents, returned to measuring values in line with the known dosage, with the other units of the same model, and with the other TRO sensors.
Table 3 reports the detailed results from linear regression analyses for one TRO sensor (Instrument #3). Slopes of the lines-of-best-fit ranged from 0.717 to 0.962 for this instrument, and slopes all differed significantly from 1.0. Restated, we rejected the null hypothesis (H0: m = 1), as slopes were significantly different from and lower than 1.0 (t-test, α = 0.05). The y-intercepts ranged from −0.194 to 0.005, and in several cases, y-intercepts were not significantly different from zero (Table 3, highlighted values).
For context, analytical devices rely upon linear relationships between known (standard) quantities and the quantity measured by the sensor. The sensors quantize the measurement, whether optical (e.g., fluorescence or absorbance at a peak wavelength) or electrical (e.g., voltage, resistance), on a scale representing the range of potential values. For example, a 12-bit digital scale (2^12) will have 4096 discrete values, so any measurement is assigned to one of the discrete values on this scale. The sensor’s calibration sets the relationship between the concentrations of reference samples and the digital value associated with its measurement. In this context:
- Strong, linear relationships between the TRO sensor and reference method indicate that the sensor is responsive to changes in TRO concentrations and consistent when quantizing similar quantities of TRO. 
- The slopes’ differences from a 1:1 relationship with the reference method may also reflect the conditions used to set the instrument’s calibration (e.g., the reference standards used as known values). 
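The 12-bit example above can be made concrete with a short sketch of uniform quantization. The 10 mg L−1 full-scale range and the function name are assumptions chosen for illustration; actual instruments digitize an optical or electrical signal rather than the concentration itself, and their ranges and bit depths are not reported here.

```python
def quantize(value: float, full_scale: float, bits: int = 12):
    """Map a value in [0, full_scale] to the nearest of 2**bits discrete codes.

    Returns (code, reconstructed_value). Values outside the range are clamped.
    """
    levels = 2 ** bits                # 4096 levels for 12 bits
    step = full_scale / (levels - 1)  # spacing between adjacent codes
    code = round(value / step)
    code = max(0, min(levels - 1, code))
    return code, code * step


FULL_SCALE = 10.0  # ASSUMED measurement range, mg/L
print(2 ** 12)                              # number of discrete values
print(f"{FULL_SCALE / (2 ** 12 - 1):.4f}")  # resolution per code, mg/L
print(quantize(0.10, FULL_SCALE))           # a reading near the discharge limit
```

Under this assumed range, one code corresponds to roughly 0.0024 mg L−1, so the digital resolution itself would be fine relative to the 0.1 mg L−1 discharge limit; calibration and drift are the more likely sources of the deviations discussed here.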
Many analytical sensors require frequent calibrations to set or verify the relationship used to report meaningful, rather than relative, units. A sensor’s calibration may require multiple standards (e.g., a pH meter that requires solutions with pH values of 4, 7, and 10) or a single, known standard to verify its reported reading is within an established tolerance. Frequent calibrations can adjust for “sensor drift”, a term that encompasses both degradation of the physical materials within the sensor and the environmental conditions that (even if slightly) bias the measurement or quantization. For example, changes in LED light intensity (whether due to LED output fluctuations, subtle variations in electronics, or turbidity of the sample water) may impact the accuracy of TRO sensors [13].
Given our reference method best captured the true concentrations of TRO in sample waters, slopes < 1 indicate that the test instruments underestimate concentrations, and the magnitude of the underestimate increases as TRO concentrations increase. For these instruments, dose thresholds are likely exceeded (e.g., if a target dose is 8 mg L−1 TRO, actual doses may be ~10 mg L−1 or higher, depending on the calculated slope). At low concentrations (<2 mg L−1 TRO), the gap between measured and actual dose for neutralized discharge narrows, but these small differences between the true and measured concentrations are critical for determining compliance with discharge limits for TRO.
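The effect of a slope below 1 can be sketched by inverting the line of best fit. The example below assumes the relationship reading = slope × reference + intercept and uses the slope (0.809) and y-intercept (−0.194) quoted for one fit in this section; the function name is hypothetical, and the calculation is an illustration rather than a correction procedure used in this study.

```python
def estimate_reference(reading, slope, intercept):
    """Invert reading = slope * reference + intercept."""
    return (reading - intercept) / slope


# Slope and y-intercept quoted in the text for one line of best fit.
slope, intercept = 0.809, -0.194

target_dose = 8.0  # mg/L TRO as reported by the instrument
actual = estimate_reference(target_dose, slope, intercept)
print(f"Instrument reads {target_dose} mg/L; "
      f"estimated actual dose is {actual:.1f} mg/L")
```

With these coefficients, an instrument reading of 8 mg L−1 corresponds to roughly 10 mg L−1 by the reference method, consistent with the example given above.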
At low concentrations, the y-intercept of the line of best fit gives the value reported by the sensor when the reference method reports “zero”, indicating concentrations are below the method detection limit. For the highlighted cells in Table 3, y-intercepts were indistinguishable from zero, but several were significantly below zero. For these, the offset is the solution to the equation for the line of best fit. For a line with y-intercept of −0.194 and a slope of 0.809, a zero value for the reference method (TRO = 0 mg L−1) would yield a measurement of 0.23 mg L−1 TRO.
A comprehensive survey of TRO discharges from ships’ oxidant-treated ballast found that 10% of ballast discharges exceeded the 0.1 mg L−1 limit for TRO [14]. This survey, conducted during shipboard BWMS commissioning tests, observed a general decrease in exceedances over time (from a high of 50%). Ship owners undertake commissioning tests early in BWMS service lives, so the initial tests of shipboard TRO instruments can act as a reference point to track long-term performance.
Figure 4 displays the ranges of slopes and y-intercepts for the series of trials for all three TRO instruments. In this evaluation, we aimed only to verify that the sensors could operate at all three salinities. In all cases (except where trial data were nullified due to problematic reagents), the instruments yielded significant linear responses to changes in TRO concentrations in all water types. Taken together, the TRO instruments showed no consistent, observable trends in performance based upon temperature or salinity (two-way ANOVA). However, because this study lacked replicate trials, it could not resolve subtle differences in performance for any single instrument.
3.2. Precision
The coefficient of variation (CV) is the relative magnitude of the variation in a set of values normalized to the central value, or the standard deviation divided by the average of the measurements. Here, we used CV as a metric of measurement precision. Figure 5 shows the readings from all three sensors and the reference method. The mean (±standard deviation) of 12 measurements from the reference method was 3.45 (±0.09) mg L−1; its CV was 2.58%. Consistent with the results of the accuracy trials, the sensors generally reported smaller values for TRO, and CVs ranged from 2 to 4%.
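The CV calculation defined above can be sketched in a few lines. The readings below are hypothetical values chosen for illustration, not the measurements from this study, and the sketch assumes the sample (n − 1) standard deviation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate TRO readings (mg/L), for illustration only.
readings = [3.4, 3.5, 3.6]
cv = coefficient_of_variation(readings)
print(f"mean = {mean(readings):.2f} mg/L, CV = {cv:.2f}%")
```

Applied to the rounded summary statistics reported above (0.09 divided by 3.45), the same ratio gives approximately 2.6%, in line with the reported CV.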
3.3. Value of Formalized Tests of Shipboard TRO Instruments
Monitoring ballast water on ships presents two distinct challenges. First, the physical, chemical, and biological properties of ballast water represent the ranges of conditions of navigable surface waters, and TRO instruments must operate along this spectrum of conditions. Second, shipboard instrumentation must work reliably in challenging environments: instruments experience continuous vibrations from the engine and atmospheres with high temperatures, humidity, and (potentially) volatile organic compounds. Several studies examined the performance of TRO instruments in laboratory or shipboard environments (e.g., [15,16,17]). These few studies are not proportional to the magnitude of the potential impacts of ships that use oxidants to treat ballast water. The U.S. alone receives ballast water from more than 10,000 unique ships each year, with combined annual discharges of hundreds of millions of cubic meters [18]. Though oxidant-based BWMS represent only a portion of the volume treated, the risks of undertreatment or of exceeding discharge limits for TRO remain consequential. A survey of ballast water discharges over two years found the frequency of exceedances decreased over time, but ~10% of the discharges still exceeded the maximum allowable discharge concentration of 0.1 mg L−1 TRO [14]. Formalized tests of TRO instruments, both laboratory and shipboard, will help ensure that both treatment doses and discharge limits are met. Testing near the discharge limit is especially important for adjusting the delivery of TRO neutralizers (e.g., sodium thiosulfate). The measurement of TRO near the 0.1 mg L−1 discharge limit is challenging, so future tests should aim to resolve values near the limits of detection. This report aims to add to preceding studies and provide a template for formalized, laboratory-based evaluations.
3.4. Recommendations for Future Evaluations
The trials described herein aimed to verify the basic performance of TRO sensors in simplified test waters, with temperatures and salinities, and other potential interferences (e.g., dissolved or particulate matter) tightly controlled. When installed on ships, TRO instruments will experience wide ranges of water characteristics that influence performance (e.g., [17]). Additional tests and considerations for performance fell outside the scope of this study. Yet, these characteristics warrant examination. Characteristics include:
- These trials evaluated TRO instruments provided directly from factory certification, but TRO sensors are integrated into BWMS and are likely only serviced infrequently (or following operational problems). Stability over months (or years) will set the instruments’ lifespans (e.g., [19]) or service and maintenance intervals. A test of long-term stability could determine sensor drift or reagent shelf life over various storage and usage conditions.
- Related to stability, reliability reports on the proportion of the time (or the percentage of measurements) where the instrument performed within requirements versus the total time or measurement count. Reliability is reduced when the instrument requires unexpected maintenance or fails to meet its indicators of data quality.
- Shipboard TRO sensors face (potentially) high temperatures and vibration from the ship’s engine, but they are also in continuous contact with natural ambient water, and surfaces accumulate fouling from both microorganisms and organic matter and minerals. For amperometric-based TRO sensors, this includes potential fouling of the pH sensor that delivers the pH measurements required for accurate TRO measurements. Additionally, TRO instruments using reagents (e.g., DPD) will likely have stocks stored on the ship and (depending on the storage conditions and powder or liquid form) may degrade at accelerated rates.
- While temperature and salinity affect chemical reactions and electric measurements, other factors will also interfere with the TRO instruments. For example, highly turbid waters may decrease light transmission, reducing the quantity of light through optical cells. Organic matter and proportion of salt ions can interfere with measurements of electrical current. BWMS operate throughout the world’s ports, so these TRO instruments will encounter a wide range of dissolved and suspended materials.
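Reliability, as described in the list above, reduces to a simple proportion. The sketch below counts the measurements that fall within a tolerance of a paired reference value; the tolerance, the paired readings, and the function names are hypothetical and serve only to show the calculation.

```python
def within_tolerance(reading, reference, tolerance):
    """True if the reading is within +/- tolerance of the reference."""
    return abs(reading - reference) <= tolerance


def reliability(in_spec_flags):
    """Percentage of measurements that met requirements."""
    if not in_spec_flags:
        raise ValueError("no measurements supplied")
    return 100.0 * sum(in_spec_flags) / len(in_spec_flags)


# Hypothetical paired readings (mg/L TRO) and an assumed tolerance.
readings = [0.08, 0.11, 0.09, 0.25, 0.10]
references = [0.09, 0.10, 0.10, 0.10, 0.10]
flags = [within_tolerance(r, ref, tolerance=0.05)
         for r, ref in zip(readings, references)]

pct = reliability(flags)
print(f"{sum(flags)} of {len(flags)} readings in tolerance: {pct:.0f}%")
```

The same proportion could be computed over operating time rather than measurement count, depending on which definition a test plan adopts.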
For BWMS using oxidants to treat ballast water, TRO measurements verify the oxidants meet treatment target doses (i.e., >12 mg L−1; [20]) as well as discharge limits (0.1 mg L−1). TRO instruments, therefore, must demonstrate their performance to maintain confidence in their estimates over their service lives. Infrequent, ad hoc, or informal calibration checks are likely insufficient without additional safeguards or data quality indicators. BWMS using oxidants may employ redundant sensors or logical checks within their control systems. For example, is the TRO concentration responsive to increases in chlorine or ozone generation? Does the relative rate of oxidant injection produce similar TRO concentrations (for comparable water temperatures and salinities)? Shipboard testing, as well as frequent, routine monitoring, will identify problems in TRO monitoring unique to the tasks of shipboard BWMS (versus municipal water treatment, industrial settings with stable or well-characterized waters, etc.). Once known, the issues can be addressed. For example, Lee et al. [13] recently developed an algorithm to account for variations in a TRO sensor’s electronics and the turbidity of sample water. This and other innovations will improve BWMS performance, and, in turn, help safeguard aquatic environments from ballast-borne bioinvasions and TRO exceedances in discharged water.
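One of the logical checks mentioned above, whether TRO readings respond to increases in oxidant generation, could be sketched as follows. This is a hypothetical illustration, not a description of any BWMS control system: the function name, the data, and the threshold are assumptions.

```python
def unresponsive_steps(generation, tro, min_rise=0.0):
    """Indices where generation increased but TRO rose by <= min_rise."""
    flagged = []
    for i in range(1, min(len(generation), len(tro))):
        generation_rose = generation[i] > generation[i - 1]
        tro_rise = tro[i] - tro[i - 1]
        if generation_rose and tro_rise <= min_rise:
            flagged.append(i)
    return flagged


# Hypothetical log: oxidant generation (arbitrary units) and TRO (mg/L).
generation = [10, 20, 30, 40, 50]
tro = [2.0, 4.1, 6.0, 6.0, 9.8]

flagged = unresponsive_steps(generation, tro)
print(f"Steps with no TRO response: {flagged}")
```

A flagged step does not by itself identify a faulty sensor, since changes in oxidant demand could produce the same pattern, but it marks readings that merit a calibration check.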