Use of Multiple Temperature Logger Models Can Alter Conclusions

Remote temperature loggers are often used to measure water temperatures for ecological studies and by regulatory agencies to determine whether water quality standards are being maintained. Equipment specifications are often given a cursory review in the methods; however, the effect of temperature logger model is rarely addressed in the discussion. In a laboratory environment, we compared measurements from three models of temperature loggers at 5 to 40 °C to better understand the utility of these devices. Mean water temperatures recorded by logger models differed statistically even for those with similar accuracy specifications, but were still within manufacturer accuracy specifications. The maximum mean temperature difference between models was 0.4 °C, which could have regulatory and ecological implications, such as when a 0.3 °C temperature change triggers a water quality violation or increases species mortality rates. Additionally, precision should be reported as the overall precision (including a consideration of significant digits) for combined model types, which in our experiment was 0.7 °C, not the ≤0.4 °C for individual models. Our results affirm that analyzing data collected by different logger models can result in potentially erroneous conclusions when a <1 °C difference has regulatory compliance or ecological implications and that combining data from multiple logger models can reduce the overall precision of results.


Introduction
Water temperature influences aquatic ecosystem function from the cellular level through landscape-scale processes [1][2][3][4]. Due to its importance for aquatic organisms and ease of measurement, water temperature is used as a measure of habitat impairment for regulatory monitoring [5,6] and often is included in ecological studies of aquatic systems and their biota [7,8]. Imprecise measurement of changes in water temperature due to land use changes (e.g., logging) or other anthropogenic activities such as effluent discharge could lead to regulatory compliance problems. Additionally, imprecise measurements could hinder our understanding of a species' thermal requirements and responses to temperature changes which could result in deleterious consequences in fields such as aquaculture, dam management, and species conservation. Recent attention on regional-scale stream temperature modeling and its application in predicting response of water temperature to projected changes in climate further highlights the value of water temperature data [7][8][9][10][11].
Digital data loggers are now commonplace in stream monitoring, and researchers must consider numerous potential sources of error when applying data from these loggers to address regulatory compliance or ecological questions [12,13], including the number and spatial arrangement of loggers [14,15], physical placement [16], logger capabilities and limitations, the questions the data will be used to address, and statistical application of the acquired data. Methodological considerations become especially important when temperature data sets from multiple sources are compiled across space and time to address a singular objective (see Terando et al. [17] for air temperature biases associated with solar shields). Often these datasets contain data from different logger models, which may have varying accuracy levels and responsiveness and may have been deployed in the field using different approaches (e.g., positioned in shallow, fast-flowing versus deeper, slower-flowing waters) [10,18-20]. Lack of standardization may produce unusable data or conclusions based on measurements that are biased or lack adequate precision and accuracy to address a given research question or regulatory compliance monitoring requirement.
This study examined whether inaccurate conclusions might be made using temperatures recorded from different logger models and to what extent using different logger models would alter the overall precision (i.e., proximity of measurements to each other [21]) and accuracy (i.e., proximity of measurements to the actual value [21]) of water temperature data recorded. We are aware of no published work that compares the measurements of multiple temperature logger models in water. Our objectives were to determine: 1) if mean recorded temperatures differed among logger models, including those with similar manufacturer specifications, and 2) whether overall accuracy and precision of each model differed from manufacturer's specifications. We used those results to discuss limitations of using datasets compiled from multiple logger models and recommend approaches that address variation among temperature loggers when reporting and analyzing temperature data.

Materials and Methods
We selected three digital temperature data logger models offered by a single company, which we found through a literature review to be frequently used in freshwater systems (Onset Computer Corporation, Bourne, MA, USA: Pendant, Pro v2, and TidbiT v2; Table 1). By using equipment from the same manufacturer, we were able to directly compare models tested with the same specification protocols because these protocols are not standardized among manufacturers. We examined accuracy and precision by submerging nine loggers of each model simultaneously in an isothermal water bath held in a 38-L glass aquarium, where the loggers remained submerged and free to circulate for the duration of the experiment. Loggers recorded temperature at one-minute intervals and were held for 10 minutes at each of eight target temperatures (5, 10, 15, 20, 25, 30, 35, and 40 °C). The 10-minute period was initiated after the reference thermometer had registered the target temperature for a period of several minutes. Manufacturer specifications for all three logger models stated that loggers would achieve 90% of the true temperature within five minutes. Because the loggers remained in the water bath as the temperature was adjusted, the loggers would have been registering the increasing temperature rather than being exposed to an abrupt change. Only observations from the last five minutes at each target temperature were used for analysis. We refer to this five-minute time frame as the test period. Temperature was also recorded at each one-minute interval from a reference thermometer of higher resolution and accuracy than the logger models (Thermoworks Model #222-555; National Institute for Standards and Technology traceable certified accuracy ±0.05 °C, resolution 0.01 °C) that was placed in the middle of the loggers. This reference thermometer's readings stabilize within a few seconds as per manufacturer specifications.
Target temperatures were obtained using a combination of a temperature-controlled chamber and a 1000 W heater with a digital controller. A powerhead pump (689 L/h) ensured mixing of water to prevent thermal pockets. We compared the extent to which precision and accuracy of temperatures differed by logger model, and evaluated precision for the entire combined dataset, at each target temperature. Logger precision was calculated as the difference between the maximum and minimum temperature recorded by all nine loggers of a particular model at each target temperature. Accuracy was assessed by calculating the difference between the mean temperature measured by the reference thermometer and by each logger model for each target temperature. An overall precision was calculated as the difference between the maximum and minimum temperatures measured by all 27 loggers during the test period for each target temperature.
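The precision and accuracy calculations described above can be illustrated with a minimal Python sketch. The readings below are invented for illustration and are not data from this experiment; the function names are hypothetical, and the sign convention for accuracy (logger mean minus reference mean) is an assumption, since the text defines accuracy only as a difference.

```python
from statistics import mean

def precision_range(readings):
    """Precision as defined in the text: maximum minus minimum recorded temperature."""
    return max(readings) - min(readings)

def accuracy_offset(readings, reference_readings):
    """Accuracy as a mean difference; the sign (logger minus reference) is an assumed convention."""
    return mean(readings) - mean(reference_readings)

# Hypothetical test-period readings (°C) at a 25 °C target; NOT measured data.
readings_by_model = {
    "Pendant": [25.20, 25.25, 25.30],
    "Pro v2": [25.00, 25.05, 25.10],
    "TidbiT v2": [25.05, 25.10, 25.15],
}
reference = [25.00, 25.02, 25.01]  # hypothetical reference thermometer readings

# Per-model precision and accuracy at this target temperature.
per_model_precision = {m: precision_range(r) for m, r in readings_by_model.items()}
per_model_accuracy = {m: accuracy_offset(r, reference) for m, r in readings_by_model.items()}

# Overall precision pools the readings from every model before taking the range.
all_readings = [t for r in readings_by_model.values() for t in r]
overall_precision = precision_range(all_readings)
```

Because the pooled range spans the offsets between models as well as the scatter within each model, the overall value can never be smaller than the largest per-model range.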
Next, we tested whether mean temperature at each target temperature differed between logger models by running a repeated measures ANOVA as a mixed model in R using the lme4 package version 1.1-21, where logger model and target temperature were fixed effects, each individual logger was a subject (random effect), and temperature records from the last five minutes of observation at a given target temperature were the repeated measures. Mixed models are statistical models that contain both fixed and random effects and are useful when measurements are not independent, such as in repeated measures designs [21]. We used an alpha value of 0.05 to test for significance. We further investigated differences in means among logger models post hoc using least-squares means with a Bonferroni correction for the three pairwise comparisons (α = 0.05/3 ≈ 0.017 [21]).
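The corrected alpha follows from dividing the family-wise alpha by the number of pairwise comparisons among the three logger models. The short Python sketch below shows only that arithmetic; it does not reproduce the mixed model itself, and the helper name is hypothetical.

```python
from itertools import combinations

models = ["Pendant", "Pro v2", "TidbiT v2"]
family_alpha = 0.05  # alpha stated in the text

# Every unordered pair of models is one post hoc comparison.
pairs = list(combinations(models, 2))

# Bonferroni correction: divide alpha by the number of comparisons.
bonferroni_alpha = family_alpha / len(pairs)

def is_significant(p_value, alpha=bonferroni_alpha):
    """Hypothetical helper: compare a pairwise p-value with the corrected alpha."""
    return p_value < alpha
```

With three models there are three pairs, so the corrected threshold rounds to 0.017.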

Results
We were able to maintain water temperature within ±0.2 °C of targeted temperatures except at the lowest temperature of 5 °C, which was cooler than the target by a maximum of 1.24 °C, and at 20 °C, which was cooler by a maximum of 0.54 °C (Table 2). The most accurate loggers across all target temperatures were the Pro v2s, which never varied from the reference thermometer by more than ±0.11 °C (Figure 1). TidbiT loggers were the next most accurate (≤0.18 °C from true temperature), followed by Pendant loggers (≤0.39 °C from true temperature). The manufacturer specifications for these three logger models provide accuracy estimates plotted by target temperatures and also as an overall accuracy estimate [22-24]. For all models, our accuracy estimates were similar to those plotted by the manufacturer at the same target temperatures [22-24] and were higher than the overall accuracies reported in manufacturer specifications (Table 1). Accuracy generally decreased as temperature increased, which is also reflected in the manufacturer specifications (Table 1), but this was confounded by less stable water temperature during the highest temperature interval, as reflected by the increased range in temperatures recorded by the reference thermometer (Table 2). There was an unexplained decrease in accuracy at 20 °C across all loggers. Because this decrease in accuracy was also reflected by the reference thermometer, we attribute it to instability in water bath temperature. All loggers overestimated temperature across target temperatures, except for the Pro v2 loggers, which underestimated temperatures for four of the five observations at 10 °C, three observations at 25 °C, and two observations at 30 °C (Figure 1). However, none of these temperature estimates were ever off by more than 0.03 °C, which is within the margin of error of the reference thermometer. The rate of change between temperature measurements was highest during the initial 5-minute period prior to our test period.
Rates of change remained within 0.05 °C during the test period with a few exceptions at the warmest target temperatures (Figure 2). Mean water temperature differed among the three logger models across all target temperatures (F(2,24) = 53.03; P < 0.001). Based on the mixed model and a pairwise comparison, the temperatures recorded by the Pendant loggers were significantly higher than Pro v2 by 0.16 °C and TidbiT by 0.15 °C (P < 0.001).

Table 2. The temperature range (°C) recorded by a reference thermometer and the maximum discrepancy (maximum temperature - minimum temperature; °C) in recorded temperature among digital data loggers of the same model at specific target temperatures. We used maximum discrepancy to measure precision. Standard deviation (SD) of recorded values is provided in brackets.

Logger model precision was calculated per target temperature as the range in temperatures recorded for each logger model and separately with the data from all models combined (Table 2). The range between maximum and minimum temperatures within each logger model was <0.3 °C at temperatures below 35 °C, and the largest observed discrepancy was 0.4 °C for the Pro v2 at the 40 °C target temperature. When measurements per target temperature were combined from all logger models, the maximum discrepancy was 0.67 °C at the 40 °C target temperature. Across all other target temperatures, the discrepancy for the combined data sets was 0.38-0.52 °C.
Standard deviation (SD) for each logger model was ≤0.08 °C at temperatures below 40 °C, but when temperature measurements were combined, SD was ≥0.09 °C across all target temperatures with a maximum SD of 0.15 °C at the 40 °C target temperature. Temperatures measured by the reference thermometer were not constant at a given target temperature, but only varied more than 0.20 °C at the highest target temperature (Table 2).

Discussion
Our findings indicate the use of a variety of logger models in a study may alter the conclusions. Although the differences we detected between logger model measurements may seem small or biologically insignificant, these differences may be important in situations when a < 1 °C change in temperature has regulatory implications [5,6,25] or biological relevance such as near-freezing temperatures, initiation of reproductive and behavioral responses, or determination of thermal tolerance [26].
Environmental regulations associated with water quality address temperature as well as contaminants. In the Pacific Northwest, environmental quality regulations state that human-instigated changes in water temperature of >0.3 °C in specified streams and lakes can trigger noncompliance [5,6]. If Pro v2 loggers were used to provide initial temperature measurements and Pendant loggers were used in subsequent measurements, an actual change of 0.15 °C might result in a noncompliance determination because these loggers differed by 0.16 °C when held in the same environment, giving an apparent change of 0.31 °C. Conversely, if the order of logger models was reversed, an actual compliance violation could be overlooked.
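The two scenarios can be worked through numerically. The sketch below is illustrative only: it assumes the 0.16 °C mean difference between Pendant and Pro v2 loggers simply adds to (or subtracts from) the true change, the function names are hypothetical, and the 0.40 °C change used for the missed-violation case is an invented value.

```python
THRESHOLD_C = 0.3      # change that can trigger noncompliance (from the text)
MODEL_OFFSET_C = 0.16  # Pendant minus Pro v2 mean difference (from the Results)

def apparent_change(actual_change_c, offset_c):
    """Apparent change when the two measurements come from different logger models."""
    return actual_change_c + offset_c

def exceeds_threshold(change_c, threshold_c=THRESHOLD_C):
    """True when a change is large enough to trigger noncompliance."""
    return change_c > threshold_c

# Pro v2 first, Pendant second: the warmer-reading model inflates the change.
false_alarm = apparent_change(0.15, +MODEL_OFFSET_C)

# Pendant first, Pro v2 second: the offset masks part of a real change.
# The 0.40 °C actual change is a hypothetical value chosen for illustration.
missed_violation = apparent_change(0.40, -MODEL_OFFSET_C)
```

In the first case a true change below the threshold appears to exceed it; in the second, a true change above the threshold appears to fall below it.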
Small temperature changes may also result in substantial, and sometimes detrimental, biological impacts. Hartman et al. [27] observed two to four times greater mortality rates of crayfish in acid mine drainage sites in water that was 0.3 to 1.2 °C warmer than at other sites. An increase of 1 °C in summer mean of maximum daily air temperature delayed spawning of Brook Trout (Salvelinus fontinalis) by one week and resulted in fewer redds [28]. Subspecies of Largemouth Bass (Micropterus salmoides) vary in their sensitivity to lethal maximum temperatures by less than 1 °C [29]. Under projected climate change scenarios, with every 1 °C rise in stream temperature, Smallmouth Bass (Micropterus dolomieu) are expected to increase growth by over 5% with a corresponding increase in consumption of over 25% [30]. A study of ecosystem metabolism found that freshwater ecosystems are likely to sequester about 2% less carbon for every 1 °C rise in water temperature [31]. These small temperature changes, which have important biological implications, might have been overlooked if the researchers had used multiple logger models that exhibited the nearly 0.5 °C range of values we observed.
Although we anticipated that an accuracy assessment of the logger models would support manufacturer claims, it was necessary to first verify whether any individual loggers were faulty before using those for this project. Our measurements indicated that the accuracy of each logger model was higher than the overall value reported in manufacturer specifications. Similarly, Hubbart et al. [32] tested the accuracy of a different digital temperature logger (Thermochron iButton, Maxim/Dallas, Dallas, TX, USA) and found it performed within the manufacturer stated accuracy level of ±1 °C. The VEMCO Minilog TX temperature logger (VEMCO Limited, Nova Scotia, Canada) also performed within manufacturer accuracy specifications [33]. Although we did observe greater variation in measured temperatures at the warmest target temperatures, because the reference thermometer also documented greater variation at the warmest target temperatures, the elevated variation may be an artifact of unstable water temperature rather than increased measurement error. This led us to conclude that like the temperature loggers tested by Hubbart et al. [32] and Alexander and McQuarry [33], the Pendant, TidbiT, and Pro v2 all performed within the manufacturer specifications. Given our confidence in manufacturer stated specifications, concerns about accuracy and precision can focus instead on proper use, reporting, and interpretation of collected data.
Although it should be standard practice to incorporate equipment capabilities in project design, reporting, and interpretation of findings, we have noted inappropriate applications and reporting for temperature loggers in peer-reviewed publications. We have observed that accuracy, but not resolution, tends to be reported in the methods and that neither is considered in the results or discussion. Resolution is important because it dictates the degree of temperature change that is registered by a temperature logger. In addition, it is not uncommon to see the number of reported decimal places exceed the equipment capabilities, which gives a false sense of precision. In our study, the Pendant loggers were the least accurate (±0.53 °C) with the lowest resolution (0.14 °C); thus, a study that combines measurements from these three models should report no more than two decimal places for temperature measurements, be cautious when interpreting temperature differences within ±0.53 °C, and recognize that the Pendant logger may not register a temperature change <0.14 °C.
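The effect of resolution can be illustrated with a simplified sketch. It assumes, purely for illustration, that a logger reports temperatures rounded to the nearest multiple of its resolution; the actual conversion inside the devices is not described here, and the function name and input temperatures are hypothetical.

```python
def quantize(temp_c, resolution_c):
    """Round a temperature to the nearest multiple of the resolution (assumed behavior)."""
    return round(temp_c / resolution_c) * resolution_c

PENDANT_RESOLUTION_C = 0.14  # resolution given in the text

# Hypothetical true water temperatures (°C).
before = quantize(20.00, PENDANT_RESOLUTION_C)
after_small = quantize(20.05, PENDANT_RESOLUTION_C)  # true change of 0.05 °C
after_large = quantize(20.20, PENDANT_RESOLUTION_C)  # true change of 0.20 °C

small_change_registered = after_small != before
large_change_registered = after_large != before
```

Under this assumption the 0.05 °C change falls inside one resolution step and leaves the reading unchanged, while the 0.20 °C change moves it to the next step. A small change that happens to cross a step boundary would still register, which is why the text says such a change "may not" be registered.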
The lag time for stabilization of temperature measurements also may need to be considered in project design. Typical applications of electronic data loggers for stream research involve collecting water temperature at hourly intervals, but intervals may range from a few minutes to several hours [10,34,35]. However, logger settings allow for measurements to be taken as frequently as one second apart. Applications where this lag time would be problematic include tow-behind temperature monitoring in streams to assess spatial variation of temperature [36] and laboratory studies of aquatic organism temperature tolerance where temperature change can be as high as 1 °C per minute [37]. Furthermore, in long-term studies, another important consideration is sensor drift, which for the models used in this study was 0.1 °C/year. Before using multiple logger models in a study, researchers should carefully consider the limitations of specific logger models relative to the objectives and necessary accuracy, precision, and response times because combining data from various logger models will reduce precision of results. Although combining such datasets is not likely to be problematic for studies making broad generalizations about water temperature (e.g., Isaak et al. [38]), when regulatory decisions and ecological effects are attributed to changes in temperature as low as 0.3 °C, the use of multiple logger models may affect conclusions.
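To give a rough sense of why lag matters at fast rates of change, the sketch below treats a logger as a first-order sensor. This is a modeling assumption, not something stated by the manufacturer or tested here. It also treats the specification of 90% of the true temperature within five minutes as exactly five minutes, so the result is an upper-bound estimate. All names are hypothetical.

```python
import math

T90_SECONDS = 5 * 60  # 90% response within five minutes (from the text), taken as exactly 300 s

# First-order assumption: fraction reached after a step = 1 - exp(-t / tau).
# Setting that fraction to 0.9 at t = T90 gives tau = T90 / ln(10).
tau_seconds = T90_SECONDS / math.log(10)

def fraction_of_step_reached(t_seconds, tau=tau_seconds):
    """Fraction of an abrupt temperature step registered after t seconds."""
    return 1.0 - math.exp(-t_seconds / tau)

def steady_state_ramp_lag(rate_c_per_min, tau=tau_seconds):
    """Lag (°C) behind a steadily changing temperature for a first-order sensor: rate * tau."""
    return rate_c_per_min / 60.0 * tau

# Lag at the 1 °C per minute rate cited for thermal tolerance studies.
lag_at_1c_per_min = steady_state_ramp_lag(1.0)
```

Under these assumptions the time constant is about 130 seconds, and a logger following a 1 °C per minute ramp could trail the true temperature by roughly 2 °C, far more than the differences among models reported above.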