Correction: Redmond et al. Evaluating the Effects of Novel Enrichment Strategies on Dog Behaviour Using Collar-Based Accelerometers. Pets 2025, 2, 23
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript describes a behavioural study investigating dogs’ activity when exposed to food, tactile, or olfactory enrichment. It is generally good, but I have two major criticisms:
- The paddocks were close together. Given what we know about how well dogs can smell, it seems unlikely that they would not have had any olfactory exposure when in the other paddocks, so there is a potential confound here. Furthermore, there will necessarily be an olfactory component to the other enrichment types (and, indeed, a tactile component to all three, too), so the overlap between the stimuli should also be highlighted. The authors argue in Discussion L420 that sniffing behaviours may have been misclassified by the algorithm, but it’s also possible that the dogs simply enjoyed sniffing the food enrichment more than the olfactory enrichment. Perhaps our human-oriented categorisation of these enrichment types differs from how dogs would classify them based on their perceptual abilities, rendering the separation of the categories into ‘tactile, olfactory, and food-based’ somewhat artificial. This is not a dealbreaker because it is, of course, humans who provide enrichment to captive dogs, so there is practical value in this categorisation. Nonetheless, all of this should be mentioned as a limitation in the Discussion, and the Methods should justify categorising the enrichment types as they were, even though it is impossible to fully isolate sensory stimuli in a naturalistic setting.
- According to the Methods section, the algorithm classified behaviours at an acceptable accuracy level, but how was this determined? Was there some manual coding of part of the dataset to confirm accuracy? Was another computer model somehow used to validate this model? If the validation was only done by another computer program, I would strongly recommend that the authors go back and code a subset of the data to ensure that the model agrees with expert coding.
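By way of illustration only (hypothetical labels; Python/scikit-learn assumed, not the authors’ actual validation procedure), agreement between an expert-coded subset and the algorithm’s output could be summarised with Cohen’s kappa:

```python
# Minimal sketch: compare expert-coded labels with the classifier's output
# on the same windows of accelerometer data (labels here are hypothetical).
from sklearn.metrics import cohen_kappa_score, accuracy_score

expert_labels = ["walk", "sniff", "rest", "walk", "bark", "rest", "sniff", "walk"]
model_labels  = ["walk", "sniff", "rest", "walk", "rest", "rest", "sniff", "trot"]

kappa = cohen_kappa_score(expert_labels, model_labels)  # chance-corrected agreement
acc = accuracy_score(expert_labels, model_labels)       # raw proportion agreeing

print(f"Cohen's kappa = {kappa:.2f}, raw agreement = {acc:.2%}")
```

Chance-corrected agreement of this kind would be a more defensible basis for an ‘acceptable accuracy’ claim than validation against another automated model alone.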
Abstract – fine. Well done
Intro – Generally good but there’s a minor logic flow issue in the paras describing the different types of enrichment. In L36, it says that food-based enrichment effectiveness can vary, but then there’s no more info about food-based enrichment until L52, which briefly mentions food-based toys and games for dogs. I suggest restructuring slightly to make it flow better, and adding a bit more info about the mixed results, perhaps focusing on the measures used. That will strengthen the argument that more research is needed in this area, and specifically with accelerometry.
Methods
I suggest adding the habituation and baseline data collection periods to Figure 1. Also, for L89, which mentions the habituation period, specify whether this means habituation to the paddocks where data collection occurred, to the accelerometer, or both. As written, it could be either or both, but the location and the accelerometer are both described later, which is confusing. A brief mention should be made here as well.
L102 – please provide some examples of what you looked for when ensuring that the dogs didn’t mind wearing the Actigraphs. What would have constituted an ‘adverse response’?
L104 – mentions ‘the six dogs that participated in the experimental phase’. This implies that there were more than 6 dogs but some of them didn’t get through to the experimental phase. Is this correct? If not, please rephrase. If it is correct, please provide details about how many dogs were intended to participate and why they were excluded.
L155-173 would be better under ‘analysis’, as it’s part of the data cleaning process.
L189-190 – a tendency or trend is just non-significant. Effect sizes would be an appropriate way to determine whether an outcome has any real-world meaning, regardless of p-value, but against the chosen alpha level a result is either significant or not. If the authors believe an alpha level of 0.1 would be more appropriate, this should be justified.
L190 – I assume that the use of Kruskal-Wallis and Spearman’s tests means those data were non-parametric? Please confirm this, and whether ODBA was parametric, since it was analysed with an ANOVA. Also, I believe Wilcox should be Wilcoxon (L192 and onward).
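To illustrate the kind of check being asked for here (hypothetical data; Python/SciPy assumed, not the authors’ actual analysis), the decision between a parametric and a non-parametric omnibus test might look roughly like this:

```python
# Sketch: choose between one-way ANOVA and Kruskal-Wallis based on a
# normality check of each group's values (data here are made up).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = {
    "food":      rng.normal(0.35, 0.10, 30),   # e.g. ODBA per session
    "tactile":   rng.normal(0.30, 0.10, 30),
    "olfactory": rng.normal(0.28, 0.10, 30),
}

all_normal = all(stats.shapiro(v).pvalue > 0.05 for v in groups.values())

if all_normal:
    stat, p = stats.f_oneway(*groups.values())   # parametric: one-way ANOVA
    print(f"ANOVA: F = {stat:.2f}, p = {p:.3f}")
else:
    stat, p = stats.kruskal(*groups.values())    # non-parametric alternative
    print(f"Kruskal-Wallis: H = {stat:.2f}, p = {p:.3f}")
```

Reporting the resulting test statistic (F or H) alongside the p-value also addresses the reporting point raised under Results below.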
L191 – how was ‘interacting’ defined in this context? What did the dog need to be doing for their behaviour to count as an interaction?
L201 – eating and drinking were excluded from analysis due to low occurrence, but one of the enrichment activities was food-related. This seems like a discrepancy. Please justify this decision. Even though the occurrence was low, food consumption seems like an especially relevant outcome variable in this study.
Results
All the KW test results should report the chi-square values in addition to the p-values, especially for the significant results.
Fig 4 – please confirm the caption, where it says that the olfactory rho = 0.62, p = 0.42. The rho value seems high given the slope of the regression line. It may be a function of the way it’s presented in the figure, but please check.
L230 – should it be P > 0.05? It’s saying there was no difference between the enrichment types.
Discussion
The aim statement in the abstract and intro makes sense, but the one in the Discussion is strangely phrased (L360-362). How can an enrichment’s ‘efficacy’ be evaluated? In this study, only the enrichment’s effect on dog behaviour was measured, so that needs to be rephrased in this instance.
Since the important aspects of this study are the results/dog behaviour, rather than the methods used, I’d recommend moving the summary of the results (e.g., L381-388) up above the paragraphs that talk about Actigraph and the machine learning algorithm (i.e., L363-380). Or, if that will throw off the flow too much, add a sentence or two at the end of the first paragraph quickly summarising the results.
L392 – ‘dogs were most active when given the food enrichment…’ I thought the dogs were most active during baseline.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
It seems that there are already publications on this subject. I have several comments:
- First of all, the manuscript is written in a somewhat complicated way. The Discussion is too long and should be shortened.
- It is stated that 6 dogs were used, but they were tested in pairs and could have influenced each other in this experiment, meaning that n = 3 rather than 6. That is a very small number for a study, especially when large variation is expected, as is common in behavioural research.
- The description of the accelerometers, which are supposed to cover 9 behaviours, is completely unclear to me and is explained nowhere! Machine learning is not explained anywhere, and I do not understand what is meant. Perhaps it is clear to readers who have read other papers on this subject, but I have not.
- I have never heard of dog breeds like the Huntaway. The manuscript should at least explain a bit about these dogs.
- Do these accelerometers measure all of these behaviours? If so, how? And how are the readings interpreted?
- Line 136 – Why soiled chicken litter as the olfactory cue? Are dogs attracted to this smell or not? Puzzling...
- Line 162 – Belle = Belvedere?
- Line 162 – 2 out of 3 pairs did not generate data from day 10, so only 1 pair is left, n = 1...
- The manuscript assumes knowledge of these devices, which I do not have.
- Line 374 – There is an error in the sentence.
- Fig 4 shows high variation, and that with n = 3...
- One "the" should be left out.
- Table 2 – Food scored higher than olfactory? That does not agree with the findings described earlier?
Author Response
Please see attachment.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Improving the welfare and QoL of dogs in shelters and laboratories is indeed important. Thus the paper is worth publishing.
However, the design in my opinion has some shortcomings, which at least need to be discussed insofar as they cannot be remedied, and should be addressed where improvement is possible:
a) The behavioural categories are very coarse: "bark" – which type of barking? (see Taylor/Reby, Morton, etc. regarding the motivation behind vocalisations in mammals); locomotion – walk, trot, or gallop?; sniffing – which nostril? (Siniscalchi regarding motivation); grooming – autochthonous or displacement, e.g. in scratching...
b) Salivary cortisol would have been another useful additional measurement parameter.
c) Statistics: please use boxplots, medians, and quartiles instead. Why do you use ordinal statistics (K-W, Wilcoxon) for metric data? A randomization test, especially for small samples, would be stronger (a minimal sketch is given after this list). Why only FIVE instead of at least SIX days of test duration (for nonparametric statistics, n ≥ 6 is considered the minimum)? You use the data sets for repeated testing, so you need a correction, e.g. Bonferroni. Please give exact p-values instead of "< 0.005" etc. wherever possible.
d) Some improvements to content: please give at least some statistical detail (tests, p-values) in the Abstract.
Please explain what the "forest test" is – this paper may be very interesting to practitioners who are not so familiar with modern statistics.
Did you change the ropes every day, or only for each new pair of dogs?
L259: you cannot determine whether "hour of day" has an influence if you start at the same hour each day – only that a decrease over time occurs.
L312: the verb is missing.
L507 ff.: no content given.
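As a rough sketch of the randomization test suggested under point c) – hypothetical ODBA values and Python/NumPy assumed, not the authors’ data or analysis:

```python
# Sketch: permutation (randomization) test of a difference in mean ODBA
# between two enrichment conditions; values and group sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
food      = np.array([0.41, 0.38, 0.45, 0.36, 0.40, 0.44])   # per-session ODBA
olfactory = np.array([0.33, 0.35, 0.30, 0.37, 0.31, 0.34])

observed = food.mean() - olfactory.mean()
pooled = np.concatenate([food, olfactory])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                        # reassign values to groups at random
    diff = pooled[:food.size].mean() - pooled[food.size:].mean()
    if abs(diff) >= abs(observed):             # two-sided test
        count += 1

p_value = (count + 1) / (n_perm + 1)           # permutation p-value with the usual +1 correction
print(f"observed difference = {observed:.3f}, permutation p = {p_value:.4f}")
# With three pairwise comparisons, a Bonferroni-adjusted threshold would be 0.05 / 3.
```

SciPy’s stats.permutation_test provides a ready-made implementation of the same idea.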
Author Response
Please see attachment
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript has improved. I still have some doubts about the sample size. I am still not pleased with the references to previous papers...
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks for your clarifications, and for attending to some of the remarks. However, I still have some objections:
Regarding the coarseness of the behavioural categories: in my opinion a method of behavioural recording that violates basic ethological principles (e.g. different types of barking) has very limited value for research. Please discuss this in the section on the limitations of your method and give at least some examples of what you find can nevertheless be addressed by this method, to demonstrate to the readership why you find it worth reporting despite these limits (please refer to biological, not technical, value).
Concerning the statistics: randomization tests are specifically intended for metric data under non-parametric conditions!
And with regard to boxplots: I leave this to the academic editor, but my position is that effects that do not show clearly in the statistically more appropriate diagram seem questionable in general.
Regarding cortisol/saliva: please again mention this shortcoming in the limitations section. The method is still valid without hormones, but the limits need to be mentioned.
If your view is upheld by the editor, please again at least discuss this in the "limitations" section.
Concerning the abstract: in my opinion it would be better to shorten the rest of the abstract and include some results instead. But I leave this decision to the academic editor.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
