Concurrent and Predictive Criterion Validity of a Puppy Behaviour Questionnaire for Predicting Training Outcome in Juvenile Guide Dogs

Simple Summary
The ability to predict later success in guide dog training can be of great benefit to assistance dog providers, such as guide dog schools. It helps them maximise resource and production efficiency and maintain high welfare standards. This study evaluated the predictive capabilities of a behaviour questionnaire (the refined puppy walker questionnaire, r-PWQ) completed by the volunteer carers of puppies (puppy walkers) when the dogs were eight months of age. The r-PWQ includes traits such as "Distractibility" and "Excitability", which correspond to common reasons for withdrawing guide dogs from training. The predictive validity of the r-PWQ was compared with that of a well-known behaviour questionnaire, the Canine Behavioral Assessment and Research Questionnaire (C-BARQ). The results show that the r-PWQ can be used to predict guide dog outcome and may be better suited to guide dog populations than the C-BARQ.

Abstract
Working dog organisations regularly assess the behaviour of puppies to monitor their progress. Here, we tested the predictive validity (for success in guide dog training) of a shortened version of a previously developed juvenile dog behaviour questionnaire (the refined puppy walker questionnaire, r-PWQ) and compared it with the Canine Behavioral Assessment and Research Questionnaire (C-BARQ). The r-PWQ is used by Guide Dogs UK, whereas the C-BARQ was designed for pet dogs and is used by some other guide dog schools internationally. A cohort of dogs aged eight months (n = 359) was scored concurrently on the r-PWQ and the C-BARQ. Analogous traits in the two questionnaires were evaluated for internal consistency and association with training outcome, and compared for concurrent validity. The r-PWQ was associated with training outcome for five scales (r-Excitability, Trainability, Animal Chase, r-Attachment and attention seeking, and Distractibility) and the C-BARQ for two scales (Excitability and Separation-related behaviour). There were significant correlations between analogous C-BARQ and r-PWQ trait scores (p < 0.001), except for Separation-related behaviour, and the questionnaire scales had similar internal consistencies. The r-PWQ may be more suitable than the C-BARQ for use by guide dog schools. However, because analogous scales were correlated (except for "Distractibility"), some scales could be substituted for one another when comparing the behaviour of dogs across guide dog schools that use different questionnaires.


Materials and Methods
Here, we detail the methods used to refine the original Puppy Walker Questionnaire (PWQ) into a shorter version for applied use within Guide Dogs UK. Development of the original PWQ is described in full in [1]. In brief, the questionnaire was developed from previously published literature, in consultation with Guide Dogs staff and puppy walkers, to capture behavioural scores for puppies in their first year of life that would be relevant to their personality and likely suitability as guide dogs. The final PWQ contained 61 items asking puppy walkers to rate their dog's behaviour over the past month on a 100 mm visual analogue scale (VAS) with the anchors "Never" and "Almost Always" (the original questions are listed alongside the results in Table S2). Of the 61 items, 20 came from the C-BARQ and were retained in their original scales under the same names: Excitability, Separation-related behaviour, and Attachment and attention-seeking. Reliability analysis split the remaining questions into a miscellaneous group and the following scales, each with acceptable internal consistency, inter-rater reliability and test-retest reliability across the three ages: Trainability, Body Sensitivity, Distractibility, General Anxiety, Stair Anxiety and Energy.
To make the PWQ of most practical use to Guide Dogs, the questionnaire was shortened so that it retained only items with potential predictive associations with qualification or withdrawal, plus scales that demonstrated temporal consistency (an indicator of personality) but lacked predictive associations, for use in profiling.

Participants
The same data were used to refine the PWQ as were used to develop it originally. Guide Dogs puppy walkers (PWs) of dogs that turned five months of age between October and December 2012 (n = 311) were invited to complete the PWQ at three points during the first year of the dog's life: when the dogs were aged five, eight and twelve months. After initially opting into the study, puppy walkers were sent invitations to complete the PWQ by post or email (at the puppy walker's request) two weeks before the dogs were due to turn five, eight and twelve months of age. Puppy walkers of dogs that participated in a behavioural test at the same three age points were also invited to complete the PWQ. In total, 276 dogs (130M/146F) had at least one completed PWQ. The dogs comprised eight breeds or crossbreeds (Golden retriever Sire × Labrador Dam, 105; Labrador, 65; Golden retriever, 30; Labrador Sire × Golden retriever crossbreed Dam, 29; Golden retriever × German Shepherd Dog, 24; German Shepherd Dog, 16; Labrador Sire × Golden retriever Dam, 5; Labrador × Labrador crossbreed, 2). The mean age of the dogs was 5.17 months (± 8 days S.D.) at the first assessment, 8.17 months (± 7 days S.D.) at the second and 12.04 months (± 12 days S.D.) at the third. For the purpose of this study, only dogs that had qualified as a guide dog or had been withdrawn permanently for behavioural reasons were included in the analysis (Table S1). Dogs that entered the breeding programme (n = 14), were withdrawn for health reasons (n = 16), were transferred to other organisations (n = 3) or died (n = 1) were excluded.

Refining the PWQ
In the original published PWQ [1], scales that came from the Canine Behavioral Assessment and Research Questionnaire (C-BARQ) were presented in the form they take in the C-BARQ, so that results would be comparable. However, for operational purposes within Guide Dogs, the C-BARQ creator, Prof James Serpell, granted permission to change the composition of the scales to optimise performance for applied use. For this purpose, a new internal reliability analysis was performed using the C-BARQ Excitability items and six of the miscellaneous items that were hypothesised to also measure behaviour related to excitability. These six items ("Is hyperactive, restless, has trouble settling down"; "Is self-controlled and calm" (negatively transformed); "Barks persistently when alarmed or excited"; "Is calm and quiet" (negatively transformed); "Is excessive and if it lunges is hard to hold back"; and "Jumps up on people (stands to place front paws on person's chest/legs)") were successfully grouped with the C-BARQ items, achieving a high Cronbach's alpha of 0.85 (using scores for 12-month-old dogs), which suggested that they could be reliably averaged to create a new Excitability scale score. The miscellaneous question "Returns directly to you if startled or frightened" was designed to evaluate the secure-base aspect of attachment [3], so we also tested the internal reliability of the C-BARQ Attachment and attention-seeking scale with this question included. Cronbach's alpha did not differ significantly when this question was added (decreasing slightly from 0.54 to 0.53), so the question was retained as part of the Attachment and attention-seeking scale prior to predictive refinement. Following these alterations to the two scales, individual items from each PWQ scale were examined for predictive associations to identify questions that could be removed.
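The internal reliability check described above can be illustrated with a minimal Python sketch of Cronbach's alpha. It assumes VAS scores on a 0 to 100 scale and that "negatively transformed" items are reverse-scored as 100 minus the raw score (the convention this section states for trait scores). The function name and toy data are hypothetical, and the software actually used for the analysis is not stated here.

```python
from collections.abc import Container, Sequence
from statistics import variance

VAS_MAX = 100.0  # assumption: VAS responses recorded on a 0-100 (mm) scale


def cronbach_alpha(
    responses: Sequence[Sequence[float]],
    reversed_items: Container[int] = (),
) -> float:
    """Cronbach's alpha for a dogs-by-items matrix of VAS scores.

    responses[d][j] is the score given to dog d on item j. Item indices listed
    in reversed_items are negatively transformed (VAS_MAX - score) first.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    scored = [
        [VAS_MAX - x if j in reversed_items else x for j, x in enumerate(row)]
        for row in responses
    ]
    # Sample variance of each item (column) and of each dog's summed score.
    item_vars = [variance(col) for col in zip(*scored)]
    total_var = variance([sum(row) for row in scored])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: item 1 is worded in the opposite direction to the others.
toy = [[10, 90, 15], [40, 55, 35], [70, 35, 80], [90, 5, 85]]
print(round(cronbach_alpha(toy, reversed_items={1}), 2))
```

Comparing alpha with and without a candidate item (as was done for "Returns directly to you if startled or frightened") would simply mean calling the function on the two column subsets.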
To reduce the length of the questionnaire, none of the remaining miscellaneous questions were included in the r-PWQ, as averaged scale scores were considered more useful for Guide Dogs. Two steps of analysis were used to identify which individual items within the scales to retain. In Step 1, all individual items were evaluated for potential associations with qualification or withdrawal for behaviour using univariate logistic regression models. All items that showed an association with qualification or withdrawal at the 90% confidence level (p < 0.1) for at least one of the three ages were retained in the r-PWQ as potentially predictive. Where a scale contained some items that were associated with outcome (at one or more ages) and some that were not associated at any age, the items with no predictive association were removed, so that future scale-average scores would be based only on items with potential predictive value. If a scale contained no items with potential predictive value, its items were checked for inter-rater reliability and temporal consistency in Step 2, and the scale was kept only if all of its items met both criteria.
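The two-step retention rule above can be written down as a small decision function. This is a sketch under stated assumptions: the thresholds are those given in this section (p < 0.1 at any age for Step 1; temporal ICC > 0.30 and inter-rater p < 0.05 for Step 2), the "(±0.01)" tolerance on the ICC threshold is not modelled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STEP1_P = 0.1        # Step 1: p < 0.1 at any of the three ages
TEMPORAL_ICC = 0.30  # Step 2: temporal consistency ICC > 0.30
INTERRATER_P = 0.05  # Step 2: inter-rater ICC significant at p < 0.05


@dataclass
class ItemResult:
    text: str
    p_by_age: Dict[int, float]            # Step 1 p-values keyed by age in months
    temporal_icc: Optional[float] = None  # Step 2 statistics, only needed when
    interrater_p: Optional[float] = None  # no item in the scale is predictive


def is_predictive(item: ItemResult) -> bool:
    return any(p < STEP1_P for p in item.p_by_age.values())


def passes_step2(item: ItemResult) -> bool:
    return (
        item.temporal_icc is not None and item.temporal_icc > TEMPORAL_ICC
        and item.interrater_p is not None and item.interrater_p < INTERRATER_P
    )


def refine_scale(items: List[ItemResult]) -> List[ItemResult]:
    """Return the items of one scale that this rule would keep in the r-PWQ."""
    predictive = [it for it in items if is_predictive(it)]
    if predictive:
        # Mixed scale: drop the non-predictive items, even if they pass Step 2.
        return predictive
    # No predictive items: keep the whole scale only if every item passes Step 2.
    if all(passes_step2(it) for it in items):
        return list(items)
    return []
```

Under this rule a scale such as Energy, whose items are not predictive but all pass Step 2, is kept whole, whereas a scale with one predictive item is cut down to that item.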
To test for predictive associations, each individual item from the PWQ was assessed for association with qualification or withdrawal for behaviour using univariate logistic regression models. Separate analyses were conducted for each item for the five-, eight- and twelve-month PWQs. The basic model, using a logit link function, can be written as:

$$y_i \sim \mathrm{Binomial}(1, \pi_i), \qquad \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$

where $y_i$ represents the response variable (withdrawal for behaviour vs. entry into advanced training or qualification) for the $i$th dog; $\pi_i$ represents the probability that $y_i = 1$; $\beta_0$ is the model intercept (the estimated log-odds of the response when the predictor equals zero); and $\beta_1$ is the regression coefficient for the explanatory variable $X_i$ (the item score).
This analysis provided statistics representing the predictive association of each individual item as scored at each of the three ages. All items that showed an association with qualification or withdrawal at the 90% confidence level (p < 0.1) were retained for further analysis. Retained items were kept in their original PWQ groups, even if some of the group's items had been excluded. Trait scores were calculated as the mean of all retained items within each group, with items worded negatively relative to the rest of the scale reverse-scored (100 minus the item score).
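The trait-score rule above (a mean of the retained items, with negatively worded items reverse-scored as 100 minus the score) is sketched below. The item keys are hypothetical shorthand, and the sketch assumes every retained item was answered, since the handling of missing responses is not described here.

```python
from typing import Dict, Iterable, Set

VAS_MAX = 100.0  # assumption: VAS responses recorded on a 0-100 (mm) scale


def trait_score(
    item_scores: Dict[str, float],
    retained_items: Iterable[str],
    negatively_worded: Set[str],
) -> float:
    """Mean of one scale's retained items, reverse-scoring negatively worded ones."""
    values = [
        VAS_MAX - item_scores[item] if item in negatively_worded else item_scores[item]
        for item in retained_items
    ]
    if not values:
        raise ValueError("scale has no retained items")
    return sum(values) / len(values)


# Hypothetical item keys and scores for one dog on an excitability-type scale.
dog = {"hyperactive": 80.0, "calm_and_quiet": 30.0, "jumps_up": 70.0}
print(trait_score(dog, ["hyperactive", "calm_and_quiet", "jumps_up"], {"calm_and_quiet"}))
```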
To assess temporal consistency, correlations were estimated between the scores given to all dogs (n = 176) that had an assessment completed at each age (five, eight and twelve months). Two-way random intra-class correlation coefficients (ICCs) with the consistency method were used to provide a coefficient summarising the overall consistency across the three assessments. Items that achieved ICCs > 0.30 (±0.01) were considered to show acceptable temporal consistency. Inter-rater reliability (evaluated using a two-way mixed ICC model with the consistency method) was accepted for items whose ICC was statistically significant at p < 0.05 (n = 21 pairs of puppy walkers living with and scoring the same dog; data collection methods are described in full in [1]).
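The consistency ICCs above can be computed from a two-way ANOVA decomposition. The sketch below takes a dogs-by-occasions matrix (columns as the three ages for temporal consistency, or as the two puppy walkers for inter-rater reliability). For the consistency definition, the point estimate is the same under the two-way random and two-way mixed models; which form (single or average measures) was reported is not stated here, so both are returned. The significance test uses F = MSR/MSE, whose p-value needs an F distribution (for example `scipy.stats.f.sf`), which is not computed in this stdlib-only sketch.

```python
import math
from typing import Sequence, Tuple


def icc_consistency(data: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Two-way consistency ICCs from a subjects-by-measurements matrix.

    data[i][j] is dog i's score at occasion (or from rater) j.
    Returns (ICC(C,1), ICC(C,k), F), where F = MSR / MSE is the statistic for
    testing the ICC against zero on (n - 1, (n - 1)(k - 1)) degrees of freedom.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols  # residual after dog and occasion effects

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    icc_single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    icc_average = (ms_rows - ms_error) / ms_rows
    f_stat = math.inf if ms_error == 0 else ms_rows / ms_error
    return icc_single, icc_average, f_stat
```

Because the consistency form removes column means, a constant shift between ages (for example, every dog scoring 10 points higher at twelve months) does not lower the ICC; only changes in the dogs' relative ordering and spacing do.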
The inter-rater reliability and temporal consistency analyses were conducted only for individual items that did not meet the Step 1 criterion, for the purpose of refining the PWQ. Scale-level temporal consistency and inter-rater reliability are described in full in [1] for the novel PWQ scales, and in other studies for the C-BARQ scales (e.g., [2]).

Results
In total, 39 items were included in the r-PWQ (Table S2). A new scale named Animal Chase was added to the r-PWQ, containing two questions originating from the C-BARQ scale Chasing ("Chases birds or squirrels (or would like to)" and "Chases cats (or would like to)"), and three scales had some items removed owing to a lack of individual associations with training outcome or a lack of reliability. In the r-PWQ, these scales are referred to as r-Attachment and attention seeking (r-AAS), r-Separation-related behaviour (r-SRB) and r-Excitability to indicate that they have been refined compared with their form in the original PWQ.
Table S2. Each of the 61 items from the original PWQ, shown with p-values for predictive validity from logistic regression models of each individual question, at each sampled age, against training outcome (qualified or withdrawn for behaviour) for the original cohort of dogs. Associations that met each step's criterion for retention in the r-PWQ are highlighted in bold.
Step 2 analyses were only conducted for individual items that failed to meet Step 1 criteria.

Table S2 columns: Item; Step 1; Step 2.
Table S2 notes: 7 items that were altered or created following panel feedback. Items marked "A" were scored on a 100 mm VAS with the anchors "Really does not describe this dog" to "Really describes this dog", whilst all remaining items were scored on a frequency scale from "Never" to "Almost Always".
All of the miscellaneous questions were excluded from the r-PWQ, as averaged scale scores were considered more useful for Guide Dogs, and this also shortened the questionnaire. Where some items within a trait scale were predictive and others were not, the non-predictive items were excluded from the r-PWQ even if they met the Step 2 criteria, so that each scale averaged only questions with predictive potential. The two questions that comprise the Energy scale were not predictive but met the Step 2 criteria, so this scale was retained for profiling purposes, even though it could not be used for prediction.
One question was removed from the C-BARQ-derived scale Attachment and attention-seeking, and three were removed from the C-BARQ-derived scale Separation-related behaviour. Four questions were removed from the Excitability scale (two original C-BARQ items and two of the newly added items) because they lacked predictive associations. In the r-PWQ, these scales are referred to as r-AAS, r-SRB and r-Excitability to indicate that they have been refined compared with their form in the original PWQ and C-BARQ.
Two additional questions from the C-BARQ scale Chasing were added to the r-PWQ because Guide Dogs wanted to evaluate how dogs reacted to animals. The two questions were worded as follows: "Chases birds or squirrels (or would like to)" and "Chases cats (or would like to)"; they were averaged to make a C-BARQ-derived score called Animal Chase.

Materials and Methods
Means and standard deviations (reported as ± S.D.) of the comparable r-PWQ and C-BARQ traits, plus the r-PWQ Distractibility trait, were calculated for a population of 359 dogs (n = 321 Guide Dogs UK; n = 38 Guiding Eyes). Mann-Whitney U tests were used to compare scores between the two populations, and results were reported as significant when p < 0.05.
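A between-population comparison of this kind can be sketched as a Mann-Whitney U test using the normal approximation with a tie correction and a 0.5 continuity correction. These choices, and the function names, are assumptions; the software and exact options used for Table S3 are not stated. The sketch reports U for the first sample, whereas some packages report the smaller of the two U values.

```python
import math
from typing import List, Sequence, Tuple


def _midranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with tied values given their average rank, plus tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    # Continuity correction of 0.5, floored at zero so p never exceeds 1.
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u1, math.erfc(z / math.sqrt(2.0))
```

For one trait, `x` would hold the Guide Dogs UK scores (n = 321) and `y` the Guiding Eyes scores (n = 38), with the difference reported as significant when the returned p is below 0.05.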

Results
Mean scores were similar between the Guiding Eyes and Guide Dogs UK populations. Scores for the Body Sensitivity and Attachment and attention seeking traits, in both the r-PWQ and the C-BARQ, showed the greatest differences between populations (see Table S3).
Table S3. Mean (± S.D.) trait scores, Mann-Whitney U and significance values for the Guide Dogs UK and Guiding Eyes populations for comparable C-BARQ and r-PWQ traits (with the addition of the r-PWQ Distractibility trait).

Trait Group
Guide Dogs UK mean (±S.D.)