Talking to Dogs: Companion Animal-Directed Speech in a Stress Test

Simple Summary

Companion animal-directed speech is a current research topic, interesting because of its similarity to infant-directed speech. Dog owners seem to use this high-pitched and repetitive way of speaking, slightly adapted, almost subconsciously with their dogs. The aim of this study was to investigate dog-directed speech in different contexts and to examine whether owner personality and relationship quality affect it. We found that owners' personality and gender affect their dog-directed speech. The majority of the modifications of dog-directed speech could be explained by differential use of voice pitch and pitch range. Our study supports the idea that voice pitch is used to communicate affect, whereas pitch range is used as an attention-getting strategy. Based on our results, we conclude that dog-directed speech is adjusted depending on context, gender, and personality. The societal value of this study lies in its contribution to basic knowledge of how we talk to animals, which may help prevent accidents (e.g., dog bites) as well as improve animal training.

Abstract

Companion animal-directed speech (CADS) has previously been investigated in comparison to infant-directed speech and adult-directed speech. To investigate the influence of owner caregiving, attachment pattern, and personality on CADS, we used the Ainsworth strange situation procedure, which allowed us to assess voice source parameters of CADS across different contexts. We extracted speech parameters (voicing duration, voice pitch, pitch range, and jitter) from 53 dog owners recorded during the procedure. We found that owner personality and gender, but not caregiving/attachment behavior, affect the voice's pitch, range, and jitter during CADS. Further, we found a differential and context-specific modification of pitch and range, consistent with the idea that pitch communicates affect, whereas range is more of an attention-getting device.
This differential usage, and the increased pitch, emphasize and support the parallels described between CADS and infant-directed speech. For the first time, we also show the effect of personality on CADS and lay the basis for including jitter as a potentially useful measure in CADS.


Introduction
It has long been known that the human voice carries substantial cues to the emotional state of the speaker, completely independent of any verbal or linguistic content [1]. With the exception of a few familiar words (e.g., their name, "good", "bad", etc.), animals listening to human speech presumably [...]

Participants
The dyads (female-female, male-female, and female-male) were a subset of 132 human-dog dyads who had already participated in an experimental study of interaction styles and human-dog relationships [30,31]. Recruitment for our study was based on the voluntary participation of teams from the pool of 132 dyads; our subset represents those who responded positively to our request for participation. All dogs had lived with the owner, who was also the main attachment figure, from puppyhood onward. All dogs were intact, i.e., neither spayed nor neutered. Their mean age was 4 years ± 1.5 SD and their mean weight was 30 kg ± 13.2 SD. Mean owner age was 46.2 years ± 10.2 SD. All dyads were recruited from Vienna and surrounding areas in Austria.

Owner Personality Axis
Dog owners were asked to fill in the German version of the NEO Five-Factor Inventory evaluating their own personality [32,33], a 60-item psychometric instrument designed to evaluate nonclinical adult personality structures along five major dimensions: neuroticism, extraversion, openness, agreeableness, and conscientiousness.

Ainsworth Strange Situation Procedure
The Ainsworth strange situation procedure (ASSP) [20] was adapted to assess the human-dog relationship. Within this assessment, dog attachment behavior is activated via increasing stress during the procedure, which in turn potentially activates caregiving behavior in the owner. Before testing, the room was prepared by placing two color-tagged chairs next to each other in the middle of the room, closing the windows and shades, and depositing several toys between and in front of the chairs. The dog owners were guided through the procedure by the experimenter in the following fixed order:
0. Controls (~10 s): Two control settings (reading and speaking/adult-directed speech) are recorded in the waiting room adjacent to the experimental room. First, the owner is asked to read a predefined text out loud in the presence of the dog ("… der beste Freund des Schäferhundes Rex ist die Ente Oskar …"; "… the German shepherd Rex's best friend is the duck Oscar …"). For the second control, the experimenter engages the owner in small talk about their dog.
1. Introduction (~20 s): The experimenter leads the owner-dog dyad into the room, introduces them to the surroundings, asks the owner to unleash the dog for the duration of the procedure, and leaves the room again.

2. Exploration (3 min):
Both dog and owner can move around freely and explore the room and the toys. The owner is free to interact and talk with the dog normally.
3. Encounter (1 + 2 min): A person previously unknown to the dog ('the stranger') enters the room and asks the owner to take their designated seat if not already seated. Beyond this, there is initially no further interaction: for one minute, the stranger sits on the other chair, remaining motionless. In the second minute, the stranger initiates small talk with the dog owner. In the third and last minute of this first encounter, the stranger tries to interact with the dog and elicit play. At the end of minute three, the stranger asks the dog owner to leave the room.

4. Separation (3 min):
The dog stays in the experimental room with the stranger, separated from the owner. The stranger tries to engage the dog by playing and interacting.

5. Call (~5 s):
The experimenter instructs the dog owner to stand in front of the experimental room door and to loudly call the dog's name.

6. Reunion (3 min):
The owner re-enters the room and reunites with the dog. The owner is free to interact and to play with the dog. The stranger quietly leaves the room. At the end of minute three, a vibrating phone indicates that the owner should leave the experimental room.

7. Separation (3 min):
The dog remains alone in the experimental room for three minutes.

8. Encounter (3 min):
The stranger from episode 3 re-enters, and the dog remains in the experimental room with the stranger. The stranger tries to comfort the dog by playing and interacting.
9. Call (~5 s): The experimenter again instructs the dog owner to stand in front of the experimental room door and loudly call the dog's name.

10. Reunion (3 min):
The owner enters the room, reunites with the dog, and provides at least one physical contact. The owner is free to interact and play with the dog. The stranger quietly leaves the room. At the end of minute three, the experimenter enters the room and ends the procedure.
Figure 1: The icons represent all parties of this study. The dog was present at all times and is therefore marked as such in each episode. The episodes marked with the owner icon were used for audio analysis.
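For orientation, the episode sequence above can be sketched as a small lookup table. The episode numbers, names, and nominal durations follow the text; the data structure itself and the `analyzed` flag (owner present and no second person talking in the room, as described in the audio treatment section) are purely illustrative.

```python
# Illustrative sketch of the ASSP episode schedule described above.
# 'analyzed' marks the episodes later retained for audio analysis.
EPISODES = [
    (0, "controls (reading + speaking)", "~10 s", True),
    (1, "introduction", "~20 s", False),
    (2, "exploration", "3 min", True),
    (3, "encounter (stranger)", "1 + 2 min", True),   # only minute one analyzed
    (4, "separation (stranger present)", "3 min", False),
    (5, "call", "~5 s", True),
    (6, "reunion", "3 min", True),
    (7, "separation (dog alone)", "3 min", False),
    (8, "encounter (stranger returns)", "3 min", False),
    (9, "call", "~5 s", True),
    (10, "reunion", "3 min", True),
]

analyzed = [num for num, name, dur, keep in EPISODES if keep]
print(analyzed)  # → [0, 2, 3, 5, 6, 9, 10]
```

The filtered list matches the episodes run through the Praat pipeline in the audio treatment section.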

Video Recording, Surveillance and Animal Welfare
The test room was equipped with a wide-angle lens video surveillance system (Canon Inc., Canon Austria GmbH, 1100 Vienna, Austria) to record and monitor the proceedings from outside the room. The experimenter monitored the situation directly outside the test room to ensure the dyad's safety and to intervene if needed. The dog owners could terminate the experiment at any point without giving a reason and, after exiting the room, could watch the dog together with the experimenter on the video monitor outside. Had any dog experienced excessive stress, the experimenter would have stopped the experiment independently of the dog owner; this was never necessary. As judged by behavioral parameters, none of the dogs showed more than ordinary stress levels, so in no case did the owner or experimenter intervene to terminate the test situation. All participants were asked to sign consent forms and were informed about their voluntary participation and their right to stop the experiment at any time without providing a reason (see Section 2.11).

Dog Attachment Classification
The dog's attachment classification, based on the adapted ASSP, was analyzed according to the methods detailed in Schöberl et al. and Solomon et al. [30,34]. Based on the video recordings of the ASSP, coauthors AB and JS, two psychologists trained in attachment categorization of human toddlers, jointly classified the dogs (with 89% interrater reliability) into five categories: secure, insecure-avoidant, insecure-ambivalent, insecure-disorganized, and unclassifiable. KK and IS assisted with their expertise in dog behavior. Sample sizes and criteria were as follows: 27 of the 53 dogs were classified as 'securely attached': they eagerly approached their owners during the reunion and actively sought and tolerated physical contact, while also showing interest in exploring the room. Three dogs were classified as 'insecure-avoidant', showing little tendency to approach their owner during the reunion but spending much time exploring the room and the toys. Six dogs were classified as 'insecure-ambivalent' based on their tendency not to explore their environment after the reunion at all but instead to remain in close proximity to their owner. Nine dogs were categorized as 'insecure-disorganized', showing odd behavioral elements that are not part of the normal attachment repertoire, such as freezing, staring, or evident stereotypies. For eight dogs, no consensus could be reached; they were therefore labelled 'unclassifiable'.

Owner Caregiving Rating
The owner caregiving rating was developed by Solomon et al. [34] in the context of previous studies [30,31] and is based on Ainsworth's maternal sensitivity scale [20] and the 'supportive presence' scale [35]. The scale was designed to capture the caregiver's responsiveness and sensitivity to the dog's needs in a threatening situation [34]. During the threat task, an unfamiliar person entered the test room wearing a black coat, hat, and ski mask (with only the eyes visible). The unfamiliar person took three steps (at an interval of three seconds) towards the tethered dog while staring at its face. This process was repeated twice. After the second encounter, the unfamiliar person de-escalated the threat by stepping back, taking off the disguise, talking to the dog in a calming manner, and offering cheese. The dog owner was present during only one encounter with the unfamiliar threat; the order of the owner's presence or absence during the encounters was randomized for each dyad. The experimenter (and in one test scenario, also the owner) observed both conditions on monitors outside the room. The dyads participated in the mild threat task four weeks to a year prior to the ASS procedure. Caregiving behavior by owners during the threat task was rated on a seven-point caregiving scale for all dyads. The highest score of seven was given if the owner showed a consistent, quick, and flexible response in their caregiving behavior. The minimum score of one was given if the dog owner did not respond at all to the dog, or responded in a negative or punishing way. The ethics of the caregiving scale's development was reviewed by the 'Faculty of Life Sciences' at the University of Vienna (case number: 2014-015).

Audio Recording
The dog owners' vocalizations were recorded during the entire ASSP with an H4N recorder connected to a small Sennheiser (ew 100 G3) microphone attached to the clothing in the chest area. The sampling frequency was 48 kHz with 16-bit quantization, and the sensitivity was adjusted prior to recording to prevent clipping. The recording was stopped by the experimenter shortly after entering the room after episode ten.

Audio Treatment and Analysis
All 53 recordings had an adequate signal-to-noise ratio for audio analysis. The audio files of all dyads were prepared for semi-automated acoustic analysis by hand editing with Audacity (Version 2.1.1; www.audacityteam.org), removing ASSP episodes and episode parts in which either a talking second person was in the test room with the owner, or the dog owner quietly waited outside of the test room (Figure 1). Following this procedure, episodes 1, 4, 7, and 8, as well as minutes two to three of episode 3, were excluded with Audacity (1: experimenter talking in room; 3: stranger talking in room; 4, 7, and 8: owner waiting outside). The remaining episodes 0, 2, 3, 5, 6, 9, and 10 were run through a semi-automated analysis pipeline in Praat (Version 6.0.23; www.fon.hum.uva.nl/praat/). All analyses were conducted using five custom-written Praat scripts.
I. A: Normalization and prefiltering: Each audio file was adjusted to a maximum amplitude peak of 0.99 and treated with a spectral subtraction background noise filter using the Praat commands 'Scale peak' and 'Remove noise'.
I. B: Speech segmentation: Praat's 'To TextGrid (silences)' was used to label those portions of the audio containing the owners' speech. The speech stream segmentation was based on the silence between intervals, with a silence threshold of −35 dB, a minimum duration of 0.5 s for silent intervals, and a minimum duration of 0.07 s for spoken intervals. This process automatically labelled the relevant spoken intervals by marking them in the Praat TextGrid. The analyst then verified these automatic labels and, if necessary, made manual accuracy adjustments in cases of sound bursts created by the dog vocalizing or playing with the toys.
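The segmentation logic can be illustrated with a rough pure-Python analogue of Praat's 'To TextGrid (silences)' command. The function below is a simplified sketch using the thresholds from the text (−35 dB, 0.5 s minimum silence, 0.07 s minimum speech) on a precomputed list of per-frame intensity values; it is not a reimplementation of Praat's actual algorithm, and the function name and frame-based input are our own.

```python
def speech_intervals(frame_db, frame_dur,
                     silence_db=-35.0, min_silent=0.5, min_sounding=0.07):
    """Rough analogue of Praat's 'To TextGrid (silences)': group frames
    into sounding/silent runs, merge silent runs shorter than
    `min_silent` into the surrounding speech, and keep sounding runs of
    at least `min_sounding` seconds. `frame_db` is per-frame intensity
    in dB; `frame_dur` is the frame step in seconds."""
    labels = [db > silence_db for db in frame_db]
    # collapse frames into runs: [is_sounding, start_frame, n_frames]
    runs = []
    for i, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1][2] += 1
        else:
            runs.append([lab, i, 1])
    # silent runs shorter than min_silent do not split the speech
    merged = []
    for lab, start, n in runs:
        if not lab and n * frame_dur < min_silent:
            lab = True  # treat the short pause as part of speech
        if merged and merged[-1][0] == lab:
            merged[-1][2] += n
        else:
            merged.append([lab, start, n])
    # return (start_s, end_s) for sounding runs long enough to count
    return [(start * frame_dur, (start + n) * frame_dur)
            for lab, start, n in merged
            if lab and n * frame_dur >= min_sounding]


# 0.6 s silence, 1.0 s speech, a 0.2 s pause (merged), 0.5 s speech, 0.6 s silence
frames = [-50] * 6 + [-20] * 10 + [-50] * 2 + [-20] * 5 + [-50] * 6
print(speech_intervals(frames, 0.1))
```

With these settings, the 0.2 s pause is too short to count as a separating silence, so the two speech stretches are returned as one spoken interval from 0.6 s to 2.3 s.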
II: Pitch contour tracking: All spoken intervals were saved, and their pitch contours were tracked and extracted using the Praat command 'Extract visible pitch contour'. The Praat internal commands 'Get mean', 'Get minimum', and 'Get maximum' were used to measure the acoustic parameters mean, minimum, and maximum of each individual pitch contour.
III: Episode matching: A TextGrid tier with episode identifiers was added to match each spoken interval with the corresponding episode. The start and end time of all spoken intervals were extracted.
IV: Jitter measurements: The command 'To PointProcess' was used to create a point process object from the pitch contour extracted in script II. These 'PointProcesses' and the command 'Get jitter (local)' were used to measure vocal jitter. The default settings were used, except for 'Longest period (s)', where 0.033 s (corresponding to the minimum frequency measured) was entered.
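Local jitter, as defined in the Praat manual ('the average absolute difference between consecutive periods, divided by the average period'), can be sketched in a few lines. Discarding every period above the 0.033 s cutoff is our simplification of how Praat applies the 'Longest period' setting; the function name is hypothetical.

```python
def jitter_local(periods, longest_period=0.033):
    """Local jitter: mean |T_i - T_{i-1}| divided by mean T, where T_i
    are consecutive glottal periods in seconds. Periods above
    `longest_period` (0.033 s, i.e., f0 below ~30 Hz) are dropped as
    tracking errors -- a simplification of Praat's handling."""
    ps = [p for p in periods if p <= longest_period]
    if len(ps) < 2:
        return float("nan")  # not enough periods to compare
    diffs = [abs(b - a) for a, b in zip(ps, ps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(ps) / len(ps))


# four consecutive glottal periods around 200 Hz (0.005 s)
print(round(jitter_local([0.005, 0.0051, 0.0049, 0.005]), 4))  # → 0.0267
```

A perfectly periodic voice would give a jitter of 0; small cycle-to-cycle irregularities raise the value, which is why jitter is used as a voice quality measure.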

Analyzed Parameters
Four summary parameters of the f0 ('voice pitch') track were used to describe the owner's speech in spoken intervals throughout the test procedure (Figure 2): voiced duration, mean pitch, pitch range, and jitter. The semiautomated system measured each speech parameter below for every spoken interval. Intervals labelled as spoken but without measurable f0 were automatically excluded from further analysis. For readability, the fundamental frequency (f0) is referred to as 'pitch' throughout the rest of the paper.
Duration: Voiced utterance duration measured in seconds for each interval and calculated by subtracting the start time from the end time.
Mean Pitch: Mean f0 of the spoken interval in hertz.
Pitch Range: The minimum f0 measured within the interval, subtracted from the f0 maximum.
Jitter: Jitter was measured using Praat's algorithm 'To Jitter (local)'. This parameter is used as a measure of voice quality and is 'the average absolute difference between consecutive periods, divided by the average period' (Version 6.0.23; http://www.fon.hum.uva.nl/praat/manual/PointProcess__Get_jitter__local____.html).
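As a minimal sketch, the per-interval summary parameters defined above (duration, mean pitch, and pitch range) reduce to simple arithmetic over the interval's f0 samples. The function name and input format below are hypothetical; jitter is omitted because it is computed from the period point process rather than the f0 track.

```python
def interval_summary(start, end, f0_track):
    """Summary parameters for one spoken interval: voiced duration (s),
    mean pitch (Hz), and pitch range (max - min, Hz). `f0_track` is a
    list of f0 samples in Hz; zero stands in for unvoiced/undefined
    frames and is ignored."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return None  # intervals without measurable f0 are excluded
    return {
        "duration": end - start,
        "mean_pitch": sum(voiced) / len(voiced),
        "pitch_range": max(voiced) - min(voiced),
    }


print(interval_summary(1.0, 2.5, [200.0, 220.0, 0.0, 180.0]))
```

Returning `None` for intervals without measurable f0 mirrors the automatic exclusion step described above.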


Statistical Analysis
Statistical analyses were performed using R (Version 3.3.3; www.r-project.org/) and RStudio (Version 1.0.13; www.rstudio.com/) and the packages ggthemes, ggplot2, psych, data.table, visreg, dplyr, doBy, xlsx, usdm, tidyverse, GGally, stats, lme4, car, nlme, cowplot, multcomp, MuMIn, piecewiseSEM, sjstats, and MASS. The data set for this study is available in the Supplementary Materials (Spreadsheet S1). Across all 53 dyads included in this study, over 4100 measurements were analyzed for each of the four response variables. Prior to the detailed analysis, all fixed factors were checked for multicollinearity, and none was found. After visual inspection of the residuals of each response variable, the basic assumption of linear mixed models that the residuals follow a normal distribution could not be confirmed for jitter, mean pitch, pitch range, or voiced duration. Therefore, generalized linear mixed models (GLMMs) fit using maximum likelihood were calculated for all four variables. The best distribution fit for each model's response variable was established by inspecting the residuals' histogram and by plotting the full model's fitted residuals against the estimated residuals. Since the four response variables (jitter, mean pitch, pitch range, and voiced duration) were continuous and right-skewed, a Gamma distribution with a log link was used. Dyad was included as a random effect in each model to control for individual variance; the centered and scaled NEO-FFI personality traits (agreeableness, conscientiousness, extraversion, openness, and neuroticism), the centered and scaled caregiving rating, attachment pattern, and owner gender were added as fixed factors, and episode as a covariate of the full model. Because voice pitch varies by roughly an octave between adult men and women, it was essential to include gender as a fixed factor in each model.
All attachment categories except the secure group were collapsed into a nonsecure attachment group, a simplification necessary to reduce the number of levels within categories and facilitate model convergence. Null models included the covariate episode and the random effect dyad. All full models except the duration model proved a significantly better fit than the null model; based on this full-null model comparison, duration was excluded from further analysis. Up to this point, standard practice for fitting linear mixed models was followed. The further decisions in the statistical analysis require some explanation. A widely used, traditional, but criticized approach to finding the best-fitting model is stepwise model reduction [36], which excludes one variable at a time from the full model based on p-values or AICc scores until reaching the null model. Apart from the problems arising from the use of p-values [37], another issue arises in this context: stepwise reduction prevents an overview of all possible combinations of fixed factors explaining the response variable. To circumvent these limitations, we used the function 'dredge' of the MuMIn package, which fits every possible model composed of the full model's fixed factors and (in our case) ranks them by AICc score. The model with the lowest AICc score provides a baseline, and all models within a delta of 2 above it are considered equally good fits [38]. The models within this range are put through model averaging to create averaged coefficients. This process allows evaluation of the influence of all fixed factors on the response variable without the restriction of p-values. For compatibility with this approach of model averaging based on AICc values, an 85% confidence interval is used to judge each factor's impact on the speech parameters [39].
Factors whose confidence interval did not include 0 were deemed important in predicting voice parameters. Relative importance, which describes each factor's importance relative to the most valuable factor among the averaged coefficients, was used as a second measure confirming the confidence intervals.
This approach of averaging over all models within a delta of 2 of the lowest AICc was used to gain insight into the complex framework of CADS and its interactions with human personality and the human-dog attachment/caregiving system, without the unnecessary restriction of eliminating valuable comparisons from step one. For a review of null hypothesis significance testing, information-theory-based approaches, and their possible combination as used here, see Mundry 2011 [40].
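The 'dredge' workflow (fit all subsets of the fixed factors, keep every model within ΔAICc ≤ 2 of the best, and average their coefficients by Akaike weight) can be illustrated with ordinary least squares standing in for the Gamma GLMMs of the actual analysis. Everything below (function names, synthetic data) is a sketch of the selection logic, not the study's code.

```python
import itertools
import math

import numpy as np


def aicc(rss, n, k):
    """Small-sample AICc from a Gaussian log-likelihood; k counts the
    intercept, slopes, and residual variance."""
    ll = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return -2 * ll + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def dredge_average(y, predictors):
    """Fit every subset of `predictors` (name -> 1-D array) by OLS,
    rank by AICc, and average the coefficients of all models within
    delta <= 2 of the best, weighted by Akaike weights. Factors absent
    from a model contribute zero to their averaged coefficient."""
    names = list(predictors)
    n = len(y)
    fits = []
    for r in range(len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([np.ones(n)] +
                                [predictors[p] for p in subset])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = float(np.sum((y - X @ beta) ** 2))
            fits.append((aicc(rss, n, X.shape[1] + 1), subset, beta))
    best = min(score for score, _, _ in fits)
    top = [f for f in fits if f[0] - best <= 2.0]          # delta <= 2 set
    w = np.array([math.exp(-0.5 * (s - best)) for s, _, _ in top])
    w /= w.sum()                                           # Akaike weights
    avg = {p: 0.0 for p in names}
    for weight, (_, subset, beta) in zip(w, top):
        for coef, p in zip(beta[1:], subset):
            avg[p] += weight * coef
    return avg, top


# synthetic example: y depends on x1 (slope 2) but not on x2
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=60), rng.normal(size=60)
y = 2.0 * x1 + rng.normal(scale=0.1, size=60)
avg, top = dredge_average(y, {"x1": x1, "x2": x2})
print(avg)
```

With a strong true effect of x1, every model in the delta ≤ 2 set contains x1, its averaged coefficient sits near 2, and the averaged coefficient for the irrelevant x2 shrinks toward zero.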

Ethics
All participants were asked to sign consent forms, were informed about the procedure, and could terminate participation at any time. The ethics regarding human participation in the ASS procedure were reviewed and approved by the German Society for Psychology (Deutsche Gesellschaft für Psychologie, AB 07_2011). All human/animal data collection and analysis was done in accordance with the Declaration of Helsinki and EU Directive 2010/63/EU for animal experiments. The ethics for this study were reviewed and approved by the 'Faculty of Life Sciences' animal welfare committee at the University of Vienna (case number: 2014-015).

Results

Mean Pitch
We found owner gender to affect pitch across episodes (Table 1). CADS was consistently higher in voice pitch than read speech (male mean: 121 ± 20 Hz SD; female mean: 203 ± 33 Hz SD) or conversational speech (male mean: 119 ± 19 Hz SD; female mean: 204 ± 38 Hz SD; Figure 3). Both male and female owners' highest median pitch was recorded during the call episodes (male mean: 176 ± 32 Hz SD; female mean: 300 ± 57 Hz SD). In both genders, the reunion, exploration, and encounter episodes had a similar mean pitch. No effect of personality traits on CADS mean pitch was observed in either male or female owners.

Pitch Range
We found gender and openness to affect pitch range across episodes (Table 2). Pitch range in CADS and conversation was reduced relative to reading: both male and female owners showed the highest median pitch range in the reading control condition (Figure 4A). The median pitch range in the reunion, exploration, and encounter episodes in both men and women was lower than in the speaking (male mean: 73 ± 48 Hz SD; female mean: 124 ± 84 Hz SD) and reading controls (male mean: 84 ± 42 Hz SD; female mean: 146 ± 54 Hz SD; Figure 4A). Female but not male owners high in openness showed an increased frequency range (Figure 4B).

Jitter
Jitter results in the CADS condition were highly variable. We found owner gender and openness to influence jitter across episodes (Table 3). The median jitter was lowest in the encounter (male mean: 0.014 ± 0.009% SD; female mean: 0.01 ± 0.007% SD) and call episodes (male mean: 0.009 ± 0.004% SD; female mean: 0.007 ± 0.003% SD) in both male and female owners. The speaking control condition had the highest median percentage of jitter in both men (male mean: 0.018 ± 0.008% SD) and women (female mean: 0.012 ± 0.005% SD). Male owners' reunion and exploration episodes had a lower percentage of voice jitter than their speaking control condition; women's percentage of jitter in the same episodes was similar to the speaking control (Figure 5A). Male owners' openness values scaled positively with the percentage of jitter in their vocalizations; this effect was comparatively weaker in female owners (Figure 5B).

Discussion
We analyzed vocal parameters in dog-directed speech during a series of controlled encounters in dog-human dyads and compared these to normal speech (a read passage, or between-human conversation). The staged encounters were designed to elicit arousal in the dogs and caregiving from their owners. In general, we found that CADS was higher in f0 ("voice pitch") than normal speech but showed a narrower pitch range. Results concerning pitch perturbation (jitter) were quite variable and showed no clear effect of arousal; the most pronounced effect was a considerable decrease in jitter associated with the owner calling the dog's name in the two "call" episodes.
Compared to earlier work on CADS, our results support the finding of an increased pitch but fail to support the previously described broader pitch range. Jeannin et al. [29] used an approach partially similar to the ASSP and found voice pitch in dog-directed speech to increase in reunion episodes compared to adult-directed speech. Our results showed the same pattern, with a strong increase in voice pitch during the reunion episodes. Gergely et al. [28] reported a broader frequency range in CADS in mothers compared to fathers. We found the same effect, with women using a broader frequency range during CADS than men did. A well-described phenomenon in IDS and CADS is the differential use of voice pitch and frequency range: pitch is said to communicate affect, whereas range is used more for attention-getting [10,28,41-43]. This different function of pitch and range may also explain why we did not find a broader pitch range. Pitch range might simply be adjusted strongly in accordance with context: if the owner wants to draw the dog's attention away from something, a broad pitch range could be used; the opposite would be true if the owner already has the dog's attention and is trying to soothe and calm it [41]. This would explain why the owners used a broader pitch range and a high pitch while calling their dogs, hoping to draw their attention while also communicating positive intentions. The pitch in the exploration, encounter, and reunion episodes was lower than in the call episodes but still elevated compared to the read and adult-directed speech. This elevated pitch coincided with the narrowest pitch ranges in the same episodes.
The exploration, encounter, and both reunion episodes have in common that most dogs had already directed their attention towards their owner (in a 2013 study, 85% of the dogs followed their owners around in the reunion and encounter episodes [44]); no broad pitch range was needed to keep their focus. The owners continued to use an elevated pitch to communicate positive affect and a nonthreatening situation.
The results regarding jitter were less clear and more variable. Both males' and females' adult-directed speech had the highest percentage of jitter, compared to the lowest in the call episodes. Female owners' median jitter slowly returned to the level of the reading control throughout the exploration, encounter, and reunion episodes, while male owners' jitter stayed elevated during the exploration and reunion episodes. Interestingly, jitter was highest in adult-directed speech, contrary to what we had expected. An explanation might be found in the hypothesis that an increase in a speaker's stress leads to decreased jitter due to higher tension on the vocal folds [1,4]. This might be a parabola-like phenomenon, with relaxed and extremely stressed speakers producing the highest jitter values. This hypothesis would explain why jitter would be lowest when the speaker is stressed but not overly so: stressed enough to cause tension (in the body and the vocal folds), yet mildly enough to be coped with. This hypothesis fits our results best, but further research is needed to assess it empirically.
Our work also illustrates the importance of owner gender and personality in CADS. Not only does our study support the idea that CADS is gender-specific and variable enough among dyads to constitute an important component of human-dog caregiving and attachment strategy, we also found openness to drive CADS modulation within this ASSP setup. The influence of personality on CADS seems to be gender-specific and may be considered part of gender-specific performance. Men's openness was positively correlated with utterance jitter, while women showed almost no correlation; higher scores in openness correlated with a higher pitch range in women but not in men. The limited literature on personality in the context of CADS restricts us to speculation as to why openness might influence jitter and pitch range. One intriguing hypothesis could explain the influence of this personality axis: the ASSP is designed to evoke arousal and a stress response (and therefore cortisol excretion) in the dogs, which in turn causes a similar stress response in the owners, and research suggests a link between cortisol stress response, personality, and gender [45,46]. We propose that this link is reflected in the vocal parameters of jitter and pitch range. This possible interaction of personality, cortisol, and CADS might be an interesting field for further research.

Conclusions
To summarize, our data partially support and partially contradict our initial hypothesis of an increase in voice source variables with increasingly arousing situations. We did find voice pitch to increase in the ASSP setup in comparison to normal speech, but we found the opposite for pitch range. The decrease in pitch range might be caused by the differential use and function of mean pitch and range: with the narrower range, the owners tried to calm the dogs in this stressful setting. Due to limited reports in the literature and close to no precedent, our predictions and hypotheses on jitter were less clear. The idea that voice perturbation increases with arousal was not supported by our data, but jitter did fit the hypothesis of being (at least partially) personality-dependent.
Acknowledgments: We thank Kershenbaum and one anonymous reviewer for comments and suggestions for improvement on a previous version of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.