Intra-Individual Behavioural Variability: A Trait under Genetic Control

When individuals are measured more than once in the same context they do not behave in exactly the same way each time. The degree of predictability differs between individuals, with some individuals showing low levels of variation around their behavioural mean while others show high levels of variation. This intra-individual variability in behaviour has received much less attention than between-individual variability in behaviour, and very little is known about the underlying mechanisms that affect this potentially large but understudied component of behavioural variation. In this study, we combine standardized behavioural tests in a chicken intercross, used to estimate intra-individual behavioural variability, with a large-scale genomics analysis to identify genes affecting intra-individual behavioural variability in an avian population. We used a variety of anxiety-related behavioural phenotypes for this purpose. Our study shows that intra-individual variability in behaviour has a direct genetic basis that is largely distinct from the genetic architecture of the standard behavioural measures it is based on (at least for the detected quantitative trait loci). We identify six suggestive candidate genes that may underpin differences in intra-individual behavioural variability, with several of these candidates having previously been linked to behaviour and mental health. These findings indicate that intra-individual variability in behaviour is a heritable trait in and of itself on which evolution can act.


Introduction
Individuals within a population are often repeatable in many aspects of their behaviour [1]. Repeatable differences in behavioural traits, such as aggressiveness, shyness, sociability and activity between individuals have given rise to a growing field focusing on animal personality and have motivated the development of many evolutionary theories aimed at understanding the processes that allow and maintain such within-population variation [2].
Individual repeatability does not mean that the behaviour of an individual is fully predictable or stable. An increasing number of studies have shown that when individuals are measured more than once in the same context they do not behave in exactly the same way each time [3]. Interestingly, the degree of intra-individual variation also differs between individuals, with some individuals showing low levels of variation around their behavioural mean while others show high levels of variation [4]. This can result both from intrinsic individual variability in behaviour and from individual differences in phenotypic plasticity in response to unmeasured internal or external stimuli [5]. Whereas the proximate and ultimate causes of inter-individual variation in behaviour have been an area of intense research interest across animal taxa [6], intra-individual variation in behaviour has previously been assumed to be homogeneous across individuals. Interest in this field has increased, however. A variety of studies have assessed the extent of this variation in different species and how it can affect ecologically relevant situations [7][8][9][10][11][12][13]. Despite this interest, almost nothing is known about the genetic factors that affect this large but understudied component of behavioural variation [14][15][16], with only one study assessing its heritability to date [17]. Insight into the potential genetic control of intra-individual behavioural variability is crucial for disentangling its proximate causes and thereby fine-tuning tests of hypotheses about the evolution of this type of behavioural variability [18]. Knowledge of the genetic basis of intra-individual behavioural variability can also help to clarify the link between personality and behavioural variability, and to what extent and how intra-individual behavioural variability is consistent across different situations.
A classical approach to identifying the genetic regions or loci that underpin a trait is quantitative trait locus (QTL) mapping. In this process, two populations divergent for the trait of interest are intercrossed for two or more generations to create an F2 or further intercross. This intercross can then be used to map genetic loci affecting the trait of interest by genotyping the individuals using markers that can differentiate between the two parental populations. These markers can then be used to map the larger-effect loci that underpin the variation between the two parental populations. This approach can be further complemented by using an advanced intercross (intercrossing for additional generations to generate more recombinations) and integrating gene expression data (expression QTL) to further aid in gene identification. Such intercross populations are poorly suited to estimating the heritability of the founder populations (essentially only representing the individuals actually used in the initial parental intercross), but offer the ability to identify the actual genomic regions affecting the trait of interest.
In this study, we combine standardized behavioural testing, used to estimate intra-individual variation in anxiety- and sociability-related behaviours, with a previously generated large-scale combined genetics and genomics analysis to attempt to identify genes affecting intra-individual behavioural variability in a population of intercross chickens. Whereas studies on intra-individual variation more typically restrict themselves to a single trait, here we have replicate measures (two per test) from three separate anxiogenic tests. This allows us to assess how repeatable intra-individual variation is within and between different contexts and test conditions. By combining quantitative trait locus (QTL) and expression QTL (eQTL) analyses of the brains of an advanced intercross based on Red Junglefowl (the wild progenitor of the modern domestic chicken) and domestic White Leghorn chickens, we identify putative genes underlying phenotypic differences in intra-individual behavioural variability and assess to what degree these differ between behavioural traits.

Overview of Study Performed
The main objectives of this study were to test whether intra-individual behavioural variability is trait-specific, to characterise the genetic architecture of this trait and whether it is distinct from the standard trait QTL, and to determine whether candidate genes for these traits could be identified. To do this, we used an eighth-generation advanced intercross between wild and domestic chickens. These birds had already been used in a genetical genomics experiment, with gene expression measured in the hypothalamus and with three separate anxiety-related traits measured: Open field (OF), Social reinstatement (SR), and Tonic immobility (TI). These traits have already been QTL mapped for their mean trait effects [19][20][21], while expression QTL (eQTL) were also mapped in these birds, based on the gene expression profiles. eQTL are similar to standard QTL, but the phenotypes used are gene expression values. To identify candidate genes for these anxiety behaviours, first the behavioural trait QTL were mapped, then the eQTL were mapped, before the two sets of QTL and eQTL were overlapped with one another. Any eQTL that overlapped with a QTL was considered to be a potential candidate. As each individual in the study with a gene expression profile (n = 129) was also assayed for anxiety behaviour, it was possible to correlate the expression of each of these overlapping genes with the behavioural trait it overlapped with. Any genes that were significantly correlated were then taken forward to be assessed using a statistical causality analysis. This analysis is used to determine whether a genomic location controls gene expression, and whether that gene expression in turn controls a phenotypic (behavioural) trait. In this way, it is possible to identify genes that putatively control a particular phenotypic trait.
The same approach was therefore applied to the intra-individual variability (IIV) traits that were derived from the previously mapped anxiety behaviours. As eQTL and gene expression profiles were also available, it was possible not only to QTL map the IIV traits, but also to overlap them with the eQTL and test these further to assess potential candidate genes regulating anxiety-related IIV behaviours. As the genetic architecture (QTL locations and effects) was already available for the standard trait effects (individual trials and means of the two trials), it is possible to compare the two types of genetic architecture to assess how closely they overlap and how much they differ.

Chicken Study Population and Cross Design
The population used in this study was an eighth-generation intercross between a population of Red Junglefowl, derived originally from Thailand, and a line of selected White Leghorn chickens [22,23], with a total of 572 F8 individuals used in this study. The intercross was based on 3 White Leghorn (WL) females and 1 Red Junglefowl (RJF) male. These 4 birds were used to generate 41 F1 and then 811 F2 progeny. The 572 F8 individuals were produced over six batches and were generated from 118 families using 122 F7 individuals (63 females and 59 males), while the average family size in the F8 was 4.76 ± 3.1 (mean ± s.d.). This advanced intercross has already been used to identify candidate genes underlying variation in anxiety- and sociability-related behaviours [19][20][21]. The birds were behaviourally tested at between 3 and 5 weeks of age (for the social reinstatement and open field assays) and in adulthood (for the tonic immobility assay). All individuals were culled by cervical dislocation followed by decapitation (as per the ethical permit) at 212 days of age, after which the hypothalamus was dissected out and RNA extracted. For further details on feed and housing see [24]. The study was approved by the local Ethical Committee of the Swedish National Board for Laboratory Animals, ethical permit Dnr 50-13.

Behavioural Phenotyping
The advanced intercross birds had previously been phenotyped and mapped for three different behavioural traits: open field behaviour, social reinstatement and tonic immobility, with each measuring some aspects of anxiety-related behaviour. The specific tests are described in detail below.

Social Reinstatement
The social reinstatement test [25] measures social motivation and anxiety, with stressed chicks exhibiting a stronger social cohesion response [26]. In this test, the individual is placed at one end of a narrow arena, with conspecifics located at the far end. The amount of time the individual spends associated with the conspecifics as opposed to exploring the remainder of the arena is considered a measure of sociality and anxiety. A more social or anxious animal will spend more time associating with conspecifics and will approach the conspecifics more rapidly, and therefore spend less time in the start zone of the arena [26]. Trials were performed in a 100 cm × 40 cm arena. The social zone measured 20 cm × 40 cm and was adjacent to a wire mesh compartment containing three unfamiliar conspecific birds of the same age. Birds were placed in the start zone of the arena (also measuring 20 cm × 40 cm) in the dark, prior to the lights being turned on and the trial beginning. Measurements were taken using the Ethovision software and continuous video recording (Noldus Information Technology, www.noldus.com). For each trial, total distance moved, length of time spent in the stimulus zone, latency to first enter the stimulus zone, and length of time in the start zone were measured. Each trial lasted five minutes and was performed twice per individual, with one week between an individual's first and second trial. Trials were first performed at 3 weeks of age. Individuals were removed from the arena immediately upon completion of the test to reduce potential habituation.

Open Field
The open field assessment is a standard anxiety measurement in which the individual is placed alone in a brightly lit novel arena, after which behaviour is measured for a fixed duration. It has been performed in a variety of vertebrates and invertebrates [27]. In our study, trials were performed in a 100 × 80 cm arena. Individuals were placed in the corner of the arena in complete darkness prior to the test starting, with the lights turned on immediately at the commencement of the test. Trials lasted 5 min. Measurements were taken using the Ethovision software and continuous video recording (Noldus Information Technology, www.noldus.com). For each trial, total distance moved, proportion of time spent in the central zone (the 60 × 40 cm area in the middle of each arena), velocity, and frequency (number of times) that the central zone was entered were measured. Velocity is the average speed of movement between consecutive time-points, and can therefore distinguish whether an animal moves rapidly or slowly through the arena. Each trial was performed twice, with ~1 week between an individual's first and second trial and with the first trial at 4 weeks of age. The inter-test interval was the same as for the social reinstatement test and the tonic immobility test. Individuals were removed from the arena immediately upon the test finishing to reduce potential habituation.

Tonic Immobility
The third test, the tonic immobility test, measures an individual's duration of immobility after being placed on its back. Tonic immobility is thought to be a defence strategy that evolved to reduce a predator's interest in the prey, with the prey ceasing to move once it has been caught. The longer the animal stays in this immobile state, the more fearful it is considered to be. In our study, the test bird was placed on its back in a V-shaped wooden cradle (approximately 50 cm in length) and held by the experimenter with one hand over the sternum. The bird was held for 10 s and then the hand was slowly removed. The duration of tonic immobility was recorded up to 600 s. If the bird stood up within 30 s after the hand was removed from its sternum, new attempts to induce tonic immobility were made, with up to three attempts per bird (for more information see [19]). The birds were tested after sexual maturity at 170 days of age, with 7 days between each trial.

Defining Intra-Individual Variability as a Trait
The main objective of this study was to test whether intra-individual behavioural variability is a discrete trait, separate from the behaviour itself, and in particular whether we can detect genomic loci that underlie this trait. We therefore calculated the degree of intra-individual variability present in the three separate behavioural tests already measured. As each test was performed twice for each bird, it was possible to calculate the magnitude of the difference between the two trials to give an estimate of the intra-individual variation (IIV, see Figure 1, and below) for each of the specific sub-phenotypes taken from each test (i.e., the IIV of distance moved in the open field, velocity in the open field, etc.), with increasing magnitude indicating increasing intra-individual variation. First, the magnitude of intra-individual variation was calculated for each behaviour (IIVtrait). Second, these magnitudes were used to calculate an average intra-individual variation magnitude per test (IIVaverage). Finally, a global magnitude of intra-individual variation was calculated for each individual based on all the recorded behaviours from the three tests (IIVglobal; see below for more details). These composite variables were calculated to see whether combined overall metrics for these traits could also be used to identify general IIV QTL for all sub-traits in a test, and even whether QTL for general IIV over multiple tests could be identified. Such loci can also be found when multiple different IIV QTL overlap the same location.

Calculating the Magnitude of Intra-Individual Behavioural Variability
Intra-individual variation (IIV) was calculated for each behavioural trait (IIVtrait) as the absolute difference between the values obtained for that trait in trial 1 and trial 2. For these individual IIV trait values, the QTL analysis was run both with and without the mean trait score as a covariate (see below). An average intra-individual variation (IIVaverage) was calculated for each test situation as the mean of all IIVtrait values calculated within that test situation. Finally, a global intra-individual variation (IIVglobal) was calculated for each individual as the mean of the IIVtrait values over all the recorded behaviours from the three tests. Trait means and standard deviations are included in Supplementary Materials Table S1. To test for correlations between IIV traits, a Pearson's correlation test was used. The three measures are defined as:

\[ \mathrm{IIV}_{\mathrm{trait}} = \left| \mathrm{Trait}_{\mathrm{Trial\,1}} - \mathrm{Trait}_{\mathrm{Trial\,2}} \right| \]
\[ \mathrm{IIV}_{\mathrm{average}} = \frac{\sum \mathrm{IIV}_{\mathrm{trait}} \text{ within a test}}{\text{number of traits measured in that test}} \]
\[ \mathrm{IIV}_{\mathrm{global}} = \frac{\sum \mathrm{IIV}_{\mathrm{trait}} \text{ over all tests}}{\text{total number of traits measured}} \]
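The three definitions above can be sketched for a single bird as follows. This is a minimal illustration only: the trait labels and trial values are hypothetical, and the sketch averages raw trait values exactly as the formulas are written, without assuming any rescaling of traits measured in different units.

```python
from statistics import mean

# Hypothetical trial values for one bird, keyed by (test, trait).
# Labels and numbers are illustrative only, not data from the study.
trial_1 = {("OF", "distance"): 800.0, ("OF", "velocity"): 3.0,
           ("SR", "latency"): 40.0, ("TI", "duration"): 120.0}
trial_2 = {("OF", "distance"): 700.0, ("OF", "velocity"): 2.5,
           ("SR", "latency"): 10.0, ("TI", "duration"): 200.0}


def iiv_trait(t1, t2):
    """IIVtrait: absolute difference between trial 1 and trial 2, per trait."""
    return {key: abs(t1[key] - t2[key]) for key in t1}


def iiv_average(iiv, test):
    """IIVaverage: mean of the IIVtrait values measured within one test."""
    return mean(value for (t, _), value in iiv.items() if t == test)


def iiv_global(iiv):
    """IIVglobal: mean of the IIVtrait values over all tests."""
    return mean(iiv.values())


iiv = iiv_trait(trial_1, trial_2)
print(iiv[("OF", "distance")], iiv_average(iiv, "OF"), iiv_global(iiv))
```

Because the absolute difference is used, the direction of change between trials is discarded, so a bird that moves further in trial 2 and one that moves less by the same amount receive the same IIVtrait.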

Genotyping and QTL Mapping
A full set of genotypes and the required marker map for QTL mapping had been generated previously (see [28,29] for full details). This map comprised a total of 652 SNPs, giving a map length of ~9267.5 cM with an average marker spacing of ~16 cM. Note that the study presented here is an example of a classical linkage study: it uses linkage between markers to map the recombinations that occur between two populations that have been intercrossed, with these recombinations occurring over a fixed series of intercross generations [30]. In contrast, a genome-wide association study uses the linkage disequilibrium that exists in a single natural population (built up through historical recombinations over a much longer period of time). The advantage of a linkage study is that the genome is covered using relatively few markers. Little is gained from increasing the marker density to less than 10 cM between markers, with the recommended marker density for standard QTL mapping being an average distance of 20-30 cM between markers [31]. With an average marker spacing of ~16 cM, the study presented here meets these recommended guidelines. The increase in the total number of recombinations present in the intercross is reflected in the threefold increase in total map length between the F2 and F8 generations (~3000 cM in the F2 versus ~9000 cM in the F8).
QTL analysis was performed using the R/qtl software package [32], and interval mapping was performed using an additive + dominance model. Batch and sex were always included in the behavioural QTL analysis as fixed effects, while a principal component to account for population structure was included as a covariate. In addition, for all the individual trait IIV values, two analyses were run, one with and one without the trait mean as a covariate. This is important to ascertain whether effects are due to the range being proportional to the mean trait value [7]. A sex interaction effect was added, when significant, to account for a particular QTL varying between the sexes. Digenic epistatic analysis was performed according to the guidelines of Broman and Sen (2009) [33], and a global model incorporating standard main effects, sex interactions, and epistasis was built. The most significant loci were added to the model first, followed by the less significant loci.
eQTL mapping had also been performed previously on the cross using R/qtl [21]. A local, potentially cis-acting, eQTL (defined as a QTL located close to the target gene affected) was called if a signal was detected at the closest flanking markers to the gene in question, within a minimum window of 100 cM around the gene (i.e., 50 cM upstream and downstream of the gene). A distance of 50 cM was used to ensure that at least two markers upstream and downstream of the gene location were selected, enabling interval mapping to be performed. The trans-eQTL scan encompassed the whole genome and used a genome-wide empirical significance threshold. In total, 535 local eQTL and 99 trans-eQTL were identified previously.

Significance Thresholds
Significance thresholds for the behavioural QTL analysis were calculated by permutation tests [34,35]. Permutation is by far the most standard method for calculating significance thresholds in QTL mapping [30], and involves shuffling the phenotype (trait) while maintaining the genotype structure for each individual. For each permuted dataset, a full QTL mapping scan is conducted and the highest LOD score over all the positions assessed is retained. This is repeated 1000 times, giving 1000 permuted values that show the highest QTL effect detected by chance in each case. To assess significance in the original data, any QTL detected are then compared to the permuted data, which are used to generate an experiment-wide threshold that controls for the large number of tests performed during QTL mapping (i.e., with a marker map of 9000 cM, essentially 9000 tests are being performed, though this is complicated by the fact that many are not independent from one another, hence the power of the permutation approach). The n.perm argument of the scanone function in the R/qtl package was used to perform these permutations and generate the significance thresholds [32]. A genome-wide 20% threshold was considered suggestive, this being more conservative than the standard suggestive threshold [36], while a 5% genome-wide level was considered significant. The 5% significance threshold was ~LOD 4.4, while the suggestive threshold was ~LOD 3.6. Confidence intervals for each QTL were calculated with a 1.8 LOD drop method (i.e., the interval is bounded where the LOD score on either side of the peak decreases by 1.8 LOD), with such a threshold giving an accurate 95% confidence interval for an intercross-type population [37]. The nearest marker to this 1.8 LOD decrease was then used to give the confidence intervals in megabases. Epistatic interactions were also assessed using a permutation threshold generated using R/qtl, with a 20% suggestive and 5% significant genome-wide threshold once again used.
In the case of epistatic loci, the approximate average LOD significance thresholds for pairs of loci were as follows (using the guidelines given in Broman and Sen [33]): full model ~11, full vs. one ~9, interactive ~7, additive ~7, additive vs. one ~4. Although permutation testing in this manner accurately controls for the multiple testing that arises from the number of markers tested during QTL mapping, it does not control for the number of independent traits being analysed. In this case, we have analysed 13 different traits taken from three separate behavioural tests. Within each test, however, the traits are strongly correlated with one another (see Table 1). As correlated traits are not independent, they do not each need to be corrected for. Of the tests, all open field IIV traits were strongly correlated with one another, while social reinstatement IIV traits were essentially composed of two independent groups. Strong correlations also existed between the traits composed of averages over different tests (global IIV, social reinstatement and open field behaviour IIV). As these composite traits were significantly correlated with the individual IIV measurements and therefore not independent, it was not necessary to apply a multiple testing correction for them. We therefore applied a multiple testing correction of 4, representing the three different test types plus one additional correction because social reinstatement essentially comprised two groups. To correct the LOD thresholds, log10(4) ≈ 0.6 was added to each threshold (one LOD being equivalent to a tenfold increase in significance). This led to final suggestive and significant thresholds of ~4.2 and ~5.0 LOD, respectively.
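The permutation procedure and the subsequent correction for the number of independent trait groups can be sketched as below. This is a simplified illustration under stated assumptions, not the R/qtl analysis used in the study: it replaces interval mapping with an additive + dominance model and covariates by a single-marker additive regression on genotypes coded 0/1/2, and the function names (`lod_score`, `permutation_thresholds`, `correct_for_traits`) are hypothetical.

```python
import math
import random


def lod_score(genotypes, phenotypes):
    """LOD for one marker from a simple additive regression (assumed stand-in
    for interval mapping): (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or rss_null == 0:
        return 0.0  # uninformative marker or constant phenotype
    rss_marker = rss_null - sxy ** 2 / sxx
    if rss_marker <= 0:
        return float("inf")  # perfect fit
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_thresholds(marker_genotypes, phenotypes, n_perm=1000,
                           levels=(0.20, 0.05), seed=1):
    """Shuffle phenotypes, keep the genome-wide maximum LOD of each scan,
    and return the (1 - level) quantile of those maxima for each level."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # genotypes stay fixed, phenotypes are permuted
        max_lods.append(max(lod_score(g, shuffled) for g in marker_genotypes))
    max_lods.sort()
    thresholds = {}
    for level in levels:
        index = max(0, min(n_perm - 1, int(round((1 - level) * n_perm)) - 1))
        thresholds[level] = max_lods[index]
    return thresholds


def correct_for_traits(threshold, n_independent_groups):
    """Add log10 of the number of independent trait groups to a LOD threshold."""
    return threshold + math.log10(n_independent_groups)


# Thresholds reported in the text, corrected for four independent groups.
print(round(correct_for_traits(3.6, 4), 1), round(correct_for_traits(4.4, 4), 1))
```

Keeping only the maximum LOD per permuted scan is what makes the resulting threshold genome-wide: it describes the largest peak expected anywhere in the genome by chance, so correlated neighbouring positions do not inflate the number of effective tests.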

Candidate Gene Analysis
To identify putatively causal genes underlying intra-individual behavioural variability, candidate QTL were overlapped with eQTL detected in the same cross (see [21]). For this overlap, any eQTL whose 95% confidence interval overlapped with the 95% confidence interval of a behavioural IIV QTL was identified as a putative candidate for further analysis. Once these QTL and eQTL were overlapped, each significant eQTL that overlapped a QTL was tested for correlation with the IIV behavioural trait of that QTL. For each eQTL overlapping a behavioural QTL, a linear model was fitted with the behavioural trait as the response variable and the expression trait as the predictor, including sex and batch as factors. Weight at 42 days was included for traits where weight was used as a covariate in the QTL analysis. All non-significant co-factors were then removed sequentially from the model. A multiple testing correction was included based on the number of overlapping eQTL present in each QTL region, though where these probes were correlated with one another the multiple testing correction was reduced accordingly, as such probes are non-independent. Any probes that were suggestive (p-value below a nominal 0.05 threshold) or significantly correlated (multiple-testing-corrected p-value below 0.05) were then used for the final causality analysis using NEO (see below). One issue with using this approach with this particular data set is that the behavioural QTL were based on up to 572 individuals, whereas the eQTL/expression phenotypes were available for only 129 individuals. Therefore, the network edge orienting (NEO) method for causality testing was applied only where the behavioural QTL for which a gene was potentially causal was detectable in the smaller data set (n = 129). 
This technique has previously been successfully used with this intercross to detect genes that were potentially causal for both open field and social reinstatement behaviour [20,21].
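The first step of this pipeline, selecting eQTL whose confidence interval overlaps that of a behavioural IIV QTL, can be sketched as a simple interval test. The `Interval` type, the chromosome labels, the positions and the probe names below are all hypothetical and are used only to illustrate the overlap rule.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A 95% confidence interval on one chromosome (positions in megabases)."""
    name: str
    chromosome: str
    start: float
    end: float


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals lie on the same chromosome and share any position."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def candidate_eqtl(qtl: Interval, eqtl_list: list) -> list:
    """Names of the eQTL whose confidence interval overlaps the QTL interval."""
    return [e.name for e in eqtl_list if overlaps(qtl, e)]


# Hypothetical intervals, for illustration only.
qtl = Interval("IIV_QTL", "chr2", 40.0, 55.0)
eqtl = [
    Interval("probe_A", "chr2", 50.0, 60.0),  # overlaps the QTL interval
    Interval("probe_B", "chr2", 56.0, 70.0),  # same chromosome, no overlap
    Interval("probe_C", "chr5", 40.0, 55.0),  # same positions, other chromosome
]
print(candidate_eqtl(qtl, eqtl))
```

Only the probes returned by this step would go on to the correlation test and, if suggestive or significant, to the causality analysis.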

Network Edge Orientation (NEO) Analysis
Causality analysis was performed using the NEO software [38] to test whether the expression of correlational candidates was consistent with the transcript having a causal effect on the behavioural trait. Single-marker analysis was performed with NEO fitting a causal model (marker → expression trait → behaviour) and three other model types (reactive, confounded, and collider). The models tested by NEO were:
(1) CAUSAL: Genotype modifies gene expression, which in turn modifies behaviour (genotype → expression trait → behaviour).
(2) REACTIVE: Genotype modifies behaviour, which in turn modifies the expression trait (genotype → behaviour → expression trait).
(3) CONFOUNDED: Genotype modifies both the expression trait and the behaviour separately (expression trait ← genotype → behaviour).
(4) COLLIDER (behaviour is the collider): Genotype and the expression trait both independently modify behaviour (genotype → behaviour ← expression trait).
(5) COLLIDER (expression is the collider): Genotype and behaviour both independently modify the expression trait (genotype → expression trait ← behaviour).
The NEO software evaluates the fit of each model with a χ2 test, a higher p-value indicating a better fit of the model. The best-fitting model is chosen based on the ratio of the χ2 p-value of the causal model (the preferred model) to the p-value of the next best-fitting model (of the four remaining) on a logarithmic scale (base 10), called the local edge orienting against the next best model (leo.nb) score. The leo.nb score therefore quantifies the relative probability of the causal model compared with the next best model, and a positive leo.nb score indicates that the causal model fits better than any competing model. Aten et al. [38] use a leo.nb threshold of 0.8 when two traits share the same SNP-anchor locus, and a threshold of 0.3 for the multiple genetic marker score leo.nb.oca. More recently, other authors have relaxed this to use a threshold of 0.3 for the leo.nb score [39,40].
Aten et al. [38] also suggest that users inspect the p-value of the causal model to ensure the fit is good (in this case meaning that the model p-value should be non-significant if the causal model fits best). In effect, this p-value is the probability of another model fitting the observed data (i.e., if it is significant (p < 0.05), then another model also fits the data; if it is not significant, then the causal model is the only one that fits). For each gene, we report the leo.nb score and the p-value of the causal model. Leo.nb scores of 0.3 or more were considered suggestive, whilst leo.nb scores of 1.0 or more were considered significant.
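The leo.nb score and the thresholds applied to it can be illustrated with a short calculation. This sketch follows only the verbal description given here (log10 of the ratio between the causal model p-value and the best competing model p-value); it is not the NEO software's implementation, and the function names and example p-values are hypothetical.

```python
import math


def leo_nb(p_causal, p_alternatives):
    """log10 ratio of the causal model's chi-square p-value to the p-value of
    the best-fitting competing model (higher p-value = better fit)."""
    return math.log10(p_causal / max(p_alternatives))


def classify(score, suggestive=0.3, significant=1.0):
    """Apply the suggestive (0.3) and significant (1.0) leo.nb thresholds."""
    if score >= significant:
        return "significant"
    if score >= suggestive:
        return "suggestive"
    return "not supported"


# Hypothetical p-values for the causal model and the four competing models.
strong = leo_nb(0.6, [0.03, 0.01, 0.002, 0.02])  # log10(20), about 1.3
weak = leo_nb(0.4, [0.1, 0.05, 0.01, 0.02])      # log10(4), about 0.6
print(classify(strong), classify(weak))
```

A score of 1.0 corresponds to the causal model p-value being ten times that of the next best model, and a negative score means that at least one competing model fits better than the causal model.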

Data Availability
Microarray data for the chicken hypothalamus tissue are available at E-MTAB-3154 in ArrayExpress.

Correlational Structure between Intra-Individual Behavioural Variability Traits
All of the IIV sub-traits within the Open Field test were strongly correlated with one another. These traits were also correlated with the IIV of distance moved in the Social Reinstatement test, as well as with the global IIV measure, the average of all OF traits, and the average of the combined OF and SR traits (see Table 1). A similar pattern was seen for the Social Reinstatement IIV sub-traits; however, one trait (IIV of distance moved in the SR test) was correlated with the Open Field IIV traits rather than with the other SR IIV traits. In contrast, the Tonic Immobility IIV trait showed no correlation with any of the sub-traits from either the Open Field or Social Reinstatement tests. The correlations between the different IIV scores therefore indicate that individuals were repeatable in Intra-Individual Variability (IIV) across the SR-test and OF-test (Table 1). For some behavioural traits, the magnitude of IIV was even more strongly positively correlated between behaviours in different test situations than between behaviours within the same test situation (see Table 1). These correlations were seen between IIV scores obtained from the OF-test and the SR-test (such as the magnitude of IIV in "movement" in the SR test and the magnitude of IIV in "velocity" or "time-spent-in-centre" in the OF test). The same pattern could be seen in the correlations between the average IIV scores obtained from the OF-test and SR-test and the individual IIV scores obtained from each trait (see Table 1).
Table 1. Correlations between the magnitudes of intra-individual behavioural variability, with p-values above the diagonal and correlation coefficients below the diagonal. Significant correlations are in bold and indicated with an asterisk (* indicates p < 0.05; *** indicates p < 0.001).

Genetic Architecture of Intra-Individual Behavioural Variation
In total, 13 IIV traits were analysed from the three tests, with four each from the open field test and social reinstatement test, one from the tonic immobility test and four that were created by averaging within and between tests. We identified 18 IIV QTL at 10 discrete loci (see Table 2). Of the 18 QTL, twelve were detected for the magnitude of IIV within the OF-test, two for the magnitude of IIV within the SR-test, three for the magnitude of IIV in the TI-test and one for the "global" magnitude of IIV (an overall measure of IIV using all the behavioural traits; see Methods). The average effect size of these QTL was 5.8%, which is in line with the standard effect size of a behavioural QTL (Flint and Mott, 2001) and was slightly higher than the average for the main effect QTL for these traits (i.e., the QTL from the mean trait values), which was 5.0%. In terms of the direction of effects (i.e., whether the allele conferring the largest effect was domestic or wild-derived), these were fairly equally distributed, with eight showing a greater additive effect coming from the White Leghorn allele, five showing a greater additive effect from the Red Junglefowl allele, and six with only dominance effects. Overall, the detected genetic architecture for IIV was largely separate from the genetic architecture for the standard corresponding traits (see Table 2 and Supplementary Table S2). Only three QTL, located on chromosome 10 (from 126-130 cM), on chromosome 7 (at 104 cM) and on chromosome 24 (at 77 cM), overlapped any of the previously detected QTL for behavioural scores in the OF and SR tests (see Table 2, and [20,21]). In the case of the QTL on chromosome 24 (for IIV in the open field test), although there was an overlap with a standard QTL, that QTL was for the social reinstatement test, not for the open field test.
This pattern held true even when correcting for the mean value of the behavioural traits (see Supplementary Table S3), thereby ensuring that variation in the mean value did not contribute to variation in the variability measured.
Table 2. Location (in cM), LOD score, additive and dominance effects, 95% confidence interval (in cM), covariates used and r-squared (effect size, %) are all included. IIV QTL are highlighted in blue (for QTL that do not overlap with any main effects QTL) and in grey (for QTL that do overlap main effects QTL).
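The comparison between IIV QTL and main-effect QTL described above amounts to checking whether confidence intervals on the same chromosome overlap. The Python sketch below shows one way such a check could be written. It is an assumption about the general approach rather than the procedure actually used in this study, and the trait names, chromosomes and interval positions are hypothetical values chosen only for illustration.

```python
from typing import List, NamedTuple, Tuple


class QTL(NamedTuple):
    trait: str
    chromosome: int
    ci_low: float   # lower bound of the 95% confidence interval, in cM
    ci_high: float  # upper bound of the 95% confidence interval, in cM


def intervals_overlap(a: QTL, b: QTL) -> bool:
    """Two QTL overlap if they share a chromosome and their intervals intersect."""
    return (
        a.chromosome == b.chromosome
        and a.ci_low <= b.ci_high
        and b.ci_low <= a.ci_high
    )


def split_by_overlap(
    iiv_qtl: List[QTL], main_qtl: List[QTL]
) -> Tuple[List[QTL], List[QTL]]:
    """Split IIV QTL into those overlapping any main-effect QTL and the rest."""
    shared, unique = [], []
    for q in iiv_qtl:
        if any(intervals_overlap(q, m) for m in main_qtl):
            shared.append(q)
        else:
            unique.append(q)
    return shared, unique


# Hypothetical QTL, not taken from Table 2.
iiv_qtl = [
    QTL("IIV velocity (OF)", 1, 40.0, 65.0),
    QTL("IIV distance moved (SR)", 2, 100.0, 120.0),
    QTL("IIV global", 3, 10.0, 30.0),
]
main_qtl = [
    QTL("mean velocity (OF)", 1, 60.0, 90.0),          # intersects the first IIV QTL
    QTL("mean distance moved (SR)", 2, 130.0, 150.0),  # same chromosome, no intersection
]

shared, unique = split_by_overlap(iiv_qtl, main_qtl)
print("overlapping:", [q.trait for q in shared])
print("IIV-specific:", [q.trait for q in unique])
```

Note that this check treats an overlap with a main-effect QTL from a different test as an overlap, as was done for the chromosome 24 QTL in the text.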

Candidate Genes for Intra-Individual Behavioural Variability
To identify potential candidate genes that are involved in forming different IIV phenotypes, we combined quantitative trait locus (QTL) and expression QTL (eQTL) analyses of the brains from the advanced intercross. Intra-individual behavioural variability QTL were overlapped with eQTL, then the relevant behavioural IIV trait was correlated with the overlapped expression trait to identify those that not only overlapped but also exhibited a significant correlation (as used in [20,21]). This initial analysis led to the identification of 10 putative genes underlying phenotypic differences in behavioural IIV, with these genes then taken to the next step of causality analysis (see Table 3).
As correlations alone are not enough to indicate the direction of the relationship between the candidate genes and behavioural traits, or to support causality, we then used the network edge orienting (NEO) package [38] to infer causality between genotype, gene expression and behavioural IIV. The NEO package uses structural equation modelling to test the fit of five models, each explaining a potential relationship between genotype (based on a genetic marker), gene expression and behavioural IIV. The analysis produces a leo.nb score and a model p-value (see Methods). The leo.nb score is defined as the log10 ratio of the causal model p-value to the p-value of the next best of the four remaining models (all of which suggest a non-causal relationship between the behaviour and candidate gene). A leo.nb score of 0.3 therefore indicates that the causal model is the best fit, with a model p-value roughly twice that of the next best model. In NEO, and in structural equation modelling in general, a small model p-value (e.g., p < 0.05) indicates a poor model fit. Our NEO analysis found that six of the candidate genes (Novel gene/X603599288F1, ITGBL1, SFRP4, X603600179F1/LOC100136711, MAP7, ENPP1) passed the suggestive threshold of 0.3 (see Table 3). The NEO analysis also showed support for the gene C7H2oRF47 as controlling the magnitude of "global" IIV (this model was fitted with multiple markers, which has a recommended threshold of 0.3, see [38]), with the model almost reaching the more stringent threshold suggested by Aten and colleagues.
Four of the candidate genes had previously been identified as having some bearing on "behaviour" (MAP7, SFRP4, ENPP1, LOC100136711) or neuronal development (MAP7), while one had no previous evidence of functionality (ITGBL1) and one was a novel gene (X603599288F1).

Discussion
We find that intra-individual variability (IIV) in behaviour is replicable between separate traits measured in the same behavioural test and, to some extent, between traits in different behavioural tests of similar context. These cross-test correlations demonstrate that individuals show consistency in their level of behavioural IIV both within and across test types. While the repeatability of IIV that we find across traits measured within the same behavioural test could be caused either by correlated measurement error or by all traits responding in the same way to the same (unmeasured) environmental gradient, the positive correlations we find between IIV across different types of assays are not autocorrelated. Note that the above conclusions can only be drawn from the correlations that exist between the individual IIV measures. The correlations between the different IIV traits and the composite measurements are not an indication of within-trait consistency, as these composite measures are by definition always correlated with the individual sub-traits they are composed of. In addition, we detected 10 discrete genetic loci (reflected in 18 QTL), with only three of these overlapping any of the QTL detected for the standard behavioural measures. Therefore, the majority of the IIV QTL detected in this cross had at least a partially separate genetic architecture to the standard behavioural QTL from the same tests. An important caveat here is that, as with any QTL analysis, not all of the QTL affecting the trait will be detected: smaller effect QTL in particular will easily be missed [30], and we have less power to detect QTL that fall between two markers that are relatively far apart. Many more QTL for the IIV and standard behavioural traits are therefore likely to exist, but with effect sizes too small to detect, and these may well overlap. We are thus restricted to concluding that very few of the detected QTL overlap, and we cannot say whether this holds for the entire genetic architecture.
IIV (also referred to as the variability sensitivity hypothesis), although previously ignored, has more recently received closer attention and is now considered a trait in its own right [41,42]. Mechanistically, IIV consists of two aspects of temporal variability [7]: systematic changes and residual variation. Systematic changes in behaviour over time (such as habituation and acclimatization effects, with the animal adjusting or adapting to the test itself) usually occur over a short time period, while "residual" unpredictable variation shows no pattern [43,44]. Neither type need be simply noise; both may instead reflect stable instability over time [9,45,46], with evolutionary and ecological advantages. In birds, IIV in song repertoire has been found to decrease with age [9], while poor nutrition in early life can lead to a reduction in IIV later [10]. Provisioning in great tits can also vary in response to the environment, with IIV increasing in challenging conditions [11], whilst brood size manipulations in pied flycatchers can lead to variability sensitivity in provisioning [12]. Unpredictable behaviour has also been shown to reduce capture risk when an individual is under threat of predation [8,13]. Hermit crabs, for example, increase both their startle response time and their IIV in response to an increase in the perceived risk of predation [8].
Despite these advantages, for IIV to evolve and be selected upon, it must be heritable [47,48]. To date, only one study has estimated the heritability of IIV [17]. That study found a low, yet significantly non-zero, heritability estimate of 0.03 (interval 0.02-0.05) for predictability in docility behaviour in marmots. As yet, however, no previous study has examined the genetic architecture of this type of trait, as we have done here. We identified a total of 18 QTL underlying between-individual variation in IIV, of which only three overlapped any of the previously detected QTL for behavioural scores. The remaining IIV QTL did not overlap the QTL for the mean behavioural traits they were based on (see Supplementary Materials Tables S2 and S3). This implies that selection can act separately on an individual's level of fearfulness and its degree of IIV in fearfulness, thereby increasing the scope for diversity in behavioural phenotypes [7]. An individual's fear response would therefore not necessarily be an indication of how predictable that individual would be in its fear response. However, our finding that individuals show consistency in predictability across test situations indicates that predictions can be made about an individual's predictability from one situation to another, at least to some extent. This is particularly interesting in light of the idea of contextual plasticity [49], which implies that IIV should be test specific.
We identified six candidate genes underlying intra-individual variation in behaviour. Some of these genes have previously been shown to be involved in natural behavioural variation and neural development. SFRP4 and LOC100136711 (the probeset was previously annotated as LOC770352 [21]) have, for example, been linked to anxiety in the chicken [21], demonstrating some overlap between the genetic architecture of intra-individual variation in behaviour based on anxiety-related traits and that of anxiety itself. Another of the candidate genes, ENPP1, has been linked to learning and memory abilities in mice, while MAP7 has been shown to play a critical role in the developmental regulation of neural axon branch maturation [50] and is also involved in schizophrenia [51,52]. This could indicate a potential link between schizophrenia and intra-individual variation in behaviour, and between the genes that underlie these traits. Increased intra-individual behavioural variability characterizes the performance of people with schizophrenia and has been used for early diagnosis of this disorder (for review see [53]).
The intra-individual behavioural variability we measure in our population could be due to an innate difference in predictability between individuals, or alternatively could be due to inter-individual differences in adjustment to changes in novelty, such as habituation (decreased responsiveness to a stimulus with repeated presentation) or sensitization (increased responsiveness to a stimulus with repeated presentation [54]). Domestic ducks, for example, habituate faster than wild Mallard ducks [55], and we find slightly more domesticated alleles associated with intra-individual behavioural variation, suggesting that the domesticated genotype produces an individual that is either more unpredictable in its behaviour or habituates faster. Domesticated chickens differ considerably in their behavioural responses from their wild progenitor, the Red Junglefowl, especially in anxiety-related tests such as the ones used in this study [56], and have also developed a brain that differs in both size and composition from that of their wild progenitor [57]. Alternatively, it could also be that the intra-individual behavioural variation that we measure is due to individual differences in behavioural stabilization [5]. However, due to the nature of our data, we can only speculate about the true nature of the intra-individual variability that we measure. The fact that we identify an underlying genetic mechanism implies not only that this is a trait that selection can act upon, but also that it is a trait that can interact with and be affected by different environmental factors, and potentially also determine the sensitivity of an individual to the environment [4,41]. Our data also show that IIV in behaviour is consistent, at least to some degree, between test situations and that its genetic architecture is at least partly distinct from that of the main effects for those behaviours.
Intra-individual variation in the tonic immobility test did not correlate with the intra-individual variation observed in the other test situations. One possible explanation could be that individuals were adults when measured in the TI-test and chicks when measured in the OF-test and SR-test. Age has previously been shown to affect intra-individual behavioural variation in Red Junglefowl [58], perhaps because the cumulative experience of the environment leads to increasing consistency with age or because the neural circuitry has matured [53]. Yet, a large meta-analysis across animal species [3] found no difference in the repeatability of behaviour between juveniles and adults. Another explanation could be that the TI-test situation is different from the two other tests used in this study. In the TI-test, a test person physically restrains the bird to induce fear, whereas in the SR-test and OF-test, the test person stays out of sight. Tonic immobility might therefore elicit a more direct fear response to a "predator", whereas the SR-test and OF-test measure an animal's anxiety levels when feeling exposed and alone. This would indicate that intra-individual variation in behaviour is trait specific, at least to some extent. It would therefore be interesting in future studies to explore intra-individual variation in behaviour across more direct fear-inducing tests, to see whether the magnitude of intra-individual behavioural variation in the TI-test is test- or trait-specific.
In conclusion, this study demonstrates that intra-individual behavioural variation, like other behavioural traits, shows consistent differences between individuals and genotypes, even when animals are tested and reared in the same environment. This kind of behavioural variability has a genetic architecture that is largely separate, at least for the detected QTL, from that of the behavioural traits it is based upon, and therefore represents an important axis of consistent behavioural variation that evolution can act on. Our analysis also highlights six genes as putatively causative for intra-individual behavioural variation, four of which have previously been linked to behaviour and neural development.