Observing Brief and Irregular Behaviour of Animals—The Validity of Short Observation Periods

There are efficient sampling methods to accurately estimate behaviour with a moderate or long duration. For short behaviour, observing animals continuously is recommended, although there is no recommended minimum observation time. In most studies, sampling method and observation time per day are determined by practical considerations. Thus, this study analysed the validity of behavioural observations in different observation periods using continuous sampling (CS) or time sampling (TS) based on biting behaviour. Tail-biting and ear-biting of weaned piglets in six pens were continuously observed for 12 h per day for 4 days to form a reference. Shorter observation periods of CS and TS were simulated by taking subsets of this reference. The amount of behaviour per hour of each observation period was compared to the reference and to other observation periods of the same kind and length. Four different measurements were defined to calculate accuracy scores (AS; 0–1; higher values are better). Comparison to the reference shows better AS for observation periods with longer observation time in total (0.5 h of CS: 0.2; 6 h of CS: 0.6). Additionally, TS covers longer time periods without decreasing AS. However, focus on activity time results in an overestimation of irregular behaviour. Comparing AS among observation periods of the same kind and length shows overall low agreement. This study indicated problems of different observation periods of CS and TS in accurately estimating behaviour. Therefore, the validity of behavioural observations should be analysed in greater detail to determine optimal sampling methods.


Introduction
A valid database is important for every scientific study. In most cases, studying animal behaviour relies on animal observations to acquire this database. However, behavioural observations are highly time-consuming. Since the time used for observations is often restricted by practical considerations, different observation techniques are used to estimate the behaviour as effectively as possible by reducing the required analysis time [1]. For example, instead of using continuous sampling, which observes the animals continuously for a certain time-period, time sampling only observes the animals for portions of time with breaks in between. Here, the animals are still observed continuously but not for the complete time-period. Focal sampling observes only one animal continuously to facilitate the observation, and scan sampling periodically analyses snapshots after a determined time interval [2]. As each method has advantages and disadvantages, it is important to choose the appropriate method regarding the behaviour, situation and hypothesis being investigated to achieve a valid database. Thus, several studies have evaluated the validity of sampling methods for different types of behaviour. It has been shown that focal sampling can correctly estimate behaviour that does not differ significantly between the animals [3], and that scan sampling is a very efficient method to estimate behaviour with a long duration such as resting behaviour [4][5][6][7][8]. For behaviour with a shorter duration, however, the time between intervals has to be adjusted, i.e., shortened, to ensure validity. Thus, it can be difficult to observe social interactions, as this behaviour often varies between the animals and only has a short duration. For example, biting behaviour in pigs takes only a few seconds and is easily missed by scan sampling if the time interval between scans is too long.
Appl. Sci. 2021, 11, 9770
However, shortening the interval far enough to accurately estimate biting behaviour (e.g., scan sampling every 5 s) would be less efficient than using continuous sampling. Thus, time sampling or continuous sampling is recommended to observe short behaviour such as biting behaviour in pigs [5][9][10][11][12], although there is no recommendation regarding the required observation length. The observation time per day is often determined by practical factors and rarely exceeds 2 h per day [13][14][15][16][17][18].
Therefore, the aim of this study was to analyse the validity of different observation periods of different length comparing continuous sampling and time sampling on the basis of tail-biting behaviour and ear-biting behaviour of pigs.

Materials and Methods
This study used behavioural observations of a tail-biting behaviour study in pigs [19], which investigated tail-biting behaviour in weaned piglets from November 2016 until April 2017 on the agricultural research farm "Futterkamp", which is part of the Chamber of Agriculture of Schleswig-Holstein, Germany. There were six conventional pens with 24 piglets per pen, which were weaned after 28 days and then video recorded for 40 days using one AXIS M3024-LVE Network Camera per pen produced by Axis Communications. The piglets were individually marked to allow identification in the videos. Additionally, tail-lesions were scored twice a week using the "German Pig Scoring Key" (German designation: Deutscher Schweine Boniturschlüssel) [20]. If a pig was identified as sick or injured, it was treated in the pen or removed to a hospital pen. The day of a tail-biting outbreak was defined for each pen individually as the first scoring day on which at least one large tail-lesion (larger than the diameter of the tail) was documented. If a tail-biting outbreak was identified, further pen enrichment was added to the pen and biting pigs were removed. From this day on, the previous four days were analysed using continuous event sampling during hours of light (06:00 to 18:00). One trained observer analysed the tail- and ear-biting behaviour of the pigs. This behaviour was defined as manipulating, sucking or chewing on the tail or ear of another piglet [15]. The time, the initiator and the receiver of the behaviour were documented. These behavioural observations of the 12 h observation period formed the reference. For the analysis of this study, it was assumed that these represented reality without missing true events or adding false events. All shorter observation periods were later compared to this reference.
To simulate shorter observation periods of continuous sampling or time sampling, the dataset of the continuous observation of 12 h per day was used as a starting point for subsampling. See Table 1 and Figure 1 for an overview of the observation periods used. The 12 h (06:00 to 18:00) were evenly divided into smaller time intervals forming shorter observation periods (6 h, 3 h, 1 h and 0.5 h) of continuous sampling. However, the observation periods of 3 h, 1 h and 0.5 h were limited to the activity times of the pigs (08:30 to 11:30 in the morning and 14:30 to 17:30 in the afternoon). The activity time was defined as the time periods with the most biting behaviour per hour and it was determined beforehand. Observation periods of the same length did not overlap. For example, the first observation period of 1 h started at 08:30 and ended at 09:30, the second observation period of 1 h started at 09:30 and ended at 10:30, etcetera. This resulted in one 12 h observation period, two 6 h observation periods, two 3 h observation periods, six 1 h observation periods and twelve 0.5 h observation periods per day. Additionally, the first 10 min of each 30 min were used to simulate time sampling. Here, the observation periods covered the 3 h of activity time in the morning or in the afternoon (6 × 10 min), a combination of both activity times (2 × 6 × 10 min), 6 h in the morning or afternoon (12 × 10 min) and the whole day during the 12 h of light (24 × 10 min). Again, observation periods of the same length did not overlap. For example, the first observation period of 6 × 10 min started at 08:30 and analysed the first 10 min of each 30 min for the following 3 h (08:30-08:40, 09:00-09:10, 09:30-09:40, 10:00-10:10, 10:30-10:40 and 11:00-11:10). The second observation period of 6 × 10 min started at 08:40 and analysed the first 10 min of each 30 min for the following 3 h (08:40-08:50, 09:10-09:20, 09:40-09:50, 10:10-10:20, 10:40-10:50 and 11:10-11:20), etcetera.
This resulted in three 24 × 10 min observation periods, six 12 × 10 min observation periods, three 2 × 6 × 10 min observation periods and six 6 × 10 min observation periods per day.
The behaviour shown was counted within each observation period for each pig and each behaviour (initiating tail-biting behaviour, receiving tail-biting behaviour, initiating ear-biting behaviour and receiving ear-biting behaviour) separately. To simplify the comparison between different lengths of observation periods, the amount of behaviour shown was adjusted to the amount of behaviour shown per hour. Then, all observation periods were compared twice for each behaviour separately. Comparison 1 compared all observation periods to the continuous observation period of 12 h to analyse the agreement with the reference, i.e., the representation of reality. Comparison 2 compared all observation periods of the same kind (continuous sampling or time sampling) and length with each other to analyse the agreement between the observation periods. It was known that the amount of biting behaviour shown increased over the four observation days [19]. Thus, differences between the days would arise from the developing tail-biting outbreak and not from measurement errors of the chosen sampling method or observation period. Because this study focused on the differences between observation periods and sampling methods, each day was used as an independent unit and all comparisons were limited to the observation periods within a day. Different measurements of accuracy were used for the comparisons.
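The subsampling scheme and the adjustment to behaviour per hour described above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function and variable names (`hm`, `split`, `time_sample`, `rate_per_hour`) are hypothetical, times are represented as minutes since midnight, and the three time sampling offsets of 0, 10 and 20 min are inferred from the examples given in the text.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]   # (start, end) in minutes since midnight, end exclusive
Period = List[Window]      # one observation period = one or more windows


def hm(hour: int, minute: int = 0) -> int:
    """Convert a clock time to minutes since midnight."""
    return hour * 60 + minute


DAY: Window = (hm(6), hm(18))                                # 12 h of light
HALVES: List[Window] = [(hm(6), hm(12)), (hm(12), hm(18))]   # morning, afternoon
ACTIVITY: List[Window] = [(hm(8, 30), hm(11, 30)),           # activity times
                          (hm(14, 30), hm(17, 30))]
OFFSETS = (0, 10, 20)   # assumed shifts that keep time sampling periods disjoint


def split(block: Window, length: int) -> List[Period]:
    """Cut a block into consecutive, non-overlapping periods of `length` minutes."""
    start, end = block
    return [[(s, s + length)] for s in range(start, end, length)]


def time_sample(blocks: List[Window], offset: int) -> Period:
    """Keep 10 min out of every 30 min in each block, shifted by `offset` minutes."""
    return [(s + offset, s + offset + 10)
            for start, end in blocks
            for s in range(start, end, 30)]


def continuous_periods() -> Dict[str, List[Period]]:
    """All continuous sampling periods of one day, keyed by their length."""
    return {
        "12 h": [[DAY]],
        "6 h": [[b] for b in HALVES],
        "3 h": [[b] for b in ACTIVITY],
        "1 h": [p for b in ACTIVITY for p in split(b, 60)],
        "0.5 h": [p for b in ACTIVITY for p in split(b, 30)],
    }


def time_sampling_periods() -> Dict[str, List[Period]]:
    """All time sampling periods of one day, keyed by their layout."""
    return {
        "24 x 10 min": [time_sample([DAY], o) for o in OFFSETS],
        "12 x 10 min": [time_sample([b], o) for b in HALVES for o in OFFSETS],
        "2 x 6 x 10 min": [time_sample(ACTIVITY, o) for o in OFFSETS],
        "6 x 10 min": [time_sample([b], o) for b in ACTIVITY for o in OFFSETS],
    }


def rate_per_hour(event_minutes: List[float], period: Period) -> float:
    """Count the events inside the period and scale the count to events per hour."""
    observed_minutes = sum(end - start for start, end in period)
    count = sum(1 for t in event_minutes
                if any(start <= t < end for start, end in period))
    return count / (observed_minutes / 60)


if __name__ == "__main__":
    for name, periods in {**continuous_periods(), **time_sampling_periods()}.items():
        print(name, len(periods))
```

Under these assumptions, the number of periods generated per day matches the counts stated above, and `rate_per_hour` applied to the event times of one pig and one behaviour gives the value that enters both comparisons.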
On the one hand, a measurement relied on the numerical agreement. The amount of behaviour of a pig shown in one observation period was compared to the amount of behaviour of the same pig shown in the other observation period. However, a tolerance range was applied, because it was highly unlikely to observe an animal showing behaviour in two observation periods for the exact same amount of time. The robustness of the database of this study was analysed beforehand for the effect of missing data. It could be shown that observation periods of different length were robust against missing data if an error rate of 10% was not exceeded. Based on these results, the tolerance range was set to ±10% as values within this tolerance range could still be used for an agreement.
Other tolerance ranges were also tested. Wider tolerance ranges increased the resulting accuracy scores and their variance; however, the ratio between the mean accuracy scores of the different observation periods did not change. Still, the tolerance range was set to ±10%. Therefore, the agreement was either 1 (the amount of behaviour of a pig shown in one observation period was within the range of the amount of behaviour ±10% shown in the second observation period) or 0 (the amount of behaviour of a pig was not within the range of the amount of behaviour ±10% shown in the second observation period). The percentage of agreements (PA) was then calculated per pen.
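As an illustration of the agreement rule, the following Python sketch computes PA for one pen from the per-hour amounts of two observation periods. It is a sketch under stated assumptions, not the original analysis code: the names `agrees` and `percentage_of_agreements` are hypothetical, PA is expressed as a proportion between 0 and 1 to match the accuracy score scale, and the small constant added to the tolerance only guards against floating-point rounding at the boundary.

```python
from typing import Dict


def agrees(amount: float, other: float, tolerance: float = 0.10) -> int:
    """Return 1 if `amount` lies within `other` +/- tolerance * `other`, else 0."""
    # 1e-12 is an assumed guard against floating-point rounding at the boundary.
    return int(abs(amount - other) <= tolerance * abs(other) + 1e-12)


def percentage_of_agreements(period: Dict[str, float],
                             other: Dict[str, float],
                             tolerance: float = 0.10) -> float:
    """Proportion of pigs in a pen whose amounts agree between two periods.

    Both dictionaries map a pig identifier to its amount of behaviour per hour.
    """
    if set(period) != set(other):
        raise ValueError("both observation periods must contain the same pigs")
    if not period:
        raise ValueError("at least one pig is required")
    hits = sum(agrees(period[pig], other[pig], tolerance) for pig in period)
    return hits / len(period)


if __name__ == "__main__":
    # Hypothetical amounts per hour for four pigs; not data from this study.
    short_period = {"p1": 10.0, "p2": 4.0, "p3": 0.0, "p4": 3.0}
    reference = {"p1": 10.5, "p2": 5.0, "p3": 0.0, "p4": 0.0}
    print(percentage_of_agreements(short_period, reference))
```

Note that with this rule a pig showing no behaviour in either period counts as an agreement, whereas any behaviour observed for a pig with a reference amount of zero counts as a disagreement.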
On the other hand, measurements relied on the stability of the ranking order within a pen. The measurement Overlap Top 1 checked whether the animal(s) in the highest ranking in one observation period had the highest rank in the second observation period as well. It was calculated from two sets [21]: U, the set of animals of the highest ranking in one observation period, and V, the set of animals of the highest ranking in the other observation period. Similarly, the measurement Overlap Top 3 checked whether the animals in the highest three rankings of one observation period were among the highest three rankings of the second observation period. Here, U is the set of animals in the highest three rankings of one observation period and V is the set of animals in the highest three rankings of the other observation period [21]. The last measurement was the squared Spearman correlation coefficient, which analyses the changes in the whole ranking order between one observation period and the other. It can be interpreted as the proportion of variance in the ranking order of one observation period accounted for by the observed ranks of the other observation period [21].
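The ranking-based measurements can be sketched in Python as follows. Two points are assumptions rather than statements from the text: the overlap is computed as the size of the intersection of U and V divided by the size of their union, which is a common definition of an overlap between two sets, and animals with tied amounts share a ranking, so that "the highest three rankings" refers to the three largest distinct amounts. The function names are hypothetical, and the Spearman coefficient is computed by hand with average ranks for ties to keep the sketch free of external libraries.

```python
from typing import Dict, List, Set


def top_k_set(amounts: Dict[str, float], k: int) -> Set[str]:
    """Animals whose amount is among the k largest distinct amounts (ties share a rank)."""
    top_values = sorted(set(amounts.values()), reverse=True)[:k]
    return {animal for animal, value in amounts.items() if value in top_values}


def overlap_top_k(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Assumed overlap |U intersect V| / |U union V| of the top-k sets of two periods."""
    u, v = top_k_set(a, k), top_k_set(b, k)
    union = u | v
    return len(u & v) / len(union) if union else 1.0


def average_ranks(values: List[float]) -> List[float]:
    """Ranks starting at 1, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def squared_spearman(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Squared Spearman correlation: squared Pearson correlation of the ranks."""
    animals = sorted(a)
    ra = average_ranks([a[x] for x in animals])
    rb = average_ranks([b[x] for x in animals])
    n = len(animals)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")   # undefined if all animals share one rank
    return cov ** 2 / (var_a * var_b)


if __name__ == "__main__":
    # Hypothetical amounts per hour for five pigs; not data from this study.
    period_1 = {"p1": 9, "p2": 5, "p3": 4, "p4": 1, "p5": 0}
    period_2 = {"p1": 7, "p2": 2, "p3": 6, "p4": 3, "p5": 0}
    print(overlap_top_k(period_1, period_2, 1))
    print(overlap_top_k(period_1, period_2, 3))
    print(squared_spearman(period_1, period_2))
```

If the original definition treats ties or the denominator differently, only `top_k_set` and `overlap_top_k` would need to be adapted.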

Results
The analysis of the 288 h of video footage produced a total of 18,364 observations divided into 9950 tail-biting behaviour events and 8414 ear-biting behaviour events. The mean, standard deviation and range of the four types of behaviour analysed (initiating tail-biting behaviour, receiving tail-biting behaviour, initiating ear-biting behaviour and receiving ear-biting behaviour) according to the observation periods used are shown in Table 2. There is a wide span between the mean and extreme values. The results of comparison 1 (comparison of different observation periods to the reference of 12 h continuous observation) are shown in Figure 2a-d according to behaviour analysed and measurement of accuracy used. Generally, the accuracy scores range from 0 to 1, whereas the mean accuracy scores range from 0.03 to 0.77. The differences between continuous sampling observation periods and time sampling observation periods come mainly from an overall trend of longer observation periods producing better accuracy scores. Apart from this, the accuracy scores are on a similar level. The accuracy scores of the different behaviours are on a similar level as well, although initiating tail-biting behaviour has slightly better scores. Comparing the different measurements of accuracy shows better accuracy scores for the measurement based on the ranking order.
In Figure 3a-d, the results of comparison 2 (comparison between the observation periods of the same kind and length) are displayed. Here, the mean accuracy scores range from 0.05 to 0.47. Analogous to the comparison 1, the initiating tail-biting behaviour produces slightly better accuracy scores. Additionally, the measurement PA produces the best accuracy scores for observation periods of 1 h or shorter.

Discussion
This study investigated the validity of behavioural observations of biting behaviour in pigs. Observation periods of different length using continuous sampling or time sampling were compared to the observations of 12 h continuous sampling, which were set as the reference. The validity evaluates whether a measurement accurately measures reality [22]. Thus, the biting behaviour of pigs was analysed in as much detail as possible by a thoroughly trained observer for 12 h per day to approximately represent reality. Therefore, the observations were defined as an accurate representation of reality without missing events or adding false events to be able to test subsets of shorter observation periods for their validity.
In the comparison of different observation periods to the reference of 12 h continuous observation, the mean accuracy scores differed according to the observation period used. However, the differences in total analysis time accounted for this effect more than, for example, the differences between continuous sampling and time sampling: analysing the video footage for a longer time in total resulted in better accuracy scores. When observation periods of continuous sampling and time sampling that cover the same time-period (e.g., the activity time) are compared, the time sampling observation periods show worse accuracy scores because information is lost. In contrast, the accuracy scores are at a similar level when observation periods of continuous sampling and time sampling with the same invested analysis time (e.g., 1 h during the activity time) are compared. Thus, time sampling is a trade-off, as it accepts a loss of information in exchange for saving analysis time. This trade-off can nevertheless help to use the limited time available for observations more efficiently, because it covers longer time periods without increasing the required analysis time and, more importantly, without decreasing accuracy.
Comparing the accuracy scores of the behaviour analysed (initiating tail-biting behaviour, receiving tail-biting behaviour, initiating ear-biting behaviour and receiving ear-biting behaviour) shows small differences, but they follow the same trends according to measurement of accuracy and observation period used. However, the accuracy scores of initiating tail-biting behaviour are slightly better for the measurements based on the ranking order (Top 1, Top 3 and R_S²). Here, the wider range in the amount of behaviour and the extreme values caused greater differences between the animals, especially in the higher rankings. Therefore, the ranking order was less affected by shorter observation periods with a disproportionate distribution of behaviour shown. This resulted in better accuracy scores of initiating tail-biting behaviour.
When evaluating the results of the accuracy scores, it is essential to keep in mind the different foci of the measurements of agreement and the hypothesis of the study to be investigated. The measurements Top 1 and Top 3 are the proportion of correctly chosen animals in the highest or the three highest rankings in a pen based on the behaviour shown. R_S² gives an insight into the stability of the ranking order of the whole pen and PA is the proportion of correctly estimated amount of behaviour shown. Thus, it is important to choose a measurement that is appropriate for the hypothesis under investigation. Besides, for the measurement PA, it has to be considered when the animals are observed and whether the behaviour is regularly shown throughout the day, as there are differences in the social interaction of animals during activity or resting periods [23]. In this study, for example, tail-biting behaviour was shown more often during the activity time in the morning or in the afternoon. Thus, focusing the observations on activity time and applying this to the whole day overestimated certain behaviour, since resting periods were neglected. As a result, the accuracy scores of observation periods focusing on the activity time were worse for the measurement PA compared to observation periods which included inactivity time.
Moreover, the comparison between the observation periods of the same kind and length produced mostly low accuracy scores. Therefore, the starting point of the observation can also affect the results considerably, which was demonstrated by Hämäläinen et al. [24] as well. This can complicate comparability between studies if animals are observed at different times or if animals differ in their daily routine because of variations in, for example, feeding schedules. Thus, sampling method, observation length as well as observation time should be carefully considered.
However, the question is whether it is possible to distinguish between bad, moderate or good accuracy, and whether it is possible to determine a required minimum analysis time per day, bearing in mind that it is unusual for studies on biting behaviour to observe animals for longer than 2 h per day [13][14][15][16][17][18]. There are defined thresholds for the evaluation of correlation coefficients [2]. Applying them to R_S², an accuracy score between 0.16 and 0.49 results in moderate accuracy, between 0.49 and 0.81 in good accuracy and an accuracy score above 0.81 in very good accuracy. If these thresholds are applied to the other measurements of accuracy as well, no observation period would exceed the threshold for very good accuracy on average. Moreover, not all observation periods of a total analysis time of 2 h or longer would exceed the threshold for good accuracy on average. For the measurement PA, only mean accuracy scores of the observation periods longer than 4 h would produce a moderate accuracy. Thus, the ranking order could be estimated with at least moderate validity, whereas the amount of behaviour shown could not be estimated with at least moderate validity except for the longest tested observation periods.
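The threshold scheme can be written down as a short Python sketch. The limits 0.16, 0.49 and 0.81 are taken from the text and correspond to the squares of 0.4, 0.7 and 0.9. The function name `accuracy_class`, the label "below moderate" for scores under 0.16 and the assignment of a boundary value to the higher class are assumptions made for illustration, since the text leaves these points open.

```python
def accuracy_class(score: float) -> str:
    """Classify an accuracy score (0 to 1) using the squared correlation thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("accuracy scores are defined between 0 and 1")
    if score > 0.81:
        return "very good"
    if score >= 0.49:          # assumed: boundary value counts as the higher class
        return "good"
    if score >= 0.16:
        return "moderate"
    return "below moderate"    # hypothetical label, not named in the text


if __name__ == "__main__":
    # Accuracy scores reported in the abstract for 0.5 h and 6 h of continuous sampling.
    for score in (0.2, 0.6):
        print(score, accuracy_class(score))
```

Applied to the two accuracy scores reported in the abstract, 0.5 h of continuous sampling (0.2) falls into the moderate class and 6 h of continuous sampling (0.6) into the good class.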
Nevertheless, these are results of biting behaviour that is difficult to observe because it is irregular and has a short duration [9]. Thus, it should be analysed whether the results presented are transferable to other types of behaviour as well. As different studies have demonstrated, behaviour can be efficiently and accurately estimated using scan sampling for behaviour with a long duration (e.g., resting) [4][6][8][9][10][11] or time sampling for behaviour with a moderate duration (e.g., feeding and locomotion) [3,8]. However, in other studies on pigs, it was not possible to accurately estimate different behaviour with short duration using scan sampling, such as pain-associated behaviour [12] or agonistic or sexual behaviour [9]. Hence, the validity of observations of short-term and irregular behaviour should be investigated in greater detail to confirm the results of existing studies and to determine optimal observation techniques for future studies.

Conclusions
This study investigated the validity of different observation periods and sampling methods on the basis of biting behaviour of pigs. It could be shown that observation periods which analysed more time in total produced better accuracy scores. Additionally, time sampling covers longer time periods without decreasing the accuracy compared to continuous sampling. Focusing on the activity time of the animals can lead to overestimation if the behaviour is not shown regularly during the day. This study demonstrated difficulties in choosing the appropriate observation period or sampling method to accurately observe short and irregular behaviour. Therefore, the validity of other behavioural observations should be further investigated in future studies to develop optimal sampling methods.
Author Contributions: Conceptualization, T.W., J.K. and K.B.; methodology, T.W., J.K. and K.B.; software, T.W. and K.B.; validation, J.K. and N.K.; formal analysis, T.W. and K.B.; investigation, T.W., J.K. and K.B.; resources, J.K. and K.B.; data curation, T.W.; writing-original draft preparation, T.W.; writing-review and editing, T.W., J.K., N.K. and K.B.; visualization, T.W.; supervision, J.K., N.K. and K.B.; project administration, K.B.; funding acquisition, J.K. and K.B. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Ethical review and approval were waived for this study, because the animals were housed under standard commercial conditions and no pain, suffering or injury was inflicted on the animals during the experiments. The authors declare that the experiments were carried out strictly following international animal welfare guidelines. Additionally, the "German Animal Welfare Act" (German designation: TierSchG), the "German Order for the Protection of Animals used for Experimental Purposes and other Scientific Purposes" (German designation: TierSchVersV) and the "German Order for the Protection of Production Animals used for Farming Purposes and other Animals kept for the Production of Animal Products" (German designation: TierSchNutztV) were applied.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns.