Reliability of Perceptual-Cognitive Skills in a Complex, Laboratory-Based Team-Sport Setting

The temporal occlusion paradigm is often used in anticipation and decision-making research in sports. Although it is considered a valid measurement tool, evidence of its reproducibility is lacking but required for future cross-sectional and repeated-measures designs. Moreover, only a few studies on decision making in real-world environments exist. Here, we aimed at (a) implementing a temporal occlusion test with multi-dimensional motor response characteristics, and (b) assessing intra- and inter-session item reliability. Temporally occluded videos of attack sequences in a team handball scenario were created and combined with the SpeedCourt ® contact plate system. Participants were instructed to perform pre-specified defensive actions in response to the video stimuli presented on a life-size projection screen. The intra- and inter-session (after at least 24 h) reproducibility of subjects’ motor responses was analyzed. Significant Cohen’s (0.44–0.54) and Fleiss’ (0.33–0.51) kappa statistics revealed moderate agreement of motor responses for the majority of attack situations in both intra- and inter-session analyses. Participants made faster choices with more visual information about the opponents’ unfolding action. Our findings indicate reliable decisions in a complex, near-game test environment for team handball players. The test provides a foundation for future temporal occlusion studies, including recommendations for new explanatory approaches in cognition research.


Introduction
In team sports, players use relevant visual cues of their opponents to score or prevent goals, or simply to position themselves advantageously for attack or defense. Visual information about player positioning [1,2] or postural cues [3][4][5] can be used to anticipate an opponent's intention [6][7][8] and allows for making timely decisions. To investigate anticipation and decision making in laboratory settings, temporal occlusion (TO) [9] is a well-established paradigm that has been applied in several studies. In TO, action sequences are occluded at different times in order to restrict the visual information available and thus to create varying stages of anticipatory requirements. TO can therefore be used to identify postural cues that influence predictions of future actions or to distinguish better and worse players [10][11][12]. Previous studies using this method have demonstrated that high-skill athletes outperform their low-skill counterparts in response quality, meaning that they can use less visual information to foresee intended movements in action sequences. A systematic review conducted by Mann et al. [13] found an overall expert-novice between-group difference (p < 0.001) for response accuracy with an effect size of 0.25 in 64 selected TO studies. TO was applied as an expert-novice paradigm in numerous sport disciplines, for example, in volleyball [14,15], squash [16], badminton [17], tennis [18] and field hockey [19], to name a few.
Despite the large body of evidence, relevant gaps in TO research include a systematic assessment of test reliability. Even though expert-level comparisons indicate validity, reliability is an equally important psychometric property with direct relevance for applied and basic research on developmental or training-induced changes in decision making. So far, reliability analyses in TO are mostly based on the prediction outcomes of the participants. In volleyball, a cross-sectional study using a computer test with binary choice options [14] investigated the internal consistency of prediction responses in a visual anticipation test. A split-half technique, using the Spearman-Brown formula, revealed a reliability coefficient for video pair responses of 0.72. Longitudinal studies in racquet sports demonstrated high inter-rater (r = 0.92) and intra-rater (r = 0.98) reliability for decision accuracy in cricket [20] and for response accuracy in tennis (r = 0.90-0.96) [21] and softball (r = 0.74 and r = 0.99) [22]. Intra-class correlations were used, and accuracy calculations were executed with interval scaled variables. When considering that the TO paradigm has been applied for the past 40 years, very limited knowledge about reliability and reproducibility of the paradigm itself exists. The effects of the choice of outcome parameter, test design (cross-sectional or longitudinal) and test setup on later interpretations of the obtained findings remain uninvestigated.
Moreover, other works with TO examined mainly accuracy outcomes in the form of dichotomous choice options (e.g., ball flight direction or type of throw). In team handball, the 7 m penalty, a rather isolated closed-game situation, has been the central object of investigations. A study by Loffing et al. [14] revealed differences between experienced and novice goalkeepers in anticipating hard or lobbed shots, and accuracy increased with later occlusion conditions. Results were confirmed in another study by Cocić et al. [23]. The binary outcomes were obtained in computer-based test settings or as verbal reports, often without time restrictions. Such laboratory-based test setups could surely lead to diminished expert advantages and seem to only partially capture anticipation skills [13]. Decisive moments of kinematic cues could be identified in penalty throwing; however, decision making under time constraints in complex, multifactorial situations was not considered in team handball hitherto. In one of a few studies, Williams et al. [21] noted more rapid decision making by skilled players in a real-world test scenario in tennis. Here, participants had to respond to real-life tennis serve projections by stepping onto one of four pressure sensitive pads and by swinging the racket as if to intercept the ball. As Ratcliff et al. [24] stated in their work, diffusion models could provide further reference points for anticipation and decision making in such multialternative choice assessments. They also emphasized the inclusion of supportive confidence judgements and response times.
Investigations dealing with open-game situations in team sports, in which field players face multialternative attack or defense decisions, are severely lacking. The general importance of sport-specific anticipation measures with near-game tasks and real-size projections is clear in the given literature [25].
In order to assess perceptual-cognitive skills in team handball, our experimental setup provided the possibility to circumvent the mentioned deficiencies of TO research. Our test setup required participants to make multi-categorical decisions in typical team handball defense situations, facing an attacker. We created a TO test scenario with standardized videos, where an elite center backcourt player executed specific attack actions. The defending participants had to decide how to respond to these attacks with predetermined defensive options. Throughout the test scenario, the duration of the videos increased, so the amount of information increased as well. We recorded the distributions of defensive actions and their particular motor initialization times. Initialization times of decision outcomes contribute to a better understanding of anticipatory judgements [12]. The multiple-choice nature of the test offered a genuine reflection of option-generating tasks in team handball. The main focus of this study was to develop a software-based test scenario and to subsequently assess its intra- and inter-session reliability as a quality criterion.

Materials and Methods
A detailed overview including the statistical approaches used in the study design is given in Figure 1. With respect to the main aim of this study, reliability analysis, the study was created with a test-retest design with two measurement sessions (session 1, session 2). To evaluate intra-session reliability in TO 1 (session 1), we analyzed the level of agreement of the motor responses in each of the doubled video clips by using Cohen's kappa. Within session 1, we also analyzed the initialization times of the motor response choices with repeated measures ANOVA. The inter-session (sessions 1 and 2) reliability was evaluated by the level of agreement between the two doubled video clips from TO 1 and TO 2 with the use of Fleiss' kappa statistics.
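The inter-session agreement measure can be illustrated with a minimal sketch of Fleiss' kappa, assuming that each participant contributes four categorical responses per clip (two doubled presentations in each of two sessions). The function name, the response labels and the example data are hypothetical and are not taken from the study; the analyses reported here were run in SPSS.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of ratings.

    ratings[i] holds the categorical responses given to item i
    (here, hypothetically: one participant's four responses to one clip).
    """
    n_items = len(ratings)
    n_ratings = len(ratings[0])
    if n_ratings < 2 or any(len(r) != n_ratings for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    mean_item_agreement = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of agreeing rating pairs within this item.
        agreeing = sum(c * c for c in counts.values()) - n_ratings
        mean_item_agreement += agreeing / (n_ratings * (n_ratings - 1))
    mean_item_agreement /= n_items

    total = n_items * n_ratings
    # Agreement expected by chance from the overall category proportions.
    chance = sum((c / total) ** 2 for c in category_totals.values())
    if chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (mean_item_agreement - chance) / (1.0 - chance)


# Hypothetical data: four participants, four responses each to the same clip.
example = [
    ["forward", "forward", "forward", "forward"],
    ["forward", "forward", "block", "block"],
    ["block", "block", "block", "block"],
    ["block", "block", "block", "forward"],
]
kappa = fleiss_kappa(example)
print(round(kappa, 3))
```

For this toy data set the value is about 0.41. The sketch omits confidence intervals and significance tests.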

Participants and Recruitment
Sample size calculations for the study design at hand revealed that at least 59 participants (with a default 10% drop-out) were required for analyses with n = 2 videos, a minimally acceptable level of reliability of p0 = 0.4 (null hypothesis), p = 0.05 and β = 0.2 [26]. Sixty-six male team handball players (M = 17.89 years, SD = 7.64 years) from six teams, of different age and performance levels (elite under-15; amateur and elite under-17; elite under-19; amateur and elite adult), participated in this study. Four teams (n = 44) belonged to a youth academy of a professional team handball club of the German Handball Federation. These four teams had six to eight training sessions per week, with one match at the weekend. They all competed in the highest leagues in their respective age groups. Therefore, players of these teams can be considered as elite players. Two teams (n = 22) were recruited from the rural and city area of Magdeburg, Germany. They competed at the local level, with two training sessions and one match per week. Players of these two teams can be considered as non-elite players. Testing was carried out in the first half of the team handball season of 2020/2021, in October and November. During this time, all championships in every league were running already; that is why all teams had a normal weekly training and match schedule, without being affected or restricted by any local or federal COVID-19 regulations. During the test, participants were instructed to perform with maximum effort. Injuries led to exclusion from the study. Prior to their participation, all participants and legal guardians were informed about the purpose, risks and benefits of the study. Participants had to give written informed consent before the first test. Participants later were not able to be identified from their test results. 
The study protocol was approved by the president's office from the Otto-von-Guericke University Magdeburg and the German Federal Institute of Sport Science (070506/19-20). The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

Apparatus and Stimuli
The experimental test setup followed the procedures of Raab [27]. Here, we used the SpeedCourt ® (Q12 PRO mobile, GlobalSpeed, Hemsbach, Germany) system, an interactive, cognitive-motor test device. It includes ten contact plates (50 cm × 50 cm) on a platform of 5.25 m × 5.25 m with a life-size projection screen for experimental stimulus application. All plates can either be modularly connected or controlled individually. Signals of each plate were processed if the applied force exceeded 80 N. Due to the contact plate distances of about two meters (between the seven and nine meter lines, and in both sideways directions from the central plate), the SpeedCourt ® covers typical movement ranges of a central defender in team handball. A set of video clips of individual basic handball attacking actions was produced before the experiment. First, we created a video script with potential team handball attacks. We focused on individual basic and simple actions. In accordance with Müller et al. [28], we used four representative tasks as an important methodological aspect in the test design: breakthrough, pass, standing throw, and jump throw were filmed while considering their key movement characteristics as described by Kromer [29]. A video script provided the basis for subsequent video recordings and included various versions of all four attack actions, including a variety of movement executions, such as different run-up steps or changing movement directions.
The recordings took place on an official team handball field with three back players. A high-speed camera (GoPro HERO 6) recording at 25 frames per second was placed on the 7 m penalty line. That position was meant to imitate the central block position for the defense, from a 1.8 m viewing height towards the attacking center back player. The left and right back players had a passive-assistive function as pass-givers; the right-handed center back player was the ball-carrying player. In order to ensure standardization with the highest movement quality, the center back player was a new member of a German DKB Handball Bundesliga team and also part of the German under-21 national team (during the championship season 2019/2020). This player was presented later in all videos during the measurements. Players' movements were performed as near-game as possible and with realistic dynamics.
Out of the recorded material, we scanned all clips for appropriateness or inclusion criteria. The inclusion criterion was a clear provocation of a defense action, which was feasible and applicable on the SpeedCourt ® for future participants. The four final selected attacks for the test scenario were characterized as follows: Breakthrough began with a pass from the right back player to the center back player into a parallel standing position, followed by a fluent deception move with two last steps to the player's right side and a jump throw onto the goal; the Standing throw and the Pass also started with a pass from the right back to the center back player, who immediately executed a three-step run-up with a subsequent throw onto the goal, or a pass to the left side; the jump throw was executed after the pass from the right side and a two-step run-up. Figure 2 illustrates the motion sequence of all four attacks. The appropriate defensive actions that had to be chosen by the participants later were forward movement/tackling, sideways movements (left/right) and passive position/blocking through holding the defense position [30]. The assignment of these actions to the respective contact plates is shown in Figure 3. We excluded attack actions that were considered too ambiguous in terms of execution or response. These actions were later used as dummy trial videos to avoid expectation effects in response behavior [31].
Videos were temporally occluded using Adobe Premiere Pro CS5. We detected and erased visual artefacts (e.g., the pass-givers) that could lead to possible memory effects in the participants. Because left-handers experience greater advantages in the context of anticipation in sports due to their handedness [32,33], all clips were later flipped horizontally to produce left-handed versions. The four attack sequences were temporally occluded within a general time frame from the ball being passed to the attacker (t6) to the obvious end of the attack (t0), with time intervals of 200 ms (t6 = −1200 ms; t5 = −1000 ms; t4 = −800 ms; t3 = −600 ms; t2 = −400 ms; t1 = −200 ms; t0 = 0 ms). Finally, we doubled every video clip for the envisaged reliability analyses. We created 112 videos (4 base stimuli × 7 TO conditions × 2 doubled × 2 handedness), resulting in a total of 224 video clips when dummy trials are also considered (4 base stimuli × 2 dummy trials × 7 TO conditions × 2 doubled × 2 handedness). This occlusion paradigm enables later explanations about how the number of postural cues within the attacker's movements affects decision-making processes. The duration of each clip was not longer than 2 s (ending at t0), and videos were 1280 × 720 pixels (width × height). The final experimental test scenario was implemented using Lazarus (Version 2.0.10), a Delphi-compatible cross-platform environment for rapid application development.

Breakthrough began with a pass from the left back player to the center back player in a parallel standing position, who made a fluent deceptive move and two steps to the right, followed by a throw after a single-leg jump. The jump throw was executed after a pass from the left side, catching the ball and a two-step run-up. The standing throw was executed after a pass from the left side, catching the ball and a three-step run-up. The pass was executed after a pass from the left side, catching the ball, two steps and a pass to the right side.
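The factorial structure of the stimulus set can be checked with a short enumeration. This is a minimal sketch: the dictionary keys and variable names are hypothetical, and the dummy clips are assumed to mirror the experimental design one-to-one, as the reported total of 224 clips implies.

```python
from itertools import product

ATTACKS = ["breakthrough", "pass", "standing throw", "jump throw"]
HANDEDNESS = ["right", "left"]
REPETITIONS = [1, 2]  # every clip was doubled for the reliability analyses

# Occlusion points t6 ... t0, in 200 ms steps relative to the end of the attack.
OCCLUSIONS_MS = {f"t{i}": -200 * i for i in range(6, -1, -1)}

experimental_clips = [
    {
        "attack": attack,
        "occlusion": label,
        "offset_ms": OCCLUSIONS_MS[label],
        "hand": hand,
        "repetition": repetition,
    }
    for attack, label, hand, repetition in product(
        ATTACKS, OCCLUSIONS_MS, HANDEDNESS, REPETITIONS
    )
]

# Assumption: one dummy clip accompanies every experimental clip.
total_with_dummies = 2 * len(experimental_clips)

print(len(experimental_clips), total_with_dummies)
```

The enumeration reproduces the 112 experimental clips (4 × 7 × 2 × 2) and, under the stated assumption about dummy trials, the 224 clips presented per session.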

Procedure
In each TO test scenario, participants were tested individually in front of a projection screen (3 m width × 2.5 m height). Participants were instructed to give a motor response to every video in the form of a team handball-specific defense action. For that, participants had to step onto a predetermined contact plate; for example, the participant could leave the central plate to move forward or perform a jump block onto the central plate again. Team handball field lines were also marked for a game-realistic setup. Participants always started as a center block player in a classic man-to-man defense system, positioned on the 7 m line on the central contact plate (Figure 3). Once the participant had assumed the starting position, a 3 s countdown appeared on the screen, followed by an attack video. Participants were instructed to respond as fast as possible after each presentation ended. To create equal conditions for the entire sample, participants were told to imagine having the same body height, body weight and age as the attacker, and that they were used to seeing themselves in the central block position in the defense. The instructions strongly emphasized that participants should show the motor response that came to mind intuitively while watching, and respond only after the video had ended. Generally, there was no time limit for making a decision, but the response was to be as fast and realistic as possible. After valid "defending" of an attack, "Ready?" appeared on the screen to inform the participant that the defense action had been recorded. A response was valid when the participant stepped onto another contact plate (or onto the central plate again). After each response, participants rated their confidence in the intuitive tactical choice on a six-point Likert-type scale (1 = absolutely ambiguous, 2 = ambiguous, 3 = indecisive, 4 = tendentious, 5 = unambiguous, 6 = absolutely unambiguous).
Then, the participant was allowed to head back to the starting position with a new countdown coming. Unintended (e.g., short hop on the central plate before movement) or early actions were marked off by the laboratory staff and excluded from analyses.
Following the provision of standardized oral instructions, the participants performed ten trials to familiarize themselves with the test setup. A selection of all four attack actions (right-handed) was presented in a randomized order with different occlusion points. A member of the test staff made sure that participants initialized their defensive actions within the given time frame. Additional advice was given when participants responded too early, did not hit the targeted contact plate adequately or moved too hesitantly. Due to its team-handball-specific nature, participants quickly engaged with the test setting. No further information about the number of videos, the test scenario or the test performance was provided during the subsequent experimental trials. During the experiment, participants always started with the right-handed block, with a 5 min break before continuing with the left-handed block. The TO scenario ran in a structured video clip order, starting with the least (t6) and ending with the most information (t0). Within every occlusion time condition, the videos were randomized. The test session took about 35 min for a total number of 224 videos. For the longitudinal reliability analysis, two teams (n = 22) underwent two test sessions, with a time lag of at least 24 h, but not longer than seven days, between the measurements.

Data Analysis
Two dependent variables based on the data from the contact plates were considered for statistical analyses. The choice of motor response (CoMR, as multi-categorical variable) was defined as a participant's response to the attacker's action, recorded through contact with one of the four response plates. The initialization time of motor response (ItMR) was defined as the elapsed time (in ms) from the end of the video until the participant left the contact plate (i.e., the applied force fell below 80 N). Note that individual ItMR values deviating from the median by more than 2.5 times the median absolute deviation (calculated according to Leys et al., 2013 [34]) were categorized as outliers and therefore discarded from statistical analyses.
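The outlier criterion for ItMR can be sketched as follows. This is an illustrative implementation of a median-absolute-deviation rule with a threshold of 2.5; the scaling constant of 1.4826 (which makes the MAD comparable to a standard deviation under normality) is an assumption here, as are the function name and the example values.

```python
from statistics import median


def mad_outlier_filter(values, threshold=2.5, scale=1.4826):
    """Split values into (kept, outliers) using the median absolute deviation.

    A value is an outlier if it lies more than `threshold` scaled MADs
    away from the median. `scale` is an assumed consistency constant.
    """
    center = median(values)
    mad = scale * median([abs(v - center) for v in values])
    if mad == 0:
        # More than half of the values equal the median: the rule is
        # not informative, so nothing is discarded in this sketch.
        return list(values), []
    kept = [v for v in values if abs(v - center) <= threshold * mad]
    outliers = [v for v in values if abs(v - center) > threshold * mad]
    return kept, outliers


# Hypothetical initialization times in ms for one participant.
itmr_ms = [300, 310, 320, 330, 340, 900]
kept, outliers = mad_outlier_filter(itmr_ms)
print(kept, outliers)
```

In this toy example the median is 325 ms and the scaled MAD is about 22 ms, so only the 900 ms trial exceeds the cut-off of roughly 56 ms around the median.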
Statistical Package for the Social Sciences Version 26 (SPSS Inc., Chicago, IL, USA) was used for all statistical analyses. Cohen's kappa [35] was used to assess intra-session reliability as the agreement of CoMR between doubled videos (session 1). Fleiss' kappa [36] was used to assess inter-session reliability as the agreement of CoMR across the two doubled videos (one pair per session; sessions 1 and 2). A 95% confidence interval was calculated according to Sheskin [37]. Overall kappas (intra- and inter-session) are presented as mean kappa values of occlusions for hand-specific attack actions. The interpretation of kappa coefficients was based on the proposed standards for strength of agreement: <0 = poor, 0.01-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial and 0.81-1 = almost perfect [38]. For all reliability calculations of CoMR (multi-categorical variables), we followed the proposed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) of Kottner et al. [39].
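To make the intra-session measure concrete, the following sketch computes Cohen's kappa for two presentations of the same clip and maps the result onto the strength-of-agreement bands listed above. Function names, response labels and data are hypothetical. Values between the stated band limits (e.g., between 0.40 and 0.41) are assigned to the lower band, which is an assumption of this sketch.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long lists of categorical responses."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement from the product of the marginal proportions.
    expected = sum(
        counts_first[c] * counts_second[c] for c in counts_first
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def strength_of_agreement(kappa):
    """Verbal label following the bands cited in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical CoMR of six participants for the first and second
# presentation of the same clip within one session.
first_view = ["forward", "forward", "block", "block", "left", "forward"]
second_view = ["forward", "block", "block", "block", "left", "forward"]
kappa = cohen_kappa(first_view, second_view)
print(round(kappa, 3), strength_of_agreement(kappa))
```

Here five of six response pairs agree (observed agreement 0.83) against a chance agreement of 0.36, giving a kappa of about 0.74, which falls into the substantial band.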
With respect to ItMR, we were interested in whether the expected pattern of faster initialization times in response to videos containing more information was present. After establishing normally distributed data by means of the Kolmogorov-Smirnov test, differences in ItMR as a function of occlusion time point within TO session 1 were assessed with one-way repeated measures ANOVA. Greenhouse-Geisser correction was applied in cases of violation of the sphericity assumption (assessed with Mauchly's test).
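The ItMR analysis can be sketched in plain Python. The code below computes the F statistic of a one-way repeated measures ANOVA and the Greenhouse-Geisser epsilon used to correct the degrees of freedom. It is a simplified illustration with hypothetical names and data; it does not compute p-values or Mauchly's test and is not the SPSS procedure used in the study.

```python
def rm_anova_oneway(data):
    """F statistic for data[i][j]: participant i, condition j (complete cases)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


def greenhouse_geisser_epsilon(data):
    """Epsilon from the double-centered covariance matrix of the conditions."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    cov = [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]
    row_means = [sum(r) / k for r in cov]
    grand = sum(row_means) / k
    centered = [
        [cov[a][b] - row_means[a] - row_means[b] + grand for b in range(k)]
        for a in range(k)
    ]
    trace = sum(centered[a][a] for a in range(k))
    sum_squares = sum(x * x for r in centered for x in r)
    return trace ** 2 / ((k - 1) * sum_squares)


# Hypothetical ItMR (ms) of three participants at three occlusion points.
itmr = [[430, 420, 410], [450, 430, 420], [460, 450, 430]]
f_value, df_cond, df_error = rm_anova_oneway(itmr)
epsilon = greenhouse_geisser_epsilon(itmr)
corrected_df = (epsilon * df_cond, epsilon * df_error)
print(round(f_value, 2), df_cond, df_error, round(epsilon, 3))
```

Epsilon ranges from 1/(k − 1) to 1; a value of 1 indicates that sphericity holds and leaves the degrees of freedom unchanged, whereas smaller values shrink both degrees of freedom before the F distribution is consulted.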
The significance level for all analyses was set to the conventional p < 0.05.

Choice Confidence
Choice confidence in all four attack situations (left and right-handed; intra-session) was high (M = 4.5 and SD = 0.4 on the 1 to 6 point Likert-type scale; see Figure 4).

Intra-Session Reliability
The distribution and frequency of CoMR in session 1 are presented in Figure 5. The number of complete CoMR video pairs (intra-session) ranged between n = 44 and 65, from a total of 66 pairs. Missing pairs resulted from the exclusion of videos with invalid motor responses (see Methods). CoMRs at the different occlusion time points can be found in Supplementary Materials Figures S1-S4.
A visual inspection of subjects' CoMRs revealed that in most attacks, a consistent preference for either a passive position/blocking or moving forward/tackling was present. Furthermore, attacks with less available visual information (t6 − t4) often corresponded with a forward/tackling choice, and a passive position/blocking response was chosen more often as the amount of visual information (t3 − t0) increased. There is a notable difference in the response dynamics in breakthrough. Decisions in the left-handed version tended sideways-right after occlusion time point t4, whereas participants in the right-handed version preferred passive position/blocking or moving forward/tackling (see Supplementary Materials Figures S1-S4).
Cohen's kappa statistics revealed that intra-session reliability was significant for all actions (all p's ≤ 0.025; see Table 1). Agreements ranged from fair (right-handed pass) to moderate (left-handed breakthrough). Six substantial levels of agreement were found, four for the breakthrough and two for the standing throw. For occlusions t6 and t5, agreement was mostly moderate; agreement then consistently decreased toward chance level (t5 − t2); finally, the strongest agreement level was found at the end of an attack (t1 − t0).
The overall mean kappa agreement of CoMR for individual right and left-handed attacks can be considered as moderate (Table 1).

Table 1. Intra-session agreement of the CoMR for all right and left-handed attacks. Agreement between video pairs (n) for each occlusion of an attack (session 1) was assessed using Cohen's kappa (K). The 95% confidence interval (CI) and significance values (p) were calculated.

Right-Handed Attacks Left-Handed Attacks
Attack Action Occlusion n K 95% CI p n K 95% CI p

Inter-Session Reliability
Results for inter-session reliability (n = 22) can be found in Table 2. Agreement in CoMR ranged from fair (left-handed jump throw) to moderate (right-handed breakthrough). Only two non-significant, slight levels of agreement were found (right-handed jump throw at t3; right-handed standing throw at t2). Three left-handed attacks demonstrated substantial agreement in the latest occlusion points (pass at t0; breakthrough at t0). Almost perfect agreement was found for the left-handed jump throw.
Overall mean agreement of CoMR for individual right and left-handed attacks can be considered as fair to moderate. Note the between-hand differences in single kappa values at t6 and t0 in breakthrough, t2-t0 in jump throw, t3 in standing throw and t0 in pass, and in the overall agreement in jump throw.
A summarizing graphical overview of within and between-session reliability results is provided in Figure 6.

Table 2. Inter-session agreement of the CoMR in all right and left-handed attacks. Agreement of four responses from 2 video pairs (one pair in each of both sessions) for each occlusion of an attack was assessed using Fleiss' kappa (K). The 95% confidence interval (CI) and significance values (p) were calculated.

Right-Handed Attacks
Left-Handed Attacks

Initialization Time
The results for ItMR (Figure 7) show

Discussion
The TO paradigm is considered a well-established tool to assess perceptual-cognitive skills in sports [9,40]. The aim of the present study was to create and evaluate a real-world-like test environment to address perceptual-cognitive skills in team handball. Specifically, in line with recommendations in the literature [25], our test uses (a) a life-sized projection screen and a contact plate system, (b) varying open-game attack actions from team handball and (c) multi-categorical motor defensive actions. Athletes' self-reports indicated that they responded with a high degree of confidence to the video clips, thereby suggesting that meaningful team handball-related information was presented. Within- and between-session reliability analyses generally revealed moderate agreement among the motor responses chosen. With increasing visual information about the attackers' unfolding actions, participants more rapidly initiated their defensive actions. Our results qualify this new test setup for future longitudinal measurements (e.g., in the context of cross-sectional analyses, correlation studies or tactical skill training).

Choice Confidence and Initialization Times
Choices in our test setup were generally rated as tendentious to unambiguous, which we interpret as evidence for an appropriate task difficulty level within the near-game test environment. Additionally, we observed faster response times with increasing visual information, which is consistent with current models of decision making [24]. Subjects seemed to get closer to decision thresholds with more information. With temporal progression in the videos, the attacker offers more information about the intended action through the ongoing occurrence of kinematic cues, which apparently clarifies the tactical decisions to be made by the defending participant. The resulting accuracy increase at later occlusion time points was in line with several computer-based TO studies [14,41]. Regarding the motor aspect in this study, our results are also in good agreement with the findings of Farrow et al. [41], where the accuracy of decision quality from tennis-specific return strokes improved with more information. Given the overall linear decrease of motor initialization time over the occlusion time course, we suppose that motor response times in our TO model are associated with decision-making processes and accuracy outcomes. Projected to the one-on-one situation in team handball, an earlier perception of an attacker's future motion could lead to a higher success rate by the defender.
Following up on the matter of response time, explanatory approaches in team handball were given by the study of Raab and Laborde [42]. Their video-based experiment demonstrated that expert players make faster and better tactical choices than near-experts and nonexpert players. Comparisons of intuitive and deliberative preferences for tactical choices in attack situations were drawn using decision time as a performance-discriminating factor. Supporting take-the-first heuristics [43], experts seem to rely on very little information for making rather intuitive tactical decisions, resulting in faster and better choices. By projecting this heuristic model onto our findings, the initialization times could be of strong consideration for future intuitive decision making analyses in complex motor settings. Worthwhile approaches for possible expert-novice comparisons could be provided by the take-the-first and take-the-best strategies [44]. Generally, using motor initialization times in complex TO settings could also benefit future accuracy outcomes (e.g., through identification of waiting strategies before decision making [45], or in the context of embodied choices [46]).

Intra- and Inter-Session Reliability
Reliable measurements constitute a basic prerequisite for reproducible correlational studies, cross-sectional group comparisons and longitudinal studies (within or between groups) [47,48]. Surprisingly though, to the best of our knowledge, no study hitherto systematically investigated the reliability of multi-categorical performance metrics in TO research.
Cross-sectional studies revealed some evidence for reliability in the TO paradigm in team sports. Internal consistency analysis of a computer-based TO test in volleyball, where participants had to distinguish between a smash and lob, found acceptable reliability (r = 0.72) for video pair responses on the interval scale level [14]. When novices (no experience in competitive volleyball) and skilled volleyball players were separately analyzed, coefficients decreased to 0.66 and 0.55, respectively. With respect to intra-session reliability, we generally found moderate-to-substantial response consistency (right and left-handed) in all occlusion time points evaluated. Therefore, besides one exception (pass), the reliability estimates reported here were comparable to those of Loffing et al. [14].
A closer look at our data revealed attack-specific differences in terms of reliability, which emphasizes the specificity of varying open-game situations. For example, in breakthrough, the late occluded videos (t3; t2; t1) revealed relatively high levels of agreement. In these occlusions, the full action intentions of the attacker seemed to be terminated due to a highly dynamic run-up (t0 till t1) and a deceptive movement at t3, which is why most participants just reacted subconsciously in the following occlusions (t4 till t6) with most likely identical decisions. That may explain the visible rise in the level of agreement in breakthrough. A further example of the necessity of varying game situations is given by the differences in reliability between right and left-handed jump throw attacks (inter-session). Higher left-hander reliability in later occlusion points (t4 − t0) could be traced back to different defensive behavior based on greater uncertainty in how to defend against left-handed players. In fact, left-handed players are less frequently represented in team handball [33], which leads to divergent levels of agreement.
Similarly to within-session reliability, there are only a few longitudinal studies that report between-session reliability. Without a TO approach, a related study by Raab and Johnson [49] assessed long-term reliability in the context of option-generation research in team handball. Over a 2-year measurement period, their experimental setup contained full video clip presentations of competition-like attack situations, from the perspective of an attacker approaching the defense line. After the end of each video clip (frozen video frame), participants were instructed to verbally report options for the player in possession of the ball. Reliability estimates for decision quality within four measurement points were calculated using the split-half test. Spearman-Brown coefficients for quality of the first option ranged from 0.49 in test wave one to 0.89 in test wave two. The variability of response reliability in their analyses can be compared to the spread of our inter-session kappa values, which ranged between fair and substantial. The authors also gave recommendations for further longitudinal studies in heuristic settings in sports. Other investigations with the TO paradigm, conducted in cricket, tennis and softball, found overall high reliability (r = 0.74-0.99) for decision and response accuracy [20][21][22]. The high reproducibility reported in these studies can probably be explained by the close-game character of the test setups (batting in softball and cricket; tennis serving), with accuracy outcomes consisting of binary predictions of ball flight direction or type of throw. In this study, we found levels of response agreement ranging from fair to substantial. A possible explanation for the fact that agreement in our study was slightly worse than in those studies is that we used a complex test environment in combination with multi-categorical (instead of binary) response outcomes.
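The split-half procedure with Spearman-Brown correction mentioned above in connection with Raab and Johnson [49] can be sketched as follows. This is a minimal illustration only: the odd/even split of items and the score matrix are assumptions made for the example, since the splitting rule used in [49] is not described here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scores):
    """Split-half reliability for a participants-by-items score matrix.

    Items are split into odd and even positions (an assumed splitting rule).
    """
    odd_totals = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical decision-quality scores: four participants, six items each.
scores = [
    [3, 2, 4, 3, 2, 3],
    [5, 4, 4, 5, 5, 4],
    [2, 1, 2, 2, 1, 3],
    [4, 4, 3, 5, 4, 4],
]
print(round(split_half_reliability(scores), 2))
```

The correction is needed because each half contains only half of the items, so the raw correlation between halves underestimates the reliability of the full test. For instance, a half-test correlation of 0.50 corresponds to a corrected coefficient of 2 × 0.50 / 1.50 ≈ 0.67.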
A detailed look at the inter-session agreement data reveals that the highest kappa values occurred for the earliest or latest occlusions. This leaves room for interpretation as to whether tactical decisions are easier or more difficult to make at these time points.
High kappa values can be explained by the participants' full knowledge about the attacker's intentions in the video clips, especially in the late occlusions at the end of an attack sequence.
High kappa values in the earliest occlusions seem to suggest that the video clips provided too few kinematic cues for an early and risky defensive intervention by the participant. Little information at the beginning of an attack seems to have excluded certain response options, such as sideways movements, from the decision-making process. The exclusion of options increases the response probabilities for the remaining options and, at the same time, the chance of identifying the appropriate option. With response likelihoods concentrated on fewer options, the kappa agreement increased. The comparably low kappas later in the ongoing attack (t₅ to t₃) seem to suggest that the number of kinematic cues in the attackers' movements exceeded what participants could process intuitively, shifting their decision making from intuitive to rather deliberative. In particular, we suspect these time points to be crucial for the perceptual-cognitive skills based on anticipatory information pick-up.

Limitations and Future Research
First, breakthrough showed a disproportionately large number of missing video pairs, which can be explained by the high frequency of dynamic kinematic information (e.g., through the attackers' deceptive move). Here, defenders were "dragged" by the attacker's postural changes, which habitually led to premature movement initiations.
Second, the choice of defensive actions was potentially influenced by the doubled but randomized video clip presentation within occlusion conditions. As mentioned before, standing throw and pass, and jump throw and breakthrough, demonstrated similar movement patterns and run-up steps, respectively. Subconsciously, participants could have been aware of previously observed kinematic cues of matching video clips, which could have resulted in tactical decisions primed by other attacks. We counteracted this problem with a large number of video clips (224 videos per subject) and the inclusion of dummy trials in the test paradigm.
Third, although the degrees of freedom of the defender's movements were exceptionally high compared to previous studies [14,20–23], the pre-specified contact plate positions of our test setup constrained defenders' movement paths. Defending a one-on-one situation in team handball implies varying body and arm positions that also lie between or off the prescribed contact plates. Therefore, only full-body changes of defender position could be analyzed. Nevertheless, the execution of an additional offensive block requires an initial position-changing movement. Staying passive and blocking could not be distinguished either, but again, positioning in the defense was the main focus. That is why we still expected valid insights into tactical defense behavior with this setup.
In team handball, the so-called "show-up" is a typical behavior of defenders. "Show-ups" provoke movement changes or discontinuations of an attacker through disconcertion. A show-up normally implies a fast and single step forward up to the 8 m mark, and a fast-paced movement backwards. Other defenders prefer slightly offensive positions at the 8 m line, and not the classically instructed 7 m line position required in our study. Additional contact plates at 8 m and between central and lateral contact plates would broaden the space for defensive actions and allow analyses of so-called triangular movements (a show-up with a lateral move backwards to the side of the ball).
Fourth, the TO paradigm by nature presents simple time frames with varying postural cues, but our paradigm is unable to provide a clear identification of cues' decisive contributions in action sequences. Additional spatial occlusion and eye movement registration [50] could deliver combined knowledge about what areas or cues in a visual search field are of significant importance for anticipatory processes and provide information about an athlete's information gathering strategy.
The test battery forms a basis for new entry points into future anticipation research in real-life environments in team handball and in invasion sports in general. The rarely considered but crucial aspect of contextual information, or situational probabilities [40,51], offers fruitful research perspectives in the context of this new test setup. Due to the focus on reliability in this paper and the movement basis of the experimental setup, these factors cannot yet be used to explain the present findings. Nevertheless, it must be mentioned that, with these factors included, our test battery can lay the foundation for a more holistic understanding of cognition in team sports. Due to its now proven reliable properties, the test setup could be used as a motor tool for modified perceptual training [52] in the future. With further developments and adjustments, it could allow athletes to improve visual information gathering by repeating natural skill executions in a discipline-specific way. Other areas of interest could be the predictions made by a team handball defender given distinctive situational information in videos, for example, changes of court position by the attacker (see [53]). How will tactical decisions change when the two back players perform attack-specific actions at the same time? Brenton and Müller [54] also recommend the presentation of different protagonists in video-based testing.

Conclusions
In summary, we have extended previous studies by demonstrating that the TO method can be considered a reliable measurement tool based on cross-sectional and longitudinal investigations in team handball. We found fair to moderate agreement among multi-categorical defense responses with obvious tendencies toward substantial and excellent agreement. We have also illustrated that the combination of the TO paradigm with team-handball-specific motor responses on a test battery is feasible. The team-handball-specific nature of the test battery, including a reliable anticipation test method (TO) in a real-life inspired decision-making setting, can contribute not only to improvements in cognitive study designs and interpretations, but also to a deeper understanding of the cognitive mechanisms in team handball. As psychological abilities are claimed to be some of the most momentous performance prerequisites in team handball [55], our test offers possibilities not only for visuomotor training interventions but also for talent identification and talent development processes in team handball.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app11115203/s1. Figure S1: Response distribution and consistency of choices of motor response (doubled videos; intra-session) in breakthrough. Figure S2: Response distribution and consistency of choices of motor response (doubled videos; intra-session) in jump throw. Figure S3: Response distribution and consistency of choices of motor response (doubled videos; intra-session) in standing throw. Figure S4: Response distribution and consistency of choices of motor response (doubled videos; intra-session) in pass.
Funding: This work was supported by the Federal Institute of Sport Science (070506/19-20). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the study protocol's approval by the president's office of the Otto-von-Guericke University Magdeburg and the German Federal Institute of Sport Science (070506/19-20). The study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Not applicable.