How Just Culture and Personal Goals Moderate the Positive Relation between Commercial Pilots’ Safety Citizenship Behavior and Voluntary Incident Reporting

Abstract: Flight safety is consistently influenced by pilots' self-inflicted incidents in routine flight operations. For airlines, pilots' reports on these incidents are essential input for learning from incidents (LFI) and for various safety management processes. This paper aims to explain pilots' voluntary reporting behavior regarding self-inflicted incidents from an occupational safety perspective. We investigate how the relation between pilots' safety citizenship behavior (SCB) and reporting behavior is moderated by pilots' fear, shame, goals, and goal striving when reporting, as well as the influence of a just culture on the decision to report incidents. In total, 202 German commercial pilots participated in an online survey. The results showed that reporting behavior can be considered a specific form of self-intentional SCB, but should be differentiated into subtypes depending on the pilot's unsafe acts (errors or violations) that caused the incident. Motivational factors specific to reporting behavior influenced different subtypes of reporting behavior: just culture moderated a positive relation between SCB and the reporting of incidents caused by violations. Moreover, depending on the subtype of reporting behavior, the relation was moderated by different types of pilots' goals. No moderating effects of fear or shame could be demonstrated. Our findings highlight the value of a just culture for encouraging goal-oriented reporting behavior in the context of LFI and safety management.


Introduction
"We must plant the seeds of training and just culture in order to have a harvest of safe behaviors". This sentence opens the 2019 safety report of the International Air Transport Association (IATA) [1] (p. 2). The report thereby addresses two recent topics at a time when safety management systems (SMS) have become mandatory throughout aviation, but the mere presence of this system has not proven sufficient to ensure safe operations [2] (p. 16). In this research, we illustrate how a just culture positively influences pilots' voluntary incident reporting behavior and how crew training can also benefit from just culture and pilots' goals when reporting.

Learning from Incidents and Experience
Organizational members' safe behavior is particularly important for commercial airlines, as a class of organizations whose activities have the potential to produce major accidents [3]. This potential is increased in organizations where a tension between work and safety fosters violations of regulations [4] or errors [5–7]. From a high reliability theory perspective, preoccupation with failure, combined with organizational vigilance for minor incidents, is one of the main means of dealing with this potential [8]. In contrast to an accident, an occurrence is already considered an incident if it merely had the potential to affect safety [9]. Learning from incidents (LFI) aims to prevent future comparable occurrences by reflecting on past incidents and putting lessons learned into practice [10]. In this context, the scientific literature also refers to near-misses as unplanned occurrences that did not result in damage but had the potential to do so [11]. An even more proactive view should also set the lowest possible threshold for interest in the learning potential of so-called weak signals [12,13]. In this research, we use the term "incident" as a collective term for incidents, near-misses, and weak signals that allow airlines to learn from experiences (LFE) [10,13]. To build experience, commercial airlines use flight data analysis, various behavioral observations of pilots while working in the cockpit, findings from accident investigations, and data from reporting systems [14]. The IATA encourages airlines to use this experience, for example, to tailor evidence-based training (EBT) to the specific needs of the organization and its pilots, and to use pilots' first-hand experience of "airline-critical procedures" by evaluating their incident reports [15] (p. 52).
Incident reports can also be an important prerequisite for enabling LFI, hazard identification, and risk monitoring in the context of safety management systems (SMS) [16]. An SMS is based on a formal and prescriptive approach and includes explicit and formal methods of how organizations should identify and manage risks; this includes processes for how incidents are identified and reflected upon [17]. The use of incident reports, despite their role as a lagging indicator of safety performance, can be an indication of proactive safety management in organizations [18].
Incident reporting can be viewed through various industrial and organizational psychology lenses, such as aspects of organizational change, theories of work satisfaction, or social constructivist perspectives [19]. From an occupational safety perspective, research in recent years has identified several work environment-related and individual-related barriers that can limit the effectiveness of reporting systems [20]. Not all incidents perceived by organizational members are actually reported (e.g., [21]). Underreporting is a common issue in various high reliability domains, such as healthcare [22], energy supply [23], and civil aviation [24]. The extent of underreporting has been shown to depend on the severity of the event [25]. As the dark figure of unreported incidents is usually not known to airlines, the extent of underreporting is difficult to quantify [13]. Underreporting is therefore likely to impair safety management practices, which then risk failing to prevent accidents at an early stage [10,26]. A review by [26] identified inadequate reporting behavior by organizational members as the primary reason for ineffective reporting systems in more than half of the papers reviewed, underlining that incident detection, the initial LFI phase, can be a significant bottleneck of LFI (e.g., [10]). As a result, applying IATA's recommendation to design evidence-based training (EBT) for pilots from the evidence of incident reports would rest on only a fraction of the incidents that actually occurred [15]. One of the authors is a commercial pilot employed in the safety department of an airline who has repeatedly experienced colleagues not reporting incidents; this research therefore aims to investigate how voluntary reporting behavior can be improved at organizational and individual levels.
In this study, we intend to further explain commercial pilots' voluntary reporting behavior of self-inflicted incidents in the context of safety citizenship behavior (SCB), while also investigating the influence of individual and environmental factors in order to enable LFI and EBT, thus reducing future incidents caused by errors and violations.

Safety Citizenship Behavior and Voluntary Reporting Behavior
For commercial airlines, legal requirements stipulate which incidents must be reported [27]. However, incidents not captured by mandatory reporting regulations may also contain "safety-related information which is perceived by the reporter as an actual or potential hazard to aviation safety" [27] (p. 28). Reporting such actual or potential hazards to flight safety depends on pilots' voluntary initiative.
Voluntary reporting behavior can broadly be defined as extra productive work behavior that promotes organizational goals by following established rules [28]. When considering safety-related behavioral aspects, safety-related cultural and climatic constructs need to be considered, as these express the way safety is valued and help to explain the processes through which the meaning attached to safety in an organization influences safety outcomes [29] (p. 194). Safety culture can be defined as the underlying values and assumptions that guide the safety-related behavior of organizational members [29]. A just culture, mentioned at the beginning, is characterized by an atmosphere of trust and openness and is considered a cultural prerequisite for developing a reporting culture within a clearly defined framework of acceptable and unacceptable behavior that encourages the voluntary reporting of incidents [10,30,31]. As cultural characteristics are less accessible to conscious assessment, aspects of safety climate are often considered preferable, serving as a "surface feature of safety culture" [29,32] (p. 194).
Safety climate refers to "shared perceptions with regard to safety policies, procedures and practices" in an organization [29,33] (p. 143). Through sensemaking, a process of aggregating individual experiences into a collective phenomenon, a safety climate provides a framework shared by organizational members that influences motivation and behavior [34,35]. Several studies showed a strong correlation between organizational members' positive perception of a safety climate, their safety motivation, and voluntary safety behaviors [36,37]. The work-safety tension mentioned above can be seen as a facet of the safety climate that exerts particularly negative effects on employees' safety behaviors, accidents, and injuries [6,7]. Safety motivation describes an "individual's willingness to exert effort to enact safety behaviors and the valence associated with those behaviors" [37] (p. 947). The relation between safety climate and safety motivation is influenced by various individual psychological processes, such as feelings of empowerment, psychological ownership, personal engagement, and passion for meeting challenging work goals [11,29,38]. When work, especially in high-risk areas, seems meaningful to organizational members, safety behavior can be seen as an intrinsically motivated investment in self-protection [39]. Organizational members' safe behavior can be considered a more predictable key safety outcome than the number of accidents or incidents, owing to its closer proximity to the psychological factors described above [36].
Voluntary behavior that has a positive impact on the organization's ability to function, goes beyond established processes and organizational expectations, and is not directly or explicitly recognized by the formal incentive system is referred to as organizational citizenship behavior (OCB) [40,41]. Voluntary safety behavior that positively impacts organizational safety levels and goes beyond safety compliance is referred to as safety citizenship behavior (SCB) and has its origins in OCB [11]. In the scientific literature, SCB is often used interchangeably with the concept of safety participation; it can be divided into affiliation-oriented (prosocial) SCB, which mainly refers to affective dimensions of organizational membership, and change-oriented (initiative-based) SCB, which is strongly related to self-efficacy [42–45].
Change-oriented SCB has been studied much less frequently than affiliation-oriented SCB in the psychological literature to date [43]. However, according to [46], further analyses of SCB could significantly improve safety in an organizational context. In this research, we therefore address how SCB and voluntary reporting behavior are related. We expect that commercial pilots who exhibit pronounced fundamental SCB in their routine work will also exhibit pronounced reporting behavior when they cause a self-inflicted incident. We therefore hypothesize that commercial pilots' safety citizenship behavior is positively related to their voluntary reporting behavior, which we regard as a specific form of change-oriented SCB on the dimension of "individual initiative" ([47]) or "initiating safety-related change (improving safety)" ([48–50]).
However, because studies have shown that voluntary reporting behavior appears to differ substantially depending on which factors cause an incident to be reported (e.g., [51]), we briefly describe ways of reflecting on incidents in the following section, before we formulate our first research hypothesis.

Reflecting Incidents
The overall performance of the flight crew remains the "primary contributing factor for accidents and incidents" [1] (p. 45). Particularly during critical phases of the flight, such as takeoff and landing, a limited "human consistency" [52] is often related to accidents and incidents.
When investigating accidents, however, modern perspectives, such as System-Theoretic Accident Model and Processes (STAMP), consider accidents more as failures of a complex sociotechnical system structure and, thus, as the result of uncertain interactions of various system components, in which human factors are one of many components [53]. The interactions between the different system structures and their environment implied in this view allow for a more complete understanding of the system, and thus allow for more effective safety management [54]. In addition, the consideration of system dynamics within this approach is essential to comprehend the development of more complex accidents, such as the Bhopal incident, to give one example [55,56].
When investigating incidents, less complex, linear models such as the Swiss Cheese Model are frequently used due to the usually significantly larger number of incidents [56,57]. The Human Factors Analysis and Classification System (HFACS), which is based on the Swiss Cheese Model, considers the occurrence of incidents as a linear failure in the areas of organizational influences, unsafe supervision, preconditions, and pilots' unsafe acts [58]. The appropriateness of HFACS for incident report analysis has also been demonstrated in scientific studies [59,60]. In HFACS, pilots' unsafe acts are divided into performance-based errors, judgment and decision-making errors, and violations. The DoD-HFACS 7.0 has established a total of 13 different subordinate categories for unsafe acts, as shown in Figure 1 [61]. As mentioned before, errors and violations as causes of incidents are also favored by a work-safety tension [4,5,7]: Work-safety tension can be understood as "the tension felt when working safely is perceived to be at odds with effectively performing one's job" [6] (p. 1462). In addition, this tension negatively affects organizational members' attitudes towards reporting accidents and incidents [4].
Studies by [51,62] showed that the number of incident reports differed depending on the unsafe acts that caused a given incident. Since airlines usually know only the number of reports and not how often incidents occur, fewer reports do not necessarily mean fewer incidents. We therefore suggest that operationalizing a single, undifferentiated incident reporting behavior is not appropriate, because pilots' behavior seems to differ fundamentally depending on what caused an incident. Our first research hypothesis is therefore:
H1: Commercial pilots' safety citizenship behavior is positively related to their voluntary reporting behavior of self-inflicted incidents caused by:
H1a: Performance-based errors;
H1b: Decision-making errors;
H1c: Violations.
When we use the term "reporting behavior" in this research, we always mean commercial pilots' voluntary reporting behavior of self-inflicted incidents. For improved readability, we use the term "performance-based error reporting" when referring to commercial pilots' voluntary reporting behavior of incidents caused by performance-based errors. This wording is used analogously for "decision-making error reporting" and "violation reporting".
Reporting behavior differs from change-oriented SCB by the fact that a pilot must admit their own errors to initiate safety-related changes (cf. [46,48,63]). This aspect underlines that the expected positive relation between SCB and reporting behavior, like any form of work behavior, should be considered under the influence of reporting behavior-specific motivational factors [64].

Factors Influencing Reporting Behavior
For a fundamental consideration of proximal and distal factors influencing safety-related work behaviors and safety outcomes, the reader may refer to the Integrative Model of Workplace Safety (IMWS) by [36]. In this model, safety climate and leadership are described as distal, situation-related factors, whereas personality traits and attitudes are person-related distal factors; these distal factors determine the degree of safety motivation, which is considered a proximal factor for safety performance and thus also for safety participation, a term often used interchangeably with SCB in the scientific literature [29,36,42].
With respect to the reporting behavior of organizational members in hazardous organizations, the state of research is already quite extensive; an expansion has taken place in recent years, primarily via studies in the healthcare sector (cf. [65,66]). The reporting behavior of commercial pilots has been investigated through a relatively small number of studies [24,67]. Furthermore, commercial pilots are often a minority in studies with multiple employee groups [20]. For this reason, we consulted experts in the field of incident analysis on factors influencing reporting behavior in preparation for this research in the summer of 2019 (publication currently under review). Based on these findings, we outline how reporting behavior in terms of "optimal functioning at work" can be affected by individual and environmental motivational antecedents and goal-related aspects, with reference to an Integrative Motivation Framework [64] in the following section.
At an individual level, momentary states of fear and shame can be considered motivational antecedents of voluntary reporting behavior [29,64]. Fear when reporting can restrict reporting behavior and may be directed, for example, toward disciplinary action, an unfair performance appraisal, or damage to one's reputation [68,69]. Shame impairs reporting behavior and can even trigger concealment, especially in areas perceived as risky [26,69].
At an environmental level, reporting behavior can be influenced by the prevailing reporting culture within an organization and its associated just culture [18,26,64,70]. The European Parliament defines just culture to be a "culture in which front-line operators or other persons are not punished for actions, omissions or decisions taken by them that are commensurate with their experience and training, but in which gross negligence, willful violations and destructive acts are not tolerated" [27] (p. 27). Even though just culture is generally considered a facet of organizational safety culture, commercial airlines in the European Union are required to translate the "principles" described in the definition into internal organizational rules [18,27]. The legislation on the level of the European Union intends to clearly protect the reporter from disadvantages or prosecution within the framework of a just culture [27]. In a meeting between the authors and safety experts from the Professional Association of Commercial Aircraft Pilots and Flight Engineers in Germany "Vereinigung Cockpit (VC)" during the preparation of this study, it became clear that, although these legal framework conditions apply at an EU level, they have not been integrated into national German law so far.
Like any behavior at work, reporting behavior is goal-directed and can be considered in the context of various motivational theories [64]. In this study, we consider aspects of goal choice and goal striving that pilots pursue when writing a report in the context of a work-safety tension [6,64]. Based on the results of the expert survey mentioned above, two types of goals are distinguished: change goals are characterized by giving other pilots or the organization the opportunity to learn from one's own errors or by achieving safety-related improvements in the work environment [24,51]. Protection goals focus on documenting an incident to limit individual liability for self-inflicted incidents [27]. From the perspective of the Theory of Planned Behavior, factors that affect goal striving become limiting when a pilot intends to report but this intention does not translate into action (cf. [71]). One mechanism identified in this context during the expert interview was procrastination, the voluntary delay of an intended, important action even though the individual is aware of potentially unfavorable consequences (cf. [72]). Pronounced procrastination may result from low self-efficacy, whereas high self-efficacy is related to SCB [43,73].
The reporting behavior-specific individual and environmental factors derived from this preparatory study go beyond the person- and situation-related factors that affect an overall "individual's willingness to exert effort to enact safety behaviors" [37] (p. 947) in the IMWS [36]. We therefore assume that voluntary reporting behavior cannot be explained by these factors alone, and that the individual and environmental motivational factors outlined in this section, which relate exclusively to reporting behavior, may affect the magnitude of the positive relation between fundamental SCB and reporting behavior. Our second research hypothesis is therefore:
H2: The magnitude of the positive relation between a commercial pilot's safety citizenship behavior and their voluntary reporting behaviors depends on:
H2a: Fear;
H2b: Shame;
H2c: Just culture;
H2d: Change goals;
H2e: Protection goals;
H2f: Procrastination.
With this hypothesis, we also follow the recommendations of [42] and [64] to further investigate motivational aspects in safety-participating activities at work.
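In a regression framework, a moderation hypothesis such as H2 is commonly tested by adding the product of predictor and moderator to a model that already contains both main effects; the moderating effect is then the product-term coefficient and the variance it adds (ΔR²). The sketch below illustrates that logic under stated assumptions only: this section does not describe the authors' estimation procedure, so ordinary least squares on mean-centred predictors is our assumption, and the variable names (`scb`, `just_culture`, `report`) and the simulated data are hypothetical, not the study's data.

```python
import numpy as np


def moderated_regression(x, m, y):
    """Compare y ~ x + m with y ~ x + m + x*m using OLS on mean-centred x and m.

    Returns the full-model coefficients and Delta R^2, the share of variance
    in y explained additionally by the x*m product term.
    """
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    xc, mc = x - x.mean(), m - m.mean()  # centring keeps main effects interpretable
    ones = np.ones_like(xc)
    ss_tot = (y - y.mean()) @ (y - y.mean())

    def fit(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return beta, 1.0 - (resid @ resid) / ss_tot

    _, r2_main = fit(np.column_stack([ones, xc, mc]))
    beta, r2_full = fit(np.column_stack([ones, xc, mc, xc * mc]))
    return {
        "b_x": beta[1],   # slope of x at the mean of m
        "b_m": beta[2],
        "b_xm": beta[3],  # interaction (moderation) coefficient
        "delta_r2": r2_full - r2_main,
    }


def simple_slope(b_x, b_xm, m_centred):
    """Conditional slope of x at a moderator value given as deviation from its mean."""
    return b_x + b_xm * m_centred


# Usage with simulated, purely hypothetical data (not the study's data):
rng = np.random.default_rng(42)
scb = rng.normal(size=202)           # hypothetical SCB scores
just_culture = rng.normal(size=202)  # hypothetical moderator scores
report = (0.4 * scb + 0.1 * just_culture + 0.3 * scb * just_culture
          + rng.normal(scale=0.5, size=202))
res = moderated_regression(scb, just_culture, report)
sd_m = just_culture.std(ddof=1)
low, high = (simple_slope(res["b_x"], res["b_xm"], s * sd_m) for s in (-1, 1))
```

The simple slopes at one standard deviation below and above the moderator mean show how the SCB-reporting relation would strengthen or weaken with the moderator; a positive `b_xm` means the relation is steeper where the moderator is high.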

Other Planned Analyses
As stated in our hypotheses, from an occupational safety perspective, we expect a positive relation between fundamental SCB and reporting behavior, depending on a moderating influence of the variables presented, which are "at the very heart of theory testing in the social sciences" [74] (p. 255). Thus, we take up one possible perspective on reporting behavior, and it is conceivable, as discussed above, that other perspectives or motivation-based approaches unrelated to safety aspects in organizations may also be appropriate. Therefore, we will use an exploratory approach to investigate whether fundamental SCB remains the best predictor of voluntary reporting behavior, even when compared with possible direct effects of the variables in H2a to H2f. The results are intended to support future research and will not be interpreted further. Figure 2 summarizes our research agenda.

Materials and Methods
The methodology was a non-experimental, correlational design using a computer-based online questionnaire. The questions associated with theoretical constructs were intended to capture differences between pilots in their individual degrees of agreement with the statements presented, so that the data obtained could be used to draw conclusions about the underlying assumptions (cf. [75]).
We do not expect any procedural restrictions, because we expect pilots to be sufficiently familiar with computer and internet use (cf. [76]). We justify the choice of this method with positive previous experiences in comparable studies (cf. [20,24,36,51]) and the high objectivity of the procedure and evaluation (cf. [77]).

Participant Characteristics
The sample consisted of n = 202 participants, of whom 23 (11.4%) were female and 177 (87.6%) were male, while 2 did not indicate their sex. The age distribution is shown in Figure 3. In total, 99 (49.0%) participants indicated their rank as captain, 99 (49.0%) as officer, and 4 did not specify. Moreover, 180 (89.1%) pilots indicated that they worked for a German-registered airline, 11 (5.4%) for an EU-registered airline, and 11 did not specify. On average, participants had been employed by their airline for about 16 years (M = 15.74, SD = 9.03). In total, 32 (15.8%) participants answered the question about an additional function in airline management with yes, 154 (76.2%) with no, and 16 (7.9%) did not want to specify. There were 30 (14.9%) participants who answered the question about an additional role as flight instructor with yes, 189 (78.7%) answered with no, and 11 (5.0%) did not want to specify. For our analyses, no inclusion or exclusion criteria or restrictions based on demographic characteristics were defined (cf. [78]).

Sampling Procedure
The participants were recruited with the support of the Professional Association of Commercial Aircraft Pilots and Flight Engineers in Germany "Vereinigung Cockpit" (VC). The study was conducted in accordance with the Declaration of Helsinki. The protocol of the research project with identification code "RESEARCH SURVEY ON THE REPORTING BEHAVIOR OF PILOTS ON SELF-INFLICETED INCIDENTS" was approved by the Professional Association of Commercial Aircraft Pilots and Flight Engineers in Germany "Vereinigung Cockpit" (VC) on 19 September 2019. Approximately 10,000 potential participants were invited to participate via a regular e-mail newsletter of the professional association on 14 October 2019. During data collection, two reminders were sent (13 December 2019 and 6 February 2020). Data collection was completed on 21 February 2020. With n = 202 participants, the participation rate was about two percent. Participants received no incentives for participation.

Sample Size
The required sample size was calculated using G*Power (version 3.1) for a linear multiple regression (fixed model, R² deviation from zero) [79]. Although the effect sizes calculated in one of our pre-studies [51] were high, they were unsuitable for estimating the expected effect sizes in this study, because interaction effects usually explain less variance than main effects [80]. We therefore considered comparable studies with moderating effects in the context of safety citizenship behavior and safety-related constructs (cf. [81,82]). In these studies, the ∆R² due to moderation ranged from 5% to 8%. Using the smallest effect found in these studies (∆R² = 0.053), we calculated a required sample size of 199 participants (f² = 0.056, α = 0.05, 1 − β = 0.80). The achieved sample size of n = 202 exceeded the required sample size of n = 199.
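The a priori power analysis can be reproduced approximately outside G*Power. The sketch below converts ∆R² into Cohen's f² = R²/(1 − R²), which gives the reported f² ≈ 0.056 for ∆R² = 0.053, and then searches for the smallest n whose F test reaches 80% power. Two points are assumptions, not statements from the text: that the test uses the noncentrality parameter λ = f² · n (our understanding of G*Power's "R² deviation from zero" test), and that the model has three predictors (SCB, moderator, and their product), since the number of predictors is not reported here. The helper names are hypothetical.

```python
from scipy import stats


def f2_from_r2(r2):
    """Cohen's f^2 for an R^2 tested against zero: f^2 = R^2 / (1 - R^2)."""
    return r2 / (1.0 - r2)


def regression_power(n, f2, n_predictors, alpha=0.05):
    """Power of the F test of R^2 = 0 in a fixed-model multiple regression.

    Assumes the noncentrality parameter lambda = f^2 * n.
    """
    df1, df2 = n_predictors, n - n_predictors - 1
    f_crit = stats.f.ppf(1.0 - alpha, df1, df2)      # critical F under H0
    return stats.ncf.sf(f_crit, df1, df2, f2 * n)    # P(F > f_crit) under H1


def required_n(f2, n_predictors, alpha=0.05, target_power=0.80):
    """Smallest n whose power reaches target_power (simple linear search)."""
    n = n_predictors + 2  # smallest n with at least one residual degree of freedom
    while regression_power(n, f2, n_predictors, alpha) < target_power:
        n += 1
    return n


f2 = f2_from_r2(0.053)                                # about 0.056
n_needed = required_n(round(f2, 3), n_predictors=3)   # 3 predictors is an assumption
```

Under these assumptions the search should land close to the reported value; a different number of predictors changes the result, which is why the assumption matters when comparing with the G*Power output.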

Measures
Participants were asked to indicate their individual level of agreement with the statements presented (item masters). To this end, a verbal, bipolar rating scale with labeled end points and a neutral middle category was used (cf. [83]). The scale was presented from left to right and consisted of five response categories from 1 = "I strongly disagree" to 5 = "I strongly agree"; no "I do not know" option was offered (cf. [84,85]). The same scale was used throughout the questionnaire (cf. [86]).

Outcome Variables: Reporting Behaviors
Reporting behaviors were measured with a total of 13 hypothetical incident scenarios as item masters. The universe of items was defined by the subordinate categories of the three unsafe act types (performance-based errors, decision-making errors, and violations) included in DoD-HFACS 7.0 (cf. Figure 1) [61,87]. We used a rational construction strategy to develop one hypothetical incident scenario for each item master. The content of each scenario was derived, following an evidence-based approach, from an analysis of more than 2000 pilots' voluntary reports of self-inflicted incidents, which was conducted in preparation for this research [62]. In addition, the following design rules were established in advance: the incident scenario assigned to each subordinate unsafe act category should correspond to an incident frequently described in that category in the evaluated reports and, if possible, include information about contributing factors identified by [62] as risk factors for this type of incident. Moreover, the incident scenarios should contain all information relevant for the assessment (cf. [88]), have a medium occurrence severity (cf. [51]), be comprehensible regardless of the aircraft type flown by the pilot, and not describe any measurable damage.
The selection of the items based on the unsafe act categories of DoD-HFACS 7.0 and the evidence-based construction of each item described above supported a content-valid operationalization of the three subtypes of reporting behavior (cf. [75]). Since the assignment of the hypothetical incident scenarios to the three subtypes of reporting behavior was already given by the HFACS taxonomy, no further test of construct validity was conducted (cf. [75]).
Performance-based error reporting was measured with six hypothetical incident scenario items. An example item was: "During cruise you receive a climb-clearance to FL380. You initiate the climb using the auto flight system by selecting a vertical speed. After that, you continue reading in your magazine. A few moments later you see that the IAS has reduced significantly toward the minimum speed. You decrease your vertical speed and continue the flight without further incidents". The internal consistency of the scale in this study was in a good range (Cronbach's α = 0.82) [89].
Decision-making error reporting behavior was measured with four hypothetical incident scenario items. An example item was: "During final approach, the tower reports that a wooden plank has been seen on the right edge of the runway by another aircraft. You decide to land a little left of the centerline. During landing roll, you realize that it is a larger number of wooden planks that could have caused damage if you had touched down right of the centerline. You taxi to the gate without further incidents". The internal consistency of the scale in this study was in a questionable range (Cronbach's α = 0.69) [89].
Violation reporting behavior was measured with three hypothetical incident scenario items. An example item was: "During final approach you fly faster than usual to reduce the delay of your flight. Distracted by several ATC calls you miss an on-time configuration of the aircraft and reach the altitude prescribed by your airline for a stable condition with 30 knots overspeed. Together with your colleague, you perform a risk assessment and decide to continue the approach, due to the fact the missed approach would lead through an area of bad weather. You land and taxi to the gate without further incidents". The internal consistency of the scale in this study was in a good range (Cronbach's α = 0.80) [89].
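Each scale in this section is summarized by Cronbach's α, α = k/(k − 1) · (1 − Σ item variances / variance of the sum score) for k items. The sketch below computes it from a respondents × items matrix and attaches a verbal label. The label cut-offs (≥ .90 excellent, ≥ .80 good, ≥ .70 acceptable, ≥ .60 questionable, ≥ .50 poor) are an assumption: they are a common rule of thumb that agrees with every label reported here (for example, 0.69 "questionable" and 0.82 "good"), but the exact bands of [89] are not restated in this section. The demo answers are hypothetical, and the function assumes no missing values.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items matrix (no missing values)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of single-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the sum score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


# Assumed rule-of-thumb bands, consistent with the labels used in this section.
BANDS = [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
         (0.6, "questionable"), (0.5, "poor")]


def label_alpha(alpha):
    """Return the verbal label of the first band whose cut-off alpha reaches."""
    for cutoff, label in BANDS:
        if alpha >= cutoff:
            return label
    return "unacceptable"


# Hypothetical answers of four respondents to a three-item scale (1-5 ratings).
demo = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5]]
print(round(cronbach_alpha(demo), 2), label_alpha(cronbach_alpha(demo)))
```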

Predictor Variable: Safety Citizenship Behavior
Safety citizenship behavior (SCB) was measured using the three items of the safety participation factor by [37]. We chose this scale because the items reflected most closely the fundamental aspects of SCB described above. An example item was: "I put in extra effort to improve the safety of the workplace". The internal consistency of the scale in this study was in a good range (Cronbach's α = 0.88) [89].

Moderators
To measure the constructs of fear, shame, just culture, change goals, protection goals, and procrastination, item groups constructed using an inductive construction strategy were used. To ensure that all aspects relevant to each construct were included in a representative and content-valid manner, the content relevant to voluntary reporting behavior was identified through the expert interview mentioned above (cf. [90]). The universe of items was defined by the construct-specific aspects described as relevant by the experts (cf. [87]). When formulating the item masters, the experts' literal formulations from the interview were used whenever possible. Following [75], these rules were observed: the personalized statements should directly address the characteristic of interest and be as concrete and emotionally neutral as possible. When formulating the items, linguistic comprehensibility had the highest priority; we placed special emphasis on a positive formulation of the items, the use of simple terms, and a clear sentence structure. To ensure unambiguous item content, we avoided universal expressions, adapted the item content to the language level of the target group, made sure that no item contained multiple statements, and refrained from suggestive wording.
Fear was measured with eight items. Example items were "When I think about reporting an unsafe act to my airline, I fear being asked unpleasant questions" and "When I think about reporting an unsafe act to my airline, I fear being fired". The internal consistency of the scale in this study was in an excellent range (Cronbach's α = 0.93) [89].
Shame was measured with four items. An example item was "When I commit an unsafe act, I feel ashamed". The internal consistency of the scale in this study was in an acceptable range (Cronbach's α = 0.74) [89].
Just culture was measured with three items. All items began with "In my airline" to reflect that they address the organizational rather than the psychological level of analysis (cf. [91]). An example item was "In my airline, the management practices just culture principles". The internal consistency of the scale in this study was in a good range (Cronbach's α = 0.89) [89].
Change-related goals relate to enabling a safety-related change or learning in the organization, and were measured with four items. An example item was: "By reporting an unsafe act, I can help other pilots learning from it". The reliability of the scale in this study was in a good range (Cronbach's α = 0.81) [89].
Protection goals relate to documenting the incident to reduce individual liability concerns, and were measured with two items. An example item was: "By reporting an unsafe act, I am legally protected if the act becomes known to my airline through other sources". Because [92] recommend the Spearman-Brown coefficient for estimating the reliability of a two-item scale, we report this coefficient here. The reliability of the scale in this study was in a good range (ρ_y1y2 = 0.83) [89].
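For a two-item scale, the Spearman-Brown coefficient is ρ = 2r / (1 + r), where r is the inter-item correlation. The sketch below (hypothetical function names, NumPy only) computes it and also inverts the formula: the reported ρ = 0.83 corresponds to an inter-item correlation of about 0.71.

```python
import numpy as np


def spearman_brown_two_items(x, y) -> float:
    """Spearman-Brown reliability of a two-item scale: 2r / (1 + r)."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return 2.0 * r / (1.0 + r)


def implied_inter_item_r(rho: float) -> float:
    """Invert the formula: the inter-item correlation that yields a given rho."""
    return rho / (2.0 - rho)


# Inter-item correlation implied by the reported coefficient of 0.83
print(round(implied_inter_item_r(0.83), 2))
```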
Procrastination was measured with four items. An example item was "It happens that I keep postponing filing my report, ending up not filing it at all". The internal consistency of the scale in this study was in a good range (Cronbach's α = 0.84) [89].

Descriptive Items Statistics
Except for four items, item difficulty (P_i) was in a medium range (20 ≤ P_i ≤ 80) [75]. Eighteen items showed good item discrimination (0.4 ≤ r_it ≤ 0.7); item discrimination was below 0.4 for 13 items and above 0.7 for 9 items [75]. Because the items with low discrimination were not close to zero and were relevant for content-theoretical reasons, they were not removed [75]. The descriptive item statistics are summarized in Appendix A.
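The text does not give the formulas behind P_i and r_it. A common choice for rating-scale items, assumed in the sketch below, is P_i = 100 · (item mean − scale minimum) / (scale maximum − scale minimum) and, for discrimination, the part-whole corrected item-total correlation (each item correlated with the sum of the other items). Function names and the 1 to 5 default range are hypothetical.

```python
import numpy as np


def item_difficulty(items, scale_min=1, scale_max=5):
    """Difficulty index P_i in percent for rating-scale items (assumed definition)."""
    scores = np.asarray(items, dtype=float)
    return 100.0 * (scores.mean(axis=0) - scale_min) / (scale_max - scale_min)


def corrected_item_total(items):
    """Discrimination r_it: correlation of each item with the sum of the remaining items."""
    scores = np.asarray(items, dtype=float)
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]  # part-whole corrected
        for j in range(scores.shape[1])
    ])


def summarize(items, scale_min=1, scale_max=5):
    """Count items in the ranges used in the text (20-80 for P_i, 0.4-0.7 for r_it)."""
    p = item_difficulty(items, scale_min, scale_max)
    r = corrected_item_total(items)
    return {
        "medium_difficulty": int(np.sum((p >= 20) & (p <= 80))),
        "good_discrimination": int(np.sum((r >= 0.4) & (r <= 0.7))),
        "low_discrimination": int(np.sum(r < 0.4)),
        "high_discrimination": int(np.sum(r > 0.7)),
    }
```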
To assess the correlation structure between the test items and the constructs used as independent variables, we applied structure-seeking factor-analytic procedures for construct validation [75,90]. A first descriptive classification of the items into the theoretical structure of the described constructs was intended to provide information about the homogeneity of the test items [75]. The number of factors was set a priori to seven. The Kaiser-Meyer-Olkin measure confirmed sampling adequacy (KMO = 0.837), and Bartlett's test of sphericity was significant (χ²(378) = 3796.97, p < 0.001). The seven factors explained 64.51% of the total variance. The rotated factor structure reflected the theoretical structure and is presented in Appendix B.
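Both adequacy checks above depend only on the item correlation matrix. The NumPy sketch below uses the textbook formulas: Bartlett's χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, and KMO as the share of squared correlations relative to squared correlations plus squared partial correlations. With p = 28 items the degrees of freedom are 378, matching the value reported above. Function names are hypothetical; a p-value could be obtained with `scipy.stats.chi2.sf(stat, df)`.

```python
import numpy as np


def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom for an (n, p) data matrix."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log-determinant of the correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations of each pair, controlling for all other variables
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)
```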

Pretest
A multistage pretest was conducted to qualitatively test the comprehensibility of the items. In addition, we wanted to ensure that all information relevant for assessing the described incident scenarios was available (cf. [88]). In a first step, 24 participants, 15 of them pilots, completed a printed version of the questionnaire and documented any difficulties they encountered. After the corrections had been incorporated, a further pretest with three pilots was conducted using "cognitive pretesting" in combination with the "think aloud method" (cf. [83]). Another pretest using paraphrasing was carried out with one more pilot. Four persons, two of them pilots, tested the final online version of the questionnaire. The pretests led, for example, to reworded items and adjusted scenarios.

Procedure
Data were collected with a web-based questionnaire using EvaSys software. We chose this software for data protection reasons and because of its optimized usability on mobile devices. All participants gave their informed consent before taking part in the study. After completing the content-related questions, participants received a short debriefing text (cf. [93]). The questionnaire was in English; completion took between 20 and 30 min.

Analysis
IBM SPSS Statistics (version 27) was used for the analyses. Given the rather large number of constructs, hypothesis testing with latent variable models was not feasible with our sample size, especially given the typically small effect sizes of interaction effects (cf. [94,95]). Moreover, these models suffer from several methodological constraints when testing for interaction effects (cf. [96][97][98]). Therefore, linear regression models were calculated.

Statistical Requirements and Missing Values
After visually inspecting scatterplots with LOWESS smoothing, we assumed linear relations among the variables involved [99]. The collinearity statistics were unremarkable (variance inflation factor (VIF) < 10), with the largest VIF = 2.09 (cf. [100]). Visual inspection of the standardized residuals plotted against the standardized predicted values revealed signs of heteroscedasticity, and P-P plots of observed and expected cumulative probabilities indicated a violation of the normality assumption for the residuals. Therefore, bootstrapping and heteroscedasticity-consistent standard errors were used as described above (cf. [100,101]).
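The VIF of predictor j is 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all other predictors. A minimal NumPy sketch (hypothetical function name) follows; in a moderated model its columns would be the mean-centered predictor, the mean-centered moderator, and their product.

```python
import numpy as np


def vif(predictors):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others."""
    X = np.asarray(predictors, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression: intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

Uncorrelated predictors give VIF = 1; values above the cut-off of 10 used in the text signal problematic collinearity.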
To identify influential data points, we considered changes in regression coefficients (DFBETAS) and in predicted ŷ values (DFFITS) with absolute values larger than 1, as well as Cook's distances D > 0.020 (cf. [74,100]). As neither the demographic data nor the comment field in the questionnaire suggested subpopulations or unusual participants, the hypotheses were tested using the entire data set (cf. [100]). As explained above, the hypotheses were also tested after excluding influential data points; the number of cases excluded from the respective regression model is reflected in its degrees of freedom.
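The three influence criteria can be computed from a single OLS fit with the standard closed-form expressions, without refitting the model n times. The NumPy sketch below is a hypothetical illustration, not the SPSS procedure used in the study. As an aside, 0.020 is close to the common 4/n rule of thumb for n = 202 (4/202 ≈ 0.0198); the text does not say whether the cut-off was derived this way.

```python
import numpy as np


def influence_measures(X, y):
    """Cook's D, DFFITS and DFBETAS for an OLS fit (X must contain the intercept column)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # leverages (hat-matrix diagonal)
    s2 = (e @ e) / (n - p)
    # Residual variance with case i deleted, obtained without refitting
    s2_del = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    cooks = e ** 2 / (p * s2) * h / (1.0 - h) ** 2
    t_ext = e / np.sqrt(s2_del * (1.0 - h))  # externally studentized residuals
    dffits = t_ext * np.sqrt(h / (1.0 - h))
    # Row i holds beta - beta_(i), the coefficient change when case i is dropped
    delta = ((xtx_inv @ X.T) * (e / (1.0 - h))).T
    dfbetas = delta / np.sqrt(s2_del[:, None] * np.diag(xtx_inv)[None, :])
    return cooks, dffits, dfbetas


def flag_influential(X, y, cooks_cut=0.020, dfb_cut=1.0, dff_cut=1.0):
    """Boolean mask of cases exceeding any of the cut-offs named in the text."""
    cooks, dffits, dfbetas = influence_measures(X, y)
    return (cooks > cooks_cut) | (np.abs(dffits) > dff_cut) | np.any(np.abs(dfbetas) > dfb_cut, axis=1)
```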

Hypotheses Testing
To test hypothesis one, three simple linear regression models were calculated. To test hypothesis two, moderated multiple regression models with a two-way interaction between SCB and the moderator named in the hypothesis were estimated by ordinary least squares (OLS) regression using the PROCESS 3.5 macro by [105]. Moderation was tested in 18 individual regression models (six moderators × three outcome variables) because interactions involving more than one moderator are difficult to interpret meaningfully (cf. [80]). The outcome variables were performance-based error reporting, decision-making error reporting, and violation reporting. All continuous variables that defined products were mean-centered to facilitate interpretation of the regression parameters (cf. [105,106]). Bootstrapping with 5000 iterations, together with heteroscedasticity-consistent standard errors (HC3), was used to calculate confidence intervals (cf. [107,108]). The interactions were probed with the Johnson-Neyman technique (cf. [109]). To assess the sensitivity of the regression results, they are additionally presented without the influential data points (cf. [100,110]). The criterion for accepting or rejecting a hypothesis was a significant interaction term in the regression model calculated with the complete data set [106,109]. As the effect size measure, we use the squared semi-partial correlation of the product term (∆R²), because this value represents the portion of variance in the outcome variable that is explained exclusively by the interaction effect [111].
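The moderation model (SCB as predictor X, one moderator W per model, product term X·W) can be sketched in NumPy as below. This is not the PROCESS macro; it is a hypothetical re-implementation of the ingredients named above: mean-centering, OLS, HC3 standard errors, ∆R² obtained by dropping the product term, and a percentile bootstrap of cases for the interaction coefficient b3. Function names and defaults are assumptions.

```python
import numpy as np


def _ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def _r2(y, resid):
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def _design(x, w):
    # Mean-center X and W before forming the product term
    xc, wc = x - x.mean(), w - w.mean()
    return np.column_stack([np.ones_like(xc), xc, wc, xc * wc])


def _hc3_cov(X, resid):
    """HC3 sandwich covariance: squared residuals inflated by 1 / (1 - h_ii)^2."""
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # leverages
    omega = (resid / (1.0 - h)) ** 2
    return xtx_inv @ (X.T * omega) @ X @ xtx_inv


def moderation(x, w, y, n_boot=5000, seed=0):
    """Fit y = b0 + b1*x_c + b2*w_c + b3*x_c*w_c; return HC3 SEs, Delta R^2, bootstrap CI for b3."""
    x, w, y = (np.asarray(a, dtype=float) for a in (x, w, y))
    X = _design(x, w)
    beta, resid = _ols(X, y)
    cov = _hc3_cov(X, resid)
    _, resid_main = _ols(X[:, :3], y)  # same model without the product term
    delta_r2 = _r2(y, resid) - _r2(y, resid_main)
    # Percentile bootstrap of cases for the interaction coefficient b3
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[i] = _ols(_design(x[idx], w[idx]), y[idx])[0][3]
    return {
        "b": beta,
        "se_hc3": np.sqrt(np.diag(cov)),
        "cov_hc3": cov,
        "r2": _r2(y, resid),
        "delta_r2": delta_r2,
        "b3_ci95": np.percentile(boot, [2.5, 97.5]),
    }
```

Centering does not change b3 itself; it only makes b1 and b2 interpretable as effects at the mean of the other variable. With HC3 errors, a test of the interaction reported as F(1, df) is a Wald test equal to the squared t of b3; for H2e in the Results, 2.19² ≈ 4.80, which appears consistent with the reported F(1, 198) = 4.81 up to rounding.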
Hypotheses that were also confirmed in the sensitivity analyses are additionally presented visually to simplify interpretation (cf. [112]).

Other Planned Analyses
To investigate whether SCB remains the best predictor of voluntary reporting behavior when compared with possible direct effects of the variables defined as moderators in hypothesis two, we calculated multiple linear regression models for each dependent variable. Predictors were entered in a data-driven manner using forward selection, because the exploratory approach made no assumptions about their causal order or relevance (cf. [74]). These results are also presented without the influential data points.
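Forward selection adds, one at a time, the predictor that most improves the model and stops when no remaining predictor meets the entry criterion. The text does not state the criterion used; the NumPy sketch below uses a fixed partial F-to-enter threshold as a hypothetical stand-in (3.84 is the 5% critical value of F(1, ∞)), whereas statistical packages typically use a p-value criterion instead.

```python
import numpy as np


def _rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


def forward_selection(X, y, names, f_to_enter=3.84):
    """Greedy forward selection using a partial F-to-enter threshold (hypothetical default)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    current = np.ones((n, 1))  # start from the intercept-only model
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        rss_cur = _rss(current, y)
        best_j, best_f = None, -np.inf
        for j in remaining:
            candidate = np.column_stack([current, X[:, j]])
            rss_new = _rss(candidate, y)
            df_resid = n - candidate.shape[1]
            f_stat = (rss_cur - rss_new) / (rss_new / df_resid)
            if f_stat > best_f:
                best_j, best_f = j, f_stat
        if best_f < f_to_enter:
            break  # no remaining predictor improves the model enough
        selected.append(names[best_j])
        current = np.column_stack([current, X[:, best_j]])
        remaining.remove(best_j)
    return selected
```

Because the order of entry is driven by the data, the selected set describes this sample and should not be read as a causal ranking of predictors.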

Results
The descriptive statistics and correlations of the study variables are shown in Table 1. Correlations between 0.10 and 0.30 correspond to a weak effect, between 0.30 and 0.50 to a moderate effect, and r > 0.50 to a strong effect [113].

Hypotheses Testing
Hypothesis one assumed that commercial pilots' safety citizenship behavior is positively related to their voluntary reporting behavior of self-inflicted incidents caused by performance-based errors (H1a), decision-making errors (H1b), and violations (H1c).
This hypothesis was accepted because SCB was a significant predictor of all subtypes of reporting behavior. The effect sizes of all three models can be considered large (R² > 0.25) [113]. The result was also confirmed without the influential data points, although the explained variance decreased for SCB predicting violation reporting (H1c). Table 2 summarizes the regression models. For simplified interpretation, the results are also shown graphically in Figure 4.

Note. M = mean; SD = standard deviation; SCB = safety citizenship behavior; Likert scale ranging from 1 = "I strongly disagree" to 5 = "I strongly agree". ᵃ 1 = low expression of the measured construct, 5 = high expression of the measured construct. * p < 0.05. ** p < 0.01.

Hypothesis two assumed that the magnitude of the positive relation between commercial pilots' safety citizenship behavior and their voluntary reporting behaviors depends on fear (H2a), shame (H2b), just culture (H2c), change goals (H2d), protection goals (H2e), and procrastination (H2f). The decision to accept or reject a hypothesis was based on a statistically significant interaction term between SCB and the respective moderator formulated in H2a to H2f. To provide an initial overview of which hypotheses were accepted or rejected, we summarize the interaction terms in Table 3; accepted or partially accepted hypotheses are then described in detail. As mentioned above, accepted or partially accepted hypotheses that were also supported by the analyses without influential data points are presented visually in Figure 5.
H2a was rejected because no significant interaction between SCB and fear was found for any of the three subtypes of reporting behavior. The analyses without influential data points supported accepting the hypothesis for violation reporting and confirmed its rejection for performance-based error reporting and decision-making error reporting.
H2b was rejected because no significant interaction between SCB and shame was found for any of the three subtypes of reporting behavior. The analyses without influential data points confirmed this result for all subtypes of reporting behavior.
H2c was accepted for violation reporting and rejected for performance-based error reporting and decision-making error reporting. The analyses without influential data points confirmed this result for decision-making error and violation reporting, and supported accepting the hypothesis for performance-based error reporting.

Note. b = coefficient; SE = standard error; CI = confidence interval; LL = lower limit; UL = upper limit; df = degrees of freedom; p = p-value (p < 0.05 in bold); ∆R² = change in R² due to moderation; SCB = safety citizenship behavior. ᵃ Outcome variable; ᵇ predictor variable; ᶜ moderator. ¹ Sensitivity analyses in italics.

Figure 5a shows the conditional effect of SCB on violation reporting depending on just culture, shown for the mean and ±1 SD values. The magnitude of the positive relation between SCB and violation reporting was larger at lower values of just culture. Lower values of just culture also led to lower predicted values of violation reporting, independent of the SCB level. As described above, the figure may only be interpreted up to mean-centered values of just culture ≤ 0.72.
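Plots such as Figure 5 show conditional effects ("simple slopes") of the predictor at the moderator's mean and ±1 SD. Given the coefficients b = (b0, b1, b2, b3) of a mean-centered model and their covariance matrix, the conditional effect at centered moderator value w is b1 + b3·w with variance V11 + 2w·V13 + w²·V33. The sketch below is a hypothetical NumPy illustration; a negative b3 means the predictor's slope shrinks as the moderator increases.

```python
import numpy as np


def simple_slopes(b, cov, sd_w):
    """Conditional effect of X (with SE and t) at centered moderator values -1 SD, mean, +1 SD.

    b   : coefficients (b0, b1, b2, b3) of y = b0 + b1*x_c + b2*w_c + b3*x_c*w_c
    cov : 4 x 4 covariance matrix of b (e.g. HC3)
    """
    b = np.asarray(b, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rows = []
    for w in (-sd_w, 0.0, sd_w):
        effect = b[1] + b[3] * w
        se = np.sqrt(cov[1, 1] + 2.0 * w * cov[1, 3] + w ** 2 * cov[3, 3])
        rows.append({"w_c": w, "effect": effect, "se": se, "t": effect / se})
    return rows


def predicted_values(b, x_c, w_c):
    """Model-implied outcome at centered predictor and moderator values (for plotting lines)."""
    return b[0] + b[1] * x_c + b[2] * w_c + b[3] * x_c * w_c
```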
H2d was accepted for decision-making error reporting and violation reporting, and rejected for performance-based error reporting. The analyses without influential data points confirmed this result. SCB (t(3, 198) […]

Figure 5c shows the conditional effect of SCB on violation reporting depending on change goals, shown for the mean and ±1 SD values. The effect is comparable to that in Figure 5b: the magnitude of the positive relation between SCB and violation reporting was larger at lower levels of change goals. For approximately average values of SCB, the predicted values of violation reporting were independent of the level of change goals. As described above, the figure may only be interpreted up to mean-centered values of change goals ≤ 0.50.

H2e was accepted for performance-based error reporting and rejected for decision-making error reporting and violation reporting. The analyses without influential data points confirmed this result. SCB (t(3, 198) = 10.65, p < 0.001, b = 0.46), protection goals (t(3, 198) = 2.01, p < 0.05, b = 0.09), and the interaction between SCB and protection goals (t(3, 198) = 2.19, p < 0.05, b = 0.08) significantly predicted performance-based error reporting (F(3, 198) = 40.59, p < 0.001, R² = 0.35). The interaction term increased R² by 1.39% (F(1, 198) = 4.81, p < 0.05, ∆R² = 0.01). The Johnson-Neyman technique found no transition points of statistical significance within the observed range of the moderator. Figure 5d shows the conditional effect of SCB on performance-based error reporting depending on protection goals, shown for the mean and ±1 SD values. The magnitude of the positive relation between SCB and performance-based error reporting increased with the level of protection goals. For low levels of SCB, the predicted values of performance-based error reporting were almost independent of the level of protection goals.
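The Johnson-Neyman technique looks for moderator values at which the conditional effect b1 + b3·w is exactly significant, i.e., where (b1 + b3·w)² = t_crit² · (V11 + 2w·V13 + w²·V33). This is a quadratic in w; roots outside the observed moderator range are discarded, which is the situation reported for H2e. The NumPy sketch below is hypothetical; t_crit must be supplied by the caller (for 198 residual degrees of freedom and a two-sided 5% level it is about 1.97, e.g. from `scipy.stats.t.ppf(0.975, 198)`).

```python
import numpy as np


def johnson_neyman(b1, b3, v11, v13, v33, t_crit):
    """Centered moderator values w at which the conditional effect b1 + b3*w is exactly significant.

    Solves (b1 + b3*w)^2 = t_crit^2 * (v11 + 2*w*v13 + w^2*v33), where v.. are entries of the
    coefficient covariance matrix (e.g. HC3) for b1 and b3.
    """
    a = b3 ** 2 - t_crit ** 2 * v33
    bq = 2.0 * (b1 * b3 - t_crit ** 2 * v13)
    c = b1 ** 2 - t_crit ** 2 * v11
    if abs(a) < 1e-15:  # degenerate case: the equation is linear in w
        return [] if bq == 0 else [-c / bq]
    disc = bq ** 2 - 4.0 * a * c
    if disc < 0:
        return []  # significance status never changes
    root = np.sqrt(disc)
    return sorted([(-bq - root) / (2.0 * a), (-bq + root) / (2.0 * a)])


def transition_points_in_range(points, w_min, w_max):
    """Keep only transition points inside the observed (centered) moderator range."""
    return [w for w in points if w_min <= w <= w_max]
```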

Other Planned Analyses
The other planned analyses indicated that SCB explained the largest proportion of variance in performance-based error reporting and decision-making error reporting, when the variables fear, shame, just culture, change goals, protection goals, and procrastination were included in the model. For explaining variance in violation reporting, results indicated that just culture appeared to be a stronger predictor than SCB.
In the non-moderated multiple regression models, just culture and change goals explained variance in performance-based error reporting in addition to SCB. Variance in decision-making error reporting was explained by shame, just culture, and protection goals in addition to SCB. A more detailed overview of the results is presented in Appendix C.

Discussion
The aim of this study was to further explain commercial pilots' reporting behavior in the context of safety citizenship behavior (SCB), while also investigating the influence of individual and environmental factors from an organizational safety perspective.
Voluntary incident reporting is frequently discussed in relation to SCB in the scientific literature, although empirical evidence for this relationship is limited. The results of this study provide a theoretical contribution [114] by suggesting that reporting behavior may be considered a specific form of self-intentional, change-oriented SCB: commercial pilots who exhibit pronounced fundamental SCB in their routine work are likely to exhibit pronounced reporting behavior when they have caused a self-inflicted incident. The strength of this effect depended on the pilot's unsafe acts (performance-based errors, decision-making errors, and violations) that caused the incident. Our results therefore indicate that different subtypes of incident reporting behavior should be distinguished according to these three causes. The need for this differentiation was also underlined by the motivational factors: in particular, just culture and goals when reporting were found to be reporting behavior-specific motivational factors. A moderating effect of just culture on the magnitude of the positive relation between SCB and reporting behavior was found only for violation reporting; higher levels of just culture decreased the magnitude of this positive relation but resulted in the highest predicted values of violation reporting overall. In other words, the higher the level of just culture, the less relevant SCB was for predicting violation reporting. Combined, SCB, just culture, and their interaction explained nearly 50% of the variance in violation reporting. When commercial pilots pursued a higher level of change-oriented goals, the magnitude of the positive relation between SCB and both decision-making error reporting and violation reporting decreased, but predicted values were higher up to an average level of SCB.
In contrast, pilots primarily pursued protection-oriented goals when reporting performance-based errors, and higher levels of these goals increased the magnitude of the positive relation between SCB and performance-based error reporting. Higher levels of procrastination also increased the magnitude of this relation; however, this effect was very weak and not stable without influential data points, and was therefore not interpreted further. Neither fear nor shame was found to be a reporting behavior-specific motivational factor, because neither moderated the magnitude of the positive relation between SCB and the different subtypes of reporting behavior in our study.
As part of the exploratory approach, SCB was found to be the strongest predictor for reporting behavior for incidents caused by errors, even when compared with direct effects of the other independent variables in this study. The result was different regarding violation reporting; for this subtype of reporting behavior, just culture appeared to be a stronger predictor than SCB. When testing models without interaction effects, shame, both types of goals, and just culture showed direct effects on different subtypes of reporting behavior.
Our research proposes considering reporting behavior a specific form of self-intentional, change-oriented SCB, thereby further elaborating the state of research on safe work behaviors; our finding that reporting behavior should be interpreted in the context of its causes confirms the research of [24,51,62]. The explained variance in decision-making error reporting was lower than for the other two subtypes of reporting behavior, complementing the results of [62] and highlighting an area that may be particularly affected by missed learning opportunities in the context of LFI. Our results showed that several motivational factors investigated in previous studies transfer to pilots' reporting behavior and confirmed findings from the expert survey we conducted. For example, we demonstrated several beneficial effects of a positive perception of just culture on violation reporting, extending the findings of recent research (e.g., [115]) and supporting the IATA's recommendations to civil airlines [2]. A key aspect of a just culture is that pilots can report without fear [2,30]. The strong negative correlation between just culture and fear in our study (r_fear,just culture = −0.61) suggests that these two constructs are two sides of the same coin. From this viewpoint, it was surprising that, despite a significant interaction effect between SCB and just culture, no interaction effect between SCB and fear was found. In summary, our results concerning fear contrast with numerous related studies from the last 20 years (cf. [20,68,69,116,117]) and with the results of our expert interviews. Our findings suggest that reporting behavior, as with any form of work behavior, is goal-directed, and we expanded the field of research on reporting behavior to include a previously little-considered aspect (cf. [64]).
The goal of writing a report so that others can learn from it confirmed the findings of [24,51] and underlined the relevance of incident reports in the context of LFI (cf. [10]). The goal of protecting oneself when reporting confirmed the research of [118], in which most participants viewed reporting as a professional obligation and a strategy to limit their individual liability. Our study also extended the state of research by showing that these liability concerns may include possible negative consequences imposed by the civil aviation authority, confirming the VC safety experts' assumption to that effect. The weak effect of procrastination supported the findings of [119] that procrastination appears to be a product of prior performance rather than a predictor of future behavior. Contrary to the findings of [26,69], shame did not moderate reporting behavior; the exploratory analysis showed only a direct effect of shame on decision-making error reporting. Given that inadequate decisions are often a product of inadequate teamwork skills and limited crew resource management (CRM), and that pilots themselves consider these skills important, it was surprising that shame affected and even seemed to reinforce this very subtype of reporting behavior [120,121]. Taken together, the results underscore the relevance of a goal-oriented view of reporting behavior, for example, from the perspective of VIE theory. In other words, pilots who pursue personal goals in reporting are likely to attribute increased valence to these goals and to consider writing a report instrumental to achieving them. A just culture ensures that a report leads to the desired outcome (change or protection) rather than to a disadvantage for the pilot, and may thereby improve reporting behavior (cf. [122]).

Limitations
In the following section, we describe limitations regarding the quality criteria of the measurement instrument and the overall research design (cf. [123]). Implementation objectivity may be limited because the scale was displayed top-down on mobile devices, unlike on computer screens (cf. [124]). Consistency effects due to the fixed order of items in the questionnaire could not be prevented, even though items belonging to the same construct were partially intermixed. Safety-related cultural processes tend to be more implicit and less accessible to conscious perception than climate processes [29]; therefore, the reliability of measuring individual perceptions of just culture may be limited. Furthermore, the individual assessment of the hypothetical incident scenario items may be biased by differing perceptions of severity (cf. [125]) or by whether a pilot had already experienced an incident comparable to the one described in the scenario. Although the use of hypothetical incident scenarios is an innovative approach to measuring reporting behavior, decisions made in hypothetical scenarios may not reflect decisions made in real situations (cf. [126]).
Because most items were of medium difficulty, it may not have been possible to capture the extreme ranges of the measured characteristics (cf. [127]). The discrimination values of the items of the fear scale were mainly in a low range, which may indicate multidimensionality within this scale (cf. [75]). The internal consistency of the decision-making error reporting scale was only in a questionable range (cf. [75,89]). In the construct validation, the items of the just culture construct also loaded on the change goals factor, and one item of the shame factor had a factor loading < 0.5 (cf. [75,100]). The practical relevance of the results might be limited because effect sizes estimated via the R² increase are positively biased (cf. [111]). Furthermore, the Johnson-Neyman technique showed that in two cases the moderator effects were significant only within a certain range of values. The results of the sensitivity analyses partly differed from those of the analysis of the complete data set. In addition, guidelines for the use of thresholds are often not clearly defined, leaving the choice of thresholds to the researcher (cf. [100]). From a mathematical perspective, moderation is simply an interaction between two variables; the distinction between predictor and moderator follows from the study design but is ambiguous (cf. [105]). For example, Hypothesis 2c could also be interpreted to mean that the magnitude of the positive relation between just culture and violation reporting is moderated by SCB.
Regarding limitations of the overall research design, we explicitly point out to readers that the relatively low response rate of about 2% of the pilots invited to participate is an important restriction to the external validity of the results (cf. [123]). Strictly speaking, the results reflect the attitudes of only a very small proportion of German commercial pilots. Moreover, it must be assumed that pilots who voluntarily complete a questionnaire that aims to "increase flight safety", as stated in the email invitation of the Vereinigung Cockpit, are also more likely to voluntarily report an incident to increase flight safety than those who did not participate.
Furthermore, the results might be biased by socially desirable responding. Because we could not convince the European Cockpit Association to support the study, only German pilots were invited to participate. Therefore, country-specific aspects could not be investigated, and the results can only be generalized to German pilots. Because no individual token was used for participation in the web-based questionnaire and the participation link could be forwarded without the authors' control, the reliability of the procedure may be limited by multiple participations of the same person. An unconscious bias of the authors in selecting the studied predictors, together with the decision to focus on interaction effects, may have led to aspects relevant to describing reporting behavior being overlooked. Moreover, the assumed temporal precedence of SCB, a prerequisite for causal inferences, could be questioned (cf. [100]).
Regardless of the limitations described, the strength of this study lies in its role as the consolidating, quantitative component of a method triangulation consisting of mixed methods (analysis of incident reports) and qualitative methods (expert interviews, publication in review) to answer the superordinate research question on explaining commercial pilots' voluntary reporting behavior.

Implications for Research
This research was able to show that the different subtypes of reporting behavior can be explained by different factors, but further research is needed to explain these relationships.
Self-efficacy is particularly related to change-oriented SCB [43]; in light of our results, research could investigate the effects of self-efficacy on different subtypes of reporting behavior. Our study found only weak signs of an effect of procrastination on reporting behavior; further research could explore this topic using the Theory of Planned Behavior [71]. Research might also explore the reporting of violations in the context of the construct "Pro-social Rule Breaking" (cf. [128,129]). Finally, a work-safety tension, which has been found to be a strong driver of violations [130], may plausibly also be related to the reporting of incidents caused by violations; further research might explore this relationship.
The age distribution of the participating pilots is unbalanced, with a high proportion of pilots between 31 and 40 years of age. Because this research did not investigate possible differences in voluntary reporting behavior depending on age (presumably also reflecting seniority), we encourage future research to address this aspect. Research could also examine whether our findings generalize to non-German pilots or to employees in other organizational contexts. To establish a holistic perspective on commercial airline incidents, further research could also address the reporting behavior of cabin crew members, technicians, and ground staff. The various direct effects shown in the exploratory evaluation may point researchers to different perspectives on reporting behavior. For example, further research could explore the effects of a more diverse operationalization of just culture or investigate the unexpected effects of shame on reporting behavior.

Implications for Airlines
The results of this study indicate how voluntary reporting behavior can be improved at the organizational and individual levels. Airlines should encourage fundamental SCB among their pilots to improve reporting behavior indirectly. Change-oriented goals in reporting could be reinforced by giving pilots feedback that LFI activities or EBT measures have been adjusted to meet pilots' needs based on findings from incident reports. Airlines should bear in mind the beneficial effects of a just culture that pilots perceive positively and not "rest" on having written its principles into internal organizational rules. Airlines could use a survey to collect data on pilots' perceptions of just culture and their personal goals when reporting, to improve the airline's assessment of pilots' reporting behavior; this could also increase the accuracy of report-related safety performance indicators.
Reporting self-inflicted incidents requires pilots to admit their own errors or violations. If airlines live the principles of a just culture and pilots recognize the value of their incident reports, the seeds mentioned at the beginning can yield a harvest of safe behaviors, and incidents may be prevented more effectively in the future.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available to protect the participants' privacy.

Acknowledgments:
The authors would like to thank the participating pilots for their contribution. Special thanks are given to the Professional Association of Commercial Aircraft Pilots and Flight Engineers in Germany "Vereinigung Cockpit" (VC) for their support and valuable suggestions. Further thanks go to Sebastian Brandhorst for his valuable advice.