Burnout Assessment Tool (BAT): Validity Evidence from Brazil and Portugal

The Burnout Assessment Tool (BAT) has been gaining increased attention as a sound instrument with an innovative conceptualization of burnout. The BAT has been adapted for several countries, revealing promising validity evidence. This paper presents the psychometric properties of the Brazilian and Portuguese versions of the BAT in both the 23-item and 12-item versions. The research focuses on BAT’s validity evidence based on the internal structure (dimensionality, reliability, and measurement invariance) and on the relations to other variables. A cross-sectional study was conducted with two non-probabilistic convenience samples (N = 3103), one from Brazil (nBrazil = 2217) and one from Portugal (nPortugal = 886). BAT’s original structure was confirmed, and it achieved measurement invariance across countries. Using both classical test theory and item response theory as frameworks, the BAT presented good validity evidence based on the internal structure. Furthermore, the BAT showed good convergent evidence with related constructs (i.e., work engagement, co-worker support, role clarity, work overload, and negative change). In conclusion, the psychometric properties of the BAT make this freely available instrument a promising way to measure and compare burnout levels of Portuguese and Brazilian workers.


Introduction
Although burnout syndrome was first described in the 1970s, it remains a global issue, so much so that the 11th revision of the International Classification of Diseases (ICD-11) of the World Health Organization defines it as an occupational phenomenon that can harm health [1]. The definition of burnout adopted in the ICD-11 comprises three factors (exhaustion, cynicism, and reduced professional efficacy), following the framework proposed by Maslach et al. [2]. However, the conceptualization of burnout is somewhat controversial [3]; for example, a meta-analytic study of physician burnout found 142 unique definitions of burnout, with at least 47 unique definitions among the studies using the Maslach Burnout Inventory (MBI). Some constructs, such as depression and fatigue, are conceptually linked to job burnout [4,5]. These phenomena are potentially part of the process of long-term sick leave. At the core of burnout lies severe fatigue (i.e., exhaustion); however, persistently fatigued workers are not necessarily (by definition) burned out, nor must burned-out workers necessarily report fatigue as their main complaint [5]. Occupational fatigue has been linked to an imbalance between the intensity, duration, and timing of work and recovery time [6]. Decades of studies have shown that burnout syndrome predicts various negative consequences for individuals and organizations, such as cardiovascular diseases, hypercholesterolemia, type 2 diabetes, coronary heart disease, musculoskeletal disorders, prolonged fatigue, headaches, gastrointestinal issues, mood disturbance, depressive symptoms, absenteeism, poor performance, insomnia, and life and job dissatisfaction [7][8][9][10][11][12][13][14][15][16].
Research shows that job demands (e.g., work overload) are more strongly associated with job burnout, while job resources (e.g., co-worker support) are more strongly related to job burnout's antipode, i.e., work engagement [17]. More recently, researchers have claimed that the COVID-19 pandemic has caused strain and increased workload and job stress, particularly among healthcare workers, who have presented a higher risk of burnout than other occupations [18][19][20]. Going beyond the individual consequences of burnout, recent research has also investigated burnout across a wide range of occupations, organizations, and countries [21][22][23][24][25][26]. The literature has firmly established that burnout is not only detrimental to workers' health but also has negative effects at the organizational level.
The most widely used instrument to assess burnout is the Maslach Burnout Inventory (MBI) [27]. Despite the MBI's early contribution in establishing burnout as an important psychological state worthy of in-depth study, researchers are still discussing its theoretical framework, its psychometric basis, and its practical applicability [21,28]. Schaufeli et al. [29] summarize the most important criticisms of the MBI as follows: (a) the questioning of the validity evidence of the constituting dimensions of burnout, (b) the lack of clinically established cut-off values, (c) the lack of representative national samples to ground its statistical norms, (d) the limitations of its practical usability, and (e) the inconsistent dimensionality found in cross-national studies on the MBI [30]. Finally, there are other burnout measures with similar problems and weaknesses, such as the Copenhagen Burnout Inventory [31], the Oldenburg Burnout Inventory [32], and, recently, the COVID-19 Burnout Scale [33].

Burnout Assessment Tool (BAT)
Taken together, these criticisms call for an alternative instrument that assesses burnout with a novel conceptualization and overcomes these flaws, a need addressed by the development of the Burnout Assessment Tool (BAT) [34]. The BAT assumes that burnout is a syndrome assessed by core symptoms (exhaustion, mental distance, emotional impairment, and cognitive impairment) and secondary symptoms (psychological distress and psychosomatic complaints), which could be associated with depressed mood and other comorbidities. Therefore, BAT considers burnout a second-order factor that acts as a syndrome, meaning that all four components are connected and belong to the same higher-order construct, i.e., burnout [21]. Based on the Job Demands-Resources Model (JD-R) [17], the key components of the burnout process are the draining of energy, which leads to feeling exhausted and extremely tired, and mental distancing, which manifests itself as a lack of interest in and aversion to work [35]. In addition, in-depth interviews with experts revealed two further significant dimensions of burnout that had not been recognized until then: emotional impairment and cognitive impairment. These dimensions affect the self-regulation needed to deal adequately with daily work activities and to recover the energy linked to the motivational process [36].
Meanwhile, BAT has been widely investigated [37][38][39][40][41] and has demonstrated measurement invariance across seven countries (six European countries and Japan) [21]. As in the works of De Beer et al. [21] and Sakakibara et al. [28], the current study is based on the BAT reconceptualization of burnout as a work-related state of exhaustion, i.e., extreme tiredness with reduced ability to regulate cognitive and emotional processes, and mental distancing. It may be accompanied by depressed mood as well as non-specific psychological and psychosomatic complaints [34]. Despite using the raw scores of only one item from the MBI (i.e., "I feel exhausted at the end of the working day"), Schaufeli [24] found medium levels of burnout in Portugal in comparison with a random sample of workers from thirty-five European countries (n = 43,675), using data from the 6th European Working Conditions Survey [42]. In Brazil, by contrast, no publication reports burnout scores from a national-level survey conducted with a representative sample.
The current study focuses on the psychometric properties of BAT from a cross-national perspective (i.e., Brazil and Portugal). The main goal is to assess BAT's validity evidence based on the internal structure and on the relations to other variables.

Research Hypotheses
Following the recommendations of the Standards for Educational and Psychological Testing [43], this study aims to evaluate two types of validity evidence for both the Portuguese and the Brazilian versions of BAT: one related to the internal structure, and one based on the relations to other variables (i.e., work engagement, role clarity, co-worker support, work overload, and negative change). BAT's original structure was successfully confirmed in several countries in a study by De Beer et al. [21] with data from Austria, Belgium (Flanders), Finland, Germany, Ireland, Japan, and the Netherlands. The Japanese version of BAT was also confirmed in a different study [28], while the South Korean version maintained the hierarchical structure with four first-order dimensions, albeit with the removal of one item from the mental distance factor [44]. The Russian version also provided evidence indicative of the stability of the hierarchical structure [45]. Altogether, it is expected that the hypothesized hierarchical structure for BAT-23 (one second-order latent variable with four first-order factors, 23 indicators) and BAT-12 (one second-order factor with four first-order dimensions, 12 items) holds with a satisfactory fit to the data in both Brazil and Portugal (H1). The reliability of the scores is one of the key components of the internal structure of any psychometric instrument [46]. It can be analyzed using four different types of approaches: internal consistency, test-retest, parallel forms, and interrater agreement. Previous research showed good evidence of internal consistency using the ordinal α [47] for both the second-order factor and the first-order dimensions [21]. BAT also presented satisfactory test-retest evidence [48]. The second hypothesis (H2) states that BAT presents satisfactory internal consistency estimates (≥0.80) [46].
Measurement invariance is another component of the validity evidence based on the internal structure; it must be established before any substantive group comparisons (e.g., between countries or sexes) can be made. BAT has shown measurement invariance across seven countries: Austria, Belgium (Flanders), Finland, Germany, Ireland, Japan, and the Netherlands [21]. Regarding sex, no study has yet investigated BAT's measurement invariance. However, sex might be an important factor regarding burnout [49]: females are expected to present higher levels of burnout [50], namely in terms of exhaustion [51,52], although other studies did not reach definite conclusions [25].
Other instruments measuring burnout and related constructs have previously shown measurement invariance among workers from Brazil and Portugal [53] and among sex within the two countries [54,55], reinforcing the similarity of the measurement structure of psychometric instruments among workers from the two countries. It is hypothesized (H3) that BAT holds measurement invariance among countries (Brazil and Portugal) and sex.
Another important source of validity evidence is the relationship of the instrument's scores to variables external to the instrument [43]. This source of evidence makes it possible to assess whether the scores can be interpreted as expected from the nomological network of constructs [56]. The JD-R model identifies possible antecedents of job burnout [57]. The central idea of the JD-R model is that working conditions, which are specific to every occupation, can generally be classified as either job demands or job resources, and that those job characteristics contribute to job burnout and work engagement [58]. The JD-R model suggests that work engagement is negatively related to burnout, since high job demands lead to a health impairment process (i.e., job burnout) and high resources lead to a motivational process (i.e., work engagement) [59]. Several meta-analyses have supported the relationship of job demands and resources with burnout [60][61][62]. These studies identified several job demands and resources; for instance, social support, workload, and role clarity were found to be relevant. As such, a positive association between burnout and job demands and a negative association between burnout and job resources are expected [28,39,63]. It is anticipated that BAT's scores are negatively correlated with work engagement, role clarity, and co-worker support and positively correlated with work overload and negative change (H4).

Sampling and Data Collection
In this cross-sectional survey, a non-probabilistic convenience sample was collected. The inclusion criteria were being able to read Portuguese and having easy access to a smartphone, PC, or tablet on which a digital questionnaire could be opened. The authors invited workers from Brazil and Portugal to participate. The required sample size was estimated from the RMSEA. BAT's measurement model, with one second-order factor, four first-order dimensions, and 23 manifest variables, has a total of 226 degrees of freedom [64]. The null hypothesis was that the population RMSEA is not lower than 0.08 (i.e., ε0 = 0.08; H0: ε ≥ 0.08), since rejecting this hypothesis leads to the conclusion that the model fit is better than 0.08, the recommended cutoff for a reasonable fit [65]. Additionally, the true population RMSEA was assumed to be ε = 0.064, based on the findings of De Beer et al. [21] with a sample of 10,138 participants. Altogether, α = 0.05 and β = 0.20 (i.e., power = 0.80) resulted in a required sample size of n = 171 [66].
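The 226 degrees of freedom quoted above can be reproduced by counting moments and free parameters. The following is a minimal sketch, assuming a covariance-structure count with one marker loading fixed per factor; the function name and the identification choices are illustrative assumptions, not taken from the study.

```python
def second_order_cfa_df(items_per_factor):
    """Degrees of freedom of a CFA with one second-order factor.

    Assumes marker-variable identification: one loading fixed to 1 per
    first-order factor and one second-order loading fixed to 1.
    """
    p = sum(items_per_factor)          # observed indicators
    k = len(items_per_factor)          # first-order factors
    moments = p * (p + 1) // 2         # unique variances and covariances
    free_parameters = (
        (p - k)      # free first-order loadings
        + p          # residual variances of the indicators
        + (k - 1)    # free second-order loadings
        + 1          # variance of the second-order factor
        + k          # disturbances of the first-order factors
    )
    return moments - free_parameters


# BAT-23: exhaustion (8), mental distance (5), cognitive (5), emotional (5)
print(second_order_cfa_df([8, 5, 5, 5]))  # 226
```

With 23 indicators there are 276 unique moments and, under these assumptions, 50 free parameters, which leaves 226 degrees of freedom.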

Constructs and Psychometric Instruments
All measures were used in their versions adapted to the Brazilian and Portuguese contexts.

Job Burnout
The BAT was used to assess burnout [29] through the development of two transculturally adapted versions: one for Brazil and one for Portugal (Table 1). The BAT-23 is a self-report psychometric instrument that comprises 23 items to be answered using a five-point rating scale (1-"Never"; 2-"Rarely"; 3-"Sometimes"; 4-"Often"; 5-"Always"). The BAT-23 version measures burnout's core symptoms; another version, which additionally includes items assessing burnout's secondary symptoms, is also available. To develop the Portuguese version (Table 1), the BAT's English version was used [34], following the ITC Guidelines for Translating and Adapting Tests [67]. BAT-23 operationalizes burnout as a second-order construct with four first-order factors: exhaustion (eight items), mental distance (five items), cognitive impairment (five items), and emotional impairment (five items). From BAT-23's items, it is possible to extract a short version (BAT-12) with three items for each first-order latent construct (Table 1).

Work Engagement
Work engagement refers to a positive motivational state and is composed of vigor, dedication, and absorption [68]. This construct was measured with the ultra-short version of the Utrecht Work Engagement Scale (UWES-3) [69], which uses items from the short version (i.e., UWES-9). The UWES-3 items used had previously been adapted successfully to Portugal and Brazil [70,71]. The scale uses a seven-point ordinal response format (0-"Never"; 1-"Almost never"; 2-"Rarely"; 3-"Sometimes"; 4-"Often"; 5-"Very often"; 6-"Always"), with one item pertaining to each of the three dimensions. The UWES has shown good convergent evidence with burnout scores, since work engagement and burnout are moderately and negatively related [29,72]. The 9-item version of the UWES has already presented measurement invariance among Brazil and Portugal [70]. One example of an item is: "At my work, I feel bursting with energy."

Co-Worker Support
Co-worker support refers to the function and quality of social relationships at work, such as perceived availability of help from coworkers or support actually received [73]. To assess the perceptions of co-worker support, the co-worker support sub-scale (3 items) of the Energy Compass psychometric instrument [74] was used. The items were answered using an ordinal five-point scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). One example of an item is: "Can you count on your colleagues for help and support when needed?"

Role Clarity
Role clarity refers to the extent to which the tasks to be performed are clearly defined and the expectations and responsibilities for the employee are clear [75]. This construct was assessed using the role clarity sub-scale of the Energy Compass psychometric instrument [74]. The three items of the sub-scale were answered using a five-point ordinal scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). One example of an item is: "Is it sufficiently clear what you need to do in your job?"

Work Overload
Work overload can be defined as the extent to which the employee has to deal with changes in job content, ICT systems, and leadership, as well as in the organization as a whole. Four items answered using a five-point ordinal scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always") were used, as suggested in Schaufeli et al. [76]. One example of an item is, "Do you have too much work to do?"

Negative Change
Negative change refers to the pessimistic views produced by the introduction of modifications at work, e.g., pace of work, interpersonal conflict, work-home conflict, and use of skills [77]. The negative change construct was assessed using the corresponding subscale (three items) from the Energy Compass psychometric instrument [74], answered on a five-point frequency scale (1-"Never"; 2-"Seldom"; 3-"Sometimes"; 4-"Often"; 5-"Always"). An example of an item is, "Do changes cause turmoil in your company?"

Procedure
Data were collected simultaneously in Portugal and Brazil. The workers were invited to participate through social networks or e-mail. First, the participants were presented with the electronic informed consent form, which they had to accept to participate in the study. The digital survey was deployed using LimeSurvey [78] and SurveyMonkey [79] and contained a group of psychometric instruments together with a group of sociodemographic and job questions. To check the feasibility of the research process, a pilot study was conducted with 15 workers, who provided feedback (e.g., potential issues with the digital platforms where the survey was deployed, clarity of the questions/items, and mean completion time).
All the subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Federal University of Health Sciences of Porto Alegre, Brazil (CAAE 78617617.8.0000.5345; 25 October 2017).

Data Analysis
The statistical analysis was conducted with the statistical programming language R [80] through the integrated development environment RStudio [81]. To estimate the adequate sample size for the confirmatory factor analysis, the MBESS package [82] was used. The skimr package [83] and the table1 package [84] were used to produce the descriptive statistics. The skewness (sk), using the "sample" method (i.e., sample skewness of the distribution), and the kurtosis (ku), using the "sample excess" method (i.e., sample kurtosis of the distribution with a value of 3 subtracted), were calculated using the PerformanceAnalytics package [85]. The coefficient of variation (CV) was estimated with the sjstats package [86], and the standard error of the mean (SEM) was calculated with the plotrix package [87]. The mode was computed with the modeest package [88]. Absolute values of |sk| > 3 and |ku| > 7 were considered severe univariate normality violations [89,90]. To evaluate multivariate normality, the psych package [91] was used to calculate Mardia's multivariate kurtosis [92].
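The univariate screening rule can be illustrated with a short sketch. It uses the common bias-adjusted sample skewness and sample excess kurtosis estimators, which is an assumption here: the exact formulas implemented by the R package cited above may differ slightly. The function names and the example responses are hypothetical.

```python
import math


def sample_skew_kurtosis(values):
    """Bias-adjusted sample skewness and sample excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # central moments
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5                 # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0             # moment-based excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return skew, kurt


def severe_violation(skew, kurt):
    """Flag severe departures from normality: |sk| > 3 or |ku| > 7."""
    return abs(skew) > 3 or abs(kurt) > 7


# Hypothetical answers to one five-point item
sk, ku = sample_skew_kurtosis([1, 2, 2, 3, 3, 3, 4, 4, 5])
print(round(sk, 3), round(ku, 3), severe_violation(sk, ku))
```

An item would be flagged only when either statistic exceeds its cutoff in absolute value.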
To obtain evidence about the originally proposed dimensionality of the measurement models, confirmatory factor analysis (CFA) was used. The following goodness-of-fit indices were used: NFI (Normed Fit Index), TLI (Tucker-Lewis Index), CFI (Comparative Fit Index), RMSEA (Root Mean Square Error of Approximation), and SRMR (Standardized Root Mean Square Residual). Estimates above 0.95 were considered good for NFI, TLI, and CFI [93], while values below 0.08 were considered good for SRMR and RMSEA [94]. The lavaan package [95] was used to run the CFA with the Weighted Least Squares Means and Variances (WLSMV) estimator [96]. The WLSMV was chosen because it does not assume multivariate normality and because all items of the psychometric instruments used have an ordinal response scale.
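For readers less familiar with these indices, the sketch below shows the textbook single-group formulas for RMSEA and CFI computed from chi-square statistics. It is a simplified illustration: the robust (scaled) versions reported under WLSMV and the multi-group variants are computed differently, and the chi-square values used here are invented for the example.

```python
import math


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square (single group)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index relative to the baseline (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0:
        return 1.0
    return 1.0 - d_model / d_baseline


# Invented values: chi2 = 452 on 226 df, N = 887, baseline chi2 = 22853 on 253 df
fit_rmsea = rmsea(452.0, 226, 887)
fit_cfi = cfi(452.0, 226, 22853.0, 253)
print(round(fit_rmsea, 3), round(fit_cfi, 3))
print(fit_rmsea < 0.08, fit_cfi > 0.95)  # cutoffs used in the text
```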
The Average Variance Extracted (AVE) was estimated to test the evidence for convergent validity [97]. Satisfactory convergent validity evidence in terms of the internal structure was assumed for AVE ≥ 0.5 [98].
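As a worked illustration of the AVE cutoff, the sketch below computes the AVE of one factor as the mean of its squared standardized loadings. The loadings are hypothetical and the function name is an assumption made for this example.

```python
def average_variance_extracted(standardized_loadings):
    """AVE of one factor: mean of the squared standardized loadings."""
    squared = [loading ** 2 for loading in standardized_loadings]
    return sum(squared) / len(squared)


# Hypothetical standardized loadings of a three-item factor
ave = average_variance_extracted([0.80, 0.75, 0.70])
print(round(ave, 3), ave >= 0.5)
```

With these loadings the factor explains slightly more than half of the variance of its indicators on average, so the AVE ≥ 0.5 criterion would be met.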
Item response theory analysis was conducted using a multidimensional polytomous Rasch model [99] as a particular case of the multidimensional random coefficients multinomial logit model (MRCMLM) [100]. The TAM package [101] was used to conduct the multidimensional polytomous Rasch analysis. Wright maps (also known as item-person maps or item maps) were used to present the location of both items and respondents on the same scale [102,103]. The WrightMap package [104] was used to produce the Wright maps. Two mean square fit statistics (i.e., infit and outfit) were used to assess how well the data fit the model [105]. Considering the ordinal nature of the rating scale (i.e., 1-"Never" to 5-"Always"), the interval (0.6; 1.4) was considered reasonable for the item mean square infit and outfit statistics [106]. Values above 1 suggest that the answers diverge from the model's predictions more than expected, while values below 1 indicate answers with less heterogeneity than expected [107].
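The logic of the two mean square statistics can be sketched with the simpler dichotomous Rasch model. This is a deliberate simplification and an assumption of the example: the study used a multidimensional polytomous model, and the person locations, item difficulty, and responses below are invented.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_fit(responses, thetas, difficulty):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)   # observed minus expected
        variances.append(p * (1.0 - p))          # model variance of x
    # Outfit: unweighted mean of squared standardized residuals
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


def within_reasonable_range(mean_square, lower=0.6, upper=1.4):
    """Check a mean square against the (0.6; 1.4) interval."""
    return lower < mean_square < upper


infit, outfit = item_fit([1, 0, 1, 1, 0], [0.5, -1.0, 1.2, 0.1, -0.3], 0.0)
print(round(infit, 3), round(outfit, 3))
print(within_reasonable_range(infit), within_reasonable_range(outfit))
```

Outfit weights every response equally and is therefore more sensitive to unexpected answers from persons located far from the item, whereas infit gives more weight to persons located near the item.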
To assess the evidence of reliability of the first-order factors, the following estimators of internal consistency were used: composite reliability (CR) [97], the ordinal α [47], and ω [108]. Values ≥ 0.8 on these estimators were considered indicative of acceptable reliability evidence [46,98]. Internal consistency was also estimated for the second-order latent factor: the proportion of variance among first-order common factors that is attributable to the second-order factor (ω_L2), the proportion of variance of a composite score calculated from the observed indicators that is attributable to the second-order factor (ω_L1), and the proportion of observed variance explained by the second-order factor after partialling out the uniqueness from the first-order factors (ω_partial L1). Both second-order and first-order internal consistency estimates were calculated using the semTools package [109]. In the item response theory framework, the MRCMLM provided the expected a posteriori (EAP) reliability index for each latent factor. The EAP reliability is defined as the ratio of the variance of the EAPs to the variance of the plausible values [110]. Values of EAP reliability ≥ 0.8 are preferable.
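Composite reliability can be illustrated with the usual formula for a factor with standardized loadings and uncorrelated residuals. This is a minimal sketch under those assumptions; the loadings are hypothetical and the function name is illustrative.

```python
def composite_reliability(standardized_loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances).

    Assumes standardized loadings and uncorrelated residuals, so each
    residual variance equals 1 minus the squared loading.
    """
    loading_sum_sq = sum(standardized_loadings) ** 2
    residual_sum = sum(1.0 - loading ** 2 for loading in standardized_loadings)
    return loading_sum_sq / (loading_sum_sq + residual_sum)


# Hypothetical three-item factor, as in a BAT-12 dimension
cr = composite_reliability([0.80, 0.75, 0.70])
print(round(cr, 3), cr >= 0.8)
```

Because the numerator grows with the number of items, shorter scales with the same loadings yield lower CR values than longer ones.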
Using the theta-parameterization for categorical items through the semTools package [109], measurement invariance was evaluated by comparing a sequence of eight nested models [111]: (I) configural invariance, (II) thresholds of the indicators, (III) first-order factor loadings, (IV) structural weights, (V) intercepts of the first-order factors, (VI) latent means, (VII) disturbances of the first-order factors, and (VIII) residual variances of observed variables. The differences between the nested models were assessed using two criteria: the ∆CFI criterion [112], under which the null hypothesis of invariance is not rejected if the CFI decreases by no more than 0.010 relative to the less constrained model, and the ∆χ² criterion [113], under which the null hypothesis of invariance is not rejected if the robust χ² difference test is non-significant.
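The ∆CFI decision rule over a sequence of nested models can be sketched as follows, reading the criterion as tolerating a drop in CFI of at most 0.010 between consecutive models (an assumption about how the cutoff is applied). The CFI values are invented, and differences are rounded to three decimals because fit indices are usually reported at that precision.

```python
def highest_invariance_level(cfi_by_model, max_drop=0.010):
    """Return the label of the most constrained model still supported.

    cfi_by_model: list of (label, CFI) ordered from least to most constrained.
    """
    supported = cfi_by_model[0][0]
    for (_, previous_cfi), (label, current_cfi) in zip(cfi_by_model, cfi_by_model[1:]):
        drop = round(previous_cfi - current_cfi, 3)
        if drop > max_drop:
            break                      # invariance rejected at this step
        supported = label
    return supported


# Hypothetical CFI values for the first four nested models
models = [
    ("I configural", 0.985),
    ("II thresholds", 0.984),
    ("III first-order loadings", 0.982),
    ("IV structural weights", 0.965),
]
print(highest_invariance_level(models))  # III first-order loadings
```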
The structural models were tested using the lavaan package [95] to test validity evidence based on relations to other variables. In the latent score means comparison, Cohen's d [114] was used as an effect size measure. The doBy package [115] was used to compute the raw score percentiles. A significance level of 5% was used (α = 0.05).
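For reference, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below works from summary statistics of observed scores, which is an assumption of the example (the study compared latent means), and the numbers are invented.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Invented group summaries: mean, SD, and sample size for each group
d = cohens_d(2.5, 1.0, 100, 2.0, 1.0, 100)
print(round(d, 2))  # 0.5
```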

Descriptive Statistics of Study Participants
A merged sample of 3103 workers was collected (n Brazil = 2217; n Portugal = 886); 74.2% were female, and the mean age was 37.2 years (SD = 11.1). More than half of the workers (53.4%) were professionals according to the International Standard Classification of Occupations (ISCO-08) [116], and 72.5% had an undergraduate degree or a higher academic level. Table 2 presents the descriptive statistics for each country and for the merged sample.

Validity Evidence Based on the Internal Structure
This source of validity evidence investigates the dimensionality, reliability of the scores, and measurement invariance.

Dimensionality
The distributional properties of BAT's 23 items, presented in Table 3, were used to judge psychometric sensitivity. None of the items in either country presented severe univariate normality violations [89,90]. Mardia's multivariate kurtosis [92] was 101.637 (p < 0.001) for the data from Brazil and 60.063 (p < 0.001) for the data from Portugal. All items in both countries covered the full range of possible answers, and no outliers were removed. These distributional properties are indicative of appropriate psychometric sensitivity, as these items would be expected to follow an approximately normal distribution in the population under study. Despite these univariate normality indicators, the weighted least squares means and variances (WLSMV) [96] estimation method was used, taking into consideration the ordinal level of measurement of the items.

[...], and 52% or more of the variance of its indicators for BAT-12, i.e., AVE ≥ 0.52 [98].
From a Rasch perspective, the items match the workers' sample, since BAT's items garnered information about workers at all ranges of the burnout distribution. Figure 3 displays both items' scale values (in terms of location) and persons' burnout levels (in terms of their location) spaced along a common vertical axis marked with a logits scale [103,117].

Measurement Invariance
The measurement invariance among countries and sex was tested through a group of nested models with increasing constraints (Table 5). Full uniqueness measurement invariance (i.e., strict invariance) was achieved among countries and sex for BAT-23 (H3) according to the ∆CFI criterion [112]. Using the ∆χ² criterion [113], thresholds invariance was achieved among countries, and first-order factor loadings invariance was obtained among sex. However, the ∆χ² criterion is too restrictive [90]; consequently, the ∆CFI criterion was preferred. The fit of the data to the model was acceptable among countries and sex, as seen in the dimensionality analysis. The measurement of burnout using BAT works in a similar manner across countries and sex, allowing scores to be compared between the different groups.
BAT-12 presented scalar measurement invariance among workers from Brazil and Portugal (H3). To avoid a negative disturbance of the mental distance latent variable in the Portuguese sample, which is not theoretically possible, the disturbance of the mental distance first-order factor was constrained to 0.1 in three models (i.e., 4, 5, and 6). Full uniqueness measurement invariance was achieved among sex using the ∆CFI criterion (H3).

Discussion
The data from Brazil and Portugal provided robust validity evidence for the BAT-23 and the BAT-12 using item response theory (i.e., multidimensional Rasch model) and classical test theory (i.e., confirmatory factor analysis) in conjunction. Satisfactory evidence was obtained based on both the internal structure and the relations to other variables. The current study adds to the already available evidence about BAT's psychometric properties using classical test theory, e.g., [21], and item response theory [37]. The present study takes advantage of the benefits of the two measurement theories in conjunction while bringing some novelties, such as the second-order estimates of internal consistency (i.e., ω_L2, ω_L1, and ω_partial L1) and the EAP reliability index. In terms of the Rasch model, the MRCMLM was used, in contrast with the unidimensional approach used in BAT's previous research [37]. The multidimensional measurement model is both substantively advantageous and technically appropriate in cases where unidimensionality is not expected [99]. The multidimensional approach considered BAT's four first-order dimensions and a second-order latent variable. This is also the first study to provide infit and outfit estimates for BAT's items; these two mean square statistics are useful for understanding how well the data fit the model [107].
The originally proposed dimensionality for BAT-23 and BAT-12 presented a satisfactory fit to the data for both countries without removing items (H1). Such findings are corroborated by samples from other American and European countries, namely Ecuador, using BAT-23 and BAT-12 [40], and Italy, using BAT-23 [38]. Currently, the accumulated evidence of BAT's dimensionality is consistent across countries from Asia, America, and Europe [21].
Globally, the evidence of the reliability of the scores in terms of internal consistency was satisfactory in both samples for both the second-order factor and the first-order dimensions (H2). In fact, only the mental distance dimension of the BAT-12 version with the Portuguese data presented estimates slightly below the desirable threshold; nevertheless, those values were acceptable (i.e., ≥0.71). BAT's mental distance was the first-order dimension with the lowest α and ω in the Ecuadorian version [40], as it was in the Italian version [38]. However, samples from other countries showed that mental distance did not present the lowest internal consistency estimates of all first-order dimensions [21]. As expected, BAT-12's first-order internal consistency estimates were lower than those of BAT-23. Nevertheless, BAT-12's internal consistency estimates were globally satisfactory [46], with both classical test theory (i.e., ordinal α, ω, and CR) and item response theory (i.e., EAP) estimators.
Both versions of BAT had measurement invariance (i.e., at least scalar) for countries and sex (H3), allowing mean comparisons for BAT among countries, and sex. BAT-23 presented measurement invariance among seven countries in a previous study by De beer et al. [21]. One of the novelties of the current study is the measurement invariance of BAT-23 among sex and the test of measurement invariance among countries and sex of BAT's short version (i.e., . The BAT's scores' relation to other variables presented convergent evidence (H4) since all latent correlations' paths were statistically significant (moderate to strong effect sizes) with the theoretically expected direction for each correlation pair. BAT's burnout latent scores were negatively correlated with work engagement, role clarity, and co-workers' support. Positive latent correlations were found among BAT's burnout latent scores, work overload, and negative change. The latent correlations' effect sizes were similar among countries. Burnout's correlation with work overload and burnout's correlation with role clarity presented the largest difference among countries. The observed latent associations between BAT's scores and job demands and job resources are in accordance with the findings from research reading other BAT's versions; for example, the Romanian [39] and the Japanese [28]. BAT's burnout latent scores' correlation with work engagement was the strongest negative correlation for both countries. Strong negative correlations between burnout and work engagement are in accordance with what is theoretically expected from these two constructs [58,118]. Regarding the data from Brazil, the strongest positive correlation with BAT's burnout scores was achieved with work overload. While for Portugal, the strongest correlation for BAT's burnout latent scores was observed with negative change. However, the absolutes values were smaller than the ones observed for burnout and work engagement correlation. 
The data provided validity evidence based on the relations to other variables, consistently building on burnout's existing nomological network of constructs [56] and reinforcing BAT's psychometric properties.
Both BAT-23 and BAT-12 presented good validity evidence, and thus both can be used to measure burnout levels among workers from Brazil and from Portugal. The advantage of using BAT-23 lies in its finer-grained assessment of burnout (i.e., more items capture more of the construct's content); however, if time is a constraint and other measures are being collected, BAT-12 can be a more parsimonious alternative. As the results showed, the raw scores for BAT-12 and BAT-23 presented an almost perfect correlation (i.e., 0.98). Moreover, short versions of psychometric instruments are preferable as long as their validity evidence is not compromised by the shortening of the full-length version. The main goal of using a short version is to reduce the time burden of assessment. Usually, short versions have lower reliability estimates than full versions. The BAT-12 estimates were (in some of the first-order dimensions) slightly lower than the BAT-23 ones, although with no meaningful loss in terms of satisfactory validity evidence. Practitioners and researchers choosing between BAT-23 and BAT-12 will have to balance the time saved by brevity against construct content coverage and validity evidence [119]. The BAT-12 option seems to be the most balanced, since its validity evidence is equivalent to that of its longer counterpart, and longer instruments can present several problems, for example, boredom, fatigue, increased dropout rates, and lack of attention [120].

Weaknesses, Strengths, and Suggestions for Further Research
The non-probabilistic convenience sampling introduces some degree of selection bias. However, probabilistic sampling (i.e., all units in the population have known and positive probabilities of inclusion) is only possible when there is a complete and up-to-date list of the members of the population being investigated [121,122], which was not the case. Even with large samples, the representativeness of the samples cannot be assumed if the sampling method is not probabilistic. Nevertheless, many valid conclusions can still be drawn from the current study. Future research should be conducted with samples from occupational groups that had few elements in the current paper (e.g., craft and related trades workers or elementary occupations). The sex proportions and academic levels should also be closer to the parameters of each country's worker population. The current correlational study has a cross-sectional design. Longitudinal designs can strengthen the validity evidence of the BAT, namely by allowing longitudinal measurement invariance to be tested, which will allow the stability of BAT's structure through time to be studied. The current paper only investigated two of the five sources of validity evidence [43]. One of them (i.e., validity evidence based on the relations to other variables) was only analyzed from a correlational perspective with five related constructs. Further research on the relations of BAT's scores to other variables should expand to other conceptually linked constructs such as fatigue using, for example, the Portuguese adaptation of the Occupational Fatigue Exhaustion/Recovery [6] or the Brazilian adaptation of the Feeling of Fatigue scale [123]. Test-criterion relationships should be analyzed in future studies using a predictive or concurrent design. Future studies should also investigate other sources of validity evidence, e.g., the validity evidence based on the response processes.
The findings of the current study are based on large samples from two different countries. The obtained findings are promising in terms of the measurement of burnout's core symptoms. Future research should investigate the version of the BAT that includes the secondary symptoms items for both countries, and should also compare BAT's psychometric properties directly with the Brazilian and Portuguese adaptations of other psychometric instruments that measure burnout (e.g., MBI, OLBI, CBI). It will also be useful to obtain cut-off values for different levels of burnout; for such a purpose, clinical samples will have to be investigated. Using the receiver operating characteristic (ROC) curve [124] will allow sensitivity (true positive rate) and specificity (true negative rate) to be estimated. It should be taken into consideration, however, that BAT's scores by themselves will not be enough; a thorough clinical interview and complementary information will be required [29]. Future studies are also encouraged to incorporate the increasing evidence (prior knowledge) about BAT's dimensionality in order to take advantage of the Bayesian approach [125], which is particularly useful with small samples and allows some potential problems of the frequentist approach to be avoided (e.g., non-convergence, negative variances).
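As a minimal sketch of how a cut-off could be explored once clinical samples are available, the code below computes sensitivity and specificity for a candidate cut-off and then scans all observed scores. Selecting the cut-off that maximizes Youden's J (sensitivity + specificity − 1) is an assumption made here for illustration; it is one common criterion, not a procedure prescribed by this study. The scores and case labels are invented toy values, not BAT data.

```python
import numpy as np


def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged as cases."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    flagged = scores >= cutoff
    true_pos = np.sum(flagged & is_case)
    false_neg = np.sum(~flagged & is_case)
    true_neg = np.sum(~flagged & ~is_case)
    false_pos = np.sum(flagged & ~is_case)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return float(sensitivity), float(specificity)


def best_cutoff_youden(scores, is_case):
    """Observed score that maximizes Youden's J (lowest such score on ties)."""
    candidates = np.unique(scores)  # sorted unique observed scores

    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(scores, is_case, cutoff)
        return sens + spec - 1

    return float(max(candidates, key=youden_j))


# Toy data (assumption): mean scale scores and a hypothetical clinical diagnosis.
toy_scores = [1.2, 1.8, 2.1, 2.4, 2.9, 3.1, 3.4, 3.8]
toy_is_case = [0, 0, 0, 0, 1, 0, 1, 1]

chosen_cutoff = best_cutoff_youden(toy_scores, toy_is_case)
sens, spec = sensitivity_specificity(toy_scores, toy_is_case, chosen_cutoff)
print(chosen_cutoff, sens, spec)
```

The sketch requires both cases and non-cases in the sample. In practice, the trade-off between false positives and false negatives, rather than a single summary index, would guide the final choice of cut-off.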

Practical Implications
BAT-12 was shown to be virtually equivalent to BAT-23 in terms of both scores and validity evidence, representing an equally robust alternative to measure burnout. The decision to use BAT-23 or BAT-12 will depend on the level of detail that one intends to obtain regarding the burnout measurement. BAT's Brazilian and Portuguese versions are invariant in terms of measurement, allowing for comparisons of means between countries and between males and females. BAT's scores presented the expected associations with related measures. The quartiles and mean scores are also provided as a first reference for burnout at the country and sex levels. The BAT is a promising instrument and a viable alternative for measuring burnout in workers from Brazil and Portugal.

Conclusions
The data of multi-occupational workers from Brazil and Portugal presented good validity evidence for both BAT-23 and BAT-12, supporting their use to measure and compare burnout levels across sex and countries. BAT's scores provided support for the theoretical nomological network of constructs. Both samples' data fitted the original structure of BAT-23 and BAT-12 well, with good reliability evidence. This leads to the conclusion that the BAT is a good instrument for practitioners and researchers to measure burnout among different occupations.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on reasonable request from the corresponding author.