The Behavioral, Emotional, and Social Skills Inventory (BESSI): Psychometric Properties of a German-Language Adaptation, Temporal Stabilities of the Skills, and Associations with Personality and Intelligence

Social, emotional, and behavioral (SEB) skills comprise a broad set of abilities that are essential for building and maintaining relationships, regulating emotions, selecting and pursuing goals, or exploring novel stimuli. Toward an improved SEB skill assessment, Soto and colleagues recently introduced the Behavioral, Emotional, and Social Skills Inventory (BESSI). Measuring 32 facets from 5 domains with 192 items (assessment duration: ~15 min), BESSI constitutes the most extensive SEB inventory to date. However, so far, BESSI exists only in English. In three studies, we comprehensively validated a novel German-language adaptation, BESSI-G. Moreover, we expanded evidence on BESSI in three ways by (1) assessing the psychometric properties of the 32 individual skill facets, in addition to their domain-level structure; (2) providing first insights into the temporal stabilities of the 32 facets over 1.5 and 8 months; and (3) investigating the domains’ and facets’ associations with intelligence, in addition to personality traits. Results show that BESSI-G exhibits good psychometric properties (unidimensionality, reliability, factorial validity). Its domain-level structure is highly similar to that of the English-language source version. The facets show high temporal stabilities, convergent validity with personality traits, and discriminant validity with fluid and crystallized intelligence. We discuss implications for research on SEB skills.


Introduction
Social, emotional, and behavioral (SEB) skills denote people's capacity to build and maintain social relationships, regulate emotions, and manage goal- and learning-directed behaviors (Abrahams et al. 2019; Schoon 2021; Soto et al. 2022). SEB skills comprise a broad set of inter- and intrapersonal abilities beyond those measured by traditional intelligence tests. They are sometimes referred to as "non-cognitive skills", "soft skills", or "character skills", although the term SEB skills is arguably more general and less value-laden. SEB skills, variously measured, predict educational achievement and attainment, job performance, well-being, health, and other consequential life outcomes, often above and beyond intelligence, which is traditionally seen as the major driver of many aspects of life success (e.g., Brandt

The Behavioral, Emotional, and Social Skills Inventory (BESSI)
Despite its vibrancy in recent years, research on SEB skills has long suffered from a high level of fragmentation, terminological confusion, and a lack of consensus regarding the definition of SEB skills as well as how best to assess them. Soto et al. (2022; see also Napolitano et al. 2021) proposed an integrative framework for defining and organizing SEB skills. These authors defined SEB skills as functional capacities that relate to a person's maximum ability to show SEB skill-related behaviors when the situation calls for it. Their hierarchical framework, shown in Figure 1, distinguishes between 32 SEB skill facets that are grouped in 5 broader skill domains (i.e., the colored circles in Figure 1): Social Engagement, Cooperation, Self-Management, Emotional Resilience, and Innovation.
Akin to other SEB skill assessment frameworks, such as the OECD's recent Study on Social and Emotional Skills (SSES; Chernyshenko et al. 2018), the broader domains resemble the Big Five domains, which are the dominant framework in individual differences research. Soto et al. (2022) argued that the Big Five provide a comprehensive, and the empirically best-validated, framework to conceptualize individual differences in functional capacities as well as behaviors. Thus, the domain of Self-Management Skills is theoretically and empirically related to the Big Five domain Conscientiousness, Social Engagement Skills are related to the Big Five domain Extraversion, Cooperation is related to Agreeableness, Emotional Resilience is related to Neuroticism, and Innovation Skills are related to Openness. Most of the 32 individual SEB skills are uniquely assigned to one of these five domains. Some of the facets, labeled "interstitial facets" by Soto et al. (2022), fall in between two domains as per their loadings. Moreover, three facets, labeled "compound facets", do not fall under any of the domains but add distinct content. Along with this framework, Soto et al. (2022) introduced BESSI, a novel inventory to assess the SEB skills distinguished by their framework. Measuring each of the 32 SEB skill facets with 6 items (192 items in total), BESSI constitutes one of the most comprehensive SEB inventories to date. By grouping the 32 facets into 5 skill domains, BESSI allows for analyses both on the level of global domains and narrow facets. The option to analyze facets is a key asset because narrow facets add predictive power for life outcomes compared to domains, allow for a better understanding of mechanisms linking SEB skills to life outcomes, and offer a more apt target for interventions compared to global domains (e.g., Danner et al. 2021; Stewart et al. 2022). Global domains, meanwhile, offer a more parsimonious description and may be appropriate when assessment time and questionnaire space are limited, or when outcomes of interest are similarly global.
Crucially, although the organization of the five BESSI domains closely resembles the familiar five-factor structure from the realm of personality research, the response format of BESSI explicitly asks about perceived ability levels. That is, BESSI asks how well one can perform various tasks, rather than about typical behaviors (i.e., personality traits), in line with Soto et al.'s (2022) conceptualization of SEB skills as functional capacities. This allows for a clearer distinction between personality traits and SEB skills in a research field that has often used both concepts interchangeably (e.g., Lechner et al. 2019).
Across five studies and multiple adolescent and adult samples, BESSI showed good psychometric properties in Soto et al. (2022). Specifically, its facets showed high to very high internal consistencies, reached acceptable fit when modeled in a joint 32-facet item factor analysis (IFA) model, and clustered in the five broad domains largely as expected. Moreover, BESSI's facets and domains showed convergent and discriminant validity in relation to the Big Five personality traits and a variety of other SEB skill measures. Importantly, BESSI's domains and facets were also related to a wide range of outcomes including academic achievement and engagement, social relationships, and well-being, often incrementally over the Big Five. Although based on self-reports and cross-sectional data, these findings suggest that SEB skills measured with BESSI provide unique information in predicting life outcomes that is related to, yet distinct from, personality traits. They lend support to the idea that SEB skills can be meaningfully distinguished from personality traits.

Overview of the Present Research
In sum, BESSI provides a promising new tool for assessing SEB skills in a way that is valid, reliable, and comprehensive yet efficient. That said, Soto et al.'s (2022) initial studies were confined to English-speaking (mostly US) participants and the English-language source version of BESSI. For BESSI to be used in future research on SEB skills around the globe, additional language versions besides English are needed. To contribute to this endeavor, we therefore developed a German-language adaptation of BESSI, termed BESSI-G. Using this German-language adaptation, we set out to answer several fundamental questions about the SEB skills measured by BESSI, including their temporal stability and associations with intelligence.
Our validation of BESSI-G has three parts. In Study 1, a pilot study, we comprehensively assessed the psychometric properties of the initial translations of the 32 facet scales. In addition to internal consistency as an estimate of reliability, we assessed the facets' test-retest correlation over 1.5 months, providing the first evidence on the temporal stability of the SEB skills assessed by BESSI. In Study 2, the main study, we assessed the same psychometric properties of a slightly revised second version of the BESSI-G facet scales using data from a fresh adult sample. We also estimated the test-retest stabilities and true-score correlations over approximately 8 months to gauge the temporal stability of the BESSI facets over an extended period. Moreover, we tested the domain-level structure of BESSI-G. Finally, in Study 3, we present evidence on the convergent and discriminant validity of BESSI-G's facets and domains in relation to personality traits (as in the original study by Soto et al. 2022) and to fluid and crystallized intelligence, thus presenting first evidence on how BESSI(-G) relates to cognitive abilities.

Study 1: Pilot Study
The aim of Study 1 was to assess the psychometric properties of the initial version of the German-language adaptation of BESSI (BESSI-G v0.1). Whereas the original publication of the English-language source version of BESSI (Soto et al. 2022) focused on the joint structure of all 32 BESSI facets, in Study 1 we focused on the 32 individual skill facets as the building blocks of the newly translated inventory. We examined their (uni-)dimensionality, reliability (including test-retest stability), and factorial validity. Such facet-level analyses are important to determine the psychometric properties of the individual facets when using single items as input. These analyses are also informative if item parcels are later to be used as input for factor analyses, as Soto et al. (2022) did in their analyses of the original BESSI, because parcels require unidimensionality.

Data
Data for Study 1 came from 1164 adolescents and adults aged 14 to 64 years residing in Germany whose native language was German. We determined sample size based on simulation studies suggesting that sample sizes of at least 500, and sometimes 1000+, are needed to obtain stable correlations on the observed and latent-variable level (Kretzschmar and Gignac 2019; Schönbrodt and Perugini 2013). We collected the data via a commercial online access panel provider (Respondi AG). Respondents received a small monetary incentive for participation. For adults (20-64 years), there was a quota for age, gender, and education according to the German Microcensus 2017, ensuring that the sample was sufficiently diverse and resembled the general population in terms of its sociodemographic composition. For teenagers (14-19 years), there was a quota for gender (quotas for age and education were not feasible). The data collection took place in January 2021. After carefully screening out 30 cases that provided invalid responses (e.g., straightliners), our final analysis sample for Study 1 comprised 1134 respondents.
To investigate test-retest stability and administer additional measures used in Study 3 (personality traits and intelligence), we invited a subset of the T0 sample to participate in up to three additional waves in February 2021 (T1, n = 727, focusing on intelligence), March 2021 (T2, n = 597, focusing on the BESSI-G retest and testing potential replacement items), and May 2021 (T3, n = 300, focusing on the retest of the potential replacement items).

Materials
We translated BESSI from English to German using the TRAPD approach (Harkness 2003). TRAPD is a team-based translation approach that represents the current gold standard in questionnaire translation. It produces superior translations compared to traditional translation approaches such as backtranslation (Behr 2018; Behr and Braun 2021). The TRAPD approach through which we translated BESSI from English to German comprised 5 steps: (T) In the translation phase, two independent translators (a translation expert and an expert psychometrician) translated the instructions, items, and response scales of BESSI from English to German. (R) In the review phase, the translators and two independent experts reviewed the two translations and decided on the final translations. Where necessary, they suggested alternative translations. (A) In the adjudication phase, an independent adjudicator who had not been involved in the prior phases compared the different translations against the English source version, chose between competing versions in cases in which the reviewers had not agreed on a final translation for an item, and approved the translation for the fieldwork. (P) The pretest phase consisted of presenting the translated instructions, items, and response scales to a small number of psychometrics experts and laypersons to test whether all translations were properly understood. The second part of the pretest phase was the initial data collection presented in the following. (D) Finally, the documentation phase consisted of documenting the previous phases in the project's OSF archive as well as in the present manuscript. We denote the initial translation resulting from the TRAPD approach as BESSI-G v0.1.
We administered the 192 items of the initial version of BESSI-G (v0.1) to respondents in a three-form planned missingness design (Graham et al. 1996). We randomly assigned respondents to one of three different questionnaires, each of which comprised four out of six items per facet (i.e., 128 out of 192 items) in three different combinations. This design reduced survey length (and hence costs and respondent burden) by one-third. Because assignment to forms was random, the resulting data are missing completely at random (MCAR) and can be analyzed with standard missing data methods without incurring bias (Zhang and Yu 2021).
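To make the logic of the three-form design concrete, the following Python sketch builds such a design for 32 facets with 6 items each. It is only an illustration under stated assumptions: the split of each facet into three two-item blocks (A, B, C), the pairing of blocks into forms, the column ordering of the items, and the 1-5 response range are hypothetical and are not taken from the study materials.

```python
import numpy as np

N_FACETS = 32
ITEMS_PER_FACET = 6

# Hypothetical two-item blocks within each facet (0-based item positions).
# Which items actually went into which form is NOT stated in the text.
BLOCKS = {"A": (0, 1), "B": (2, 3), "C": (4, 5)}
FORMS = {1: ("A", "B"), 2: ("A", "C"), 3: ("B", "C")}


def items_in_form(form):
    """Return the sorted column indices (0..191) administered in a form."""
    positions = [p for block in FORMS[form] for p in BLOCKS[block]]
    return sorted(f * ITEMS_PER_FACET + p
                  for f in range(N_FACETS) for p in positions)


def apply_design(responses, forms):
    """Set every item that a respondent's form did not include to NaN."""
    masked = np.full(responses.shape, np.nan)
    for form in FORMS:
        rows = np.flatnonzero(forms == form)
        cols = items_in_form(form)
        masked[np.ix_(rows, cols)] = responses[np.ix_(rows, cols)]
    return masked


# Simulated complete answers (assumed 1-5 scale) for a handful of respondents.
rng = np.random.default_rng(0)
n_respondents = 9
complete = rng.integers(
    1, 6, size=(n_respondents, N_FACETS * ITEMS_PER_FACET)
).astype(float)
assigned_forms = rng.integers(1, 4, size=n_respondents)  # random assignment
observed = apply_design(complete, assigned_forms)
items_answered = (~np.isnan(observed)).sum(axis=1)
```

In this sketch every respondent answers 128 of the 192 items, and every item appears in two of the three forms, so each pair of items is observed jointly by at least one third of the sample.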

Analyses
Unidimensionality. Unidimensionality holds if there exists a single latent variable underlying a set of items (Hattie 1985; Ziegler and Hagemann 2015). Only when a scale is unidimensional can scale scores be unambiguously interpreted as reflecting the target skill. Unidimensionality is thus a prerequisite for unbiased estimates of validity and reliability. Moreover, unidimensionality is a prerequisite for using item parcels in later analysis (e.g., Little et al. 2002; Matsunaga 2008). We assessed unidimensionality with three tests: the empirical Kaiser criterion (EKC), the minimum average partial test (MAP), and parallel analysis (PA). EKC (Braeken and Van Assen 2017) uses an eigenvalue decomposition of the inter-item correlation matrix to identify reliable factors. EKC can be seen as a sample-specific variant of the commonly used (population-appropriate) Kaiser-Guttman criterion, which is to retain factors with an eigenvalue greater than 1. However, EKC incorporates random sample variations of the eigenvalues and uses an empirical correction factor before retaining dimensions (Auerswald and Moshagen 2019). MAP (Velicer et al. 2000) identifies the number of systematic components in a correlation matrix through a series of principal component analyses. In each step, components from the preceding step are partialled out. The step number with the lowest average squared partial correlation resulting from the matrices' off-diagonals (reflecting common variance) indicates the number of components to retain. PA (Horn 1965) contrasts the empirical eigenvalues with those resulting from simulating random data with the same number of variables and observations as the empirical data set. Any factor to be retained must exceed the 95th percentile of the random eigenvalue distribution (Crawford et al. 2010; Zwick and Velicer 1986). As suggested by Lim and Jahng (2019) and Crawford et al. (2010), we used the full correlation matrix and extracted principal components (not factors) to avoid overextraction bias.
EKC, MAP, and PA offer complementary information for assessing dimensionality. Because their performance varies with the characteristics of the items and scales (i.e., number of items, distribution of responses), we computed all three to obtain an informative picture of the scales' dimensionality. Although none of them performs better than the other two in all empirical scenarios, we gave EKC the greatest weight because EKC performs as well as or better than other indices for relatively short scales such as the six-item BESSI facets (Auerswald and Moshagen 2019; Braeken and Van Assen 2017). EKC also performs equally well when the true model is a zero-, one-, or two-factor model, which renders it a good choice for testing the unidimensionality of the BESSI-G scales. We provide further details on the three tests in Appendix A, which illustrates why they may diverge and provide complementary information. Additionally, we inspected the first eigenvalue and the ratio of the first to second eigenvalue. Although there can be no universal cutoffs for how large eigenvalues or their ratio should be, larger values are generally preferable.
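The following Python sketch illustrates how two of these checks, EKC and PA, can be computed for a single six-item scale, together with the ratio of the first to the second eigenvalue. It is a minimal sketch, not the analysis code used in this study: the data are simulated from a one-factor model with a hypothetical loading of .70, the data are complete (the planned missingness is ignored), the EKC reference values follow the formula as we understand it from Braeken and Van Assen (2017), and the function names are our own.

```python
import numpy as np


def eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def ekc(eigs, n_obs):
    """Count leading eigenvalues that exceed their EKC reference value."""
    n_items = len(eigs)
    retained = 0
    explained = 0.0  # sum of the eigenvalues preceding the current one
    for j, eig in enumerate(eigs):
        share = (n_items - explained) / (n_items - j)
        reference = max(share * (1 + np.sqrt(n_items / n_obs)) ** 2, 1.0)
        if eig <= reference:
            break
        retained += 1
        explained += eig
    return retained


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Count leading eigenvalues above the given percentile of random-data eigenvalues."""
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    random_eigs = np.array([
        eigenvalues(rng.standard_normal((n_obs, n_items)))
        for _ in range(n_sims)
    ])
    thresholds = np.percentile(random_eigs, percentile, axis=0)
    retained = 0
    for eig, threshold in zip(eigenvalues(data), thresholds):
        if eig <= threshold:
            break
        retained += 1
    return retained


# Simulated unidimensional six-item scale (hypothetical loading of .70).
rng = np.random.default_rng(1)
n_obs, loading = 1000, 0.7
factor = rng.standard_normal((n_obs, 1))
noise = rng.standard_normal((n_obs, 6))
items = loading * factor + np.sqrt(1 - loading ** 2) * noise

eigs = eigenvalues(items)
n_ekc = ekc(eigs, n_obs)
n_pa = parallel_analysis(items)
ratio = eigs[0] / eigs[1]
```

For data generated in this way, both criteria should point to a single dimension, and the first eigenvalue should clearly dominate the second.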
Reliability. In line with Soto et al. (2022), we computed two measures of internal reliability: Cronbach's alpha (α) and McDonald's omega (ω). Both are measures of the reliability of a unit-weighted scale score; whereas α assumes an at least essentially τ-equivalent model, ω assumes a τ-congeneric model and can be used even if there are correlated errors (Widaman and Revelle 2022; Zinbarg et al. 2006). For that reason, we mainly focused on ω. Additionally, we estimated the highest and lowest possible split-half reliability of each facet resulting from all possible combinations of assigning items to test halves. Moreover, we estimated the test-retest stability over a period of approximately 1.5 months that elapsed between T0 and T2 (median time interval across all respondents: 45 days). We used the pseudo-indicator method (PIM) as described by Rose et al. (2019) to handle missing data with full-information maximum likelihood estimation (FIML).
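As an illustration of two of these quantities, the Python sketch below computes Cronbach's α and the lowest and highest split-half reliability across all ways of dividing six items into two halves of three. It rests on assumptions that go beyond the text: the data are simulated and complete, and the split-half correlations are stepped up with the Spearman-Brown formula, which may differ from the procedure actually applied here. The function names are our own.

```python
from itertools import combinations

import numpy as np


def cronbach_alpha(data):
    """Cronbach's alpha for an (n_respondents x n_items) array."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def split_half_range(data):
    """Lowest and highest Spearman-Brown-corrected split-half reliability."""
    k = data.shape[1]
    estimates = []
    for half in combinations(range(k), k // 2):
        if 0 not in half:
            continue  # each split occurs twice; keep the one holding item 0
        other = [i for i in range(k) if i not in half]
        score_a = data[:, list(half)].sum(axis=1)
        score_b = data[:, other].sum(axis=1)
        r = np.corrcoef(score_a, score_b)[0, 1]
        estimates.append(2 * r / (1 + r))
    return min(estimates), max(estimates), len(estimates)


# Simulated six-item scale with a hypothetical common loading of .70.
rng = np.random.default_rng(2)
n_obs, loading = 1000, 0.7
factor = rng.standard_normal((n_obs, 1))
items = loading * factor + np.sqrt(1 - loading ** 2) * rng.standard_normal((n_obs, 6))

alpha = cronbach_alpha(items)
lowest, highest, n_splits = split_half_range(items)
```

With six items there are 10 distinct ways to form two three-item halves, so the range is taken over 10 estimates per facet.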
Reliability depends on scale length and is sample-specific. There are no universally accepted cut-offs for what constitutes sufficient reliability. For individual diagnostic decisions, stricter standards apply than for research purposes. We tentatively judged internal reliability estimates of .60-.70 as "acceptable" and .80 or greater as "very good" (Hulin et al. 2001).
As an ancillary statistic, we computed average variance extracted (AVE), which indicates the share of variance in an item set that can be attributed to the latent construct as opposed to uniqueness and random error (Fornell and Larcker 1981). AVE is therefore often considered to be a measure of factorial validity. Fornell and Larcker (1981) suggested a threshold value of AVE ≥ .50, although lower values are frequently observed. We tentatively adopted the same threshold.
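The relation between ω and AVE can be illustrated with a short Python sketch that computes both from the standardized loadings of a single-factor model. The loadings are hypothetical, and the formula for ω assumes uncorrelated residuals; it is meant to show the arithmetic, not to reproduce any estimate from this study.

```python
import numpy as np


def omega_from_loadings(loadings):
    """Omega of a unit-weighted score from standardized single-factor loadings."""
    loadings = np.asarray(loadings, dtype=float)
    common = loadings.sum() ** 2            # variance due to the common factor
    unique = (1 - loadings ** 2).sum()      # summed residual variances
    return float(common / (common + unique))


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading."""
    loadings = np.asarray(loadings, dtype=float)
    return float(np.mean(loadings ** 2))


# Hypothetical standardized loadings for one six-item facet.
hypothetical_loadings = [0.75, 0.70, 0.65, 0.72, 0.68, 0.60]
omega = omega_from_loadings(hypothetical_loadings)
ave = average_variance_extracted(hypothetical_loadings)
```

For these loadings, ω is about .84 whereas AVE is about .47. A scale can therefore pass a reliability threshold of .80 and still fall short of the AVE threshold of .50, because ω increases with the number of items whereas AVE does not.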
Factorial validity (CFA measurement models). To test factorial validity, we fit a single-factor confirmatory factor analysis (CFA) measurement model for each of the 32 facets. The six items per facet loaded on a single factor whose variance we fixed to unity for identification. These single-factor models test the local independence of the items given the (single) latent trait and will indicate poor fit if local independence is violated. Thus, single-factor CFA models constitute another, arguably strict, test of unidimensionality (Hattie 1985). Moreover, these models inform about the factorial validity of each facet when conceived as a unitary construct. Importantly, even scales that are unidimensional according to the dimensionality tests discussed above may show insufficient fit according to the strict standards of CFA, for example because some items have (residual) correlations beyond the common latent variable that lead to misfit if they go unmodeled (i.e., because local stochastic independence does not always hold).
We estimated all models with a robust maximum likelihood estimator (MLR) and FIML to handle missing data. In line with current conventions for judging model fit (Hu and Bentler 1999), we chiefly relied on the comparative fit index (CFI), root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR) to assess model fit. We judged model fit to be acceptable according to the following rules of thumb: CFI > .90 ("adequate") or > .95 ("good"), RMSEA < .05 ("good") or at least < .08 ("adequate"), and SRMR < .05 ("good") or at least < .10 ("adequate"). We stipulated that a model was acceptable when at least two of the three indices passed the cutoffs.
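The decision rule described above can be written down compactly. The Python sketch below applies the "adequate" cutoffs and the two-out-of-three rule to fit indices that are entirely hypothetical; the function names are our own.

```python
def passes_cutoffs(cfi, rmsea, srmr):
    """Check each fit index against the 'adequate' cutoffs named in the text."""
    return {
        "CFI": cfi > 0.90,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr < 0.10,
    }


def is_acceptable(cfi, rmsea, srmr):
    """A model counts as acceptable when at least two of three indices pass."""
    return sum(passes_cutoffs(cfi, rmsea, srmr).values()) >= 2


# Hypothetical fit indices for two single-factor models.
model_a = is_acceptable(cfi=0.93, rmsea=0.095, srmr=0.04)  # CFI and SRMR pass
model_b = is_acceptable(cfi=0.88, rmsea=0.110, srmr=0.06)  # only SRMR passes
```

Under this rule, a model such as the first one is judged acceptable even though its RMSEA exceeds .08, which is the pattern that matters most for short scales with few degrees of freedom.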

Results
We summarize the main results of the pilot study here. We present the tables with detailed results in Appendix A (Tables A1-A3). Table A1 shows unidimensionality results for the BESSI-G facets. For all 32 facets, there was only one large eigenvalue, whereas the second eigenvalues were small throughout, resulting in ratios of the first to second eigenvalue of 2.53 to 7.20. For 31 of the facets, all indices unequivocally indicated unidimensionality. The sole exception was the Self-Reflection facet, where, despite a clearly dominant first eigenvalue, all tests pointed to a second factor, indicating that the unidimensionality assumption was violated.

Reliabilities of the 32 BESSI-G Facets
Table A2 shows reliability estimates for the BESSI-G facets. For 28 of the 32 facets, ω exceeded .80, the threshold conventionally seen as indicating "good" reliability. The remaining facets (e.g., Abstract Thinking Skill) fell short of this standard by only a small margin. The average ω across the 32 facets was .85.
Test-retest stabilities (r_tt) of the observed scores over 1.5 months were slightly lower than internal consistencies. They ranged between .66 and .87 with an average of .75. The facet with the lowest test-retest stability in this sample was Impulse Regulation, while that with the highest was Leadership Skill.
AVE exceeded .50 for 22 of the 32 facets. The other ten facets fell short of this threshold. For the latter scales, the respective common factor explained only a relatively small amount of variance in the indicators, whereas item uniquenesses/errors were relatively large.

Factorial Validity of the 32 BESSI-G Facets
Table A3 shows the fits of single-factor CFA models for the 32 facets. Although the model χ² indicated significant deviations for all models, many of the facet scales of BESSI-G v0.1 showed satisfactory fit according to conventional cutoffs for at least two of the fit indices. For eight facets, all three fit indices indicated acceptable fit, and for an additional seven facets at least two out of three fit indices signaled acceptable fit. However, fit indices still showed room for improvement for most facets, and nine facets did not achieve good fit in the present sample.
The model modification indices suggested that for most of the insufficiently fitting models, an unmodeled residual covariance for a single item pair (and only rarely more than one item pair) was responsible for the misfit. That is, these two items were not fully locally independent given the latent variable. Across the 32 facets, the average χ² value of the highest modification index for a residual correlation was 110.02. Upon closer inspection, the reasons for these residual covariances appeared to be trivial in many cases, such as specific words or grammatical constructions that the two items had in common. For example, the first ("Learn about other cultures") and fourth ("Study other languages or cultures") items of the Cultural Competence facet both referred to learning/acquiring knowledge about other "cultures". In other cases, the reasons behind the residual correlations with the highest modification indices were less obvious, for example in the case of the Teamwork Skill items 2 ("Contribute to group projects.") and 5 ("Cooperate to get things done").

Discussion
Study 1 demonstrated that the initial version of BESSI-G already achieved satisfactory psychometric properties in the present sample. With the sole exception of Self-Reflection, all facets were clearly unidimensional. The facets' internal reliabilities were mostly very good (ω ≥ .80 for 28 facets). They were slightly lower than those reported by Soto et al. (2022) for the English-language source version of BESSI, which might be due to the different research design (recall that we applied a three-form design in which each respondent answered different combinations of four out of six items). Test-retest stabilities across a roughly six-week period were lower than internal reliabilities but still moderately high. Overall, these findings suggest that BESSI-G reliably measures a person's SEB skills.
Despite the facets' unidimensionality and good reliabilities, single-factor CFA models showed mixed results. Model fit was acceptable for several of the facets according to CFI and SRMR, whereas RMSEA was mostly above the conventional threshold. Generally, the χ² values signaled room for improvement. It should be noted that BESSI was conceived with item parceling in mind, which is why the original publication (Soto et al. 2022) did not test single-factor measurement models for individual BESSI facets. Modification indices suggested that one pair of items per facet was more strongly interrelated than the common factor allowed. Combined with the evidence for unidimensionality and reliability, this suggests that the lack of model fit for most facets was unlikely to reflect major problems with the scales. Moreover, we expected that the lack of fit could, if desired, be remedied by modeling a residual correlation between one item pair, a possibility that we explored in Study 2.
In sum, the initial version of the 32 BESSI-G facet scales already showed promising psychometric properties. However, at least in the present sample, some of the facets showed room for improvement regarding the fit of single-factor CFA models using single items as input instead of item parcels for joint models as in the original BESSI publication (Soto et al. 2022). We also found a lack of unidimensionality for Self-Reflection. Reliability was mostly lower than in the English-language source version. Additional analyses (not reported) also identified some items that showed too much overlap with other facets. We therefore drafted revised versions for 31 of the initial 192 items with the aim of further improving the BESSI-G facet scales. Based on ratings by two of the authors (with regard to content validity and translation quality) and a pretest of the alternative translations in a subsample of respondents who were reinterviewed at T2 and T3, we retained 14 of the revised items to replace their respective original items. The decisions are documented in the project's OSF archive. We thus obtained a refined version of BESSI-G (henceforth "BESSI-G v0.2") that we fielded in a fresh sample in Study 2.

Study 2: Testing the Refined BESSI-G Facets and Their Domain-Level Structure
The purpose of Study 2 was threefold. First, we aimed to assess the psychometric properties of the refined version of BESSI-G's (v0.2) facet scales, repeating the same analyses as in Study 1 in a fresh (and larger) adult sample.
Second, we expanded our analyses of the factorial validity of BESSI-G over Study 1 by testing its joint facet-level and domain-level structure in addition to the 32 single-factor CFAs. Adopting the same modeling strategy as Soto et al. (2022) with the English-language source version of BESSI, we estimated a joint measurement model with all 32 facets to test the overall facet-level structure of BESSI-G. Moreover, we used exploratory factor analyses (EFA) to test whether the 32 facets cluster in the 5 domains as described by Soto et al. (2022) and Napolitano et al. (2021) for the English-language source version. We expected to replicate the five-dimensional structure of BESSI that these authors reported, including its interstitial skills and the three compound skills (see Figure 1).
Third, we aimed to garner novel insights into the BESSI facets that are of more substantive interest. In addition to estimating the test-retest stabilities over approximately 8 months (a much longer period than the 1.5-month period in Study 1), we estimated the true-score correlations ρ_tt (i.e., latent correlations correcting for measurement error) over the same period to gauge the temporal stability of the 32 facets over an extended time period. Moreover, we computed the sample means of the facets to garner insights into the SEB skill distribution in this adult sample.

Data
Data for Study 2 (henceforth "T4") came from 1008 adults aged 18 to 65 years residing in Germany whose native language was German. We collected the data via a commercial online access panel provider (Respondi AG). Respondents received a small monetary incentive for participation. There was a quota for age, gender, and education according to the German Microcensus 2017, ensuring that the sample was sufficiently diverse and reflected the sociodemographic composition of the general population. Unlike in Study 1, about half (n = 517) of the respondents received the full 192-item questionnaire, whereas the other half (n = 491) received the same three-form planned missingness design (PMD) as in Study 1. In this way, information from the full design could be borrowed for handling the missing data introduced by the PMD, adding further precision. After checking data quality and screening out a small number of invalid responses (e.g., straightliners), our final analysis sample for Study 2 comprised 940 respondents.
Incidentally, a subset of 238 of these adults had already participated in the T2 survey of Study 1, in which we had pretested the revised version of BESSI-G. Of those, 203 provided valid data on BESSI-G at both time points. We exploited this overlap in the respondent pool to estimate the test-retest stability of BESSI-G v0.2 over a period of approximately 8 months.

Materials
Respondents answered the 192 items of BESSI (v0.2). Fourteen of these items differed slightly from the earlier version. The instructions and response scale remained identical. The items can be found in Table A6 in Appendix B and in a spreadsheet in the OSF archive at https://osf.io/9pvmj/?view_only=16e79cfced2743aab00d937215a8fe17.

Analyses
Psychometric properties of the 32 skill facets. To test the psychometric properties of the 32 BESSI-G (v0.2) facet scales, we assessed the unidimensionality, reliability, and factorial validity (i.e., CFA model fit) of BESSI's 32 individual skill facets as described in Study 1.
In addition to estimating the test-retest correlation of the observed scores, we used the repeated-measures data from T2 and T4 to estimate the true-score correlation ρ tt over 8 months for each facet in a latent-variable framework. The models contained residual correlations across time points between the corresponding items (as required for longitudinal models). We imposed metric invariance over time by fixing the loadings and residual correlation to the same value at both time points. We then extracted the latent (i.e., true-score) correlation between the two time points for each facet.
Facet-Level and Domain-Level Structure. To assess the fit of BESSI-G's facet structure in its entirety, we fit a joint CFA model containing all 32 BESSI-G facets as correlated first-order factors. To ensure comparability with the original BESSI publication, we followed the same analysis strategy as Soto et al. (2022). That is, we used item parceling in order to reduce model complexity, facilitate model convergence, and improve the distributional properties of the manifest indicators. We computed three parcels per facet (96 parcels in total) by assigning each of the 192 items to a two-item parcel in the same way as Soto et al. (2022) (i.e., the three parcels consisted of Item 1 and Item 4, Item 2 and Item 5, and Item 3 and Item 6, respectively) and then taking the mean across the two items in each parcel. Given that no cross-loadings are permitted in CFA models, this still constitutes a strict test of the 32 facets' joint structure. Different from Soto et al. (2022), we used a robust maximum likelihood estimator (MLR) instead of the mean- and variance-adjusted weighted least squares (WLSMV) estimator to estimate the model. Because the parcel scores followed nearly normal distributions and were quasi-continuous, using WLSMV was not necessary.
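The parceling scheme described above can be illustrated with a minimal Python sketch. It assumes that each facet has six items stored in their original order; the function name `make_parcels` and the example responses are hypothetical and are not taken from the BESSI-G analysis scripts.

```python
from statistics import mean


def make_parcels(item_scores):
    """Return three two-item parcel means for the six item scores of one facet."""
    if len(item_scores) != 6:
        raise ValueError("expected six item scores per facet")
    # Pair Item 1 with Item 4, Item 2 with Item 5, and Item 3 with Item 6
    # (zero-based indices 0/3, 1/4, 2/5).
    pairs = [(0, 3), (1, 4), (2, 5)]
    return [mean((item_scores[i], item_scores[j])) for i, j in pairs]


# Hypothetical responses of one respondent to the six items of one facet (1-5 scale).
responses = [4, 3, 5, 2, 3, 4]
parcels = make_parcels(responses)
print(parcels)
```

Applied to all 32 facets, this yields the 96 parcel scores per respondent that serve as indicators in the joint CFA model.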
To test the domain-level structure of BESSI-G, we conducted an exploratory factor analysis (EFA) with oblique target rotation using the 32 facet scores as input. The target matrix for the rotation contained the theoretical loadings. Following Soto et al. (2022), each facet had a unit loading on its main domain and a zero loading on other domains. The interstitial facets (i.e., Energy Regulation, Information Processing, Ethical Competence, and Impulse Regulation) had loadings of 0.5 on two domains. We did not specify target loadings for the three compound skills (i.e., Adaptability, Capacity for Independence, and Self-Reflection). We then compared how closely the pattern of EFA loadings resembled that of the original BESSI. For this purpose, we first rotated the matrices towards the loading matrix reported by Soto et al. (2022, Table 7) by means of oblique target rotation and then computed the factor congruency (Tucker's φ). Values in the range of .85 ≤ φ ≤ .94 indicate "fair" similarity, whereas values in excess of .95 imply that factors can be considered equal (Lorenzo-Seva and Ten Berge 2006).
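Tucker's congruence coefficient for one pair of factors is the sum of the cross-products of the two loading vectors divided by the square root of the product of their sums of squares. The following Python sketch illustrates the computation and the interpretation thresholds given above. The loading values are invented for illustration, and treating a rounded value of exactly .95 as "equal" is a simplifying assumption of this sketch.

```python
import math


def tucker_phi(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


def interpret_phi(phi):
    """Label a coefficient using the cutoffs cited in the text (rounded to two decimals)."""
    rounded = round(phi, 2)
    if rounded >= 0.95:  # assumption: the boundary value .95 counts as "equal"
        return "equal"
    if rounded >= 0.85:
        return "fair similarity"
    return "not similar"


# Hypothetical loadings of five facets on one domain factor.
target_column = [1.0, 1.0, 1.0, 0.0, 0.0]      # idealized target loadings
observed_column = [0.8, 0.7, 0.9, 0.1, -0.05]  # invented empirical loadings

phi = tucker_phi(target_column, observed_column)
print(round(phi, 3), interpret_phi(phi))
```

Note that the coefficient is insensitive to the overall size of the loadings: two vectors that are proportional to each other yield φ = 1.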
Descriptive statistics for the 32 facets. For each of the 32 BESSI-G facets, we computed the sample mean of the observed scores, that is, unit-weighted mean scores as in Soto et al. (2022), and its standard error to construct 95% confidence intervals. We also computed additional moments (e.g., skewness, kurtosis) for each facet score that we report in Table A5 in Appendix A.

Unidimensionality of the BESSI-G (v0.2) Facets
Table 1 shows the dimensionality results for the Study 2 sample. All three indices agreed, suggesting that all 32 facets were unidimensional. Compared to BESSI-G v0.1 investigated in Study 1, the first eigenvalues tended to be higher, with an average of 4.29 compared to 3.60 in the previous sample. Consequently, the ratio of the first to the second eigenvalue was larger (8.26 compared to 4.67 in Study 1). Thus, we concluded that unidimensionality held for all BESSI-G (v0.2) facets.
Table 2 shows reliability estimates for the BESSI-G facets. Although reliabilities were already good in Study 1, they further improved in the fresh sample of Study 2. Internal consistencies were now in excess of .80 for all 32 facets and often surpassed .90. The average ω across the 32 facets was .90 and thus virtually identical to what Soto et al. (2022) obtained in multiple samples with the English-language source version. AVE now exceeded the threshold of .50 for all 32 facets, indicating that the common factors explained more variation per item on average compared to Study 1.
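For orientation, reliability (ω) and average variance extracted (AVE) coefficients of the kind reported for the facets can be approximated from the standardized loadings of a single-factor model. The Python sketch below is a minimal illustration that assumes a standardized solution with uncorrelated residuals. The function names and the example loadings are hypothetical, and the estimates in this article may have been computed with a different routine.

```python
def mcdonald_omega(loadings):
    """Omega for a one-factor model with standardized loadings and uncorrelated residuals."""
    sum_loadings_sq = sum(loadings) ** 2
    residual_variance = sum(1.0 - l * l for l in loadings)  # theta = 1 - lambda^2
    return sum_loadings_sq / (sum_loadings_sq + residual_variance)


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading across the indicators of a factor."""
    return sum(l * l for l in loadings) / len(loadings)


# Hypothetical standardized loadings of the six items of one facet.
facet_loadings = [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]

omega = mcdonald_omega(facet_loadings)
ave = average_variance_extracted(facet_loadings)
print(round(omega, 3), round(ave, 3), ave > 0.50)
```

With six loadings of .80, the sketch returns ω of about .91 and an AVE of .64, which would pass the AVE threshold of .50 used here.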
As one would expect, test-retest stabilities (r tt ) over 8 months were lower than those across 1.5 months in Study 1. Recall that test-retest stability reflects measurement error (i.e., unreliability) and trait change as well as state fluctuations. Still, r tt ranged between .40 and .80 with an average of .66. The facet with the lowest r tt was, somewhat ironically, Capacity for Consistency, which was the only facet with a test-retest correlation below .50.
The true-score stabilities ρ tt (i.e., test-retest correlations corrected for measurement error through latent-variable modeling) ranged from .69 to .91 with an average of .79. On average, ρ tt exceeded r tt by .13, although the difference was often much greater. Hence, the true-score stabilities of the skills were all substantial.
Table 3 shows the model fits of the 32 single-factor CFAs for BESSI (v0.2). The fit of the 32 single-factor CFAs improved over Study 1 for most of the facets. Still, several of the facets did not fully meet conventional cutoffs, despite now even clearer evidence for their unidimensionality and reliability. Time Management and Self-Reflection Skill showed the poorest fit, whereas other facets such as Responsibility Management showed good fit.
As in Study 1, modification indices suggested that misfit arose from unmodeled residual covariances (i.e., violations of local stochastic independence). Across the 32 facets, the average χ² value of the highest modification index for a residual correlation was 96.03. We therefore tested the fit of measurement models that additionally included one residual covariance for the item pair with the highest modification index. Such residual covariances are likely to reflect similarities in item wording or grammatical constructions that two items share with each other (but not with the remaining four items). Accounting for this misfit by modeling the residual covariances could improve model fit but otherwise leave model interpretation intact. Results shown in Table A4 in Appendix A suggest that most models achieved acceptable fit after introducing one residual correlation. In all cases, model fit improved over the models without the residual correlation, and all but one facet now showed good fit with CFI > .95, SRMR < .05, and in most cases RMSEA < .08. The sole exception exhibiting insufficient fit in the Study 2 sample was the Time Management facet.
Because this facet had shown good fit in Study 1 (see Table A3 for BESSI-G v0.1) and the items had remained unchanged in v0.2, we concluded that its poorer fit in Study 2 was likely attributable to sampling variation. Overall, this suggests that the misfit in the measurement models of the BESSI-G facets was mostly trivial, arose from linguistic similarities between some items, did not threaten the overall factorial validity of the model, and could (if desired) mostly be remedied by introducing a single residual covariance. Although further improvements might be possible by introducing a second residual covariance, we did not pursue any further data-driven model modifications but accepted the current measurement models for all facets.
All factor loadings in these CFA models were moderate to high, ranging from .54 to .92 with an average of .81. Figure 2 displays the loadings of all items on their respective facet factor based on the improved models shown in Table 3. The figure shows that, in fact, 167 out of 192 standardized loadings (i.e., 86%) were λ ≥ .70, indicating consistently strong relationships between the latent variables and their indicators.
Table 6), although the fits are not directly comparable because these authors used a WLSMV estimator and not MLR.
The standardized loadings of the 96 parcels on their respective factors had a range of .80 ≤ λ ≤ .97 with an average of λ = .90. Because the parcels' loadings were high and homogeneous, we explored whether a stricter (and more parsimonious) model, namely, an essentially τ-equivalent model in which all three parcels had equal loadings on their respective latent variable, fit the data. The fit of the essentially τ-equivalent joint CFA model was acceptable, χ²(4032) = 7237.17, p < .001, CFI = 0.95, RMSEA = 0.03 [0.03, 0.03], SRMR = 0.03. The model also had a better balance of fit and parsimony (BIC = 113,973.68) compared to the τ-congeneric model (BIC = 114,056.46). Thus, the more parsimonious essentially τ-equivalent model should be preferred over the τ-congeneric model.
Figure 3 shows the zero-order correlations between the 32 latent variables from the joint CFA model. The skill facets formed a positive manifold, meaning that all correlations among them were positive. The correlations ranged from small (r = .08 between Artistic Skill and Responsibility Management) to high (r = .86 between the two Innovation Skill facets Creative Skill and Abstract Thinking Skill). The facets' intercorrelations were approximately normally distributed around their mean of r = .49. Most fell in the .40 ≤ r ≤ .60 range, showing that the BESSI-G facets were related (as one would expect) but at the same time far from redundant. The facet that, on average, had the smallest correlations with all other facets was the Artistic Skill facet (mean r = .32), whereas the facet that, on average, had the strongest correlations with other facets was Capacity for Social Warmth (mean r = .56).
Figure 4 shows the means and 95% confidence intervals of the BESSI-G facets' observed scores, the type of scores most researchers working with BESSI will be using. The scores are sorted in descending order by their sample mean. The facets are colored by the domain(s) to which they are assigned according to Soto et al. (2022) and Napolitano et al. (2021). Table A5 in Appendix A shows additional descriptive statistics.
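As an illustration of how observed facet scores, their sample means, and 95% confidence intervals of this kind can be obtained, consider the following Python sketch. It assumes a normal approximation (z = 1.96) for the interval, which is an assumption of this example. The data and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def unit_weighted_score(item_scores):
    """Observed facet score: unweighted mean of a respondent's item responses."""
    return mean(item_scores)


def mean_with_ci(scores, z=1.96):
    """Sample mean, standard error, and normal-approximation 95% confidence interval."""
    m = mean(scores)
    se = stdev(scores) / math.sqrt(len(scores))
    return m, se, (m - z * se, m + z * se)


# Hypothetical responses of five respondents to the six items of one facet (1-5 scale).
item_responses = [
    [3, 3, 3, 3, 3, 3],
    [3, 4, 3, 4, 3, 4],
    [4, 4, 4, 4, 4, 4],
    [4, 5, 4, 5, 4, 5],
    [5, 5, 5, 5, 5, 5],
]
facet_scores = [unit_weighted_score(row) for row in item_responses]
m, se, ci = mean_with_ci(facet_scores)
print(m, round(se, 3), tuple(round(b, 2) for b in ci))
```

With samples as large as the present one, the normal approximation and a t-based interval give nearly identical bounds.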
Several observations about Figure 4 are noteworthy. First, most means were above the scales' midpoint of three, indicating that respondents, on average, thought that they mastered these SEB skills "pretty well" to "very well". This also implies that most observed scores were skewed towards higher skill levels, an impression that is confirmed by the descriptive statistics in Table A5. Second, respondents rated their skills most highly in the Self-Management domain. All facets of this domain were among the ten top-rated skills. Third, the facet with by far the lowest mean was Artistic Skill, which was the only facet whose sample mean was below the scale's midpoint.
Domain-level structure of BESSI-G. Table 4 shows target-rotated loadings from the EFA model testing the domain-level structure of BESSI-G when extracting five factors. Additionally, it shows two indices: "Complexity" refers to the number of factors needed to account for the observed variable (in this case: the facet score). Complexities of 1 would imply a perfect simple structure in which each facet loads on only one factor, whereas complexities greater than one imply that the facet loads on multiple factors. "Uniqueness" refers to the variance that is unique to each facet and not shared with other facets; it equals one minus the communality; for example, a uniqueness of .20 suggests that 20% of a facet's variance is not shared with any other facets.
Of the 25 skill facets that could be uniquely assigned to exactly one of the five domains according to their loadings in Soto et al. (2022) (see Figure 1), all had their highest loading on the expected domain factor in our present data as well. The overall pattern of loadings was in close alignment with that of the English-language BESSI reported in Soto et al. (2022). The congruency coefficients between their BESSI loading matrix and our present BESSI-G loading matrix ranged between .93 ≤ φ ≤ .94 per domain, indicating that the domain factors were highly similar or in fact equivalent when comparing BESSI-G to BESSI. When comparing the loadings against the idealized target matrix containing only 0 and 1 loadings, the congruencies were still quite high (.82 ≤ φ ≤ .97), implying a good fit between theoretical expectations and the empirical loading pattern.
Some differences from the English-language source version emerged in the details, specifically in the interstitial and compound skills. Four facets (i.e., Energy Regulation, Information Processing, Ethical Competence, and Impulse Regulation) were labeled as "interstitial" facets by Soto et al. (2022) and Napolitano et al. (2021) because they loaded similarly highly on two domains. In the present sample, the Ethical Competence facet was a truly interstitial facet, loading equally on both Cooperation and Self-Management. The other three primarily loaded on one factor: Energy Regulation on Self-Management (and to a lesser extent Emotional Resilience but not Social Engagement as in the original BESSI), Information Processing Skill on Innovation, and Impulse Regulation on Emotional Resilience. Thus, the "interstitial" facets tended to fall more clearly under a single domain in our sample compared to the original BESSI. On the other hand, Cultural Competence had a non-negligible cross-loading on the Cooperation domain.
Regarding the compound skills that did not load on any of the five domains in the original BESSI, two of the three (namely, Adaptability and Self-Reflection Skill) likewise did not have a strong and dominant loading (all λ ≤ .40) on any of the five domains. These facets also had the highest complexities and were thus indeed "compound skills". By contrast, the Capacity for Independence facet clearly fell under the Self-Management Skills domain, as originally intended by Soto et al. (2022) but different from these authors' empirical findings in samples from the US. Thus, Adaptability and Self-Reflection but not Capacity for Independence were compound skills.
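To make the two indices reported alongside the loadings more concrete, the following Python sketch computes a complexity index and the uniqueness for a single facet. It assumes that complexity is Hofmann's index, (Σλ²)² / Σλ⁴, which is the index reported by common EFA software; the text above does not name the index, so this is an assumption. Because the rotation is oblique, the communality takes the factor correlations into account. All loading values and factor correlations are invented.

```python
def hofmann_complexity(loadings):
    """Hofmann's complexity index: (sum of squared loadings)^2 / sum of loadings^4."""
    sum_sq = sum(l ** 2 for l in loadings)
    sum_fourth = sum(l ** 4 for l in loadings)
    return sum_sq ** 2 / sum_fourth


def uniqueness(loadings, factor_corr=None):
    """One minus the communality; factor_corr is the factor correlation matrix."""
    k = len(loadings)
    if factor_corr is None:  # orthogonal factors as the default
        factor_corr = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    communality = sum(
        loadings[i] * factor_corr[i][j] * loadings[j]
        for i in range(k)
        for j in range(k)
    )
    return 1.0 - communality


# Hypothetical facet loading equally on two of five domain factors.
interstitial = [0.5, 0.5, 0.0, 0.0, 0.0]
# Hypothetical facet with one dominant loading.
simple = [0.7, 0.1, 0.0, 0.0, 0.0]

print(hofmann_complexity(interstitial))       # two factors needed
print(round(hofmann_complexity(simple), 2))   # close to one factor
print(uniqueness([0.5, 0.5], [[1.0, 0.3], [0.3, 1.0]]))
```

A facet with two equal loadings has a complexity of 2, whereas a facet with one dominant loading has a complexity close to 1, which matches the interpretation of simple structure given above.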

Discussion
BESSI-G (v0.2) performed well in terms of unidimensionality, reliability, and factorial validity on both the facet and domain level. All facets were now clearly unidimensional, had high to very high internal consistencies comparable with the English-language source version (average ω = .90), and had moderate-to-high test-retest correlations across a period of approximately 8 months. With an average true-score correlation of ρ tt = .79, the 8-month stabilities (correcting for measurement error) were substantial and in line with the hypothesis that SEB skills are relatively stable over time (though malleable in principle).
The model fits of single-factor CFAs per facet were mostly acceptable, although some facets still did not meet conventional cutoff criteria of model fit (Hu and Bentler 1999). We do not deem the remaining misfit very problematic for three reasons. First, the conventional fixed cutoffs should not be overgeneralized, and there is no need to fully reject a model if it fails to meet the conventional cutoffs (e.g., Groskurth et al. 2021). Second, the sources of misfit seemed to be mostly trivial, arising from shared wording effects. After introducing a single residual correlation per facet to account for wording effects, model fit improved (Table A4). This is a strategy that we do not generally recommend but that researchers may choose to pursue if further optimizing model fit is the goal. Third and most important, it should be kept in mind that Soto et al. (2022) designed BESSI with a joint 32-facet model based on item parcels in mind and did not optimize the fit of individual facets. It is important to note that a moderate amount of model misfit is unlikely to introduce major bias in coefficients of interest (e.g., means, correlations) when using item parcels as input in CFA models, especially since the unidimensionality assumption held for all 32 facets. The same applies when using the observed unit-weighted scale scores for the BESSI facets, which will be the default way in which most users will work with BESSI(-G).
We also replicated the facet-level structure and domain-level structure of BESSI proposed by Soto et al. (2022). A joint CFA facet model for the 32 facets showed good fit to the data. Additional analyses showed that the joint CFA model even fit when applying the restriction that all parcels load equally on their target facet (i.e., an essentially τ-equivalent model). Correlations in the joint CFA model revealed that the 32 facets formed a positive manifold in which most facets were positively correlated, with latent (i.e., true score) correlations ranging from small (r < .10) to substantial (r ≈ .80) with an average slightly below .50. At the same time, the correlations indicated that all facets were sufficiently distinct from all others, offering unique information about a person's SEB skills.
Moreover, an EFA closely replicated the domain-level structure of the English-language source version of BESSI. All BESSI-G core facets that Soto et al. (2022) could clearly assign to one of the five domains fell under the same domain in our sample. The only differences from the source version we observed were in the finer details, namely, the loadings of the interstitial and compound facets. By and large, however, our findings lend further support to the organization of the 32 BESSI(-G) facets in five global SEB skill domains that resemble the Big Five in the realm of personality traits in both English and German.
The descriptive statistics suggested that respondents ascribed to themselves rather high levels on many SEB skills (mostly between "pretty well" and "very well"). Social desirability may be one of the factors behind the relatively high means, an explanation that future studies could test by contrasting self-reports with informant-reports.

Study 3: Convergent and Discriminant Validity
Having established the final translation of the BESSI-G items in Study 2, the aim of Study 3 was to locate BESSI-G's facets and domains in a nomological network with the two arguably most important and historically dominant individual difference constructs: personality traits and intelligence. We tested correlations (for some of which we preregistered hypotheses in the project's OSF archive) between the BESSI-G domains and facets, personality domains and facets, as well as fluid and crystallized intelligence.
In  2019)). We therefore sought to replicate the convergent validity of the BESSI domains in relation to the Big Five. At the same time, BESSI intends to measure skill (functional capacities), not traits, leading us to expect that BESSI-G would exhibit discriminant validity against the Big Five. In this regard, we generally expected to find the same patterns of associations that Soto et al. (2022, Study 4) reported for the English-language source version of BESSI. These authors found convergent correlations between the BESSI domains and corresponding Big Five domains ranging from r = .67 for Cooperation and Agreeableness to r = .79 for Social Engagement and Extraversion. The discriminant correlations of the BESSI domains with the non-corresponding Big Five domains ranged from .09 to .42 in Soto et al. (2022, Study 4).
Moreover, expanding previous evidence on the validity of BESSI, in Study 3 we present first evidence on the associations of the 32 facets with both fluid (g f ) and crystallized intelligence (g c ). Evidence based on Big Five inventories (e.g., the BFI-2 in Rammstedt  , which contains a Big Five-based, faceted measure of SEB skills, closely echo these findings. These authors found that SEB domains and facets had only small to moderate correlations with a short measure of mostly fluid cognitive abilities, the highest associations being those of tolerance (r = .17) and curiosity (r = .16), two facets from the Open-Mindedness domain (corresponding to BESSI's Innovation Skills domain). Based on this prior work and for conceptual reasons (i.e., that SEB skills are designed to measure abilities other than intelligence), we expected that SEB skills are largely independent of both fluid and crystallized intelligence, with two important exceptions: we expected the facets from the Innovation skills domain (especially Abstract Thinking Skill and Information Processing Skill, and to a lesser extent Intercultural Competence, Creative Skill, Artistic Skill) to correlate positively with fluid and crystallized intelligence.
Method

Sample
Study 3 used data from the subsample of 767 respondents who participated in the follow-up waves of our data collection in which we assessed intelligence (T1) and piloted BESSI-G v0.2 (T2).

Measures
BESSI-G. We used v0.2 of BESSI-G as evaluated in Study 2, measured at T2.
Big Five. We measured the Big Five personality traits with the short Big Five Inventory-2 (BFI-2-S; Soto and John 2017) in its German adaptation. BFI-2-S measures each Big Five domain with 6 items (i.e., 30 items in total). We administered 15 of the items at T0 and the remaining 15 items at T2. Respondents rated each item on a five-point scale ranging from 1 (disagree strongly) to 5 (agree strongly). Note that the original BESSI paper by Soto et al. (2022) used the full 60-item BFI-2. The BFI-2-S has only 2 items instead of 4 per facet. Therefore, we expected associations on the facet level to be slightly lower, whereas the differences at the domain level should be negligible.
Fluid intelligence. We assessed fluid intelligence (g f ) with 12 items from the International Cognitive Assessment Resource (ICAR; Condon and Revelle 2014), a short measure of intelligence that shows convergent validity with longer standard intelligence measures (e.g., Young and Keith 2020). These items covered 3 subsets: Verbal Reasoning (VR), Letter and Number Series (LN), and Matrix Reasoning (MR). The three sets of four items each were presented in three separate blocks with time limits of 2, 3, and 3 min, respectively. Within a block, participants could work on the tasks at their own speed, and they could skip blocks via a progress button. They were required to indicate 1 out of 8 options (1 correct option, 6 distractors, plus "None of these", or "I don't know") for each item. When the time limit was reached, the assessment automatically jumped to the next block. Answers were recoded to 0 for wrong, do not know, or non-given answers and to 1 for the correct solution. The final sum score ranges from 0 to 12. We used data from 607 respondents who took the assessment before 6 February 2021, when we changed the time limit for the assessment as part of a survey experiment for another study unrelated to the present BESSI-G validation. Reliability of the 12-item sum score of ICAR in our sample was α = .73.
Crystallized intelligence. To assess g c , we used the short version of the Berliner Test zur Erfassung fluider und kristalliner Intelligenz (BEFKI GC-K; Schipolowski et al. 2014). BEFKI contains 12 items that cover basic knowledge from humanities, natural, and social sciences. For each item, participants are asked to mark 1 out of 4 possible answers. Following the test's manual, we limited the assessment time to 5 min. We recoded respondents' answers to 0 for wrong or missing and to 1 for the correct answer, such that the (number-right) test scores range from 0 to 12. Schipolowski et al. (2014) reported good factorial validity, small to medium correlations with socio-economic and personality measures (e.g., Big Five Openness, r = .21), and good reliabilities (Cronbach's α = .81, or for the manifest sum score, Raykov's ρ = .70). As with g f , we used data from 607 respondents. Reliability of the 12-item sum score in our sample was α = .68.
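The scoring rule used for both ability tests (1 for the correct option, 0 for wrong, "don't know", or missing answers, summed into a number-right score) and the reliability coefficient reported for the sum scores can be sketched in Python as follows. The answer key, the response codes, and the four example respondents are invented for illustration. Cronbach's α is computed with the standard formula based on item and total-score variances.

```python
from statistics import variance


def score_item(response, correct_option):
    """Return 1 for the correct option; 0 for wrong, "don't know", or missing (None)."""
    return 1 if response == correct_option else 0


def score_test(responses, answer_key):
    """Score all items of one respondent as 0/1."""
    return [score_item(r, c) for r, c in zip(responses, answer_key)]


def cronbach_alpha(scored_rows):
    """Cronbach's alpha from a respondents-by-items matrix of item scores."""
    k = len(scored_rows[0])
    item_variances = [variance([row[i] for row in scored_rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in scored_rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical three-item test (the actual tests have 12 items each).
answer_key = ["B", "D", "A"]
raw_responses = [
    ["B", "D", "A"],
    ["B", "D", "dont_know"],
    ["B", None, "C"],        # None marks an item that was not answered
    ["A", "C", None],
]
scored = [score_test(row, answer_key) for row in raw_responses]
sum_scores = [sum(row) for row in scored]
alpha = cronbach_alpha(scored)
print(sum_scores, round(alpha, 2))
```

Treating unanswered items as incorrect, as done here and in the text, means that the sum score reflects both accuracy and the number of items attempted within the time limit.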

Analyses
Following the original BESSI paper (Soto et al. 2022), we computed Pearson correlations between observed scores to evaluate the nomological network of BESSI-G. For the facet-level correlations, we used the PIM method (Rose et al. 2019), which enabled us to compute observed scores using FIML to account for any missing data arising from our planned missingness design. We estimated separate models for each bivariate correlation between a BESSI-G facet and a Big Five facet, g f , or g c . For the domain-level correlations, PIM models did not converge in many cases, such that we decided to use the prorated mean (i.e., the mean across all available items per respondent) for all models involving the domains. Because the missing data were fully random, using the prorated mean would not introduce significant bias.
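The prorated mean used for the domain scores can be written down in a few lines. The Python sketch below is a minimal illustration in which missing items are coded as `None`; the function name and the example responses are hypothetical.

```python
def prorated_mean(item_scores):
    """Mean across all available (non-missing) items of one respondent."""
    available = [score for score in item_scores if score is not None]
    if not available:
        return None  # no item answered: the score is undefined
    return sum(available) / len(available)


# Hypothetical respondent who received only four of six items under the
# planned missingness design; items not administered are coded as None.
responses = [4, None, 5, None, 3, 4]
print(prorated_mean(responses))
```

This sketch returns `None` when a respondent has no valid items at all, so that such cases can be excluded from the correlation analyses explicitly.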
To gauge the similarity of the correlations between BESSI and personality traits in our sample with those reported by Soto et al. (2022), we computed two statistics: (1) the pattern correlations (i.e., the Pearson correlations of the Fisher-z-transformed correlations for each column vector in the respective correlation table) and (2) the average absolute difference of the Fisher-z-transformed correlations per column vector. We report these statistics in each correlation table involving the BESSI facets and domains and the Big Five facets and domains.
Table 5 shows the correlations between the 32 BESSI-G facets and the Big Five personality traits. The strongest correlations per facet are highlighted in bold. The BESSI-G facets were moderately linked with the Big Five. Few correlations (to be specific, only 15 out of 160) exceeded r = .50. Similar to the English-language source version of BESSI, with very few exceptions each BESSI-G facet had at least one (and mostly only one) correlation with a Big Five domain that exceeded r = .30. The majority of BESSI-G's facets had their strongest correlation with the Big Five domain that corresponds to these facets' BESSI domains.
Table 6 shows associations between the BESSI-G facets and the 15 personality facets of BFI-2-S. Several insights can be gleaned from the table. First, some of the BESSI-G facets had substantial associations on the observed-score level with a personality facet that supported their convergent validity. For example, we observed strong convergent associations (r ≥ .60) between Leadership Skills and Assertiveness, Organizational Skill and Organization, Creative Skill and Creative Imagination, as well as Stress Regulation and Anxiety. Second, however, most correlations were not as strong, with only 22 out of the 480 correlations in Table 6 exceeding a value of .50. Apart from typically one or two personality facets that often came from the corresponding Big Five domain, the BESSI-G facets had mostly small associations with personality facets, supporting their discriminant validity. With a few exceptions, the pattern of correlations was similar to that reported by Soto et al. (2022).
Note. Strongest correlation of each BESSI-G facet in bold. The Big Five domains and facets are abbreviated as follows: extraversion: sociability, assertiveness, and energy level; agreeableness: compassion, respectfulness, and trust; conscientiousness: organization, productiveness, and responsibility; emotional stability: anxiety, depression, and volatility; open-mindedness: aesthetic sensitivity, intellectual curiosity, and creative imagination.
Table 7 additionally shows the correlations at the domain level. As expected, each BESSI-G domain had its highest correlation with the Big Five domain to which it corresponds theoretically according to Soto et al. (2022). The correlations with the other four Big Five domains, shown in the off-diagonals of the table, were consistently weaker, such that there was a single substantial correlation per BESSI-G domain. With a maximum value of r = .68 and an average of r = .58, the convergent correlations in the diagonal of the table were slightly weaker than those reported by Soto et al. (2022), yet the general pattern was highly similar. Thus, as hypothesized, the BESSI-G domains showed convergent validity with the Big Five as well as discriminant validity.
Table 8 shows the correlations of the 32 BESSI-G facets with fluid (g f ) and crystallized (g c ) intelligence. Most correlations were near zero. None exceeded an absolute value of |r| = .20. Thus, in line with our hypotheses, most SEB skills measured by BESSI-G were largely independent of intelligence. The highest correlation of any BESSI facet with g f was that of Information Processing Skill at r = .20.
Facets from the Innovation Skills domain-Information Processing Skill, Abstract Thinking Skill, and Cultural Competencewere among those that had the strongest associations with g f . Other facets with small positive correlations to g f were those from the Self-Management domain: Responsibility Management, Detail Management, and Capacity for Independence, which was a compound facet in the original BESSI (Soto et al. 2022) but fell under the Self-Management domain in Study 2. In turn, facets that loaded on the Self-Management domain-Responsibility Management, Decision-Making Skill, Detail Management, as well as Ethical Competence and Capacity for Independence-were among those that had the highest correlations with g c . The facet with the strongest association to g c was Responsibility Management at r = .16. Information processing skill also had a non-negligible positive association with g c .  Table 9 shows the correlations for the BESSI-G domains. All correlations were close to zero. The largest correlation was that between Innovation Skills and g f at r = .08. Table 9. Correlations of the BESSI-G domains with cognitive ability (Study 3).

BESSI-G Domain | Fluid Intelligence (g_f) | Crystallized Intelligence (g_c)
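For a sense of scale, squaring a correlation gives the proportion of variance that two measures share. Applied to the largest values reported above (a simple illustration, not an analysis from the paper):

```latex
\begin{align*}
r = .20 &\;\Rightarrow\; r^2 = .040 \quad (\text{Information Processing Skill with } g_f)\\
r = .16 &\;\Rightarrow\; r^2 \approx .026 \quad (\text{Responsibility Management with } g_c)\\
r = .08 &\;\Rightarrow\; r^2 \approx .006 \quad (\text{Innovation Skills domain with } g_f)
\end{align*}
```

Even the strongest facet-level link thus corresponds to about 4% shared variance at the observed-score level.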

Discussion
Study 3 demonstrated that BESSI-G's 32 facets are associated in expected and theoretically plausible ways with personality traits (i.e., Big Five domains and facets as measured with BFI-2-S). The pattern of associations was highly similar to the one Soto et al. (2022, Study 3) observed for the English-language source version of BESSI, although the correlations tended to be weaker than in the original study, likely as a result of the lower reliability of BFI-2-S compared to the full BFI-2. This means that BESSI-G closely resembles BESSI not only in how the skill facets relate to each other and to their higher-level domains (see Study 2) but also in how the skill facets relate to personality traits. Notably, although some associations were substantial even at the observed-score level that we investigated here (i.e., not controlling for measurement error and attenuation), few associations were strong enough to suggest complete overlap between personality traits and the SEB skills measured by BESSI-G. Although the Big Five and SEB skills share the same referents, the skill-focused framing and response format of BESSI (asking about a person's skill levels instead of the typical behavioral tendencies that characterize them) achieved sufficient discriminant validity from personality traits. Moreover, Study 3 was the first to investigate how the SEB skills measured by the BESSI assessment framework relate to intelligence. As expected, most of the 32 BESSI-G facets were largely unrelated to both fluid and crystallized intelligence, at least in the low-stakes situation investigated here. This is similar to what prior research (e.g., Rammstedt et al. 2018; Lechner et al. 2017; Guo et al. 2022) has reported for Big Five domain and facet measures.
Also akin to this earlier research, the facets with the strongest relation to intelligence came from the Innovation Skills domain (corresponding to Open-Mindedness in the BFI-2), notably Cultural Competence and Information Processing Skill, which had the strongest associations with g_f, and, for g_c, from the Self-Management Skills domain (corresponding to Conscientiousness), notably Responsibility Management and Detail Management. These findings suggest that only a few BESSI-G facets have links to g_f and g_c, and that these links are theoretically plausible. Correlations were even smaller at the domain level.
These findings give further credence to the view that SEB skills are functional capacities, many of which can be cultivated largely independently of cognitive ability (e.g., Lechner et al. 2017; Rammstedt et al. 2020). A possible alternative explanation that we could not rule out at this stage, however, is that people do not have fully accurate perceptions of their SEB skills, which in turn may limit any associations between their self-reported SEB skills and their more objectively measured fluid and crystallized intelligence. In the realm of cognitive ability, meta-analytic correlations between self-perceived and measured ability are often in the vicinity of r ≈ .30, although with large variation across types of abilities (Freund and Kasten 2012; Zell and Krizan 2014); it is unclear whether the same might apply to SEB skills, mainly because objective, maximum-performance measures of SEB skills are in short supply. A meta-analysis by McDaniel et al. (2007) analyzed associations between general intelligence (g) and situational judgement tests (SJTs) used for personnel selection, including some SJTs that incidentally measure skills somewhat similar to some of the SEB skills measured by BESSI. It found somewhat stronger associations between SJTs and g than we did with BESSI, especially when the SJTs used knowledge instructions (r = .32) as opposed to behavioral-tendency instructions (r = .17). Thus, future research assessing associations with intelligence using additional measures of BESSI's facets (such as SJTs and informant ratings) will be able to provide further insights into how cognitive and SEB skills are related. These studies should also investigate these associations in high-stakes settings.

General Discussion
In this paper, we presented BESSI-G, a German-language adaptation of the recently introduced BESSI (Soto et al. 2022; see also Napolitano et al. 2021). BESSI-G is the first foreign-language adaptation of BESSI. We expanded the results presented by Soto et al. (2022) on the English-language source version by (1) assessing the psychometric properties of the 32 individual facets (in addition to their joint facet-level and domain-level structure), (2) providing first insights into the temporal stabilities of the SEB skills measured by BESSI (in addition to internal reliabilities), and (3) investigating these facets' associations with intelligence (in addition to personality traits).
Results from our three studies demonstrate that BESSI-G has good psychometric properties that are in many ways comparable to those of the English-language source version. BESSI-G's facets are all unidimensional, have good reliabilities that are high enough even for practical applications, and exhibit mostly acceptable CFA model fit, especially after allowing one item pair per facet to have a residual correlation. The facets also fit well when modeled jointly in a 32-facet CFA with parcels as input; they even conformed to an essentially τ-equivalent model with equal factor loadings for all parcels. Moreover, the facets cluster in the five domains as expected when modeled with an EFA. The organization of the 32 facets in 5 higher-order domains resembling the Big Five was highly similar to the English-language source version and supported the BESSI framework (Napolitano et al. 2021; Soto et al. 2022). The same applied to the patterns of associations with personality traits, which closely resembled those of the source version.
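The essentially τ-equivalent model mentioned above matters for reliability estimation: when loadings are equal and residuals are uncorrelated, coefficient alpha equals the reliability of the unweighted sum score. The following minimal sketch computes alpha from raw scores. The function name and the toy data are hypothetical and purely illustrative; this is not the authors' analysis code, which was written in R.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item or parcel scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (or parcel) across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the unweighted sum score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five people to three parcels (illustrative only)
toy_scores = [
    [1, 2, 2],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 4, 5],
]
alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.3f}")
```

If residual correlations are present, as allowed for one item pair per facet in the models above, alpha can be biased, so model-based reliability coefficients may be preferable in that case.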
Our findings contribute to the wider debate about how to best conceptualize SEB skills. A large number of frameworks have been proposed (for reviews, see Abrahams et al. 2019; Schoon 2021; Soto et al. 2022). Whereas many of these frameworks, including BESSI, build on the familiar Big Five framework and hence share many similarities and a common language, others take a more theory-driven approach that does not directly map onto the Big Five. Examples include the Values-in-Action (VIA) framework of character strengths and its attendant inventory VIA-IS (Peterson et al. 2005; Ruch et al. 2010), which are based on a cross-cultural analysis of valued traits, and the DOMASEC taxonomy recently proposed by Schoon (2021), which aims to link the Big Five to self-determination theory and other theoretical considerations. Compared to virtually all previous taxonomies and inventories, BESSI has the advantage of allowing for a more fine-grained and comprehensive assessment of SEB skills (e.g., the VIA framework comprises only three global factor-analytic domains; see Partsch et al. 2021) and of unequivocally assessing SEB skills as skills instead of traits or preferences. Despite being recent, BESSI rests on a solid psychometric footing and allows for a comprehensive assessment of SEB skills within a relatively short assessment duration. An analysis of time stamps in the Study 2 subsample that answered all 192 BESSI-G items showed that BESSI-G typically took respondents between 10 and 20 min to complete, with an average of about 15 min. This is highly similar to what Soto et al. (2022) reported for the English-language source version and implies that even the relatively long 192-item version, perhaps owing to the simple item wording, might not come with an overly high respondent burden. Thus, we believe that BESSI will be a good choice for researchers seeking to investigate SEB skills.
That said, future work comparing BESSI(-G) to other SEB skill inventories, especially those not based on the Big Five framework such as VIA, will be helpful in further mapping out the SEB skill space.
Our findings also add to research on the nature of SEB skills and their malleability. The SEB skills measured by BESSI proved to be systematically related to personality traits (as one would expect) but not interchangeable with them. They also proved to be largely independent of intelligence. Reminiscent of similar findings on personality traits and intelligence (e.g., Lechner et al. 2017; Rammstedt et al. 2018), this suggests that the SEB skills measured by BESSI(-G) are functional capacities that people can develop independently of their highly heritable intelligence. Additionally, our repeated-measures design allowed us to provide first evidence on the test-retest stabilities of the BESSI facets. The facets' observed scores are moderately stable over a 1.5-month period (average r_tt = .75) and somewhat less stable over an 8-month period (average r_tt = .66), yet the temporal stabilities of the true scores across 8 months were substantial throughout (average ρ_tt = .79). These temporal stabilities are consistent with the view espoused by many SEB skill researchers (e.g., Abrahams Grosz et al. 2022). BESSI provides a novel instrument that may prove fruitful for future inquiries into the development of SEB skills over the entire life span.
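The gap between observed and true-score stabilities can be illustrated with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is an assumption-laden illustration: it is not necessarily how the true-score stabilities above were estimated, it assumes equal reliability at both measurement occasions, and it applies the formula to the reported averages rather than to individual facets. The function names are hypothetical.

```python
import math


def disattenuate(r_observed, rel_x, rel_y):
    """Classical correction for attenuation: r_true = r_obs / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


def implied_reliability(r_observed, rho_true):
    """Reliability implied by an observed and a true-score correlation,
    assuming the same reliability at both occasions (an assumption of this sketch)."""
    return r_observed / rho_true


# Average 8-month stabilities reported in the text
observed_8m = 0.66
true_8m = 0.79

rel = implied_reliability(observed_8m, true_8m)
print(f"implied reliability = {rel:.3f}")  # about 0.835
print(f"disattenuated stability = {disattenuate(observed_8m, rel, rel):.2f}")
```

Under these assumptions, an average reliability of roughly .84 would account for the difference between the observed and true-score stabilities over 8 months.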

Strengths, Limitations and Directions for Future Research
Our present research has several strengths. It offers an in-depth psychometric analysis in multiple large samples of a comprehensive SEB skill inventory translated through the state-of-the-art TRAPD approach, including extensive analyses of BESSI's nomological network with personality traits and intelligence as well as first evidence on the test-retest stability of BESSI's facets. At the same time, our research has limitations that future research should address.
First, we focused on the self-report version of BESSI-G. For both research and applied purposes, validating an observer-report version would be a natural next step. In the English-language source version, the observer-report version of BESSI showed similarly good psychometric properties to the self-report version. Future research should investigate whether the same applies to our German-language adaptation, as we expect it will. Going forward, it might also be possible to complement the self- and observer-report forms of BESSI with situational judgement tests (SJTs) or multiple forced-choice formats, which might further increase objectivity and reduce social desirability.
Second, we relied on online samples of respondents who answered BESSI-G in the absence of any external pressures (i.e., in low-stakes settings). Future research should investigate whether the psychometric properties of BESSI-G are equally good, or perhaps better, in other survey modes (e.g., paper-pencil) and in high-stakes settings. The latter will be especially important if BESSI-G is to be applied to support placement or admission decisions. In this regard, social desirability and its impact on the validity of BESSI(-G) self-reports will be a crucial issue for future studies to address.
Third, although we investigated the nomological network of BESSI-G's facets with personality traits and intelligence, we did not yet investigate whether BESSI predicts important outcomes such as success in education, at work, and beyond. Accumulating evidence on its predictive validity will further support BESSI-G's utility in research and applied settings. Soto et al. (2022) already presented evidence that BESSI predicted a range of criteria, although almost all these criteria were self-reported. Therefore, future research should investigate whether BESSI (including both the self-report and observer-report version) predicts important life outcomes in prospective designs and using non-self-report outcome measures.
Fourth, our samples were confined to adolescents and adults aged 14 to 65 years. Although this is not itself a major limitation, BESSI was developed with an even broader age range in mind. To facilitate developmental research into the precursors, life-span dynamics, and outcomes of SEB skills, it will be important to assess whether BESSI(-G) is equally applicable to children. Given that BESSI(-G) uses short, simple statements, there is reason to be optimistic that the inventory will work in children below the age of 10. However, so far this is only a hope not backed up by evidence. We encourage future studies to test BESSI's applicability to children.

Conclusions
BESSI-G is a German-language adaptation of BESSI. It measures 32 SEB skill facets reliably, validly, and efficiently with an average assessment duration of 15 min. These facets cluster in five domains in ways that are theoretically expected and highly similar to the English-language source version. Given its good psychometric properties established in this paper, at this stage, we can recommend BESSI-G (and its English-language source version) for applications in educational, clinical, developmental, or organizational research. We are hopeful that BESSI-G will enable future research into the assessment and conceptual status of SEB skills, their predictive power for life outcomes, as well as their life-span development (including targeted interventions). BESSI-G is freely available to researchers. Provided that future studies resolve some open questions and limitations that we discussed above, BESSI-G may also become a viable tool for applied contexts, such as SEB skill training and admission or placement decisions.

Institutional Review Board Statement: All data used in this paper were collected in line with the Declaration of Helsinki and the European Union's General Data Protection Regulation (GDPR). Respondents had given consent to joining a commercial panel and gave separate informed consent to our surveys; their responses were fully anonymized. The survey did not contain any sensitive material, such that no additional review was required under German law.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All study materials, including the BESSI-G items and the documentation of the translation process, can be found in the project's OSF archive at https://osf.io/9pvmj/?view_only=16e79cfced2743aab00d937215a8fe17. The complete analysis code is available in the first author's GitLab archive at https://git.gesis.org/lechnecs/bessi-g. We used R (Version 4.1.2; R Core Team 2021) for all our analyses. The final translations are also shown in Appendix B.

Acknowledgments: The authors would like to thank Isabelle Schmidt and David Grüning (GESIS) for assistance in the data collection.

Conflicts of Interest: The authors declare no conflict of interest.

Note. All models contained one residual correlation.

Response Format
The translated response scale read as follows. Respondents were shown only the verbal labels.
1 = überhaupt nicht gut
2 = nicht so gut
3 = recht gut
4 = sehr gut
5 = extrem gut

Items
Items read as shown in Table A6.