1. Introduction
Dark traits, stable personality dispositions associated with ethically and socially aversive behaviors, have recently received increasing attention in personality psychology [1]. The most prominent theory of dark traits is the “Dark Triad” proposed by Paulhus and Williams [2]. The Dark Triad has three components: Machiavellianism, Narcissism, and Psychopathy. In recent years, other personality traits with dark aspects, that is, traits associated with negative outcomes, such as spitefulness and greediness, have been proposed for inclusion in the framework of dark traits [3]. Conceptual and empirical overlaps have also been found among these different aversive traits [1]. Both the “dark aspects” and the overlaps reported in previous studies indicate commonalities among the aversive traits. To describe and explain these underlying commonalities, Moshagen and his colleagues proposed the theory of the Dark Factor of Personality (D) and regarded D as a general disposition underlying all aversive traits rather than a specific dark trait [1].
Dark traits are often regarded as representing the dark side of personality and are connected with negative outcomes, and these connections are cross-culturally stable. Many studies from Western countries have found that scores on the Dark Triad characteristics are positively associated with aggression, bullying, and cheating [4,5,6]. Results from various studies conducted in China also indicated that aggression is a characteristic outcome of the Dark Triad [7]. One study of Chinese university students found that Dark Triad constructs positively predicted scholastic cheating [8]. However, despite these negative influences, some characteristic behaviors of dark traits are also seen as adaptive survival strategies that help individuals adapt to the environment [9]. Some dark traits have also been found to be beneficial in competitive environments, and high scores on some dark traits are positively correlated with success [10]. For example, individuals with a high level of Machiavellianism are good at handling various tasks flexibly and leading others in the workplace [11]. A study conducted in Germany also showed that Machiavellianism was positively associated with leadership position and career satisfaction [12]. Therefore, studying dark traits and their relationships with psychological or behavioral outcomes will not only help to understand and intervene in socially aversive behaviors but may also be useful in everyday settings, such as companies’ recruitment and selection for positions requiring leadership. Such research requires a precise and valid tool for measuring the traits.
Among the available measures, the Dark Factor of Personality Scale is a relatively new scale based on the Dark Factor of Personality theory. Three versions of the scale were developed to measure D, comprising 70, 35, and 16 items, respectively (thus denoted D70, D35, and D16) [13]. All three versions had good reliability and validity, and they were verified to perform well in some Western countries (e.g., Germany) [13,14]. Although the scales have been used in several studies in Western contexts, no studies on D have been conducted in Chinese samples. Given that there are often cultural differences between Western and Eastern countries, the definition of dark traits may vary in Eastern countries [11]. Thus, it is necessary to verify whether the scales are applicable to samples in China. To address this gap, we created and evaluated a Chinese version of the D scale.
Additionally, previous studies of the psychometric properties of the D scale are all based on classical test theory (CTT). However, CTT has some shortcomings, such as sample dependence, which may negatively influence the estimation results. By contrast, item response theory (IRT) provides a more comprehensive description of the psychometric properties of scales at both the item and scale level than traditional CTT-based methods. IRT methods also provide item and test information, which helps to identify the items that contribute most to the whole scale. This is particularly useful because researchers can remove ineffective items to shorten a scale and improve its efficiency.
Thus, the purposes of this study are (1) to translate the Dark Factor of Personality Scale into Chinese and use IRT to evaluate the psychometric properties of the Chinese version of the scale; and (2) to develop a short version of the Chinese D scale to facilitate quick and efficient assessment. The present study examines whether the scale is appropriate in China and provides additional psychometric information about the scale that cannot be obtained using classical psychometric methods.
4. Results
The initial Chinese version of the D scale consisted of 32 items. The mean score of each item is shown in Table 2. The total score on the initial scale for the whole sample ranged from 38 to 147, and the overall mean total score was 79.78 (SD = 17.33). The difference in mean total score between males (M = 81.06, SD = 17.73) and females (M = 79.15, SD = 17.11) was not significant (t(760) = 1.43, p = 0.15, Cohen’s d = 0.11).
4.1. Unidimensionality and Local Independence
The KMO statistic was 0.93 and Bartlett’s test was statistically significant (p < 0.001), indicating that the data were suitable for EFA. The EFA results showed that the first factor explained 24% of the total variance, and the eigenvalues of the first and second factors were 7.72 and 1.51, respectively, giving a ratio larger than 3. Therefore, the 32-item scale could be regarded as unidimensional. It is worth noting that two items (item 2 and item 16) had low loadings (less than 0.3) in the single-factor model, indicating that these items could be considered for removal in the later analysis. Moreover, we calculated the residual correlations of the items, and the absolute values of the correlations among residuals were all smaller than 0.3, indicating that the local independence assumption was met.
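The unidimensionality check above can be reproduced from the two reported eigenvalues. The following is a minimal sketch in Python, not the authors' analysis code; the function names are hypothetical, and the variance calculation assumes that the eigenvalues come from a correlation matrix in which each of the 32 items contributes unit variance.

```python
# Eigenvalues of the first and second factors, as reported in the text.
FIRST_EIGENVALUE = 7.72
SECOND_EIGENVALUE = 1.51
N_ITEMS = 32  # items in the initial Chinese version


def eigenvalue_ratio(first: float, second: float) -> float:
    """Ratio of the first to the second eigenvalue (criterion in the text: > 3)."""
    return first / second


def proportion_of_variance(eigenvalue: float, n_items: int) -> float:
    """Share of total variance for one factor.

    Assumption: the total variance equals the number of items, which holds
    when the eigenvalues are taken from a correlation matrix.
    """
    return eigenvalue / n_items


ratio = eigenvalue_ratio(FIRST_EIGENVALUE, SECOND_EIGENVALUE)
share = proportion_of_variance(FIRST_EIGENVALUE, N_ITEMS)
print(f"eigenvalue ratio = {ratio:.2f}, variance explained = {share:.0%}")
```

Under these assumptions the ratio is above 3 and the first factor accounts for about 24% of the variance, consistent with the values reported above.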
4.2. Model Selection
As shown in Table 3, the GRM had the smallest values among the relative model-fit indices (−2LL, AIC, BIC), indicating that the GRM was more suitable for the data from the initial Chinese version of the D scale than the GRSM and GPCM. Therefore, the GRM was used for further analysis. Additionally, the fit statistics of the IRT model were calculated, and the results showed that the fit for the GRM was acceptable (M2 = 1462.23, df = 368, p < 0.001; RMSEA = 0.06, 95% CI [0.059, 0.066], TLI = 0.79, and CFI = 0.80).
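The relative fit indices used for model selection follow standard definitions: AIC = −2LL + 2k and BIC = −2LL + k·ln(n), where k is the number of estimated parameters and n is the sample size. The sketch below illustrates the comparison in Python. All −2LL values and parameter counts are hypothetical placeholders, not the values in Table 3, and n = 762 is an assumption inferred from the reported t(760).

```python
import math


def aic(neg2ll: float, n_params: int) -> float:
    """Akaike information criterion from -2 log-likelihood."""
    return neg2ll + 2 * n_params


def bic(neg2ll: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion from -2 log-likelihood."""
    return neg2ll + n_params * math.log(n_obs)


# Hypothetical inputs for illustration only: (-2LL, number of parameters).
models = {
    "GRM": (58000.0, 160),
    "GRSM": (58900.0, 67),
    "GPCM": (58200.0, 160),
}
N_OBS = 762  # assumption: inferred from t(760), not stated in this section

bic_by_model = {
    name: bic(neg2ll, k, N_OBS) for name, (neg2ll, k) in models.items()
}
# The model with the smallest index is preferred.
best = min(bic_by_model, key=bic_by_model.get)
print(best)
```

Smaller values indicate better relative fit, so the model with the minimum index is selected.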
4.3. Item Properties and Selection
The results of parameter estimation are presented in Table 4. The discrimination parameters ranged from 0.35 to 1.71. Most of the item discrimination values were larger than 0.75, except for items 2, 9, 15, and 16. Thus, these four items were also candidates for removal. For the threshold parameters, all the items had ordered values with the first thresholds being the lowest, indicating that as the level of dark traits increased, the probability of responding to the items with higher scores increased. It is acceptable for some items to have threshold values slightly larger than 4 or smaller than −4, but it is noteworthy that the first threshold parameter of item 9 was smaller than −5 and item 16 had a threshold value near 6, indicating that these items could be removed because of their very low or very high level of difficulty.
Based on the analysis above, we found that four items (items 2, 9, 15, and 16) did not perform well on the item properties. They had poor discrimination parameter values (≤0.75) or threshold parameter values that were far outside the common range (≥4 or ≤−4). In addition, item 2 and item 16 had rather low loadings in the one-factor model. As a result, we removed items 2, 9, 15, and 16 and repeated the analysis procedures above for the updated 28-item scale (D28-C).
4.4. Item Fit and Parameter Estimation
The 28-item scale still satisfied the unidimensionality and local independence assumptions of IRT, and the psychometric properties of each item were good. Table 5 presents the results of parameter estimation with the GRM. The discrimination parameters ranged from 0.79 to 1.76, meaning that all items had moderate to high discrimination. Thus, the D28-C had a good capability of discriminating students with different levels of dark trait disposition. For the threshold parameters, all the items again had ordered values with the first thresholds being the lowest, indicating that the probability of choosing options with higher scores increases with the level of the D factor. Among all the items, item 21 had the smallest third and fourth thresholds, and item 34 had the highest values for the last two thresholds. This means that item 21 and item 34 required the lowest and highest level of dark traits, respectively, to endorse options with high scores. Additionally, it is noteworthy that only the first threshold values were negative, and most of the second and third thresholds and all the fourth thresholds were positive. This means that most of the items in the D28-C are “difficult”; namely, a higher level of D was needed to obtain a higher score.
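To make the roles of the discrimination and threshold parameters concrete, the following Python sketch computes category response probabilities under the graded response model in its usual logistic form, where the probability of responding in category k or higher is 1 / (1 + exp(−a(θ − b_k))). The parameter values are hypothetical and are not taken from Table 5, and the 1 to 5 category scoring is an assumption.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under the graded response model.

    theta: latent trait level; a: discrimination; thresholds: ordered b_k.
    """
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1.0 (lowest category or higher) and 0.0 (above the highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta: float, a: float, thresholds: list[float]) -> float:
    """Expected item score, assuming categories are scored 1, 2, ..., K."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical item: only the first threshold is negative, as for most D28-C items.
A_EXAMPLE = 1.2
B_EXAMPLE = [-1.5, 0.5, 1.8, 3.0]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, A_EXAMPLE, B_EXAMPLE), 2))
```

With ordered thresholds, the expected score rises with θ, and mostly positive thresholds mean that the higher response options become likely only at above-average levels of D.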
4.5. Differential Item Functioning
DIF by gender was tested using the likelihood ratio χ² test approach, and Table 6 presents the results for all the items. The χ² values for DIF by gender ranged from 0.00 to 10.13. After Bonferroni adjustment, none of the items showed significant DIF except for item 13. Item 13 had the largest χ² value (10.13) and showed significant DIF both before and after the Bonferroni adjustment. To confirm the magnitude of the detected DIF for item 13, the effect size was calculated. The result was 0.006, which is negligible (<0.13) according to the classification guideline of Zumbo [45]. In addition, because only 4% of the items in the D28-C were noninvariant, we determined that the D28-C was invariant as a whole [48].
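The Bonferroni decision for item 13 can be illustrated with a short Python sketch. Two inputs are assumptions that this section does not state: a nominal α of 0.05 and 1 degree of freedom for the likelihood ratio test. With 1 degree of freedom, the upper-tail p-value of a χ² statistic x equals erfc(√(x/2)), which allows the calculation with the standard library only.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


N_ITEMS = 28          # items in the D28-C
NOMINAL_ALPHA = 0.05  # assumption: not stated in this section
bonferroni_alpha = NOMINAL_ALPHA / N_ITEMS

# Largest reported statistic (item 13); df = 1 is an assumption.
p_item13 = chi2_sf_df1(10.13)
is_significant = p_item13 < bonferroni_alpha

# Effect size reported in the text, compared with the 0.13 cutoff cited there.
EFFECT_SIZE_ITEM13 = 0.006
is_negligible = EFFECT_SIZE_ITEM13 < 0.13

print(is_significant, is_negligible)
```

Under these assumptions, item 13 remains significant after adjustment while its effect size stays well below the negligible cutoff, matching the pattern described above.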
4.6. Item and Test Information Function
Figure 1 displays the item information functions for the D28-C items. Information provided by items 1, 5, 12, 14, 20, 22, 34, and 35 was relatively greater across the continuum of dark trait disposition. Item 24 provided the largest amount of information (near 1) across the range of −1.0 to 3.0. Most items provided the largest amount of information for students whose dark trait disposition was in the range of −1.0 to 2.0. In addition to the item information functions, test information was calculated by summing all the item information across the trait continuum ranging from −3.0 to 3.0. The total information of the test was 65.25, and Figure 2 displays the total information provided by the D28-C. The solid line represents the test information curve and the dashed line represents the standard error of measurement (SE) for the whole scale. When the dark trait disposition ranged from −0.73 to 2.72, the scale produced more test information (≥12.0) and a smaller SE (≤0.30), which means that the D28-C could provide a more precise measurement for students whose dark trait disposition fell in this range.
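The link between the two curves in Figure 2 is the standard IRT relation SE(θ) = 1 / √I(θ), where I(θ) is the test information at trait level θ. A minimal Python sketch of this conversion follows; the function name is hypothetical.

```python
import math


def standard_error(information: float) -> float:
    """Standard error of measurement implied by test information."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


# Information of 12.0 is the lower bound reported for the range -0.73 to 2.72.
se_at_12 = standard_error(12.0)
print(round(se_at_12, 3))
```

Test information of at least 12.0 therefore implies an SE below the reported bound of 0.30.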
As a result, 15 items (items 1, 4, 7, 8, 10, 11, 13, 17, 22, 23, 24, 26, 28, 29, and 32) were retained, 6 of which were negatively keyed. The IRT analysis results showed that the discrimination parameters ranged from 0.98 to 1.57 and that the short form fit the GRM well at the item level. The threshold parameters ranged from −3.59 to 5.13, with item 33 having the highest values for the last two thresholds, indicating that it was the most difficult item in the brief scale. Item information (across the trait level from −3 to 3) ranged from 1.89 to 3.46, while the test information was 40.71, indicating that the short form retained 62.80% of the test information of the final version of the scale. As shown in Figure 3, the two curves had a similar trend, which means that both scales provided more accurate measurement in some specific ranges of D than in others. The information curve of the brief scale was lower than that of the D28-C, indicating that the removal of items did lead to some loss of the total information provided by the whole scale.
5. Discussions
This study examined the structure and reliability of the D35 in the Chinese cultural context. The Chinese version of the scale was also found to be unidimensional, but some of the items were not applicable in the Chinese context and were therefore removed in the revision process. The final 28-item Dark Factor of Personality Scale provided a reliable instrument for measuring the D level in Chinese-language samples. In addition, a 15-item short version of the D28-C was developed based on the content and detailed information of the items.
After removing four items from the initial Chinese version of the scale, our Chinese version of the D scale (D28-C) was comparable to the original English version and had sound psychometric properties. EFA results showed that all items shared one latent structure, consistent with previous studies [15,20]. The discrimination parameters of the final scale ranged from 0.79 to 1.76, indicating that the items could discriminate different levels of D with adequate accuracy. For the four deleted items (item 2: “Payback needs to be quick and nasty.”; item 9: “Never tell anyone the real reason you did something unless it is useful to do so.”; item 15: “In principle, everyone is worth the same.”; and item 16: “I cannot imagine how being mean to others could ever be exciting. (R)”), the discrimination estimates were relatively low, indicating that it was difficult for these items to distinguish among students with different levels of the D factor. This discrepancy with the original scale could be explained from a cultural perspective. For items 2 and 16, collectivist culture encourages cooperation and tolerance, and Confucian culture emphasizes benevolence [11], so behaviors that may hurt others are strongly inhibited in Chinese society. Item 9 appears to emphasize strategy rather than dark traits such as Psychopathy. China has a long history of strategic culture, which profoundly influences many areas of individuals’ social life [49]. Being good at using strategies to achieve goals is not always viewed negatively in Chinese culture. Additionally, this item does not specify whether the behavior is detrimental to the interests of others. Thus, it may not be highly related to dark traits, which weakens its capability to distinguish people with different levels of D. For item 15, unlike Western culture, which emphasizes individual achievement, China’s collectivist culture places stronger emphasis on commonwealth and equality as shared pursuits [50]. Therefore, it is relatively difficult for this item to capture individual differences. The high values of its third and fourth threshold parameters provide some support for this explanation.
The DIF test showed that, after Bonferroni adjustment, no item had significant DIF by gender except for item 13 (“A person should use any and all means that are to his advantage, taking care of course, that others do not find out.”), which had a minor but significant DIF. This result is reasonable according to social role theory [51]. Society and culture hold different role expectations for different gender groups, which may influence the criteria to which individuals refer in their self-assessment. For instance, Machiavellianism is a dark trait marked by skilled use of strategies. This characteristic is more in line with males’ instrumental roles (associated with work, achievement, and domination) and less with females’ expressive roles (associated with emotional expression and interpersonal relationships). Thus, using strategies to achieve goals is less encouraged for females, and they may be more sensitive to it. As a result, men with high levels may consider themselves to be at an average level, whereas women with average levels may perceive themselves as being at a high level, resulting in DIF across gender. However, given that item 13 accounted for only 4% of the whole scale and the effect size was small, we still concluded that the whole D28-C was gender invariant.
For the test information, the total information of the D28-C was 64.83 and the marginal reliability was 0.92, indicating that it was a reliable scale. The information at the moderate and higher disposition levels (i.e., −1.35 < θ < 3) was over 10, which corresponds to a reliability larger than 0.9 and indicates sufficient measurement precision. Although reliability dropped quickly outside this range, the scale could still provide reasonable precision for most individuals. For instance, even at the low trait level of −3.0, the information was still 4.26, which corresponds to an acceptable reliability of 0.77 [52]. Overall, the scale items provided acceptable test information for measuring the D factor of students, especially for those located in the moderate and higher range of the continuum, where scale reliability was high.
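The conversions between information and reliability quoted above agree with the commonly used relation reliability = 1 − 1/I(θ), which assumes that the latent trait is scaled to unit variance. The source does not state the formula it used, so the Python sketch below is an illustration under that assumption.

```python
def reliability_from_information(information: float) -> float:
    """Conditional reliability implied by test information.

    Assumption: the latent trait has unit variance, so the error variance
    1 / information can be subtracted from 1 directly.
    """
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 - 1.0 / information


# Information values quoted in the text.
for info in (10.0, 4.26):
    print(info, round(reliability_from_information(info), 2))
```

Under this relation, information of 10 corresponds to a reliability of 0.9 and information of 4.26 to about 0.77, matching the figures in the text.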
Meanwhile, the test information yielded some findings that cannot be obtained with CTT methods. With traditional methods, we know only that Cronbach’s α was 0.91 for the final Chinese version of the scale, a single value that implies identical reliability at all D levels. The IRT analysis provided a more detailed picture: the scale had differential measurement precision across the trait continuum. For example, it was highly reliable for individuals with moderate and higher trait levels and provided lower, but acceptable, reliability at lower levels. Moreover, the item information function displayed how item information, a concept similar to reliability in CTT, varied with the level of the D factor. From this function, researchers can discern the range of D for which an item is most appropriate. Then, in situations where it is necessary to differentiate among individuals with specific D levels, adding “appropriate” items to the scale based on the item information functions is recommended to improve accuracy in the required range of the trait continuum.
Although the 28-item scale was reliable, we wanted to shorten the instrument further for more efficient measurement. After selection based on item information and content, we obtained a 15-item scale that retained 62.80% of the test information of the D28-C over the D range of (−3, 3). Nevertheless, the short form still provided test information of more than 5 across the trait range of (−1, 3), which means that its reliability remained acceptable (larger than 0.7) at most levels of D [53]. Given that studies in personality psychology often focus on relationships among different variables, questionnaires often consist of a number of scales. Thus, the shortened version of the D scale may be more efficient in this situation.
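The retention figure can be checked as the ratio of the two total information values. The short Python sketch below uses the short-form total of 40.71 and the D28-C total of 64.83 given in this discussion; the function name is hypothetical.

```python
def retained_information_percent(short_total: float, full_total: float) -> float:
    """Percentage of the full scale's total information kept by the short form."""
    return 100.0 * short_total / full_total


SHORT_FORM_TOTAL = 40.71  # 15-item scale, trait range -3 to 3
D28C_TOTAL = 64.83        # D28-C total information as given in the discussion

retained = retained_information_percent(SHORT_FORM_TOTAL, D28C_TOTAL)
print(f"{retained:.2f}%")
```

The ratio is approximately 62.80%, so roughly 54% of the items carry close to 63% of the information.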
6. Conclusions
The present study translated the D35 into Chinese and used IRT methods to analyze the Chinese version of the D scale. After removing four items that contributed little to the whole scale, we obtained a 28-item Chinese version of the D factor scale (D28-C). We conclude that the D28-C had acceptable psychometric properties and provided a precise measurement of D from low to very high levels, which means that it can be applied to most students. Although the whole scale can be regarded as invariant across gender, item 13 may require further attention or modification to reduce gender non-invariance in the context of Chinese culture. In addition, on the basis of item information and content, a brief form of the D scale was obtained for more efficient measurement.
Despite the promising findings, several limitations should be taken into account. First, although the sample size was acceptable, participation was restricted to college students and less than 30% were men. These facts might limit the generalizability of our findings. Future research should recruit gender-balanced samples with a wide age range, including adolescents and older people. Second, only gender was used to test DIF in the current study. To ensure the generalizability of the scale, other variables, such as age and occupation, could be considered in further studies. Previous studies found that individuals scored lower as age increased [10]. It was also found that some dark traits were positively related to success in competitive environments [53]. It is possible that individuals who work in highly competitive occupations, such as athletes, have a higher mean level of some dark traits and a more positive attitude towards these traits. However, these factors were not explored in the current study because of the sample limitation.