Article

Gender-Based Differential Item Function for the Positive and Negative Semantic Dimensions of the Relationship Satisfaction Scale with Item Response Theory

1 Department of Social and Behavioural Sciences, City University of Hong Kong, Hong Kong, China
2 Wee Kim Wee School of Communication & Information, Nanyang Technological University, Singapore 639798, Singapore
* Author to whom correspondence should be addressed.
Behav. Sci. 2023, 13(10), 825; https://doi.org/10.3390/bs13100825
Submission received: 14 August 2023 / Revised: 28 September 2023 / Accepted: 5 October 2023 / Published: 7 October 2023

Abstract
Relationship satisfaction is at the core of a robust social life and is essential to mental health. The positive and negative semantic dimensions of relationship satisfaction (PN-SMD) scale is regarded in relationship research as a reliable tool for assessing the quality of a person’s interpersonal relationships. This study evaluated the psychometric properties of the PN-SMD scale using multidimensional item response theory (MIRT) and differential item functioning (DIF) analyses, two emerging assessment methods that focus on individual items. We recruited 511 Chinese undergraduate students for this study. Construct validity, internal consistency, and concurrent validity were assessed, and MIRT and DIF analyses were conducted. Five of the 14 items showed gender-based DIF, affecting the scale’s construct validity. A revised nine-item scale (with the DIF items excluded) had a significantly better model fit and demonstrated concurrent validity comparable to that of the original scale. The implications of our results and future research directions are discussed.

1. Introduction

Relationship satisfaction refers to a person’s own assessment of their interpersonal relationships and is regarded as an essential variable in relationship research [1]. Satisfaction in interpersonal relationships is an important component of mental well-being. Vanhalst et al. [2] suggested that poor-quality relationships can lead to loneliness, which typically arises when one’s interpersonal relationships do not fulfil expectations [3]. The recent literature has also shown that the quality of social relationships is correlated with life satisfaction [4,5], positive affect [6,7], self-esteem [8,9], and self-efficacy [10]. As relationship satisfaction and relationship quality are very similar measures, a fact that is widely recognized in marriage-related research [11,12], we treat satisfaction and quality as synonymous in the context of relationships [13]. However, no broadly accepted scale measuring relationship quality has been developed thus far.
In the early stages of the development of relationship quality scales, scholars tended to define the scales as unidimensional models (i.e., from dissatisfied to satisfied), in which the total score was thought to reflect an individual’s evaluation of his or her social relationships. Examples of these scales are the marital adjustment test (MAT) [14], the dyadic adjustment scale (DAS) [15], and the couples satisfaction index (CSI) [12]. Although these models are widely used and have been shown to demonstrate good psychometric properties, their unidimensional structure has been repeatedly challenged [16,17]. Unidimensional measures of relationship quality are now believed to obscure various characteristics in interpersonal relationships [18,19], as most relationships have both satisfactory and unsatisfactory aspects [20].
Based on the argument above, Fincham and Linfield [7] proposed a two-dimensional, six-item scale for assessing relationship quality. This measure, the positive and negative quality in marriage scale (PANQIMS), demonstrated adequate psychometric properties and allowed for the measurement of ambivalence and indifference, concepts that cannot be captured by unidimensional models. Mattson, Rogge, Johnson, Davidson and Fincham [1] highlighted some of the limitations of the PANQIMS (e.g., failure to account for conflicting attitudes and inconsistency in the target of assessment) and proposed a 14-item scale called the positive and negative semantic dimensions of relationship satisfaction (PN-SMD) based on the semantic differential technique [21]. The PN-SMD comprises two seven-item subscales that measure the positive and negative aspects of a relationship separately, each consisting of seven adjectives of the corresponding valence. The instrument shows adequate incremental validity [1]. The PN-SMD has been used to examine deviant internet behavior among Chinese adolescents [22] and the mediation effect between problematic internet usage and self-esteem among Chinese undergraduates [23]. Although the PN-SMD showed good internal consistency in previous research (α = 0.92–0.93) [24], it showed deficiencies in individual item evaluation when item response theory was used [13]. Moreover, previous IRT assessments adopted a unidimensional model, although the PN-SMD is clearly a two-factor model, which raises concerns about their applicability. It is therefore important to revisit the PN-SMD using multidimensional item response theory (MIRT) [25].
Item response theory (IRT), also known as latent trait theory, is designed to evaluate each item in a scale individually, using a statistical model [26]. IRT has been applied several times in the development of relationship scales, for example, to the CSI scale by Funk and Rogge [12] and to the PN-RQ scale by Rogge, Fincham, Crasta and Maniaci [13]. Funk and Rogge [12] suggested analyzing the two subscales of two-dimensional measures such as the PN-SMD and PN-RQ separately. As such, MIRT, which extends IRT analysis to multiple dimensions [25], is a more appropriate model for assessing the two-factor PN-SMD.
Differential item functioning (DIF) analysis within the IRT framework is becoming more popular for evaluating a scale’s validity in education, psychology, and clinical settings [27,28]. DIF assesses whether different subgroups respond differently to each item [29]. A more recent and popular approach to detecting DIF is multiple indicators and multiple causes (MIMIC) modeling [30]. The MIMIC model is a specialized version of structural equation modeling (SEM) that incorporates causal variables, or covariates, into a confirmatory factor analysis model [31]; according to Cheng, Shao and Lathrop [30], DIF can be understood as a model mediated by groups. Few DIF analyses have been conducted on scales measuring relationship quality in the existing literature. The invariance of the PN-SMD has been tested by Zeng, Zhang, Fung, Li, Liu, Xiong, Jiang, Zhu, Chen and Luo [23], who examined the configural, metric, and scalar invariance of the whole scale. Still, an investigation of invariance at the item level is warranted, because the level of information contained in the individual PN-SMD items was found to be inadequate [13] and the model fits reported in previous studies were unsatisfactory [23], even though the internal consistency was good (α = 0.92–0.93) [24]. When a MIMIC model is employed, both the measurement model and the structural model can be used to evaluate the direct effect of a covariate that defines group membership (e.g., gender) on factor means and factor indicators (items) [31]. This method can help to identify any gender-based DIF in the PN-SMD.
Based on the above controversies regarding the PN-SMD, it is important to re-evaluate the psychometric properties of both the whole scale and its individual items. MIRT and DIF are seldom-used scale assessment tools that provide superior assessment of individual items. This work will generate new insights that cannot be obtained using the typical mean groups difference approach.

2. Methods

2.1. Participants

Our cross-sectional research was conducted between April and May 2019 as part of a larger study focused on examining the relationship between quality of life and internet usage at a university in Guangzhou, China. Five hundred and eleven undergraduate students (18–23 years old) were recruited (refer to Table 1) through the university’s intranet system, utilizing a smartphone-based self-report application. The sample represented the demographic profile of the university population. Prior to participating in the study, all participants provided informed consent.
The research procedures adhered to the relevant regulations outlined in the current versions of the Statistics Law of the People’s Republic of China and the Declaration of Helsinki.

2.2. Measures

The PN-SMD scale is a relationship satisfaction scale. Participants rate 14 items, comprising seven positive qualities (interesting, full, sturdy, enjoyable, good, friendly, and hopeful) and seven negative qualities (bad, lonely, discouraging, boring, empty, fragile, and miserable), on an 8-point Likert-type scale ranging from 0 (not at all) to 7 (always) [1].
The positive and negative affect schedule (PANAS) [32] is a 20-item self-report measure of positive affect (PA) and negative affect (NA) using a 5-point Likert-type scale. In this study, the overall reliability of the PANAS was 0.846, and the Cronbach’s α values for the PA and NA subscales (10 items each) were 0.802 and 0.884, respectively, in line with those reported in other PANAS studies conducted in the Chinese context [33,34,35].
The brief resilience scale (BRS) [36] is a self-report measure of the perceived ability to recover from stress, comprising six 5-point Likert-type items ranging from 1 (does not describe me at all) to 5 (describes me very well), e.g., ‘I tend to bounce back quickly after hard times’. The Chinese version of the BRS was translated and validated by Fung [37]. The scale demonstrated acceptable internal consistency in this study (α = 0.708).
To assess the participants’ satisfaction with their lives, we utilized the satisfaction with life scale (SWLS) [38]. SWLS items are rated on a 7-point Likert-type scale ranging from 1 (strongly disagree) to 7 (strongly agree). The total score, derived from summing all item scores, ranges from 5 to 35, with higher scores indicating greater life satisfaction. The Cronbach’s α in this study was 0.819, which is comparable to the values reported by Diener, Emmons, Larsen and Griffin [38] and Kong, et al. [39] (α = 0.81–0.89).
The Chinese version of the Rosenberg self-esteem scale (RSE), validated by Wu, et al. [40], was employed to measure participants’ self-esteem. This scale consists of 10 items rated on a 4-point Likert-type scale ranging from 1 (strongly disagree) to 4 (strongly agree), for example, ‘I wish I could have more respect for myself’. Song, et al. [41] reported a Cronbach’s α of 0.83 for this scale among Chinese university students. In the present study, the Cronbach’s α was 0.755.
The general self-efficacy scale (GSE) [42], translated by Zhang and Schwarzer [43], comprises 10 items, each rated on a 4-point Likert-type scale ranging from 1 (absolutely incorrect) to 4 (absolutely correct), e.g., ‘I am confident that I could deal efficiently with unexpected events’. The scale demonstrated good internal consistency in this study, with a Cronbach’s α of 0.884, which is consistent with recent research conducted by Zeng, et al. [44].

2.3. Data Analysis

IRT is a testing model based on the relationship between respondents’ scores on an item and their level on the latent trait measured by the scale [45,46]. According to Brennan [47], IRT focuses on item responses, whereas classical test theory (CTT) focuses on test or form scores; the two frameworks differ in their assumptions and features, such as their treatment of forms and parallelism, true scores, and their primary strengths. Although CTT is beneficial for evaluating brief instruments [48,49], the MIRT model was adopted in this study to calibrate item parameter estimates and assess how participants responded to item-level stimuli [25]. MIRT combines IRT and factor analysis, enabling the analysis of multi-factor models. The MIRT model estimates an item discrimination index (a parameter) for each factor and intercept parameters (d parameters). The difficulty (threshold) values (b parameters), which reflect the probability that a person will score above or below a given threshold, were computed from the relation a parameter × threshold value = intercept. The criterion a parameter > 1.00 was used [50,51], a practice widely adopted in recent IRT analyses [52,53,54]; values > 1.70 are considered very high [55].
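As a minimal sketch of the relation just described (a parameter × threshold = intercept), thresholds can be recovered by dividing each intercept by the item’s slope. The slope and intercept values below are invented for illustration and are not the study’s estimates.

```python
# Recover graded-response threshold (b) parameters from a slope (a parameter)
# and intercepts (d parameters), using the relation stated above: a * b = d.
# All numbers are hypothetical, for illustration only.

def thresholds_from_intercepts(a, intercepts):
    """Divide each intercept by the discrimination (slope) parameter."""
    return [round(d / a, 3) for d in intercepts]

a = 2.5                                   # hypothetical slope; > 1.70 counts as very high
d = [5.0, 3.5, 2.0, 0.5, -1.0, -3.0]      # hypothetical intercepts, one per category cut-off
print(thresholds_from_intercepts(a, d))   # [2.0, 1.4, 0.8, 0.2, -0.4, -1.2]
```

Note that the ‘mirt’ R package used in the study reports d parameters directly, so a conversion of this kind is needed whenever thresholds are to be interpreted.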
DIF was calculated using the MIMIC method with scale purification, which controls false-positive rates and yields higher true-positive rates [56]. A recent simulation study also suggested that a stepwise purification procedure is suitable for the sample size and number of items in the current study [57]. Following Byrne [58], because the scale is based on categorical data, the weighted least squares (WLS) approach was used to estimate the parameters. This involved (a) testing each item for DIF one at a time, using all other items as the anchor; and (b) subsequently using a purified anchor to test all remaining items for DIF, repeating this process until the same set of items was detected as showing DIF in two successive iterations [30]. Items were flagged as showing DIF when the z value fell outside the interval (−1.96, 1.96), i.e., p < 0.05 [59].
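The stepwise purification loop in (a)–(b) can be sketched as follows. Here `test_dif` is a hypothetical stand-in for fitting the MIMIC model with a given anchor set and returning the studied item’s z statistic; it is not part of any real package.

```python
# Stepwise scale purification for DIF detection (sketch).
# test_dif(item, anchors) is assumed to return the z statistic for `item`
# from a MIMIC model that uses `anchors` as the DIF-free anchor set.

def purify(items, test_dif, z_crit=1.96, max_iter=20):
    flagged = set()                      # items currently suspected of DIF
    for _ in range(max_iter):
        new_flagged = set()
        for item in items:
            # anchor = all other items not currently flagged (purified anchor)
            anchors = [i for i in items if i != item and i not in flagged]
            if abs(test_dif(item, anchors)) > z_crit:   # |z| > 1.96, p < 0.05
                new_flagged.add(item)
        if new_flagged == flagged:       # same DIF set in two successive iterations
            return flagged
        flagged = new_flagged
    return flagged

# Toy illustration: pretend items 1 and 10 always show |z| > 1.96.
fake_z = lambda item, anchors: 3.0 if item in (1, 10) else 0.5
print(sorted(purify(range(1, 15), fake_z)))   # [1, 10]
```

The stopping rule mirrors the description above: the loop ends only when two successive iterations flag the same item set.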
To evaluate the construct validity of the PN-SMD scale and verify the shortened version [60], confirmatory factor analysis (CFA) with the DWLS estimator [61] was used. The following criteria indicated a good model fit: comparative fit index (CFI) > 0.95, Tucker–Lewis index (TLI) > 0.95, root mean square error of approximation (RMSEA) < 0.06, standardized root mean square residual (SRMR) < 0.06, and χ2/df ≤ 3 [62].
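The cut-offs above can be collected into a small checker; the fit values passed in below are placeholders for illustration, not the study’s results.

```python
# Check a set of model-fit indices against the cut-offs listed above.

CUTOFFS = {
    "CFI":     lambda v: v > 0.95,
    "TLI":     lambda v: v > 0.95,
    "RMSEA":   lambda v: v < 0.06,
    "SRMR":    lambda v: v < 0.06,
    "chi2_df": lambda v: v <= 3.0,
}

def check_fit(fit):
    """Return, for each index, whether it meets its cut-off."""
    return {name: rule(fit[name]) for name, rule in CUTOFFS.items()}

# Hypothetical fit values for illustration.
print(check_fit({"CFI": 0.97, "TLI": 0.94, "RMSEA": 0.05, "SRMR": 0.04, "chi2_df": 3.43}))
```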
Concurrent validity was evaluated using other construct-related scales, consistent with the previous literature. Relationship satisfaction has been shown to be significantly correlated with positive or negative affect [6,7], resilience [63], satisfaction with life [4,5], self-esteem [8,9], and self-efficacy [10]. Hence, we used PANAS, BRS, SWLS, RSE, and GSE to assess the convergent and divergent validity of our proposed scale adaptation.
All of the above analyses were conducted with SPSS version 26.0, the R computing environment (4.1.1) [64] with the lavaan package (0.6-9) [65], and Mplus (8.6) [66].

3. Results

3.1. MIRT Results

Table 2 shows the PN-SMD item parameter estimates from the MIRT analysis. The items had slope parameters on their own dimension ranging from 1.969 to 3.900, all exceeding 1.70, which is regarded as a high value for a slope parameter [55]. This means that all of the items discriminated well between low and high levels of positive and negative relationship satisfaction.
Although the scale comprises 8-point items, no respondent selected option 0, so only six intercepts were reported. For the graded response model, a minimum distance of 0.3 between adjacent intercepts is required to ensure that adjacent categories remain meaningfully distinct [25]. As shown in Table 2, the distances between intercepts were satisfactory.
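The 0.3 spacing criterion can be checked mechanically; the intercepts below are invented for illustration.

```python
# Verify that adjacent graded-response intercepts are at least 0.3 apart,
# per the criterion above. Intercepts are hypothetical.

def adjacent_gaps_ok(intercepts, min_gap=0.3):
    gaps = [abs(hi - lo) for hi, lo in zip(intercepts, intercepts[1:])]
    return all(g >= min_gap for g in gaps), gaps

ok, gaps = adjacent_gaps_ok([4.8, 3.6, 2.4, 1.0, -0.5, -2.1])
print(ok)   # True: every adjacent pair is at least 0.3 apart
```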
Difficulty or ‘threshold’ values (b parameters) were calculated using the equation above. For example, the threshold parameters of PN-SMD 1 were −2.287, −1.464, −1.107, −0.003, 0.670, and 1.963. These parameters signify the cut-off points between item categories.

3.2. DIF Results

Before conducting the DIF analysis, we compared the mean PN-SMD subscale scores between gender groups using one-way analysis of variance (ANOVA). No statistically significant differences were found (p = 0.516 and 0.710 for the two subscales, respectively). Table 3 shows the DIF items in the last iteration of the MIMIC model using the scale purification method. The items in Table 3 showed DIF in two consecutive iterations, with PN-SMD 9 presenting marginal values. As the β values of PN-SMD 1, PN-SMD 2, PN-SMD 5, and PN-SMD 10 were negative, these items were regarded as favoring the focal group (female); conversely, PN-SMD 9 favored the reference group (male). Women were more likely to use the adjectives ‘interesting’, ‘full’, ‘good’, and ‘discouraging’ when describing their relationships, whereas men were more inclined to say they felt lonely. For the corresponding z values in each iteration, see Appendix A.

3.3. Construct Validity

The 14-item PN-SMD failed to show an acceptable model fit in this sample of Chinese undergraduate students (see Table 4). After deleting the DIF items identified above, the new nine-item model demonstrated a significant improvement over the original model. However, this model still did not fulfil some of the cut-off criteria for a good fit (χ2/df, RMSEA, and TLI). Stability is an important element of belonging [67], and there is a clear correlation between belonging and enjoyment [68]. To account for this possible correlation, we correlated the error terms of PN-SMD 3 (sturdy) and PN-SMD 4 (enjoyable). With these correlated error terms, the nine-item model (P-SMD: items 3, 4, 6, and 7; N-SMD: items 8, 11, 12, 13, and 14) showed satisfactory values, with the exception of χ2/df (3.43 > 3). According to Marsh and Hocevar [69], χ2/df < 5.00 can still be regarded as a parsimonious fit. Thus, the final model can be considered statistically acceptable.

3.4. Concurrent Validity

As shown in Table 5, the new subscales (with the DIF items deleted) showed satisfactory convergent and divergent validity according to Pearson’s correlation coefficients, and the new scale was comparable to the original scale (r = 0.951 and 0.98, p < 0.01).

4. Discussion

In the present study, we evaluated each item in the PN-SMD scale proposed by Mattson, Rogge, Johnson, Davidson and Fincham [1] using MIRT and DIF. Based on a search of the Web of Science database, this is the first time that each item on the scale has been assessed individually.
Based on the results of our MIRT analysis, all of the items in both subscales showed a good ability to distinguish between different levels of positive or negative relationship satisfaction. The slope parameters of the positive subscale (M = 2.51; SD = 0.23; range 2.290–2.893) and the negative subscale (M = 3.06; SD = 0.69; range 1.969–3.900) exceeded the criterion of a parameter > 1.70 and are therefore considered highly discriminating [55]. These parameters are comparable to factor loadings on each dimension, in line with the idea that a higher parameter value indicates a stronger link to the construct [70]. As the ‘mirt’ package [25] provides intercepts (d parameters) rather than difficulty or ‘threshold’ values (b parameters), the threshold parameters were derived from the intercepts; they were found to be unevenly distributed across the trait range. The items on the positive subscale (for example, PN-SMD 1: ‘My relationship is interesting’) showed a positive skew, while items on the negative subscale (such as PN-SMD 9: ‘My relationship is lonely’) exhibited a negative skew. This means that more respondents were likely to endorse positive adjectives when describing relationships [71], which is consistent with the findings of Rogge, Fincham, Crasta and Maniaci [13]. In summary, all 14 items in the PN-SMD showed sufficient discrimination and difficulty in the MIRT model.
The results of the DIF analysis provide new insight into the function of PN-SMD items across gender groups. Of the 14 PN-SMD items, four exhibited DIF in favor of female participants (PN-SMD 1: interesting, PN-SMD 2: full, PN-SMD 5: good, and PN-SMD 10: discouraging), while one exhibited DIF in favor of male participants (PN-SMD 9: lonely). When discussing DIF items, it is important to note that items showing DIF are not necessarily biased [72]. Zieky [73] suggested that judgmental reviews are required when converting statistical differences into practical significance. As the MIMIC model detects uniform DIF, the flagged items show a significant group difference in the item intercept, or difficulty [74]. In this study, female participants had higher scores on most items marked as DIF. This is consistent with evidence that women are better able to identify and name their feelings than men [75,76]. Although this finding does not explain why these particular items showed differences in item function, it does support the possibility of a female bias. Consistent with the higher scores of male participants on PN-SMD 9 (lonely), some studies have suggested that men report more loneliness. Women tend to prefer dyadic relationships, which may lead to a stronger connection with partners, whereas men tend to interact in groups of three or more people [77,78,79]. Borys and Perlman [80] found that men reported a higher level of loneliness than women, a finding replicated by Stokes and Levin [79]. In summary, the results of this study indicate that the PN-SMD exhibits gender-based differential item functioning (DIF). Future studies should take this into consideration when utilizing this scale.
Another major contribution of this study is a nine-item adapted version of the PN-SMD that omits the gender-biased DIF items. According to Teresi, et al. [81], it is sometimes appropriate to remove an item with DIF or to flag it as an item that should not be used for certain groups. We ran a CFA on the original scale and found that without the DIF items, the model fit improved considerably (see Table 4). As the indices did not reach all of the cut-off values listed, we further correlated the error terms of PN-SMD 3 (sturdy) and PN-SMD 4 (enjoyable) based on the modification index. The final model (with items 1, 2, 5, 9, and 10 removed) showed a satisfactory fit, demonstrating that deleting DIF items can be a fruitful measure. A Pearson’s correlation analysis showed that the revised model had adequate concurrent validity, with coefficients comparable to those of the initial model.
This study has several limitations. First, the MIMIC model adopted here only examines uniform DIF [74], which means that non-uniform DIF may have existed but gone undetected. Future research can adopt other methods of assessing DIF that can detect non-uniform DIF, such as logistic regression and the IRT model [74,82,83,84,85], and quantify the impact of DIF on mean comparisons via linking-error variance estimates, obtained either analytically [86] or with resampling techniques [87]. Second, the probability of Type I errors in this study was inflated because we did not test the clinical meaning of DIF using qualitative methods, which is considered best practice [88]. This limitation could be addressed in future studies by using mixed methods and the Bonferroni correction [89], as well as the simultaneous item bias test (SIBTEST) to account for multilevel data structures in small-sample contexts [90,91]. Given the importance of interpersonal relationship satisfaction to a person’s mental health, accurately measuring the quality of interpersonal relationships in different groups has obvious practical significance. Removing gender-biased items makes this scale more suitable for various contexts.

5. Conclusions

The measurement of relationship satisfaction, an important concept in psychological well-being, is still widely debated. MIRT and DIF are increasingly being used to assess psychometric properties, and this study represents the first time these methods have been applied to a scale related to relationship satisfaction. The main contribution of this study is an analysis of the psychometric properties of the PN-SMD at the level of individual items. Based on our findings, we propose a nine-item scale that omits the gender-biased DIF items. The new scale shows satisfactory construct validity and is comparable to the original scale. We suggest that this nine-item scale be adopted in subsequent studies of relationship satisfaction, especially those working with Chinese samples.

Author Contributions

S.-f.F.: conceptualization; formal analysis; investigation; methodology; validation; writing—original draft; writing—review and editing; data curation; project administration; supervision. J.J.: conceptualization; formal analysis; writing—original draft; writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Guangzhou Huashang College (approval number: 20190502).

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

Correspondence and requests for materials should be addressed to S.F.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Corresponding z Values in Each Iteration

| Identified Anchors | PN-SMD 2 | PN-SMD 4 | PN-SMD 5 | PN-SMD 7 | PN-SMD 11 | PN-SMD 12 | PN-SMD 13 | PN-SMD 14 |
|---|---|---|---|---|---|---|---|---|
| PN-SMD 1 (interesting) | −2.454 * | −4.534 *** | −2.891 ** | −4.345 *** | −4.107 *** | −4.140 *** | −3.692 *** | −3.644 *** |
| PN-SMD 2 (full) | / | −2.902 ** | −1.545 | −2.731 *** | −2.988 ** | −2.909 ** | −2.564 ** | −2.476 ** |
| PN-SMD 3 | 2.133 * | 0.637 | 2.093 * | 0.643 | 0.144 | 0.12 | 0.353 | 0.313 |
| PN-SMD 4 | 1.578 | / | 1.844 | −0.005 | −0.446 | −0.44 | −0.204 | −0.168 |
| PN-SMD 5 (good) | −0.558 | −2.694 ** | / | −2.369 ** | −2.327 ** | −2.326 * | −2.058 * | −1.973 * |
| PN-SMD 6 | 2.634 ** | 1.42 | 3.227 ** | 1.564 | 0.724 | 0.697 | 0.938 | 0.943 |
| PN-SMD 7 | 1.269 | −0.288 | 1.353 | / | −0.584 | −0.574 | −0.339 | −0.313 |
| PN-SMD 8 | 0.632 | 1.461 | 0.774 | 1.412 | 1.96 | 2.109 * | 1.714 | 1.565 |
| PN-SMD 9 (lonely) | 1.071 | 2.081 * | 1.373 | 2.060 * | 3.158 ** | 3.176 ** | 2.577 ** | 2.347 * |
| PN-SMD 10 (discouraging) | −2.407 * | −1.314 | −2.053 * | −1.345 | −1.444 | −1.359 | −2.162 * | −2.100 * |
| PN-SMD 11 | −1.736 | −0.435 | −1.233 | −0.498 | / | −0.023 | −0.645 | −0.758 |
| PN-SMD 12 | −1.588 | −0.46 | −1.243 | −0.513 | −0.169 | / | −0.897 | −0.917 |
| PN-SMD 13 | −0.775 | 0.149 | −0.588 | 0.113 | 0.637 | 0.876 | / | −0.053 |
| PN-SMD 14 | −0.477 | 0.275 | −0.269 | 0.249 | 0.794 | 0.933 | 0.264 | / |

Note. The detected DIF items are those labeled with their adjectives. * p < 0.05; ** p < 0.01; *** p < 0.001.

References

  1. Mattson, R.E.; Rogge, R.D.; Johnson, M.D.; Davidson, E.K.; Fincham, F.D. The positive and negative semantic dimensions of relationship satisfaction. Pers. Relatsh. 2013, 20, 328–355. [Google Scholar]
  2. Vanhalst, J.; Luyckx, K.; Goossens, L. Experiencing loneliness in adolescence: A matter of individual characteristics, negative peer experiences, or both? Soc. Dev. 2014, 23, 100–118. [Google Scholar] [CrossRef]
  3. Perlman, D. Loneliness: A Sourcebook of Current Theory, Research and Therapy; John Wiley & Sons Incorporated: Hoboken, NJ, USA, 1982; Volume 36. [Google Scholar]
  4. Goodman-Deane, J.; Mieczakowski, A.; Johnson, D.; Goldhaber, T.; Clarkson, P.J. The impact of communication technologies on life and relationship satisfaction. Comput. Hum. Behav. 2016, 57, 219–229. [Google Scholar] [CrossRef]
  5. Fuller-Iglesias, H.R. Social ties and psychological well-being in late life: The mediating role of relationship satisfaction. Aging Ment. Health 2015, 19, 1103–1112. [Google Scholar] [CrossRef] [PubMed]
  6. Shortt, J.W.; Capaldi, D.M.; Kim, H.K.; Laurent, H.K. The effects of intimate partner violence on relationship satisfaction over time for young at-risk couples: The moderating role of observed negative and positive affect. Partn. Abus. 2010, 1, 131–151. [Google Scholar] [CrossRef]
  7. Fincham, F.D.; Linfield, K.J. A new look at marital quality: Can spouses feel positive and negative about their marriage? J. Fam. Psychol. 1997, 11, 489. [Google Scholar] [CrossRef]
  8. Sciangula, A.; Morry, M.M. Self-esteem and perceived regard: How I see myself affects my relationship satisfaction. J. Soc. Psychol. 2009, 149, 143–158. [Google Scholar] [CrossRef]
  9. Erol, R.Y.; Orth, U. Development of self-esteem and relationship satisfaction in couples: Two longitudinal studies. Dev. Psychol. 2014, 50, 2291. [Google Scholar] [CrossRef]
  10. Weiser, D.A.; Weigel, D.J. Self-efficacy in romantic relationships: Direct and indirect effects on relationship maintenance and satisfaction. Personal. Individ. Differ. 2016, 89, 152–156. [Google Scholar] [CrossRef]
  11. Karney, B.R.; Bradbury, T.N. The longitudinal course of marital quality and stability: A review of theory, methods, and research. Psychol. Bull. 1995, 118, 3. [Google Scholar] [CrossRef]
  12. Funk, J.L.; Rogge, R.D. Testing the ruler with item response theory: Increasing precision of measurement for relationship satisfaction with the Couples Satisfaction Index. J. Fam. Psychol. 2007, 21, 572. [Google Scholar] [CrossRef]
  13. Rogge, R.D.; Fincham, F.D.; Crasta, D.; Maniaci, M.R. Positive and negative evaluation of relationships: Development and validation of the Positive–Negative Relationship Quality (PN-RQ) scale. Psychol. Assess. 2017, 29, 1028. [Google Scholar] [CrossRef] [PubMed]
  14. Locke, H.J.; Wallace, K.M. Short marital-adjustment and prediction tests: Their reliability and validity. Marriage Fam. Living 1959, 21, 251–255. [Google Scholar] [CrossRef]
  15. Spanier, G.B. Measuring dyadic adjustment: New scales for assessing the quality of marriage and similar dyads. J. Marriage Fam. 1976, 38, 15–28. [Google Scholar] [CrossRef]
  16. Weiss, R.; Pinsof, W. Family Psychology: The Art of the Science; Oxford University Press: Oxford, UK, 2005. [Google Scholar]
Table 1. Participant Demographic Characteristics.

Variable              Respondents (n = 511)
Age, mean (SD)        20.41 (2.49)
Gender, n (%)
  Male                74 (14.5%)
  Female              437 (85.5%)
Table 2. MIRT Estimates for Item Discrimination and Intercept Parameters.

Item        a1      a2      d1      d2      d3      d4      d5      d6
PN-SMD 1    2.442           5.586   3.575   2.703   0.076   −1.635  −4.794
PN-SMD 2    2.386           5.620   3.422   1.873   −0.360  −2.020  −4.803
PN-SMD 3    2.268           5.953   3.549   2.178   0.255   −1.608  −4.355
PN-SMD 4    2.893           7.610   5.520   3.638   0.916   −1.239  −4.900
PN-SMD 5    2.679           6.996   4.676   2.962   0.141   −1.833  −5.148
PN-SMD 6    2.290           6.736   4.765   3.285   1.278   −0.337  −3.583
PN-SMD 7    2.590           6.679   4.876   3.315   0.827   −1.018  −4.165
PN-SMD 8            1.969   2.490   0.105   −1.474  −2.963  −4.551  −6.364
PN-SMD 9            2.491   3.419   0.813   −0.746  −2.288  −3.797  −5.626
PN-SMD 10           3.900   3.979   −0.147  −2.129  −4.541  −6.351  −8.718
PN-SMD 11           3.350   3.864   0.052   −1.774  −3.594  −5.787  −8.458
PN-SMD 12           3.590   3.844   −0.246  −2.150  −4.300  −6.198  −8.425
PN-SMD 13           2.677   3.058   0.244   −1.232  −2.789  −4.562  −7.846
PN-SMD 14           3.433   3.260   −0.926  −2.553  −4.598  −6.432  −7.999
Note. a1 = discrimination index for the positive subscale, a2 = discrimination index for the negative subscale, dn = intercepts.
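To make the parameterization in Table 2 concrete, the sketch below evaluates a graded response model in the slope-intercept form used by the R package mirt, P(X ≥ k | θ) = logistic(a·θ + d_k), reimplemented here in Python for illustration. The parameter values are taken from the PN-SMD 1 row; treating the item as loading on a single dimension (a1 only) is an assumption based on the table's layout.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(a, d, theta):
    """Category probabilities for one graded-response item.

    a: discrimination; d: decreasing intercepts d1..dK; theta: latent trait.
    """
    # Cumulative probabilities P(X >= k | theta) in slope-intercept form
    cum = [logistic(a * theta + dk) for dk in d]
    # Category probabilities are differences of adjacent cumulative curves
    probs = [1.0 - cum[0]]
    probs += [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
    probs.append(cum[-1])
    return probs

# PN-SMD 1 (Table 2): a1 = 2.442, intercepts d1..d6
a = 2.442
d = [5.586, 3.575, 2.703, 0.076, -1.635, -4.794]
probs = grm_category_probs(a, d, theta=0.0)
# probs covers the 7 response categories and sums to 1
```

With six intercepts the item has seven ordered response categories; at θ = 0 the middle-upper categories carry most of the probability mass, consistent with the intercept pattern in the table.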
Table 3. DIF Items and the Corresponding z Values in the Last Iteration.

DIF Item                    β        σ(β)    z        p
PN-SMD 1 (interesting)      −0.122   0.033   −3.644   0.000
PN-SMD 2 (full)             −0.080   0.032   −2.476   0.013
PN-SMD 5 (good)             −0.074   0.037   −1.973   0.048
PN-SMD 9 (lonely)           0.069    0.029   2.347    0.019
PN-SMD 10 (discouraging)    −0.050   0.024   −2.100   0.036
Note. β = severity parameter, σ(β) = standard error of the severity parameter, z = β/σ(β).
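The z values in Table 3 are Wald-type statistics, z = β/σ(β), with two-tailed p values from the standard normal distribution. A minimal Python check is below; small discrepancies from the tabulated z and p arise because β and σ(β) are shown rounded to three decimals.

```python
import math

def wald_test(beta, se):
    """Wald statistic and two-tailed p value from the standard normal CDF."""
    z = beta / se
    # Phi(|z|) via the error function; p = 2 * (1 - Phi(|z|))
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - phi)
    return z, p

# PN-SMD 9 (lonely), parameters from Table 3
z, p = wald_test(0.069, 0.029)
```

Computed from the rounded inputs, z ≈ 2.38 and p ≈ 0.018, in line with the reported 2.347 and 0.019.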
Table 4. Confirmatory Factor Analysis of PN-SMD Models.

Model            χ²        df   χ²/df   RMSEA [90% CI]         CFI     TLI     SRMR
14-item model    463.414   76   6.10    0.100 [0.091, 0.109]   0.918   0.902   0.044
9-item model     126.580   26   4.87    0.087 [0.072, 0.102]   0.960   0.945   0.031
9-item model *   85.733    25   3.43    0.069 [0.053, 0.085]   0.976   0.966   0.024
Note. Nine-item model *: PN-SMD 3 and PN-SMD 4 correlated.
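The relative chi-square column in Table 4 can be reproduced directly from χ² and df. The sketch below does so and screens each model against commonly cited cutoffs (χ²/df < 5, RMSEA ≤ 0.08, CFI ≥ 0.95); these thresholds are general conventions from the SEM literature, not values asserted by the article.

```python
# (chi-square, df, RMSEA, CFI) for each model in Table 4
models = {
    "14-item": (463.414, 76, 0.100, 0.918),
    "9-item": (126.580, 26, 0.087, 0.960),
    "9-item, PN-SMD 3 and 4 correlated": (85.733, 25, 0.069, 0.976),
}

ratios = {}
acceptable = {}
for name, (chi2, df, rmsea, cfi) in models.items():
    ratios[name] = chi2 / df
    # Conventional cutoffs: chi2/df < 5, RMSEA <= .08, CFI >= .95
    acceptable[name] = ratios[name] < 5 and rmsea <= 0.08 and cfi >= 0.95
```

Only the nine-item model with the correlated PN-SMD 3/4 specification clears all three conventional cutoffs, matching the article's conclusion that the revised scale fits substantially better.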
Table 5. Correlations of the Seven-Item Subscales and the Newly Derived Subscales with Validity Measures.

Scale              P-SMD (4 items)   N-SMD (5 items)   P-SMD (7 items)   N-SMD (7 items)
                   α = 0.857         α = 0.889         α = 0.904         α = 0.921
P-SMD (4 items)
N-SMD (5 items)    −0.457 **
P-SMD (7 items)    0.951 **          −0.486 **
N-SMD (7 items)    −0.464 **         0.981 **          −0.502 **
Positive Affect    0.307 **          −0.125 **         0.377 **          −0.131 **
Negative Affect    −0.238 **         0.447 **          −0.241 **         0.451 **
Resilience         0.389 **          −0.392 **         0.393 **          −0.388 **
SWLS               0.390 **          −0.263 **         0.438 **          −0.276 **
RSE                0.376 **          −0.427 **         0.420 **          −0.443 **
GSE                0.237 **          −0.184 **         0.280 **          −0.172 **
** Correlation is significant at the 0.01 level (2-tailed).
Fung, S.-f.; Jin, J. Gender-Based Differential Item Function for the Positive and Negative Semantic Dimensions of the Relationship Satisfaction Scale with Item Response Theory. Behav. Sci. 2023, 13, 825. https://doi.org/10.3390/bs13100825
