Development and Calibration of the PREMIUM Item Bank for Measuring Respect and Dignity for Patients with Severe Mental Illness

Most patient-reported experience measures (PREMs) are paper-based, leading to a high burden for patients and care providers. The aim of this study was to (1) calibrate an item bank to measure patients’ experience of respect and dignity for adult patients with serious mental illnesses and (2) develop computerized adaptive testing (CAT) to improve the use of this PREM in routine practice. Patients with schizophrenia, bipolar disorder, and major depressive disorder were enrolled in this multicenter and cross-sectional study. Psychometric analyses were based on classical test and item response theories and included evaluations of unidimensionality, local independence, and monotonicity; calibration and evaluation of model fit; analyses of differential item functioning (DIF); testing of external validity; and finally, CAT development. A total of 458 patients participated in the study. Of the 24 items, 2 highly inter-correlated items were deleted. Factor analysis showed that the remaining items met the unidimensional assumption (RMSEA = 0.054, CFI = 0.988, TLI = 0.986). DIF analyses revealed no biases by sex, age, care setting, or diagnosis. External validity testing has generally supported our assumptions. CAT showed satisfactory accuracy and precision. This work provides a more accurate and flexible measure of patients’ experience of respect and dignity than that obtained from standard questionnaires.


Introduction
Severe mental illnesses (SMIs), including schizophrenia, bipolar disorder, and major depressive disorder, are associated with suboptimal quality of care that seems to worsen over time [1][2][3]. These illnesses are often unrecognized or misdiagnosed, leading first to a prolonged duration of untreated illness [4][5][6], increased risk of relapse and hospitalization [7][8][9], and subsequently, to poorer outcomes in treatment response, symptoms, and quality of life [10,11]. Care access is crucial for SMIs patients [12], and quality of care plays a major role in the chances to reach full-functional recovery [13,14]. It is therefore essential to measure the quality of care to identify areas in which changes are needed. The patient's perspective is now considered to be an important measure of the quality of care, and the use of patient-reported experience measures (PREMs) is recommended by many organizations worldwide [15]. Most PREMs are paper-based, are frequently too lengthy, and have fixed content, leading to a high burden on both patients and care providers, making such PREMs difficult to use in routine clinical practice [16]. Modern statistical methods based on item response theory (IRT) are used to develop item banks and computerized adaptive testing (CAT) to overcome some of these limitations [17][18][19]. In an item bank, since the items are calibrated by an IRT model, the scores resulting from the administration of different subsets of items can be compared with one another. These item banks can then be used to develop static short forms or CATs [17]. CATs allow administering only the most informative items for a given patient, thereby optimizing the precision of the assessment while reducing the length of the questionnaire and completion time [20]. Item banks and CATs have been developed in the field of mental health, but these have focused on health outcomes (e.g., quality of life [21]), and none are available for the experience of SMIs patients in the French psychiatric context. To improve the use of patient-reported measures in the field of mental health, the Patient-Reported Experience Measure for Improving qUality of care in Mental health (PREMIUM) French group is currently developing item banks for PREMs and associated CATs [22]. In our work, we have identified respect and dignity as an important dimension of the experience of adult patients with SMIs [23], as has the work carried out by the PAtient Reported Indicators Survey (PaRIS) working group of the organization for economic cooperation and development (OECD) [24].
The aim of this study was, therefore, to (1) calibrate an item bank to measure patients' experience of respect and dignity for adult patients with SMIs and (2) develop a CAT to improve the use of this PREM in routine practice.

Study Design and Setting
Data taken from a national, multicenter, cross-sectional study were used. Patients were recruited between January 2016 and November 2020 from inpatient and outpatient departments (including full-time hospitalization, part-time hospitalization, and outpatient care) of the Assistance Publique-Hôpitaux de Marseille, from the FondaMental Foundation's network of expert centers and through an online survey. All participants gave their informed consent before participating in the study. This study was approved by the competent ethics committee (CPP-Sud Méditerranée V, 12 November 2014, n • 2014-A01152-45).

Inclusion Criteria
The inclusion criteria for this study were as follows: a clinical diagnosis of schizophrenia, bipolar disorder, or major depressive disorder according to the DSM-5 [25]; inpatient or outpatient psychiatric care, regardless of current or previous care, duration, or severity of illness; age over 18 years and under 65 years; and ability to read and speak French without comprehension problems. The exclusion criteria of this study were as follows: mental retardation or decompensated organic illness; vulnerable persons (i.e., pregnant or nursing women, persons under legal protection measures, etc.); inability to complete a self-administered questionnaire; and withdrawal of consent.

Data Collection
The following data were collected: Socio-demographic data: sex; age; educational level; marital status; and occupational status.
Clinical data: main diagnosis (schizophrenia, bipolar disorder or major depressive disorder); duration of illness; psychological, social, and occupational functioning of an individual as measured using the Global Assessment of Functioning scale [26] (GAF, ranging from 0 to 100, with a higher score indicating better functioning); health-related quality of life (QoL) as measured using the medical outcome study 12-item Short Form (SF-12) [27], which describes 8 QoL dimensions: physical functioning (PF), social functioning (SF), role physical (RP), role emotional (RE), mental health (MH), vitality (VT), bodily pain (BP), general health (GH), and 2 composite scores for physical (PCS) and mental (MCS) quality of life (ranging from 0 to 100, with higher scores indicating better quality of life).
The respect and dignity item bank (PREMIUM-RD) includes 24 items as well as an overall satisfaction item ("Overall, you feel you were treated with respect and dignity") and a corresponding visual analog scale (VAS) (ranging from 0 to 10). All items were scored on a 5-point Likert scale ("strongly disagree", "disagree", "neither agree nor disagree", "agree", "strongly agree") with a "not applicable" response option. The coding of negatively worded items was reversed so that higher scores indicated a greater experience of respect and dignity shown by mental health professionals. The assessment period referred to the four weeks prior to administration.

Statistical Analysis
The general steps for the development of the item banks and associated CATs of the PREMIUM project have been described in detail previously [22]. Based on rigorous and well-established methodology [17,28,29], the procedure was divided into four steps: (1) conceptual work and definition of domain mapping; (2) item selection; (3) item bank calibration and CAT simulations; and (4) CAT validation. In this article, we report the main results of the third step for one of the seven PREMIUM item banks [21], which measured respect and dignity (PREMIUM-RD).

Descriptive Analysis
The 24 items of the PREMIUM-RD bank were first subjected to descriptive analysis, and items that presented (1) high missing value rates (>70%); (2) extreme skewness (>95% of response rate in one category or an absolute coefficient > 4); or (3) inter-item correlation coefficients higher than 0.70 were excluded. Internal consistency was evaluated by calculating Cronbach's alpha coefficient, with α > 0.70 considered to be acceptable [30].

Evaluation of the Assumptions of the IRT Model
Use of an IRT model requires that the key assumptions underlying the IRT framework be fulfilled, including (i) unidimensionality, (ii) local independence, and (iii) monotonicity [19].
The unidimensionality assumption was evaluated based on a 1-factor confirmatory factor analysis (CFA) with the weighted least square mean and variance (WLSMV) estimator due to the ordinal nature of the data [28]. The following indices were used to assess the goodness-of-fit of the model to the data, with an acceptable fit defined by the root mean square error of approximation (RMSEA) ≤ 0.08, the comparative fit index (CFI), and the Tucker-Lewis index (TLI) ≥ 0.95 [31,32]. If the CFA showed poor fit, an exploratory factor analysis (EFA) was performed after randomly dividing the entire sample into 2 subsamples (n = 229 for EFA and n = 229 for CFA). The number of factors to be kept was based on the Kaiser-Guttman's rule (eigenvalues ≥ 1), differences in the magnitude of eigenvalues between factors (a ratio greater than 4 is expected), the scree test (looking for an "elbow" in the curve), parallel analysis and factor loadings (with minimum item loadings set at 0.40) [28]. Next, to investigate whether item responses are sufficiently unidimensional for IRT application, we used a bifactor model [33]. The bifactor model assumes one general factor (in this case, experience of respect and dignity), onto which all items load, and several group factors, onto which unique subsets of items load [33]. The percentage of explained common variance (ECV) and the omega hierarchical (ω h /ω hs ) coefficients accounted for by the general factor and by group factors were calculated, with an expected ω h coefficient for the general factor greater than or equal to 0.70 and the expected percentage of ECV for the general factor greater than or equal to 60% to support unidimensionality [34,35].
Local independence was examined using residual correlations from the final CFA model. All residual correlations greater than 0.20 (or 0.25) indicated possible local depen-dence, leading to the deletion of the item with the highest residual correlation with other items in the bank [36,37].
Finally, monotonicity was evaluated by visual inspection of item characteristic curves (ICCs), with each response category expected to have a maximum probability of being selected on a specific range of the latent trait continuum. If two categories were not sufficiently discriminative for a particular item, they were collapsed, and the resulting model was re-estimated. The deviations of the Akaike information criterion (AIC) [38] and the Bayes information criterion (BIC) [39] between the final model (recoded items) and the initial model (no recoded items) were computed to ensure that the recoding process resulted in a substantial improvement in the model.

Calibration and Fitting of an IRT Model to the Data
The generalized partial credit model (GPCM) was used to calibrate the responses to the items [40]. The GPCM is appropriate for items with ordered polytomous response options (such as Likert scales). In the GPCM, each item has a discrimination parameter (i.e., the ability to distinguish among individuals with different levels of a latent trait) and a set of threshold parameters (i.e., the item's difficulty). The GPCM is a generalization of the partial credit model (PCM), in which the discrimination parameter is equal across all items [41]. The likelihood ratio test [42], as well as the information criteria AIC [38] and BIC [39], were calculated and compared to select the IRT model that best fit the data. The item parameters (discrimination and thresholds) were then estimated under the selected model.
Item parameters were estimated using the maximum marginal likelihood estimation (MMLE) implemented via the expectation-maximization (EM) algorithm [43]. Items with a discrimination parameter below 0.50 were also considered problematic [44,45], as they were not sufficiently informative and were thus removed from the item bank. Next, the goodness-of-fit was evaluated by computing the infit mean square (Infit MnSq) statistic [46], with an expected value in the range [0.7-1.3] [47].

Evaluation of Differential Item Functioning (DIF)
Differential item functioning (DIF) analyses were carried out to see if all items in the PREMIUM-RD bank in the same way across different subgroups [48,49], identified by sex (men vs. women), age (median split: patients 37 years or younger vs. patients older than 37 years), care setting (outpatient vs. inpatient), and psychiatric diagnosis (schizophrenia vs. bipolar disorder vs. major depressive disorder). If an overall DIF was detected at a level of p < 0.01, the magnitude was assessed according to Zumbo's DIF classification by computing the pseudo R 2 change (∆R 2 ):negligible if ∆R 2 < 0.13, moderate if 0.13 < ∆R 2 < 0.26, and large if ∆R 2 > 0.26 [50]. Items with a large DIF were excluded from the item bank.
Latent trait scores (θ) for each respondent were estimated by Bayesian expected a posteriori (EAP) estimation [51]. Then, a linear transformation was performed to have θ scores ranging from 0 to 100 (the higher the score was, the better the experience of respect and dignity). Item and test information were also calculated.

External Validation of the Item Bank
External validity was examined by hypothesizing that PREMIUM-RD scores should be positively and moderately correlated with GAF scores and SF-12 dimension scores, but also positively and strongly correlated with scores on the overall satisfaction item and the corresponding VAS. Discriminant validity was examined by testing the association of PREMIUM-RD scores with socio-demographic (i.e., age, sex, educational level, marital status, and employment status) and clinical (i.e., care setting, duration of illness, and main diagnosis) characteristics using t-tests, analysis of variance (ANOVA), and Pearson's correlation coefficients.

Elaboration of Item Administration algorithm
CAT simulations were performed using both real response data (i.e., complete response patterns to items in the final PREMIUM-RD item bank) and simulated data (i.e., after imputation of plausible missing responses using IRT-based estimation).
The CAT algorithm began by selecting the starting item based on the maximum Fisher information (MFI) criterion, which is relevant to polytomous items and adapted to a unidimensional item bank [52]. Based on the response to this item, an initial latent trait estimate (θ) was computed using the EAP estimate [51]. The CAT algorithm then selected as the next item the item with the highest information for the current θ estimate. The θ estimate was iteratively re-estimated based on the responses to previous items using the EAP estimate. Finally, the CAT algorithm ended when the stopping rule used was reached, which corresponded to the prespecified level of measurement precision based on the standard error of measurement (SEM) [53]. An acceptable range was defined as 0.33 to 0.55, corresponding to reliability coefficients between 0.90 and 0.70 [53]. Three scenarios with different stopping rules corresponding to SEM values of 0.33, 0.44, and 0.55 were simulated and compared using the following accuracy and precision indicators: correlation coefficients (r) between CAT scores and scores based on the full set of items in the bank with expected values greater than or equal to 0.90 and the root mean square error (RMSE) with expected values less than or equal to 0.30 [54].

Description of the Cohort
The sample included 458 SMIs patients; the majority of the patients were men (61%), single (76%), with an education level of bachelor's degree or higher (68%), and unemployed (75%). Most of them were outpatients (84%), and among the inpatients (16%), 30% were under constraint. The mean age was 38.1 years (SD ± 12.0). Approximately 65% of patients had a main diagnosis of schizophrenia, while 20% and 15% had a diagnosis of bipolar disorder or major depressive disorder, respectively. The mean duration of illness was 12.3 years (SD ± 8.6). The socio-demographic and clinical characteristics of the sample are presented in Table 1.

Descriptive Analysis
For the initial 24-item pool, the mean ranged from 2.36 ± 1.26 to 3.50 ± 0.82. The floor and ceiling effects ranged from 0.4 to 7.2% and from 15.1 to 60.0%, respectively. Each item had an acceptable skewness coefficient (ranging from −2.15 to −0.35), and missing values ranged from 0.0% to 30.8%. Inter-item correlation coefficients ranged from 0.16 to 0.78 (all with p < 0.001, data not shown). Following this step, items 5, 15, and 17 were discarded from the item bank because they exhibited inter-items correlations that were too high (>0.72), reflecting redundancy between items. These characteristics are presented in Table 2.

Evaluation of the Assumptions of an IRT Model
The fit indices of the one-factor CFA model were not adequate (RMSEA = 0.106, 95% CI [0.097-0.115], CFI = 0.942 and TLI = 0.935). In the EFA, the eigenvalue of the first factor was 12.0, and the eigenvalue of the second factor was 1.9. The ratio between the first and second eigenvalues was 6.3, and the total amount of variance explained by the first factor was 57.3%. The scree plot and parallel analysis revealed two predominant factors, and all items loaded suitably on the first factor (>0.40). Additionally, 17 of the 21 items in the bank were recoded after examining the item characteristic curves (ICCs). The deviations (final model-initial model) of the AIC and BIC were −4655.00 and −4795.32, respectively, indicating an overall improvement in model fit. Next, we tested a bifactor structure with a general factor and two group factors, which showed adequate fit indices (RMSEA = 0.054, 95% CI [0.042-0.065], CFI = 0.988, TLI = 0.986) and a predominance of the general factor with a reasonable loading of all items (>0.40). The ωh coefficient for the general factor was 0.88, and those for the first and second group factors were 0.13 and 0.20, respectively. The percentage of ECV attributable to the general factor was 82.0%, while the remaining 18.0% was attributable to the group factors (10.7% and 7.3% attributable to the first and second group factors, respectively). All items had higher factor loadings on the general factor than on the group factors, indicating that the items predominantly reflected the general factor. Taken together, these findings suggest that the PREMIUM-RD item bank reflects an essentially unidimensional construct. Cronbach's alpha was 0.94, and no residual correlation was greater than 0.20. Consequently, all 21 items in the PREMIUM-RD item bank met the requirements for IRT modeling and were kept for further analysis.

Calibration and Fitting of an IRT Model to the Data
The partial credit model (PCM) and the generalized partial credit model (GPCM) were used to calibrate the 21 items of the PREMIUM-RD item bank. The fit indices of the PCM were less adequate than those of the GPCM (14,546.64 and 14,249.58 for the AIC and 14,757.11 and 14,542.59 for the BIC, respectively), and the likelihood ratio test indicated a better fit of the GPCM compared with the PCM, X 2 = 337.06, p < 0.001. As a result, we decided to use the GPCM to calibrate the PREMIUM-RD item bank. All items showed an adequate fit to the GPCM with respect to infit values ranging from 0.74 (item RD24) to 1.03 (item RD11). The discrimination parameters ranged from 0.68 (item RD7) to 3.35 (item RD1), and the threshold parameters ranged from −2.21 (item RD6) to 0.66 (item RD3) (Appendix A). Taken together, these results demonstrated that all items had moderate to very high discriminative power and that the threshold parameters reflected a broad spectrum of the latent trait, although there were relatively few items at the upper end of the continuum.
The test information curve of the final PREMIUM-RD item bank is provided in Figure 1 and shows that the items have a high measurement precision over a broad spectrum of the latent trait (78.7% of total information is included in the [−2, 1] range of the latent continuum values) and that this precision is lower only for patients at the extremes, especially at the upper extreme (i.e., above 1). Item 19 was the most informative of the bank ("You felt confident"), whereas item 7 was the least informative ("You were embarrassed to have to answer intrusive questions"). an essentially unidimensional construct. Cronbach's alpha was 0.94, and no residual correlation was greater than 0.20. Consequently, all 21 items in the PREMIUM-RD item bank met the requirements for IRT modeling and were kept for further analysis.

Calibration and Fitting of an IRT Model to the Data
The partial credit model (PCM) and the generalized partial credit model (GPCM) were used to calibrate the 21 items of the PREMIUM-RD item bank. The fit indices of the PCM were less adequate than those of the GPCM (14,546.64 and 14,249.58 for the AIC and 14,757.11 and 14,542.59 for the BIC, respectively), and the likelihood ratio test indicated a better fit of the GPCM compared with the PCM, X 2 = 337.06, p < 0.001. As a result, we decided to use the GPCM to calibrate the PREMIUM-RD item bank. All items showed an adequate fit to the GPCM with respect to infit values ranging from 0.74 (item RD24) to 1.03 (item RD11). The discrimination parameters ranged from 0.68 (item RD7) to 3.35 (item RD1), and the threshold parameters ranged from −2.21 (item RD6) to 0.66 (item RD3) (Appendix A). Taken together, these results demonstrated that all items had moderate to very high discriminative power and that the threshold parameters reflected a broad spectrum of the latent trait, although there were relatively few items at the upper end of the continuum.
The test information curve of the final PREMIUM-RD item bank is provided in Figure  1 and shows that the items have a high measurement precision over a broad spectrum of the latent trait (78.7% of total information is included in the [−2, 1] range of the latent continuum values) and that this precision is lower only for patients at the extremes, especially at the upper extreme (i.e., above 1). Item 19 was the most informative of the bank ("You felt confident"), whereas item 7 was the least informative ("You were embarrassed to have to answer intrusive questions").

Evaluation of Differential Item Functioning (DIF)
Of the 84 tests performed (21 items with 4 confounding factors), 6 exhibited overall DIF. Following Zumbo's DIF classification, no items were flagged for moderate or large DIF magnitudes, and only a few items were flagged for negligible DIF magnitudes: 1 item for diagnosis (item RD11) and 5 items for care setting (items RD1, RD2, RD12, RD20, and

Evaluation of Differential Item Functioning (DIF)
Of the 84 tests performed (21 items with 4 confounding factors), 6 exhibited overall DIF. Following Zumbo's DIF classification, no items were flagged for moderate or large DIF magnitudes, and only a few items were flagged for negligible DIF magnitudes: 1 item for diagnosis (item RD11) and 5 items for care setting (items RD1, RD2, RD12, RD20, and RD22). Given the negligible impact of these DIFs on the estimation of experience of respect and dignity, no items were deleted from the item bank (Appendix B).

External Validity of the Item Bank
The mean PREMIUM-RD score was 55.40 ± 22.43. Age was weakly correlated with PREMIUM-RD scores, and PREMIUM-RD scores were significantly higher for women and for non-single individuals. No significant differences were found by educational level, employment status, and care setting. Additionally, PREMIUM-RD scores were weakly correlated with GAF scores, and no correlation was found with duration of illness. PREMIUM-RD scores were significantly different according to the main diagnosis, with the highest scores observed for individuals with major depressive disorder and the lowest scores observed for patients with schizophrenia. PREMIUM-RD scores were strongly correlated with scores on the item measuring overall satisfaction with respect and dignity and the corresponding VAS. Finally, scores were weakly correlated with scores on SF-12 dimensions measuring physical functioning (PF), social functioning (SF), role physical (RP), role emotional (RE), vitality (VT), general health (GH), and composite scores of physical quality of life (PCS) and mental quality of life (MCS). Conversely, no correlation was observed between the dimensions of mental health (MH) and bodily pain (BP). The results regarding the external validation of the PREMIUM-RD item bank are presented in Table 3.

Elaboration of Item Administration Algorithm
Among the 3 scenarios tested, the CAT simulation with a level of precision of SEM < 0.33 was the most efficient, having the highest levels of accuracy (r = 0.97) and precision (RMSE = 0.23) while administering less than half of the items of the PREMIUM-RD item bank (on average 9 items). The other 2 simulations were not satisfactory, with a level of precision lower than expected (0.34 and 0.38, respectively), despite an adequate level of accuracy (r = 0.94 and r = 0.92, respectively) and a smaller average number of items administered (6 and 4 items, respectively). Table 4 provides the results of the CAT simulations.

Discussion
This work is part of the French PREMIUM initiative [22], which aims to provide a common measurement system for adult patients' experience of care for three targeted conditions (including schizophrenia, bipolar disorder, or major depressive disorder) and is applicable in several care settings (i.e., outpatient and inpatient) based on item banks and CATs. In this article, we present the calibration of the PREMIUM-RD item bank and the development of the associated CAT, which captures all important aspects of the patients' experience of respect and dignity during a hospital stay or consultation. Other item banks and associated CATs are under development by the PREMIUM project.
The final 21-item PREMIUM-RD item bank demonstrated strong psychometric properties (Appendix C). In particular, the assumptions required for IRT modeling (unidimensionality, local independence, and monotonicity) were fulfilled, and the GPCM showed an adequate fit to the data. The few indications of DIF were of negligible magnitude according to sex, age, care setting, and diagnosis. The item bank provides good information for a wide range of the latent continuum, although it may lack precision for patients at the highest extreme (i.e., for patients with high experience of respect and dignity). However, given that the goal of the measure is to identify aspects of patient experience that are suboptimal and therefore need to be improved, items that accurately distinguish patients with high experience of respect and dignity would be of limited interest; such items could be added at a later date if necessary. Additionally, this study provided preliminary evidence of the external validity of the PREMIUM-RD item bank. In particular, PREMIUM-RD scores were weakly correlated with GAF scores and SF-12 dimension scores, which is consistent with previous research that has shown a positive but weak association between patients' experience and their outcomes [62,63]. In other words, PREMs provide important information for improving quality of care by identifying areas where change is needed, but these measures must be complemented by patient-reported outcome measures (PROMs) to provide a complete picture of quality of care from the patient's perspective and support a patient-centered approach to care.
In addition to standard clinical indicators, patient experience is considered to be a valuable indicator of the quality of health care [64,65], and the use of PREMs is recommended by many organizations worldwide [66,67], but these data are not systematically collected in clinical psychiatric practice. In particular, organizational barriers, including a lack of time or resources to collect and analyze the data, have been reported in the literature [68]. The use of new technologies, based on item banks and CATs, has the potential to improve the use of these measures in routine clinical practice [19,20,28]. The PREMIUM-RD-CAT is the first adaptive PREM specific to adult patients with SMIs (i.e., schizophrenia, bipolar disorder, and major depressive disorder), which, unlike standard fixed-length questionnaires, administers only the most relevant items to the respondent, thereby reducing questionnaire completion time, increasing precision and providing a real-time score, thus minimizing patient and provider burden. The PREMIUM-RD-CAT, based on a level of precision of SEM < 0.33, showed adequate precision and accuracy, with correlations greater than 0.90 with scores based on the full bank of items and an RMSE less than 0.30. The validity of the PREMIUM-RD-CAT and its acceptability by stakeholders will be evaluated further in future analyses.

Implications for Clinical Practice
The use of a digital platform has great potential to improve the quality of mental health care. All adult patients with SMI will have the opportunity to complete a questionnaire after a hospital stay or consultation. Collecting data directly from patients will limit potential response bias and improve representativeness. This real-time feedback to mental health professionals has the potential to strongly improve the quality of care and decrease financial and human costs by optimizing patients' care adherence, continuity of care, and improving health outcomes [63,69]. A digital platform will also allow for early identification of patients at risk of relapse and prevent potential care disruption, especially in the context of the COVID-19 pandemic. Finally, the digital platform will provide aggregated data for benchmarking within and across healthcare facilities and/or services. The financial allocation of psychiatry is currently based on activity, on the one hand, and on sociodemographic population data on the other hand [70]. In the future, funding could be modulated by the results of patients' experience, within the framework of a quality-based financial allocation (IFAQ program-'Incitation financière à l'amélioration de la qualité'-Financial Incentive to Quality Improvement [67].

Limitations
First, our sample size can be discussed. However, the sample size was sufficiently large to calibrate the item pool [71,72], and the sample included a diverse patient population, both inpatient and outpatient, from several facilities in different geographic regions of the country. Future work with a larger sample will improve the generalizability of the PREMIUM-RD item bank. Second, the selection of the IRT model could be discussed. In this study, we used the GPCM, although other models could have been used. Like the EORTC initiative [73], we used the GPCM rather than the GRM, both of which tend to yield similar results and should be viewed rather as alternatives than as competitors [74,75]. Additionally, given that the PCM is nested within the GPCM (in the PCM, all items have the same slope), their fit to the data can be compared. The GPCM generally provides a better fit to the data than a more parsimonious model such as the PCM. Third, the construct validity of the PREMIUM-RD item bank was assessed by examining relationships with GAF scores and SF-12 dimension scores. A high rate of missing data may have impacted the validity of our results. However, our results were consistent with our underlying assumptions. Additional work should be conducted to further assess the external validity of the PREMIUM-RD item bank. Finally, in this study, all items were administered to participants as part of a complete item bank, which is different from the administration of the CAT. Because the assessment of the precision and accuracy of the scores was based on the data used to calibrate the IRT model, these indicators may have been overestimated. Future work should assess the precision, accuracy, and validity of the PREMIUM-RD scores based on adaptive item administration using an independent sample.

Conclusions
The PREMIUM project aims to develop item banks of PREMs and CAT specific to SMIs (including schizophrenia, bipolar disorder, and major depressive disorder). This work reported satisfactory psychometric characteristics for respect and dignity measured using the PREMIUM-RD item bank, and the associated CAT showed a satisfactory level of precision, allowing for more accurate and flexible measurement of patient experience than that achieved by standard questionnaires. The use of advances in psychometric modeling and computer technologies will help improve the use of patient-reported measures in routine practice, thereby promoting a culture of patient-centered care. Funding: This work is supported by an institutional grant from the French National Program on the Performance of the Health-care System (PREPS, financed by Direction Générale de l'Offre de Soins, 14, avenue Duquesne, 75350 Paris, France) and the Agence Technique de l'Information sur l'Hospitalisation (ATIH). The sponsors have no role in study design; collection, analysis, and interpretation of data; report writing; or the decision to submit the article for publication.

Institutional Review Board Statement:
The trial registration is NCT02491866. The study is being carried out in accordance with ethical principles for medical research involving humans. The assessment protocol was approved by the relevant ethical review board (CPP-Sud Méditerranée V, n • 2014-A01152-45). All data are collected anonymously.

Informed Consent Statement:
As this study includes data coming from regular care assessments, informed consent (non-opposition form) was given by all participants.

Data Availability Statement:
The data are available on demand from the PREMIUM Scientific Committee.

Conflicts of Interest:
The authors report no conflict of interest in this work.

RD10
You felt that you were not "taken seriously" Vous avez l'impression de ne pas « être pris(e) au sérieux »

RD11
You felt that the time spent with you was sufficient Vous avez eu l'impression que le temps qui vous a été consacré était suffisant

RD13
You have been treated as a "whole person" Vous avez été traité(e) comme un « individu à part entière » RD14 You felt like you were spoken to as an equal Vous avez ressenti que l'on vous parlait d'« égal à égal » RD16 Your opinions have been taken into account Vos opinions ont été prises en compte RD18 Your rights have been respected Vos droits ont été respectés RD19 You felt confident Vous vous êtes senti(e) en confiance

RD20
You think you have received all important information regarding your care Vous pensez avoir reçu toutes les informations importantes sur votre prise en charge

RD21
You think you have been involved in all important decisions regarding your care Vous pensez avoir été impliqué(e) dans les décisions importantes de votre prise en charge

RD22
You knew who to talk to when necessary Vous avez su à qui vous adresser quand vous en avez eu besoin

RD23
Your care has helped you to improve your well-being Votre prise en charge vous a aidé(e) à améliorer votre bien-être

RD24
Your care has met your expectations and needs Votre prise en charge a répondu à vos attentes et vos besoins