Patient-Reported Outcome Measures and Risk Factors in a Quality Registry: A Basis for More Patient-Centered Diabetes Care in Sweden

Diabetes is one of the chronic diseases that constitute the greatest disease burden in the world. The Swedish National Diabetes Register is an essential part of the diabetes care system. Currently it mainly records clinical outcomes, but here we describe how it has started to collect patient-reported outcome measures, complementing the standard registry data on clinical outcomes as a basis for evaluating diabetes care. Our aims were to develop a questionnaire to measure patient abilities and judgments of their experience of diabetes care, to describe a Swedish diabetes patient sample in terms of their abilities, judgments, and risk factors, and to characterize groups of patients with a need for improvement. Patient abilities and judgments were estimated using item response theory. Analyzing them together with standard risk factors for diabetes comorbidities showed that the different types of data describe different aspects of a patient’s situation. These aspects occasionally overlap, but not in any particularly useful way. They both provide important information to decision makers, and neither is necessarily more relevant than the other. Both should therefore be considered, to achieve a more complete evaluation of diabetes care and to promote person-centered care.

a questionnaire to measure patient abilities and judgments of their experience of diabetes

Introduction
Diabetes is one of the chronic diseases that constitute the greatest disease burden in the whole world [1]. It has a significant impact on daily life, and patients must be engaged in their disease and its treatment. In Sweden about 4%-5% of the population have diabetes [2], and the majority of these (85%-90%) have type 2 diabetes. The Swedish National Diabetes Register (NDR) was founded in 1996 with the purpose of improving diabetes care, reducing morbidity due to diabetes, and enabling comparisons between caregivers of a number of clinical outcome measures [3,4]. Currently, the NDR covers about 350,000 patients, constituting 85% of all Swedish diabetes patients. The registry has become an essential part of the daily routines of healthcare centers across Sweden. It is used as a tool for evaluating and improving diabetes care, but until recently has mainly recorded clinical outcomes only. In this article, we describe how the registry has started to collect patient-reported outcome measures, and how these data have been used together with registry data on clinical outcome measures as a basis for evaluating diabetes care.
The standard way of evaluating diabetes care is to monitor diabetes comorbidity risk factors or biomarkers such as glycated hemoglobin level (HbA 1c ) to keep them within defined treatment targets. While these clinical risk factors are important, they may not capture all relevant aspects of diabetes care evaluation. If the evaluation of diabetes care relies only on clinical outcomes, patient preferences are ignored. This is unreasonable, since health-related quality of life and welfare should naturally play a vital role. In this article, we attempt to add the patient perspective by including measurements of patients' abilities and patients' judgments of the diabetes care provided.
Previous research indicates that using clinical outcomes and patients' abilities and judgments simultaneously could improve evaluation of diabetes care, for example by reducing problems associated with the physician acting as a double agent by attempting to act both in the best interest of the patient, and in the interest of the funder of the healthcare services [5]. These two interests may be in conflict, and in the worst case may make the physician an imperfect agent for the patient. Further, there could be synergistic effects in outcomes. If patients are not fully informed, they tend to rely on their own experience rather than the physician's advice [6]. Good access to information should improve confidence and thereby reduce gaps between the patient's own opinion and the physician's advice. Hence, efforts that would lead to more positive judgment of the information provided by healthcare, presumably through actually improving how information is given to patients, might eventually lead to improved risk factor control.
Specifically in diabetes, there appear to be barriers to the routine clinical use of data on patients' abilities and judgments, including the lack of reference points to which these data can be compared, the tendency of low sensitivity to therapeutic change, and the fact that these data are patient ratings of their current health which do not have the same established link to future health outcomes that clinical risk factors have [7][8][9]. However, important aspects influencing living with diabetes include diabetes self-management ability [10], coping in everyday life [11], fear of hypoglycemia [12], and fear of future complications [13]. Furthermore, patients' beliefs and attitudes about the disease, cultural aspects, social support, co-morbidities, and financial resources have been identified as barriers to effective diabetes self-management [10]. These barriers need to be addressed to improve the quality of diabetes care, as they tend to obstruct self-management [10]. Another important aspect is how patients experience their contacts with their healthcare provider [14]. Diabetes care is responsible for offering patients evidence-based care which respects their preferences (i.e. person-centered care) [15]. The American Diabetes Association cites effective self-management and quality of life as essential outcomes to be evaluated as part of diabetes care [15]. Diabetes teams in Sweden believe they work as person-centered teams, but evaluations from different government studies have shown the opposite [16]. Therefore, it seems worthwhile to demonstrate successful collection and use of data on patients' abilities and judgments together with clinical outcome measures in diabetes.

Objectives
The overall purpose of the present work was to include patient reported outcomes in NDR's evaluation of diabetes care. We hereby hope to accomplish a more complete evaluation of diabetes care and its effects, in a way suitable for managing individual patients and for comparing healthcare providers (see discussion). There were three specific aims:  To develop a questionnaire so that patients' abilities and their judgments of diabetes care can be measured on a scale suitable for detecting and quantifying change.  To describe a Swedish diabetes patient sample in terms of their abilities and judgments together with registry data on their risk factors, and to study associations between different patient abilities, judgments and risk factors.  To characterize groups of patients with low abilities, low judgments, or high risk factor levels; namely, groups with a need for improvement. The intention here was to illustrate how the questionnaire can be used in practice, and to see what groups could be identified using ad hoc levels of abilities, judgments, or risk factors. In practice, detecting such patients might trigger some kind of response in the form of an intervention. In the following, we refer to these levels as ad hoc response levels.
With these three goals fulfilled, we can use both clinical outcomes and patients' abilities and judgments when we evaluate care. If we were to compare two groups of patients undergoing, for example, different modes of care or different treatments, and one of them appears better, then this would generate hypotheses about why one is more successful than the other. This would then need to be further analyzed to see whether there is some factor to be discovered that explains good and bad outcomes.
We used item response theory (IRT) in order to obtain values for the abilities and judgments, and to be able to detect changes in these over time (e.g., the effect of an intervention). This method has several advantages when successfully applied; it is used to estimate latent constructs that cannot be observed directly, such as patient abilities. It reduces a multidimensional set of questions down to a single estimate of the latent construct. It also gives an indication on how well the latent construct is estimated. Finally, it is more robust to missing values than using responses to the questions directly.

Previous Work
There are other studies of similar combinations of patient abilities or judgments and registry data. Glenngard et al. investigated whether general-population individuals seeking healthcare had made an active choice of healthcare provider, and also sought to identify determinants of this choice [17]. They worked with registry data combined with population survey data, aggregated on the municipal level, including patients' judgments of primary care in terms of service, information, and access. A patient's judgment of the care might affect the decision to choose another healthcare provider, but we wish to work with this judgment as part of the dialogue between the healthcare staff and the patient; that is, before such a decision is made. A study by Janlov and Rehnberg of persons in the general population seeking healthcare used a combination of registry data and patient-reported judgments of the quality of care [18]. Their aim was to use these data to determine measures of quality of care. Both these studies used aggregated data on the level of the municipality or healthcare provider, but neither focused on the field of diabetes.
Work similar to ours was presented by Lundstrom et al. [19], but their study was concerned with visual disability in cataract surgery patients. Their purpose was to assess and improve an existing questionnaire to obtain a measure of visual disability for measuring the outcomes of cataract surgery. They used the clinical measure of visual acuity to validate their visual disability construct, and overall found their questionnaire to be highly valid and suitable for routine clinical use.
A number of instruments to collect patient-reported outcomes measures have been developed for diabetes to date [20]. Often, generic health-related quality of life instruments have been used together with diabetes questionnaires addressing treatment satisfaction, symptoms, or obstacles related to having diabetes. Our pilot questionnaire was developed by a group of experts, and was inspired by questionnaires that existed at the time, the literature, and clinical expertise. We hope to show that it can be a useful complement to the registration of clinical risk factors in the evaluation of diabetes care.

Methods
Our conceptual model consists of a set of patient reported outcomes in the form of patient abilities and judgments of diabetes care, and a set of risk factors. We chose to work with a number of abilities and aspects of judgment of the provided diabetes care, that we considered important.
The patients' abilities were:  Diabetes self-management or self-care ability (items 1-4): the patients are able to manage their diabetes treatment, or take any other necessary actions, by themselves. Patients with a good ability should be less limited by diabetes, and thus be more able to live the life they desire. This is an important determinant of diabetes-related quality of life [10,11,21,22]. Poor self-management has also been found to be associated with depression [23,24].  Sense of security despite having diabetes (items [23][24][25]. This ability could also be described as being free of diabetes-related worries. This is an important determinant of good health-related quality of life [12,13].  Ability to carry out daily activities such as social activities (items 29-30) and work-related activities (items [26][27][28]. A good ability indicates a well-functioning patient who has overcome any limitations imposed by the disease, which is an important determinant of good health-related quality of life [25][26][27][28]. Further, the ability to carry out work-related activities is a determinant of costs from a societal perspective as it affects costs of lost production due to illness [29].
From the literature [10,[30][31][32][33], we know that important judgments of diabetes care are:  The quality of the service provided (items 9-10, 19-22); for example, did the staff give the patient a good reception? Did they understand the patient's situation? This would affect the patient's confidence in healthcare and their compliance to treatment and lifestyle advice. Good communication should also make healthcare receptive to the individual needs of the patient.  The quality of the information given by the staff (items [11][12][13][14]. The visit to healthcare should be an important source of diabetes-related information for the patient. Poorly informed patients may do worse, as described above.  Access to physicians and nurses (items 5-8). Is it easy for the patients to get an appointment with their physician or diabetes nurse when they wish? A timely response to any problem that occurs should be beneficial for the patient.  Involvement in decisions regarding diabetes care (items 15-18), for example regarding medical treatment or diabetes self-care. This should build up the patient's confidence in healthcare and could be a cornerstone of developing their self-management ability.
We developed a patient questionnaire in order to measure these patient-reported outcomes, several of which are listed by the World Health Organization as important factors for a person's functioning, disability, and health [34].
In addition, we chose to work with a small set of central risk factors, namely HbA 1c , systolic blood pressure (SBP), and low-density lipoprotein cholesterol level (LDL). These have been shown to be predictive of cardiovascular disease in diabetes patients [8,9], and are associated with specific treatment targets, such as those included in Swedish diabetes guidelines: HbA 1c < 52 mmol/mol, SBP < 130 mmHg, and LDL < 2.5 mmol/L [35].
The abilities, judgments, and risk factors that we have listed here represent our view of the patients and their situation. Evaluation of diabetes care alternatives (e.g., treatments) based on these outcomes could offer a way to optimize the patients' outcomes. The outcomes we have listed do not necessarily constitute a complete list of measures worthy of optimizing, but they provide a manageable set of central measures suitable for trying out our ideas. Further development of our work might include additional measures.

Patient Questionnaire
We developed a diabetes-specific questionnaire with a total of 36 items. Thirteen items (1-4, 23-30, 36) were adapted from the DTSQ, Swe-DES-23, Swe-PAID-20, and SF-36 questionnaires to address self-care/self-management, self-perceived health, diabetes-related worries about metabolic control and complications, and daily social and work-related activities [36][37][38][39]. Nine additional items were constructed to address the patient's judgment of service provided, information given, access to physicians and nurses, and their own involvement in treatment decisions. Organization can vary between healthcare centers, particularly in terms of whether care is provided by physicians or nurses. Therefore these items asked both about physicians and about nurses, making a total of 18 items (items [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. Furthermore, five items addressing mobility, self-care, usual activities, pain/discomfort, and anxiety/depression from the EQ-5D form were included as a known reference instrument (items 31-35) [40]. All resulting items have five response levels, except those from EQ-5D which have three levels. All items have ordered response levels. The questionnaire was issued to patients in Swedish, but the descriptions here of the items and response levels are translated into English (see Appendix).

IRT Model
The responses to the patient questionnaire are ordered response levels on a number of questions related to each of the underlying abilities and judgments; for example, "Strongly disagree", "Agree", and so on. These were converted to numerical estimates using item response theory [41,42], a technique to estimate an underlying value, or latent variable, which cannot be observed directly. In our case, the patient abilities and judgments are considered latent variables. Items are grouped into scales, each related to one latent variable. IRT uses a scale model of the relationship between the latent variable and the responses to the items. Individual IRT scores are estimated from each patient's response patterns. The purpose of IRT is to obtain an estimate that has interval properties, so that the difference between two points on the scale gives a meaningful quantity. This is not achieved with raw summary scores based directly on the responses, such as the sum of the response levels that a patient has chosen for a set of items, as summing in this way does not take into account whether a given level on one item is more difficult to endorse than that of another item. IRT allows items, as well as item response levels, to have different difficulties.
We grouped the items into scales reflecting each of the abilities and judgments listed above. The definition and review of each scale, fitting a scale model, and review of its fit was undertaken according to a number of steps until a final set of scales was obtained. First we used non-parametric IRT in order to check for unidimensionality, local dependency, and latent monotonicity [43]. As another check of unidimensionality, a parallel factor analysis was performed on the item set of each scale [44,45]. Scales indicating the presence of more than one factor were subjected to factor analyses to identify the items belonging to a second factor.
We then proceeded with parametric IRT using a graded response model (GRM). The choice of a parametric scale model for polytomous items appears somewhat arbitrary, and often different choices produce nearly identical results [44,46]. In a GRM [47], the probability P of a response in category k or lower, in item i, given the latent variable z, is given by log(P/(1 − P)) = D i (z − E ik ), where D i is the discriminant parameter and the E ik are the extremity parameters. For k = 5, P is the complement to the other (k = 1,2,3,4). The GRM assumes local independence and latent monotonicity. We evaluated manifest monotonicity, which is a pragmatically valid test for latent monotonicity [43]. Next, GRMs were fitted and tested for equal discrimination [41]. The fit of items was evaluated with S − X 2 [48], and by comparing probabilities of endorsing an item under its scale model to the observed proportions, since the risk of wrongly flagging items for misfit increases with the number of observations, especially in scales with few items [49]. Overall model fit was evaluated using the root mean square error of approximation (RMSEA) index based on the M 2 statistic [50]. Hooper et al. present a review of various rules of thumb for the magnitude of RMSEA [51], and Milfont and Fischer present similar figures [52]. We adapted this by accepting a point estimate of RMSEA ≤ 0.10 as a fair fit. We examined differential item functioning [41] (i.e. whether items are equivalent in meaning to different respondents) between diabetes types and between young and old patients split by median age. We also examined whether any differential item functioning had meaningful impact [41,44].
For the judgment of care, we began with separate scales for nurses and for physicians. Depending on how care is organized at a healthcare center, either or both could be relevant. From the patients' point of view, however, it might not matter whether care is provided by physicians or nurses, and so we also constructed combined scales by taking, one item at a time, the least desirable of the patient's responses regarding nurses and physicians respectively. Items 1-30 and 36 were recoded so that the value 1 represented the least desirable response level for the patient, and higher values represented more desirable response levels. The ltm package was used to obtain IRT scores from complete or partially complete response data. Latent variables for each respondent were estimated, and we subsequently transformed them to a scale from 0 (least desirable) to 100 (most desirable).
Another diabetes patient sample from NDR was used for validation of our IRT approach. We fitted the same GRMs to the responses in the second sample, and used these GRMs to estimate new IRT scores using the response patterns in the main sample. These were then compared to the originally estimated IRT scores. The correlations and pairwise differences between the new and original scores on each of the abilities and judgments were used to study their agreement.
The latent variables, the EQ-5D Index, the risk factors, and other registry data in the patient sample are described using summary statistics. In order to explore how IRT scores vary with other patient characteristics, we compared groups using the Kolmogorov-Smirnoff test of the empirical distribution functions (EDF) of IRT scores (null hypothesis identical distributions, against different distributions; two-sided). The groups in the tests were men versus women, and patients who responded poor versus excellent to self-rated health in general (item 36). For risk factors, age, and diabetes duration, we compared the groups of patients below the first quartile versus above the third quartile. Correlations between IRT scores, patient covariates, and risk factors were estimated using Spearman's rho. A p-value of <0.0001 was considered to indicate a significant difference in distribution of IRT scores (adjusted for mass significance). We also carried out tests for correlations significantly different from zero. The test on EDFs and the correlations measure different concepts, a test on EDFs detects difference in distribution, for example different shape, scale, or location.
In our analysis of ad hoc response levels, we tested if low levels of abilities or judgments were associated with deviations in risk factors, or if high risk factor levels were associated with deviations in abilities and judgments, compared to the overall sample, in order to see whether some of the collected data might be redundant. Subgroups of patients were defined as the patients with a value less than the tenth percentile of each of the abilities and judgments. Further subgroups were defined as the patients with a value above the ninetieth percentile of each risk factor. If the empirical distribution functions of an IRT score in one such subgroup differed from the overall sample (by diabetes type) with a p-value < 0.05, we flagged it as an association. We used this significance level only as a flagging device to identify potential associations that might be worthy of future investigation.
The review of scales, fitting of GRM models, estimation of latent variables, and further analyses were carried out using software written in the R language [54], namely the mokken [43], ltm [55], and psych [45] packages, our own R code, and IRT Pro [56].

Material
The NDR has been carrying out projects since 2003 to monitor and improve the quality of diabetes care in Sweden. One such project started in February 2008 and ended in November 2009. Primary care centers and hospital out-patient clinics connected to the NDR were invited to participate, and the first 26 to accept the invitation were included [57,58]. From each of the participating health care centers, we selected all diabetes patients who had visited the center between 1 January and 30 May 2008, and who were aged 18 to 80 years, not recently diagnosed with diabetes (<6 months ago), alive, and not living with protected identity. This was our main sample. In total, 4760 patients were selected for inclusion. The patient sample was representative for the patient population covered by the NDR, and is presented in greater detail elsewhere [57,58]. The questionnaire was sent by mail to the selected patients between June and August 2008, and a reminder was sent to non-responders after two months. A total of 2916 patients (61%) responded and were included in the analysis (Table 1) The NDR keeps longitudinal records of individual patient data entered by healthcare centers. These records include patient characteristics, type and duration of diabetes, risk factors, diabetes treatment, and more. Each person's data are associated with a unique identification number, which enables registry data to be deterministically linked on the individual level with patient-reported outcomes measures. For our study patients, risk factors were extracted from the NDR together with patient covariates such as age, gender, diabetes type and duration, and diabetes treatment, by extracting the most recent record in the NDR from the one-year period before issuing the questionnaire. Almost all patients (98%) had values for HbA 1c and systolic blood pressure, and about 77% of type 1 diabetes patients and 65% of type 2 patients had a value for LDL cholesterol. The availability of values was roughly stable between treatment groups. Notes: presented as mean (SD, min-max), or percent (%) of patients.
After the first 26 centers, 12 additional centers participated in a subsequent NDR project in which the same questionnaire was used [59]. A validation sample was formed from this subsequent project, with questionnaire data and registry data extracted as described above. This sample consisted of 1656 type 1 diabetes patients and 1431 type 2 patients (Table 1), corresponding to a response rate of 60%.
Ethical approval for this study was given by the Regional Ethical Review Board of Gothenburg, Sweden.

IRT Scales
Self-management had to be split into two scales, with knowledge and skills-related items in one scale, and satisfaction and stress-related self-management items in another scale. Item 23 showed local dependence and was removed from the Sense of Security scale. The other scales remained as initially constructed (items 1-30, below). As expected, given the clinical differences between type 1 and 2 diabetes, separate scale models had to be fitted for type 1 and type 2 diabetes. Item 36, self-reported health in general, was not used in any scale and was instead used ad hoc as a stratification variable. The remaining items (31)(32)(33)(34)(35) were used to determine the EQ-5D Index. See Appendix for details of the review, and the validation of scales.
The item response category characteristic curves for item 24 (Sense of Security) are shown in Figure 1 for type 1 and 2 diabetes. Values of the latent variables are not directly comparable between diabetes types due to separate scale models, and the numerical difference in IRT score that stems from an item due to differential item functioning (the horizontal shift in Figure 1) may or may not correspond to a difference in the actual ability. Our scales do not have any natural reference points, so there will be no absolute reference levels for the latent variables (see Discussion). However, based on the phrasing of the items and their response levels, an item-person map can give meaning to certain ability levels ( Figure 2). Further, a unit on a given IRT scale has the same meaning across the whole scale, but a unit on another IRT scale may have a different meaning. Hence, for example, we could say that a given ability is greater in one patient than in another, but we cannot say if the value of any given ability is different from the value of any other ability.

Patient Abilities and Judgments, EQ-5D Index, and Some Determinants
Some of the IRT scores varied by demographic characteristics according to EDF significance tests. Younger patients with type 1 diabetes had lower Self-Management Ability and Service & Information with EDF curves more to the left (Figure 3), and lower Access (data now shown). In type 2 diabetes, younger patients had lower Sense of Security and Access (Figure 3), and lower Self-Management Ability and Service & Information (data now shown). Neither gender nor diabetes duration had any influence on any of the IRT scores. Better self-rated health in general was associated with higher values of every IRT score, in both diabetes types, with the exception of Access in type 2 diabetes.
With regard to risk factors, patients with high HbA 1c had lower Self-Management Skills and Self-Management Ability in type 1 diabetes, whereas in type 2 diabetes, a high HbA 1c was associated with lower Sense of Security and Social Activities (Figure 4, Table 2). Neither SBP nor LDL had any influence on any IRT score, except that SBP was positively correlated with Self-Management Ability in type 1 diabetes ( Table 2). The correlations indicated broadly the same associations as the EDF tests between risk factors, abilities and judgments, with the above exception and that HbA 1c was negatively correlated with Self-Management Skills and Ability in type 2 diabetes (Table 2).
Men had higher EQ-5D Index than women in both type 1 and 2 diabetes. Type 1 patients with lower age or with a shorter diabetes duration had higher EQ-5D Index than the overall sample. In both diabetes types, a better self-rated health in general was associated with a higher EQ-5D Index. In type 1 diabetes, those with high HbA 1c had lower EQ-5D Index, and both low SBP and high SBP were associated with lower EQ-5D Index than overall. The correlation between SBP and EQ-5D Index was not significant ( Table 2). The EQ-5D Index was positively correlated with all patient abilities and judgments except Access in type 1 diabetes.
Thus, patient abilities and judgments were mainly associated with self-reported health in general and the EQ-5D Index, to some extent with age, but they were only occasionally associated with risk factors. Further, among the abilities and judgments, the correlations were relatively weak: at most 0.62 between abilities and 0.70 between judgments in type 1 diabetes, and at most 0.73 between abilities and 0.76 between judgments in type 2 diabetes ( Table 2). The highest correlation between a risk factor and an abilities or judgment was 0.26 (i.e. quite weak). Notes: The distance between the curves shows how the scores differ between the age groups. Empirical distribution function Fn(x) = probability of ( IRT score ≤ x ), so a curve more to the left indicates lower IRT scores. Age groups defined as below the first quartile, and above the third quartile, respectively. Notes: The distance between the curves shows how the scores differ between the HbA 1c groups. Empirical distribution function Fn(x) = probability of (IRT Score ≤ x ), so a curve more to the left indicates lower IRT scores. HbA 1c groups defined as below the first quartile, and above the third quartile, respectively.

Analysis of Ad Hoc Response Levels
We examined whether low levels of abilities and judgments were associated with different risk factors levels than in the overall sample. In type 1 diabetes, patients with low Self-Management Skills or low Social Activity had higher HbA 1c than the overall average, and patients with low Self-Management Ability had higher HbA 1c and lower SBP. In type 2 diabetes, patients with low Sense of Security had higher HbA 1c , and those with low Self-Management Ability or low Service & Information had lower SBP. No association was seen between low IRT scores and deviations in LDL.
We then examined whether high risk factor levels were associated with different abilities or judgments than in the overall sample. In type 1 diabetes, high HbA 1c values were associated with lower Self-Management Skills, Self-Management Ability, Sense of Security, and Involvement, than the overall average. In type 2, high HbA 1c values were associated with lower Self-Management Ability, Sense of Security, Social Activity, Work Activity, and Service & Information than the overall average. There was no association between high LDL and deviations in IRT scores, nor between high SBP and deviations in IRT scores. Table 2. Correlations (Spearman's rho) between latent variables, risk factors, and EQ-5D Index. Notes: S-M = Self-management, * = significant correlation, p < 0.0001.

Discussion
We have developed a questionnaire to collect patient abilities and judgments of their experience with diabetes care. The questionnaire was used in a large sample of diabetes patients. We have applied IRT on the data to derive scales for these abilities and judgments. Most of the scales were found to be adequate, and they validated well across two different samples of diabetes patients. Thus we can now use our questionnaire to measure these abilities and judgments in the form of IRT scores.
Secondly, we analyzed these IRT scores together with the patients' demographic characteristics and risk factor levels. Some IRT scores varied by age; for example, younger patients reported worse experience with diabetes care, and worse Self-Management Ability. On the other hand, gender and diabetes duration did not appear to be determinants of the IRT scores. Of the risk factors, HbA 1c was the main factor to show associations with IRT scores; patients with high HbA 1c reported poor Self-Management Skills and Ability in type 1 diabetes, and poor Sense of Security and Social Activities in type 2 diabetes.
We further characterized subgroups of patients with a need for improvement to illustrate practical use of the questionnaire, and also examined whether low levels of abilities and judgments were associated with different risk factor levels than in the overall sample. In type 1 diabetes, for example, we saw that patients with Self-Management Skills had higher HbA 1c than the overall sample, and poor Self-Management Ability was associated with low SBP. However, the causal relationship could be in any direction; we cannot say if patients rated their skills poorly because their high HbA 1c was obviously a problem, or whether their HbA 1c was high because of poor Self-Management skills. In type 2 diabetes, patients with low Sense of Security had higher HbA 1c , but again, the causal relationship was uncertain. We found no evidence that patients with low IRT scores differed from the overall sample with regards to LDL. When we looked at this in the other direction, to see if high risk factor levels were associated with different levels of abilities and judgments, we only saw associations with HbA 1c ; for example, a high HbA 1c level was associated with poor Self-Management Ability and poor Sense of Security in both diabetes types. We found no evidence that patients with high LDL or patients with high SBP differed from the overall sample with regards to IRT scores. High LDL and high SBP often give no symptoms, unlike high HbA1c levels which negatively affects how you feel, so the findings appear logical from that perspective. Looking for patterns going both ways, we found that high HbA 1c , low Self-Management Skills, and low Self-Management Ability were associated in type 1 diabetes, while high HbA 1c and low Sense of Security were associated in type 2 diabetes. Overall, it seemed that patients' abilities and judgments were not generally associated with the risk factors, and when they were, the association differed between diabetes types. One exception was that patients with low Self-Management Ability had lower SBP in both diabetes types. This might indicate that patients treated for high SBP (and who had, as a result, a lower SBP) were more ill and therefore reported a worse ability. Thus, our IRT scores appear to give one picture of a patient's situation, and the risk factors another picture, with only partial overlap between the two. Risk factors alone will not tell the healthcare provider how a patient is feeling, and thus are poorly sensitive to health-related quality of life. Conversely, a patient could feel well but still have high levels of risk factors, and thus be at risk of avoidable diabetes complications unless risk factors are considered too. We argue that patients' abilities, judgments and risk factors all provide important information to decision makers; the risk factors are useful for prognoses regarding future outcomes and costs, and the patients' abilities and judgments reflect patient preferences. The use of both data sources could achieve a more complete evaluation of diabetes care and promote person-centered care.
There are a number of aspects of our study that deserve special attention. One important aspect is the interpretation of the scales and the IRT scores. We cannot make direct comparisons of the latent variables between diabetes types because they have separate scale models. On the other hand, it is logical from a clinical point of view to have separate scales for the two diabetes types, since what is possible to accomplish for a type 1 patient may be very different from what is possible for a type 2 patient. Our scales lack natural (fixed) reference points to which we could assign some reference levels in the same way as, for example, weight, where 0 represents absence of weight. Nonetheless, we have developed a set of scales which can be used to collect data on patients and eventually follow up their outcomes to determine empirical reference levels. We have the data to describe a Swedish population of type 1 and 2 diabetes patients. Our dataset is fairly unique, in that we have data on patient abilities and their judgments of diabetes care deterministically linked with medical outcome measures from a National Diabetes Registry, on the individual patient level, from about three thousand individuals in our main sample and another three thousand in the patient sample used for validation. Taking this as a baseline, we have to date roughly 20,000 patient-years of follow-up.
We cover a number of dimensions judged as important for the diabetes patient. There might be further dimensions that need to be included to make the scope more complete, and work on this is ongoing at the NDR. However, we have now taken this first step of analyzing a combination of patients' abilities, patients' judgments, and risk factors. This could be extended in future work, e.g. by assigning weights to these various dimensions to obtain a single measure of value. This would simplify comparisons between e.g. healthcare providers or modes of diabetes care. It would be interesting to determine such weights, for instance by relating the various dimensions to patients' own valuations, or by relating them to registry outcomes later on, like survival or occurrence of complications. Knowing these weights could help prioritizing between interventions on, say, a poor Sense of Security, a high risk factor value, or indeed on both.
We derived the EQ-5D Index, and we asked about self-rated health in general. It could be asked whether one of these measures would be sufficient, as they were both correlated with most of the IRT scores. However, they do not reveal which specific aspect is poor. We argue that our set of IRT scores, linked with registry data on risk factors, offers more detailed and disease-specific information that we can use for studying and improving the diabetes patient's situation, because it captures more dimensions and has a greater ability to measure changes. On the other hand, our approach is disease-specific, so it cannot be used to make comparisons across diseases in the same way as a generic instrument such as the EQ-5D, and this is an important limitation. However, the instruments can easily be used together as we have done, and we do have the EQ-5D Index and self-rated health in general too.
There are a couple of technical details that should be mentioned. We defined subgroups with poor IRT scores and risk factors using arbitrary thresholds. This serves to illustrate the concept of using a level that would trigger some kind of response (i.e. an intervention). However, the actual thresholds could be defined differently. Some of our scales have very few items, and this affects their measurement properties and their robustness to missing values. Item 25 in the Feeling Safe scale displayed misfit, but it was kept because the scale contains only two items. New or additional items might be needed, but may conflict with having a short questionnaire that can be filled in without too much effort. Further, our Work Activities scale did not work well in any diabetes type, and requires further development. A revision of the questionnaire is ongoing at NDR. Additional scale properties are discussed in the appendix.
We see several potential implications of our work that should be beneficial. Optimizing patients' abilities and judgments should result in improving their health-related quality of life, diabetes self-management abilities, and experience of diabetes care. Reasons for poor risk factor control may be found by analyzing patient abilities and judgments, and this can guide caregivers in helping patients reach their treatment targets and thereby reduce their risk of future complications. In connection to a healthcare visit, estimates of a patient's abilities could be obtained and used in the dialogue between the physician and the patient along with the risk factors, to decide on further treatment, care or support. Judgments of the provided care might be a bit sensitive in a face-to-face meeting. But on the healthcare provider level, judgments as well as abilities and risk factors could be used to compare healthcare production between providers or between diabetes care alternatives, since the patient's judgments of their experience with diabetes care ought to be useful intermediate endpoints in optimizing the interaction between the patient and the healthcare provider.
We believe there is a strong health-economic value here that translates into avoided morbidity and thereby avoided costs for society and avoided burden for the patients.

Conclusions
We have successfully collected data to estimate patient-reported outcome measures in the form of patient abilities and judgments of their experience of diabetes care. Analysis of these together with risk factors showed that the different types of data describe different aspects of a patient's situation. These aspects occasionally overlap, but not in any particularly useful way. They both provide important information to decision makers and none is necessarily more relevant than the other. Both should therefore be considered, to achieve a more complete evaluation of diabetes care and to promote person-centered care.

Author Contributions
Sixten Borg researched the data, selected statistical methods and performed all statistical analyses, contributed to the discussion and wrote and edited the paper. Bo Palaszewski contributed to the statistical methods and analyses, the discussion and reviewed the paper. Ulf Gerdtham contributed to the discussion and the statistical theoretical framework and reviewed the paper. Fredrik Ödegaard participated in the discussion and reviewed the paper. Pontus Roos outlined the general methodological approach and developed the questionnaire (see Acknowledgements). Soffia Gudbjörnsdottir started this project and was responsible for the Diabetes perspectives in the work, and contributed to the discussion and reviewed the paper.
self-management knowledge and skills-related items were placed in one scale, and satisfaction and stress-related self-management items were placed in another scale, overall fit improved. M 2 indicated a poor overall scale model fit for all scales, but RMSEA indicated a fair fit in all scales except Work Activity for both diabetes types. The Self-Management Skills and Self-Management Ability scales just barely indicated a fair fit for type 2 diabetes (Table A1). Differential item functioning was detected with regard to diabetes type, and separate scale models were fitted for each type (Table A2); both types had the same items in every scale, but each diabetes type had its own set of item parameters. For the scales for judgments that had identical pairs of items for nurses and physicians, we selected the combined scales as their overall model fit showed an improvement over the separate nurse and physician scales. The fitted GRMs for the scales are presented in Table A2. Table A1. Summary of scale properties for the final scales: number of complete cases (N) and root mean square error of approximation (RMSEA) with 90% confidence intervals (CI), based on M 2 , by type of diabetes. Table A2. Fitted graded response models for type 1 and type 2 diabetes, respectively (discriminant parameter D i and extremity parameters E ik ), for Self-Management Skills (SM 1), Self-Management Ability (SM 2), Sense of Security (SS), Social Activities (SA), Work Activities (WA), Access (A), Service & Information (SI), and Involvement (I).

Type 1 Diabetes Type 2 Diabetes
Scale Item  * The latent variables are scaled from 0 to 100.