HLS19-NAV—Validation of a New Instrument Measuring Navigational Health Literacy in Eight European Countries

To manoeuvre a complex and fragmented health care system, people need sufficient navigational health literacy (NAV-HL). The objective of this study was to validate the HLS19-NAV measurement scale applied in the European Health Literacy Population Survey 2019–2021 (HLS19). From December 2019 to January 2021, data on NAV-HL was collected in eight European countries. The HLS19-NAV was translated into seven languages and successfully applied in and validated for eight countries, where language and survey method differed. The psychometric properties of the scale were assessed using confirmatory factor analysis (CFA) and Rasch modelling. The tested CFA models sufficiently well described the observed correlation structures. In most countries, the NAV-HL data displayed acceptable fit to the unidimensional Rasch partial credit model (PCM). For some countries, some items showed poor data–model fit when tested against the PCM, and some items displayed differential item functioning for selected person factors. The HLS19-NAV demonstrated high internal consistency. To ensure content validity, the HLS19-NAV was developed based on a conceptual framework. As an estimate of discriminant validity, the Pearson correlations between the NAV-HL and general health literacy (GEN-HL) scales were computed. Concurrent predictive validity was estimated by testing whether the HLS19-NAV, like general HL measures, follows a social gradient and whether it forms a predictor of general health status as a health-related outcome of general HL. In some countries, adjustments at the item level may be beneficial.


Introduction
The ability to navigate the health care system (HCS) is increasingly important owing to more complex HCSs with their various sectors and myriad organizations [1][2][3][4][5][6][7]. Complexity and fragmentation of HCSs lead to a lack of transparency and increased demands on patients and users to access the right care, at the right time, at the right place [8][9][10][11][12][13][14]. Navigating the HCS is also challenging because patients and users are increasingly expected to take responsibility for their own health and healthcare and thus face the challenge of independently gathering and managing health information from a wide range of services and information sources [15,16]. However, numerous patients and users lack the required skills and information [17][18][19][20]. As a result, they may endure a tiring odyssey through the maze of the HCS, getting lost, hitting dead ends, and experiencing delays in diagnosis or treatment, which is also related to low satisfaction with, and even low trust in, the HCS and health professionals [21][22][23].
To avoid such consequences, navigating the HCS requires health literacy (HL), or more specifically, navigational health literacy (NAV-HL), which is defined as "people's knowledge, motivation, and skills to access, understand, appraise, and apply the information and communication in various forms necessary for navigating health care systems and services adequately to get the most suitable health care for oneself or related persons" [24] (p. 6). In particular, this applies to frequent users of the HCS and their significant others, especially to people with chronic illness who naturally have more health care needs; needs that change frequently, become more complex over time, and which may include multiple layers of health care [25][26][27][28][29]. In consequence, they are particularly exposed to the HCS and must continually acquaint themselves with new health care settings and services, making them especially reliant on adequate NAV-HL [30]. NAV-HL may also matter since low HL has been identified as a barrier for understanding and using the HCS, as supporting an inappropriate use of health care services, and as a cause of potentially higher health expenditure [31,32].
However, following the relational model of HL, which describes HL as the result of individual skills and abilities but also of the demands and complexity individuals face in dealing with health information [33], the low availability and comprehensibility of navigation-related information led to the hypothesis that NAV-HL is poorly developed in general populations. Navigation-related information usually meets criteria relevant to organizations within the HCS but does not sufficiently consider challenges faced by its users [30,34,35], i.e., it is characterized by low user orientation and usability.
Nevertheless, as has been pointed out elsewhere [17,24], so far, little is known about the NAV-HL in general populations. In HL research, HCS navigation has received infrequent or irregular attention. Only a few published studies exist. For example, in an exploratory study of hospital navigability, Rudd [36] referred to hospitals as complex "literate environments" in which it is difficult for users to find the right point of contact. In another study, Rudd et al. [37] (p. 8) analysed literacy tasks regarding "rights and responsibilities, application for insurance and other coverage plans, and informed consent for procedures and studies" and labelled "systems navigation" as one of five domains of the Health Activities Literacy Scale (HALS) [37][38][39]. While these latter studies mainly emphasize the functional domain of HL, Osborne et al. [40] operationalized HCS navigation as one of nine subscales of their Health Literacy Questionnaire (HLQ). However, the HLQ navigation dimension reflects only to a limited extent the comprehensive definition of HL of Sørensen et al. [17,24,30,41].
Apart from this small number of existing studies, there are no studies introducing instruments on HL in the specific field of navigating the HCS [17]. For this reason, a new instrument measuring NAV-HL, the HLS19-NAV, was developed as part of the European Health Literacy Population Survey 2019-2021 (HLS19) [17,24]. The HLS19-NAV was applied for the first time in HLS19. Data on the scale were collected in eight countries (Austria (AT), Belgium (BE), Czech Republic (CZ), France (FR), Germany (DE), Portugal (PT), Slovenia (SI), Switzerland (CH)) by using different methods of survey data collection.
This article is part of a series of already published [42] and upcoming papers introducing new HL tools that have been developed, applied, and tested in the HLS19 study. In general, the aim of these articles is to use the data collected in HLS19 to examine the psychometric properties of the newly developed HL tools and different aspects of their validity. To derive overarching and comparable conclusions about the HLS19 tools, the individual papers address similar research questions and partly use the same data and analysis procedures. This article addresses the following research questions:
1. How well does a single-factor, as compared to a two-factor, confirmatory factor analysis (CFA) model describe the correlation structure of the HLS19-NAV data?
2. To what extent do the data fit the unidimensional Rasch model?
3. What is the impact of using dichotomous or polytomous data on the psychometric properties of the NAV-HL scale?
4. How well does the instrument fulfil aspects of content and face validity and of construct validity measured as discriminant and concurrent predictive validity?
To answer these research questions, some of the analyses based on dichotomous data of the HLS19-NAV scale, taken from the respective chapter of the "International Report on the Methodology, Results, and Recommendations" [17], are presented and supplemented by new analyses based on polytomous data.

Development of the HLS19-NAV
The working group for the development of the HLS 19 -NAV was mainly led by German researchers as part of the German national health literacy survey (HLS-GER 2) with which Germany participated in HLS 19 [43]. Prior to developing the HLS 19 -NAV, a scoping review of definitions, concepts, and tools related to HCS navigation was conducted. From the literature, three aspects were identified: Information tasks relevant for navigating the HCS at the system level (how the HCS is organized, how it functions and works), at the organizational level (how to choose a suitable health care organization, how to use it and find one's way), and at the interactional level (how to interact with health professionals and organizations to negotiate health care paths and settings). Initially, 15 items were developed and tested in an expert review (n = 6). Item comprehensibility and interpretation were evaluated by using four focus group discussions, each with eight participants, and later by using cognitive interviews (n = 33). Further details are available in Griese et al. [24]. The 12 items selected for the final measurement scale (Table 1) reflect the four cognitive operations access (3 items), understand (3 items), appraise (3 items), and apply (3 items), for information on navigational issues. Polytomous responses were collected by using a four-point rating scale (4 "very easy", 3 "easy", 2 "difficult", 1 "very difficult").

Translation Procedure
For this study, the HLS19-NAV was translated into seven languages (Czech, Dutch, French, German, Italian, Portuguese, and Slovenian). Two forward translations were performed in SI and BE (Dutch version), one by the responsible study team and one by a national data collection agency. Countries with common languages collaborated in the translation process: In AT, CH, and DE, one forward translation for the German version was conducted by the national researchers and one by the German national data collection agency. The two versions were compared, and consensus was reached within the AT, CH, and DE study teams. The German version was translated into French and Italian by the language service of the Swiss Federal Office of Public Health (FOPH), reviewed by different experts, and agreed upon by the BE, FR, and IT study teams. One forward translation was conducted in CZ and PT. Additional backward translations were performed by the CZ and SI teams [17].

Data Collection
The HLS19 survey collected cross-sectional data in 17 countries within the wider WHO European Region (HLS19 Consortium 2021), of which 8 countries (Table 2) collected data on all 12 items of the optional HLS19-NAV scale. Four survey methods were available: computer-assisted personal interviewing (CAPI), pen-and-paper personal interviewing (PAPI), computer-assisted telephone interviewing (CATI), and computer-assisted web interviewing (CAWI). Three countries used a combination of methods (Table 2). Most countries used a multi-stage random sampling or quota sampling procedure (Table 2). The total sample size was smallest in BE (n = 1000) and largest in SI (n = 3360) (Table 2) [17].
Note to Table 2: AT = Austria, BE = Belgium, CH = Switzerland, CZ = Czech Republic, DE = Germany, FR = France, PT = Portugal, SI = Slovenia; CAPI = computer-assisted personal interviewing; CATI = computer-assisted telephone interviewing; CAWI = computer-assisted web interviewing; PAPI = pen-and-paper personal interviewing. No analysis is reported for the small, self-administered paper-and-pencil sample (n = 12) in SI.

Other Variables Included in the Analysis
Among sociodemographic and socioeconomic variables, gender, age (in years), self-assessed social status (from 1 "lowest self-assessed social status" to 10 "highest self-assessed social status") [44], the highest level of completed education (lower secondary education or below: ISCED 0-2; higher secondary education: ISCED 3; above secondary education: ISCED 4-8) [45,46], and a self-assessment item on difficulties in "paying all bills at the end of the month" (four response categories from 4 "very easy" to 1 "very difficult") were included to describe sample characteristics. Additionally, data on respondent employment status (employed vs. unemployed or retired) and self-reported general health status ((very) good vs. fair or (very) bad) were used to test for differential item functioning (DIF). For DIF analyses, variables on education (ISCED 0-3 and 4-8) and social status (levels 1-4 and levels 5-10) were dichotomized and various age categories were computed [47]. Variables entered in the regression model include the NAV-HL score (0-100), age, education (ISCED 0-8), self-assessed social status (1-10), financial deprivation (4 categories, from no deprivation (0) to severe deprivation (100)) and self-reported general health status [17].
General health literacy (GEN-HL) was measured using the HLS 19 -Q12 self-assessment scale, which is a 12-item revised short form of the HLS-EU-Q47 [17,48]. The HLS 19 -Q12 captures a comprehensive and public health-oriented concept of HL by operationalizing a conceptual framework of three health domains (health care, disease prevention, health promotion) combined with the previously mentioned cognitive operations (to access, understand, appraise, and apply health information). Using the same four-point rating scale as the HLS 19 -NAV (4 "very easy", 3 "easy", 2 "difficult", 1 "very difficult"), the HLS 19 -Q12 measures perceived difficulties in accomplishing HL tasks.

Analysis
The HLS 19 study is based on the HLS-EU study from 2012 [18], which proposed a standardized sum score using polytomous responses ("very easy", "fairly easy", "fairly difficult", "very difficult"). However, HLS 19 scale scores were calculated using dichotomized or rescored data (the "very easy"/"easy" and "very difficult"/"difficult" responses were combined, respectively). Since the initial results reported in the International Report of HLS 19 [17] were based on rescored or dichotomized data, one aim of this article is to explore how rescored data affect the psychometric properties of the HLS 19 -NAV compared to using the original polytomous responses. Analyses on psychometric properties are based on CFA and Rasch modelling.
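The rescoring step described above can be sketched in Python. This is a minimal illustration, not the consortium's scoring code: the function name `nav_hl_score` is hypothetical, and the score definition (percentage of answered items rated "easy" or "very easy", computed only when at least 80% of the items are answered, the threshold mentioned in the Results) is an assumption.

```python
from typing import Optional, Sequence


def nav_hl_score(responses: Sequence[Optional[int]],
                 min_valid: float = 0.8) -> Optional[float]:
    """Dichotomous 0-100 score from 4-point responses.

    Responses are coded 4 = "very easy" ... 1 = "very difficult";
    None marks a missing answer. Assumed rule: the score is the
    percentage of answered items rated 3 or 4, and it is undefined
    when fewer than `min_valid` of the items were answered.
    """
    valid = [r for r in responses if r is not None]
    if not responses or len(valid) / len(responses) < min_valid:
        return None
    # "easy"/"very easy" are combined into 1, the other two categories into 0.
    easy = sum(1 for r in valid if r >= 3)
    return 100.0 * easy / len(valid)
```

A polytomous variant would keep the four categories instead of collapsing them, which is the comparison this article pursues.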
Concerning CFA, a factor model reflecting the NAV-HL framework was computed. The NAV-HL framework [24] describes three domains or levels referred to as the system level (items HLS19-NAV1-5), organizational level (items HLS19-NAV6-11) and interactional level (item HLS19-NAV12). Discarding item HLS19-NAV12, which represents the interactional level, a two-factor CFA model was fitted on each sample. Single-factor CFA models, where HLS19-NAV12 was included, were additionally estimated, as HLS19 reported on a single overall score for all 12 HLS19-NAV scale items [17]. Owing to categorical data, we used the lavaan package for R [49] with a diagonally weighted least-squares estimator (DWLS) [50][51][52]. A good or sufficient model fit was assumed if the following target values of the applied goodness-of-fit indices were met: standardized root-mean-squared residual (SRMR ≤ 0.08), root-mean-squared error of approximation (RMSEA ≤ 0.06), comparative fit index (CFI ≥ 0.95), Tucker-Lewis index (TLI ≥ 0.95), goodness-of-fit index (GFI ≥ 0.95), and adjusted goodness-of-fit index (AGFI ≥ 0.9) [50,53,54]. Due to the large sample sizes, no chi-squared values are reported. Standardized parameter estimates, or rather the respective R² values, were examined for low values. The residual correlation matrix was inspected for coefficient values greater than 0.1 as indicators of possible model-data disagreement [51].
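The target values listed above lend themselves to a small lookup. The following Python sketch only encodes the thresholds stated in the text; the names `TARGETS` and `check_fit` are hypothetical, and the example values are illustrative rather than taken from a specific table row.

```python
# Target values from the text: "max" means the index should not exceed the
# bound, "min" means it should reach at least the bound.
TARGETS = {
    "SRMR": ("max", 0.08),
    "RMSEA": ("max", 0.06),
    "CFI": ("min", 0.95),
    "TLI": ("min", 0.95),
    "GFI": ("min", 0.95),
    "AGFI": ("min", 0.90),
}


def check_fit(indices):
    """Return, per fit index, whether its target value is met."""
    result = {}
    for name, value in indices.items():
        kind, bound = TARGETS[name]
        result[name] = value <= bound if kind == "max" else value >= bound
    return result


# Illustrative values: SRMR and CFI meet their targets, RMSEA does not.
example = check_fit({"SRMR": 0.07, "RMSEA": 0.07, "CFI": 0.97})
```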
RUMM2030Plus [55] and ACER ConQuest 5 [56] were used for Rasch modelling [47]. Data were tested against the partial credit parameterization (PCM) of the unidimensional Rasch model [57]. Overall data-model fit was assessed by chi-square fit statistics, and scale targeting was evaluated by comparing the distribution of item locations to the distribution of person locations. Several tests of unidimensionality are available. Using dependent t-tests [58], we reported the proportion of respondents with significantly different location estimates based on the system level subscale (items HLS19-NAV1-5) and the organizational level subscale, including the item measuring the interactional level (items HLS19-NAV6-12). Comparing scores on two theoretically defined subscales, we assume strict unidimensionality when less than 5% of t-tests are significant. However, when scales are constructed as a composition of subscales to increase validity, some multidimensionality is inevitable.
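The proportion of significant dependent t-tests can be illustrated as follows. This is a sketch under stated assumptions, not the RUMM2030Plus procedure itself: it assumes each respondent has a location estimate and standard error on both subscales, that the test statistic is the difference divided by the pooled standard error, and that a normal-approximation critical value of 1.96 is used. The function name is hypothetical.

```python
import math


def proportion_significant(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of respondents whose two subscale estimates differ significantly.

    Assumed statistic per respondent: t = (a - b) / sqrt(se_a^2 + se_b^2).
    """
    flagged = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            flagged += 1
    return flagged / len(est_a)
```

Under the criterion in the text, a returned value below 0.05 would be read as support for strict unidimensionality.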
At the item level, single-item fit, differential item functioning (DIF), response dependency, and ordering of response categories were evaluated [47].
Single-item chi-square probability values above a Bonferroni-adjusted 5% level, fit residuals within the range of ±2.5, and Infit values between 0.7 and 1.3 indicate sufficient item fit [59][60][61]. Differential item functioning (DIF) refers to differences in item performance between respondent groups that are matched with respect to the construct being measured. Groups were defined by gender, age, education, status of employment, financial deprivation, social level, and/or general health status. We refer to uniform DIF when items have different relative difficulty and to non-uniform DIF when items discriminate differently for different groups of people. An overview of person factor categories is available in Table S1 of the Supplementary file. DIF was evaluated by using two-way analysis of variance [62] and by inspecting graphical displays. We used a Bonferroni-adjusted significance level of 5%.
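The three item-fit criteria can be combined into a single check, sketched here in Python. The function name `item_fits` is hypothetical, and dividing the 5% level by the number of items (12) is an assumption about how the Bonferroni adjustment was applied.

```python
def item_fits(chi2_p, fit_residual, infit, n_items=12, alpha=0.05):
    """Apply the three item-fit criteria named in the text to one item."""
    bonferroni_level = alpha / n_items  # assumed: one test per item
    return (
        chi2_p > bonferroni_level
        and abs(fit_residual) <= 2.5
        and 0.7 <= infit <= 1.3
    )
```

With 12 items the adjusted level is roughly 0.0042, so an item with a chi-square probability of 0.01 would still pass this first criterion.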
Since models tested on large data sets run the risk of being rejected because the chi-squared statistic is sensitive to sample size [63], analyses on data-model fit, item fit, and DIF were based on reduced sample sizes. Andrich and Marais [62] recommend using a sample size corresponding to 10-30 persons per threshold, where the total number of thresholds equals the product of the number of items (12) and the number of thresholds per item (3), that is, 36 thresholds, yielding a sample size between n = 360 and n = 1080 when assessing the psychometric properties of the HLS19-NAV. By "ordered response categories", we mean significantly different item thresholds that are in "correct" order [62].
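The sample-size arithmetic can be made explicit with a short Python sketch; the function name is hypothetical, and the defaults simply restate the figures given in the text.

```python
def sample_size_range(n_items=12, thresholds_per_item=3, per_threshold=(10, 30)):
    """Lower and upper sample size for 10-30 persons per threshold."""
    thresholds = n_items * thresholds_per_item  # 12 * 3 = 36
    return thresholds * per_threshold[0], thresholds * per_threshold[1]


low, high = sample_size_range()
mid = 12 * 3 * 20  # 20 persons per threshold
```

The three resulting values (360, 720, 1080) are the reduced sample sizes referred to in the Rasch results.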
The items should only be correlated through the latent trait being measured. Using Rasch modelling, a residual correlation refers to the relationship between two items whilst taking away the effects of the latent variable on this relationship. A residual correlation between items of >0.3 was applied to detect response dependency [64].
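Screening a residual correlation matrix for response dependency can be sketched as follows. The function name and the example matrix are hypothetical; only the cutoff of 0.3 comes from the text.

```python
from itertools import combinations


def dependent_pairs(residual_corr, labels, cutoff=0.3):
    """List item pairs whose residual correlation exceeds the cutoff."""
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if residual_corr[i][j] > cutoff:
            pairs.append((labels[i], labels[j], residual_corr[i][j]))
    return pairs


# Illustrative 3 x 3 residual correlation matrix (symmetric, unit diagonal).
matrix = [
    [1.00, 0.10, 0.00],
    [0.10, 1.00, 0.37],
    [0.00, 0.37, 1.00],
]
flagged = dependent_pairs(matrix, ["NAV6", "NAV7", "NAV8"])
```

Only the upper triangle is visited, so each pair is reported once.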
As a measure of internal consistency, Cronbach's alpha and Omega for categorical data were estimated. The Person Separation Index (PSI) was computed to estimate the lower limit of the true reliability. In general, reliability indices refer to how well a scale is able to separate between respondents along the latent trait. The reliability was considered acceptable when indices exceeded 0.7 [51]. In addition, the average variance extracted (AVE) was calculated and interpreted using a limiting value of AVE ≥ 0.5 [65].
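Two of these indices have simple closed forms, sketched below in Python. This is a simplification: the classical Cronbach's alpha formula is applied to raw item scores, whereas the study estimated alpha and omega for categorical data, and the AVE is taken as the mean squared standardized loading. Function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Classical alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_variance_extracted(loadings):
    """Mean of the squared standardized factor loadings."""
    return sum(l * l for l in loadings) / len(loadings)
```

Under this definition, standardized loadings of about 0.7 on every item give an AVE of about 0.49, just under the limiting value of 0.5.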
To ensure content or face validity, the HLS 19 -NAV was developed and tested, as mentioned above, based on the conceptual NAV-HL framework, which defines three levels relevant for navigating the HCS while referring to the multidimensional HL definition of Sørensen et al. [24,41].
As part of construct validity, discriminant validity, meaning that two theoretically different measures should not be too highly related [66], was tested by Pearson correlations between the NAV-HL and GEN-HL scores. Scores were standardized to the range of 0 to 100 [17]. A higher score indicates higher NAV-HL/GEN-HL (distributions of NAV-HL scores based on dichotomous and polytomous scoring can be found in the Supplementary file: Figures S1 and S2). As the NAV-HL and GEN-HL scales are based on the same HL construct [41], it was hypothesized that the measures would correlate to a certain degree.
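For reference, the Pearson correlation between two score vectors can be computed as in this minimal Python sketch; the function name is hypothetical and the inputs stand for NAV-HL and GEN-HL scores on the 0-100 range.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```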
Furthermore, it was tested (a) whether NAV-HL is determined by factors that were already identified as indicators for the presence of a social gradient in HL research and (b) whether NAV-HL predicts health-related consequences, here general health status, that were already found to be associated with general HL [18]. For (a), linear regression models, including NAV-HL score as dependent variable and gender, age, education, self-assessed social status, and financial deprivation as hypothesized predictor variables, were calculated. For (b), linear regression analyses were performed including the NAV-HL score and the mentioned social variables as predictor variables of general health status. In this regard, the aim was not to examine an optimal model that best explains the dependent variable, but to examine whether NAV-HL is related to the considered factors as expected. As sample sizes in CH (CATI: n = 192) and CZ (CATI: n = 532) were considerably smaller compared to other countries, in this case, data from the total country samples were used.

Results
The sample characteristics are presented in Table 3. In total, valid data on NAV-HL were collected for 15,685 respondents across the eight countries. In most countries, the number of missing values (cases not used to calculate the score because fewer than 80% of the HLS19-NAV items were answered) was small, varying between 0% and 2%. In AT (5%), CH (10% for CATI), CZ (10% for CATI), and PT (14%), the share of missing values was higher.

Confirmatory Factor Analysis
When the single-factor CFA (Table 4) was fitted to dichotomous data, most goodness-of-fit indices indicated good to sufficient fit for most countries. SRMR varied between the acceptable values of 0.03 (CZ, CAWI) and 0.07 (CH (CAWI) and DE), while the RMSEA of 0.07 observed for BE, CH (CAWI), and PT was slightly above the strict target value of ≤0.06 [67]. With a minimum of 0.97, the observed values for the CFI, TLI, GFI, and AGFI are sufficient. The CFA based on polytomous data points to similar results. The fit indices for the model remained stable except for the RMSEA, where values were considerably higher for polytomous data.
Table 4. Goodness-of-fit indices for single-factor CFA of the HLS19-NAV based on dichotomous and polytomous data (based on HLS19 Consortium [17] (p. 211)).
For the dichotomous data, the standardized parameter estimates are above 0.7 for all items except for the German data and for item HLS19-NAV9 (to understand how to get an appointment with a particular health service) in AT, BE, CH, and FR. The CFA for the polytomous data shows comparable results with only minor differences in the range of −0.05 to 0.06. Except for the German data, the R² value is above 0.5 for all items but HLS19-NAV9, with the R² values being minimally higher on average (by 0.02) for the dichotomous data.

For all country-specific samples, there were entries >0.10 [68] in the residual correlation matrix. Of the 66 possible residual correlation coefficients (cf. Tables S2 and S3), on average, 6 values are above 0.1. For the dichotomous data, the number of residual correlation coefficients above 0.1 is the highest for CH (CAWI: 12 times), DE (11 times), BE (8 times), CH (CATI: 7 times), and PT (7 times). The residual correlation coefficients are above 0.1 for the majority of data sets for HLS19-NAV1 with HLS19-NAV2. The residual correlation coefficients of HLS19-NAV1 or HLS19-NAV2 with HLS19-NAV7 or HLS19-NAV8 could also require further inspection. For the polytomous data, the data sets of BE, CH (CAWI, CATI), and DE show 10 or more residual correlation coefficients that are higher than 0.1. The variables concerned are the same as for the dichotomous data. All residual correlation coefficients are below 0.2 with the exception of the residual correlation of HLS19-NAV11 with HLS19-NAV10 for the Swiss (CATI) polytomous data (r_res = 0.2).
In the two-factor CFA model (Table 5), the values for the model fit coefficients CFI, TLI, GFI, and AGFI are slightly higher (by at most 0.01) than in the single-factor model for some combinations of country and survey type. In the two-factor CFA models, the correlation coefficient between the two latent variables ranges from 0.84 to 0.96, which suggests that the two factors are hardly distinguishable. The use of polytomous data results in lower SRMR in some countries, while the RMSEA, as in the single-factor model, increases.
Table 5. Goodness-of-fit indices for two-factor CFA of the HLS19-NAV based on dichotomous and polytomous data (based on HLS19 Consortium [17] (Annex)). AT = Austria, BE = Belgium, CH = Switzerland, CZ = Czech Republic, DE = Germany, FR = France, PT = Portugal, SI = Slovenia; CAPI = computer-assisted personal interviewing; CATI = computer-assisted telephone interviewing; CAWI = computer-assisted web interviewing; PAPI = pen-and-paper personal interviewing. SRMR = standardized root-mean-square residual, RMSEA = root-mean-square error of approximation, CFI = comparative fit index, TLI = Tucker-Lewis index, GFI = goodness-of-fit index, AGFI = adjusted goodness-of-fit index. CI = confidence interval.

Rasch Analyses at the Overall Level
When performing Rasch modelling based on dichotomous as opposed to polytomous data, it became evident, as expected, that the power of analyses of fit and reliability indices decreased and that the proportion of respondents with extreme scores increased (Table S4). Thus, it was decided to report results from Rasch modelling (with the exception of the PSI) based on polytomous HLS19-NAV data (taken from Guttersrud et al. [47]).
With a reduced sample size (n = 720: 20 persons per threshold), good overall data-model fit for the HLS19-NAV was observed in AT (χ²: p > 0.05), and sufficient overall data-model fit (χ²: p > 0.01) was observed in CH (CAWI, CATI), CZ (CAWI, CATI), and DE (Table 6). When sample size was further reduced (n = 360: 10 persons per threshold), data from BE, PT, and SI (CAWI, CAPI) also displayed sufficient to good data-model fit. The data collected in FR did not fit the PCM well.
Within countries, the distribution of HLS19-NAV item threshold locations was well-targeted at the distribution of person locations, with mean person location varying between −0.31 (DE) and 0.96 (SI, CAWI).
Testing for dimensionality revealed that the HLS19-NAV scale is not strictly unidimensional: Using dependent t-tests, between 4.2% (CZ, CATI) and 12.2% (DE) of respondents obtained significantly different scores or proficiency estimates on the organizational level and system level subscales. Thus, too many respondents obtained subscale scores that differed too much to claim that the two subscales measure the same trait.
Table 6. Overall data-model fit for the polytomous HLS19-NAV data when fitted against the partial credit parametrization of the unidimensional Rasch model (taken from Guttersrud et al. [47] (p. 15)).
Several HLS 19 -NAV items displayed DIF when the sample size was set at n = 1080 (Tables S5-S15), but there was no pattern in which items displayed DIF for specific person factors across the countries. For some items, DIF was also evident when the sample size was reduced to n = 720. In FR and CH (CAWI), respondents aged 46 and older tended to score higher on this item compared to younger respondents despite the same level of NAV-HL. Item HLS 19 -NAV3 also displayed DIF for employment status in BE and CH (CAWI). Moreover, DIF was observed for item HLS 19 -NAV7 (find information on the quality of a particular health service) for gender and age in CZ, only for age in FR, and for education level and 'difficulties with paying bills' in CH (CAWI). Item HLS 19 -NAV8 (judge if a particular health service will meet your expectations and wishes on health care) displayed DIF for age in FR and for gender, age, and social status in CZ (CATI), and item HLS 19 -NAV9 (understand how to get an appointment with a particular health service) for age and employment status in AT, general health status in CH (CATI) and age in CZ (CATI). Item HLS 19 -NAV12 (stand up for yourself if your health care does not meet your needs) displayed DIF for 'difficulties with paying bills' in BE. For data collected in DE and SI (CAWI, CAPI) no item displayed DIF when sample size was reduced to n = 720.
Response dependency was observed between items HLS 19 -NAV7 (find information on the quality of a particular health service) and HLS 19 -NAV8 (judge if a particular health service will meet your expectations and wishes on health care) for data collected in BE (r = 0.37), PT (r = 0.43), and CH (r = 0.38) [47] (p. 34) (not reported in the Table). No signs of unordered response categories were observed [47].

Reliability
The HLS19-NAV shows acceptable to high internal consistency across countries (Table 7). The alpha and omega coefficient values are above 0.83 for all dichotomized data sets and above 0.88 for all polytomous data sets. The AVE is above 0.5 in all data sets except the German data (AVE = 0.49 for dichotomous and 0.48 for polytomous data) and the Swiss (CATI) data (AVE = 0.49 for polytomous data). The PSI based on dichotomized data was considerably lower than for polytomous data. However, most PSI values were still above the required target values for acceptable internal consistency (Table 7).

Content, Discriminant and Concurrent Predictive Validity
Content or face validity was ensured by developing the HLS19-NAV with regard to its underlying theoretical framework and definition of NAV-HL, although the interactional level is reflected in the scale only by item HLS19-NAV12.
With respect to discriminant validity, the NAV-HL scale showed a positive moderate to high correlation with the GEN-HL scale. Correlation coefficients based on dichotomous data varied from 0.41 (BE) to 0.64 (SI, CAPI), while correlation coefficients based on polytomous data were higher, with the exception of CH (CATI) (Table 8).
Table 8. Pearson correlation between the HLS19-NAV scores and the HLS19-Q12 scores based on dichotomous and polytomous data (based on HLS19 Consortium [17] (p. 214)).
The analysis confirms that NAV-HL is associated with sociodemographic and socioeconomic factors across countries (Table 9). Financial deprivation and self-perceived social status were significant predictors of NAV-HL in seven of the eight countries, with standardized coefficients varying between β = −0.09 (FR) and β = −0.25 (CZ) for financial deprivation and between β = 0.12 (CZ) and β = 0.22 (BE) for self-perceived social status. The analyses revealed that education is negatively associated with the NAV-HL score in five countries (varying between β = −0.06 (AT) and β = −0.13 (CH)), whereas a positive association was observed in the German data (β = 0.10). NAV-HL scores decrease with increased age in some countries (ranging from β = −0.07 (AT) to β = −0.13 (FR)). For gender, no consistent pattern across countries was found. The regression models explained 4% (AT) to 13% (PT) of the variance.
Controlling for gender, age, education, self-perceived social status, and financial deprivation, NAV-HL was a significant predictor of self-reported general health status in seven of eight countries, with standardized coefficients varying between β = −0.06 (FR) and β = −0.13 (AT, DE) (Table 10). For the regression models of NAV-HL score and self-reported general health status, the explained variance varied between 12% (BE) and 32% (PT). In terms of concurrent predictive validity, it was not tested whether the use of the polytomous score affects the results. However, the other analyses of polytomous data suggest that the coefficients would be somewhat higher for the polytomous score.
Table 9. Multivariable linear regression models of NAV-HL score (dependent variable) by social determinants (independent variables) for total samples in countries (equally weighted) (mainly taken from HLS19 Consortium [17] (p. 217)). Standardized coefficients (β) with p-values lower than 0.01 in bold. Education by 9 ISCED levels, from 0 (lowest) to 8 (highest level). Self-perceived social status (from 1 = lowest level to 10 = highest level in society). Financial deprivation: 4 categories, from no deprivation (0) to severe deprivation (100). Due to rounding the numbers to two decimal places, ±0.00 may represent a value <0.005.
Table 10. Multivariable linear regression models of self-reported general health status (dependent variable) by NAV-HL score and other social variables (independent variables) (equally weighted) (mainly taken from HLS19 Consortium [17] (p. 223)). Standardized coefficients (β) with p-values lower than 0.01 in bold. NAV-HL score, from 0 (minimal) to 100 (maximal). Education by 9 ISCED levels, from 0 (lowest) to 8 (highest level). Self-perceived social status (from 1 = lowest level to 10 = highest level in society). Financial deprivation: 4 categories, from no deprivation (0) to severe deprivation (100). Due to rounding the numbers to two decimal places, ±0.00 may represent a value <0.005.
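As a rough illustration of how standardized coefficients (β) of the kind reported in Tables 9 and 10 can be obtained, the following minimal sketch z-scores all variables and solves the resulting normal equations. It is not the analysis code used in the study: the function names, the variable names, and the toy data are hypothetical, and survey weighting and missing-data handling are left out.

```python
import math


def zscores(values):
    """Standardize a list of numbers to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardized_betas(predictors, outcome):
    """Return standardized OLS coefficients, one per predictor.

    predictors: dict mapping a (hypothetical) variable name to a list of values.
    outcome: list with the values of the dependent variable.
    """
    names = list(predictors)
    zx = [zscores(predictors[name]) for name in names]
    zy = zscores(outcome)
    n = len(zy)
    k = len(names)
    # For z-scored variables the normal equations reduce to R_xx * beta = r_xy,
    # where R_xx holds predictor intercorrelations and r_xy predictor-outcome correlations.
    r_xx = [[sum(p * q for p, q in zip(zx[i], zx[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    r_xy = [sum(p * q for p, q in zip(zx[i], zy)) / (n - 1) for i in range(k)]
    return dict(zip(names, solve(r_xx, r_xy)))


# Toy data (invented for illustration only, not HLS19 data).
toy_predictors = {
    "financial_deprivation": [0, 33, 67, 100, 33, 0],
    "social_status": [7, 5, 4, 3, 6, 8],
}
toy_nav_hl_score = [80, 60, 55, 30, 65, 90]
print(standardized_betas(toy_predictors, toy_nav_hl_score))
```

Because all variables are standardized, the coefficients are comparable across predictors measured on different scales, which is why the text can contrast, for example, financial deprivation with self-perceived social status.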

Discussion
This is the first study to examine the extent to which the newly developed HLS19-NAV scale has acceptable psychometric properties and validity characteristics.
When a single-factor CFA was fitted (based on dichotomous and polytomous data), the HLS19-NAV data yielded acceptable goodness-of-fit indices across countries, confirming that it is permissible to summarize the twelve NAV-HL items in one factor score. However, Rasch modelling indicates that the HLS19-NAV scale is not strictly unidimensional. As the HLS19-NAV is based on a framework with theoretically derived levels, multidimensionality was expected to some degree. This was also confirmed by the two-factor CFA model (based on dichotomous and polytomous data), which showed improved fit values in comparison to the single-factor model. With a reduced sample size, the HLS19-NAV showed sufficient overall fit to the PCM in most countries, except for FR. The NAV-HL scale had sufficient internal consistency and reliability across countries. Furthermore, targeting of the scale was sufficient in most countries, pointing to a well-balanced level of item difficulty [60,69].
Some items displayed significant misfit (applying a reduced sample size). A systematic pattern across countries was found for item HLS19-NAV9 (understand how to get an appointment with a particular health service), which showed a poor data-model fit in most countries. HLS19-NAV9 was also identified as an under-discriminating item in data from BE and FR, which suggests that the item measures something else in addition to NAV-HL that is negatively correlated with the underlying concept. The item may be associated not only with the understanding of health information, but also with difficulties in obtaining appointments, as long waiting times for health services have been an important issue across countries [70]. As item HLS19-NAV9 showed limitations in both the Rasch analyses and the CFA, we conclude that it may be beneficial to the scale if item HLS19-NAV9 is supplemented by alternative items in future studies to examine whether it can be replaced.
It was also inspected whether respondents from different sociodemographic groups but with the same location on the underlying latent trait, NAV-HL, responded differently to the given information tasks (DIF) [71]. In most samples, some items displayed DIF for personal factors, but no consistent patterns were observed. However, since possible causes of DIF include the content of an item, its level of intricacy in words and sentences, differences in cultural relevance [72], and probably differences in HCS characteristics, a further evaluation of items showing DIF in certain population groups is recommended. This could be done, for example, by using focus groups or cognitive interviews [73]. This recommendation is also supported by the fact that, in DE, where the items were evaluated with the help of a qualitative approach in the process of instrument development, no problems with DIF were noted (with a reduced sample size).
Another objective of this study was to investigate whether and how far the use of dichotomous (as was done in HLS19) or polytomous (as was done in the HLS-EU) data affects the psychometric properties of the HLS19-NAV scale. No major differences were found between the polytomous and dichotomous responses when using CFA, with the exception of higher RMSEA values when CFA was based on polytomous data. It may be considered "normal" that the presented fit indices lead to different conclusions about model fit, as they react differently to model weaknesses, violation of distributional assumptions, or sample sizes [74] (p. 221). For this reason, Hu and Bentler [54] (pp. 27,28) (also Weiber and Mühlhaus [74] (p. 223)) recommended that, for large samples (n > 250), conclusions about model fit should be based on a combination of TLI or CFI (≥0.95-0.96) under consideration of the SRMR (≤0.09-0.10). Given that polytomous data also met these criteria, it is reasonable to assume that CFA describes the data sufficiently well for both polytomous and dichotomous responses. Considering that there were no problems with the four response categories of the HLS19-NAV and that the polytomous NAV-HL score tends to be more normally distributed than the dichotomous score across countries (Figures S1 and S2), it may be beneficial in future studies to calculate a score based on polytomous data, since dichotomization always goes hand-in-hand with a loss of information and thus statistical power [42,75,76].
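The combined decision rule described above can be written down as a short check. This is a minimal sketch under stated assumptions: the function name and the example index values are hypothetical, and the default cut-offs (0.95 for CFI or TLI, 0.09 for SRMR) are one choice from the ranges quoted in the text, not values prescribed by the study.

```python
def acceptable_fit(cfi, tli, srmr, incremental_cutoff=0.95, srmr_cutoff=0.09):
    """Apply a combined rule: CFI or TLI at or above its cut-off AND SRMR at or below its cut-off.

    The cut-off defaults are assumptions picked from the ranges cited in the text
    (0.95-0.96 for CFI/TLI, 0.09-0.10 for SRMR); RMSEA is deliberately not used.
    """
    incremental_ok = cfi >= incremental_cutoff or tli >= incremental_cutoff
    return incremental_ok and srmr <= srmr_cutoff


# Hypothetical fit indices for two models (invented numbers, not study results).
models = {
    "single_factor": {"cfi": 0.97, "tli": 0.96, "srmr": 0.06},
    "poorly_fitting": {"cfi": 0.91, "tli": 0.90, "srmr": 0.11},
}
for name, indices in models.items():
    print(name, acceptable_fit(**indices))
```

Keeping the cut-offs as parameters makes it easy to see whether a conclusion about model fit changes when the stricter end of each range is applied.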
Furthermore, the analysis revealed response dependency between item HLS19-NAV7 (find information on the quality of a particular health service) and item HLS19-NAV8 (judge if a particular health service will meet your expectations and wishes on health care) in BE, CH, and PT, meaning that these items have something more in common than the underlying latent trait. This is plausible, as both items are intended to measure related aspects that are important for choosing a particular health service. Although this was only a problem in three countries, it should be carefully examined whether adjustments of these "too similar" items [47] (p. 7) in the three countries would be beneficial in light of a future cross-national assessment of NAV-HL.
Regarding content or face validity, one strength of the instrument is that the HLS19-NAV scale was developed based on a definition and conceptual framework. Nevertheless, the interactional level of the NAV-HL framework is only represented by item HLS19-NAV12 (stand up for yourself if your health care does not meet your needs) in the scale [24], which implies a limitation in terms of content validity. It is likely that interacting and communicating with health services and professionals is critical for navigating the HCS [40,77]. At the same time, it must be considered that these aspects go beyond NAV-HL and form overarching concepts. For this reason, it was decided at an early development phase of the HLS19-NAV to include interactive tasks only sparingly and to develop a separate instrument with a general focus on communicative HL, the HLS19-COM-P [42]. In future studies, it is recommended to examine whether items on interactive and communicative HL could be added to the NAV-HL scale to better reflect the conceptual NAV-HL framework. The items of the HLS19-COM-P may be used as a starting point. However, they would have to be adapted to the specific field of navigating the HCS, as they specifically measure general communicative HL in interaction with physicians [42].
Concerning concurrent predictive validity, the analysis revealed that NAV-HL is determined by sociodemographic and socioeconomic factors, as was shown for general HL [17,78]. NAV-HL was also linked to general health status in most countries, indicating that the measure has some predictive value.
Discriminant validity as one indicator for construct validity was examined for the relationship between NAV-HL and GEN-HL. It is reasonable to conclude that, with HLS19-NAV, new aspects in managing health information for the specific context of navigating the HCS were assessed, since the two measures correlate only to a certain extent. At the same time, the results point to overlaps between the two measures, leading to the assumption that NAV-HL belongs to a family of specific HL measures, introduced in the HLS19, which are all linked to GEN-HL [17,42].
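The correlation between two scale scores discussed above can be illustrated with a minimal sketch. It assumes items coded from 1 (very difficult) to 4 (very easy), a dichotomous score defined as the percentage of items answered "easy" or "very easy", and a polytomous score defined as the item mean rescaled to 0-100. These scoring rules, the function names, and the toy responses are illustrative assumptions and are not taken from the HLS19 documentation.

```python
import math


def dichotomous_score(responses):
    """Percentage of items answered 3 ("easy") or 4 ("very easy"); assumed scoring rule."""
    return 100 * sum(r >= 3 for r in responses) / len(responses)


def polytomous_score(responses):
    """Mean item response rescaled from the assumed 1-4 coding to a 0-100 range."""
    mean = sum(responses) / len(responses)
    return 100 * (mean - 1) / 3


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy responses of four hypothetical respondents to a specific and a general scale
# (shortened to four items each; invented for illustration only).
nav_items = [[4, 4, 3, 3], [3, 3, 3, 3], [2, 3, 1, 2], [1, 2, 2, 1]]
gen_items = [[4, 3, 4, 4], [3, 4, 3, 3], [3, 3, 2, 3], [2, 2, 3, 1]]

for score in (dichotomous_score, polytomous_score):
    nav = [score(r) for r in nav_items]
    gen = [score(r) for r in gen_items]
    print(score.__name__, round(pearson_r(nav, gen), 2))
```

In this toy example, the first two respondents receive the same dichotomous score on the specific scale although their item responses differ, which is the loss of information that a polytomous score avoids.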

Strengths and Limitations
The major strength of this study is that information on the psychometric properties of the newly developed HLS19-NAV is based on large representative population samples from eight countries and that the HLS19-NAV scale could be successfully applied and evaluated for different HCSs, different languages, and different methods of data collection.
A limitation of the study is that different linguistic groups within BE and CH were merged but not considered in the analyses. Furthermore, the timing of data collection was not strictly standardized across countries. In this regard, the COVID-19 pandemic may have influenced the results. It is likely that, in most countries (except DE, where data was collected before the pandemic), respondents who used COVID-19-related services were exposed to an increased level of information about the HCS. This information may have differed in content and availability from conventional information about the HCS and may have played a role in item interpretations. Moreover, country-specific characteristics of the HCS may have had an impact on how the items were interpreted. A further limitation is that responses to the HLS19-NAV items may rely not only on respondents' direct experiences but also on general assessments, which is a well-known limitation of the self-assessment approach. However, self-reporting, in contrast to objective measures, offers the chance to better understand the barriers within the HCS from the user's perspective.

Conclusions
Obstacles in navigating the HCS are burdensome for many patients and users and may increase inequities in access to and the results of health care [79,80]. For this reason, measuring NAV-HL is important for deriving recommendations for decision-makers and practitioners on where and how information and practices should be adjusted to strengthen population NAV-HL, but also for shaping user-friendly, transparent, health-literate HCSs and organisations. With the development, introduction, and validation of the new HLS19-NAV, a first important step towards measuring NAV-HL in general adult populations has been achieved. The new HLS19-NAV has proven to be a suitable instrument for measuring NAV-HL across countries, different HCSs, different languages, and different data collection methods, as indicated by acceptable psychometric properties and first validation results. In addition, its integration into a family of HL measures, all introduced within HLS19, makes it possible to generate new specific insights about HL in a specific field without losing sight of the underlying HL concept. However, there is also room for improvement, especially with respect to under-discriminating items and DIF. In further analysis of the HLS19-NAV data and in future studies, the instrument should be tested in specific population groups who are particularly dependent on navigation-related information, such as patients with chronic illness or caregivers.

Instrument Use
The instrument belongs to the HLS19 Consortium. Use of the instrument requires a contractual agreement between a non-profit applicant and the HLS19 Consortium. Further information can be found here: https://m-pohl.net/tools, accessed on 24 October 2022.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph192113863/s1, Table S1: Categories used for the analysis of differential item functioning (DIF); Table S2: Entries in the residual correlation matrix >0.10 based on dichotomous data; Table S3: Entries in the residual correlation matrix >0.10 based on polytomous data; Table S4: Power of fit and extreme records in Rasch modelling based on dichotomous and polytomous (italic) data; Tables S5-S15: Item fit statistics of the HLS19-NAV for Austria (CATI), Belgium (CAWI), Czech Republic (CAWI, CATI), France (CAWI), Germany (PAPI), Portugal (CATI), Slovenia (CAWI, CAPI), and Switzerland (CAWI, CATI); Figure S1: Distribution of the NAV-HL score (0-100) based on dichotomous data; Figure S2: Distribution of the NAV-HL score (0-100) based on polytomous data.
Funding: This research received no external funding, but data collection was supported either by ministries of health, universities, public health institutes, or insurance funds in the respective countries. AT: The Austrian Health Literacy Survey was commissioned and financed by the Austrian Federal Health Agency and the Federation of Austrian Social Insurance Institutions. BE: The data collection for Belgium (NL and FR) was funded by the Union Nationale des Mutualités Libres (MLOZ). CH: The national HL survey was funded by the Swiss Federal Office of Public Health (FOPH). CZ: Data collection was jointly funded by (all seven) Czech health insurance funds. DE: The German study was funded by the German Federal Ministry of Health, grant number: Kapitel 1504 Titel 54401, ZMV I 1-2518 004 (HLS-GER 2). FR: The research was supported by the National Public Health agency (Santé Publique France, 21DPPA040-0) and by Ligue contre le cancer (LIGUE2019). NO: The Norwegian HLS19 was commissioned and financed by the Norwegian Ministry of Health and Care Services. The Norwegian Directorate of Health funded the data collection and the administrative costs for the whole project, while Oslo Metropolitan University and Inland Norway University of Applied Sciences contributed with the scientific workforce. PT: This research received no external funding, but the data collection was supported by the Directorate General for Health. SI: The national survey of Health literacy in Slovenia took place within the framework of the project Increasing health literacy in Slovenia-ZaPiS, which is co-financed by the Republic of Slovenia in the amount of 20% and the European Union from the European Social Fund in the amount of 80% (grant number: C2711-19-031040).

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki. In each country, it has been ensured that ethical requirements have been met. For more information about ethical considerations, data protection, and informed consent by country, please see the International Report on the methodology, results, and recommendations of the European Health Literacy Population Survey 2019-2021 (HLS19) of M-POHL [17].
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Information about data supporting reported results can be found on the M-POHL webpage, https://m-pohl.net/Design_Methods, accessed on 24 October 2022.