The Reliability and Compatibility of the Paper and Electronic Versions of the POLLEK Cohort Study Questionnaire

Background: Chronic fatigue, depression, burnout syndrome, and alcohol addiction have been identified as significant mental health problems in young medical doctors. Given the lack of prospective studies in this area in Poland, the POLski LEKarz (POLLEK) cohort study was created. The goal of the POLLEK study is to assess the quality of life and health status (including mental health) of medical students and young physicians. The aim of the present paper was to assess the reliability and compatibility of the paper and electronic versions of the POLLEK questionnaire. Methods: Between 1 October 2019 and 28 February 2020, all first-year medical students (N = 638) at the Medical University of Silesia were invited to participate in a cross-sectional study. The 353 students (55.3%) who completed both versions were included in the current analysis. Results: Cronbach’s alpha values >0.7 indicated good internal consistency for both modes of delivery, except for the individual Alcohol Use Disorders Identification Test (AUDIT) domains and the Environmental domain of the WHOQOL-BREF (paper version). Similarly, intraclass correlation coefficients equal to or greater than 0.9 denoted excellent reproducibility. Conclusions: We documented very good agreement and reproducibility of the POLLEK questionnaire (both paper and electronic versions). These findings support using the two versions of the questionnaire interchangeably.


Introduction
The current epidemiological situation of COVID-19 in all European countries (including Poland) has vividly highlighted how difficult and demanding the work of a medical doctor is. A review paper previously published by our team revealed that psychosocial determinants have a significant impact on the mental health and quality of life of physicians [1]. Chronic fatigue, burnout syndrome, alcohol addiction, risky alcohol consumption, depression, and potential suicidal ideation are among the most important mental health problems of young medical doctors and even medical students [2][3][4][5]. Public health experts suggest that future research on these problems should be based on prospective observations. The lack of this type of research in Poland justifies addressing the topic in medical students, as future doctors. Reliable scientific knowledge requires appropriate, standardized tools, including validated research questionnaires.
For these reasons, we created an original integrated questionnaire and used it in the first step of the POLLEK cohort study. The study aims to identify and evaluate the quality of life and health status (including mental health) of medical students and young physicians, and to assess, over long-term observation, the determinants related to their studying and working conditions. Additionally, in line with the epidemiological cohort design, we plan to control for socio-demographic factors as well as factors describing lifestyle and chronic diseases. The aim of the present paper is to evaluate the reliability and compatibility of the paper and electronic versions of the POLLEK questionnaire.

Study Design and Sampling
A cross-sectional study was performed between 1 October 2019 and 28 February 2020. All first-year medical students (N = 638) at the Medical University of Silesia (MUoS, Poland) were invited to participate in the study project. Written consent to participate was obtained from n = 559 students (N1 = 354; 91.2% of all medical students in Katowice and N2 = 205; 82.0% of all medical students in Zabrze); both are medical faculties of MUoS. Detailed descriptive statistics are presented in Table 1. The first step of the study was validation of the questionnaire. The integrated tool includes the Polish version of the WHOQOL-BREF questionnaire [6], the Alcohol Use Disorders Identification Test (AUDIT) [7], and an original questionnaire identifying individual nutrition, demographic, socioeconomic, and anthropometric determinants. The WHOQOL-BREF questionnaire covers four domains of quality of life (26 items in total): somatic (physical health), psychological, social (social relationships), and environmental. The AUDIT questionnaire is a 10-item screening tool that assesses alcohol consumption, drinking behaviors, and alcohol-related problems. Both questionnaires have been used successfully in previous studies [8,9].

Statistical Analysis
Initially, data were analyzed using descriptive statistics (median and interquartile range, IQR). Reproducibility (test-retest reliability of the POLLEK questionnaire) was assessed by asking all of the students (N = 638 in both medical faculties of MUoS) to complete the paper and online versions of the instrument. A total of 560 students (response rate of 87.8%) completed the paper version (341 females, 218 males, and 1 with missing data on sex). Of these, 353 students (55.3%) also completed the electronic version of the questionnaire; nearly 62% of them were women. The median age of respondents was 19 years. About three-quarters of the students were living away from their families. Detailed statistics are presented in Table 1. The intraclass correlation coefficient (ICC) was calculated for the test-retest reliability analysis using the ICC function available in the psych (v1.9.12) package in R software. Moreover, Bland-Altman plots were used to describe differences between the scores and to assess heteroscedasticity [10]. Additionally, repeatability was evaluated with Cohen's kappa statistic [11]. The reliability of the scales and their domains was evaluated using Cronbach's alpha coefficient of internal consistency. We also conducted confirmatory factor analysis (CFA) using the lavaan (v0.6-5) package in R software to evaluate the structure of each major part of the questionnaire and its domains. Goodness of fit was measured with the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA). RMSEA values ≤0.05 were scored as a good fit, 0.05-0.08 as an adequate fit, and 0.08-0.10 as a mediocre fit, while values >0.10 denoted an unacceptable fit. Furthermore, values of CFI and TLI greater than 0.95 were interpreted as an acceptable fit [12,13].
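To illustrate the agreement statistics named above, the following is a minimal sketch and not the authors' code, which relied on the ICC function of the R psych package. It assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1); the paper does not state which ICC form was reported. The function names icc_2_1 and bland_altman and the example scores are hypothetical.

```python
from statistics import mean, stdev


def icc_2_1(rows):
    """ICC(2,1) for a table with one row per respondent and one column
    per administration mode (assumed form; the paper does not name it)."""
    n = len(rows)          # respondents
    k = len(rows[0])       # modes (paper, electronic)
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(paper, electronic):
    """Return (bias, lower limit, upper limit) of agreement.
    Bias is the mean of the paired differences; limits are bias ± 1.96 SD."""
    diffs = [p - e for p, e in zip(paper, electronic)]
    bias = mean(diffs)
    sd = stdev(diffs)      # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical domain scores for five respondents (not study data)
    paper = [14.0, 16.5, 12.0, 18.0, 15.5]
    electronic = [14.5, 16.0, 12.5, 18.0, 15.0]
    print(icc_2_1([list(pair) for pair in zip(paper, electronic)]))
    print(bland_altman(paper, electronic))
```

In a Bland-Altman plot, each difference is drawn against the mean of the pair; a spread that widens or narrows along the horizontal axis is the heteroscedasticity the analysis checks for.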
All analyses were performed in R 3.6.2 software [14], and results are presented with the respective 95% confidence intervals or p values (considered significant at p < 0.05).
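The goodness-of-fit cut-offs used in this section can be written as a small decision rule. This is only a sketch of the thresholds as stated in the text, not code from the study. The function names are hypothetical, and the treatment of the exact boundary values 0.08 and 0.10 (assigned here to the better category) is an assumption, because the text gives overlapping ranges.

```python
def classify_rmsea(rmsea):
    """Map an RMSEA value to the fit category used in the text.
    Assumption: boundary values fall into the better category."""
    if rmsea <= 0.05:
        return "good"
    if rmsea <= 0.08:
        return "adequate"
    if rmsea <= 0.10:
        return "mediocre"
    return "not acceptable"


def cfi_tli_acceptable(value, cutoff=0.95):
    """CFI and TLI are read as acceptable when strictly above 0.95."""
    return value > cutoff


if __name__ == "__main__":
    # Hypothetical fit indices for one CFA model (not study results)
    print(classify_rmsea(0.062), cfi_tli_acceptable(0.957))
```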

Ethical Approval
The ethics approval for the study was received from the Bioethical Committee of the Medical University of Silesia in Katowice (approval number KNW/0022/KB/217/19; date: 8 November 2019). Written informed consent was obtained from all participants.

Results
The Bland-Altman analysis demonstrated high agreement between scores in the paper and electronic versions of the WHOQOL-BREF and AUDIT scales (see Figure 1 and Table 2 for more details).
In the ICC analysis, the agreement between the paper and electronic versions of the WHOQOL-BREF questionnaire was excellent for the overall scores (ICC = 0.92) and for the specific domains (ICCs ranged from 0.90 to 0.94). Additionally, an assessment of the repeatability of answers to particular questions (with Cohen's kappa, Spearman's rho, and Kendall's tau) is available in supplementary Table S1. Both versions of the WHOQOL-BREF questionnaire had very good internal consistency (α near or equal to 0.9), and the reliability of the electronic version was higher than that of the paper version. The greatest difference was found for the environmental domain. The agreement between the two versions of the AUDIT questionnaire was also excellent (ICC = 0.96), including the specific domains, except for the "Dependence Symptoms" domain, with a value of 0.83. We also demonstrated good internal consistency for the AUDIT (α = 0.77), except for the individual AUDIT domains. Detailed results are presented in Table 2.
Table 2 legend: Me, median; IQR, interquartile range; M, mean; SD, standard deviation; CI, 95% confidence interval; α, Cronbach's alpha; MoD, mean of differences (also called "bias") calculated with Bland-Altman statistics; ICC, intraclass correlation coefficient; κ, Cohen's kappa (unweighted).
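For readers who want to see how two of the reported statistics are defined, the sketch below computes Cronbach's alpha and unweighted Cohen's kappa from their standard formulas. It is an illustration and not the authors' analysis, which was carried out in R. The function names and the sample data are hypothetical, and using sample variances (n - 1 denominator) for alpha is an assumption.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per respondent and one
    column per item: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two paired lists of categorical answers,
    e.g. the same question answered on paper and electronically."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_first = Counter(first)
    counts_second = Counter(second)
    # Agreement expected by chance from the marginal answer frequencies
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / n**2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical answers of four respondents to three items (not study data)
    items = [[3, 4, 3], [2, 2, 3], [5, 4, 4], [1, 2, 2]]
    print(cronbach_alpha(items))
    print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

Kappa is undefined when chance agreement equals 1 (every respondent gives the same single answer in both modes); the sketch does not guard against that case.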

Discussion
Reliability and reproducibility are important aspects of questionnaire validation. To be valid, a questionnaire should be able to reproduce its results [15]. Regarding the statistical measures commonly used in validation studies, the reviewed literature indicates that questionnaire validity is assessed mainly with Cronbach's alpha coefficient and the intraclass correlation coefficient (ICC) [16,17]. This observation suggests that the measures used in our study were chosen reasonably. Moreover, the obtained results documented good or very good reproducibility (ICC > 0.8 and Cohen's kappa > 0.8 in each assessed domain). Additionally, the Cronbach's alpha values confirmed moderate or high consistency (α > 0.5 in each scale).
The use of the paper version of the WHOQOL-BREF is well established in many populations [8,18]. Few studies have used the WHOQOL-BREF questionnaire to assess the impact of medical education on the quality of life of students [19][20][21][22]. Although the role of the electronic version of the WHOQOL-BREF questionnaire was confirmed in 2008 [23,24], we were not able to find a study assessing the electronic form in the medical student population. To the best of the authors' knowledge, the present study is the first to support the use of the online version of this tool among medical students.
The AUDIT is a screening questionnaire developed by the World Health Organization (WHO) to assess alcohol-related problems, and available published data indicate that it is a reliable and valid tool used in different cultural backgrounds [25][26][27][28][29][30]. However, the AUDIT questionnaire has not yet been validated among medical students in Europe, in either paper or electronic form. Nevertheless, the electronic version of this tool was used in a validation study among medical students from China [31] and was validated among university students [32,33]. It is worth mentioning that the AUDIT questionnaire was applied in a cross-sectional study on the prevalence of alcohol use disorders among American surgeons [34]. We believe that the findings described in the present paper can help fill this gap.
The obtained results confirmed that both versions (paper and electronic) of the AUDIT and WHOQOL-BREF questionnaires can be used interchangeably in the Polish cohort study of medical students and young medical doctors. Very high agreement in both kappa and ICC statistics (higher than 0.8 in each assessed domain) indicates that the electronic questionnaire is a reliable tool for planned cohort studies aimed at assessing mental health. This is an important observation for future research that will be carried out during the COVID-19 pandemic, when face-to-face contact is significantly hampered. Choosing the electronic version of the tool will facilitate contact with medical students in the coming years, including after they complete their education, and at the same time will significantly reduce the costs of recruiting subjects. In general, the results we obtained appear consistent with previous observations [8,20,21,24,25,26,28,29,31,32].

Limitations of the Study
Although a large proportion of the invited students agreed to participate in the study, the participation rate was below one hundred percent, which may to some extent limit the conclusions of the study. Like Chen et al. [23], we examined the validity of the Internet version of the WHOQOL-BREF questionnaire only with the standard set of items, whereas the WHOQOL group recommended adding some questions relevant to the studied population [35]. Although our integrated questionnaire also contained additional demographic and nutritional assessment questions, they were not an extension of the WHOQOL-BREF questionnaire.

Conclusions
We demonstrated very good accordance and reproducibility of both versions of the POLLEK questionnaire. These findings legitimize the use of both versions interchangeably.