Self-Ratings of Olfactory Function and Their Relation to Olfactory Test Scores. A Data Science-Based Analysis in Patients with Nasal Polyposis

Abstract: Olfactory self-assessments have been analyzed with often negative but also positive conclusions about their usefulness as a surrogate for sensory olfactory testing. Patients with nasal polyposis have been highlighted as a well-predisposed group for reliable self-assessment. In a prospective cohort of n = 156 nasal polyposis patients, olfactory threshold, odor discrimination, and odor identification were tested using the "Sniffin' Sticks" test battery, along with self-assessments of olfactory acuity on a numerical rating scale with seven named items and on a 10-point scale with only the extremes named. Apparent highly significant correlations in the complete cohort proved to reflect the group differences in olfactory diagnoses of anosmia (n = 65), hyposmia (n = 74), and normosmia (n = 17), more than the true correlations of self-ratings with olfactory test results, which were mostly very weak. The olfactory self-ratings correlated with a quality of life score, however, only weakly. By contrast, olfactory self-ratings proved informative in assigning the categorical olfactory diagnosis. Using an olfactory diagnostic instrument, which consists of a mapping rule of two numerical rating scales of one's olfactory function to the olfactory functional diagnosis based on the "Sniffin' Sticks" clinical test battery, the diagnoses of anosmia, hyposmia, or normosmia could be derived from the self-ratings at a satisfactorily balanced accuracy of about 80%. It remains to be seen whether this approach of translating self-assessments into olfactory diagnoses of anosmia, hyposmia, and normosmia can be generalized to other clinical cohorts in which olfaction plays a role.


Introduction
Self-ratings of olfaction have been reported to reflect true olfactory function well in patients with nasal polyposis [1]. A PubMed database search at https://pubmed.ncbi.nlm.nih.gov on 19 May 2021 using the string "(("self-rating" OR "self-ratings" OR "self-estimate" OR "self-estimates") AND (smell OR olfaction)) NOT review[PT]" returned 45 results. After removing reports in which the self-rated item was not olfaction but, for example, depression, or the self-estimate merely referred to a global perception of a loss of smell, and adding a further paper from the references of the queried papers, 24 original papers on this topic remained (Table 1).
Table 1. Studies (in order of publication year) that use olfactory self-ratings. The list is based on a PubMed database search at https://pubmed.ncbi.nlm.nih.gov on 19 May 2021 using the string "(("self-rating" OR "self-ratings" OR "self-estimate" OR "self-estimates") and (sense of smell OR olfaction)) NOT review[PT]", followed by the curation of the hits.
Reference | n | Cohort | Self-rating scale | Olfactory test | Result
... | ... | ... | ... | ... | No correlation (n = 60) or present correlation (n = 23)
[4] | 100 | Pregnant women | 7-point NRS | Identification (UPSIT) | No correlation
[5] | 36 | Healthy volunteers | VAS | Identification (n-butanol), 10-item identification (CCCRC) | Good correlation
[6] | 1311 | Twins, general population | 7-point NRS | Identification (6 odors) | No correlation
[7] | 211 | General population | 5-point NRS, VAS | Identification (16 odors) | Weak correlation
[8] | 1005 | General population | 4-point NRS, VAS | Identification (16 odors) | No correlation
[9] | 1082 | General population | 5-point NRS | None | -
[10] | 31 | Healthy volunteers | VAS | Threshold, discrimination, identification (Sniffin' Sticks) | Not tested
[11] | 31 | Fibromyalgia | 3-point NRS | Threshold, discrimination, identification (Sniffin' Sticks) | Poor correlation
[1] | 80 | Nasal polyposis | VAS based DyNaChron questionnaire | Threshold (n-butanol), identification (16 odors) | Strong correlation
[12] | | Pregnant women | 9-point NRS | Threshold (PEA), | Poor agreement
[13] | 75 | Patients with olfactory dysfunction | 9-point NRS | Threshold, discrimination, identification (Sniffin' Sticks) | No correlation
[14] | 1422 | General population and olfactory dysfunction | 5-point NRS | Identification (16 odors) | Low but significant correlation
[15] | 162 | HIV infection | 1-point NRS | Threshold, discrimination, identification (Sniffin' Sticks) | Self-ratings and test results differed
[16] | 9 | Nasopharyngeal carcinoma | 10-point NRS, 6-point NRS | Olfactory event-related potentials | Significant correlation
[17] | 117 ( ...
The assessment of olfactory self-ratings in terms of their correspondence to the tested olfactory function was mostly moderate rather than enthusiastic. For example, we recently published an analysis of a large cohort of n = 6049 subjects comparing self-assessments with the results of a 12-item odor identification test, with the conclusion that asking the patient about olfactory function can at best provide a rough diagnosis of anosmia versus normosmia that cannot be relied upon [20].
Similarly, an analysis of n = 211 subjects from the general population of Taiwan concluded that most subjects did not rate their olfactory function well and that measured olfactory function and self-ratings correlated only weakly [7]. This seems to be partly rooted in the fact that many individuals in the general population are not aware of their own olfactory status [23,24].
High correlations have been reported twice and independently for patients with nasal polyposis [1,21]. This raised the question of whether they constitute a special group that is particularly aware of their olfactory function. In the present study, a cohort of patients scheduled for surgery for nasal polyposis was assessed. Two different self-assessment scales were used together with a standard clinical test of olfactory function that included three main sensory dimensions: olfactory threshold, odor discrimination, and odor identification. Quality-of-life ratings were added as a common reference point to facilitate the interpretation of information on olfactory acuity, which is queried either through self-assessments or clinical odor tests.

Study Design
The prospective study was conducted in accordance with the Declaration of Helsinki on Biomedical Studies Involving Human Subjects. It was approved by the Ethics committee at the Dresden University Hospital (approval number EK14502017). All participants gave informed written consent.

Setting
The cohort included patients who were preparing for endoscopic sinus surgery at the Department of Otorhinolaryngology, St. Johannes Municipal Hospital, Dortmund, Germany. Measurements took place between May 2018 and August 2019.
The data analyzed in the present report were obtained as an add-on performed at the baseline assessment of olfactory function in a study that aimed at investigating the olfactory outcome and the quality of life after functional endoscopic paranasal sinus surgery in patients with nasal polyps. The main part of the study will be analyzed separately. The present analyses of agreement between self-assessed olfactory function and olfactory performance quantified by an established clinical test are not redundant with the main analyses of the study. Because they would distract from the focus of the main analyses, they are reported separately. Thus, the present data were obtained in a cross-sectional design on a single occasion.

Participants
A total of 158 patients with nasal polyps, i.e., 60 men and 98 women, aged 13.9-84.6 years (mean ± standard deviation: 49.1 ± 14.8 years), were included. Inclusion criteria were age 18 years and older, absence of pregnancy, absence of neurodegenerative disorders such as Parkinson's or Alzheimer's disease, and absence of other disorders that are strongly associated with olfactory loss, e.g., advanced renal dysfunction.

Self-Ratings of Olfactory Function
Participants rated their olfactory function on two different Likert-type scales. Rating scale #1 was an 8-point scale, with each point being labeled as 0 = "no smell perception", 1 = "extremely bad", 2 = "much worse than normal", 3 = "worse than normal", 4 = "normal sense of smell", 5 = "better than normal", 6 = "much better than normal", and 7 = "excellent". Rating scale #2 was a discrete 10-point scale, labeled only at its endpoints, on which subjects rated their olfactory function from 1 = "not present" to 10 = "excellent". The scales were presented at different positions of the questionnaire.

Olfactory Testing
Olfactory function was quantified using an established clinical test ("Sniffin' Sticks", Burghart Instruments, Wedel, Germany) [25,26], which evaluated three sensory dimensions of olfaction comprising olfactory threshold (to phenylethyl alcohol), odor discrimination (16 pairs of odors) and odor identification (16 odors). The olfactory functional diagnosis was obtained from the sum of scores for the threshold, discrimination and identification (TDI) subtests, with a range between 1 and 48 points. The TDI score allows the categorization of subjects as normosmic (sum score > 30.5 points), hyposmic (16.5-30.5 points), or functionally anosmic (< 16.5 points), based on normative scores obtained in more than 9000 healthy subjects [27].
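The TDI-based categorization described above can be sketched as follows. This is an illustrative Python sketch, not the study's code (the analyses were run in R); the function name `tdi_diagnosis` is hypothetical, and the cutoffs are those stated in the text.

```python
def tdi_diagnosis(threshold, discrimination, identification):
    """Map the three "Sniffin' Sticks" subtest scores to an olfactory diagnosis.

    Cutoffs as stated in the text: > 30.5 normosmia, 16.5-30.5 hyposmia,
    < 16.5 functional anosmia. The function name is hypothetical.
    """
    tdi = threshold + discrimination + identification
    if not 1 <= tdi <= 48:
        raise ValueError(f"TDI sum score {tdi} is outside the range 1-48")
    if tdi > 30.5:
        return "normosmia"
    if tdi >= 16.5:
        return "hyposmia"
    return "anosmia"
```

For example, subtest scores of 4, 10, and 10 give a TDI sum of 24 and therefore the diagnosis of hyposmia.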

Assessment of the Quality of Life
As a disease-specific measure of the patients' quality of life, the Sino-Nasal Outcome Test (SNOT-20) questionnaire [28] was used to quantify sinonasal symptoms. It consists of 20 questions categorized into five different domains (rhinologic symptoms, extranasal rhinologic symptoms, ear/face, psychological dysfunction, and sleep dysfunction). Each of the 20 queried items was rated on a Likert scale from 0 = "no problem" to 5 = "it can't get any worse". From the responses a sum score and three specific subscores are calculated. For the present general exploration of the context of self-ratings of the sense of smell, the subscore "general quality of life" was selected. It contains the individual responses to questions about dizziness, problems with waking up at night, fatigue during the day, diminished performance, poor concentration, frustration/restlessness/irritability, sadness, and embarrassment of the disease symptoms. According to the original definition of the SNOT-20 questionnaire, the final "general quality of life" subscore is calculated as $100 \cdot \sum_{i \in \{11, 13, \ldots, 20\}} \mathrm{rating}_i / 45$, where the value of 45 accounts for the maximum sum of the individual responses to nine questions and the indices #11, 13, ..., 20 refer to the item numbers in the SNOT-20.
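The subscore calculation can be illustrated with a short Python sketch. It assumes that the item list "#11, 13, ..., 20" expands to item 11 plus items 13 through 20, which yields the nine items and the maximum sum of 45 mentioned in the text; the names `QOL_ITEMS` and `general_qol_subscore` are hypothetical.

```python
# Assumption: "#11, 13, ..., 20" means item 11 plus items 13 to 20 (nine items).
QOL_ITEMS = (11, 13, 14, 15, 16, 17, 18, 19, 20)
MAX_RATING = 5  # each item is rated from 0 to 5


def general_qol_subscore(ratings):
    """Return the "general quality of life" subscore on a 0-100 scale.

    ratings: dict mapping SNOT-20 item number to its rating (0..5).
    """
    values = [ratings[item] for item in QOL_ITEMS]
    if any(not 0 <= v <= MAX_RATING for v in values):
        raise ValueError("ratings must lie between 0 and 5")
    # Nine items times a maximum rating of 5 gives the divisor of 45.
    return sum(values) / (len(QOL_ITEMS) * MAX_RATING) * 100
```

A patient who rates every one of the nine items with 1 would thus obtain a subscore of 9/45 · 100 = 20.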

Bias
The study included an unselected random sample of patients consecutively enrolled for functional endoscopic sinus surgery at their scheduling. Potential confounders of olfactory function such as occupational exposure to toxic substances or intake of medications [29,30] were recorded and eventually excluded as a possible cause of the observed results.

Study Size
The sample size was defined to be twice that of a positive study of the correlation of measured olfactory function with self-assessments of olfaction in n = 80 patients with nasal polyposis [1]. A formal sample size estimate was not performed.

Quantitative Variables
The data set included n = 158 subjects and d = 9 variables, including (i-iii) d = 3 variables that contained the results of the olfactory subtests from which the individual sum scores (iv) were calculated and translated into the olfactory diagnosis, (v-vi) d = 2 variables that contained the self-assessments of olfactory function according to the two rating scales, (vii) the quality of life expressed as the weighted sum score of the relevant nine items in the SNOT-20 questionnaire, and (viii-ix) the patients' age and gender. Missing values in key variables on sense of smell were not imputed; missing values in other variables were replaced by the median of the available data.

Data Analysis
The data analysis was primarily aimed at evaluating the utility of self-assessments of olfactory function as a surrogate for functional olfactory testing. Quality of life was included in some of the analyses as a guide for interpreting possible discrepancies between self-assessments and olfactory test results.
The programming work for this report was performed in the R language [31] using the R software package [32] (version 4.0.5 for Linux), which is available free of charge in the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/, accessed on 20 May 2021. The analyses were performed on an Intel Core i9-7940X® computer (Intel Corporation, Santa Clara, CA, USA) running Ubuntu Linux 20.04.2 LTS (Canonical, London, UK).

Statistical Comparison of Diagnostic Group Differences in Self-Rated Olfactory Function
Data analysis was aimed first at statistically significant differences in the two self-assessment scores between the olfactory diagnostic groups of anosmia, hyposmia, and normosmia. For this purpose, the two self-ratings were subjected to a repeated measures analysis of variance (rm-ANOVA), with the within-subject factor "rating scale" (two levels) and the between-subject factors "olfactory diagnosis" (three levels) and "gender". To avoid detecting effects of the factor "rating scale" that are simply due to the different scaling, both scores were (re)scaled to the range [1, ..., 10]. These calculations were performed using the R libraries "rstatix" (https://cran.r-project.org/package=rstatix, accessed on 20 May 2021 [33]) and "scales" (https://CRAN.R-project.org/package=scales, accessed on 20 May 2021 [34]). The α-level was set at 0.05.
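The rescaling step can be sketched in Python as a linear min-max mapping. This is only an illustration under the assumption that the nominal scale limits (0 to 7 for rating scale #1) define the source range; whether the original R analysis used the nominal limits or the observed data range is not stated. The function name `rescale` is chosen here for illustration.

```python
def rescale(value, old_min, old_max, new_min=1.0, new_max=10.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("the source range must not be empty")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# Rating scale #1 runs from 0 to 7 (assumed nominal limits);
# rating scale #2 already runs from 1 to 10.
scale1_rescaled = [rescale(r, 0, 7) for r in range(8)]
```

With this mapping, "no smell perception" (0) becomes 1 and "excellent" (7) becomes 10, so both scales share the same endpoints.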

Covariance and Correlation Analyses
Second, the covariance and correlation structure between clinical olfactory test results and subjective ratings of olfactory function was analyzed. Quality of life was included in the analysis for comparison. Olfactory thresholds were log-transformed to account for the geometric scaling of their acquisition. The 7 × n sized data space (n denoting the analyzed sample size) was projected onto a two-dimensional space using principal component analysis (PCA [35,36]) on scaled and centered data, which are the default settings of the R library "FactoMineR" (https://cran.r-project.org/package=FactoMineR, accessed on 20 May 2021 [37]). To select the most suitable PCs for further analyses, the eigenvalues were submitted to an item categorization technique implemented as computed ABC analysis [38]. This divides each set of positive numerical items into three non-overlapping subsets, referred to as "A", "B", and "C" [39], of which subset "A" contains the "important few" items (i.e., the relevant PCs) to be retained. As shown previously, this is a mathematically valid replacement for traditional thresholds, such as the Kaiser-Guttman criterion, which chooses a threshold of eigenvalue > 1 for PC selection [40,41], and it maximizes the information obtained from multivariate biomedical data. These calculations were performed using our R package "ABCanalysis" (https://cran.r-project.org/package=ABCanalysis, accessed on 20 May 2021 [38]). Finally, given the discussions about the suitability of PCA for Likert-scale data [42], nonlinear PCA was additionally performed to test whether the conclusions drawn from standard PCA held up. This was conducted using the R package "kernlab" (https://cran.r-project.org/package=kernlab, accessed on 20 May 2021 [43]).
Third, correlations between olfactory test results or self-assessments were analyzed by calculating Spearman's ρ [44]. Quality of life was included in the analysis as a common reference point to facilitate interpretation. Correlations were calculated for the entire data set and separately for the six subgroups consisting of the three odor diagnoses anosmia, hyposmia, and normosmia and the two genders. These analyses were performed using the R packages "GGally" (https://CRAN.R-project.org/package=GGally, accessed on 20 May 2021 [45]) and "inspectdf" (https://CRAN.R-project.org/package=inspectdf, accessed on 20 May 2021 [46]).
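Why correlations were computed both for the entire data set and per subgroup can be illustrated with a small Python sketch. The data below are synthetic and constructed purely for illustration; they are not study data. The helper names `ranks` and `spearman_rho` are hypothetical, and the rank correlation is computed from first principles rather than with the R packages named above.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Synthetic example: the groups differ in their mean test score and mean
# self-rating, but within each group the two variables are unrelated.
test_offsets = [-2, -1, 0, 1, 2]
rating_offsets = [-0.5, 1.0, 0.0, -1.0, 0.5]
group_means = {"anosmia": (10, 2), "hyposmia": (24, 5), "normosmia": (36, 8)}

data = {
    name: ([m_test + d for d in test_offsets], [m_rate + d for d in rating_offsets])
    for name, (m_test, m_rate) in group_means.items()
}

all_tests = [v for tests, _ in data.values() for v in tests]
all_ratings = [v for _, ratings in data.values() for v in ratings]

pooled_rho = spearman_rho(all_tests, all_ratings)
within_rho = {name: spearman_rho(t, r) for name, (t, r) in data.items()}
```

In this constructed example the pooled correlation is high although the correlation within each diagnostic group is zero, which is the pattern that a subgroup-wise analysis is meant to uncover.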

Assessment of the Utility of Self-Ratings for Olfactory Diagnosis Establishment
Fourth, the usefulness of the olfactory self-ratings for olfactory diagnosis assignment was evaluated. The two necessary breakpoints for the olfactory diagnoses hyposmia and normosmia were determined using an exhaustive search approach. That is, all possible combinations of two consecutive breakpoints along increasing self-assessment scores were analyzed with respect to the balanced accuracy [47] of the obtained olfactory diagnosis. The procedure resulted in an assignment rule for the olfactory diagnoses from the self-rating in the form of "IF self-assessment < breakpoint #1 THEN anosmia ELSE IF self-assessment < breakpoint #2 THEN hyposmia ELSE normosmia". This was repeated 1000 times using bootstrap [48] resampled data sets of 100 cases each, from which the minimum balanced accuracy among the three olfactory diagnoses was retained. The final combination of breakpoints was the one for which the median minimum balanced accuracy was highest. Other measures of classification performance were calculated for this combination, including sensitivity, specificity, negative and positive predictive values calculated using standard equations [49,50], and the F1 measure [51,52]. These calculations were performed using the R library "caret" (https://cran.r-project.org/package=caret, accessed on 20 May 2021 [53]). The 95% confidence intervals (CI) of the classification performance parameters were determined as the range between the 2.5th and 97.5th percentile of the respective values during the 1000 runs.
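The exhaustive breakpoint search can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original R implementation: balanced accuracy is computed per diagnosis in a one-versus-rest fashion as the mean of sensitivity and specificity, bootstrap samples in which a diagnosis is absent are skipped, and all function names and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations
from statistics import median

DIAGNOSES = ("anosmia", "hyposmia", "normosmia")


def assign(rating, bp1, bp2):
    """Apply the rule: < bp1 anosmia, < bp2 hyposmia, else normosmia."""
    if rating < bp1:
        return "anosmia"
    if rating < bp2:
        return "hyposmia"
    return "normosmia"


def balanced_accuracy(truth, predicted, label):
    """One-vs-rest mean of sensitivity and specificity; None if undefined."""
    pairs = list(zip(truth, predicted))
    pos = sum(t == label for t, _ in pairs)
    neg = len(pairs) - pos
    if pos == 0 or neg == 0:
        return None
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    return (tp / pos + tn / neg) / 2


def min_balanced_accuracy(ratings, truth, bp1, bp2):
    """Smallest balanced accuracy among the three diagnoses."""
    predicted = [assign(r, bp1, bp2) for r in ratings]
    scores = [balanced_accuracy(truth, predicted, d) for d in DIAGNOSES]
    return None if any(s is None for s in scores) else min(scores)


def search_breakpoints(ratings, truth, candidates, n_boot=1000, size=100, seed=0):
    """Return the breakpoint pair with the highest median minimum
    balanced accuracy across bootstrap samples, and that median."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(candidates), 2))  # bp1 < bp2
    results = {pair: [] for pair in pairs}
    n = len(ratings)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(size)]  # sample with replacement
        r = [ratings[i] for i in idx]
        t = [truth[i] for i in idx]
        for pair in pairs:
            score = min_balanced_accuracy(r, t, *pair)
            if score is not None:
                results[pair].append(score)
    scored = [p for p in pairs if results[p]]
    best = max(scored, key=lambda p: median(results[p]))
    return best, median(results[best])
```

Called as `search_breakpoints(ratings, truth, range(1, 11))`, the function returns the selected breakpoint pair together with its median minimum balanced accuracy; taking the minimum over the three diagnoses favors rules that perform acceptably for every diagnosis rather than only for the most frequent one.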

Participants and Descriptive Data
Two women were excluded because olfactory tests or self-ratings were incomplete. The analyzed data set thus consisted of n = 156 patients (60 men, 96 women, aged 49.1 ± 14.8 years), of whom n = 65 were anosmic, n = 74 had hyposmia, and n = 17 had normal olfactory function, based on testing with the "Sniffin' Sticks". Two missing quality-of-life ratings were replaced with the median of the available 154 cases.

Main Results
Patients assigned to the olfactory diagnoses of anosmia, hyposmia, or normosmia based on TDI scores differed with respect to self-assessments of their olfactory function (Figure 1). The results of the analysis of variance for repeated measures showed significant effects of the factors "olfactory diagnosis" and, at a much lower significance level, "rating scale", while "gender" had no effect. Details are given in Table 2.
[Figure caption fragment] The colors were selected from the "colorblind_pal" palette provided with the R library "ggthemes" (https://cran.r-project.org/package=ggthemes, accessed on 20 May 2021 [55]).
Table 2. Results of analyses of variance for repeated measures. The analysis was designed using the within-subject factor "rating scale" (two levels, one degree of freedom) and the between-subject factors "olfactory diagnosis" (three levels, two degrees of freedom) and "gender" (one degree of freedom). The factor "rating scale" refers to the ratings on the two different scales, i.e., a numerical rating scale with seven named items (rating scale #1) and a 10-point scale with only the extremes named (rating scale #2). For the direction of the effects, see Figure 2. Degrees of freedom were corrected according to Greenhouse-Geisser [56]. * p < 0.05.

Pattern of Olfactory Tests and Self-Ratings
PCA projection of the high-dimensional olfactory test or self-estimates data (Figure 3) and subsequent selection of the relevant PCs based on computed ABC analysis of the eigenvalues retained two PCs. The two PCs explained 64.7 and 14.8% of the total variance, respectively. PC1 carried relevant loadings from all olfaction-related variables except olfactory threshold. PC2 carried relevant loadings from the quality of life. However, nonlinear PCA showed an additional separation of rating scales from olfactory test scores. This was also observed in the factor plot (Figure 3C) and became even more evident when quality of life was omitted from the standard PCA projection, which resulted in two main PCs in which PC1 carried loadings from the olfactory tests but not self-ratings, and PC2 carried loadings from the self-ratings (Supplemental Figure S1).

Correlations of Olfactory Tests and Self-Ratings
High correlations of the olfactory subtests with each other and with the resulting TDI sum score, and of the two self-assessment scores with each other, were implied by the PCA results and were not investigated further. The focus was on the correlations of self-ratings with olfactory test results. There, a global assessment also indicated high correlations with values of ρ between 0.53 and 0.66 and p-values < 0.001 (precisely, $1.57 \cdot 10^{-12}$ or less) (Figure 1). However, the apparently high correlations of self-assessments and olfactory test results disappeared when the analyses were applied to the olfactory diagnoses separately. Then, the first self-assessment scale showed only five significant correlations with odor test scores, three of which were with the total sum score, one with odor discrimination, and one with odor identification, all in females only (Figure 2), with no order of strength of correlations following the order of sample sizes. The second self-assessment score correlated significantly with olfactory test scores only three times, twice with the TDI total score, and once with odor discrimination, again only in women. Olfactory thresholds showed the least tendency for correlation with olfactory self-assessments among the olfactory test scores. Finally, both self-rating scales, but none of the variables from the odor tests, showed significant global negative correlations with the general quality-of-life scores. In addition, both rating scales correlated significantly negatively with the quality of life scores of the anosmic patients.

Utility of Self-Ratings for Olfactory Diagnosis Establishment
The optimal breakpoints for deriving the clinical olfactory diagnosis, known from the TDI test result, from the self-ratings differed with respect to the two self-rating scales and the sex of the patient (Supplemental Figure S2). For self-assessment score 1 and men, the obtained assignment rule was "IF self-assessment < 2 THEN anosmia ELSE IF self-assessment < 4 THEN hyposmia ELSE normosmia". For women, the breakpoints were at scores 1 and 3, respectively. For self-assessment score 2, the respective values were 4 and 7 for men and 3 and 5 for women. Using these rules in a 1000-resampling scenario provided the assignment performance measures and their 95% CI (Table 3). The median assignment performance appeared to be satisfactory with balanced accuracies of 69.5-75.3%. However, the first of the lower bounds of the 95% CIs (49.5, 54.6, 56.5, and 61.5%) indicated that the assignment of the three TDI-based clinical olfactory diagnoses from the self-assessments may also be in the range of pure guessing. The median assignment performance improved when combining both self-rating scores. Using the rule "IF rating scale #1 < 2 AND rating scale #2 < 4 THEN anosmia ELSE IF rating scale #1 < 5 AND rating scale #2 < 7 THEN hyposmia ELSE normosmia" in men provided median balanced accuracies mostly above 70% for the entire three-diagnosis setting as well as for each olfactory diagnosis alone, occasionally reaching 80% or more and with 95% CIs always > 50%. The respective breakpoints in the rule for women were 2, 3, 4, 5, providing similarly successful assignments as observed in men (Table 3).
Table 3. Performance of the assignment to the TDI-based olfactory diagnoses either for the three-diagnoses setting of anosmia, hyposmia, or normosmia, or for each diagnosis separately, from the olfactory self-ratings based on the best-performing rules as established in an exhaustive search (Supplemental Figure S2). The results represent the medians and 95% confidence intervals of the performance measures obtained during 1000 runs using bootstrapped resampling of 100 cases each from the original data set.
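The combined assignment rule quoted above for men can be written as a short Python function. This is an illustrative sketch: the function name `combined_rule` and its parameter names are hypothetical, and the ratings are assumed to be on the original scales (0 to 7 for rating scale #1, 1 to 10 for rating scale #2).

```python
def combined_rule(scale1, scale2, anosmia_bp=(2, 4), hyposmia_bp=(5, 7)):
    """Assign an olfactory diagnosis from both self-rating scales.

    Each breakpoint pair is (rating scale #1, rating scale #2).
    The defaults are the breakpoints reported for men.
    """
    if scale1 < anosmia_bp[0] and scale2 < anosmia_bp[1]:
        return "anosmia"
    if scale1 < hyposmia_bp[0] and scale2 < hyposmia_bp[1]:
        return "hyposmia"
    return "normosmia"
```

Other breakpoints, such as those reported for women, can be passed through the two keyword arguments; how the four reported values map onto the two scales is assumed here to follow the order of the rule for men.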

Key Results
The present results obtained in patients with nasal polyposis suggest that self-assessments of olfaction on numerical scales can be translated fairly accurately into nominal diagnostic categories of anosmia, hyposmia, or normosmia, but they cannot be expected to reliably provide fine-scale information about olfactory function that could be used as a surrogate for quantitative olfactory tests. Thus, the present independently recruited cohort replicated the observation that in patients with nasal polyposis, self-assessment of their olfactory function provides fairly reliable information about their sense of smell [1,21], but the reported high correlation of NRS ratings and olfactory test scores was not reproduced. This overall usefulness of the self-ratings as a source of information about main categories of olfactory function is consistent with previous assessments of olfactory self-reports, in which a simpler 4-point scale detected anosmia at positive predictive values of >58% in a mixed cohort of subjects [20]. The observed poor correlation between self-assessments and measured olfactory function, which contrasts with some previous reports, underscores the need to exercise caution when assessing correlations between self-assessments of olfaction and olfactory test results. Global analyses may suggest a strong correlation that merely reflects overall group differences between the olfactory diagnoses, which makes the correlative relationship appear much larger than it actually is. Such pooled analyses were used in some of the reports of apparently strong correlations of self-assessments with sensory test scores and have probably contributed to the disagreeing study results about the utility of self-assessments as a substitute for olfactory testing. To avoid these pitfalls, it is a recommended practice (e.g., [59]) to cross-check numerical correlation calculations first with an analysis of the shape of the distributions of each variable and second with a scatter plot.
However, there is often little correlation within the main diagnostic subgroups, which has been similarly observed previously [3]. Therefore, the self-ratings do not qualify as a good replacement for olfactory functional test scores.
Thus, when the olfactory diagnostic subgroups were assessed for the correlation of self-ratings with test scores, i.e., for the possibility of using the former as a surrogate for the latter, significant correlations were found sparsely and mainly in patients with normosmia, with an additional correlation in hyposmic subjects and with the first rating scale for anosmia, although this diagnosis implies test results in the range of chance. However, "functional" anosmia, which is the precise meaning of olfactory diagnosis [60], does not indicate a complete lack of the sense of smell, and it would not be surprising to observe differences in individual subtest scores along a low range of possible scores, occasionally exceeding chance but not summing-up above the limit of hyposmia. The finding may also be interpreted as a lower validity of the precisely labelled scale for which every point required a clear decision in contrast to the continuous rating scale #2.
The usefulness of self-assessment as a surrogate measure for scaled olfactory test scores was further supported by the significant negative correlations of the assessments with the general quality of life score of the SNOT-20 questionnaire, which is a weighted sum of nine questions in which worse symptoms get a higher rating than items that cause no problem for the patient. This agrees with the general perception that olfaction contributes positively to quality of life, which has been supported by numerous studies [61-66].
However, while the general association of a worse quality of life with a more reduced perception of one's own sense of smell was preserved, the PCA projected the quality-of-life ratings to be almost perpendicular to the olfactory test scores, which was consistent with their almost nonexistent correlation. The present findings compare to previous multicentric work in a group of 760 individuals with olfactory loss where measures of quality of life were better correlated to self-rated olfactory function than results from psychophysical tests of olfactory function [65]. This raises the question of what is rated when patients are asked to estimate the acuity of their own sense of smell. Using multidimensional scaling, a classical data projection technique [67], on more complex data than the present set, the odor perception space was found to be complex, leaving room for different aspects captured either by self-assessments or clinical sensory testing [68]. For example, olfactory self-ratings were found to be more related to the affective impact of the odor, such as annoyance, but not to the results of the olfactory tests [6,17]. Personality traits have also been associated with self-perception of the performance of one's sense of smell [69]. Importantly, nasal airflow has been shown to modify self-ratings of olfactory function [3]. Furthermore, the poor self-assessments of one's olfactory acuity could be due to the much more limited content of consciousness compared to the other main senses, which has been discussed as being a consequence of the mainly paleocortical processing of chemosensory information [70]. Finally, while other sensory perceptions such as seeing and hearing are subject to constant external feedback, this is less pronounced for olfactory function. Anosmia can go unnoticed despite the fact that, at least during eating, each day there are numerous olfactory encounters [23,24].

Strengths and Limitations
Splitting the sample into subgroups in terms of gender and odor diagnoses had the inevitable effect of rapidly reducing sample sizes per subgroup. This is a common problem with studies that are initially conducted with a fairly large sample, but as one moves into subgroup analyses, the sample size melts away. In addition, the study sample was not set based on a previous estimate of sample size, but was set at approximately twice the size of the largest study of olfactory self-assessments in patients with nasal polyposis [1]. However, visual inspection of the scatter plots (Figure 1) shows that the apparent correlation was due to the artificial effect described above, and the linear trends in the subgroup-specific test score versus rating score plots showed no evidence that an insufficient sample size merely prevented statistical significance; on the contrary, there was little or no correlation. In addition, the lower correlations in the split analysis for gender and odor diagnosis, which also resulted in smaller samples compared with the correlations performed in the entire cohort, did not appear to be a consequence of the smaller samples. On the contrary, the order of sample sizes used for the specific correlations did not at all follow the order of decreasing strength of the correlations (Figure 2, right bar plot).
The present use of two numerical rating scales did not reveal clear differences in the query of self-assessment of olfactory function. However, the two scales were relatively similar to each other and to those used in most other studies in the same or slightly modified form. The question to the participants is simple, but odor awareness, i.e., a person's ability to perceive and respond to odor stimuli in the environment, is more complex and contributes to self-assessment. It also contrasts with the present assessments of the quality of life, which have been queried using a complex questionnaire. However, studies that used questionnaires instead of one-dimensional scales yielded ambiguous results with strong or no correlation to the measured olfactory function, respectively [1,19]. A specific questionnaire on odor perception was not used in the present study or in others that addressed the issue of the accuracy of self-assessments related to measured olfactory function [71].
Modest data imputation was performed for the quality-of-life assessments, but it was required for only two patients, both of whom belonged to the olfactory diagnostic category of anosmia, the second most common diagnostic subgroup with n = 65 patients. Two imputed values corresponded to 3.07%, which did not have a major impact on subsequent statistical analyses. Indeed, when the analyses were repeated without imputed values, meaning that two patients were lost from the analyses, the results changed only in numerical details, without any change in significant versus nonsignificant results. In particular, the values of the olfactory test results and the olfactory self-ratings, which are the focus of this report, were completely unaffected by this imputation.

Interpretation
The present results suggest that asking patients with nasal polyposis about the function of their sense of smell provides relevant and correct information; however, not as an ordinal- or interval-scale measure that correlates with measured olfactory scores, but when categorical information is extracted from the ratings. By combining two numerical self-rating scales of olfactory function, it was possible to create a diagnostic tool in the form of a simple rule that assigns the olfactory diagnosis with a balanced accuracy of about 80% or slightly above, which can be considered a moderately to fairly good diagnosis-assignment performance. With regard to a correlative relationship of ratings with measured olfactory scores, as previously reported [1,21], the present results from a cohort of comparable size suggest a contrary interpretation. Self-assessments cannot be expected to provide scaled information that can substitute for quantitative olfactory functional measurements.
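How a rule of this kind can be scored against the test-based diagnosis is sketched below. The cutoffs in `diagnose`, the rating ranges (1 to 7 and 1 to 10), and the toy cases are all hypothetical assumptions for illustration; they are not the rule or the data obtained in this study. Balanced accuracy is computed here as the mean of the per-class recalls, which keeps the small normosmia group from being swamped by the larger groups.

```python
def diagnose(rating_7: int, rating_10: int) -> str:
    """Map two self-ratings to a diagnosis using HYPOTHETICAL cutoffs."""
    if rating_7 <= 2 and rating_10 <= 2:
        return "anosmia"
    if rating_7 >= 6 and rating_10 >= 8:
        return "normosmia"
    return "hyposmia"


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the per-class recalls (fraction of each true class recovered)."""
    recalls = []
    for cls in sorted(set(true_labels)):
        idx = [i for i, t in enumerate(true_labels) if t == cls]
        hits = sum(predicted_labels[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy cases: (7-item rating, 10-point rating, test-based diagnosis).
# Invented for illustration only; NOT patient data.
cases = [
    (1, 1, "anosmia"), (2, 2, "anosmia"), (1, 2, "anosmia"), (3, 4, "anosmia"),
    (4, 5, "hyposmia"), (3, 3, "hyposmia"), (5, 6, "hyposmia"),
    (6, 8, "hyposmia"), (2, 1, "hyposmia"),
    (7, 9, "normosmia"), (6, 8, "normosmia"),
]

true_labels = [c[2] for c in cases]
predicted_labels = [diagnose(c[0], c[1]) for c in cases]

bal_acc = balanced_accuracy(true_labels, predicted_labels)
plain_acc = sum(t == p for t, p in zip(true_labels, predicted_labels)) / len(cases)

print("balanced accuracy:", round(bal_acc, 3))
print("plain accuracy:", round(plain_acc, 3))
```

With these toy cases, the per-class recalls are 0.75, 0.6, and 1.0, so the balanced accuracy (about 0.78) differs from the plain accuracy (8 of 11 cases), which shows why the two measures should not be used interchangeably when group sizes are unequal.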

Generalizability
Self-assessments of olfactory function are applied in different clinical settings and in different cohorts of healthy or ill subjects. Views on how accurately they reflect true olfactory function, based on previously reported evidence, are mixed. The presently developed olfactory diagnostic instrument, which consists of a mapping rule of two numerical rating scales of one's olfactory function to the olfactory functional diagnosis based on the "Sniffin' Sticks" clinical test battery, demonstrates that self-assessments can be usefully employed in clinical and research settings if care is taken to ensure that they are intended to provide categorical rather than interval- or ordinal-scaled information. This has been demonstrated in patients with nasal polyposis, who may stand out as a group particularly well aware of their own olfactory function. Nevertheless, with clear instructions available, it seems possible to generalize the approach of translating self-assessments into olfactory diagnoses to other clinical cohorts in which olfaction plays a role, although this remains to be shown. In the present cohort, the obtained instrument performed well in assigning the categorical olfactory diagnosis from self-ratings.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app11167279/s1: the PCA results without quality-of-life ratings (Supplemental Figure S1) and the balanced accuracies obtained during the exhaustive search for association rules from self-ratings to odor diagnoses (Supplemental Figure S2).

Institutional Review Board Statement:
The study was approved by the Ethics Committee at the Dresden University Hospital (approval number EK14502017).