Identifying Patient-Reported Outcome Measures (PROMs) for Routine Surveillance of Physical and Emotional Symptoms in Head and Neck Cancer Populations: A Systematic Review

The aims of this review were to identify symptoms experienced by head and neck cancer (HNC) patients and their prevalence, as well as to compare symptom coverage across HNC-specific patient-reported outcome measures (PROMs). Searches of Ovid Medline, Embase, PsycINFO, and CINAHL were conducted to identify studies. The search revealed 4569 unique articles and identified 115 eligible studies. The prevalence of reported symptoms was highly variable among included studies. Variability in sample size, timing of the assessments, and the use of different measures was noted across studies. Content mapping of commonly used PROMs showed variability and poor capture of prevalent symptoms, even though validation studies confirmed satisfactory reliability and validity. This suggests limitations of some of the tools in providing an accurate and comprehensive picture of the patient’s symptoms and problems.


Introduction
GLOBOCAN estimated 932,000 new cases of head and neck cancer (HNC) and 467,000 deaths worldwide in 2020 [1]. HNC refers to a group of cancers that includes cancers of the oral cavity, pharynx, larynx, paranasal sinuses and nasal cavity, and salivary glands [2,3]. Due to the location of cancer and type of treatment, HNC patients experience unique oral morbidity and related symptoms such as dysphagia, xerostomia, trismus, osteoradionecrosis, mucositis, lymphedema, and sialadenitis [4,5]. They may also experience changes in appearance and speech, decreased neck mobility, and shoulder dysfunction [6,7]. These changes can affect self-esteem and body image, sexuality, social anxiety, physical functioning, and quality of life (QOL), leading to high levels of psychological distress [4,8].
Patient-reported outcome measures (PROMs) are used in healthcare systems to determine the impact of disease and treatment on the patient and to estimate disease burden across a population [9]. PROMs are standardized, validated questionnaires completed by patients to measure their symptoms, perceptions of health status, and/or functional wellbeing [10]. PROMs should capture the most prevalent symptoms and treatment effects experienced by HNC patients. However, it is unclear to what extent PROMs map to specific problems of HNC patients and if they are psychometrically sound. Selection of a core set of condition-specific PROMs for routine capture specific to HNC and its treatment effects is critical for guiding patient management in routine care, for estimating disease burden, and for value-based performance measurement in the cancer system.
A preliminary review of the literature identified three previously conducted systematic reviews on PROMs for assessing QOL in HNC populations, but none have mapped PROMs to identify their capture of prevalent physical and emotional symptoms or other problems in this population [11][12][13]. Thus, the aims of this study were to (1) explore the prevalence of symptom burden and treatment effects in HNC, (2) identify relevant PRO domains and PROMs specific to HNC, and (3) evaluate psychometric properties to recommend use in routine care.

Materials and Methods
This systematic literature review focused on HNC patients undergoing treatment (surgery, radiation, and/or chemotherapy). There were three phases of work: (1) a systematic review of the literature to identify the prevalence of symptom burden, (2) identification of common PROMs, with mapping of domains and items to HNC-specific symptoms and comparison of PROM content across measures, and (3) review of the psychometric properties of the identified PROMs (Figure 1).

Search Strategy
Systematic searches of electronic databases were conducted in MEDLINE, EMBASE, PsycINFO, and CINAHL to identify studies that reported prevalence rates for HNC symptoms. Gray literature sources were also searched and included the National Health Service in England (NHS), American Society of Clinical Oncology (ASCO), International Society for Pharmacoeconomics and Outcomes Research (ISPOR), Integrating the Healthcare Enterprise (IHE), and Cancer Australia websites. MEDLINE and PubMed searches were also conducted to obtain validation studies of the most commonly used PROMs, specifically the Head and Neck Radiotherapy Questionnaire (HNRT-Q), Quality of Life

Characteristics of the Included Studies
Studies were limited to 2004 onwards, with 88 (76%) of the included studies published after 2010. The 115 included studies comprised 63 cross-sectional studies, 45 prospective cohort studies, 4 retrospective cohort studies, 2 controlled studies, and 1 mixed-methods study. Studies either included patients across different cancer stages (I-IV) or did not specify the cancer stage. Study characteristics are provided in Supplementary Table S1.

Emotional Distress and Psychosocial Symptoms
As shown in Table 1, emotional distress and psychosocial symptoms were the most common issues identified in HNC, including depression (n = 22 studies), sadness (n = 5 studies), anxiety (n = 20 studies), worry (n = 3 studies), emotional distress (n = 7 studies), satisfaction with appearance (n = 4 studies), and avoidance of social interactions (n = 3 studies).

Depression
Depression was commonly identified in many included studies, although study heterogeneity precluded meta-analysis. Sample sizes ranged from 23 to 1217 patients. Rates of depression varied widely, from 2% to 84%, owing to differences in the timing of assessments, the scales used, and the threshold values applied. For example, across different studies, depression was evaluated using any of the following instruments: the Hospital Anxiety and Depression Scale (HADS) [16][17][18][19][20][21][22][23][24][25][35][36], the Beck Depression Inventory (BDI) [26][27][28], the short-form of the Geriatric Depression Scale (GDS-SF) [29], the Neuropsychiatric Inventory Questionnaire (NPI-Q) [30], the Research Diagnostic Criteria Schedule for Affective Disorders and Schizophrenia (RDC SADS) [31], the University of Washington Quality of Life Mood scale (UWQOL-mood) [32], and the Patient Health Questionnaire-8 (PHQ-8) [4]. Chen et al. [16] evaluated the prevalence of depression over time using the HADS-D (cut-off score ≥ 8) and the BDI (cut-off score ≥ 14) and reported a 13% difference in the number of patients with depression as identified by the HADS-D and the BDI at pre-treatment, a 5% difference in prevalence during treatment, and a 12% difference post-treatment between instruments. Further to this, Katz et al. [31] applied the Research Diagnostic Criteria (RDC) for depression to a sample of HNC patients and used these results to compare the sensitivity, specificity, and positive predictive values of different threshold scores for different instruments. As may be expected, each instrument and associated cut-off score performed differently [31].
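The effect of the chosen threshold on both apparent prevalence and screening accuracy can be illustrated with a minimal sketch. It assumes a questionnaire score per patient and a reference-standard diagnosis per patient; the function name, the scores, and the cut-offs below are hypothetical and are not taken from any included study.

```python
def screening_metrics(scores, reference, cutoff):
    """Compare a questionnaire cut-off against a reference diagnosis.

    scores    -- questionnaire scores, one per patient
    reference -- True where the reference standard diagnosed depression
    cutoff    -- a score >= cutoff counts as a positive screen
    """
    tp = fp = fn = tn = 0
    for score, diseased in zip(scores, reference):
        positive = score >= cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return {
        # Share of patients flagged by the questionnaire at this cut-off.
        "prevalence_by_screen": (tp + fp) / len(scores),
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "ppv": tp / (tp + fp) if tp + fp else None,
    }


# Hypothetical data for ten patients (illustrative only, not study data).
hypothetical_scores = [3, 5, 7, 8, 9, 10, 11, 12, 14, 16]
hypothetical_reference = [False, False, False, False, True,
                          False, True, True, True, True]

low_cutoff = screening_metrics(hypothetical_scores, hypothetical_reference, cutoff=8)
high_cutoff = screening_metrics(hypothetical_scores, hypothetical_reference, cutoff=11)
```

In this toy example the lower cut-off flags more patients and misses none, while the higher cut-off flags fewer patients with no false positives, which is the kind of trade-off that makes prevalence estimates from different instruments and thresholds hard to compare.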
Levels of depression in HNC appear to be independent of age, sex, disease site, and cancer stage [26,31,33]. Karnell et al. [26] found that higher levels of pre-treatment depressive symptoms were the only factor in multivariate analysis that was associated with persistently high levels of post-treatment depressive symptoms (odds ratio of 1.762; p < 0.01). Across most studies, both the prevalence and severity of depression increased as treatment progressed; this trend generally reversed and declined post-treatment [16,27,28,47]. McDowell et al. [23] found depression to be prevalent in one-quarter of patients with nasopharyngeal carcinoma even after 4 years of being disease-free after definitive intensity-modulated radiation therapy (IMRT). Given that depression is closely linked to physical symptom severity, this pattern of increasing prevalence and severity as treatment progresses was not surprising [17,47].

Sadness
Sadness was reported in five studies, ranging between 8% and 82% [37][38][39][40][41]. There was no consistency in terms of measurement tools used. One study reported that patients who underwent surgery were more likely to report being sad than those who had received chemotherapy (20% vs. 14% prevalence respectively) [38].

Emotional Distress
Seven studies examined emotional distress among HNC patients [38,40,41,48,49,50,51]. Measures and cut-off scores for identifying clinically significant distress varied among studies. The Distress Thermometer (DT) was used in three studies, with cut-off scores ranging from 3 to 5 [48][49][50]; two studies reported an overall prevalence of distress in 50% of the population surveyed [48,49], while Wells et al. [50] reported a prevalence of 35% for mild distress and 33% for moderate/severe distress. Three studies that used the MDASI-HN reported an overall prevalence of distress ranging between 14% and 86% [38,40,41]. Although treatment type (surgery versus chemotherapy) was not found to be a predictor of distress [38], one study reported that disease site, cutaneous (involvement of the lips, eyelids, ear, nose or face) versus non-cutaneous (larynx, oral/nasal cavity, glands, oro/nasopharynx), was a significant predictor [48].

Other Emotional Symptoms
Three studies reported on worry, with prevalence ranging between 30% and 62% [30,37,47]. Unlike most other symptoms, prevalence was highest before treatment (62%) and dropped significantly as treatment progressed (38% at 5 weeks during treatment, and 33% at 12 weeks after treatment) [47,58]. Others, such as Bond et al., reported apathy and indifference in 56.5% of patients and agitation and aggression in 52.5% [30].

Satisfaction with Appearance
Four studies of HNC patients undergoing surgery examined patient satisfaction with appearance [45,52,53,54]. In two studies, approximately 75% of patients reported either some type of body image concern or dissatisfaction with appearance [52,53]. There was also a significant difference in pre-surgical levels of satisfaction compared to post-surgical levels, with patients reporting significantly lower levels of satisfaction post-surgery [52,54]. One study used PCI and reported an overall prevalence of 89% [45].

Avoidance of Social Interactions
Three studies provided estimates for the prevalence of social dysfunction [53,55,56]. In one study, 38% of patients reported avoidance of social activities due to appearance, speech or eating concerns [53]. Dwivedi et al. found that 41% of oral cancer patients and 16% of oropharyngeal cancer patients reported avoiding social activities due to speech alone [55].

Substance Abuse Problems
Duffy et al. examined problem drinking and smoking: 16% of patients screened positive for problem drinking, while 30% had smoked cigarettes within the last month [29]. The study found that smokers and problem drinkers were more likely to be younger, not married, and within one year of diagnosis. The authors also reported that while smoking was negatively associated with all quality of life scale domains, problem drinking was not associated with any.

Delirium
Bond et al. examined the prevalence of delirium among HNC patients undergoing chemotherapy [59]. Among 58 patients who completed their 3-month follow-up, 18 (31%) self-reported experiencing delirium at some point during their chemotherapy, while only 9% of patients were diagnosed with delirium using the Confusion Assessment Method (CAM). No patients reported experiencing delirium before or after treatments.

Physical Symptoms
Several studies evaluated prevalence of physical symptoms in HNC (Table 2).

Eating and Nutritional Status

Dysphagia
A total of 35 studies assessed dysphagia, or difficulty swallowing [4,5,37,38,39,41,42,45,56,58,...]. Prevalence of dysphagia ranged between 0% and 100% across studies. Sample sizes ranged from 12 to 8002. The University of Washington Quality of Life (UW-QOL) questionnaire swallowing subscale [42,63,65,71], the M.D. Anderson Dysphagia Inventory (MDADI) [41,62,68,125], and the Common Terminology Criteria for Adverse Events [74,77,78,79,80,81] were the most commonly used instruments to assess dysphagia. Jager-Wittenaar et al. reported that approximately 28% of patients (oral, pharynx, and larynx) experienced dysphagia at diagnosis, likely as a result of the disease itself [60]. Studies that compared the symptoms before, during, and after radiotherapy found that HNC patients reported greater problems with swallowing as treatment progressed [58,61,67,68]. Symptoms also persisted well beyond treatment and did not return to baseline levels until 6 or more months post-radiotherapy [56,61,67]. Similar findings have also been reported pre- versus post-surgical resection [42,66]. Longer-term follow-up studies have suggested that the prevalence of dysphagia remains higher for patients who have undergone radiotherapy (15-95% prevalence at 12 months follow-up) [58,61,67] or multimodal treatments with chemotherapy and radiotherapy (75-79% prevalence at 6-60 months follow-up) [62,76] compared to those who underwent surgery alone (51% prevalence at 28 months follow-up) [56]. Receiving multiple treatment modalities was identified as an important predictor of dysphagia [38,75]. Patients who received concomitant chemotherapy and radiotherapy generally experienced a higher prevalence of dysphagia compared to those who underwent radiotherapy [61] or surgery alone [64]. However, radiotherapy alone is also a significant predictor of dysphagia [63,66]. Even the type of radiotherapy was found to be a predictor of dysphagia [76,77].
The absorbed dose to specific regions also appears relevant in the development of acute RT-related dysphagia [72]. Disease site may also be an important factor in swallowing function. In a cross-sectional population-based study, Francis et al. found that the prevalence of dysphagia varied by disease site [64]. Compared to patients with oral cancer, those with cancer of the oropharynx, hypopharynx, or larynx were significantly more likely to have dysphagia [64]. In contrast, Rinkel et al. found that patients treated for a laryngeal or hypopharyngeal tumor had significantly better scores compared to patients treated for an oral cavity, oropharyngeal tumor, or nasopharynx tumor on the total [76]. More generally, Suarez-Cunqueiro et al. found that patients with tumors located in the floor of the mouth and oropharynx experienced greater burden than those with tumors at other disease sites. In the same study, earlier stage disease was also found to be associated with better swallowing scores [66]. Difficulty swallowing also had a negative effect on quality of life [34,56,61] and weight loss [60,75]. Sixty-two percent of patients avoided eating with others, and 37% felt embarrassed at meal times due to their dysphagia [56]. Severe late dysphagia developed significantly more often in patients who were older than 65 years at initial treatment [83].
As may be expected, treatment type was a significant predictor of xerostomia. Arribas et al. reported that after induction chemotherapy (iCT), the prevalence was 15%, and 45% after RT [73]. Gunn et al. reported that the patients scheduled to undergo radiotherapy who had completed prior chemotherapy or surgery experienced higher prevalence of xerostomia (11.9% and 14% respectively) than untreated patients (5.5%) [38]. Radiotherapy alone was a significant predictor of xerostomia [86].

Difficulty Chewing and Dental Problems
Five studies examined the prevalence of chewing difficulties [38,41,42,45,71]. Baseline levels of chewing difficulties were variable among population groups (12-44%). In patients with oral and oropharyngeal cancer, 44% were found to have difficulty chewing at preoperative assessment [42]. Prior to radiotherapy, Gunn et al. found that 14% (including multiple disease sites) reported difficulty chewing [38]. Within this group, patients with no previous treatment, compared to patients with prior chemotherapy or surgery, had the lowest prevalence (12%, 13%, and 19% respectively) [38]. In comparison, 91% of patients with tongue cancer treated with surgery and radiotherapy reportedly had difficulty chewing an average of 27 months post-treatment [71]. Chewing problems were one of the most prevalent symptoms (98.5%) in patients with nasopharyngeal carcinoma undergoing late-period RT [41]. Six studies evaluated problems with teeth among head and neck cancer patients [5,38,41,45,69,89]. Pre-treatment prevalence ranged from 13% to 27%, while during chemoradiotherapy, prevalence was reported at 82%; at one-year post-treatment, prevalence ranged from 14 to 42% [69,89].

Weight Change and Malnutrition
Loss of appetite, loss of taste, and dysphagia are significantly associated with critical weight loss [60]. Sixteen studies reported malnutrition (clinician reported) and weight loss, with prevalence ranging between 3% and 95% [5,34,35,45,56,58,60,62,70,73,81,102,103,104,105,106]. Most studies defined critical weight loss as involuntary loss of more than 5% of normal weight within the past 1 to 6 months [34,56,58,60,62,102,103,104]. Baseline prevalence of malnutrition ranged from 8.5% (at diagnosis) up to 42% (prior to any treatment) [5,34,73,104,105]. During (chemo)radiotherapy, the reported prevalence was much higher (43%, 91%, and 81% at 1 week, 5 weeks, and 9 weeks, respectively) [58]. Although it appears that the prevalence remained high immediately post-treatment, the general trend across studies showed that the prevalence declined over time [70,73,103]. In terms of treatment type, chemotherapy and radiotherapy were significantly associated with greater rates of malnutrition compared to surgery alone and patients treated without chemo or radiation treatment [56,103]. Malnourished patients also experienced worse quality of life compared to adequately nourished patients [102]. Patients indicated a critical need for improved symptom management and/or nutrient intervention options to reduce the burden of weight loss and malnutrition [102].
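The definition of critical weight loss used by most studies (more than 5% of normal weight) reduces to a simple percentage calculation. The following is a minimal sketch under stated assumptions: the function names and the example weights are hypothetical, and neither the 1 to 6 month time window nor the requirement that the loss be involuntary is modelled.

```python
def percent_weight_loss(usual_weight_kg, current_weight_kg):
    """Return weight loss as a percentage of usual (normal) weight."""
    if usual_weight_kg <= 0:
        raise ValueError("usual weight must be positive")
    return 100.0 * (usual_weight_kg - current_weight_kg) / usual_weight_kg


def has_critical_weight_loss(usual_weight_kg, current_weight_kg,
                             threshold_percent=5.0):
    """Flag a loss of MORE than the threshold (5% by default).

    The assessment window and whether the loss was involuntary must be
    checked separately; they are outside the scope of this sketch.
    """
    loss = percent_weight_loss(usual_weight_kg, current_weight_kg)
    return loss > threshold_percent


# Hypothetical patient: usual weight 80 kg, current weight 75 kg (6.25% loss).
flagged = has_critical_weight_loss(80.0, 75.0)
```

Because the definition is "more than 5%", a loss of exactly 5% is not flagged in this sketch; studies that use "5% or more" would need `>=` instead.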

Communication

Voice and Speech Impairment
A total of 14 studies examined voice and speech impairment, with prevalence ranging from 9% to 88% (Table 2) [21,38,39,41,42,45,55,63,65,66,71,76,114,115]. Among oral and oropharyngeal cancer patients, the pre-treatment prevalence of speech impairment was found to be 42%; however, it is unclear if these patients had undergone any prior treatments [42]. When multiple disease sites were included, the pre-treatment prevalence was found to be much lower at 3% [38]. Post-treatment, the prevalence of voice and speech impairment increased significantly [21,63,65,71,114]. In terms of treatment type, prevalence was found to be higher in patients who received surgery (21.5%) than those who had received chemotherapy (7.5%) or no treatment (3%) [38]. When all treatment modalities were compared, however, patients who received radiotherapy reported the worst functional outcomes for speech [66]; because RT was given only to late-stage cancer patients in that study, the comparison between treatment types may be biased. In addition, Dwivedi et al. reported that oral cavity patients perceived more problems with voice and speech than oropharyngeal cancer patients [55]. Suarez-Cunqueiro et al. found that patients with tumors located in the floor of the mouth and oropharynx reported worse scores for speech compared to other tumor locations [66]. Only 7 of 14 studies used PROM instruments that were specifically designed to assess voice and/or speech impairment (VHI, VRQOL, GRBAS, SHI), while the rest used generic QOL instruments such as UWQOL, MDASI-HN, and FACT-HN.

Hearing Loss
Four studies examined prevalence of hearing loss among HNC patients [45,78,116,117]. In a small cross-sectional study (n = 11 patients), Liberman et al. reported that 36% of patients with laryngeal or hypopharyngeal cancer experienced hearing loss; however, the timing of this assessment was unclear [117]. Schultz et al. reported a prevalence of 72% hearing loss more than two years after treatment with radiotherapy in a study involving multiple HNC anatomic subsites [116]. The prevalence of hearing loss in this population was significantly higher than that of an age-matched control group treated with local surgery alone [116]. Huang et al. 2015 reported that IMRT technique was associated with less hearing loss [78].

Pain
Pain was reported in 22 studies with prevalence rates from 9% to 91% (Table 2) [5,21,34,37,38,40,41,42,45,48,58,60,70,77,81,93,118,119,120,121,122,123]. Most studies did not report the type or location of pain, and measurement tools were not consistent. Two studies used a visual analogue scale (VAS) to assess pain [21,119], three studies used the MDASI-HN [38,40,41], one used self-reported pain [122], three studies did not describe their method of assessment [118,120,121], two used the Common Terminology Criteria for Adverse Events (CTCAE) [77,81], and the remaining studies each used a different assessment tool. One study estimated that as many as 36% of HNC patients experienced pain at the time of diagnosis [5]. However, during treatment with (chemo)radiotherapy, the prevalence of pain appeared to rise dramatically [41,58,121]. In fact, Pignon et al. reported that 71% of patients in their study experienced pain during radiotherapy and 30% of those patients were experiencing "new pain", most likely caused by treatment [120]. Post-treatment, a general trend towards decreasing prevalence of pain was noted over time [70,121]. Two studies examined risk factors for pain, finding that, in general, a higher cancer stage was associated with increased levels of pain [48], while gender, treatment modality, and tumor site were not [119]. Cramer et al. identified that tri-modality treatment (surgery with adjuvant chemoradiation) was the only characteristic associated with pain [122]. Pain was consistently listed as one of the most distressing symptoms at each measurement period among studies [42,58,63].

Dyspnea and Cough
Three studies reported the prevalence of dyspnea or shortness of breath (Table 2) [37,38,41]. Baseline levels of dyspnea were estimated at 6% in this population [38], while Lokker et al. estimated that approximately 21% of HNC patients in the palliative phase of care experienced dyspnea [37]. The prevalence of dyspnea in palliative patients was highest in those treated with chemotherapy (12%) compared to surgery alone (4%) or other treatments (3%) [58]. During (chemo)radiotherapy, the prevalence of dyspnea was reported at 68% [41]. Three studies examined the prevalence of cough, which ranged between 10.5% and 52% (Table 2) [69,70,124]. Prior to treatment, Ginex et al. found a prevalence of 32% in esophageal cancer patients [70]. This same study found that symptoms of cough worsened post-operatively but recovered to baseline at one year. The prevalence of cough seemed to be independent of early versus late tumor stage [69].

Functional Well-Being
Some studies evaluated prevalence of functional well-being in HNC (Table 3).

Activities of Daily Living

Difficulties with Activities of Daily Living
Prior to treatment, Lango et al. reported that 9% of patients had problems with mobility, 2% with self-care, and 14% with performing usual activities [34]. As no reference population was used to compare these results, it is difficult to assess the severity of these symptoms (Table 3).

Sexual Function
Problems with sexual function were reported in two studies (Table 3) [20,39]. In one study, 32% of patients reported that they were less interested in sex following a laryngectomy, while 42% of males had erectile dysfunction [20]. The same study concluded that sexual problems were not treatment-related but were likely caused by the cancer itself [20]. Distress and depression were strongly correlated with sexual difficulties (p < 0.01) [20]. Beyond prevalence data, Ginex et al. found that patients reported problems with sexual activity and interest as one of the most bothersome symptoms both pre- and post-surgery [70].

Fatigue and Energy

Fatigue
The prevalence of fatigue, or decreased energy, was reported in 14 studies, ranging from 7% to 95% (Table 3) [23,37,38,39,40,41,45,58,69,70,77,123,126,127]. The baseline prevalence of fatigue prior to any treatment ranged from 14.5% to 58% [38,70]. The prevalence of fatigue appeared to increase over the course of treatment with radiotherapy (71%, 91%, and 95% at 1 week, 5 weeks, and 9 weeks, respectively) [58]. However, post-radiotherapy, prevalence was likely to return to baseline levels [58]. A different picture is shown post-surgery, as the prevalence of fatigue was worse immediately after surgery but recovered to baseline by one year [70]. In a study by Qian et al., all patients reported some level of fatigue; however, patients considered mild fatigue to be normal, while 13% reported moderate fatigue [126]. McDowell et al. reported prevalence of moderate (14%) and severe (14%) fatigue even four years after treatment [23].

Sleep Quality
The prevalence of difficulty sleeping or sleep disturbance ranged from 16% to 100% across 11 studies (Table 3) [37,38,40,41,45,70,77,123,126,128,129]. Only one study reported prevalence before and after treatment, finding a bell-shaped trend over time (41%, 62%, and 42% at pre-surgery, immediately post-surgery, and 6 months, respectively) [70]. Qian et al. found a higher prevalence of obstructive sleep apnea in a group of patients treated without surgery (100%) compared to patients treated with surgery (93%), although the surgery group reported more severe symptoms [126]. Li et al. reported a high prevalence of poor sleep quality in long-term HNC survivors [129].

Characteristics of Outcome Measurement Instruments
Among 53 instruments identified by Ojo et al. [12], 45 instruments were reviewed, and information about their PROM items was extracted, resulting in 124 different symptoms identified. Among instruments, 22 instruments assessed general symptoms of HNC and quality of life, 10 assessed eating ability including symptoms such as dysphagia and xerostomia, 6 instruments assessed speech and voice, 2 instruments assessed neck and shoulder disabilities, 3 instruments assessed oral mucositis, and 1 instrument assessed skin symptoms and sinonasal outcomes.
Symptoms assessed by each instrument were mapped by the following domains and compared on content: (1) physical symptoms, (2) psychological symptoms, (3) psychosocial symptoms, (4) functional symptoms, and (5) quality of life (Figure 3). The complete cross-comparison of instruments can be found in Supplementary Table S2.
We found major discrepancies between the symptoms reported in the prevalence review and the symptoms captured by the PROMs. While some instruments had comprehensive overlap with the symptoms identified in the prevalence review, a number of the symptoms that recurrently appeared in the PROM instruments were not widely reported in the included studies. For example, 12 different PROM instruments in this review could assess 'cough', yet we found only three studies reporting this symptom [69,70,124]. Likewise, we found 15 PROM instruments that assessed 'changes in appearance' and its psychological impact, yet only four studies reported this symptom [45,52,53,54]. This discrepancy is even more noticeable in the psychological symptom category. Functional well-being such as performing activities of daily living is broadly covered by 24 instruments, but we found only one study [34] that reported related symptoms. Social and family well-being is covered by 24 instruments in various aspects such as interference with family life or relationship with friends, ability to participate in social activities, and anxiety about social life. However, we found only three studies that examined 'avoidance of social contacts' only in relation to this problem category [53,55,56]. Prevalence data may be instrumental for identifying symptom burden in head and neck populations, but their capture of symptoms may be limited by the domains and items in the outcome measures used.
On the other hand, PROMs may generate items on the basis of input of clinicians and patients regarding the relevant symptoms in the HNC population in their initial development and content validation process. In selecting PROM measures for routine surveillance in HNC populations, one should consider data from prevalence studies and PROMs for relevant capture of burdensome symptoms.
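The content-mapping comparison described above can be sketched as a set operation: the symptoms an instrument asks about are compared against a list of prevalent symptoms. This is only an illustrative sketch. The instrument names and their item sets are hypothetical placeholders, and the target list simply reuses symptom names mentioned in this review; it does not reproduce the mapping in Supplementary Table S2.

```python
def symptom_coverage(instrument_symptoms, prevalent_symptoms):
    """Return (fraction of prevalent symptoms covered, sorted missed symptoms)."""
    if not prevalent_symptoms:
        raise ValueError("prevalent_symptoms must not be empty")
    covered = instrument_symptoms & prevalent_symptoms
    missed = prevalent_symptoms - instrument_symptoms
    return len(covered) / len(prevalent_symptoms), sorted(missed)


# Symptoms named in this review as prevalent (used here as the target list).
prevalent = {
    "dysphagia", "xerostomia", "dysgeusia", "weight loss",
    "speech impairment", "pain", "fatigue", "depression", "anxiety",
}

# Hypothetical instruments and item content (placeholders, not real PROMs).
hypothetical_instruments = {
    "instrument_a": {"dysphagia", "pain", "cough", "fatigue"},
    "instrument_b": {"dysphagia", "xerostomia", "dysgeusia", "weight loss",
                     "speech impairment", "pain", "fatigue", "depression",
                     "anxiety", "appearance"},
}

coverage = {
    name: symptom_coverage(items, prevalent)
    for name, items in hypothetical_instruments.items()
}
```

Items that an instrument covers but that are absent from the target list (such as 'cough' or 'appearance' here) do not raise the coverage fraction, which mirrors the discrepancy noted above between instrument content and reported prevalence.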
In summary, on the basis of the prevalence of symptom burden, PROMs for routine surveillance in HNC populations should capture physical well-being domains for eating and weight changes (especially dysphagia, xerostomia, dysgeusia, and weight loss), communication (voice/speech), pain, and fatigue. Depression and anxiety should also be key items in the psychosocial domain of PROMs given their prevalence in HNC. Specific capture of these symptom domains in PROM items could help to identify the impact of HNC and its treatment, thus enabling personalized tailoring of symptom management [130].
On the basis of a cross-comparison of symptoms identified in the literature and symptoms addressed in the PROMs (Supplementary Table S1), we identified seven instruments for further review: FACT-NP, FSH&N-SR, HNRT-Q, MDASI-HN, OMQOL, the QOL-Rathmell, and the QOL-Thyroid. These instruments were selected because they were frequently used in the prevalence studies, were specific to HNC populations, and covered common physical and emotional HNC symptoms that our expert team members considered important for routine surveillance in HNC populations. The content domains and number of items of each PROM are displayed in Table 4.
The quality of these studies was assessed using the COSMIN checklist, which provides an overall rating based on the quality of each article assessing internal consistency, reliability, measurement error, content validity, structural validity, hypothesis testing, criterion validity, responsiveness, and interpretability [138]. Articles evaluating or describing the translation of these PROs into languages other than English were not evaluated in this review. Results of our assessments of these validation studies are provided in Supplementary Table S3. Table 5 shows the psychometric properties that were reported in the included studies. Internal consistency was reported in all studies and for all the tools, yet no studies assessed measurement error or interpretability of the tools. Test-retest reliability and convergent validity were reported in two studies each [131][132][133][134]. Known-groups validity, concurrent validity, and responsiveness were reported for three tools [132,134,135], and content validity was also reported for one tool [133]. As seen in Table 5, the OMQOL was evaluated by seven properties, while assessment of other tools was conducted on the basis of three or four properties. For HNRT-Q, only internal consistency was reported.
Table 5. Psychometric properties reported for each instrument (number of properties marked): FSH&N-SR [132], four; OMQOL [134], seven; MDASI-H&N [135,136], three; HNRT-Q [137], one; QOL-Rathmell and QOL-Thyroid, no validation studies.

Reliability
All of the tools demonstrated high internal consistency (Cronbach's alpha (α), 0.84-0.97). The FACT-NP, OMQOL, and HNRT-Q showed excellent α across all items (α ≥ 0.9) [131,133,137]. Among these tools, the OMQOL demonstrated the highest α for both the subscales and the total items [133]. Test-retest reliability was reported for the FACT-NP and the OMQOL. Both tools demonstrated good test-retest reliability, with the OMQOL showing the higher intraclass correlation coefficients (ICCs) on the subscales (0.864-0.934) [133]. No studies reported measurement error.
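For readers less familiar with the statistic, Cronbach's alpha can be sketched as below. This is a minimal illustration, not the computation used in any of the reviewed studies; the function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    # Sample variance of each item across respondents
    item_vars = [variance([resp[i] for resp in responses]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(resp) for resp in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five respondents answering a three-item subscale
hypothetical_responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(hypothetical_responses)
print(round(alpha, 3))
```

Values of α ≥ 0.9 correspond to the "excellent" band used in the paragraph above.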

Construct Validity
Convergent validity was assessed for the FSH&N-SR and the OMQOL [132,134], and the approaches used to measure convergent and known-groups validity varied across studies. Baker et al. computed Pearson's correlation coefficients between the FSH&N-SR and the Karnofsky Performance Scale (KPS), the 36-Item Short Form Survey (SF-36), and the Performance Status Scale for HNC patients [132]. In contrast, Cheng et al. calculated Pearson's correlation coefficients between the OMQOL subscales and oropharyngeal mucositis (OM)-related symptom peak and area-under-the-curve (AUC) scores [134].
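As an illustration of the convergent-validity approach described above, Pearson's correlation coefficient between a PROM score and a comparator measure can be sketched as follows. The scores are hypothetical and the function name `pearson_r` is an assumption for illustration; the reviewed studies do not report their code.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: symptom-burden PROM scores and performance-status scores
# for five patients (higher burden expected to go with lower performance status)
prom_scores = [10, 20, 30, 40, 50]
performance_scores = [90, 80, 70, 50, 60]
r = pearson_r(prom_scores, performance_scores)
print(round(r, 2))
```

The sign and magnitude of `r` are what the validation studies interpret; the comparator instrument differs from study to study, which is why the coefficients are not directly comparable.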
For known-groups validity, Baker et al. used t-tests to compare two patient groups [132], whereas Cheng et al. compared the OMQOL subscale peak and AUC scores among patients with different levels of OM and types of cancer therapy [134]. Rosenthal et al. compared mean MDASI subscale scores between patients categorized as having good or poor performance status [135].
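A known-groups comparison of this kind can be sketched with a two-sample t statistic. The source does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, as are the function name `welch_t` and the scores.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two independent groups."""
    # Standard error combines each group's sample variance over its size
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical symptom scores for patients with good vs. poor performance status
good_status = [20, 22, 24, 26, 28]
poor_status = [30, 34, 38, 42, 46]
t_stat = welch_t(good_status, poor_status)
print(round(t_stat, 3))
```

A large absolute t statistic indicates that the instrument separates groups that are expected to differ, which is the evidence known-groups validity relies on.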
Given the variability in measurement approaches, direct comparisons are impossible; there is no way to conclude that any one instrument has shown better construct validity than another.

Criterion Validity (Concurrent Validity)
Criterion validity of the FACT-NP, the OMQOL, and the MDASI-H&N was confirmed by assessing concurrent validity. Again, the measurement methods varied across studies. Tong et al. computed Pearson's correlation coefficients between the subscales of the FACT-NP and those of the QOL-RTI-H&N [131]. Moderate or high correlations were found, which indicated concurrent validity of the FACT-NP. Cheng et al. computed Pearson's correlation coefficients between the OMQOL subscale peak and AUC scores and those of EORTC [134]. Moderate correlations confirmed the concurrent validity of the OMQOL. Weak or moderate correlations were found between the subscales of the MDASI-H&N and the 12-item Short-Form Health Survey (SF12v2), yet the study concluded that concurrent validity had been confirmed. MDASI scores were significant predictors of objective CTCAE scores on multivariate regression analysis [136].

Responsiveness
Responsiveness was confirmed for the FACT-NP, the FSH&N-SR, and the OMQOL. Tong et al. and Cheng et al. used effect size comparisons and confirmed that the FACT-NP and the OMQOL were responsive to the changes in the scores over time [131,134]. Baker et al. found that the FSH&N-SR demonstrated responsiveness to changes by cancer stage and the extent of initial surgical procedure using ANOVA and pairwise comparisons [132].
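The studies report using effect sizes but the specific statistic is not given here, so the sketch below shows two common choices as assumptions: a baseline-standardized effect size and the standardized response mean. Function names and scores are hypothetical.

```python
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)

# Hypothetical PROM scores for five patients before and during treatment
baseline_scores = [10, 12, 14, 16, 18]
treatment_scores = [14, 15, 19, 20, 24]
print(round(effect_size(baseline_scores, treatment_scores), 3))
print(round(standardized_response_mean(baseline_scores, treatment_scores), 3))
```

The two statistics can differ substantially on the same data, which is one reason responsiveness results are hard to compare across instruments.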

Discussion
In this review, we identified symptoms experienced by HNC populations, described their prevalence, and identified HNC-specific PROMs and their coverage of the physical and emotional symptom problems experienced by this population.
The prevalence of reported symptoms was highly variable among included studies. Variability in sample size, the timing of the assessments, and the use of different measures may explain some of this variability. HNC patients experience symptoms common to many other cancer patients but can also experience disease-specific or treatment-specific symptoms (e.g., dysphagia); evaluating both types of symptoms will be important to understand the burden of disease and treatment in this population.
The PROMs used varied across studies, thus precluding meta-analysis for estimating prevalence of symptoms. For example, depression was assessed using the Hospital Anxiety and Depression Scale (HADS) [16–25], the Beck Depression Inventory (BDI) [16,26–28], the short form of the Geriatric Depression Scale (GDS-SF) [29], the Neuropsychiatric Inventory Questionnaire (NPI-Q) [30], the Research Diagnostic Criteria Schedule for Affective Disorders and Schizophrenia (RDC SADS) [31], and the University of Washington Quality of Life Mood scale (UWQOL-mood) [32]. Furthermore, there was variability in the cut-off scores used for the same instrument. For example, thresholds for the HADS ranged from 7 to 11 and thresholds for the BDI ranged from 10 to 21. There is a need for standardization in PROM items for use in patient management for routine care and population comparison. A recent review recommended the Patient Health Questionnaire-9, Zung Self-Rating Depression Scale, and Zung Self-Rating Anxiety Scale as having good content coverage and excellent psychometric properties to assess psychological distress in HNC populations [139].
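To illustrate why differing cut-offs preclude pooling, the sketch below applies the lowest and highest HADS thresholds mentioned above to the same set of scores. The scores are hypothetical, and treating a score at or above the cut-off as a case is an assumption, since studies may define the threshold differently.

```python
def prevalence(scores, cutoff):
    """Proportion of patients scoring at or above the cut-off (assumed rule)."""
    cases = sum(1 for s in scores if s >= cutoff)
    return cases / len(scores)

# Hypothetical HADS depression subscale scores for ten patients
hypothetical_hads = [3, 5, 7, 8, 8, 9, 10, 11, 12, 14]

# Same patients, two cut-offs from the range reported across studies (7 to 11)
low_cutoff_prevalence = prevalence(hypothetical_hads, 7)
high_cutoff_prevalence = prevalence(hypothetical_hads, 11)
print(low_cutoff_prevalence, high_cutoff_prevalence)
```

In this toy sample the estimated prevalence of depression falls from 80% to 30% purely because of the threshold, with no change in the underlying scores.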
The symptoms experienced by HNC patients and their prevalence varied widely, depending on the cancer site, treatment modality, and phase of treatment. Thus, the choice of PROM should take into account both its content and the timing of its application relative to the phases of the cancer journey (pre-treatment, during treatment, after treatment, during surveillance, etc.). Standardization in the temporal application of PROMs is also needed. We recommend that studies consider measuring depression, pain, dysphagia, and dysgeusia especially during treatment, when the highest prevalence was noted. Symptoms that worsen during treatment and remain at higher levels into follow-up (e.g., trismus, xerostomia, and speech difficulties) should be measured during treatment, after treatment, and during surveillance. Many of these symptoms can also persist as long-term problems post-treatment.
Standardization in the criteria used for validation of PROMs is also crucial, given wide variability across studies. For example, validation of the FACT-NP was based on criterion validity and responsiveness [131], while the FSH&N-SR was validated on the basis of construct validity and responsiveness [132]. The MDASI-H&N was evaluated on both construct validity and criterion validity [135]. Furthermore, the measurement methods for the same psychometric property were also highly variable. For example, convergent validity of the FSH&N-SR was assessed through correlations with the KPS, the SF-36, and the Performance Status Scale for HNC patients [132], whereas convergent validity of the OMQOL was assessed through correlations with OM-related symptom peak and AUC scores [134]. Similarly, variability was found in the assessment methods for concurrent validity and responsiveness. Due to this variability, it is difficult to make meaningful comparisons across measures in terms of psychometric properties. We can only conclude that there is at least some evidence supporting the validity of the PROM instruments; thus, the psychometric properties and content of multiple PROM instruments should be considered before selection, depending on the purpose, i.e., routine surveillance and/or research. Moreover, it is essential to carefully consider the content of each PROM before choosing it [140].
To determine the optimal choice of tools for monitoring symptoms in HNC patients, we cross-compared the symptoms identified in the literature with the symptoms addressed in the 45 PROMs and identified seven instruments for further review. Our findings do not suggest that the other PROMs are unacceptable as instruments to capture symptom burden in patients with HNC. However, a combination of different PROMs may be necessary to ensure capture of the important domains. We recommend further validation studies of the identified PROMs, as well as development of HNC-specific PROMs, in order to foster personalized symptom management and to reduce survey fatigue.
There are limitations to our study. We only included studies from the last 15 years and restricted the search to English-language publications. As such, the prevalence of some symptoms may be under- or over-represented in our review. We did not restrict our analysis by the methods used to assess the various symptoms. Therefore, there was wide variability in both the assessment and the definition of various symptoms. This was also reflected in the wide variability in symptom prevalence across studies. Given the heterogeneity of measurement tools and threshold values used, meta-analysis could not be performed on our group of studies, and we could not report a final estimated prevalence for each of the symptoms. Quality of the prevalence studies was subjectively determined by reviewers and not used for exclusion purposes. Therefore, some caution should be applied when interpreting the findings of this review.

Conclusions
Our search identified wide variability in the specific symptoms assessed and their prevalence and in the content and psychometric validity of measurement tools. Further, there was some discrepancy between the symptoms reported in the included studies and the retrieved PROMs, suggesting incomplete reporting of important HNC symptoms and problems and potential for underestimation of impact. We recommend that journals either require or strongly recommend that authors provide public access to the raw and complete data from PROM studies, which would help promote transparency, meta-analysis, and pooled analysis of data. We also recommend standardization of elements such as inclusion of certain treatment- and condition-specific PROM items, as well as standardization of the temporal application of PROMs relative to key events such as treatment or disease relapse, in order to promote cross-collaboration and cross-comparison across studies. Either the FACT-HN or the MDASI could be used in routine surveillance as they provided the most complete coverage of prevalent physical and emotional symptoms and had adequate psychometric properties, but supplementation with condition-specific measures (e.g., dysphagia, body image disturbance) may be needed depending on the purpose of measurement.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/jcm10184162/s1, Table S1: Characteristics of the Included Studies; Table S2: Cross-comparison of instruments; Table S3: Quality of studies evaluating the validity of PROMs.
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.