Prediction Models for Tinnitus Presence and the Impact of Tinnitus on Daily Life: A Systematic Review

The presence of tinnitus does not necessarily imply associated suffering. Prediction models on the impact of tinnitus on daily life could aid medical professionals in directing specific medical resources to those (groups of) tinnitus patients with specific levels of impact. Models of tinnitus presence could possibly identify risk factors for tinnitus. We systematically searched the PubMed and EMBASE databases for articles published up to January 2021. We included all studies that reported on multivariable prediction models for tinnitus presence or the impact of tinnitus on daily life. Twenty-one development studies were included, with a total of 31 prediction models. Seventeen studies developed a prediction model for the impact of tinnitus on daily life, three developed a model for tinnitus presence and one developed models for both. The risk of bias was high and reporting was poor in all studies. The most frequently used predictors in the final impact-on-daily-life models were depression- or anxiety-associated questionnaire scores. Demographic predictors were most common in final presence models. No models were internally or externally validated. All published prediction models were poorly reported and had a high risk of bias, which hinders their usability. Methodological guidance is available for the development and validation of prediction models. Researchers should consider the importance and clinical relevance of the models they develop and should consider validating existing models before developing new ones.


Introduction
Prediction models are made to inform clinical decision making. They quantify the relative importance of findings, characteristics and different types of factors when evaluating an individual patient [1]. Over the past decade, there has been a steep increase in the number of prediction models in clinical research. Before it can be decided whether models on tinnitus prediction could be applied in clinical care and research, more clarity regarding the quality, performance and outcomes of these models is necessary.
Tinnitus can be described as the hearing of a phantom sound. The mere presence of tinnitus does not necessarily imply associated suffering; quality of life is severely reduced by tinnitus in 0.5-1% of the population [2]. Because of this, two operational definitions have recently been proposed to distinguish between the two: tinnitus and tinnitus disorder [3]. To measure the impact of tinnitus on daily life, clinical practice uses multi-item questionnaires such as the Tinnitus Functional Index (TFI), the Tinnitus Handicap Inventory (THI) and the Tinnitus Questionnaire (TQ), or single-item questions [3][4][5][6].
Adequate prediction of the experience of tinnitus or the impact of tinnitus on daily life could be beneficial for preventive or therapeutic purposes. Prediction models on the impact of tinnitus on daily life could aid medical professionals to direct specific medical resources to those (groups of) tinnitus patients with specific levels of impact. Models on tinnitus presence could possibly identify risk factors for tinnitus. Through this, preventive measures could be taken to avoid the potential negative impact of tinnitus on daily life.
In prediction models, the patient-specific values of the included factors are combined to calculate an individual risk estimate for the outcome. Adequate development of a clinically useful prediction model requires three steps. In the first step, the model is derived; this phase includes the identification of predictors, for which weights are obtained. Model validation is the second phase. During the development of a model, internal validation serves to assess and correct for overfitting; with external validation, the performance of the model is assessed in a different dataset. In the third and last phase, the model's clinical impact is assessed by using the prediction rule as a decision rule [7]. In prognostic model development, it is advised to search, review, critically appraise and externally validate existing prediction models before developing a new one [7]. We aimed to systematically review the published prediction models of tinnitus presence and impact on daily life.
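As an illustration of the internal validation step described above, the following minimal, numpy-only sketch estimates optimism-corrected discrimination (AUC) via bootstrapping. The data, the model form (a simple least-squares linear score standing in for a regression model) and all names are hypothetical, for illustration only; none of this reflects any model in the reviewed studies.

```python
import numpy as np

def auc(y, score):
    """Rank-based AUC: probability that a random case with the outcome
    receives a higher score than a random case without it."""
    pos, neg = score[y == 1], score[y == 0]
    return float((pos[:, None] > neg[None, :]).mean())

def fit(X, y):
    """Least-squares linear score (illustrative stand-in for a regression model)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))                          # 5 hypothetical candidate predictors
y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)

beta = fit(X, y)
apparent_auc = auc(y, predict(beta, X))              # performance on the derivation data

# Bootstrap optimism correction: refit on resamples, measure how much
# performance drops when each refitted model is applied to the original data.
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    b = fit(X[idx], y[idx])
    optimism.append(auc(y[idx], predict(b, X[idx])) - auc(y, predict(b, X)))

corrected_auc = apparent_auc - float(np.mean(optimism))
```

The apparent AUC overstates performance because the model is evaluated on the same data it was fitted to; subtracting the mean bootstrap optimism yields an internally validated estimate.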

Materials and Methods
In this systematic review, we followed the Cochrane guidance for critical appraisal and data extraction for systematic reviews of prediction modelling studies (the CHARMS checklist) and the preferred reporting items for systematic reviews and meta-analyses (PRISMA) [8,9]. The protocol for this systematic review was registered at the international prospective register of systematic reviews (PROSPERO) with registration number CRD42021240493 [10].

Search Strategy
We searched the electronic literature databases of PubMed and EMBASE on the 21st of January 2021. The Ingui filter for finding studies on clinical prediction models was used in our search [11]. The search syntax can be found in Appendix A. In addition to the electronic database searches, reference lists were screened to identify additional studies. We searched for developmental as well as validation studies.

Study Selection/Eligibility Criteria
We included all studies that reported on multivariable prediction models. Multivariable models were defined as having two or more predictors included. Models were included when predicting the presence of tinnitus in adults or the effect of tinnitus on daily life. We included a broad range of outcomes to measure tinnitus-related effects on daily life. These included, but were not restricted to: tinnitus burden, tinnitus severity, tinnitus distress, tinnitus-associated quality of life, tinnitus-associated annoyance and tinnitus intrusiveness. These outcomes could be measured using single-question and multiple-question questionnaires. We excluded letters to editors, reviews and animal studies. If articles reported multiple prediction models with a unique combination of predictors, we considered these as separate models.
We differentiated between articles reporting on the development of prediction models and those reporting on their external validation. Articles were classified as development studies if the authors described the development of one or multiple models in their objectives or conclusions, or if it was clear from other information (such as the methods section) that a prediction model was developed in the study.

Screening Process
Two researchers (I.S., M.M.R.) independently screened the title and abstract of the articles for eligibility after removal of duplicates. Subsequently, the selected studies were reviewed for full text screening using predefined inclusion and exclusion criteria. Disagreements were resolved by discussion.

Data Extraction and Analysis
We created a data extraction form based on the CHARMS checklist and previous research projects [9,12,13]. The following items were extracted from the included studies: authors of the study, year of publication, journal of publication, the continent where the research was conducted, study design, study setting, instrument(s) used to measure the impact of tinnitus on daily life or tinnitus presence, the provided definition of tinnitus, percentage of patients with tinnitus in the study, mean impact of tinnitus on daily life measured with questionnaires or single questions, duration of tinnitus, number of research centres, number of participants, gender of the included patients, age of the included patients, horizon of prediction, number of candidate predictors, number of candidate predictors included in the final model, the number of prediction models, missing data, statistical methods used and the results of the prediction model. The data extraction form was triple-checked by S.M.M.

Critical Appraisal (CAT)
The risk of bias (RoB) of the included studies was independently assessed by two researchers (M.M.R., I.S.) using the prediction model RoB assessment tool (PROBAST) [14]. The PROBAST tool consists of 20 signalling questions divided over four domains: participants, predictors, outcome and analysis. These domains were scored on RoB and applicability as low, high or unclear risk, based on the criteria provided by PROBAST [14]. PROBAST provides specific definitions per domain to detect RoB; for example, a reasonable number of participants with the outcome relative to the number of candidate predictors is defined as more than 20 events per variable (EPV > 20) in model development studies. For the specific definitions per domain and further explanation, see Moons et al. (2019), 'PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration' [15]. Disagreements between the two researchers were resolved by discussion.
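The EPV criterion above is a simple ratio; a minimal sketch, with a hypothetical function name and hypothetical example numbers chosen purely for illustration:

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Events per variable: number of participants with the outcome divided
    by the number of candidate predictors considered during development."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    return n_events / n_candidate_predictors

# Hypothetical example: 300 participants with bothersome tinnitus,
# 12 candidate predictors considered during model development.
epv = events_per_variable(300, 12)        # 25.0
meets_probast_criterion = epv > 20        # low concern on this PROBAST item
```

With many candidate predictors and few outcome events the ratio drops quickly, which is why studies screening over 100 candidate predictors need very large samples to satisfy this criterion.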

Descriptive Analyses
The results of the data extraction were summarized with descriptive statistics. No quantitative analyses were performed, as this was beyond the scope of our study.

Risk of Bias
Based on the criteria that were provided by PROBAST [14], the overall RoB was judged to be high in all studies, mainly due to a high RoB in the analysis domain. No studies accounted for overfitting, underfitting or optimism. No studies reported on relevant model performance measures. The RoB in the participants, predictor and outcome domains was low. Ten studies reported on a reasonable number of participants with the outcome [16,17,19,21,27-29,31,33,36], and for four studies no information on this account was provided [25,26,34,35]. Eight studies did not handle missing data appropriately [16,18,20,23,25,27,29,31], and thirteen studies did not provide any information on missing data [17,19,21,22,24,26,28,30,32-36]. The applicability of the participants, predictor and outcome domains was judged to be low (see Table 2: CAT).
The reported mean THI scores varied between 38.3 and 48.3 points. Bhatt also used the THI but did not report the mean THI score [27]; instead, they reported that 88.5% of the patients had a THI score <16, whereas 8.6% had a score >18. Beukes et al. did not report the mean TFI score, but subdivided the TFI score into three categories, demonstrating that 10% had a score below 25 (mild tinnitus), 30% had a score between 25 and 50 (significant tinnitus) and 60% had a score above 50 (severe tinnitus) [18]. Wallhauser-Franke et al. categorized outcomes of scores using the mTQ: 37.6% had a total score of seven or lower, 49% had a total score between 8 and 18, and 13.4% had a total score of 19 or higher [31]. Andersson (2005) used the TRQ and reported a mean of 37.4 [17]. The studies using single-item questionnaires reported 'bothersome tinnitus' with different definitions in 9.1-30.9% of the cases [16,19,28].

Model Presentation and Predictive Performance in Tinnitus Impact Models
All studies except Andersson et al. (1999) [24] and Andersson et al. (2005) [17] presented a regression slope, and two studies also presented an intercept [18,30]. Overall model performance was reported as the proportion of explained variance (R²) in eleven studies [17-20,23-25,27,31,33]. Holgers et al. used a probability regression plot [30]. The other studies did not report on predictive performance [22,26,28,29,35,37] (Table 5).

Predictors of Tinnitus Presence
The number of candidate predictors reported in the included studies varied between 16 and 125 [16,21,28,36]. The most common candidate predictors for tinnitus presence were gender (in 5 models), age (in 3 models) and occupational or music noise exposure (both in 3 models). In the final models, the most commonly used predictors were gender (n = 3), followed by age (n = 2) (Table 4/Appendix B).

Modelling Method and Prediction Horizon in Tinnitus Presence Models
Multiple modelling methods were used: logistic hierarchical regression [28], multinomial logistic regression [16], stepwise multivariate logistic regression [36] and a multinomial logit regression model [21]. Only the study of Dawes et al. had a prediction horizon (4.3 years) [16]; the other studies had a cross-sectional design.

Model Presentation and Predictive Performance in Tinnitus Presence Models
All studies presented a regression slope; Couth et al. also reported an intercept [28]. Overall model performance was reported as the proportion of explained variance (R²) by two studies [16,28]. Moore et al. [21] used the Akaike Information Criterion [37]. Kostev et al. did not report on predictive performance [36] (Table 6).

Validation Studies
None of the included studies reported internal validation, and no external validation studies were identified.

Discussion
In this systematic review, we presented the published prediction models on tinnitus presence and the impact of tinnitus on daily life. We identified 21 different studies with a total of 31 models. Of these 31 models, five reported on tinnitus presence and 26 on the impact of tinnitus on daily life. For models of tinnitus presence, the most common predictors were age, gender and smoking. For models in which the impact of tinnitus on daily life was predicted, scores of depression-associated and anxiety-associated questionnaires were the most common. Model performance was mostly reported using the proportion of explained variance (R²).
Despite the high number of developed models, the quality of prognostic modelling in tinnitus research is low. To date, regrettably, no models have been validated. Due to the lack of validation and impact analyses, the models cannot be used in clinical care. None of the included models were tested for calibration or discriminative performance [38]. Earlier studies showed that models based on small datasets and simple statistical methods generally discriminate and calibrate poorly, and that the use of categorized instead of continuous data lowers performance further [39]. It is therefore necessary that appropriate statistical methods are used in the context of prediction modelling [38].
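To make the two missing performance aspects concrete, a minimal numpy-only sketch on synthetic data (all values and names hypothetical): discrimination is summarized as a rank-based AUC, and calibration as mean predicted versus observed risk within deciles of predicted probability.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
p_true = 1 / (1 + np.exp(-rng.normal(size=n)))        # true individual risks
y = (rng.random(n) < p_true).astype(int)              # observed outcomes
# A hypothetical model whose predictions track the true risk with some noise:
p_hat = np.clip(p_true + rng.normal(scale=0.05, size=n), 0.01, 0.99)

# Discrimination: rank-based AUC (probability a random case with the outcome
# is assigned a higher predicted risk than a random case without it).
pos, neg = p_hat[y == 1], p_hat[y == 0]
auc_value = float((pos[:, None] > neg[None, :]).mean())

# Calibration: compare mean predicted and observed risk within deciles of p_hat.
edges = np.quantile(p_hat, np.linspace(0, 1, 11))
bins = np.clip(np.digitize(p_hat, edges[1:-1]), 0, 9)
predicted = np.array([p_hat[bins == b].mean() for b in range(10)])
observed = np.array([y[bins == b].mean() for b in range(10)])
max_miscalibration = float(np.abs(predicted - observed).max())
```

A model can discriminate well yet be badly calibrated (systematically over- or under-estimating risk), which is why both aspects need to be reported before a model is used to guide care.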
Van Royen et al. recently described the difficulties of adopting prediction models in clinical care, giving four reasons why adoption can fail [7]. The first reason is that models do not fit a clinical purpose, for example when a model is developed in a patient population that does not correspond to the patient population in the clinic. A second reason is that the model is not validated, or reporting is incomplete; as demonstrated in this manuscript, this applies to the present tinnitus models and makes it difficult for clinicians and researchers to further develop and use them. The third reason is difficulty with implementation, for example when the model has no impact on decision making, or when local or national regulations hinder implementation. The last reason is failed model adoption, for example because of non-useful or non-trusted predictions, or outdated models. Most of these reasons seem to fit the tinnitus literature, with the lack of validation, the lack of fitness for purpose due to differing opinions about outcome measures and included populations, and poor reporting being most prominent.
Collaboration between different research groups can reduce the accumulation and repetition of studies [40]. An improvement in tinnitus prediction research might be to improve and intensify these collaborations; currently, there is still room for improvement. For example, many similar candidate predictors were used by the different models, of which only a minority were retained in the final model. We noticed that tinnitus-specific variables and variables on somatic comorbidities were most frequently used as candidate predictors. However, tinnitus-specific variables were used in the final models in only about 25% of cases. This is in contrast to demographic factors and somatic or psychological comorbidities, which ended up in the final model in about 50% of cases. This raises the question of whether we should continue researching the predictive value of tinnitus-specific variables or shift the focus to other domains of characteristics. This review might serve as a basis for future research groups to critically assess which candidate predictors or predictors they should use, to improve prediction models' performance and their application in clinical practice. The focus could then be shifted towards model validation, rather than further model development studies.
Prediction models aim to provide guidance in clinical decision making and should therefore be handled with care by those who develop them. In all stages of prediction model development, clinical knowledge about the setting, patients and care pathways should be combined with statistical and methodological know-how. We therefore advise researchers to develop prediction models in a collaborative effort involving clinicians, statisticians and epidemiologists. The use of reporting tools can also be a helpful next step in improving tinnitus prediction modelling: the PROBAST statement can help with identifying the risk of bias in prognostic studies, whereas the TRIPOD statement provides guidance on reporting [14,41]. As demonstrated in our study, the majority of studies based their models on statistical methods alone; however, it is recommended to build models based on clinical expertise and previous literature, rather than making them purely data driven [42]. Other ways to improve the quality of future research are the use of prospective, large, population-based studies and the consistent use of similar, validated outcome measures such as the TFI [3]. This would help compare prediction models in meta-analyses and would ease external validation, which might help to create clinically applicable prediction models.

Conclusions
We identified 21 different studies, which report a total of 31 models on either the presence or the impact of tinnitus on daily life. All included models were in the development stage. The reporting of the models was found to be poor and the risk of bias high. No studies regarding model validation or impact assessment were found. Knowing the impact prediction models can have on clinical decision making, as well as on directing future research and policy making, we need to improve the quality of our prediction research. Better reporting of methods and collaboration between research groups and disciplines could aid future prediction model development.