Predictive Validity, Diagnostic Accuracy and Test-Retest Reliability of the Strength of Urges to Drink (SUTD) Scale

This study compared the 1-item Strength of Urges to Drink (SUTD) scale with the 10-item Alcohol Use Disorders Identification Test (AUDIT) on (i) test-retest reliability, (ii) predictive validity, and (iii) diagnostic accuracy. Data come from 2960 participants taking part in the Alcohol Toolkit Study (ATS), a monthly population survey of adults in England. The long-term test-retest reliability of the SUTD was ‘fair’, but lower than that for the AUDIT (weighted kappa 0.24 versus 0.49). Individuals with “slight/moderate” urges to drink had higher odds of reporting an attempt to cut down relative to those not experiencing urges (adjusted odds ratios (AdjORs) 1.78 95% confidence interval (CI) 1.43–2.22 and 1.54 95% CI 1.20–1.96). Drinkers reporting “moderate/slight/strong” urges to drink had mean change in consumption scores which were 0.16 (95% CI −0.31 to −0.02), 0.40 (95% CI −0.56 to −0.24) and 0.37 (95% CI −0.69 to −0.05) units lower than those reporting no urges. For all outcomes, strong associations were found with AUDIT scores. The accuracy of the SUTD for discriminating between drinkers who did and did not reduce their consumption was ‘acceptable’, and similar to that for the AUDIT (ROC AUC 0.6). The AUDIT had better diagnostic accuracy in predicting change in alcohol consumption. The SUTD may be an efficient dynamic measure of urges to drink for population surveys and studies assessing the impact of alcohol-reduction interventions.


Introduction
Worldwide, an average of around 6 L of pure alcohol is consumed each year by every person aged 15 years or older [1]. Adult per capita consumption varies widely, with the highest levels found in the developed world. In England, around 17% (~9 million) of adults drink alcohol above recommended limits [2] and 6% (~1 million) of the population are classified as dependent, i.e., they have a physical and/or mental dependency on alcohol which is associated with high levels of tolerance to its effects and withdrawal symptoms when absent [3]. Such consumption levels are associated with a number of non-communicable diseases, injuries and alcohol-attributable deaths each year [1]. [...] be able to identify harmful drinkers who may be around the threshold for dependence and, therefore, tertiary preventive work can be used to help stop further escalation of problems.
Thus, this study aimed to evaluate the psychometric properties of a measure of the strength of urges to drink on a single day, known as the Strength of Urges to Drink (SUTD) measure, among a population sample of high-risk drinkers. Such epidemiological data have several advantages over data from patient populations, including the fact that many individuals who are alcohol-dependent remain undiagnosed. Population-based studies may be able to identify some of these individuals [30,31]. Comparisons will be made with the AUDIT, as it is the most widely used screening tool and can be self-completed.
More specifically, it aimed to assess the:
1. Test-retest reliability of the SUTD compared to the AUDIT.
2. Predictive validity of the SUTD compared to the AUDIT in relation to (a) reported attempts to reduce alcohol consumption between baseline and follow-up; (b) reported alcohol consumption at follow-up and (c) change in alcohol consumption between baseline and follow-up.
3. Diagnostic accuracy of the SUTD compared to the AUDIT in relation to (a) attempts to reduce alcohol consumption between baseline and follow-up; (b) alcohol consumption at follow-up and (c) change in alcohol consumption between baseline and follow-up.

Design and Setting
Data were used from repeated cross-sectional household surveys of a representative sample of the population of adults in England conducted in consecutive monthly waves between March 2014 and December 2016. The surveys are part of the ongoing Alcohol Toolkit Study which is designed to provide tracking information about alcohol consumption and related behaviours in England. Each month a new sample of approximately 1700 adults aged 16+ complete face-to-face computer assisted interviews. All respondents are asked if they are happy to be re-contacted 6 months after baseline [32]. The baseline survey uses a type of random location sampling, which is a hybrid between random probability and simple quota sampling. England is first split into 171,356 'Output Areas', comprising approximately 300 households. These areas are then stratified based on ACORN characteristics and geographic region. ACORN (A Classification Of Residential Neighbourhoods) is a socio-economic profiling tool developed by Acorn Consumer Classification (CACI) [33]. The areas are then randomly allocated to interviewers, who travel to their selected areas and conduct the electronic interviews with one member of the household. Interviews are conducted until quotas based upon factors influencing the probability of being at home and tailored to local area census data are fulfilled. Morning interviews are avoided to maximise participant availability.

Ethical Approval
Ethical approval for the Smoking Toolkit Study (STS), a sister survey to the Alcohol Toolkit Study (ATS), was originally granted by the UCL Ethics Committee (ID 0498/001). Approval for the ATS was granted by the same committee as an extension of the STS.

Measures
At baseline, participants were asked questions that assessed: age; sex; an occupationally based classification of socio-economic status called 'social grade' (dichotomised to ABC1 = higher and intermediate professional/managerial and supervisory, clerical, junior managerial/administrative/professional or C2DE = skilled, semi-skilled, unskilled manual and lowest grade workers or unemployed); government office region in England (dichotomised to North = North East, North West, and Yorkshire and the Humber, East Midlands, West Midlands, or South = East of England, London, South East, and South West, classified according to an established North-South divide); receipt of a voluntary educational qualification (obtained after compulsory education ceases at 16 years old); ethnicity (dichotomised as white versus other); and disability. They were also asked if they were currently attempting to cut down on their alcohol consumption.
Participants were also asked to complete the AUDIT questionnaire [10,34,35] and the SUTD measure which consists of one item: "How strongly have you felt the urge to drink in the past 24 h?" Responses include: not at all, slight, moderate, strong, very strong and extremely strong.
The AUDIT-Quantity/Frequency scale (AUDIT-QF) [36] comprises the first two questions of the AUDIT:
1. "How often do you have a drink containing alcohol?" Responses include: never, monthly or less, 2-4 times a month, 2-3 times a week and 4+ times a week.
2. [...]
Scores on these two questions are combined to give a measure of alcohol consumption, with a range of 0 to 8.
Those who scored 8 or more on the AUDIT (i.e., indicating hazardous and/or harmful alcohol consumption and possible dependence) or 5 or more on the AUDIT-C, which comprises the first three questions of the AUDIT (i.e., indicating high-risk consumption), at baseline were then re-contacted at 6-month follow-up and asked to complete the SUTD, AUDIT and AUDIT-QF questionnaires and: "How many attempts to restrict your alcohol consumption have you made in the last 6 months (e.g., by drinking less, choosing lower strength alcohol or using smaller glasses)?"
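The scoring and follow-up rules described above can be sketched as follows. This is an illustrative Python sketch (the study's own analyses were run in R); the function names are hypothetical, and the 0 to 4 scoring of each AUDIT item is the standard AUDIT convention rather than something spelled out in this section.

```python
def audit_qf(q1: int, q2: int) -> int:
    """Sum the first two AUDIT items (each scored 0-4) to give AUDIT-QF (0-8)."""
    for item in (q1, q2):
        if not 0 <= item <= 4:
            raise ValueError("each AUDIT item is scored from 0 to 4")
    return q1 + q2


def eligible_for_follow_up(audit_total: int, audit_c: int) -> bool:
    """Re-contact rule from the text: AUDIT >= 8 or AUDIT-C >= 5 at baseline."""
    return audit_total >= 8 or audit_c >= 5


# A respondent below the AUDIT threshold can still qualify through the AUDIT-C.
print(audit_qf(3, 2), eligible_for_follow_up(audit_total=7, audit_c=5))
```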

Analyses
The protocol for this study was published on the Open Science Framework prior to data analysis (https://osf.io/wuuqr/). An amendment was made to the analysis plan in February 2017: we added a plan to assess the predictive validity of the SUTD in relation to the change in consumption between baseline and follow-up.
All analyses were conducted in R version 3.3.2. Data were weighted for key prevalence statistics (for more details see [32]). Those who were and were not followed up were compared on key baseline variables to establish representativeness of the follow-up sample using Mann-Whitney U, t-tests and chi-square tests as appropriate.
Test-retest reliability was assessed by calculating: (a) a reliability coefficient (r), which is simply the Spearman's correlation between the scores on the first and the second testing. The value for the r coefficient can fall between 0.00 (no correlation) and 1.00 (perfect correlation); and (b) a weighted kappa coefficient which is suitable for ordinal data. Values can range from −1 to 1, where 1 indicates perfect agreement, 0 indicates no agreement beyond chance and negative values indicate inverse agreement. Cohen suggested the Kappa result be interpreted as follows: values ≤0 as indicating no agreement and 0.01-0.20 as none to slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect agreement [37].
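As an illustration of the weighted kappa coefficient, the sketch below computes it for two sets of ordinal ratings and maps the result onto the interpretation bands quoted above. It is written in Python rather than the R used for the analyses, and it assumes linear disagreement weights (set `power=2` for quadratic weights); the weighting scheme actually used in the study is not stated in this section, and the function names are hypothetical.

```python
from collections import Counter


def weighted_kappa(first, second, n_categories, power=1):
    """Weighted kappa for ordinal codes 0..n_categories-1 (linear weights by default)."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = Counter(zip(first, second))
    row, col = Counter(first), Counter(second)
    max_distance = (n_categories - 1) ** power
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) ** power / max_distance
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * row[i] * col[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed_disagreement / expected_disagreement


def interpret_kappa(kappa):
    """Interpretation bands as quoted in the text [37]."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Values reported in the abstract for the SUTD and the AUDIT.
print(interpret_kappa(0.24), interpret_kappa(0.49))
```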
The predictive validity of the SUTD was evaluated by examining the association between the SUTD scale and (a) attempts to reduce alcohol intake; (b) levels of alcohol consumption at follow-up and (c) change in alcohol consumption between baseline and follow-up using a Mann-Whitney U test and linear-by-linear association chi-square test. Next attempts to reduce alcohol intake, levels of alcohol consumption at follow-up and change in alcohol consumption between baseline and follow-up were regressed on to the baseline scores using simple logistic and linear regression and multiple logistic and linear regression, adjusting for the following covariates measured at baseline: age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT and wave of the survey.
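To make the odds ratio concrete, here is a minimal Python sketch of an unadjusted odds ratio with a Woolf-type 95% CI computed from a 2×2 table. The counts are invented for illustration and are not study data, and the function name is hypothetical. The adjusted estimates reported in this paper come from multiple logistic regression, which this sketch does not attempt to reproduce.

```python
import math


def odds_ratio_ci(exposed_yes, exposed_no, unexposed_yes, unexposed_no, z=1.96):
    """Unadjusted odds ratio and Woolf (log-scale) confidence interval."""
    odds_ratio = (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)
    se_log_or = math.sqrt(
        1 / exposed_yes + 1 / exposed_no + 1 / unexposed_yes + 1 / unexposed_no
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: attempts to cut down (yes/no) among drinkers with and without urges.
or_, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```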
To assess predictive accuracy, Receiver Operating Characteristic (ROC) curves were then calculated [38]. The ROC curve is a graphical presentation of the accuracy of a measure in which the sensitivity of the measure (i.e., the true positive rate) is plotted against 1 − specificity (i.e., the false positive rate). The area under the ROC curve (ROC AUC) has a value from 0.5 (chance level only) to 1 (perfect discrimination). Alcohol consumption and change in alcohol consumption were first dichotomised according to their mean value into lower and higher scores [39]. Our a priori hypothesis was that the SUTD would be as accurate as the AUDIT in discriminating whether drinkers attempt to cut down at follow-up and whether drinkers have a lower or higher alcohol consumption than the mean.
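The ROC AUC can also be read as the probability that a randomly chosen case with the outcome has a higher score than a randomly chosen case without it, with ties counted as one half. The Python sketch below uses that pairwise formulation together with a mean split of a continuous outcome. The function names are hypothetical, and treating values strictly above the mean as "higher" is an assumption, since the text does not say how values equal to the mean were handled.

```python
def dichotomise_at_mean(values):
    """Split a continuous outcome into lower (False) and higher (True) than its mean."""
    mean = sum(values) / len(values)
    return [v > mean for v in values]


def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("both outcome groups must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical ordinal scores (0-5) and follow-up consumption scores.
scores = [0, 1, 1, 2, 3, 5]
consumption = [3, 4, 6, 5, 7, 8]
print(roc_auc(scores, dichotomise_at_mean(consumption)))
```

An AUC of 0.5 arises when the score carries no information about the outcome, for example when every respondent gives the same answer.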
After viewing the distributions and associations for the SUTD, additional unplanned sensitivity analyses were run collapsing the "moderate", "strong", "very strong" and "extremely strong" categories into a three-item SUTD scale (SUTD-3) comprising "not at all", "slight" and "moderate > 3". Unplanned analyses were also run to assess the predictive and diagnostic accuracy of the SUTD, SUTD-3 and AUDIT in relation to consumption at follow-up and a change in consumption between baseline and follow-up, restricted to those making an attempt to cut down at baseline. This analysis more accurately mirrors the previous association established between the Strength of Urges to Smoke (SUTS) scale and the success of attempts to quit smoking [20].
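The collapsing of the six SUTD responses into the SUTD-3 can be sketched as a simple mapping. This is an illustrative Python sketch; the function name and the label "moderate or greater" for the collapsed top category are placeholders chosen here, not the labels used in the study's data files.

```python
SUTD_LEVELS = [
    "not at all",
    "slight",
    "moderate",
    "strong",
    "very strong",
    "extremely strong",
]

# Hypothetical label for the collapsed top category.
SUTD3_LEVELS = ["not at all", "slight", "moderate or greater"]


def to_sutd3(response: str) -> str:
    """Collapse 'moderate' and every stronger response into one category."""
    if response not in SUTD_LEVELS:
        raise ValueError(f"unknown SUTD response: {response!r}")
    return SUTD3_LEVELS[min(SUTD_LEVELS.index(response), 2)]


print([to_sutd3(level) for level in SUTD_LEVELS])
```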
Strengthening The Reporting of OBservational studies in Epidemiology (STROBE) guidelines for the reporting of observational epidemiological studies were followed throughout [40].

Results
The sample followed up 6 months after baseline (n = 2960) differed from those not followed up (n = 11,679). They were more likely to be older, to report currently attempting to cut down their alcohol consumption, to be of higher socio-economic status, to have a disability, to reside in the South of England, to have stronger urges to drink and to have higher AUDIT scores (Table 1). Figure 1 shows the distribution of scores on the SUTD measure at baseline and follow-up. At baseline the two most frequently reported categories were "not at all" and "slight". Nineteen per cent (n = 2730) and 11.9% (n = 352) scored in the highest four categories (i.e., moderate or greater) at baseline and follow-up, respectively.

Attempts to Cut Down between Baseline and 6-Month Follow-Up
A total of 767 higher risk drinkers (25.9%; 95% CI 24.3 to 27.5) reported that they had attempted to reduce their alcohol consumption between baseline and follow-up. Table 2 presents the percentage of high-risk drinkers reporting an attempt to cut down stratified by their baseline SUTD score. For the full SUTD measure there is a clear monotonic relationship between the percentage attempting to cut down and increasing urges to drink (U = 974,080, p < 0.001). Of the 23 drinkers who scored the two highest levels of urges to drink, 37.1% had attempted to cut down. The relationship is also monotonic for the SUTD-3 (U = 973,570, p < 0.001).
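As a worked check on the proportion above, the Python sketch below computes a simple Wald 95% CI for 767 out of 2960. It ignores the survey weights (an assumption made for illustration, since the study's estimates were weighted), yet the unweighted calculation gives the same interval as the one reported. The function name is hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Proportion with a Wald (normal-approximation) confidence interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


p, lower, upper = proportion_ci(767, 2960)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
# prints "25.9% (95% CI 24.3 to 27.5)"
```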
The odds of attempting to cut down between baseline and the 6-month follow-up according to the SUTD and SUTD-3 scales are also presented in Table 2. For the SUTD, drinkers reporting "slight" to "very strong" urges to drink had 1.57 to 1.09 times higher odds of making an attempt to cut down than drinkers who reported "not at all". After adjusting for age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT scores and wave of the survey, those reporting "slight" and "moderate" urges had 1.78 and 1.58 higher odds of an attempt to cut down. For the SUTD-3 scale, those reporting "slight" and "moderate >" urges to drink had a 1.78 and 1.51 higher odds of reporting an attempt to cut down at follow-up in adjusted analyses.
In comparison, a positive association was also found between AUDIT scores and attempts to cut down in unadjusted analyses (odds ratio (OR) 1.09, 95% CI 1.07 to 1.12, p < 0.001). This significant association remained after adjustment for all other variables in Table 2 (OR 1.10; 95% CI 1.07 to 1.12, p < 0.001).

Reported Alcohol Consumption at 6-Month Follow-Up
All Participants

The mean consumption score at follow-up was 4.7 (95% CI 4.6 to 4.7). Table 2 presents the mean consumption scores of high-risk drinkers stratified by their baseline SUTD and SUTD-3 scores. There appears to be an almost linear increase in mean consumption scores with increasing urges to drink on SUTD score (z = 9.821, p < 0.001) and SUTD-3 scale (z = 10.533, p < 0.001). Of the 26 drinkers who scored the two highest levels of urges to drink on the SUTD, the mean consumption score was 5.14.
Drinkers reporting "moderate", "strong", "very strong", and "extremely strong" urges to drink on the SUTD had mean consumption scores which were 0.30, 0.77, 0.80 and 0.56 units higher than those reporting "not at all" (Table 2). The beta values were smaller after adjusting for age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT scores and wave of the survey. On the SUTD-3 those reporting "slight" and ">moderate" urges had consumption scores which were 0.22 and 0.46 units higher than those not experiencing urges to drink in adjusted analyses.
By comparison, in unadjusted analyses, a positive association was found between AUDIT scores and consumption levels (β 0.13, 95% CI 0.11 to 0.14, p < 0.001). This significant association remained after adjustment for all other variables in Table 2 (β 0.13; 95% CI 0.08 to 0.35, p < 0.001).
Participants Cutting Down at Baseline
Table 3 presents the mean consumption scores of high-risk drinkers who reported cutting down at baseline stratified by their baseline SUTD and SUTD-3 scores. The mean consumption score at follow-up among those cutting down at baseline (n = 692) was 4.9 (95% CI 4.8 to 4.9). There appeared to be an almost linear increase in mean consumption scores with increasing urges to drink on the SUTD (z = 2.5733, p = 0.010) and SUTD-3 (z = 3.3679, p < 0.001).
In unadjusted analyses, the data were inconclusive as to whether those reporting "slight", "strong", "very strong" and "extremely strong" urges to drink on the SUTD had different consumption levels at follow-up relative to those reporting "not at all". In contrast, those with "moderate" urges to drink had significantly higher consumption levels (Table 3). For the SUTD-3, the data were inconclusive as to whether those reporting "slight" urges to drink had different consumption levels at follow-up relative to those reporting "not at all". In contrast, those reporting ">moderate" urges had significantly higher consumption levels in unadjusted but not adjusted analyses.

All Participants
The mean change in consumption scores between baseline and follow-up was 0.32 (95% CI 0.30 to 0.36). Table 2 presents the change scores stratified by their baseline SUTD score and shows a non-linear association for the full SUTD scale (z = −1.7804, p = 0.075) and SUTD-3 scale (z = −3.3012, p < 0.001). Of the 26 drinkers who scored the two highest levels of urges to drink on the SUTD scale, the mean change in consumption score was 0.86.
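The reported z statistics and p-values can be related through the standard normal distribution. The Python sketch below assumes the p-values are two-sided and based on the normal approximation (which this section does not state explicitly) and reproduces p = 0.075 from z = −1.7804. The function name is hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


print(f"{two_sided_p(-1.7804):.3f}")  # prints 0.075
```

Because the calculation uses the absolute value of z, the sign of the statistic indicates only the direction of the trend, and a p-value always lies between 0 and 1.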
Drinkers reporting "moderate" and "extremely strong" urges to drink on the SUTD had mean change in consumption scores which were 0.3 units lower and 1.05 units higher than those reporting "not at all" (Table 2). The differences were smaller after adjusting for age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT scores and wave of the survey. For the SUTD-3, changes in consumption were smaller for those in the "slight" and "moderate >" categories relative to those not reporting urges to drink after adjustment.
By comparison, in unadjusted analyses, a positive association was found between AUDIT scores and the change in consumption levels (β 0.05, 95% CI 0.04 to 0.07, p < 0.001). This significant association remained after adjustment for all other variables in Table 2 (β 0.03; 95% CI 0.05 to 0.08, p < 0.001).

Participants Cutting Down at Baseline
The mean change in consumption among those currently cutting down at baseline (n = 692) was 0.20 (95% CI 0.08 to 0.32). Table 3 presents the mean change in consumption scores of high-risk drinkers who reported cutting down at baseline stratified by their baseline SUTD and SUTD-3 scores. There was no linear association between mean change in consumption scores and urges to drink on the SUTD (z = 1.4292, p = 0.153) or SUTD-3 (z = −0.065, p = 0.9481).
In unadjusted analyses, the data were inconclusive as to whether those reporting "slight", "moderate", "strong" and "very strong" urges to drink on the SUTD had different consumption change scores relative to those reporting "not at all" (Table 3). In contrast, those with "extremely strong" urges to drink had significantly larger change scores, suggesting a significantly larger increase in consumption between baseline and follow-up. For the SUTD-3, the data were inconclusive as to whether those reporting "slight" and ">moderate" urges to drink had different consumption change scores to those reporting "not at all".
In contrast, a positive association was found between AUDIT scores and the change in consumption levels in both unadjusted and adjusted analyses (β 0.07; 95% CI 0.05 to 0.10, p < 0.001 versus β adj 0.06; 95% CI 0.03 to 0.09, p < 0.001).

Table 2. Results of the regression analysis assessing the association between attempts to cut down drinking between baseline and 6-month follow-up (any versus none), mean consumption at follow-up and mean change in consumption between baseline and 6-month follow-up with the SUTD scale (N = 2960). Note: OR = odds ratio; OR and β adjusted for age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT and wave of the survey; * significant at p < 0.05; ** significant at p < 0.01; *** significant at p < 0.001; a Positive score = higher consumption at follow-up than baseline, negative score = lower consumption at follow-up than baseline; SUTD = Strength of Urges to Drink Scale.

Table 3. Results of the regression analysis assessing the association between mean consumption at follow-up and mean change in consumption between baseline and 6-month follow-up with the SUTD scale restricted to participants cutting down at baseline (N = 692). Note: OR = odds ratio; OR and β adjusted for age, sex, social grade, region, receipt of a voluntary educational qualification, ethnicity, disability, AUDIT and wave of the survey; * significant at p < 0.05; ** significant at p < 0.01; *** significant at p < 0.001; a Positive score = higher consumption at follow-up than baseline, negative score = lower consumption at follow-up than baseline.

Figure 3a shows the ROC curve for the six-item SUTD measure predicting attempts to cut down. The ROC AUC was 0.6 (95% CI 0.5 to 0.6). The ROC AUCs for the AUDIT (0.6; 95% CI 0.6 to 0.7) and SUTD-3 (0.6; 95% CI 0.5 to 0.6) were similar. This would suggest that scores on the SUTD, AUDIT and SUTD-3 would lead to correct categorisation of whether one will make an attempt to cut down around 60% of the time.

Figure 3b shows the ROC curve for the six-item SUTD measure predicting consumption levels at follow-up. The ROC AUC was 0.6 (95% CI 0.5 to 0.6). The ROC AUC for the AUDIT was higher (0.7; 95% CI 0.6 to 0.7) but that for the SUTD-3 (0.6; 95% CI 0.5 to 0.6) was similar. This would suggest that scores on the SUTD and SUTD-3 would lead to correct categorisation of consumption around 60% of the time and scores on the AUDIT around 70% of the time. When restricting the analysis to those cutting down at baseline, the ROC AUCs were as follows: SUTD (0.5; 95% CI 0.5 to 0.6), SUTD-3 (0.5; 95% CI 0.4 to 0.6) and AUDIT (0.7; 95% CI 0.6 to 0.7).

Figure 3c shows the ROC curve for the six-item SUTD measure predicting change in consumption levels between baseline and follow-up. The ROC AUC was 0.5 (95% CI 0.5 to 0.6). The ROC AUC for the AUDIT was slightly higher (0.6; 95% CI 0.5 to 0.6) but that for the SUTD-3 (0.5; 95% CI 0.5 to 0.6) was similar. This would suggest that scores on the SUTD and SUTD-3 would lead to correct categorisation of change in consumption around 50% of the time and scores on the AUDIT around 60% of the time. When restricting the analysis to those cutting down at baseline, the ROC AUCs were as follows: SUTD (0.6; 95% CI 0.5 to 0.6), SUTD-3 (0.6; 95% CI 0.5 to 0.7) and AUDIT (0.6; 95% CI 0.5 to 0.7).

Discussion
Although the long-term test-retest reliability was better for the AUDIT, it was still fair for the SUTD [37]. The SUTD was associated with heavier alcohol consumption at follow-up, a reduction in alcohol consumption between baseline and follow-up and greater likelihood of attempting to cut down between baseline and 6-month follow-up. The accuracy of the SUTD in discriminating between drinkers who did and did not attempt to reduce their alcohol intake, and between drinkers with a consumption level lower than and higher than the mean at follow-up, was around 0.6, which would be broadly considered as acceptable [41]. However, the SUTD was poor at discriminating between those whose change in consumption between baseline and follow-up was lower than the mean and those whose change was higher than the mean. The AUDIT had acceptable to good discriminatory accuracy for all outcomes, performing better than the SUTD on predicting consumption levels and change in consumption levels.
This study has several advantages, including the use of data from a large household survey of adults in England, which enabled the assessment of the validity and reliability of the SUTD scale compared to the widely validated AUDIT questionnaire. However, this study also has several limitations which must be considered. First, the response rate at 6-month follow-up was low, which may have introduced bias. However, differences between those followed up and those not followed up were small. Secondly, participants were asked to retrospectively recall attempts to cut down on their alcohol intake and thus it is possible attempts were forgotten. This may have led to an underestimation of the association with urges to drink. Thirdly, this paper assessed the association between SUTD measures and changes in alcohol consumption at one time point. Given the transient nature of urges to drink, it will be important to assess in further studies associations using repeated longitudinal measures (e.g., ecological momentary assessment) and also the relationship with other outcomes including relapse. Finally, interviews could happen at any time of the day but morning interviews were avoided to maximise availability. It is possible that urges to drink are different in the morning and evening and that any differences are moderated by dependence levels. Dependent drinkers may have a greater urge for 'relief drinking' in the morning, while heavy non-dependent drinkers could be more affected by cues for early evening drinking.
Test-retest reliability of AUDIT scores has been shown to be high at least in the short term (e.g., r > 0.6) [42]. The poorer reliability of the SUTD identified in this study may reflect the gap of 6 months between measurement periods, with the AUDIT questionnaire assessing dependency over the past few weeks, while the SUTD measures urges over the past 24 h. Lack of long-term stability in urges to drink is consistent with the assumptions of psychological theories, e.g., PRIME theory, which view urges as momentary states [19].
Previous studies have found that the accuracy of the AUDIT questionnaire in discriminating whether one has an alcohol use disorder according to DSM-IV and ICD-10 criteria is as high as 0.9 [43,44]. Literature is however lacking on ROC AUCs for predictors of attempts to cut down and alcohol consumption i.e., actual behaviour change. Studies on cigarette dependence have found that ROC AUCs for attempts to quit smoking are in a similar range to those identified in the current study [21]. This likely reflects greater difficulties in predicting behaviour change due to instability over time. For example, the number of drinks consumed on any one occasion is strongly associated with pre-drinking mood [45].
An additional point of interest is that a significant number of high-risk drinkers attempted to cut down after reporting that they did not have any urges to drink. This provides further evidence that behaviour is a relatively complex and unstable phenomenon and results from the interplay between multiple motivational influences on a moment-to-moment basis e.g., plans, beliefs, views, evaluations, and desires [19]. It also suggests that health-care professionals should not stop encouraging patients to cut down on their alcohol consumption even if they do not report strong desires to drink [46].

Conclusions
In conclusion, this single-item measure of urges to drink may be an efficient quantitative tool for population-level surveys and studies assessing the impact of interventions aimed at helping high-risk drinkers reduce their alcohol consumption. The fact that it involves reported experience in the previous 24 h means that it might form a helpful dynamic measure, the lack of which is a limitation of the AUDIT questionnaire. Further research should assess the external validity of this measure in different populations and examine short-term test-retest reliability.

[...] UK (CRUK; C1417/A22962). E.B. also receives support from SPHR2. EK is funded by the NIHR SPHR and the NIHR School for Primary Care Research (SPCR). CD is part funded by the NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, and part funded by the NIHR Collaboration for Leadership in Applied Health Research and Care South London; CD also receives funding from an NIHR Senior Investigator award.