The Impact of Lifetime Alcohol and Cigarette Smoking Loads on Amyotrophic Lateral Sclerosis Progression: A Cross-Sectional Study

Background—Amyotrophic lateral sclerosis (ALS) is a devastating and untreatable motor neuron disease; smoking and alcohol drinking may impact its progression rate. Objective—To ascertain the influence of smoking and alcohol consumption on ALS progression rates. Methods—Cross-sectional multicenter study, including 241 consecutive patients (145 males); mean age at onset was 59.9 ± 11.8 years. Cigarette smoking and alcohol consumption data were collected at recruitment through a validated questionnaire. Patients were categorized into three groups according to ΔFS (derived from the ALS Functional Rating Scale-Revised and disease duration from onset): slow (n = 81), intermediate (80), and fast progressors (80). Results—Current smokers accounted for 44 (18.3%) of the participants, former smokers accounted for 10 (4.1%), and non-smokers accounted for 187 (77.6%). The age of ALS onset was lower in current smokers than non-smokers, and the ΔFS was slightly, although not significantly, higher for smokers of >14 cigarettes/day. Current alcohol drinkers accounted for 147 (61.0%) of the participants, former drinkers accounted for 5 (2.1%), and non-drinkers accounted for 89 (36.9%). The log(ΔFS) was weakly correlated only with the duration of alcohol consumption (p = 0.028), but not with the mean number of drinks/day or the drink-years. Conclusions: This cross-sectional multicenter study suggested a possible minor role for smoking in worsening disease progression. A possible interaction with alcohol drinking was suggested.


Introduction
Amyotrophic lateral sclerosis (ALS) is an intractable neurodegenerative disease that is characterized by the progressive degeneration of motor neurons. The main clinical

Materials and Methods
The study was designed as a cross-sectional multicenter study. It was conducted in three centers in Italy (San Giovanni Rotondo, the coordinating center; Novara; and Modena), one in the Republic of Moldova (Chisinau), and one in Romania (Cluj-Napoca). The study was approved by the Institutional Review Boards of the coordinating center (N96/CE/2016) and the other four centers. Written informed consent was obtained from all participants. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) cross-sectional reporting guidelines [14] were used as the reference for reporting the study (Supplemental File 1).

Patients
Patients of both sexes were consecutively enrolled from October 2016 to January 2020, in different periods in each center. The inclusion criteria were as follows: (1) age over 18 years, (2) diagnosis according to the El Escorial criteria [15], and (3) consecutive in- and out-patients with a new (incident) or already present (prevalent) clinical diagnosis of ALS. The exclusion criteria were as follows: (1) patients with a tracheostomy or receiving mechanical ventilation, (2) patients with percutaneous endoscopic gastrostomy, and (3) patients who did not sign the informed consent form.

Data Collection and Disease Progression Assessment
For each patient, we collected demographic (date of birth, gender, education) and clinical variables (date of onset and diagnosis, site of onset, diagnostic category according to the El Escorial criteria, BMI, forced vital capacity (FVC), therapy). Interviews were conducted during the clinical visit by interviewers who were blinded to the patients' clinical history and neurological status. Disease severity was estimated through the ALSFRS-R, which evaluates the severity of the disease through a 12-item questionnaire [16]. The rate of disease progression (∆FS score) at recruitment was calculated by dividing the number of ALSFRS-R points lost from the maximum score of 48 by the symptom duration in months, according to the following formula: ∆FS = (48 − total ALSFRS-R at recruitment)/symptom duration in months [17]. The date of disease onset was determined based on subjective complaints, information confirmed by relatives, and clinical charts.
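As an illustration of the progression rate described above, the following minimal sketch reads the ∆FS as the ALSFRS-R points lost from the maximum of 48, divided by the months elapsed since symptom onset. The function name `delta_fs` and its input checks are assumptions made for this example and are not part of the study's analysis code.

```python
def delta_fs(alsfrs_r_total: float, symptom_months: float, max_score: int = 48) -> float:
    """Return ALSFRS-R points lost per month since symptom onset.

    Hypothetical helper for illustration only: it assumes the formula
    (max_score - total ALSFRS-R at recruitment) / symptom duration in months.
    """
    if symptom_months <= 0:
        raise ValueError("symptom duration must be a positive number of months")
    if not 0 <= alsfrs_r_total <= max_score:
        raise ValueError("ALSFRS-R total must lie between 0 and the maximum score")
    return (max_score - alsfrs_r_total) / symptom_months


# A patient scoring 38 on the ALSFRS-R, 20 months after onset
example_rate = delta_fs(38, 20)
print(example_rate)
```

Under this reading, a patient who still has the full score of 48 has a ∆FS of 0, which agrees with the lower bound of the ∆FS range reported in the Results.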

Exposure Assessment
Cigarette smoking and alcohol consumption data were collected at recruitment through the "Questionnaire of Lifestyle," which is part of the European Prospective Investigation into Cancer and Nutrition project (EPIC) study [18,19]. We defined three categories of smoking status at recruitment in relation to the disease onset: non-smokers were those who had smoked <100 cigarettes up to the time of the interview [20] or stopped smoking at least six months before the disease onset; current smokers were those who had smoked ≥100 cigarettes and were still smoking at the time of the interview, or within six months of the interview; former smokers were those who had smoked ≥100 cigarettes and had stopped smoking after disease onset, but at least six months prior to recruitment.
All smokers were asked to state the age when they started and quit smoking (if appropriate), and to quantify the number of cigarettes smoked per day at the ages of 20, 30, 40, 50, 60, and ≥70 years up to the participant's current age. For each age period, we calculated the mean number of cigarettes smoked per day based on the questionnaire information and the number of years spent smoking (i.e., smoking duration). The cigarette smoking duration (years) was calculated as the difference between the age at recruitment or smoking cessation and the age when the participant started smoking. Following Peters et al. [7], we estimated the smoking intensity (cigarettes per day) as the weighted mean of the number of cigarettes smoked per day during different age periods, with weights equal to the smoking duration within each age period. Pack-years (a measure of lifetime smoking load) was calculated by dividing the smoking intensity by 20 and multiplying the result by the smoking duration (in years).
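The smoking measures above can be sketched as follows. This is only an illustrative example: the function names and the input layout (a list of cigarettes-per-day and years-smoked pairs, one per age period) are assumptions, and the sketch further assumes that the total smoking duration equals the sum of the years smoked in each period, that is, no gaps in smoking.

```python
def smoking_intensity(periods):
    """Weighted mean of cigarettes/day, weighted by years smoked in each age period.

    `periods` is a hypothetical list of (cigarettes_per_day, years_smoked) tuples.
    """
    total_years = sum(years for _, years in periods)
    if total_years <= 0:
        return 0.0  # no smoking time recorded
    return sum(cigs * years for cigs, years in periods) / total_years


def pack_years(periods):
    """Lifetime smoking load: (intensity / 20 cigarettes per pack) * duration in years."""
    duration = sum(years for _, years in periods)
    return smoking_intensity(periods) / 20 * duration


# Ten years each at 10, 20, and 30 cigarettes/day
history = [(10, 10), (20, 10), (30, 10)]
print(smoking_intensity(history), pack_years(history))
```

With weights equal to the years spent in each period, pack-years reduces to the sum over periods of (cigarettes per day / 20) × years, which is a convenient check on the calculation.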
Similarly, detailed information was obtained regarding alcohol consumption during different age periods up to the participant's current age. In relation to the disease onset, we defined non-drinkers as those who had drunk less than one standard alcoholic drink/month up to the time of the interview, or had stopped drinking at least six months before the disease onset; current drinkers were those who had drunk more than one standard alcoholic drink/month and were still drinking at the time of the interview, or within six months of the interview; former drinkers were those who had drunk more than one standard alcoholic drink/month and had stopped drinking after disease onset, but at least six months prior to recruitment. All drinkers were asked to state the age when they started and quit drinking (if appropriate), and to state the number of alcoholic drinks per day by type of beverage (wine, beer, and spirits) at the ages of 20, 30, 40, 50, 60, and ≥70 years up to the participant's current age. An Italian standard alcoholic drink (standard alcoholic unit) contains approximately 12 g of pure ethanol [21], corresponding to a small glass of wine (125 mL), a can of beer (330 mL), or a shot of spirits (40 mL). Analogous to the measures obtained for smoking, we calculated the drinking intensity (drinks/day) as the weighted mean number of standard alcoholic units per day during different age periods, with weights equal to the number of years spent drinking (i.e., drinking duration) within each age period, for each type of beverage and in total (aggregating all types of drinks). Drink-years (a measure of the cumulative lifetime alcohol drinking load) was calculated by multiplying the drinking intensity by the drinking duration (in years).
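A minimal sketch of the total drinking measures is given below. The function name, the dictionary keys, and the input layout (one dictionary of standard units per day by beverage for each age period, paired with the years spent drinking in that period) are assumptions for illustration. The sketch aggregates all beverages and also converts the intensity to grams of ethanol per day using the approximate 12 g per standard unit.

```python
ETHANOL_GRAMS_PER_UNIT = 12.0  # approximate ethanol content of one standard unit


def drinking_summary(periods):
    """Summarize total alcohol exposure across age periods.

    `periods` is a hypothetical list of (units_per_day_by_beverage, years_drinking)
    tuples, e.g. ({"wine": 1.0, "beer": 0.5}, 10).
    """
    duration = sum(years for _, years in periods)
    if duration <= 0:
        return {"drinks_per_day": 0.0, "drink_years": 0.0, "ethanol_g_per_day": 0.0}
    # Sum all beverage types within each period, then weight by years spent drinking
    weighted_units = sum(sum(units.values()) * years for units, years in periods)
    intensity = weighted_units / duration
    return {
        "drinks_per_day": intensity,
        "drink_years": intensity * duration,
        "ethanol_g_per_day": intensity * ETHANOL_GRAMS_PER_UNIT,
    }


# Ten years of one wine plus one beer per day, then thirty years of two wines per day
summary = drinking_summary([({"wine": 1.0, "beer": 1.0}, 10), ({"wine": 2.0}, 30)])
print(summary)
```

Beverage-specific intensities could be obtained in the same way by passing dictionaries that contain a single beverage type.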

Validation and Administration of the Questionnaire
The questionnaire was designed in Italian, then translated into Romanian by a native Romanian speaker, and back-translated by a native Italian speaker. In two sites, two raters, previously trained in the use of the questionnaire and blinded to the patients' clinical status, interviewed patients in a dedicated room. To evaluate the reliability of the questionnaire (inter-rater agreement), two pairs of raters interviewed healthy people or patients with neurological diseases before the study started (40 in Chisinau and 25 in San Giovanni Rotondo). The sequence of interviews was randomized and the randomization list was concealed. The two raters' interviews of each subject were held at least one day and no more than seven days apart; this window was considered long enough for subjects not to remember their earlier answers and short enough for their consumption habits not to change. Agreement between the two raters for consumption (yes/no) was calculated using Cohen's kappa statistic [22] and was 0.88/1.0 for smoking/drinking in Chisinau and 0.92/0.95 in San Giovanni Rotondo. Agreement for continuous variables was determined with the intraclass correlation coefficient [23], and varied across variables in Chisinau (0.57/1.0) and San Giovanni Rotondo (0.63/1.0).
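For readers unfamiliar with the agreement statistic, the sketch below computes Cohen's kappa for two raters from first principles: observed agreement is corrected for the agreement expected by chance from each rater's marginal frequencies. The function name and the toy ratings are illustrative assumptions; the study's values were obtained with its own statistical software.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1.0:
        # Both raters used one identical category throughout; by convention
        # (an assumption of this sketch) report perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy ratings of smoking status (yes/no) for four subjects
kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(round(kappa, 3))
```

In the toy example the raters agree on three of four subjects (0.75) while chance agreement is 0.5, so kappa is 0.5.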

Statistical Analysis
The patients' characteristics are reported as mean ± standard deviation, or median along with range, depending on their distribution, and with absolute and relative frequencies (percentages) for continuous and categorical variables, respectively. The normality of the continuous variables' distributions was checked using Q-Q plots and the Shapiro-Wilk test. In the presence of right-skewed continuous variables, statistical analyses were performed on log values. Comparisons between two categorical variables were assessed using chi-square or Fisher exact tests (as appropriate), whereas comparisons between a continuous and a categorical variable were assessed using univariable and multivariable ANOVA models. Pairwise comparisons between groups of the categorical variables were performed (using the ANOVA models) and, if necessary, least-square means of the dependent variable (along with their 95% confidence interval (CI)) were estimated for each level of the categorical variable. The standardized mean difference was further reported to quantify, from a clinical perspective, the differences in investigated variables between groups and was computed as the average of all possible standardized mean differences across pairwise comparisons. The correlation between two continuous variables was assessed using Pearson's correlation coefficient. To visually assess the relationship between the measures of the intensity (cigarettes or drinks per day), cumulative lifetime load (pack- or drink-years), and the duration of consumption as independent variables and ∆FS as the dependent variable, boxplots and scatterplots with fitted regression lines were arranged in a plot matrix. To identify the clinical, demographic, pathological, treatment, and lifestyle variables that were most strongly associated with ∆FS, the conditional random forest (RF) algorithm [24] with 100,000 trees was used.
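The averaged standardized mean difference can be illustrated with the sketch below. The text does not specify how each pairwise difference is standardized, so this example assumes the absolute difference in means divided by the pooled standard deviation (a Cohen's d type of measure); the function names and toy data are likewise assumptions.

```python
from itertools import combinations
from statistics import mean, variance


def pairwise_smd(x, y):
    """Absolute mean difference divided by the pooled SD (assumed standardization)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return abs(mean(x) - mean(y)) / pooled_var ** 0.5


def average_smd(groups):
    """Average of the standardized mean differences over all pairs of groups.

    `groups` is a hypothetical mapping from group label to a list of values;
    each group needs at least two values with some variability.
    """
    pairs = combinations(groups.values(), 2)
    return mean(pairwise_smd(a, b) for a, b in pairs)


# Toy values for three progression groups (not study data)
toy = {"slow": [1, 2, 3], "intermediate": [2, 3, 4], "fast": [3, 4, 5]}
print(round(average_smd(toy), 3))
```

In the toy data every group has unit variance, so the three pairwise differences are 1, 2, and 1 and their average is 4/3.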
The RF is a popular machine learning tool that assesses the relationship between a dependent variable and a set of covariates in a (nonparametric) tree-based fashion. An important feature of an RF is that it provides a rapidly computable internal measure of variable importance (VIMP) that can be used to rank variables. Moreover, the VIMP produced by a conditional RF is not affected by the correlation structure of all the included covariates. Formally, a VIMP of a specific covariate is defined as the sum of the decrease in prediction error values when a tree of the forest is split by that covariate. The more a tree relies on a variable to make predictions, the more important it is for that tree. The relative importance is the VIMP divided by the highest VIMP value. To better understand the marginal relationship between the value of each "important" variable (i.e., with VIMP > 0) and log(∆FS), a scatterplot of the accumulated local effects [25], which were estimated from the fitted conditional RF, was provided. In addition, the joint relationship (i.e., interaction) between smoking and alcohol intensity effects on log(∆FS) was investigated using a partial dependence plot [25].
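The relative importance described above is a simple rescaling, sketched here for illustration. The function name and the example dictionary are assumptions; in particular, the value given for age at onset is a made-up placeholder and not a study result, and no forest is fitted in this example.

```python
def relative_importance(vimp):
    """Divide each variable importance (VIMP) by the highest VIMP value."""
    top = max(vimp.values())
    if top <= 0:
        raise ValueError("at least one variable must have a positive VIMP")
    return {name: value / top for name, value in vimp.items()}


# Illustrative inputs only: 0.31 for age at onset is a hypothetical placeholder
example_vimp = {"diagnostic_delay": 0.63, "age_at_onset": 0.31, "smoking_status": 0.0}
ranked = sorted(relative_importance(example_vimp).items(), key=lambda kv: kv[1], reverse=True)
for name, share in ranked:
    print(f"{name}: {share:.0%}")
```

The top-ranked variable always receives a relative importance of 1 (100%), and variables with a VIMP of 0 stay at 0.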
A two-sided p-value < 0.05 was considered to represent statistical significance. All statistical analyses were performed using SAS Release 9.4 (SAS Institute, Cary, NC, USA). Conditional random forests and plots were produced using R (R Foundation for Statistical Computing, version 3.6; packages: party, GGally, iml).

Results
We recruited 241 patients, 145 men and 96 women, with a sex ratio of 1.5:1. Onset was in the spinal district in 187 patients (77.6%) and bulbar in 54 patients (22.4%). The mean age was 59.9 ± 11.8 years at onset and 62.4 ± 11.0 at recruitment. The median time that elapsed between disease onset and recruitment was 20 months (range 1.7-273). According to the El Escorial criteria, 74 (30.7%) patients were categorized as definite, 77 (32.0%) as probable, 55 (22.8%) as possible, and 35 (14.5%) as suspected in terms of their ALS diagnosis. Other demographic and clinical characteristics are shown in Table 1. The ALSFRS-R scores ranged from 10 to 48, with a mean of 34.9 ± 8.3. The ∆FS score ranged from 0 to 5. Table 1 shows the clinical characteristics according to the ∆FS tertiles. As expected, slow progressors were younger at disease onset and recruitment, were less likely to have a bulbar onset, and had a better FVC. The El Escorial categories were associated with the progression rate; however, since some of the "suspected ALS" patients may eventually turn out not to have ALS, we performed a sensitivity analysis excluding "suspected ALS." The statistical significance did not substantially change (p = 0.010).
Table 1 notes: * Missing values were excluded from the analysis and percentages were computed out of the total number of observations. SD: standard deviation; p-values from ANOVA models or chi-square (with continuity correction) statistics for continuous and categorical variables, respectively. # p-values from Fisher exact test. SMD: standardized mean difference (i.e., the average of all possible standardized mean differences). Tertiles of ∆FS distribution were ≤0.333 (I), 0.334-0.875 (II), and >0.875 (III).

Smoking
Current smokers accounted for 44 (18.3%) of the participants, 187 (77.6%) were non-smokers, and 10 (4.1%) were former smokers. No patient started smoking after their ALS diagnosis. No difference across the ∆FS tertiles was found for the status and modalities of smoking (Table 1). Table 2 shows the unadjusted comparisons of clinical variables according to the intensity of smoking (cigarettes/day) categories. Former smokers were excluded from the analysis. Non-smokers had a significantly higher age at ALS onset than current smokers and a lower, although not statistically significant, ∆FS. All the other clinical factors (gender, BMI, FVC, El Escorial category), except the site of onset, were equally distributed across the categories. Pairwise associations between cigarettes/day, pack-years, duration of smoking, and the log-transformed ∆FS (i.e., log(∆FS)) are reported in Figure 1. The log(∆FS) was not correlated with the duration of smoking (r = 0.13, p = 0.406), nor did it differ between the classes of cigarettes/day and pack-years. As expected, the number of pack-years was associated with the duration.

Alcohol Consumption
Current alcohol drinkers accounted for 147 (61.0%) of the participants, 5 patients (2.1%) were former drinkers, and 89 (36.9%) were non-drinkers. No patient started drinking alcohol after their ALS diagnosis. No difference across the ∆FS tertiles was found for the drinking status (Table 1). Table 3 shows the unadjusted comparisons of clinical variables among non-drinkers and drinkers according to the intensity (drinks/day) categories. Compared to non-drinkers, the age at ALS onset was lower in drinkers of ≤1 drink/day and higher in drinkers of >1 drink/day.
Former drinkers were excluded from the analysis. The rate of disease progression (median ∆FS score) was similar for all categories. All the clinical factors were equally distributed across the categories. Pairwise associations between the drinks/day, drink-years, duration of alcohol consumption, and log-transformed ∆FS were assessed, and the results are reported in Figure 2. The log(∆FS) was weakly (but statistically significantly) correlated only with the duration of alcohol consumption (r = 0.18, p = 0.028), but not with the number of drinks/day or drink-years. As expected, the number of drink-years was associated with the duration.
Table 2 notes: Missing values were excluded from the analysis and percentages were computed out of the total number of observations. SD: standard deviation; p-values were reported from pairwise contrasts defined in ANOVA models or Fisher's exact test for continuous and categorical variables, respectively; # the log-transformed variable was used in the ANOVA model (because of the skewed distribution); • median cut-off; * the smoking intensity was computed as the weighted mean of the number of cigarettes smoked per day at different age periods, with the weights equal to the smoking duration within each age period.

Figure 1.
Plot matrix depicting the pairwise associations between the smoking intensity (cigarettes/day), smoking load (pack-years), duration of smoking, and log-transformed ΔFS (lower diagonal elements). Comparisons with the smoking loads are reported as boxplots, whereas the correlation between the log-transformed ΔFS and duration of smoking is reported as a scatterplot with a fitted regression line. The distribution of each variable considered is reported as a bar chart or histogram along the diagonal. Only current smokers were considered to produce the analysis results presented here.

Since a previous multicenter case-control study [12] found an intriguing difference in the ALS risk between patients from the Apulia region (increased) and other areas (decreased or neutral), we analyzed the subset of patients from Apulia separately (Tables S1 and S2). However, no difference in the disease progression was found for exposure to alcoholic beverages, wine alone, or smoking.

Predictors of ∆FS
The VIMP provided by the conditional RF algorithm that we used to detect the variables that were most associated with ∆FS suggested that diagnostic delay, age at onset, El Escorial category, and education were the covariates that explained the largest amount of the log(∆FS) variance (Table 4). The whole RF achieved a fair goodness of fit (R² = 0.48). Specifically, the diagnostic delay was found to be the strongest predictor of ∆FS, achieving the highest VIMP of 0.63, followed by age at onset and the El Escorial classification, whereas drinking and smoking status were at the bottom of the list (VIMP = 0).
The accumulated local effects plot for all "important" variables (i.e., VIMP > 0) is reported in Figure 3.
The accumulated local effects can be interpreted as the change in log(∆FS) relative to the mean for each specific value of the plotted variable of interest. As expected, the lower the diagnostic delay (close to zero), the higher the ∆FS values, and the higher the diagnostic delay (greater than 2 years), the lower the ∆FS values. In contrast, higher values of age at onset were associated with higher ∆FS values.
Table 4 notes: The VIMP of a specific variable is the sum of the decrease in prediction error values (of log(∆FS)) when a tree of the forest splits due to that variable, whereas RVIMP is the VIMP divided by the highest VIMP value such that values are bounded between 0 and 1 (or between 0 and 100%).
Moreover, to better understand whether smoking and alcohol intensity were jointly related with log(∆FS), a partial dependence plot was created using a conditional RF and is shown in Figure 4.
Interestingly, from a graphical viewpoint, it seemed that the highest ∆FS was found for those who smoked more than 10 cigarettes/day but drank less than 2 drinks/day (yellow regions), a modest ∆FS was found for those who smoked less than (or equal to) 10 cigarettes/day but drank less than 2 drinks/day (green regions), and the lowest ∆FS was found for those who drank at least 2 drinks/day (blue and violet regions). The association between smoking and alcohol intensity and their combination (as suggested by the partial dependence plot) with log(∆FS) was then assessed in both univariable and multivariable analyses, adjusting the ANOVA models for four possible confounders (gender, age at onset, education, and diagnostic delay), both alone and in combination. Former smokers or drinkers were excluded from this analysis. The results are reported in Table 5: the ∆FS least-square means (i.e., back-transformed to the original scale) did not significantly vary across smoking and alcohol consumption groups. However, when comparing ∆FS with respect to the groups suggested by the partial dependence plot, a statistically significant difference was found in both the univariable (p = 0.032) and the multivariable models (all p < 0.05).

Discussion
According to Al-Chalabi et al. [26], ALS arises as the final manifestation of a multistep process. However, the rapid progression of the pathological process after onset is an intriguing feature that remains unexplained. We were interested in the possible role of two environmental exposures in accelerating disease progression once it has started and not in their role as risk/protective factors for the onset of ALS. For this reason, we evaluated the smoking/drinking status at disease (clinical) onset by considering those who had quit smoking or drinking at least six months before onset as non-smokers/drinkers. To evaluate the possible impact of the two exposures at the earliest stage, we also included patients with suspected ALS.
To analyze the possible role of smoking and alcohol exposures on disease progression, we divided the ∆FS into tertiles. The tertiles of the ∆FS distribution are associated with survival [17,27], thus indicating that this measure is predictive of different rates of disease progression. This was also true in our sample, where slow progressors had a younger age at disease onset, more frequent spinal onset, better FVC, and a longer diagnostic delay, which are all predictive factors for ALS progression [28,29].
Disease progression, measured using ∆FS and log(∆FS), was only weakly correlated with the duration of alcohol consumption, but not with the alcohol drinking status, drinking intensity, or load. Compared to non-drinkers, the age at ALS onset differed in opposite directions in the two quantity/frequency categories of drinkers; although this observation may suggest a U-shaped association, it must be verified in a larger sample.
On the other hand, the age at ALS onset was lower in current smokers than in non-smokers, as already observed [30], pointing to a possible effect of smoking in bringing forward disease onset. Similarly, ∆FS was slightly, although not significantly, higher for smokers of >14 cigarettes/day. Indeed, as seen in Table S3, our sample only achieved 64% statistical power to detect any significant difference in log(∆FS) means between the smoking groups (exposure).
In order to analyze a possible interaction of smoking and alcohol consumption with other clinical variables, we first ranked variables using the variable importance measure from a conditional RF and then fitted a multivariable model. This tree-based RF algorithm is very powerful: it provides robust and "internally validated" findings without the need for separate training and validation datasets (i.e., the RF validates its decision trees on the "out-of-bag" observations) and, most importantly, compared with other tree-based machine learning algorithms [31], it requires the setting of a very limited number of parameters. Clinical/demographic variables, such as diagnostic delay, age at onset, El Escorial category, and education explained the largest amount of the log(∆FS) variance, whereas smoking and alcohol drinking retained only minor importance. After adjusting for these four variables, the multivariable analysis showed that log(∆FS) was higher, although not significantly, for smokers of >14 cigarettes/day compared to non-smokers, and was not different among the alcohol drinking categories. However, those drinking ≥2 drinks/day, regardless of their smoking status, had a lower ∆FS than the two groups of lighter drinkers defined by smoking intensity, thus showing a possible interaction between the two exposures.
Taken together, the findings from this explorative study suggest a possible minor role for smoking in worsening disease progression and, conversely, for alcohol drinking in slowing it. Cohort studies have been performed only for smoking, with equivocal results: smoking was identified as an independent predictor of survival in both sexes in a population registry from northwestern Italy [30], and in a U.S. study, but only in women [32]. In two other studies, smoking did not predict mortality [33,34]. Interestingly, a possible interaction between smoking and alcohol drinking as predictors of progression was found in two studies of multiple sclerosis [35,36].
This study has limitations that are intrinsic to its cross-sectional design, which prevents establishing a causal relation; however, this study design is practical for testing hypotheses in rare diseases and allows associations with outcomes to be detected, if sufficiently strong, such as for smoking and severity in multiple sclerosis [35]. Furthermore, we could not evaluate the possible confounding due to unmeasured variables, such as physical activity, trauma, or diet.
On the other hand, our study does present some strengths. Selection bias was minimized because patients were consecutively enrolled and had a large spectrum of disease severity. A recall bias is unavoidable with this type of study, but patients were unaware of the study hypothesis and interviewers were blinded to clinical history and neurological status. By collecting the personal history of consumption for every single patient, we were able to study the lifelong cumulative effect of both exposures and not only the amount of exposure at the time of the interview or immediately before.
In summary, this cross-sectional multicenter study suggested only a minor role for smoking in the progression of ALS, in contrast to other neurodegenerative diseases [35,37]; the role of alcohol drinking as a possible modifier should be studied in larger samples.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical and privacy restrictions.

Conflicts of Interest:
The authors report no conflict of interest and the funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.