Canonical Correlation for the Analysis of Lifestyle Behaviors versus Cardiovascular Risk Factors and the Prediction of Cardiovascular Mortality: A Population Study

Objectives: To assess the overall association of lifestyle behaviors with multiple cardiovascular risk factors and mortality. Material and Methods: In the Italian Rural Areas of the Seven Countries Study, involving 1712 middle-aged men (40–59 years) enrolled in 1960, smoking habits, physical activity, dietary habits, marital status, and socioeconomic status (SES) were studied as possible determinants of 15 measurable risk factors (body mass index, tricipital and subscapular skinfold, arm circumference, systolic and diastolic blood pressure, heart rate, double product (systolic blood pressure × heart rate), vital capacity, forced expiratory volume, serum cholesterol, urine protein, urine glucose, corneal arcus and xanthelasma) using canonical correlation (CC). Results: The first CC had a value of 0.54 (R² = 0.29, p < 0.0001). The role of marital status was marginal; that of a high SES was contrary to expectations. The strongest behaviors based on standardized CC coefficients were dietary habits and physical activity. The risk factors mostly associated with overall lifestyle behaviors were some anthropometric and cardiovascular measurements. The mean levels of risk factors distributed in tertile classes of the CC variate score of lifestyle behaviors were largely associated in a coherent and graded way with the expected relationship of behaviors versus risk factors. In a large series of Cox models, the CC variate scores were significantly associated with 50-year coronary heart disease (CHD) mortality and much less with stroke and other heart diseases of uncertain etiology. Conclusions: Lifestyle behaviors correlate well with cardiovascular risk factors associated with CHD mortality, and CC is a useful method of analysis to detect long-term impacting characteristics.


Introduction
It is well documented that some lifestyle behaviors play a major role in health and disease. It is enough to peruse a few long reviews [1][2][3] to find the relationship of physical activity, smoking, and eating habits with disease and mortality, including the role of mediators between the behaviors and outcome, such as modifiable risk factors. In previous analyses, we showed the role of personal characteristics in predicting long-term mortality, heart diseases, age at death, and longevity in middle-aged men followed up to 50 years [4][5][6][7][8][9][10][11][12][13]. Among the predictors, there were lifestyle behaviors and measurable risk factors, which were frequently included in the same predictive models. In the latter case, both types of characteristics were significantly associated with mortality rates and the length of survival.
We thus posed the question of quantifying the role of lifestyle behaviors regarding the levels of measurable risk factors. The purpose of the present analysis was to study the relationship of physical activity, smoking and dietary habits with a series of cardiovascular risk factors commonly measured in population studies using canonical correlation, a rarely adopted method (see Appendix A) that we use here for the first time to detect long-term impacting characteristics in a residential cohort.
Moreover, we planned to test canonical variables obtained from this analysis as possible predictors of cardiovascular events that occurred over 50 years in a residential cohort of middle-aged men.

Study Population and Measurements
The data were derived from the Italian Rural Areas of the Seven Countries Study of Cardiovascular Disease located in two villages in Northern and Central Italy, first examined in 1960. The analysis uses baseline data from an entry field examination in a cohort of 1712 men aged 40–59 years, where the participation rate was 98.7%. The mean age of the participants was 49.8 years (SD = 5.1). More details can be found elsewhere [14].
The personal characteristics used in the present analysis are linked to some lifestyle behaviors and some measurable risk factors that in theory could be modulated by lifestyle behaviors. Among the lifestyle behaviors, we considered as possible causes, and thus called X variables, smoking habits, physical activity, and dietary scores, and, in addition, marital status and socioeconomic status (SES), which may play at least an indirect role in modulating risk factor levels. As possible effects, we considered 15 risk factors, thus called Y variables, including anthropometric (not skeletal), biophysical, biochemical and clinical measurements. All X and Y variables are listed in Table 1, with definitions, units of measurement, bibliographic references and other details [15][16][17][18][19][20].
In a separate analysis, we used mortality from coronary heart disease (CHD), stroke (STROKE) and other heart diseases of uncertain etiology (HDUEs) as the endpoint in multivariate Cox models of different types.
Collection of mortality data was complete after 50 years. Coding was based on the availability of causes of death with the addition of other information from repeated field examinations, hospital clinical records, other medical documents and interviews with hospitals, family doctors and relatives of the deceased. The final cause of death was assigned by a single coder following the rules of the Seven Countries Study and using the 8th Revision of the World Health Organization International Classification of Diseases (ICD-8) [21]. In cases of multiple causes of death and uncertainties about the choice of the first cause, a decreasing hierarchical rank was adopted with violence, cancer, CHD, STROKE and others in sequence.
Cardiovascular mortality endpoints were chosen as follows: (1) CHD including cases of myocardial infarction, acute ischemic heart attacks and sudden coronary death, after the exclusion of other possible causes (ICD-8 codes 410, 411, 412, 413 and 795); cases with only a mention or evidence of chronic coronary heart diseases (part of code 412) were not included in this group for reasons given in other contributions [10,12,14], while healed myocardial infarction was retained in this group. (2) STROKE including any type of cerebrovascular disease (ICD-8 codes 430–438). (3) HDUEs including a pool of symptomatic heart diseases (ICD-8 code 427 corresponding to heart failure, arrhythmia, blocks), ill-defined hypertensive heart disease (usually in the absence of documented left ventricular hypertrophy) (ICD-8 codes 402–404) and cases classified as chronic or other types of coronary heart disease in the absence of typical coronary syndromes (ICD-8 part of code 412 and 414), usually manifesting in heart failure, arrhythmia and blocks. The reasons for segregating CHD mortality from that of HDUEs are linked to the repeated documentation we presented about their differences at least for risk factors (mainly serum cholesterol higher for CHD) and age at death (definitely higher for HDUEs) [10,12,14].
After 50 years of follow-up, there were 1669 deaths (97.5%), while nobody (among the enrolled 1712 individuals) was lost to follow-up. Cardiovascular disease mortality accounted for 45.7% of deaths from all causes, and 705 deaths belonged to the three major groups described above, covering 92.4% of all cardiovascular fatal events after the exclusion of other groups with well-defined or very rare etiology.

Statistical Analysis
The analysis was separated into two parts.
Part 1. Canonical analysis. For the purpose of relating lifestyle behaviors with measurable risk factors, we used canonical correlation (see Appendix A), which is an extension of multiple linear regression and correlation, where the dependent variables are more than one [22,23]. This statistical approach allows us to find the relationship of several independent variables, usually called Xs (and arbitrarily considered as possible causes), with several dependent variables, usually called Ys (and arbitrarily considered as possible effects).
The analytical process computes one or more variate Xs and one or more variate Ys that are the linear combinations (weighted averages of the original variables) capable of maximizing the correlation between variate X and variate Y. The estimates of variate Xs and variate Ys are based on Z-transformed variables, that is, the original values are rescaled to a mean of 0 and a standard deviation of 1. To be conceptually simple, variates X and Y are the pool of possible causes and the pool of possible effects, respectively, computed following the procedure mentioned above. The factor loadings are the correlations of variates with the original variables and reflect their role in the production of variates.
The coefficients of the linear combinations are usually reported as standardized (that is, multiplied by the standard deviation of the variables) and can be applied to individual subjects, obtaining a variate score for X and Y separately. Again, to simplify, variate scores X and Y are the numerical characteristics of each individual derived from the pool of the possible causes and of the possible effects, respectively.
Both factor loadings and standardized coefficients are used to evaluate the role of the original variables in the production of variates, but the judgement is not univocal. In fact, when the variables in a group are uncorrelated, the canonical loadings are similar to the coefficients and highly correlated with them. The reverse happens when the original variables are highly correlated, since in this case, factor loadings and standardized coefficients are rather different and uncorrelated.
Canonical correlation is the linear correlation between variate X and variate Y and can be interpreted as the usual R (simple linear correlation), including the R² that represents the proportion of variance explained by the correlation. The final R and R² represent the highest possible correlation between a linear combination of Xs and a linear combination of Ys.
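The procedure described above can be illustrated with a minimal sketch. It is not the computer program used for this analysis [23]: it assumes NumPy, a QR/SVD formulation of canonical correlation, and synthetic data (`X`, `Y`, `latent` and the function name are hypothetical). It returns the first canonical correlation, the coefficients applied to the Z-transformed variables, and the variate scores; loadings are then obtained as correlations between a variate score and the original variables.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Return (r, a, b, u, v) for the first canonical pair of X and Y."""
    n = X.shape[0]
    # Z-transform every column (mean 0, SD 1), as described in the text.
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    # Orthonormal bases for the column spaces of Zx and Zy.
    Qx, Rx = np.linalg.qr(Zx)
    Qy, Ry = np.linalg.qr(Zy)
    # The singular values of Qx'Qy are the canonical correlations.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    # Map the leading singular vectors back to coefficients on Zx and Zy,
    # scaled so that each variate score has a standard deviation of 1.
    scale = np.sqrt(n - 1)
    a = np.linalg.solve(Rx, U[:, 0]) * scale
    b = np.linalg.solve(Ry, Vt[0, :]) * scale
    u, v = Zx @ a, Zy @ b  # variate X score and variate Y score
    return s[0], a, b, u, v

# Synthetic illustration only (not the study data): 3 X and 4 Y variables
# sharing one latent component.
rng = np.random.default_rng(0)
latent = rng.normal(size=300)
X = np.column_stack([latent + rng.normal(size=300) for _ in range(3)])
Y = np.column_stack([latent + rng.normal(size=300) for _ in range(4)])

r, a, b, u, v = first_canonical_correlation(X, Y)
# Loadings: correlations of the variate X score with each original X variable.
loadings_x = np.array([np.corrcoef(X[:, j], u)[0, 1] for j in range(X.shape[1])])
print(f"R = {r:.3f}, R^2 = {r**2:.3f}")
```

In this sketch, the correlation between `u` and `v` equals the first canonical correlation by construction, which is the property described in the paragraph above.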
After the first canonical correlation, others can be computed, but usually their levels are smaller and their interpretation is tricky since they are computed from the residuals of previous correlations. In this way, only the first correlation maximizes the association of the original Xs with Y variables and, for this reason, only one correlation has been considered in this analysis. The final outcome allows us also to invert the interpretation that assigns the role of dependent variables to Xs and the role of independent variables to Ys, an approach that is common in psycho-social sciences. In our case, the only reasonable approach was to consider the X variables as possible causes of the Y variables and the Y variables as possible effects.
In general, it is recognized that the interpretation of findings of canonical correlation is sometimes not easy. Another interesting feature of canonical correlation is that it compacts several variables into a kind of score that is independent of the investigator's judgement, unlike the so-called a priori scores, which depend on it. However, such compacting is conditioned by the search for the best correlation between the two groups of original variables (X and Y).
The output of the computer program [23] included a correlation matrix among all the variables, a canonical correlation with its p value, standardized coefficients and the factor loadings of variates X and Y; the last two reflect the rank of their power. To provide an indication of the connection of variate score X with risk factor levels, a table was filled with the levels of risk factors in tertile classes of variate score X levels. Moreover, we computed the distribution of some risk factor levels in three classes of physical activity and three classes of dietary habits.
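The tertile tabulation mentioned above can be sketched as follows. This is only an illustration using the Python standard library: the function name, the cut-point method (`statistics.quantiles` with its default settings) and the numerical values are assumptions and are not taken from the study data.

```python
from statistics import mean, quantiles

def tertile_means(scores, values):
    """Mean of `values` within tertile classes (1 = lowest) of `scores`."""
    cut1, cut2 = quantiles(scores, n=3)  # two cut points give three classes
    groups = {1: [], 2: [], 3: []}
    for s, v in zip(scores, values):
        tertile = 1 if s <= cut1 else 2 if s <= cut2 else 3
        groups[tertile].append(v)
    return {t: mean(vs) for t, vs in groups.items()}

# Hypothetical variate X scores and systolic blood pressure values (mmHg).
variate_x_score = [1, 2, 3, 4, 5, 6, 7, 8, 9]
systolic_bp = [150, 148, 146, 142, 140, 138, 134, 132, 130]

means_by_tertile = tertile_means(variate_x_score, systolic_bp)
print(means_by_tertile)
```

A graded trend across the three classes, as in this made-up example, is the kind of pattern the tertile table is meant to reveal.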
Part 2. Cox proportional hazards predictive models. In another part of the analysis, a series of Cox models was analyzed to test the role of canonical analysis findings in the prediction of the mortality of three major groups of cardiovascular disease occurring over 50 years; the six model types, labeled (a) to (f), are listed at the end of the text. All behaviors were used in this analysis, while for the risk factors, we selected only those which could exclude multicollinearity problems, resulting in only 11 risk factors as follows: body mass index, subscapular skinfold thickness, arm circumference, systolic blood pressure, heart rate, vital capacity, serum cholesterol, urine protein, urine glucose, corneal arcus, xanthelasma (plus age).
The informativeness of the added variable (prediction ability) was estimated by the procedure proposed by Peto [24], as directly proportional to the chi-square (twice the change in the model log-likelihood). This indicator was used to compare model (c) with model (d) and model (e) with model (f) using the log-likelihoods produced by the Cox solutions. The purpose of this was to see whether the addition of variate Y score in the first pair and variate X score in the second pair improved the goodness of fit (in two pairs of nested models).
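The comparison of two nested models can be sketched as a likelihood-ratio calculation. This is a generic illustration, assuming that one covariate is added (1 degree of freedom) and using made-up log-likelihood values; it is not a reproduction of the procedure in [24] or of the results in this paper.

```python
import math

def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Chi-square and p value (1 df) for adding one covariate to a nested model."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))
    return chi2, p_value

# Hypothetical log-likelihoods of a basic model and of the same model
# with one added variate score.
chi2, p_value = likelihood_ratio_test(loglik_reduced=-1000.0, loglik_full=-996.0)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A larger chi-square for the same number of added covariates indicates a larger gain in fit from the added covariate.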
Additional tests were the Akaike information criterion and the AUCs of the models.
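For orientation, the two additional indicators can be written in a few lines. This is a simplified sketch: the AIC formula is the standard one, while the AUC is computed here for a binary event indicator by pairwise comparison, which is an assumption, since the text does not state how the AUCs of the Cox models were derived. All names and values are hypothetical.

```python
def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

def auc(scores, events):
    """Probability that a subject with the event has a higher score than one
    without it (ties count one half)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical values: a model with 7 parameters and four subjects.
example_aic = aic(loglik=-996.0, n_params=7)
example_auc = auc(scores=[0.9, 0.2, 0.8, 0.1], events=[1, 1, 0, 0])
print(example_aic, example_auc)
```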

Canonical Analysis
The correlation matrix across all variables is not reported in detail as it is too bulky. Among 190 comparisons, only 29 (19%) had R values (correlation coefficient) equal to or greater than 0.23, a level that explains about 5% or more of variance between two variables. The majority of these relatively high levels were related to anthropometric risk factors, and somewhat less to cardio-circulatory risk factors. There were some correlation coefficients of great interest, such as vigorous physical activity versus heart rate (−0.21); sedentary physical activity versus heart rate (+0.15); non-Mediterranean diet versus body mass index (+0.28), tricipital skinfold (+0.32), subscapular skinfold (+0.20) and systolic blood pressure (+0.29); and Mediterranean diet versus BMI (−0.29), tricipital skinfold (−0.33), subscapular skinfold (−0.26), systolic blood pressure (−0.24) and heart rate (−0.21).
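The arithmetic behind two of the figures above can be checked quickly. The sketch assumes that the 190 comparisons are all the pairs among 20 variables (5 X plus 15 Y), which is an interpretation and is not stated explicitly in the text.

```python
import math

n_variables = 5 + 15                  # assumed: 5 X variables plus 15 Y variables
n_pairs = math.comb(n_variables, 2)   # number of distinct pairwise correlations

r_threshold = 0.23
shared_variance = r_threshold ** 2    # proportion of variance explained by R = 0.23

print(n_pairs, round(shared_variance, 3))
```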
The standardized canonical coefficients of the X and Y variables for the first canonical correlation (Table 2) allow us to compare their magnitude. Among the Xs, dietary habits played the dominant role, followed by physical activity and smoking habits. The coefficients of diet and physical activity were positive, while those of smoking habits and a high SES were negative, suggesting an opposite effect on risk factor levels. In fact, the highest level for diet corresponds to a Mediterranean diet and the highest level for physical activity corresponds to vigorous activity. The reverse occurred for smoking habits, which overall were associated with higher levels of risk factors (recalling that the coding of smoking habits was 1 = smokers; 2 = ex-smokers; 3 = never). In the group of Ys, large coefficients were found for some anthropometric and cardiovascular measurements. All risk factors had negative coefficients, except arm circumference, double product, vital capacity, forced expiratory volume and urine glucose. When interpreting (see Appendix A), one should consider the algebraic signs, looking at the same time at the coefficients of the Xs. For example, the coefficient of physical activity (X variable) was large and positive, while those of Y variables such as BMI, tricipital skinfold, systolic blood pressure and heart rate were negative, suggesting an inverse relationship between this X variable and these Y variables. In any case, the algebraic sign of the coefficients should not be interpreted as related to the possible predictive power of the variables in relation to possible events, but only to the direction of their association with the various risk factors.
A similar evaluation can be conducted by inspecting the canonical loadings reported in Table 2. For X variables, the rank order is sufficiently similar to that of standardized coefficients, and this is supported by the fact that the overall correlation in the matrix of the original variables is rather small (standardized Cronbach's Alpha test for correlation matrix = 0.0622) and by the high correlation between factor loadings and standardized coefficients (R = 0.97). The situation is rather different for Y variables, where the overall correlation in the matrix of the original variables is relatively high (standardized Cronbach's Alpha test for correlation matrix = 0.65) and consequently the correlation between factor loadings and standardized coefficients is lower (R = 0.38). The consequence is that the top five ranking variables for standardized coefficients are heart rate, double product, systolic blood pressure, tricipital skinfold and body mass index, while the corresponding five for canonical loadings are tricipital skinfold, subscapular skinfold, body mass index, double product and systolic blood pressure.
The first canonical correlation had a value of 0.54 (p < 0.0001), corresponding to an R² of 0.29, thus explaining a sizeable proportion of the relationships of the two groups of variables.
To illustrate the characteristics of people located in the three tertiles of the variate X score, the mean levels of their risk factors are reported in Table 3. For all risk factors (except arm circumference), there was a trend from tertile 1 to tertile 3, ascending for vital capacity and forced expiratory volume, descending for all the others except arm circumference, urine glucose, urine protein, corneal arcus and xanthelasma, whose trends were irregular and not significant. The observed trends suggest that those located in tertile 3 (the highest) enjoy the high values of the ascending risk factors and the low values of the descending risk factors, although this pattern is not yet linked to their predictive power. The p value of the ANOVA across the three tertiles was highly significant for most of the risk factors except for arm circumference, forced expiratory volume and three of the five extremely rare conditions expressed as proportions. Overall, the risk profile was definitely more favorable for tertile 3 than for tertile 1. For example, comparing these two extremes for two major risk factors, a difference of 14 mmHg in systolic blood pressure and 11 mg/dL in serum cholesterol corresponded to a difference of about 2 years in life expectancy according to a model available in a previous analysis on the same population and the same follow-up duration [13].
Table 4 provides a few selected examples that quantify the effect on levels of single risk factors across the three levels of each lifestyle behavior. We decided to choose physical activity and dietary scores since they were the most powerful as defined by the standardized canonical coefficients, while the "dependent risk factors" were selected as those more likely influenced by the corresponding behavior (see Table 2). All of them were coherent and graded following the expectation on their relationships. In fact, it was expected that the arm circumference and forced expiratory volume would increase and that double product would decrease with increasing levels of physical activity, and similarly that systolic blood pressure, BMI and serum cholesterol would decrease moving from a non-Mediterranean diet to a Mediterranean diet.

Cox Models Prediction
Tables 5–7 provide findings of Cox models using variate X and Y scores in both continuous and discrete shapes for mortality from CHD, STROKE and HDUEs separately. For CHD models (Table 5), all coefficients were negative, meaning that the levels of covariates were inversely related to events, and all were significant, except tertile 2 of the variate Y score. In general, coefficients and hazard ratios were larger for variate X score (behaviors) than for variate Y score. Findings were entirely different for STROKE models, as only variate Y score (continuous) was negative and significant (Table 6). A slightly better situation was found for HDUE models, where tertiles 2 and 3 of variate X scores (behaviors) were negative and significant (Table 7).
The last test was limited to CHD mortality as an endpoint since this was the one with the best outcome in the previous analyses. In this case (Table 8), we found that adding the variate Y score (representing the pool of "dependent" risk factors) to a basic model (run with the five single behaviors plus age) provided a significant improvement in the model log-likelihood, which is an indicator of goodness of fit based on the informativeness procedure [24]. Similarly, when the basic model included the risk factors, the addition of the variate X score representing the behaviors again made the log-likelihood improve significantly, but to a lesser extent. The improvement in the models was confirmed by the computation of the Akaike information criterion and the AUCs. In particular, these last two tests were coherent with each other, indicating a better performance of the pair of Models 1 and 2 than of the pair of Models 3 and 4 (models are here numbered as in Table 8). In any case, these findings suggest that the pools of both behaviors (possible causes) and risk factors (possible effects of behaviors) carry important and significant information that improves prediction.
We also ran another separate Cox model, with behaviors and risk factors fed in a traditional way, excluding (as in the previous ones) a few risk factors that could produce multicollinearity problems (not reported in detail). In this case, a significant predictive role was found for three (out of five) behaviors (physical activity, dietary score, smoking habits) and for three out of eleven risk factors (systolic blood pressure, serum cholesterol, vital capacity). However, the outcome could have been conditioned by the saturation effect linked to the use of too many covariates.
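The link between the sign of a Cox coefficient and the direction of the association can be made explicit with a small sketch. The coefficient and standard error below are hypothetical and do not come from Tables 5–8; the 95% confidence interval uses the usual normal approximation.

```python
import math

def hazard_ratio(coef, se, z=1.96):
    """Hazard ratio and approximate 95% confidence interval from a Cox coefficient."""
    hr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return hr, lower, upper

# Hypothetical coefficient for a one-unit increase in a variate score.
hr, lower, upper = hazard_ratio(coef=-0.20, se=0.05)
print(f"HR = {hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A negative coefficient gives a hazard ratio below 1, that is, a lower hazard for higher levels of the covariate, which is the sense in which covariates are described above as inversely related to events.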

Discussion
This analysis tends to confirm that some classical lifestyle behaviors such as smoking habits, physical activity and eating habits play a role in health at least partly through the modulation of some measurable risk factors. Overall, these lifestyle behaviors, together with marital status and a high SES, explain almost 30% of the variance in a group of 15 risk factors that are usually good predictors of cardiovascular diseases, all-cause mortality, age at death and life expectancy during long follow-up periods.
However, the role of marital status proved to be almost negligible, while that of SES was not major and apparently played a role against the expectation. In fact, its canonical coefficient was negative, like that of smoking habits, and the correlation matrix showed a direct association with some risk factors carrying adverse levels such as subscapular skinfold and sedentary physical activity. This may mean that in the middle of last century, people with a high SES were not healthier than others as apparently is the case nowadays, suggesting that SES is a variable whose meaning, in terms of public health, is changing on the basis of location and time.
The risk factors mostly associated with those lifestyle habits seem to be some anthropometric and cardiovascular measurements. Behaviors carrying the largest canonical coefficients were dietary habits and physical activity, both with three levels going from a non-Mediterranean diet to a Mediterranean diet and from sedentary to vigorous physical activity. These are the same behaviors that have a strong influence on blood pressure, BMI and cholesterol (for diet), and on arm circumference, forced expiratory volume and double product (for physical activity). On the other hand, systolic blood pressure, heart rate and double product are those that have the largest standardized canonical coefficients, followed by some anthropometric measurements.
An important aspect of these findings is the graded and significant coherence between the levels of lifestyle behaviors and those of the risk factors, as clearly shown in Tables 4 and 5. The fact is that when these lifestyle behaviors are fed into multivariate models as predictors of cardiovascular mortality, all-cause mortality or age at death, together with the usual measurable risk factors, both types of predictors still play a significant predictive role [13]. The consequent hypothesis is that these lifestyle behaviors play an extra role, acting also on risk factors not available in our data or that are simply still unknown, probably including genetic markers.
However, looking at the models dedicated to the three cardiovascular endpoints, it appears that the predictive role of canonical variate scores is almost always significant for CHD, which is not the case for STROKE and HDUEs. This perhaps suggests that the three conditions have different risk factors and/or different relationships with the available risk factors, a fact that has been demonstrated in the same epidemiological material using traditional statistical approaches [4,9], forcing us to conclude that we are facing different diseases, at least in terms of their relationship with serum cholesterol (a significant predictor of CHD but not of STROKE or HDUEs) and different ages at death (shorter for CHD, longest for HDUEs, intermediate for STROKE) [11]. Moreover, the set of behaviors and risk factors used in this analysis is more suitably bound to CHD than to the other conditions. All this suggests the possible existence of a situation of competing risks, but this problem has already been tackled and substantially solved in a dedicated analysis of the same material [11], where the existence of a competition between CHD and HDUEs and STROKE was shown, mainly mediated by different relationships with serum cholesterol.
From a strictly technical point of view, it is interesting that adding the variate Y score to the predictive model initially solved with the single behaviors provided a significant increase in the informativeness, that is, the predictive ability of the added covariate. The same was found when the initial model included only risk factors and the additional covariate was the variate X score. This was clearly shown for the endpoint used for this part of the analysis, that is, CHD mortality. We can hypothesize that compacting, say, a block of risk factors or a block of behaviors into single covariates might be helpful to limit the excess of covariates in multivariate predictive models. However, much more work is needed to reach this goal and thus produce valuable operative procedures. The limits of this analysis are linked to the relatively small size of the population sample, partially compensated by the long follow-up; the absence of women in the cohort; and the limited number of risk factors, although in the literature it is rare to find analyses including many more risk factors.
The attempt to use variate scores for the prediction of events suggests that the canonical regression procedure, at least for these combinations of X and Y variables, somewhat helps in selecting subgroups of favorable versus unfavorable risk profiles, opening another path in this area. This does not mean that canonical correlation should substitute other well-established predictive models, but it is worth noting that there is a coherence between lifestyle behaviors, risk factor levels and the predictive power of these variables.
The contribution of this analysis probably has value due to its systematic and comprehensive approach including a defined population sample, the use of canonical correlation with more than one X variable (five in this case) and many Y variables (fifteen in this case). Moreover, the study of the relationships and the influence of lifestyle behaviors could exploit the availability, in the same population, of long-term (50 years of follow-up, close to extinction) mortality data that became the endpoints of the possible predictive power of the canonical variate score related to three major cardiovascular mortality groups. Finally, the coexistence in the same model of behaviors (possible causes) and risk factors (possible effects) as predictors showed that the former have some extra contributions as determinants of events beyond their influence on risk factors.
A review of the literature highlights that contributions can be divided into two groups, that is, the study of the relationship of behavior with risk factors and the use of canonical correlation for the same or other purposes in the cardiological domain. In the first group, the majority of the quoted studies dealt with one lifestyle behavior and a limited number of risk factors [25][26][27][28][29][30][31][32][33][34][35]. We found four studies that used physical activity, variously defined and measured, as a possible determinant of measurable risk factors, mainly blood lipids and components of metabolic syndrome [27,29,31,33], with findings tending to associate high levels of physical activity with a low BMI, low levels of triglycerides and LDL and total cholesterol and high levels of HDL cholesterol. Moreover, in an Australian study, healthy behaviors of sleeping, physical activity and dietary habits were associated with lower levels of LDL cholesterol [34]. An a posteriori dietary score derived from a population study involving 2298 adults in France, Belgium and Luxembourg showed lower levels of cardiovascular risk factors when the diet was rich in fruit, nuts, vegetable oils and tea [28]. In a Polish study, the interactions of various lifestyle behaviors were considered, but the upper tertile of the nutrition score (the healthy habit) was not associated with vigorous physical activity, while in this group, there were more people reading books and watching television [32]. In Canada, a survey on children aged 10–11 years showed that parental/peer smoking and drinking and low self-esteem were associated with multiple risk factors linked to behaviors [26]. The nutrition score and the physical activity score were associated in different ways with triglycerides, waist circumference and systolic blood pressure depending on the levels of BMI [30], suggesting that exercise had a better effect than nutrition in the subgroup of people with a normal BMI. Finally, an interesting survey conducted in the USA assessed the relationship of lifestyle behavior in couples in a study on employees. A high concordance was found for non-healthy behaviors and risk factor levels, but the relationship between the two types of variables was not considered [35].
Even rarer were contributions dealing with canonical correlation versus cardiovascular diseases.On the contrary, it is quite common in psycho-social sciences, neurology, genetics and biochemistry.Also, our experience in this field was limited to a single paper from 1994, where canonical correlation was used to compare a few cardiovascular risk factors with a few causes of death in an ecological analysis of the 25-year follow-up of the 16 cohorts of the Seven Countries Study [36].
In the output of PubMed, when searching for canonical correlation and cardiovascular diseases, less than 300 references can be found, many of which actually do not include the use of canonical correlation.Among the others, the majority deal with definitely clinical problems, either diagnostic, prognostic, therapeutic or even autoptic.At the end, we selected only six papers with at least some indirect connection with the content of our contribution [37][38][39][40][41][42].A paper from Romania had a promising title, but the abstract did not provide anything precise and the full text was not available [37].A study on 407 healthy Taipei Chinese adults successfully used canonical correlation to show the strict relationship of central adiposity with major cardiovascular risk factors [38].A similar, much larger study conducted in Canada reached the same conclusions, with significant canonical correlations of 0.58 for adult men and 0.61 for adult women [39].In the Framingham Heart Study, a variation of canonical correlation was used to identify the association of multiple repeatedly measured characteristics with single-nucleotide polymorphism data [40].In another Chinese study [41] using canonical correlation in a residential cohort, physical activity was related to anthropometric parameters and blood lipids, obtaining a correlation of 0.44 (p < 0.0001).However, curiously enough, anthropometric parameters were used as behaviors and not as a possible consequence of behaviors.Finally, the association of road traffic noise and air quality was related to more than 700,000 Scottish hypertensive patients from different areas, and the canonical correlation was 0.342, with 89% of the variance explained by the canonical independent variables [42].
This short review of the literature is disappointing, since the quoted papers deal only generically with the relationships of lifestyle behaviors with risk factors and are characterized by the use of only one or a few behaviors, and the same was true for the consequent risk factors. In general, the analyses were very different from the one presented in this paper. Similarly, the contributions using canonical correlation in the field of cardiovascular diseases also had limited horizons and were very different from the systematic approach we used to tackle our data derived from a long-term population study. Although, in general, other investigations agree with our findings, none of them were fully comparable with this analysis in terms of methodology. In our case, using canonical correlation, a good association was found between some lifestyle behaviors and a small series of risk factors bound to cardiovascular diseases. Moreover, the canonical X and Y variate scores were capable of predicting CHD mortality in a satisfactory way. Thus, more analyses are needed in this field, hopefully exploiting larger population samples and larger numbers of both behaviors and risk factors to obtain a better picture of the problem.

Conclusions
In a population study with a long follow-up, canonical correlation proved useful in identifying the relationships of major lifestyle behaviors with a number of established cardiovascular risk factors. Moreover, the canonical variates and variate scores showed their ability to predict mortality from major CVDs in multivariate models, thus providing a tool that, by compacting predictive variables, contributes to limiting their number and reducing the need to force them into these models.
(a) Cox models including variate X and variate Y with a continuous shape predicting mortality from CHD, STROKE and HDUEs separately; (b) Cox models similar to the above including variate X and variate Y in three tertile classes, using tertile 1 (lowest) as a reference; (c) a Cox model with CHD mortality only and the five behavioral characteristics expressed as X variables in the canonical analysis as covariates (plus age); (d) the same Cox model as above with the addition of variate Y (continuous shape); (e) a Cox model with CHD mortality only and risk factors expressed as Y variables in the canonical analysis as covariates; (f) the same Cox model as above with the addition of variate X (continuous shape).
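The tertile coding mentioned in item (b) can be illustrated with a minimal sketch. It assumes that the cut points are simple empirical tertiles of the variate scores and that tertile 1 is left out of the indicator columns so that it acts as the reference class. The function names and the score values are hypothetical and are not taken from the study data or from the software actually used.

```python
import statistics

def tertile_classes(scores):
    """Assign each score to tertile 1 (lowest), 2, or 3 (highest)."""
    # statistics.quantiles with n=3 returns the two cut points between tertiles.
    low_cut, high_cut = statistics.quantiles(scores, n=3)
    classes = []
    for s in scores:
        if s <= low_cut:
            classes.append(1)
        elif s <= high_cut:
            classes.append(2)
        else:
            classes.append(3)
    return classes

def dummy_code(classes):
    """Indicator pairs for tertiles 2 and 3; tertile 1 is the reference (0, 0)."""
    return [(int(c == 2), int(c == 3)) for c in classes]

# Hypothetical variate X scores (illustrative values only, not study data).
variate_x = [-1.8, -1.1, -0.7, -0.2, 0.0, 0.3, 0.6, 1.0, 1.9]
classes = tertile_classes(variate_x)
dummies = dummy_code(classes)
```

The two indicator columns would then enter a Cox model as covariates, so that each hazard ratio compares tertile 2 or tertile 3 with tertile 1.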

Table 1. Lifestyle behaviors (X variables) and measurable risk factors (Y variables): definitions, units of measurement, mean levels and selections.

Table 2. Standardized canonical coefficients and canonical loadings of X and Y variables.
Units of measurement as in Table 1.

Table 3. Mean levels of risk factors distributed in tertile classes of variate X scores.
Units of measurement as in Table 1. (*) Proportions and standard error.

Table 4. Some examples of mean levels of risk factors in classes of physical activity and dietary score.

Table 5. Cox proportional hazards model of variate scores (X and Y) as covariates expressed in continuous and discrete shape predicting 50-year mortality from CHD (n = 278) as a dependent variable.
Panel title: Cox model predicting CHD mortality with X and Y variates in a continuous shape. Columns: covariates, coefficient, hazard ratio, 95% CI, p of coefficient.
CI: confidence interval.

Table 6. Cox proportional hazards model of variate scores (X and Y) as covariates expressed in continuous and discrete shape predicting 50-year mortality from STROKE (n = 225) as a dependent variable.

Table 7. Cox proportional hazards model of variate scores (X and Y) as covariates expressed in continuous and discrete shape predicting 50-year mortality from HDUE (n = 202) as a dependent variable.

Table 8. Cox proportional hazards models predicting CHD mortality in 50 years (n = 278) in four shapes including five behaviors alone plus variate Y score, and separately eleven risk factors alone plus variate X score.