Association of Physical Activity with Anthropometrics Variables and Health-Related Risks in Healthy Male Smokers

Anthropometric variables (AV) are shown to be essential in assessing health status and to serve as markers for evaluating health-related risks in different populations. Studying the impact of physical activity (PA) on AV and its relationship with smoking is a non-trivial task from a public health perspective. In this study, a total of 107 healthy male smokers (37 ± 9.42 years) were recruited from different states in Malaysia. Standard procedures of measurement of several anthropometric indexes were carried out, and the International Physical Activity Questionnaire (IPAQ) was used to ascertain the PA levels of the participants. A principal component analysis was employed to examine the AV associated with physical activity, k-means clustering was used to group the participants with respect to the PA levels, and discriminant analysis models were utilized to determine the differential variables between the groups. A logistic regression (LR) model was further employed to ascertain the efficacy of the discriminant models in classifying the two smoking groups. Six AV out of twelve were associated with smoking behaviour. Two groups were obtained from the k-means analysis, based on the IPAQ, and termed partially physically active smokers (PPAS) and physically nonactive smokers (PNAS). The PNAS were found to be at high risk of contracting cardiovascular problems, as compared with the PPAS. The PPAS cluster was characterized by desirable AV, as well as a lower level of nicotine compared with the PNAS cluster. The LR model revealed that certain AV are vital for maintaining good health, and a partially active lifestyle could be effective in mitigating the effect of tobacco on health in healthy male smokers.


Introduction
Tobacco addiction remains a significant global health problem, killing over 16 million people annually [1]. Owing to the rapid growth of the global population, the number of smokers is projected to increase rapidly. However, for the past four decades, relatively few investigations have been carried out to ascertain the anthropometric differences between smokers and non-smokers in different populations [30,31]. Some studies have focused on the nutritional status and physical activity level of the smoking group [13,14,32,33]; however, few or no studies have thus far focused on anthropometric variables and their relationships with PA among healthy adult smokers. Investigating the relationships between the aforementioned variables could be useful from a public health perspective in providing relevant information that could assist in changing pervasive smoking behaviour, as well as mitigating the long-term effects of smoking through exercise and other means.
Gender plays a significant role in the tendency to smoke in Malaysia, where the male-to-female ratio of smokers is highly skewed: a larger proportion of smokers in the country are males, which could be attributed to social norms that are not favourable to female smoking [34]. Hence, the current study focused on adult male smokers only. To elucidate the relationship between AV and PA among healthy adult male smokers, the following research questions were devised to guide this study:
1. What are the most dominant AV characteristics of healthy male smokers?
2. What are the smokers' PA levels and the most dominant AV characteristics that differentiate the PA groups?
3. How effective are the discriminant analysis (DA) and logistic regression models in discriminating and classifying the PPA and PNA smokers based on their AV characteristics?

Study Design
An ex post facto study design was used in the current study. This design was selected because the natural characteristics of the samples are not required to be manipulated or altered [35]. Essentially, the ex post facto design entails an investigation to find answers after an event occurred. Thus, the investigation was directed towards analysing the cause-and-effect relationships between the study variables.

Sample Size and Inclusion and Exclusion Criteria
A preliminary power analysis using G*Power was carried out to ascertain the sample size necessary to draw meaningful conclusions in the study [36]. A power computation for multivariate analysis with a power of 0.95 and an alpha of 0.05 revealed that a sample of 107 participants would be sufficient to detect a medium effect size of 0.25 (Cohen's f), as suggested in a previous study [37]. Therefore, purposive sampling was used to recruit the participants. The participants in the current study had no history of any chronic diseases and were neither on routine medication nor alcohol consumers.

Anthropometric and Health-Related Variables Measurement
The height of the participants was determined whilst the participants were in a standing position and unshod, using a portable stadiometer (206, Seca, Hamburg, Germany), while weight was assessed using a standard weighing scale. The height and weight were used to determine the body mass index (BMI) of the participants. Additionally, waist and hip circumferences (WC & HC) were assessed using a measuring tape, measured at the central point between the lower rib margin and the iliac crest and at the maximal circumference over the buttocks, respectively. The measurement was performed when the participants were in a standing position. The waist-hip ratio (WHR) was determined as the ratio of WC to HC. The systolic and diastolic blood pressure were measured from the right arm of the participants while they were in a seated position using an automatic digital blood pressure measurement device (HEM-780, Omron, Kyoto, Japan). The body fat percentage (BF), total body water percentage (TBW), visceral fat (VF), bone mass, and muscle mass were measured using a digital body composition analyser (SC-330, Tanita, Japan) that applied the principles of bioelectrical impedance analysis (BIA).
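The two derived indexes described above follow directly from the raw measurements. The sketch below is illustrative only: the function names and the example values are hypothetical, BMI uses the standard definition (weight in kilograms divided by height in metres squared), and WHR is the ratio of WC to HC as stated in the text.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI definition: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """WHR as described in the text: waist circumference divided by hip circumference."""
    return waist_cm / hip_cm


# Hypothetical participant, not taken from the study data.
example_bmi = body_mass_index(weight_kg=70.0, height_m=1.75)
example_whr = waist_hip_ratio(waist_cm=90.0, hip_cm=100.0)
print(f"BMI = {example_bmi:.2f} kg/m^2, WHR = {example_whr:.2f}")
```

Both circumferences must be expressed in the same unit for the ratio to be meaningful.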

Nicotine Level Assessment
Nicotine levels were assessed using hair samples at the National Poison Centre (PRN) via gas chromatography-mass spectrometry (GCMS) [38]. Hair samples from the posterior vertex were cut close to the scalp, and a maximum length of 10 cm of each hair sample (gauged from the scalp end) was used. The hair sample was utilized to determine the level of nicotine in the body, a process that typically involves five phases, namely, hair sampling, cleaning, digestion, extraction, and calculation. This method was adopted in the study because it provides more accurate data on the level of nicotine in the body of a smoker than questionnaire-based methods.

Physical Activity Level Estimation
The physical activity (PA) of the participants was assessed via the short version of the International Physical Activity Questionnaire (IPAQ). The IPAQ consists of seven constructs that measure various degrees of PA concerning intensity and duration at different times of the day. This instrument was administered to the participants to obtain their ratings based on the perceived daily PA. The scores were summed and used to estimate both the durations and the frequencies of the various physical activities performed by the smokers. The IPAQ has been shown to be valid and reliable in estimating the PA levels among an adult population [39].

Informed Consent/Ethical Approval
Before the commencement of the current investigation, all the procedures and protocol were endorsed by the Research Ethics Committee (Human) of Universiti Sains Malaysia, and the study was conducted in compliance with the Helsinki Declaration for the experiment with human subjects. Moreover, informed consent was obtained from all the participants. All the experiments were carried out at the laboratory of the School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan.

Data Analysis
A variety of multivariate analyses and machine learning methods were used to answer the research questions previously raised and to achieve the objectives of the study.

Principal Component Analysis (PCA)
PCA is a mathematical method used primarily to identify the structure of a dataset from a group of observed variables [40]. PCA is often used as a data reduction technique to identify important variables for further analysis [41,42].

Application of PCA in the Study
In this study, the PCA was used to ascertain the AV associated with the smoking samples. This analysis was carried out to answer research question 1. The variables outlined in the previous sections, i.e., age, weight, height, BMI, waist circumference, hip circumference, waist-hip ratio, fat percentage, muscle mass, total body water, bone mass, and nicotine, were entered into the analysis. All the variables were considered for PCA to identify the most important parameters for further analysis. A variable with a factor loading equal to or greater than 0.80 was considered important, while a variable whose loading fell below this threshold was deemed less important [43]. It is worth noting that before commencing the full analysis in the current study, all the data acquired were standardized through z-score transformation, whereby each variable was rescaled using its mean and standard deviation. This method is deemed suitable for the removal of bias effects between variables [40,44].
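The two preprocessing rules in this paragraph, z-score standardization and the 0.80 loading threshold, can be sketched as follows. This is a minimal illustration and not the authors' XLSTAT workflow: the function names, the input values, and the loadings are hypothetical, the sample standard deviation is an assumption (the text does not say which estimator was used), and applying the threshold to the absolute loading is also an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale a variable to mean 0 and SD 1 (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dominant_variables(loadings, threshold=0.80):
    """Keep variables whose absolute rotated loading meets the threshold."""
    return [name for name, value in loadings.items() if abs(value) >= threshold]


# Hypothetical waist circumferences (cm), not study data.
waist_cm = [78.0, 85.5, 92.0, 101.0, 88.5]
waist_z = z_scores(waist_cm)

# Hypothetical loadings on one rotated component, not the values in Table 2.
hypothetical_loadings = {"BMI": 0.93, "hip circumference": 0.88, "muscle mass": 0.41}
retained = dominant_variables(hypothetical_loadings)
print(retained)
```

Standardizing first matters because variables such as weight (kg) and WHR (a ratio) sit on very different scales, and PCA would otherwise be dominated by the variables with the largest raw variance.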

Cluster Analysis (CA)
Clustering is one of the most common exploratory data processing techniques used to study the nature and patterns of datasets. Cluster analysis has been reported to be useful in identifying subsets of samples with respect to certain observed variables [45,46].

The Application of k-Means Cluster Analysis in the Study
Cluster analysis was applied at this stage to partition the samples of smokers into groups based on similarities and differences in the PA scale scores. In the present investigation, a k-means clustering algorithm was used, with Euclidean distance as the distance metric for forming the two clusters identified, i.e., partially physically active (PPA) and physically nonactive (PNA) smokers.
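As a rough illustration of the clustering step, the sketch below runs a plain Lloyd-style k-means with k = 2 on one-dimensional PA scores, where Euclidean distance reduces to the absolute difference. It is not the implementation used in the study: the function name, the scores, and the rule that the cluster with the higher centroid is labelled PPA are all assumptions made for the example.

```python
import random


def kmeans_1d(scores, k=2, n_iter=100, seed=0):
    """Lloyd's algorithm on 1-D data using Euclidean (absolute) distance."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct observed values.
    centroids = rng.sample(sorted(set(scores)), k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each score goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(x - centroids[j])) for x in scores
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(scores, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Hypothetical summed PA scores, not study data.
ipaq_scores = [300, 450, 500, 380, 1500, 1700, 1650, 1400]
labels, centroids = kmeans_1d(ipaq_scores, k=2)

# Assumption: the cluster with the higher mean PA score is the PPA group.
active = max(range(2), key=lambda j: centroids[j])
groups = ["PPA" if lab == active else "PNA" for lab in labels]
print(groups, sorted(centroids))
```

Because k-means depends on its starting centroids, a real analysis would normally repeat the procedure from several initialisations and keep the solution with the lowest within-cluster sum of squares.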

Discriminant Analysis (DA) and Logistic Regression (LR)
We employed DA in this study to determine the differences between the two clusters extracted via the clustering previously described, i.e., PPA and PNA smokers, based on their AV. At this stage, the anthropometric and health-related markers that were shown to be dominant via the PCA were used. The anthropometric and health-related markers were considered the independent variables, whilst the smoking category, i.e., PPA or PNA, was treated as the dependent variable. Three different modes of the DA model, i.e., standard, backward stepwise, and forward stepwise, were developed to determine the best fit with the data.
Moreover, a logistic regression (LR) model was employed to further ascertain the efficacy of the discriminant models in classifying the two smoking groups. A five-fold cross-validation technique was used for developing the model [47]. The data obtained from each of the DA analyses were split into a ratio of 70:30 for training and test sets [46,48]. The Scikit-learn library was invoked for the development of the LR model via the Spyder IDE. Other statistical analyses were implemented via the XLSTAT 2014 add-in software and Orange Canvas version 3.4.0 [49] (Tržaška 25, SI-1000 Ljubljana, Slovenia) for Windows. Statistical significance was set at an alpha level of p ≤ 0.05.

Results
Table 1 depicts the characteristics of the participants. The total number of participants and the maximum, minimum, mean age and standard deviation of the participants, as well as the smoking period, are tabulated. Some of the participants started smoking in late adolescence, i.e., at 20 years of age. The overall mean age of the participants was 37 ± 9.4 years, whilst the average duration of the participants' smoking cessation histories was 16.9 ± 7.7 years. The cessation smoking history of the participants is also displayed. We observed that a total of 57 participants had attempted cessation, while 47 never attempted and 3 did not respond. It is worth highlighting that all participants were Malays recruited from different states in Malaysia. Many of the participants were urban dwellers, and the majority were from middle- to high-income families.

Figure 1 shows the scree plot of the eigenvalues for the PCA. It could be observed from the figure that the PCA yielded a total of three anthropometric components that are highly attributed to smoking behaviour. For each of these three components, some specific physical attributes are identified and considered the most affected due to their relatively higher eigenvalues (greater than 1). These identified AV and health components were retained and subsequently used as input parameters for further analysis, i.e., varimax rotation.

Table 2 presents the PCA results after the varimax rotation, where some relevant anthropometric markers are revealed. These AV are identified due to satisfying the pre-set factor-loading threshold, i.e., greater than or equal to 0.80. Likewise, it was observed that a total of 7 AV, out of the 12 initially examined, were identified in all three components as more pronounced in smokers.

Table 3 indicates the differences in the measured variables between the smoking categories. It could be observed from the table that the PNA smokers possessed higher means of all anthropometric and health markers. However, in the physical activity levels from the IPAQ measurement, the PPA smokers recorded higher mean scores, which shows that the PPAS group spent more time engaging in physical activity compared with PNAS.

Examining the Effectiveness of the DA and Logistic Regression Models in Discriminating and Classifying the PPA and PNA Smokers Based on the AV Characteristics
To examine the efficacy of the DA and LR models in discriminating as well as classifying the PPAS and PNAS, we excluded the weight and height variables in the process of developing the model because, more often than not, these variables are shown to be highly correlated [50]. Table 4 details the classification accuracies, the discriminating anthropometric variables, and the confusion matrices of the DA models. It could be observed from the table that a total of 94.39 per cent classification accuracy was obtained from the standard model of the DA, with 1 misclassification in the PNAS and 5 misclassifications attributed to the PPAS. From the standard mode analysis of the DA, four AV consisting of BMI, waist circumference, fat percentage, and hip circumference significantly distinguished the two groups. Conversely, the backward stepwise model demonstrated a total classification accuracy of 95.33 per cent, with 4 misclassifications in the PPAS cluster and 1 misclassification in PNAS. A total of three AV were observed as significantly differentiating the two clusters in the backward mode of the DA, namely, waist circumference, fat percentage, and hip circumference, whilst only two variables (fat percentage and hip circumference) were found to discriminate said clusters in the forward stepwise mode, with a total classification accuracy of 94.39 per cent and 4 misclassifications from the PPA smokers as well as 2 from PNAS. To determine the best model, the AV for each DA model were used to develop the LR model as shown in Table 4.

Figure 3 displays the confusion matrix for the model. It could be observed from the confusion matrix that the standard and backward DA feed-forward LR demonstrated 3 and 2 misclassifications respectively, while the forward stepwise DA feed-forward LR showed only 1 misclassification. Therefore, it is evident that the forward stepwise DA feed-forward LR is the best model as it exhibits no overfitting behaviour, in contrast with the other two models.
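The DA accuracies quoted above can be reproduced from the misclassification counts and the total of 107 participants. The short sketch below only does that arithmetic; the function name is hypothetical, and the per-group counts are the ones stated in the text.

```python
def classification_accuracy(n_total: int, n_misclassified: int) -> float:
    """Overall accuracy in per cent: correctly classified cases / all cases."""
    return 100.0 * (n_total - n_misclassified) / n_total


N_PARTICIPANTS = 107  # total sample size reported in the study

# Misclassifications per DA mode as (PNAS, PPAS), taken from the text.
misclassified = {
    "standard": (1, 5),
    "backward stepwise": (1, 4),
    "forward stepwise": (2, 4),
}

accuracies = {
    mode: round(classification_accuracy(N_PARTICIPANTS, sum(errors)), 2)
    for mode, errors in misclassified.items()
}
print(accuracies)
```

Six errors out of 107 give 94.39 per cent and five errors give 95.33 per cent, which matches the figures reported for the three DA modes.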
The variables' contributions to the efficacy of the LR models developed are tabulated in Table 6. The parameters for the models' goodness of fit, which constitute values for the parameter estimate (value), beta, standard error, chi-square, and the corresponding p values, are shown. It could be observed from the table that fat percentage and hip circumference were the highest contributors toward the prediction of group membership (PNAS or PPAS) in all three models developed. Hip circumference was observed to be a greater contributor to the model, with beta values of −3.6, −3.4, and −3.6, while fat percentage had beta values of −3.1, −3.2, and −3.2 for each model, respectively. It is also evident that the best model (model c) has a comparatively lower SE as well as higher chi-square values for each variable, further demonstrating the importance of hip circumference and fat percentage in the model's accuracy.
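To make the role of the beta values more concrete, the sketch below shows how two coefficients of this size turn standardized predictor values into a predicted probability under a logistic model. Several things here are assumptions rather than facts from the study: the betas are treated as coefficients on z-scored predictors, the intercept is set to 0 because it is not reported in the text, the function name is hypothetical, and the text does not state which group is the modelled class.

```python
import math


def logistic_probability(z_values, betas, intercept=0.0):
    """P(modelled class) = 1 / (1 + exp(-(intercept + sum(beta_i * z_i))))."""
    linear = intercept + sum(b * z for b, z in zip(betas, z_values))
    return 1.0 / (1.0 + math.exp(-linear))


# Beta values quoted in the text for model c: hip circumference, fat percentage.
BETAS_MODEL_C = (-3.6, -3.2)

# Hypothetical participants described by z-scores (hip, fat percentage).
p_average = logistic_probability((0.0, 0.0), BETAS_MODEL_C)
p_one_sd_above = logistic_probability((1.0, 1.0), BETAS_MODEL_C)
p_one_sd_below = logistic_probability((-1.0, -1.0), BETAS_MODEL_C)
print(p_average, p_one_sd_above, p_one_sd_below)
```

With negative coefficients of this magnitude, a participant one standard deviation above the mean on both predictors is assigned a probability close to 0 for the modelled class, and one standard deviation below the mean a probability close to 1, which illustrates why these two variables dominate the classification.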

Discussion
What are the most dominant AV characteristics of healthy male smokers? The findings of the current investigation from the initial objective via PCA (Table 2) demonstrated that the most dominant AV characteristics of healthy male smokers could be identified in three principal components (PCs). The first PC comprised weight, body mass index (BMI), waist and hip circumference, as well as per cent body fat. The second PC reflected body height, whilst the third PC reflected nicotine level. These three principal components demonstrated that the smokers were characterized by higher body mass, high fat accumulation, and a greater level of nicotine. This finding is concordant with the results reported from previous studies where the smoking samples were found to be heavier and with a considerable accumulation of body fat, as opposed to their non-smoking counterparts [51,52]. Moreover, a previous study demonstrated that the body fat of smokers is likely to be distributed mainly across the abdomen in a somewhat central or apple-shaped pattern, which brings about adverse consequences for health [53]. It is worth noting that most of the effects of tobacco on body weight are regulated by nicotine, which induces the consumption of calories that could interfere with the weight gain processes of smokers [54].
What are the smokers' PA levels and the most dominant AV that differentiate the PA groups? The characteristics and discriminating features of PNAS and PPAS are shown in Tables 3 and 4, respectively. It was observed that weight, height, BMI, and waist and hip circumference as well as body fat per cent are higher in the PNAS compared with the PPAS. Previous investigation has established that an average smoker typically weighs less than a similarly aged non-smoker [17]. However, the current finding revealed that smokers who are physically inactive tend to be heavier compared with partially physically active smokers. It is plausible that physical inactivity's effect on smokers is mainly attributed to weight gain, obesity, and high fat accumulation, which could pose a high risk of cardiovascular disease [55]. The persistent ingestion of tobacco has been reported to negatively influence metabolism, and smokers tend to consume 350 to 575 more calories per day as opposed to non-smokers [56]. Moreover, studies have demonstrated that smokers who live a sedentary lifestyle have lower physical performance and a lack of endurance, characterized by shortness of breath, all of which could lead to the deterioration of overall health [57,58]. Therefore, it is important to note that lack of PA coupled with smoking is a harbinger of many health complications such as cardiovascular diseases and cancer that may affect the general well-being and, indeed, mortality, of an individual [34,59]. However, the adverse effects of smoking on overall health have been shown to be mitigable or reversible when an individual quits smoking. Thus, promoting smoking cessation should remain a public health goal.
How effective are the DA and logistic regression models in discriminating and classifying the PPA and PNA smokers based on their AV characteristics? It is demonstrated from the best model of the current investigation, i.e., the forward stepwise DA feed-forward LR, that the major discriminating AV among the PA groups are the fat percentage and the hip circumference, as shown in Table 5. Previous studies documented that an increase in smoking coupled with a lack of PA resulted in fatness and larger waist or hip circumference after adjusting for BMI [55,60]. Regular PA has been shown to protect smokers from some adverse effects of smoking. Essentially, PA could help to prevent excessive weight gain and inflammation as well as muscle loss [61,62]. It is also evident that the best model has a comparatively lower SE, as well as higher chi-square values for hip circumference and fat percentage, further demonstrating the importance of these two variables in the model accuracy, as reflected in Table 6. These variables, i.e., hip circumference and fat percentage, are found to be the highest contributors to the overall model accuracy in addition to being indicators for the prediction of PA group membership.

Conclusions
The major findings of our study revealed that partially physically active smokers are characterized by relatively desirable anthropometric variables, in contrast with physically non-active smokers. It is evident from the current findings that certain anthropometric variables are vital for maintaining good health and that physical activity is essential for maintaining a healthy body in a sample of smokers. Furthermore, physically non-active smokers are more inclined to develop cardiovascular problems, coupled with nicotine dependence, when compared with partially physically active smokers. Finally, it is worth highlighting that the application of multivariate analysis and machine learning may be useful in studying the underlying associations between the investigated variables.

Practical Application and Future Direction
To diminish the prevalence of smoking, governments could raise the cost of smoking through tax increases, sustain social advertising efforts through which health educators regularly encourage smokers to stop, and offer both pharmacological and humanitarian support for stopping smoking. The promotion of exercise as a daily activity could be an important mechanism through which the effects of smoking on health could be averted or controlled. Governments should embark on the creation of modern leisure and PA facilities in parks, streets, marketplaces, and institutions, as well as corporate buildings, to motivate and promote active lifestyles amongst people. From the public health and clinical perspectives, exercise as well as a variety of PA could be prescribed as part of the cessation mechanism, since exercise could assist in reducing cravings, managing other withdrawal symptoms, and diminishing stress.

Limitations of the Study
The current study is subject to certain limitations. For instance, the dietary intake of the participants was not assessed, which could provide more precise characteristics of the participants with respect to their PA levels. The findings of the current investigation also could not be generalized to female smokers. The lack of the use of an actual PA test may have hindered the accurate estimation of the PA levels among the participants since over- or under-reporting of the PA levels by the participants could not be completely ruled out. Moreover, the use of physical assessment batteries, as well as other motor-related functional tests, could be explored to determine the physical fitness levels of smokers in future studies.

Informed Consent Statement:
Informed consent was obtained from all the participants before the commencement of the study.

Data Availability Statement:
The data used is available within the manuscript.