Abstract
Cigarettes are the second largest contributor to the food poverty line. Several indicators suggest that poor people consume cigarettes despite having a minimal income. In this study, we investigate the factors that influence poor people to be active smokers. Since cigarette consumption is count data with excess zeros, the data are overdispersed, so a standard Poisson regression is not appropriate. In addition, the factors entering the model need to be selected while the model is being estimated. Therefore, we propose to use a zero-inflated negative binomial (ZINB) regression with a minimax concave penalty (MCP) to determine the dominant factors influencing cigarette consumption in poor households. The data used in this study were microdata from the National Socioeconomic Survey (SUSENAS) conducted in March 2019 in East Java Province, Indonesia. The results show that poor households with a male head of household, having no education, working in the informal sector, having many adult household members, and receiving social assistance tend to consume more cigarettes than others. Additionally, cigarette consumption decreases with the increasing age of the head of household.
Keywords: penalized; zero inflated; negative binomial; minimax concave penalty; variable selection; cigarette consumption; poor household
MSC: 62J07
1. Introduction
Indonesia has committed to poverty alleviation in the agenda for Sustainable Development Goals (SDGs) and the National Medium-Term Development Plan 2020–2024. It is expected to be able to reduce the national and regional poverty rates. However, from 2010 to 2020, the poverty rate in one of the provinces in Indonesia, East Java Province, was consistently higher than the national poverty rate. The poverty rate in East Java is still at the double-digit level compared to the national poverty rate, which has reached a single-digit level since 2018. This indicates a need for targeted poverty alleviation programs in East Java.
The Indonesia National Socioeconomic Survey conducted in 2019 indicated that cigarettes are the second largest contributor to the poverty line after rice. The contribution is around 11.82 percent in urban areas and 10.74 percent in rural areas. This suggests that poor households in East Java treat cigarettes almost as a basic need. Cigarette consumption in low-income families is related to their smoking behavior, which is measured by the smoking intensity, that is, the number of cigarettes consumed.
The number of cigarettes consumed is a type of count data (non-negative integer values). Poisson and negative binomial regressions are commonly used to build predictive models on count data []. However, a high frequency of zero values can make the regression model less precise in explaining the data. Count data with excess zeros often have a variance well above the mean, that is, overdispersion.
The zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models can be applied to accommodate the excess zero observations. However, the count component of the ZIP model assumes that the variance equals the mean, which is called equidispersion. Real data rarely satisfy this assumption. Meanwhile, the ZINB model has no such assumption []. The ZINB model is therefore more suitable for accommodating the overdispersion that the ZIP model cannot handle [].
Various demographic and socioeconomic factors may determine cigarette consumption in poor households []. The inclusion of many predictor variables in the regression model may lead to multicollinearity. Therefore, a parsimonious model is required for better interpretation and more accurate prediction []. However, stepwise or subset selection methods have weaknesses, including inconsistency of the model selection algorithm and multiple hypothesis testing problems []. Therefore, we implement a method that can select variables and estimate coefficients simultaneously, as proposed by []. Hence, this study aims to estimate the model parameters and select the variables to determine the dominant factors influencing cigarette consumption of poor households in East Java, Indonesia, by applying the penalized zero-inflated negative binomial (ZINB) regression model.
Binomial models are extensively used, and numerous techniques have been suggested for estimation, prediction, and various other applications. In the case of estimation and variable selection within the normal linear model, the field of sparse estimation encompasses various methods such as the least absolute shrinkage and selection operator (LASSO) [], smoothly clipped absolute deviation (SCAD) [], and minimax concave penalty (MCP) [].
2. Materials and Methods
2.1. Data
The data were taken from the microdata of the National Socioeconomic Survey (SUSENAS) conducted in March 2019 in East Java Province, Indonesia. The response variable is the weekly average cigarette consumption of poor households during the past month. The predictors are related to household characteristics. For example, a discussion on the poverty data was delivered by []. The detailed predictors are presented in Table 1.

Table 1. Predictor Variables.
2.2. Zero-Inflated Negative Binomial (ZINB)
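The zero-inflated negative binomial (ZINB) regression is a model that accommodates the overdispersion problem caused by excess zeros in the response variable. In the ZINB regression model, the response variable $Y_i$ has the following probability mass function [,]:

$$P(Y_i = y_i) = \begin{cases} p_i + (1 - p_i)\left(\dfrac{\theta}{\mu_i + \theta}\right)^{\theta}, & y_i = 0, \\[2ex] (1 - p_i)\,\dfrac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)}\left(\dfrac{\mu_i}{\mu_i + \theta}\right)^{y_i}\left(\dfrac{\theta}{\mu_i + \theta}\right)^{\theta}, & y_i > 0, \end{cases} \tag{1}$$

where $p_i$ is the success probability (the probability that observation $i$ belongs to the zero state), $\mu_i$ is the conditional expectation of the response in the NB state, $\mu_i > 0$, and $\theta$ is the dispersion parameter with $\theta > 0$. Assume $\mu_i$ and $p_i$ are parameters that depend on the predictor vectors $x_i$ and $v_i$, respectively. The number of predictors in $x_i$ is $q_1$, while the number of predictors in $v_i$ is $q_2$; each vector also carries a leading 1 for the intercept. Therefore, the equation of the link function is as follows:

$$\log(\mu_i) = x_i^{\top}\beta, \qquad \log\left(\frac{p_i}{1 - p_i}\right) = v_i^{\top}\zeta, \tag{2}$$

where $\beta = (\beta_0, \beta_1, \ldots, \beta_{q_1})^{\top}$ and $\zeta = (\zeta_0, \zeta_1, \ldots, \zeta_{q_2})^{\top}$ are the regression coefficients. For $n$ independent random samples, let $\phi = (\beta^{\top}, \zeta^{\top}, \theta)^{\top}$; the log-likelihood function $\ell(\phi)$ is given by:

$$\ell(\phi) = \sum_{y_i = 0} \log\left[p_i + (1 - p_i)\left(\frac{\theta}{\mu_i + \theta}\right)^{\theta}\right] + \sum_{y_i > 0} \log\left[(1 - p_i)\,\frac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)}\left(\frac{\mu_i}{\mu_i + \theta}\right)^{y_i}\left(\frac{\theta}{\mu_i + \theta}\right)^{\theta}\right]. \tag{3}$$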
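The ZINB mass function and its log-likelihood can be written in a few lines of base R. The sketch below is illustrative only: the function names `dzinb` and `zinb_loglik` and all parameter values are assumptions introduced here, not part of the original analysis. It relies on `dnbinom(size = theta, mu = mu)`, which uses the same mean and dispersion parametrization as the NB part of the ZINB model, and it checks numerically that the probabilities sum to one and that the mean equals $(1 - p)\mu$.

```r
# ZINB probability mass function: a point mass at zero mixed with NB(mu, theta).
# p = zero-state probability, mu = NB mean, theta = NB dispersion (size).
dzinb <- function(y, mu, theta, p) {
  nb <- dnbinom(y, size = theta, mu = mu)  # NB pmf in the (mu, theta) form
  ifelse(y == 0, p + (1 - p) * nb, (1 - p) * nb)
}

# Log-likelihood with a log link for mu and a logit link for p.
# X and V are design matrices whose first column is 1 (intercept).
zinb_loglik <- function(y, X, V, beta, zeta, theta) {
  mu <- exp(drop(X %*% beta))
  p  <- plogis(drop(V %*% zeta))
  sum(log(dzinb(y, mu, theta, p)))
}

# Illustrative values only (not estimates from the SUSENAS data).
mu <- 4; theta <- 1.5; p <- 0.4
support <- 0:2000
probs <- dzinb(support, mu, theta, p)
total_mass <- sum(probs)            # numerically 1
zinb_mean  <- sum(support * probs)  # numerically (1 - p) * mu
```

With these values the zero probability is inflated from the NB value `dnbinom(0, size = 1.5, mu = 4)` to `dzinb(0, 4, 1.5, 0.4)`, which is the mechanism the model uses to absorb the excess zeros.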
2.3. Penalized Zero-Inflated Negative Binomial (ZINB) Regression
Variable selection is the key to producing a parsimonious model to facilitate model interpretation []. The modern variable selection methods were developed to overcome the weaknesses of the traditional methods by adding a penalty function to the likelihood. The selection of variables using the penalized ZINB regression model is as follows []:

$$p\ell(\phi) = \ell(\phi) - p(\beta, \zeta), \tag{4}$$

where $\ell(\phi)$ is the log-likelihood of the ZINB model with parameters $\phi = (\beta^{\top}, \zeta^{\top}, \theta)^{\top}$ and $p(\beta, \zeta)$ is the non-negative penalty function given by:

$$p(\beta, \zeta) = n \sum_{j=1}^{q_1} p(\lambda_{NB}; |\beta_j|) + n \sum_{k=1}^{q_2} p(\lambda_{BI}; |\zeta_k|), \tag{5}$$

where $q_1$ and $q_2$ are the number of parameters for the NB and zero components, respectively. $\beta_j$ represents the NB component parameter coefficients and $\zeta_k$ represents the zero component parameter coefficients. Tuning parameters $\lambda_{NB}$ and $\lambda_{BI}$ are determined by data-driven methods. The intercepts and the parameter $\theta$ are always included in the model; hence, they are not penalized.
This study used three penalty functions, as follows:
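- Least absolute shrinkage and selection operator (LASSO). The penalty function is given by []:

$$p(\lambda; |\beta|) = \lambda |\beta|. \tag{6}$$

- Smoothly clipped absolute deviation (SCAD). The first derivative of the SCAD penalty function on [0, ∞) is given by []:

$$p'(\lambda; \beta) = \lambda \left\{ I(\beta \le \lambda) + \frac{(a\lambda - \beta)_{+}}{(a - 1)\lambda}\, I(\beta > \lambda) \right\} \tag{7}$$

for $a > 2$, where $I(\cdot)$ is the indicator function and $(u)_{+} = \max(u, 0)$.

- Minimax concave penalty (MCP). The first derivative of the MCP penalty function on [0, ∞) is given by []:

$$p'(\lambda; \beta) = \frac{(a\lambda - \beta)_{+}}{a}, \qquad a > 1. \tag{8}$$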
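The practical difference between the three penalties is how strongly they keep shrinking a coefficient as it grows. The following base R sketch evaluates the first derivatives on a small grid: LASSO applies the constant rate $\lambda$, SCAD applies $\lambda$ up to $\beta = \lambda$ and then decays linearly to zero at $a\lambda$, and MCP decays linearly from $\lambda$ to zero at $a\lambda$. The function names are hypothetical, and the defaults $a = 3.7$ (SCAD) and $a = 2.7$ (MCP) are taken from the `gamma.count` and `gamma.zero` values used in Appendix A.

```r
# First derivatives of the penalties for beta >= 0.
# lambda = tuning parameter; a = concavity parameter (a > 2 for SCAD, a > 1 for MCP).
d_lasso <- function(beta, lambda) rep(lambda, length(beta))

d_scad <- function(beta, lambda, a = 3.7) {
  lambda * ((beta <= lambda) +
    pmax(a * lambda - beta, 0) / ((a - 1) * lambda) * (beta > lambda))
}

d_mcp <- function(beta, lambda, a = 2.7) pmax(a * lambda - beta, 0) / a

# Shrinkage rate applied to coefficients of increasing size (lambda = 1).
beta_grid <- c(0, 0.5, 1, 2, 3, 5)
rates <- cbind(beta  = beta_grid,
               lasso = d_lasso(beta_grid, 1),
               scad  = d_scad(beta_grid, 1),
               mcp   = d_mcp(beta_grid, 1))
print(round(rates, 4))
```

In the printed table the LASSO column stays at 1 for every coefficient size, whereas the SCAD and MCP columns fall to 0 for large coefficients. This is why the two nonconvex penalties leave large coefficients nearly unbiased while still setting small ones to zero.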
2.4. The EM Algorithm
The expectation maximization (EM) algorithm is generally applied to mixture models []. It is often used in missing data problems. The EM algorithm has two main steps: the expectation step (E-step) and the maximization step (M-step).
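Let $z_i = 1$ if $y_i$ is a zero-value observation from the zero state and $z_i = 0$ if $y_i$ is from a negative binomial (NB) distribution. Since $z = (z_1, \ldots, z_n)^{\top}$ is not observable, it is often treated as missing data. If the complete data $(y, z)$ are available, the penalized log-likelihood function is given by []:

$$p\ell_c(\phi; y, z) = \sum_{i=1}^{n} \left\{ z_i v_i^{\top}\zeta - \log\left(1 + \exp(v_i^{\top}\zeta)\right) \right\} + \sum_{i=1}^{n} (1 - z_i) \log f(y_i; \beta, \theta) - p(\beta, \zeta), \tag{9}$$

where $f(y_i; \beta, \theta) = \dfrac{\Gamma(\theta + y_i)}{\Gamma(y_i + 1)\,\Gamma(\theta)}\left(\dfrac{\mu_i}{\mu_i + \theta}\right)^{y_i}\left(\dfrac{\theta}{\mu_i + \theta}\right)^{\theta}$ is the NB probability mass function with $\mu_i = \exp(x_i^{\top}\beta)$, and $p(\beta, \zeta)$ is a non-negative penalty function.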
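The EM algorithm computes the expectation of the complete data log-likelihood, which is linear in $z$. The log-likelihood of Equation (4) is maximized iteratively by alternating between estimating $z_i$ by its conditional expectation under the current estimates (E-step) and maximizing Equation (11) (M-step) with $z_i$ replaced by its value from the E-step. The iteration stops when the estimated $\phi$ converges.
The E-step estimates $z_i$ by its conditional mean $z_i^{(m)}$, given the data and assuming that the current estimates $\phi^{(m)} = (\beta^{(m)}, \zeta^{(m)}, \theta^{(m)})$ are the right model parameters. The conditional expectation of $z_i$ at iteration $m$ is given by:

$$z_i^{(m)} = \begin{cases} \left[ 1 + \exp\left(-v_i^{\top}\zeta^{(m)}\right) f\left(0; \beta^{(m)}, \theta^{(m)}\right) \right]^{-1}, & y_i = 0, \\[1ex] 0, & y_i > 0. \end{cases} \tag{10}$$

Therefore, the expectation of the complete data log-likelihood can be expressed as follows:

$$Q(\phi \mid \phi^{(m)}) = E\left[ p\ell_c(\phi; y, z) \mid y, \phi^{(m)} \right] = Q_1(\beta, \theta \mid \phi^{(m)}) + Q_2(\zeta \mid \phi^{(m)}), \tag{11}$$

where $Q_1(\beta, \theta \mid \phi^{(m)}) = \sum_{i=1}^{n} (1 - z_i^{(m)}) \log f(y_i; \beta, \theta) - n \sum_{j=1}^{q_1} p(\lambda_{NB}; |\beta_j|)$ and $Q_2(\zeta \mid \phi^{(m)}) = \sum_{i=1}^{n} \left\{ z_i^{(m)} v_i^{\top}\zeta - \log\left(1 + \exp(v_i^{\top}\zeta)\right) \right\} - n \sum_{k=1}^{q_2} p(\lambda_{BI}; |\zeta_k|)$.
The EM algorithm updates the estimates by maximizing Equation (11), which can easily be achieved because $Q_1$ and $Q_2$ are two disjoint terms. The first term, $Q_1$, is the weighted penalized NB log-likelihood function, whereas the second term, $Q_2$, is the penalized logistic log-likelihood function. The EM algorithm iterates between the E-step and the M-step until the optimum parameter is obtained.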
The EM algorithm requires parameter estimation of the two terms, which are the components of the penalized ZINB model. Meanwhile, the NB and logistic regression models are generalized linear models (GLMs). The parameter estimation of regularized GLMs used the modified iteratively reweighted least squares (IRLS) algorithm. It uses a coordinate descent algorithm to optimize the penalized weighted least squares [].
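To make the alternation between the two steps concrete, the sketch below runs EM on a deliberately simplified case: an intercept-only ZINB model with the dispersion $\theta$ held fixed and no penalty. These simplifications are assumptions made here so that both M-step updates have closed forms (the zero-state probability is the mean of the E-step weights, and the NB mean is a weighted average of the counts); the penalized regression version instead solves two penalized GLM problems by coordinate descent. The function name `em_zinb_intercept` and the data are hypothetical.

```r
# EM for an intercept-only ZINB with theta fixed and no penalty (illustration only).
em_zinb_intercept <- function(y, theta, mu = mean(y[y > 0]), p = 0.5,
                              maxit = 200, tol = 1e-10) {
  loglik <- function(mu, p) {
    nb <- dnbinom(y, size = theta, mu = mu)
    sum(log(ifelse(y == 0, p + (1 - p) * nb, (1 - p) * nb)))
  }
  ll_path <- loglik(mu, p)
  for (m in seq_len(maxit)) {
    # E-step: probability that each observed zero came from the zero state
    f0 <- dnbinom(0, size = theta, mu = mu)
    z  <- ifelse(y == 0, p / (p + (1 - p) * f0), 0)
    # M-step: closed-form updates in this simplified setting
    p  <- mean(z)                        # intercept-only logistic part
    mu <- sum((1 - z) * y) / sum(1 - z)  # weighted NB mean with theta fixed
    ll_path <- c(ll_path, loglik(mu, p))
    k <- length(ll_path)
    if (abs(ll_path[k] - ll_path[k - 1]) < tol) break
  }
  list(mu = mu, p = p, loglik = ll_path)
}

# Small made-up sample with many zeros (not the survey data).
y_demo <- c(rep(0, 12), 1, 2, 2, 3, 5, 7, 8, 12, 14, 20)
fit_demo <- em_zinb_intercept(y_demo, theta = 2)
```

The element `fit_demo$loglik` records the observed-data log-likelihood after every iteration; it never decreases, which is the defining property of EM and a convenient check when implementing the penalized version.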
2.5. Tuning Parameter Selection
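The tuning parameter can be determined using the Bayesian information criterion (BIC) as follows []:

$$BIC = -2\,\ell(\hat{\phi}; \lambda_{NB}, \lambda_{BI}) + \log(n)\, df, \tag{12}$$

where $\hat{\phi}$ is the estimated parameter, $\lambda_{NB}$ is the tuning parameter of the NB component, $\lambda_{BI}$ is the tuning parameter of the zero component, and $\ell(\cdot)$ is the log-likelihood function. The degrees of freedom are given by $df = \sum_{j} I(\hat{\beta}_j \neq 0) + \sum_{k} I(\hat{\zeta}_k \neq 0) + 1$, including the degree of freedom for the scale parameter θ. The first step is to construct a solution path based on the paired shrinkage parameters. The algorithm generates two decreasing sequences of parameters, $\lambda_{NB}^{(1)} > \lambda_{NB}^{(2)} > \cdots > \lambda_{NB}^{(M)}$ and $\lambda_{BI}^{(1)} > \lambda_{BI}^{(2)} > \cdots > \lambda_{BI}^{(M)}$. Subsequently, the sequences are paired as $(\lambda_{NB}^{(m)}, \lambda_{BI}^{(m)})$. In principle, the largest pair $(\lambda_{NB}^{(1)}, \lambda_{BI}^{(1)})$ can be chosen in such a way that all the coefficients are zero, except the intercepts. The tuning parameter can be chosen based on the smallest BIC value [].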
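The selection rule amounts to counting the nonzero coefficients at each point of the solution path and picking the pair of tuning parameters with the smallest BIC. The sketch below illustrates this with made-up numbers. The helper names `bic_zinb` and `select_by_bic` are hypothetical, and counting the intercepts among the nonzero coefficients is an assumption made for this illustration.

```r
# BIC for a fitted penalized ZINB model.
# loglik = maximized log-likelihood; beta, zeta = estimated coefficient vectors
# (intercepts included here by assumption); n = sample size.
bic_zinb <- function(loglik, beta, zeta, n) {
  df <- sum(beta != 0) + sum(zeta != 0) + 1  # the extra 1 is for theta
  -2 * loglik + log(n) * df
}

# Pick the point on a solution path with the smallest BIC.
# 'path' is a hypothetical list of fits, each holding loglik, beta and zeta.
select_by_bic <- function(path, n) {
  bic <- vapply(path, function(f) bic_zinb(f$loglik, f$beta, f$zeta, n),
                numeric(1))
  list(index = which.min(bic), bic = bic)
}

# Made-up path: tuning parameters decrease, so more coefficients enter.
path <- list(
  list(loglik = -1050, beta = c(2.1, 0, 0),      zeta = c(-0.3, 0, 0)),
  list(loglik = -1010, beta = c(2.0, 0.4, 0),    zeta = c(-0.2, 0.5, 0)),
  list(loglik = -1008, beta = c(2.0, 0.4, 0.05), zeta = c(-0.2, 0.5, 0.1))
)
choice <- select_by_bic(path, n = 500)
```

In this toy path the second fit is selected: the third fit raises the log-likelihood by only 2 units while adding two coefficients, and the $\log(n)$ penalty outweighs that gain.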
3. Results
This study observed 3010 poor household samples from the National Socioeconomic Survey (SUSENAS) in March 2019 in East Java. The code for all computations in this section is presented in Appendix A. Based on the data, the highest cigarette consumption in poor households was 720 cigarettes per week during the past month, whereas the smallest was 0 cigarettes (no smoking). About 41.4 percent of poor households did not smoke, and the histogram is strongly right-skewed (Figure 1). Meanwhile, the mean is around 45 cigarettes per week during the past month, with a relatively large standard deviation of 62.49. This indicates that cigarette consumption among poor households varied considerably. The variance (62.49² ≈ 3905) is far greater than the mean, which indicates overdispersion.

Figure 1. Histogram of Cigarette Consumption of Poor Households in East Java 2019.
Figure 2 demonstrates that most poor households who smoked lived in rural areas. It is in line with a related study that cigarette consumption is higher among low-income families living in rural areas than in urban areas []. The poor households who smoked had a male head of household and were married. The head of the poor household who smoked generally had low education. Based on the previous study, families with a head of household who did not attend school tended to consume cigarettes more than those who attended school []. Smoking also appeared to be more prevalent in those employed than those unemployed. Cigarette consumption of poor households who worked in informal sectors is higher than those who worked in formal sectors. The previous study showed that families with a head of household who worked in the informal sector tended to smoke more than those who did not [].

Figure 2. The Characteristics of Poor Households Who Smoke in East Java 2019.
Based on Figure 2, most poor households who smoked received social assistance. Moreover, poor households with children under five had a lower smoking percentage. In addition, more than 50% of poor households who smoked lived in their own house.
Overdispersion was checked by comparing the variance and the mean: if the variance is greater than the mean, overdispersion occurs. It can also be assessed with the dispersion statistic, namely the Pearson chi-square statistic divided by its degrees of freedom []. The Pearson chi-square statistic is 170,923 with 2989 degrees of freedom. Hence, the dispersion statistic is 57.1841, which is far above 1 and indicates overdispersion in the response variable.
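Both overdispersion checks are simple ratios and can be reproduced from the figures reported in this section (a mean of about 45, a standard deviation of 62.49, and a Pearson chi-square statistic of 170,923 on 2989 degrees of freedom). The short sketch below does only this arithmetic; the variable names are chosen here for illustration.

```r
# Figures reported in the text
mean_y        <- 45       # approximate mean, cigarettes per week
sd_y          <- 62.49    # standard deviation
pearson_chisq <- 170923   # Pearson chi-square statistic (rounded, as reported)
df_resid      <- 2989     # degrees of freedom

var_to_mean <- sd_y^2 / mean_y           # a ratio above 1 suggests overdispersion
dispersion  <- pearson_chisq / df_resid  # Pearson chi-square over degrees of freedom
print(round(c(var_to_mean = var_to_mean, dispersion = dispersion), 2))
```

Both ratios are far above 1. Because the mean is only reported approximately and the chi-square statistic is rounded, the computed values can differ slightly in the last digits from those obtained on the raw data.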
The excess zero testing in the count data was done using the score test []. The result indicated that the p-value is smaller than the 5 percent significance level. This means that the response variable has excess zeros, so a zero-inflated regression model is more appropriate. ZINB regression with Backward Elimination (BE) for 14 predictor variables was employed to model the cigarette consumption of poor households in East Java. The likelihood ratio test was used to test the parameters simultaneously, whereas the Wald test was employed to test the parameters partially []. The result indicated that the likelihood ratio statistic is 877.1114, greater than the critical value $\chi^2_{(0.05;\,40)}$, which is 55.76. In other words, at least one predictor variable influenced the response variable.
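The simultaneous test compares the likelihood ratio statistic with the upper 5 percent point of a chi-square distribution. A minimal sketch of that comparison is given below, using the reported statistic and the 40 degrees of freedom that appear in the `qchisq(0.95, 40)` call of Appendix A; the variable names are illustrative.

```r
lr_stat  <- 877.1114             # likelihood ratio statistic from the text
df_lr    <- 40                   # degrees of freedom used in Appendix A
critical <- qchisq(0.95, df_lr)  # upper 5 percent critical value
p_value  <- pchisq(lr_stat, df_lr, lower.tail = FALSE)

reject_null <- lr_stat > critical  # TRUE means at least one predictor matters
print(round(critical, 2))
```

Since the statistic lies far beyond the critical value, the null hypothesis that all slope coefficients are zero is rejected at the 5 percent level.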
In the early steps of modeling the penalized ZINB regression, the tuning parameter was selected based on the smallest BIC value. In each penalty function, two tuning parameters were to be determined: the tuning parameter for the negative binomial component ($\lambda_{NB}$) and the tuning parameter for the zero component ($\lambda_{BI}$). In addition, there was an additional tuning parameter $a$ in the SCAD and MCP penalty functions [,,].
Based on Table 2, the ZINB-LASSO model selected fewer predictor variables than the other penalized ZINB models. Because of the characteristics of the LASSO penalty function, only one variable tends to be chosen among predictor variables that are highly correlated []. The predictor variables selected in the zero component of the ZINB-SCAD and ZINB-MCP models were generally the same. Meanwhile, the ZINB-MCP model selected the fewest predictors in the NB component.

Table 2. ZINB Model on Poor Household Cigarette Consumption.
As seen in Table 3, the ZINB-MCP model has a smaller BIC value than the other four regression models. Moreover, the penalized ZINB-MCP model has relatively fewer parameter coefficients than the other models, indicating that it is parsimonious. Thus, the ZINB-MCP model is better suited to the cigarette consumption data of poor households in East Java in 2019. Based on the results of parameter estimation in Table 2, the ZINB-MCP model equation is as follows:

Table 3. Bayesian Information Criterion (BIC) on ZINB Model.
- ZINB-MCP model in the negative binomial (NB) component
- ZINB-MCP model in the zero component
The zero component model shows that the gender of the head of the household had a significant effect on cigarette consumption in poor households. Poor households with a male head of household tended to consume more cigarettes than those with a female head of household. The Indonesian Demographic and Health Survey (IDHS) in 2012 showed that most of the heads of households in Indonesia were male smokers []. Meanwhile, the NB and zero component models showed that the age variable significantly influenced cigarette consumption in poor households. Based on a previous study, cigarette consumption decreases as the age of the head of household increases []. It indicates that most of the young heads of households smoked cigarettes. Therefore, controlling tobacco or cigarette consumption should be targeted at all ages.
The education variable of the head of household significantly influenced cigarette consumption in poor households according to the NB and zero component models. Low-income families whose head of household did not attend school tended to consume more cigarettes than poor households whose head of household had higher education. This result aligns with studies in 48 low- and middle-income countries, where those who did not attend school generally smoked three times more than those who were educated []. An educated person tends to have greater awareness of the health risks of smoking []. Therefore, education is the primary key to reducing cigarette consumption in poor households.
The zero component model found that the working status of the head of household significantly influenced cigarette consumption in poor households. Poor households with a head of household who worked in the informal sector tended to consume more cigarettes than those who did not, which is in line with the previous study []. Therefore, various efforts are needed to increase awareness of the dangers of smoking, for instance by disseminating information through media that are easily accessible to households. Provincial and regency/city governments also need to enforce regulations on non-smoking areas.
The NB and zero component models showed that the number of adult members variable significantly influenced cigarette consumption in poor households. In line with the previous study, adding one adult member would increase cigarette consumption in poor households. It is reasonable that the household member who smokes is an adult.
Furthermore, the zero component model found that the social assistance variable significantly influenced cigarette consumption in poor households. Poor households that received social assistance were more likely to consume cigarettes than those that did not. In line with the previous study, the average number of cigarettes consumed by the heads of poor households increases along with the amount of social assistance received []. This phenomenon relates to the household’s behavior (moral hazard) in allocating excess income: the additional income from social assistance benefits is likely not spent on basic needs. Another study showed that middle- or low-income households often divert food expenditure to cigarette consumption []. Therefore, the government should encourage low-income families to use social assistance benefits for food, education, health, and other basic needs rather than cigarettes, so that they can improve their standard of living and get out of poverty.
4. Discussion
Cigarette consumption behavior was first modeled using Zero-Inflated Negative Binomial (ZINB) regression with Backward Elimination (BE). Because there are many potential covariates, this classical technique needs to be combined with a variable selection procedure. Hence, we penalized the ZINB regression using three penalty functions (LASSO, SCAD, and MCP). Based on the smallest BIC and RMSE values, the penalized ZINB-MCP regression performs better than the others. We also find that, out of 14 predictor variables (10 dummy), six predictors (two dummy) are selected: gender of the head of household, age of the head of household, education of the head of household in the non-school category, working status of the head of household in the informal sector category, number of adult household members, and social assistance. Some variables have a negative effect; for example, cigarette consumption decreases as the age of the head of household increases.
5. Conclusions
According to the results, cigarette consumption of poor households was modeled, and its predictors selected, using a classical method (ZINB-BE) and modern methods, namely penalized ZINB regression with three penalty functions (LASSO, SCAD, and MCP). The best model, chosen by the smallest BIC value, was the ZINB-MCP model. This study indicated that poor households with a male head of household who was young, had no education, worked in the informal sector, had many adult household members, and received social assistance tended to consume more cigarettes. Therefore, the awareness of poor households about reducing cigarette consumption should be supported by improving their education and knowledge.
Author Contributions
Conceptualization, Y.A. and R.F.; methodology, Y.A.; software, R.F.; validation, Y.A., B.T., N.S. and K.W.; formal analysis, Y.A., B.T., N.S. and K.W.; investigation, Y.A.; resources, R.F.; data curation, R.F.; writing—original draft preparation, Y.A. and R.F.; writing—review and editing, Y.A. and A.N.F.; visualization, R.F.; supervision, Y.A., B.T., N.S., I.G.N.M.J. and K.W.; project administration, Y.A. and A.N.F.; funding acquisition, Y.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Universitas Padjadjaran via RPLK scheme with the contract No: 1549/UN6.3.1/PT.00/2023.
Data Availability Statement
Not applicable.
Acknowledgments
The authors gratefully thank Universitas Padjadjaran for supporting this research, which was funded by the RPLK scheme under contract No: 1549/UN6.3.1/PT.00/2023. We also thank the reviewers for their valuable comments on this paper.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
### RCODES ###
rm(list=ls())
library("mpath")
library("zic")
library("pscl")
library(vcdExtra)
data=read.table(file.choose(),header=T,sep=",")
y <- data$ciga
x1<- data$Residence
x2<- data$Gender
x3<- data$Single
x4<- data$Divorce
x5<- data$Age
x6<- data$Secondary
x7<- data$Primary
x8<- data$NoEduc
x9<- data$Formal
x10<- data$Informal
x11<- data$Adultmembers
x12<- data$HouseholdInternet
x13<- data$HouseholdWork
x14<- data$SocialAssistance
x15<- data$ToddlerExistence
x16<- data$Rent
x17<- data$FreeRent
x18<- data$Other
x19<- data$HealthExpenditure
x20<- data$EducationExpenditure
#--------------Histogram for the response variable----------------
h<-hist(y,main="Histogram of Cigarette Consumption",
  ylab="Frequency",xlab="Cigarettes (sticks)",
  xlim=c(0,600),ylim=c(0,1500),breaks=50,col="blue", freq=T)
#----------Goodness of fit of the Response Variable-----------
# Poisson with KS test
ks.test(y,"ppois",lambda=mean(y))
# Poisson Dispersion Test/Variance Test
TCC <-((length(y)-1)*var(y))/mean(y)
qchisq(0.05,length(y)-1)
#------------Overdispersion----------------------
#Model: POISSON
library(MASS)
Pois <-glm(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+
  x16+x17+x18+x19+x20, family=poisson)
summary(Pois)
#------------------Zero Excess Test----------------
zero.test(y)
##----------------Estimation Model Using ZINB-----------
dat<-cbind(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,
  x16,x17,x18,x19,x20)
dat<-as.data.frame(dat)
m1<-zeroinfl(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+
  x16+x17+x18+x19+x20,dist="negbin", data=dat)
summary(m1)
cat("loglik of zero-inflated model", logLik(m1))
cat("BIC of zero-inflated model", AIC(m1, k=log(dim(dat)[1])))
cat("AIC of zero-inflated model", AIC(m1))
res.zinb=m1$residual
rmse.zinb=sqrt(mean((res.zinb)^2))
rmse.zinb
#----Likelihood ratio for simultaneous test----
m2<-zeroinfl(y~1,dist="negbin", data=dat)
summary(m2)
l0<- logLik(m2)
lp<- logLik(m1)
G<- -2*(l0-lp)
G
qchisq(0.95,40)
##------------Estimation Model Using ZINB BE(0,05)------
fitbe<-be.zeroinfl(m1,data=dat, dist="negbin", alpha=0.05,
  trace=FALSE)
summary(fitbe)
cat("loglik of zero-inflated model with backward selection",
  logLik(fitbe))
cat("BIC of zero-inflated model with backward selection",
  AIC(fitbe, k=log(dim(dat)[1])))
minBic <- which.min(BIC(fitbe))
AIC(fitbe)[minBic]
BIC(fitbe)[minBic]
res.be=fitbe$residual
rmse.be=sqrt(mean((res.be)^2))
rmse.be
##---------Estimation Model Using Penalized ZINB-LASSO--------
fit.lasso<-zipath(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+
  x14+x15+x16+x17+x18+x19+x20,data=dat,family="negbin",
  nlambda=100, lambda.zero.min.ratio=0.001, maxit.em=300,
  maxit.theta=25, theta.fixed=FALSE, trace=FALSE,
  penalty="enet", rescale=FALSE)
minBic <- which.min(BIC(fit.lasso))
coef(fit.lasso, minBic)
cat("theta estimate", fit.lasso$theta[minBic])
se(fit.lasso, minBic, log=FALSE)
AIC(fit.lasso)[minBic]
BIC(fit.lasso)[minBic]
logLik(fit.lasso)[minBic]
#plot BIC lasso with tuning parameter indexes
BIC.Lasso<-BIC(fit.lasso)
plot(BIC.Lasso)
res.so=fit.lasso$residual[1:3010,22]
rmse.so=sqrt(mean((res.so)^2))
rmse.so
##-------Estimation Model Using Penalized ZINB-SCAD-----
tune.scad<-tuning.zipath(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+
  x12+x13+x14+x15+x16+x17+x18+x19+x20,data=dat,
  standardize=TRUE, family = "negbin", penalty = "snet",
  lambdaCountRatio = .0001, lambdaZeroRatio = c(.1, .01, .001),
  maxit.theta=1, gamma.count=3.7, gamma.zero=3.7)
fit.scad <- zipath(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+
  x13+x14+x15+x16+x17+x18+x19+x20,data = dat,
  family = "negbin",lambda.count=tune.scad$lambda.count,
  lambda.zero= tune.scad$lambda.zero, maxit.em=300,
  maxit.theta=25, theta.fixed=FALSE, penalty="snet")
minBic.s <- which.min(BIC(fit.scad))
coef(fit.scad, minBic.s)
cat("theta estimate", fit.scad$theta[minBic.s])
se(fit.scad, minBic.s, log=FALSE)
AIC(fit.scad)[minBic.s]
BIC(fit.scad)[minBic.s]
logLik(fit.scad)[minBic.s]
#plot BIC scad with tuning parameter index
BIC.Scad<-BIC(fit.scad)
plot(BIC.Scad)
res.scad=fit.scad$residual[1:3010,26]
rmse.scad=sqrt(mean((res.scad)^2))
rmse.scad
##---------Estimation Model Using Penalized ZINB-MCP----------
tune<-tuning.zipath(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12
  +x13+x14+x15+x16+x17+x18+x19+x20,data=dat,standardize=TRUE,
  family = "negbin",penalty = "mnet",lambdaCountRatio = .0001,
  lambdaZeroRatio = c(.1, .01, .001), maxit.theta=1,
  gamma.count=2.7, gamma.zero=2.7)
fit.mcp<-zipath(y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+
  x15+x16+x17+x18+x19+x20,data=dat,family = "negbin",
  gamma.count=2.7, gamma.zero=2.7,
  lambda.count=tune$lambda.count,
  lambda.zero= tune$lambda.zero,maxit.em=300, maxit.theta=1,
  theta.fixed=FALSE, penalty="mnet")
minBic <- which.min(BIC(fit.mcp))
coef(fit.mcp, minBic)
cat("theta estimate", fit.mcp$theta[minBic])
se(fit.mcp, minBic, log=FALSE)
AIC(fit.mcp)[minBic]
BIC(fit.mcp)[minBic]
logLik(fit.mcp)[minBic]
#plot BIC mcp with tuning parameter
BIC.mcp<-BIC(fit.mcp)
plot(BIC.mcp)
res.mcp=fit.mcp$residual[1:3010,21]
rmse.mcp=sqrt(mean((res.mcp)^2))
rmse.mcp
##---------Residual Checking using ZINB-MCP-------
#Normality
#Histogram Residual
res.mcp=fit.mcp$residual[1:3010,21]
hist(res.mcp, freq = FALSE)
curve(dnorm, add = TRUE)
#Normal Probability Plot of the residual
probDist <- pnorm(res.mcp)
plot(ppoints(length(res.mcp)), sort(probDist), main = "PP Plot",
  xlab = "Observed Probability", ylab = "Expected Probability")
abline(0,1, col="red")
#Plot between Residual v.s. Fitted value
pearson.res=resid(fit.mcp, type='pearson')[1:3010,21]
miu.hat=predict(fit.mcp,type='response')[1:3010,21]
plot(miu.hat,pearson.res, main="ZINB-MCP Regression",
  ylab="Residuals", xlab="Predicted", col="blue")
abline(h=0,lty=1,col="red")
lines(lowess(miu.hat,pearson.res),lwd=2, lty=2)
#Residual independence
lag.plot(res.mcp)
References
- Said, A. Indonesian Sustainable Development Goals (SDGs) Indicators, BPS RI/BPS-Statistics Indonesia; Indonesian Statistical Bureau: Jakarta, Indonesia, 2019; p. 11.
- Kang, K.I.; Kang, K.; Kim, C. Risk factors influencing cyberbullying perpetration among middle school students in Korea: Analysis using the zero-inflated negative binomial regression model. Int. J. Environ. Res. Public Health 2021, 18, 2224.
- Komasari, D.; Helmi, A.F. Faktor-faktor penyebab perilaku merokok pada remaja. J. Psikol. 2000, 27, 37–47.
- Wang, Z.; Ma, S.; Wang, C. Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany. Biom. J. 2015, 57, 867–884.
- Hilbe, J.M. Negative Binomial Regression; Cambridge University Press: Cambridge, UK, 2011.
- Hosseinpoor, A.R.; Parker, L.A.; Tursan d’Espaignet, E.; Chatterji, S. Social determinants of smoking in low-and middle-income countries: Results from the World Health Survey. PLoS ONE 2011, 6, e20331.
- Tibshirani, R. Regression shrinkage and selection via the lasso: A retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 267–288.
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
- Zhang, C.-H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
- Park, S.; Yang, A.; Ha, H.J.; Lee, J. Measuring the Differentiated Impact of New Low-Income Housing Tax Credit (LIHTC) Projects on Households’ Movements by Income Level within Urban Areas. Urban Sci. 2021, 5, 79.
- Wang, Z.; Ma, S.; Zappitelli, M.; Parikh, C.; Wang, C.-Y.; Devarajan, P. Penalized count data regression with application to hospital stay after pediatric cardiac surgery. Stat. Methods Med. Res. 2016, 25, 2685–2703.
- Wang, Z.; Ma, S.; Wang, C.; Zappitelli, M.; Devarajan, P.; Parikh, C. EM for regularized zero-inflated regression models with applications to postoperative morbidity after cardiac surgery in children. Stat. Med. 2014, 33, 5192–5208.
- Breheny, P.; Huang, J. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 2011, 5, 232.
- Hu, T.W.; Mao, Z.; Liu, Y.; de Beyer, J.; Ong, M. Smoking, standard of living, and poverty in China. Tob. Control 2005, 14, 247–250.
- Siahpush, M. Socioeconomic status and tobacco expenditure among Australian households: Results from the 1998–99 Household Expenditure Survey. J. Epidemiol. Community Health 2003, 57, 798–801.
- Herawati, P.; Afriandi, I.; Wahyudi, K. Determinan Paparan Asap Rokok di Dalam Rumah: Analisis Data Survei Demografi dan Kesehatan Indonesia (SDKI) 2012. Bul. Penelit. Kesehat. 2019, 47, 245–252.
- Van den Broek, J. A score test for zero inflation in a Poisson distribution. Biometrics 1995, 51, 738.
- Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data; Cambridge University Press: Cambridge, UK, 2013; Volume 53.
- Hirose, Y. Regularization methods based on the Lq-likelihood for linear models with heavy-tailed errors. Entropy 2020, 22, 1036.
- Patil, A.R.; Kim, S. Combination of ensembles of regularized regression models with resampling-based lasso feature selection in high dimensional data. Mathematics 2020, 8, 110.
- Liu, X.; Zhao, B.; He, W. Simultaneous feature selection and classification for data-adaptive Kernel-Penalized SVM. Mathematics 2020, 8, 1846.
- Algamal, Z.Y.; Lee, M.H. Penalized logistic regression with the adaptive LASSO for gene selection in high-dimensional cancer classification. Expert Syst. Appl. 2015, 42, 9326–9332.
- Pampel, F. Tobacco use in sub-Sahara Africa: Estimates from the demographic health surveys. Soc. Sci. Med. 2008, 66, 1772–1783.
- Cendekia, D.G. Keterkaitan Transfer Pemerintah Untuk Perlindungan Sosial Terhadap Perilaku Merokok Pada Rumah Tangga Miskin Di Indonesia (The Influence of Government Transfers for Social Protection on Smoking Behaviour Among Poor Households in Indonesia). J. Kependud. Indones. 2018, 13, 133–142.
- John, R.M.; Ross, H.; Blecher, E. Tobacco expenditure and its implications for household resource allocation in Cambodia. Tob. Control 2012, 21, 341–346.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).