1. Introduction
Smoking is considered one of the leading causes of death worldwide. According to a recent NHS report [1], smoking caused the deaths of approximately 7900 people in England alone in 2016. The report further states that smoking harms not only smokers: many diseases may be caused by exposure to passive smoking, and children are particularly vulnerable to its effects. This makes reducing cigarette smoking a significant public health priority. To support efficient and timely delivery of interventions for those wishing to quit smoking, it is important to be able to model the smoker's behaviour. Such a model needs to account for both the endogenous stressors (e.g., nicotine effect, craving) and the exogenous stressors (e.g., timing, location, type of activity) that trigger smoking events [2].
With advances in technology, new possibilities have emerged for creating efficient cessation programs, particularly through mobile apps. This technology has several advantages over traditional therapies: it can reach people wherever they are; it can enhance their experience by opening new channels between the therapist and the smoker; and it offers access to databases that can provide individual feedback on the smoker's current status [3]. Several methods have been used to deliver interventions through mobile apps, for example text messages sent at regular or randomised intervals, or interventions initiated by users who report indicators that may signal a potential lapse [4,5,6].
Studies using self-reporting have indicated that the reported predictors can be highly predictive of potential lapses [4,5]. Schick et al. [6] extended this method by using Hidden Markov Models to identify patterns in the times and places at which individuals are most likely to smoke, and then used these patterns to improve the delivery of support messages. That study did not report any analytical results related to the Hidden Markov Models, focusing instead on positive feedback from the participants who used the mobile application.
Recent advances in computation make machine learning a powerful tool for modelling smokers' behaviour, enabling smart mobile apps that can provide 'just-in-time' interventions. For example, Dumortier et al. [7] used machine learning methods to estimate the urge to smoke from participants' reports of 41 features (e.g., alcohol consumption, mood, hunger, location, type of work) that may trigger an urge to smoke. They compared three machine learning algorithms (naive Bayes classifier, discriminant analysis classifier, and decision tree learning) and evaluated classification accuracy using a number of selected features. Results indicated that machine learning could estimate the smokers' urge rating with a classification accuracy of up to 86%. However, the models relied on users reporting a large number of input features. Another study [8] also used decision trees to predict daily smoking behaviour, using population information from the 2015 China Adult Tobacco Survey Report. The researchers formulated an equation that calculates the probability of smoking at a given time based on gender, age and time, and used statistical information from the dataset, together with some additional extracted features, as input to the decision tree model. They concluded that the best prediction method was XGBoost, with 84.11% accuracy.
In addition to the issues around self-reporting, most existing smoking cessation apps do not take into account the complexity of nicotine dependence treatment or the specific needs of their users [3]. Self-reporting can be inaccurate, as it is prone to bias depending on how participants define emotional variables (e.g., withdrawal, stress, craving, alcohol use) or environmental variables (e.g., location, the presence of other smokers) [5]. Furthermore, long-term self-reporting is more likely to be affected by the 'Ostrich problem', whereby people avoid monitoring their behaviour because doing so may be unpleasant or tiresome, or may lead to unwanted changes in behaviour [9]. Therefore, collecting real-time information from mobile sensors can reduce the reliance on self-reports and increase the accuracy of just-in-time intervention messages [4].
Actions (including smoking) can be seen as motivated by the need to maintain stability over time in the face of a changing environment. This motivation can be interrupted by internal factors (e.g., feelings such as sadness) or external factors (e.g., nicotine level) [10]. A closed-loop control model is a common technique for maintaining stability. It employs a feedback principle, using the output of the model (the feedback signal) as an input that modifies the model's actions, and hence maintains stability [11]. However, modelling addictive behaviour as a closed-loop control model is challenging: it requires understanding human complexity, as well as determining which elements should be included to model the addictive behaviour. Moreover, when modelling addictive behaviour, the goal state represents the steady (natural) state that the system seeks to reach, rather than a single fixed value, as is often the case in systems engineering [12,13].
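To make the feedback principle concrete, the following is a minimal Python sketch of a proportional feedback loop. It is a generic textbook illustration, not a model of smoking, and the function name and parameter values are hypothetical. At each step the output is fed back and compared with a set point, a correction proportional to the error is applied, and a constant disturbance pushes the state away.

```python
def simulate_feedback(set_point, initial, gain, disturbance, steps):
    """Proportional feedback loop (illustrative only).

    The current output is fed back, compared with the set point, and a
    correction proportional to the error is applied. A constant
    disturbance pushes the state away at every step.
    """
    state = initial
    history = [state]
    for _ in range(steps):
        error = set_point - state             # feedback signal vs. goal
        state += gain * error + disturbance   # corrective action + disturbance
        history.append(state)
    return history


# Without a disturbance the state converges to the set point ...
print(round(simulate_feedback(10.0, 0.0, 0.5, 0.0, 30)[-1], 3))  # 10.0
# ... with a persistent disturbance it settles at a nearby equilibrium.
print(round(simulate_feedback(10.0, 0.0, 0.5, 1.0, 30)[-1], 3))  # 12.0
```

With a persistent disturbance, a purely proportional loop settles at an equilibrium offset from its set point, which loosely mirrors the point above that the goal is a steady state rather than a single fixed value.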
Opponent process theory has been described as an essential method for modelling a person's emotional state [14]. Solomon [15] described addictive behaviour using opponent process theory. In this model, an addict experiences pleasure as soon as a drug is taken, followed by slowly accumulating withdrawal symptoms. As such, during the initial stages of addiction the pleasure level is high and accompanied by a low level of withdrawal symptoms. Over time, however, the withdrawal symptoms increase and the pleasure caused by using the drug decreases, potentially leading to a higher quantity of the drug being consumed [12].
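The opponent process dynamics can be illustrated with a small discrete-time simulation. This is a minimal sketch under assumed first-order dynamics, not Solomon's own formulation: a fast 'a-process' stands for pleasure, a slow 'b-process' for the opposing withdrawal response, and the function name and decay and gain parameters are hypothetical.

```python
def opponent_process(doses, a_decay=0.5, b_gain=0.1, b_decay=0.05):
    """Discrete-time sketch of an opponent process (hypothetical parameters).

    a: fast 'a-process' (pleasure); jumps with each dose, decays quickly.
    b: slow 'b-process' (opposing withdrawal); driven by a, decays slowly,
       so it accumulates with repeated use.
    Returns the net hedonic state a - b at each step.
    """
    a = b = 0.0
    net = []
    for dose in doses:
        a = a * (1.0 - a_decay) + dose
        b = b * (1.0 - b_decay) + b_gain * a
        net.append(a - b)
    return net


single = opponent_process([1.0] + [0.0] * 19)
print(single[0] > 0, min(single) < 0)   # True True: pleasure, then a withdrawal dip

repeated = opponent_process(([1.0] + [0.0] * 4) * 5)
print(repeated[0] > repeated[20])       # True: same dose, less net pleasure later
```

Because the b-process decays slowly, it builds up across repeated doses, so the same dose yields less net pleasure over time, which is the qualitative pattern described above.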
Bobashev et al. [16] modelled the behaviour of smokers using the opponent process scheme of control theory. The model did not represent any complex neurobiological process; it provided a mathematical model with a cascading feedback loop, intended to capture the scientific narrative of the opponent process, as shown in Figure 1.
The model equations were developed with phenomenological interpretation in mind, and no real biological process was modelled. A set of continuous functions was used, feeding into the cascading functions. The system equations involve five interlinked processes,
where a, b and
are scaling coefficients, and all the
initial values are set to zero. Each equation describes a weighted integration of the previous one, so that successive processes evolve over progressively longer time scales.
represents the effect of nicotine level and is modelled with a pharmacokinetic equation.
represents the toxicity level and how the body processes the drug.
is the daily smoking habit.
is a longer-term habit that evolves over years (rather than minutes, hours, or days). While the process
has not been given an interpretation, it is used to add a scaling period between
and
, which results in a slow change in process
. To simulate smoking behaviour, a threshold value was defined to prompt self-administration. The threshold
has calibration coefficients
, and one is added to the denominator of the equation to avoid division by zero. The threshold value is modulated by external stressors, which can trigger cigarette use.
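The following Python sketch illustrates the cascading idea only; it is not the model of Bobashev et al. [16], whose equations are not reproduced here. It chains first-order processes, each relaxing toward the one before it, with hypothetical rates, so that later stages respond more slowly. The `threshold` function is likewise a hypothetical form, included only to show how adding one to the denominator avoids division by zero.

```python
def cascade(doses, rates, dt=1.0):
    """Chain of first-order processes, each relaxing toward the one before it.

    Stage 0 stands in for the nicotine level: it jumps by each dose and is
    eliminated at rate rates[0] (a one-compartment pharmacokinetic form).
    Stage k > 0 is a weighted running integration of stage k - 1; smaller
    rates give later stages longer time scales. All stages start at zero.
    """
    x = [0.0] * len(rates)
    trajectory = []
    for dose in doses:
        new = x[:]
        new[0] = x[0] - dt * rates[0] * x[0] + dose
        for k in range(1, len(rates)):
            new[k] = x[k] + dt * rates[k] * (x[k - 1] - x[k])
        x = new
        trajectory.append(x[:])
    return trajectory


def threshold(habit, c1, c2):
    """Hypothetical saturating threshold; the '1 +' keeps the denominator
    non-zero when the habit level is zero."""
    return c1 * habit / (1.0 + c2 * habit)


traj = cascade([1.0] + [0.0] * 199, rates=[0.3, 0.1, 0.03])
peaks = [max(range(len(traj)), key=lambda t: traj[t][k]) for k in range(3)]
print(peaks[0] < peaks[1] < peaks[2])  # True: each stage peaks later
```

A single dose therefore propagates down the chain as an increasingly delayed and smoothed response, which is the qualitative behaviour the cascade is meant to provide.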
The research also modelled the withdrawal and craving processes; these processes begin immediately following the initial nicotine use and grow over time
where
and
are calibration coefficients. This control theory model was able to simulate plausible changes in smoking behaviour over time. However, it could not reproduce real-life behaviour or capture individual differences between smokers' daily habits.
Figure 2 shows an example of the differences between smoking behaviour simulated with the control theory model (Figure 2a) and real-life data collected from a participant (Figure 2b).
Studies show that modelling smoking behaviour is essential, as it can improve the intervention process by helping smokers when they most need support [17]. Control theory models provide an explanation but lack predictive power, whereas deep learning (DL) models provide superior prediction without explanation. To obtain better time-series prediction, it is useful to incorporate a mechanistic structure into a phenomenological statistical model [18]. Following this hypothesis, this research proposes a deep-learning model that, when combined with a control theory model of smoking, will be able to adapt to the smoker's unique behaviour and predict future smoking events. The model of Bobashev et al. [16] was chosen for its ability to capture the nicotine effect using the pharmacokinetic equation. The combined model can later be used to develop a smart mobile app that sends automated interventions. Here, we describe the implementation of this control theory model of smoking, expanded to incorporate other factors affecting smokers' behaviour (e.g., geolocation and motion).
2. Classification Method: The 1D Convolutional Neural Network
In recent years, deep learning (DL), a sub-field of machine learning (ML), has attracted great interest from the scientific community. DL refers to deep neural networks, which consist of a massive web of interconnected nodes arranged in more than one hidden layer. The nodes perform complex, non-linear computations on a set of input features and produce a suggested solution as output. This structure has been used to solve many complex computer science problems, such as image and speech recognition, with better accuracy than previous ML approaches [19,20]. Convolutional neural networks (CNNs) are a type of feed-forward neural network dating back to the 1980s. CNNs are composed of convolution operations followed by pooling operations [21,22]. With the growing interest in DL, CNNs have been reintroduced and used in many applications [23]. The main advantages of a CNN are that its computation can be parallelised and that it learns effectively, ensuring that all stages of the computation are adapted to the data and to each other. Solving a problem with a CNN involves experimenting with different design choices, including the number of layers, the kernel size, and the choice of activation function [24]. A 1D-CNN performs a convolution on a local region of the input data, using different kernels for the individual features. Also, the size of the local region can vary for different features (this is not possible with a 2D CNN). The 1D convolutional operation in layer $l$,

$$y_i^{l}(j) = \sum_{m} w_i^{l}(m) \cdot x^{l-1}(j+m) + b_i^{l},$$

where $w_i^{l}$ is the multi-dimensional convolutional kernel, $i$ is the kernel index, $b$ is the bias, and $x$ and $y$ are the input and output, respectively, performs dot products across the input [25]. In most models, a deep CNN uses the rectified linear unit (ReLU) activation function, $f(x) = \max(0, x)$, instead of the activation functions traditionally used in neural networks (hyperbolic tangent, logistic sigmoid). ReLU is simpler and more efficient to compute, and its output is not bounded above. In addition, regularisation techniques may be used to improve the performance of the CNN; these reduce the generalisation error while preserving training accuracy [26].
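For readers less familiar with 1D convolution, the following pure-Python sketch computes a single convolutional layer: each kernel is slid along the time axis, a dot product is taken with the local window across all feature channels, a bias is added, and ReLU is applied. The function names are hypothetical, and a real model would use a deep-learning library; the sketch only illustrates the arithmetic.

```python
def conv1d(x, w, b):
    """One output channel of a 'valid' 1D convolution.

    As in most deep-learning libraries, this is a cross-correlation: the
    kernel is not flipped.
    x: input, one list per feature channel (all of equal length)
    w: kernel, one list of k weights per feature channel
    b: scalar bias
    """
    k = len(w[0])
    length = len(x[0])
    out = []
    for j in range(length - k + 1):
        # dot product between the kernel and the local window starting at j
        s = sum(w[c][m] * x[c][j + m] for c in range(len(x)) for m in range(k))
        out.append(s + b)
    return out


def relu(values):
    """Rectified linear unit, f(v) = max(0, v), applied element-wise."""
    return [max(0.0, v) for v in values]


def conv1d_layer(x, kernels, biases):
    """Apply several kernels (index i); return one ReLU feature map per kernel."""
    return [relu(conv1d(x, w, b)) for w, b in zip(kernels, biases)]


x = [[1.0, 2.0, 3.0, 4.0],   # feature channel 0
     [0.0, 1.0, 0.0, 1.0]]   # feature channel 1
kernel = [[1.0, 0.0], [0.0, -1.0]]
print(conv1d(x, kernel, -1.0))            # [-1.0, 1.0, 1.0]
print(conv1d_layer(x, [kernel], [-1.0]))  # [[0.0, 1.0, 1.0]]
```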
3. Data Collection and Processing
A mobile application was developed to collect signals from mobile sensors (e.g., movement and environment), as well as participants' self-reports of smoking events. Five smokers (all smoking at least 5 cigarettes per day) were recruited and asked to report their smoking events for two weeks. In the pre-processing stage, the samples for each day were unified to 1440 samples per day (one sample per minute). To do so, three types of events were registered in the dataset: smoking, not-smoking and app-off (representing gaps in the dataset due to, for example, the participant's mobile phone being off). Figure 3a shows the frequency of events for each of the five participants. The classes are clearly unbalanced, as there are far fewer smoking than non-smoking events: of the 1440 data samples per day, fewer than 15 are smoking events, while the rest are either not-smoking or app-off events. To overcome this limitation, the labelling period was changed to a 30-min window following each smoking event rather than a 1-min window, thereby reducing the imbalance between smoking and non-smoking events. Furthermore, as a conservative assumption, the model treats app-off as a non-smoking event.
Figure 3b shows the frequency of events for each of the five participants after applying these changes.
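The labelling steps described above can be sketched as follows. This is a minimal illustration assuming that each smoking report is stored as a minute-of-day index and that the 30-min window starts at the reported minute; the function name and data layout are hypothetical.

```python
MINUTES_PER_DAY = 1440


def label_day(smoke_minutes, app_off_minutes, window=30):
    """Per-minute binary labels for one day (1 = smoking, 0 = not smoking).

    smoke_minutes:   minute-of-day indices (0-1439) of reported smoking events
    app_off_minutes: minute-of-day indices with no data (e.g. phone off)
    window:          minutes labelled as smoking, starting at each report
    """
    # 1) register one of three event types for every minute of the day
    events = ["not-smoking"] * MINUTES_PER_DAY
    for m in app_off_minutes:
        events[m] = "app-off"
    # 2) widen each report into a window that follows it (clipped at midnight)
    for m in smoke_minutes:
        for t in range(m, min(m + window, MINUTES_PER_DAY)):
            events[t] = "smoking"
    # 3) app-off is treated as non-smoking, the cautious choice in the text
    return [1 if e == "smoking" else 0 for e in events]


labels = label_day(smoke_minutes=[480, 1430], app_off_minutes=range(0, 360))
print(len(labels), sum(labels))   # 1440 40
```

Concatenating 13 such days gives 13 × 1440 = 18,720 labelled samples per participant, the figure quoted in Section 3.2.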
The reported smoking events were then used as input to the control theory model of smoking to calculate the nicotine levels and threshold value over a 13-day period (one 24-h period was dropped because it consisted of two half-days, one at the start and one at the end of the data collection period). Calculated data (e.g., nicotine level) were combined with collected data (e.g., light, GPS location, activity labels) to form the dataset for each participant, with the reported smoking events as the labels.
Figure 4 illustrates the process of data collection.
3.1. Mobile App
Data collection took place using a mobile application developed for Android users in Android Studio (IDE). The main goal for the User Interface (UI) was a user-friendly interface that provides no feedback to users, so as to avoid influencing their behaviour [27]. The UI was used to label smoking events, relying on participants' self-reporting. Users could report a smoking event either by pressing a button on the main layout of the app or by pressing a widget on the home screen of the smartphone, as can be seen in Figure 5.
The application was designed to run as a background service that records data from the phone's sensors. This service restarts itself whenever it is terminated (by the OS or otherwise), in order to overcome a recent restriction imposed by Android on background services that run for long periods. Collected sensor data and smoking events were stored in a local SQLite database.
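As a rough illustration of the storage layer, the following sketch uses Python's built-in `sqlite3` module with a hypothetical two-table schema: one table for sensor readings and one for self-reported smoking events. The actual app was written for Android, so the table and column names here are assumptions; only the accelerometer and GPS fields used in the later analysis are included.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sensor_sample (
    ts_ms     INTEGER NOT NULL,  -- Unix time in milliseconds
    acc_x     REAL, acc_y REAL, acc_z REAL,
    latitude  REAL, longitude REAL, altitude REAL
);
CREATE TABLE IF NOT EXISTS smoking_event (
    ts_ms  INTEGER NOT NULL,
    source TEXT NOT NULL CHECK (source IN ('button', 'widget'))
);
"""


def open_store(path=":memory:"):
    """Open (or create) the local database and ensure both tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def log_sample(con, ts_ms, acc, gps):
    """Store one background-service reading: acc = (x, y, z), gps = (lat, lon, alt)."""
    con.execute("INSERT INTO sensor_sample VALUES (?, ?, ?, ?, ?, ?, ?)",
                (ts_ms, *acc, *gps))
    con.commit()


def log_smoking(con, ts_ms, source):
    """Store a self-reported smoking event from the app button or home-screen widget."""
    con.execute("INSERT INTO smoking_event VALUES (?, ?)", (ts_ms, source))
    con.commit()


con = open_store()
# made-up example values
log_sample(con, 1_700_000_000_000, (0.1, 9.8, 0.2), (0.0, 0.0, 10.0))
log_smoking(con, 1_700_000_060_000, "widget")
print(con.execute("SELECT COUNT(*) FROM smoking_event").fetchone()[0])  # 1
```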
3.2. Data Collection
For this study, the participants were healthy adult smokers over 18 years old with a good level of English literacy. Each owned and regularly used an Android mobile phone. Smokers were defined as those smoking at least 5 cigarettes a day for at least 6 months; all smoked traditional cigarettes. During the data collection period, the application was installed on each participant's smartphone for two weeks. No restrictions were placed on their daily activities; they were only asked to report their smoking events and to keep the GPS on. At this stage of the research, data have been collected from 5 participants (3 female, 2 male), all from the UK. The exclusion criteria were being under 18 or over 55 years old; self-reported physical or mental health issues that affect movement; and not using an Android phone (e.g., using an iPhone). Although the number of participants appears small, Schick et al. [6] modelled smoking behaviour using 4 participants, so 5 participants were considered sufficient to model smoking behaviour. In addition, the ML model is trained for each participant separately, and a large volume of data was collected from each participant (more than 1000 smoking-labelled samples out of 18,720 samples per participant), which is sufficient for a machine learning problem.
Data were collected from several sensors in order to identify correlations between smoking events and sensor readings. Table 1 shows the types of data collected. The goal was to use the collected data to find associations between smoking events and environmental data, in order to inform the implementation of a machine learning model that can automatically predict smoking events from the occurrence of internal and external predictors. Following data collection, it emerged that not all sensors are available on all phone models. The plan was therefore modified to use only the sensors common to most phones, i.e., the accelerometer and GPS.
5. Results
After testing the three classification methods, the 1D-CNN was selected as the most suitable classifier for predicting smoking events. The classifier predicts either a smoking or a non-smoking state, with app-off events treated as non-smoking. The aim was to test whether the combined control theory and 1D-CNN model can accurately forecast the nicotine level (rather than using the originally calculated values), and then predict smoking events from this predicted nicotine level, which changes over time. The model uses the comparison between the nicotine level and the threshold value as the first indicator of the need for a cigarette. It then uses external stressors (accelerometer and GPS) as input to the DL model to make the final decision on the likelihood of a smoking event. Embedding the 1D-CNN in the control theory model ensures that the endogenous factors affecting the smoker's behaviour, represented by the nicotine level in the smoker's body, are captured. This approach should ensure that no intervention messages are sent before the nicotine level derived from Equation (1) falls below the threshold derived from Equation (6).
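The two-stage decision described above can be summarised in a few lines. This is a hedged sketch: `should_intervene` and the probability cut-off `p_cutoff` are hypothetical, and the nicotine level, threshold and classifier probability are assumed to be supplied by the control theory model and the 1D-CNN.

```python
def should_intervene(nicotine, threshold, p_smoke, p_cutoff=0.5):
    """Two-stage decision for sending a just-in-time message (illustrative).

    1) Endogenous gate: while the nicotine level is still at or above the
       threshold, no message is sent, whatever the sensors suggest.
    2) Exogenous check: once below the threshold, the classifier's
       probability of a smoking event (from accelerometer and GPS features)
       must reach p_cutoff.
    """
    if nicotine >= threshold:
        return False
    return p_smoke >= p_cutoff


print(should_intervene(nicotine=0.8, threshold=0.5, p_smoke=0.9))  # False
print(should_intervene(nicotine=0.3, threshold=0.5, p_smoke=0.9))  # True
print(should_intervene(nicotine=0.3, threshold=0.5, p_smoke=0.2))  # False
```

Placing the nicotine gate first means the sensor-driven classifier can only add evidence once the endogenous condition is met, never override it.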
The resultant combined model of DL and control theory is shown in Figure 11. Six features were used as input to the DL model (three raw accelerometer values, x, y and z, and three GPS values, longitude, latitude and altitude). Since these values and their combinations are specific to each participant, a separate model was trained for each participant. Iterative testing allowed us to compare prediction performance across different days of the week. Since the model output is a forecast of the nicotine level over time, the Mean Square Error (MSE) is used as the error criterion to measure its performance; this evaluation metric has previously been used to evaluate time-series data [28]. The MSE, Root Mean Square Error (RMSE) and Normalized Root Mean Square Error (NRMSE) values represent the accuracy of the predicted nicotine levels on weekdays (Table 5) and at weekends (Table 6). In general, the model performs similarly throughout the week.
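The three error measures can be computed as below. This sketch assumes that NRMSE is the RMSE divided by the range (maximum minus minimum) of the observed nicotine levels; other normalisations, such as by the mean, are also common, and the text does not state which was used. The function name is hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """MSE, RMSE and range-normalised RMSE between two equal-length series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    rmse = math.sqrt(mse)
    spread = max(observed) - min(observed)
    # assumed normalisation: divide by the range of the observed series
    nrmse = rmse / spread if spread else float("nan")
    return mse, rmse, nrmse


mse, rmse, nrmse = error_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
print(mse, rmse, round(nrmse, 4))  # 1.0 1.0 0.3333
```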
Figure 12 shows the predicted nicotine level from a randomly selected day for two participants. All 6 predictors were used as input to the system. The nicotine level was predicted during the closed-loop process; no pre-calculated data was used.
Although some smoking events were missed, the model generally predicts each participant's smoking behaviour reliably. While the accuracy of the predicted nicotine level is negatively affected by missing samples in the dataset, overall accuracy remains high.
Figure 13 shows the predicted smoking events for a randomly selected day with a high level of missed samples.
In some cases, a participant reported smoking events consistently on all days except one. The model predicted several smoking events on that day, and we cannot be sure whether these are unreported smoking events or false alarms (Figure 14).
Overall, the model predicts smoking events with an accuracy of 0.2 within a 15-min window, 0.3 within a 30-min window, 0.5 within a 1-h window, and 0.8 within a 2-h window.
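The text does not define when a prediction counts as correct within a given window. One plausible definition, sketched below as an assumption, is the fraction of reported smoking events that have at least one predicted event within the window on either side; the function name and example minutes are hypothetical and are not the study's data.

```python
def windowed_hit_rate(reported, predicted, window):
    """Fraction of reported smoking events (minute indices) that have at
    least one predicted event within `window` minutes on either side."""
    if not reported:
        return 0.0
    hits = sum(1 for r in reported if any(abs(r - p) <= window for p in predicted))
    return hits / len(reported)


reported = [100, 300, 600, 900]
predicted = [110, 340, 700]
for w in (15, 60, 120):
    print(w, windowed_hit_rate(reported, predicted, w))
# 15 0.25
# 60 0.5
# 120 0.75
```

Under this definition, widening the window can only keep or increase the rate, which is consistent with the trend in the reported figures.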
Figure 15 displays the ROC curve for the whole dataset for the 15-min, 30-min, 1-h and 2-h windows. As far as we know, relatively few studies have explored the use of machine learning to classify the factors that lead to smoking events, and all previous research [7,8] relies on self-reporting and surveys, which makes a direct comparison with our research difficult, since those studies use different experimental settings and different inputs. Nevertheless, this research achieved a better overall accuracy of 86.6% without relying on self-reported predictors.