The idea of wearing on-board computing systems has been around since the 1980s [1]. However, recent advances in wireless communication technologies, embedded systems, and the lower costs of components (e.g., batteries, processors, and sensors) have enabled these devices to become more miniaturized and mainstream [2]. The advent of smaller and more powerful devices has allowed technology to take a more central role in everyday life and has enabled personal experiences, such as capturing photographs, videos, or tracking fitness, to become common practice. Every moment of daily life can be shared digitally and can be enriched in ways that we could not have imagined years ago [3]. Part of such developments has been the inception of wearable devices (e.g., smartwatches, health and fitness trackers, etc.) that have exploded onto the consumer market. Cisco predicts that, by 2019, there will be 578 million wearable devices globally, a fivefold increase from 109 million in 2014 [4]. These devices now house a multitude of sensors that are capable of capturing a large amount and range of personal information. Consequently, with all of this data readily available, end users have become more interested in quantifying their activities through their collected personal data. It is this interest, together with the explosion of consumer wearable products, that has paved the way for the areas of lifelogging and the Quantified Self [5] to thrive, as users can actively track themselves over a sustained period of time [6]. These devices flourish in the field of activity recognition, where they can function continuously without human intervention or maintenance [1]. This area has been gaining great momentum due to the tremendous benefits that are associated with long-term monitoring and has garnered interest from researchers and clinicians [7]. For instance, a system that monitors and gathers data over a sustained period of time could significantly improve the prevention, diagnosis, and treatment of several noncommunicable diseases, including obesity and depression [8]. As the system gathers more information about the user, it can “learn” about his or her lifestyle. These patterns of behaviour can then be analyzed (see previous work [11]) to recommend healthy lifestyle changes, and users can also use this information to reflect on their levels of activity to improve their quality of life [12].
One continuous trend in electronics is hardware miniaturization [13]. For example, Kryder’s law observes that storage density increases exponentially, so physical storage continually shrinks while its capacity grows [14]. Similar advances apply to the miniaturization of sensors. These ongoing trends suggest that wearables and other ubiquitous devices either already have, or soon will have, enough capability to host their own data for the long term [15]. However, such developments in capturing and storing data have outpaced current knowledge on processing this information. Whilst we have access to a number of consumer products that are capable of recording data, a major challenge in this area is processing this data to extract relevant information [7]. As devices have become smaller and more powerful, data analysis tools have not been as fortunate and often appear rudimentary when compared to the data collection devices [16]. Mobile devices and smartwatches are equipped with several sensors, including accelerometers, which can quantify physical activity. However, such sensors produce large sets of raw data, and it is infeasible to feed raw data or a large number of features into algorithms. Furthermore, detecting physical activity is often treated as a classification task, which depends on labelling data in a valid fashion, i.e., accurately representing a distinction between multiple activities [17]. This limits real-world applicability, as labels are derived offline, after the event. Furthermore, self-reports tend to overestimate the time spent in unstructured daily physical activities or momentary sporting physical activities, which are the two main aspects of human physical activity [20]. However, clustering can be used to overcome this challenge, as it significantly reduces the search space and does not require the data to be labelled via user input.
In addressing this issue, this paper presents our approach, which utilizes two feature selection approaches, principal component analysis feature selection (PCAFS) and correlation feature selection (CFS), to improve the clustering of accelerometer data for the purposes of activity recognition. The motivation behind utilizing feature selection stems from the limitations of mobile and wearable devices: feeding raw data or a large number of features into clustering algorithms is not efficient. By removing redundant features, we limit the search space to a subset of the most important features that can be used to describe the majority of the data. Raw accelerometer data from mobile devices and smartwatches have been obtained from the publicly available Heterogeneity Human Activity Recognition Dataset (HHAR) [21]. The main contributions of the paper are to utilize feature selection to reduce the baseline feature set by removing redundant features and to evaluate our feature selection approaches, in terms of the quality of the clustering, against the baseline using hierarchical clustering analysis (HCA), k-means, and density-based spatial clustering of applications with noise (DBSCAN). This comparative study of feature selection and clustering has not been performed previously in this manner. It enables us to determine how well the selected features perform within the clustering algorithms in separating instances of human activity, as well as the performance of each clustering algorithm. The remainder of this paper is structured as follows. Section 2 presents an overview of related work, whilst Section 3 describes the materials and methods that have been used to preprocess the data, extract and select features, and cluster the data. Section 4 presents the results of the data evaluation, while Section 5 discusses the results. The paper is concluded in Section 6, where future directions of the research are presented.
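Although our experiments were conducted in R, the preprocessing and feature extraction stages of the pipeline can be sketched in Python as follows. The window size, step, and histogram bin count are illustrative values, not those used in our experiments, and only a representative subset of the baseline features is computed:

```python
import math
import statistics

def window(signal, size, step):
    """Split a 1-D signal into fixed-size, overlapping windows."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

def entropy(values, bins=8):
    """Shannon entropy of a histogram of the window's values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def extract_features(win):
    """A representative subset of the per-window statistical features
    (our full baseline set contained eight features)."""
    return {
        "mean": statistics.fmean(win),
        "median": statistics.median(win),
        "std": statistics.stdev(win),
        "variance": statistics.variance(win),
        "rms": math.sqrt(statistics.fmean(v * v for v in win)),
        "entropy": entropy(win),
    }
```

Each window of raw acceleration samples yields one feature vector; feature selection then reduces these vectors before clustering.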
2. Related Work
As wearable devices become more widespread, it has become easier to record our activities and experiences. Recent technological developments have made devices such as accelerometers and heart-rate monitors popular within the consumer market and within research domains such as lifelogging, epidemiology, and other health-related areas [16]. There have been many approaches within the field of wearable sensing that aim to detect human activity [7]. As noted by Morales and Akopian [26], in most instances, accelerometer signals and machine learning are used for activity detection. For instance, Qiu et al. [18] used accelerometer data from a SenseCam and machine learning tools to automatically identify user activity. In their approach, a support vector machine (SVM) was trained to automatically classify accelerometer features into user activities (sitting or standing, driving, walking, or lying down) [18]. Meanwhile, Uddin et al.’s [24] wearable sensing framework utilized a nine-axis wristband to continuously monitor users’ daily activities; the data was then preprocessed and segmented before being passed to the activity recognition algorithm. In other works, Saeedi et al. [25] developed an automatic on-body sensor platform, consisting of accelerometers and gyroscope sensors, to monitor physical activity. The k-nearest neighbors (kNN) classifier was then used to categorize the activities and achieved an average recall of 98.41% and precision of 98.42%.
In other works, Machado et al. [7] used unsupervised learning to detect human activity from two triaxial accelerometers placed on the waist and wrist. The results indicate that the k-means algorithm produced highly accurate results and was able to recognize various activities, such as standing, sitting, and walking. However, this is in contrast to our approach, which uses different methods of feature extraction and selection to reduce the number of features that are being used. As the accumulation of data increases, the need for more accurate ways of analyzing this information is evident, and as these datasets grow in size, performing complex clustering on very large datasets becomes less practical, as accuracy is compromised. Taking a different approach, Fortino et al. [27] explored community-scale cloud-based activity recognition with their BodyCloud system. This approach is a cloud-based multitier application-level architecture that integrates the SPINE BAN middleware with a cloud computing platform. The idea is to support the rapid and effective development of community body area network applications through programming abstractions, such as group, modality, workflow, and view. However, there are some issues with relying on the cloud to process data, including bandwidth and latency (i.e., how long it takes for the system to react to an activity transition) [26]. Nevertheless, as noted by Lara and Labrador [28], there are still many challenges in this area, including the selection of attributes and sensors, obtrusive sensing, data collection protocols, recognition performance, energy consumption, processing, and flexibility.
4. Results
This section presents the results that have been obtained from our approach to clustering the data using (1) k-means, (2) hierarchical clustering analysis (HCA), and (3) density-based spatial clustering of applications with noise (DBSCAN). The evaluation first uses the baseline Phone datasets, which utilize all of the generated features, to assess the algorithms’ performance. The experiments have then been repeated with the PCAFS datasets and the CFS datasets, which utilize a subset of the baseline features, to establish whether the results can be improved. We have used the internal and external measures Dunn Index (DI), distance ratio (DR), and entropy (EN) as validation mechanisms to assess the quality of the clustering algorithms with the various feature selection methods [26]. A higher DI implies that clusters are compact and well separated from other clusters. The distance ratio has been calculated by dividing the average distance within clusters by the average distance between clusters; this measurement is a ratio of the mean sum of squares within clusters to the mean sum of squares between clusters. Entropy is a measure of uncertainty and is the average (expected) amount of information from an event. In assessing these approaches, a high DI and DR and a low EN indicate a better ability to separate the data. In each experiment, the k-means algorithm uses the silhouette averages from Table 4 to denote k. The evaluation platform was a Windows® 10 64-bit machine with an Intel® Core™ i7-3770 central processing unit (CPU) at 3.40 GHz and 32 GB of random-access memory (RAM). These experiments have been conducted using RStudio v0.99.903.
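The Dunn Index and distance ratio described above can be computed directly from the clustered points. A minimal Python sketch follows (our analysis used R; these helper functions are illustrative and follow the definitions given in the text):

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Minimum inter-cluster distance divided by the maximum cluster
    diameter; higher values mean compact, well-separated clusters."""
    inter = min(math.dist(p, q)
                for c1, c2 in combinations(clusters, 2)
                for p in c1 for q in c2)
    diam = max((math.dist(p, q)
                for c in clusters for p, q in combinations(c, 2)),
               default=0.0)
    return inter / diam if diam else float("inf")

def distance_ratio(clusters):
    """Average within-cluster distance divided by the average
    between-cluster distance."""
    within = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    between = [math.dist(p, q)
               for c1, c2 in combinations(clusters, 2)
               for p in c1 for q in c2]
    return (sum(within) / len(within)) / (sum(between) / len(between))
```

Each cluster is represented as a list of feature vectors (tuples); the functions take the list of all clusters produced by an algorithm.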
4.1. Determining k for k-Means Clustering
A requirement of the k-means algorithm is that the user needs to define the number of clusters (k) beforehand. This is the most critical user-specified parameter, with no perfect mathematical criterion [55]. Therefore, defining k can be challenging and may be seen as a drawback [59], as the best number of clusters can be difficult to distinguish. However, silhouette averages are often used as a useful measurement for selecting the “appropriate” number of clusters, as they give an idea of how well separated the clusters are [58]. The silhouette value S(i) quantifies the similarity of an object i to the others in its own cluster, compared to the objects in other clusters [58]. These values range from +1, indicating points that are very distant from neighboring clusters, through 0, indicating points that are not distinctly in one cluster or another, to −1, indicating points that are probably assigned to the wrong cluster. The silhouette average (SA) is then calculated and used as a measurement of the quality of the resulting clusters [58]. The value of k that has the largest SA indicates the most appropriate value to use. Using the baseline, PCAFS, and CFS datasets, the value of k has been increased from two to six and evaluated using the silhouette averages (see Table 4).
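For reference, the silhouette value is S(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster. A minimal Python sketch of the silhouette average (illustrative; our analysis used R):

```python
import math

def silhouette_average(points, labels):
    """Mean silhouette value over all points.
    S(i) = (b - a) / max(a, b), with a = mean intra-cluster distance
    and b = smallest mean distance to another cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:                 # singleton cluster: S(i) defined as 0
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for q in c) / len(c)
                for l2, c in clusters.items() if l2 != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

One would compute this average for each candidate k from two to six and select the k with the largest SA, as in Table 4.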
As previously stated, the most appropriate number of clusters (k) to use for each dataset is determined by the largest SA. As can be seen in Table 4, for the baseline Phone dataset it is three, whilst for the Watch dataset it is four. In relation to the PCAFS method, it is two for the Phone and three for the Watch, whilst for the CFS Phone1 and Watch5 it is also two, and for Watch3 it is three. These values of k will be used within the results to implement the k-means clustering algorithm. It should be noted that hierarchical clustering analysis (HCA) and density-based spatial clustering of applications with noise (DBSCAN) do not require k to be specified.
4.2. Results of the Phone Datasets
Figure 6 presents the results of the Phone datasets using the validation measures above and the baseline feature set, as well as the reduced PCAFS and CFS feature sets.
The results from Figure 6 illustrate that, using the baseline feature set, k-means produced the highest DI (0.8460), whilst HCA produced a slightly better DR (0.2705), and DBSCAN produced the lowest entropy (0.0377). However, when the feature set is reduced, these results improve upon the baseline. Looking to the PCAFS approach, k-means produced the highest DR (0.2866), whilst HCA performed better in terms of a higher DI (2.3123), and DBSCAN again produced the lowest EN (0.0332). Using the CFS approach, HCA produced the best results using Phone1, which obtained a higher DI (9.7001) and DR (0.0028). Overall, the CFS feature selection approach and the HCA algorithm produced the best DI (9.7001) and EN (0.0187). This implies that these clusters are quite compact and well separated, and that uncertainty is reduced. DBSCAN appears to perform the worst.
4.3. Results of the Watch Datasets
Figure 7 presents the results of the Watch datasets using the validation measures above and the baseline feature set, as well as the reduced PCAFS and CFS feature sets.
The results from Figure 7 illustrate that, using the baseline feature set, DBSCAN performed better in terms of a higher DI (0.5804) and lower EN (0.0556), whilst HCA had a higher DR (0.2767). However, reducing the feature set has again improved these results. Using the PCAFS approach, HCA produced the highest DR (0.3449) and DI (2.8879), whilst DBSCAN produced the lowest EN (0.0524). Using the CFS approach, k-means and HCA seemed to perform the best using Watch1, which obtained equally high DIs (5.1438). Overall, the CFS feature selection approach produced the best results in terms of a high DI (5.1438) using both k-means and HCA, a high DR using DBSCAN (0.3452), and a low EN using k-means (0.0515). In summary, the feature selection algorithms have significantly improved upon the baseline results. In particular, the CFS approach has performed the best across both datasets using k-means and HCA. For instance, the results illustrate that, using the Phone dataset, CFS and HCA produced the highest DI (9.7001) and the lowest EN (0.0187). Meanwhile, using the Watch dataset, CFS and k-means produced the highest DI of 5.1438 and the lowest EN of 0.0515. As a higher DI implies that clusters are compact and well separated from other clusters, it would appear that the Phone dataset outperformed the Watch dataset. DBSCAN appears to perform the worst using both sets of data.
Since the CFS approach, in conjunction with HCA, produced the best results using Phone1, simplified visualizations have been produced to reflect the classes with which the clusters are associated (see Figure 8). As observed by James et al. [67], when interpreting dendrograms, “observations that fuse at the very bottom of the tree are quite similar to each other, whereas observations that fuse close to the top of the tree will tend to be quite different”. Therefore, as can be seen in Figure 8a, when using a smartphone for activity recognition, standing and walking are the most similar to each other, whilst walking down the stairs and biking are quite distinct from other activities. Figure 8b illustrates that, when using a smartwatch for activity recognition, sitting and standing are the most similar, whilst walking and going up the stairs are quite distinct from other activities. It can be inferred that this could be due to the placement of the sensors. For example, a smartwatch around the wrist will pick up hand movements, as people move their hands as they walk and go up the stairs; this signature is quite unique to each individual. A smartphone, on the other hand, may be placed in the subject’s pocket and would not be subjected to this type of movement.
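The reading of dendrograms described above can be illustrated with a small single-linkage agglomerative example (Python; the activity names and one-dimensional feature values are hypothetical stand-ins, not our data): items that fuse at a low height are similar, whilst late fusions indicate distinct activities.

```python
def single_linkage(items):
    """Agglomerative clustering on 1-D values; returns (height, members)
    for each merge, in the order the merges occur."""
    clusters = [frozenset([name]) for name in items]
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters (single linkage)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(items[a] - items[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((d, sorted(merged)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# hypothetical 1-D feature values per activity
activities = {"standing": 1.0, "walking": 1.2, "sitting": 2.5, "biking": 9.0}
for height, members in single_linkage(activities):
    print(f"{height:.1f}: {members}")
```

Here standing and walking fuse first (low in the tree), whilst biking joins last, mirroring the "distinct activity" branches seen in Figure 8.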
4.4. Efficiency Analysis
In terms of energy efficiency, a comparison between the baseline, PCAFS, and CFS approaches has also been undertaken for both datasets to determine the time it takes to cluster the data. The results are presented in Table 5 and Table 6.
As can be seen in Table 5, reducing the number of features utilizing the PCAFS and CFS approaches has significantly decreased the processing time of most of the algorithms, apart from DBSCAN, for the Phone datasets. The CFS approach, in conjunction with the Phone2 dataset, produced the fastest time overall utilizing k-means (0.36 s). In comparison to the baseline result (1.23 s), this represents a 71% improvement in processing time. However, DBSCAN, in conjunction with the CFS Phone1 and Phone2 datasets, produced similarly slow times (182.60 s and 181.60 s, respectively) compared to the baseline (150.48 s), an increase of approximately 21% in processing time. DBSCAN, in conjunction with the PCAFS dataset, also resulted in an 18% longer processing time. Nevertheless, apart from DBSCAN, k-means and HCA produced faster times, compared to the baseline, when the reduced PCAFS and CFS datasets were used. Compared to the baseline, utilizing k-means with (1) PCAFS produced an improvement of 68%; (2) CFS Phone1 resulted in an improvement of 59%; and (3) CFS Phone2 produced an improvement of 71%. Compared to the baseline, utilizing HCA with (1) PCAFS produced an improvement of 28%; (2) CFS Phone1 resulted in an improvement of 26%; and (3) CFS Phone2 produced an improvement of 23%.
As can be seen in Table 6, reducing the number of features utilizing the PCAFS and CFS approaches has again significantly decreased the processing time of the algorithms for the Watch datasets. The table illustrates a comparison between all the clustering algorithms using the Watch datasets. As can be seen, the CFS approach, in conjunction with the Watch1 dataset, produced the quickest time overall using k-means (0.11 s). In comparison to the baseline result (0.25 s), this represents a 56% improvement in processing time. However, DBSCAN again performed worse using the CFS Watch1 and CFS Watch2 datasets, which produced an equally slow time of 3.04 s, compared to the baseline of 2.94 s; this represents a 3% increase in processing time. Nevertheless, utilizing DBSCAN with (1) PCAFS produced an improvement of 26%; (2) CFS Watch3 produced an improvement of 33%; (3) CFS Watch4 produced an improvement of 32%; and (4) CFS Watch5 produced an improvement of 34%. Similarly, k-means and HCA produced faster processing times, compared to the baseline, when the reduced PCAFS and CFS datasets were used. Compared to the baseline, utilizing k-means with (1) PCAFS produced an improvement of 48%; (2) CFS Watch1 resulted in an improvement of 56%; (3) CFS Watch2 produced an improvement of 44%; (4) CFS Watch3 produced an improvement of 52%; (5) CFS Watch4 produced an improvement of 28%; and (6) CFS Watch5 produced an improvement of 32%. Compared to the baseline, utilizing HCA with (1) PCAFS and CFS Watch3 produced an improvement of 18%; (2) CFS Watch1 resulted in an improvement of 22%; (3) CFS Watch2 produced an improvement of 15%; and (4) CFS Watch4 and CFS Watch5 produced an improvement of 21%.
Overall, these results demonstrate that reducing the datasets and utilizing k-means with the CFS Phone2 dataset (which used the features Entropy/Variance) and the CFS Watch1 dataset (which used the features Entropy/STD) produces the most efficient results.
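The effect of feature reduction on clustering time can be reproduced in miniature. The following Python sketch times a plain Lloyd's k-means on synthetic data with eight features versus a two-feature subset; all values, and the k-means implementation itself, are illustrative rather than our experimental setup:

```python
import random
import time

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm (illustrative sketch, not our R implementation)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centres[c])))
            groups[i].append(p)
        # recompute centres; keep the old centre if a group is empty
        centres = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[gi]
                   for gi, g in enumerate(groups)]
    return centres

rng = random.Random(1)
full = [tuple(rng.gauss(0, 1) for _ in range(8)) for _ in range(500)]
reduced = [p[:2] for p in full]   # e.g., keep only the two selected features

t0 = time.perf_counter(); c_full = kmeans(full, 3); t_full = time.perf_counter() - t0
t0 = time.perf_counter(); c_red = kmeans(reduced, 3); t_red = time.perf_counter() - t0
print(f"8 features: {t_full:.3f} s, 2 features: {t_red:.3f} s")
```

Even in this toy setting, the two-feature run completes noticeably faster, mirroring the direction of the improvements reported in Tables 5 and 6.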
5. Discussion
Smartphone and wearable devices have powerful sensing capabilities that can quantify human physical activity. However, due to the energy limitations of mobile/wearable devices, it is unfeasible to feed raw data or a large number of features into clustering algorithms when performing such analysis. In addressing this challenge, the objective of this research has been to posit our approach, which utilizes feature selection to improve the clustering of accelerometer data for the purposes of activity recognition by removing redundant features, thereby also reducing the computational burden associated with processing large sets of data. Raw accelerometer data have been obtained from the Heterogeneity Human Activity Recognition Dataset (HHAR) [21], which comprises two real-world datasets that contain data from eight smartphones and four smartwatches from participants who undertook a variety of physical activities. The baseline datasets contained eight features, which provided a solid foundation to improve upon. Introducing our feature selection approach has significantly improved the initial baseline efficiency, whilst also improving the accuracy and quality of the clusters. This is because feature selection has reduced the search space from eight to two features, which has improved the quality and efficiency of the clustering results. The results illustrate that the CFS feature selection approach performed better than the baseline and PCAFS approaches. This could be because the CFS approach selects a subset of features that are completely independent of (i.e., uncorrelated with) each other [68]. The PCAFS approach, however, relies on data that is linearly correlated in order to find the linear combination of the original variables. Furthermore, the accuracy of the bi-plots is questionable.
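The PCAFS idea can be sketched as follows (illustrative Python, not our R implementation): power iteration approximates the first principal component of the feature covariance matrix, and the features with the largest absolute loadings are the candidates to retain.

```python
import math
import random

def principal_loadings(columns, iters=200, seed=0):
    """Approximate the first principal component of the feature covariance
    matrix via power iteration; features with large absolute loadings
    dominate the leading component (a sketch of PCA-based selection)."""
    names = list(columns)
    n = len(next(iter(columns.values())))
    means = {k: sum(v) / n for k, v in columns.items()}
    cov = [[sum((columns[a][i] - means[a]) * (columns[b][i] - means[b])
                for i in range(n)) / (n - 1)
            for b in names] for a in names]
    rng = random.Random(seed)
    vec = [rng.random() for _ in names]
    for _ in range(iters):
        nxt = [sum(cov[r][c] * vec[c] for c in range(len(names)))
               for r in range(len(names))]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        vec = [x / norm for x in nxt]
    return dict(zip(names, vec))
```

With a high-variance feature and a low-variance one, the former receives a loading near ±1, which is why PCAFS depends on the linear correlation structure of the data.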
In terms of clustering performance, overall, HCA was best at forming clusters that are compact and well separated from other clusters, as it achieved the higher DI results on both the Phone and Watch datasets (9.7001 and 5.1438, respectively). k-means performed marginally better on some measures, having a higher distance ratio (using the Phone dataset) and a lower entropy (using the Watch dataset), whilst DBSCAN performed the worst. In terms of efficiency, k-means was the fastest: on average, it performed 66% faster utilizing the reduced PCAFS and CFS Phone datasets and 43% faster utilizing the reduced Watch datasets, compared to the baselines. However, there is a tradeoff between overall clustering accuracy and time performance. The results illustrate that using HCA in conjunction with the CFS approach produced clusters that are more compact and better separated from other clusters. Using the Watch datasets, k-means in conjunction with CFS produced the fastest results, whilst k-means and HCA produced equally high DIs (5.1438). Nonetheless, a limitation of k-means is that a user-defined value of k must be supplied, which can be problematic in the real world, where k is often unknown. Overall, whilst HCA was not the fastest algorithm, it could be more beneficial to use HCA with the CFS feature selection approach, as it produces highly reproducible results and is less susceptible to noise and outliers. A further benefit of using HCA over k-means is that the k parameter (number of clusters) does not need to be specified.
Although this work has been carried out on traditional workstations, ongoing trends in the performance of ubiquitous devices suggest that, in the future, these devices will have enough capability to host their own data for the long term and perform the data analysis indicated in the pipeline in Figure 1 on the device [15]. Evidence of this trend is the increase of on-device machine learning chips, such as the Apple A11 [69] and the NVIDIA Jetson TX series [70]. However, going forward, we believe there is a need to perform the preprocessing stage of our pipeline in Figure 1 online, such as in the cloud, in order to prepare the data for on-device analysis. This is due to the miniaturized size of this hardware, which has limited resources in comparison to the hardware used in cloud services. In this work, the improved resource efficiency is attributed to the fact that the CFS and PCAFS datasets use a reduced number of features within the clustering algorithms; therefore, fewer resources are used, but accuracy is not compromised.
This is an important contribution of the work: both accuracy and performance improved when the number of features was reduced. The work will have an impact on energy efficiency and resource use because less data is being fed into the algorithms. Furthermore, a comparative study of feature selection and clustering with these specific algorithms has not been performed previously.
These results are very promising and demonstrate the validity of our approach. Our approach relates to that of Stisen et al. [21], who clustered data based on the devices that recorded it and treated quantifying human activity as a classification problem; we have extended this by incorporating feature selection and reporting on the efficiency of this approach. Meanwhile, Zhang and Sawchuk’s [71] feature selection approach appears to be more computationally expensive. In comparison, our approach uses PCA and correlation to reduce the feature set, whilst improving accuracy and computational times. The use of feature selection is a viable approach to analyzing physical activity data because, as these vectors increase in size, feature selection ensures that a large amount of data can be reduced without compromising the clustering results.
The results of the clustering approach proposed in this paper can then be used for applications related to activity recognition by incorporating them into a feedback system via multivariate visualizations, so that users can see how often they are active or inactive and the context behind those times. Furthermore, as the system obtains more data, recommendations can be provided to improve the user’s health. For instance, weekly alerts and suggestions can be displayed on users’ smartphones/smartwatches that summarize their activity and prompt them to engage in more physical activity. The UK National Health Service (NHS), for example, recommends that adults should engage in a minimum of 150 min of moderate aerobic activity per week [72]. Therefore, if more than the recommended threshold has been achieved, that week will be tagged as an “active” week. However, if the data indicates that a user is becoming more inactive and sedentary, then a negative indication would appear, and some activity changes could be suggested and alerted to them via a visualization.
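A minimal sketch of this weekly tagging rule (Python; the 150-minute threshold follows the NHS guideline cited above, whilst the function and variable names are hypothetical):

```python
NHS_WEEKLY_TARGET_MIN = 150  # moderate aerobic activity per week (NHS guideline)

def tag_week(daily_active_minutes):
    """Label a week 'active' if the summed moderate activity meets the target."""
    total = sum(daily_active_minutes)
    return "active" if total >= NHS_WEEKLY_TARGET_MIN else "inactive"

print(tag_week([30, 0, 45, 20, 30, 40, 0]))   # 165 min in total
print(tag_week([10, 0, 15, 0, 20, 0, 5]))     # 50 min in total
```

In a deployed system, the daily minutes would come from the periods of activity discerned by the clustering, rather than being supplied by hand.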
Within our mobile and wearable devices, we have access to a number of data sources, which can be stored and amalgamated to provide more context to our activity-related data. For instance, when clusters are formed, the timestamp of the data from each cluster can be used to search the user’s other data and pull data from that specific time into a temporal location, which can be displayed to the user. Whilst simply logging acceleration data is a good starting point for gathering activity data, a significant drawback is the ambiguity of this type of data. However, combining various other pieces of data (e.g., location or photographs) enables us to add context to our activities. This information is very useful in quantifying our behaviors, as it provides context for the activity. Context plays a significant role because, in the case of monitoring activity, this data can be amalgamated and clustered to discern periods of activity and inactivity, which can be reflected upon at a later time, allowing the context behind those times to emerge. As the user logs more information and the system accumulates more data, it will become more intelligent, and lifestyle recommendations can begin to emerge. Furthermore, when reflecting upon years of data using the methods described above, periods of high and low intensity can begin to emerge. We can see that, during certain times, clusters of lower-intensity activities are greater than those of higher intensity. Without defining a single query, users are able to see that they have not been very active. Seeing this larger cluster could be enough motivation to change their behaviour (e.g., taking up a sport). This can be used to reduce obesity levels and to encourage users to lead a healthier lifestyle (see previous work [73]). It is also important for the system to be able to generalize the approach, as each user is different. This has been demonstrated in this paper, as data has been collected from six different devices (see Table 1), which demonstrates the ability to generalize across different types of smartphones/smartwatches that operate at different frequencies. In a new dataset, with new subjects, if the same type of data were collected (i.e., acceleration), then the same features would be tested, as these have proven to be the best features for this type of data.
6. Conclusions
This work has utilized two datasets that focused on physical activity data acquired from accelerometers and has posited our approach for analyzing raw accelerometer data to detect physical activity. In this sense, the approach is able to reduce a very large set of raw data to learn about the user and to separate instances of activity, determining the user’s level of activity during a given period. In achieving this, the methodology that has been used to preprocess raw accelerometer data has been discussed. Features have then been extracted and analyzed using PCA and correlation matrices. From this analysis, we have concluded that the optimal number of features to use for both datasets is two. From the PCAFS analysis, the best features to use are Mean, RMS, and Median. Meanwhile, from the CFS analysis, we have concluded that the majority of feature pairs, such as (variance, STD), (mean, RMS), (median, RMS), and (median, mean), have an almost total positive correlation of 0.97–0.99 across both datasets. Therefore, the data have been clustered using only feature pairs with a correlation of 0, including (entropy, STD), (entropy, variance), (median, entropy), (mean, entropy), and (RMS, entropy). Using these reduced sets of features, we have then clustered the data using k-means, HCA, and DBSCAN. The results demonstrate that the quality of the clustering was improved using the CFS approach in conjunction with HCA. The results also demonstrate that, compared to the baseline, on average, k-means performed 66% faster utilizing the reduced PCAFS and CFS Phone datasets and 43% faster utilizing the reduced Watch datasets. It is important to note that we are not proposing a holistic clustering approach for all smartphone/smartwatch data. Instead, this paper has aimed to recommend the most energy-efficient clustering approaches that can be used to assist developers in their applications.
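The CFS selection step summarized above can be sketched as follows (illustrative Python; our analysis was conducted in R). Feature pairs whose absolute Pearson correlation falls below a small threshold are retained as candidate two-feature datasets:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two feature columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def uncorrelated_pairs(features, threshold=0.05):
    """Return feature pairs whose absolute correlation is below the
    threshold, i.e., candidates for the reduced two-feature datasets."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) < threshold:
                pairs.append((a, b))
    return pairs
```

Applied to our feature columns, highly correlated pairs such as (mean, RMS) would be discarded, whilst pairs involving entropy would be kept, as reported above.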
These results have demonstrated that feature selection is an avenue worth pursuing and that clustering is a viable method of analyzing human activity data. The results demonstrated in this paper can be extended into a feedback system that provides recommendations to increase activity. The objective of this research was to demonstrate how activity data could be analyzed using feature selection and clustering techniques; in doing so, one of the main objectives was to compare the feature selection approaches across clustering algorithms. Future work will consider integrating this approach into a mobile interface that could provide real-time feedback to users on their levels of physical activity. It would also be interesting to undertake user studies to evaluate the effectiveness of such an application in promoting physical activity.