A Review on Scaling Mobile Sensing Platforms for Human Activity Recognition: Challenges and Recommendations for Future Research

Abstract: Mobile sensing has been gaining ground due to the increasing capabilities of mobile and personal devices carried around by citizens, giving access to a large variety of data and services based on the way humans interact. Mobile sensing brings several advantages in terms of the richness of available data, particularly for human activity recognition. Nevertheless, the infrastructure required to support large-scale mobile sensing requires an interoperable design, which is still hard to achieve today. This review paper contributes to raising awareness of the challenges faced today by mobile sensing platforms that perform learning and behavior inference with respect to human routines: how current solutions perform activity recognition, which classification models they consider, and which types of behavior inferences can be seamlessly provided. The paper provides a set of guidelines that contribute to a better functional design of mobile sensing infrastructures, keeping scalability as well as interoperability in mind.


Introduction
The daily activities of billions of people today rely on mediating technology in the form of smart heterogeneous personal mobile devices and other smart sensor devices. Such devices integrate a significant number of sensors, large storage capability, and high processing capability [1]. Moreover, personal mobile devices also integrate short-range wireless interfaces, such as Bluetooth and Wi-Fi interfaces, which facilitate the capability to share data among neighboring devices [2].
Such novel features and, in particular, the non-intrusive sensing capabilities of personal devices are relevant in two ways. Firstly, the sensed data can assist in raising individual awareness of the quality of life and well-being aspects, e.g., physical health conditions and individual activity awareness and management [3]. Secondly, such smart data [4,5] are relevant for assisting mobile crowd sensing services to evolve in a way that allows the network to best adjust to the topological variability that is highly tied to aspects of human routines [5]. A major applicability benefit of pervasive mobile sensing is the possibility of bringing awareness to aspects concerning individual human behaviors [6]. By understanding social interaction aspects, such as similarities in human routines, it is possible to improve individual and collective well-being and the quality of life [7,8].
Several mobile sensing platforms and studies have been trying to provide a better understanding of different dimensions of well-being using data captured with multiple sensors [9][10][11]. Such data can be used to perform activity recognition and to find regularities in routines [12,13]. Even though there is an increasing interest from the research community in solutions and platforms that perform mobile sensing, there is no clear understanding of how to best develop such tools: which sensors are best applicable to which type of activity detection and/or recognition; which models best suit behavior/routine learning and inference; where and when to capture data; and how, when, and where to treat the captured data. In addition, given that personal devices equipped with a multitude of sensors are carried around by users, computational aspects arise from the distributed nature of such systems [14][15][16]. It should also be highlighted that, while mobile crowd sensing solutions have been around for a decade, the evolution of the Internet of Things and its networking and computational models from edge to cloud, coupled with the most recent evolutions in machine learning (ML), brings the possibility of further exploring mobile crowd sensing in diverse scenarios related to our lives [5]. The aforementioned technological evolution and the recent COVID-19 situation again strengthen the need to re-think the decentralization of large-scale sensing platforms. This paper is focused on debating such aspects, as there is currently a major gap in terms of paths to follow concerning large-scale sensing platforms.
The review provided in this paper is based on an extensive analysis of papers concerning pervasive mobile sensing work focused on promoting well-being. This review comprises an analysis of papers from 2009 until 2020 based on the paper keywords (mobile sensing; cloud-edge computing; human behavior inference; activity recognition; context awareness), which reflect areas of work that are relevant to the interests of the authors and that are currently highly relevant in the context of emergency management, such as for COVID-19. Out of the papers analyzed, we have selected a relevant subset of work, described in Section 3, to provide a comparison of features with regard to simple and complex activity recognition for social interaction and well-being promotion. The selection of this subset was based on the following criteria: (i) the work provides extensible open-source software; (ii) the work has been described in peer-reviewed publications with a high impact factor; (iii) the work has been applied in studies and the data are available.
The paper's contributions are three-fold. Firstly, it provides a fresh look into the most relevant mobile sensing networking solutions used to bring awareness about different aspects of human routine behavior (e.g., activities, likes and dislikes, mood). The study of these solutions and of their implications for the network is relevant, and is also pertinent in the context of improving the quality of life and social well-being. Secondly, it provides a comparison of features, such as the type of sensing, the sets of sensors used vs. the recognized activities, and the applied classification models. Thirdly, it considers challenges that future mobile crowd-based sensing frameworks should overcome.
The remainder of the document is organized as follows. Section 2 goes over work that has studied mobile sensing networking solutions that attempt to capture and to model human routine properties. Section 3 discusses the types of activities that such platforms can do today, the sensing approaches they rely upon, and the classification models used. Challenges that future frameworks need to work on are discussed in Section 4. Section 5 provides recommendations to assist the future development of large-scale sensing platforms focused on behavior inference. Section 6 concludes this work.

Related Work
A first line of related work concerns surveys focused on sensing platforms that track human routine aspects. In this context, Lane et al. provide a survey on sensing algorithms, systems, and applications, formulating a first architectural framework for discussing open issues and challenges in mobile sensing [17]. Atallah and Yang debate how pervasive mobile sensing could discriminate approaches to behavior pattern clustering and variability, as well as the influence of interaction between people and objects [18]. In a subsequent paper, the authors survey existing pervasive sensing systems focused on personal healthcare solutions, e.g., elderly or neonatal support [19]. Draghici et al. investigate methods and techniques for capturing crowd behavior through physical sensors, focused on automatically detecting information by positioning, tracking, and measuring collections of people [20]. This line of work is focused on centralized sensing frameworks, where the captured data are centrally treated (on the cloud), while our work focuses on edge-based platforms (personal mobile devices).
A second line of related work concerns the integration of human routine aspects into pervasive mobile sensing infrastructures. The survey by Rosi et al. examines proposals for integrating social and pervasive sensing infrastructures [21]. In this line of work, related surveys concern the development of sensing middleware, focusing on aspects such as data capture, data treatment, and visualization. Our survey, in contrast, contributes to the field of mobile sensing platforms with a categorization of open-source approaches focused on the inference of human behavior.
A third line of work related to this survey concerns analyses of how current mobile sensing networking frameworks perform activity recognition. That line of work categorizes recognized activities with human routine as a basis. For instance, Shoaib et al. study mobile systems developed to recognize physical activities [22]. The authors consider a characterization of work that comprises both theoretical and experimental contributions focused on sensor selection and resource management. Avci et al. study the application and process of activity recognition for healthcare via inertial sensors [23]. Lockhart et al. describe and categorize a variety of activity-recognition-based applications to assist in reinforcing the development of mobile sensing middleware for activity recognition [24]. Lara and Labrador [25] survey the use of wearable sensors for activity recognition, proposing a taxonomy in which human activity recognition systems are categorized based on response time and learning scheme. Incel et al. cover activity recognition based on sensors integrated into personal devices, with a special focus on personal health and well-being [26]. Lane and Georgiev [27] applied deep learning to activity inference. Wang et al. [8] and Nweke et al. [28] debate the application of deep learning to assist activity recognition. The attributes considered are the specific deep learning model to apply [8,28] and the sensor type [8].
While prior work has been focused on the design of activity recognition, our work focuses on analyzing which sensors become relevant for performing different types of activity recognition and which challenges need to be further addressed, as explained in Section 4.
A fourth category of related work concerns context awareness. The applications of context awareness are vast, and several related works have dealt with context awareness in ubiquitous and pervasive computing systems [29,30]. For instance, Saeed and Waheed discuss architectures that can support context-aware middleware, comparing aspects such as fault tolerance, adaptability, interoperability, architectural style, discoverability, and location transparency [31]. Makris et al. conducted a survey on context awareness in mobile and wireless environments, proposing a context-aware abstract architecture for these specific environments [32]. Bettini et al. studied context modeling and reasoning [33]. Bandyopadhyay et al. surveyed existing popular Internet of Things middleware solutions [34]. Bellavista et al. surveyed context distribution for mobile ubiquitous systems [35].
Edge computing, also known as fog computing [36], is a set of paradigms that assist computation, networking, and storage between the edges of the network and the cloud. The main goal of edge computing is to extend the cloud's capabilities to the edges of the network, thereby supporting real-time data processing and latency-sensitive applications. In edge computing, resources are dynamically distributed across the cloud and network elements based on quality of service (QoS) requirements [37].
In this context, the mobile edge computing architecture, currently under specification by the European Telecommunications Standards Institute (ETSI) [38], provides a relevant architectural model for mobile crowd sensing platforms. Edge computing provides cloud systems that are deployed closer to the users to meet their needs regarding processing and delay with minimum help from the Internet infrastructure [39]. Edge computing can assist in lowering latency by allowing data and computation of data to be placed closer to the end user. The idea in this context is that edge devices, such as gateways, switches, and routers, can store/serve application modules before they are sent to the cloud [40].
Our survey contributes to raising awareness of the need to consider context awareness so that pervasive mobile sensing tools can become more effective in the context of quality of life and social well-being.

Analysis of Selected Mobile Sensing Tools
Activity recognition via mobile sensing platforms is a relevant area being applied in diverse aspects of well-being analysis [41,42].
This section discusses selected work focused on behavior inference analysis and awareness related to aspects of social interaction. The selection methodology, which covered papers between 2009 and 2020 and which has already been described in Section 1, considered aspects such as the reusability of the selected work and its scientific impact. A categorization is provided in terms of the sensors used and the type of recognized activity, among other parameters. Our analysis relies on multiple features that are relevant for defining a mobile sensing system intended to perform activity recognition from an end-to-end perspective. These dimensions have been selected from related work and are summarized in Table 1. The table holds the following fields: Column 1 contains the tool analyzed and the year when the tool was first released. We highlight that, over the years, several tools, such as EmotionSense and NSense, have had updates and have been used in several studies that are cited in this paper. Column 2 contains the activities, i.e., the types of activities that the tool recognizes; Column 3 contains the type, referring to an activity being simple (S) or complex (C) [43]. A simple activity can be seen as an action with a repeated pattern, such as walking. A complex activity involves several actions, such as cooking, driving, and biking. Column 4 holds the type of sensor(s) considered in the tool, while Column 5 concerns the operating system and/or type of device. Column 6 includes the type of sensing approach followed, namely, opportunistic (O) [44] or participatory (P) [45] sensing, while Column 7 describes the classification tools relied upon. Column 8 describes the metrics used in activity recognition, while Column 9 describes the type of underlying network architecture used to compute and to store the data.
For instance, the tool may send all of the data to a cloud server, or part of the data can be locally classified and stored (edge).
The first middleware described in Table 1 is CenceMe (2007-2008) [46], a personal mobile sensing platform designed to capture activities, disposition (happy, sad, etc.), habits, and surrounding context (e.g., temperature). Based on a client/server model, CenceMe relies on a J48 decision tree classifier for motion detection derived from accelerometer data. A second classifier handles location detection (indoors/outdoors) by relying on GPS, Wi-Fi, and Bluetooth data, such as signal strength. The surrounding context, such as temperature, is also used to infer indoor/outdoor positioning. A third classifier is applied to the classification of mobility (stationary/walking/driving). This classifier considers GPS, Wi-Fi, and Bluetooth measures with regard to neighboring nodes, e.g., changes in the number of nodes around and received signal strength. A fourth classifier detects whether a user is engaged in a conversation based on the microphone. Moreover, CenceMe interacts with social networking applications (presence detection), opening up the possibility of creating new associations on social networks. CenceMe is, therefore, a relevant and interesting tool that already considers partial computation on the embedded devices, even though it is still based on a client/server architecture.
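To make this style of classification concrete, the sketch below extracts simple statistical features from windows of accelerometer samples and applies a small hand-written decision tree. It is a minimal illustration of the general technique only: the feature set, thresholds, and labels are hypothetical, not values from CenceMe's trained J48 model.

```python
import math
from statistics import mean, stdev

def features(window):
    """Compute mean and standard deviation of acceleration magnitude
    over a window of (x, y, z) accelerometer samples."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    return mean(mags), stdev(mags)

def classify_motion(window):
    """Toy decision tree over the two features above.
    Split thresholds are illustrative, not trained values."""
    _, sd = features(window)
    if sd < 0.5:           # little variation in magnitude -> no movement
        return "stationary"
    return "walking" if sd < 3.0 else "running"
```

A trained classifier (e.g., J48 in Weka, or scikit-learn's DecisionTreeClassifier) would learn such splits from labeled data instead of hard-coding them, but the inference step on the device has the same shape: window, features, tree traversal, label.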
SoundSense (2009) [47] relies on the microphone to perform activity recognition derived from sound analysis. It integrates a pre-processing module that provides data source adaptation (in this case, frame adaptation), and then relies on decision tree classification to perform detection of ambient sound, music, and speech patterns. In terms of human behavior, SoundSense is a relevant tool for the detection of speech and silence only. All of the computation is locally performed on the device. EmotionSense (2010) [48] is a mobile pervasive communication platform intended to assist in mood detection derived from, among other aspects, social interaction. EmotionSense captures emotional states based on sensed data, e.g., interaction between devices (Bluetooth beacons), speech vs. silence, and measurement of activity and location. By correlating the different types of data and relying as well on feedback from the user based on surveys periodically provided by the tool, EmotionSense is an interesting tool to assist eHealth studies. In terms of activity recognition, EmotionSense has been devised to consider only simple activity recognition. Its modular design implies that it can be easily extended. As for data storage and computation, the inference is performed on the device. Inferred data and user participatory data, provided upon consent, are sent to the cloud for further emotional inference.
AndWellness (2011) [49] is an open mobile sensing platform. AndWellness captures data via multiple sensors on a mobile device, e.g., the camera, GPS, accelerometer, and Wi-Fi. It combines such data with data provided by the user (participatory sensing). The main purpose of AndWellness is to assist eHealth participatory studies, e.g., tracking the well-being of breast cancer survivors and young mothers. AndWellness assists people in furthering personal behavior awareness [49]. Its architecture is based on a client-server model. On the end-user side, an Android application monitors daily habits and collects indicators of individual behavior. On the server side, specific campaigns can be configured, including the sensors to use and the types of data to collect. The server stores all collected data and provides a front-end interface for the users to view results in real time. AndWellness relies on the accelerometer and GPS to recognize simple activities, such as motion and location. For motion detection, it relies on a C4.5 decision tree classifier. AndWellness is, therefore, an interesting tool to assist in longitudinal eHealth studies involving the need to provide surveys. Its classification is mostly based on the accelerometer: indoors, it considers only the accelerometer, which is enough to distinguish between being still, walking, and running; outdoors, it requires GPS to further detect complex activities, such as driving and biking. As for computation and storage, AndWellness relies on the cloud.
BeWell and its successor, BeWell+ [51], are mobile sensing systems for eHealth that consider three health dimensions: sleep, physical activity, and social interaction. BeWell provides users with smart feedback on well-being, contributing to a better perception of the well-being component. Relying on three sensors, GPS, accelerometer, and microphone, BeWell performs activity recognition for mobility, sleep, and driving, as well as location and speaking/not speaking. For activities such as mobility and sleeping, it relies on a Naive Bayes classifier. For sleep detection, it provides a model derived from data entered by the user and from statistics over time (e.g., duration and frequency of sleep periods) correlated with aspects such as mobile phone charging, use of Wi-Fi, and periods of near silence. BeWell then computes scores for well-being based on cloud computing; inference of behavior is locally processed and then sent to the cloud together with data provided directly by the user. Classification is, therefore, done on the device (edge); score computation and data storage are based on the cloud.
InSense has been developed to assist in collaborative studies focused on aspects of human behavior. The middleware collects accelerometer and audio data from multiple smartphones, and then relies on cloud computing to support an analysis of similarity in terms of audio. The middleware relies on participatory and opportunistic sensing and has been applied in small-scale sensing studies involving elderly users and dementia-related signals. Opportunistic sensing has been used, for instance, to assist in detecting repetitive body movements, variations in walking gait, and abnormal audio patterns (e.g., crying, shouting), while questionnaires were used to evaluate aspects of social interaction. The collected data are locally filtered and then sent to the cloud, where an estimate of proximity (based on filtering of the individual audio fingerprints) is computed.
SociableSense (2011) [50] is a mobile sensing tool that infers levels of sociability derived from user habits. It relies on the classification approaches explored in EmotionSense, introducing an edge-cloud computational approach that takes energy consumption into consideration and distributes tasks across local and cloud resources based on network constraints and the need to provide feedback to the user in close-to-real time.
StudentLife (2014) [53] relies on opportunistic sensing to track, among other indicators, academic performance and behavioral trends, including human interaction, to infer aspects concerning mental health. StudentLife has been heavily applied in studies involving college students. StudentLife relies on multiple sensors and detects simple activities, such as movement vs. standing still, or periods of conversation vs. silence. It also provides a model for sleep detection, a complex activity. StudentLife relies on different classifiers for different activities. A decision tree model is used for detection of motion vs. standing still. A Markov model is applied to detect periods of conversation without requiring the storage of raw data. StudentLife then relies on additional aspects to infer status, such as the level of social interaction.
Sleep as Android (2015) [54] is a sleep-cycle-tracking middleware developed to analyze aspects such as sleep duration and sleep quality in a pervasive way. It relies on accelerometer measurements and recognition of movement activities. The algorithm used considers behavior learning derived from data provided by the user, such as usual sleep intervals. It also considers Google activity tracking, such as application usage, and detects "regularities", i.e., regular habit patterns (e.g., when sleep starts on specific days of the week).
NSense (2016) [55] has been developed as a tool to infer nearness levels, i.e., levels of physical and psychological proximity. NSense relies on opportunistic sensing and on a diversified set of sensors to, in a non-intrusive way, detect levels of social interaction and aspects that assist in finding habit correlations between different users. The user first configures a set of "interests", which are the basis to detect psychological proximity (similarity in interests). Such interests are directly exchanged among nearby users, if there is consent for such distribution, with the aim of fighting perceived isolation. A key difference of this tool from the others is the fact that the inference of behavior, as well as classification, is performed on the edge, i.e., solely locally on the devices.
CrowdMeter [56] is a mobile sensing tool that captures real-time congestion levels in train stations. CrowdMeter relies on sensed data provided by regular users during their daily commutes. CrowdMeter leverages the location and context data of the passenger, recognizing patterns in user behavior; for instance, walking. CrowdMeter can also sense environmental features, such as surrounding sound level. Such data are relevant for providing context to a specific routine. For instance, if the surrounding sound level is high and the user is located in a specific station, then congestion in the station can be identified.

Main Sensors Being Used to Perform Pervasive and Non-Intrusive Behavior Recognition
As hardware-based sensors become smaller, more portable, and, consequently, less intrusive, they are increasingly used to collect data for the analysis of human behavior [57]. Non-intrusive behavior recognition is based on sensors available on common devices carried and controlled by end users. In contrast, intrusive behavior recognition is provided by sensors specifically installed for a given purpose, e.g., a biometric sensor. For instance, Servia-Rodriguez et al. [58] provided a longitudinal study that combined data collected via mobile phones (e.g., location data, microphone data, accelerometer measurements, and call/SMS logs), i.e., opportunistic sensing, with self-reporting (participatory sensing) assessments performed twice a day by 18,000 users; these data were collected over 3 years to predict people's moods based on their activities, sociability, and psychological dimensions (e.g., perception of health and life satisfaction). Data analysis was performed using restricted Boltzmann machines (RBMs) for mood classification. Based on the study, they concluded that mood is interconnected with people's routines and that mobile sensing can be used to predict the user's mood with an accuracy of about 70%. In addition, Krupitzer et al. [59] propose a generalized self-adaptive fall detection framework that is robust to the heterogeneity of real-life situations and that independently adapts to inevitable changes in the position of the sensor at runtime, determining the current sensor position based on the user's movement pattern. They combine sensor data from four datasets. For fall detection, they used algorithms from other works, implemented with Weka, including configurations of Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Random Forest, and J48 decision trees, for comparison. The authors conclude that fall detection algorithms are often customized for the datasets used.
Still in the context of crowd sensing, Depatla et al. presented a crowd counting system that estimates the number of people inside an area by embedding the inter-arrival times between line-of-sight (LoS) blockages into a renewal stochastic process that mathematically models human motion. Based on the Received Signal Strength Indicator (RSSI) in a through-the-wall scenario, the system leverages Wi-Fi technology to count the total number of people walking inside a building [60].
Despite these advances, the most popular sets of sensors in use are the accelerometer/gyroscope and GPS. As shown in Table 1, all of the analyzed tools, with the exception of SoundSense (2009), rely on at least two of these sensors to recognize simple activities such as walking [57,61,62]. Social interaction detection, in turn, is often derived from location data (GPS) [63,64]. Some pervasive sensing tools, such as NSense, consider interaction based on Wi-Fi and Bluetooth data, i.e., the application of received signal strength to compute relative distance. This process is advantageous for indoor environments, as also stated in CrowdMeter [65,66].
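The received-signal-strength approach to relative distance can be sketched with the standard log-distance path-loss model. The reference power at 1 m and the path-loss exponent below are illustrative defaults, not values used by NSense or CrowdMeter; real deployments calibrate them per device and environment, and indoor multipath makes the estimate coarse.

```python
def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Estimate relative distance (meters) from RSSI using the
    log-distance path-loss model:
        RSSI(d) = RSSI(1 m) - 10 * n * log10(d)
    solved for d. Defaults are illustrative (free-space exponent n = 2)."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exp))
```

With these defaults, a reading of -40 dBm maps to roughly 1 m and -60 dBm to roughly 10 m; tools typically care only about relative proximity (closer/farther) rather than absolute distance, which tolerates calibration error.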
Most cases analyzed rely on a set of one to five sensors, usually considering the recognition of simple activities; mobility is the most commonly recognized activity. A few mobile sensing tools, such as NSense, BeWell, and AndWellness, have started to address the recognition of complex activities.
Moreover, several tools combine sensor measurement with context awareness and with data provided by the user to infer more complex aspects of human behavior, such as emotions, stress, or perceived isolation (reduction in nearness levels), as occurs with EmotionSense, StudentLife, and NSense.
In recent years, tools have been using additional sensors, such as the microphone, to help correlate levels of sound with interaction, as is the case of EmotionSense and NSense, among others [67,68].

Preferred Sensing Approaches
There are currently two main approaches to sensing: opportunistic and participatory sensing [69]. Opportunistic sensing does not imply user intervention: it is based on sensors available on devices, which passively capture data. In participatory sensing, by contrast, the user intervenes, either via self-reports or via specific actions that an application requests. In both cases, prior consent is mandatory.
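As a minimal illustration of how the two approaches differ at the data-collection level, the sketch below keeps passively captured readings separate from user-provided reports. The class and field names are hypothetical, not taken from any of the surveyed tools.

```python
import time
from dataclasses import dataclass, field

@dataclass
class SensingSession:
    """Toy session record combining both sensing approaches."""
    opportunistic: list = field(default_factory=list)  # passive readings
    participatory: list = field(default_factory=list)  # user-provided input

    def record_sample(self, sensor, value, ts=None):
        # Opportunistic: captured in the background, no user intervention.
        self.opportunistic.append((ts or time.time(), sensor, value))

    def record_self_report(self, question, answer, ts=None):
        # Participatory: explicit user input, e.g., an in-app survey
        # (prior consent assumed for both streams).
        self.participatory.append((ts or time.time(), question, answer))
```

Combining the two streams is what tools such as EmotionSense do when they use periodic surveys to confirm or label what the passive sensors detected.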
As shown in Table 1, all existing tools rely upon opportunistic sensing. A few tools (AndWellness, EmotionSense, SoundSense, and StudentLife) add participatory sensing, as this brings in the possibility to confirm specific aspects of data detection.
What this analysis corroborates is that opportunistic sensing is the preferred choice for solutions that concern well-being. More complete solutions can be built by combining opportunistic with participatory sensing [69][70][71].

Classification of Activities and Their Placement
While the classification of simple activities (e.g., motion, surrounding sound, conversation, proximity, location) is usually performed based on data collected from a single sensor, as explained in Section 3.1, the classification of more complex activities, such as social interaction and sleep patterns, requires data from more than one sensor. Currently, such classification is commonly based on two or three sensors, e.g., accelerometer and GPS, not due to precision aspects, but due to the ubiquity of these sensors.
For instance, BeWell and NSense classify physical activity based on decision tree models, while EmotionSense makes use of a discriminator function classifier. Sociometer makes use of a hidden Markov model to classify group interactions based on microphone and infrared data, while NSense classifies social interaction and propinquity, i.e., "the probability of social interaction to occur" [55]. For this purpose, NSense relies on the Wi-Fi, Bluetooth, accelerometer, and microphone sensors. Regarding sleep patterns, BeWell uses Gaussian models to derive patterns related to phone movement and surrounding sound based on data collected from the accelerometer and microphone. These sensed data are also combined with energy consumption.
Current pervasive mobile communication solutions rely on eager classification models without considering whether such classification targets simple or complex activity recognition. However, eager classification models present significant limitations for operation at the fringes of the network, given that personal devices, such as smartphones, have limited resources [5]. Eager learning requires continuous sensing strategies that are able to supply a significant amount of data, which has implications in terms of processing and energy usage, for instance.
As for where such computation is performed, all of the mobile sensing tools analyzed, with the exception of AndWellness, support classification of activities on the edge. This relates to the fact that such classification mostly concerns the recognition of simple activities. Behavior inference, in turn, is performed in the cloud. The single exception is NSense, which performs both classification and behavior inference on the mobile devices.

Current Applied Classification Metrics
The adequate selection of classification models and metrics is essential for assisting prediction of human interaction patterns and habits. Prediction is relevant both to provide better feedback to the user and to increase the efficiency of platforms. For instance, by applying machine learning to Wi-Fi-derived indicators (e.g., visited networks, neighborhood density), it is possible to predict indoor occupancy [72].
Moreover, prediction of behavior patterns is useful for the assessment and detection of abnormal behavior, which works as a control trigger in the system. Such a trigger can assist the system in adjusting different aspects in real time. For instance, the system may shift the set of sensors used to capture data [5], or the system may adjust the feedback provided to the user to prevent information overload [72]. Such an adjustment should take into consideration not only the prior digital footprint, but also external conditions (the context in time and space) that may or may not contribute to a deviation from the usual modeled pattern.
Currently, the preferred classification metrics are based on statistical properties, such as mean, variance, and standard deviation [25]. This selection is due to the ease of implementation and is not based on application or user/service requirements. However, a few solutions, such as AndWellness, consider more sophisticated metrics, such as the measurement of quality of participation in studies over time. NSense considers a measure of sociability (feedback to the user). EmotionSense also considers similar measures.
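The statistical metrics most platforms rely on can be summarized as a simple feature extractor over a window of raw sensor values (an illustrative sketch; each surveyed tool defines its own feature set):

```python
# Extract the statistical features commonly used for classification
# (mean, variance, standard deviation) from one window of sensor readings.
from statistics import mean, pvariance, pstdev

def window_features(window):
    return {"mean": mean(window),
            "variance": pvariance(window),
            "std": pstdev(window)}

print(window_features([1.0, 2.0, 3.0, 4.0]))
```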
The technological evolution assists the implementation of sophisticated metrics that combine statistical accuracy with social interaction classification aspects, such as sociability levels, awareness levels, etc. It is, therefore, relevant to consider metrics that target accuracy and efficiency in terms of classification, but that incorporate interdisciplinary metrics, which are better suited to assist platforms in scaling.

Discussion: Challenges for Mobile Sensing Infrastructures
A typical modular design for pervasive mobile communication solutions can assist in developing solutions that better address human behavior aspects. A modular design also assists in service decentralization. Such a framework design is expected to have at least four functional modules [17,22]: data capture, learning, inference, and feedback (individual and collective). Figure 1 illustrates the different modules and their interactions, as well as the processes that feed each module. Figure 2 illustrates a simple taxonomy of the different blocks discussed in this paper.
The methodology followed in this section to discuss current challenges for mobile sensing infrastructures is derived from the four identified functional blocks. Data capture is backed up by an adequate sensing methodology, as discussed in Section 4.1. Learning and inference rely on contextualization and classification, discussed in Sections 4.2 and 4.4, respectively. Feedback requires adequate prediction and QoS management. All of these functional modules need to take privacy and anonymity support into consideration, as discussed in Section 4.3.

Sensing
As discussed in Section 3.1, even though all of the studied solutions rely on opportunistic sensing, there is no consensus about the best paradigm to be applied for pervasive sensing systems.
Both opportunistic and participatory sensing have pros and cons concerning implementation complexity and data handling. Participatory sensing provides a user with a sense of control, and possibly with a reward [73]. The user can, therefore, control the data to be shared [74]. The data provided by the user also help in fine-tuning the system. However, participatory sensing requires the sound support of recruitment strategies to collect meaningful data [75][76][77][78][79].
In opportunistic sensing, the user provides consent, but data are collected passively (in the background) based on specific application requirements, e.g., using geolocation or battery usage [80][81][82]. Furthermore, it is possible to explore network overhearing without adding network entropy (e.g., the need for probing). Therefore, opportunistic sensing lowers the need for incentives and well-thought recruitment campaigns. The downside concerns the need to integrate mechanisms that assist in better service decentralization [83] and the need to improve the efficiency of large-scale sensing, which requires adequate security mechanisms [84,85].
As has been discussed and as is observable in the analyzed tools (cf. Section 3.1), prior studies have highlighted the advantages of opportunistic sensing, i.e., how it is better suited for large-scale environments [86][87][88][89]. Independently of the data collection approach to be used, the amount of data, information, and generated knowledge is expected to be substantial, and the platforms need to be able to adequately manage the resources required for this support [90].
Moreover, the use of mobile devices and multiple sensors for activity recognition is expected to increase, as shown in the previous sections and as corroborated by the recent COVID-19 situation. Thus, the extraction of relevant data, the composition of information, and the generation of knowledge, as well as its proper representation, need to be carefully addressed. If poorly selected, composed, and generated, the underlying networked system's operation may be endangered. Moreover, user and data privacy may also be at risk. By compromising the infrastructure that supports data storage and computation, there is the risk of producing invalid results in terms of behavior, activities, and habits.

Adequate Contextualization
The environment (i.e., context) where the user finds him/herself may influence his/her well-being [91]. With automatic context recognition, it is possible, for instance, to detect abnormal patterns, e.g., isolation of a user, falls, etc. [92][93][94]. Environmental indicators, e.g., geolocation and temperature, are also relevant in assisting sensing platforms to provide a more accurate activity recognition [43].
Moreover, social context is relevant as well, as it allows one to infer aspects concerning user and device interaction over time and space [95]. For this purpose, it is relevant to define "social" context. Currently, in the context of network architectures, social context is derived from human interaction, and the models applied are often simplified, relying on indicators such as encounter duration. Human interaction can also be modeled based on sociability levels [48,55]. Therefore, context awareness should take into consideration three different dimensions: physical (space, co-presence), social (embedding in groups), and relational (e.g., identification of similarities in interests or behavior patterns). According to Vaiseman et al., the context-aware component must be discreet and should not require the person to adjust their behavior for the application to succeed on a large scale [92].
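The three context dimensions discussed above can be made concrete as a plain record; all field names below are illustrative, not taken from any surveyed tool:

```python
# Hedged sketch of a context record covering the physical, social, and
# relational dimensions of context awareness.
from dataclasses import dataclass, field

@dataclass
class Context:
    # physical dimension: space and co-presence
    location: str
    nearby_devices: int
    # social dimension: embedding in groups
    groups: list = field(default_factory=list)
    # relational dimension: similarity of interests/behavior patterns
    interest_similarity: float = 0.0

ctx = Context(location="office", nearby_devices=4,
              groups=["team-a"], interest_similarity=0.7)
print(ctx.nearby_devices)
```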
Adequate contextualization is also relevant for providing a more efficient data transmission. For instance, via context-aware mechanisms, the network can opt to keep data on the edges or send them to the cloud [96]. This can be dependent on several aspects, e.g., node availability, network status, and data types and volume. It can also be dependent on social interaction aspects [97].
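A context-aware edge/cloud placement rule of the kind described above can be sketched as follows; the thresholds and inputs are hypothetical, and a real mechanism such as [96] would weigh many more factors:

```python
# Keep data on the edge when the payload is small and a capable node is
# available; otherwise offload to the cloud (all thresholds illustrative).

def placement(data_mb, node_available, battery_pct,
              max_edge_mb=50, min_battery=20):
    if node_available and data_mb <= max_edge_mb and battery_pct >= min_battery:
        return "edge"
    return "cloud"

print(placement(10, True, 80))    # small payload, healthy node -> "edge"
print(placement(200, True, 80))   # large payload -> "cloud"
```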

Privacy and Anonymity
Any pervasive mobile communication framework needs to take privacy and anonymity into account [98][99][100]. Pervasive sensing applications require the cooperation of strangers who will not trust each other [17] and, therefore, incentives to back such schemes are essential [101][102][103].
One relevant aspect concerns the need to protect user/device privacy. Even with obfuscation techniques, the data collected via sensors can reveal, for instance, user location [104] and user habits [105]. Keeping and treating data locally is a technique that can be used to circumvent this issue. For instance, assuming a large-scale event (such as a music festival), the data to be exchanged would track an increase in, e.g., device movement or surrounding noise in a specific cluster of devices, rather than considering the devices, their users, or the conversations being held. Therefore, pervasive sensing platforms should consider data aggregation support [99,102]. Moreover, it is essential to obfuscate parameters such as device identifiers, e.g., MAC address. Furthermore, and above all, independently of data being locally stored or sent to the cloud, the user must provide his/her prior consent.
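The two techniques mentioned above, identifier obfuscation and per-cluster aggregation, can be sketched as follows. This is a minimal illustration, not a surveyed implementation; the salt and the noise-level indicator are hypothetical:

```python
# Obfuscate device identifiers with a salted hash and aggregate readings per
# cluster, so only counts and averages (not per-device values) leave the edge.
import hashlib
from collections import defaultdict

SALT = b"deployment-specific-secret"   # hypothetical, provisioned per deployment

def obfuscate(mac: str) -> str:
    return hashlib.sha256(SALT + mac.encode()).hexdigest()[:12]

def aggregate(readings):
    """readings: list of (cluster_id, noise_level) tuples."""
    acc = defaultdict(list)
    for cluster, noise in readings:
        acc[cluster].append(noise)
    return {c: (len(v), sum(v) / len(v)) for c, v in acc.items()}

data = [("stage-a", 78.0), ("stage-a", 82.0), ("gate", 60.0)]
print(aggregate(data))   # {'stage-a': (2, 80.0), 'gate': (1, 60.0)}
```

Note that a salted hash alone does not guarantee anonymity; it only removes the plain identifier, which is why aggregation and prior consent remain essential.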
Some authors have been working on improving the privacy and anonymity of sensed data. For instance, Alsheikh et al. [106] proposed a secure framework based on incentives for crowd sensing. In this framework, the users can configure specific data anonymity levels individually. In [107][108][109], the authors presented a nonparametric privacy optimization framework with an interactive optimization algorithm to further enhance the privacy, but they did not take data fusion into consideration.

Classification
Routine modeling and an adequate identification of a digital behavior footprint across time and space require pervasive solutions to passively, and in real time, learn and adjust to the individual user's routine [5].
Prediction of behavior patterns is useful for assisting the network in better adjusting to the demand of different users. Such an adjustment should take into consideration not only the prior digital footprint [110], but also external commodities (the context in time and space) that may or may not contribute to a deviation from the usual network behavior or from human interaction patterns. For this purpose, classification models have to be applied.
Eager learning algorithms [111], such as decision trees (DT) [112][113][114] and neural networks (NN) [11,106,110,115], build explicit descriptions of target functions based on training data sets. Generalization beyond the training data is tried before receiving queries.
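The eager pattern can be illustrated with the simplest possible model, a one-feature decision stump: the threshold is fit once from the training set, so queries no longer need the training data. This is a didactic sketch, not one of the cited algorithms:

```python
# Eager learning in miniature: fit a decision stump up front, then answer
# queries from the learned threshold alone.

def fit_stump(samples):
    """samples: list of (value, label) with labels in {0, 1}."""
    best = None
    values = sorted(v for v, _ in samples)
    for i in range(len(values) - 1):
        t = (values[i] + values[i + 1]) / 2
        errors = sum((v > t) != (label == 1) for v, label in samples)
        if best is None or errors < best[1]:
            best = (t, errors)
    threshold = best[0]
    return lambda v: 1 if v > threshold else 0   # training data discarded

classify = fit_stump([(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)])
print(classify(0.8))   # -> 1
```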
Lazy learning algorithms [111,116], such as the k-nearest neighbor (k-NN) algorithm, are more commonly applied in wireless sensor networks. Lazy learning algorithms store training data and wait until a query (test tuple) is performed [13,106]. Hence, this category of algorithms incurs low computational costs during training, but may incur high computational costs at query time. In the context of online analysis of mobile sensing data, where it may be necessary to continually retrain an eager learner, running a lazy learning algorithm with storage in the cloud may prove beneficial and improve computational performance.
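The lazy alternative can be sketched as a pure-Python k-NN classifier that stores the training tuples verbatim and does all of its work at query time; the one-dimensional feature and labels are illustrative:

```python
# Lazy learning in miniature: no training phase, majority vote among the
# k nearest stored samples at query time.

def knn_predict(training, query, k=3):
    """training: list of (value, label); query: a scalar feature."""
    neighbours = sorted(training, key=lambda s: abs(s[0] - query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

data = [(0.1, "low"), (0.2, "low"), (0.8, "high"), (0.9, "high"), (0.85, "high")]
print(knn_predict(data, 0.75))   # -> "high"
```

The contrast with the stump above is exactly the eager/lazy trade-off discussed in the text: nothing is computed until a query arrives, but every query must scan the stored data.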
Case-based reasoning (CBR) [117][118][119] stores "cases", namely, prior experiences (dataset contextualization) and the solutions for those experiences. CBR assumes that problems recur and that similar problems have similar solutions. It provides a straightforward way to solve new problems based on its (prior) dataset of cases. CBR is often used in recommendation systems [120] and is a lazy learning method that uses the k-NN approach.
Memory-based reasoning (MBR) [106] is a lazy learning method that usually relies on k-NN to operate [121]. The process here is to store all of the training data and retrieve instances from memory that are similar to the query instance. The result is applied in the classification of a current instance. MBR differs from CBR in the fact that CBR uses some form of domain theory for the case matching and adaption process, while MBR relies entirely on similar examples from memory found in the training data and avoids the knowledge engineering phase employed by other artificial intelligence approaches. This property makes MBR a powerful tool for classification in the context of analysis of fused data from pervasive sensing devices. An advantage of using the k-NN algorithm compared to other popular machine learning algorithms is also its simplicity in understanding and implementation.
To mitigate the aforementioned limitations, researchers have been investigating methods to integrate eager classification models into pervasive mobile networking architectures. One possibility, for instance, is to reduce the resources consumed during continuous sensing activities. Alternatively, it is feasible to send data to the cloud and perform classification learning there. Examples of methods aiming to reduce the resources used in sensing activities include hierarchical sensor management strategies, balancing the performance of applications and sensing activities, or monitoring topology changes and adapting the rate of sensing queries. Regarding the exploitation of cloud computing for the development of pervasive mobile communication platforms, in their work on SociableSense [50], K. Rachuri et al. show that tools that rely on continuous sensing require adaptation of cloud computing for efficient data upload. SociableSense relies on classification to assist task placement on the edges or in the cloud, taking into consideration energy consumption, latency, and the resulting throughput.
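Task placement of the kind described for SociableSense can be sketched as a weighted-cost comparison; the weights, normalization, and cost values below are illustrative, not the published scheme:

```python
# Hedged sketch of multi-criteria task placement: each task runs where a
# weighted cost over energy, latency, and (inverse) throughput is lowest.

def place_task(options, weights=(0.5, 0.3, 0.2)):
    """options: {"edge"/"cloud": (energy, latency, inv_throughput)} as
    normalized costs in [0, 1]; lower is better."""
    we, wl, wt = weights
    cost = {site: we * e + wl * l + wt * t
            for site, (e, l, t) in options.items()}
    return min(cost, key=cost.get)

# Energy-heavy local computation versus a slower but cheaper cloud option:
print(place_task({"edge": (0.8, 0.1, 0.2), "cloud": (0.3, 0.6, 0.1)}))
```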
Regarding behavior inference, the standard approach for data mining until recently has been to collect raw data and send them to cloud servers, where they are filtered, classified, and mined to identify and analyze statistical properties, e.g., mobility or interaction (encounters, distance) patterns [122]. This process is time and resource consuming [112] and, above all, raises critical privacy issues. Hence, an important challenge in the context of mobile sensing networking concerns the use of data mining techniques at the edge, instead of, or as a complement to, sending data to the cloud [122].
Eager classification models, such as neural networks, seem better suited for crowd sensing in wireless environments [123,124]. The reason is that eager classification seems to fit devices on the edges, for instance, personal devices such as smartphones [5]. This could be achieved via a hierarchical classification strategy. Such a strategy holds benefits in terms of data capture, as data can be kept locally instead of being sent to the cloud.
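One possible reading of such a hierarchical strategy is sketched below: a cheap local model answers confident cases on the device, and only uncertain samples are escalated. The rule-based local model, thresholds, and labels are all hypothetical:

```python
# Hierarchical classification sketch: classify on the edge when confident,
# escalate to a (mock) cloud classifier otherwise.

def local_classifier(feature):
    """Returns (label, confidence) from a trivial rule-based model."""
    if feature < 0.3:
        return "stationary", 0.9
    if feature > 0.7:
        return "moving", 0.9
    return "unknown", 0.4

def hierarchical_classify(feature, cloud_fn, min_conf=0.6):
    label, conf = local_classifier(feature)
    if conf >= min_conf:
        return label, "edge"
    return cloud_fn(feature), "cloud"

cloud = lambda f: "moving" if f >= 0.5 else "stationary"
print(hierarchical_classify(0.1, cloud))   # ('stationary', 'edge')
print(hierarchical_classify(0.5, cloud))   # ('moving', 'cloud')
```

The design choice matches the text: confident cases never leave the device, which saves uplink traffic and keeps most raw data local.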
It is worth highlighting that data extracted from a single sensor (such as an accelerometer) are already often classified locally and used to recognize different activities, such as sitting, jogging, or walking. Nevertheless, concerning sensing middleware, several sensors are usually applied to perform activity recognition, as explained in Section 3. With data fusion, the classification process becomes more complex in terms of data volume and possible features. The risk for misclassification also increases due to the higher data variability. Hence, a first requirement to be able to classify data on the network edges derived from data acquired by multiple sensors is to reduce the computational cost of the required classification models. However, there are still few studies focused on this paradigm [27,125,126]. To fulfill such a requirement, it is necessary to evaluate whether or not lazy classification models suit the limitations of the edge of the network. A recent paper specifically focused on healthcare discusses an ML recommendation system on the edge, where ML and recipe search tasks are placed in the edge, thus reducing the overall latency and computational impact on mobile end-user devices [127].
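The single-sensor case mentioned at the start of the paragraph can be illustrated with a windowed accelerometer pipeline; the thresholds are hypothetical, chosen only to show how the standard deviation of the acceleration magnitude separates still from moving windows:

```python
# Illustrative single-sensor activity recognition: variability of the
# acceleration magnitude over one window maps to a coarse activity label.
from math import sqrt
from statistics import pstdev

def activity(window):
    """window: list of (x, y, z) accelerometer samples in m/s^2."""
    magnitudes = [sqrt(x * x + y * y + z * z) for x, y, z in window]
    s = pstdev(magnitudes)
    if s < 0.5:          # nearly constant magnitude
        return "sitting"
    if s < 3.0:          # moderate variability
        return "walking"
    return "jogging"     # strong variability

still = [(0.0, 0.0, 9.8)] * 10
print(activity(still))   # -> "sitting"
```

With data fusion, as the paragraph notes, the feature space grows well beyond one magnitude statistic, which is precisely what raises the computational cost of classification at the edge.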

Categorization of the Different Approaches
Leveraging a new generation of mobile sensing platforms that can cope with new Internet challenges, such as mobility and distributed services, requires support on different fronts, as has been debated throughout the previous sections. To better assist future work, Table 2 provides a summary of the studied work according to the area. The table starts with data capture, providing related work that has been discussed for opportunistic sensing, participatory sensing, and hybrid approaches. Learning and contextualization work is categorized based on the focus on learning of routine habits (e.g., human interaction, sociability levels, inter-contact times) and on context-awareness applicability aspects. Inference and classification work is categorized in terms of activities being recognized, placement of classification models (e.g., edges of the network), specific models being applied, and classification metrics in use. Feedback and behavior inference work is split into providing behavior awareness or supporting well-being, as well as according to aspects of inference of human behavior. Security is split into privacy, anonymity, and incentives, which are a relevant aspect for bootstrapping pervasive mobile sensing systems, particularly in large-scale environments. Last, related work that has provided contributions to the specific topic is listed.

Recommendations for Future Research
Pervasive sensing middleware is highly relevant in the context of contributions to societal well-being and quality of life, as can be seen today with, for instance, the COVID-19 situation. A common modular design for such frameworks, where the different sensors are adequately mapped to activity recognition, is relevant for creating tools that can more efficiently achieve their purpose. Overall, pervasive mobile sensing is expected to further increase in the context of areas such as mobile crowd sensing and the Internet of Things. It is, therefore, relevant to debate how to approach future solutions, particularly those based on a consolidated view of sensing, activity recognition, and the required computational and networking support to sustain mobile sensing frameworks.
By understanding social interaction aspects, such as similarities in human routines, it is feasible to assist in improving well-being and quality of life. This paper provides a thorough review of related work focused on behavior inference and social interaction awareness. The paper then discusses the different functional modules that mobile sensing platforms for activity recognition need to consider. We have reviewed the most relevant open-source pervasive sensing solutions developed to bring awareness about different aspects of human routine behavior, and warn about the current challenges and limitations faced.
Derived from the current analysis, we propose the following recommendations for future work:
• The data collected should be kept private and anonymous, as mandated today by privacy regulations such as the General Data Protection Regulation (GDPR). This aspect requires adequate data treatment and filtering, and must ensure that feedback and visualization do not endanger individuals in any way. For that purpose, the network architecture should consider treating data as much as possible in end-user devices or close to the end user (at the edge of the network). Aspects concerning privacy and anonymity are covered in Section 4.3.
• The analysis described in this paper, based on the extensive related work, shows that behavior inference for simple activities (as well as for complex activities, as demonstrated by the middleware NSense) can be at least partially located on the edge. Furthermore, Edge AI [134] is addressing this aspect today via the distribution of artificial intelligence applications across the cloud-edge continuum. Therefore, whenever feasible (given the associated computational cost), classification and inference mechanisms should be made available locally, thus reducing the need for users to be always connected. The possibility to export data should be given to the user, but not be an underlying assumption. Moreover, the selection of specific classification models needs to take data fusion into consideration. Data fusion can provide a lighter software design and is also relevant for providing finer-grained behavior inference. Classification and behavior inference aspects and today's approaches are discussed in Section 4.4.
• Mobile sensing platforms need to be designed with energy consumption aspects in mind. In pervasive sensing platforms, the use of multiple sensors implies heavy energy consumption, thus limiting the potential of these solutions in large-scale scenarios. From this perspective, which has been discussed in Section 4.2, it is also important to highlight the role of opportunistic wireless routing approaches that take energy consumption into consideration [103,135,136].

Conclusions
This article provides a review of the challenges faced by large-scale mobile sensing platforms that have been devised for human activity recognition. The review integrates an analysis of selected open-source mobile sensing tools and their categorization in terms of different aspects, such as types of sensors used, computation placement, recognized activities, and types of classification. Derived from this analysis, the review provides recommendations for the future design of mobile sensing platforms, namely, aspects to assist in overcoming the identified challenges in future applications. Mobile sensing applications-and, in particular, mobile crowd sensing-are again gaining ground as a category of technology to assist with different aspects of physical and social well-being. Nonetheless, specific frameworks to provide a better design in terms of usability, behavior inference, or even the specific sensors and classifiers to apply to large-scale decentralized analysis are missing; this is, therefore, a relevant field of work for current and future research in the context of edge/cloud computing, the Internet of Things, and decentralized application architectures.