1. Introduction
Mobile devices (MD) are popular because of their portability, robust connectivity, and ability to install and run third-party applications. In their normal course of activities, MD users may connect to several cloud services and thus become part of the mobile cloud computing (MCC) environment. MCC refers to the concept of providing users with flexible, reliable, and anywhere-and-anytime access to cloud-stored data and other cloud services, combining the powerful resources of cloud computing (CC) with wireless network technologies [
1]. However, some of the connected networks forming the mobile cloud may not be secure and may expose the applications (apps) that reside on the MD to security threats (e.g., breaches of data integrity, confidentiality, and service availability [
2]). In addition, when personal MDs are used in the workplace, there is a possibility of introducing security threats to corporate networks [
3].
Furthermore, many of the apps installed on MDs are inherently risky due to the broad range of permissions they request to operate, such as access to user location data, user contacts, or photo galleries. Apps downloaded from reputable app markets such as Apple’s App Store and Google Play may not be intentionally malicious, but if excessive permissions are granted, personal information, sensitive data, and user privacy can nevertheless be compromised [
4]. For example, when an end-user MD is part of an Internet of Things (IoT) ecosystem, a malicious app that resides on one device may negatively affect other connected devices. The threats to the IoT environment related to mobile app vulnerabilities have been mostly associated with devices using the Android mobile operating system [
5] due in part to the open publication policy that allows users to download apps from both official and unofficial market stores. In turn, malware developers have shifted their attention to targeting apps that can be deployed on such open platforms [
6].
Android app developers do not always consider the potentially harmful effects of requesting multiple permissions for effective app operation, and more specifically, how the requested permissions can be manipulated and misused to breach user privacy [
7]. In more recent Android versions, the permission system affords users a certain degree of control over granting permissions by allowing it to be done at runtime rather than during the installation of the app. Nevertheless, granting permissions at runtime does not solve the problem of malicious apps gaining access to sensitive personal information, as users may not have the knowledge required to grant only the permissions necessary for a particular app to operate. For example, a game app may request permissions to access user location data and to read the user’s contact list. If granted, these privileges may lead to a privacy breach [
8].
Inter-app communication channels may also pose a risk. For example, apps that appear benign on their own may become capable of performing a malicious task when working together (malware collusion). Such potentially dangerous apps may be hard to detect [
9].
While earlier research in app security investigated how to distinguish between malicious and benign apps [
10,
11,
12,
13,
14], more recent models and methods focus on evaluating the potential harmfulness of an app rather than classifying it as malicious or benign. For example, Feng et al. [
4] used app permissions and descriptions to determine the “riskiness” of an app. Similarly, Wang et al. [
15] proposed a framework that quantified app riskiness based on the permissions requested.
Alshehri et al. [
7] developed a model that measures the security risk of Android apps based on the permissions that the user allows. The model (named PUREDroid) estimates the magnitude of the damage that might occur as a result of excessive permission-granting. For each app resident on the device, PUREDroid first creates two orthonormal state vectors representing the permissions the app has and has not requested. Then it determines the risk score of each app based on the number of times other known benign and malicious apps have also requested the permissions requested by the app. Higher-scoring apps are deemed potentially malicious. However, the model’s accuracy is relatively low: benign apps that request excessive permissions also receive a high risk score and are thus deemed potentially malicious.
Also, based on app permission analysis, Rashidi et al. [
16] proposed a risk assessment model named XDroid that monitors the resource usage of Android devices. Adopting a probabilistic approach (hidden Markov model), XDroid analyzes app behavior and performs an adaptive assessment of the apps residing on the device. Users select the resources they want monitored, and the system then alerts the user of suspicious activities related to the selected resources. As XDroid relies on user decisions, a lack of user expertise may negatively affect the choice of resources to be monitored, and thus the system’s effectiveness.
The model proposed by Jing et al. [
17] helps the user understand and mitigate the security risks that are associated with mobile apps and, in particular, with Android-based apps. The model (RiskMon) computes a risk score baseline that is derived from the runtime behavior of trusted apps and user expectations. The risk score baseline results are used to evaluate actual app behavior and generate cumulative app risk scores. RiskMon increases an app’s baseline risk score every time an app attempts to access a sensitive or critical device resource. The model considers permission-protected resources only, assuming that these resources are reachable only if the respective permissions are granted. RiskMon reinforces resource protection by implementing automatic permission revocation that does not require user consent; this may have a detrimental effect on the effectiveness and efficiency of the MD user activities when using some services.
A comprehensive three-layer framework for assessing the risk posed by mobile apps was proposed by Li et al. [
18]. Using a Bayesian graphical model, the system conducts static, dynamic, and behavioral analyses to assess the risk that the app introduces to the mobile environment. The framework provides the user with information about apps that have lower risk profiles. However, the risk assessment is completed only after the app has been executed and the results of the analyses of all three layers have been combined into one final app risk score. Similarly, the models proposed by Kim et al. [
5,
19] consider the app’s actual behavior, but the app needs to be executed to enable risk assessment. Such an approach leaves the MD dangerously vulnerable to the risk of compromise.
Baek et al. [
20] proposed to measure the frequency of potential security events (e.g., financial loss or loss of private data) as an indicator of the riskiness of an app. They used a set of known benign and malicious apps and applied an unsupervised learning approach to create an app risk map. The device user can decide about using a particular app based on the plotted frequencies on the risk map. The model does not recommend or perform preventative actions when a risk is identified. Similarly, Kong et al. [
21] propose a comprehensive risk evaluation approach that includes gathering information about the app from several sources. However, this model also relies on user judgment when determining the risk posed by an app.
More recent research investigates how to increase the accuracy of the prediction. For example, both Urooj et al. [
22] and Boukhamla & Verma [
23] propose ensemble machine learning (ML) models that work with a wide range of static features (including app permissions and intents). Panigrahi et al. [
24] improved the feature selection process by adopting a high-performing nature-inspired approach to select the most suitable static features for the ML classification model (HyDroid). The DroidDetectMW model that was proposed by Taher et al. [
25] uses both static and dynamic features to classify apps as either benign or malicious and utilizes a multi-class ML model to determine the category to which a malicious app belongs. However, the effective implementation of these complex solutions may be hampered by the limited processing power of the MD.
The research reviewed recognizes the importance of assessing the risk posed by mobile apps residing on the MD. Three major challenges to accurate app risk evaluation can be highlighted:
How to reduce or eliminate dependency on inherently unreliable user input.
How to establish the riskiness of an app without having the need to execute it first.
How to increase the accuracy of the risk evaluation so that apps are not falsely categorized as risky.
In this research, we develop and evaluate a framework for app risk assessment that addresses these challenges. In addition to using app permissions as important risk characteristics, the framework considers app intents to capture data about app-to-app communication that may aid in collusive behavior. The framework includes an ensemble ML classification model and a probabilistic app risk assessment evaluator. It does not require user input or running the app that is being evaluated. Rather, an app is assigned a risk category based on the app’s classification (benign or malicious) by the ML classifier, and the app’s probabilistically estimated riskiness. Using this two-pronged approach mitigates the risk of falsely classifying an app as malicious.
The rest of the paper is organized as follows:
Section 2 describes the Android OS security mechanisms used in this research and the methods involved in data collection and analysis. The proposed risk assessment framework and the evaluation results are presented and discussed in
Section 3 and
Section 4, where directions for further research are also outlined.
2. Materials and Methods
To ensure that the app activities do not affect other apps or the performance of the MD, each app operating on an Android OS platform uses its own sandbox [
26]. This approach provides a secure environment since requests for communication with another app or for access to any sensitive resources are granted only if the app has the required permissions. Most apps, however, need resources outside their sandboxes, and such access is possible only if the corresponding permissions are granted. This user-controlled mechanism of granting privileges may expose the MD to potentially dangerous attacks [
7]. For example, the user may unknowingly grant excessive permissions to a malware app that may use them to gain unauthorized access to sensitive personal information such as messages and call records [
4]. Therefore, app features that were used for classification and risk evaluation in this study reflect two of the security mechanisms of the Android OS: the app permissions and the app intents as declared in the app’s manifest file.
2.1. Android OS Security Mechanisms Used in This Study
App permissions protect the MD’s functional capabilities. Once permission is granted, the app can invoke an API call for the functionality it needs. A complete list of all permissions required by an app is stored in the manifest file of the app. Most of the permissions used by the app belong to one of two major categories: normal permissions and dangerous permissions. Normal permissions are deemed to pose little or no risk to user privacy and security, and by default are granted to the app without informing the user [
26]. On the other hand, for better protection, dangerous permissions (clustered into nine functional groups) are under user control, to be granted (or denied) at runtime (for Android OS 6.0 and above). However, a significant limitation of this model is the risk of granting privileges that exceed the scope of the functionality needed by the app. For example, an app might request the READ_PHONE_STATE permission; if granted, the app will have access to data such as the MD phone number, the SIM (Subscriber Identity Module) card serial number, the SIM operator, and the IMEI (International Mobile Equipment Identity). Moreover, the app will also have access to all other functionalities in the same functional group, such as the ability to make a phone call. MD users may not always be aware of the consequences of granting excessive or dangerous permissions [
27].
In addition to normal and dangerous permissions, a third category, signature permission, comprises permissions that protect even more sensitive functional capabilities. One example is the WRITE_SETTINGS permission, which allows an app to modify system settings. For added protection, the app needs to send an “intent” before the setting change can be authorized.
Lastly, app intents are metadata components that are readily available in the manifest file of every Android app. An intent communicates the intention of an app to perform a certain action. It is a mechanism for coordinating different functional activities, including access control to the resources used by other apps within the MD. The purpose is to prevent an app from gaining direct access to other app data without having appropriate permissions. In this way, the intent mechanism controls what an app can do after it is installed, including intra-app and inter-app communication. The intent filter that is included in the app manifest file is also used to communicate the type of intent that an app can receive [
24]. For example, an app may be designed to perform several different activities, with each activity on its own page and the user moving from one page to another. Appropriate intents will enable the passing of data from one activity to another, and from one activity component (e.g., an action button) to another component [
28]. The intent mechanism provides an added layer of protection for sensitive MD resources and functional capabilities. Like permissions, however, intents can be exploited by malware developers [
11]. For example, two apps that appear benign on their own may, in fact, be designed to communicate with each other to perform a malicious task.
2.2. Study Sample and Datasets
To build the study’s datasets, over 30,000 Android Package (APK) files were collected from the AndroZoo [
29] and RmvDroid [
30] repositories. The apps’ APK files were screened with the antivirus engines of Virus Total,
https://virustotal.com (accessed on 15 January 2022). The results were used to assign a label (benign or malicious) to each app. An app was deemed malicious if at least 15 of the VirusTotal antivirus engines flagged it as malicious, and benign if none of the antivirus engines flagged it as malicious. This resulted in a sample of 28,306 apps comprising 9879 benign and 18,427 malicious apps.
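The labeling rule can be stated compactly. A sketch, with the VirusTotal detection count taken as an input (the client code that queries VirusTotal is out of scope here):

```python
def label_app(positives):
    """Map a VirusTotal detection count to a study label.

    positives: number of antivirus engines that flagged the APK.
    Returns "malicious", "benign", or None (app excluded from the sample).
    """
    if positives >= 15:
        return "malicious"
    if positives == 0:
        return "benign"
    return None  # flagged by 1-14 engines: ambiguous, dropped
```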
The APK files were decompiled to obtain the manifest file of each app by applying the APK Easy tool,
https://apk-easy-tool.en.lo4d.com/windows (accessed on 3 February 2022). Next, the permissions and intents of each app were extracted using custom-built software. It was established that across the collection of apps in the sample, a total of 132 unique permissions and 131 unique intents were used. Three study datasets were constructed as follows:
The dataset “Permissions” includes all apps in the sample. Each app is represented by a record that maps the use of the 132 unique permissions. The record comprises 132 binary input features (1 indicating that the respective permission is used by the app, and 0 indicating not used), and a label (1—malicious, 0—benign).
The dataset “Intents” was constructed similarly, with 131 binary input features indicating intent use and a label (benign or malicious).
The dataset “Hybrid” also includes all apps. Each app is represented by 263 features that map the combined permission and intent use, and a label (benign or malicious).
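The construction of a dataset record can be sketched as follows (the vocabularies and the app representation are simplified illustrations):

```python
def encode_record(app_perms, app_intents, perm_vocab, intent_vocab, is_malicious):
    """Build one dataset record: binary indicators followed by the label.

    perm_vocab / intent_vocab are the ordered lists of the 132 unique
    permissions and 131 unique intents observed across the sample.
    """
    perm_bits = [1 if p in app_perms else 0 for p in perm_vocab]
    intent_bits = [1 if i in app_intents else 0 for i in intent_vocab]
    label = 1 if is_malicious else 0
    # "Permissions" record: perm_bits + [label]
    # "Intents" record:     intent_bits + [label]
    # "Hybrid" record (returned here): perm_bits + intent_bits + [label]
    return perm_bits + intent_bits + [label]
```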
The data indicated that not all permissions and intents were highly used.
Table 1 and
Table 2 show how the 25 most used normal and dangerous permissions and intents are used by the apps in the study sample.
While both benign and malicious apps used some of the dangerous permissions, several dangerous permissions were used far more often by malicious apps than by benign apps. One such permission was READ_PHONE_STATE, mentioned earlier, which was requested by 96.52% of the malicious apps in the sample. Another example is GET_TASK, used by 50.17% of the malicious apps; this permission enables access to all apps resident on a device and, if obtained by a malicious app, can lead to significant compromise. In contrast, only a small proportion of the benign apps in the study sample requested these two permissions (6.49% and 25.84%, respectively).
Similarly, another five permissions (ACCESS_COARSE_LOCATION, ACCESS_FINE_LOCATION, READ_LOGS, MOUNT_UNMOUNT_FILESYSTEMS, SYSTEM_ALERT_WINDOW) were used more frequently by the malicious apps than by the benign apps in the sample. In addition, both benign and malicious apps requested the dangerous permissions WRITE_EXTERNAL_STORAGE and READ_EXTERNAL_STORAGE, with malicious apps still exhibiting more frequent use (91.47% vs. 63.61% and 33.42% vs. 30.58%, respectively). The dangerous permission request patterns that emerged indicated that app permissions may be used to make inferences about the potential of an app being malicious.
With respect to the usage pattern of intents in the sample, the two most often declared intents were Action_MAIN and Category_LAUNCHER, which were used both by benign and malicious apps with similar frequencies. However, there were some intents that showed relatively higher use by malicious apps compared to benign ones; for example, Action_USER_PRESENT (to know whether the MD is in use or idle) and Action_PACKAGE_ADDED (to add code to resident apps or to install a new app). While the percentage of malicious apps using these intents was not high in itself, their usage patterns provide additional insights into the app’s potential riskiness if considered in conjunction with the respective permission usage patterns.
Each of the study datasets was divided into two parts: 20% for testing and 80% for training and validation. The test datasets contained permissions and/or intents from 5661 apps (1995 benign and 3666 malicious apps). The training and validation sets contained permissions and/or intents from 22,645 apps. ML modeling was carried out using the Python programming language. The computer hardware included an Intel (R) Core (TM) i7-8700 CPU @3.20 GHz, 16 GB RAM, and a 500 GB hard disk drive.
3. Results
3.1. Risk Assessment Framework
The proposed risk assessment framework is shown in
Figure 1. It includes an initial preparatory stage and an operational stage. At the initial stage, the ensemble ML classifier is trained and tested on input data that includes apps’ permission and intent usage indicators using a suitably constructed dataset. The trained classifier resides in the mobile cloud. The dangerous permissions used in calculating apps’ risk scores are also selected at the preparatory stage, and the respective dangerous permission risk scores are calculated using a probabilistic approach (as discussed in
Section 3.3). In preparation for the operational stage, the ensemble ML classifier is downloaded and installed on the user’s MD along with the risk scores of the selected dangerous permissions and the software needed for the app risk score calculation and app risk evaluation.
At the operational stage, the app risk evaluator extracts the permissions and the intents of the apps that are resident on the MD and feeds them to the trained ensemble ML classifier. The classifier checks each resident app and classifies it as either benign or malicious, while the app risk evaluator applies the risk function to calculate the app’s risk score. For each app, the risk function considers the permissions that belong to the set of the already selected dangerous permissions. Finally, the app risk evaluator assigns a risk category to each app.
The app risk evaluator aims to provide an accurate risk assessment at all times. Whenever a resident app is updated, the app risk evaluator automatically rescans and re-evaluates the updated app, since new permissions or intents may have been added in the new version. Furthermore, when the cloud ML model is retrained on a new dataset, the updated version is pushed to the MD and the app risk evaluator re-evaluates the resident apps.
3.2. Ensemble ML Model
A series of experiments were conducted to identify the most suitable algorithms for inclusion in an ensemble classifier to build an effective ML-based app classification model. The selected ML classification algorithms (shown in
Table 3) have been used extensively in related prior research [
31].
Each algorithm was applied to each of the three study datasets. The outputs were compared to identify the best-performing classification algorithm based on the following confusion matrix:
True Positive (TP): The number of malicious apps classified correctly as malicious (in a given dataset).
True Negative (TN): The number of benign apps classified correctly as benign (in a given dataset).
False Negative (FN): The number of malicious apps classified wrongly as benign (in a given dataset).
False Positive (FP): The number of benign apps classified wrongly as malicious (in a given dataset).
The performance metrics used are listed below. The performance metric values obtained are shown in
Table 4.
Classification Accuracy (CA): The percentage of all correctly classified apps (TP + TN) out of all apps in a given dataset (TP + TN + FP + FN).
Error Rate (ER): The percentage of all misclassified apps (FP + FN) out of all apps in a given dataset (TP + TN + FP + FN).
Precision (PR): The percentage of the correctly classified malicious apps (TP) out of all apps classified as malicious in a given dataset (TP + FP).
Recall (RC): The percentage of the correctly classified malicious apps (TP) out of all malicious apps in a given dataset (TP + FN).
False Positive Rate (FPR): The percentage of benign apps wrongly classified as malicious (FP) out of all benign apps in a given dataset (TN + FP).
False Negative Rate (FNR): The percentage of malicious apps wrongly classified as benign (FN) out of all malicious apps in a given dataset (TP + FN).
False Alarm Rate (FAR): The mean of FPR and FNR.
F-Measure (FM): The harmonic mean of PR and RC.
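All eight metrics follow directly from the four confusion-matrix counts. A sketch, with percentages expressed as fractions in [0, 1]:

```python
def metrics(tp, tn, fp, fn):
    """Compute the study's performance metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    ca  = (tp + tn) / total       # classification accuracy
    er  = (fp + fn) / total       # error rate
    pr  = tp / (tp + fp)          # precision
    rc  = tp / (tp + fn)          # recall
    fpr = fp / (tn + fp)          # benign apps wrongly flagged as malicious
    fnr = fn / (tp + fn)          # malicious apps wrongly passed as benign
    far = (fpr + fnr) / 2         # false alarm rate: mean of FPR and FNR
    fm  = 2 * pr * rc / (pr + rc) # harmonic mean of precision and recall
    return {"CA": ca, "ER": er, "PR": pr, "RC": rc,
            "FPR": fpr, "FNR": fnr, "FAR": far, "FM": fm}
```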
Consistent with the data about intent usage by malicious and benign apps in the study sample, ML models using intents as the only features performed poorly compared to models using permissions only or those using both intents and permissions. Moreover, combining both permissions and intents as features consistently produced better results compared to using permissions only in terms of accuracy (CA) and false alarm rate (FAR). These results lend support to considering both app permissions and intents in the proposed risk assessment framework.
The three best-performing classifiers in terms of CA, FAR, and false negative rate (FNR) were C2 (Random Forest), C1 (Decision Tree), and C7 (K-Nearest Neighbor). Applied to the Hybrid dataset, the top classifier (C2) achieved a CA of 94.59%, a precision (PR) of 96.05%, and a recall (RC) of 95.58%.
The outcomes of the tenfold cross-validation (
Table 5) indicated that C1, C2, and C7 performed better than most of the other algorithms. On the Permissions dataset, C9 achieved the highest maximum CA overall (96.11%), but C2 had the highest minimum CA (91.17%). On the Hybrid dataset, C1 and C9 shared the highest maximum CA (95.41%), while the highest minimum CA again belonged to C2 (90.46%). Considering these results, C1, C2, and C7 retained their position as the top-performing algorithms amongst the ten evaluated.
The top best-performing algorithms discussed above (Decision Tree, Random Forest, and K-Nearest Neighbor) were used to construct the ensemble ML classifier. The voting approach selected was “hard voting” (i.e., simple majority voting). Hard voting is reasonably effective for models working on discrete data such as the data used in this study. In addition, we aimed to build a relatively simple model that would not create computational overload when deployed on a low-resource MD.
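Hard voting itself is just a majority over the base models’ predictions. A minimal sketch with the trained base classifiers abstracted as callables (the actual Decision Tree, Random Forest, and K-Nearest Neighbor models are not reproduced here):

```python
def hard_vote(predictions):
    """Simple majority over per-classifier binary labels (1 = malicious)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0

def ensemble_predict(base_classifiers, features):
    """Collect each trained base model's prediction for one app's
    feature vector and return the majority label."""
    return hard_vote([clf(features) for clf in base_classifiers])
```

With three base models, any two agreeing labels decide the outcome, which keeps the combination step cheap enough for a low-resource MD.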
The ensemble ML model used the app’s permissions and intents as features. It was trained and tested on the study’s Hybrid dataset described earlier.
Table 6 presents the results along with the results obtained from applying each of the algorithms of the ensemble to the same dataset. The comparison shows that the ensemble ML model performed better than the individual classifiers. It achieved a CA of 97.40%, error rate (ER) of 2.61%, PR of 97.92%, RC of 98.03%, and FAR of 2.87%.
The ensemble ML model proposed and tested in this study outperformed the ML model considered in our earlier work [
32]. The performance metrics achieved supported the use of the ensemble ML model discussed above as the “benign/malicious” app classifier in the risk assessment framework.
3.3. App Risk Evaluator
We now introduce our app risk evaluator. It determines the risk category of an app based on the classification output of the ML model, the estimated risk scores of a selected set of dangerous permissions, and the estimated risk score of the app.
3.3.1. Dangerous Permission Risk Estimation
The risk scores of the selected dangerous permissions are estimated as follows. Let P = {P1, P2, P3, …, Pn} be a non-empty finite set containing n dangerous permissions Pi, where 1 ≤ i ≤ n. For any given set of apps X, let αi be the ratio of the number of malicious apps in X that use Pi to the number of all malicious apps in X. That is,

αi = |{malicious apps in X that use Pi}| / |{malicious apps in X}|

Similarly, let βi be the ratio of the number of benign apps in X that use Pi to the number of all benign apps in X. That is,

βi = |{benign apps in X that use Pi}| / |{benign apps in X}|

For any i in the interval [1, n], 0 ≤ αi, βi ≤ 1.

We calculate the risk score of a dangerous permission Pi that belongs to the set P as the value of the function r(i) of αi and βi defined by Equation (1), with 0 ≤ r(i) ≤ 1. The risk score of a dangerous permission Pi estimates the risk posed by the use of this permission based on its usage patterns across the set of apps X.
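The usage ratios αi and βi are straightforward to compute from a labeled app set. Since Equation (1) itself is not reproduced in this text, the sketch below substitutes one plausible instantiation, r(i) = αi / (αi + βi), purely for illustration; it is not the paper’s exact formula:

```python
def usage_ratios(apps, permission):
    """Return (alpha, beta) for one permission over a labeled set of apps.

    apps: iterable of (requested_permissions, is_malicious) pairs.
    """
    mal = [perms for perms, malicious in apps if malicious]
    ben = [perms for perms, malicious in apps if not malicious]
    alpha = sum(permission in p for p in mal) / len(mal)
    beta = sum(permission in p for p in ben) / len(ben)
    return alpha, beta

def risk_score(alpha, beta):
    """Illustrative stand-in for Equation (1): the share of the permission's
    use attributable to malicious apps. NOT the paper's exact formula."""
    if alpha + beta == 0:
        return 0.0
    return alpha / (alpha + beta)
```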
3.3.2. App Risk Score
The app risk score is estimated as follows. Let Z be a set of apps undergoing risk assessment. For any app z that belongs to Z and for each dangerous permission Pi from the set P, we define the indicator function λ(z, i) by

λ(z, i) = 1 if app z requests permission Pi, and λ(z, i) = 0 otherwise.

For any app z in Z, let k(z) be the sum of the app’s λ(z, i) values:

k(z) = λ(z, 1) + λ(z, 2) + … + λ(z, n)

For any app z, the value of k(z) is an integer where 0 ≤ k(z) ≤ n.

For any app z in Z, we calculate the app’s risk score as the value of the function R(z) defined by Equation (4), which combines the risk scores r(i) of the dangerous permissions requested by z. R(z) is a probabilistic function with a value in the interval [0, 1] (i.e., 0 ≤ R(z) ≤ 1). It estimates the riskiness of the app z based on the combined risk potential of the dangerous permissions requested by z that belong to the set of dangerous permissions P. The app’s risk score is used by the app risk evaluator to assign the app a risk category. The effectiveness of the approach of using the statistics of a suitable dataset to determine the riskiness of an Android app is supported by results reported in prior work [33,34].
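A sketch of the per-app scoring follows. Because Equation (4) is not reproduced in this text, the mean r(i) over the k(z) requested dangerous permissions is used as an illustrative stand-in for R(z); it is an assumption, not the paper’s exact formula:

```python
def lam(app_perms, permission):
    """lambda(z, i): 1 if app z requests dangerous permission Pi, else 0."""
    return 1 if permission in app_perms else 0

def app_risk_score(app_perms, perm_risk):
    """Illustrative R(z) in [0, 1]; perm_risk maps each Pi to its r(i).

    Assumption (not the paper's exact Equation (4)): the mean r(i) over
    the k(z) dangerous permissions the app actually requests.
    """
    scores = [r for p, r in perm_risk.items() if lam(app_perms, p)]
    k = len(scores)  # k(z): number of requested dangerous permissions
    return sum(scores) / k if k else 0.0
```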
3.3.3. App Risk Category
The app risk category is determined as follows. Let Z be the set of all apps installed on a device. For each app z that belongs to Z, the app risk evaluator first calculates the app’s risk score R(z) (i.e., the value of the risk function) using the risk scores r(i) of a known set of n dangerous permissions {Pi}, where i = 1, 2, …, n. The app’s risk category is then determined as follows:
An app z that belongs to the set Z of all apps installed on a device is considered a low-risk app if, and only if, one of the following two conditions is satisfied:
The ensemble ML model classifies the app as malicious and 0 ≤ R(z) ≤ t1, where t1 is a predetermined threshold value in the interval (0, 1).
The ensemble ML model classifies the app as benign and 0 ≤ R(z) ≤ t2, where t2 is a predetermined threshold value in the interval (0, 1) such that t1 < t2.
An app z that belongs to the set Z of all apps installed on a device is considered a medium-risk app if, and only if, one of the following two conditions is satisfied:
The ensemble ML model classifies the app as malicious and t1 < R(z) ≤ t3, where t3 is a predetermined threshold value in the interval (0, 1) such that t2 < t3.
The ensemble ML model classifies the app as benign and t2 < R(z) ≤ t4, where t4 is a predetermined threshold value in the interval (0, 1) such that t3 < t4.
An app z that belongs to the set Z of all apps installed on a device is considered a high-risk app if it meets none of the conditions above (i.e., the ensemble ML model classifies the app as malicious and R(z) > t3, or classifies it as benign and R(z) > t4).
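The categorization rules above can be written as one function. A sketch, using as defaults the threshold values reported later in Section 3.4.2 (t1 = 0.25, t2 = 0.50, t3 = 0.65, t4 = 0.75):

```python
def risk_category(is_malicious, R, t1=0.25, t2=0.50, t3=0.65, t4=0.75):
    """Assign low/medium/high given the ML label and the risk score R(z).

    Thresholds satisfy 0 < t1 < t2 < t3 < t4 < 1; the defaults are the
    values used in the validation experiment (Section 3.4.2).
    """
    # A "malicious" label tightens both cut-offs; a "benign" label relaxes them.
    low_cut, high_cut = (t1, t3) if is_malicious else (t2, t4)
    if R <= low_cut:
        return "low"
    if R <= high_cut:
        return "medium"
    return "high"
```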
3.4. Risk Assessment Framework Validation
An instance of the framework was implemented to validate our approach. We used the trained ML classifier described in
Section 3.2 and the precalculated risk scores of a selected set of known dangerous permissions. The selection of the known dangerous permissions and the permission risk score calculations are described below.
3.4.1. Known Dangerous Permissions
For the purposes of the experiment, we constructed the set of known dangerous permissions
PV = {PVi, i = 1, 2…15} from the top 15 dangerous permissions P1, P2, P3…P15 listed in
Table 1. These permissions were of particular interest, as they enable access to personal user data stored in the device.
The risk scores
r(
i) (
i = 1,2…15) of the selected dangerous permissions were calculated by applying the formula in Equation (1) (
Section 3.3.1) to the apps in the set of apps
X of the study sample (described in
Section 2.2). The results are shown in
Table 7.
The dangerous permission READ_PHONE_STATE (P2) had the highest risk score (0.8460); as mentioned earlier, this permission gives access to the SIM card and device details such as the IMEI and may enable targeted attacks. User privacy data could also be jeopardized if the dangerous permission ACCESS_COARSE_LOCATION (P3) were misused; this permission had the second-highest risk score (0.7154). The third-highest risk score (0.6063) was that of the dangerous permission GET_TASK (P5), which provides access to the MD activity data.
The dangerous permission RECORD_AUDIO (P11) had the lowest risk score (0.2115); when granted, this permission does not enable access to sensitive user data or device resources. The results were consistent with the functionalities associated with the top 15 dangerous permissions and their related permission groups.
3.4.2. Resident App Risk Assessment
To complete the framework validation experiment, we created an instance of the operational layer of the framework introduced in
Section 3.1 (
Figure 1) on an Android MD. The operational layer instance comprised a copy of the trained ML classifier described in
Section 3.2 and purpose-built software for the app evaluator described in
Section 3.3. The app evaluator used the set of known dangerous permissions and their risk scores, as described in the preceding section. The threshold values used to assess the risk of the apps resident on the MD were determined experimentally.
The validation sample was a set, ZV, of 20 apps downloaded from Google Play, belonging to ten different functional areas. These apps were not part of the study datasets, as they were posted to the store after the study-sample apps had been harvested. The apps were downloaded and installed on the MD. Their APK files were checked by VirusTotal; none of the antivirus engines raised a flag for any of the apps, so all apps were deemed benign.
First, the trained ensemble ML classifier was applied to the validation sample and each app was classified as benign or malicious. Next, the app risk evaluator calculated a risk score for each app using the formulae in Equations (2)–(4) (
Section 3.3.2). Finally, each app was categorized as high, medium, or low risk following the algorithm described in
Section 3.3.3. The risk assessment results are shown in
Table 8 for threshold values
t1 = 0.25,
t2 = 0.50,
t3 = 0.65, and
t4 = 0.75.
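As a sketch of how these thresholds might combine with the classifier's verdict, the function below encodes one plausible reading of the algorithm in Section 3.3.3: the lower threshold pair (t1, t3) applies when the ML classifier flags the app, and the higher pair (t2, t4) otherwise. This pairing is an assumption, chosen to be consistent with the outcomes reported in the Discussion, not the paper's exact rule.

```python
def categorize(ml_is_malicious, risk_score,
               t1=0.25, t2=0.50, t3=0.65, t4=0.75):
    """Hypothetical combination of the ML verdict and the app risk score.
    An app flagged by the classifier is judged against stricter cut-offs
    (t1, t3); an app classified as benign is judged against (t2, t4)."""
    low_cut, high_cut = (t1, t3) if ml_is_malicious else (t2, t4)
    if risk_score < low_cut:
        return "low"
    if risk_score < high_cut:
        return "medium"
    return "high"
```

Under this reading, an app flagged as malicious with a risk score between 0.25 and 0.65 lands in the medium-risk category, matching the ZV7/ZV17 outcome discussed below.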
Across the set of assessed apps, the number of requested dangerous permissions varied from zero to ten. Of the 20 apps, seven were categorized as low risk and 13 as medium risk.
4. Discussion
As seen in
Table 8, none of the 20 apps in
ZV was categorized as a high-risk app. This was not unexpected, as all the apps in the validation sample were benign. The ML classifier correctly classified 18 apps as benign and wrongly classified two apps as malicious. We further analyzed these two apps (ZV7 and ZV17) to gain insight into these misclassifications. It appeared that these apps requested a relatively high number of dangerous permissions from the set
PV (eight and ten, respectively). However, the apps’ risk scores were between 0.25 and 0.65, hence the app risk evaluator categorized them as medium-risk apps (rather than as high-risk apps).
An app’s risk category depends not so much on the number of known dangerous permissions requested as on the risk scores of those permissions. For example, apps ZV11 and ZV19 requested the same number of dangerous permissions (five) but were categorized as low and medium risk, respectively: ZV19 requested the second-highest-scoring dangerous permission, P3, while ZV11 requested the lowest-scoring permission, P11.
In another example, apps ZV15 and ZV16 each requested just one dangerous permission (P1). However, P1 is a high-risk permission, second only to P2, so both apps were categorized as medium risk. Of the seven apps categorized as low risk (ZV1, ZV2, ZV4, ZV5, ZV11, ZV14, ZV18), only one (ZV5) requested a high-risk permission (P2). In general, the medium-risk apps in the set ZV tended to request not merely a larger number of permissions but permissions with higher risk scores.
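This observation can be made concrete with the per-permission scores reported in Section 3.4.1. The sketch below assumes, purely for illustration, that the app-level score is the mean of the requested permissions' scores; the paper's actual aggregation is defined by Equations (2)–(4).

```python
# Per-permission risk scores quoted in Section 3.4.1 (only these four are given).
PERM_SCORES = {"P2": 0.8460, "P3": 0.7154, "P5": 0.6063, "P11": 0.2115}

def app_risk(requested, scores=PERM_SCORES):
    """Mean risk score of the known dangerous permissions an app requests;
    an illustrative stand-in for the paper's aggregation formula."""
    known = [scores[p] for p in requested if p in scores]
    return sum(known) / len(known) if known else 0.0

# Two apps requesting the same *number* of dangerous permissions can land in
# different risk categories when the scores of those permissions differ:
high_example = app_risk(["P3"])   # high-scoring permission
low_example = app_risk(["P11"])   # low-scoring permission
```

Even with this crude mean, an app requesting the high-scoring P3 ends up well above one requesting only the low-scoring P11, mirroring the ZV19 versus ZV11 contrast above.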
According to [
35], the activities of malicious and unwanted apps have contributed significantly to the growth of the number of attacks on mobile devices. Installing fake apps on Google Play is one of the methods used by adversaries to obtain personal data about the MD user [
36]. Indeed, it is pointed out in [
37] that uploading malicious apps on Google Play has become quite common.
The proposed risk evaluation framework addresses the concerns of researchers and industry about the need to provide effective endpoint security, including on mobile devices, and the challenges identified in
Section 1. First, the framework does not require user input. Second, it does not overload the MD, as it relies on the MCC environment to support its operations: the ML training and testing, the selection of the set of dangerous permissions, and the calculation of their risk scores all occur outside the MD. Third, the risk assessment is carried out without the need to run the app. In addition, including the apps’ intent usage allows the ML classifier to learn inter-app communication patterns, enabling the identification of apps that may attempt to evade existing “permission-only” malware detection techniques. The combined use of these two types of app features adds to the reliability of the classification output, as the model learns about new approaches to malware collusion (i.e., two apps that appear benign on their own but may in fact be communicating with each other to perform a malicious task).
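The combined use of permissions and intents as features can be illustrated as a binary feature vector. The vocabularies and helper below are hypothetical and stand in for the paper's actual feature set defined in Section 3.2.

```python
def feature_vector(app_perms, app_intents, perm_vocab, intent_vocab):
    """Encode one app as a binary vector over a fixed permission vocabulary
    followed by a fixed intent vocabulary. Including intents lets the
    classifier observe inter-app communication patterns that a
    permission-only model would miss."""
    return ([1 if p in app_perms else 0 for p in perm_vocab]
            + [1 if i in app_intents else 0 for i in intent_vocab])
```

An app that requests few permissions but uses unusual intent actions still produces a distinctive vector, which is what allows the classifier to catch colluding apps that look benign on permissions alone.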
Furthermore, the framework is flexible and adaptable to changes in the environment. As shown in
Figure 1, the MDs that belong to the mobile cloud receive regular updates to maintain the app evaluator’s accuracy. This allows necessary changes such as modifying the ML model or the training and testing dataset and changing the set of known dangerous permissions to be fed seamlessly into the risk assessment software operating at the MD level. The threshold values used to determine the risk category can also be varied. For example, they can be increased for MDs whose particular use is less tolerant of risk. These mechanisms address, to a degree, the issue of maintaining the sustainability of the model [
22] given the changing threat and attack landscape.
The experimental results demonstrate the effectiveness of the proposed risk assessment approach and how combining the two risk assessment methods (i.e., the ML classifier and the app evaluator) can act as a “check and balance,” leading to an appropriate risk categorization of the resident apps. Although the validation sample was small, the results indicate that the risk assessment framework is realistic and reliable.
The approach proposed and evaluated in this study compares favorably to the latest research that focuses on app risk assessment using app features. Xiao et al. [
38] determine the minimum set of permissions an app needs according to its functionality and use the difference between that minimum set and the set of permissions the app actually requests to identify the app’s “overprivileged” (unnecessary) permissions. An app is classified as risky if one or more of the unnecessary permissions are also risky (where the “risky” permissions are preidentified by an ML model). This approach has two limitations, however. First, it relies heavily on the assumption that Android app developers routinely request unnecessary permissions, a trend that may not persist. Second, it may not correctly classify malicious apps constructed by repackaging benign apps.
Yang et al. [
39] also propose a risk assessment framework (PRADroid). It comprises a risk assessment matrix that considers the likelihood of an app being malicious (based on the app’s permissions) and the severity of the consequences of data leakage. The computational engine includes an ensemble model of five ML classifiers and a code analyzer that examines the information flow path from the source API to the sink API. This framework has not been implemented as a prototype.
Dhalaria & Gandotra [
40] have developed a web-based application that lets the user upload an app’s APK file to check the app’s risk profile. The risk profile is determined by an artificial neural network model that uses the requested static permissions as app features. While the reported classification accuracy is relatively high and the MD is not computationally overloaded, the tool operates as a standalone application and requires significant user input.
5. Conclusions
This study’s contribution is an effective and feasible risk assessment framework for the protection of the data on mobile devices connected to the mobile cloud. First, to determine the risk scores of the selected dangerous permissions, the risk assessment model takes into account the relative frequency of permissions used by both benign and malicious apps in the training set. Permissions more likely to be used by malicious apps receive a higher risk score than permissions more likely to be used by benign apps. This ensures the model is not subject to the bias that could be introduced if only malicious apps were used. Second, rather than classifying an app as either benign or malicious, the model categorizes the app as low, medium, or high risk. The probabilistic evaluation combines the classification output of the ensemble ML model with the app risk score calculated by the app evaluator. Both approaches contribute to an effective reduction of the false alarm rate of the risk model.
Scalability is an important consideration that has not been addressed to date, given the early-stage nature of the research and the fact that the tool is still, in essence, a minimum viable product (MVP). It will need attention as the system is more widely deployed; however, nothing inherent in the way the tool has been built or implemented suggests that it cannot scale effectively given sufficient computing resources. Another aspect to consider is developing a cross-platform solution that could work across different mobile operating systems. The challenge here is reconciling the security mechanisms and policies of the different operating systems to build a uniform product. In the case of iOS, it may also be hard to obtain the apps needed to build a substantial dataset.
Directions for further research include the development of even more accurate ML models, experimental work to establish guidelines for setting threshold values, and extending the evaluation to include dynamic features such as API calls. Another avenue is the development and evaluation of an on-demand, cloud-based risk service model for comprehensive MD risk assessment.