Research on Data Mining of Permission-Induced Risk for Android IoT Devices

With the growth of the Internet of Things (IoT), more and more devices connect to the Internet through Android applications to provide various services. IoT devices are used for sensing, controlling and monitoring different processes, and most of them use Android applications for communication and data exchange. Therefore, a secure Android permission mechanism is required to increase the security of apps. According to a recent study, a new malicious Android application appears almost every 10 s. To resist this malware campaign, we need approaches that identify malware applications effectively and efficiently. Most studies focus on detecting malware through static and dynamic analysis of the applications. However, analysing risky permissions at runtime is a challenging task. In this study, we first propose a novel approach to distinguish between malware and benign applications based on permission ranking, similarity-based permission feature selection, and association rules for permission mining. Secondly, the proposed methodology enhances the random forest algorithm to improve the accuracy of malware detection. The experimental results demonstrate high malware detection accuracy, which is pivotal for Android apps that aim for secure data exchange between IoT devices.


Introduction
Nowadays, smart devices have become part of the Internet of Things (IoT) and are widely used in almost every domain, such as banking, shopping and social networking. The adoption of the Android platform for smart devices provides a variety of services all over the world. Smart devices that use the IoT architecture include smart TVs, refrigerators, automobiles and wrist watches. Figure 1 shows some application areas that use Android apps for connecting IoT devices. In the IoT framework, smartphones provide numerous useful services to clients through their built-in sensors (accelerometer, gyroscope, compass, GPS) and connectivity options (i.e., NFC, RFID, WiFi, Bluetooth). Moreover, smartphones are used to control and monitor personal and business premises, for example controlling the air temperature through Android apps or using CCTV cameras to track work progress. Consequently, in our increasingly connected society, the number and scope of Android devices keep growing. It is estimated that there will be roughly 6.1 billion smartphone users by 2020 [1,2]. On the other hand, Android devices are an increasingly attractive target for online criminals who try to obtain personal details (i.e., location, contact numbers, accounts, photos, etc.) [3]. Malicious hackers use Android applications as a tool to break into devices. In addition, most Android devices do not run anti-virus or malware detection applications [4][5][6]. Hackers therefore pose a severe threat to privacy through the leakage of data such as the user's location, contact information and smartphone certificates. The previous literature reports several techniques to address the problem of malware attacks, such as sandboxing, access control, signature mechanisms and authority mechanisms [7][8][9]. In [3], the M0Droid framework based on API calls was proposed; it generates signatures that are pushed to the client devices for threat identification. TaintDroid [10] is an exception-based malware discovery system based on the data usage behaviour of applications. Canfora et al. [11] designed a structure that monitors the frequencies of Android Dalvik op-codes to detect malware applications. Burguera et al. [12] proposed the Crowdroid framework, which has client and server components; the client uses the strace mechanism of the Linux system to monitor Android system calls. Kim et al. [13] proposed the CopperDroid framework, which detects the behaviour of Java code and native code execution. Although these methods are very effective, they cannot be directly applied to mobile and IoT devices because of the limited resources of such devices, such as memory and power. Several techniques have been developed for selecting permissions [14][15][16][17], for instance the information gain technique [14], but they considered only high-risk permissions and ignored low-risk permissions.
Permission control is a key problem in the security of the Android operating system. Android permissions enforce restrictions on specific operations to offer concrete security features. However, this places a significant responsibility on app developers to declare the least-privileged set of permissions required by their apps, and it is up to the app users to understand the risk of granting certain permissions. The Android platform provides plenty of documentation, but only limited information regarding permissions. The lack of reliable permission information may lead app developers to request unnecessary permissions, which results in overprivileged applications. Additionally, unnecessary dangerous permissions can be abused by malware apps to mount permission-oriented attacks. Moreover, the lack of information about the risk level of permissions leaves users unsure whether to install an app. Currently, the Android permission mechanism does not help users to make correct decisions about the risk or security of an app [14,17-21].
In this paper, we propose a methodology with improved accuracy for malware detection that extracts features from Android application permissions. The behavioural analysis of permissions helps to distinguish malware from benign apps. Specifically, we adopt three data mining techniques: (1) permission ranking; (2) similarity-based permission feature selection; and (3) association rules for permission mining. Permission ranking ranks the permissions according to their risk. Similarity-based permission feature selection collects the subsets of permissions (individual permissions and groups of permissions) that cause security breaches in malware apps. The association rules for permission mining discover meaningful relationships between the permissions. In addition, the random forest classifier is the most commonly used algorithm in malware detection. Therefore, we improve the accuracy of the random forest algorithm for permission-induced malware detection. The improved random forest iteratively removes unnecessary features and restricts the upper limit on the number of generated trees based on the essential and unnecessary features. As a result, the improved random forest algorithm works with a reduced set of the most essential features. The experimental results validate that the proposed approach detects malware more accurately than previously reported techniques. Our contribution is threefold:
1. We develop a permission-based feature selection approach using (1) permission ranking and (2) similarity-based permission feature selection to identify an essential subset of permissions.
2. We evaluate the effectiveness of the mined association rules for permission-based detection, which improves the accuracy of prediction.
3. Finally, we enhance the performance of the random forest algorithm by iteratively removing unnecessary features and setting an upper limit on the number of trees in the random forest to improve the accuracy and recall rate, which leads to secure data exchange between IoT devices and Android devices.
The remainder of the paper is organised as follows. Section 2 reviews previous research on Android malware detection, with two major solutions. Section 3 gives an overview of the proposed methodology based on features extracted from application permissions. The experiments and results are discussed in Section 4. The last section concludes the paper and summarises the results.

Literature Review
In this section, we briefly review the related literature. Firstly, we discuss static analysis, which consists of two methods: (i) permission-based analysis and (ii) API call-based analysis. Secondly, we describe dynamic analysis, which is used to extract the training characteristics of the model. We also consider hybrid analysis, which combines static and dynamic analysis. Finally, we compare static, dynamic and hybrid analysis.

Static Analysis
Recent work that applies machine learning to static features for Android malware detection includes the following. Cen et al. [22] proposed a model based on API calls and permissions. This method utilizes regularised logistic regression (RLR), which was compared with other machine learning techniques such as SVM, KNN, decision tree and naive Bayes. DroidMat [23] classified malware and benign apps according to intents, permissions and API calls; on the extracted static features, the authors used the k-nearest neighbors (k-NN) and k-means clustering algorithms. In [24], an SVM classifier was developed for on-device malware detection; the proposed algorithm is based on API calls, permissions and network access. Yerima et al. [9,25] used random forest ensemble learning models to detect malware based on permissions, API calls, embedded commands and intents. Wang et al. [26] applied SVM, decision trees and random forest to analyse the use of vulnerable permissions for malware detection. Varsha et al. [27] extracted static features from the manifest and the application executable files; their detection strategy applied SVM, rotation forest and random forest to three datasets. DAPASA [28] used sensitive sub-graphs to construct five features depicting invocation patterns; the random forest algorithm achieved the best detection performance.
Wang et al. [26] used logistic regression, linear support vector machines, random forest and decision trees. The authors achieved a TPR (true positive rate) of 96% and an FPR (false positive rate) of 0.06% with logistic regression, which was the best result among the classifiers used. The dataset used in [26] was composed of 18,363 malware applications and 217,619 benign applications and was used to describe app-specific static signatures and stage-specific static features. Most of the literature explores machine learning approaches for malware detection [26,29-35]. The feature extraction methods are shown in Figure 2. Tables 1 and 2 show some static feature detection methods and the feature sets, respectively. The k-nearest neighbours classifier achieves good performance and accuracy in malware detection. However, it requires more processing time on large amounts of data. That is why most authors use Support Vector Machine and Random Forest classifiers. Therefore, we use and enhance the Random Forest algorithm for Android malware detection.

Permission-Based Analysis
Since the Android security model is based on application permissions, the permission set is extracted from the manifest file. Every application must hold the privileges needed to access different features. During installation, the Android platform asks the user whether to grant the requested permissions. Some permissions can be exploited by malicious applications. For example, a malicious application may combine the permissions to access the SD card and the Internet in order to read and leak sensitive information stored on the SD card. Our approach is to model the groups of Android permissions requested by malicious applications. Therefore, we propose a method that uses the presence of a specific permission as a feature for a machine learning algorithm.
Among the 145 permissions, 48 are risky permissions, which are listed in Table 3. There are several techniques for selecting permissions [14,16,17], for instance the information gain technique [14] and the rank permission technique. We choose the set of 48 risky permissions from the previous literature [17]. In this paper, we design association rule mining and permission ranking methods for the detection of malware.
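As a minimal sketch of how the presence of a permission can serve as a feature, the following Python code reads the uses-permission entries of a manifest and maps them to a binary vector. It assumes the manifest has already been decoded to plain-text XML (manifests inside an APK are stored in a binary format); the function names and the small vocabulary used in the usage note are illustrative and not part of the proposed method.

```python
import xml.etree.ElementTree as ET

# Namespace used by the android:name attribute in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the set of short permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    permissions = set()
    for element in root.iter("uses-permission"):
        full_name = element.get(name_attr)
        if full_name:
            # "android.permission.READ_SMS" -> "READ_SMS"
            permissions.add(full_name.rsplit(".", 1)[-1])
    return permissions


def to_feature_vector(permissions, vocabulary):
    """Encode a permission set as 0/1 features, one per vocabulary entry."""
    return [1 if name in permissions else 0 for name in vocabulary]
```

With a hypothetical vocabulary such as ["INTERNET", "SEND_SMS", "READ_SMS"], an app requesting INTERNET and READ_SMS is encoded as [1, 0, 1]; in practice the vocabulary would be the selected risky permission set.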

Suspicious API Calls
The second solution is a static analysis of the source code of the app. Malicious code usually uses a combination of services, methods and API calls that is not common in non-malicious applications [12]. To differentiate malicious from non-malicious applications, machine learning algorithms can learn common malware behaviour, such as combinations of APIs and system calls. Figure 3 shows some of the suspicious API calls that are mostly used by malware applications. Figure 4 shows the features extracted from the APK file, which contains the classes.dex file.

Dynamic Analysis
AntiMalDroid [43] is a machine learning method that extracts dynamic features; its detection technique uses behavioural sequences as feature vectors with an SVM. DroidDolphin [44] also uses a Support Vector Machine with dynamically obtained features. Afonso et al. [45] proposed tracking dynamic API calls and framework calls, and studied support vector machines, J48, IBk (an instance-based approach), BayesNet Pool, BayesNet K2, Random Forest, and Naive Bayes.
Figure 5 shows the feature extraction method and detection technique of dynamic analysis. Many machine learning algorithms are used for dynamic analysis, for instance logistic regression (LR), k-means clustering, SVM, KNN_E, KNN, Bayesian network (BN), and Naive Bayes. Table 4 lists the accuracy levels, dynamic features and detection methods.

Hybrid Analysis
To improve the performance of learning algorithms, hybrid analysis was developed, which uses both dynamic and static features, as shown in Figure 6. Some researchers proposed multi-classification techniques [49,50] to obtain high accuracy in hybrid analysis. The static features are publisher ID, API calls, class structure, Java package name, crypto operations, intent receivers, services, receivers and permissions, and the dynamic features are crypto operations, file operations and network activity. The static features are extracted from the classes.dex files of the APK, and the dynamic features from the AndroidManifest.xml file. Hybrid analysis combines the static and dynamic features, which are then used to detect malicious applications. In [51], the selected features are static (permission and API call) and dynamic (system call). Y. Liu et al. [51] used the SVM and Naive Bayes classifiers. The SVM classifier used for static analysis achieved 93.33 to 99.28 percent accuracy, while the Naive Bayes classifier used for dynamic analysis achieved an accuracy of up to 90 percent. Kim et al. [13] used the J48 classifier with static (permission) and dynamic (API call) features. A. Saracino et al. [52] achieved 96.9% accuracy with KNN by selecting a static feature (permission) and dynamic features (critical API, SMS, user activity, system call).

A Comparison of Static, Dynamic, and Hybrid Analysis
Static Analysis:
1. Single category of features: The advantages are that the features are easy to extract and require little computation. The limitations are code obfuscation, mimicry attacks and low precision.
2. Multiple categories of features: The advantages are that the features are easy to extract and give high accuracy. The limitations are mimicry attacks, high computation, code obfuscation, and the difficulty of handling multiple features.

Dynamic Analysis:
1. Single category of features: It gives better accuracy and copes with code obfuscation more easily than static analysis. However, its feature extraction process is difficult, and it consumes many resources.
2. Multiple categories of features: It gives better accuracy and copes with code obfuscation more easily than static analysis and single-category dynamic analysis. The limitations of this approach are: (1) difficulty in handling multiple features; (2) high resource use; and (3) more time needed for computation.

Hybrid Analysis:
The main benefit of hybrid analysis is that it achieves the highest accuracy compared with static and dynamic analysis. The limitations are (1) the highest complexity; (2) the need for a framework to combine the static and dynamic features; (3) higher resource use; and (4) time consumption.

Proposed Scheme and Methodology
In this section, we present the proposed malware detection methods. We adopt (i) the permission ranking-based feature selection approach, (ii) similarity-based permission feature selection, (iii) the association rule mining algorithm and (iv) the random forest classifier with modified parameters.
The permission ranking-based feature selection approach and the similarity-based permission feature selection rank the features based on frequency. The association rule mining algorithm removes permissions that are common to both malware and benign software. Moreover, we improve the accuracy of the random forest algorithm for permission-induced malware detection. The improved random forest iteratively removes the unnecessary features. By comparing the essential and unnecessary features, we formulate an improved random forest algorithm that uses fewer but more important features.

Permission Ranking
Each permission defines a specific action that the application is permitted to perform. For example, the permission INTERNET specifies that the application can access the Internet. Benign and malicious applications request different permissions that correspond to their operational needs. A real-time Android malware detection system does not need to analyse every permission of an application; it needs only a subset of characteristic permission features [17].
We pay more attention to permissions that create high-risk attack surfaces and are frequently requested by malware samples. Permissions that malware samples rarely request are also good indicators for distinguishing between malicious and benign applications. Thus, our methodology identifies the highly distinguishing permissions so that we can use this information to classify malicious and benign apps. In addition, we exclude common permissions that are used in both benign and malicious applications, because they create ambiguity in malware detection. For example, both malware and benign applications often request the INTERNET permission because almost all applications require access to the Internet. Our method therefore identifies INTERNET as a common permission and excludes it.
We describe a permission ranking scheme that can be used to distinguish malicious from benign applications. Ranking is not a new concept. Wang et al. [17] used a ranking-based method, but it only identifies the high-risk permissions. Previous work ignored low-risk permissions because it was interested in identifying malware abuse, whereas our goal is to develop a ranking-based framework that can separate malicious and benign apps. In the Android platform, each permission corresponds to a different operation. For instance, the permission READ_SD_CARD indicates that an app has access to the storage of the device. We focus on the permissions that create a high risk of attack by malware apps and use them to classify malicious and benign apps.
The method is based on two matrices: $M_{ij}$ represents the permission list of the malware applications and $B_{ij}$ represents the permission list of the benign applications, where an entry is 1 if the ith application requests the jth permission and 0 otherwise.
Before calculating the permission support from matrices B and M, we check the number of benign and malicious applications. The number of benign applications is usually much larger than the number of malicious applications, so the dataset is imbalanced, which can bias the ranking. If the number of benign apps in B is larger than the number of malware apps in M, the two matrices are balanced with the following equation:

$$S_B(P_j) = \frac{\sum_i B_{ij}}{size(B_j)} \times size(M_j)$$

where $P_j$ represents the jth permission and $S_B(P_j)$ denotes the support of the jth permission in matrix B. The rate of the jth permission is then

$$R(P_j) = \frac{\sum_i M_{ij} - S_B(P_j)}{size(M_j)}$$

In the equation above, $R(P_j)$ denotes the rate of the jth permission. The value of $R(P_j)$ lies in [−1, 1]. If $R(P_j) = 1$, $P_j$ is a malicious (high-risk) permission; if $R(P_j) = −1$, $P_j$ is a benign (low-risk) permission. If $R(P_j) = 0$, $P_j$ has little effect on malware detection. We generate two lists, one in ascending order and the other in descending order of $R(P_j)$. Next, we determine the frequency of the top-ranked permissions in benign and malicious applications.
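The ranking step can be illustrated with a short Python sketch. It assumes that M and B are 0/1 matrices (one row per app, one column per permission), that the support of a permission in B is scaled to the size of M, and that R(P_j) is the difference between the malware count and that scaled support divided by the number of malware apps; this form is an assumption chosen to be consistent with the stated range [−1, 1], and the function name is illustrative.

```python
def permission_rates(M, B):
    """Compute R(P_j) for every permission column j.

    M, B: lists of equal-length 0/1 rows for malware and benign apps.
    Assumed form: S_B(P_j) = sum_i B_ij / size(B) * size(M)
                  R(P_j)   = (sum_i M_ij - S_B(P_j)) / size(M)
    """
    n_malware, n_benign = len(M), len(B)
    n_permissions = len(M[0])
    rates = []
    for j in range(n_permissions):
        malware_count = sum(row[j] for row in M)
        benign_count = sum(row[j] for row in B)
        # Scale the benign support to the size of the malware set.
        scaled_benign = benign_count / n_benign * n_malware
        rates.append((malware_count - scaled_benign) / n_malware)
    return rates


def rank_permissions(names, rates):
    """Return permission names sorted from most malware-like to most benign-like."""
    return [name for name, _ in sorted(zip(names, rates), key=lambda p: -p[1])]
```

A permission requested by every malware sample and no benign sample gets a rate of 1, one requested only by benign apps gets −1, and one requested equally often by both gets 0.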

Similarity-Based Permission Feature Selection
In this section, we match the permission list $P_{sb} = \{p_1, p_2, p_3, ..., p_n\}$ corresponding to the marking function $\gamma$. The similarity of a permission is given by the function $\psi(P_j^{sb})$, $P_j \in P_{sb}$, from which we calculate the similarity between features. We calculate the maximum similarity of the permission lists and compare it with a threshold. If the threshold is exceeded, the app is considered malware; otherwise it is considered benign. In our experiment, the threshold and the support difference are set to 0.05 and 0.15, respectively.

Association Rule Mining Algorithm Based on Probabilistic Model
Association rule mining is used to discover meaningful relationships between variables in large databases. For example, if events A and B always occur at the same time, the two events are likely to be associated. We found that many permissions always occur together, e.g., READ_CONTACTS and WRITE_CONTACTS. These dangerous Android permissions belong to Google's permission list. Since such permissions always occur together, we need only one of them to characterize a certain behaviour, and we remove one permission of a highly associated pair, e.g., READ_CONTACTS. In this paper, we define association rules as those with low support but high confidence and propose an association rule mining algorithm for finding the permissions that occur together.

STEP1: Find the frequent two-permission sets.
STEP2: Apply the diversity-based interestingness measure for association rules developed by Piatetsky-Shapiro [...] to the frequent two-item sets.
end for
13: end for
14: output ← Association Rule R
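The steps above can be sketched in Python for two-permission sets. The sketch assumes that each app is represented as a set of permission names and that the interest of a pair is the Piatetsky-Shapiro measure, i.e., the joint support minus the product of the individual supports; the function name and the threshold parameters are illustrative assumptions.

```python
from itertools import combinations


def mine_pair_rules(apps, min_sup, min_conf):
    """Mine rules Y => Z over two-permission sets.

    apps: list of sets of permission names (one set per app).
    Returns tuples (Y, Z, support, confidence, interest) for rules whose
    support and confidence exceed the thresholds and whose interest is positive.
    """
    n = len(apps)

    def support(items):
        # Fraction of apps that request every permission in `items`.
        return sum(1 for perms in apps if items <= perms) / n

    all_permissions = sorted(set().union(*apps))
    rules = []
    for y, z in combinations(all_permissions, 2):
        joint = support({y, z})
        if joint <= min_sup:
            continue  # not a frequent two-permission set
        for antecedent, consequent in ((y, z), (z, y)):
            confidence = joint / support({antecedent})
            # Assumed measure: interest(Y, Z) = sup(Y, Z) - sup(Y) * sup(Z)
            interest = joint - support({antecedent}) * support({consequent})
            if confidence > min_conf and interest > 0:
                rules.append((antecedent, consequent, joint, confidence, interest))
    return rules
```

Pairs whose interest is close to zero are treated as independent and dropped, and a pair with a strong rule in both directions (such as READ_CONTACTS and WRITE_CONTACTS) is a candidate for keeping only one of the two permissions as a feature.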

Improved Random Forest Classifier
The voting decision process is an important part of the RF algorithm, as it determines the final classification of the test sample. The RF algorithm adopts the principle of simple voting: each decision tree is given the same weight, which ignores the difference between strong and weak classifiers and affects the overall classification performance of the random forest classifier. For this reason, this paper adopts the weighted voting principle to modify the RF algorithm, forming the improved random forest (IRF) shown in Algorithm 2. Table 5 lists the symbols used in Algorithm 2. The architecture of training and testing is shown in Figure 7, and the flow of the algorithm is shown in Figure 8. We changed a few parameters of the random forest algorithm, as can be seen in lines 11 and 12, where the parameters are tuned. Line 11 iteratively removes the unnecessary features. By comparing the essential and unnecessary features, we formulate an improved random forest algorithm that uses fewer but more important features. Consequently, we enhance the accuracy of the random forest algorithm by removing the extra features, which further improves the performance of feature selection.
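The difference between simple and weighted voting can be shown with a minimal Python sketch. It assumes that each tree carries a single weight, for example its accuracy on held-out data; how the weights are obtained is an assumption of this sketch, and the function name is illustrative.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-tree class predictions into one label.

    predictions: list of labels, one per tree.
    weights: list of non-negative tree weights (assumed, e.g., validation accuracy).
    The label with the largest total weight wins.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def simple_vote(predictions):
    """Plain majority voting: every tree has the same weight."""
    return weighted_vote(predictions, [1.0] * len(predictions))
```

With the predictions ["benign", "benign", "malware"], simple voting returns "benign", whereas weights of 0.3, 0.3 and 0.9 let the single strong tree decide for "malware".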

Algorithm 2 Modified Random Forest (IRF)
1: Grow the initial forest ($\theta_0$) with $B_0$ random trees and feature vector $F_0(.)$
2: Calculate the average ranking weight $w(.)$ and rank all features $F_0(.)$
3: From the ranked list, place the top $u_0 = F_0(.)$ features in $T_0$
4: Put the remaining $v_0 = \#F_0(.) − u_0$ features in $t_1$
5: n is the number of passes. Initialize n = 0
6: for $u_n > f$ do
7: compute the mean $\mu_n$ and standard deviation $\sigma_n$ of the feature weights in $t_1$
8: remove the unimportant features; find the most informative feature set $A_n$ whose weight is greater than the minimum weight of the important features, so $A_n = \{j\}: w(j) = \min \forall j \in t_1^{n+1}, k \in t^{n+1}$
10: Find $|v \Delta B| < lq_u \Delta u + lq_v \Delta v$; $B_{n+1} = B_n + \Delta B$
14: Grow the forest ($\theta_0$) with $B_0$ trees and feature vector $F_{n+1}(.)$
15: Calculate the weights $w(.)$ and rank all features $F_{n+1}(.)$
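The iterative removal of unnecessary features can be sketched in simplified form. The Python code below is not Algorithm 2 itself: it assumes fixed feature weights (the forest is not regrown between passes) and assumes that a feature is unimportant when its weight falls below the mean minus one standard deviation of the current weights. The function name and the threshold rule are illustrative assumptions.

```python
from statistics import mean, pstdev


def prune_features(weights, min_features):
    """Iteratively drop low-weight features.

    weights: dict mapping feature name -> importance weight (assumed fixed).
    min_features: stop once this many features remain.
    Assumed rule: in each pass, drop features whose weight is below
    mean - standard deviation of the weights still in play.
    """
    kept = dict(weights)
    while len(kept) > min_features:
        values = list(kept.values())
        threshold = mean(values) - pstdev(values)
        # Weakest features first, so the lowest weights are removed first.
        weak = sorted((f for f, w in kept.items() if w < threshold), key=kept.get)
        if not weak:
            break  # nothing falls below the threshold; stop pruning
        for feature in weak[: len(kept) - min_features]:
            del kept[feature]
    return kept
```

In a full implementation the forest would be regrown and the weights recomputed after every pass, as lines 14 and 15 of Algorithm 2 indicate.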

Experimentation and Results
To evaluate the effectiveness of machine learning classifiers such as Random Forest, J48 and Naive Bayes, we use the following measures, with malware as the positive class.
TP (true positive): a malware app correctly recognized as malicious.
TN (true negative): a benign app correctly recognized as benign.
FP (false positive): a benign app incorrectly recognized as malicious.
FN (false negative): a malware app incorrectly recognized as benign.
Given the number of true positives and false negatives, recall is calculated using the following formula:

$$Recall = \frac{TP}{TP + FN}$$

Recall is sometimes referred to as "sensitivity" or the "true positive rate" (TPR). Given the number of true positive and false positive classified items, precision (also known as "positive predictive value") is calculated as follows:

$$Precision = \frac{TP}{TP + FP}$$

To be of practical significance, the dataset should cover most Android permission patterns, each of which has its own characteristics. In this paper, our dataset is composed of malware and benign applications. It includes 6192 benign and 5560 malware apps, collected from the Google Play Store and the Chinese App store. We collected the dataset from multiple sources in order to cover more attributes. Figure 9 shows the frequency of the permissions.
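The evaluation measures can be computed directly from the four confusion-matrix counts. The following Python sketch uses the standard definitions; the function name and the counts in the usage note are made up for illustration and are not experimental results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return recall (TPR), precision, false positive rate and accuracy.

    tp, fp, tn, fn: confusion-matrix counts with malware as the positive class.
    """
    return {
        "recall": tp / (tp + fn),          # TP / (TP + FN)
        "precision": tp / (tp + fp),       # TP / (TP + FP)
        "fpr": fp / (fp + tn),             # FP / (FP + TN)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For instance, with 90 true positives, 10 false negatives, 5 false positives and 95 true negatives, recall is 0.90, the false positive rate is 0.05 and accuracy is 0.925.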

Ranking and Similarity Frequency of the Permissions
We generate the top 10 permission combinations based on probability and negative rate. The frequencies in benign and malware applications for k = 1, 2, 3, 4 are shown in Figures 10-13, respectively. When k = 1, READ_SMS, SEND_SMS, WRITE_SMS and RECEIVE_SMS are requested much more commonly by malware than by benign applications. When k = 2, two permission pairs appear more frequently in malware (READ_PHONE_STATE & READ_SMS, INTERNET & READ_SMS). Through the combination of the INTERNET and READ_SMS permissions, malware can send private messages over the Internet; through the combination of the READ_PHONE_STATE and READ_SMS permissions, malware can read phone IDs such as the IMSI (International Mobile Subscriber Identity) and IMEI (International Mobile Equipment Identity), as well as SMS messages, which can be used to identify mobile users and collect personal information. For larger k, the permission combinations obtained are more intrusive. We found that the larger the value of k, the higher the precision with which a permission combination identifies malware. We studied the results for k = 1, 2, 3, 4 and found that some permissions are requested similarly often by malicious and benign applications, while some combinations of permissions are clearly requested more frequently by malware than by benign applications. Almost all combinations of permissions frequently requested by malware include SMS-related permissions, such as READ_SMS and WRITE_SMS. We suspect this is because most of the samples in malware datasets are related to SMS attacks, such as intercepting and leaking SMS messages and sending SMS messages to premium-rate numbers. The top 10 patterns according to common and run-time permission patterns are shown in Table 6. We generate the top permissions based on the data mining association rule algorithm. The permission WRITE_SMS frequently appears in both benign and malware applications, while READ_SMS is mainly used by malware. When we delete the permission WRITE_SMS, an application requesting the permission READ_SMS is likely to be classified as malware. Furthermore, we found that WAKE_LOCK, READ_HISTORY_BOOKMARKS, WRITE_HISTORY_BOOKMARKS and READ_PHONE_STATE have a high occurrence of being associated. After trimming the three permission characteristics in the dataset, we retained only some of the features. We then obtained a new model using the permission set in Table 7. We observe that different permission ranking methods produce different sorted lists. For instance, we discard the INTERNET permission because both benign and malicious applications usually require it. However, the mutual information-based ranking method retains the permission INTERNET, because all applications often request it. Therefore, we believe that our algorithms can preserve more important permissions by deleting unimportant or irrelevant permissions. Our approach distinguishes malware applications with a higher recall rate, which is necessary for malware detection.
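The frequency of permission combinations of size k can be counted as in the following Python sketch. It assumes that each app is given as a set of permission names; the function names are illustrative, and the comparison of malware and benign frequencies by their difference is one simple choice rather than the exact scoring used in the experiments.

```python
from collections import Counter
from itertools import combinations


def combination_frequency(apps, k):
    """Fraction of apps that request each k-permission combination.

    apps: list of sets of permission names; k: combination size.
    Keys of the result are sorted tuples of permission names.
    """
    counts = Counter()
    for permissions in apps:
        for combo in combinations(sorted(permissions), k):
            counts[combo] += 1
    return {combo: count / len(apps) for combo, count in counts.items()}


def top_malware_combinations(malware_apps, benign_apps, k, top_n=10):
    """Rank combinations by how much more often malware requests them."""
    malware_freq = combination_frequency(malware_apps, k)
    benign_freq = combination_frequency(benign_apps, k)
    gap = {c: f - benign_freq.get(c, 0.0) for c, f in malware_freq.items()}
    return sorted(gap, key=gap.get, reverse=True)[:top_n]
```

Calling top_malware_combinations with k = 2 on a malware set dominated by SMS-related apps would be expected to surface pairs such as INTERNET & READ_SMS, in line with the observation above.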

Machine Learning Malware Detection
Our experiment uses different machine learning algorithms (i.e., naive Bayes, random forest, improved random forest and the J48 decision tree). We apply the machine learning techniques after the probability-based model and the association-based algorithm. We evaluate the performance of the random forest algorithm in terms of detection accuracy. Figure 14 compares the machine learning models; the random forest algorithm achieves more than 98% accuracy, and we achieve the highest recall rate for detecting the dangerous permissions of the Android platform. Table 8 shows that the improved random forest achieved the best performance, with the highest efficiency and the lowest false alarm rate.
Figure 15 shows the average processing time of the random forest, naive Bayes and J48 classifiers based on the recall rates of the permissions. As shown, naive Bayes is the least time-consuming machine learning algorithm. When using naive Bayes with 10 significant permissions, the processing time averages only 0.1 s, compared with 0.24 s for the improved random forest. In addition, the recall rate of our approach for detecting the dangerous permissions of the Android platform is shown in Figure 16.

Compared with Other Methods
Several techniques have been developed for selecting permissions [14,16,17], for instance the information gain technique [14] and the rank permission technique. Based on these permission sets, we designed the association rule mining and permission ranking methods for detecting malware.
We analyse our results by comparing them with the method proposed by Wang et al. [17], which uses a ranking-based approach but only identifies the high-risk permissions and ignores the low-risk permissions. We consider the low-risk as well as the high-risk permissions. We additionally observe that, despite using only a small number of permissions, our methodology still performs better than most existing malware scanners. Some detection techniques rely only on signatures and look for specific patterns, so if the signature pattern of a particular type of malware is not matched, the system will not be able to detect that type of malware.
The DREBIN [24] method uses static analysis to build datasets based on application permissions and other features, and uses the support vector machine (SVM) algorithm to classify malware datasets. Our approach is more efficient than DREBIN when combining permissions. It is somewhat challenging to enhance DREBIN and SVM. Previous research [14,20,36,38] shows that Random Forest achieves the best performance in the detection of malware applications.
Therefore, we modify the Random Forest algorithm to achieve better accuracy than a simple Random Forest algorithm. The results show that the proposed approach can effectively detect malware with higher accuracy (98.1%). Moreover, the true positive rate and false positive rate of the improved random forest are 95.5% and 4.6%, respectively. The experimental results show that the improved random forest algorithm is effective for the detection of malware. Despite the significance of the proposed approach for malware detection, its limitations include the lack of an accuracy comparison with other studies and the handling of feature-hiding techniques when decompiling the APK with Dex2jar. In future, we plan to develop a framework based on the blockchain to combine static and dynamic analysis for run-time malware detection.

Conclusions
Recently, the number of IoT devices connected to the Internet has increased. Most of these IoT devices use Android applications for communication and data exchange. The permission mechanism of the Android platform restricts the access of applications. Permissions can be used as features of Android applications to distinguish benign apps from malware apps. Our work reduces the number of permissions required while maintaining accuracy and high effectiveness. In this work, we first adopted three data mining techniques: (1) permission ranking; (2) similarity-based permission feature selection; and (3) association rules for permission mining. The permission ranking analysis and the similarity-based permission feature selection are used to rank the permissions based on their risk and to collect the subsets of permissions (individual permissions and groups of permissions), respectively. Moreover, the association rules for permission mining discover meaningful relationships between the permissions. Secondly, we improved the accuracy of the random forest algorithm for permission-induced malware detection by modifying selected parameters of the algorithm: (1) iteratively removing the unnecessary features and (2) setting an upper limit on the number of trees in the random forest. The results show that the proposed approach can effectively detect malware with higher accuracy (98.1%), with a true positive rate of 95.5% and a false positive rate of 4.6%. The development of a framework based on the blockchain to combine static and dynamic analysis approaches for run-time malware detection would be a good topic for future work.

Figure 1. IoT devices connected with Android devices and apps.

Figure 6. Dynamic Feature Extraction and Detection.

Figure 8. Flow Chart of Random Forest Algorithm.

Figure 14. Performance and comparison of machine learning classifiers.

Figure 15. Modeling time comparison of different classifiers.

Figure 16. Comparison between different machine learning classifiers on dynamic features.

Table 1. Static features detection methods.

Table 2. Overview of feature sets.

Table 3. Permission set mostly used in malware.

Table 4. Dynamic features detection methods.
... Z ∈ L2 and support(Y ⇒ Z) > minsup and confidence(Y ⇒ Z) > minconf then
- if interest(Y, Z) > 0, Y and Z are positively correlated.
- if interest(Y, Z) ≈ 0, Y and Z are independent, and the common two-item sets should be rejected.
- if interest(Y, Z) < 0, Y and Z are negatively correlated.
STEP3: Create the association rules based on the permissions, as shown in Algorithm 1.
STEP4: Calculate the probability table of the association rules.

Table 5. List of Symbols.

Table 6. Permission Patterns Malware and Benign.

Table 7. Random Forest Based Malware Detection for Permissions.

Table 8. Comparison of different machine learning classifiers.