An Improved Binary Owl Feature Selection in the Context of Android Malware Detection
Abstract
1. Introduction
- A comprehensive review of the latest research for detecting malicious applications on the Android platform is presented.
- A detailed analysis of the DREBIN dataset is provided.
- A modified feature selection method based on a binary owl optimizer is proposed.
- A lightweight machine learning approach based on ensemble learning and the binary owl optimizer is proposed.
2. Related Works
3. Owl Optimization Algorithm
- Initial Population: The forest’s population of owls is represented by the initial set of random solutions. Each Owl is represented by a vector of length equal to the number of features in the search problem.
- Owl Evaluation: All Owls in the population are evaluated using a target fitness function. The fitness value of a solution corresponds to the intensity of the information detected by the owl’s ear. In a maximization problem, the best Owl is the one that receives the maximum intensity, while the worst Owl is the one that receives the minimum intensity. The intensity information for each Owl can be normalized using Equation (1).
- Update Owl Location: each Owl updates its position toward the prey. Here, the fittest owl with the highest fitness value is the prey. All the Owls update their locations according to the distance toward the prey as in Equations (2) and (3).
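The three steps above can be sketched in a few lines of Python. Because Equations (1) to (3) are not reproduced here, this is only a minimal sketch that assumes the commonly cited continuous owl search formulation: min-max intensity normalization, an inverse-square intensity change with uniform noise, and a signed step toward the prey. The function name `owl_search`, the range [0, 0.5] for `alpha`, the small constant added to the squared distance, and the clipping to the search bounds are illustrative assumptions, not details taken from this paper.

```python
import math
import random


def owl_search(fitness, dim, n_owls=20, n_iter=50, lower=0.0, upper=1.0, seed=0):
    """Maximize `fitness` over [lower, upper]^dim with a simplified owl search."""
    rng = random.Random(seed)
    # Initial population: each owl is a random vector of length `dim`.
    owls = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_owls)]
    best, best_fit = None, -math.inf

    for t in range(n_iter):
        # Owl evaluation: the fittest owl marks the prey position.
        fits = [fitness(o) for o in owls]
        hi, lo = max(fits), min(fits)
        prey = owls[fits.index(hi)][:]
        if hi > best_fit:
            best, best_fit = prey[:], hi

        beta = 1.9 * (1.0 - t / n_iter)  # decreases linearly from 1.9 (assumed schedule)
        for i, (owl, f) in enumerate(zip(owls, fits)):
            if owl == prey:
                continue  # keep the prey owl unchanged (assumed elitism)
            # Normalized intensity in [0, 1]; 0 when all owls are equally fit.
            intensity = (f - lo) / (hi - lo) if hi > lo else 0.0
            dist_sq = sum((a - b) ** 2 for a, b in zip(owl, prey))
            # Inverse-square intensity change plus uniform noise (assumed form).
            change = intensity / (dist_sq + 1e-9) + rng.random()
            alpha = rng.uniform(0.0, 0.5)  # assumed range
            sign = 1.0 if rng.random() < 0.5 else -1.0
            # Location update, clipped to the search bounds.
            owls[i] = [
                min(upper, max(lower, x + sign * beta * change * abs(alpha * v - x)))
                for x, v in zip(owl, prey)
            ]

    # Evaluate the final population once more before returning.
    for owl in owls:
        f = fitness(owl)
        if f > best_fit:
            best, best_fit = owl[:], f
    return best, best_fit
```

For example, `owl_search(lambda x: -sum((v - 0.5) ** 2 for v in x), dim=3)` returns the best position found and its fitness. A binary variant for feature selection would additionally need a rule that maps each coordinate to 0 or 1.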
Modified Binary Owl Optimizer
4. Data Description
DREBIN Dataset Structure
- Feature vector folder: this folder contains the feature vectors of the applications, with each application’s feature vector saved in a separate file. Each file is named after the application’s signature (SHA256 hash). The folder contains 129,013 files covering all benign and malware applications. Each feature vector file contains all features extracted from the application (“android manifest and Dex code”), including the requested permissions, the “used permission”, URLs, API calls, etc.
- Family Labels file: this file lists all the signatures (SHA256 hash) of all applications with the corresponding family label (benign, malware family).
- Dataset Splits folder: this folder contains 10 sub-folders, each of which contains training, testing, and validation files. These files contain only lists of application signatures (SHA256 hashes).
5. Malware Detection Model Development
5.1. An Enhanced Simplified Version of the DREBIN Dataset
- All URLs are removed from the feature vector, since every application has unique URLs referring to images or external links.
- All features in the requested permission set that are never used and do not affect the functionality of the application are removed.
- The requested permissions with typos are removed.
After removing the irrelevant features mentioned above, only distinct features from all files are combined to form the standard feature vector.
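The URL filtering and the construction of a standard feature vector can be illustrated with a short Python sketch. It assumes that each line of a feature file has the form `set::value` and that URL entries carry the prefix `url::`; these prefixes and the helper names `clean_features`, `build_vocabulary`, and `to_binary_vector` are assumptions made for illustration. The sketch covers only the URL rule; the two permission-based filters are omitted because their exact criteria are not spelled out here.

```python
def clean_features(raw_lines):
    """Return one app's distinct features with URL entries removed."""
    cleaned = set()
    for line in raw_lines:
        feature = line.strip()
        # Skip blank lines and (assumed) "url::" entries.
        if feature and not feature.startswith("url::"):
            cleaned.add(feature)
    return cleaned


def build_vocabulary(apps):
    """Sorted list of the distinct features that survive cleaning."""
    vocabulary = set()
    for raw_lines in apps.values():
        vocabulary |= clean_features(raw_lines)
    return sorted(vocabulary)


def to_binary_vector(raw_lines, vocabulary):
    """Encode one app as a 0/1 vector over the shared vocabulary."""
    present = clean_features(raw_lines)
    return [1 if feature in present else 0 for feature in vocabulary]
```

Here `apps` is a hypothetical dictionary that maps an application hash to the lines read from its feature file. Sorting the vocabulary keeps the column order stable across runs.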
5.2. Feature Selection and Model Development
Algorithm 1. Malware Detection Approach Pseudocode.
Input: Population_Size , Number of Iterations , Fitness Function Weights .
Output: Global Solution
Initialize for each Owl randomly.
Evaluate_Owls by their fitness values.
while () do
  Update Intensity Change for each Owl by Equation (6)
  Find the Owl distance towards the prey by Equation (5)
  Update Owl location toward the best Owl by Equation (7)
  Train the model for each
  Evaluate Owls by their fitness values using Equation (8).
  Update
end while
Evaluate_Owls :
forEach
  Select = [ ]
  forEach x in
    if then Select.append()
  end forEach
  prediction = RandomForest.fit(train_set[:,Select], target_train).predict(test_set[:,Select])
  Calculate Fitness Value using Equation (8)
end forEach
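The Evaluate_Owls step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a feature is treated as selected when its bit equals 1, the classifier is passed in as a hypothetical `fit_predict` callable (a Random Forest in the paper's setting), and the fitness is an assumed weighted sum of true positive rate, one minus false positive rate, and the fraction of features removed. Equation (8) is not reproduced here, so the actual fitness function may differ. The default weights reuse the values 0.48, 0.48, and 0.04 from the parameter table.

```python
def selected_indices(owl):
    """Indices of the features whose bit is set in a binary owl."""
    return [i for i, bit in enumerate(owl) if bit == 1]


def project(rows, cols):
    """Keep only the selected columns of each row."""
    return [[row[j] for j in cols] for row in rows]


def evaluate_owl(owl, fit_predict, x_train, y_train, x_test, y_test,
                 weights=(0.48, 0.48, 0.04)):
    """Assumed fitness: w1 * TPR + w2 * (1 - FPR) + w3 * (1 - selected / total)."""
    cols = selected_indices(owl)
    if not cols:
        return 0.0  # an owl that selects no features cannot be trained
    # Train and predict on the same selected columns.
    y_pred = fit_predict(project(x_train, cols), y_train, project(x_test, cols))
    tp = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_test, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_test, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    w1, w2, w3 = weights
    return w1 * tpr + w2 * (1.0 - fpr) + w3 * (1.0 - len(cols) / len(owl))


def any_feature_fit_predict(x_train, y_train, x_test):
    """Toy stand-in classifier: flags an app if any selected feature is set."""
    return [1 if any(row) else 0 for row in x_test]
```

Passing the classifier as a callable keeps the sketch free of external dependencies; any model that returns one 0/1 label per test row can be plugged in.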
6. Experiments and Results
6.1. Experimental Setup
6.2. Performance Metrics
- Accuracy: The ratio of correctly classified applications to the total number of classified applications.
- F-score: The harmonic mean of precision and recall, which takes both measures into consideration.
- Precision: The ratio of applications correctly predicted as malware to all applications predicted as malware.
- Recall: The ratio of applications correctly predicted as malware to the number of actual malware applications.
- False-Positive Rate: The ratio of benign applications erroneously classified as malware to the number of actual benign applications.
- True Positive (TP): The quantity of malware application instances that were accurately classified.
- True Negative (TN): The quantity of benign applications that were accurately classified.
- False Positive (FP): The quantity of benign applications that were erroneously classified as malware.
- False Negative (FN): The quantity of malware application instances that were incorrectly classified as benign.
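The metrics listed above follow directly from the four confusion-matrix counts. The short Python sketch below uses the standard definitions; the function name `detection_metrics` and the convention of returning 0.0 for an empty denominator are illustrative choices rather than details from the paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the listed metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_score": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
    }
```

With 90 true positives, 80 true negatives, 20 false positives, and 10 false negatives, the function gives an accuracy of 0.85, a recall of 0.9, and a false-positive rate of 0.2.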
6.3. Results
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Almin, S.B.; Chatterjee, M. A novel approach to detect android malware. Procedia Comput. Sci. 2015, 45, 407–417.
- Talal, M.; Zaidan, A.; Zaidan, B.; Albahri, O.S.; Alsalem, M.; Albahri, A.S.; Alamoodi, A.; Kiah, M.L.M.; Jumaah, F.; Alaa, M. Comprehensive review and analysis of anti-malware apps for smartphones. Telecommun. Syst. 2019, 72, 285–337.
- Xu, K. Advanced Malware Detection for Android Platform. Ph.D. Thesis, Singapore Management University, Singapore, 2018.
- Li, W.; Ge, J.; Dai, G. Detecting malware for android platform: An svm-based approach. In Proceedings of the 2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing, New York, NY, USA, 3–5 November 2015; pp. 464–469.
- Amro, B. Malware detection techniques for mobile devices. Int. J. Mob. Netw. Commun. Telemat. (IJMNCT) 2017, 7.
- Truong, H.T.T.; Lagerspetz, E.; Nurmi, P.; Oliner, A.J.; Tarkoma, S.; Asokan, N.; Bhattacharya, S. The company you keep: Mobile malware infection rates and inexpensive risk indicators. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 39–50.
- Shabtai, A. Malware detection on mobile devices. In Proceedings of the 2010 Eleventh International Conference on Mobile Data Management, Kansas City, MO, USA, 23–26 May 2010; pp. 289–290.
- Syrris, V.; Geneiatakis, D. On machine learning effectiveness for malware detection in Android OS using static analysis data. J. Inf. Secur. Appl. 2021, 59, 102794.
- Feizollah, A.; Anuar, N.B.; Salleh, R.; Wahab, A.W.A. A review on feature selection in mobile malware detection. Digit. Investig. 2015, 13, 22–37.
- Vishnoi, A.; Mishra, P.; Negi, C.; Peddoju, S.K. Android Malware Detection Techniques in Traditional and Cloud Computing Platforms: A State-of-the-Art Survey. Int. J. Cloud Appl. Comput. (IJCAC) 2021, 11, 113–135.
- Kouliaridis, V.; Barmpatsalou, K.; Kambourakis, G.; Chen, S. A survey on mobile malware detection techniques. IEICE Trans. Inf. Syst. 2020, 103, 204–211.
- Idrees, F.; Rajarajan, M.; Conti, M.; Chen, T.M.; Rahulamathavan, Y. PIndroid: A novel Android malware detection system using ensemble learning methods. Comput. Secur. 2017, 68, 36–46.
- Gupta, D.; Rani, R. Improving malware detection using big data and ensemble learning. Comput. Electr. Eng. 2020, 86, 106729.
- Kumar, R.; Zhang, X.; Wang, W.; Khan, R.U.; Kumar, J.; Sharif, A. A multimodal malware detection technique for Android IoT devices using various features. IEEE Access 2019, 7, 64411–64430.
- Li, C.; Mills, K.; Niu, D.; Zhu, R.; Zhang, H.; Kinawi, H. Android malware detection based on factorization machine. IEEE Access 2019, 7, 184008–184019.
- Karbab, E.B.; Debbabi, M.; Derhab, A.; Mouheb, D. MalDozer: Automatic framework for android malware detection using deep learning. Digit. Investig. 2018, 24, S48–S59.
- Zhong, W.; Gu, F. A multi-level deep learning system for malware detection. Expert Syst. Appl. 2019, 133, 151–162.
- Millar, S.; McLaughlin, N.; del Rincon, J.M.; Miller, P. Multi-view deep learning for zero-day Android malware detection. J. Inf. Secur. Appl. 2021, 58, 102718.
- Rehman, Z.U.; Khan, S.N.; Muhammad, K.; Lee, J.W.; Lv, Z.; Baik, S.W.; Shah, P.A.; Awan, K.; Mehmood, I. Machine learning-assisted signature and heuristic-based detection of malwares in Android devices. Comput. Electr. Eng. 2018, 69, 828–841.
- Odusami, M.; Abayomi-Alli, O.; Misra, S.; Shobayo, O.; Damasevicius, R.; Maskeliunas, R. Android malware detection: A survey. In Communications in Computer and Information Science, Proceedings of the International Conference on Applied Informatics, Bogotá, Colombia, 1–3 November 2018; Springer: Cham, Switzerland, 2018; pp. 255–266.
- Kouliaridis, V.; Kambourakis, G. A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection. Information 2021, 12, 185.
- Rana, M.S.; Gudla, C.; Sung, A.H. Evaluating machine learning models for Android malware detection: A comparison study. In Proceedings of the 2018 VII International Conference on Network, Communication and Computing, Taipei City, Taiwan, 14–16 December 2018; pp. 17–21.
- Bala, N.; Ahmar, A.; Li, W.; Tovar, F.; Battu, A.; Bambarkar, P. DroidEnemy: Battling adversarial example attacks for Android malware detection. Digit. Commun. Netw. 2021, in press.
- Chen, Y.C.; Chen, H.Y.; Takahashi, T.; Sun, B.; Lin, T.N. Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection. IEEE Access 2021, 9, 123208–123219.
- Arif, J.M.; Ab Razak, M.F.; Mat, S.R.T.; Awang, S.; Ismail, N.S.N.; Firdaus, A. Android mobile malware detection using fuzzy AHP. J. Inf. Secur. Appl. 2021, 61, 102929.
- Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 21–24 March 2016; pp. 372–387.
- Selvaganapathy, S.; Sadasivam, S. Anti-malware engines under adversarial attacks. Int. J. Comput. Appl. 2021, 44, 1–14.
- Jain, M.; Maurya, S.; Rani, A.; Singh, V. Owl search algorithm: A novel nature-inspired heuristic paradigm for global optimization. J. Intell. Fuzzy Syst. 2018, 34, 1573–1582.
- Lai, G.; Li, L.; Zeng, Q.; Yousefi, N. Developed owl search algorithm for parameter estimation of PEMFCs. Int. J. Ambient. Energy 2020, 43, 1–10.
- El-Ashmawi, W.H.; Abd Elminaam, D.S.; Nabil, A.M.; Eldesouky, E. A chaotic owl search algorithm based bilateral negotiation model. Ain Shams Eng. J. 2020, 11, 1163–1178.
- Daniel, A.; Michael, S.; Hugo, G.; Konrad, R. Drebin: Efficient and explainable detection of android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2014.
- Michael, S.; Florian, E.; Thomas, S.; Felix, C.F.; Hoffmann, J. Mobilesandbox: Looking deeper into android applications. In Proceedings of the 28th International ACM Symposium on Applied Computing (SAC), Coimbra, Portugal, 18–22 March 2013.
- Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K.; Siemens, C. Drebin: Effective and explainable detection of android malware in your pocket. Ndss 2014, 14, 23–26.
- Alazzam, H.; Sharieh, A.; Sabri, K.E. A feature selection algorithm for intrusion detection system based on pigeon inspired optimizer. Expert Syst. Appl. 2020, 148, 113249.
- Alazzam, H.; Alsmady, A.; Shorman, A.A. Supervised detection of IoT botnet attacks. In Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems, Dubai, United Arab Emirates, 2–5 December 2019; pp. 1–6.
- Stiborek, J.; Pevnỳ, T.; Rehák, M. Multiple instance learning for malware classification. Expert Syst. Appl. 2018, 93, 346–357.
- Surendran, R.; Thomas, T.; Emmanuel, S. Gsdroid: Graph signal based compact feature representation for android malware detection. Expert Syst. Appl. 2020, 159, 113581.
- Fan, Y.; Ye, Y.; Chen, L. Malicious sequential pattern mining for automatic malware detection. Expert Syst. Appl. 2016, 52, 16–25.
- Chandak, T.; Shukla, S.; Wadhvani, R. An analysis of “A feature reduced intrusion detection system using ANN classifier” by Akashdeep et al. expert systems with applications (2017). Expert Syst. Appl. 2019, 130, 79–83.
- Yusof, M.; Saudi, M.M.; Ridzuan, F. A new mobile botnet classification based on permission and API calls. In Proceedings of the 2017 Seventh International Conference on Emerging Security Technologies (EST), Canterbury, UK, 6–8 September 2017; pp. 122–127.
- Al-Andoli, M.N.; Tan, S.C.; Sim, K.S.; Lim, C.P.; Goh, P.Y. Parallel Deep Learning with a hybrid BP-PSO framework for feature extraction and malware classification. Appl. Soft Comput. 2022, 131, 109756.
- Potha, N.; Kouliaridis, V.; Kambourakis, G. An extrinsic random-based ensemble approach for android malware detection. Connect. Sci. 2021, 33, 1077–1093.
- Sharma, R.M.; Agrawal, C.P. MH-DLdroid: A Meta-Heuristic and Deep Learning-Based Hybrid Approach for Android Malware Detection. Int. J. Intell. Eng. Syst. 2022, 15, 425–435.
- Rana, M.S.; Sung, A.H. Evaluation of advanced ensemble learning techniques for Android malware detection. Vietnam J. Comput. Sci. 2020, 7, 145–159.
Reference | Datasets | Evaluation Metrics | Model |
---|---|---|---|
[16] | Malgenome, DREBIN, merged, and MalDozer dataset. | F-score, false alarms, precision, recall, and runtime. | Neural networks. |
[8] | DREBIN dataset. | F-score, precision, recall, true negative rate, accuracy, false positive rate, and Area Under Curve (AUC). | Naive Bayes, L1 and L2 regularization, Random Forest, Support Vector Machine (SVM), and Shallow neural network. |
[23] | Original DREBIN malware dataset and a modified DREBIN dataset. The benign applications are obtained from Google Play. | Precision, recall, and F-score. | SVM. |
[17] | Constructed dataset with 2.2 million malware samples, and 3.4 million benignware samples. | Area under curve, true positive rate, and false positive rate. | Deep learning model. |
[14] | Combined data set from the Chinese App Store and Google Play Store, containing 5560 malware samples and 6192 benign ones. | True positive rate, F-measure, false alarms, and classification accuracy. | Multi feature Naive Bayes algorithm. |
[15] | DREBIN and Android Malware Dataset (AMD), in addition to clean applications collected from online app stores. | Precision, recall, accuracy, F1, and False-Positive Rate. | New Factorization Machine (FM) model. |
[12] | Dataset contains 1745 real-world applications. | True positive rate, false alarms, accuracy, F-score, model build-up time, and area under curve. | Naive Bayes, Decision Table, Random Forest, Sequential Minimal Optimization, Decision Tree, and Multilayer Perceptron (MLP). |
[24] | DREBIN dataset and AndroZoo dataset. | Accuracy, precision, F-score, recall, and area under curve. | CatBoost, LightGBM, RandomForest, and LinearSVM. |
[22] | DREBIN dataset. | Accuracy, precision, F-score, recall, area under curve, and false positive. | Decision Tree, Gradient Boosted, Random Forest, Extremely Randomized Tree, Neural Networks, k Nearest Neighbors, Discriminant Analysis, NB, Logistic Regression, Bagging, K Means, and SVM. |
[27] | DREBIN dataset. | Recall, specificity, false-positive rate, false-negative rate, accuracy, and f1_score. | Feed-forward deep neural network model. |
Feature Set | Description |
---|---|
Manifest File | |
Hardware Components | Contains the requested hardware components, such as a request to access the mobile camera. |
Requested permissions | Permissions granted by the user at installation time, such as the SEND_SMS permission. |
App Components | There are four types of application components (services, activities, broadcast receivers, and content providers). |
Filtered Intents | On Android, intra-process and inter-process communications are performed through intents. An intent is a passive data structure that works as an asynchronous message, allowing information to be shared between different components and applications. |
Dex Code | |
Restricted API Calls | Some API calls are restricted by permissions. This set covers the restricted API calls found in the Dex code that do not have an associated permission. |
Used permission | The set of permissions that are requested in the manifest and actually used, based on the disassembled code. These are sometimes referred to as real permissions. |
Suspicious API calls | API calls that are frequently used in malware applications and that request access to sensitive data or resources. There are four types of these API calls: API calls to access sensitive data, API calls for communicating over the internet, API calls for sending and receiving SMS messages, and API calls frequently used for obfuscation. |
Network Addresses | This includes any IP address, URL, or host-name found in the disassembled code. |
Owl Parameters | |
---|---|
Parameter | Value |
A random number between | |
A linear decreasing number from 1.9 to 0 | |
r | Uniform random number |
Number of Iterations | 300 |
Population size (Np) | 100 |
Fitness Function | |
0.48 | |
0.48 | |
0.04 |
Split Sample | Precision | Recall | FPR | Accuracy | F-Score | # of Features |
---|---|---|---|---|---|---|
Split #1 | 0.9930 | 0.9964 | 0.1524 | 0.9897 | 0.8780 | 205 |
Split #2 | 0.9933 | 0.9948 | 0.1484 | 0.9884 | 0.8638 | 221 |
Split #3 | 0.9918 | 0.9951 | 0.1709 | 0.9874 | 0.8504 | 208 |
Split #4 | 0.9924 | 0.9963 | 0.1719 | 0.9890 | 0.8642 | 221 |
Split #5 | 0.9925 | 0.9948 | 0.1683 | 0.9878 | 0.8542 | 225 |
Split #6 | 0.9922 | 0.9963 | 0.1654 | 0.9891 | 0.8730 | 224 |
Split #7 | 0.9927 | 0.9960 | 0.1576 | 0.9890 | 0.8707 | 228 |
Split #8 | 0.9921 | 0.9958 | 0.1729 | 0.9882 | 0.8604 | 216 |
Split #9 | 0.9927 | 0.9965 | 0.1626 | 0.9896 | 0.8743 | 231 |
Split #10 | 0.9928 | 0.9941 | 0.1615 | 0.9873 | 0.8500 | 201 |
Average | 0.9924 | 0.9956 | 0.1601 | 0.9884 | 0.8634 | - |
Reference | Approach/Technique | Precision | Recall | FPR | Accuracy | F-Score |
---|---|---|---|---|---|---|
[23] | DroidEnemy | 0.749 | 0.9964 | - | 0.747 | 0.752 |
[24] | Deobfuscation | - | - | - | 0.9889 | 0.8212 |
[8] | BNB | 0.469 | 0.937 | - | - | 0.625 |
[8] | 1ANN | 0.712 | 0.774 | - | - | 0.742 |
[40] | DecisionTree | 0.928 | 0.992 | 0.179 | - | - |
[40] | NaiveBayes | 0.9920 | 0.867 | 0.167 | - | - |
[40] | KNN | 0.933 | 0.993 | 0.163 | - | - |
[27] | DNN | - | - | 0.28 | 0.987 | - |
Proposed Approach | RF-Owl | 0.9924 | 0.9956 | 0.1673 | 0.9881 | 0.8634 |
Reference | Feature Selection/Classifier | Precision | Recall | Accuracy |
---|---|---|---|---|
[41] | PDL-PSO | 0.988 | 0.98 | 0.977 |
[42] | ERBE | 0.936 | 0.940 | 0.938 |
[43] | IWD | 0.9535 | 0.9668 | 0.9912 |
Proposed Approach | RF-Owl | 0.9924 | 0.9956 | 0.9881 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alazzam, H.; Al-Adwan, A.; Abualghanam, O.; Alhenawi, E.; Alsmady, A. An Improved Binary Owl Feature Selection in the Context of Android Malware Detection. Computers 2022, 11, 173. https://doi.org/10.3390/computers11120173