Ransomware is a relatively new type of intrusion attack whose objective is to extort a ransom from its victim. There are several types of ransomware attack, but the present paper focuses only on crypto-ransomware, because it makes data unrecoverable once the victim’s files have been encrypted. In this research, we therefore propose using machine learning to detect crypto-ransomware before it starts its encryption function, that is, at the pre-encryption stage. Successful detection at this stage is crucial for stopping the attack before it achieves its objective. Once the victim is aware of the presence of crypto-ransomware, valuable data and files can be backed up to another location, and an attempt can then be made to remove the ransomware with minimum risk. We propose a pre-encryption detection algorithm (PEDA) that consists of two phases. In PEDA-Phase-I, the Windows application programming interface (API) calls generated by a suspicious program are captured and analyzed using the learning algorithm (LA). The LA determines whether the suspicious program is crypto-ransomware through API pattern recognition. This approach is used to ensure the most comprehensive detection of both known and unknown crypto-ransomware, but it may have a high false positive rate (FPR). If the program is predicted to be crypto-ransomware, PEDA generates a signature of the suspicious program and stores it in the signature repository, which belongs to Phase-II. In PEDA-Phase-II, the signature repository allows crypto-ransomware to be detected at a much earlier stage, the pre-execution stage, through signature matching. This method can only detect known crypto-ransomware; although very rigid, it is accurate and fast. The two phases of PEDA form two layers of early detection for crypto-ransomware, with the aim that the user loses no files. In this research, however, we focus on Phase-I, which is the LA.
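The interplay between the two phases can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the `Peda` class, the use of a SHA-256 hash as the signature, the `SUSPICIOUS_APIS` set and the simple ratio threshold that stands in for the LA are all assumptions made only for illustration.

```python
import hashlib

# Hypothetical choice of Windows API names treated as suspicious.
# The names are real Windows APIs, but this selection is an assumption.
SUSPICIOUS_APIS = {"CryptEncrypt", "CryptGenKey", "FindFirstFileW"}


def make_signature(program_bytes: bytes) -> str:
    """Assumed signature scheme: SHA-256 digest of the program file."""
    return hashlib.sha256(program_bytes).hexdigest()


def toy_learning_algorithm(api_calls: list[str]) -> bool:
    """Stand-in for the LA: flag a trace when at least half of its
    API calls are in the suspicious set. Not the real classifier."""
    if not api_calls:
        return False
    hits = sum(1 for call in api_calls if call in SUSPICIOUS_APIS)
    return hits / len(api_calls) >= 0.5


class Peda:
    def __init__(self) -> None:
        # Signature repository consulted by Phase-II.
        self.signature_repository: set[str] = set()

    def phase_two(self, program_bytes: bytes) -> bool:
        """Pre-execution check: match against known signatures."""
        return make_signature(program_bytes) in self.signature_repository

    def phase_one(self, program_bytes: bytes, api_calls: list[str]) -> bool:
        """Pre-encryption check: classify the captured API calls and
        store a signature when the prediction is crypto-ransomware."""
        is_ransomware = toy_learning_algorithm(api_calls)
        if is_ransomware:
            self.signature_repository.add(make_signature(program_bytes))
        return is_ransomware

    def inspect(self, program_bytes: bytes, api_calls: list[str]) -> str:
        if self.phase_two(program_bytes):
            return "blocked-pre-execution"
        if self.phase_one(program_bytes, api_calls):
            return "blocked-pre-encryption"
        return "allowed"
```

In this sketch, a sample that Phase-I flags once is caught by Phase-II on every later encounter without its API calls being analyzed again, which mirrors the two-layer design described above.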
In our results, the LA achieved the lowest FPR, 1.56%, compared with Naive Bayes (NB), Random Forest (RF), an ensemble of NB and RF, and EldeRan (a machine learning approach to analyzing and classifying ransomware). A low FPR indicates that the LA has a low probability of misclassifying goodware as crypto-ransomware.
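For reference, the FPR is conventionally the fraction of goodware samples that a detector wrongly flags. A minimal sketch of that computation follows; the function name is hypothetical, and the counts in the usage note are invented for illustration and are not taken from the experiments.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN), where FP is goodware flagged as ransomware
    and TN is goodware correctly left alone."""
    total_goodware = false_positives + true_negatives
    if total_goodware == 0:
        raise ValueError("FPR is undefined without goodware samples")
    return false_positives / total_goodware
```

For example, a detector that wrongly flags 2 of 100 goodware samples has `false_positive_rate(2, 98)`, that is, an FPR of 2%.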
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.