6.1. Addressing Hypotheses
The hypotheses were designed to test the capabilities of each traditional botnet detection technique on both IoT-based and traditional botnets. They were constructed on the assumption that the techniques would correctly identify all bots and non-bots in traditional networks while not being able to do the same for bots in IoT networks. Three capabilities were tested for both IoT and traditional scenarios: the ability to detect botnets, the ability to identify all bots, and the ability to identify all non-bots. Certain nuances in the results are also apparent, though not necessarily addressed by the hypotheses, such as the tendency for the techniques to perform better on IRC-based botnets and BotProbe’s diminished performance when applied to multiple infections. The outcome of each hypothesis for each botnet detection technique is shown in Table 9.
BotMiner met expectations in traditional scenarios, detecting all botnets and all bots, so H2 and H3 were accepted. BotMiner also defied expectations by doing the same for IoT-based botnets: H1 and H5 were rejected, as BotMiner performed better than expected at detecting bots. However, BotMiner was prone to producing false positives in certain cases and was unable to correctly identify all non-infected hosts, leading to H4 being rejected and H6 being accepted. Even though H6 was accepted, it must be noted that, overall, BotMiner produced fewer false positives in IoT scenarios than in the equivalent traditional scenarios, suggesting that BotMiner might have performed better on the more contemporary threat than on the one it was designed to address. Although BotMiner has some limitations in the form of over-sensitivity, the network being IoT-based does not appear to have any additional negative impact on its performance.
BotProbe was able to correctly identify all non-infected hosts, and no false positives were produced across all scenarios, both IoT and traditional. This means that it met expectations for the handling of non-infected devices in traditional networks while defying them for IoT networks. H4 and H6 were accepted and rejected, respectively, and no differences could be identified in terms of false positives between the traditional and IoT scenarios. BotProbe was able to detect bots across a number of scenarios, both traditional and IoT, leading to H1 being rejected and H2 being accepted. However, BotProbe was not able to detect all bots and had a diminished performance when multiple infected devices were present. Therefore, BotProbe did not meet the expectation given by H3, leading to that hypothesis being rejected and H5 being accepted. Although H5 was accepted, it should be noted that, while two of the traditional scenarios that BotProbe was applied to resulted in a mean TPR of 0, the same scenarios but on IoT networks had a mean TPR greater than 0. BotProbe appears to have performed better with the IoT-based scenarios overall, having detected at least one bot in every scenario.
BotHunter did not meet expectations on traditional networks, while effectively performing as expected on their IoT counterparts. BotHunter did not detect any bot activity in any of the scenarios, leading to H2 being rejected and H1 being accepted, the opposite of the result for the other two techniques. In line with that, H3 was rejected and H5 was accepted. Technically, BotHunter did not produce any false positives, meaning that H4 was accepted and H6 was rejected. Unlike with BotProbe, however, this does not appear to be a result of high specificity but rather of a total lack of sensitivity.
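The relationship between the three tested capabilities and the six hypothesis outcomes can be summarised in a short sketch. The hypothesis wording used here is inferred from the outcomes reported above rather than quoted from the hypothesis definitions, and the names `Capabilities`, `hypothesis_outcomes`, `RESULTS`, and `accepted` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Observed capabilities of one technique on one network type."""
    detected_botnets: bool
    found_all_bots: bool
    found_all_non_bots: bool


def hypothesis_outcomes(traditional, iot):
    """Map observed capabilities to hypothesis outcomes (True = accepted).

    Assumed reading, inferred from the reported outcomes:
    H2-H4 expect success on traditional networks,
    H1, H5, H6 expect failure on IoT networks.
    """
    return {
        "H1": not iot.detected_botnets,
        "H2": traditional.detected_botnets,
        "H3": traditional.found_all_bots,
        "H4": traditional.found_all_non_bots,
        "H5": not iot.found_all_bots,
        "H6": not iot.found_all_non_bots,
    }


# (traditional, IoT) capabilities as described in the text of this section.
RESULTS = {
    "BotMiner": (Capabilities(True, True, False), Capabilities(True, True, False)),
    "BotProbe": (Capabilities(True, False, True), Capabilities(True, False, True)),
    "BotHunter": (Capabilities(False, False, True), Capabilities(False, False, True)),
}


def accepted(technique):
    """Return the sorted list of hypotheses accepted for a technique."""
    traditional, iot = RESULTS[technique]
    outcomes = hypothesis_outcomes(traditional, iot)
    return sorted(name for name, ok in outcomes.items() if ok)
```

Under this reading, each technique has the same capability profile on both network types, which is why every accepted traditional hypothesis is paired with a rejected IoT hypothesis and vice versa.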
6.2. Detection Technique Performance
BotMiner had the highest and most consistent mean TPR across all scenarios, successfully detecting all bots in all simulated and externally recorded networks. However, BotMiner also produced the largest number of false positives among the techniques and was prone to making incorrect assessments whenever devices exhibited aberrant behaviour that the technique could misconstrue as malicious. BotMiner’s TNR was greater whenever a botnet utilising the IRC protocol was present, which appears to have allowed BotMiner to distinguish more accurately between bot and non-bot activities. This apparent pattern had a greater impact on IoT-based networks, where BotMiner achieved a mean TNR of 1 in scenarios where it was unable to do so for their traditional counterparts.
BotProbe successfully detected bot activities in almost all scenarios; only two traditional scenarios produced a mean TPR of 0. However, BotProbe failed to detect all bots in scenarios where more than one host was infected. It appears that the probing technique was unable to properly assess the probe responses to determine whether a given host was infected. When only a single infection was present in a given network, BotProbe successfully detected the bot. When the total number of devices in traditional networks was 100 as opposed to 10, BotProbe was able to produce a mean TPR above 0, possibly indicating that a greater volume of traffic allowed BotProbe to operate correctly. The IoT equivalents of the traditional scenarios in which BotProbe failed to make any detections also yielded a mean TPR above 0. The volume of traffic in the IoT scenarios was greater than that in the traditional equivalents, representing the more autonomous activities of IoT sensors as opposed to the more fluctuating, human-like activities represented in the traditional scenarios. Although BotProbe did not detect all bots, particularly when more than a single bot was present, it did correctly identify all non-bot hosts.
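For reference, the mean TPR and TNR values discussed here follow the standard definitions, TPR = TP / (TP + FN) and TNR = TN / (TN + FP), averaged over the runs of a scenario. The following is a minimal sketch of that calculation; the function names and the example counts in the tests are hypothetical and are not taken from the experimental data.

```python
from statistics import mean


def rates(tp, fn, tn, fp):
    """Return (TPR, TNR) for one run from its confusion-matrix counts.

    TPR = TP / (TP + FN): share of bots correctly flagged.
    TNR = TN / (TN + FP): share of non-bots correctly left unflagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("a run needs at least one bot and one non-bot host")
    return tp / (tp + fn), tn / (tn + fp)


def mean_rates(runs):
    """Average TPR and TNR over several runs of the same scenario.

    Each run is a (tp, fn, tn, fp) tuple.
    """
    pairs = [rates(*run) for run in runs]
    return mean(p[0] for p in pairs), mean(p[1] for p in pairs)
```

A technique that flags every bot but also flags some benign hosts has a TPR of 1 and a TNR below 1, which is the pattern described for BotMiner; a technique that misses some bots but never flags a benign host has a TPR below 1 and a TNR of 1, as described for BotProbe.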
BotHunter was unable to detect any bots across all scenarios, both simulated and externally acquired. It is possible that BotHunter’s approach was designed for situations where a given bot’s full lifecycle can be reported and modelled, something that may not have been available in the simulations or external datasets. The simulations did not necessarily include the infection stage, for example, though it is apparent that at least some of the CnC activity raised alerts when the IRC protocol was present. When the IRC protocol was not present, BotHunter had no alerts to model at all. It should also be noted that not all botnets allow detection during the infection stage, such as when backdoors are present on a maliciously modified ISO [48] or installed at some stage of a supply chain prior to purchase [49]. BotHunter’s modelling approach may have been too rigid for the scenarios presented while also failing to make any signature detections for non-IRC bots.
Comparing the techniques, it can be observed that BotMiner and BotProbe each have strengths that could potentially mitigate the other’s limitations. One limitation of BotProbe was apparent prior to experimentation: it is limited to the protocol that its initial filter is configured for. While BotProbe was able to detect at least one bot in most scenarios, it failed to detect every bot. BotMiner, in contrast, was able to detect all bots and is completely protocol-independent. However, BotMiner registered multiple false positives when aberrant activity was present, whereas BotProbe did not produce any false positives, even when aberrant activity was present. Given that BotMiner is protocol-independent and was able to detect all bots in all experimental scenarios and that BotProbe did not produce any false positives, it is possible that the techniques could be used to complement one another. If the CnC command traffic can be identified from BotMiner’s netflow and alert cluster correlation, BotProbe could be used to refine BotMiner’s findings.
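One way such a combination might be arranged is sketched below. This is an illustration of the idea only, not a method evaluated in the experiments: the function name `refine`, the labels, and the host addresses in the tests are hypothetical. Because BotProbe missed bots when multiple infections were present, the sketch assumes that hosts flagged by BotMiner but not confirmed by BotProbe should be kept as suspected rather than dismissed.

```python
def refine(botminer_flagged, botprobe_confirmed):
    """Label hosts flagged by BotMiner using BotProbe's probe results.

    botminer_flagged: set of hosts BotMiner reported as bots.
    botprobe_confirmed: set of hosts BotProbe confirmed by probing.

    Assumption: BotProbe produces no false positives but may miss bots,
    so an unconfirmed host is downgraded to "suspected", not cleared.
    Hosts that BotMiner did not flag are not considered at all.
    """
    labels = {}
    for host in sorted(botminer_flagged):
        if host in botprobe_confirmed:
            labels[host] = "confirmed"
        else:
            labels[host] = "suspected"
    return labels
```

Under these assumptions, the combined output preserves BotMiner's sensitivity while using BotProbe's specificity to prioritise which flagged hosts warrant immediate attention.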
Whereas BotMiner and BotProbe could be complementary to one another, BotHunter was unable to detect any bots. Like BotMiner, BotHunter appears to have been more capable when applied to IRC-based bots, given that its IDS component was able to make some detections when the IRC protocol was used. However, unlike BotMiner, BotHunter was unable to use these IDS alerts to produce a bot detection. It is possible that if BotHunter’s modelling were more accommodating of modern threat scenarios, or if the IDS component were made more sensitive, BotHunter could have been more comparable to BotMiner: both utilise an IDS and both are protocol-independent. Whether or not BotHunter would have produced false positives had this been the case cannot be determined from the experimental results.
With only one of the three detection techniques failing to detect any bots, and with certain limitations appearing to be unaffected, or sometimes even partially mitigated, by IoT scenarios, it can be concluded that traditional botnet detection approaches found in the literature, particularly BotMiner and BotProbe, are capable of detecting IoT-based botnets. While IoT-based botnets may present a more serious threat than their traditional counterparts owing to their numbers, availability, and relative lack of security, there are techniques capable of detecting them.