Understanding how SMEs within developed countries view cybersecurity and machine learning is important, as it provides a template for countries that are still developing their technology to combat cybercrime.
Section 2.1 gives a brief introduction to the understanding and origins of machine learning, with emphasis on the roots of ML in AI. The section goes on to explore how ML is broken down into categories to help manage Big Data and how public datasets could support future experiments for testing and using MLCS technology. For MLCS to succeed, the data within these datasets must be labelled so that the right information can be analyzed and obtained from it.
Section 2.2 covers the success stories of the application of MLCS in industry. This is particularly important, as success stories and examples of how larger companies are using MLCS give SMEs templates that they can use to help protect and secure their own data. Whilst they may not be direct cut-and-paste applications, they are nonetheless examples of what works and what is still a learning curve for MLCS applications. These success stories show SMEs that, with correct application, MLCS can work to their advantage, and they help to prove a method that could benefit the SME ecosystem. This section also shows that MLCS methods are worth adopting and exploring. Raising SMEs’ awareness should in turn increase the adoption rate of MLCS within the SME sector of developed nations, in particular the UK.
Section 2.3 discusses recent changes in the way SMEs work due to the pandemic, highlighting the issues raised for cyber security and how it is now becoming an important subject regardless of the industry the SME is in.
Section 2.4 goes on to look at SMEs’ understanding of machine learning through various examples experienced within the UK and its cyber agencies, including examples from developed countries.
Section 2.5 completes the literature review by taking a close look at the challenges of cybersecurity for SMEs within the UK SME market and comparing them with those in other developed nations.
2.1. Machine Learning–Understanding and Origins
In 1968, Arthur C. Clarke imagined that by the year 2001, a machine would exist with an intelligence that matched or exceeded the capability of human beings. By the 1980s, the film Robocop encapsulated AI technology through its robotic creation using ML and its algorithms [5]. AI and ML capabilities go far beyond conquering human hobbies and extend into everyday events in our daily lives. Professor Stephen Hawking, a world-renowned scientist, in an interview with the BBC in 2017 [6], discussed how efforts had been made to create thinking machines that potentially could pose a threat to our very existence. Hawking added that,
“The development of full artificial intelligence could spell the end of the human race.”
Machine learning (ML) is a branch of AI in which computers develop a model, learn over time without prior learning, and then improve this model much as a human would [7]. Over time, the computer improves based on its interactions, as its software grows and develops. In a paper by Hewage, C. et al. (2018), one example of AI usage was to model polyalphabetic ciphers for decryption, in other words, to break the code following a set of sequential mathematical calculations and models of evaluation. Hewage’s paper discussed selected traditional algorithms, namely hill climbing, the genetic algorithm, and simulated annealing, to decrypt sample codes [8]. This work follows its predecessors in code breaking back in 1941: the Enigma enciphering machine, which was used by the German army to send messages securely, was broken with the help of Alan Turing, who played a key role in the invention of the machine known as the Bombe, which significantly reduced the work of the codebreakers [9].
In the case of cryptology, the design of such ML algorithms rests fundamentally on the strong structures of cryptology. As cited in the paper by Hewage [8], “Cryptology is the art and science of making and breaking ‘secret codes’”. Hewage’s paper goes on to divide cryptology into the two sub-divisions of cryptography and cryptanalysis. Cryptography is the transforming and securing of the original data, whereas cryptanalysis analyses the encrypted data in order to decrypt it. Hewage’s paper focuses its decoding on algorithms that were inspired by nature to tackle complex problems, in particular looking at ant colony optimization and how social behaviors influence the finding of the shortest paths leading to the end goal [10]. Hill climbing was used to decrypt sample codes by collecting data and evaluating the result at each step of the “climb”, whilst the genetic algorithm took an evolutionary approach in which the results mutated and changed over time. Another algorithm discussed was simulated annealing, which follows a process of heating and cooling in an attempt to reach a maximum, with worse solutions progressively discarded to obtain the best solution available. The paper unites the understanding of AI and furthers its categories inspired by nature.
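The difference between hill climbing and simulated annealing can be illustrated with a minimal sketch. This is not the cipher-breaking code from Hewage’s paper: the one-dimensional score landscape, the function names, and the temperature schedule are all assumptions chosen only to show the two search strategies side by side. Hill climbing accepts a neighbour only if it scores better, so it can stall on a local peak, whereas simulated annealing also accepts a worse neighbour with a probability that shrinks as the temperature cools.

```python
import math
import random

# Hypothetical fitness landscape: a local peak at index 2 and the best peak at index 8.
LANDSCAPE = [0, 2, 4, 3, 1, 2, 5, 8, 10, 6, 3]


def score(x):
    """Fitness of candidate x (higher is better)."""
    return LANDSCAPE[x]


def neighbours(x):
    """Candidates one step to the left or right that stay inside the landscape."""
    return [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]


def hill_climb(start):
    """Move to the best neighbour until no neighbour improves the score."""
    current = start
    while True:
        best = max(neighbours(current), key=score)
        if score(best) <= score(current):
            return current  # no improving neighbour: a (possibly local) peak
        current = best


def simulated_annealing(start, rng, temperature=10.0, cooling=0.95, steps=500):
    """Random walk that tolerates worse moves while the temperature is high."""
    current = best = start
    for _ in range(steps):
        candidate = rng.choice(neighbours(current))
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = candidate
        if score(current) > score(best):
            best = current
        temperature = max(temperature * cooling, 1e-9)  # cool down, avoid division by zero
    return best


local_peak = hill_climb(0)  # climbs from index 0 and stops at the first peak it meets
annealed = simulated_annealing(0, random.Random(42))
```

Starting from index 0, hill climbing stops at the local peak at index 2, while starting from index 5 it reaches the best peak at index 8. Simulated annealing keeps track of the best candidate seen, so its result is never worse than its starting point.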
ML can be divided into three subcategories: supervised learning (SL) (task driven), unsupervised learning (USL) (data driven), and reinforcement learning (RL) (learning from errors). To understand the advantages and disadvantages of these algorithms, a dataset is always fed into them. These datasets are classified as labelled or unlabeled data [11]. ML is unable to move forward unless there is a dataset to work with. According to Buczak, A.L. (2015) [12], there exists a variety of datasets to choose from depending on the experiments being conducted. Of particular interest for ML algorithms are public datasets. DARPA 1999 and KDD 1999 are amongst the many datasets that have been used in the past and continue to be used in the public domain. These datasets, which now contain more than 4 million records, are difficult to maintain and require human intervention when it comes to labelling the records. How these datasets are labelled defines the category of ML used to move an algorithm design forward. Within a developed nation such as the UK, the use of these datasets falls within the scope of the DPA (Data Protection Act) 2018 and UK GDPR (General Data Protection Regulation). DPA and GDPR are important policy instruments regulating the framework for cyber security as well as data protection, which matters for the data mining and use of datasets when experimenting with ML [13].
In ML, the first category is SL, which is driven by tasks. It refers to the most basic type of ML, where the learning algorithm is developed on labelled data [12]. Buczak further explains that SL can be categorized into classification and regression. Classification refers to assigning data points to a set of discrete classes. Examples of classification in real life include predictive text in tweets on Twitter and product reviews on Amazon and eBay. Algorithms used here are support vector machines (SVM) and naïve Bayes (Bayesian). Regression is used to predict continuous values, and examples of the algorithms are decision trees and neural networks. Real life examples include improving healthcare [14], calculating temperature, insurance premiums, and pricing, and relating the number of workers to the revenue of a business.
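As an illustration of supervised classification, the following is a minimal naïve Bayes sketch written with the Python standard library only. The four training reviews, the labels, and the function names are invented for this example and do not come from Buczak’s survey; the sketch only shows how labelled data drives the learning step.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled training data: (review text, class label).
TRAINING = [
    ("great product fast delivery", "positive"),
    ("love it works great", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("awful waste of money", "negative"),
]


def train_naive_bayes(samples):
    """Count words per label; these counts are the whole model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        log_prob = math.log(count / total)  # class prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the product.
            log_prob += math.log((word_counts[label][word] + 1) / denominator)
        if log_prob > best_score:
            best_label, best_score = label, log_prob
    return best_label


model = train_naive_bayes(TRAINING)
```

With this toy model, `classify("works great", model)` returns `"positive"` and `classify("awful quality", model)` returns `"negative"`. A regression task would differ only in that the target is a continuous value rather than a class label.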
The second category in ML is USL. This uses datasets that are unlabeled, which means that human labor is not required to make the dataset machine-readable, thus allowing much larger datasets to be worked on by the program [13]. USL has two categories, namely dimensionality reduction and clustering, using many algorithms such as decision tree, random forest, missing values, principal component analysis (PCA), neural networks, fuzzy logic, and Gaussian. Dimensionality reduction focuses on data compression, which reduces storage space and computation time and helps remove redundant features. Clustering refers to the task of dividing data into groups. Real life examples include identifying fake news, implementing a spam filter, identifying fraudulent or criminal activity online, and marketing campaigns.
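Clustering can be sketched with k-means, a common clustering algorithm that is used here purely as an illustration (the text above does not name it). The points, the starting centroids, and the function name are assumptions; no labels are supplied, and the groups emerge from the distances between points alone.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Index of the centroid with the smallest squared Euclidean distance.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
            )
            clusters[nearest].append(point)
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled observations forming two visible groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, final_clusters = k_means(POINTS, centroids=[(1.0, 1.0), (8.0, 8.0)])
```

The starting centroids are passed in explicitly so that the result is reproducible; in practice they are usually chosen at random, which can change the outcome.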
The third category in ML is RL, based on the psychological concept of conditioning. RL works by placing the algorithm within a working environment with an interpreter and a reward system. The interpreter then decides whether the output is favorable or not. RL enables a machine to learn through interactions with an environment. An example of this is repeatedly playing a video game, with a reward given when the algorithm takes a favorable action. AlphaGo, DeepMind’s Go-playing program, is an example of RL.
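The reward loop can be sketched with tabular Q-learning on a toy environment. Everything here is an assumption made for illustration: a five-cell corridor whose last cell pays a reward of 1, the learning rate and discount factor, and the function names. To keep the result reproducible, random exploration is replaced by a systematic sweep over every state and action, which is a simplification of how an RL agent normally gathers experience.

```python
N_STATES = 5               # corridor cells 0..4; cell 4 is the goal (assumed layout)
ACTIONS = (-1, +1)         # step left or step right
ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor (illustrative values)


def step(state, action):
    """Environment: move along the corridor; reaching the last cell pays a reward of 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(sweeps=200):
    """Q-learning update applied to every (state, action) pair in each sweep."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(sweeps):
        for state in range(N_STATES - 1):  # the goal cell is terminal
            for action in ACTIONS:
                next_state, reward, done = step(state, action)
                future = 0.0 if done else max(q[next_state].values())
                target = reward + GAMMA * future
                q[state][action] += ALPHA * (target - q[state][action])
    return q


def greedy_policy(q):
    """Best-known action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

After training, the greedy policy steps right in every cell, because actions that lead towards the rewarded cell accumulate higher values than those that lead away from it.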
The next section shows how MLCS has proven to work well in big technology companies and how the use of ML technology and its methods has produced success stories for SMEs to learn from and perhaps even apply at their own level in industry.
2.2. Success Stories of Machine Learning Cybersecurity in Big Technology Companies
The information in Table 1 was collated from success stories of big technology companies using ML methods, techniques, and algorithms. It also shares reference points and further success stories of where MLCS has helped these companies secure their own internal systems from cyber threats. The legend explains the abbreviations used in the table.
Table 1 references recent articles on the internet written by a variety of technology magazines, suggesting that Amazon’s AWS (Amazon Web Services), Google’s Gmail, and Facebook are all using their ML knowledge in their cyber security models to advance their threat detection. Stephen Schmidt, Amazon Chief Information Security Officer (CISO), mentioned that Amazon had a duty of care to ensure the online safety of millions of people across the world, leading back to their cyber security structure. Siemens Cyber Defense Centre, which uses Amazon’s AWS, went on to build an AI-enabled, high-speed, fully automated, and highly scalable platform to evaluate 60,000 potentially critical threats per second. This success story has subsequently improved their cyber security and reduced their exposure to threats. In Table 1, it is also highlighted that Amazon used ML algorithms such as decision tree in its AWS services and has expanded its services through Amazon Macie, which was designed with embedded intelligence to protect the network and works with SL and USL methods [21].
In another article posted on CSO Online [18], Google was reported to use ML to analyze threats on mobile endpoints running Android, identifying and removing malware from these devices. As shown in Table 1, Gmail has seen success in its spam filtering, not just for incoming spam but also in the use of machine learning to identify other abuses, such as Denial-of-Service (DoS), virus delivery, and other imaginative attacks [20]. Based on these ML methods, Amazon launched a new service to classify its data storage under the SL techniques of ML.
Table 1 also shows the applications of a UK cyber security start-up company, Darktrace, which has seen success with its ML solutions since 2013 [22,23]. Darktrace used algorithms within its software package to spot attacks within one NHS agency’s network, and the threat was then mitigated without causing any damage to that organization. When WannaCry was the top cyber threat back in 2017, no Darktrace customers were harmed, as the ML algorithms were able to intervene and create a safe environment for them [19]. According to Vähäkainu, P. (2019) [18], Darktrace uses its own Enterprise Immune System (EIS) technology and combines this ML technique with Bayesian algorithms and other mathematical principles in order to detect anomalies for cyber threat detection within a network. Vähäkainu describes the technology using Bayesian probability theory and how Darktrace monitors raw data, such as cloud service interactions. Vähäkainu also explains how this data was then transferred onto a network in real time, without disturbing business operations and transactions.
Other companies to take up MLCS include PayPal, Visa, and Mastercard. These companies use deep learning algorithms to identify and prevent fraudulent behavior within milliseconds before, during, and after a transaction, as reported in an article written in November 2020. Mastercard had experienced over 200 fraud attempts per minute, which prompted it to use ML algorithms to combat cyber security threats, and it too chose to implement deep learning algorithms within its network.
Another article in the MIT Technology Review, dated April 2020 [24], explained how hackers were trying to trick Tesla’s program into veering into the wrong lane whilst driving. However, Elon Musk’s investment in ML showed strength in trying to overcome this issue.
Table 1 goes on to show that in a similar study [18], Tesla used ML to secure Wi-Fi and browser vulnerabilities using zero-day exploits to limit tampering with autonomous vehicles, which can be disruptive.
Various other research shows MLCS in action and how the use of particular ML algorithms enhances the applications concerned. In particular, e-commerce applications give customers the added advantage of buying products with suggestions in the form of reviews, as in the design of the likes of Amazon and eBay. In a paper by Uppal, S. (2019), the author gave importance to how these reviews become useful and affect customers’ willingness to purchase products. However, whilst most reviews are positive, many can create problems if they are less savory in nature and if customers are not able to separate useful ones from those that are nonsense. Uppal’s paper pays attention to the need for an approach that showcases only reviews relevant to the customer’s interest. It suggests the “Pairwise Review” relevance ranking method, which ranks reviews by their relevance to the product and avoids showing irrelevant reviews. The ML algorithms used were SVM, random forest, neural network, and logistic regression, applied to validate ranking accuracy. Of the four classification models, random forest gave the best result, achieving 99.76% classification accuracy and 99.56% ranking accuracy on the complete dataset. This success story showed that ML is becoming more applicable in everyday applications as well as in cyber security for protecting the network, in this case protecting the integrity of a sound business with a genuine reputation [25].
In real-life applications of the ML techniques and algorithms from the previous section, technology giants such as Amazon, Google, and Facebook have all been gradually ramping up their security models using AI and ML. These technology giants have used ML to improve their customer service experience, further develop their customer engagement, and complement their cyber security. They have also created ML products to protect their own customers from cyber threats [26].
In the same paper by K. Lee et al. [15], it was observed that malicious spammers would exploit the social media systems of these technology giants through phishing attacks, malware, and the promotion of affiliate websites, thus leading to the development of spammer detection in social network companies such as Twitter, Facebook, and MySpace. Developing specific classification techniques enables the detection of email spam and phishing; these approaches rely on data compression algorithms, machine learning, and statistics, and could inform the further refinement of many proposed approaches. Lee’s paper uses SL techniques based on the support vector machine (SVM), with its high precision and low false positive rate, with its information and data feeding into the SVM classifiers.
ML’s celebrity status now covers nearly all disciplines, including sports analytics, where visualizing impact assists decision-making to bring sports performance to its peak, as explained in Jayal, A. et al.’s paper. Jayal’s research uses big data approaches and the analysis of approach-based structures in integrating problem-based learning through interactive visualization, simulation and modeling, geospatial data analysis, and ML, amongst various other big data techniques, with a particular focus on ML and its algorithm usage in sports. The approaches of clustering techniques, survival analysis, artificial intelligence, rule-based approaches, graph-based approaches, and inductive logic programming, plus neural networks and deep learning, allow for a greater understanding of how to identify a general-purpose toolkit that can be used with the help of data reduction, data mining, and analytics approaches in sports [27].
Through these success stories, as shown in the diagram above, there is certainly an overlap in the types of methods being used, and this overlap informs the techniques and algorithms deployed to obtain the most effective solutions in the market to combat cyber threats through various cyber security software packages. The next section leads on to SMEs’ view of cyber security and UK SMEs’ adaptation of MLCS.
2.3. SME’s View of Cyber Security and Adaptation of MLCS
At the start of 2020 there were 5.94 million small businesses (with 0 to 49 employees) in the UK, accounting for 99.3% of all businesses, as recently reported by the National Federation of Self Employed & Small Businesses (FSB) [28]. The same set of statistics shows that UK SMEs account for 99.9% of the business population, equivalent to 6.0 million businesses. According to the definition by the UK government, micro-SMEs have fewer than 10 employees and an annual turnover under €2 million, small SMEs have fewer than 50 employees and an annual turnover under €10 million, and medium-sized SMEs have fewer than 250 employees and an annual turnover under €50 million. Between 2019 and 2020, the total business population grew by 113,000 (1.9%). The COVID-19 pandemic has caused the UK to face challenges affecting the economy, and SMEs alongside other organizations have made a shift from physical shop windows to virtualization in cyberspace [29].
According to the Office for National Statistics (ONS), reporting in December 2020, temporary closures, a shift to online shopping, and reduced travel meant the first wave of the coronavirus (COVID-19) had an enormous impact on business, and some industries felt the impact far more than others [30]. Whilst some industries shrank by up to 90% in April and May, others recorded some growth. In particular, online shopping grew far beyond its pre-pandemic trend, and our cyber footprint exploded, seemingly having no boundaries [31].
To extend their reach and grow their businesses, SMEs have seen a massive rise in online activity and have had to change the way they work to accommodate it. SMEs have had to change their technology and organization but, most importantly, how they work with their staff, from working from home to making sure their business data are kept safe and secure. Whilst larger organizations have had the benefit of many departments cushioning various corners of the business, with the right people being paid the right money to support the organization, this is not the case for SMEs. With a smaller group of people managing the business and controlling its growth rate, SMEs fall into a niche category of experts who potentially have to understand and know everything about the business and be flexible in how work is conducted and administered.
In light of these challenges and changes to SMEs and hybrid working conditions, an exceptional rise has been seen in the usage of the Internet of Things (IoTs). Employees working from home are having to juggle personal and business life, using personal devices to access business data [32]. Sharing data, and sharing it in particular ways, has become important given the need to work from home during the pandemic of 2020. The pandemic has brought to light the need for IoTs, such as phones, iPads, and other smart devices in daily use. These IoTs are being used in industry to keep up with the growing demand for faster information, with devices talking to each other across connected networks in cyberspace, which allows data to be transferred quickly and efficiently. This is particularly advantageous to the SME industry because of its size and its ability to be flexible in how employees work, and because of the changing way in which SMEs need to grow.
However, this scenario within the functions of an SME is now presenting numerous challenges, including those related to privacy, security, and data breaches, and those pertaining to ethical, legal, and jurisdictional matters. IoTs cover a broad range of proprietary hardware and software that often use different data formats, networks, communication protocols, and physical interfaces, resulting in technical challenges. MLCS methodologies allow for the analysis of SME business and give rise to management questions about the multiple interactions and complexities that arise from being connected to the internet. The large quantities of data transferred along the way are often private and sensitive. This creates a wider attack surface for potential malicious activities.
IoTs and ML have clearly moved forward positively and made life easier to manage, to the point where humans now cannot even imagine life without technology. The COVID-19 pandemic in 2020 has taken this one step further, accelerating the usage of IoTs and the applications of MLCS into new realms that humans perhaps cannot even understand. Even the likes of chatbots have emerged to manage online interactions linked to the use of AI applications. Chatbots [33] have replaced people online, and ML is now learning everything about us and how humans behave. ML, in its integration into IoTs, is now changing how we interact online and adapting to our needs and surroundings. The desire for humans to interact with machines is vital. It is no wonder that the assumption of people’s awareness is at stake.
2.4. SME’s View and Support on Machine Learning
The UK’s answer to providing intelligence and information assurance to the government and armed forces is the Government Communications Headquarters (GCHQ) [
34]. The GCHQ is an intelligence and security organization with a mission to keep the UK safe.
The National Cyber Security Centre (NCSC), under the parent body of GCHQ and other national security centers, offers online guidelines for SMEs and businesses on how they can avoid cyber-attacks. Following these guidelines raises SMEs’ awareness and helps them shape their business to keep their data as safe as can be. The guidelines cover topics such as backing up SMEs’ data, protecting organizations from malware, keeping IoTs safe, using well-structured passwords and password management to protect data, and avoiding phishing attacks, amongst many other tips and tricks. Most of these guidelines give helpful hints and share knowledge on how to develop a state of awareness and be diligent in keeping information safe [35].
In news published in September 2020 by GCHQ, ten tech cyber security start-up companies using AI, Data Science, and ML were selected to benefit from a 12-week support program, based out of GCHQ’s Manchester office. These included firms that use AI to alert haulage companies to stowaways in their containers, data to determine how busy trains are in order to manage social distancing, and AI and ML to identify and prevent the spread of fake news [36]. In April 2019, guidance was written for the website of the National Cyber Security Centre (NCSC), which is now part of GCHQ, offering information on assessing intelligent tools for cyber security in the form of AI and ML. The NCSC provides a single point of contact for SMEs, larger organizations, government agencies and departments, and the general public, and also collaborates with law enforcement, defense, the UK’s intelligence and security agencies, and international partners.
These methods adopt the stranger danger policy in helping SMEs move forward. Whilst this is useful, many SMEs fall short due to how they go about securing their data rather than getting their hands dirty for prevention.
SMEs, due to their structure and economic characteristics, can be severely damaged when a cyber-attack takes place. In a 2020 study by López, M.Á. on intelligent detection, the author outlined the different scenarios of cybercrime and what can be done to remedy the situation [37]. Here Lopez proposed an intelligent cybersecurity platform designed with the objective of helping SMEs make their systems and networks more secure and robust. The aim of this platform was to provide a solution optimizing detection of and recovery from attacks. The proposal applies a proactive security technique in combination with both machine learning (ML) and blockchain. The proposal, which is part of a project (the IASEC project) funded by the Innovation and Development Agency of Andalusia (IDEA) in Granada, Spain, another developed nation, provides security in each phase of an attack, helping SMEs with prevention so that systems and networks avoid being attacked. For SMEs, using software for security information and event management systems (SIEMs) is very important in helping organizations become compliant and in having the infrastructure in place to deal with any breaches. Lopez et al. proposed providing resources to optimize detection and self-recovery of systems and services after an attack, creating a solution for detecting and dealing with fake publications on the Internet, protecting IoT devices and Industry 4.0 from the attacks most relevant to SMEs, and detecting and preventing the spread of fake news and hoaxes. These objectives are tackled by combining smart systems and blockchain. In the proposal, blockchain is used to improve the security systems by protecting data integrity in a secure and transparent way.
Another study by Rawindaran, N. et al. (2021) [38] explored the importance of early detection of cyber-attacks through SIEMs, especially in the cycle of network security. Intrusion detection and prevention systems (IDPS) were tested, and commercial network intrusion detection systems (NIDS) were compared with open-source devices for combating cyber-attacks. These IDPS devices all came with their own SIEMs to track events and send alerts as part of the IDPS cycle. Amongst those discussed were SolarWinds, Cisco, Tripwire, Wireshark, and Splunk, to name a few. Protection of data, as evaluated and discussed in Rawindaran’s paper, is the reason why IDPS have come into force more within the SME market [38].
Both Rawindaran, N. et al. and Lopez, M.A. et al. agree that intrusion detection systems (IDSs) can be network-based (NIDS) or host-based (HIDS) and can monitor and analyze network traffic in real time, together with analyzing records, databases, and other elements in a host, to detect possible intrusions. IDSs can also be grouped according to the type of detection technique, being either signature-based or anomaly-based [38].
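The distinction between signature-based and anomaly-based detection can be made concrete with a minimal sketch. The signature strings, the request-rate baseline, and the threshold of three standard deviations are all hypothetical values chosen for illustration and are not taken from either paper or from any commercial IDS.

```python
import statistics

# Hypothetical signatures: byte patterns already known to indicate an attack.
SIGNATURES = ("' OR '1'='1", "../../etc/passwd")


def signature_match(payload, signatures=SIGNATURES):
    """Signature-based detection: flag traffic that contains a known bad pattern."""
    return any(signature in payload for signature in signatures)


def is_anomalous(value, baseline, threshold=3.0):
    """Anomaly-based detection: flag values far from the normal baseline (z-score test)."""
    mean = statistics.mean(baseline)
    spread = statistics.pstdev(baseline)
    if spread == 0:
        return value != mean
    return abs(value - mean) / spread > threshold


# Hypothetical requests per minute observed during normal operation.
BASELINE_RATES = [100, 110, 90, 105, 95]
```

A signature check can only recognize attacks that have been seen before, whereas the anomaly check can flag previously unseen behavior at the cost of depending on how well the baseline represents normal traffic.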
ML techniques and algorithms have now contributed largely to how data can be classified, labelled, and ultimately managed under the umbrella of AI. Techniques such as supervised and unsupervised learning have helped in handling Big Data within this cyber space through various classification, regression, and clustering activities, which allow outcomes to be predicted. Together, ML’s mathematical algorithms shape how data is treated and managed to produce the outcomes and predictability required to contribute to economic growth in societies moving forward.
In Lopez, M.A. et al.’s [37] proposal, ML techniques were used for data collection, testing, and evaluation, and the main goal was to determine the most efficient algorithm for intrusion detection. Supervised ML algorithms for detection were compared, namely C4.5 (decision tree), Bayesian network, random forest, support vector machine (SVM), and artificial neural network (ANN). The study performed measurements on different data samples, and the results showed that C4.5 was the most precise among the studied algorithms. Finally, another proposal was to build a solution focused on cyber security for a smart home or smart office, applying two variants of long short-term memory (LSTM), which is a type of neural network.
SMEs are all aware of Denial-of-Service (DoS) and Distributed DoS (DDoS), malware, and web-based attacks, as they are some of the most common security incidents. Lopez explains that when a server suffers a DoS attack, the system records in a smart contract the IP addresses involved in the attack, with new blocks created every 14 s through blockchain technology. Each user in the network then has an updated list of the malicious addresses in that interval, allowing security staff to take action to mitigate the attack. This solution can be extended to DDoS attacks using an accurately obtained dataset, with the random forest ML algorithm used for model building. Similarly, for the detection of structured query language injection (SQLi) attacks, ML algorithms such as decision stump, naïve Bayes, Bayesian network, and radial basis function (RBF) network, which is an ANN, are applied to datasets. The most efficient algorithm was decision stump. Naïve Bayes was then used to classify SQL queries as malicious or legitimate, taking both grammar and SQL syntax into account, extracting features from the language, and defining rules. Several classifiers, such as SVM, ensemble bagged trees, and ensemble boosted trees, were trained, and in this case the best result was obtained with the decision tree model. Another attack that is common to SMEs uses a domain generation algorithm (DGA), which can be detected by analyzing DNS traffic in pseudo-real time. A DGA is used to generate new domain names and IP addresses for malware’s command and control servers. Here the proposal filters non-resolved DNS requests and identifies the hosts showing the highest peaks for this value. From Lopez’s study, it is apparent that the three most common attacks on SME infrastructure can be identified and prevented by the correct use of ML coupled with blockchain technology to protect the business.
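The DGA detection step, in which non-resolved DNS requests are counted per host, can be sketched as follows. This is not the implementation from Lopez’s proposal: the log format of (host, domain, resolved) tuples, the sample addresses and domains, the threshold, and the function name are assumptions made only to show the idea of looking for hosts with a peak in failed lookups.

```python
from collections import Counter


def hosts_with_failed_lookup_peaks(dns_log, threshold):
    """Return hosts whose number of non-resolved DNS requests reaches the threshold.

    dns_log is an iterable of (host, domain, resolved) tuples (assumed format).
    """
    failures = Counter(host for host, _domain, resolved in dns_log if not resolved)
    return sorted(host for host, count in failures.items() if count >= threshold)


# Hypothetical DNS log: one host repeatedly asks for domains that do not resolve,
# which is the pattern expected from malware cycling through generated names.
DNS_LOG = [
    ("10.0.0.5", "qzxv1.example", False),
    ("10.0.0.5", "kkjw8.example", False),
    ("10.0.0.5", "pplm3.example", False),
    ("10.0.0.7", "intranet.example", True),
    ("10.0.0.7", "typo.example", False),
    ("10.0.0.9", "mail.example", True),
]

suspects = hosts_with_failed_lookup_peaks(DNS_LOG, threshold=3)
```

With the sample log and a threshold of 3, only `10.0.0.5` is reported. In a real deployment the threshold would need to be tuned against normal traffic, since ordinary typing mistakes also produce a small number of failed lookups.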
By using the right SIEMs together with the IDPS and NIDS/HIDS, SMEs can be educated in the right direction to be able to make the informed decisions they need to make.
Another study evaluated ML algorithms for anomaly detection using the ALICE high performance computing facility at the University of Leicester [39]. The machine had 64 GB of RAM, two Ivy Bridge CPUs at 2.50 GHz (20 cores in total), and 2 × Nvidia Tesla P100 GPU cards. Python 3.6.8 was used to run the service on an Enterprise Operating System (CentOS Linux 7). The classical ML algorithms were implemented using the Scikit-learn 0.21.3 ML library. The deep learning algorithms were implemented using the Keras 2.3.0 neural-network library on top of TensorFlow 1.9.0 to enable the use of the GPU. Sigmoid and SoftMax functions were used for binary and multi-class classification, respectively. The Pandas and NumPy library packages were used to manipulate and analyze the raw data. This evaluation provides a comprehensive analysis of the ML algorithms, with the result that the random forest (RF) algorithm achieved the best performance in terms of accuracy, precision, recall, F1-score, and receiver operating characteristic (ROC) curves on all datasets. The main contributions of this paper were that currently available datasets containing the most up-to-date attack scenarios were used and that ML anomaly detection was applied to them. Binary and multi-class classification were evaluated on these performance metrics to identify the best-fit algorithms for the anomaly detection challenge. The same study shared with the research community and the SME cybersecurity industry insightful knowledge and suggestions regarding suitable ML algorithms to support cybersecurity.
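The performance metrics named above follow directly from the counts of true and false positives and negatives. The sketch below uses only the Python standard library rather than the Scikit-learn or Keras code of the cited study, and the label vectors are invented for illustration; it also shows the sigmoid and softmax functions that turn raw model scores into probabilities for binary and multi-class classification.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for labels where 1 = attack, 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def sigmoid(x):
    """Map a single score to a probability (binary classification)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Map a list of scores to probabilities that sum to 1 (multi-class classification)."""
    largest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical ground truth and predictions for eight network flows.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 1]
Y_PRED = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(Y_TRUE, Y_PRED)
```

For these eight flows there are three true positives, three true negatives, one false positive, and one false negative, so accuracy, precision, recall, and F1-score all come out at 0.75.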
For all its complexity, ML is playing an important part in the way attacks are handled in cyber security and in protecting data. In a paper by Gupta, A. (2021) [40], the authors clarified the various applications of ML in cyber security within the SME market. The ability of ML to detect malicious events and prevent attacks is the top reason to use ML within the cyber security infrastructure and to start using devices and technology that can support anomaly detection for zero-day vulnerabilities and the protection of networks and endpoints, coupled with application security and user behavior. ML usage in IoT comes second, as its incorporation into mobile gadgets, such as those from Google and Apple’s Siri, has become important in the cyber security ecosystem. Various other uses of ML will support human analysis and make our jobs easier by filtering data, reviewing millions of login details, passing information on to human analysts, minimizing notifications, and building a complete AI system to support the system.
The next section looks at how SMEs’ views and support of MLCS lead to barriers and challenges within the industry and how this can be overcome.
2.5. SMEs’ Cybersecurity Barriers and Challenges
Whilst there is a huge advantage to using ML in the SME industry, the disadvantages include dataset availability for testing, the fact that information can be mixed up, and the need for information to be ground-truthed. According to Gupta [40], human intervention in creating the mathematics and the models is still required, and the margin of human error is still to be defined. The degree of human intervention remains high. Lopez, M.A. et al. [37] highlighted barriers ranging from a lack of resources and insufficient knowledge to set up efficient security systems, such as SIEMs, to the challenges of implementing a security platform that provides this knowledge by means of ML techniques. Whilst the architecture is scalable, SMEs rely on micro-services for detection and recovery when an attack is predicted to occur.
SMEs have become highly vulnerable to cyber-attacks due to their unique ecosystem. One reason could be the potential shortage of cybersecurity knowledge and resources within the SME organizational structure. SMEs have been put into positions of exploitation, whereby cyber-attacks are likely and cyber incidents come at a high price. In a recent paper, van Haastrecht, M. et al. (2021) [41] observe that SMEs struggle to cope with the rise in cyber security threats and propose intuitive, threat-based cyber security risk assessment approaches for the least digitally mature SMEs, using a socio-technical cyber security framework to help address the needs of SMEs. Van Haastrecht et al. use both a framework and the ADKAR (awareness, desire, knowledge, ability, reinforcement) change management model of Hiatt [42] to guide the research in covering the social dimensions that need to be considered in SMEs. Five main classes of aggregation strategy were applied (weighted linear combinations, weighted products, weighted maxima, weighted complementary products, and Bayesian networks), and the results are able to determine whether the application within the SME was too simplistic or needed advanced care. The framework was then applied to SMEs divided into four categories, as suggested by the European DIGITAL SME Alliance [43]:
start-ups,
digitally dependent SMEs,
digitally based SMEs, and
digital enablers.
In summary, digitally based SMEs and digital enablers were advised to use a more comprehensive risk assessment approach and maturity model, because the expertise available within these organizations allows them to build trust in cyber security with standards and policies in place. Digital enablers were also prime candidates for using more advanced aggregation strategies, such as Bayesian networks, as they have the cyber security expertise and data required to make these solutions successful. For start-ups and digitally dependent SMEs, threat-based risk assessment approaches based on non-aggregated or intuitive strategies worked better: by focusing on the real-life threat environment, they accommodate feelings of competence and relatedness, ensuring optimal organization and employee motivation and doing what is right. Van Haastrecht, M. et al. go on to explain that one size does not fit all: the type of SME matters, and its intellectual knowledge contributes to the success of its cyber security landscape. The barriers here reflect that SMEs cannot adopt a “cut and paste” style of understanding cyber security and its threats in the way that larger organizations can.
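The simpler aggregation strategy classes named above can be illustrated with a short sketch. The formulas below are commonly used forms of each strategy and are stated here as assumptions; the exact definitions in van Haastrecht et al. may differ. Scores and weights are assumed to lie in the range 0 to 1, and the function names are hypothetical. Bayesian networks are omitted because they require a full probabilistic model rather than a single formula.

```python
from math import prod


def weighted_linear_combination(scores, weights):
    """Sum of weight * score; weights are typically chosen to sum to 1."""
    return sum(w * s for s, w in zip(scores, weights))


def weighted_product(scores, weights):
    """Product of score ** weight; a single zero score drives the result to zero."""
    return prod(s ** w for s, w in zip(scores, weights))


def weighted_maximum(scores, weights):
    """Largest weighted score; the result is set by the single dominant indicator."""
    return max(w * s for s, w in zip(scores, weights))


def weighted_complementary_product(scores, weights):
    """1 - product of (1 - weight * score); grows as more indicators are raised."""
    return 1 - prod(1 - w * s for s, w in zip(scores, weights))
```

With two indicator scores of 0.5 and 1.0 and equal weights of 0.5, the four strategies give noticeably different aggregate values, which is why the choice of strategy matters when deciding whether an approach is too simplistic for a given type of SME.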
In another article, by Tam, T. et al. (2021) [44], another developed country, Australia, is examined. Lessons are shared on how developed nations such as Australia deal with their SMEs and the cyber security challenges those SMEs face. Large organizations within Australia have always been early adopters of cyber security measures, often having the workforce, finance, and environment to support research and development in cyber threats. Tam explains that most cyber security lessons and conventions exist as a result of early large-scale incidents such as NotPetya, Equifax, Wikileaks, etc., which mostly affected large organizations. Consequently, cyber security industry best practices, standards, and products are influenced by the needs of larger organizations. Tam also highlights that the technical landscape of an SME can potentially be very different from that of a large enterprise, making it impractical to apply solutions designed for larger enterprises to smaller-scale users. An example from the UK is the implementation of Cyber Essentials and GDPR. Larger organizations found implementation easier than SMEs did, purely because they had the labor-power and the technical expertise to implement them more smoothly. The small business IT technical architecture becomes another barrier to adopting a complete cyber security solution. Tam further explains that another major barrier to technical implementation is the need for a robust testing environment. In the context of this Australian example, testing environments are more achievable for larger organizations than for SMEs. Tam notes that any cyber security solution designed to test a response to debilitating events requires a safe testing environment.
For example, denial of service (DoS) simulation tools can simulate a service overwhelmed with requests, resulting in legitimate requests not getting through. A DoS simulator, if run on a live system, would render the SME business IT infrastructure, e.g., its website, unavailable to customers, or worse, jeopardize overall system integrity and lead to a potential loss of business. Tam concludes that live environments cannot be used for stress-inducing tests. Consequently, businesses without a test environment will never be able to test the full suite of catastrophic scenarios as part of their incident response training. Tam goes on to note that a test environment requires substantial technical knowledge, time, and ongoing maintenance, which is only feasible in larger organizations and very rarely seen within the SME context.
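The effect described above, legitimate requests failing once a service is overwhelmed, can be reasoned about with a purely arithmetic model that sends no traffic and touches no live system. The sketch below is an illustration only and does not represent any tool discussed by Tam; the function name and the assumption that the server cannot tell legitimate requests from attack requests (so each has an equal chance of being served) are hypothetical simplifications.

```python
def legitimate_success_rate(capacity, legit_requests, attack_requests):
    """Estimate the share of legitimate requests served in one time interval.

    capacity: maximum requests the service can handle per interval.
    legit_requests, attack_requests: requests arriving in that interval.
    Assumes the service treats all requests alike, so the served fraction
    is the same for legitimate and attack traffic.
    """
    total = legit_requests + attack_requests
    if total == 0:
        return 1.0  # nothing arrived, so nothing was dropped
    return min(1.0, capacity / total)
```

Under these assumptions, a service able to handle 100 requests per interval serves all 50 legitimate requests when there is no attack, but only about one in ten of them when 950 attack requests arrive in the same interval.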
In addition to technical barriers, Tam’s study also highlights barriers such as human factors, which contribute to the challenges SMEs face in making the right cyber security choices and relate to the organizational and process maturity of the SME sector. The complexity of implementing industry standards and having to bear the costs of cyber insurance, legal remediation, and data breaches also contribute to why SMEs in Australia have sometimes found it impossible to keep up with protecting their data. Tam’s paper is relevant because the technology landscape it describes is very similar to that of countries such as the UK, which will therefore have similar cyber security concerns. SMEs in Australia and the UK have similar societal profiles and thus share similar human struggles with cyber security. Tam’s paper concludes that opportunities to apply non-traditional solutions to cyber security are becoming apparent through newfound alliances, security paradigms, and the open source community, helping SMEs build up their defenses to combat cyber-attacks.
The literature review above required the use of various platforms to perform searches on the topic of this article. The methodology is documented in Appendix A of this article. The next section presents the methodology applied in the survey questionnaire run for this paper to hear the voices of SMEs in the UK on their impressions of these various cyber security topics, paying particular attention to awareness and changes through the pandemic and to how governments could make changes to bridge the gap to a better and safer cyber landscape.