Multi-Level Clustering-Based Outlier’s Detection (MCOD) Using Self-Organizing Maps

Outlier detection is critical in many business applications, as it recognizes unusual behaviours to prevent losses and optimize revenue. For example, illegitimate online transactions can be detected from their patterns with outlier detection. The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset. This paper proposes a multi-level outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers. The proposed detection method is tested on datasets in different fields with different sizes and dimensions. Experimental analysis has shown that the proposed MCOD algorithm is able to improve the outlier detection rate compared with traditional anomaly detection methods. Enterprises and organizations can adopt the proposed MCOD algorithm to ensure sustainable and efficient detection of frauds/outliers to increase profitability and/or enhance business outcomes.


Introduction
Illegal actions in business usually lead to significant financial losses, especially for organizations that handle large amounts of data or metrics. For example, with the growth of online shopping, the number of online transaction frauds is increasing; these involve scammers pretending to be legitimate online sellers, or buyers paying with unauthorized credit cards. However, it is impractical to examine each metric of the dataset manually over the whole timeframe. Therefore, discovering anomalous behaviours in data is critical to reducing fraud and increasing profitability. Outliers, or anomalies in general, are extreme objects that differ from other observations in the same dataset [1]. They can cause issues for statistical applications or for the training of machine learning algorithms, because an outlier may represent a variation, an experimental error, or a novelty. Therefore, outlier detection is a popular topic in data mining research and is commonly used in credit card fraud detection [2], medical diagnosis [3], intrusion detection in cloud computing [4], and dataset pre-processing [5]. For example, in [6], an automatic framework to estimate the production frontier is proposed. The first step in the framework is to fit a cubic function to the data points. Points that are far away from the curve are potential outliers. After all potential outliers are eliminated, the remaining points are used to estimate the production frontier. This framework has been shown to produce meaningful outcomes in simulation experiments and real-life applications. Furthermore, the centre location problem is addressed in [7], in which the facility locations best suited to existing customers are decided. In the problem-solving process, far-away customers are considered outliers, because excluding those distant customers may lead to a reasonable and economic centre location for the decision-makers. Therefore, a k-max function is proposed to limit the influence of far-away customers for an optimal outcome. The value of k is pre-specified, and the (k − 1) farthest customers are detected as outliers and excluded from the subsequent location decision. The location decision process with outlier detection is able to provide an economically efficient solution for the majority of customers.
Big Data Cogn. Comput. 2020, 4, 24
Outlier detection has also shown a significant impact in time-series analytics, as illustrated in [8], which proposed a non-parametric outlier detection method (FOD) for time-series data. FOD is based on the frequency domain and the Fourier transform. First, the Fourier transform of the time-series data is calculated. Then, periodic peaks and their most repetitive interval in the frequency domain are detected and transformed back to the time domain. In the time domain, the global extremes are identified as periodic outliers. In general, depending on how an outlier is defined, outlier detection methods are classified as distance-based, distribution-based, density-based, deviation-based, angle-based, deep learning-based, or clustering-based. Existing outlier detection methods require prior knowledge of dataset patterns to obtain decent detection accuracy. For example, the distance-based method assumes that inliers are close to each other, and the density-based method assumes that inliers have more neighbouring data points than outliers. However, those assumptions may not suit general datasets with various types, sparsity, configurations, or prior labelling. Clustering analysis handles unlabelled data, overcomes the sparsity of data, and works efficiently on datasets with various configurations.
Therefore, this paper proposes a clustering-based outlier detection technique that achieves better detection performance than traditional detection methods on datasets of different types and structures. We call it the Multi-level Clustering-based Outlier's Detection method (MCOD). The MCOD algorithm uses two stages to discover outliers. In the first stage, a clustering process is performed on the original data to generate summarizations (i.e., cluster prototypes) of the data. In the second stage, an outlier risk factor (ORF) is assigned to each data point x. The ORF measures both the size of the cluster the data point belongs to and the distance between the object and its closest cluster. In the first stage, the proposed MCOD algorithm uses self-organizing maps (SOM) [9] as the base level of clustering, owing to their efficiency in handling several types of classification problems while providing a useful, interactive, and intelligible summary of the data. The major disadvantage of the SOM is that it requires sufficient data to develop meaningful clusters. Furthermore, its clustering performance strongly depends on the initial weight vectors, and initializing the SOM weight vectors with prior knowledge of the dataset significantly helps to group the input data correctly. The proposed MCOD provides a multi-level clustering process that enhances the quality of the SOM and provides significant outlier detection capability through a cascaded clustering and detection process. In this paper, the MCOD is applied to datasets from different fields, such as biomedical datasets and credit card fraud transactions. Experimental results show that the MCOD improves the outlier detection rate compared with state-of-the-art methods. Using the MCOD, business organizations can promptly detect fraud in data and subsequently increase profit or users' outcomes.
The rest of this paper is organized as follows: Section 2 reviews current outlier detection methods; Section 3 presents the proposed multi-level clustering-based outlier detection model; Section 4 describes the experimental analysis; and Section 5 outlines the conclusions and future work.

Related Work and Background
This section reviews the commonly used techniques for detecting outliers, which are based on distance, distribution, density, deviation, angle, neural networks, and clusters. For each technique, the detection methodology and its execution complexity are discussed.

Distance-Based Outlier Detection
This approach identifies an outlier based on the distance to its neighbours. If the locality of a data point is sparsely populated, then this point is an outlier [10]. A reasonable distance value and a reasonable number of neighbours are set as the thresholds. If the distance between two points is within the distance threshold, the two points are considered neighbours. The number of neighbours is the criterion for defining an outlier. This detection scheme was formalized in [11]. Given a dataset X, an object x ∈ X is an outlier if it meets the following condition:
$$\left|\{\, y \in X : d(x, y) \le \delta \,\}\right| \le \alpha n \tag{1}$$
where n represents the number of objects in the dataset, d(·,·) is the distance measure, and $\alpha, \delta \in \mathbb{R}$ ($0 \le \alpha \le 1$) are thresholds. A drawback of this method is that it does not rank outliers. To address it, k-NN distance-based outlier detection was proposed, which scores each object by its distance to its k-th nearest neighbour (kth-NN) [12]. Outliers can then be ranked and identified by their scores. The work [13] showed that the distance-based outlier detection method can provide comparable accuracy at a low computational cost.
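The following minimal Python sketch illustrates both distance-based rules: Equation (1) as a neighbour count and the k-NN distance score of [12]. It assumes Euclidean distance and excludes the object itself from its own neighbour count; the function names `db_outliers` and `knn_distance_scores` are hypothetical, and the code is not taken from [11] or [12].

```python
import math

def db_outliers(points, alpha, delta):
    """Indices of objects satisfying Equation (1): at most alpha * n neighbours within delta."""
    n = len(points)
    flagged = []
    for i, x in enumerate(points):
        # Count the other objects that lie within the distance threshold delta.
        neighbours = sum(1 for j, y in enumerate(points) if j != i and math.dist(x, y) <= delta)
        if neighbours <= alpha * n:
            flagged.append(i)
    return flagged

def knn_distance_scores(points, k):
    """Score each object by the distance to its k-th nearest neighbour (higher = more outlying)."""
    scores = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

data = [(0.0,), (1.0,), (2.0,), (3.0,), (20.0,)]
print(db_outliers(data, alpha=0.2, delta=2.0))  # [4]
print(knn_distance_scores(data, k=2))           # [2.0, 1.0, 1.0, 2.0, 18.0]
```

Unlike the binary rule of Equation (1), the k-NN score yields a ranking, so the number of reported outliers can be chosen after scoring.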

Distribution-Based Outlier Detection
The distribution-based method, also known as statistical-based outlier detection, assumes that in a normal dataset without outliers, all data follow a stochastic model. This approach requires prior knowledge of the dataset, such as its distribution, mean, and variance. A data point is classified as an outlier if it deviates from the target distribution. For a dataset X, the target distribution is usually a normal distribution $N(\mu, \sigma^2)$, and a threshold α is chosen. Let $L(X, \alpha)$ be the lower bound and $R(X, \alpha)$ the upper bound derived from the standard deviation. Values lower than $L(X, \alpha)$ or higher than $R(X, \alpha)$ form the outlier region, which is expressed as
$$\mathrm{out}(\alpha, \mu, \sigma^2) = \{\, x : x < L(X, \alpha) \ \text{or}\ x > R(X, \alpha) \,\} \tag{2}$$
An object is detected as an outlier if it lies in the outlier region $\mathrm{out}(\alpha, \mu, \sigma^2)$ [14]. For the multivariate case, the Mahalanobis distance is a popular criterion for detecting outliers. Let $\bar{x}$ denote the mean vector of dataset X and V its covariance matrix. The Mahalanobis distance $M_i$ of each object $x_i$ is then given by
$$M_i = \sqrt{(x_i - \bar{x})^{\top} V^{-1} (x_i - \bar{x})} \tag{3}$$
Objects $x_i$ with larger Mahalanobis distances $M_i$ are classified as outliers [15]. However, this approach has some drawbacks. For example, the data distribution is usually not known in advance in practice. Furthermore, it is difficult to estimate the actual distribution of high-dimensional data.
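A small standard-library sketch of both ideas follows. It assumes that $L(X, \alpha)$ and $R(X, \alpha)$ are $\mu \mp z_{1-\alpha/2}\,\sigma$, with α a two-sided tail probability; this is one common reading of the outlier region in [14], not necessarily the exact form used there. The Mahalanobis part uses the sample covariance matrix, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def gaussian_outlier_region(values, alpha=0.05):
    """Indices of values outside [L, R] = mu -/+ z_{1-alpha/2} * sigma (assumed form of L and R)."""
    mu, sigma = fmean(values), stdev(values)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu - z * sigma, mu + z * sigma
    return [i for i, v in enumerate(values) if v < lower or v > upper]

def _inverse(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mahalanobis_distances(points):
    """M_i from Equation (3), using the sample mean vector and sample covariance matrix V."""
    n, d = len(points), len(points[0])
    mean = [fmean(p[j] for p in points) for j in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
            for b in range(d)] for a in range(d)]
    inv = _inverse(cov)
    result = []
    for p in points:
        v = [p[j] - mean[j] for j in range(d)]
        q = sum(v[a] * inv[a][b] * v[b] for a in range(d) for b in range(d))
        result.append(math.sqrt(max(0.0, q)))  # guard against tiny negative rounding error
    return result

values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(gaussian_outlier_region(values))  # [9]
pts = [(1, 2), (2, 3), (3, 3), (2, 1), (3, 2), (10, 10)]
m = mahalanobis_distances(pts)
print(m.index(max(m)))  # 5
```

Note that the outlier itself inflates the estimated mean and covariance, which is one practical reason the text cautions against relying on a pre-assumed distribution.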

Density-Based Outlier Detection
The density-based outlier detection method analyses the difference between the density of an object and the density of its neighbours. This method assumes that the density of a normal object is similar to the density of its neighbours; therefore, if an object's density differs significantly from that of its neighbours, the object is considered an outlier. The local outlier factor (LOF) is one of the most well-known unsupervised outlier detection methods and works similarly to the k-NN detection method [16]. Let $\mathrm{dist}_k(x)$ be the distance between object x and its k-th nearest neighbour, and $N_k(x)$ the set of k nearest neighbours (kNNs) of x. The reachability distance and the local reachability density are defined in Equations (4) and (5), respectively:
$$\mathrm{reach\text{-}dist}_k(x, x') = \max\{\mathrm{dist}_k(x'),\ d(x, x')\} \tag{4}$$
$$\mathrm{lrd}_k(x) = \left( \frac{1}{|N_k(x)|} \sum_{x' \in N_k(x)} \mathrm{reach\text{-}dist}_k(x, x') \right)^{-1} \tag{5}$$
The LOF of object x is defined in Equation (6):
$$\mathrm{LOF}_k(x) = \frac{1}{|N_k(x)|} \sum_{x' \in N_k(x)} \frac{\mathrm{lrd}_k(x')}{\mathrm{lrd}_k(x)} \tag{6}$$
The LOF is the average ratio of the local reachability densities of the k nearest neighbours x' of x to the local reachability density of x itself. An object with a high LOF value is identified as a local outlier. The SimplifiedLOF [17] differs from the standard LOF in that the reachability distance (Equation (4)) is replaced by the k-NN distance, resulting in a simpler density estimate defined by the following:
$$\mathrm{dens}_k(x) = \frac{1}{\mathrm{dist}_k(x)} \tag{7}$$
The SimplifiedLOF has often been used implicitly, and often unintentionally, where the reachability distance had not been explicitly defined. Apart from dropping the reachability distance, its density estimate follows LOF directly, with the neighbourhood size MinPts playing the role of k.
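To make Equations (4) to (7) concrete, the sketch below computes LOF and SimplifiedLOF scores in plain Python. It assumes Euclidean distance, exactly k neighbours per point (ties broken by index, rather than including all tied points as in [16]), and no duplicate points; the helper names are hypothetical.

```python
import math

def _neighbourhoods(points, k):
    """k nearest neighbours of every point (ties broken by index) and each point's k-distance."""
    nbrs, kdist = [], []
    for i, x in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(x, points[j]))
        nbrs.append(order[:k])
        kdist.append(math.dist(x, points[order[k - 1]]))
    return nbrs, kdist

def lof_scores(points, k):
    """LOF_k from Equations (4)-(6); assumes no duplicate points."""
    nbrs, kdist = _neighbourhoods(points, k)

    def reach_dist(i, j):  # Equation (4)
        return max(kdist[j], math.dist(points[i], points[j]))

    n = len(points)
    lrd = [k / sum(reach_dist(i, j) for j in nbrs[i]) for i in range(n)]    # Equation (5)
    return [sum(lrd[j] for j in nbrs[i]) / (k * lrd[i]) for i in range(n)]  # Equation (6)

def simplified_lof_scores(points, k):
    """SimplifiedLOF: density = 1 / k-distance (Equation (7)) in place of the lrd."""
    nbrs, kdist = _neighbourhoods(points, k)
    dens = [1.0 / kd for kd in kdist]
    return [sum(dens[j] for j in nbrs[i]) / (k * dens[i]) for i in range(len(points))]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(pts, 2)
s2 = simplified_lof_scores(pts, 2)
print(scores)  # four values of 1.0, then about 6.03 for (5, 5)
```

Points inside the unit square have the same density as their neighbours and score exactly 1, while the isolated point scores well above 1 under both variants.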

Deviation-Based Outlier Detection
In deviation-based outlier detection, an object is classified as an outlier if it does not fit the main characteristics of the dataset. This approach simulates the way human beings pick out discordant objects from a series of similar objects. The sequential exception technique is one of the most popular methods [18]. For a dataset X, a smoothing factor SF(I) is defined as the threshold; it indicates how much the deviation (dissimilarity) of X can be reduced by removing a subset I of objects from X. An object x is identified as an outlier if it satisfies the condition in Equation (8):
$$x \in I^{*}, \qquad I^{*} = \arg\max_{I \subset X} SF(I), \qquad SF(I) = |X \setminus I| \cdot \big( D(X) - D(X \setminus I) \big) \tag{8}$$
where D(·) is a dissimilarity function such as the variance. The sequential exception technique involves a high computational cost of $O(2^n)$ for n objects.
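The smoothing factor can be illustrated with a brute-force Python sketch that enumerates every candidate subset, which is where the $O(2^n)$ cost comes from. The choice of the population variance as the dissimilarity function D(·) and the function names are assumptions for illustration; the faster sequential heuristic described in [18] is not reproduced here.

```python
from itertools import combinations
from statistics import pvariance

def smoothing_factor(data, removed):
    """SF(I) = |X - I| * (D(X) - D(X - I)), with D taken to be the population variance."""
    remaining = [v for i, v in enumerate(data) if i not in removed]
    return len(remaining) * (pvariance(data) - pvariance(remaining))

def exception_set(data):
    """Exhaustive O(2^n) search for the subset I* maximising SF(I); feasible only for tiny n."""
    best, best_sf = (), 0.0
    for size in range(1, len(data)):  # I must be non-empty and must leave X - I non-empty
        for subset in combinations(range(len(data)), size):
            sf = smoothing_factor(data, set(subset))
            if sf > best_sf:
                best, best_sf = subset, sf
    return list(best)

print(exception_set([1.0, 1.1, 0.9, 1.05, 10.0]))  # [4]
```

Removing the value 10.0 collapses the variance while keeping four of the five objects, so that single-element subset gives the largest smoothing factor.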

Angle-Based Outlier Detection
In the angle-based outlier detection approach [19], the variance of the angles between an outlier candidate and all pairs of other points is used as its angle-based outlier factor (ABOF). The standard angle-based outlier detection (ABOD) approach requires computing the ABOF of every data point. The ABOF of a point $\vec{A}$ in a database D is defined as:
$$\mathrm{ABOF}(\vec{A}) = \mathrm{VAR}_{\vec{B}, \vec{C} \in D} \left( \frac{\langle \overrightarrow{AB}, \overrightarrow{AC} \rangle}{\|\overrightarrow{AB}\|^2 \cdot \|\overrightarrow{AC}\|^2} \right) \tag{9}$$
where the variance is weighted by $1 / (\|\overrightarrow{AB}\| \cdot \|\overrightarrow{AC}\|)$. It should be noted that $\vec{A}$, $\vec{B}$, and $\vec{C}$ are all mutually different. After calculating the ABOF scores, the standard ABOD ranks the data points by their ABOF values, with the lowest values indicating the strongest outliers. Given the inherent complexity of ABOD, the primary issue with this approach is its efficiency: with n denoting the number of points in the database, the time complexity of ABOD is $O(n^3)$, which is taxing on the system performing the analysis. Because the ABOF is weighted by distance, data points distant from a particular point $\vec{A}$ are of less importance than points neighbouring $\vec{A}$. Therefore, FastABOD computes the ABOF using only pairs of points in $N_{MinPts}(\vec{A})$, the MinPts nearest neighbours of $\vec{A}$. The following is the definition of FastABOD [20].
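The approximate angle-based outlier factor, $\mathrm{approxABOF}_{N_{MinPts}}(\vec{A})$, is the variance over the angles between the difference vectors of $\vec{A}$ to all pairs of points in $N_{MinPts}(\vec{A})$, weighted by the distance of the points:
$$\mathrm{approxABOF}_{N_{MinPts}}(\vec{A}) = \mathrm{VAR}_{\vec{B}, \vec{C} \in N_{MinPts}(\vec{A})} \left( \frac{\langle \overrightarrow{AB}, \overrightarrow{AC} \rangle}{\|\overrightarrow{AB}\|^2 \cdot \|\overrightarrow{AC}\|^2} \right) \tag{10}$$
The resulting time complexity is $O(n^2 + n \cdot N_{MinPts}^2)$. Furthermore, as long as $N_{MinPts}$ is small relative to n, FastABOD provides a marked speed-up.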
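The following sketch computes the ABOF of Equation (9) and its FastABOD approximation of Equation (10) in plain Python, weighting each pair by $1/(\|\overrightarrow{AB}\| \cdot \|\overrightarrow{AC}\|)$ as described above. It is an illustrative, unoptimized implementation that assumes Euclidean space; it is not the code of [19] or [20], and the function names are hypothetical.

```python
import math
from itertools import combinations

def _abof(a, others):
    """Distance-weighted variance of <AB, AC> / (|AB|^2 |AC|^2) over all pairs B, C in `others`."""
    total_w = mean_acc = sq_acc = 0.0
    for b, c in combinations(others, 2):
        ab = [bi - ai for ai, bi in zip(a, b)]
        ac = [ci - ai for ai, ci in zip(a, c)]
        nab = sum(v * v for v in ab)  # |AB|^2
        nac = sum(v * v for v in ac)  # |AC|^2
        if nab == 0.0 or nac == 0.0:
            continue  # skip points identical to A
        term = sum(u * v for u, v in zip(ab, ac)) / (nab * nac)
        w = 1.0 / math.sqrt(nab * nac)  # weight 1 / (|AB| |AC|)
        total_w += w
        mean_acc += w * term
        sq_acc += w * term * term
    mean = mean_acc / total_w
    return sq_acc / total_w - mean * mean

def abod(points):
    """Standard ABOD, Equation (9), O(n^3). Lower ABOF means more outlying."""
    return [_abof(p, points[:i] + points[i + 1:]) for i, p in enumerate(points)]

def fast_abod(points, k):
    """FastABOD, Equation (10): only pairs among the k nearest neighbours of each point."""
    scores = []
    for i, p in enumerate(points):
        others = sorted(points[:i] + points[i + 1:], key=lambda q: math.dist(p, q))
        scores.append(_abof(p, others[:k]))
    return scores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
s = abod(pts)
f = fast_abod(pts, 4)
print(s.index(min(s)), f.index(min(f)))  # 5 5
```

From the distant point (10, 10) every other point lies in nearly the same direction, so the angle variance is tiny; points inside the cluster see neighbours in all directions and obtain much larger ABOF values.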

Deep Learning-Based Outlier Detection
Deep learning methods, such as artificial neural networks (ANN), have recently been used for fraud detection. The study in [21] evaluated multiple ANN structures for detecting fraud in credit card transactions, each consisting of three types of layers: an input layer, hidden layers, and an output layer. The authors tested detection performance with one, two, and three hidden layers of one, ten, one hundred, and one thousand nodes, using various activation functions such as the ReLU, sigmoid, tanh, and identity functions. The highest precision, 96%, was obtained by a model with two hidden layers of 1000 nodes and the ReLU activation function. Furthermore, the sigmoid function gave better sensitivity than the other activation functions. The work [22] presented four deep learning detection methods and compared their performance on an identical dataset: artificial neural networks (ANN), recurrent neural networks (RNN), long short-term memory (LSTM), and gated recurrent units (GRU). The RNN, a variant of the ANN model, can model large sequential data because it has connections not only between layers but also between neurons in the same layer. The LSTM model adds a memory cell that stores the state of the neuron. The GRU model enables each recurrent unit in an RNN to capture dependencies at different time scales. Their results indicate that the fraud detection accuracy of LSTM and GRU is higher than that of the ANN model, which serves as the performance baseline, and that larger networks perform better than smaller ones. Rather than considering only deep learning methods, [23] compared several machine learning and deep learning methods for credit card fraud detection.
The machine learning methods they used were k-nearest neighbours (KNN), random forest, and support vector machines (SVM), while the deep learning methods were convolutional neural networks (CNN), restricted Boltzmann machines (RBM), and deep belief networks (DBN). They applied all these methods to three datasets of different sizes and used the area under the Receiver Operating Characteristic (ROC) curve, denoted AUC, to evaluate detection performance. The work [23] concluded that SVM is the best method for larger datasets, potentially combined with CNN to attain more reliable performance. Among the deep learning methods, CNN consistently provided higher accuracy and fewer false alarms than RBM and DBN.

Clustering-Based Outlier Detection
Cluster analysis is an unsupervised grouping process that divides data objects into multiple collections. There are no predefined classes; instead, data objects are grouped based on their characteristics. As a result, objects in the same cluster are similar to each other and dissimilar to objects in other clusters. A high-quality clustering has high intra-cluster similarity and low inter-cluster similarity. Clustering has a broad range of applications, such as medicine, social science, and market research, and several clustering approaches have therefore been studied to produce high-quality clusters. Clustering-based outlier detection methods, such as FindCBLOF [24], assume that outliers form small clusters that lie far from the objects in large clusters [24,25]. The work [26] proposed a clustering method that assigns and updates a relevance weight for each attribute to decrease the impact of noise on the clustering result. The dataset X, in which each object x has m attributes, is divided into multiple data chunks. Given a chunk size n and the number of clusters k, the clustering and the weight vector W are calculated from the following quantities: $w_j$ is the weight of the j-th attribute of object x; $u_{i,l} = 1$ if the i-th object is in the l-th cluster and $u_{i,l} = 0$ otherwise; β is a user-defined parameter; d(·) is the distance measure; and $z_l$ is the centre of the l-th cluster. After the weight vector is updated using Equations (13) and (14) to form the k clusters, objects that lie farther from their cluster centre than other objects are detected as outliers. This method achieves a high outlier detection rate and a low false alarm rate with low time consumption [26].
Clustering-based outlier detection has had a great impact on the field of outlier detection, as (1) it performs two tasks at the same time, clustering the data and detecting outliers, and (2) it does not need ground truth to train the model (i.e., it does not need prior knowledge about the data to build a detection model) [27].

Multi-Level Clustering-Based Outlier Detection
In this paper, we propose a multi-level clustering-based outlier detection method called MCOD. The MCOD algorithm uses two stages to discover outliers. In the first stage, a clustering process is performed on the original data to generate summarizations (i.e., cluster prototypes) of the data. Two sets of clusters are generated, a large population set (LPS) and a small population set (SPS), such that a cluster $S_i$ belongs to the LPS if its size $|S_i|$ exceeds $\alpha \cdot |X|$, where X is the entire dataset and α is a predetermined threshold; otherwise, $S_i$ belongs to the SPS. In the second stage, an outlier risk factor (ORF) is assigned to each data point x; ORF(x) measures both the size of the cluster that x belongs to and the distance between x and its closest cluster (if x lies in a small cluster). For two data points x and y in d-dimensional space, closeness is measured by:
$$\mathrm{Euclidean}(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}, \qquad \mathrm{Cosine}(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} y_i^2}} \tag{15}$$
For datasets with spherical shapes, we used the Euclidean distance as a measure of dissimilarity (a large distance means low similarity), such that Distance(x, y) = Euclidean(x, y); for datasets with high sparsity and dimensionality, we used cosine similarity as a measure of homogeneity (a high value means high similarity), such that Sim(x, y) = Cosine(x, y). Given a numeric parameter α, a cluster $S_i$, and a prototype $z_i$, the ORF of an object $x \in S_i$ represented by $z_i$ is defined as in Equation (16). The final outliers are returned as the objects with high ORF values.
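Since Equation (16) itself does not appear in the text, the sketch below does not implement the ORF. It shows the two closeness measures of Equation (15), the LPS/SPS split, and a hypothetical CBLOF-style score in the spirit of FindCBLOF [24] as a stand-in: cluster size times the distance to the point's own prototype for large clusters, or to the nearest large-cluster prototype for small ones. Treat that scoring rule and all function names as assumptions, not as the MCOD definition.

```python
import math

def euclidean(x, y):
    return math.dist(x, y)

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def split_lps_sps(clusters, alpha):
    """LPS = indices of clusters with |S_i| > alpha * |X|; all other clusters form the SPS."""
    n = sum(len(c) for c in clusters)
    lps = [i for i, c in enumerate(clusters) if len(c) > alpha * n]
    sps = [i for i in range(len(clusters)) if i not in lps]
    return lps, sps

def cblof_style_scores(clusters, prototypes, alpha, dist=euclidean):
    """Hypothetical stand-in for the ORF: |S_i| * d(x, z_i) for large clusters,
    |S_i| * min distance to a large-cluster prototype for small clusters."""
    lps, _ = split_lps_sps(clusters, alpha)
    if not lps:
        raise ValueError("alpha too large: no cluster qualifies for the LPS")
    scores = []
    for i, cluster in enumerate(clusters):
        for x in cluster:
            if i in lps:
                d = dist(x, prototypes[i])
            else:
                d = min(dist(x, prototypes[j]) for j in lps)
            scores.append(len(cluster) * d)
    return scores

clusters = [[(0, 0), (0, 1), (1, 0), (1, 1)], [(10, 10)]]
prototypes = [(0.5, 0.5), (10, 10)]
print(split_lps_sps(clusters, 0.3))  # ([0], [1])
scores = cblof_style_scores(clusters, prototypes, 0.3)
print(scores.index(max(scores)))     # 4, the isolated point
```

For sparse, high-dimensional data, `cosine` is a similarity rather than a distance, so a scoring rule built on it would have to invert the comparison (for example, by using 1 − Cosine).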
The effectiveness of outlier detection is constrained by the quality of the adopted clustering technique. In [25] and [28], it has been shown experimentally that better clustering solutions lead to better detection of outliers. Self-organizing maps (SOM) cluster datasets into groups with high homogeneity and better overall clustering quality than traditional unsupervised clustering algorithms [9]; thus, in this paper, we use the SOM as the base level of clustering in our proposed algorithm. More information on SOM is provided next.

SOM Clustering
SOM clustering produces a two-dimensional map that represents a high-dimensional input training dataset. The number of neurons on the 2-D map is the number of clusters. Each N-dimensional input data point x in the dataset is connected to each of the M neurons of the map through weights $w_{ij}$, where i = 1, 2, ..., N and j = 1, 2, ..., M, so for M clusters there are M weight vectors $w_j = (w_{1j}, \ldots, w_{Nj})$. Initially, these weight vectors are set to random values. During training, each $w_j$ is updated to describe the input patterns associated with its cluster. The squared Euclidean distance measures the relationship between a data point x and each weight vector $w_j$: the neuron j whose weight vector most closely matches x, that is, the one that minimizes $\|x - w_j\|^2$, is chosen as the cluster that x belongs to [9]. The major disadvantages of SOM are that it requires sufficient data to perform good clustering and that the order in which the training data are presented affects the final map [29]. The clustering performance also strongly depends on the initial weight vectors; initializing the SOM weight vectors with prior knowledge of the dataset significantly helps to group the input data correctly [30].
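The text above specifies only the best-matching-unit rule. The sketch below pairs it with the standard online Kohonen update using a Gaussian neighbourhood on the map grid, $w_j \leftarrow w_j + \eta \, h(j, c)\,(x - w_j)$; the neighbourhood function, the grid coordinates, and the function names are assumptions for illustration.

```python
import math

def best_matching_unit(x, weights):
    """Index j minimising the squared Euclidean distance ||x - w_j||^2."""
    return min(range(len(weights)),
               key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))

def som_step(x, weights, grid, eta, sigma):
    """One online Kohonen update: w_j += eta * h(j, c) * (x - w_j), Gaussian neighbourhood h."""
    c = best_matching_unit(x, weights)
    for j, w in enumerate(weights):
        g2 = (grid[j][0] - grid[c][0]) ** 2 + (grid[j][1] - grid[c][1]) ** 2
        h = math.exp(-g2 / (2 * sigma ** 2))  # 1 for the winner, smaller for distant units
        weights[j] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return c

weights = [[0.0, 0.0], [5.0, 5.0]]
grid = [(0, 0), (0, 1)]  # map coordinates of the two neurons
c = som_step((1.0, 1.0), weights, grid, eta=0.5, sigma=1.0)
print(c, weights)  # 0 [[0.5, 0.5], [about 3.79, about 3.79]]
```

Because the winner and its map neighbours all move toward the input, the initial weights largely determine which neuron ends up representing which region, which is the sensitivity to initialization noted above.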

The MCOD (Ai-SOM) Outlier's Detection Algorithm
Since the SOM depends heavily on initialization, the first stage of the MCOD algorithm uses a multi-level clustering approach with two levels of clustering in a cascade. In the first level, a clustering method $A_i$ is applied to the dataset to divide the data points into k clusters. $A_i$ can be any commonly used clustering technique, such as K-Means (KM) [31], Bisecting K-Means (BKM) [32], Partitioning Around Medoids (PAM) [33], or Fuzzy C-means (FCM) [34], and is selected based on the characteristics of the dataset. In the second level, the clustering result from $A_i$ is used as the initial seeds for SOM clustering. $A_i$ thus provides knowledge about the dataset and a reasonable initial weight vector for the SOM process. Adding one prior clustering level to the SOM can improve clustering performance on the dataset, so that the multi-level clustering-based outlier detection method is able to identify outliers in datasets with general configurations and properties. In the MCOD, the number of clusters k must be specified. $A_i$ groups the dataset into k clusters, and the mean of the data points in each cluster is computed. The clustering result from $A_i$ is then passed to the SOM: the number of neurons in the SOM equals the number of clusters in the $A_i$ result, and instead of being initialized randomly, the initial weight vector of each neuron is set to the mean of the corresponding $A_i$ cluster. The multi-level clustering is shown in Figure 1. Using this multi-level cascaded strategy, the MCOD overcomes the initialization problem that the SOM suffers from, thus providing significant fraud detection capabilities for organizations and enterprises.
The ORF of every data point is computed from the clustering result of stage 1, which uses the provided initial weights and the specified learning rate. A data point is classified as an outlier if it has a high ORF value. The MCOD algorithm is given in Algorithm 1.

Algorithm 1 Multi-Level Clustering-Based Outlier's Detection (MCOD) (Ai-SOM)
Input: dataset X of n records and d dimensions; algorithm A_i; number of clusters k; learning rate η; threshold α
Output: top % outliers
Begin
  Step 1: // Apply A_i to X to obtain k clusters
    Cluster_j ← A_i(X, k), where j = 1, 2, 3, ..., k
  Step 2: // Initialize a vector W of size k
    for j ← 1, 2, 3, ..., k do
      w_j ← mean(Cluster_j)
    end for
  Step 3: // Reshape W into a 2-D matrix that matches the shape of the SOM
    for x ← 1, 2, 3, ..., sqrt(k) do
      for y ← 1, 2, 3, ..., sqrt(k) do
        z ← (x − 1)·sqrt(k) + y
        W_{x,y} ← w_z
      end for
    end for
  Step 4: // Apply SOM and update W using η to obtain the updated k clusters
    UpdatedCluster_j ← SOM(Cluster_j, W, k, η), where j = 1, 2, 3, ..., k
  Step 5: Compute the ORF of each object x in the updated k clusters, given α
  Step 6: Select the top % of data points with the highest ORF values as outliers
End
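Steps 1 to 4 of Algorithm 1 can be sketched in plain Python as follows, taking $A_i$ = K-means. Farthest-first seeding for K-means, the Gaussian neighbourhood width `sigma`, the linearly decaying learning rate, and the epoch count are assumptions that the algorithm does not specify; Steps 5 and 6 are left out because they need the ORF definition of Equation (16). All function names are hypothetical.

```python
import math

def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))

def kmeans(points, k, iters=100):
    """Step 1 with A_i = K-means (farthest-first seeding is an assumption, for determinism)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        new = [_mean([p for p, lab in zip(points, labels) if lab == j]) if j in labels else centres[j]
               for j in range(k)]
        if new == centres:
            break
        centres = new
    return centres  # cluster means, i.e. w_j = mean(Cluster_j) as in Step 2

def ai_som(points, k, eta=0.1, sigma=0.5, epochs=20):
    """Steps 1-4 of Algorithm 1; returns the SOM unit (cluster) index of each point."""
    side = math.isqrt(k)
    if side * side != k:
        raise ValueError("Step 3 needs k to be a perfect square")
    weights = [list(w) for w in kmeans(points, k)]   # Step 2: non-random initial weights
    grid = [(z // side, z % side) for z in range(k)]  # Step 3: unit z -> grid cell (x, y), 0-based

    def bmu(x):
        return min(range(k), key=lambda j: math.dist(x, weights[j]))

    for epoch in range(epochs):                       # Step 4: online SOM training
        rate = eta * (1 - epoch / epochs)             # linearly decaying learning rate
        for x in points:
            c = bmu(x)
            for j in range(k):
                g2 = (grid[j][0] - grid[c][0]) ** 2 + (grid[j][1] - grid[c][1]) ** 2
                h = math.exp(-g2 / (2 * sigma ** 2))  # Gaussian neighbourhood on the grid
                weights[j] = [w + rate * h * (xi - w) for w, xi in zip(weights[j], x)]
    return [bmu(x) for x in points]

pts = [(0, 0), (0, 1), (1, 0), (20, 0), (20, 1), (21, 0),
       (0, 20), (1, 20), (0, 21), (20, 20), (21, 20), (20, 21)]
units = ai_som(pts, k=4)
print(units)  # points from the same group share one unit index
```

The 1-based index z = (x − 1)·sqrt(k) + y of Step 3 corresponds to the 0-based mapping z → (z // sqrt(k), z % sqrt(k)) used here, which is why k must be a perfect square.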

Datasets
Experiments were performed on artificial, biomedical, document, and credit card datasets with various characteristics and degrees of outliers. A summary of the experimental datasets is shown in Table 1. The HBK dataset [35] and the Wood dataset [36] are two artificial datasets with known true outliers used in our experimental analysis. HBK is an artificially generated random dataset with 75 observations in four dimensions and contains 14 outliers. The Wood dataset consists of 20 observations, of which data points 4, 6, 8, and 19 are outliers.

Biomedical Datasets
Two biomedical datasets were used: the Cardiotocography dataset [37] and the Breast Cancer dataset [38]. The Cardiotocography dataset contains 2126 cases with 25 variables. Both pathologic and suspect cases are classified as outliers, while the normal cases form the inliers; outliers make up 22.5% of this dataset. The Breast Cancer dataset contains 699 instances, 34.5% of which are outliers, and each instance has nine attributes.

Credit Card Datasets
The Royal Bank of Canada (RBC) dataset is a real dataset provided by RBC [39]. It contains 13,731 credit card transactions, each described by 15 variables that result from a Principal Component Analysis (PCA) transformation. All transactions are labelled as fraud or non-fraud; the 415 fraudulent transactions account for 3.02% of the dataset. The second dataset, European Credits, contains credit card transactions made by European cardholders in September 2013, with 492 frauds out of 284,807 transactions [40], so fraudulent transactions make up 0.172% of this dataset. This dataset has 29 numerical variables transformed by PCA.

Adopted Outliers Detection Algorithms
The detection accuracy of MCOD($A_i$-SOM) is compared with that of the traditional density-based Local Outlier Factor (LOF) detection method [16] and the clustering-based FindCBLOF($A_i$) approach [24], where $A_i$ ∈ {KM, BKM, PAM, FCM, SOM} denotes k-means (KM) [31], bisecting k-means (BKM) [32], Partitioning Around Medoids (PAM) [33], fuzzy C-means (FCM) [34], or the Self-Organizing Map (SOM) [9]. FindCBLOF(KM) uses the k-means clustering method to divide the dataset D into k clusters, $C = \{C_1, C_2, \ldots, C_k\}$, such that $C_i \cap C_j = \emptyset$ for $i \ne j$ and $\bigcup_{i=1}^{k} C_i = D$. FindCBLOF(BKM) uses bisecting k-means to split the dataset into k clusters. BKM first treats the whole dataset as one cluster, then at each bisecting step divides one cluster into two sub-clusters using k-means. The cluster to bisect is chosen by size: the cluster with the greatest number of data points is partitioned into two clusters using k-means. The bisecting process stops once k clusters have been obtained. FindCBLOF(PAM) uses the PAM technique to divide the dataset into k clusters. The difference between PAM and k-means is that PAM uses the medoid of a cluster instead of its mean, so the cluster centre is a data point inside the cluster. FindCBLOF(FCM) uses fuzzy C-means, in which each data point has a degree of membership in each cluster, and each cluster centre is the mean of all data points weighted by their degrees of membership. Finally, FindCBLOF(SOM) uses the self-organizing map to cluster the dataset. For each FindCBLOF($A_i$) approach, every data point is assigned a cluster-based local outlier factor (CBLOF), and the points with the largest CBLOF values are considered outliers according to the chosen contamination level.
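Bisecting k-means as described above can be sketched in plain Python. Seeding each 2-means split with the cluster's first point and the point farthest from it is an assumption made for determinism (implementations commonly use random seeds or several trial splits), and the function names are hypothetical.

```python
import math

def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(len(points[0])))

def two_means(points, iters=20):
    """Split one cluster into two with k-means (k = 2); returns the non-empty parts."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # deterministic seeding (assumption)
    centres = [c0, c1]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            groups[0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # the cluster cannot be split (e.g. all points identical)
        new = [_mean(groups[0]), _mean(groups[1])]
        if new == centres:
            break
        centres = new
    return [g for g in groups if g]

def bisecting_kmeans(points, k):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        parts = two_means(clusters[largest])
        if len(parts) < 2:
            break  # the largest cluster cannot be split further
        clusters[largest:largest + 1] = parts
    return clusters

A = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
B = [(10, 0), (10, 1), (11, 0)]
C = [(0, 10), (1, 10)]
clusters = bisecting_kmeans(A + B + C, 3)
print(sorted(len(c) for c in clusters))  # [2, 3, 5]
```

Choosing the largest cluster for each split follows the criterion stated above; other BKM variants instead bisect the cluster with the largest within-cluster error.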
Table 2 shows the parameter settings for each of the above algorithms where k is the number of clusters and MaxITER is the maximum number of iterations.

Evaluation Criteria
The performance of the detection algorithms is measured by the number of detected outliers compared with the true outliers, the area under the precision-recall curve (AUPRC), and the accuracy of label prediction. The AUPRC indicates the model's ability to distinguish between fraud and non-fraud. The execution time of each detection method is also measured and compared.
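One common estimator of the AUPRC is average precision, the sum over the ranking of $(R_n - R_{n-1})\,P_n$, where $P_n$ and $R_n$ are the precision and recall at rank n. The sketch below computes it in plain Python; it assumes binary labels (1 = outlier) and no tied scores, the function name is hypothetical, and it is not necessarily the exact AUPRC computation used in these experiments.

```python
def average_precision(scores, labels):
    """AP = sum_n (R_n - R_{n-1}) * P_n over the ranking by descending score (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives = sum(labels)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += (tp / rank) / positives  # recall rises by 1/positives at each true outlier
    return ap

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.8333...
```

A detector that ranks every true outlier above every inlier scores 1.0, while the baseline for random ranking equals the outlier fraction, which matters for datasets as imbalanced as the credit card data.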

Individual Clustering Results
As stated in [25] and [41], better clustering solutions lead to better detection of outliers; thus, to test the performance of the proposed MCOD algorithm with a clustering method $A_i$, we first evaluated each clustering method separately using the silhouette score [42], as shown in Table 3. The silhouette score of an instance measures how similar it is to its own cluster compared with other clusters, based on its mean intra-cluster distance and its mean distance to the nearest other cluster. The silhouette score of a dataset is the mean of the instances' silhouette scores. Scores range from −1 to 1. A score close to 1 indicates good clustering, and scores near zero indicate overlapping clusters. A negative value generally indicates that an instance has been assigned to the wrong cluster or that the clusters are too similar. Table 3 shows that SOM and KM have the best performance, as measured by high silhouette scores, for the HBK, Wood, Cardio, BC, and RBC datasets.
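The silhouette definition above can be computed directly, as in the plain-Python sketch below, with $s(i) = (b - a)/\max(a, b)$, where a is the mean intra-cluster distance and b the mean distance to the nearest other cluster. It assumes Euclidean distance, at least two clusters, and the common convention that a point in a singleton cluster scores 0; the function name is hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); points in singleton clusters contribute 0."""
    total = 0.0
    for i, x in enumerate(points):
        same = [math.dist(x, y) for j, y in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            continue  # singleton cluster contributes 0
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(x, y) for j, y in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)

good = silhouette_score([(0,), (1,), (10,), (11,)], [0, 0, 1, 1])
bad = silhouette_score([(0,), (10,), (1,), (11,)], [0, 0, 1, 1])
print(round(good, 3), round(bad, 3))  # 0.9 -0.45
```

The second call groups points that are far apart, so every instance is closer on average to the other cluster and the overall score becomes negative, matching the interpretation given above.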

Experiment 2: Medical Datasets with True Outliers
The FindCBLOF, LOF, and MCOD methods were applied to the Cardiotocography and Breast Cancer medical datasets. In this experiment, the TopRatio ranged from 10% to 30%. For each detection method, the TopRatio fraction of data points with the highest outlier scores (ORF values for the MCOD) were considered outliers and compared with the true outliers. Tables 6 and 7 show the number of true outliers detected by LOF and FindCBLOF for the Cardiotocography and Breast Cancer datasets, respectively. Tables 8 and 9 present the detection quality of the MCOD algorithm for the Cardiotocography and Breast Cancer datasets, respectively. For both datasets, the MCOD detected more true outliers than LOF and FindCBLOF. For the Cardiotocography dataset, FindCBLOF($A_i$), where $A_i$ ∈ {KM, BKM, PAM, FCM, SOM}, detected 22% of the true outliers on average, while MCOD($A_i$-SOM) achieved 42%. The Breast Cancer dataset shows the same pattern: the MCOD raised the average detection rate of true outliers from 48% with FindCBLOF to up to 81%.
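The TopRatio protocol amounts to ranking points by their outlier score and counting the true outliers among the top fraction. In the sketch below, rounding TopRatio × n to the nearest integer and the function name are assumptions; dividing the returned count by the number of true outliers gives a detection rate of the kind quoted above.

```python
def true_outliers_in_top(scores, true_outliers, top_ratio):
    """Count true outliers among the round(top_ratio * n) points with the highest scores."""
    m = round(top_ratio * len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:m]
    return len(set(top) & set(true_outliers))

scores = [0.1, 0.9, 0.8, 0.2, 0.7]  # e.g. ORF values
print(true_outliers_in_top(scores, {1, 4}, 0.4))  # 1
print(true_outliers_in_top(scores, {1, 4}, 0.6))  # 2
```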

Conclusions and Future Directions
Outlier detection plays a significant role in many business applications, including finance, healthcare systems, face recognition, and many others. In this paper, a multi-level clustering-based outlier detection technique (MCOD) is proposed. This approach uses two stages to assign an outlier risk factor (ORF) to each data point and identifies the final set of outliers as those with high ORF values. The MCOD algorithm relies on the premise that the clustering method in the first level can provide knowledge about the dataset, in the form of initial seeds for the second level, to achieve better detection of outliers in the dataset. The MCOD is applied to artificial, biomedical, and credit card datasets with different degrees of outliers and instances of fraud. The experimental results indicate that the MCOD attains better outlier detection performance than traditional outlier detection techniques, measured by detection accuracy and the area under the PR curve, for various datasets with different configurations and types. As outlier and fraud detection play an important role in many business applications, including credit card fraud detection, anomalous records in healthcare, and many others, the MCOD can be used as an automatic tool to discover outliers with high efficacy and better detection accuracy. Various enterprises and organizations can adopt the MCOD algorithm to reduce the impact of fraud or outliers in data and increase profit/outcomes. Future directions include evaluating different clustering algorithms and datasets with larger sizes, higher dimensions, and different levels of outliers. Performing a sensitivity analysis on the parameters used is also recommended for future research.