# Using Rough Set Theory to Find Minimal Log with Rule Generation


## Abstract


## 1. Introduction

- Developing a new algorithm using RST basic concepts to create minimal reducts;
- Offering a feasible feature selection methodology scalable to huge datasets, without sacrificing performance;
- Creating a minimal rule decision database that retains information content;
- Using three benchmark UCI datasets to evaluate the performance of the methodology;
- Comparing the result of the proposed model to recent works.

## 2. Related Works

## 3. Theoretical Background

#### 3.1. Rough Set

$U = \{u_1, u_2, \ldots, u_N\}$ is called the universe set, a finite non-empty set of $N$ objects (or instances), and $A$ is a non-empty set of $(n + k)$ attributes. The set $A$ ($A = C \cup D$) is split into two subsets: the conditional attribute set $C$ and the decision attribute set $D$. The subset $C = \{a_1, a_2, \ldots, a_n\}$ has $n$ predictors or conditional attributes, while the subset $D = \{d_1, d_2, \ldots, d_k\}$ has $k$ output variables or decision attributes. For every feature $a \in A$, there is a domain $V_a$ that collects the values the attribute can take.

For any attribute subset $P \subseteq A$, the indiscernibility relation is

$\mathrm{IND}(P) = \{(u_1, u_2) \in U \times U : \forall a \in P,\ a(u_1) = a(u_2)\}$

where $a(u_i)$ indicates the attribute value for object $i$. If two objects belong to the indiscernibility relation, $(u_1, u_2) \in \mathrm{IND}(P)$, then $u_1$ is indistinguishable (indiscernible) from $u_2$ by the attributes in $P$. The relation is reflexive, symmetric, and transitive, so it is an equivalence relation. Let $[u]_P$ denote the equivalence class of $u \in U$ under $\mathrm{IND}(P)$. These classes divide $U$ into distinct blocks, and the resulting partition is labeled $U/P$.
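The partition $U/P$ can be illustrated with a minimal Python sketch (not the R code used by the authors). The helper name `partition`, the attribute names, and the four-row table are hypothetical and serve only to show how objects with equal values on $P$ fall into the same block.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Return U/P: the equivalence classes of IND(P) as sets of object indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        # Objects with identical values on every attribute in P share a key.
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

# Hypothetical toy table (illustrative values only).
rows = [
    {"event": "deny",  "port": 443, "magnitude": 8},
    {"event": "deny",  "port": 80,  "magnitude": 8},
    {"event": "built", "port": 443, "magnitude": 8},
    {"event": "deny",  "port": 443, "magnitude": 8},
]

print(partition(rows, ["event", "port"]))  # [{0, 3}, {1}, {2}]
print(partition(rows, ["magnitude"]))      # [{0, 1, 2, 3}]
```

Objects 0 and 3 agree on both attributes, so they are indiscernible with respect to `event` and `port` and land in the same block.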

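$\gamma_P(Q) = \dfrac{|\mathrm{POS}_P(Q)|}{|U|}, \quad 0 \le \gamma_P(Q) \le 1$

where $|\cdot|$ denotes cardinality and $\mathrm{POS}_P(Q)$ is the positive region, i.e., the objects of $U$ that can be classified with certainty into the classes of $U/Q$ using $P$. The closer $\gamma_P(Q)$ is to 1, the more dependent $Q$ is on $P$. Based on these fundamentals, RST proposes two essential concepts for feature selection: the core and the reduct.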
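As a worked illustration of the dependency degree, the Python sketch below computes the positive region and $\gamma_P(Q)$ on a small table. The function names and the five-row table are assumptions made for this example, and exact fractions are used so the ratio is not blurred by rounding.

```python
from collections import defaultdict
from fractions import Fraction

def partition(rows, attrs):
    """Equivalence classes of IND(attrs) as sets of object indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def positive_region(rows, P, Q):
    """Objects whose P-class lies entirely inside a single Q-class."""
    q_blocks = partition(rows, Q)
    pos = set()
    for block in partition(rows, P):
        if any(block <= q for q in q_blocks):
            pos |= block
    return pos

def dependency(rows, P, Q):
    """gamma_P(Q) = |POS_P(Q)| / |U|."""
    return Fraction(len(positive_region(rows, P, Q)), len(rows))

# Hypothetical toy table: rows 0 and 4 agree on event and port but not on action.
rows = [
    {"event": "deny",  "port": 443, "action": "block"},
    {"event": "deny",  "port": 80,  "action": "block"},
    {"event": "built", "port": 443, "action": "allow"},
    {"event": "built", "port": 80,  "action": "allow"},
    {"event": "deny",  "port": 443, "action": "allow"},
]

print(dependency(rows, ["event", "port"], ["action"]))  # 3/5
print(dependency(rows, ["event"], ["action"]))          # 2/5
```

The conflicting pair (rows 0 and 4) is excluded from the positive region, which is why the dependency stays below 1 even when both conditional attributes are used.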

$\gamma_R(D) = \gamma_C(D)$, where $R \subseteq C$

A reduct $R$ for which no proper subset $R' \subset R$ satisfies $\gamma_{R'}(D) = \gamma_R(D)$ is called the minimal reduct: the selected features are the minimum that preserve the same dependency degree as the whole original feature set. However, the definition allows the theory to generate a set of possible reducts, $\mathrm{RED}^F_C(D)$, and any of them may be used.

$\mathrm{CORE}_C(D) = \bigcap \mathrm{RED}^F_C(D)$
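The relationship between reducts and the core can be sketched in Python with an exhaustive search. This brute-force enumeration is exponential in the number of attributes and is shown only to make the definitions concrete; it is not the method proposed in this paper. The function names and the toy table, in which attribute `c` duplicates `b`, are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def partition(rows, attrs):
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, P, Q):
    """gamma_P(Q): share of objects whose P-class fits inside one Q-class."""
    q_blocks = partition(rows, Q)
    pos = sum(len(b) for b in partition(rows, P) if any(b <= q for q in q_blocks))
    return Fraction(pos, len(rows))

def reducts(rows, C, D):
    """All minimal subsets R of C with gamma_R(D) == gamma_C(D) (exhaustive search)."""
    full = dependency(rows, C, D)
    found = []
    for size in range(1, len(C) + 1):
        for subset in combinations(C, size):
            candidate = set(subset)
            if any(r <= candidate for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if dependency(rows, subset, D) == full:
                found.append(candidate)
    return found

def core(all_reducts):
    """Intersection of all reducts."""
    return set.intersection(*all_reducts) if all_reducts else set()

# Hypothetical toy table: d = a XOR b, and c is a copy of b.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]

found = reducts(rows, ["a", "b", "c"], ["d"])
print(found)        # two reducts: {a, b} and {a, c}
print(core(found))  # {'a'}
```

Because `b` and `c` are interchangeable, either can complete a reduct, while `a` appears in every reduct and therefore forms the core.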

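Each entry $c_{ij}$ of the discernibility matrix can be defined by:

$c_{ij} = \{a \in A : a(x_i) \neq a(x_j)\}, \quad i, j = 1, \ldots, |U|$

where $c_{ij}$ contains the attributes on which $x_i$ and $x_j$ differ. If this matrix is adapted to a decision table, the definition becomes: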
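For reference, the decision-relative form usually given in the rough set literature is sketched below. Writing $c_{ij}$ for the matrix entry and $d$ for a single decision attribute are notational assumptions, so the authors' own equation may be written differently.

```latex
c_{ij} =
\begin{cases}
\{\, a \in C : a(x_i) \neq a(x_j) \,\} & \text{if } d(x_i) \neq d(x_j), \\
\emptyset & \text{otherwise,}
\end{cases}
\qquad i, j = 1, \ldots, |U|.
```

Under this form, only pairs of objects with different decisions contribute entries, and the matrix is symmetric, so it suffices to consider $i \le j$.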

#### 3.2. R Language

## 4. Research Methodology

#### 4.1. Problem Statement and Motivation

#### 4.2. Datasets

#### 4.3. Building a Minimal Log Size (Reduct)

- Splitting the dataset into N subsets and running the proposed algorithm on each subset overcomes hardware limitations, since fewer entries mean less memory is needed to load the data, perform the computations, and store the results. Keeping the whole high-dimensional dataset in memory while performing all the previous steps is mostly impossible;
- Reducing the number of calculations, since passing only the minimal elements of the discernibility matrix to the reduct calculation avoids evaluating every possible attribute combination, and hence the count $\sum_{i=1}^{N} \binom{N}{i} = 2^{N} - 1$ no longer applies. This reduces the execution time. The proposed code is given in Algorithm 1:
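The two quantities in the second point can be made concrete with a short Python sketch. It assumes that "minimal elements" means matrix entries that do not contain any other entry (the usual absorption law); the source may instead intend the entries with the fewest attributes. The function names are hypothetical.

```python
from math import comb

def count_candidate_subsets(n):
    """Number of non-empty attribute subsets: sum of C(n, i) for i = 1..n."""
    return sum(comb(n, i) for i in range(1, n + 1))

def minimal_entries(entries):
    """Keep only entries that have no proper subset among the other entries."""
    unique = {frozenset(e) for e in entries if e}
    return {e for e in unique if not any(other < e for other in unique)}

print(count_candidate_subsets(9))   # 511, i.e. 2**9 - 1
print(count_candidate_subsets(16))  # 65535, i.e. 2**16 - 1

# Hypothetical discernibility-matrix entries over attributes a, b, c.
entries = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
print(len(minimal_entries(entries)))  # 2: only {a} and {b, c} survive
```

Any attribute set that intersects `{a}` also intersects `{a, b}` and `{a, b, c}`, so the larger entries add no constraint and can be dropped before the reduct computation.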

Algorithm 1: IRS Algorithm |
---|
Input: $T = (U, A \cup D)$: information table; $N$: number of iterations; $M$: number of datasets |
Output: Core–Reduct |
1: For each dataset $M$ do |
2: For each iteration $N$ do |
3: Calculate $\mathrm{IND}_N(D)$ |
4: Compute $\mathrm{DISC.Matrix}_N(T)$ |
5: Do while ($\mathrm{DISC.Matrix}_N(T) \neq \emptyset$) and $i \le j$ (the RST discernibility matrix is symmetric) |
6: $S_{i0,j0}$ = Sort $(x_i, x_j) \in \mathrm{DISC.Matrix}_N(T)$ according to the number of conditional attributes $A$ |
7: End while |
8: Compute $\mathrm{Reduct}_N(S_{i0,j0})$ (calculating reducts for the minimal condition attributes) |
9: $\mathrm{Reduct}_N = \mathrm{Reduct}_N \cap \mathrm{Reduct}_N(S_{i0,j0})$ |
10: End For $N$ |
11: Core–Reduct = Core–Reduct ∩ $\mathrm{Reduct}_N$ (minimal optimal reduct) |
12: End For $M$ |
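A simplified Python sketch of the idea behind Algorithm 1 follows. It is an interpretation under stated assumptions, not the authors' implementation: the inner iteration loop is collapsed into a single pass per subset, one shortest reduct is taken per subset by brute-force search, and "minimal" entries are those not containing another entry. All function names and the two toy subsets are hypothetical.

```python
from itertools import combinations

def discernibility_entries(rows, C, d):
    """Decision-relative entries for pairs i < j (the matrix is symmetric)."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][d] != rows[j][d]:
            diff = frozenset(a for a in C if rows[i][a] != rows[j][a])
            if diff:
                entries.append(diff)
    return entries

def minimal_entries(entries):
    """Sort entries by number of attributes and drop those containing a kept entry."""
    kept = []
    for entry in sorted(set(entries), key=len):
        if not any(k <= entry for k in kept):
            kept.append(entry)
    return kept

def shortest_reduct(entries, C):
    """Smallest attribute set that intersects every entry (brute force, assumption)."""
    if not entries:
        return set()
    for size in range(1, len(C) + 1):
        for subset in combinations(C, size):
            candidate = set(subset)
            if all(candidate & e for e in entries):
                return candidate
    return set(C)

def irs_core_reduct(chunks, C, d):
    """Intersect the reduct found on each data subset."""
    result = None
    for rows in chunks:
        entries = minimal_entries(discernibility_entries(rows, C, d))
        reduct = shortest_reduct(entries, C)
        result = reduct if result is None else result & reduct
    return result or set()

C = ["a", "b", "c"]
chunk1 = [  # hypothetical subset 1: d = a XOR b, c copies b
    {"a": 0, "b": 0, "c": 0, "d": 0}, {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1}, {"a": 1, "b": 1, "c": 1, "d": 0},
]
chunk2 = [  # hypothetical subset 2: d is determined by a alone
    {"a": 0, "b": 0, "c": 0, "d": 0}, {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 0},
]

print(irs_core_reduct([chunk1, chunk2], C, "d"))  # {'a'}
```

In this sketch the first subset yields the reduct `{a, b}` and the second yields `{a}`, so their intersection is `{a}`. An intersection of per-subset reducts is not guaranteed to preserve the full dependency degree on every subset, so the resulting attribute set should be checked against the data.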

#### 4.4. Generating Minimal Decision Rules

Algorithm 2: Rule Generation Algorithm |
---|
Input: $\mathrm{Reduct}_N(T)$: minimal reduct information table; $M$: number of datasets |
Output: $\text{Set-Rule}_{Min}$ |
1: For each dataset $M$ do |
2: read.table($\mathrm{Reduct}_N(T)$) |
3: Split $\mathrm{Reduct}_N(T)$ into a training set (60%) and a test set (40%) |
4: RI.LEM2Rules.RST(): create rules from the training set of $\mathrm{Reduct}_N(T)$ |
5: predict(): test the quality of prediction on the test set of $\mathrm{Reduct}_N(T)$ |
6: mean(): check the accuracy of the predictions |
7: End For $M$ |
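The train/test workflow of Algorithm 2 can be sketched in Python as follows. This is not LEM2 and not the R `RoughSets` API: the rule learner below is a deliberately simple stand-in that maps each observed combination of condition values to its majority decision, and the split is taken in row order rather than at random. The function names, attribute names, and the ten-row table are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

def split(rows, train_fraction=0.6):
    """Deterministic 60/40 split in row order (shuffle first in practice)."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def learn_rules(train, attrs, decision):
    """Stand-in for LEM2: one exact-match rule per condition-value combination."""
    votes = defaultdict(Counter)
    for row in train:
        votes[tuple(row[a] for a in attrs)][row[decision]] += 1
    rules = {cond: counts.most_common(1)[0][0] for cond, counts in votes.items()}
    default = Counter(r[decision] for r in train).most_common(1)[0][0]
    return rules, default

def predict(rules, default, rows, attrs):
    """Apply the matching rule, falling back to the most frequent training decision."""
    return [rules.get(tuple(r[a] for a in attrs), default) for r in rows]

def accuracy(predicted, rows, decision):
    """Mean of per-row correctness, mirroring the mean() step of Algorithm 2."""
    return mean(float(p == r[decision]) for p, r in zip(predicted, rows))

# Hypothetical reduct table with two condition attributes and one decision.
attrs, decision = ["event", "magnitude"], "category"
values = [
    ("deny", 8, "Deny"), ("deny", 8, "Deny"), ("teardown", 7, "Closed"),
    ("deny", 8, "Deny"), ("teardown", 7, "Closed"), ("built", 8, "Open"),
    ("deny", 8, "Deny"), ("teardown", 7, "Closed"), ("built", 8, "Open"),
    ("deny", 7, "Closed"),
]
rows = [dict(zip(attrs + [decision], v)) for v in values]

train, test = split(rows)
rules, default = learn_rules(train, attrs, decision)
acc = accuracy(predict(rules, default, test, attrs), test, decision)
print(len(rules), acc)  # 3 0.75
```

The last test row has a condition combination never seen in training, so it falls back to the default decision and is misclassified, which is what brings the accuracy down to 0.75.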

#### 4.5. Execution Time Comparison with Existing Methods

## 5. Conclusions and Future Works

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

Event Name | Log Source | Event Count | Low Level Category | Source IP | Source Port | Destination IP | Destination Port | User Name | Magnitude |
---|---|---|---|---|---|---|---|---|---|
Tear down UDP connection | ASA @ 172.17.0.1 | 1 | Firewall Session Closed | 8.8.8.8 | 53 | 172.18.12.10 | 53657 | N/A | 7 |
Deny protocol src | R | 1 | Firewall Deny | 172.20.12.142 | 56511 | 172.217.23.174 | 443 | N/A | 8 |
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.20.18.54 | 52976 | 213.139.38.18 | 80 | N/A | 8 |
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.20.15.71 | 53722 | 52.114.75.79 | 443 | N/A | 8 |
Deny protocol src | ASA @ 172.17.0.1 | 1 | Firewall Deny | 192.168.180.131 | 55091 | 40.90.22.184 | 443 | N/A | 8 |
Built TCP connection | ASA @ 172.17.0.1 | 1 | Firewall Deny | 172.18.12.19 | 59201 | 163.172.21.225 | 443 | N/A | 8 |

Training Data Set | Minimal Attribute | Degree of Dependency $^{1}$ |
---|---|---|
First Training Set S1 (∩ three iterations) $\mathrm{Reduct}_{N=1}$ | A1 = {Event Name, Source IP, Source Port, Destination IP, Magnitude}, $\lvert A1 \rvert = 5$ | 1 |
Second Training Set S2 (∩ three iterations) $\mathrm{Reduct}_{N=2}$ | A2 = {Event Name, Source IP, Destination IP, Magnitude}, $\lvert A2 \rvert = 4$ | 0.9992941 |
Third Training Set S3 (∩ three iterations) $\mathrm{Reduct}_{N=3}$ | A3 = {Event Name, Source IP, Source Port, Destination IP, Magnitude}, $\lvert A3 \rvert = 5$ | 1 |
Core-Reduct (A1 ∩ A2 ∩ A3) | A2 = {Event Name, Source IP, Destination IP, Magnitude}, $\lvert A2 \rvert = 4$ | 0.9992941 |

$^{1}$ A decision attribute $d$ totally depends on a set of attributes $A$, written $A \Rightarrow d$, if all attribute values of $d$ are uniquely determined by the attribute values of $A$.

Training Data Set | Number of Decision Rules before Reduct | Number of Decision Rules after Reduct | Prediction Accuracy |
---|---|---|---|
First Training Set | S1 = 905 | A1 = 596 | 0.9552733 |
Second Training Set | S2 = 878 | A2 = 509 | 0.9535073 |
Third Training Set | S3 = 813 | A3 = 481 | 0.9741291 |

Dataset | Number of Attributes | Number of Instances |
---|---|---|
Glass | 9 | 100 |
Wiscon | 9 | 699 |
Zoo | 16 | 100 |

Data | Num. of Attributes of the Dataset | All Reducts (IRS) | All Reducts (SPS and CDM) | Execution Time in Seconds: Classical Discernibility Matrix (CDM) | Execution Time in Seconds: SPS | Execution Time in Seconds: IRS |
---|---|---|---|---|---|---|
Wiscon | 9 | 4 | 4 | 1362.1 | 24.0956 | 9.05 |
Glass | 9 | 2 | 2 | 23.3268 | 0.7931 | 0.7 |
Zoo | 16 | 35 | 35 | 106.6581 | 1.2574 | 0.9967 |

