A Novel Study: GAN-Based Minority Class Balancing and Machine-Learning-Based Network Intruder Detection Using Chi-Square Feature Selection

Network security has become a routine concern for network and cyber-security specialists. The volume of data generated every minute not only creates big-data problems but also expands network size on the cloud and other computing platforms. As networks and data grow, they become more vulnerable to cyber-attacks, and detecting attacks before or as they occur is a challenging task. Network intrusion detection systems (NIDS) are used for this purpose. Data-driven NIDS have been proposed previously, but they still need improvement. It is also essential to identify the most informative features in network data to avoid overfitting and to increase confidence in NIDS. Previously proposed NIDS mostly ignored the class imbalance problem that commonly arises when training the machine learning (ML) methods they use, and the few studies that did address class imbalance and feature selection treated them separately, although with significant results on different datasets. These NIDS still need improvement in classification performance and in robust class balancing. Therefore, this study proposes a framework that addresses the class imbalance of minority classes in public NIDS datasets and selects the most significant features. In this framework, minority class instances are generated using a Generative Adversarial Network (GAN) model with optimized hyperparameters, and the chi-square method is then applied for feature selection before the data are fed to six ML classifiers. Binary and multi-class classification are performed on three versions of the UNSW-NB15 dataset. For binary and multi-class classification, respectively, the framework outperformed previous studies in terms of accuracy (98.14%, 87.44%), precision (98.14%, 87.81%), F1-score (98.14%, 86.79%), geometric mean (0.976, 0.923), and area under the curve (0.976, 0.94).


Introduction
Internet usage and growing technologies raise the risk of cyber-security attacks on cloud computing, edge computing, and other networks. These malicious attacks lead to financial and reputational losses. Network intrusion detection systems (NIDS) installed on these networks help detect and prevent such attacks [1][2][3]. A NIDS continuously monitors the network within its cyberspace [4]. The history of intrusion detection systems starts in 1980, when J. Anderson et al. proposed a method [5] that provided the security networks required at that time. However, the immense technological progress of recent decades has created many new network-security challenges. Modern technologies generate big data, amounting to millions of gigabytes (GBs) shared across network nodes.
The network itself becomes larger, and keeping it secure becomes more challenging. Network security also becomes more challenging when certain types of …
• The class imbalance problem is addressed by optimizing the hyperparameters of a generative adversarial network (GAN) model for tabular (numeric) data generation.
• The newly generated data, based on the UNSW-NB15 dataset, reduce the class imbalance across the different categories of network attacks.
• Results on the generated dataset are compared with those on the original UNSW-NB15 dataset, supporting the validity of the generated data and showing improved classification precision.
• The chi-square method is used for feature selection, and classical ML methods are used for binary and multi-class classification of network attacks on the original and newly generated datasets.
• The results and comparative analysis show that the proposed framework outperforms previous studies, making it a more reliable and valid method of network intrusion detection.
The rest of the article is organized into five sections. The related work section reviews previous studies on NIDS; Section 3 presents the proposed methodology and highlights the operational functionality of the proposed framework; Section 4 reports the results and validity of the proposed study; Section 5 compares it with previous studies; and the final section concludes the framework, together with its weaknesses and future directions.

Related Work
The proposed framework specifically targets the class imbalance issue using an algorithm-based approach and then applies a feature-selection method to provide an efficient NIDS. Therefore, recent relevant studies are summarized here and in Table 1, together with their strengths and weaknesses, to put the proposed framework in context. X. Tan et al. [21] addressed class imbalance on the KDD Cup 99 dataset to build a NIDS, using a random forest classifier for classification. Their results showed that a SMOTE-based solution to the class-imbalance problem improved classification from 92.39% to 92.57%. Other classifiers were also used, but only the best one is discussed here.
Another study [22] used SMOTE and Gaussian mixture model (GMM)-based methods to oversample minority classes and address the class imbalance problem on two datasets, UNSW-NB15 and CICIDS2017. On UNSW-NB15, the proposed SGM-CNN obtained an accuracy of 98.82% and an F1-score of 95.53% for binary classification, and 96.54% accuracy and a 97.26% F1-score for multi-class classification. CICIDS2017 was used for multi-class classification only, reaching 99.85% accuracy and a 99.86% F1-score.
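Several of the studies above rely on SMOTE. Its core idea can be sketched in a few lines of NumPy: each synthetic minority row is placed on the segment between a randomly chosen minority row and one of its k nearest minority-class neighbours. The helper `smote_like_oversample` and its defaults are illustrative assumptions, not the implementation used in [21], [22], or [24].

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows
    and their k nearest minority-class neighbours (SMOTE-style sketch)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class only
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a row is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)            # rows to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)] # one neighbour each
    gap = rng.random((n_new, 1))               # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

Because every synthetic row is a convex combination of two real minority rows, it always stays inside the range spanned by the minority class; in practice a library implementation would normally be preferred.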
The deep-learning-based study [23] proposed a Bi-LSTM model with attention weights and used adaptive synthetic sampling (ADASYN) to handle the class imbalance problem in the NSL-KDD dataset. It reached 90.73% accuracy and an 89.65% F1-score. T. Wao et al. [24] proposed a NIDS that combines SMOTE with a KNN-based clustering method to address the class imbalance in the NSL-KDD dataset and then applies a random forest classifier for classification, achieving 78.47% accuracy on the test set.
Focal loss is a loss function that down-weights easy, well-classified instances so that training concentrates on hard, often minority-class, instances. A study [25] used this cost-sensitive loss to address class imbalance and proposed deep neural network (DNN) and convolutional neural network (CNN) models to classify data from three datasets (NSL-KDD, UNSW-NB15, and Bot-IoT). Its results section is extensive, including scalability data and layer-wise evaluations of binary and multi-class classification on all datasets. The final binary DNN results were 83.92%, 90.41%, and 79.24% F1-scores on NSL-KDD, UNSW-NB15, and Bot-IoT, respectively; for multi-class classification, the F1-scores were 47.33%, 39.78%, and 98.90%. The CNN reached 84.87%, 86.03%, and 95.57% F1-scores for binary classification and 51.96%, 39.52%, and 95.51% for multi-class classification, respectively.
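For reference, the binary focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The NumPy sketch below implements this formula; γ = 2 and α = 0.25 are commonly used defaults and are assumptions here, not values reported by [25].

```python
import numpy as np

def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; (1 - p_t)**gamma shrinks the loss of easy examples."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    p_t = np.where(y_true == 1, p, 1 - p)               # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)   # class-balancing weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which makes the effect of γ easy to see: a confidently correct prediction (p_t = 0.9) is scaled by 0.01, while a harder one (p_t = 0.6) is scaled by 0.16.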
Another study [26] addressed the class imbalance problem at the algorithm level, applying several ML classifiers to the NSL-KDD and UNSW-NB15 datasets. It modified the cross-entropy function used as the loss function and applied data normalization. The final binary classification scores were 90.76% on the UNSW-NB15 dataset and 85.56% on the NSL-KDD dataset.
Secondly, feature-selection methods reduce the training and testing times of NIDS by keeping only the most appropriate features for classification. A recent study [20] applied various methods, such as genetic algorithms (GA), particle swarm optimization (PSO), and others, to assess their effect, selecting the top 16 features of the UNSW-NB15 dataset for binary classification. The best scores, achieved by the J48 algorithm, were 90.48% accuracy, 84.136% precision, 97.141% sensitivity, and 90.172% F1-score.
Many studies have addressed the class imbalance problem, and a few have separately applied feature-selection methods. The proposed study not only addresses class imbalance but also applies chi-square feature ranking and selects the top-ranked features. It therefore combines both solutions and solves binary and multi-class classification problems with improved performance compared to previous studies.

Proposed Methodology
The proposed framework applies several steps to build a balanced, robust, and improved NIDS. These class-balancing and feature-selection-based steps are shown in Figure 1.
The proposed framework includes dataset cleaning and the removal of unnecessary features from the given dataset. First, the minority classes are balanced by generating new instances with a GAN; the meaningful features are then selected before classification. Each step is discussed in detail in the following sections.

GAN-Based Minority Class Data Generation
A GAN consists of two networks: a generator and a discriminator. The generator maps random noise sampled from a latent space to synthetic instances. The discriminator receives these generated instances together with real samples from the dataset and estimates whether each instance is real or generated, typically through a sigmoid output combined with a suitable loss function. Whenever the discriminator rejects the generated instances, the generator's weights are updated so that its output moves closer to the real data.
The loss is reduced iteratively, and when it falls below a certain threshold, the generated samples are accepted. Using this basic principle of the GAN, the proposed study employs a tabular data generator to create minority class instances. The tabgan API [27,28] provides the GAN model, and its hyperparameter optimization is discussed in Section 4. Figure 2 compares generated and original values for a few features, showing that they are similar.
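The study obtains its GAN from the tabgan API, whose internals are not reproduced here. To illustrate the adversarial objective described above, the following NumPy sketch computes the standard GAN losses from the discriminator's raw (pre-sigmoid) scores. The function names are hypothetical, and the generator loss shown is the common non-saturating variant, which may differ from the loss tabgan uses internally.

```python
import numpy as np

def bce_with_logits(logits, target):
    """Numerically stable mean binary cross-entropy on pre-sigmoid scores."""
    x = np.asarray(logits, dtype=float)
    return float(np.mean(np.maximum(x, 0) - x * target + np.log1p(np.exp(-np.abs(x)))))

def discriminator_loss(real_logits, fake_logits):
    # The discriminator should output 1 for real rows and 0 for generated rows
    return bce_with_logits(real_logits, 1.0) + bce_with_logits(fake_logits, 0.0)

def generator_loss(fake_logits):
    # Non-saturating form: the generator wants its rows classified as real
    return bce_with_logits(fake_logits, 1.0)
```

When the discriminator cannot tell the two apart (all scores 0, i.e. probability 0.5), its loss is 2 ln 2 and the generator's is ln 2; a discriminator that confidently separates real from fake rows drives its own loss toward 0 and the generator's loss up, which is the signal used to update the generator.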
All features of the UNSW-NB15 dataset are fed to the GAN model to generate new instances; for readability, only 10,000 instances are plotted in Figure 2 to show the similarity between the generated and original data. For example, for the proto feature on the left side of Figure 2, the values range up to about 100 or slightly more, and the generated data on the right cover a similar range, although the histogram bars at the initial values are higher in the GAN-based proto data. The same holds for service, state, spkts, sbytes, etc.
For all features, the left and right histograms show very similar behavior between the original and newly generated data. The features are generated separately for each attack type in the dataset. The normal class data are not used to train the GAN, because that class already has many instances. The generated data are later evaluated with both binary and multi-class classification.
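Figure 2 compares the distributions visually. A possible numeric counterpart, not used in the study and shown here only as an assumption, is the histogram intersection: both samples of a feature are binned on a shared grid, and the overlapping probability mass is summed, giving 1 for identical histograms and 0 for disjoint ones. `histogram_overlap` and the default of 30 bins are hypothetical choices.

```python
import numpy as np

def histogram_overlap(original, generated, bins=30):
    """Histogram intersection of two 1-D samples on a shared bin grid (0..1)."""
    original = np.asarray(original, dtype=float)
    generated = np.asarray(generated, dtype=float)
    lo = min(original.min(), generated.min())
    hi = max(original.max(), generated.max())
    if hi == lo:                      # constant feature: widen to a valid grid
        hi = lo + 1.0
    edges = np.linspace(lo, hi, bins + 1)
    h_orig, _ = np.histogram(original, bins=edges)
    h_gen, _ = np.histogram(generated, bins=edges)
    p = h_orig / h_orig.sum()         # normalize counts to probability mass
    q = h_gen / h_gen.sum()
    return float(np.minimum(p, q).sum())
```

Applied per feature (for example to proto, service, or sbytes of one attack category), this turns the side-by-side histograms into a single similarity score per feature.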

Dataset Preprocessing
All feature values are converted to numeric form, including those in categorical format. Once the feature columns contain refined numeric values, clamping is applied under the conditions described in Equations (1)-(3).
Equation (1) defines a flag (F1) that is true when the maximum value of the current feature vector (FVi) is greater than 10. The second flag (F2), defined in Equation (2), is true when the maximum value of FVi is more than 10 times the median of all data of that feature.
In Equation (3), if both flags are true for a feature vector, that is, its values reach more than 10 times the median of the feature, then every value of that feature above its 98th percentile is replaced with the 98th percentile value. In this way, no feature exceeds its limits, because values beyond the 98th percentile are pruned. Clamping plays an important role in normalizing the values.
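Written out, Equations (1)-(3) as described above can be expressed as follows. This is a reconstruction from the prose, assuming that both flags are evaluated on the maximum of each feature vector and that clamping caps values at the 98th percentile; FV_{i,j} denotes the j-th instance of feature i, P_{98} the 98th percentile, and \mathbb{1}[\cdot] the indicator function.

```latex
F_1(FV_i) = \mathbb{1}\!\left[\max(FV_i) > 10\right] \tag{1}

F_2(FV_i) = \mathbb{1}\!\left[\max(FV_i) > 10 \cdot \operatorname{median}(FV_i)\right] \tag{2}

FV_{i,j} \leftarrow
\begin{cases}
\min\!\left(FV_{i,j},\, P_{98}(FV_i)\right), & \text{if } F_1(FV_i) = F_2(FV_i) = 1,\\
FV_{i,j}, & \text{otherwise.}
\end{cases} \tag{3}
```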
Features with many distinct values tend to be more skewed. Therefore, to reduce skewness, a log function is applied to every feature that has more than 50 unique values, as shown in Equation (4). In Equation (4), all features obtained after the pruning operation are passed to the log function, which removes the skewness in the columns that have more than 50 unique values.
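The preprocessing in Equations (1)-(4) can be sketched in NumPy as below. Assumptions: the features are already numeric and non-negative; both flags are evaluated on each column's maximum; values above the 98th percentile are capped; and, because the exact form of Equation (4) is not given here, log(1 + x) is used so that zero values stay finite. `clamp_and_log` is a hypothetical helper name.

```python
import numpy as np

def clamp_and_log(X, pct=98, log_unique_threshold=50):
    """Clamp outlier-heavy columns (Eqs. 1-3), then log-transform
    high-cardinality columns (Eq. 4). Returns a new array."""
    X = np.array(X, dtype=float)                 # copy: caller's data untouched
    for i in range(X.shape[1]):
        col = X[:, i]
        f1 = col.max() > 10                      # Equation (1)
        f2 = col.max() > 10 * np.median(col)     # Equation (2)
        if f1 and f2:                            # Equation (3): cap at P98
            X[:, i] = np.minimum(col, np.percentile(col, pct))
    for i in range(X.shape[1]):
        col = X[:, i]
        if np.unique(col).size > log_unique_threshold:   # Equation (4)
            X[:, i] = np.log1p(col)              # assumed log(1 + x) form
    return X
```

The clamping loop runs first so that the log transform sees the pruned values, matching the order described in the text.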

Features Selection Using Chi-Square
Selecting the most meaningful and significant features increases precision, reduces training and testing time, and also reduces overfitting. Therefore, chi-square feature ranking is applied, and the number of features to keep is set manually to 16. This choice is based on performance: adding more features did not improve the classifiers' performance, whereas keeping fewer than 16 reduced it. The 16 highest-ranked features are therefore selected. The chi-square statistic is computed as in Equation (5):
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \tag{5}$$
where O_i and E_i are the observed and expected frequencies of the i-th cell of the contingency table. The chi-square method of feature selection involves five main steps. First, the hypotheses are stated: the null hypothesis that a feature and the class are independent, against the contradicting alternative that they are dependent. Second, a contingency table is built from the values of the given data. Third, the expected values are calculated. Fourth, the chi-square value is calculated according to Equation (5). Finally, the hypothesis is accepted or rejected based on the chi-square value; features with higher values show a stronger dependence on the output classes and are selected. The 16 features selected by chi-square are given to the ML classifiers. The selected features differ between binary and multi-class classification, which means that features vary in importance across the two classification tasks.
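A NumPy sketch of chi-square feature ranking, written in the style of scikit-learn's `chi2` scorer, which treats non-negative feature values as frequencies: O is the per-class sum of each feature, E is what those sums would be if the feature were independent of the class, and Equation (5) is summed over the classes. The helper names are hypothetical; k = 16 follows the text.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if (X < 0).any():
        raise ValueError("chi-square scoring needs non-negative feature values")
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot labels, (n, C)
    observed = Y.T @ X                                   # O: per-class feature totals, (C, d)
    class_prob = Y.mean(axis=0)                          # share of rows per class
    expected = np.outer(class_prob, X.sum(axis=0))       # E under independence, (C, d)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = (observed - expected) ** 2 / expected
    return np.nansum(terms, axis=0)                      # Equation (5) per feature

def top_k_features(X, y, k=16):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(chi2_scores(X, y))[::-1][:k]
```

With scikit-learn available, `SelectKBest(chi2, k=16)` from `sklearn.feature_selection` performs the same ranking and selection. Because the scorer needs non-negative inputs, it fits naturally after the clamping and log steps of the preprocessing.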

ML Classification
Three datasets are evaluated for binary and multi-class classification. For each dataset, chi-square selects the top 16 features, which are then fed to the ML classifiers for binary and multi-class classification. To validate the newly generated attack-category instances against the original data, three datasets are used: (1) the original UNSW-NB15, (2) GAN, and (3) original + GAN, to show that all classifiers behave consistently in binary and multi-class classification.
A holdout validation scheme is used, randomly splitting all data into 80% for training and 20% for testing. Decision tree, extra trees, random forest, logistic regression, k-nearest neighbor (KNN), and multi-layer perceptron (MLP) classifiers are applied to validate the performance of the GAN-based data and of the feature-selection method used in the proposed study. They are discussed in detail in this section. The full framework is summarized in Algorithm 1.
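The paper does not name its ML library; the following sketch assumes scikit-learn and shows the random 80:20 holdout split with the six classifier types, recording training and prediction times of the kind reported in the result tables. Default hyperparameters (apart from iteration limits) are illustrative choices, not the study's settings, and `evaluate_holdout` is a hypothetical helper.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate_holdout(X, y, seed=42):
    """Random 80:20 split; fit six classifiers; return accuracy and timings."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "extra_trees": ExtraTreesClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(),
        "mlp": MLPClassifier(max_iter=500, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        t0 = time.perf_counter()
        model.fit(X_tr, y_tr)                  # training time
        t1 = time.perf_counter()
        y_hat = model.predict(X_te)            # prediction (testing) time
        t2 = time.perf_counter()
        results[name] = {"accuracy": accuracy_score(y_te, y_hat),
                         "train_s": t1 - t0, "test_s": t2 - t1}
    return results

if __name__ == "__main__":
    # Synthetic stand-in for the 16 chi-square-selected features
    X, y = make_classification(n_samples=400, n_features=16, random_state=0)
    for name, r in evaluate_holdout(X, y).items():
        print(f"{name}: acc={r['accuracy']:.3f} train={r['train_s']:.2f}s test={r['test_s']:.2f}s")
```

The same function serves binary and multi-class tasks; only the target vector changes.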

Algorithm 1. Proposed framework, from feature input to classification.
Input: UNSW-NB15 dataset features (FVi)
Output: Classification of three datasets D1, D2, D3
Step 1: Take all features (FVi).
Step 2: Separate out the minority class instances (MVi).
Step 3: Generate new minority class instances (NMVi) using the hyperparameter-optimized GAN model of the proposed study.
Step 4: Form three datasets: the original UNSW-NB15 dataset (D1); the minority class instances generated in Step 3 combined with the normal class instances of the original dataset (D2); and the original UNSW-NB15 instances combined with the GAN-generated minority class instances (D3).
Step 5: Normalize features using Equations (1)-(4) on D1, D2, D3.
Step 7: Obtain three feature sets (GVi) with the chi-square method using D1, D2, D3.
Step 8: Conduct Experiments 1, 2, and 3 by feeding D1, D2, D3 to the ML classifiers.
The full algorithm gives a step-by-step view of the proposed study. The input is taken from the UNSW-NB15 dataset after separating out the class attribute columns. In Step 2, the instances are separated by class according to each network attack category. In Step 3, new class-wise instances are generated from the given UNSW-NB15 instances (D1) using the GAN model optimized through hyperparameter selection. To validate the newly generated minority class instances and to cross-check their performance against the original dataset (D1), three datasets, D1, D2, and D3, are formed in Step 4. Features of all three sets are normalized using Equations (1)-(4) in Step 5. After normalization, the most important features are selected using the chi-square method on all feature sets in Step 6. The selected features (GVi) of each set are fed to six classifiers in Step 8, where both binary and multi-class classifications are performed in three experiments.
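Step 4 of Algorithm 1 can be sketched with pandas. The sketch assumes the UNSW-NB15 CSV columns `label` (0 = normal, 1 = attack) and `attack_cat`, and that `generated_attacks` already holds GAN output with the same columns; `build_datasets` is a hypothetical helper.

```python
import pandas as pd

def build_datasets(original, generated_attacks, label_col="label"):
    """Form D1, D2, D3 of Algorithm 1 (Step 4) from two DataFrames."""
    normal = original[original[label_col] == 0]                       # original normal rows
    d1 = original.reset_index(drop=True)                              # D1: original UNSW-NB15
    d2 = pd.concat([normal, generated_attacks], ignore_index=True)    # D2: normal + GAN attacks
    d3 = pd.concat([original, generated_attacks], ignore_index=True)  # D3: original + GAN attacks
    return d1, d2, d3
```

Each dataset then supplies `label` as the binary target and `attack_cat` as the multi-class target for Experiments 1, 2, and 3.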

Results and Discussion
The original UNSW-NB15 dataset, the GAN-based dataset with balanced minority classes, and the original + GAN dataset are used in the NIDS experiments. The dataset instances before and after GAN generation are discussed in detail. The feature-selection-based results are collected in three experiments with six ML classifiers.

Datasets Description
The original dataset contains labels for both multi-class and binary classification. The number of instances in each class is shown in Table 2. The normal class has far more instances than any individual attack category, so additional instances for the attack classes are generated using the GAN model; their numbers are also shown in Table 2. The normal class already has enough instances for classification, so no new normal instances are generated. Table 2 also gives the total instances per class for the final combined dataset built from the original and GAN data. Column 2 lists, for each category, the instances in the training and testing sets originally provided with UNSW-NB15. For binary classification, all attack categories are treated as the attack class, and the normal class is the negative class. The last row shows that attack instances make up almost 64% of the data and normal instances the remaining 36%. This 36% share of a single class causes imbalance in binary classification, and the imbalance is even greater in multi-class classification, where some categories, such as worms, have very few instances in the whole dataset. To address this for each class, the GAN is applied to each category separately, and the resulting new instances are shown in Column 3.
For each category, the number of new instances is slightly lower than the number of original instances. A category that provides more instances to the GAN receives more generated instances, with similar values. In the GAN-based data, attack classes account for 57% and the normal class for 43% of instances. Although this dataset is smaller than the original one, it plays an important role in creating a new dataset that is more balanced in favor of the attack classes. The combined dataset contains 75% attack and 25% normal instances, which favors the attack classes more than either the original or the GAN-based dataset. Binary and multi-class classification results are reported on each of these datasets to check the validity of the created datasets and their instances.

GAN Hyperparameters Optimization and Learning Environment
The GAN method used in the proposed framework was originally proposed in 2020; its authors developed it for both tabular and image-based synthetic data generation. However, its hyperparameters must be tuned for each case to create relevant and efficient data. The values used in the proposed study are shown in Table 3.
Many parameters can be tuned to optimize learning. The root mean square error (RMSE) is used to monitor the performance of the GAN model; other metrics could be used depending on the case study. The number of estimators in the adversarial model is set to 100, and the batch size, i.e., the number of instances processed at once during training, is set to 500. The patience is set to 25: if the RMSE does not improve for 25 consecutive rounds during GAN training, training stops, because the model's learning has saturated. The learning environment is Python 3.9 with the required libraries, with TensorFlow as the backend.
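A minimal sketch of the patience rule, assuming RMSE is computed once per training round. `rmse` and `train_with_patience` are hypothetical helpers that only illustrate the stopping logic, not tabgan's internals.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def train_with_patience(rmse_per_round, patience=25):
    """Return (stopping round, best RMSE) under a patience-based stopping rule."""
    best = float("inf")
    since_best = 0
    round_no = 0
    for round_no, value in enumerate(rmse_per_round, start=1):
        if value < best:
            best, since_best = value, 0      # improvement: reset the counter
        else:
            since_best += 1                  # no improvement in this round
            if since_best >= patience:
                break                        # learning has saturated: stop
    return round_no, best
```

For example, if RMSE reaches its best value in round 3 and never improves afterwards, training stops in round 28, after 25 rounds without improvement.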

Experiment 1: Original UNSW-NB15 Dataset
The proposed study first applied chi-square feature selection and then fed the 16 selected features to binary and multi-class classification separately. The six classifiers were trained with an 80:20 holdout split. Each trained model was then tested, and the training and testing times were recorded to assess the time efficiency of all models. Accuracy, precision, recall, and F1-score were calculated; the detailed results are shown in Table 4.
Table 4 shows the performance of each classifier on multi-class and binary classification. Different classifiers behave differently on the two tasks. In binary classification, random forest achieves the best accuracy (95.59%), and its recall, precision, F1-score, geometric mean (G-mean), and area under the curve (AUC) are also higher than those of all other classifiers, at 95.59%, 95.60%, 95.60%, 0.953, and 0.953, respectively. Extra trees also performs strongly but ranks second among all classifiers. The other classifiers score above 90% on all reported metrics.
In multi-class classification, random forest again shows the best results, with 83.36% accuracy, 83.36% recall, 83.31% precision, 82.35% F1-score, 0.90 G-mean, and 0.89 AUC. Extra trees is again second, with 83.16% accuracy and recall. Logistic regression, which is primarily a binomial (binary) classifier, performs adequately in binary classification but fails in multi-class classification, with 65.47% accuracy. Time efficiency is also reported in Table 4, since training and prediction time is another important measure of a classifier's performance. The deep learning model, MLP, takes the longest to train of all classifiers in both binary and multi-class classification, at more than two minutes. KNN and logistic regression train quickly but take longer to test (predict). The best-performing classifiers, random forest and extra trees, take 9 and 7.2 seconds in total for binary classification and 11.5 and 8.3 seconds for multi-class classification, respectively. Therefore, compared with the classifiers other than the decision tree, the best-performing classifiers are also time-efficient for both binary and multi-class classification.

Experiment 2: GAN-Based Dataset
The second dataset contains GAN-generated instances for the attack categories only, together with the normal instances taken from the original dataset. To verify the consistency of the new instances, binary and multi-class results are reported in Table 5, which shows the scores of the six classifiers on this dataset. The difference from the original dataset is not significant: binary scores are nearly identical, while multi-class accuracy improves. For binary classification, random forest again shows the highest accuracy (95.41%), with the same recall, the highest precision and F1-score (95.44% and 95.42%), and G-mean and AUC scores of 0.954. In multi-class classification, random forest also achieves the highest accuracy (84.53%), and the second-best classifier is again extra trees.
In binary classification, the logistic regression results drop this time, whereas the performance of the other classifiers improves. In multi-class classification, logistic regression was already the worst classifier on the original dataset, and it is again the worst in this experiment, with 68.62% accuracy. The training and testing times of Experiment 2 are considered next.
In terms of time efficiency, training dominates for some classifiers and testing for others. As in Experiment 1, MLP takes the most total time. The best classifiers are again random forest and extra trees, which take 8 and 5.9 seconds, respectively, for binary classification. In multi-class classification, MLP is again the slowest, and the best classifiers take 12.3 and 9.6 seconds in total. Compared with Experiment 1, training and testing times are longer for multi-class classification and shorter for binary classification.

Experiment 3: Original UNSW-NB15 + GAN Dataset
The original and GAN-generated attack-category instances are combined with the normal class data of the original dataset to form a new dataset. To check whether reducing the class imbalance in this way is valid, this dataset is fed to the same six ML classifiers, and binary and multi-class results are calculated, as shown in Table 6, which lists the binary results first and then the multi-class results. As in the previous experiments, logistic regression gives the worst results among all classifiers. In binary classification on this new dataset, extra trees achieves the highest accuracy, 98.14%, after chi-square feature selection. In Experiments 1 and 2, random forest was the top scorer for both binary and multi-class classification.
In Experiment 3, however, random forest ranks second among the six classifiers for binary classification. Extra trees is also first in recall, precision, and F1-score, all at 98.14%, with G-mean and AUC scores of 0.976. All classifiers score above 95% except logistic regression, which scores at or below about 90%. The confusion matrices, with true positive, false positive, true negative, and false negative counts, are shown in Figure 4.
In the MLP confusion matrix, 23,064 Class 0 instances are predicted correctly and 2996 incorrectly, while 56,568 Class 1 instances are predicted correctly and only 1190 incorrectly. MLP, however, is not the best classifier here, so the best performers, random forest and extra trees, are discussed next. Random forest makes 1146 wrong predictions for Class 0 and 490 for Class 1, reducing the errors of MLP for both classes, most markedly for Class 0 (from 2996 to 1146). Extra trees makes 983 wrong predictions for Class 0 and 574 for Class 1: it makes more Class 1 errors than random forest but fewer Class 0 errors. As a result, extra trees scores slightly higher than random forest in binary classification.
In multi-class classification, extra trees is again the top scorer, with 87.44% accuracy and recall, 87.81% precision, 86.79% F1-score, 0.923 G-mean, and 0.94 AUC. Random forest again takes second place, with 87.38% accuracy and recall, 87.84% precision, 86.67% F1-score, 0.923 G-mean, and approximately 0.94 AUC.
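G-mean is taken here as the geometric mean of the per-class recalls, which for two classes equals sqrt(sensitivity × specificity); the exact definition used in the study is not spelled out, so this is an assumption. The sketch assumes rows of the confusion matrix are true classes and columns are predictions; `g_mean` is a hypothetical helper, and the usage line plugs in the MLP counts quoted above.

```python
import numpy as np

def g_mean(confusion):
    """Geometric mean of per-class recalls (rows = true, columns = predicted)."""
    cm = np.asarray(confusion, dtype=float)
    recalls = np.diag(cm) / cm.sum(axis=1)   # recall of each true class
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# MLP, binary task: row 0 = true Class 0, row 1 = true Class 1
mlp_cm = [[23064, 2996], [1190, 56568]]
print(f"MLP G-mean: {g_mean(mlp_cm):.3f}")
```

Because it multiplies per-class recalls, G-mean drops sharply when any single class is poorly recognized, which is why it complements accuracy on imbalanced data.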
Logistic regression again fails to classify the multi-class instances, with 67.38% accuracy and recall, 64.50% precision, 64.64% F1-score, 0.78 G-mean, and 0.74 AUC. MLP and KNN remain above 80%, and the other three classifiers exceed 86%. The results improve on Experiments 1 and 2 for both binary and multi-class classification. This shows that class imbalance not only reduces classification accuracy but also biases the scores toward the category with the most samples, the normal class. The multi-class confusion matrices are shown in Figures 5 and 6 to examine each classifier's per-class performance.
The confusion matrices of random forest and extra trees, the best performers, are discussed here. Random forest predicts all classes well, particularly classes 3, 4, 5, 6, and 9, with 3866, 7207, 20,000, 4062, and 24,915 correct predictions, respectively. These counts are higher than those of the other classes. These classes are DoS, exploits, fuzzers, generic, and normal; the other classes are also predicted reasonably well. For extra trees, the corresponding counts are 3779, 7114, 20,000, 4046, and 25,044 for DoS, exploits, fuzzers, generic, and normal. Compared with random forest, extra trees makes fewer correct predictions for DoS, exploits, and generic, the same number for fuzzers, and more for normal: 25,044 versus 24,915, that is, 129 more correct predictions. Given the slight differences between the two classifiers, these additional correct normal-class predictions tip the balance toward extra trees, whereas random forest is better at classifying the attack categories. Table 6 also reports the time efficiency of all classifiers on the binary and multi-class tasks. MLP is again the slowest in total time, with 216.3 seconds for binary and 335.6 seconds for multi-class classification. The best performers, random forest and extra trees, take 30.5 and 24.2 seconds, respectively, for binary classification and 21.2 and 16.7 seconds for multi-class classification.
Logistic regression and decision tree take less time than random forest and extra trees, as in Experiments 1 and 2, but the best performers are not slow either: once trained, they need about one second or less for prediction in both binary and multi-class classification. Therefore, the best performers are also time-efficient at prediction.

Comparison
The proposed study applies binary and multi-class classification to the original and oversampled versions of the UNSW-NB15 dataset. Therefore, it is compared with binary and multi-class studies that used the original or oversampled datasets; the comparison is summarized in Table 7. The first study [29] includes SMOTE in a proposed two-step approach for network intrusion detection and reached 85.78% on 10-category multi-class classification. The second study [25], discussed in detail in the related work section, achieved at most a 90.41% binary F1-score and a 39.78% multi-class F1-score with its DNN method. The third study [30] uses an image-conversion approach with DL-based classification on UNSW-NB15 and other datasets, applying oversampling to increase the number of instances. Its methods achieved highest macro accuracies of 92.87% for binary and 72.31% for multi-class classification.
The fourth study [26] performed binary classification only and achieved 90.76% accuracy by addressing class imbalance with a modified cross-entropy function. The comparison covers both binary and multi-class classification to strengthen the validation of the proposed framework. Among the compared studies, some proposed class imbalance solutions and some applied oversampling techniques, but none reached the proposed framework's binary and multi-class results in accuracy, precision, or F1-score. This shows that the proposed study outperforms these recent state-of-the-art approaches.

Conclusions
The proposed study uses the UNSW-NB15 dataset to solve binary and multi-class classification problems. The dataset's minority class imbalance has been addressed by many recent studies using different methods, with significant results. The proposed framework not only addresses the minority class imbalance by generating new instances from UNSW-NB15 with a GAN-based model, but also achieves better classification scores than previous studies. After addressing class imbalance, the study built three datasets and performed three experiments on them. With chi-square feature selection and six ML classifiers, the combined original + GAN dataset outperformed the original dataset. The comparative analysis showed that the best results of the proposed framework exceed those of previous studies. The framework suggests using a GAN with appropriate hyperparameter optimization to solve class imbalance problems, and it could be applied in other domains as well. In future work, more than one feature-selection method could be applied to check whether better performance can be achieved.