Wireless Communication Channel Scenarios: Machine-Learning-Based Identification and Performance Enhancement

Abstract: Wireless communication channel scenario classification is crucial for modern wireless technologies. Reducing the time consumed by the data preprocessing phase for such identification is also essential, especially for multiple-scenario transitions in 6G. Machine learning (ML) has been used for scenario identification tasks. In this paper, the least absolute shrinkage and selection operator (LASSO) is used instead of ElasticNet in order to reduce the computational time of data preprocessing for ML. Moreover, the computational time and performance of different ML models are evaluated based on a regularization technique. The obtained results reveal that LASSO achieves the same feature selection performance as ElasticNet while consuming less computational time: the run time of LASSO is 0.33 s, while the corresponding ElasticNet value is 0.67 s. The per-class identification performance of K-Nearest Neighbor (KNN), Support Vector Machine (SVM), k-Means, and Gaussian Mixture Model (GMM) is evaluated using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) scores. The KNN algorithm has the highest class-average AUC score at 0.998, compared to SVM, k-Means, and GMM with values of 0.994, 0.983, and 0.989, respectively. The GMM is the fastest algorithm, with the lowest classification time at 0.087 s, compared to SVM and k-Means with values of 0.155 s and 0.26 s, respectively.


Introduction
It is widely known that artificial intelligence (AI) has become an essential addition to industries related to wireless communications. The data obtained from the surrounding environment are huge in volume, as the number of smart devices is increasing, especially in the industrial, scientific, and medical domains [1,2]. Connecting rural and distant locations has become crucial for future networks due to the rise in global communication needs and the growth of the Internet of Things [3]. Smartphones, laptops, tablets, and data-driven sensors are all typical data transceiver devices, and their data can be processed by AI. The main functionality of AI is to train algorithms to make decisions and take actions. ML is a subset of AI in which large amounts of data are used to train an algorithm so that it learns from the processed information [4]. Once the ML algorithm is trained on the training data, it can make predictions or decisions on new data and execute tasks using inferential statistics and arithmetic calculations. This can facilitate ML modeling for wireless communication system availability, mobility accessibility, cross-communication management, and the optimization of automated networks based on 6G data. The aim is to ensure that the key performance indicators meet the quality-of-service (QoS) requirements. It is anticipated that the 6G network will offer diverse services and seamless network coverage. The integrated satellite-terrestrial network design combines the benefits of both satellite and terrestrial networks and promises worldwide broadband connectivity for all sorts of users [5][6][7][8]; it has attracted much interest from both academia and business. A wireless communication system mainly consists of a physical layer, a middle layer, and an end-user layer, as shown in Figure 1.
Figure 1. ML concern in physical, middle, and end-user layers.
Each layer requires different AI approaches to enhance the QoS, system security, privacy, latency, power allocation and control, and channel capacity [9]. The main concern of ML in the physical layer is channel encoding, decoding, and estimation. One of these ML applications is scenario classification, which can be regarded as channel estimation. A scenario for a wireless channel is defined as the specific channel environment of data transmission [9]. For example, rural, suburban, urban, indoor hotspot, and satellite environments are all typical scenarios. The rural macro-cell (RMa) and the urban macro-cell (UMa) are two typical, distinct scenarios. The UMa is deployed in urban areas such as a city or a town, while the RMa applies to rural or countryside areas with smaller populations and less scattering.
In some cases, several scenarios occur in quick succession. Users of high-speed transportation, such as high-speed railways, experience this. As the user moves across a variety of possible scenarios, including deserts, mountains, stations, and other obstructions, substantial strain is placed on existing communication systems. Therefore, accurately identifying wireless channel scenarios is important for meeting user QoS needs. However, traditional statistical radio propagation models, such as the Okumura and Hata models, were introduced under the assumption that the propagation scenario is known in advance, such as an urban scenario [10]. This can lead to inaccurate classifications. In response, minimizing complexity and run time while precisely identifying the scenario is crucial for increasing communication system reliability. Moreover, power efficiency, beam management, maintenance, bandwidth allocation efficiency, network setup, operation, throughput, QoS prediction, and coverage performance can be tackled by AI-based solutions. Deep-learning methods are usually used in solving complex problems, but require more computational time. The extraction of elevation and azimuth angles from the channel impulse response (CIR) to distinguish non-line-of-sight (NLoS) and line-of-sight (LoS) scenarios in urban places was performed in [11]. Also, accurate classification performance was achieved using convolutional networks for fingerprint feature extraction and classification [12]. The authors of [13] demonstrated that supervised classification algorithms and unsupervised clustering algorithms can be effective strategies for scenario identification.
In a previous work [14], we introduced an enhanced feature-selection process based on the regularization concept to enhance the classification performance of [14] and reduce the computational complexity of the ML algorithms with high generalization ability. The authors of [15] initiated the problem formulation for the classification of typical terrestrial scenarios. However, the latency of transition and classification was not their concern. Therefore, to build upon their results, the time consumption of the preprocessing workflow and the ML classification time are crucial issues to address. Minimizing the preprocessing computations and the classification time of the algorithm is required for quick scenario identification when a transition between multiple scenarios occurs. The ROC curve is important as an evaluation tool, as it captures both the true positive rate (TPR) and the false positive rate (FPR). The main motivation for this work is to enhance recent work [14,15] in terms of the preprocessing time and performance of wireless communication scenario identification. The proposed method takes the latency of the classification scheme into account.
Hence, this paper contributes the following:
1. Reduction of the response time and latency of the preprocessing workflow for each regularization technique used in the previous model [14]. The previous model adopted ElasticNet without studying its time consumption. In this work, the performance and time efficiency results show that LASSO is more suitable than ElasticNet: it achieves the same feature-selection performance in much less time.
2. Calculation of the classification time of KNN, SVM, k-Means, and GMM. The training and testing runtimes are computed and compared for both supervised algorithms (KNN and SVM). The cluster formation and the "fit and predict" runtime are reported for the unsupervised k-Means and GMM.
3. Calculation and study of the ROC curves and AUC scores for each class in each model. Both the ROC curves and the AUC scores are calculated as one versus all, where the evaluation is binary such that every class is distinguished from the others (e.g., the RMa LoS represented as '1' versus the other classes represented as '0').
The rest of this paper is organized as follows. Section 2 provides information about the dataset used in this research, the model specification, and preprocessing and processing procedures. Section 3 shows the results and discussion of the preprocessing and classification phases, including time and ROC curves. Section 4 is devoted to the main conclusions.

Model Planning Procedures
In this section, the dataset adopted in this work is discussed. The preprocessed features describe each wireless communication scenario: delay spread ($D_\sigma$), path loss ($P_L$), K-factor ($K_F$), elevation spread angle of arrival ($\sigma_{EoA}$), elevation spread angle of departure ($\sigma_{EoD}$), azimuth spread angle of arrival ($\sigma_{AoA}$), and azimuth spread angle of departure ($\sigma_{AoD}$). In addition, this section introduces the preprocessing procedure and the methods of evaluation.

Dataset Origination and Parameters
The dataset is taken from [14], where each scenario parameter is validated against the 3GPP standard. These parameters describe the large-scale and small-scale fading, namely $D_\sigma$, $P_L$, $K_F$, $\sigma_{EoA}$, $\sigma_{EoD}$, $\sigma_{AoA}$, and $\sigma_{AoD}$. The angular information is extracted from the CIR using the space-alternating generalized expectation-maximization algorithm in a MIMO model using 31 antenna elements [14,15]. The CIR snapshots are generated from the reception of the signal sent from the mobile terminal (MT) to the base station (BS). These CIR snapshots are assumed to be processed at the BS end, as shown in Figure 2. The NLoS and LoS cases are obtained for both RMa and UMa, so the number of classes is four. As mentioned before, the UMa scenario refers to urban places such as a town or a city, while the RMa is defined for rural areas with small populations and reduced scattering.
$P_L$ is defined as the decrease of power with distance. It relates the distance $d$ to the path loss at a reference distance, denoted $P_L(d_0)$, in a given scenario and is expressed in dB as [16]
\[ P_{dB}(d) = P_L(d_0) + 10\,\gamma \log_{10}\!\left(\frac{d}{d_0}\right) + S_\sigma, \]
where $\gamma$ is the $P_L$ exponent, $d_0$ is the reference distance, and $S_\sigma$ [dB] is a normally distributed shadow-fading term.
$K_F$ is an essential small-scale fading (SSF) parameter. It is the ratio between the power of the dominant LoS component and the power of the NLoS multipath components. For every CIR snapshot, $K_F$ in dB ($K_{dB}$) can be denoted as [17]
\[ K_{dB} = 10 \log_{10} \left\{ \frac{|h(\tau_{m_0})|^2}{\sum_{m \neq m_0} |h(\tau_m)|^2} \right\}, \]
where $\tau_m$ is the delay of the $m$th component, the highest amplitude occurs at index $m_0$, $m = 1, 2, 3, \ldots, M$, and $h(t)$ is the CIR in the time domain. The value of $K_{dB}$ is always greater in LoS scenarios.
$D_\sigma$ is also an important SSF parameter that indicates the channel dispersion of a CIR snapshot in terms of time delay. $D_\sigma$ can be represented as [18]
\[ D_\sigma = \sqrt{\frac{\sum_{m=1}^{M} (\tau_m - \bar{\tau})^2 \, |h(\tau_m)|^2}{\sum_{m=1}^{M} |h(\tau_m)|^2}}, \]
where $M$ denotes the total number of components and $\bar{\tau}$ is the mean excess delay, given by
\[ \bar{\tau} = \frac{\sum_{m=1}^{M} \tau_m \, |h(\tau_m)|^2}{\sum_{m=1}^{M} |h(\tau_m)|^2}. \]
The channel capacity depends on $D_\sigma$, and a scenario with rich scattering has a larger $D_\sigma$. Therefore, an NLoS scenario has a higher root mean square (RMS) delay spread (DS).
The angular spread ($\sigma_\theta$) denotes the channel dispersion of a CIR snapshot in terms of angular information and is obtained as in [19]. The angle $\theta$ stands for the azimuth angle of departure (AoD), azimuth angle of arrival (AoA), elevation angle of departure (EoD), and elevation angle of arrival (EoA). The NLoS scenario has more clusters than the LoS scenario. As a result, the value of $\sigma_\theta$ is higher in NLoS scenarios than in LoS scenarios.
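As a rough illustration of how two of these features could be computed from a single CIR snapshot, the sketch below assumes the K-factor is the strongest-tap power over the sum of the remaining tap powers (in dB) and the RMS delay spread is the power-weighted standard deviation of the tap delays. The function names and the three-tap snapshot are hypothetical and are not taken from the dataset in [14].

```python
import numpy as np

def k_factor_db(h):
    """K-factor in dB: strongest tap power over the sum of all other tap powers."""
    p = np.abs(np.asarray(h)) ** 2
    m0 = int(np.argmax(p))              # index of the dominant (assumed LoS) tap
    rest = p.sum() - p[m0]              # power of the remaining multipath taps
    return 10.0 * np.log10(p[m0] / rest)

def rms_delay_spread(h, tau):
    """Power-weighted RMS spread of the tap delays, in the same units as tau."""
    p = np.abs(np.asarray(h)) ** 2
    tau = np.asarray(tau, dtype=float)
    mean_tau = np.sum(tau * p) / np.sum(p)          # mean excess delay
    return np.sqrt(np.sum((tau - mean_tau) ** 2 * p) / np.sum(p))

# Toy three-tap snapshot (made-up values, for illustration only)
tau = np.array([0.0, 1.0, 2.0])   # tap delays, e.g. in microseconds
h = np.array([1.0, 0.5, 0.5])     # tap amplitudes

print("K-factor [dB]:", k_factor_db(h))
print("RMS delay spread:", rms_delay_spread(h, tau))
```

A snapshot with a stronger dominant tap yields a larger K-factor and a smaller delay spread, which matches the LoS versus NLoS behaviour described in the text.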

Preprocessing and Processing Procedures
Figure 3 shows the model planning procedures that represent the flow of the dataset mentioned in the previous section, including the data preprocessing criteria, processing phase, and model evaluation. The preprocessing procedures are sequential, including normalization, regularization, and dimension reduction. The preliminary dataset enters the preprocessing phase with a shape of 2000 rows and 7 columns, excluding the output label. The dataset can be represented as $A$, where $A = \{D_\sigma, P_L, K_F, \sigma_{EoA}, \sigma_{EoD}, \sigma_{AoA}, \sigma_{AoD}, L\}$, and $L$ represents the outcome label that specifies whether the scenario is UMa LoS, UMa NLoS, RMa LoS, or RMa NLoS. Each row, $A_i$, represents a single data point that is normalized using the Z-score method. ML models may perform poorly due to outliers in un-normalized data [16]. The Z-score is applied to each data point $A_i^j$ and represents the distance of the data point from the mean divided by the standard deviation [20]:
\[ Z_i^j = \frac{A_i^j - \bar{A}^j}{\sigma_{A^j}}, \]
where $\bar{A}^j$ and $\sigma_{A^j}$ are the average and standard deviation of feature $j$. Then, correlation matrices are used to validate the inter-parameter correlations before and after normalization. The normalized data points are then regularized to drop unwanted features and to enhance the classification performance [15]. LASSO is an L1 regularization type, while ElasticNet combines both L1 and L2 regularization. L1 stands for Least Absolute Error, and L2 stands for Least Square Errors [21]. In the LASSO regression estimate $\hat{B}_{lasso}$ and the ElasticNet regression estimate $\hat{B}_{elastic}$, $\alpha_1$ and $\alpha_2$ set the ratio of the penalties ($\alpha_1 + \alpha_2 = 1$), and $\lambda$ is the weight of shrinkage, expressing the regularization penalty with $0 \le \lambda \le 1$. $\beta_0$ is a constant coefficient, $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_N)$ is the coefficient vector, and $t$ denotes the degree of regularization. The regularization process is evaluated using its runtime and the resulting regularization coefficients. Here, the runtime was tested on an Intel® Core™ i3-2365M CPU (1.40 GHz) using the Python timing library. Once the regularization coefficient of a feature becomes 0, the feature is considered unwanted noisy data.
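The normalization and regularization steps can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic 2000 x 7 matrix, the target `y`, and the `alpha` and `l1_ratio` values are all assumptions. In scikit-learn, `Lasso` minimizes the squared error scaled by 1/(2n) plus `alpha` times the L1 norm of the coefficients, and `ElasticNet` splits its penalty between the L1 and L2 norms through `l1_ratio`. Runtimes are averaged over 10 runs with `time.perf_counter`.

```python
import time
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 7))            # stand-in for the 2000 x 7 feature matrix
# Hypothetical target: only columns 0, 2 and 5 are informative
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 1.5 * X[:, 5] + 0.1 * rng.normal(size=2000)

X_z = StandardScaler().fit_transform(X)   # Z-score: (x - mean) / std per feature

def mean_fit_time(model, repeats=10):
    """Average wall-clock time of model.fit over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.fit(X_z, y)
        times.append(time.perf_counter() - start)
    return sum(times) / repeats

lasso = Lasso(alpha=0.1)                    # L1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of L1 and L2 penalties
t_lasso = mean_fit_time(lasso)
t_enet = mean_fit_time(enet)

# Features whose coefficient is exactly 0 are dropped; the rest are kept
kept_lasso = np.flatnonzero(lasso.coef_)
kept_enet = np.flatnonzero(enet.coef_)
print("LASSO kept:", kept_lasso, "mean fit time [s]:", t_lasso)
print("ElasticNet kept:", kept_enet, "mean fit time [s]:", t_enet)
```

The measured times depend on the machine and on the data, so this sketch only shows how such a comparison could be set up.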
The last preprocessing step is the dimension projection, which reduces the number of predictors for the ML models and therefore the computational complexity [22]. Here, kernel principal component analysis (k-PCA) is adopted. The kernel in this paper is a radial basis function (RBF) evaluated between two different points $A_a$ and $A_b$, with $\rho$ as a hyper-parameter threshold [23]. Then, the output components are visualized through their probability density function (PDF). After preprocessing, the dimension reduction output is the set of principal components that are used as ML input predictors. The data are split into training data and validation data. The problem is formulated as a classification among four different scenarios {RMa LoS, RMa NLoS, UMa LoS, UMa NLoS}. Two supervised learning algorithms are used, KNN and SVM, which require labels for training. Moreover, two unsupervised clustering algorithms are used: k-Means and GMM. The algorithms are then evaluated using ROC curves and their runtime.
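A minimal sketch of the dimension-reduction and splitting step is given below, assuming scikit-learn's `KernelPCA`, whose RBF kernel is exp(-gamma * ||a - b||^2). How `gamma` relates to the hyper-parameter used in the paper is not stated, so the value here is an assumption, as are the random stand-in data and the 75/25 split.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X_sel = rng.normal(size=(400, 4))         # stand-in for the four retained features
labels = rng.integers(0, 4, size=400)     # stand-in for the four scenario labels

# Project onto two kernel principal components (PC1, PC2) with an RBF kernel
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.25)
pcs = kpca.fit_transform(X_sel)

# Split the projected data into training and validation parts
pc_train, pc_val, y_train, y_val = train_test_split(
    pcs, labels, test_size=0.25, random_state=0, stratify=labels
)
print(pcs.shape, pc_train.shape, pc_val.shape)
```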
The KNN classifies a new, unknown data point according to the majority class of its nearest neighboring points [24]. The SVM creates support vectors and then determines the optimal hyperplanes that minimize the error and maximize the margins between the groups of data [25]. The k-Means and GMM are both unsupervised learning methods that create clusters based on the inferential statistics of the data [14].
Each algorithm is then evaluated using the computational time and the ROC curves. The ROC curve shows the relation between the true positive rate and the false positive rate. Figure 4 shows an illustration of an ROC curve [26]. The perfect classification occurs when the area under the curve is maximized. The worst classification occurs when the curve becomes the straight diagonal line (an AUC of 0.5), which corresponds to random guessing.
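The one-versus-all ROC evaluation can be sketched as follows for a single supervised model. The example uses scikit-learn with synthetic four-class data from `make_blobs`; the data, the KNN settings, and the 75/25 split are assumptions for illustration and do not reproduce the scores reported in this paper.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic four-class data standing in for the four scenarios
X, y = make_blobs(n_samples=800, centers=4, n_features=2,
                  cluster_std=1.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
proba = knn.predict_proba(X_te)           # one probability column per class

auc_per_class = {}
for k, cls in enumerate(knn.classes_):
    y_bin = (y_te == cls).astype(int)     # this class is 1, all other classes are 0
    fpr, tpr, _ = roc_curve(y_bin, proba[:, k])
    auc_per_class[int(cls)] = auc(fpr, tpr)

class_average_auc = sum(auc_per_class.values()) / len(auc_per_class)
print(auc_per_class, class_average_auc)
```

Averaging the per-class values gives a class-average AUC of the kind quoted for each algorithm.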

Results and Discussion
In this section, the results of each process discussed in the preprocessing and processing procedures are presented. The correlation matrices of each scenario are discussed before and after Z-score normalization. The performance and runtimes of the regularization process are compared for LASSO and ElasticNet. The ROC curves of each scenario are discussed for each algorithm, and the runtime of each algorithm is reported. Figure 5 shows the dataset inter-parameter correlations before and after Z-score normalization for the different channel scenarios. Highly positive correlations tend toward bright orange, and highly negative correlations tend toward dark blue. The parameters $D_\sigma$, $P_L$, $K_F$, $\sigma_{EoD}$, $\sigma_{EoA}$, $\sigma_{AoD}$, and $\sigma_{AoA}$ are labeled DS, PL, KF, esD, esA, asD, and asA, respectively.

Z-Score Normalization Impact on Inter-Parameter Correlations
Figure 5a,c,e,g show the correlation matrices for each scenario before normalizing the features, while Figure 5b,d,f,h show the correlations after normalization. The differences in correlation coefficients between normalized and un-normalized parameters are negligible, as the normalization process only standardizes the data without significantly affecting the inter-parameter correlations. The strong, medium, and weak correlations are all retained. This demonstrates the robustness of the Z-score and its ability to normalize the data while preserving the correlation structure.
Table 1 shows the feature elimination decisions for LASSO and ElasticNet, where elimination is based on the value of the regularization coefficient: a coefficient of 0 means that the corresponding feature should be dropped, so each feature is either kept or dropped. The ElasticNet and LASSO decisions are the same. Both dropped esD, asD, and DS, whose regularization coefficients are 0, and kept esA, KF, PL, and asA, whose coefficients are 0.72, 0.37, 0.25, and 0.12, respectively. These results indicate that both LASSO and ElasticNet reduced the data dimensionality from 7 to 4. Table 2 shows the time evaluation of both LASSO and ElasticNet. Each was executed 10 times, and the average time is reported. Clearly, LASSO is more time-efficient and more suitable for this type of dataset, as it applies only L1 regularization, whereas ElasticNet takes more time because it combines both L1 and L2 types during regularization.

Dimension Reduction and Data Visualization
As mentioned before, LASSO reduced the data dimension from 7 to 4 in less time than ElasticNet. The k-PCA, with an RBF kernel, reduces the data dimension further. Figure 6 shows the PDF of the final preprocessed data, i.e., the principal components extracted by k-PCA. Figure 6a shows the PDF of the first principal component, PC1, for all scenarios. In PC1, the classes (scenarios) overlap, which would cause misclassification; the overlap can be seen in the cases of RMa NLoS and UMa NLoS. Figure 6b shows the PDF of the second principal component, PC2, which provides additional information for distinguishing the four classes. In other words, PC2 displays a new dimension of the data: for example, the UMa NLoS can be easily distinguished from RMa LoS. Both PC1 and PC2 are used as ML predictors.

ML Evaluation
The ROC curves of each algorithm show a binary classification. Each scenario is compared with the other three scenarios so that it is considered as a binary value of 1 while the other scenarios are represented as 0. Figure 7 shows the ROC curves output for each scenario over the others for all algorithms.

The AUC is the indicator of the binary classification performance. In the case of RMa LoS, all algorithms achieved an AUC score above 0.997. The KNN achieved the highest AUC score in RMa NLoS classification with 0.994. In the case of UMa LoS, the KNN achieved a score of 0.994, outperforming the SVM by 0.02, while the GMM outperformed the k-Means by 0.02. The lowest AUC score in the case of UMa NLoS was 0.96, which corresponds to the k-Means, while the GMM surpassed the AUC score of k-Means by 0.266. Supervised learning appears to have a better AUC score than unsupervised learning because the TPR and FPR were calculated on the testing set only, whereas the unsupervised learning models were used to cluster the combined dataset (training and validation sets). The evaluation was then performed by calculating the minimum mean square error of each cluster and comparing it with the actual data output.
The time evaluation for each algorithm is displayed in Table 3. The training and testing times were combined for the supervised learning algorithms KNN and SVM, and the clustering time is shown for the unsupervised k-Means and GMM. Both supervised algorithms require training using labels. The SVM performed the training and testing in a total time of 0.155 s, which is faster than the KNN. This indicates that the linear SVM could achieve 99% accuracy in less time, as the dataset distribution was suitable for SVM. The GMM performed the clustering task in 0.087 s, which was the lowest among all algorithms. The k-Means clustering time was the highest at 0.26 s. This indicates that the GMM is more flexible and reliable in the scenario clustering task. It can be generalized to scenarios other than RMa LoS, RMa NLoS, UMa LoS, and UMa NLoS, as it does not require labels during training, which is an advantage over the supervised algorithms.
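A timing comparison of this kind could be set up as in the sketch below, which combines fit and predict time for the supervised models and measures the "fit and predict" call for the clustering models. It uses scikit-learn on synthetic data; the dataset, the hyper-parameters, and the helper `timed` are assumptions, and the resulting numbers will differ from those in Table 3.

```python
import time
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=2000, centers=4, n_features=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

runtimes = {}

# Supervised models: training time plus testing time
for name, clf in [("KNN", KNeighborsClassifier()), ("SVM", SVC(kernel="linear"))]:
    _, t_fit = timed(lambda: clf.fit(X_tr, y_tr))
    _, t_pred = timed(lambda: clf.predict(X_te))
    runtimes[name] = t_fit + t_pred

# Unsupervised models: time of a single fit-and-predict call on all data
for name, model in [("k-Means", KMeans(n_clusters=4, n_init=10, random_state=0)),
                    ("GMM", GaussianMixture(n_components=4, random_state=0))]:
    _, runtimes[name] = timed(lambda: model.fit_predict(X))

for name, seconds in runtimes.items():
    print(f"{name}: {seconds:.4f} s")
```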

Conclusions
In this work, the time consumption of the data preprocessing for ML was enhanced. Regularization using LASSO was integrated, instead of ElasticNet, to reduce the preprocessing procedure and the computational time. Moreover, the computational time and performance of different ML models based on regularization technique were evaluated. The ML performance was evaluated using the ROC curves and AUC scores for each specific class. The classes were RMa LoS, RMa NLoS, UMa LoS, and UMa NLoS. The ML algorithms used are KNN, SVM, k-Means, and GMM.
The obtained results show that adopting LASSO is better than ElasticNet, as it removes the same unwanted features in a shorter time. These unwanted features are esD, asD, and DS, as they obtained regularization coefficients of 0. The LASSO runtime was 0.33 s, while the ElasticNet runtime was 0.67 s. Then, the ML algorithms were run. The time results show that the KNN training and testing times were 0.011 s and 0.17 s, respectively. The training time of the linear SVM was 0.14 s and its testing time was 0.015 s. Therefore, the SVM was the fastest supervised learning algorithm used, as its runtime was shorter than that of KNN, and it achieved a 0.994 AUC score. The k-Means showed the slowest classification time at 0.26 s and the worst performance with an overall AUC score of 0.983. The runtime of the fit-and-predict function of GMM was 0.087 s, and it achieved an overall AUC score of 0.989. The GMM was therefore the most time-efficient algorithm among both the supervised and unsupervised learning algorithms.
For future work, the optimization of GMM for the scenario classification task can be considered. Long Short-Term Memory (LSTM) networks can be investigated for wireless channel classification and prediction. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional recurrent neural networks. Moreover, the Doppler effect is a worthwhile topic for future work.