Machine Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning for Optimal Adaptive Learning

Over the past couple of decades, many telecommunication industries have passed through different facets of the digital revolution by integrating artificial intelligence (AI) techniques into the way they run and define their processes. Relevant data acquisition, analysis, harnessing, and mining are now considered vital drivers of business growth in these industries. Machine learning, a subset of AI, can assist particularly in learning patterns in big data, intelligently extracting information for extrapolation, and automating decision-making in predictive learning. First, this paper provides a detailed performance benchmarking of the adaptive learning capacities of different key machine-learning-based regression models for extrapolative analysis of throughput data acquired at different user communication distances from the gNodeB transmitter in 5G new radio networks. Second, a random forest (RF)-based machine learning model combined with a least-squares boosting algorithm and a Bayesian hyperparameter tuning method is proposed for further extrapolative analysis of the acquired throughput data. The proposed model is referred to as the RF-LS-BPT method. The least-squares boosting algorithm combines the RF weak learners into a single strong prediction model, while the Bayesian hyperparameter tuning automatically determines the best RF hyperparameter values, enabling the proposed RF-LS-BPT model to obtain the desired optimal prediction performance. The proposed RF-LS-BPT method showed superior prediction accuracy over the ordinary random forest model and six other machine-learning-based regression models on the acquired throughput data.
The coefficient of determination (Rsq) and mean absolute error (MAE) values obtained for throughput prediction at different user locations using the proposed RF-LS-BPT method range from 0.9800 to 0.9999 and from 0.42 to 4.24, respectively, whereas the standard RF models attained Rsq values of 0.9644 to 0.9944 and MAE values of 5.47 to 12.56. The improved throughput prediction accuracy of the proposed RF-LS-BPT method demonstrates the significance of hyperparameter tuning/optimization in developing precise and reliable machine-learning-based regression models. The proposed model could find valuable applications in throughput estimation and modeling in 5G and beyond-5G wireless communication systems.


Introduction
Effective data processing and analysis have become a challenging task due to the upsurge in massive data collection from various wireless communication devices and cellular networks.
Related studies on adaptive learning, their limitations, and how the current work addresses them are summarized as follows:
• Battiti [26]. Focus: accelerated backpropagation learning, considering two optimization techniques. Limitation: the performance of the models still needs to be assessed for networks with a large number of weights. Current work: presents a detailed statistical analysis of the acquired throughput data through performance status quality reporting at the different user equipment terminal locations.
• Castillo [27], 2008. Focus: adaptive learning algorithms for Bayesian network classifiers, aimed at handling the cost-performance trade-off and dealing with concept drift. Limitation: no adequate information is given on how to resolve the bottleneck challenges in a prequential learning framework as the training data grow over time. Current work: examines the performance of the proposed learning-based models for 5G wireless networks using large-scale throughput data acquired from several network operators in the United States.
• Khan, Tembine, and Vasilakos [28], 2011. Focus: game dynamics and the cost of learning in heterogeneous 4G networks, with numerical examples and OPNET simulations of network selection in WLAN and LTE. Limitation: experimental validation of the numerical results is missing. Current work: benchmarks the adaptive learning capabilities of different machine-learning-based regression models on experimental 5G throughput data.
• Pandey and Janhunen [29], 2016. Focus: a reinforcement-learning-based method for automating parts of the management of mobile networks. Limitation: learning with partial observability and cooperative learning that considers the neighboring base stations are not covered. Current work: addresses learning with partial observability and cooperative learning by integrating the neighboring base stations, based on the 5G data analyzed.
• Li, Cao, and Hao [30], 2018. Focus: an adaptive-learning-based network selection approach for dynamic 5G environments, which enables users to adaptively adjust their selections as the environment changes gradually or abruptly. Limitation: although the approach enables a population of terminal users to adapt effectively to the network dynamics, experimental validation is missing. Current work: proposes an RF-LS-BPT regression model for improved predictive modeling and learning based on experimental 5G datasets.
• Narayanan et al. [31], 2020. Focus: commercial 5G performance on smartphones using the 5G networks of three carriers in three US cities, including the feasibility of using location and other environmental data to predict network performance. Limitation: practical and sound measurement methodologies were developed for 5G networks on commercial off-the-shelf (COTS) smartphones, but no learning-based models were provided for the 5G performance measurements. Current work: proposes learning-based models for improved predictive modeling and learning based on the 5G throughput data.
• Moodi, Ghazvini, and Moodi [32], 2021. Focus: a hybrid intelligent approach to detecting Android botnets using a smart self-adaptive-learning-based PSO-SVM; the authors observed that the selection of the important features of a dataset depends on the approach and parameters applied to that dataset. Limitation: practical deployment of the hybrid intelligent approach was not considered. Current work: proposes an optimized RF-LS-BPT regression model for accurate throughput data modeling and learning, evaluated with different performance indicators on experimental datasets.
• [33]. Focus: a machine-learning-based algorithm for approximating a complex 5G path loss prediction model; specifically, decision tree ensembles (bagging) were used to build a generic model for estimating path loss. Limitation: time optimization of the feature (input) calculation process was not considered, experimental validation of the proposed model is missing, and practical testing of the model for accurate wireless network planning is still required. Current work: accounts for optimization of the feature (input) variables and experimentally validates the proposed model using practical 5G throughput data.
In view of the preceding literature, and to the best of our knowledge, no existing work reports a machine-learning-based boosted regression ensemble combined with hyperparameter tuning for optimal adaptive learning. To this end, this paper proposes a random forest machine-learning-based model combined with a least-squares boosting algorithm and Bayesian hyperparameter tuning to improve its predictive performance. The proposed regression model is termed the RF-LS-BPT model. The RF-LS-BPT model was applied in detail to real-time throughput data acquired at different user equipment terminal locations in 5G mobile broadband cellular networks. It offers a new hybridized predictive modeling method to help network operators and engineers regularly conduct improved extrapolative analysis of different cellular network data for planning and management purposes.
The main contributions of this paper are as follows:
• We first give a detailed statistical analysis of the acquired throughput data through performance status reporting at the different user equipment terminal locations with respect to the tested communication distances from the transmitter.
• We provide performance benchmarking of the adaptive learning capacities of different key machine-learning-based regression models against the chosen regression model, the random forest.
• We propose an RF-LS-BPT regression model for improved dataset predictive modeling and learning.
• We apply the proposed RF-LS-BPT regression model to detailed, accurate throughput data modeling and learning, evaluated with different performance indicators.
The remainder of this paper is structured into four sections as follows. Section 2 contains the related work and background information. Section 3 presents the machine-learning-based boosted regression method, the 5G throughput measurement campaign, and the proposed RF-LS-BPT algorithm and its implementation process. Section 4 presents the results and discussion. Finally, conclusions are drawn in Section 5.

Theoretical Background
This section first describes the general concept of random forest and highlights the exploratory data analysis (EDA) method applied. The mathematical description of the RF regression model and least-squares boosting (LS-Boost) is then presented.

Random Forest (RF)
The RF is an ensemble machine learning method based on decision trees. It was proposed by Breiman [34] to address both data classification and regression problems. It operates by growing and aggregating a large number of independently randomized decision trees to solve complex real-world problems. As each tree grows, the data are recursively partitioned at its nodes according to a splitting rule. The prediction accuracy of regression-learning-based RF models is primarily influenced by the input data, the training algorithm, and the regulating hyperparameters [35]. Here, 'hyper' indicates top-level parameters that can be adjusted to regulate the machine learning process and produce better results. Important RF hyperparameters include the number of decision trees, the tree type, and the feature set size (number of features), all of which affect performance. Hyperparameter tuning (or optimization) is a systematic method of finding the best feasible hyperparameter values for a machine learning model so that it attains the desired modeling outcome. Popular hyperparameter tuning algorithms in the literature include random search, grid search, and Bayesian optimization search [36,37].
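The difference between grid search and random search can be sketched in a few lines of Python. The objective `validation_error` below is a hypothetical stand-in for a cross-validated MSE (its minimum is placed at a learning rate of 0.1 and 200 trees purely for illustration); in practice, each evaluation would train and validate a model. Bayesian optimization, which additionally fits a probabilistic surrogate of the objective to choose the next candidate, is not shown.

```python
import itertools
import random


def validation_error(learning_rate, n_trees):
    """Hypothetical stand-in for a cross-validated MSE (minimum at 0.1, 200)."""
    return 100 * (learning_rate - 0.1) ** 2 + ((n_trees - 200) / 100) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in the Cartesian product of the grid values."""
    best_params, best_err = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        err = objective(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def random_search(objective, space, n_iter, seed=0):
    """Evaluate n_iter configurations drawn uniformly from the given ranges."""
    rng = random.Random(seed)
    best_params, best_err = None, float("inf")
    for _ in range(n_iter):
        params = {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "n_trees": rng.randint(*space["n_trees"]),  # inclusive bounds
        }
        err = objective(**params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


grid = {"learning_rate": [0.01, 0.1, 0.5, 1.0], "n_trees": [50, 100, 200, 400]}
space = {"learning_rate": (0.001, 1.0), "n_trees": (10, 500)}

# Both strategies get the same budget of 16 objective evaluations.
best_grid = grid_search(validation_error, grid)
best_random = random_search(validation_error, space, n_iter=16)
print(best_grid)
print(best_random)
```

Grid search can only return one of the listed values, so its result is as good as the grid; random search covers continuous ranges but gives no guarantee of landing near the optimum within a small budget.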
Generally, many factors affect the predictive modeling capacity of machine-learning-based models and methods, especially surrogate types such as the RF, SVM, DT, GPR, and NN. These include the learning rate, tree number, training algorithm, and hyperparameter tuning algorithm. In [7], the authors concentrated on how to explore key RF modeling parameters such as the tree number (size) and related features to effectively mine different datasets. In [8], the researchers' interest was in how to optimally exploit the RF feature set size to conduct a robust regression analysis of large datasets.
In this study, an integrated exploratory approach was taken to examine how some of the aforementioned factors affect RF predictive modeling performance. The approach integrates a random forest (RF)-based machine learning model with a least-squares boosting algorithm and a Bayesian hyperparameter tuning method for real-time extrapolative data analysis.

Exploratory Data Analysis Procedure
The exploratory data analysis (EDA) method [38] was utilized in this study. EDA is a systematic method of investigating and analyzing datasets to discover patterns and ensure that valid results are produced according to the desired goals. A regression-based machine learning model is expected to learn a dataset adaptively, identifying the relationships between the input values and the targeted output response during training. Effective predictive data processing is therefore a critical step in discovering patterns in data.
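Typical EDA checks (missing values, outliers, and correlated features) can be sketched with the Python standard library alone. The helper names, the interquartile-range (IQR) outlier rule, and the sample values below are illustrative assumptions; they are not the specific procedures or data used by the authors.

```python
import math
import statistics


def count_missing(values):
    """Count None and NaN entries."""
    return sum(1 for v in values if v is None or (isinstance(v, float) and math.isnan(v)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical throughput samples in Mbps (not taken from the measured dataset).
throughput = [80.0, 95.0, 1500.0, 1620.0, None, 1710.0, 1680.0, 5000.0, 1650.0]
clean = [v for v in throughput if v is not None and not math.isnan(v)]

print("missing:", count_missing(throughput))
print("outliers:", iqr_outliers(clean))
print("corr:", pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

An IQR rule is only one possible outlier criterion; low values at the start of a download, for example, may reflect genuine network behavior rather than measurement errors, so flagged points should be inspected before removal.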

The RF Regression Model and Least-Squares Boosting (LS-Boost)
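In broad mathematical terms, an RF is a predictor built from an ensemble of randomized regression trees $\{Y(\mathbf{x};\Theta_a^t,R_n)\}_{1\le a\le A}$, where $R_n$ is the training sample. The sequence $\{\Theta_a^t\}_{1\le a\le A}$ contains the random variables $\Theta$ that govern the probabilistic mechanism by which each tree is built.
For a finite tree number, $A$, the RF estimate is the average of the individual tree estimates:
$$Y_A(\mathbf{x};\Theta_1^t,\ldots,\Theta_A^t,R_n)=\frac{1}{A}\sum_{a=1}^{A}Y(\mathbf{x};\Theta_a^t,R_n). \tag{1}$$
For an infinite tree number (i.e., $A$ sufficiently large), the RF estimate becomes
$$Y_\infty(\mathbf{x};R_n)=\mathbb{E}_\Theta\left[Y(\mathbf{x};\Theta,R_n)\right], \tag{2}$$
where $\mathbb{E}_\Theta$ denotes the expectation with respect to $\Theta$, conditional on $R_n$.
In its simplest form (ignoring resampling), the individual tree predictor averages the responses of the training points that fall in the same leaf as $\mathbf{x}$:
$$Y(\mathbf{x};\Theta,R_n)=\sum_{i=1}^{n}\frac{\mathbb{1}\{\mathbf{X}_i\in L(\mathbf{x};\Theta)\}}{N(\mathbf{x};\Theta)}\,Y_i, \tag{3}$$
where $L(\mathbf{x};\Theta)$ is the leaf containing $\mathbf{x}$, $N(\mathbf{x};\Theta)$ is the number of training points in that leaf, and the training sample $R_n=\{W_1,\ldots,W_n\}$ consists of independent pairs $W_i:=(\mathbf{X}_i,Y_i)\in\mathbb{R}^p\times\mathbb{R}$ of real-valued random variables. The main objective is to predict the target response $Y$ associated with the random variable $\mathbf{X}$ using a regression function $f$. Accordingly, the loss function, defined as the mean squared error (MSE), is
$$\mathrm{MSE}=\mathbb{E}\left[\left(Y-f(\mathbf{X})\right)^2\right]. \tag{4}$$
Owing to bias and variance issues, the fitted model and its predictions may suffer from underfitting or overfitting, leading to a high error between the targeted response and the estimated values. To address such drawbacks, the variance of $f(\mathbf{X})$ in Equation (4) needs to be controlled by employing the bagging (Bag) or least-squares boosting (LS-Boost) algorithm. This paper adopts the LS-Boost algorithm and uses bagging to benchmark the results. In the LS-Boost algorithm, hundreds or more weak learners (trees) are trained, and the error is iteratively updated so that together they form a strong learner [34,38]. At every iteration step, the ensemble fits a new learner to the current residuals; the resulting MSE is given by Equation (4).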
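To make Equation (1) and the bagging/LS-Boost distinction concrete, the following pure-Python sketch uses one-feature regression stumps (depth-1 trees) as weak learners. `bagging` averages stumps fitted to bootstrap samples with equal weights, as in Equation (1); because there is only one feature, no random feature subsampling takes place, so this is plain bagging rather than a full random forest. `ls_boost` fits each new stump to the current residuals and adds it with a shrinkage factor (learning rate). The function names, the toy step data, and all parameter values are illustrative assumptions, not the authors' MATLAB implementation.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # all x values identical: predict the mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def bagging(xs, ys, n_trees=50, seed=0):
    """Equal-weight average of stumps fitted to bootstrap samples (cf. Eq. (1))."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def ls_boost(xs, ys, learning_rate=0.1, n_trees=100):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    f0 = statistics.fmean(ys)
    preds = [f0] * len(xs)
    learners = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in learners)


def mse(model, xs, ys):
    return statistics.fmean([(y - model(x)) ** 2 for x, y in zip(xs, ys)])


# Toy, noiseless step data (hypothetical; not the throughput measurements).
xs = list(range(20))
ys = [0.0 if x < 10 else 10.0 for x in xs]
bagged = bagging(xs, ys)
boosted = ls_boost(xs, ys)
print("bagging MSE:", mse(bagged, xs, ys))
print("LS-Boost MSE:", mse(boosted, xs, ys))
```

On this noiseless toy problem, boosting drives the training error close to zero, whereas averaging bootstrap stumps smooths the prediction near the step. This illustrates the mechanics only and says nothing about generalization on real throughput data.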

The Proposed Machine-Learning-Based Boosted Regression Ensemble Combined with Hyperparameter Tuning
This section presents the procedure used to achieve the research aim. Specifically, the 5G throughput data collection method, the proposed RF-LS-BPT algorithm, and its implementation process are described.

5G Throughput Measurement Campaign
The current study used field measurements taken in diverse urban environments in the United States to test and validate the proposed learning-based models. The measurements were collected to assess commercial 5G performance on smartphones and were made publicly available online [31]. The 5G networks of three carriers in three US cities were examined. Specifically, [31] systematically analyzed the various handoff mechanisms used in 5G networks, examined their impact on network performance, and explored whether location and other dynamic environmental conditions can be used to predict network performance.
Additionally, application performance in terms of web browsing, HTTP download, and volumetric video streaming over 5G was critically examined. The experiments, which consumed over 15 TB of data, were carried out over the T-Mobile, Sprint, and Verizon 5G networks. Verizon offers operational mm-wave-based 5G services to subscribers in the investigated environments, where dense 5G base stations are deployed. T-Mobile also employs mm-wave, while Sprint uses a mid-band frequency at 2.5 GHz. In the field measurements, the authors captured about 6.8 million data points from the 5G coverage in downtown Minneapolis, USA [31].
Two types of commercial off-the-shelf (COTS) 5G-capable smartphones were used in the experiments: the Motorola Moto Z3 and the Samsung Galaxy S10 5G (SM-G977U), referred to as MZ3 and SGS10, respectively. The SGS10 has a built-in 5G radio, while the MZ3 uses an external 5G mod to access 5G networks. The mobile device used is the SGS10, which supports both 4G and 5G, allowing the two technologies to be compared on the same device. Four locations were considered for the experiments carried out on Verizon's network [31]. These locations are representative of open/crowded spaces, low/high buildings, indoor/outdoor environments, and more. The experiments used a Microsoft Azure server to achieve the highest possible 5G throughput; the server supports approximately 3 Gbps of throughput.

The Proposed RF-LS-BPT Process
The entire RF-LS-BPT process, illustrated by the flowchart in Figure 1 and implemented stepwise in MATLAB, is outlined as follows. The RF regression with least-squares boosting (LS-Boost) is given in Algorithm 1.
a. Load the throughput datasets into MATLAB.
b. Examine the datasets to obtain relevant insights.
c. Check for the presence of correlated features.
d. Check for missing values and outliers.
e. Preprocess the datasets to handle the identified missing values and outliers.
f. Transform the datasets into the RF-LS-BPT modeling format.
g. Split the datasets into two portions: 70% for training and 30% for testing.
h. Apply the default RF ensemble fitting tool in MATLAB for data training and testing.
i. Evaluate the default RF ensemble fit on the training and testing data.
j. Choose an appropriate RF aggregation technique; LS-Boost was chosen here.
k. Identify the most relevant RF hyperparameters.
l. Determine the optimal values of the RF hyperparameters using the optimization option in MATLAB ('OptimizeHyperparameters', 'auto'), which is based on Bayesian optimization.
m. Optimize the RF regression ensemble results using cross-validation.
n. Build the final RF-LS-BPT model by combining the LS-Boost algorithm with the tuned optimal RF hyperparameter values.
o. Apply the resulting RF-LS-BPT model to the entire throughput quality dataset.
p. Test the resulting RF-LS-BPT model using the 30% test portion of the data and new data.
q. Assess and report the predictive performance of the resulting RF-LS-BPT model.
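Steps g and m can be illustrated with a short, index-based sketch: a shuffled 70/30 hold-out split followed by k-fold cross-validation on the training portion, which is how a cross-validated MSE for hyperparameter selection is typically obtained. The helper names, the seed, and k = 5 are assumptions for illustration; the paper's MATLAB implementation is not shown and may partition the data differently.

```python
import random


def train_test_split(n_samples, test_fraction=0.3, seed=42):
    """Shuffle sample indices and hold out test_fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


train_idx, test_idx = train_test_split(100)
print(len(train_idx), len(test_idx))
for fold_train, fold_val in k_fold(train_idx, k=5):
    # A candidate hyperparameter setting would be fitted on fold_train and
    # scored (e.g., by MSE) on fold_val; the fold scores are then averaged.
    print(len(fold_train), len(fold_val))
```

Because a shuffled split mixes samples from different times, a chronological split may be preferable if leakage between neighboring throughput samples is a concern.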
Algorithm 1: RF regression with least-squares boosting (LS-Boost).

Input:
Training set; learning rate value, v, and tree number, A, obtained through Bayesian optimization (bayesopt); loss function, L(y, f(x_i)).
Output:
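The standard least-squares boosting recursion (Friedman's LS-Boost with a squared-error loss) can be written as follows, using the symbols of Algorithm 1: learning rate $v$, tree number $A$, loss $L(y, f(\mathbf{x}_i))$, and training pairs $(\mathbf{x}_i, y_i)$, $i = 1,\ldots,n$. This is the textbook formulation, offered as a reference point; the authors' exact steps may differ.

$$
\begin{aligned}
&L\big(y, f(\mathbf{x})\big) = \tfrac{1}{2}\big(y - f(\mathbf{x})\big)^2, \qquad f_0(\mathbf{x}) = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,\\
&\text{for } a = 1,\ldots,A:\\
&\quad \tilde{y}_i = -\left.\frac{\partial L\big(y_i, f(\mathbf{x}_i)\big)}{\partial f(\mathbf{x}_i)}\right|_{f = f_{a-1}} = y_i - f_{a-1}(\mathbf{x}_i), \qquad i = 1,\ldots,n,\\
&\quad h_a = \arg\min_{h} \sum_{i=1}^{n} \big(\tilde{y}_i - h(\mathbf{x}_i)\big)^2 \qquad \text{(regression tree fitted to the residuals)},\\
&\quad f_a(\mathbf{x}) = f_{a-1}(\mathbf{x}) + v\, h_a(\mathbf{x}),\\
&\text{output } f_A(\mathbf{x}).
\end{aligned}
$$

With the squared-error loss, the negative gradient is simply the residual, so each tree is fitted to what the current ensemble has not yet explained, and $v$ shrinks each tree's contribution.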

Key Evaluation Metrics
The mean absolute error (MAE) [39] given in (5), the normalized root mean squared error (NRMSE) given in (6), the coefficient of determination (Rsq) given in (7), and the percentage error (PE) are the four key evaluation metrics used in this paper to examine the performance of the RF-LS-BPT method. Lower MAE, NRMSE, and PE values and higher Rsq values indicate better performance.
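The four metrics can be sketched in Python as follows. Equations (5) to (7) and the PE formula are not reproduced in this section, so the sketch assumes common definitions: NRMSE is the RMSE divided by the range of the observed values, and PE is taken as the mean absolute percentage error. Other normalizations (e.g., by the mean) are also in use, so these are assumptions rather than the paper's exact formulas.

```python
import math
import statistics


def mae(y, y_hat):
    """Mean absolute error."""
    return statistics.fmean([abs(a - b) for a, b in zip(y, y_hat)])


def nrmse(y, y_hat):
    """RMSE divided by the range of the observed values (one common convention)."""
    rmse = math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(y, y_hat)]))
    return rmse / (max(y) - min(y))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def percentage_error(y, y_hat):
    """Mean absolute percentage error; assumes no observed value is zero."""
    return 100.0 * statistics.fmean([abs(a - b) / abs(a) for a, b in zip(y, y_hat)])


# Hypothetical observed and predicted throughput values in Mbps.
y = [100.0, 200.0, 300.0, 400.0]
y_hat = [110.0, 190.0, 310.0, 390.0]
print(mae(y, y_hat), nrmse(y, y_hat), r_squared(y, y_hat), percentage_error(y, y_hat))
```

MAE keeps the units of the target (Mbps for throughput), whereas NRMSE, Rsq, and PE are dimensionless, which makes the latter easier to compare across measurement distances.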

Results and Discussion
Detailed results and discussion are presented in this section. All computations, coding, implementation, and graphics were carried out in the MATLAB software environment on an HP EliteBook laptop with an Intel® Core™ i3-10110U processor (Intel® Turbo Boost Technology, 4 MB L3 cache, 2 cores). We first present the status of the acquired 5G throughput qualities attained at close communication distances of 25, 50, 75, 100, and 160 m between the transmitter and the user equipment terminal (UET). This is followed by the throughput data training and testing accuracy achieved using seven machine learning models with their default parameters. The section also presents throughput data training and testing results achieved using the proposed RF-LS-BPT model versus the standard RF modeling approach, as well as results obtained using LS-boosting and bagging.

Throughput Quality Status Analysis
Throughput quality remains a key higher-layer performance indicator for assessing data transmission quality and integrity in mobile broadband networks. Notably, the actual throughput quality at the UET can be influenced by critical factors such as user location and communication distance from the transmitter. User data throughput expresses the rate at which a user can reliably send and receive data at the user equipment terminal (UET), that is, the quantity of data in bits per second (bps) conveyed and delivered over the cellular network within a specific period. The graphs in Figure 2 display the measured throughput qualities attained at close communication distances of 25, 50, 75, 100, and 160 m between the transmitter and UET. All the graphs show that the network delivers low throughput, in the range of 50 to 100 Mbps, at the user's first download before showing an upward but fluctuating improvement as the user stays in the network. This initially low throughput at the UET suggests that the network has serious delay problems during the initial network log-in. The general fluctuations in throughput quality across the measurement distances can be attributed to several factors, including the network propagation environment, user location, communication distance, the asymmetry between upload and download rates, available channel bandwidth, network traffic load, propagation channel conditions, signal quality, signal coverage, and modulation/coding scheme [40].
The throughput quality status, in terms of the maximum, minimum, and mean values attained at the various distances, is summarized in Table 2. Maximum throughput values of about 2350, 2080, 2070, 1970, and 1990 Mbps were attained at UET distances of 25, 50, 75, 100, and 160 m, respectively. The maximum throughput attained by the UET generally degrades as the communication distance from the connecting transmitter increases, falling from about 2350 Mbps at 25 m to about 1970 to 1990 Mbps at 100 and 160 m. This result indicates that the location of the user equipment, in terms of its communication distance from the transmitter antenna, is a major factor influencing the received throughput level. Overall, the mean throughput attained at the UET is quite low compared with the at least 1000 Mbps envisioned for 5G broadband networks, even at such close distances.
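The per-distance statistics reported in Table 2 (maximum, minimum, and mean throughput) amount to a group-by over the measurement distance. The sketch below assumes hypothetical records of the form (distance in m, throughput in Mbps) and includes a helper that converts a byte count delivered over an interval into Mbps, following the bits-per-second definition of throughput given above. Names and values are illustrative only.

```python
import statistics
from collections import defaultdict


def throughput_mbps(bytes_received, interval_s):
    """Convert a byte count delivered over interval_s seconds to Mbps (1 Mbps = 1e6 bit/s)."""
    return bytes_received * 8 / interval_s / 1e6


def summarize_by_distance(records):
    """Group (distance_m, throughput_mbps) pairs; return {distance: (max, min, mean)}."""
    groups = defaultdict(list)
    for distance_m, tput in records:
        groups[distance_m].append(tput)
    return {d: (max(v), min(v), statistics.fmean(v)) for d, v in sorted(groups.items())}


# Hypothetical samples; the measured values are summarized in Table 2.
records = [
    (25, throughput_mbps(12_500_000, 1.0)),  # 12.5 MB in 1 s = 100 Mbps
    (25, 900.0),
    (25, 500.0),
    (50, 200.0),
    (50, 600.0),
]
for distance, (t_max, t_min, t_mean) in summarize_by_distance(records).items():
    print(f"{distance} m: max={t_max:.1f}, min={t_min:.1f}, mean={t_mean:.1f} Mbps")
```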

Throughput Data Training and Testing Using Different Machine Learning Models with Their Default Parameters
In addition to hyperparameters, machine learning models have default parameters internally built for specific tasks. While the default parameters are always used during learning unless overridden, hyperparameters are deliberately set by the user to guide the learning process optimally. Here, six key machine learning models with default regression parameter settings were first engaged for throughput data training and testing: the multi-layer perceptron neural network (MLP-NN), random forest (RF), support vector machine (SVM), K-nearest neighbor (KNN) model, Gaussian process regression (GPR), and decision tree (DT). The generalized least-squares (GLS) model was also employed in the regression process. Using these other methods also makes it possible to compare their adaptive learning capability with that of the chosen RF method. Figure 3 compares the throughput quality predicted by the different machine learning models, and Figures 4 and 5 show the prediction accuracy of the individual methods in terms of MAE and R-squared (Rsq). The MAE and Rsq values attained for GLS, MLP-NN, RF, SVM, KNN, GPR, and DT were 276.96, 57.54, 137.01, 58.94, 125.66, 276.96, and 9.27 Mbps and 0.6746, 0.9602, 0.8642, 0.9644, 0.8755, 0.9728, and 0.9989, respectively. The RF regression model achieved the lowest MAE value of 9.27 Mbps and the best 0.9998 Rsq value from these throughput data training results. Similar superior prediction efficiency was obtained with the RF regression model for throughput data testing, but these results are omitted for brevity. The better prediction efficiency of RF could be due to its ability to handle high-dimensional datasets efficiently and with high precision. On the other hand, the GLS attained the worst results because of its poor performance in handling stochastic datasets with high variance and large dimensionality [41,42].

Throughput Data Training and Testing Using Proposed RF-LS-BPT Model versus Standard RF Modeling Approach
Although the RF-based regression model outperforms the other machine learning models used as benchmarks, as shown above, some of its hyperparameters can be tuned to further optimize it for improved performance during predictive data modeling and learning. Furthermore, the large prediction error attained by the standard RF-based regression model can be attributed to the high divergence between the input variables and the targeted response. As mentioned earlier, the target error response can be reduced using the LS-Boosting technique, hence the proposed RF-LS-BPT model. To prevent overfitting or underfitting, the hyperparameters were tuned to further minimize the prediction error [43]. To implement the proposed technique, a training dataset was first passed through an RF regression model with an LS-Boost ensemble. The Bayesian optimization search process was then used to tune the hyperparameters and obtain their optimal values. The three hyperparameters tuned were the learning rate, the number of learning cycles (i.e., the number of trees), and the maximum number of splits per tree. The same process was repeated using grid-search-based hyperparameter tuning. Figures 6 and 7 display the tuning process patterns and the optimal hyperparameter values obtained using grid search and Bayesian optimization search. Notably, the curves in Figure 7 show the minimum cross-validated MSE reached while determining the optimal hyperparameter values, which are listed in Table 3 (learning rate, tree number, and maximum splits). The Bayesian optimization search was preferred over the grid search for developing the proposed RF-LS-BPT method because it yielded the lowest error.

Due to bias and variance issues, a prediction model may suffer from underfitting or overfitting, leading to a high error between the targeted response and the estimated values. To address such drawbacks, the prediction model needs to be controlled by employing the bagging (Bag) or least-squares boosting (LS-Boost) algorithm. Bagging trains learners independently on bootstrap samples and simply averages their results, whereas boosting trains learners sequentially and combines them as a weighted sum, with each new learner correcting the errors of the current ensemble. In the LS-Boost algorithm, hundreds or more weak learners (trees) are trained, and the error is iteratively updated to improve learning. Figure 18 compares, in terms of MAE, the performance of the LS-Boost algorithm adopted in the proposed RF-LS-BPT model with that of the bagging algorithm when training on and testing with the throughput data obtained at the collection points. Tables 6 and 7 summarize the accuracy attained by the two RF ensemble algorithms. The prediction accuracy achieved in terms of MAE, NRMSE, and Rsq with the LS-Boost algorithm in the proposed model shows that it considerably improved the agreement between the targeted throughput data and the estimated values. Brain and Webb [44,45] noted that models with low bias during learning are generally preferred for large-dataset analytics, which is consistent with the superior performance of the proposed model.

Table 7. Throughput data testing accuracy using LS-Boosting and bagging.

Conclusions
Throughput quality remains a key higher-layer performance indicator for assessing data transmission quality and integrity in mobile broadband networks. User data throughput expresses the rate at which a user can reliably send and receive data at the user equipment terminal (UET). The amount and quality of throughput at the UET can fluctuate significantly because of many influencing factors, the key ones being the network propagation environment, user location, communication distance, the disproportion between upload and download rates, available channel bandwidth, network traffic load, propagation channel conditions, signal quality, signal coverage, and modulation/coding scheme. The first objective of this research was to determine the actual throughput quality attained at the UET at close communication distances of 25, 50, 75, 100, and 160 m from the transmitter over a typical 5G mobile broadband cellular network. The second objective was to appraise popular machine learning predictive modeling techniques in the literature and optimize the best one using a robust approach for optimal adaptive prediction modeling and learning of the acquired stochastic throughput data. Both objectives were achieved with the proposed learning-based models. Apart from examining the impact of transmitter-receiver communication distance on throughput quality, as in this study, a detailed empirical investigation of the influence of other variables on throughput quality is also needed; this is slated for our future research. In addition, future work will investigate the predictive capability of deep neural network models such as long short-term memory (LSTM) networks, as well as evolutionary and swarm-based regression techniques such as genetic algorithms and particle swarm optimization.

Data Availability Statement:
The data that support the findings of this study are publicly available from the University of Minnesota-Twin Cities, USA: https://fivegophers.umn.edu/www20 (accessed on 21 February 2022).