Mobility Management-Based Autonomous Energy-Aware Framework Using Machine Learning Approach in Dense Mobile Networks

A paramount challenge in limiting the increase in CO2 emissions caused by network densification is to deliver the Fifth Generation (5G) cellular capacity and connectivity demands while maintaining a greener, healthier and more prosperous environment. Energy consumption is a demanding consideration in the 5G era, in which several challenges must be combated, such as the reactive mode of operation, high-latency wake-up times, incorrect user association with the cells and multiple cross-functional operations of Self-Organising Networks (SON). To address this challenge, we propose a novel Mobility Management-Based Autonomous Energy-Aware Framework that analyses bus passenger ridership through statistical Machine Learning (ML) and delivers proactive energy savings coupled with CO2 emission reductions in a Heterogeneous Network (HetNet) architecture using Reinforcement Learning (RL). Furthermore, we compare and report various ML algorithms using bus passenger ridership obtained from the London Overground (LO) dataset. Extensive spatiotemporal simulations show that our proposed framework can achieve up to 98.82% prediction accuracy and CO2 reduction gains of up to 31.83%.

The high energy consumption of SCs is due to the circuit power component (load-independent power consumption), which makes up a much larger portion of the overall energy consumption [9]. The reasons the load component gets overloaded include inefficient use of SCs that draw unnecessary power and incorrect mobility management of daily passenger ridership across the deployed SCs. As a result, with UDSC deployments, the need for CO2 reductions driven by ES schemes using Reinforcement Learning (RL) and statistical mobility methods driven by Machine Learning (ML) algorithms will be even more compelling.

Related Work
The current exponential escalation of bus ridership is a precursor to imminent traffic flux, congestion and a capacity crunch. Against this backdrop, effective utilisation of resources through classification, optimisation and densification of the large number of 5G Heterogeneous cells (HetNets) has emerged as the most compelling solution for achieving the prediction accuracies needed to save time and cost and to meet the capacity gain goal [1]. Recent years have seen major technological developments in mobile communication networks, including network densification, the split of control and data planes [10], network virtualisation, etc. Technology will continue to advance rapidly, and across the world billions of pounds will be invested in the development of 'new mobility services', eventually bringing more challenges to dense networks [5]. Mobile traffic patterns recorded by a large number of cellular towers are extremely important to mobile operators for daily scheduling and dimensioning purposes. Movement patterns help in developing a tactful scenario when users move between their most visited places. In the 5G era, information on travellers' mobility remains limited when there are gaps in the cellular coverage or when mobility is not possible because cellular towers require upgrades [6,11,12].
Several studies of travellers' mobility show that, despite its randomness, user movement can be predicted and optimised sufficiently well. Cellular towers exploit various methods of predicting user mobility patterns, among which ML-based algorithms are popular predictors, as shown in [13-17]. However, given the 5G limitations [12], ML predictors have proved to be a smart alternative for studying traffic flow and user behaviour. Hence, there is a need to explore the performance of ML predictors within a meaningful and intelligent framework that classifies daily travellers and saves energy accordingly. Some works address traffic flow and network complexity by analysing average path lengths, whereas other papers on energy efficiency in transport operations highlight the importance of the daily operational status [17,18]. Another work further classifies the traffic into weights and temperatures. The structure of a classification training model was proposed to train different kinds of traffic flows, with comparison using the K-Nearest Neighbour (KNN) algorithm [19,20]. Artificial Neural Network (ANN)-based ML algorithms, such as spiking and Deep Neural Networks (DNNs), have been employed with detailed analysis to address various wireless networking problems [21]. A Decision Tree (DT) algorithm has been used in some works on mobility prediction classification to establish passengers' next moves and interactions by using Global Positioning System (GPS) trajectory data to generate decision and model trees [6,18,22]. Other works on traffic modelling derive the structure of the training model to offer optimal traffic management using Support Vector Machine (SVM), Naive Bayes (NB) and Discriminant Analysis (DA) algorithms [6,19].
In cellular systems, CO2 emissions driven by energy consumption can be lessened significantly by switching off underutilised, lightly loaded or idle SCs during peak and off-peak intervals, either by offloading their load to their Macro Cell (MC) within the same HetNet or by managing user mobility patterns [6,11,23]. In this way, minimum energy is consumed, and minimum CO2 emitted, per transmitted bit [22,24-26]. To exploit these approaches, the SON function by 3GPP [5,7] has adopted ES and has extensively been studying the ripple effects of ES and user mobility on the environment. ES improvement with a focus on mobility management and resource allocation has been studied more extensively despite its relatively small gain compared to switching underloaded Base Stations (BSs) On-Off [22,24-26]. ES gains from On-Off switching improve the situation only to a limited degree for a given throughput until they are further enriched with proactive and autonomous approaches based on intelligent switching methods. In this direction of research, some recent works show promising results in terms of potential ES [1,12-17,25,26]. However, to the best of our knowledge, they fall short of 5G demands due to the following five limitations:
1. Reactive mode of operation: Typical ES SON algorithms react only after an event has occurred, achieving ES at the expense of QoS. Given the dynamics of a densely populated city, where bus passenger ridership interacts with the deployed cellular environment, by the time SC overloading or underloading is detected and a suitable algorithm is selected to solve the issue, the conditions may already have changed [7]. In the 5G environment, this problem can escalate further when disparate passenger ridership and the plethora of cell types supporting the smart city ecosystem are not in harmony.
2. SC wake-up time: Sleeping SCs require a specific amount of time to wake up [27]. Any passenger entering the footprint of an SC that is still in a sleeping state would experience high latency. Thus, conventional paradigms need to be modernised proactively to maintain the low latency requirements of 5G in a more agile fashion, i.e., proactive ES through passenger mobility management.
3. User association to sleeping SCs: A key challenge in the HetNet cell On-Off switching strategy is to establish user associations (bus passenger ridership associations) with the correct serving SCs that are switched on while passengers are within their coverage footprint [17], which contributes to overhead challenges. Existing ES schemes do not appear to address this challenge, although 5G QoS demands low-overhead, low-cost and highly efficient architectures.
4. SON upright design: Conventional ES solutions, when implemented together in a HetNet environment, are susceptible to conflicts [5] that require intelligence to resolve. The SON use-cases liable to conflict are traffic offloading during SC switching [3,13] and prediction of passengers moving to neighbouring cells [6]. For the first conflict, Cell Individual Offsets (CIOs) along with transmission power settings play a major role, whereas the second depends on a correct distance metric for classifying mobility predictions. Furthermore, vertical offloading, horizontal offloading or both are important where BS transmit power is concerned [25]. In horizontal offloading, SCs have low transmit powers and can offload the traffic of neighbouring cells only within a certain cell range. Therefore, horizontal offloading between SCs cannot always be realised. Consequently, vertical offloading often becomes the only way for an SC to go into sleep mode if its neighbouring SCs are not in proximity.
5. CO2 emissions: To the best of our knowledge, this challenge coupled with ES is either overlooked by researchers or lacks comprehensive quantification [2,3] of the impact of ES on CO2 emissions.

Contributions and Organisation
We propose the Energy-Aware Framework (Figure 2) to address the aforementioned limitations by analysing a bus passenger ridership dataset using a multi-tier Self-Organising HetNet with one MC and nine SCs in a Central London location [5]. The main focus is to anticipate the mobility behaviour of bus passengers passing through a HetNet architecture so that the concerned SCs become artificially intelligent and autonomous. This Artificial Intelligence (AI) is used to formulate a novel ES optimisation problem through mobility management for proactive SC scheduling and offloading that satisfies QoS requirements. The contributions of our research are summarised as follows:
1. As a building block of the novel Energy-Aware Framework, we develop a spatiotemporal mobility prediction framework by analysing a statistical K-Nearest Neighbour (KNN) model to overcome the limitations of conventional ES.
2. A novel method of estimating passengers' future locations is proposed to map the next-cell spatiotemporal Handover (HO) based on the idea of landmarks, using multiple K values in the KNN model and a detailed comparison.
3. Based on the future cell load information and CIOs as optimisation variables for load balancing among SCs, a proactive ES optimisation problem is formulated to reduce power and energy consumption by switching off lightly loaded, idle or underutilised HetNet SCs. Intelligent load balancing specifically targets lightly loaded SCs to be switched off while satisfying QoS.
4. Based on the information obtained from mobility management of passenger ridership and ES awareness, a novel scheme for CO2 reductions is also quantified.

System Model
In this section, we present the analytical model development of the Energy-Aware Framework, whose key cornerstones are as follows:
• Statistical KNN-Based Passengers Mobility Prediction
• Passengers Future Location Estimation
• Proactive-Energy Saving Optimisation Based on CO2 Reduction

Energy-Aware Framework
The proposed Energy-Aware Framework considers the downlink with one MC and nine SCs for the sake of conciseness, as shown in Figure 2. The MC is equipped with directional antennas, whereas all the SC antennas are assumed to be omnidirectional with constant gains. All cells in the framework use the same frequency band, with a frequency re-use factor of 1. A constant bit rate service with full-buffer data utility is available in a centralised C-SON architecture with system-wide Proactive-Energy Saving Optimisation-based CO2 Reduction. Moreover, historical traces of mobility that include time, location, number of passengers, associated cell IDs and received power levels (RSRP) are assumed to be available to the C-SON server. For Proactive-Energy Saving Optimisation-based CO2 Reduction, we consider a two-tier HetNet model consisting of a live MC and nine live SCs, along with their traffic information, in separated control and data planes. Signalling is carried out by the MC, which is responsible for low data rate services, while the SCs, connected to the MC via backhaul, offer high-capacity services. When SCs monitor low traffic activity, they tend to switch off while offloading their traffic to the MC, provided there is enough capacity in the MC to accommodate the offloaded traffic load.
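The switch-off rule in the last sentence can be illustrated with a minimal greedy sketch. It is not the optimisation developed later in the paper: the threshold value, the normalised load units and the function name `plan_offload` are assumptions made only for illustration.

```python
def plan_offload(sc_loads, mc_load, mc_capacity=1.0, low_load_threshold=0.2):
    """Pick lightly loaded SCs to switch off and offload to the MC.

    sc_loads: load each SC would add to the MC if offloaded, expressed in
    normalised MC load units (an illustrative simplification).
    Returns (sorted indices of SCs switched off, resulting MC load).
    """
    switched_off = []
    # Visit the lightest SCs first so that as many as possible fit in the MC.
    for idx in sorted(range(len(sc_loads)), key=lambda i: sc_loads[i]):
        load = sc_loads[idx]
        if load > low_load_threshold:
            break  # every remaining SC is busier than the threshold
        if mc_load + load <= mc_capacity:
            mc_load += load  # the MC absorbs the offloaded traffic
            switched_off.append(idx)
    return sorted(switched_off), mc_load


if __name__ == "__main__":
    off, new_mc_load = plan_offload([0.05, 0.5, 0.1, 0.15, 0.3], mc_load=0.8)
    print("SCs switched off:", off, "MC load:", round(new_mc_load, 2))
```

In this example the SC with load 0.15 stays on even though it is below the threshold, because the MC has no spare capacity left for it, which mirrors the "provided there is enough capacity in the MC" condition.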

Statistical KNN-Based Passengers Mobility Prediction
A non-parametric KNN classifier is modelled in which both the distance metric and the number of nearest neighbours are varied simultaneously for optimum results and comprehensive comparison. The reason for setting the parameters simultaneously is that the KNN classifier stores its training data, so it can easily be tuned and modelled to compute resubstitution predictions. Alternatively, the trained model can classify new observations using the predict method. The following modified equation is to be considered [19]: where p is the probability of the classes x_i, which depends on the peak time x_1 and off-peak time x_2 test inputs within our framework; N_K(y, D) are the K nearest points to an integer y in our dataset D; and I(e) is an indicator function equal to 1 if e is true and 0 if e is false. This method provides flexibility as to whether learning is instance-based or memory-based. Commonly, the Euclidean distance metric is used to set the parameters of the KNN classifier; however, we use the Mahalanobis distance metric because it yields the best classification results on our dataset. The classifier simply "looks at" the K points in the training set that are nearest to the test input x_i. We input peak and off-peak bus passenger ridership data with multiple values of K. KNN classifiers work well with low-dimensional inputs; however, they do not function appropriately with high-dimensional inputs. Computation with the Mahalanobis distance metric uses a positive definite covariance matrix C between each pair of elements of a given random vector. The default value of the matrix C is the sample covariance matrix of X, as computed by nancov(X). The following modified equation is to be considered [19,20]: where d_(i,j)(x, µ) is the distance between a data vector x and the mean vector µ, with (i, j) = 0, 1, 2, ....
The Mahalanobis distance metric can be thought of as a vector distance that uses a Σ^-1 norm, where Σ^-1, the inverse of the variance-covariance matrix Σ between x and y, acts as a stretching factor on the space [20]. The number of nearest neighbours in X used to classify each point during prediction is specified as a positive integer value, which can be less than the number of rows in the training data. Note that, for an identity covariance matrix Σ = I, the Mahalanobis distance becomes the familiar Euclidean distance.
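The two displayed equations referred to in this paragraph are not reproduced in the text. Under the symbol definitions given there, they would plausibly take the standard KNN class-probability and Mahalanobis-distance forms below; this is an assumption based on those definitions, not the authors' exact expressions, and the class label c is introduced here only for illustration.

```latex
% Assumed standard forms, written with the symbols defined in the text.
% c is a hypothetical class label (peak or off-peak).
p(x = c \mid y, D, K) = \frac{1}{K} \sum_{i \in N_K(y, D)} I(x_i = c)

d(x, \mu) = \sqrt{(x - \mu)^{\top} C^{-1} (x - \mu)}
```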
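A minimal pure-Python sketch of KNN classification with the Mahalanobis metric, restricted to two features for brevity, is shown next. The toy ridership data, the feature choice (hour of day, passenger count) and all function names are hypothetical; the paper itself uses MATLAB's classifiers rather than this code.

```python
import math
from collections import Counter


def covariance_2d(points):
    """Sample covariance matrix of 2-D points as ((sxx, sxy), (sxy, syy))."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return ((sxx, sxy), (sxy, syy))


def inverse_2x2(matrix):
    """Inverse of a symmetric positive definite 2x2 matrix."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return ((d / det, -b / det), (-c / det, a / det))


def mahalanobis(u, v, inv_cov):
    """sqrt((u - v)^T C^-1 (u - v)) for 2-D vectors."""
    dx, dy = u[0] - v[0], u[1] - v[1]
    quad = (dx * (inv_cov[0][0] * dx + inv_cov[0][1] * dy)
            + dy * (inv_cov[1][0] * dx + inv_cov[1][1] * dy))
    return math.sqrt(max(quad, 0.0))


def knn_predict(train_x, train_y, query, k, inv_cov):
    """Majority vote among the k training points nearest to the query."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: mahalanobis(train_x[i], query, inv_cov))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hour of day, passengers on the bus).
TRAIN_X = [(7.0, 120.0), (8.0, 150.0), (9.0, 130.0),
           (13.0, 40.0), (14.0, 35.0), (22.0, 20.0)]
TRAIN_Y = ["peak", "peak", "peak", "off-peak", "off-peak", "off-peak"]
INV_COV = inverse_2x2(covariance_2d(TRAIN_X))

if __name__ == "__main__":
    print(knn_predict(TRAIN_X, TRAIN_Y, (8.5, 140.0), 3, INV_COV))
    print(knn_predict(TRAIN_X, TRAIN_Y, (15.0, 30.0), 3, INV_COV))
```

Passing the identity matrix as `inv_cov` reduces `mahalanobis` to the Euclidean distance, which matches the closing remark of the paragraph above.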

Passengers Future Location Estimation
Let the association of users u to their SCs according to their geographic locations at time instant k be U ∈ M_k = (x_k, y_k), and let the predicted cell HO tuple for each user's mobility be (M_u, T_u). We use this information to establish users' future cell associations in the next time intervals k + k as M_{k+k}. Following [14,23], nodes (passengers in our case) usually move in regular patterns: daily passengers complete their routine tasks around fairly regular landmarks, and tourists move between well-visited landmarks. We therefore utilise their mobility logs to estimate: (i) the most probable landmarks of daily commuters; and (ii) the landmarks visited by non-frequent travellers in each SC. By harnessing this mobility information, trajectories from the current location to the most probable predicted locations are estimated using the cell sojourn HO time T_HO and the distance metrics mentioned in the section above. Let the coordinates of the most probable landmark for users u mobility in the next where ||.|| is the Mahalanobis norm operator. The future coordinates at time interval M_{k+k} can be estimated as: The pseudo-code for the users' mobility prediction in terms of future location estimation is given in Algorithm 1.

Proactive-Energy Saving Optimisation-Based CO2 Reduction
For a wireless network performance evaluation, the state-of-the-art is to analyse RAN components at system level where multiple components in a typical BS contribute to power consumption depending on traffic load profiles. These components include power amplifiers, back-haul links, amplifier efficiency, signal processing and generation, air conditioning and others. The following equation is to be considered.
where S is the number of sectors in a cell, A_Tx is the number of transmitting antennas per sector, P_Tx is the input power of the antenna and η_eff is the power amplifier efficiency; Pr_t, Pr_d, Pr_g, Pr_c, Pr_l, Pr_a and Pr_o denote the power drawn by the transceiver, digital signal processor, signal generator, AC-DC converter, back-haul link, air conditioning and other BS components, respectively. Given the movement of all passengers in the next tuple with estimated future location L ∈ M_{k+k}, we aim to devise a cell switching mechanism for SCs in the next interval k + k in order to control the overall energy consumption of the framework. The On-Off switching schedule complies with the coverage KPI and QoS requirements of each UE at its estimated next location L ∈ M_{k+k} while respecting the loading constraints of each BS. The total instantaneous power consumed by a cell is the sum of the transmit power and the circuit power [28]: where P_tot represents the total instantaneous cell power; P_cir is the constant circuit power, which is drawn while the BS in a given cell c is active and is significantly reduced when the BS changes its state from active to sleep mode; P_t represents the cell transmit power; λ is the load variable, which depends on the capacity of the cell; the indicator variable ρ defines the On-Off state of the BS in cell c; P_HetNet is the total HetNet power consumption, covering one MC and nine SCs in our case; and P_mc and P_sc^k are the power consumptions of the MC and the k-th SC, respectively, with k = {2, 3, 4, ..., n} indexing the SCs surrounded by an MC. One way to quantify cell energy behaviour is the Energy Consumption Ratio (ECR) [8,29], expressed in Joules/bit and given as: where f(γ_u) denotes a function that returns the achievable spectral efficiency of user u at a given SINR, B_u(w) is the user-specific bandwidth and P is the amount of power consumed.
The SINR γ_u at the estimated future location M_{k+k}, when the user is associated with a cell c, is defined as the ratio of the user's reference signal received power RSRP_u from cell c to the sum of the RSRPs of all cells i, such that i ⊂ C, plus the noise N.
where the cell transmit power is represented by P_t, the user equipment gain by G_u, the transmitting antenna gain seen by user u by Ḡ_u, the signal shadowing variation by δ, the path loss constant by α, the distance of user u at its estimated location M_{k+k} by ∆_u and the path loss exponent by β. The next time steps k + k are indicated by the time subscript enclosed within [.]_{k+k} throughout the paper. We assume that the estimated shadowing information is available, which provides the estimated locations of user u with minimal errors. Thus, we calculate the SINR expression of fully loaded cells along with the interference from neighbouring cells for data transmission as where ρ_i denotes the cell load of a cell i. Compensating the received interference power from each active cell with this load factor yields a particular coupling of the total interference when multiple cells are utilised. Heavily loaded cells contribute more interference power than lightly loaded ones [3,7]. For a HetNet arrangement, the instantaneous cell load is the ratio of the Physical Resource Blocks (PRBs) active during a Transmission Time Interval (TTI) to the total PRBs available in the cell. Hence, this ratio acts as a standard measurement indicator for monitoring total UL/DL PRB usage. QoS and achievable SINR are influenced by the number of PRBs allocated in an SC. PRBs are directly proportional to the data rate required to maintain QoS. Hence, the more PRBs are assigned to a user, the higher the QoS and the lower the SINR. We define the total load of cell c for each time step k + k needed to achieve the required rate of all users of an SC as: where B_u(w) is the bandwidth of one resource block, RB_n is the number of resource blocks in cell c, τ_u is the transmission rate and i ⊂ C is the number of active users u in cell c.
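The cell power, HetNet power and ECR quantities described above can be sketched numerically as follows. Since the displayed equations are not available here, the functional forms are assumptions: cell power is taken as circuit power plus load-scaled transmit power when the cell is on, sleep power defaults to zero, and f(γ_u) is taken as the Shannon spectral efficiency log2(1 + γ_u). All names and numbers are hypothetical.

```python
import math


def cell_power(p_cir, p_t, load, is_on, p_sleep=0.0):
    """Instantaneous cell power in watts (assumed form).

    On:  circuit power + load * transmit power.
    Off: residual sleep power (hypothetical parameter, default 0 W).
    """
    return p_cir + load * p_t if is_on else p_sleep


def hetnet_power(mc, scs):
    """Total HetNet power: the MC plus every SC (dicts of cell_power arguments)."""
    return cell_power(**mc) + sum(cell_power(**sc) for sc in scs)


def ecr(power_w, users):
    """Energy Consumption Ratio in J/bit: power divided by aggregate throughput.

    users: iterable of (bandwidth_hz, sinr_linear) pairs; the throughput of
    each user is bandwidth * log2(1 + SINR), an assumed choice for f.
    """
    throughput_bps = sum(bw * math.log2(1.0 + sinr) for bw, sinr in users)
    return power_w / throughput_bps


if __name__ == "__main__":
    mc = {"p_cir": 130.0, "p_t": 20.0, "load": 0.6, "is_on": True}
    scs = [{"p_cir": 6.8, "p_t": 1.0, "load": 0.3, "is_on": True},
           {"p_cir": 6.8, "p_t": 1.0, "load": 0.0, "is_on": False}]
    total = hetnet_power(mc, scs)
    print("HetNet power (W):", total)
    print("ECR (J/bit):", ecr(total, [(1e6, 3.0), (1e6, 15.0)]))
```

Because ECR divides power by delivered bits, switching off an idle SC lowers ECR only if the remaining cells still carry the same traffic, which is why the QoS constraints later in this section matter.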
We define this load as a virtual load, which is allowed to exceed one so that the loading state of the cell can be monitored and clearly indicated. The minimum bit rate τ_u is required to maintain QoS requirements by continuously serving users u in each cell c. There are several methods of calculating the required user throughput from the resources required to serve the user. In the 3rd Generation Partnership Project (3GPP) standards, a mechanism known as the QoS Class Identifier (QCI) is used to prioritise active users based on resource type allocation and requirements. We can define the desired throughput based on the QCI, which models the transmission bit rate τ in a more robust way while leveraging network analytics. Furthermore, functional user behaviours, their service request patterns, levels of subscription and their in-use applications can be modelled using the transmission bit rate τ [5]. We define the association criterion of user u to the SCs as: where P_r,u (dBm) is the user's reference signal power from a cell, i.e., one MC or the sum of nine SCs, and P_CIO (dB) is the Cell Individual Offset (CIO), a bias parameter that depends on the load of the cell and whose main function is to offset the lower transmit power of the SCs in the HetNet in order to transfer load when they are lightly used or idle. When underutilised, SCs are turned off and their load is transferred to the MC, provided there is sufficient capacity. The downside of this activity is that users are no longer associated with the strongest SC, and backhaul overhead increases when loading and offloading occur. This lowers the SINR as CIO values increase. Moreover, CIO-driven load balancing eventually reduces capacity because of the SINR drop, which affects the QoS. Therefore, to partially offset the lower SINR of the serving cell, users u are allocated PRBs that are more abundant than those available in the previous serving SC.
As a trade-off knob to control the load balancing, energy consumption and CO2 emissions of the cells and of the overall HetNet architecture, the CIO parameter is highly important. Now, we formulate the equation of general energy consumption for each of the time steps k + k for all cells c within the HetNet architecture as [3,12,17]: The main objective is to optimise the HetNet energy consumption based on cell switching and traffic offloading so that CO2 emissions are reduced and optimal policies are enforced to automate the network. To do this, the load and CIO parameters (ρ, P_CIO) must be optimised for all the SCs such that the overall energy consumption ratio of all SCs within the HetNet is minimised, consequently reducing the CO2 emissions. The first two constraints define the CIO limits and the On-Off state array, respectively, and determine the size of the solution search space, whereas the third constraint ensures a minimum amount of coverage for all users through the collective contribution of the HetNet. The minimum received power that a user requires is P_th, and the associated area coverage probability, evaluated with the indicator function, is the probability that users lie within the area in which QoS requirements are maintained. The minimum required bit rate is the fourth constraint, which depends upon the QoS requirements. While pursuing the minimum-ECR objective, switching off lightly loaded SCs degrades the minimum user reference signal power P_r,u (dBm), leading to worse SINR and throughput. Therefore, the fourth constraint guarantees that the minimum SINR is maintained for all users in all cases. This requires the cell load ρ to be less than the total load threshold ρ_T ∈ (0, 1). Jointly optimising cell On-Off switching, CIOs and the cell load index is a large-scale non-convex optimisation problem [12,18].
The complexity of user association in Equation (11) per SC is expected to grow exponentially when dealing with multiple constraints concurrently; therefore, we analyse some Reinforcement Learning (RL) techniques to compare and obtain optimal results. Our modelled scenario has CIO as an optimisation variable with ten possible values available at each SC and 1024 possible iterations. The Energy-Aware Framework proactively devises the optimal On-Off state array and the CIO values of all SCs, aiming to minimise the energy consumption ratio of the whole network, from which the CO2 emissions are then derived. The energy consumption of the Energy-Aware Framework has a direct impact on CO2 emissions, which are directly proportional to it. The total integral sum of all the ECR values in a HetNet architecture is converted with the help of the CO2 conversion factors in [1,3]. Therefore, from (7), we have where ∆CO2 is the carbon footprint, which depends on the total energy consumption ratio ECR(P_tot) obtained from the total power consumption P_tot, ψ refers to the emissions per unit (conversion factor) and t represents the time duration.
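The load-weighted SINR, the virtual cell load and the CIO-biased association described above can be sketched as below. The exact equations are not reproduced in this text, so the forms used are assumptions: interference is the load-weighted sum of neighbouring RSRPs, each user needs τ_u / (B · log2(1 + γ_u)) resource blocks, and a user attaches to the active cell that maximises RSRP (dBm) plus CIO (dB). Function names and sample values are hypothetical.

```python
import math


def sinr(serving_rsrp_w, interferers, noise_w):
    """Linear SINR with each interferer's RSRP scaled by its cell load.

    interferers: iterable of (cell_load, rsrp_w) pairs for neighbouring cells.
    """
    interference = sum(load * rsrp for load, rsrp in interferers)
    return serving_rsrp_w / (noise_w + interference)


def cell_load(user_rates_bps, user_sinrs, prb_bandwidth_hz, n_prb):
    """Virtual load: PRBs needed by all users over PRBs available (may exceed 1)."""
    needed = sum(rate / (prb_bandwidth_hz * math.log2(1.0 + g))
                 for rate, g in zip(user_rates_bps, user_sinrs))
    return needed / n_prb


def associate(rsrp_dbm, cio_db, active):
    """Cell id maximising RSRP (dBm) + CIO (dB) among cells that are switched on."""
    scores = {cell: power + cio_db.get(cell, 0.0)
              for cell, power in rsrp_dbm.items() if active.get(cell, True)}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rsrp = {"MC": -80.0, "SC1": -85.0, "SC2": -90.0}
    cio = {"SC1": 6.0, "SC2": 3.0}
    print(associate(rsrp, cio, {}))              # SC1 wins thanks to its CIO
    print(associate(rsrp, cio, {"SC1": False}))  # SC1 asleep, user falls back to MC
    print(cell_load([360e3] * 60, [3.0] * 60, 180e3, 50))  # above 1: overloaded
```

The last call shows why the load is called virtual: 60 users each needing one of 50 resource blocks give a value above one, flagging an overloaded cell rather than being clipped at full load.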
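The proportionality between energy use and CO2 emissions stated above can be made concrete with a small sketch. It uses a simplified energy-based form, emissions = power × time × ψ, rather than the ECR-based expression of the paper, which is not reproduced here. The conversion factor value and the power figures are placeholders, not values from the paper; only the 21 h duration follows the modelled period.

```python
def co2_kg(power_w, hours, psi_kg_per_kwh):
    """CO2 emitted (kg) for a constant power draw over a duration.

    psi_kg_per_kwh is the emission conversion factor; the value used in the
    example is a placeholder, not the factor used by the authors.
    """
    energy_kwh = power_w / 1000.0 * hours
    return energy_kwh * psi_kg_per_kwh


def reduction_percent(baseline, optimised):
    """Relative saving of the optimised case against the baseline, in percent."""
    return 100.0 * (baseline - optimised) / baseline


if __name__ == "__main__":
    psi = 0.25  # hypothetical kg CO2 per kWh
    no_switching = co2_kg(2000.0, 21, psi)    # all cells always on
    with_switching = co2_kg(1500.0, 21, psi)  # some SCs asleep
    print(no_switching, with_switching,
          reduction_percent(no_switching, with_switching))
```

With a fixed ψ and duration, the percentage reduction in CO2 equals the percentage reduction in average power, so any energy saving carries over one-to-one to the emission figure.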

Proposed Approach
We present our results based on the novel Energy-Aware Framework. First, we analyse the Machine Learning (ML)-driven classification accuracies of six algorithms: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Discriminant Analysis (DA), Naive Bayes (NB), Decision Tree (DT) and Artificial Neural Network (ANN). The ML-based algorithms use geographic BS locations, user cell association and the number of passengers in peak and off-peak times. Second, we estimate passengers' future locations using the KNN algorithm with multiple distance metrics, among which the Mahalanobis distance metric plays the main role due to its best classification accuracy. Finally, we present energy saving coupled with CO2 emissions through Q-learning (QL)-based Reinforcement Learning (RL) while benchmarking against: (i) No-Switching (NS); (ii) Exhaustive Search (ES); and (iii) the Greedy approach.

Machine Learning (ML) Driven Classification Accuracy
ML-driven classification accuracies for predicting peak and off-peak bus passenger ridership within the HetNet environment are proposed in our Energy-Aware Framework. ML evolved from pattern recognition to automate machines for intelligent decision making by learning from history and adapting to the testing environment [21,30]. For the optimisation of peak and off-peak activities in our framework, the ML algorithms proposed in [21,31-33] are used to model traffic identification and user association with the BSs for the classification of moving patterns. The first classification mechanism is KNN, a non-parametric classifier that searches for the K points in its training set that are nearest to its test input. It counts the members of each class among them and returns the observed fractions as estimated values [6,19]. Distance-based metrics for the KNN algorithm are discussed in Section 2.2. The second classification mechanism is Discriminant Analysis (DA), which uses independent variables to classify individuals into groups with two objectives: (i) classification of new inputs by predictive equations; or (ii) prediction of individual variables to comprehend relationships [6,19]. The third classification mechanism is Support Vector Machine (SVM). The SVM model is also known as a large margin classifier and classifies high-dimensional inputs through linear and non-linear mapping. Its output relies on a subset of the training data, known as support vectors [6,19]. The model draws its decision boundary as a hyperplane placed at the greatest distance from the nearest training samples. The fourth classification mechanism is Decision Tree (DT), which is often called the classification and regression trees (CART) model. DT recursively partitions the input space and defines a local model in each resulting region. It can be represented by a tree with one leaf per region [6,19].
The fifth classification mechanism is Naive Bayes (NB), another mobility classification algorithm, which classifies vectors of discrete-valued features [6,19]. Given the class labels, the class-conditional density of the training classes (peak and off-peak passengers) is written as a product of per-feature densities, which is called the NB model. Finally, our sixth classification mechanism is an Artificial Neural Network (ANN), an interconnected group of nodes/neurons organised into input and output layers. Neurons learn from the training data without being programmed with task-specific rules. Numeric weights are tuned from experience to obtain the best possible outcomes when training the neural nets [6,19].

Reinforcement Learning (RL) Driven Energy Savings
The Energy-Aware Framework proposes an SC On-Off switching operation based on an RL algorithm in which the MC senses the environment and takes an action that is rewarded or penalised depending on the state resulting from that action. RL has been chosen to support the SC On-Off switching operation because it is well suited to making decisions from a wide range of options. The MC interacts with the network environment and obtains the SCs' traffic information and the user association criterion to make decisions. Hence, RL copes with the dynamic environment by learning and then deciding the actions required to maintain QoS. We have adopted the Q-learning (QL) algorithm [3,34,35] to solve the constrained problem. QL is one of the most well-known RL algorithms and has a proven capability of working in dynamic environments [9,11]. QL is an off-policy method that follows different policies to determine the next possible action state in conjunction with the action-value table update. The six main components in QL are: (i) agent; (ii) environment; (iii) action; (iv) state; (v) reward/penalty; and (vi) action-value table. An agent takes actions by interacting with a given environment in order to maximise the reward or minimise the penalty. After each action that the agent takes, the resulting state and reward/penalty are evaluated with the following rule [3,13,35].
where s_t and s_{t+1} are the current and next states, φ is the discount factor, γ_{t+1} is the expected penalty for the next step, a_t is the action taken after the MC has learned from the environment, a is the set of all possible actions and λ_r is the learning rate. Because the QL algorithm learns in a model-free manner [3,34,35] and adapts to varying environmental conditions such as dynamic traffic loads, it bears very low computational overhead compared to the Exhaustive Search (ES) and Greedy techniques [25,26,36]. In our study, we considered a simple live HetNet model with one MC and nine SCs distributed geographically along one of the busiest streets in Central London. With this consideration, the state space is small enough to apply a simple look-up table (Q table), which is updated for every state-action pair. In designing the SC switching mechanism, our goal is to find the best switching strategy with low ECR coupled with CO2 emissions, which depends on selecting the best set of SCs to switch off out of all possible sets of SCs.
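The tabular update described above can be sketched as follows, assuming the standard Q-learning rule in its penalty-minimising form, Q(s_t, a_t) ← Q(s_t, a_t) + λ_r [γ_{t+1} + φ min_a Q(s_{t+1}, a) − Q(s_t, a_t)]. The two-state traffic model, the penalty values and all names are hypothetical and far simpler than the nine-SC scenario of the paper.

```python
import random

ACTIONS = (0, 1, 2)                           # number of SCs kept on (toy model)
NEXT_STATE = {"low": "high", "high": "low"}   # traffic alternates each step
REQUIRED_SCS = {"low": 0, "high": 2}          # SCs needed to satisfy QoS


def penalty(state, active_scs):
    """Power draw plus a large QoS penalty for every missing SC (made-up units)."""
    power = 10.0 + 5.0 * active_scs
    shortfall = max(0, REQUIRED_SCS[state] - active_scs)
    return power + 100.0 * shortfall


def q_update(q, state, action, pen, next_state, lr=0.5, discount=0.9):
    """One Q-table update; lr is lambda_r and discount is phi in the text."""
    best_next = min(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += lr * (pen + discount * best_next - q[(state, action)])


def train(steps=5000, seed=0):
    """Learn from uniformly random actions (valid because QL is off-policy)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in NEXT_STATE for a in ACTIONS}
    state = "low"
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        next_state = NEXT_STATE[state]
        q_update(q, state, action, penalty(state, action), next_state)
        state = next_state
    return q


def greedy_action(q, state):
    """Action with the lowest learned penalty in the given state."""
    return min(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    table = train()
    for s in ("low", "high"):
        print(s, "->", greedy_action(table, s), "SCs on")
```

In this toy setting the learned policy switches all SCs off under low traffic and keeps both on under high traffic, because the QoS penalty dominates the power saving whenever capacity falls short.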

Performance Evaluation
Our novel Energy-Aware Framework is divided into two parts for the analysis of the peak and off-peak dataset obtained from bus passenger ridership in Central London. We generated typical LO environment-based bus passenger distributions leveraging 3GPP-compliant standards. As described in Section 2, in the first part the ML-based KNN algorithm with the Mahalanobis distance metric is chosen for passenger mobility prediction, followed by future location estimation. The second part covers the proactive energy saving optimisation based on CO2 reduction using the RL-based QL algorithm. Due to the dynamics of the busy environment, the number of passengers varies over time. We modelled 21 h from 05:00 to 02:00 to cover peak and off-peak travel, as shown in Figure 3. For the first part of our proposed model, bus passengers are classified and the results are compared across different ML algorithms to establish the optimum mobility management-based approach in dense mobile networks. For the second part, bus passengers are distributed within the HetNet cells to find the optimum cell On-Off switching and offloading method. The total power consumption of the HetNet architecture is calculated from Equation (6), where the cell load λ is normalised for the calculation of the transmitted power P_t from Equation (10). Mean values over 100 iterations of power and energy consumption are plotted. CO2 emissions based on user associations with the SCs and the ECR are obtained from Equation (13). For both parts, the network topology is simulated in MATLAB, with the simulation parameters listed in Table 1.

Classification Prediction Accuracy
To obtain classification prediction accuracies, we used six ML algorithms and identified the best-performing classification algorithm for our dataset. It is worth mentioning that we used MATLAB (R2020b, MathWorks, United Kingdom) libraries for all six ML classifiers. The KNN classifier from Equations (1) and (2), discussed in Section 2, was evaluated with several values of K (the number of nearest neighbours) and several distance metrics. With K = 1, the Mahalanobis distance metric outperforms the other metrics, as shown in Figure 4. This demonstrates the simplicity of the algorithm in classifying new data points based on similarity measures. When K = 2, 3, 4, . . . , the output value stays close to the K = 1 result, meaning the test points closely match points from the training dataset. The classifier has memorised the last movement to the correct label and achieves a minimal error rate. Section 2.2 provides a brief explanation of the Mahalanobis distance metric. In more detail, the Mahalanobis distance is based on correlations between the variables of a multivariate analysis, i.e., daily bus passengers in peak and off-peak times in our case, so that different patterns between passenger classes can be identified and analysed. Unlike the Euclidean distance, which is the commonly used straight-line distance between two points, the Mahalanobis distance is suitable for determining the multivariate similarity of a set of independent variables in dispersed directions, i.e., the multidimensional position of each case of the sample class. In Figure 4, the number of uncorrelated neighbours is shown on the x-axis, the distance metrics are shown on the y-axis and the estimated values of the objective function are shown on the z-axis.
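To make the difference between the Mahalanobis and Euclidean distances concrete, the following minimal Python sketch implements a 1-nearest-neighbour classifier for two features. It is an illustration only, not the MATLAB classifier used in this study. The feature choice (hour of day and passenger count), the sample values and the function names are hypothetical, and the covariance matrix is assumed to be non-singular.

```python
import math


def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T S^-1 (u - v)) for a 2x2 covariance matrix S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # assumed non-zero
    dx, dy = u[0] - v[0], u[1] - v[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


def predict_1nn(train_x, train_y, query):
    """Return the label of the training point closest to `query`."""
    cov = covariance_2d(train_x)
    nearest = min(range(len(train_x)),
                  key=lambda i: mahalanobis(train_x[i], query, cov))
    return train_y[nearest]


# Hypothetical samples: (hour of day, passengers observed).
train_x = [(8, 120), (9, 110), (17, 130), (12, 40), (14, 35), (22, 20)]
train_y = ["peak", "peak", "peak", "off-peak", "off-peak", "off-peak"]
label = predict_1nn(train_x, train_y, (18, 125))
```

With an identity covariance matrix the function reduces to the Euclidean distance. The covariance term is what rescales and decorrelates the features before distances are compared.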
A total of 30 function evaluations were performed across all the distance metrics, with a total elapsed time of 63.65 s and a total objective function evaluation time of 5.24 s. Across all the distance metrics, the Mahalanobis distance provided the best classification accuracy on our dataset of two uncorrelated classes, peak and off-peak time travel. The objective function is the real-valued function whose value is to be either minimised or maximised over the set of feasible alternatives, dependent on the best performing distance in the outliers. When the objective function is to be maximised, the optimal value is the least upper bound of the objective function values over the entire feasible region. Conversely, when it is to be minimised, the optimal value is the greatest lower bound of the objective function values over the entire feasible region. For the best observed feasible point, the observed objective function value is 0, with an estimated objective function value of 1.024 × 10^−6 and a function evaluation time of 0.89 s. The second classifier was DA with a linear function. The third classifier was SVM with default RBF kernel parameter settings, except that the kernel size was set to 200. The fourth classifier was DT with the maximum number of splits set to 50. The fifth classifier was NB with its normal function. The sixth classifier was ANN, with the number of neurons set to multiple values to train weights and layers in each of the k intervals, while the remaining values were left at their defaults. In total, 840 observations of peak and off-peak time travel were used to measure the mobility prediction accuracy of the six ML classifiers, as shown in Table 2. The ANN algorithm failed to provide good classification accuracy and is listed at the bottom of the table with only 73.09%.
NB provided an overall accuracy of 86.94%, whereas the DA, SVM and DT algorithms performed similarly, each delivering an overall classification accuracy of more than 97.00%. Finally, the KNN classifier in the Energy-Aware Framework performed better than the other five classifiers, with an overall classification accuracy of 98.82%. Having obtained the best performing model for the daily bus ridership travel time scenario, we evaluated its performance against advanced evaluation metrics, as presented in Table 2. The overall daily bus ridership travel data show the highest precision, recall and F-measure, at 0.97, 0.96 and 0.97, respectively. The Receiver Operating Characteristic (ROC) curve for the best performing model (KNN) is presented in Figure 5, illustrating its classification ability against the No-Skill baseline. Overfitting occurs when bias (training loss) is reduced and variance is increased as model complexity grows: as more parameters are added to a model, its complexity rises, and variance becomes the primary concern while bias steadily falls. Our model learned the relationship on the training data and was evaluated on the test data, with 70% of the data used for training, 20% for testing and 10% for validation. Instead of training for a fixed number of epochs, we used early stopping to halt training once the validation loss started to rise.
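For reference, the precision, recall and F-measure reported above follow the standard definitions for a binary problem. The short Python sketch below computes them from true and predicted labels. The label vectors are hypothetical and are not taken from the dataset, and "peak" is assumed to be the positive class.

```python
def precision_recall_f1(y_true, y_pred, positive="peak"):
    """Return (precision, recall, F-measure) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for eight observations.
y_true = ["peak", "peak", "peak", "off", "off", "off", "peak", "off"]
y_pred = ["peak", "peak", "off", "off", "off", "peak", "peak", "off"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

The F-measure is the harmonic mean of precision and recall, so it is high only when both are high.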

Energy Saving, Benchmarking and Metrics
We evaluated the performance of the proposed QL-based cell switching algorithm with respect to a live BS. The learning rate λ_r was set to 0.3 and the discount factor φ to 0.9 [37]. The energy efficiency of the proposed QL-assisted approach was compared with three other cell switching approaches, namely NS, Greedy and ES. In the no-switching (NS) case, the SCs are always kept on, whereas in the Greedy approach the SCs are always switched off regardless of the available capacity of the MC. ES, on the other hand, goes through all possible switching options to find the one that minimises the total energy consumption of the network without exceeding the capacity of the MC. Figure 6 shows the power consumption of all the approaches.
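The three benchmark strategies can be contrasted with a minimal Python sketch. It is a simplification under stated assumptions rather than the simulation model of this paper: each active cell draws a constant power (the load-dependent model of Equation (6) is not reproduced), load is counted in users, and all numerical values and function names are hypothetical.

```python
from itertools import combinations


def network_power(sc_power, off, mc_power):
    """Total power with the SCs in `off` switched off (constant per-cell power)."""
    return mc_power + sum(p for i, p in enumerate(sc_power) if i not in off)


def exhaustive_search(sc_users, sc_power, mc_users, mc_capacity, mc_power):
    """Try every subset of SCs and keep the lowest-power one the MC can absorb."""
    best_off = frozenset()
    best_power = network_power(sc_power, best_off, mc_power)
    indices = range(len(sc_users))
    for k in range(1, len(sc_users) + 1):
        for subset in combinations(indices, k):
            offloaded = sum(sc_users[i] for i in subset)
            if mc_users + offloaded > mc_capacity:
                continue  # the MC capacity would be exceeded
            power = network_power(sc_power, frozenset(subset), mc_power)
            if power < best_power:
                best_off, best_power = frozenset(subset), power
    return best_off, best_power


# Hypothetical toy network: three SCs and one MC.
sc_users = [20, 30, 40]
sc_power = [10.0, 12.0, 15.0]
mc_users, mc_capacity, mc_power = 40, 100, 100.0

ns_power = network_power(sc_power, frozenset(), mc_power)              # all SCs on
greedy_power = network_power(sc_power, frozenset(range(3)), mc_power)  # all SCs off
greedy_overload = mc_users + sum(sc_users) > mc_capacity
es_off, es_power = exhaustive_search(sc_users, sc_power, mc_users,
                                     mc_capacity, mc_power)
```

In this toy case Greedy gives the lowest power but overloads the MC, while ES returns the cheapest feasible subset. ES evaluates 2^n subsets for n SCs (512 for nine SCs), which is the overhead that a learned policy avoids at run time.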
As expected, the Greedy approach achieves the lowest power consumption of all methods, as it does not consider the available capacity of the MC. Its superiority in terms of power consumption therefore comes at the expense of exceeding the MC capacity, which in turn degrades the QoS. The ES approach finds the best trade-off between power consumption reduction and the capacity of the MC and, as expected, the proposed QL-assisted approach converges to ES. In other words, the proposed QL-assisted method reduces the power consumption of the network without degrading the QoS. The ECRs of all approaches in the HetNet are shown in Figure 7a, while Figure 7b presents the gains of the NS, Greedy and ES approaches. As shown in Figures 6 and 7, the number of SCs is the factor that contributes most to power and energy saving. As shown in Figure 7a, the total energy consumption of the network increases almost linearly with the number of SCs. It is therefore expected that the ECR increases with the number of components. Thus, maintaining CIOs is important so that capacity is not lost for QoS demands. The results in Figure 7b show that the energy saving increases with the number of SCs, but only up to a point (when the number of SCs is 4). The reason is that the contribution of the SCs to the total energy consumption becomes more significant as their quantity increases, and thus the ECR improves by switching off the SCs. On the other hand, the energy consumption gain starts decreasing once the number of SCs exceeds a certain quantity, which is 4 in our model. This is because the capacity of the MC becomes insufficient to accommodate more users; hence, there is no more room to switch off additional SCs. In other words, once the MC reaches its capacity limit, the number of SCs that can be switched off also reaches its limit.
Consequently, the network cannot save more energy even with additional SCs, and the relative gain starts decreasing as the total energy consumption of the network increases. Finally, CO₂ emissions are directly proportional to the ECR and therefore keep increasing as the ratio increases, as shown in Figure 8. Comparing all the methods discussed in this work, the overall HetNet gain in terms of energy consumption and associated CO₂ emissions is approximately 45.63% between NS and the Greedy approach, approximately 35.60% between NS and ES and approximately 31.83% between NS and QL. For the proposed statistical methods, we can conclude that our robust framework can save a considerable amount of energy and the associated carbon emissions.
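The gains quoted above are stated relative to NS. Assuming they are percentage reductions with respect to the NS baseline (the exact formula is not given in this section), they can be computed as in the following Python sketch. The energy figures are hypothetical and in arbitrary units; they are not results from this study.

```python
def relative_gain(baseline, value):
    """Percentage reduction of `value` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - value) / baseline


# Hypothetical energy consumption per approach (arbitrary units).
energy = {"NS": 200.0, "Greedy": 110.0, "ES": 130.0, "QL": 136.0}
gains = {name: relative_gain(energy["NS"], value)
         for name, value in energy.items() if name != "NS"}
```

If CO₂ emissions are taken as proportional to energy consumption through a constant emission factor, the factor cancels in the ratio, so the same percentage applies to emissions.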

Conclusions
A novel Mobility Management-Based Autonomous Energy-Aware Framework using ML and RL techniques is proposed to address multiple challenges, including peak and off-peak passenger ridership and future location estimation supported by mobility prediction accuracies, and the energy consumption of the HetNet. The framework analyses the overall impact of HetNet CO₂ emissions in a two-tier model using a cell On-Off switching and offloading scheme. In the first part of our framework, we compare the ML algorithms discussed and show that the KNN classifier achieves a classification prediction accuracy of 98.82%. The comparative study of peak and off-peak passenger ridership and future location estimation indicates adequate robustness. In the second part of our framework, we use the RL-based QL algorithm to establish an optimal way of switching off underutilised SCs that cause unnecessary CO₂ emissions. Based on our Energy-Aware Framework, energy saving gains and associated carbon emission reductions of up to 31.83% are achieved. In future work, we will endeavour to implement further optimisation schemes for a larger number of scenarios by employing user-specific behaviours. Another promising research direction is the clustering of daily ridership in multiple bounds.