1. Introduction
The demand for dependable mobile network services is growing and is projected to keep growing in the coming years. To meet this rising demand, mobile network operators (MNOs) are expanding their networks and using a centralized network management system (NMS) to monitor the performance of the radio access network (RAN) and the core network, two critical components of mobile network infrastructure. The NMS is a network monitoring and control tool whose two essential functions are fault management and performance management [1]. Fault management aims at fault-free operation and has three aspects: fault identification, fault isolation, and fault correction. Fault identification is conducted with the help of network alarms, while fault isolation separates the network’s remaining components from the failure so that the isolated network can continue to function normally. Fault correction requires repairing or replacing failed components. Performance management, on the other hand, includes network monitoring to observe network activities and network control to take mitigating actions that improve network performance. The network manager’s performance concerns include capacity utilization, traffic volume, throughput, and response time, among others [1].
Although the NMS offers critical information through various management sub-systems, most operators find it challenging to manage the data collected from the system and take corrective actions in a timely manner. Operators select key performance indicators (KPIs) and monitor them at hourly, daily, weekly, or monthly intervals to discover problems or unusual events that might drastically affect service delivery and end-user experience. KPIs are further grouped to assess network performance; the widely used performance measures are accessibility, retainability, availability, integrity, and mobility. The NMS holds vast amounts of historical network performance data, from which trends and patterns can be revealed using data mining techniques.
In mobile networks, Markov chains are used for call admission control [2], quality of experience (QoE) modeling [3], quality of service (QoS) modeling [4], efficient resource utilization [5], prediction of user mobility [6], handover management, and network operation status monitoring [7]. In [8,9], a Markov chain is proposed to forecast radio resource control (RRC) setup success and call setup success rates (CSSR) for the Long-Term Evolution (LTE) mobile network. The status of a cell (base station), or its state in Markov terminology, is classified as “Good/High,” “Moderate/Acceptable,” or “Bad/Low” based on the RRC success rate. Data collected from an operator’s network are used to create the Markov chain–based models for RRC and CSSR future state predictions. A given cell is in one of the three states depending on the time of day, the cell’s geographic location, network capacity, and other user- and network-related factors. In addition to the Markov chain, cluster-based approaches, decision trees, and artificial neural networks were employed in [10,11,12] to estimate network accessibility-related parameters. These works and papers such as [13,14,15] addressed related performance measures for various generations of cellular mobile systems.
This paper’s primary goal is to forecast mobile network accessibility and retainability status using real-time data gathered from the NMS of a major network operator in Addis Ababa, the capital city of Ethiopia. Specifically, the data were collected on an hourly basis from 1530 cells over four months, from 1 November 2020 to 28 February 2021. The states of these two critical RAN performance parameters are defined based on the International Telecommunication Union’s (ITU’s) recommendations for network accessibility and retainability. As the cells are scattered across different geographic regions of the capital city, the K-means clustering technique is used to group cells having spatially correlated performances. The per-cluster averaged data are used to construct the Markov chain prediction model. Two approaches are used for the model formulation. In the separate approach, two Markov models are built, one for accessibility and one for retainability. In the joint approach, a single model is used to predict both parameters. Using either approach, we can compute the state of the network and the number of transitions until a steady state is reached. The main contributions of this research are as follows.
In contrast to prior attempts, we established four states [16], namely “Idle,” “Good,” “Acceptable,” and “Bad,” to conform to the ITU’s recommendations. Furthermore, the Markov chain is constructed to jointly estimate accessibility and retainability in a single operation, yielding a model with 16 states. Four-state separate estimation is employed as a benchmark for comparison. Incorporating the ITU’s recommendations for state definition and the joint prediction proposal are the unique contributions of this research.
Previous models operate only on a single cell, leaving out the spatially correlated nature of accessibility and retainability. Including more cells, however, increases the number of combined states; thus, the Markov model may not scale as the number of cells increases. As an alternative to replicating the prediction method as many times as the number of cells, we employed K-means clustering to identify related cells. The Markov chain is then applied to the per-cluster averaged data. Prediction aids in analyzing the status of the considered mobile network.
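The joint construction described above pairs each of the four accessibility states with each of the four retainability states, giving 4 × 4 = 16 combined states. A minimal sketch of such a pairing is given below. The state labels come from the text, whereas the ordering of the labels, the index formula, and the function names are assumptions made purely for illustration and may differ from the encoding used in this work.

```python
from itertools import product
from typing import Tuple

# State labels as named in the text; their ordering here is an assumption.
STATES = ["Idle", "Good", "Acceptable", "Bad"]

# All (accessibility, retainability) pairs: 4 x 4 = 16 joint states.
JOINT_STATES = list(product(STATES, STATES))


def joint_state_index(accessibility: str, retainability: str) -> int:
    """Map a pair of per-KPI states to a single index in 0..15 (hypothetical encoding)."""
    a = STATES.index(accessibility)
    r = STATES.index(retainability)
    return a * len(STATES) + r


def joint_state_label(index: int) -> Tuple[str, str]:
    """Invert joint_state_index: recover the (accessibility, retainability) pair."""
    a, r = divmod(index, len(STATES))
    return STATES[a], STATES[r]
```

With this encoding, a sequence of hourly (accessibility, retainability) observations becomes a single sequence of integers, to which one Markov chain can be fitted.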
The remainder of the paper is organized as follows. Section 2 discusses fundamental concepts and formulas in accessibility and retainability. Section 3 introduces some basic concepts of discrete-time Markov chains. Section 4 presents and discusses the results obtained. Finally, Section 5 concludes the paper and identifies possible future directions.
3. Discrete-Time Markov Chain
A Markov chain is a particular class of stochastic process with random variables designating the states or outputs of the system [7,20]. The probability of the system transitioning from its current state to a future state depends only on the current state. The collection of states forms a state space of alphabet size $N$. Let $S = \{s_1, s_2, \ldots, s_N\}$ designate the state space and let $X_0, X_1, X_2, \ldots$ be the sequence of states generated by the system in time, where $X_n \in S$ and $n$ in $X_n$ indicates the discrete-time index.
For the Markov chain fulfilling the memoryless assumption, the transition probability is expressed as [21]:
$$P(X_{n+1} = s_j \mid X_n = s_i, X_{n-1} = s_{i_{n-1}}, \ldots, X_0 = s_{i_0}) = P(X_{n+1} = s_j \mid X_n = s_i) \tag{5}$$
where $s_{i_0}, \ldots, s_{i_{n-1}}, s_i, s_j \in S$. We learn from the Markov property that only the most recent state matters to predict the next or future state. From Equation (5), the transition probability from state $s_i$ to state $s_j$ is designated as:
$$p_{ij} = P(X_{n+1} = s_j \mid X_n = s_i) \tag{6}$$
For all $s_i$ and $s_j$ in $S$, the summation of all transition probabilities in a row must be equal to one, i.e.,
$$\sum_{j=1}^{N} p_{ij} = 1, \quad i = 1, 2, \ldots, N \tag{7}$$
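The memoryless property and the row-sum constraint can be illustrated with a short simulation. The sketch below draws each next state using only the row of the transition matrix that belongs to the current state, and checks that every row is a valid probability distribution. The three-state matrix `P_EXAMPLE` and the function names are hypothetical and are not taken from the data used in this paper.

```python
import random


def is_row_stochastic(P, tol=1e-9):
    """Check that every entry is non-negative and every row sums to one."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def simulate_chain(P, start, steps, seed=0):
    """Sample a state path of length steps + 1 from transition matrix P.

    The next state depends only on the current state (memoryless property):
    it is drawn with the weights in row P[current].
    """
    rng = random.Random(seed)
    n_states = len(P)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        nxt = rng.choices(range(n_states), weights=P[current], k=1)[0]
        path.append(nxt)
    return path


# Illustrative values only (assumption), not estimated from the paper's data.
P_EXAMPLE = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]
```

For instance, `simulate_chain(P_EXAMPLE, start=0, steps=24)` produces one possible 24-step trajectory starting from the first state.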
3.1. Transition Probability Matrix
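The collection of the transition probabilities $p_{ij}$ forms the transition probability matrix (TPM), $\mathbf{P}$ (see Equation (8)). Each entry of the matrix shows the probability that the system will transition to another state or remain in the same state. $\mathbf{P}$ is a square matrix with the same dimension as the number of states.
$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{bmatrix} \tag{8}$$
The transition probability $p_{ij}$ is computed from empirical data by counting the number of transitions from state $s_i$ to state $s_j$ and dividing the result by the count of all transitions from state $s_i$ [7].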
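The counting procedure described above can be sketched as follows. Writing $n_{ij}$ for the number of observed transitions from state $i$ to state $j$, the estimate is $n_{ij} / \sum_k n_{ik}$. The function name, the integer encoding of states as 0, 1, ..., and the handling of states that are never left in the data (a uniform row, so that the matrix stays row-stochastic) are assumptions of this sketch rather than choices stated in the paper.

```python
def estimate_tpm(sequence, n_states):
    """Estimate a transition probability matrix from an observed state sequence.

    sequence: list of integer states in range(n_states), ordered in time.
    Returns a list of rows; row i holds the estimated probabilities of moving
    from state i to each state j.
    """
    # Count transitions between consecutive observations.
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1

    tpm = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a state with no outgoing transitions in the data
            # gets a uniform row so that every row still sums to one.
            tpm.append([1.0 / n_states] * n_states)
        else:
            tpm.append([c / total for c in row])
    return tpm
```

As a small check, the sequence `[0, 0, 1, 0, 1, 1, 0]` contains three transitions out of state 0 (one to state 0, two to state 1) and three out of state 1 (two to state 0, one to state 1), so the estimated rows are `[1/3, 2/3]` and `[2/3, 1/3]`.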
3.2. Initial (Probability or State) Distribution
The initial state distribution is usually expressed as a probability distribution vector, $\boldsymbol{\pi}^{(0)}$, of dimension $1 \times N$, as shown in Equation (9), with entries that indicate the probability that the system is in a given state at a given initial time. Each entry of the vector is non-negative and the sum of all the entries should be unity.
$$\boldsymbol{\pi}^{(0)} = \left[ \pi_1^{(0)}, \pi_2^{(0)}, \ldots, \pi_N^{(0)} \right], \quad \pi_i^{(0)} = P(X_0 = s_i) \tag{9}$$
Without accurate knowledge of the initial distribution, the system can be considered to be in one state with absolute certainty, i.e., probability of unity.
3.3. Steady-State Distribution
One of the fascinating aspects of systems that obey the Markov chain is that, after a sufficient number of iterations/transitions, the chain converges to a steady-state, stable, equilibrium, or static distribution [7]. A steady-state condition is one in which the probability of being in each state no longer changes from one transition to the next, regardless of the initial state.
With knowledge of the transition matrix $\mathbf{P}$ and the initial probability vector $\boldsymbol{\pi}^{(0)}$, the probability distribution of the chain after $n$ transitions in the future is given by [7]:
$$\boldsymbol{\pi}^{(n)} = \boldsymbol{\pi}^{(0)} \mathbf{P}^{n} \tag{10}$$
$\mathbf{P}^{n}$ is the result of multiplying the transition matrix $\mathbf{P}$ by itself $n$ times. Each element of $\mathbf{P}^{n}$, designated as $p_{ij}^{(n)}$, is the probability of going from state $s_i$ to state $s_j$ in $n$ iterations. As we keep iterating through state transitions by applying $\mathbf{P}$, the probability vectors $\boldsymbol{\pi}^{(n)}$ converge to some fixed value, say $\boldsymbol{\pi}$. That is called the steady-state distribution and is mathematically written in the form of Equation (11) below.
$$\boldsymbol{\pi} = \lim_{n \to \infty} \boldsymbol{\pi}^{(n)} = \lim_{n \to \infty} \boldsymbol{\pi}^{(0)} \mathbf{P}^{n} \tag{11}$$
We note from Equation (11) that the Markov chain probabilistically predicts the system’s future state based on knowledge of state space, initial distribution, and transition matrix.
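The iteration in Equations (10) and (11) can be sketched as repeated multiplication of a row vector by the transition matrix until successive distributions differ by less than a tolerance. The function names, the tolerance, and the iteration cap are assumptions of this sketch. Convergence is not guaranteed for every chain (for example, a periodic chain can oscillate), in which case the function raises an error instead of returning.

```python
def step(pi, P):
    """One transition: return the row vector pi multiplied by matrix P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(pi0, P, tol=1e-10, max_iter=10_000):
    """Iterate pi <- pi P from pi0 until the distribution stops changing.

    Returns (pi, n), where n is the number of transitions applied before the
    largest entry-wise change dropped below tol.
    """
    pi = list(pi0)
    for n in range(1, max_iter + 1):
        nxt = step(pi, P)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt, n
        pi = nxt
    raise RuntimeError("no convergence within max_iter transitions")
```

For the illustrative two-state matrix `[[0.9, 0.1], [0.5, 0.5]]`, the balance condition $0.1\,\pi_1 = 0.5\,\pi_2$ gives $\boldsymbol{\pi} = [5/6, 1/6]$, and the iteration reaches this distribution from either one-hot initial vector.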
3.4. Transition Diagram
A transition diagram, which illustrates all of the system’s transitions, is another way to display the TPM. Each node represents a state of the Markov chain, and a directed arrow shows the presence of a transition from one state to another state. The tail of the arrow is at the current state, and the arrowhead points towards the next state [7].
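A transition diagram can be derived mechanically from a TPM by emitting one directed edge for every non-zero entry. The sketch below lists these edges and renders them as Graphviz DOT text. The function names, the `min_prob` cut-off, and the choice to label each edge with its transition probability are assumptions made for illustration.

```python
def transition_edges(P, labels, min_prob=0.0):
    """Return (source, target, probability) for each entry of P above min_prob."""
    edges = []
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > min_prob:
                edges.append((labels[i], labels[j], p))
    return edges


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph with probability labels."""
    lines = ["digraph markov_chain {"]
    for src, dst, p in edges:
        lines.append(f'    "{src}" -> "{dst}" [label="{p:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

Self-transitions appear as loops on a node, which corresponds to the system remaining in the same state.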
5. Conclusions
In this paper, the two important mobile network KPIs of accessibility and retainability are predicted by formulating Markov chains with four states and sixteen states. The sixteen-state Markov chain is formulated to jointly estimate both KPIs in a single operation. Moreover, in order to capture the spatial behaviors of these KPIs, K-means clustering is applied to cluster the data from 1530 cells into 6 clusters. States are created based on threshold values set by operators, and the developed models are validated by splitting the data for training and testing. We hope the approach provides useful insight into how to use data available within an operator’s NMS to better understand the status of a network.
This work might be improved in several ways. One research area is conducting the prediction for a large number of cells in a computationally efficient manner while obtaining per-cell level information. The clustering and joint approach may not scale well as the number of cells grows. Moreover, applying the approach to other KPIs, network types, and services is an area worth exploring. Finally, future research could employ the hidden Markov model for status modeling and prediction.