Data-Driven Construction of User Utility Functions from Radio Connection Traces in LTE

In recent years, the number of services in mobile networks has increased exponentially. This increase has forced operators to change their network management processes to ensure an adequate Quality of Experience (QoE). A key component in QoE management is the availability of a precise QoE model for every service that reflects the impact of network performance variations on the end-user experience. In this work, an automatic method is presented for deriving Quality-of-Service (QoS) thresholds in analytical QoE models of several services from radio connection traces collected in a Long Term Evolution (LTE) network. Such QoS thresholds reflect the minimum connection performance below which a user gives up the connection. The proposed method relies on the fact that user experience influences the traffic volume requested by users. Method assessment is performed with real connection traces taken from live LTE networks. Results confirm that packet delay or user throughput are critical factors for user experience in the analyzed services.


Introduction
In recent years, there has been a significant increase in the number of users and services in mobile networks. This fact has led to an exponential growth in the demand for mobile services. In coming years, a tenfold increase of mobile traffic is expected, reaching 71% of total traffic on the Internet by 2022. Internet of Things (IoT) applications are one of the main causes for this increase, and by 2023, IoT devices will account for 50% of all global networked devices [1]. Moreover, new radio access technologies (e.g., 5G) have increased the complexity of mobile networks, which has been identified as a major issue for the success of future deployments [2].
Traditionally, operators have managed their networks in a Quality of Service (QoS) framework. This QoS perspective requires measuring user or network performance (e.g., accessibility, sustainability, integrity, etc.). Thus, network management must be oriented to meet requirements based on these indicators (e.g., a user throughput not less than X Mbps). Additionally, QoS requirements can be defined on a per-service basis, so that different services can use different indicators and/or meet different requirements. As an example, operators usually demand some maximum delay for real-time services (e.g., Voice over IP (VoIP)), while throughput is the most-used indicator for best-effort services (e.g., the Internet) [3].
The QoS framework, however, lacks the user's perspective, and so a good network/user performance is not always translated into a good user experience. Operators have therefore shifted their focus from network performance to end-user satisfaction (Quality of Experience (QoE)) [4]. This shift is reinforced by the success of smartphones and tablets, which has raised users' expectations, and the introduction of 5G new radio technology [5,6]. As a consequence, QoS management processes have been replaced by a more modern approach that is focused on QoE. This new paradigm has become a key differentiating factor in a competitive market in which networks and services are similar for all operators. In this new framework focused on the user's perspective, Customer Experience Management (CEM) has become an extremely important task for mobile network operators [7].
CEM aims to improve the final user experience by optimizing the use of network resources [8]. One of the main tasks involved in CEM is to find sophisticated indicators at the service level to ensure service performance is properly characterized. Unfortunately, such service performance indicators are usually not available for network operators, unless complex crowdsourcing schemes are deployed [9]. Thus, CEM tries to understand the factors influencing user quality perception with the aim of describing the relationship between measurable variables and the experience perceived by the end user (i.e., QoE modeling) [8,10]. Such variables may be human (e.g., age, education, etc.), system (e.g., resolution, throughput, delay, etc.) or context (e.g., cost, data charging gap, mobility, etc.) factors [11][12][13]. For system aspects, QoE models often consist of analytical utility functions relating network-based QoS indicators to user opinion [14]. In its simplest form, this relationship between QoS and QoE is a logarithmic [15] or exponential [16] function. This approach is followed by most frameworks for large-scale, on-line, passive monitoring for each connection [17,18]. For a comprehensive survey of objective QoE models, the reader is referred to [19].
Most QoE models include parameters reflecting QoS thresholds above/below which QoE remains constant [20]. The values of these thresholds are derived from subjective tests with real users in lab environments, which are time-consuming and may not reflect the true conditions in real life. Moreover, objective QoE models are seldom updated. However, customer expectations continuously increase as a result of handset upgrades, service diversification and new radio technologies. As a consequence, user satisfaction progressively decreases if the provided QoS remains the same. For this reason, QoE models must be continuously updated. In most cases, tuning model parameters would be enough, avoiding more complex actions, such as changing the model structure. Even so, an automatic parameter tuning process is required to avoid subjective tests.
Current mobile networks generate a huge amount of information in the form of measurements and interaction registers [21]. However, for simplicity, the majority of this information is discarded, and CEM is often performed based on limited data. Thus, operators are only focused on Configuration Management (CM), Performance Management (PM), Charge Data Record (CDR) and Customer Relationship Management (CRM) data. All this information is usually aggregated, meaning that it is impossible to identify an individual user's QoE. With the latest advances in information technologies, it is now possible to analyze massive volumes of information by using Big Data Analytics (BDA) techniques [22]. In mobile networks, BDA can improve the reaction time of management systems, allowing actions in real time and in a proactive way to improve the monitoring, control and optimization of QoE [21]. Connection traces are one of the main sources of information in mobile networks. Traces systematically register all events associated with a specific cell/user in some period of time, becoming a powerful tool for automated network performance analysis, monitoring and control [6].
In this work, a novel automatic method is presented to tune QoS thresholds in classical analytical QoE models by analyzing radio connection traces in a Long Term Evolution (LTE) system. The proposed method relies on the fact that users tend to shorten their connections when QoE is not satisfactory. Thus, the values of QoS thresholds can be inferred by detecting the loss of traffic volume for each connection as a result of unsatisfied users. The method consists of two stages: first, connections are segregated per service, based on QoS Class Identifiers (QCIs) and hierarchical clustering from connection descriptors; then, the value of QoS thresholds is estimated for each service by analyzing traffic descriptors on a per-connection basis. Method assessment is carried out by using a real trace dataset from two live LTE networks. Unlike previous approaches, the proposed data-driven method (a) can be fully automated, eliminating the need for subjective tests when deploying a new service; (b) can deal with the large diversity of system and human factors, which cannot be taken into account in lab environments; and (c) can be executed periodically to detect changes in user trends in large geographical regions.
The rest of the work is organized as follows. Section 2 introduces the use of utility functions for QoE characterization. Section 3 outlines the trace collection process in mobile networks. Section 4 describes the proposed method to adjust QoS threshold parameters in classical QoE models based on network traces. Section 5 shows the results obtained with a trace dataset taken from real LTE systems. Finally, Section 6 presents the main conclusions.

Characterization of Quality of Experience
QoE monitoring in mobile networks is a key factor for operators [23]. As the network evolves, new indicators and counters are included in network equipment with the aim of reflecting service performance (e.g., initial buffering time or web download time for video and web services, respectively). However, user experience, as a subjective matter, cannot be measured but only estimated from network and service performance indicators. For this purpose, QoE models use utility functions to map the value of network Key Performance Indicators (KPIs), reflecting QoS, to user experience [20,24].
How QoS parameters are mapped into a QoE indicator is a widely studied subject. A generic formula connecting QoE with QoS for different packet data services is described in [16]. It is assumed here that user experience remains constant at a maximum level when some upper QoS threshold is exceeded. Similarly, a minimum QoS threshold can be defined below which a user gives up the connection because of a bad experience. These statements can be formulated as

$$\mathrm{QoE} = \mathrm{QoE}_{\max} \quad \text{if } \mathrm{QoS}_i \geq \mathrm{QoS}_{i,\max}, \qquad (1)$$

$$\mathrm{QoE} = \mathrm{QoE}_{\min} \quad \text{if } \mathrm{QoS}_i \leq \mathrm{QoS}_{i,\min}, \qquad (2)$$

where $\mathrm{QoS}_i$ is the $i$-th QoS indicator, and $\mathrm{QoS}_{i,\min}$ and $\mathrm{QoS}_{i,\max}$ are its minimum and maximum thresholds.

At the same time, user experience is influenced by factors that strongly depend on the requested service. For instance, a user performing a voice call is sensitive to packet delay, whereas a user uploading a photo in a social network is more sensitive to throughput [25]. Thus, different user utility functions are defined for each service [20]. To aid comparison, QoE is commonly measured as the Mean Opinion Score (MOS). The MOS scale ranges from 1 (worst experience) to 5 (best experience), i.e.,

$$\mathrm{MOS}^{(s)} = f^{(s)}\left(\mathrm{QoS}_1, \ldots, \mathrm{QoS}_N\right) \in [1, 5], \qquad (3)$$

where superscript s refers to the service under consideration (i.e., s ∈ {web, video, . . .}) and N is the number of QoS indicators considered. From (3), it follows that users of different services can experience a different QoE with the same network performance (QoS). Consequently, different QoS requirements must be achieved to guarantee the same MOS for all services in a mobile network [20].
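As an illustration of such a utility function, the following minimal Python sketch maps a single "higher is better" QoS indicator (e.g., throughput) to a MOS value that saturates at both thresholds. The logarithmic shape between the thresholds is an assumption borrowed from the simplest models cited in the Introduction, and the function and parameter names (`utility_mos`, `qos_min`, `qos_max`) are hypothetical, not taken from the original work.

```python
import math

MOS_MIN, MOS_MAX = 1.0, 5.0  # MOS scale stated in the text


def utility_mos(qos, qos_min, qos_max):
    """Map a 'higher is better' QoS value to a MOS in [1, 5].

    Assumption: a logarithmic curve between the two thresholds; the text
    only states that QoE stays constant outside them.
    """
    if qos_min <= 0 or qos_max <= qos_min:
        raise ValueError("thresholds must satisfy 0 < qos_min < qos_max")
    if qos <= qos_min:
        return MOS_MIN  # degraded state: worst experience
    if qos >= qos_max:
        return MOS_MAX  # saturation: more QoS brings no extra QoE
    # Position between the thresholds on a logarithmic scale (0..1).
    frac = math.log(qos / qos_min) / math.log(qos_max / qos_min)
    return MOS_MIN + (MOS_MAX - MOS_MIN) * frac
```

For an indicator where lower values are better (e.g., packet delay), the same shape could be applied to the inverse of the indicator; a separate set of thresholds would be kept per service.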
QoS thresholds give extremely valuable information to network operators, as it is not worthwhile to increase QoS beyond/below a certain threshold if there is no impact on user experience. Unfortunately, the value of QoS thresholds per service, $\mathrm{QoS}^{(s)}_{i,\min/\max}$, is highly dependent on many factors, such as user expectation (which is not the same for all users), handset features (the user expects a better experience for a more expensive terminal) or network evolution (a specific level of user experience previously seen as acceptable may not be so some months later). All these factors make it very difficult for operators to find precise QoS thresholds for their networks. Nonetheless, approximating these thresholds is still useful for operators as it allows them to assess the overall cell performance from a user experience perspective. From these thresholds, operators can trigger corrective actions to have an impact on the overall user experience (e.g., ensuring some minimum user QoE). In this work, we take advantage of the fact that the minimum threshold, $\mathrm{QoS}_{i,\min}$, often reflects the QoS below which the user gives up the connection [16]. Thus, $\mathrm{QoS}_{i,\min}$ can be inferred from user behavior observed in connection traces.

Trace Collection Process
Monitoring the QoE of individual users can only be done by collecting QoS indicators for each connection. Such information is only available in connection traces, containing signaling messages (a.k.a. events) exchanged between every single piece of user equipment (UE) and base station. The structure of events consists of a header and a message container made up of different attributes, referred to as event parameters. The header provides general information (e.g., timestamp, base station, user, event type, among others), whereas attributes stored in the message container are specific to the event. Depending on the network entities involved, events can be external or internal. External events consist of signaling messages exchanged through network interfaces via standard protocols [26][27][28], whereas internal events store vendor-specific information about the performance of the base stations (known as evolved Node Bs (eNBs) in LTE). Events selected by the network operator are registered in a Data Trace File (DTF) for each cell, which is generated after each reporting period (currently, 15 min). Two types of DTFs are distinguished: UE Traffic Recording (UETR) and Cell Traffic Recording (CTR) [29]. UETRs gather events from specific users identified by International Mobile Subscriber Identity (IMSI), while CTRs store cell performance information by monitoring many anonymous connections [30]. In this work, CTRs are used to collect QoS indicators that reflect the average performance of each cell in the network.
A high-level view of the architecture for trace reporting in LTE can be found in [30]. The operator starts the trace collection process by preparing a Configuration Trace File (CTF) in the Operations Support System (OSS). A CTF consists of (a) the event(s) to be monitored, (b) the particular UE(s) or ratio of anonymous users to be monitored, (c) the Reporting Output Period (ROP), (d) the maximum number of traces activated simultaneously in the OSS and (e) the time period when trace collection is enabled. Once trace collection is enabled, UEs transfer their event records to their serving eNB. After finishing the ROP, DTFs are generated by the eNB and then sent to the OSS asynchronously.
Trace files are binary files encoded in ASN.1 format [29]. Trace decoding is performed by a parsing tool that decodes, synchronizes and correlates events to extract the information contained in fields and compute the required network indicators, as described later.
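To make the event structure concrete, the sketch below shows one possible in-memory representation of a decoded event (header fields plus event-specific parameters) and a helper that groups events per connection. It is only an illustration: the class and field names are hypothetical, actual trace formats are vendor-specific, and identifying a connection by the pair (base station, user) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    # Header fields named in the text; attribute names are hypothetical.
    timestamp: float
    base_station: str
    user: str
    event_type: str
    # Event-specific attributes stored in the message container.
    params: dict = field(default_factory=dict)


def group_by_connection(events):
    """Group decoded events per (base station, user), ordered by timestamp."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[(ev.base_station, ev.user)].append(ev)
    for evs in grouped.values():
        evs.sort(key=lambda ev: ev.timestamp)
    return dict(grouped)
```

Per-connection indicators (e.g., traffic volumes or delays) would then be computed from the ordered list of events of each connection.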

Estimation of QoS Thresholds on a Per-Service Basis
A novel method to automatically estimate QoS thresholds for different services is described in this section. In this work, only the threshold that determines the worst network performance tolerated by users before terminating the connection is estimated. Depending on the service, this critical value is a minimum (e.g., for throughput) or a maximum (e.g., for delay) of the indicator. Estimation is carried out by a heuristic approach based on user behavior observed in connection traces. The inputs to the method are the following descriptors, collected for each connection: (a) the QCI value; (b) the Radio Resource Control (RRC) connection time; (c) the total downlink (DL) and uplink (UL) traffic volume at the Packet Data Convergence Protocol (PDCP) level; (d) the DL traffic volume ratio transmitted in the last transmission time intervals (TTIs) [31]; (e) the DL activity ratio, computed as the ratio between active TTIs (i.e., those with data to transmit) and the effective duration of the connection; (f) the DL session throughput, computed as the volume transmitted in the DL divided by the effective duration of the connection; (g) the mean downlink delay, τ, defined as the sum of DL mean connection delays in Radio Link Control (RLC) and Medium Access Control (MAC) layers; and (h) the mean DL PDCP connection throughput, $\mathrm{TH}_{\mathrm{PDCP,DL}}$, excluding the last TTIs. The output of the method is an estimate of the QoS threshold for each indicator i and service s, $\mathrm{QoS}^{(s)}_{i,\mathrm{th}}$.
Two main steps are required to estimate QoS thresholds for each service: (1) the classification of connection traces on a service basis and (2) the estimation of QoS thresholds for each service by analyzing user behavior.

Step 1: Classification of Connection Traces
Due to the coexistence of multiple services with very different requirements, cellular operators are forced to classify traffic for each service to offer differentiated access and resource management [32]. In LTE, services are distinguished by their QCI value [33]. Then, different traffic management priorities and policies (e.g., scheduling weights, queue thresholds, link-layer protocol configuration, etc.) are applied depending on QCI. In current networks, services are commonly classified as QCI 1 (VoIP), QCI 2 (conversational video), QCI 3 (real-time gaming), QCI 4 (non-conversational video), QCI 5 (IMS signaling) and QCIs from 6 to 9 (services based on the Transmission Control Protocol without a guaranteed bit rate) [33]. In particular, QCI labels 6 to 9 include a mix of services, ranging from social networks to buffered streaming, which have very different QoS requirements from a QoE perspective. Moreover, some operators assign these last QCI values for user prioritization purposes (i.e., plan vs. pre-paid). Thus, it is very difficult to monitor the experience of each specific service based on counters in the network management system, even if these are segregated per QCI. Consequently, a more accurate traffic classification is needed for QCIs 6-9.
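The first-level segregation by QCI described above can be sketched as a simple lookup, with connections in QCIs 6-9 set aside for the finer classification discussed next. The mapping follows the QCI list given in the text; the function names are hypothetical and the snippet is only a minimal illustration.

```python
# QCI-to-service mapping as listed in the text.
QCI_SERVICE = {
    1: "VoIP",
    2: "conversational video",
    3: "real-time gaming",
    4: "non-conversational video",
    5: "IMS signaling",
}


def coarse_service(qci):
    """Return the service label for a QCI, or None when QCI alone is ambiguous."""
    if qci in QCI_SERVICE:
        return QCI_SERVICE[qci]
    if 6 <= qci <= 9:
        return None  # mixed non-GBR traffic: needs descriptor-based clustering
    raise ValueError(f"unexpected QCI: {qci}")


def split_connections(connections):
    """Split (connection_id, qci) pairs into labeled and to-be-clustered sets."""
    labeled, to_cluster = {}, []
    for conn_id, qci in connections:
        label = coarse_service(qci)
        if label is None:
            to_cluster.append(conn_id)
        else:
            labeled[conn_id] = label
    return labeled, to_cluster
```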
In recent years, several methods for data traffic classification have been proposed. The simplest method is to identify the connection port [34]. However, currently, several applications use non-standard ports, and port assignment is often dynamic, meaning that there is no unequivocal relationship between a port number and service. More refined methods for traffic classification are based on the analysis of information exchanged along the session [35]. Such an approach cannot be applied for encrypted traffic services. Moreover, even for non-encrypted services, all these methods rely on information from high protocol layers, which can only be accessed by expensive network probes [36].
An option to solve these limitations consists of analyzing payload-independent flow characteristics. These methods exploit the fact that different applications show different features in their traffic that can be classified with Machine Learning (ML) techniques. Encrypted traffic classification has been extensively covered in the literature. In [37], a supervised learning algorithm is used to identify fingerprints of Android apps from their encrypted network traffic. However, supervised schemes require a labeled training dataset. Other alternatives use unsupervised learning algorithms to classify connections without the need of a previously-labeled dataset [38,39]. In [38], an unsupervised method for offline coarse-grained traffic classification in cellular radio access networks is presented. This method relies on the fact that the identification of the class of service for a specific connection can be performed from a set of traffic descriptors showing the properties of data bursts in the connection. Unfortunately, radio connection traces do not explicitly register these traffic descriptors at the burst level, so that they must be estimated from other traffic parameters collected per connection. In the absence of labeled data that could be used as ground truth, the authors in [38] validate their method by comparing the traffic mix resulting from their classification algorithm against mobile traffic statistics published by a vendor. Results show that traffic shares per application class estimated by the proposed method are similar to those provided by a vendor report.
The above-described method is used in this work in the absence of a large dataset of real traces that includes the service requested by the user for each radio connection, due to the difficulty of combining data from the radio access and core domains. To this end, the following traffic descriptors are collected per connection:
• The RRC connection time;
• The total DL traffic volume at the Packet Data Convergence Protocol (PDCP) level;
• The UL traffic volume ratio, $\eta_{\mathrm{UL}}$ [%], computed as the UL traffic volume divided by the total (UL plus DL) traffic volume;
• The DL traffic volume ratio transmitted in last TTIs, $\eta^{\mathrm{lastTTI}}_{\mathrm{DL}}$, computed as the DL traffic volume transmitted in last TTIs divided by the total DL traffic volume;
• The DL activity ratio, $\eta^{\mathrm{active}}_{\mathrm{DL}}$, computed as the ratio between active TTIs and the effective duration of the connection;
• The session DL throughput, $\mathrm{TH}^{\mathrm{session}}_{\mathrm{DL}}$ (in bps).
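The descriptors listed above can be derived from a handful of per-connection counters. The following sketch shows one way to do so; all argument names are hypothetical, the ratio definitions (UL volume over total volume, last-TTI DL volume over total DL volume) are assumptions based on the descriptor names, and the 1 ms TTI is the standard LTE subframe duration.

```python
def traffic_descriptors(rrc_time_s, effective_time_s, vol_dl_bytes, vol_ul_bytes,
                        vol_dl_last_tti_bytes, active_ttis, tti_s=0.001):
    """Compute per-connection traffic descriptors from raw counters.

    Assumed definitions (not confirmed by the source):
      eta_ul          = 100 * V_UL / (V_UL + V_DL)
      eta_last_tti_dl = 100 * V_DL_lastTTI / V_DL
    """
    if effective_time_s <= 0:
        raise ValueError("effective duration must be positive")
    total = vol_dl_bytes + vol_ul_bytes
    return {
        "rrc_time_s": rrc_time_s,
        "vol_dl_bytes": vol_dl_bytes,
        # Share of the total volume sent in the uplink [%].
        "eta_ul": 100.0 * vol_ul_bytes / total if total else 0.0,
        # Share of the DL volume sent in last TTIs [%].
        "eta_last_tti_dl": (100.0 * vol_dl_last_tti_bytes / vol_dl_bytes
                            if vol_dl_bytes else 0.0),
        # Fraction of the effective duration with data to transmit.
        "eta_active_dl": active_ttis * tti_s / effective_time_s,
        # DL volume over the effective duration, in bps.
        "th_session_dl_bps": 8.0 * vol_dl_bytes / effective_time_s,
    }
```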
Then, burst level parameters required for traffic classification are estimated for each connection from the set of traffic descriptors listed above. From these parameters, connections are divided into groups by hierarchical clustering. Finally, the resulting groups are associated with broad application groups by analyzing the median value of traffic descriptors for connections in each group.
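The grouping and labeling steps can be illustrated with a small, dependency-free sketch of agglomerative (hierarchical) clustering followed by the computation of per-group medians. The average-linkage criterion and the Euclidean distance are assumptions made for the example, since the text does not specify them, and in practice the input features would typically be normalized first. This naive implementation is only suitable for small samples.

```python
import math
from statistics import median


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, ia, ib = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ib > ia, so index ia is unaffected
    return clusters


def cluster_medians(points, clusters):
    """Median of each feature per cluster, used to label the groups."""
    n_dims = len(points[0])
    return [
        [median(points[i][d] for i in members) for d in range(n_dims)]
        for members in clusters
    ]
```

Each group would then be associated with an application class by inspecting its medians (e.g., a group with large volumes and a high activity ratio would point to full-buffer traffic).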

Step 2: Estimation of Minimum QoS Thresholds
As explained in Section 2, each service has its own user utility function, $f^{(s)}$, combining different QoS indicators. In this work, the analysis is restricted to application groups that have a significant share of connections and are affected by QoS; namely, Voice over LTE (VoLTE), full-buffer data services (e.g., app download, software update, large file download via File Transfer Protocol, etc.) and streaming services (e.g., audio/video, live/buffered, etc.). For simplicity, only the QoS indicator with the largest impact on QoE is considered for each service (i.e., N = 1 ∀ s in (3)). This indicator is not necessarily the same for all services. For instance, packet delay negatively affects user experience for real-time services (e.g., VoLTE or conversational video-streaming), whereas user experience in non-real-time services (e.g., app download) is more sensitive to user data throughput. Previous works have shown that user experience in most services is dominated by a single QoS metric. For instance, in [40], an analytical model to estimate the QoE for a video-streaming service based on different network level metrics (e.g., average session throughput, packet loss ratio and round-trip time) is presented. It is shown there that QoE is strongly correlated with a single QoS metric (average session throughput). On the other hand, it is well accepted that voice calls are mostly affected by packet delay [41]. For this reason, user experience is estimated here from the foremost QoS indicator of the requested service in order to reduce the complexity of the proposed model.
Hereafter, it is assumed that the QoE of a connection k of service s, $\mathrm{MOS}^{(s)}(k)$, is conditioned by the value of the indicator i with the largest impact for that service, $\mathrm{QoS}^{(s)}_i(k)$. The way in which a low QoE alters user behavior depends on the service. For instance, unsatisfied VoLTE users tend to shorten their connections, and the effect is therefore observed in connection length. In contrast, in non-real-time services, whether background, interactive or streaming, the effect is more evident in the traffic volume for each connection. As a consequence, an analysis of an additional, service-specific traffic indicator (e.g., connection length for VoLTE or data volume for streaming services) is needed in order to detect those low-QoE connections. This traffic indicator is denoted as $T^{(s)}(k)$. From the relationship between $\mathrm{QoS}^{(s)}_i$ and $T^{(s)}$, the minimum QoS threshold, $\mathrm{QoS}^{(s)}_{i,\mathrm{th}}$, is estimated. This minimum QoS threshold determines a boundary between two states: a degraded state, where a user perceives a bad service performance and tends to stop the connection, and a normal state, where service performance is good enough to consume the service normally. As this boundary highly depends on service, the following paragraphs anticipate the ideal user behavior for broad service classes. To this end, Figure 1 shows the expected relationship between the selected QoS and traffic indicators, i.e., $\mathrm{QoS}^{(s)}_i$ and $T^{(s)}$.
In full-buffer data services, all data are available at the beginning of the connection, meaning that the associated traffic pattern consists of a few, very long bursts in which data are transmitted at full speed. Thus, the user terminal demands as many resources as possible until all the data are transmitted. It is assumed here that the user tends to give up the session when the download time exceeds a certain threshold. Such an action should be reflected both in connection duration and traffic volumes per connection, as shown in Figure 1a.
The x-axis represents the mean DL PDCP connection throughput, measured only considering active (and non-last) TTIs, which is selected as the QoS indicator with the largest impact on QoE for these services. The primary y-axis represents the connection duration, while the secondary y-axis represents the total DL data volume per connection. The solid curve represents the median of the distribution of connection duration, whereas the dashed line represents the median of the distribution of the total DL data volume. For clarity, the shaded area labeled as the degraded state comprises connections whose link conditions are unacceptable for the user, which are more likely to be interrupted. As observed in the figure, it is expected that users will try to maintain a connection until a maximum duration is reached. On the right of the figure, as the link performance improves, the connection duration is reduced, since data are transmitted faster. In contrast, the data volume per connection remains constant, since it is not conditioned by link performance beyond a certain point; below that point, the user ends the connection before downloading the complete data. Thus, the minimum QoS threshold, $\mathrm{TH}_{\min}$, in full-buffer data services is estimated as the average DL PDCP throughput below which connection duration drops.
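The median curves described above can be obtained by binning connections according to the QoS indicator and taking the median of the traffic indicator in each bin. The sketch below is a minimal illustration of that step; the function name, the fixed bin width and the `min_count` filter are hypothetical choices, not details given in the text.

```python
from collections import defaultdict
from statistics import median


def binned_medians(samples, bin_width, min_count=1):
    """Median of the traffic indicator per QoS bin.

    samples: iterable of (qos_value, traffic_value) pairs, one per connection.
    Returns a list of (bin_center, median_traffic) sorted by bin_center.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    bins = defaultdict(list)
    for qos, traffic in samples:
        bins[int(qos // bin_width)].append(traffic)
    return [
        ((idx + 0.5) * bin_width, median(values))
        for idx, values in sorted(bins.items())
        if len(values) >= min_count  # skip bins with too few connections
    ]
```

Calling the function once with connection duration and once with DL data volume as the traffic value would yield the two curves of a plot such as Figure 1a.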
Streaming services are also affected by user throughput, meaning that the selected QoS indicator is again DL PDCP throughput. However, a different behavior is expected for connection duration and data volume. Streaming sessions consist of long connections with large data volume distributed in many bursts. Unlike full-buffer data services, streaming services are elastic, meaning that a good link performance does not necessarily lead to a reduction of session duration. Thus, connection duration may not be a good traffic indicator to reflect user behavior. Instead, DL session throughput, calculated by dividing the total DL data volume by the connection duration (including silent periods), may reflect the quality of the downloaded material. Figure 1b shows the expected impact of user behavior for streaming services, representing the relationship between DL PDCP throughput and DL session throughput. The solid line represents the median session throughput and the shaded area defines the degraded state. As shown, in the degraded state, the session throughput decreases as the DL PDCP throughput decreases. Once the DL PDCP throughput is good enough, the session throughput remains constant, showing that the latter is not conditioned by the former. Thus, the minimum QoS threshold, $\mathrm{TH}_{\min}$, for streaming services is the value of the DL PDCP throughput below which the median session throughput starts to decrease.
In a VoLTE service, the connection duration is the most representative indicator for the characterization of user behavior. However, unlike full-buffer data, the QoS indicator with the strongest impact on QoE is packet delay. Figure 1c shows the expected impact of user behavior in VoLTE by representing the variation of connection duration caused by changes in DL packet delay. As in previous sub-figures, the solid line represents the median of connection duration and the shaded area is the degraded state. It is observed that the median connection duration should drop when the DL packet delay increases above a certain limit. Thus, the QoS threshold, $\tau_{\max}$, for VoLTE is the value of average DL packet delay above which the median connection duration starts to decrease.
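The three cases share a common pattern: the median of the traffic indicator stays on a plateau in the normal state and falls in the degraded state. A possible way of locating the boundary automatically is sketched below. This is a heuristic written for illustration, not the procedure of the original work: the plateau is taken as the median over the better half of the QoS bins, and the 10% tolerance and the function name are arbitrary assumptions.

```python
from statistics import median


def estimate_threshold(curve, higher_is_better=True, tolerance=0.1):
    """Return the QoS value at the edge of the plateau, or None if no drop is seen.

    curve: list of (qos_bin_center, median_traffic_indicator) pairs.
    higher_is_better: True for throughput-like indicators, False for delay.
    """
    if not curve:
        return None
    # Order bins from best QoS to worst QoS.
    ordered = sorted(curve, key=lambda p: p[0], reverse=higher_is_better)
    # Plateau level: median over the better half of the bins (heuristic).
    best_half = ordered[: max(1, len(ordered) // 2)]
    plateau = median(value for _, value in best_half)
    threshold = None
    for qos, value in ordered:
        if value < (1.0 - tolerance) * plateau:
            return threshold  # last bin that was still on the plateau
        threshold = qos
    return None  # the indicator never left the plateau
```

With `higher_is_better=True` the function returns a minimum throughput; with `higher_is_better=False` it returns a maximum delay. A `None` result corresponds to a service for which no threshold can be obtained from the available data.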
It is envisaged that, in real networks, some services may not be fully represented by the three above cases. For instance, web service or social networks might show different behaviors depending on the size of their objects. Likewise, live streaming may have strict latency requirements.

Performance Assessment
In this section, the above-described method to estimate QoS thresholds on a service basis is tested with a set of radio connection traces taken from two live LTE networks. For clarity, the analysis set-up is first explained and results are presented later. Finally, implementation issues are discussed.

Analysis Set-Up
Two independent datasets are generated from anonymous traces collected in two different LTE systems. Both systems are mature enough to provide a large set of connections with a varying QoS to derive the required QoS thresholds. Dataset 1 is collected in 1960 LTE cells covering an urban area of 3900 km². Specifically, traces are collected during two hours (from 10:00 a.m. to 12:00 p.m.), resulting in 48,683 connections: 43% of connections in QCI 1 and 57% in the range of QCIs 6-9. On the other hand, dataset 2 is collected from 10:00 to 11:00 a.m. in 145 LTE cells covering 125 km² in an urban area, resulting in 10,123 connections, all of which have QCIs between 6 and 9.
Traces are processed to obtain the traffic descriptors for each connection needed for traffic classification, as defined in Section 4.1. Then, connections are classified with the unsupervised learning method described in the same subsection. After classification, 8% of connections are labeled as full-buffer data services, 5% as streaming, 35% as VoIP, 5% as web browsing with large objects and 47% as web browsing with small objects or social networks.
For full-buffer data services, the throughput axis is adjusted to low values (below 10 Mbps) to better identify the boundary between the two states specified in Figure 1a. The results confirm the expected impact of user behavior, since, for a low DL PDCP throughput, the DL data volume decreases and the connection duration stagnates. The minimum QoS threshold can be determined as the $\mathrm{TH}_{\min}$ value marking the boundary between the two states.
For web browsing with large objects, user behavior should, a priori, be close to that in full-buffer services. However, the DL data volume seems not to be greatly affected by changes in DL PDCP throughput. This is due to the fact that web sessions manage a lower amount of data for each connection than full-buffer data services and thus the link performance must be much worse for the user to notice this degradation. Based on the available data, a minimum QoS value for this service cannot be obtained.
Finally, Figure 6 shows the analysis of web browsing with small objects and social networks. Each point in the figure represents a connection identified as these services. The connection duration (CD), represented by the solid line, does not show changes regardless of throughput values. This is due to the fact that these services manage a very small amount of data for each connection. As a consequence, user satisfaction relies more on successful data transactions than on the connection duration. Thus, only extremely bad link conditions would impact CD.

Implementation Issues
The method is designed as a centralized scheme that can be integrated into OSS platforms. Due to its simplicity, its computational load is relatively low. The theoretical time complexity increases linearly with the number of analyzed connection traces. In practice, the most time-consuming processes are trace pre-processing, which can be done with trace processing tools provided by OSS vendors, and classification, which is performed by an unsupervised algorithm and can be implemented, along with the rest of the method, in any programming language (in this work, Matlab [42]). Specifically, the total execution time for the considered datasets in a 2.6-GHz quad-core processor laptop is less than 5462 s (92 s per 1000 connections).

Conclusions
In this paper, a novel automatic method for estimating QoS thresholds to be integrated in user utility functions on a per-service basis in an LTE system is proposed. The method relies on the collection of radio connection traces. In the first stage, connection traces are classified into application groups based on QCI and traffic descriptors registered per connection. Then, a minimum QoS threshold is inferred on a per-service basis by analyzing the QoS indicator with the largest impact on user experience and the traffic indicator that best reflects user behavior. The method has been tested with traces taken from live LTE networks, resulting in a minimum DL user throughput of 5 Mbps for full-buffer data services, 30 Mbps for streaming services and a maximum DL packet delay of 20 ms for VoIP services. The proposed data-driven method can be fully automated, eliminating the need for time-consuming subjective tests. Likewise, it can deal with the large diversity of system and human factors, which cannot be taken into account in lab environments. Due to its low computational load, it can be executed periodically to track changes in user trends. Additional analysis can be extended to 5G and broadband Internet satellite systems to check the impact of network capabilities on general user behavior.

Funding:
This work has been funded by the Spanish Ministry of Science, Innovation and Universities (RTI2018-099148-BI00), the Junta de Andalucía (UMA18-FEDERJA256) and Ericsson Spain.

Data Availability Statement:
Restrictions apply to the availability of these data. Data were obtained from Ericsson Spain and are available from A.J.G. with the permission of Ericsson Spain.

Conflicts of Interest:
The authors declare no conflict of interest.
