An Approach to Data Analysis in 5G Networks

Abstract: 5G networks are expected to provide significant advances in network management compared to traditional mobile infrastructures by leveraging intelligence capabilities such as data analysis, prediction, pattern recognition and artificial intelligence. The key idea behind these capabilities is to facilitate the decision-making process in order to solve or mitigate common network problems in a dynamic and proactive way. In this context, this paper presents the design of the Self-Organized Network Management in Virtualized and Software Defined Networks (SELFNET) Analyzer Module, whose main objective is to identify suspicious or unexpected situations based on metrics provided by different network components and sensors. The SELFNET Analyzer Module provides a modular architecture driven by use cases, where analytic functions can be easily extended. This paper also proposes the data specification that defines the data inputs to be taken into account in the diagnosis process. This data specification has been implemented with different use cases within the SELFNET Project, proving its effectiveness.


Introduction
5G networks are expected to provide a secure, reliable and high-performance environment with minimal disruptions in the provisioning of advanced network services, regardless of the device location or when the service is required [1]. This new network generation will be able to deliver ultra-high capacity, low latency and better Quality of Service (QoS) compared with current Long Term Evolution (LTE) networks [2]. In order to provide these capabilities, 5G proposes the combination of advanced technologies such as Software Defined Networking (SDN) [3,4], Network Function Virtualization (NFV) [5,6], Cloud Computing [7], Self-Organized Networks (SON) [8,9], Artificial Intelligence, Big Data [10–12] and Device to Device Communications (D2D), among others [13–17]. In particular, 5G will be able to face unexpected changes or network problems by identifying specific situations while taking into account the user needs and the Service Level Agreements (SLAs).
Nowadays, the main telecommunication operators and the research community are working on strategies to facilitate the decision-making process when specific events or situations compromise the health of 5G networks [18,19]. Meanwhile, the concept of situational awareness (SA) and incident management models applied to 5G networks are also an emerging topic [20,21]. In this context, the Self-Organized Network Management in Virtualized and Software Defined Networks Project (SELFNET) [22] combines SDN, NFV and SON concepts to provide a smart autonomic management framework that analyses and resolves network problems and improves the QoS and Quality of Experience (QoE) of end users. In order to facilitate the decision-making process, SELFNET proposes an analysis phase to diagnose and predict possible problems in 5G networks.
This paper presents the design of the SELFNET Analyzer Module, whose main objective is to diagnose the network state and infer data from monitored low-level and aggregated metrics in order to facilitate proactive responses. The contributions of this proposal include: (i) the description of diagnosis and prediction capabilities in 5G environments and how they are being applied in current research and projects; (ii) the introduction of the SELFNET Analyzer architecture, its design principles and requirements; and (iii) the definition of the data specification, as well as examples, to obtain the initial parameters for diagnostic purposes. This document is organized into eight sections, the first being the present introduction. Section 2 describes 5G requirements, related works, and the main characteristics and analytic capabilities of the SELFNET Project. Section 3 outlines the design principles, requirements and architecture of the Analyzer Module as a whole. Then, Section 4 presents this module as a black box, emphasizing its data inputs and outputs. Section 5 formally defines how the data must be specified. Section 6 illustrates examples of the data specification and their workflows. Section 7 discusses the main contributions of this proposal. Finally, conclusions and future work are presented in Section 8.

Background
This section describes how analysis and intelligent capabilities can address 5G requirements, the related work and research projects, emphasizing the main features of SELFNET Project.

Diagnosis Capabilities in 5G Networks
A 5G network envisages an architecture able to cover three main domains [1]: (i) enhancement of radio capabilities to enable the spectrum optimization, the interference coordination and cost-effective dense deployments; (ii) provisioning of an effective network management environment to create and deploy a common core to support several use cases in a cost-effective manner; and (iii) simplification of the system operations by means of automated procedures, where the introduction of new capabilities or network functions should not imply increased complexity on operations and management tasks.In order to tackle these requirements, 5G networks take advantage of the separation between data and control plane (network programmability) offered by SDN architectures [23], the deployment of virtualized network functions, the scalability and flexibility in the service provisioning based on cloud environments, enabling high capacity and massive communications (cognitive radio, carrier aggregation, Machine to Machine Communication), spectrum and resource optimization (millimeter wave and massive Multiple Input Multiple Output (MIMO)) and intelligent capabilities provided by artificial intelligence or self-organization concepts [24,25].
In particular, the introduction of analysis and intelligent capabilities [8,19] could be applied to several domains such as autonomic network maintenance, automation in the provisioning of services, prediction and remediation of congestion or queue utilization, detection of security threats, improvement of network efficiency, multi-cell coordination, provisioning of high QoS and QoE for services, etc. For this purpose, analysis and intelligent capabilities make it possible to respond to network problems based on pattern recognition, to dynamically select the best location where services can be deployed or migrated, to share and release resources based on forecasting methods, and to build context-awareness models based on real-time information from the network, its devices and applications. In order to provide intelligence and facilitate the decision-making process, two tasks must be performed. On the one hand, the analysis stage identifies network situations and events. These situations do not necessarily imply (a priori) a harmful nature. On the other hand, the decision-making task determines whether a specific situation is a risk for the health of the network or its components, and then applies the respective countermeasures.
In this context, traditional approaches apply different analysis and reasoning techniques, such as Bayesian Networks (BN) [26], in order to provide intelligence to common network management tasks. However, these models are not sufficient to guarantee the network performance according to SLAs and future 5G user needs [1]. There are some proposals to address data analysis in 5G systems and their elements, such as access and radio components [13,27], network devices [28], cloud elements [29] and resource allocation [30]. In [31], a prototype that performs mobile network analysis based on Markov Logic Networks (MLN) and semantic web technologies is presented. This approach allows optimization and network status characterization, but does not explain how it covers heterogeneous data sources. For their part, Imran et al. [10] propose a framework to provide a full view of the network status based on machine learning and big data concepts. To this end, their proposal predicts the user behaviour and dynamically associates the network response with the network parameters. However, it does not specify how to deal with SDN or NFV components.
Meanwhile, there are reports [1,32] and projects [21,33–37] that introduce analysis capabilities to cover 5G requirements. In this way, the METIS Project [28] takes into account the SON concept in order to provide a new level of adaptability to 5G infrastructures. 5G-NORMA [35] introduces adaptive capacities to allocate network functions based on user and traffic demands over time and location. The CHARISMA project [36] deploys an intelligent cloud radio access network and end devices. For its part, 5G-Ensure [21] proposes a 5G secure system based on risk assessment and mitigation methodologies. COGNET [37] takes into account machine learning, SDN and NFV to provide dynamic adaptation of network resources. Finally, a holistic approach that addresses not only the analysis component but also the whole cycle of incident management in 5G networks is proposed in [20]. This work applies the three stages of information processing of the Endsley model [38] to 5G networks: perception, comprehension and projection. In the perception phase, the monitoring and collection of different metrics from the network infrastructure (and its elements) are performed. Then, in the comprehension stage, this information is associated and correlated in order to provide enhanced metrics to be analysed (projection phase). The analysis component includes the diagnosis and prediction of the whole state of the system. In general terms, these proposals help to tackle 5G requirements, but they do not offer a generalized approach able to take into account several kinds of metrics from heterogeneous data sources, which is what the SELFNET Project [22] provides.

SELFNET Project
The SELFNET H2020 Project [22] aims to provide an autonomic network management framework for 5G mobile network infrastructures through the integration of novel technologies such as SDN, NFV, SON, Cloud Computing and Artificial Intelligence. SELFNET enables both autonomic corrective and preventive actions to mitigate existing or potential network problems while providing scalability and extensibility and reducing capital expenditure (capex) and operational expenditure (opex). These capabilities are provided through a layered architecture and a use-case-driven approach, as detailed in [34]. The SELFNET architecture addresses major network management problems, including self-protection capabilities against distributed cyber-attacks, self-healing capabilities against network failures, and self-optimization to dynamically improve the performance of the network and the QoE of the users. For this purpose, SELFNET defines two kinds of advanced network functions: (i) sensors to monitor specific information from the network and (ii) actuators to address or mitigate possible problems. In particular, the network intelligence is provided by the SON Autonomic Layer. This layer collects metrics related to the network behaviour and uses that information to infer the network status. Then, it decides the actions to be executed to accomplish the system goals. The SON Autonomic Layer is composed of two sublayers: (i) the Monitor and Analyzer Sublayer and (ii) the Autonomic Management Sublayer. The Monitor and Analyzer Sublayer follows the Endsley Situational Awareness principles: the Monitoring and Discovery, Aggregation and Correlation, and Analyzer modules correspond to the Perception, Comprehension and Projection functions, as shown in Figure 1. Regarding the Analyzer Module, its main goal is to infer data from the monitored metrics in order to facilitate proactive responses over the network infrastructure (i.e., enhance diagnosis and decision-making tasks). Therefore, the Analyzer Module is the first step to provide intelligence to the system, where complex conclusions should be reached by reasoning about the knowledge provided by the Monitoring/Aggregation stages and the definition of each use case. Because of this, the Analyzer Module distinguishes three major information processing activities: Pattern Recognition, Reasoning and Prediction. The conclusions reached are described in the form of symptoms related to each use case. Bearing this in mind, it is possible to assert that the Analyzer Module provides a symptom-oriented Situational Awareness bounded by the situations defined for each use case.

SELFNET Analyzer Module Design
This section details the design of the SELFNET Analyzer Module. It also describes the initial assumptions, the requirements, the design principles and the Analyzer architecture.

Initial Assumption and Requirements
The following describes the most relevant requirements and the main initial assumptions considered in the design of the Analyzer Module:
• Scalability. The approach must allow new capabilities to be added (extensibility), according to SELFNET design principles. For this reason, the integration of additional analytic functionalities is done via plugins.
• Use Case Driven. Given the heavy reliance of the tasks performed on the characteristics of the use cases, the basic definition of the observations to be studied (knowledge-based objects, rules, prediction metrics, etc.) is provided by the use case operator, thus making the Analyzer Module scalable to alternative contexts.
• Knowledge Acquisition. It is well known that the most common disadvantage of expert systems is the initial knowledge acquisition problem. Hence, having skilled operators in novel use cases with the ability to properly specify rules is not always straightforward. This document does not address the issue of innate knowledge acquisition. Our approach assumes that the use case knowledge-bases are provided by skilled operators or by accurate machine learning algorithms.
• User-Friendly Symptom Definition Rules. The definition of proper rule-sets is a difficult task, and even skilled operators often make coherence/ambiguity mistakes. In order to mitigate these problems, the configuration and definition of new use cases should be user-friendly, as should the scheme for building new rule-sets.
• Uncertainty. Classical logic permits only exact reasoning. It assumes that perfect knowledge always exists, but this remains far from the SELFNET reality. In order to improve the quality of the conclusions, the Analyzer Module manages knowledge bearing in mind uncertainty. This is particularly appropriate for certain analytic features, such as studying observations based on decision thresholds or confidence intervals. In addition, closing the door on possible stochastic-dependent definitions is against the SELFNET design principles, as these could be the keys to properly specifying future use cases.
• Filtering. Initially, the filtering of symptom reports is not considered. Because of this, every inferred symptom, regardless of its nature or uncertainty, is transmitted to the diagnosis/decision-making stages, where its impact and relevance are properly assessed.

Design Principles
The following design principles and limitations lay the foundation of the Analyzer framework, as well as of the implementation of its internal components:
• Big Data. In order to deal with huge and homogeneous datasets, Big Data provides predictive algorithms, user behaviour analytics, and aggregation/correlation functionalities [39]. These capabilities are mainly taken into account in monitoring and aggregation tasks. The Analyzer Module deals with aggregated and correlated metrics, hence reducing the amount of information to be analysed. In our approach, the use of Big Data technologies to handle all this information is optional; the decision to integrate these tools is left to the SELFNET administrators, who, driven by a better awareness of the use cases and the monitoring environment, are better placed to decide whether they are counterproductive or beneficial [40]. Because of this, our contribution is compatible with both Big Data and conventional techniques.
• Stationary Monitoring Environment. According to Holte et al. [41], in a stationary monitoring environment, the characteristics and distribution of the normal observations to be analysed match the reference sample population considered in the Analyzer learning processes. If the distribution of the monitoring environment can change significantly, it is considered non-stationary. Another problem that may reduce the quality of the analytics is the presence of gradual changes over time in the statistical characteristics of the class to which an observation belongs. In the literature this fluctuation is known as concept drift. These problems are discussed at length in [42].
The assumption that the Analyzer Module operates in a stationary environment brings a simple and efficient solution, but one prone to slight failures when changes occur. On the other hand, considering a non-stationary monitoring environment improves accuracy, but entails new challenges, among them: detection of changes, implementation of model/regression updating techniques, identification of when the calibration must be completed, and selection of the samples that will be taken into account in new trainings. Given the complexity that this implies, the Analyzer Module assumes a stationary monitoring environment. The non-stationary approach will be part of future work.
• High Dimensional Data. The analysis of high dimensional data implies dealing with data whose dimension is larger than the dimensions considered in classical multivariate analysis. As indicated by Bouveyron et al. [43], when conventional methods deal with high dimensional data they are susceptible to the well-known curse of dimensionality, where considering a large number of irrelevant, redundant and noisy attributes leads to significant prediction errors. Hence, operating with this data implies the need for more specific and complex algorithms. In terms of SELFNET, this would mean that the vector of Health of Network (HoN) metrics is large enough to warrant the implementation of specific methods adapted to optimize the processing of this kind of information. A priori, there are no signs that SELFNET requires processing a significant amount of high dimensional data. Therefore, this paper does not take into account differences between conventional and high dimensional data, assuming that the aggregation tasks will be able to optimize the number of attributes to be analysed.
• Supervision. SELFNET training mode. The analytical methods based on modeling/regression assume that new knowledge can be inferred from observations by means of a prior learning stage. The learning process often requires reference data which allows identifying
the most characteristic features of the monitoring environment, such as rules, boundaries, incidence matrices, direction vectors or basic statistics. Given the complexity involved in designing a SELFNET training mode, this approach describes how the information needed for the construction of new models is obtained.
• Centralized Design. Assuming a centralized approach leads us to pose a general-purpose scheme where the onboarding of new use cases is completely configurable by specification, and which does not require updating the implementation (see Figure 2). Therefore, the centralized approach is not dependent on the characteristics of the use case, so it is highly scalable and allows tasks to be performed efficiently (avoiding redundancy). However, its design and the description of use cases are complex. On the other hand, the distributed approach includes an additional component for each use case, in which specific pattern recognition and prediction methods are implemented via plugin. The preprocessing, selection and symptom discovery mechanisms remain general purpose.
In essence, this second approach is easy to design, but completely use-case dependent; each time a new use case is onboarded, the Analyzer implementation must be updated. Due to the large impact on scalability that this entails, the centralized approach is considered hereinafter.
• Data Encapsulation. The greatest challenge in designing the SELFNET Analyzer approach is the requirement of dealing with unknown data. It is possible to assume that use cases do not provide clear enough information about the characteristics of the information to be analyzed (in fact, several future use cases are completely unknown). At the specification stage, use case operators tend to provide good qualitative information about the metrics to consider, but may overlook details about their quantitative nature (data type, domain, range, restrictions, etc.), which is what will be considered in the first instance in the analysis tasks. Furthermore, quantitative information is highly use-case dependent. In order to reduce the relevance of quantitative details (which are the backbone of the aggregation/correlation tasks), and thus facilitate the incorporation of new use cases through the definition of general-purpose descriptors, the SELFNET Analyzer is driven by data encapsulated in two levels of abstraction: quantitative and qualitative parameters (see Figure 3). The first one is independent of the use cases, and allows designing a centralized analysis framework valid for any type of data specification. On the other hand, the qualitative parameters gather information directly related to SELFNET and the use case to which they belong (metric name, source, location, tenant, etc.). This data is mainly required for aggregation/correlation, diagnosis and decision making.

Analyzer Module Architecture
In Figure 4 the architecture of the Analyzer Module is illustrated. Its components are described below:
• Pattern Recognition: identifies previously known or acquired patterns and regularities in facts related to aggregate data (i.e., Fa(Th), Fa(KPI), Fa(Ev)), and returns facts Fa with the results of their study. For this purpose, different internal tasks may be executed: study of the input data (both training data and samples to be analysed), decision of the best suited data mining strategies for each context, feature extraction, construction of models/regressions, analysis of facts related to aggregate data in order to find such patterns, and labeling verification. Note that the literature collects a plethora of pattern recognition methods, which are adapted to the needs of the use cases and to the characteristics of the different monitoring environments [44]. The SELFNET Analyzer focuses on two fundamental actions: the identification of signatures of previously known events [45] and the detection of anomalies [46].
• Prediction Component: calculates the prediction metrics (as facts Fa) associated with each use case from the observations provided by the aggregation stage (Thresholds Th, Key Performance Indicators KPI and Events Ev). This implies different processing steps: management of a track record with the data required to build forecasting models, analysis of the data characteristics that are relevant for deciding the best suited prediction algorithms, construction of forecasting models, selection of prediction algorithms, forecasting, and evaluation of the results in order to learn from previous decisions. Note that, as stated in [47], the prediction of network events enhances the optimization of resources, allows the deployment of proactive actions and anticipates risk identification. The SELFNET Analyzer focuses primarily on inferring predictions from two data structures: time series and graphs. The first one aims to determine the evolution of the HoN metrics, hence it mainly implements exponential smoothing algorithms [48] and autoregressive models [49]. On
the other hand, the evolution of graphs is predicted in order to anticipate the discovery of new elements [50] and facilitate the management of resources [51].
• Adaptive Thresholding: establishes measures to approximate when the forecast errors must be taken into account when identifying symptoms. Therefore, it receives as input parameters the values related to the prediction metrics (Track Record TR and Forecasts Ft), and returns adaptive thresholds ATh. Their construction involves different steps, such as analyzing and extracting the main features from the input data, deciding the best suited thresholding algorithms, and modeling and estimating the thresholds. The SELFNET Analyzer builds adaptive thresholds from data represented as time series or graphs, which allows inferring more accurate conclusions from every forecast generated by the prediction component. The main applications of the adaptive thresholds are considering the context of the monitoring environment in the inference of new facts related to filtering [52], and decreasing the false positive rates [53].
• Knowledge-Base: stores specific information about each use case. This data is represented by objects and rules. The objects O are the basic units of information (e.g., temperature, congestion, latency, etc.). The rules Ru are the guidelines for reasoning that enable the inference of facts and conclusions. Facts, objects, and their values are interrelated through operations Op. A priori, this approach does not consider online machine learning to acquire knowledge about the use cases in real time [54], such as the definition of new rules, prioritization, metric weighting, etc. (i.e., all information to be considered is part of the original training and of the specification of the use cases and their symptoms provided by operators).
• Inference Engine: applies rules Ru to the knowledge base in order to deduce new knowledge. This process iterates, as each new fact Fa in the knowledge base could trigger additional rules.
Traditionally, inference engines operate in one of two modes: forward chaining and backward chaining [55]. The first initially considers previously known facts and infers new facts. On the other hand, backward chaining initially considers facts and tries to infer the causes that have led to them. Because the SELFNET Analyzer infers conclusions from discovered facts, the first approach is implemented. In addition, it is important to bear in mind that the simplest implementation of the inference engine considers basic implication elimination rules (i.e., modus ponens rules) driven by propositional logic [56]. They can be adapted to different representations of uncertainty, such as fuzzy logic [57], rough sets [58] or Bayesian networks [26]. However, in order to facilitate the understanding of this proposal, the current specification of rules in the SELFNET Analyzer applies only basic propositional logic rules (as described in Section 5), hence postponing a more complex but generic definition to future work.
• Memory: stores all the known facts Fa concerning the use cases (e.g., Temperature = 3°, Latency > 200 ms, etc.), considering those predicted/inferred (Fa(PR), Fa(ATh), Fa(Ft)) and those provided by the SELFNET Monitoring/Aggregation stages (Fa(Th), Fa(KPI), Fa(Ev)). Metadata with additional qualitative information about the nature of the discovered facts is also stored.
• User Interface: configures Pattern Recognition PR for each use case and allows updating the knowledge-base by inserting, modifying or deleting data associated with every use case, such as objects O, rules Ru, operations Op or prediction metrics Ft. The information is preprocessed in order to ensure compatibility and coherence [59]. The latter is particularly important, as it tries to avoid contradictions and ambiguity between rules prior to their incorporation into the SELFNET intelligence.
• Uncertainty Estimation: complements the inference engine and facilitates the study of the conclusions bearing in mind their uncertainty. Its outputs are the acquired conclusions as potential symptoms of relevant incidents, their uncertainty and the information associated with their inference (facts, triggering rules, etc.). This is the only optional element of the architecture, since its use is only required when the SELFNET Diagnostic task [60] needs to disambiguate conclusions, filter those of greater uncertainty or convert the logic of the Analyzer into data specified for upper layers of SELFNET. For example, when the inference engine operates on fuzzy logic rules, the Uncertainty Estimation element generates a quantifiable, user-friendly result for Diagnosis in crisp logic, given the fuzzy sets and the corresponding membership degrees (i.e., defuzzification) [61].
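As an illustration of how the Prediction and Adaptive Thresholding components could interact on a time series, the following minimal sketch forecasts a HoN metric with simple exponential smoothing and derives a band around the next forecast from the spread of past one-step forecast errors. The function names, the smoothing factor alpha and the k-sigma band are assumptions made for this example; they are not taken from the SELFNET implementation.

```python
from statistics import pstdev


def ses_forecasts(track_record, alpha=0.5):
    """One-step-ahead simple exponential smoothing (illustrative).

    Returns a list f of length len(track_record) + 1, where f[i] is the
    forecast made for track_record[i] (f[0] is initialised to the first
    observation) and f[-1] is the forecast for the next, unseen value.
    """
    if not track_record:
        raise ValueError("track record must not be empty")
    forecasts = [track_record[0]]
    for value in track_record:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts


def adaptive_threshold(track_record, alpha=0.5, k=3.0):
    """Return a (lower, upper) band around the next forecast.

    The band width is k times the population standard deviation of the
    past forecast errors (an assumed, deliberately simple rule).
    """
    forecasts = ses_forecasts(track_record, alpha)
    # zip stops at the last observation, so the unseen forecast is skipped.
    errors = [x - f for x, f in zip(track_record, forecasts)]
    spread = pstdev(errors)
    next_forecast = forecasts[-1]
    return next_forecast - k * spread, next_forecast + k * spread


# Hypothetical latency track record (ms) for one location.
latency = [20, 22, 21, 23, 22]
lower, upper = adaptive_threshold(latency)
```

Under these assumptions, a new observation falling outside the interval (lower, upper) would be stored as a fact Fa(ATh) and left to the inference engine to interpret.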

Analyzer Inputs/Outputs
By studying the Analyzer Module as a black box model, it is possible to focus on its inputs/outputs and their relationship with the rest of the SELFNET components [34]. From this perspective, its information sources, the nature of the data and its behaviour in different circumstances are described. As shown in Figure 5, the Analyzer Module depends on three sources of information. Two of them are external: the SELFNET Aggregation component and the use case operators; the last is internal: data generated by the Analyzer Module itself. The inferred conclusions are reported to the SELFNET Diagnosis Module as symptoms [60]. The role played by each of these elements is detailed below:
• Aggregation. Observations in SELFNET reach the Analyzer through the Aggregation Layer (Perception capabilities within Endsley's model). The information provided by this source contains facts concerning Events Fa(Ev), Thresholds Fa(Th) and Key Performance Indicators Fa(KPI) related to the current network status.

Use Case Descriptors
This section describes the characteristics of Analyzer Module quantitative data and its categories.
In Table 1 the quantitative data is summarized. The objects O = {O1, ..., On}, n ≥ 1 are definitions of the elements from which the system infers knowledge. They are added to the knowledge-base by use case operators. Their function is to describe the nature of the data in order to facilitate the selection of proper preprocessing and prediction methods. Objects are expressed as follows: where object name identifies the data category and the range of values limits the values that can be assigned. The weight is a field reserved for the future implementation of machine learning; it determines priority. Finally, noValues anticipates the number of possible values. Because of this, an object may be specified as a sequence of k interrelated, previously defined objects or values. In this case, they are defined as follows.
The specification of sequences of the same value repeated several times in a row can be simplified by the indicator : i, where i is the number of times it repeats.For example, the previous example: It may be simplified as follows:
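As an illustration of the object descriptor and of the repetition indicator, the following minimal sketch models an object with the four fields named above and expands a sequence written with the :i shorthand. The class name, the field names and the textual encoding "value:i" are assumptions made for this example, not the notation of the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSpec:
    """Hypothetical in-memory form of an object O."""
    name: str             # identifies the data category
    value_range: tuple    # limits the values that can be assigned
    weight: float = 1.0   # reserved for future machine learning weighting
    no_values: int = 1    # number of possible values


def expand_repeats(tokens):
    """Expand tokens of the assumed form 'value:i' into i copies of value.

    Tokens without a numeric ':i' suffix are kept unchanged. Values that
    themselves end in ':<digits>' would be misread by this simple rule.
    """
    expanded = []
    for token in tokens:
        value, sep, count = token.rpartition(":")
        if sep and count.isdigit():
            expanded.extend([value] * int(count))
        else:
            expanded.append(token)
    return expanded


# A sequence with three repetitions of "up" followed by one "down".
sequence = expand_repeats(["up:3", "down"])
```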

Operations Op
The operations Op = {Op1, ..., Opn}, n ≥ 1 are definitions of binary relationships between facts Fa, objects O or their possible values Va. Initially, the knowledge-base provides a basic battery of operations (e.g., all arithmetic operations, propositional logic relationships, basic statistical expressions, etc.). When a use case is on-boarded, operators should declare the set of operations to be taken into account and their restrictions. This is achieved by the following layout: where name refers to the identification of the operation in the predefined battery, symbol is its shortened representation, priority is its position in the hierarchy of operations, operands limits the categories of operands applicable on each side of the binary expression, and description briefly explains its functionality in natural language.
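The operation layout can be sketched as a small record that selects an entry of the predefined battery and checks the operand restriction before applying it. The battery contents, the class name and the use of Python types as operand categories are assumptions made for this example.

```python
import operator
from dataclasses import dataclass

# Hypothetical predefined battery: operation name -> binary callable.
BATTERY = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "equals": operator.eq,
    "add": operator.add,
}


@dataclass(frozen=True)
class OperationSpec:
    name: str          # identification in the predefined battery
    symbol: str        # shortened representation
    priority: int      # position in the hierarchy of operations
    operands: tuple    # (types allowed on the left, types allowed on the right)
    description: str   # functionality in natural language

    def apply(self, left, right):
        left_types, right_types = self.operands
        if not isinstance(left, left_types) or not isinstance(right, right_types):
            raise TypeError(f"operands not allowed for '{self.symbol}'")
        return BATTERY[self.name](left, right)


numeric = (int, float)
greater = OperationSpec("greater_than", ">", 1, (numeric, numeric),
                        "true when the left operand exceeds the right one")
is_congested = greater.apply(250, 200)
```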

Facts Fa
The facts Fa = {Fa1, ..., Fan}, n ≥ 1 are the basic elements of the SELFNET reasoning. They are added to the memory of the Analyzer Module by the Aggregation layer or deduced by the inference engine. Facts are constructed by the linear grammar GFa = (N, Σ, P, State), where N = {State, Operand}, Σ = {O, Op, Va} and P is extended as: Facts must be accompanied by a timestamp indicating when they were stated, the location in which they are valid and a weight that determines their priority. The location refers to SELFNET elements (e.g., physical machines, virtual nodes, etc.). The priority is a field reserved for future machine learning weighting. Uncertainty describes the probability of the fact being true. Facts are described as the following expression: Examples:
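The metadata that accompanies a fact can be sketched as a record holding an object, an operation symbol and a value, together with the timestamp, location, weight and uncertainty described above. The class and field names, the default values and the example location are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class Fact:
    """Hypothetical in-memory form of a fact Fa."""
    obj: str                    # object O the fact refers to
    op: str                     # symbol of the operation Op
    value: object               # value Va
    location: str               # SELFNET element where the fact is valid
    timestamp: datetime = field(default_factory=_now)
    weight: float = 1.0         # reserved for future weighting
    uncertainty: float = 1.0    # probability of the fact being true

    def __post_init__(self):
        if not 0.0 <= self.uncertainty <= 1.0:
            raise ValueError("uncertainty is a probability in [0, 1]")

    def statement(self):
        return f"{self.obj} {self.op} {self.value}"


# "vnode-1" is an invented location used only for illustration.
high_latency = Fact("Latency", ">", 200, "vnode-1", uncertainty=0.9)
```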

Rules Ru
The rules $Ru = \{Ru_1, \ldots, Ru_n\}$, $n \geq 1$, describe how the Analyzer Module acquires new knowledge via a rule-based expert system. In order to facilitate their specification, they are declared as propositional logic expressions, according to the linear grammar $G_{Ru} = (N, \Sigma, P, Rule)$, where $\Sigma = \{\text{"True"}, \text{"False"}, Facts\}$, $N = \{Rule, Atomic \mid Symbol \mid Complex\}$, and P is expressed as follows:
The rules are accompanied by the identification of the use case in which they are valid, and their priority of inference. Note that, in order to enhance scalability, the rules of each use case are totally independent from the others. Rules are detailed as follows:
Examples:
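The evaluation of rules of this kind can be sketched in Python as follows. The nested-tuple encoding of propositional expressions, the single inference pass and the convention that a lower priority number is evaluated first are all assumptions of this example, not features confirmed by the specification.

```python
from dataclasses import dataclass
from typing import Union

# A condition is a truth value, a fact name, or a tuple (operator, operand, ...).
Expr = Union[bool, str, tuple]


def evaluate(expr: Expr, facts: dict[str, bool]) -> bool:
    """Evaluate a propositional expression against known facts."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, str):
        return facts.get(expr, False)  # unknown facts are treated as false
    op, *args = expr
    if op == "not":
        return not evaluate(args[0], facts)
    if op == "and":
        return all(evaluate(a, facts) for a in args)
    if op == "or":
        return any(evaluate(a, facts) for a in args)
    raise ValueError(f"unsupported operator: {op}")


@dataclass(frozen=True)
class Rule:
    use_case: str    # use case in which the rule is valid
    priority: int    # assumed: lower numbers are evaluated first
    condition: Expr
    conclusion: str  # name of the fact derived when the condition holds


def infer(rules: list[Rule], facts: dict[str, bool], use_case: str) -> dict[str, bool]:
    """One inference pass restricted to the rules of a single use case."""
    derived = dict(facts)
    own_rules = sorted((r for r in rules if r.use_case == use_case),
                       key=lambda r: r.priority)
    for rule in own_rules:
        if evaluate(rule.condition, derived):
            derived[rule.conclusion] = True
    return derived
```

Filtering by `use_case` before evaluation reflects the independence of rule sets between use cases.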

Forecast Ft
The forecasts $Ft = \{Ft_1, \ldots, Ft_n\}$, $n \geq 1$, are specifications of the objects that must be projected per use case. In this way it is possible to enhance the selection of prediction algorithms and forecasting models. Given the nature of the monitoring environment, this approach initially considers predictions on only two data types: time series and graphs. The time series allow estimating the evolution of Key Performance Indicators (KPI) or thresholds from concrete locations on SELFNET (physical infrastructure, network devices, virtualization, etc.). The prediction on graphs facilitates the inference of changes on large regions of the SELFNET topology, such as the spreading of congestion, the inclusion of new network elements or failures. This expert system considers prediction results as facts, so Ft only refers to their specification when on-boarding new use cases. In Figure 1, predictions as facts are declared as Fa(Ft). The following expression describes the forecasts on time series:
Examples:
where timeSeries is a reserved word indicating that the prediction is on time series, object declares the nature of the data to be analyzed, and domain is the extension of the prediction. The examples show two reserved words, obs (observations) and time (timestamp). When the time is measured in observations, the length of the prediction is indicated from the initial time instant t and the number of coming observations (e.g., t + 5 indicates forecasting the next five observations). On the other hand, timestamps directly state how long the prediction must be (e.g., Today 13:28:15 indicates the requirement to forecast a certain object between now and 13:28:15 today). Note that the term timeSeries describes the way in which data is structured and not the prediction algorithm. A record of this nature could be forecasted by traditional time series methods (autoregressive moving average, exponential smoothing, extrapolation, etc.) but also by other very different approaches (drifting, naive-based algorithms, Artificial Neural Networks (ANN), Support Vector Machines (SVM), etc.). It is up to the decision component of Prediction to select the most appropriate forecasting strategy.

If the prediction considers observations on graphs, the forecasts are specified as follows:
where graph is the reserved word to declare predictions on graphs, object is the nature of the data on the edges of its incidence matrix, and noVertex is the number of vertices (i.e., the dimension noVertex-by-noVertex of its complete adjacency matrix). The last two parameters (domain and length) have the same function as in the expression of timeSeries prediction (they indicate the measurement of time and the extension of the prediction).
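To make the time series case concrete, the following Python sketch forecasts the next observations with simple exponential smoothing, one of the traditional methods mentioned above. It is only one possible strategy; the `ForecastSpec` field names, the default smoothing factor and the restriction to horizons measured in observations are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSpec:
    kind: str    # "timeSeries" or "graph"
    obj: str     # object whose values are projected, e.g. "O1"
    domain: str  # "obs" (observations) or "time" (timestamp)
    length: int  # number of coming observations when domain is "obs"


def forecast(series: list[float], spec: ForecastSpec, alpha: float = 0.5) -> list[float]:
    """Flat forecast of `spec.length` observations by simple exponential smoothing."""
    if spec.kind != "timeSeries" or spec.domain != "obs":
        raise ValueError("this sketch only covers time series measured in observations")
    if not series:
        raise ValueError("at least one observation is required")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # The smoothed level moves towards each new observation by a factor alpha.
        level = alpha * value + (1.0 - alpha) * level
    return [level] * spec.length
```

A request equivalent to t + 5 would be `forecast(history, ForecastSpec("timeSeries", "O1", "obs", 5))`, which returns five projected values.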

Thresholds Th
The thresholds $Th = \{Th_1, \ldots, Th_n\}$, $n \geq 1$, are specifications of fault tolerance limits related to the values assigned to objects O. They are calculated by the SELFNET Aggregation task, but their specification is the responsibility of the use case operators. Thresholds are described by the following expression:
where Th name is the threshold identification and object is the object on which it acts.

Adaptive Thresholds ATh
The adaptive thresholds $ATh = \{ATh_1, \ldots, ATh_n\}$, $n \geq 1$, are specifications of fault tolerance limits related to the values assigned to predictions Ft. They are calculated by the prediction component of the Analyzer Module, but must be specified by the use case operators. As with the forecast descriptions, initially they act on time series or graphs. They are described as follows:
where ATh name is the identification of the adaptive threshold, data structure is timeSeries or graph depending on the representation of the predicted data, CI is the confidence interval on which it is built by the Adaptive Thresholding component and forecast is the prediction from which it is created.
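One plausible way of deriving such a threshold from a forecast and a confidence interval is sketched below in Python. It assumes normally distributed forecast residuals and a symmetric band around each predicted value; the source does not state how the Adaptive Thresholding component computes its limits, so this is an illustration only, and the function names are hypothetical.

```python
from statistics import NormalDist, stdev


def adaptive_threshold(forecast: list[float], residuals: list[float],
                       ci: float = 0.95) -> list[tuple[float, float]]:
    """Return a (lower, upper) band around each forecast value.

    Assumption: past forecast errors (`residuals`) are roughly normal,
    so the band half-width is z * sample standard deviation.
    """
    if not 0.0 < ci < 1.0:
        raise ValueError("ci must be in (0, 1)")
    z = NormalDist().inv_cdf(0.5 + ci / 2.0)  # two-sided quantile
    half_width = z * stdev(residuals)
    return [(value - half_width, value + half_width) for value in forecast]


def violates(observation: float, band: tuple[float, float]) -> bool:
    """True when an observation falls outside its adaptive threshold band."""
    lower, upper = band
    return observation < lower or observation > upper
```

A wider confidence interval yields a wider band, so fewer observations are reported as violations.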

Pattern Recognition PR
The pattern recognition configurations $PR = \{PR_1, \ldots, PR_n\}$, $n \geq 1$, are specifications of how facts Fa related to aggregate data are analyzed in order to determine their similarity with previously established reference information. The outputs of pattern recognition actions are facts that display the degree of similarity observed. Each PR action is defined as follows:
Examples:
where PR name is the action identifier, objectIn is the nature of the data to be studied, objectOut is the nature of the object that receives the similarity degree, and action is the reserved word associated with the type of analysis to be performed. The default actions are "match" for matching observations with the reference data and "anomaly" for outlier detection. Finally, referencedata is the identification of the dataset D to be taken into account.
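The two default actions can be illustrated with the following Python sketch. The scoring functions are stand-ins chosen for brevity (substring matching against signatures and a z-score against a collection of numeric observations); the actual algorithms behind "match" and "anomaly" are not specified in the source, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Union


def match_degree(payload: str, signatures: list[str]) -> float:
    """Fraction of reference signatures found in the observed payload."""
    if not signatures:
        return 0.0
    hits = sum(1 for signature in signatures if signature in payload)
    return hits / len(signatures)


def anomaly_degree(observation: float, collection: list[float]) -> float:
    """Distance from the reference mean, in standard deviations (z-score)."""
    mu = mean(collection)
    sigma = pstdev(collection)
    if sigma == 0:
        return 0.0 if observation == mu else float("inf")
    return abs(observation - mu) / sigma


def recognize(action: str, observation: Union[str, float], reference: list) -> float:
    """Dispatch on the reserved word of the PR action."""
    if action == "match":
        return match_degree(str(observation), reference)
    if action == "anomaly":
        return anomaly_degree(float(observation), reference)
    raise ValueError(f"unknown action: {action}")
```

The returned degree would then be stored as the value of the objectOut fact.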

Datasets D
The datasets $D = \{D_1, \ldots, D_n\}$, $n \geq 1$, are the initial reference data required by pattern recognition actions. Given that the Analyzer Module does not consider online training, all the reference data is provided by the use cases via the User Interface. Datasets are declared by the following expression:
Examples:
where D name is the dataset identifier and object is the nature of its samples. In this first approach, a dataset can be of one of three types: "collection", "model" or "signature". Firstly, "collection" refers to a set of raw observations directly extracted from the monitoring environment. On the other hand, "model" is a preprocessed description of the data to be analysed. Finally, "signature" indicates the exact patterns to be identified. The field source determines where the dataset is found (e.g., path, URL, repository, etc.).

Conclusions C
The conclusions $C = \{C_1, \ldots, C_n\}$, $n \geq 1$, are the subset of the facts Fa specified for a use case that, when satisfied, form part of the Situational Awareness of the network. When a conclusion is inferred, it is reported to the Diagnostic module [60] as a potential indicator of relevant situations. These symptoms are defined by use case operators as follows:
where C name is the conclusion identifier, use case is the associated SELFNET use case, and fact is the triggering conclusion. Conclusions are reported to the Diagnostic Module as follows:
where uncertainty is the probability of the conclusion being certain and trigger is the list of rules Ru or facts Fa that take part in its inference.

Examples of Specification and Workflows
This section describes three examples of data specification and workflows on the Analyzer Module.

UC 1: Device Temperature Analysis
This section describes an example of a sensor related to the self-healing use case.

Description
The use case (myTemp) requires identifying symptoms related to overheating of network devices. This is a very basic example where prediction and adaptive thresholding are not considered. Therefore, the decision thresholds are static and were built at Aggregation.

Initial Status
The Analyzer Module has at its disposal a battery of predefined operations, including basic arithmetic calculations and logical and statistical functions.

Use Case Specification
First, the use case operators specify the basic objects to be taken into account: the temperature of the devices and its upper threshold.
Second, they indicate the operations that are required and how they are taken into account:
Third, they state the conclusions to be satisfied:
The last step is declaring the inference rules:
At runtime, the Aggregation layer notifies the Analyzer Module of facts related to the myTemp use case. Some of them concern the temperature of SELFNET devices, for example:
The location NodeB is considered because it is the more restrictive of NodeB and All. The symptom $C_1$ has therefore been discovered, and it is reported to the Diagnostic Module as follows:
The inference engine then continues operating, looking for new symptoms.
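The workflow of this use case can be sketched in Python as follows, using the values quoted in the paper (a temperature of 80° against a threshold of 79° valid for all locations). The timestamps of the temperature facts, the class and function names, and the way the more restrictive location is chosen are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str   # "O1" (temperature) or "Th1" (upper threshold)
    value: float
    time: str      # "HH:MM:SS" on the same day, so string order is time order
    location: str  # a concrete element such as "NodeB", or "All"


def latest(facts: list[Fact], subject: str) -> Optional[Fact]:
    """Most recent fact about `subject`, or None if there is none."""
    candidates = [f for f in facts if f.subject == subject]
    return max(candidates, key=lambda f: f.time) if candidates else None


def most_restrictive(loc_a: str, loc_b: str) -> str:
    """'All' covers every element, so a concrete location is narrower."""
    return loc_b if loc_a == "All" else loc_a


def infer_overheat(facts: list[Fact]) -> Optional[dict]:
    """Assumed form of the rule: temperature >= threshold implies C1."""
    temperature = latest(facts, "O1")
    limit = latest(facts, "Th1")
    if temperature is None or limit is None:
        return None
    if temperature.value >= limit.value:
        return {
            "conclusion": "C1",
            "location": most_restrictive(temperature.location, limit.location),
            "trigger": ("Ru1", temperature, limit),
        }
    return None


facts = [
    Fact("O1", 78.0, "12:22:14", "NodeB"),  # assumed earlier reading
    Fact("O1", 80.0, "12:22:15", "NodeB"),  # assumed timestamp
    Fact("Th1", 79.0, "12:22:15", "All"),
    Fact("Th1", 79.0, "12:22:16", "All"),
]
symptom = infer_overheat(facts)
```

With these facts, `symptom` carries the conclusion C1 for the location NodeB together with the rule and facts that triggered it.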

UC 2: Network Congestion Analysis
This section describes an example of a sensor related to the self-optimization use case.

Description
The use case to be managed (Self-Congestion (SC)) requires identifying symptoms related to traffic congestion on SELFNET elements. In this example, prediction and adaptive thresholding are considered.

Initial Status
The Analyzer Module has at its disposal a battery of predefined operations, including basic arithmetic calculations and logical and statistical functions.

Use Case Specification
First, the use case operators specify the basic objects to be taken into account: the congestion level monitored and its prediction.
Next, they define an adaptive threshold to be automatically generated from the information provided by the record tracking and the Adaptive Thresholding.
Second, they specify which operations are required and how they are taken into account:
Third, the conclusions are identified:
The last step is declaring the inference rules:
At runtime, the Aggregation layer notifies the Analyzer Module of facts related to the SC use case, for example:

This use case (Self-Guard (SG)) requires identifying symptoms related to anomalous payloads in SELFNET traffic. In this example, pattern recognition actions are considered.

Initial Status
The Analyzer Module has at its disposal a battery of predefined operations, including basic arithmetic calculations and logical and statistical functions. The external repositories (Rep1, Rep2) provide collections of legitimate (Rep1) and malicious (Rep2) SELFNET traffic observations.

Use Case Specification
First, the use case operators specify the basic objects to be taken into account; in this example they are the payload of the SELFNET traffic $O_1$, its similarity with the legitimate payload dataset $O_2$ and its similarity with the malicious samples $O_3$.
Second, they specify which operations are required and how they are taken into account:
Next, they define the datasets to be taken into account.
Then they define the pattern recognition actions to be executed:
Conclusions are identified as follows:
The following rules are on-boarded:
At runtime, the Aggregation layer notifies the Analyzer Module of facts related to the SG use case:
Finally, the symptom is reported to the Diagnostic Module as follows:

Discussion
Analysis and intelligence capabilities play an important role in addressing 5G requirements, in combination with key enabling technologies such as SDN, NFV, cloud computing, etc. All of these domains can take advantage of forecasting, pattern recognition, artificial intelligence and advanced intelligence concepts. In this way, 5G networks will be able to provide enhanced capacities related to network management and the detection of potentially harmful problems. For its part, the diagnosis of the data is required in order to determine the real cause of an event. Because of this, the intelligence is provided in two phases: (i) the analysis stage and (ii) decision-making. This is similar to a medical evaluation, where the symptoms are detected first and a treatment is then applied based on them.

In general terms, the proposed architecture and data specification for enhancing a use-case driven analysis on 5G networks represents a substantial improvement over previous approaches. On the one hand, 5G data analysis is partially covered in some works [27,30,31]. These proposals take into account specific requirements such as context awareness of radio components [27]. In [30] a context-aware resource allocation algorithm based on user mobility is presented. This work proposes some resource management schemes, handover procedures and cell activations. For their part, Apajalahti et al. [31] use ontologies and statistical reasoning in order to analyze and configure the mobile network. This approach can be used as a complementary methodology to the SELFNET Analyzer approach. On the other hand, some ongoing works [28,29,36] are still at an early stage and are complementary to our proposal. The METIS project [28] deals with 5G radio access network components (e.g., spectrum usage or air interface). In [29] intelligent capabilities applied to virtualized environments are proposed. Furthermore, the Charisma Project [36] introduces an intelligent mobile cloud in order to meet low latency and security requirements.
To the best of our knowledge, the SELFNET Analyzer is the first proposal that provides a generalized framework to deal with both traditional technologies and current 5G key enabling technologies. The SELFNET Analyzer Module provides a general purpose scheme that is easily adapted to the operator's needs and hence overcomes the design constraints of different monitoring environments. This is a very important feature bearing in mind the large number of technologies that can be part of a 5G scenario, as well as those still under development to be deployed in the near future [10,14,15]. Note that the SELFNET Analyzer Module is able to analyse information from heterogeneous sources such as SDN elements, virtual devices or metrics from specialized sensors. Another important characteristic is that the SELFNET Analyzer facilitates the incorporation of different analysis strategies, such as novel prediction or pattern recognition algorithms. This framework was developed to operate interchangeably with very different data mining and machine learning paradigms, among them conventional information, big data or high dimensional data. The use of any of them does not imply design changes; it is simply an implementation matter. As a result, this proposal is easily adaptable to future projects.
The proposed data specification to accommodate the on-boarding of new use cases is simple and adjustable. This is corroborated by the fact that SELFNET has been able to incorporate every use case without modifications to the original definitions. This does not mean that future use cases will not require more relevant changes, but this robustness provides a solid base for designing analytic schemes in similar contexts. Note that SELFNET implements a triad of services (self-protection, self-healing and self-optimization) with completely different features and dependencies (metrics, network devices to be monitored, prediction/pattern recognition algorithms, etc.). Despite these advantages, the proposal presents some weaknesses, most of them related to the limitations previously mentioned in the design principles (see Section 3). For example, the SELFNET Analyzer is not able to deal with complex non-stationary monitoring environments [42], where the quality of the analytics will decrease with time. Given the importance of this kind of scenario in network environments, this is an aspect that must be studied.
Another point to keep in mind is that, according to the experience of the SELFNET consortium, the effectiveness of the Analyzer Module depends on the quality of its specification. Once deployed, the approach follows the guidelines provided by operators, which indicate what information should be processed, how it should be analysed and what results can be obtained from it. Despite the simplicity of the proposed data specification, if the operator makes errors, there is a greater chance of unexpected results. In this sense, its robustness and scalability imply a high dependence on the quality of the specification provided by operators when configuring the analyzer functionality. It is important to emphasize that loading new use cases is based only on configuration changes, without the need to modify the Analyzer implementation or to include additional software. Furthermore, there are a number of challenges to be addressed related to how the data received from underlying layers will be organized and how the analysis process will be performed. Regarding data organization, it is important to determine whether the data will be processed raw or in an aggregated manner, because this may become a performance issue. This information will be loaded and converted into facts by the Analyzer framework in order to provide the network state in real time. Another important aspect to bear in mind is the execution pipeline of the Analyzer components, which should provide consistency and facilitate the organization of the received information. Thus, the investigation of methods to process and analyze the received information is also part of the ongoing work.

Conclusions
In this paper, the application of analysis and intelligence capabilities and how these concepts are used in 5G networks were explained. We introduced the design of the SELFNET Analyzer Module and its data specification. Our design provides pattern recognition, reasoning and prediction capabilities to infer possible symptoms, facilitating the diagnosis and decision-making tasks, which are part of future work. The main contribution of the SELFNET Analyzer Module is its general, simple and scalable approach, which allows new rules and metrics to be included in the analysis process when a new use case is added by the operator. SELFNET sensors gather information from several data sources such as virtual elements, LTE, SDN and traditional network devices, and the gathered information can then be subject to analysis. Furthermore, this proposal was built to support new analytic capabilities by means of a plugin-based approach. The implementation of the Analyzer Module is part of ongoing work, as is the introduction of mechanisms to work in non-stationary monitoring environments.

Figure 5. Analyzer Module as a Black Box.

$Fa_7 : \{Th_1 = 79^{\circ} \mid 1 \mid 1 \mid \text{Today 12:22:15} \mid \text{All}\}$
$Fa_8 : \{Th_1 = 79^{\circ} \mid 1 \mid 1 \mid \text{Today 12:22:16} \mid \text{All}\}$
These facts are provided by Aggregation, and they are directly included in the memory of the Analyzer Module. If they are updated for the same location (e.g., $Fa_5$ and $Fa_6$), the latest version is considered by the inference engine. After a certain period of observation, the inference engine tries to deduce new knowledge from the rule set of every use case. In myTemp, the Analyzer Module tries to infer conclusions for $Ru_1$. At Today 12:22:17 the system satisfies the first conclusion: $Fa_5(O_1 = 80^{\circ}) \geq Fa_8(O_1 = 79^{\circ})$, so the fact $Fa(C_1)$ is added to memory:

Table 1. Summary of UC data Specification.