Control of Telecommunication Network Parameters under Conditions of Uncertainty of the Impact of Destabilizing Factors

The classification of the natural and anthropogenic destabilizing factors of a telecommunications network as a complex system is presented herein. This research shows that, to evaluate the parameters of a telecommunications network in the presence of destabilizing factors, classical linear methods must be modified to reduce their sensitivity to the incompleteness of a priori information. Using generalized linear models of multiple regression, a combined method was developed for assessing and predicting the survivability of a telecommunications network under conditions of uncertainty regarding the influence of destabilizing factors. The method consists of accumulating current information about the parameters and state of the network, statistically analyzing and processing that information, and extracting sufficient sample statistics. The basis of the developed method is balancing multiple correlation-regression analysis against the number of regression equations and the observed results. Various methods of estimating the mathematical expectation and correlation matrix of the observed results under the random loss of part of the observed data (for example, removing incomplete sample elements, substituting the mean, pairwise deletion, and regression substitution) were analyzed. It was established that the obtained estimates become biased under a priori uncertainty of the statistics of the observed data. Given these circumstances, recommendations are given for the correct removal of sample elements and variables with missing values. It is shown that, with significant non-stationarity of the parameters and state of the network under study and a noticeable imbalance between the number of regression equations and the observed results, it is advisable to use stepwise regression methods.


Introduction
The fundamental difference between modern telecommunications networks (next-generation networks and future networks) and traditional networks is, according to the standards of open systems, free access to resources. On the one hand, connecting to existing information and communication systems is greatly simplified; on the other hand, new problems arise in protecting information from unauthorized access. This is even more critical for wireless networks.
Destabilizing factors have a detrimental effect on the stability and survivability of a network as a complex technical system. The statistical approach is the most informative approach to quantifying the impact of destabilizing factors on the characteristics of a telecommunications network. The research presented in this article used a method based on analyzing key performance indicators, which are widely used for statistical analysis and forecasting of the parameters and conditions of technical, economic, and social systems.
In the framework of the classical theory of system analysis, an abstract telecommunications network is considered a complex system, i.e., a system whose model does not contain enough information for effective management. Various methods of accumulating information about destabilizing factors were studied, and a combined method was developed that is suitable for monitoring the state and parameters of a telecommunications network under conditions of a priori uncertainty of the statistical characteristics of destabilizing factors.
The purpose of this research is, therefore, to manage the survivability of a new-generation telecommunications network in the presence of destabilizing factors with a priori unknown statistical characteristics.
The organization of the rest of this paper is as follows: In Section 2, similar significant works by other authors are presented. The background of the analysis of the mutual impacts of destabilizing factors and their effects on the survivability and stability of a telecommunications network is presented in Section 3; the analysis is based on correlation and multiple regression theory. In Section 4, the concept and the combined statistical method of handling missing values are described. In Section 5, the methodology and results of an experiment on a corporate network are presented, and the concept of key performance indicators (KPIs) is applied. Section 6 presents the conclusions and future work on the subject.
These problems are usually solved under conditions of insufficient (and sometimes absent) a priori information about the state and current parameters of the system under study, as well as about the characteristics of useful signals, noise, interference, destabilizing factors, etc. Analysis with complete, or at least sufficient, a priori information is an unattainable ideal; in reality, it is necessary to work without a priori information [2,3]. Methods of ensuring the resistance of complex systems to the influence of destabilizing factors were considered in Dodonov et al.'s monograph [4], wherein the impact of uncertainties was treated as the main destabilizing factor deriving from the activity of malicious actors.
The scientific articles in [12,13] presented the results of methods developed to increase the structural survivability of telecommunications networks.
A model of distributed information processing under the influence of destabilizing factors on a telecommunications network was developed in [14]. The model is partial and based on the method of functional redundancy, considering time characteristics and computational resources. Although the keywords mention destabilizing actions, the article does not specify their characteristics or results. The study by Tolubko et al. [15], a review devoted to describing methods of assessing the sustainability of a telecommunications network under the influence of destabilizing factors, has the same limitation. Phase portraits of the model of destabilizing factors are developed, and it is argued that they can be used to obtain asymptotic estimates of network stability, although practical methods and estimation algorithms are not provided.
The monograph in [16] focuses on optimizing the allocation of network resources to ensure the sustainable operation of medium access control protocols. A conclusion is drawn about the stability of the whole network, although no proof of this statement is provided.
The report in [17] is an International Telecommunication Union (ITU-T) standard for dealing with natural disasters to improve the stability and speed up the recovery of networks.This standard is a practical recommendation that can be followed in all cases.
The article in [18] describes the development of a method for calculating the survivability of telecommunications network channels, although the main content of the work is the operator method (the Laplace method) for solving ordinary differential equations with constant coefficients. The stability condition of the operator equation of an abstract closed system is accepted as the condition of survivability of a network. The operator equation is stable if the poles of the function (the roots of the denominator polynomial) lie in the left half of the complex p-plane.
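The stability criterion described in [18] can be checked numerically. The sketch below (not the authors' implementation) assumes only that the coefficients of the denominator polynomial of the operator equation are known; it computes the poles and verifies that they all lie strictly in the left half of the complex p-plane:

```python
import numpy as np

def is_stable(denominator_coeffs):
    """Check whether all poles (roots of the denominator polynomial of the
    operator equation) lie strictly in the left half of the complex p-plane."""
    poles = np.roots(denominator_coeffs)
    return bool(np.all(poles.real < 0))

# p^2 + 3p + 2 = (p + 1)(p + 2): poles at -1 and -2, so the system is stable
print(is_stable([1, 3, 2]))   # True
# p^2 - 1: poles at +1 and -1, so the system is unstable
print(is_stable([1, 0, -1]))  # False
```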
The scientific article in [19] presents the results of developing methods for increasing the structural survivability of new-generation telecommunication networks.
In conclusion, it is important to underscore that the purpose of this section was to provide an overview of the existing work on methods and tools of information security management under the uncertainty of destabilizing factors, together with the authors' views on the principles and main contents of these methods. The references to the literature are far from exhaustive.

Background
To understand the problem statement, the mutual impacts of destabilizing factors and their effects on the survivability and stability of a telecommunications network must be analyzed.
The problem of ensuring the stability of complex technical systems, i.e., their ability to maintain normal functioning during and after exposure to destabilizing factors (radiation, thermal effects, vibration, etc., including human factors), is relevant to various applications.
The stability of a system means its ability to perform the assigned functions with the specified quality indicators under the influence of internal and external destabilizing factors. The mutual influences of destabilizing factors on each other and on the stability of the system are complex.
Ensuring the resilience of complex technical systems against the effects of destabilizing factors is a long, time-consuming, and iterative process associated with different types of uncertainties.
First, this process covers all major stages of the system's life cycle, which can take decades. Second, determining the levels of sustainability indicators is a difficult task, much more complex than, for example, determining the levels of reliability indicators. Third, assessing the resilience of a complex technical system against the effects of destabilizing factors requires extensive knowledge in various scientific and technical fields. This is due to the need to determine the physical mechanisms of the effects of destabilizing factors on the object, to understand the functioning of the complex technical system, and to predict the response of the functioning system to an impact accompanied by different physical processes. Finally, the assessment of resilience is usually hampered by the lack of required data.
Telecommunications networks are designed to provide communication between enterprises, organizations, and all structural units of the economy and transport per the requirements and rules of technical operation.
For the sustainable operation and provision of subscribers with a given quality of telecommunication services, the state of the network and its elements is constantly monitored, and the quality of services is maintained at a given level.However, as experience shows, abnormal situations caused by the impact of destabilizing factors (DFs) and emergencies (EMs) not only reduce the quality of services provided but also negatively affect customer service.
Internal destabilizing factors that reduce the stability of functioning include the following: conflicts and deadlocks due to the incorrect distribution of system resources, certain mechanisms of the organization of information and computational processes, and the architectural features of system components.
Considering the mutual influences of destabilizing factors is a non-trivial task.The statistical approach is the most accessible and suitable approach for practical applications [20,21], in particular, correlation-regression analysis.
Afifi and others applied the imputation method to replace missing data with some reasonable values, especially in the economic and financial spheres [22-25]. Mean substitution is the main imputation method proposed for replacing missing data [22]; Afifi described this method in [21] in 1979. Imputation methods are not mathematically strict (e.g., they classify missing values as missing completely at random (MCAR), missing at random (MAR), or nonrandom, and this classification is qualitative over a wide range). Therefore, multiple correlation-regression analysis with robust and resistant estimates of location and scale (the median and biweight methods) was used in [26].
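Mean substitution, the basic imputation method cited above from [22], can be sketched in a few lines. This is a simplified illustration, not the implementation used in the cited works: each missing value (represented as NaN) is replaced by the mean of the observed values of the same variable.

```python
import numpy as np

def mean_substitution(data):
    """Replace missing values (NaN) in each column with that column's
    mean computed over the observed entries (mean-substitution imputation)."""
    data = np.array(data, dtype=float)
    col_means = np.nanmean(data, axis=0)   # per-variable means over observed values
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_means[cols]     # fill each gap with its column mean
    return data

sample = [[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]]
# Column means over the observed values are 2.0 and 3.0, so the gaps
# are filled with those values.
print(mean_substitution(sample))
```

Note that this method preserves the sample size but shrinks the variance of the imputed variable, which is one reason the text calls imputation methods "not mathematically strict."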
The general purpose of the multiple regression method is to analyze the relationship between several independent variables (also called regressors) and a dependent variable. The statistics represent a sample of realizations of the values of random variables:

− y_i, the i-th sample value of the result, i = 1, 2, …, N;
− x_ij, the i-th sample value of the j-th factor, j = 1, 2, …, M.

Using statistical data allows us to achieve optimal results by controlling the values of the factors or to predict the possible value of the result for given values of the factors.
The estimation is performed via observations of the input (the rows of the observation matrix X) and the output (the elements of the response vector y).
There is a stochastic (random) relationship between random variables, i.e., there is a correlation.
In the general case, the procedure for constructing multiple regression is to estimate the parameters of a linear equation. The functional dependence of the result on the factors is represented by a regression equation of the form y = b_0 + b_1 x_1 + … + b_M x_M + ε. Matrices of multiple correlation coefficients and a system of multiple linear or polynomial regression equations are usually used as the main characteristics of the statistical relationship [18,19]. In addition, it is necessary to choose a method for approximating the repeatability curves of the KPIs to automate measurements and calculations [24].
Let us consider the process of forecasting network parameters as the task of predicting the k-th dependent variable Y_k, k = 1, …, K, from M independent variables X_1, …, X_M. The model for the dependent variable Y_k is written as Y_k = b_k0 + b_k1 X_1 + … + b_kM X_M + ε, where ε is the approximation error. Replacing the factors with their powers yields the corresponding equation of polynomial regression. The parameters of the regression model are estimated from a sample of volume N taken from the general population.
The sample is formed as follows. Based on the results of the network operation test, the first sample of independent variables x_11, …, x_1M is fixed, and the dependent variable y_1 is calculated; the procedure continues until N observations are obtained. The system of multiple linear regression equations then takes the form y_i = b_0 + b_1 x_i1 + … + b_M x_iM + ε_i, i = 1, …, N. To obtain estimates using the least squares method, it is necessary to minimize the sum S_k of the squares of the deviations at each point; the best approximation corresponds to the minimum value of S_k = Σ_{i=1}^{N} (y_i − ŷ_i)². The value S_k is a measure of the error associated with fitting the available data to the selected regression model.
Here, b̂_0, b̂_1, …, b̂_M are the least squares estimates. These estimates are unbiased and efficient, i.e., they have the minimum variance among all linear estimates for predicting the variables. The regression coefficients represent the contributions of each independent variable to the prediction of the dependent variable. Two opposing criteria are usually used to select the final regression equation.
To make the equation useful for prediction, the observer should seek to include as many independent variables as possible in the model so that the predicted values can be determined more reliably.
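The least squares estimation of multiple regression coefficients described above can be sketched as follows. This is a minimal illustration with synthetic data, not the program used by the authors; the intercept b_0 is handled by prepending a column of ones to the matrix of factors:

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Estimate the regression coefficients b (including the intercept b0)
    by ordinary least squares: minimize the sum of squared deviations."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    b, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return b

# Synthetic check: data generated as y = 1 + 2*x1 - 3*x2 (no noise),
# so the coefficients should be recovered almost exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
print(np.round(fit_multiple_regression(X, y), 6))
```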

Description of Method
The concept of missing values is introduced in this section. When using one-dimensional methods of analysis (for example, the t-criterion [21]), the most reasonable action is to remove elements with no value of X (the analyzed variable) from the sample. However, the situation changes when using essentially multidimensional methods of analysis, i.e., when for each element of the sample there are p observable variables X_1, X_2, …, X_p. If a sample element is missing a value, e.g., for the variable X_1, it is not necessary to remove that sample element from the analysis, because this would result in the loss of the information about the other variables provided by that element. Since multiple linear regression analysis, like other multidimensional procedures, is based on the vector of means m and the covariance matrix S, this element can be left in the sample, and the available measurements can be used to calculate the estimates of the mean vector µ and the covariance matrix Σ.
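A minimal sketch of such "available-case" (pairwise) estimation is shown below. This is an illustration of the idea, under the assumption that per-variable means are computed over all observed entries and each covariance over the pairs where both variables are observed; as the text notes later, such estimates can be biased, and the resulting matrix need not be positive definite.

```python
import numpy as np

def available_case_estimates(data):
    """Estimate the mean vector and covariance matrix from a sample with
    missing values (NaN), using all available measurements for each entry
    (pairwise deletion) instead of removing whole sample elements."""
    data = np.asarray(data, dtype=float)
    means = np.nanmean(data, axis=0)                 # per-variable means
    p = data.shape[1]
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            # use only sample elements where BOTH variables are observed
            mask = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            di = data[mask, i] - means[i]
            dj = data[mask, j] - means[j]
            cov[i, j] = np.sum(di * dj) / (mask.sum() - 1)
    return means, cov
```

On a complete sample (no missing values), this reduces to the ordinary sample mean vector and sample covariance matrix.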
Let us represent the system of equations of the multiple linear regression model (3) in matrix form as Y = XB + ε, where X is the so-called design matrix. For the regression model, it is the matrix of independent variables, supplemented with a first column of current observation weights.

B = {b_j} is the vector of regression equation parameters, and ε is the vector of estimation errors, which has a multidimensional Gaussian distribution with a zero vector of mathematical expectations and a variance matrix of the form σ²I, where I is the identity matrix and T is the transposition symbol. Then, Equations (4)-(6) can be combined and represented in matrix form. The least squares estimation vector is the solution of the system of normal equations,

X^T X B = X^T Y, (8)

whose solution has the form B = (X^T X)^{-1} X^T Y. Considering that the matrix of variances of the vector of estimation errors is σ²I, the correlation matrix of the vector B is equal to σ²(X^T X)^{-1}. In this case, instead of using the optimal estimate determined via the matrix equation as the initial value, it is easier to look for the optimal estimate as the solution of a dual-difference equation. The coefficients of the difference equation are determined by the statistics of the observations and interference and, in the general case, are time-dependent. The advantage of this approach is that even if an analytical solution of the difference equation cannot be obtained, a numerical solution can always be obtained on a computer. Moreover, the solution can be obtained in real time, considering newly received information about changes in the parameters of the observations and interference.
Following [21], the iteration algorithm for solving Equation (6) in sequential form is constructed with matrix multipliers F and G that have non-zero determinants (or with non-zero scalar multipliers). These factors are chosen to ensure the maximum rate of convergence without losing the stability of Equation (10). For the optimal selection of the values F and G, the z-transform operation is applied to (10), giving Equation (11), and the roots of the characteristic equation are calculated; these must have moduli of less than one. Then, the general solution of Equation (11) asymptotically converges to the exact solution of Equation (6) as the number of iterations increases without bound. The rate of convergence depends on the maximum modulus of the roots of the characteristic equation. Given the required modulus of the relative error of the solution, the necessary number of iterations can be estimated as a local or non-local characteristic of the efficiency of finding a solution.
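The sequential iteration described above can be sketched with the matrix multipliers F and G simplified to a single scalar step (an assumption made here for illustration only): the normal equations X^T X B = X^T y are solved by the fixed-point iteration B ← B + G(X^T y − X^T X B), and choosing G below the reciprocal of the largest eigenvalue of X^T X keeps all eigenvalues of the iteration matrix I − G·X^T X inside the unit circle, which is the convergence condition stated in the text.

```python
import numpy as np

def iterative_least_squares(X, y, n_iter=500):
    """Solve the normal equations X^T X B = X^T y by the fixed-point
    iteration B <- B + G*(X^T y - X^T X B), with a scalar step G chosen
    so that every eigenvalue of the iteration matrix I - G*X^T X has
    modulus less than one (guaranteeing asymptotic convergence)."""
    A = X.T @ X
    b = X.T @ y
    G = 1.0 / np.linalg.eigvalsh(A).max()   # spectral radius of I - G*A is < 1
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        B = B + G * (b - A @ B)             # one sequential correction step
    return B
```

As the text notes, the rate of convergence is governed by the largest modulus among the eigenvalues of the iteration matrix, so the required number of iterations can be estimated in advance from the desired relative error.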

Methodology of Experiment and Results
For the simulation, a large corporate computer network with heterogeneous traffic [27] was used. The network had a mixed topology and consisted of more than 200 workstations. The diagram of the typical structure of the researched network is presented in Figure 1. A detailed description of all the traffic parameters is presented in [27]; only the main ones are explained herein. The productivity and equipment parameters were extracted from the service output data. The time series represented the input traffic from the Internet to the clients and the output traffic from the clients to the Internet, with a time interval of 5 min. The transformed input data (Table 1) and normalized KPIs (Table 2, featuring the main KPIs) were used. The key parameters were transmission delay, bandwidth, packet loss, and security level; these parameters have the greatest impact on the resulting service quality. In [22], it was noted that the number of KPIs selected for analysis should be minimal and that, in all cases, it is impractical to use more than 20 such indicators. These considerations were taken into account when specifying the set of KPIs.
The following were selected as the parameters of the optimization problem: a hypothetical IEEE 802.11n WLAN network was considered, and the data for calculating its parameters were taken from [18]. The program of multiple correlation analysis given in [23], modified for this problem, was used for the calculations.
Table 3 shows the partial correlation coefficients of the optimized parameters. Via these correlation coefficients, the partial regression coefficients could later be calculated using Equations (1)-(3). There was an essential correlation between the main key parameters because they significantly impact the sustainability requirements. The exception was e-mail because, unlike streaming audio, video, web services, and file transfer via FTP, it is not critical with respect to bandwidth or delivery delay. However, the level of security and data protection (the parameter D_sp) was critical for almost all the presented parameters, because even for types of flexible traffic such as e-mail, data protection is an integral requirement for ensuring the quality of service (QoS).
The results of the correlation analysis are also a key indicator for the monitoring and regulation of streaming data and web services. This is necessary to ensure the secure transmission of information over the network and to forecast and prevent the congestion of the controlled network fragment. Thus, the current monitoring and management of the network survivability level, which are an integral part of the task of overall service quality management, can be successfully performed using statistical methods, particularly the correlation and regression analysis method.
In addition, the fully compiled calculation program occupies from 80 to 500 kilobytes in the memory of the computing device, depending on the scale of the network and the size of the sample. Since almost any modern network node is a specialized computer or even a multiprocessor system, the problem of the hardware implementation of the proposed method can be easily solved.
There are different methods for estimating µ and Σ (or, equivalently, the correlation matrix R) when some values are missing. These methods are described in detail in [21]. Using those descriptions, some of the "best" equations, which included the optimal number of independent variables, were chosen. The step method (stepwise regression) was used to find the "best" regression equation.
If the number of independent variables is large, an exhaustive search for the best subset is practically infeasible, even when using a computer.
One solution is (direct) stepwise regression, where independent variables are included in a subset one after another according to a predetermined criterion. At the same time, a variable can be replaced by another variable that is not included in the set or can be deleted from it. The set of criteria that determine which variables to include, replace, and delete is called a stepwise procedure. Using a stepwise procedure, an ordered list of predictors was found. For example, when p = 5, this list may have the form X_2, X_5, X_1, X_4, and X_3. The variables were chosen so that, for the definition of the "best" subset of m ≤ p variables, (a) they would better predict Y and (b) their number would be as small as possible.
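A minimal sketch of the forward (direct) stepwise procedure is given below. The inclusion criterion is simplified to the reduction of the residual sum of squares (real stepwise procedures also apply significance tests for entering and removing variables, which are omitted here); it produces an ordered list of predictor indices like the X_2, X_5, X_1, … example above.

```python
import numpy as np

def forward_stepwise(X, y, max_vars):
    """Forward stepwise selection: at each step, include the independent
    variable that most reduces the residual sum of squares (RSS) of the
    least squares fit with an intercept."""
    n, p = X.shape
    selected = []
    for _ in range(max_vars):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            design = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = np.sum((y - design @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected  # ordered list of predictor indices
```

For example, on synthetic data generated as y = 5·X_3 + X_1 (with independent factors), the procedure first selects the dominant third factor and then the first one.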

External destabilizing factors that reduce the stability of functioning include the following:
− emergency and fan power outages;
− multipath interference (for wireless networks);
− viruses and hacker attacks.

Internal destabilizing factors that reduce the stability of systems include the variety of characteristics of the installed equipment, not considered at the stages of design and deployment.

Figure 1. Typical structure of the corporate computer network.

Table 3. Mutual correlation coefficients of the optimized parameters.