Partial Diffusion Markov Model of Heterogeneous TCP Link: Optimization with Incomplete Information

Abstract: The paper presents a new mathematical model of TCP (Transmission Control Protocol) link functioning in a heterogeneous (wired/wireless) channel. It represents a controllable, partially observable stochastic dynamic system. The system state describes the status of the modeled TCP link and expresses it via an unobservable controllable MJP (Markov jump process) with finite-state space. Observations are formed by low-frequency counting processes of packet losses and timeouts and a high-frequency compound Poisson process of packet acknowledgments. The information transmission through the TCP-equipped channel is considered a stochastic control problem with incomplete information. The main idea to solve it is to impose the separation principle on the problem. The paper proposes a mathematical framework and algorithmic support to implement the solution. It includes a solution to the stochastic control problem with complete information, a diffusion approximation of the high-frequency observations, a solution to the MJP state filtering problem given the observations with multiplicative noises, and a numerical scheme of the filtering algorithm. The paper also contains the results of a comparative study of the proposed state-based congestion control algorithm with the contemporary TCP versions: Illinois, CUBIC, Compound, and BBR (Bottleneck Bandwidth and RTT).


Introduction
Despite being almost 50 years old, the Transmission Control Protocol (TCP) [1] is still subject to continual modernization and improvement, and this evolution is a natural, ongoing process. It is driven by the constant challenges posed by the wide variety of computer networks, rapid progress in the design of communication devices, and ever stricter requirements for information transmission [2][3][4]. Guaranteeing data transfer independent of the hardware platform is the key task of the TCP algorithm; stable functioning and effective use of the available channel bandwidth are the performance characteristics of each specific TCP version. The congestion control algorithms are responsible for implementing all these functions. They use two characteristics as control actions. The basic one is the congestion window size (cwnd), i.e., the number of packets that may be sent without acknowledgment. A less influential one is the retransmission timeout, i.e., the waiting time for the acknowledgment of successful packet reception; when it is exceeded, the congestion control algorithm treats the packet as lost.
When most channels were wired and had relatively small capacity and queue waiting time, the "Additive Increase/Multiplicative Decrease" (AIMD) congestion control rule demonstrated good performance. It presumes linear growth of the cwnd between two successive packet losses and an abrupt, jump-like decrease of the cwnd at each loss. The effectiveness of this strategy for such channels was transparent. First, the small channel capacity made it possible to reach the bandwidth limit linearly, without losses, in a rather short time. Second, wired hops were so reliable that a sudden packet loss almost surely indicated congestion at some "bottleneck". Therefore, the loss indicated the necessity to reduce the sending rate. This simple reasoning underlies such loss-based versions of TCP as Tahoe, New Reno, etc. [5].
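As an illustration only, the AIMD rule described above can be sketched as follows. The function names and the default parameters (`alpha = 1` packet per round trip, `beta = 0.5`) are assumptions chosen for the example and are not taken from any specific TCP version.

```python
from typing import List, Sequence


def aimd_step(cwnd: float, loss: bool, alpha: float = 1.0, beta: float = 0.5) -> float:
    """One AIMD update per round trip.

    Without a loss the window grows additively by `alpha`;
    on a loss it shrinks multiplicatively by `beta` (never below 1 packet).
    """
    if loss:
        return max(1.0, cwnd * beta)
    return cwnd + alpha


def aimd_trace(cwnd0: float, losses: Sequence[bool]) -> List[float]:
    """Window sizes along a sequence of per-round-trip loss indicators."""
    trace = [cwnd0]
    for loss in losses:
        trace.append(aimd_step(trace[-1], loss))
    return trace
```

For instance, `aimd_trace(10.0, [False, False, True, False])` grows the window linearly for two round trips, halves it at the loss, and then resumes linear growth, which gives the saw-tooth curve typical of loss-based TCP versions.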
In the case of "long fat" channels (ones with huge capacity and long queue waiting times), AIMD-based versions of TCP turned out to be ineffective: they underused the channel bandwidth significantly. In high-capacity channels, linear growth does not allow the congestion window to quickly reach values close to the available bandwidth. Moreover, the loss of even one packet reduces the data transfer rate further. In addition, if a channel includes a wireless hop, single packet losses are not an unambiguous congestion indicator. The round-trip time (RTT) parameter starts to play a notable role in the congestion control algorithm, and this has led to a variety of TCP versions: delay-sensitive, hybrid loss-delay, bandwidth estimation-based, etc. [2]. All these modifications make the congestion control algorithm more tolerant to packet losses: after each loss, it decreases cwnd not multiplicatively but more sparingly. At the same time, the cwnd grows more aggressively to reach the channel bandwidth faster. The bandwidth value is unknown but is estimated from all past statistics of the channel functioning. The algorithm probes the channel by a more or less gentle cwnd enlargement to give a chance to use all channel resources. Hence, the typical cwnd curve between two packet losses demonstrates a concave [6] or mixed concave-convex character [7].
The ubiquitous application of wireless technologies in computer networks is a challenge to TCP performance and calls for its further enhancement. Jitter and periodic signal fading in the wireless channel hops are extra sources of uncertainty in the real channel throughput. These physical phenomena affect both the new mathematical models of the channel functioning and the congestion control algorithms.
Generally speaking, a prospective mathematical model of a channel should satisfy the conditions below.

1. A model should describe the data transferring process adequately.
2. A model should represent a trade-off between a complicated object with many parameters, their uncertainty along with the uncertainty introduced by the external disturbances, and simplicity.
3. A model should operate with the same collection of statistical information as the one available in the real channel.
4. A model should provide a possibility to simulate the collection of recent "concurrent" versions of TCP.
5. The chosen model presumes the presence of a developed mathematical framework for the solution to the complex of all the analysis, estimation/identification and optimization/control problems. Availability of both the theoretical solution to the problems above and their efficient numerical realization is strongly encouraged.

The aim of the paper is two-fold. First, this is a presentation of a new mathematical model of the TCP link functioning based on the heterogeneous (wired/wireless) channel. It represents a controllable, partially observable stochastic dynamic system. The system state describes the status of the modeled TCP link and expresses it via the controllable Markov jump process (MJP) with a finite-state space. This space can be chosen arbitrarily depending on the desired detailing of the link description. Below in this paper, we consider four possible channel states:
• $e_1$: the channel is idle,
• $e_2$: the channel is loaded moderately,
• $e_3$: congestion in the wired segment,
• $e_4$: signal fading in the wireless hop.
Although rather simple, this model successfully describes such problematic link phenomena as congestion in a channel "bottleneck" and fading of the carrier radio signal.
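To make the four-state description more tangible, the following minimal sketch simulates a path of a finite-state Markov jump process for a fixed control. The intensity matrix `A` and the function `simulate_mjp` are hypothetical: the numerical rates are invented for the example, and in the model the intensities additionally depend on the control.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical transition intensities (per second) for the states
# e1 (idle), e2 (moderate load), e3 (wired congestion), e4 (wireless fading).
# Off-diagonal entries are jump rates; every row sums to zero.
A = [
    [-0.5, 0.5, 0.0, 0.0],
    [0.2, -0.6, 0.3, 0.1],
    [0.0, 1.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
]


def simulate_mjp(
    rates: Sequence[Sequence[float]], state: int, horizon: float, rng: random.Random
) -> List[Tuple[float, int]]:
    """Return the jump times and visited states on [0, horizon)."""
    t = 0.0
    path = [(t, state)]
    while True:
        exit_rate = -rates[state][state]
        if exit_rate <= 0.0:
            break  # absorbing state: no further jumps
        t += rng.expovariate(exit_rate)  # exponential holding time
        if t >= horizon:
            break
        # The next state is chosen proportionally to the off-diagonal rates.
        weights = [r if j != state else 0.0 for j, r in enumerate(rates[state])]
        state = rng.choices(range(len(rates)), weights=weights)[0]
        path.append((t, state))
    return path
```

A call such as `simulate_mjp(A, 0, 50.0, random.Random(1))` returns a list of `(jump_time, state_index)` pairs, i.e., a piecewise constant trajectory of the link state.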
The observations included in the model correspond to those available to a TCP control algorithm on the sending side. Two observable processes describe the flow of packet losses and the flow of timeouts. They are represented by controllable Cox processes with intensities that depend on both the control and the unobserved link state. The third observation is a flow of acknowledgments of successful packet reception at the receiving node. The flow is expressed in terms of a compound Poisson process (CPP). Its first component represents a counting process of acknowledgment reception moments, and the second one registers the corresponding individual values of the Round-Trip Time (RTT).
In the paper, we control the TCP varying the cwnd value only; however, the proposed model allows other control parameters, e.g., RTO (retransmission timeout). We also demonstrate how the proposed mathematical model can describe various contemporary versions of the TCP: Illinois, CUBIC, BBR, and Compound.
The second aim of the paper is the presentation of a new TCP prototype version. Its mathematical background is both the solution to the optimal MJP state control under complete information and the solution to the optimal MJP state filtering given the diffusion and counting observations. The performance of the proposed prototype is demonstrated in a set of numerical experiments.
The paper is organized as follows. Section 2 contains a detailed description of the TCP link mathematical model in terms of the controllable stochastic observation system, along with the optimization problem of data transmission through this link.
One can enhance the use of the channel resources in terms of the optimal stochastic control with incomplete information. However, this approach promises complications during its realization: starting from the proof of the optimal solution existence and concluding by bulky numerical algorithms of its realization. Hence, we propose a rather simple suboptimal solution to the problem along with its effective numerical implementation.
To develop the TCP prototype, we need a substantial mathematical framework, which is introduced in Section 3:
• Section 3.1 contains the solution to the optimal MJP control problem with instant geometric control constraints and complete information [26],
• Section 3.2 introduces a diffusion approximation for the high-frequency CPP describing the packet acknowledgment flow [27],
• Section 3.3 presents a solution to the optimal MJP state filtering problem given both counting and diffusion observations with state-dependent noise [28],
• Section 3.4 contains a numerical algorithm for the optimal filtering realization [28].
In general, the articles [26][27][28] represent a formal, detailed mathematical background of all applied inferences presented in this paper. We use it in Section 4 to develop a new congestion control algorithm as follows. At the first stage, we calculate a high-precision channel state estimate based on the available observations discretized by time. At the second stage, we apply a separation principle: the obtained filtering estimate replaces the actual MJP state during the process of the optimal control synthesis with the complete information.
The aim of Section 5 is two-fold. First, it demonstrates the potential of the proposed mathematical model to describe various versions of the TCP: classic AIMD congestion control scheme and TCP Illinois (Section 5.1), TCP CUBIC (Section 5.2), TCP Compound (Section 5.3), TCP BBR (Section 5.4).
Second, the section contains the comparison of the proposed state-based TCP with versions mentioned above: Section 5.5 highlights some details of the numerical realization of the proposed TCP version, and Section 5.6 represents the summary of the performed numerical experiments. Section 6 contains concluding remarks.

Problem of Optimal Data Transmission through TCP Channel
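On the canonical Wiener-Poisson space with filtration $(\Omega, \mathcal F, P, \{\mathcal F_t\})$ [29,30] we consider the following controllable stochastic system describing the TCP link functioning:

$$X_t = X_0 + \int_0^t A^\top(u_s) X_s\,ds + \alpha_t. \qquad (1)$$

Here the TCP link state $X_t$ is a controllable finite-state MJP with values in the set $S^N \triangleq \{e_1, \dots, e_N\}$ formed by unit coordinate vectors of the Euclidean space $\mathbb R^N$. The initial value $X_0$ has a known distribution $\pi$, $A(u) = \|A_{ij}(u)\|_{i,j=\overline{1,N}}$ is a controllable transition intensity matrix, and $\alpha_t$ is an $\mathcal F_t$-adapted martingale with the quadratic characteristic [31]

$$\langle \alpha, \alpha \rangle_t = \int_0^t \left( \operatorname{diag}\big(A^\top(u_s) X_s\big) - A^\top(u_s)\operatorname{diag}(X_s) - \operatorname{diag}(X_s) A(u_s) \right) ds.$$

The link state is unobservable, and the complex of observations $(Y_t, Z_t, \{(\tau_n, V_n)\})$ includes three components.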
• $Y_t$ is a counting process (flow) of packet losses described by its martingale representation

$$Y_t = \int_0^t B(u_s) X_s\,ds + \beta_t, \qquad (2)$$

where $\beta_t$ is an $\mathcal F_t$-adapted martingale with the quadratic characteristic $\langle \beta, \beta \rangle_t = \int_0^t B(u_s) X_s\,ds$, and $B(u) \triangleq \mathrm{row}(B_1(u), \dots, B_N(u))$ represents the collection of the loss intensities of the flow given the conditions $X_t = e_n$, $n = \overline{1,N}$.
• $Z_t$ is a counting process (flow) of packet timeouts described by its martingale representation

$$Z_t = \int_0^t C(u_s) X_s\,ds + \gamma_t, \qquad (3)$$

where $\gamma_t$ is an $\mathcal F_t$-adapted martingale with the quadratic characteristic $\langle \gamma, \gamma \rangle_t = \int_0^t C(u_s) X_s\,ds$, and $C(u) \triangleq \mathrm{row}(C_1(u), \dots, C_N(u))$ represents the collection of the timeout intensities of the flow given the conditions $X_t = e_n$, $n = \overline{1,N}$.
• $\{(\tau_n, V_n)\}$ is a flow of successful packet acknowledgments: here $\tau_n$ stands for the time instant of the $n$-th acknowledgment arrival, and $V_n$ stands for the specific RTT of the $n$-th acknowledgment. It represents a controllable compound Poisson process (CPP) with the intensity driven by the Markov state $X_t$: the predictable measure generated by $\{(\tau_n, V_n)\}$ conditioned by the MJP state $X$ takes the form

$$\mu_p(dt, dv) = \lambda(u_t)\operatorname{diag}(X_{t-})\,\Lambda(u_t, v)\,dv\,dt. \qquad (4)$$

Here $\lambda(u_t) \triangleq \mathrm{row}(\lambda_1(u_t), \dots, \lambda_N(u_t))$ is a vector-valued function with continuous positive components, whose $n$-th component represents the conditional intensity of acknowledgment arrivals given $X_t = e_n$; $\Lambda(u_t, v) \triangleq \mathrm{col}(\Lambda_1(u_t, v), \dots, \Lambda_N(u_t, v))$ is a vector-valued function with continuous components, whose $n$-th component represents the conditional probability density function (pdf) with respect to $v$ given $X_t = e_n$ for each fixed $u_t$.
All martingale terms in the processes $X$, $Y$, $Z$ and $(\tau, V)$ are strongly orthogonal. The control $u_t$ represents the current size of the congestion window, i.e., the portion of packets which can be instantly transmitted. The set of admissible controls contains all $\mathcal O_t$-predictable processes ($\mathcal O_t \triangleq \sigma\{Y_s, Z_s, (\tau_n, V_n) : s, \tau_n \in [0, t]\}$ stands for the natural filtration induced by all observations available up to the moment $t$) satisfying the geometric constraint (5). The intensity of acknowledgment arrivals is much greater than the state transition, packet loss and timeout intensities. The performance criterion (6) represents an average profit for the transmitted information, which should be maximized. Here
• $\psi \triangleq \mathrm{row}(\psi_1, \dots, \psi_N)$ is a vector of conditional gains given the terminal state $X_T$,
• $\varphi(u_s) \triangleq \mathrm{row}(\varphi_1(u_s), \dots, \varphi_N(u_s))$ includes strictly concave components, which represent conditional instant gains for the transmitted information given the current link state $X_s$,
• $\xi \triangleq \mathrm{row}(\xi_1, \dots, \xi_N)$ is a vector of specific transmission expenses per information unit in each link state.
The problem under consideration is challenging. First, in general, optimal control problems for stochastic jump processes with incomplete information are rather complicated [31][32][33][34]. Their proper statement and solution depend on the answers to several auxiliary problems: the martingale problem [35], the existence and uniqueness of a strong solution, and measurable control selection (see [36] and references within). Without positive answers to these questions, we cannot use the martingale theory [35,37] to express the optimal control in terms of either variational inequalities (with the dynamic programming equation as the preferable outcome) or the stochastic maximum principle. Please note that negative answers imply only that the mathematical tools mentioned above cannot be used. Apparently, the control problem can be modified slightly to guarantee the existence of a solution, which could then be found using other, still undiscovered frameworks.
Second, both the dynamic programming equation and the stochastic maximum principle have a forward-backward form, which complicates the synthesis of the optimal control in explicit form. The authors of [36] solved the analogous problem of the MJP state (1) control observing the flow of packet losses (2) only. The theoretical optimal solution was characterized both via the dynamic programming equation and via the maximum principle. At the same time, the authors presented a numerical realization of the obtained result only for the case when the transition intensity matrix of the MJP is independent of the control (i.e., the state is uncontrollable) and the control affects the intensity of the losses only. Despite the restrictive conditions, the obtained practical results looked rather promising: the optimal policy demonstrated a piecewise concave nature similar to the modern versions of TCP: Illinois [6], CUBIC without probe phase [38,39], Compound [40,41], etc.
Third, the essential weak points of the optimal control implementation are its poor robustness with respect to imprecise knowledge of the control system characteristics and the sensitivity of its performance to small perturbations of the synthesized control. This means that either control system parameters slightly deviating from their unknown nominal values or "instrumental errors" in the control caused by an imperfect numerical realization could nullify the gain of the sophisticated optimal control in comparison with a stable suboptimal algorithm.
Fourth, the flow of packet acknowledgments has high intensity and hence leads to a high-frequency control, which is resource intensive.
Keeping in mind all arguments above we avoid the direct solution to the optimal stochastic control problem (6) of the MJP (1) state given the observations (2), (3) and (4) including the martingale problem and the ones of the solution existence and uniqueness. Instead of this we use solutions to a complex of adjacent problems and propose a suboptimal control algorithm of high performance.

Mathematical Background
As a basis of the proposed suboptimal control algorithm, we use the following arguments and mathematical results.

1. The solution to the optimal stochastic control problem for the MJP (1) state with complete information exists and can be defined as a solution to the dynamic programming equation [26].
2. The high frequency of the acknowledgment flow allows us to approximate the observable controlled CPP (4) by a drifting Brownian motion [42] with the parameters modulated by the MJP state [27]. We can describe the distribution of the diffusion approximation via some moment characteristics only, and this fact makes the subsequent state filtering algorithm robust to imprecise knowledge of the specific distribution of the compound Poisson process jumps.
3. The conversion of the high-frequency acknowledgment flow to a diffusion process makes it possible to use the solution to the optimal MJP (1) state filtering problem given the "diffusion" and counting observations [43]. This is an extension of the Wonham filter [44] to the case of diffusion observations with state-dependent noises. Under rather mild identifiability conditions, the optimal filtering estimate coincides with the exact MJP state.
4. The dynamic programming equation corresponding to the control problem with complete information mentioned in item 1 represents a system of ordinary differential equations with well-developed methods of numerical solution. By contrast, the equations of the generalized Wonham filter [43] require the design of special numerical procedures similar to [28].
5. To complete the control synthesis, we postulate a separation principle. This means that we put the state filtering estimate mentioned in items 3 and 4 into the control strategy defined in item 1.
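The separation principle of item 5 can be illustrated by a minimal sketch in which a posterior distribution of the link state is converted into a point estimate and then used to look up a precomputed control. The names `map_state`, `separated_control` and the table of window sizes are hypothetical and serve the example only.

```python
from typing import Sequence


def map_state(posterior: Sequence[float]) -> int:
    """Index of the most probable state under the filtering estimate."""
    return max(range(len(posterior)), key=lambda n: posterior[n])


def separated_control(posterior: Sequence[float], controls_by_state: Sequence[float]) -> float:
    """Plug the state estimate into a complete-information control rule.

    `controls_by_state[n]` is the window size that the complete-information
    strategy would apply if the link were known to be in state n.
    """
    return controls_by_state[map_state(posterior)]


# Hypothetical window sizes for the states e1..e4 at the current time step.
cwnd_table = [8.0, 16.0, 4.0, 12.0]
chosen_cwnd = separated_control([0.1, 0.6, 0.2, 0.1], cwnd_table)
```

In this sketch the filter and the controller are designed independently; the only coupling is the point estimate of the state passed from the first stage to the second one.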

Optimal Control Strategy with Complete Information
Let us consider the controllable MJP (1), which should be optimized with respect to the optimality criterion (6), where the set $\mathcal U$ of all admissible controls includes all $\mathcal O_t$-predictable processes satisfying the geometric constraint (5).

1. The function $\eta(t)$ is the unique solution to the Cauchy problem (8).
2. There exists a Borel function $\hat u_t(x)$ given by (9).
3. The random process $\hat U_t \triangleq \hat u_t(X_{t-})$ is an optimal control strategy for the problem (1), (6).
4. The optimal value of criterion (6) has the form $\max_{U \in \mathcal U} J(U) = J(\hat U) = \eta^\top(0)\pi$; moreover, the supremum in (7) is attained.

The theorem establishes the basis of the practical control realization. Indeed, all variants of possible optimal controls (9) can be calculated and stored in advance, before the control synthesis, by solving (8). The synthesis itself represents the selection of a suitable control from the set of possible ones using the "current" MJP state $X_{t-}$.
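The idea of precomputing the controls can be sketched with a generic dynamic programming recursion for a controlled finite-state MJP. This is not the Cauchy problem (8) itself: the sketch assumes a criterion of the form "terminal gain plus integrated running gain", a finite grid of admissible controls, and an explicit Euler scheme backward in time. All names (`solve_dp`, `A_of_u`, `gain`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def solve_dp(
    A_of_u: Callable[[float], Matrix],          # transition intensities for control u
    gain: Callable[[float], Sequence[float]],   # running gain per state for control u
    psi: Sequence[float],                       # terminal gains per state
    controls: Sequence[float],                  # finite grid of admissible controls
    T: float,
    steps: int,
) -> Tuple[List[float], List[List[float]]]:
    """Backward Euler-type recursion for the value vector eta(t).

    Approximates -d(eta_i)/dt = max_u [gain_i(u) + sum_j A_ij(u) * eta_j],
    eta(T) = psi. Returns eta(0) and the table policy[k][i] of maximizing
    controls for state i on the k-th time step.
    """
    n = len(psi)
    dt = T / steps
    models = [(u, A_of_u(u), gain(u)) for u in controls]
    eta = list(psi)
    policy: List[List[float]] = []
    for _ in range(steps):
        new_eta, best_controls = [], []
        for i in range(n):
            best_val, best_u = None, controls[0]
            for u, A, g in models:
                val = g[i] + sum(A[i][j] * eta[j] for j in range(n))
                if best_val is None or val > best_val:
                    best_val, best_u = val, u
            new_eta.append(eta[i] + dt * best_val)
            best_controls.append(best_u)
        eta = new_eta
        policy.append(best_controls)
    policy.reverse()  # policy[0] now corresponds to the interval starting at t = 0
    return eta, policy
```

Under these assumptions, the resulting table plays the role of the stored controls: at run time only a lookup by the current time step and state is required, and the optimal criterion value is approximated by the scalar product of `eta` with the initial distribution.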

Diffusion Approximation of High-Frequency Counting Observations
Using the "genuine" acknowledgment flow (4) to synthesize the control yields a discontinuous high-frequency control. Its calculation may be resource intensive: each newly arrived acknowledgment triggers the control recalculation algorithm. The contemporary TCP versions work exactly like this, but they are relatively simple and hence not too "costly".
Once we consider (4) discretized by time with some appropriate time increment, we can see that the probability distribution of the observation increments looks like a mixture of Gaussians due to the central limit theorem for renewal-reward processes (CLTRRP). In this subsection we answer two questions. First, we determine the characteristics of these mixtures. Second, we form recommendations on how to choose the time increment to provide appropriate closeness of the real distribution of the discretized observations to the theoretical mixture above.
First, to perceive the nature of the diffusion approximation, we investigate the CPPs with a fixed control $u \in U$. We consider a collection of the CPPs $\{(\tau^j_n, V^j_n)\}_{n \in \mathbb N}$, $j = \overline{1,N}$. Probabilistically they correspond to the initial CPP $\{(\tau_n, V_n)\}$ staying in the "single mode" $X_t \equiv e_j$ under a fixed control value $u_t \equiv u$. Each CPP generates a stochastic measure with the predictable measure $\mu^j_p$. Keeping in mind the specific form of the predictable measures $\mu^j_p$, we can compute the moment characteristics for one jump of the CPPs.

We investigate the asymptotic behavior of the distribution of the corresponding two-dimensional random process when $t \to \infty$. Its first component represents the total number of acknowledgments received at the sender over the time interval $[0, t]$; the second component, in turn, stands for the corresponding cumulative RTT value. The author of [42] proved a version of the CLTRRP, namely the weak convergence (12) as $t \to \infty$. In other words, for rather large $t$ the distribution of this process is approximately Gaussian.

Let us complicate the model by mixing the CPPs $\{(\tau^j_n, V^j_n)\}_{n \in \mathbb N}$, $j = \overline{1,N}$, $u \in U$, above with probabilities $\pi = \mathrm{col}(\pi_1, \dots, \pi_N)$, which gives the mixed CPP (13). Here $X_0 \triangleq \mathrm{col}(X^1_0, \dots, X^N_0)$. It is easy to verify that the predictable measure generated by $\{(\tau_n, V_n)\}$, conditioned by $X_0$, takes a form analogous to (4). Please note that the mixed CPP (13) represents a specific case of the observations (4) with the "single mode" MJP $X$: $X_t \equiv X_0$. Making inferences as above, we can conclude that for rather large $t$ the distribution of the mixed process is approximately a mixture of Gaussians.

Therefore, given some MJP state $X_s$ distribution (conditional or unconditional) at the time instant $s$ and a constant control $u_q \equiv u \in U$, $q \in [s, s + h)$, we assume that the cumulative observation increment over the interval $[s, s + h)$ is distributed approximately according to (15). By analogy with (15), for the cumulative process $Q_t$ (16) corresponding to the acknowledgment flow (4) we propose the approximate diffusion model (17). The model (17) makes it possible both to solve the MJP state filtering problem given the diffusion and counting observations and to develop the corresponding algorithms of the numerical solution to the filtering problem.
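As a hedged illustration of the "single mode" case, the sketch below computes the exact first and second moments of the pair (number of acknowledgments, cumulative RTT) for a compound Poisson process over a window of length `h`, and draws sample increments for a Monte Carlo check. The helper names and the exponential RTT distribution used in the sampler are assumptions of the example; only the intensity and the first two jump moments enter the moment formulas.

```python
import random
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Vec2, Vec2]


def cpp_increment_moments(lam: float, m1: float, m2: float, h: float) -> Tuple[Vec2, Mat2]:
    """Mean and covariance of (count, cumulative RTT) over a window of length h.

    lam: acknowledgment intensity, m1 = E[V], m2 = E[V^2] for one RTT value V.
    """
    mean = (lam * h, lam * h * m1)
    cov = ((lam * h, lam * h * m1), (lam * h * m1, lam * h * m2))
    return mean, cov


def sample_increment(lam: float, mean_rtt: float, h: float, rng: random.Random) -> Tuple[int, float]:
    """One sample of (count, cumulative RTT); exponential RTTs are an assumption."""
    t = rng.expovariate(lam)
    count, total = 0, 0.0
    while t < h:
        count += 1
        total += rng.expovariate(1.0 / mean_rtt)
        t += rng.expovariate(lam)
    return count, total
```

With, e.g., `lam = 200`, a mean RTT of 0.05 and `h = 0.1`, the window contains about 20 acknowledgments on average, so a Gaussian law with the moments above is already a plausible description of the increment; for much smaller `h` the discreteness of the count becomes visible.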
By contrast with the weak convergence in (12), there is no convergence in (15). First, the right-hand side (RHS) of (15) contains the mathematical expectation, which is an increasing function of $t$. Second, we determine (15) under the hypothesis that the MJP state $X$ remains unchanged over the discretization interval: $X_q \equiv X_s$, $q \in [s, s + t)$. In the general case, the probability of an MJP state transition tends to 1 as the interval length $t$ increases infinitely.
Use of the time-discretized observations (4) at the first stage of the control synthesis, i.e., the MJP state filtering, presumes calculation of likelihood ratios for the single Gaussian modes and their mixtures. Therefore, the filtering performance depends on both the "theoretical" pdf (15) and the closeness of the real distribution of the observation increments to (15).
We form recommendations for an appropriate choice of the time interval for the discretization of (4). On the one hand, the length should provide appropriate performance of the diffusion approximation (15) when there are no MJP state transitions over the time interval. On the other hand, the interval length should be small enough to guarantee a small probability of those state transitions.
In the CLT, the closeness of the limit distribution and the pre-limit one is described by the Berry-Esseen inequality in terms of either the uniform metric or the total variation one [45][46][47]. By contrast, we are interested in the closeness of the corresponding pdfs, and the appropriate results are valid for the case of the "classic" CLT, not for the CLTRRP.
We propose a heuristic technique to choose the discretization interval length based on a performance criterion of the distribution approximation.
We refer to the "single mode" processes $\Theta^j_t$ and construct normalized processes from them. By definition, $\Theta^j_h$ represents the normalized sum of a random number of independent identically distributed normalized random summands. We investigate the closeness of its distribution to the standard Gaussian one depending on the time $h$.
Below in the filtering algorithm we operate with various likelihood ratios calculated via the pdfs, hence we need to characterize a distance between the pre-limit pdf and its limit one. The precise distance is difficult to calculate, and we must turn to some upper bound of this quantity.
Let $\mu(dx)$ be some positive measure on $(\mathbb R, \mathcal B(\mathbb R))$ such that both the pdf $\frac{dP_a}{d\mu}$ of the pre-limit distribution and the pdf $\frac{dP}{d\mu}$ of the limit one exist. Then the relative approximation error, expressed via these pdfs, coincides with the total variation distance (TVD) between $P_a$ and $P$. We use the notation $P^j(x, h) \triangleq P\{\Theta^j_h \le x\}$ for the pre-limit distribution function, $P^j_n(x)$ stands for the distribution function of the normalized sum of $n$ independent identically distributed normalized random summands with the pdf $\Lambda_j(u)$, and $\Phi(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz$ stands for the distribution function of the standard Gaussian random value. The total probability formula yields a decomposition of $P^j(x, h)$ over the number of summands, in which $I(x)$ is the Heaviside function.
an approximate upper bound of $\mathrm{Var}(P^j, \Phi)$ can be written as (20).

Proof. From (19) and the results of [48] (Theorem 1.1) and [49] (Theorem 2.6), the inequalities (21) are true, where $C_1 = C_1(\Lambda_j(u, \cdot))$ is some parameter (see [48,49] for details). Under the conditions of the Proposition, the approximation of the Poisson distribution by the Gaussian one is valid. The coefficients $a$ and $b$ above correspond to a piecewise linear majorant for $y(x) = \frac{1}{\sqrt{x}}$ over the interval $[1, +\infty)$ (see Figure 1). We can calculate the last integral analytically, which gives (24). Using the RHS of (24) in (21), we obtain the approximate upper bound (20). This ends the sketch of the proof of the Proposition.
To characterize the distance between the distribution of the $Q_t$ (16) increment and its diffusion approximation (17), we should take into account the chance of an MJP transition during the discretization interval. Let us suppose $X^u_t = e_j$; then, taking into account (20), the upper bound (25) of $\mathrm{Var}(P^u, \Phi \mid X_t = e_j)$ can be obtained by the total probability formula. The second summand in (25) accounts for the chance that the MJP leaves the state $e_j$ during the time interval, which happens with probability $1 - e^{A_{jj}(u)h}$, and the multiplier 2 is the upper bound of the TVD for any pair of distributions.
To take into account the statistical uncertainty of the current state $X^u_t$, we must consider an averaged criterion, which describes the guaranteed estimate of the distribution distance for the case of the fixed control $u \in U$ and $X^u_t \sim \mathrm{col}(p_1, \dots, p_N)$.
From the practical point of view, a "rational" value of the time increment $h$ can be chosen following one of the policies:
1. Numerical analysis of the values $J_j(u, h)$ for various $(j, u, h)$ to choose an appropriate value of $h$.
2. Solution to the individual minimax problems with subsequent choice of the maximal $h$ from the set of the individual solutions.
3. Solution to the general minimax problem.

In this paper, we use the first policy as the most economical one.
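The trade-off behind the first policy can be sketched numerically. The code below does not reproduce the bound (20) or the criterion (25): as a stand-in, it assumes a Berry-Esseen-type term `c / sqrt(lam * h)` for the Gaussian approximation error and combines it with the probability `1 - exp(a_jj * h)` of leaving the state, weighted by the TVD upper bound 2. The constant `c`, the function names and the grid are hypothetical.

```python
import math
from typing import Sequence


def approx_error(h: float, lam: float, a_jj: float, c: float = 1.0) -> float:
    """Heuristic distance between the increment law and its Gaussian model.

    lam:  acknowledgment intensity in the current state,
    a_jj: diagonal (negative) transition intensity of the current state,
    c:    assumed constant of the CLT error term.
    """
    stay = math.exp(a_jj * h)                    # no state transition within h
    clt_term = min(2.0, c / math.sqrt(lam * h))  # assumed CLT error, capped by 2
    return stay * clt_term + 2.0 * (1.0 - stay)


def best_increment(grid: Sequence[float], lam: float, a_jj: float, c: float = 1.0) -> float:
    """Grid value of h with the smallest heuristic error."""
    return min(grid, key=lambda h: approx_error(h, lam, a_jj, c))


grid = [0.001 * 2 ** k for k in range(12)]
h_star = best_increment(grid, lam=1000.0, a_jj=-0.5)
```

With these illustrative numbers the minimum lies strictly inside the grid: too small an increment spoils the Gaussian approximation, while too large an increment makes a state transition within the interval likely.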

Optimal Filtering of MJP State Given Counting and Diffusion Observations
In this section, we investigate the MJP state (1) filtering problem given the counting observations (2), (3) and the diffusion observations (17). Without loss of generality, to simplify the presentation and the subsequent analysis of the solution to the MJP filtering problem, we introduce the additional assumptions below.

1.
The control $u_t$ represents an observable nonrandom càdlàg process.

2.
The noises in $Q_t$ are uniformly nondegenerate [50], i.e., min N has a finite local variation (here and below 0 stands for a zero matrix of appropriate dimensionality); $K(u_t) \triangleq \|K_{ij}(u_t)\|_{i,j=\overline{1,N}}$ is the corresponding $N \times N$-dimensional matrix-valued function.

The optimal filtering problem is to find the conditional mathematical expectation (CME) $\hat X_t \triangleq E\{X_t \mid \mathcal O_t\}$, where $\{\mathcal O_t\}$ is the natural flow of σ-algebras generated by the observations (2), (3) and (17).
The noise intensity in the observations (17) depends on the estimated state $X$, and this fact prevents the application of the known results of optimal nonlinear filtering [37]. To overcome this obstacle, we use a special transformation of the available diffusion observations [28]. Here we present a sketch of this transformation.
The Ito rule makes it possible to obtain the observable quadratic characteristic of $Q$:

$$\langle Q, Q \rangle_t = \int_0^t \sum_{n=1}^{N} e_n^\top X_s E_n(u_s)\,ds. \qquad (27)$$

We use the normalized diffusion observations as the first block component of the transformed observations. In the model of this process, the drift is $\bar D(u_s) X_s$, where $\bar D(u) \triangleq \sum_{n=1}^{N} E_n^{-\frac{1}{2}}(u) D(u) \operatorname{diag} e_n$, and the noise $W_t$ is a standard Wiener process of appropriate dimensionality.
The quadratic characteristic $\langle Q, Q \rangle$ contains essential statistical information which should be included in the estimation algorithm. This process is a linear transformation of the estimated MJP state.
This is easy to verify by direct derivation; however, the result of the direct derivation is a matrix-valued function $F$ of excess dimensionality. All its statistical information is included in the complete preimage of $F$. In [28] we explain in detail how to reduce the "rough" process $F$ to the $N$-dimensional "compressed" process $H_t$, whose model involves $L(u_t)$, an $N \times N$-dimensional matrix-valued function with càdlàg components; its rows are orthogonal and contain 0 or 1 only. One can rewrite the process $H_t$ as the sum of two terms: the term $H^D_t$, which is a cumulative sum of the jumps occurring at some nonrandom (or $\mathcal O_t$-predictable) moments $\tau$, and the term $H^R_t$, which accumulates the jumps at the random (totally inaccessible) moments. The process $H^D_t$ represents the second block component of the transformed diffusion observations. To obtain the third component, we must express $H^R_t$ through the equivalent complex of the counting processes $G_t = \mathrm{col}(G^1_t, \dots, G^N_t)$. The components of this process have the following properties.

1. Each component $G^n_t$ has a martingale representation in which $\alpha^u_t$ is the martingale from the state representation (1), $L_n(u) \triangleq e_n^\top L(u)$ and $\Gamma_n(u) \triangleq \operatorname{diag}(L_n(u))\,\Lambda(u)\,(I - \operatorname{diag}(L_n(u)))$.
2. $[G^n, G^m]_t \equiv 0$ for any $n \ne m$, and $\langle G^n, G^n \rangle_t = \int_0^t \mathbf{1}\,\Gamma_n(u_s) X_s\,ds$.

Below we present a stochastic system for the CME $\hat X_t$ along with its properties.

Proposition 2. The following assertions are true.

1. The CME $\hat X_t$ is the unique strong solution to a closed finite-dimensional stochastic system.
2. The maximum a posteriori probability (MAP) estimate $\tilde X_t \triangleq e_n : n \in \operatorname{Argmax}_{1 \le m \le N} e_m^\top \hat X_t$ minimizes the $L_1$-criterion, i.e., $\tilde X_t \in \operatorname{Argmin}_{\bar X_t} E\{\|\bar X_t - X_t\|_1\}$.
3. If $E_n(u) \ne E_m(u)$ for any $n \ne m$ almost everywhere on $[0, t]$, then $\hat X_t = X_t$ $P$-a.s.
The validity of items 1 and 3 in Proposition 2 can be proved by complete analogy with [28] (Theorem 1, Corollary 1), meanwhile the one of item 2 is proved in [51].
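The intuition behind item 2 can be seen from a short computation. The derivation below is only a sketch under the assumption that the competing estimates take values in the set of unit vectors; the symbols $p_m$ for the components of the CME are introduced here for illustration, and the full proof is the one in [51].

```latex
% Let p_m = e_m^\top \hat X_t = P\{X_t = e_m \mid \mathcal O_t\}, m = 1,...,N.
% For a candidate estimate e_n and a true state e_m:
%   \|e_n - e_m\|_1 = 0 if m = n, and 2 otherwise.
\begin{align*}
E\{\|e_n - X_t\|_1 \mid \mathcal O_t\}
  &= \sum_{m=1}^{N} p_m \,\|e_n - e_m\|_1
   = 2 \sum_{m \ne n} p_m
   = 2\,(1 - p_n).
\end{align*}
% The conditional L_1 error is minimal when p_n is maximal,
% i.e., for n \in \operatorname{Argmax}_{1 \le m \le N} p_m (the MAP rule).
```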
The theoretical assertions above are also meaningful from the practical point of view for subsequent design of the suboptimal control of MJP state under incomplete information. First, the CME X t represents a solution to some closed finite-dimensional stochastic system, by contrast with the general case of the optimal filtering problem [37]. Second, the paths of the CME X t usually are piecewise continuous functions with values in Π, meanwhile the MJP X state trajectories are P-a.s. piecewise constant functions with values in S N . Therefore, we cannot directly substitute the state X by its estimate X, imposing the separation principle to this control problem. The CME X can be easily transformed into the MAP estimate X with the paths with the same properties as the ones of X. Assertion 2 of Proposition indicates that the proposed MAP estimate is also L 1 -optimal. Third, if the observation system satisfies the identifiability conditions (see Assertion 3 of Proposition) then the MJP state can be restored exactly given the indirect noisy observations. This crucial property gives a chance to reduce the initial control problem with incomplete information to the one with complete information. Obviously, any numerical realization of the filtering estimate leads to some approximation errors, nevertheless Assertion 3 allows one to hope that the small filtering errors cause acceptable control performance.
At the same time, the results of Proposition 2 are difficult to apply directly. First, because the acknowledgment flow (4) is approximated by the diffusion model (17), the latter is valid and can be applied effectively only to observation increments over time intervals of significant length (see Section 3.2). Second, the process $H_t$, which plays the key role in the estimation, is not observable directly: it is the result of a stochastic limit passage, since it is based on the quadratic characteristic $\langle Q, Q \rangle$. Because the time increment of the diffusion observations is bounded from below, direct calculation of $H_t$ appears impossible. In the next subsection, based on the time-discretized diffusion observations, we present a special numerical algorithm of nonlinear filtering together with its performance characteristics.

Numerical Realization of Filtering Algorithm
To construct the numerical algorithm of MJP state filtering given the combination of diffusion and counting observations, we consider a time-invariant version of the observation system (1), (3), (2), (17) given the observations discretized in time with the increment $h > 0$ ($t_r \triangleq rh$, $r \in \mathbb{N}$):

and $\mathcal{O}_r \triangleq \sigma\{Y_n, Z_n, Q_n, n \leqslant r\}$ is the natural filtration generated by the discretized observations. The assumption that the coefficients $A$, $B$, $C$, $D$ and $E$ are constant is not too restrictive in practice, because below we construct an MJP control that is constant during the time discretization intervals. Note that the discretized observations $Y_r$, $Z_r$ and $Q_r$ are conditionally independent given $\mathcal{F}^X_{t_r} \vee \mathcal{O}_{r-1}$ due to the properties of the Wiener-Poisson canonical space and the result of [50] (Lemma 7.5). Specifically, the distribution of $Y_r$, $Z_r$ and $Q_r$ depends on the random vector $\eta_r = \mathrm{col}(\eta^1_r, \ldots, \eta^N_r) = \int_{t_{r-1}}^{t_r} X_s\,ds$, which is composed of the occupation times of the process $X$ in each state $e_n$ during the interval $[t_{r-1}, t_r]$. Then

• the conditional distribution of $Y_r$ given $\mathcal{F}^X_{t_r} \vee \mathcal{O}_{r-1}$ is Poisson with the parameter $B\eta_r$,
• the conditional distribution of $Z_r$ given $\mathcal{F}^X_{t_r} \vee \mathcal{O}_{r-1}$ is Poisson with the parameter $C\eta_r$,
• the conditional distribution of $Q_r$ given $\mathcal{F}^X_{t_r} \vee \mathcal{O}_{r-1}$ is Gaussian with the mean $D\eta_r$ and covariance matrix $\sum_{n=1}^N \eta^n_r E_n$.

Below we use the following notation:

• $\bar A \triangleq \max_{n = \overline{1,N}} |A_{nn}|$;
• $\mathcal{D} \triangleq \{u = \mathrm{col}(u^1, \ldots, u^N) : u^n \geqslant 0, \sum_{n=1}^N u^n = h\}$ is an $(N-1)$-dimensional simplex in the space $\mathbb{R}^M$; $\mathcal{D}$ is the distribution support of the vector $\upsilon_r$;
• $\Pi \triangleq \{\pi = \mathrm{col}(\pi^1, \ldots, \pi^N) : \pi^n \geqslant 0, \sum_{n=1}^N \pi^n = 1\}$ is the "probabilistic simplex" formed by the possible values of $\pi$;
• $N^X_r$ is the random number of transitions of the state $X_t$ that occurred on the interval $[t_{r-1}, t_r]$;
• $\rho^{k,\ell,q}(du)$ is the conditional distribution of the vector $X^{\ell}_{t_r} \mathbf{I}_{\{q\}}(N^X_r)\upsilon_r$ given $X_{t_{r-1}} = e_k$, i.e., for any $G \in \mathcal{B}(\mathbb{R}^M)$ the following equality is true:
• $N(q, m, K)$ is the Gaussian probability density function (pdf) with the expectation $m$ and nondegenerate covariance matrix $K$;
• $P(n, a) \triangleq e^{-a} \frac{a^n}{n!}$ is the Poisson distribution with the parameter $a$;
• $\Upsilon^{k,j,s}(y, z, q)$

Below is an assertion introducing the algorithm for calculating the MJP state estimate given the discretized observations, $\hat X_r \triangleq \mathsf{E}\{X_{t_r} | \mathcal{O}_r\}$.

Proposition 3. The filtering estimate $\hat X_r$ can be calculated by the following recursive algorithm

and initial condition $\hat X_0 = \pi_0$.
To construct a numerically realizable algorithm, we must restrict the sums in both the numerator and the denominator of (37) and obtain the analytical approximation of the $S$th order. We present some summands $\Upsilon$ of low order $s$: $\Upsilon^{k,j,0}(y, z, q) = \delta_{kj} e^{A_{kk} h} P(y, B_k h) P(z, C_k h) N(q, hD_k, hE_k)$, where $D_k$ is the $k$th column of the matrix $D$. The other summands are also determined by the total probability formula and have a complicated form. The integrals above cannot be calculated analytically, so we approximate them by integral sums $\Upsilon^{k,j,s}(y, z, q) \approx \sum_{\ell=1}^{L} \varrho^{kj}_{\ell} P(y, Bv_\ell) P(z, Cv_\ell) N(q, Dv_\ell, \sum_{n=1}^N v^n_\ell E_n)$, where $\{v_\ell\}_{\ell=\overline{1,L}} \subset \mathcal{D}$ is a collection of points and $\{\varrho^{kj}_\ell\}_{\ell=\overline{1,L}}$ are the corresponding weights, such that $\sum_{j=1}^N \sum_{\ell=1}^L \varrho^{kj}_\ell \leqslant 1$. Therefore, we calculate the filtering estimate by the recursion

and refer to it as the numerical approximation of the $S$th order corresponding to the chosen numerical integration scheme. Let us fix a time instant $t$ and consider the asymptotic performance of approximation (41) as $h \to 0$. The performance index is $\sup_{\pi \in \Pi} \mathsf{E}_\pi \|\hat X_r - \bar X_r\|_1$, i.e., the average $L_1$-norm of the filtering error calculated at step $r$ for the worst initial distribution of the MJP.
The proof of Proposition 4 is similar to that of [28] (Lemma 4, Theorem 2). The first term in (42) characterizes the error of the analytical approximation: formula (39) takes into account at most $S$ possible state transitions that occur during the time discretization interval $[t_{r-1}, t_r]$. The second term in (42) describes the impact of the numerical integration error on the overall performance of the filtering approximation. We can deduce that an effective choice of the integration scheme should make both summands in (42) contribute equally.
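To make the structure of the lowest-order summand concrete, the following minimal sketch evaluates it for scalar observations and uses it in a Bayes update that ignores state transitions inside a step. This is only an illustration under stated assumptions: the diffusion observation is taken one-dimensional, the function names and all parameter values are hypothetical, and the update is a zeroth-order simplification rather than the first-order scheme used in the numerical study.

```python
import math


def poisson_pmf(n, a):
    """P(n, a) = exp(-a) * a**n / n!."""
    return math.exp(-a) * a ** n / math.factorial(n)


def gauss_pdf(q, mean, var):
    """Scalar Gaussian density N(q, mean, var) with var > 0."""
    return math.exp(-((q - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def upsilon0(k, j, y, z, q, h, a_diag, b, c, d, e):
    """Zeroth-order summand: the state stays in e_k over the whole step."""
    if k != j:
        return 0.0
    return (math.exp(a_diag[k] * h) * poisson_pmf(y, b[k] * h)
            * poisson_pmf(z, c[k] * h) * gauss_pdf(q, h * d[k], h * e[k]))


def zeroth_order_step(prior, y, z, q, h, a_diag, b, c, d, e):
    """Bayes update that ignores state transitions inside the step."""
    w = [prior[k] * upsilon0(k, k, y, z, q, h, a_diag, b, c, d, e)
         for k in range(len(prior))]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical two-state example: the observations match state 0.
posterior = zeroth_order_step(
    prior=[0.5, 0.5], y=0, z=0, q=10.0, h=0.1,
    a_diag=[-1.0, -1.0], b=[1.0, 20.0], c=[0.5, 0.5],
    d=[100.0, 50.0], e=[100.0, 50.0])
print(posterior)
```

Higher-order summands would additionally average the same product of densities over the occupation times of the intermediate states, which is where the quadrature points and weights enter.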
For the numerical study we choose the analytical approximation of the 1st order realized by the middle-point scheme:

State-Based Modification of TCP
In this section, we describe the mathematical model of a TCP channel that we later use to simulate some modern TCP versions and to compare them with the state-based optimal control policy. The model generally follows that of [52]. Its main distinctive characteristic is the channel state allocation: we use three states to describe the condition of the wired channel and add one extra state to cover the issues of the wireless connection. This allocation is a reasonable trade-off between a comprehensive connection state model that takes into account all possible features (including the data flows from every channel user, the current packet distribution over all the channel hops and buffer queues, and the signal quality in the wireless channel segment) and the feasibility of the mathematical modeling.
Thus, we suppose that the link state from a sender to a receiver is described by a controllable MJP $X_t$ (1) with four possible states:
• $e_1$ is assigned to low channel load,
• $e_2$ is for moderate load,
• $e_3$ is for wired segment congestion,
• $e_4$ is for signal fading in the wireless segment.
The intensity matrix $A(u) = \|A_{ij}(u)\|_{i,j=\overline{1,4}}$ is defined based on the following assumptions: the link has a single bottleneck device, which remains the same during the whole transmission; this bottleneck device uses the Random Early Detection (RED) queuing discipline [53]; its buffer capacity is $Q$; and the RED threshold of guaranteed packet rejection is $W$ ($W \leqslant Q$). We also assume that the wireless connection quality does not depend on the data flow; hence the intensities $A_{\cdot 4}$ and $A_{4 \cdot}$ corresponding to the transitions from/into the state $e_4$ are independent of the control $u_s$. Furthermore, direct transitions between $e_1$ and $e_3$ without passing through $e_2$ are assumed impossible, i.e., $A_{13} = A_{31} \equiv 0$.
The controllable components of $A(u_t)$ have the form

where $U_{bdp}$ is the control that corresponds to the bandwidth-delay product (BDP), in other words, the maximum window size yielding throughput equal to the channel bandwidth. The constant $\bar A$ is a level of intensity that guarantees the state transition during the forthcoming RTT. The dependence of $A_{ji}(u_t)$ on the control $u_t$ is straightforward. In the state $e_1$, the number of packets in the link is less than $U_{bdp}$, and in the state $e_2$ the "bottleneck" buffer begins to fill. The inverse proportionality in $A_{21}(u_t)$ and the guaranteeing intensity $\bar A$ make the probability of the $e_1 \to e_2$ transition increase as $u_t$ approaches $U_{bdp}$ and guarantee the transition once the threshold $U_{bdp}$ is reached. The constant additive term $A^0_{21}$ accounts for the chance of the $e_1 \to e_2$ transition under low control values $u < U_{bdp}$, which is possible due to the external flows. When $u_t$ decreases below $U_{bdp}$, the probability of the backward transition $e_2 \to e_1$ increases linearly due to the constant flow processing rate. The intensities of the transitions $e_2 \leftrightarrow e_3$ behave in the same way, but with a different threshold, namely $W$.

The conditional intensities of the acknowledgment arrivals $\lambda^j(u)$ depend on the control $u$ and, according to (10), are inversely proportional to the average time between the acknowledgment arrivals:

We assume that if no packets are lost, then during each RTT cycle the sender receives back the acknowledgments for all the packets currently sent into the network; hence we assume that the following relation is valid:

The average RTT for each state, $m^j_V(u)$, is assumed to be the sum of the following components:
• the constant propagation delay, $\delta_0$,
• the average queuing delay caused by external data flows, $m^j_{V,ext}$,
• the average queuing delay caused by the data flow under control, $u \cdot m^j_{V,self}$.
Summing up the assumptions, we have the following relation for the conditional intensity of the acknowledgment arrivals:

The counting processes of losses (2) and timeouts (3) can now be defined as thinned versions of the acknowledgment flow with the following conditional intensities:

Here $P^j_{to}$ denotes the conditional probability of a timeout in the corresponding state. For the states $e_1$, $e_2$, $e_3$, which are related to the wired part of the link, we assume that the only cause of a timeout is a temporary communication hardware fault; hence the probabilities for these states are constant and equal: $P^1_{to} = P^2_{to} = P^3_{to}$. In the state $e_4$, the timeouts follow the fading of the wireless carrier signal; hence the timeout probability $P^4_{to}$ is different but still independent of the control $u$. The packet loss conditional probabilities, on the contrary, are functions of the control $u$. If the control value is less than the RED threshold, $u < W$, then

where $P_0$ is the probability of a packet loss in the wired segment during propagation through the media and $\underline{W}$ is the lower RED threshold ($\underline{W} < W$). If the threshold of guaranteed packet loss is exceeded, then the loss is inevitable; thus $P^j(u) = 1$ for any $j$ if $u \geq W$.
To conclude the definition of the loss and timeout intensities, it remains to mention that the additive terms B j 0 in the loss intensity B(u) stand for the losses caused by the external flows.
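The verbal description of the observation intensities can be summarized by a short sketch. It assumes that one window of acknowledgments arrives per average RTT, so that the acknowledgment intensity is the window divided by the average RTT, and that the loss and timeout flows are obtained by thinning this flow, with the external-flow term entering the loss intensity additively. The function names and numeric values are hypothetical, and the loss probability below the RED threshold is left as an input.

```python
def mean_rtt(u, delta0, m_ext, m_self):
    """Average RTT: propagation delay + external queuing + own queuing."""
    return delta0 + m_ext + u * m_self


def ack_intensity(u, delta0, m_ext, m_self):
    """Acknowledgments per second: a window of u packets per average RTT."""
    return u / mean_rtt(u, delta0, m_ext, m_self)


def timeout_intensity(u, p_timeout, delta0, m_ext, m_self):
    """Timeout flow as a thinned acknowledgment flow."""
    return p_timeout * ack_intensity(u, delta0, m_ext, m_self)


def loss_intensity(u, p_loss, b0, delta0, m_ext, m_self):
    """Loss flow: thinned acknowledgment flow plus the external-flow term b0.

    The additive form is an assumption made for this illustration.
    """
    return b0 + p_loss * ack_intensity(u, delta0, m_ext, m_self)


# Hypothetical values: 0.1 s propagation delay, 10 ms external queuing,
# 80 microseconds of own queuing delay per packet in the window.
lam = ack_intensity(1250, delta0=0.1, m_ext=0.01, m_self=8e-5)
print(round(lam, 1))
```

Because the own queuing delay grows with the window, the acknowledgment intensity saturates as the window increases, which is the mechanism that bounds the throughput by the channel capacity in this kind of model.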

Comparative Study with Modern Versions of TCP
We have completely described the observation system (1)-(4) and the dependence of its parameters on the control $u$. Let $\mathcal{O}_t \triangleq \sigma\{Y_s, Z_s, Q_s, 0 \leqslant s \leqslant t\}$ be the natural filtration generated by the observations available up to the moment $t$. Generally speaking, any $\mathcal{O}_t$-predictable nonnegative control $U_t$ is admissible for (1)-(4).
In this section, we present the control processes, which describe the modern versions of TCP in terms of the presented model of channel state and observations. We also present here a state-based TCP control modification, which is based on the optimal state filtering and optimal control strategy. The section will be concluded by a comparative analysis of the TCP versions' performance.
In what follows we will assume that the constant values U bdp , W , δ 0 and m j V, sel f are selected so as to comply with the link of C = 100 Mbps capacity, propagation delay of δ 0 = 0.1 s, bottleneck queue limit of Q = 100 packets, and MSS = 1000 bytes:
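As a rough, independent consistency check (an assumption of this sketch, not the authors' parameter values), the stated link characteristics can be converted into packet units. The sketch assumes that the BDP window equals capacity times propagation delay divided by the packet size, and that the per-packet service time at the bottleneck is the packet size divided by the capacity; all variable names are hypothetical.

```python
capacity_bps = 100e6      # link capacity C = 100 Mbps
prop_delay_s = 0.1        # propagation delay delta_0 = 0.1 s
mss_bytes = 1000          # maximum segment size
queue_limit = 100         # bottleneck queue limit, packets

packet_bits = 8 * mss_bytes
bdp_packets = capacity_bps * prop_delay_s / packet_bits   # about 1250 packets
service_time_s = packet_bits / capacity_bps               # about 80 microseconds
full_queue_delay_s = queue_limit * service_time_s         # about 8 ms

print(round(bdp_packets), service_time_s, full_queue_delay_s)
```

Under these assumptions the buffer holds less than a tenth of the BDP, which is consistent with the later remark that the buffer is shallow relative to the periodic rate increases of BBR.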

AIMD Scheme and TCP Illinois
In [52] we presented an AIMD-type control policy $u_t$, which remains the same for the present channel model:

where
• $I_S(u)$ is the indicator function equal to one if $u \in S$ and zero otherwise,
• $W$ is the minimal window size,
• $W^{th}_t$ is the threshold actuating the congestion avoidance phase,
• $r_t$ is the exponential smoothing estimate of the RTT,
• $\alpha_t$ and $\beta_t$ are $\mathcal{O}_t$-predictable coefficients of additive increase and multiplicative decrease.

The first term in (45) describes the slow start mode, the second and the third stand for the linear increase and the multiplicative decrease in the congestion avoidance phase, and the fourth provides the window rollback to the minimal value $W$ and the return to the slow start mode when a timeout event occurs.
In the case $\alpha_t \equiv 1$ and $\beta_t \equiv 0.5$, Equation (45) represents the New Reno algorithm. The Illinois concave control policy is defined by a convex function $\alpha_t$ and an increasing linear function $\beta_t$ of the average queuing delay $d_a$. The parameters $\kappa_i$ and $d_i$ and other details of the Illinois control scheme can be found in [6]. The most important parameters are the maximum and minimum additive increase and multiplicative decrease coefficients, which in the standard implementation are set to $[\alpha_{min}, \alpha_{max}] = [0.3, 10]$ and $[\beta_{min}, \beta_{max}] = [0.125, 0.5]$.

In Figure 2, we present the simulation results for the Illinois TCP control policy with these standard parameters. The upper plot presents the dynamics of the channel parameters, including the RTT (in red), losses (black triangles), and timeouts (red crosses). The filling color indicates the channel states: white for idle, green for moderate load, red for congestion in the wired segment, and grey for signal fading in the wireless segment. The lower plot shows the control dynamics and the critical thresholds: $U_{bdp}$, which corresponds to the channel bandwidth-delay product, and the lower bound of buffer overflow, $U_{bdp} + W$.
One can notice that, by processing only the RTT information, the algorithm succeeds in determining $U_{bdp}$ and becomes much more prudent once the bottleneck buffer starts to fill. This results in long periods of relatively high transmission rates without buffer overflows and with rare losses. Nevertheless, during the intervals when the channel is idle, the control grows too slowly, which results in underuse of the channel resources and, in the end, in a lower average transmission rate.
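The four modes of the AIMD policy described above can be sketched as an event-driven window update. This is a generic textbook-style illustration, not Equation (45) itself: the per-acknowledgment increments, the handling of the slow start threshold, and all names are assumptions. With `alpha=1` and `beta=0.5` the update mimics New Reno; Illinois would recompute `alpha` and `beta` from the measured queuing delay before each call.

```python
def aimd_step(cwnd, ssthresh, event, alpha=1.0, beta=0.5, w_min=1.0):
    """Return (cwnd, ssthresh) after one event: 'ack', 'loss' or 'timeout'."""
    if event == "ack":
        if cwnd < ssthresh:
            cwnd += 1.0                 # slow start: +1 packet per ack
        else:
            cwnd += alpha / cwnd        # additive increase: +alpha per RTT
    elif event == "loss":
        cwnd = max(w_min, (1.0 - beta) * cwnd)   # multiplicative decrease
        ssthresh = cwnd
    elif event == "timeout":
        ssthresh = max(w_min, (1.0 - beta) * cwnd)
        cwnd = w_min                    # rollback and return to slow start
    else:
        raise ValueError("unknown event: %r" % (event,))
    return cwnd, ssthresh


# Hypothetical event sequence.
state = (10.0, 8.0)
for ev in ["ack", "ack", "loss", "ack", "timeout", "ack"]:
    state = aimd_step(*state, event=ev)
    print(ev, state)
```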

TCP CUBIC
In contrast with TCP Illinois, this version of TCP does not rely on RTT observations most of the time. Instead, it treats the control value at which the last loss occurred as the point of highest network use and tends to form a plateau close to this point. To that end, it keeps counting the time since the last loss or timeout, $T^{loss}_t$, and sets the control according to a cubic function of $T^{loss}_t$ that forms two regions: a concave region to reach the last maximum control value $W^{max}_t$, and then a convex region of network probing, where the control grows faster as the time without loss increases. Upon a loss event, the control is reduced according to a constant multiplicative decrease coefficient $\beta$, and when a timeout occurs, the control is reset to the minimal window size $W$. Summing up, the TCP CUBIC control can be represented as follows:

where $C$ is a constant that determines the aggressiveness of control growth: with higher values of $C$ (for example, $C = 4.0$), CUBIC tends to be more aggressive, which can be quite useful in high-BDP networks.
In Figure 3, we present the simulation results for the TCP CUBIC control with multiplicative decrease coefficient β = 0.9 and scale constant C = 4.0. It should be noted that this simulation is based on a more precise model of the protocol described in [38] and takes into account such details as TCP-friendly region and fast convergence heuristics. These details were not reflected in Equation (47) to avoid unnecessary complications. As in the previous Figure, the upper plot presents the channel dynamics (RTT, losses, timeouts, and state), and the lower plot shows the dynamics of the control.
One can see that TCP CUBIC manages to keep the control close to the desired U bdp value, allowing fast recovery after losses. At the same time, the probing phase, which is symmetrical to the recovery phase, is too aggressive, and the average throughput would benefit from longer "plateau" periods. Another advantage, which must be mentioned, is the ability to adjust to dramatic changes in the media: in contrast with TCP Illinois, the CUBIC protocol keeps the control at low values throughout the whole period of wireless signal degradation, which results in fewer losses.
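The concave and convex regions of the CUBIC window can be illustrated with the window function in the form given in RFC 8312. This sketch is not Equation (47): it omits the TCP-friendly region, fast convergence, and timeouts, and the function name is hypothetical. The defaults `beta=0.7` and `c=0.4` are the RFC values; the example call uses the values quoted for the simulation.

```python
def cubic_window(t, w_max, beta=0.7, c=0.4):
    """Window t seconds after a loss that occurred at window w_max.

    beta is the factor applied to the window at the loss, so the curve
    starts at beta * w_max, is concave until t = K and convex afterwards.
    """
    k = (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)   # time to regain w_max
    return c * (t - k) ** 3 + w_max


# Hypothetical last maximum of 1250 packets, beta = 0.9 and C = 4.0.
for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(t, round(cubic_window(t, 1250.0, beta=0.9, c=4.0), 1))
```

The plateau around `w_max` appears because the derivative of the cubic vanishes at `t = K`; a larger `c` shortens `K` and steepens both the recovery and the probing phase.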

TCP Compound
The TCP Compound algorithm tries to benefit from both the loss-based and the congestion-based approaches. To that end, the authors enhance the standard AIMD congestion avoidance scheme with an additional component, which allows faster growth on an idle channel when the standard AIMD control underuses the resources [40]. When congestion is detected, the window is adjusted to avoid packet losses. To estimate the congestion, the TCP Compound scheme compares the estimated number of backlogged packets (bottleneck queue size) $d_t$ with a known threshold value $\gamma$. The estimate of the queue size is computed as follows:

where $V_t$ is the current and $V^{min}_t$ is the minimum registered RTT value. The entire TCP Compound control scheme can be represented by the following expression:

where $\alpha$, $\beta$, $\kappa$, $\zeta$ are tunable protocol parameters.
In (48), the first term describes the slow start mode, the second term reflects the growth phase and correction upon congestion detection, the third stands for the multiplicative decrease, and the fourth provides the window rollback and return to the slow start mode when a timeout event occurs.
In Figure 4, we present the simulation results for the TCP Compound protocol with the standard parameter values: $\alpha = \beta = 0.125$, $\kappa = 0.75$, $\zeta = 1.0$. The backlog estimate threshold for congestion indication is set to $\gamma = 80$. The upper plot presents the channel dynamics (RTT, losses, timeouts, and state), and the lower plot shows the dynamics of the control (in black) and the estimated backlog size $d_t$ (in blue). The figure illustrates the correction of the control when the backlog size estimate reaches the threshold, and high control values when the bottleneck buffer queue is assumed empty. It should be noted that TCP Compound, like the Illinois version, fails to adapt quickly to the wireless signal degradation, demonstrating high instability and a large number of losses during this channel state.
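The backlog estimate that drives the congestion detection can be sketched as follows. The sketch assumes the usual delay-based construction, in which the backlog is the difference between the expected and the actual throughput multiplied by the minimum RTT; whether this coincides term by term with the expression used in the paper is an assumption, and the function names are hypothetical.

```python
def backlog_estimate(window, rtt, rtt_min):
    """Estimated number of packets queued at the bottleneck."""
    expected = window / rtt_min      # throughput with an empty queue
    actual = window / rtt            # throughput actually observed
    return (expected - actual) * rtt_min


def congestion_detected(window, rtt, rtt_min, gamma):
    """Compare the backlog estimate with the threshold gamma."""
    return backlog_estimate(window, rtt, rtt_min) >= gamma


# Hypothetical sample: 1300 packets in flight, RTT 0.108 s, minimum RTT 0.1 s.
print(backlog_estimate(1300, 0.108, 0.1), congestion_detected(1300, 0.108, 0.1, 80))
```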

TCP BBR
The TCP BBR algorithm is purely delay-based [54]. It is designed with the idea of keeping the total amount of data in the channel equal to the BDP. At this load, a connection runs with the highest throughput and the lowest delay. The BDP value is estimated as the product of $RTprop$, the round-trip propagation time, and $BtlBw$, the bottleneck bandwidth or delivery rate. The estimate of the propagation time is the minimum registered RTT over a long time window:

where $W_R$ typically varies from tens of seconds to minutes. To estimate the delivery rate, BBR calculates the ratio of the portion of data delivered to the time elapsed from the delivery start. Since this ratio is calculated for every acknowledgment received, it is natural to take the data "in flight" at the moment the packet was sent as the portion, and the RTT of this acknowledgment as the time elapsed from the delivery start. The estimated delivery rate is then the maximum of such ratios taken over a period $W_B$ equal to 6-10 RTTs:

The main problem of this approach is that the propagation time and the delivery rate cannot be observed at the same time. Indeed, the bottleneck buffer must be empty to observe RTT values close to the propagation time, and it must be overfilled to observe the capacity of the channel. This problem is solved by two modes of the steady-state regime: ProbeBW and ProbeRTT. In ProbeBW, the algorithm cycles through eight phases with the following pacing gain values: $p_t = (5/4, 3/4, 1, 1, 1, 1, 1, 1)$. The length of each phase is equal to the current estimate of the propagation time $RTprop_t$. Thus, the capacity of the channel is reached by a periodic increase of the sending rate followed by a rollback to drain the queue. ProbeRTT is turned on when the value of $RTprop_t$ has not been updated for a long time. In this mode, the transmission almost stops for a short time to fully drain the queue.
Simulation experiments show that in the present model the last mode is redundant, since BBR manages to maintain a very precise estimate of the propagation delay while spending the whole time in the ProbeBW mode. In addition, we exclude the Startup and Drain modes from consideration, since they are usually very short.
Thus, finally, the BBR control is defined as follows: where e[k] ∈ R 8 is a vector with unity on k-th place and zeros on all others, and % is the modulo operator.
In Figure 5, we present the simulation results for the TCP BBR protocol. The upper plot presents the channel dynamics (RTT, losses, timeouts, and state), and the lower plot shows the dynamics of the control (in black) and the estimate of the BDP control equal to $RTprop_t \cdot BtlBw_t$ (in blue). One can notice that this estimate is quite precise; nevertheless, the channel is congested almost the whole time. This means that the BBR algorithm is too aggressive for the parameters of the channel at hand: the bottleneck buffer is too small to accommodate the periodic 25% increase of the sending rate.
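The two windowed estimates and the ProbeBW gain cycle can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the control law of the paper or the reference BBR implementation: the class and function names, the window lengths, and the sample values are hypothetical, and the Startup, Drain, and ProbeRTT modes are omitted.

```python
from collections import deque


class WindowedExtremum:
    """Running min or max over samples not older than `window` seconds."""

    def __init__(self, window, pick):
        self.window = window
        self.pick = pick            # the built-in min or max
        self.samples = deque()      # (time, value) pairs in arrival order

    def update(self, now, value):
        self.samples.append((now, value))
        while self.samples[0][0] < now - self.window:
            self.samples.popleft()
        return self.pick(v for _, v in self.samples)


PACING_GAINS = (1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def pacing_gain(t, rt_prop):
    """Gain of the ProbeBW phase active at time t; each phase lasts rt_prop."""
    return PACING_GAINS[int(t // rt_prop) % 8]


rt_prop_filter = WindowedExtremum(window=10.0, pick=min)   # long window
btl_bw_filter = WindowedExtremum(window=1.0, pick=max)     # several RTTs

# Hypothetical acknowledgments: (arrival time, RTT, packets in flight).
for now, rtt, inflight in [(0.10, 0.100, 1000), (0.21, 0.104, 1250), (0.32, 0.108, 1300)]:
    rt_prop = rt_prop_filter.update(now, rtt)
    btl_bw = btl_bw_filter.update(now, inflight / rtt)
    print(now, round(rt_prop * btl_bw), pacing_gain(now, rt_prop))
```

The product of the two filtered values is the BDP estimate, and multiplying it by the current pacing gain gives the periodically probing control discussed above.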

State-Based TCP
To obtain the state-based TCP control strategy, the optimization problem (6) needs to be solved for some predefined gains (instantaneous and terminal) and transmission expenses.
It is natural to bind the transmission expense function $\xi = (\xi_1, \ldots, \xi_4)^T$ with the intensity of losses, which we aim to minimize; hence we set

where $k_1, \ldots, k_4$ are coefficients that reflect the gravity of losses in particular channel states. We take the same instantaneous gain as in [36]:

where $a_1, \ldots, a_4$ are coefficients that define the utility of the traffic depending on the channel state.
Analyzing the behavior of the TCP versions described earlier in the present paper, we may conclude that the most beneficial in terms of the throughput and losses is the state e 2 (moderate load). Hence, it is natural to design the state-based version with the goal of spending most of the time in this state. Terminal gains ψ j , satisfying the condition max{ψ j } = ψ 2 , would reflect this idea.
In Figure 6 (left), we present a solution to the problem (6) with the transmission expenses and instantaneous gains given by (50)-(51) with $k = (10^{-4}, 10, 10^2, 1)^T$ and $a = (100, 100, 1, 100)^T$. The terminal gains are $\psi = -10^6 \cdot (2, 1, 2, 4)^T$, and the right bound of the observation interval is set to a rather small value equal to the propagation delay, $T = \delta_0 = 0.1$, so that the impact of the terminal gains on the criterion is more pronounced. The controls for the states $e_1$ (idle), $e_2$ (moderate load), $e_3$ (congestion), $e_4$ (wireless signal fading) are given in grey, green, red, and black, respectively.
One can observe that the optimal control we obtained is almost constant. This is a very useful property in terms of the scalability of the results. Indeed, the control strategy equal to the mean of the optimal controls does not depend on the interval, where the original optimization problem (6) was defined. In Figure 6 (right), we present three plots, which illustrate the behavior of state occupation probabilities of the channel X t with constant controls (52) given three different initial states: X 0 = e 1 , X 0 = e 2 , X 0 = e 3 . The color scheme is the same: grey, green, red, and black lines show the occupation probabilities for respectively e 1 , e 2 , e 3 , e 4 states. With solid lines, we show the probabilities obtained as a result of the Kolmogorov equation solution, and with dotted lines, we show the same probabilities obtained through the Monte-Carlo sampling (with 1000 trajectories). One can see that even on a bigger time interval (T = 5 s), the goal of the state-based control is achieved: from any given initial condition, the channel manages to revert to (or maintain) the most favorable state e 2 .
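The occupation probabilities under constant controls can be obtained by integrating the Kolmogorov forward equation, as in the solid lines described above. The sketch below uses an explicit Euler scheme and a hypothetical generator whose rates are invented for illustration only (they are not the intensities of the paper); it keeps the structural assumptions of the model, namely no direct transitions between $e_1$ and $e_3$ and control-independent rates into and out of $e_4$.

```python
def occupation_probabilities(p0, gen, horizon, dt=1e-3):
    """Explicit Euler integration of dp_j/dt = sum_i p_i * gen[i][j].

    gen[i][j] is the transition rate from state i to state j for i != j,
    and every row of gen sums to zero.
    """
    p = list(p0)
    n = len(p)
    for _ in range(int(round(horizon / dt))):
        dp = [sum(p[i] * gen[i][j] for i in range(n)) for j in range(n)]
        p = [p[j] + dt * dp[j] for j in range(n)]
    return p


# Hypothetical rates (1/s) for states e1..e4 under some fixed controls.
GEN = [
    [-5.1, 5.0, 0.0, 0.1],
    [0.5, -1.1, 0.5, 0.1],
    [0.0, 5.0, -5.1, 0.1],
    [0.0, 1.0, 0.0, -1.0],
]

for start in range(3):
    p0 = [1.0 if i == start else 0.0 for i in range(4)]
    p = occupation_probabilities(p0, GEN, horizon=5.0)
    print("X0 = e%d:" % (start + 1), [round(x, 3) for x in p])
```

With rates of this shape the probability mass converges to the moderate-load state from any of the three wired initial states, which is the qualitative behavior the right panel of Figure 6 is reported to show.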
In Figure 7, we present the simulation results for the state-based control policy. The upper plot presents the channel dynamics (RTT, losses, timeouts, and state), and the lower plot shows the optimal channel state estimate $\hat X_t$ in the form of a stack plot: the height of the white/green/red/grey area at a certain point of time corresponds to the conditional probability of the idle/moderate load/congestion/wireless signal fading state. This plot demonstrates that the quality of the estimates is good and that the hidden channel state can be adequately revealed from the available information. In the lower plot of Figure 7, we also show the dynamics of the control, where $u_t$ is given by (52). One can see that even on a larger interval the main property of the proposed control strategy remains: the channel spends most of the time in the state $e_2$, which results in better throughput and fewer losses.

Comparison
To compare the performance of the TCP control schemes discussed above, we use statistical modeling. The performance metrics, namely the average throughput (a measure of how effectively the bandwidth is used) and the loss percentage (a measure of the tendency to cause congestion, which affects other users), are calculated on samples long enough to make the variance negligible. This approach is preferable to averaging over a set of short samples, since it diminishes the effect of the transient phase, i.e., the initial probing for the available channel characteristics, which is implemented differently in each TCP version but is an essential part of all of them.
On samples of $10^6$ seconds, we compare the state-based control with the TCP Illinois, CUBIC, Compound, and BBR versions. To make the comparison fairer, we vary, where available, the parameters of the TCP control algorithms to achieve better performance. For TCP CUBIC, we take three values of the multiplicative decrease coefficient, $\beta \in \{0.7, 0.8, 0.9\}$; for TCP Compound, we consider nine values of the backlog estimate threshold, $\gamma \in \{10, 20, 30, 40, 50, 60, 70, 80, 90\}$. The other parameters of the protocols are the same as defined in Sections 5.2 and 5.3, since varying them has little or negative effect on the performance.
For the state-based version described in Section 5.5, one can tune the protocol behavior by choosing different optimization criteria (6). Nevertheless, since in our case the optimal control is constant, instead of varying the coefficients of the transmission expenses (50) and the instantaneous gain (51), we can directly manipulate the constant control values assigned to the channel states. The experiments show that changing the controls for the states $e_1$, $e_2$, $e_3$, which correspond to the wired part of the transmission channel, makes the performance worse. At the same time, varying the control for the state $e_4$ (wireless signal fading) can bring value; hence we consider four cases: $u^4_t \in \{20, 50, 100, 200\}$. The simulation results are summarized in Figure 8, where we present the average throughput and loss percentage, and are detailed in Table 1, where one can also find the control algorithm parameters and state occupation times.
One can immediately observe the same occupation time value for the state e 4 , which is an indirect indicator of the sufficiency of the chosen simulation sample length: since the transition to and from the state of wireless signal fading does not depend on the control values, the limit probability for the corresponding state should be the same.
The highest occupation time for the state e 2 of moderate channel load is demonstrated by the state-based control. In addition, it can be confirmed that this allows this control algorithm to demonstrate better performance: for the case of u 4 t = 20, the losses are minimal, and the average throughput is second best. It should be noted that the best throughput value demonstrated by the BBR protocol is only possible at the cost of huge losses. This is a characteristic feature of this control algorithm on shallow buffers [55]: it is too aggressive for a channel with chosen characteristics, and a small buffer cannot accommodate frequent 25% speed jumps.
The last thing, which is worth mentioning, is the ability of the state-based protocol to be tuned specifically for the cases of wireless channel issues. Depending on the application, it may try to maintain the maximal possible transmission rate at a cost of huge losses, or, vice versa, drop the speed and wait for the connection to restore to the full speed.

Conclusions
The class of controllable Markov jump processes equipped with the stochastic analysis framework is an effective tool for describing a TCP-governed communication connection. The hidden channel state is described by a Markov jump process with a finite state space, characterizing both the current channel load and the physical "health status". The state equation makes it possible both to include various types of existing congestion control algorithms (Illinois, CUBIC, Compound, BBR, etc.) and to incorporate some novelties.
The available observations represent the Markov jump processes, namely the Cox processes of the packet losses and timeouts and compound Poisson processes of the packet reception acknowledgments.
The available mathematical framework admits designing the complete technological chain of the TCP congestion control optimization, namely:
• to describe properly the congestion control problem as a stochastic control one,
• to solve the problem above in the case of complete information under admissible controls with geometric constraints,
• to simplify the mathematical model of the available observations, replacing the high-frequency flow of packet acknowledgments by its diffusion limit,
• to solve the connection state filtering problem given the available observations and obtain high-precision state estimates,
• to design effective numerical algorithms for solving the filtering and control problems,
• to apply the separation principle and close the loop of congestion control synthesis, using the connection state estimates instead of their exact values.
The result of this optimization represents the proposed state-based version of TCP. The paper contains a comparative analysis of the proposed algorithm against the other contemporary TCP versions and demonstrates its advantages.
The potential of the controllable Markov jump processes for the description of the transport and applied layer communication protocols is far from being exhausted. In perspective, one can use it both for the enhancement of the existing protocols (see, e.g., multi-path TCP [56]) and for the development of new ones (see, e.g., "TCP-free" protocols such as QUIC [57]).
In conclusion, we should also note that the mathematical potential of Markov chains/Markov jump processes allows designing complete technological chains "mathematical model – properly formulated mathematical problem – theoretical solution – efficient numerical algorithm" to solve many applied problems of analysis, estimation, and control in such areas as biology [58][59][60], epidemiology [61][62][63], inventory control [64], mathematical finance [65], insurance [66,67], etc.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

BBR       Bottleneck Bandwidth and RTT
BDP       bandwidth-delay product
CLT       central limit theorem
CLTRRP    central limit theorem for renewal-reward processes
CME       conditional mathematical expectation
CPP       compound Poisson process
cwnd      congestion window size
MAP       maximum a posteriori probability
MJP       Markov jump process
pdf       probability density function
RHS       right-hand side
RTO       retransmission timeout
RTT       round-trip time
TCP       Transmission Control Protocol
TVD       total variation distance