Path Integral Approach to Nondispersive Optical Fiber Communication Channel

In the present paper we summarize the methods and results of calculations of the information-theoretic quantities obtained in our works for the nondispersive optical fiber channel. We considered two models: the per-sample model and the model where the input signal depends on time. For these models we developed an approach that allows the mutual information to be calculated exactly in the nonlinearity parameter, but perturbatively in the inverse signal-to-noise power ratio. Using this approach, for the per-sample model we found a lower bound of the channel capacity in the intermediate power range.


Introduction
For a linear transmission system, Shannon [1] obtained the famous result for the channel capacity, i.e., the maximal amount of information which can be transmitted through a channel with additive noise:

C = log(1 + P/P_noise),   (1)

where P is the input signal power and P_noise is the noise power. Over the past 25 years, the power and frequency bandwidth of the signals transmitted through optical fiber channels have grown. As a result, the Kerr nonlinearity must be taken into account when considering modern optical fiber channels. The Kerr nonlinearity leads to the distortion of the signal and to the nonlinear interaction of the signal with the noise in the information channel. Therefore, for such channels, Shannon's result (1) should be modified to take the nonlinearity effects into account. It is worth noting that the nonlinear effects in a channel depend both on the particular realization of the communication channel and on the physical fiber parameters. When designing real transmission systems, say, wavelength division multiplexing systems, the following nonlinear effects should be taken into account: self-phase modulation, cross-phase modulation, and so on [2]. These effects are due to the optical Kerr effect, i.e., the change of the refractive index of the fiber material in response to the applied electric field. Among the other fiber parameters, the second dispersion coefficient is the fundamental one; it varies from non-zero values (a typical value is about β₂ = 2 × 10⁻²³ s²/km) down to almost zero. Around the world, the vast majority of fiber networks are fiber-optic communication channels with non-zero dispersion. However, some fiber channels are designed to operate in the zero-dispersion region of the wavelength in the transmission windows: in particular,

Per-Sample Model
Following the papers [4,17], we imply that in the case of the per-sample model the equation of signal propagation has the form:

∂_z ψ(z) − iγ|ψ(z)|²ψ(z) = η(z),   (3)

where γ is the Kerr nonlinearity coefficient, ψ(z) is the signal function which obeys the boundary conditions ψ(0) = X, ψ(L) = Y, L is the signal propagation distance, X is the input signal, and Y is the output signal. One can see that Equation (3) contains neither terms which lead to the decay of the signal in the propagation process nor terms that compensate for this decay. It means that in this model we have distributed amplifiers which completely compensate the attenuation of the signal during its propagation in the optical fiber. The only trace of the amplifiers in Equation (3) is the noise function η(z). The function η(z) has the following properties: the noise has zero mean, ⟨η(z)⟩_η = 0, and the correlation function ⟨η(z)η̄(z′)⟩_η = Qδ(z − z′). Here and below the bar means the complex conjugation and δ(x) is the Dirac delta-function. The coefficient Q is the noise power per unit length, so QL is the noise power in the channel. The brackets ⟨...⟩_η mean the averaging over the noise realizations in the channel.
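As a quick illustration of the per-sample model (3), the propagation can be simulated numerically. Below is a minimal sketch (the helper `propagate_per_sample` is ours, not from the cited papers): each step applies the exact nonlinear phase rotation of the noiseless equation plus an additive complex white-noise kick of variance Q∆z. With the noise switched off it reproduces the closed-form solution Y = X exp(iγ|X|²L).

```python
import numpy as np

def propagate_per_sample(X, gamma, L, Q, n_steps=1000, rng=None):
    """Integrate d(psi)/dz = i*gamma*|psi|^2*psi + eta(z), with
    <eta(z) conj(eta(z'))> = Q*delta(z - z'), by splitting each step into
    an exact nonlinear phase rotation and an additive noise kick."""
    dz = L / n_steps
    psi = complex(X)
    for _ in range(n_steps):
        # the noiseless step is exact: |psi| is conserved, only the phase rotates
        psi *= np.exp(1j * gamma * abs(psi) ** 2 * dz)
        if rng is not None:
            # complex white noise with variance Q*dz per step (Q*dz/2 per quadrature)
            psi += np.sqrt(Q * dz / 2) * complex(rng.standard_normal(),
                                                 rng.standard_normal())
    return psi

# noiseless check against the closed-form solution X*exp(i*gamma*|X|^2*L)
Y_det = propagate_per_sample(2.0, gamma=1.0, L=1.0, Q=0.0)
```

With the noise on (pass `rng=np.random.default_rng()`), averaging |Y − Y_det|² over many realizations at γ = 0 recovers the noise power QL.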

Extended Model
In the model (3) the bandwidths of the input signal, amplifiers, and receiver cannot be taken into account, since the functions ψ and η do not depend on time. In order to include these bandwidths we extend the previous model. In the extended model the signal ψ(z, t) does depend on time. The propagation is described by the stochastic NLSE with zero dispersion:

∂_z ψ(z, t) − iγ|ψ(z, t)|²ψ(z, t) = η(z, t),   (4)

where the coefficient γ is again the Kerr nonlinearity coefficient. The function ψ(z, t) obeys the boundary conditions ψ(0, t) = X(t), ψ(L, t) = Y(t), see (5). The noise function η(z, t) has zero mean, ⟨η(z, t)⟩_η = 0. We also imply that the noise has a finite bandwidth, so the correlator in the frequency domain reads

⟨η(z, ω)η̄(z′, ω′)⟩_η = 2πQ δ(z − z′)δ(ω − ω′)θ(W/2 − |ω|).   (7)

Here the parameter Q denotes the noise power per unit length and per unit frequency; θ(x) is the Heaviside theta-function. The theta-function θ(W/2 − |ω|) indicates that the noise is nonzero only within the interval [−W/2, W/2], i.e., the bandwidth of the noise is equal to W. Performing the Fourier transform of Equation (7) we arrive at the correlator in the time domain:

⟨η(z, t)η̄(z′, t′)⟩_η = Q δ(z − z′) sin[W(t − t′)/2] / [π(t − t′)].   (8)

It is easy to check that if the time difference is t − t′ = 2πn/W with integer n, then the correlator (8) equals zero. Therefore, the noise at times t and t′ is not correlated, and we can solve Equation (4) for the different times t_j = j∆ independently. Here j is an integer and ∆ = 2π/W is the grid spacing in the time domain. Therefore, instead of the continuous time model (4) we can consider the set of discrete models:

∂_z ψ(z, t_j) − iγ|ψ(z, t_j)|²ψ(z, t_j) = η(z, t_j)   (9)

for the set of time moments t_j. So we obtain a set of independent time channels, and instead of the continuous input and output conditions (5) we obtain a set of discrete ones: ψ(0, t_j) = X(t_j), ψ(L, t_j) = Y(t_j). To include the bandwidth of the input signal in the model we represent the initial signal X(t) in the form:

X(t) = Σ_{k=−N}^{N} C_k f(t − kT₀),   (11)

where f(t) is the envelope function and C_k are complex random coefficients which carry the information. These coefficients have the probability density function P_X[{C}], where {C} = {C_{−N}, ..., C_N}.
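The zeros of the time-domain correlator (8) at t − t′ = 2πn/W, which justify the decoupling into independent time channels, are easy to verify numerically. A small sketch (the function name `time_correlator` is ours):

```python
import numpy as np

def time_correlator(tau, W, Q=1.0):
    """Time-domain noise correlator obtained from the flat spectrum
    Q*theta(W/2 - |omega|): R(tau) = Q*sin(W*tau/2)/(pi*tau),
    with the limiting value R(0) = Q*W/(2*pi)."""
    tau = np.asarray(tau, dtype=float)
    safe = np.where(np.abs(tau) < 1e-12, 1.0, tau)   # avoid 0/0 at tau = 0
    return np.where(np.abs(tau) < 1e-12,
                    Q * W / (2 * np.pi),
                    Q * np.sin(W * safe / 2) / (np.pi * safe))

W = 10.0
grid = 2 * np.pi * np.arange(1, 6) / W   # t - t' = 2*pi*n/W, n = 1..5
R_at_grid = time_correlator(grid, W)     # should all vanish
```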
We restrict our consideration to envelopes f(t) which obey the following properties: the function f(t) is real and is normalized by the condition (12). The function f(t) has almost finite support [−T₀/2, T₀/2]; the last property means that the overlap of the functions f(t − kT₀) and f(t − mT₀) for k ≠ m is negligible, i.e., we assume the smallness of the overlap effects. The smallness of these effects will be discussed below. The input signal X(t) is thus defined on the interval of duration T = (2N + 1)T₀. The near-finiteness of the support [−T₀/2, T₀/2] means that the frequency support of the function f(t) (and, as a consequence, that of the function X(t)) is infinite. However, we imply that almost all of the power of the Fourier transform X(ω) of X(t) is contained within a finite bandwidth W′, see (13). The relation (13) implies T₀W′ ≫ 1. So we can say that the bandwidth of the input signal is W′.
The bandwidth broadening of the signal propagating through the optical fiber is associated with the nonlinearity and with the noise η. To estimate the broadening connected with the nonlinearity we can find the solution Φ(z, t) of Equation (9) with zero noise:

Φ(z, t) = X(t) exp(iγ|X(t)|²z).   (14)

This solution obeys the input condition Φ(0, t) = X(t). Since we know the solution Φ(L, t), we can find its bandwidth W̃. Strictly speaking, the bandwidth W̃ is formally infinite, but most of the signal power is localized in a finite frequency region, which can be specified as in (15). Below we assume the hierarchy (16): the noise bandwidth W is much larger than W̃, where W̃ is the bandwidth of the function Φ(L, t).
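The nonlinear broadening of the noiseless solution Φ(L, t) = X(t) exp(iγ|X(t)|²L) can be estimated numerically with an FFT. A sketch with a Gaussian test pulse; the 99%-power bandwidth measure and the grid are our illustrative choices:

```python
import numpy as np

def occupied_bandwidth(signal, dt, fraction=0.99):
    """Width of the smallest frequency interval around zero containing
    `fraction` of the total signal power."""
    power = np.fft.fftshift(np.abs(np.fft.fft(signal)) ** 2)
    freqs = np.fft.fftshift(np.fft.fftfreq(len(signal), d=dt))
    order = np.argsort(np.abs(freqs))          # grow the interval outward from 0
    cum = np.cumsum(power[order]) / power.sum()
    k = np.searchsorted(cum, fraction)
    return 2 * np.abs(freqs[order][k])

t = np.linspace(-20.0, 20.0, 4096)
dt = t[1] - t[0]
X = np.exp(-t ** 2 / 2)                                          # Gaussian pulse
bw_linear = occupied_bandwidth(X, dt)                            # gamma*L = 0
bw_nonlinear = occupied_bandwidth(X * np.exp(5j * np.abs(X) ** 2), dt)  # gamma*L = 5
```

The self-phase-modulated pulse occupies a markedly wider band than the input, illustrating why the hierarchy between the noise bandwidth and the broadened signal bandwidth has to be assumed rather than guaranteed.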
To include the receiver bandwidth in the model we introduce the procedure of output signal detection. In our model the receiver extracts the information from the output signal, i.e., it recovers the coefficients {C}. We consider the following detection model. The receiver measures the output signal ψ(L, t_j) at the discrete time moments t_j for j = −M, ..., M − 1. Here the quantity M = T/(2∆) is the total number of the time samples. Since T ≫ T₀, we have M ≫ N. This property of the receiver means that its time resolution coincides with the time discretization ∆. From Equation (16) it follows that ∆ ≪ 1/W̃; therefore the receiver completely recovers the output signal in the noiseless case. Then the receiver removes the nonlinear phase and obtains the recovered input signal X̃(t):

X̃(t_j) = ψ(L, t_j) exp(−iγ|ψ(L, t_j)|²L).   (17)

In fact, the procedure (17) means that we use the backward propagation procedure for the channel with zero dispersion. Finally, from the function X̃(t) the receiver recovers the coefficients C̃_k by projecting X̃(t) on the basis functions f(t − kT₀), see (18). So the extended model contains the bandwidth of the input signal W′, the bandwidth of the amplifier noise W, and the bandwidth of the receiver. In our case the bandwidth of the receiver coincides with the bandwidth of the noise, because we choose the discretization in the information extraction procedure (18) to coincide with the initial channel discretization.
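In the noiseless case this detection chain recovers the coefficients exactly, because |ψ(L, t_j)| = |X(t_j)| and the nonlinear phase removal (17) inverts the propagation pointwise. The sketch below checks this end to end; the cosine envelope f(t), the grid, and the slot layout are our illustrative assumptions (the text leaves f(t) generic):

```python
import numpy as np

T0, L, gamma = 1.0, 1.0, 2.0
Npts, Nsym = 64, 5                                # samples per slot, symbols
Delta = T0 / Npts
t = (np.arange(Nsym * Npts) + 0.5) * Delta        # midpoint time grid
centers = (np.arange(Nsym) + 0.5) * T0            # slot centers

def f(tau):
    """Illustrative real envelope: sqrt(2/T0)*cos(pi*tau/T0) on [-T0/2, T0/2],
    zero outside; unit-normalized, and shifted copies do not overlap."""
    return np.where(np.abs(tau) <= T0 / 2,
                    np.sqrt(2 / T0) * np.cos(np.pi * tau / T0), 0.0)

rng = np.random.default_rng(1)
C = rng.standard_normal(Nsym) + 1j * rng.standard_normal(Nsym)

# input signal, noiseless output, phase removal (17), projection (18)
X = sum(Ck * f(t - ck) for Ck, ck in zip(C, centers))
psi_L = X * np.exp(1j * gamma * np.abs(X) ** 2 * L)          # noiseless channel
X_rec = psi_L * np.exp(-1j * gamma * np.abs(psi_L) ** 2 * L)  # receiver, Eq. (17)
C_rec = np.array([np.sum(X_rec * f(t - ck)) * Delta for ck in centers])  # Eq. (18)
```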

Channel Capacity and Its Bound
It is known [1] that the statistical properties of a memoryless channel, such as the conditional entropy H[Y|X] and the output signal entropy H[Y], can be expressed through the conditional probability density function:

H[Y|X] = −∫ DX DY P_X[X] P[Y|X] log P[Y|X],   (19)

H[Y] = −∫ DY P_out[Y] log P_out[Y],   (20)

where P[Y|X] is the conditional probability density function (i.e., the probability density to receive the output signal Y for the transmitted signal X), and P_out[Y] is the probability density function of the output signal. The distribution P_out[Y] has the following form:

P_out[Y] = ∫ DX P_X[X] P[Y|X].   (21)

The measures DX and DY are defined in such a way that

∫ DX P_X[X] = ∫ DY P_out[Y] = 1.   (22)

The mutual information of a memoryless channel is defined through the entropy H[Y] of the output signal and the conditional entropy H[Y|X] as

I_{P_X[X]} = H[Y] − H[Y|X].   (24)

The channel capacity C is defined as the maximum of the functional I_{P_X[X]} with respect to the input signal distribution P_X[X]:

C = max_{P_X[X]} I_{P_X[X]}.   (25)

The maximum value of I_{P_X[X]} should be calculated at fixed average signal power. Note that since in Equations (19) and (20) we use the logarithm to the base e, here and below we measure the mutual information and the capacity in units of nat/symbol. For the case of the per-sample channel the signal power reads

P = ∫ DX |X|² P_X[X].   (26)

For the case of the extended model the signal depends on time; therefore, the average power contains an additional averaging over the time interval T. So, to find the channel capacity we should know the conditional PDF for both models. Let us start with the calculation of the conditional PDF for the per-sample model.
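For the linear channel (γ = 0) the definitions (19)-(24) can be checked numerically: with a complex Gaussian input of power P and additive complex Gaussian noise of power P_noise, I = H[Y] − H[Y|X] must reproduce Shannon's log(1 + SNR) in nats. A brute-force sketch (grid size and integration window are our choices):

```python
import numpy as np

def complex_gaussian_entropy(power, n=401, width=6.0):
    """Differential entropy -∫ p log p d^2Y of a circular complex Gaussian
    p(Y) = exp(-|Y|^2/power)/(pi*power), by 2-D numerical integration.
    The analytic value is log(pi*e*power) nats."""
    half = width * np.sqrt(power)
    x = np.linspace(-half, half, n)
    dx = x[1] - x[0]
    xx, yy = np.meshgrid(x, x)
    p = np.exp(-(xx ** 2 + yy ** 2) / power) / (np.pi * power)
    return -np.sum(p * np.log(p)) * dx * dx

P, P_noise = 10.0, 0.5
# H[Y]: the output is Gaussian with power P + P_noise; H[Y|X] is the noise entropy
I_num = complex_gaussian_entropy(P + P_noise) - complex_gaussian_entropy(P_noise)
```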

Conditional Probability Density Function
The conditional PDF can be written in the form of a path integral over all realizations of the signal ψ(z) in the channel [17,18]:

P[Y|X] ∝ ∫_{ψ(0)=X}^{ψ(L)=Y} Dψ e^{−S[ψ]/Q},   (28)

where the effective action S[ψ] reads as the integral of the squared modulus of the left-hand side of Equation (3) over the variable z:

S[ψ] = ∫₀^L dz |∂_z ψ(z) − iγ|ψ(z)|²ψ(z)|².

To calculate the path integral we use the retarded discretization scheme, which reflects the physics of the propagation process. The general approach to the derivation of the representation (28) and the argumentation for the retarded scheme can be found in Ref. [24]. For the first time, the conditional PDF P[Y|X] for the per-sample model was obtained in the form of an infinite series in Ref. [4], see (29), where the input signal is X = ρe^{iφ^(X)}, the output signal is Y = ρ′e^{iφ^(Y)}, μ = γL|X|² is a dimensionless nonlinear parameter, k_m = √(Qmγ) e^{iπ/4}, and I_{|m|}(z) is the modified Bessel function of index |m|. The representation (29) was rederived using the path integral representation (28) in Refs. [17,18]. In Ref. [18] the authors applied two different methods: a recursive derivation based on the discretization of the nondispersive NLSE and the properties of Markov chains (the Chapman-Kolmogorov equation); and a derivation of the conditional PDF via the stochastic approach and Ito calculus [25].
Using the result (29), the lower bound of the capacity for large signal power was found in Ref. [17], see (30). Here the signal-to-noise ratio has the form SNR = P/(QL); recall that the noise power for the per-sample model is QL. To obtain the result (30) the authors demonstrated that at large signal power the phase of the output signal occupies the entire phase interval [0, 2π] due to the interaction of the signal with the noise. As a result, the phase does not carry information, see also [18]. Therefore, to find the lower bound for large signal power it is sufficient to keep only the term with m = 0 in Equation (29).
In the so-called intermediate power range,

QL ≪ P ≪ (γ²L³Q)⁻¹,   (32)

it is necessary to take into account the terms with m ≠ 0 in Equation (29). In Ref. [18], using Equation (29), an attempt was made to find the lower bound for the capacity in the intermediate power range. However, because Equation (29) is inconvenient for analytical calculations, this attempt was unsuccessful.
To calculate the mutual information in the intermediate power range we have to cast the conditional probability density function (29) in a more convenient form. In Ref. [19] we developed a method for calculating the conditional probability density function in the form of an expansion in the parameter 1/SNR. Using this method we found the conditional PDF in a form convenient for further calculations. The expansion in 1/SNR, or equivalently in the small parameter Q, is similar to the semi-classical approximation (the expansion in the small Planck constant ħ) in quantum mechanics.
Using the "semi-classical" method, see Ref. [26], we perform the change of the integration variables (a simple shift with unit Jacobian) in Equation (28):

ψ(z) = Ψ_cl(z) + ψ̃(z),   (33)

and represent the conditional PDF (28) in the form (34), i.e., as the factor e^{−S[Ψ_cl]/Q} multiplied by the path integral (35) over the fluctuations ψ̃ with zero boundary conditions ψ̃(0) = ψ̃(L) = 0, divided by the normalization factor Λ, which itself has the form of a discretized path integral (36); here ψ̃_i = ψ̃(z_i) and ∆_z = L/N_z is the grid spacing. The function Ψ_cl(z) obeys the Euler-Lagrange equation δS[Ψ_cl] = 0, whose explicit form is given in (37), with the boundary conditions

Ψ_cl(0) = X,  Ψ_cl(L) = Y.   (38)

Equation (37) plays the central role, and we present two different approaches to its solution. One can find the analytical solution of Equation (37) in the polar coordinate system [19]: Ψ_cl(z) = ρ(ζ)e^{iθ(ζ)}, ζ = z/L. The solution depends on the four real integration constants E, μ̃, ζ₀, and θ₀, which should be determined from the two boundary conditions (38). There are two different types of the solution: the first and the second type correspond to the cases E ≥ 0 and E ≤ 0, respectively. In the first case we have the solution (39), (40), where k = √(2E); it is obtained under the conditions μ̃ ≥ k ≥ 0. The integration constants μ̃, k, and ζ₀ must be found from the boundary conditions (41). Once the integration constants are found, we can express the action in the form (42). In the second case (E ≤ 0) the solution has the form (43), where k = √(−2E). The integration parameters μ̃, k, ζ₀, and θ₀ are determined by the same procedure as in the first case, and the action has the form (44). One can see that to obtain the solution ρ(ζ), θ(ζ) it is necessary to solve the system of nonlinear Equations (41)-(44) for the integration constants. Of course, the system (41)-(44) can be solved numerically, but for the analytical calculation of the mutual information it is necessary to develop a method which allows us to find the solution Ψ_cl(z) and the action S[Ψ_cl(z)] analytically as functionals of the input X and output Y signals.
In Ref. [19] we proposed a method based on the expansion of the semi-classical solution in the vicinity of the solution of Equation (3) with zero noise. This makes sense when the noise power is much less than the signal power.
To demonstrate the approach, we find the solution of (37) in the leading order in the parameter 1/SNR, linearizing Equation (37) in the vicinity of the solution Ψ₀(z). The function Ψ₀(z) is the solution of Equation (3) with zero noise. It obeys the input boundary condition Ψ₀(0) = X = ρe^{iφ^(X)}, ρ = |X|. Note that we do not assume smallness of the nonlinearity. The function Ψ₀(z) reads

Ψ₀(z) = X exp(iμz/L),   (46)

where μ = γLρ² is the dimensionless nonlinear parameter. Let us represent the "classical" solution Ψ_cl(z) in the form (47), with a correction κ(z) which is assumed to be much smaller in magnitude than ρ: |κ(z)| ≪ ρ, i.e., Ψ_cl(z) is close to Ψ₀(z). Note that the function κ(z) depends on the output boundary conditions; therefore, in the general case the ratio |κ(z)|/ρ can be of order unity. The boundary conditions for the function κ(z) are as follows:

κ(0) = 0,  κ(L) = x₀ + iy₀,   (49)

where x₀ = Re{κ(L)} and y₀ = Im{κ(L)}. It is important that the configurations of κ(z) at which Ψ_cl(z) significantly deviates from Ψ₀(z) are statistically irrelevant. One can check that, since the action attains its absolute minimum (S[Ψ₀(z)] = 0) on the solution Ψ₀(z), the expansion of the action S[Ψ₀(z) + δΨ(z)] starts from the terms quadratic in small κ(z). Therefore, the exponent e^{−S[Ψ_cl(z)]/Q} and, as a result, the conditional PDF P[Y|X] decrease exponentially when κ(z) ≫ √(QL). The next step in the evaluation of the conditional probability P[Y|X] is the calculation of the path integral (35). To calculate the integral (35) in the leading order in the parameter 1/√SNR one should retain only the terms quadratic in the function ψ̃ in the integrand. Any extra powers of ψ̃ or κ lead to a suppression by the multiplicative parameter √(QL), since for small Q the dominant contribution to the path integral comes from the region where ψ̃ ∼ √(QL).
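The statement that the action attains its absolute minimum S[Ψ₀] = 0 on the noiseless solution can be probed directly by minimizing a retarded-discretized action over paths with fixed endpoints. A rough numerical sketch (the discretization size and the optimizer are our choices; the small residual reflects the O(∆z) discretization error of the scheme):

```python
import numpy as np
from scipy.optimize import minimize

gamma, L, X = 1.0, 1.0, 1.5
N = 40
dz = L / N
Y = X * np.exp(1j * gamma * abs(X) ** 2 * L)    # matched output Psi_0(L)

def action(u):
    """Retarded-discretized S = sum_i |(psi_{i+1}-psi_i)/dz
    - i*gamma*|psi_i|^2*psi_i|^2 * dz, with psi_0 = X and psi_N = Y fixed;
    u packs the real and imaginary parts of the interior points."""
    interior = u[:N - 1] + 1j * u[N - 1:]
    psi = np.concatenate(([X], interior, [Y]))
    d = (psi[1:] - psi[:-1]) / dz - 1j * gamma * np.abs(psi[:-1]) ** 2 * psi[:-1]
    return np.sum(np.abs(d) ** 2) * dz

# start from the discretized noiseless solution Psi_0(z) = X exp(i*gamma*|X|^2*z)
z = np.linspace(0.0, L, N + 1)
psi0 = X * np.exp(1j * gamma * abs(X) ** 2 * z)
u0 = np.concatenate([psi0[1:-1].real, psi0[1:-1].imag])
S_init = action(u0)
res = minimize(action, u0, method="BFGS")
```

The action at the discretized Ψ₀ is already tiny, and the optimizer cannot do better than the discretization error, consistent with Ψ₀ being the absolute minimum of the continuum action.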
Since we calculate the integral (35) in the leading order in the parameter Q, the function Ψ_cl(z) can be replaced by the function Ψ₀(z) in the exponent. To find the next-to-leading order corrections in the parameter 1/√SNR to the conditional PDF P[Y|X] one should keep both κ(z) in Ψ_cl(z) and the higher powers of ψ̃ in the exponent in Equation (35). The details of the path integral calculation in the leading and next-to-leading orders in 1/√SNR can be found in Ref. [19]. Here we present only the result (53) obtained in the leading order in the parameter 1/√SNR. One can see that the expression (53) is much simpler than the exact result (29). Note that the distribution (53) has the correct deterministic limit (54) of P[Y|X] in the absence of noise. Also, Equation (53) has the correct limit for small γ, see (55), where the right-hand side is the conditional PDF of the linear nondispersive channel with additive noise. Let us compare the result (53) obtained in the leading order in 1/√SNR with the exact result (29). In Figure 1 we plot the PDF P[Y|X] as a function of |Y| for X = 2 mW^{1/2}, arg(Y) = μ, μ = γL|X|² = 4 (so we choose γL = 1 mW⁻¹), and for two values of the parameter QL: QL = 1/2 mW and QL = 1/25 mW (which correspond to SNR = 8 and SNR = 100, respectively). One can see good agreement between the exact result (29) and the approximation (53) even for SNR = 8. For SNR = 100 the approximation almost coincides with the exact result. In the case SNR ≈ 10²-10⁴, which corresponds to real optical fiber channels, the difference between the approximation and the exact result is of the order of 1/√SNR. To decrease this difference we should calculate the corrections of the order of 1/√SNR and 1/SNR, see Section 3.1.3.

Probability Density Function P_out[Y]
Let us consider the integral (21), which defines the probability density function of the output signal:

P_out[Y] = ∫ DX P_X[X] P[Y|X],   (56)

where the distribution P_X[X] is a smooth function. Since the input signal power is P, we can expect that the function P_X[X] changes on the scale X ∼ √P, which is assumed to be much greater than √(QL). In this case we can use Laplace's method [27] to calculate the integral (56) up to the terms proportional to the noise power QL; for the details of the calculation see Appendix C in Ref. [19]. The idea of the calculation of the integral (56) is based on the fact that the function P[Y|X] is much narrower than the function P_X[X] (on the scale of variation of P_X[X], the function P[Y|X] is almost a Dirac delta-function). Therefore, the calculation of the integral is simple, and the result has the form [19]:

P_out[Y] = P_X[Y e^{−iγ|Y|²L}] + O(QL).   (57)

When obtaining the result (57) it is not required to pass to the limit Q → 0; only the relation P ≫ QL between the scales P and QL is used.
For the class of distributions P_X[X] depending only on the absolute value |X| we have P_out[Y] = P_X[|Y|] in this leading order. For such distributions we can calculate the corrections to (57) to any order in the parameter QL.
Let us restrict our consideration in the remainder of this sub-subsection to the case of distributions P_X[X] depending only on |X|. We can use the conditional PDF P[Y|X] (29) found in Ref. [17]. In this case the function P_out[Y] depends only on |Y| = ρ′, see (58). Using this formula one can obtain a simple relation for P_out[ρ′] within the perturbation theory in the parameter QL. Performing the zero-order Hankel transformation [27],

H₀[f](q) = ∫₀^∞ ρ dρ J₀(qρ) f(ρ),   (59)

of both sides of Equation (58), and using the standard integral with the Bessel function J_ν(x) and the modified Bessel function, we arrive at the relation (61) between the Hankel images of the output and input signal PDFs. Then we perform the inverse Hankel transformation and obtain the following important result:

P_out[ρ] = exp[(QL/4)∆_ρ] P_X[ρ],   (62)

where ∆_ρ = d²/dρ² + (1/ρ) d/dρ is the two-dimensional radial Laplace operator. The relation (62) allows us to find the corrections of order (QL)ⁿ to P_out[ρ] by expanding the exponent and calculating the action of the differential operator ∆_ρⁿ on the input PDF P_X[ρ]. Let us consider the widely used example of the modified Gaussian distribution (63) with one parameter β, where Γ(x) is the Euler gamma function. In the case β > 0 the distribution P^(β)_X is normalizable, and its parameters are fixed so that the average power of the input signal is equal to P. The generalized distribution P^(β)_X reduces to the half-Gaussian distribution (64) for β = 1 and to the Gaussian distribution (65) for β = 2. Inserting (63) into Equation (58) we arrive at a standard integral, see [28], and obtain the result (67), where ₁F₁(β/2; 1; z) is the confluent hypergeometric function. This function reduces to e^z for the case of the Gaussian distribution and to the expression e^{z/2}I₀(z/2) for the case of the half-Gaussian one. The result (57) can be obtained from Equation (67) in the case QL ≪ |Y|² ∼ P: expanding the right-hand side of Equation (67) in the parameter QL/P, we obtain (69) with accuracy O(QL), and the result (69) coincides with the general relation (57).
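The two limits of the confluent hypergeometric function quoted after (67) are standard Kummer identities and can be checked directly with scipy:

```python
import numpy as np
from scipy.special import hyp1f1, i0

z = np.linspace(0.0, 5.0, 21)
gaussian_case = hyp1f1(1.0, 1.0, z)        # beta = 2: 1F1(1; 1; z) = e^z
half_gaussian_case = hyp1f1(0.5, 1.0, z)   # beta = 1: 1F1(1/2; 1; z) = e^{z/2} I_0(z/2)
```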

Lower Bound for the Channel Capacity
The estimates for the capacity of the per-sample model in the regime of very large SNR were obtained in Ref. [17]. Specifically, the case P ≫ (P_noise γ²L²)⁻¹ was considered, where P_noise = QL is the noise power. In Ref. [17] the lower bound for the capacity of the per-sample channel was found: using the trial half-Gaussian input signal PDF (64), the authors obtained the result (70), where γ_E ≈ 0.5772 is the Euler constant. The second term on the right-hand side of Equation (30) was presented as O(1) in Ref. [17], but it can be found using Equations (23) and (24) of Ref. [17]. Comparing the result (70) with the Shannon result (1), it is worth noting that the pre-logarithmic factor 1/2 differs from the factor of unity in Shannon's result. The physical reason for the difference is that the signal phase does not carry information when the power is very large, P ≫ (P_noise γ²L²)⁻¹, see Ref. [17].
The most interesting power regime for the per-sample model is the so-called intermediate power range defined in (32). In this regime, on the one hand, the parameter SNR is large, P_noise ≪ P; on the other hand, the signal-dependent phase does not yet occupy the entire phase interval [0, 2π] due to the signal-noise interaction, i.e., the phase still carries information.
Capacity estimates in the intermediate power range (32) were presented in Ref. [18]. For such powers P the authors of that paper also used the half-Gaussian input signal PDF (64) for the estimate of the lower bound for the capacity, see inequality (40) in Ref. [18]. But there were some flaws in the derivation of this inequality in Ref. [18], see the discussion in the Introduction of Ref. [19]. In our approach presented in Ref. [19] we solved the variational problem for the mutual information (24) with the normalization (22) and the power restriction (26): we found both the optimal input signal distribution P_X[X] maximizing the mutual information (24) and the maximal value of the mutual information itself, in the leading and next-to-leading orders in 1/√SNR in the intermediate power range. Let us proceed with this calculation.
To begin with, when the parameter SNR ≫ 1 we can calculate the output signal entropy H[Y] by substituting P_X[Y exp(−iγ|Y|²L)] into Equation (20) instead of P_out[Y], due to the relation (57); then, performing the change of the integration variable φ′ = φ^(Y) + γ|Y|²L, we obtain the representation (71). Note that the output signal entropy in the form (71) coincides with the entropy of the input signal distribution. Secondly, we calculate the conditional entropy H[Y|X] defined in (19): we change the integration variables DY ≡ dReY dImY to dx₀ dy₀ and perform the integration over x₀, y₀. The result has the form (72). Thirdly, to find the optimal input signal distribution P_X^opt[X] we solve the variational problem for the functional

J[P_X, λ₁, λ₂] = I_{P_X[X]} − λ₁ ∫DX P_X[X] − λ₂ ∫DX |X|² P_X[X],   (73)

where λ₁,₂ are the Lagrange multipliers which correspond to the normalization condition (22) and to the condition (26) of fixed average signal power P. The solution P_X^opt[X] of the corresponding Euler-Lagrange equation for (73) is referred to as the "optimal" distribution; it has the form (74), where the coefficients N₀(P) and λ₀(P) are determined from the conditions (75) and (76) of normalization and fixed power. Note that in the leading order in the parameter 1/SNR the function P_X^opt[X] depends only on |X|. In the next-to-leading order in 1/SNR this property holds true as well [20].
In the parametric form the power dependence of the parameters λ₀ and N₀ reads as (77), where G(α) = (π/2)[H₀(α) − Y₀(α)], and Y₀(α) and H₀(α) are the Neumann and Struve functions of zero order, respectively. The parameter α(P) is the real solution of the corresponding transcendental equation relating α to the power P. Note that the optimal input signal distribution P_X^opt[X] (74) differs from the half-Gaussian distribution (64).
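The special function G(α) = (π/2)[H₀(α) − Y₀(α)] entering the parametric solution is available through scipy, and its small-α behavior follows from the standard expansions Y₀(α) ≈ (2/π)[ln(α/2) + γ_E] and H₀(α) ≈ 2α/π, giving G(α) ≈ α − ln(α/2) − γ_E. A quick check:

```python
import numpy as np
from scipy.special import struve, y0

def G(alpha):
    """G(alpha) = (pi/2) * (H_0(alpha) - Y_0(alpha)): Struve minus Neumann."""
    return 0.5 * np.pi * (struve(0, alpha) - y0(alpha))

alpha = 1e-3
G_exact = G(alpha)
G_small = alpha - np.log(alpha / 2) - np.euler_gamma   # small-alpha asymptotics
```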
For sufficiently large values of the power P, log(γPL) ≫ 1, we use the asymptotics of Y₀(α) and H₀(α) at small α and arrive at the asymptotic expressions for λ₀(P) and N₀(P). Here and below we use the notation γ̃ = γPL for the convenient dimensionless nonlinear parameter. At small P, such that the nonlinearity parameter γ̃ ≪ 1, the solutions of Equations (75) and (76) take the simple form (81). Note that at γ̃ → 0 the optimal distribution (74) goes over to the Gaussian distribution (65); it is known that this distribution is optimal for a linear channel [1]. In Ref. [20], using the same method, we found the first correction to P_X^opt[X] proportional to QL. Fourthly, to calculate the mutual information we substitute the expression (74) for P_X^opt[X] into Equations (71) and (72) and, using the definition (24), obtain the result (82). This equation gives the mutual information I_{P_X^opt} with the accuracy O(QL). At small γ̃ we obtain the asymptotics (83), which is the Shannon capacity log(1 + SNR) of the linear channel (1) at large SNR, together with the first nonlinear correction.
In the high power sub-interval (γL)⁻¹ ≪ P ≪ (QL³γ²)⁻¹, using Equation (78) one can obtain the asymptotics (84) of the mutual information in the case of a very large nonlinearity parameter, log(γ̃) ≫ 1. This expression is obtained with the accuracy 1/log²(γ̃). One can see that the first term on the right-hand side of Equation (84) grows as log log P. It means that the mutual information I_{P_X^opt} also grows as log log P at large enough P.
Note that the results (74), (82) and the asymptotics (78), (81), (83), (84) are obtained in the leading and next-to-leading orders in the parameter 1/√SNR. Therefore, the results (74), (82) are calculated with the accuracy O(1/SNR). However, in the literature it is the bounds of the capacity, rather than the asymptotic estimates, that are of primary interest. Therefore, to find the lower bound of the channel capacity for the per-sample model it is necessary to calculate the corrections of the order of 1/SNR and to determine their signs. Moreover, to find the applicability region of the result (82) we have to know the corrections of the order of 1/SNR as well: the applicability region is defined by the condition that the corrections of the order of 1/SNR are much smaller than the obtained results (74), (82). In Ref. [20] we calculated these corrections using the approach described in Section 3.1. The calculation of these corrections is straightforward but cumbersome; therefore, here we present only the idea of the calculation and the results for the corrections to the mutual information. The detailed calculation can be found in Ref. [20].
To calculate the correction to the mutual information we should know the corrections to the conditional probability density function (53). Therefore, we should calculate the corrections both to the action S and to the normalization factor Λ in Equation (34). To find these corrections we have to calculate the function κ(z), see Equation (51), in the leading, next-to-leading, and next-to-next-to-leading orders in the parameter 1/√SNR. Then we should substitute the found corrections for the action S into the path integral (35) and calculate the path integral up to the terms proportional to the parameter Q. After that, we expand the product of the exponent and the normalization factor Λ up to the terms of order 1/SNR. This expression is cumbersome, and therefore we do not present it here, but it can be found in Ref. [20]. To calculate the corrections to the mutual information we substitute the obtained result for P[Y|X] into Equations (19)-(21) and (24). Then, using the method described above, we perform the maximization of the mutual information calculated with the accuracy 1/SNR and obtain the result in the form

I_{P_X^opt} = C₀ + ∆C,   (85)

where C₀ is defined in (82). The correction ∆C has the form (86) and corresponds to the first non-vanishing correction to the mutual information. One can verify that for the small parameter γL²Q ≪ 1 the correction (86) is always small with respect to C₀. Indeed, the ratio of the expression in the curly brackets in Equation (86) to γ̃ is a bounded function for all values of γ̃.
We do not have an explicit analytical result for the correction ∆C at arbitrary γ̃, since the quantities λ₀ and N₀ are the solutions of nonlinear equations. The correction ∆C can, however, be calculated numerically for any γ̃, and for small and large γ̃ we can find the asymptotics of ∆C analytically.
For small nonlinearity, γ̃ ≪ 1, we substitute the parameters λ₀ and N₀ in the form (81) into Equation (86) and obtain the asymptotics (87). Using this result and the asymptotics (83) we obtain the mutual information within our accuracy in the form (88). One can see that the first term on the right-hand side of (88) is the capacity of the linear channel, while the second and third terms correspond to the nonlinear corrections. The nonlinear corrections in (88) are negative, and they decrease the mutual information I_{P_X^opt}. In the opposite case of large γ̃ the correction ∆C is suppressed parametrically as γL²Q instead of 1/SNR = QL/P, and it decreases at large γ̃ as 1/log γ̃. It is interesting that at large γ̃ the correction ∆C is positive; therefore, it slightly enhances the mutual information I_{P_X^opt}. Since the correction ∆C is positive in the region defined as log(γLP) ≫ 1, P ≪ (γ²QL³)⁻¹, and the next-to-leading corrections to the mutual information are suppressed parametrically, the quantity C₀ is a lower bound of the per-sample channel capacity:

C ≥ C₀.   (91)

Since there are no corrections of order γ²L³QP at large P, see Equation (90), we expect the next correction containing the power P to be of order of (γ²L³QP)², see Ref. [19]. Therefore, the applicability region at large P for the quantity C₀ is determined by the condition (γ²L³QP)² ≪ 1. For a given small parameter γ²L³QP this condition extends the applicability region of the lower bound C₀. For the realistic channel parameters presented in Table 1 this range is very wide. For the presented parameters we have numerically calculated the lower bound C₀ using Equations (77) and (82). The result of the calculation is presented in Figure 2, where we also compare the approximation (85) with the Shannon capacity of the linear channel and with the asymptotic capacity bound (70). In Ref.
[22] the comparison of our result (82) for the NLSE per-sample model (the authors referred to our model as the memoryless NLS channel (MNC)) with two other models, namely the regular perturbative channel (RPC) and the logarithmic perturbative channel (LPC), was performed. The comparison was illustrated in Figure 1 of Ref. [22], and we reproduce it here in Figure 3. The authors claimed that they established a novel upper bound on the capacity of the NLSE per-sample model (the violet curve U_MNC(P) in Figure 3). In addition, the authors considered various input signal distributions within the MNC: the half-Gaussian (64), the Gaussian (65), and the modified Gaussian distribution (63) optimized in the parameter β (denoted as "Max-chi" in Figure 3). They used the following channel parameters: γ = 1.27 W⁻¹km⁻¹, L = 5000 km, QL = 7.36 × 10⁻³ mW. For these parameters the upper limit of the intermediate power range (we choose it as P_max = 6π²(Qγ²L³)⁻¹, see [18]) is estimated as P_max = 0.2 W = 23 dBm. One can see in Figure 3 that up to this power P_max our result (the green solid line) is consistent with the capacity upper bound of the per-sample model (the violet curve U_MNC) and exceeds the mutual information curves for the other input signal distributions in the entire intermediate power range. Of course, for large powers (P ≫ P_max) our input signal distribution (74) is no longer optimal, and the mutual information (85) underestimates the real capacity. Strictly speaking, the corrections to (85) are small only up to P ∼ 5.5 dBm; for P > 5.5 dBm we have the transition range from the intermediate power range with the optimal input signal distribution (74) to the large power semirange, where the optimal distribution is believed to be the half-Gaussian one, see [17,18].
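The quoted value P_max = 0.2 W = 23 dBm is easy to re-derive from the channel parameters of Ref. [22]:

```python
import numpy as np

gamma = 1.27e-3        # Kerr coefficient in 1/(mW km)  (= 1.27 W^-1 km^-1)
L = 5000.0             # propagation distance, km
QL = 7.36e-3           # total noise power Q*L, mW
Q = QL / L             # noise power per unit length, mW/km

P_max = 6 * np.pi ** 2 / (Q * gamma ** 2 * L ** 3)   # upper power limit, mW
P_max_dBm = 10 * np.log10(P_max)                     # in dBm (reference 1 mW)
```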
The large extent of the transition range (up to P_max ∼ 23 dBm) can be explained by the smallness of the next-to-leading order corrections (86) and by their decrease with increasing power, see Equation (90).
To finalize the per-sample channel consideration, we emphasize the main results. We developed the path-integral approach to the calculation of the conditional PDF P[Y|X] at large SNR. We demonstrated that for the nonlinear nondispersive channel the lower bound C₀ of the capacity increases only as log log P at large signal power P, instead of the log P behavior specific to the channel with zero nonlinearity. To determine the applicability region and the accuracy of the found quantity C₀, we calculated the first non-zero correction ∆C, proportional to the noise power QL, in the intermediate power range QL ≪ P ≪ (γ²L³Q)⁻¹. We demonstrated that the quantity ∆C is small in the intermediate power range, and that it is a positive, decreasing function at large signal power P: (γL)⁻¹ ≪ P ≪ (γ²L³Q)⁻¹. This result is in agreement with the recent results of Ref. [22].
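The contrast between the log P growth of the linear Shannon capacity and the log log P growth found for the nondispersive channel can be illustrated numerically. This is a pure scaling sketch of ours, with all channel prefactors and offsets omitted, so only the trend is meaningful:

```python
import math

# Scaling sketch only: nats, arbitrary power units, no channel prefactors.
rows = []
for P_dBm in (0, 10, 20, 30):
    P = 10 ** (P_dBm / 10)                           # power in mW
    log_growth = math.log(1 + P)                     # linear-channel scaling, ~log P
    loglog_growth = math.log(math.log(math.e + P))   # nonlinear scaling, ~log log P
    rows.append((P_dBm, log_growth, loglog_growth))

for P_dBm, a, b in rows:
    print(f"{P_dBm:3d} dBm   log: {a:6.2f}   loglog: {b:5.2f}")
```

Each extra 10 dB of power adds a nearly constant amount to the log P column but an ever-shrinking amount to the log log P column, which is why the nonlinear lower bound flattens so dramatically at large power.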

Extended Model: Considerations of the Time Dependent Input Signals
Let us start this section with the consideration of the conditional probability density function P[Y|X]. In the case of a nontrivial time dependence of the input X(t) and output Y(t) signals, the conditional PDF reads, see Ref. [11]: where the effective action S[ψ] has the form: ; the integration measure Dψ(z, t) depends on both the z- and t-discretization schemes: where ∆ = T/(2M) = (2N + 1)T₀/(2M) is the time grid spacing and ∆_z = L/N_z is the z-coordinate grid spacing. Of course, the expression (93) contains all information about the transmitted signal and its interaction with the noise; but, strange as it sounds, the expression (93) also contains redundant information (i.e., degrees of freedom which cannot be detected). The point is that in a realistic communication channel the receiver has a finite bandwidth, which means that it somehow reduces the bandwidth of the received signal Y(t). After that, there is a procedure of extraction of the information from the signal measured by the receiver. This detection procedure should be implemented in the function P[Y(t)|X(t)].
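As a quick check of the discretization bookkeeping, the grid spacings and the associated noise bandwidth 2π/∆ can be computed directly. Only T₀ is taken from the text; L, N, M, and N_z below are illustrative values of ours:

```python
import math

T0 = 1e-10                    # slot duration, s (from the text)
L = 5000.0                    # propagation distance, km (assumed value)
N, M, N_z = 10, 2**12, 1000   # illustrative grid sizes (our choice)

T = (2 * N + 1) * T0          # total time window: T = (2N+1) * T0
dt = T / (2 * M)              # time grid spacing: Delta = T / (2M)
dz = L / N_z                  # z grid spacing:    Delta_z = L / N_z
W_prime = 2 * math.pi / dt    # noise/receiver bandwidth 2*pi/Delta

print(f"Delta = {dt:.3e} s, Delta_z = {dz} km, bandwidth = {W_prime:.3e} rad/s")
```

Refining the time grid (larger M) widens the noise bandwidth 2π/∆ while leaving the 2N + 1 information-carrying coefficients untouched, which is exactly the redundancy discussed above.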
To demonstrate the point, let us consider the input signal X(t) in the form (11). In this form, the number of degrees of freedom in the path integral (93) is infinite if the function X(t) is continuous, or 2M if we set the function X(t) at the discrete time moments t_i, see the text after Equation (17). However, the transmitted information is carried only by the 2N + 1 coefficients C_n (N ≪ M), see Equation (11). It means that, to obtain the conditional probability density function which describes the transmission of the information carried by the set of coefficients C_n, we have to integrate over the redundant degrees of freedom. The approach based on the path-integral representation (93) is general, and it can be used for any signal model, any receiver, and any projecting procedure (18). For our signal model described in Section 2.2 we can use a simpler method for the calculation of the conditional PDF P[{C̃}|{C}]. Below we describe this method.
In our model the signal propagation for different time moments t_j is independent, because the dispersion is zero and the noise is not correlated for different time moments t_i ≠ t_j. Therefore, the conditional PDF P[Y(t)|X(t)] can be presented in the factorized form: where X_j = X(t_j), Y_j = Y(t_j), and P_j[Y_j|X_j] is the per-sample conditional PDF described in the previous Section 3.1.1, in which we should replace Y → Y_j, X → X_j, Q → Q/∆. Our goal is to find the PDF P[{C̃}|{C}] in the leading order in the parameter 1/SNR. Instead of calculating the path integral, we build the PDF which reproduces all possible correlators of C̃_k: ⟨C̃_{k₁}⟩, ⟨C̃_{k₁}C̃_{k₂}⟩, …, ⟨C̃_{k₁} … C̃_{k_n}⟩ for the fixed input set {C} in the leading order in the parameter Q. These correlators read: where d²Y_j = dReY_j dImY_j, and C̃_k is defined in Equation (18). After substituting Equations (95), (53), and (18) into Equation (96) and performing the integration, we obtain in the leading order in the noise parameter Q: where δ_{m,n} is the Kronecker symbol and we have introduced the following notation for the integral of the s-th power of the pulse envelope function f(t): We remind that f(t) is assumed to be normalized by the condition n₂ = 1. Note that the correlator ⟨C̃_k⟩ − C_k is proportional to QL/∆ = QLW′/(2π), i.e., it is proportional to the total noise power in the whole bandwidth W′. The reason is that the bandwidth of the receiver coincides with the bandwidth of the noise. The correlators (98) and (99) are proportional to QL/T₀ and do not depend on the discretization parameter ∆. It means that these correlators depend only on the bandwidth of the envelope function f(t).
So we obtain that, in the leading order in the parameter Q, the shift of the mean value ⟨C̃_k⟩ due to the signal-noise interaction is proportional to the total noise power in the channel (bandwidth W′ = 2π/∆), whereas the spread around the average value, see (98) and (99), is proportional to the noise power contained in the bandwidth W of the pulse envelope (the bandwidth of the pulse envelope coincides with the bandwidth of the signal X(t)). Note that the higher-order corrections in the parameter Q to the correlators are more complicated and contain the noise bandwidth; see the details in Appendix A of Ref. [21]. The correlators of higher orders in C̃ can be calculated in the leading order in the parameter Q using Equations (97)-(99).
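The ∆-independence of the spread can be illustrated with a toy Monte Carlo of our own construction: project discretized complex white noise of per-sample power QL/∆ onto the normalized Gaussian envelope, and observe that the variance of the projected coefficient is set by QL/T₀ regardless of the grid spacing. This sketch models only the additive-noise mechanism, not the signal-noise interaction terms entering (97):

```python
import numpy as np

rng = np.random.default_rng(1)
T0, T1, QL = 1.0, 0.1, 1.0          # dimensionless units for the sketch

ratios = []
for M in (256, 512, 1024):          # finer grid = wider noise bandwidth 2*pi/Delta
    dt = T0 / M
    t = (np.arange(M) - M / 2) * dt
    # Gaussian envelope (101), normalized so that n_2 = (1/T0) * int f^2 dt = 1
    f = np.sqrt(T0 / (T1 * np.sqrt(np.pi))) * np.exp(-t**2 / (2 * T1**2))
    # complex white noise with per-sample power QL/dt
    eta = np.sqrt(QL / (2 * dt)) * (rng.standard_normal((4000, M))
                                    + 1j * rng.standard_normal((4000, M)))
    C = (eta * f).sum(axis=1) * dt / T0   # projection (1/T0) * int eta(t) f(t) dt
    ratios.append(np.var(C) * T0 / QL)    # stays ~1 for every Delta

print(ratios)
```

Halving ∆ doubles the per-sample noise power, but the projection onto f(t) filters the noise down to the envelope bandwidth, so the coefficient variance Var(C) ≈ QL/T₀ is unchanged.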
To verify the analytical results (97)-(99), numerical simulations of pulse propagation through a nonlinear nondispersive optical fiber were performed in Ref. [21], and the numerical results for the correlators (97)-(99) were obtained. To find these correlators, Equation (4) was solved numerically for a fixed input signal X(t) and for different realizations of the noise η(z, t). After that, the detection procedure described by Equations (17) and (18) was applied. Finally, the averaging over noise realizations for the coefficients C̃_k was performed. Two numerical methods for the solution of Equation (4) were used: the split-step Fourier method and the fourth-order Runge-Kutta method. It was shown that the numerical results do not depend on the numerical method, and that they are consistent with the analytical ones for different realizations of the input pulse envelope f(t) and of the noise bandwidth. Below we present the comparison of the numerical and analytical results obtained in Ref. [21]. The numerical simulation was performed for the channel parameters listed in Table 2. The duration of one pulse was chosen as T₀ = 10⁻¹⁰ s. Simulations were performed for different t-meshes, i.e., for different time grid spacings ∆, and for different pulse envelopes. Different grid spacings ∆ correspond to different noise bandwidths 2π/∆ for the fixed noise parameter Q. The numerical calculations were performed for the different values of ∆ presented in Table 3. The different grid spacings determine the different widths of the conjugated ω-meshes in the frequency domain: 1/∆₁ = 10.26 THz, 1/∆₂ = 5.12 THz, and 1/∆₃ = 2.56 THz. In Ref. [21] different envelopes f(t) were considered as well. Here we present the results only for the Gaussian envelope: where T₁ = T₀/10 = 10⁻¹¹ s stands for the characteristic time scale of the function f(t). Such a relation between T₀ and T₁ means that the overlap between different pulses is negligible.
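The integrator cross-check behind such simulations is easy to reproduce in a noiseless setting: with zero dispersion, the equation dψ/dz = iγ|ψ|²ψ conserves |ψ| and reduces to a pure nonlinear phase rotation, ψ(L) = X exp(iγL|X|²), so a fourth-order Runge-Kutta march can be validated against the exact answer. The parameter values below are ours, not those of Table 2:

```python
import numpy as np

gamma, L, N_z = 1.0, 1.0, 2000        # dimensionless sketch values (ours)
dz = L / N_z

def rhs(psi):
    # zero-dispersion NLSE right-hand side: i * gamma * |psi|^2 * psi
    return 1j * gamma * np.abs(psi)**2 * psi

X = np.array([0.5 + 0.25j, 1.0 - 0.5j])   # two test input samples
psi = X.copy()
for _ in range(N_z):                       # classical 4th-order Runge-Kutta in z
    k1 = rhs(psi)
    k2 = rhs(psi + 0.5 * dz * k1)
    k3 = rhs(psi + 0.5 * dz * k2)
    k4 = rhs(psi + dz * k3)
    psi = psi + (dz / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

exact = X * np.exp(1j * gamma * L * np.abs(X)**2)  # pure phase rotation
err = np.max(np.abs(psi - exact))
print(err)                                  # tiny: integrator consistent
```

Since |ψ| is conserved exactly by the continuous equation, any growth of |ψ| during the march immediately flags a step-size or implementation problem; the same check applies per time sample when noise and a time-dependent X(t) are added.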
For pulses with envelope (101) the coefficients n_s are given by Equation (100).

[Figure 4, taken from Ref. [21]: the real part of the relative difference between the coefficient ⟨C̃_k⟩ and the correlator (97), in units of 10⁻³, as a function of |C_k|². Dash-dotted, dashed, and solid lines correspond to the analytic representation (97) for time grid spacings ∆₁, ∆₂, ∆₃, respectively; circles, squares, and diamonds correspond to the numerical results for ∆₁, ∆₂, ∆₃, respectively.]

[Figure 5, taken from Ref. [21]: the same comparison for the imaginary part of the relative difference between ⟨C̃_k⟩ and the correlator (97), with the same line and marker conventions.]
One can see the good agreement between the analytical and numerical results depicted in Figure 4. There is some difference between the imaginary parts of the analytical and numerical results corresponding to the grid spacing ∆₁ at large |C_k|, see Figure 5. The reason is that for the analytical results corresponding to ∆₁ it is necessary to take into account the next corrections in the parameter Q, see Ref. [21]. The numerical and analytical results for the correlators (98) and (99) are also in good agreement; for details see Ref. [21]. In that paper it was demonstrated that the relative importance of the next-to-leading order corrections for the correlators (98) and (99) is governed by the dimensionless parameter (QL/∆)(γL)²P, i.e., it increases linearly with the power P.
Now we can proceed to the search for the conditional probability density function. Using the correlators (97)-(99), we build the conditional PDF P[{C̃}|{C}] which reproduces all correlators of the coefficients C̃_m in the leading order in the parameter Q. Thus the conditional PDF has the form [21]: where where φ_m = arg C_m, µ_m = γL|C_m|², and The parameter ξ² obeys the inequality ξ² ≥ n₆ > 0 due to the Cauchy-Schwarz-Bunyakovsky inequality. For the Gaussian envelope (101) this parameter is and for the chosen parameters T₁ = T₀/10 = 10⁻¹¹ s one obtains ξ ≈ 5.08. Equation (102) means that our channel decomposes into 2N + 1 independent information channels. Therefore, the function P_m[C̃_m|C_m] describes the channel corresponding to the m-th time slot. The function P_m[C̃_m|C_m] obeys the normalization condition Since there are 2N + 1 independent channels, we can choose the input signal distribution P_X[{C_m}] in the factorized form: and we can consider only one channel, say the m-th one. One can see that the representation (103) of the conditional PDF is close to the representation (53) of the per-sample PDF. Also, the function P_m[C̃_m|C_m] changes significantly when the variable C̃_m changes by a value of order √(QL/T₀) for a fixed value of C_m. Such behavior coincides with that of the function P[Y|X] for the per-sample model. Below we imply the condition (110), where P is the mean power of the m-th pulse: the signal power is much greater than the noise power both in the whole bandwidth W′ = 2π/∆ and in the input signal bandwidth W. For the extended model we define the signal-to-noise ratio as: on the assumption (110) we have SNR ≫ 1. We also imply that the PDF of the input signal P_X^(m)[C_m] is a smooth function that changes on a scale |C_m| ∼ √P.
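The quoted value ξ ≈ 5.08 can be checked numerically. Since Equation (106) is not reproduced in this excerpt, the combination ξ² = 4n₆ − 3n₄² used below is our reconstruction: it satisfies the stated bound ξ² ≥ n₆ (because n₄² ≤ n₂n₆ = n₆ by the Cauchy-Schwarz inequality) and reproduces ξ ≈ 5.08 for the Gaussian envelope (101) with T₁ = T₀/10:

```python
import numpy as np

T0, T1 = 1.0, 0.1                   # only the ratio T0/T1 = 10 matters here
t = np.linspace(-T0, T0, 200001)
dt = t[1] - t[0]
# Gaussian envelope (101), normalized so that n_2 = 1
f = np.sqrt(T0 / (T1 * np.sqrt(np.pi))) * np.exp(-t**2 / (2 * T1**2))

def n(s):
    # n_s = (1/T0) * integral of f(t)^s dt, cf. Equation (100)
    return (f**s).sum() * dt / T0

xi = np.sqrt(4 * n(6) - 3 * n(4)**2)   # assumed form of Eq. (106), see lead-in
print(n(2), xi)                        # n_2 ~ 1, xi ~ 5.08
```

For the Gaussian envelope the integrals are elementary, n₄ = (T₀/T₁)/√(2π) and n₆ = (T₀/T₁)²/(π√3), which gives ξ ≈ 5.076 analytically, consistent with the numerical check.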
Therefore, using a consideration similar to that for the per-sample model, we obtain that the PDF of the output signal in the leading order in the parameter 1/SNR has the form: As a reminder, to obtain the result (113) we perform the integration in Equation (112) using Laplace's method [27]; see the details in Ref. [19].
Since we know the conditional PDF P_m[C̃_m|C_m] and the output signal PDF P_out^(m), we can find the optimal input signal PDF in the form: Substituting the result (114) into the mutual information, we obtain the following result: where the parameters N₀ and λ are the solutions (as functions of the power P) of the normalization conditions for the function (114): where we have performed the change of variables from |C_m| to ρ. Note that the results (115)-(116) are obtained in the leading order in the parameter 1/SNR. Note also that the expression (115) is obtained for one (the m-th) pulse; therefore, to obtain the mutual information of the channel with the input signal (11), it is necessary to multiply the right-hand side of Equation (115) by the number of independent channels, i.e., by 2N + 1.
One can see that the relations in Equation (116) coincide with those in Equation (76) after the change of the parameter γ → γ/ξ. Therefore, to obtain the results for the mutual information I one can use the per-sample results with this replacement. The asymptotics has the form: for ξγLP ≪ 1, and for the case when the power P obeys the conditions log(ξγLP) ≫ 1, P ≪ T₀/(QL³ξ²γ²), we have the asymptotics Note that the asymptotics (118) is obtained with accuracy 1/log²(ξγLP). The found mutual information (115) is calculated for a fixed shape of the pulse envelope f(t). It is worth noting that the pulse shape f(t) enters only through the single parameter ξ, see Equation (106). Therefore, strictly speaking, to find the capacity one should maximize over all pulse shapes. On the one hand, we can consider the expression (115) as an estimate of the capacity of the channel whose model implies a specific given shape of the pulse envelope f(t). On the other hand, using Equation (118), for a fixed power P one can increase the value of I by varying the pulse shape. Note that we have obtained similar asymptotics for the per-sample model: the time dependence of the pulse leads to the same asymptotic behavior with the modified nonlinearity parameter γ → ξγ. It is worth noting that all our calculations were performed under the assumption of negligible overlap of the pulses, see Equation (12); we imply that the corrections due to the overlap effects are at most of the same order as the next-to-leading order corrections in the parameter QL. We can satisfy this condition by choosing appropriate pulse parameters T₀ and T₁.

Conclusions
In our review we considered the nondispersive nonlinear optical-fiber communication channel with additive noise. We studied two different models of the channel: the per-sample model and the extended model. For these models we presented the results of the calculation of the following information-theoretic characteristics: the output signal entropy, the conditional entropy, and the mutual information, in the leading order in the parameter 1/SNR. To calculate these quantities, two methods of calculation of the conditional probability density function P[Y|X] were developed [19,21]. The first method, which was used for the calculation of P[Y|X] in the per-sample model, is based on the path-integral approach. In this approach the path integral (28) was treated using the saddle-point method, i.e., the expansion in the parameter 1/SNR, see Refs. [19,20]. This method was used to obtain an expression for the conditional PDF P[Y|X] which is convenient for the analytical calculation of the output signal PDF P_out[Y], the entropies, the mutual information, and the optimal input signal distribution in the leading [19] and next-to-leading [20] orders in the parameter 1/SNR. These calculations allowed us to find the lower bound of the channel capacity for the per-sample model in the intermediate power range. The second method was applied to the investigation of the extended model, which contains such characteristics as the bandwidths of the input signal, the noise, and the receiver, and also takes into account the projection procedure (18). The second method is based on the calculation of the correlators of the output signal for a fixed input signal in the leading and next-to-leading orders in the noise parameter Q, see Ref. [21]. Using these correlators, the conditional PDF P[{C̃}|{C}] which reproduces all of them in the leading order in the parameter 1/SNR was constructed.
The knowledge of the function P[{C̃}|{C}] allowed us to find the informational characteristics of the extended model [21].
We compared the results of our calculations for the per-sample model with the limitations on the capacity obtained by other authors [4,17,18,22]. We demonstrated that the conditional PDF P[Y|X], even in the leading order in the parameter 1/SNR, reproduces the exact result (29) with high accuracy, see Figure 1. Using the expression (53) in the intermediate power range, we found the lower bound (82) of the capacity, which is consistent with the recent results obtained in Ref. [22].
For the extended model [21] we presented the results of the calculation of the output signal correlators for a fixed input signal and demonstrated that the difference between the average value of the recovered coefficient ⟨C̃_k⟩ and the input coefficient C_k is proportional to the noise power contained in the total noise bandwidth, see Equation (97), whereas the covariances (98) and (99) are proportional to the noise power contained in the input signal bandwidth. This behavior of the mean value ⟨C̃_k⟩ is related to the model of the receiver (i.e., the bandwidth of the receiver coincides with the bandwidth of the noise). The obtained analytical results (97)-(99) were confirmed by direct numerical calculations, see Figures 4 and 5. Therefore, the constructed conditional PDF P[{C̃}|{C}] contains information about the bandwidths of the signal, the noise, and the receiver [21]. This result is in agreement with the assertions made in Ref. [14]. Despite the dependence of the PDF P[{C̃}|{C}] on the noise bandwidth, the mutual information calculated in the leading order in 1/SNR for the extended model depends only on the noise power contained in the input signal bandwidth. Since we have not calculated the corrections in the parameter 1/SNR to the mutual information (115), we can only consider this quantity as a capacity estimate rather than a lower bound of the capacity.
The models considered in the present paper are far from modern communication systems, where the second dispersion coefficient is not zero and the signal detection procedure differs from the one considered above. The effects related to non-zero dispersion and to the properties of the receiver can significantly change the results for the mutual information obtained in our consideration. However, the methods described in the present paper may be useful for the consideration of real communication systems. The calculations performed in Refs. [29-33] indicate the possibility of using the presented methods for the capacity investigation of nonlinear fiber-optical channels with non-zero dispersion.
Author Contributions: All authors performed the calculations together. The consideration of the per-sample model was prepared predominantly by I.S.T., whereas the consideration of the extended model was prepared predominantly by A.V.R. All authors have read and agreed to the published version of the manuscript.