Extending Fibre Nonlinear Interference Power Modelling to Account for General Dual-Polarisation 4D Modulation Formats

In optical communications, four-dimensional (4D) modulation formats encode information onto the quadrature components of two arbitrary orthogonal states of polarisation of the optical field. Many analytical models available in the optical communication literature allow, within a first-order perturbation framework, the computation of the average power of the nonlinear interference (NLI) accumulated in coherent fibre-optic transmission systems. However, all such models only operate under the assumption of transmitted polarisation-multiplexed two-dimensional (PM-2D) modulation formats, which only represent a limited subset of the possible dual-polarisation 4D (DP-4D) formats. Namely, only those where data transmitted on each polarisation channel are mutually independent and identically distributed. This paper presents a step-by-step mathematical derivation of the extension of existing NLI models to the class of arbitrary DP-4D modulation formats. In particular, the methodology adopted follows the one of the popular enhanced Gaussian noise model, albeit dropping most assumptions on the geometry and statistic of the transmitted 4D modulation format. The resulting expressions show that, whilst in the PM-2D case the NLI power depends only on different statistical high-order moments of each polarisation component, for a general DP-4D constellation, several other cross-polarisation correlations also need to be taken into account.


Introduction
With the resurgence of polarisation-diverse, optical coherent detection, transmission of information over an optical fibre is typically performed exploiting four degrees of freedom of the optical field: two quadrature components over two orthogonal states of polarisation. The standard approach consists in encoding data independently over the two polarisation channels using the same two-dimensional (2D) modulation format. The resulting four-dimensional (4D) constellation is often referred to as a polarisation-multiplexed 2D (PM-2D) modulation format. The strong point of PM-2D formats is their simplicity of generation and performance analysis: as the two polarisation channels are independent and under the assumption of data-independent cross-polarisation interference in the fibre channel, transmission performance can be evaluated using the 2D component format.
Despite the popularity of PM-2D formats, a substantial amount of research work in the literature has been devoted to more general 4D formats, i.e., 4D constellations which are not necessarily generated as Cartesian products of a component 2D constellation [1,2]. These formats have recently regained attention due to their potential power efficiency, nonlinearity tolerance, and ultimately their still unexplored shaping gains. The reason for that relies on the fact that, by exploiting the full 4D space, sensitivity and other relevant performance metrics such as mutual information or generalised mutual information can be improved compared to traditional PM-2D formats [3][4][5][6][7].
Previous works on optimised 4D modulation formats have either operated under an additive white Gaussian noise channel hypothesis [1][2][3] or exploited some heuristic approaches to derive nonlinearly tolerant formats in the fibre-optic channel [5][6][7]. However, accurately predicting the amount of nonlinear interference generated by transmission of a given constellation in an optical fibre is key to optimising its shape in N dimensions.
Modelling of nonlinear interference (NLI) in optical fibre transmission is quite a mature field of research where an impressive amount of progress was made in the first half of the 2010s, e.g., in [8][9][10][11]. In particular, [10,11] introduced for the first time the possibility of predicting the dependency of nonlinear interference power as a function of the modulation format features, i.e., geometrical shape and statistical properties. Among other assumptions, one underlying key point of all previous models is the transmission of PM-2D modulation formats, where data on the two polarisation channels are assumed to be independent and identically distributed. Under this constraint, one can predict the NLI power using the statistical properties of the 2D component modulation format. It is clear, however, that this approach ceases to be applicable to general DP-4D formats, where a single 2D component format might not even exist.
In this work, we extend the existing analytical expressions for NLI power to account for DP-4D constellations where the two 2D polarisation components are not identically distributed or when there is statistical dependency between them. The undertaken approach is the same as in [9], i.e., a frequency-domain, first-order perturbation study. Unlike [9], no assumptions are made on either the marginal or joint statistics of the two polarisation components of the transmitted 4D constellation (besides being zero-mean). The final expressions reveal the impact of several different (nontrivial) cross-polarisation statistics on the NLI power.
The formulas presented in this work enable an accurate computation of the NLI power for all possible dual-polarisation formats in optical fibre transmission. As a result, a reliable optimisation of both geometry and symbol probability of occurrence of such 4D formats is also enabled for the optical fibre channel.

Organisation of the Manuscript and Notation
The manuscript is organised as follows: (i) in Section 3, the investigated system model is described and the model assumptions are presented; (ii) Sections 4-8 are devoted to a step-by-step analytical derivation of the model; and (iii) ultimately, the main model expression is presented in Section 8. In particular: in Section 4, the regular perturbation (RP) solution to the frequency domain Manakov equation is derived for a multi-span fibre system and its power spectral density (PSD) is evaluated in the case of a transmitted periodic signal; in Section 5, the contributions of the different high-order moments and cross-polarisation correlations of the transmitted 4D modulation format are highlighted; finally, Section 8 derives, via Theorem 2, an expression for the PSD as the signal period is extended into infinity. A flowchart of the main derivation steps performed in this work, with their corresponding references in the manuscript, is shown in Figure 1.
Throughout this manuscript, we denote 2D (column) vectors with boldface letters (e.g., a), whereas 2D column vector functions are indicated with boldface capital letters (e.g., E( f , z),Ẽ(t, z), etc.). For indicating the optical field, the first variable of represents either the time or frequency variable whereas the second one represents the fibre propagation section. An exception is made for the multi-span system case, where second and third variables are assigned to the number of spans and span length, respectively. This highlights the joint dependence of the output optical field on these two variables, as shown later in the paper. F {·}, E{·}, and Re{·} indicate the Fourier transform, the statistical expectation, and the real part operators, respectively. The delta distribution is indicated by δ(·), whereas δ k denotes the Kronecker delta defined as δ k 1 for k = 0, 0 elsewhere.
Finally, Z, R, and C denote the integer, real, and complex fields, respectively, and j is the imaginary unit.

System Model
The baseband equivalent model of the optical fibre system under investigation in this work is shown in Figure 2. The fibre channel is a multi-span fibre system using Erbium-doped fibre amplification (EDFA). In this manuscript, it is assumed that a single-channel signal is transmitted. The transmitter is assumed to generate for each symbol period n the 4D symbol a n = [a x,n , a y,n ] T , where a x,n , a y,n ∈ C are complex symbols modulated on two arbitrary orthogonal polarisation states x and y, respectively. The sequence of symbols a n for n ∈ Z is assumed to be a cyclostationary process of period W. The set of random variables (RVs) within each period of such process are also assumed to be statistically independent. Linear modulation with a single, real pulse p(t) on x and y polarisation is adopted. The pulse p(t) with spectrum P( f ) is assumed to be strictly band-limited within the range of frequencies [−R s /2, R s /2]. As discussed in Section 3.3, the transmitted signalẼ(t, 0) is assumed to be periodic with period T, such that T s = 1/R s = T/W represents the symbol period, and R s is the symbol rate. A schematic representation of the transmitted signal is shown in Figure 3. The signalẼ(t, 0) is transmitted over N s (homogeneous) fibre spans, each of length L s and each followed by an ideal lumped optical amplifier for which the gain exactly recovers from the span losses. Since in this work we are only concerned about the prediction of NLI arising from the signal-signal nonlinear interactions along the fibre propagation, the optical noise added by the amplifier plays no role in the model and will be entirely neglected. The signal at the channel outputẼ(t, N s , L s ) is ideally compensated for accumulated chromatic dispersion in the link (see Section 4). In the frequency-domain output of the chromatic dispersion compensation (CDC) blockE( f , N s , L s ) (Figure 2), we ideally isolate the first-order RP term E 1 ( f , N s , L s ) (see Section 4) and we compute its PSD S( f , N s , L s ). The vector of the NLI powers Σ NLI [σ 2 NLI,x , σ 2 NLI,y ] T for both x and y polarisations is obtained by integrating over the frequency interval [−R s /2, R s /2] the NLI PSD weighted by the function |P( f )| 2 , where P * ( f ) is the frequency response of a matched filter (MF) for the system under consideration. As shown in Figure 2, this quantity is equivalent to the variance of the output of the MF followed by symbol-rate sampling, which more naturally arises when assessing the transmission performance of systems employing an MF at the receiver. The model in this manuscript provides an analytical relationship between the statistical features of the transmitted symbols a n and Σ NLI .

DP-4D vs. PM-2D Formats
The model presented in this paper allows for prediction of the NLI for generic 4D real modulation formats. A 4D format is defined as a set where a x and a y are the symbols modulated on two orthogonal polarisation states x and y, respectively, and M is the modulation cardinality. It can be seen that the elements in A are 2D vectors in C as opposed to 4D. This is only due the to baseband-equivalent representation of signals used throughout this paper, while it is common to refer to a modulation format dimensionality based on the real signal dimensions, which justifies the 4D format label. Two important particular cases of the formats in (2) are (i) the so-called polarisation-multiplexed 2D (PM-2D) modulation formats, which are characterised by A = X 2 , X ∈ C, where X represents the 2D component constellation, and (ii) polarisation-hybrid 2D modulation formats characterized by A = X × Y, with X , Y ∈ C, X = Y, where X and Y are two distinct component 2D formats in x and y polarisation, respectively. PM-2D formats are the most common ones in optical communications due to their generation's simplicity. Both PM-2D and polarisation-hybrid 2D formats are often analysed in terms of their 2D polarisation components. This is because A can be factorised in two component formats which are independently encoded. Hence, if the generic transmitted constellation point is regarded as a random variable, in a conventional PM-2D format, the two polarisation components are statistically independent. In the remainder of this paper, no specific assumption on either the geometry or the statistic of the transmitted 4D symbols will be made, except the zero-mean feature E{a (i) } = 0.

Transmitted Signal Form
LetẼ(t, z) =Ẽ x (t, z)i x +Ẽ y (t, z)i y be the complex envelope of the optical field vector at time t and fibre section z, and let i x , i y denote 2 orthonormal polarisations of the transversal plane of propagation.
Because of the periodicity assumption made in (1) (see Figure 3), we can writeẼ(t, 0) as where C k = [C x,k , C y,k ] T , C x/y,k are the Fourier series coefficients ofẼ(t, 0) and ∆ f = 1/T is the frequency spacing of the spectral lines in E x/y ( f , z). Hence, E( f , 0) can be then written as Since each component ofẼ(t, 0) is periodic with period T, we can writẽ where, as per assumption in (1), we havê and, W is assumed to be odd without loss of generality. Under the above assumptions, the Fourier coefficients in (3), for k ∈ Z, are given by where P( f ) F {p(t)} and are the discrete Fourier transforms of the sequence a n , n = 0, 1, . . . , W − 1. Note that the approximation in (5c)-(5d) is justified only for large enough values of T as and letting T → ∞ will be the approach taken at a later stage in this derivation. Finally, combining (4) and (5f), we obtain where the approximate equality on the right-hand side of (7) stems from the fact that p(t) is assumed to be strictly or quasi-strictly band-limited (see Section 3.1). Hence, P(k∆ f ) is effectively equal to zero for k = −W/2, −W/2 + 1, . . . , W/2.

PSD of the First-Order NLI for Periodic Transmitted Signals
To find an analytical expression for NLI power, first, a solution as explicit as possible to the Manakov equation [12] ∂Ẽ(t, z) ∂z must be found. Equation (8) describes the propagation of the optical fieldẼ(t, z) in a single strand of fibre (e.g., a fibre span with no amplifier in the system in Figure 2). In this case, α, β 2 , and γ representing the attenuation, group velocity dispersion, and nonlinearity coefficients, respectively, can be assumed to be spatially constant. As it is well-known, general closed-form solutions are not available for (8). Like most of the existing NLI power models in the literature, the model derived here operates within a first-order perturbative framework. In particular, a frequency-domain first-order regular perturbation (RP) approach in the γ coefficient is performed [13,14], i.e., the Fourier transform of the solution in (8) is expressed as x,n +ν x,k ν * x,m ν x,n ν * y,k ν y,m ν * x,n + ν y,k ν * y,m ν x,n ν * x,k ν x,m ν * x,n + ν y,k ν * y,m ν x,n ν * y,k ν y,m ν * x,n η k,m,n η * k ,m ,n , where we have defined P k,m,n,k ,m ,n P(k∆ f )P * (m∆ f )P(n∆ f )P * (k ∆ f )P(m ∆ f )P * (n ∆ f ).
The following proposition can be used to make (21b) more compact, and in particular, it will be used to group the two inner correlation terms in (21b) ( ν x,k ν * x,m ν x,n ν * y,k ν y,m ν * x,n and ν y,k ν * y,m ν x,n ν * x,k ν x,m ν * x,n ).

Proposition 1.
For P k,m,n,k ,m ,n in (22), we have Proof. See Appendix B.
Using (23) and (21b) can be written as x,m ν x,n ν * x,k ν x,m ν * x,n + ν y,k ν * y,m ν x,n ν * y,k ν y,m ν * x,n η k,m,n η * k ,m ,n + 2 Re{P k,m,n,k ,m ,n ν x,k ν * x,m ν x,n ν * y,k ν y,m ν * x,n η k,n,m η * k ,n ,m } (24) x,n +E ν y,k ν * y,m ν x,n ν * y,k ν y,m ν * x,n η k,m,n η * k ,m ,n + 2 Re{P k,m,n,k ,m ,n E ν x,k ν * x,m ν x,n ν * y,k ν y,m ν * x,n η k,n,m η * k ,n ,m }, where the real part operator arises from the sum of the complex conjugate terms discussed in the Proposition section (Section 1). According to (24), calculation of the PSD of the NLI reduces to the computation of a four-dimensional summation (per frequency component i∆ f ) of three sixth-order correlations of the sequence of random variables ν x/y,n , n = 0, 1, . . . , W − 1. The y-component S y ( f , N s , L s ) of the PSD can be calculated once S x ( f , N s , L s ) is obtained by simply swapping the polarisation labels x → y and y → x. This is due to the invariance of the Manakov equation in (8) to such a transformation.

Classification of the Modulation-Dependent Contributions in the 6th-Order Frequency-Domain Correlation
In this section, we will break down the frequency-domain sixth-order correlation terms in (24) to highlight different contributions in terms of 4D modulation-dependent cross-polarisation correlations.

Expansion in Terms of the Stochastic Moments of the Transmitted Modulation Format
To relate the PSD in (24) to the statistical properties of the transmitted modulation format, we replace (6) into (24), obtaining m,n,k ,m ,n η k,m,n η * k ,m ,n · ∑ i∈{0,1,...,W−1} 6 S i (k, m, n, k , m , n ) + 2 Re{P k,m,n,k ,m ,n η k,m,n η * k ,m ,n · ∑ i∈{0,1,...,W−1} 6 where i (i 1 , i 2 , . . . , i 6 ), and The terms S i (k, m, n, k , m , n ) and T i (k, m, n, k , m , n ) give rise to several correlations among the transmitted symbols a x,i and a y,j at different time-slots i, j, each weighted by a complex exponential. As discussed in Section 3, in this work, we operate under the assumption that the sequence of vector RVs a i for i = 0, 1, . . . , W − 1 are independent, identically distributed (i.i.d.), and with E{a i } = E{a} = 0. As shown in the following example, this assumption allows us to discard the S i (k, m, n, k , m , n ) and T i (k, m, n, k , m , n ) terms which are identically zero for some values of i. Moreover, as it will be shown in Example 2 for all other values of i, S i (k, m, n, k , m , n ), and T i (k, m, n, k , m , n ) can be expressed as a product of high-order statistical moments of the RVs a x and a y , which enables a more compact expression for (25).
The S i (k, m, n, k , m , n ) and T i (k, m, n, k , m , n ) contributions for the set in (28) are identically zero regardless of the values taken by k, m, n, k , m , and n . However, as it will be shown in Section 6, for a specific subset of values k, m, n, k , m , and n , such contributions cancel each other in the inner sums in (25) due to the complex exponential weights.
Zero contribution region It can be noted that (i) the sixth-order correlation degenerates into a product of marginal high-order moments of a x and a y and into the cross-polarisation correlation E{|a x | 2 |a y | 2 } and (ii) all elements within the set in this example contribute to the inner summation in (25) with the same set of moments, cross-polarisation correlations, and products thereof (i.e., E{|a x | 2 }, E{|a x | 4 }, E{|a y | 2 }, andE{|a x | 2 |a y | 2 }). This example illustrates how to break down each instance of the contributions S i (k, m, n, k , m , n ) and T i (k, m, n, k , m , n ), which will be then added up in Section 6.
In the remainder of this section, we first partition the six-dimensional space i ∈ {0, 1, . . . , W − 1} 6 and list all sets corresponding to nonzero elements of S i (k, m, n, k , m , n ) and T i (k, m, n, k , m , n ). As shown in Example 2, this will help highlight the contribution of a specific set in terms of high-order moments of the transmitted symbols a in (25). Then, we proceed to list all such contributions.

Set Partitioning
The six-dimensional space i ∈ {0, 1, . . . , W − 1} 6 can be partitioned in different subsets, each one uniquely defined by a partition on the set of indices (i 1 , i 2 , i 3 , i 4 , i 5 , and i 6 ). Each partition defines its corresponding subset in {0, 1, . . . , W − 1} 6 as follows: for each index partition, the indices belonging to the same subset all take the same value, whilst the indices belonging to different subsets have distinct values. This is schematically illustrated in Figure 4. For example, the subset of {0, 1, . . . , W − 1} 6 labelled by the index partition {(i 1 , This subset is shown in Figure 4 as part of L 1 . In Figure 4, the families of subsets of {0, 1, . . . , W − 1} 6 labelled L i , i = 1, 2, . . . , 4, are also highlighted. These families are characterised by subsets sharing the same cardinality of elements associated to their corresponding index partition. For example, in L 1 , all index partitions are characterised by 3 subsets, each one containing 2 indices. As shown in Example 2, this way of partitioning the set {0, 1, . . . , W − 1} 6 is useful as it separates out the different contributions of (25) based on the high-order moments of a, as it is highlighted in region L 3 of Figure 4.
Since we have 6 different indices, the number of subsets in a partition can vary from 1 to 6. Each of these subsets can contain a number of elements also ranging from 1 to 6. However, the subsets of {0, 1, . . . , W − 1} 6 , where the corresponding index partition has one or more index subsets with only one element, bring no contribution to (25) and thus can be discarded. This is illustrated in Example 1. The above class of index partitions then forms a zero contribution region, as shown in Figure 4. Such a region also includes all subsets where the corresponding index partitions contain 4 or more index subsets, as at least one of these subsets will have to contain only one element.
As shown in Figure 4, by removing the zero contribution region from {0, 1, . . . , W − 1} 6 , only 4 different families of subsets are left: This set contains all sets of elements where the indices i 1 , i 2 , . . . , i 6 , can be grouped in 3 pairs. The indices take up the same value within each pair but different values across different pairs. It can be found that this set can be partitioned in 15 different subsets C (i) 15, representing all possible distinct ways of pairing the i k indices for k = 1, 2, . . . 6. These sets are listed in Table A1 in Appendix C, where each column shows a subgroup of indices taking the same value; Table A2 in Appendix C. Each index subgroup identifies a triplet of indices assuming the same value; Table A3 in Appendix C. Each of the two index subgroups identifies the pair and the quadruple of indices assuming the same value; (iv) L 4 : {i ∈ {0, 1, . . . , W − 1} 6

Evaluation of the L-Based Contributions
In this section, we provide three examples for the computation of the contributions of a generic element in L 1 , L 2 , and L 3 . The full list of contributions in these three sets and the ones in L 4 are given in Sections 6.1-6.4.
We label each contribution as M and the subsets C we can compute (30) using the following approach: 1. we add up the terms for all i 1 , i 3 , i 5 values including all cases when i 1 , i 3 , and i 5 are equal among each other. Because of (31), these terms sum up to W 3 only when k = m + pW, n = k + pW, andm = n + pW, p ∈ Z; otherwise, they sum to 0; 2.
we subtract the terms corresponding to the cases: As an example, the number of terms defined by i 1 = i 3 , i 1 = i 5 is given by the difference between the number of all pairs i 1 , i 5 ∈ {0, 1, 2, . . . , W − 1} and the number of terms for i 1 = i 5 . According to (31), the former terms sum to W 2 only for k − m + n − k = pW, m − n = pW, whereas the latter sum to W only for k − m + n − k + m − n = pW, with p ∈ Z. In all other cases, they all bring zero contribution. Similar results are obtained for i 1 we finally subtract the terms i 1 = i 3 = i 5 , which sum to W only for k − m + n − k + m − n = pW, p ∈ Z; otherwise, they sum to 0 (see (31)).
Hence, we obtain The same approach can be followed to compute N 1 , which is, thus, given by All other contributions in L 1 can be computed using the approach used in this example.
2 , i.e., the contribution for the set C Following a similar approach as in Example 3, we compute (32) by 1. adding up the terms for all i 1 and i 4 values including all cases when i 1 and i 4 are equal to each other. These terms sum up to W 2 only when k − m + n = pW and −k + m − n = pW, with p ∈ Z; otherwise, they sum to 0; 2.
subtracting the terms corresponding to the cases i 1 = i 4 . These terms sum to W only for k − m + n − k + m − n = pW, p ∈ Z; otherwise, they sum to zero.
We, thus, obtain Following the same approach for N All other contributions in L 2 can be computed using the approach used in this example.
3 , i.e., the contribution for the values in the set C As in the L 2 case described in Example 4, in L 3 , each subset is characterized by 2 subgroups of indices. Hence, the approach followed to compute (33) is identical to (32) and gives All other contributions in L 3 can be computed using the approach used in this example.

As shown in the above examples, each contribution
g is nonzero only for a specific set of (k, m, n, k , m , n ) values which is spanned by p ∈ Z. However, the terms (k, m, n, k , m , n ) arising for all p = 0 bring a total contribution to (25) that can be considered negligible. This is due to our assumption on P( f ) being strictly band-limited (see Section 3.1) and to the magnitude of the functions product η k,m,n η * k ,m ,n (see definitions (16) and (22)). Thus, in the computations performed in the following subsections, we will restrict ourselves to the case p = 0.  Table 1.    Table 2.   Table 3.
In this section, we evaluate ∑ 4 g=1 ∑ g as well as compact the resulting expression as much as possible.
Before we proceed with computing the abovementioned summation, we remove the Kronecker g corresponding to contributions in the following subspaces: (i) k = m; (ii) n = m; iii) k = m ; and iv) n = m . These contributions correspond to so-called bias terms, i.e., they arise from a component of the field E 1 ( f , z) which is fully correlated with the transmitted field E 0 (t, 0). This component, after CDC and MF, only results in a deterministic and static complex scaling of the received constellation, which is typically compensated at the receiver even in the presence of other noise sources in the system. Thus, it does not contribute to the power of the additive zero-mean interference component that we observe at the output of the MF + sampling stage once the received constellation is synchronised (in phase and amplitude) with the transmitted one. A more detailed discussion on these bias terms can be found in [8] (Appendix A), [14] (Appendix C). Moreover, g is also removed as it only gives nonzero contribution to the PSD for frequency f = i∆ f = 0; hence, its effect on the total NLI variance vanishes as we let ∆ f → 0 (see Section 8). A total of 23 terms from the last columns of Tables 1-3 are thus removed. The remaining contributions are given in Table A4 in Appendix D.
We now compact the contributions in Table A4 by grouping the Kronecker delta products based on each correlation term they multiply. We use three pairs of curly brackets {·} to denote the terms multiplying R 3 s , R 2 s , and R s . The list of all Kronecker delta products multiplying each correlation term is shown in Table 4. The correlation terms are divided into intra-polarisation (expectations containing only a x ) and cross-polarisation terms (expectations containing both a x and a y ). Moreover, the correlations are categorised based on the specific contribution (either M or N) in (34) to which they belong.
As it can be observed in Table 4, each correlation term is associated with different delta functions. To compact these terms, we exploit a property introduced in the following proposition. Proposition 2. Let D 1 (k, m, n, k , m , n ) and D 2 (k, m, n, k , m , n ) be two Kronecker delta products of the kind shown in Table 4. If D 1 (k, m, n, k , m , n ) = D 2 (n, m, k, n , m , k ), This property also holds when applying the transformations k = n, n = k, k = n , and n = k , individually.

Proof. See Appendix E.
The property in (37) allows us to group many of the Kronecker function products in Table 4 under a single term. Namely, the Kronecker delta products in Table 4 can be grouped in subsets that are closed to property (36), since they all result in the same value of summations in (37). In particular, 14 distinct subsets can be identified for the list of Kronecker delta products in Table 4. We label these subsets, which are shown in Table 5, as D l for l = 1, 2, . . . , 14.

Intra-Polarisation Terms
In

Cross-polarisation terms
In

Correlation Terms Kronecker Delta Products
In   Proposition 3. Let D 1 (k, m, n, k , m , n ) and D 2 (k, m, n, k , m , n ) be two Kronecker delta products of the kind shown in the second column of Table 4. If D 1 (k, m, n, k , m , n ) = D 2 (k , m , n , k, m, n), then ∑ (k,m,n)∈S i (k ,m ,n )∈S i P k,m,n,k ,m ,n η k,m,n η * k ,m ,n D 1 (k, m, n, k , m , n )

Set Name Set Elements
Proof. See Appendix G. Proof. This corollary directly follows from Proposition 3 when D 1 = D 2 = D.

Final Result
Equation (40) expresses the NLI PSD for a periodic signal of period T = 1/∆ f as a function of the statistical moments and cross-polarisation correlations of a generic 4D modulation format. To generalise this result to aperiodic signals, we take the same approach in [8,17], i.e., we let the period T go to infinity or, equivalently, ∆ f → 0 (see Figure 3).
The limit of (40) for ∆ f → 0 is a limit of a distribution (a Dirac's delta comb) which is parametric in ∆ f . To rigorously evaluate such a limit, we use Lemma 1 and Theorem 2 presented in the following. In particular, Theorem 2 presents the key result of this work. Lemma 1 (Dimensionality of the sets T l,i ). The sets T l,i , for l = 1, 2, 3, for l = 4, ..., 10, and for l = 11 have dimensionalities 2-4, respectively, ∀i ∈ Z.

Proof. See Appendix H.
Theorem 2 (Limit of the distribution S x ( f , N s , L s )). For a generic aperiodic transmitted signal and a fibre transmission system like the one in Figure 2 and under the following assumptions: • i.i.d. sequence of zero-mean input DP-4D symbols a n for n ∈ Z (see Section 3.1) • rectangular (or quasi-rectangular) spectrum of the transmitted pulse p(t) (see Section 3.1) • first-order RP framework for the solution of the Manakov equation in (8) where the coefficients Φ i , i = 1, 2, 3, Ψ i , i = 1, 2, . . . , 4, Λ i , i = 1, 2, . . . , 6, and Ξ 1 as well as the integrals χ i ( f ), i = 1, 2, . . . , 11, are given in Table 8. As discussed at the end of Section 4,S y ( f ) can be obtained applying the transformation x → y, y → x to (42).
The NLI power vector Σ NLI can be obtained from the PSDs in x and y as where P( f ) is the transmitted pulse spectrum.

Proof. See Appendix I.
The results in (42) and (43) generalise the formulas presented in [9] for PM-2D modulation formats. In particular, assuming that 1. a x and a y are statistically independent, which leads to, e.g., E{a x a y } = E{a x }E{a y } = 0, 2. a x and a y are identically distributed, which, for example, leads to E{|a| 2 } E{|a x | 2 } = E{|a y | 2 }, 3.

Name Value
Correlation coefficients x }E * {a x a y }E{a * x a y }} Integrals

Discussion and Conclusions
In this paper, we have derived a comprehensive analytical expression for NLI power when a general DP-4D modulation format is transmitted. The transmitted format is only assumed to be zero-mean. The reported result extends the model in [9] by accounting for any constellation geometry and statistic in four dimensions. This extension is performed by lifting two underlying assumptions in [9] (and other existing models): (i) the transmitted formats are PM versions of a 2D format and (ii) some high-order moments of the 2D components of the transmitted modulation format, such as E{a 2 x } and E{a 3 x }, are implicitly assumed to be equal to zero. The presented results are derived in a single-channel transmission scenario. However, as it can be inferred from previous works, extending the expressions to the wavelength-division multiplexing (WDM) case does not lead to a different set of modulation-related statistical quantities in the NLI power expression. An extension of this work to the WDM transmission scenario will be addressed in a future publication.
Future work will also focus on comparing the presented model with possible heuristic extensions of existing PM-2D models to the general DP-4D case, for instance by using the 4D constellation standardised fourth-order moment (or so-called kurtosis). For such a study, a numerical validation of the model via the split-step Fourier method will certainly play a key role. Lastly, 4D constellation shaping in the optical fibre channel arguably represents the most attractive application and future research direction for the model derived in this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
The Manakov Equation (8) can be written in the frequency domain as where * denotes a modified convolution operator between a scalar function and a vector function (For a scalar function α and a vector function B = [B x , B y ] T , the operator α * B is defined here as . Expanding the nonlinear term in (A1), we have which, for instance, for the x component, becomes Expanding the first term in (A2), we obtain which by substitution f 1 − f 2 =f 2 becomes (for notation's simplicity, the integration variablef 2 is relabelled as f 2 ) Similar to the steps in (A3) and (A4), the second term in (A2) can be found as The x component in (A1) can be then rewritten as (A5) Following the first-order RP approach to finding the solution to the Manakov equation [13], we replace the x component of the first-order expansion in (9) into (A5) and equate terms with the same power of γ. After some algebra and after substituting the A n terms with the corresponding E n using (10), we find the following set of differential equations The zeroth-order term for a single fibre span of length z is given by On the other hand, the first-order term (for the x component) E 1,x ( f , z), with initial conditions given by the transmitted signal E( f , 0), can be found solving the following differential equation The solution to (A9) with initial condition E 1,x ( f , 0) = 0 is given by where (A8) was used in the step from (A10a) to (A10b). The power profile assumed in Section 3.1 for the multi-span optical link exponentially decays with a lumped amplification at the end of each span, which brings the power back to the transmitted level. This behavior leads to a discontinuity in the function α(z) across the interface where an amplifier is located. For such a power profile, we can solve the differential Equations (A6) and (A7) by exploiting the continuity of their coefficients within each span and by imposing the initial conditions at the input of each new fibre span E 0, , for l = 1, 2, ..., N s . Here, z = lL − s and z = lL + s indicate the sections at the input and at the output of the lth amplifier, respectively. Thus, we obtain that the zeroth and first-order term after N s fibre spans are given by Using (A11) in (A12) and swapping the integral in z with the double integral in d f 1 d f 2 , we obtain The y-component of the zeroth-order term E 0,y ( f , z) and first-order term E 1,y ( f , z) can be found using the transformation x → y, y → x in (A11) and (A12), respectively. Finally, bringing together the x and y components, we have where η( f , f 1 , f 2 , z) is defined in (12), which proves the theorem.
Appendix C. Partition Tables for Sets L 1 , L 2 , and L 3 Table A1. List of all subsets in L 1 : for each subset, the index subgroups identify the corresponding pairs of indices assuming the same value. Table A2. List of all subsets in L 2 : for each subset, the index subgroups identify the corresponding triplets of indices assuming the same value. Table A3. List of all subsets in L 3 : for each subset, the index subgroups identify the corresponding pair and quadruple of indices assuming the same value. The set C

Subset Label
3 corresponds to the case discussed in Example 2.
From definitions (16) and (22), it can be easily verified that P n,m,k,n ,m ,k = P k,m,n,k ,m ,n and η n,m,k = η k,m,n . Moreover, based on the definition of the set S i in (19), it can be observed that the condition (k, m, n) ∈ S i is equivalent to (n, m, k) ∈ S i , i.e., generates the same set of triplets (k, m, n). We can thus write (A15) as ∑ (k,m,n)∈S i (k ,m ,n )∈S i P k,m,n,k ,m ,n η k,m,n η * k ,m ,n D 2 (k, m, n, k , m , n ) which proves the proposition.

Appendix H. Proof of Lemma 1
To prove the statement about dimensionality of the sets T l,i , we take as an example the cases for l = 1, 2, 3. In these instances, the sets T l,i ∀i ∈ Z are identified by 5 linear constraints on the set of variables (k, m, n, k , m , n ) ∈ {0, 1, . . . , W − 1} 6 given by (i) the 2 linearly independent constraints, (k, m, n) ∈ S i and (k , m , n ) ∈ S i and (ii) the 3 linearly independent constraints induced by the condition D (l) = 1 for l = 1, 2, 3 (see Table 7). Then, let A l [a T 1 ; a T 2 ; . . . ; a T 5 ] be a 5 × 6 matrix in which the rows a k , k = 1, 2, . . . , 5, describe each of these 5 linear combinations, x [k, m, n, k , m , n ], and y i = [i, i, 0, 0, 0]. Thus, the set T l,i can be equivalently defined as T l,i = {x ∈ {0, 1, . . . , W − 1} 6 : A l x = y i }. (A18) From (A18), it can be seen that T l,i is a vector space for which the number of dimensions is given by dim{T l,i } = 6 − rank(A l ).
For l ∈ {4, 5, . . . , 10}, we have that T l,i is identified by 4 linear constraints, 2 of them related to the S i set and 2 related to the condition D (l) = 1. Furthermore, it can be seen that a 1 − a 2 = ±a 3 ± a 4 , hence leading to rank (A l ) = 3, ∀ l ∈ {4, 5, . . . , 10} and dim{T l,i } = 3. Finally, based on similar arguments, one can show that rank(A l ) = 2 for l = 11 and dim{T l,i } = 4, which proves the lemma.
which, defining [−R s /2, R s /2] 4 , proves the theorem. The second equalities in (A28) are justified by the form of the functionsP l (see Table 8) which, due to the assumption of strictly bandlimited pulses, have limited support within the hybercube [−R s /2, R s /2] t(l)−1 . To derive the explicit expressions for χ l ( f ) in Table 8, we used (A28) and the property P(− f ) = P * ( f ), which stems from the fact that p(t) is assumed to be a real-valued function (see Section 3.1).