Non-Modal Three-Dimensional Optimal Perturbation Growth in Thermally Stratiﬁed Mixing Layers

: A non-modal transient disturbances growth in a stably stratiﬁed mixing layer ﬂow is studied numerically. The model accounts for a density gradient within a shear region, implying a heavier layer at the bottom. Numerical analysis of non-modal stability is followed by a full three-dimensional direct numerical simulation (DNS) with the optimally perturbed base ﬂow. It is found that the transient growth of two-dimensional disturbances diminishes with the strengthening of stratiﬁcation, while three-dimensional disturbances cause signiﬁcant non-modal growth, even for a strong, stable stratiﬁcation. This non-modal growth is governed mainly by the Holmboe modes and does not necessarily weaken with the increase of the Richardson number. The optimal perturbation consists of two waves traveling in opposite directions. Compared to the two-dimensional transient growth, the three-dimensional growth is found to be larger, taking place at shorter times. The nonmodal growth is observed in linearly stable regimes and, in slightly linearly supercritical regimes, is steeper than that deﬁned by the most unstable eigenmode. The DNS analysis conﬁrms the presence of the structures determined by the transient growth analysis. with the optimally perturbed base ﬂow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.


Introduction
The mixing layer flow is the simplest configuration allowing for the well-known Kelvin-Helmholtz (or KH) instability that takes place when two parallel flows having different velocities meet. The instability develops as a wave (or KH mode) in the shear layer separating two uniform flows. Studies of this phenomenon started in the early works of Lord Kelvin [1] and Strutt and Lord Rayleigh [2]. The phenomenon appears to be so complicated and to pose so many questions that in spite of hundreds of studies published, it remains the subject of many current researches [3][4][5][6]. For the details, the reader is referred to review papers [7][8][9].
The isothermal mixing layer is known to be linearly unstable either in the inviscid limit or starting from relatively low Reynolds numbers within the Newtonian viscous model [10]. It is also known that the flow remains linearly stable for the streamwise dimensionless wavenumber larger than unity [10,11]. At the same time, studies [12,13] showed that temporal non-modal disturbances growth can take place in the isothermal inviscid and viscous mixing layer flows, respectively. A similar mechanism was discovered recently in round jets [5].
The classical result on the linear stability of an inviscid mixing layer flow stably stratified by density (or temperature) stays that the flow becomes linearly stable for the gradient Richardson number exceeding 0.25 [10][11][12][13][14][15]. Later studies showed that along with the monotonic Kelvin-Helmholtz instability, an oscillatory instability is also possible [16]. The latter is the Holmboe instability that develops as two waves traveling in opposite

Formulation of the Problem and Numerical Techniques
We consider a flow of an incompressible Newtonian fluid in a thermally stratified mixing layer. After the Boussinesq approximation is applied, the flow is governed by the momentum, continuity, and energy equations ∂u ∂t + (u·∇)u = − 1 ρ ∇p + ν∆u + Fluids 2021, 6, x FOR PEER REVIEW 2 of 17 latter is the Holmboe instability that develops as two waves traveling in opposite directions [10,17]. Furthermore, numerical calculations [18][19][20] supported by experiments of [21] showed that a three-dimensional mode traveling at an angle to the base velocity may attain the largest growth rate. For example, a shear layer characterized by a relatively thin region of stably stratified fluid can exhibit Holmboe instability [16,21,22], whose growth rate increases with increasing stratification. Despite the valid Squire transformation proved in [23], the flow is predicted to be linearly unstable to a three-dimensional disturbance.
It was shown in [10] that the instability at small Richardson numbers is governed by two monotonic KH modes, one of which can become unstable. With further increase of the Richardson number, the pair of monotonic modes are replaced by a pair corresponding to complex conjugated eigenvalues related to the Holmboe instability. It was shown recently that KH instability plays an important role in the formation of interfaces between coating and substrate materials and metallic materials [24,25]. The non-linear evolution of KH and Holmboe instabilities was studied in quite a large number of articles (e.g., [17,[26][27][28][29][30][31][32][33]). A difference in the nonlinear disturbances growth compared with the predictions of linear stability theory was reported in [31]. Surprisingly, the issue of non-modal growth of disturbances at short times was addressed only by [14] for non-stratified mixing layers and by [32] for a stratified inviscid model. The study [32] described the non-modal amplification of the stratified mixing layer in terms of the interaction of gravity wave and vortex energy of the optimal perturbation for large wavenumbers. It was shown that the amplified transient growth can be three-dimensional and is associated with wave generation due to a vertical motion at large wavenumbers.
This study extends the previous results of [12] and [29] to viscous thermally stratified mixing layers. We show that the non-modal growth in this flow is governed mainly by the Holmboe-type modes. In the following, we examine their dynamics and interaction via consideration of a fully nonlinear three-dimensional model.
The paper is organized in the following way. In Section 2, we outline the fundamental details of the instability analysis and give a formulation of the optimal growth methodology, including the derivation of an adjoint operator. Section 3 discusses the implementation of different techniques for the calculation of the largest possible non-modal growth. Section 4 presents non-modal instability properties as a function of the Richardson number and evolution of the optimal disturbances within three-dimensional models. The summary and conclusions are derived in Section 5.

Formulation of the Problem and Numerical Techniques
We consider a flow of an incompressible Newtonian fluid in a thermally stratified mixing layer. After the Boussinesq approximation is applied, the flow is governed by the momentum, continuity, and energy equations Here, = ( , , ) is the velocity with components in the streamwise ( ), spanwise ( ) and vertical ( ) directions; and are the pressure and the temperature, is the density, ν is the kinematic viscosity, ℊ is gravitational acceleration, is the thermal expansion acceleration, is the unit vector in the -direction (vertical direction), ̅ is the mean temperature, which is defined below, is the thermal diffusivity, and ∆ denotes the Laplacian operator.
Here, u = u x , u y , u z is the velocity with components in the streamwise (x), spanwise (y) and vertical (z) directions; p and T are the pressure and the temperature, ρ is the density, ν is the kinematic viscosity, IEW 2 of 17 latter is the Holmboe instability that develops as two waves traveling in opposite directions [10,17]. Furthermore, numerical calculations [18][19][20] supported by experiments of [21] showed that a three-dimensional mode traveling at an angle to the base velocity may attain the largest growth rate. For example, a shear layer characterized by a relatively thin region of stably stratified fluid can exhibit Holmboe instability [16,21,22], whose growth rate increases with increasing stratification. Despite the valid Squire transformation proved in [23], the flow is predicted to be linearly unstable to a three-dimensional disturbance.
It was shown in [10] that the instability at small Richardson numbers is governed by two monotonic KH modes, one of which can become unstable. With further increase of the Richardson number, the pair of monotonic modes are replaced by a pair corresponding to complex conjugated eigenvalues related to the Holmboe instability. It was shown recently that KH instability plays an important role in the formation of interfaces between coating and substrate materials and metallic materials [24,25]. The non-linear evolution of KH and Holmboe instabilities was studied in quite a large number of articles (e.g., [17,[26][27][28][29][30][31][32][33]). A difference in the nonlinear disturbances growth compared with the predictions of linear stability theory was reported in [31]. Surprisingly, the issue of non-modal growth of disturbances at short times was addressed only by [14] for non-stratified mixing layers and by [32] for a stratified inviscid model. The study [32] described the non-modal amplification of the stratified mixing layer in terms of the interaction of gravity wave and vortex energy of the optimal perturbation for large wavenumbers. It was shown that the amplified transient growth can be three-dimensional and is associated with wave generation due to a vertical motion at large wavenumbers.
This study extends the previous results of [12] and [29] to viscous thermally stratified mixing layers. We show that the non-modal growth in this flow is governed mainly by the Holmboe-type modes. In the following, we examine their dynamics and interaction via consideration of a fully nonlinear three-dimensional model.
The paper is organized in the following way. In Section 2, we outline the fundamental details of the instability analysis and give a formulation of the optimal growth methodology, including the derivation of an adjoint operator. Section 3 discusses the implementation of different techniques for the calculation of the largest possible non-modal growth. Section 4 presents non-modal instability properties as a function of the Richardson number and evolution of the optimal disturbances within three-dimensional models. The summary and conclusions are derived in Section 5.

Formulation of the Problem and Numerical Techniques
We consider a flow of an incompressible Newtonian fluid in a thermally stratified mixing layer. After the Boussinesq approximation is applied, the flow is governed by the momentum, continuity, and energy equations where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally = ( v, θ), we arrive at the dimensionless linearized equations for the velocity, v, and temperature, θ, perturbations: Equations (3a-3c) are rendered dimensionless using the scales δ v , U max , δ v /U max , ρU 2 max , and (T 2 − T 1 ) for length, velocity, time, pressure, and temperature, respectively. The Reynolds number is defined by Re = U max δ v /ν, Pr = ν/κ is the Prandtl number, Pe = RePr = U max δ v /κ is the Péclet number, Ri = γg(T 2 − T 1 )δ v /U 2 max is the bulk Richardson number, and δ = δ v /δ T is the velocity and temperature thicknesses ratio.
For the modal analysis, we assume an exponential dependence of the perturbations on time and apply the Fourier integral expansion in the infinite xand y-direction, i.e., FOR PEER REVIEW 5 of 17 where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields ( , , ) = max where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields ( , , ) = max where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, Subsequently, the linearized continuity and momentum equations can be rearranged into a system of Orr-Sommerfeld and linearized energy equations (see, e.g., [28]) that read λ Fluids 2021, 6, x FOR PEER REVIEW where * denotes the complex conjugate. In the following, the inner product of tw plex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or grow tion, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling tha where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) max = is the largest eigenvalue that corresponds to the maximum possible growth attainable at the time τ. The eigenvector corresponding to the eigenval the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two indepen proaches. The first one is the Gram matrix factorization, followed by the singular v composition (SVD) method [41,42]. The energy norm (7) where * denotes the complex conjugate. In the following, the inner pro plex vectors and associates with the energy norm (7), (8) and is g So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = tion, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and reca (ℒ ) (0) = ( ) (0), yields ( , , ) = max where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The op is self-adjoint and normal, its eigenvalues are real and non-negative, an max = is the largest eigenvalue that corresponds to the maximum growth attainable at the time τ. The eigenvector corresponding to the the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied tw proaches. The first one is the Gram matrix factorization, followed by the composition (SVD) method [41,42]. The energy norm (7) of an arbitrary can be connected with the Euclidean norm of the vector ( ) defined b decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows Here, λ is a complex time increment, α and β are real wavenumbers in the x− and y− directions respectively, L is the linearized generalized operator, L OS is the Orr-Sommerfeld operator that determines the evolution of perturbation of the vertical velocity component w; L Tw and L wθ are operators coupling the velocity perturbation w with temperature perturbations θ; L Sq θ is the Squire operator that determines the evolution of perturbation of temperature θ; T z and U zz are the first and second derivatives of temperature and velocity profiles (Equation (2)), respectively. Note that the Laplacian operator reduces to ∆ = ∂ 2 ∂z 2 − α 2 + β 2 , and ∆ −1 denotes the inverse Laplacian operator.  (10), where * ( ) is the adjoint of the ope is self-adjoint and normal, its eigenv max = is the largest eigenvalu growth attainable at the time τ. The e the disturbance which yields this ma To examine possible non-modal proaches. The first one is the Gram m composition (SVD) method [41,42]. T can be connected with the Euclidean decomposition of the matrix : ‖ ‖ is the mass matrix, p is th which is the complex-valued Hermi definite Gram matrix = , the m as the value of the 2-norm of the m maximal singular value. This method on a separated part of the spectrum. F the spectrum should be included in trum, we tried the approach of [13] an inside and are near the mixing zone the analysis only those modes whose a given from the eigenvalues of the The second approach is the iter random initial perturbation. The evo followed by the integration backwar derived in the next section. This app the spectrum and results in the growt optimal initial vector [43].
To validate the results, we appl perturbed base flow as the initial con that involves forward/backward time

Adjoint form of the Orr-Sommerfel
. The existence of an eigenvalue λ having a positive real part is a sufficient condition for linear instability. Therefore, the linear stability problem reduces to finding a Reynolds number at which the real part of the leading eigenvalue (i.e., the eigenvalue with the largest real part) changes its sign from negative to positive and then minimizing Re over all possible values of Ri, α, and β. An extended analysis of Squire's theorem [27,30] demonstrated that a three-dimensional unstable mode can become most unstable in strongly stratified viscous flow.

Non-Modal Analysis
Considering the non-modal disturbances growth, we are interested in a short-time behavior of small finite-amplitude perturbation governed by the linearized equations. We exploit the homogeneity of the xand y-direction by assuming solutions in the form FOR PEER REVIEW 5 of 17 where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields ( , , ) = max where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields ( , , ) = max where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally (z, t) exp(iαx + iβy). It can be shown that by applying the continuity equation and excluding the pressure, an arbitrary three-dimensional infinitesimal perturbation is described by where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding = u x , u y , u z , θ . Furthermore, the two velocity components u x and u y can be replaced by the vertical vorticity component η = ∂u y /∂x − ∂u x /∂y. Thus, consideration of the perturbation vector (w, η, θ) T where u z = w is the vertical velocity component yields an equivalent formulation. The resulting linearized dynamical matrix operator L is expressed as ∂ Fluids 2021, 6, x FOR PEER REVIEW where * denotes the complex conjugate. In the following, the inner product o plex vectors and associates with the energy norm (7), (8) and is given b is the energy amplification at time = , or gr tion, is defined as ( [38][39][40]) (10), with assuming (0) = 1, and recalling t where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * max = is the largest eigenvalue that corresponds to the maximum possi growth attainable at the time τ. The eigenvector corresponding to the eigenv the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two indep proaches. The first one is the Gram matrix factorization, followed by the singula composition (SVD) method [41,42]. The energy norm (7) of an arbitrary pertur can be connected with the Euclidean norm of the vector ( ) defined by the decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gr which is the complex-valued Hermitian matrix. Then, using factorization of t definite Gram matrix = , the maximum energy growth at time can be as the value of the 2-norm of the matrix ( ) −1 and is equal to the sq maximal singular value. This method allows for the calculation of non-modal gr on a separated part of the spectrum. Following the arguments of [13], only a disc the spectrum should be included in the analysis. To separate the discrete part trum, we tried the approach of [13] and extracted the eigenvectors that have finite inside and are near the mixing zone and vanish far from it. Alternatively, we the analysis only those modes whose pseudospectrum is located at a distance s a given from the eigenvalues of the linearized problem.
where * denotes the complex conjugate. In the following, the inner pro plex vectors and associates with the energy norm (7), (8) and is So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = tion, is defined as ( [38][39][40]) (10), with assuming (0) = 1, and reca where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The op is self-adjoint and normal, its eigenvalues are real and non-negative, an max = is the largest eigenvalue that corresponds to the maximum growth attainable at the time τ. The eigenvector corresponding to the the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied tw proaches. The first one is the Gram matrix factorization, followed by the composition (SVD) method [41,42]. The energy norm (7) of an arbitrary can be connected with the Euclidean norm of the vector ( ) defined b decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is which is the complex-valued Hermitian matrix. Then, using factorizati definite Gram matrix = , the maximum energy growth at time as the value of the 2-norm of the matrix ( ) −1 and is equal to maximal singular value. This method allows for the calculation of non-m on a separated part of the spectrum. Following the arguments of [13], onl the spectrum should be included in the analysis. To separate the discret trum, we tried the approach of [13] and extracted the eigenvectors that hav inside and are near the mixing zone and vanish far from it. Alternative the analysis only those modes whose pseudospectrum is located at a dis In this case, the coupling operator L ηw and the Squire operator governing the vorticity perturbation L Sq are added to the equations system (4a-4f). The non-modal analysis examines the maximum energy growth at a particular time period of the flow disturbance, where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows (0), over all possible initial conditions in terms of energy norm, produced by a sum of kinetic and potential energies, and assuming that the temperature perturbation alters the latter. The squared energy norm is represented by the inner product (to be defined below): where * denotes the complex conjugate. In the following, the inner product o plex vectors and associates with the energy norm (7), (8) and is given b is the energy amplification at time = , or gr tion, is defined as ( [38][39][40]) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * max = is the largest eigenvalue that corresponds to the maximum poss growth attainable at the time τ. The eigenvector corresponding to the eigen the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two inde proaches. The first one is the Gram matrix factorization, followed by the singul composition (SVD) method [41,42]. The energy norm (7) of an arbitrary pertur . The is self-adjoint and normal, its eigenvalues are real and non-negative, max = is the largest eigenvalue that corresponds to the maxim growth attainable at the time τ. The eigenvector corresponding to the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied proaches. The first one is the Gram matrix factorization, followed by th composition (SVD) method [41,42]. The energy norm (7) of an arbitra where * denotes the complex conjugate. In the following, the inne plex vectors and associates with the energy norm (7) . T is self-adjoint and normal, its eigenvalues are real and non-negativ max = is the largest eigenvalue that corresponds to the max growth attainable at the time τ. The eigenvector corresponding the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applie proaches. The first one is the Gram matrix factorization, followed by composition (SVD) method [41,42]. The energy norm (7) of an arbit For fixed wavenumbers, the integrals in xand ycoordinates scale out of the problem, and the energy norm in terms of w, η, and θ becomes where the coefficient C is a positive definite scalar. The choice of C will not change the norm qualitatively [34,35]. It was shown in [36] that the perturbation energy is associated with the Brunt-Väisäla frequency and can be expressed in terms of the local Richardson number Ri l = Ri T z (see also [37]). Following this result, we define C = Ri l (z = 0) = Ri T z (z=0) . It is easy to see that the above energy norm is produced by the inner product ids 2021, 6, x FOR PEER REVIEW 5 of 17 where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
j associates with the energy norm (7), (8) and is given by , x FOR PEER REVIEW 5 of 17 where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the
where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) = (ℒ ) (0) = ( ) (0), yields where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results. where * denotes the complex conjugate. In the following, the inner produ plex vectors and associates with the energy norm (7), (8)  where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The oper is self-adjoint and normal, its eigenvalues are real and non-negative, and max = is the largest eigenvalue that corresponds to the maximum p growth attainable at the time τ. The eigenvector corresponding to the ei the disturbance which yields this maximal growth value.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
To examine possible non-modal disturbance growth, we applied two i proaches. The first one is the Gram matrix factorization, followed by the sin composition (SVD) method [41,42]. The energy norm (7)  where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is th which is the complex-valued Hermitian matrix. Then, using factorization definite Gram matrix = , the maximum energy growth at time ca as the value of the 2-norm of the matrix ( ) −1 and is equal to t maximal singular value. This method allows for the calculation of non-mod on a separated part of the spectrum. Following the arguments of [13], only a the spectrum should be included in the analysis. To separate the discrete p trum, we tried the approach of [13] and extracted the eigenvectors that have inside and are near the mixing zone and vanish far from it. Alternatively, the analysis only those modes whose pseudospectrum is located at a distan a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integra random initial perturbation. The evolution forward is governed by the op followed by the integration backward governed by the adjoint operator, derived in the next section. This approach involves both discrete and con the spectrum and results in the growth function value at a given time and th optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS wit perturbed base flow as the initial condition. This validation shows that the s that involves forward/backward time integrations yields the correct results Substitution for E(τ) in (10), with assuming E(0) = 1, and recalling that where is the mass matrix, p is the which is the complex-valued Hermit definite Gram matrix = , the m as the value of the 2-norm of the ma maximal singular value. This method on a separated part of the spectrum. Fo the spectrum should be included in th trum, we tried the approach of [13] and inside and are near the mixing zone a the analysis only those modes whose a given from the eigenvalues of the The second approach is the itera random initial perturbation. The evol followed by the integration backward derived in the next section. This appr the spectrum and results in the growth optimal initial vector [43].
To validate the results, we apply perturbed base flow as the initial cond where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10), with assuming (0) = 1, and recalling that ( ) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally (0), yields where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by So that ( , , ) = ‖ ‖ 2 is the energy amplification at time = , or growth function, is defined as ( [38][39][40]) Substitution for ( ) in (10) where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given from the eigenvalues of the linearized problem.
The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator ℒ and is followed by the integration backward governed by the adjoint operator, ℒ * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results. (11) where Φ * (τ) is the adjoint of the operator Φ(τ) (see Section 2.2). The operator Φ * (τ)Φ(τ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ||Φ * (τ)Φ(τ)|| = max i σ i = σ m is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector where * denotes the complex conjugate. In the following, the inner pr plex vectors and associates with the energy norm (7), (8)  where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The o is self-adjoint and normal, its eigenvalues are real and non-negative, a max = is the largest eigenvalue that corresponds to the maximum growth attainable at the time τ. The eigenvector corresponding to th the disturbance which yields this maximal growth value.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
To examine possible non-modal disturbance growth, we applied tw proaches. The first one is the Gram matrix factorization, followed by the composition (SVD) method [41,42]. The energy norm (7) of an arbitrary can be connected with the Euclidean norm of the vector ( ) defined decomposition of the matrix : ‖ ‖ where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is which is the complex-valued Hermitian matrix. Then, using factorizat definite Gram matrix = , the maximum energy growth at time as the value of the 2-norm of the matrix ( ) −1 and is equal t maximal singular value. This method allows for the calculation of non-m on a separated part of the spectrum. Following the arguments of [13], on the spectrum should be included in the analysis. To separate the discre trum, we tried the approach of [13] and extracted the eigenvectors that ha inside and are near the mixing zone and vanish far from it. Alternative the analysis only those modes whose pseudospectrum is located at a dis a given from the eigenvalues of the linearized problem. m corresponding to the eigenvalue σ m is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation q(t) can be connected with the Euclidean norm of the vector k(t) defined by the eigenvector decomposition of the matrix L: ||q|| 2 2 = ∑ i,j k * i p * j , p i k j , as follows where M is the mass matrix, p is the eigenvector, and S = p, Mp is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix S = F H F, the maximum energy growth at time τ can be calculated as the value of the 2-norm of the matrix Fexp(Λτ)F −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of the spectrum should be included in the analysis. To separate the discrete part of the spectrum, we tried the approach of [13] and extracted the eigenvectors that have finite amplitude inside and are near the mixing zone and vanish far from it. Alternatively, we included in the analysis only those modes whose pseudospectrum is located at a distance smaller than a given ε from the eigenvalues of the linearized problem. The second approach is the iterative forward/backward time integration applying a random initial perturbation. The evolution forward is governed by the operator L and is followed by the integration backward governed by the adjoint operator, L * . The latter is derived in the next section. This approach involves both discrete and continuous parts of the spectrum and results in the growth function value at a given time and the corresponding optimal initial vector [43].
To validate the results, we apply the full three-dimensional DNS with the optimally perturbed base flow as the initial condition. This validation shows that the second approach that involves forward/backward time integrations yields the correct results.

Adjoint form of the Orr-Sommerfeld, Squire, and Energy Equations
We derive the adjoint operator considering the derivation of the adjoint form of the first-and second-order spatial derivatives and the first-order time derivatives. As noted by [44], the direct and adjoint parabolic problems have opposite directions of stable timelike evolution. From the definition (5a,b) and using (9) the adjoint operator L * can be obtained as follows: Fluids 2021, 6, x FOR PEER REVIEW 6 of 17 We derive the adjoint operator considering the derivation of the adjoint form of the first-and second-order spatial derivatives and the first-order time derivatives. As noted by [44], the direct and adjoint parabolic problems have opposite directions of stable timelike evolution. From the definition (5a,b) and using (9) the adjoint operator ℒ * can be obtained as follows: Then, the adjoint system may be obtained by the integration by parts and reads where ∆ 2 = 2 2 + 2 2 is the 2D Laplacian and is the first derivative of the velocity profile (Equation (2)). Evidently, the adjoint operator ℒ * of the adjoint system * ⁄ = −(ℒ ) * in terms of a periodical in the stream-and span-wise direction perturbation will be expressed as where the entries of the matrices are Finally, in accordance with Equation (11), the time integration forward/backward iterative procedure will result in the optimal initial vector, which maximizes growth at the specified time and a given set of parameters ( , , , , ) [43,45].

Calculations Techniques
For the non-modal analysis, the linear system (Equation (5)) was discretized via a second-order central finite difference method. The eigenvalues and eigenvectors of matrix ℒ were calculated using the QR algorithm. The verification of the method and test calculation for the isothermal mixing layer can be found in [13]. Table 1 presents an example of the convergence of four leading eigenvalues belonging to the discrete part of the spectrum for a non-isothermal case. It is shown that the use of 1000 grid points yields four converged decimal digits for the first mode; however, the convergence slows down for the next modes. Despite this, it is shown that the growth function calculated by the iterative integration method converges to the value of 21.71 already for 700 grid points. (13) Then, the adjoint system may be obtained by the integration by parts and reads where ∆ 2D = ∂ 2 ∂x 2 + ∂ 2 ∂y 2 is the 2D Laplacian and U z is the first derivative of the velocity profile (Equation (2)). Evidently, the adjoint operator L * of the adjoint system ∂ where * ( ) is the adjoint of the opera is self-adjoint and normal, its eigenval max = is the largest eigenvalue t growth attainable at the time τ. The eige the disturbance which yields this maxim To examine possible non-modal d proaches. The first one is the Gram ma composition (SVD) method [41,42]. The can be connected with the Euclidean n decomposition of the matrix : ‖ ‖ 2 2 = ‖ ‖ 2 = * = where is the mass matrix, p is the which is the complex-valued Hermitia definite Gram matrix = , the ma as the value of the 2-norm of the mat maximal singular value. This method a on a separated part of the spectrum. Fo the spectrum should be included in th * /∂t = − L , 6, x FOR PEER REVIEW 5 of 17 where * denotes the complex conjugate. In the following, the inner product of two complex vectors and associates with the energy norm (7), (8) and is given by where * ( ) is the adjoint of the operator ( ) (see Section 2.2). The operator * ( ) ( ) is self-adjoint and normal, its eigenvalues are real and non-negative, and ‖ * ( ) ( )‖ = max = is the largest eigenvalue that corresponds to the maximum possible relative growth attainable at the time τ. The eigenvector corresponding to the eigenvalue is the disturbance which yields this maximal growth value.
To examine possible non-modal disturbance growth, we applied two independent approaches. The first one is the Gram matrix factorization, followed by the singular value decomposition (SVD) method [41,42]. The energy norm (7) of an arbitrary perturbation ( ) can be connected with the Euclidean norm of the vector ( ) defined by the eigenvector decomposition of the matrix : ‖ ‖ 2 2 = ∑ * 〈 * , 〉 , , as follows where is the mass matrix, p is the eigenvector, and = 〈 , 〉 is the Gram matrix, which is the complex-valued Hermitian matrix. Then, using factorization of the positive definite Gram matrix = , the maximum energy growth at time can be calculated as the value of the 2-norm of the matrix ( ) −1 and is equal to the square of its maximal singular value. This method allows for the calculation of non-modal growth based on a separated part of the spectrum. Following the arguments of [13], only a discrete part of * in terms of a periodical in the stream-and span-wise direction perturbation will be expressed as where the entries of the matrices are Finally, in accordance with Equation (11), the time integration forward/backward iterative procedure will result in the optimal initial vector, which maximizes growth at the specified time and a given set of parameters (α, β, Re, Ri, Pe) [43,45].

Calculations Techniques
For the non-modal analysis, the linear system (Equation (5)) was discretized via a second-order central finite difference method. The eigenvalues and eigenvectors of matrix L were calculated using the QR algorithm. The verification of the method and test calculation for the isothermal mixing layer can be found in [13]. Table 1 presents an example of the convergence of four leading eigenvalues belonging to the discrete part of the spectrum for a non-isothermal case. It is shown that the use of 1000 grid points yields four converged decimal digits for the first mode; however, the convergence slows down for the next modes. Despite this, it is shown that the growth function calculated by the iterative integration method converges to the value of 21.71 already for 700 grid points. The calculated spectrum was examined via analysis of its pseudospectrum [46]. The ε-pseudospectrum was computed as the minimal singular value of the matrix (λI − L), as was proposed in [47][48][49]. If u s is the singular vector of the matrix corresponding to the minimal singular value, then ε = ||(λI − L)u s || 2 estimates the accuracy of the calculation of the eigenvalue λ. It was found that for the stratified mixing layer configuration, the relatively large part of the eigenvalues corresponds to a relatively large ε-pseudospectrum (ε = 10 −2 , 10 −3 ), consequently, the eigenvalues calculation accuracy can be even more demanding than in the isothermal case. Results of [13] show that only the eigenmodes with the pseudospectrum ε < 10 −6 contributed to the optimal growth in isothermal flow. This is the main limitation of applying the SVD-based method [41] here. A series of tests show that we cannot use the SVD-based method based on a full spectrum for the same reasons as in the case of isothermal flow [10], i.e., we again have to separate part of the spectrum that corresponds to a non-accurate replication of the continuous modes. Figure 1 shows that the growth functions calculated based on extracting the vectors vanishing outside the shear layer ( Figure 1a) and based on extracting the vectors corresponding different ε values (Figure 1b) do not yield converged consistent results within the SVD-based method. It is emphasized, however, that despite the obvious scatter in the results shown in Figure 1, they do show that significant non-modal growth can be expected even in cases of a strong, stable stratification. It also supports the results of [32], indicating that optimal perturbation can be located outside the shear layer.
We succeeded in obtaining reliable converged quantitative results using the iterative integration method, which, similarly to the isothermal case [13], converges to within three decimal places already on the 700 nodes grid. This method is used for the computation of the results presented below. It should be noted that this method, compared to the SVD-based one, requires much longer computational runs. To validate the non-modal analysis results, the direct numerical simulations taking the optimal vector as an initial condition were carried out. The fully non-linear time-dependent problem was solved using the approach described in [13].
growth functions calculated based on extracting the vectors vanishing outside the shear layer ( Figure 1a) and based on extracting the vectors corresponding different values (Figure 1b) do not yield converged consistent results within the SVD-based method. It is emphasized, however, that despite the obvious scatter in the results shown in Figure 1, they do show that significant non-modal growth can be expected even in cases of a strong, stable stratification. It also supports the results of [32], indicating that optimal perturbation can be located outside the shear layer.

Results
In the following, we describe results on non-modal growth in the stratified mixing layer for Pr = 9, Re = 1000, and R δ = √ Pr = 3, assuming the fluid properties close to water, where heat diffuses much slower than momentum. The test calculations (Table 2) show that the least unstable eigenmode (or KH mode) is only slightly affected by the increase of the Prandtl number. The results of [48] also show that for a bounded channel flow, heat diffusivity does not affect linear stability. Conversely, the eigenvalues corresponding to the oscillatory Holmboe modes change noticeably with the increase of the Prandtl number, followed by a narrowing of the layer where the temperature varies.

Non-Modal Instability as a Function of the Richardson Number
In this section, we present a characteristic case of non-modal energy growth for varying Ri and fixed α, β, Pe, and R δ . Results for the target time t = 10 are summarized in Table 3 (2D case) and Table 4 (3D case). It is seen from Table 2 that the flow is unstable to 2D perturbations (α = 0.5, β = 0, Re = 1000) for Ri ≤ 0.8. At small Richardson numbers (Ri < 0.1), the monotonic KH mode grows similarly to the isothermal case. With the increase of Ri, the two KH modes turn into a pair of Holmboe modes [10]. Further increase of the Richardson number, Ri > 0.8, makes the flow linearly stable; however, the non-modal growth remains possible. Although all the eigenvalues are negative, the Holmboe modes remain to be the least stable ones. An examination of the growth function, G, at a relatively short time, t = 10, shows that the highest amplification occurs at a small Richardson number, and the growth function decreases with the increase of Ri. Note that non-modal amplification reaches its value, for example, 76.4, for Ri = 0.1, while amplification of the least unstable mode at t = 10 is much smaller, approximately 1.4. This result may be important for the choice of the initial vector for non-linear calculation where the least unstable mode is usually taken as the initial condition. Based on the foregoing results, we argue that at small times the non-modal growth is expected to dominate, i.e., the optimal vector will grow faster than just one least unstable eigenmode. With the steepening stratification, i.e., increase of the Richardson number, a contribution of two-dimensional disturbances into transient non-modal growth diminishes. Table 3. The variations of the leading eigenvalues and values of amplification function, G, (at an arbitrary time t = 10) with increasing Richardson number for α = 0.5, β = 0, R e = 1000 (2D case). Based on the results for the isothermal mixing layer [13], where the largest amplification is yielded by 3D optimal disturbance, we examine a particular parameter set, α = 0.5, β = 1, Re = 1000, and vary only the Richardson number. These parameters characterize the perturbation that attains a large non-modal growth in the case of isothermal flow. Table 4 shows that the leading mode belongs to the continuous spectrum, Imag (λ) = ±α, so that the leading monotone (λ = 0, KH) and oscillatory (λ = 0, Holmboe) modes are presented in additional columns. It is seen that the oscillating Holmboe modes attain larger growth rates than the monotone ones. Computations of the growth function G (t = 10), using the adjoint forward/backward time integration technique, show large non-modal growth of linearly stable 3D disturbances for Ri < 0.01. The growth function decreases with the increase of the Richardson number for Ri ≤ 0.2. Above the value Ri = 0.3, the growth function G (t = 10) starts to increase. We observe that the value of G (t = 10) continues to increase with increasing Ri.

Richardson
It should be noted that for a relatively strong stratification, in the interval of 0.7 ≤ Ri ≤ 1, we found an additional linear instability. A similar result was reported in [19], where it was shown that stratified shear flow could be unstable to 3D disturbances propagating at an angle to the mean flow. Linear instability at large Richardson numbers for 2D perturbation was also reported in [37,50].
We emphasize that, similarly to the 2D case, the 3D non-modal growth at short times can be significantly larger than the modal one. The least unstable modes in the linearly unstable region are characterized by a minimal growth rate, making them almost neutral. For example, for Ri = 0.7 and Ri = 1, non-modal amplification reaches values 55 and 73.9, while exponential amplification of the least unstable mode at t = 10 is about unity in both cases.
Similar to the isothermal case, the values of the growth function in the 3D case are larger than in the 2D case for stratified flow so that the strongest amplification could be reached via 3D optimal perturbation.

Transient Dynamics of Optimal Perturbation
We start exploring the optimal vector evolution from the two-dimensional case, β = 0, and consider Ri = 1, for which only non-modal transient growth exists. Figure 2 shows the amplitude of the initial optimal disturbances in terms of the temperature and uand wvelocity components. We observe that amplitudes of the optimal disturbances are symmetric with respect to the mixing layer midline with the maxima of both of them shifted from the symmetry plane z = 0. These shifts indicate that the non-normal growth is governed mainly by the Holmboe modes, whose patterns are discussed in [10]. Contrary to the isothermal case [13], the optimal vector components are not localized only in the shear layer and are noticeably wider. This explains why the selection of vectors lying inside the shear layer is not justified for the stratified model considered here. Similar to the isothermal case, the values of the growth function in the 3D case are larger than in the 2D case for stratified flow so that the strongest amplification could be reached via 3D optimal perturbation.

Transient Dynamics of Optimal Perturbation
We start exploring the optimal vector evolution from the two-dimensional case, β = 0, and consider Ri = 1, for which only non-modal transient growth exists. Figure 2 shows the amplitude of the initial optimal disturbances in terms of the temperature and u-and w-velocity components. We observe that amplitudes of the optimal disturbances are symmetric with respect to the mixing layer midline with the maxima of both of them shifted from the symmetry plane z = 0. These shifts indicate that the non-normal growth is governed mainly by the Holmboe modes, whose patterns are discussed in [10]. Contrary to the isothermal case [13], the optimal vector components are not localized only in the shear layer and are noticeably wider. This explains why the selection of vectors lying inside the shear layer is not justified for the stratified model considered here. To investigate the non-modal growth mechanism, we plot spatial patterns of optimal vector time evolution. Figure 3 shows that the u-velocity component's evolution is represented by two waves traveling in the opposite direction to the mean flow. Such a behavior can be a replication of the well-known Holmboe instability mechanism [16]. To investigate the non-modal growth mechanism, we plot spatial patterns of optimal vector time evolution. Figure 3 shows that the u-velocity component's evolution is represented by two waves traveling in the opposite direction to the mean flow. Such a behavior can be a replication of the well-known Holmboe instability mechanism [16]. Following [51,52] explanations of the linear stability dynamics using the "buoyancyvorticity wave interaction approach" in an inviscid flow, we presented the −vorticity component at four indicative time moments: initial state, growth, maximum at t = 10, and further decaying (Figure 4). Note that similar to the isothermal case [13,14], results for the Following [51,52] explanations of the linear stability dynamics using the "buoyancyvorticity wave interaction approach" in an inviscid flow, we presented the η y − vorticity component at four indicative time moments: initial state, growth, maximum at t = 10, and further decaying (Figure 4). Note that similar to the isothermal case [13,14], results for the stratified layer show that the initial optimal vector structures are elongated against the shear in the streamwise direction. This allows us to attribute the initial amplification to the Orr mechanism [14,53]. With the increase of time, the perturbation develops into two series of vortices whose origins are located at the lines z = ±1, i.e., at the edges of the temperature stratification layer. At the same time, we observe the development of another vortex series with the origins located at the midline. These vortices attain larger amplitude and change their patterns from elongating against the shear to tilting with the shear. Furthermore, the maximal energy growth corresponds to the vertically positioned midline vortices generating vertical velocity. Hence, our calculations confirm the suggestion of [51,52] that vorticity growth occurs due to persistent vertical velocity produced by the propagated waves in the two strong density gradient regions. shear in the streamwise direction. This allows us to attribute the initial amplification to the Orr mechanism [14,53]. With the increase of time, the perturbation develops into two series of vortices whose origins are located at the lines z = ±1, i.e., at the edges of the temperature stratification layer. At the same time, we observe the development of another vortex series with the origins located at the midline. These vortices attain larger amplitude and change their patterns from elongating against the shear to tilting with the shear. Furthermore, the maximal energy growth corresponds to the vertically positioned midline vortices generating vertical velocity. Hence, our calculations confirm the suggestion of [51,52] that vorticity growth occurs due to persistent vertical velocity produced by the propagated waves in the two strong density gradient regions.  Figure 5 illustrates snapshots of the temperature field where the base temperature profile and the 2D optimal disturbance are superposed. The time evolution of optimal temperature disturbance θ exhibits an extension of the initial temperature layer due to two waves propagating in opposite directions along the x-axis. Such perturbation leads to elevation of the cold fluid and moving down of the warm fluid, which is driven by the vertical velocity component. During the evolution, the perturbation extracts energy from the mean flow, followed by an increase of the growth function. After passing the maximal growth stage, the cold fluid is naturally descending, which corresponds to the growth function decrease. Owing to the small values of the Holmboe modes decay rates, these up-and-down motions of the fluid persist for a relatively long time. The evolution at larger times results in merged heated structures below and above the midplane, corresponding to an increase in the amplitude of the oscillations and an increase of kinetic and potential energy. At later times, the amplitude of the non-modal disturbance decreases, as expected.  Figure 5 illustrates snapshots of the temperature field where the base temperature profile and the 2D optimal disturbance are superposed. The time evolution of optimal temperature disturbance θ exhibits an extension of the initial temperature layer due to two waves propagating in opposite directions along the x-axis. Such perturbation leads to elevation of the cold fluid and moving down of the warm fluid, which is driven by the vertical velocity component. During the evolution, the perturbation extracts energy from the mean flow, followed by an increase of the growth function. After passing the maximal growth stage, the cold fluid is naturally descending, which corresponds to the growth function decrease. Owing to the small values of the Holmboe modes decay rates, these up-and-down motions of the fluid persist for a relatively long time. The evolution at larger times results in merged heated structures below and above the midplane, corresponding to an increase in the amplitude of the oscillations and an increase of kinetic and potential energy. At later times, the amplitude of the non-modal disturbance decreases, as expected.  Figure 6 illustrates the amplitudes of the optimal vector components for Ri = 2, Re = 1000, α = 0.5, β = 1, the parameters characteristic for 3D transient growth. Similar to the 2D case, the optimal vector amplitude has two symmetric maxima, indicating the dominance of the Holmboe-like modes in the initial optimal disturbance. It is shown that the spanwise vorticity component, , has a maximum also at z = 0, supporting the suggestions of [52]. The amplitude of all components of the 3D initial optimal vector decays beyond = ±10, making it narrower than the 2D optimal vector. Growth function of the three-dimensional optimal temperature perturbation and patterns of its time evolution are illustrated in Figure 7 and Figure 8, respectively, for the case α = 0.5, β = 1, Re = 1000, Ri = 2. It is shown that the growth function value at t = 10 is similar to those obtained using the forward/backward time integration method, G = 122 (Table 4), which justifies the results obtained using two approaches. As already mentioned, the growth rates of the Holmboe modes are low so that the optimal perturbation energy Figure 5. Snapshots of the base flow superposed with 2D optimal temperature perturbation calculated for the maximal growth at t max = 10. Ri = 1, Re = 1000, α = 0.5, β = 0. The blue color corresponds to cold fluid; the red color corresponds to warm fluid. Figure 6 illustrates the amplitudes of the optimal vector components for Ri = 2, Re = 1000, α = 0.5, β = 1, the parameters characteristic for 3D transient growth. Similar to the 2D case, the optimal vector amplitude has two symmetric maxima, indicating the dominance of the Holmboe-like modes in the initial optimal disturbance. It is shown that the spanwise vorticity component, η y , has a maximum also at z = 0, supporting the suggestions of [52]. The amplitude of all components of the 3D initial optimal vector decays beyond z = ±10, making it narrower than the 2D optimal vector.  Figure 6 illustrates the amplitudes of the optimal vector components for Ri = 2, Re = 1000, α = 0.5, β = 1, the parameters characteristic for 3D transient growth. Similar to the 2D case, the optimal vector amplitude has two symmetric maxima, indicating the dominance of the Holmboe-like modes in the initial optimal disturbance. It is shown that the spanwise vorticity component, , has a maximum also at z = 0, supporting the suggestions of [52]. The amplitude of all components of the 3D initial optimal vector decays beyond = ±10, making it narrower than the 2D optimal vector. Growth function of the three-dimensional optimal temperature perturbation and patterns of its time evolution are illustrated in Figure 7 and Figure 8, respectively, for the case α = 0.5, β = 1, Re = 1000, Ri = 2. It is shown that the growth function value at t = 10 is similar to those obtained using the forward/backward time integration method, G = 122 (Table 4), which justifies the results obtained using two approaches. As already mentioned, the growth rates of the Holmboe modes are low so that the optimal perturbation energy Figure 6. Amplitudes of the initial optimal disturbance for Ri = 2, Re = 1000, α = 0.5, β = 1 yielding the maximal growth at t max = 10; (a) three components of velocity magnitude; (b) three vorticity components; (c) amplitude of the temperature disturbance.
Growth function of the three-dimensional optimal temperature perturbation and patterns of its time evolution are illustrated in Figures 7 and 8, respectively, for the case α = 0.5, β = 1, Re = 1000, Ri = 2. It is shown that the growth function value at t = 10 is similar to those obtained using the forward/backward time integration method, G = 122 (Table 4), which justifies the results obtained using two approaches. As already mentioned, the growth rates of the Holmboe modes are low so that the optimal perturbation energy slowly decays at longer times. Similar to the 2D case, the three-dimensional structures exhibit two lines of vortices ordered along with the shear layer and tending to be tilted against the shear slope at the initial stage, constantly changing inclination at later times. At each time moment, corresponding to the maximum of the growth function, the temperature layer's width increases; however, it does not exceed the width observed in the 2D case. Figure 9 shows the evolution of the streamwise velocity at the same times as in Figure 7. The optimal perturbation is composed of two waves traveling in the opposite direction. The wave's amplitude grows and decreases, reflecting in the non-monotonic growth of kinetic energy at longer times. slowly decays at longer times. Similar to the 2D case, the three-dimensional structures exhibit two lines of vortices ordered along with the shear layer and tending to be tilted against the shear slope at the initial stage, constantly changing inclination at later times. At each time moment, corresponding to the maximum of the growth function, the temperature layer's width increases; however, it does not exceed the width observed in the 2D case. Figure 9 shows the evolution of the streamwise velocity at the same times as in Figure 7. The optimal perturbation is composed of two waves traveling in the opposite direction. The wave's amplitude grows and decreases, reflecting in the non-monotonic growth of kinetic energy at longer times. Contrary to the 2D case, the spanwise vorticity component of initial optimal perturbation appears as three lines of vortices with the centers located in the middle plane and the = ±1 planes. When the growth function increases, e.g., at times t = 0, 5, 20, 35, the vortices are always oriented against the shear. The central vortex splits into two smaller vortices, moves from the midline above and below, and scrolls. At times t = 10, 25, at which the growth function attains a maximum, the vortices are strongly tilted with the shear, thus supporting the distinctions between 2D and 3D non-modal growths. Figure 7. Growth function of the 3D optimal temperature perturbation superposed with base flow profile for Ri = 2, Re = 1000, α = 0.5, β = 1. The numbers represent different stages of increase and decrease in growth function. The optimal disturbances corresponding to some of these stages are shown in Figure 8.  . Growth function of the 3D optimal temperature perturbation superposed with base flow profile for Ri = 2, Re = 1000, α = 0.5, β = 1. The numbers represent different stages of increase and decrease in growth function. The optimal disturbances corresponding to some of these stages are shown in Figure 8.
slowly decays at longer times. Similar to the 2D case, the three-dimensional structures exhibit two lines of vortices ordered along with the shear layer and tending to be tilted against the shear slope at the initial stage, constantly changing inclination at later times. At each time moment, corresponding to the maximum of the growth function, the temperature layer's width increases; however, it does not exceed the width observed in the 2D case. Figure 9 shows the evolution of the streamwise velocity at the same times as in Figure 7. The optimal perturbation is composed of two waves traveling in the opposite direction. The wave's amplitude grows and decreases, reflecting in the non-monotonic growth of kinetic energy at longer times. Contrary to the 2D case, the spanwise vorticity component of initial optimal perturbation appears as three lines of vortices with the centers located in the middle plane and the = ±1 planes. When the growth function increases, e.g., at times t = 0, 5, 20, 35, the vortices are always oriented against the shear. The central vortex splits into two smaller vortices, moves from the midline above and below, and scrolls. At times t = 10, 25, at which the growth function attains a maximum, the vortices are strongly tilted with the shear, thus supporting the distinctions between 2D and 3D non-modal growths. Figure 7. Growth function of the 3D optimal temperature perturbation superposed with base flow profile for Ri = 2, Re = 1000, α = 0.5, β = 1. The numbers represent different stages of increase and decrease in growth function. The optimal disturbances corresponding to some of these stages are shown in Figure 8.

Conclusions
The non-modal disturbances growth and transient dynamics of optimal perturbation in stratified viscous mixing layers were investigated. The flow exhibits strong transient growth for streamwise and spanwise wavenumbers, at which the flow is either asymptotically or neutrally stable. Comparing the calculated flow structures with those observed in several previous experimental and numerical studies at the parameters corresponding to linearly unstable regimes, we conclude that the mixing layer flow at early stages of the linear instability development can be strongly affected by the temporal disturbances growth when the optimal perturbation is localized inside the shear zone. The main findings of the performed study are as follows. Contrary to the 2D case, the spanwise vorticity component of initial optimal perturbation appears as three lines of vortices with the centers located in the middle plane and the z = ±1 planes. When the growth function increases, e.g., at times t = 0, 5, 20, 35, the vortices are always oriented against the shear. The central vortex splits into two smaller vortices, moves from the midline above and below, and scrolls. At times t = 10, 25, at which the growth function attains a maximum, the vortices are strongly tilted with the shear, thus supporting the distinctions between 2D and 3D non-modal growths.

Conclusions
The non-modal disturbances growth and transient dynamics of optimal perturbation in stratified viscous mixing layers were investigated. The flow exhibits strong transient growth for streamwise and spanwise wavenumbers, at which the flow is either asymptotically or neutrally stable. Comparing the calculated flow structures with those observed in several previous experimental and numerical studies at the parameters corresponding to linearly unstable regimes, we conclude that the mixing layer flow at early stages of the linear instability development can be strongly affected by the temporal disturbances growth when the optimal perturbation is localized inside the shear zone. The main findings of the performed study are as follows.

•
It is known that increasing stratification stabilizes two-dimensional perturbations, which are linearly unstable in the isothermal case. We found that with an increase of the Richardson number, these perturbations exhibit a non-modal growth at relatively short times. This non-modal growth is governed mainly by the Holmboe modes. The non-modal growth weakens and then decays with a further increase of the Richardson number.

•
We examined the effect of stratification on linearly stable three-dimensional disturbances, which were found to attain large non-modal amplifications in the stably stratified configuration. It was found that the largest amplification is reached by 3D optimal perturbations, whose growth functions are noticeably larger than those computed for the 2D stability problem. • It was shown that at short times the non-modal growth of the optimal disturbance gains a larger amplitude than the leading eigenvector growing due to the linear instability. This means that the optimal vector can be a better choice for the initial conditions applied in fully non-linear computations.
In the case of stratified flow, we could not find a clear criterion to define which part of the spectrum should be taken into account to obtain a correct and numerically converged growth function. We could not obtain comparable results using SVD decompositionbased approaches for studying the non-modal growth, trying different extractions of the (seemingly) discrete part of the spectrum. This shows that the stratified mixing layer is a significantly more complicated problem, and the question about which part of the spectrum contributes to the non-modal growth is yet to be investigated.