A Neural-network-based Nonlinear Adaptive State-observer for Pressurized Water Reactors

Although there have been some severe nuclear accidents such as Three Mile Island (USA), Chernobyl (Ukraine) and Fukushima (Japan), nuclear fission energy is still a source of clean energy that can substitute for fossil fuels in a centralized way and in large amounts, with commercial availability and economic competitiveness. Since the pressurized water reactor (PWR) is the most widely used nuclear fission reactor, its safe, stable and efficient operation is meaningful to the current rebirth of the nuclear fission energy industry. Power-level regulation is an important technique which can deeply affect the operational stability and efficiency of PWRs. Compared with classical power-level controllers, advanced power-level regulators can strengthen both closed-loop stability and control performance by feeding back the internal state-variables. However, not all of the internal state-variables of a PWR can be obtained directly by measurement. To implement an advanced PWR power-level control law, it is necessary to develop a state-observer to reconstruct the unmeasurable state-variables. Since a PWR is naturally a complex nonlinear system with parameters varying with power-level, fuel burnup, xenon isotope production, control rod worth, etc., it is meaningful to design a nonlinear observer for the PWR with adaptability to system uncertainties. Motivated by this and by the strong learning capability of the multi-layer perceptron (MLP) neural network, an MLP-based nonlinear adaptive observer is given for PWRs. Based upon Lyapunov stability theory, it is proved theoretically that this newly-built observer can provide bounded and convergent state-observation. This observer is then applied to the state-observation of a special PWR, i.e., the nuclear heating reactor (NHR), and numerical simulation results not only verify its feasibility but also give the relationship between the observation performance and the observer parameters.


Introduction
The growing demand for electricity and the pollution caused by burning fossil fuels have led to a renaissance of the nuclear energy industry, even though there have been some severe accidents such as Three Mile Island (USA), Chernobyl (Ukraine) and Fukushima (Japan). Since power-level control is a crucial technique for guaranteeing the operational stability and efficiency of nuclear reactors, developing high-performance power-level regulators is meaningful for the current rebirth of the nuclear energy industry. Compared with the classical static output-feedback power-level control laws, advanced power regulation strategies have the potential to strengthen both closed-loop stability and control performance by feeding back the internal system state-variables. Due to the absence of adequate sensors, some state-variables associated with the dynamics of a nuclear reactor are not available for measurement. In order to implement the advanced power-level control strategies for stronger dynamic performance, some observation structure should be used to reconstruct the state-variables that cannot be obtained directly through measurement. In this case, the simplest solution is to utilize linear observers such as the Luenberger observer [1] and the Kalman filter [2,3]. However, the dynamic behavior of a given nuclear reactor exhibits strong nonlinearity and depends on many factors such as power-level, fuel burnup, etc. Linear observers can only provide satisfactory performance in a small neighborhood of an operating point. Thus, if large variations of the system state-variables are required, especially in the case of load following, linear observers are no longer effective, and nonlinear observers should be developed. Shtessel gave a sliding mode observer to construct a dynamic output-feedback loop with a static state-feedback sliding mode controller for regulating the power-level of the space nuclear reactor TOPAZ II [4]. Etchepareborda applied the high gain observer to design a nonlinear model predictive power-level control for a pressurized water reactor (PWR)-like research reactor [5]. Dong proposed the dissipation-based high gain filter (DHGF) for the state-observation of PWRs [6], and then applied the DHGF to build dynamic output-feedback power-level control laws [7,8]. However, the precondition of applying these nonlinear observers is knowing an accurate lumped-parameter dynamic model of the given nuclear reactor. Although some schemes have been introduced to strengthen the adaptation performance of nonlinear observers to system uncertainties, there are strong constraints on the form of the system uncertainties [9]. Therefore, more advanced schemes should be given to further improve the adaptability of nonlinear observation.
Artificial neural networks (ANNs), inspired by biological neural networks, are composed of simple processing elements called neurons, normally arranged in layers and interconnected by weighted connections. This architecture, along with a learning algorithm for adjusting the connection weights, exhibits some interesting properties such as learning, approximation and parallel distributed processing capability. The radial basis function (RBF) network and the multi-layer perceptron (MLP) network are two widely utilized ANNs. It has been proven theoretically that both the RBF [10,11] and MLP [12-14] networks can approximate a wide range of nonlinear functions to any desired degree of accuracy under certain conditions. In recent years, ANNs have also been applied to nuclear engineering, particularly to reactor control. Ku, Lee and Edwards applied the diagonal recurrent neural network (DRNN) to a nuclear reactor model to improve its temperature response, and here the DRNNs must be trained offline by a linearized reactor model and a pre-designed optimal temperature control [15]. Arab-Alibeik and Setayeshi designed a neural adaptive inverse controller for regulating the power-level of a PWR, and here the ANN was also trained offline by a reactor model [16]. These works on applying ANNs in nuclear engineering showed that the identification must be sufficiently accurate before control action is initiated. However, in practical control applications, it is desirable to have a systematic method of ensuring the stability and robustness of the overall system. In the past few years, several ANN-based control laws for nonlinear systems have been proposed based upon Lyapunov stability theory. One main advantage of these schemes is that the adaptive laws are derived by the Lyapunov synthesis method and thus can guarantee closed-loop stability. Ge et al. proposed an adaptive state-feedback control law for a large class of nonlinear systems based on the RBF network, and the regulating error was proved to converge to a small neighborhood of the origin by using Lyapunov stability theory [17]. Moreover, state-feedback control design methods based on the MLP network were also studied for nonlinear systems in Brunovsky, pure-feedback and lower-triangular forms by using Lyapunov stability theory and the techniques of feedback linearization and backstepping [18-22]. It is clear that designing a satisfactory state-observer is the precondition of implementing advanced state-feedback control laws. Since there usually exist uncertainties in the system dynamics, the design of adaptive observers based upon ANNs is another active topic. Vargas and Hemerly proposed an adaptive observer for unknown general nonlinear systems based upon both RBF networks and Lyapunov stability theory, and the adaptation laws of the weights provide bounded-error performance [23]. By the use of the adaptive bounding technique, Stepanyan and Hovakimyan gave an RBF-based adaptive observer which could provide asymptotically convergent state estimation for a class of uncertain nonlinear systems [24]. Very recently, Yang et al. also designed a stable RBF-based observer to build a model reference adaptive controller (MRAC) for an electrohydraulic system [25]. Since the MLP network is nonlinear in its parameters and can be applied to many systems with arbitrary degrees of nonlinearity and complexity, it has already been used to design adaptive observers. Abdollahi et al. gave an MLP-based observer for nonlinear systems by the Lyapunov direct method, and then applied it to the state-estimation of flexible-joint manipulators [26]. Pérez-Cruz and Poznyak gave a stable observer for estimating the precursor power and internal reactivity of a nuclear reactor by combining the MLP network and the sliding mode technique [27]. Talebi et al. designed a recurrent neural-network-based state-observer for sensor and actuator fault detection of a satellite's attitude control subsystem [28].
Since a nuclear fission reactor is by nature a complex nonlinear system with its parameters varying with time as a function of power-level, fuel burnup, xenon isotope production, control rod worth, etc., it is necessary to design nonlinear observers for nuclear reactors with adaptability to those parameter uncertainties. In this paper, a nonlinear adaptive observer is developed for PWRs by the use of the MLP network. Based upon Lyapunov stability theory, both the boundedness and the convergence of the observation error are first proved. Then, this observer is applied to the state-observation of a nuclear heating reactor (NHR), which is a special type of PWR with properties such as natural circulation and self-pressurization. Numerical simulation results not only verify the feasibility of this newly-built observer but also show the relationship between its parameters and performance.

Dynamic Model for Observer Design
The reactor model for observer design in this paper is the point kinetics model with one equivalent delayed neutron group and temperature feedback from both the fuel and coolant temperatures, which is given as follows [6-8,29]: where $n_r$ is the relative nuclear power, $c_r$ is the relative concentration of the delayed neutron precursor, $\beta$ is the fraction of delayed neutrons, $\Lambda$ is the effective prompt neutron lifetime, $\lambda$ is the decay constant of the delayed neutron precursor, $\alpha_f$ and $\alpha_c$ are respectively the temperature reactivity feedback coefficients of the fuel and the coolant, $T_f$ is the fuel temperature, $T_{cav}$ and $T_{cin}$ are respectively the average and inlet coolant temperatures of the reactor core, $T_{f,m}$ and $T_{cav,m}$ are respectively the initial equilibrium values of $T_f$ and $T_{cav}$, $\Omega$ is the heat transfer coefficient between fuel and coolant, $M$ is the mass flow rate times the heat capacity of the coolant, $P_0$ is the rated thermal power, $\rho_r$ is the reactivity induced by the control rods, $\mu_f$ is the total heat capacity of the fuel elements, $\mu_c$ is the total heat capacity of the reactor coolant, $G_r$ is the total reactivity worth of the control rods, and $z_r$ is the control input, i.e., the speed signal of the control rods. Suppose that $n_{r0}$, $c_{r0}$, $T_{f0}$, $T_{cav0}$, $T_{cin0}$ and $\rho_{r0}$ are respectively the steady values of $n_r$, $c_r$, $T_f$, $T_{cav}$, $T_{cin}$ and $\rho_r$, which satisfy: Define the deviations between the actual and the steady values of $n_r$, $c_r$, $T_f$, $T_{cav}$, $T_{cin}$ and $\rho_r$ as: Moreover, let: and: Based on Equations (1) and (2), the nonlinear state-space model for observer design can be written as: (7) where: and the bounded vector $\theta \in R^4$ denotes other modeling uncertainty.
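For orientation, a point kinetics model with one delayed neutron group and fuel and coolant temperature feedback that is consistent with the symbols defined above can be written as below. This is a sketch of the commonly used form of such a model, not a verbatim copy of Equation (1): the arrangement of terms, and in particular the factor 2 in the coolant equation (which assumes that the average coolant temperature is the mean of the inlet and outlet temperatures), are assumptions.

```latex
\begin{aligned}
\dot{n}_r &= \frac{\rho_r - \beta}{\Lambda}\, n_r + \frac{\beta}{\Lambda}\, c_r
           + \frac{\alpha_f}{\Lambda}\, n_r \left(T_f - T_{f,m}\right)
           + \frac{\alpha_c}{\Lambda}\, n_r \left(T_{cav} - T_{cav,m}\right), \\
\dot{c}_r &= \lambda \left(n_r - c_r\right), \\
\dot{T}_f &= -\frac{\Omega}{\mu_f}\left(T_f - T_{cav}\right) + \frac{P_0}{\mu_f}\, n_r, \\
\dot{T}_{cav} &= -\frac{2M + \Omega}{\mu_c}\, T_{cav} + \frac{\Omega}{\mu_c}\, T_f + \frac{2M}{\mu_c}\, T_{cin}, \\
\dot{\rho}_r &= G_r z_r .
\end{aligned}
```

In this form the first four equations carry the state deviations collected in $x \in R^4$, while the rod reactivity equation corresponds to the scalar state $\xi$.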

Approximating System Uncertainty by MLP Network
The MLP network with one hidden layer can be expressed as: $G_{MLP}(z) = W^T S(V^T z)$ (11) where $z \in R^n$ is the input vector, $V \in R^{n \times l}$ and $W \in R^{l \times n}$ are the first-to-second layer and second-to-third layer interconnection matrices respectively, $l$ is the number of neurons in the hidden layer, and: $S(V^T z) = [s(v_1^T z), \ldots, s(v_l^T z)]^T$ (12) Here, vector $v_i$ ($i = 1, \ldots, l$) is the $i$th column of interconnection matrix $V$, and activation function $s$ is chosen as the continuous and differentiable nonlinear sigmoidal function, i.e.,: It has been proved in [12] that if the node number $l$ of the hidden layer is large enough, then MLP network Equation (11) can approximate any continuous function to arbitrary accuracy on a compact set, from which we can see that there must exist proper weight matrices $W$ and $V$ such that: (14) and: where $\varepsilon$ is a bounded positive scalar, $U$ is a given positive definite matrix and vector $\sigma$ is defined by Equation (10). Usually in practical engineering, $\sigma$ is a norm-bounded system uncertainty given by Equation (10), and then there is no loss of generality in assuming that: and: where, for a matrix $A = (a_{ij}) \in R^{m \times n}$, the Frobenius norm $\|A\|_F$ is defined as: $\|A\|_F = \sqrt{\mathrm{tr}(A^T A)} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}$
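The one-hidden-layer structure described above can be sketched in a few lines of plain Python. The sketch assumes the form $W^T S(V^T z)$ implied by the stated matrix dimensions and takes the logistic function as the sigmoidal activation; both are assumptions, and the helper names `sigmoid`, `mlp_forward` and `frobenius_norm` are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic sigmoid; assumed here as the activation function s."""
    return 1.0 / (1.0 + math.exp(-v))


def mlp_forward(W, V, z):
    """Evaluate W^T S(V^T z) for V (n x l) and W (l x n_out), given as lists of rows."""
    n_in, n_hidden = len(V), len(V[0])
    # Hidden layer: s(v_i^T z), where v_i is the i-th column of V.
    hidden = [sigmoid(sum(V[k][i] * z[k] for k in range(n_in)))
              for i in range(n_hidden)]
    n_out = len(W[0])
    # Output layer: j-th entry of W^T * hidden.
    return [sum(W[i][j] * hidden[i] for i in range(n_hidden))
            for j in range(n_out)]


def frobenius_norm(A):
    """Square root of the sum of squared entries of A."""
    return math.sqrt(sum(a * a for row in A for a in row))


# Example with n = l = 4 and all-zero weights.
n, l = 4, 4
V0 = [[0.0] * l for _ in range(n)]
W0 = [[0.0] * n for _ in range(l)]
z = [0.1, -0.2, 0.3, 0.0]
out_zero = mlp_forward(W0, V0, z)
```

With all-zero weights the network output is the zero vector, so an observer initialized this way starts from the nominal model alone and any correction has to be learned online.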

Theoretic Problem Formulation
Usually, $\delta n_r$ and $\delta T_{cav}$ can be obtained directly from measurement, and the output of system Equation (7) can be defined as: (19) where: Choose the state-observer of system Equation (7) as: where $\hat{x} \in R^4$ and $\hat{\xi} \in R$ are respectively the estimates of $x$ and $\xi$, vector-valued functions $f$ and $g$ are determined by Equations (8) and (9), respectively: $\hat{W}$ and $\hat{V}$ are the weighting matrices of MLP network $\hat{G}_{MLP}$, and both $K_O$ and $k_{O\xi}$ are observer gains. Then, the theoretical problem to be solved in this paper is summarized as follows.
Problem 1. How to design observer gains $K_O$ and $k_{O\xi}$ and the learning algorithms of weighting matrices $\hat{W}$ and $\hat{V}$ of $\hat{G}_{MLP}$ so that nonlinear adaptive observer Equation (21) is bounded and convergent?
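As an illustration of the measured output, the sketch below builds an output matrix that picks the first and fourth state deviations and computes the output error that an observer of this kind feeds back. The state ordering $x = [\delta n_r, \delta c_r, \delta T_f, \delta T_{cav}]^T$ and the names `C`, `output` and `innovation` are assumptions for illustration; the matrix in Equation (20) may be arranged differently.

```python
# Assumed state ordering: [dn_r, dc_r, dT_f, dT_cav].
# The two rows pick the measurable deviations dn_r and dT_cav.
C = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def output(x):
    """Return y = C x for a 4-element state vector x."""
    return [sum(c_ij * x_j for c_ij, x_j in zip(row, x)) for row in C]


def innovation(y, x_hat):
    """Return the output error y - C x_hat that drives the observer correction."""
    return [y_i - y_hat_i for y_i, y_hat_i in zip(y, output(x_hat))]


x_true = [0.1, 0.2, 3.0, -1.5]
x_est = [0.0, 0.0, 0.0, 0.0]
y_meas = output(x_true)
err = innovation(y_meas, x_est)
```

Only the two measured components appear in the output error, so the estimates of $\delta c_r$ and $\delta T_f$ can be corrected only indirectly, through the model and the observer gains.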

Observer Design
It is clear that solving Problem 1 is equivalent to giving the tuning approach for both feedback gains $K_O$ and $k_{O\xi}$ and weighting matrices $\hat{W}$ and $\hat{V}$ of $\hat{G}_{MLP}$. In this section, this tuning approach, which provides bounded and convergent observation, is given based on Lyapunov stability theory. Before giving the main result of this paper, a useful lemma is first introduced.
Lemma 1. The approximation error of $\hat{G}_{MLP}$ to $G_{MLP}$ defined by: (24) satisfies: where: and $d_r$ is the residual term. Moreover, $d_r$ satisfies: where $c_i$ ($i = 0, 1, 2, 3$) are certain positive scalars.
Proof: It is easy to see that the Taylor expansion of $S(V^T x)$ about $\hat{V}^T \hat{x}$ can be written as: where: (31) and $O(e_r)$ denotes the sum of the high-order terms in the Taylor series expansion. Based on Equation (30), we can derive that: where: and: Then, we can clearly see from Equation (32) that Equation (25) is satisfied. Moreover, since we have assumed that activation function $s$ takes the form of Equation (13), it is clear that: and we can also derive that: From Equation (38), it is easy to check that for $\forall v \in R$: and: Based on Inequalities (39) and (40), we have: and: Moreover, from Taylor expansion Equation (30), we know that: Then, based on Assumption (17) and Inequalities (37) and (41)-(43), we have: By Equation (33), it can be seen that: Hence Inequality (29) holds. This completes the proof of Lemma 1.
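The kind of bound used in the proof can be checked numerically. The sketch below assumes the logistic sigmoid as the activation (an assumption about Equation (13)) and verifies that the residual of a first-order Taylor expansion about an estimated argument scales with the square of the argument error. The constant 0.0482 follows from the fact that the second derivative of the logistic function is bounded in magnitude by $1/(6\sqrt{3}) \approx 0.0962$; the function names are hypothetical.

```python
import math


def s(v):
    """Logistic sigmoid (assumed activation)."""
    return 1.0 / (1.0 + math.exp(-v))


def ds(v):
    """First derivative of the logistic sigmoid: s(v) * (1 - s(v))."""
    sv = s(v)
    return sv * (1.0 - sv)


def taylor_residual(a, a_hat):
    """Residual of the first-order expansion of s(a) about a_hat."""
    return s(a) - s(a_hat) - ds(a_hat) * (a - a_hat)


# Expansion points and argument errors chosen only for illustration.
expansion_points = [-2.0, -1.0, 0.0, 1.0, 2.0]
argument_errors = [-2.0, -0.5, 0.5, 2.0]

worst_ratio = max(
    abs(taylor_residual(a_hat + d, a_hat)) / d ** 2
    for a_hat in expansion_points
    for d in argument_errors
)
max_slope = max(ds(a_hat) for a_hat in expansion_points)
```

Because the activation and its derivative are bounded, the residual term can be dominated by norms of the state, the observation error and the weight error, which is the structure that the lemma relies on.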

Remark 1. From Lemma 1, the norm of residual term $d_r$ is influenced by the norms of the system state $x$, the observation error $e$ and the approximation error $\tilde{W}$ of the weighting matrix. The following Theorem 1, which is the main result of this paper, proposes the design of the nonlinear adaptive state-observer based on the MLP neural network.
Theorem 1. Consider state observer Equation (21) of PWR dynamics Equation (7), and suppose that observer gain $k_{O\xi}$ is positive and system state-vector $x$ is bounded. Let observer gain matrix $K_O$ take the form: where observer gains $k_{ON}$, $k_{OF}$ and $k_{OC}$ are all positive. Furthermore, choose the learning algorithms of weighting matrices $\hat{W}$ and $\hat{V}$ of MLP network $\hat{G}_{MLP}$ as: (48) (49) respectively, where both $\Gamma_W$ and $\Gamma_V$ are diagonal positive-definite matrices, both scalars $\delta_W$ and $\delta_V$ are positive: (50) (51) $\delta$ is a positive scalar and matrix $C$ is defined by Equation (20). Then observation errors $e$ and $e_\xi$ defined by: $e = x - \hat{x}$ (53) and: are convergent and bounded.

Further, since: (64) and: (65) from Equation (63), we have:

(66)
Substituting Inequality (66) into Equation (62): where: (68) (69) (71) Then, differentiating $V_e$ along the trajectory given by observation error dynamics Equation (55): From Inequality (72), if we choose the learning algorithms of the weighting matrices as Equations (48) and (49), then it is clear that: where: (74) and: Here, scalars $\delta_W$ and $\delta_V$ should be chosen so that both $\upsilon_W$ and $\upsilon_V$ are positive.
Based on the assumption about the boundedness of system state $x$ and Inequalities (15)-(17) and (29), it is clear from Inequality (73) that the observation errors $e$ and $e_\xi$ are convergent and bounded. This completes the proof of Theorem 1.
Remark 2. The MLP-based nonlinear adaptive observer determined by Equations (21) and (47)-(49) does not need any matching condition on the system uncertainty. However, the existing adaptive observers for nuclear reactors, such as the observer presented in [9], need some matching condition on the system uncertainty. This means that the neural observer given in this paper is able to deal with general bounded system uncertainties, which is the key advanced feature of this novel neural observer design technique. Moreover, from Equations (21), (48) and (49), $\hat{x}$, $\hat{W}$ and $\hat{V}$ are updated simultaneously. If the number of neurons in the hidden layer is not large, the simultaneous updating of state-estimate $\hat{x}$ and weighting matrices $\hat{W}$ and $\hat{V}$ does not affect the real-time performance of the algorithm.

Simulation Results with Discussions
To verify the feasibility of this newly-built neural observer, in this section it is applied to the state-observation of an NHR, which is a small PWR developed by the Institute of Nuclear and New Energy Technology (INET) at Tsinghua University. The NHR has many advanced safety features such as integrated arrangement, natural circulation at all power-levels, self-pressurization, hydraulic control rod driving, and passive residual heat removal [30-32], and it can be applied to fields such as district heating, seawater desalination and electricity production. The structure of the NHR is illustrated in Figure 1. Since the NHR dynamics have both strong nonlinearity and high uncertainty, it is very meaningful to realize adaptive state-observation for the NHR in order to implement advanced power-level controllers for higher operational performance.

Description of the Numerical Simulation
The simulation model of the NHR is composed of the point kinetics model with six delayed neutron groups and lumped dynamic models of the reactor thermal-hydraulics, primary heat exchanger, U-tube steam generator (UTSG), feedwater pump of the UTSG and the necessary pipe or volume cells [33]. The parameters of the NHR at the middle of the fuel cycle at 100% power-level are shown in Table 1. The output-feedback-dissipation power-level control strategy given in [34] is adopted here. Moreover, in this simulation, we choose $l = 4$, $k_{ON} = k_{OC} = 0.0001$, $k_{OF} = 10.0$, $k_{O\xi} = 1.0$: where both $\delta_{wv}$ and $r_p$ are given positive scalars. The initial values of interconnection matrices $\hat{W}$ and $\hat{V}$, i.e., $\hat{W}_0$ and $\hat{V}_0$, are set to $\hat{W}_0 = O$ and $\hat{V}_0 = O$, respectively.
Case A (large load increase): The load signal changes linearly from 20% to 100% in a minute.
1. $\delta_{wv} = 0.01$, and different values of $r_p$ are adopted in the simulation.
2. $r_p = 1.0$, and different values of $\delta_{wv}$ are adopted.
Case B (large load decrease): The power demand decreases linearly from 100% to 20% in a minute.
1. $\delta_{wv} = 0.01$, and different values of $r_p$ are adopted in the simulation.
2. $r_p = 1.0$, and different values of $\delta_{wv}$ are adopted.
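The two load profiles can be written as a simple ramp function. The sketch below is only an illustration of the demand signals described in Cases A and B; the function name `load_demand`, the choice of starting the ramp at $t = 0$ and the use of relative power (1.0 for 100%) are assumptions.

```python
def load_demand(t, start, end, ramp_time=60.0, t0=0.0):
    """Relative load demand at time t (s) for a linear ramp from start to end.

    The ramp begins at t0 (assumed to be 0 s here) and lasts ramp_time seconds;
    the demand is held constant before and after the ramp.
    """
    if t <= t0:
        return start
    if t >= t0 + ramp_time:
        return end
    return start + (end - start) * (t - t0) / ramp_time


def case_a(t):
    """Case A: load increase from 20% to 100% in 60 s."""
    return load_demand(t, 0.2, 1.0)


def case_b(t):
    """Case B: load decrease from 100% to 20% in 60 s."""
    return load_demand(t, 1.0, 0.2)
```

Both ramps change the demand at a rate of 80% of rated power per minute, which is why the two cases are treated as stressed operating conditions.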

Simulation Results
In this numerical simulation, the following two case studies are carried out to show the state-observing performance of the MLP-based nonlinear adaptive observer determined by Equations (21) and (47)-(49).

Large Load Increase
This verification represents a hard operation for the NHR. In this case, the power demand increases linearly from 20% to 100% in 60 s.
The observation errors of the variations of the relative nuclear power, the relative precursor concentration, and the average temperatures of the fuel and coolant, i.e., the observation errors of state-variables $\delta n_r$, $\delta c_r$, $\delta T_f$ and $\delta T_{cav}$, with constant $\delta_{wv}$ and different $r_p$ are all illustrated in Figure 2. Furthermore, the observation errors of these state-variables with different $\delta_{wv}$ and constant $r_p$ are shown in Figure 3.

Large Load Decrease
This case also represents a stressed operation for the NHR. The load signal changes linearly from 100% to 20% in a minute. The observation errors of state-variables $\delta n_r$, $\delta c_r$, $\delta T_f$ and $\delta T_{cav}$ with constant $\delta_{wv}$ and different $r_p$ are all shown in Figure 4, and the responses of these observation errors with different $\delta_{wv}$ and constant $r_p$ are given in Figure 5.

Discussion
During the load increase, the load rises rapidly from 20% to 100% in 60 s. Since the actual power level cannot vary so quickly, $\delta n_r$ first decreases, which indicates that the actual power level of the NHR is lower than the load set by the operator in the initial phase of the process. Due to the action of the power-level controller, $\delta n_r$ then increases and finally returns to zero. The difference in power level causes the variations of the precursor concentration and of the average temperatures of the fuel and coolant inside the reactor core. Similarly, in the case of a load decrease from 100% to 20% in a minute, the actual power level also cannot change so fast, and therefore $\delta n_r$ becomes larger, which indicates that the actual power level of the NHR is higher than the load in the initial stage. Then the power level becomes lower and lower due to the action of the power controller, and finally reaches the demanded 20% power level.
From Figures 2-5, the MLP-based state-observer developed in this paper can provide bounded and convergent state-observations. The load variation leads to the variation of the state-variables, which causes the variation of the system output. The variation of the system output then drives both the observer and the learning algorithms of the MLP connection weights to generate a convergent state-observation. It is also clear from these figures that varying the observer parameters does not change the boundedness and convergence of the state-observation. Further, from Figures 2 and 4, if positive scalar $r_p$ is larger, then the observation performance is higher. Actually, from Equation (79), if scalar $r_p$ is larger, the influence of $e_O$ on the weighting connections that correspond to the state-observation of the thermal-hydraulic loop is stronger, which leads to higher observation performance for $\delta T_{cav}$. From both Equations (55) and (57), since $e_4$, i.e., the observation error of $\delta T_{cav}$, can affect the state-observation of the neutron kinetics, higher observation performance for $\delta T_{cav}$ helps improve the observation quality of the neutron kinetics. Moreover, from Figures 3 and 5, it is easy to see that if positive scalar $\delta_{wv}$ is larger, the observation performance for $\delta T_{cav}$ is worse. However, there is a slight improvement in the observation performance for $\delta c_r$ and $\delta T_f$. Based upon the above discussion, the MLP-based nonlinear state-observer composed of Equations (21) and (47)-(49) provides both bounded and convergent observation of the system state-variables, and the parameters of this observer should be properly adjusted.
From the curves plotted in Figures 2-5, both the overshoots and the settling periods of the estimation errors of unmeasurable states $\delta c_r$ and $\delta T_f$ can be reduced to acceptable limits with properly selected scalars $r_p$ and $\delta_{wv}$, which supports the practical feasibility of this newly-built observer. Usually, $r_p$ should be larger, and $\delta_{wv}$ should be selected based upon the trade-off between the observation performance for $\delta T_{cav}$ and that for $\delta c_r$ and $\delta T_f$. Moreover, in comparison with the sliding mode observer [4], the high gain observer [5] and the DHGF [6], the main virtue of the MLP-based nonlinear observer proposed in this paper is its high adaptation capability to system uncertainties. That is to say, this new observer has an adaptation performance that the other observers for nuclear reactors do not have.
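The overshoot and settling period of an estimation-error trace can be quantified as sketched below. The definitions used (peak absolute error, and the first sample time after which the error stays inside a tolerance band), the band width and the synthetic decaying trace are illustrative assumptions, not values or data from the simulations reported here.

```python
import math


def peak_abs_error(errors):
    """Largest absolute value of the estimation error."""
    return max(abs(e) for e in errors)


def settling_time(times, errors, band):
    """First sample time after which |error| stays within the band.

    Returns None if the error is still outside the band at the last sample.
    """
    last_outside = None
    for t, e in zip(times, errors):
        if abs(e) > band:
            last_outside = t
    if last_outside is None:
        return times[0]
    later = [t for t in times if t > last_outside]
    return later[0] if later else None


# Synthetic, purely illustrative error trace: a decaying oscillation.
times = [0.1 * k for k in range(601)]
errors = [math.exp(-0.1 * t) * math.sin(t) for t in times]

peak = peak_abs_error(errors)
t_settle = settling_time(times, errors, band=0.05)
```

Computing these two numbers for each pair of $r_p$ and $\delta_{wv}$ would turn the visual comparison of the figures into a table that makes the trade-off between the measured and unmeasured states explicit.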
Finally, due to the wide use of advanced digital control system platforms, there is no difficulty in realizing the MLP-based observer presented in this paper. Furthermore, since mature MLP network programs already exist, it is easy for engineers to implement both observer Equation (21) and learning algorithms Equations (48) and (49) as software running on a digital platform.

Conclusions
Power-level control is an important technique that guarantees the operational stability and efficiency of the pressurized water reactor, which is the most widely utilized nuclear fission reactor. Compared with classical static output-feedback power-level control, advanced power-level regulators have the potential to improve closed-loop stability and dynamic performance by feeding back the internal state-variables. However, since not all of these internal states can be measured directly, it is necessary to develop state-observers to reconstruct the unmeasurable state-variables for the implementation of advanced power-level controllers. It is well known that each PWR is naturally a complex nonlinear dynamic system with parameters varying with the power-level, fuel burnup, xenon isotope production, control rod worth, etc., which leads to the necessity of designing a nonlinear observer for the PWR with adaptability to the system uncertainties. Motivated by this, an MLP-based nonlinear adaptive observer is proposed for the PWR. Based upon Lyapunov stability theory, it is proved theoretically that this new observer can provide bounded and convergent state-observation. Numerical simulation results not only verify its feasibility, but also show the relationship between observation performance and tuning parameters.

Table 1. NHR parameters at the middle of the fuel cycle at 100% power-level.