Q-Learning Neural Controller for Steam Generator Station in Micro Cogeneration Systems

Lalik, Krzysztof; Kozek, Mateusz; Podlasek, Szymon; Figaj, Rafał; Gut, Paweł

doi:10.3390/en14175334


by Krzysztof Lalik ^1,*,†, Mateusz Kozek ^1,†, Szymon Podlasek ^2,†, Rafał Figaj ^2 and Paweł Gut ^1

^1 Faculty of Mechanical Engineering and Robotics, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland

^2 Faculty of Energy and Fuels, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Energies 2021, 14(17), 5334; https://doi.org/10.3390/en14175334

Submission received: 30 June 2021 / Revised: 13 August 2021 / Accepted: 22 August 2021 / Published: 27 August 2021

(This article belongs to the Special Issue Automation and Robotics Application in Energy Systems)


Abstract

This article presents the results of optimizing the control systems of a steam generator powered by mixtures of liquid fuels containing biofuels. The numerical model was based on the results of experimental research on steam generator operation in an open system. The numerical model is used to build control algorithms that improve performance, increase efficiency, reduce fuel consumption and increase safety over the full operating range of the steam generator and of the cogeneration system of which it is a component. In this research, the following parameters were monitored: temperature and pressure of the circulating medium, exhaust gas temperature, oxygen content in the exhaust gas and the percentage control signal of the oil burner power. Two methods of controlling the steam generator were proposed: a classical one, using a PID controller, and an advanced one, using artificial neural networks. The work shows how the model is adapted to the real system and the impact of the control algorithms on the efficiency of the combustion process. The example is intended as a basis for implementing advanced control systems in micro-, small- and medium-power cogeneration and trigeneration systems in order to improve their final efficiency and the profitability of implementation.

Keywords:

renewable energy; mCHP; modelling; neural algorithms; PID controller

1. Introduction

The problem of optimal control of nonlinear multi-input multi-output (MIMO) systems is very complex [1,2,3,4]. Currently, many academic centers and private companies are actively engaged in this topic [5,6,7,8]. Such systems are very common in both conventional and renewable energy. In [9,10,11,12], challenges and problems related to the robustness of such systems are described, together with the potential for improving their control and methods for the experimental optimization of such control. In most cases, these methods linearize the nonlinear model and attempt to tune various controllers to the linearized model. Each of these approaches requires deriving the equations of the controlled plant, which is often impossible or subject to considerable error due to parametric uncertainty.

Optimal control is also important when using renewable energy sources. The basic control problems of such systems are described in [13]. With hybrid energy harvesting sources, the variations between the daytime and nighttime energy supply from the solar part must be taken into account. A suitable controller structure is therefore required to respond optimally and to drive, for example, the appropriate frequency of the generated current. In [14,15], the basic problems of inverter systems powered by photovoltaic and wind energy conversion systems in controlled and uncontrolled architectures are described. The authors point out that in power generation the load is strongly nonlinear, and renewable sources are additionally characterized by large disturbances and unsteady operating parameters. They indicate that using classical robust controllers with a current observer in the inverter can eliminate the need for additional load sensors, which reduces the price of such systems while maintaining control quality.

The problem of regulating nonlinear multivariable systems is also observed in Building Management System (BMS) technology. The authors of [16,17,18] suggest the use of hybrid controllers (PI, PD and LQR) in the load system to control the current and voltage flow of actuators, air conditioners and an external battery charger. However, they focus mostly on the easily observable load devices and less on the nonlinear energy supply systems.

There has been extensive research on the control of multidimensional polygeneration systems [19,20,21,22,23,24,25,26]. These works analyze systems that integrate heat pumps and cogeneration units and consider optimal control strategies. They are also devoted to defining operational strategies that maximize efficiency while taking into account performance, battery energy storage capacity and electricity market opportunities. They clearly indicate the direction in which control technology in the power industry is developing, which is crucial for the performance of autonomous cogeneration systems.

There are a myriad of strategies for controlling such facilities, each with its own advantages and disadvantages. The simplest way to implement the control is through the serial and parallel PID controller structures described in [27,28]. They allow nonlinearly parameterized adaptive systems to be realized, leading to coupled PID gain adaptation and some flexibility within a narrow range of system nonlinearities.

One solution to the control problem of nonlinear multidimensional structures is the use of robust controllers. In [29,30], a method is given for controlling the temperature in an office facility using a resilient PI controller algorithm combined with the describing function method, which proves able, to some extent, to eliminate oscillations while maintaining the desired HVAC system performance. Ref. [31] details a chaotic Duffing–Holmes oscillator system in robust control for an industrial robotic manipulator. The results demonstrate the effectiveness of the algorithm for uncertain MIMO systems with external disturbance signals, with minimal tracking error and optimal control inputs. In contrast, Ref. [32] presents a robust PID controller designed using Kharitonov’s stability theorem. The controller design is based on a linearized model of a highly nonlinear Twin Rotor MIMO System (TRMS), and the control quality is shown to be satisfactory within a certain range of model parameters.

A whole class of controller solutions for such systems is based on fuzzy logic [33,34,35,36,37,38]. Fuzzy-logic-based controllers are widely used because they provide a scientific methodology for replicating human inference from imprecise and incomplete feedback information. In effect, they are a mathematical model of a control system based on heuristic human experience. With these methods, it is not necessary to build a model of the MIMO system. In addition, with appropriately chosen rules and membership functions, the plant can be controlled over the full range of its nonlinearity. However, building fuzzy-logic-based controllers for MIMO plants requires expert knowledge each time, and the controller is only as good as the set of operating scenarios that have been identified and implemented.

Neural controllers are also used for such systems [39,40,41]. Advanced control of MIMO plants using neural networks is mostly applied to intelligent buildings. For control algorithms that rely on a numerical model, a good-quality model representing the entire dynamics of the plant must be provided; only then can satisfactory performance of smart building control be guaranteed.

In [42], a number of artificial neural network algorithms have been developed, optimized and implemented for the prediction of the outdoor temperature, which has been identified as a key parameter to reduce the required degree of observability of the system and, consequently, to reduce the number of required process variable sensors.

In [43], the use of a deep Q-network (DQN) to control a MIMO system is described. It was demonstrated that the DQN can determine the control equilibrium point between different energy consumers in a building, such as chillers, pumps, and air preparation stations.

This paper presents an original neural controller for a MIMO facility using an actor-critic network. The proposed neural algorithm addresses the problems of the previously known solutions discussed above. The research concerns the Twin-Delayed Deep Deterministic Policy Gradient (TD3PG) algorithm. The TD3PG agent is an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm; it uses a reinforcement learning actor-critic structure that seeks an optimal policy maximizing the expected cumulative long-term reward. Using a model-free reinforcement learning method allows the control system to be optimized without knowledge of the multidimensional plant model. It also allows the objective function to be optimized for nonlinear plants. This paper presents results for the trained network. For safety reasons, training was performed on a numerical representation of the system, followed by final tuning using reinforcement learning on the real plant. The results were compared with control using a classical PID controller.

2. Measurement Stand

In the system under study (Figure 1), steam is generated by a steam generator fueled with different types of liquid fuels. The components are described in Table 1. The fuels used in the experiment were pure fuel oil, a 50/50 mixture of fuel oil and FAME, and a 50/50 mixture of fuel oil and rapeseed oil. The operating parameters of the steam generator were tested in an open system, with the steam vented to the atmosphere. The set temperature of the circulating medium (steam) at the outlet of the device was 130 °C, and the temperature of the medium (water) at the inlet was at the level of the ambient temperature, approximately 20 °C.

The working fluid flow through the evaporator coil of a high-speed steam generator (flow-through boiler) with an inner diameter of 26.9 mm and a nominal tube thickness of 2.8 mm was analyzed in the studied facility. The maximum constant operating parameters of the steam generator are 191 °C and 12 barG. The optimum operating parameters are 185 °C and 10 barG; the steam output for a constant flow of circulating medium at full burner power is 180 kg of steam per hour. The key parameters are the temperature and pressure of the medium at the outlet of the steam generator, and the oxygen content and temperature of the flue gas. In the studied system, a constant flow of the medium is used, and the feed pump (PO) works at a constant flow rate of 70 L per hour. The controllable element is the oil burner (PO), which consists of a liquid fuel pump and a fan coupled on a common shaft. The fuel supply is throttled by a proportional valve controlled by an analog 4–20 mA signal. This means that the fuel pump and the fan always run at the same speed, but the fuel and air flows are no longer proportional. Note that their operating parameters do not change linearly between operating points, as the steam generator was originally designed for on/off (hysteresis) operation. Therefore, classical adaptive controllers may not be able to control it adequately, which is why a self-adaptive neural controller was chosen. The test plant was controlled by an industrial PLC, and the program was implemented in the Ladder Diagram (LD) graphical language. The burner power (PO) was controlled by a 0–10 V analog signal sent from the PLC to a frequency converter. The system was equipped with a number of hardware and software protections to maximize safe operation; these protections defined the boundary conditions for modeling the plant and training the neural network. The test runs required to reach the above-mentioned operating parameters, which provided the input data for the simulation work, were carried out following the same procedure for all fuel types.
Each test consisted of starting the steam generator: heating up the combustion chamber with a constant flow of circulating medium through the coil, bringing the medium to the phase transition and evaporating it, and stabilizing the operating parameters at a specified level, which took about 360 s. After a temperature close to 130 °C (the most important operating parameter of the generator) was reached, the originally implemented PID controller with its original settings kept the temperature oscillating around the setpoint for 300 s. This was followed by a slow shutdown of the steam generator, consisting of switching off the burner, cooling the coil walls and extracting heat from the combustion chamber through the flow of circulating medium until a safe temperature was reached at the steam generator outlet. The goal was to collect measurement data for developing a self-adaptive control algorithm that takes into account not only the main parameter, the temperature of the circulating medium, but also other key measurement signals with specific priorities and weights. Ultimately, the steam generator could be fed with a mixture of liquid fuels based on plant products (bio-oil, bio-ethanol) and provide a heat source in micro- and small-scale cogeneration/trigeneration systems.

Based on the developed technology, a general decision diagram for a model control system was created (Figure 2). The state machine made it possible to determine the interdependencies between the set parameters and the measured values, which formed the basis for the baseline control algorithm using PID controllers. The scheme can be divided into four areas:

Ignition, which includes:
- switching on the circulator motor, the fuel valve, the spark igniter, and the fan motor and fuel pump (coupled on a common shaft),
- checking the status of power, flow and burner operation.
Control of the operating parameters of the unit, which includes a set of three cooperating PID controllers operating in parallel:
- temperature controller (based on the temperature in the steam manifold),
- pressure controller (based on the pressure in the steam manifold),
- lambda coefficient controller (based on the oxygen concentration in the flue gas).
Software protection system, which ensures safe operation by stopping the electric motors and closing the valves from the software level whenever operation-threatening parameters are exceeded.
Continuous operation, which keeps the system running after stable parameters have been reached until:
- selected parameters are changed,
- the user ends steam generator operation,
- an emergency situation occurs.
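The four areas above can be read as a small state machine. The following Python sketch is a hypothetical illustration, not the PLC ladder program used on the test stand: the mode names, the `next_mode` function and its boolean inputs are assumptions, and the trip limits are simply the maximum constant operating parameters quoted above (191 °C and 12 barG).

```python
from enum import Enum, auto


class Mode(Enum):
    IGNITION = auto()
    CONTROL = auto()      # three parallel PID loops bring the plant to its setpoints
    CONTINUOUS = auto()   # stable operation
    PROTECTION = auto()   # software trip: motors stopped, valves closed
    OFF = auto()


# Hypothetical trip limits, taken from the maximum constant operating parameters.
MAX_STEAM_TEMP_C = 191.0
MAX_STEAM_PRESSURE_BARG = 12.0


def next_mode(mode: Mode, steam_temp_c: float, pressure_barg: float,
              burner_ok: bool, stable: bool, setpoint_changed: bool,
              user_stop: bool) -> Mode:
    """Return the next operating mode (illustrative transition rules only)."""
    # The software protection system has priority over every other area.
    if steam_temp_c > MAX_STEAM_TEMP_C or pressure_barg > MAX_STEAM_PRESSURE_BARG:
        return Mode.PROTECTION
    # Assumption: a trip or a stop stays latched until a reset outside this function.
    if mode in (Mode.PROTECTION, Mode.OFF):
        return mode
    if user_stop:
        return Mode.OFF
    if mode is Mode.IGNITION:
        # Leave ignition once the power, flow and burner status checks pass.
        return Mode.CONTROL if burner_ok else Mode.IGNITION
    if mode is Mode.CONTROL:
        return Mode.CONTINUOUS if stable else Mode.CONTROL
    if mode is Mode.CONTINUOUS and setpoint_changed:
        # A change of selected parameters ends continuous operation.
        return Mode.CONTROL
    return mode
```

A caller would evaluate `next_mode` once per controller scan; putting the limit check first mirrors the idea that protection can interrupt any other area.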

The form of the PID controller used is shown in Equation (1).

C = K_{p} + \frac{K_{i}}{s} + \frac{K_{d} s}{T_{f} s + 1}

(1)

where:

$C$: controller transfer function,
$K_{p}$: proportional gain,
$K_{i}$: integral gain,
$K_{d}$: derivative gain,
$T_{f}$: derivative filter time constant.
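Equation (1) is written in continuous time, whereas a digital controller such as a PLC evaluates it at discrete sample times. A minimal Python sketch of one possible discretization follows, assuming backward-Euler approximations of both the integral term and the filtered derivative term; the class name `FilteredPID` and the sample time `dt` are illustrative assumptions, not the authors' PLC implementation.

```python
from dataclasses import dataclass


@dataclass
class FilteredPID:
    """Parallel PID with a first-order derivative filter, as in Equation (1)."""
    kp: float
    ki: float
    kd: float
    tf: float  # derivative filter time constant T_f
    dt: float  # sample time (assumption, e.g. 1 s)
    integral: float = 0.0
    derivative: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Integral term Ki/s, backward Euler: I_k = I_{k-1} + Ki * e_k * dt
        self.integral += self.ki * error * self.dt
        # Derivative term Kd*s/(Tf*s + 1), backward Euler:
        # D_k = (Tf * D_{k-1} + Kd * (e_k - e_{k-1})) / (Tf + dt)
        self.derivative = (self.tf * self.derivative
                           + self.kd * (error - self.prev_error)) / (self.tf + self.dt)
        self.prev_error = error
        return self.kp * error + self.integral + self.derivative
```

Because `prev_error` starts at zero, the first call produces a derivative kick; practical implementations often differentiate the measured value instead of the error and add anti-windup to the integral term.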

3. Methodology

The TD3PG agent is a reinforcement learning agent that searches for an optimal policy maximizing the expected cumulative long-term reward. During training, the agent updates the weights of the actor and the critic at each learning iteration. Past transitions are stored in an experience buffer, which can be interpreted as the experience of the actor and critic, and the agent updates the parameters of both the actor and the critic using mini-batches drawn at random from this buffer. At each training step, the actor's action is also intentionally perturbed with stochastic noise to encourage exploration. Training of the TD3PG agent starts by initializing the parameters $θ_{Q k}$ of the critic functions $Q_{k}(S, A)$ (TD3PG uses two critics, $k = 1, 2$) and the parameters $θ_{μ}$ of the deterministic actor $μ(S)$. The parameters $θ_{Q k^{'}}$ of the target critics $Q_{k}^{'}(S, A)$ and the parameters $θ_{μ^{'}}$ of the target actor $μ^{'}(S)$ are set equal to them in the first iteration, according to condition (2).

θ_{Q k^{'}} = θ_{Q k}, \qquad θ_{μ^{'}} = θ_{μ}

(2)

At each training step, an action A is selected for the current observation S by adding stochastic noise N to the output of the deterministic actor, according to Equation (3)

A = μ (S) + N

(3)
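Equation (3) can be illustrated with the short NumPy sketch below. It assumes zero-mean Gaussian exploration noise and clips the result to actuator limits, corresponding to the saturation of the control vector described in Section 4; the function names, the noise model and the normalized limits [0, 1] for the two controls are assumptions.

```python
import numpy as np


def select_action(actor, state, noise_std, a_low, a_high, rng):
    """Exploratory action A = mu(S) + N from Equation (3), saturated to actuator limits.

    actor     -- callable returning the deterministic action mu(S)
    noise_std -- standard deviation of the Gaussian noise N (assumed noise model)
    """
    action = np.asarray(actor(state), dtype=float)
    noisy = action + rng.normal(0.0, noise_std, size=action.shape)
    return np.clip(noisy, a_low, a_high)


def hypothetical_actor(state):
    # Placeholder for the trained actor network: fixed outputs for m1 (fan) and m2 (valve).
    return np.array([0.6, 0.4])


rng = np.random.default_rng(0)
a = select_action(hypothetical_actor, np.zeros(4), noise_std=0.05,
                  a_low=0.0, a_high=1.0, rng=rng)
```

Setting `noise_std` to zero recovers the deterministic policy used after training.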

After action A is applied, the reward R and the next observation $S^{'}$ are obtained. The experience, consisting of the current observation S, the action A, the reward R and the next observation $S^{'}$, is stored in the experience buffer. Once the buffer contains enough experiences, a random mini-batch of M experiences $(S_{i}, A_{i}, R_{i}, S_{i}^{'})$ is drawn at each iteration. If $S_{i}^{'}$ is a terminal state, the value function target $y_{i}$ is set to $R_{i}$. Otherwise, $y_{i}$ is computed according to (4).

y_{i} = R_{i} + γ \min_{k} Q_{k}^{'}\left(S_{i}^{'}, \mathrm{clip}\left(μ^{'}(S_{i}^{'} \mid θ_{μ^{'}}) + ϵ\right) \mid θ_{Q k^{'}}\right)

(4)

The target thus adds the experience reward $R_{i}$ to the minimum discounted future reward estimated by the target critics, where γ is the discount factor and ϵ is clipped target-policy smoothing noise. In addition, at each step, the loss function $L_{k}$ of each critic is minimized over all sampled experiences according to relation (5)

L_{k} = \frac{1}{M} \sum_{i = 1}^{M} {(y_{i} - Q_{k} (S_{i}, A_{i} | θ_{Q k}))}^{2}

(5)
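A vectorized NumPy sketch of the critic targets (4), including the terminal-state case, and of the loss (5) is given below. It follows the standard TD3 formulation with clipped Gaussian target-policy smoothing; the callables standing in for the target actor and the two target critics, and all numeric values in the example, are hypothetical.

```python
import numpy as np


def td3_targets(rewards, next_states, terminal, target_actor, target_critics,
                gamma, smooth_std, smooth_clip, a_low, a_high, rng):
    """Value-function targets y_i from Equation (4) for a mini-batch of size M."""
    next_actions = target_actor(next_states)                      # mu'(S'_i)
    eps = np.clip(rng.normal(0.0, smooth_std, size=next_actions.shape),
                  -smooth_clip, smooth_clip)                      # clipped smoothing noise
    next_actions = np.clip(next_actions + eps, a_low, a_high)
    # Minimum over the target critics (clipped double Q-learning).
    q_next = np.min(np.stack([q(next_states, next_actions) for q in target_critics]), axis=0)
    # y_i = R_i for a terminal S'_i, otherwise R_i + gamma * min_k Q'_k(...)
    return rewards + gamma * (1.0 - terminal) * q_next


def critic_loss(targets, q_values):
    """Mean squared error L_k from Equation (5)."""
    return float(np.mean((targets - q_values) ** 2))


rng = np.random.default_rng(0)
y = td3_targets(
    rewards=np.array([1.0, 1.0]),
    next_states=np.zeros((2, 4)),        # two sampled next observations S'_i
    terminal=np.array([0.0, 1.0]),       # the second sample ends the episode
    target_actor=lambda s: np.zeros((len(s), 2)),
    target_critics=[lambda s, a: np.full(len(s), 3.0),
                    lambda s, a: np.full(len(s), 5.0)],
    gamma=0.9, smooth_std=0.2, smooth_clip=0.5, a_low=0.0, a_high=1.0, rng=rng)
```

Here the first target is 1 + 0.9 × min(3, 5) = 3.7 and the second, terminal one is just its reward, 1.0.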

At fixed iteration intervals (delayed updates), the actor parameters are updated using the sampled policy gradient of the expected return J from the initial state distribution, according to relation (6).

\nabla_{θ_{μ}} J \approx \frac{1}{M} \sum_{i = 1}^{M} G_{a i} G_{μ i}

(6)

where $G_{a i}$ is the gradient of the minimum critic output with respect to the action A computed by the actor network, and $G_{μ i}$ is the gradient of the actor output with respect to the current actor parameters $θ_{μ}$. Both gradients are evaluated for observation $S_{i}$ and are defined by relations (7)

G_{a i} = \nabla_{A} \min_{k} Q_{k}(S_{i}, A \mid θ_{Q k}) \Big|_{A = μ(S_{i} \mid θ_{μ})}

(7)

G_{μ i} = \nabla_{θ_{μ}} μ(S_{i} \mid θ_{μ})

At a specified iteration interval, the parameters of the target actor and critics are also updated by soft averaging with the smoothing factor τ, according to relations (8)

θ_{Q k^{'}} = τ θ_{Q k} + (1 - τ) θ_{Q k^{'}}

(8)

θ_{μ^{'}} = τ θ_{μ} + (1 - τ) θ_{μ^{'}}
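The delayed updates and the soft target update (8) can be sketched as follows, with network parameters held in dictionaries of NumPy arrays. The dictionary representation, the function names and the delay values are assumptions made for illustration; the actor gradient step itself is omitted.

```python
import numpy as np


def soft_update(target_params, online_params, tau):
    """Equation (8): theta' <- tau * theta + (1 - tau) * theta', updated in place."""
    for name, theta in online_params.items():
        target_params[name] = tau * theta + (1.0 - tau) * target_params[name]


def updates_due(step, actor_delay=2, target_delay=2):
    """Critics learn every step; the actor and the targets only every few steps."""
    return step % actor_delay == 0, step % target_delay == 0


online = {"w": np.array([1.0, 2.0])}
target = {name: w.copy() for name, w in online.items()}  # Equation (2)
online["w"] = online["w"] + 1.0  # stand-in for gradient updates of the online network
for step in range(1, 5):
    update_actor, update_targets = updates_due(step)  # actor step not shown here
    if update_targets:
        soft_update(target, online, tau=0.1)
```

With τ = 0.1 the target parameters move only 10% of the way toward the online parameters at each update, which keeps the targets in (4) slowly varying.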

The main advantage of the neural controller approach over approaches based on classical differential-equation models is that no plant model is needed. The TD3PG agent allows model-free control to be implemented. In classical controller synthesis, it can be difficult or impossible to identify the system parameters. In the Q-learning approach, the network is trained over the widest possible spectrum of nonlinearities, which makes the controller nonlinear and model-free and also provides better control quality over a wider range of controlled-system parameters.

The actions A selected by the agent are applied directly as controls to the plant. The task of the TD3PG agent is therefore to calculate the optimal controls $m_{1}$ and $m_{2}$ for the plant (CHP). A schematic of the control system is shown in Figure 3. The signals in this schematic are x (setpoint), e (static deviation), m (controls) and y (plant outputs). The signals x, e and y are indexed as follows: 1 for $O_{2}$ concentration in the flue gas, 2 for steam pressure, 3 for flue gas temperature and 4 for steam temperature. Control signal $m_{1}$ drives the fan, while $m_{2}$ controls the fuel-flow throttling valve.

Q-learning networks are well described in the literature, and the structure of the TD3PG agent is also well known. The novelty proposed in this paper lies in the structure of the actor neural network and the design of a new reward function. The reward function optimizes the control solution with particular emphasis on the two parameters most important to the operator: steam pressure and temperature. The novelty is thus a modified reward function that determines the key state variables for the TD3PG agent.
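The reward formula itself is not given in this section. Purely as an illustration of a penalty-type reward that emphasizes steam pressure and temperature, the sketch below uses a negative weighted sum of squared deviations; the quadratic form and the weight values are assumptions, not the authors' reward function.

```python
import numpy as np

# Signal order as in Figure 3: 1 O2 concentration, 2 steam pressure,
# 3 flue gas temperature, 4 steam temperature.
# Hypothetical weights: steam pressure and temperature dominate the penalty.
WEIGHTS = np.array([0.1, 1.0, 0.1, 1.0])


def reward(setpoints, outputs, weights=WEIGHTS):
    """Negative weighted sum of squared deviations e = x - y (illustrative)."""
    e = np.asarray(setpoints, dtype=float) - np.asarray(outputs, dtype=float)
    return -float(np.sum(weights * e ** 2))
```

Because the four signals have different units (%, bar, °C), a practical version would normalize each deviation before weighting it.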

To train the agent, a model of the test bench was created. The identification experiment was conducted in an open system. The control values of the supply fan and the fuel flow throttling valve were set directly. The system response, in the form of four process variables ($O_{2}$ concentration in the exhaust gas measured with a lambda probe, steam pressure, exhaust gas temperature and steam temperature), was recorded together with the control signals at a frequency of 1 Hz. A parametric black-box model was created from these measurements and used for offline training of the TD3PG agent.
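The structure of the parametric black-box model is not specified here. One common choice is an ARX model fitted by least squares, one multi-input single-output model per measured variable; the sketch below shows that technique under this assumption, with synthetic data standing in for the 1 Hz measurements. The function names and model orders are hypothetical.

```python
import numpy as np


def arx_regressors(y, u, na, nb):
    """Regressors for y(k) = sum_i a_i y(k-i) + sum_j sum_c b_jc u_c(k-j)."""
    n0 = max(na, nb)
    rows = []
    for k in range(n0, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j, c] for j in range(1, nb + 1) for c in range(u.shape[1])]
        rows.append(past_y + past_u)
    return np.array(rows), y[n0:]


def fit_arx(y, u, na=1, nb=1):
    """Least-squares estimate [a_1..a_na, b_11..b_(nb,nu)] for one output."""
    phi, target = arx_regressors(y, u, na, nb)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return theta


# Synthetic example: two inputs (fan m1, valve m2) and one output sampled at 1 Hz.
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=(300, 2))
y = np.zeros(300)
for k in range(1, 300):
    y[k] = 0.8 * y[k - 1] + 0.3 * u[k - 1, 0] + 0.5 * u[k - 1, 1]
theta = fit_arx(y, u, na=1, nb=1)
```

On noise-free synthetic data the estimate recovers the generating coefficients; with real measurements, model order selection and validation on separate data would be needed.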

4. Results

For the controller, the structure shown in Figure 4 is used as the proposed TD3PG agent. A vector of control deviations (1) is fed to an observation generator (2) and a reward function (3). The reward function evaluates the current observation against the action computed in the previous iteration by the actor-critic block (4). Based on the current observation (2) and the current reward (3), the decision block (4) derives a new action. The inference block can be stopped at any time by block (6). After saturation is applied to the computed action (to reflect the physical limits of the signal magnitudes), the control vector (5) is obtained.
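One pass through the structure of Figure 4 at run time, without exploration noise, might look like the sketch below. The split into functions, the use of the previous action as part of the observation and the termination rule are assumptions for illustration; only the sequence of deviation vector, observation, actor output, saturation and optional stop follows the description above.

```python
import numpy as np


def build_observation(setpoints, outputs, prev_action):
    """Observation generator (2): deviations e = x - y plus the previous action."""
    e = np.asarray(setpoints, dtype=float) - np.asarray(outputs, dtype=float)
    return np.concatenate([e, np.asarray(prev_action, dtype=float)])


def control_step(actor, setpoints, outputs, prev_action, a_low, a_high, trip_limits):
    """Return the saturated control vector (5) and a stop flag for block (6)."""
    obs = build_observation(setpoints, outputs, prev_action)
    action = np.clip(np.asarray(actor(obs), dtype=float), a_low, a_high)
    stop = bool(np.any(np.asarray(outputs, dtype=float) > np.asarray(trip_limits, dtype=float)))
    return action, stop


def toy_actor(obs):
    # Hypothetical stand-in for the trained actor network.
    return 0.5 * obs[:2]


# Hypothetical limits: only steam pressure (barG) and steam temperature (°C) can trip.
limits = [np.inf, 12.0, np.inf, 191.0]
action, stop = control_step(toy_actor, [5.0, 10.0, 200.0, 130.0],
                            [4.0, 9.5, 180.0, 125.0], [0.5, 0.5], 0.0, 1.0, limits)
```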

During training, negative weights were introduced for the reward coefficients, so the agent that incurred the least penalty is considered the agent with the best structure. Figure 5 shows the reward in each learning iteration of the neural network. Blue shows the current reward of the actor, and orange shows the average reward of the actor. The critic’s evaluation of the actor’s performance, i.e., the expected value of the critic’s reward, is shown in yellow. Two aspects can be seen in the first iterations: an inexperienced actor makes large mistakes, which results in a large penalty (low reward), and the critic, neutral at iteration 0, evaluates the actor increasingly poorly. From approximately iteration 250 onward, the actor’s and critic’s rewards stabilize. The actor makes fewer mistakes because of its acquired experience, and the critic no longer changes its evaluation significantly. It is worth noting that each time the actor’s quality decreases, the critic’s reward decreases, which causes the actor’s performance to improve, and vice versa. However, these reactions are shifted in time relative to each other.

To evaluate the quality of the neural controller and compare it with classical control systems, two criteria were defined. The first was the static deviation $e_{i}$, and the second was the integral of the squared deviation signal, $I_{L}$, defined by Equation (9). The criterion $I_{L}$ is particularly important, since its physical interpretation is the energy lost in the control process. Minimizing it therefore allows different control strategies to be compared in terms of efficiency, which is directly relevant to control optimality in thermal systems.

I_{L} = \int_{0}^{\infty} {[e_{i} (t)]}^{2} d t

(9)
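Because the measurements are sampled (at 1 Hz in the identification experiment), Equation (9) has to be evaluated numerically over a finite horizon. A minimal sketch using the trapezoidal rule is given below, together with the relative reduction of $I_{L}$ that can be used to compare two controllers; the function names and example numbers are illustrative only.

```python
import numpy as np


def ise(deviation, dt=1.0):
    """Finite-horizon approximation of Equation (9) by the trapezoidal rule."""
    e2 = np.asarray(deviation, dtype=float) ** 2
    return float(np.sum(0.5 * (e2[1:] + e2[:-1]) * dt))


def relative_reduction(i_reference, i_candidate):
    """Percentage by which the candidate controller lowers I_L relative to the reference."""
    return 100.0 * (i_reference - i_candidate) / i_reference
```

For example, a constant deviation of 2 sampled three times at 1 s intervals gives `ise([2, 2, 2]) == 8.0`.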

Figure 6a presents the control results for the top 15 agents of the TD3PG network. The performance of the best 5 agents is presented in Table 2. Agent67 is the best agent: it reaches zero static deviation as early as the 8th minute of operation. The other agents can respond faster, but their negative static deviation indicates that the process has overshot. For comparison, Figure 6b shows the static deviation for a PID controller tuned by the critical gain method with the Pessen rule. It reaches the setpoint without overshoot only at about the 25th minute. For the cases shown in Figure 7a,b, the difference in energy loss in favor of the system with the neural controller is 29.1%. This illustrates the energy gain that can be achieved with the neural controller.
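For reference, the commonly tabulated Pessen Integral Rule sets Kp = 0.7 Ku, Ti = 0.4 Tu and Td = 0.15 Tu from the ultimate (critical) gain Ku and the period Tu of the sustained oscillation. The sketch below converts these settings to the parallel gains of Equation (1); the critical gain and period of the test stand are not reported here, so the values in the example are hypothetical.

```python
def pessen_pid(ku, tu):
    """Pessen Integral Rule converted to parallel-form gains (Kp, Ki, Kd) of Equation (1)."""
    kp = 0.7 * ku
    ti = 0.4 * tu   # integral time
    td = 0.15 * tu  # derivative time
    return kp, kp / ti, kp * td


kp, ki, kd = pessen_pid(ku=10.0, tu=4.0)  # hypothetical critical gain and period
```

The derivative filter constant T_f is not part of the rule and has to be chosen separately, typically as a small fraction of Td.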

5. Conclusions

MIMO systems are very complex. For a plant to be considered MIMO, there must be cross-couplings between its inputs and outputs. Designing a controller for a MIMO plant is generally more difficult and time-consuming than for a single-input single-output plant because of the significant complexity of the mathematical description. The complexity of the control problem for a MIMO plant depends strongly on how strong the interactions between the input and output variables are. Describing the model and synthesizing the controller require expert knowledge in each case and often result in an under-tuned controller. Some plants also have large parametric uncertainty or are unobservable. This paper presents the application of neural networks to the optimal control of a MIMO system in the form of a biofuel cogenerator.

A reinforcement learning neural network based on the TD3PG agent is a structured control method with which system performance is expected to exceed that achieved with classical controllers. In this paper, an actor-critic structure is used to design an optimal neural controller.

The added value of the proposed solution is as follows:

self-optimization of plant control,
the possibility of shaping controller operation through an intuitive selection of network rewards,
the possibility of reaching a convergent control solution without the need for controller synthesis.

The TD3PG agent was used to control an existing cogenerator. The control quality was compared with the existing control system based on PID controllers. The only intervention in the reference system was the re-tuning of the PID controllers using gradient methods. In no case did the classical control system achieve higher control quality than the selected TD3PG agents.

The proposed solution reaches the setpoint of the process variable faster. It should also be noted that it reduces energy losses already at the control stage. This is crucial because, although not all losses can be eliminated, in process automation they should be minimized by the control program of the MIMO device.

Further work will investigate the robustness of the actor-critic algorithm to changes in plant parameters caused by disturbances and non-stationarity, as well as the possibility of training or retraining the controller during online operation.

Author Contributions

Conceptualization, K.L.; methodology, M.K. and K.L.; software, M.K.; validation, S.P.; formal analysis, K.L.; investigation, R.F. and P.G.; resources, S.P. and M.K.; data curation, M.K. and K.L.; writing—original draft preparation, K.L.; writing—review and editing, K.L. and M.K. and P.G.; visualization, K.L.; supervision, K.L.; project administration, K.L.; All authors have read and agreed to the published version of the manuscript.

Funding

This paper was written under a grant entitled “A neural optimization algorithm for tuning regulators of MIMO multidimensional systems” Agreement No. 22/GRANT/2021-IDUB.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MIMO	Multi-Input Multi-Output
BMS	Building Management System
PID	Proportional–Integral–Derivative controller
LQR	Linear–Quadratic Regulator
HVAC	Heating, Ventilation, Air Conditioning
TRMS	Twin Rotor MIMO System
DQN	Deep Q-Network
DDPG	Deep Deterministic Policy Gradient
TD3PG	Twin-Delayed Deep Deterministic Policy Gradient
FAME	Fatty Acid Methyl Esters

References

De Jesús Rubio, J.; Lughofer, E.; Pieper, J.; Cruz, P.; Martinez, D.I.; Ochoa, G.; Islas, M.A.; Garcia, E. Adapting H-infinity controller for the desired reference tracking of the sphere position in the maglev process. Inf. Sci. 2021, 569, 669–686. [Google Scholar] [CrossRef]
Martinez, D.I.; Rubio, J.D.J.; Garcia, V.; Vargas, T.M.; Islas, M.A.; Pacheco, J.; Gutierrez, G.J.; Meda-Campaña, J.A.; Mujica-Vargas, D.; Aguilar-Ibañez, C. Transformed Structural Properties Method to Determine the Controllability and Observability of Robots. Appl. Sci. 2021, 11, 3082. [Google Scholar] [CrossRef]
Huang, H.; Zhou, J.; Di, Q.; Zhou, J.; Li, J. Robust neural network–based tracking control and stabilization of a wheeled mobile robot with input saturation. Int. J. Robust Nonlinear Control 2019, 29, 375–392. [Google Scholar] [CrossRef]
Yang, X.; He, H.; Liu, D.; Zhu, Y. Adaptive dynamic programming for robust neural control of unknown continuous-time non-linear systems. IET Control Theory Appl. 2017, 11, 2307–2316. [Google Scholar] [CrossRef]
Escobedo-Alva, J.O.; Garcia-Estrada, E.C.; Paramo-Carranza, L.A.; Meda-Campana, J.A.; Tapia-Herrera, R. Theoretical application of a hybrid observer on altitude tracking of quadrotor losing GPS signal. IEEE Access 2018, 6, 76900–76908. [Google Scholar] [CrossRef]
Soriano, L.A.; Zamora, E.; Vazquez-Nicolas, J.; Hernández, G.; Madrigal, J.A.B.; Balderas, D. PD Control Compensation Based on a Cascade Neural Network Applied to a Robot Manipulator. Front. Neurorobot. 2020, 14, 577749. [Google Scholar] [CrossRef]
Yen, V.T.; Nan, W.Y.; Van Cuong, P. Recurrent fuzzy wavelet neural networks based on robust adaptive sliding mode control for industrial robot manipulators. Neural Comput. Appl. 2019, 31, 6945–6958. [Google Scholar] [CrossRef]
Fu, J.; Liu, M.; Cao, X.; Li, A. Robust neural-network-based quasi-sliding-mode control for spacecraft-attitude maneuvering with prescribed performance. Aerosp. Sci. Technol. 2021, 112, 106667. [Google Scholar] [CrossRef]
Anderson, M.; Buehner, M.; Young, P.; Hittle, D.; Anderson, C.; Tu, J.; Hodgson, D. MIMO robust control for HVAC systems. IEEE Trans. Control Syst. Technol. 2008, 16, 475–483. [Google Scholar] [CrossRef]
Shen, Y.; Cai, W.J.; Li, S. Normalized decoupling control for high-dimensional MIMO processes for application in room temperature control HVAC systems. Control Eng. Pract. 2010, 18, 652–664. [Google Scholar] [CrossRef]
Abtahi, S.; Sadati, S.; Ghaffari, A. Design of sliding mode and lQR controllers for an HVAC system. Aerosp. Mech. J. 2013, 9, 1–10. [Google Scholar]
Haghighi, M.M.; Sangiovanni-Vincentelli, A.L. Modeling and Optimal Control Algorithm Design for HVAC Systems in Energy Efficient Buildings. Master’s Thesis, EECS Department, University of California, Berkeley, CA, USA, 2011.
Shankar, G.; Lakshmi, S.; Nagarjuna, N. Optimal load frequency control of hybrid renewable energy system using PSO and LQR. In Proceedings of the 2015 International Conference on Power and Advanced Control Engineering (ICPACE), Bengaluru, India, 12–14 August 2015; pp. 195–199.
Arab, N.; Vahedi, H.; Al-Haddad, K. LQR control of single-phase grid-tied PUC5 inverter with LCL filter. IEEE Trans. Ind. Electron. 2019, 67, 297–307.
Muratovich, Z.D.; Do, T.D. LQR Based SMC for Three-Phase-Inverter with LC Filter in Renewable Energy Conversion Systems. In Proceedings of the 2019 International Conference on System Science and Engineering (ICSSE), Dong Hoi, Vietnam, 20–21 July 2019; pp. 456–461.
Homod, R.Z.; Gaeid, K.S.; Dawood, S.M.; Hatami, A.; Sahari, K.S. Evaluation of energy-saving potential for optimal time response of HVAC control system in smart buildings. Appl. Energy 2020, 271, 115255.
Escobar, L.M.; Aguilar, J.; Garcés-Jiménez, A.; De Mesa, J.A.G.; Gomez-Pulido, J.M. Advanced fuzzy-logic-based context-driven control for HVAC management systems in buildings. IEEE Access 2020, 8, 16111–16126.
Amin, N.M.; Ab Ghani, M.; Jidin, A.; Othman, S.; Jano, Z. Development of e-help manual using graphical user interface (gui) for battery management system (bms) in electric vehicle. J. Adv. Manuf. Technol. (JAMT) 2019, 13, 39–50.
Kotowicz, J. Analysis of operation of the gas turbine in a poligeneration combined cycle. Arch. Thermodyn. 2013, 34, 137–159.
Labella, A.; Mestriner, D.; Procopio, R.; Delfino, F. A simplified first harmonic model for the Savona Campus Smart Polygeneration Microgrid. In Proceedings of the 2017 IEEE International Conference on Environment and Electrical Engineering and 2017 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), Milan, Italy, 6–9 June 2017; pp. 1–6.
Collazos, A.; Maréchal, F.; Gähler, C. Predictive optimal management method for the control of polygeneration systems. Comput. Chem. Eng. 2009, 33, 1584–1592.
Menon, R.P.; Paolone, M.; Maréchal, F. Study of optimal design of polygeneration systems in optimal control strategies. Energy 2013, 55, 134–141.
Karavas, C.S.; Kyriakarakos, G.; Arvanitis, K.G.; Papadakis, G. A multi-agent decentralized energy management system based on distributed intelligence for the design and control of autonomous polygeneration microgrids. Energy Convers. Manag. 2015, 103, 166–179.
Sornek, K.; Filipowicz, M.; Żołądek, M.; Kot, R.; Mikrut, M. Comparative analysis of selected thermoelectric generators operating with wood-fired stove. Energy 2019, 166, 1303–1313.
Sornek, K.; Filipowicz, M.; Rzepka, K. The development of a thermoelectric power generator dedicated to stove-fireplaces with heat accumulation systems. Energy Convers. Manag. 2016, 125, 185–193.
Bianco, V.; Szubel, M.; Matras, B.; Filipowicz, M.; Papis, K.; Podlasek, S. CFD analysis and design optimization of an air manifold for a biomass boiler. Renew. Energy 2021, 163, 2018–2028.
El Rifai, K. Nonlinearly parameterized adaptive PID control for parallel and series realizations. In Proceedings of the 2009 American Control Conference, St. Louis, MO, USA, 10–12 June 2009; pp. 5150–5155.
Lee, Y.; Skliar, M.; Lee, M. Analytical method of PID controller design for parallel cascade control. J. Process. Control 2006, 16, 809–818.
Rehrl, J.; Horn, M. Temperature control for HVAC systems based on exact linearization and model predictive control. In Proceedings of the 2011 IEEE International Conference on Control Applications (CCA), Denver, CO, USA, 28–30 September 2011; pp. 1119–1124.
Rehrl, J.; Horn, M.; Reichhartinger, M. Elimination of limit cycles in HVAC systems using the describing function method. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) Held Jointly with 2009 28th Chinese Control Conference, Shanghai, China, 15–18 December 2009; pp. 133–139.
Mahmoodabadi, M.J.; Maafi, R.A.; Taherkhorsandi, M. An optimal adaptive robust PID controller subject to fuzzy rules and sliding modes for MIMO uncertain chaotic systems. Appl. Soft Comput. 2017, 52, 1191–1199.
Pandey, S.K.; Dey, J.; Banerjee, S. Design and real-time implementation of robust PID controller for Twin Rotor MIMO System (TRMS) based on Kharitonov’s theorem. In Proceedings of the 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES), Delhi, India, 4–6 July 2016; pp. 1–6.
Dominik, I. Interval Type-2 Fuzzy Logic Control of NM70 Shape Memory Actuator. In Proceedings of the Smart Materials, Adaptive Structures and Intelligent Systems, Newport, RI, USA, 8–10 September 2014; Volume 46148.
Wang, R.; An, A.; Wen, Y.; Song, H. Study on the Influence of Parallel Fuzzy PID Control on the Regulating System of a Bulb Tubular Turbine Generator Unit. J. Electr. Eng. Technol. 2021, 16, 1403–1414.
Dominik, I. Type-2 fuzzy logic controller for position control of shape memory alloy wire actuator. J. Intell. Mater. Syst. Struct. 2016, 27, 1917–1926.
Dominik, I. Fuzzy logic control of rotational inverted pendulum. In Solid State Phenomena; Trans Tech Publ.: Baech, Switzerland, 2011; Volume 177, pp. 84–92.
Ketata, R.; De Geest, D.; Titli, A. Fuzzy controller: Design, evaluation, parallel and hierarchial combination with a pid controller. Fuzzy Sets Syst. 1995, 71, 113–129.
Homod, R.Z.; Togun, H.; Abd, H.J.; Sahari, K.S. A novel hybrid modelling structure fabricated by using Takagi-Sugeno fuzzy to forecast HVAC systems energy demand in real-time for Basra city. Sustain. Cities Soc. 2020, 56, 102091.
Kozek, M. Transfer Learning algorithm in image analysis with Augmented Reality headset for Industry 4.0 technology. In Proceedings of the 2020 International Conference Mechatronic Systems and Materials (MSM), Bialystok, Poland, 1–3 July 2020; pp. 1–5.
Bartoszewicz, A.; Kabziński, J.; Kacprzyk, J. Advanced, Contemporary Control: Proceedings of KKA 2020. In Proceedings of the 20th Polish Control Conference, Łódź, Poland, 21–28 October 2020; Volume 1196.
Chen, Y.; Tong, Z.; Zheng, Y.; Samuelson, H.; Norford, L. Transfer learning with deep neural networks for model predictive control of HVAC and natural ventilation in smart buildings. J. Clean. Prod. 2020, 254, 119866.
Demirezen, G.; Fung, A.S.; Deprez, M. Development and optimization of artificial neural network algorithms for the prediction of building specific local temperature for HVAC control. Int. J. Energy Res. 2020, 44, 8513–8531.
Ahn, K.; Park, C. Application of deep Q-networks for model-free optimal control balancing between different HVAC systems. Sci. Technol. Built Environ. 2020, 26, 61–74.

Figure 1. Schematic of a steam generator fueled with liquid fuel mixtures. The components of the steam generator test stand are described in Table 1.

Figure 2. Decision diagram for the baseline algorithm based on a set of PID controllers.

Figure 3. Schematic of the cogenerator control system.

Figure 4. Structure of the regulator with the TD3PG agent.

Figure 5. Learning of the TD3PG agent.
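Figures 4 and 5 refer to a TD3PG agent. Assuming this denotes the standard Twin Delayed Deep Deterministic Policy Gradient (TD3) method, the core of its critic update can be sketched as follows. The sketch covers target policy smoothing with clipped noise, a clipped double-Q target, and Polyak-averaged target parameters. It is a minimal illustration and not the authors' implementation. The scalar state and action, the function names, the action range [0, 1] (read here as a hypothetical normalized burner power), and all hyperparameter defaults are assumptions. Delayed actor updates and network training are omitted.

```python
import random
from typing import Callable, List, Optional

# Hypothetical signatures: a scalar state and a scalar action are used only to
# keep the sketch self-contained; real agents use neural networks on vectors.
Policy = Callable[[float], float]
Critic = Callable[[float, float], float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward: float, next_state: float, done: bool,
               target_actor: Policy, target_q1: Critic, target_q2: Critic,
               gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
               a_low: float = 0.0, a_high: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then keep the action inside its admissible range (assumed [0, 1]).
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, a_low, a_high)
    # Clipped double-Q learning: the smaller target critic curbs overestimation.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def soft_update(target: List[float], online: List[float],
                tau: float = 0.005) -> List[float]:
    """Polyak averaging of parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

For example, consider a case with noise disabled (`noise_std=0.0`), a target actor returning 0.5, and target critics Q1' = s + a and Q2' = 2(s + a). A transition with reward 1.0 and next state 1.0 then gives min(1.5, 3.0) = 1.5, so the target is 1.0 + 0.99 · 1.5 = 2.485.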

Figure 6. Plot of control deviation for (a) the best 15 neural network agents and (b) the PID controller.

Figure 7. Energy loss for (a) the best 15 neural network agents and (b) the PID controller.

Table 1. Description of measuring and control elements of the biomass steam generator.

Measuring and Safety Elements	Executive and Control Elements
ST—temperature measurement of the circulating medium	MP—feeding pump
SP—circulating medium pressure measurement	OB—liquid fuel burner
ET—flue gas temperature measurement	PLC—programmable logic controller
EL—measurement of oxygen concentration in flue gas	FI—frequency inverter
FR—photoresistor	CPU—computer

Table 2. Performance indices of the PID controller and selected neural network agent controllers.

Controller	Rise Time	Settling Time	Steady-State Error	Overshoot	Undershoot
PID	440	691.67	0	0	N/A
Agent67	280	389.99	0	−2.95	N/A
Agent60	280	389.99	0	−4.24	N/A
Agent72	280	634.44	−0.00012	−5.98	N/A
Agent138	280	722.55	−0.009	−6.56	N/A
Agent130	280	894.93	−0.005	−7.35	N/A
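Table 2 compares the controllers by rise time, settling time, steady-state error and overshoot. This section does not state how these indices were computed or in which units. The function below is therefore only a sketch that uses common textbook conventions, all of which are assumptions here:

* rise time is measured between 10% and 90% of the step;
* settling time is the first sample after which the response stays within ±2% of its final value;
* overshoot is expressed as a percentage of the step.

The name `step_metrics` and its arguments are hypothetical. The authors' sign convention, for example the negative overshoot values in Table 2, may differ.

```python
import math
from typing import Dict, Sequence


def _first_crossing(t: Sequence[float], y: Sequence[float], y0: float,
                    span: float, level: float) -> float:
    """Time of the first sample whose normalized value reaches `level`."""
    for ti, yi in zip(t, y):
        if (yi - y0) / span >= level:
            return ti
    return math.nan  # level never reached


def step_metrics(t: Sequence[float], y: Sequence[float], setpoint: float,
                 band: float = 0.02) -> Dict[str, float]:
    """Rise time (10-90 %), settling time (+/- band), overshoot (%) and steady-state error.

    Assumes a non-zero step (y[-1] != y[0]) and that the last sample is settled,
    so y[-1] is taken as the final value.
    """
    y0, y_final = y[0], y[-1]
    span = y_final - y0  # signed, so decreasing steps are handled as well
    rise = (_first_crossing(t, y, y0, span, 0.9)
            - _first_crossing(t, y, y0, span, 0.1))

    # Settling time: first sample after the last one outside the tolerance band.
    tol = band * abs(span)
    last_out = -1
    for i, yi in enumerate(y):
        if abs(yi - y_final) > tol:
            last_out = i
    settling = t[last_out + 1]  # valid: the final sample is always inside the band

    # Overshoot: peak of the normalized response above 1, in percent of the step.
    peak = max((yi - y0) / span for yi in y)
    overshoot = max(0.0, (peak - 1.0) * 100.0)

    return {
        "rise_time": rise,
        "settling_time": settling,
        "overshoot_pct": overshoot,
        "steady_state_error": setpoint - y_final,
    }
```

As an example, take the sampled response y = [0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.1, 1.01, 0.99, 1.0, 1.0] at t = 0 to 10 in unit steps, with a setpoint of 1.0. The sketch gives a rise time of 3, a settling time of 7, an overshoot of about 10%, and zero steady-state error.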

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Lalik, K.; Kozek, M.; Podlasek, S.; Figaj, R.; Gut, P. Q-Learning Neural Controller for Steam Generator Station in Micro Cogeneration Systems. Energies 2021, 14, 5334. https://doi.org/10.3390/en14175334