<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/s120506117</article-id>
<article-id pub-id-type="publisher-id">sensors-12-06117</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Self-Learning Variable Structure Control for a Class of Sensor-Actuator Systems</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Sanfeng</given-names></name><xref ref-type="aff" rid="af1-sensors-12-06117"><sup>1</sup></xref><xref ref-type="author-notes" rid="fn1-sensors-12-06117"><sup>†</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Li</surname><given-names>Shuai</given-names></name><xref ref-type="aff" rid="af2-sensors-12-06117"><sup>2</sup></xref><xref ref-type="author-notes" rid="fn1-sensors-12-06117"><sup>†</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname><given-names>Bo</given-names></name><xref ref-type="aff" rid="af3-sensors-12-06117"><sup>3</sup></xref><xref ref-type="corresp" rid="c1-sensors-12-06117"><sup>*</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Lou</surname><given-names>Yuesheng</given-names></name><xref ref-type="aff" rid="af4-sensors-12-06117"><sup>4</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Liang</surname><given-names>Yongsheng</given-names></name><xref ref-type="aff" rid="af1-sensors-12-06117"><sup>1</sup></xref></contrib></contrib-group>
<aff id="af1-sensors-12-06117">
<label>1</label> Key Lab of Visual Media Processing and Transmission, Shenzhen Institute of Information Technology, Shenzhen 518029, Guangdong, China; E-Mails: <email>chensanf@sziit.com.cn</email> (S.C.); <email>liangys@sziit.com.cn</email> (Y.L.)</aff>
<aff id="af2-sensors-12-06117">
<label>2</label> Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA; E-Mail: <email>sam.shuai.li@gmail.com</email></aff>
<aff id="af3-sensors-12-06117">
<label>3</label> Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA</aff>
<aff id="af4-sensors-12-06117">
<label>4</label> School of Mechatronics and Information, Yiwu Industrial and Commercial College, Yiwu 322000, Zhejiang, China; E-Mail: <email>lusion@mail.ustc.edu.cn</email></aff>
<author-notes>
<corresp id="c1-sensors-12-06117">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>boliu@cs.umass.edu</email>; Tel.: +1-551-333-9638; Fax: +1-413-362-5733.</corresp><fn id="fn1-sensors-12-06117" fn-type="equal">
<label>†</label>
<p>These authors contributed equally to this work.</p></fn></author-notes>
<pub-date pub-type="collection">
<year>2012</year></pub-date>
<pub-date pub-type="epub">
<day>10</day>
<month>05</month>
<year>2012</year></pub-date>
<volume>12</volume>
<issue>5</issue>
<fpage>6117</fpage>
<lpage>6128</lpage>
<history>
<date date-type="received">
<day>04</day>
<month>04</month>
<year>2012</year></date>
<date date-type="rev-recd">
<day>16</day>
<month>04</month>
<year>2012</year></date>
<date date-type="accepted">
<day>29</day>
<month>04</month>
<year>2012</year></date></history>
<permissions>
<copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2012</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Variable structure strategy is widely used for the control of sensor-actuator systems modeled by Euler-Lagrange equations. However, accurate knowledge of the model structure and model parameters is often required for the control design. In this paper, we consider model-free variable structure control of a class of sensor-actuator systems, where only the online input and output of the system are available while the mathematical model of the system is unknown. The problem is formulated from an optimal control perspective, and the implicit form of the control law is obtained analytically by using the principle of optimality. The control law and the optimal cost function are then solved explicitly through iteration. Simulations demonstrate the effectiveness and the efficiency of the proposed method.</p></abstract>
<kwd-group>
<kwd>sensor-actuator system</kwd>
<kwd>principle of optimality</kwd>
<kwd>Bellman equation</kwd>
<kwd>variable structure control</kwd>
<kwd>self-learning</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>With the development of mechatronics, automatic systems consisting of sensors for perception and actuators for action are increasingly used in applications [<xref ref-type="bibr" rid="b1-sensors-12-06117">1</xref>–<xref ref-type="bibr" rid="b4-sensors-12-06117">4</xref>]. Besides the proper choice of sensors and actuators and a careful fabrication of mechanical structures, the control law design also plays a crucial role in the implementation of automatic systems, especially for those with complicated dynamics. Most mechanical sensor-actuator systems can be modeled by Euler-Lagrange equations [<xref ref-type="bibr" rid="b4-sensors-12-06117">4</xref>,<xref ref-type="bibr" rid="b5-sensors-12-06117">5</xref>]. In this paper, we are concerned with sensor-actuator systems modeled by Euler-Lagrange equations.</p>
<p>Due to the importance of Euler-Lagrange equations in modeling many real sensor-actuator systems, much attention has been paid to the control of such systems. According to the type of constraints, Euler-Lagrange systems can be categorized into systems without nonholonomic constraints (e.g., fully-actuated manipulator [<xref ref-type="bibr" rid="b6-sensors-12-06117">6</xref>,<xref ref-type="bibr" rid="b7-sensors-12-06117">7</xref>], omni-directional mobile robot [<xref ref-type="bibr" rid="b8-sensors-12-06117">8</xref>]), and systems subject to nonholonomic constraints [<xref ref-type="bibr" rid="b9-sensors-12-06117">9</xref>] (e.g., the cart-pole system [<xref ref-type="bibr" rid="b10-sensors-12-06117">10</xref>], the under-actuated multiple body system [<xref ref-type="bibr" rid="b11-sensors-12-06117">11</xref>]). For Euler-Lagrange systems without nonholonomic constraints, the dimension of the inputs is often equal to the dimension of the outputs, and the system can often be transformed into a double integrator system by employing feedback linearization [<xref ref-type="bibr" rid="b12-sensors-12-06117">12</xref>]. Other methods, such as the control Lyapunov function method [<xref ref-type="bibr" rid="b13-sensors-12-06117">13</xref>], the passivity based method [<xref ref-type="bibr" rid="b14-sensors-12-06117">14</xref>], the optimal control method [<xref ref-type="bibr" rid="b15-sensors-12-06117">15</xref>], etc., have also been successfully applied to the control of Euler-Lagrange systems without nonholonomic constraints. In contrast, as the dimension of the inputs is lower than that of the outputs, it is often impossible to directly transform an Euler-Lagrange system subject to nonholonomic constraints into a linear system, and thus feedback linearization fails to stabilize the system. 
To tackle this difficulty, the variable structure control based method [<xref ref-type="bibr" rid="b16-sensors-12-06117">16</xref>], backstepping based control [<xref ref-type="bibr" rid="b17-sensors-12-06117">17</xref>], the optimal control based method [<xref ref-type="bibr" rid="b18-sensors-12-06117">18</xref>], the discontinuous control method [<xref ref-type="bibr" rid="b19-sensors-12-06117">19</xref>], etc., have been widely investigated and some useful design procedures have been proposed. However, due to the inherent nonlinearity and nonholonomic constraints, most existing methods [<xref ref-type="bibr" rid="b16-sensors-12-06117">16</xref>–<xref ref-type="bibr" rid="b19-sensors-12-06117">19</xref>] are strongly model dependent and their performance is very sensitive to model errors. Inspired by the success of human operators in the control of Euler-Lagrange systems, various intelligent control strategies, such as fuzzy logic [<xref ref-type="bibr" rid="b20-sensors-12-06117">20</xref>], neural networks [<xref ref-type="bibr" rid="b21-sensors-12-06117">21</xref>] and evolutionary algorithms [<xref ref-type="bibr" rid="b22-sensors-12-06117">22</xref>], to name a few, have been proposed to solve the control problem of Euler-Lagrange systems subject to nonholonomic constraints. As demonstrated by extensive simulations, these types of strategies are indeed effective for the control of Euler-Lagrange systems subject to nonholonomic constraints. However, a rigorous proof of stability is difficult for this type of method, and there may exist some initial states from which the system cannot be stabilized.</p>
<p>In this paper, we propose a self-learning control method applicable to Euler-Lagrange systems. In contrast to existing work on intelligent control of Euler-Lagrange systems, the stability of the closed-loop system with the proposed method is proven in theory. On the other hand, different from model based design strategies, such as backstepping based design [<xref ref-type="bibr" rid="b17-sensors-12-06117">17</xref>], variable structure based design [<xref ref-type="bibr" rid="b16-sensors-12-06117">16</xref>], <italic>etc</italic>., the proposed method does not require information about the model parameters and is therefore a model independent method. We formulate the problem from an optimal control perspective. In this framework, the goal is to find the input sequence that minimizes a cost function defined on the infinite horizon under the constraint of the system dynamics. The solution can be found by solving a Bellman equation according to the principle of optimality [<xref ref-type="bibr" rid="b23-sensors-12-06117">23</xref>]. Then an adaptive dynamic programming strategy [<xref ref-type="bibr" rid="b24-sensors-12-06117">24</xref>–<xref ref-type="bibr" rid="b26-sensors-12-06117">26</xref>] is utilized to numerically solve for the input sequence in real time.</p>
<p>The remainder of this paper is organized as follows: in Section 2, preliminaries on Euler-Lagrange systems and variable structure control are given briefly. In Section 3, the problem is formulated as a constrained optimization problem and the critic model and the action model are employed to approximate the optimal mappings. The control law is then derived in Section 4. In Section 5, simulations are given to show the effectiveness of the proposed method. The paper is concluded in Section 6.</p></sec>
<sec>
<label>2.</label>
<title>Preliminaries on Variable Structure Control of the Sensor-Actuator System</title>
<p>In this paper, we are concerned with the following sensor-actuator system in the Euler-Lagrange form,
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm1" display="block">
<mml:semantics id="sm1">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mover accent="true">
<mml:mi>q</mml:mi>
<mml:mo>¨</mml:mo></mml:mover>
<mml:mo>+</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>q</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mover accent="true">
<mml:mi>q</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo>+</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>u</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>q</italic> ∈ ℝ<sup><italic>n</italic></sup>, <italic>D</italic>(<italic>q</italic>) ∈ ℝ<sup><italic>n</italic>×<italic>n</italic></sup> is the inertia matrix, <italic>C</italic>(<italic>q</italic>, <italic>q̇</italic>) ∈ ℝ<sup><italic>n</italic>×<italic>n</italic></sup>, <italic>ϕ</italic>(<italic>q</italic>) ∈ ℝ<sup><italic>n</italic></sup> and <italic>u</italic> ∈ ℝ<sup><italic>n</italic></sup>. Note that the inertia matrix <italic>D</italic>(<italic>q</italic>) is symmetric and positive definite. There are three terms on the left side of the above equation. The first term represents the inertial force in the generalized coordinates; the second models the Coriolis force and friction, whose values depend on <italic>q̇</italic>; and the third is the conservative force, which corresponds to the potential energy. The control force <italic>u</italic> applied to the system drives the variation of the coordinate <italic>q</italic>. It is also noteworthy that we assume the dimension of <italic>u</italic> is equal to that of <italic>q</italic> here. This formulation also covers the case in which <italic>u</italic> has a lower dimension than <italic>q</italic>, by imposing constraints on <italic>u</italic>; e.g., the constraint <italic>u</italic> = [<italic>u</italic><sub>1</sub>, <italic>u</italic><sub>2</sub>, …, <italic>u<sub>n</sub></italic>] with <italic>u</italic><sub>1</sub> = 0 restricts <italic>u</italic> to an (<italic>n</italic> − 1)-dimensional space. Defining the state variables <italic>x</italic><sub>1</sub> = <italic>q</italic> and <italic>x</italic><sub>2</sub> = <italic>q̇</italic>, the Euler-Lagrange <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref> can be put into the following state-space form:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>−</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>Note that the matrix <italic>D</italic>(<italic>x</italic><sub>1</sub>) is invertible as it is positive definite. The control objective is to asymptotically stabilize the Euler-Lagrange system (<xref rid="FD2" ref-type="disp-formula">2</xref>), <italic>i.e.</italic>, to design a mapping (<italic>x</italic><sub>1</sub>,<italic>x</italic><sub>2</sub>) <italic>→ u</italic> such that <italic>x</italic><sub>1</sub> → 0 and <italic>x</italic><sub>2</sub> → 0 as time elapses.</p>
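<p>As a minimal sketch of this state-space form, assume a single damped pendulum as a hypothetical instance of <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref> with <italic>n</italic> = 1. The function name and the parameters (mass, length, damping, gravity) are illustrative assumptions that do not come from the paper; the acceleration is obtained by solving Equation (1) for <italic>q̈</italic>.</p>
<preformat><![CDATA[
```python
import math


def pendulum_state_derivative(x1, x2, u, mass=1.0, length=1.0,
                              damping=0.1, gravity=9.81):
    """Return (x1_dot, x2_dot) for a damped pendulum in the form of Eq. (1).

    All physical parameters are illustrative assumptions, not paper values.
    """
    inertia = mass * length ** 2                         # D(q): positive scalar
    friction = damping                                   # C(q, q_dot): viscous term only
    potential = mass * gravity * length * math.sin(x1)   # phi(q): gravity torque
    x1_dot = x2                                          # x1 = q, so x1_dot = q_dot
    # Solve D*q_ddot + C*q_dot + phi = u for q_ddot.
    x2_dot = (u - friction * x2 - potential) / inertia
    return x1_dot, x2_dot


if __name__ == "__main__":
    # At the downward equilibrium with no input, the state does not move.
    print(pendulum_state_derivative(0.0, 0.0, 0.0))
```
]]></preformat>
<p>In this sketch the inertia is a positive scalar, so the division plays the role of multiplying by the inverse of <italic>D</italic>(<italic>x</italic><sub>1</sub>); for <italic>n</italic> &gt; 1 a linear solve would take its place.</p>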
<p>As an effective design strategy, variable structure control finds applications in many different types of control systems, including the Euler-Lagrange system. The method stabilizes the dynamics of a nonlinear system by steering the state to an elaborately designed sliding surface, on which the state inherently evolves towards the zero state. Particularly for the system (<xref rid="FD2" ref-type="disp-formula">2</xref>), we define <italic>s = s</italic>(<italic>x</italic><sub>1</sub>,<italic>x</italic><sub>2</sub>) as follows:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm3" display="block">
<mml:semantics id="sm3">
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>c</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>c</italic><sub>0</sub> <italic>&gt;</italic> 0 is a constant. Note that <italic>s = c</italic><sub>0</sub><italic>x</italic><sub>1</sub> + <italic>x</italic><sub>2</sub> <italic>=</italic> 0 together with the dynamics of <italic>x</italic><sub>1</sub> in <xref rid="FD2" ref-type="disp-formula">Equation (2)</xref> gives <italic>ẋ</italic><sub>1</sub> <italic>=</italic> −<italic>c</italic><sub>0</sub><italic>x</italic><sub>1</sub>, whose solution is <italic>x</italic><sub>1</sub>(<italic>t</italic>) = <italic>e</italic><sup>−<italic>c</italic><sub>0</sub><italic>t</italic></sup><italic>x</italic><sub>1</sub>(0). Since <italic>c</italic><sub>0</sub> <italic>&gt;</italic> 0, <italic>x</italic><sub>1</sub> asymptotically converges to zero. Moreover, <italic>x</italic><sub>2</sub> = −<italic>c</italic><sub>0</sub><italic>x</italic><sub>1</sub> on the surface <italic>s =</italic> 0, so <italic>x</italic><sub>2</sub> converges to zero as well. Therefore, we conclude that the states <italic>x</italic><sub>1</sub>, <italic>x</italic><sub>2</sub> on the sliding surface <italic>s =</italic> 0 for <italic>s</italic> defined in <xref rid="FD3" ref-type="disp-formula">Equation (3)</xref> converge to zero with time. With this property of the sliding surface, a control law driving the states to <italic>s =</italic> 0 guarantees the ultimate convergence to the zero states. Accordingly, the stabilization of the system can be realized by controlling <italic>s</italic> to zero. To reach this goal, a positive definite control Lyapunov function <italic>V</italic>(<italic>s</italic>), e.g., <italic>V</italic>(<italic>s</italic>) <italic>= s</italic><sup>2</sup>, is often used to design the control law. For stability, the time derivative of <italic>V</italic>(<italic>s</italic>) is required to be negative definite. 
In order to guarantee the negative definiteness of the time derivative of <italic>V</italic>(<italic>s</italic>), exact information about the system dynamics (<xref rid="FD2" ref-type="disp-formula">2</xref>) is often necessary, which results in the model based design strategies.</p>
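<p>The convergence on the sliding surface can be checked numerically. The following is only a sketch under stated assumptions: the scalar case <italic>n</italic> = 1, a forward Euler integration with an arbitrarily chosen step size, and hypothetical function names that are not part of the paper. It enforces <italic>s</italic> = 0 at every step and compares the result with the exponential decay of <italic>x</italic><sub>1</sub>.</p>
<preformat><![CDATA[
```python
import math


def sliding_variable(x1, x2, c0):
    """Sliding variable of Equation (3): s = c0 * x1 + x2 (scalar case)."""
    return c0 * x1 + x2


def simulate_on_surface(x1_initial, c0, step, num_steps):
    """Integrate x1_dot = x2 with x2 = -c0 * x1 enforced, i.e., s = 0.

    Uses forward Euler with a fixed step; returns the final (x1, x2).
    """
    x1 = x1_initial
    for _ in range(num_steps):
        x2 = -c0 * x1          # state constrained to the sliding surface
        x1 = x1 + step * x2    # forward Euler update of x1_dot = x2
    return x1, -c0 * x1


if __name__ == "__main__":
    c0, step, num_steps = 2.0, 0.001, 5000
    x1_final, x2_final = simulate_on_surface(1.0, c0, step, num_steps)
    exact = math.exp(-c0 * step * num_steps)   # continuous-time solution x1(t)
    print(x1_final, x2_final, exact)
```
]]></preformat>
<p>Both states shrink towards zero while the sliding variable stays at zero, which is the property the control design relies on; a larger <italic>c</italic><sub>0</sub> gives a faster decay on the surface.</p>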
<p>About the Euler-Lagrange <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref> for modeling sensor-actuator systems, we have the following remark:</p>
<p><bold>Remark 1</bold> <italic>In this paper, we are concerned with the class of sensor-actuator systems modeled by the Euler-Lagrange <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref>. The dynamics of mechanical systems can be described by the Euler-Lagrange equation according to rigid body mechanics [<xref ref-type="bibr" rid="b4-sensors-12-06117">4</xref>,<xref ref-type="bibr" rid="b5-sensors-12-06117">5</xref>], which is essentially equivalent to Newton's laws of motion. Therefore, mechanical sensor-actuator systems can be modeled by <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref>. In this regard, the Euler-Lagrange equation employed in the paper models a general class of sensor-actuator systems.</italic></p></sec>
<sec>
<label>3.</label>
<title>Problem Formulation</title>
<p>Without loss of generality, we stabilize the system (<xref rid="FD1" ref-type="disp-formula">1</xref>) by steering it to the sliding surface <italic>s</italic> = 0 with <italic>s</italic> defined in <xref rid="FD3" ref-type="disp-formula">Equation (3)</xref>. Different from existing model based design procedures, we design a self-learning controller, which does not require accurate knowledge of <italic>D</italic>(<italic>q</italic>), <italic>C</italic>(<italic>q,q̇</italic>) and <italic>ϕ</italic>(<italic>q</italic>) in <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref>. In this section, we formulate such a control problem from the optimal control perspective.</p>
<p>In this paper, we set the origin as the desired operating point, <italic>i.e.</italic>, we consider the problem of controlling the state of the system (<xref rid="FD1" ref-type="disp-formula">1</xref>) to the origin. For the case with other desired operating points, the problem can be equivalently transformed to the one with the origin as the operating point by shifting the coordinates. At each sampling period, the norm of <italic>s = c</italic><sub>0</sub><italic>x</italic><sub>1</sub> + <italic>x</italic><sub>2</sub>, which measures the distance from the desired sliding surface <italic>s</italic> = 0, can be used to evaluate the one step performance. Therefore, we define the following utility function associated with the one-step cost at the <italic>i</italic>th sampling period,
<disp-formula id="FD4">
<label>(4)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>U</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>with
<disp-formula id="FD5">
<label>(5)</label>
<mml:math id="mm5" display="block">
<mml:semantics id="sm5">
<mml:mrow>
<mml:mi>U</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>s</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>&lt;</mml:mo>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>&lt;</mml:mo>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>&lt;</mml:mo>
<mml:msub>
<mml:mi>δ</mml:mi>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mtext>otherwise</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>s</italic> is defined in <xref rid="FD3" ref-type="disp-formula">Equation (3)</xref> and <italic>s =</italic> [<italic>s</italic><sub>1</sub>,<italic>s</italic><sub>2</sub>, <italic>…, s<sub>n</sub></italic>]<italic><sup>T</sup></italic>, |<italic>s<sub>i</sub></italic>| denotes the absolute value of the <italic>i</italic>th component of the vector <italic>s</italic>, the parameter <italic>δ<sub>i</sub>&gt;</italic> 0 for <italic>i</italic> = 1, 2,…, <italic>n</italic>. At each step, there is a value <italic>U<sub>i</sub></italic> and the total cost starting from the <italic>k</italic>th step along the infinite time horizon can be expressed as follows,
<disp-formula id="FD6">
<label>(6)</label>
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>=</mml:mo>
<mml:mi>J</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>u</mml:mi>
<mml:mo>¯</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>k</mml:mi></mml:mrow>
<mml:mo>∞</mml:mo></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>γ</mml:mi></mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>k</mml:mi></mml:mrow></mml:msup>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>x</italic>(<italic>k</italic>) is the state vector of system (<xref rid="FD1" ref-type="disp-formula">1</xref>) sampled at the <italic>k</italic>th step with 
<inline-formula>
<mml:math id="mm7" display="inline">
<mml:semantics id="sm7">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mo stretchy="false">[</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn>
<mml:mi>T</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn>
<mml:mi>T</mml:mi></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo></mml:mrow>
<mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:semantics></mml:math></inline-formula>, γ is the discount factor with 0 &lt; γ &lt; 1, and <italic>ū</italic>(<italic>k</italic>) <italic>=</italic> (<italic>u<sub>k</sub></italic>, <italic>u</italic><sub><italic>k</italic>+1</sub>, …, <italic>u</italic><sub>∞</sub>) is the control sequence starting from the <italic>k</italic>th step. Note that for the deterministic system (<xref rid="FD1" ref-type="disp-formula">1</xref>), the states after the <italic>k</italic>th step are determined by <italic>x</italic>(<italic>k</italic>) and the control sequence <italic>ū</italic>(<italic>k</italic>). Accordingly, <italic>J<sub>k</sub></italic> is a function of <italic>x</italic>(<italic>k</italic>) and <italic>ū</italic>(<italic>k</italic>) with <italic>J<sub>k</sub></italic> = <italic>J</italic>(<italic>x</italic>(<italic>k</italic>), <italic>ū</italic>(<italic>k</italic>)). Also note that both the cost function <italic>J<sub>k</sub></italic> and the utility function <italic>U<sub>k</sub></italic> are defined based on discrete samples of the continuous system (<xref rid="FD1" ref-type="disp-formula">1</xref>). Now, we can define the problem of controlling the sensor-actuator system (<xref rid="FD1" ref-type="disp-formula">1</xref>) in this framework as follows,
<disp-formula id="FD7">
<label>(7a)</label>
<mml:math id="mm8" display="block">
<mml:semantics id="sm8">
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>∞</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mo>Ω</mml:mo></mml:mrow></mml:munder>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>=</mml:mo>
<mml:munderover>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow>
<mml:mo>∞</mml:mo></mml:munderover>
<mml:mrow>
<mml:msup>
<mml:mi>γ</mml:mi>
<mml:mi>i</mml:mi></mml:msup>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>subject to:
<disp-formula id="FD8">
<label>(7b)</label>
<mml:math id="mm9" display="block">
<mml:semantics id="sm9">
<mml:mrow>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="left">
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msub>
<mml:mover accent="true">
<mml:mi>x</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:msup>
<mml:mi>D</mml:mi>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:mrow></mml:msup>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>ϕ</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD9">
<label>(7c)</label>
<mml:math id="mm10" display="block">
<mml:semantics id="sm10">
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mspace width="0.2em"/>
<mml:mtext>for</mml:mtext>
<mml:mspace width="0.2em"/>
<mml:mi>i</mml:mi>
<mml:mi>τ</mml:mi>
<mml:mo>≤</mml:mo>
<mml:mi>t</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mi>τ</mml:mi></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>U<sub>i</sub></italic> is defined by <xref rid="FD4" ref-type="disp-formula">Equations (4)</xref> and (<xref rid="FD5" ref-type="disp-formula">5</xref>), τ &gt; 0 is the sampling period, the set Ω defines the feasible control actions, and <italic>J</italic><sub>0</sub> is the cost function for <italic>k</italic> = 0 in <xref rid="FD6" ref-type="disp-formula">Equation (6)</xref>. It is worth noting that <italic>J</italic><sub>0</sub> is a function of <italic>ū</italic>(0) <italic>=</italic> (<italic>u</italic><sub>0</sub>, <italic>u</italic><sub>1</sub>,…, <italic>u</italic><sub>∞</sub>) and <italic>x</italic>(0) according to <xref rid="FD6" ref-type="disp-formula">Equation (6)</xref>. The optimization in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref> is carried out with respect to <italic>ū</italic>(0) for a given initial state <italic>x</italic>(0). Also note that in the optimization problem in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref>, the decision variables <italic>u</italic>(0),<italic>u</italic>(1), …,<italic>u</italic>(∞) are defined at every sampling period. The control action is held constant between two consecutive sampling instants. This formulation is consistent with the real implementation of digital controllers.</p>
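<p>To make the cost structure concrete, the following sketch evaluates the utility of <xref rid="FD5" ref-type="disp-formula">Equation (5)</xref> and a finite truncation of the discounted sum in <xref rid="FD6" ref-type="disp-formula">Equation (6)</xref> for <italic>k</italic> = 0. The function names, the sample values and the truncation to a finite list of samples are assumptions made for illustration only; the paper itself works on the infinite horizon.</p>
<preformat><![CDATA[
```python
def utility(s, delta):
    """Utility of Equation (5): 0 if every |s_i| is below delta_i, else 1."""
    inside_band = all(abs(s_i) < d_i for s_i, d_i in zip(s, delta))
    return 0 if inside_band else 1


def discounted_cost(sliding_samples, delta, gamma):
    """Finite truncation of Equation (6) with k = 0.

    sliding_samples[i] is the sliding vector s sampled at the i-th step.
    """
    return sum(gamma ** i * utility(s, delta)
               for i, s in enumerate(sliding_samples))


if __name__ == "__main__":
    # Hypothetical samples of s over three sampling periods (n = 2).
    samples = [[1.0, 0.5], [0.3, 0.2], [0.05, 0.01]]
    print(discounted_cost(samples, delta=[0.1, 0.1], gamma=0.5))
```
]]></preformat>
<p>Because the utility takes values in {0, 1} and the discount factor lies strictly between 0 and 1, the infinite sum is bounded above by 1/(1 − γ), so the cost is always finite; once the sampled <italic>s</italic> stays inside the band defined by the δ<sub><italic>i</italic></sub>, no further cost accumulates.</p>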
<p><bold>Remark 2</bold> <italic>There are infinitely many decision variables, namely u(0), u(1), …, u(∞), in the optimization problem in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref>. Therefore, this is an infinite dimensional problem and it cannot be solved directly using numerical methods. Conventionally, this kind of problem is solved using a finite dimensional approximation [<xref ref-type="bibr" rid="b27-sensors-12-06117">27</xref>]. In addition, note that the dynamic model of the system appears in the optimization problem in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref> and it will also show up in the finite dimensional relaxation of the problem, which means that the resulting solution requires model information and is thus model-dependent. In contrast, in this paper we investigate the model-independent variable structure control of sensor-actuator systems on the infinite time horizon.</italic></p></sec>
<sec>
<label>4.</label>
<title>Model-Free Control of the Euler-Lagrange System</title>
<p>In this section, we present the strategy to solve the constrained optimization problem efficiently without knowing the model information of the system. We first investigate the optimality condition of <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref> and present an iterative procedure to approach the analytical solution. Then, we analyze the convergence of the iterative procedure and the stability under the derived control strategy.</p>
<sec>
<label>4.1.</label>
<title>Optimality Condition</title>
<p>Denote by <italic>J</italic>* the optimal value of the optimization problem in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref>, <italic>i.e.</italic>,
<disp-formula id="FD10">
<label>(8)</label>
<mml:math id="mm11" display="block">
<mml:semantics id="sm11">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msup>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo></mml:msup>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mrow>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo>,</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mo>∞</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∈</mml:mo>
<mml:mo>Ω</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>J</mml:mi>
<mml:mn>0</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>subject to: (<xref rid="FD8" ref-type="disp-formula">7b</xref>); (<xref rid="FD9" ref-type="disp-formula">7c</xref>)</p>
<p>According to the principle of optimality [<xref ref-type="bibr" rid="b23-sensors-12-06117">23</xref>], the solution of <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref> satisfies the following Bellman equation:
<disp-formula id="FD11">
<label>(9)</label>
<mml:math id="mm12" display="block">
<mml:semantics id="sm12">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>y</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo>min</mml:mo></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:mo>Ω</mml:mo></mml:mrow></mml:munder>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>γ</mml:mi>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>z</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>∀</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>,</mml:mo>
<mml:mo>∀</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo>…</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>z</italic> is the solution of <xref rid="FD8" ref-type="disp-formula">Equation (7b)</xref> at <italic>t = k</italic> + 1 with <italic>x</italic>(<italic>k</italic>) <italic>= y</italic> and the control action <italic>u</italic>(<italic>t</italic>) <italic>= u<sub>k</sub></italic> for <italic>kτ ≤ t &lt;</italic> (<italic>k</italic> + 1)τ. Where no confusion arises, we write <xref rid="FD11" ref-type="disp-formula">Equation (9)</xref> compactly as follows
<disp-formula id="FD12">
<label>(10)</label>
<mml:math id="mm13" display="block">
<mml:semantics id="sm13">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>min</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>γ</mml:mi>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Define the Bellman operator <italic>ℬ</italic> relative to function <italic>h</italic>(<italic>z</italic>) as follows
<disp-formula id="FD13">
<label>(11)</label>
<mml:math id="mm14" display="block">
<mml:semantics id="sm14">
<mml:mrow>
<mml:mi>ℬ</mml:mi>
<mml:mi>h</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>z</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>min</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>γ</mml:mi>
<mml:mi>h</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>z</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Then, the optimality condition in <xref rid="FD12" ref-type="disp-formula">Equation (10)</xref> can be simplified into the following with the Bellman operator,
<disp-formula id="FD14">
<label>(12)</label>
<mml:math id="mm15" display="block">
<mml:semantics id="sm15">
<mml:mrow>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>ℬ</mml:mi>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Note that the function <italic>U<sub>k</sub></italic> is implicitly included in the Bellman operator. <xref rid="FD14" ref-type="disp-formula">Equation (12)</xref> constitutes the optimality condition for the problem in <xref rid="FD7" ref-type="disp-formula">Equation (7)</xref>. It is difficult to obtain an explicit form of <italic>J</italic>* analytically from <xref rid="FD11" ref-type="disp-formula">Equation (9)</xref>; however, the solution can be approached iteratively. We use the following iteration to solve for <italic>J</italic>*,
<disp-formula id="FD15">
<label>(13)</label>
<mml:math id="mm16" display="block">
<mml:semantics id="sm16">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>ℬ</mml:mi>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula>subject to: (<xref rid="FD8" ref-type="disp-formula">7b</xref>); (<xref rid="FD9" ref-type="disp-formula">7c</xref>)</p>
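<p>The iteration in Equation (13) can be illustrated on a small discrete problem. The following is a minimal sketch under stated assumptions: the integer state grid, the three-valued action set standing in for Ω, the zero-inside-a-band stage cost and the discount value γ = 0.9 are all hypothetical and are not the system or cost function of this paper.</p>
<preformat>
```python
import numpy as np

# Hypothetical toy problem: integer states 0..N-1, actions shift the state by -1, 0 or +1.
N = 21
states = np.arange(N)
actions = np.array([-1, 0, 1])        # stands in for the feasible set Omega
gamma = 0.9                           # assumed discount factor
target = 10
# Assumed stage cost U_k: 0 inside the band around the target, 1 outside it.
stage_cost = (np.abs(states - target) > 1).astype(float)

def bellman(J):
    """Apply (B J)(x) = min over u of (U(x) + gamma * J(next state))."""
    nxt = np.clip(states[:, None] + actions[None, :], 0, N - 1)  # successor of each (x, u)
    return np.min(stage_cost[:, None] + gamma * J[nxt], axis=1)

J = np.zeros(N)                       # initial guess J_hat(0)
for n in range(200):
    J_next = bellman(J)               # J_hat(n+1) = B J_hat(n)
    gap = np.max(np.abs(J_next - J))
    J = J_next
    if 1e-10 > gap:                   # stop once the iterate is a fixed point of B
        break

print(n, J[7], J[8], J[10])
```
</preformat>
<p>In this toy setting the iterate stops changing after a few sweeps, and the resulting <monospace>J</monospace> satisfies the fixed-point condition of Equation (12) up to the chosen tolerance.</p>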
<p>The control action is held constant between the <italic>k</italic>th and the (<italic>k</italic> + 1)th step, <italic>i.e., u*</italic>(<italic>t</italic>) <italic>=</italic> 
<inline-formula>
<mml:math id="mm17" display="inline">
<mml:semantics id="sm17">
<mml:mrow>
<mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mi>k</mml:mi>
<mml:mo>∗</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> for <italic>kτ ≤ t &lt;</italic> (<italic>k</italic> + 1)τ. 
<inline-formula>
<mml:math id="mm18" display="inline">
<mml:semantics id="sm18">
<mml:mrow>
<mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mi>k</mml:mi>
<mml:mo>∗</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> can be obtained from <xref rid="FD11" ref-type="disp-formula">Equation (9)</xref> based on <xref rid="FD15" ref-type="disp-formula">Equation (13)</xref>,
<disp-formula id="FD16">
<label>(14)</label>
<mml:math id="mm19" display="block">
<mml:semantics id="sm19">
<mml:mrow>
<mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mi>k</mml:mi>
<mml:mo>∗</mml:mo></mml:msubsup>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mtext>argmin</mml:mtext>
<mml:mrow>
<mml:msub>
<mml:mi>u</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:mo>Ω</mml:mo></mml:mrow></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mi>k</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>γ</mml:mi>
<mml:mi>J</mml:mi>
<mml:mo>∗</mml:mo>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:semantics></mml:math></disp-formula></p></sec>
<sec>
<label>4.2.</label>
<title>Approximating the Action Mapping and the Critic Mapping</title>
<p>In the previous section, the iteration (<xref rid="FD15" ref-type="disp-formula">13</xref>) was derived to calculate <italic>J</italic>* and the optimization (14) was obtained to calculate the control law. Both the iteration to approach <italic>J</italic>* and the optimization to derive <italic>u</italic>* have to be run at every time step in order to obtain the most up-to-date values. Inspired by the learning strategies widely studied in artificial intelligence [<xref ref-type="bibr" rid="b26-sensors-12-06117">26</xref>,<xref ref-type="bibr" rid="b28-sensors-12-06117">28</xref>], a learning-based strategy is used in this section to ease this computation. After a sufficiently long time, the system is able to memorize the mapping of <italic>J*</italic> and the mapping of <italic>u</italic>*. After this learning period, there is no need to repeat any iteration or optimal search, which makes the strategy more practical.</p>
<p>Note that the optimal cost <italic>J</italic>* is a function of the initial state. Counting the cost from the current time step, <italic>J</italic>* can also be regarded as a function of both the current state and the optimal action at the current time step according to <xref rid="FD12" ref-type="disp-formula">Equation (10)</xref>. Therefore, <italic>Ĵ</italic>(<italic>n</italic>), the approximation of <italic>J</italic>*, can also be regarded as a function of the current state and the current optimal input. The optimal control action <italic>u</italic>*, in turn, is a function of the current state. Our goal in this section is to obtain the mapping from the current state and the current input to <italic>Ĵ</italic>(<italic>n</italic>), and the mapping from the current state to the optimal control action <italic>u</italic>*, using parameterized models, denoted as the critic model and the action model, respectively. Therefore, we can write the critic model and the action model as <italic>J<sub>n</sub></italic>(
<inline-formula>
<mml:math id="mm20" display="inline">
<mml:semantics id="sm20">
<mml:mrow>
<mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>∗</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula>, <italic>x<sub>n</sub>, W<sub>c</sub></italic>) and 
<inline-formula>
<mml:math id="mm21" display="inline">
<mml:semantics id="sm21">
<mml:mrow>
<mml:msubsup>
<mml:mi>u</mml:mi>
<mml:mi>n</mml:mi>
<mml:mo>∗</mml:mo></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> (<italic>x<sub>n</sub>, W<sub>a</sub></italic>), respectively, where <italic>W<sub>c</sub></italic> denotes the parameters of the critic model and <italic>W<sub>a</sub></italic> denotes the parameters of the action model.</p>
<p>In order to train the critic model with the desired input-output correspondence, we define the following error at time step <italic>n</italic> + 1 to evaluate the learning performance,
<disp-formula id="FD17">
<label>(15)</label>
<mml:math id="mm22" display="block">
<mml:semantics id="sm22">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>ℬ</mml:mi>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mi>e</mml:mi>
<mml:mi>c</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Note that <italic>ℬĴ</italic>(<italic>n</italic>) is the desired value of <italic>Ĵ</italic>(<italic>n</italic> + 1) according to <xref rid="FD15" ref-type="disp-formula">Equation (13)</xref>. Using the back-propagation rule, we obtain the following rule for updating the weights <italic>W<sub>c</sub></italic> of the critic model,
<disp-formula id="FD18">
<label>(16)</label>
<mml:math id="mm23" display="block">
<mml:semantics id="sm23">
<mml:mrow>
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd/>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>+</mml:mo>
<mml:mi>δ</mml:mi>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mo>=</mml:mo></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>l<sub>c</sub></italic>(<italic>n</italic>) is the step size for the critic model at the time step <italic>n</italic>.</p>
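<p>The critic update in Equation (16) can be sketched for the simplest parameterization. The following assumes a critic that is linear in its weights with a hypothetical polynomial feature vector, and it treats the desired value ℬĴ(n) as a fixed number; the paper does not prescribe these choices, so they serve only to make the chain rule concrete.</p>
<preformat>
```python
import numpy as np

def features(x, u):
    """Hypothetical feature vector of the critic input (state x, input u)."""
    return np.array([1.0, x, u, x * x, x * u, u * u])

def critic_step(W_c, x, u, target, l_c):
    """One gradient step on E_c = 0.5 * e_c**2 with e_c = target - J_hat."""
    phi = features(x, u)
    J_hat = W_c @ phi                  # critic output for the current weights
    e_c = target - J_hat               # desired value minus critic output, as in Equation (15)
    dE_dJ = -e_c                       # dE_c / dJ_hat
    dJ_dW = phi                        # dJ_hat / dW_c for a linear-in-weights critic
    return W_c - l_c * dE_dJ * dJ_dW   # chain rule of the last line of Equation (16)

W_c = np.zeros(6)
x, u, target = 0.5, -1.0, 2.0          # assumed sample and assumed desired value
before = abs(target - W_c @ features(x, u))
for _ in range(50):
    W_c = critic_step(W_c, x, u, target, l_c=0.1)
after = abs(target - W_c @ features(x, u))
print(before, after)
```
</preformat>
<p>For this single sample each step scales the error by a constant factor smaller than one, so the critic output approaches the desired value geometrically; with a neural-network critic the same chain rule applies, with back-propagation supplying the derivative of the output with respect to the weights.</p>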
<p>As to the action model, the optimal control <italic>u</italic>* in <xref rid="FD16" ref-type="disp-formula">Equation (14)</xref> is the one that minimizes the cost function. Note that the minimum possible cost is zero, which corresponds to the state staying inside the desired bounded area. In this regard, we define the action error as follows,
<disp-formula id="FD19">
<label>(17)</label>
<mml:math id="mm24" display="block">
<mml:semantics id="sm24">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn></mml:mfrac>
<mml:msubsup>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mn>2</mml:mn></mml:msubsup>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>Then, similar to the update rule of <italic>W<sub>c</sub></italic> for the critic model, we get the following update rule of <italic>W<sub>a</sub></italic> for the action model,
<disp-formula id="FD20">
<label>(18)</label>
<mml:math id="mm25" display="block">
<mml:semantics id="sm25">
<mml:mrow>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>l</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mover accent="true">
<mml:mi>J</mml:mi>
<mml:mo>^</mml:mo></mml:mover>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac>
<mml:mfrac>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:mo>∂</mml:mo>
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mi>a</mml:mi></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>n</mml:mi>
<mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>l<sub>a</sub></italic>(<italic>n</italic>) is the step size for the action model at the time step <italic>n</italic>.</p>
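<p>The three-factor chain rule in Equation (18) can be made concrete with a small sketch. Here the critic is a fixed, hypothetical quadratic whose minimum value is zero, and the action model is linear in the state; both are assumptions chosen so that every derivative can be written by hand, and neither is the model used in this paper.</p>
<preformat>
```python
import numpy as np

def critic(x, u):
    """Hypothetical fixed critic: J_hat = (u + 2x)^2, which is zero when u = -2x."""
    return (u + 2.0 * x) ** 2

def dcritic_du(x, u):
    """Partial derivative of the hypothetical critic with respect to u."""
    return 2.0 * (u + 2.0 * x)

def actor(W_a, x):
    """Hypothetical linear action model u = W_a[0] + W_a[1] * x."""
    return W_a[0] + W_a[1] * x

def actor_step(W_a, x, l_a):
    """One step of Equation (18) with e_a = J_hat and E_a = 0.5 * e_a**2."""
    u = actor(W_a, x)
    dE_dJ = critic(x, u)             # dE_a / dJ_hat = e_a = J_hat
    dJ_du = dcritic_du(x, u)         # dJ_hat / du, supplied by the critic
    du_dW = np.array([1.0, x])       # du / dW_a for the linear action model
    return W_a - l_a * dE_dJ * dJ_du * du_dW

xs = [-1.0, -0.5, 0.5, 1.0]          # assumed training states
W_a = np.zeros(2)
cost_before = sum(critic(x, actor(W_a, x)) for x in xs)
for _ in range(500):
    for x in xs:
        W_a = actor_step(W_a, x, l_a=0.02)
cost_after = sum(critic(x, actor(W_a, x)) for x in xs)
print(cost_before, cost_after, W_a)
```
</preformat>
<p>Because the gradient of the squared action error is proportional to the critic output itself, the update slows down as the cost approaches zero, so the weights drift towards the minimizing action rather than reaching it in a fixed number of steps.</p>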
<p><xref rid="FD18" ref-type="disp-formula">Equations (16)</xref> and <xref rid="FD20" ref-type="disp-formula">(18)</xref> update the critic model and the action model progressively. Once <italic>W<sub>c</sub></italic> and <italic>W<sub>a</sub></italic> have been trained for a long enough time, their values can be fixed at those obtained at the final step and no further learning is required. This is in contrast to <xref rid="FD16" ref-type="disp-formula">Equation (14)</xref>, which requires solving an optimization problem at every step, even after a long time.</p></sec></sec>
<sec>
<label>5.</label>
<title>Simulation Experiment</title>
<p>In this section, we consider the simulation implementation of the proposed control strategy. The dynamics given in <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref> model a wide class of sensor-actuator systems. In particular, to demonstrate the effectiveness of the proposed self-learning variable structure method, we apply it to the stabilization of a typical benchmark system: the cart-pole system.</p>
<p>The cart-pole system, as sketched in <xref ref-type="fig" rid="f1-sensors-12-06117">Figure 1</xref>, is a widely used testbed for evaluating control strategies. The system is composed of a pendulum and a cart. The pendulum has its mass above its pivot point, which is mounted on a cart moving horizontally.</p>
<sec>
<label>5.1.</label>
<title>The Model</title>
<p>The cart-pole model used in this work is the same as that in [<xref ref-type="bibr" rid="b29-sensors-12-06117">29</xref>], which can be described as follows.
<disp-formula id="FD21">
<label>(19)</label>
<mml:math id="mm26" display="block">
<mml:semantics id="sm26">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>¨</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>sin</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>+</mml:mo>
<mml:mo>cos</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:mo>−</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>l</mml:mi>
<mml:msup>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>sin</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>sgn</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>p</mml:mi></mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover></mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>4</mml:mn>
<mml:mn>3</mml:mn></mml:mfrac>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>cos</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>θ</mml:mi></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>
<disp-formula id="FD22">
<label>(20)</label>
<mml:math id="mm27" display="block">
<mml:semantics id="sm27">
<mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>¨</mml:mo></mml:mover>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>l</mml:mi>
<mml:mo stretchy="false">[</mml:mo>
<mml:msup>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mn>2</mml:mn></mml:msup>
<mml:mo>sin</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>−</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>¨</mml:mo></mml:mover>
<mml:mo>cos</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">]</mml:mo>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>sgn</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>where
<disp-formula id="FD23">
<label>(21)</label>
<mml:math id="mm28" display="block">
<mml:semantics id="sm28">
<mml:mrow>
<mml:mo>sgn</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mn>1</mml:mn></mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&gt;</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow></mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>with the following values of the parameters:
<list list-type="simple">
<list-item>
<p><italic>g</italic>: 9.8 m/s<sup>2</sup>, acceleration due to gravity;</p></list-item>
<list-item>
<p><italic>m<sub>c</sub></italic>: 1.0 kg, mass of cart;</p></list-item>
<list-item>
<p><italic>m</italic>: 0.1 kg, mass of pole;</p></list-item>
<list-item>
<p><italic>l</italic>: 0.5 m, half-pole length;</p></list-item>
<list-item>
<p><italic>μ<sub>c</sub></italic>: 0.0005, coefficient of friction of cart on track;</p></list-item>
<list-item>
<p><italic>μ<sub>p</sub></italic>: 0.000002, coefficient of friction of pole on cart;</p></list-item>
<list-item>
<p><italic>F</italic>: ±10 N, force applied to cart center of mass.</p></list-item></list></p>
<p>This system has four state variables: <italic>y</italic> is the position of the cart on track, <italic>θ</italic> is the angle of the pole with respect to the vertical position, and <italic>ẏ</italic> and <italic>θ̇</italic> are the cart velocity and angular velocity, respectively.</p>
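<p>For readers who wish to reproduce the simulation, the model can be coded directly. The sketch below transcribes Equations (19)–(21) as printed, with the parameter values listed above; the function names, the forward-Euler integrator and the step size of 0.02 s are assumptions introduced for illustration and are not taken from this paper.</p>
<preformat>
```python
import math

# Parameter values listed with Equations (19)-(21)
G, M_C, M, L = 9.8, 1.0, 0.1, 0.5
MU_C, MU_P = 0.0005, 0.000002

def sgn(v):
    """Sign function of Equation (21): 1, 0 or -1."""
    return (v > 0) - (0 > v)

def accelerations(theta, theta_dot, y_dot, F):
    """Return (theta_ddot, y_ddot) transcribed from Equations (19) and (20)."""
    s, c = math.sin(theta), math.cos(theta)
    num = (G * s
           + c * (-F - M * L * theta_dot ** 2 * s + MU_C * sgn(y_dot))
           - MU_P * theta_dot / (M * L))
    den = L * (4.0 / 3.0 - M * c * c / (M_C + M))
    theta_ddot = num / den
    y_ddot = (F + M * L * (theta_dot ** 2 * s - theta_ddot * c)
              - MU_C * sgn(y_dot)) / (M_C + M)
    return theta_ddot, y_ddot

def euler_step(state, F, dt=0.02):
    """Advance (y, y_dot, theta, theta_dot) by one forward-Euler step (dt is assumed)."""
    y, y_dot, theta, theta_dot = state
    theta_ddot, y_ddot = accelerations(theta, theta_dot, y_dot, F)
    return (y + dt * y_dot, y_dot + dt * y_ddot,
            theta + dt * theta_dot, theta_dot + dt * theta_ddot)

# One step from rest at the upright position with F = +10 N
state = euler_step((0.0, 0.0, 0.0, 0.0), F=10.0)
print(state)
```
</preformat>
<p>A learning controller only needs to call <monospace>euler_step</monospace> with the chosen force at each sampling instant; it never has to inspect the equations inside, which is the sense in which the control strategy is model-free.</p>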
<p>Define 
<inline-formula>
<mml:math id="mm29" display="inline">
<mml:semantics id="sm29">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mi>l</mml:mi>
<mml:mrow>
<mml:mo>cos</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mn>4</mml:mn>
<mml:mn>3</mml:mn></mml:mfrac>
<mml:mo>−</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:msup>
<mml:mrow>
<mml:mo>cos</mml:mo></mml:mrow>
<mml:mn>2</mml:mn></mml:msup>
<mml:mi>θ</mml:mi></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>m</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>+</mml:mo>
<mml:mi>m</mml:mi></mml:mrow></mml:mfrac></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></inline-formula>, 
<inline-formula>
<mml:math id="mm30" display="inline">
<mml:semantics id="sm30">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mo>sin</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow>
<mml:mrow>
<mml:mo>cos</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></inline-formula>, 
<inline-formula>
<mml:math id="mm31" display="inline">
<mml:semantics id="sm31">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>,</mml:mo>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>m</mml:mi>
<mml:mi>l</mml:mi>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo>sin</mml:mo>
<mml:mi>θ</mml:mi>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>p</mml:mi></mml:msub></mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">ml</mml:mtext>
<mml:mo>cos</mml:mo>
<mml:mi>θ</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></inline-formula>, 
<inline-formula>
<mml:math id="mm32" display="inline">
<mml:semantics id="sm32">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>μ</mml:mi>
<mml:mi>c</mml:mi></mml:msub>
<mml:mo>sgn</mml:mo>
<mml:mo stretchy="false">(</mml:mo>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo stretchy="false">)</mml:mo></mml:mrow>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover></mml:mfrac></mml:mrow></mml:semantics></mml:math></inline-formula>, <italic>A</italic><sub>5</sub> = <italic>m<sub>c</sub></italic> + <italic>m, A</italic><sub>6</sub>(<italic>θ,θ̇</italic>) <italic>= ml</italic>θ̇ <italic>sinθ, A</italic><sub>7</sub>(<italic>θ</italic>) <italic>=</italic> –<italic>ml cosθ</italic>. With this notation, <xref rid="FD21" ref-type="disp-formula">Equations (19)</xref> and (<xref rid="FD22" ref-type="disp-formula">20</xref>) can be rewritten as:
<disp-formula id="FD24">
<label>(22)</label>
<mml:math id="mm33" display="block">
<mml:semantics id="sm33">
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>¨</mml:mo></mml:mover></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>5</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>¨</mml:mo></mml:mover></mml:mrow></mml:mtd>
<mml:mtd columnalign="left">
<mml:mrow>
<mml:mo>=</mml:mo>
<mml:mi>F</mml:mi>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>6</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac>
<mml:mover accent="true">
<mml:mi>θ</mml:mi>
<mml:mo>˙</mml:mo></mml:mover>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac>
<mml:mover accent="true">
<mml:mi>y</mml:mi>
<mml:mo>˙</mml:mo></mml:mover></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula></p>
<p>By choosing
<disp-formula id="FD25">
<mml:math id="mm34" display="block">
<mml:semantics id="sm34">
<mml:mrow>
<mml:mtable columnalign="right">
<mml:mtr columnalign="right">
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>D</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>5</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>C</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn></mml:msub></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>6</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>3</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>4</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr columnalign="right">
<mml:mtd columnalign="right">
<mml:mrow>
<mml:mi>ϕ</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>−</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>2</mml:mn></mml:msub>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>1</mml:mn></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>A</mml:mi>
<mml:mn>7</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>q</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>θ</mml:mi></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>y</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mi>u</mml:mi>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:mtable>
<mml:mtr>
<mml:mtd>
<mml:mi>F</mml:mi></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi>F</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:mrow>
<mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:semantics></mml:math></disp-formula>the system of <xref rid="FD21" ref-type="disp-formula">Equation (19)</xref> takes the form of the model of <xref rid="FD1" ref-type="disp-formula">Equation (1)</xref>. Note that, because the same force <italic>F</italic> enters both components of <italic>u</italic>, the input is constrained to the set Ω = {<italic>u</italic> = [<italic>u</italic><sub>1</sub>, <italic>u</italic><sub>2</sub>]<sup><italic>T</italic></sup>, <italic>u</italic><sub>1</sub> = <italic>u</italic><sub>2</sub> ∈ ℝ}.</p></sec>
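<sec>
<title>Illustrative Sketch of the Model Assembly</title>
<p>The following minimal Python sketch shows one way the matrices <italic>D</italic>, <italic>C</italic>, and <italic>ϕ</italic> defined above could be assembled from the scalars <italic>A</italic><sub>1</sub> to <italic>A</italic><sub>7</sub>. It assumes that Equation (1) has the form <italic>D</italic><italic>q̈</italic> + <italic>C</italic><italic>q̇</italic> + <italic>ϕ</italic> = <italic>u</italic>, which the sign conventions of <italic>C</italic> and <italic>ϕ</italic> suggest but which is not restated in this section. The function names <monospace>assemble_model</monospace> and <monospace>accelerations</monospace> are hypothetical and are not part of the original work.</p>
<preformat><![CDATA[
```python
def assemble_model(a1, a2, a3, a4, a5, a6, a7):
    """Return (D, C, phi) as nested lists, following the definitions in the text.

    The scalars a1..a7 stand for A1..A7 evaluated at the current state.
    """
    s = a1 + a7  # common denominator A1 + A7; assumed to be nonzero
    D = [[a1, 0.0],
         [0.0, a1 * a5 / s]]
    C = [[-a3, -a4],
         [-(a1 * a6 + a3 * a7) / s, -(a1 * a4 + a4 * a7) / s]]
    phi = [-a2, -a2 * a7 / s]
    return D, C, phi


def accelerations(D, C, phi, q_dot, force):
    """Solve D q_ddot = u - C q_dot - phi for q_ddot with u = [F, F].

    Assumes the model form D q_ddot + C q_dot + phi = u and uses the
    fact that D is diagonal, so each row can be solved separately.
    """
    u = [force, force]
    rhs = []
    for i in range(2):
        coupling = sum(C[i][j] * q_dot[j] for j in range(2))
        rhs.append(u[i] - coupling - phi[i])
    theta_ddot = rhs[0] / D[0][0]
    y_ddot = rhs[1] / D[1][1]
    return [theta_ddot, y_ddot]
```
]]></preformat>
<p>Under the stated assumption, the first row of this sketch gives <italic>A</italic><sub>1</sub><italic>θ̈</italic> = <italic>F</italic> + <italic>A</italic><sub>2</sub> + <italic>A</italic><sub>3</sub><italic>θ̇</italic> + <italic>A</italic><sub>4</sub><italic>ẏ</italic>, and passing the same value of <italic>F</italic> in both components of <italic>u</italic> reflects the constraint <italic>u</italic><sub>1</sub> = <italic>u</italic><sub>2</sub>.</p></sec>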
<sec sec-type="results">
<label>5.2.</label>
<title>Experiment Setup and Results</title>
<p>In the simulation experiment, we set the discount factor γ = 0.95, the sliding surface parameter <italic>k</italic> = 10, <italic>δ</italic><sub>1</sub> = 2, and <italic>δ</italic><sub>2</sub> = 24. The feasible control action set Ω in <xref rid="FD9" ref-type="disp-formula">Equation (7)</xref> is defined as Ω = {<italic>u</italic> = [<italic>u</italic><sub>1</sub>, <italic>u</italic><sub>2</sub>]<sup><italic>T</italic></sup>, <italic>u</italic><sub>1</sub> ∈ ℝ, <italic>u</italic><sub>2</sub> ∈ ℝ, <italic>u</italic><sub>1</sub> = <italic>u</italic><sub>2</sub> = ±10 N}. This definition corresponds to bang-bang control, which is widely used in industry. To keep the output of the action model within the feasible set, the output of the action network is clamped to 10 if it is greater than or equal to zero and to –10 if it is less than zero. The sampling period τ is set to 0.02 s. Both the critic model and the action model are linearly parameterized. The step sizes of the critic model, <italic>l<sub>c</sub></italic>(<italic>n</italic>), and of the action model, <italic>l<sub>a</sub></italic>(<italic>n</italic>), are both set to 0.03. The update of the critic model weight <italic>W<sub>c</sub></italic> in <xref rid="FD18" ref-type="disp-formula">Equation (16)</xref> and the update of the action model weight <italic>W<sub>a</sub></italic> in <xref rid="FD20" ref-type="disp-formula">Equation (18)</xref> both run for 30 s. For the uncontrolled cart-pole system with <italic>F</italic> = 0 in <xref rid="FD21" ref-type="disp-formula">Equation (19)</xref>, the pendulum falls down. The control objective is to stabilize the pendulum in the inverted position (<italic>θ</italic> = 0). The time histories of the state variables under the proposed self-learning variable structure control strategy are plotted in <xref ref-type="fig" rid="f2-sensors-12-06117">Figure 2</xref>. The figure shows that <italic>θ</italic> is stabilized in a small neighborhood of zero (within ±0.1 rad), which corresponds to the inverted position.</p></sec></sec>
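<sec>
<title>Illustrative Sketch of the Bang-Bang Action Clamp</title>
<p>As a hedged illustration of the experiment setup described above, the following Python sketch maps the raw output of a linearly parameterized action model to the feasible set Ω with ±10 N and counts the sampling steps in the 30 s update window. The force magnitude, sampling period, and update duration are taken from the text; the function names and the feature vector passed to the action model are hypothetical placeholders and are not specified by the original work.</p>
<preformat><![CDATA[
```python
F_MAX = 10.0    # magnitude of the admissible force in newtons (from the text)
TAU = 0.02      # sampling period in seconds (from the text)
T_LEARN = 30.0  # duration of the weight updates in seconds (from the text)


def raw_action(weights, features):
    """Linearly parameterized action model: inner product of weights and features.

    The feature vector is a hypothetical placeholder for whatever state
    features the action model actually uses.
    """
    return sum(w * x for w, x in zip(weights, features))


def clamp_bang_bang(raw):
    """Return +F_MAX if the raw output is >= 0 and -F_MAX otherwise."""
    return F_MAX if raw >= 0.0 else -F_MAX


def control_input(weights, features):
    """Return u = [u1, u2] with u1 = u2, so that u lies in the feasible set."""
    force = clamp_bang_bang(raw_action(weights, features))
    return [force, force]


# Number of sampling steps covered by the 30 s of weight updates.
N_STEPS = round(T_LEARN / TAU)
```
]]></preformat>
<p>With a sampling period of 0.02 s, the 30 s update window corresponds to 1500 sampling steps. A raw output of exactly zero is mapped to +10 N, which matches the "greater than or equal to zero" rule stated in the text.</p></sec>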
<sec sec-type="conclusions">
<label>6.</label>
<title>Conclusions and Future Work</title>
<p>In this paper, self-learning variable structure control is developed for a class of sensor-actuator systems. The control problem is formulated from an optimal control perspective and solved via iterative methods. In contrast to existing methods, the proposed method does not require prior knowledge of an accurate mathematical model. A critic model and an action model are introduced to make the method more practical. Simulations show that the control law obtained by the proposed method achieves the control objective. Future work includes a theoretical proof of convergence and an exploration of the performance limits of the proposed strategy. The control of other mechanical systems modeled as Euler-Lagrange systems, such as manipulators, will also be explored.</p></sec></body>
<back>
<ack>
<p>Shuai Li would like to share with the readers a line from a poem by Rabindranath Tagore: “The traveler has to knock at every alien door to come to his own and one has to wander through all the outer worlds to reach the innermost shrine at the end”. The authors would like to acknowledge the support of the National Natural Science Foundation of China under Grant No. 61172165 and of the Guangdong Science Foundation of China under Grants No. S2011010006116 and No. 10151802904000013.</p></ack>
<ref-list>
<title>References and Notes</title>
<ref id="b1-sensors-12-06117"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Isermann</surname><given-names>R.</given-names></name></person-group><article-title>Modeling and design methodology for mechatronic systems</article-title><source>IEEE/ASME Trans. Mechatr.</source><year>1996</year><volume>1</volume><fpage>16</fpage><lpage>28</lpage><pub-id pub-id-type="doi">10.1109/3516.491406</pub-id></citation></ref>
<ref id="b2-sensors-12-06117"><label>2.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>van de Panne</surname><given-names>M.</given-names></name><name><surname>Fiume</surname><given-names>E.</given-names></name></person-group><article-title>Sensor-actuator networks</article-title><conf-name>Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '93)</conf-name><conf-loc>Anaheim, CA, USA</conf-loc><conf-date>1–6 August 1993</conf-date><fpage>335</fpage><lpage>342</lpage></citation></ref>
<ref id="b3-sensors-12-06117"><label>3.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>B.</given-names></name><name><surname>Chen</surname><given-names>S.</given-names></name><name><surname>Li</surname><given-names>S.</given-names></name><name><surname>Liang</surname><given-names>Y.</given-names></name></person-group><article-title>Intelligent control of a sensor-actuator system via kernelized least-squares policy iteration</article-title><source>Sensors</source><year>2012</year><volume>12</volume><fpage>2632</fpage><lpage>2653</lpage><pub-id pub-id-type="doi">10.3390/s120302632</pub-id><pub-id pub-id-type="pmid">22736969</pub-id></citation></ref>
<ref id="b4-sensors-12-06117"><label>4.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>de Silva</surname><given-names>C.</given-names></name></person-group><source>Sensors and Actuators: Control System Instrumentation</source><publisher-name>Taylor &amp; Francis, CRC Press</publisher-name><publisher-loc>Boca Raton, FL, USA</publisher-loc><year>2007</year></citation></ref>
<ref id="b5-sensors-12-06117"><label>5.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Beer</surname><given-names>F.P.</given-names></name></person-group><source>Vector Mechanics for Engineers: Statics and Dynamics</source><publisher-name>McGraw-Hill</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b6-sensors-12-06117"><label>6.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Lewis</surname><given-names>F.L.</given-names></name><name><surname>Dawson</surname><given-names>D.M.</given-names></name><name><surname>Abdallah</surname><given-names>C.T.</given-names></name></person-group><source>Manipulator Control Theory and Practice</source><publisher-name>Marcel Dekker</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2004</year><volume>15</volume></citation></ref>
<ref id="b7-sensors-12-06117"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Li</surname><given-names>S.</given-names></name><name><surname>Chen</surname><given-names>S.</given-names></name><name><surname>Liu</surname><given-names>B.</given-names></name><name><surname>Li</surname><given-names>Y.</given-names></name><name><surname>Liang</surname><given-names>Y.</given-names></name></person-group><article-title>Decentralized kinematic control of a class of collaborative redundant manipulators via recurrent neural networks</article-title><source>Neurocomputing</source><year>2012</year><volume>8</volume><fpage>108</fpage><lpage>121</lpage></citation></ref>
<ref id="b8-sensors-12-06117"><label>8.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Li</surname><given-names>S.</given-names></name><name><surname>Meng</surname><given-names>M.Q.H.</given-names></name><name><surname>Chen</surname><given-names>W.</given-names></name></person-group><article-title>SP-NN: A novel neural network approach for path planning</article-title><conf-name>Proceedings of IEEE International Conference on Robotics and Biomimetics</conf-name><conf-loc>Sanya, Hainan, China</conf-loc><conf-date>15–18 December 2007</conf-date><fpage>1355</fpage><lpage>1360</lpage></citation></ref>
<ref id="b9-sensors-12-06117"><label>9.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bloch</surname><given-names>A.M.</given-names></name></person-group><source>Nonholonomic Mechanics and Control</source><publisher-name>Springer-Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b10-sensors-12-06117"><label>10.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>H.</given-names></name><name><surname>Liu</surname><given-names>Y.</given-names></name><name><surname>Yang</surname><given-names>T.</given-names></name></person-group><article-title>Tracking control of a pendulum-driven cart-pole underactuated system</article-title><conf-name>Proceedings of IEEE International Conference on Systems, Man and Cybernetics</conf-name><conf-loc>Montreal, QC, Canada</conf-loc><conf-date>7–10 October 2007</conf-date><fpage>2425</fpage><lpage>2430</lpage></citation></ref>
<ref id="b11-sensors-12-06117"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Seifried</surname><given-names>R.</given-names></name></person-group><article-title>Two approaches for feedforward control and optimal design of underactuated multibody systems</article-title><source>Multibody Syst. Dynam.</source><year>2012</year><volume>27</volume><fpage>75</fpage><lpage>93</lpage><pub-id pub-id-type="doi">10.1007/s11044-011-9261-z</pub-id></citation></ref>
<ref id="b12-sensors-12-06117"><label>12.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Isidori</surname><given-names>A.</given-names></name></person-group><source>Nonlinear Control Systems II</source><publisher-name>Springer-Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1999</year></citation></ref>
<ref id="b13-sensors-12-06117"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Primbs</surname><given-names>J.A.</given-names></name><name><surname>Nevistic</surname><given-names>V.</given-names></name><name><surname>Doyle</surname><given-names>J.C.</given-names></name></person-group><article-title>Nonlinear optimal control: A control Lyapunov function and receding horizon perspective</article-title><source>Asian J. Control</source><year>1999</year><volume>1</volume><fpage>14</fpage><lpage>24</lpage></citation></ref>
<ref id="b14-sensors-12-06117"><label>14.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Ortega</surname><given-names>R.</given-names></name><name><surname>Loria</surname><given-names>A.</given-names></name><name><surname>Nicklasson</surname><given-names>P.J.</given-names></name><name><surname>Sira-Ramirez</surname><given-names>H.</given-names></name></person-group><source>Passivity-Based Control of Euler-Lagrange Systems</source><publisher-name>Springer-Verlag</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>1998</year></citation></ref>
<ref id="b15-sensors-12-06117"><label>15.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Azhmyakov</surname><given-names>V.</given-names></name></person-group><article-title>Optimal control of mechanical systems</article-title><source>Diff. Equat. Nonlin. Mech.</source><year>2007</year><volume>12</volume><fpage>3</fpage><lpage>16</lpage></citation></ref>
<ref id="b16-sensors-12-06117"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Huo</surname><given-names>W.</given-names></name></person-group><article-title>Predictive variable structure control of nonholonomic chained systems</article-title><source>Int. J. Comput. Math.</source><year>2008</year><volume>85</volume><fpage>949</fpage><lpage>960</lpage><pub-id pub-id-type="doi">10.1080/00207160701326798</pub-id></citation></ref>
<ref id="b17-sensors-12-06117"><label>17.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Dumitrascu</surname><given-names>B.</given-names></name><name><surname>Filipescu</surname><given-names>A.</given-names></name><name><surname>Minzu</surname><given-names>V.</given-names></name><name><surname>Filipescu</surname><given-names>A.</given-names></name></person-group><article-title>Backstepping control of wheeled mobile robots</article-title><conf-name>Proceedings of 15th International Conference on System Theory, Control, and Computing (ICSTCC 2011)</conf-name><conf-loc>Sinaia, Romania</conf-loc><conf-date>14–16 October 2011</conf-date><fpage>1</fpage><lpage>6</lpage></citation></ref>
<ref id="b18-sensors-12-06117"><label>18.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hussein</surname><given-names>I.I.</given-names></name><name><surname>Bloch</surname><given-names>A.M.</given-names></name></person-group><article-title>Optimal control of underactuated nonholonomic mechanical systems</article-title><source>IEEE Trans. Autom. Control</source><year>2008</year><volume>53</volume><pub-id pub-id-type="doi">10.1109/TAC.2008.919853</pub-id></citation></ref>
<ref id="b19-sensors-12-06117"><label>19.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Pazderski</surname><given-names>D.</given-names></name><name><surname>Kozlowski</surname><given-names>K.</given-names></name><name><surname>Krysiak</surname><given-names>B.</given-names></name></person-group><article-title>Nonsmooth stabilizer for three link nonholonomic manipulator using polar-like coordinate representation</article-title><source>Robot Motion and Control</source><person-group person-group-type="editor"><name><surname>Kozlowski</surname><given-names>K.</given-names></name></person-group><publisher-name>Springer</publisher-name><publisher-loc>Berlin/Heidelberg, Germany</publisher-loc><year>2009</year></citation></ref>
<ref id="b20-sensors-12-06117"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cuesta</surname><given-names>F.</given-names></name><name><surname>Ollero</surname><given-names>A.</given-names></name><name><surname>Arrue</surname><given-names>B.C.</given-names></name><name><surname>Braunstingl</surname><given-names>R.</given-names></name></person-group><article-title>Intelligent control of nonholonomic mobile robots with fuzzy perception</article-title><source>Fuzzy Sets Syst.</source><year>2003</year><volume>134</volume><fpage>47</fpage><lpage>64</lpage><pub-id pub-id-type="doi">10.1016/S0165-0114(02)00229-4</pub-id></citation></ref>
<ref id="b21-sensors-12-06117"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wai</surname><given-names>R.J.</given-names></name><name><surname>Liu</surname><given-names>C.M.</given-names></name></person-group><article-title>Design of dynamic Petri recurrent fuzzy neural network and its application to path-tracking control of nonholonomic mobile robot</article-title><source>IEEE Trans. Indust. Electr.</source><year>2009</year><volume>56</volume><fpage>2667</fpage><lpage>2683</lpage><pub-id pub-id-type="doi">10.1109/TIE.2009.2020077</pub-id></citation></ref>
<ref id="b22-sensors-12-06117"><label>22.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kinjo</surname><given-names>H.</given-names></name><name><surname>Uezato</surname><given-names>E.</given-names></name><name><surname>Duong</surname><given-names>S.C.</given-names></name><name><surname>Yamamoto</surname><given-names>T.</given-names></name></person-group><article-title>Neurocontroller with a genetic algorithm for nonholonomic systems: Flying robot and four-wheel vehicle examples</article-title><source>Artif. Life Robot.</source><year>2009</year><volume>13</volume><fpage>464</fpage><lpage>469</lpage><pub-id pub-id-type="doi">10.1007/s10015-008-0609-2</pub-id></citation></ref>
<ref id="b23-sensors-12-06117"><label>23.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bertsekas</surname><given-names>D.P.</given-names></name></person-group><source>Dynamic Programming and Optimal Control</source><edition>3rd ed.</edition><publisher-name>Athena Scientific</publisher-name><publisher-loc>Nashua, NH, USA</publisher-loc><year>2005</year></citation></ref>
<ref id="b24-sensors-12-06117"><label>24.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Murray</surname><given-names>J.J.</given-names></name><name><surname>Cox</surname><given-names>C.J.</given-names></name><name><surname>Lendaris</surname><given-names>G.G.</given-names></name><name><surname>Saeks</surname><given-names>R.</given-names></name></person-group><article-title>Adaptive dynamic programming</article-title><source>IEEE Trans. Syst. Man Cyber.</source><year>2002</year><volume>32</volume><fpage>140</fpage><lpage>153</lpage><pub-id pub-id-type="doi">10.1109/TSMCC.2002.801727</pub-id></citation></ref>
<ref id="b25-sensors-12-06117"><label>25.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname><given-names>F.L.</given-names></name><name><surname>Vrabie</surname><given-names>D.</given-names></name></person-group><article-title>Reinforcement learning and adaptive dynamic programming for feedback control</article-title><source>IEEE Circuits Syst. Mag.</source><year>2009</year><volume>9</volume><fpage>32</fpage><lpage>50</lpage><pub-id pub-id-type="doi">10.1109/MCAS.2009.933854</pub-id></citation></ref>
<ref id="b26-sensors-12-06117"><label>26.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Si</surname><given-names>J.</given-names></name><name><surname>Barto</surname><given-names>A.</given-names></name><name><surname>Powell</surname><given-names>W.</given-names></name><name><surname>Wunsch</surname><given-names>D.</given-names></name></person-group><source>Handbook of Learning and Approximate Dynamic Programming</source><publisher-name>John Wiley and Sons</publisher-name><publisher-loc>Hoboken, NJ, USA</publisher-loc><year>2004</year></citation></ref>
<ref id="b27-sensors-12-06117"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mayne</surname><given-names>D.Q.</given-names></name><name><surname>Michalska</surname><given-names>H.</given-names></name></person-group><article-title>Receding horizon control of nonlinear systems</article-title><source>IEEE Trans. Autom. Control</source><year>1990</year><volume>35</volume><fpage>814</fpage><lpage>824</lpage><pub-id pub-id-type="doi">10.1109/9.57020</pub-id></citation></ref>
<ref id="b28-sensors-12-06117"><label>28.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Bishop</surname><given-names>C.M.</given-names></name></person-group><source>Pattern Recognition and Machine Learning</source><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2006</year></citation></ref>
<ref id="b29-sensors-12-06117"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Si</surname><given-names>J.</given-names></name><name><surname>Wang</surname><given-names>Y.T.</given-names></name></person-group><article-title>Online learning control by association and reinforcement</article-title><source>IEEE Trans. Neural Netw.</source><year>2001</year><volume>12</volume><fpage>264</fpage><lpage>276</lpage><pub-id pub-id-type="doi">10.1109/72.914523</pub-id><pub-id pub-id-type="pmid">18244383</pub-id></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures</title>
<fig id="f1-sensors-12-06117" position="float">
<label>Figure 1.</label>
<caption>
<p>The cart-pole system.</p></caption>
<graphic xlink:href="sensors-12-06117f1.gif"/></fig>
<fig id="f2-sensors-12-06117" position="float">
<label>Figure 2.</label>
<caption>
<p>State profiles of the cart-pole system with the proposed control strategy.</p></caption>
<graphic xlink:href="sensors-12-06117f2.gif"/></fig></sec></back></article>
