Article

Dimensionless Policies Based on the Buckingham π Theorem: Is This a Good Way to Generalize Numerical Results?

Département de génie mécanique, Faculté de génie, Université de Sherbrooke, 2500, boul. de l’Université, Sherbrooke, QC J1K 2R1, Canada
Mathematics 2024, 12(5), 709; https://doi.org/10.3390/math12050709
Submission received: 22 December 2023 / Revised: 25 February 2024 / Accepted: 26 February 2024 / Published: 28 February 2024
(This article belongs to the Special Issue Dynamics and Control Theory with Applications)

Abstract

The answer to the question posed in the title is yes if the context (the list of variables defining the motion control problem) is dimensionally similar. This article explores the use of the Buckingham π theorem as a tool to encode the control policies of physical systems into a more generic form of knowledge that can be reused in various situations. This approach can be interpreted as enforcing invariance to the scaling of the fundamental units in an algorithm learning a control policy. First, we show, by restating the solution to a motion control problem using dimensionless variables, that (1) the policy mapping involves a reduced number of parameters and (2) control policies generated numerically for a specific system can be transferred exactly to a subset of dimensionally similar systems by scaling the input and output variables appropriately. Those two generic theoretical results are then demonstrated, with numerically generated optimal controllers, for the classic motion control problem of swinging up a torque-limited inverted pendulum and positioning a vehicle in slippery conditions. We also discuss the concept of regime, a region in the space of context variables, that can help to relax the similarity condition. Furthermore, we discuss how applying dimensional scaling of the input and output of a context-specific black-box policy is equivalent to substituting new system parameters in an analytical equation under some conditions, using a linear quadratic regulator (LQR) and a computed torque controller as examples. It remains to be seen how practical this approach can be to generalize policies for more complex high-dimensional problems, but the early results show that it is a promising transfer learning tool for numerical approaches like dynamic programming and reinforcement learning.

1. Introduction

To solve challenging motion control problems in robotics (locomotion, manipulation, vehicle control, etc.), many approaches now include a type of mathematical optimization that has no closed-form solution and that is solved numerically, either online (trajectory optimization [1], model predictive control [2], etc.) or offline (reinforcement learning [3]). Numerical tools, however, have a major drawback compared to simpler analytical approaches: the parameters of the problem do not appear explicitly in the solutions, which makes it much harder to generalize and reuse the results. Analytical solutions to control problems have the useful property of allowing the solution to be adjusted to different system parameters by simply substituting the new values in the equation. For instance, an analytical feedback law solution to a robot motion control problem can be transferred to a similar system by adjusting the values of the parameters (lengths, masses, etc.) in the equation. However, with a reinforcement learning solution, we would have to re-conduct all the training, implying (generally) multiple hours of data collection and/or computation. It would be a great asset to have the ability to adjust black-box numerical solutions with respect to some problem parameters.
In this paper, we explore the concept of dimensionless policies, a more generic form of knowledge conceptually illustrated in Figure 1, as an approach to generalize numerical solutions to motion control problems. First, in Section 2, we use dimensional analysis (i.e., the Buckingham π theorem [4]) to show that motion control problems with dimensionally similar context variables must share the same feedback law solution when expressed in a dimensionless form, and discuss the implications. Two main generic theoretical results, relevant for any physically meaningful control policies, are presented as Theorems 1 and 2. Then, in Section 3, we present two case studies with numerical results. Optimal feedback laws computed with a dynamic programming algorithm are used to demonstrate the theoretical results and their relevance for (1) the classical motion control problem of swinging up an inverted pendulum in Section 3.1 and (2) a car motion control problem in Section 3.2. Furthermore, in Section 4, we illustrate—with two examples—how the proposed dimensional scaling is equivalent to changing parameters in an analytical solution.
A very promising application of the concept of dimensionless policies is to empower reinforcement learning schemes, for which data efficiency is critical [5]. For instance, it would be interesting to use the data of all the vehicles on the road, even if they are of varying dimensions and dynamic characteristics, to learn appropriate maneuvers in situations that occur very rarely. This idea of reusing data or results in a different context is usually called transfer learning [6] and has received a great deal of research attention, mostly targeted at applying a learned policy to new tasks. The more specific idea of transferring policies and data between systems/robots has also been explored, with schemes based on modular blocks [7], invariant features [8], a dynamic map [9], a vector representation of each robot hardware [10], or using tools from adaptive control [11] and robust control [12]. Dimensionless numbers and dimensional analysis form a technique, based on the idea that physically meaningful relationships should not depend on the choice of units, that can be used for analyzing many physical problems [4,13,14]. The most well-known application, in the field of fluid mechanics, is the idea of matching dimensionless ratios (i.e., Reynolds, Prandtl, or Mach numbers) to allow experimental results to be generalized between systems of various scales. The recent success of machine learning and data-driven schemes highlights the question of generalizing results, and there is renewed interest in using dimensional analysis in the context of learning [15,16,17]. In this paper, we present an initial exploration of how dimensional analysis can be applied specifically to help generalize policy solutions for motion control problems involving physically meaningful variables like force, length, mass, and time.

2. Dimensionless Policies

In the following section, we develop the concept of dimensionless policies based on the Buckingham π theorem and present generic theoretical results that are relevant for any type of physically meaningful control policy.

2.1. Context Variables in the Policy Mapping

Here, we call a feedback law a mapping f, specific to a given system, from a vector space representing the state x of the dynamic system to a vector space representing the control inputs u of the system:
u = f(x)
Under some assumptions (fully observable systems, additive costs, and infinite time horizon), the optimal feedback law is guaranteed to be in this state feedback form [18]. We will only consider motion control problems that lead to such time-independent feedback laws in the following analysis. To consider the question of how this system-specific feedback law can be transferred to a different context, it is useful to think about a higher-dimensional mapping π, herein referred to as a policy, that also takes a vector of variables c describing the context as an additional input argument, as illustrated in Figure 2.
Definition 1. 
A policy is defined as the solution to a motion control problem in the form of a function computing the control inputs u from the system states x and context parameters c as follows:
u = \pi(x, c)
\text{with } u \in \mathbb{R}^k,\; x \in \mathbb{R}^n,\; c \in \mathbb{R}^m
where k is the dimension of the control input vector, n is the dimension of the dynamic system state vector, and m is the dimension of the vector of context parameters.
The context c is a vector of relevant parameters defining the motion control problem, i.e., parameters that affect the feedback law solution. The policy π is thus a mapping consisting of the feedback law solutions for all possible contexts. In Section 3.1, a case study is conducted by considering the optimal feedback law for swinging up a torque-limited inverted pendulum. For this example, the context variables are the pendulum mass m, the gravitational constant g, and the length l, as well as what we call task parameters: a weight parameter in the cost function q and a constraint τ_max on the maximum input torque. For a given pendulum state, the optimal torque is also a function of the context variables; i.e., the solution is different if the pendulum is heavier or more torque-limited.
Definition 2. 
A feedback law with a subscript letter a is defined as the solution to a motion problem for a specific situation defined by an instance of context variables c a , as follows:
f_a(x) = \pi(x, c = c_a) \quad \forall x
The feedback law f a thus represents a slice of the global policy when the context variables are fixed at c a values as illustrated in Figure 3.
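As a minimal illustration of Definitions 1 and 2 (not part of the original formulation; the names below are hypothetical), a policy can be represented in code as a function of the state and the context, and a feedback law as the same function with the context frozen:

```python
from typing import Callable
import numpy as np

# Definition 1: a policy maps (state x, context c) to a control input u.
Policy = Callable[[np.ndarray, np.ndarray], np.ndarray]

def feedback_law(pi: Policy, c_a: np.ndarray) -> Callable[[np.ndarray], np.ndarray]:
    """Definition 2: the feedback law f_a is the slice of the policy at context c_a."""
    return lambda x: pi(x, c_a)
```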
The goal of generalizing a feedback law to a different context can thus be formulated into the following question: if a feedback law f a is known for a context described by variables c a , can this knowledge help us to deduce the policy solution in a different context, namely c b ?
\pi(x, c = c_a) = f_a(x) \qquad \Rightarrow \qquad \pi(x, c = c_b) = \;?
Using the Buckingham π theorem [4], we will show that, if the context is dimensionally similar, then both feedback laws must be equal up to scaling factors (Theorem 2).

2.2. Buckingham π Theorem

The Buckingham π theorem [4] is a tool based on dimensional analysis [13,14] that enables restating a relationship involving multiple physically meaningful dimensional variables using fewer dimensionless variables:
x_1 = f(x_2, \ldots, x_n) \qquad \Rightarrow \qquad \Pi_1 = f^*(\Pi_2, \ldots, \Pi_p)
If d fundamental dimensions are involved in the n dimensional variables (for instance, time [T], length [L], and mass [M]), then the number of required dimensionless variables, often called Π groups, is p ≥ n − d. In most situations, the number of variables in the relationship can be reduced directly by the number of fundamental dimensions involved, and p = n − d. The Buckingham π theorem provides a methodology to generate the Π groups; however, the choice of Π groups is not unique. The approach is to select (arbitrarily) d variables involving the d fundamental dimensions independently, called the repeated variables. Then, the Π groups are generated by multiplying all the other variables by the repeated variables exponentiated by rational exponents selected to make the group dimensionless. Assuming x_1, ..., x_d are the selected repeated variables, the Π groups are
\Pi_i = x_{d+i}\; \underbrace{x_1^{e_{1i}}\, x_2^{e_{2i}} \cdots x_d^{e_{di}}}_{\text{repeated variables}} \qquad i = \{1, \ldots, p\}
Finding the correct exponents to make all groups dimensionless can be formulated as solving a linear system of d equations. We refer to previous literature for more details on the theorem, and here use it specifically on the defined concept of policy map.
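As a sketch of this procedure (not taken from the paper; the helper name is hypothetical), the exponents of one Π group can be obtained by solving the linear system formed by the dimensional matrix. The variables used below are those of the pendulum example of Section 3.1.

```python
import numpy as np

# Dimensional matrix: one column per variable, one row per fundamental dimension.
#              tau   theta_dot   m*g*l   omega
dim_matrix = np.array([
    [1,  0,  1,  0],    # M
    [2,  0,  2,  0],    # L
    [-2, -1, -2, -1],   # T
])

def pi_group_exponents(dim_matrix, target_col, repeated_cols):
    """Exponents e such that variable[target] * prod(repeated**e) is dimensionless."""
    A = dim_matrix[:, repeated_cols]           # dimensions of the repeated variables
    b = -dim_matrix[:, target_col]             # exponents must cancel the target's dimensions
    e, *_ = np.linalg.lstsq(A, b, rcond=None)  # solve the linear system of d equations
    return e

# tau (column 0) scaled by the repeated variables m*g*l (column 2) and omega (column 3):
print(pi_group_exponents(dim_matrix, target_col=0, repeated_cols=[2, 3]))
# approximately [-1, 0], i.e., tau* = tau * (m*g*l)^-1, matching Pi_1 = tau / (m*g*l)
```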

2.3. Dimensional Analysis of the Policy Mapping

If a policy is physically meaningful (for example, a policy that computes a force based on position and velocity, but not a policy for playing chess), we can use the Buckingham π theorem to simplify the policy in dimensionless form.
Theorem 1. 
If a policy is physically meaningful and all its variables involve d fundamental dimensions that are independently present in the context variables c, then the policy can be restated in a dimensionless form as follows:
u = \pi(x, c) \qquad \Rightarrow \qquad u^* = \pi^*(x^*, c^*)
u \in \mathbb{R}^k,\; x \in \mathbb{R}^n,\; c \in \mathbb{R}^m \qquad \Rightarrow \qquad u^* \in \mathbb{R}^k,\; x^* \in \mathbb{R}^n,\; c^* \in \mathbb{R}^{(m-d)}
where the dimensionless variables can be related to dimensional variables using transformation matrices that depend only on the context variables as follows:
u^* = T_u(c)\, u
x^* = T_x(c)\, x
c^* = T_c(c)\, c
Furthermore, the transformation matrices can be used to relate the dimensional and dimensionless policy as follows:
\pi(x, c) = T_u^{-1}(c)\; \pi^*\big( T_x(c)\, x,\; T_c(c)\, c \big)
Proof. 
For a system with k control inputs, we can treat the policy as k mappings from states and context variables to each scalar control input u j :
u_j = \pi_j\big( x_1, \ldots, x_n, c_1, \ldots, c_m \big)
where Equation (14) is the jth line of the policy in vector form, as described by Equation (2). Then, if the state vector is defined by n variables, and the context is defined by m (system and task) parameters, then each mapping π_j is a relation between 1 + n + m variables. Under the assumption that the policy involves physically meaningful variables, and that it is invariant under an arbitrary scaling of any fundamental dimension (i.e., independent of a system of units), we can apply the Buckingham π theorem [4] to conclude that, if d dimensions are involved in all the variables, then Equation (14) can be restated as an equivalent relationship between p dimensionless Π groups, where p ≥ 1 + n + m − d. Assuming that d dimensions are involved in the m context variables, and that we are in the typical scenario where maximum reduction is possible (p = 1 + n + m − d), we can select d context variables {c_1, c_2, ..., c_d} as the basis (the repeated variables) to scale all other variables into dimensionless Π groups. We denote the dimensionless Π groups as the base variables with an asterisk (*), as follows:
u_j^* = u_j\; c_1^{e_{1j}^u}\, c_2^{e_{2j}^u} \cdots c_d^{e_{dj}^u} \qquad j = \{1, \ldots, k\}
x_i^* = x_i\; c_1^{e_{1i}^x}\, c_2^{e_{2i}^x} \cdots c_d^{e_{di}^x} \qquad i = \{1, \ldots, n\}
c_i^* = c_i\; c_1^{e_{1i}^c}\, c_2^{e_{2i}^c} \cdots c_d^{e_{di}^c} \qquad i = \{d+1, \ldots, m\}
where exponents e i j are rational numbers selected to make all equations dimensionless. We can then define transformation matrices and write Equations (15)–(17) in a vector form where the repeated variables are grouped into matrices defined as the following:
\begin{bmatrix} u_1^* \\ \vdots \\ u_k^* \end{bmatrix} = \underbrace{\begin{bmatrix} c_1^{e_{11}^u} c_2^{e_{21}^u} \cdots c_d^{e_{d1}^u} & & 0 \\ & \ddots & \\ 0 & & c_1^{e_{1k}^u} c_2^{e_{2k}^u} \cdots c_d^{e_{dk}^u} \end{bmatrix}}_{T_u(c)} \begin{bmatrix} u_1 \\ \vdots \\ u_k \end{bmatrix}
\begin{bmatrix} x_1^* \\ \vdots \\ x_n^* \end{bmatrix} = \underbrace{\begin{bmatrix} c_1^{e_{11}^x} c_2^{e_{21}^x} \cdots c_d^{e_{d1}^x} & & 0 \\ & \ddots & \\ 0 & & c_1^{e_{1n}^x} c_2^{e_{2n}^x} \cdots c_d^{e_{dn}^x} \end{bmatrix}}_{T_x(c)} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\underbrace{\begin{bmatrix} c_{d+1}^* \\ \vdots \\ c_m^* \end{bmatrix}}_{c^*} = \underbrace{\begin{bmatrix} 0 & \cdots & 0 & c_1^{e_{1(d+1)}^c} \cdots c_d^{e_{d(d+1)}^c} & & 0 \\ \vdots & & \vdots & & \ddots & \\ 0 & \cdots & 0 & 0 & & c_1^{e_{1m}^c} \cdots c_d^{e_{dm}^c} \end{bmatrix}}_{T_c(c)} \underbrace{\begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix}}_{c}
which correspond to Equations (10)–(12). The matrices T_u and T_x are square diagonal matrices; Equations (10) and (11) are thus invertible (unless a repeated variable is equal to zero) and can be used to go back and forth between dimensional and dimensionless state and input variables. The matrix T_c consists of a block of d columns of zeros, followed by a diagonal block of dimensions (m − d) × (m − d), and Equation (12) is not invertible. For a given context c, there is only one dimensionless context c*; however, a given dimensionless context c* may correspond to multiple dimensional contexts c.
Then, the Buckingham π theorem indicates that the relationship described by Equation (14) can be restated as a relationship between the Π groups involving d fewer variables, which, based on the selected repeated variables, corresponds to
u_j^* = \pi_j^*\big( x_1^*, \ldots, x_n^*, c_{d+1}^*, \ldots, c_m^* \big)
By applying the same procedure to all control inputs, we can then assemble all k mappings back into a vector form, as follows:
\underbrace{\begin{bmatrix} u_1^* \\ \vdots \\ u_k^* \end{bmatrix}}_{u^*} = \pi^*\Bigg( \underbrace{\begin{bmatrix} x_1^* \\ \vdots \\ x_n^* \end{bmatrix}}_{x^*},\; \underbrace{\begin{bmatrix} c_{d+1}^* \\ \vdots \\ c_m^* \end{bmatrix}}_{\text{context } c^*} \Bigg)
corresponding to Equation (8). Finally, based on the defined transformations in Equations (10)–(12), we can relate the dimensional policy to the dimensionless version as follows:
\underbrace{\pi(x, c)}_{u} = T_u^{-1}(c)\; \underbrace{\pi^*\big( \underbrace{T_x(c)\, x}_{x^*},\; \underbrace{T_c(c)\, c}_{c^*} \big)}_{u^*}
which corresponds to Equation (13). □

2.4. Transferring Feedback Laws between Similar Systems

Based on the dimensional analysis, we can demonstrate that any feedback law can be generalized to a different context under the condition of dimensional similarity. In this section, we show that a feedback law can be transferred exactly to another motion control problem by scaling the input and output of the function based on matrices that can be computed using the dimensional analysis. The salient feature of this result is that the conditions are very generic; even a black-box discontinuous non-linear policy (such as those obtained using deep-reinforcement learning algorithms) can be transferred this way. The limitation is that the condition for an exact transfer is having equal dimensionless context variables c .
First, it is useful to define dimensionless feedback laws that correspond to specific cases of the dimensionless policy, as we defined for the dimensional mapping.
Definition 3. 
We denote by f_a^* the dimensionless feedback law, i.e., the global dimensionless policy for a specific instance of context variables c_a, as follows:
f_a^*(x^*) = \pi^*(x^*, c^* = c_a^*) \quad \forall x^*
where c_a^* is the dimensionless version of the context variable instance c_a, and is equal to
c_a^* = T_c(c_a)\, c_a
Lemma 1. 
Two feedback laws, which are solutions to the same motion control problem for two instances of context variables, will be equal in dimensionless form if they share the same dimensionless context:
f_b^*(x^*) = f_a^*(x^*) \quad \forall x^* \qquad \text{if } c_a^* = c_b^*
Proof. 
This follows from the definition:
f_a^*(x^*) = \pi^*(x^*, c^* = c_a^*)
f_b^*(x^*) = \pi^*(x^*, c^* = c_b^*)
f_a^*(x^*) = f_b^*(x^*) \quad \text{since } c_a^* = c_b^* \qquad \square
Lemma 2. 
In a specific context described by variables c a , a dimensional feedback law can be restated in dimensionless form, and vice versa, by scaling the input and the output using the defined transformation matrices T x ( c a ) and T u ( c a ) as follows:
f_a(x) = T_u^{-1}(c_a)\, f_a^*\big( T_x(c_a)\, x \big)
f_a^*(x^*) = T_u(c_a)\, f_a\big( T_x^{-1}(c_a)\, x^* \big)
Proof. 
Starting from Equation (13) and substituting c with a specific instance c_a, then substituting the policy maps on each side with the feedback laws f_a and f_a^* based on the definitions, we obtain Equation (30):
\pi(x, c_a) = T_u^{-1}(c_a)\; \pi^*\big( T_x(c_a)\, x,\; T_c(c_a)\, c_a \big)
f_a(x) = T_u^{-1}(c_a)\, f_a^*\big( T_x(c_a)\, x \big)
Then, starting from the right side of Equation (31) and substituting the function f_a with Equation (30), the matrices reduce to identity matrices and we obtain Equation (31):
T_u(c_a)\, f_a\big( T_x^{-1}(c_a)\, x^* \big) = T_u(c_a)\, T_u^{-1}(c_a)\, f_a^*\big( T_x(c_a)\, T_x^{-1}(c_a)\, x^* \big)
T_u(c_a)\, f_a\big( T_x^{-1}(c_a)\, x^* \big) = f_a^*(x^*) \qquad \square
Theorem 2. 
If a feedback law f a is known—for instance, as the result of a numerical algorithm—and this is the solution to a motion control problem with context variables c a , we can compute the solution f b to the same motion control problem for different context variables c b by scaling the input and output of f a as follows:
f_b(x) = T_u^{-1}(c_b)\, T_u(c_a)\, f_a\big( T_x^{-1}(c_a)\, T_x(c_b)\, x \big) \quad \forall x
if contexts c a and c b are dimensionally similar, i.e., if the following condition is true:
T_c(c_b)\, c_b = T_c(c_a)\, c_a
Proof. 
First, f_b can be written based on its dimensionless form f_b^* in the context c_b using Equation (30) from Lemma 2. Also, based on Lemma 1, under the similarity condition, i.e., c_b^* = c_a^* or equivalently T_c(c_b) c_b = T_c(c_a) c_a, we have that f_b^* is equal to f_a^*. Finally, f_a^* can be written based on its dimensional form f_a in the context c_a, using Equation (31) from Lemma 2, as follows:
f_b(x) = T_u^{-1}(c_b)\, f_b^*\big( T_x(c_b)\, x \big)
f_b(x) = T_u^{-1}(c_b)\, f_a^*\big( T_x(c_b)\, x \big) \qquad \text{if } c_b^* = c_a^*
f_b(x) = T_u^{-1}(c_b)\, T_u(c_a)\, f_a\big( T_x^{-1}(c_a)\, T_x(c_b)\, x \big) \qquad \square
The idea is summarized in Figure 4. To transfer a feedback law, we must first extract the dimensionless form, a more generic form of knowledge, and then scale it back to the new context.
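The scaling of Equation (36) can be applied to any black-box feedback law. The following sketch (a hypothetical helper, assuming the transformation matrices have already been computed and that the two contexts are dimensionally similar) wraps a known feedback law f_a into the transferred law f_b:

```python
import numpy as np

def transfer_feedback_law(f_a, Tu_a, Tx_a, Tu_b, Tx_b):
    """Theorem 2 / Eq. (36): f_b(x) = Tu_b^-1 Tu_a f_a(Tx_a^-1 Tx_b x).
    Only valid when both contexts share the same dimensionless context c*.
    x and f_a(x) are expected to be 1-D NumPy arrays."""
    def f_b(x):
        x_seen_by_a = np.linalg.solve(Tx_a, Tx_b @ x)   # state of b -> dimensionless -> state of a
        u_a = f_a(x_seen_by_a)                          # query the known (possibly black-box) law
        return np.linalg.solve(Tu_b, Tu_a @ u_a)        # input of a -> dimensionless -> input of b
    return f_b
```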

2.5. Dimensionally Similar Contexts

Equation (36) can be used to scale a policy for an exact transfer of policy solutions between contexts c sharing the same dimensionless context c*, a condition that is referred to as being dimensionally similar. Equation (12) is a mapping from an m-dimensional space to an (m − d)-dimensional space, and its inverse has multiple solutions. A given dimensionless context c* corresponds to a subset of all possible values of the dimensional context c. As illustrated in Figure 5 and Figure 6 with low-dimensional examples (m = 2 and d = 1), the subsets of contexts c leading to the same c* can be linear if c* is just a ratio of two variables of the same dimensions, or a non-linear curve if c* involves exponents leading to a more complex polynomial relationship. In general, when the context c involves many dimensions, it is important to note that the similarity condition means meeting multiple conditions (one for each element of the vector c*) in a higher dimensional space, as illustrated in Figure 7 for the pendulum swing-up example that is studied in the next section. To some degree, this dimensionally similar context condition is a technique to regroup the motion control problems that are the same up to scaling factors. Therefore, it is also logical that their solutions should be equivalent up to scaling factors.

2.6. Summary of the Theoretical Results

The dimensional analysis leads us to the following relevant theoretical results, which are very generic since no assumptions on the form of the policy function are necessary:
  • The global problem of learning π(x, c), i.e., the feedback policies for all possible contexts, is simplified in the dimensionless form π*(x*, c*) because we can remove d input dimensions from the unknown mapping (typically, d would be 2 or 3 for controlling a physical system involving time, force, and length); see Theorem 1.
  • The feedback law solutions within a dimensionally similar subset of contexts share the exact same solution when restated in a dimensionless form; see Lemma 1.
  • A feedback law, which is a solution to a motion control problem in a context, can be transferred exactly to another context, under a condition of dimensional similarity, by appropriately scaling its inputs and outputs; see Theorem 2.
Just for illustration purposes, let us imagine we have a policy for a spherical submarine where the context is defined by a velocity, a viscosity, and a radius. In dimensionless form, we would find that the context can be described by a single variable, the Reynolds number, and that (1) learning the policy will be easier in dimensionless form because it is a function of a lesser number of variables and (2) if we know the feedback law solution for a specific context of velocity, viscosity, and radius, then we can actually re-use it for multiple versions of the same motion control problem sharing the same Reynolds number.

3. Case Studies with Numerical Results

In this section, we use numerically generated optimal policy solutions for two motion control problems as examples illustrating the salient features of the presented theoretical results of Section 2 and the potential for transfer learning.

3.1. Optimal Pendulum Swing-Up Task

The first numerical example is the classical pendulum swing-up task. This example illustrates that an optimal feedback law in the form of a table look-up generated for a pendulum of a given mass and length can be transferred to a pendulum of a different mass and length if the motion control problem is dimensionally similar. The example is also used to introduce the concept of regime for motion control problems.

3.1.1. Motion Control Problem

The motion control problem is defined here as finding a feedback law for controlling the dynamic system described by the following differential equation:
m l^2\, \ddot{\theta} - m g l \sin\theta = \tau
which minimizes the infinite horizon quadratic cost function provided by
J = \int_0^\infty \left( q^2 \theta^2 + \tau^2 \right) dt
subject to input constraints provided by
-\tau_{max} \leq \tau \leq \tau_{max}
Note that, here, (1) the cost function parameter q has a power of two to allow its value to be in units of torque; (2) it was chosen not to penalize high velocity values for simplicity; (3) the weight multiplying the torque is set to one without a loss of generality as only the relative values of weights impact the optimal solution; and (4) all parameters are time-independent constants. Thus, assuming that there are no hidden variables and that Equations (41)–(43) fully describe the problem, the solution—i.e., the optimal policy for all contexts—involves the variables listed in Table 1, and should be of the form provided by
\underbrace{\tau}_{\text{inputs}} = \pi\big( \underbrace{\theta, \dot{\theta}}_{\text{states}},\; \underbrace{\overbrace{m, g, l}^{\text{system parameters}},\; \overbrace{q, \tau_{max}}^{\text{task parameters}}}_{\text{context } c} \big)
It is interesting to note that, while there are three system parameters, m, g, and l, they only appear independently in two groups in the dynamic equation. We can thus consider only two system parameters. For convenience, we selected mgl, corresponding to the maximum static gravitational torque (i.e., when the pendulum is horizontal), and the natural frequency ω = √(g/l), as listed in Table 2.

3.1.2. Dimensional Analysis

Here, we have one control input, two states, two system parameters, and two task parameters, for a total of 1 + (n = 2) + (m = 4) = 7 variables involved. In those variables, only d = 2 independent dimensions ([M L^2 T^{-2}] and [T^{-1}]) are present. Using c_1 = mgl and c_2 = ω as the repeated variables leads to the following dimensionless groups:
\Pi_1 = \tau^* = \frac{\tau}{m g l} \qquad \frac{[M L^2 T^{-2}]}{[M][L T^{-2}][L]}
\Pi_2 = \theta^* = \theta \qquad [\,]
\Pi_3 = \dot{\theta}^* = \frac{\dot{\theta}}{\omega} \qquad \frac{[T^{-1}]}{[T^{-1}]}
\Pi_4 = \tau_{max}^* = \frac{\tau_{max}}{m g l} \qquad \frac{[M L^2 T^{-2}]}{[M][L T^{-2}][L]}
\Pi_5 = q^* = \frac{q}{m g l} \qquad \frac{[M L^2 T^{-2}]}{[M][L T^{-2}][L]}
All three torque variables ( τ , q, and τ m a x ) are scaled by the maximum gravitational torque, and the pendulum velocity variable is scaled by the natural pendulum frequency. The transformation matrices are thus
\tau^* = \underbrace{\Big[ \tfrac{1}{m g l} \Big]}_{T_u}\, \tau
\begin{bmatrix} \theta^* \\ \dot{\theta}^* \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1/\omega \end{bmatrix}}_{T_x} \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}
\underbrace{\begin{bmatrix} q^* \\ \tau_{max}^* \end{bmatrix}}_{c^*} = \underbrace{\begin{bmatrix} 0 & 0 & 1/(m g l) & 0 \\ 0 & 0 & 0 & 1/(m g l) \end{bmatrix}}_{T_c} \underbrace{\begin{bmatrix} m g l \\ \omega \\ q \\ \tau_{max} \end{bmatrix}}_{c}
By applying the Buckingham π theorem [4], Equation (44) can be restated as a relationship between the five dimensionless Π groups:
\tau^* = \pi^*\big( \theta^*, \dot{\theta}^*, q^*, \tau_{max}^* \big)
According to the results of Section 2, for dimensionally similar swing-up contexts (meaning those with equal q* and τ_max* ratios), the optimal feedback laws should be equivalent in their dimensionless forms. In other words, the optimal policy f_a, found in the specific context c_a = [m_a, l_a, g_a, q_a, τ_max,a], and the optimal policy f_b, in a second context, c_b = [m_b, l_b, g_b, q_b, τ_max,b], are equal when restated in dimensionless form: f_a* = f_b* if q_a* = q_b* and τ_max,a* = τ_max,b*. Furthermore, f_b can be obtained from f_a or vice versa using the scaling formula provided by Equation (36) if this condition is met. However, if q_a* ≠ q_b* or τ_max,a* ≠ τ_max,b*, then f_a cannot provide us with information on f_b without additional assumptions. Figure 7 illustrates that, for the pendulum swing-up problem, the similarity condition can be represented as a line in a three-dimensional space spanned by three of the dimensional context variables. Each condition of equal values of q* and τ_max* is a plane in this space, and the intersection of the two planes is the subset of contexts meeting the two conditions. Also, it is interesting to note that the fourth context variable ω is not an additional axis here because it is not involved in Equation (52).

3.1.3. Numerical Results

Here, we use a numerical algorithm (methodological details are presented in Section 3.1.5) to compute numerical solutions to the motion control problem defined by Equations (41)–(43). The algorithm computes feedback laws in the form of look-up tables, based on a discretized grid of the state space. The optimal (up to discretization errors) feedback laws are computed for nine instances of context variables, which are listed in Table 3. In those nine contexts, there are three subsets of three dimensionally similar contexts. Also, each subset includes the same three pendulums: a regular pendulum, one that is two times longer, and one that is twice as heavy (as illustrated in Figure 1). Contexts c a , c b , and c c describe a task where the torque is limited to half the maximum gravitational torque. Contexts c d , c e , and c f describe a task where the application of large torques is highly penalized by the cost function. Contexts c g , c h , and c i describe a task where position errors are highly penalized by the cost function.
Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16 illustrate that, for each subset with an equal dimensionless context, the dimensional feedback laws generated look numerically very similar. They are similar up to the scaling of their axes, if we neglect slight differences due to discretization errors. Furthermore, the figures also illustrate that the dimensionless versions of the feedback laws (f*), computed using Equation (31), are equal within each dimensionally similar subset. These were the expected results predicted by the dimensional analysis presented in Section 2.
In terms of how this can be applied in a practical scenario, we see that, if we compute the feedback law provided in Figure 8a, we can obtain the feedback law provided in Figure 9a directly by scaling the original policy with Equation (36) using the appropriate context variables, without having to recompute it. In some sense, Equation (36) provides us with the ability to adjust the feedback law spontaneously to conform with new system parameters mgl or ω, as would be the case with an analytical solution, even when working with a black-box mapping in the form of a table look-up. However, the equivalence of the scaled solution is only guaranteed within a dimensionally similar context subset, which is the main limitation of this approach. The feedback law provided in Figure 8a cannot be scaled into the feedback law provided in Figure 14a, for instance, since τ_max* and q* are not equal. It is also interesting to note that trajectory solutions from the same starting point, as well as the computed cost-to-go functions (not illustrated), are also all equivalent, up to scaling factors, within similar subgroups. Hence, optimal trajectories and cost-to-go solutions could also be shared and transferred between similar systems using the same technique that we demonstrate here for feedback laws.
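As a concrete sketch of this transfer for the swing-up problem (illustrative values only, assuming the task parameters q and τ_max are also scaled with mgl so that q* and τ_max* match; the placeholder f_a stands in for the look-up-table policy of Figure 8a), the scaling of Equation (36) reduces to multiplying the output by G_b/G_a and rescaling the velocity input by ω_a/ω_b:

```python
import numpy as np

# Context a: reference pendulum; context b: same dimensionless context, twice the length.
m_a, l_a, g = 1.0, 1.0, 9.81
m_b, l_b = 1.0, 2.0
G_a, w_a = m_a * g * l_a, np.sqrt(g / l_a)   # mgl and natural frequency for context a
G_b, w_b = m_b * g * l_b, np.sqrt(g / l_b)   # mgl and natural frequency for context b

def scale_swingup_policy(f_a):
    """Eq. (36) specialized to the pendulum: tau_b = (G_b/G_a) * f_a(theta, (w_a/w_b)*theta_dot)."""
    return lambda theta, theta_dot: (G_b / G_a) * f_a(theta, (w_a / w_b) * theta_dot)

# f_a could be an interpolator over the dynamic-programming table; a placeholder is used here.
f_a = lambda theta, theta_dot: np.clip(-3.0 * theta - 1.0 * theta_dot, -0.5 * G_a, 0.5 * G_a)
f_b = scale_swingup_policy(f_a)
print(f_b(0.3, -0.5))   # torque command for pendulum b, obtained without recomputation
```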

3.1.4. Regimes of Solutions

In some situations, changing a context variable will not have any effect on the optimal policy. For instance, for a torque-limited optimal pendulum swing-up problem, augmenting τ_max* or q* while keeping the other value fixed will have little effect above a given threshold. For instance, if we look at the solutions for contexts c_d, c_e, and c_f, using a large amount of torque is so highly penalized by the cost function that the saturation limit does not have much impact on the solution (except for edge cases on the boundary). Thus, we would expect that augmenting τ_max would not change the solution. Figure 17 and Figure 18 show a slice (to allow for visualization) of the dimensionless optimal policy solution for various contexts. Figure 17 illustrates the results of changing τ_max* while keeping q* fixed. We can see that, when τ_max* < 0.3, the policy is almost always at the min–max allowable torque values; this behavior is often called bang–bang. At the other extreme, when τ_max* > 2.5, the policy solution is continuous and almost never affected by the saturation. Figure 18 illustrates the results of changing q* while keeping τ_max* fixed. We can see that, when q* < 0.1, the optimal policy solution does not reach min–max saturation, while, when q* > 1.0, the policy is almost always at the min–max allowable values.
We can see that, for extreme context values, two types of behavior occur, illustrated as regions in the dimensionless context space in Figure 19. Those regions are best characterized by a ratio of q and τ_max, a new dimensionless value R that we define as the ratio of the maximum torque saturation τ_max over the weight parameter in the cost function q:
R = \frac{\tau_{max}}{q} = \frac{\tau_{max}^*}{q^*}
When R ≈ 1, the policy solution is partially continuous and reaches the min–max values in some regions of the state space; this is a behavior we call the transition regime. When R ≪ 1, the constraint on torque drives the solution to exhibit bang–bang behavior. In this region (that we approximate here, based on our sensitivity analysis, as R ≤ 0.1), the global policy is only a function of τ_max*:
\pi^*\big( \theta^*, \dot{\theta}^*, q^*, \tau_{max}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, \tau_{max}^* \big) \qquad \text{if } R \ll 1
i.e., the value of q* does not affect the solution. On the other hand, when R ≫ 1, the policy is unconstrained. In this region (that we approximate here, based on our sensitivity analysis, as R ≥ 10), the global policy is only a function of q* since the constraint is so far away:
\pi^*\big( \theta^*, \dot{\theta}^*, q^*, \tau_{max}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, q^* \big) \qquad \text{if } R \gg 1
The concept of regime is often leveraged in fluid mechanics. It allows us to generalize results between situations where the relevant dimensionless numbers do not match exactly. For instance, when the Mach number is small (Ma < 0.3), we can generally assume an incompressible regime where various speeds of sound would not change the behavior much. Here, for the purpose of transferring policy solutions between contexts, this means that the condition of having the same exact dimensionless context variables can be relaxed with an inequality that corresponds to a regime. For instance, if we have two contexts in the unconstrained regime, it is sufficient to match only q* to create equivalent dimensionless policies.
Proposition 1. 
If it is assumed that Equation (56) holds, the condition of having equivalent dimensionless feedback laws is relaxed to an inequality for one of the context variables as follows:
f_a^*\big( \theta^*, \dot{\theta}^* \big) \approx f_b^*\big( \theta^*, \dot{\theta}^* \big)
\text{if } q_a^* = q_b^* \;\text{ and }\; R_a \gg 1 \;\text{ and }\; R_b \gg 1
Proof. 
First, if R_a ≫ 1 and R_b ≫ 1, then, from Equation (56), we can approximate the policy not to be a function of τ_max*:
f_a^*\big( \theta^*, \dot{\theta}^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, q_a^*, \tau_{max,a}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, q_a^* \big)
f_b^*\big( \theta^*, \dot{\theta}^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, q_b^*, \tau_{max,b}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, q_b^* \big)
Hence, if q_a^* = q_b^*, we have the following:
f_a^*\big( \theta^*, \dot{\theta}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, q_a^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, q_b^* \big) \approx f_b^*\big( \theta^*, \dot{\theta}^* \big) \qquad \square
Also, for two contexts in the bang–bang regime, it is sufficient to match only τ_max* to have equivalent dimensionless policies.
Proposition 2. 
If it is assumed that Equation (55) holds, the condition of having equivalent dimensionless feedback laws is relaxed to an inequality for one of the context variables as follows:
f a ( θ , θ ˙ ) f b ( θ , θ ˙ )
if τ m a x , a = τ m a x , b and R a 1 and R b 1
Proof. 
First, if R_a ≪ 1 and R_b ≪ 1, then, from Equation (55), we can approximate the policy not to be a function of q*:
f_a^*\big( \theta^*, \dot{\theta}^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, q_a^*, \tau_{max,a}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, \tau_{max,a}^* \big)
f_b^*\big( \theta^*, \dot{\theta}^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, q_b^*, \tau_{max,b}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, \tau_{max,b}^* \big)
Hence, if \tau_{max,a}^* = \tau_{max,b}^*, we have the following:
f_a^*\big( \theta^*, \dot{\theta}^* \big) \approx \pi^*\big( \theta^*, \dot{\theta}^*, \tau_{max,a}^* \big) = \pi^*\big( \theta^*, \dot{\theta}^*, \tau_{max,b}^* \big) \approx f_b^*\big( \theta^*, \dot{\theta}^* \big) \qquad \square
From another point of view, assuming one of those regimes means that we could have removed one variable from the context at the start of the dimensional analysis. All in all, the impact of identifying such regimes is that we can increase the size of the context subset to which the dimensionless version of the policy should be equivalent, leading to a potentially larger pool of systems that can share a learned policy and numerical results.
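A small sketch of how these regime conditions could be checked in practice (the function name is hypothetical, and the thresholds are the approximate values suggested above, not hard limits):

```python
def swingup_regime(tau_max_star: float, q_star: float) -> str:
    """Classify the regime from R = tau_max*/q* (approximate thresholds from the sensitivity analysis)."""
    R = tau_max_star / q_star
    if R <= 0.1:
        return "bang-bang (policy approximately independent of q*)"
    if R >= 10.0:
        return "unconstrained (policy approximately independent of tau_max*)"
    return "transition (both q* and tau_max* matter)"
```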

3.1.5. Methodology

We obtained the optimal feedback laws presented in this section using a basic dynamic programming algorithm [18] on a discretized version of the continuous system. The approach is almost equivalent to the value iteration algorithm [5], which is sometimes referred to as model-based reinforcement learning, with the exception that, here, the total number of iteration steps was fixed (corresponding to a very long time horizon approximating an infinite horizon) instead of the iteration being stopped after reaching a convergence criterion. This approach was chosen to enable the collection of consistent results across all contexts, whose cost-to-go solutions span a wide range of orders of magnitude. The time step was set to 0.025 s, the state space was discretized into an even 501 × 501 grid, and the continuous torque input was discretized into 101 discrete control options. Special out-of-bounds and on-target termination states were included to guarantee convergence [18]. Also, using dynamic programming required setting additional parameters to define the domain. Although those parameters should not affect the optimal policy far away from the boundaries, dimensionless versions of those parameters were kept fixed in all the experiments as follows:
\theta_{max}^* = \theta_{max} = 2\pi
\dot{\theta}_{max}^* = \frac{\dot{\theta}_{max}}{\omega} = \pi
t_f^* = t_f\, \omega = 20 \times 2\pi
where θ m a x is the range of angles for which the optimal policy is solved, set to one full revolution; θ ˙ m a x is the range of angular velocity for which the optimal policy is solved; and t f is the time horizon, set to 20 periods of the pendulum using the natural frequency. The source code is available online at the following link: https://github.com/alx87grd/DimensionlessPolicies (accessed on 25 February 2024), and this Google Colab page allows users to reproduce the results: https://colab.research.google.com/drive/1kf3apyHlf5t7XzJ3uVM8mgDsneVK_63r?usp=sharing (accessed on 25 February 2024).
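For reference, the sketch below shows the general structure of such a dynamic programming (value iteration) computation on a discretized grid. It is a simplified illustration with reduced grid sizes, illustrative context values, and a crude grid lookup instead of interpolation; it is not the code from the linked repository.

```python
import numpy as np

# Pendulum context (illustrative values) and coarse discretization for readability.
m, l, g, q, tau_max = 1.0, 1.0, 9.81, 5.0, 5.0
dt = 0.025
thetas = np.linspace(-2 * np.pi, 2 * np.pi, 101)
dthetas = np.linspace(-3 * np.sqrt(g / l), 3 * np.sqrt(g / l), 101)
taus = np.linspace(-tau_max, tau_max, 21)

TH, DTH = np.meshgrid(thetas, dthetas, indexing="ij")
J = np.zeros_like(TH)  # cost-to-go estimate on the grid

for _ in range(400):  # fixed number of backward steps (long-horizon approximation)
    J_candidates = []
    for tau in taus:
        ddth = (m * g * l * np.sin(TH) + tau) / (m * l ** 2)  # ml^2*acc - mgl*sin(theta) = tau
        th_next, dth_next = TH + DTH * dt, DTH + ddth * dt
        # crude lookup of J at the successor states (a real implementation would interpolate)
        i = np.clip(np.searchsorted(thetas, th_next), 0, len(thetas) - 1)
        j = np.clip(np.searchsorted(dthetas, dth_next), 0, len(dthetas) - 1)
        J_candidates.append((q ** 2 * TH ** 2 + tau ** 2) * dt + J[i, j])
    J = np.min(J_candidates, axis=0)

# The (approximate) optimal feedback law is the torque achieving the minimum at each grid point.
```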

3.2. Optimal Motion for a Longitudinal Car on a Slippery Surface

The second numerical example is a simplified car positioning task. We use this example to illustrate that an optimal feedback law in the form of a table look-up generated for a car of a given size can be transferred to a car of a different size if the motion control problem is dimensionally similar, as illustrated in Figure 20. The example includes state constraints and a different type of non-linearity (i.e., not similar to the pendulum swing-up) to illustrate how generic the developed dimensionless policy concepts are.

3.2.1. Motion Control Problem

The motion control problem is defined here as finding a feedback law to control the dynamic system, as described by the following differential equation:
\ddot{x} = \frac{\mu(s)\, g\, x_c}{l + \mu(s)\, y_c} \qquad \text{with} \qquad \mu(s) = \frac{f_x}{f_{nf}} = \frac{2}{1 + e^{-70 s}} - 1
where μ(s) is the ratio of horizontal to vertical forces on the front wheel, which is a non-linear function of the front wheel slip s. The above equations represent a simple dynamic model of the longitudinal motion of a car, assuming that the controller can impose the wheel slip of the front wheel and that the suspension is infinitely rigid (but that weight transfer is included). Interestingly, it is already standard practice to model the ground–tire interaction with an empirical curve μ(s) relating two dimensionless variables.
The objective is to minimize the infinite horizon quadratic cost function provided by the following:
J = \int_0^\infty \left( q^{-2} x^2 + s^2 \right) dt
subject to the constraints of keeping ground reaction forces positive, as provided by the following:
0 \leq f_{nf} = g\, x_c - \ddot{x}\, y_c
0 \leq f_{nr} = g\,(l - x_c) + \ddot{x}\, y_c
where the weight transfer potentially limits the allowable motions. Note that the cost function parameter q in this problem has a power of minus two to have a value with units of length, and all parameters are time-independent constants. The solution to this problem, i.e., the optimal policy for all contexts, involves the variables listed in Table 4 and should be of the form provided by
\underbrace{s}_{\text{input}} = \pi\big( \underbrace{x, \dot{x}}_{\text{states}},\; \underbrace{x_c, y_c, g, l, q}_{\text{context } c} \big)

3.2.2. Dimensional Analysis

Here, we have one control input, two states, and five context parameters, for a total of 1 + ( n = 2 ) + ( m = 5 ) = 8 variables. Of those variables, only d = 2 independent dimensions (length [ L ] and time [ T ] ) are present. Using c 1 = g and c 2 = l as the repeated variables leads to the following dimensionless groups:
\Pi_1 = s^* = s \qquad [\,]
\Pi_2 = x^* = \frac{x}{l} \qquad \frac{[L]}{[L]}
\Pi_3 = \dot{x}^* = \frac{\dot{x}}{\sqrt{g l}} \qquad \frac{[L T^{-1}]}{[L T^{-2}]^{1/2} [L]^{1/2}}
\Pi_4 = x_c^* = \frac{x_c}{l} \qquad \frac{[L]}{[L]}
\Pi_5 = y_c^* = \frac{y_c}{l} \qquad \frac{[L]}{[L]}
\Pi_6 = q^* = \frac{q}{l} \qquad \frac{[L]}{[L]}
All three length variables are scaled by the wheel base, and the velocity variable is scaled using a combination of the wheel base and gravity. The transformation matrices are then as follows:
s^* = \underbrace{\big[\, 1 \,\big]}_{T_u}\, s
\begin{bmatrix} x^* \\ \dot{x}^* \end{bmatrix} = \underbrace{\begin{bmatrix} \frac{1}{l} & 0 \\ 0 & \frac{1}{\sqrt{g l}} \end{bmatrix}}_{T_x} \begin{bmatrix} x \\ \dot{x} \end{bmatrix}
\underbrace{\begin{bmatrix} x_c^* \\ y_c^* \\ q^* \end{bmatrix}}_{c^*} = \underbrace{\begin{bmatrix} 0 & 0 & 1/l & 0 & 0 \\ 0 & 0 & 0 & 1/l & 0 \\ 0 & 0 & 0 & 0 & 1/l \end{bmatrix}}_{T_c} \underbrace{\begin{bmatrix} g \\ l \\ x_c \\ y_c \\ q \end{bmatrix}}_{c}
By applying the Buckingham π theorem [4], Equation (74) can be restated as a relationship between the six dimensionless Π groups as follows:
s^* = \pi^*\big( x^*, \dot{x}^*, x_c^*, y_c^*, q^* \big)
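A small sketch of what the similarity test looks like for this problem (the helper names are hypothetical and the values are illustrative, not taken from Table 5):

```python
import numpy as np

def car_dimensionless_context(g, l, x_c, y_c, q):
    """c* = T_c(c) c for the car problem: (x_c/l, y_c/l, q/l)."""
    return np.array([x_c / l, y_c / l, q / l])

def dimensionally_similar(ctx_a, ctx_b):
    return np.allclose(car_dimensionless_context(*ctx_a), car_dimensionless_context(*ctx_b))

ctx_a = (9.81, 3.0, 1.5, 0.5, 10.0)   # (g, l, x_c, y_c, q) for car a
ctx_b = (9.81, 6.0, 3.0, 1.0, 20.0)   # car b: twice the wheel base, all lengths scaled accordingly
print(dimensionally_similar(ctx_a, ctx_b))   # True: the slip policy can be transferred exactly
```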

3.2.3. Numerical Results

Here, as in the pendulum example, numerical solutions to the motion control problem are computed for the nine instances of context variables listed in Table 5. In those nine contexts, there are three subsets of three dimensionally similar contexts. Contexts c_a, c_b, and c_c describe situations where the CG horizontal position is at half the wheel base; contexts c_d, c_e, and c_f describe situations in which the CG is very high (and hence the cars are very limited by the weight transfer); and contexts c_h, c_i, and c_j describe situations in which position errors are highly penalized by the cost function, plus cars with a very low CG relative to the wheel base.
Figure 21, Figure 22, Figure 23, Figure 24, Figure 25, Figure 26, Figure 27, Figure 28 and Figure 29 illustrate that, for each subset with an equal dimensionless context, solutions are equal within each dimensionally similar subset when scaled into the dimensionless form. This was, again, the expected result predicted by the dimensional analysis presented in Section 2. In terms of how to use this in a practical scenario, this exemplifies how various cars (which are different but share the same ratios) could share a braking policy, for instance.

3.2.4. Methodology

The same methodology as the pendulum example (see Section 3.1.5) was used for the car motion control problem. The time step was set to 0.025 s, the state space was discretized into an even 501 × 501 grid, and the continuous slip input was discretized into 101 discrete control options. Additional domain parameters were set as follows:
x_{max} = 10\, l
\dot{x}_{max} = 2\, \sqrt{g l}
t_{max} = 10\, \frac{x_{max}}{\dot{x}_{max}}
The source code is available online at the following link: https://github.com/alx87grd/DimensionlessPolicies (accessed on 25 February 2024), and this Google Colab page allows users to reproduce the results: https://colab.research.google.com/drive/1-CSiLKiNLqq9JC3EFLqjR1fRdICI7e7M?usp=share_link (accessed on 25 February 2024).

4. Case Studies with Closed-Form Parametric Policies

To better understand the concept of a dimensionless policy, in this section, two examples based on well-known closed-form solutions to classical motion control problems are presented to illustrate how using Theorem 2 can be equivalent to substituting new system parameters in an analytical solution.

4.1. Dimensionless Linear Quadratic Regulator

The first example is based on the linear quadratic regulator (LQR) solution [19] for the linearized pendulum that allows for a closed-form analytical solution of optimal policy. This allows us to compare the method of transferring the policy with the proposed scaling law of Equation (36) to the method of transferring the policy by substituting the new system parameters in the analytical solution.
Here, we consider a simplified version of the pendulum swing-up problem (see Section 3.1), and a linearized version of the equation of motion is used as follows:
m l^2\, \ddot{\theta} - m g l\, \theta = \tau
The same infinite horizon quadratic cost function is used, as follows:
J = \int_0^\infty \left( q^2 \theta^2 + \tau^2 \right) dt
However, no constraints on the torque are included in this problem. All parameters are also assumed to be time-independent constants. The same variables are used in this problem definition as before, except that the torque limit τ m a x variable is absent. The global policy solution should then have the following form:
\underbrace{\tau}_{\text{inputs}} = \pi_{lqr}\big( \underbrace{\theta, \dot{\theta}}_{\text{states}},\; \underbrace{\overbrace{m, g, l}^{\text{system parameters}},\; \overbrace{q}^{\text{task parameter}}}_{\text{context } c} \big)
We can thus select the same dimensionless Π groups as in Section 3.1.2 and conclude that Equation (90) can be restated under the following dimensionless form:
\tau^* = \pi^*_{lqr}\big( \theta^*, \dot{\theta}^*, q^* \big)
Proposition 3. 
For this motion control problem, defined by Equations (88) and (89), an analytical solution exists and the optimal policy is provided by
\tau = -\left[ \left( m g l + \sqrt{(m g l)^2 + q^2} \right) \theta + \sqrt{2\, m l^2 \left( m g l + \sqrt{(m g l)^2 + q^2} \right)}\; \dot{\theta} \right]
Proof. 
See Appendix A. □
Applying Equation (31) to this feedback law leads to the dimensionless form, using G = mgl and H = ml^2 for brevity, as follows:
\tau^* = f^*(x^*) = T_u(c)\, f\big( T_x^{-1}(c)\, x^* \big) = \frac{1}{G}\, f\big( \theta^*,\; \omega\, \dot{\theta}^* \big)
\tau^* = -\frac{1}{G}\left[ \left( G + \sqrt{G^2 + q^2} \right) \theta^* + \sqrt{2 H \left( G + \sqrt{G^2 + q^2} \right)}\; \omega\, \dot{\theta}^* \right]
\tau^* = -\left[ \left( 1 + \sqrt{\frac{G^2 + q^2}{G^2}} \right) \theta^* + \sqrt{\frac{2 H \omega^2}{G}\, \frac{G + \sqrt{G^2 + q^2}}{G}}\; \dot{\theta}^* \right]
\tau^* = -\left[ \left( 1 + \sqrt{1 + (q^*)^2} \right) \theta^* + \sqrt{2 \left( 1 + \sqrt{1 + (q^*)^2} \right)}\; \dot{\theta}^* \right]
The dimensionless policy is only a function of the dimensionless states and the dimensionless cost parameter q*, as predicted by Equation (91) based on the dimensional analysis. It is interesting to note that Equation (96) represents the core generic solution to the LQR problem and is independent of unit and scale.
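This can be checked numerically. The sketch below (illustrative values, using SciPy's Riccati solver rather than the closed-form expression) computes the LQR gains for two dimensionally similar contexts and verifies that the dimensionless gains of Equation (96) are identical:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gains(m, g, l, q):
    """LQR gains for the linearized pendulum ml^2*acc - mgl*theta = tau, cost q^2*theta^2 + tau^2."""
    G, H = m * g * l, m * l ** 2
    A = np.array([[0.0, 1.0], [G / H, 0.0]])
    B = np.array([[0.0], [1.0 / H]])
    Q = np.diag([q ** 2, 0.0])
    R = np.array([[1.0]])
    S = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ S)   # tau = -K x
    return K.flatten(), G, np.sqrt(g / l)

# Two hypothetical dimensionally similar contexts (q scaled with mgl so that q* = 0.5 in both):
K_a, G_a, w_a = lqr_gains(m=1.0, g=9.81, l=1.0, q=0.5 * 1.0 * 9.81 * 1.0)
K_b, G_b, w_b = lqr_gains(m=1.0, g=9.81, l=2.0, q=0.5 * 1.0 * 9.81 * 2.0)

# Dimensionless gains (structure of Eq. 96): identical for both contexts.
print(K_a[0] / G_a, K_a[1] * w_a / G_a)
print(K_b[0] / G_b, K_b[1] * w_b / G_b)
```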
We can also use this analytical policy solution to demonstrate Theorem 2, i.e., show that scaling the policy with Equation (36) is equivalent to substituting new context variables when the contexts are dimensionally similar.
Proposition 4. 
Suppose that we have two context instances, labeled a and b, and that we use the global policy solution of Equation (92) to obtain two versions of context-specific feedback laws:
f_a(\theta, \dot{\theta}) = -\left[ \left( G_a + \sqrt{G_a^2 + q_a^2} \right) \theta + \sqrt{2 H_a \left( G_a + \sqrt{G_a^2 + q_a^2} \right)}\; \dot{\theta} \right]
f_b(\theta, \dot{\theta}) = -\left[ \left( G_b + \sqrt{G_b^2 + q_b^2} \right) \theta + \sqrt{2 H_b \left( G_b + \sqrt{G_b^2 + q_b^2} \right)}\; \dot{\theta} \right]
where
G_a = m_a g_a l_a \qquad H_a = m_a l_a^2
G_b = m_b g_b l_b \qquad H_b = m_b l_b^2
Based on Theorem 2, if q_a^* = q_b^*, we can obtain f_b directly by scaling f_a based on Equation (36) as follows:
f_b(\theta, \dot{\theta}) = \frac{G_b}{G_a}\, f_a\!\left( \theta,\; \frac{\omega_a}{\omega_b}\, \dot{\theta} \right)
where
\omega_a = \sqrt{G_a / H_a} \qquad \omega_b = \sqrt{G_b / H_b}
Proof. 
If we substitute f a in Equation (101) by the analytical solution provided by Equation (97) and then distribute the multiplying scaling factors, we obtain
f_b(\theta, \dot{\theta}) = -\frac{G_b}{G_a} \left[ \left( G_a + \sqrt{G_a^2 + q_a^2} \right) \theta + \sqrt{2 H_a \left( G_a + \sqrt{G_a^2 + q_a^2} \right)}\; \frac{\omega_a}{\omega_b}\, \dot{\theta} \right]
f_b(\theta, \dot{\theta}) = -\left[ G_b \left( 1 + \sqrt{1 + (q_a^*)^2} \right) \theta + G_b \sqrt{2 \left( 1 + \sqrt{1 + (q_a^*)^2} \right)}\; \frac{\dot{\theta}}{\omega_b} \right]
f_b(\theta, \dot{\theta}) = -\left[ \left( G_b + \sqrt{G_b^2 + (G_b\, q_a^*)^2} \right) \theta + \sqrt{2 H_b \left( G_b + \sqrt{G_b^2 + (G_b\, q_a^*)^2} \right)}\; \dot{\theta} \right]
which is equivalent to Equation (98) when
G_b\, q_a^* = q_b \qquad \text{or equivalently} \qquad q_a^* = q_b^*
which is the condition of having equal dimensionless contexts (c_a^* = c_b^*) for this motion control problem. □
This example illustrates that applying the scaling of Equation (36) based on the dimensional analysis framework is equivalent to changing the context variables in an analytical solution when the dimensionless context variables are equal.

4.2. Dimensionless Computed Torque

The second example is again based on the pendulum but using the computed torque control technique [20]. This also allows us to compare the method of transferring the policy with the proposed scaling law of Equation (36) to the method of transferring the policy by substituting the new system parameters in the analytical solution. This example is not based on a quadratic cost function, as opposed to previous examples, to illustrate the flexibility of the proposed schemes.
Here, we present a second analytical example. A computed torque feedback law is a model-based policy (assuming that there are no torque limits) that is the solution to the motion control problem of making a mechanical system converge to a desired trajectory, with a specified second-order exponential time profile defined by the following equation:
0 = (\ddot{\theta}_d - \ddot{\theta}) + 2\, \omega_d\, \zeta\, (\dot{\theta}_d - \dot{\theta}) + \omega_d^2\, (\theta_d - \theta)
For the specific case of the pendulum swing-up problem, we assume that all parameters are time-independent constants and that our desired trajectory is simply the upright position ( θ ¨ d = θ ˙ d = θ d = 0 ), leaving only two parameters to define the tasks: ω d and ζ . Then, the computed torque policy takes the following form:
\underbrace{\tau}_{\text{input}} = \pi_{ct}\big( \underbrace{\theta, \dot{\theta}}_{\text{states}},\; \underbrace{\overbrace{m, g, l}^{\text{system parameters}},\; \overbrace{\omega_d, \zeta}^{\text{task parameters}}}_{\text{context } c} \big)
and the analytical solution is as follows:
\tau = -\, m g l \sin\theta - 2\, m l^2\, \omega_d\, \zeta\, \dot{\theta} - m l^2\, \omega_d^2\, \theta
Here, the context includes the system parameters and two variables characterizing the convergence speed. Note that the task parameters directly define the desired behavior, as opposed to the previous examples where they were defining the behavior indirectly through a cost function. The states, control inputs, and system parameters are the same as before; only the task parameters differ, and their dimensions are presented in Table 6.
Here, seven variables and only d = 2 independent dimensions ([M L^2 T^{-2}] and [T^{-1}]) are involved. Thus, five dimensionless groups can be formed, as follows:
1 + (n = 2) + (m = 4) - (d = 2) = 5
Using mgl and ω, the system parameters, as the repeated variables leads to the following dimensionless groups:
\Pi_1 = \tau^* = \frac{\tau}{m g l} \qquad \frac{[M L^2 T^{-2}]}{[M][L T^{-2}][L]}
\Pi_2 = \theta^* = \theta \qquad [\,]
\Pi_3 = \dot{\theta}^* = \frac{\dot{\theta}}{\omega} \qquad \frac{[T^{-1}]}{[T^{-1}]}
\Pi_4 = \omega_d^* = \frac{\omega_d}{\omega} \qquad \frac{[T^{-1}]}{[T^{-1}]}
\Pi_5 = \zeta^* = \zeta \qquad [\,]
Then, applying the Buckingham π theorem tells us that the computed torque policy can be restated as the following relationship between the dimensionless variables:
\tau^* = \pi^*_{ct}\big( \theta^*, \dot{\theta}^*, \omega_d^*, \zeta^* \big)
Here, we can confirm directly (since we have an analytical solution) that applying Equation (31) to the computed torque feedback law provided by Equation (109) leads to the following dimensionless form:
\tau^* = \frac{1}{m g l} \left[ -\, m g l \sin\theta^* - 2\, m l^2\, \omega_d\, \zeta\, \omega\, \dot{\theta}^* - m l^2\, \omega_d^2\, \theta^* \right]
\tau^* = -\sin\theta^* - 2\, \omega_d^*\, \zeta\, \dot{\theta}^* - (\omega_d^*)^2\, \theta^*
thereby confirming the structure predicted by Equation (116) based on the dimensional analysis.
We can, again, use this example to demonstrate Theorem 2 and show that, when the dimensionless context is equal, scaling a policy using Equation (36) is equivalent to substituting new values of the system parameters into the analytical equation.
Proposition 5. 
Suppose that we have two context instances, labeled a and b, and that we use the global policy solution of Equation (109) to obtain two versions of context-specific feedback laws:
f_a(\theta, \dot{\theta}) = -\, G_a \sin\theta - 2\, H_a\, \omega_{d,a}\, \zeta_a\, \dot{\theta} - H_a\, \omega_{d,a}^2\, \theta
f_b(\theta, \dot{\theta}) = -\, G_b \sin\theta - 2\, H_b\, \omega_{d,b}\, \zeta_b\, \dot{\theta} - H_b\, \omega_{d,b}^2\, \theta
Based on Theorem 2, if \omega_{d,a}^* = \omega_{d,b}^* and \zeta_a = \zeta_b, we can obtain f_b directly by scaling f_a based on Equation (36) as follows:
f_b(\theta, \dot{\theta}) = \frac{G_b}{G_a}\, f_a\!\left( \theta,\; \frac{\omega_a}{\omega_b}\, \dot{\theta} \right)
Proof. 
If we substitute f a in Equation (121) by the analytical solution provided by Equation (119) and then distribute the multiplying scaling factors, we obtain
f_b(\theta, \dot{\theta}) = \frac{G_b}{G_a} \left[ -\, G_a \sin\theta - 2\, H_a\, \omega_{d,a}\, \zeta_a\, \frac{\omega_a}{\omega_b}\, \dot{\theta} - H_a\, \omega_{d,a}^2\, \theta \right]
f_b(\theta, \dot{\theta}) = -\, G_b \left[ \sin\theta + 2\, \frac{\omega_{d,a}}{\omega_a}\, \zeta_a\, \frac{\dot{\theta}}{\omega_b} + \left( \frac{\omega_{d,a}}{\omega_a} \right)^2 \theta \right]
f_b(\theta, \dot{\theta}) = -\, G_b \sin\theta - 2\, H_b \left( \frac{\omega_b}{\omega_a}\, \omega_{d,a} \right) \zeta_a\, \dot{\theta} - H_b \left( \frac{\omega_b}{\omega_a}\, \omega_{d,a} \right)^2 \theta
which is exactly equivalent to Equation (120) (i.e., equivalent to substituting the b instance of the context variables for the a instance) if
\frac{\omega_b}{\omega_a}\, \omega_{d,a} = \omega_{d,b} \qquad \text{and} \qquad \zeta_a = \zeta_b
which is the dimensional similarity condition (c_a^* = c_b^*) for this motion control problem:
\frac{\omega_{d,a}}{\omega_a} = \omega_{d,a}^* = \omega_{d,b}^* = \frac{\omega_{d,b}}{\omega_b} \qquad \text{and} \qquad \zeta_a = \zeta_b \qquad \square
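A quick numerical check of Proposition 5 (illustrative values, using the sign convention of the reconstruction above): evaluating f_b directly and via the scaling of Equation (36) gives the same torque when ω_d* and ζ match.

```python
import numpy as np

def computed_torque(theta, theta_dot, m, g, l, w_d, zeta):
    # tau = m*l^2*(desired accel) - m*g*l*sin(theta), with desired accel = -2*w_d*zeta*theta_dot - w_d^2*theta
    return -m * g * l * np.sin(theta) - 2 * m * l ** 2 * w_d * zeta * theta_dot - m * l ** 2 * w_d ** 2 * theta

g = 9.81
m_a, l_a = 1.0, 1.0
m_b, l_b = 1.0, 2.0
w_a, w_b = np.sqrt(g / l_a), np.sqrt(g / l_b)
G_a, G_b = m_a * g * l_a, m_b * g * l_b

# Dimensionally similar tasks: same w_d* = w_d/omega and same zeta.
w_d_star, zeta = 2.0, 0.7
f_a = lambda th, thd: computed_torque(th, thd, m_a, g, l_a, w_d_star * w_a, zeta)
f_b = lambda th, thd: computed_torque(th, thd, m_b, g, l_b, w_d_star * w_b, zeta)

theta, theta_dot = 0.4, -1.2
print(f_b(theta, theta_dot))                                # direct substitution of context b
print((G_b / G_a) * f_a(theta, (w_a / w_b) * theta_dot))    # Theorem 2 scaling: same value
```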

5. Conclusions

The dimensional analysis of physically meaningful control policies, leveraging the Buckingham π theorem, leads to two interesting theoretical results. (1) In dimensionless form, the solution to a motion control problem involves a reduced number of parameters. (2) It is possible to exactly transfer a feedback law between similar systems without any approximation, simply by scaling the input and output of any type of control law appropriately, including numerically generated black-box mappings. However, the main practical limitation of this approach is that, if the condition of dimensional similarity (c_a^* = c_b^*) is not met exactly, then there are no theoretical guarantees regarding whether a policy is transferable without additional assumptions, such as with the discussed concept of regimes of behavior. Also, we demonstrated how those results can be used to exactly transfer even discontinuous black-box policies between similar systems using two simple examples of dynamical systems and numerically generated optimal feedback laws. An interesting direction for further exploration would be to investigate how good an approximation the transferred feedback law is when the source and target contexts are close but not exactly dimensionally similar. Also, it would be interesting to test the concept of dimensionless policies to empower a reinforcement learning scheme that could collect data from various, but dimensionally similar, systems to accelerate the learning process.

Funding

This research was funded by NSERC discovery grant number RGPIN-2018-05388.

Data Availability Statement

The source code used to generate the numerical results in this paper is available at the following link: https://github.com/alx87grd/DimensionlessPolicies (accessed on 25 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CG: Center of gravity
CT: Computed torque
LQR: Linear quadratic regulator

Appendix A. LQR Analytic Solution

In this section, we show that the policy provided by Equation (92) is optimal with respect to the LQR problem defined in Section 4.1. We can write the equation of motion provided by Equation (88) in state-space form, using G = m g l and H = m l 2 , as follows:
\frac{d}{dt} \underbrace{\begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 0 & 1 \\ G/H & 0 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}}_{x} + \underbrace{\begin{bmatrix} 0 \\ 1/H \end{bmatrix}}_{B}\, \underbrace{\tau}_{u}
Then, by adapting a solution from [21], if we parameterize the weight matrix of the cost function as follows:
J = \int_0^\infty \left( x^T \underbrace{\begin{bmatrix} a\,(a - 2G) & 0 \\ 0 & b^2 - 2\,a\,H \end{bmatrix}}_{Q}\, x + u^T \underbrace{\big[\, 1 \,\big]}_{R}\, u \right) dt
the optimal cost-to-go is provided by the following:
J = x^T \underbrace{\begin{bmatrix} b\,(a - G) & a\,H \\ a\,H & b\,H \end{bmatrix}}_{S}\, x
and the optimal feedback policy is provided by
u = -\underbrace{R^{-1} B^T S}_{K}\, x = -\underbrace{\begin{bmatrix} a & b \end{bmatrix}}_{K}\, x
This solution can be verified by substituting the matrices into the algebraic Riccati equation provided by
0 = S A + A^T S - S B R^{-1} B^T S + Q
since the problem fits into the framework of the classical infinite horizon LQR result [18]. Then, we can see that the cost function defined in Section 4.1 is a special case, where Q_{11} = q^2 and Q_{22} = 0, leading to the following equations:
q^2 = a\,(a - 2G)
0 = b^2 - 2\,a\,H
Solving for a and b, and retaining the positive solution, leads to the following:
a = G + \sqrt{G^2 + q^2}
b = \sqrt{2\,a\,H} = \sqrt{2 H \left( G + \sqrt{G^2 + q^2} \right)}
which, when substituted into Equation (A4), is equal to the policy provided by Equation (92) in Section 4.1.
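A short numerical sanity check of this closed-form solution (with illustrative values, not part of the original appendix): substituting S, Q, and R into the algebraic Riccati equation yields a zero residual.

```python
import numpy as np

m, g, l, q = 1.0, 9.81, 1.0, 3.0
G, H = m * g * l, m * l ** 2
a = G + np.sqrt(G ** 2 + q ** 2)
b = np.sqrt(2 * a * H)

A = np.array([[0.0, 1.0], [G / H, 0.0]])
B = np.array([[0.0], [1.0 / H]])
Q = np.diag([a * (a - 2 * G), b ** 2 - 2 * a * H])   # equals diag(q^2, 0)
R = np.array([[1.0]])
S = np.array([[b * (a - G), a * H], [a * H, b * H]])

residual = S @ A + A.T @ S - S @ B @ np.linalg.inv(R) @ B.T @ S + Q
print(np.allclose(residual, 0.0))   # True: S satisfies the algebraic Riccati equation
```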

References

  1. Kuindersma, S.; Deits, R.; Fallon, M.; Valenzuela, A.; Dai, H.; Permenter, F.; Koolen, T.; Marion, P.; Tedrake, R. Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Auton. Robot. 2016, 40, 429–455. [Google Scholar] [CrossRef]
  2. Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on model predictive control: An engineering perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349. [Google Scholar] [CrossRef]
  3. Rudin, N.; Hoeller, D.; Reist, P.; Hutter, M. Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning. In Proceedings of the 5th Conference on Robot Learning, London, UK, 8–11 November 2021; pp. 91–100. [Google Scholar]
  4. Buckingham, E. On Physically Similar Systems; Illustrations of the Use of Dimensional Equations. Phys. Rev. 1914, 4, 345. [Google Scholar] [CrossRef]
  5. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; Bradford Books: Cambridge, MA, USA, 2018. [Google Scholar]
  6. Taylor, M.E.; Stone, P. Transfer Learning for Reinforcement Learning Domains: A Survey. J. Mach. Learn. Res. 2009, 10, 1633–1685. [Google Scholar]
  7. Devin, C.; Gupta, A.; Darrell, T.; Abbeel, P.; Levine, S. Learning modular neural network policies for multi-task and multi-robot transfer. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2169–2176. [Google Scholar] [CrossRef]
  8. Gupta, A.; Devin, C.; Liu, Y.; Abbeel, P.; Levine, S. Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning. arXiv 2017, arXiv:1703.02949. [Google Scholar]
  9. Helwa, M.K.; Schoellig, A.P. Multi-robot transfer learning: A dynamical system perspective. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 4702–4708. [Google Scholar] [CrossRef]
  10. Chen, T.; Murali, A.; Gupta, A. Hardware Conditioned Policies for Multi-Robot Transfer Learning. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 2–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
  11. Pereida, K.; Helwa, M.K.; Schoellig, A.P. Data-Efficient Multirobot, Multitask Transfer Learning for Trajectory Tracking. IEEE Robot. Autom. Lett. 2018, 3, 1260–1267. [Google Scholar] [CrossRef]
  12. Sorocky, M.J.; Zhou, S.; Schoellig, A.P. Experience Selection Using Dynamics Similarity for Efficient Multi-Source Transfer Learning Between Robots. arXiv 2020, arXiv:2003.13150. [Google Scholar] [CrossRef]
  13. Bertrand, J. Sur l’homogénéité dans les formules de physique. Cah. Rech. L’Acad. Sci. 1878, 86, 916–920. [Google Scholar]
  14. Rayleigh, L. VIII. On the question of the stability of the flow of fluids. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1892, 34, 59–70. [Google Scholar] [CrossRef]
  15. Bakarji, J.; Callaham, J.; Brunton, S.L.; Kutz, J.N. Dimensionally consistent learning with Buckingham Pi. Nat. Comput. Sci. 2022, 2, 834–844. [Google Scholar] [CrossRef] [PubMed]
  16. Fukami, K.; Taira, K. Robust machine learning of turbulence through generalized Buckingham Pi-inspired pre-processing of training data. In Proceedings of the APS Division of Fluid Dynamics, Phoenix, AZ, USA, 21–23 November 2021; Meeting Abstracts ADS Bibcode: 2021APS..DFDA31004F. p. A31.004. [Google Scholar]
  17. Xie, X.; Samaei, A.; Guo, J.; Liu, W.K.; Gan, Z. Data-driven discovery of dimensionless numbers and governing laws from scarce measurements. Nat. Commun. 2022, 13, 7562. [Google Scholar] [CrossRef] [PubMed]
  18. Bertsekas, D.P. Dynamic Programming and Optimal Control: Approximate Dynamic Programming; Athena Scientific: Nashua, NH, USA, 2012. [Google Scholar]
  19. Kalman, R.E. Contributions to the theory of optimal control. Bol. Soc. Mat. Mex. 1960, 5, 102–119. [Google Scholar]
  20. Asada, H.H.; Slotine, J.J.E. Robot Analysis and Control; John Wiley & Sons: New York, NY, USA, 1986. [Google Scholar]
  21. Hanks, B.; Skelton, R. Closed-form solutions for linear regulator-design of mechanical systems including optimal weighting matrix selection. In Proceedings of the 32nd Structures, Structural Dynamics, and Materials Conference, Baltimore, MD, USA, 8–10 April 1991. [Google Scholar] [CrossRef]
Figure 1. Shared dimensionless policy for inverted pendulums: under some conditions, various dynamic systems will share the same optimal policy up to scaling factors that can be found based on a dimensional analysis.
Figure 2. The policy π is a feedback law that also includes problem parameters as additional arguments. (a) Generic policy; (b) inverted pendulum example.
Figure 3. A feedback law f is a slice of the higher dimensional policy mapping π in a specific context.
Figure 4. Isolating the dimensionless knowledge in a policy enables its exact transfer to any dimensionally similar motion control problem.
Figure 5. Example of dimensionally similar context subsets that are lines in a plane (m = 2 and d = 1). Context c_a is dimensionally similar to c_b but not to c_c or c_d.
Figure 6. Example of dimensionally similar context subsets that are non-linear curves in a plane (m = 2 and d = 1). Context c_a is dimensionally similar to c_b but not to c_c or c_d.
Figure 7. A dimensionally similar subset (equal c) can be represented as a line in a 3D space for the pendulum swing-up problem. The feedback law solutions to problems with context variables on the same line are equivalent in dimensionless form.
Figure 8. Numerical results for context c_a. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 9. Numerical results for context c_b. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 10. Numerical results for context c_c. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 11. Numerical results for context c_d. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 12. Numerical results for context c_e. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 13. Numerical results for context c_f. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 14. Numerical results for context c_g. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 15. Numerical results for context c_h. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 16. Numerical results for context c_i. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at θ = π.
Figure 17. Optimal dimensionless policy for various contexts: τ = π(θ, θ˙ = 0, q = 0.5, τ_max = [0.1, …, 5.0]).
Figure 18. Optimal dimensionless policy for various contexts: τ = π(θ, θ˙ = 0, q = [0.05, …, 2.0], τ_max = 0.5).
Figure 19. Regime zones for a torque-limited pendulum swing-up problem.
Figure 20. Car positioning motion control problem.
Figure 21. Numerical results for context c_a. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 22. Numerical results for context c_b. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 23. Numerical results for context c_c. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 24. Numerical results for context c_d. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 25. Numerical results for context c_e. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 26. Numerical results for context c_f. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 27. Numerical results for context c_g. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 28. Numerical results for context c_h. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Figure 29. Numerical results for context c_i. (a) Feedback law f; (b) dimensionless feedback law f; (c) optimal trajectory, starting at x = 5l.
Table 1. Pendulum swing-up optimal policy variables.

Variable | Description | Units | Dimensions
Control inputs
τ | Actuator torque | Nm | [M L² T⁻²]
State variables
θ | Joint angle | rad | [-]
θ˙ | Joint angular velocity | rad/s | [T⁻¹]
System parameters
m | Pendulum mass | kg | [M]
g | Gravity | m/s² | [L T⁻²]
l | Pendulum length | m | [L]
Task parameters
q | Weight parameter | Nm | [M L² T⁻²]
τ_max | Maximum torque | Nm | [M L² T⁻²]
Table 2. Pendulum reduced system parameters.

Variable | Description | Units | Dimensions
mgl | Maximum gravitational torque | Nm | [M L² T⁻²]
ω = √(g/l) | Natural frequency | s⁻¹ | [T⁻¹]
Table 3. Pendulum swing-up problem context variables.

Context | m | g | l | q | τ_max
Problems with dimensionless τ_max/(mgl) = 0.5 and q/(mgl) = 0.1
c_a | 1.0 | 10.0 | 1.0 | 1.0 | 5.0
c_b | 1.0 | 10.0 | 2.0 | 2.0 | 10.0
c_c | 2.0 | 10.0 | 1.0 | 2.0 | 10.0
Problems with dimensionless τ_max/(mgl) = 1.0 and q/(mgl) = 0.05
c_d | 1.0 | 10.0 | 1.0 | 0.5 | 10.0
c_e | 1.0 | 10.0 | 2.0 | 1.0 | 20.0
c_f | 2.0 | 10.0 | 1.0 | 1.0 | 20.0
Problems with dimensionless τ_max/(mgl) = 1.0 and q/(mgl) = 10
c_g | 1.0 | 10.0 | 1.0 | 100.0 | 10.0
c_h | 1.0 | 10.0 | 2.0 | 200.0 | 20.0
c_i | 2.0 | 10.0 | 1.0 | 200.0 | 20.0
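As a quick consistency check (a minimal sketch, not taken from the paper's repository), the contexts of Table 3 can be grouped by their dimensionless task parameters, assuming that q and τ_max are scaled by the maximum gravitational torque mgl listed in Table 2:

```python
# Sketch: group a few of the Table 3 contexts by dimensionless task parameters.
# Assumes tau_max* = tau_max / (m*g*l) and q* = q / (m*g*l); variable names are illustrative.
contexts = {            #   m,    g,   l,   q, tau_max
    "c_a": (1.0, 10.0, 1.0, 1.0, 5.0),
    "c_b": (1.0, 10.0, 2.0, 2.0, 10.0),
    "c_c": (2.0, 10.0, 1.0, 2.0, 10.0),
    "c_d": (1.0, 10.0, 1.0, 0.5, 10.0),
}

for name, (m, g, l, q, tau_max) in contexts.items():
    mgl = m * g * l
    print(f"{name}: tau_max* = {tau_max / mgl:.2f}, q* = {q / mgl:.3f}")
# c_a, c_b and c_c all reduce to (0.50, 0.100), so they are dimensionally similar,
# while c_d belongs to a different similarity subset.
```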
Table 4. Longitudinal car optimal policy variables.

Variable | Description | Units | Dimensions
Control inputs
s | Wheel slip | - | [-]
State variables
x | Car position | m | [L]
x˙ | Car velocity | m/s | [L T⁻¹]
System parameters
g | Gravity | m/s² | [L T⁻²]
l | Length (wheel base) | m | [L]
x_c | Center of gravity (CG) horizontal position | m | [L]
y_c | Center of gravity (CG) vertical position | m | [L]
Task parameters
q | Weight parameter | m | [L]
Table 5. Car problem parameters.

Context | l | g | x_c | y_c | q
Problems with dimensionless x_c/l = 0.5, y_c/l = 0.5, and q/l = 20
c_a | 2.0 | 9.8 | 1.0 | 1.0 | 40
c_b | 1.0 | 9.8 | 0.5 | 0.5 | 20
c_c | 3.0 | 9.8 | 1.5 | 1.5 | 60
Problems with dimensionless x_c/l = 0.5, y_c/l = 1.5, and q/l = 10
c_d | 2.0 | 9.8 | 1.0 | 3.0 | 20
c_e | 1.0 | 9.8 | 0.5 | 1.5 | 10
c_f | 3.0 | 9.8 | 1.5 | 4.5 | 30
Problems with dimensionless x_c/l = 0.5, y_c/l = 0.1, and q/l = 2
c_g | 2.0 | 9.8 | 1.0 | 0.2 | 4
c_h | 1.0 | 9.8 | 0.5 | 0.1 | 2
c_i | 3.0 | 9.8 | 1.5 | 0.3 | 6
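The same kind of check can be applied to Table 5 (again a sketch with an assumed scaling, here by the wheel base l): each of the three groups collapses to a single set of dimensionless parameters.

```python
# Sketch: dimensionless groups for one representative context from each group of Table 5,
# assuming x_c* = x_c/l, y_c* = y_c/l and q* = q/l (illustrative scaling by the wheel base).
car_contexts = {        #   l,   g,  x_c, y_c,   q
    "c_a": (2.0, 9.8, 1.0, 1.0, 40.0),
    "c_d": (2.0, 9.8, 1.0, 3.0, 20.0),
    "c_g": (2.0, 9.8, 1.0, 0.2, 4.0),
}

for name, (l, g, x_c, y_c, q) in car_contexts.items():
    print(f"{name}: x_c* = {x_c / l:.2f}, y_c* = {y_c / l:.2f}, q* = {q / l:.1f}")
# The three groups of Table 5 reduce to (0.50, 0.50, 20.0), (0.50, 1.50, 10.0)
# and (0.50, 0.10, 2.0), respectively.
```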
Table 6. Computed torque task variables.

Variable | Description | Units | Dimensions
Task parameters
ω_d | Desired closed-loop frequency | s⁻¹ | [T⁻¹]
ζ | Desired closed-loop damping | - | [-]
