Centralized Networks to Generate Human Body Motions

We consider continuous-time recurrent neural networks as dynamical models for the simulation of human body motions. These networks consist of a few centers and many satellites connected to them. The centers evolve in time as periodic oscillators with different frequencies. The center states define the satellite neurons' states by a radial basis function (RBF) network. To simulate different motions, we adjust the parameters of the RBF networks. Our network includes a switching module that allows for switching from one motion to another. Simulations show that this model allows us to simulate complicated motions consisting of many different dynamical primitives. We also use the model for learning human body motion from marker trajectories. We find that center frequencies can be learned from a small number of markers and can be transferred to other markers, such that our technique seems to be capable of correcting for missing information resulting from sparse control marker settings.


Introduction
In recent years, various neural network topologies have been used for recognizing and representing human body motions. In particular, deep networks [1,2] and Long Short-Term Memory (LSTM) networks and their extensions [1,3,4] have been proposed. Additionally, specialized architectures for human motions such as so-called "phase-functional networks" [5] have recently been proposed.
In this paper we advocate the use of another kind of network: the so-called centralized network [6]. Inspired by the success of these networks in neuroscience, genetics, and ecology [7][8][9][10][11][12][13], we consider centralized, continuous-time recurrent networks of an analogous topological structure as dynamical models for the simulation of human body motions.
Our method combines nonlinear oscillators, centralized architectures, and approximation by radial basis functions. All these ingredients are present in different fields in neuroscience, robotics, and machine learning, but to the best of our knowledge they have not yet been put together. Nonlinear oscillators were discovered as building blocks of locomotor neural circuits in animals, and similar designs were mimicked to control the movements of robots [14][15][16]. Although the idea of coupling oscillators to neural networks was successfully used to model gait transitions in cybernetic models [15,17], there is no systematic approach for learning complicated body movements from sensor data based on this idea. In order to do so, we use radial basis function networks, a popular [...]

[...] spontaneous emergence of complex collective behavior. In scale-free networks, the fraction $P(k)$ of nodes in the network having $k$ connections to other nodes (i.e., having degree $k$) can be estimated for large values of $k$ as $P(k) \sim k^{-\gamma}$, where $\gamma$ is a parameter whose value is typically in the range $2 < \gamma < 3$ [29]. In such networks, the degree is extremely heterogeneous. In particular, there are strongly connected nodes that can be named hubs, or centers. The hubs communicate with each other directly, or via a number of weakly connected nodes. The weakly connected nodes that interact mainly with hubs can be called satellites. Scale-free networks also have nodes of intermediate connectivity. Networks that have only two types of nodes (strongly connected hubs and weakly connected satellites) are known as bimodal degree networks [31]. Because of the presence of a large number of hubs, scale-free or bimodal degree networks can be called centralized.
It has been shown that centralized networks show a good compromise between robustness and flexibility. They are resilient with respect to external perturbations and are insensitive to noise, while remaining totally controllable [32][33][34]. Furthermore, centralized networks are universal approximate models, and can simulate any structurally stable dynamics [6,35,36]. Other interesting dynamical properties of centralized networks are related to their ability to switch, activating on turning the coordinated evolution of different sets of nodes. On one hand, this capacity is responsible for the "stable yet switchable" property, meaning that the network remains stable in a given context and is able to reach another stable state when a stimulus indicates a change in the context [6]. On the other hand, centralized networks can be itinerant; i.e., spontaneously changing their functioning mode [10].
The above dynamical properties of centralized networks have received particular attention in neuroscience, genetics, and ecology. Centralized connectivity has been found by functional imaging of brain activity in neuroscience [7], and also by large-scale studies of protein-protein interactions or of metabolic networks in functional genetics [8,9]. Itinerant and switching behavior was observed in the transient activity of antennal lobe neurons involved in insect olfaction or in the activity of high vocal centers controlling songbird patterns [10]. The robustness of scale-free networks was emphasized in relation to food-webs and ecosystems [11,12], epidemics [13], etc.
Motivated by the success of centralized networks in neuroscience, genetics, and ecology, we consider centralized, continuous-time recurrent networks of an analogous topological structure as dynamical models for the simulation of human body motions. From the general setting, we take the idea that these networks consist of a few centers and many satellites. As human motions are very often cyclical but with varying frequencies, the centers may evolve in time as oscillators with different frequencies. We use radial basis function (RBF) networks to map the center states to the satellite states. An additional switching module allows us to turn from one particular motion to another. Due to this structure, the network can simulate a large class of different motions with good accuracy, which depends on the oscillator frequencies.

Centralized Networks for Elementary Human Motions
The networks consist of $n$ centers with states $q_i$ and a number of satellites with states $X_j, Y_j, Z_j$, where $j = 1, \dots, N$ and $N \gg n$. In the simplest case, when we approximate a single relatively simple motion, the time evolution of the center states is governed by the harmonic oscillator equation

$$\frac{d^2 q_i}{dt^2} + \omega_i^2 q_i = 0, \quad i = 1, \dots, n, \tag{1}$$

where $q_i$ is the coordinate of the $i$-th oscillator, $\omega_i$ is the frequency of that oscillator, and $n$ is the number of oscillators. Often even two oscillators ($n = 2$) provide good accuracy, but for more complicated motions one can take $n \in \{3, 4, 5\}$. Let $q(t) = (q_1, \dots, q_n)$ be the vector of the oscillator states, depending on time $t$, and let $x_k(t)$ be the output coordinates (here $x_1(t) = X(t)$, $x_2(t) = Y(t)$, $x_3(t) = Z(t)$).
The centers are connected with the $N$ output coordinates $x_k$ by a network

$$x_k = \sum_{j=1}^{N_m} W_{kj}\, \Phi_j(q, b), \tag{2}$$

where $x_k$ is the $k$-th coordinate on the body, $k = 1, \dots, N$. The functions $\Phi_j$ form a basis in the space $L_2([-X_0, X_0])$, where $X_0$ is the characteristic maximal amplitude of motion for the $j$-th point, $b$ is a parameter, and $N_m$ is the number of basis functions. The matrix entry $W_{kj}$ describes the action of the node $j$ on $x_k$. Note that (2) defines a feed-forward network that maps the center states $q_i$ into the output coordinate $x_k$ by $N_m$ hidden neurons (satellites), and therefore, there are no interactions between satellites. Different choices of $\Phi_j$ are possible. For example, we can consider the following cases.

A. Harmonic basis. Here the $\Phi_j$ are harmonic functions, and $b$ is a frequency.

B. System of radial basis functions. For the case where a motion consists of many segments and we observe sharp transitions between those segments, we can use radial basis functions $\Phi_j$, obtained by evaluating a fixed function $\phi$ at the distance between $q$ and $\bar q^{(j)}$, scaled by a sharpness parameter $b$. Here $\bar q^{(j)} = (\bar q^{(j)}_1, \dots, \bar q^{(j)}_n)$ is the vector of centers of the $j$-th radial basis function, whose components are parameters of the system, and $|z|$ denotes the Euclidean norm of the vector $z$: $|z|^2 = \sum_{i=1}^{n} z_i^2$. We assume that the radial basis function $\phi(|z|)$ is well localized at $z = 0$ and is smooth. For example, we can take the Gaussian

$$\phi(|z|) = \exp(-|z|^2/2). \tag{5}$$

C. Polynomial basis.
The basis B has an important advantage: the radial basis functions provide local approximations that are important to approximate complicated motions with sharp transitions.
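The construction of this section (oscillating centers feeding a layer of radial basis functions whose weighted sum gives the output coordinates) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the centers are taken as $q_i(t) = \sin(\omega_i t)$ (one particular solution of the harmonic oscillator equation), the Gaussian is evaluated at $b\,|q - \bar q^{(j)}|$, and the RBF centers and weights are random placeholders. All function and variable names are hypothetical.

```python
import numpy as np

def center_states(t, omegas):
    """Center states q_i(t) = sin(omega_i * t); shape (len(t), n).
    Assumption: zero initial phase and unit amplitude."""
    return np.sin(np.outer(t, omegas))

def gaussian_rbf_features(q, rbf_centers, b):
    """Satellite activations exp(-b^2 |q - qbar_j|^2 / 2); shape (len(q), N_m).
    Assumption: the sharpness b multiplies the distance."""
    diff = q[:, None, :] - rbf_centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-0.5 * (b ** 2) * sq_dist)

def network_output(t, omegas, rbf_centers, b, W):
    """Outputs x_k(t) = sum_j W[k, j] * Phi_j(q(t)); shape (len(t), N)."""
    q = center_states(t, omegas)
    phi = gaussian_rbf_features(q, rbf_centers, b)
    return phi @ W.T

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
omegas = np.array([1.0, 2.5])                   # n = 2 centers
rbf_centers = rng.uniform(-1.0, 1.0, (25, 2))   # N_m = 25 satellites (placeholder positions)
W = rng.normal(size=(3, 25))                    # N = 3 output coordinates (placeholder weights)
x = network_output(t, omegas, rbf_centers, b=2.0, W=W)
```

Because the satellites only read the center states and do not interact with each other, the whole map from $q$ to $x$ is a single matrix product once the activations are computed.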
To perform switching in the network, we will also use the sigmoidal functions $\sigma$. They are increasing and smooth (at least twice differentiable) functions with finite limits at $\pm\infty$; a typical example is the logistic function $\sigma(x) = 1/(1 + \exp(-x))$.

The structure of interactions between centers and coordinates $x_i$ is illustrated in Figure 1. The graph consists of eight nodes denoted by $v_1, v_2, w_1, w_2, w_3, w_4, w_5, w_6$. Each node $w_i$ corresponds to the contribution of a radial basis function $\Phi(q - \bar q^{(j)})$. The nodes $v_1, v_2$ form the set of centers $C$, and they affect the $w_i$. In turn, the nodes $w_i$ determine the output coordinate $x_1$.

Centralized Networks Generating a Large Class of Human Body Motions
To approximate different motions by a single network, we should have the possibility of changing the frequencies and coefficients W kj .
The main idea is as follows. Each motion can be approximated by a network of the kind described in the previous subsection, with adjusted frequencies $\omega_i$ and appropriate coefficients $W_{kj}$. We can use nonlinear oscillators to obtain all possible frequencies. For example, one can use the model described below. Consider networks consisting of $n$ centers, which evolve as nonlinear oscillators (Equation (9)), where $q_i$ is the coordinate of the $i$-th oscillator, $f(q)$ is a nonlinear function, and $z_c$ is a control parameter (one can take, for instance, $f = \sin(q)$ or $f = aq - bq^3$). We assume that [...], where $p_0$ is a fixed number. Solutions of Equation (9) are periodic functions of time, with period $T(z_i)$ and frequency $\omega(z_i) = 2\pi/T(z_i)$. The period can be found from the motion integral of Equation (9) in terms of $F$, the antiderivative of $f$: $f(q) = dF/dq$. Notice that Equations (2)-(9) are "standard" (see e.g., [19]).
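To illustrate how the frequency of a nonlinear oscillator can be tuned by a parameter, consider $f = \sin(q)$, one of the examples named above. The following Python sketch assumes the conservative pendulum form $\ddot q + \sin(q) = 0$ and treats the swing amplitude as a stand-in for the control parameter; this choice is an assumption made for illustration and is not claimed to be the form of Equation (9). For this system the period is known in closed form through the arithmetic-geometric mean, $T = 2\pi / \mathrm{agm}(1, \cos(A/2))$ for amplitude $A$.

```python
import math

def agm(a, b, tol=1e-14):
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > tol * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def pendulum_period(amplitude):
    """Period of q'' + sin(q) = 0 for a swing amplitude in (0, pi).
    Assumption: the amplitude plays the role of the control parameter."""
    return 2.0 * math.pi / agm(1.0, math.cos(0.5 * amplitude))

def pendulum_frequency(amplitude):
    """Angular frequency omega = 2*pi / T."""
    return 2.0 * math.pi / pendulum_period(amplitude)

# The frequency decreases continuously as the amplitude grows,
# so a whole interval of frequencies is reachable by tuning one parameter.
frequencies = [pendulum_frequency(a) for a in (0.1, 1.0, 2.0, 3.0)]
```

In the small-amplitude limit the oscillator is harmonic with frequency 1, and the frequency tends to zero as the amplitude approaches $\pi$.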

Consider a set of human motions, each characterized by a set of coordinates, where the upper index $j$ corresponds to a particular motion. Each motion can be described by the model (1) and (2) with the corresponding frequencies $\omega^{(j)}_i$ and coefficients $W^{(j)}_{kl}$. By the switching module (described in the next subsection), we find a network subsystem which has the values $z^{(j)}_c$ as local attractors. Then, we can construct maps $z_c \to \omega_1(z_c), \dots, \omega_n(z_c)$ and $z_c \to W_{kl}(z_c)$ such that at each attractor $z^{(j)}_c$ they reproduce the frequencies and coefficients of the $j$-th motion (Equations (12) and (13)). Hence, our global model for human motion consists of
1. a system of $n$ nonlinear oscillators (9) with the control parameters $z_i$, $i = 1, \dots, n$;
2. an RBF network defined by (2);
3. maps obeying Equations (12) and (13); and
4. a switching module that is a network with $M + 1$ nodes, where $M$ is the number of different motions.
In the next section, we describe the switching module.

Switching Module
Ideas behind construction. Before giving a formal statement, we present a brief outline which describes the main ideas of the proof and the architecture of the switchable network. The network consists of two modules. The first module is a generating one, and it is a centralized neural network with $n$ centers $q_1, \dots, q_n$ and satellites $x_1, \dots, x_N$. The second module consists of a center $v_{n+1} = z$ and $m$ satellites $\tilde w_1, \dots, \tilde w_m$. The satellites from this module interact only with the module center $z$; i.e., in this module the interactions can be described by a distar graph [6]. Only the center of the second module interacts with the neurons of the first (generating) module. We refer to the second module as the switching one. This architecture is shown in Figure 2.
For the switching module, the corresponding differential equations have the following form. Let us consider a distar interaction motif, where a node $z$ is connected in both directions with $m$ nodes $\tilde w_1, \dots, \tilde w_m$. With this notation, the equations for the switching module can be written in the form of Equations (14) and (15), where $i = 1, \dots, m$ and $\tilde b_i, \tilde a_j, \tilde\lambda > 0$. In order to come up with a mathematical description of the way in which the switching module works, let us consider the system of differential equations (16), where $z$ is a real control parameter. Let $z_1, \dots, z_{m+1}$ be some values of this parameter. We find a vector field $Q$ such that for $z = z_l$, where $l = 1, \dots, m$, the system (16) has the prescribed dynamics. For example, we can set $n = 2$ and choose $Q$ such that (16) reduces to (9).
For the switching module, we adjust the center-satellite interactions and the center response time parameter $\xi$ in such a way that for a set of values $\xi$ the switching module has the dynamics of the system shown in (14) and (15), with $m$ different rest points $z = z_1, z_2, \dots, z_{m+1}$, and for sufficiently large $\xi$ the system shown in (14) and (15) has a single equilibrium close to $z_1 = 0$. The existence of such a choice will be shown in Lemma 1. This lemma has been stated and proven in the generic context ([6] Lemma 8.2). Due to its importance, we restate it here.

Figure 2. Modular architecture. This can be seen as an example of the architecture described in [6]. The switching module consists of the center $z$ and the satellites $\tilde w_1, \tilde w_2, \tilde w_3$. The generating module consists of the centers $v_1, v_2$ and the satellites $w_1, \dots, w_6$. Note that there is a feedback between $z$ and the satellites $\tilde w_i$; however, there is no feedback of $w_j$ on $v_l$.

Lemma 1. Let $\beta \in (0, 1)$ and let $m$ be a positive integer. For sufficiently small $\kappa > 0$, there exist $\bar a_j, b_i, h_i, h$ such that (i) for an open interval of values $\xi$ the system in (14) and (15) has $m$ stable hyperbolic rest points [...], where $j = 1, \dots, m$; (ii) for $\xi > \xi_0 > 0$ the system in (14) and (15) has a single stable hyperbolic rest point.
For the proof of this lemma we refer to [6].

Non-Segmented Motions
Simple motions can be handled as a whole (i.e., without any segmentation). Let us fix a particular motion. Let $t_1, \dots, t_K$ be the time moments at which we have data on the human body coordinates $X_k(t), Y_k(t), Z_k(t)$, where $k$ is the index of an optical marker on the body and $N$ is the number of markers, $k = 1, \dots, N$. All $X$, $Y$, and $Z$ are thus vectors with $N$ components. Let $\varepsilon(k, \omega)$ be the $L_2$-approximation accuracy for the x-component and the $k$-th marker, i.e., the $L_2$ distance over $t_1, \dots, t_K$ between the measured $X_k$ and the network output $x_k(q)$ defined by (2). The accuracies for the y- and z-components are defined similarly. The relative accuracies for the X, Y, Z components are denoted by $\varepsilon_{r,X,\omega}(k)$, $\varepsilon_{r,Y,\omega}(k)$, and $\varepsilon_{r,Z,\omega}(k)$, respectively. Let us fix a $k \in \{1, 2, \dots, N\}$ (i.e., a marker on the human body). For a set of frequency vectors $\omega$, we compute the integral relative accuracy

$$\varepsilon_{r,k}(\omega) = \left(\varepsilon^2_{r,X,\omega}(k) + \varepsilon^2_{r,Y,\omega}(k) + \varepsilon^2_{r,Z,\omega}(k)\right)/3.$$
Then, we find an $\omega^*$ such that the integral relative accuracy is minimal:

$$\varepsilon_{r,k}(\omega^*) = \min_{\omega} \varepsilon_{r,k}(\omega).$$

The corresponding coefficients $W_{kl}$ can be found by standard MATLAB routines, which approximate a target function by RBF networks. Here we use standard radial basis functions of Gaussian type, where the sharpness parameter $b$ can be adjusted by trial and error to minimize $\varepsilon$.
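For a fixed frequency vector, the fitting step amounts to a linear regression of the marker coordinates on the satellite activations. The MATLAB routines mentioned above are not reproduced here; the following Python sketch uses an ordinary least-squares fit with NumPy as a stand-in. It assumes centers $q_i(t) = \sin(\omega_i t)$, Gaussian basis functions on a fixed grid, and a relative accuracy defined as the $L_2$ error divided by the $L_2$ norm of the target. These choices, the synthetic trajectory, and all names are illustrative assumptions.

```python
import numpy as np

def rbf_design_matrix(t, omegas, rbf_centers, b):
    """Gaussian activations of the satellites along the trajectory q(t)."""
    q = np.sin(np.outer(t, omegas))  # assumed center states
    sq_dist = np.sum((q[:, None, :] - rbf_centers[None, :, :]) ** 2, axis=2)
    return np.exp(-0.5 * b ** 2 * sq_dist)

def fit_weights(phi, target):
    """Least-squares weights minimizing |phi @ W - target|^2."""
    W, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return W

def relative_l2_error(target, approx):
    """Assumed definition: L2 error normalized by the L2 norm of the target."""
    return np.linalg.norm(target - approx) / np.linalg.norm(target)

def integral_relative_accuracy(xyz, approx_xyz):
    """Mean of the squared relative accuracies of the X, Y, Z components."""
    errs = [relative_l2_error(xyz[:, c], approx_xyz[:, c]) for c in range(3)]
    return sum(e ** 2 for e in errs) / 3.0

# Synthetic single-marker trajectory driven by two oscillators (placeholder data).
t = np.linspace(0.0, 4.0 * np.pi, 400)
omegas = np.array([1.0, 2.0])
grid = np.linspace(-1.0, 1.0, 5)
rbf_centers = np.array([(u, v) for u in grid for v in grid])  # 25 satellites
xyz = np.column_stack([np.sin(t), np.cos(2.0 * t), np.sin(t) * np.sin(2.0 * t)])

phi = rbf_design_matrix(t, omegas, rbf_centers, b=2.0)
W = fit_weights(phi, xyz)
eps = integral_relative_accuracy(xyz, phi @ W)
```

Since the weights enter linearly, only the frequencies and the sharpness parameter require a nonlinear search; the weights follow from a single linear solve for each candidate.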
Numerical results show that the frequencies found for a particular motion from one value of $k$ (a specific marker choice) and giving a small $\varepsilon_{r,k}$ can be applied to find good approximations for all remaining values of $k$ (i.e., for all other markers). An alternative method is to minimize the average of $\varepsilon_{r,k}(\omega)$ over all markers. However, in this case the running time of the algorithm sharply increases.

Segmented Motions
For complex motions it is difficult to uniformly approximate a whole motion using a few neurons; sometimes such an approximation is good everywhere except for a certain interval. In fact, it is difficult to expect that all parts of complicated motions consisting of quite different elementary submotions can be handled with the same frequencies. However, we can use segmentation. We then decompose the motion into segments $[T_i, T_{i+1}]$, where $i = 1, \dots, N_{seg}$. For each segment we can determine optimal frequencies as described above and compute the accuracies. The frequency optimization can be done in two ways. If the number of oscillators is small (say, $n = 1, 2$), we can perform an exhaustive search over a uniform grid. For larger $n$, one can use a random search.
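The two search strategies for the frequencies can be sketched as follows. This Python example is an illustration under assumptions, not the procedure used for the reported results: the fit error is the relative $L_2$ error of a least-squares RBF fit with centers $q_i(t) = \sin(\omega_i t)$, the candidate grid and sampling bounds are arbitrary, and the target is a synthetic one-dimensional signal. All names are hypothetical.

```python
import itertools
import numpy as np

def fit_error(t, target, omegas, rbf_centers, b=2.0):
    """Relative L2 error of the best linear RBF readout for given frequencies."""
    q = np.sin(np.outer(t, omegas))  # assumed center states
    sq = np.sum((q[:, None, :] - rbf_centers[None, :, :]) ** 2, axis=2)
    phi = np.exp(-0.5 * b ** 2 * sq)
    W, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return np.linalg.norm(phi @ W - target) / np.linalg.norm(target)

def grid_search(t, target, grid, n, rbf_centers):
    """Exhaustive search over all n-tuples of grid frequencies (small n only)."""
    best = min(itertools.product(grid, repeat=n),
               key=lambda om: fit_error(t, target, np.array(om), rbf_centers))
    return np.array(best)

def random_search(t, target, low, high, n, rbf_centers, trials, rng):
    """Uniform random sampling of frequency vectors (for larger n)."""
    candidates = rng.uniform(low, high, size=(trials, n))
    errors = [fit_error(t, target, om, rbf_centers) for om in candidates]
    return candidates[int(np.argmin(errors))]

# Synthetic segment with a single true frequency of 1 (placeholder data).
t = np.linspace(0.0, 4.0 * np.pi, 400)
target = np.sin(t)[:, None]
rbf_centers = np.linspace(-1.0, 1.0, 9)[:, None]   # 9 satellites, n = 1

best_grid = grid_search(t, target, [0.5, 1.0, 2.0, 3.0], 1, rbf_centers)
best_random = random_search(t, target, 0.5, 3.0, 1, rbf_centers,
                            trials=20, rng=np.random.default_rng(1))
```

The grid search costs (grid size)$^n$ fits and quickly becomes impractical as $n$ grows, which is why random sampling is the cheaper option for more than two oscillators.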

Comparison with DMPs
Let us compare the approach based on centralized networks proposed in this paper with the classical method of dynamic movement primitives (DMPs). Both approaches use the same general representation, which, following [24], we write down as Equations (28) and (29) (see Equations (1) and (2) in [24]). The first equation is a time-dependent dynamical system, and the second one describes a transformation of trajectories of that dynamical system to desired trajectories y(t). Note that the term P(s) = Perturbation(s) should be adapted to induce a desired behaviour in the system; i.e., to reproduce a given trajectory [24]. So, a DMP consists of two parts, as described by Ernesti et al. [24]: "the canonical system and the transformation system. While the canonical system defines the state of the DMP in time, the transformation system is the link between this DMP state and the robot. The transformation system can be easily adapted to a desired trajectory; i.e., by solving a standard regression problem. The canonical system determines the type of attractor which can be either discrete or periodic".
The DMP method uses P(s) to attain a twofold goal: to represent trajectories tending to rest points and periodic trajectories. In fact, roughly speaking, the dynamics of any dissipative system reduces to some transient trajectories and motions on local attractors. However, it is not so simple to represent transient dynamics and motions on attractors simultaneously, as was mentioned in [22]. To attain this goal, we must use sufficiently sophisticated formulas for P(s), which are based mainly on radial basis functions and the fact that RBFs are universal approximators.
In our centralized network approach, we use the same transformation system (29). However, we add a new idea in the representation of canonical part (28). It is well known that many motions generated by dissipative systems consist of slow and fast components. Fast components can describe, for example, transient trajectories, while slow components correspond to motions on local attractors. To represent such complex dynamics, we can nonetheless use systems of oscillators [37].
In particular, in our approach we usually use two oscillators, one of higher frequency and another of lower frequency, although one can take three or more oscillators for complicated target motions. This idea works well: we greatly simplify the complicated formulas suggested in [22], and all transformation systems take the feed-forward form.

Results
For empirical tests we use the CMU Motion Capture Database [38]. We use two motions from family number 86, as these consist of sequences of several different motions performed by one actor in succession, and hence have also been used as a test suite for different motion segmentation algorithms (e.g., [39] and references therein).
We use markers on left and right heels and left and right wrists, as in general from the position of these four markers even the full body motion can be reconstructed quite well [40,41].

Results without Segmentation and Ad Hoc Segmentations
We have considered two representative motions: Trial 1 and Trial 2. The first motion consists of jumping, hopping, turning, kicking, and punching; the second one comprises walking, squatting, running, stretching, jumping, punching, and drinking. The first motion is split into four segments, the first of which is [1,1300]. An overview of results is given in Table 1. In Figures 3-6 we show the results of the approximations of the different marker coordinates by 25, 50, and 100 satellites and two oscillators for the motion CMU 86 Trial 1, consisting of jumping, kicking, and punching. In Figure 7 a three-dimensional plot of the marker trajectory of the right wrist of the same motion is presented.
In Figures 8 and 9 we give the approximations for a simple non-segmented motion. An approximation of a complicated motion (CMU 86 Trial 2) consisting of walking, squatting, running, stretching, jumping, punching, and drinking is given in Figures 10-12. For this motion an RBF network with three centers and 100 satellites was necessary for good approximations with a segmentation by hand into three parts.
The integral relative accuracies (for x, y, and z coordinates together) are as follows: for the first segment consisting of walking and squats the accuracy is 0.006, for the second segment consisting of running (in a circle) it is 0.002, and for the third one consisting of stretches it is 0.005.
As the accuracies of course improve when using more satellites, we computed the Akaike information criterion corrected for finite sample sizes (AICc) [42], a likelihood-based measure that weights accuracy against the number of parameters (with lower AICc values being better), in systematic tests involving 20-250 satellites. The results for Trial 1 with 1, 2, 3, and 4 centers are given in Figure 13.
Notice that only the comparative values are important. The global optimum is reached for three centers and 150 satellites (with a small increase for higher satellite numbers).
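For reference, the finite-sample correction adds the term $2k(k+1)/(n-k-1)$ to the AIC, where $k$ is the number of fitted parameters and $n$ the number of samples. The Python sketch below evaluates it for a least-squares fit, assuming independent Gaussian residuals so that the likelihood term reduces to $n \ln(\mathrm{RSS}/n)$ up to a constant; how the likelihood was computed for Figure 13 is not stated here, so this form and the function name are assumptions.

```python
import math

def aicc_least_squares(rss, n_samples, n_params):
    """AICc for a least-squares fit, assuming i.i.d. Gaussian residuals.

    rss: residual sum of squares, n_samples: number of data points,
    n_params: number of fitted parameters (e.g., satellite weights).
    Additive constants that do not depend on the model are dropped.
    """
    if n_samples - n_params - 1 <= 0:
        raise ValueError("AICc requires n_samples > n_params + 1")
    aic = n_samples * math.log(rss / n_samples) + 2 * n_params
    correction = 2 * n_params * (n_params + 1) / (n_samples - n_params - 1)
    return aic + correction

# Doubling the parameters is only worthwhile if the residual drops enough.
small_model = aicc_least_squares(rss=10.0, n_samples=1000, n_params=50)
large_model = aicc_least_squares(rss=9.9, n_samples=1000, n_params=100)
```

Because additive constants are dropped, only differences between models fitted to the same data are meaningful, in line with the remark that only the comparative values are important.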

Results Based on Algorithmic Segmentations as Pre-Processing Steps
Algorithmic segmentation methods yield much smaller segments than our ad hoc segmentations. Using the method described by Krüger et al. [39] as a pre-processing step, the motion CMU 86 Trial 1 is segmented into six main parts with five transition motions. When taking each of the 11 segments as an input, only 16 satellite neurons per segment are sufficient for good approximations. Notice that the required number of neurons is more than a factor of 15 smaller than the number of frames in each segment.
In Figure 14 the absolute and relative integral errors for all 31 markers and all 11 segments of CMU 86 Trial 1 using two centers and 49 satellites are given. The differences between the relative integral errors and the absolute errors can be explained by the large motions of some markers in some segments. We observe a certain ruggedness of the fitting landscape, which can be explained by the rather complicated nature of the motions and the transitions between motions of very different characteristics.
In Figure 15 a systematic comparison of the approximation errors over the different segments in CMU 86 Trial 1 is given when using 16 and 100 satellites, respectively.

Figure 15. Approximation errors (in mm) for the algorithmically found segments of motions in CMU 86 Trial 1. We give the errors for using 16 and 100 satellites. As a comparison we give the results using rhythmic dynamic movement primitives (DMPs) with 100 basis functions computed with pydmps, and the average approximation error of the Bayesian approach reported in [28] (Table 3). Notice that segments 2, 4, 6, and 8 are short transitional motions between the neighboring segments. The average over all segments is 8.7 mm for the DMPs, 7.9 mm for two centers and 16 satellites, and 6.7 mm for two centers and 100 satellites.

Comparison with Other Approaches
In [28], a method for marker reconstruction was presented that is based on local similarity searches, builds local linear models of the similar motions found, and uses this information as priors in a pose-wise reconstruction process; results on marker accuracy were reported for the motions of family CMU 86 from the CMU mocap database. The overall Bayesian framework is similar to that already suggested by [27]. The reported average joint error for CMU 86 Trial 1 is 1.30 cm ([28], Table 3), when taking prior information of motions from the CMU database into account. In our approach the average joint errors are more than one order of magnitude smaller. Although the results are not fully comparable, it is encouraging to see that our approach gives better results even without relying on prior information of other motions, as the Bayesian approach used in [28] does.
We approximated the marker trajectories of the segmented motions of CMU motion 86 Trial 1 and Trial 2 with DMPs using the pydmps implementation by Travis DeWolf, which is available at https://github.com/studywolf/pydmps. We have used the code for the rhythmic DMPs (using 100 basis functions) on the algorithmically segmented motions. The results in comparison to our centralized networks are detailed in Figures 15 and 16.

Discussion and Conclusions
We have shown that marker trajectories of representative body parts can be approximated well by centralized networks consisting of very few centers acting as oscillators: two to three oscillators have been shown to be sufficient even for rather complicated motions. The number of satellites required even for very good approximations is one to two orders of magnitude smaller than the number of frames considered; hence, our technique yields very compact representations and compresses marker trajectories. The learned frequencies of one marker could be transferred to other markers, so our technique seems to be capable of addressing the problem of motion reconstruction from a few markers [27]. As this problem is of particular practical interest if the input data are not marker positions but sensor readings of inertial measurement units [40,41], an application of our method to this setting is of interest. As the accuracy of reconstruction at the level of a single marker is very good, we presume that such a technique could also yield much better reconstruction results than the existing Bayesian approaches. In future work we will investigate this line of research. Additionally, surface electromyography (EMG) has become an increasingly practical sensor technology for human motion interaction (e.g., the Myo Gesture Control Armband), and our technique can be used for sensor data of different kinds. For investigating surface EMG signals of animal motions, centralized networks might likewise yield another basic technique, which we will test on existing data sets [43].
While DMPs commonly use a single oscillator, and their use of radial basis functions closely corresponds to our technique, we can readily use more than one oscillator. In our experiments, the use of two (or three) oscillators yielded better results than using just one. The idea of switching has also been proposed in the context of DMPs [25], yielding in some sense a conceptual ad hoc extension. As has been noted in [25] (Section 2.3.5), the modeling easily becomes complex. In our proposal the switching module stays within the realm of centralized networks. In principle, our techniques should be applicable in all contexts in which DMPs have been used, yielding a simpler modeling alternative.
The oscillator frequencies give very useful semantic information on motions that should be widely applicable. By searching for similar vectors of oscillator frequencies, our technique can also give a basis for motion retrieval, which in contrast to other techniques does not involve similarity measures for poses first [28] but works directly on marker trajectories. As vectors of oscillation frequencies are readily indexable, efficient retrieval from even huge motion databases is possible, and fine-tuning the query regarding the weighting of different body parts is possible without re-indexing the entire database. By manipulating the oscillator frequencies or transferring them to other marker positions, the presented techniques are also capable of various motion adaptation and synthesis tasks, which range from a new technical basis for the classic ideas by Pullen and Bregler [44] to ideas related to motion fields [45]. It will be the topic of future work to explore these directions in more detail.
In our current method there is no need to use a priori knowledge on human motions by referring to similar known motions, as is the basis of Bayesian approaches [26][27][28]. While this is an advantage on the one hand, it is a disadvantage on the other hand if such a priori knowledge on "similar motions" is available. Incorporating such a possibility is very much in the realm of neural networks, and will be a topic for future research.
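The retrieval idea can be sketched as a weighted nearest-neighbor query over stored frequency vectors. The Python example below is a hypothetical illustration only: the database layout (one frequency vector per motion and body part), the weighted squared Euclidean distance, and all names are assumptions, and no such retrieval experiment is reported here. Because the body-part weights enter only at query time, they can be changed without touching the stored vectors.

```python
import numpy as np

def retrieve(query, database, weights=None, top_k=3):
    """Rank stored motions by weighted distance between frequency vectors.

    database: array of shape (M, P, n) with M motions, P body parts,
              and n oscillator frequencies per body part (assumed layout).
    query:    array of shape (P, n).
    weights:  optional array of shape (P,) with per-body-part weights.
    Returns the indices of the top_k closest motions.
    """
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    if weights is None:
        weights = np.ones(database.shape[1])
    per_part = np.sum((database - query[None, :, :]) ** 2, axis=2)  # (M, P)
    distances = per_part @ np.asarray(weights, dtype=float)         # (M,)
    return np.argsort(distances, kind="stable")[:top_k]

# Three stored motions, two body parts, two oscillators each (placeholder values).
database = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.0], [9.0, 9.0]],
    [[5.0, 5.0], [3.0, 4.0]],
])
query = np.array([[1.0, 2.0], [3.0, 4.0]])

equal_weighting = retrieve(query, database)
first_part_emphasis = retrieve(query, database, weights=[1.0, 0.1])
```

With equal weights the third motion ranks above the second; down-weighting the second body part reverses this order, using the same stored vectors.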
Moreover, centralized networks should also be applicable in the context of motion anticipation: by extrapolation from the past into the future, the presented technique also has the potential for full body motion anticipation in the short-term when staying within a fixed tuple of oscillator frequencies and for the mid-term range when using switching. We will explore this possibility within our future research in the collaborative research unit "Anticipating Human Behavior", funded by Deutsche Forschungsgemeinschaft under grant number FOR 2535.