This section introduces the methods used to reconstruct analytical dynamic models and their governing equations from time series, together with the tools used to optimize the identification of the system for the different configurations studied in this work.
Sparse Identification of Nonlinear Dynamics (SINDy)
The algorithm introduced by Brunton et al. [24], named sparse identification of nonlinear dynamics (SINDy), is a method that identifies nonlinear dynamical systems using sparse regression and sparse representation; see
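Figure 1. It is based on the assumption that the observed dynamics are sparse: most physical systems can be described by governing equations that have only a few active terms in the high-dimensional space of possible nonlinear functions. Nonlinear dynamical systems, as found in structural dynamics, can be represented as
$$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)), \tag{1}$$
where $\mathbf{x}(t) \in \mathbb{R}^n$ represents the state of the system at time $t$ and the nonlinear function $\mathbf{f}(\mathbf{x}(t))$ represents the dynamic constraints that define the equation of motion of the system. The function $\mathbf{f}$ often consists of only a few terms, making it sparse in the space of possible functions. To determine $\mathbf{f}$ from data, a time history of the state $\mathbf{x}(t)$ is collected, the derivative $\dot{\mathbf{x}}(t)$ is measured or approximated numerically from the states, and both are arranged in the matrices $\mathbf{X}$ and $\dot{\mathbf{X}}$, whose rows correspond to the sampling instants. Next, an augmented library $\boldsymbol{\Theta}(\mathbf{X})$, consisting of candidate nonlinear functions of the columns of $\mathbf{X}$, is constructed; it may consist of constants, polynomials (such as quadratic and higher-order products of the states), trigonometric functions, and other terms. Each column of $\boldsymbol{\Theta}(\mathbf{X})$ represents a candidate function for the right-hand side of Equation (1). Only a few of these functions are active in each row of $\mathbf{f}$, so a regression problem is set up to determine the coefficient vectors $\boldsymbol{\Xi} = [\boldsymbol{\xi}_1 \; \boldsymbol{\xi}_2 \; \cdots \; \boldsymbol{\xi}_n]$ that determine which nonlinear functions are active for each degree of freedom:
$$\dot{\mathbf{X}} = \boldsymbol{\Theta}(\mathbf{X})\,\boldsymbol{\Xi}.$$
Here, the central idea is to solve the over-determined system of equations such that the coefficient vectors are sparse. Hence, contrary to classical least-squares solutions, small coefficients are set to zero according to the sparsification parameter $\lambda$ in an iterative fashion. As a result, each column $\boldsymbol{\xi}_k$ of $\boldsymbol{\Xi}$ is a sparse vector of coefficients determining which terms are active in the right-hand side of one of the row equations $\dot{x}_k = f_k(\mathbf{x})$ in Equation (1). Once $\boldsymbol{\Xi}$ has been determined, each row of the governing equations can be read directly from it:
$$\dot{x}_k = f_k(\mathbf{x}) = \boldsymbol{\Theta}(\mathbf{x}^T)\,\boldsymbol{\xi}_k,$$
where $\boldsymbol{\Theta}(\mathbf{x}^T)$ is a vector of symbolic functions of the elements of $\mathbf{x}$, in contrast to the data matrix $\boldsymbol{\Theta}(\mathbf{X})$. Therefore, the overall model is
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) = \boldsymbol{\Xi}^T \left(\boldsymbol{\Theta}(\mathbf{x}^T)\right)^T.$$
Each coefficient vector $\boldsymbol{\xi}_k$ defines the linear combination of nonlinear functions for the corresponding state $x_k$.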
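The regression step described above can be illustrated with a minimal sketch of sequentially thresholded least squares. It is not the implementation used in this work: the test system `f_true`, the quadratic library in `library`, the helper names `stlsq` and `rk4`, and the threshold `lam=0.05` are all assumptions chosen for illustration, and the derivatives are evaluated exactly from the known test system rather than estimated from data.

```python
import numpy as np

def library(X):
    """Candidate library Theta(X) for two states: 1, x1, x2, x1^2, x1*x2, x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

def stlsq(Theta, dX, lam, n_iter=10):
    """Sequentially thresholded least squares for dX = Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < lam              # entries below the sparsification threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):          # refit each state equation on its active terms
            active = ~small[:, k]
            if active.any():
                Xi[active, k] = np.linalg.lstsq(Theta[:, active], dX[:, k], rcond=None)[0]
    return Xi

def f_true(x):
    """Hypothetical test system (assumption): lightly damped linear oscillator."""
    return np.array([-0.1 * x[0] + 2.0 * x[1], -2.0 * x[0] - 0.1 * x[1]])

def rk4(f, x0, dt, n):
    """Fixed-step fourth-order Runge-Kutta integration returning an (n, dim) history."""
    X = np.empty((n, len(x0)))
    X[0] = x0
    for i in range(n - 1):
        k1 = f(X[i])
        k2 = f(X[i] + 0.5 * dt * k1)
        k3 = f(X[i] + 0.5 * dt * k2)
        k4 = f(X[i] + dt * k3)
        X[i + 1] = X[i] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

X = rk4(f_true, np.array([2.0, 0.0]), dt=0.01, n=1000)   # state history, rows = time samples
dX = np.array([f_true(x) for x in X])                    # exact derivatives at the samples
Xi = stlsq(library(X), dX, lam=0.05)

# Read the identified equations column by column from Xi.
terms = ["1", "x1", "x2", "x1^2", "x1*x2", "x2^2"]
for k in range(Xi.shape[1]):
    rhs = " + ".join(f"{c:.3f}*{name}" for c, name in zip(Xi[:, k], terms) if c != 0.0)
    print(f"dx{k + 1}/dt = {rhs}")
```

Each column of `Xi` plays the role of one coefficient vector, so the non-zero rows of a column indicate which library functions are active in the corresponding state equation.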
Often, only a fraction of all states of a dynamical system can be measured during experiments; Takens' theorem [26] allows the full dynamics of a nonlinear system to be reconstructed from a single time series. The states in the reconstructed space are not identical to the states in the true phase space, but the reconstructed trajectories are useful because they are topologically similar to the original dynamics and therefore share the geometrical and dynamical properties of the measured dynamics. The strategy for state-space reconstruction is time-delay embedding, where a single time series $s(t)$ is rearranged into $m$-dimensional reconstruction-space vectors $\mathbf{y}(t) = [s(t), s(t+\tau), \ldots, s(t+(m-1)\tau)]$ built from $m$ time-delayed samples of the measurements $s(t)$. The embedding parameters are the delay $\tau$, derived from the first zero of the autocorrelation function of the time series, and the embedding dimension $m$, estimated with the false nearest neighbors (FNN) algorithm, a standard tool for determining the embedding dimension [27,28,29].
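As an illustration of the time-delay embedding, the following sketch picks the delay from the first non-positive value of the autocorrelation function and stacks delayed copies of a scalar signal. The function names `first_zero_autocorr` and `delay_embed`, the sinusoidal test signal, and the forward-delay convention are assumptions made for this example; the embedding dimension `m` is simply passed in, since the FNN estimate is not reproduced here.

```python
import numpy as np

def first_zero_autocorr(s):
    """Return the first lag (in samples) at which the autocorrelation is <= 0."""
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    acf = np.correlate(s, s, mode="full")[len(s) - 1:]   # lags 0, 1, 2, ...
    acf = acf / acf[0]
    crossings = np.where(acf <= 0.0)[0]
    if crossings.size == 0:
        raise ValueError("autocorrelation has no zero crossing within the record")
    return int(crossings[0])

def delay_embed(s, m, tau):
    """Build vectors [s[i], s[i+tau], ..., s[i+(m-1)*tau]] as rows of a matrix."""
    s = np.asarray(s, dtype=float)
    n = len(s) - (m - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for the requested m and tau")
    return np.column_stack([s[k * tau : k * tau + n] for k in range(m)])

# Hypothetical measurement: a sinusoid with a period of 100 samples.
t = np.arange(0.0, 20.0, 0.01)
s = np.sin(2.0 * np.pi * t)

tau = first_zero_autocorr(s)      # delay in samples, close to a quarter period
Y = delay_embed(s, m=2, tau=tau)  # reconstructed two-dimensional trajectory
print(tau, Y.shape)
```

For a sinusoid, a delay near a quarter period makes the two coordinates close to orthogonal, so the reconstructed trajectory approximates a closed loop rather than a collapsed line.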
SINDy also requires the time derivatives of the states, which can be measured or generated numerically. As proposed in [24,25], Total Variation Regularized Numerical Differentiation (TVRegDiff) [30] is used to compute derivatives numerically without noise amplification.
SINDy tries to reconstruct the time series accurately and can therefore generate models of high complexity that reproduce the given data perfectly but rely on a larger number of active functions than the actual underlying governing equations. Tools proposed by Stender et al. [25] are used to improve and automate the identification of a sparse system with SINDy. The first algorithm finds a suitable value of the sparsification parameter $\lambda$, on which the population of the coefficient matrix $\boldsymbol{\Xi}$ depends: $\lambda$ is varied so that the share of non-zero entries (NZE) of $\boldsymbol{\Xi}$ covers the full range from NZE = 100% to NZE = 0%, and the value of $\lambda$ that minimizes the error between the input signal and the signal obtained through time integration of the identified set of ODEs is selected. Then, an optimization is introduced to find coefficient values that improve the reconstruction of the system: the non-zero entries found by SINDy are varied within prescribed bounds, and the sequential quadratic programming (SQP) method [31] is used to find values that further reduce the reconstruction error.
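The selection of the sparsification parameter can be sketched as a simple sweep. The example below is an illustration under stated assumptions, not the algorithm of [25]: the test system, the noise level, the candidate values in `lambdas`, the root-mean-square error measure, and the helper names `sweep_lambda`, `stlsq`, `library`, and `rk4` are all hypothetical, and the subsequent SQP refinement of the coefficients is not included.

```python
import numpy as np

def library(X):
    """Candidate library for two states: 1, x1, x2, x1^2, x1*x2, x2^2."""
    X = np.atleast_2d(X)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2])

def stlsq(Theta, dX, lam, n_iter=10):
    """Sequentially thresholded least squares for dX = Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < lam
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            active = ~small[:, k]
            if active.any():
                Xi[active, k] = np.linalg.lstsq(Theta[:, active], dX[:, k], rcond=None)[0]
    return Xi

def rk4(f, x0, dt, n):
    """Fixed-step fourth-order Runge-Kutta integration."""
    X = np.empty((n, len(x0)))
    X[0] = x0
    for i in range(n - 1):
        k1 = f(X[i])
        k2 = f(X[i] + 0.5 * dt * k1)
        k3 = f(X[i] + 0.5 * dt * k2)
        k4 = f(X[i] + dt * k3)
        X[i + 1] = X[i] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return X

def sweep_lambda(X, dX, dt, lambdas):
    """For each lambda: identify a model, integrate it, and score it against X."""
    Theta = library(X)
    results = []
    for lam in lambdas:
        Xi = stlsq(Theta, dX, lam)
        model = lambda x, Xi=Xi: (library(x) @ Xi)[0]   # identified right-hand side
        with np.errstate(all="ignore"):                 # tolerate diverging models
            X_rec = rk4(model, X[0], dt, len(X))
            err = np.sqrt(np.mean((X_rec - X) ** 2))
        if not np.isfinite(err):
            err = np.inf
        nze = 100.0 * np.count_nonzero(Xi) / Xi.size    # share of non-zero entries in %
        results.append((lam, nze, err, Xi))
    best = min(results, key=lambda r: r[2])             # smallest reconstruction error
    return best, results

# Hypothetical data (assumption): damped linear oscillator with noisy derivatives.
f_true = lambda x: np.array([-0.1 * x[0] + 2.0 * x[1], -2.0 * x[0] - 0.1 * x[1]])
dt = 0.01
X = rk4(f_true, np.array([2.0, 0.0]), dt, 1000)
rng = np.random.default_rng(0)
dX = np.array([f_true(x) for x in X]) + 0.005 * rng.standard_normal(X.shape)

best, results = sweep_lambda(X, dX, dt, lambdas=[0.0, 0.01, 0.05, 0.5, 3.0])
for lam, nze, err, _ in results:
    print(f"lambda = {lam:5.2f}   NZE = {nze:5.1f}%   error = {err:.3e}")
best_lam, best_nze, best_err, best_Xi = best
```

The candidate values are chosen so that the sweep spans a fully populated coefficient matrix at one end and an empty one at the other. A bounded refinement of the retained coefficients could then be attempted with an SQP-type solver, for example `scipy.optimize.minimize` with `method="SLSQP"`, which is not shown here.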
In this work, the forcing terms are appended directly to the library of nonlinear functions as a column, instead of introducing an additional degree of freedom for time. In practical applications, the forcing is usually a known quantity because it is measured as a time-dependent parameter, which is reflected in the proposed setup.
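A minimal sketch of this setup is given below, assuming a hypothetical forced linear oscillator and a small library containing the displacement, the velocity, a cubic stiffness candidate, and the measured forcing. The parameter values, the forcing signal, and the helper `stlsq` are illustrative assumptions, and the derivatives are evaluated exactly from the known test system.

```python
import numpy as np

def stlsq(Theta, dX, lam, n_iter=10):
    """Sequentially thresholded least squares for dX = Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < lam
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            active = ~small[:, k]
            if active.any():
                Xi[active, k] = np.linalg.lstsq(Theta[:, active], dX[:, k], rcond=None)[0]
    return Xi

omega, zeta = 2.0, 0.05                     # hypothetical oscillator parameters

def forcing(t):
    """Hypothetical measured forcing signal."""
    return 0.5 * np.cos(1.5 * t)

def f_true(t, x):
    """State form of x'' + 2*zeta*omega*x' + omega^2*x = F(t), with x = [disp, vel]."""
    return np.array([x[1], -omega**2 * x[0] - 2.0 * zeta * omega * x[1] + forcing(t)])

# Generate a response history with a fixed-step RK4 scheme.
dt, n = 0.01, 2000
t = dt * np.arange(n)
X = np.empty((n, 2))
X[0] = [1.0, 0.0]
for i in range(n - 1):
    k1 = f_true(t[i], X[i])
    k2 = f_true(t[i] + 0.5 * dt, X[i] + 0.5 * dt * k1)
    k3 = f_true(t[i] + 0.5 * dt, X[i] + 0.5 * dt * k2)
    k4 = f_true(t[i] + dt, X[i] + dt * k3)
    X[i + 1] = X[i] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
dX = np.array([f_true(ti, xi) for ti, xi in zip(t, X)])

# Library columns: displacement, velocity, cubic candidate, and the forcing itself.
F = forcing(t)
Theta = np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 3, F])
Xi = stlsq(Theta, dX, lam=0.05)
print(Xi)
```

Because the forcing enters as an ordinary library column, its coefficient is identified together with the stiffness and damping terms, and no extra state for time is needed.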