1. Introduction
The algebra of the Kronecker product is essential to multivariate statistical analysis. It is involved in the formulation of multivariate statistical models and in the derivation of their estimating equations. Such derivations commonly require the differentiation of functions of matrices in respect of their matrix arguments. This generates expressions that demand the use of the Kronecker product (see Pollock (1979, 1985) and Magnus (2010) for examples).
Given that the subject is so well served, one might wonder what scope exists for further contributions. The repository of specialised results is extensive and most of what is needed is available, if looked for diligently. Nevertheless, the subject remains problematic. In many respects, it lacks sufficient transparency to allow easy access to its users, who are liable to call upon it only to satisfy their occasional needs.
Much of the problem lies in the notational difficulty of the subject. This is not primarily a matter of notational complexity. It is more a result of the use of a notation that conceals the underlying structure of the objects of the analysis. A litany of results has been created that are not readily proven and that are mostly unmemorable.
The contention that underlies this paper is that many of the difficulties can be relieved by adopting an appropriate notation involving an explicit use of indices. In the context of a treatise on differential geometry, Cartan (1952) famously decried the confusion that can arise from what he described as une débauche d’indices. He suggested that a clearer understanding of the geometry could be achieved by appealing to abstract concepts. This paper offers a different opinion. An orderly index notation is an indispensable aid to efficient computation. It assists rather than impedes ready access to vector space interpretations.
The continued adherence of econometricians and statisticians to the conventional notation of matrix algebra, when dealing with Kronecker products, is explained by one of their essential purposes, which is to cast complex multivariate models into the formats of simpler models for which there exist well developed theories. An early example of the use of the Kronecker product in econometrics was provided by Zellner (1962) in his treatment of the so-called seemingly unrelated regression equations. His treatment enabled the system of equations to be cast in the format of a single multiple regression equation, which enabled him to exploit the existing results pertaining to such equations.
This paper is concerned primarily with matters of computation. Therefore, the next section considers the alternative ways in which computers handle multidimensional arrays.
Section 3 presents an index notation, which is designed to assist in computations, and it provides an analysis of the Kronecker product, which is the matrix of a tensor product with respect to a natural basis.
Section 4, which may be omitted without interrupting the flow of the exposition, describes the formal algebra of the tensor product, and it mentions some of its applications.
Section 5 and Section 6 deal with a particular problem concerning multiple Kronecker products, which, it is believed, will serve to illustrate the power of the index notation.
2. The Multidimensional Framework and the Computer
The natural framework for a multi-dimensional array is a set of points in the positive orthant of a multi-dimensional space that have integer-valued co-ordinates. These co-ordinates will serve as the indices of the elements of the array.
The simplest array is a linear array of one dimension, which is described as a vector. Next is a rectangular array of two dimensions, which is described as a matrix. In three dimensions, there is a cubic array, and the progression continues with hypercubes of increasing dimensions. However, there is an obvious difficulty that limits the dimensions of a practical representation to no more than two. The elements in the interior of a cube are hidden from view, and an object of more than three dimensions is incapable of visualisation.
The only practical recourse for making higher-dimensional arrays accessible and amenable to manipulations is to map them into a linear array or into a matrix. Such a mapping establishes an ordering within the lower-dimensional array of the elements of the higher-dimensional array.
The simplest examples of such mappings are the ways in which a matrix of M rows and N columns can be mapped into a vector. The row-major mapping joins successive rows of the matrix in a wide row vector. Here, the index j passes through N values for each increment of the index i. The index that locates the element within the vector is given by (i − 1)N + j. The alternative column-major mapping joins successive columns of the matrix in a long column vector, and then the index that locates the element is given by (j − 1)M + i.
These two mappings are commonly employed in assigning the elements of a matrix to contiguous cells of the memory of a computer. Whereas the Pascal and C compilers adopt the row-major principle, the FORTRAN compiler and the MATLAB interpreter adopt the column-major principle.
The space to store a matrix A of order M × N in the memory of a computer is commonly provided by an array declaration that specifies the two dimensions. Regardless of which of the two mappings has been used to store the matrix, its elements can be accessed and displayed readily in either order, and the matrix can also be reconstituted and displayed in its original form. Thus, with the help of nested do-loops, the row-major form can be displayed by allowing the column index to vary most rapidly within the inner loop, and the column-major form can be displayed by reversing the order of the do-loops, as in the sketch below.
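The following is a minimal sketch in Python of the storage and of the two modes of display; the array name a and the dimensions M and N are merely illustrative, and the corresponding statements would originally have been written in Pascal, C, FORTRAN or MATLAB.

import numpy as np

M, N = 3, 4
a = np.arange(1, M * N + 1).reshape(M, N)   # storage for a matrix of M rows and N columns

# Row-major display: the column index j varies most rapidly within the inner loop.
for i in range(M):
    for j in range(N):
        print(a[i, j], end=" ")
    print()

# Column-major display: the order of the do-loops is reversed, so that i varies most rapidly.
for j in range(N):
    for i in range(M):
        print(a[i, j], end=" ")
    print()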
The extension to arrays of higher dimensions is straightforward. Consider a cubic array in which the elements are indexed by i = 1, …, M; j = 1, …, N; and k = 1, …, P. On the assumption that the index k is now the most rapidly changing, the row-major mapping would give the element a position corresponding to (i − 1)NP + (j − 1)P + k, whereas, with i as the most rapid index, the column-major mapping would give it the position corresponding to (k − 1)MN + (j − 1)M + i.
It should be noted that, if the initial values of i, j and k are 1, and if M = N = P = 10, then, in the row-major mapping, it might be declared that k stands for units, j stands for tens and i stands for hundreds. Then, the formula for the position corresponds to how we count in base 10. More generally, if M, N and P were to take different values, then the index would correspond to a mixed radix representation of a number. As it is, with the initial values at one, we are liable to describe the composite index of the row-major mapping as the index of a lexicographic or dictionary ordering.
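A short Python sketch, under the assumption that M = N = P = 10, confirms the two position formulae and their correspondence with the orderings used by row-major and column-major storage; the function names are illustrative.

import numpy as np

M, N, P = 10, 10, 10   # assumed dimensions, so that the row-major position counts in base 10

def rowmajor_position(i, j, k):
    # k stands for units, j for tens and i for hundreds when M = N = P = 10
    return (i - 1) * N * P + (j - 1) * P + k

def colmajor_position(i, j, k):
    # i is the most rapidly changing index
    return (k - 1) * M * N + (j - 1) * M + i

i, j, k = 3, 4, 5
print(rowmajor_position(i, j, k))   # 235 = 2*100 + 3*10 + 5
assert rowmajor_position(i, j, k) == np.ravel_multi_index((i - 1, j - 1, k - 1), (M, N, P), order='C') + 1
assert colmajor_position(i, j, k) == np.ravel_multi_index((i - 1, j - 1, k - 1), (M, N, P), order='F') + 1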
A three-dimensional array has three indices. Two of these indices will serve to define the framework of a matrix and the third index will serve to define the order of the resulting matrices. The indices can be assigned to these roles in six different ways. To keep track of the elements of a multidimensional array, it is clear that an index notation is called for. This will be provided in the next section together with the definitions of a few of the objects that arise most frequently in the processes of statistical computing. These objects are liable to be compounded from objects of lower dimensions. This gives rise to structures that one may be able to exploit in pursuit of computational efficiency.
To achieve computational efficiency, it is necessary to avoid displacing the values that are stored in the memory of the computer. They must be left in place, regardless of the changing nature of the structures to which they may give rise, whether these are vectors or matrices or the products of matrices. Such objects can be created by the manipulation of pointers, which are variables that contain the addresses of the memory locations in which the values are stored.
It is notable that do-loops are implemented by pointers, which accounts for the ease with which their order can be altered within a nested sequence. Thus, whichever system of storage is used, one can rely on do-loops (or pointers) to pluck the elements from the memory in the orders that are required.
3. The Index Notation
The object that will be analysed in detail in subsequent sections of the paper is the threefold Kronecker product A ⊗ B ⊗ C postmultiplied by a long vector. Its factors are the matrices A, B and C, whose elements bear the row indices j, k and ℓ and the column indices p, q and r, respectively, together with the long vector, whose elements bear the composite index pqr.
In these definitions, both the brackets and the parentheses are meant to signify that each index is varying over its designated range. The lower limits of the indices are unity, and their upper limits may be denoted by the corresponding capital letters, so that, for example, the index p runs from 1 to P. The alternative notation for the matrices and for the long vector has been taken from a paper of Pollock (1985). It expresses the matrices as weighted combinations of the corresponding basic matrices. Thus, for example, the matrix A is the sum of basic matrices weighted by its scalar elements,
where each basic matrix has a unit in the jth row and the pth column and zeros elsewhere (the parentheses indicate that summations are to be taken in respect of the indices j and p that are associated with the basic matrices and which are repeated within the accompanying scalar elements). The pth column of the matrix A and its jth row can be expressed, likewise, as weighted sums of basic vectors. Here, as in Equation (8), the summations are over the basis elements, which are vectors in these cases. Summations occur only when there are repeated instances of an index.
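The principle can be illustrated by a short Python sketch in which a matrix is reassembled as a weighted combination of basic matrices; the function basic_matrix and the dimensions are illustrative assumptions, not the notation of the paper.

import numpy as np

def basic_matrix(j, p, rows, cols):
    # A basic matrix with a unit in the jth row and the pth column (1-based) and zeros elsewhere.
    E = np.zeros((rows, cols))
    E[j - 1, p - 1] = 1.0
    return E

J, P = 2, 3
A = np.arange(1, J * P + 1).reshape(J, P).astype(float)

# The matrix as a weighted combination of the basic matrices, summed over the repeated indices j and p.
A_rebuilt = sum(A[j - 1, p - 1] * basic_matrix(j, p, J, P)
                for j in range(1, J + 1) for p in range(1, P + 1))
assert np.array_equal(A, A_rebuilt)

# Likewise, the pth column is a weighted sum of the basic column vectors.
p = 2
column = sum(A[j - 1, p - 1] * np.eye(J)[:, j - 1] for j in range(1, J + 1))
assert np.array_equal(A[:, p - 1], column)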
There are alternative ways of arranging the elements of the matrix to define different objects. The transpose of the matrix, together with the long column vector and the wide row vector, which are described, respectively, as its column-major and row-major forms, may be denoted by an appropriate positioning of the indices on the basic matrices and vectors. Observe that, in two of these mappings, the migrating index moves ahead of the index or the string of indices that it is joining. In the other two mappings, the migrating index joins from behind.
Whereas the matrix elements will remain fixed in the computer memory in the order indicated by their indices, the nature of a derived multidimensional object is indicated by the positioning of the indices on the basic matrices or vectors. Thus, one arrangement of the indices denotes a basic matrix with a unit in the jth row and the pth column, whereas another denotes a long column vector with a unit in the position given by the composite index, and another denotes a wide row vector with a unit in the corresponding position. Elsewhere, there are only zero-valued elements. For the basis vectors, the composite indices follow a lexicographic ordering, which accords with the conventional definition of a Kronecker product, as will be confirmed at the end of this section.
The translation from the elements that are held in the computer’s memory to the various multidimensional objects should be by virtue of the pointer arithmetic that is associated with the basic matrices and vectors. These matrices (or vectors) determine the format of the object or its “shape”, which is the MATLAB terminology.
Observe that, in the case of the long column vector, the elements are arrayed in the reverse of their lexicographic ordering, with the trailing index p as the leading classifier within the vector. In the case of the long column vector of (6), which is the transpose of the wide vector, the elements follow the lexicographic ordering of the indices that are associated with the basic vectors. It will also be necessary to consider the long vector in which the elements are arrayed in reverse lexicographic order, which is the order adopted by the MATLAB program.
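The two vector forms and their orderings can be seen in the following Python sketch, where the flattening order 'C' stands for the row-major (lexicographic) scheme and 'F' stands for the column-major (reverse lexicographic) scheme adopted by MATLAB; the matrix is an illustrative assumption.

import numpy as np

# The element in row j and column p is written as 10*j + p for ease of reading.
A = np.array([[11, 12, 13],
              [21, 22, 23]])

print(A.flatten(order='C'))   # row-major, lexicographic ordering: [11 12 13 21 22 23]
print(A.flatten(order='F'))   # column-major, reverse lexicographic (MATLAB): [11 21 12 22 13 23]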
The composition of the matrix A with another matrix may be denoted in the same notation. The final expression is derived by cancelling the superscripted column index p of the leading matrix with the subscripted row index p in the trailing matrix. The index p that is affixed to the right brace within the product is to emphasise that the elements within the braces are to be summed over the range of the repeated index. It may be omitted if the eyes can be relied on to recognise repeated indices. The result is described as a contraction with respect to p.
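The contraction can be expressed directly with NumPy's einsum function, which sums over any index letter that is repeated across its arguments; the dimensions below are illustrative.

import numpy as np

rng = np.random.default_rng(0)
J, P, Q = 2, 3, 4
A = rng.standard_normal((J, P))
B = rng.standard_normal((P, Q))

# The product AB is the contraction with respect to the repeated index p:
# its (j, q)th element is the sum over p of a_{jp} b_{pq}.
AB = np.einsum('jp,pq->jq', A, B)
assert np.allclose(AB, A @ B)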
The transformation from the one long-vector form of the matrix to the other is effected by the commutation matrix, and it will be recognised that simple identities connect the commutation matrix, its transpose and its inverse, albeit that these identities do not extend to objects of higher dimensions.
It is easy to see that Ⓣ is an orthonormal matrix, whose transpose is its inverse. It has been given various names.
Pollock (1979) has called it the tensor commutator, denoted by a capital T inscribed in a circle. Magnus and Neudecker (1979) described it as the commutation matrix, which they denoted by K, and Van Loan (2000) has described it as the shuffle matrix, denoted by S.
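A minimal Python sketch of the commutation (or shuffle) matrix, constructed here in the column-major convention, confirms that it converts the one long-vector form of a matrix into the other and that its transpose is its inverse; the function name and the dimensions are illustrative.

import numpy as np

def commutation_matrix(m, n):
    # The permutation matrix K such that K vec(A) = vec(A') for an m x n matrix A,
    # where vec stacks the columns (the column-major sense).
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

m, n = 2, 3
A = np.arange(1, m * n + 1).reshape(m, n)
K = commutation_matrix(m, n)

assert np.array_equal(K @ A.flatten(order='F'), A.T.flatten(order='F'))
assert np.array_equal(K.T @ K, np.eye(m * n))   # orthonormality: the transpose is the inverse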
The notation for the Kronecker product of matrices is illustrated by the following example:
Here, the row indices and the column indices associated with the basic matrices both follow lexicographic orderings. The order of the matrices within a Kronecker product can be reversed by applying the tensor commutator to both the rows and the columns. The subscripts on the Ⓣ matrices indicate the indices that are to be commuted.
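The reversal of the order of the factors can be verified numerically: applying one commutation to the composite row index and another to the composite column index converts A ⊗ B into B ⊗ A. The dimensions are illustrative, and the identity shown is the standard one for commutation matrices.

import numpy as np

def commutation_matrix(m, n):
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(0)
m, n, p, q = 2, 3, 2, 2
A = rng.standard_normal((m, n))     # A is m x n
B = rng.standard_normal((p, q))     # B is p x q

# Commutating the composite row index (of order pm) and the composite column index (of order nq):
lhs = np.kron(B, A)
rhs = commutation_matrix(p, m) @ np.kron(A, B) @ commutation_matrix(n, q)
assert np.allclose(lhs, rhs)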
It is now possible to evaluate the product of (6), which can be expressed in the conventional matrix notation and in the index notation alike. On replacing the row-major vector by the column-major vector and on reversing the order of the matrices, an alternative version of the product is created. A series of commutation operations link the two forms. These forms might be regarded as generalisations of the equations of (17), in which the other matrices are as defined previously. The equations of (17) tend to convey the idea that a multiple Kronecker product can be computed via pairwise matrix multiplications. To pursue such a strategy, it is necessary to cast the vectors and their transforms into the shapes of matrices.
The various operations of reshaping that might be applied to the column-major vector would produce a set of matrices, and similar operations can be applied to the row-major vector. It is important that the reshaping operations should be effected by manipulating pointers, instead of by moving the elements themselves.
The conversions that involve separating the trailing and the leading element, respectively, from the rest of the composite index will prove to be amenable to the reshape operator of MATLAB. To create, via MATLAB, the matrix in which the middle index q has been separated would require a commutation operation to bring the index q to the leading or trailing position. However, the creation of all of these matrices is readily achieved with do-loops that are nested appropriately. The MATLAB reshape operator is considered in detail in Section 6, and examples are provided in an appendix. In a recent paper, Fackler (2019) has shown that some efficiency in computing the product of (6) can be gained by exploiting the reshape procedure of that program.
It may be helpful to demonstrate that the definition that has been provided in (13) for the Kronecker product does correspond to the usual definition, which depends on displaying the subscripted elements within the context of their matrices. It is sufficient to consider the product of two square matrices of order 2. It can be seen that the composite row indices associated with the elements follow the lexicographic sequence 11, 12, 21, 22, as do the composite column indices, which is to say that the conventional definition of a Kronecker product is in accordance with the row-major scheme.
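The correspondence can also be checked element by element in Python: the entry of A ⊗ B whose composite row index is (j, k) and whose composite column index is (p, q), both in lexicographic (row-major) order, is the product of a_{jp} and b_{kq}. The two matrices below are arbitrary 2 x 2 examples.

import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
K = np.kron(A, B)

for j in range(2):
    for k in range(2):
        for p in range(2):
            for q in range(2):
                # lexicographic composite indices: (j, k) for the rows and (p, q) for the columns
                assert K[2 * j + k, 2 * p + q] == A[j, p] * B[k, q]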
4. Tensor Products and Associated Notations
The majority of econometric texts that consider the Kronecker product make no mention of tensor products, notwithstanding the fact that a Kronecker product may be classified as a tensor product. An exception is the account of Pollock (1985), which defines tensors in the context of multilinear algebra.
There are two ways in which the algebra of tensors may be approached. The first way is via mathematical physics, where vectors are liable to subsist in three-dimensional or four-dimensional spaces. Tensors are used in expounding the theory of electrodynamics via Maxwell’s equations (see Brau 2004), the theory of special relativity (see Lawden 1967) and the theory of the elasticity of solids (see Bower 2010). In such contexts, vectors and tensors are liable to acquire substantive connotations.
A second way of approaching tensors and tensor products is via the abstract theory of multilinear algebra that was propounded originally by Bourbaki (1948) and which has been expounded in the more recent texts of Greub (1967) and Marcus (1973). Such expositions take a coordinate-free approach that avoids specifying bases for the vector spaces and which does not preempt the choice of a metric.
The coordinate-free approach is well suited to a physical analysis that regards vectors as geometric objects that exist in space independently of any coordinate system or of its basis vectors. This flexibility is exploited in the theory of special relativity to enable geometric vectors to be viewed within alternative frames of reference. The natural basis is appropriate to an observer at rest within their own frame of reference. However, objects that are in motion relative to the observer require other coordinate systems, which are derived by applying the Lorentz transform to the natural basis (see Marder 1968 and Susskind and Friedman 2018, for examples).
In the main, data analysts and computer scientists are liable to regard vectors as algebraic objects comprising an ordered set of elements, which are the coordinates of the vector relative to a natural basis consisting of orthogonal vectors of unit length comprised by an identity matrix. Our approach, in this section, is to express the basic definitions in terms of arbitrary vector spaces and to interpret them immediately in terms of real finite-dimensional coordinate spaces, of which an n-dimensional example may be denoted by R^n.
If a vector space is defined over a scalar field, then its dual space is the space of all linear functionals mapping from the vector space to the field. Given that it is of the same dimension, the dual space is isomorphic to the primal vector space. The relationship between the two spaces is reflexive or reciprocal, which is to say that the dual of the dual space is the primal space.
The dual of the coordinate vector space may be denoted in a corresponding fashion. The elements of a coordinate vector space are column vectors, which may be denoted by subscripted lowercase symbols, whereas the elements of the corresponding dual space are row vectors, denoted by symbols with superscripts. Lowercase letters without subscripts are liable to be regarded as column vectors. The inner product is a linear functional.
If U and V are vector spaces defined over a scalar field, then a mapping f(u, v), from the Cartesian product of the spaces to the scalar field, is a bilinear functional if it is linear in each of its arguments when the other argument is held fixed. The set of all such bilinear functionals is a vector space. Familiar examples of the mapping are the product of a single element of u with a single element of v, which is mediated by a matrix bearing a unit in the corresponding position and zeros elsewhere, and the weighted combination of all such products, which is mediated by an arbitrary matrix A. The second of these products may be written, more simply, as u′Av. This is a bilinear functional, which can also be considered to be a linear functional, if one of the vectors is regarded as the means by which the other vector is mapped to the field.
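A small Python sketch illustrates the point: the form u'Av is bilinear in u and v, and fixing one of the vectors leaves a linear functional of the other. The dimensions and the random data are illustrative.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
u = rng.standard_normal(m)
v = rng.standard_normal(n)

# The bilinear functional is the weighted sum of the products of the elements of u and v.
f = u @ A @ v
assert np.isclose(f, sum(u[i] * A[i, j] * v[j] for i in range(m) for j in range(n)))

# Fixing u yields a linear functional of v, namely the inner product with the row vector u'A.
w = u @ A
assert np.isclose(w @ v, f)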
Less familiar are the matrix forms of the corresponding mappings, examples of which can be given in the same manner. Reference to (17) will show how these forms are related to the equations displayed there.
It is important to recognise that the vector space of bilinear functionals represents the means by which the vectors u and v are mapped into the scalar field. It should be distinguished from the individual mappings, each of which entails specific values of u and v. The distinction can be made clear by considering the functional f = u′Av, where u and v are column vectors and A is a matrix. The pair (u, v) is an element of the Cartesian product of the two vector spaces. The matrix A is an element of the space of bilinear functionals, and f is the product of a specific mapping.
A tensor product of vector spaces may be defined in a confusing variety of ways. However, it is appropriate to regard the tensor product as the space on which each bilinear functional corresponds to a unique linear mapping. Thus, for example, Shephard (1966) defines it to be the dual of the vector space of bilinear functionals, so that each bilinear functional corresponds to a unique linear functional on the tensor product space. From the point of view of the real coordinate spaces, this amounts to nothing more than a change of notation. Nevertheless, what is revealed can be insightful.
A distinction must be made between the product of the two primal spaces and the products that involve their dual spaces. The first of these is liable to be described as a contravariant product. The others may be described as covariant products, and it should be noted that the order of the two spaces within such products is significant. In the present context, covariance and contravariance denote relationships. However, in mathematical physics and in other contexts, they are regarded as attributes of the vector spaces.
If u and v are elements of the two vector spaces, then their contravariant product is an element of the corresponding tensor product space, and the covariant products belong, respectively, to the spaces that involve the dual vectors. These are elementary or decomposable products. Examples of non-decomposable tensor products of the three varieties can also be given; these would become decomposable products if their arrays of coefficients could be factorised into products of vectors, albeit that a decomposable product may be the product of two or more non-decomposable products.
It is when coordinates are attributed to physical vectors in 3-dimensional space or in 4-dimensional spacetime that the terms covariant and contravariant come to denote attributes as opposed to reciprocal relationships. The usage can be demonstrated by considering a change of bases. Let W and M be alternative bases of a vector space. Then, there exist transformations that carry the one basis into the other.
Consider a vector expressed in terms of the natural basis W and, alternatively, in terms of the basis M, where the coordinate vectors are defined relative to the bases W and M, respectively. Since the vector itself is unaffected by the choice of basis, the transformation that carries the basis W into the basis M must be accompanied by the inverse transformation of the corresponding coordinates. Thus, the bases and the corresponding coordinates transform in opposite senses.
Let the transformation that carries W into M be a diagonal matrix. Then, the ith column of the transformed basis M is the ith column of W scaled by the ith diagonal element, and the vector of coordinates relative to the basis vectors of M is obtained by applying the inverse scaling to the original coordinates. The coordinates have a contravariant relationship with the basis vectors: the effect of halving the length of a basis vector will be the doubling of the value of an associated coordinate.
In general, vectors that change their scale inversely to changes in the scale of the reference axes (i.e., the basis vectors) are commonly described as contravariant. Dual vectors that change in the same way as the changes to the scale of the reference axes are described as covariant. If a velocity vector is considered, then its coefficients, which have the dimension of distance/time, can be said to constitute a contravariant vector. By contrast, a gradient vector of a dimension quantity/distance will be a covariant vector.
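The contravariance of the coordinates can be demonstrated with a two-dimensional example in Python; the bases and the diagonal scaling are illustrative assumptions.

import numpy as np

x = np.array([3.0, 4.0])          # a fixed geometric vector
W = np.eye(2)                     # the natural basis
T = np.diag([0.5, 2.0])           # halve the first basis vector, double the second
M = W @ T                         # the transformed basis

c_W = np.linalg.solve(W, x)       # coordinates relative to W
c_M = np.linalg.solve(M, x)       # coordinates relative to M

# The coordinates transform inversely to the basis vectors:
# halving a basis vector doubles the associated coordinate.
assert np.allclose(c_M, np.linalg.inv(T) @ c_W)
print(c_W, c_M)                   # [3. 4.] [6. 2.]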
In the Einstein notation, which suppresses the summation sign, contravariant components carry upper indices, and the velocity vector will be denoted, typically, by v^i e_i, where e_i denotes a basis vector. The understanding is that, whenever an index is repeated within an expression, there must be a corresponding summation. This easement of the notation, which dispenses with the summation signs, relies on an understanding of the context of the expression.
The notation of the present paper adopts the same summation convention for expressions that are found within parentheses. Thus, the parenthesised product of a basic column vector and a basic row vector that share a repeated index denotes an identity matrix. However, in the absence of the parentheses, the product of a basic column vector and a basic row vector denotes a matrix with a unit in the ith row and the jth column and with zeros elsewhere. The expression from (21) denotes a matrix with a unit in that position and with zeros elsewhere. Since there are no repeated indices, the parentheses do not imply a summation.
It should be noted that, in this notation, scalar elements never carry superscripts. Furthermore, the notation refers invariably to algebraic or coordinate vectors. It does not encompass coordinate-free geometric vectors, which are common in physical contexts.
The diversity of the existing tensor notations is too great to be summarised here. However, numerous references to alternative notations have been given by Harrison and Joseph (2016). Mention should also be made of De Lathauwer et al. (2000) and Bader and Kolda (2006), who have provided accounts that have been widely referenced by computer scientists.
5. The Problem at Hand
In terms of the index notation, the product of (6) may be expressed as in (32). The final expression of the RHS is the product of the postmultiplication of the Kronecker product by the long vector. It is derived by cancelling the (superscripted) column indices p, q and r, which are associated with the basic matrices within the leading factor, which is the Kronecker product, with the (subscripted) row indices in the trailing factor, which is the vector. The indices affixed to the right brace in the penultimate expression emphasise that the elements within the braces are to be summed over the repeated indices. The indices of summation can be inferred from the contents of the braces.
The final expression of (32) suggests various ways of computing the product. One way would be by a single contraction or summation in respect of the composite index pqr. In so far as the vector is concerned, the composite index is liable to correspond to a set of adjacent addresses in the memory of the computer.
The number of multiplications involved in forming the Kronecker product is the product of the numbers of elements in the constituent matrices. The operation of forming the result via a single contraction in respect of the composite index entails, for each element of the product vector, as many multiplications as there are elements in the data vector, together with a corresponding number of additions.
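The equivalence of the two routes can be confirmed in Python, where the second computation works directly from the elementwise definition of the product, without ever forming the Kronecker product; the dimensions are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
J, P, K, Q, L, R = 2, 4, 3, 3, 2, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
C = rng.standard_normal((L, R))
X = rng.standard_normal((P, Q, R))           # the data vector arranged as an array x_{pqr}, with r most rapid
x = X.reshape(P * Q * R)

# Forming the full Kronecker product before multiplying:
y_full = np.kron(np.kron(A, B), C) @ x

# The elementwise definition: y_{jkl} is the sum over the composite index pqr of a_{jp} b_{kq} c_{lr} x_{pqr}.
y_direct = np.einsum('jp,kq,lr,pqr->jkl', A, B, C, X).reshape(J * K * L)

assert np.allclose(y_full, y_direct)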
A more efficient way of computing the product is by successive contractions in respect of the indices r, q and p, separately and successively. This could be described as the divide-and-conquer procedure. Then, a sequence of products, in long-vector form, would lead from the data vector to the product vector, as in (33).
Here, the elements of the three matrices are adduced to the product in the course of three successive contractions. Passing from the data vector to the first intermediate product via the contraction over r requires P × Q × L × R multiplications. Passing thence to the second intermediate product via the contraction over q requires P × K × L × Q multiplications. Passing to the final product via the contraction over p requires J × K × L × P multiplications.
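The divide-and-conquer procedure can be sketched in Python as three successive single-index contractions; the intermediate arrays correspond to the long vectors of (33), and the dimensions are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
J, P, K, Q, L, R = 2, 4, 3, 3, 2, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
C = rng.standard_normal((L, R))
X = rng.standard_normal((P, Q, R))            # the data vector arranged as an array x_{pqr}

T1 = np.einsum('lr,pqr->pql', C, X)           # contraction over r: P*Q*L*R multiplications
T2 = np.einsum('kq,pql->pkl', B, T1)          # contraction over q: P*K*L*Q multiplications
Y  = np.einsum('jp,pkl->jkl', A, T2)          # contraction over p: J*K*L*P multiplications

y_slow = np.kron(np.kron(A, B), C) @ X.reshape(P * Q * R)
assert np.allclose(Y.reshape(J * K * L), y_slow)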
However, there are six different orders in which the contractions can be performed, which correspond to the permutations of the indices p, q and r, which are the column indices of the matrices A, B and C, respectively. In comparing the efficiency of the alternative procedures, one need only count the number of multiplications, since these are far more expensive than are the additions (for numbers of n digits, the orders of computational complexity of the two operations are n² and n, respectively).
When specific values are assigned to the row orders of the matrices, with the column orders as before, there are 384 multiplications in the slow procedure. The number of multiplications in the divide-and-conquer procedure depends on the sequence of the contractions. The numbers of multiplications in the various divide-and-conquer procedures are listed in the table of (34), with the sequence of contractions denoted by a sequence of matrices, where the order in which they are deployed should be read from right to left (as in a sequence of matrix multiplications).
To minimise the computational burden by minimising the number of multiplications within the divide-and-conquer procedure, one should ensure that the contractions in respect of p, q and r are ordered such that the numbers of the associated multiplications are declining. When the row orders of the matrices are equal, the most efficient sequence of contractions will be self-evident, and there will be no need to count the number of multiplications.
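The comparison of the six orderings can be automated with a short Python routine that counts the multiplications entailed by each sequence of contractions. The dimensions assigned to the indices below are assumptions made for the sake of the illustration, and not the values used in the table of (34).

from itertools import permutations

# For each index, (rows introduced, columns contracted) by the associated matrix: assumed values.
dims = {'p': (2, 4), 'q': (3, 3), 'r': (2, 2)}

def multiplications(order):
    # The long vector initially has one element for every combination of the column indices.
    shape = {index: cols for index, (rows, cols) in dims.items()}
    total = 0
    for index in order:
        rows, _ = dims[index]
        size = 1
        for extent in shape.values():
            size *= extent
        total += size * rows       # every current element is multiplied once for each new row index
        shape[index] = rows        # the contracted dimension is replaced by the row dimension
    return total

for order in permutations('pqr'):
    print(''.join(order), multiplications(order))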
The calculation of the product of (32) can be accomplished using nested do-loops. Six loops are required that fall into two groups, which comprise the outer indices j, k and ℓ and the inner indices p, q and r. The former group are the indices of the product vector and the latter group are the indices of the successive contractions. Within each group, the order of the indices may be freely permuted.
However, it might pay to deploy the outer indices in the order listed above, since this will enable the elements of the product to be created on the fly in the order in which they would be printed in a column. Otherwise, the printing of the column would need to be delayed until all of its elements have been created.
To illustrate a code appropriate to such calculations, it is enough to consider a product with fewer factors, which can be rendered in the index notation in the same manner. A sketch of code of this kind is given below.
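The following is a minimal sketch in Python of such nested do-loops, assuming a two-factor product whose elements are the sums over p and q of a_{jp} b_{kq} x_{pq}; the outer loops run over the indices of the product and the inner loops effect the contractions. The dimensions and the names are illustrative assumptions rather than the original listing.

import numpy as np

rng = np.random.default_rng(0)
J, P, K, Q = 2, 3, 2, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
X = rng.standard_normal((P, Q))               # the data vector arranged as an array x_{pq}

Y = np.zeros((J, K))
for j in range(J):                            # outer indices: those of the product
    for k in range(K):
        for p in range(P):                    # inner indices: those of the contractions
            for q in range(Q):
                Y[j, k] += A[j, p] * B[k, q] * X[p, q]

assert np.allclose(Y.reshape(J * K), np.kron(A, B) @ X.reshape(P * Q))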
The code becomes increasingly prolix with a growing number of indices, each of which gives rise to a nested do-loop. However, such code can be encapsulated within a procedure or a subroutine with a concise interface.
6. Matrix Multiplications and the Reshaping Operations
Many authors have recognised the inefficiency of forming a multiple Kronecker product before postmultiplying it by a vector. They have proposed a variety of methods that rely on pairwise matrix multiplications that are accompanied by the shuffling of the elements of the products in order to facilitate the succeeding matrix multiplications. These operations have been described as the reshaping of the matrices.
A method for solving the equation of (6) for X was proposed by Pereyra and Scherer (1973). A much simplified solution was provided by de Boor (1979), who showed that this could be achieved by a sequence of solutions involving the matrices A, B and C successively.
An indication of the way in which the product of (6) might be calculated via a sequence of matrix multiplications was given in a paper of Dyksen (1987), which contained the relevant identity, albeit that the necessary vectorisations of the matrices were not indicated.
In a recent paper, Fackler (2019) has proposed that computations involving multiple Kronecker products can be achieved by ordinary matrix multiplications without displacing elements that are stored in the computer memory, as shuffling is liable to do. To illustrate how this method can be implemented in the MATLAB program, we may consider an input vector that follows a column-major scheme. Then, the method, which applies the three matrices in succession to the vector, entails the objects of (37). In the process, the basis vectors undergo a corresponding transformation.
It can be seen that, as the indices p, q and r of the contractions drop off the end of the sequence, the new indices j, k and ℓ join at the head.
When the equations of (37) are given their appropriate shapes, they become a sequence of matrix multiplications, which is denoted by (40). This entails two operations that are amenable to the reshape procedure. Here, the variant matrices have shapes that are derived directly from the vectors, which represent what is stored in the computer’s memory. The reshape procedure is explained in detail in Appendix A.
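The principle of pairwise multiplication with column-major reshaping can be sketched in Python, with the order='F' argument standing in for the MATLAB reshape. The sketch illustrates the strategy in its classic form, in which the transposes represent the shuffles that Fackler's procedure is designed to avoid; the dimensions are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
J, P, K, Q, L, R = 2, 4, 3, 3, 2, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
C = rng.standard_normal((L, R))

X = rng.standard_normal((P, Q, R))
x_bar = X.flatten(order='F')                  # the column-major input vector, with p most rapid

v = x_bar
for M, cols in ((A, P), (B, Q), (C, R)):
    # Reshape (column-major) so that the index of the next contraction leads the rows,
    # multiply, and shuffle the newly created index to the back of the sequence.
    v = (M @ v.reshape(cols, -1, order='F')).T.flatten(order='F')

# The result is the column-major form of the product, with j the most rapid index.
Y = np.einsum('jp,kq,lr,pqr->jkl', A, B, C, X)
assert np.allclose(v, Y.flatten(order='F'))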
If it is required to reverse the sequence of matrix multiplications that are applied to the column-major vector, then these must become post-multiplications, and the corresponding sequence can be denoted accordingly. However, reference to (15) indicates that, if the row-major vector is available, then a sequence of matrix pre-multiplications, denoted by (44), can be pursued.
According to the table of (34), which is based on the values assumed previously, neither of the sequences of (40) and (44) leads to a fully efficient computation. A recourse might be to re-order the matrices within the multiple Kronecker product to create the most efficient sequence, which would also require the elements within the vector to be re-ordered. Whether this would be allowable or convenient is a matter that depends on the substantive nature of the problem that is represented by the equation. However, it is reasonable to suppose that, in most cases, this will be easy to achieve.
Next is the apparent obstacle that the reshape procedure is specific to MATLAB. However, as Fackler (2019) has indicated, it should be straightforward to replicate the procedure in many of the available programming languages, using the arithmetic of pointers. Whereas the method of pairwise multiplication has been described for the case of a column-major scheme, there is no difficulty in adapting it to a row-major scheme.
Our preference would be to calculate the product of (6) according to the scheme of (33), or some permutation thereof, which does not require matrix multiplications. However, the intention of this paper has not been to insist on one way or another of conducting the computations, provided that gross inefficiency is avoided. Instead, the aim has been to clarify the nature of the alternative procedures.