Approximation of Time-Frequency Shift Equivariant Maps by Neural Networks

Dae Gwan Lee

doi:10.3390/math12233704

Department of Mathematics and Big Data Science, Kumoh National Institute of Technology, Gumi 39177, Gyeongsangbuk-do, Republic of Korea

Mathematics2024, 12(23), 3704;https://doi.org/10.3390/math12233704

This article belongs to the Special Issue AI Advances in Edge Computing

Version Notes

Order Reprints

Abstract

Based on finite-dimensional time-frequency analysis, we study the properties of time-frequency shift equivariant maps that are generally nonlinear. We first establish a one-to-one correspondence between

Λ

-equivariant maps and certain phase-homogeneous functions and also provide a reconstruction formula that expresses

Λ

-equivariant maps in terms of these phase-homogeneous functions, leading to a deeper understanding of the class of

Λ

-equivariant maps. Next, we consider the approximation of

Λ

-equivariant maps by neural networks. In the case where

Λ

is a cyclic subgroup of order N in

Z_{N} \times Z_{N}

, we prove that every

Λ

-equivariant map can be approximated by a shallow neural network whose affine linear maps are simply linear combinations of time-frequency shifts by

Λ

. This aligns well with the proven suitability of convolutional neural networks (CNNs) in tasks requiring translation equivariance, particularly in image and signal processing applications.

Keywords:

neural networks; equivariance; time-frequency shifts; time-frequency analysis

MSC:

68T07; 42C15; 94A12

1. Introduction

Over the past decade, machine learning techniques based on deep neural networks, commonly referred to as deep learning [1], have achieved significant breakthroughs across a wide range of fields, including image recognition [2,3], speech recognition [4], language translation [5,6], and game playing [7], among others. These advancements are largely driven by the availability of increasingly large training datasets and greater computational resources. Another important factor is the development of specialized neural network architectures, including convolutional neural networks [2], residual networks [3], recurrent networks (notably LSTMs [5]), and transformer networks [6].

A common theme in the design of neural network architectures is the necessity to respect the symmetries inherent in the task at hand. For instance, in image classification, the classification result should remain invariant under small translations of the input image, making convolutional neural networks a suitable choice. Likewise, in audio classification [8], the classification result should be invariant to shifts in time or changes in pitch. In principle, a fully connected neural network can learn to respect such symmetries provided that training data are sufficiently given. Nevertheless, architectures that are inherently aligned with these symmetries tend to exhibit improved generalization and thus show better performance.

In mathematical terms, symmetries can be expressed as follows. Let V be a vector space and let

GL (V)

be the general linear group of V. For a group G and a map

ρ : G \to GL (V)

, we say that a map

F : V \to V

is equivariant under group actions of G (or simply G-equivariant) if

F ρ (λ) = ρ (λ) F

for all

λ \in G

, and invariant under group actions of G (or simply G-invariant) if

F ρ (λ) = F

for all

λ \in G

. We will be focusing on the case where V is a Hilbert space and

ρ (λ)

is a unitary operator for all

λ \in G

. (A Hilbert space is a vector space equipped with an inner product that induces a distance function, making it a complete metric space. Examples of Hilbert spaces include

R^{d}

and

C^{d}

, and Hilbert spaces are often regarded as natural generalizations of signal spaces.)

A particularly important and well-studied example of equivariance involves translations. It is well known that translation-equivariant linear operators are exactly the convolution operators (see, e.g., Section 2.3 of [9], Theorem 4.12 of [10], and Theorem 2.17 of [11]), and that convolutional neural networks (CNNs) are well-suited for approximating these operators. As a natural generalization of CNNs, Cohen and Welling [12] introduced the so-called group equivariant convolutional neural networks (GCNNs), which can handle more general symmetry groups than just translations. Later, Cohen et al. [13] developed a general framework for GCNNs on homogeneous spaces such as

R^{d}

and

S^{2}

, and Yarotsky [14] investigated the approximation of equivariant operators using equivariant neural networks. More recently, Cahill et al. [15] introduced the so-called group-invariant max filters, which are particularly useful for classification tasks involving symmetries, and Balan and Tsoukanis [16,17] constructed stable embeddings on quotient space modulo group action, yielding group-invariant representations via coorbits. Further advances include the work of Huang et al. [18], who designed approximately group-equivariant graph neural networks by focusing on active symmetries, and Blum-Smith and Villar [19], who introduced a method for parameterizing invariant and equivariant functions based on invariant theory. In addition, Wang et al. [20] provided a theoretical analysis of data augmentation and equivariant neural networks applied to non-stationary dynamics forecasting.

In this paper, we are particularly interested in the setting of finite-dimensional time-frequency analysis, which provides a versatile framework for a wide range of signal processing applications, see, e.g., [21,22]. It is known that every linear map from

C^{N}

to

C^{N}

can be expressed as a linear combination of compositions of translations and modulations (see (3) below). We consider maps

F : C^{N} \to C^{N}

that are generally nonlinear and are Λ-equivariant for a given subgroup

Λ

of

Z_{N} \times Z_{N}

, that is,

F \circ π (k, ℓ) = π (k, ℓ) \circ F

for all

(k, ℓ) \in Λ

. Here,

π (k, ℓ) : = M^{ℓ} T^{k}

represents the time-frequency shift by

(k, ℓ)

, where

T, M : C^{N} \to C^{N}

are the translation and modulation operators defined as

T x = (x_{N - 1}, x_{0}, x_{1}, \dots, x_{N - 2})

and

M x = (ω^{0} x_{0}, ω^{1} x_{1}, \dots, ω^{N - 1} x_{N - 1})

,

ω : = e^{2 π i / N}

, for

x = (x_{0}, x_{1}, \dots, x_{N - 1}) \in C^{N}

, respectively (see Section 2.1 for further details). For any

F : C^{N} \to C^{N}

and any nonzero

v \in C^{N}

, we define

F_{v} : C^{N} \to C

by

F_{v} (x) = ⟨F (x), v⟩

,

x \in C^{N}

. For any

Ω \subset Z_{N}

, we say that a function

H : C^{N} \to C

is Ω-phase homogeneous if

H (e^{2 π i s / N} x) = e^{2 π i s / N} H (x)

for all

s \in Ω

and

x \in C^{N}

.

We first address the properties of the mapping

F \mapsto F_{v}

from the space of

Λ

-equivariant functions

C^{N} \to C^{N}

to the space of certain phase homogeneous functions.

Theorem 1

(see Theorem 3 below). Assume that

span {π (k, ℓ) v : (k, ℓ) \in Λ} = C^{N}

for some subgroup Λ of

Z_{N} \times Z_{N}

and some vector

v \in C^{N}

. Then, the mapping

F \mapsto F_{v}

is an injective map from the space of Λ-equivariant functions

C^{N} \to C^{N}

to the space of

Ω_{Λ}

-phase homogeneous functions

C^{N} \to C

, where

Ω_{Λ} : = {k ℓ^{'} mod N : (k, ℓ), (k^{'}, ℓ^{'}) \in Λ}

. Moreover, if

{π (k, ℓ) u}_{(k, ℓ) \in Λ}

is a dual frame of

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

in

C^{N}

, then a Λ-equivariant function

F : C^{N} \to C^{N}

can be expressed as

F (x) = \sum_{(k, ℓ) \in Λ} e^{- 2 π i k ℓ / N} F_{v} (π (- k, - ℓ) x) π (k, ℓ) u .

If

| Λ | = N

, then the mapping

F \mapsto F_{v}

is a bijective map from the space of Λ-equivariant functions

C^{N} \to C^{N}

to the space of

Ω_{Λ}

-phase homogeneous functions

C^{N} \to C

.

We then consider the approximation of

Λ

-equivariant maps. In particular, we show that if

Λ

is a cyclic subgroup of order N in

Z_{N} \times Z_{N}

, then every

Λ

-equivariant map can be easily approximated by a shallow neural network whose affine linear maps consist of linear combinations of time-frequency shifts by

Λ

.

Theorem 2

(see Theorem 5 below). Assume that

σ : C \to C

is shallow universal and satisfies

σ (e^{π i / N} z) = e^{π i / N} σ (z)

for all

z \in C

. Let

Λ = {(0, 0), (1, s), \dots, (N - 1, (N - 1) s)}

for some

s \in {0, 1, \dots, N - 1}

. Then, any continuous Λ-equivariant map

F : C^{N} \to C^{N}

can be approximated (uniformly on compact sets) by a shallow neural network

x \mapsto \sum_{j = 1}^{J} c_{j} σ (A_{j} x + b_{j} v),

where

A_{j} \in span {π (k, ℓ) : (k, ℓ) \in Λ}

,

b_{j} \in C

for

j = 1, \dots, J

, and

v \in C^{N}

satisfies

π (k, ℓ) v = e^{k ℓ π i / N} v

for all

(k, ℓ) \in Λ

. Moreover, every map of this form is Λ-equivariant.

In the case

s = 0

, i.e.,

Λ = {(0, 0), (1, 0), \dots, (N - 1, 0)}

, the

Λ

-equivariant maps

F : C^{N} \to C^{N}

are precisely those that are translation equivariant, meaning that

F T = T F

. Furthermore, if F is linear, then F is just a convolutional map, which can be expressed as a linear combination of

T^{k}

,

k = 0, \dots, N - 1

, or simply as an

N \times N

circulant matrix. If F is nonlinear, then Theorem 2 shows that F can be approximated by a shallow neural network whose affine linear maps are convolutional maps, i.e., by a shallow convolutional neural network. This agrees with the well-established fact that convolutional neural networks (CNNs) are particularly well-suited for applications involving translation equivariance, especially in image and signal processing.

Organization of the Paper

In Section 2, we begin by reviewing some basic properties of time-frequency shift operators, followed by a discussion on time-frequency group equivariant maps, and then prove our first main result, Theorem 1, which establishes a 1:1 correspondence between

Λ

-equivariant maps and certain phase-homogeneous functions. Section 3 is devoted to the approximation of

Λ

-equivariant maps. We first discuss the embedding of

Λ

into the Weyl–Heisenberg group, which allows for the use of tools from group representation theory. (The finite Weyl–Heisenberg group

H_{N}

is the set

Z_{N} \times Z_{N} \times Z_{N}

equipped with group operation

(k, ℓ, s) + (k^{'}, ℓ^{'}, s^{'}) : = (k + k^{'}, ℓ + ℓ^{'}, s + s^{'} - k ℓ^{'})

. The noncommutativity of

H_{N}

plays an important role in finite-dimensional time-frequency analysis; see, e.g., [21,23].) After reviewing key concepts from group representation theory, we consider the case of cyclic subgroups of

Z_{N} \times Z_{N}

, where group representations can be defined directly without embedding into the Weyl–Heisenberg group. Section 3 concludes with the proof of our second main result, Theorem 2, which establishes the approximation of

Λ

-equivariant maps by a shallow neural network whose affine linear maps consist of linear combinations of time-frequency shifts by

Λ

.

2. Time-Frequency Shift Equivariant Maps

2.1. Time-Frequency Shift Operators

We define the translation (time shift) operator

T : C^{N} \to C^{N}

by

T x = (x_{N - 1}, x_{0}, x_{1}, \dots, x_{N - 2}), x = (x_{0}, x_{1}, \dots, x_{N - 1}) \in C^{N},

and the modulation (frequency shift) operator

M : C^{N} \to C^{N}

by

M x = (ω^{0} x_{0}, ω^{1} x_{1}, \dots, ω^{N - 1} x_{N - 1}) with ω : = e^{2 π i / N} .

These operators are linear unitary operators, which can be represented by

N \times N

unitary matrices:

\begin{matrix} T = [\begin{matrix} 0 & 0 & 0 & \dots & 0 & 1 \\ 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 0 & \dots & 1 & 0 \end{matrix}], M = [\begin{matrix} 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & ω & 0 & \dots & 0 & 0 \\ 0 & 0 & ω^{2} & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & \dots & ω^{N - 2} & 0 \\ 0 & 0 & 0 & \dots & 0 & ω^{N - 1} \end{matrix}] . \end{matrix}

Note that since

T^{N} = M^{N} = I_{N}

, we have

T^{N + k} = T^{k}

and

M^{N + ℓ} = M^{ℓ}

for all integers k and ℓ. The time-frequency shift by

(k, ℓ) \in Z_{N} \times Z_{N}

is defined by

π (k, ℓ) : = M^{ℓ} T^{k}

. Since

T, M : C^{N} \to C^{N}

are linear unitary operators, the operator

π (k, ℓ)

is also linear and unitary.

For a Hilbert space

H

, we will denote the class of all linear operators on

H

by

L (H)

, and the class of all linear unitary operators on

H

by

U (H)

.

Proposition 1.

For any

k, ℓ = 0, \dots, N - 1

, we have

M^{ℓ} T^{k} = ω^{k ℓ} T^{k} M^{ℓ} .

(1)

This implies

(M^{ℓ} T^{k}) (M^{q} T^{p}) = ω^{- k q} M^{ℓ + q} T^{k + p} = ω^{ℓ p - k q} (M^{q} T^{p}) (M^{ℓ} T^{k})

for

k, ℓ, p, q = 0, \dots, N - 1

, and consequently, the operators

M^{ℓ} T^{k}

and

M^{q} T^{p}

commute if and only if

[(k, ℓ), (p, q)] : = ℓ p - k q

is a multiple of N. Moreover, for any

k, ℓ = 0, \dots, N - 1

, we have

{(M^{ℓ} T^{k})}^{- 1} = ω^{- k ℓ} M^{- ℓ} T^{- k},

(2)

that is,

π {(k, ℓ)}^{- 1} = ω^{- k ℓ} π (- k, - ℓ)

.

Proof.

The relation (1) is easily seen by computation. Using (1), we obtain

{(M^{ℓ} T^{k})}^{- 1} = T^{- k} M^{- ℓ} = ω^{- k ℓ} M^{- ℓ} T^{- k},

which is exactly (2). □

Remark 1.

The definition of

[\cdot, \cdot]

remains unchanged for time-frequency shift operators of the form

T^{k} M^{ℓ}

with

(k, ℓ) \in Z_{N} \times Z_{N}

. Indeed, (1) implies

(T^{k} M^{ℓ}) (T^{p} M^{q}) = ω^{ℓ p - k q} (T^{p} M^{q}) (T^{k} M^{ℓ})

for

k, ℓ, p, q = 0, \dots, N - 1

, and consequently, the operators

T^{k} M^{ℓ}

and

T^{p} M^{q}

commute if and only if

[(k, ℓ), (p, q)] = ℓ p - k q

is a multiple of N.

For a subgroup

Λ

of

Z_{N} \times Z_{N}

, its adjoint group is defined by

\begin{matrix} Λ^{\circ} : = & {(p, q) \in Z_{N} \times Z_{N} : (M^{ℓ} T^{k}) (M^{q} T^{p}) = (M^{q} T^{p}) (M^{ℓ} T^{k}) f o r a l l (k, ℓ) \in Λ} \\ = & {(p, q) \in Z_{N} \times Z_{N} : ℓ p - k q \in N Z f o r a l l (k, ℓ) \in Λ} . \end{matrix}

Since

{M^{ℓ} T^{k} : k, ℓ = 0, \dots, L - 1}

forms a basis for

L (C^{N})

(see, e.g., Lemma 1 of [24]), every linear operator

F \in L (C^{N})

can be expressed as

F = \sum_{k, ℓ = 0}^{L - 1} a_{k, ℓ} M^{ℓ} T^{k} for some a_{k, ℓ} \in C, k, ℓ = 0, \dots, L - 1 .

(3)

If F commutes with

M^{ℓ} T^{k}

for

(k, ℓ) \in Λ

, then we must have

a_{k, ℓ} = 0

for

(k, ℓ) \notin Λ^{\circ}

, so that

F = \sum_{(k, ℓ) \in Λ^{\circ}} a_{k, ℓ} M^{ℓ} T^{k} .

Therefore, the commutant (or centralizer) of

(π, Λ)

(see, e.g., Proposition 4.14 of [25]) is given by

\begin{matrix} C (π, Λ) : = & {F \in L (C^{N}) : F π (k, ℓ) = π (k, ℓ) F f o r a l l (k, ℓ) \in Λ} \\ = & span {π (k, ℓ) : (k, ℓ) \in Λ^{\circ}} . \end{matrix}

(4)

Remark 2.

For a subgroup Λ of

Z_{L} \times Z_{L}

, its adjoint group

Λ^{\circ}

has cardinality

L^{2} / | Λ |

. While this fact is somewhat considered folklore, we could not find a suitable reference in the literature, so we provide a short proof of this fact in Appendix A.

2.2. $Λ$ -Equivariant Maps

Definition 1.

For any

Λ \subset Z_{N} \times Z_{N}

, we say that a map

F : C^{N} \to C^{N}

is

Λ

-equivariant if

F \circ π (k, ℓ) = π (k, ℓ) \circ F f o r a l l (k, ℓ) \in Λ .

(5)

Clearly, the set of

Λ

-equivariant linear maps is precisely

C (π, Λ)

, the commutant of

(π, Λ)

. According to (4), every

(π, Λ)

-equivariant linear map is of the form

\sum_{(k, ℓ) \in Λ^{\circ}} a_{k, ℓ} M^{ℓ} T^{k} for some {a_{k, ℓ}}_{(k, ℓ) \in Λ^{\circ}} \in C^{Λ^{\circ}} .

Since the case of linear maps is obvious, our consideration of

Λ

-equivariant maps will be focused on nonlinear maps.

We first observe some necessary conditions for

Λ

-equivariance.

Proposition 2.

Let Λ be a subgroup of

Z_{N} \times Z_{N}

, and assume that

F : C^{N} \to C^{N}

is Λ-equivariant. If Λ contains

(k, ℓ)

and

(k^{'}, ℓ^{'})

with

s = gcd (k ℓ^{'}, N)

, then

F (e^{2 π i s / N} x) = e^{2 π i s / N} F (x) f o r a l l x \in C^{N} .

(6)

Proof.

Note that since

Λ

is a subgroup, we have

(k + k^{'}, ℓ + ℓ^{'}) \in Λ

. Using Proposition 1 and (5), we have

\begin{matrix} ω^{- k ℓ^{'}} π (k + k^{'}, ℓ + ℓ^{'}) F (x) & = π (k, ℓ) π (k^{'}, ℓ^{'}) F (x) = F (π (k, ℓ) π (k^{'}, ℓ^{'}) x) \\ = F (ω^{- k ℓ^{'}} π (k + k^{'}, ℓ + ℓ^{'}) x) \\ = π (k + k^{'}, ℓ + ℓ^{'}) F (ω^{- k^{'} ℓ} x), \end{matrix}

so that

ω^{- k ℓ^{'}} F (x) = F (ω^{- k ℓ^{'}} x)

. Since

gcd (k ℓ^{'}, N) = s

, there exist some

p, q \in Z

with

p (- k^{'} ℓ) + q N = s

. In fact, we can choose

p \in {0, \dots, N - 1}

such that

p (- k^{'} ℓ) \equiv s mod N

. Then, for any

x \in C^{N}

, we have

ω^{s} F (x) = {(ω^{- k ℓ^{'}})}^{p} F (x) = {(ω^{- k ℓ^{'}})}^{p - 1} F (ω^{- k ℓ^{'}} x) = \dots = F ({(ω^{- k ℓ^{'}})}^{p} x) = F (ω^{s} x),

which is equivalent to (6). □

It is easily seen that for a subgroup

Λ

of

Z_{N} \times Z_{N}

, the set

Ω_{Λ} : = {k ℓ^{'} mod N : (k, ℓ), (k^{'}, ℓ^{'}) \in Λ}

forms a subgroup of

Z_{N}

; in fact,

Ω_{Λ} = s_{0} Z / N Z

, where

s_{0} : = min {gcd (k ℓ^{'}, N) : (k, ℓ), (k^{'}, ℓ^{'}) \in Λ}

. This leads to the following definition.

Definition 2.

Let

m, n \in N

and

N \in N

. For any

Ω \subset Z_{N}

, we say that a map

F : C^{n} \to C^{m}

is

Ω

-phase homogeneous if

F (e^{2 π i s / N} x) = e^{2 π i s / N} F (x) f o r a l l s \in Ω, x \in C^{n} .

Definition 3.

For any

F : C^{N} \to C^{N}

and any nonzero

v \in C^{N}

, we define

F_{v} : C^{N} \to C

by

F_{v} (x) = ⟨F (x), v⟩, x \in C^{N} .

We now present our first main theorem, which addresses the properties of the mapping

F \mapsto F_{v}

. Note that if

F : C^{N} \to C^{N}

is

Λ

-equivariant for a subgroup

Λ

of

Z_{N} \times Z_{N}

, then it is

Ω_{Λ}

-phase homogeneous by Proposition 2, and so is

F_{v}

.

Before stating the theorem, we note that

{π (k, ℓ) v : (k, ℓ) \in Z_{N} \times Z_{N}}

is a tight frame for

C^{N}

whenever

v \neq 0

(see, e.g., Proposition 2 of [24]). Moreover, there exists a nonzero vector

v \in C^{N}

such that every N elements of

{π (k, ℓ) v : (k, ℓ) \in Z_{N} \times Z_{N}}

are linearly independent in

C^{N}

. In fact, such vectors form a dense open set

W_{N}

of full measure in

C^{N}

(see Theorem 1 of [24]). If

Λ \subset Z_{N} \times Z_{N}

is a set of cardinality at least N, then for any

v \in W_{N}

we have

span {π (k, ℓ) v : (k, ℓ) \in Λ} = C^{N}

, in which case

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

forms a frame of

C^{N}

.

Theorem 3.

Assume that

span {π (k, ℓ) v : (k, ℓ) \in Λ} = C^{N}

for some subgroup Λ of

Z_{N} \times Z_{N}

and some vector

v \in C^{N}

. Then, the mapping

F \mapsto F_{v}

is an injective map from the space of Λ-equivariant functions

C^{N} \to C^{N}

to the space of

Ω_{Λ}

-phase homogeneous functions

C^{N} \to C

. Moreover, if

{π (k, ℓ) u}_{(k, ℓ) \in Λ}

is a dual frame of

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

in

C^{N}

, then a Λ-equivariant function

F : C^{N} \to C^{N}

can be expressed as

F (x) = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} F_{v} (π (- k, - ℓ) x) π (k, ℓ) u .

(7)

If

| Λ | = N

, then the mapping

F \mapsto F_{v}

is a bijective map from the space of Λ-equivariant functions

C^{N} \to C^{N}

to the space of

Ω_{Λ}

-phase homogeneous functions

C^{N} \to C

.

Proof.

To prove the injectivity of

F \mapsto F_{v}

, suppose that

F_{v} = H_{v}

for some

Λ

-equivariant functions

F, H : C^{N} \to C^{N}

. Then, for any

(k, ℓ) \in Λ

and

x \in C^{N}

, we have

\begin{matrix} ⟨F (x), π (k, ℓ) v⟩ & = ⟨π {(k, ℓ)}^{- 1} F (x), v⟩ = ⟨ω^{- k ℓ} π (- k, - ℓ) F (x), v⟩ \\ = ⟨ω^{- k ℓ} F (π (- k, - ℓ) x), v⟩ = ω^{- k ℓ} F_{v} (π (- k, - ℓ) x) \\ = ω^{- k ℓ} H_{v} (π (- k, - ℓ) x) = ⟨ω^{- k ℓ} H (π (- k, - ℓ) x), v⟩ \\ = ⟨ω^{- k ℓ} π (- k, - ℓ) H (x), v⟩ = ⟨π {(k, ℓ)}^{- 1} H (x), v⟩ \\ = ⟨H (x), π (k, ℓ) v⟩ . \end{matrix}

Since

{π (k, ℓ) v : (k, ℓ) \in Λ}

is complete in

C^{N}

, we obtain that

F (x) = H (x)

for all

x \in C^{N}

.

Now, let

{π (k, ℓ) u}_{(k, ℓ) \in Λ}

be a dual frame of

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

in

C^{N}

, which means that

z = \sum_{(k, ℓ) \in Λ} ⟨z, π (k, ℓ) v⟩ π (k, ℓ) u, z \in C^{N} .

Then, for any

x \in C^{N}

, we have

\begin{matrix} \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} F_{v} (π (- k, - ℓ) x) π (k, ℓ) u & = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} ⟨F (π (- k, - ℓ) x), v⟩ π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ⟨ω^{- k ℓ} π (- k, - ℓ) F (x), v⟩ π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ⟨π {(k, ℓ)}^{- 1} F (x), v⟩ π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ⟨F (x), π (k, ℓ) v⟩ π (k, ℓ) u \\ = F (x), \end{matrix}

which establishes (7).

Finally, assume that

| Λ | = N

. Then,

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

forms a Riesz basis for

C^{N}

, so there exists a unique dual Riesz basis

{π (k, ℓ) u}_{(k, ℓ) \in Λ}

of

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

in

C^{N}

, which is necessarily biorthogonal to

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

(see, e.g., [26]). To prove the surjectivity of

F \mapsto F_{v}

, we pick any

Ω_{Λ}

-phase homogeneous function

g : C^{N} \to C

and set

F : C^{N} \to C^{N}

by

F (x) : = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (k, ℓ) u, x \in C^{N} .

Then, for any

(p, q) \in Λ

and

x \in C^{N}

, we have

\begin{matrix} F (π (p, q) x) & = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) π (p, q) x) π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ω^{- (k + p) (ℓ + q)} g (π (- k - p, - ℓ - q) π (p, q) x) π (k + p, ℓ + q) u \\ \overset{(8)}{=} \sum_{(k, ℓ) \in Λ} ω^{- (k + p) (ℓ + q)} g (ω^{- (k + p) q} π (- k, - ℓ) x) ω^{p ℓ} π (p, q) π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ω^{- (k + p) (ℓ + q)} ω^{- (k + p) q} g (π (- k, - ℓ) x) ω^{p ℓ} π (p, q) π (k, ℓ) u \\ = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (p, q) π (k, ℓ) u \\ = π (p, q) \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (k, ℓ) u \\ = π (p, q) F (x), \end{matrix}

which shows that F is

Λ

-equivariant. Since

{π (k, ℓ) u}_{(k, ℓ) \in Λ}

and

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

are biorthogonal, it holds for any

x \in C^{N}

that

\begin{matrix} F_{v} (x) & = ⟨ F (x), v ⟩ = ⟨\sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (k, ℓ) u, v⟩ \\ = ⟨\sum_{(k, ℓ) \in Λ} g (ω^{- k ℓ} π (- k, - ℓ) x) π (k, ℓ) u, v⟩ \\ = ⟨\sum_{(k, ℓ) \in Λ} g (π {(k, ℓ)}^{- 1} x) π (k, ℓ) u, v⟩ \\ = g (x) . \end{matrix}

Hence, we conclude that the mapping

F \mapsto F_{v}

is also surjective. □

Remark 3.

As one would expect, the mapping

F \mapsto F_{v}

is not surjective if

| Λ | > N

. Indeed, if

| Λ | > N

and

span {π (k, ℓ) v : (k, ℓ) \in Λ} = C^{N}

, then there are many dual frames of

{π (k, ℓ) v : (k, ℓ) \in Λ}

in

C^{N}

. If

g = F_{v}

for some F and v, then for any dual frames

{π (k, ℓ) w}_{(k, ℓ) \in Λ}

and

{π (k, ℓ) \tilde{w}}_{(k, ℓ) \in Λ}

of

{π (k, ℓ) v}_{(k, ℓ) \in Λ}

we have

\begin{matrix} \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (k, ℓ) w = F (x) = \sum_{(k, ℓ) \in Λ} ω^{- k ℓ} g (π (- k, - ℓ) x) π (k, ℓ) \tilde{w} \end{matrix}

for all

x \in C^{N}

by (7). Certainly, not every

Ω_{Λ}

-phase homogeneous function

g : C^{N} \to C

satisfies this property.

3. Approximation of $Λ$ -Equivariant Maps

In this section, we consider an approximation of continuous

Λ

-equivariant maps

F : C^{N} \to C^{N}

that are generally nonlinear, where

Λ

is a subgroup of

Z_{N} \times Z_{N}

and the

Λ

-equivariance is defined by (5). For instance, the map

F : C^{N} \to C^{N}

given by

F (x) = {∥ x ∥}^{p} x

with

p > 0

, is a nonlinear continuous

Λ

-equivariant map.

As seen in Section 2.2 (particularly in Theorem 3 and its proof), working with the time-frequency shift operators

π (k, ℓ)

,

(k, ℓ) \in Λ

, usually requires careful bookkeeping of extra multiplicative phase factors due to the non-commutativity of T and M. (The non-commutativity of T and M can often be frustrating. However, it is precisely this non-commutativity that has given rise to the deep and rich theory of time-frequency analysis [23].) In fact, the map

{π |}_{Λ} : Λ \to U (C^{N}), (k, ℓ) \mapsto M^{ℓ} T^{k},

is generally not a group homomorphism; indeed,

π (k, ℓ) π (k^{'}, ℓ^{'}) = e^{- 2 π i k ℓ^{'} / N} π (k + k^{'}, ℓ + ℓ^{'})

(8)

is equal to

π (k + k^{'}, ℓ + ℓ^{'})

only if

k ℓ^{'}

is a multiple of N (see Proposition 1). (Although

{π |}_{Λ}

is not a group homomorphism and thus not a group representation, it is often referred to as a projective group representation of G on

C^{N}

. In general, a map

ρ : G \to U (H)

is called a projective group representation of G on

H

if for each pair of

g_{1}, g_{2} \in G

, there exists a unimodular

c (g_{1}, g_{2}) \in C

such that

ρ (g_{1} g_{2}) = c (g_{1}, g_{2}) ρ (g_{1}) ρ (g_{2})

; see, e.g., [25].) Obviously, the computations involved would be simplified significantly if

{π |}_{Λ}

were a group homomorphism in general. Note that, as mentioned in Section 1, a group homomorphism

ρ : G \to U (H)

whose images are unitary operators on

H

is called a (unitary) group representation of G on

H

, where G is a group and

H

is a separable Hilbert space. Therefore, the map

{π |}_{Λ}

would be a unitary representation if it were a group homomorphism.

In the following, we first discuss a systematic method of avoiding such extra multiplicative phase factors by embedding

Λ \subset Z_{N} \times Z_{N}

into the Weyl–Heisenberg group. After briefly reviewing essential concepts on group representations and neural networks, we consider cyclic subgroups of

Z_{N} \times Z_{N}

, in which case the map

{π |}_{Λ}

can be replaced by a unitary group representation. We show that if

Λ

is a cyclic subgroup of

Z_{N} \times Z_{N}

, then any

Λ

-equivariant map

C^{N} \to C^{N}

can be approximated with shallow neural networks involving the adjoint group

Λ^{\circ}

, which has significantly fewer degrees of freedom compared with standard shallow neural networks.

3.1. Embedding of $Λ$ into the Weyl–Heisenberg Group

To avoid the bookkeeping of extra multiplicative phase factors, we can simply embed the subgroups of

Z_{N} \times Z_{N}

into the finite Weyl–Heisenberg group

H_{N} = Z_{N} \times Z_{N} \times Z_{N}

, on which group representations can be defined. There exists a group representation

τ : H_{N} \to U (C^{N})

, known as the Schrödinger representation, which satisfies

τ (k, ℓ, 0) = π (k, ℓ)

for all

(k, ℓ) \in Z_{N} \times Z_{N}

. In fact, for any subgroup

Λ

of

Z_{N} \times Z_{N}

and any subgroup

Ω

of

Z_{N}

containing

Ω_{Λ} : = {k ℓ^{'} mod N : (k, ℓ), (k^{'}, ℓ^{'}) \in Λ}

, the map

τ : Λ \times Ω \to U (H), τ (k, ℓ, s) = e^{2 π i s / N} M^{ℓ} T^{k},

(9)

is a group representation of

G = Λ \times Ω

on

C^{N}

, with the group operation on G given by

(k, ℓ, s) + (k^{'}, ℓ^{'}, s^{'}) : = (k + k^{'}, ℓ + ℓ^{'}, s + s^{'} - k ℓ^{'}) .

Clearly, we have

τ (k, ℓ, 0) = M^{ℓ} T^{k} = π (k, ℓ)

for all

(k, ℓ) \in Λ

.

It is clear that a map

F : C^{N} \to C^{N}

is

Λ

-equivariance in the sense of (5) if and only if it is

(τ, Λ \times {0})

-equivariant in the sense of Definition 4. Moreover, in this case, Proposition 2 implies that F is

Ω_{Λ}

-phase homogeneous, which is equivalent to

F \circ τ (0, 0, s) = τ (0, 0, s) \circ F

for all

s \in Ω_{Λ}

. Consequently, we have the following proposition.

Proposition 3.

For any subgroup Λ of

Z_{N} \times Z_{N}

and any

F : C^{N} \to C^{N}

, the following are equivalent.

(i): F is Λ-equivariant;
(ii): F is $(τ, Λ \times {0})$ -equivariant;
(iii): F is $(τ, Λ \times Ω_{Λ})$ -equivariant.

Using the true group representation

τ

instead of

π | Λ

allows us to avoid the tedious bookkeeping of extra multiplicative phase factors. Note, however, that

τ

requires three input parameters, while

π | Λ

involves only two. In fact, the description of the extra phase factors is simply transferred to the third parameter of

τ

. Nevertheless, an important advantage of using

τ

instead of

{π |}_{Λ}

is that it allows for the use of tools from group representation theory.

3.2. Group Representations and Neural Networks

In this section, we review some concepts and tools from group representation theory and introduce the so-called ♮-transform and its inverse transform for later use. We also review the basic structure of neural networks and the universal approximation theorem.

We assume that G is a finite group, and consider maps of the form

F : H \to H

, where

H

is a finite-dimensional Hilbert spaces on which a unitary representation

ρ

of G is defined. This means that for each

λ \in G

, the map

ρ (λ) : H \to H

is a linear unitary operator, and that

ρ : G \to U (H)

is a group homomorphism, i.e.,

ρ (λ_{1} λ_{2}) = ρ (λ_{1}) ρ (λ_{2})

for all

λ_{1}, λ_{2} \in G

. Let us formally state the definition of equivariance and invariance in this setting.

Definition 4

(Equivariance and Invariance). For a group G and a unitary representation ρ of G on a Hilbert space

H

, we say that a map

F : H \to H

is

$(ρ, G)$ -equivariant if $F ρ (λ) = ρ (λ) F$ for all $λ \in G$ ;
$(ρ, G)$ -invariant if $F ρ (λ) = F$ for all $λ \in G$ .

Note that a

(ρ, G)

-equivariant/invariant map

F : H \to H

is not necessarily linear or bounded.

Definition 5.

For a group G, the left translation of a vector

x \in C^{G}

by

λ \in G

is given by

L_{λ} x (ν) : = x (λ^{- 1} ν) f o r ν \in G .

In fact, the map

λ \mapsto L_{λ}

is a group homomorphism from G to

U (C^{G})

, that is,

L_{λ_{1} λ_{2}} = L_{λ_{1}} L_{λ_{2}}

for all

λ_{1}, λ_{2} \in G

, and therefore, it induces a group representation of G on

C^{G}

. We say that a map

Φ : C^{G} \to C^{G}

is left G-translation equivariant if

Φ L_{λ} = L_{λ} Φ

for all

λ \in G

.

Definition 6.

Let G be a group and let ρ be a unitary representation of G on a Hilbert space

H

. Given a window

g \in H

, the set

{ρ (λ) g : λ \in G}

is called the orbit of g under

ρ (λ)

for

λ \in G

. The map

U_{g} : H \to C^{G}

defined by

U_{g} (f) = {⟨ f, ρ (λ) g ⟩}_{λ \in G}

is called the analysis operator of

{ρ (λ) g : λ \in G}

, and its adjoint operator

U_{g}^{*} : C^{G} \to H

given by

U_{g}^{*} (x) = \sum_{λ \in G} x_{λ} ρ (λ) g

is called the synthesis operator of

{ρ (λ) g : λ \in G}

.

It is easy to check that

U_{g} ρ (λ) = L_{λ} U_{g} and U_{g}^{*} L_{λ} = ρ (λ) U_{g}^{*}, λ \in G .

(10)

We are particularly interested in the case where the orbit of g spans

H

, that is,

span {ρ (λ) g : λ \in G} = H

. Since

H

is finite-dimensional, this implies that

{ρ (λ) g : λ \in G}

is a frame for

H

and the associated frame operator

S_{g} : = U_{g}^{*} U_{g}

is a positive, self-adjoint bounded operator on

H

. It follows from (10) that

S_{g} ρ (λ) = ρ (λ) S_{g}

and thus

S_{g}^{- 1} ρ (λ) = ρ (λ) S_{g}^{- 1}

for all

λ \in G

. For any

f \in H

, we have

\begin{matrix} f & = S_{g}^{- 1} S_{g} f = S_{g}^{- 1} (\sum_{λ \in G} U_{g} f (λ) ρ (λ) g) = \sum_{λ \in G} U_{g} f (λ) S_{g}^{- 1} (ρ (λ) g) \\ = \sum_{λ \in G} U_{g} f (λ) ρ (λ) S_{g}^{- 1} (g) = \sum_{λ \in G} U_{g} f (λ) ρ (λ) g^{*} = U_{g^{*}}^{*} U_{g} f, \end{matrix}

where

g^{*} : = S_{g}^{- 1} (g) \in H

. This shows that

U_{g^{*}}^{*} U_{g}

is the identity operator on

H

, i.e.,

U_{g^{*}}^{*} U_{g} = {Id}_{H},

(11)

and correspondingly,

{ρ (λ) g^{*} : λ \in G}

is the canonical dual frame of

{ρ (λ) g : λ \in G}

.

In light of (11), we newly introduce a transform which lifts a map

H \to H

to a map

C^{G} \to C^{G}

, and also its inverse transform.

Definition 7.

Let G be a finite group and let ρ be a unitary representation of G on a finite-dimensional Hilbert space

H

. Assume that

span {ρ (λ) g : λ \in G} = H

, and let

S_{g} : = U_{g}^{*} U_{g}

and

g^{*} : = S_{g}^{- 1} (g)

. For any map

F : H \to H

, the ♮-transform of F is defined by

F^{♮} : = U_{g} \circ F \circ U_{g^{*}}^{*} : C^{G} \to C^{G} .

For any map

Φ : C^{G} \to C^{G}

, the inverse ♮-transform of Φ is defined by

Φ^{- ♮} : = U_{g^{*}}^{*} \circ Φ \circ U_{g} : H \to H .

As shown in Figure 1, the ♮-transform converts a map

H \to H

into a map

C^{G} \to C^{G}

, and the inverse ♮-transform converts a map

C^{G} \to C^{G}

into a map

H \to H

.

Figure 1. The ♮-transform and its inverse transform.

Proposition 4.

Let G be a finite group, and let ρ be a unitary representation of G on a finite-dimensional Hilbert space

H

. Assume that

span {ρ (λ) g : λ \in G} = H

, and let

S_{g} : = U_{g}^{*} U_{g}

and

g^{*} : = S_{g}^{- 1} (g)

. Then, the following hold.

(i): ${(F^{♮})}^{- ♮} = F$ for any map $F : H \to H$ .
(ii): A map $F : H \to H$ is continuous if and only if $F^{♮}$ is continuous.
(iii): A map $F : H \to H$ is $(ρ, G)$ -equivariant if and only if $F^{♮}$ is left G-translation equivariant.

Proof.

(i) It follows from (11) that

{(F^{♮})}^{- ♮} = U_{g^{*}}^{*} (U_{g} F U_{g^{*}}^{*}) U_{g} = F

for any

F : H \to H

.

(ii) Since the maps $U_{g} : H \to C^{G}$ and $U_{g^{*}}^{*} : C^{G} \to H$ are bounded linear operators, the continuity of F implies the continuity of $F^{♮} = U_{g} F U_{g^{*}}^{*}$ . Similarly, the continuity of $F^{♮}$ implies the continuity of $F = {(F^{♮})}^{- ♮} = U_{g^{*}}^{*} F^{♮} U_{g}$ .
(iii) It follows from (10) that the G-equivariance of F implies the left G-translation equivariance of $F^{♮} = U_{g} F U_{g^{*}}^{*}$ . Similarly, the left G-translation equivariance of $F^{♮}$ implies the G-equivariance of $F = {(F^{♮})}^{- ♮} = U_{g^{*}}^{*} F^{♮} U_{g}$ . □

We now provide a brief review of neural networks and the universal approximation theorem.

Let

K

be either

R

or

C

. An activation function is a function

σ : K \to K

that acts componentwise on vectors; that is,

σ (x_{1}, \dots, x_{n}) = (σ (x_{1}), \dots, σ (x_{n}))

for any

(x_{1}, \dots, x_{n}) \in K^{n}

.

A fully connected feedforward neural network with P hidden layers is given by

Ψ : K^{d} \to K^{n}, Ψ (x) = R^{(P)} \circ (σ \circ R^{(P - 1)}) \circ \dots \circ (σ \circ R^{(0)}),

(12)

where

R^{(p)} : K^{N_{p}} \to K^{N_{p + 1}}

,

x \mapsto A^{(p)} x + b^{(p)}

is affine-linear with

N_{0} = d

and

N_{P + 1} = n

. Such a function

Ψ

is often called a neural network, but we will call it a σ-neural network to specify the activation function employed.

A shallow neural network is a neural network with a single (

P = 1

) hidden layer. In particular, a shallow neural network with output dimension

n = 1

is given by

Ψ : K^{d} \to K, Ψ (x) = \sum_{j = 1}^{J} c_{j} σ (w_{j}^{T} x + b_{j}) with some J \in N, c_{j}, b_{j} \in K, w_{j} \in K^{d} .

(13)

Definition 8.

A function

σ : K \to K

is called shallow universal if the set of

K

-valued shallow σ-networks is dense in the set of all continuous functions

f : K^{d} \to K

, with respect to locally uniform convergence.

The following theorem, known as the universal approximation theorem, is a fundamental result in the theory of neural networks.

Theorem 4

(The universal approximation theorem; see [27,28,29,30,31] for

K = R

, and [32] for

K = C

). Let

d \in N

.

A function $σ : R \to R$ is shallow universal if and only if σ is not a polynomial.
A function $σ : C \to C$ is shallow universal if and only if σ is not a polyharmonic. Here, a function $τ : C \to C$ is called polyharmonic if there exists $m \in N$ such that $τ \in C^{2 m}$ in the sense of real variables and $Δ^{m} σ \equiv 0$ , where $Δ = \frac{\partial^{2}}{\partial x^{2}} + \frac{\partial^{2}}{\partial y^{2}}$ is the usual Laplace operator on $C ≅ R^{2}$ .

In 1996, Mhaskar [33] obtained a quantitative result for approximation of

C^{n}

functions using shallow networks with smooth activation functions. More recently, Yarotsky [34] derived a quantitative approximation result for deep ReLU networks, where ReLU networks are given by (12) with

K = R

and the ReLU activation function

σ : R \to R

,

σ (x) = max {x, 0}

, and “deep” refers to having a large

P \in N

in (12). For the case of complex-valued deep neural networks, we refer to [35].

3.3. Cyclic Subgroups $Λ$ of $Z_{N} \times Z_{N}$

We now consider the case of cyclic subgroups of

Z_{N} \times Z_{N}

, where group representations can be defined directly without embedding into the Weyl–Heisenberg group. The cyclic subgroups of order N in

Z_{N} \times Z_{N}

are given by

\begin{matrix} Λ_{s} & = {(0, 0), (1, s), \dots, (N - 1, (N - 1) s)} = ⟨ (1, s) ⟩, s = 0, \dots, N - 1, \\ Λ_{\infty} & = {(0, 0), (0, 1), \dots, (0, N - 1)} = ⟨ (0, 1) ⟩ . \end{matrix}

If N is prime, these are the only nontrivial proper subgroups of

Z_{N} \times Z_{N}

, but if N is composite, there exist noncyclic subgroups of order N in

Z_{N} \times Z_{N}

; for instance,

{0, 2, 4} \times {0, 3}

is a noncyclic subgroup of order 6 in

Z_{6} \times Z_{6}

. It is easily seen that the adjoint group of

Λ_{s}

in

Z_{N} \times Z_{N}

is

Λ_{s}

itself; that is,

{(Λ_{s})}^{\circ} = Λ_{s}

(see Section 2.1).

We define the map

ρ : Λ_{s} \to U (C^{N})

by

(ρ (k, ℓ) x) (n) = e^{- k ℓ π i / N} e^{2 π i ℓ n / N} x (n - k), (k, ℓ) \in Λ_{s}, x \in C^{N} .

Setting

ω_{0} : = e^{π i / N}

, we may simply write

ρ (k, ℓ) = ω_{0}^{- k ℓ} M^{ℓ} T^{k} = ω_{0}^{k ℓ} T^{k} M^{ℓ}, (k, ℓ) \in Λ_{s} .

(14)

For any

(k, ℓ), (k^{'}, ℓ^{'}) \in Λ_{s}

, we have

\begin{matrix} ρ (k + k^{'}, ℓ + ℓ^{'}) & = ω_{0}^{- (k + ℓ) (k^{'} + ℓ^{'})} M^{ℓ + ℓ^{'}} T^{k + k^{'}} = ω_{0}^{- k ℓ - k^{'} ℓ^{'} - 2 k ℓ^{'}} M^{ℓ + ℓ^{'}} T^{k + k^{'}} \\ \overset{(8)}{=} ω_{0}^{- k ℓ - k^{'} ℓ^{'}} M^{ℓ} T^{k} M^{ℓ^{'}} T^{k^{'}} = ρ (k, ℓ) ρ (k^{'}, ℓ^{'}), \end{matrix}

where we used the fact that

k^{'} ℓ = k ℓ^{'}

for all

(k, ℓ), (k^{'}, ℓ^{'}) \in Λ_{s}

. This shows that

ρ

is a group homomorphism and thus a unitary group representation of

Λ_{s}

on

C^{N}

. Due the symmetry in (14),

ρ

is called the symmetric representation of

Λ_{s}

on

C^{N}

.

Note that for any

F : C^{N} \to C^{N}

and

(k, ℓ) \in Z_{N} \times Z_{N}

, we have

F π (k, ℓ) = π (k, ℓ) F

if and only if

F ρ (k, ℓ) = ρ (k, ℓ) F

, where we used the relation

ρ (k, ℓ) = ω_{0}^{- k ℓ} π (k, ℓ)

from (14). This implies that a map

F : C^{N} \to C^{N}

is

Λ_{s}

-equivariant in the sense of Definition 1 if and only if it is

(ρ, Λ_{s})

-equivariant in the sense of Definition 4. Importantly, employing

(ρ, Λ_{s})

-equivariance in place of

Λ_{s}

-equivariance will allow us to apply the tools from group representation theory described in Section 3.2.

We are interested in approximating

Λ_{s}

-equivariant (or

(ρ, Λ_{s})

-equivariant) maps

F : C^{N} \to C^{N}

by neural networks. For this, we need to choose a complex-valued activation function

σ : C \to C

(see Section 3.2) for the neural networks. Since

σ

acts componentwise on its input, i.e.,

(x_{1}, \dots, x_{N}) \mapsto (σ (x_{N}), \dots, σ (x_{N}))

, it clearly commutes with all translations, i.e.,

σ T = T σ

; however,

σ

does not commute with modulations in general. As shown in (14), the representation

ρ

includes the multiplicative phase factor

ω_{0} = e^{π i / N}

, so we will assume that

σ : C \to C

is

e^{π i / N}

-phase homogeneous (see Definition 2):

σ (e^{π i / N} z) = e^{π i / N} σ (z), z \in C,

which ensures that

σ

commutes with all

ρ (k, ℓ)

and all modulations.

We first need the following lemma. Below, we denote by

1_{N} : = (1, 1, \dots, 1) \in C^{N}

the vector whose entries are all equal to 1.

Lemma 1.

Assume that

σ : C \to C

is shallow-universal. If a map

F : C^{N} \to C^{N}

satisfies

F T = T F

, then there exists a shallow convolutional neural network

Ψ : C^{N} \to C^{N}, Ψ (x) = \sum_{j = 1}^{J} c_{j} σ (B_{j} x + b_{j} 1_{N}), x \in C^{N},

where

B_{j} \in span {T^{k} : k = 0, \dots, N - 1}

and

b_{j} \in C

for

j = 1, \dots, J

, which approximates F uniformly on compact sets in

C^{N}

.

Proof.

Using the universal approximation theorem (see Theorem 4), the first output component map

F_{0} : C^{N} \to C

,

x \mapsto (F x) (0)

, can be approximated by a shallow network

ψ : C^{N} \to C, x \mapsto \sum_{j = 1}^{J} c_{j} σ (w_{j}^{T} x + b_{j})

with some

J \in N

,

b_{j}, c_{j} \in C

,

w_{j} \in C^{N}

. Note that since

F T = T F

and since

T^{N}

is the identity map on

C^{N}

, we have

F T^{n} = T^{n} F

for all

n \in Z

. This condition provides approximations for other component maps

F_{n} : C^{N} \to C

,

x \mapsto (F x) (n)

, with

n = 1, \dots, N - 1

, in terms of

Ψ

. In fact, we have

(F x) (n) = (T^{- n} F x) (0) = (F T^{- n} x) (0) \approx ψ (T^{- n} x), x \in C^{N}, n = 1, \dots, N - 1 .

Consequently, the map

F : C^{N} \to C^{N}

,

x \mapsto {(F x) (n)}_{n = 0}^{N - 1}

, is approximated by the map

Ψ : C^{N} \to C^{N}

defined by

(Ψ x) (n) = ψ (T^{- n} x)

for

n = 0, \dots, N - 1

. For

x, y \in C^{N}

, let

x * y

be the circular convolution of a and b defined by

(x * y) (n) = \sum_{k = 0}^{N - 1} x_{k} y_{n - k}

, where x and y are understood as N-periodic sequences on the integers. Then, for any

x \in C^{N}

and

n = 0, \dots, N - 1

, we have

Ψ (T^{- n} x) = \sum_{j = 1}^{J} c_{j} σ ((w_{j} * x) (n) + b_{j}),

and therefore, we may write

Ψ : C^{N} \to C^{N}, Ψ (x) = {ψ (T^{- n} x)}_{n = 0}^{N - 1} = {\{\sum_{j = 1}^{J} c_{j} σ ((w_{j} * x) (n) + b_{j})\}}_{n = 0}^{N - 1} .

It is easily seen that every convolutional map

C^{N} \to C^{N}

,

x \mapsto w * x

, is a linear map, and in fact, a linear combination of

T^{k}

,

k = 0, \dots, N - 1

. Hence, the map

Ψ : C^{N} \to C^{N}

can be rewritten as

Ψ (x) = \sum_{j = 1}^{J} c_{j} σ (B_{j} x + b_{j} 1_{N}), x \in C^{N},

where

B_{j} \in span {T^{k} : k = 0, \dots, N - 1}

for

j = 1, \dots, J

. The fact that

Ψ

approximates F uniformly on compact sets in

C^{N}

follows from the uniform approximation of

F_{0}

by

ψ

on compact sets in

C

. Finally, we note that

Ψ

expressed above is a shallow convolutional neural network described in Section 3.2. This completes the proof. □

Theorem 5.

Assume that

σ : C \to C

is shallow universal and satisfies

σ (e^{π i / N} z) = e^{π i / N} σ (z)

for all

z \in C

. Let

Λ = Λ_{s}

for some

s \in {0, 1, \dots, N - 1}

. Then, any continuous

(ρ, Λ)

-equivariant (or Λ-equivariant) map

F : C^{N} \to C^{N}

can be approximated (uniformly on compact sets) by a shallow neural network

x \mapsto \sum_{j = 1}^{J} c_{j} σ (A_{j} x + b_{j} v),

where

A_{j} \in span {ρ (k, ℓ) : (k, ℓ) \in Λ}

and

b_{j} \in C

for

j = 1, \dots, J

, and

v \in C^{N}

satisfies

ρ (k, ℓ) v = v

for all

(k, ℓ) \in Λ

. Moreover, every map of this form is

(ρ, Λ)

-equivariant (or Λ-equivariant).

Remark 4.

Since

ρ (k, ℓ) = ω_{0}^{- k ℓ} π (k, ℓ)

by (14), we have

span {ρ (k, ℓ) : (k, ℓ) \in Λ} = span {π (k, ℓ) : (k, ℓ) \in Λ}

for any

Λ \subset Z_{N} \times Z_{N}

. On the other hand, the vectors satisfying

ρ (k, ℓ) b = b

can be significantly different from those satisfying

π (k, ℓ) b = b

.

Proof.

Since

Λ = Λ_{s}

is cyclic, we order its elements as

(0, 0), (1, s), \dots, (N - 1, (N - 1) s)

, and treat

C^{Λ}

as

C^{N}

, since

C^{Λ} ≃ C^{N}

. Then, the operators

U_{g} : C^{N} \to C^{Λ}

and

U_{g}^{*} : C^{Λ} \to C^{N}

, given in Definition 6, can be represented as the

N \times N

matrices

U_{g} = [\begin{matrix} {(ρ (0, 0) g)}^{*} \\ ⋮ \\ {(ρ (N - 1, (N - 1) s) g)}^{*} \end{matrix}], U_{g}^{*} = [\begin{matrix} ρ (0, 0) g, \dots, ρ (N - 1, (N - 1) s) g \end{matrix}],

respectively, where

{(\cdot)}^{*}

denotes the conjugate transpose. Setting

g = (1, 0, \dots, 0) \in C^{N}

, we have

U_{g}^{*} = diag {(e^{k^{2} s π i / N})}_{k = 0}^{N - 1} = [\begin{matrix} 1 & 0 & \dots & 0 \\ 0 & e^{1^{2} s π i / N} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & e^{{(N - 1)}^{2} s π i / N} \end{matrix}],

(15)

so that

S_{g} = U_{g}^{*} U_{g} = {Id}_{N}

and

g^{*} : = S_{g}^{- 1} g = g

. As a result, the set

{ρ (k, ℓ) g}_{(k, ℓ) \in Λ}

forms an orthonormal basis for

C^{N}

.

Note that for any continuous

(ρ, Λ)

-equivariant

F : C^{N} \to C^{N}

, the map

F^{♮} : = U_{g} F U_{g}^{*} : C^{Λ} \to C^{Λ}

is continuous and left

Λ

-translation equivariant (see Proposition 4). If F is linear, then

F^{♮}

is also linear and can be represented as a circulant matrix, equivalently,

F^{♮} = \sum_{k = 0}^{N - 1} c_{k} T^{k} : C^{Λ} \to C^{Λ}

for some

c_{0}, \dots, c_{N - 1} \in C

, so that

F = U_{g}^{*} (U_{g} F U_{g}^{*}) U_{g} = U_{g}^{*} F^{♮} U_{g} = \sum_{k = 0}^{N - 1} c_{k} {(U_{g}^{*} T U_{g})}^{k} .

Therefore, the commutant of

(ρ, Λ)

is given by

\begin{matrix} C (ρ, Λ) : = & {F \in L (C^{N}) : F ρ (k, ℓ) = ρ (k, ℓ) F for all (k, ℓ) \in Λ} \\ = & span {U_{g}^{*} T^{k} U_{g} : k = 0, \dots, N - 1} . \end{matrix}

On the other hand, since

ρ (k, ℓ) = ω_{0}^{- k ℓ} π (k, ℓ)

by (14), the commutant of

(ρ, Λ)

coincides with that of

(π, Λ)

, i.e.,

C (ρ, Λ) = C (π, Λ) \overset{(4)}{=} span {π (k, ℓ) : (k, ℓ) \in Λ^{\circ}} = span {ρ (k, ℓ) : (k, ℓ) \in Λ^{\circ}} .

Since the adjoint group of

Λ = Λ_{s}

is itself, i.e.,

Λ^{\circ} = Λ

(see Section 2.1), we obtain

span {ρ (k, ℓ) : (k, ℓ) \in Λ} = C (ρ, Λ) = span {U_{g}^{*} T^{k} U_{g} : k = 0, \dots, N - 1} .

(16)

Now, we consider the general case where

F : C^{N} \to C^{N}

is possibly nonlinear. If F is nonlinear, then

F^{♮} = U_{g} F U_{g}^{*} : C^{Λ} \to C^{Λ}

is a nonlinear left

Λ

-translation equivariant map. Since

Λ = Λ_{s} = {(0, 0), (1, s), \dots, (N - 1, (N - 1) s)}

is an additive group and since

| Λ | = N

and

C^{Λ} ≃ C^{N}

, the map

F^{♮}

can be viewed as a map from

C^{N}

to

C^{N}

. For simplicity, we will abuse notation and write

F^{♮} : C^{N} \to C^{N}

instead of

F^{♮} : C^{Λ} \to C^{Λ}

; thus, the first component of

F^{♮} (x) \in C^{Λ} (≃ C^{N})

will be simply denoted by

(F^{♮} x) (0)

instead of

(F^{♮} x) (0, 0)

. Then, the left

Λ

-translation equivariance of

F^{♮}

can be expressed as

F^{♮} T = T F^{♮}

. By applying Lemma 1 to

F^{♮} : C^{N} \to C^{N}

, we obtain a shallow convolutional neural network

Ψ : C^{N} \to C^{N}, Ψ (x) = \sum_{j = 1}^{J} c_{j} σ (B_{j} x + b_{j} 1_{N}), x \in C^{N},

where

B_{j} \in span {T^{k} : k = 0, \dots, N - 1}

, and

b_{j} \in C

for

j = 1, \dots, J

, which approximates

F^{♮}

uniformly on compact sets in

C^{N}

; that is,

F^{♮} (x) = (U_{g} F U_{g}^{*}) (x) \approx Ψ (x) = \sum_{j = 1}^{J} c_{j} σ (B_{j} x + b_{j} 1_{N}) x \in C^{N} .

By the continuity of the operators

U_{g}^{*}

and

U_{g}

, we obtain

F (x) = (U_{g}^{*} (U_{g} F U_{g}^{*}) U_{g}) (x) \approx \sum_{j = 1}^{J} c_{j} U_{g}^{*} σ (B_{j} U_{g} x + b_{j} 1_{N}), x \in C^{N} .

Note that since

σ (e^{π i / N} z) = e^{π i / N} σ (z)

for all

z \in C

, the function

σ : C \to C

commutes with

U_{g}^{*}

given by (15), that is,

U_{g}^{*} σ = σ U_{g}^{*}

. Therefore, we have

F (x) \approx \sum_{j = 1}^{J} c_{j} σ (U_{g}^{*} B_{j} U_{g} x + b_{j} U_{g}^{*} 1_{N}) = \sum_{j = 1}^{J} c_{j} σ (A_{j} x + b_{j} v), x \in C^{N},

where

A_{j} : = U_{g}^{*} B_{j} U_{g} \in span {ρ (k, ℓ) : (k, ℓ) \in Λ}

by (16), and the vector

v : = U_{g}^{*} 1_{N} \in C^{N}

satisfies

ρ (k, ℓ) v = ρ (k, ℓ) U_{g}^{*} 1_{N} \overset{(4)}{=} U_{g}^{*} L_{λ} 1_{N} = U_{g}^{*} 1_{N} = v, (k, ℓ) \in Λ .

(17)

Finally, we note that for any

(k, ℓ) \in Λ

,

\begin{matrix} ρ (k, ℓ) (\sum_{j = 1}^{J} c_{j} σ (A_{j} x + b_{j} v)) & = \sum_{j = 1}^{J} c_{j} ρ (k, ℓ) σ (A_{j} x + b_{j} v) \\ = \sum_{j = 1}^{J} c_{j} σ (ρ (k, ℓ) A_{j} x + b_{j} ρ (k, ℓ) v) \\ = \sum_{j = 1}^{J} c_{j} σ (A_{j} ρ (k, ℓ) x + b_{j} v), \end{matrix}

where we used that

ρ (k, ℓ)

is a linear (unitary) operator commuting with

σ

, and that

A_{j} \in C (ρ, Λ)

by (16) and

ρ (k, ℓ) v = v

by (17). Therefore, every map of the form

x \mapsto \sum_{j = 1}^{J} c_{j} σ (A_{j} x + b_{j} v)

is

(ρ, Λ)

-equivariant. □

Remark 5.

The proof relies on observing (16) and choosing

g \in C^{N}

such that

U_{g}^{*} σ = σ U_{g}^{*}

. To obtain

U_{g}^{*} σ = σ U_{g}^{*}

, we have chosen

g \in C^{N}

so that

U_{g}^{*}

is a diagonal matrix with exponential entries, and required an appropriate phase-homogeneity on σ so that σ commutes with those exponentials. This technique does not work for

Λ_{\infty}

because

U_{g}^{*}

cannot be expressed as a diagonal matrix for any

g \in C^{N}

in that case.

Example 1.

Let

N = 4

and

s = 1

, so that

Λ = Λ_{1} = {(0, 0), (1, 1), (2, 2), (3, 3)} \subset Z_{4} \times Z_{4}

. In this case, we have

ω = e^{2 π i / 4} = i

,

ω_{0} = e^{π i / 4} = \frac{1}{\sqrt{2}} (1 + i)

, and

ρ (k, ℓ) = ω_{0}^{- k ℓ} M^{ℓ} T^{k}

. Then,

\begin{matrix} ρ (0, 0) = I_{4}, \\ ρ (1, 1) = ω_{0}^{- 1} M T = ω_{0}^{- 1} [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & - i \end{matrix}] [\begin{matrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}] = ω_{0}^{- 1} [\begin{matrix} 0 & 0 & 0 & 1 \\ i & 0 & 0 & 0 \\ 0 & - 1 & 0 & 0 \\ 0 & 0 & - i & 0 \end{matrix}], \\ ρ (2, 2) = ω_{0}^{- 4} M^{2} T^{2} = - [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & - 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & - 1 \end{matrix}] [\begin{matrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}] = [\begin{matrix} 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & 1 \\ - 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}], \\ ρ (3, 3) = ω_{0}^{- 9} M^{2} T^{2} = ω_{0}^{- 1} [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & - i & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & i \end{matrix}] [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{matrix}] = ω_{0}^{- 1} [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & - i & 0 \\ 0 & 0 & 0 & - 1 \\ i & 0 & 0 & 0 \end{matrix}], \end{matrix}

and

ρ (k, k) ρ (k^{'}, k^{'}) = ρ (k + k^{'}, k + k^{'})

for all

k, k^{'} = 0, 1, 2, 3

. With

g = {(1, 0, 0, 0)}^{T}

, we have

\begin{matrix} U_{g}^{*} = [\begin{matrix} ρ (0, 0) g & ρ (1, 1) g & ρ (2, 2) g & ρ (3, 3) g \end{matrix}] = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & ω_{0} & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & ω_{0} \end{matrix}], \\ v : = U_{g}^{*} 1_{4} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & ω_{0} & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & ω_{0} \end{matrix}] [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \end{matrix}] = [\begin{matrix} 1 \\ ω_{0} \\ - 1 \\ ω_{0} \end{matrix}] . \end{matrix}

It is easy to check that v is invariant under

ρ (0, 0)

,

ρ (1, 1)

,

ρ (2, 2)

,

ρ (3, 3)

; that is,

ρ (k, k) v = v

for all

k = 0, 1, 2, 3

. Theorem 5 shows that any Λ-equivariant map

F : C^{4} \to C^{4}

can be approximated (uniformly on compact sets) by functions of the form

\begin{matrix} x \mapsto \sum_{m = 1}^{M} c_{m} σ (A_{m} x + b_{m} v), \end{matrix}

where

A_{m} \in span {ρ (k, k) : k = 0, 1, 2, 3}

and

b_{m} \in C

for

m = 1, \dots, M

. It is worth noting that while ρ is a unitary group representation of

Λ = {(0, 0), (1, 1), (2, 2), (3, 3)}

on

C^{4}

, the map

{π |}_{Λ}

given by

π (k, ℓ) = M^{ℓ} T^{k}

for

(k, ℓ) \in Λ

is not a group representation of Λ on

C^{4}

, since

π (1, 1) π (1, 1) = (- i) π (2, 2)

by (8).

4. Discussion

In this paper, we used finite-dimensional time-frequency analysis to investigate the properties of time-frequency shift equivariant maps that are generally nonlinear.

First, we established a one-to-one correspondence between

Λ

-equivariant maps and certain phase-homogeneous functions, accompanied by a reconstruction formula expressing

Λ

-equivariant maps in terms of these functions. This deepens our understanding of the structure of

Λ

-equivariant maps by connecting them to their corresponding phase-homogeneous functions.

Next, we considered the approximation of

Λ

-equivariant maps by neural networks. When

Λ

is a cyclic subgroup of order N in

Z_{N} \times Z_{N}

, we proved that every

Λ

-equivariant map can be approximated by a shallow neural network with affine linear maps formed as linear combinations of time-frequency shifts by

Λ

. For the subgroup

Λ = ⟨ (1, 0) ⟩ = {(0, 0), (1, 0), \dots, (N - 1, 0)}

, the

Λ

-equivariance corresponds to translation equivariance, and our result shows that every translation equivariant map can be approximated by a shallow convolutional neural network, which aligns well with the established effectiveness of convolutional neural networks (CNNs) for applications involving translation equivariance. In this context, our result extends the approximation of translation equivariant maps to general

Λ

-equivariant maps, with potential applications in signal processing.

Finally, we note that the tools used to prove the approximation result (Theorem 2) are applicable in a more general setting than the one described in Section 3.3. In particular, Definitions 6 and 7, and Proposition 4 apply to general unitary representations of arbitrary groups. Therefore, our approach can be adapted to derive similar results for general group-equivariant maps, which we leave as a direction for future research.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00275360).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The author would like to thank Andrei Caragea, Johannes Maly, Goetz Pfander, and Felix Voigtlaender for their valuable discussions during the early stages of this paper.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. A Proof of the Fact That |Λ°| = L²/|Λ| for Any Subgroup Λ of ℤ_L × ℤ_L

For finite abelian groups, it is known (see Lemma 4.2 of [25]) that the adjoint

Λ^{\circ}

of a subgroup

Λ \subset G \times \hat{G}

is the symplectic analogue of the dual subgroup

Λ^{⊥}

, in the sense that

Λ^{\circ} = J Λ^{⊥}

, where

J = (\begin{matrix} 0 & I_{| G |} \\ - I_{| G |} & 0 \end{matrix}) .

(In fact, a similar characterization is known for locally compact Abelian groups; see, e.g., Lemma 3.5.9 and Lemma 7.7.3 of [36]. In particular, for separable subgroups

Λ = Λ_{1} \times Λ_{2} < G \times \hat{G}

, we have

Λ^{\circ} = Λ_{2}^{⊥} \times Λ_{1}^{⊥}

while

Λ^{⊥} = Λ_{1}^{⊥} \times Λ_{2}^{⊥}

.) This implies that

Λ^{\circ}

has the same cardinality as

Λ^{⊥}

.

Here, the dual (annihilator)

H^{⊥}

of a subgroup H of

G

is defined as

H^{⊥} = {m \in \hat{G} : ⟨ m, n ⟩ = 1 for all n \in H},

where

⟨ m, n ⟩ = e^{2 π i (m_{1} n_{1} / N_{1} + \dots + m_{d} n_{d} / N_{d})}

for

m = (m_{1}, \dots, m_{d})

,

n = (n_{1}, \dots, n_{d})

, if

G = Z_{N_{1}} \times \dots \times Z_{N_{d}}

. It is easily seen that

| H^{⊥} | \cdot | H | = | G |

, for instance, by taking

x = 1^{G} = (1, 1, \dots, 1)

in Theorem 6.3 of [21],

| H^{⊥} | \cdot \sum_{h \in H} x (h) = \sum_{m \in H^{⊥}} \hat{x} (m), x \in C^{G} .

Therefore, we have

| Λ^{\circ} | = | Λ^{⊥} | = | Z_{L} \times Z_{L} | / | Λ | = L^{2} / | Λ |

.

References

LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, USA, 8–13 December 2014; Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
Hershey, S.; Chaudhuri, S.; Ellis, D.P.W.; Gemmeke, J.F.; Jansen, A.; Moore, R.C.; Plakal, M.; Platt, D.; Saurous, R.A.; Seybold, B.; et al. CNN Architectures for Large-Scale Audio Classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 131–135. [Google Scholar] [CrossRef]
Oppenheim, A.; Schafer, R. Discrete-Time Signal Processing, 3rd ed.; Pearson: Upper Saddle River, NJ, USA, 2010. [Google Scholar]
Walnut, D. An Introduction to Wavelet Analysis; Birkhäuser: Boston, MA, USA, 2002. [Google Scholar]
Boggess, A.; Narcowich, F.J. A First Course in Wavelets with Fourier Analysis, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
Cohen, T.; Welling, M. Group Equivariant Convolutional Networks. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 2990–2999. [Google Scholar]
Cohen, T.; Geiger, M.; Weiler, M. A general theory of equivariant CNNs on homogeneous spaces. arXiv 2018, arXiv:1811.02017. [Google Scholar]
Yarotsky, D. Universal approximations of invariant maps by neural networks. Constr. Approx. 2022, 55, 407–474. [Google Scholar] [CrossRef]
Cahill, J.; Iverson, J.W.; Mixon, D.G.; Packer, D. Group-invariant max filtering. arXiv 2022, arXiv:2205.14039. [Google Scholar] [CrossRef]
Balan, R.; Tsoukanis, E. G-invariant representations using coorbits: Bi-lipschitz properties. arXiv 2023, arXiv:2308.11784. [Google Scholar]
Balan, R.; Tsoukanis, E. G-invariant representations using coorbits: Injectivity properties. arXiv 2023, arXiv:2310.16365. [Google Scholar]
Huang, N.; Levie, R.; Villar, S. Approximately equivariant graph networks. arXiv 2023, arXiv:2308.10436. [Google Scholar]
Blum-Smith, B.; Villar, S. Machine learning and invariant theory. arXiv 2022, arXiv:2209.14991. [Google Scholar] [CrossRef]
Wang, R.; Walters, R.; Yu, R. Data augmentation vs. equivariant networks: A theory of generalization on dynamics forecasting. arXiv 2022, arXiv:2206.09450. [Google Scholar]
Pfander, G.E. Gabor frames in finite dimensions. In Finite Frames; Casazza, P.G., Kutyniok, G., Eds.; Applied and Numerical Harmonic Analysis; Birkhäuser: Boston, MA, USA, 2013; pp. 193–239. [Google Scholar]
Kaplan, A.; Lee, D.G.; Pfander, G.E.; Pohl, V. Sparse deterministic and stochastic channels: Identification of spreading functions and covariances. In Compressed Sensing in Information Processing; Kutyniok, G., Rauhut, H., Kunsch, R.J., Eds.; Springer International Publishing: New York, NY, USA, 2022; pp. 105–144. [Google Scholar]
Gröchenig, K. Foundations of Time-Frequency Analysis; Applied and Numerical Harmonic Analysis; Birkhäuser: Boston, MA, USA, 2001. [Google Scholar] [CrossRef]
Lawrence, J.; Pfander, G.E.; Walnut, D. Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl. 2005, 11, 715–726. [Google Scholar] [CrossRef]
Feichtinger, H.G.; Kozek, W.; Luef, F. Gabor analysis over finite Abelian groups. Appl. Comput. Harmon. Anal. 2009, 26, 230–248. [Google Scholar] [CrossRef]
Christensen, O. An Introduction to Frames and Riesz Bases, 2nd ed.; Birkhäuser: New York, NY, USA, 2016. [Google Scholar]
Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 1991, 4, 251–257. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
Leshno, M.; Lin, V.; Pinkus, A.; Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 1993, 6, 861–867. [Google Scholar] [CrossRef]
Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195. [Google Scholar] [CrossRef]
Voigtlaender, F. The universal approximation theorem for complex-valued neural networks. Appl. Comput. Harmon. Anal. 2023, 64, 33–61. [Google Scholar] [CrossRef]
Mhaskar, H.N. Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 1996, 8, 164–177. [Google Scholar] [CrossRef]
Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Netw. 2017, 94, 103–114. [Google Scholar] [CrossRef] [PubMed]
Caragea, A.; Lee, D.G.; Maly, J.; Pfander, G.E.; Voigtlaender, F. Quantitative approximation results for complex-valued neural networks. SIAM J. Math. Data Sci. 2022, 4, 553–580. [Google Scholar] [CrossRef]
Feichtinger, H.G.; Strohmer, T. (Eds.) Gabor Analysis and Algorithms; Birkhäuser: Boston, MA, USA, 1998. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Approximation of Time-Frequency Shift Equivariant Maps by Neural Networks

Abstract

1. Introduction

Organization of the Paper

2. Time-Frequency Shift Equivariant Maps

2.1. Time-Frequency Shift Operators

2.2. $Λ$ -Equivariant Maps

3. Approximation of $Λ$ -Equivariant Maps

3.1. Embedding of $Λ$ into the Weyl–Heisenberg Group

3.2. Group Representations and Neural Networks

3.3. Cyclic Subgroups $Λ$ of $Z_{N} \times Z_{N}$

4. Discussion

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. A Proof of the Fact That |Λ°| = L²/|Λ| for Any Subgroup Λ of ℤ_L × ℤ_L

References

Article Metrics

Citations

Article Access Statistics

Approximation of Time-Frequency Shift Equivariant Maps by Neural Networks

Abstract

1. Introduction

Organization of the Paper

2. Time-Frequency Shift Equivariant Maps

2.1. Time-Frequency Shift Operators

2.2. Λ -Equivariant Maps

3. Approximation of Λ -Equivariant Maps

3.1. Embedding of Λ into the Weyl–Heisenberg Group

3.2. Group Representations and Neural Networks

3.3. Cyclic Subgroups Λ of Z N × Z N

4. Discussion

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. A Proof of the Fact That |Λ°| = L2/|Λ| for Any Subgroup Λ of ℤL × ℤL

References

Article Metrics

Citations

Article Access Statistics

2.2. $Λ$ -Equivariant Maps

3. Approximation of $Λ$ -Equivariant Maps

3.1. Embedding of $Λ$ into the Weyl–Heisenberg Group

3.3. Cyclic Subgroups $Λ$ of $Z_{N} \times Z_{N}$

Appendix A. A Proof of the Fact That |Λ°| = L²/|Λ| for Any Subgroup Λ of ℤ_L × ℤ_L