1. Introduction
Over the past decade, machine learning techniques based on deep neural networks, commonly referred to as deep learning [1], have achieved significant breakthroughs across a wide range of fields, including image recognition [2,3], speech recognition [4], language translation [5,6], and game playing [7], among others. These advancements are largely driven by the availability of increasingly large training datasets and greater computational resources. Another important factor is the development of specialized neural network architectures, including convolutional neural networks [2], residual networks [3], recurrent networks (notably LSTMs [5]), and transformer networks [6].
A common theme in the design of neural network architectures is the necessity to respect the symmetries inherent in the task at hand. For instance, in image classification, the classification result should remain invariant under small translations of the input image, making convolutional neural networks a suitable choice. Likewise, in audio classification [8], the classification result should be invariant to shifts in time or changes in pitch. In principle, a fully connected neural network can learn to respect such symmetries provided that sufficient training data are available. Nevertheless, architectures that are inherently aligned with these symmetries tend to generalize better and thus achieve better performance.
In mathematical terms, symmetries can be expressed as follows. Let $V$ be a vector space and let $\mathrm{GL}(V)$ be the general linear group of $V$. For a group $G$ and a map $\rho : G \to \mathrm{GL}(V)$, we say that a map $F : V \to V$ is equivariant under group actions of $G$ (or simply $G$-equivariant) if $F(\rho(g)\,x) = \rho(g)\,F(x)$ for all $g \in G$ and $x \in V$, and invariant under group actions of $G$ (or simply $G$-invariant) if $F(\rho(g)\,x) = F(x)$ for all $g \in G$ and $x \in V$. We will be focusing on the case where $V$ is a Hilbert space and $\rho(g)$ is a unitary operator for all $g \in G$. (A Hilbert space is a vector space equipped with an inner product that induces a distance function, making it a complete metric space. Examples of Hilbert spaces include $\mathbb{C}^N$ and $L^2(\mathbb{R})$, and Hilbert spaces are often regarded as natural generalizations of signal spaces.)
A particularly important and well-studied example of equivariance involves translations. It is well known that translation-equivariant linear operators are exactly the convolution operators (see, e.g., Section 2.3 of [9], Theorem 4.12 of [10], and Theorem 2.17 of [11]), and that convolutional neural networks (CNNs) are well-suited for approximating these operators. As a natural generalization of CNNs, Cohen and Welling [12] introduced the so-called group equivariant convolutional neural networks (GCNNs), which can handle more general symmetry groups than just translations. Later, Cohen et al. [13] developed a general framework for GCNNs on homogeneous spaces, and Yarotsky [14] investigated the approximation of equivariant operators using equivariant neural networks. More recently, Cahill et al. [15] introduced the so-called group-invariant max filters, which are particularly useful for classification tasks involving symmetries, and Balan and Tsoukanis [16,17] constructed stable embeddings of the quotient space modulo the group action, yielding group-invariant representations via coorbits. Further advances include the work of Huang et al. [18], who designed approximately group-equivariant graph neural networks by focusing on active symmetries, and Blum-Smith and Villar [19], who introduced a method for parameterizing invariant and equivariant functions based on invariant theory. In addition, Wang et al. [20] provided a theoretical analysis of data augmentation and equivariant neural networks applied to non-stationary dynamics forecasting.
In this paper, we are particularly interested in the setting of finite-dimensional time-frequency analysis, which provides a versatile framework for a wide range of signal processing applications; see, e.g., [21,22]. It is known that every linear map from $\mathbb{C}^N$ to $\mathbb{C}^N$ can be expressed as a linear combination of compositions of translations and modulations (see (3) below). We consider maps $F : \mathbb{C}^N \to \mathbb{C}^N$ that are generally nonlinear and are $\Lambda$-equivariant for a given subgroup $\Lambda$ of $\mathbb{Z}_N \times \mathbb{Z}_N$, that is, $F(\pi(\lambda)\,x) = \pi(\lambda)\,F(x)$ for all $\lambda \in \Lambda$ and $x \in \mathbb{C}^N$. Here, $\pi(k,\ell) = M^{\ell} T^{k}$ represents the time-frequency shift by $(k,\ell) \in \mathbb{Z}_N \times \mathbb{Z}_N$, where $T$ and $M$ are the translation and modulation operators defined as $(Tx)(n) = x(n-1)$ and $(Mx)(n) = e^{2\pi i n/N}\,x(n)$, $n \in \mathbb{Z}_N$, for $x \in \mathbb{C}^N$, respectively (see Section 2.1 for further details). For any $x \in \mathbb{C}^N$ and any nonzero $g \in \mathbb{C}^N$, we define $V_g x$ by $(V_g x)(k,\ell) = \langle x,\, \pi(k,\ell)\,g\rangle$, $(k,\ell) \in \mathbb{Z}_N \times \mathbb{Z}_N$. For any $\Omega \subseteq \mathbb{C}$, we say that a function $f : \mathbb{C}^{d} \to \mathbb{C}^{d'}$ is $\Omega$-phase homogeneous if $f(\omega x) = \omega\, f(x)$ for all $\omega \in \Omega$ and $x \in \mathbb{C}^{d}$.
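To make these conventions concrete, the following NumPy sketch implements the translation, modulation, and time-frequency shift operators and checks the basic commutation relation numerically. The definitions $(Tx)(n) = x(n-1)$ and $(Mx)(n) = e^{2\pi i n/N}x(n)$ used below are the standard ones assumed in this reconstruction.

```python
import numpy as np

N = 8
omega = np.exp(2j * np.pi / N)

def T(x, k=1):
    # cyclic translation: (T^k x)(n) = x(n - k mod N)
    return np.roll(x, k)

def M(x, l=1):
    # modulation: (M^l x)(n) = e^{2 pi i l n / N} x(n)
    return omega ** (l * np.arange(len(x))) * x

def tf_shift(x, k, l):
    # time-frequency shift pi(k, l) = M^l T^k
    return M(T(x, k), l)

rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
k, l = 3, 5

# non-commutativity: T^k M^l = e^{-2 pi i k l / N} M^l T^k
assert np.allclose(T(M(x, l), k), omega ** (-k * l) * M(T(x, k), l))

# pi(k, l) is unitary, so it preserves the Euclidean norm
assert np.isclose(np.linalg.norm(tf_shift(x, k, l)), np.linalg.norm(x))
```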
We first address the properties of a mapping from the space of $\Lambda$-equivariant functions on $\mathbb{C}^N$ to the space of certain phase-homogeneous functions.
Theorem 1 (see Theorem 3 below). Assume that $\mathbb{C}^N = \operatorname{span}\{\pi(\lambda)\,g : \lambda \in \Lambda\}$ for some subgroup $\Lambda$ of $\mathbb{Z}_N \times \mathbb{Z}_N$ and some vector $g \in \mathbb{C}^N$. Then, the mapping $F \mapsto \Phi_F := \langle F(\cdot),\, g\rangle$ is an injective map from the space of $\Lambda$-equivariant functions $F : \mathbb{C}^N \to \mathbb{C}^N$ to the space of $\Omega$-phase homogeneous functions $\Phi : \mathbb{C}^N \to \mathbb{C}$, where $\Omega = \{e^{2\pi i(k\ell' - k'\ell)/N} : (k,\ell), (k',\ell') \in \Lambda\}$. Moreover, if $\{h_\lambda\}_{\lambda\in\Lambda}$ is a dual frame of $\{\pi(\lambda)g\}_{\lambda\in\Lambda}$ in $\mathbb{C}^N$, then a $\Lambda$-equivariant function $F$ can be expressed as
$$F(x) \;=\; \sum_{\lambda\in\Lambda} \Phi_F\bigl(\pi(\lambda)^{-1}x\bigr)\, h_\lambda, \qquad x\in\mathbb{C}^N.$$
If $\{\pi(\lambda)g\}_{\lambda\in\Lambda}$ is a basis of $\mathbb{C}^N$, then the mapping $F \mapsto \Phi_F$ is a bijective map from the space of $\Lambda$-equivariant functions $F : \mathbb{C}^N \to \mathbb{C}^N$ to the space of $\Omega$-phase homogeneous functions $\Phi : \mathbb{C}^N \to \mathbb{C}$. We then consider the approximation of $\Lambda$-equivariant maps. In particular, we show that if $\Lambda$ is a cyclic subgroup of order $N$ in $\mathbb{Z}_N \times \mathbb{Z}_N$, then every $\Lambda$-equivariant map can be easily approximated by a shallow neural network whose affine linear maps consist of linear combinations of the time-frequency shifts $\pi(\lambda)$, $\lambda\in\Lambda$.
Theorem 2 (see Theorem 5 below). Assume that $\sigma : \mathbb{C} \to \mathbb{C}$ is shallow universal and satisfies $\sigma(\omega z) = \omega\,\sigma(z)$ for all $z \in \mathbb{C}$, where $\omega$ is a suitable root of unity specified in Section 3.3. Let $\Lambda = \langle \lambda_0 \rangle$ be a cyclic subgroup of order $N$ in $\mathbb{Z}_N \times \mathbb{Z}_N$ for some $\lambda_0 \in \mathbb{Z}_N \times \mathbb{Z}_N$. Then, any continuous $\Lambda$-equivariant map $F : \mathbb{C}^N \to \mathbb{C}^N$ can be approximated (uniformly on compact sets) by a shallow neural network
$$\Psi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(A_j x + b_j v\bigr),$$
where $A_j \in \operatorname{span}\{\pi(\lambda) : \lambda\in\Lambda\}$, $b_j, c_j \in \mathbb{C}$ for $j = 1,\dots,J$, and $v \in \mathbb{C}^N$ satisfies the invariance condition stated in Theorem 5. Moreover, every map of this form is $\Lambda$-equivariant. In the case $\Lambda = \mathbb{Z}_N \times \{0\}$, i.e., $\Lambda = \{(k,0) : k\in\mathbb{Z}_N\}$, the $\Lambda$-equivariant maps $F : \mathbb{C}^N \to \mathbb{C}^N$ are precisely those that are translation equivariant, meaning that $F(T^k x) = T^k F(x)$ for all $k \in \mathbb{Z}_N$ and $x \in \mathbb{C}^N$. Furthermore, if F is linear, then F is just a convolutional map, which can be expressed as a linear combination of $T^k$, $k\in\mathbb{Z}_N$, or simply as an $N \times N$ circulant matrix. If F is nonlinear, then Theorem 2 shows that F can be approximated by a shallow neural network whose affine linear maps are convolutional maps, i.e., by a shallow convolutional neural network. This agrees with the well-established fact that convolutional neural networks (CNNs) are particularly well-suited for applications involving translation equivariance, especially in image and signal processing.
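The linear case just described is easy to verify computationally: a linear combination of the shift matrices $T^k$ is a circulant matrix, acts by circular convolution, and commutes with translations. The following snippet is a minimal sketch for intuition only.

```python
import numpy as np

N = 6
rng = np.random.default_rng(1)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # filter taps
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# C = sum_k a_k T^k is a circulant matrix (T = cyclic shift matrix).
T1 = np.roll(np.eye(N), 1, axis=0)  # matrix of (T x)(n) = x(n-1)
C = sum(a[k] * np.linalg.matrix_power(T1, k) for k in range(N))

# C acts by circular convolution ...
assert np.allclose(C @ x, np.fft.ifft(np.fft.fft(a) * np.fft.fft(x)))
# ... and commutes with translation, i.e., C is translation equivariant.
assert np.allclose(C @ np.roll(x, 1), np.roll(C @ x, 1))
```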
  Organization of the Paper
In Section 2, we begin by reviewing some basic properties of time-frequency shift operators, followed by a discussion on time-frequency group equivariant maps, and then prove our first main result, Theorem 1, which establishes a 1:1 correspondence between $\Lambda$-equivariant maps and certain phase-homogeneous functions. Section 3 is devoted to the approximation of $\Lambda$-equivariant maps. We first discuss the embedding of $\mathbb{Z}_N \times \mathbb{Z}_N$ into the Weyl–Heisenberg group, which allows for the use of tools from group representation theory. (The finite Weyl–Heisenberg group $\mathbb{H}$ is the set $\mathbb{Z}_N \times \mathbb{Z}_N \times \mathbb{Z}_N$ equipped with the group operation $(k,\ell,m)\,(k',\ell',m') = (k+k',\ \ell+\ell',\ m+m'-k\ell')$. The noncommutativity of $\mathbb{H}$ plays an important role in finite-dimensional time-frequency analysis; see, e.g., [21,23].) After reviewing key concepts from group representation theory, we consider the case of cyclic subgroups of $\mathbb{Z}_N \times \mathbb{Z}_N$, where group representations can be defined directly without embedding into the Weyl–Heisenberg group. Section 3 concludes with the proof of our second main result, Theorem 2, which establishes the approximation of $\Lambda$-equivariant maps by a shallow neural network whose affine linear maps consist of linear combinations of time-frequency shifts by $\lambda \in \Lambda$.
  3. Approximation of Λ-Equivariant Maps
In this section, we consider the approximation of continuous $\Lambda$-equivariant maps $F : \mathbb{C}^N \to \mathbb{C}^N$ that are generally nonlinear, where $\Lambda$ is a subgroup of $\mathbb{Z}_N \times \mathbb{Z}_N$ and the $\Lambda$-equivariance is defined by (5). For instance, the map illustrated in the sketch below is a nonlinear continuous $\Lambda$-equivariant map.
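As a hedged stand-in for the example (the concrete map used in the original text is not fully specified here), consider $F(x) = \|x\|\,x$: it is continuous, nonlinear, and $\Lambda$-equivariant for every subgroup $\Lambda$, because each time-frequency shift is unitary and hence norm-preserving.

```python
import numpy as np

N = 8
omega = np.exp(2j * np.pi / N)

def tf_shift(x, k, l):
    # pi(k, l) = M^l T^k, with (Tx)(n) = x(n-1) and (Mx)(n) = e^{2 pi i n/N} x(n)
    return omega ** (l * np.arange(N)) * np.roll(x, k)

# F(x) = ||x|| x; since ||pi(k,l) x|| = ||x||, we get F(pi(k,l) x) = pi(k,l) F(x)
F = lambda x: np.linalg.norm(x) * x

rng = np.random.default_rng(5)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(F(tf_shift(x, 2, 3)), tf_shift(F(x), 2, 3))
```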
As seen in Section 2.2 (particularly in Theorem 3 and its proof), working with the time-frequency shift operators $\pi(\lambda)$, $\lambda \in \Lambda$, usually requires careful bookkeeping of extra multiplicative phase factors due to the non-commutativity of T and M. (The non-commutativity of T and M can often be frustrating. However, it is precisely this non-commutativity that has given rise to the deep and rich theory of time-frequency analysis [23].) In fact, the map
$$\pi : \mathbb{Z}_N \times \mathbb{Z}_N \to \mathcal{U}(\mathbb{C}^N), \qquad (k,\ell) \mapsto M^{\ell}T^{k},$$
is generally not a group homomorphism; indeed,
$$\pi(k,\ell)\,\pi(k',\ell') \;=\; e^{-2\pi i k\ell'/N}\,\pi(k+k',\,\ell+\ell')$$
is equal to $\pi(k+k',\,\ell+\ell')$ only if $k\ell'$ is a multiple of N (see Proposition 1). (Although $\pi$ is not a group homomorphism and thus not a group representation, it is often referred to as a projective group representation of $\mathbb{Z}_N \times \mathbb{Z}_N$ on $\mathbb{C}^N$. In general, a map $\rho : G \to \mathcal{U}(\mathcal{H})$ is called a projective group representation of G on $\mathcal{H}$ if for each pair of $g, g' \in G$, there exists a unimodular constant $c(g,g') \in \mathbb{C}$ such that $\rho(g)\,\rho(g') = c(g,g')\,\rho(gg')$; see, e.g., [25].) Obviously, the computations involved would be simplified significantly if $\pi$ were a group homomorphism in general. Note that, as mentioned in Section 1, a group homomorphism $\rho : G \to \mathcal{U}(\mathcal{H})$ whose images are unitary operators on $\mathcal{H}$ is called a (unitary) group representation of G on $\mathcal{H}$, where G is a group and $\mathcal{H}$ is a separable Hilbert space. Therefore, the map $\pi$ would be a unitary representation if it were a group homomorphism.
In the following, we first discuss a systematic method of avoiding such extra multiplicative phase factors by embedding $\mathbb{Z}_N \times \mathbb{Z}_N$ into the Weyl–Heisenberg group. After briefly reviewing essential concepts on group representations and neural networks, we consider cyclic subgroups of $\mathbb{Z}_N \times \mathbb{Z}_N$, in which case the map $\pi$ can be replaced by a unitary group representation. We show that if $\Lambda$ is a cyclic subgroup of $\mathbb{Z}_N \times \mathbb{Z}_N$, then any $\Lambda$-equivariant map $F : \mathbb{C}^N \to \mathbb{C}^N$ can be approximated with shallow neural networks involving the adjoint group $\Lambda^{\circ}$, which have significantly fewer degrees of freedom compared with standard shallow neural networks.
  3.1. Embedding of $\mathbb{Z}_N \times \mathbb{Z}_N$ into the Weyl–Heisenberg Group
To avoid the bookkeeping of extra multiplicative phase factors, we can simply embed the subgroups of $\mathbb{Z}_N \times \mathbb{Z}_N$ into the finite Weyl–Heisenberg group $\mathbb{H} = \mathbb{Z}_N \times \mathbb{Z}_N \times \mathbb{Z}_N$, on which group representations can be defined. There exists a group representation $\rho : \mathbb{H} \to \mathcal{U}(\mathbb{C}^N)$, known as the Schrödinger representation, which satisfies $\rho(k,\ell,m) = e^{2\pi i m/N}\,M^{\ell}T^{k}$ for all $(k,\ell,m) \in \mathbb{H}$. In fact, for any subgroup $\Lambda$ of $\mathbb{Z}_N \times \mathbb{Z}_N$ and any subgroup G of $\mathbb{H}$ containing $\Lambda \times \{0\}$, the map
$$\rho(k,\ell,m) \;=\; e^{2\pi i m/N}\,M^{\ell}T^{k}, \qquad (k,\ell,m) \in G,$$
is a group representation of G on $\mathbb{C}^N$, with the group operation on G given by
$$(k,\ell,m)\,(k',\ell',m') \;=\; \bigl(k+k',\; \ell+\ell',\; m+m'-k\ell'\bigr).$$
Clearly, we have $\rho(k,\ell,0) = \pi(k,\ell)$ for all $(k,\ell) \in \mathbb{Z}_N \times \mathbb{Z}_N$.
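A quick numerical sanity check of this embedding: assuming the conventions $\pi(k,\ell) = M^{\ell}T^{k}$ and the group law written above (one common convention, and an assumption of this sketch), the map $\rho(k,\ell,m) = e^{2\pi i m/N}\pi(k,\ell)$ is a genuine homomorphism with no leftover phase factors.

```python
import numpy as np

N = 7
omega = np.exp(2j * np.pi / N)

def pi(k, l):
    # matrix of the time-frequency shift pi(k, l) = M^l T^k on C^N
    Tk = np.roll(np.eye(N), k % N, axis=0)
    Ml = np.diag(omega ** (l * np.arange(N)))
    return Ml @ Tk

def rho(k, l, m):
    # Schrodinger-type representation: rho(k, l, m) = e^{2 pi i m/N} pi(k, l)
    return omega ** m * pi(k, l)

def mul(g, h):
    # assumed Heisenberg law, matching pi(k,l) pi(k',l') = w^{-k l'} pi(k+k', l+l')
    (k, l, m), (kp, lp, mp) = g, h
    return ((k + kp) % N, (l + lp) % N, (m + mp - k * lp) % N)

g, h = (2, 5, 1), (3, 4, 6)
assert np.allclose(rho(*g) @ rho(*h), rho(*mul(g, h)))  # homomorphism property
```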
It is clear that a map $F : \mathbb{C}^N \to \mathbb{C}^N$ is $\Lambda$-equivariant in the sense of (5) if and only if it is $\rho$-equivariant in the sense of Definition 4. Moreover, in this case, Proposition 2 implies that F is $\Omega$-phase homogeneous, which is equivalent to $F(e^{2\pi i m/N}\,x) = e^{2\pi i m/N}\,F(x)$ for all $m \in \mathbb{Z}_N$ and $x \in \mathbb{C}^N$. Consequently, we have the following proposition.
Proposition 3. For any subgroup Λ of $\mathbb{Z}_N \times \mathbb{Z}_N$ and any $F : \mathbb{C}^N \to \mathbb{C}^N$, the following are equivalent.
- (i) F is Λ-equivariant;
- (ii) F is $\rho|_{\Lambda\times\{0\}}$-equivariant;
- (iii) F is $\rho|_{\Lambda\times\mathbb{Z}_N}$-equivariant.
 Using the true group representation $\rho$ instead of $\pi$ allows us to avoid the tedious bookkeeping of extra multiplicative phase factors. Note, however, that $\rho$ requires three input parameters, while $\pi$ involves only two. In fact, the description of the extra phase factors is simply transferred to the third parameter of $\rho$. Nevertheless, an important advantage of using $\rho$ instead of $\pi$ is that it allows for the use of tools from group representation theory.
  3.2. Group Representations and Neural Networks
In this section, we review some concepts and tools from group representation theory and introduce the so-called ♮-transform and its inverse transform for later use. We also review the basic structure of neural networks and the universal approximation theorem.
We assume that G is a finite group, and consider maps of the form $F : \mathcal{H} \to \mathcal{H}$, where $\mathcal{H}$ is a finite-dimensional Hilbert space on which a unitary representation $\rho$ of G is defined. This means that for each $\xi \in G$, the map $\rho(\xi) : \mathcal{H} \to \mathcal{H}$ is a linear unitary operator, and that $\rho : G \to \mathcal{U}(\mathcal{H})$ is a group homomorphism, i.e., $\rho(\xi\xi') = \rho(\xi)\,\rho(\xi')$ for all $\xi, \xi' \in G$. Let us formally state the definition of equivariance and invariance in this setting.
Definition 4 (Equivariance and Invariance). For a group G and a unitary representation ρ of G on a Hilbert space $\mathcal{H}$, we say that a map $F : \mathcal{H} \to \mathcal{H}$ is
- ρ-equivariant if $F(\rho(\xi)\,x) = \rho(\xi)\,F(x)$ for all $\xi \in G$ and $x \in \mathcal{H}$;
- ρ-invariant if $F(\rho(\xi)\,x) = F(x)$ for all $\xi \in G$ and $x \in \mathcal{H}$.
 Note that a ρ-equivariant/invariant map $F : \mathcal{H} \to \mathcal{H}$ is not necessarily linear or bounded.
Definition 5. For a group G, the left translation of a vector $a \in \mathbb{C}^G$ by $h \in G$ is given by
$$(L_h a)(\xi) \;=\; a(h^{-1}\xi), \qquad \xi \in G.$$
 In fact, the map $h \mapsto L_h$ is a group homomorphism from G to $\mathcal{U}(\mathbb{C}^G)$, that is, $L_{hh'} = L_h\,L_{h'}$ for all $h, h' \in G$, and therefore, it induces a group representation of G on $\mathbb{C}^G$. We say that a map $\Phi : \mathbb{C}^G \to \mathbb{C}^G$ is left G-translation equivariant if $\Phi(L_h a) = L_h\,\Phi(a)$ for all $h \in G$ and $a \in \mathbb{C}^G$.
Definition 6. Let G be a group and let ρ be a unitary representation of G on a Hilbert space $\mathcal{H}$. Given a window $g \in \mathcal{H}$, the set $\{\rho(\xi)\,g : \xi \in G\}$ is called the orbit of g under ρ. The map $V_g : \mathcal{H} \to \mathbb{C}^G$ defined by
$$(V_g x)(\xi) \;=\; \langle x,\; \rho(\xi)\,g\rangle, \qquad \xi \in G,$$
is called the analysis operator of g, and its adjoint operator $V_g^{*} : \mathbb{C}^G \to \mathcal{H}$ given by
$$V_g^{*}\,a \;=\; \sum_{\xi\in G} a(\xi)\,\rho(\xi)\,g$$
is called the synthesis operator of g.
  We are particularly interested in the case where the orbit of g spans $\mathcal{H}$, that is, $\operatorname{span}\{\rho(\xi)g : \xi\in G\} = \mathcal{H}$. Since $\mathcal{H}$ is finite-dimensional, this implies that $\{\rho(\xi)g\}_{\xi\in G}$ is a frame for $\mathcal{H}$ and the associated frame operator $S := V_g^{*}V_g$ is a positive, self-adjoint bounded operator on $\mathcal{H}$. It follows from (10) that $S\,\rho(\xi) = \rho(\xi)\,S$ and thus $S^{-1}\rho(\xi) = \rho(\xi)\,S^{-1}$ for all $\xi \in G$. For any $x \in \mathcal{H}$, we have
$$x \;=\; S^{-1}S\,x \;=\; \sum_{\xi\in G}\langle x,\,\rho(\xi)g\rangle\,S^{-1}\rho(\xi)g \;=\; \sum_{\xi\in G}\langle x,\,\rho(\xi)g\rangle\,\rho(\xi)\,\tilde{g},$$
where $\tilde{g} := S^{-1}g$. This shows that $V_{\tilde{g}}^{*}\,V_g$ is the identity operator on $\mathcal{H}$, i.e.,
$$V_{\tilde{g}}^{*}\,V_g \;=\; \mathrm{Id}_{\mathcal{H}}, \tag{11}$$
and correspondingly, $\{\rho(\xi)\,\tilde{g}\}_{\xi\in G}$ is the canonical dual frame of $\{\rho(\xi)\,g\}_{\xi\in G}$.
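The computation above is easy to reproduce numerically. The sketch below takes the translation representation of $\mathbb{Z}_N$ on $\mathbb{C}^N$ as a toy example (an illustrative choice of group and window, not the paper's setting), forms the frame operator $S = V_g^{*}V_g$, and verifies reconstruction through the canonical dual frame.

```python
import numpy as np

# Toy example: G = Z_N acting on C^N by translations, rho(k) = T^k.
N = 5
rng = np.random.default_rng(2)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # window

V = np.stack([np.roll(g, k) for k in range(N)]).conj()     # (V x)(k) = <x, T^k g>
S = V.conj().T @ V                                         # frame operator S = V* V
g_dual = np.linalg.solve(S, g)                             # dual window S^{-1} g

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
coeffs = V @ x
x_rec = sum(coeffs[k] * np.roll(g_dual, k) for k in range(N))
assert np.allclose(x, x_rec)   # x = sum_k <x, rho(k) g> rho(k) g-tilde
```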
In light of (11), we newly introduce a transform which lifts a map $F : \mathcal{H} \to \mathcal{H}$ to a map $\Phi : \mathbb{C}^G \to \mathbb{C}^G$, and also its inverse transform.
Definition 7. Let G be a finite group and let ρ be a unitary representation of G on a finite-dimensional Hilbert space $\mathcal{H}$. Assume that $\operatorname{span}\{\rho(\xi)g : \xi\in G\} = \mathcal{H}$, and let $S = V_g^{*}V_g$ and $\tilde{g} = S^{-1}g$. For any map $F : \mathcal{H} \to \mathcal{H}$, the ♮-transform of F is defined by
$$F^{\natural} \;:=\; V_g \circ F \circ V_{\tilde{g}}^{*} \;:\; \mathbb{C}^G \to \mathbb{C}^G.$$
For any map $\Phi : \mathbb{C}^G \to \mathbb{C}^G$, the inverse ♮-transform of Φ is defined by
$$\Phi^{\flat} \;:=\; V_{\tilde{g}}^{*} \circ \Phi \circ V_g \;:\; \mathcal{H} \to \mathcal{H}.$$
 As shown in Figure 1, the ♮-transform converts a map $F : \mathcal{H} \to \mathcal{H}$ into a map $F^{\natural} : \mathbb{C}^G \to \mathbb{C}^G$, and the inverse ♮-transform converts a map $\Phi : \mathbb{C}^G \to \mathbb{C}^G$ into a map $\Phi^{\flat} : \mathcal{H} \to \mathcal{H}$.
Proposition 4. Let G be a finite group, and let ρ be a unitary representation of G on a finite-dimensional Hilbert space $\mathcal{H}$. Assume that $\operatorname{span}\{\rho(\xi)g : \xi\in G\} = \mathcal{H}$, and let $S = V_g^{*}V_g$ and $\tilde{g} = S^{-1}g$. Then, the following hold.
- (i) $(F^{\natural})^{\flat} = F$ for any map $F : \mathcal{H} \to \mathcal{H}$.
- (ii) A map $F : \mathcal{H} \to \mathcal{H}$ is continuous if and only if $F^{\natural}$ is continuous.
- (iii) A map $F : \mathcal{H} \to \mathcal{H}$ is ρ-equivariant if and only if $F^{\natural}$ is left G-translation equivariant.
 Proof.  (i) It follows from (11) that $(F^{\natural})^{\flat} = V_{\tilde{g}}^{*}V_g \circ F \circ V_{\tilde{g}}^{*}V_g = F$ for any map $F : \mathcal{H} \to \mathcal{H}$.
- (ii) Since the maps $V_g$ and $V_{\tilde{g}}^{*}$ are bounded linear operators, the continuity of F implies the continuity of $F^{\natural}$. Similarly, the continuity of Φ implies the continuity of $\Phi^{\flat}$.
- (iii) It follows from (10) that the G-equivariance of F implies the left G-translation equivariance of $F^{\natural}$. Similarly, the left G-translation equivariance of Φ implies the G-equivariance of $\Phi^{\flat}$.    □
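The sketch below realizes the ♮-transform and its inverse in the form reconstructed in Definition 7 ($F^{\natural} = V_g \circ F \circ V_{\tilde g}^{*}$ and $\Phi^{\flat} = V_{\tilde g}^{*} \circ \Phi \circ V_g$; this composition order is an assumption consistent with Proposition 4) and checks parts (i) and (iii) numerically on a toy nonlinear map.

```python
import numpy as np

# Toy realization with G = Z_N acting by translations on C^N.
N = 5
rng = np.random.default_rng(2)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

Vg = np.stack([np.roll(g, k) for k in range(N)]).conj()    # analysis with window g
S = Vg.conj().T @ Vg                                       # frame operator
gt = np.linalg.solve(S, g)                                 # dual window S^{-1} g
Vgt = np.stack([np.roll(gt, k) for k in range(N)]).conj()  # analysis with dual window

natural = lambda F: (lambda a: Vg @ F(Vgt.conj().T @ a))   # F  -> F^natural
flat = lambda Phi: (lambda x: Vgt.conj().T @ Phi(Vg @ x))  # Phi -> Phi^flat

F = lambda x: np.linalg.norm(x) * x   # nonlinear and rho-equivariant (rho unitary)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

assert np.allclose(flat(natural(F))(x), F(x))              # (i): (F^nat)^flat = F
a = Vg @ x                                                 # (iii): F^nat commutes
assert np.allclose(natural(F)(np.roll(a, 1)),              # with left translation
                   np.roll(natural(F)(a), 1))
```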
 We now provide a brief review of neural networks and the universal approximation theorem.
Let $\mathbb{F}$ be either $\mathbb{R}$ or $\mathbb{C}$. An activation function is a function $\sigma : \mathbb{F} \to \mathbb{F}$ that acts componentwise on vectors; that is, $\sigma(x) = (\sigma(x_1),\dots,\sigma(x_d))$ for any $x = (x_1,\dots,x_d) \in \mathbb{F}^d$.
A fully connected feedforward neural network with P hidden layers is given by
$$\Phi \;=\; A_{P+1}\circ\sigma\circ A_{P}\circ\cdots\circ\sigma\circ A_{1}, \tag{12}$$
where $A_p : \mathbb{F}^{d_{p-1}} \to \mathbb{F}^{d_p}$, $p = 1,\dots,P+1$, is affine-linear with $A_p(x) = W_p x + b_p$, $W_p \in \mathbb{F}^{d_p\times d_{p-1}}$ and $b_p \in \mathbb{F}^{d_p}$. Such a function Φ is often called a neural network, but we will call it a σ-neural network to specify the activation function employed.
A shallow neural network is a neural network with a single ($P = 1$) hidden layer. In particular, a shallow neural network with output dimension $d_2 = 1$ is given by
$$\Phi(x) \;=\; \sum_{j=1}^{d_1} c_j\,\sigma\bigl(\langle x,\,w_j\rangle + b_j\bigr).$$
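For concreteness, a shallow σ-network is just a one-hidden-layer map; the helper below is a direct transcription of (12) with P = 1 (all names and dimensions are illustrative).

```python
import numpy as np

def shallow_net(x, W1, b1, W2, b2, sigma=np.tanh):
    # one hidden layer: x -> W2 @ sigma(W1 @ x + b1) + b2
    return W2 @ sigma(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
d0, d1, d2 = 4, 16, 3   # input, hidden, output dimensions
x = rng.standard_normal(d0)
y = shallow_net(x, rng.standard_normal((d1, d0)), rng.standard_normal(d1),
                rng.standard_normal((d2, d1)), rng.standard_normal(d2))
assert y.shape == (d2,)
```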
Definition 8. A function $\sigma : \mathbb{F} \to \mathbb{F}$ is called shallow universal if the set of $\mathbb{F}$-valued shallow σ-networks is dense in the set of all continuous functions from $\mathbb{F}^d$ to $\mathbb{F}$, $d \in \mathbb{N}$, with respect to locally uniform convergence.
 The following theorem, known as the universal approximation theorem, is a fundamental result in the theory of neural networks.
Theorem 4 (The universal approximation theorem; see [27,28,29,30,31] for $\mathbb{F} = \mathbb{R}$, and [32] for $\mathbb{F} = \mathbb{C}$). Let $\sigma \in C(\mathbb{F})$.
- A function $\sigma : \mathbb{R} \to \mathbb{R}$ is shallow universal if and only if σ is not a polynomial.
- A function $\sigma : \mathbb{C} \to \mathbb{C}$ is shallow universal if and only if σ is not polyharmonic. Here, a function $\sigma : \mathbb{C} \to \mathbb{C}$ is called polyharmonic if there exists $m \in \mathbb{N}$ such that $\Delta^{m}\sigma = 0$ in the sense of real variables, where Δ is the usual Laplace operator on $\mathbb{R}^2$.
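As a quick empirical illustration of shallow universality (not a proof), one can fit a smooth target on a grid by least squares over a shallow tanh network with randomly drawn hidden weights; all parameters below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
xs = np.linspace(-3.0, 3.0, 200)
target = np.sin(xs) * np.exp(-xs**2 / 4)

H = 50                                           # hidden width
w, b = rng.standard_normal(H), rng.standard_normal(H)
feats = np.tanh(np.outer(xs, w) + b)             # hidden-layer outputs on the grid
coef, *_ = np.linalg.lstsq(feats, target, rcond=None)
err = np.max(np.abs(feats @ coef - target))      # sup error on the grid
print(f"max grid error with H={H}: {err:.2e}")
```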
In 1996, Mhaskar [33] obtained a quantitative result for the approximation of smooth functions using shallow networks with smooth activation functions. More recently, Yarotsky [34] derived a quantitative approximation result for deep ReLU networks, where ReLU networks are given by (12) with $\mathbb{F} = \mathbb{R}$ and the ReLU activation function $\sigma(t) = \max\{t, 0\}$, $t \in \mathbb{R}$, and "deep" refers to having a large P in (12). For the case of complex-valued deep neural networks, we refer to [35].
  3.3. Cyclic Subgroups Λ of $\mathbb{Z}_N \times \mathbb{Z}_N$
We now consider the case of cyclic subgroups of $\mathbb{Z}_N \times \mathbb{Z}_N$, where group representations can be defined directly without embedding into the Weyl–Heisenberg group. The cyclic subgroups of order N in $\mathbb{Z}_N \times \mathbb{Z}_N$ are given by
$$\Lambda_c \;=\; \{(k,\; ck \bmod N) : k \in \mathbb{Z}_N\},\quad c \in \mathbb{Z}_N, \qquad\text{and}\qquad \{(0,\ell) : \ell \in \mathbb{Z}_N\}.$$
If N is prime, these are the only nontrivial proper subgroups of $\mathbb{Z}_N \times \mathbb{Z}_N$, but if N is composite, there may exist noncyclic subgroups of order N in $\mathbb{Z}_N \times \mathbb{Z}_N$; for instance, $\{0,2\} \times \{0,2\}$ is a noncyclic subgroup of order 4 in $\mathbb{Z}_4 \times \mathbb{Z}_4$. It is easily seen that the adjoint group of $\Lambda_c$ in $\mathbb{Z}_N \times \mathbb{Z}_N$ is $\Lambda_c$ itself; that is, $\Lambda_c^{\circ} = \Lambda_c$ (see Section 2.1).
We define the map $\rho : \Lambda_c \to \mathcal{U}(\mathbb{C}^N)$ by
$$\rho(k,\,ck) \;=\; \omega^{ck^2}\,M^{ck}\,T^{k}, \qquad k \in \mathbb{Z}_N, \tag{14}$$
where $\omega := e^{-\pi i(N+1)/N}$. Setting $\lambda = (k,\ell) \in \Lambda_c$ (so that $\ell = ck$), we may simply write
$$\rho(\lambda) \;=\; \omega^{k\ell}\,\pi(\lambda), \qquad \lambda \in \Lambda_c.$$
For any $k, k' \in \mathbb{Z}_N$, we have
$$\rho(k,ck)\,\rho(k',ck') \;=\; \omega^{c(k^2+k'^2)}\,e^{-2\pi i ckk'/N}\,M^{c(k+k')}T^{k+k'} \;=\; \omega^{c(k+k')^2}\,M^{c(k+k')}T^{k+k'} \;=\; \rho\bigl(k+k',\,c(k+k')\bigr),$$
where we used the fact that $\omega^{2} = e^{-2\pi i/N}$ and $\omega^{2N} = \omega^{N^2} = 1$. This shows that ρ is a group homomorphism and thus a unitary group representation of $\Lambda_c$ on $\mathbb{C}^N$. Due to the symmetry in (14), ρ is called the symmetric representation of $\Lambda_c$ on $\mathbb{C}^N$.
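The homomorphism property can be checked numerically. The snippet below uses the conventions assumed in this reconstruction ($\pi(k,\ell) = M^{\ell}T^{k}$ and $\omega = e^{-\pi i(N+1)/N}$; the sign of the exponent depends on the convention chosen for π) and verifies that ρ carries no leftover phase factors.

```python
import numpy as np

N, c = 6, 2                              # Lambda_c = {(k, c k) : k in Z_N}
w = np.exp(-1j * np.pi * (N + 1) / N)    # assumed convention for omega

def pi(k, l):
    # pi(k, l) = M^l T^k as an N x N matrix
    Tk = np.roll(np.eye(N), k % N, axis=0)
    Ml = np.diag(np.exp(2j * np.pi * l * np.arange(N) / N))
    return Ml @ Tk

def rho(j):
    # symmetric representation: rho(j) = w^{c j^2} pi(j, c j)
    return w ** (c * j * j) * pi(j, c * j)

# a genuine homomorphism of Z_N: no leftover phase factors
for j in range(N):
    for jp in range(N):
        assert np.allclose(rho(j) @ rho(jp), rho((j + jp) % N))
```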
Note that for any $\lambda = (k,\ell) \in \Lambda_c$ and $x \in \mathbb{C}^N$, we have $F(\rho(\lambda)\,x) = \rho(\lambda)\,F(x)$ if and only if $F(\pi(\lambda)\,x) = \pi(\lambda)\,F(x)$, where we used the relation $\rho(\lambda) = \omega^{k\ell}\,\pi(\lambda)$ from (14). This implies that a map $F : \mathbb{C}^N \to \mathbb{C}^N$ is Λ-equivariant in the sense of Definition 1 if and only if it is ρ-equivariant in the sense of Definition 4. Importantly, employing the ρ-equivariance in place of the Λ-equivariance will allow us to apply the tools from group representation theory described in Section 3.2.
We are interested in approximating Λ-equivariant (or ρ-equivariant) maps $F : \mathbb{C}^N \to \mathbb{C}^N$ by neural networks. For this, we need to choose a complex-valued activation function σ (see Section 3.2) for the neural networks. Since σ acts componentwise on its input, i.e., $\sigma(x) = (\sigma(x(0)),\dots,\sigma(x(N-1)))$, it clearly commutes with all translations, i.e., $\sigma(T^k x) = T^k\,\sigma(x)$; however, σ does not commute with modulations in general. As shown in (14), the representation ρ includes the multiplicative phase factor $\omega^{k\ell}$, so we will assume that σ is ω-phase homogeneous (see Definition 2):
$$\sigma(\omega z) \;=\; \omega\,\sigma(z) \quad\text{for all } z \in \mathbb{C},$$
which ensures that σ commutes with all $\rho(\lambda)$, $\lambda \in \Lambda_c$, and all modulations.
We first need the following lemma. Below, we denote by $\mathbf{1} \in \mathbb{C}^N$ the vector whose entries are all equal to 1.
Lemma 1. Assume that $\sigma : \mathbb{C} \to \mathbb{C}$ is shallow universal. If a map $F : \mathbb{C}^N \to \mathbb{C}^N$ satisfies $F(Tx) = T\,F(x)$ for all $x \in \mathbb{C}^N$, then there exists a shallow convolutional neural network
$$\Phi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(h_j \circledast x + b_j\,\mathbf{1}\bigr),$$
where $h_j \in \mathbb{C}^N$ and $b_j, c_j \in \mathbb{C}$ for $j = 1,\dots,J$, which approximates F uniformly on compact sets in $\mathbb{C}^N$.
 Proof.  Using the universal approximation theorem (see Theorem 4), the first output component map $x \mapsto F(x)(0)$, $x \in \mathbb{C}^N$, can be approximated by a shallow network
$$N_1(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(\langle x,\,a_j\rangle + b_j\bigr)$$
with some $a_j \in \mathbb{C}^N$, $b_j, c_j \in \mathbb{C}$, $j = 1,\dots,J$. Note that since $F(Tx) = T\,F(x)$ and since $T^{N}$ is the identity map on $\mathbb{C}^N$, we have $F(x)(m) = F(T^{-m}x)(0)$ for all $m \in \mathbb{Z}_N$. This condition provides approximations for other component maps $x \mapsto F(x)(m)$, $x \in \mathbb{C}^N$, with $m \in \mathbb{Z}_N$, in terms of $N_1$. In fact, we have
$$F(x)(m) \;=\; F(T^{-m}x)(0) \;\approx\; N_1(T^{-m}x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(\langle x,\,T^{m}a_j\rangle + b_j\bigr).$$
Consequently, the map F is approximated by the map $\Phi : \mathbb{C}^N \to \mathbb{C}^N$ defined by $\Phi(x)(m) = N_1(T^{-m}x)$ for $m \in \mathbb{Z}_N$. For $a, b \in \mathbb{C}^N$, let $a \circledast b$ be the circular convolution of a and b defined by $(a \circledast b)(n) = \sum_{m\in\mathbb{Z}_N} a(m)\,b(n-m)$, where a and b are understood as N-periodic sequences on the integers. Then, for any $x \in \mathbb{C}^N$ and $m \in \mathbb{Z}_N$, we have
$$\langle x,\,T^{m}a_j\rangle \;=\; \sum_{n\in\mathbb{Z}_N} x(n)\,\overline{a_j(n-m)} \;=\; (h_j \circledast x)(m) \qquad\text{with } h_j(n) := \overline{a_j(-n)},$$
and therefore, we may write
$$\Phi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(h_j \circledast x + b_j\,\mathbf{1}\bigr).$$
It is easily seen that every convolutional map $x \mapsto h \circledast x$, $h \in \mathbb{C}^N$, is a linear map, and in fact, a linear combination of $T^{k}$, $k \in \mathbb{Z}_N$. Hence, the map Φ can be rewritten as
$$\Phi \;=\; \sum_{j=1}^{J} c_j\,(\sigma\circ A_j),$$
where $A_j(x) := h_j \circledast x + b_j\,\mathbf{1}$ is an affine map whose linear part is a linear combination of $T^{k}$, $k \in \mathbb{Z}_N$, for $j = 1,\dots,J$. The fact that Φ approximates F uniformly on compact sets in $\mathbb{C}^N$ follows from the uniform approximation of $x \mapsto F(x)(0)$ by $N_1$ on compact sets in $\mathbb{C}^N$. Finally, we note that Φ expressed above is a shallow convolutional neural network as described in Section 3.2. This completes the proof.    □
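The translation equivariance asserted for networks of this form is easy to confirm numerically; the sketch below instantiates the reconstructed form $\Phi(x) = \sum_j c_j\,\sigma(h_j \circledast x + b_j\mathbf{1})$ with random parameters and checks that it commutes with the cyclic shift.

```python
import numpy as np

rng = np.random.default_rng(4)
N, J = 8, 3
conv = lambda h, x: np.fft.ifft(np.fft.fft(h) * np.fft.fft(x))  # circular convolution
h = rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))
b = rng.standard_normal(J)
c = rng.standard_normal(J)

def Phi(x):
    # shallow convolutional network with componentwise activation
    return sum(c[j] * np.tanh(conv(h[j], x) + b[j]) for j in range(J))

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(Phi(np.roll(x, 1)), np.roll(Phi(x), 1))  # T-equivariance
```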
Theorem 5. Assume that $\sigma : \mathbb{C} \to \mathbb{C}$ is shallow universal and satisfies $\sigma(\omega z) = \omega\,\sigma(z)$ for all $z \in \mathbb{C}$. Let $\Lambda = \Lambda_c$ for some $c \in \mathbb{Z}_N$. Then, any continuous ρ-equivariant (or Λ-equivariant) map $F : \mathbb{C}^N \to \mathbb{C}^N$ can be approximated (uniformly on compact sets) by a shallow neural network
$$\Psi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(A_j x + b_j v\bigr),$$
where $A_j \in \operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\}$ and $b_j, c_j \in \mathbb{C}$ for $j = 1,\dots,J$, and $v \in \mathbb{C}^N$ satisfies $\rho(\lambda)\,v = v$ for all $\lambda \in \Lambda$. Moreover, every map of this form is ρ-equivariant (or Λ-equivariant).
 Remark 4. Since $\rho(\lambda) = \omega^{k\ell}\,\pi(\lambda)$ by (14), we have $\operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\} = \operatorname{span}\{\pi(\lambda) : \lambda\in\Lambda\}$ for any $\Lambda = \Lambda_c$. On the other hand, the vectors satisfying $\rho(\lambda)v = v$ for all $\lambda\in\Lambda$ can be significantly different from those satisfying $\pi(\lambda)v = v$ for all $\lambda\in\Lambda$.
 Proof.  Since $\Lambda = \Lambda_c$ is cyclic, we order its elements as $\lambda_j = (j,\,cj)$, $j = 0,1,\dots,N-1$, and treat $\mathbb{C}^{\Lambda}$ as $\mathbb{C}^N$, since $|\Lambda| = N$. Then, the operators $V_g$ and $V_g^{*}$, given in Definition 6, can be represented as the $N\times N$ matrices
$$V_g = \begin{pmatrix} (\rho(\lambda_0)\,g)^{*} \\ \vdots \\ (\rho(\lambda_{N-1})\,g)^{*} \end{pmatrix} \qquad\text{and}\qquad V_g^{*} = \bigl(\rho(\lambda_0)\,g \;\big|\; \cdots \;\big|\; \rho(\lambda_{N-1})\,g\bigr), \tag{15}$$
respectively, where $(\cdot)^{*}$ denotes the conjugate transpose. Setting $g = \delta_0 := (1, 0, \dots, 0)^{T}$, we have
$$\rho(\lambda_j)\,g \;=\; \omega^{cj^2}\,M^{cj}\,T^{j}\,\delta_0 \;=\; \omega^{cj^2}\,e^{2\pi i cj^2/N}\,\delta_j, \qquad j \in \mathbb{Z}_N,$$
so that $S = V_g^{*}V_g = \mathrm{Id}$ and $\tilde{g} = S^{-1}g = g$. As a result, the set $\{\rho(\lambda_j)\,g\}_{j\in\mathbb{Z}_N}$ forms an orthonormal basis for $\mathbb{C}^N$.
Note that for any continuous ρ-equivariant $F : \mathbb{C}^N \to \mathbb{C}^N$, the map $F^{\natural}$ is continuous and left Λ-translation equivariant (see Proposition 4). If F is linear, then $F^{\natural}$ is also linear and can be represented as a circulant matrix; equivalently, $F^{\natural} = \sum_{j\in\mathbb{Z}_N} a_j\,T^{j}$ on $\mathbb{C}^{\Lambda} \cong \mathbb{C}^N$ for some $a_j \in \mathbb{C}$, so that
$$F \;=\; (F^{\natural})^{\flat} \;=\; \sum_{j\in\mathbb{Z}_N} a_j\,V_g^{*}\,T^{j}\,V_g \;=\; \sum_{j\in\mathbb{Z}_N} a_j\,\rho(\lambda_j).$$
Therefore, the commutant of $\{\rho(\lambda) : \lambda\in\Lambda\}$ is given by
$$\{\rho(\lambda) : \lambda\in\Lambda\}' \;=\; \operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\}.$$
On the other hand, since $\rho(\lambda) = \omega^{k\ell}\,\pi(\lambda)$ by (14), the commutant of $\{\rho(\lambda)\}_{\lambda\in\Lambda}$ coincides with that of $\{\pi(\lambda)\}_{\lambda\in\Lambda}$, i.e.,
$$\{\rho(\lambda) : \lambda\in\Lambda\}' \;=\; \{\pi(\lambda) : \lambda\in\Lambda\}'.$$
Since the adjoint group of Λ is Λ itself, i.e., $\Lambda^{\circ} = \Lambda$ (see Section 2.1), we obtain
$$\{\pi(\lambda) : \lambda\in\Lambda\}' \;=\; \operatorname{span}\{\pi(\lambda) : \lambda\in\Lambda\} \;=\; \operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\}.$$
Now, we consider the general case where F is possibly nonlinear. If F is nonlinear, then $F^{\natural}$ is a nonlinear left Λ-translation equivariant map. Since Λ is an additive group and since $\Lambda \cong \mathbb{Z}_N$ and $\mathbb{C}^{\Lambda} \cong \mathbb{C}^N$, the map $F^{\natural}$ can be viewed as a map from $\mathbb{C}^N$ to $\mathbb{C}^N$. For simplicity, we will abuse notation and write $F^{\natural}(x)(j)$ instead of $F^{\natural}(x)(\lambda_j)$; thus, the first component of $F^{\natural}(x)$ will be simply denoted by $F^{\natural}(x)(0)$ instead of $F^{\natural}(x)(\lambda_0)$. Then, the left Λ-translation equivariance of $F^{\natural}$ can be expressed as $F^{\natural}(Tx) = T\,F^{\natural}(x)$. By applying Lemma 1 to $F^{\natural}$, we obtain a shallow convolutional neural network
$$\Phi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(h_j \circledast x + b_j\,\mathbf{1}\bigr),$$
where $h_j \in \mathbb{C}^N$ and $b_j, c_j \in \mathbb{C}$ for $j = 1,\dots,J$, which approximates $F^{\natural}$ uniformly on compact sets in $\mathbb{C}^N$; that is, for every compact $K \subset \mathbb{C}^N$ and every $\varepsilon > 0$, the parameters can be chosen so that
$$\sup_{x\in K}\,\bigl\|\Phi(x) - F^{\natural}(x)\bigr\| \;<\; \varepsilon.$$
By the continuity of the operators $V_g$ and $V_g^{*}$, we obtain
$$\sup_{x\in K}\,\bigl\|V_g^{*}\,\Phi(V_g x) - F(x)\bigr\| \;=\; \sup_{x\in K}\,\bigl\|\Phi(V_g x) - F^{\natural}(V_g x)\bigr\| \;<\; \varepsilon.$$
Note that since $\sigma(\omega z) = \omega\,\sigma(z)$ for all $z \in \mathbb{C}$, the function σ commutes with $V_g$ given by (15), that is,
$$\sigma(V_g\,x) \;=\; V_g\,\sigma(x), \qquad x \in \mathbb{C}^N. \tag{16}$$
Therefore, we have
$$V_g^{*}\,\Phi(V_g x) \;=\; \sum_{j=1}^{J} c_j\,V_g^{*}\,\sigma\bigl(h_j \circledast V_g x + b_j\,\mathbf{1}\bigr) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(A_j x + b_j v\bigr) \;=:\; \Psi(x),$$
where $A_j := V_g^{*}\bigl(\sum_{m\in\mathbb{Z}_N} h_j(m)\,T^{m}\bigr)V_g = \sum_{m\in\mathbb{Z}_N} h_j(m)\,\rho(\lambda_m) \in \operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\}$ by (16), and the vector $v := V_g^{*}\,\mathbf{1}$ satisfies
$$\rho(\lambda)\,v \;=\; v \quad\text{for all } \lambda \in \Lambda. \tag{17}$$
Finally, we note that for any $\lambda \in \Lambda$,
$$\Psi(\rho(\lambda)\,x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(A_j\,\rho(\lambda)\,x + b_j v\bigr) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(\rho(\lambda)\,(A_j x + b_j v)\bigr) \;=\; \rho(\lambda)\,\Psi(x),$$
where we used that $\rho(\lambda)$ is a linear (unitary) operator commuting with each $A_j$ and with σ, and that $\rho(\lambda)v = v$ by (17) and $\sigma\circ\rho(\lambda) = \rho(\lambda)\circ\sigma$ by (16). Therefore, every map of the form Ψ is ρ-equivariant.    □
 Remark 5. The proof relies on observing (16) and choosing $v$ such that $\rho(\lambda)v = v$ for all $\lambda\in\Lambda$. To obtain (16), we have chosen $g$ so that $V_g$ is a diagonal matrix with exponential entries, and required an appropriate phase-homogeneity on σ so that σ commutes with those exponentials. This technique does not work for noncyclic subgroups because $V_g$ cannot be expressed as a diagonal matrix for any $g$ in that case.
 Example 1. Let $N = 4$ and $c = 1$, so that $\Lambda = \{(k,k) : k\in\mathbb{Z}_4\}$. In this case, we have $\omega = e^{-5\pi i/4} = e^{3\pi i/4}$, $\lambda_j = (j,j)$, and $\rho(\lambda_j) = \omega^{j^2}\,M^{j}T^{j}$ for $j\in\mathbb{Z}_4$. Then,
$$\rho(\lambda_j)\,\delta_0 \;=\; \omega^{j^2}\,e^{2\pi i j^2/4}\,\delta_j$$
and $\rho(\lambda_j)\,\rho(\lambda_{j'}) = \rho(\lambda_{j+j'})$ for all $j, j' \in \mathbb{Z}_4$. With $g = \delta_0 = (1,0,0,0)^{T}$, we have
$$v \;=\; V_g^{*}\,\mathbf{1} \;=\; \bigl(1,\; e^{-3\pi i/4},\; -1,\; e^{-3\pi i/4}\bigr)^{T}.$$
It is easy to check that v is invariant under $\rho(\lambda_0)$, $\rho(\lambda_1)$, $\rho(\lambda_2)$, $\rho(\lambda_3)$; that is, $\rho(\lambda)v = v$ for all $\lambda\in\Lambda$. Theorem 5 shows that any Λ-equivariant map $F : \mathbb{C}^4 \to \mathbb{C}^4$ can be approximated (uniformly on compact sets) by functions of the form
$$\Psi(x) \;=\; \sum_{j=1}^{J} c_j\,\sigma\bigl(A_j x + b_j v\bigr),$$
where $A_j \in \operatorname{span}\{\rho(\lambda) : \lambda\in\Lambda\}$ and $b_j, c_j \in \mathbb{C}$ for $j = 1,\dots,J$. It is worth noting that while ρ is a unitary group representation of Λ on $\mathbb{C}^4$, the map $\lambda \mapsto \pi(\lambda)$ for $\lambda\in\Lambda$ is not a group representation of Λ on $\mathbb{C}^4$, since $\pi(\lambda)\,\pi(\lambda')$ may differ from $\pi(\lambda+\lambda')$ by a multiplicative phase factor, by (8).

  4. Discussion
In this paper, we used finite-dimensional time-frequency analysis to investigate the properties of time-frequency shift equivariant maps that are generally nonlinear.
First, we established a one-to-one correspondence between Λ-equivariant maps and certain phase-homogeneous functions, accompanied by a reconstruction formula expressing Λ-equivariant maps in terms of these functions. This deepens our understanding of the structure of Λ-equivariant maps by connecting them to their corresponding phase-homogeneous functions.
Next, we considered the approximation of Λ-equivariant maps by neural networks. When Λ is a cyclic subgroup of order N in $\mathbb{Z}_N \times \mathbb{Z}_N$, we proved that every Λ-equivariant map can be approximated by a shallow neural network with affine linear maps formed as linear combinations of time-frequency shifts by Λ. For the subgroup $\Lambda = \mathbb{Z}_N \times \{0\}$, the Λ-equivariance corresponds to translation equivariance, and our result shows that every translation equivariant map can be approximated by a shallow convolutional neural network, which aligns well with the established effectiveness of convolutional neural networks (CNNs) for applications involving translation equivariance. In this context, our result extends the approximation of translation equivariant maps to general Λ-equivariant maps, with potential applications in signal processing.
Finally, we note that the tools used to prove the approximation result (Theorem 2) are applicable in a more general setting than the one described in Section 3.3. In particular, Definitions 6 and 7 and Proposition 4 apply to general unitary representations of arbitrary groups. Therefore, our approach can be adapted to derive similar results for general group-equivariant maps, which we leave as a direction for future research.