A Literature Review: Geometric Methods and Their Applications in Human-Related Analysis

Geometric features, such as topological and manifold properties, are used to describe the geometric properties of data. Geometric methods, which exploit such geometric features, are widely used in computer graphics and computer vision problems. This review surveys geometric concepts, geometric methods, and their applications in human-related analysis, e.g., human shape analysis, human pose analysis, and human action analysis. It categorizes geometric methods by the scope of the geometric properties that are extracted: object-oriented geometric methods, feature-oriented geometric methods, and routine-based geometric methods. Considering the broad applications of deep learning methods, this review also studies geometric deep learning, which has recently become a popular topic of research. Validation datasets are collected, and method performances are collected and compared. Finally, research trends and possible research topics are discussed.


Introduction
With the emergence of low-cost RGB-D cameras, human bodies can be digitized at a lower cost [1][2][3], and their actions can also be easily captured [4,5]. In 3D spaces (for point cloud models or meshes), studying the geometric attributes becomes natural. The geometric attributes (for example, the number of holes and the geometric adjacency of objects) of data are extracted, and methods for studying geometric attributes are proposed.
The notion of "geometric methods" is used in this review to refer to methods that study the geometric attributes of data, methods with geometric constraints, or generalized methods with spatial or temporal information. When dealing with continuous 3D models, certain geometries, such as topology, Riemannian manifolds, and conformal geometry, are better choices, because they describe the properties of a geometric object from the perspective of the object itself. In a Euclidean space, global coordinates are cumbersome for describing attributes along the object surface. For example, the geodesic distance describes the separation of two points on a geometric object better than the Euclidean distance: it is measured from the perspective of points on the surface and is the length of the shortest path that one point must traverse along the surface to reach the other. [...] local geometric properties and is applied along the object surface. There are also methods that extend deep learning to manifolds through embedding. Since there is no canonical embedding for a general manifold, the researchers in [11,12,29,30] proposed solutions for the special case of Riemannian manifolds.
Despite the wide application of geometric methods, especially in human-related analysis (HRA), there are few literature reviews on the topic. This review studies related works extensively. Because geometric concepts require mathematical foundations, the mathematical background of geometric spaces, such as basic concepts, geometric properties, and geometric measurements, is introduced first. The contributions of this paper include:
• Geometric methods and their applications in human-related analysis are extensively studied.
• Geometric methods are studied based on the scope in which they are applied and are classified into feature-oriented geometric methods, object-oriented geometric methods, and routine-based geometric methods.
• Geometric methods and their performances on standard datasets are collected so that researchers who are interested in this topic can identify the state of the art.
The remainder of the paper is organized as follows. Section 2 introduces basic geometric concepts. Section 3 explores various types of geometric methods. Section 4 explores specific geometric methods for human-related analysis. Section 5 introduces deep learning-based geometric methods. Section 6 studies generalized geometrics for human-related analysis. Section 7 collects validation datasets for human-related analysis. Section 8 compares the performances of related works. Section 9 concludes the review and discusses future research trends. Figure 1 presents an overall view of the paper.
Figure 1. Overall view of this paper. This review is mainly composed of four modules: geometric methods for generic objects; geometric method-based human-related analysis; geometric deep learning for human-related analysis; and generalized geometrics for human-related analysis. Each module has its subsections, each of which is a class of methods based on its categorization standards. HSA, human shape analysis.

Basic Geometric Concepts
In this section, important concepts from topology and manifold theory that are widely used in geometric methods are introduced. Manifold concepts are usually difficult to conceive of and can be defined in various ways. For each concept, this review selects the definition that is easiest to understand. Topological concepts are most easily understood as being developed from set theory, so set theory is introduced first. Many manifold concepts are in turn built on topological concepts, so manifold concepts are introduced after topological concepts. Figure 2 shows the components of this section. Before presenting detailed definitions, the mathematical symbols are listed below.
Figure 2. Main components of Section 2. This section is composed of four modules: set theory concepts; topology concepts developed from set theory; algebraic topology concepts (topology plus algebra); and manifold concepts (a topology that locally resembles Euclidean spaces).

Set Theory Concepts
An easier way to interpret topology is to consider topological concepts as being developed from set theory. Selected concepts from set theory are introduced in this section. Other concepts, like equivalence relation, equivalence class, and covering, are introduced in Appendix A.

Metric
A metric or distance function on a set $X$ is a real-valued function $d$ defined on the Cartesian product $X \times X$ such that for all $x, y, z \in X$:
• $d(x, y) \geq 0$, with equality iff $x = y$.
• $d(x, y) = d(y, x)$.
• $d(x, y) + d(y, z) \geq d(x, z)$.

Quotient Vector Space
The quotient of a vector space by a subspace can be defined based on the equivalence class. Let $V$ be a vector space over a field $k$ and $W \subseteq V$ be a subspace. An equivalence relation $\sim_W$ on $V$ is defined by $v \sim_W v'$ if and only if $v - v' \in W$. The quotient is denoted by $V/{\sim_W}$, and $V/{\sim_W}$ is itself a vector space over $k$, with the addition and scalar multiplication rules satisfying $[v] + [v'] = [v + v']$ and $\kappa[v] = [\kappa v]$, where $[v]$ denotes the equivalence class of $v$. $V/{\sim_W}$ can also be denoted by $V/W$, which is referred to as the quotient space of $V$ by $W$.

Topological Concepts
Topology is independent of any particular coordinate representation, whereas an object's representation in a Euclidean space is fixed by coordinates. For example, every point in a three-dimensional Euclidean space is determined by three coordinates, while in topology, a global coordinate system does not exist. This intrinsic perspective is what makes topological representations concise. A topological representation retains more general features, like the number of holes in the geometry, while ignoring some fine details, like distance functions. Specifically, the topological properties of a shape are invariant under certain deformations: they do not change if the shape is stretched or compressed, but they do change under other deformations, like "tearing" or "adjoining". Topological concepts are selectively introduced in this section. Please refer to the Appendix for definitions of closed sets, the interior and closure of a set, limit points, continuous functions, quotient maps, Hausdorff space, and metrics.

Topology
Here, the geometric view of topology developed from surfaces and neighborhoods is adopted. According to [31], for each point $x$ of a set $X$, the neighborhoods of $x$ form a non-empty collection of subsets of $X$ that satisfies four axioms:
• $x$ lies in each of its neighborhoods.
• The intersection of two neighborhoods of $x$ is a neighborhood of $x$.
• If $N$ is a neighborhood of $x$ and $U$ is a subset of $X$ that contains $N$, then $U$ is a neighborhood of $x$.
• If $N$ is a neighborhood of $x$ and $\mathring{N}$ denotes the set $\{z \in N \mid N \text{ is a neighborhood of } z\}$, then $\mathring{N}$ is a neighborhood of $x$ (the set $\mathring{N}$ is called the interior of $N$).
The assignment of a collection of neighborhoods to each point is called a topology on the set $X$. A topology defined with neighborhoods is easy to conceive of, but hard to work with. For this reason, a topology is usually defined in terms of open sets instead. A subset $O$ of $X$ is open if it is a neighborhood of each of its points. A topological space is then a set $X$ together with a collection $\Omega$ of open subsets that satisfies four conditions:
• The empty set $\emptyset$ is in $\Omega$.
• $X$ is in $\Omega$.
• The intersection of a finite number of sets in $\Omega$ is also in $\Omega$.
• The union of an arbitrary number of sets in $\Omega$ is also in $\Omega$.

Homeomorphism
A function $h : X \to Y$ is called a homeomorphism if it is one-to-one, onto, and continuous and has a continuous inverse. When such a function exists, $X$ and $Y$ are called homeomorphic (or topologically equivalent) spaces. Figure 3 shows a homeomorphism between a sphere and a tetrahedron. The illustration shows a regular tetrahedron $T$ projected onto a sphere with center $\hat{T}$ using radial projection from the center.
Figure 3. Radial projection from a tetrahedron $T$ onto a sphere with center $\hat{T}$. As an example, a point $x$ on the surface of the tetrahedron is projected onto its corresponding point $\pi(x)$ on the sphere by the radial projection function $\pi$.
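The radial projection can be made concrete with a small sketch. The vertex coordinates below are an illustrative assumption (a regular tetrahedron centered at the origin, projected onto the unit sphere), and the function names are hypothetical. The sketch shows both the projection and its inverse, which is the property that makes the map a homeomorphism.

```python
import math

# Vertices of a regular tetrahedron centered at the origin (illustrative choice).
VERTICES = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def radial_projection(x):
    """pi(x): push a point x on the tetrahedron surface out to the unit sphere."""
    n = math.sqrt(dot(x, x))
    return tuple(c / n for c in x)


def inverse_projection(u):
    """pi^{-1}(u): pull a unit vector u back along its ray onto the tetrahedron.

    The face opposite vertex v lies in the plane -v . x = 1, so the ray s * u
    reaches the surface at s = 1 / max_v(-v . u).
    """
    s = 1.0 / max(-dot(v, u) for v in VERTICES)
    return tuple(s * c for c in u)


# A point on the face spanned by the last three vertices (weights 0.2, 0.3, 0.5).
x = (-0.6, -0.4, 0.0)
u = radial_projection(x)        # lies on the unit sphere
x_back = inverse_projection(u)  # recovers x up to rounding
```

Both maps are continuous, and each undoes the other, so the tetrahedron surface and the sphere are topologically equivalent even though their Euclidean geometry differs.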

Quotient Space
A quotient space is a set together with a topology. If $X$ is a space, $A$ is a set, and $p : X \to A$ is a surjective map, then there exists exactly one topology $\Omega$ on $A$ relative to which $p$ is a quotient map; it is called the quotient topology induced by $p$.
A quotient space (also called an identification space) is, intuitively speaking, the result of identifying or "gluing together" certain points of a given topological space. Figure 4 shows an example of obtaining the two-sphere $S^2$ by gluing the boundary circle $S^1$ of a disk to a single point.

Algebraic Topology Concepts
Algebraic topology combines algebra with geometry by defining algebraic operations on geometric objects. The fundamental idea of algebraic topology is to develop methods for distinguishing between two topological spaces or two maps. The topological group is introduced. Please refer to the Appendix for the definitions of orbit space, homotopy, the fundamental group, and homology.
A topological group is a group that is also a topological space, with a binary operation and an inverse map that are both continuous. For example, $G$ is a topological group if the multiplication map $(g, h) \mapsto gh$ from $G \times G$ to $G$ and the inverse map $g \mapsto g^{-1}$ from $G$ to $G$ are both continuous. One extremely useful topological group is the general linear group. For example, the general linear group over $\mathbb{R}$, denoted by $GL(n, \mathbb{R})$, is the group of $n \times n$ invertible matrices with real entries.

Manifold Concepts
A manifold is a Hausdorff topological space that is locally homeomorphic to Euclidean space; that is, we can find a homeomorphism (a continuous bijection with a continuous inverse) between a local area on the manifold and a local area in Euclidean space. Furthermore, analysis can be carried out by imposing a smooth structure on a manifold (similar to the differentiable structure of a Euclidean space). A smooth structure alone does not relate the tangent spaces at different points, but with certain methods (like parallel transport), tangent spaces at different points on the manifold can be related. Essential manifold concepts are introduced in this section. Please refer to the Appendix for definitions of atlas, smooth manifold, section, vector bundle, fiber bundle, the tangent bundle of a vector bundle, vertical bundle, vector bundle homomorphism, vector bundle isomorphism, and connection.

Topological Manifold
Assume that $M$ is a topological space; $M$ is a topological manifold of dimension $n$ if it has the following properties:  Figure 5 illustrates an example of a coordinate chart. Let $M$ be a topological $n$-manifold. If $(U, \varphi)$ and $(V, \psi)$ are two charts such that $U \cap V \neq \emptyset$, then the composite map $\psi \circ \varphi^{-1} : \varphi(U \cap V) \to \psi(U \cap V)$, also called the transition map from $\varphi$ to $\psi$, is a composition of homeomorphisms and is therefore itself a homeomorphism.
Two charts, $(U, \varphi)$ and $(V, \psi)$, are said to be smoothly compatible if either $U \cap V = \emptyset$ or the transition map $\psi \circ \varphi^{-1}$ is a diffeomorphism.

Tangent Space/Tangent Bundle
Let $M$ be a smooth manifold, $p \in M$, and $M_p$ be the set of all smooth real-valued functions, each of which is defined on some open neighborhood of $p$. A tangent vector to $M$ at $p$ is a map $v : M_p \to \mathbb{R}$ such that $v(\lambda f + \mu g) = \lambda v(f) + \mu v(g)$ (linearity) and $v(fg) = f(p)\,v(g) + g(p)\,v(f)$ (Leibniz rule) for all $f, g \in M_p$ and $\lambda, \mu \in \mathbb{R}$. The set of all tangent vectors to $M$ at $p$ is denoted by $T_pM$. It is called the tangent space to $M$ at $p$. Figure 6 shows an exemplary tangent space.

Parallel Transport
Let $M$ be a smooth manifold with a vector bundle connection $\nabla$; let $c : I \to M$ be a differentiable curve from an interval $I$ into $M$; and let $V_0 \in T_{c(t_0)}M$ be a vector tangent to $M$ at $c(t_0)$ for some $t_0 \in I$. A vector field $V$ along $c$ is said to be a parallel transport of $V_0$ along $c$ provided that $V(t)$, $t \in I$, is parallel along $c$, i.e., $\nabla_{\dot{c}(t)} V = 0$, and $V(t_0) = V_0$. The notion of parallel transport on a manifold $M$ clarifies the idea of translating a vector field $V$ along a differentiable curve to attain a new vector field $V'$, which is parallel to $V$. Figure 8 shows an illustration of parallel transports under Levi-Civita connections. A Levi-Civita connection is a torsion-free metric connection preserving a given (pseudo-)Riemannian metric. The transport on the left side is given by the metric $ds^2 = dr^2 + r^2 d\theta^2$. The transport on the right side is given by the metric $ds^2 = dr^2 + d\theta^2$.

Lie Group and Lie Algebra
A Lie group is a group $G$ that is also an analytic manifold such that, for $\sigma, \tau \in G$, the mapping $(\sigma, \tau) \mapsto \sigma\tau^{-1}$ of the product manifold $G \times G$ into $G$ is analytic. A Lie algebra is a vector space $\mathfrak{g}$ over a field $F$ with an operation $[\cdot,\cdot] : \mathfrak{g} \times \mathfrak{g} \to \mathfrak{g}$, called the Lie bracket, such that the following axioms are satisfied:
• Bilinearity: $[ax + by, z] = a[x, z] + b[y, z]$ and $[z, ax + by] = a[z, x] + b[z, y]$ for all scalars $a, b$ in $F$ and all elements $x, y, z$ in $\mathfrak{g}$.
• Skew-symmetry or alternativity: $[x, x] = 0$, which implies $[x, y] = -[y, x]$ for all $x, y \in \mathfrak{g}$.
• Jacobi identity: $[x, [y, z]] + [y, [z, x]] + [z, [x, y]] = 0$ for all $x, y, z \in \mathfrak{g}$.
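As a concrete instance of these axioms, the sketch below uses the matrix commutator $[x, y] = xy - yx$ on $3 \times 3$ skew-symmetric matrices, which form the Lie algebra of the rotation group $SO(3)$. The helper names (`hat`, `bracket`, `jacobi_sum`) are hypothetical and chosen only for this illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bracket(x, y):
    """Lie bracket realized as the matrix commutator [x, y] = xy - yx."""
    return sub(matmul(x, y), matmul(y, x))


def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix."""
    wx, wy, wz = w
    return [[0, -wz, wy], [wz, 0, -wx], [-wy, wx, 0]]


def jacobi_sum(x, y, z):
    """Left-hand side of the Jacobi identity; it should be the zero matrix."""
    return add(add(bracket(x, bracket(y, z)), bracket(y, bracket(z, x))),
               bracket(z, bracket(x, y)))


# Basis of the Lie algebra of SO(3): infinitesimal rotations about x, y, z.
E1, E2, E3 = hat((1, 0, 0)), hat((0, 1, 0)), hat((0, 0, 1))
```

With this basis, `bracket(E1, E2)` equals `E3`, mirroring the cross product of the corresponding axis vectors, and `jacobi_sum` returns the zero matrix for any three inputs.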

Geometric Methods for Generic Objects
In this section, various geometric methods are introduced. The reviewed methods are categorized based on the scope of the geometric properties that are extracted and the way in which the geometric properties are encoded. Some methods encode geometric attributes in features (see Section 3.1); some methods utilize concepts and theories from topology and manifolds and extract geometric properties on objects (see Section 3.2); and some methods extract geometric properties following certain procedures and denote the objects with structured representations, e.g., graph structures (see Section 3.3). Some methods belong to multiple categories. For example, the positive definite manifold-based methods in Section 3.1.2 belong to "feature-oriented geometric methods", and they also belong to "object-oriented geometric methods". In such cases, this review selects the logically more appropriate category. In the following sections, geometric methods are studied based on this categorization.
Other methods of incorporating geometric information, like regression-based methods [32][33][34][35], manifold diffeomorphisms [36], and others, are also utilized in applications like image processing. These methods work on 2D objects and are difficult to extend to human-related analysis on 3D data, so they are not elaborated in this review.

Feature-Oriented Geometric Methods
A feature space is the space where an object is projected as a feature point. This section explores the geometric properties of such parameter spaces. Geometry can be exploited in a feature space either through the neighboring properties of feature points or through the geometric attributes and geometric properties of the space itself.

Distance-Based Methods
Similarities among features extracted from the raw data can be calculated. Distances between sample pairs are extracted and are used to denote geometric attributes. Distances are constructed using similarity measures. The authors in [12] generalized from the case of vector space inputs to the case of a manifold. Distances on manifolds were calculated as geodesic distances between the data [37].
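To illustrate why the choice of distance matters, the following sketch compares the Euclidean (chord) distance with the geodesic (great-circle) distance for points on a unit sphere. The sphere is an assumed toy manifold chosen because its geodesic distance has a closed form; it is not the data manifold of the cited works.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance through the ambient space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sphere_geodesic_distance(p, q):
    """Great-circle distance between two unit vectors on the unit sphere."""
    cosine = sum(a * b for a, b in zip(p, q))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return math.acos(cosine)


north = (0.0, 0.0, 1.0)
south = (0.0, 0.0, -1.0)

chord = euclidean_distance(north, south)      # cuts through the sphere
arc = sphere_geodesic_distance(north, south)  # travels along the surface
```

For the two poles, the chord has length 2, while the path along the surface has length $\pi$, so the two measures rank and scale point pairs differently.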

Positive Definite Manifold-Based Methods
Covariance matrices are used to capture representative features [38]. Covariance matrices describe the correlation between sampled data points. They are positive definite (PD) matrices and lie on PD manifolds. Temporal sequences can also be embedded in the PD manifold. For example, the authors in [15] built a temporal hierarchy of covariance descriptors for human action classification. Works on computing distances on the PD manifold include [39][40][41][42].
To analyze covariance descriptors, Euclidean geometry is often not appropriate; thus, methods using non-Euclidean metrics have been proposed, e.g., [42,43]. In particular, Gram and Hankel matrices [44,45] and Bregman divergences [29,38,46,47,48] have been successfully applied in a number of covariance descriptor-based applications. Methods considering dynamic information have also been proposed [44,45], in which dynamic information is denoted with Hankel matrices and sequences are compared using the angle between Hankelet subspaces. Other examples include [49], in which the authors extended the vector of locally aggregated descriptors (VLAD) to Riemannian manifolds.
In the special case of infinite dimensions, the authors in [50] extended covariance matrices into a Hilbert space.
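A minimal sketch of a covariance descriptor and a non-Euclidean distance between two such descriptors is given below. It uses the log-Euclidean distance as one common choice for PD matrices; this is an assumption made for illustration and is not necessarily the metric used in any of the cited works. The function names and the regularization constant `eps` are hypothetical.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance of an (n_samples, d) feature array, regularized to be strictly PD."""
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of a symmetric PD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Frobenius distance between matrix logarithms (log-Euclidean metric)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


rng = np.random.default_rng(0)
desc_a = covariance_descriptor(rng.normal(size=(100, 3)))
desc_b = covariance_descriptor(2.0 * rng.normal(size=(100, 3)))
distance = log_euclidean_distance(desc_a, desc_b)
```

The matrix logarithm flattens the PD manifold into a vector space, so ordinary norms can be applied afterwards; other metrics on PD matrices would replace only `log_euclidean_distance`.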

Kernels over a Manifold
Kernels provide mathematical formulations for covariance matrices. The applications of this type of method include dictionary learning and sparse coding [29,30,51].
Kernels over a manifold are usually defined on Riemannian manifolds, because the manifold is required to have a distance measure. A Riemannian metric on a manifold $M$ is a smoothly-varying inner product $\langle\cdot,\cdot\rangle$ on the tangent space $T_xM$ at each point $x \in M$. A Riemannian manifold is a manifold equipped with a Riemannian metric. Some works embed Riemannian manifolds into a reproducing kernel Hilbert space (RKHS). An RKHS is a linear space, so it is a simple and effective representation. There are also other types of kernels, for example the geodesic exponential kernel in [52], which provides a kernel-based solution for general Riemannian manifolds.

Moduli Space
For the specific task of classification, moduli space is a natural solution. Moduli spaces can be thought of as geometric solutions to geometric classification problems. Such spaces are the space of equivalence classes of complex structures, where two complex structures are deemed "the same" if they are equivalent by conformal mapping [53]. Two equivalent objects may look very different; but in a moduli space, equivalent objects have the same description, while inequivalent objects have different descriptions.

Object-Oriented Geometric Methods
In object-oriented methods, the geometric attributes of an object are extracted and studied.

Tangent Space-Based Methods
Tangent spaces (defined in Section 2.4.3) are associated with each point on a manifold. Some of the tangent space-based methods utilize mappings between the tangent space of the manifold and the manifold. The exponential map is a map from the tangent bundle of the manifold to the manifold, and the logarithmic map is its inverse wherever that inverse is defined. The exponential and logarithmic maps are illustrated in Figure 9. The authors in [12] used the Riemannian exponential and logarithmic maps to define a sparse representation on Riemannian manifolds. The formulation is a generalization of the linear sparsity condition to manifolds.
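The two maps have closed forms on the unit sphere, which makes it a convenient toy manifold for a sketch. The code below is an illustration under that assumption only; it is not the formulation of [12], and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def sphere_exp(p, v):
    """Exp_p(v): follow the geodesic from p with initial tangent velocity v."""
    t = norm(v)
    if t < 1e-12:
        return tuple(p)
    return tuple(math.cos(t) * a + math.sin(t) * b / t for a, b in zip(p, v))


def sphere_log(p, q):
    """Log_p(q): tangent vector at p pointing to q, with geodesic length.

    Not defined for antipodal points, where the geodesic is not unique.
    """
    c = max(-1.0, min(1.0, dot(p, q)))
    w = [b - c * a for a, b in zip(p, q)]  # part of q orthogonal to p
    n = norm(w)
    if n < 1e-12:
        return tuple(0.0 for _ in p)
    theta = math.acos(c)
    return tuple(theta * x / n for x in w)


p = (0.0, 0.0, 1.0)
q = (1.0, 0.0, 0.0)
v = sphere_log(p, q)    # tangent vector at p of length pi / 2
q_back = sphere_exp(p, v)  # returns to q up to rounding
```

Mapping points to a tangent space with the logarithmic map lets linear tools operate there, and the exponential map carries the results back to the manifold.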

Conformal Geometry-Based Methods
Computational conformal geometry is an interdisciplinary field combining computing and conformal geometry. A conformal mapping is an angle-preserving mapping, and computational conformal geometry develops algorithms for computing such mappings. The authors in [53] presented a thorough description of the theoretical foundations, as well as the practical algorithms, of computational conformal geometry. A widely-used application of conformal geometry is matching two object models. For example, the authors in [16] utilized it to find shape correspondences between two objects. A conformal map can take the interior of an n-gon P bijectively to that of another n-gon Q. This mapping can be utilized to embed 3D meshes onto a 2D plane. However, when this map is extended to the boundary, it does not necessarily map the vertices of P to those of Q. For many applications, it is important to identify the "best" vertex-preserving mapping between two polygons, i.e., one that minimizes the maximum angle distortion. Such maps exist, are unique, and are known as extremal quasiconformal maps or Teichmüller maps. This approach can be considered a conformal geometric method implemented in a greedy way.
Figure 9. Illustration of the exponential and the logarithmic maps. An example point $g$ on the manifold $M$ is mapped to a point on the tangent plane $T_eM$ using the logarithmic map $\mathrm{Log}_M(g)$. The exponential map $\exp_M(u)$ is the inverse of the logarithmic map.

Principal Geodesic Analysis
Principal geodesic analysis (PGA) is an extension of principal component analysis (PCA) to manifolds. PGA has applications in shape analysis [54], and probabilistic PGA has been used for human activity recognition [55].
Since the objective function in the PGA algorithm is highly non-linear and generally difficult to solve efficiently, the researchers who first introduced PGA [56] proposed a linear approximation. Exact computation is also possible under certain constraints. For example, the authors in [57] presented an exact computation of the PGA of data on the rotation group $SO(3)$. For constrained manifolds, like the constant curvature Riemannian manifolds in [58], the optimization in PGA can be computed efficiently. The authors in [59] also proposed an exact PGA computation method without any linearization for data with a large variance.
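One common way to realize a linear approximation of PGA is to compute an intrinsic mean, map the data to the tangent space at that mean with the logarithmic map, and run ordinary PCA there. The sketch below does this on the unit sphere. It is an illustration under these assumptions, not necessarily identical to the procedure in [56], and all function names are hypothetical.

```python
import numpy as np


def sphere_exp(p, v):
    t = np.linalg.norm(v)
    return p if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t


def sphere_log(p, q):
    c = np.clip(p @ q, -1.0, 1.0)
    w = q - c * p  # part of q orthogonal to p
    n = np.linalg.norm(w)
    return np.zeros_like(p) if n < 1e-12 else np.arccos(c) * w / n


def intrinsic_mean(points, iters=50):
    """Gradient-descent estimate of the intrinsic (Frechet) mean on the sphere."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        step = np.mean([sphere_log(mu, q) for q in points], axis=0)
        mu = sphere_exp(mu, step)
    return mu


def tangent_pca(points):
    """Linearized PGA: PCA of the log-mapped data in the tangent space at the mean."""
    mu = intrinsic_mean(points)
    logs = np.array([sphere_log(mu, q) for q in points])
    cov = logs.T @ logs / len(points)
    variances, directions = np.linalg.eigh(cov)
    order = np.argsort(variances)[::-1]  # largest variance first
    return mu, variances[order], directions[:, order]


# Samples spread along one great circle through the north pole.
angles = np.array([-0.4, -0.2, 0.0, 0.2, 0.4])
data = np.column_stack([np.sin(angles), np.zeros(5), np.cos(angles)])
mean, variances, directions = tangent_pca(data)
```

For this toy data, the leading tangent direction is aligned with the great circle on which the samples lie, and the corresponding variance is the mean squared geodesic offset from the mean.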

Routine-Based Geometric Methods
Following certain routines, geometric information can also be encoded. Reducing representation dimensions, representing objects with a graph, and topological data analysis are all utilized to encode geometric information.

Dimension Reduction-Based Methods
Dimension-reduced representations (also called embeddings) are utilized to study feature space properties [60]. Considering the geometric properties of the feature representation, some non-linear dimension reduction algorithms have been utilized, e.g., the Laplacian eigenmaps (LE) framework, which recovers the low-dimensional structure of the manifold in a projected space. Laplacian eigenmaps [61] use graphs to find the embedding of the data in a low-dimensional space.
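A compact sketch of the Laplacian eigenmaps idea follows: build a neighborhood graph, form its Laplacian, and use the eigenvectors with the smallest non-zero eigenvalues as coordinates. The k-nearest-neighbor graph, the Gaussian edge weights, and the parameter names are assumptions made for this illustration; the sketch also assumes distinct points and a connected graph.

```python
import numpy as np


def laplacian_eigenmaps(points, n_components=2, n_neighbors=5, sigma=1.0):
    """Embed points using eigenvectors of the graph Laplacian of a kNN graph."""
    x = np.asarray(points, dtype=float)
    n = len(x)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)

    # Symmetric k-nearest-neighbor graph with Gaussian weights.
    neighbors = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]  # skip self
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    weights = np.zeros((n, n))
    weights[rows, cols] = np.exp(-sq_dist[rows, cols] / (2.0 * sigma ** 2))
    weights = np.maximum(weights, weights.T)

    # Solve L y = lambda D y through the symmetrically normalized Laplacian.
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    l_sym = np.eye(n) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(l_sym)
    embedding = d_inv_sqrt[:, None] * eigvecs

    # Drop the trivial constant eigenvector (eigenvalue 0).
    return embedding[:, 1:n_components + 1], eigvals[1:n_components + 1]


theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle_3d = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(40)])
embedding, eigenvalues = laplacian_eigenmaps(circle_3d, n_components=2, n_neighbors=2)
```

In this example, the points sampled from a circle are again arranged on a circle in the two-dimensional embedding, because the embedding depends only on neighborhood relations rather than on ambient coordinates.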

Graph-Based Methods
Graphs are concise representations for structural data. Graphs consist of units and connections. Units are connected if certain criteria are met. One wide application of graphs is to construct a mesh model from point clouds, in which units are connected if the distance between a pair is below a threshold. After graphs are constructed, clustering is usually utilized to explore the geometrically-adjacent attributes. One method of clustering a point cloud is single linkage clustering and its extensions [74,75]. In the single-linkage clustering methods, a graph is constructed with the vertex set as the set of points in the cloud and the edges as point connections if their distance is less than a threshold.
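The threshold-graph construction described above can be sketched with a union-find structure: two points are connected whenever their distance is below the threshold, and the clusters are the connected components of the resulting graph. The function name and the brute-force pairwise loop are choices made for this illustration only.

```python
import math


def threshold_graph_clusters(points, threshold):
    """Connected components of the graph linking points closer than `threshold`."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


cloud = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
clusters = threshold_graph_clusters(cloud, threshold=0.6)
# clusters == [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are farther apart than the threshold but still share a cluster because they are chained through point 1, which is the characteristic behavior of single linkage.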
Under the assumption that high-dimensional data samples lie on or close to a smooth low-dimensional manifold, and the manifold can be approximated discretely as a graph, graphs can also be utilized to describe the low-dimensional intrinsic structure of the high-dimensional data. The emerging field of signal processing on graphs also facilitates the graph representation of signals [76].

Topological Data Analysis
In a broader perspective, topological data analysis (TDA) is an approach to analyzing data using topological methods [77,78] and is closely related to persistent homology, an adaptation of homology (defined in Appendix A.3.4) to point cloud data.
The TDA mentioned here refers to certain procedures for extracting topological properties from point cloud data. For example, the authors in [79] analyzed the geometric adjacency properties of an object and represented the object as a graph composed of nodes denoting key parts of the object. The graph considered the topological properties of the object. Topological properties are denoted by a topological network, i.e., a collection of nodes and a collection of edges connecting some of the nodes. Figure 10 shows the pipeline of the proposed method. TDA summarizes the data in a way that keeps its global structure and local details to some degree, which is missing in other analysis methods, such as principal component analysis (PCA), multidimensional scaling (MDS), and cluster analysis.

Geometric Method-Based Human-Related Analysis
For articulated objects, like human bodies, extrinsic properties are not capable of describing their intrinsic properties, like shapes and symmetric properties. Although suffering from topological noise, isometry-preserving properties are widely used for human-related analysis, e.g., the methods from [80][81][82][83][84][85] are utilized for human shape analysis, and the method from [86] is utilized for human shape recognition. Geometric methods are also isometry-preserving methods. In this section, geometric methods applied in human-related analysis are explored. Aiming at various application scenarios, different geometric methods are utilized including those introduced in Section 3. The methods in this section are classified based on the applications. There are also works with literature reviews for specific applications, e.g., mesh segmentation [87], shape analysis [88,89], or shape retrieval [90].

Human Shape Analysis
General shape analysis has a wider scope than HSA. Shape comparisons, computing shape summary statistics, mathematical modeling of shape variations, and shape synthesis are all included in general shape analysis. 3D human shape synthesis is plausible using general shape synthesis methods, and this review concentrates on analyzing the human models instead of editing them, so shape synthesis is not the focus of this review. Shape summary statistics and shape variation modeling-related methods are discussed in the human pose-related analysis subsection. In this section, shape comparisons are discussed.
In computer graphics, object shapes are usually compared through a metric, or the dissimilarity measure. Geodesics are important for computing distances between object samples in representation space (e.g., a shape space) or on the shape surface. Spectral analysis is one widely-used method for measuring 3D human shape geodesics. Spectral analysis is an analysis in terms of eigenvalues (e.g., heat kernel signature-based method in Section 4.1.1), frequency spectrum (e.g., the learned spectral descriptor-based method in Section 4.1.3), etc.
Furthermore, diffusion geometry has been studied and utilized to describe the intrinsic geometric properties of objects. In diffusion geometry, the distances between points are expressed through kernels, so that comparing points is transformed into a metric learning problem; various kernels are used, including the heat kernel, the wave kernel, etc.

Heat Kernel-Based Methods
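The behavior of a quantum particle on the manifold is modeled by the Schrödinger equation:
$$\left(i\Delta_M + \frac{\partial}{\partial t}\right)\psi(x, t) = 0, \qquad (1)$$
where $\psi(x, t)$ is the function capturing the particle behavior, and $\Delta_M$ is the Laplace-Beltrami operator (LBO) of $M$:
$$\Delta_M f = -\operatorname{div}(\nabla f), \qquad (2)$$
which is the divergence of the gradient, taken here with a negative sign so that the eigenvalues of $\Delta_M$ are non-negative. The divergence is the extent to which some quantity is exiting an infinitesimal region of a space, and the gradient is a multi-variable version of the derivative. The LBO is the generalization of the Laplacian to Riemannian manifolds. Given an initial heat distribution $f : M \to \mathbb{R}$, let $H_t(f)$ denote the heat distribution at time $t$, with $H_t = e^{-t\Delta_M}$. The heat kernel is based on the exponential function of the eigenvalues of the LBO [91]:
$$k_t(x, y) = \sum_{i} e^{-t\lambda_i}\,\phi_i(x)\,\phi_i(y), \qquad (3)$$
where $\lambda_i$ and $\phi_i$ are the eigenvalues and eigenfunctions of the LBO. The heat kernel signature (HKS) [92] is a dense descriptor constructed by considering the diagonal of the heat kernel:
$$k_t(x, x) = \sum_{i} e^{-t\lambda_i}\,\phi_i^2(x). \qquad (4)$$
It is also known as the autodiffusivity function. Additionally, the HKS of dimension $Q$ at point $x$ is defined by sampling the autodiffusivity function at some fixed times $t_1, \ldots, t_Q$:
$$\mathbf{f}(x) = (k_{t_1}(x, x), \ldots, k_{t_Q}(x, x))^T. \qquad (5)$$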
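A discrete sketch of the HKS is given below. It assumes that the LBO is replaced by a symmetric graph Laplacian $L = D - W$ of a mesh or neighborhood graph, which is a simplification; practical shape-analysis pipelines typically use a cotangent-weight discretization instead. The function name is hypothetical.

```python
import numpy as np


def heat_kernel_signature(laplacian, times):
    """HKS from a symmetric positive semi-definite Laplacian matrix.

    Returns an (n_vertices, len(times)) array whose entry (x, q) is
    sum_i exp(-t_q * lambda_i) * phi_i(x)^2, i.e., the heat kernel diagonal.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    decay = np.exp(-np.outer(eigvals, times))  # (n_eigen, n_times)
    return (eigvecs ** 2) @ decay


# Toy "shape": a cycle graph with six vertices and unit edge weights.
n = 6
weights = np.zeros((n, n))
for i in range(n):
    weights[i, (i + 1) % n] = weights[(i + 1) % n, i] = 1.0
laplacian = np.diag(weights.sum(axis=1)) - weights

hks = heat_kernel_signature(laplacian, times=[0.0, 1.0, 100.0])
```

On this symmetric toy graph every vertex receives the same signature, which reflects the fact that the HKS depends only on intrinsic structure; small time values capture local structure and large values capture global structure.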

Wave Kernel Signature-Based Methods
The wave kernel signature (WKS) evaluates the probability of a quantum particle being located at a point of a manifold under a certain energy distribution. The probability of finding the particle at point $x$ is given by:
$$p(x) = \sum_{i} \pi^2(\lambda_i)\,\phi_i^2(x), \qquad (6)$$
where $\lambda_i$ and $\phi_i$ are the eigenvalues and eigenfunctions of the LBO. The definition depends on the initial frequency distribution $\pi(\lambda)$. For example, the authors in [93,94] considered a log-normal frequency distribution $\pi_\nu(\lambda) \propto \exp\left(-\frac{(\log\nu - \log\lambda)^2}{2\sigma^2}\right)$ with mean frequency $\nu$ and standard deviation $\sigma$. The $Q$-dimensional WKS is defined as:
$$\mathbf{p}(x) = (p_{\nu_1}(x), \ldots, p_{\nu_Q}(x))^T, \qquad (7)$$
where $p_\nu(x)$ is the probability in Equation (6) corresponding to the initial log-normal frequency distribution with mean frequency $\nu$, and $\nu_1, \ldots, \nu_Q$ are some logarithmically-sampled frequencies.

Learned Spectral Descriptor-Based Methods
Under the proposition that the descriptor should consider the statistics of the corpus of shapes (for example, thin and fat human models) and those of the class of transformations (such as human pose variations), the authors in [95] proposed a learning scheme for the construction of optimized spectral descriptors and formulated the descriptor in a generic form:
$$f(x) = \sum_{k} \tau(\lambda_k)\, \phi_k^2(x), \qquad (8)$$
where $\tau(\lambda) = (\tau_1(\lambda), \ldots, \tau_Q(\lambda))^T$ is a bank of transfer functions acting on the LBO eigenvalues $\lambda_k$ (with eigenfunctions $\phi_k$), and the parametric transfer function:
$$\tau_q(\lambda) = \sum_{m=1}^{M} a_{qm}\, \beta_m(\lambda) \qquad (9)$$
is defined in terms of the B-spline basis $\beta_1(\lambda), \ldots, \beta_M(\lambda)$ and the parametrization coefficients $a_{qm}$ ($q = 1, \ldots, Q$, $m = 1, \ldots, M$). Plugging Equation (9) into Equation (8), the $q$-th component of the spectral descriptor is represented as:
$$f_q(x) = \sum_{k} \sum_{m=1}^{M} a_{qm}\, \beta_m(\lambda_k)\, \phi_k^2(x) = \sum_{m=1}^{M} a_{qm}\, g_m(x), \qquad g_m(x) = \sum_{k} \beta_m(\lambda_k)\, \phi_k^2(x),$$
where $g(x) = (g_1(x), \ldots, g_M(x))^T$ is a vector-valued function dependent only on the intrinsic geometry of the shape. Thus, Equation (8) is parametrized by the $Q \times M$ matrix $A = (a_{qm})$ and can be written in matrix form as $f(x) = A g(x)$. The main idea of [95] is to learn the optimal parameters $A$ by minimizing a task-specific loss, which reduces to Mahalanobis-type metric learning. Figure 11 visualizes the distances computed from the three kernels mentioned in this section, and Figure 12 shows the computed correspondences between two human models using the three kernels.
Figure 11. Three kernel-based distances visualized on human models. Visualized distances between the reference point (indicated by red arrows in the first column of each sub-group) and other points on the model. On the left, the reference point is the right wrist; in the middle, the belly; and on the right, the chest. The first row shows the results from the heat kernel, the second row shows the results from the wave kernel, and the third row shows the results of the kernel proposed in [95].
Figure 12. Heat kernel signature (HKS), wave kernel signature (WKS), and learned spectral descriptors for point matching between human models. Correspondences computed on TOSCA shapes with geodesic distance distortion below 10% of the shape diameter using the heat kernel signature, wave kernel signature, and learned spectral descriptor (from left to right) [
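The matrix form f(x) = A g(x) of the spectral descriptor in this section can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: a cycle-graph Laplacian stands in for the LBO of a shape, linear B-splines (hat functions) on uniform knots stand in for the B-spline basis, and a random matrix stands in for the learned parameters A. The helper names `hat_basis` and `geometry_vector` are hypothetical.

```python
import numpy as np

def hat_basis(lam, centers):
    """Linear B-spline (hat) functions beta_m evaluated at eigenvalues lam.

    Returns an array of shape (M, K) holding beta_m(lambda_k).
    Assumes uniformly spaced centers.
    """
    width = centers[1] - centers[0]
    dist = np.abs(lam[None, :] - centers[:, None]) / width
    return np.clip(1.0 - dist, 0.0, None)

def geometry_vector(evals, evecs, centers):
    """g(x) with g_m(x) = sum_k beta_m(lambda_k) * phi_k(x)^2, shape (N, M)."""
    B = hat_basis(evals, centers)             # (M, K)
    return (evecs ** 2) @ B.T                 # (N, K) @ (K, M)

# Toy stand-in for a shape: a cycle graph and its combinatorial Laplacian.
N = 12
shift = np.roll(np.eye(N), 1, axis=1)
L = 2.0 * np.eye(N) - shift - shift.T
evals, evecs = np.linalg.eigh(L)              # ascending eigenvalues, orthonormal eigenvectors

M, Q = 5, 3
centers = np.linspace(evals.min(), evals.max(), M)
G = geometry_vector(evals, evecs, centers)    # row x holds g(x)^T

rng = np.random.default_rng(0)
A = rng.standard_normal((Q, M))               # placeholder for the learned Q x M matrix
F = G @ A.T                                   # row x holds f(x)^T = (A g(x))^T
```

In this form, g(x) is computed once per shape, and only the small matrix A changes during learning.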

Human Pose-Related Analysis
Pose space deformation methods are widely used in human pose morphing. Based on the pose space deformation methods, model reduction has proven useful to increase the performance of static pose-space deformation both with [96][97][98] and without dynamics [99]. Given morphing targets, some works [96] constructed a single pose-independent basis by performing PCA on the sets of bases computed at the undeformed configuration. Others obtained the basis by performing PCA on full simulation data [97][98][99]. To accommodate large deformations, the basis can be improved using modal derivatives [100] or linear transformations of the basis [101].
Pose-space subspace methods are utilized in human pose representation to construct reduced-order models with pose-dependent bases [102]. A variant subspace is computed for each representative set, and these subspaces are further combined into a dynamic system.
In Euclidean space, adding two poses might result in a physically-infeasible pose. Methods for representing 3D human poses with Lie groups have been proposed to solve this problem [25]. Lie group theory provides a semantically meaningful space for adding and subtracting human poses.
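The difference between Euclidean and Lie group operations on poses can be sketched for a single joint rotation in SO(3). This is only an illustration under stated assumptions, not the representation used in [25]: the pose is reduced to one joint, and the helper names (`exp_so3`, `log_so3`, `blend`, `is_rotation`) are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: exponential map from a rotation vector to SO(3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Logarithm map from SO(3) to a rotation vector (valid for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    W = (R - R.T) * (theta / (2.0 * np.sin(theta)))
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def blend(R_a, R_b, t):
    """Geodesic interpolation between two joint rotations."""
    return R_a @ exp_so3(t * log_so3(R_a.T @ R_b))

def is_rotation(R):
    """Check orthogonality and unit determinant."""
    return bool(np.allclose(R.T @ R, np.eye(3), atol=1e-9)
                and np.isclose(np.linalg.det(R), 1.0))

R_a = np.eye(3)                                      # joint at rest
R_b = exp_so3(np.array([0.0, 0.0, 0.75 * np.pi]))    # joint rotated 135 degrees about z

euclidean_mid = 0.5 * (R_a + R_b)    # entry-wise average of the two matrices
geodesic_mid = blend(R_a, R_b, 0.5)  # halfway along the geodesic in SO(3)

print(is_rotation(euclidean_mid), is_rotation(geodesic_mid))
```

The entry-wise average leaves the rotation group, whereas the geodesic midpoint is again a valid joint rotation.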

Human Action-Related Analysis
Human actions are recognizable from both still images and videos (or image sequences). When dealing with videos (or image sequences), temporal information helps to improve the action recognition accuracy.

Relative 3D Geometry-Based Methods for Human Action Recognition
Many of the skeleton-based approaches for human action recognition use joint locations and joint angles to represent human poses. For example, the authors in [103] introduced a family of skeletal representations for HAR. The proposed family of features uses the relative 3D rotations between various body parts. The features are split into two groups: four transformation-based features and two rotation-based features. Using the proposed representations, human actions are modeled as curves in the relative 3D geometry (R3DG) feature space (illustrated in Figure 13). Action recognition is then performed by classifying these curves with a combined method of dynamic time warping, Fourier temporal pyramid representation, and support vector machines.
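The dynamic time warping step of such a pipeline can be sketched as follows. This is a textbook DTW with a Euclidean frame distance, shown only to illustrate how two action curves of different lengths are aligned; the Fourier temporal pyramid and SVM stages are not covered, and the toy curves are made up for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping cost between two curves.

    a: array of shape (n, d), b: array of shape (m, d); rows are frames.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

t = np.linspace(0.0, 1.0, 20)
curve = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
slow = np.repeat(curve, 2, axis=0)        # same motion, every frame held twice
other = np.stack([t, t ** 2], axis=1)     # a different motion

print(dtw_distance(curve, slow), dtw_distance(curve, other))
```

The slowed-down copy aligns with zero cost, which is the invariance to execution speed that motivates DTW in these pipelines.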

Matrix Embedding for 3D Human Action Recognition
Hankel matrices carry useful invariant properties, e.g., the rank of the Hankel matrix measures the complexity of the underlying dynamics [45]. Despite their advantages, Hankel matrices are not robust against noise. The authors in [104] embedded the sequences into a Riemannian manifold by using positive definite regularized Gram matrices of their Hankelets. Gram matrices inherit the rank and invariance properties of the associated Hankel matrices. Furthermore, Gram matrices are confined to the positive semi-definite (PSD) manifold and capture the underlying geometry better than directly comparing the sequences or Hankel matrices.
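A minimal sketch of the Hankel and Gram constructions discussed above is given below. The block size, the Frobenius normalization, and the regularization weight `sigma` are illustrative assumptions and are not taken from [104]; the helper names are hypothetical.

```python
import numpy as np

def block_hankel(seq, num_block_rows):
    """Block Hankel matrix of a sequence seq with shape (T, d).

    Column j stacks frames j, j+1, ..., j+num_block_rows-1.
    """
    T, d = seq.shape
    num_cols = T - num_block_rows + 1
    cols = [seq[j:j + num_block_rows].reshape(-1) for j in range(num_cols)]
    return np.stack(cols, axis=1)         # shape (num_block_rows * d, num_cols)

def regularized_gram(H, sigma=1e-3):
    """Normalized Gram matrix H H^T plus sigma * I (assumed regularization)."""
    G = H @ H.T
    G = G / np.linalg.norm(G, "fro")
    return G + sigma * np.eye(G.shape[0])

t = np.arange(40)
seq = np.stack([np.sin(0.3 * t), np.cos(0.3 * t)], axis=1)   # one sinusoid, 2D observations
H = block_hankel(seq, num_block_rows=5)
G = regularized_gram(H)

print(H.shape, np.linalg.matrix_rank(H), np.linalg.eigvalsh(G).min() > 0)
```

The toy sequence is generated by a second-order linear system, so its Hankel matrix has rank 2, while the regularized Gram matrix is strictly positive definite.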

Graph-Based Human Action Recognition
Graph-based algorithms have been widely used for action recognition in conventional RGB videos [105][106][107]. Interesting works include graph representations for high-level features. For example, the authors in [108] proposed a graph representation for skeleton-based 3D action recognition. Each node of the graph is modeled as a motionlet, which is a semantic part of the trajectory of a joint. Each edge is labeled with the spatiotemporal relationship between the connected motionlets. Constructed graphs are decomposed into substructures called subgraphs, and these subgraphs are compared based on a proposed graph kernel named the subgraph-pattern graph kernel (SPGK). The proposed kernel considers both spatial and temporal information. To circumvent the NP-hard problem of extracting all subgraph patterns from a graph, the authors reformulated the kernel using dynamic programming.

Lie Group-Based Human Action Recognition
Given human skeletal representations in a Lie group, human actions can be represented as curves in this Lie group. The authors in [109] used this type of method. First, a skeletal representation was proposed to explicitly model the 3D geometric relationships between various body parts using rotations and translations in the 3D space. The proposed skeletal representation lies in the Lie group $SE(3) \times \cdots \times SE(3)$, which is a curved manifold. Using the proposed representation, human actions can be modeled as curves in this Lie group. Due to the difficulty of classifying curves in the Lie group, the action curves from the Lie group are mapped to its Lie algebra, which is a vector space. Then, classification is performed with a combined method of dynamic time warping, Fourier temporal pyramid representation, and linear SVM.
The authors in [110] used a similar pipeline of first representing skeletons with Lie groups and then classifying the actions, represented as curves, in Lie groups. Specifically, each skeleton is represented using the relative 3D rotations between various body parts. The skeletal representation is a point in the Lie group $SO(3) \times \cdots \times SO(3)$. Then, using this representation, human actions are modeled as curves in this Lie group. The action curves are mapped onto its Lie algebra by combining the logarithm map with rolling maps, and classification is performed in the Lie algebra.

Dynamic Manifold Warping for Human Action Recognition
For temporal misalignment problems on a manifold, dynamic time warping algorithms are adapted for solving human action recognition problems. For example, the authors in [111] proposed a spatiotemporal manifold (STM) model to analyze human action trajectories with latent spatial structure. Action sequences were aligned with respect to latent parameters, which encoded a path as a point moving on a manifold from a starting point with a parameter value of zero to an ending point with a parameter value of one. In addition, a motion similarity metric was proposed for human action sequences, both in 2D and 3D.

Geometric Deep Learning for Human-Related Analysis
Deep learning has achieved remarkable performance breakthroughs in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures perform well on many image analysis tasks such as classification [112], segmentation [113][114][115], regression [116], and synthesis tasks [117]. A convolution can be thought of as a template matching with filters, and convolution operations on a whole image are carried out by a sliding window procedure. In the case of processing images, one extracts a patch of pixels within a window, correlates it with a template, and moves the window to the next position. Recently, geometric deep learning [118,119] has been the focus of considerable research attention (http://geometricdeeplearning.com/, https://sites.google.com/site/deepgeometry/), while literature reviews on specific applications remain absent.
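The sliding-window view of convolution described above (extract a patch, correlate it with a template, move the window) can be sketched in a few lines. The function name and the toy image are illustrative assumptions; the loop is written for clarity rather than speed.

```python
import numpy as np

def correlate2d_valid(image, template):
    """Slide the template over the image and correlate at every valid position."""
    H, W = image.shape
    h, w = template.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + h, j:j + w]          # pixels inside the current window
            out[i, j] = np.sum(patch * template)     # correlation with the template
    return out

image = np.zeros((6, 6))
image[2:4, 3:5] = 1.0                 # a bright 2 x 2 blob
template = np.ones((2, 2))            # template matching that blob

response = correlate2d_valid(image, template)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
print(response.shape, peak)
```

The response peaks where the window covers the blob exactly, which is the template-matching interpretation of a convolutional filter.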
In this section, geometric deep learning methods and their applications in human-related analysis are studied extensively. Based on how geometric information is utilized, by directly applying traditional convolution operations to geometric objects or by redefining convolution operations and traversing methods on manifolds, geometric deep learning is classified into extrinsic deep learning methods and intrinsic deep learning methods. Figure 14 compares these two types of methods implemented with CNN. Extrinsic CNN (the left subfigure in Figure 14) extends the traditional convolution operation from 2D to 3D and performs convolution using 3D templates, shown as the cube in the figure. In contrast, intrinsic CNN (the right subfigure in Figure 14) defines convolution on the manifold, i.e., along the object surface, and the dimensions of the convolution operations can be considered as 2D. Feature pooling is also an important module in the deep learning architecture, and it is crucial for dimension reduction. Therefore, Section 5.1 introduces feature pooling methods, and the rest of this section explains various ways to define convolutions.

Geometric Feature Pooling
Feature pooling is a key component for reducing representation dimensions. The two prevailing pooling techniques, namely average pooling and max pooling, are not theoretically optimal due to the unrecoverable loss of spatial information. The authors in [121] proposed generalizing previous pooling methods towards a weighted p-norm spatial pooling function tailored for class-specific feature distributions. Specifically, the pooled features are weighted by the image location of a specific visual word. The original method was proposed under the bag of words (BoW) pipeline, but theoretically, it can be adapted to the deep learning architecture.
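How a weighted p-norm interpolates between average and max pooling can be seen in a small sketch. The function below is a generic weighted p-norm written for illustration; the class-specific, location-dependent weights learned in [121] are replaced here by uniform weights, which is an assumption.

```python
import numpy as np

def weighted_p_norm_pool(features, weights, p):
    """Pool non-negative features of shape (n_locations, n_channels) per channel.

    weights: non-negative array of shape (n_locations,) summing to one.
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * features ** p).sum(axis=0) ** (1.0 / p)

feats = np.array([[0.1, 0.9],
                  [0.4, 0.2],
                  [0.7, 0.3],
                  [0.2, 0.6]])
uniform = np.full(len(feats), 1.0 / len(feats))

avg_like = weighted_p_norm_pool(feats, uniform, p=1)     # equals average pooling
max_like = weighted_p_norm_pool(feats, uniform, p=200)   # approaches max pooling

print(avg_like, max_like)
```

With p = 1 the function reduces to average pooling, and for large p it approaches max pooling; non-uniform weights would additionally emphasize selected spatial locations.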

Extrinsic Deep Learning
Deep CNNs have recently been adapted to process 3D data by generalizing standard 2D convolutions to 3D. These methods of treating geometric data are called extrinsic methods. Their applications include processing 3D geometric shapes, for example, 3D object detection from RGB-D data [122], object classification of point cloud data [123], 3D object local feature matching [124], and 3D deformation flows [125].

Volumetric CNN for Shape Analysis
A natural extension to the classic CNN that processes 2D images is to process 3D data using a volumetric representation and perform 3D convolutions. The authors in [28] presented a 3D deep learning framework for modeling shapes using a voxel representation for 3D object shapes, called ShapeNets. The approach represents a geometric 3D shape as a probabilistic distribution in a voxel grid, and a convolutional deep belief network is used to learn the joint distribution of all voxels. The dataset and the source code are available (http://3DShapeNets.cs.princeton.edu). This generic shape analysis algorithm is applicable to human body models.
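The voxel representation that volumetric CNNs operate on can be sketched as follows. This is a plain binary occupancy grid built from a point cloud; the grid resolution, the normalization to the bounding cube, and the function name `voxelize` are illustrative assumptions rather than the preprocessing of [28].

```python
import numpy as np

def voxelize(points, resolution=16):
    """Binary occupancy grid of shape (resolution,) * 3 from points of shape (N, 3)."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    extent = (points.max(axis=0) - lo).max()
    extent = extent if extent > 0 else 1.0     # guard against a degenerate cloud
    # Scale into [0, resolution - 1] while preserving the aspect ratio.
    idx = np.floor((points - lo) / extent * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

rng = np.random.default_rng(0)
pts = rng.standard_normal((2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # points sampled on a unit sphere

grid = voxelize(pts, resolution=16)
print(grid.shape, int(grid.sum()))
```

A grid of this kind can be fed to 3D convolutions in the same way that an image is fed to 2D convolutions, at the cost of memory that grows cubically with the resolution.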

Geometric Constrained Extrinsic CNN for Human Shape Analysis
Instead of adapting the convolution operations in the network, geometric information can also be incorporated through other measures. Traditional classification neural networks tend to separate surface points lying in different but nearby classes, which results in ambiguous point categories at the segmentation boundaries. To solve this problem, the authors in [126] proposed smoother feature representations. The CNN consists of descriptor extraction layers followed by a classification layer; removing the classification layer after training leaves the descriptor extraction network. This architecture is widely used for feature extraction. Extracted features are then fused with an ensemble of classification tasks. To ensure descriptor smoothness, the authors proposed randomizing the dense-label generation procedure. Specifically, multiple segmentations of the same person were considered (shown in Figure 15), and a classification problem was introduced for each. The source code and the dataset are available (https://github.com/halimacc/DenseHumanBodyCorrespondences).

Intrinsic Deep Learning
Alternatively, the convolution operations and the way the convolution traverses the whole object can be redefined on a manifold. Methods of this type are called intrinsic methods.

Spatial-Domain Geometric CNN for Human Shape Analysis
A straightforward method for defining an intrinsic equivalent of a convolution is through the spatial domain. One method is to consider local receptive fields, in which the grid is replaced by a weighted neighborhood. Figure 16 shows an exemplary construction of a spatial-domain geometric CNN.
Figure 16. Spatial construction of geometric CNN. $K$ scales are considered ($K = 2$ in the example). $\Omega_k$ is defined as a partition of $\Omega_{k-1}$ into $d_k$ clusters. Each layer of the network transforms an $f_{k-1}$-dimensional signal indexed by $\Omega_{k-1}$ into an $f_k$-dimensional signal indexed by $\Omega_k$. The figure is originally from [127].
Another approach utilizes local polar coordinate systems. The authors in [128] defined the patch operator as a combination of Gaussian weights defined on a local polar system of coordinates (shown in Figure 17). After extracting the local geodesic coordinate system, the geodesic patch operator is defined as:
$$(D(x) f)(\theta, \rho) = \int_X w_\theta(x, \xi)\, w_\rho(x, \xi)\, f(\xi)\, d\xi,$$
where $w_\theta$ and $w_\rho$ are the angular weight and the radial weight, respectively. An angular max pooling was used due to the difficulties of fixing the angular axes at each sampled point, leading to the following definition of the geodesic convolution:
$$(f \star a)(x) = \max_{\Delta\theta \in [0, 2\pi)} \int a(\theta + \Delta\theta, \rho)\, (D(x) f)(\theta, \rho)\, d\rho\, d\theta,$$
where $a(\theta, \rho)$ denotes the filter applied to the patch. Furthermore, Fourier transform layers and covariance layers are also defined to transform signals into the frequency domain and inspect the global features from all input dimensions.

Spectral Analysis-Based Intrinsic CNN
Another type of method generalizes the convolution operator through spectral analysis. A fundamental result of classical Euclidean signal processing states that the Fourier transform diagonalizes the convolution operator [119]. Convolutions may therefore be extended to general manifolds by finding the corresponding basis. In the case of graph representations, the convolution operator can be carried out with the spectrum of the graph Laplacian. For example, in [127], convolution operations are defined as follows: for each layer $k = 1, \ldots, K$, an input vector $x_k$ of size $|\Omega| \times f_{k-1}$ is transformed into an output $x_{k+1}$ of dimensions $|\Omega| \times f_k$:
$$x_{k+1,j} = h\left( V \sum_{i=1}^{f_{k-1}} F_{k,i,j}\, V^T x_{k,i} \right), \quad j = 1, \ldots, f_k,$$
where $F_{k,i,j}$ is a diagonal matrix, $V$ is composed of the eigenvectors of the Laplacian, and $h$ is a real-valued non-linear function. In addition, filters with constant spatial support are obtained by choosing specific sampling steps in the spectral domain.
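A single spectral layer of the kind described in this section can be sketched with NumPy. The sketch assumes a small path graph in place of a mesh, random numbers in place of learned spectral multipliers, and tanh as the non-linearity; the function name `spectral_conv_layer` is hypothetical.

```python
import numpy as np

def spectral_conv_layer(x, V, F, h=np.tanh):
    """One spectral layer on a graph with n vertices.

    x: (n, f_in) input signal.
    V: (n, n) eigenvectors of the graph Laplacian, stored as columns.
    F: (f_in, f_out, n) diagonals of the spectral multipliers.
    Returns an (n, f_out) output signal.
    """
    x_hat = V.T @ x                                 # graph Fourier transform, (n, f_in)
    y_hat = np.einsum("ijn,ni->nj", F, x_hat)       # per-frequency filtering and channel mixing
    return h(V @ y_hat)                             # inverse transform, then non-linearity

n = 8
A = np.diag(np.ones(n - 1), 1)
A = A + A.T                                         # adjacency of a path graph
L = np.diag(A.sum(axis=1)) - A                      # combinatorial graph Laplacian
_, V = np.linalg.eigh(L)

rng = np.random.default_rng(0)
x = rng.standard_normal((n, 2))                     # two input channels
F = rng.standard_normal((2, 3, n))                  # placeholder for learned filters
y = spectral_conv_layer(x, V, F)
print(y.shape)
```

Because the filters act on all eigenvectors at once, each output value depends on the whole graph, which is the lack of spatial localization that localized spectral methods try to address.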

Localized Spectral CNN for Human Shape Analysis
One drawback of spectral analysis is the difficulty of spatial localization. Spectral analysis is global because the basis functions are global. Some studies achieve spatial localization through operations in the spectral domain. In [129,130], these operations were achieved through the windowed Fourier transform in the spectral domain.
The windowed graph Fourier transform (WGFT) of a signal $f$ [129,130] can be defined through the filtering signal $g$: $(S f)(x, k) := \langle f, g_{x,k} \rangle$, where $g_{x,k}(n)$ is a windowed element centered at vertex $x$ and frequency $k$:
$$g_{x,k}(n) = \chi_k(n) \sum_{l} \hat{g}(\lambda_l)\, \chi_l(x)\, \chi_l(n).$$
Then, WGFT can be reformulated as:
$$(S f)(x, k) = \sum_{l} \hat{g}_l\, \chi_l(x)\, \langle f, \chi_l \chi_k \rangle, \qquad \hat{g}_l = \hat{g}(\lambda_l).$$
The WGFT $(S f)(x, k)$ filters signal $f$ at point $x$ at frequency $k$ with a window defined by $\hat{g}_l$. By collecting its behavior over different frequencies, the content of signal $f$ in a local support around $x$ is extracted, thus reproducing the window extraction on images. The localized spectral convolution layer can thus be defined as:
$$f_q^{\mathrm{out}}(x) = \sum_{p=1}^{P} \sum_{k=1}^{K} w_{q,k,p}\, |(S f_p)(x, k)|,$$
where $f_p$ ($p = 1, \ldots, P$) is the input signal, $W = (w_{q,k,p})$ is a $Q \times K \times P$ tensor representing the learnable weights, and $f_q^{\mathrm{out}}$ ($q = 1, \ldots, Q$) is the output signal.

Heat Diffusion CNN for Human Shape Analysis
The heat diffusion equation is also used for extending traditional CNNs to a manifold. Heat diffusion measures heat diffused on a manifold. The heat propagation on a shape $X$ is governed by the heat diffusion Equation (1). Given an initial heat distribution that is a delta function centered at $x$, the heat distribution on $X$ after some time $t$ is represented by the heat kernel $h_t(x, \cdot)$. The heat kernel, as formulated in Equation (3), is isotropic. Generalized heat diffusion is described by the anisotropic diffusion equation:
$$f_t(x, t) = \mathrm{div}_X\left( A(x)\, \nabla_X f(x, t) \right),$$
where $\nabla_X$ and $\mathrm{div}_X$ denote the intrinsic gradient and divergence operators and $f(x, t)$ is the temperature at point $x$ at time $t$. The thermal conductivity matrix $A(x)$ specifies the heat conductivity properties at each point on shape $X$. The general diffusion model can be utilized for shape analysis [131]. The authors in [120] defined the thermal conductivity matrix as:
$$A_{\alpha\theta}(x) = R_\theta(x) \begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix} R_\theta(x)^T,$$
where the matrix $R_\theta(x)$ performs a rotation of $\theta$ w.r.t. the reference direction (e.g., the maximum curvature) and $\alpha > 0$ is a parameter controlling the degree of anisotropy.
In the spectral domain, the anisotropic heat kernel is given by:
$$h_{\alpha\theta t}(x, \xi) = \sum_{k} e^{-t \lambda_{\alpha\theta,k}}\, \phi_{\alpha\theta,k}(x)\, \phi_{\alpha\theta,k}(\xi),$$
where $\phi_{\alpha\theta,k}(x)$ and $\lambda_{\alpha\theta,k}$ are the eigenfunctions and eigenvalues of the anisotropic Laplacian $\Delta_{\alpha\theta} = -\mathrm{div}(A_{\alpha\theta}(x) \nabla)$. In [120], such kernels were used as the weighting functions for the construction of the patch operator:
$$(D_\alpha(x) f)(\theta, t) = \frac{\int_X h_{\alpha\theta t}(x, \xi)\, f(\xi)\, d\xi}{\int_X h_{\alpha\theta t}(x, \xi)\, d\xi}.$$
Similar to the spectral analysis-based intrinsic CNN, heat diffusion CNN is composed of sequentially stacked layers, i.e., the output of the previous layer is used as the input to the subsequent layer, and the convolution operation is replaced by a layer tailored for heat diffusion.

A Unified Spatial-Domain Geometric Deep Learning Architecture for Human Shape Analysis
The authors in [132] proposed a unified geometric CNN generalizing the CNN to non-Euclidean domains. Instead of using fixed handcrafted weight functions, parametric kernels with learnable parameters were proposed. In particular, a Gaussian kernel with learnable parameters was used:
$$w_j(u) = \exp\left( -\tfrac{1}{2} (u - \mu_j)^T\, \Sigma_j^{-1}\, (u - \mu_j) \right),$$
where $u$ denotes the $d$-dimensional local coordinates of a neighboring point, and $\Sigma_j$ and $\mu_j$ are a learnable $d \times d$ covariance matrix and a learnable $d \times 1$ mean vector, respectively. Various non-Euclidean CNN methods previously proposed in the literature can be considered as particular instances of the proposed framework.
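A Gaussian kernel with per-kernel mean and covariance, used as the weighting function of a patch operator, can be sketched as follows. The parameters are fixed by hand here rather than learned, and the input `u` (local coordinates of the neighbors of one vertex) as well as the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_weights(u, mu, sigma):
    """w_j(u) = exp(-0.5 * (u - mu_j)^T Sigma_j^{-1} (u - mu_j)).

    u: (n, d) local coordinates of n neighbors.
    mu: (J, d) kernel means; sigma: (J, d, d) covariance matrices.
    Returns weights of shape (n, J).
    """
    diff = u[:, None, :] - mu[None, :, :]               # (n, J, d)
    sigma_inv = np.linalg.inv(sigma)                    # batched inverse, (J, d, d)
    maha = np.einsum("njd,jde,nje->nj", diff, sigma_inv, diff)
    return np.exp(-0.5 * maha)

def patch_operator(f, u, mu, sigma):
    """Weighted sum of neighbor features f (n, c) for each of the J kernels."""
    return gaussian_kernel_weights(u, mu, sigma).T @ f  # (J, c)

mu = np.array([[0.0, 0.0], [1.0, 0.0]])
sigma = np.stack([np.eye(2), np.eye(2)])
u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
f = np.array([[1.0], [2.0], [3.0]])

w = gaussian_kernel_weights(u, mu, sigma)
patch = patch_operator(f, u, mu, sigma)
print(w.shape, patch.shape)
```

In a trainable version, `mu` and `sigma` would be optimized together with the filter coefficients applied to `patch`.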

Geometric Structures over Deep Learning for Human Action Recognition
There are also studies on learning geometric structures over CNN. For example, the authors in [133] proposed a deep discriminative structured model, namely convolutional neural random fields (CNRFs), and applied it to the action recognition problem. In the proposed model, a spatiotemporal CNN was developed for feature learning from input image frames, and the CNN was combined with conditional random fields (CRFs) for capturing the interdependencies between outputs. The parameters from both CRF and CNN were learned in a joint fashion, which enabled structured prediction and feature learning.

Generalized Geometrics for Human-Related Analysis
General information denoting spatial or temporal distributions and attribute occurrences can be considered geometric information in a generalized sense. Such information is also useful for boosting HRA. It is named generalized geometrics in this review, classified into three sub-categories, and introduced in this section.

Spatial Geometrics for Human Pose-Related Analysis and Human Action-Related Analysis
In HPA, spatial geometrics can be encoded as local structural features. The authors in [134] proposed a local joint structure as a complement for global features of individual body part locations and combined the two features for posture description. Local joint structures, specifically the triangle area of the three consecutively adjacent joints, were computed. It is a complement for body part locations in the sense that body part locations denote a single body part, while the proposed joint structure contains relative joint positions. Then, classification was performed with a combined method of dynamic time warping, Fourier temporal pyramid representation, and linear SVM.
Some works directly explored the neighboring properties. For example, the authors in [135] proposed a geometric correspondence feature named the Trisarea feature. It describes neighboring properties between human body joints and is defined as the area of the triangle formed by three joints. This feature is utilized to identify human poses, of which variations over time capture the characteristics of human action.
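The triangle-area features described above can be sketched with a cross product. The joint names and coordinates below are hypothetical; the sketch only shows how the area of a triangle spanned by three 3D joints is computed and tracked over frames.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    """Area of the triangle spanned by three 3D joint positions."""
    return 0.5 * np.linalg.norm(np.cross(p2 - p1, p3 - p1))

# Hypothetical shoulder, elbow, and wrist positions over three frames (in meters).
shoulder = np.array([[0.0, 0.0, 0.0]] * 3)
elbow = np.array([[0.3, 0.0, 0.0]] * 3)
wrist = np.array([[0.6, 0.0, 0.0],     # arm fully extended: joints are collinear
                  [0.5, 0.2, 0.0],     # arm slightly bent
                  [0.3, 0.3, 0.0]])    # arm bent at a right angle

areas = np.array([triangle_area(s, e, w) for s, e, w in zip(shoulder, elbow, wrist)])
print(areas)
```

The area is zero for a straight arm and grows as the arm bends, so its variation over time reflects the motion of the limb.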
Furthermore, in HAR, spatial geometrics can be encoded in features. For example, relative positions, distances between body joints, etc., are effective spatial geometric features. In [136], features from 3D skeleton data were processed separately by LSTM and CNN to conduct effective recognition with late fusion. Spatial features such as relative position, the distance between joints, and distances between joints and lines were explored, while temporal features such as the joint distances map and the joint trajectories map were studied. Spatial features were fed into LSTM, and temporal features were fed into CNN for recognizing actions.
Another way of encoding spatial geometrics is through modeling the co-occurrence of actions. The co-occurrence of actions was modeled in a probabilistic way without supervision in [137]. Videos containing human actions are considered as a sequence of short-term action clips (action words), and an activity is considered as a set of action topics indicating which actions are present in the video. A probabilistic model relating the action words and the action topics was proposed. It modeled long-range action relations that exist in the complex activity. The model was applied to unsupervised action segmentation and recognition and to detect forgotten actions, namely action patching.

Temporal Geometrics for Human Action Recognition
Temporal geometrics can be encoded by directly modeling the dynamics in a geometric space. For example, the authors in [138] proposed a second-order stochastic dynamical model in the state space (a Riemannian manifold) of articulated objects and derived equations of a Riemannian extended Kalman filter to perform the structure estimation from an image sequence captured by a camera from one perspective. The proposed model was proven by the authors to be locally weakly observable.
Furthermore, motion dynamics can be described in the original feature space. One widely-used measure is through scene flow. Scene flow describes the motion of 3D objects in the real world and implicitly describes the geometry of the 3D objects in motion. Scene flow can be considered as an optical flow fused from multiple cameras. The authors in [27] proposed the extraction and use of scene flow for action recognition from RGB-D data.

Spatial-Temporal Geometrics for Action Segmentation and Action Recognition
Action segmentation algorithms mine temporal segments containing actions from untrimmed videos. By incorporating a spatial component that represents the relationships between objects and a temporal component to capture object relationships across time, the method in [139] achieved better performances.
For action recognition problems, spatiotemporal information can be extracted through feature extraction and network extraction. The authors in [140] presented SkeletonNet, a deep learning framework for skeleton-based 3D action recognition. Cosine distance (CD) and normalized magnitude (NM) features were proposed and extracted from each frame of the skeleton sequence. Instead of treating the features of all frames as a time series, the authors fed extracted features to the proposed deep learning network, which contained two streams, one to extract the general features from the CD feature, while the other processed the NM feature. Outputs from the two streams were concatenated and processed by a fully-convolutional layer and then classified.
Furthermore, spatiotemporal information can be considered by modifying the deep learning network structure. For example, the authors in [141] proposed a differential gating scheme for a long short-term memory (LSTM) neural network and incorporated the spatial dynamics in action motions. The information gain was achieved by the derivative of states (DoS). The LSTM neural network utilizes three types of gating schemes for learning representations from long input sequences. The proposed method considered spatial information by incorporating DoS from the previous state into the input and forget gate and DoS from the current state into the output gate (as shown in Figure 18). Another example is [142], in which the authors extended the RNN-based methods from temporal domains to spatiotemporal domains and applied them to analyze action-related information within the input data.
The spatiotemporal information can also be learned from channels other than RGB data. The authors in [26] combined spatiotemporal geometric features from depth images and joint positions to solve human action recognition problems. The method learned spatiotemporal features by constructing a 3D-based deep CNN (3D$^2$CNN) for depth sequences. Depth images and joint positions were processed separately and fused in a later stage. Furthermore, spatiotemporal discrimination can be utilized to recognize human actions at different speeds. For example, the authors in [143] achieved this through considering spatiotemporal discrimination and action speed variations.

Validation Datasets
Publicly available datasets for validating HRA are collected and classified according to their data types and targeted applications: 3D human datasets, composed of 3D human models mainly for human shape analysis; 3D human action datasets with 3D data for human action analysis; RGB-D people datasets, composed of RGB-D data for people detection and people tracking; RGB-D human pose datasets for human pose analysis; and RGB-D human action datasets with RGB-D data for human action analysis.

3D Human Datasets
In this section, public datasets on 3D humans are collected. These datasets are utilized to validate applications such as shape analysis, including deformable shape matching and shape retrieval. 3D human data with noise and partial 3D human model analysis are also considered.

KIDS Dataset
This dataset (https://vision.in.tum.de/data/datasets/kids) consists of two shape classes ("kid" and "fat kid", as shown in Figure 19) under different poses, where the same poses are applied to both classes. The 3D shapes undergo nearly isometric and within-class deformations. All shapes in the dataset are given in OFF format and have around 60k vertices and consistent triangulations.

ShapeNet
ShapeNet is a well-maintained, large-scale dataset of 3D shapes. ShapeNet is composed of several subsets: (1) ShapeNetCore [144], covering 55 common object categories (approximately 51,300 unique 3D models), including the 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset; (2) ShapeNetSem [145], including 12,000 models of 270 categories annotated with manually-verified category labels, consistent alignments, real-world dimensions, estimates of their material composition at the category level, and estimates of their total volume and weight.

3D Shape Dataset with Noise
This dataset (https://vision.in.tum.de/data/datasets/topkids) consists of a collection of 3D shapes under deformations including topological changes [20]. Ground-truth matching to the null shape is provided for all shapes, but not all vertices have a match due to topological changes.

Partial Shape Dataset
The Partial Shape Dataset (https://vision.in.tum.de/data/datasets/partial) includes two datasets: the cuts dataset with 456 partial shapes and the holes dataset with 684 partial shapes. These two datasets exemplify different kinds of partiality: the cuts dataset contains shapes with a single cut, while the holes dataset contains shapes with irregular holes and multiple cuts. Examples from the dataset are shown in Figure 21. The datasets provided can be used for deformable 3D shape matching and retrieval under partiality transformations [18].

HumanEva Dataset
The HumanEva-I and HumanEva-II (http://humaneva.is.tue.mpg.de/) datasets were obtained from a motion capture system. The HumanEva-I dataset contains seven calibrated video sequences (four grayscale and three color) with synchronized 3D body poses. The dataset has four subjects performing six actions, including "walking", "jogging", "gesturing", etc. The dataset is split into training, validation, and testing sets. Error measurements for the 2D and 3D poses are also provided.

RGB-D People Datasets
The RGB-D People Datasets (http://www2.informatik.uni-freiburg.de/~spinello/RGBD-dataset.html) contain people in RGB-D Kinect data with annotations. The datasets are composed of more than 3000 RGB-D frames, which mostly show upright walking and standing persons under different occlusion conditions. The dataset has been re-annotated in [150,151]. Examples from the dataset are shown in Figure 22.

RGB-D Human Tracking Dataset
There are five validation videos with ground-truths, and 95 evaluation videos in the RGB-D Human Tracking Dataset (http://tracking.cs.princeton.edu/dataset.html). Captured by Kinect v1, each sequence has its RGB images and depth images. Captured videos contain moving objects such as humans, balls, and cars and are labeled with per-frame bounding boxes covering only the target object. The authors in [152] presented a quantitative comparison of various algorithms on this dataset. Examples and annotations from the dataset are shown in Figure 23.

Human Daily Activity Dataset
The authors in [137] collected an RGB-D activity video dataset recorded by the Kinect v2, containing human daily activities composed of multiple actions interacting with different objects.

Cornell Activity Datasets
Cornell Activity Datasets CAD-60 and CAD-120 are two RGB-D human activity datasets (http://pr.cs.cornell.edu/humanactivities/data.php) containing skeleton and RGB-D data. RGB-D data have a resolution of 240×320, of which the RGB data are saved as three-channel 8-bit PNG files, and the depth data are saved as single-channel 16-bit PNG files.

50 Salads Dataset
The dataset (http://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/) captures 25 people preparing two mixed salads each and contains over four hours of annotated accelerometer and RGB-D video data. The RGB video data and the depth maps both have a resolution of 640×480 pixels at 30 Hz, and the three-axis accelerometer data are sampled at 50 Hz [153].

UR Fall Detection Dataset
This dataset (http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html) contains 70 (30 falls + 40 activities of daily living) sequences [154]. Fall events were recorded with two Microsoft Kinect cameras and corresponding accelerometric data. Examples from the dataset are shown in Figure 24.

TUM Kitchen Dataset
The TUM Kitchen Dataset contains several subjects setting a table. Some perform the activities in a robot-like manner, transporting items one by one, while others behave more human-like and grasp as many objects as they can at once. For each subject performing reaching and grasping, there are two trials.

Performances of Related Works
The performances of various geometric methods for HRA studied in this review are compared in terms of estimation accuracy or estimation error and shown in Table 1. The methods are categorized based on their applications, i.e., HSA, HPA, or HAA. Due to the characteristics of the specific application, the number of methods in each category varies. For example, for HSA and HPA, many algorithms measure their performance by quality, while for HAA, many algorithms validate their performance based on recognition accuracy. In each category, the methods are listed in chronological order of publication and then in alphabetical order by method name. For each validation dataset, the best recognition accuracy (in percent) or the minimum estimation error (in centimeters) among all experiment settings is listed.
From the table, we can see that the average precision of HRA and mostly HAA is quite high. Except for some difficult datasets (i.e., "PASCAL VOC 2011", "PASCAL VOC 2012", "ChaLearn LAP IsoGD", "50 Salads", and "JIGSAWS"), the recognition accuracy was above 80% for all validation datasets. The "Enhanced-LSTM-based method" and "Gram matrix-based method" achieved 100% accuracy on three of the validation datasets.
For HSA, the paper reviews the related works on human shape correspondence, human model symmetry analysis, and human shape recognition. For human shape correspondence, the authors in [83] represented a deformation field as a linear operator on real-valued functions on the shape and gave the state-of-the-art performance on human shape correspondence. An exemplary result is illustrated in Figure 25. Another exemplary work is from [95], and its visualized results are illustrated in Figure 12. Quantitative measurement of the method from [126] is shown in the human shape analysis section in Table 1.
For human model symmetry analysis, the authors in [84] proposed a numerical framework for the analysis, addressing the problems of full and partial exact and approximate symmetry detection and classification. The exemplary results are illustrated in Figure 26. Note that the increase in regularity results in the shortening of the boundary at the expense of the symmetry of the part.
For human body shape recognition, a state-of-the-art method [86] was proposed based on a geodesic distance matrix. A recognition rate of 100% was obtained on the TOSCA database. Some exemplary results are illustrated in Figure 27.
For HPA, the review studies human pose space modeling and human pose estimation. The pose-space subspace method [102] performs well in modeling the human pose space. It uses secondary soft-tissue finite element method (FEM) dynamics computed under arbitrary rigged or skeletal motion. Experiment comparisons are illustrated in Figure 28. The performances of the human pose estimation methods using geometric methods are shown in Table 1.

Conclusions and Discussions
This review presented a comprehensive study on human-related analysis (HRA), including human shape analysis, human pose-related analysis, and human action-related analysis. It first introduced fundamental concepts in topology and manifold theory as background knowledge for geometric modeling. Then, geometric methods using these theories were introduced. Later, geometric methods applied to HRA were studied. Considering the great impact of deep learning and its potential in feature extraction and feature representation, the review also considered geometric deep learning, which has recently been a popular topic. Then, generalized geometric methods, which study general-purpose geometric information for HRA, were explored. Validation datasets for verifying geometric HRA methods were collected, and the performances of various methods were collected, compared, and shown in a table.
For further research, one topic worth exploring is defining intrinsic deep learning algorithms on RGB-D data, specifically defining the convolution, the pooling, and the spatial shift operation on the RGB-D domain. There are very few works on this topic despite its wide applications. Another research topic worth exploring is learning geometric information and utilizing it as priors. 3D data are still comparatively more difficult to acquire than images or videos; thus, it would be helpful to utilize geometric information as priors to improve the performance of tasks that use images or videos as inputs.

Acknowledgments: We are extremely grateful to the reviewers who helped to improve the quality of the paper and the editors who helped us through the process. This work is the result of a pleasant collaboration. It started with a three-month research stay of the first author in Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (SIAT), where she got the idea and had fruitful discussions with Yali Wang and Tianqi Fan.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
A (binary) relation on a set $X$ is a subset of $X \times X$. We often denote relations by $\sim$ and write $x \sim x'$ to indicate that $(x, x')$ is in the relation.
A relation $\sim$ on a set $X$ is an equivalence relation if the following three conditions hold:
• $x \sim x$ for all $x \in X$.
• $x \sim x'$ if and only if $x' \sim x$.
• $x \sim x'$ and $x' \sim x''$ implies $x \sim x''$.
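On a finite set, the three conditions for an equivalence relation can be checked by brute force. The following is a minimal sketch, not part of the reviewed methods; the helper name `is_equivalence_relation` and the two example relations are illustrative assumptions.

```python
from itertools import product


def is_equivalence_relation(elements, related):
    """Check reflexivity, symmetry, and transitivity of `related` on a finite set."""
    elements = list(elements)
    # Reflexivity: x ~ x for all x.
    reflexive = all(related(x, x) for x in elements)
    # Symmetry: x ~ x' if and only if x' ~ x.
    symmetric = all(related(x, y) == related(y, x)
                    for x, y in product(elements, repeat=2))
    # Transitivity: x ~ x' and x' ~ x'' implies x ~ x''.
    transitive = all(related(x, z)
                     for x, y, z in product(elements, repeat=3)
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive


def same_residue(a, b):
    """Congruence modulo 3 (an equivalence relation)."""
    return (a - b) % 3 == 0


def at_most_one_apart(a, b):
    """Reflexive and symmetric, but not transitive."""
    return abs(a - b) <= 1


X = range(12)
print(is_equivalence_relation(X, same_residue))
print(is_equivalence_relation(X, at_most_one_apart))
```

The second relation fails only the third condition: 0 and 1 are related, 1 and 2 are related, but 0 and 2 are not.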

Equivalence Class
The ways to denote equivalence classes are listed as follows:
• The equivalence class of $x \in X$, denoted by $[x]$, means the set $\{x' \mid x \sim x'\}$. The sets $[x]$ for all $x \in X$ form a partition of the set $X$.
• The set of equivalence classes under $\sim$ can be denoted as $X/\sim$, and it is referred to as the quotient of $X$ with respect to $\sim$.
If $A$ is a subset of the topological space $X$ and if $x$ is a point of $X$, we say that $x$ is a limit point (or "cluster point," or "point of accumulation") of $A$ if every neighborhood of $x$ intersects $A$ at some point other than $x$. Equivalently, $x$ is a limit point of $A$ if it belongs to the closure of $A \setminus \{x\}$. The point $x$ may or may not lie in $A$.
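The limit point definition can be made concrete on a finite topological space. The sketch below assumes that a neighborhood of a point means an open set containing it; the three-point space and the function name `limit_points` are hypothetical and chosen only for illustration.

```python
def limit_points(points, open_sets, subset):
    """Return the limit points of `subset` in the finite space (points, open_sets)."""
    subset = set(subset)
    result = set()
    for x in points:
        # Neighborhoods of x: here, the open sets that contain x (assumption).
        neighborhoods = [u for u in open_sets if x in u]
        # x is a limit point if every neighborhood meets subset minus {x}.
        if all(u & (subset - {x}) for u in neighborhoods):
            result.add(x)
    return result


# A nested topology on {a, b, c}: the open sets form a chain.
points = {"a", "b", "c"}
open_sets = [
    frozenset(),
    frozenset({"a"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
]

print(limit_points(points, open_sets, {"a"}))
print(limit_points(points, open_sets, {"b"}))
```

For the subset {a}, the point a itself is not a limit point (its neighborhood {a} contains no other point of the subset), while b and c are limit points although they do not lie in the subset.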

Appendix A.2.2. Continuous Function
Let $X$ and $Y$ be topological spaces. A function $f : X \to Y$ is continuous if for each point $x$ of $X$ and each neighborhood $N$ of $f(x)$ in $Y$, the set $f^{-1}(N)$ is a neighborhood of $x$ in $X$.
A metric on a set $X$ is a function $d : X \times X \to \mathbb{R}$ having the following properties:
• $d(x, y) \geq 0$ for all $x, y \in X$; equality holds if and only if $x = y$.
• $d(x, y) = d(y, x)$ for all $x, y \in X$.
• (Triangle inequality) $d(x, y) + d(y, z) \geq d(x, z)$ for all $x, y, z \in X$.
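The metric properties can be tested numerically on a finite sample of points, with the triangle inequality taken in its standard form $d(x, z) \leq d(x, y) + d(y, z)$. This is a minimal sketch: the function `is_metric`, the tolerance, and the sample grid are assumptions made for illustration, and passing the check on a sample does not prove that a function is a metric on the whole set.

```python
import math
from itertools import product


def is_metric(points, d, tol=1e-9):
    """Check the metric properties of `d` on a finite sample of points."""
    points = list(points)
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                      # non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):        # d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # triangle inequality
            return False
    return True


def euclidean(p, q):
    return math.dist(p, q)


def squared_euclidean(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


grid = [(i, j) for i in range(4) for j in range(4)]
print(is_metric(grid, euclidean))
print(is_metric(grid, squared_euclidean))
```

The squared Euclidean distance is rejected because it violates the triangle inequality, e.g., for the collinear points (0, 0), (1, 0), and (2, 0).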
A topology can be imposed on a set by defining a metric on the set. Given a metric $d$ on $X$, $d(x, y)$ is often called the distance between $x$ and $y$. Given $\epsilon > 0$, consider the set $B_d(x, \epsilon) = \{y \mid d(x, y) < \epsilon\}$ of all points $y$ whose distance from $x$ is less than $\epsilon$. It is called the $\epsilon$-ball centered at $x$.
The fundamental group of $X$ relative to the base point $x_0$ is denoted by $\pi_1(X, x_0)$. Two spaces that are homeomorphic have fundamental groups that are isomorphic.

Appendix A.3.4. Homology
Although homotopy is easy to define and conceptually very attractive, homotopy groups are very difficult to compute. Another invariant is homology, which, instead of being easy to define and difficult to compute, is difficult to define and easy to compute. The idea of counting occurrences of patterns directly is unworkable, but that of counting equivalence classes of occurrences of patterns under an equivalence relation is workable. In this section, basic notions for homology, i.e., the normal subgroup, the Abelian group, the commutator subgroup, and the definition of homology, are introduced.
Let G be a group and H be a subset of G. If H is a group under the operation of G, then we say that H is a subgroup of G.
A subgroup $H$ of a group $G$ is normal in $G$ if and only if $gH = Hg$ for all $g$ in $G$.
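For a finite group, the normality condition can be verified by comparing left and right cosets directly. The sketch below uses the symmetric group on three symbols, with permutations stored as tuples; this representation and the helper names `compose` and `is_normal` are assumptions made for illustration.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def is_normal(subgroup, group):
    """Check gH == Hg for every g; `subgroup` is assumed to be a subgroup of `group`."""
    subgroup = set(subgroup)
    for g in group:
        left_coset = {compose(g, h) for h in subgroup}
        right_coset = {compose(h, g) for h in subgroup}
        if left_coset != right_coset:
            return False
    return True


S3 = list(permutations(range(3)))          # all 6 permutations of {0, 1, 2}
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}     # identity and the two 3-cycles
H = {(0, 1, 2), (1, 0, 2)}                 # identity and one transposition

print(is_normal(A3, S3))
print(is_normal(H, S3))
```

The subgroup of 3-cycles is normal, while the two-element subgroup generated by a single transposition is not.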

Appendix A.3.4.2. Abelian Group
A group is Abelian if $xy = yx$ for all group elements $x$ and $y$.

Appendix A.3.4.3. Commutator Subgroup
Let $G$ be a group. If $x, y \in G$, we denote the element $xyx^{-1}y^{-1}$ by $[x, y]$; it is called the commutator of $x$ and $y$. The subgroup of $G$ generated by the set of all commutators in $G$ is called the commutator subgroup of $G$ and is denoted by $[G, G]$. The subgroup $[G, G]$ is a normal subgroup of $G$, and the quotient group $G/[G, G]$ is Abelian.
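The commutator subgroup of a small finite group can be computed by collecting all commutators and closing the set under the group operation. The following sketch does this for the symmetric group on three symbols; the tuple representation of permutations and all helper names are illustrative assumptions.

```python
from itertools import permutations, product


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def commutator(x, y):
    """Return [x, y] = x y x^{-1} y^{-1}."""
    return compose(compose(x, y), compose(inverse(x), inverse(y)))


def generated_subgroup(generators, identity):
    """Close a set under composition; for a finite group this yields a subgroup."""
    elements = {identity} | set(generators)
    frontier = list(elements)
    while frontier:
        a = frontier.pop()
        for b in list(elements):
            for c in (compose(a, b), compose(b, a)):
                if c not in elements:
                    elements.add(c)
                    frontier.append(c)
    return elements


S3 = list(permutations(range(3)))
identity = (0, 1, 2)
commutators = {commutator(x, y) for x, y in product(S3, repeat=2)}
derived = generated_subgroup(commutators, identity)

print(sorted(derived))
print(len(S3) // len(derived))  # order of the quotient group
```

Here the commutator subgroup consists of the identity and the two 3-cycles, so the quotient has order 2 and is therefore Abelian, in agreement with the statement above.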

Appendix A.3.4.4. Homology
If $X$ is a path-connected space, let $H_1(X) = \pi_1(X, x_0)/[\pi_1(X, x_0), \pi_1(X, x_0)]$. $H_1(X)$ is the first homology group of $X$. More generally, groups $H_n(X)$, called the homology groups of $X$, are defined for all $n \geq 0$.
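As a short worked illustration of this definition (standard examples, not taken from the reviewed papers), consider the circle $S^1$ and the figure-eight space $S^1 \vee S^1$. The symbol $F_2$ for the free group on two generators is notation introduced here.

```latex
% Circle: the fundamental group is already Abelian, so the
% commutator subgroup is trivial and nothing is lost in the quotient.
\pi_1(S^1, x_0) \cong \mathbb{Z}, \qquad
[\mathbb{Z}, \mathbb{Z}] = \{0\}, \qquad
H_1(S^1) \cong \mathbb{Z} / \{0\} \cong \mathbb{Z}.

% Figure eight: the fundamental group is the free group F_2 on two
% generators a and b, which is not Abelian. Quotienting by [F_2, F_2]
% forces ab = ba, leaving only the exponent sums of a and b.
\pi_1(S^1 \vee S^1, x_0) \cong F_2 = \langle a, b \rangle, \qquad
H_1(S^1 \vee S^1) \cong F_2 / [F_2, F_2] \cong \mathbb{Z} \oplus \mathbb{Z}.
```

In both cases the rank of $H_1$ equals the number of independent loops, which matches the intuition of homology counting holes.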

Appendix A.4. Manifold Concepts
Appendix A.4.1. Atlas
A family $(U_\alpha, u_\alpha)_{\alpha \in A}$ of charts on $M$ such that the $U_\alpha$ form a cover of $M$ is called an atlas. An atlas $\mathcal{A}$ is called a smooth atlas if any two charts in $\mathcal{A}$ are smoothly compatible with each other.

Appendix A.4.2. Smooth Manifold
A smooth structure on a topological $n$-manifold $M$ is a maximal smooth atlas. A smooth manifold is a pair $(M, \mathcal{A})$, where $M$ is a topological manifold, and $\mathcal{A}$ is a smooth structure on $M$. The smooth structure is usually omitted, and $M$ is called a smooth manifold. Smooth structures are also called differentiable structures or $C^\infty$ structures. Note that a $C^0$ manifold is a topological manifold.
The projection $\pi^r_{r-1} : T^{r*}M \to T^{(r-1)*}M$ is a linear morphism of vector bundles. Its kernel is described by the following exact sequence of vector bundles over $M$:
$$0 \to S^r T^*M \to T^{r*}M \xrightarrow{\pi^r_{r-1}} T^{(r-1)*}M \to 0.$$
Let $(E, p, M, S)$ be a fiber bundle and $Tp : TE \to TM$ be the tangent mapping. Its kernel $\ker Tp =: VE$ is called the vertical bundle of $E$.
For example, if $(U_\alpha, \psi_\alpha : E|_{U_\alpha} \to U_\alpha \times V)_{\alpha \in A}$ is a vector bundle atlas for $E$, such that $(U_\alpha, u_\alpha)$ is also a manifold atlas for $M$, then $(E|_{U_\alpha}, \psi'_\alpha)_{\alpha \in A}$ is an atlas for the manifold $E$, where $\psi'_\alpha := (u_\alpha \times \mathrm{Id}_V) \circ \psi_\alpha : E|_{U_\alpha} \to U_\alpha \times V \to u_\alpha(U_\alpha) \times V \subset \mathbb{R}^m \times V$.
A vector bundle atlas $(U_\alpha, \psi_\alpha)_{\alpha \in A}$ for $(E, p, M)$ is a set of pairwise compatible vector bundle charts $(U_\alpha, \psi_\alpha)$ such that $(U_\alpha)_{\alpha \in A}$ is an open cover of $M$.
A fiber bundle $(E, p, M, S)$ consists of a manifold $E$, a manifold $M$, a vector space $S$, and a smooth mapping $p$ satisfying the above conditions.

Appendix A.4.5. The Tangent Bundle of a Vector Bundle
Let $(E, p, M)$ be a vector bundle with fiber addition $+_E : E \times_M E \to E$ and fiber scalar multiplication $m^E_t : E \to E$. Then, the tangent bundle of the manifold $E$, denoted by $(TE, \pi_E, E)$, is itself a vector bundle with fiber addition denoted by $+_{TE}$ and scalar multiplication denoted by $m^{TE}_t$.

Appendix A.4.6. Vertical Bundle
Given two fibered manifolds $(M, p, N)$ and $(\bar{M}, \bar{p}, \bar{N})$, a morphism $(M, p, N) \to (\bar{M}, \bar{p}, \bar{N})$ means a smooth map $f : M \to \bar{M}$ transforming each fiber of $M$ into a fiber of $\bar{M}$.
Two maps $f, g : M \to N$ are said to determine the same $r$-jet (jets or holonomic jets) at $x \in M$ if, for every curve $\gamma : \mathbb{R} \to M$ with $\gamma(0) = x$, the curves $f \circ \gamma$ and $g \circ \gamma$ have $r$-th order contact at zero. In such a case, we write $j^r_x f = j^r_x g$ or $j^r f(x) = j^r g(x)$. The elements of the manifold $T^r_k M := J^r_0(\mathbb{R}^k, M)$ are said to be the $k$-dimensional velocities of order $r$ on $M$, or $(k, r)$-velocities.
Therefore, we see that for fixed $(y, v)$, the transition functions are linear in $(\xi, w) \in \mathbb{R}^m \times V$. This finding describes the vector bundle structure of the tangent bundle $(TE, \pi_E, E)$. For fixed $(y, \xi)$, the transition functions of $TE$ are also linear in $(v, w) \in V \times V$. This gives a vector bundle structure on $(TE, Tp, TM)$. Its fiber addition will be denoted by $T(+_E) : T(E \times_M E) = TE \times_{TM} TE \to TE$ since it is the tangent mapping of $+_E$. Its scalar multiplication will be denoted by $T(m^E_t)$. The space $\{\Xi \in TE : Tp.\Xi = 0 \text{ in } TM\} = (Tp)^{-1}(0)$ is denoted by $VE$ and is called the vertical bundle over $E$. The local form of a vertical vector $\Xi$ is $T\psi'_\alpha.\Xi = (y, v; 0, w)$; thus, the transition function looks like
$$(T\psi'_\alpha \circ T(\psi'_\beta)^{-1})(y, v; 0, w) = (u_{\alpha\beta}(y), \psi_{\alpha\beta}(u_\beta^{-1}(y))v; 0, \psi_{\alpha\beta}(u_\beta^{-1}(y))w).$$

Appendix A.4.7. Vector Bundle Homomorphism/Isomorphism
Let $(E, p, M)$ and $(F, q, N)$ be vector bundles. A vector bundle homomorphism $\varphi : E \to F$ is a fiber-respecting, fiber-linear smooth mapping (denoted by a diagram). If $\varphi$ is invertible, it is called a vector bundle isomorphism.

Appendix A.4.8. Connection
A connection on the fiber bundle $(E, p, M, S)$ is a vector-valued one-form $\Phi \in \Omega^1(E; VE)$ with values in the vertical bundle $VE$ such that $\Phi \circ \Phi = \Phi$ and $\mathrm{Im}\,\Phi = VE$; thus, $\Phi$ is a projection $TE \to VE$. Here, a one-form on a manifold $M$ is a smooth mapping of the total space of the tangent bundle of $M$ to $\mathbb{R}$ whose restriction to each fiber is a linear functional on the tangent space, i.e., $\alpha : TM \to \mathbb{R}$, $\alpha_x = \alpha|_{T_x M} : T_x M \to \mathbb{R}$, where $\alpha_x$ is linear.
Connection defines parallel transport on the fiber bundle, so we can consider it as how fibers are connected over manifolds. If the fiber bundle is a vector bundle as shown in Figure A1, the connection is linear.