A New Geometric Metric in the Shape and Size Space of Curves in $\mathbb{R}^n$

Shape analysis of curves in $\mathbb{R}^n$ is an active research topic in computer vision. While shape itself is important in many applications, there is also a need to study shape in conjunction with other features, such as scale and orientation. The combination of these features (shape, orientation and scale, or size) gives different geometrical spaces. In this work, we define a new metric in the shape and size space, $S_2$, which allows us to decompose $S_2$ into a product space consisting of two components: $S_4 \times \mathbb{R}$, where $S_4$ is the shape space. This new metric is associated with a distance function which, unlike previous proposals, clearly distinguishes the contributions that the difference in shape and the difference in size of the elements considered make to the distance in $S_2$. The performance of this metric is checked on a simulated data set, where our proposal performs better than other alternatives and shows its advantages, such as its invariance to changes of scale. Finally, we propose a procedure to detect outlier contours in $S_2$ considering the square-root velocity function (SRVF) representation. To the best of our knowledge, this is the first time this problem has been addressed with nearest-neighbor techniques. Our proposal is applied to a novel data set of foot contours. Foot outliers can help shoe designers improve their designs.


Introduction
Shape analysis of curves in R n , where n ≥ 2, is an important branch in many applications, including computer vision and medical imaging. Using the landmark representation of objects, Dryden et al. [1] studied the joint shape and size features of objects. However, an over-abundance of digital data, especially image data, is prompting the need for a different kind of shape analysis. In particular, the representation of shapes as elements of infinite-dimensional Riemannian manifolds with a given metric is of interest at this time and has important applications [2,3]. More recently, Srivastava et al. [4] presented a special representation of curves, called the square-root velocity function, or SRVF, under which a specific elastic metric becomes an L 2 metric and simplifies the shape analysis. This approach was analyzed by Kurtek et al. [5] on different scenarios, corresponding to different combinations of physical properties of the curves: shape, size, location and orientation. When the metric used in these infinite-dimensional spaces is invariant with respect to scaling, translation, rotation and reparameterization, the Riemannian manifold that represents the space of curves is known as the shape space, and following the notation of [5] will be denoted as S 4 . However, in many of the applications, other features such as orientation or size (scale) also play important roles and need to be incorporated into the underlying framework. A prime example is medical imaging, where the size of the anatomical structure of interest can provide important diagnostic information. In the case where the curve length (size) is considered, the feature space is called the shape and size (or shape and scale) space and will be denoted as S 2 [5]. Other spaces denoted as S 1 and S 3 consider also changes in the orientations of the curves [5]. 
In this work, we will focus on the shape space $S_4$ and the shape and size space $S_2$, which are, in general, completely different infinite-dimensional Riemannian manifolds. Curves in $S_2$ are different elements of the space if their shape or scale differ. Curves in $S_4$ are different elements of the space only if their shapes differ.
It seems natural, however, that the distance between two curves in space S 2 should be related to the distance of these same curves in space S 4 . We can thereby discern whether the distance between the curves in space S 2 is due, to a greater extent, to their difference in size or to their difference in shape.
In this sense, in [6], the Sobolev-type metric given in [3] for the shape space of planar closed curves is extended to the space of all planar closed curves. The metric considered there decomposes the space of closed planar curves into a product space consisting of three components: centroid translations, scale changes and curves in the shape space.
In our approach, we will consider representations of curves in $\mathbb{R}^n$ by square-root velocity functions (SRVF). Using these representations, we will consider two feature spaces studied in [5]: the shape space $S_4$ and the shape and size space $S_2$. The metric in $S_4$ will be the same as in [5]; however, we propose a new metric in $S_2$, which is completely different from the metric considered in [5]. This metric enjoys the property that $S_2$ can be decomposed into a product space consisting of two components: $S_4 \times \mathbb{R}$, where the second space is related to the length (size) of the curve.
The outline of the paper is as follows: In Section 2, we review the SRVF representation of curves and the standard elastic metrics. In Section 3, we introduce the new metric in S 2 . The mean shape and geodesics with this new metric are introduced in Sections 4 and 5, respectively. A comparison of the proposed metric and the standard elastic metric is carried out in Section 6 in a controlled setting with simulated curves, where we show the advantages of our proposal. We propose a procedure to detect outlier contours in S 2 considering the SRVF representation. To the best of our knowledge, this is the first time this problem has been addressed with nearest-neighbor (NN) techniques. This is introduced in Section 7. Not only that, but so far outlier detection (in the multivariate context) in Anthropometry has only been used as a cleaning technique, for correcting or removing the outliers before analyzing data in the multivariate context [7,8]. However, outliers report very valuable information in the footwear design process: outliers can indicate which kinds of feet are more different from the rest and could therefore cause fitting problems in footwear if the design is not appropriate. In Section 8, our proposal is applied to a novel data set of foot contours. Finally, Section 9 contains the conclusions.
The code and data for reproducing the results are available at http://www3.uji.es/~epifanio/RESEARCH/metric.zip.

Classical Spaces of Curves in $\mathbb{R}^n$ for the SRVF Representation
In this section, we review some results from [9]. In particular, we consider the SRVF representation of curves in R n and we summarize the main results for the shape space, S 4 , and for the shape and size space, S 2 , with the standard elastic metrics.
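Recall that the SRVF of a parameterized curve $\beta : [0,1] \to \mathbb{R}^n$ is the function $q(t) = \dot\beta(t)/\sqrt{|\dot\beta(t)|}$, with $q(t) = 0$ where $\dot\beta(t) = 0$ [4]. For every $q \in L^2([0,1],\mathbb{R}^n)$, there is a curve $\beta$ (unique up to translation) such that the given $q$ is the SRVF of that $\beta$. In fact,
$$\beta(t) = \int_0^t q(s)\,|q(s)|\,ds. \qquad (2)$$
If a curve $\beta$ is of length one, then $\int_0^1 |q(t)|^2\,dt = \int_0^1 |\dot\beta(t)|\,dt = 1$, so the SRVFs of curves of length one lie on the unit hypersphere
$$C_0 = \left\{ q \in L^2([0,1],\mathbb{R}^n) \;:\; \int_0^1 |q(t)|^2\,dt = 1 \right\}.$$
Furthermore, the hypersphere is a Hilbert manifold. One way to study the shape and size (scale) space of open curves is to consider as a pre-shape space the whole space $L^2([0,1],\mathbb{R}^n)$. To take care of the rotation and reparameterization of the curve $\beta$, we recall that a rotation is an element of $SO(n)$, the special orthogonal group of $n \times n$ matrices, and a reparameterization is an element of
$$\Gamma = \{\gamma : [0,1] \to [0,1] \;\mid\; \gamma(0) = 0,\ \gamma(1) = 1,\ \gamma \text{ is an orientation-preserving diffeomorphism}\}.$$
The action of a reparameterization $\gamma \in \Gamma$ transforms the curve $\beta : [0,1] \to \mathbb{R}^n$ to the curve $t \mapsto \beta(\gamma(t))$. Hence, by the definition of the SRVF of a curve, we define the action of $\gamma \in \Gamma$ in $C_0$ by
$$\gamma(q)(t) := q(\gamma(t))\,\sqrt{\dot\gamma(t)}.$$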
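The SRVF and its inverse can be illustrated numerically. The following is a minimal sketch, assuming a curve sampled on a grid of parameter values and using finite differences and the trapezoidal rule; the function names (`srvf`, `squared_norm`, `curve_from_srvf`) are hypothetical and are not taken from the code distributed with the paper.

```python
import numpy as np

def srvf(beta, t):
    """SRVF q(t) = beta'(t) / sqrt(|beta'(t)|) of a sampled curve.

    beta: array of shape (T, n) with curve samples; t: array of shape (T,).
    """
    velocity = np.gradient(beta, t, axis=0)        # finite-difference derivative
    speed = np.linalg.norm(velocity, axis=1)
    scale = np.sqrt(np.maximum(speed, 1e-12))      # guard against zero speed
    return velocity / scale[:, None]

def squared_norm(q, t):
    """Trapezoidal approximation of the integral of |q(t)|^2 over the grid."""
    f = np.sum(q * q, axis=1)
    return float(0.5 * np.sum((f[1:] + f[:-1]) * np.diff(t)))

def curve_from_srvf(q, t):
    """Recover beta(t) = int_0^t q(s)|q(s)| ds, normalized so that beta(0) = 0."""
    integrand = q * np.linalg.norm(q, axis=1)[:, None]
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)[:, None]
    return np.vstack([np.zeros((1, q.shape[1])), np.cumsum(steps, axis=0)])

# Example: a half circle of radius 1, whose length is pi.
t = np.linspace(0.0, 1.0, 2001)
beta = np.column_stack([np.cos(np.pi * t), np.sin(np.pi * t)])
q = srvf(beta, t)
length_estimate = squared_norm(q, t)               # approximates the curve length
recovered = curve_from_srvf(q, t) + beta[0]        # translate back to beta(0)
```

In this example the squared $L^2$ norm of `q` approximates the length of the curve, and the curve is recovered from `q` up to the translation that is added back in the last line.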
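Likewise, the action of $O \in SO(n)$ on $q \in C_0$ is just $O(q)(t) = O(q(t))$. We shall denote the combined action by
$$(O,\gamma)(q)(t) := O\,q(\gamma(t))\,\sqrt{\dot\gamma(t)}.$$
The orbit of a function $q \in L^2$ is
$$[q] = \{(O,\gamma)(q) \;:\; (O,\gamma) \in SO(n) \times \Gamma\}.$$
If we consider the metric in $L^2$ given by the usual inner product
$$\langle v_1, v_2\rangle_{L^2} = \int_0^1 \langle v_1(t), v_2(t)\rangle\,dt, \qquad (5)$$
the feature space of interest is
$$S_2 = L^2([0,1],\mathbb{R}^n)/(SO(n)\times\Gamma) = \{[q] \;:\; q \in L^2([0,1],\mathbb{R}^n)\},$$
and the distance in $S_2$ is
$$d_2([q_1],[q_2]) = \inf_{(O,\gamma)\in SO(n)\times\Gamma} \|q_1 - (O,\gamma)(q_2)\|_{L^2}.$$
Then, the geodesic between $q_1$ and the optimal rotation and reparameterization of $q_2$, which is denoted as $q_2^*$, is
$$\alpha(\tau) = (1-\tau)\,q_1 + \tau\,q_2^*, \qquad \tau \in [0,1].$$
On the other hand, the shape space (without considering the size (scale) of the curves) is
$$S_4 = C_0/(SO(n)\times\Gamma).$$
The distance in $S_4$ is given by
$$d_4([q_1],[q_2]) = \inf_{(O,\gamma)\in SO(n)\times\Gamma} \cos^{-1}\!\left(\langle q_1, (O,\gamma)(q_2)\rangle_{L^2}\right),$$
and the geodesic between $q_1$ and $q_2^*$ is
$$\alpha(\tau) = \frac{1}{\sin\theta}\left[\sin(\theta(1-\tau))\,q_1 + \sin(\theta\tau)\,q_2^*\right],$$
where $\theta = \cos^{-1}\langle q_1, q_2^*\rangle_{L^2}$.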
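As an illustration of these two distances, the sketch below optimizes only over rotations (through a singular value decomposition, as in the Procrustes/Kabsch problem) and keeps the parameterization fixed. The search over $\Gamma$, which is usually done by dynamic programming, is deliberately omitted, so the values returned are upper bounds for $d_2$ and $d_4$. All function names are hypothetical.

```python
import numpy as np

def trapz_weights(t):
    """Trapezoidal quadrature weights for the grid t."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return w

def inner(p, q, t):
    """Discrete L2 inner product of two sampled SRVFs of shape (T, n)."""
    return float(np.sum(trapz_weights(t)[:, None] * p * q))

def best_rotation(p, q, t):
    """Rotation O in SO(n) maximizing <p, O q> (SVD solution)."""
    A = (trapz_weights(t)[:, None] * p).T @ q    # A = int p(t) q(t)^T dt
    U, _, Vt = np.linalg.svd(A)
    D = np.eye(p.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))   # force det(O) = +1
    return U @ D @ Vt

def d2_rotation_only(p, q, t):
    """L2 distance after rotational alignment (no reparameterization)."""
    O = best_rotation(p, q, t)
    diff = p - q @ O.T                           # rows of q @ O.T are O q(t)
    return float(np.sqrt(inner(diff, diff, t)))

def d4_rotation_only(p, q, t):
    """Arc length on the unit hypersphere after rotational alignment."""
    p1 = p / np.sqrt(inner(p, p, t))
    q1 = q / np.sqrt(inner(q, q, t))
    O = best_rotation(p1, q1, t)
    cos_theta = np.clip(inner(p1, q1 @ O.T, t), -1.0, 1.0)
    return float(np.arccos(cos_theta))
```

For instance, if `q` is a rotated copy of `p`, both functions return values close to zero, whereas for `q = 2 * p` only `d4_rotation_only` does.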

A New Metric in the Shape and Size Space of Curves in $\mathbb{R}^n$
When S 2 is considered as shape and size space, it is difficult to distinguish whether the distance between two shapes [q 1 ] and [q 2 ] is due to the difference in shape or to the difference in size between the corresponding curves β 1 and β 2 . We are therefore going to consider another shape-size space for curves that will be isometric to S 2 with another appropriate product metric.
Instead of considering in $L^2$ the usual $L^2$-metric given in Equation (5), if $q \in L^2([0,1],\mathbb{R}^n)$, for any two vectors $v_1, v_2$ in $T_q L^2([0,1],\mathbb{R}^n) \equiv L^2([0,1],\mathbb{R}^n)$, we will consider the following metric to endow $L^2([0,1],\mathbb{R}^n)$ with a Riemannian structure,
$$\hat g_q(v_1,v_2) := \frac{1}{R(q)^2}\int_0^1 \langle v_1(t), v_2(t)\rangle\,dt,$$
where
$$R(q) := \sqrt{\int_0^1 |q(t)|^2\,dt} = \|q\|_{L^2}.$$
The case $R(q) = 0$ will be excluded, which means that curves of length 0 are not considered in our space.
From this metric, we will endow (Theorem 1) $C_0 \times \mathbb{R}$ with a Riemannian structure in such a way that $C_0 \times \mathbb{R}$ will be isometric to $(L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g)$. This isometry will be exported (Theorem 2) to an isometry between $S_2$ and $S_4 \times \mathbb{R}$.
Therefore, we obtain a new metric which enjoys the property that $S_2$ can be decomposed into a product space consisting of two components: $S_4 \times \mathbb{R}$, where the second space is related to the length (size) of the curve. This new metric is associated with a distance function, see Corollary 3, given by
$$d_{new}([p],[q]) = \sqrt{d_4([\pi(p)],[\pi(q)])^2 + (\ln R(p) - \ln R(q))^2},$$
where $\pi(q) := q/R(q)$ is the normalized SRVF. This distance is invariant under rotations and under rescaling in the sense that $d_{new}([O(\lambda p)],[O(\lambda q)]) = d_{new}([p],[q])$ for any $O$ in $SO(n)$ and any $\lambda > 0$.
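The way the shape and size terms combine, and the invariance under a common rescaling, can be checked numerically. The sketch below assumes two SRVFs sampled on the same grid that have already been aligned over rotations and reparameterizations, so the great-circle distance between the normalized functions stands in for $d_4$; the name `d_new_parts` is hypothetical.

```python
import numpy as np

def l2_inner(p, q, t):
    """Trapezoidal approximation of the L2 inner product of sampled SRVFs."""
    f = np.sum(p * q, axis=1)
    return float(0.5 * np.sum((f[1:] + f[:-1]) * np.diff(t)))

def d_new_parts(p, q, t):
    """Return (distance, shape_term, size_term) for two pre-aligned SRVFs.

    shape_term: arc length between p/R(p) and q/R(q) on the unit hypersphere.
    size_term:  |ln R(p) - ln R(q)|.
    """
    Rp = np.sqrt(l2_inner(p, p, t))
    Rq = np.sqrt(l2_inner(q, q, t))
    cos_theta = np.clip(l2_inner(p / Rp, q / Rq, t), -1.0, 1.0)
    shape_term = float(np.arccos(cos_theta))
    size_term = float(abs(np.log(Rp) - np.log(Rq)))
    return float(np.hypot(shape_term, size_term)), shape_term, size_term

t = np.linspace(0.0, 1.0, 501)
p = np.column_stack([1.0 + t, t ** 2])
q = np.column_stack([np.cos(2.0 * t), np.sin(2.0 * t)])

dist, shape_term, size_term = d_new_parts(p, q, t)
dist_scaled, _, _ = d_new_parts(50.0 * p, 50.0 * q, t)   # same value as dist
dist_size_only, _, _ = d_new_parts(p, 3.0 * p, t)         # close to ln 3
```

Multiplying both SRVFs by the same factor leaves the distance unchanged, and two SRVFs that differ only by a factor $\lambda$ are at distance $|\ln \lambda|$, which is entirely a size contribution.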

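An Isometry between $(L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g)$ and $C_0 \times \mathbb{R}$
We will begin this section by defining a function $F$ which will provide an isometry between $(L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g)$ and $C_0 \times \mathbb{R}$:
$$F : L^2([0,1],\mathbb{R}^n)\setminus\{0\} \to C_0 \times \mathbb{R}, \qquad F(q) := \left(\frac{q}{R(q)},\ \ln R(q)\right).$$
Observe that $F$ is well defined for any $q \in L^2([0,1],\mathbb{R}^n)\setminus\{0\}$, because $R(q) > 0$. The function $F$ has an immediate smooth inverse given by $F^{-1}(q,t) = e^{t} q$.

The Functions R, π and F and Their Properties
Now we state some properties of the function $R$, some properties of $\pi(q) := \frac{1}{R(q)}\,q$ from $L^2\setminus\{0\}$ to $C_0$, that is, the natural projection given by the normalization of an SRVF using its norm, and some properties of the function $F$:
Proposition 1. Properties of the functions $R$, $\pi$ and $F$:
1. Let $q$ be the SRVF of a curve $\beta$. Then, $R(q) = \sqrt{L(\beta)}$ is the square root of the length of the curve $\beta$.
4. For any $O \in SO(n)$, $\pi(O(q)) = O(\pi(q))$. Namely, the rotation group commutes with the projection $\pi$.
5. For any $\lambda > 0$ …
7. $F$ is smooth and admits a smooth inverse with never vanishing differential map.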
From the properties of π, R and F, we can conclude the following diffeomorphisms.
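Corollary 1. From the function $F$ we have that $L^2([0,1],\mathbb{R}^n)\setminus\{0\}$ is diffeomorphic to $C_0 \times \mathbb{R}$.
As already mentioned at the beginning of the section, if $q \in L^2([0,1],\mathbb{R}^n)\setminus\{0\}$, for any two vectors $v_1, v_2$ in $T_q\left(L^2([0,1],\mathbb{R}^n)\setminus\{0\}\right)$ we will use the following metric to endow $L^2([0,1],\mathbb{R}^n)\setminus\{0\}$ with a Riemannian structure,
$$\hat g_q(v_1,v_2) = \frac{1}{R(q)^2}\int_0^1 \langle v_1(t), v_2(t)\rangle\,dt.$$
Therefore, using the diffeomorphism $F$, we obtain the following result.
Theorem 1. The map $F : (L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g) \to C_0 \times \mathbb{R}$ is an isometry, where the usual product metric is considered in $C_0 \times \mathbb{R}$.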
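Proof. We have shown in Corollary 1 that $F$ is a diffeomorphism; therefore, we only have to prove that the pullback $\hat g^*$ of the metric $\hat g$ by $F^{-1}$ is the usual product metric in $C_0 \times \mathbb{R}$.
Observe that $R(F^{-1}(q,t)) = R(e^{t}q) = e^{t}$ because $q \in C_0$. Therefore, for any two vectors $w_1, w_2$ tangent to $C_0 \times \mathbb{R}$ at $(q,t)$,
$$\hat g^*_{(q,t)}(w_1,w_2) = e^{-2t}\,\langle dF^{-1}(w_1),\, dF^{-1}(w_2)\rangle_{L^2}.$$
Now, first of all, we need to prove that in the tangent space to $C_0 \times \mathbb{R}$ at $(q,t)$, $T_q C_0$ is orthogonal to $T_t\mathbb{R}$. In order to do that, consider two vectors $v_1 \in T_q C_0$ and $v_2 \in T_t\mathbb{R}$ and two curves $\gamma_1(s) = (\alpha(s), t)$, with $\alpha$ a curve in $C_0$ such that $\alpha(0) = q$ and $\dot\alpha(0) = v_1$, and $\gamma_2(s) = (q, t + s\,v_2)$, where $v_2 \in \mathbb{R}$, $q \in C_0$. Hence, $dF^{-1}(v_1) = e^{t}v_1$ and $dF^{-1}(v_2) = v_2\,e^{t}q$, so that
$$\hat g^*_{(q,t)}(v_1,v_2) = e^{-2t}\,\langle e^{t}v_1,\ v_2\,e^{t}q\rangle_{L^2} = v_2\,\langle v_1, q\rangle_{L^2} = 0,$$
because $v_1$ is tangent to the hypersphere $C_0$ at $q$. The same computation gives $\hat g^*_{(q,t)}(v_1,v_1) = \langle v_1,v_1\rangle_{L^2}$ and $\hat g^*_{(q,t)}(v_2,v_2) = v_2^2\,\langle q,q\rangle_{L^2} = v_2^2$, which is the product metric.
Since we are using the usual product metric in $C_0 \times \mathbb{R}$, we conclude the following result:
Corollary 2. The distance in $(L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g)$ is
$$d(p,q) = \sqrt{d_{C_0}(\pi(p),\pi(q))^2 + d_{\mathbb{R}}(\ln R(p), \ln R(q))^2},$$
where $d_{C_0}$ is the usual distance in $C_0$ and $d_{\mathbb{R}}(s,t) = |s - t|$.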
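From this explicit expression of the distance, it is easy to see that the metric $\hat g$ is invariant under the action of reparameterizations and rotations.
Proposition 2. The group $SO(n) \times \Gamma$ acts by isometries on $(L^2\setminus\{0\}, \hat g)$; that is, $d((O,\gamma)(p), (O,\gamma)(q)) = d(p,q)$ for any $p, q \in L^2\setminus\{0\}$ and any $(O,\gamma) \in SO(n)\times\Gamma$.
Proof. We need to prove that $d((O,\gamma)(p), (O,\gamma)(q)) = d(p,q)$. By Proposition 1 we know that $R((O,\gamma)(p)) = R(p)$ (and $R((O,\gamma)(q)) = R(q)$). Hence, the proposition follows because
$$d_{C_0}\big(\pi((O,\gamma)(p)),\ \pi((O,\gamma)(q))\big) = d_{C_0}\big((O,\gamma)(\pi(p)),\ (O,\gamma)(\pi(q))\big) = d_{C_0}(\pi(p),\pi(q)),$$
since $SO(n)\times\Gamma$ acts by isometries on $C_0$.

The Isometry between $S_2$ and $S_4 \times \mathbb{R}$
Using the isometry given by Theorem 1, an isometry between $S_2$ and $S_4 \times \mathbb{R}$ can be constructed as stated in the following theorem:
Theorem 2. The isometry $F$ can be exported to an isometry $[F] : S_2 \to S_4 \times \mathbb{R}$, $[F]([q]) = ([\pi(q)], \ln R(q))$, by using the following commutative diagram, in which the vertical arrows are the quotient maps:
$$\begin{array}{ccc} L^2\setminus\{0\} & \xrightarrow{\ F\ } & C_0 \times \mathbb{R} \\ \downarrow & & \downarrow \\ S_2 & \xrightarrow{\ [F]\ } & S_4 \times \mathbb{R} \end{array}$$
Proof. From Proposition 2 we know that $SO(n) \times \Gamma$ acts by isometries on $(L^2\setminus\{0\}, \hat g)$. Therefore, since the action of the group $\Gamma \times SO(n)$ on $(L^2\setminus\{0\}, \hat g)$ and on $C_0$ is by isometries, and bearing in mind the diffeomorphisms in Corollary 1, we obtain the result.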
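The isometry $[F]$ of the above theorem can be used to obtain the expression of the new distance function.
Corollary 3. The distance in $S_2$ associated with the new metric is
$$d_{new}([p],[q]) = \sqrt{d_4([\pi(p)],[\pi(q)])^2 + (\ln R(p) - \ln R(q))^2}.$$
In the following proposition we prove that $d_{new}$ is a well-defined distance function.
Proposition 3. For any $[p], [q] \in S_2$:
1. $d_{new}([p],[q]) \geq 0$, and $d_{new}([p],[q]) = 0$ if and only if $d_4([\pi(p)],[\pi(q)]) = 0$ and $R(p) = R(q)$.
2. $d_{new}([p],[q]) = d_{new}([q],[p])$.
3. For any $[r] \in S_2$, $d_{new}([p],[q]) \leq d_{new}([p],[r]) + d_{new}([r],[q])$.
Proof. Most of the statements of the proposition follow directly from the definition and the properties of $d_4$ and $R$. We shall prove the triangle inequality for the sake of completeness. Let us denote by $v, w \in \mathbb{R}^2$ the vectors given by
$$v = \big(d_4([\pi(p)],[\pi(r)]),\ |\ln R(p) - \ln R(r)|\big), \qquad w = \big(d_4([\pi(r)],[\pi(q)]),\ |\ln R(r) - \ln R(q)|\big).$$
Then, by applying the triangle inequality for $d_4$ and for the absolute value, and then the triangle inequality of the Euclidean norm in $\mathbb{R}^2$,
$$d_{new}([p],[q]) \leq \sqrt{(v_1 + w_1)^2 + (v_2 + w_2)^2} = |v + w| \leq |v| + |w| = d_{new}([p],[r]) + d_{new}([r],[q]).$$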

The Mean Shape
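Given $\{\beta_1, \cdots, \beta_n\}$, a sample of parameterized curves, and their corresponding SRVFs, $\{q_1, \cdots, q_n\}$, the Karcher mean shape regarding the new metric $d_{new}$ is defined as
$$\hat\mu_{new} = \arg\min_{[q] \in S_2} \sum_{i=1}^n d_{new}([q],[q_i])^2 = \arg\min_{[q] \in S_2} \sum_{i=1}^n \left[ d_4\!\left(\left[\tfrac{q}{R(q)}\right],\left[\tfrac{q_i}{R(q_i)}\right]\right)^2 + (\ln R(q) - \ln R(q_i))^2 \right].$$
The value of $R(q)$ that minimizes $\sum_{i=1}^n (\ln R(q) - \ln R(q_i))^2$ is the geometric mean of the $\{R(q_1), \cdots, R(q_n)\}$, i.e., $\sqrt[n]{\prod_{i=1}^n R(q_i)}$, and a gradient-based approach for finding the value of $q/R(q)$ that minimizes $\sum_{i=1}^n d_4([q/R(q)],[q_i/R(q_i)])^2$ can be found in [10,11]. The detailed algorithm to find the Karcher mean in the shape space $S_2$ can be found in [12].
Given $\hat\mu_{C_0}$, the Karcher mean of $\{q_i/R(q_i)\}_{i=1,\cdots,n}$ in the shape space, the Karcher mean in the shape and size space with the new metric is obtained as
$$\hat\mu_{new} = \sqrt[n]{\prod_{i=1}^n R(q_i)}\ \hat\mu_{C_0}.$$
Hence, applying Equation (2), the mean curve is
$$\hat\beta_{new}(t) = \sqrt[n]{\prod_{i=1}^n L(\beta_i)}\ \int_0^t \hat\mu_{C_0}(s)\,|\hat\mu_{C_0}(s)|\,ds, \qquad (18)$$
where $L(\beta_i)$ is the length of the curve $\beta_i$.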
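The size component of the mean is simple enough to sketch in a few lines. The code below assumes that the shape mean $\hat\mu_{C_0}$ has already been computed by some other routine (for example, the gradient-based algorithms cited above) and is passed in as a sampled array `mu_shape`; the function names are hypothetical.

```python
import numpy as np

def geometric_mean(values):
    """Geometric mean, i.e. the minimizer of sum_i (ln R - ln R_i)^2 over R > 0."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(np.mean(logs)))

def mean_curve(mu_shape, lengths, t):
    """Mean curve: geometric mean of the lengths times int_0^t mu|mu| ds.

    mu_shape: array (T, n), sampled unit-norm SRVF of the shape mean (assumed given).
    lengths:  lengths L(beta_i) of the sample curves.
    """
    scale = geometric_mean(lengths)
    integrand = mu_shape * np.linalg.norm(mu_shape, axis=1)[:, None]
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)[:, None]
    path = np.vstack([np.zeros((1, mu_shape.shape[1])), np.cumsum(steps, axis=0)])
    return scale * path

# Toy example: the shape mean is a straight unit-length segment,
# and the two sample curves have lengths 1 and 4.
t = np.linspace(0.0, 1.0, 11)
mu_shape = np.tile(np.array([1.0, 0.0]), (t.size, 1))
curve = mean_curve(mu_shape, [1.0, 4.0], t)
```

In this toy example the mean curve is a segment of length 2, the geometric mean of 1 and 4, rather than the arithmetic mean 2.5.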

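Corollary 4. Any geodesic in $C_0 \times \mathbb{R}$ can be written as
$$\tau \mapsto (\alpha(\tau),\ a\tau + b),$$
where $\alpha$ is a geodesic in $C_0$ and $a, b \in \mathbb{R}$. Therefore any geodesic in $(L^2([0,1],\mathbb{R}^n)\setminus\{0\}, \hat g)$ can be written as
$$\tau \mapsto e^{A\tau + b}\,\alpha(\tau),$$
with $A, b \in \mathbb{R}$ and $\alpha$ a geodesic in $C_0$.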
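A point on such a geodesic between two SRVFs can be sketched as follows, assuming the two functions are sampled on the same grid and already aligned: the shape part follows the great circle between the normalized SRVFs, and the logarithm of the norm is interpolated linearly. The name `geodesic_point` is hypothetical, and the nearly antipodal case is not handled.

```python
import numpy as np

def l2_inner(p, q, t):
    """Trapezoidal approximation of the L2 inner product of sampled SRVFs."""
    f = np.sum(p * q, axis=1)
    return float(0.5 * np.sum((f[1:] + f[:-1]) * np.diff(t)))

def geodesic_point(p, q, t, tau):
    """Point at parameter tau in [0, 1] on the path exp(A*tau + b) * alpha(tau)."""
    Rp = np.sqrt(l2_inner(p, p, t))
    Rq = np.sqrt(l2_inner(q, q, t))
    u, v = p / Rp, q / Rq
    theta = np.arccos(np.clip(l2_inner(u, v, t), -1.0, 1.0))
    if theta < 1e-8:
        alpha = u                                   # same shape: constant shape part
    else:
        alpha = (np.sin((1.0 - tau) * theta) * u
                 + np.sin(tau * theta) * v) / np.sin(theta)
    log_R = (1.0 - tau) * np.log(Rp) + tau * np.log(Rq)   # b = ln R(p), A = ln(R(q)/R(p))
    return np.exp(log_R) * alpha
```

Along this path the norm $R$ varies geometrically between $R(p)$ and $R(q)$, so the midpoint has norm $\sqrt{R(p)R(q)}$.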
The Karcher means of the ten spirals and of the ten circumferences are computed with the new metric ($\hat\beta_{new}$, Equation (18)) and by using the distance proposed by [5] in the shape and size space $S_2$. These means are shown in Figure 2, where the original curves are plotted in light blue; $\hat\beta_{new}$ is plotted in black and $\hat\beta_2$, the Karcher mean using the distance $d_2$, is plotted in red. As can be seen in Figure 2a,b, there is a very slight difference between the means $\hat\mu_{new}$ and $\hat\mu_2$ of the ten spirals. Figure 2c,d show that the means coincide in the case of the circumferences. An example comparing the geodesics obtained with $d_{new}$ and $d_2$ can be seen in Figure 3; there are no great differences between them.
Finally, the distance matrices $D_{new}$ and $D_2$ between the twenty curves are computed using both metrics, and in order to compare the performance of $d_2$ and $d_{new}$, a multidimensional scaling (MDS) analysis [13] has been carried out. MDS is a descriptive data reduction procedure that displays the information contained in an $(m \times m)$ distance matrix, $D$, in a low-dimensional space such that the between-object distances are preserved as well as possible. That is, for each distance matrix $D$, the method looks for a set of orthogonal variables $\{y_1, \cdots, y_p\}$, $p < m$, such that the Euclidean distances of the elements with respect to these variables are as close as possible to the distances given in the original matrix $D$. In Figure 4, MDS has been applied to the distance matrices computed with both metrics. In both graphics (Figure 4a,b), the green points represent the ten spirals $\alpha_{1i}$ shown in Figure 1a, and the black asterisks represent the ten circumferences $\alpha_{2i}$ shown in Figure 1b.
Figure 4. The ten spirals are marked in green and the ten circumferences are represented with black asterisks.
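The embedding step can be sketched with classical (Torgerson) MDS, which double-centers the squared distances and keeps the leading eigenvectors. Whether the paper uses this variant or another MDS algorithm is not stated, so the choice here is an assumption, and `classical_mds` is a hypothetical name.

```python
import numpy as np

def classical_mds(D, p=2):
    """Classical MDS: embed an (m x m) distance matrix D into p dimensions."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:p]         # indices of the p largest
    top = np.clip(vals[order], 0.0, None)      # drop tiny negative eigenvalues
    return vecs[:, order] * np.sqrt(top)

# Toy check: distances between planar points are reproduced by the embedding.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Y = classical_mds(D, p=2)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
```

When the input distances are Euclidean distances between points in the plane, as in this toy check, the two-dimensional embedding reproduces them up to rounding; for the curve distances $D_{new}$ and $D_2$ the embedding is only an approximation.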
As can be seen, there are slight differences between the MDS representations of the two metrics. If we perform k-means clustering with k = 2 from $D_{new}$ and $D_2$, in both cases the two groups are perfectly recovered. We also recover the two groups if we apply DBSCAN [14,15].
If we re-scale the twenty figures, multiplying them by 50, and we consider the twenty resulting curves jointly with the twenty original ones, we can compute again the distance matrices $D_{new}$ and $D_2$ between the 40 curves. The MDS representation of these distance matrices can be found in Figure 5. In both graphics, the initial spirals $\{\beta_{i1}\}_{i=1,\cdots,10}$ are plotted in green; the circumferences $\{\beta_{i2}\}_{i=1,\cdots,10}$ are plotted in black; and their scaled versions, $\{50\beta_{i1}\}_{i=1,\cdots,10}$ and $\{50\beta_{i2}\}_{i=1,\cdots,10}$, are plotted in red and blue, respectively.
Figure 5. The ten initial spirals are marked in green and the ten initial circumferences are represented with black asterisks; the spirals re-scaled by 50 are the red asterisks and the re-scaled circumferences are plotted in blue.
If we perform a k-means cluster analysis with k = 4 from the distance matrices, the four groups are recovered from $D_{new}$. For $D_2$, however, the distance among the scaled circumferences increases relative to the distance among the initial circumferences, so the group of large circumferences is split into two clusters while the initial (short) curves (spirals and circumferences) are joined in a single cluster (Figure 6). The DBSCAN algorithm, in contrast, recovers the four initial groups when applied to either distance matrix.
Figure 6. MDS applied to the distance matrices: (a) using $d_{new}$, (b) using $d_2$. The ten initial spirals are marked with green asterisks and the ten initial circumferences are represented with black asterisks; the enlarged spirals are plotted in red and the enlarged circumferences are plotted in blue.
A k-means cluster analysis with k = 6, as well as the DBSCAN algorithm, applied to $D_{new}$ recovers the 6 simulated groups. However, the DBSCAN algorithm applied to $D_2$ provides 5 clusters on this data set, joining in a single cluster the initial (short) curves (spirals and circumferences) and distinguishing the other groups (Figure 7d). The k-means algorithm with k = 5 provides the same result, but if the k-means algorithm is applied with k = 6 clusters, the set of the largest circumferences is split into two groups (Figure 7c). Once again, it can be clearly seen that the distance $d_2$ among shapes increases with the scaling factor.
Figure 7. MDS applied to the distance matrices: (a) using $d_{new}$, (b) using $d_2$. The ten initial spirals are marked in green and the ten initial circumferences are represented with black asterisks. The enlarged spirals are plotted in red (factor 50) and magenta (factor 250) and the enlarged circumferences are plotted in blue (factor 50) and cyan (factor 250). The clusters obtained on $d_2$ are plotted: (c) using the k-means algorithm with k = 6, (d) using DBSCAN and k-means with k = 5.

Detection of Outliers
Although there are a variety of techniques for outlier detection for different types of data in any metric space based on nearest-neighbor techniques (see [16] for a detailed explanation), they have not been fully exploited in the shape and size space of curves. Some of the main references are based on box-plots of the distances to the median to detect outliers, such as [12,17], and more recently the method based on elastic depths proposed by [18].
We propose a technique for outlier detection based on the proposed distance. Nearest-neighbor techniques are very popular due to their good results, conceptual simplicity and interpretability in the classic multivariate case [19]. We consider this idea for the shape and size space of curves. The k-NN Anomaly Detection algorithm searches for the nearest k-neighbors, i.e., the k closest curves, for every element in the database, and calculates the average distance of the k-neighbors. In the multivariate case, the Euclidean distance is used, but here we use the proposed distance to find the neighbors. This procedure returns outlier scores; as usual, the highest score denotes the highest degree of outlierness. A way to establish a binary decision about whether or not to label a point as an outlier is to use a box-plot with the outlier scores and to consider the points detected as outliers by the box-plot as anomalies.
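The scoring and labeling steps described above can be sketched directly on a precomputed distance matrix. The box-plot multiplier of 1.5 used below is the classical default and is an assumption, since the value used with this procedure is not stated here; the function names are hypothetical.

```python
import numpy as np

def knn_outlier_scores(D, k):
    """Average distance from each element to its k nearest neighbors.

    D: (m x m) symmetric matrix of pairwise distances (e.g., computed with d_new).
    """
    m = D.shape[0]
    masked = D + np.diag(np.full(m, np.inf))   # exclude each element from its own neighbors
    nearest = np.sort(masked, axis=1)[:, :k]   # k smallest distances per row
    return nearest.mean(axis=1)

def boxplot_outliers(scores, multiplier=1.5):
    """Flag scores above the upper box-plot fence Q3 + multiplier * IQR."""
    q1, q3 = np.percentile(scores, [25, 75])
    return scores > q3 + multiplier * (q3 - q1)

# Toy example with points on a line standing in for curves: ten regular points and one far away.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
D = np.abs(x[:, None] - x[None, :])
scores = knn_outlier_scores(D, k=2)
flags = boxplot_outliers(scores)
```

Only the upper fence is used because a small average distance to the neighbors indicates a typical element, not an anomalous one.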
We compare our procedure with those introduced in [12,17,18] using the data sets of open curves used in [12,17], which are available from [20]. For Example 1 considered in [17], formed by 70 spirals, Ref. [12] found 6 outliers, Ref. [17] also found 6 outliers (2 scale outliers and 4 mild shape outliers) and Ref. [18] found 4 outliers (3 due to amplitude and 1 due to phase) with the recommended value of k = 2, where k is the boxplot multiplier, while 9 outliers are found with the classical k = 1.5. With our methodology, however, we detect 8 outliers (the results are stable: we obtain the same outliers with k = 5, 10 or 15 neighbors). We have also computed the square of the distance in Equation (17), $d(p,q)^2$, and the contribution in percentage due to shape, $d_{C_0}^2(\pi(p),\pi(q))/d(p,q)^2$, and due to size, $d_{\mathbb{R}}^2(\ln R(p),\ln R(q))/d(p,q)^2$, for each outlier. The percentages of contribution due to shape for the 8 outliers are: 21%, 29%, 34%, 35%, 41%, 50%, 50% and 74%. For Example 3 in [17], formed by 176 fiber tracts in the human brain extracted from a diffusion tensor magnetic resonance image (DT-MRI), we detect the same 11 outliers also detected by [12,17], and all are due to shape, with percentages of contribution due to shape of 90%. However, Ref. [18] with k = 2 returned 62 outliers (62 due to amplitude, 23 of which are also outliers due to phase), i.e., 35% of the points of the sample are considered outliers.
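The percentage split reported above follows directly from the two terms of the squared distance. A minimal sketch, assuming the shape distance and the size distance between two curves have already been computed (the function name is hypothetical):

```python
def contribution_percentages(shape_dist, size_dist):
    """Percentages of the squared distance due to shape and to size.

    shape_dist: distance between the normalized SRVFs (shape term).
    size_dist:  |ln R(p) - ln R(q)| (size term).
    """
    total_sq = shape_dist ** 2 + size_dist ** 2
    if total_sq == 0.0:
        return 0.0, 0.0                        # identical curves: nothing to attribute
    shape_pct = 100.0 * shape_dist ** 2 / total_sq
    size_pct = 100.0 * size_dist ** 2 / total_sq
    return shape_pct, size_pct

shape_pct, size_pct = contribution_percentages(3.0, 4.0)
```

With a shape term of 3 and a size term of 4, the squared distance is 25, of which 9 (36%) is attributed to shape and 16 (64%) to size.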

Application to a Real Data Set
Footwear design relies greatly on knowledge of foot size and shape. Proper fit is an essential condition for potential shoe buyers; moreover, poorly fitting footwear can cause foot pain and deformity, especially in women. Although people with extreme feet (very different from the rest) may be the customers most likely to experience poor fit, they are not usually sought in anthropometric studies. However, outliers provide very valuable information for the footwear design process, since they can help shoe designers adjust their designs to a larger part of the population and can increase their awareness of the customer characteristics that make shoes uncomfortable to wear, whether by considering a range of special sizes or by modifying a shoe feature to fit more users.
The aim of this section is to detect the outliers in an anthropometric foot database. We carry out a separate analysis for men and women, since gender differences in foot shape are well known [21,22]. Furthermore, footwear designers usually propose different types of shoes for women and men.

Foot Database
A total of 770 3D right foot scans were carried out. A total of 389 men and 381 women representing the Spanish adult female and male population were measured. The data were collected in different regions across Spain at shoe shops and workplaces using an INFOOT laser scanner [23]. The scanning process is carried out while the participant stands upright placing equal weight on each foot, in a specific position and orientation (see Figure 8). The result is a 3D point cloud representing the complete outer surface of the foot, including the sole of the foot.
3D foot shapes were registered using the method described by [24] with a template made up of 5000 vertices, with five foot landmarks (i.e., 1st and 2nd toe tips; 1st and 5th metatarsal heads; and pternion; see Figure 9). This set of landmarks is automatically located on 3D foot scans and allows the extraction of foot measurements and contours, according to the definitions used by the Human Shape Lab of the Biomechanics Institute of Valencia (IBV), which comply with standards and are compatible with the accepted definitions found in the literature [25][26][27][28]. In particular, we consider the longitudinal contour passing through the Ball Position. The mean shapes for men and women in S 2 are displayed in Figure 10.

Detection of Foot Outliers
We have applied our outlier procedure to the curves of men and women with k = 10. A total of 24 and 18 outliers are detected for men and women, respectively. In order to briefly describe the outlier curves detected, we show the percentiles of each outlier for the four variables that could most influence shoe fit according to shoe design experts. Specifically, these variables are: Foot Length, FL (distance between the rearmost and foremost points of the foot axis); Ball Girth, BG (perimeter of the ball section); Ball Width, BW (maximal distance between the extreme points of the ball section projected onto the ground plane); and Instep Height, IH (maximal height of the instep section, located at 50% of the foot length). Tables 1 and 2 show the percentile profiles of the outliers found for women and men, respectively. Note that for some of the outliers some of the variables show extreme percentiles, i.e., very high or very low percentiles. However, in many other cases, outliers do not show extreme values in these variables. Therefore, outliers can be due to the particular combination of the variables or due to a particular configuration of the curve that cannot be summarized by these four variables. In summary, with the proposed procedure, we can detect feet that are "not normal", which may not be detected with a classic multivariate analysis. Figure 11 shows the most outlying feet for men and women. For men, these are the 14th and 9th outliers, while for women they are the 7th and 17th. Note that those feet do not have really extreme percentiles.

Conclusions
We have proposed a new metric in the shape and size space S 2 that, unlike the previous proposals, allows us to distinguish whether the distance between two shapes [q 1 ] and [q 2 ] is due to the difference in shape or to the difference in size between the corresponding curves β 1 and β 2 . It has been compared with the metric proposed by [5] in a simulation study, where our proposal is shown to perform better. Furthermore, we also show the advantages of the new metric, such as its invariance to changes of scale.
To the best of our knowledge, this is also the first time that a procedure based on distances and NN techniques has been proposed for finding outlier curves in $S_2$. We have applied it to a novel industrial data set. The foot outliers found by considering their contour can help shoe designers improve their designs in order to provide customers with a better fit.
In future work, with regard to the theory, closed curves could be considered and an appropriate metric defined. Furthermore, the new metric could be used in other types of statistical problems besides outlier detection, such as classification, clustering, or new ones, where curves in $S_2$ have never been used before, such as archetype analysis [29] or archetypoid analysis [30]. Finally, with regard to the footwear application, the outlier procedure with the new metric could be applied to other kinds of foot contours, such as the Ball Girth, and, of course, to other fields of application.