A Non-structural Representation Scheme for Articulated Shapes

For representing articulated shapes, as an alternative to structural models based on graphs encoding part hierarchy, we propose a pixel-based distinctness measure. Its spatial distribution yields a partitioning of the shape into a set of regions, each of which is represented via a size-normalized probability distribution of the distinctness. Without imposing any structural relation among parts, pairwise shape similarity is formulated as the cost of an optimal assignment between the respective regions. The matching is performed via the Hungarian algorithm, permitting some unmatched regions. The proposed similarity measure is employed in the context of clustering a set of shapes. The clustering results obtained on three articulated shape datasets show that our method performs comparably to state-of-the-art methods utilizing component graphs or trees, even though we do not explicitly model component relations.


Introduction
Shape is a distinctive object attribute which is frequently utilized in image processing and computer vision applications. Measuring similarity of objects via their shapes is a difficult task due to high within-class and low between-class variations. Within-class variations may be due to transformations such as rotation, scaling and deformation of articulations. Articulated shapes can be successfully represented by structural representations which are organized in the form of graphs of shape components. However, it is challenging to build and compare structural representations. Moreover, measuring similarity of shapes through their structural representations requires finding a correspondence between a pair of graphs, which is an intricate process entailing advanced algorithms.
In this work, we propose a representation scheme for articulated shapes which involves neither building a graph of shape components nor matching a pair of graphs. The proposed representation is used to measure pairwise shape similarity, according to which we cluster a set of shapes. The clustering results obtained on three articulated shape datasets show that our method performs comparably to state-of-the-art methods utilizing component graphs or trees, even though we do not explicitly model component relations.

The Method
Our representation scheme relies on first constructing multiple high-dimensional feature spaces in which shape points (pixels, in the 2D discrete setting) are represented, and then determining the distinctness of the shape points in each space separately via Robust Principal Component Analysis (RPCA).
The distinctness values deduced from each feature space are utilized for two main purposes. First, their spatial distribution on the 2D shape domain is used to partition the shape into a set of regions. Second, each region is described by the normalized probability distribution of the corresponding distinctness values. The dissimilarity between a pair of shapes via each feature space is defined as the cost of the optimal assignment between their regions. Notice that we do not build any graphs to model the shape structure, and the optimal assignment problem does not involve matching a pair of graphs. The final shape dissimilarity is computed by combining the dissimilarities deduced from multiple feature spaces. Below, we present the details of our representation scheme.

Construction of a High-dimensional Feature Space
Consider a planar shape discretized using a grid of dimension n_1 × n_2. We construct a 30-dimensional feature vector f_{x,y} ∈ R^30 for each shape pixel (x, y), where 1 ≤ x ≤ n_1 and 1 ≤ y ≤ n_2. In order to compute the feature component at each slot k for k = 1, 2, . . . , 30, we first solve a linear system of equations in which the feature value u^k_{x,y} of each shape pixel is related to the feature values of its four-neighboring pixels via (1), and we then normalize the obtained values as in (2).
u^k_{x,y} is solved only for the shape pixels and is considered 0 for the pixels outside the shape. ρ_k is a scalar parameter which controls the dependence between the feature value of each shape pixel and the feature values of its four-neighbors.
In Fig. 1 (left), we show the features computed for a one-dimensional signal using three different values of ρ_k corresponding to 1/10, 1/5 and 1 times the whole signal length. We normalize each feature to have a maximum value of 1 (see Fig. 1 (right)). We observe that the feature values monotonically increase towards the center. By varying ρ_k, we obtain a collection of features, each encoding a different degree of local interaction between the shape pixels and their surroundings. We determine ρ_k for k = 1, 2, . . . , 30 as ρ_k = (k/30) × ρ, where ρ represents the extent of the maximum interaction between the shape pixels and their surroundings. In order to represent different shapes in a common feature space, we determine ρ for each shape individually from a measurement of the same shape property.
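Since equations (1)–(2) are not reproduced above, the sketch below uses an illustrative stand-in consistent with the description: a screened-diffusion system (1 + 4ρ_k)u_{x,y} − ρ_k Σ_{neighbors} u = 1 over shape pixels, with pixels outside the shape fixed at 0, followed by normalization to a maximum of 1. The exact system form and the function name `feature_field` are our assumptions, not the paper's (1)–(2).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def feature_field(mask, rho):
    """Assumed stand-in for (1)-(2): solve a screened-diffusion system that
    couples each shape pixel to its four neighbours (zero outside the shape),
    then normalise so the maximum value is 1."""
    h, w = mask.shape
    idx = -np.ones((h, w), dtype=int)
    ys, xs = np.nonzero(mask)
    idx[ys, xs] = np.arange(len(ys))
    n = len(ys)
    A = sp.lil_matrix((n, n))
    b = np.ones(n)
    for p in range(n):
        y, x = ys[p], xs[p]
        A[p, p] = 1.0 + 4.0 * rho
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx]:
                A[p, idx[ny, nx]] = -rho
            # neighbours outside the shape are fixed at 0 and drop out
    u = spla.spsolve(A.tocsc(), b)
    out = np.zeros((h, w))
    out[ys, xs] = u / u.max()   # normalise to a maximum of 1, as in the text
    return out
```

On a rectangular mask the field increases monotonically towards the center, matching the behaviour described for Fig. 1; the 30 features would be obtained by calling this with ρ_k = (k/30)·ρ for k = 1, …, 30.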

Determining Multiple High-dimensional Feature Spaces
We utilize two different shape measurements, related to the thickness of the shape body and to the maximum distance between the shape extremities. The first measurement, R, is computed as the maximum value of the shape's distance transform, which gives the distance of each shape point from the nearest boundary. The second measurement, G, is computed as the maximum value of the pairwise geodesic distances between the boundary points, where the geodesic distance between a pair of points is the length of the shortest path connecting them through the shape domain.
As shown in Fig. 2, R and G provide characteristic shape information which can be used to define the extent of the local interactions between the shape pixels during the feature space construction. (In Fig. 2, R and G correspond to the thickness of the shape body (red) and the maximum distance between the shape extremities (blue), respectively.) We construct six different feature spaces for which ρ is selected as multiples of R or G, namely, 2R, 3R, 4R, (2/3)G, (2/4)G and (2/5)G.
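A minimal sketch of computing R and G on a binary mask, assuming 4-connected geodesic paths through the shape (the function names and the brute-force BFS over boundary pixels are our own; the BFS is only meant for small examples):

```python
import numpy as np
from collections import deque
from scipy.ndimage import distance_transform_edt

def shape_R(mask):
    # R: maximum of the distance transform (distance to the nearest background pixel)
    return distance_transform_edt(mask).max()

def shape_G(mask):
    # G: maximum pairwise geodesic distance between boundary pixels, measured
    # here as shortest 4-connected path length through the shape domain.
    h, w = mask.shape
    boundary = []
    for y, x in zip(*np.nonzero(mask)):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not mask[ny, nx]:
                boundary.append((y, x))   # shape pixel touching the background
                break
    best = 0
    for src in boundary:                  # BFS from every boundary pixel
        dist = -np.ones((h, w), dtype=int)
        dist[src] = 0
        q = deque([src])
        while q:
            y, x = q.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and dist[ny, nx] < 0:
                    dist[ny, nx] = dist[y, x] + 1
                    q.append((ny, nx))
        best = max(best, max(int(dist[b]) for b in boundary))
    return best
```

For a solid 5×5 square, R is 3 (the center pixel is 3 pixels from the background) and G is 8 (the path between opposite corners).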

Computing Distinctness of Shape Pixels via Each Feature Space
We organize the feature vectors in the form of a matrix D ∈ R^{m×30}, where each row represents the feature vector computed for a shape pixel and m is the total number of shape pixels. The matrix D is decomposed into a low-rank matrix L and a sparse matrix S via RPCA, which seeks to solve the following convex optimization problem:

\min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad D = L + S, \qquad (3)

where \|\cdot\|_* denotes the sum of the singular values of the matrix, \|\cdot\|_1 is the sum of the absolute values of all matrix entries, and λ is the weight penalizing denseness of the sparse matrix S. Various algorithms have been proposed to solve the optimization problem in (3). We use the inexact augmented Lagrange multipliers method for RPCA [1], which is efficient and highly accurate. We choose λ = 1/√m as suggested by the available implementation of [1].
The correlation between the feature vectors, and hence between the shape pixels, is encoded by the matrix L, whereas their discrimination is contained in the matrix S. Thus, we define the distinctness of each shape pixel as the norm of the corresponding row vector in the matrix S. The shape pixels whose feature components vary more are found to be more distinct. The shape articulations are associated with larger distinctness since they are thinner than the shape body, and the constant value coming from the shape boundary is propagated faster in these regions during the feature computation.
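A compact sketch of inexact-ALM RPCA and the row-norm distinctness. This is our own reimplementation, not the reference code of [1] (which differs in details such as stopping rules and the μ update cap), and the function names are hypothetical:

```python
import numpy as np

def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500):
    """Sketch of inexact augmented Lagrange multipliers RPCA:
    decompose D into low-rank L plus sparse S."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(m)            # lambda = 1/sqrt(m), as in the text
    norm_D = np.linalg.norm(D)
    two_norm = np.linalg.norm(D, 2)
    Y = D / max(two_norm, np.abs(D).max() / lam)   # dual variable init
    mu = 1.25 / two_norm
    rho = 1.5
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # L-update: singular value thresholding of D - S + Y/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft thresholding of D - L + Y/mu
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Z = D - L - S
        Y = Y + mu * Z
        mu = rho * mu
        if np.linalg.norm(Z) / norm_D < tol:
            break
    return L, S

def distinctness(S):
    # distinctness of each shape pixel = norm of its row in the sparse part
    return np.linalg.norm(S, axis=1)
```

On a synthetic low-rank matrix with one corrupted row, the corrupted row receives the largest distinctness, mirroring how articulated pixels stand out in S.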

Partitioning Shapes into a Set of Regions via Each Feature Space
We utilize the aforementioned property of the distinctness values in order to partition shapes into a set of regions. We first divide the shape domain into two disjoint sets by thresholding at the mean distinctness value. We further partition each set into multiple regions by dilating the two sets one after another in descending distinctness order; in this way, we remove the connections between different regions of each set. The radius of the structuring element used for dilating each pixel is determined by the distance of the pixel from the nearest boundary.
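A simplified sketch of the first step: threshold at the mean distinctness, then split each side into connected components. The distance-dependent dilation step that further separates touching regions is deliberately omitted here, and the function name is our own:

```python
import numpy as np
from scipy.ndimage import label

def partition(mask, dmap):
    """Threshold the distinctness map at its mean over shape pixels, then
    label connected components of the high and low sets separately.
    Simplified: the paper's dilation-based separation step is omitted."""
    mean_d = dmap[mask].mean()
    high = mask & (dmap > mean_d)
    low = mask & ~high
    regions = np.zeros(mask.shape, dtype=int)
    lab_hi, n_hi = label(high)
    lab_lo, n_lo = label(low)
    regions[high] = lab_hi[high]
    regions[low] = lab_lo[low] + n_hi     # offset so labels stay unique
    return regions, n_hi + n_lo
```

For a shape with two high-distinctness blobs (e.g. two articulations) on a low-distinctness body, this yields three regions.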

Measuring Pairwise Shape Dissimilarity via Each Feature Space
We describe each shape region by the normalized probability distribution of the distinctness values of its constituent pixels, where the normalization makes the probability sum equal to the ratio of the region area to the total shape area. In order to estimate the probability distribution, we simply utilize the histogram of the distinctness values with a constant bin size of 0.01. The dissimilarity between a pair of shapes is defined as the cost of the optimal assignment between their regions. We use Hungarian matching to solve the optimal assignment problem; we do not assume any relation between the regions of each shape. Hungarian matching finds a one-to-one correspondence between the regions of the two shapes, possibly leaving some regions unmatched. The cost of assigning two regions is simply taken as the sum of the absolute values of the differences between their normalized probability distributions. The cost of leaving a region unmatched is taken as the sum of its normalized probability distribution, which is equal to its area ratio.
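The "permit unmatched regions" behaviour can be realized by padding the cost matrix with dummy rows and columns whose costs equal the unmatch penalties. The dummy-padding construction below is our own sketch (the paper does not specify an implementation), using scipy's `linear_sum_assignment`:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e6  # forbids pairing a real region with the wrong region's dummy

def shape_dissimilarity(hists_a, hists_b):
    """Optimal assignment between region histograms, allowing unmatched
    regions. Each histogram sums to its region's area ratio; matching two
    regions costs the L1 difference of their histograms; leaving a region
    unmatched costs its histogram sum (its area ratio)."""
    na, nb = len(hists_a), len(hists_b)
    C = np.zeros((na + nb, nb + na))
    for i, ha in enumerate(hists_a):
        for j, hb in enumerate(hists_b):
            C[i, j] = np.abs(ha - hb).sum()          # match cost
    C[:na, nb:] = BIG
    C[na:, :nb] = BIG
    for i, ha in enumerate(hists_a):
        C[i, nb + i] = ha.sum()                      # leave region i of A unmatched
    for j, hb in enumerate(hists_b):
        C[na + j, j] = hb.sum()                      # leave region j of B unmatched
    rows, cols = linear_sum_assignment(C)            # Hungarian matching
    return C[rows, cols].sum()                       # dummy-dummy pairs cost 0
```

With two regions on one shape and three on the other, the extra region is left unmatched at the cost of its area ratio.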

Combining Pairwise Shape Dissimilarities Deduced from Multiple Feature Spaces
In order to define the final dissimilarity of a pair of shapes, we compute a weighted average of the dissimilarities deduced from the six feature spaces. The weight is 1/4 for each of the dissimilarities via the feature spaces constructed using R, whereas it is 1/12 for each of the dissimilarities via the feature spaces constructed using G. The non-uniform weighting reflects that R is more reliable than G, since the shape body is a more stable structure than the articulations.
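The weighting above can be written out directly; note the six weights sum to 1 (3 × 1/4 + 3 × 1/12). The function name and the ordering convention (R-based spaces first) are our own:

```python
import numpy as np

# 1/4 for each of the three R-based spaces, 1/12 for each of the three G-based ones
WEIGHTS = np.array([1/4, 1/4, 1/4, 1/12, 1/12, 1/12])   # sums to 1

def combine_dissimilarities(d):
    """d: the six per-space dissimilarities, R-based spaces first."""
    return float(np.dot(WEIGHTS, d))
```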

Experimental results
As shown in Fig. 3, the distribution of distinctness values varies across representations of different shapes via the same feature space and across representations of a single shape via different feature spaces. Grouping of the distinctness values on the shape domain provides a partitioning of the shape into meaningful regions, such as the shape body (gray) and the articulations (black), via simple operations. In order to observe the clustering effect implied by the proposed dissimilarity measure, we utilize t-Distributed Stochastic Neighbor Embedding (t-SNE) [2], which maps objects into a plane based on their pairwise dissimilarities. In Fig. 4, we show the t-SNE mapping result for the 56shapes [3] dataset, which consists of 14 shape categories each with 4 shapes, where the within-category variations are due to transformations such as rotation, scaling and deformations of articulations. We see that the shapes from the same category cluster together and the shapes from similar categories (e.g. horse and cat shapes) are close to each other.
We compare our clustering results with state-of-the-art methods using Normalized Mutual Information (NMI). NMI measures the degree of agreement between the ground-truth category partition and the obtained clustering partition by utilizing the entropy measure. The formulation of NMI is as follows [4]. Let n_i^j denote the number of shapes in cluster i and category j, n_i denote the number of shapes in cluster i, and n^j denote the number of shapes in category j. Then NMI can be computed as

\mathrm{NMI} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} n_i^j \log\left(\frac{N\,n_i^j}{n_i\,n^j}\right)}{\sqrt{\left(\sum_{i=1}^{I} n_i \log\frac{n_i}{N}\right)\left(\sum_{j=1}^{J} n^j \log\frac{n^j}{N}\right)}},

where I is the number of clusters, J is the number of categories and N is the total number of shapes. A high value of NMI indicates that the obtained clustering matches the ground-truth category partition well. In order to compute the NMI of our clustering result, we need to assign a cluster id to each shape. Given the t-SNE mapping of a dataset obtained using our proposed dissimilarity measure, we apply affinity propagation [5] to partition the dataset into a number of clusters (chosen equal to the number of categories in the dataset).
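The NMI computation described above can be sketched directly from cluster and category labels (the function name is our own; zero-count cells are skipped since 0·log 0 = 0):

```python
import numpy as np

def nmi(clusters, categories):
    """Normalized Mutual Information between a clustering and the ground-truth
    categories, in the entropy-normalised form given in the text."""
    clusters = np.asarray(clusters)
    categories = np.asarray(categories)
    N = len(clusters)
    cl, ct = np.unique(clusters), np.unique(categories)
    # contingency counts n_i^j, cluster sizes n_i, category sizes n^j
    n_ij = np.array([[np.sum((clusters == i) & (categories == j)) for j in ct]
                     for i in cl])
    n_i = n_ij.sum(axis=1)
    n_j = n_ij.sum(axis=0)
    num = 0.0
    for a in range(len(cl)):
        for b in range(len(ct)):
            if n_ij[a, b] > 0:
                num += n_ij[a, b] * np.log(N * n_ij[a, b] / (n_i[a] * n_j[b]))
    den = np.sqrt(np.sum(n_i * np.log(n_i / N)) * np.sum(n_j * np.log(n_j / N)))
    return num / den
```

A clustering that reproduces the category partition exactly scores 1; a clustering statistically independent of the categories scores 0.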
In Table 1, we present the NMIs of our proposed method and other state-of-the-art methods on the 56shapes [3], 180shapes [6] and 1000shapes [7] datasets. The 180shapes dataset consists of 30 categories each with 6 shapes. The 1000shapes dataset consists of 50 categories each with 20 shapes. The method of common structure discovery (CSD) [4] employs hierarchical clustering in which a common shape structure is constructed each time two clusters are merged into a single cluster, where building a common shape structure requires matching skeleton graphs. The method (skeleton path+spectral) presented in [8] combines the skeleton path distance [9] with spectral clustering. The performance of these two skeleton-based methods decreases for the 1000shapes dataset, which contains unarticulated shape categories such as the face category. For the 1000shapes dataset, the highest performance is obtained via the method (shape context+spectral) in [8], which uses the shape context descriptor [10]. As the shape context descriptor is not robust to deformation of shape articulations, its performance decreases for the highly articulated 56shapes and 180shapes datasets. The inner distance shape context (IDSC) descriptor [11] is an articulation-invariant alternative to the shape context descriptor. In [4], the performance of IDSC combined with the normalized cuts (Ncuts) algorithm is reported for the three datasets. Overall, we accurately cluster the shapes from the 56shapes dataset, and our proposed method has the highest average NMI over the three datasets. We observe that, without constructing and matching graphs of shape components, our method performs comparably to the structural methods.

Summary and Conclusion
We presented a novel representation scheme which does not involve any relational/structural model of the shape components. Our representation scheme is based on a pixel-wise distinctness measure obtained by applying RPCA to the shape pixels represented in a high-dimensional feature space. The distinctness measure proves very useful: its spatial distribution on the shape domain provides an easy partitioning of the shape into meaningful regions, and its probability distribution provides a description of each region. We define a pairwise dissimilarity measure as the cost of the optimal assignment between the regions of the shapes. The results of the clustering experiments on highly articulated shape datasets show that our proposed method performs comparably to state-of-the-art methods.
Author Contributions: A.G. and S.T. contributed to the design and development of the proposed method and to the writing of the manuscript. A.G. contributed additionally to the software implementation and testing of the proposed method.
Funding: This research was funded by TUBITAK grant number 112E208.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

2D	Two-dimensional
RPCA	Robust Principal Component Analysis
t-SNE	t-Distributed Stochastic Neighbor Embedding
NMI	Normalized Mutual Information
CSD	Common Structure Discovery
IDSC	Inner Distance Shape Context
Ncuts	Normalized Cuts