Nonparametric Bayesian Learning of Infinite Multivariate Generalized Normal Mixture Models and Its Applications

This paper addresses the problem of data vectors modeling, classification and recognition using infinite mixture models, which have been shown to be an effective alternative to finite mixtures in terms of selecting the optimal number of clusters. In this work, we propose a novel approach for localized features modelling using an infinite mixture model based on multivariate generalized Normal distributions (inMGNM). The statistical mixture is learned via a nonparametric MCMCbased Bayesian approach in order to avoid the crucial problem of model over-fitting and to allow uncertainty in the number of mixture components. Robust descriptors are derived from encoding features with the Fisher vector method, which considers higher order statistics. These descriptors are combined with a linear support vector machine classifier in order to achieve higher accuracy. The efficiency and merits of the proposed nonparametric Bayesian learning approach, while comparing it to other different methods, are demonstrated via two challenging applications, namely texture classification and human activity categorization.


Introduction and Related Works
In recent years, there has been major progress in Statistical Machine Learning (SML) with unsupervised learning problems such as clustering, classification and recognition for both univariate and multivariate data. The success of statistical models depends on their capacity to model and build an effective representation that holds the underlying distribution of the data. SML has been used in various areas of research, such as knowledge discovery, computer vision and pattern recognition. Over the last decades, there has been extensive state-of-the-art related to these problems. In particular, there has been an increasing interest in the use of both supervised and unsupervised learning methods to assign or group similar objects into homogeneous and disjoint clusters. Such methods are also used for constructing classifiers to effectively recognize objects on the basis of discriminative visual features. A typical broadly used learning method is the so-called statistical mixture model [1][2][3][4][5], which has the advantage of offering a tractable and flexible basis for statistical inference. It is noted that statistical modelling has much to contribute to machine learning and data science. In particular, mixture models are widely used in scientific problems, where multidimensional objects are to be clustered or classified. In this situation, the data should be modelled in terms of a mixture of many components. The well-principled Finite mixture models (FMM) are used with success to partition multimodal data points and to learn cluster membership of observations with unknown cluster labels [6]. For instance, finite Gaussian mixture models were employed successfully in many applications [7,8].
However, assuming Gaussian form density implies a strong assumption about the clusters' shape and may lead to overfitting the number of components, as well as a poor recognition (or classification) rate.
A difficult aspect when considering finite mixture models is the determining of the exact component numbers, which help to avoid the issues of over-and under-fitting and minimizing the approximation errors, especially when we are trying to model complex real-world data problems (such as multimodal data) [9]. For instance, Laplace and Normal densities fail to fit many complex shapes with multi-dimensional data and when describing the heavier tails caused by specific patterns [10,11]. In this context, non-Gaussian data modelling plays an essential role in accurate data clustering and classification. This problem can be addressed with more flexible statistical models, such as generalized Gaussian mixtures [11]. For this purpose, extensive research efforts have been developed; nevertheless, many of them still fail to achieve high accuracy for many applications [12]. On the other hand, several deterministic approaches have been implemented to deal with model parameters estimation [13,14]. These approaches (also known as likelihood-based approaches) are well documented but they suffer from local optimality.
In recent years, there has been increasing interest in Bayesian approaches (both parametric and nonparametric), which are seen as an attractive alternative for dealing with the limitations of deterministic methods such as dependency on initialization [11,15,16]. Indeed, the availability of prior knowledge about the parameters helps to accurately estimate these parameters from the data using a fully Bayesian approach. This approach is becoming more and more popular due to its flexibility and potential to include prior information. In previous work [10], authors have shown that Bayesian estimation of multivariate generalized Gaussian (Normal) mixture can offer good flexibility in terms of data fitting. The proposed method allows for the flexible modeling of both multidimensional and multivariate data (i.e., correlated data) by considering the relationship between attributes. It has also been demonstrated that multivariate normal mixtures are suitable for high-dimensional data, but inference on multivariate mixture models is therefore more difficult [17]. Parametric approaches, however, are inappropriate for real machine learning problems as data sets grow over time under uncertainty and incompleteness. Thus, unlike parametric Bayesian approaches, which assume an unknown finite number of components, nonparametric Bayesian approaches guess infinitely complex models (i.e., an infinite number of components) [18,19] and have undergone significant theoretical and computational progress over the years [20,21]. It should also be noted that, in finite mixture models, estimating the optimal number of clusters is one of the difficult issues that can lead to an overfitting problem. To cope with this, it is possible to think about extending the finite model to infinite mixtures. In addition, given some famous algorithms that are able to generate accurately observations from posterior distribution, such as the Gibbs algorithm-a particular form of Markov Chain Monte Carlo method (MCMC) [22][23][24]-infinite mixture models have become an attractive paradigm for effective unsupervised learning.
The rest of this paper is organized as follows: In Section 3, we present our proposed infinite mixture model as well as a fully Bayesian learning approach; in Section 4, we study the performance of the developed framework on the basis of different applications and data sets; finally, in Section 5, we conclude the current work.

Proposed Method
In this work, we take a step forward by extending the finite multivariate generalized Normal mixture [10] to a more flexible infinite mixture (inSSDMM) where a nonparametric prior on the data is adopted. The proposed mixture is learned via a fully unsupervised Bayesian nonparametric approach by analyzing data without a priori information on the number of clusters. Indeed, we propose applying a fully MCMC inference based on Gibbs and Metropolis-Hastings algorithms to learn our infinite model. An efficient soft quantization method for features and descriptors encoding using a Fisher Vector is also exploited in this work, as in [25]. Unlike some earlier works, such as histograms and kernel codebook (bag-of-words approach), the Fisher Vector method offers more robust local features representation [25]. Finally, we test the proposed approach on two challenging problems, namely texture classification and object categorization.
In the following, we describe the different steps of the proposed framework, such as the feature extraction step, the modelling step with the infinite mixture model and learning the process via priors and posteriors (see Figure 1). Indeed, we start by extracting effective visual features from each input image. Then, these features are modelled using our developed infinite multivariate generalized Normal mixture model. The proposed statistical model is then learned via an effective nonparametric MCMC-based Bayesian approach in order to avoid the crucial problem of model over-fitting and to allow uncertainty in the number of mixture components. The next step consists of encoding features with robust Fisher vectors to obtain effective descriptors that consider higher order statistics. Finally, these descriptors are combined with a linear support vector machine classifier in order to classify and categorize images.

Features Extraction and Encoding
The feature extraction step has been largely dealt with in the field of image analysis applications. It is the process of acquiring relevant information such as shape, color and texture. The extraction and characterization of particular regions in a given image is one of the preliminary steps in many image processing procedures. To accomplish this objective, it is crucial to analyze these images by extracting important details (patterns) and providing an accurate description of them. If relevant and representative features are well extracted, then the corresponding images will be interpreted better and then classified correctly. It is known that the majority of classification and/or categorization methods follow a common approach that is based on extracting visual features and applying a feature quantization step (Bag of Words (BoW) model) and/or a soft assignment with mixture models. However, these attempts give little importance to feature encoding and therefore result in lossy step, as shown in [25,26]. In order to overcome this shortcoming (i.e., reduce the quantization error), we propose here to apply a robust encoding step based on Fisher kernel [25,27,28], which is defined as an extension of the BoW model. The resulting Fisher Vector, which is determined by the gradient with respect to the model's parameters, contains both zero order statistics and higher order statistics of the developed infinite mixtures of the features. Thus, the obtained vector is computed by concatenating the gradient for all components of the infinite mixture model (inMGNMM). After that, we proceed with a power normalization step on the feature vector in order to obtain an invariant vector of descriptors, as in [28]. The last step consists of introducing the obtained descriptors into a linear SVM classifier to perform the classification or the categorization task. Thus, the proposed pipeline is summarized as follows (see also Figure 1):

1.
Extracting visual features from the training images;

2.
Encoding extracted features into robust descriptors based on the proposed infinite mixture model and a Fisher kernel to encode the higher order statistics of the infinite mixtures of the extracted features; 3.
Classifying and/or categorizing image descriptors using the SVM classifier.

Model Specification
We start by recalling here the finite mixture of K multivariate generalized Normal distributions (MGND).

Definition 1 (Multivariate generalized Normal distribution).
We say that Y, D-dimensional vector, has a multivariate generalized Normal distribution with parameters ( µ; Σ; β) and noted as Y ∼ MGN D ( µ; Σ; β), if its density function is defined as [10]: For the case of finite mixtures, we suppose that we have K components following a mul- Thus, the likelihood is defined as follows [4]: where Θ = ( w, θ). p( Y n |θ k ) is the k th component of the mixture defined by its parameters µ k is the mean vector of the MGGD distribution. • β k represents its shape parameter. It inspects the peakedness and the spread of the distribution. • Σ k is its covariance matrix, also called a scatter matrix.
It is noted that each Y n is generated from one component.

The Infinite Mixture Model
In recent years, many nonparametric Bayesian methods have been developed in order to build more realistic and more flexible statistical mixture models. In this case, parameters are considered random and follow specific probability distributions (known as prior distributions), allowing us to describe our knowledge before seeing the data, and to update our prior beliefs, the likelihood is used. On the other hand, nonparametric Bayesian models have the advantage of allowing an infinite number of clusters, so there is no constraint on this number, and they are therefore considered more suitable for modeling, in particular with dynamic data. Thus, the number of parameters may increase or decrease as we have new data. With an infinite mixture, we can overcome the limits of the finite models, which are highly restrictive and then we effectively take into account the uncertainty of the model (the data come from an infinite number of clusters). In what follows, we will show how infinite mixture models have the ability to generate new clusters or delete some of existing ones when new data arrive. The model learning is based on building estimates of the posterior distribution for our infinite multivariate generalized Normal mixture model using the Monte Carlo Markov Chain (MCMC) technique [15,29]. Given that the calculation of posterior distributions is intractable for the case of mixture models, it is therefore recommended to implement effective approximation tools to generate samples from these posterior distributions. In this work, we are motivated to apply the well known Gibbs sampler [11,30] to generate posterior samples. In what follows, we focus on developing the priors and conditional posteriors needed to run Gibbs sampling.

Priors and Conditional Posterior Distributions
In contrast to deterministic approximation, which is generally based on the EM algorithm-where the missing data noted by Z = ( Z 1 , . . . , Bayesian approach to mixture learning is more efficient. Indeed, it is able to overcome the drawbacks related to the deterministic methods (maximum likelihoodbased technique) such as its convergence to local maxima and their tendency to overfit noisy and sparse data. In the following, we develop our fully nonparametric Bayesian approach for learning the infinite multivariate generalized Normal mixture model (inMGNMM). Thus, the likelihood of complete data (both observed and missing) is expressed as: The role of the likelihood is to update the prior knowledge as regards the parameters of the model (p(Θ)). Then, the priors on the model parameters will be updated and expressed in terms of posterior knowledge p(Θ|Y, Z ). By exploiting the Bayes' rule, we have the following relationship between priors, likelihood and posteriors [4]: where (Z, Y ), p(Θ) and p(Θ|Z, Y ) are the complete data, the prior information about the model's parameters and the posterior, respectively. It is important to note that the choice of the priors is one of the crucial aspects of Bayesian modeling and directly influences the final results [31]. The provided observed data will certainly improve these priors. In what follows, selecting and specifying the best choice for priors is covered as well as the finding of the underlying equations for the posteriors. After that, we will simulate a value very close to the truth for Θ ∼ p(Θ|Z, Y ) by applying the efficient Gibbs sampler algorithm. In the current work, we choose priors for the parameters of the proposed model as follows: • We start by specifying a common prior for µ j as a Normal distribution, thus we have For the shape parameter , β j , an appropriate prior is the Gamma distribution (β j ∼ G(α β , β β )). • For the covariance matrix Σ j , a natural prior choice is the Inverted Wishart (Σ j ∼ IW(ψ, ν)).
Having all these prior distributions, we move now to calculate all conditional posterior distributions based on the Bayes' theorem. - The conditional posterior distribution for the mean parameter is determined as: - The conditional posterior distribution for the shape parameter is determined as: - The conditional posterior distribution of covariance matrix is calculated as: As for the conditional posterior distribution of the mixing weight w, we know that w is determined on the simplex {(w 1 , . . . , w K ) : ∑ K−1 j=1 w k < 1}, then, in order to account for the infinite mixture principle, a natural prior that we have to state is the Dirichlet distribution with parameters δ M [20].
The inference of w is performed via the inference of the membership vectors Z n . Thus, we can deduce p(Z | w) as follows: where a k represents the total elements in cluster k. When considering the Dirichlet integral, . For Y l , l = 1, . . . , n − 1, the conditional prior that X n to be in component k is given as [32]: In order to N vectors, Equation (10) is updated as follows [32]: where Z −n = { Z 1 , . . . , Z n−1 , Z n+1 , . . . , Z N }, a −nk is the number of vectors, excluding Y i , in cluster j. It is then necessary to calculate the conditional posterior by multiplying the prior (Equation (11)) by the likelihood Y i .
Remember that, in order to deploy mixture models, we have to define the number of components (called the model's complexity), which is always a difficult question. In our nonparametric case, we initially assume an infinite number of clusters by making K → ∞ in Equation (11). So, the conditional prior becomes [20,32]: where the represented clusters are denoted by R and the unrepresented clusters are denoted by U . Once the conditional priors in Equation (12) are evaluated, we now calculate the posterior using the following resulting equation [32]: Depending on a given probability, the former equation can be explained clearly by the fact that each vector is assigned to a represented or unrepresented cluster. If a vector is set for an unrepresented component, then we will create a new represented component all the time we have at least an empty cluster). This scenario explains the principle behind the countably infinite mixture well [20]. Our implemented Bayesian infinite model is illustrated in Figure 2. This figure shows the underlying graphical model. Figure 2. Graphical representation of our developed Bayesian infinite multivariate generalized Normal mixture model. Fixed hyperparameters are indicated by rounded boxes and random variables by circles. Y is the observed variable, Z is the latent variable, large boxes indicate repeated process, and the arcs show the dependencies between variables.

Pseudo-Algorithm
After obtaining all necessary conditional posterior distributions, we consider the MCMC to estimate the underlying values. Here, the iterative Gibbs algorithm is executed [21] in order to sample the next unknown amounts from the conditional distributions given the values obtained previously. Our implemented pseudo-algorithm is summarized as follows: • Generate Z i from Equation (13) It is worth noting that, during the initialization step, the algorithm starts by supposing that all the vectors are in the same cluster. Moreover, the initial values for the model's parameters are supposedly produced from their prior distribution. To sample the vectors Z i , it is required to compute the integral in Equation (13); however, this quantity is not analytically tractable. To deal with this issue, we propose to apply an efficient algorithm named "Metropolis-Hastings algorithm (M-H)". The latter has the advantage of making the right decision to accept or reject the new samples-at iteration (t)-using an acceptance rate defined as follows: where q is the target distribution. In order to deal with the convergence problem with MCMC inference (in other words, how much the MCMC has to be executed), this problem is addressed, as in [33], to stop sampling and by applying a successful algorithm named one long-run with diagnostics [34].

Experiments
The goal of the following experiments is to evaluate the proposed framework and to compare the performance of the infinite multivariate generalized Normal mixture to its counterpart finite model and also to other models. We are primarily concerned with two challenging applications, namely texture classification and human activity categorization. For each of these, we have run ten chains with varying initial parameter values and have considered 10,000 iterations for the Metropolis-within-Gibbs sampler. Convergence was assessed using the diagnostic procedures, as in [35]. For all models tested, we discarded the first 1000 iterations as "burn-in" and kept the remaining 9000 iterations, from which we calculated the posteriors.

Texture Classification
First of all, we are motivated by the problem of texture images' modeling and categorization. Unlike natural images, which include certain objects and structures, texture images are a special category of images and do not have a well defined shape. Texture characteristics play an important role in visual content analysis and are a well-studied property of images. It has been widely exploited before as an important indicator for the categorization and classification of images [36][37][38][39]. A number of computer vision applications use the texture information such as information retrieval, material classification [40], object detection [37,41], image segmentation [36,42,43] and facial expression recognition [38]. Previous texture extraction techniques have been mainly based on Gray Level Co-occurring Histograms [44], Local Binary Pattern (LBP) [45] and different filter banks [46]. An in-depth examination of these approaches is beyond the scope of our study. Instead, we are focusing on an efficient technique to extract relevant texture features and this technique has produced interesting results (DMD) [25]. It is based on the representation of texture using a set of local features called dense micro-block difference (DMD). The obtained features do not involve any quantization, thus retaining the complete information. In addition, these features do not involve any thresholding step and have the advantage of being quick and simple, like binary features. They are capable of achieving high discrimination and are considered invariant with respect to orientation, scale and resolution. After extracting these features, we encode them using a Fisher vector method [25] to obtain a set of descriptors of the underlying image. Thus, each image is represented by a robust multidimensional vector of descriptors that involve high-order statistics (the Matlab code for the features is available at http://www.cs.tut.fi/~mehta/texturedmd (accessed on 1 February 2021)). The last step is to classify the resulting descriptors using a linear support vector machine classifier, which results in the categorization of texture images into homogeneous categories. In all our experiments, we use a linear SVM classifier, which is able to directly operate on the vector of features and also requires less training time over other SVM kernels. The performance study is reported in terms of accuracy based on 10 fold cross validation.
Experimental evaluation involves the following texture databases. The first dataset, named UIUCTex [47], contains 25 texture classes and each one includes 40 images, thus, each has 40,640 × 480 pixel images. The textures of this dataset are taken under different scales and viewpoint changes, including non-rigid deformations. The second dataset, named KTH-TIPS [40], contains images of ten classes (materials) and each one includes 81 images (200 × 200 pixel ) per class. Images are captured with different combinations such as nine different scales, viewed under three different poses and illumination directions. The third dataset is the UMD dataset [48], which is composed of 25 textures classes including 40 samples (with resolution of 1280 × 900) per each texture-class (a total number of 25 × 40 = 1000 uncalibrated images). Images are captured under different significant distances and viewing directions, and the illuminations are uncontrolled. Figure 3 shows random samples of textured images from different classes presenting in the three datasets. For all these databases, 50% of images are randomly chosen for training and 50% for testing. In order to quantify how well textures are classified, we evaluate the performance with an accuracy metric. To validate our approach (inMGNMM), we compared it with five other methods, namely finite Gaussian mixture (GMM), infinite Gaussian mixture (inGMM), finite generalized Gaussian mixture (GGMM), infinite generalized Gaussian mixture (inGGMM), and finite multivariate generalized Gaussian mixture (MGGMM) [10].  The evaluation results using three different datasets for six different methods are summarized in Tables 1-3, respectively. Accordingly, it is clear that our method has the highest achieved accuracy for the three datasets. In particular, for the case of the UMD dataset, the accuracy of our infinite mixture with DMD features is 92.12% and, therefore, it outperforms the other methods-GMM, inGMM, GGMM and inGGMM by-6%, 5%, 2.37% and 1.92%, respectively. Likewise, we reached the same conclusion for other datasets, and our model provides very encouraging results using different features as compared to other models. By contrast, the worst performance is obtained with the Gaussian-based classification. We can also conclude that all obtained results using our proposed infinite mixture are increased by around 1% to 2% if comparing it to its finite mixture counterpart (MGGMM). This can be explained by the model's capacity to combine prior knowledge during the learning process, its generalization capability, and its flexibility in terms of making less restrictive assumptions about the number of clusters than the finite model. On the other hand, a superior performance of the visual feature DMD over the SIFT and LBP for texture classification task can be observed in all these datasets. With DMD, the accuracy is improved over SIFT and LBP by around 3% and 2%, respectively. This result is probably achieved by the rich description of DMD obtained by considering all possible fine details at different resolutions and scales. Finally, we note that the performance of all methods for the KTH-TIPS dataset indicates that it is sometimes difficult to identify texture because of its various rotation and viewpoint changes.

Human Actions Categorization
Recently, multimedia categorization and recognition are becoming two challenging research problems that have attracted a lot of attention for several applications [49][50][51][52]. Object categorization refers to the task of labelling objects into one of the predefined and meaningful categories and this step is mainly based on extracting effective visual features. Image categorization plays an essential role in many applications such as automatic organization, activity recognition, object retrieval and tracking. In particular, recognizing human activities (HAR) is extremely recommended for security purposes (video surveillance), health care, robotics, and so forth. The objective is to identify the actions of one or more persons and then to analyze them [53]. It is worth noting that manually categorizing a big dataset is impractical and time consuming. Recently, various automatic and semi-automatic methods have been developed to address this problem [10,12,[54][55][56]. For instance, a finite generalized Gaussian mixture model is investigated to deal with HAR through a complex real dataset [10,12,52]. In [57], the authors developed an algorithm to identify human gestures using features derived from key poses. A Hidden Markov Model is also presented for HAR in [58]. It is noted that, for a human, a simple glance is enough to classify or categorize any action. Nevertheless, exact and accurate categorization with a computer (or algorithm) is a very difficult task and it will be more complex in the presence of illumination, occlusions, lighting, noise, the and absence of a large amount of labelled data. Thus, increasing attention has been paid to unsupervised machine learning methods, which are able to learn objects from unlabelled data. Furthermore, implementing more advanced computational intelligence techniques is required to solve the above issues and to achieve higher recognition rates.
In this work, our contribution consists of exploiting our approach for human activity categorization and recognition. For this goal, the KTH human action dataset [59] was used to evaluate its performance. Some samples of representative actions with four scenarios are presented in Figure 4. These scenarios are outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3) and indoors (s4). The dataset basically contains 2391 sequences with different actions that are grouped into six categories-running, boxing, walking, jogging, hand waving and hand clapping. Before we applied our model to categorize actions, we basically adopted a few useful preprocessing steps. Fist, a visual vocabulary was constructed by adopting the bag-ofwords (BOW) model representation for the input training sequences. Indeed, each sequence would be defined by a set of local SIFT3D descriptors [60]. These features were then quantized as visual words via the K-means algorithm [61]. We then had a frequency histogram over the visual words. After that, we applied a probabilistic Latent Semantic Analysis (pLSA) [62] method on the obtained frequency histograms to generate a d-dimensional vector for each image. This final representation would then be used for unsupervised activity categorization.
In order to assess the performance of the proposed approach, we adopted the accuracy metric as a recognition rate for human action categorization. To test the efficiency of our method, we used a five-fold cross-validation setup. For evaluation purposes, we compared the proposed approach with several state-of-the-art methods as depicted in Table 4. According to this comparative study, we can clearly see that our approach provides the best recognition rate (84.68%) compared to the other methods. This result is considered very encouraging given that we approached the recognition problem in an unsupervised manner. The superior performance of the proposed infinite model over its finite counterpart (82.01%) is also clear. This result proves that the infinite mixtures are more attractive and flexible enough to be applied to recognition problems. We report that the developed framework does not take into account any background tracking or subtraction and, for this reason, the accuracy is considered quite high. Note that sometimes we are not able to recognize very similar actions such as those involving hand motions. To improve these findings, it may be interesting to simultaneously incorporate a feature selection process into the developed statistical learning model.  [63] 83.33 Wong et al. [64] 73.24 Fan et al. [65] 79.33 Schuldt et al. [59] 71.71 Dolar et al. [66] 78.50 MGGMM-FS [10] 82.01 MGGMM-KL [10] 82.36 inMGNMM (our method) 84.68

Discussion
The experiments conducted show the superiority of our approach over other state-ofthe-art statistical models and methods. Indeed, the obtained results show the benefits of our model and prove the flexibility of the implemented statistical framework. In particular, we obtain a better performance within the descriptor DMD (in terms of accuracy). However, it is important to note that some poor results are obtained (for all involved methods in this work) and this can probably be explained by the difficulty of identifying the exact texture class due to its different rotations and changes in viewpoints. One possible solution, which we are starting to work on, is to introduce a feature selection mechanism, as in [67,68], to improve the classification performance. Another challenge we face with our method is to fine-tune hyperparameters that may have an impact on classification and recognition accuracy. It is also important to note that the quality of images used for validation purposes may be affected by uncertainties and/or inaccuracies that would make the edges not perfectly distinguishable. In these cases, it would be necessary to use fuzzy preprocessing of the images, as done in [69,70]. On the other hand, Bayesian classification by infinite mixture models is also noted to be interesting due to its efficient MCMC sampling technique; however, it is problematic because of its high computational cost. In particular, we run the Gibbs algorithm several times to achieve a correct average accuracy. Therefore, one potential for future work might be to extend this work to use variable inference instead, in order to save time over MCMC methods. Finally, it may be important to apply the proposed work to more complicated and challenging machine vision datasets for further performance evaluation.

Conclusions
We have proposed in this paper an effective hierarchical nonparametric statistical Bayesian framework that tackles complex data modeling, categorization and classification. The proposed framework is based on a mixture of multivariate generalized Normal distributions (MGNMM). The main goal here is to extend the finite MGNMM to the infinite case (inMGNMM), in which the number of components is supposed to be infinite initially and will be inferred automatically during the learning and modeling of classes. The success of the nonparametric Bayesian learning approach is due to the appropriate choice of priors and the accurate posterior estimates via MCMC simulations. The proposed inMGNMM is learned through MCMC inference. This learning algorithm is guaranteed to converge and has the advantage of taking into account our prior knowledge, to formalize our uncertainty via flexible probability distributions, and to overcome the over-fitting issue. Our choice of the infinite assumption, instead of the finite, is motivated by the fact it is able to learn the number of components (i.e., model selection) and the parameters at the same time. The implemented framework allows for the logical treatment of determining the optimal number of components and the efficient estimation of the parameters' values. The effectiveness of the developed approach is confirmed by testing it on two challenging computer vision applications, namely texture classification and human activities categorization. Extensive experiments were conducted and the results demonstrate that the proposed approach outperforms the state-of-the-art on a number of datasets. These results prove the coherence, effectiveness and flexibility of our approach. Future works could be devoted to extending the current Bayesian work by integrating a feature selection mechanism, which could further enhance the expected results. It is also our hope that many other real-world image/signal processing problems can be tackled within the proposed approach. Finally, we plan to extend the current work by implementing a variational inference model in order to overcome the computational complexity of the Bayesian approach. Data Availability Statement: Data available in a publicly accessible repository: UIUCTex [47]; KTH-TIPS [40]; UMD dataset [48]; KTH human action dataset [59].