Probabilistic Unsupervised Machine Learning Approach for a Similar Image Recommender System for E-Commerce

: The recommender system is the most profound research area for e-commerce product recommendations. Currently, many e-commerce platforms use a text-based product search, which has limitations to fetch the most similar products. An image-based similarity search for recommendations had considerable gains in popularity for many areas, especially for the e-commerce platforms giving a better visual search experience by the users. In our research work, we proposed a machine-learning-based approach for a similar image-based recommender system. We applied a dimensionality reduction technique using Principal Component Analysis (PCA) through Singular Value Decomposition (SVD) for transforming the extracted features into lower-dimensional space. Further, we applied the K-Means++ clustering approach for the possible cluster identification for a similar group of products. Later, we computed the Manhattan distance measure for the input image to the target clusters set for fetching the top-N similar products with low distance measure. We compared our approach with five different unsupervised clustering algorithms, namely Minibatch, K-Mediod, Agglomerative, Brich, and the Gaussian Mixture Model (GMM), and used the 40,000 fashion product image dataset from the Kaggle web platform for the product recommendation process. We computed various cluster performance metrics on K-means++ and achieved a Silhouette Coefficient (SC) of 0.1414, a Calinski-Harabasz (CH) index score of 669.4, and a Davies–Bouldin (DB) index score of 1.8538. Finally, our proposed PCA-SVD transformed K-mean++ approach showed superior performance compared to the other five clustering approaches for similar image product recommendations.


Introduction
In the digital era, the e-commerce industry is growing rapidly, especially since many of the users are migrating towards online shopping from traditional offline shopping in many developing countries. In countries such as India, China, Singapore, Malaysia, Japan, etc., the growth rate consistently increases, and millions of users are interested in purchases through e-commerce platforms [1,2]. There is a wider scope of increase in the demand for e-commerce purchases in many developing nations due to the COVID-19 pandemic situation, and millions of people prefer to purchase through online shopping now-a-days. Especially several categories of products are in high demand like clothing, electronics, furniture, sports, etc. [2]. Out of which apparels and some of the electronic products mostly rely on visual appearance to attract the users to purchase the products. These e-commerce portals consist of millions of images relevant to various products, and bringing the desired product of the customer is a challenging issue. Many researchers had come up with several possibilities to address the problem but not a satisfactory solution to many e-commerce problems [3][4][5][6] as many e-commerce platforms use a recommender system and mainly rely on textbased searching approaches. It takes the user input and bases it on the word tokenization process. The recommender system is to bring the possible matching products to the users. However, this kind of approach had some limitations and missed many features such as colors, patterns, texture, and shape in the product images [7,8]. In this text-based search, there is a need to specify the descriptions of each product, which may not describe the entire product. Here, we come up with a possible solution, which is to search for the relevant product bases on the given input image, as shown in Figure 1. We try to train the model for the various features through the feature engineering process, such as colors, texture, and the shape of the objects in the image, which further can be processed through various machine learning dimensionality reduction techniques, such as Principal Component Analysis (PCA) [9], Expectation-Maximization (EM) PCA [10], Linear Discriminant Analysis (LDA) [11], and Probabilistic PCA [12,13], for the proper recommendations from the given input image [14]. Once the features are extracted, it is necessary to compute the similarity measure in terms of distance to be computed from the origin to the set of images in the database [15]. Several clustering-based unsupervised learning approaches have been proposed by researchers for a similar image recommender system where the ground truth labels are known. For measuring the distance and analyzing the quality of the recommender model, the following measures [16] are to be used for the unsupervised learning approach.


Adjusted Rand Index (ARI) is used to measure the similarity between the clusters where ground truth labels are known [17]. The ARI can be computed using Equations (1) and (2) [16].
where , are the number of elements with the same and different set in S of the cluster.
 Homogeneity (hg), Completeness (cp) are the measures for the same class and same cluster predictions [18], and harmonic mean can be computed using V-measure [19] as described in Equation (3) [16].
Where is the random value to be randomly considered as less than 1, the harmonic mean value ranges between 0 and 1, and the highest value will be the best to consider.  The Fowlkes-Mallows score (FMS) [20,21] is used to compute the geometric mean of the similarity of the clusters where the ground truth labels are known where the FMS is lying between the 0 and 1, and a greater value is a better similarity among the clusters. The FMS uses the True Positives (TP), False Positive (FP), and False Negative (FN) for the similarity measure analysis is shown in Equation (4) [16].
In our research work, we proposed of approach through unsupervised clustering algorithms where the ground truth labels are unknown. This kind of clustering problem can be computed with performance measures such as the Silhouette Coefficient (SC), the Calinski-Harabasz (CH) index, and the Davies-Bouldin (DB) index, and these measures are discussed briefly in the results section in the paper.
Further, we organized this research work into five major sections, where related works are discussed in Section 2, our proposed method of a similar image recommender system is presented in Section 3, dataset, pre-processing stage, and various experimental approaches are presented in Section 4, the final results and comparisons are presented in Section 5 and the conclusion and limitations of this research work are presented in Section 6.

Related Works
Image-based recommendations through clustering play a vital role in many application areas, and profound research works are done mostly on the medical images for various analyses using artificial intelligence and other machine learning techniques. Hancer et al. [22] proposed an artificial bee colony approach for the three benchmarks Lena, Remote Sensing, and Brain MRI images for the clustering approach compared with the particle swarm optimization, and K-means cluster approaches. M. Gong et al. [23] had come with a new clustering approach using the Kernal metric for the fuzzy c-mean clustering process and tested on the various image datasets like synthetic, natural, and medical images for the performance evaluation. Karthikeyan and Aruna [24] proposed a probabilistic text and image-based semi-supervised clustering approach. They used the topic modeling, comparing image features such as color sets and image block signature computing the similarity distance measure. Their method further compared with the K-means and Dbscan unsupervised clustering algorithms.
Matrix-factorization-based image clustering was proposed by K. Zeng et al. [25], and the objective of their work to recognize the basis on the parts of the objects in the images which are computed through the non-negative matrix factorization technique with the hypergraph regularization process. The authors used the USPS handwritten digit dataset, ORL, and Yale, which are the grayscaled face image datasets used for the performance analysis, and compared the results with pre-existence clustering algorithms such as graph regularized Non-negative matrix factorization (NMF), PCA, and K-means. An improved Artificial Bee Colony (ABC) clustering approach had been tested on seven benchmark images compared to the fitness values, objective functions along with quality measures such as DBI and XBI values with particle swarm optimization, genetic algorithms, and K-means clustering [26].
Younus et al. [27] proposed a content-based similar image retrieval system using Particle swarm optimization (PSO) and K-means by extracting various features in the color images like histograms, co-occurrence matrix, both the color and wavelet moment for the clustering process. For their experimentation, the Wang dataset is used, which consists of 10 label classes with every 100 images. Precision and recall measures are computed on the query and retrieval images and listed out the comparison results with existing methods. The hierarchical and flat clustering approach used to cluster for the cell phone images was proposed [28,29] using sensor pattern noise. In this approach, a total of 1350 images are used to process and cropped and converted to grayscale for applying wiener filter and applied the hierarchical and flat clustering approach. Then, to validate the clustering process, a silhouette coefficient is computed and further applied true positive rate to assess the degree of certainty of the clusters.
Convolutional neural networks have been used to train and perform image clustering by continuously applying the forward and backward propagation to improve the clustering process for representation learning, object loss function, and interpreted through agglomerative clustering process [30]. The work is extensively tested on the benchmark multiple handwritten image datasets and facial image datasets and made a performance comparison with several other algorithmic approaches by using NMI metric. Pandey and Khanna [31] had proposed the Content-based image retrieval (CBIR) approach using agglomerative clustering and applied on labeled multiple datasets and computes various cluster measure for similarity. Vantage Point (VP) trees are a popular approach for the faster indexing of the images in a CBIR approach and in [32] come up with a new distance index measure called DCIVI for the nearest and furthest neighbors by applying the VP-Tree concept for faster image retrieval on a remote sensing image dataset compared with a sequential scanning algorithm in terms of indexing, feature extraction and query response time.
Biradar and Ahmed [33] developed visual CBIR using edge and corner detection of the image using the Harris corner detector and feature extraction of the images done using the Voronoi tree algorithm. Further authors using a support vector machine (SVM) as a classifier of final image classification achieved 90% accuracy on their approach. In [13], the PCA-based unsupervised learning approach was implemented for the segmentation of brain tumor cells from T1 weighted MRI scan images. To this process, authors tested various PCA dimensionality reduction techniques and found that Expectation-Maximization (EM) PCA and Probabilistic PCA (PPCA) are the best fit for their process through FCM clustering and K-means clustering with various image dimensions. A graphbased probabilistic dimensionality reduction using manifold hashing techniques was proposed for image similarity search, which uses the K-nearest neighbor approach for landmark representation and further analysis [34]. Authors had tested their model on benchmark datasets such as CIFAR, MNIST, NUS-WIDE, and GIST and compared with other popular hash methods, such as LSH, ITQ, MDSH, AGH, etc., and shown improvement using their proposed method.
A similar fruit query system was proposed by Fachrurrozi et al. [35] using fuzzy-based feature extraction using a color histogram, and the color moment invariant approach then applied the Kmeans clustering for the extracted features to compute the similarity distance between the query image and the cluster images done using the K-nearest neighbor search algorithm. In this process, the authors used various single and multi-object fruits to test the accuracy of their model and achieved 92.5% and 90%, respectively. Many researchers have been using converting RGB (Red, Green, and Blue) images to greyscale for feature extraction and further apply various approaches for similar search recommendations. Instead of converting into greyscale, PCA with a smaller dimension of the images can be utilized. Fleece fabric-based image classification has developed using PCA dimensionality reduction and further applies K-nearest neighbors and Naive Bayes classifiers for the accuracy comparison [36]. Various textile image retrieval was proposed using local PCA-based features descriptors like color, orientation for localized color feature extraction and classification done on the join feature criterion for better image retrieval and experimentation carried out on various textile images with different patterns and measured the precision and recall [37].
Chen et al. [38] proposed a novel approach to high dimensional large scale datasets for image classification and retrieval using the local neighboring approach and clustering through NQ-Dbscan and achieves the ( * log( )) and ( ) for some cases. Singh and Srivastava [39] had come up with an improved image classifier using Random Forest classifier and LBP feature extraction for text, color, moment, and histogram features. Benchmark Wang dataset is used for performance analysis in terms of accuracy, precision-measure, and MCC are compared with K-NN, SVM, Naive Bayes with the proposed approach. In [40], an artificial neural-network-based approach, along with various image descriptors for color, edge detection using the YCbCr method and discrete wavelet transformation, was applied for better image retrieval, and experiments have been carried out on the Wang dataset and computed the accuracy measures. Later, Recall measures were compared with the existing methods. Jian et al. [41] used the direction patch extraction approach based on the perception and color histogram, and the Gabor filter for texture feature, employed for its process to avoid the complex segmentation, further uses Dbscan clustering for various regions and was compared with other researchers work in terms of precision and recall.
Jafarzadegan et al. [42] proposed the PCA-based combined hierarchical cluster approach and used the cophenetic correlation score and Wilcoxon hypothesis measure to compare the quality and efficiency of the cluster approach. Various datasets have been taken and compared with other existing methods and shown improvement using their methodology. Mateen et al. had proposed VGG-19based feature extraction along with dimensionality reduction using PCA, SVD and optimized through Gaussian Mixture Model for the Fundus image classification and carried out the experimentation on the Kaggle image dataset consisting of 35,126 images and achieved more than 92% accuracy at various levels [43]. The authors of [44] presented keyword-based image recommendations for e-commerce products using the Markov-chain-based method along with the annotations for the image and results are compared with the existing methods. D. sha et al. had proposed a fashion clothing feature extraction for analyzing various attributes, such as cloth pattern, sleeves, and collar using GIST, Fourier and Pyramid Histogram of Oriented Gradients (PHOG) feature extraction and experimented using the tmall fashion dataset with 8000 images [45]. In [46], the authors proposed a text-based image recommendation using bag-of-words and the Term frequency-Inverse document frequency (TF-IDF) approach to get similar product image recommendations. For this approach, the authors experimented on the amazon fashion image dataset and computed the Euclidean distance measure to fetch the top-N similar recommendations.
Many researchers have worked on various machine learning approaches for similar image retrieval systems for various domains with supervised and unsupervised approaches. The majority of the researchers performed computing performance metrics such as precision, recall, and F1, where ground truth labels are known. In this paper, we have addressed a similar image recommender system for the e-commerce domain using various machine learning unsupervised clustering algorithms and computed the cluster performance metrics where ground truth labels are unknown.

Similar Image Recommendation
In this section, we discuss our proposed research method using Machine learning statistical unsupervised dimensionality reduction techniques using Principal Component Analysis through singular value decomposition (PCA-SVD) on a fashion image dataset and further applied unsupervised clustering algorithm using K-means(++) to find out the similar image clusters and computed the distance measures for the similar top-N product image recommendations and in Figure  2 shown our proposed research method. Figure 2. Proposed visual recommender system using machine learning approaches.

PCA through Eigen Value Decomposition (PEVD)
Dimensionality reduction is helpful to reduce the image dimensions and initial image matrix represented with the RGB (Red, Blue, and Green) color space values. Initially, the standardization process was for scaling and centering the data on a d-dimensional apparel image dataset. In this process, we can compute mean and standard deviation on the given data points as follows Equations (5)-(7) [9,47].
where are the feature values of the given data matrix; further, the covariance data matrix is constructed to see how far the features are changing together by using the following covariance Equation (8), where A, B are the features and , are the feature values, and the mean is represented by µ.
Further, the covariance matrix is used for calculating and building eigenvalues and eigenvectors for measuring the spread of the variance on projected vector points. Computing the eigenvalues and eigenvector from the following Equation (9): where , , , … and , , , … are the eigenvalues and eigenvectors of covariance matrix and further generates the new, transformed dimension vector.

PCA through Singular Value Decomposition (PSVD)
Principal components (PC) computation through singular value decomposition is a better approach in terms of numerical precision and stability. SVD [48,49] relies on the divide-and-conquer strategy, and eigenvalue decomposition mostly relies on the QR algorithm [50], which is less stable, and a loss of precision can cause the forming of the covariance matrix in the EVD approach. PCA through SVD is also computationally efficient when compared to EVD in terms of dealing with the high dimensional datasets. SVD for principal components can be computed using Equation (10) Furthermore, in our approach, we have performed the dimensionality reduction on our ecommerce dataset. The total dimensions in our dataset before PSVD [51] transformation is 14,400 and after applying the SVD and reduced by 90.01% and retained the 144 components to transform the ecommerce images. The following Figure 3 shows the reconstructed images from the original image after retaining the major principal components, and Figure 4 shown the 2-Dimensional visualization image projection for major principal components of the e-commerce product dataset.

K-Mean(++) Clustering Approach
Clustering is an unsupervised learning technique to find out the K-Patterns in the given image dataset. For our image similarity approach, we have taken a Fashion image dataset with 40,000 images for analysis. We have computed K-Mean clustering [52] by computing ten iterative times with different centroid positions, and the K-Means++ [53] initialization method has been used for better convergence. An earlier version of K-Means consists of the initialization method, assignment of data points, updating the centroid, and repeats the stages until it finds out the convergence. Hence, choosing the random k-centroids is also known as the Initialization sensitivity procedure. To overcome the initialization sensitivity procedure in our approach, we have adapted the K-Means ++ approach for finding out the better initial selection of K-centroids for PCA transformed images using the SVD approach on our e-commerce 40K dataset. Here, in the K-Means ++ approach initially picks a centroid point arbitrarily and computes the distance to all available data points from the arbitrarily initialized centroid, as shown in Equation (11). = ( : ) || − || (11) where is the number of centroids used and picked for each iteration and becomes the new centroid depends on the proportion of higher probability to distance and we repeat these steps until finding the K-centroids. Further, we find the Euclidean distance for each data point from the Kcentroids using Equation (12).
Closest points with respect to the e-commerce images are assigned to the clusters and repeats this process until we reach the convergence and all the image points ( ) are assigned to Cluster by using Equation (13).
Further, we have generated 16 different clusters from sampled images from the 40K image dataset by computing inertia, as shown in Figure 5, and the silhouette coefficient to understand the goodness of our cluster model. Equation (14) is used to measure the goodness metric through the silhouette coefficient [54] on various clusters formed using k-Means++.
where ( ), ( ) are the average and lowest average distances between and all other data points within and different clusters, respectively.

Similar Distance Measure
Fetching the relevant similar images from the given input image from the fashion dataset, we have computed the Manhattan distance measure [15] between any two given vectors X and Y to compute the lowest distance on PCA through SVD transformed image clusters generated using the K-Means++ approach for e-commerce product image clusters by using the distance metric function as shown in Equation (15) and fetches the top-5 most similar images for the given input image as user recommendations.

Dataset and Experimentation
We have collected 40K fashion product image dataset [55] along with the major category and subcategory of the products. There is a total of six major categories of products, such as apparel, accessories, footwear, personal care, sporting goods, etc., and 44 subcategories of the various fashion products top wear, shoes, bags, watches, wallets, belts, etc. Out of these major categories and subcategories, most items are listed in apparel, accessories, footwear, and personal care. Further, as in the pre-processing data stage, we have sampled out into our training dataset with 7632 images of 16 subcategories for our experimentation purpose with each category of equal distribution to avoid the unbalance data distribution problems that affect the final recommendations. Figure 6 shows the category product image distribution from our fashion product image dataset.

Experimentation
In this section, we performed our experimentation using the PSVD approach for dimensionality reduction and fetched the important principal components. Initially, we have set the number of components to 500 and computed the PCA singular values through the PSVD process on our sampled dataset dimensionally reduced to 90.01% generates 144 final principal components. Figure 7 shows cumulative variance by 500 PCA components and convergence achieved at 144 principal components. Further, we have computed the K-means++ clustering approach on the PSVD transformed images. Here, we have computed various initializations for the number of cluster identifications ranging from 8 to 16. For each iteration, we tried to determine the better convergence for the better number of cluster formation of our fashion products. Further, we have calculated each cluster inertia to maximize the quality estimation of the number of clusters for further analysis. We have found optimal values at cluster 16 with a lower inertia value, which converge after 35 iterations. In Figures  8 and 9, showed the high dimensional cluster segregation for 16 subcategories of images using tdistributed stochastic neighbor embedding (t-SNE) [56] for the K-means++ approach.  Finally, we computed the distance measure using the Manhattan distance measure for the given input image to the cluster images and fetched the top-K similar images, and we have further analyzed and compared with other standard clustering algorithms for better results. All our experimentations are carried out using a desktop computer with Intel Core i7-8700K Processor/16 GB of RAM/64-bit Ubuntu operating system/Python 3.6 Environment.

Results
We have carried out various cluster performance metrics to analyze and achieve better recommendation results for our approach. Apart from inertia for K-means clustering, we have, furthermore, computed various cluster performance metrics, such as the silhouette coefficient, as computed using Equation (14) with results ranges from the values −1 to 1, where the higher the SC value, the more efficient the cluster. In our approach, we have achieved a 0.1414 coefficient value, which is a higher value from the other algorithms. The comparison results for the SC coefficient for various unsupervised clustering algorithms are shown in Figure 10. To analyze the criteria for the variance ratio for checking the clusters, we computed the Calinski-Harabasz (CH) score. The CH score tells us whether or not the clusters are well separated and if the optimal value is higher than any other clustering approach. Equations (16)- (18) shows the calculation of the CH score: where is the number of data points, ( ), ( ) are between and within the cluster variation, respectively, is total clusters, is the set of cluster points, and the total data size is . The CH score we got here for our approach is 669.4, which an optimal value compared to other algorithms. The comparison results for various CH scores are shown in Figure 11. Finally, we have computed the Davies-Bouldin index score for the average cluster similarity, which computes that an optimal value is a high score among the clusters, and zero is the lowest score. The DB index score can be computed using Equations (19) and (20): = + (20) where is the mean distance between cluster points and is to compute the similarity between centroids of the clusters and . The DB score we achieved is 1.8538, which is slightly lower compared to other algorithms. The comparative performance of the DB index score for various clustering algorithms is shown in Figure 12.   Figure 13 shown the final similar product recommendations using PSVD transformed K-means++ approach with Top-5 recommendations fetched using the Manhattan distance measure.

Conclusions
The purpose of our proposed model is to fetch similar images for e-commerce portals when the user selects the desired product image using unsupervised statistical machine learning techniques. Here, in our approach, we initially performed PSVD dimensionality reduction on the fashion product image dataset and retained the 144 principal components and achieved a 90.01% variance from the original dimensions, which are a total of 14,400 dimensions. Further, we performed the K-means++ clustering on the PSVD transformed images. We have compared PSVD transformed with other clustering algorithms, such as MiniBatch, Agglomerative, Brich, the Gaussian Mixture Model (GMM), and K-Mediod. Performance metrics are evaluated on all these algorithms for the SC coefficient for similarity, CH score for variance ration, and DB score for average similarity. Out of all the measures, PSVD-K-means++ has scored well on the SC coefficient and CH score and is lagging in the DB index score. Where we observed in the final similar image recommendations, most of the recommendations work well, but, in a case like if the input image orientation changes, it is possible to get the mixed product suggestion instead of all the similar images. To overcome these limitations further, we suggested applying image augmentation techniques and deep learning approaches for more accurate image feature extraction and training for similar image recommendations.
Author Contributions: Conceptualization, methodology, validation, writing-original draft preparation, S.K.A.; Supervision, review and guidance, A.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.