SmartFit: Smartphone Application for Garment Fit Detection

Abstract: The apparel e-commerce industry is growing rapidly. In recent times, consumers have become particularly interested in easy and time-saving ways of shopping for apparel online. In addition, the COVID-19 pandemic has generated more need for an effective and convenient online shopping solution for consumers. However, online shopping, particularly online apparel shopping, poses several challenges for consumers. These issues include sizing, fit, return, and cost concerns. In particular, the fit issue is one of the cardinal factors causing hesitance and drawback in online apparel purchases. The conventional method of clothing fit detection based on body shapes relies upon manual body measurements. Since no convenient and easy-to-use method has been proposed for body shape detection, we propose an interactive smartphone application, "SmartFit", that provides the optimal fitting clothing recommendation to the consumer by detecting their body shape. This optimal recommendation is provided by using image processing and machine learning that depend solely on smartphone images. Our preliminary assessment of the developed model shows an accuracy of 87.50% for body shape detection, producing a promising solution to the fit detection problem persisting in the digital apparel market.


Introduction
Recently, societal demand for purchasing apparel products online has increased with technological progress. As a result, the apparel e-commerce industry is booming more than ever before [1]. Therefore, an online presence is necessary for apparel companies to succeed. This is especially true during the COVID-19 pandemic, as online apparel shopping is increasing rapidly [2]. The revenue in apparel e-commerce in 2020 was USD 110.6 billion and is expected to increase to USD 153.6 billion by 2024 [3]. However, one of the main obstacles to increasing revenue in this industry is the large volume of returns. In addition, the industry faces issues with the cost of merchandising, shipping, and processing [4]. One study has shown that around 30% of the apparel purchased online is returned due to poor fit [5]. Current online shopping websites do not provide any means of measuring body size and shape. Tech startups like True Fit (www.truefit.com) and Fits.me (www.fits.me) have developed virtual fitting rooms using virtual mannequins to mimic the body measurements of shoppers and display what a certain garment, i.e., apparel, might look like when it is worn by the consumer [6]. However, these types of solutions fail in practical size assessment at the industrial level due to technical barriers such as an overcomplicated process, inefficient technology, low user adoption, and the high cost of adopting the technology [7]. Even advanced technologies such as Sizer (www.sizer.me) and MTailor (www.mtailor.com) do not always provide actual fit information and size measurements, failing to completely solve the poor fit issue [8].
Online apparel fit detection has been a pressing issue due to the lack of technology that can detect consumer body shapes. Consumers' garment fit largely depends on their individual body shapes [9]. Therefore, there is a tremendous amount of profit loss due to the poor fit and associated return issues. In fact, 50% of the time, online purchases result in returns, which, in turn, cost time and money to both the consumer and the retailer. Returns may cost around USD 10-15 in total per garment, which is a huge burden to be split among consumers as well as businesses [10].
As most garments are made universally in conventional body shapes and sizes, there is a need to solve fit issues in a personalized way, starting from apparel manufacturers all the way to the retailers and consumers. Aside from tailoring apparel stores, apparel manufacturers are adapting to the patterns of producing garments based on shape-based anthropometric data [11,12]. The four most common traditional body shapes for females [13], namely inverted triangle, pear, hourglass, and rectangle [14], are shown in Figure 1. The body shape plays an important role in understanding the optimal fit of garments for individual consumers [15]. However, current technologies cannot provide the body shape information related to the garment fit [10].
In this paper, we propose a novel body shape detection algorithm for garment fit, which extracts consecutive feature points using speeded up robust features (SURF) [16], detects visual words from the extracted feature points using the bag-of-features model [17], and classifies body shapes using a machine learning technique, the k-nearest neighbor (k-NN) algorithm [18]. We also propose a convolutional neural network (CNN)-based [19] body shape detection algorithm. Specifically, we suggest a unique combination of image processing and machine learning based body shape detection using 2D smartphone images.
Especially, SURF feature detection, the bag-of-features model, and a machine learning technique have not previously been combined to differentiate body shapes for garment fit. The conventional methods for body shape detection are limited to a physical measurement-based approach [20] or a multiple photo-based 3D reconstruction approach [15], which are mostly inaccurate and impractical due to expensive computation [21]. Therefore, in this paper, we suggest a unique combination of image processing and machine learning based body shape detection using 2D smartphone images. Figure 2 describes the overall procedure of the proposed method. This paper is organized as follows: Section 2 describes data acquisition and pre-processing procedures. Our proposed method, consisting of feature detection and classification algorithms, is explained in Section 3. Results and discussions are presented in Sections 4 and 5, respectively.


Data Acquisition and Pre-Processing
Experimental images have been acquired following the IRB protocol (IRB2020-482) at Texas Tech University, which allows for the usage of images acquired from public online data. A total of 160 images were originally acquired from online sources (Hourglass body shape, Available from: www.abeautifulbodyshape.com/hourglass-body-shape-explained/; Pear body shape, Available from: www.bodyvisualizer.com/) and the Google search engine, which are publicly available (Our acquired dataset, Available from: www.youtu.be/204kb29laBo). Then, a total of 80 images were manually selected among them by an expert in apparel design based on the body shape determinant criteria [22]. These images were selected because they fell into the four types of female body shapes we particularly focused on, namely inverted triangle, pear, hourglass, and rectangle, since females have more issues with fit and these are the most common body shapes for them. Based on the reference [23], the authors chose 80 images (20 for each category) for the proof-of-concept study in this paper. The major vote method [24] was used to define the labels of the images, and the voting criteria came from the conventional measurement method [25]. Specifically, we used the major vote among four individuals, including two experts in apparel design and merchandising and two individuals trained by the expert in apparel design. The conventional method we used in this paper is based on the relations among four body measurement areas, i.e., shoulder, bust, waist, and hip [13,22,26]. First, the inverted triangle body shape was identified if the shoulders were wider than the hips. Second, the pear body shape was determined when the hips were wider than the shoulders. Third, the hourglass body shape was classified when the widths of the shoulders and hips were similar, but the waist was significantly smaller. Fourth, the rectangle body shape was determined when the sizes of the shoulder, bust, waist, and hip appeared to be similar.
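The four labeling rules above can be sketched as a simple rule-based classifier. This is an illustrative sketch only: the function name and the similarity tolerance `tol` are our assumptions, since the paper does not specify numeric thresholds, and the experts' visual judgment is richer than these comparisons.

```python
def classify_body_shape(shoulder, bust, waist, hip, tol=0.05):
    """Assign one of the four body shapes from four width measurements.

    `tol` is an assumed similarity tolerance (fraction of the larger
    value). The bust measurement is used by the paper's rectangle rule;
    this simplified sketch folds it into the shoulder/hip/waist checks.
    """
    def similar(a, b):
        return abs(a - b) <= tol * max(a, b)

    if similar(shoulder, hip):
        # Shoulders and hips comparable: hourglass if the waist is
        # markedly smaller, otherwise rectangle (all widths similar).
        if waist < (1 - 2 * tol) * min(shoulder, hip):
            return "hourglass"
        return "rectangle"
    if shoulder > hip:
        return "inverted triangle"  # shoulders wider than hips
    return "pear"                   # hips wider than shoulders
```

For example, shoulders and hips of 100 cm with a 70 cm waist fall under the hourglass rule, while a clearly wider shoulder line yields the inverted triangle.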
For the homogeneous processing of the body shape detection, we first performed preprocessing techniques including image resizing (Image Resize, Available from: www.mathworks.com/help/images/ref/imresize.html), edge detection, and smoothing on the acquired images using MATLAB (MATLAB 2020, Available from: www.mathworks.com/products/matlab.html). Specifically, the image resizing changes different sizes of images into the same size, i.e., 300 × 600 pixels. To focus on the outer borderline in body shape detection, the edges are detected on the same-sized images using an edge detection algorithm, e.g., the Sobel filter [27]. For smoothing, the Gaussian 2D image filter, whose filter response at the point (x, y) of the image I is given in Equation (1) [28], is applied:

G(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)), (1)

where σ is the standard deviation of the Gaussian distribution. Using the Gaussian filter, the edge-detected image is smoothed for noise reduction [29]. The sigma for the Gaussian image filter is taken as σ = 0.5. Figure 3 shows an example of smoothing using the Gaussian filter on one of the body shape images in our database. Specifically, we applied this Gaussian filter with σ = 0.5 to the edge-detected image shown in Figure 3a, and Figure 3b shows its smoothing outcome.
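Although the paper performs this preprocessing in MATLAB, the Gaussian filter of Equation (1) and the smoothing step can be sketched in Python with NumPy as follows; the 5 × 5 kernel support and zero padding are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=0.5):
    """Discrete 2D Gaussian filter from Equation (1):
    G(x, y) = 1/(2*pi*sigma^2) * exp(-(x^2 + y^2) / (2*sigma^2)),
    sampled on a size x size grid and normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()

def smooth(image, sigma=0.5):
    """Smooth an edge-detected image by 2D convolution (zero padding)."""
    size = 5
    k = gaussian_kernel(size, sigma)
    padded = np.pad(image, size // 2)
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the local window (symmetric kernel, so
            # correlation equals convolution here).
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out
```

Because the kernel is normalized, smoothing redistributes intensity without changing the overall brightness of interior regions.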
In this paper, we considered four types of body shapes: inverted triangle, pear, hourglass, and rectangle shapes. Each category of body shape consists of 20 labeled images. The obtained database was divided into training and testing sets. For validation, both hold-out and leave-one-out approaches were assessed. For the hold-out validation approach, 70% of the total number of images was used to train the model and 30% was used for validation [30] (see Figure 4). Hence, 14 images from each category, resulting in 14 × 4 = 56 images, were used for training the model.
On the other hand, 30% of the data were randomly held out for validation [31]. As a result, the remaining 24 images (six for each category) were used to validate the model. The leave-one-out method was also assessed, where the machine learning model is applied once for each image from the database, using all other images as the training data and the selected image as a single test image [32]. Therefore, the algorithm was simulated 80 times, each time taking 79 images as training data and the remaining image as the testing image, consecutively.
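The two validation schemes can be sketched as follows; the function names and the fixed random seed are illustrative assumptions, not the authors' code.

```python
import random

def holdout_split(images_per_class=20, classes=4, train_frac=0.7, seed=0):
    """70/30 hold-out split per body-shape category, as in the paper:
    14 training and 6 validation images from each of the 4 classes."""
    rng = random.Random(seed)
    train, test = [], []
    for c in range(classes):
        idx = list(range(images_per_class))
        rng.shuffle(idx)
        n_train = int(train_frac * images_per_class)  # 14 per class
        train += [(c, i) for i in idx[:n_train]]
        test += [(c, i) for i in idx[n_train:]]
    return train, test

def leave_one_out(n_images=80):
    """Yield (train_indices, test_index) pairs: 80 rounds of 79 vs. 1."""
    for i in range(n_images):
        yield [j for j in range(n_images) if j != i], i
```

The hold-out split yields the 56/24 partition described above, while leave-one-out runs the model 80 times with 79 training images each round.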

Methods
Our proposed method first extracts the feature points from the preprocessed images obtained in Section 2, and then uses the extracted features to classify the images into body shape categories. The feature extraction procedure is described in Sections 3.1 and 3.2, and the body shape classification procedure is explained in Section 3.3.

Feature Points Detection
SURF feature points were extracted from the preprocessed images to use as input for the classifier. We adopted SURF since it is widely used in computer vision [33] and is faster and more accurate than other methods, such as the SIFT [34] and Harris corner detection [35] methods. The SURF feature point extraction procedure consists of generating the integral image, approximating the Hessian determinant, and performing non-maximal suppression [36], as shown in Figure 5. The integral image was used for the fast calculation of the sum of the pixel values in a given image or a rectangular subset of the given image. The characteristics of the integral image permit the fast computation of box convolution filters [37]. The box filter was used for detecting blobs or finding points of interest in the image [38,39].
Specifically, corners, edges, and blobs required for body shape detection can be found using the box filter. The box filter is interpolated in the scale (the up-scaling filter size, i.e., 9 × 9, 15 × 15, 21 × 21, 27 × 27, etc.). The scale space model is implemented as up-scaling box filters, with the resulting image convolved with box filters of different sizes, as shown in Figure 6. Each detected feature point in the range of filter size scales is detected and stored in a feature vector.
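The integral image and the up-scaling box-filter sizes described above can be sketched as follows; this is a minimal illustration, not the authors' MATLAB implementation.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum table with a zero border:
    ii[r, c] = sum of img[:r, :c]."""
    return np.pad(np.cumsum(np.cumsum(img, axis=0), axis=1),
                  ((1, 0), (1, 0)))

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) via four table lookups,
    which is what makes box convolution filters cheap."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def surf_filter_sizes(octaves=4):
    """Up-scaling box-filter sizes used by SURF: 9, 15, 21, 27, ...
    (each successive filter grows by 6 pixels)."""
    return [9 + 6 * k for k in range(octaves)]
```

Every box-filter response, whatever the filter size, therefore costs the same four lookups, which is why the scale space can be built without repeatedly re-smoothing the image.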

The non-maximal suppression is required to localize the most dominant features in the scale space model. For the localization of feature points, a non-maximum suppression in a 3 × 3 × 3 neighborhood [40] is used to find the local maxima in an (x, y, σ) space applied at the scale of the detected feature point, as shown in Figure 7a. Figure 7b shows the input image with detected feature points using non-maximal suppression.
In this paper, a database of training body images with their corresponding SURF features is formed. Figure 8 shows the strongest 40 feature points (see green points) with corresponding scales (shown in green circles) and orientations, which are automatically detected using SURF from an image.
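A minimal sketch of the 3 × 3 × 3 non-maximum suppression over an (x, y, σ) response stack, assuming a strict-maximum criterion (the paper does not state how ties are broken):

```python
import numpy as np

def nms_3x3x3(response):
    """Keep points that are strict maxima of their 3x3x3 neighborhood
    in (x, y, sigma) scale space.

    response: (scales, H, W) stack of Hessian-determinant responses.
    Returns a list of (scale, row, col) peak coordinates."""
    s, h, w = response.shape
    peaks = []
    for k in range(1, s - 1):          # interior scales only
        for i in range(1, h - 1):      # interior rows
            for j in range(1, w - 1):  # interior columns
                block = response[k-1:k+2, i-1:i+2, j-1:j+2]
                v = response[k, i, j]
                # Strict maximum: v tops the block and appears once.
                if v == block.max() and (block == v).sum() == 1:
                    peaks.append((k, i, j))
    return peaks
```

A single isolated peak in the stack is thus retained while all of its 26 neighbors are suppressed.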
Feature points are the dominant attributes of an image. Particularly, the feature points are the dominant marks on the outline of the preprocessed images of the subjects. However, they do not provide information about the body shape by themselves. These feature points are used in the further procedure to extract body shape information, as explained in Section 3.2 below.

Bag-of-Features
In our proposed method, the bag-of-features model is utilized to categorize body shapes. The bag-of-features model is an image patch-based classification method, where the most frequently occurring image patches are used to classify an image [41]. The bag-of-features algorithm first obtains image patches from the feature points detected using the SURF algorithm (described in Section 3.1). Then, from the feature points in the training images, we obtain feature vectors. The number of features is balanced across all image categories to improve clustering. Among the categories, the minimum number n of strongest features is chosen to calculate the total number of descriptors using Equation (2):

Number of Features = Number of Categories × n, (2)
where n is the least number of strongest descriptors among the categories. The descriptors of each feature from the whole training image database are then clustered to develop the 'visual words.' Each SURF descriptor provides a vector with a length of 64. A total of 500 visual words was obtained, each of which is determined by the center of the clustered feature descriptors in the known category of body shapes (inverted triangle, pear, hourglass, and rectangle) using k-means clustering [32]. In our body detection experiment, the numbers of descriptors and clusters were around 78,000 and 500 words, respectively, and the dimension of each word was 64 [42]. The visual words were then included in a visual dictionary. The number of visual words in the dictionary was taken as an experimental parameter: if the number is too small, the visual words do not represent all patches, and if it is too large, quantization artifacts and overfitting occur. Figure 9 shows the feature classification of the k-NN algorithm with decision boundaries (feature descriptors are mapped in two dimensions) for all the images of the four types of body shapes in our database.
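The encoding of an image into a visual-word histogram can be sketched as follows. For brevity, the example below uses low-dimensional descriptors and a tiny vocabulary, whereas the paper uses 64-dimensional SURF descriptors and K = 500 words; the function name is our own.

```python
import numpy as np

def encode_bag_of_features(descriptors, vocabulary):
    """Encode an image as a normalized histogram over visual words.

    descriptors: (n, d) feature descriptors of one image
                 (d = 64 for SURF in the paper).
    vocabulary:  (K, d) k-means cluster centers (K = 500 in the paper).
    Each descriptor votes for its nearest word (Euclidean distance)."""
    # Pairwise squared distances, shape (n, K), via broadcasting.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                      # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                       # frequency histogram
```

The resulting fixed-length vector is what the k-NN classifier in the next subsection compares across images.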

k-Nearest Neighbor (k-NN)
The dictionary of visual words, developed from the training images, was used to classify a subject's body shape category. To classify an input image, the proposed method detects and extracts features from the input image and then, using the k-nearest neighbor algorithm, constructs a histogram of features for each image with 500 words. The visual words (cluster centers of feature descriptors) are always the same and obtained from all the images. The feature histogram uses the k-NN algorithm to calculate the frequency of occurrence of individual visual words. Equation (3) shows the Euclidean distance equation [43] used for accurate classification with k-NN:

d(x, y) = √(∑_{i=1}^{k} (x_i − y_i)²), (3)

where k is the number of nearest neighbors, x is the data point, y is the neighboring point, and d is the distance. An input image is classified by determining the maximum occurrence of the visual words (feature patches) that correspond to a specific body shape category. The occurrence of those visual words in that image is calculated using k-NN. Figure 10 shows the histogram of the visual word vocabulary occurring in the input image, with the bar heights corresponding to the frequency of occurrence.
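The k-NN majority vote over feature histograms, using the Euclidean distance of Equation (3), can be sketched as follows; the function name and the toy data in the usage note are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(query, train_vectors, train_labels, k=3):
    """Classify a feature histogram by majority vote of its k nearest
    training histograms under the Euclidean distance of Equation (3).

    query:         (d,) encoded histogram of the input image.
    train_vectors: (n, d) encoded histograms of the training images.
    train_labels:  length-n sequence of body-shape labels."""
    # Euclidean distance from the query to every training vector.
    d = np.sqrt(((train_vectors - query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]                 # indices of k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]           # majority class
```

For example, with two "pear" vectors near the origin and two "hourglass" vectors far away, a query close to the origin is voted "pear" by its three nearest neighbors.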
Figure 9. The k-nearest neighbor algorithm for body shape classification. The bag-of-features descriptors are mapped in 2 dimensions in the figure, each cluster center pointing to one visual word using k-means clustering, and the k-nearest neighbor is used for generating the histogram of visual words.

The bag-of-features model provides a method for counting the visual word occurrences in an image by encoding the image into a feature vector. For classification, each input image is assigned the majority class of the k nearest visual words in the descriptor space, using the Euclidean distance in Equation (3). The parameter k controls how many of the nearest neighbors vote for the output class. The histogram bins are incremented based on the distance of the feature descriptor to a particular cluster center. The histogram length corresponds to the number of visual words in the dictionary.
The k-nearest neighbor algorithm was trained on our training dataset. The features extracted from the training images were used to develop the feature map, and the nearest neighbors were found using the Euclidean distance of Equation (3). When a new image is obtained, its features are extracted and mapped onto the feature map; the distances to the adjacent points are calculated and sorted, and the k nearest neighbors are taken. A majority voting scheme then assigns the new image to one of the four classes. The block diagram in Figure 11 describes the k-NN algorithm for body shape detection.
Figure 11. The k-nearest neighbor block diagram for classifying body shape.

For the classification, we considered four different types of body shapes: inverted triangle, pear, hourglass, and rectangle. Hence, a feature point consisting of multiple dimension parameters is sorted into the class whose visual words are closest, following the k-NN rule. In this way, our proposed algorithm detects the body shape.
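The k-NN classification procedure just described (distance computation, sorting, majority vote over the four classes) can be sketched as follows. This is a toy NumPy illustration with synthetic feature histograms, not the authors' implementation; the cluster parameters and k value are assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(query_hist, train_hists, train_labels, k=5):
    """Assign the majority class of the k nearest training histograms,
    measured by the Euclidean distance of Equation (3)."""
    dists = np.sqrt(((train_hists - query_hist) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]          # indices of the k smallest distances
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]        # majority vote

# Toy feature map: two tight clusters of 4-bin training histograms.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.05, (10, 4)),
                   rng.normal(1.0, 0.05, (10, 4))])
labels = ["pear"] * 10 + ["rectangle"] * 10

print(knn_classify(np.full(4, 0.95), train, labels, k=5))  # near cluster 2 -> "rectangle"
```

In practice the training histograms would be the bag-of-features encodings of the 56 training images, labeled with the four body shape categories.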

Convolutional Neural Network (CNN)
The CNN algorithm uses a neural network consisting of input, hidden, and output layers with a bank of convolutional filters and pooling operations. The CNN applies convolutional filters to an input image to create a feature map summarizing the features detected in the input. A pooling layer was used to reduce dimensionality. Adjusting the weights of the neurons as data pass through the network enables the CNN to differentiate between classes. The block diagram of the CNN for body shape detection is given below.
The convolutional layers extract the feature maps using convolution filters, and the fully connected layers are used to adjust the weights of the classifier. Here, the AlexNet architecture [44] has been adopted, which has eight learned layers, comprising five convolutional layers (interleaved with max-pooling layers) and three fully connected layers, as shown in Figure 12. The activation function for each node in the fully connected layers is the sigmoid function expressed in Equation (4):

σ(x) = 1 / (1 + e^(−x)),    (4)

where x is the input value and σ(x) is the output value between 0 and 1. In each stage, the convolution function is used to extract features, and the activation function is used for classification.
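Equation (4) can be rendered directly as a one-line function, which makes its squashing behavior easy to check:

```python
import math

def sigmoid(x):
    """Equation (4): maps any real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))             # 0.5 (the midpoint)
print(round(sigmoid(2), 4))   # 0.8808
```

Large positive inputs approach 1 and large negative inputs approach 0, which is what allows the output node activations to be read as class scores.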
Figure 13 shows the training and testing process of the CNN. In the proposed method, the CNN model was trained on 56 images taken randomly with labels from the database as training data. For each input training image, this operation updates the weight values of the CNN. For testing, the remaining 24 images were taken and validated against their labels. Figure 14 shows the body shape classification procedure of our proposed CNN algorithm; the trained CNN differentiates the body shapes of the input test images.

Results
In our proposed method, k-NN and CNN are adopted as body shape detection algorithms due to their relatively high accuracy. k-NN is a representative machine learning classifier that is widely used for classification in various fields, including handwritten digit recognition from images [45] and tumor classification from images [46], and is procedurally simple, calculating the Euclidean distance between dataset input images [47]. Both k-NN and CNN were implemented on a computer with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz (eight CPUs, ~1.8 GHz; Intel Corporation, Santa Clara, CA, USA) and 8192 MB RAM.
The k-NN was implemented with the input of the bag-of-features model. A minimum of 252,343 features were taken from each category, resulting in a total of 1,009,372 features (252,343 features/category × 4 categories) for the whole database (see Equation (2)). The proposed machine learning method with the bag-of-features model and k-NN classifier achieves 87.50% accuracy, whereas the support vector machine (SVM) [48] and multilayer perceptron (MLP) [49] deliver 72.17% and 62.50% accuracy, respectively. Our proposed method therefore performs better in terms of accuracy, with relative differences of 17.52% (SVM) and 28.57% (MLP) with respect to the proposed method's accuracy. The accuracy comparison of the proposed method with SVM and MLP is given in Table 1. Figure 15 shows the prediction accuracy of two body shape categories using the bag-of-features model and k-NN classifier. As shown in Table 2, our classifier detects the inverted triangle and rectangle body shapes with 100% accuracy, and the hourglass and pear shapes with 67% and 83% accuracy, respectively, resulting in an overall testing accuracy of 87.50% on average. It takes 133.13 s to train the k-NN with the training datasets. Our simulation results show that the leave-one-out approach gives 88.75% accuracy, while the hold-out approach gives 87.50% accuracy. In terms of computational time, the hold-out approach took around 60 times less than the leave-one-out approach.
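The quoted relative differences can be reproduced in a few lines. Note the interpretation is ours, inferred from the numbers: the percentages are computed as (proposed − baseline) / proposed, i.e., relative to the proposed method's 87.50% accuracy.

```python
# Accuracy values from the comparison: proposed k-NN 87.50%, SVM 72.17%, MLP 62.50%.
proposed = 87.50
baselines = {"SVM": 72.17, "MLP": 62.50}

# Relative difference with respect to the proposed method's accuracy.
relative = {name: round((proposed - acc) / proposed * 100, 2)
            for name, acc in baselines.items()}
print(relative)  # {'SVM': 17.52, 'MLP': 28.57}
```

The absolute accuracy gaps (15.33 and 25.00 percentage points) are smaller than these relative figures, which is why stating the normalization explicitly matters.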
The CNN model developed for body shape detection was trained on the same database. The training procedure uses nine epochs (the number of complete cycles through the training dataset) with a mini-batch size (the number of sample images processed before the model is updated for the next iteration) of six. The initial learning rate, the step size by which the weights are updated during training, is taken as 0.0001. As shown in Table 3, the CNN classifier detects the inverted triangle, pear, hourglass, and rectangle body shapes with 100% accuracy. It takes 59 s for the CNN to train itself with the training datasets.
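The training configuration above implies a fixed amount of bookkeeping, sketched below in pure Python. The assumption that a final, smaller batch of leftover images is still used (rather than dropped) is ours and is not stated in the paper.

```python
import math

# Hyperparameters reported for the CNN classifier.
n_train, n_test = 56, 24          # hold-out split of the 80-image database
epochs = 9                        # complete passes over the training set
batch_size = 6                    # images per weight update
learning_rate = 1e-4              # initial step size for weight updates

# With 56 training images and batches of 6, each epoch yields 10 updates
# (the last batch holds the 2 leftover images, under our assumption).
batches_per_epoch = math.ceil(n_train / batch_size)
total_updates = epochs * batches_per_epoch
print(batches_per_epoch, total_updates)  # 10 90
```

This kind of arithmetic is useful for sanity-checking why the CNN's 59 s training time is plausible for so small a dataset.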

Discussion
The conventional technology to detect body shapes relies on body measurements, and the dimensional body scanning model depends on these characteristics to classify body shapes. For example, the inverted triangle is detected by such a scanner when the heaviest part of the body is the upper body and the measured shoulders are wider than the hips [50]. In [22], a body shape detection method based on parameters of female body measurements was suggested. However, our proposed method does not depend on body measurements, instead using only the acquired image to detect the body shape. Our proposed method is more convenient because it does not require exact body measurements taken manually from users. Conventional methods have the limitation that incorrect measurements can produce incorrect body shape detection, since rigorous measurements of a person's shoulder, bust, hips, and waist are necessary to calculate the body shape. Collecting this measurement information is usually inconvenient and difficult due to the lack of measurement tools or professional tailoring skills. Since there is no universal body measurement table for predicting a person's body shape, each method relies on different measurement criteria, which can be widely inaccurate [11]. In contrast, our approach is an image processing and machine learning-based method that does not require any measurement input from the user and detects the actual body shape from a smartphone camera image. The majority vote method [51] was used to assign the label of the subject, i.e., the body shape category. The labels are also supported by the traditional method of measurements [25], introducing repeatability among different observers. Our proposed technology does not rely on any precalculated criteria. The comparative evaluation of the proposed method with state-of-the-art methods is presented in Table 4.
Hidayati et al. [11] introduced a study on body shape detection and styling using the measurements of 3150 female celebrities obtained from a website (www.bodymeasurements.org). The gold standard of the body shape labels was not defined in that paper; rather, an unsupervised feature learning method was used by combining different measurements. The intention of using unsupervised learning is to extract the natural grouping of a set of body measurements. Therefore, the clustering of measurement data, e.g., by affinity propagation, has been presented as a viable solution for body shape detection. As shown in Table 4, the accuracy of our proposed method is higher than that of the method proposed by Hidayati et al. [11].

Conclusions
In this paper, we proposed an effective method to detect body shapes using a smartphone, which will resolve fit issues, increase consumer convenience, and improve the overall online apparel shopping experience for consumers. Our proposed method has been shown to provide 87.50% accuracy on average in classifying four different body shapes: inverted triangle, pear, hourglass, and rectangle. The proposed method is expected to increase the feasibility of smartphone-based online apparel shopping by providing optimal-fit garment suggestions in a more accurate and personalized way [52], which will ultimately benefit online apparel retailers by increasing their revenue and decreasing returns due to poor fit. The results of this study show a potential contribution to the textile and apparel industry by providing contact-free body shape detection with high accuracy. Further development of the proposed method will generate a model enabling the detection of other body types by incorporating additional data on other existing body shapes. The next step of this research is to analyze the shape and fit of the garment itself and match it with the body shape detection data, so that optimal-fit garment information can be provided to consumers.

Limitations and Future Research
There are some limitations of this study which can be explored in future research. First, we only classified four female body shapes. Hence, future research should address male body shapes as well as additional female body shapes, such as the round shape. Second, we used 2D images to determine the body shapes; however, height and the side silhouette, in addition to the front silhouette examined in this paper, can affect fit. Hence, further research on fit detection related to height and the side silhouette is needed. In addition, examining body shape detection with a larger sample size can increase the accuracy of the results.