Anthropometric Landmarks Extraction and Dimensions Measurement Based on ResNet

Abstract: Anthropometric dimensions can be acquired from 2D images by landmarks. Variance in body shape causes low accuracy and poor robustness of the extracted landmarks, and it is difficult to determine the position of the axis division point when dimensions are calculated by the ellipse model. In this paper, landmarks are extracted from images by a convolutional neural network instead of from the gradient of the body outline. A general multi-ellipse model is proposed: the anthropometric dimensions are obtained from the lengths of different elliptical segments, and the position of the axis division point is determined by the thickness-width ratio of the body parts. Finally, an evaluation is completed on 87 subjects. The average accuracy of our method for identifying landmarks is 96.6%; when the number of rotation angles is 2, the three main dimensional errors calculated by our model are smaller than those of the existing method, and the errors of the other dimensions are also within the margin of error for garment measuring.


Introduction
Anthropometry is an important part of ergonomics, which can be applied to garment customization, virtual fitting [1], somatotype [2][3][4] and other fields. Anthropometric dimension measurement [5] based on 2D images has attracted wide attention due to its high efficiency, portability, and low cost. Most measurement methods rely on landmarks derived from the body silhouette, and the accuracy of landmark extraction affects the accuracy [6] of the dimensions. In recent years, experiments have shown that Convolutional Neural Networks (CNNs) have clear advantages in pose estimation [7][8][9] and image segmentation [10][11][12], but this technology has not been applied to the extraction of landmarks. Therefore, it is worthwhile to integrate anthropometric dimension measurement with convolutional neural networks.
The anthropometric dimension measurement method based on 2D images is divided into three steps. First, the silhouette of the human body is extracted from 2D images. Then, the landmarks are extracted from the silhouette based on its gradient. Finally, the anthropometric dimensions are calculated from the positional relationship between landmarks. An example is the double ellipse model [13], whose error is caused by inaccurate silhouette extraction and by the difference between the actual human body shape and the fixed proportion at which the minor semi-axis of the ellipse is divided.
This paper proposes a method of measuring anthropometric dimensions based on 2D images, which extracts landmarks from images through a convolutional neural network and builds a general multi-ellipse model in which body shape information is added to correct the ellipse model, avoiding the errors caused by silhouette extraction and by the double ellipse model. More importantly, this method guarantees a high correlation between landmarks and body parts, and the extracted landmarks are robust. Taking human images of any resolution as the input of the network, we annotate 14 landmarks on the front and side images respectively and train on these images separately with 101-layer ResNet [14], which first outputs the heatmap of landmarks in the images and then outputs the precise coordinates of the landmarks. In applications, a person is required to wear tights and stand in a fixed posture that allows slight posture changes; the purpose is to reduce the deformation range of posture so that the network can better learn the spatial location of landmarks on the human body. The body shape information of the person is retained, and the position of the axis division point in the ellipse model is determined by the thickness-width ratio of the body parts. First, real data are used to calculate the deformation range of the thickness-width ratio of body parts and the position of the axis division point; then a functional relationship is established between the thickness-width ratio and the position of the division point. By comparing 21 main measured dimensions with real data, it is confirmed that the errors are within the margin of error for garment measuring (within 2 cm), and the error calculated by the multi-ellipse model is smaller than that of the existing method.
Our contributions are as follows: (1) A new way to extract landmarks from 2D images is proposed, namely extraction by a deep convolutional neural network. (2) A multi-ellipse model is proposed in which the position of the axis division point is determined by the thickness-width ratio of the body parts, which reduces the error of anthropometric dimensions. (3) The method is evaluated on real samples and compared with other methods, demonstrating the accuracy of the multi-ellipse model when the number of images is 2.

Related Work
Anthropometric dimensions can be expressed by the actual distance between two landmarks. Whether in traditional manual measurement or measurement based on 2D human images, most methods rely on landmarks that are highly related to body parts, which requires that landmarks be extracted accurately. For example, the scale-invariant feature transform (SIFT) [15] and speeded up robust features (SURF) [16] can be used to extract landmarks. However, such methods extract a large number of useless landmarks with low correlation to body parts, and the process is complex and inefficient. In [12], Murtaza Aslam proposed a method in which landmarks are extracted automatically by numerical calculation: the maximum number of landmarks is obtained by converting 2D images into one-dimensional signals, which greatly improves the number of landmarks extracted and the speed of extraction. However, the peak values of the human silhouette are not obvious when this method is applied to special body shapes, such as very fat or very thin bodies, in which case only a few landmarks can be extracted. In [17], a neural network is used to regress 3D human model parameters from the silhouette, the 3D human model is reconstructed from those parameters, and landmarks are extracted on the model for measurement. With its high complexity, this method is very inefficient when faced with mass data. Different from the above methods, landmark extraction is treated as a regression problem in this paper: a deep convolutional neural network extracts the spatial structure information of the human body, and the positions of the landmarks are learned. A large number of landmarks can be extracted by our method, which is not limited by body shape and other factors, achieving faster and more accurate measurement of anthropometric dimensions.
With the rapid development of convolutional neural networks, pose estimation algorithms have been effectively improved. In [18], conditional random fields are used to model the locations of key points in a CNN, and a heatmap is proposed for the first time to obtain the spatial information of key points. Shih-En Wei proposed a multi-module, multi-stage network architecture to express spatial information and texture information [19]; the output of each stage of the network is taken as the input of the next stage, and the method is evaluated on multiple datasets with strong robustness. In [20], a parsing induced learner (PIL) is proposed, which uses body part information to assist key point recognition and pose estimation. Different from previous work, this paper transforms skeleton key points into landmarks highly related to body parts and uses residuals to transfer features between non-consecutive layers of the network.
After complete and reliable landmarks are extracted, the dimensions must be calculated. General dimensions such as height and arm length can be obtained by calculating the Euclidean distance between two landmarks. On the other hand, dimensions with radians, such as chest circumference and waist circumference, are difficult to calculate. The thickness and width of the chest can be used as the minor and major semi-axes of an ellipse, and each arc can be fitted as an ellipse; the dimension is then obtained from the perimeter of the ellipse, but this method has a large error for overweight subjects. Considering the difference between the front and back radians of the human body, a newer ellipse model fits the front and back radians as two different semi-ellipses and stitches them into a complete ellipse to calculate anthropometric dimensions. In this paper, a general multi-ellipse model is presented, and body shape information in the form of the thickness-width ratio of body parts is used to correct the position of the division points in the ellipse model, which reduces the error of the dimensions. Figure 1 shows the process of the algorithm. First, we input front or side human images into 101-layer ResNet, which extracts image features and predicts the heatmap of landmarks; the final output is the coordinates of the landmarks. After that, the body parts are modeled, and the dimensions are calculated based on the landmarks.
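The single-ellipse girth estimate and the stitched two-half-ellipse variant discussed above can be sketched as follows. This is a minimal illustration: Ramanujan's second approximation is our choice for the ellipse perimeter (the paper does not specify one), and the function names are illustrative, not the authors' code.

```python
import math

def half_ellipse_length(a, b):
    """Half the circumference of an ellipse with semi-axes a and b,
    using Ramanujan's second approximation for the full perimeter."""
    h = ((a - b) / (a + b)) ** 2
    perimeter = math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))
    return perimeter / 2

def stitched_girth(width, front_thickness, back_thickness):
    """Girth from two half-ellipses that share the same major semi-axis
    (half the body width) but have different minor semi-axes (front and
    back thickness), stitched along the major axis."""
    a = width / 2
    return half_ellipse_length(a, front_thickness) + half_ellipse_length(a, back_thickness)
```

When the front and back thicknesses are equal, the stitched girth reduces to the single-ellipse perimeter, which is the case the multi-ellipse model generalizes.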

Landmarks Extraction Based on ResNet
Measuring methods based on 2D images rely on landmarks highly related to body parts, and the accuracy of landmark extraction directly affects the measurement error. As shown in Table 1, the coordinates of landmarks in the image are marked according to the method specified by GB/T 38131-2019 [21], in the same way as in pose estimation [19]. The convolutional neural network is used to implicitly learn the spatial information of the human body and the positions of landmarks in the structure of the human body. Figure 2 shows all marked landmarks, with F representing the front and S representing the side.

Landmark            Definition
FP1, SP1 Vertex     The highest point of the head.
FP2 Right Acromion  The most lateral point of the lateral edge of the spine (acromial process) of the right scapula, projected vertically to the surface of the skin.

Predicting the position of landmarks from 2D images is a complex problem because human posture and body shape vary over a large range. To help the network learn useful information, we decouple posture information from body shape information by requiring people to stand in a fixed posture, so the network needs to learn only body shape information, not posture information. The network structure is shown in Figure 3. A deep network is prone to degradation, which affects the learning effect; this problem can be solved by skip connections between different layers. More importantly, many image features are lost during convolution, and a good learning effect cannot be obtained through features transferred only between consecutive layers. Features can be transferred between non-consecutive layers after residuals are introduced into the network, so important features extracted by the lower layers are not lost in the convolution process.
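The identity skip connection described above can be sketched minimally. The linear-plus-ReLU `conv_stage` below is a stand-in for a real convolutional stage (a hypothetical simplification, not the network in Figure 3); the point is only that the input is carried past the stages unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_stage(x, w):
    """Stand-in for a convolutional stage: a linear map plus ReLU.
    (A real ResNet stage would be convolution + batch norm + ReLU.)"""
    return np.maximum(0.0, x @ w)

def residual_block(x, w1, w2):
    """Two stages with an identity skip connection: features extracted
    in lower layers are carried past the stages unchanged, so they are
    not lost even if the stages contribute little."""
    return x + conv_stage(conv_stage(x, w1), w2)

x = rng.standard_normal((1, 8))
w1, w2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
out = residual_block(x, w1, w2)
```

If both stages output zero, the block reduces to the identity, which is why residual networks degrade gracefully as depth grows.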

A deconvolution module is added after the feature-extraction module to restore the feature maps to a higher resolution and to predict multichannel features; the number of channels of the final output equals the number of landmarks. A location refinement module is also added to learn the offset Lr of the predicted locations on the feature maps, with an output dimension twice the number of landmarks because it contains offsets in both the x and y directions.
The prediction process of the CNN, denoted f, extracts features from the input image and predicts the confidence (the probability that x = z) that each pixel z ∈ Z is the landmark x. The process can be expressed as:

p_x = f(i), x = 1, 2, ..., n

where i is the input image, p is the predicted confidence score, and n is the number of landmarks. Cross-entropy is used as the loss function:

L = −Σ_{x=1}^{n} Σ_{z∈Z} [p*_x(z) log p_x(z) + (1 − p*_x(z)) log(1 − p_x(z))]

where p*_x is the real confidence score, p_x is the predicted confidence score, L is the cross-entropy loss function, and Lr is the Huber loss function applied to the predicted offsets.
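The two loss terms above can be sketched numerically. This is an illustrative NumPy version, not the training code: the per-pixel binary cross-entropy on the confidence maps, the Huber loss on the (x, y) offsets, and an assumed weighting between them (the paper does not state the weight).

```python
import numpy as np

def cross_entropy_heatmap(p_true, p_pred, eps=1e-7):
    """Per-pixel binary cross-entropy between the target confidence map
    p* and the predicted confidence map p, averaged over all pixels."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(p_true * np.log(p) + (1 - p_true) * np.log(1 - p))

def huber(residual, delta=1.0):
    """Huber loss on the predicted offsets: quadratic for small
    residuals, linear for large ones."""
    r = np.abs(residual)
    return np.mean(np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta)))

def total_loss(p_true, p_pred, offset_true, offset_pred, weight=0.05):
    # `weight` is a hypothetical balancing factor between the two terms.
    return cross_entropy_heatmap(p_true, p_pred) + weight * huber(offset_pred - offset_true)
```

The Huber term keeps the offset regression robust to the occasional badly annotated landmark, while the cross-entropy term supervises the coarse heatmap.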

Anthropometric Dimension Calculation Based on Multi-Ellipse Model
There are two human images used for dimension calculation and 14 landmarks labeled in the front and side images respectively. Assuming that the height of testers is known, the pixel-level height of testers is calculated from the landmark of the head and the foot, and the ratio of true height to pixel-level height is used as the scale of distance conversion.
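The scale-conversion step above can be sketched as follows; the coordinates in the usage line are hypothetical, and the function names are illustrative rather than the authors' code.

```python
import math

def pixel_scale(true_height_cm, head_xy, foot_xy):
    """Ratio of true height to pixel-level height, used to convert any
    pixel distance in the image to centimetres."""
    pixel_height = math.dist(head_xy, foot_xy)
    return true_height_cm / pixel_height

def to_cm(p1, p2, scale):
    """Convert the pixel distance between two landmarks to centimetres."""
    return math.dist(p1, p2) * scale

# Hypothetical example: a 175 cm subject spans 1000 pixels head to foot.
scale = pixel_scale(175.0, (320, 40), (320, 1040))
shoulder_width_cm = to_cm((100, 200), (400, 200), scale)
```

Because only the ratio matters, the method needs the subject's true height but not the camera's intrinsic parameters.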
The anthropometric dimensions are obtained from landmarks extracted in the front and side images. As shown in Figure 4, the dimensions obtained include six lengths (L1-L6), nine widths (A1-A4, B1-B5), and six girths (D1-D3, C1-C3). Table 2 gives the definitions of the dimensions according to the method specified by GB/T 16160-2017 [22]. There are two types of dimensions to calculate: one type includes shoulder width, arm length, thigh circumference, etc.; the other includes chest, waist and hip circumference, etc., whose cross-sections are similar to ellipses and can be viewed as ellipses with different curvatures.
Based on human images from different angles, a multi-ellipse model is built in a polar coordinate system as a general model for calculating elliptical dimensions. Due to the special structure of the human body, the curvature of different segments of the same part differs, so a calculated dimension is the cumulative sum of elliptical segments with different curvatures, as shown in Figure 5a. The key factors in building a multi-ellipse model are the rotation angles θ_n (0 < θ < 2π) of the human body images and the two semi-axes α_n, β_n of each ellipse. Assuming that the human body is imaged at n angles, the elliptical segment at one of the angles is shown in Figure 5b.
Here θ_1, θ_2 are the rotation angles of the human body images, C_1, C_2 are the lengths of the elliptical segments, p, q are the boundary points of elliptical segment C_2, and α, β are the edge lengths. An ellipse equation in a polar coordinate system with the center of the ellipse as the origin is established, and the length of an elliptical segment is calculated by integration:

C = ∫_{θ_p}^{θ_q} √(ρ(θ)² + ρ′(θ)²) dθ

The standard elliptic equation with its foci on the coordinate axis is built from p(αcosθ_1, αsinθ_2) and q(βcosθ_1, βsinθ_2). Due to the symmetry of the human body structure, only the length of the curve segment on (0, π) needs to be calculated.
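The segment-length integral above has no closed form, so a numerical sketch is shown here. For simplicity this sketch integrates the equivalent parametric form of the ellipse arc length rather than the polar form; it is an illustration of the technique, not the authors' implementation.

```python
import numpy as np

def ellipse_segment_length(a, b, t1, t2, n=10000):
    """Arc length of the ellipse x = a*cos(t), y = b*sin(t) between
    parameter angles t1 and t2, integrating
    sqrt(a^2 sin^2 t + b^2 cos^2 t) dt with the trapezoidal rule."""
    t = np.linspace(t1, t2, n)
    integrand = np.sqrt(a ** 2 * np.sin(t) ** 2 + b ** 2 * np.cos(t) ** 2)
    dt = (t2 - t1) / (n - 1)
    return float(np.sum((integrand[:-1] + integrand[1:]) * 0.5) * dt)

def girth_from_segments(segments):
    """Total girth: twice the summed segment lengths on (0, pi),
    using the left-right symmetry of the body."""
    return 2.0 * sum(segments)
```

Each elliptical segment of a body part can use its own semi-axes, and the girth is assembled from the per-segment lengths exactly as the model describes.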
The total dimension is obtained by summing the segment lengths:

C_all = 2 Σ_{n=1}^{N} C_n

where C_all is the measured dimension and N is the number of rotation angles, i.e., the number of images. We obtain landmarks from two orthogonal images of the human body and build a multi-ellipse model with N = 2, as shown in Figure 6. In this case θ_1 = θ_2 = π/2, and the key to the model lies in how to determine the position of the division point S. The traditional ellipse model uses a fixed division point and produces a large error for people of different body shapes; in this paper, the location of the division point is determined by the thickness-width ratio of the body parts.

Taking the chest as an example, the deformation range of the thickness-width ratio is first calculated from real data. As shown in Figure 7a, the chest width is denoted A3 and half the chest width is L, which is the length of the ellipse major semi-axis α; the division point of the chest thickness is S, the first half of the minor semi-axis is S1, and the second half is S2, which is the ellipse minor semi-axis β. From known chest data, the second half of the minor semi-axis is represented as X and the first half as B1 − X. As shown in Figure 7b, the deformation range of the minor semi-axis X is calculated when the chest thickness is B1 according to the above formulas. As shown in Figure 7c, by computing the correlation between the thickness-width ratio and the position of the division point, it is found that the thickness-width ratio is inversely proportional to the position of the division point.
A mapping relationship F is established between the thickness-width ratio and the position of the division point. When the deformation range of the thickness-width ratio is [a_1, b_1] and the deformation range of the division point is [a_2, b_2], then at thickness-width ratio c the location of the division point D is:

D = F(c), F: [a_1, b_1] → [a_2, b_2]

where F is the functional relationship between the thickness-width ratio and the position of the division point.
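A minimal sketch of such a mapping F follows. The paper reports only that the relationship is inverse; the linear interpolation below is our assumption for illustration, and the numeric ranges in the test are hypothetical.

```python
def division_point(c, ratio_range, division_range):
    """Map a thickness-width ratio c in [a1, b1] to a division-point
    position in [a2, b2]. Assumed linear inverse mapping: as the
    ratio grows, the division point moves from b2 down to a2."""
    a1, b1 = ratio_range
    a2, b2 = division_range
    t = (c - a1) / (b1 - a1)      # normalise c into [0, 1]
    return b2 - t * (b2 - a2)     # larger ratio -> smaller division point
```

A fitted curve from the real data of Figure 7c could replace the linear form without changing the rest of the pipeline.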

Experiment and Result
This section describes the datasets and shows the accuracy of landmarks extraction. The algorithm is evaluated based on real samples and compared with previous methods.


Datasets
The datasets are divided into two parts: network training and dimension calculation. The first part consists of front and side images of humans for network training. These images cover different people, with 3500 images per view, of which 3000 are used as the training set and 500 as the test set. The reason a good training effect can be obtained on this small dataset is that all people in the images stand in a fixed posture, so the network does not need to learn the deformation range of human posture, only the deformation range of body shape. As shown in Table 3, the second part is 87 groups of front and side images for testing the algorithm. The data come from laboratory staff aged between 20 and 50 years with a BMI between 17 and 28, so that a wide range of body shapes is included. All subjects are required to wear tight clothes, which facilitates learning body shape information from the images.

Training Details
The network is trained on front and side images marked with 14 landmarks, and the target heatmaps are generated by a Gaussian template. The network for front images is described as an example, because the structures and parameters of the two networks are largely the same. The network is based on 101-layer ResNet, using the weights trained by ResNet-101 on ImageNet as initial weights; the image is downsampled to 1/16 of the original resolution and then deconvolved to 1/8, after which the positions of the landmarks are predicted. Images of any resolution can be input to the network because the input batch size is 1. The SGD optimizer [23] is used for 1.03 million iterations on a GTX 2080; the learning rate schedule is shown in Table 4. The side images are mirrored, and all images are augmented in the form of an image pyramid to enhance the robustness of the network.
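The Gaussian heatmap template mentioned above can be sketched as follows; the map size, landmark coordinate and `sigma` are illustrative values, not the paper's settings.

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=3.0):
    """Target heatmap for one landmark: a 2D Gaussian centred on the
    annotated coordinate (cx, cy), with peak value 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# One target map per landmark; training stacks 14 such channels.
hm = gaussian_heatmap(64, 64, 20, 30)
```

A soft Gaussian target, rather than a one-hot pixel, gives the network a smooth gradient toward the annotated location.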

Result
As shown in Figure 2, the landmarks are classified based on body parts, and PCK, the proportion of correctly predicted key points, is evaluated on the sample of 500 images. The PCK [24] for landmark j is defined as:

PCK_σ(j) = (1/n) Σ_{i=1}^{n} 1(‖x_j^pre − x_j^truth‖ ≤ σ · d_head)

where J is the set of landmark types, σ is the detection threshold, between 0.1 and 0.5 of the head pixel size d_head, n is the number of samples, x_j^pre is the predicted coordinate of the j-th landmark, and x_j^truth is its true coordinate. Because enough human body images were collected, the number of landmarks extracted remains steady; that is, the robustness of this method is better than that of other methods, and the landmarks have a high correlation with body parts. As shown in Table 5, the average accuracy of landmark extraction is 96.6%, owing to the consistency of the data, that is, the relatively fixed posture of the human body.
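The PCK metric above can be computed directly; the sketch below evaluates one landmark type over n samples, with hypothetical coordinates in the test.

```python
import numpy as np

def pck(pred, truth, head_size, sigma=0.5):
    """Fraction of samples whose predicted landmark falls within
    sigma * head_size pixels of the ground truth.
    pred, truth: (n, 2) arrays of coordinates for one landmark type."""
    dist = np.linalg.norm(pred - truth, axis=1)
    return float(np.mean(dist <= sigma * head_size))
```

Sweeping `sigma` from 0.1 to 0.5 reproduces the usual PCK curve, with stricter thresholds on the left.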
The average dimensions, which serve as the true values (T), are calculated after the anthropometric dimensions of the 87 subjects are measured by three professional researchers on two successive days. In manual measurement there are generally two kinds of error: the first occurs in measurements of the same dimension by different researchers (inter-observer error), while the second occurs in multiple measurements of the same dimension by the same researcher (intra-observer error) [13]. To evaluate these errors, the technical error of measurement (TEM) and reliability coefficient (R) [25] of the dimensions of the 87 subjects are calculated as:

TEM = √( Σ_{i=1}^{n} [ Σ_{k=1}^{q} m_k² − (Σ_{k=1}^{q} m_k)² / q ] / (n(q − 1)) ),  R = 1 − TEM²/S²

where p is the number of researchers, e is the number of measurements, and n is the number of subjects. For the first kind of error q equals the number of researchers, while for the second kind q is two times the number of researchers; m_q is the q-th measurement, and S² is the sample variance. In this experiment, all researchers were required to measure twice. As shown in Table 6, the mean intra-observer reliabilities of the three researchers are 0.899 (range: 0.82-0.97), 0.914 (range: 0.84-0.98) and 0.904 (range: 0.81-0.98); for all researchers, the intra-observer reliability coefficient shows good to high reliability. Similarly, the mean inter-observer reliability is 0.857 (range: 0.78-0.95). These results illustrate that the manually measured anthropometric dimensions are reliable.
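TEM and R as defined above can be sketched as follows. This follows the standard anthropometric formulation (pooled within-subject variability over repeats, and R = 1 − TEM²/S²); the data layout is our assumption.

```python
import numpy as np

def tem(measurements):
    """Technical error of measurement. `measurements` has shape (n, q):
    n subjects, q repeated measurements of one dimension."""
    m = np.asarray(measurements, dtype=float)
    n, q = m.shape
    # Within-subject sum of squared deviations, computed per subject.
    within = np.sum(m ** 2, axis=1) - np.sum(m, axis=1) ** 2 / q
    return float(np.sqrt(np.sum(within) / (n * (q - 1))))

def reliability(measurements):
    """Reliability coefficient R = 1 - TEM^2 / S^2, where S^2 is the
    sample variance of all measurements pooled across subjects."""
    m = np.asarray(measurements, dtype=float)
    return 1.0 - tem(m) ** 2 / np.var(m, ddof=1)
```

Perfectly repeatable measurements give TEM = 0 and R = 1; growing disagreement between repeats drives R down.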
The measurement obtained by our method is affected by two factors: (i) the distance between the camera and the subject, and (ii) the looseness of the subject's clothes. These factors should be controlled for the most favorable measurements; according to the experiments, the most favorable distance between subject and camera is 1 m, and the subject should wear tight clothing. Under these conditions, the mean absolute difference (MAD) of each dimension is calculated, defined as:

MAD_i = (1/n) Σ_{j=1}^{n} |P_ij − T_ij|

where i is the i-th anthropometric dimension, P_ij the measurement of the proposed method for subject j, and T_ij the manual measurement. As shown in Table 7, the MADs of all dimensions are calculated; the results show that all measured dimension errors are within the margin of error, the Maximum Permissible Error (MAE) for garment measuring proposed in reference [26]. As shown in Figure 8, the errors of the three dimensions of chest, waist and hip are compared with the errors calculated by Murtaza et al. [13] and with the MAE, which shows that the error of our method is smaller than that of the other methods. The square or diamond symbols on the error bars represent the MAD, and the dotted bar represents the MAE; the upper and lower limits of the error bars represent the standard deviation (SD) of the absolute difference. An absolute difference greater than the MAE is considered an outlier. Based on the number of outliers, the PA of the dimensions obtained by Lin [26] and Murtaza [13] is compared with that of the proposed method.
The PA of the i-th dimension (PA_i) is evaluated as:

PA_i = (1 − m/n) × 100%

where i is the anthropometric dimension, m is the number of outliers, and n is the number of subjects. The agreement of the methods on two dimensions important in garment customization (A1, C1) is evaluated in Figure 9, where T represents the result obtained by manual measurement and P the result calculated by our method. Among the 87 subjects, the shoulder widths of 84 subjects are within the maximum error of clothing customization (8 mm), a PA of 96%, and the bust girths of 83 subjects are within the maximum error of clothing customization (15 mm), a PA of 95%. More importantly, as shown in Figure 10, the proposed algorithm performs better for 20 out of 21 dimensions, and the average PA over all dimensions is 94.2% for the proposed method, compared with 88% for [26] and 93% for [13].
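The MAD and PA statistics above can be computed together; the sketch below uses hypothetical per-subject values, with an outlier defined, as in the text, as an absolute difference exceeding the MAE.

```python
import numpy as np

def mad(proposed, manual):
    """Mean absolute difference between the proposed-method and manual
    measurements of one dimension across all subjects."""
    return float(np.mean(np.abs(np.asarray(proposed) - np.asarray(manual))))

def pa(proposed, manual, mae):
    """Percentage agreement: share of subjects whose absolute difference
    is within the maximum permissible error (MAE) for the dimension."""
    diff = np.abs(np.asarray(proposed) - np.asarray(manual))
    outliers = np.sum(diff > mae)
    return float((1 - outliers / len(diff)) * 100)
```

Running both over the 21 dimensions and averaging the per-dimension PA values reproduces the summary statistic reported for Figure 10.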
Therefore, the statistical analysis illustrates that our method is more accurate and consistent than [26] and [13] in the automatic measurement of body dimensions.

Discussion and Conclusions
In this paper, a new method of anthropometric dimension measurement based on 2D images is proposed to solve the problems of the low extraction rate of landmarks, the inaccurate division point of the minor semi-axis in the double ellipse model, and large measurement error. The convolutional neural network introduced to extract landmarks has higher robustness; that is, the number of landmarks is more stable and their correlation with body parts is higher. Then, a multi-ellipse model is proposed in which shape information (the thickness-width ratio) is integrated, using two images, to correct the position of the axis division point, avoiding the errors of contour extraction and gradient calculation. The errors of the 21 main measured dimensions are all within the margin of error for garment measuring, and the errors of chest, waist and hip based on the multi-ellipse model are smaller than those of existing methods.
In future work, more images with more postures can be added to the training set to improve the stability and accuracy of landmarks extraction. By expanding the number of test images, that is, using more human images with different rotation angles, the errors of the multi-ellipse model can be reduced. In addition, the data of dimensions measured in our experiment can also be used in garment customization, 3D human modeling, virtual fitting and other fields.


Conflicts of Interest: The authors declare no conflict of interest.