A Boundary Zone Method for the Generation of Multivariate Representative Humanoids

The present study developed a novel multivariate representative humanoid (RH) generation method called the boundary zone method (BZM), which consists of (1) the formation of a boundary zone (BZ) for a designated accommodation percentage (κ), (2) the clustering of anthropometric cases in the BZ, and (3) the selection of representative cases from the clusters. Using 1988 U.S. Army anthropometric data for κ = 90% and 10 anthropometric dimensions (ADs), the BZM was compared with existing methods, namely the square method (SM), the rectangular method (RM), and the circular method (CM), in terms of multivariate accommodation percentage (MAP), outlier percentage, and normalized outlier magnitude. The MAP analysis showed that only the BZM formed a group of RHs that precisely satisfied the designated κ, whereas the RM formed over-accommodating RH groups and both the SM and CM formed under-accommodating RH groups. The outlier analysis showed that only the BZM generated RHs within the body size ranges of the target population. The evaluation results support the effectiveness of the BZM by showing that it generates appropriate RHs for a designated accommodation percentage of the population and overcomes the limitations of the existing methods (loss of anthropometric variability, lack of body size diversity, and estimation error of an AD). Further research is needed to validate the effectiveness of the BZM in solving real ergonomics design problems.


Introduction
A small group of digital humanoids (manikins, human models, or cases) representing the target population is used in the ergonomics design and evaluation of products and workstations in digital environments such as JACK® and RAMSIS®. The use of a small group of representative humanoids (RHs) enables designers to efficiently apply the body size characteristics of the target population to product design and evaluation [1,2]. Thus, RHs need to be carefully determined to develop an ergonomics design suitable for a designated accommodation percentage (κ) of the target population [3,4].
The percentile RH generation method is commonly employed for its simplicity, although it is often criticized for its poor multivariate accommodation. You et al. [5] evaluated the ergonomics design of a bus operator's workstation in a digital human simulation system using the 5th, 50th, and 95th percentile humanoids specified by SAE J833. The percentile method determines the body sizes of RHs from designated percentile values of the individual anthropometric dimensions (ADs) under consideration [2]. RHs created by the percentile method can accommodate a designated percentage of the target population for each AD individually, but not for multiple ADs jointly [6,7]. For example, in designing the height and width of a bus door for 95% of the U.S. population using the 1988 U.S. Army anthropometric data [26], the percentile method uses a 95th percentile humanoid with a 95th percentile stature (183.8 cm) and a 95th percentile bideltoid breadth (41.8 cm). In this example, the door height and width would meet the designated κ if stature and bideltoid breadth were considered independently, but fall short when both ADs are considered simultaneously: only 91.8% of the target population is smaller than the 95th percentile humanoid in both stature and bideltoid breadth.
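The shortfall in the bus-door example comes from treating two ADs jointly. As a minimal sketch, assuming stature and bideltoid breadth behave like a standard bivariate normal pair with a hypothetical correlation ρ (the 91.8% figure above comes from the measured data and is not reproduced here), the joint coverage of two 95th percentile limits can be computed with Python's standard library. The function name `joint_percentile_coverage` is illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def joint_percentile_coverage(p: float, rho: float, steps: int = 20_000) -> float:
    """P(X <= z_p and Y <= z_p) for a standard bivariate normal (X, Y) with
    correlation rho, where z_p is the p-th quantile of N(0, 1).

    Uses P = integral over x < z_p of phi(x) * Phi((z_p - rho * x) / sqrt(1 - rho^2)),
    evaluated with the trapezoidal rule on [-8, z_p] (mass below -8 is negligible).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    z = STD_NORMAL.inv_cdf(p)
    s = (1.0 - rho * rho) ** 0.5          # conditional SD of Y given X
    lo = -8.0
    h = (z - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = STD_NORMAL.pdf(x) * STD_NORMAL.cdf((z - rho * x) / s)
        total += 0.5 * f if i in (0, steps) else f   # trapezoid end weights
    return total * h


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(f"rho = {rho:.1f}: {joint_percentile_coverage(0.95, rho):.3f}")
```

With ρ = 0 the coverage is 0.95² = 0.9025, and it approaches 0.95 only as ρ approaches 1, so any two imperfectly correlated ADs fall short of the nominal 95% when both limits must hold at once.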
To overcome the multivariate accommodation limitation of the percentile method, multivariate RH generation methods, operationally termed the square, rectangular, and circular methods in the present study, were developed using variable reduction techniques such as factor analysis (FA) and principal component analysis (PCA). The square method (SM; Figure 1a) generates RHs at the centroid and the corners of a square boundary formed in the space of factors extracted by FA [8,9]. The rectangular method (RM; Figure 1b) follows the same technique as the SM, except that it uses a rectangular boundary formed to statistically enclose a designated percentage of the population, as proposed by Kim and Whang [10]. Lastly, the circular method (CM; Figure 1c) employs PCA for variable reduction [11,12] and generates RHs at the centroid and at points spaced at a predefined angular interval (e.g., 45°) on a circular boundary formed in the space of the components [13][14][15].
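The circular method's construction can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm of [13][14][15]: PCA is run on the correlation matrix of the ADs, only the first two components are kept, the circle radius is assumed to be the square root of the χ² (df = 2) quantile at κ, i.e. sqrt(−2 ln(1 − κ)), and points on the circle are converted back to AD values through the retained components. The function name `circular_method_rhs` is hypothetical.

```python
import numpy as np


def circular_method_rhs(X, kappa=0.90, step_deg=45.0):
    """Simplified circular-method-style RH generation (illustrative sketch).

    X: (N, n) array of anthropometric cases. Returns RHs in AD units:
    row 0 is the centroid, followed by points every `step_deg` degrees on the circle.
    """
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                  # standardized ADs
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    top2 = np.argsort(eigval)[::-1][:2]                # two largest components
    lam, V = eigval[top2], eigvec[:, top2]

    # Assumed radius: square root of the chi-square (df = 2) quantile at kappa.
    radius = np.sqrt(-2.0 * np.log(1.0 - kappa))
    angles = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    circle = np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])
    scores = np.vstack([np.zeros((1, 2)), circle])     # centroid + boundary points

    # Back-conversion: standardized score t_j -> PC score sqrt(lam_j) * t_j
    # -> standardized ADs via the eigenvectors -> original AD units.
    Z_hat = (scores * np.sqrt(lam)) @ V.T
    return mu + sd * Z_hat
```

Because the back-conversion uses only two components, every generated humanoid lies in a two-dimensional plane of the n-dimensional AD space, which makes the loss of variability and the conversion error of these methods easy to see.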
Limitations of the existing multivariate generation methods include loss of anthropometric variability, estimation error of ADs, and lack of body size diversity, as illustrated in Figure 2. Data reduction techniques such as FA and PCA reduce an original set of ADs to a small set of factors or components that accounts for the majority (e.g., 60-97%) of the body size variability of the target population, as shown in Table 1. Although this distillation greatly simplifies RH generation, the remaining variability not explained by the extracted factors (e.g., 3-40%) is lost in the process. Next, converting the factor scores of RHs to values of ADs causes significant estimation errors if the extracted factors (or components) are not strongly correlated with the ADs. Lastly, missing zones exist between the RHs along a boundary, which further reduces the body size diversity of the target population represented by the RHs.
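The retained-variance figures and the source of the conversion error can both be read from the eigen-decomposition of the AD correlation matrix. The sketch below assumes PCA on standardized ADs (FA would give similar but not identical numbers) and returns the share of total variance retained by k components together with each AD's communality; `pca_retention` is a hypothetical helper name.

```python
import numpy as np


def pca_retention(X, k):
    """Variance retained when n standardized ADs are reduced to k principal components.

    Returns (total_share, per_ad_share):
      total_share  - fraction of total standardized variance explained by the k PCs
      per_ad_share - for each AD, the fraction of its variance explained
                     (its communality: sum of squared loadings on the k PCs)
    """
    R = np.corrcoef(X, rowvar=False)            # correlation matrix of the ADs
    eigval, eigvec = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:k]          # indices of the k largest
    # Clip guards against tiny negative eigenvalues from round-off.
    loadings = eigvec[:, idx] * np.sqrt(np.clip(eigval[idx], 0.0, None))
    total_share = eigval[idx].sum() / eigval.sum()
    per_ad_share = (loadings ** 2).sum(axis=1)
    return total_share, per_ad_share
```

The quantity 1 − per_ad_share is the part of each AD's variance that no combination of the retained component scores can reproduce, so ADs with low communality are the ones most exposed to estimation error when factor scores are converted back to AD values.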
Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 10
To overcome the limitations of the existing multivariate generation methods, the present study proposes a novel multivariate method that generates RHs in a boundary zone (BZ) using cluster analysis and real anthropometric cases. The proposed boundary zone method (BZM) was compared with the existing multivariate methods (SM, RM, and CM) using the 1988 U.S. Army anthropometric data and 10 ADs pertinent to computer workstation design [21] in terms of (1) multivariate accommodation percentage (MAP), (2) outlier percentage, and (3) normalized outlier magnitude.

Development of a Boundary Zone Method (BZM)
The BZM developed in the present study consists of three steps: (1) formation of a BZ for a designated κ of the target population, (2) clustering of the anthropometric cases in the BZ, and (3) selection of representative cases from the anthropometric clusters as RHs.

Step 1: Formation of a Boundary Zone (BZ)
To identify anthropometric cases representing a designated κ of the target population, a BZ is constructed from the distribution of normalized squared distances of the anthropometric cases from the centroid of the target population. A normality test, such as the Kolmogorov-Smirnov test, is first applied to examine whether the data of each AD can be modeled by a normal distribution; non-normal ADs are normalized with the Box-Cox transformation [22]. Each anthropometric case of the target population is then converted into a normalized squared distance (D) by Equation (1), which follows a χ² distribution with n degrees of freedom (df = n, the number of ADs under consideration) when the ADs are multivariate normal [23][24][25]. Thus, the boundary for a designated κ of the target population is given by χ²ₙ(1 − κ), the χ² value with n degrees of freedom whose upper-tail probability is 1 − κ, and the BZ for the designated κ is formed by the two boundaries for κ ± a tolerance δ (e.g., 90% ± 1%). Figure 3 illustrates a BZ formed by the two boundaries χ²₂(1 − 0.89) = 4.41 and χ²₂(1 − 0.91) = 4.82, where two ADs (n = 2) follow a bivariate normal distribution and κ ± δ = 90% ± 1% [4].
$$D = (\mathrm{AD} - \mu)^{T}\, \Sigma^{-1}\, (\mathrm{AD} - \mu) \qquad (1)$$
where: D = normalized squared distance; AD = n × 1 vector of the n anthropometric dimensions of a case; µ = n × 1 mean vector of the n anthropometric dimensions; Σ = n × n variance-covariance matrix of the n anthropometric dimensions.
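Step 1 can be sketched in Python as below, assuming the ADs have already been normalized (e.g., by Box-Cox) so that D from Equation (1) is approximately χ²-distributed with n degrees of freedom. To keep the sketch dependency-light, the χ² quantile is computed from the incomplete-gamma series and bisection rather than from a statistics library. The helper names `chi2_cdf`, `chi2_ppf`, and `boundary_zone` are hypothetical, and the sample mean and covariance stand in for µ and Σ.

```python
import math
import numpy as np


def chi2_cdf(q, df):
    """Chi-square CDF via the series for the regularized lower incomplete
    gamma function P(df/2, q/2)."""
    if q <= 0.0:
        return 0.0
    a, x = df / 2.0, q / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def chi2_ppf(p, df):
    """Chi-square quantile (inverse CDF) by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def boundary_zone(X, kappa=0.90, delta=0.01):
    """Flag cases whose normalized squared distance D lies between the
    chi-square quantiles for kappa - delta and kappa + delta.

    X: (N, n) array of (already normalized) anthropometric cases.
    Returns (in_zone mask, D, (lower, upper)).
    """
    n = X.shape[1]
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)             # n x n variance-covariance matrix
    diff = X - mu
    # D_i = (AD_i - mu)^T Sigma^{-1} (AD_i - mu) for every case i
    D = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)
    lower = chi2_ppf(kappa - delta, n)
    upper = chi2_ppf(kappa + delta, n)
    return (D >= lower) & (D <= upper), D, (lower, upper)
```

For n = 2 the χ² quantile has the closed form −2 ln(1 − p), which gives a quick check of `chi2_ppf`; in practice `scipy.stats.chi2.ppf` returns the same quantile.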

Step 2. Clustering of Anthropometric Cases in the BZ
To form an optimal group of RHs, K-means cluster analysis is applied to the anthropometric cases within the BZ, grouping cases with similar body sizes into the same cluster, as illustrated in Figure 4a. An appropriate number of clusters can be determined by an in-depth analysis of the MAP. For example, Figure 5 illustrates that, for n = 10 ADs, κ ± δ = 90% ± 1%, and 60 anthropometric cases within the BZ, the optimal number of clusters can be set at 40, from which point the MAP stably satisfies the designated κ.
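A minimal version of the Step 2 clustering is sketched below, assuming plain Lloyd's K-means with Euclidean distance and randomly chosen cases as initial seed points; the MATLAB implementation used in the study may differ in initialization and stopping rules. In the BZM setting, X would hold the cases inside the BZ and k the chosen number of clusters. Standardizing the ADs before clustering, so that no single dimension dominates the distance, is an assumption rather than a step stated in the text. `kmeans` is a hypothetical helper.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=None):
    """Plain Lloyd's K-means. Initial seed points are k randomly chosen cases,
    so results can differ between runs unless `seed` is fixed."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        # Squared Euclidean distance from every case to every centroid.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return assign(centroids), centroids
```

Because the seed points are random, different seeds can give different clusters, which is why the MAP of the BZM is examined over repeated trials. scikit-learn's `KMeans` (with `n_clusters`, `n_init`, and `random_state`) implements the same algorithm with multiple restarts.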

Figure 5. Determination of an optimal number of clusters (illustrated; x-axis: number of clusters selected). The number of anthropometric dimensions (n) = 10, and the number of anthropometric cases within the 90% ± 1% boundary zone = 60.

Step 3. Selection of Representative Anthropometric Cases
Representative anthropometric cases are selected as RHs by identifying the cases closest to the centroids of the anthropometric clusters formed in the BZ, as illustrated in Figure 4b. The RH representing each cluster can be defined as either the case nearest to the cluster centroid or the centroid itself; the former was adopted in the present study to avoid errors due to estimation.
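Step 3 then reduces each cluster to one measured case. A sketch, assuming the candidates for each cluster are its own members and closeness is squared Euclidean distance in the same space used for clustering; `select_representatives` is a hypothetical name.

```python
import numpy as np


def select_representatives(X, labels, centroids):
    """For each cluster, return the row index of the member case nearest its centroid."""
    reps = []
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)   # row indices of cluster j
        if members.size == 0:
            continue                            # skip empty clusters
        d2 = ((X[members] - c) ** 2).sum(axis=1)
        reps.append(int(members[d2.argmin()]))
    return np.array(reps)
```

The returned indices point at rows of the original data, so each RH keeps its actual measured values for all ADs rather than estimated ones.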

Anthropometric Data
Herein, 1988 U.S. Army anthropometric data [26] were used to evaluate the existing and proposed multivariate RH generation methods. The U.S. Army anthropometric survey contains measurements of 132 ADs for 3987 participants (2213 women and 1774 men) and its database is freely available on the web. Of the anthropometric data, the data set of 10 ADs (abdominal extension depth, elbow rest height, forearm-to-forearm breadth, buttock-knee length, hip breadth, thigh clearance, buttock-popliteal length, popliteal height, knee height, and foot length) considered in computer workstation design [21] was extracted for comparative evaluation.

Performance Measures
The multivariate RH generation methods were evaluated in terms of three measures: (1) MAP, (2) outlier percentage, and (3) normalized outlier magnitude. MAP refers to the proportion of the target population accommodated by a group of RHs. Outlier percentage refers to the proportion of RHs that are larger or smaller than the body size ranges of the target population. Lastly, normalized outlier magnitude refers to the distance of an outlier from the corresponding body size range, normalized by the corresponding mean:
$$\text{normalized outlier magnitude} = \frac{\left|\text{outlier} - \text{max or min}\right|}{\text{mean}} \qquad (2)$$
where max or min is the maximum (for an outlier above the range) or the minimum (for an outlier below the range) of the corresponding AD in the target population, and mean is the population mean of that AD.
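The two outlier measures can be computed per RH and per AD as sketched below. Two details are assumptions, since the text does not spell them out: an RH counts as an outlier if any of its ADs falls outside the population range, and the population minimum, maximum, and mean are taken from the same sample used to generate the RHs. `outlier_metrics` is a hypothetical helper.

```python
import numpy as np


def outlier_metrics(rhs, population):
    """Return (outlier percentage, normalized outlier magnitude per RH and AD).

    rhs: (m, n) array of RHs; population: (N, n) array of measured cases.
    """
    lo = population.min(axis=0)
    hi = population.max(axis=0)
    mean = population.mean(axis=0)
    below = rhs < lo
    above = rhs > hi
    # Distance beyond the violated bound; zero where the RH value is in range.
    excess = np.where(below, lo - rhs, 0.0) + np.where(above, rhs - hi, 0.0)
    magnitude = excess / mean                   # Equation (2), per AD
    is_outlier = (below | above).any(axis=1)    # assumed: any AD out of range
    return 100.0 * is_outlier.mean(), magnitude
```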

Evaluation Programs
Programs were coded in MATLAB 7.0 (MathWorks, Inc., Natick, MA, USA) to evaluate the multivariate RH generation methods. The algorithms of the existing generation methods (SM, RM, and CM) were implemented with reference to Bittner [8], Kim and Whang [10], and Meindl et al. [15]. The calculation procedures for the MAP, outlier percentage, and normalized outlier magnitude were also coded for efficient evaluation. Lastly, the MAP of the BZM was analyzed over 10 repeated RH generation trials, because the initial seed points of the clusters are selected randomly in K-means cluster analysis and the clustering results vary accordingly [27].

Evaluation Results
The results of the MAP and outlier analyses are summarized in Table 2 for the RHs generated by the existing (SM, RM, and CM) and proposed (BZM) methods with the selected 1988 U.S. Army anthropometric data for κ ± δ = 90% ± 1%. Various groups of RHs were formed by increasing the number of factors or principal components from two to six in the existing methods and by repeating the K-means cluster analysis with n = 10 in the proposed method. The MAP analysis showed that the BZM performed best in terms of precise accommodation, whereas the RM performed best in terms of the number of RHs. The BZM generated RH groups that precisely accommodated the designated κ = 90% of the target population (MAP = 90.6 ± 0.7%), while the RM tended to generate over-accommodating RH groups and both the SM and CM tended to generate under-accommodating RH groups. Next, the minimum number of RHs satisfying the designated κ was nine for the RM (MAP = 93%), 41 for the BZM (MAP = 91%), and 73 for the CM (MAP = 92%); the SM could not generate any RH group satisfying the designated κ. However, this advantage of the RM in the number of RHs is limited by the outliers present in the RH groups it generated.
The outlier analysis showed that the BZM performed best, followed by the SM, CM, and RM, in terms of outlier percentage and normalized outlier magnitude. Figure 6 illustrates that the minimum values of forearm-to-forearm breadth (FFB) and abdominal extension depth (AED) of the 33 RHs generated by the RM were 27.2 cm and 12.3 cm, respectively, which were smaller than the corresponding minima of the target population (37.3 cm for FFB and 15.3 cm for AED). Of the existing methods, the RM generated improper RHs with the largest outlier percentage and normalized outlier magnitude, followed by the CM and then the SM. In contrast, the BZM generated no outliers.
An in-depth analysis showed that the RHs of the BZM properly represented the body size diversity of the target population, whereas the RHs of the existing generation methods failed to do so, especially for ADs with similar factor loading patterns. For example, Table 3 shows the results of a factor analysis of the 10 ADs related to computer workstation design; Figure 7a illustrates that the RHs generated by all methods (existing and BZM) properly represent the body size diversity of the target population for buttock-knee length and hip breadth, which have different factor loading patterns, whereas Figure 7b shows that the RHs generated by all the existing methods failed to do so for abdominal extension depth and thigh clearance, which have similar factor loading patterns.

Discussion
The present study developed the BZM, which can resolve the limitations (loss of anthropometric variability, lack of body size diversity, and estimation error of an AD) of the existing multivariate RH generation methods (SM, RM, and CM). First, the BZM relies on normalized squared distances of anthropometric cases, which follow a χ² distribution, rather than on data reduction techniques such as FA and PCA, which cause a loss of anthropometric variability in RH generation [20]. Next, the BZM uses a statistical clustering method to group the anthropometric cases in the BZ, so that missing zones between RHs are avoided and the body size diversity of the target population is better represented. Lastly, the BZM selects a real anthropometric case, rather than a case whose body dimensions are estimated, as the RH of each cluster, so that errors due to the estimation of ADs are prevented.
The BZM assumes that the ADs follow a multivariate normal distribution, because the BZ is formed from normalized squared distances of the anthropometric cases. ADs are commonly assumed to be normally distributed [2,7], but significant normality violations exist in some ADs, such as chest breadth and forearm-hand length [28]. The Kolmogorov-Smirnov test conducted on the 10 ADs of the U.S. Army data considered in the present study showed that three ADs in males and four ADs in females violated the normality assumption at α = 0.01. In the present study, these non-normal ADs were transformed by the Box-Cox method before the BZ was established. Because the Box-Cox transformation was used only to identify the RHs, which are real measured cases, no inverse transformation was necessary.
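The Box-Cox step can be sketched as follows, assuming strictly positive AD values and choosing λ by maximizing the profile log-likelihood over a grid, a simple stand-in for a proper optimizer. `boxcox`, `boxcox_profile_llf`, and `fit_boxcox` are hypothetical names; `scipy.stats.boxcox` returns the transformed data and the maximum-likelihood λ when `lmbda` is not given.

```python
import numpy as np


def boxcox(x, lmbda):
    """Box-Cox transform: (x**lmbda - 1) / lmbda, or log(x) when lmbda == 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lmbda) < 1e-12:
        return np.log(x)
    return (x ** lmbda - 1.0) / lmbda


def boxcox_profile_llf(x, lmbda):
    """Profile log-likelihood of lambda, up to an additive constant:
    -(N/2) * log(variance of transformed data) + (lambda - 1) * sum(log x)."""
    x = np.asarray(x, dtype=float)
    y = boxcox(x, lmbda)
    return -0.5 * len(x) * np.log(y.var()) + (lmbda - 1.0) * np.log(x).sum()


def fit_boxcox(x, grid=None):
    """Choose lambda on a grid by maximum profile likelihood; return (y, lambda)."""
    if grid is None:
        grid = np.linspace(-3.0, 3.0, 601)      # step 0.01
    llf = [boxcox_profile_llf(x, lam) for lam in grid]
    best = float(grid[int(np.argmax(llf))])
    return boxcox(x, best), best
```

Because the transformed values are used only to compute D and form the BZ, the selected RHs can still be reported with their original, untransformed measurements.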
The BZM requires technical judgment in setting the tolerance level (e.g., δ = 1%) and a large anthropometric database (>2000 cases) to construct a proper BZ. If the tolerance is too small, the number of anthropometric cases within the BZ can become too small to generate RHs in the subsequent steps. Conversely, if the tolerance is too large, anthropometric cases far from the designated κ can be considered as RH candidates. Therefore, an appropriate tolerance level needs to be determined by considering technical aspects such as the size of the anthropometric database: the larger the database, the smaller the tolerance level can be.
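The trade-off between tolerance and database size can be made concrete with a short calculation, assuming D follows the χ²ₙ distribution exactly (with estimated µ and Σ it holds only approximately). Writing q_p for the p-quantile of χ²ₙ, F for its CDF, and N for the number of cases in the database, the expected number of cases inside the BZ is:

$$E[N_{BZ}] = N\left[F_{\chi^2_n}(q_{\kappa+\delta}) - F_{\chi^2_n}(q_{\kappa-\delta})\right] = N\left[(\kappa+\delta) - (\kappa-\delta)\right] = 2\delta N$$

For example, N = 2000 and δ = 1% give about 40 candidate cases on average, and halving δ halves the expected pool regardless of n and κ.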
The outlier analysis showed that only the BZM did not generate outliers, whereas the existing methods generated RHs with AD values beyond the corresponding body size ranges of the target population. The outliers in the existing methods are caused by large estimation errors during the conversion of factor scores to values of ADs, especially for ADs that are not strongly correlated with the factors or components. To avoid outliers in RH generation, the BZM selects the measured (not estimated) anthropometric case closest to the centroid of each cluster as an RH.
The present study used the U.S. Army anthropometric database collected by Gordon et al. [26] to evaluate the BZM. Although the U.S. Army database contains measurements of a large sample (3987 participants) on a comprehensive set of anthropometric variables, the BZM also needs to be evaluated with anthropometric databases of other populations. Therefore, future research is recommended to validate the performance of the BZM with various anthropometric databases for generalization.

Conclusions
The BZM was developed and compared to the existing multivariate RH generation methods (SM, RM, and CM) based on FA or PCA in terms of the MAP, outlier percentage, and normalized outlier magnitude. The evaluation results support the effectiveness of the BZM by showing that the BZM generates appropriate RHs for a designated accommodation percentage of the population and overcomes the limitations (loss of anthropometric variability, lack of body size diversity, and estimation error of an AD) of the existing methods. Further research is needed to validate the effectiveness of the BZM in solving real ergonomics design problems.