Comparison of Multivariate Analysis Methods as Applied to English Speech

Abstract: A newly developed factor analysis, origin-shifted factor analysis, was compared with a normal factor analysis to analyze the spectral changes of English speech. Our first aim was to investigate whether these analyses would cause differences in the factor loadings and the extracted spectral-factor scores. The methods mainly differed in whether they used cepstral liftering and an origin shift. The results showed that three spectral factors were obtained in four main frequency bands, but neither the cepstral liftering nor the origin shift distorted the essential characteristics of the factors. This confirms that the origin-shifted factor analysis is more recommendable for future speech analyses, since it would reduce the generation of noise in resynthesized speech. Our second aim was to further identify acoustic correlates of English phonemes. Our data show for the first time that the distribution of obstruents in English speech constitutes an L-shape related to two spectral factors on the three-dimensional configuration. One factor had center loadings around 4100 Hz, while the other was bimodal with peaks around 300 Hz and 2300 Hz. This new finding validates the use of multivariate analyses to connect English phonology and speech acoustics.


Introduction
The frequency range covered by the human hearing system is remarkable. In an initial attempt to model the hearing system's frequency resolution, Zwicker (1961) proposed the bark scale, which divides the audible frequency range into 24 frequency ranges between 20 and 15500 Hz [1]. These 24 frequency ranges are now commonly considered to represent 24 "critical bands", which can be regarded as a series of bandpass filters through which sounds are processed [2]. The frequency range of speech, as in a common AM-broadcasting system, is limited to a relatively narrow range below approximately 7000 Hz, and in classic models it is often represented by 20 critical bands between 50 and 7000 Hz in the auditory periphery [3]. In order to resynthesize fairly intelligible speech, however, even fewer frequency bands seem sufficient.
Previous studies related to this proceeded from a series of principal component analyses and listening experiments. Plomp et al. (1967) found that only two main principal components could

Speech Samples
To compare the results of the origin-shifted factor analysis with the results of the normal factor analysis, the same 200 English sentences as in Nakajima et al. (2017) [8] were used. The sentences were spoken by one male and two female native speakers of English, taken from "The ATR British English Speech Database" [13]. The speech samples were recorded with a sampling frequency of 12000 Hz and 16-bit linear quantization. The speech signals of all the spoken sentences in the database were segmented into individual phonemes and were labeled using the Machine Readable Phonetic Alphabet (MRPA) [14]. A total of 31663 "phonemes" were labeled, but to maintain consistency with Nakajima et al.'s study (2017) [8], in which the phoneme labeling of this database had to be reexamined, 7523 were omitted, and the same 24140 English phonemes were used as analysis samples.

Procedure
Following the Nyquist theorem [15] and the 12,000-Hz sampling frequency of the English speech samples, 19 critical-band filters were constructed to cover the frequency range of 50 to 5300 Hz. Their center frequencies ranged from 75 to 4800 Hz. The frequency bands were adopted from Zwicker and Terhardt (1980) [16]. The frequency range below 50 Hz was not utilized because it is presumably less relevant to the speech signal.
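The band layout described above can be checked with a short sketch. The band edges listed here are an assumption: they are the conventional Bark-scale critical-band edges with the lowest band starting at 50 Hz, which is consistent with the stated range (50 to 5300 Hz), the stated number of filters (19), and the lowest center frequency (75 Hz). The exact filter specifications from [16] are not reproduced in this section.

```python
# Assumed band edges in Hz: conventional Bark-scale critical-band edges,
# with the lowest band starting at 50 Hz instead of 0 Hz.
SAMPLING_RATE_HZ = 12_000  # from the text
BAND_EDGES_HZ = [50, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300]

# The Nyquist frequency is half the sampling frequency.
nyquist_hz = SAMPLING_RATE_HZ / 2

# Each band is a (lower edge, upper edge) pair of neighboring edges.
bands = list(zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]))

print("number of bands:", len(bands))
print("lowest band:", bands[0], "highest band:", bands[-1])
print("all bands below Nyquist:", BAND_EDGES_HZ[-1] < nyquist_hz)
```

With a 12,000-Hz sampling frequency the Nyquist frequency is 6000 Hz, so the next conventional band edge above 5300 Hz (6400 Hz) could not be represented, which is why the filter bank stops at 19 bands under this assumption.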
The factor analyses were performed as follows. Nakajima et al. (2017) [8] calculated the powers of the filter outputs by squaring them and smoothing them with a moving Gaussian window of σ = 5 ms. Thus, they obtained a smoothed power fluctuation for each filter, and the powers for the 19 filters were sampled every 1 ms as 19 variates for factor analysis. In contrast to that method, here we followed Kishida et al. (2016) [12], in which the speech signals were sampled every 1 ms with a 30-ms-long Hamming window. Kishida et al. then transformed every 30-ms-long segment into a power spectrum by Fast Fourier Transform (FFT) [12]. Following this, the power spectrum was smoothed with a 5-ms short-pass lifter by cepstral analysis [17]. The origin of the factors, which are represented as orthogonal vectors in the 19-dimensional variate space, was shifted from the gravity center of the data points to the silent point, at which all variates (powers) are zero. In this origin-shifted factor analysis, a silent part of the speech signal is always represented at the silent point. This method should thus be suitable for observing the nature of relatively weak sounds, which are closer to the silent point. This was not the case in Nakajima et al. (2017) [8], in which a normal factor analysis was used.
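The windowing, FFT, and cepstral-liftering steps can be illustrated with a minimal NumPy sketch for a single frame. This is not the authors' implementation: the function name `liftered_power_spectrum`, the small floor added before taking the logarithm, and the exact handling of the 5-ms cutoff (quefrencies below 60 samples are kept) are assumptions made for illustration.

```python
import numpy as np

FS = 12_000                        # sampling frequency in Hz, from the text
FRAME_LEN = FS * 30 // 1000        # 30-ms frame -> 360 samples
LIFTER_CUTOFF = FS * 5 // 1000     # 5 ms -> 60 quefrency samples (assumed convention)

def liftered_power_spectrum(frame):
    """Return a cepstrally smoothed power spectrum for one frame (sketch)."""
    n = len(frame)
    windowed = frame * np.hamming(n)
    power = np.abs(np.fft.fft(windowed)) ** 2
    # A small floor avoids log(0); its value is an assumption.
    log_power = np.log(power + 1e-12)
    # Real cepstrum: inverse FFT of the log power spectrum.
    cepstrum = np.fft.ifft(log_power).real
    # Short-pass lifter: keep low quefrencies and their mirrored counterparts.
    lifter = np.zeros(n)
    lifter[:LIFTER_CUTOFF] = 1.0
    lifter[-(LIFTER_CUTOFF - 1):] = 1.0
    smoothed_log = np.fft.fft(cepstrum * lifter).real
    # Return the non-negative-frequency half, back on a linear power scale.
    return np.exp(smoothed_log)[: n // 2 + 1]

# Usage with a synthetic frame (random noise stands in for speech).
rng = np.random.default_rng(0)
example_frame = rng.normal(size=FRAME_LEN)
smoothed = liftered_power_spectrum(example_frame)
print(smoothed.shape)
```

In an analysis like the one described, such a function would be applied to a new frame every 1 ms, and the smoothed spectrum would then be reduced to the 19 critical-band powers; that reduction step is not shown here.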
In total, we performed four different factor analyses: with and without cepstral liftering, and with and without the origin shift. We obtained the factor loadings and took the factor scores at the temporal midpoints of the labeled phonemes. We then analyzed the distribution of the phonemes as represented in the three-dimensional factor space. The three-dimensional axes derived from the four factor analyses were rotated by varimax rotation [18], resulting in spectral factors.
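The difference between a normal and an origin-shifted analysis, followed by varimax rotation, can be sketched as below. Several choices here are assumptions rather than the procedure of [8] or [12]: factors are extracted by a principal-component decomposition, each variate is scaled to unit second moment about the chosen origin, and scores are computed by least squares. The function names `fit`, `scores`, and `varimax` are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variates x factors) loading matrix."""
    rotation = np.eye(loadings.shape[1])
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated ** 3 - rotated * (rotated ** 2).mean(axis=0)
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        if previous and s.sum() < previous * (1 + tol):
            break
        previous = s.sum()
    return loadings @ rotation

def fit(X, n_factors=3, origin_shift=False):
    """Extract rotated loadings; the origin is zero (silent point) or the mean."""
    X = np.asarray(X, dtype=float)
    origin = np.zeros(X.shape[1]) if origin_shift else X.mean(axis=0)
    deviations = X - origin
    scale = np.sqrt((deviations ** 2).mean(axis=0))
    Z = deviations / scale
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / len(Z))
    top = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    return origin, scale, varimax(loadings)

def scores(X, origin, scale, loadings):
    """Least-squares factor scores for rows of X (an assumed scoring rule)."""
    Z = (np.asarray(X, dtype=float) - origin) / scale
    return Z @ loadings @ np.linalg.inv(loadings.T @ loadings)

# Usage with synthetic non-negative "band powers" (19 variates).
rng = np.random.default_rng(0)
X = rng.random((500, 19)) ** 2
silent = np.zeros((1, 19))
for shift in (True, False):
    origin, scale, loadings = fit(X, origin_shift=shift)
    print("origin shift:", shift, "silent-point scores:",
          np.round(scores(silent, origin, scale, loadings), 3))
```

In this sketch a silent frame (all powers zero) receives factor scores of exactly zero only when the origin is shifted, which is the property the text attributes to the origin-shifted analysis.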

Factor Loadings of the Three Spectral Factors
From the perspective of the analysis processes, the major differences between the two methods, i.e., the factor analyses in Nakajima et al. (2017) [8] and in Kishida et al. (2016) [12], were whether to use the cepstral liftering and the origin shift to the silent point. Figure 1 shows the factor loadings for all the English speech samples spoken by the three native speakers, obtained with each of the four analysis methods. One division of the horizontal axis corresponds to 0.5 critical bandwidth. Four main frequency bands were obtained. The first constituted a low-frequency band, approximately from 50 to 600 Hz. The second was a mid-low-frequency band, approximately from 600 to 1700 Hz, followed by a mid-high-frequency band, approximately from 1700 to 3000 Hz. Finally, the fourth range was a high-frequency band, approximately above 3000 Hz. Confirming the results of Ueda and Nakajima (2017) [7], Nakajima et al. (2017) [8], and Kishida et al. (2016) [12], these four frequency bands were related to three spectral factors. One factor, the "low & mid-high factor" (Figure 1, red line), was bimodal in that the frequencies of relatively high loadings were around 300 Hz and around 2300 Hz. The second factor, the "mid-low factor", was located around 1100 Hz (Figure 1, black line). The third factor, the "high factor", was located around 4100 Hz (Figure 1, blue line).
Appl. Sci. 2020, 10, 7076
Figure 1. Factor loadings of the three extracted spectral factors of 200 English speech samples from three native speakers. Four different factor analyses were performed: (a) origin-shifted factor analysis with cepstral liftering, following [12]; (b) normal factor analysis with cepstral liftering; (c) origin-shifted factor analysis without cepstral liftering; (d) normal factor analysis without cepstral liftering.
The cumulative contributions of the three spectral factors were around 43% in Figure 1a,b, while they were around 45% in Figure 1c,d. As can be seen by comparing Figure 1a,b, or by comparing Figure 1c,d, the origin shift did not greatly affect the factor loadings: the factor loadings with and without the origin shift turned out to be very similar. Cepstral liftering, however, affected the factor loadings of just one factor. That is, the second peak of the bimodal "low & mid-high factor" was prominent without cepstral liftering, as can be seen in Figure 1c,d, but was reduced with cepstral liftering, as shown in Figure 1a,b. Thus, the similarity in the factor loadings between Figure 1a,b or between Figure 1c,d confirmed that the origin shift keeps the essential features of the factors in the analyses.

Factor Scores of the Three Spectral Factors
Following the analysis of the factor loadings, the factor scores were analyzed. In Figures 2-5 below, panels (a), (b), and (c) show the distributions of combinations of the three spectral-factor scores, i.e., between the "low & mid-high factor" (around 300 and 2300 Hz), the "mid-low factor" (around 1100 Hz), and the "high factor" (above 3000 Hz). Panel (d) in Figures 2-5 shows the distribution in the three-dimensional configuration as viewed in a direction from above-right to below-left in panel (a). In panel (d), the horizontal axis represents the combination of the "mid-low factor" and the "high factor", by calculating (x−y)/√2, where x indicates the factor score of the "mid-low factor", and y the factor score of the "high factor", following Nakajima et al. (2017) [8].
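The horizontal coordinate of panel (d) can be written out as a small sketch. The function name `panel_d_horizontal` is hypothetical; the formula itself is the (x−y)/√2 combination given in the text. Geometrically, it is the coordinate of a point along the unit vector (1, −1)/√2 in the plane of the "mid-low factor" and the "high factor", so points with equal scores on both factors fall at zero.

```python
import math

def panel_d_horizontal(mid_low_score, high_score):
    """Project (x, y) onto the unit vector (1, -1)/sqrt(2)."""
    return (mid_low_score - high_score) / math.sqrt(2)

# Usage with illustrative (made-up) factor scores.
print(panel_d_horizontal(1.0, 1.0))   # equal scores fall at the center
print(panel_d_horizontal(2.0, 0.0))   # dominated by the "mid-low factor"
print(panel_d_horizontal(0.0, 2.0))   # dominated by the "high factor"
```

Dividing by √2 keeps distances along this diagonal on the same scale as the original factor-score axes.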

The distribution of the factor scores showed a similar tendency in each of the four factor analyses (Figures 2-5). Although the cepstral liftering had some influence on the factor loadings of the second peak of the bimodal "low & mid-high factor", overall the factor scores showed very similar distributions in the factor space. The same can be said for the use of the origin shift. Whether the origin shift was applied or not, the factor loadings (Figure 1a-d) and the factor scores (Figures 2-5) showed very similar tendencies.
As can be seen, all phonemes were distributed into three fairly distinct areas in the three-dimensional space. On the "mid-low factor", the highest factor scores were obtained by vowels, then by sonorant consonants, while the lowest factor scores were obtained by obstruents. Most of the obstruents occupied a position very near to or below zero on the "mid-low factor". By contrast, they occupied a position above zero on the "low & mid-high factor", while on the "high factor" they even reached the highest factor scores. In their three-factor analysis, Nakajima et al. (2017) [8] showed that the distribution of the three English-phoneme categories (i.e., vowels, sonorant consonants, and obstruents) constituted an L-shape in the two-dimensional factor space of the "high factor" and the "mid-low factor". That is, the distributions of the phonemes extended linearly along both axes from the origin in the positive directions. Although not as clearly pronounced and narrowly shaped as in Nakajima et al. (2017) [8], a rather similar L-shaped distribution was also found in the present results, as shown in panel (a) of Figures 2-5. Interestingly, the distribution of the obstruents also showed an L-shape in the two-dimensional factor space shown in panel (c) of Figures 2-5. The factor scores of obstruents were mutually exclusive between the "low & mid-high factor" and the "high factor": if the factor score on the "low & mid-high factor" was high, then the factor score on the "high factor" was very low, and vice versa. The L-shaped distribution of the obstruents' factor scores across the "low & mid-high factor" and the "high factor" also appeared in Nakajima et al. (2017) [8], but was hardly noticeable there. In the present analyses, the L-shaped distribution appeared far more prominently.
Given the prominence of the L-shape for obstruents, more detailed analyses were performed. Figure 6 shows that the distributions of voiced obstruents (Figure 6a) [...]. For the voiced obstruents (Figure 6a), three obstruents occupied a relatively high position on the "low & mid-high factor", almost parallel to the y-axis, and close to the origin on the "high factor": /b/, /v/, and /ð/, for which the factor scores of /v/ and /ð/ were around half of those of /b/. By contrast, four other voiced obstruents occupied a relatively high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /g/, /ʒ/, /dʒ/, and /z/, for which the factor scores of /g/ and /z/ were around half of those of /ʒ/ and /dʒ/. Only the factor scores of /d/ were distributed along both the "low & mid-high factor" and the "high factor", yet again in an L-shape.
The distributions of the voiceless obstruents /p/, /k/, /h/, and /f/ (Figure 6b) displayed the L-shape, but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ were also L-shaped, but they were mainly higher on the "high factor". Three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ were about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and /ð/ were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/, and /dʒ/ and the voiceless obstruents /ʃ/, /tʃ/, and /s/ were mainly related to the "high factor".
Given the prominence of the L-shape for obstruents, more detailed analyses we Figure 6 shows that the distributions of voiced obstruents (Figure 6a) (Figure 6a), three obstruents occupied a relatively high positio & mid-high factor", almost parallel to the y-axis, and close to the origin on the high f and /ð/, for which the factor scores of /v/ and /ð/ were around half of /b/. By contrast, fou obstruents occupied a relatively high position on the "high factor", almost parallel to t close to the origin on the "low & mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half of /ʒ/ and factor scores of /d/ were both distributed on the "low & mid-high factor" and the "hig again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed but they were mainly higher on the "low & mid-high factor". The distributions of /t/ also L-shaped, but they were mainly higher on the "high factor". The distributions of th occupied a relatively high position only on the "high factor", almost parallel to the x-a to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scor about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / /. Only the factor scores of /d/ were both distributed on the "low & mid-high factor" and the "high factor", yet again in an L-shape. also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ were about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and /ð/ were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/ and /dʒ/ and the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor". Figure 6. The L-shaped distributions of voiced obstruents (a) and voiceless obstruents (b) in the twodimensional space of the "low & mid-high factor" and the "high factor". Factor scores of the three extracted spectral factors were obtained by origin-shifted factor analysis with cepstral liftering (cf. Figure 2) and the English speech samples were from three native speakers. Figure 7 shows the distributions of all the obstruents divided into fricatives/affricates and plosives on the two-dimensional configuration, again clearly showing a distinctive L-shape. Here Figure 6. The L-shaped distributions of voiced obstruents (a) and voiceless obstruents (b) in the two-dimensional space of the "low & mid-high factor" and the "high factor". 
Factor scores of the three extracted spectral factors were obtained by origin-shifted factor analysis with cepstral liftering (cf. Figure 2) and the English speech samples were from three native speakers.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed the L-shape, but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ were also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": / Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Interestingly, the distribution of the obstruents also sh the two-dimensional factor space of Figures 2-5c. The factor scores of obstru exclusive between the "low & mid-high factor" and the "high factor": If the fac & mid-high factor" was higher, then the factor score of the "high factor" was very Generally, the L-shaped distribution of the factor scores of the obstruents across factor" and the "high factor" as analyzed here also appeared in Nakajima et al. ( in a noticeable way. In the present analyses, the L-shaped distribution prominently. Given the prominence of the L-shape for obstruents, more detailed analy Figure 6 shows that the distributions of voiced obstruents ( Figure 6a)  For the voiced ob three obstruents occupied a relatively high position on the "low & mid-high fac to the y-axis, and close to the origin on the high factor: /b/, /v/, and /ð/, for whic /v/ and /ð/ were around half of /b/. By contrast, four other voiced obstruents o high position on the "high factor", almost parallel to the x-axis, and close to the mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half of / factor scores of /d/ were both distributed on the "low & mid-high factor" and t again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, di but they were mainly higher on the "low & mid-high factor". The distribution also L-shaped, but they were mainly higher on the "high factor". The distribution occupied a relatively high position only on the "high factor", almost parallel to to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the fact about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / ð / were mainly related to the "low & mid-high factor". The voiced obstruents the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor".

/, /t
Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Interestingly, the distribution of the obstruents al the two-dimensional factor space of Figures 2-5c. The factor scores of ob exclusive between the "low & mid-high factor" and the "high factor": If the & mid-high factor" was higher, then the factor score of the "high factor" was Generally, the L-shaped distribution of the factor scores of the obstruents ac factor" and the "high factor" as analyzed here also appeared in Nakajima e in a noticeable way. In the present analyses, the L-shaped distribut prominently.
Given the prominence of the L-shape for obstruents, more detailed a Figure 6 shows that the distributions of voiced obstruents ( Figure 6a For the voice three obstruents occupied a relatively high position on the "low & mid-hig to the y-axis, and close to the origin on the high factor: /b/, /v/, and /ð/, for w /v/ and /ð/ were around half of /b/. By contrast, four other voiced obstrue high position on the "high factor", almost parallel to the x-axis, and close to mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half factor scores of /d/ were both distributed on the "low & mid-high factor" a again in an L-shape. The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f but they were mainly higher on the "low & mid-high factor". The distribu also L-shaped, but they were mainly higher on the "high factor". The distribu occupied a relatively high position only on the "high factor", almost parall to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and ð / were mainly related to the "low & mid-high factor". The voiced obstru the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor". /, and /s/, for which the factor scores of /s/ were about half of those of / Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 12 found as in Figures 2-5a. Interestingly, the distribution of the obstruents also showed an L-shape in the two-dimensional factor space of Figures 2-5c. The factor scores of obstruents were mutually exclusive between the "low & mid-high factor" and the "high factor": If the factor score of the "low & mid-high factor" was higher, then the factor score of the "high factor" was very low, and vice versa. Generally, the L-shaped distribution of the factor scores of the obstruents across the "low & mid-high factor" and the "high factor" as analyzed here also appeared in Nakajima et al. 
(2017) [8], but hardly in a noticeable way. In the present analyses, the L-shaped distribution appeared far more prominently.
Given the prominence of the L-shape for obstruents, more detailed analyses were performed. Figure 6 shows that the distributions of voiced obstruents (Figure 6a (Figure 6a), three obstruents occupied a relatively high position on the "low & mid-high factor", almost parallel to the y-axis, and close to the origin on the high factor: /b/, /v/, and /ð/, for which the factor scores of /v/ and /ð/ were around half of /b/. By contrast, four other voiced obstruents occupied a relatively high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half of /ʒ/ and /dʒ/. Only the factor scores of /d/ were both distributed on the "low & mid-high factor" and the "high factor", yet again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed the L-shape, but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ were also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ were about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / ð / were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/ and /dʒ/ and the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor". / and /t Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of found as in Figures 2-5a. Interestingly, the distribution of the obstruents also showed an L-shape the two-dimensional factor space of Figures 2-5c. The factor scores of obstruents were mutua exclusive between the "low & mid-high factor" and the "high factor": If the factor score of the "lo & mid-high factor" was higher, then the factor score of the "high factor" was very low, and vice ver Generally, the L-shaped distribution of the factor scores of the obstruents across the "low & mid-hi factor" and the "high factor" as analyzed here also appeared in Nakajima et al. (2017) [8], but hard in a noticeable way. In the present analyses, the L-shaped distribution appeared far mo prominently.
Given the prominence of the L-shape for obstruents, more detailed analyses were performe Figure 6 shows that the distributions of voiced obstruents (Figure 6a three obstruents occupied a relatively high position on the "low & mid-high factor", almost paral to the y-axis, and close to the origin on the high factor: /b/, /v/, and /ð/, for which the factor scores /v/ and /ð/ were around half of /b/. By contrast, four other voiced obstruents occupied a relative high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half of /ʒ/ and /dʒ/. Only t factor scores of /d/ were both distributed on the "low & mid-high factor" and the "high factor", y again in an L-shape. The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed the L-shap but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ we also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruen occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and clo to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ we about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / ð / were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/ and /dʒ/ a the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor". /. In sum, the voiced obstruents /b/, /v/, and / Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Interestingly, the distribu the two-dimensional factor space of Figures 2-5c. 
exclusive between the "low & mid-high factor" and & mid-high factor" was higher, then the factor score Generally, the L-shaped distribution of the factor sco factor" and the "high factor" as analyzed here also a in a noticeable way. In the present analyses, prominently.
The distributions of voiceless obstruents (Figur but they were mainly higher on the "low & mid-h also L-shaped, but they were mainly higher on the "h occupied a relatively high position only on the "hig to the origin on the "low & mid-high factor": /ʃ/, /tʃ about half of those of /ʃ/ and /tʃ/. In sum, the voiced ð / were mainly related to the "low & mid-high fa the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly rela Figure 6. The L-shaped distributions of voiced obstr dimensional space of the "low & mid-high factor" extracted spectral factors were obtained by origin- Figure 2) and the English speech samples were from / were mainly related to the "low & mid-high factor". The voiced obstruents /z/, / Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Interestingly, the distribution of the obstruen the two-dimensional factor space of Figures 2-5c. The factor scores o exclusive between the "low & mid-high factor" and the "high factor": I & mid-high factor" was higher, then the factor score of the "high factor" Generally, the L-shaped distribution of the factor scores of the obstruent factor" and the "high factor" as analyzed here also appeared in Nakajim in a noticeable way. In the present analyses, the L-shaped distr prominently.
Given the prominence of the L-shape for obstruents, more detaile Figure 6 shows that the distributions of voiced obstruents ( Figure 6a (Figure 6a), three obstruents occupied a relativ & mid-high factor", almost parallel to the y-axis, and close to the origi and /ð/, for which the factor scores of /v/ and /ð/ were around half of /b/. B obstruents occupied a relatively high position on the "high factor", alm close to the origin on the "low & mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around factor scores of /d/ were both distributed on the "low & mid-high facto again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, an but they were mainly higher on the "low & mid-high factor". The dist also L-shaped, but they were mainly higher on the "high factor". The dis occupied a relatively high position only on the "high factor", almost pa to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, a ð / were mainly related to the "low & mid-high factor". The voiced ob the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high fact / and /d Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Interestingly, the distribution of the o the two-dimensional factor space of Figures 2-5c. The factor exclusive between the "low & mid-high factor" and the "high & mid-high factor" was higher, then the factor score of the "high Generally, the L-shaped distribution of the factor scores of the o factor" and the "high factor" as analyzed here also appeared in in a noticeable way. In the present analyses, the L-shap prominently.
Given the prominence of the L-shape for obstruents, mor  (Figure 6a), three obstruents occupied & mid-high factor", almost parallel to the y-axis, and close to and /ð/, for which the factor scores of /v/ and /ð/ were around ha obstruents occupied a relatively high position on the "high fact close to the origin on the "low & mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were factor scores of /d/ were both distributed on the "low & mid-h again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, / but they were mainly higher on the "low & mid-high factor". also L-shaped, but they were mainly higher on the "high factor" occupied a relatively high position only on the "high factor", a to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, f about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents ð / were mainly related to the "low & mid-high factor". The v the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the " / and the voiceless obstruents / Appl. Sci. 2020, 10, x FOR PEER REVIEW found as in Figures 2-5a. Intere the two-dimensional factor spa exclusive between the "low & m & mid-high factor" was higher, Generally, the L-shaped distribu factor" and the "high factor" as in a noticeable way. In the prominently.
The distributions of voicele but they were mainly higher on also L-shaped, but they were ma occupied a relatively high posit to the origin on the "low & mid about half of those of /ʃ/ and /tʃ/ ð / were mainly related to the the voiceless obstruents /ʃ/, /tʃ/, /, /t Appl. Sci. 2020, 10, x FOR PEER RE found as in Figures 2-5a. In the two-dimensional factor exclusive between the "low & mid-high factor" was hig Generally, the L-shaped dis factor" and the "high factor in a noticeable way. In prominently.
Given the prominence Figure 6 shows that the dist 6b) indeed showed distinc obstruents, with the numbe (467), /ð/ (968), /dʒ/ (53), obstruents: /p/ (563), /t/ (168 ʃ / (226), /h/ (340), /tʃ/ (19 three obstruents occupied a to the y-axis, and close to th /v/ and /ð/ were around ha high position on the "high f mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for whic factor scores of /d/ were bo again in an L-shape. The distributions of vo but they were mainly high also L-shaped, but they wer occupied a relatively high p to the origin on the "low & about half of those of /ʃ/ an ð / were mainly related t the voiceless obstruents /ʃ/, /, /s/ were mainly related to the "high factor". Figure 7 shows the distributions of all the obstruents divided into fricatives/affricates and plosives on the two-dimensional configuration, again clearly showing a distinctive L-shape. Here eleven obstruents were categorized as fricatives/affricates, with the number of analyzed data points in parentheses: /θ/ (205), / Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 12 found as in Figures 2-5a. Interestingly, the distribution of the obstruents also showed an L-shape in the two-dimensional factor space of Figures 2-5c. The factor scores of obstruents were mutually exclusive between the "low & mid-high factor" and the "high factor": If the factor score of the "low & mid-high factor" was higher, then the factor score of the "high factor" was very low, and vice versa. Generally, the L-shaped distribution of the factor scores of the obstruents across the "low & mid-high factor" and the "high factor" as analyzed here also appeared in Nakajima et al. (2017) [8], but hardly in a noticeable way. In the present analyses, the L-shaped distribution appeared far more prominently.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed the L-shape, but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ were also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ were about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / ð / were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/ and /dʒ/ and the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor". Figure 6. The L-shaped distributions of voiced obstruents (a) and voiceless obstruents (b) in the two-dimensional space of the "low & mid-high factor" and the "high factor". Factor scores of the three extracted spectral factors were obtained by origin-shifted factor analysis with cepstral liftering (cf. Figure 2) and the English speech samples were from three native speakers.  Figures 2-5a. Interestingly, the distribution the two-dimensional factor space of Figures 2-5c. The exclusive between the "low & mid-high factor" and the & mid-high factor" was higher, then the factor score of th Generally, the L-shaped distribution of the factor scores factor" and the "high factor" as analyzed here also appe in a noticeable way. In the present analyses, the prominently.
The distributions of voiceless obstruents (Figure 6b but they were mainly higher on the "low & mid-high also L-shaped, but they were mainly higher on the "high occupied a relatively high position only on the "high fa to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, an about half of those of /ʃ/ and /tʃ/. In sum, the voiced obs ð / were mainly related to the "low & mid-high factor the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related & mid-high factor" was higher, then the factor s Generally, the L-shaped distribution of the facto factor" and the "high factor" as analyzed here in a noticeable way. In the present analys prominently.
Given the prominence of the L-shape for found as in Figures 2-5a. Inte the two-dimensional factor s exclusive between the "low & & mid-high factor" was highe Generally, the L-shaped distr factor" and the "high factor" in a noticeable way. In th prominently. Given the prominence o Figure 6 shows that the distrib 6b) indeed showed distinctiv obstruents, with the number (467), /ð/ (968), /dʒ/ (53), /ʒ obstruents: /p/ (563), /t/ (1682 ʃ / (226), /h/ (340), /tʃ/ (195 three obstruents occupied a r to the y-axis, and close to the /v/ and /ð/ were around half high position on the "high fac mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which t factor scores of /d/ were both again in an L-shape. The distributions of voic but they were mainly higher also L-shaped, but they were occupied a relatively high po to the origin on the "low & m about half of those of /ʃ/ and ð / were mainly related to the voiceless obstruents /ʃ/, /t / (195) and /d Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 12 found as in Figures 2-5a. Interestingly, the distribution of the obstruents also showed an L-shape in the two-dimensional factor space of Figures 2-5c. The factor scores of obstruents were mutually exclusive between the "low & mid-high factor" and the "high factor": If the factor score of the "low & mid-high factor" was higher, then the factor score of the "high factor" was very low, and vice versa.
Generally, the L-shaped distribution of the factor scores of the obstruents across the "low & mid-high factor" and the "high factor" as analyzed here also appeared in Nakajima et al. (2017) [8], but hardly in a noticeable way. In the present analyses, the L-shaped distribution appeared far more prominently.
Given the prominence of the L-shape for obstruents, more detailed analyses were performed. For the voiced obstruents (Figure 6a), three obstruents occupied a relatively high position on the "low & mid-high factor", almost parallel to the y-axis, and close to the origin on the high factor: /b/, /v/, and /ð/, for which the factor scores of /v/ and /ð/ were around half of /b/. By contrast, four other voiced obstruents occupied a relatively high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /g/, / ʒ /, /dʒ/and /z/, for which the factor scores of /g/ and /z/ were around half of /ʒ/ and /dʒ/. Only the factor scores of /d/ were both distributed on the "low & mid-high factor" and the "high factor", yet again in an L-shape.
The distributions of voiceless obstruents (Figure 6b), /p/, /k/, /h/, and /f/, displayed the L-shape, but they were mainly higher on the "low & mid-high factor". The distributions of /t/ and /θ/ were also L-shaped, but they were mainly higher on the "high factor". The distributions of three obstruents occupied a relatively high position only on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /ʃ/, /tʃ/, and /s/, for which the factor scores of /s/ were about half of those of /ʃ/ and /tʃ/. In sum, the voiced obstruents /b/, /v/, and / ð / were mainly related to the "low & mid-high factor". The voiced obstruents /z/, /ʒ/ and /dʒ/ and the voiceless obstruents /ʃ/, /tʃ/, /s/ were mainly related to the "high factor".
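The mutually exclusive pattern described above can be made concrete with a small numerical check. The following is a minimal sketch and not the procedure used in this study: it assumes that factor scores are available as (x, y) pairs for the "high factor" and the "low & mid-high factor", and it introduces a hypothetical helper, `l_shape_index`, defined as the mean ratio of the smaller to the larger absolute score per data point. Values near 0 indicate that each point lies along one axis only (an L-shape), whereas values near 1 indicate that both scores rise together. The data are synthetic toy values, not the measured factor scores.

```python
import numpy as np

def l_shape_index(scores):
    """Mean ratio of the smaller to the larger absolute factor score per point.

    Hypothetical helper (not from the paper): values near 0 mean each point
    loads on only one of the two factors (L-shape); values near 1 mean both
    scores are of similar size.
    """
    s = np.abs(np.asarray(scores, dtype=float))
    small = s.min(axis=1)
    large = s.max(axis=1)
    # Skip points exactly at the origin, where the ratio is undefined.
    keep = large > 0
    return float(np.mean(small[keep] / large[keep]))

# Synthetic toy data (NOT measured values): one arm along each axis.
rng = np.random.default_rng(0)
arm_x = np.column_stack([rng.uniform(0.5, 2.0, 50), rng.uniform(0.0, 0.05, 50)])
arm_y = np.column_stack([rng.uniform(0.0, 0.05, 50), rng.uniform(0.5, 2.0, 50)])
l_shaped = np.vstack([arm_x, arm_y])

# Contrast case: both scores rise together along the diagonal.
t = rng.uniform(0.5, 2.0, 100)
diagonal = np.column_stack([t, t * rng.uniform(0.9, 1.0, 100)])

print(f"L-shaped toy data: {l_shape_index(l_shaped):.3f}")
print(f"Diagonal toy data: {l_shape_index(diagonal):.3f}")
```

With real data, the same index could be computed per phoneme to compare, for example, /d/ (distributed on both factors) with /b/ (mainly on one factor); any threshold for calling a distribution "L-shaped" would be an additional assumption.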
Figure 7 shows the distributions of all the obstruents divided into fricatives/affricates and plosives on the two-dimensional configuration, again clearly showing a distinctive L-shape. Here eleven obstruents were categorized as fricatives/affricates, with the number of analyzed data points in parentheses: /θ/ (205), /ð/ (968), /f/ (552), /v/ (467), /s/ (1389), /z/ (827), /ʃ/ (226), /ʒ/ (54), /h/ (340), /tʃ/ (195) and /dʒ/ (53). Six obstruents were categorized as plosives: /p/ (563), /t/ (1682), /k/ (786), /b/ (609), /d/ (838) and /g/ (274). For the fricatives/affricates (Figure 7a), the distributions of six obstruents occupied a relatively high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /s/, /z/, /ʃ/, /tʃ/, /ʒ/, and /dʒ/. The distributions of five obstruents were both located on the "low & mid-high factor" and the "high factor" in an L-shape: /θ/, /ð
The distributions of voiceless obstruen but they were mainly higher on the "low & also L-shaped, but they were mainly higher o occupied a relatively high position only on to the origin on the "low & mid-high factor about half of those of /ʃ/ and /tʃ/. In sum, the ð / were mainly related to the "low & mid the voiceless obstruents /ʃ/, /tʃ/, /s/ were mai /, /f/, /v/, /h/, but they were mainly higher on the "low & mid-high factor". As for the distributions of plosives (Figure 7b), all of them displayed the L-shape, but /p/, /b/ and /k/ were mainly higher on the "low & mid-high factor", and /t/, /d/ and /g/ were mainly higher on the "high factor". occupied a relatively high position on the "high factor", almost parallel to the x-axis, and close to the origin on the "low & mid-high factor": /s/, /z/, /ʃ/, /ʒ/, /tʃ/ and /dʒ/. The distributions of five obstruents were both located on the "low & mid-high factor" and the "high factor" in an L-shape: /θ/, /ð/, /f/, /v/, /h/, but they were mainly higher on the "low & mid-high factor". As for the distributions of plosives (Figure 7b), all of them displayed the L-shape, but /p/, /b/ and /k/ were mainly higher on the "low & mid-high factor", and /t/, /d/ and /g/ were mainly higher on the "high factor". The L-shaped distributions of obstruents divided into fricatives/affricates (a) and plosives (b) in the two-dimensional space of the "low & mid-high factor" and the "high factor". Factor scores of the three extracted spectral factors were obtained by origin-shifted factor analysis with cepstral liftering (cf. Figure 2) and the English speech samples were from three native speakers.
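The mutual exclusivity between the two factors described above can be expressed as a simple number. The following is a minimal sketch, not a measure used in the present analyses: the function name `l_shape_index` and the toy score values are hypothetical and only illustrate the idea that, in an L-shaped distribution, the smaller of the two factor scores of each data point is small relative to the larger one.

```python
import numpy as np

def l_shape_index(scores_a, scores_b):
    """Mean ratio of the smaller to the larger absolute score per point.

    Hypothetical illustration: values near 0 mean that each point lies
    close to one of the two axes (L-shape); values near 1 mean that the
    two scores rise together (diagonal).
    """
    a = np.abs(np.asarray(scores_a, dtype=float))
    b = np.abs(np.asarray(scores_b, dtype=float))
    larger = np.maximum(a, b)
    smaller = np.minimum(a, b)
    valid = larger > 0                 # skip points exactly at the origin
    return float(np.mean(smaller[valid] / larger[valid]))

# Toy scores (invented): points along the two axes versus along the diagonal
l_points = l_shape_index([0.0, 0.1, 2.0, 3.0], [2.5, 1.5, 0.0, 0.2])
diagonal = l_shape_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

With real data, `scores_a` and `scores_b` would be the factor scores of one phoneme on the "low & mid-high factor" and the "high factor", respectively.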

Discussion and Conclusions
Here we compared different factor analysis methods applied to English speech. Our first aim was to determine whether normal factor analysis and a modified "origin-shifted" factor analysis yield different spectral factors when applied to English speech. The methods differed in two processing aspects: (1) whether cepstral liftering was applied, and (2) whether the origin shift was used. The present results showed that without cepstral liftering, the factor loadings of one factor were more pronounced (Figure 1), but neither cepstral liftering nor the origin shift had a large impact on the factor scores (Figures 2-5), whose distributions were similar.
The merit of the origin-shifted analysis is as follows. With normal factor analysis, it can be difficult to find any features around silent parts in speech. When the speech is resynthesized, noise is very likely to be generated in the silent parts as well, resulting in continuous background noise. The biggest advantage of the origin shift is that all silent parts of the speech signal are plotted onto a single silent point in the factor space, the origin, so that the silent parts remain silent when the speech is resynthesized. If the quality of the resynthesized speech is better, it is likely to correspond more closely to the real auditory signal, which makes it more useful for listening experiments. Given that the results of the normal factor analysis and the origin-shifted factor analysis were similar, the origin-shifted factor analysis is highly recommendable for future research on speech resynthesis and subsequent listening experiments.
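The effect of the origin shift on silent parts can be illustrated with a small numerical sketch. This is not the implementation used in the present study: a principal component projection stands in for the factor analysis (the rotation step is omitted), the band-power data are random toy values, and the function names are hypothetical. It is also an assumption of the sketch that the loadings are estimated from mean-centered data in both cases and that only the reference point of the projection differs.

```python
import numpy as np

def fit_loadings(X, k):
    """Return the data mean and the top-k principal directions.

    X has shape (frames, bands); the returned loading matrix has shape
    (bands, k) with orthonormal columns.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def resynth_standard(x, mean, W):
    """Project relative to the data mean, then reconstruct."""
    return mean + (x - mean) @ W @ W.T

def resynth_origin_shifted(x, W):
    """Project relative to the silent point (zero power), then reconstruct."""
    return x @ W @ W.T

rng = np.random.default_rng(0)
X = rng.random((500, 20))            # toy power values in 20 frequency bands
mean, W = fit_loadings(X, k=3)       # three factors, as in the text
silence = np.zeros(20)               # a silent frame: zero power in all bands

standard_out = resynth_standard(silence, mean, W)
shifted_out = resynth_origin_shifted(silence, W)
```

In this sketch, `shifted_out` is exactly zero, whereas `standard_out` retains the part of the mean spectrum that the three retained factors cannot represent, which corresponds to residual noise in silent parts.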
The second aim of our study was to determine the acoustic correlates of English phonemes using the four factor analyses. New insight was obtained with regard to obstruents. In an English syllable, an isolated obstruent always occupies the syllable onset or the syllable end. If two or more obstruents are next to one another, one of them begins or ends the syllable. Analyzing obstruents acoustically as in the present study helped to understand this phonological phenomenon. Our analyses showed a factor with a frequency range around 1100 Hz. According to Nakajima et al. (2017) [8], this factor was only related to vowels and sonorant consonants; it seems to be closely related to syllable nuclei, most of which are vowels but a few of which are sonorant consonants. We provided evidence that obstruents were not only associated with the factor related to a frequency range above 3300 Hz (the "high factor"), as suggested by Nakajima et al. [8], but also with the bimodal factor with frequencies around 300 Hz and 2300 Hz (the "low & mid-high factor"). One likely cause for this is the difference in the initial analysis processing of the speech signal, but further investigation is required.
Our present findings thus suggest that these two factors capture acoustic properties of obstruents that provide strong cues to the beginning or the end of a syllable. It is important to note that the distributions of subsets of obstruents (voiced and voiceless, Figure 6; fricatives/affricates and plosives, Figure 7) indeed occupied high positions on the "low & mid-high factor" and the "high factor", but not on the "mid-low factor". The obstruents were often far from the origin, the silent point in the origin-shifted factor analysis, but they never went into the positive direction of the "mid-low factor". This confirms that obstruents do not constitute the syllable nucleus, but rather delimit the syllable, corroborating the typical sonority hierarchy in phonology, in which obstruents occupy the lowest position [10,11].
Further research is necessary to identify the acoustic correlates of individual obstruents in more detail. Most obstruents occupied a relatively high position on one of the two factors other than the "mid-low factor" (Figures 6 and 7), and this should be investigated further in the future. It would also be fruitful to identify acoustic correlates of consonant clusters. Consonant clusters in English are in many cases word-initial consonant clusters, such as /br/ in "bridge", or word-final consonant clusters, such as /sks/ in "desks". Based on purely acoustic analyses of English speech sounds as in the present study, we expect to perform listening experiments on consonant perception. Obstruents are very likely to be perceived as delimiting syllables by their lack of sonority, i.e., by their low scores on the "mid-low factor". An aspect of English phonology was thus connected to the acoustic nature of speech sounds.