Evaluation of Three Auditory-Sculptural Qualities Created by an Icosahedral Loudspeaker

Abstract: The icosahedral loudspeaker (IKO) was previously established as an electroacoustic instrument enabling the musical creation and orchestration of sculptural sound phenomena in the room. This is technically achieved by using the IKO's acoustic beamforming capabilities to manipulate the strengths of the available acoustic reflection paths. In its use, listeners perceive auditory sculptures whose characterization needs investigation. We propose a set of three sculptural quality attributes (directionality, contour, and plasticity) and present a series of listening experiments investigating them. The experiments employ documented beam layouts using a selected set of sounds as conditions, and they evaluate the recognizability, perceivable grading, and discernibility of the proposed sculptural qualities.


Introduction
In spatial audio technology, see e.g., [1], a virtual source is a model description of how technical parameters map to the characteristics of an isolated auditory event. Above all, the virtual source describes how a modeled auditory event location relates to the gains and delays that feed a signal to the surrounding loudspeakers, which is its most common application. In acoustically treated studio environments [2], numerous effects can be controlled relatively precisely at the level of abstraction of such a single, moving virtual source for spatialization. A commonly utilized mechanism is summing localization [3], which fuses the multiple sound instances of the loudspeakers into clearly localized auditory events whenever sounds arrive with time lags < 1 ms.
Spatialization that creates immersive sound environments in electroacoustic music with 3D surround loudspeaker systems [4] often employs particle systems of virtual sources [5,6] fed by spectrally modified sounds. In contrast to isolated auditory events, we use the term auditory object [7] to refer to what perception segregates from a mixture of multiple concurrent sounds. Kendall [8] argues that auditory objects and constellations thereof are only indirectly controlled by spatialization using virtual source constellations because auditory objects are not only subject to the precedence effect [9,10], but also to many other psychoacoustic effects, e.g., masking, auditory grouping, auditory attention, etc. [11]. Only in exceptional cases can all intended and composed virtual source constellations become fully transparent auditory objects for all listeners.
What frequently turns out to be effective in spatial electroacoustic music is to make spatialization time-variant and to make its virtual sources move along trajectories that Godøy refers to as gestures [12]. The knowledge about the effectiveness of spatial gestures is particularly helpful, as discussed by Nyström and Smalley [13,14]. Such time-variant spatializations often offer comprehensible and stable auditory impressions that can be perceived by many listeners throughout the audience, even with a manageable number of loudspeakers and signals.
Beamforming compact spherical loudspeaker arrays were introduced by Warusfel and Avizienis [15,16] for application in electroacoustic music. We introduced the spherical beamforming technology behind the icosahedral loudspeaker IKO in [17], cf. Figure 1. Commercial soundbars [18] or the IKO [17] emit sound beams whose directions are controlled as spatialization parameters. Spatialization on beamforming systems relies on the availability of a few pronounced reflections, and it is only precise if they can be precisely controlled. Wühle et al. [19] describe fundamental studies for beamforming loudspeaker systems by comprehensively establishing experimental level thresholds that such systems should provide in isolating distinct reflection paths from the direct path. In this way, every distinct reflection could be used as an additional virtual loudspeaker, localized at the reflection's direction of arrival. These thresholds extend the experimental studies on the precedence effect to a range in which, instead of the attenuated leading sound, the lagging sound is desired to dominate localization. Although the level thresholds are often only met for a limited set of the available reflection paths, the control of auditory distance succeeds, which, as in Zahorik [20], is based on controlling the direct-to-diffuse energy ratio. Laitinen [21] and Wendt [22] present experiments that use compact spherical beamformer arrays to vary distance impressions by crossfading between direct and indirect, rather diffuse sound reinforcement. In previous studies [23–26], we were able to show the basic characteristics of static and dynamic beamforming in various experiments using stationary and transient sounds. In a performance situation, a quarter-circle arrangement of reflectors is often employed to support a robust and detailed spatialization throughout the audience area [27], cf. Figure 1.
Layers of different, gesturally moving sound beams using a set of signals help overcome the problem that precise auditory object localization can be specific to the listening position, as experimentally shown for the IKO [25,27]. This kind of spatialization results in constellations of auditory objects, for which we have found and confirmed distinguishable auditory sculpture categories [25]. In the literature, Gertich ([28], p. 145) and Landy [29] describe sound sculptures as physical sculptures enhanced by the ability to produce sound. In contrast, Leitner [30] and Sharma [31] describe sound sculptures that use sound as an integral element, detached from physical sculpture. We focus on the latter concept by considering auditory sculptures that are arranged by beam-formed air-borne sound and its wall reflections. Moreover, listening experiments with the IKO indicate that sculptural relations are inter-subjectively recognized [25]. In the composition and performance practice using the IKO as an electroacoustic instrument [31], sculptural qualities turned out to characterize the various spatial impressions of a listener more efficiently than the mere knowledge about localization directions and distances would.
We adopt three qualities from scholarly writings on sculpture, namely directionality, contour, and plasticity [32–35]. These expressions and definitions for physical solids do not automatically apply to auditory sculptures, but they lead to notions that we adopted for our purposes here by defining:

• Directionality describes the potential of auditory objects in the auditory sculpture to dynamically guide the listener's attention through a room.

• Contour describes the degree of dependency of the auditory sculpture's outline (silhouette) on the listening position taken and imagined from temporal evolution.

• Plasticity describes the degree of depth grading of the spatially layered auditory objects of the auditory sculpture in the room.
This paper presents multiple listening experiments that support the three proposed sculptural quality attributes. Section 2 introduces the experimental setups, as well as the sound material and its spatialization on the IKO. The subsequent sections describe the conditions of each experiment and discuss their results. Finally, the concluding Section 6 summarizes the findings.

Experiments on Sculptural Qualities
While the above-mentioned sculptural qualities already turned out to be useful in practice during the process of composition [31], there is a desire to confirm them inter-subjectively. Thus, the aim of the experiments is to check to what extent a musically experienced, well-instructed test panel is able to recognize the composer's intention. To this end, we conducted a series of three consecutive experiments with 11-15 participants, where each study was based on the results of the preceding experiment. The conditions for our experiments are spatializations on the IKO that are musically composed and selected with the intention of providing a grading in the three proposed sculptural qualities, based on experience from preceding compositions and experiments.
The first and initial experiment used a simple A/B comparison to indirectly rank conditions according to the strength of each sculptural quality separately. Similarly, the second and refining experiment used a multi-stimulus paradigm to directly rank multiple conditions. By contrast, the third and discriminating experiment used a multi-dimensional comparison to evaluate the relative share of the three sculptural qualities in the comparison of multiple conditions. In all experiments, the duration of each condition was 30 s to provide enough time for listening to the auditory sculpture, while keeping experimentation time short.

Sound Material
The sound material for the experiments described in this paper is chosen from the set of six sounds specified below (cf. supplementary Audio S1). For each of the sounds, we have an assumption on its projection distance, i.e., on how far away from the IKO surface it will be perceived. The assumption is based on the finding described in [25] that, due to a suppression of the precedence effect, IKO beamforming with stationary broad-band sounds typically yields auditory objects further away from the IKO, e.g., at the walls or reflector panels around the IKO. By contrast, transient sounds produce auditory objects at reduced distance to the IKO or at its surface. Together with the assumed ranking in projection distance (in brackets), these sounds are:

• PN(6): stationary pink noise

• FM(5): frequency-modulated metallic radiance sound with long attack and release time

• DR(4): drone (sustained, harmonically static) sound with gradual loudness increase, low cut at 236 Hz

• FG(3): even chain of fine grains

• BP(2): grain loop with bass pulses

• IB(1): fluctuating sequence of irregular bursts

Spatialization with the IKO
The above-mentioned sound material was generated as single-channel sound and thus required further processing in order to be played back by the 20 drivers of the IKO. As the beamforming of the IKO uses 3rd-order Ambisonics ([36], Ch. 7) for sound projection, each mono sound was first fed into an encoder yielding a set of 16 Ambisonic signals, encoding the signal with adjustable beam direction (azimuth and elevation angles w.r.t. the IKO-centered coordinate system shown in Figure 2) and width (by weights that reduce the effective Ambisonic order). Ambisonic signals from each encoded sound were added to the Ambisonic bus, whose 16 signals were decoded by a filter matrix to the 20 signals of the IKO. More details about the signal processing can be found in [17,36]. The conditions mainly encoded a subset of the sounds using different beam directions that were either static or moving in (counter-)clockwise rotation. The resulting 3rd-order beam width of ±30° is largely frequency-independent between 200 Hz and 2 kHz and can be studied in [37]. Some conditions also include omnidirectional sound projection. A special sound movement is the so-called distance fade, based on compositional practice [31] and experimental results in [22]. It yields an auditory object that continuously decreases or increases its distance to the IKO. A distance decrease starts with a 3rd-order beam into a certain direction, e.g., 0°, and continuously reduces the effective beam order down to 0, i.e., until the beam becomes omnidirectional.
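As a rough illustration of the channel count and of the order weighting behind the distance fade, the following Python sketch may help. The function names and the linear crossfade between orders are assumptions made for illustration only; the text does not specify the weighting actually used on the IKO (see [17,36] for the real signal processing).

```python
def num_ambisonic_channels(order: int) -> int:
    """Number of Ambisonic signals for a given order: (N + 1)^2."""
    return (order + 1) ** 2


def order_weights(effective_order: float, max_order: int = 3) -> list:
    """Per-order weights w_0 .. w_max for a fractional effective order.

    ASSUMPTION: each order n is faded in linearly while the effective
    order moves from n - 1 to n. This is a hypothetical weighting, not
    the one documented for the IKO.
    """
    return [min(1.0, max(0.0, effective_order - n + 1.0))
            for n in range(max_order + 1)]


def channel_weights(effective_order: float, max_order: int = 3) -> list:
    """Expand per-order weights to one weight per Ambisonic channel.

    Order n contributes 2n + 1 channels.
    """
    weights = []
    for n, w in enumerate(order_weights(effective_order, max_order)):
        weights.extend([w] * (2 * n + 1))
    return weights


if __name__ == "__main__":
    print(num_ambisonic_channels(3))   # channel count for 3rd order
    # Distance decrease: effective order 3 -> 0 in a few steps
    for step in range(7):
        n_eff = 3.0 - 0.5 * step
        print(n_eff, order_weights(n_eff))
```

With this assumed weighting, an effective order of 3 leaves all 16 channels untouched, while an effective order of 0 keeps only the omnidirectional channel, which corresponds to the two end points of the distance fade described above.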

Experiment 1: Initial
The first experiment took place in December 2017 at the Hybrid Lab of TU Berlin. The room had a reverberation time of 0.8 s, and the experimental setup is shown in Figure 2a. For each attribute of directionality, contour, and plasticity, there were distinct sets (one for familiarization and one for the actual experiment) of 3 conditions that, for this first experiment, were selected from existing IKO miniatures ([31], p. 193ff). The conditions are described in Table 1 by their sounds and spatialization.

Conditions
For directionality, the familiarization conditions d_f1 ... d_f3 were chosen because the composer intended them to exhibit increasing magnitude in the quality directionality, cf. Table 1. The increase is mainly based on the addition of more rotations meant to increase guidance of the listener's attention through the room. The grading in the conditions of the actual experiment d_1 ... d_3 follows the same idea: d_3 is assumed to cause the most pronounced directionality, as it uses PN(6), which projects the rotating beam farthest, while FG(3) in d_1 fades from a downward-pointing to an upward-pointing beam, which is assumed to cause only little directionality. The familiarization condition c_f3 is expected to yield a strong contour because of the weakly projecting static IB(1) beam to the horizon and because the two opposing horizontal rotations of FG(3) and PN(6), which meet two times every second, are meant to be heard as non-uniformity. The single rotation in c_f1 and c_f2 is assumed to yield less contour, while the elevation in the rotation of FG(3) in c_f1 should reduce contour even further. Similarly for the experimental conditions, the degree of silhouette dependency was assumed to be strongest for c_3 with a 180°/s clockwise rotation of BP(2) on the horizon and a 90°/s counter-clockwise rotation of FG(3) at −60° elevation; the motion yields beam directions of coinciding azimuth at {0°, −120°, 120°} at {0, 4/3, 8/3} s within every 4 s cycle. The least contour was expected for c_1 with its single static beam of DR(4).
The expected increase in the magnitude of plasticity is mainly composed by adding more and more simultaneous layers of sounds with different projection distances. The familiarization condition p_f1 is expected to evoke the smallest magnitude due to the static DR(4), while p_f3 combines rotating beams of DR(4) and FG(3). A similar grading is expected between p_1 and p_3: the pair of rotating beams in p_3 with IB(1) and PN(6) and a static beam with FG(3) is assumed to contain the greatest complexity in simultaneous depth layering, while p_1 was assumed to be heard with the least plasticity. p_1 presents a sequence of three distance fades with FG(3) and the beam directed towards the listener, i.e., at 180°. The first fade employs a reduced order variation from 3 to 1 and back to 3, whereas the other fades make use of the full variation from 3 to 0 to 3, resulting in a distance fade over the entire range.
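The coincidence times and azimuths quoted for the third contour condition can be checked with a small Python sketch. It assumes that both beams start at 0° azimuth and that positive rates denote clockwise rotation; this sign convention and the function name are hypothetical and were chosen only so that the output follows the order of the listing in the text.

```python
from fractions import Fraction


def coincidences(rate_a, rate_b, cycle):
    """Times (s) and azimuths (deg) at which two rotating beams coincide.

    rate_a, rate_b: rotation rates in deg/s (ASSUMED: positive = clockwise).
    cycle: length of the observed time window in seconds.
    Both beams are ASSUMED to start at 0 deg azimuth at t = 0.
    """
    relative = abs(rate_a - rate_b)
    if relative == 0:
        raise ValueError("beams with equal rates never separate")
    step = Fraction(360) / relative      # time between two coincidences
    result = []
    t = Fraction(0)
    while t < cycle:
        azimuth = (rate_a * t) % 360     # wrap to [0, 360)
        if azimuth > 180:                # map to (-180, 180]
            azimuth -= 360
        result.append((t, azimuth))
        t += step
    return result


if __name__ == "__main__":
    # BP(2): 180 deg/s clockwise, FG(3): 90 deg/s counter-clockwise
    for t, az in coincidences(180, -90, 4):
        print(f"t = {t} s, azimuth = {az} deg")
```

Under these assumptions the relative rotation rate is 270°/s, so the beams meet every 4/3 s, i.e., three times within the 4 s period of the slower beam.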

Method
The 13 listeners (25 to 30 years old, 1 female) were participants of the seminar Composing with Sculptural Sound Phenomena in Computer Music (Spatial Practices I). They were familiarized with the sculptural qualities by written explanations according to Section 1 and by playback of three conditions with increasing magnitude in each quality. In order to avoid repetition priming, the familiarization used a set of conditions separate from the one of the actual experiment, cf. Table 1. Both sets were arranged under similar reasoning, but their details differ in sound and spatialization.
During the experiment, all listeners were in the room at the same time and were able to choose a listening position according to their preference. Three questionnaires of 6 questions each were handed out to the participants: the first dealt with the conditions for directionality, the second with those for plasticity, and the third with those for contour. Each one was headed by the German definitions (cf. supplementary Document S2) preceding the ones we give in Section 1, and the questionnaires offered enough space for every question in case participants needed to make sketches before responding (cf. supplementary Handout S3). Each of the 18 questions referred to a condition pair denoted as A and B and was to be answered directly after the playback of A and B via the IKO: "Is the magnitude of {directionality/contour/plasticity} greater in A, equal for both, or greater in B: A > B, A = B, B > A?" Participants could ask for repeated playback of the current pair. The 3 conditions per attribute yield a full pairwise comparison set of 3 pairs. To gather all answers twice, 6 questions were posed per attribute. The playback appeared in a seemingly random order; both the order within each pair and the sequence of the presented pairs were randomized. The experiment took 30 min.
Figure 3 shows the results of the full pairwise listening experiment in terms of the mean scale across all listeners. The individual-listener scale was built by summation of the entries (±0.5, 0) of the 3 × 3 matrix containing all 6 pairwise comparison ratings; +0.5 signified a greater, 0 an equal, and −0.5 a smaller rating of the respective example in the pair. The analysis shows mean values and 95% confidence intervals. Pairwise t-tests show that the differences are statistically significant (p < 0.05) between all pairs except the difference between the second and third plasticity conditions p_2 and p_3 (p = 0.087).
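One possible reading of how the individual-listener scale is built from the six answers per attribute is sketched below in Python. The data layout, the function name, and the example answers are hypothetical; the sketch only assumes that each answer adds +0.5 to the condition rated greater and −0.5 to the other, and 0 to both for an equal rating.

```python
def listener_scale(answers, conditions):
    """Sum pairwise comparison entries (+0.5, 0, -0.5) per condition.

    answers: list of (a, b, verdict) with verdict in {"A>B", "A=B", "B>A"},
             where a was played as A and b as B (hypothetical layout).
    conditions: names of the compared conditions.
    """
    scale = {c: 0.0 for c in conditions}
    for a, b, verdict in answers:
        if verdict == "A>B":
            scale[a] += 0.5
            scale[b] -= 0.5
        elif verdict == "B>A":
            scale[a] -= 0.5
            scale[b] += 0.5
        elif verdict != "A=B":
            raise ValueError(f"unknown verdict: {verdict}")
    return scale


if __name__ == "__main__":
    # Made-up answers of one listener: every pair was presented twice.
    answers = [
        ("d1", "d2", "B>A"), ("d2", "d1", "A>B"),
        ("d1", "d3", "B>A"), ("d3", "d1", "A>B"),
        ("d2", "d3", "B>A"), ("d3", "d2", "A=B"),
    ]
    print(listener_scale(answers, ["d1", "d2", "d3"]))
```

With three conditions and every pair answered twice, each condition takes part in four comparisons, so under this reading the individual scale values lie between −2 and +2.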

Discussion
Directionality: The results in Figure 3a show that the magnitude in directionality significantly increases from d_1 to d_3. The directionality of d_2 is more pronounced than that of d_1, presumably because its sound FM(5) already yields auditory objects projected further away from the IKO's surface than FG(3). Moreover, the rotating beam of d_2 is assumed to yield stronger lateralization cues than the fading from beam 1 (below) to beam 2 (above) in d_1. Condition d_3, which was rated as most directional, uses PN(6) for one of the rotating beams and FG(3) for the other, which we assume avoids auditory grouping of the two time-variant beams. Also, the static IB(1) might fall into the background in relation to FG(3). PN(6), with its strong projection scope, appears to guide attention more than the spatially less-defined rotating FM(5) of d_2.
Contour: The experiment revealed that the magnitude of contour significantly increases in the conditions c_1 to c_3, cf. Figure 3b. In condition c_1, it appears that the low-frequency DR(4) cannot be localized very well other than at the IKO. Low frequencies are typically only weakly localized in rooms, which might cause a more uniform contour of the auditory object. Even though in c_2, FG(3) is directed upwards and PN(6) downwards, which can only produce weak lateralization cues, the contouring is assumed to be stronger than in c_1, as PN(6) still produces reflections at the closest walls behind the IKO. Condition c_3 presents the most pronounced contour. During one rotation of the slower beam, the two beams meet three times (every 120°), resulting in three dominant directions. In combination with the temporal development of the directions from the two individual beams, they appear to yield a strong contour.
Plasticity: While the results indicate a significant increase in the magnitude of plasticity from p_1 to p_2, the further increase to p_3 is only weak, cf. Figure 3c. We assume that the movement in p_1 draws attention gradually to the depth, but with no simultaneous depth grading experienced. Therefore, we assume that almost no plasticity was perceived. Condition p_2 uses layering of static FG(3) and rotating PN(6) with large projection scope, which most likely produces a time-variant but sonically distinguishable concurrent depth grading. Condition p_3 was rated with only slightly higher plasticity, which suggests that the further added layer was experienced as only a weak increase of plasticity. The expected rise in depth grading may have been obstructed by a stronger auditory grouping of sounds. The downwards directed FG(3) beam seems to have been masked strongly by the counter-clockwise rotating IB(1) and PN(6), so that the additional auditory object in the depth might not have added a perceivable layer.

Experiment 2: Refinement
After experiment 1 indicated that a distinguishable increase in the proposed sculptural qualities can be composed such that it is clearly perceived (except for a weaker distinction in plasticity), experiment 2 was to investigate a greater degree of nuance, which required preparing a dedicated condition set. Experiment 2 was conducted on three days (15, 22, and 23 May 2018) in the IEM lecture hall (0.5 s reverberation time) as in [25], see Figure 2b. Although the IEM lecture hall differs from the Hybrid Lab at TU Berlin with respect to size and reverberation time, both rooms are quite similar with respect to their (effective) critical distance, i.e., the distance at which the direct and the diffuse sound are equally loud: for an omnidirectional source this would be 1.0 m (IEM lecture hall) and 0.92 m (Hybrid Lab TU Berlin). The difference in the critical distance results in a difference in the direct-to-reverberant energy ratio of 0.4 dB, which is clearly below the minimum audible difference of 2 dB in the most sensitive case [38]. Thus, both rooms can be assumed to be similar regarding the perceived distances of auditory objects [22].

Conditions
Aiming at a fine-grained increase of each of the quality attributes, a pool of 32 new conditions was composed for this experiment. By listening to these conditions in the given experimental setup (see [25]), the conditions of Table 2 were defined and sorted to establish a ranked set of 6 conditions per attribute: D_1 ... D_6 for a graded increase in directionality, P_1 ... P_6 for a graded increase in plasticity, and C_1 ... C_6 for a graded increase in contour.
Directionality: The beam of D_1 with FM(5) is static and is therefore not assumed to dynamically guide the listener's attention through the room. Although the more transient FG(3) would stick closer to the loudspeaker, D_2 is supposed to produce more shifts of attention due to the concurrent combination of static beam and rotation. Consequently, D_3, which uses FM(5) for both the dynamic and the static spatialization of D_2, should exhibit a more pronounced directionality. Moreover, D_4 alters D_2 by using FM(5) for the dynamic spatialization while keeping FG(3) for the static one, which removes the grouping between static and dynamic sounds. This should enhance the distinctiveness of the dynamic spatialization and thus directionality. The single rotation of PN(6) in D_5 aims at producing shifts of direction to the reflective surfaces, in particular at the front and at the sides of the room. D_6 is a combination of two counter-clockwise rotations with PN(6); with its beams coinciding at {0°, 180°} at {0, 1} s within a period of two seconds, it is supposed to be the strongest condition of directionality in terms of a distinctive oscillation between front and back.
Contour: C_1 presents PN(6) played back omnidirectionally. Therefore, the auditory sculpture is supposed to be of uniform contour, temporally and from all listening positions. C_2 also uses PN(6) statically but with a beam directed towards the blackboard, which aims at introducing a slight silhouette dependency. C_3 uses a rotating beam of FG(3) that is expected to stay close to the IKO and to yield more contour by the position-dependent and time-variant dominance of the wall reflections involved. C_4 uses the same rotation with FM(5), which is assumed to differ only by the larger distance of its auditory objects to the IKO. This is expected to increase contour by producing a greater position dependency and time dependency with more noticeable jumps between the successively enhanced wall reflections. C_5, as a combination of a static and a rotating beam both with PN(6), aims at producing a noticeably extended auditory sculpture. It extends from the static blackboard reflection towards distinct reflections of the rotating beam, yielding a time-dependent and position-dependent outline. C_6 varies C_5 by employing FM(5) for the static beam. This is supposed to release the auditory grouping between rotating and static beam and thereby to increase contour compared to C_5.
Plasticity: P_1 consists of omnidirectionally spatialized pink noise PN(6), which is supposed to yield low depth grading, i.e., low plasticity. P_2, with a beam of FG(3) towards the blackboard (0°), intends to produce depth grading between the blackboard reflection and the remaining low-frequency direct sound (180°) to the listener, which the IKO's beamforming is not able to fully attenuate. By combining a static beam with FM(5) to the blackboard and a rotating beam with FG(3) in P_3, the different sound characteristics are supposed to span a depth range between the blackboard and at least the IKO's visible surface as depth grading. As an enhanced depth-grading version of P_3, P_4 uses a reinforced reflection from the blackboard with PN(6) instead of FM(5), and it contains an added counter-clockwise rotating beam of FM(5) that should yield two rotations differentiable by their sounds, appearing at different distances to the IKO. P_5 uses PN(6) as the most far-projecting sound for the rotating beam, combined with a static beam of FG(3) towards the blackboard, which aims at a clearer scenario of larger depth grading between the blackboard reflection and the distantly rotating auditory object. P_6 uses the same material and spatialization as P_4, but with rotating beams of PN(6) and FG(3), targeting auditory objects separable by their sounds and by the distance with which they rotate around the IKO, while FM(5) should produce a more sinusoidal auditory object of static location defined by a distant blackboard reflection.

Method
Eleven listeners (28-56 years old, all male) took part in the experiment. All of them were staff or students of our institute and were experienced in evaluating spatial audio and listening to the IKO. Following a short introduction to the terminology of sculptural qualities with the help of a written description (cf. supplementary Handout S4) preceding the definition in Section 1, every listener sat alone in the room with the IKO, at the listening position shown in Figure 2b. They were encouraged to sometimes leave the listening position in order to listen to the conditions from multiple perspectives. Listeners had a tablet touch interface to switch between and rate the various conditions with regard to each of the three attributes. The conditions D_1 ... D_6, for example, were presented in three trials using a multi-stimulus interface, each time with a randomly ordered assignment to its six sliders and radio buttons (cf. Handout S4): one trial asked for ratings concerning directionality, another for plasticity, and a third for contour. The same procedure was applied to the condition sets P_1 ... P_6 and C_1 ... C_6, so that every listener comparatively rated all three condition sets with regard to all three attributes, yielding 9 trials, each one a 6-stimulus task. This allows inspecting how the intended increase of one attribute maps to all three attributes. The experiment took on average 45 min including the introduction.
Figure 4 shows the results from experiment 2 on a graded increase of particular attributes. For each of the three attribute ratings, the figure shows the medians and 95% confidence intervals of the multi-stimulus ratings for all of the condition sets. The trend for the directionality rating of D_1 ... D_6 and the contour rating of C_1 ... C_6 is more clearly pronounced than the other ratings of those condition sets, so that their graded increase could be clearly recognized by the listeners.

Discussion
As intended, the conditions D_1 ... D_6 resulted in a graded increase of the sculptural quality directionality, cf. Figure 4. The same applies to conditions C_1 ... C_6 for their contour rating, as well as to P_1 ... P_6 for plasticity. However, there appears to be a strong correlation between the sculptural quality ratings, as most of the graded increases intended for a particular quality attribute are also found in the ratings of the other attributes, cf. Table 3. In particular, the conditions P_1 ... P_6 that were originally intended to yield a graded increase in plasticity also resulted in a monotonic increase in directionality and contour (rank correlation of 1.00 between the median ratings of all attributes). For the conditions C_1 ... C_6, only the correlation between the ratings of directionality and contour is 1.00, whereas the correlation between the other qualities is lower. The lowest correlation occurred for the conditions D_1 ... D_6, i.e., they result in a monotonic increase only for directionality.

Table 3. Kendall rank correlation of median ratings in Directionality/Contour/Plasticity for each condition set.

Condition Set    Directionality/Contour    Directionality/Plasticity    Contour/Plasticity
D_1 ... D_6      0.73                      0.73                         0.73
C_1 ... C_6      1.00                      0.87                         0.87
P_1 ... P_6      1.00                      1.00                         1.00

The strong correlation in the results may be due to the same strategy in composing the grading of sculptural qualities. A weak level of all qualities is achieved by a static, omnidirectional spatialization. Stronger levels are created by three strategies: (a) directional beams, (b) rotating beams, (c) combination of multiple beams using sounds with different projection distances. The lower correlation between the quality ratings of the conditions D_1 ... D_6 indicates that there can be exceptions to the rule. The graded increase in directionality from D_4 to D_5 is assumed to be caused by the increase of the projection distance from two rotations FG(3)+FM(5) in D_4 to one with PN(6) in D_5. A further increase is achieved by the second rotating PN(6) in D_6, as it is assumed to guide the listener's attention more dynamically than a single beam or even one with a less far-projecting sound. For the other attributes, contour and plasticity, D_4 ... D_6 did not cause a monotonic increase. The contour of the single PN(6) beam in D_5 does not increase when adding the same sound with an opposed rotation as in D_6. However, while the second beam increases the plasticity in D_6, D_5 yielded less plasticity than D_4. This might be because the single PN(6) in D_5 provides fewer spatially layered elements than the combination of FG(3) and FM(5) in D_4.
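To give a feeling for the values in Table 3, the following Python sketch computes a Kendall rank correlation between two series of six values. It implements the tie-free variant (tau-a) as an assumption, since the text does not state which variant was used, and the input numbers are made-up ranks, not the measured medians.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall rank correlation (tau-a): (concordant - discordant) / pairs."""
    if len(x) != len(y):
        raise ValueError("both series need the same length")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    intended = [1, 2, 3, 4, 5, 6]        # hypothetical ranks of six conditions
    same_order = [1, 2, 3, 4, 5, 6]      # fully monotonic
    one_swap = [1, 2, 3, 4, 6, 5]        # one discordant pair
    two_swaps = [2, 1, 3, 4, 6, 5]       # two discordant pairs
    for other in (same_order, one_swap, two_swaps):
        print(round(kendall_tau_a(intended, other), 2))
```

Six conditions form 15 pairs, so, assuming no ties, a value of 0.87 would be consistent with a single discordant pair and a value of 0.73 with two discordant pairs.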

Experiment 3: Discrimination
In September 2018, we conducted a third experiment on the sculptural qualities based on conditions further developed from experiment 2. This time, the aim was to identify how discriminable the quality attributes are in a triangular mapping. This was done in particular because the ratings in experiment 2 were largely correlated for the three attributes. The setup of experiment 3 was the same as in experiment 2, see Figure 2b. Whereas the preceding experiments asked for comparative one-dimensional ratings subsequently and independently for directionality, contour, plasticity, this experiment asks for an inter-dependent and comparative rating of the three attributes at the same time. In this way, the experiment supports exclusive ratings: if one sculptural quality is rated as remarkably pronounced, both other qualities can only be rated as weak. As before, the selected set of conditions determines the span and structure of the comparative and relative results.

Conditions
The composition elements of the selected conditions are described in Table 4. The last column in the table shows the expected prominence of the sculptural qualities. The conditions were based on the conditions of experiment 2 with added components that enforce certain sculptural qualities. S_1 is based on C_4, which resulted in more directionality than plasticity or contour in experiment 2. In S_1, the additional rotation with the same sound material is expected to increase the prominence of directionality. S_3 is based on P_2, which was rated low in directionality and somewhat higher in plasticity and contour. By adding the sound FM(5) of larger projection distance to the sound FG(3) and by having it directed away from the listener, opposing the direction of FG(3), the intention was to increase plasticity. At the same time, contour is caused, as the opposing beam directions make the auditory sculpture's outline depend on the listening position. S_5 is similar to D_4 and P_5, which consisted of a far-projecting rotating beam of FM(5) or PN(6) and a static FG(3), and both resulted in a similar prominence of all three sculptural attributes. By including both rotating components PN(6) and FM(5), the intention is to increase both plasticity and directionality. S_2 and S_4 are identical to C_2 and C_3, respectively. While C_2 was rated low in plasticity, C_3 was rated higher in all three qualities without any clear dominance. These conditions were kept for experiment 3 to investigate the influence of the task. In contrast to the previous experiment, the discrimination task is expected to sharpen the responses on dominance of the different qualities: S_2 with a single beam of PN(6) is expected to be a clear example for contour, whereas S_4 with its rotating FG(3) is expected to dominate in directionality.

Table 4. Description of the conditions of experiment 3, their sound material (as described in Section 2.1), beam direction/rotation parameters, and their expected ranking of the three qualities D (directionality), C (contour), P (plasticity). Beams use 3rd order unless otherwise indicated (Az.: azimuth, El.: elevation, cw.: clockwise, ccw.: counter-clockwise, snd.: sound).

Method
Listeners were handed a written description of the sculptural quality attributes. For this last experiment, a clarified German definition of the attributes was derived from the original version (as used in experiments 1 and 2); it fully corresponds to the version presented in Section 1 (cf. supplementary Document S5). They were also informed that the subject of the investigation was whether and how different auditory sculptures can be discerned by these attributes, and they received instructions on how to use the multi-stimulus triangular graphical user interface to enter their responses (cf. supplementary Handout S6). During the experiment, every listener sat alone in the room with the IKO and was encouraged to occasionally leave the listening position to listen from multiple perspectives. The fifteen listeners (… years old, male) were staff or students of our institute and were experienced in the evaluation of spatial audio and in listening to the IKO. Eleven of them had already participated in the second experiment. Listeners used a laptop with a graphical user interface to switch between and comparatively rate the conditions with regard to the relative perceived share of each of the three attributes. This time, the conditions S1 to S5 were presented in two trials on a five-stimulus interface, each time randomly assigned to the five radio buttons that switch between the conditions. For each condition, listeners had to position one of five movable markers within a triangular user interface to enter their comparative rating. The corners of the equilateral triangle, D: great directionality (left), C: great contour (right), and P: great plasticity (top), mark exclusive ratings, and any point within the triangle could be used to rate graded mixtures of the three sculptural attributes. As one participant did not finish the repeated trial, 29 responses are evaluated below. The average duration of the experiment was 10 min per listener (i.e., 5 min per trial).
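The triangular rating can be illustrated by a barycentric mapping between a marker position and the three relative shares. The following is a minimal sketch under stated assumptions, not the interface code used in the experiment: the unit side length, the corner coordinates, and the function names `shares_from_marker`, `marker_from_shares`, and `is_inside` are hypothetical, and the article does not state which mapping was applied. The assumption of barycentric shares is only motivated by the fact that the reported ratings (e.g., D7%, C81%, P12%) are relative shares that add up to about 100%.

```python
import math

# Assumed layout (not specified in the article): equilateral triangle with unit
# side length, D at the lower left, C at the lower right, and P at the top.
H = math.sqrt(3.0) / 2.0  # height of the unit-side equilateral triangle
CORNERS = {"D": (0.0, 0.0), "C": (1.0, 0.0), "P": (0.5, H)}


def shares_from_marker(x, y):
    """Return the barycentric shares (d, c, p) of a marker at (x, y).

    The three shares always sum to 1; a corner yields an exclusive rating.
    """
    p = y / H          # share of plasticity grows with height
    c = x - 0.5 * p    # share of contour grows toward the right corner
    d = 1.0 - c - p    # remaining share is directionality
    return d, c, p


def marker_from_shares(d, c, p):
    """Inverse mapping: marker position for shares that sum to 1."""
    x = d * CORNERS["D"][0] + c * CORNERS["C"][0] + p * CORNERS["P"][0]
    y = d * CORNERS["D"][1] + c * CORNERS["C"][1] + p * CORNERS["P"][1]
    return x, y


def is_inside(x, y, tol=1e-9):
    """A marker lies inside the triangle if no share is negative."""
    return all(share >= -tol for share in shares_from_marker(x, y))


if __name__ == "__main__":
    # Round trip for a rating of 7% directionality, 81% contour, 12% plasticity.
    x, y = marker_from_shares(0.07, 0.81, 0.12)
    print(shares_from_marker(x, y))
```

Under this mapping, the centroid of the triangle corresponds to equal shares of one third each, and a marker on an edge expresses a mixture of only two attributes.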

Results
The results shown in Figure 5 use bivariate statistical analysis to estimate the means and their 95% and 99.9% confidence regions (ellipses) according to Hotelling's T² distribution, see [39] (Ch. 3). For a robust analysis, outliers were defined as responses lying beyond a Mahalanobis distance of three estimated standard deviations in a preliminary, non-robust analysis. These outliers were removed before the final analysis (there were two, one in S2 and one in S5), see Figure 6. Because the statistical spreads are of similar size, we may test for statistically significant differences by observing whether the mean value of a condition lies outside the 95% (p < 0.05) or the 99.9% (p < 0.001) confidence ellipses of the other conditions. All five conditions differ significantly at the level p < 0.001, because all means lie outside the 99.9% confidence ellipses of the other conditions. While the conditions S4 (Directionality) and S2 (Contour) are clear extremes, no condition yields exclusive plasticity. Still, condition S5 is a nearly exclusive mixture of directionality and plasticity, and condition S3 maps exclusively to plasticity and contour.
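The two analysis steps described above can be sketched as follows for two-dimensional responses. This is an illustrative sketch under assumptions, not the analysis script behind Figures 5 and 6: it assumes the textbook form of the confidence region for a bivariate mean, n (m - x̄)ᵀ S⁻¹ (m - x̄) ≤ c, with c = 2(n - 1)/(n - 2) · F(2, n - 2; 1 - α), and it uses the closed-form quantile of the F distribution that exists for two numerator degrees of freedom. All function names are hypothetical.

```python
def mean_and_cov(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def remove_outliers(points, max_dist=3.0):
    """Drop responses beyond max_dist in a preliminary, non-robust analysis."""
    mean, cov = mean_and_cov(points)
    return [p for p in points if mahalanobis_sq(p, mean, cov) <= max_dist ** 2]


def t2_critical(n, alpha):
    """Critical value c of Hotelling's T^2 for 2 dimensions and n samples.

    For 2 numerator degrees of freedom, the F quantile has the closed form
    F(2, m; 1 - alpha) = (m / 2) * (alpha ** (-2 / m) - 1) with m = n - 2,
    so that c = 2 * (n - 1) / (n - 2) * F simplifies to the line below.
    """
    return (n - 1) * (alpha ** (-2.0 / (n - 2)) - 1.0)


def outside_confidence_ellipse(other_mean, points, alpha):
    """True if other_mean lies outside the (1 - alpha) ellipse of the mean."""
    n = len(points)
    mean, cov = mean_and_cov(points)
    return n * mahalanobis_sq(other_mean, mean, cov) > t2_critical(n, alpha)
```

In such a sketch, `remove_outliers` would be applied once per condition before the means are estimated, and `outside_confidence_ellipse` would then be evaluated for every pair of conditions with `alpha=0.05` and `alpha=0.001`.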

Discussion
According to Figure 5, S2 is perceived as almost purely contoured and, therefore, lies closest to Contour (D7%, C81%, P12%). We attribute this to its far-projecting, static 0° sound beam with PN(6) (see Table 4), directed away from the listener. In this way, a distinctive listening-position-dependent contour would be observable when the listener walks around in the room, while the sculpture otherwise triggers little or no pronounced depth grading or guidance of attention. Listeners rated S4 closest to Directionality (D80%, C11%, P9%), see Figure 5. It is almost purely directed and consists of a single rotating beam of the sound FG(3) (see Table 4). Its movement guides attention around the surrounding horizontal directions, and evidently it is both clear and smooth enough to avoid depth grading and contour.
S1 is still rated as directed at D59% in Figure 5, and yet it already contains impressions of depth grading and contour (D59%, C20%, P22%). It consists of a pair of counter-rotating beams that periodically meet at 0° and 180°. Both beams use the far-projecting sound FM(5). Accordingly, the position of S1 should oscillate distinctly between front and back, with weakly pronounced transitions over the sides, which evidently yields slight plasticity. We assume that S1 is also rated as slightly contoured because of the pronounced front-back-oriented extent of its development. S3 was rated as plastic (P56%) and contoured (C42%), as seen in Figure 5, with hardly any attention-guiding share (D3%). Its static front-back beam pair (Table 4) might be the reason why it was perceived as yielding contour. Moreover, the static beams and the stationary sounds should avoid any temporal attention guidance. The use of different, clearly discernible sounds, FG(3) toward 180° and FM(5) toward 0°, could have caused a pronounced plasticity that slightly outweighs contour.
With some similarity to S1 but clearly separated from it, the listeners rated S5 as more plastic (D40%, C17%, P43%). According to Table 4, S5 also contains two beams rotating in opposite directions, but with PN(6) and FM(5) as clearly distinguishable sound material, together with the additional static component FG(3). The two rotating sounds differ in projection distance and character, which is assumed to be the cause of the strong prominence of depth grading (plasticity) and attention guidance (directionality) in the results. The contour of S5 is comparable to that of S1.

Conclusions
This article presented a way to establish common perceptual qualities based on miniature auditory sculptures created by the icosahedral loudspeaker. The proposed qualities directionality, contour, and plasticity are derived from artistic practice with sculpturality regarding spatial auditory perception and are based on scholarly writing about sculpting. A sequence of three listening experiments using the IKO evaluated the recognizability, perceivable grading, and discernibility of the sculptural qualities. The results could be linked to the sounds and spatialization used to create the auditory sculptures. In this way, we showed how to systematically compose and shape auditory sculptures from sound and beam components in nuances and grading.
In the first experiment, we demonstrated that listeners were able to separably comprehend the increase in sculptural qualities in the respective material. In addition, we showed that listeners could also perceive a finer grading in the second experiment. Moreover, the results revealed that the intentionally composed increase in one quality also mapped to an increasing tendency in the others. By having listeners rate the proposed qualities simultaneously on a triangular map, our third experiment showed that listeners could clearly discriminate relative shares of directionality, contour, and plasticity.
While we managed to compose auditory sculptures of exclusive directionality or contour, exclusive plasticity was not achieved. This might not be easily achievable with the spatialization used in our compositions, i.e., static and rotating beams of different directivity. It can be assumed that exclusive plasticity requires a more diffuse spatialization, such as from a feedback-delay network [40,41], that is too diffuse to create directionality and too uniform in its directional mapping to produce a perceivable contour.
Together with the sculptural categories from our previous article [25], the sculptural qualities established are meant to be helpful as a comprehensible, problem-specific terminology.