1. Introduction
The selection of construction materials traditionally prioritizes structural performance, with functional requirements and user experience considered secondary. With increasing urbanization, humans now spend nearly 80% of their time indoors, shifting attention toward health-oriented and human-centered design [
1,
2]. This shift has heightened interest in the psychological and physiological effects of materials and in how they shape lived experience within built spaces.
Human experience of built environments arises from multisensory interactions with materials through vision, touch, audition, and olfaction [
3]. Understanding these cross-modal processes is essential for explaining how spaces are perceived and evaluated [
4]. Within this context, wood is distinguished from inorganic materials by its strong multisensory appeal. Its warm color tones and natural grain patterns convey organic beauty [
5], whereas low thermal conductivity delivers a softer, warmer tactile sensation [
6]. Exposure to wooden interiors has been consistently linked to reduced blood pressure, lower stress levels, and improved emotional well-being. According to the biophilia hypothesis, these effects are attributed to naturalness, comfort, and aesthetic pleasure evoked by wood through its visual, tactile, and thermal properties [
7,
8,
9]. Consequently, the strategic selection and multisensory optimization of wood in architecture and interior design have attracted growing research and practical interest.
Aesthetic experience and comfort impressions depend not only on complex multimodal interactions between physiological sensors and material properties, but also on perceptual and cognitive processes that integrate subjective impressions derived from objective physical attributes—such as brightness, softness, and warmth—with emotional and semantic evaluations [
10]. These impressions arise as physical stimulus properties are first transduced by sensory receptors and encoded along modality-specific neural pathways, then integrated within cortical perceptual networks responsible for constructing coherent material impressions, and finally linked to affective and evaluative systems associated with emotional and semantic appraisal [
11,
12].
Previous studies have demonstrated that, under visual conditions, perceived warmth, harmony, and lightness account for approximately 60% of the variance in aesthetic evaluations [
13], whereas smoothness and naturalness reliably elicit positive emotional responses under tactile conditions [
14]. However, the mechanisms underlying how low-level sensory features that are closely tied to physical stimulus properties, such as roughness and hardness, are transformed into intermediate perceptual impressions, which in turn mediate the formation of high-level aesthetic and comfort-related evaluations, remain poorly understood. These processing pathways and their key mediators therefore warrant further exploration.
Many existing studies have examined material perception within a single sensory modality, even though everyday interactions involve simultaneous inputs from multiple senses [
5,
15]. Rather than a mere sum of inputs, multisensory perception involves dynamic cross-modal integration and cognitive reinterpretation [
16]. For instance, matching a visual appearance of glass with the sound of a struck pepper causes participants to perceive it as transparent plastic [
3], and high-frequency auditory feedback during touch can induce the “parchment-skin illusion,” making the skin feel dry and paper-like [
17]. These examples illustrate how multisensory cues can reshape the perception of physical attributes and, consequently, aesthetic and emotional responses. Advancing architectural and interior materials research therefore requires a deeper understanding of cross-modal interactions, their relative weighting across senses, and the mechanisms that govern multisensory integration.
Cultural and experiential factors also strongly shape material perception and aesthetic judgment. Cross-cultural studies reveal divergent attitudes toward natural elements: participants from Germany and Sweden tend to perceive riverscapes with woody debris as more natural and aesthetically positive, whereas those from China and India often view them as disordered or unsafe [
18]. These differences underscore how cultural background can reshape preferences for “naturalness” versus “order and cleanliness.” Yet much of the existing work has focused on populations from Japan, Northern Europe, and Canada, which may limit its applicability to Chinese contexts, where perceptual norms and emotional priorities may differ [
19].
For young Chinese adults, typically in their early to mid-20s, who commonly experience substantial academic and career pressure [
20], everyday settings such as offices, dormitories, and public spaces become important sites for psychological restoration. Evidence-based material selection and multisensory optimization of wood in these environments therefore offer practical value as a cost-effective strategy to boost comfort, alleviate stress, and foster emotional well-being through targeted architectural and interior design choices.
The present study addressed this gap by examining how young Chinese adults perceive and evaluate wood under controlled visual, tactile, auditory, and multisensory conditions. Sensory evaluations, exploratory structural equation modeling, Bayesian cue-combination analyses, and linear mixed-effects models were used to identify the latent perceptual structure of wood, trace pathways to aesthetic and emotional judgments, characterize cross-modal cue integration, and quantify how subjective evaluations map onto physical properties.
Based on this framework, three research hypotheses were proposed.
H1. Wood perception under multisensory conditions was expected to be structured into a limited set of latent perceptual dimensions linking sensory inputs to emotional judgments.
H2. Emotional evaluations were expected to depend primarily on these perceptual mediators rather than being directly predicted by physical properties alone.
H3. Sensory modalities were expected to contribute unequally to multisensory evaluations, consistent with reliability-weighted integration patterns.
The results provide the first culturally specific evidence on multisensory wood perception in Chinese young adults and establish a mechanistic framework linking physical parameters to perceptual and emotional outcomes. The findings are expected to provide practical design guidance for architects and material developers by clarifying how visual, tactile, and auditory cues may be strategically optimized to enhance comfort, psychological restoration, and emotional well-being in health-oriented built environments.
3. Results
3.1. Results of Inter-Participant Correlations Analysis
IPC analysis was conducted by computing Pearson correlation coefficients for ratings of all wood samples between every pair of participants, followed by calculating the mean and standard error. This procedure quantifies consensus for each evaluation item, where higher IPC values indicate greater consistency in the perceived relative ordering of specimens [
3].
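The IPC procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and the ratings are hypothetical, and the standard error is taken over participant pairs as if they were independent, which is a simplifying assumption.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if a participant gives constant ratings


def inter_participant_correlation(ratings):
    """ratings: one list per participant, one rating per wood sample.

    Returns the mean pairwise correlation and its standard error
    computed over all participant pairs.
    """
    rs = [pearson(a, b) for a, b in combinations(ratings, 2)]
    return mean(rs), stdev(rs) / sqrt(len(rs))


# Hypothetical brightness ratings of five samples by three participants
# (illustrative values only, not data from the study).
brightness = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 2, 5, 4],
]
ipc_mean, ipc_se = inter_participant_correlation(brightness)
print(round(ipc_mean, 3), round(ipc_se, 3))
```

Because Pearson correlation ignores each participant's scale offset, two participants who rank the samples identically reach r = 1 even if one consistently rates higher, which is why IPC reflects agreement on relative ordering.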
As shown in
Figure 4a, IPC results under the visual condition revealed high consistency for surface brightness and moderate consistency for glossiness. The strong agreement on brightness reflects its status as a low-level visual attribute with high cross-participant stability [
31]. In contrast, glossiness showed lower agreement, likely because it is influenced by uneven reflections produced by hardwood features such as vessels and ray patterns. These structural elements can create directional or patchy highlights, introducing greater variance in perceived glossiness compared with brightness.
Nevertheless, similarly high IPC values for glossiness and brightness were also observed under auditory and tactile conditions, indicating that these perceptions are not purely visual but supported by multisensory cues. Previous research has shown that surface gloss can be inferred from non-visual properties such as smoothness or sound characteristics [
32], underscoring its inherently cross-modal nature.
Under the auditory condition, IPC values were generally lower than in the visual condition, indicating weaker cross-participant agreement when judgments relied on sound alone. Loudness and timbre showed relatively strong consistency, reflecting shared recognition of fundamental acoustic features. Roughness also produced a high IPC value, likely because listeners map frequency and vibration characteristics onto tactile roughness, revealing a stable audio–tactile correspondence.
In the tactile condition, attributes tied directly to surface contact, such as roughness and warmth, exhibited the highest consistency. Naturalness also reached its highest IPC value under touch, suggesting that haptic information becomes the primary basis for evaluating naturalness when vision is unavailable. This pattern aligns with evidence that tactile cues, being more uniform and less semantically driven than visual cues, promote stronger agreement across individuals [
21].
Affective evaluation items showed low IPC values across all unimodal conditions but increased markedly under the multisensory condition. This indicates that emotional and aesthetic judgments depend on the integration of multiple sensory inputs; unimodal cues provide insufficient information for reliable higher-order evaluation. These findings support the view that emotional responses to materials rely on cross-modal processing and highlight the importance of multisensory experience in environmental appraisal.
3.2. Results of Exploratory Factor Analysis
Prior to exploratory factor analysis (EFA), Kaiser–Meyer–Olkin (KMO) statistics and Bartlett’s tests of sphericity were computed. Across all sensory conditions, KMO values exceeded 0.80 and Bartlett’s tests were significant (
p < 0.001), confirming the adequacy of the data for factor extraction. EFA was then conducted using maximum-likelihood estimation with Geomin rotation. A three-factor solution was specified based on eigenvalue inspection and theoretical interpretability, and this model explained approximately 50% of the total variance across conditions, as shown in
Table 4.
In the visual condition, Factor 1 comprised affective appraisal items such as pleasantness and liking, and was labeled Emotional Judgment. Visually inferred roughness also loaded on this factor, indicating that surface roughness strongly shapes emotional evaluation when assessed through vision alone. Factor 2 grouped surface-related attributes including brightness and glossiness together with timbre and loudness, suggesting cross-modal visual–auditory correspondences. Factor 3 captured Internal Properties, including hardness, density, weight and warmth, indicating that participants could infer internal material characteristics independently of surface impressions even without physical contact.
The tactile condition yielded a similar three-factor configuration. However, roughness shifted from Emotional Judgment to Surface Properties, where it aligned with naturalness and newness, demonstrating its role as a direct psychophysical cue during haptic exploration. Naturalness showed the highest loading in this modality, reinforcing touch as the primary channel for assessing this attribute. Internal properties separated from surface texture under the tactile condition, which is consistent with evidence that roughness perception relies predominantly on somatosensory processing (S1), whereas hardness involves more cognitive inference associated with medial prefrontal engagement [
33,
34].
Under the auditory condition, attributes including brightness, glossiness, roughness, and warmth were integrated into Factor 2 together with timbre, suggesting that impact sounds evoke vivid visual and tactile imagery through cross-modal mapping. High-frequency components were associated with smoother and colder impressions, in agreement with established audio–visual and audio–tactile correspondences [
35,
36]. Internal properties again remained independent.
In the multisensory condition, the three-factor configuration was largely preserved. Roughness exhibited cross-loadings on both Emotional Judgment and Surface Properties, indicating that when multiple senses are available, roughness extends beyond sensory registration and acquires affective value.
Overall, three perceptual dimensions emerged across all sensory conditions: Emotional Judgment, Surface Properties, and Internal Properties. Emotional Judgment consistently constituted the dominant dimension; Surface Properties were modality-dependent and incorporated cross-modal cues such as brightness, glossiness, and loudness; Internal Properties remained structurally stable and independent. Roughness acted as a pivotal attribute, shifting between affective and sensory roles depending on modality. Naturalness consistently aligned with Surface Properties under touch and audition, suggesting grounding in direct sensory features rather than higher-order aesthetic appraisal. Finally, texture contrast and moistness showed weak loadings (<0.20), indicating limited contribution to the core perceptual structure of wood.
Figure 5 presents the distribution of wood-sample factor scores with 95% confidence ellipses for each sensory condition, accompanied by variable loadings. These biplots visualize how participants differ in their evaluations along two dominant perceptual dimensions.
Across the visual, tactile, and multisensory conditions, sample dispersion was markedly greater along Emotional Judgment (Factor 1) than along Surface Properties (Factor 2), indicating wide variation in emotional judgments but relatively consistent evaluations of surface properties. By contrast, the auditory condition showed the reverse trend, with a greater spread on Factor 2 and tighter clustering on Factor 1, meaning that sound elicited larger individual differences in inferred material properties while emotional responses were relatively similar. Touch produced the most compact distribution across all modalities, indicating the strongest agreement among participants. This pattern aligns with the direct and verifiable nature of haptic cues, which depend less on semantic inference than visually or aurally inferred information.
Variable loadings indicate how perceptual attributes align with the two latent dimensions. Brightness, glossiness, and loudness loaded strongly on Factor 2 in the visual, auditory, and multisensory conditions, whereas tactile loadings were more evenly distributed, suggesting that haptic evaluations emerge from the integrated contribution of multiple cues rather than being driven by a small set of dominant attributes. Notably, naturalness showed a strong loading in the tactile condition, highlighting the particular relevance of touch in how naturalness is perceived.
3.3. Results of the Exploratory Structural Equation Model
EFA revealed that roughness, warmth, newness, and naturalness did not remain fixed in any single factor across sensory conditions. Their shifting affiliations indicate that they are neither purely evaluative nor solely tied to direct perception of material properties, but instead function as intermediate attributes that mediate between low-level sensory cues and higher-level aesthetic judgment. This structure aligns with the three-tier semantic framework proposed by Katahira et al., comprising physical perception, intermediate impressions, and value-based evaluation [
37]. In line with this framework, the Factor 1 produced by EFA could be divided into value-based evaluation and intermediate impression items such as naturalness, perceived value, and newness, which shifted depending on the sensory modality.
Factors 2 and 3, describing surface and internal properties, corresponded to material-level perception; however, the affiliation shifts in roughness and warmth across conditions indicated they did not consistently remain in this physical domain. Their meanings were strongly modulated by visual and multisensory cues such as brightness, color, and hardness [
38], indicating that they operate as mid-level perceptual attributes rather than strictly physical ones.
Finally, although most low-level sensory items such as glossiness and brightness displayed stable factor memberships, several informative cross-loadings persisted. Constraining such cross-loadings to zero in confirmatory models can inflate factor loadings and latent correlations, resulting in biased structures. To preserve these meaningful secondary loadings while maintaining model interpretability, exploratory structural equation modeling (ESEM) was adopted. ESEM allows cross-loadings to be estimated rather than forced to zero, thereby yielding a more accurate and theoretically coherent representation of the latent perceptual structure [
39]. The resulting model is shown in
Figure 6.
Multi-group exploratory structural equation modelling (MG-ESEM) was conducted using the robust maximum likelihood estimator (MLR). The low-level physical layer was estimated exploratorily with Geomin oblique rotation to permit correlated latent factors, whereas the impression and emotional judgment layers were specified using a confirmatory measurement model. Items with standardized loadings below 0.40 or with salient cross-loadings were removed to ensure a clean factor structure and discriminant validity.
Measurement invariance across the four sensory conditions was assessed in a sequential framework, testing configural, weak, and strong invariance. Fit index changes are reported in
Table 5. Although the ΔCFI of −0.014 at the weak-invariance step slightly exceeded the conventional −0.01 criterion [
40], it remained within the −0.02 tolerance recommended when meaningful group differences are expected [
41]. Measurement invariance was thus considered acceptable, allowing valid comparison of structural paths and mediated effects across sensory conditions.
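The ΔCFI decision rule applied at each invariance step can be written as a small check. The sketch below is illustrative only: the function name and the two CFI values are hypothetical (chosen so that their difference equals the −0.014 reported above), while the 0.01 and 0.02 tolerances are the two criteria mentioned in the text.

```python
def invariance_step_ok(cfi_less_constrained, cfi_more_constrained, tolerance=0.01):
    """Compare two nested invariance models by their change in CFI.

    Returns (delta_cfi, ok), where ok is True when the drop in CFI
    caused by the added constraints does not exceed the tolerance.
    """
    delta = cfi_more_constrained - cfi_less_constrained
    return delta, delta >= -tolerance


# Hypothetical CFI values for the configural and weak-invariance models.
cfi_configural, cfi_weak = 0.950, 0.936

delta, ok_strict = invariance_step_ok(cfi_configural, cfi_weak, tolerance=0.01)
_, ok_lenient = invariance_step_ok(cfi_configural, cfi_weak, tolerance=0.02)
print(round(delta, 3), ok_strict, ok_lenient)
```

With these values the step fails the stricter criterion but passes the more lenient one, which mirrors the situation described for the weak-invariance step.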
Figure 7,
Figure 8,
Figure 9 and
Figure 10 illustrate the ESEM results for each sensory condition, elucidating the multi-level path structures underlying participants’ perceptions and judgments of wood materials, as well as the variations across sensory conditions. The measurement models demonstrated reliability across conditions, with all standardized factor loadings exceeding 0.50. The average variance extracted (AVE) for the two confirmatory factors, cleanliness and emotional judgment, reached or approached 0.50 in most conditions, reflecting adequate convergent validity and explanatory power. Although cross-loadings existed between the two exploratory latent variables, all were below 0.30, indicating acceptable discriminant validity.
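For readers unfamiliar with AVE, it is commonly computed as the mean of the squared standardized loadings of a factor's indicators. The sketch below assumes that definition; the loadings are hypothetical and are not the values estimated in this study.

```python
def average_variance_extracted(loadings):
    """AVE as the mean squared standardized loading of a factor's indicators."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized loadings for a three-item confirmatory factor.
example_loadings = [0.60, 0.70, 0.80]
ave = average_variance_extracted(example_loadings)
print(round(ave, 3))
```

Under this definition, a factor whose loadings all sit just above 0.50 can still fall well short of an AVE of 0.50, so the loading and AVE criteria are complementary rather than redundant.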
Figure 7 shows that in the visual condition both surface and internal qualities influenced emotional judgment mainly through perceived value and cleanliness. Surface qualities also exerted a direct negative effect, indicating that visually darker, harder, and thicker wood was evaluated as more valuable and more favorably. Roughness had only a weak direct effect on emotional judgment but contributed indirectly through cleanliness, as smoother visual impressions elicited tidier associations that improved aesthetic evaluation. Warmth increased perceived naturalness and cleanliness but had no direct effect. Cleanliness was the dominant mediator with the highest path coefficient, whereas naturalness did not significantly affect emotional judgment.
Figure 8 indicates that in the tactile condition physical attributes affected emotional judgment almost entirely through indirect pathways. Cleanliness showed a strong positive effect and served as the primary mediator. Warm and rough tactile sensations increased perceived naturalness, yet naturalness itself did not influence emotional judgment directly. These patterns reflect the strongly verifiable nature of haptic cues, with value judgments arising mainly through intermediary impressions rather than physical features alone.
Figure 9 shows that the AVE for cleanliness was 0.429, which is below the 0.50 benchmark and lower than in the other conditions. This finding indicates that auditory cues provide a weaker basis for cleanliness impressions. Despite this limitation, cleanliness remained the strongest mediator of emotional judgment. Naturalness exerted a positive effect on emotional judgment, whereas surface qualities produced a negative effect. Together, these results suggest that sounds shape evaluation mainly through associative, semantic inferences about cleanliness and naturalness rather than through direct perceptual evidence.
Figure 10 demonstrates that multisensory information improved convergent validity and explained variance: all item loadings exceeded 0.50, and the confirmatory factors achieved AVE values above 0.50. The structural model combined features from the unimodal analyses: internal qualities, roughness, cleanliness, and perceived value each contributed significantly to emotional judgment, while the direct effect of surface qualities lost significance. This pattern indicates that when information is supplied through multiple senses, emotional judgments depend more on mediated impressions and inferred value than on raw physical attributes.
Indirect effects were estimated by Monte Carlo simulation with twenty thousand resamples. Paths with absolute standardized indirect effects greater than 0.05 and their 95 percent bias-corrected confidence intervals are reported in
Figure 11.
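The general idea of a Monte Carlo interval for an indirect effect can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the two path coefficients are drawn independently from normal distributions, the interval is a plain percentile interval rather than a bias-corrected one, and all numeric inputs are hypothetical.

```python
import random


def monte_carlo_indirect(a, se_a, b, se_b, draws=20000, seed=0):
    """Percentile interval for the indirect effect a*b.

    a, b       : path estimates (predictor -> mediator, mediator -> outcome)
    se_a, se_b : their standard errors
    Draws a and b independently from normal distributions (an assumption;
    any sampling covariance between the two paths is ignored).
    """
    rng = random.Random(seed)
    products = sorted(
        rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws)
    )
    lower = products[int(0.025 * draws)]
    upper = products[int(0.975 * draws) - 1]
    return a * b, lower, upper


# Hypothetical path estimates and standard errors (not values from the study).
estimate, ci_low, ci_high = monte_carlo_indirect(0.50, 0.05, 0.60, 0.05)
print(round(estimate, 3), round(ci_low, 3), round(ci_high, 3))
```

An indirect effect is usually treated as significant when the interval excludes zero, which is the criterion implied by the intervals reported in this section.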
In the visual condition, both perceived surface and internal qualities influenced emotional judgment indirectly via perceived cleanliness and value. The cleanliness-mediated effects were 0.304 [0.216, 0.402] for surface qualities and −0.171 [−0.260, −0.088] for internal qualities. Mediation via perceived value produced effects of −0.118 [−0.180, −0.065] for surface qualities and −0.163 [−0.228, −0.107] for internal qualities. A serial pathway running from low-level attributes through roughness to cleanliness was also significant but smaller in magnitude.
Under tactile testing, mediation through cleanliness and value remained strong. The serial path from perceived surface qualities via roughness to cleanliness increased from 0.084 [0.050, 0.125] in the visual model to 0.123 [0.022, 0.262] in the tactile model, indicating that haptically smoother, harder, and heavier specimens were more likely to be judged as cleaner and more valuable and were thus evaluated more favorably.
Across auditory and multisensory conditions, cleanliness and perceived value consistently served as the strongest mediators of emotional judgment. In the auditory condition, these two impression-layer variables remained the dominant routes, supplemented by a weaker but significant indirect effect via sound-evoked warmth.
In the multisensory condition, the strongest mediated paths precisely mirrored those observed under touch and were fully subsumed within the broader visual path structure, revealing robust visuo-haptic integration. This pattern further indicates that, when multiple senses are engaged simultaneously, emotional judgment relies primarily on higher-order impressions rather than on direct physical sensory cues.
3.4. Results of the Analysis of Sensory Weight
Figure 12 presents the relative sensory weights in multisensory integration. All parameters had the maximum potential scale reduction factor (R-hat) < 1.01 and minimum effective sample size (n_eff) > 400, indicating successful convergence of the Markov chains and reliable posterior inference.
For surface qualities, vision was the dominant cue, with a weight of 0.679 [0.631, 0.725]. Audition provided moderate modulation (0.290 [0.247, 0.334]), whereas touch contributed minimally. This indicates that judgments of gloss, brightness, and other surface attributes are primarily visually driven, with auditory cues offering supportive refinement—a pattern consistent with cross-modal enhancement mechanisms [
42,
43]. For internal qualities, both vision and touch served as key channels, with comparable weights of 0.461 [0.405, 0.519] and 0.404 [0.345, 0.461], respectively, both clearly exceeding audition. These results suggest that internal qualities depend on the integration of visual and tactile cues, while auditory information plays only a minor role. For roughness and warmth, touch showed the highest weights of 0.555 [0.504, 0.607] and 0.544 [0.482, 0.607], respectively. Vision also made non-negligible contributions, indicating that visual cues such as color and gloss interact with tactile sensations to shape multisensory judgments.
For value, cleanliness, and naturalness, the weighting patterns were consistent across dimensions: vision was the dominant channel, with weights above 0.5; touch played a secondary yet notable role with weights around 0.3; and audition contributed only weak modulation. These findings indicate that higher-level affective evaluations rely primarily on visual information, supported by tactile input, while auditory cues exert comparatively minor influence.
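The logic behind such sensory weights can be illustrated with the standard reliability-weighted (inverse-variance) combination rule, in which each modality's weight is proportional to the reciprocal of its noise variance. The Bayesian model used in this study is not specified in this section, so the sketch below is only the textbook rule; the noise levels and single-cue estimates are hypothetical.

```python
def reliability_weights(sigmas):
    """Normalized inverse-variance weights.

    sigmas: dict mapping modality name -> standard deviation of that
            modality's estimate (smaller SD means a more reliable cue).
    """
    reliabilities = {m: 1.0 / s ** 2 for m, s in sigmas.items()}
    total = sum(reliabilities.values())
    return {m: r / total for m, r in reliabilities.items()}


def fused_estimate(estimates, weights):
    """Weighted average of the single-modality estimates."""
    return sum(weights[m] * estimates[m] for m in estimates)


# Hypothetical noise levels: vision is assumed twice as precise as the others.
sigmas = {"vision": 1.0, "touch": 2.0, "audition": 2.0}
weights = reliability_weights(sigmas)

# Hypothetical single-cue ratings of one attribute on a 7-point scale.
estimates = {"vision": 5.0, "touch": 3.0, "audition": 4.0}
combined = fused_estimate(estimates, weights)
print({m: round(w, 3) for m, w in weights.items()}, round(combined, 3))
```

Because the weights sum to one, a dominant weight for one modality necessarily reduces the influence of the others on the combined judgment.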
3.5. Correlation Between Physical Properties and Subjective Evaluation
The measured physical properties of the wood samples are shown in
Table 6.
Pearson correlation coefficients were computed between measured physical parameters and mean sensory ratings for each modality.
Figure 13 presents the resulting correlation matrices for the visual, tactile, and auditory conditions and reveals modality-specific mappings between physical properties and subjective impressions.
Under vision, almost all sensory ratings correlated significantly with visual physical features, indicating that gloss, color and grain are primary determinants of visual impressions. Gloss units measured at 60 degrees incidence, L* and b* were positively associated with ratings of perceived surface and internal properties, warmth and naturalness. By contrast, a*, overall color difference and color difference between early and late wood were negatively associated with perceived roughness, cleanliness and overall aesthetic judgment. These results suggest that higher lightness and yellowness tended to evoke impressions of brightness, softness, warmth and naturalness, whereas increased redness and greater color heterogeneity tended to signal roughness and soiling and to lower visual aesthetic ratings. Among descriptors of wood grain, perceived roughness and cleanliness correlated negatively with contrast but positively with homogeneity and energy, implying that regular, uniform grain patterns are linked to impressions of smoothness and tidiness.
Under touch, wood density showed a strong negative relationship with perceived internal properties and warmth (p < 0.01), and a moderate positive relationship with emotional judgment (p < 0.05), indicating a haptic preference for denser samples. Thermal conductivity was not significantly related to any subjective evaluation, suggesting that, within the tested range of wood samples, hand-perceived thermal sensation is driven more by semantic associations with perceived internal properties than by measurable heat transfer. Measured roughness failed to predict perceived roughness. This discrepancy can be attributed to the fact that subjective roughness perception is jointly determined by asperity contacts governed by microscopic surface topography and by biomechanical skin interactions, including deformation and adhesion, that scale with the material’s elastic modulus. Consequently, a single roughness metric cannot capture the percept accurately.
In the auditory condition, higher sound frequency, greater specific dynamic elastic modulus, and increased acoustic radiation damping were linked to higher ratings of surface properties, roughness and cleanliness and to lower ratings of warmth and naturalness, indicating that higher-frequency sounds evoke cross-modal impressions of brighter, smoother and colder materials.
Figure 14 summarizes results under the multisensory condition. The multisensory context largely preserved the mappings observed in the visual and tactile modalities and in several cases strengthened them. Notably, gloss measured at an 85 degree incidence angle and the roughness metric
Sa, which had not reached significance in single sensory conditions, showed significant associations with multiple perceptual ratings under multisensory conditions. Meanwhile, several correlation directions shifted markedly in the multisensory condition. A weak positive relationship between density and surface properties observed in the tactile-only condition became a significant negative relationship when visual cues were added. Likewise, the negative associations between acoustic parameters such as frequency and specific dynamic modulus and the impressions of warmth and naturalness observed under audition alone reversed to significant positive correlations once visual and tactile information was present. These reversals are parsimoniously accounted for by reliability-weighted multisensory integration. When cues from multiple channels are combined, information from the most reliable modality receives greater weight. For surface texture perception, vision proved the more reliable cue and therefore dominated the fused estimate. Because denser hardwoods tend to appear darker, greater visual weighting introduced a negative density–texture association that supplanted the tactile-only pattern. By contrast, perceptions of warmth and naturalness relied principally on visual and haptic evidence, so auditory cues carried lower weight and exerted less influence in the combined estimate.
3.6. Results of Linear Mixed-Effects Models Analysis
Linear mixed-effects models were used to quantify how measured physical properties account for variance in subjective ratings. Severe multicollinearity among auditory predictors was detected by variance inflation factors (VIF) exceeding twenty, so frequency alone was retained as the auditory predictor. Physical parameters were entered as fixed effects, while participants and wood specimens were modeled as random effects to partition within- and between-group variance. Marginal R2 was used to quantify variance explained by fixed effects, conditional R2 to assess total model fit including random effects, and between-group R2 to capture how well the fixed effects explained average differences among samples.
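The relationship between marginal and conditional R2 can be made concrete with the widely used variance-component definitions for mixed models. The sketch below assumes those definitions, since the exact formulas are not stated in this section; the variance components are hypothetical, and the between-group R2 is not reproduced here.

```python
def r2_from_variance_components(var_fixed, var_participant, var_sample, var_residual):
    """Marginal and conditional R2 from the variance components of a mixed model.

    var_fixed       : variance of the fixed-effect predictions
    var_participant : random-intercept variance for participants
    var_sample      : random-intercept variance for wood samples
    var_residual    : residual variance
    """
    total = var_fixed + var_participant + var_sample + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_participant + var_sample) / total
    return marginal, conditional


# Hypothetical components in which participant variance dominates.
marginal_r2, conditional_r2 = r2_from_variance_components(
    var_fixed=0.20, var_participant=0.50, var_sample=0.10, var_residual=0.20
)
print(round(marginal_r2, 2), round(conditional_r2, 2))
```

When participant variance is large relative to the fixed-effect variance, marginal R2 stays low even if a predictor orders the samples well, which is the kind of gap between sample-level and individual-level fit examined in the results that follow.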
Results for the visual condition are summarized in
Figure 15. Between-group
R2 exceeded 0.60 for several key pairs, including surface qualities predicted by
L*, internal qualities and warmth–coldness predicted by
b*, cleanliness and roughness predicted by color difference, naturalness predicted by texture correlation, and emotional judgment predicted by
a*. These high between-group
R2 indicate that these physical features account well for sample-level differences in visual impressions. However, marginal and conditional
R2 values indicated that higher-order impressions predicted by physical attributes had weak explanatory power. Marginal
R2 values were generally below 0.30, and although conditional
R2 increased after accounting for random effects, it remained far below the between-group
R2. This persistent gap reflects substantial unexplained variance at the individual level. Variance decomposition confirmed this pattern: predictors such as color difference for roughness, texture correlation for naturalness and perceived value, and
a* for emotional judgment showed large random components, with participant variance consistently exceeding sample variance.
Only surface impressions predicted by L* showed both marginal and conditional R2 above 0.60, with minimal random variance, indicating a stable population-level predictor. By contrast, color difference yielded a moderate marginal R2 above 0.30 for perceived cleanliness, yet participant variance substantially outweighed sample variance, suggesting that the perceptual mapping between color heterogeneity and cleanliness differs across individuals.
Figure 16 summarizes the LMM results for the tactile and auditory conditions. In the tactile condition, density was the dominant predictor of internal qualities and warmth. Both outcomes showed strong between-group R2 (> 0.60) and moderate fixed-effect strength (marginal R2 > 0.30). For internal qualities, participant and sample variances were comparable, indicating that density explained both material differences and general perceptual trends. For warmth, participant variance was far greater than sample variance, showing that density reliably ranked materials but could not account for individual thermal impressions.
In the auditory condition, frequency served as the key acoustic predictor. For surface qualities, between-group R2 exceeded 0.60 and marginal R2 approached 0.30, with sample variance outweighing participant variance, indicating a stable group-level link between acoustic cues and perceived surface attributes. In contrast, warmth exhibited extremely high between-group R2 (>0.90) but very low marginal R2 (<0.10), and participant variance dominated, demonstrating that although frequency sorted materials consistently at the group level, semantic mapping to thermal impressions varied widely across individuals.
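The coexistence of a very high between-group R2 and a very low marginal R2 can be reproduced with simulated data. The sketch below is illustrative only: all numbers are invented, the between-group R2 is approximated as the squared correlation between the predictor and the per-specimen mean rating, and the pooled squared correlation over all individual ratings is used as a rough stand-in for marginal R2. These are assumptions, not the definitions used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_participants = 12, 40

# Hypothetical standardized predictor, one value per wood specimen.
x = np.linspace(-1.5, 1.5, n_samples)
slope = 0.6  # assumed fixed effect shared by all participants

# Large participant-specific baselines plus trial-level noise.
participant_offset = rng.normal(0.0, 2.0, size=n_participants)
noise = rng.normal(0.0, 1.0, size=(n_participants, n_samples))
ratings = slope * x[None, :] + participant_offset[:, None] + noise

# Between-group view: averaging over participants removes their baselines.
sample_means = ratings.mean(axis=0)
r2_between = np.corrcoef(x, sample_means)[0, 1] ** 2

# Individual-level view: every single rating still carries the baseline spread.
x_long = np.tile(x, n_participants)  # matches the row-major order of ratings.ravel()
r2_pooled = np.corrcoef(x_long, ratings.ravel())[0, 1] ** 2

print(f"between-group R2 ~ {r2_between:.2f}, pooled R2 ~ {r2_pooled:.2f}")
```

Because each participant rates every specimen, individual baselines cancel in the specimen means, so the predictor orders the specimens almost perfectly while explaining only a small share of the variance in individual ratings.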
Figure 17 presents the results for the multisensory condition. The multisensory mappings largely preserved the relationships observed under vision and touch. L* remained the primary determinant of surface qualities and showed the strongest overall performance: between-group R2, marginal R2, and conditional R2 all exceeded 0.60, indicating that lightness consistently explained sample-level differences in surface appraisal. Density retained moderate predictive value for internal qualities, with a marginal R2 of 0.51 and minimal random variance, confirming its stable influence across participants.
Although b* continued to predict perceived naturalness and a* remained the leading predictor of emotional judgment, with between-group R2 values generally above 0.60, their marginal R2 values were consistently below 0.30. Thus, while these parameters captured reliable group-level ordering, they explained little of the variance in individual impression judgments. Participant-level variability dominated these outcomes, indicating that single physical metrics alone are insufficient to account for individual multisensory responses.
4. Discussion
4.1. Sensory Differences in the Perception of Wooden Materials
The ESEM structural models revealed clear modality differences. The visual condition produced the highest number of significant paths and the greatest dispersion along the emotional judgment axis, whereas the tactile condition yielded fewer paths and tighter clustering, consistent with stronger inter-participant agreement. These patterns may reflect differences in underlying neural processing.
Visual information follows the ventral stream. Retinal signals are transduced and relayed to primary visual cortex (V1), where basic features are extracted. Extrastriate areas V2, V3 and V4 then successively process low-level features such as color and texture, which are consolidated in inferior temporal (IT) cortex for view-invariant material recognition. IT representations are transmitted through the parahippocampal gyrus (PHG) and perirhinal cortex (PRC) into the hippocampus for episodic memory integration, and into multimodal anterior temporal regions that function as a semantic hub [12].
Because visual processing supports both rapid bottom-up feature extraction and extensive top-down integration with memory and semantic networks, low-level material cues may influence emotional judgment directly or indirectly through memory- and meaning-based impressions. This dual pathway may help explain the richer path structure and greater individual variability in the visual condition, reflecting participant-specific histories, cultural associations and learned preferences.
In contrast, tactile processing of wood is initiated by mechanoreceptors in glabrous skin, including Meissner corpuscles, Merkel discs, Pacinian corpuscles and Ruffini endings, which transduce mechanical stimuli into neural impulses conducted by Aβ fibers. Signals ascend primarily via the dorsal column medial lemniscus pathway (DCML), relaying through the gracile and cuneate nuclei to the ventral posterolateral thalamus and then to primary somatosensory cortex (S1) where roughness and vibration attributes are extracted. Parallel projections reach secondary somatosensory cortex (S2) and posterior insula for integrated texture and shape perception, and affective tagging is achieved through further routing to anterior insula, the amygdala and orbitofrontal cortex (OFC). Crude touch and thermal or nociceptive inputs follow the spinothalamic tract (STT) to adjacent thalamic nuclei before reaching cortical targets [11,44,45]. Compared with vision, this stream is considered relatively feedforward and somatically grounded, and may engage high-level semantic and episodic systems to a lesser extent. This may lead tactile judgments to rely more heavily on immediate psychophysical properties, which could contribute to tighter inter-participant agreement, fewer significant structural paths in the ESEM, and effects that are mediated mainly by haptic impressions such as perceived cleanliness and naturalness.
The auditory condition exhibited a distinctive profile. Factor scores along the surface-property axis varied widely among participants, yet brightness, glossiness, and roughness showed relatively high IPC (above 0.4). This indicates that impact sounds provide objective cues about material surfaces, enabling consistent relative rankings even when absolute ratings differ. By contrast, scores on the emotional judgment axis were tightly clustered with low IPC, reflecting uniformly neutral and weakly differentiated preferences.
This pattern is consistent with how material sounds are processed. Wood impacts are encoded in the cochlea, transmitted through brainstem auditory nuclei to the medial geniculate thalamus, and then decomposed in primary auditory cortex (A1) into frequency and decay characteristics. These signals subsequently engage posterior superior temporal regions that integrate information across modalities and can modulate early visual processing. Impact sounds also recruit insular areas linked to tactile perception, supporting cross-modal inference of surface features [46]. This may allow listeners to infer roughness and gloss from sound alone, even without direct touch or vision, although individuals vary in how they use the rating scale. Emotional responses, however, are generally thought to involve stronger engagement of limbic valuation circuits. Although auditory information can reach orbitofrontal and amygdala regions, impact sounds from wood are relatively unfamiliar in everyday life and may lack strong affective associations [47]. This could be related to the relatively neutral and less variable emotional judgments observed.
In summary, the emotional experience of wood in architectural and furniture contexts can be optimized by integrating modality-specific regulatory mechanisms within a multisensory framework. For vision, substantial differences driven by culture and prior experience necessitate user segmentation, with wood color and grain pattern tailored to the preferences of distinct groups. For the young Chinese adult group examined in this study, brighter tones and more uniform grain patterns were associated with higher collective preference, although the relevance of these tendencies for other user groups remains to be investigated. For audition, impact sounds can be used to establish associations with perceptual and emotional judgments, thereby strengthening evaluations of material quality and affective valence. For touch, the relatively uniform encoding by mechanoreceptors and direct somatosensory projections to limbic regions can be exploited by carefully adjusting surface friction, vibration cues, and pressure feedback to enhance both tactile comfort and positive affective responses. Collectively, the coordinated use of these sensory channels enables a more comprehensive and effective optimization of wood-evoked emotion in built environments.
4.2. Pathways Influencing Emotional Judgments of Wooden Materials
Path analyses using ESEM consistently identified perceived cleanliness and value within the impression layer as the primary mediators linking physical attributes to emotional judgments. Across all sensory conditions the largest indirect effect was mediated by cleanliness, indicating that materials perceived as newer and cleaner were reliably associated with higher evaluation preference. Naturalness produced significant mediation in some models but its indirect effects were substantially smaller than those of cleanliness.
These findings suggest that high perceived naturalness did not consistently enhance evaluation preference; instead, its effect was modulated by both cultural and individual dispositions. Prior research shows that the same landscape feature can elicit contrasting appraisals across cultures, with natural debris along riverbanks perceived as authentic in many European samples but viewed as signs of neglect and evaluated less favorably by Chinese observers [18]. Individual differences in the need for order, as measured by the Personal Need for Structure (PNS) scale, similarly influence preference. A stronger need for structure predicts favoring clean, well-maintained environments, whereas a lower need for structure aligns with greater tolerance for unmanicured scenes [48,49]. Given that cultural tendencies in China such as higher uncertainty avoidance, collectivism, and long-term orientation are associated with stronger preferences for order, Chinese participants generally exhibited higher structure-seeking tendencies than Western participants. Consequently, both cultural and individual factors made cleanliness and perceived value more salient mediators of aesthetic judgment, thereby explaining why tidy, intact surface cues exerted stronger effects than cues of unmaintained naturalness.
Within the mid-level physical layer, roughness functioned as a central cross-modal regulator. It showed a positive indirect effect on emotional judgment by enhancing perceived naturalness, yet simultaneously produced a negative indirect effect when interpreted as a cue of dust accumulation or poor maintenance, reducing perceived cleanliness. Thus, its influence depended on the relative weight assigned to naturalness versus cleanliness within a specific cultural or contextual framework.
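The opposing roles of roughness can be made concrete with the product-of-coefficients rule for mediation, in which an indirect effect is the product of the path into the mediator and the path out of it. The coefficients below are hypothetical standardized values chosen only to illustrate the arithmetic. They are not estimates from the ESEM models reported here.

```python
def indirect_effect(path_a, path_b):
    """Indirect effect of X on Y through a mediator: (X -> M) * (M -> Y)."""
    return path_a * path_b

# Hypothetical standardized path coefficients (illustration only).
rough_to_natural = 0.40    # rougher surfaces read as more natural
natural_to_emotion = 0.20  # naturalness modestly raises emotional judgment
rough_to_clean = -0.30     # rougher surfaces read as less clean
clean_to_emotion = 0.50    # cleanliness strongly raises emotional judgment

via_naturalness = indirect_effect(rough_to_natural, natural_to_emotion)  # positive route
via_cleanliness = indirect_effect(rough_to_clean, clean_to_emotion)      # negative route
net_indirect = via_naturalness + via_cleanliness

print(f"via naturalness: {via_naturalness:+.2f}")
print(f"via cleanliness: {via_cleanliness:+.2f}")
print(f"net indirect:    {net_indirect:+.2f}")
```

With these illustrative values the cleanliness route outweighs the naturalness route, so the net indirect effect of roughness is negative. Raising the weight of naturalness relative to cleanliness would flip the sign, which is the dependence on cultural or contextual weighting described above.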
Although low thermal conductivity of wood is widely assumed to promote preference by producing a warm tactile sensation, warmth perception exerted no significant direct effect on emotional judgment in any sensory condition. Its influence was limited to a modest indirect pathway via increased naturalness. This likely reflects the restricted variability in thermal conductivity among the tested samples and suggests that warmth perception relates more to physiological comfort and restorative experience than to core aesthetic evaluation, thereby contributing indirectly rather than as a direct determinant of attractiveness.
At the low-level physical layer, perceived surface and internal properties produced significant direct effects on emotional judgment across conditions, yet their indirect effects, mediated by impression-level attributes, were consistently greater in magnitude. This indicates that low-level physical features influence aesthetic judgment primarily through cognitively and culturally shaped impression mapping rather than through immediate sensory preference, consistent with the classic mediation pathway from physical attributes to impressions and then to emotion observed across sensory modalities.
4.3. Sensory Weights in Multisensory Integration
Sensory weight analysis revealed a clear division of labor across modalities in wood perception. Visual input dominated evaluations of surface attributes, higher-order impressions, and emotional judgments, reflecting its superior role in resolving macroscopic material cues. Tactile input carried comparable or greater weight for internal properties and directly felt attributes such as warmth and roughness. Notably, touch retained substantial influence on higher-level impressions and affective evaluations, underscoring its central role in material experience and occupant comfort. Auditory cues contributed minimally and primarily reinforced perception of surface properties through cross-modal correspondences. In the absence of learned associations, impact sounds did not reliably convey internal properties or support affective appraisal. Although sound has been proposed as a proxy for thickness or density in virtual environments [35,50,51], its weighting remained low here, highlighting the importance of acquired audio–tactile mappings for effective auditory contribution [43,52].
The observed weighting pattern is consistent with reliability-weighted integration, in which sensory channels that provide more certain information for a given judgment receive greater influence. These findings support a human-centered, multisensory design strategy for wood in buildings. Visual attributes should be tailored to the needs of specific user groups, potentially by adjusting features such as brightness, color contrast, and texture uniformity according to group-related perceptual tendencies. For the young Chinese adults examined here, lighter tones and lower visual contrast were associated with higher preference. Tactile properties should be optimized to support broad comfort, and auditory cues should be incorporated when meaningful associations between sound, emotion, or physical attributes have been established. Through such coordinated multisensory refinement, emotional restoration and wellbeing in wooden environments can be more effectively supported [4,53].
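Reliability-weighted integration, as referred to in this subsection, is often formalized by weighting each sensory estimate by its inverse variance. The sketch below shows that standard textbook form with invented numbers. The variances and ratings are hypothetical, and this is not the procedure used to derive the sensory weights reported in this study.

```python
def reliability_weights(variances):
    """Weight each channel by its reliability (inverse variance), normalized to sum to 1."""
    reliabilities = {name: 1.0 / var for name, var in variances.items()}
    total = sum(reliabilities.values())
    return {name: rel / total for name, rel in reliabilities.items()}

def integrate(estimates, variances):
    """Combine single-channel estimates into one reliability-weighted estimate."""
    weights = reliability_weights(variances)
    combined = sum(weights[name] * estimates[name] for name in estimates)
    combined_variance = 1.0 / sum(1.0 / var for var in variances.values())
    return combined, combined_variance

# Hypothetical noise variances for one judgment (smaller = more reliable channel).
variances = {"visual": 1.0, "tactile": 2.0, "auditory": 4.0}
# Hypothetical single-channel ratings of the same specimen on a 7-point scale.
estimates = {"visual": 5.0, "tactile": 4.0, "auditory": 3.0}

weights = reliability_weights(variances)
combined, combined_variance = integrate(estimates, variances)
print(weights)                      # visual receives the largest weight, auditory the smallest
print(combined, combined_variance)  # combined variance is below every single-channel variance
```

Under this scheme a channel with low weight for one judgment can carry high weight for another, since the variances are specific to the attribute being judged. This is consistent with vision dominating surface attributes while touch carries comparable or greater weight for internal properties.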
4.4. Mapping Relationships Between Subjective Ratings and Physical Parameters
Correlation and linear mixed-effects analyses consistently showed that vision was driven primarily by surface color and texture, touch by density, audition by impact frequency, and multisensory perception by combined visual–haptic cues. These predictors explained substantial sample-level variance, with between-group R2 typically exceeding 0.60. However, only L* for perceived surface qualities and density for perceived internal qualities combined strong group-level explanatory power with meaningful individual-level predictability. For impression and emotional judgments, most physical parameters yielded marginal R2 below 0.30, and random effects dominated, indicating limited capacity to predict individual ratings due to large variability in baseline perception and semantic interpretation.
Thus, objective physical metrics serve as reliable indicators of low-level attributes and the relative ranking of materials, yet they are insufficient to account for individual aesthetic or affective responses. Accurate, personalized prediction of higher-order impressions requires integrating non-physical determinants such as prior sensory experience, cognitive biases, and cultural semantics.
4.5. Limitations
This study has several limitations. First, the participants were limited to young university students. Therefore, the extent to which the findings can be generalized to other age groups, such as children or older adults—whose sensory functions and cultural backgrounds may differ—remains to be examined.
Second, the wood materials investigated in this study represent a focused subset of commonly used species. Future research could extend the present findings by including a wider range of wood materials, broader age groups, and participants from diverse cultural backgrounds, in order to further strengthen evidence-based strategies for restorative wood design.
Third, the results were obtained under controlled experimental conditions, which may differ from real-life architectural environments. As a result, the generalization of the findings to everyday settings should be made with caution.
In addition, the cultural context discussed in this study was primarily interpreted through comparisons with previous studies conducted in Western populations. This approach does not fully capture the complexity of intercultural design implications. Future studies involving participants from multiple cultural backgrounds would help to further deepen understanding in this area.
Finally, olfactory stimulation was not included in the present study. Given the importance and complexity of olfactory perception, future research using experimental designs specifically tailored to smell is warranted to provide a more comprehensive understanding of multisensory experience.