The purpose of Study 2 is to answer the following RQs: To what extent does adding a background affect perceived unity and variety compared with viewing the individual window (RQ 2)? Does the background influence (a) the relationship between perceived unity and perceived variety and (b) the predictive effects of unity and variety on aesthetic preference (RQ 3)? Using questionnaires, Study 2 employed a between-subjects design to avoid carryover effects that might arise from repeated evaluations of the same window by the same participant under different conditions. Unlike in Study 1, a within-subjects design here might have led participants to recognize the study’s purpose after viewing the background conditions and to adopt a comparative rating strategy, weakening the reliability and validity of the ratings. A between-subjects design allows each participant to judge within only one context, so that their aesthetic evaluations rest on a stable reference framework that more closely reflects their actual perceptual experience.
4.6. Study 2 Results
The reliability analysis of the questionnaire, indicated by Cronbach’s alpha coefficient, is presented in Table 5 and shows good data reliability. Additionally, the KMO value is 0.714, and Bartlett’s test of sphericity is significant (p < 0.001), indicating good data validity.
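For readers unfamiliar with the reliability coefficient reported in Table 5, the following is a minimal sketch of how Cronbach's alpha can be computed from a respondents-by-items table. The function name `cronbach_alpha` and the ratings are hypothetical illustrations, not the study's code or data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of ratings.

    rows: list of lists, one inner list of item ratings per respondent.
    """
    k = len(rows[0])                                   # number of items
    items = list(zip(*rows))                           # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items) # sum of item sample variances
    total_var = variance([sum(r) for r in rows])       # variance of the summed scale
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical ratings: 4 respondents x 3 items (not data from the study).
ratings = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

In this toy table the three items move in lockstep, so alpha reaches its upper bound of 1; real questionnaire data would yield a lower value.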
Windows in a Chinese background scored about 0.34 higher on perceived unity than in a Western background on average, while the Western background raised perceived variety by around 0.37, as expected. In the Chinese-background condition, the three most preferred windows (Windows 1, 8, and 9) all exhibited very high unity (5.13–5.38) while maintaining only moderate variety (3.95–4.73). As unity scores fell below approximately 4.75, preference decreased, even when variety remained at a moderate level. In the Western background, windows with high variety (>5.0) never reached preference scores above 4.9. Instead, the highest preference (Windows 1, 3, and 6) occurred for windows combining high unity (4.92–5.03) with moderate variety (4.15–4.62). Visual inspection of Figure 4 suggested that aesthetic preference was highest when high unity was combined with moderate variety.
Next, an LMM was fitted to examine how backgrounds influenced perceived unity and variety when moving from the window alone to the overall composition. We combined Study (no-background Study 1 and with-background Study 2) and Background Style (Chinese and Western) into a single three-level factor. Treatment coding (no background = 0 0, Chinese background = 1 0, Western background = 0 1) made the model intercept equal to the mean unity rating for windows shown without a background, which served as the baseline. The average of the two background conditions was then compared with this baseline to test the overall background effect.
We initially specified a random-intercepts model with random intercepts for participants and stimuli. We then compared this base model to (a) a model with random slopes of background at the participant level and (b) a model with random slopes of background at the stimulus level. Although these random-slope models reduced AIC and yielded significant likelihood-ratio tests compared to the random-intercepts model, both produced boundary (singular) fits (with variances collapsing and random-effects correlations approaching ±1), indicating that the additional variance components were not reliably estimable. Following current recommendations for LMM [63, 64], we therefore retained the random-intercepts model as the selected specification. Residual diagnostics revealed no violations of model assumptions; therefore, this model was the final model.
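As a compact summary of the specification described above, the retained random-intercepts model for unity can be sketched as follows. The notation is illustrative and not taken from the original analysis: $U_{ij}$ is the unity rating given by participant $i$ to window $j$, $C_{ij}$ and $W_{ij}$ are the treatment-coded indicators for the Chinese and Western backgrounds, and the averaged contrast in the last line is one plausible reading of how the two background conditions were compared with the baseline.

```latex
\begin{aligned}
U_{ij} &= \beta_0 + \beta_1 C_{ij} + \beta_2 W_{ij} + u_i + v_j + \varepsilon_{ij},\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),\\
\bar{\beta}_{\mathrm{bg}} &= \tfrac{1}{2}\,(\beta_1 + \beta_2).
\end{aligned}
```

Under this coding, $\beta_0$ is the mean rating with no background, $\beta_1$ and $\beta_2$ are the deviations of each background from that baseline, and $\bar{\beta}_{\mathrm{bg}}$ is the overall background effect.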
Unity ratings were estimated at 4.81 when windows were viewed in isolation. Adding any background significantly reduced perceived unity (B = −0.11, SE = 0.04, t = −2.78, p < 0.01). Follow-up results showed that unity under the Chinese background was not significantly higher than the no-background baseline (B = 0.06, SE = 0.05, t = 1.18, p = 0.24, partial R² = 0.002), suggesting a negligible unique contribution to the variance in unity. In contrast, the Western background decreased unity significantly (B = −0.28, SE = 0.05, t = −5.71, p < 0.001, partial R² = 0.039), a small but non-trivial effect after controlling for other fixed and random effects. Thus, stylistic incongruence (Western background) decreased unity, while the congruent Chinese background led to a small, non-significant increase in unity relative to the baseline, as shown in Table 6.
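The coefficients above can be turned into model-implied condition means with simple arithmetic. The sketch below assumes treatment coding with the no-background condition as the reference and that the overall background effect is the plain average of the two background coefficients; the function names are hypothetical.

```python
def predicted_means(intercept, b_chinese, b_western):
    """Model-implied mean rating per condition under treatment coding."""
    return {
        "none": intercept,                 # baseline: window shown in isolation
        "chinese": intercept + b_chinese,  # baseline plus Chinese-background shift
        "western": intercept + b_western,  # baseline plus Western-background shift
    }


def average_background_contrast(b_chinese, b_western):
    """Average shift of the two background conditions relative to baseline."""
    return (b_chinese + b_western) / 2


# Fixed-effect estimates for unity as reported in the text.
means = predicted_means(4.81, 0.06, -0.28)
overall = average_background_contrast(0.06, -0.28)
print({k: round(v, 2) for k, v in means.items()}, round(overall, 2))
```

With the reported estimates, the implied means are 4.81 (no background), 4.87 (Chinese), and 4.53 (Western). The Chinese-minus-Western gap of 0.34 agrees with the descriptive difference noted earlier, and the averaged contrast of −0.11 agrees with the reported overall background effect.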
Across the ten windows, the per-window Welch tests showed a pattern broadly consistent with the LMM results when inference was based on BH-FDR-adjusted p values (Table 7). Adding a Western background lowered perceived unity for nine windows (Windows 1–9), and this decrease remained statistically reliable for seven windows (Windows 1, 3, 4, 5, 6, 7, and 8; p < 0.05), indicating a unity-suppressing effect under stylistic incongruence. By contrast, adding a Chinese background increased perceived unity for seven windows (Windows 2, 4, 5, 6, 7, 8, and 10), but none of these increases remained reliable after FDR correction (all p > 0.05), suggesting that any unity enhancement under stylistic congruence was comparatively weak and inconsistent at the stimulus level. In the with-background condition, unity increased for two windows (Windows 2 and 10), neither of which survived FDR correction, whereas it decreased for eight windows, with three reliable decreases (Windows 1, 3, and 7; p < 0.05), as shown in Table 7.
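The BH-FDR adjustment applied to the per-window tests can be illustrated with a short sketch of the Benjamini-Hochberg step-up procedure. The function name `bh_adjust` and the p values are hypothetical; they are not the values in Table 7.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four per-window tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
reliable = [p < 0.05 for p in adj]
print([round(p, 3) for p in adj], reliable)
```

A test counts as reliable in the sense used above when its adjusted p value falls below 0.05, which controls the expected proportion of false discoveries across the ten windows rather than the per-test error rate.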
In conclusion, Western backgrounds consistently suppressed unity across most stimuli, whereas Chinese backgrounds generally maintained or slightly enhanced unity; the effect of an incongruent background on unity may therefore exceed that of a congruent background.
The procedure was replicated for variety. For variety ratings, we used the same fixed-effect structure and random intercepts for participants and stimuli as the baseline model. We then compared this base model to (a) a model with random slopes of background at the participant level and (b) a model with random slopes of background at the stimulus level. The model with participant-level random slopes returned a boundary (singular) fit and was therefore discarded. In contrast, the model with stimulus-level random slopes substantially improved model fit over the random-intercepts model (ΔAIC ≈ 146; χ²(5) = 155.69, p < 0.001) and did not exhibit singularity. Residual diagnostics revealed no violations of model assumptions. We therefore adopted the model with random intercepts for participants and random intercepts plus random slopes for stimuli as the final specification for variety.
Variety ratings were estimated at 4.62 when windows were viewed in isolation. Adding any background did not significantly change perceived variety (B = 0.02, SE = 0.04, t = 0.54, p = 0.59). Follow-up results showed that variety under the Chinese background was significantly lower than the no-background baseline (B = −0.16, SE = 0.05, t = −3.12, p = 0.002, partial R² = 0.012), indicating a small but non-negligible unique effect on perceived variety. In contrast, the Western background increased variety significantly (B = 0.21, SE = 0.05, t = 4.00, p < 0.001, partial R² = 0.020), suggesting a small yet reliable increase. Thus, the stylistically incongruent Western background increased variety significantly, while the congruent Chinese background decreased variety significantly, as shown in Table 8.
Across the ten windows, the per-window Welch tests showed a pattern broadly consistent with the LMM results when inference was based on BH-FDR-adjusted p values (Table 9). Under the Chinese background, perceived variety decreased for eight windows (Windows 2, 4–10), and the decrease remained statistically reliable for six of them (Windows 2, 4, 5, 6, 7, and 10; p < 0.05). Variety increased for two windows (Windows 1 and 3), but only Window 1 showed a reliable increase after FDR correction (p < 0.05). Under the Western background, variety increased for eight windows (Windows 1, 4–10), and the increase remained reliable for four windows (Windows 4, 5, and 10, p < 0.001; Window 7, p < 0.05), whereas the apparent increases for Windows 1 and 8 did not survive FDR correction. Variety decreased for two windows (Windows 2 and 3), with only Window 3 showing a reliable decrease (p < 0.001). In the with-background condition, variety decreased for six windows (Windows 2, 3, 6–9), but only Window 7 remained reliably lower after FDR correction (p < 0.05). By contrast, variety increased for four windows (Windows 1, 4, 5, and 10), and this increase remained reliable for Windows 1 and 10 (p < 0.05), as shown in Table 9.
In conclusion, Chinese backgrounds decreased variety, whereas Western backgrounds increased it, mirroring the direction and magnitude of the LMM coefficients.
The next analyses examined the effects of unity, variety, and background on aesthetic preference at the composition level. We analyzed aesthetic preference ratings using an LMM with unity and variety as continuous predictors and background style as a three-level factor (no background, Chinese background, and Western background). Unity and variety ratings were standardized before analysis. The fixed-effects structure included the main effects of Unity and Variety, the three-level factor of Background (treatment-coded with no background as the baseline), and the two-way interactions of Unity and of Variety with Background. We began with a random-intercepts model, including random intercepts for participants and stimuli. We then compared this baseline specification to (a) a model with random slopes of Unity and Variety at the participant level and (b) a model with random slopes of Unity and Variety at the stimulus level. Relative to the random-intercepts model, the participant-level random-slope model substantially improved fit (ΔAIC ≈ 235; χ²(5) = 245.02, p < 0.001) without evidence of singularity, indicating reliable between-participant variability in the slopes of unity and variety. The stimulus-level random-slope model yielded only a small improvement in AIC (ΔAIC ≈ 7.66; χ²(5) = 17.66, p = 0.003) at the cost of additional model complexity. Residual diagnostics revealed no violations of model assumptions. We therefore retained the model with random intercepts for stimuli and random intercepts plus random slopes for participants as the final specification.
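The reported fit statistics are linked by a simple identity: for nested models fitted by maximum likelihood to the same data, the AIC improvement equals the likelihood-ratio statistic minus twice the number of added parameters. The sketch below checks this for the two comparisons above, assuming that the 5 degrees of freedom correspond to five added random-effect parameters. The helper names are hypothetical, and the chi-square tail probability uses the standard closed form for integer degrees of freedom instead of a statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)   # builds x^(j - 1/2) / (1 * 3 * ... * (2j - 1))
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def delta_aic(lr_chi2, extra_params):
    """AIC(simpler) - AIC(richer) for nested ML fits: chi2 - 2 * added parameters."""
    return lr_chi2 - 2 * extra_params


for label, chi2 in [("participant slopes", 245.02), ("stimulus slopes", 17.66)]:
    print(label, round(delta_aic(chi2, 5), 2), chi2_sf(chi2, 5))
```

Under these assumptions, the identity gives ΔAIC values of about 235 and 7.66, and the tail probability for χ²(5) = 17.66 is close to 0.003, in line with the figures reported above.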
In the no-background condition, unity showed a significantly positive relation with aesthetic preference (B = 0.36, SE = 0.02, t = 20.50, p < 0.001, partial R² = 0.054), indicating a small-to-moderate unique effect: a one-standard-deviation increase in unity was associated with a 0.36-point increase in aesthetic preference and accounted for about 5% of the residual variance. Variety also showed a significantly positive relation with aesthetic preference (B = 0.21, SE = 0.02, t = 11.61, p < 0.001, partial R² = 0.019), representing a smaller but still meaningful unique effect on preference. The Chinese background had a small positive effect on preference (B = 0.09, SE = 0.04, t = 1.97, p = 0.048, partial R² = 0.005), whereas the Western background showed a small negative effect (B = −0.10, SE = 0.04, t = −2.29, p = 0.022, partial R² = 0.007).
All interaction terms were significant and negative, but their partial R² values were very small (≤0.001), indicating subtle yet reliable effects. The interactions between unity and the Chinese (B = −0.13, SE = 0.04, t = −3.40, p < 0.001) and Western (B = −0.27, SE = 0.04, t = −6.46, p < 0.001) backgrounds indicated that the unity effect was weakened relative to the no-background condition. Likewise, the interactions between variety and the Chinese (B = −0.08, SE = 0.04, t = −2.17, p = 0.029) and Western (B = −0.12, SE = 0.04, t = −3.08, p = 0.002) backgrounds showed that the variety effect was also weakened relative to no background. In conclusion, unity exerted the strongest and most practically meaningful influence on aesthetic preference, variety contributed a smaller yet non-trivial effect, and background mainly acted to attenuate these positive unity and variety effects at the composition level, as shown in Table 10.
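The interaction coefficients are easier to read as condition-specific (simple) slopes. The sketch below adds each interaction estimate to the no-background slope, assuming treatment coding with no background as the reference level; the helper name `simple_slopes` is hypothetical.

```python
def simple_slopes(base_slope, interactions):
    """Slope of a predictor in each condition under treatment coding.

    base_slope: slope in the reference (no-background) condition.
    interactions: mapping of condition name -> predictor-by-condition coefficient.
    """
    slopes = {"none": base_slope}
    for condition, coef in interactions.items():
        slopes[condition] = base_slope + coef   # reference slope plus its adjustment
    return slopes


# Estimates reported in the text for aesthetic preference.
unity = simple_slopes(0.36, {"chinese": -0.13, "western": -0.27})
variety = simple_slopes(0.21, {"chinese": -0.08, "western": -0.12})
print({k: round(v, 2) for k, v in unity.items()})
print({k: round(v, 2) for k, v in variety.items()})
```

On this reading, the unity slope falls from 0.36 with no background to about 0.23 under the Chinese background and 0.09 under the Western background, and the variety slope falls from 0.21 to about 0.13 and 0.09. Both effects remain positive in every condition, consistent with attenuation.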
Although all interaction terms between unity/variety and background reached statistical significance, their partial R² values were very small (all ≤0.01). This pattern indicates that background slightly attenuates the positive effects of unity and variety on preference but does not overturn the overall dominance of unity (and, to a lesser extent, variety) in predicting aesthetic preference. Given the large sample size and the relatively constrained manipulation of background (two stylized roof types applied to the same window stimuli), such small but reliable interaction effects are methodologically plausible and theoretically consistent with the view that stylistic context fine-tunes, rather than replaces, the core unity–variety mechanism.
An LMM was fitted to examine how unity influenced variety, with crossed random intercepts for participants and stimuli. We then compared this baseline specification with more complex models that additionally allowed the unity slope to vary across participants and/or across stimuli. Model comparison based on AIC/BIC and likelihood-ratio tests indicated that these random-slope extensions either failed to provide a meaningful improvement in fit or led to boundary (singular) solutions. Following current recommendations for multilevel modelling [63, 64], and because residual diagnostics revealed no violations of model assumptions, we retained the random-intercepts model as the final specification.
Variety ratings were first examined as a function of unity and background. In the no-background condition, higher unity was strongly associated with lower perceived variety (B = −0.62, SE = 0.02, t = −36.76, p < 0.001, partial R² = 0.149), indicating that unity alone accounted for about 15% of the variance in variety, a medium-to-large unique effect. Two significant interaction terms showed that this negative unity–variety relation was clearly influenced by background. In the Chinese-background condition, the unity × Chinese interaction was positive, indicating a modest but reliable moderating effect (B = 0.65, SE = 0.04, t = 15.62, p < 0.001, partial R² = 0.032), meaning that the strong negative slope of unity on variety observed with no background was weakened in the presence of a congruent Chinese background. The unity × Western interaction was also positive, though slightly smaller in magnitude (B = 0.57, SE = 0.04, t = 15.20, p < 0.001, partial R² = 0.021), indicating that an incongruent Western background likewise attenuated the opposition between unity and variety, reducing the steep negative slope to a much weaker level. Together, these findings show that while unity is a strong negative predictor of perceived variety in isolation, background context reliably softens this opposition, especially under the stylistically congruent Chinese background (Table 11).
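To make the moderation concrete, the condition-specific slopes of unity on variety can be worked out from the reported coefficients. This assumes treatment coding with no background as the reference level, so that each interaction coefficient is added to the baseline slope; the symbol $b_U$ is illustrative notation, not the authors'.

```latex
\begin{aligned}
b_U^{\text{none}}    &= -0.62,\\
b_U^{\text{Chinese}} &= -0.62 + 0.65 = 0.03,\\
b_U^{\text{Western}} &= -0.62 + 0.57 = -0.05.
\end{aligned}
```

Under this reading, the strong negative unity–variety slope observed in isolation shrinks to approximately zero in both background conditions, which is what is meant by the background softening the opposition between unity and variety.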