An Automated Decision-Support Framework for Interior Space Quality Evaluation Using Computer Vision and Multi-Criteria Decision-Making

Wang, Yuanan; Zhao, Zichen; Guan, Xuesong

doi:10.3390/buildings16081508

by Yuanan Wang ¹, Zichen Zhao ² and Xuesong Guan ¹,*

¹ College of Art and Design, Nanjing Forestry University, Nanjing 210037, China

² School of Life Sciences, Nanjing University, Nanjing 210023, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(8), 1508; https://doi.org/10.3390/buildings16081508

Submission received: 4 March 2026 / Revised: 29 March 2026 / Accepted: 6 April 2026 / Published: 12 April 2026

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

With the growing adoption of data-driven workflows and the need to compare numerous interior design alternatives in housing renewal, scalable and consistent assessment of interior space quality is increasingly important; however, current practice still depends on manual scoring and expert judgment. To address this gap, we propose an automation-ready framework that evaluates interior space quality from visual data. We construct the Functionality–Healthiness–Aesthetics Spatial Interior Dataset-10K (FHASID-10K) with 13,962 images for systematic validation. Three sub-models quantify functionality via space utilization and circulation smoothness, healthiness via detection of health-related visual elements, and aesthetics via semantic visual representations with regression-based prediction. Dimension scores are standardized and fused using the analytic hierarchy process (AHP) and the technique for order preference by similarity to ideal solution (TOPSIS) to produce a comprehensive score for ranking and grading. Experiments show stable score distributions and clear differentiation across space categories and style–space combinations. A gradient-boosted decision tree (GBDT) surrogate reconstructs the fused score with high accuracy (test R² = 0.9992; MSE = 1.1 × 10⁻⁵), and human-subject evaluation shows strong agreement with overall-quality ratings (r = 0.760, p < 0.001). Overall, the framework enables scalable benchmarking, scheme comparison, and decision support.

Keywords: interior space quality; computer vision; automation; multi-criteria decision-making; AHP–TOPSIS; decision support; residential interiors

1. Introduction

In recent years, multi-criteria decision-making (MCDM) methods have been increasingly applied in engineering and built-environment research to address complex problems involving multiple criteria, competing objectives, and trade-offs [1,2]. In the field of interior design, space quality has become an important concern because it influences not only functional performance but also users’ perceptual and emotional experiences. As residential design gradually shifts from quantity-oriented development to quality-oriented improvement, the ability to evaluate interior space quality in a systematic, comparable, and reproducible manner has become increasingly important for design selection, renovation planning, and quality control [3,4,5].

At present, however, the evaluation of interior spaces still relies heavily on expert judgment, designer experience, and post-occupancy satisfaction surveys. Although these approaches remain valuable in practice, they are often subjective, time-consuming, and difficult to standardize across different spatial categories and design styles. This limitation is particularly evident in the context of existing housing renovation, where decision-making increasingly requires objective and repeatable tools for comparing alternative design schemes and optimizing residential environments [6,7]. In many cases, renovation efforts focus mainly on structural safety or localized functional improvement, while overall spatial quality—especially in terms of functional efficiency, health-related attributes, and visual experience—has not been evaluated from an integrated perspective.

Recent studies have introduced computational and data-driven approaches to quantify specific aspects of interior environments, such as spatial layout, visual characteristics, and environmental or health-related elements [8,9,10]. These studies provide important technical support for the quantitative assessment of interior environments. Nevertheless, most existing approaches focus on a single dimension and therefore struggle to capture the multidimensional nature of interior space quality [11,12]. In addition, the relationship between algorithmic outputs and human subjective perception is often insufficiently clarified, which limits the interpretability and practical applicability of computational evaluation models in real design contexts [13,14]. Therefore, there remains a need for an integrative evaluation framework that can combine multiple quality dimensions, support large-scale analysis, and produce results that are both quantitatively robust and practically meaningful.

To address this gap, this study develops a data-driven and image-based framework for interior space quality evaluation. Unlike prior studies that mainly rely on subjective judgment or apply MCDM to predefined qualitative criteria, the proposed method directly extracts evaluative evidence from visual data. Specifically, computer-vision-based sub-models are used to quantify functionality, healthiness, and aesthetics from interior renderings, and these dimensional scores are then integrated through AHP–TOPSIS to generate a comprehensive score for ranking and grading design schemes. The methodological novelty lies in establishing a unified three-dimensional evaluation structure, translating abstract quality dimensions into computable visual indicators, and combining computer-vision-based assessment with multi-criteria decision fusion in a scalable and reproducible pipeline for large-scale interior scheme evaluation.

The remainder of this paper is organized as follows. Section 2 reviews the relevant literature and identifies the research gap addressed in this study. Section 3 presents the theoretical methodology, including the evaluation dimensions, methodological framework, and the AHP–TOPSIS-based scoring logic. Section 4 describes the practical implementation of the framework, including the dataset, experimental workflow, and sub-model construction. Section 5 reports the results and validation findings. Section 6 discusses the main implications and limitations of the proposed framework. Finally, Section 7 concludes the paper and outlines directions for future research.

2. Literature Review

This section reviews the literature on interior space quality evaluation and the methodological foundations of this study. Because the proposed framework combines computer vision-based image analysis with multi-criteria decision-making, the review is organized into three parts. The first part reviews computational and image-based studies relevant to interior space assessment. The second part examines multi-criteria decision-making methods that support the integration of heterogeneous evaluation indicators. The third part discusses the major dimensions and indicator systems used in previous interior quality evaluation studies. This organization establishes the theoretical and methodological basis for the proposed multi-dimensional evaluation framework.

2.1. Architectural Interior Space and Computational Design

With the development of computer graphics, artificial intelligence, and data analytics, architectural and interior space design has gradually shifted from experience-driven practices toward data-oriented and computational processes. Traditional evaluation of interior spaces typically relies on designers’ subjective judgment or user satisfaction surveys, lacking systematic and quantitative assessment methods [15]. In this context, computational design approaches have increasingly emerged as an important technical pathway for spatial performance analysis.

With the rapid advancement of computing technologies and algorithmic methods, computational design has progressively permeated various stages of architectural and interior design, evolving from an auxiliary drafting tool into a key technical means supporting spatial analysis, performance evaluation, and design generation [16,17]. Existing studies indicate that data-driven and computational model–based approaches can effectively enhance the efficiency of spatial performance analysis and design decision-making, providing new technical pathways for the systematic analysis and quantitative evaluation of complex spatial problems [18,19].

Computer vision techniques have also been widely applied in interior design research. In recent years, multilayer perceptrons (MLPs), as a representative feedforward neural network architecture, have been extensively used in architectural spatial performance prediction and design evaluation studies. Related research has mainly focused on the prediction of performance indicators such as building energy consumption, thermal comfort, and carbon emissions [20,21,22,23], as well as quantitative modeling of spatial form and visual aesthetic perception [24,25]. Furthermore, with the emergence of vision–language pre-trained models, CLIP and related models have been introduced into interior space research as tools for extracting high-level visual semantic features, enabling semantic analysis and aesthetic evaluation of architectural and interior scenes [26]. Studies employing CLIP as a high-level image feature extractor or visual–text embedding framework for architectural or interior scene analysis have demonstrated its strong capability to provide rich semantic and aesthetic representations of interior images [27,28,29,30,31,32].

Traditional computer vision tools also play an important role in the analysis of interior design details. For example, OpenCV-based image processing methods have been applied to color feature extraction and recognition in interior images to investigate the influence of color schemes on spatial emotion and perceptual experience [33]. By integrating such techniques with machine learning-based prediction models, previous studies have achieved automated analysis and optimization of interior color schemes, providing quantitative support for interior design decision-making [34]. In the identification of architectural and interior scene elements, deep learning–based object detection algorithms represented by the YOLO series have been demonstrated to be efficient tools [35]. At present, YOLO-based models have been applied to tasks such as furniture recognition, component classification, and scene understanding [36]. A substantial body of research further utilizes machine learning–based object detection to analyze the accuracy of interior layouts and furniture arrangements, automatically assess the rationality of large furniture placement, and identify specific design details in residential spaces [37].

Overall, international research has gradually shifted from single-dimensional performance analysis toward multi-indicator, multimodal, and intelligent comprehensive spatial evaluation. However, existing interior evaluation studies remain insufficient in integrating multiple dimensions, and a unified quantitative evaluation framework specifically tailored to interior design is still lacking.

2.2. Multi-Criteria Decision-Making Methods and Design Evaluation Model Construction

In the field of design evaluation, multi-criteria decision-making (MCDM) methods have become one of the most widely adopted approaches in architectural and engineering evaluation systems due to their capability to handle multiple attributes, conflicting objectives, and subjective judgment factors [38]. Commonly used MCDM methods include the Analytic Hierarchy Process (AHP), the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), and weighted sum models [39]. To address uncertainty in the evaluation process, emerging approaches such as fuzzy logic and Neutrosophic logic have also attracted increasing attention [40]. AHP is widely applied for indicator weighting and hierarchical structure construction owing to its clear structure and strong interpretability [41], while TOPSIS enables ranking and comparative assessment of design alternatives based on their distances to ideal solutions [42].

In the architectural domain, MCDM methods have been extensively applied to structural safety analysis, sustainability assessment, green building decision-making, and engineering optimization [43,44,45,46]. International studies have proposed two-stage MCDM models based on spherical fuzzy sets (SFSs), integrating Delphi techniques with TOPSIS to screen and rank residential apartment evaluation indicators from the perspective of consumer preferences, thereby extending housing evaluation frameworks under complex and uncertain decision contexts [47]. In interior design research, MCDM approaches have been applied to sustainable furniture design evaluation frameworks [48], public interior space layout optimization, and user satisfaction modeling [49,50,51,52]. However, most existing studies rely heavily on expert judgment or questionnaire data to construct evaluation models, and evaluation dimensions are often limited to a single or a small number of attributes. As a result, these models struggle to systematically characterize the comprehensive quality of interior spaces across multiple dimensions, such as functional performance, health-supportive environments, and visual aesthetics. Moreover, existing studies generally lack automated analysis workflows that integrate data-driven techniques such as computer vision, and scalable quantitative evaluation frameworks based on objective spatial information have not yet been fully established [53].

2.3. Interior Space Design Evaluation Based on Multi-Dimensional Indicators

Evaluation of architectural interior space design is gradually evolving from single-dimensional analysis toward multi-dimensional and multimodal approaches. Existing studies have proposed various indicator systems from perspectives including spatial functionality, visual aesthetics, health-related attributes, and environmental psychology. For example, prior research has examined space utilization efficiency and spatial accessibility from the perspectives of furniture layout and interior spatial configuration [54,55]. At the health-related level, studies have focused on indicators such as lighting conditions, air quality, noise, ventilation, and the configuration of health-related objects to assess their impacts on occupants’ physiological and psychological well-being [56,57,58,59,60,61]. In terms of aesthetics, researchers have developed aesthetic evaluation models through image feature analysis, visual preference quantification, and design element extraction [62,63,64].

The introduction of visual models has further promoted the transition of interior space evaluation from single-dimensional analysis toward integrated multi-dimensional assessment. Studies over the past five years indicate that computer vision– and deep learning–based methods can automatically extract visual features such as spatial layout, compositional relationships, color, and material from interior images, and apply them to the joint evaluation of multiple indicators including functional rationality, aesthetic quality, and emotional perception. This development has contributed to improved objectivity and consistency in evaluation processes [65,66].

Despite growing research on multi-dimensional indicators, existing studies still face three main limitations: the lack of a unified framework integrating functionality, aesthetics, and health-related dimensions; the continued reliance on manual rating scales or localized simulations, with limited large-scale image-based automated analysis; and insufficient attention to real estate display spaces such as sales-office show-units, resulting in the absence of a broadly applicable evaluation system. To address these gaps, this study integrates MCDM theory with computer vision to develop a comprehensive functionality–healthiness–aesthetics evaluation model, providing a technical pathway for quantitative and intelligent interior space assessment.

2.4. Research Gap and Methodological Novelty

To address the above gap, this study develops a data-driven and image-based evaluation framework for residential interior spaces. Computer-vision-based sub-models are used to quantify functionality, healthiness, and aesthetics from interior renderings, and the resulting dimension-level scores are integrated through AHP–TOPSIS to generate a comprehensive score for ranking and grading design schemes.

The methodological novelty of this study lies in three aspects. First, it establishes a unified three-dimensional evaluation structure for interior space quality. Second, it translates abstract evaluation dimensions into computable visual indicators through dedicated sub-models, reducing exclusive reliance on manual scoring and questionnaire-based judgment. Third, it combines computer-vision-based quantitative assessment with AHP–TOPSIS-based multi-criteria fusion in a scalable and reproducible pipeline for large-scale comparison and benchmarking of interior design schemes.

3. Methodology

3.1. Overall Methodological Framework

The overall methodological workflow of the proposed framework is shown in Figure 1. The study consists of five main stages. First, a multidimensional evaluation framework is formulated for interior display units. Second, dataset preparation is conducted by defining the three evaluation dimensions—functionality (FS), healthiness (HS), and aesthetics (AS)—and constructing the FHASID-10K dataset. Third, three computer-vision-based sub-models are developed to generate FS, HS, and AS scores from interior renderings. Fourth, these dimension-specific scores are integrated through an AHP–TOPSIS-based multi-criteria decision-making procedure to obtain the comprehensive evaluation result for each sample. Fifth, the framework is further examined through validation and analysis, including sub-model result validation, GBDT-based indicator contribution prediction, SHAP-based interpretability analysis, and comparison with human subjective scoring. When necessary, model refinement is performed to improve consistency and reliability. Through this process, the proposed framework forms an integrated pipeline from framework formulation and dataset construction to score generation, decision fusion, and validated comprehensive evaluation.

3.2. Performance Criteria

The evaluation dimensions in this study were derived from an extensive review of prior studies on interior space quality, healthy built environments, and visual-perceptual design assessment [7,67,68]. Existing studies generally indicate that interior space quality is multidimensional and, in residential contexts, is commonly reflected in spatial usability, health-supportive environmental conditions, and visual-perceptual experience [19,69]. Based on this literature foundation, the present study defines three first-level dimensions: functionality (FS), healthiness (HS), and aesthetics (AS).

These dimensions were selected because they are both theoretically supported in existing research and visually inferable from interior images, making them suitable for the proposed image-based evaluation framework [55,70]. Specifically, functionality is associated with layout rationality, circulation efficiency, functional organization, and furniture suitability [7,16]; healthiness relates to healthy buildings, indoor environmental quality, and supportive residential environments [10,69]; aesthetics concerns environmental aesthetics, visual preference, style coherence, and perceptual order [19,71].

To further clarify the conceptual scope of each dimension, four second-level indicators were identified for each dimension from the literature. Functionality includes space utilization efficiency, circulation accessibility, functional zoning clarity, and furniture scale appropriateness [55,70]. Healthiness includes natural element provision, rest and social-support provision, daily convenience provision, and hygiene and cleanliness assurance [60,67]. Aesthetics includes color harmony, style coherence, material quality and texture, and visual neatness [4,19]. These second-level indicators define the theoretical structure of the framework; they are not used as independent final scoring variables. The final quantitative inputs to the AHP–TOPSIS model are the three dimension-level scores: FS, HS, and AS. The resulting hierarchical indicator system is summarized in Table 1.

3.3. Multi-Criteria Decision-Making Basis

Interior space quality evaluation is a typical multi-criteria decision-making problem because it involves multiple dimensions with different meanings and relative importance [72,73]. In the present study, functionality, healthiness, and aesthetics jointly determine the overall quality of an interior scheme, but none of these dimensions alone is sufficient to represent comprehensive space quality. A structured decision framework is therefore needed to determine the relative importance of the three dimensions and to synthesize their outputs into a comparable final result.

The Analytic Hierarchy Process (AHP) was selected because it is well suited to hierarchical evaluation systems and provides an interpretable procedure for deriving criterion weights from pairwise judgments. Compared with simple equal weighting, AHP can better reflect the relative importance of different evaluation dimensions and allows the consistency of expert judgments to be tested. This is particularly appropriate for the present study, in which the three dimensions represent conceptually different yet complementary aspects of interior space quality.

The Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) was selected because the final task of this study is to compare and rank a large number of interior design samples according to their overall performance. Compared with simple weighted summation, TOPSIS provides a clearer comparative logic by simultaneously considering the distance of each sample from the positive ideal solution and the negative ideal solution. This makes it suitable for large-scale scheme ranking and grading in a decision-support context.

Accordingly, the proposed framework combines AHP and TOPSIS in a complementary manner. AHP is used to determine the relative weights of the three first-level dimensions, while TOPSIS is used to integrate the dimension-level scores of functionality (FS), healthiness (HS), and aesthetics (AS) into a final comprehensive score. In this way, the framework preserves both the interpretability of expert-informed weighting and the comparative strength of distance-based multi-criteria ranking.

3.4. Comprehensive Scoring Using AHP–TOPSIS

In the present study, AHP and TOPSIS play different but complementary roles: AHP is used to determine the criterion weights of FS, HS, and AS, whereas TOPSIS is used to rank interior samples by integrating the standardized outputs of these three sub-models into a single comprehensive score.

3.4.1. Data Normalization

To integrate the outputs of the three sub-models within a unified AHP–TOPSIS framework, the criterion-level scores of functionality, healthiness, and aesthetics were first organized into a decision matrix. Specifically, for each interior sample i, the three sub-model outputs form a criterion vector:

y_{i} = (FS_{i}, HS_{i}, AS_{i})

(1)

where FS_{i}, HS_{i}, and AS_{i} denote the raw functionality, healthiness, and aesthetics scores of the i-th sample, respectively. Collecting all samples yields the initial decision matrix

X = [x_{ij}]_{m \times n}, \quad n = 3

(2)

where m is the number of samples and the three columns correspond to FS, HS, and AS.

Because the three sub-model outputs may differ in numerical range and dispersion, min–max normalization was applied column-wise before TOPSIS integration so as to ensure comparability across criteria. The normalized value is computed as

x_{ij}^{'} = \frac{x_{ij} - \min (x_{j})}{\max (x_{j}) - \min (x_{j})}

(3)

where x_{ij} denotes the raw value of the i-th sample under the j-th criterion, and x_{ij}^{'} is the corresponding normalized value. For non-beneficial criteria, the normalization can be expressed as

x_{ij}^{'} = \frac{\max (x_{j}) - x_{ij}}{\max (x_{j}) - \min (x_{j})}

(4)

In the present study, the three criterion-level inputs used in TOPSIS (FS, HS, and AS) are all beneficial attributes, meaning that larger values indicate better performance. Therefore, Equation (3) is applied to all three criteria. After normalization, each sample is represented by the vector (x_{i1}^{'}, x_{i2}^{'}, x_{i3}^{'}), corresponding to the normalized scores of FS, HS, and AS, respectively. These normalized criterion values constitute the direct inputs to the subsequent TOPSIS procedure.
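The column-wise normalization of Equation (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `min_max_normalize`, the sample values, and the handling of a constant column (mapped to 0.0) are assumptions introduced here, not details taken from the study.

```python
from typing import List


def min_max_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Column-wise min-max normalization for beneficial criteria (Equation (3))."""
    n_cols = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_cols)]
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]

    normalized = []
    for row in matrix:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column has a zero denominator in
            # Equation (3); it is mapped to 0.0 here for illustration.
            new_row.append((value - col_min[j]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized


# Hypothetical raw (FS, HS, AS) scores for three samples, not from the paper.
raw_scores = [
    [62.0, 3.0, 0.71],
    [80.0, 5.0, 0.64],
    [71.0, 1.0, 0.90],
]
normalized_scores = min_max_normalize(raw_scores)
```

Each column is rescaled independently, so criteria with very different raw ranges end up on the same [0, 1] scale before distance calculation.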

3.4.2. AHP-Based Weight Determination

AHP was used to derive the relative weights of the three evaluation dimensions and their associated sub-criteria. Pairwise comparison judgments were collected from 15 respondents, including 7 designers, 5 real estate practitioners, and 3 user representatives, so as to integrate perspectives from design expertise, housing practice, and user experience. To protect personal privacy, individual identities are not disclosed, and only aggregated background information is reported in Table 2.

The judgments were elicited using the Saaty 1–9 scale. One pairwise comparison matrix was constructed for the criterion level (B1–B3), and three additional matrices were constructed for the indicator level under B1, B2, and B3, respectively. The individual judgments were aggregated using the geometric mean method to obtain group decision matrices. The priority vectors were then calculated by the eigenvalue method, and matrix consistency was assessed using the maximum eigenvalue (λ_{max}), the consistency index (CI), and the consistency ratio (CR). A matrix was considered acceptable when CR < 0.10.
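The aggregation and consistency steps described above can be illustrated with a small standard-library sketch. The two judgment matrices below are hypothetical, the function names are introduced only for this example, and the principal eigenvector is approximated by power iteration rather than by whatever solver the authors used. The random index values (0.58 for n = 3, 0.90 for n = 4) are the commonly cited Saaty values and are assumed here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Commonly cited Saaty random index values (assumed, not stated in the text).
RANDOM_INDEX = {3: 0.58, 4: 0.90}


def aggregate_geometric_mean(judgments: List[Matrix]) -> Matrix:
    """Element-wise geometric mean of individual pairwise comparison matrices."""
    n, k = len(judgments[0]), len(judgments)
    return [
        [math.prod(m[i][j] for m in judgments) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]


def priority_vector(matrix: Matrix, iterations: int = 100) -> Tuple[List[float], float]:
    """Approximate the principal eigenvector and lambda_max by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [p / total for p in product]
    product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max: float, n: int) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Two hypothetical respondents comparing B1, B2, B3 on the Saaty scale.
respondent_a = [[1, 2, 3], [1 / 2, 1, 2], [1 / 3, 1 / 2, 1]]
respondent_b = [[1, 1, 2], [1, 1, 2], [1 / 2, 1 / 2, 1]]

group = aggregate_geometric_mean([respondent_a, respondent_b])
weights, lambda_max = priority_vector(group)
cr = consistency_ratio(lambda_max, n=3)
is_acceptable = cr < 0.10
```

The geometric mean is used for aggregation because it preserves the reciprocal property of pairwise comparison matrices, which the arithmetic mean does not.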

Table 3 reports the aggregated pairwise comparison matrix and consistency test results for the criterion level. Functionality (B1) received the highest weight (0.4025), followed by Healthiness (B2) (0.3591) and Aesthetics (B3) (0.2384). To keep the indicator system readable and traceable, Table 4 summarizes the supporting references, local weights, and consistency statistics of the indicator-level sub-criteria within each first-level dimension. All matrices satisfied the consistency requirement (CR < 0.10), indicating that the expert judgments were sufficiently reliable for subsequent AHP–TOPSIS integration. Detailed AHP aggregation results for the indicator-level sub-criteria are reported in Appendix A.

3.4.3. Ideal Solution Definition

In TOPSIS, the performance of each alternative is evaluated according to its relative proximity to the positive ideal solution (PIS) and remoteness from the negative ideal solution (NIS). In the present study, each interior sample is treated as an alternative, and the normalized decision matrix is given by

X^{'} = [x_{ij}^{'}]_{m \times n}, \quad n = 3

(5)

where the three criteria correspond to functionality, healthiness, and aesthetics.

Since all three criteria are beneficial, the PIS is defined as the best attainable combination of criterion values across all samples, that is, the column-wise maxima of the normalized matrix:

A^{+} = \{ \max_{i} (x_{ij}^{'}) \mid j = 1, 2, \dots, n \}

(6)

Similarly, the NIS is defined as the worst attainable combination, namely the column-wise minima:

A^{-} = \{ \min_{i} (x_{ij}^{'}) \mid j = 1, 2, \dots, n \}

(7)

For the present three-criterion setting, the above definitions can be written explicitly as

A^{+} = (A_{FS}^{+}, A_{HS}^{+}, A_{AS}^{+})

(8)

and

A^{-} = (A_{FS}^{-}, A_{HS}^{-}, A_{AS}^{-})

(9)

where A_{FS}^{+}, A_{HS}^{+}, and A_{AS}^{+} denote the maximum normalized values of FS, HS, and AS, respectively, and A_{FS}^{-}, A_{HS}^{-}, and A_{AS}^{-} denote the corresponding minima. In this way, the ideal solutions are determined directly from the standardized outputs of the three sub-models and provide the reference points for subsequent distance calculation.
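As a brief worked note, assuming every criterion column contains at least two distinct values, the column-wise min-max normalization of Equation (3) maps each column maximum to 1 and each column minimum to 0. Under that assumption, the two ideal solutions reduce to constant vectors:

```latex
A^{+} = (1, 1, 1), \qquad A^{-} = (0, 0, 0)
```

A constant column would make the denominator of Equation (3) zero and would need separate handling; that case is not discussed in the text.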

3.4.4. Distance and Comprehensive Score Calculation

After the normalized decision matrix and the two ideal solutions are obtained, the AHP-derived criterion weights are incorporated into TOPSIS to reflect the relative importance of the three evaluation dimensions. Let

w = (w_{FS}, w_{HS}, w_{AS})

(10)

denote the weight vector of the three criterion-level scores. According to the AHP results reported in Table 3, the weights of functionality, healthiness, and aesthetics are 0.4025, 0.3591, and 0.2384, respectively.

The weighted Euclidean distance from the i-th sample to the PIS is calculated as

D_{i}^{+} = \sqrt{\sum_{j = 1}^{n} w_{j} (x_{ij}^{'} - A_{j}^{+})^{2}}

(11)

and the weighted Euclidean distance to the NIS is

D_{i}^{-} = \sqrt{\sum_{j = 1}^{n} w_{j} (x_{ij}^{'} - A_{j}^{-})^{2}}

(12)

For the present three-criterion setting, these two distances can be written explicitly as

D_{i}^{+} = \sqrt{w_{FS} (x_{i1}^{'} - A_{FS}^{+})^{2} + w_{HS} (x_{i2}^{'} - A_{HS}^{+})^{2} + w_{AS} (x_{i3}^{'} - A_{AS}^{+})^{2}}

(13)

D_{i}^{-} = \sqrt{w_{FS} (x_{i1}^{'} - A_{FS}^{-})^{2} + w_{HS} (x_{i2}^{'} - A_{HS}^{-})^{2} + w_{AS} (x_{i3}^{'} - A_{AS}^{-})^{2}}

(14)

where x_{i1}^{'}, x_{i2}^{'}, and x_{i3}^{'} are the normalized FS, HS, and AS scores of the i-th sample, respectively.

The final comprehensive score is then expressed by the closeness coefficient:

CS_{i} = \frac{D_{i}^{-}}{D_{i}^{+} + D_{i}^{-}}

(15)

where CS_{i} \in [0, 1]. A larger CS_{i} indicates that the corresponding sample is closer to the positive ideal solution and farther from the negative ideal solution, and therefore exhibits better overall interior space quality. All samples are finally ranked according to CS_{i} to support comparative evaluation across different space types and design styles.
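The distance and closeness calculations in Equations (11), (12), and (15) can be sketched as follows. The weights are those reported in the text; the function name `closeness_scores`, the three normalized samples, and the neutral value returned when both distances are zero are assumptions made only for illustration.

```python
import math
from typing import List, Sequence

# AHP weights for FS, HS, and AS as reported in the text.
WEIGHTS = (0.4025, 0.3591, 0.2384)


def closeness_scores(
    normalized: List[Sequence[float]],
    weights: Sequence[float] = WEIGHTS,
) -> List[float]:
    """Closeness coefficient CS_i for each row of a normalized decision matrix."""
    n = len(weights)
    # Positive and negative ideal solutions: column-wise maxima and minima.
    pis = [max(row[j] for row in normalized) for j in range(n)]
    nis = [min(row[j] for row in normalized) for j in range(n)]

    scores = []
    for row in normalized:
        # The weight multiplies each squared difference inside the root,
        # following Equations (11) and (12).
        d_plus = math.sqrt(sum(w * (x - p) ** 2 for w, x, p in zip(weights, row, pis)))
        d_minus = math.sqrt(sum(w * (x - q) ** 2 for w, x, q in zip(weights, row, nis)))
        total = d_plus + d_minus
        # Assumption: if all samples coincide, both distances are zero;
        # 0.5 is returned as a neutral placeholder.
        scores.append(d_minus / total if total > 0 else 0.5)
    return scores


# Hypothetical normalized (FS, HS, AS) rows, not taken from the paper.
normalized_samples = [
    [1.0, 1.0, 1.0],  # best on every criterion
    [0.0, 0.0, 0.0],  # worst on every criterion
    [0.5, 0.5, 0.5],  # midpoint on every criterion
]
scores = closeness_scores(normalized_samples)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Sorting the sample indices by descending score gives the ranking used for comparison; in this toy input the all-best sample ranks first and the all-worst sample last.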

4. Framework Implementation

4.1. Dataset

To validate the proposed multi-dimensional evaluation framework for interior spaces, we constructed a large-scale interior rendering dataset named the Functionality–Healthiness–Aesthetics Spatial Interior Dataset-10K (FHASID-10K). The dataset was developed specifically for this study because the framework requires systematic comparison across both functional space types and interior design styles, whereas existing public interior image datasets are mainly designed for scene recognition, object detection, or semantic understanding and do not provide a dual-label structure aligned with the present evaluation task. Therefore, a task-oriented dataset was needed to support large-scale scoring, cross-category comparison, and framework-level validation.

FHASID-10K contains 13,962 interior rendering images annotated along two dimensions: seven typical residential functional space types and seven mainstream interior design styles. Their combinations form 49 style-space categories, providing a structured basis for comparative analysis across different residential interior scenarios. Representative examples are shown in Figure 2.

The images were collected from several mainstream online interior design platforms, including 3d66, zjusteasy, and znzmo, using web crawling techniques. Only publicly accessible, high-quality residential renderings were retained. More than 20,000 candidate images were initially collected, and a multi-stage cleaning process was applied to remove visually unclear samples, duplicated or near-duplicated images, samples with inconsistent stylistic presentation, and images that did not match the intended keywords or category definitions. After screening, 13,962 images were retained for subsequent analysis.

The labeling procedure was semi-automated. Preliminary labels for functional space and design style were first assigned according to webpage metadata, including search keywords, page titles, and source descriptions. These labels were then manually reviewed and corrected by the research team, while ambiguous cases were checked against predefined category definitions to ensure consistency. Images with mixed styles, unclear functional attributes, or insufficient visual evidence were excluded. This keyword-based pre-screening plus manual verification strategy improved both annotation efficiency and labeling reliability.

To enhance dataset validity, the selected categories were restricted to common residential show-unit scenarios frequently encountered in interior design practice. This ensured alignment with the analytical units of the proposed framework, namely functional space types, design styles, and their combinations. FHASID-10K is composed of professionally produced renderings rather than real photographs, which improves visual consistency for large-scale benchmarking but may limit direct generalization to real-world interiors with stronger noise, occlusion, and domain variability.

FHASID-10K was not used as a unified end-to-end supervised dataset for jointly training all three sub-models. Instead, it serves as a task-specific image set for batch scoring, cross-category comparison, and framework-level validation. Therefore, no single global training/validation/test split was defined for the full dataset. A formal data split was introduced only in the GBDT-based surrogate validation stage, where the data were divided into training and test sets at an 80/20 ratio, followed by five-fold cross-validation on the training set.

4.2. Experimental Design and Workflow

The experimental procedure consisted of four steps. First, FHASID-10K was constructed and organized by functional space type and design style. Second, the three sub-models were applied to each image to obtain functionality (FS), healthiness (HS), and aesthetics (AS) scores. Third, the three dimension-level scores were normalized and integrated through AHP–TOPSIS to generate the comprehensive score (CS). Finally, the resulting scores were analyzed through cross-category comparison, distribution analysis, and validation tests, including GBDT-based surrogate prediction, SHAP-based interpretability analysis, subjective agreement assessment, and sensitivity analysis.

4.3. Sub-Model Construction

The three sub-models were implemented using different technical strategies according to the target task. The functionality sub-model adopts OpenCV-based image processing and rule-based geometric analysis to quantify spatial utilization and circulation-related characteristics, because these indicators depend mainly on explicit spatial structure rather than semantic recognition. For the healthiness dimension, YOLOv5 was used to detect visually identifiable health-supportive elements, offering a practical balance between detection accuracy, inference efficiency, implementation maturity, and reproducibility. For the aesthetics dimension, OpenAI CLIP was adopted because aesthetic evaluation depends more on high-level visual-semantic representation than on simple object recognition. Compared with conventional CNN-based models trained on closed visual categories, CLIP provides stronger semantic alignment and transferability across diverse interior styles and scenes. Overall, these tools were selected for their task relevance, interpretability, transferability, and reproducibility within the proposed framework.

4.3.1. Functionality Sub-Model (FS)

To quantify the rationality and accessibility of space use in interior sample-room renderings, we develop a functionality sub-model (FS) by decomposing functionality into two complementary components: space utilization (SU) and accessibility (ACC). SU characterizes the degree of spatial occupancy and layout compactness, whereas ACC reflects the potential obstruction of circulation paths caused by object distribution. To keep the model structure lightweight and interpretable, equal weights are assigned to the two components and linearly fused to produce the overall functionality score:

FS = w_{SU} \cdot SU + w_{ACC} \cdot ACC, w_{SU} = w_{ACC} = 0.5

(16)

(1): Image preprocessing and binary occupancy mask

All input renderings are first standardized through preprocessing. Each image is converted to RGB format and processed using an OpenCV pipeline, and then transformed into grayscale to reduce computational complexity. A fixed threshold T = 200 is applied for binarization: pixels with grayscale values greater than T are treated as blank background. The result is then inverted to obtain a binary occupancy mask B∈{0,1}, where the foreground (white) denotes effective occupied regions in the space (e.g., furniture and functional components), and the background (black) denotes relatively empty areas. This binary mask is used to compute the occupancy ratio and obstacle complexity.

(2): Space utilization score (SU)

The space utilization score SU is defined based on the foreground pixel proportion R in the binary mask:

R = \frac{\left| \{ (x, y) \mid B(x, y) = 1 \} \right|}{|\Omega|}

(17)

where |Ω| is the total number of pixels in the image. Overly low occupancy may indicate insufficient functional provisioning, whereas overly high occupancy may imply congestion and irrational layouts, so the ideal (reference) occupancy ratio is set to R* = 0.5. A symmetric penalty function is used to penalize deviations of R from this ideal, yielding:

SU = \max(0, 100 - 200 \cdot |R - 0.5|)

(18)

This formulation achieves the maximum score at R = 0.5 and decreases linearly as the occupancy ratio deviates from the ideal value. The score is bounded within the range of 0–100.
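As a minimal sketch of Eqs. (17) and (18), the following standard-library Python computes the occupancy ratio from a grayscale grid and maps it to SU. The nested-list image and the function names are illustrative assumptions; the actual pipeline described above operates on OpenCV images.

```python
def occupancy_ratio(gray, threshold=200):
    """Fraction of pixels counted as occupied foreground.

    Pixels brighter than the threshold are blank background, so after
    inversion the occupied pixels are those at or below the threshold.
    """
    total = sum(len(row) for row in gray)
    occupied = sum(1 for row in gray for value in row if value <= threshold)
    return occupied / total


def space_utilization(ratio, ideal=0.5):
    """Symmetric linear penalty around the ideal occupancy ratio (Eq. 18)."""
    return max(0.0, 100.0 - 200.0 * abs(ratio - ideal))


# Toy 4x4 grayscale image: 6 of the 16 pixels are darker than the threshold.
toy = [
    [255, 255, 40, 40],
    [255, 255, 40, 40],
    [255, 120, 120, 255],
    [255, 255, 255, 255],
]
ratio = occupancy_ratio(toy)
su = space_utilization(ratio)
```

Here the occupancy ratio is 6/16 = 0.375, which lies 0.125 below the ideal and therefore costs 25 points, giving SU = 75.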

(3): Accessibility score (ACC)

The accessibility score ACC is designed to capture the influence of potential obstacles on circulation organization. Contours are extracted from the binary mask, and the number of detected contours is denoted as N. In general, a larger number of contours implies more fragmented object boundaries and more frequent potential obstacles, thereby increasing the risk of impeded movement. Accordingly, ACC is defined as a decreasing function of N, with a lower bound to avoid instability caused by extremely low scores:

ACC = \max (50, 100 - 2 N)

(19)

When the number of contours is large (N ≥ 25), ACC is held at its minimum of 50, which improves robustness to extreme cases.

(4): Overall functionality score (FS)

Finally, the overall functionality score FS is obtained by equally weighting and linearly combining SU and ACC:

FS = 0.5 \cdot SU + 0.5 \cdot ACC

(20)

All functionality-related scores are normalized to the range of 0–100 and reported with two decimal places. They are then used as the functionality-dimension inputs for subsequent multi-dimensional integration and validation in the comprehensive evaluation model. Figure 3 presents the workflow of the proposed functionality sub-model.
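The accessibility rule and the final fusion can be sketched as below. The contour count is taken as a plain integer input; in practice it would come from a contour-extraction step such as OpenCV's `cv2.findContours`, whose retrieval mode is not specified in the text. The function names and the parameter name `k` for the penalty coefficient are illustrative.

```python
def accessibility(n_contours, k=2.0, floor=50.0):
    """Decreasing function of the contour count with a lower bound (Eq. 19)."""
    return max(floor, 100.0 - k * n_contours)


def functionality(su, acc, w_su=0.5, w_acc=0.5):
    """Equal-weight linear fusion of SU and ACC, reported to two decimals (Eq. 20)."""
    return round(w_su * su + w_acc * acc, 2)


acc_moderate = accessibility(10)  # 100 - 2 * 10
acc_crowded = accessibility(40)   # would be 20, so the floor of 50 applies
fs = functionality(75.0, acc_moderate)
```

Because ACC never drops below 50 and SU never drops below 0, FS under these rules is bounded between 25 and 100.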

It should be noted that the FS sub-model is a rule-based computational module rather than a trainable machine learning model. Its reproducibility is ensured through explicit algorithmic rules, fixed processing steps, and deterministic parameter settings. Specifically, the fixed threshold was adopted because the dataset consists of professionally produced interior renderings with relatively consistent brightness and limited visual noise; the reference occupancy ratio, denoted by R*, was set to 0.5 as a balanced midpoint between under-furnished and over-crowded layouts; and the contour-based accessibility term was intended as a lightweight proxy for layout fragmentation and potential circulation obstruction rather than a direct simulation of pedestrian movement.

4.3.2. Healthiness Sub-Model (HS)

Within the comprehensive evaluation of interior spaces, healthiness (HS) is used to characterize the configuration of environmental elements that are relevant to physical and mental well-being, daily routines, and hygiene assurance. To enable computable feature extraction from interior renderings, we define a health-related object set by leveraging the predefined object categories in the COCO dataset and aligning them with typical space-use behaviors and health-related scenarios. This object set covers: (i) nature and affective restoration elements (e.g., potted plant, vase); (ii) cognitive and psychological restoration cues (e.g., book, tv); (iii) diet- and nutrition-related objects (e.g., dining table, bowl, fruits); (iv) rest- and social-supporting furniture (e.g., chair, couch, bed); and (v) hygiene- and cleaning-related facilities (e.g., sink, toilet). Together, these elements constitute the visual and quantifiable basis for representing the healthiness dimension. Based on this mapping, we establish a unified health-related visual label set (Table 5), which includes 18 target categories.

To operationalize the HS dimension, we construct a healthiness sub-model based on object detection and rule-based score aggregation. Specifically, a YOLOv5 detector implemented in the Ultralytics framework with publicly available COCO-pretrained weights (e.g., yolov5s.pt) is used to identify health-related elements in interior renderings. Owing to its favorable balance between inference efficiency and detection performance across 80 common object categories, the model is well suited for batch processing of large-scale image samples. For each image, the detector outputs bounding boxes, class indices, and confidence scores, which are then mapped to category names and matched against the predefined health-related label set. The HS value is computed by counting the matched health-related objects and assigning 10 points to each detected item, with the final score capped at 100. Since the detector is employed directly for inference rather than retrained or fine-tuned on a task-specific interior-health dataset, training-specific indicators such as iteration numbers, loss curves, and task-specific mAP are not applicable in the present study. Through this process, the HS sub-model converts object-level semantic cues into a computable and comparable quantitative indicator of health-related environmental support.

For scoring, the healthiness score (Health Score, HS) is defined as a linear accumulation based on the presence of health-related categories. Let N_h denote the number of detected categories that belong to the health-related object set in a given image. The healthiness score is computed as:

HS = \min (100, 10 \times N_{h})

(21)

This formulation provides an intuitive measure of the richness of health-related elements in the space, while the upper bound ensures comparability and score stability across samples. The resulting HS is normalized to the range of 0–100 and used as the healthiness-dimension input for subsequent multi-dimensional integration and validation in the comprehensive evaluation model. Figure 4 illustrates the workflow of the proposed healthiness sub-model.
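The aggregation rule in Eq. (21) can be sketched as follows. Several points are assumptions made for illustration: the label set below contains only the example categories named in the text (the full 18-category set is given in Table 5), the confidence cutoff `min_conf` is hypothetical, and N_h is read as the number of distinct matched categories, so repeated instances of one category count once. Detections are represented as plain `(class_name, confidence)` pairs rather than detector output objects.

```python
# Partial, illustrative label set using COCO class names mentioned in the text.
HEALTH_LABELS = {
    "potted plant", "vase", "book", "tv", "dining table", "bowl",
    "chair", "couch", "bed", "sink", "toilet",
}


def healthiness_score(detections, min_conf=0.25):
    """HS = min(100, 10 * N_h), with N_h the number of distinct matched categories."""
    matched = {
        name
        for name, conf in detections
        if conf >= min_conf and name in HEALTH_LABELS
    }
    return min(100, 10 * len(matched))


# Hypothetical detections for one rendering.
example_detections = [
    ("chair", 0.91),
    ("chair", 0.84),         # second chair: same category, counted once
    ("potted plant", 0.77),
    ("tv", 0.62),
    ("laptop", 0.88),        # not in the health-related set
    ("vase", 0.12),          # below the hypothetical confidence cutoff
]
hs = healthiness_score(example_detections)
```

Three distinct health-related categories survive the filter (chair, potted plant, tv), so the sketch returns HS = 30.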

4.3.3. Aesthetics Sub-Model (AS)

The aesthetics dimension aims to characterize the visual quality and overall perceptual impression of interior renderings. Because aesthetic evaluation is inherently subjective and difficult to express through explicit rule-based criteria, the aesthetics sub-model (AS) adopts a CLIP-based feature extraction and regression framework. Specifically, a pretrained CLIP model (ViT-L/14) is used to extract image features, which are then fed into an MLP regressor to generate continuous aesthetic scores. In the present study, this module is used as a pretrained inference component rather than being retrained within the current framework.

(1): Feature extraction with CLIP

Each input image is converted to RGB format and processed using the official CLIP preprocessing pipeline. The image is then encoded by CLIP to obtain a feature vector f, which is further L2-normalized and used as the input to the regression module.

(2): Aesthetic regression with an MLP

A PyTorch-based MLP regressor is used to predict a continuous aesthetic score from the CLIP feature vector. The network adopts a 768–1024–128–64–16–1 architecture with ReLU activations in the hidden layers and a single-neuron output layer for scalar regression. The corresponding trained weight file is directly loaded to form the complete feature-regression inference pipeline. Since the present study uses this regressor as a pretrained inference module, training-specific settings and validation indicators such as optimizer configuration, epoch number, and validation-set R² and MSE are not reported as newly generated results in this work.

(3): Score normalization

To ensure comparability across samples, the raw regressor output S is first constrained to an empirical interval [S_min, S_max], set to [0.13, 0.19] in this study, and then linearly scaled to the range [0, 10]:

AS = 10 \cdot \frac{\operatorname{clip}(S, S_{\min}, S_{\max}) - S_{\min}}{S_{\max} - S_{\min}}

(22)

The resulting AS is used as the aesthetics-dimension input for subsequent integration and validation. To improve reproducibility, random seeds are fixed and deterministic settings are adopted during model inference. Figure 5 illustrates the workflow of the aesthetics sub-model.
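The clip-and-rescale step in Eq. (22) can be sketched as a single function. The interval bounds are those reported above; the function name and the example inputs are illustrative.

```python
def normalize_aesthetic(raw, s_min=0.13, s_max=0.19):
    """Clip the raw regressor output to [s_min, s_max], then scale to [0, 10]."""
    clipped = min(max(raw, s_min), s_max)
    return 10.0 * (clipped - s_min) / (s_max - s_min)


as_low = normalize_aesthetic(0.10)   # below the interval, clipped to s_min
as_mid = normalize_aesthetic(0.16)   # midpoint of the interval
as_high = normalize_aesthetic(0.25)  # above the interval, clipped to s_max
```

Raw outputs outside the empirical interval saturate at 0 or 10, so any variation beyond those bounds is not passed on to the fusion stage.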

To improve the transparency and reproducibility of the implementation, the main software tools used for the three sub-models are reported here. These include Python (version 3.10.16), OpenCV (version 4.13.0.92), NumPy (version 2.2.5), pandas (version 2.2.3), Pillow (version 11.2.1), the Ultralytics framework for YOLOv5 inference (version 8.3.230), PyTorch (version 2.9.1), and the CLIP model (ViT-L/14, OpenAI).

5. Results and Validation

5.1. Results of Functionality Scoring

Figure 6a reports the mean functionality scores (FS) across seven space types under different design styles. Overall, the average FS values fall within a relatively narrow band across both space types and styles, indicating that the proposed functionality sub-model yields stable scoring behavior on the large-scale dataset. Among the seven space types, entrance halls exhibit a slightly higher mean FS, which is consistent with their role as circulation-intensive transition areas that typically favor compact layouts and clearer movement paths. In contrast, the remaining space types show comparable mean FS levels, suggesting that the dataset contains broadly balanced functional configurations.

Across design styles, the mean FS differences are generally modest compared with the differences observed across space types. This implies that functionality is more strongly associated with spatial typology and layout organization than with stylistic expression, which primarily influences visual appearance rather than circulation structure. Meanwhile, the variation reflected by the error bars indicates that the framework can capture functional diversity within each space type, likely arising from differences in layout compactness and circulation obstruction.

Figure 6b further illustrates the joint distribution of space utilization (SU) and accessibility (ACC), with the color/size encoding reflecting the resulting FS. Higher FS values are predominantly associated with simultaneously higher SU and higher ACC, whereas samples with constrained accessibility tend to exhibit lower FS even when utilization is relatively high. This pattern is consistent with the model design—FS is explicitly constructed from SU and ACC—and aligns with common interior design principles that emphasize both effective space use and unobstructed circulation.

5.2. Results of Healthiness Scoring

Healthiness scores are derived from detected visual elements related to environmental cues, psychological restoration, diet and nutrition, social support, and hygiene facilities. Figure 7 shows clear distributional differences in healthiness scores across space types, reflecting variations in the composition of health-related visual elements. Dining rooms and living rooms exhibit a higher proportion of samples in the high-score range; this is mainly because these spaces typically contain more positively associated health-related elements—such as dining tables, chairs, fruits and televisions—which not only support diet and nutrition needs but also facilitate leisure and social interaction.

Living rooms, dining rooms, and kitchens generally exhibit higher score distributions, while entrance halls, bedrooms, and bathrooms tend to show lower or moderate values. These differences are consistent with the functional characteristics and typical element configurations of each space type.

As illustrated in Figure 8, healthiness scores within each space category vary smoothly along ranked samples, with relatively narrow confidence intervals. This indicates stable scoring behavior and suggests that the healthiness sub-model produces consistent outputs across large-scale samples without abrupt fluctuations.

5.3. Results of Aesthetics Scoring

Figure 9 illustrates the distribution of aesthetic scores across different functional space types. The predicted scores exhibit a continuous distribution and are mainly concentrated in the mid-to-high range, indicating stable predictive behavior under large-sample conditions. Entrance halls tend to achieve higher aesthetic scores, while bedrooms show relatively lower and more stable distributions. This suggests that the model produces stable outputs for spaces with stronger privacy constraints and less compositional variation, while their overall aesthetic scores tend to be lower than those of more “refined” spaces with richer functional and visual elements.

The heatmaps in Figure 10 further reveal systematic differences in mean aesthetic scores across space–style combinations on three major design platforms. Across all platforms, entrance halls consistently receive higher scores, whereas bedrooms exhibit lower values. These trends are consistent with common residential design logic and visual perception patterns.

Overall, the results indicate that the proposed aesthetics model captures both global aesthetic tendencies and space-specific variations, while maintaining stable relative ranking behavior across large-scale samples.

5.4. Comprehensive Score Results and Visualization

Figure 11 summarizes the distribution characteristics of the comprehensive score (CS) derived from the AHP–TOPSIS integration. As shown in Figure 11a, the CS values of the first 200 samples remain largely within 0.3–0.6 and exhibit only local oscillations, indicating stable behavior without abrupt jumps or abnormal collapses. Based on predefined thresholds, Figure 11b reports that the medium-quality group (0.4 ≤ CS ≤ 0.7) accounts for 51.4% of samples and the low-quality group (CS < 0.4) for 48%, while high-quality cases (CS > 0.7) are rare (0.6%). Figure 11c further shows that CS values over all 13,962 renderings are mainly concentrated in the interval 0.35–0.50 and form an approximately bell-shaped distribution, suggesting a continuous and well-calibrated scoring gradient suitable for large-scale ranking and stratification. Overall, these results indicate that the proposed comprehensive scoring model produces stable, continuous, and interpretable scores and avoids overestimating the upper tail, supporting its use for benchmarking interior design alternatives across space types and design styles.
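The grading rule behind Figure 11b can be sketched as follows, using the thresholds stated above. The function names and the four example scores are illustrative and are not drawn from the dataset.

```python
from collections import Counter


def quality_grade(cs):
    """Map a comprehensive score to low (< 0.4), medium (0.4-0.7), or high (> 0.7)."""
    if cs < 0.4:
        return "low"
    if cs <= 0.7:
        return "medium"
    return "high"


def grade_shares(cs_values):
    """Share of samples falling into each quality grade."""
    counts = Counter(quality_grade(v) for v in cs_values)
    n = len(cs_values)
    return {grade: counts.get(grade, 0) / n for grade in ("low", "medium", "high")}


shares = grade_shares([0.35, 0.45, 0.50, 0.72])
```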

Furthermore, Figure 12 presents a correlation heatmap between the three criterion-level indicators and the comprehensive score, providing a statistical view of the internal relationship structure among model dimensions. The results show that the pairwise correlation coefficients among functionality (FS), healthiness (HS), and aesthetics (AS) are all close to zero (|r| < 0.15), indicating that these indicators remain relatively independent within the evaluation system and that no pronounced information redundancy or dimension coupling is present. On this basis, the comprehensive score CS_i exhibits a strong positive correlation with healthiness (r = 0.80), a moderate positive correlation with functionality (r = 0.50), and a comparatively weak correlation with aesthetics (r = 0.18). These results suggest that variations in the comprehensive score are primarily driven jointly by the healthiness and functionality dimensions, whereas aesthetics plays a more complementary role. Statistically, this pattern supports the rationality of the AHP-derived weight allocation and the overall model structure.

Taken together, these results indicate that the proposed AHP–TOPSIS model yields stable and interpretable score distributions, with low indicator redundancy and broad consistency with the intended weighting structure.

5.5. Model Validation and Robustness Assessment

To evaluate the reliability and robustness of the proposed AHP–TOPSIS comprehensive evaluation model, a multi-layer validation strategy was adopted, including GBDT-based surrogate prediction [74], SHAP-based interpretability analysis, questionnaire-based subjective agreement testing [75], and sensitivity analysis of key parameter settings. Together, these analyses provide complementary evidence for the stability, interpretability, and plausibility of the proposed framework.

The validation, statistical analysis, sensitivity analysis, and visualization procedures in this section were implemented in Python (version 3.10.16) using NumPy (version 2.2.5), pandas (version 2.2.3), matplotlib (version 3.10.3), seaborn (version 0.13.2), scikit-learn (version 1.7.2), SciPy (version 1.15.3), and SHAP (version 0.49.1). In addition, the sensitivity analysis of heuristic parameter settings in the functionality sub-model was conducted using OpenCV (version 4.13.0.92) and Pillow (version 11.2.1) for image processing and batch recalculation of FS scores.

5.5.1. Surrogate Learning Check Using Gradient Boosting Decision Trees (GBDT)

A GBDT model was used as a surrogate to examine whether the AHP–TOPSIS fusion produces a stable and learnable mapping from FS, HS, and AS to the comprehensive score. Because the comprehensive score is deterministically derived from these three indicators, the analysis focuses on predictive stability, residual behavior, and consistency with the intended weighting structure.

As shown in Figure 13a, the residuals are centered close to zero without evident heteroscedasticity or structured bias, suggesting stable surrogate approximation across the score range. The calibration plot in Figure 13b further shows close agreement between the surrogate predictions and the AHP–TOPSIS scores. Quantitatively, the model achieves high test-set performance (R² = 0.9992; MSE = 0.000011) and maintains highly consistent results in five-fold cross-validation (mean R² = 0.9991), which is consistent with the deterministic nature of the fusion rule.

Figure 13c compares GBDT-derived feature importance with AHP weights. Although the two are not expected to match exactly in magnitude, both indicate stronger contributions from functionality and healthiness, while aesthetics shows a smaller contribution. This result provides a plausibility check that the implemented fusion broadly follows the intended prioritization.

5.5.2. SHAP-Based Interpretability Analysis and Feature Contribution

To improve interpretability, SHAP was introduced to analyze how individual indicators relate to the GBDT surrogate predictions. By estimating the marginal contribution of each feature across samples, SHAP provides an interpretable audit of the integrated evaluation results.

Figure 14a presents the global SHAP summary. Across FS, HS, and AS, higher indicator values generally correspond to larger SHAP contributions, indicating that each dimension tends to increase the predicted comprehensive score in the expected direction. No obvious contradictory trends are observed within the data range, suggesting a coherent monotonic response pattern. In terms of relative influence, FS and HS show larger SHAP magnitudes than AS, which is broadly consistent with the AHP-derived weighting structure.

Figure 14b further shows SHAP dependence patterns for the three criterion-level indicators. The SHAP values generally increase with the corresponding indicator values, supporting the monotonic contribution assumption embedded in the evaluation design. Compared with AS, FS and HS exhibit larger effect ranges, whereas AS shows a weaker but still predominantly positive influence. Overall, the SHAP results complement the quantitative validation by making contribution directions and relative influence levels explicit.

5.5.3. Agreement Between Model Scores and Subjective Ratings

To further examine agreement between model outputs and human spatial perception, subjective ratings were introduced as a third-layer validation. Based on the model-derived comprehensive scores, 21 representative interior show-unit renderings were selected from three quality levels (low, medium, and high) to construct a questionnaire for subjective evaluation. To reduce order effects, the 21 images were presented in an interleaved sequence across quality levels (Table 6).

Subjective evaluation data were collected through an online questionnaire, yielding 175 valid responses, including 38 experts, 38 industry practitioners, and 99 general users. Participants rated each interior rendering independently across four dimensions: functionality, healthiness, aesthetics, and overall spatial quality. To evaluate model validity, agreement between model outputs and subjective ratings was assessed at both the image level and the individual-rating level (Figure 15). At the image level, mean subjective ratings were computed for each image and compared with the corresponding model scores using Pearson correlation analysis.

The results show clear dimension-dependent differences. Functionality exhibits a moderate positive correlation with subjective ratings (r = 0.513, p = 0.017), suggesting that the functionality sub-model can partially reflect human judgments of spatial usability. Healthiness shows a higher level of agreement (r = 0.629, p = 0.002), indicating relatively stable identification of health-related spatial cues. In contrast, aesthetics shows a weak and non-significant correlation (r = 0.232, p = 0.312), reflecting the stronger influence of individual preference and style heterogeneity. At the comprehensive-score level, the model-derived comprehensive score (CS) is strongly correlated with subjective overall-quality ratings (r = 0.760, p < 0.001), indicating that the weighted fusion of multi-dimensional information improves agreement with holistic human perception while preserving dimension-specific differences.
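To make the image-level significance results easier to check, the sketch below computes a Pearson coefficient and converts a coefficient into its t statistic. It assumes a two-sided t-test with n − 2 degrees of freedom over n = 21 image-level pairs; the test procedure is not stated in the text, so this is an assumption. The function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Coefficients reported in the text, with n = 21 images (assumed pairing).
t_overall = t_statistic(0.760, 21)
t_aesthetics = t_statistic(0.232, 21)
```

Under this assumption the overall-quality correlation gives t ≈ 5.10 and the aesthetics correlation gives t ≈ 1.04. The two-sided 5% critical value for 19 degrees of freedom is about 2.09, so only the former is significant, in line with the reported p-values.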

5.5.4. Sensitivity Analysis of Heuristic Parameter Settings in the Functionality Sub-Model

To examine whether the FS sub-model was overly dependent on heuristic parameter choices, a sensitivity analysis was conducted on the binarization threshold T, the reference occupancy ratio R*, and the contour penalty coefficient k. Specifically, T was varied from 180 to 220 around the baseline value of 200, R* was varied from 0.40 to 0.60 around the baseline value of 0.50, and k was varied from 1.5 to 2.5 around the baseline value of 2.0, while the remaining settings were kept unchanged. For each perturbed setting, the FS values of all 13,962 samples were recalculated and compared with the baseline results in terms of Spearman rank correlation and top-10% sample overlap.

As shown in Figure 16, perturbations of R* and k produced highly consistent results with the baseline, indicating that the FS ranking remained stable under moderate changes in these two parameters. By contrast, threshold perturbation led to more noticeable variation, suggesting that the fixed threshold should be interpreted as a dataset-calibrated parameter for the present rendering-based evaluation context rather than as a universal threshold. Overall, the sensitivity analysis supports the current FS formulation while clarifying the scope of its heuristic parameter settings.
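The two stability metrics used here can be sketched in standard-library Python. The text does not give the exact overlap formula, so the sketch assumes the overlap is the number of shared top-k indices divided by k, with k set to 10% of the sample count. Ties receive average ranks, and the toy score vectors are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def top_fraction_overlap(baseline, perturbed, fraction=0.10):
    """Share of the baseline top-k samples that stay in the perturbed top-k."""
    k = max(1, int(len(baseline) * fraction))

    def top_indices(values):
        ordered = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(ordered[:k])

    return len(top_indices(baseline) & top_indices(perturbed)) / k


# Hypothetical scores: one top sample collapses under the perturbed setting.
baseline = [float(i) for i in range(20)]
perturbed = baseline[:]
perturbed[19] = 0.5
rho = spearman_rho(baseline, perturbed)
overlap = top_fraction_overlap(baseline, perturbed)
```

With 20 samples the top 10% contains two items; one of them drops out after the perturbation, so the overlap is 0.5 while the rank correlation stays positive but below 1.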

5.5.5. Sensitivity Analysis of AHP-Derived Criterion Weights

To examine the robustness of the AHP-derived weighting scheme, a one-factor-at-a-time sensitivity analysis was performed on the three criterion weights of functionality (FS), healthiness (HS), and aesthetics (AS). Specifically, each weight was individually perturbed by ±10% and ±20% relative to its baseline value, while the remaining two weights were proportionally adjusted to maintain a total weight of 1. Under each perturbed scenario, the comprehensive score CS_i of all 13,962 samples was recalculated using the same min–max normalization and TOPSIS procedure as in the baseline model.

As shown in Figure 17, the perturbed results remained highly consistent with the baseline across all tested scenarios. The Spearman rank correlation ranged from 0.949 to 1.000, and the top-10% sample overlap ranged from 0.790 to 1.000, indicating that the overall ranking structure and the set of high-quality samples remained relatively stable under moderate weight perturbations. Overall, these results support the robustness of the proposed AHP–TOPSIS framework.
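The one-factor-at-a-time perturbation can be sketched as follows. The baseline weights are hypothetical placeholders rather than the AHP-derived values, and "proportionally adjusted" is read as rescaling the two remaining weights by a common factor so that all three sum to 1.

```python
def perturb_weight(weights, index, relative_change):
    """Scale one weight by (1 + relative_change) and rescale the others to sum to 1."""
    new_target = weights[index] * (1.0 + relative_change)
    rest = sum(w for i, w in enumerate(weights) if i != index)
    scale = (1.0 - new_target) / rest
    return [new_target if i == index else w * scale for i, w in enumerate(weights)]


# Hypothetical baseline weights for (FS, HS, AS); not the paper's values.
baseline_weights = [0.4, 0.4, 0.2]

# Twelve scenarios: three criteria times four perturbation levels.
scenarios = {
    (index, change): perturb_weight(baseline_weights, index, change)
    for index in range(len(baseline_weights))
    for change in (-0.20, -0.10, 0.10, 0.20)
}
fs_plus_20 = scenarios[(0, 0.20)]
```

Each scenario's weights would then be passed to the same normalization and TOPSIS procedure, and the resulting ranking compared with the baseline ranking.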

6. Discussion

6.1. Structural Behavior of the Multi-Dimensional Scoring Framework

The proposed framework produced continuous and well-differentiated score distributions across FHASID-10K, suggesting that the AHP–TOPSIS fusion stage generated a stable ranking gradient rather than amplifying fluctuations from any single sub-model. This is important because the framework is intended for comparative screening and stratified benchmarking of large numbers of design alternatives rather than binary classification. The strong correlation between CS and HS, together with the moderate correlation between CS and FS, further indicates that the final ranking is mainly driven by dimensions with higher decision weights and clearer visual cues in the dataset [76].

However, the near-zero pairwise correlations among FS, HS, and AS should not be interpreted only as evidence of good indicator independence. A more cautious explanation is that these low correlations partly arise from the modular architecture itself. In the current implementation, FS is derived from grayscale occupancy and contour-based circulation proxies, HS from counts of predefined detected object categories, and AS from CLIP features followed by a shallow regression head. Because these sub-models rely on largely separate representations and no joint feature-learning mechanism is introduced, cross-dimensional spatial relationships are only weakly captured by design. As a result, the integrated CS is structurally interpretable, but its fusion is driven more by explicit weighting than by learned inter-dimensional interactions. The framework should therefore be understood as a structured multi-criteria aggregation system rather than a deeply coupled unified representation model.

This does not invalidate the framework, but it clarifies its methodological boundary. Its main advantages are modularity, transparency, and ease of replacement of sub-models. Its limitation is that potentially meaningful couplings, such as those between openness and visual order or between restorative furnishing and perceived comfort, are not directly encoded. Future work could explore cross-dimensional feature fusion, multi-task learning, or graph-based relation modeling to determine whether the observed low inter-dimensional correlations reflect genuine conceptual independence or the decoupled architecture of the current pipeline.

6.2. Why Did the Aesthetics Dimension Underperform?

The aesthetics sub-model is the weakest component of the current framework. Its correlation with the comprehensive score is low (r = 0.18), and its agreement with human ratings is also weak and non-significant (r = 0.232, p = 0.312). Although this can partly be attributed to the subjectivity of aesthetic perception and stylistic heterogeneity [19,77], the results also suggest limitations in the current model design.

First, the CLIP encoder was used as a generic pretrained feature extractor without domain-specific fine-tuning for interior aesthetic assessment. While CLIP provides semantically rich embeddings, semantic alignment does not necessarily translate into aesthetic discrimination [28,32]. Interior aesthetic judgment often depends on subtle cues such as proportion, material coordination, lighting atmosphere, compositional balance, and style-specific consistency, which may not be fully captured by a directly transferred general-purpose encoder.

Second, the regression head is relatively simple. In the current implementation, CLIP embeddings are mapped to a single scalar through an MLP. This design is computationally efficient, but it may be insufficient for modeling the complex and nonlinear structure of aesthetic perception, which often reflects interactions among style coherence, color relationships, furnishing density, lighting impression, and emotional atmosphere.

Third, the AS pipeline is style-agnostic. Because the dataset spans seven design styles while the model assumes a shared aesthetic standard, agreement with human ratings may be reduced when evaluators apply partly different criteria across styles.

Fourth, the normalization strategy may further compress discriminative variance if the raw score distribution is already narrow, thereby weakening the effective contribution of AS in the downstream fusion stage.

Taken together, the weak performance of AS reflects not only subjective preference heterogeneity but also an under-specified model for the aesthetic task. Future improvements should focus on fine-tuning the visual encoder on interior-specific aesthetic data, adopting a stronger prediction head or ranking-based objective, and introducing style-aware or conditional scoring mechanisms.
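As an illustration of what a ranking-based objective could look like, the sketch below computes a pairwise hinge loss that penalizes the model whenever an image rated higher by humans does not outscore a lower-rated image by a margin. This is one possible formulation, not a method reported in the paper; the function name, the margin value, and the example ratings are hypothetical.

```python
def pairwise_hinge_loss(pred, human, margin=0.1):
    """Mean hinge loss over ordered image pairs whose human ratings differ."""
    losses = []
    for i in range(len(pred)):
        for j in range(len(pred)):
            if human[i] > human[j]:
                # Penalize unless image i outscores image j by at least `margin`.
                losses.append(max(0.0, margin - (pred[i] - pred[j])))
    return sum(losses) / len(losses) if losses else 0.0

human = [4.2, 3.1, 2.5]   # hypothetical mean human ratings
good = [0.9, 0.6, 0.3]    # predictions that preserve the human ordering
bad = [0.3, 0.6, 0.9]     # predictions that reverse it
loss_good = pairwise_hinge_loss(good, human)
loss_bad = pairwise_hinge_loss(bad, human)
```

Such a loss depends only on the relative order of predictions, so it may be less sensitive than scalar regression to evaluators using different absolute rating scales.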

6.3. Reproducibility and Future Reuse of the Framework

Because the proposed framework is data-driven, its outputs depend on both dataset composition and implementation settings. Reproducibility should therefore be regarded as a prerequisite for future reuse. In this study, reproducibility refers to obtaining consistent results under the same dataset protocol, model settings, and analytical procedures, whereas replicability concerns whether similar conclusions can be reached on newly collected but comparable data.

To support reproducibility, future studies should be able to access or reconstruct both the core dataset protocol (sample sources, screening criteria, and category definitions) and the implementation settings (model versions, random seeds, sub-model inference procedures, and AHP–TOPSIS settings). In the present study, the main information on data construction and model implementation has been reported in the manuscript, while detailed code and configuration settings can be made available by the corresponding author upon reasonable request. Together, these measures improve the auditability, reuse potential, and extensibility of the framework.
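One lightweight way to record such settings is a run manifest with a deterministic fingerprint, sketched below under stated assumptions. The helper `build_manifest`, the seed, the weights, and the model placeholders are all hypothetical and are not values reported in the paper; only the dataset name and image count come from the manuscript.

```python
import hashlib
import json
import random

def build_manifest(settings):
    """Serialize run settings deterministically and attach a short fingerprint."""
    payload = json.dumps(settings, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return {"settings": settings, "fingerprint": digest}

# Apart from the dataset name and size, every value is a placeholder.
settings = {
    "dataset": {"name": "FHASID-10K", "n_images": 13962},
    "seed": 42,
    "ahp_weights": {"FS": 0.3, "HS": 0.5, "AS": 0.2},
    "models": {"detector": "<detector-version>", "encoder": "<clip-variant>"},
}
random.seed(settings["seed"])  # fix the stdlib RNG before any sampling
manifest = build_manifest(settings)
```

Two runs that share a fingerprint were configured identically, which makes it straightforward to check whether a reported result was produced under the documented settings.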

The framework is also potentially reusable beyond the present task because of its modular structure. Similar visual-analysis pipelines have been applied to related built-environment tasks, such as urban façade analysis and material recognition. For example, building-façade datasets have supported deep-learning-based extraction of building characteristics from street-view imagery, demonstrating that structured visual indicators can be derived at scale for urban-form interpretation [78]. Likewise, transformer-based domain-adaptation methods have been used for automated detection of exterior cladding materials in street-view images, suggesting that visual models can be extended across related built-environment tasks when domain shift is explicitly addressed [79].

6.4. Generalizability to Real-World Interiors, Practical Implications, and Limitations

An important boundary of this study is that FHASID-10K is constructed from interior renderings rather than real photographs. This choice is appropriate for controlled benchmarking because renderings provide relatively clean, high-resolution, and style-explicit samples. However, they differ from real interiors in lighting noise, camera distortion, object occlusion, material aging, decoration irregularity, and occupancy traces. The current results should therefore not be directly assumed to generalize to real residential scenes without further validation.

This limitation is especially relevant to the aesthetics module, which is likely to be more sensitive to domain shift, but it also affects FS and HS. For functionality, renderings usually present cleaner layouts and more idealized viewpoints, which may overestimate the stability of occupancy- and contour-based proxies. For healthiness, object visibility is often more standardized than in photographs, which may make detection-based scoring appear more stable than it would be in lived-in environments. Accordingly, the present framework should be interpreted as a reproducible evaluation system for rendering-based comparative analysis rather than as a fully validated tool for real-scene post-occupancy assessment.

Even with this limitation, the framework retains practical value. It is suitable for early-stage scheme comparison, precedent benchmarking, batch screening of real-estate show units, and AI-assisted decision support in rendering-based design workflows [66,70]. Future work should validate the framework on mixed datasets containing both renderings and photographs, examine cross-domain calibration strategies, and test whether agreement with human ratings can be improved when the sub-models are adapted to real-scene data.

7. Conclusions

This study proposed an automated decision-support framework for interior space quality evaluation by integrating three computer vision–based sub-models with AHP–TOPSIS decision fusion. The framework combines functionality, healthiness, and aesthetics into a unified evaluation pipeline and enables quantitative, interpretable, and scalable assessment of residential interior schemes from visual data. The results indicate that the proposed method can provide stable comprehensive scores and meaningful differentiation across space types and design styles, while the validation analyses further support the robustness and interpretability of the framework.

From a practical perspective, the framework can support large-scale benchmarking and comparative analysis of interior design schemes in a more efficient and reproducible manner. It may assist designers, developers, and decision-makers in identifying strengths and weaknesses of design proposals at early stages, thereby supporting evidence-based optimization of spatial layout, health-related environmental support, and visual quality. In this sense, the framework is not only an evaluation tool but also a practical aid for design screening and iterative improvement.

At the same time, several limitations should be acknowledged. The current study is based on rendering images rather than real photographs, which may limit direct generalization to real-world interior environments with stronger noise, occlusion, and domain variability. In addition, the aesthetics dimension remains more sensitive to perceptual subjectivity and model-related constraints than the other two dimensions. Accordingly, the present framework should be interpreted as a reproducible and extensible evaluation system for rendering-based comparative assessment, rather than as a fully validated real-world deployment tool.

Future research can further strengthen the framework in several directions. First, the method should be validated on real photographic datasets and mixed-domain datasets to test its robustness under practical conditions. Second, domain adaptation and transfer-learning strategies can be introduced to improve cross-domain generalizability from renderings to real interior scenes. Third, more advanced visual models may be incorporated to enhance the robustness of aesthetic evaluation. Finally, the modular structure of the framework suggests potential extension to other built-environment tasks, such as architectural façade analysis, urban visual environment assessment, and broader design decision-support applications.

Author Contributions

Conceptualization, Y.W., Z.Z. and X.G.; Methodology, Y.W. and Z.Z.; Data curation, Y.W.; Formal analysis, Y.W.; Investigation, Y.W.; Visualization, Y.W.; Writing—original draft preparation, Y.W.; Validation, Z.Z.; Supervision, Z.Z. and X.G.; Resources, X.G.; Project administration, X.G.; Funding acquisition, X.G.; Writing—review and editing, Z.Z. and X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to data licensing and distribution restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Aggregated Pairwise Comparison Matrices and Local Weights

Table A1. Aggregated pairwise comparison matrices, local weights, and consistency-test results for the indicator-level sub-criteria under B1 (Functionality), B2 (Healthiness), and B3 (Aesthetics).

Panel A. Aggregated pairwise comparison matrix for B1 (Functionality)
B1	C_{1_1}	C_{1_2}	C_{1_3}	C_{1_4}	Local weight
C_{1_1} Space utilization efficiency	1.0000	0.7428	0.8371	1.4662	0.2398
C_{1_2} Circulation accessibility	1.3462	1.0000	1.1487	1.4349	0.2999
C_{1_3} Functional zoning clarity	1.1946	0.8706	1.0000	1.6117	0.2789
C_{1_4} Furniture scale appropriateness	0.6820	0.6969	0.6205	1.0000	0.1814
Panel B. Aggregated pairwise comparison matrix for B2 (Healthiness)
B2	C_{2_1}	C_{2_2}	C_{2_3}	C_{2_4}	Local weight
C_{2_1} Natural element provision	1.0000	0.7572	0.6310	0.7074	0.1875
C_{2_2} Rest and social-support provision	1.3206	1.0000	0.8113	0.6433	0.2230
C_{2_3} Daily convenience provision	1.5849	1.2326	1.0000	0.7963	0.2732
C_{2_4} Hygiene and cleanliness assurance	1.4137	1.5544	1.2558	1.0000	0.3163
Panel C. Aggregated pairwise comparison matrix for B3 (Aesthetics)
B3	C_{3_1}	C_{3_2}	C_{3_3}	C_{3_4}	Local weight
C_{3_1} Color harmony	1.0000	0.8001	1.4922	1.5162	0.2834
C_{3_2} Style coherence	1.2498	1.0000	1.4833	1.5162	0.3163
C_{3_3} Material quality and texture	0.6701	0.6742	1.0000	1.2980	0.2141
C_{3_4} Visual neatness	0.6595	0.6595	0.7704	1.0000	0.1862

Note (Panel A): λ_max = 4.0109, CI = 0.0036, and CR = 0.0040. (Panel B): λ_max = 4.0161, CI = 0.0054, and CR = 0.0061. (Panel C): λ_max = 4.0137, CI = 0.0046, and CR = 0.0052. Abbreviations: B1, B2, and B3 denote Functionality, Healthiness, and Aesthetics, respectively. λ_max denotes the maximum eigenvalue, CI denotes the consistency index, and CR denotes the consistency ratio.
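The consistency figures in the note follow the standard AHP definitions CI = (λ_max − n)/(n − 1) and CR = CI/RI. As a minimal sketch, the Python below recomputes the Panel A local weights and λ_max from the aggregated matrix. The function names are illustrative, power iteration is an assumption about how the principal eigenvector could be obtained, and RI = 0.90 for n = 4 is an assumed table value, so the resulting CR may differ slightly from the reported one.

```python
def ahp_weights(matrix, iters=200):
    """Principal eigenvector (power iteration) and lambda_max of a pairwise matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        w = [x / total for x in aw]  # renormalize so the weights sum to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lam_max

def consistency(lam_max, n, random_index):
    """Consistency index and consistency ratio for an n x n comparison matrix."""
    ci = (lam_max - n) / (n - 1)
    return ci, ci / random_index

# Panel A (Functionality) matrix as printed in Table A1.
panel_a = [
    [1.0000, 0.7428, 0.8371, 1.4662],
    [1.3462, 1.0000, 1.1487, 1.4349],
    [1.1946, 0.8706, 1.0000, 1.6117],
    [0.6820, 0.6969, 0.6205, 1.0000],
]
w, lam = ahp_weights(panel_a)
ci, cr = consistency(lam, n=4, random_index=0.90)  # RI = 0.90 is assumed
```

For Panel A, the definition gives CI = (4.0109 − 4)/3 ≈ 0.0036, in line with the reported value, and any CR far below the conventional 0.10 threshold indicates acceptable consistency.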

References

1. Mishra, A.R.; Rani, P.; Cavallaro, F.; Hezam, I.M. Intuitionistic fuzzy fairly operators and additive ratio assessment-based integrated model for selecting the optimal sustainable industrial building options. Sci. Rep. 2023, 13, 5055.
2. Maselli, G.; Cucco, P.; Nesticò, A.; Ribera, F. Historical heritage–MultiCriteria Decision Method (H-MCDM) to prioritize intervention strategies for the adaptive reuse of valuable architectural assets. MethodsX 2024, 12, 102487.
3. Yu, M.; Chen, X.; Zheng, X.; Cui, W.; Ji, Q.; Xing, H. Evaluation of spatial visual perception of streets based on deep learning and spatial syntax. Sci. Rep. 2025, 15, 18439.
4. Chen, J.; Shao, Z.; Zheng, X.; Zhang, K.; Yin, J. Integrating aesthetics and efficiency: AI-driven diffusion models for visually pleasing interior design generation. Sci. Rep. 2024, 14, 3496.
5. Rui, L.; Firzan, M. Emotional Design of Interior Spaces: Exploring Challenges and Opportunities. Buildings 2025, 15, 153.
6. Acampa, G.; Diana, L.; Marino, G.; Marmo, R. Assessing the Transformability of Public Housing through BIM. Sustainability 2021, 13, 5431.
7. Brkanić Mihić, I. Housing Quality Assessment Model Based on the Spatial Characteristics of an Apartment. Buildings 2023, 13, 2181.
8. Na, L.; Hui, Z.; Huaxia, X. Optimization design of interior space based on the two-stage deep learning network and Single sample-driven method. PLoS ONE 2025, 20, e0329487.
9. Abbas, H.; Ren, S.B.; Asim, M.; Hassan, S.I.; Abd El-Latif, A.A. SODU2-NET: A novel deep learning-based approach for salient object detection utilizing U-NET. PeerJ Comput. Sci. 2025, 11, e2623.
10. Dimitroulopoulou, S.; Dudzińska, M.R.; Gunnarsen, L.; Hägerhed, L.; Maula, H.; Singh, R.; Toyinbo, O.; Haverinen-Shaughnessy, U. Indoor air quality guidelines from across the world: An appraisal considering energy saving, health, productivity, and comfort. Environ. Int. 2023, 178, 108127.
11. Schweiker, M.; Ampatzi, E.; Andargie, M.S.; Andersen, R.K.; Azar, E.; Barthelmes, V.M.; Berger, C.; Bourikas, L.; Carlucci, S.; Chinazzo, G.; et al. Review of multi-domain approaches to indoor environmental perception and behaviour. Build. Environ. 2020, 176, 106804.
12. Han, X.; Yu, Y.; Liu, L.; Li, M.; Wang, L.; Zhang, T.; Tang, F.; Shen, Y.; Li, M.; Yu, S.; et al. Exploration of street space architectural color measurement based on street view big data and deep learning-A case study of Jiefang North Road Street in Tianjin. PLoS ONE 2023, 18, e0289305.
13. Ali, A.M.; Benjdira, B.; Koubaa, A.; El-Shafai, W.; Khan, Z.; Boulila, W. Vision Transformers in Image Restoration: A Survey. Sensors 2023, 23, 2385.
14. Choudhary, A.; Wu, H.; Tong, L.; Wang, M.D. Learning to Evaluate Color Similarity for Histopathology Images using Triplet Networks. In Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB), Niagara Falls, NY, USA, 7–10 September 2019; pp. 466–474.
15. Jelić, A.; Tieri, G.; De Matteis, F.; Babiloni, F.; Vecchiato, G. The Enactive Approach to Architectural Experience: A Neurophysiological Perspective on Embodiment, Motivation, and Affordances. Front. Psychol. 2016, 7, 481.
16. Yao, S.; Li, M.; Yuan, J.; Huo, Q.; Zhao, S.; Wu, Y. Optimization design of layout dimension for residential buildings weighing up daylighting, thermal comfort, and indoor air quality with a low-carbon decision-making. J. Build. Eng. 2024, 98, 111328.
17. Chaillou, S. ArchiGAN: Artificial Intelligence x Architecture. In Architectural Intelligence: Selected Papers from the 1st International Conference on Computational Design and Robotic Fabrication (CDRF 2019); Yuan, P.F., Xie, M., Leach, N., Yao, J., Wang, X., Eds.; Springer Nature: Singapore, 2020; pp. 117–127.
18. Wang, S.; Yi, Y.K.; Liu, N. Multi-objective optimization (MOO) for high-rise residential buildings’ layout centered on daylight, visual, and outdoor thermal metrics in China. Build. Environ. 2021, 205, 108263.
19. Zheng, Z.; Yang, D.; Zeng, L.; Mughees, N. A deep learning framework for objective aesthetic evaluation of indoor landscapes using CNN-GNN model. Sci. Rep. 2025, 15, 40810.
20. Hosamo, H.; Coelho, G.; Nordahl, C.; Kraniotis, D. Building performance optimization through sensitivity Analysis, and economic insights using AI. Energy Build. 2024, 325, 114999.
21. Wang, Y.; Wang, Q.; Zhang, S. A Lightweight Multi-Layer Perceptron Approach for Carbon Emission Prediction of Public Buildings Under Low-Dimensional Data Scenarios. Buildings 2025, 15, 4508.
22. Alharbi, A.H.; Khafaga, D.S.; Zaki, A.M.; El-Kenawy, E.-S.M.; Ibrahim, A.; Abdelhamid, A.A.; Eid, M.M.; El-Said, M.; Khodadadi, N.; Abualigah, L.; et al. Forecasting of energy efficiency in buildings using multilayer perceptron regressor with waterwheel plant algorithm hyperparameter. Front. Energy Res. 2024, 12, 1393794.
23. Zhou, G.; Moayedi, H.; Foong, L. Teaching–learning-based metaheuristic scheme for modifying neural computing in appraising energy performance of building. Eng. Comput. 2021, 37, 3037–3048.
24. Shahbazi, Y.; Ghofrani, M.; Pedrammehr, S. Aesthetic Assessment of Free-Form Space Structures Using Machine Learning Based on the Expert’s Experiences. Buildings 2023, 13, 2508.
25. Runge, J.; Zmeureanu, R. A Review of Deep Learning Techniques for Forecasting Energy Use in Buildings. Energies 2021, 14, 608.
26. Li, Z.; Yan, X.; Wei, X.; Shao, F. IAACLIP: Image Aesthetics Assessment via CLIP. Electronics 2025, 14, 1425.
27. Yuda, E.; Morikawa, N.; Kaneko, I.; Hirahara, D. CLIP-Guided Clustering with Archetype-Based Similarity and Hybrid Segmentation for Robust Indoor Scene Classification. Electronics 2025, 14, 4571.
28. Xu, L.; Xu, J.; Yang, Y.; Huang, Y.-J.; Xie, Y.; Li, Y. CLIP Brings Better Features to Visual Aesthetics Learners. In Proceedings of the 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 30 June–4 July 2025; pp. 1–6.
29. Xia, S.; Cheng, Y.; Tian, R. ARCHICLIP: Enhanced Contrastive Language–Image Pre-training Model with Architectural Prior Knowledge. In Proceedings of the Conference on Computer-Aided Architectural Design Research in Asia (CAADRIA), Hsinchu, Taiwan, 20–26 April 2024; pp. 69–78.
30. Wen, M.; Liang, D.; Ye, H.; Tu, H. Architectural Facade Design with Style and Structural Features using Stable Diffusion Model. J. Intell. Constr. 2024, 2, 1–12.
31. Liu, J.; Xiong, W.; Jones, I.; Nie, Y.; Gupta, A.; Ouguz, B. CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding. arXiv 2023, arXiv:2303.03565.
32. Hentschel, S.; Kobs, K.; Hotho, A. CLIP knows image aesthetics. Front. Artif. Intell. 2022, 5, 976235.
33. Yu, J.; Egger, R. Color and engagement in touristic Instagram pictures: A machine learning approach. Ann. Tour. Res. 2021, 89, 103204.
34. Dewingong, T.; Afor, M.; Mishra, P.; Mishra, S.; Mishra, G.; Aliyu, B. Colour Detection for Interior Designs Using Machine Learning. In International Conference on Advancements in Interdisciplinary Research; Springer: Cham, Switzerland, 2023; pp. 243–254.
35. Li, Y.; Liu, C.; Lou, Y.; Shen, T.; Wu, Y.; Guo, J.; Li, Y.; Zhang, M. Integrating colored LiDAR and YOLO semantic segmentation for design feature extraction in Chinese ancient architecture. npj Herit. Sci. 2025, 13, 316.
36. Heikel, E.; Espinosa-Leal, L. Indoor Scene Recognition via Object Detection and TF-IDF. J. Imaging 2022, 8, 209.
37. Dwiek, S.A.; Bast, S.A. The application of machine learning in inner built environment: Scientometric analysis, limitations, and future directions. Front. Built Environ. 2024, 10, 1413153.
38. Cui, Y. Investigation of urban land development and utilization using an intelligent AHP algorithm and decision support system. Sci. Rep. 2025, 15, 19048.
39. Abo-alian, A.; Youssef, M.; Badr, N.L. A data-driven approach to prioritize MITRE ATT&CK techniques for active directory adversary emulation. Sci. Rep. 2025, 15, 27776.
40. Luque Castillo, X.; Yepes, V. Multi-Criteria Decision Methods in the evaluation of social housing projects. J. Civ. Eng. Manag. 2025, 31, 608–630.
41. Janeš, A.; Kadoić, N.; Begičević Ređep, N. Differences in prioritization of the BSC’s strategic goals using AHP and ANP methods. J. Inf. Organ. Sci. 2018, 42, 193–217.
42. Aslan, V.; Sepetcioglu, M.Y. Modeling and evaluation of Mardin groundwater level potential using the TOPSIS method. Egypt. J. Remote Sens. Space Sci. 2025, 28, 553–561.
43. Ma, J.; Siddhpura, M.; Haddad, A.; Evangelista, A.; Siddhpura, A. A Multi-Criteria Decision-Making Approach for Assessing the Sustainability of an Innovative Pin-Connected Structural System. Buildings 2024, 14, 2221.
44. Heydari, A.; Abbasianjahromi, H. Evaluating the resilience of residential buildings during a pandemic with a sustainable construction approach. Heliyon 2024, 10, e31006.
45. Chen, C.-H. A Novel Multi-Criteria Decision-Making Model for Building Material Supplier Selection Based on Entropy-AHP Weighted TOPSIS. Entropy 2020, 22, 259.
46. Kouka, D.; Russo, M.; Barreca, F. Building sustainability assessment: A comparison between ITACA, DGNB, HQE and SBTool alignment with the European Green Deal. Heliyon 2024, 10, e34478.
47. Nguyen, P.-H.; Tran, T.-H.; Thi Nguyen, L.-A.; Pham, H.-A.; Thi Pham, M.-A. Streamlining apartment provider evaluation: A spherical fuzzy multi-criteria decision-making model. Heliyon 2023, 9, e22353.
48. Zhao, X.; Su, Y.; Su, H.; Li, W. Evaluating the sustainability of recycled plastic furniture design using the analytic hierarchy process-fuzzy comprehensive evaluation and machine learning models integrated evaluation method. J. Clean. Prod. 2025, 518, 145782.
49. Hosseini, S.M.A.; Yazdani, R.; Fuente, A.d.l. Multi-objective interior design optimization method based on sustainability concepts for post-disaster temporary housing units. Build. Environ. 2020, 173, 106742.
50. Zolfaghari, S.M.; Pons, O.; Nikolic, J. Sustainability assessment model for mass housing’s interior rehabilitation and its validation to Ekbatan, Iran. J. Build. Eng. 2023, 65, 105685.
51. Dawood, S.; Crosbie, T.; Dawood, N.; Lord, R. Designing low carbon buildings: A framework to reduce energy consumption and embed the use of renewables. Sustain. Cities Soc. 2013, 8, 63–71.
52. Kabak, M.; Köse, E.; Kırılmaz, O.; Burmaoğlu, S. A fuzzy multi-criteria decision making approach to assess building energy performance. Energy Build. 2014, 72, 382–389.
53. Jiang, J. Enhancing interior design and space planning via human–machine intelligent interaction for artistic cognition. Sci. Rep. 2025, 15, 32344.
54. Jiang, N. Smart Home Product Layout Design Method Based on Real-Number Coding Genetic Algorithm. Comput. Intell. Neurosci. 2022, 2022, 1523330.
55. Yang, B.; Li, L.; Song, C.; Jiang, Z.; Ling, Y. Automatic interior layout with user-specified furniture. Comput. Graph. 2021, 94, 124–131.
56. Yu, Y.-W.; Juan, Y.-K. Impact of interior design factors on the physical, physiological, and mental health of older adults—A scoping review. Humanit. Soc. Sci. Commun. 2025, 12, 956.
57. Grassie, D.; Milczewska, K.; Renneboog, S.; Scuderi, F.; Dimitroulopoulou, S. Impact of Indoor Air Quality, Including Thermal Conditions, in Educational Buildings on Health, Wellbeing, and Performance: A Scoping Review. Environments 2025, 12, 261.
58. Chamseddine, A.; Elzein, I.M.; Hassan, N. Indoor Air Quality in Critical Indoor Environments: A Review Paper. Water Air Soil Pollut. 2025, 236, 885.
59. Colenberg, S.; Jylhä, T.; Arkesteijn, M. The relationship between interior office space and employee health and well-being—A literature review. Build. Res. Inf. 2020, 49, 352–366.
60. Martín López, L.; Fernández Díaz, A.B. Interior Environment Design Method for Positive Mental Health in Lockdown Times: Color, Textures, Objects, Furniture and Equipment. Designs 2022, 6, 35.
61. Zhang, Z.; Andersen, M. A review of the effectiveness of metrics for assessing human responses to biophilic environments involving views, shading, and interior design elements. J. Environ. Psychol. 2025, 105, 102669.
62. Fan, Y.; Zhou, Y.; Yuan, Z. Interior Design Evaluation Based on Deep Learning: A Multi-Modal Fusion Evaluation Mechanism. Mathematics 2024, 12, 1560.
63. Adilova, A.; Shamoi, P. Aesthetic preference prediction in interior design: Fuzzy approach. arXiv 2024, arXiv:2401.17710.
64. Tawil, N.; Ascone, L.; Kühn, S. The contour effect: Differences in the aesthetic preference and stress response to photo-realistic living environments. Front. Psychol. 2022, 13, 933344.
65. Chen, G. A Data-Driven Intelligent System for Assistive Design of Interior Environments. Comput. Intell. Neurosci. 2022, 2022, 8409495.
66. Song, G.; Zhu, S. Interior Design Assessment System Based on Computer Vision and Multimedia. Comput. Aided Des. Appl. 2024, 21, 264–278.
67. Lee, E.-J.; Park, S.-J. Biophilic experience-based residential hybrid framework. Int. J. Environ. Res. Public Health 2022, 19, 8512.
68. Falagán, D.H. Human-centered design for flexible and inclusive housing. Hous. Soc. 2025, 52, 37–56.
69. Yang, S.; Bai, T.; Feng, L.; Zhang, J.; Jiang, W. Indoor Environmental Quality in Aged Housing and Its Impact on Residential Satisfaction Among Older Adults: A Case Study of Five Clusters in Sichuan, China. Sustainability 2025, 17, 5064.
70. Zhou, X.; Kim, S.; Chen, Y. SDXL model-based optimization for interior design: Data-driven and deep learning methods. PLoS ONE 2026, 21, e0342258.
71. Geng, Z. Exploring visual features of ecological design in private housing: Search for visual principles. HERD: Health Environ. Res. Des. J. 2024, 17, 75–87.
72. Hosseini Dehshiri, S.J.; Amiri, M. An integrated multi-criteria decision-making framework under uncertainty for evaluating sustainable hydrogen production strategies based on renewable energies in Iran. Environ. Sci. Pollut. Res. 2023, 30, 46058–46073.
73. Xu, Y.; Wu, S. Indoor Color and Space Humanized Design Based on Emotional Needs. Front. Psychol. 2022, 13, 926301.
74. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
75. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems; NeurIPS: San Diego, CA, USA, 2017; pp. 4765–4774.
76. Xu, W.; Cui, X.; Qi, R.; Lin, Y. A Novel Multi-Criteria Decision-Making Approach to Evaluate Sustainable Product Design. Sustainability 2025, 17, 9436.
77. Pombo, M.; Igdalova, A.; Pelli, D.G. Consensus and contention in beauty judgment. iScience 2024, 27, 110213.
78. Wang, S.; Park, S.; Park, S.; Kim, J. Building façade datasets for analyzing building characteristics using deep learning. Data Brief 2024, 57, 110885.
79. Wang, S. Domain adaptation using transformer models for automated detection of exterior cladding materials in street view images. Sci. Rep. 2026, 16, 2696.

Figure 1. Overall methodological workflow of the proposed interior space evaluation framework.

Figure 2. Representative examples from FHASID-10K across 49 style–space categories. Rows denote functional space types and columns denote interior design styles.

Figure 3. Workflow of the functionality sub-model (FS).

Figure 4. Workflow of the healthiness sub-model (HS).

Figure 5. Workflow of the aesthetics sub-model (AS).

Figure 6. Functionality scoring results across space types and design styles: (a) Mean functionality score (FS) by space type and style (error bars denote standard deviation); (b) Joint distribution of space utilization (SU) and accessibility (ACC), with FS encoded by marker color/size.

Figure 7. Kernel density distributions of healthiness scores across different space types and design styles.

Figure 8. Scatter distributions and mean trend lines of healthiness scores across different space types.

Figure 9. Bubble distribution of aesthetics scores across different space types and design styles.

Figure 10. Heatmap of average aesthetics scores across different functional spaces and design styles for three major design platforms.

Figure 11. Comprehensive score (CS) distribution characteristics derived from the AHP–TOPSIS integration: (a) CS trend for the first 200 samples; (b) Quality grade distribution based on predefined thresholds (low: CS < 0.4; medium: 0.4 ≤ CS ≤ 0.7; high: CS > 0.7); (c) Histogram of CS over all 13,962 samples.

Figure 12. Correlation matrix of evaluation indicators.

Figure 13. Surrogate learning diagnostics for the GBDT-based reconstruction of the AHP–TOPSIS comprehensive score: (a) Residual analysis of the surrogate model (residual distribution and residuals versus predicted CS); (b) Calibration plot comparing predicted and actual CS, where the dashed line denotes the line of perfect agreement (y = x); (c) Comparison between GBDT feature importance and AHP weight assignments for FS, HS, and AS.

Figure 14. SHAP-based interpretability analysis of the comprehensive score: (a) SHAP summary plot showing contribution distributions of the three first-level indicators (FS, HS, and AS); (b) SHAP dependence plots illustrating how each indicator value relates to its SHAP contribution.

Figure 15. Agreement between model scores and subjective ratings at image and individual levels: (a) Image-level agreement between model scores and mean subjective ratings across functionality (FS), healthiness (HS), aesthetics (AS), and the comprehensive score (CS); (b) Individual-level subjective ratings and image-level means, with regression trends fitted on image-level scores. In (b), blue points denote individual ratings, orange points denote image-level means, and the green line and shaded area denote the fitted trend and its 95% confidence interval.

Figure 16. Sensitivity analysis of FS parameter settings: (a) Spearman rank correlation with the baseline FS under threshold perturbation; (b) Top-10% sample overlap with the baseline FS under threshold perturbation; (c) Spearman rank correlation with the baseline FS under reference occupancy ratio perturbation; (d) Top-10% sample overlap with the baseline FS under reference occupancy ratio perturbation; (e) Spearman rank correlation with the baseline FS under contour penalty coefficient perturbation; (f) Top-10% sample overlap with the baseline FS under contour penalty coefficient perturbation. In panels (c,d), R* denotes the reference occupancy ratio used in the space utilization scoring function. The dashed vertical line indicates the baseline parameter setting in each perturbation analysis.

Figure 17. Sensitivity analysis of AHP-derived criterion weights: (a) Spearman rank correlation between perturbed and baseline CS_i values under different weight perturbations; (b) Top-10% sample overlap between perturbed and baseline rankings. The dashed vertical line indicates the baseline weight perturbation level (0%).

Table 1. Evaluation Indicator System for Interior Space Performance.

Goal Layer (A)	First-Level Indicator (B)	Second-Level Indicator (C)	Definition	Attribute
Overall goal: Comprehensive evaluation of interior spatial performance	Functionality (B1)	Space utilization efficiency (C_{1_1})	Measures layout compactness and the proportional relationship among functional zones, reflecting the efficiency of floor-area usage.	Beneficial
		Circulation accessibility (C_{1_2})	Evaluates the extent to which the layout supports pedestrian circulation and reachability, reflecting movement smoothness.	Beneficial
		Functional zoning clarity (C_{1_3})	Reflects the clarity of delineation and organization of different functional areas within the space.	Beneficial
		Furniture scale appropriateness (C_{1_4})	Assesses whether furniture dimensions are proportionate and compatible with the spatial scale.	Beneficial
	Healthiness (B2)	Natural element provision (C_{2_1})	Reflects the presence level of natural elements (e.g., greenery, daylight cues) in the space.	Beneficial
		Rest and social-support provision (C_{2_2})	Measures how well the space supports resting and social interaction activities.	Beneficial
		Daily convenience provision (C_{2_3})	Reflects the provision of facilities that support convenience for everyday activities.	Beneficial
		Hygiene and cleanliness assurance (C_{2_4})	Comprehensively reflects cleanliness conditions and potential hygiene-related risks in the space.	Beneficial
	Aesthetics (B3)	Color harmony (C_{3_1})	Evaluates the harmony and aesthetic quality of color composition.	Beneficial
		Style coherence (C_{3_2})	Measures the consistency and unity among stylistic elements in the space.	Beneficial
		Material quality and texture (C_{3_3})	Reflects the quality of material selection and the visual expression of material texture.	Beneficial
		Visual neatness (C_{3_4})	Measures visual load and orderliness, reflecting the degree of visual cleanliness.	Beneficial

Table 2. Background information of respondents involved in the AHP weighting process.

Group	Number	Professional Background	Years/Experience Profile	Selection Criteria	Role in Weighting
Designers	7	Interior design, architecture, environmental design	Several years of academic or professional experience in design-related work	Relevant professional background and familiarity with interior evaluation	Expert judgment
Real estate practitioners	5	Residential development, marketing, and project-related evaluation	Practical experience in housing-related projects and sample-room observation	Familiarity with residential spatial assessment	Practice-oriented judgment
User representatives	3	Residential users or potential homebuyers	Experience in residential use, housing visits, or preference comparison	Ability to provide user-centered assessment	User perception input

Table 3. Aggregated pairwise comparison matrix and consistency test results for the criterion level.

Criterion Level (A)	B1	B2	B3	Weight
B1	1.0000	1.1055	1.7118	0.4025
B2	0.9046	1.0000	1.4854	0.3591
B3	0.5842	0.6732	1.0000	0.2384

Note: λ_{max} = 3.0002, CI = 0.0001, CR = 0.0002.
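The consistency statistics reported for Table 3 can be checked with a short script. The following is a minimal sketch, assuming the principal-eigenvector method for deriving weights and Saaty's commonly tabulated random index values (RI = 0.58 for n = 3, RI = 0.90 for n = 4). Neither choice is confirmed by this section, and the function name `ahp_weights` is illustrative only.

```python
import numpy as np

# Aggregated pairwise comparison matrix for B1, B2, B3 (values from Table 3).
A = np.array([
    [1.0000, 1.1055, 1.7118],
    [0.9046, 1.0000, 1.4854],
    [0.5842, 0.6732, 1.0000],
])

# Assumption: Saaty's random index values; other tabulations exist.
RANDOM_INDEX = {3: 0.58, 4: 0.90}


def ahp_weights(matrix):
    """Return (weights, lambda_max, CI, CR) using the principal eigenvector."""
    n = matrix.shape[0]
    eigvals, eigvecs = np.linalg.eig(matrix)
    k = np.argmax(eigvals.real)           # index of the principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)        # principal eigenvector, made positive
    w = w / w.sum()                       # normalize so the weights sum to 1
    ci = (lam_max - n) / (n - 1)          # consistency index
    cr = ci / RANDOM_INDEX[n]             # consistency ratio
    return w, lam_max, ci, cr


weights, lam_max, ci, cr = ahp_weights(A)
print(np.round(weights, 4), round(lam_max, 4), round(ci, 4), round(cr, 4))
```

Under these assumptions the weights agree with the Weight column of Table 3 to about three decimal places, and CR falls far below the conventional 0.1 acceptance threshold.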

Table 4. Supporting references, local weights, and consistency statistics of indicator-level sub-criteria.

Criterion	Sub-Criterion	Supporting References	Local Weight	λ_{max}	CI	CR
Functionality (B1)	Space utilization efficiency (C_{1_1})	[55,70]	0.2398	4.0109	0.0036	0.0040
	Circulation accessibility (C_{1_2})		0.2999
	Functional zoning clarity (C_{1_3})		0.2789
	Furniture scale appropriateness (C_{1_4})		0.1814
Healthiness (B2)	Natural element provision (C_{2_1})	[60,67]	0.1875	4.0161	0.0054	0.0061
	Rest and social-support provision (C_{2_2})		0.2230
	Daily convenience provision (C_{2_3})		0.2732
	Hygiene and cleanliness assurance (C_{2_4})		0.3163
Aesthetics (B3)	Color harmony (C_{3_1})	[4,19]	0.2834	4.0137	0.0046	0.0052
	Style coherence (C_{3_2})		0.3163
	Material quality and texture (C_{3_3})		0.2141
	Visual neatness (C_{3_4})		0.1862

Note: Supporting references indicate the principal literature basis for the derivation of the sub-criteria within each first-level dimension.
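For readers who want to compare sub-criteria across dimensions, the local weights in Table 4 can be combined with the criterion weights in Table 3. The sketch below uses the standard AHP convention that a global weight is the product of the criterion weight and the local weight. This section does not state whether the framework uses global weights, so this is an illustration rather than a description of the authors' pipeline. The dictionary keys are shorthand labels introduced here.

```python
# Criterion-level weights (Table 3) and local sub-criterion weights (Table 4).
criterion_weights = {"B1": 0.4025, "B2": 0.3591, "B3": 0.2384}
local_weights = {
    "B1": {"C1_1": 0.2398, "C1_2": 0.2999, "C1_3": 0.2789, "C1_4": 0.1814},
    "B2": {"C2_1": 0.1875, "C2_2": 0.2230, "C2_3": 0.2732, "C2_4": 0.3163},
    "B3": {"C3_1": 0.2834, "C3_2": 0.3163, "C3_3": 0.2141, "C3_4": 0.1862},
}

# Assumption: standard AHP synthesis, global = criterion weight * local weight.
global_weights = {
    sub: criterion_weights[crit] * lw
    for crit, subs in local_weights.items()
    for sub, lw in subs.items()
}

# List sub-criteria from most to least influential under this convention.
for sub, gw in sorted(global_weights.items(), key=lambda kv: -kv[1]):
    print(f"{sub}: {gw:.4f}")
```

Under this convention, circulation accessibility (C_{1_2}) carries the largest global weight, at about 0.1207.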

Table 5. Health-related visual element label set for interior spaces (based on COCO).

Category Name	Health Semantic Description
potted plant	introduction of natural elements; stress reduction and affect regulation
vase	enhanced visual pleasantness; improved spatial enjoyment
clock	time awareness; support for daily rhythm
book	reading-related behavior; cognitive regulation and psychological restoration
tv	entertainment and social interaction; improved emotional stability
bottle	hydration cue; support for healthy habits
bowl	food container; indicator of eating-related behavior
banana	representative healthy food; cue for nutrition intake
apple	representative healthy food; cue for balanced diet
chair	multifunctional furniture for resting/reading/dining
couch	relaxation and social interaction zone
bed	indicator of sleep health
dining table	cue for eating-related behavior
sink	handwashing/cleaning; hygiene-related behavior
toilet	sanitary facility; basic health and safety assurance
refrigerator	food storage; indicator of dietary safety
oven	support for healthier cooking practices
microwave	efficient food preparation; improved dietary convenience

Table 6. Presentation order of residential show-unit images in the questionnaire and their corresponding model scores.

Sequence Number	Image ID	Quality Level	Space	Style	CS	FS	HS	AS
1	7970	Medium	Living Room	Luxury	0.59	59.35	100.00	2.11
2	1316	Low	Bedroom	Luxury	0.39	34.41	70.00	2.12
3	7428	High	Living Room	Nordic	0.71	74.91	100.00	3.80
4	9835	Medium	Study	Luxury	0.42	50.58	50.00	4.11
5	5222	Low	Study	American	0.37	70.41	0.00	1.52
6	12038	Medium	Living Room	Nordic	0.46	56.15	60.00	2.30
7	13601	High	Dining Room	European	0.77	85.05	100.00	4.15
8	9812	Low	Study	Luxury	0.15	38.39	10.00	1.57
9	11563	Medium	Kitchen	European	0.52	50.17	100.00	0.00
10	12473	Low	Living Room	American	0.37	27.17	70.00	2.34
11	8553	High	Dining Room	Chinese	0.71	68.08	100.00	6.25
12	10700	Low	Bathroom	Nordic	0.34	60.97	20.00	0.00
13	11370	Medium	Kitchen	Nordic	0.48	39.36	100.00	0.00
14	8296	Medium	Entrance Hall	Minimalist	0.44	40.33	60.00	6.49
15	4886	Low	Study	Japanese	0.30	42.26	40.00	1.94
16	4949	Low	Study	Japanese	0.28	55.46	0.00	2.57
17	11471	Medium	Kitchen	Japanese	0.43	58.40	40.00	4.27
18	13449	Low	Dining Room	Nordic	0.40	29.43	80.00	1.58
19	2989	Medium	Living Room	European	0.42	43.29	60.00	4.27
20	13504	Low	Dining Room	Japanese	0.22	50.02	0.00	0.08
21	2876	Medium	Living Room	Japanese	0.41	32.75	80.00	1.56
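The relation between the dimension scores and CS in Table 6 can be illustrated with a small TOPSIS computation. The following is a sketch only: it assumes min-max normalization per dimension and Euclidean distances, and it derives the ideal and anti-ideal points from four rows of Table 6 rather than from the full dataset. The resulting values therefore will not reproduce the published CS column. The function name `topsis_closeness` is hypothetical.

```python
import numpy as np

# FS, HS, AS for four images from Table 6 (image IDs 13601, 7970, 5222, 9812).
scores = np.array([
    [85.05, 100.00, 4.15],
    [59.35, 100.00, 2.11],
    [70.41,   0.00, 1.52],
    [38.39,  10.00, 1.57],
])
weights = np.array([0.4025, 0.3591, 0.2384])  # criterion weights from Table 3


def topsis_closeness(x, w):
    """Relative closeness to the ideal solution for benefit-type criteria."""
    # Assumption: min-max normalization; the paper's scheme may differ.
    span = x.max(axis=0) - x.min(axis=0)
    z = (x - x.min(axis=0)) / np.where(span == 0, 1.0, span)
    v = z * w                                  # weighted normalized matrix
    ideal, anti = v.max(axis=0), v.min(axis=0)
    d_pos = np.linalg.norm(v - ideal, axis=1)  # distance to ideal point
    d_neg = np.linalg.norm(v - anti, axis=1)   # distance to anti-ideal point
    # Undefined when all rows are identical (both distances are zero).
    return d_neg / (d_pos + d_neg)


cs = topsis_closeness(scores, weights)
print(np.round(cs, 3))
```

Because image 13601 has the highest value in every dimension among these four rows, it coincides with the ideal point and its closeness is 1 in this reduced example. In the full dataset the ideal point is more demanding, which is consistent with its published CS of 0.77.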

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Y.; Zhao, Z.; Guan, X. An Automated Decision-Support Framework for Interior Space Quality Evaluation Using Computer Vision and Multi-Criteria Decision-Making. Buildings 2026, 16, 1508. https://doi.org/10.3390/buildings16081508
