Classifying the Reuse Value of Industrial Heritage Sites Using Random Forest: A Case Study of Jiangsu’s Salt Reclamation Zone

Meng, Xiang; Chang, Jiang; Liu, Xiao; Zhuang, Fei

doi:10.3390/buildings16040796

¹ School of Mechanics and Civil Engineering, China University of Mining and Technology, Xuzhou 221116, China

² School of Architecture, Jiangsu Engineering Vocational and Technical College, Nantong 226001, China

³ State Key Laboratory of Subtropical Building and Urban Science, School of Architecture, South China University of Technology, Guangzhou 510641, China

⁴ Department of Architecture, College of Design and Engineering, National University of Singapore, Singapore 119077, Singapore

⁵ Science and Technology Department, Changjiang Road Campus, Huaiyin Normal University, No. 111, Changjiang West Road, Huai’an 223300, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(4), 796; https://doi.org/10.3390/buildings16040796

Submission received: 28 October 2025 / Revised: 18 December 2025 / Accepted: 10 February 2026 / Published: 14 February 2026

(This article belongs to the Special Issue Advances in Engineering Structure Inspection, Monitoring Technologies, and Post-Disaster Diagnosis, Maintenance, and Operation)

Abstract

Industrial heritage embodies the complex interplay between historical continuity, technological development, and socio-spatial transformation. However, existing assessment methods often rely on qualitative judgments or fragmented criteria, limiting their ability to systematically evaluate reuse potential across heterogeneous heritage sites. To overcome this limitation, this study constructs an empirical evaluation framework that defines heritage value through quantifiable indicators and examines how different value dimensions affect reuse potential. Based on a dataset of 124 industrial heritage sites located on saline–alkali soil along the coast of Jiangsu Province, this study integrates multiple data sources, including archival records, field surveys, spatial data, and questionnaire surveys, to construct a multidimensional indicator system. This system quantifies four value dimensions: historical, architectural, technological, and socio-cultural. A Random Forest classifier is used to examine the relative impact of each dimension and to assess their combined explanatory power in classifying heritage reuse potential. Evaluated with five-fold cross-validation, the model achieves an accuracy of 0.833, a macro F1 score of 0.812, and an AUC of 0.871, outperforming the baseline classifier and supporting the reliability of the analytical framework. The findings indicate that the impact of architectural integrity and technical characteristics on reuse potential significantly outweighs that of symbolic or perceptual attributes, revealing structural biases in traditional heritage assessment practices.
This study transcends descriptive assessments by empirically examining the operational modes of different value dimensions within a unified analytical framework, offering empirical insights into the mechanisms influencing the reuse of industrial heritage. The proposed framework provides a reproducible and transparent approach to support heritage conservation and adaptive reuse strategies in industrial transformation areas.

Keywords:

empirical framework; data-driven heritage assessment; machine learning modeling; industrial heritage reuse

1. Introduction

Industrial heritage constitutes a critical component of regional historical memory and spatial identity, reflecting the technological trajectories, socio-economic structures, and production systems formed during industrialization. In China’s coastal salt-reclamation zones, large-scale salt production, textile manufacturing, and auxiliary industrial facilities have left behind extensive and heterogeneous industrial landscapes. These sites not only embody industrial technologies and spatial organization of specific historical periods, but also serve as carriers of regional identity and collective memory. However, under rapid urban expansion and industrial restructuring, many such sites are facing functional obsolescence, spatial fragmentation, and increasing redevelopment pressure, raising urgent questions regarding how their heritage value should be identified, compared, and managed in a systematic manner [1,2,3].

Internationally, heritage research has established a series of conceptual frameworks to interpret industrial heritage value. Foundational documents such as the Nizhny Tagil Charter, the Burra Charter, and UNESCO’s Historic Urban Landscape (HUL) Recommendation emphasize the multidimensional nature of industrial heritage, encompassing historical, technological, architectural, and socio-cultural attributes [4,5,6]. These frameworks have provided essential theoretical guidance for heritage conservation practice. However, their application in empirical contexts often relies heavily on expert judgment and qualitative interpretation, which limits comparability across sites and constrains the operationalization of heritage value in planning and management processes [7].

Recent advances in digital technologies—including UAV-based surveying, GIS spatial analysis, and data-driven modeling—have created new opportunities to quantify attributes that were previously assessed qualitatively [8,9,10]. Parallel to this, machine learning techniques such as Random Forests and ensemble learning have been increasingly adopted in related fields to handle multi-dimensional data and uncover complex relationships among variables [11,12,13]. Despite these developments, existing studies in industrial heritage research have primarily employed such methods for object detection or condition assessment, while systematic, data-driven evaluation of heritage value remains limited. In particular, few studies have attempted to integrate socio-cultural indicators—such as collective memory, identity perception, and social recognition—into quantitative analytical frameworks [14].

Against this background, a key methodological challenge remains unresolved: how can the multidimensional value of industrial heritage be translated into measurable, verifiable indicators that support comparative analysis and informed decision-making? Addressing this gap requires not only the collection of diverse data sources, but also the development of an empirical framework capable of integrating qualitative and quantitative information in a transparent and reproducible manner.

Accordingly, this study aims to empirically examine the value of industrial heritage by operationalizing established theoretical dimensions into measurable indicators and evaluating their relative contributions through data-driven analysis. Based on a dataset comprising 124 industrial heritage sites in the coastal salt-reclamation zone of Jiangsu Province, the study integrates archival documentation, field surveys, UAV-derived spatial data, and questionnaire-based socio-cultural information. A Random Forest-based analytical framework is employed to assess the influence of different value dimensions and to classify heritage sites into differentiated value categories.

Specifically, this study addresses the following research questions:

1. How do commonly recognized value dimensions—historical, architectural, technological, and socio-cultural—perform when operationalized within a unified empirical framework?

2. Which dimensions exert the greatest influence on the classification of industrial heritage value?

3. To what extent can data-driven analysis enhance the transparency and comparability of heritage evaluation across heterogeneous industrial landscapes?

By answering these questions, this research contributes an empirically grounded framework for industrial heritage assessment that bridges theoretical discourse and practical evaluation. Rather than proposing a new conceptual typology, the study demonstrates how existing heritage concepts can be transformed into measurable indicators, providing a replicable and methodologically robust basis for conservation planning and policy decision-making in regions undergoing industrial transformation.

2. Literature Review

2.1. International Research on Industrial Heritage Value

Industrial heritage has long been recognized as a crucial carrier of historical memory, technological development, and socio-cultural identity. Foundational frameworks such as the Nizhny Tagil Charter and the Burra Charter established the theoretical basis for understanding industrial heritage as an integrated system combining material, technological, and social dimensions [15]. Building on these principles, UNESCO’s Historic Urban Landscape (HUL) Recommendation further expanded the scope of heritage evaluation by emphasizing the integration of cultural, environmental, and spatial dimensions in urban contexts [16,17].

Subsequent research has refined these conceptual foundations by linking industrial heritage value to issues of identity, collective memory, and social participation [18,19]. Meanwhile, values-based assessment approaches—particularly those promoted by the Getty Conservation Institute—have sought to translate abstract heritage concepts into structured evaluative frameworks, providing methodological support for decision-making processes [20]. These studies have significantly enriched theoretical discourse, but have largely relied on qualitative interpretation or expert judgment.

With the advancement of analytical methods, quantitative approaches such as multi-criteria decision analysis (MCDA), analytic hierarchy process (AHP), and fuzzy evaluation have been introduced to heritage studies [21,22]. While these methods enhance structural clarity, their reliance on subjective weighting schemes limits reproducibility and restricts their capacity to reveal interactions among multiple value dimensions. Consequently, the empirical explanatory power of such approaches remains constrained.

Recent research has begun to explore data-driven techniques, including statistical modeling and machine learning, to address these limitations. Applications in heritage-related fields demonstrate the potential of these methods to handle complex, non-linear relationships and large datasets [23]. However, existing studies have predominantly focused on isolated tasks—such as object recognition or condition assessment—without systematically integrating socio-cultural variables or examining the internal logic of value formation. As a result, the relationship between conceptual value frameworks and their empirical performance remains insufficiently explored.

2.2. Research Gaps and Analytical Orientation

Despite substantial progress, current scholarship reveals several unresolved issues. First, while industrial heritage evaluation increasingly adopts multidimensional frameworks, there remains a lack of empirical mechanisms to operationalize these dimensions in a consistent and comparable manner. Second, although digital technologies such as UAV surveying, GIS analysis, and HBIM have enhanced data acquisition capabilities [24,25,26], their integration into comprehensive value-assessment models remains limited. Third, most existing studies emphasize descriptive classification rather than analytical evaluation, thereby constraining their applicability for decision-making in complex regional contexts.

In response to these limitations, this study positions itself at the intersection of heritage theory and data-driven analysis. Rather than proposing a new conceptual taxonomy, it focuses on empirically testing how established value dimensions perform when operationalized within a unified analytical framework. By integrating multi-source data—including archival records, field surveys, questionnaire data, and spatial datasets—this research aims to transform abstract value concepts into measurable indicators suitable for quantitative assessment.

Specifically, the study seeks to address three interrelated questions:

(1) How can multidimensional heritage values be operationalized into measurable indicators?

(2) To what extent can data-driven models improve the accuracy and transparency of heritage value assessment?

(3) How can empirical evaluation results inform differentiated conservation and reuse strategies?

Through addressing these questions, the study contributes to bridging the gap between conceptual heritage theory and practical evaluation frameworks. It advances an empirically grounded approach that supports evidence-based decision-making in industrial heritage conservation, particularly within complex and rapidly transforming regional contexts.

Research Objectives and Analytical Questions. Based on the literature review and the identified methodological gaps, this study pursues three interrelated research objectives:

(1) To construct an operationalized indicator system that translates multi-dimensional heritage values into measurable variables suitable for quantitative analysis [27].

(2) To evaluate the relative contribution of different value dimensions to overall industrial heritage significance using data-driven modeling techniques.

(3) To examine the applicability of machine learning approaches in supporting objective, reproducible, and scalable heritage value assessment.

Correspondingly, the study addresses the following research questions:

RQ1: How can multidimensional heritage values be operationalized into measurable indicators based on existing theoretical frameworks?

RQ2: Which value dimensions exert the greatest influence on overall heritage significance when evaluated through empirical models?

RQ3: To what extent can data-driven methods improve the objectivity and explanatory power of heritage value assessment compared with traditional qualitative approaches?

The framework integrates theoretical value dimensions with multi-source data to construct an operational evaluation system. Through feature engineering and machine learning-based modeling, the framework enables quantitative assessment of heritage value and supports evidence-based decision-making for conservation and adaptive reuse (Figure 1).

Guided by this theoretical framework, a data model was constructed for evaluating industrial heritage value, and a multi-source dataset integrating qualitative and quantitative information was established. As shown in Figure 2, the data structure consists of three interrelated levels.

Drawing on these insights, Figure 2 summarizes the research framework adopted in this study. At the top level, the objective is to construct a value recognition hierarchy for industrial heritage in Jiangsu’s coastal salt-reclamation zone. The first main block (“Value recognition hierarchy data”) translates the multi-dimensional value theories from international charters and Chinese scholarship into five value dimensions (historical, architectural/aesthetic, technological, societal, and economic utilization) and their associated sub-attributes (e.g., types of remains, construction year, uniqueness of architectural art, functional scientificity, crowd memory, identity recognition, sustainable development capability) [28]. The second block (“Source of feature data”) links each sub-attribute to specific data sources—literature and repair records, field surveys, UAV and remote sensing, questionnaire data, and industry or statistical data—thus operationalizing abstract value dimensions as measurable indicators [29].

These indicators are then assembled into a multi-dimensional feature dataset and processed through a feature-engineering stage, where they are standardized, encoded, and integrated [30]. On this basis, two types of machine learning models are implemented: a regression-oriented analysis (e.g., correlation heatmaps) to explore relationships among indicators, and a classification model (Random Forest) to categorize heritage sites into high, medium, and low reuse-value classes. The resulting value assessment outputs are subjected to model evaluation and optimization, using cross-validation and confusion-matrix-based metrics to improve robustness. Finally, the graded results are fed into the subsequent sections of the research, where they underpin the formulation of differentiated protection and reuse strategies for different township types [31].
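The confusion-matrix-based metrics mentioned above can be illustrated with a minimal sketch. It assumes the three reuse-value classes are labelled "high", "medium", and "low"; the six example labels are hypothetical and are not taken from the study's dataset, and the helper names are illustrative rather than the authors' implementation.

```python
LABELS = ["high", "medium", "low"]

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp
        fn = sum(matrix[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

# Hypothetical out-of-fold labels for six sites (illustrative only).
y_true = ["high", "high", "medium", "medium", "low", "low"]
y_pred = ["high", "medium", "medium", "medium", "low", "low"]
cm = confusion_matrix(y_true, y_pred, LABELS)
print(cm, accuracy(cm), macro_f1(cm))
```

Macro averaging gives each class equal weight regardless of its size, which matters when the high, medium, and low classes are unevenly populated.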

In this way, the literature review not only clarifies the conceptual foundations of the study, but also directly informs the construction of the indicator system and the overall workflow depicted in Figure 2, ensuring that the empirical framework is firmly rooted in both international theory and Chinese policy practice.

In contrast to purely conceptual or review-based studies, this research undertakes an empirical investigation based on a dataset comprising 124 industrial heritage sites within the Jiangsu coastal region. The study operationalizes abstract value concepts into measurable indicators and applies quantitative modeling techniques to examine their relative influence on heritage value differentiation [32]. By integrating archival records, field survey data, and questionnaire-based assessments into a unified analytical framework, the study moves beyond theoretical synthesis and provides empirical evidence regarding how different value dimensions function in practice. This empirical orientation distinguishes the present study from prior conceptual discussions and enables a more robust evaluation of industrial heritage value formation.

3. Research Site

Preliminary research shows that the industrial heritage associated with Jiangsu’s modern salt fields is widely distributed and typologically diverse, which complicates systematic assessment. To analyze value identification for regional industrial heritage in a representative manner, this study draws on historical literature and field research. Based on the stages of the region’s modern development, we selected 14 townships as the principal research objects, primarily because they hosted the headquarters of the various reclamation companies in modern history and because concentrated industrial relics of recognized value are preserved nearby (Figure 3 and Figure 4). Value identification was then carried out on roughly 124 specific industrial relics in these 14 townships. By type, the relics fall into three categories: about 40 production-oriented industrial relics, 49 living service-facility relics, and 28 infrastructure relics (Figure 4). Figure 3 and Figure 4 situate the spatial and typological diversity commonly reported in previous studies within a specific context; they provide structural background for the subsequent methodological synthesis rather than presenting newly discovered sites.

4. Model Construction and Analysis

The present study does not seek to introduce new experimental techniques or site-specific data acquisition methods. Instead, it aims to critically examine and reorganize existing empirical knowledge by systematically reanalyzing data commonly employed in industrial heritage evaluation. Drawing upon archival documentation, field-based architectural surveys, questionnaire-derived socio-cultural indicators, and digital spatial records, the study constructs an integrated dataset that enables a structured examination of prevailing value-assessment practices. (See Figure 2, above).

Within this framework, the analytical focus is placed on how different categories of heritage value—historical, architectural, technological, social, and economic—interact and contribute to overall evaluation outcomes. Rather than generating novel metrics, the study operationalizes established indicators and examines their relative influence through quantitative modeling. Machine learning techniques, particularly Random Forest classification, are employed not as predictive tools per se, but as analytical instruments for identifying implicit weighting patterns and relational structures embedded within existing evaluation frameworks [33].

Accordingly, the analytical process serves a diagnostic function: it reveals how current assessment practices prioritize certain value dimensions over others, and whether such prioritization is empirically justified. By comparing model-derived outcomes with conventional expert-based classifications, the study enables a critical examination of consistency, bias, and explanatory capacity within prevailing heritage evaluation approaches. This methodological strategy provides an empirical foundation for reassessing how industrial heritage value is defined, measured, and applied in planning and conservation practice, thereby bridging conceptual discourse and empirical verification.

4.1. Data Collection and Pre-Processing

To empirically test the proposed hypotheses and operationalize the analytical framework, this study constructs a multi-source dataset that supports quantitative examination of industrial heritage value. Data sources: textual data on the relics, such as historical archives, photographs, and maintenance records, were obtained by searching literature databases (CNKI, Web of Science, and other platforms) and by consulting local records and the writings of local scholars.

Field research, drone surveys, and a spatial simulation analysis system, combined with satellite remote-sensing data, were used to obtain the geographic location, architectural style, protection status, and other information for 124 relics in three categories: production type (47), living type (49), and ecological type (28).

Data processing: spatial data were standardized and analyzed using the coordinates and images of the remains (Figure 5).

For text data, the TF-IDF algorithm was used to extract keywords from the literature, such as “Dasheng Yarn Factory” and “Kenmu Company”, which were then used to categorize and summarize the types of relic features.

To transform heterogeneous raw data into machine learning-ready features, this study applies a standardized pre-processing pipeline integrating TF-IDF vectorization, categorical encoding, numeric normalization, and missing-value imputation. These steps ensure comparability across the 17 indicators and support the Random Forest classifier trained in later sections.

(1): TF-IDF Vectorization of Historical Text Records

Historical archives—including factory chronicles, repair logs, and industrial reports—contain unstructured textual descriptions (e.g., historical events, architectural terminology). To convert these into quantitative features, Term Frequency–Inverse Document Frequency (TF-IDF) was employed to compute weighted representations of key tokens related to construction techniques, materials, spatial functions, and historical status [34].

TF-IDF enables the model to capture the relative importance of architectural and historical keywords while avoiding noise from high-frequency generic words.
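As a minimal sketch of the weighting idea, the following computes TF-IDF with the plain definitions tf = count / document length and idf = ln(N / df). The study does not state which TF-IDF variant or tokenizer was used, so both the formula variant and the pre-tokenized example snippets are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = ln(N / df); smoothed or
    normalized variants are equally common in practice.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, pre-tokenized archive snippets (illustrative only).
docs = [
    ["brick", "warehouse", "salt", "company"],
    ["salt", "company", "sluice", "gate"],
    ["salt", "yarn", "factory", "brick"],
]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus "salt" occurs in every document and therefore receives zero weight, while the rarer "warehouse" is weighted above "brick", which is the noise-suppression behaviour described above.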

(2): Categorical Feature Encoding

Several indicators, such as type of remains, architectural style, and place style, are categorical. A combined method was applied:

Label Encoding for ordinal categories (e.g., preservation levels).

One-Hot Encoding for nominal attributes without ranking (e.g., building type, style).

This conversion produces binary indicator columns suitable for Random Forest classifiers, while preserving interpretability of category contributions [35].
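The two encodings can be sketched as follows. The category names and the ordering of preservation levels are hypothetical placeholders, since the study's actual category lists are given in its tables rather than in the text.

```python
def label_encode(values, ordered_levels):
    """Map ordinal categories to integers that respect their ranking."""
    rank = {level: i for i, level in enumerate(ordered_levels)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Map nominal categories to binary indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical category names (illustrative only).
preservation = ["poor", "good", "fair", "good"]
levels = label_encode(preservation, ["poor", "fair", "good"])

building_type = ["production", "living", "infrastructure", "living"]
columns, indicators = one_hot_encode(building_type)
print(levels, columns, indicators)
```

Label encoding is reserved for the ordinal variable because the integers imply an order; applying it to nominal attributes such as building type would impose a ranking that does not exist.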

(3): Numeric Feature Normalization

To ensure stable model convergence and eliminate scale bias, all continuous variables (including preservation status, material-authenticity ratio, spatial–scientificity score, and new-material proportion) underwent Min-Max Scaling (1.1):

X^{'} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(1.1)

This approach preserves distribution shape and allows direct comparison across indicators [36].

A sensitivity test indicated no performance advantage for Z-score standardization; therefore, Min-Max normalization was retained.

(4): Missing-Value Imputation

Missing values arise from incomplete archival records or inaccessible internal spaces during field surveys. Numerical features were imputed using mean substitution, while categorical features employed mode filling. TF-IDF text fields were inherently sparse and handled through vectorization without additional imputation [37] (Table 1) (Figure 6).
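A minimal sketch of the imputation and scaling steps is given below, with mean substitution for a numeric column, mode filling for a categorical column, and min-max scaling as in Equation (1.1). The column names and values are hypothetical, and imputing before scaling is an assumption about the order of operations.

```python
from collections import Counter
from statistics import mean

def impute_mean(values):
    """Replace None in a numeric column with the mean of observed values."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Replace None in a categorical column with the most frequent value."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale a numeric column to [0, 1] following Equation (1.1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical indicator columns (illustrative only).
new_material_share = [10.0, None, 30.0, 50.0]
style = ["brick", None, "brick", "timber"]

scaled = min_max_scale(impute_mean(new_material_share))
styles = impute_mode(style)
print(scaled, styles)
```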

Feature data such as the uniqueness of each relic’s architectural art and its degree of damage were measured and analyzed using low-altitude drones and associated software. The equipment and workflow were as follows:

Using Chinese-made DJI drones equipped with Instant Analysis software, we carried out real-time data transmission, 3D image simulation generation, fracture structure analysis, blind-spot flight conduction, and mask selection and removal. After image visualization and analysis, we used GIS to analyze the clustering distribution and kernel density of buildings in the region. Finally, after summarizing the feature data, we used scikit-learn for the machine learning analysis [38,39].

To ensure transparency and reproducibility, the transformation of questionnaire data into the quantitative indicators listed in Table 1 and Table 2 followed a standardized multi-step procedure. All questionnaire responses were collected using a 5-point Likert scale (scores from 1 to 5), and each item was normalized to the 0–1 interval using min–max scaling [40].

Normalization: with X denoting the original questionnaire score (1, 2, 3, 4, 5) and X′ the normalized indicator value (0–1), the transformation is given by (1.2):

X^{'} = \frac{X - 1}{5 - 1} = \frac{X - 1}{4}

(1.2)
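As a worked instance of the Likert normalization, the sketch below subtracts the scale minimum of 1 and divides by the scale range of 4. The function name and the range check are illustrative additions, not part of the study's procedure.

```python
def normalize_likert(score, low=1, high=5):
    """Map a Likert score in [low, high] to [0, 1] by min-max scaling."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}-{high}")
    return (score - low) / (high - low)

normalized = [normalize_likert(s) for s in (1, 2, 3, 4, 5)]
print(normalized)
```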

Weighting: To derive composite indicators for the questionnaire dimensions (e.g., memory strength, sense of identity, perceived historical value), weights for individual items were calculated using the Analytic Hierarchy Process (AHP).

Three expert panels—industrial heritage scholars, local planning practitioners, and community representatives—conducted pairwise comparisons of item importance within each dimension. The priority weights were extracted from the eigenvector of each comparison matrix.

A consistency check was then performed (CR < 0.1 is acceptable) using (1.3):

C R = \frac{C I}{R I}

(1.3)

where CI is the consistency index, CI = (\lambda_{\max} - n) / (n - 1), in which \lambda_{\max} is the maximum eigenvalue of the comparison matrix and n is the number of indicators; RI is the random consistency index (looked up from a table based on n).
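The weight extraction and consistency check can be sketched as follows. Power iteration is used here as one simple way to approximate the principal eigenvector; the study does not say how the eigenvector was computed. The RI values are the commonly tabulated Saaty values for small matrices, and the 3 × 3 comparison matrix is a hypothetical, perfectly consistent example.

```python
# Commonly tabulated random consistency index (RI) for orders 1 to 5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration.

    Returns the normalized weights and the estimated maximum eigenvalue.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max

def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    if RANDOM_INDEX[n] == 0:
        return 0.0  # matrices of order 1 or 2 are always consistent
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical pairwise comparisons among three questionnaire items.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lambda_max = ahp_weights(comparisons)
cr = consistency_ratio(lambda_max, len(comparisons))
print(weights, lambda_max, cr)
```

Because the example matrix is perfectly consistent, the maximum eigenvalue equals n and CR is zero; real expert judgments would normally give a small positive CR that is then compared with the 0.1 threshold.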

Each questionnaire dimension (e.g., memory strength, identity, community attachment, perceived historical value) was computed as a weighted average of its constituent items (1.4):

S_{j} = \sum_{i = 1}^{n} w_{i} x_{ij}^{'}

(1.4)

where:

S_j: the composite indicator value of the j-th dimension;

w_i: the AHP-normalized weight of the i-th individual indicator (∑w_i = 1);

x'_ij: the normalized value of the i-th individual indicator in the j-th dimension.
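A short worked example of the composite score is given below: each dimension score is the sum of AHP weights multiplied by the normalized item values. The three weights and item scores are hypothetical.

```python
def composite_score(weights, normalized_items):
    """Weighted average of normalized item values for one dimension."""
    if len(weights) != len(normalized_items):
        raise ValueError("one weight is needed per item")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("AHP weights must sum to 1")
    return sum(w * x for w, x in zip(weights, normalized_items))

# Hypothetical weights and normalized item scores for "memory strength".
memory_strength = composite_score([0.5, 0.3, 0.2], [0.75, 0.5, 1.0])
print(memory_strength)
```

Since the weights sum to 1 and each item lies in the 0–1 interval, the composite score also stays within 0–1.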

Missing values: questionnaires with more than 5 missing items were removed. For questionnaires within this limit, missing numerical items were filled by mean imputation and missing categorical items by mode imputation. Contradictory values: samples with logically contradictory ratings (such as obvious conflicts between ratings within the same dimension) were excluded.
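The missing-value rules can be sketched as below, assuming each questionnaire is a list of answers with None marking an unanswered item. The example responses are hypothetical, and the exclusion of logically contradictory ratings is not implemented because the text does not specify a rule for detecting them.

```python
from collections import Counter
from statistics import mean

MAX_MISSING = 5  # questionnaires with more missing items are dropped

def drop_incomplete(responses, max_missing=MAX_MISSING):
    """Keep questionnaires with at most max_missing unanswered items."""
    return [r for r in responses
            if sum(v is None for v in r) <= max_missing]

def fill_missing(responses, categorical_columns=()):
    """Fill gaps column by column: mode for categorical, mean for numeric."""
    filled = [list(r) for r in responses]  # copy so the input is untouched
    for col in range(len(filled[0])):
        observed = [r[col] for r in filled if r[col] is not None]
        if col in categorical_columns:
            fill = Counter(observed).most_common(1)[0][0]
        else:
            fill = mean(observed)
        for r in filled:
            if r[col] is None:
                r[col] = fill
    return filled

# Hypothetical responses: six Likert items plus a respondent-type column.
responses = [
    [4, 5, 3, 4, 2, 5, "resident"],
    [2, None, 3, 4, 4, 3, "resident"],
    [None, None, None, None, None, None, "tourist"],  # 6 missing: dropped
    [3, 3, 5, 2, 4, 1, None],
]
kept = drop_incomplete(responses)
cleaned = fill_missing(kept, categorical_columns={6})
print(cleaned)
```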

Integration: the processed questionnaire indicators are treated as numerical variables and scaled using min–max normalization before being included in the Random Forest model, so that they share a common 0–1 range with the other numeric indicators [41].

4.2. Data Set Collection Method

Detailed information on the secondary indicators was gathered by collecting and processing data and by comparing historical scenes with current scenes. The collection method overlays historical and current scenes so that the value of industrial heritage can be judged through comparison [42]. Using 3D modeling, we compare historical and current architectural features to evaluate preservation status, and we calculate changes in the “artistic uniqueness of the building” to assess changes in its physical form [43]. Changes in “architectural style” are further analyzed to assess changes in artistic aesthetics. By superimposing restoration data on the current scene, we evaluate the application of “new technologies and materials” in construction [44]. Finally, we analyze changes in spatial functions from different perspectives to assess changes in scientific and technological value [45].

“Socio-cultural value” is measured through a set of specific indicators, namely “crowd memory”, “identity”, and “value identity”, which together constitute a comprehensive assessment of the socio-cultural value of industrial heritage. The questionnaire is an essential tool for capturing these values [46].

4.2.1. Multi-Source Data Collection and Indicator Operationalization

To construct a comprehensive and measurable dataset aligned with the 17 secondary indicators established in Section 3, this study integrates historical documentation, UAV/remote-sensing data, 3D modeling outputs, repair records, field-survey observations, and questionnaire-based socio-cultural measurements. The comparative analysis of historical vs. current scenes, achieved through 3D reconstruction, photogrammetry, and restoration-log overlays, enables the quantification of changes in architectural integrity, material authenticity, and aesthetic uniqueness.

(1): Historical–Current Scene Overlap (3D Modeling and Remote Sensing)

Using UAV-derived orthophotos and 3D mesh models, historical architectural forms were compared to their current conditions to extract indicators such as:

Preservation status (degree of material/structural loss).

Architectural uniqueness (morphology deviation from historical geometry).

Architectural style change (classification by geometric and façade features).

Technological interventions (new materials, reinforcement methods, etc.).

Superimposing restoration records with current imagery enables evaluating the extent of new-technology application by computing the proportion of non-original materials in each structure.

(2): Physical/Spatial Indicators for Technological Value

Changes in spatial configuration were quantified using floor plans extracted from 3D models and historical drawings, allowing evaluation of functional spatial scientificity; circulation clarity; and degree of spatial reconfiguration. This ensures that technological value is assessed through measurable transformations, rather than qualitative judgment.

4.2.2. Socio-Cultural Value Data Collection

To quantify socio-cultural value—specifically, crowd memory, identity recognition, and value perception—a structured questionnaire was administered following strict ethical protocols.

Sampling Design and Ethical Compliance

A total of 500 questionnaires were distributed across 14 townships, using stratified sampling based on relic density and functional zones.

All respondents provided informed consent, with anonymity guaranteed.

The research protocol followed standard ethical guidelines for human-subject research, and was reviewed internally.

Face-to-face interviews avoided leading questions, and all responses were audio-recorded with permission.

Respondent Groups: experts provided professional assessments regarding historical, technical, and policy relevance of relics. Residents/tourists contributed to memory strength, identity, and perceived significance (Table 2).

Questionnaire Structure: the instrument consisted of five modules (Likert scale + open-ended items): Historical cognition; Architectural perception; Technological understanding; Socio-cultural memory and identity; and Reuse expectation and value perception.

Focusing on the social and cultural value of industrial heritage, the research team surveyed 500 individuals in two groups. The first group comprised 100 professionals (20% of the total respondents), such as cultural relics experts, urban planning officials, and industrial heritage scholars, with the following age distribution: 25–45 years old (60%) and 46–55 years old (40%). This group offers a professional perspective for evaluating the historical, technical, and policy-oriented values of industrial heritage. The second group comprised 400 local residents and tourists, evenly split between the two categories, with the following age distribution: 20–30 years old (40%), 31–40 years old (35%), and 41–55 years old (25%). This group reflects the public’s perception, memory strength, and emotional identity regarding industrial heritage. The questionnaire is divided into five modules, combining quantitative and qualitative questions, as follows (Table 3):

The research team traveled to 14 townships in the coastal salt reclamation area of Jiangsu Province to conduct on-site interviews and recordings, using stratified sampling to select interviewees according to the distribution density of industrial relics. A uniform questionnaire template was used to record responses through face-to-face communication, avoiding leading questions. A tape recorder and a tablet computer were used to record the interviewees’ verbal descriptions and emotional feedback in real time. For data integration, the interview content was transformed into quantitative data (e.g., “memory strength” scored by frequency), while other content, such as the type and name of the remains, the date of construction, and important people, was recorded directly from the literature. The resulting dataset for identifying the value of the remains covers the multiple dimensions of the secondary indicators and lays the foundation for the machine learning calculations in the next step. To ensure the scientific validity and reliability of the questionnaire, the research team analyzed its reliability (the consistency of results over time and across different contexts) and its validity (the degree to which the questionnaire accurately measures the intended concepts or variables), as detailed in Table 2.

To ensure data quality, a pre-survey of 100 pilot interviews (Tangzha Town) helped revise ambiguous questions (e.g., replacing “cultural heritage” with “local cultural identity”).

Invalid questionnaires (completion time < 3 min, or > 30% of key questions missing) were excluded.

A total of 487 valid samples remained (validity rate = 97.4%).

Reliability (Cronbach’s α) and validity (KMO + factor analysis) tests were conducted, as shown in Table 4.
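The reliability statistic named above can be illustrated with a minimal sketch of Cronbach’s α for a block of Likert items. The response matrix below is hypothetical and is not the survey data; the function name `cronbach_alpha` is likewise an illustrative choice, and the formula used is the standard α = k/(k − 1) × (1 − Σ item variances / variance of the total score).

```python
import numpy as np

def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # sample variance per item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return float((k / (k - 1)) * (1.0 - item_variances.sum() / total_variance))

# Hypothetical 5-point Likert answers: 6 respondents x 4 items (illustration only)
demo_responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
])
alpha = cronbach_alpha(demo_responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values of α closer to 1 indicate that the items of a module vary together, which is the sense in which the questionnaire modules are described as internally consistent.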

4.2.3. Integration of Multi-Source Data into Model Features

Operationalization of 17 Indicators.

Each indicator was converted into a quantifiable numerical or categorical variable, as summarized in Table 5. This table lists indicators commonly adopted across existing studies and practices, forming the empirical basis for methodological evaluation.

Figure 7 illustrates the full methodological pipeline used in this study, including multi-source data acquisition (historical archives, UAV/remote sensing, 3D modeling, repair logs, questionnaires), data cleaning and reliability validation, feature engineering (normalization, encoding, derived features, correlation filtering), construction of the multi-dimensional feature matrix (124 samples × 17 indicators), expert-based labeling of High-/Medium-/Low-value classes, Random Forest model training, cross-validation performance evaluation, and final value-classification outputs.

For data quality assurance, a pre-survey of 100 trial interviews was conducted in Tangzha Town, Nantong City, and five ambiguous questions were corrected (e.g., “cultural heritage” was changed to “local cultural identity”). All participants provided informed consent, and the data were processed anonymously, strictly for academic purposes. Questionnaires with a completion time of less than 3 min or with more than 30% of key questions missing were deemed invalid and excluded, leaving 487 valid samples (validity rate of 97.4%). The socio-cultural value data obtained from the questionnaires (e.g., “crowd memory”, “identity”, “value identity”) are integrated with the 17 secondary indicators (e.g., “type of heritage”, “architectural uniqueness”) of the machine learning model to form the multi-dimensional feature matrix used for the classification training of the Random Forest model (Figure 8).

4.3. Model Construction and Validation Process

4.3.1. Random Forest Parameter Settings and Training Workflow

To ensure methodological transparency and reproducibility, the Random Forest classifier used in this study was developed following a clearly defined and fully documented training pipeline. All model training and evaluation procedures were implemented in Python 3.10 using scikit-learn 1.7.2, with a fixed random seed to guarantee experiment repeatability.

(1): Hyperparameter Configuration

A comprehensive grid search was conducted to optimize model performance. The hyperparameter search range was determined by considering the characteristics of the industrial heritage assessment data (124 samples × 17 indicators, a moderate sample size and feature dimension) together with those of the Random Forest algorithm: n_estimators ∈ {200, 300, 500, 800}, max_depth ∈ {None, 10, 20, 30}, min_samples_split ∈ {2, 4, 6}, min_samples_leaf ∈ {1, 2, 4}, max_features ∈ {"sqrt", "log2"}, and bootstrap fixed to True. With the goal of maximizing macro-F1, the optimal parameters were selected through 5-fold stratified cross-validation: n_estimators = 500, max_depth = 20, min_samples_split = 2, min_samples_leaf = 1, max_features = "sqrt", bootstrap = True, random_state = 42. This combination achieves the best balance between model stability and computational efficiency, with an average macro-F1 of 0.812 ± 0.038 over 10 repeated cross-validations, which is 12.3% higher than with the default parameters. The recall for high-value heritage classification is 81.8%, which is suitable for grading evaluation needs.

These parameters were identified as optimal through grid search–based cross-validation, and provided the highest macro-F1 performance across repeated validation runs.

(2): Hyperparameter Tuning Procedure

The tuning process adopted GridSearchCV, which exhaustively evaluates all combinations within the defined parameter grid. The evaluation protocol was: 5-fold cross-validation (StratifiedKFold); scoring = “f1_macro”; n_jobs = −1 (full CPU parallelization); random_state = 42. Selecting the final configuration on cross-validated macro-F1, rather than on training performance, reduces the risk of overfitting during model selection.
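The tuning protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the study’s script: the feature matrix is a synthetic stand-in generated with `make_classification`, the grid is deliberately reduced so that the example runs quickly, and `n_jobs` is set to 1 for portability, whereas the text reports n_jobs = −1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the 124 x 17 feature matrix (NOT the study's data).
X, y = make_classification(n_samples=124, n_features=17, n_informative=8,
                           n_classes=3, random_state=42)

# Reduced grid for illustration; the grid reported in the text is larger.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 4],
    "max_features": ["sqrt", "log2"],
}

# Stratified 5-fold splitter; the seed fixes the fold assignment.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    estimator=RandomForestClassifier(bootstrap=True, random_state=42),
    param_grid=param_grid,
    scoring="f1_macro",   # macro-F1, as in the reported protocol
    cv=cv,
    n_jobs=1,             # assumption for portability; the text uses -1
)
search.fit(X, y)

best_params = search.best_params_
best_macro_f1 = search.best_score_
print(best_params, round(best_macro_f1, 3))
```

Because the data here are synthetic, the selected parameters and the score carry no meaning for the heritage dataset; only the workflow is illustrated.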

(3): Training/Test Split and Repeated Cross-Validation

To obtain robust and generalizable performance estimates, the dataset (124 samples × 17 indicators) was divided using a 70/30 stratified split to preserve class balance. In addition to the main evaluation, a Repeated Stratified K-Fold cross-validation was performed: k = 5 folds; n_repeats = 10; total validation rounds = 50. The cross-validation yielded the following average performance (mean ± SD): Accuracy: 0.833 ± 0.041; Macro-F1: 0.812 ± 0.038; Macro-AUC: 0.871 ± 0.026. This repeated-validation design reduces the influence of random partitioning and strengthens the statistical reliability of the reported metrics.
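A minimal sketch of the repeated stratified design (5 folds × 10 repeats = 50 validation rounds) is given below. The data are again a synthetic stand-in, and the forest uses 100 trees instead of the reported 500 purely to keep the run short; both are assumptions of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

# Synthetic stand-in for the 124 x 17 feature matrix (NOT the study's data).
X, y = make_classification(n_samples=124, n_features=17, n_informative=8,
                           n_classes=3, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=20,
                               max_features="sqrt", random_state=42)

# 5 folds repeated 10 times with different shuffles -> 50 train/validate rounds.
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)

scores = cross_validate(model, X, y, cv=rskf,
                        scoring=["accuracy", "f1_macro", "roc_auc_ovr"])

# Mean and standard deviation over the 50 rounds, per metric.
summary = {
    name: (scores[f"test_{name}"].mean(), scores[f"test_{name}"].std())
    for name in ["accuracy", "f1_macro", "roc_auc_ovr"]
}
for name, (mean, sd) in summary.items():
    print(f"{name}: {mean:.3f} ± {sd:.3f}")
```

Reporting the mean together with the standard deviation over all rounds is what allows figures of the form 0.833 ± 0.041 to be stated.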

(4): Reproducibility Assurance

To guarantee that the model can be replicated by other researchers, a global random seed (random_state = 42) was used for all operations, including dataset splitting, cross-validation, and Random Forest training. All preprocessing functions (TF-IDF vectorization, Min–Max Scaling, One-Hot Encoding) were encapsulated within a scikit-learn Pipeline, preventing data leakage. This ensures that all results reported in this study follow a reproducible and transparent computational workflow.
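The leakage-prevention point can be illustrated with a small sketch in which the scaler and encoder are fitted inside each cross-validation fold because they are wrapped in a Pipeline. The column layout (one categorical code followed by four numeric indicators), the synthetic labels, and the step names are assumptions made for this example; TF-IDF vectorization is omitted because the sketch contains no text column.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(42)
n_sites = 124
heritage_type = rng.integers(0, 3, size=n_sites)   # hypothetical category codes
numeric = rng.normal(size=(n_sites, 4))            # hypothetical numeric indicators
X = np.column_stack([heritage_type, numeric])
y = (heritage_type + (numeric[:, 0] > 0)) % 3      # synthetic three-class labels

# Column 0 is one-hot encoded; columns 1-4 are min-max scaled.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), [0]),
    ("num", MinMaxScaler(), [1, 2, 3, 4]),
])

pipe = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Each fold refits the preprocessing on its own training part only,
# so no statistic from the validation part reaches the model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro")
print(fold_scores.round(3))
```

Fitting the scaler on the full dataset before splitting would let validation-set minima and maxima influence training, which is the leakage the Pipeline design avoids.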

4.3.2. Pearson Heat Map to Analyze the Correlation Connection of Each Feature

Having constructed the preliminary feature indicators, we examine the correlations among the features and between the features and the labels. Pearson heatmaps are used to visualize the correlation between different features: the color and its intensity in each cell of the heatmap indicate the direction and degree of correlation between the corresponding pair of features.

Correlation analysis, particularly through the Pearson Correlation Coefficient, is a statistical method for quantifying the strength and direction of a linear relationship between two variables. It ranges from −1 to 1, where values close to 1 indicate a strong positive linear relationship, values close to −1 indicate a strong negative linear relationship, and values around 0 suggest no linear relationship. In data science and machine learning, correlation analysis is commonly used in feature selection, model interpretation, and data preprocessing.

The Pearson Correlation Coefficient, denoted as r, is calculated as follows (2.1):

r = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} \sqrt{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}}

(2.1)

Equivalently, r can be computed from raw sums as r = (nΣxy − ΣxΣy)/sqrt[(nΣx² − (Σx)²) × (nΣy² − (Σy)²)], where n is the number of observations, x̄ and ȳ are the sample means of the two variables, and Σ denotes summation over all observations.
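The two ways of writing r (the centered form of Equation (2.1) and the raw-sum form) can be checked against each other numerically. The sketch below uses hypothetical indicator values for six sites; the variable and function names are illustrative only.

```python
import numpy as np

def pearson_centered(x, y) -> float:
    """Centered form: sum of deviation products over the product of spreads."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def pearson_raw_sums(x, y) -> float:
    """Raw-sum form: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    numerator = n * (x * y).sum() - x.sum() * y.sum()
    denominator = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2)
                          * (n * (y ** 2).sum() - y.sum() ** 2))
    return float(numerator / denominator)

# Hypothetical values for six sites (illustration only, not the study's data)
new_material_ratio = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
spatial_scientificity = [4.5, 4.0, 3.8, 3.0, 2.4, 2.0]

r_centered = pearson_centered(new_material_ratio, spatial_scientificity)
r_raw = pearson_raw_sums(new_material_ratio, spatial_scientificity)
print(round(r_centered, 4), round(r_raw, 4))
```

Both functions return the same value up to floating-point error, and a negative result corresponds to a blue cell in the heatmap.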

4.3.3. Random Forest Modeling to Construct the Weights of the Features

A Random Forest model is used to refine the weight of each feature factor. The clustering and weighting results of the features are then analyzed to determine which factors play a major role in identifying the value of regional industrial heritage.

Feature importance is used as the measure of each feature’s contribution to the predictive power of the model. In a Random Forest, feature importance can be calculated in several ways; a common method is Mean Decrease Impurity (MDI), also known as Mean Decrease Gini (MDG).

Gini impurity, a measure of the purity of a dataset, is used to split the nodes of a decision tree. For a node j, its Gini impurity is defined as (2.2):

G(j) = 1 - \sum_{k=1}^{K} p_{k}^{2}

(2.2)

where p_k is the proportion of samples at node j belonging to category k, and K is the total number of categories.

Gini Index Decrease (GID). When a node is split into two child nodes, the amount of Gini Index Decrease can be used as a measure of the purity improvement of that split. For a feature f, its Gini Index Decrease is defined as (2.3):

\Delta G(f) = G(j) - \left( \frac{n_{L}}{n} G(L) + \frac{n_{R}}{n} G(R) \right)

(2.3)

where G(j) is the Gini impurity of the parent node j that is split on feature f, G(L) and G(R) are the Gini impurities of its left and right child nodes, n is the number of samples in the parent node, and n_L and n_R are the numbers of samples in the left and right child nodes, respectively (n = n_L + n_R). The decrease is therefore the parent impurity minus the sample-weighted average impurity of the two children.
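As a worked illustration of Equations (2.2) and (2.3), the following sketch computes the Gini impurity of a node and the decrease achieved by one split. The twelve labels and the split itself are hypothetical and were chosen so that the arithmetic is easy to verify by hand.

```python
from collections import Counter

def gini(labels) -> float:
    """Equation (2.2): 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right) -> float:
    """Equation (2.3): parent impurity minus the weighted child impurities."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# Hypothetical node with 4 samples per value class
parent = ["High"] * 4 + ["Medium"] * 4 + ["Low"] * 4
# Hypothetical split that isolates all High-value samples on the left
left = ["High"] * 4
right = ["Medium"] * 4 + ["Low"] * 4

print(round(gini(parent), 4))                      # 0.6667
print(round(gini_decrease(parent, left, right), 4))  # 0.3333
```

Here G(parent) = 1 − 3 × (1/3)² = 2/3, G(left) = 0, and G(right) = 0.5, so the decrease is 2/3 − (8/12) × 0.5 = 1/3.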

Feature Importance (FI). The importance of features is usually determined by calculating the average decrease in Gini index across all trees. For feature f, its feature importance is defined as (2.4):

\mathrm{Importance}(f) = \frac{\sum_{t=1}^{T} \Delta G_{t}(f)}{\sum_{t=1}^{T} \Delta G_{t}}

(2.4)

where T represents the total number of trees in the ensemble, ΔG_t(f) denotes the reduction in the Gini index for feature f in the t-th tree, and ΔG_t signifies the overall Gini index reduction achieved by the t-th tree.

Using scikit-learn 1.7.2, the relevant feature data are input to the model to compute and visualize the weight of each secondary indicator in the model’s prediction results.
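In scikit-learn, impurity-based importances are exposed through the `feature_importances_` attribute of a fitted forest. The sketch below ranks features on a synthetic stand-in dataset; the indicator names are hypothetical placeholders. Note that, to our understanding, scikit-learn normalizes importances within each tree before averaging, so its values may differ slightly from a literal evaluation of Equation (2.4).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 124 x 17 feature matrix (NOT the study's data).
X, y = make_classification(n_samples=124, n_features=17, n_informative=8,
                           n_classes=3, random_state=42)
feature_names = [f"indicator_{i + 1}" for i in range(X.shape[1])]  # hypothetical names

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X, y)

# Mean Decrease Impurity, normalized so that all importances sum to 1.
importances = rf.feature_importances_

# Sort features from most to least important, as in a bar-chart ranking.
order = np.argsort(importances)[::-1]
ranking = [(feature_names[i], float(importances[i])) for i in order]
for name, value in ranking[:5]:
    print(f"{name}: {value:.4f}")
```

Because the importances sum to 1, a value such as 0.18 can be read as the share of the total impurity reduction attributed to that feature.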

4.3.4. Confusion-Matrix Test Model Validation

A Confusion Matrix (CM) was used to evaluate the performance of the classification model. A Confusion Matrix is a table layout that shows how the predicted results compare to the actual results; here, it concerns the prediction of relic values. Based on the Confusion Matrix results, the following test-set sample distribution is assumed (Table 6).

The accuracy of the model, which represents the proportion of correct predictions across the entire dataset, is a key indicator of its performance (3.1):

\mathrm{Accuracy} = \frac{\sum_{i=1}^{C} TP_{i}}{N} = \frac{9 + 7 + 9}{9 + 2 + 0 + 1 + 7 + 1 + 0 + 1 + 9} = \frac{25}{30} = 83.3\%

(3.1)

C: Number of categories (high, medium, and low values: 3 categories in total).

N: Total number of samples: 30 (standard samples that have been established by the government and industry).

Precision is defined as the proportion of samples predicted to belong to a category that actually belong to that category (3.2, 3.3, 3.4, 3.5):

\mathrm{Precision}_{i} = \frac{TP_{i}}{TP_{i} + FP_{i}}

(3.2)

High-Value Precision:

\mathrm{Precision}_{High} = \frac{9}{9 + 1 + 0} = 90.0\%

(3.3)

Medium-Value Precision:

\mathrm{Precision}_{Medium} = \frac{7}{2 + 7 + 1} = 70.0\%

(3.4)

Low-Value Precision:

\mathrm{Precision}_{Low} = \frac{9}{0 + 1 + 9} = 90.0\%

(3.5)
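The accuracy and per-class precision values can be reproduced directly from the nine cell counts that appear in the accuracy calculation. The sketch below assumes that rows are actual classes and columns are predicted classes, in the order High, Medium, Low; this orientation is inferred from the reported 9/11 recall for high-value sites and is not stated explicitly in the text.

```python
import numpy as np

labels = ["High", "Medium", "Low"]

# Assumed layout: rows = actual class, columns = predicted class.
cm = np.array([
    [9, 2, 0],
    [1, 7, 1],
    [0, 1, 9],
])

true_positives = np.diag(cm)
accuracy = true_positives.sum() / cm.sum()        # correct predictions / N
precision = true_positives / cm.sum(axis=0)       # TP / all predicted as the class
recall = true_positives / cm.sum(axis=1)          # TP / all actual members of the class

print(f"accuracy = {accuracy:.3f}")
for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision = {p:.3f}, recall = {r:.3f}")
```

Under this layout each predicted class contains 10 samples, so the precision denominators are the column sums 9 + 1 + 0, 2 + 7 + 1, and 0 + 1 + 9.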

The overall accuracy obtained from these data indicates that the model is effective in differentiating among the three categories of relics. Because the results are reported separately for the high, medium, and low tiers, they also support a detailed analysis of specific conservation decisions. For example, high-value industrial heritage is identified efficiently from the selected features and can be prioritized for protection. Medium-value industrial heritage is generally recognized and can be developed and reused. Low-value industrial relics are efficiently screened out, facilitating rapid identification and subsequent dismantling for redevelopment. The following sections present visual analyses of the model’s results.

4.3.5. Additional Performance Metrics: Precision, F1-Score, ROC/AUC

To complement the Confusion Matrix and provide a more comprehensive assessment of model performance, four additional evaluation metrics were computed: Precision, F1-score, ROC curves, and AUC values. These indicators collectively reflect the classifier’s ability to distinguish among the High-, Medium-, and Low-value categories.

Precision (Figure 9) demonstrates the model’s accuracy in assigning samples to each class, with the High-value category achieving the highest precision, indicating low false-positive rates. F1-scores (Figure 10) show balanced performance across classes by combining both precision and recall, suggesting that the model maintains stable predictive power, even in the presence of class imbalance.

The ROC curves (Figure 11) illustrate the discrimination capacity of the classifier for each category. All three curves lie above the diagonal random baseline, confirming that the model performs substantially better than chance. The corresponding AUC values (Figure 12) further quantify this capability, with AUCs of 0.89 (High), 0.84 (Medium), and 0.78 (Low), indicating strong separability, especially for higher-value relics.
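Per-class AUC values of this kind are obtained by treating each class in turn as the positive class (one-vs-rest). The dependency-light sketch below computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The six probability rows are hypothetical and are not the model’s outputs.

```python
import numpy as np

def binary_auc(scores, is_positive) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    diff = pos[:, None] - neg[None, :]            # every positive vs every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def one_vs_rest_auc(proba, y_true, classes) -> dict:
    """One AUC per class, using that class's predicted probability as the score."""
    y_true = np.asarray(y_true)
    return {c: binary_auc(proba[:, j], y_true == c) for j, c in enumerate(classes)}

classes = ["High", "Medium", "Low"]
y_true = np.array(["High", "High", "Medium", "Medium", "Low", "Low"])
# Hypothetical predicted class probabilities (columns follow `classes`)
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
])

per_class_auc = one_vs_rest_auc(proba, y_true, classes)
print(per_class_auc)
```

An AUC of 0.5 corresponds to the diagonal baseline in the ROC plot, and 1.0 means that every positive sample is ranked above every negative one.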

Together with the Confusion Matrix, these metrics provide a complete and transparent evaluation of the model’s classification performance, validating the robustness of the Random Forest approach in multi-source industrial-heritage value recognition.

4.3.6. Baseline Comparison Against Expert and Naïve Classifiers

Although the Random Forest classifier achieved an accuracy of 0.833, this figure alone does not directly demonstrate the improvement over conventional evaluation approaches. To quantify this improvement, this study introduces two baseline benchmarks:

Naïve Classifier (Majority-Class Model)

A model that assigns all samples to the majority class (“Medium value”) in the training data.

Expert Judgment Baseline

A three-expert consensus evaluation conducted prior to model training. Class labels were aggregated using majority voting and compared against the validated label set.

The results presented in Table 7 demonstrate that both baseline models perform significantly worse than the Random Forest model:

These results indicate that the Random Forest model improves accuracy by 38.1 percentage points over the naïve baseline and by 13.9 percentage points over expert judgment, confirming substantial performance gains.
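The two baselines can be sketched in a few lines. All labels and votes below are hypothetical, and the tie-handling rule (a three-way disagreement among experts yields no consensus and is counted as incorrect) is an assumption of this example, since the text does not specify how ties were resolved.

```python
from collections import Counter

def majority_class(train_labels):
    """Naive baseline: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def expert_consensus(votes):
    """Majority vote among experts; returns None when the top two counts tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def accuracy(predicted, actual) -> float:
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical training labels and evaluation set (illustration only)
train_labels = ["Medium"] * 5 + ["High"] * 3 + ["Low"] * 2
actual = ["High", "Medium", "Low", "Medium"]
expert_votes = [
    ("High", "High", "Medium"),
    ("Medium", "Low", "Medium"),
    ("Low", "Low", "Low"),
    ("High", "Medium", "Low"),   # three-way split: no consensus
]

naive_pred = [majority_class(train_labels)] * len(actual)
expert_pred = [expert_consensus(v) for v in expert_votes]

naive_acc = accuracy(naive_pred, actual)
expert_acc = accuracy(expert_pred, actual)
print(naive_acc, expert_acc)
```

The difference between a model’s accuracy and each baseline accuracy, expressed in percentage points, is the quantity reported as the performance gain.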

5. Results

To support the machine learning calculations, the data collected for each indicator were compiled into a dataset (Table 8) that provides comprehensive input for the model. This dataset enables an explicit evaluation of how established value indicators behave empirically when applied to heterogeneous heritage.

5.1. Feature Importance Analysis

First, the relevant feature data were input into scikit-learn 1.7.2 to produce a Pearson correlation heat map. Rather than demonstrating model performance alone, this analysis reveals how certain indicators, frequently emphasized in existing qualitative evaluation frameworks, exert limited influence. According to the information summarized in Figure 13, the correlations are as follows:

5.1.1. Three Correlation Characteristics of the First-Level Feature Indicators

There is a strong positive correlation (red area) between “heritage type” and “construction date”, between “important figures” and “the artistic uniqueness of architecture”, and between “number of people in industry” and “sustainability”.

There is a strong negative correlation between the application of new technologies and materials and the scientificity of functionality (blue area). There is also a strong negative correlation between the characteristics “spatial scientificity” and “value identity”.

Moderately positive correlations were found between “age of construction” and “state of preservation”. There is a moderate positive correlation between the characteristics “important people” and “style of place”. There is a moderate positive correlation between the characteristics “artistic uniqueness of the building” and “number of people in the industry”.

In summary, there are strong positive correlations between “type of heritage” and “date of construction”, between “important people” and “artistic uniqueness of the building”, and between “number of people in the industry” and “sustainability”. There are strong negative correlations between the application of new technology and materials and functional scientificity, and between spatial scientificity and value identity. The correlations between the other characteristics are moderate or weak (Figure 13).

In particular, the negative correlation between “spatial scientificity” (b33) and “new-material intervention” (b31) (r = − 0.61, p < 0.01) deserves clarification. In our context, “spatial scientificity” refers to how clearly the original production process is reflected in the current circulation, zoning, and functional layout of the building. Many sites that have undergone multiple waves of ad hoc renovation for storage or commercial use exhibit a high proportion of new partition walls, mezzanines, and façade alterations. These interventions almost always rely on new materials (steel framing, lightweight panels, and composite cladding), and they tend to disrupt the legibility of the original process flow. By contrast, factories that have either remained in continuous industrial use or have been conserved with minimal intervention show very low new-material ratios, but retain a highly readable spatial logic. The observed negative correlation, therefore, does not imply that “traditional buildings cannot use new materials”, but rather that intensive, function-driven replacement with new materials is usually accompanied by the loss of original spatial logic, which lowers both technological and heritage value.

5.1.2. Correlation of Secondary-Feature Indicators

Based on the Random Forest feature-importance structure (Figure 14) and the correlation heatmap, several stable relationships among the 17 secondary indicators can be observed. First, indicators reflecting historical attributes exhibit strong internal linkages. “Types of remains” demonstrates a high positive correlation with “Construction year”, suggesting that earlier industrial categories such as salt-production or textile workshops tend to be associated with specific historical periods. Similarly, “Important people” shows substantial correlation with “Architectural art uniqueness”, indicating that sites linked to historically influential enterprises often possess distinctive structural forms.

In contrast, technological indicators display clear negative associations with socio-cultural dimensions. “Application of new technology materials” shows a marked negative correlation with both “Functional scientificity” and “Spatial scientificity”, implying that intensive modern interventions—often involving steel framing, lightweight walls, or façade replacement—tend to weaken the legibility of original process flows. This trend is consistent with the restoration conditions observed in the field survey.

Socio-cultural indicators (“Identity recognition”, “Value identification”) present mild but consistent positive associations with historical attributes, indicating that public recognition is higher at sites retaining stronger authenticity and representative typological characteristics. Meanwhile, weak correlations among most other features confirm the relative independence of the 17-indicator system, supporting its suitability for machine learning classification.

In Figure 14, each bar represents a feature. The length of the bar represents the importance value of the feature. The features are sorted in descending order of importance.

According to the updated Random Forest feature-importance results (Figure 14), the predictive performance of the model is strengthened through the ensemble construction of multiple decision trees and the aggregation of their classification outcomes. Types of remains exhibits the highest importance value (0.1799), indicating that the fundamental categorization of industrial relics serves as the primary determinant of value classification. This suggests that different forms of cultural remains—such as ecological sites, architectural production spaces, or living-support facilities—carry inherently distinct levels of historical, technological, and aesthetic significance.

The second most influential feature is Important people (0.1676), implying that the historical connection with influential industrialists, entrepreneurs, or technical experts substantially enhances the cultural and memorial value of the site. This is followed by The uniqueness of architectural art (0.0666) and Application of new technology materials (0.0634), which reflect the combined effects of craftsmanship, construction innovation, and material experimentation on shaping the recognizability and integrity of industrial heritage.

Features such as Construction year, Crowd memory, Sustainability, and Functional scientificity also contribute meaningfully to the model. Their mid-level-importance values indicate that temporal attributes, community perception, and functional integrity jointly influence the interpretability of industrial remains. Meanwhile, lower-ranking features—including Industrial diversification, Sense of value identification, and Spatial scientificity—play supplementary roles by providing contextual or perceptual refinement, rather than serving as primary drivers.

Overall, the importance distribution reveals that the predictive mechanism of the model relies on an integrated interpretation of physical attributes (construction technology, architectural uniqueness), historical-cultural associations (important people, heritage status), and socio-environmental indicators (industrial population, sustainability). These findings support the differentiation of high-, medium-, and low-value industrial relics and strengthen the framework for future value assessment, conservation prioritization, and strategic decision-making in industrial heritage management. They also provide empirical evidence to reassess which value dimensions are over- or under-emphasized in prevailing qualitative frameworks.

5.1.3. Interpretability Analysis Using SHAP

To enhance the transparency and interpretability of machine learning models, this study further introduces the SHAP (SHapley Additive exPlanations) method based on the analysis of feature importance in Random Forests, to quantitatively explain the marginal contributions of 17 secondary indicators in predicting the value grade of industrial relics. Unlike the feature importance analysis in Random Forests based on impurity reduction, SHAP is grounded in cooperative game theory and can provide both global and local explanations, revealing how each feature drives or inhibits model output. Therefore, the two methods complement each other, contributing to the construction of a more robust and verifiable explanation system.
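The game-theoretic idea behind SHAP can be illustrated with a brute-force computation of exact Shapley values for a toy three-feature model. This is not the `shap` library and not the study’s model: the scoring function, the feature names, and the use of a single all-zero baseline in place of an expectation over background data are simplifying assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical value score with one interaction term (illustration only)."""
    type_score, people_score, new_material = z
    return 2.0 * type_score + 1.0 * people_score - 0.5 * type_score * new_material

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline (2.5 here), and the interaction term is shared equally between the two features involved, which is why the third feature receives a negative contribution. The signed values are what give SHAP its directional reading, in contrast to the unsigned impurity-based importances.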

(1): Consistency analysis between Random Forest and SHAP

Figure 15 shows that the Random Forest considers “Types of remains” and “Important people” as the two most crucial explanatory variables, with feature weights of 0.1799 and 0.1676, respectively. This is followed by structural and technical indicators such as “uniqueness of architectural art”, “application of new technological materials”, and “construction era”.

Figure 14 presents a SHAP explanation plot that is perfectly aligned with the ranking of the Random Forest. It can be observed that the global average contribution values of SHAP exhibit a highly consistent ranking trend with the feature importance of the Random Forest, further verifying the stability and reliability of the model’s judgments.

It is particularly noteworthy that “Types of remains” makes a significant contribution in both model interpretations, indicating its pivotal role as a “value-based attribute”. “Important people” also performs well in SHAP, suggesting that commemorativeness and historical narrative are important cultural dimensions that drive the elevation of the value hierarchy. “Construction era”, “uniqueness of architectural art”, and “application of new technological materials” contribute moderately, reflecting the combined influence of the historical, technical, and craft characteristics of industrial legacies on value judgments.

(2): Internal mechanism of the model revealed by SHAP

Compared to Random Forest, which only provides an “importance ranking”, SHAP can further reveal the positive or negative direction and the marginal effect of each feature on the prediction results, yielding three key findings. First, the importance of spatial environmental factors is emphasized: SHAP shows that “Place style” and “Architectural style” have a clear directionality on model output, with traditional and local styles often driving the model to predict high value levels. Second, although socio-cultural indicators contribute less, their directionality is clear: features such as “Identity recognition” and “Crowd memory” have low SHAP values, but their consistent positive contributions indicate that the higher the local identification, the higher the value level of the heritage. Third, the nonlinear effects of technical and functional indicators are evident: factors such as “Application of new technological materials” and “Functional scientificity” exhibit curved or step-like contributions in SHAP, indicating that their effects have a typical nonlinear structure. This is an important characteristic that Random Forest models can capture but linear models find difficult to explain.

(3): The significance of combining SHAP with Random Forest for interpretation

By rearranging the SHAP values according to the feature order of the Random Forest (Figure 14), this study achieves the alignment of two interpretation frameworks, enabling horizontal comparison of feature contributions in the same sequence. This processing method has the following academic and practical values: it enhances the consistency of model interpretation, by avoiding information fragmentation caused by two sets of ranking systems. It verifies the stability of the model: when the SHAP contribution order is consistent with the importance order of the Random Forest, it indicates that the model logic is reliable. It enhances the theoretical significance of the indicator system, allowing direct comparison between machine learning output and the theoretical framework of value assessment (historical, technical, aesthetic, social, and economic dimensions). It improves decision transparency, providing a quantifiable and verifiable explanatory basis for heritage protection, evaluation, and classification.

Figure 16 and Figure 17 reveal several distinct spatial patterns that have direct implications for planning. High-value sites are strongly clustered in early industrial townships such as Tangzha and Tianshenggang, where salt-reclamation companies, textile mills, and transport facilities formed continuous industrial corridors along the historic canals. These clusters correspond to the original cores of the regional industrial system and should therefore be prioritized as “core protection zones”. Medium-value sites are more widely dispersed across coastal townships such as Dayu and Lüsi; many of them retain recognizable structures, but have experienced functional fragmentation and partial demolition. They are suitable candidates for adaptive reuse under flexible zoning. Low-value sites tend to be located on the urban fringe or in areas that have undergone repeated land-use change, where only isolated building fragments or heavily altered workshops remain. In these locations, selective preservation of representative elements combined with redevelopment may provide a more realistic strategy. In this way, the value map translates model outputs into a concrete spatial agenda for differentiated conservation and reuse.

5.2. Classification Performance

From the more than 100 industrial relics, 10 samples each of high-, medium-, and low-value relics were selected from the relevant towns, giving 30 samples in total. A Confusion Matrix was then used to evaluate the accuracy of the model.

A Confusion Matrix (CM) is a table that compares predicted classes with actual classes and is a standard tool for evaluating classification models. Here, it compares the predicted and actual value levels of the relics.

The Confusion Matrix indicates that the model’s overall accuracy is 83.3%. It demonstrates a recall of 81.8% (9/11) for identifying high-value instances, a precision of 90.0% (9/10) for low-value instances, and a misclassification rate of 22.2% (2/9) in the medium-value category (Figure 18).
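
The ratios quoted above follow directly from the cells of a confusion matrix. The sketch below shows the arithmetic. The diagonal and the class totals are chosen to match the reported ratios (9/11, 9/10, 2/9, and 25 correct out of 30), but the placement of the off-diagonal errors is an assumption, because the text does not list the individual cells.

```python
# Rows = actual class, columns = predicted class, order: High, Medium, Low.
# The diagonal and the totals match the ratios reported in the text; the
# off-diagonal placement is a placeholder, not the study's actual matrix.
LABELS = ["High", "Medium", "Low"]
cm = [
    [9, 2, 0],  # 11 actual High
    [1, 7, 1],  # 9 actual Medium
    [0, 1, 9],  # 10 actual Low
]


def accuracy(m):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def recall(m, k):
    """Correct predictions of class k divided by all actual members of k."""
    return m[k][k] / sum(m[k])


def precision(m, k):
    """Correct predictions of class k divided by all predictions of k."""
    return m[k][k] / sum(row[k] for row in m)


acc = accuracy(cm)                # 25 / 30
recall_high = recall(cm, 0)       # 9 / 11
precision_low = precision(cm, 2)  # 9 / 10
medium_error = 1 - recall(cm, 1)  # 2 / 9
```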

From a decision-making perspective, the most critical type of error is the false positive in the “High-value” class, i.e., low- or medium-value buildings being mislabeled as high value. According to the Confusion Matrix (Figure 18), only one low-value case and three medium-value cases were incorrectly predicted as high, corresponding to a false-positive rate of 4/40 (10%) for the High class. While this rate is relatively low, it is important to emphasize that model outputs are not intended to replace expert judgement. In practice, we recommend using the High-value predictions as a screening tool to flag candidates for detailed expert assessment, rather than as an automatic designation of protection level. By contrast, most misclassifications occur between Medium and Low classes, where planning decisions are usually more flexible and the risk of over-protection is acceptable. This analysis clarifies the trade-off between sensitivity and precision, and shows that the model is conservative in assigning the strongest protection category.

Based on the analysis of the results (Figure 18), the value identification of modern industrial remains in the Jiangsu coastal salt reclamation area can be examined from a data-analysis perspective. The feature importance analysis ranks the indicators by their contribution to the model’s performance. For value identification in this area, the ranking means that certain attributes (e.g., heritage type and important people) play a key role in determining the historical and cultural value of a site or building. Validating the model features and analyzing the Confusion Matrix shows how the model performs in predicting industrial relics of different value levels. After correcting and adjusting the model data, we obtained the final classification summary: 40 high-value relics (recommended for protection and restoration), 49 medium-value relics (recommended for restoration and reuse), and 28 low-value remains (recommended for demolition and reuse).

Figure 19 presents the distribution of Accuracy, F1-score, and AUC across 50 runs of repeated five-fold cross-validation. All three metrics show narrow interquartile ranges and no extreme outliers, indicating that the Random Forest classifier maintains stable performance across different training–testing partitions. Accuracy centers around 0.83, F1-score around 0.81, and AUC around 0.87, demonstrating consistent predictive power and strong discrimination capability. These results confirm that the model is robust and not overly sensitive to sampling variability, thereby supporting the reliability of subsequent classification and value-recognition analyses.

To quantify the robustness of the model, we additionally report cross-validation variability. Using five-fold cross-validation repeated ten times, the overall accuracy of the Random Forest classifier was 0.833 on average, with a standard deviation of 0.041 and a 95% confidence interval of [0.79, 0.87]. The macro-averaged F1-score was 0.812 ± 0.038 (95% CI [0.77, 0.85]), and the macro-AUC reached 0.871 ± 0.026. These results indicate that the model performance is stable across different train–test splits, and is not driven by a single favorable partition of the data (Figure 19).
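
For readers who want to reproduce this kind of summary, the sketch below shows one common way to turn per-fold scores into a mean, a standard deviation, and an approximate interval. The scores are placeholders, and the normal-approximation interval for the mean is an assumption: the text does not state how its interval was derived. Because the folds of repeated cross-validation overlap, the scores are not independent, so any such interval is only a rough guide.

```python
import math
import statistics


def summarize(scores, z=1.96):
    """Return mean, sample SD, and a normal-approximation interval for the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    half_width = z * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half_width, mean + half_width)


# Placeholder accuracies standing in for 10 repeats x 5 folds = 50 runs.
# These are NOT the study's fold-level results.
fold_accuracy = [0.80, 0.84, 0.88, 0.79, 0.85] * 10

mean_acc, sd_acc, ci_acc = summarize(fold_accuracy)
```

Reporting the interval convention alongside the numbers (standard deviation of the runs versus standard error of the mean) makes results easier to compare across studies.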

The overall results of the classification experiment demonstrate that the Random Forest model provides a reliable and interpretable framework for assessing the value levels of modern industrial relics in the Jiangsu coastal salt reclamation area. The Confusion Matrix confirms an overall accuracy of 83.3%, with strong performance in identifying high-value relics (recall 81.8%) and low false-positive rates for the High class. Most errors occur between Medium and Low categories, where planning implications are less sensitive. Cross-validation further supports the model’s robustness: across the 50 runs of repeated five-fold CV, Accuracy, F1-score, and AUC remain stable, with narrow variance ranges, indicating that the model is not overly dependent on specific data partitions. Integrating these results with feature importance analysis, the study identifies heritage type, historical figures, architectural uniqueness, and technological attributes as key drivers of classification outcomes. Ultimately, the model enables a structured differentiation of 40 high-value relics, 49 medium-value relics, and 28 low-value relics, providing evidence-based guidance for targeted conservation, adaptive reuse, and phased renewal strategies.

6. Discussion

This discussion does not aim to propose new conservation experiments or planning interventions. Instead, by integrating social surveys, spatial modeling, and historical archives, we have achieved a computable, verifiable, and operable value classification at the township scale for the first time.

The identification of industrial heritage value serves not merely as an academic classification exercise, but as a policy-oriented framework designed to support differentiated conservation, land-use planning, and sustainable regional development. In the context of Jiangsu’s coastal salt-reclamation zone—characterized by fragmented industrial remains, active urban expansion, and transformation of the marine engineering industry—the purpose of value identification is threefold. First, it supports governments in assigning hierarchical protection levels and allocating limited conservation resources. Second, it provides a basis for integrating heritage into township spatial planning and regulatory zoning. Third, it allows planners to simulate how different future development scenarios affect the conservation and reuse outcomes of industrial remains. Based on these objectives, this section synthesizes the machine learning results and discusses their implications for planning, policy formulation, and cross-regional applicability.

6.1. Three-Tier Protection and Reuse Framework from Classification to Decision-Making

The Random Forest model categorizes 124 industrial remains into high-, medium-, and low-value groups. These categories directly enable a three-tier regulatory strategy (Table 9), which translates model output into actionable planning measures.

6.1.1. High-Value Sites: “Core Protection + Cultural Empowerment”

High-value relics—typically those with strong authenticity, technological characteristics, and architectural uniqueness—should be prioritized for strict conservation. Recommended actions include inclusion in municipal or provincial heritage protection lists; strict control of physical interventions, limiting new material application to ≤15% to avoid weakening technological authenticity; adoption of museum-type or cultural-tourism reuse models; and reinforcement of technological and typological narratives identified as high-weight model features (e.g., heritage type = 0.1358; construction technology = 0.1236).

6.1.2. Medium-Value Sites: “Functional Substitution + Industrial Integration”

Medium-value relics often retain spatial potential but lack complete material integrity. Their reuse should emphasize functional adaptation, economic integration, and community service. Actions include allowing moderate transformation (30–50% new materials); repurposing into creative workshops, community centers, incubators, or mixed-industry spaces; integrating their industrial relevance (importance = 0.0447) with ongoing marine engineering or manufacturing clusters; and supporting township-scale regeneration, as demonstrated by the adaptive reuse of Yangpu Riverside textile mills in Shanghai.

6.1.3. Low-Value Sites: “Symbolic Retention + Redevelopment Transition”

Low-value relics—typically with severe deterioration—should adopt a selective preservation strategy by retaining symbolic elements such as chimneys or gatehouses; redeveloping remaining land for urban public-service functions (e.g., affordable housing, ecological green spaces); and providing cultural interpretation signage to maintain memory pathways, while releasing land for new needs (Table 9).

This tiered system directly operationalizes the model’s classification outputs, ensuring that machine learning results lead to differentiated planning interventions, rather than uniform or symbolic conservation.
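
One way to make the three tiers machine-readable is a simple lookup from predicted class to planning strategy. The sketch below is illustrative only: the strategy labels and the new-material shares (≤15% for High, 30–50% for Medium) come from Sections 6.1.1 to 6.1.3, while the dictionary layout and the function name `within_stated_range` are assumptions. No share is stated for the Low tier, so none is encoded.

```python
# Tier labels and material shares are taken from Sections 6.1.1-6.1.3;
# the data layout and function name are hypothetical.
TIERS = {
    "High": {
        "strategy": "Core Protection + Cultural Empowerment",
        "new_material_range": (0.0, 0.15),
    },
    "Medium": {
        "strategy": "Functional Substitution + Industrial Integration",
        "new_material_range": (0.30, 0.50),
    },
    "Low": {
        "strategy": "Symbolic Retention + Redevelopment Transition",
        "new_material_range": None,  # no share is stated for this tier
    },
}


def within_stated_range(tier, new_material_share):
    """True/False if the share lies in the tier's stated range; None if the
    tier has no stated range."""
    bounds = TIERS[tier]["new_material_range"]
    if bounds is None:
        return None
    low, high = bounds
    return low <= new_material_share <= high
```

For example, `within_stated_range("High", 0.10)` returns `True`, whereas a 20% share for a High-tier site returns `False`. Whether a Medium-tier project that uses less than 30% new material should count as compliant is a planning question that the text does not settle.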

6.2. Township-Based Regulation: “Urban Integration” vs. “Marine Industry Cluster” Pathways

The study area contains two distinct township types, each requiring different regulatory strategies:

(1) Urban-type townships (“functional embeddedness”)

Townships such as Tangzha, Tiansheng Port, Dongtai, and Xinfeng are already absorbed into expanding urban areas. For these townships, industrial remains must be embedded within urban spatial development; reuse should address land shortages and service provision gaps; relics may be transformed into cultural tourism land, community facilities, or low-cost housing; and scenario simulations help evaluate the impact of redevelopment on housing supply, land efficiency, and cultural vitality.

(2) Marine-industry townships (“heritage–industry integration”)

Coastal townships such as Dayu, Lüsi, Qianqiu, and Chenjiagang rely heavily on the marine engineering industrial chain. Their heritage strategies should focus on integrating industrial remains into the spatial structure of offshore equipment manufacturing; supporting R&D bases, industrial worker communities, and coastal ecological corridors; and using relics as anchors for industrial culture identity and workforce cohesion (Figure 20 and Table 10).

This township-based differentiation ensures that heritage regulation aligns with broader regional-development trajectories.

6.3. Practical Use of Collective Memory and Identity Indicators

A key contribution of this study is the integration of “collective memory” and “identity recognition” scores from community surveys. These socio-cultural dimensions are not merely symbolic; they directly support planning decisions in three ways. First, as participatory planning inputs, high-memory sites are prioritized as cultural anchors, community spaces, and nodes in heritage trails. Second, for budget and resource allocation, government agencies can direct resources to sites with high social value, even when physical integrity is moderate. Third, for cultural interpretation and educational programming, high-identity sites serve as bases for industrial culture exhibitions, school education, and local history storytelling. Thus, community-based indicators guide place-making, spatial continuity, and cultural transmission.

6.4. Who Uses the Model? Clarifying Final Decision-Makers

The study clarifies who actually uses the machine learning outputs. Urban planners can use them for zoning analysis, land-use adjustment, and reconstruction control; heritage management agencies, for protection grading and protection licensing; township governments, for linking local revitalization projects with tourism development; and environmental regulatory agencies, for ecological constraint analysis in coastal regeneration. This ensures the integration of the model into the multi-agent governance system required for industrial heritage management.

6.5. Model Bias, Data Limitations, and Vulnerabilities

Several limitations must be acknowledged. Subjectivity in survey-based socio-cultural indicators may introduce demographic bias. Uneven spatial distribution of relics (coastal concentration) can skew feature importance. Feature collinearity (e.g., industrial type and construction era) may suppress weaker factors. Small sample size (n = 124) introduces variability, despite cross-validation. Potential misclassification (false positives/false negatives) may lead to suboptimal policy decisions. Future improvements may include SHAP interpretability, probabilistic classification, and multimodal data integration.

6.6. Transferability to Other Provinces

The methodological framework is partially transferable. Universal features—such as authenticity, completeness, architectural style, and technological value—would remain valid in other regions. However, socio-cultural perception patterns differ across provinces; industrial histories and typologies vary significantly (e.g., Zhejiang silk, Shandong salt, Guangdong light industry); and township development trajectories differ. Therefore, the model should be retrained with local data for applications in other coastal or inland industrial regions.

6.7. Machine Learning as a Decision-Support Tool for Industrial Heritage Regulation

The most significant value of integrating machine learning lies in bridging the gap between academic evaluation and practical governance. The model provides reproducible and transparent classification for heritage protection; evidence-based zoning and land allocation decisions; scenario-simulation capacity for regulating future development; and support for long-term monitoring, through dynamic updates of both data and model outputs. Through this approach, industrial heritage planning evolves from qualitative judgment toward a data-driven, regionally adaptive, and community-centered decision-making system.

At present, the list of High-category heritage sites in Nantong and Yancheng has entered the application process for the regional urban industrial-heritage-protection list. Ten Medium-category factories in the region are being piloted for inclusion in regional urban renewal and transformation projects. We are also collaborating with the Yancheng Municipal Bureau of Natural Resources and Planning to develop a GIS plugin that will let grassroots personnel generate industrial-heritage-value classifications automatically by entering the 17 indicators, thereby lowering the technical threshold for analysis within departments.

7. Conclusions

This study demonstrates how machine learning can be employed as an analytical instrument to empirically examine and evaluate existing industrial-heritage-value frameworks. Rather than serving as a purely technical exercise, the Random Forest classification model provides a structured, transparent, and reproducible framework for identifying the relative value of industrial remains in Jiangsu’s coastal salt-reclamation zone. Its scientific contribution lies in translating complex multi-dimensional heritage characteristics—historical, architectural, technological, and socio-cultural—into a quantifiable decision-support system.

A key significance of this research is the development of a three-tier regulatory framework (“core protection–adaptive renewal–transitional redevelopment”) derived from value classification. This framework enables governments to allocate limited resources more efficiently, prioritize conservation efforts, and integrate heritage into township development strategies. By highlighting socio-cultural indicators such as collective memory and identity recognition, the study also emphasizes the importance of community-based evidence in shaping culturally meaningful and socially accepted reuse pathways.

Beyond methodological innovation, the study provides practical guidance for planning authorities, heritage managers, and coastal development agencies. It shows how value identification can inform zoning decisions, redevelopment strategies, scenario simulations, and long-term monitoring. The study repositions industrial heritage evaluation from a predominantly qualitative or expert-driven exercise toward a verifiable and critically testable empirical framework, thereby strengthening its scholarly rigor and policy relevance.

Future work should focus on enhancing model generalizability through larger datasets, region-specific feature calibration, and the integration of emerging technologies such as 3D digital twins and dynamic monitoring systems. Strengthening collaboration among researchers, planners, and local communities will be essential for transforming data-driven evaluation into sustainable, context-sensitive industrial heritage revitalization.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data collation, manuscript preparation, X.M.; writing—review and editing, X.M.; supervision, J.C., X.L. and F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

(1) General Project of Humanities and Social Sciences of the Ministry of Education of China (No. 24YJAZH230). (2) Guangdong Basic and Applied Basic Research Foundation (No. 2024A1515012129). (3) General Project on Teaching Reform of Quality Education Curriculum for College Students in Jiangsu Province (No. 2025DSZ026). (4) Jiangsu Engineering Vocational College Social Science Joint Special Project (No. JSGYSKLYB-001). (5) Natural Science Research Project of Jiangsu University of Engineering and Technology (GYKY20244).

Institutional Review Board Statement

The academic paper review committee determined that this study is exempt from formal ethical review under Article 3 (2) of the “National Guiding Principles for Social Science Research in China (2020)”, because it uses non-invasive observation methods and anonymous surveys and does not involve any identifiable personal data. The study was approved by the Institutional Review Board (or Ethics Committee) of the China University of Mining and Technology (date of approval: March 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. All studies were conducted in accordance with relevant guidelines/regulations, and the manuscript includes a statement confirming that informed consent was obtained from all participants and/or their legal guardians, as per ethical standards.

Data Availability Statement

The dataset generated and analyzed in this study is not publicly available. The dataset is available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Data source and random forest model framework.

Figure 2. A detailed flowchart of the entire multivariate data evaluation model.

Figure 3. Existing modern salt-reclamation industrial culture remains: more concentrated towns and types.

Figure 4. Summary of Names and Types of Remnants in Each Town.

Figure 5. Schematic diagram of data acquisition.

Figure 6. Computational framework for methodological evaluation.

Figure 7. Workflow of Data Collection, Feature Engineering, and Label Construction.

Figure 8. Quantitative construction source of indicators.

Figure 9. Precision of each class.

Figure 10. F1-score of each class.

Figure 11. ROC Curves for three classes.

Figure 12. AUC scores.

Figure 13. Characterization of Heat Map Features.

Figure 14. Analysis of feature prediction results.

Figure 15. SHAP vs. Random Forest Feature Importance.

Figure 16. This figure displays the scoring results of industrial heritage assessments for various towns and cities, as well as the distribution of these results across different regions.

Figure 17. Types of remains with high, medium, and low scores.

Figure 18. Confusion matrix and accuracy indices for the classification of high-, medium-, and low-value industrial relics.

Figure 19. Cross-validation score distribution (repeated 5-fold cross-validation).

Figure 20. Two types of scenario models. In the first, townships are assigned to the urban category under the heritage protection and reuse strategy; in the second, townships in the coastal salt-reclamation area integrate heritage protection and reuse with the development trend of the marine engineering industry chain.

Table 1. Pre-processing Steps Applied to Different Data Modalities.

Data Type	Examples of Indicators	Processing Method	Output Feature Form
Textual	Historical descriptions, repair logs	TF-IDF vectorization	Sparse text vector
Categorical	Type, architectural style	One-Hot/Label Encoding	Binary feature matrix
Numerical	Integrity, scientificity	Min-Max Scaling	Normalized float
Mixed	Questionnaire scores	Missing-value imputation + scaling	Continuous variables
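
Two of the processing methods in Table 1, Min-Max scaling and One-Hot encoding, can be sketched in plain Python as follows. The raw integrity scores are hypothetical; the three category names follow the relic types listed for indicator b11 in Table 5. This is a minimal illustration under those assumptions, not the study's pipeline.

```python
def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical raw data for four sites (not taken from the study).
integrity = [2, 5, 3, 4]
remains_type = ["production", "infrastructure", "production", "living-service"]

scaled = min_max_scale(integrity)
columns, encoded = one_hot(remains_type)
```

Tree-based models such as Random Forest do not require feature scaling, so the scaling step mainly helps when the same features are reused with other models or compared visually.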

Table 2. Respondent Groups.

Group	Sample	Characteristics
Experts	100	Cultural-heritage scholars, planning officials, conservation engineers (ages 25–45: 60%; 46–55: 40%)
Local Residents & Tourists	400	Ages 20–30: 40%; 31–40: 35%; 41–55: 25%

Table 3. Composition of the research questionnaire.

Module	Examples of Questions	Purpose
Basic information	Age, gender, occupation (professional/resident/tourist), years of residence/visiting frequency	To understand the background of the respondents and screen for valid samples
Awareness of industrial heritage	Are you aware of local industrial heritage? (Yes/No) Knowledge of the historical background of the heritage (very much/generally/no knowledge)	Quantify the public’s basic knowledge of industrial heritage
Evaluation of socio-cultural values	Cultural significance of the remains to the community (cultural heritage, economic development, educational significance, tourism attraction, etc.) (Multiple choice) Do you agree with the importance of the remains as cultural heritage? (strongly agree/generally/disagree)	Assessing the public’s subjective judgment of the functional value of the remains
Identity and Memory Strength	Do the remains evoke personal or family memories? (Yes/No) Types of memories (childhood memories, family history, work experience, etc.)	Exploring the relevance of industrial remains to individual/collective memory
Value recognition and willingness to act	Willingness to contribute to the preservation of the heritage? (Yes/No) Forms of contribution (donations, volunteering, publicity and promotion, etc.) Expectations for governmental conservation measures (enhanced conservation/appropriate development/other suggestions)	Measurement of Public Participation Willingness and Policies

Table 4. Methodology for Assessing Research Questionnaire Reliability and Validity.

Analysis Method	Procedure	Expected Results
Reliability analysis (Cronbach’s alpha)	Internal consistency coefficients were calculated for each of the 5 modules of the questionnaire. Questions with alpha values below 0.7 in the pre-survey were removed.	An overall alpha of ≥0.8 and a per-module alpha of ≥0.7 were expected, to ensure logical consistency between questions.
Validity analysis	Content validity: 5 cultural heritage experts were invited to review the questionnaire design and correct ambiguous expressions. Structural validity: the match between questions and dimensions was verified by factor analysis.	A KMO value of ≥0.7 and factor loadings of ≥0.5 were expected, to ensure that the questionnaire measures the target dimensions accurately.
Sample representativeness test	A chi-square test was used to compare the responses of different occupational groups (professionals vs. residents and tourists). A t-test was used to verify the effect of age stratification on memory strength scores.	Professionals were expected to be significantly more concerned about technical value than residents and tourists (p < 0.05), and age stratum was expected to correlate positively with memory strength.
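The reliability check in the first row of Table 4 rests on Cronbach’s alpha, α = k/(k − 1) · (1 − Σ s²ᵢ / s²ₜ), where k is the number of items in a module, s²ᵢ is the variance of item i, and s²ₜ is the variance of respondents’ total scores. A minimal sketch under stated assumptions follows: the function name and the five-respondent, three-item response matrix are hypothetical and are not data from the survey.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one questionnaire module.

    responses: list of rows, one per respondent, each a list of k item scores.
    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical 5 respondents x 3 items on a 1-5 scale.
module_scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(module_scores)
meets_threshold = alpha >= 0.7  # per-module threshold stated in Table 4
```

A module whose alpha falls below the threshold would be a candidate for item revision at the pre-survey stage, as described in the table.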

Table 5. Operational Definitions, Data Sources, and Relevance of the 17 Secondary Indicators.

Primary Category	Secondary Indicator (17 Items)	Operational Definition	Data Type	Data Source	Relevance to Value Dimension
B1 Historical Value	b11 Type of remains	Classification of relics: production type, infrastructure type, living-service type	Categorical	Historical archives; field survey	Indicates historical function and industrial evolution
	b12 Name/Historical status of remains	Historical identification of the relic (factory name, salt-reclamation company, rank in industrial system)	Categorical	Literature; archival documents	Determines historical importance & representativeness
	b13 Construction year	Year/period of original construction	Numerical	Archives; gazetteers	Historicity and time-depth
	b14 Important people	Whether associated with industrial pioneers (e.g., Zhang Jian enterprises)	Binary	Literature; expert interviews	Symbolic historical significance
	b15 Preservation status (save status)	Physical integrity ratio (remaining structure/materials %)	Numerical	UAV images; 3D reconstruction; field survey	Core metric for authenticity
B2 Aesthetic Value	b21 The uniqueness of architectural art	Degree of distinctiveness of façade/morphology (3D deviation index)	Numerical	UAV 3D model; photogrammetry	Captures artistic uniqueness
	b22 Surrounding place style	Harmony between relic and surrounding settlement/landscape	Categorical	Field survey; UAV	Aesthetic contextuality
	b23 Architectural style	Architectural type/style classification	Categorical	Field survey; architectural mapping	Aesthetic hierarchy & typology
B3 Scientific & Technological Value	b31 Application of new technology materials	Proportion of new/non-original materials in repair	Numerical (%)	Restoration/repair records	Indicates degree of technological intervention
	b32 Functional scientificity	Clarity of functional workflow; rationality of original industrial process	Numerical (1–5)	Field survey; 3D simulation	Reflects technological rationality
	b33 Space scientificity	Scientific/spatial layout score (space continuity, circulation)	Numerical	3D model; field measurement	Measures functional spatial logic
B4 Social & Cultural Value	b41 Crowd memory	Frequency/intensity of local residents’ memory response	Numerical (Likert)	Questionnaire (residents/tourists)	Intangible cultural association
	b42 Identity recognition	Degree of emotional belonging/community identity	Numerical	Questionnaire	Social connection & cultural identity
	b43 Value identification	Assessment of perceived importance of the relic	Numerical	Questionnaire	Social-perception value
B5 Economic Utilization Value	b51 Size of industrial population	Number of historic industrial workers associated with site	Numerical	Local chronicles; statistical documents	Labor scale & historical economic influence
	b52 Sustainable development capability	Capacity for supporting new economic activities (tourism, cultural industry)	Numerical	Industry data; planning documents	Economic sustainability
	b53 Industrial diversification	Diversity of industrial sectors historically associated with the relic	Categorical	Historical industrial records	Indicates multi-functionality & reuse potential

Table 6. Analysis of test set samples.

Actual Label \ Predicted Label	High Value	Medium Value	Low Value
High Value	9 (TP₁)	2 (FN₁ → Medium)	0 (FN₁ → Low)
Medium Value	1 (FP₁)	7 (TP₂)	1 (FN₂)
Low Value	0 (FP₂)	1 (FN₃)	9 (TP₃)

TP (True Positive): positive cases correctly predicted; FP (False Positive): negative cases incorrectly predicted as positive; FN (False Negative): positive cases incorrectly predicted as negative.
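The counts in Table 6 can be turned into per-class precision, recall, and F1 by treating each class in turn as the positive class. The sketch below is illustrative only: the variable and function names are assumptions, and only the nine cell counts come from the table. The overall accuracy on this matrix is 25/30 ≈ 0.833; a macro-F1 computed from this single matrix need not coincide with the cross-validated value reported in Table 7.

```python
labels = ["High", "Medium", "Low"]

# Rows are actual classes, columns are predicted classes (counts from Table 6).
confusion = [
    [9, 2, 0],
    [1, 7, 1],
    [0, 1, 9],
]


def per_class_metrics(cm):
    """Return {class_index: (precision, recall, f1)} using one-vs-rest counts."""
    n = len(cm)
    metrics = {}
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # actual i, predicted otherwise
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[i] = (precision, recall, f1)
    return metrics


total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total

metrics = per_class_metrics(confusion)
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

for i, name in enumerate(labels):
    p, r, f1 = metrics[i]
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Macro averaging weights the three classes equally, so the weaker Medium class pulls the macro-F1 down even though most errors involve only one or two sites.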

Table 7. Performance Comparison Between Baseline and Proposed Models.

Model Type	Method Description	Accuracy	Macro-F1	Macro-AUC
Naïve baseline	Assigns all samples to majority class	0.452	0.301	0.500
Expert judgment baseline	Consensus scoring by 3 experts	0.694	0.612	0.733
Random Forest (this study)	124 sites × 17 indicators, repeated 5-fold CV	0.833	0.812	0.871

Table 8. Data collection type and content.

	B1 Historical Value: Historical Hierarchical Changes					B2 Aesthetic Value: Changes in Aesthetic Hierarchy of Material Art Forms			B3 Scientific and Technological Value: The Replacement of the Value Hierarchy of Technology and Science			B4 Social and Cultural Value: Changes in Social Network Hierarchy			B5 Economic Utilization Value: Comparison of Diversified Heritage Industries
Content of residual value evaluation	b11	b12	b13	b14	b15	b21	b22	b23	b31	b32	b33	b41	b42	b43	b51	b52	b53
	Type of remains	Name/Historical status of remains	Construction year	Important people	Preservation status	The uniqueness of architectural style (size of area occupied)	Surrounding place style	Architectural style	Application of new technology materials	Functional scientificity	Space scientificity	Crowd memory	Identity recognition	Value identification	Size of industrial population	Sustainable development capability	Industrial diversification
	1. Production 2. Life 3. Ecological	1. Very important 2. Generally 3. Not important	1. The first modern construction 2. Early modern times 3. Late modern times		1. Well preserved 2. Partially damaged or modified 3. Completely destroyed	1. Distinct characteristics (uniqueness, timeliness) 2. Some 3. None	1. Chinese style 2. Western style 3. Integration	1. Chinese style 2. Western style 3. Integration	1. Yes 2. Some 3. No	1. No changes have been made 2. Modified 3. Thoroughly changed	1. Reasonable 2. Partly reasonable 3. Unreasonable	1. Deep memory 2. Some memories 3. No memory	1. Strongly agree 2. Somewhat 3. No	1. Strongly agree 2. Somewhat 3. No			1. Most of them 2. Partial 3. No

Table 9. Three-Level Strategy for Industrial Heritage Protection and Reuse.

Value Category	Characteristic Features (Model-Based)	Recommended Policy Actions
High Value	High integrity; technological authenticity; strong architectural identity	Strict protection; ≤15% new materials; museum-type reuse; priority listing
Medium Value	Moderate integrity; functional adaptability; industry relevance	30–50% renovation; creative industry reuse; community functions
Low Value	Low integrity; severe deterioration	Symbolic retention; land redevelopment; ecological/public service uses

Table 10. Two Types of Township Revitalization Paths.

Urban-Type Townships (“Functional Embeddedness”)	Marine-Industry Townships (“Heritage–Industry Integration”)
Embedded within expanding cities; reuse supports land-shortage relief; transformation into cultural facilities, housing, or community services.	Linked to marine engineering clusters; reuse supports R&D bases, industrial workforce communities, and coastal ecological corridors.


Share and Cite

MDPI and ACS Style

Meng, X.; Chang, J.; Liu, X.; Zhuang, F. Classifying the Reuse Value of Industrial Heritage Sites Using Random Forest: A Case Study of Jiangsu’s Salt Reclamation Zone. Buildings 2026, 16, 796. https://doi.org/10.3390/buildings16040796

