2. Literature Review
Port smartness is a comprehensive measure of how far a port has progressed in becoming smart, reflecting the maturity and efficiency of its smartness initiatives. It serves as a core indicator of the progress and outcomes of those efforts. Existing literature on the measurement and evaluation of port smartness focuses primarily on the construction of indicator systems and the innovation of evaluation methodologies. For instance, Molavi et al. (2020) proposed the Smart Port Index (SPI) [1], while Philipp (2020) developed the Port Digital Readiness Index [2]. However, the former assesses smart ports along only three dimensions, and the latter emphasizes digital infrastructure and technological readiness, so neither fully captures the multidimensional characteristics of port smartness. In China, Cai Wenxue and Zheng Jichuan (2019) developed an indicator-based evaluation framework and applied the Analytic Hierarchy Process (AHP) combined with fuzzy comprehensive evaluation, but the manual weighting makes the results heavily dependent on expert judgment [3]. Cao Jie et al. (2021) employed an improved matter-element theory based on cloud models to evaluate Tianjin Port, yet the limited sample restricts cross-comparison among coastal ports [4]. Zheng Zhong and Li Hongliang (2022) used AHP to evaluate 14 international ports, including Hamburg Port, broadening the comparative perspective internationally; however, their evaluation system still relies heavily on static weight assignments and therefore fails to reflect the dynamic evolution of port smartness [5]. Luo Bencheng et al. (2023) proposed a grey entropy change-weight evaluation model and conducted an empirical analysis of the smartness levels of China’s top 13 ports by container throughput, but their sample selection remains throughput-driven, potentially underestimating differences in smart transformation among small and medium-sized ports [6]. Zhu Jishuang et al. (2025) broke away from the “throughput-centric” paradigm by establishing a multidimensional comprehensive evaluation system for world-class ports and applied AHP and fuzzy comprehensive evaluation to assess 34 global ports [7]. Zhou et al. (2024) based their assessment on indicators from the “Evaluation Index System for Smart Ports”, using correlation calculation and entropy weighting, though the static entropy method, applied period by period, yields time-varying weights that may still undermine intertemporal comparability [8]. Thus, although research on port smartness evaluation continues to evolve, challenges remain, including inconsistent indicator systems, static evaluation methods, and insufficient cross-port comparability.
Data elements, with data as their carrier, are a new type of production factor that, after collection, processing, circulation, and application, can participate in production and operation activities, create economic value, and improve total factor productivity. Li Zhiguo and Wang Jie (2021), as well as He Wei et al. (2024), constructed indicator systems for data elements and used principal component analysis and the entropy weight method to measure the development level of data elements; however, their indicators focused primarily on digital infrastructure and macro-level allocation environments [9,10]. Huang (2025) combined indicator-system approaches with text analysis to assess China’s data element levels at both the provincial (macro) and enterprise (micro) levels [11]. Pan Hongliang (2025) established an evaluation framework for data element development based on three dimensions (data foundational support, data capability transformation, and industry application) and measured the domestic development level of data elements [12]. Although this study began to emphasize the transformation process of data elements, its “capability” dimension emphasized digital technological capabilities rather than data supply quality or data service capacity [12]. Chen Rongda (2025) measured the marketization level of data elements across Chinese provinces from a macro perspective, focusing on data supply, circulation, and utilization [13]. Wu Jie and Chen Hongzhao (2025) developed an evaluation system for data element development from an input-output perspective, systematically depicting the processes of data input, transmission, processing, support, and value realization [14]. Nevertheless, their approach still leaned toward macro-level statistical and spatiotemporal evolution analysis and offered little explanation of how data elements integrate into port smartness construction [14]. In summary, existing research has measured the development level of data elements at regional, urban, and enterprise levels, highlighting the importance of data infrastructure, technological supply, data circulation, and application scenarios. Within the port sector, however, the development level of data elements is often reflected only in the data environment and digital foundation of the regions where ports are located. Whether these elements can be transformed into port smartness capabilities depends further on port-specific business contexts, data governance mechanisms, and organizational absorption capacity.
With their rapid development, data elements have gradually emerged as the core driving force in the smartness process of coastal ports. A growing body of research has explored the role of data elements and associated digital technologies in enhancing port smartness. Sun Yu and Wang Pei (2019) emphasized that the key to applying unmanned technology lies in precise data collection and efficient utilization; however, their research focused primarily on individual technological applications and did not comprehensively analyze the overall impact of data elements on port smartness [15]. Paulauskas et al. (2021) argued that port digitization levels are influenced by multiple internal factors, with small and medium-sized ports lagging significantly behind larger ones in digital maturity [16]. Tang Hao et al. (2021) developed a 5G-based information platform for Zhanjiang Port’s smart port system and proposed optimization strategies such as establishing databases, planning and constructing port platforms, and building information perception networks [17]. Min (2022) highlighted the roles of the Internet of Things and big data in port management, but the analysis focused mainly on management architecture and technical applications without incorporating data as an independent production factor into the analytical framework [18]. Deng Yuyong et al. (2022) analyzed 15 major Chinese ports and concluded that smart port development has significantly improved total factor productivity and overall efficiency through technological advancement, although current technical and scale efficiency still require improvement [19]. Du Xinke (2023) and Ma Lanqing et al. (2024) examined the application of artificial intelligence and sensing technologies, respectively, in smart port development, strengthening the focus on key technologies, yet they gave insufficient attention to the supply, circulation, and value transformation of data elements [20,21]. Hua Jiang et al. (2025), using Jiangyin Port as a case study, proposed that smart port development should be data-centric [22], while Cai Hanyi (2026) discussed the significant role of big data technology in intelligent port logistics [23]; however, both studies focused primarily on practical pathways and management strategies and lacked quantitative validation across multiple regions and years. These findings indicate that digital technologies and data resources play a crucial role in improving port efficiency, yet the systematic relationships, nonlinear characteristics, and regional heterogeneity linking regional data element development to port smartness enhancement remain insufficiently discussed. Particularly in China’s coastal regions, differences in data element development levels, port scales, hinterland industries, and governance conditions across provinces and cities may lead to varying efficiencies in transforming data elements into port smartness.
In summary, existing research still suffers from four major shortcomings. First, lack of universality: current port smartness evaluation systems employ different standards, with significant variations in indicator dimensions and weighting methods, resulting in insufficient horizontal comparability across studies. Second, lack of full-chain coverage: according to the lifecycle theory of data elements, data as a production factor must pass through a complete “base-supply-flow-use” chain. Most existing studies focus only on infrastructure or macro-level applications, whereas this paper, to the best of our knowledge, is the first to incorporate all four stages simultaneously into port smartness analysis. Third, lack of dynamism: most studies use static evaluation methods such as the entropy weight method or fuzzy comprehensive evaluation to measure the development level of data elements and port smartness, making it difficult to capture the temporal evolution of indicators. Fourth, lack of mechanisms for identifying associations: existing approaches mostly rely on linear weighted models or static evaluation models that implicitly assume stable marginal contributions of each indicator. This makes it difficult to identify threshold, synergistic, and saturation effects in how data elements influence port smartness, and such approaches rarely consider the possibility that highly smart ports may, in turn, promote improvements in data collection and application capabilities. The random forest algorithm is less sensitive than traditional linear models to functional-form misspecification and multicollinearity, making it well suited to analyzing complex real-world scenarios involving interactions among multiple factors.
Given this, this paper constructs an indicator system for measuring coastal port smartness and the development level of data elements, employs the VHSD-EM model to assess these dimensions across coastal regions, analyzes their temporal and spatial evolution, and integrates Random Forest algorithms with partial-effect models to examine the predictive contribution and average marginal response of data elements to port smartness. This provides practical guidance and empirical support for enhancing the smartness of China’s coastal ports. The main contribution of this study lies in offering an empirical diagnostic framework that captures the relationship between data elements and port smartness for China’s coastal ports, rather than proposing strict causal identification or novel machine learning algorithms. Specifically, the combined use of VHSD-EM, Random Forest, and partial-effect analysis provides complementary evidence on index construction, predictive association, and nonlinear response patterns while avoiding causal claims beyond the capacity of the data. Although the empirical setting is China, the framework also offers a transferable reference for other port systems seeking to assess whether regional data-element environments are effectively converted into port smartness performance.
3. Mechanism Analysis of Data Elements Driving the Enhancement of Coastal Port Smartness
The development of smart ports is a critical strategy for responding to global trade demands, enhancing national competitiveness, and promoting economic and social progress. It has become an important driver of efficiency improvement, cost reduction, and competitive advantage.
From a socio-technical and ecosystem perspective, the smartness of coastal ports is not determined solely by automation equipment or digital technologies. Rather, it results from the coordinated transformation of digital infrastructure, operational processes, governance mechanisms, and organizational capabilities. Digital infrastructure provides the foundation for data sharing, platform interconnection, and collaborative innovation. Socio-technical transformation theory emphasizes that technological upgrading must evolve alongside user practices, institutional arrangements, industrial networks, and governance structures. The digital platform ecosystem perspective further suggests that ports are multi-actor collaborative systems involving terminal operators, shipping companies, customs authorities, logistics providers, financial institutions, and public management agencies. Their value creation depends on data sharing, platform interoperability, and rule-based coordination. Therefore, data elements enhance port smartness not only by improving technical efficiency but also by reshaping governance models, inter-organizational collaboration, and operational performance.
The advancement of port smartness is a socio-technical process driven by data elements. However, data elements do not automatically translate into higher levels of smartness. Their value realization depends on the synergistic interaction among technological infrastructure, governance mechanisms, and human and organizational support. Accordingly, the WSR framework provides a suitable analytical lens for explaining the formation mechanism of port smartness.
The data-driven enhancement of coastal port smartness follows a progressive mechanism of factor reconstruction, chain empowerment, and system transition. Specifically, factor reconstruction explains the micro-level embedding of data elements into port production functions; chain empowerment reveals the meso-level circulation and value creation of data elements across the port value chain; and system transition captures the macro-level transformation of data elements into smart port capabilities through the coordinated evolution of the Wuli-Shili-Renli system. The mechanism of data elements driving smartness enhancement in coastal ports is presented in Figure 1.
At the factor reconstruction stage, data elements are embedded into the port production function as a new production factor, reshaping the structure of production inputs. The traditional port production function can be expressed as $Y = A \cdot F(L, K)$, where $Y$ denotes port output, $L$ labor input, $K$ capital input, and $A$ total factor productivity. With the introduction of data elements, the function can be reconstructed as $Y = A_D \cdot F(L_D, K_D, D)$, where $D$ represents data elements as an independent production factor, $L_D$ denotes data-empowered labor input, $K_D$ denotes data-empowered capital input, and $A_D$ refers to data-enhanced total factor productivity. These terms represent different mechanisms of data empowerment: data improve labor skills and decision-making capabilities, optimize capital allocation and equipment utilization, directly participate in production as an independent input, and enhance total factor productivity through integrated digital technologies.
At the chain empowerment stage, data elements permeate the entire base-supply-flow-use value chain. In the foundation support layer, digital infrastructure is established to enable human–machine-object interconnection. In the technology supply layer, production and operation are smart-enabled through the dual drive of mechanisms and data. In the circulation operation layer, data barriers among stakeholders are reduced to facilitate collaborative sharing. In the value transformation layer, the multiplier effect of data is activated to support new business models and service innovation. These four layers progress hierarchically and form a closed-loop empowerment process from infrastructure construction to value realization.
At the system transition stage, the effects generated through factor reconstruction and chain empowerment are integrated into the WSR system. The Wuli dimension represents the technological and material foundation of port smartness, including digital infrastructure, data collection systems, intelligent equipment, communication networks, and integrated digital platforms. It provides the basic conditions for data acquisition, transmission, processing, and application. The Shili dimension reflects the institutional and managerial logic through which technical resources are transformed into operational efficiency and governance capacity, including governance rules, process integration, collaboration mechanisms, platform management, risk control, and decision-making processes. The Renli dimension emphasizes the human-centered and organizational foundation of port intelligence, including managerial cognition, digital skills, organizational learning, stakeholder collaboration, and cross-departmental cooperation.
Through the coordinated interaction of Wuli, Shili, and Renli, data elements are transformed from technical resources into system-level capabilities for enhancing port smartness. This process systematically improves the smartness of port infrastructure, operational service efficiency, and talent-driven innovation capacity. Moreover, the empowerment effect on port smartness is strengthened by multiple supporting factors, generating a synergistic amplification effect in which the whole is greater than the sum of its parts. As a result, coastal port smartness evolves from quantitative accumulation toward qualitative transformation, forming a dynamic coupling mechanism between data elements and supporting conditions.
4. Research Methods
To avoid methodological over-complexity, the three analytical components are assigned distinct and limited roles: VHSD-EM constructs comparable CPSI and DEDI indices, Random Forest identifies predictive contributions under nonlinear conditions, and partial-effect analysis visualizes the model-predicted response intervals. The framework is therefore used for empirical diagnosis and policy interpretation of port smartness enhancement, not for claiming algorithmic novelty or establishing causal identification.
4.1. The VHSD-EM Model
The VHSD-EM model is a dynamic comprehensive evaluation approach that integrates the Vertical and Horizontal Scatter Degree (VHSD) model with the Entropy Method (EM). Compared with the traditional analytic hierarchy process (AHP), the VHSD-EM model reduces the subjectivity associated with expert-based weighting. Unlike the standalone entropy method, it considers not only the dispersion of indicator information but also the vertical and horizontal scatter degree, thereby incorporating both temporal dynamics and cross-sectional heterogeneity into the weighting process and improving the dynamic comparability of panel evaluation results. In contrast to the standalone VHSD model, VHSD-EM further leverages the entropy method’s capacity to capture the information content of indicators, thus avoiding an excessive reliance on inter-object differences that would neglect the intrinsic informational contribution of the indicators. A comparison of the evaluation methods is presented in Table 1.
Given that the evaluation of coastal port smartness and data element development involves both temporal evolution and spatial differentiation, the heterogeneity arising from spatiotemporal factors must be adequately addressed. Therefore, this study constructs the VHSD-EM model by integrating the two methods, generating composite weights that balance temporal dynamics, cross-sectional differentiation, and information completeness. The proposed model provides a more transparent and comparable diagnostic basis for measuring and evaluating coastal port smartness and the development level of data elements.
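(1) Basic principle of the VHSD model. Let the smartness of coastal ports be denoted as $S$. The core formula is as follows:

$$S_j(t) = \sum_{i=1}^{I} w_i \, x_{ij}(t) \qquad (1)$$

In Formula (1), $S_j(t)$ represents the comprehensive evaluation score of the $j$-th evaluation object (port) in period $t$; $w_i$ denotes the weight of the $i$-th indicator; and $x_{ij}(t)$ is the standardized value of the original data for the $i$-th indicator of the $j$-th port in period $t$, with $i = 1, \dots, I$, $j = 1, \dots, J$, and $t = 1, \dots, T$.

The determination of indicator weights adheres to the principle of “maximizing inter-object differentiation”. This differentiation is measured by the total sum of squared deviations (TSSD), expressed as:

$$\sigma^2 = \sum_{t=1}^{T} \sum_{j=1}^{J} \left( S_j(t) - \bar{S} \right)^2 = W^{\mathsf{T}} H W \qquad (2)$$

In Formula (2), $\bar{S}$ is the mean score over all ports and periods (zero for mean-centered standardized data, which gives the second equality), $W = (w_1, \dots, w_I)^{\mathsf{T}}$ is the weight vector, and $H$ is a symmetric matrix of order $I$. Let $H_t = X_t^{\mathsf{T}} X_t$, where $X_t = [x_{ij}(t)]_{J \times I}$ is the standardized data matrix of period $t$; then we have $H = \sum_{t=1}^{T} H_t$.

To satisfy the basic constraint of indicator weights, $W^{\mathsf{T}} W = 1$ and $W > 0$, the maximization of the objective function in Equation (2) is transformed into the following optimization problem:

$$\max_{W} \; W^{\mathsf{T}} H W \quad \text{s.t.} \quad W^{\mathsf{T}} W = 1, \; W > 0 \qquad (3)$$

The solution $W$ is the eigenvector corresponding to the largest eigenvalue of matrix $H$. Finally, the comprehensive evaluation score for each port at each time point is calculated using Equation (1).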
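(2) Basic principle of the EM model. Firstly, the same standardization process is applied to obtain normalized indicator values $x_{ij}(t)$ for indicator $i$, port $j$, and period $t$. Information entropy is then used to determine indicator weights $w_i^{E}(t)$, with the core formula:

$$w_i^{E}(t) = \frac{1 - e_i(t)}{\sum_{k=1}^{I} \left[ 1 - e_k(t) \right]} \qquad (4)$$

$$e_i(t) = -\frac{1}{\ln J} \sum_{j=1}^{J} p_{ij}(t) \ln p_{ij}(t), \qquad p_{ij}(t) = \frac{x_{ij}(t)}{\sum_{l=1}^{J} x_{il}(t)} \qquad (5)$$

Here, $e_i(t)$ denotes the information entropy of the $i$-th indicator, $p_{ij}(t)$ represents the normalized result, $I$ is the number of indicators, and $J$ is the number of ports. Finally, the comprehensive evaluation score is calculated according to Equation (1).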
(3) Construction of the VHSD-EM model. The final weight is derived by taking the arithmetic mean of the weights obtained from Equations (3) and (4):

$$w_i^{*}(t) = \frac{1}{2} \left[ w_i^{V} + w_i^{E}(t) \right] \qquad (6)$$

In Formula (6), $w_i^{*}(t)$ represents the weight of the $i$-th indicator for coastal port smartness in period $t$, as calculated by the VHSD-EM model; $w_i^{V}$ is the VHSD weight obtained from Equation (3), and $w_i^{E}(t)$ is the entropy weight obtained from Equation (4). The comprehensive evaluation score is then obtained via linear weighting aggregation (layer-by-layer summation). The calculation process for the data element development level follows the same logic.
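The three steps above (VHSD weights, entropy weights, and their combination) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. Several choices are assumptions made for the example: the `(periods, ports, indicators)` array layout, rescaling the leading eigenvector so that the weights sum to one, computing entropy weights separately for each period, the `alpha` blending parameter (0.5 reproduces the arithmetic mean), and the synthetic data. Layer-by-layer aggregation over secondary and tertiary indicators is omitted.

```python
import numpy as np

def vhsd_weights(panel):
    """VHSD weights from a (T, J, I) panel: periods x ports x indicators."""
    # Accumulate the symmetric I x I matrix H = sum_t X_t^T X_t.
    H = sum(X_t.T @ X_t for X_t in panel)
    eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    w = w if w.sum() >= 0 else -w          # eigenvectors are sign-ambiguous
    return w / w.sum()                     # assumption: rescale to sum to one

def entropy_weights(X):
    """Entropy weights from a (J, I) matrix of non-negative values (one period)."""
    P = X / X.sum(axis=0)                  # each port's share of every indicator
    logP = np.zeros_like(P)
    logP[P > 0] = np.log(P[P > 0])         # convention: 0 * ln(0) = 0
    e = -(P * logP).sum(axis=0) / np.log(X.shape[0])
    d = 1.0 - e                            # low entropy -> more informative indicator
    return d / d.sum()

def composite_scores(panel, alpha=0.5):
    """Blend VHSD and per-period entropy weights, then aggregate linearly."""
    w_v = vhsd_weights(panel)
    scores = np.empty(panel.shape[:2])
    for t, X_t in enumerate(panel):
        w_t = alpha * w_v + (1 - alpha) * entropy_weights(X_t)
        scores[t] = X_t @ w_t              # one score per port in period t
    return scores

# Hypothetical data: 8 years, 8 ports, 5 indicators scaled to [0, 1).
rng = np.random.default_rng(0)
panel = rng.random((8, 8, 5))
w_vhsd = vhsd_weights(panel)
scores = composite_scores(panel)
```

With strictly positive data, as in this example, the leading eigenvector has entries of a single sign, so the rescaled VHSD weights are positive. With mean-centered data this is not guaranteed, and the positivity constraint would need separate handling.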
In addition, this study employs the Spearman correlation test to assess the consistency between the measurement results of the VHSD model and the EM model. It should be noted that this test only indicates the consistency of rankings derived from the two weighting methods, and cannot independently verify the external validity of the CPSI or DEDI indicator systems. Therefore, this paper further supplements and validates the measurement results by combining ranking stability comparison and model robustness tests.
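As a minimal sketch of the consistency check described above, the snippet below computes the Spearman rank correlation between two sets of scores for a single year using `scipy.stats.spearmanr`. The score values are hypothetical and serve only to show the mechanics; they are not results from this study.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical scores for five ports in one year under each weighting method.
scores_vhsd = np.array([0.62, 0.48, 0.71, 0.35, 0.55])
scores_em = np.array([0.60, 0.56, 0.69, 0.30, 0.52])

# spearmanr ranks both inputs and correlates the ranks.
rho, p_value = spearmanr(scores_vhsd, scores_em)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```

A coefficient close to one indicates that the two methods order the ports almost identically. It says nothing about whether either ordering is externally valid, which is why the additional stability and robustness checks are needed.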
4.2. Random Forest Algorithm
In this study, the random forest model was implemented using the Python programming language, with model construction and computation specifically carried out via the Scikit-learn machine learning library. The random forest was utilized to identify the relative predictive importance of different feature variables in CPSI prediction and to capture potential nonlinear relationships between variables. It should be clarified that the results derived from the random forest reflect the variable importance in the context of model prediction.
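(1) Sample splitting and node optimization. For a given feature variable $j$ and its threshold split point $s$, the sample set is split into two subsets such that the residual sum of squares (RSS) of the target values is minimized:

$$\min_{j,\,s} \left[ \sum_{x_i \in R_1(j,s)} \left( y_i - \bar{y}_1 \right)^2 + \sum_{x_i \in R_2(j,s)} \left( y_i - \bar{y}_2 \right)^2 \right] \qquad (7)$$

In Formula (7), $x_i$ is the feature vector of the $i$-th sample, $y_i$ is its target value, and $\bar{y}_1$ and $\bar{y}_2$ denote the mean target values of the two subsets after splitting, calculated as:

$$\bar{y}_m = \frac{1}{|R_m(j,s)|} \sum_{x_i \in R_m(j,s)} y_i, \qquad m = 1, 2 \qquad (8)$$

The subsets $R_1$ and $R_2$ are defined as:

$$R_1(j,s) = \{ x_i \mid x_{ij} \le s \}, \qquad R_2(j,s) = \{ x_i \mid x_{ij} > s \} \qquad (9)$$

where $x_{ij}$ is the value of feature $j$ for sample $i$. For each decision tree in the random forest, the splitting process (7)–(9) is repeated to iteratively select the feature variable and threshold that minimize RSS, and splitting stops when a preset stopping criterion is satisfied. Each tree is grown on a bootstrap sample of the dataset (random sampling with replacement), with a random subset of feature variables considered at each split node. The forest outputs the predicted target value by averaging the predictions across trees.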
(2) Parameter tuning and overfitting mitigation. To mitigate the risk of overfitting under small-sample conditions, this study adopts a combined approach of grid search and cross-validation for parameter tuning. The parameter settings and performance evaluation results of the random forest model are presented in Table 2.
(3) Model training, contribution rate calculation, and performance evaluation. The model with the optimal parameter combination obtained through hyperparameter tuning is trained on the sample data, and the contribution rate of each feature variable is calculated. In addition, Table 3 reports model performance metrics such as the cross-validated R² (CV_R²), RMSE, and MAE and compares them with benchmark models to enhance transparency in result interpretation.
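The tuning and training steps above can be sketched with Scikit-learn's `GridSearchCV`. The parameter grid, the five-fold split, and the synthetic 64-row dataset (a stand-in for eight ports over eight years) are illustrative assumptions; the settings actually used in the study are those reported in Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Hypothetical data: 64 observations, 4 features, nonlinear target.
rng = np.random.default_rng(42)
X = rng.random((64, 4))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 64)

# Illustrative grid; shallow trees and larger leaves limit overfitting.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 4],
    "min_samples_leaf": [2, 4],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X, y)  # refits the best configuration on the full sample

best_params = search.best_params_
cv_r2 = search.best_score_                                  # mean cross-validated R^2
importances = search.best_estimator_.feature_importances_  # contribution rates
```

For panel data, a plain shuffled K-fold can place observations of the same port in both training and validation folds. A grouped or time-aware split may give a more conservative estimate, which is a design choice this sketch does not settle.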
In terms of predictive performance, the Random Forest model achieves accuracy comparable to that of the panel fixed-effects model and outperforms ordinary least squares (OLS), the regularized regression methods (Ridge and Lasso), and the Gradient Boosting algorithm. It also yields feature contribution rankings that account for nonlinear relationships, which aids the interpretation of feature importance in a nonlinear modeling framework.
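A benchmark comparison of this kind might be set up as follows, evaluating every model on the same folds with `cross_validate`. The data, hyperparameters, and fold design are assumptions for illustration, and the panel fixed-effects model is left out because Scikit-learn does not provide one. The snippet shows the procedure only and is not meant to reproduce the rankings in Table 3.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Hypothetical data with a nonlinear signal.
rng = np.random.default_rng(1)
X = rng.random((64, 4))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 64)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same folds for all models
scoring = {
    "r2": "r2",
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
}

results = {}
for name, model in models.items():
    out = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "CV_R2": out["test_r2"].mean(),
        "RMSE": -out["test_rmse"].mean(),  # scorers return negated errors
        "MAE": -out["test_mae"].mean(),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```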
The basic principle of the random forest and the workflow for calculating feature variable contribution rates are illustrated in Figure 2.
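To make the node-splitting rule concrete, the sketch below performs the exhaustive search for the feature and threshold that minimize the residual sum of squares at a single node. It is a didactic simplification: production implementations such as Scikit-learn's use optimized routines and consider only a random subset of features per node. The function name `best_split` and the toy data are hypothetical.

```python
import numpy as np

def best_split(X, y):
    """Return (feature index, threshold, RSS) of the RSS-minimizing split."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for s in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= s
            y1, y2 = y[left], y[~left]
            rss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
            if rss < best[2]:
                best = (j, s, rss)
    return best

# Toy node: the target jumps when the first feature exceeds 2.5.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
feature, threshold, rss = best_split(X, y)
print(feature, threshold, rss)  # prints: 0 2.5 0.0
```

A tree applies this search recursively to each resulting subset until a stopping criterion (for example a maximum depth or minimum leaf size) is met.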
4.3. Partial Effect Model
In model analyses involving multiple independent variables, partial effect models can be employed to examine the average marginal response of a specific independent variable on the predicted value of the dependent variable. Partial effect plots, by marginalizing other variables, illustrate the average change in the model-predicted outcome variable as a single variable or a combination of two variables varies. It should be emphasized that partial effect plots are used to identify nonlinear trends, threshold intervals, and interaction relationships, rather than to establish strict causal identification.
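Let the set of independent variables be denoted as $X = \{x_1, x_2, \dots, x_p\}$, and let $f$ be the fitted prediction function. When examining the average marginal response of a specific independent variable $x_s$ on the predicted value of the dependent variable $Y$, we define $X_C$ as the subset of independent variables excluding $x_s$, satisfying $\{x_s\} \cup X_C = X$. The partial effect of $x_s$ on $Y$ is then expressed as:

$$f_s(x_s) = E_{X_C}\left[ f(x_s, X_C) \right] = \int f(x_s, X_C) \, p(X_C) \, dX_C \qquad (10)$$

where $f_s(x_s)$ denotes the partial effect function, $p(X_C)$ is the marginal density of $X_C$, and $E_{X_C}\left[ f(x_s, X_C) \right]$ represents the expected value of the dependent variable $Y$ corresponding to different values of $x_s$.

We further extend this concept to a three-dimensional scenario to analyze the joint partial effect of two target independent variables, $x_s$ and $x_r$, on $Y$. Let $X_{C'}$ denote the subset of independent variables excluding $x_s$ and $x_r$, satisfying $\{x_s, x_r\} \cup X_{C'} = X$. The joint partial effect of $x_s$ and $x_r$ on $Y$ is given by:

$$f_{s,r}(x_s, x_r) = E_{X_{C'}}\left[ f(x_s, x_r, X_{C'}) \right] = \int f(x_s, x_r, X_{C'}) \, p(X_{C'}) \, dX_{C'} \qquad (11)$$

where $f_{s,r}(x_s, x_r)$ denotes the joint partial effect function, and $E_{X_{C'}}\left[ f(x_s, x_r, X_{C'}) \right]$ represents the conditional expectation of $Y$ corresponding to different values of $x_s$ and $x_r$.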
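In practice the expectations in the partial effect definitions are approximated by averaging model predictions over the observed sample while the target variable (or pair of variables) is held at each grid value. The sketch below does this by hand for a fitted random forest. The helper names, the grid, and the synthetic data are assumptions for illustration; Scikit-learn also ships `sklearn.inspection.partial_dependence`, which offers a built-in alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_effect(model, X, feature, grid):
    """Mean prediction with column `feature` fixed at each grid value."""
    effect = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v               # overwrite the target variable
        effect.append(model.predict(Xv).mean())  # average over the others
    return np.array(effect)

def joint_partial_effect(model, X, f1, f2, grid1, grid2):
    """Mean prediction surface with two columns fixed at grid values."""
    surface = np.empty((len(grid1), len(grid2)))
    for a, v1 in enumerate(grid1):
        for b, v2 in enumerate(grid2):
            Xv = X.copy()
            Xv[:, f1] = v1
            Xv[:, f2] = v2
            surface[a, b] = model.predict(Xv).mean()
    return surface

# Hypothetical data: the response rises with feature 0 and oscillates in feature 1.
rng = np.random.default_rng(0)
X = rng.random((64, 3))
y = 2.0 * X[:, 0] + np.sin(4 * X[:, 1]) + rng.normal(0, 0.05, 64)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

grid = np.linspace(0.1, 0.9, 9)          # uniform grid points
pe = partial_effect(model, X, 0, grid)
surface = joint_partial_effect(model, X, 0, 1, grid, grid)
```

The resulting curve and surface describe how the model's average prediction responds to the chosen variables. They summarize the fitted model and do not by themselves identify causal effects.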
To enhance the statistical interpretability of partial effect results, this study employs the bootstrap resampling method to construct 95% uncertainty intervals. Specifically, repeated sampling with replacement is performed on the original sample; in each iteration, the random forest model is retrained, and partial effect values are calculated at uniform grid points. The 95% uncertainty interval is then constructed using the 2.5th and 97.5th percentiles of the resampled partial effect distributions. For univariate partial effect plots, the uncertainty interval is visualized as a shaded area; for bivariate joint partial effect plots, contour lines and sample location markers are incorporated to facilitate result interpretation. The parameter settings of the partial effect model are presented in Table 4.
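The bootstrap procedure can be sketched as follows. The number of resamples, the forest size, the grid, and the choice to average predictions over each resampled dataset (rather than the original sample) are assumptions made for this example; the settings used in the study are those listed in Table 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def bootstrap_partial_effect(X, y, feature, grid, n_boot=30, seed=0):
    """Percentile-bootstrap band for a univariate partial effect curve."""
    rng = np.random.default_rng(seed)
    n = len(y)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # sample rows with replacement
        Xb, yb = X[idx], y[idx]
        model = RandomForestRegressor(n_estimators=30, random_state=b).fit(Xb, yb)
        for g, v in enumerate(grid):
            Xv = Xb.copy()
            Xv[:, feature] = v
            curves[b, g] = model.predict(Xv).mean()
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)
    return curves.mean(axis=0), lower, upper

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
X = rng.random((64, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 64)
grid = np.linspace(0.1, 0.9, 9)
mean_curve, lower, upper = bootstrap_partial_effect(X, y, 0, grid)
```

The `lower` and `upper` arrays give the band that would be drawn as the shaded area around the univariate curve. More resamples give smoother percentile estimates at a proportional computational cost.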
5. Evaluation of the Smartness of Coastal Ports and the Development of Data Elements
5.1. Evaluation Index System for the Smartness of Coastal Ports and Data Element Development
Based on the WSR systems methodology, this study constructs an evaluation index system for coastal port smartness from three dimensions: the Wuli layer, the Shili layer, and the Renli layer. Drawing on insights from existing literature [3], eight secondary indicators and 23 tertiary indicators are selected, as presented in Table 5.
The full industrial chain of data elements refers to the complete industrial ecosystem that encompasses the entire process from the generation of data (as a new production factor) to the ultimate realization of its value. This chain can be summarized into four core links: “base-supply-flow-use”. Based on the full industrial chain of data elements, this paper constructs an evaluation index system for the development level of data elements. Drawing on the approaches of Pan Hongliang [12] and Chen Rongda [13], eight secondary indicators and 22 tertiary indicators are selected, as presented in Table 6.
The DEDI indicators are selected as observable proxies for the regional data-element environment rather than as direct port-operational variables. Indicators such as the number of high-tech zones and digital-economy policies reflect the institutional and industrial supply of data-related resources; optical cable length and telecommunications switch capacity capture the infrastructure foundation for data transmission and connectivity. These provincial-level indicators may influence port smartness indirectly by shaping the availability, circulation, and application capacity of data resources in the surrounding logistics and industrial ecosystem.
5.2. Data Sources
Since port smartness and data elements are emerging concepts proposed in recent years, related research in China remains at an early stage. As a result, the continuity and comparability of relevant statistical data are still subject to certain limitations. Accordingly, this paper selects seven coastal provinces/municipalities and eight coastal ports over the period 2017–2024 as the research sample, based primarily on the following considerations:
First, the selected ports cover the major coastal port regions in China, including Northeast China, North China, East China, and South China, thus ensuring regional representativeness. Second, these ports play an important role in terms of cargo throughput, container transportation, regional economic linkages, and port infrastructure. Third, since this paper constructs both the port-level CPSI and the provincial-level DEDI, the availability of continuous panel data and the consistency of statistical standards constitute key constraints in sample selection. Fourth, Guangzhou Port and Zhuhai Port are both included in the Guangdong sample because they differ markedly in port scale, functional positioning, and port smartness performance. This enables a more nuanced examination of heterogeneity in port smartness under the same provincial data-element environment.
For port smartness data, primary sources include China port yearbooks, annual reports of listed port companies, policy documents, and survey data released by government departments and authoritative institutions such as the Ministry of Transport, the China Ports Association, and shipping exchanges. Missing values encountered during data collection were filled using multiple imputation.
For the data element indicators, data were retrieved from the China Statistical Yearbook, provincial statistical yearbooks, the China Science and Technology Statistical Yearbook, and the National Intellectual Property Administration. The counts of data-related regulations/standards and data security documents were obtained via web scraping in Python 3.11.0, with PyCharm 2024.1 as the development environment for data processing and analysis. Missing values in these datasets were addressed using linear interpolation.
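Linear interpolation of a gap in an annual series can be illustrated with `numpy.interp`. The series below is hypothetical and the snippet is a sketch of the general technique, not the authors' processing script.

```python
import numpy as np

years = np.arange(2017, 2025)  # 2017..2024
# Hypothetical indicator values with three missing years.
values = np.array([10.0, np.nan, 14.0, 15.0, np.nan, np.nan, 21.0, 22.0])

known = ~np.isnan(values)
filled = values.copy()
# Interpolate each missing year on the straight line between its known neighbors.
filled[~known] = np.interp(years[~known], years[known], values[known])
print(filled)  # 2018 -> 12.0, 2021 -> 17.0, 2022 -> 19.0
```

Note that `numpy.interp` does not extrapolate: a gap at the start or end of the series would be filled with the nearest known value, so edge gaps need a separate decision.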
The use of provincial-level DEDI indicators is mainly constrained by data availability and the fact that ports are embedded in wider regional economic and governance systems. Nevertheless, this scale mismatch between provincial data-element indicators and port-level smartness indicators remains a limitation, and the empirical results should be interpreted as regional association evidence rather than direct port-level causal effects.
5.3. Sustainability Implications: Linking CPSI with the SDGs
From the perspective of sustainable development, the CPSI indicator system constructed in this paper aligns closely with the United Nations Sustainable Development Goals (SDGs), as shown in Table 7. Within the Wuli layer, indicators such as intelligent facilities, automated terminals, big data platforms, and informatization and paperless operations correspond to SDG 9, embodying the upgrading of port infrastructure and technological innovation. Green and low-carbon indicators align with SDG 13 and SDG 12, reflecting the port’s performance in energy conservation, emission reduction, and low-carbon operations. In the Shili layer, indicators related to logistics efficiency, operational effectiveness, and service quality are associated with SDG 9, SDG 11, and SDG 17, demonstrating the port’s supporting role in regional supply chain efficiency, urban logistics systems, and trade connectivity. In the Renli layer, indicators of talent teams and innovation-driven development correspond to SDG 4, SDG 8, and SDG 9, reflecting the human capital and innovation foundation required for the transformation of smart ports. Thus, the CPSI functions not only as an evaluation tool for the level of port smartness but also as a comprehensive indicator for observing the sustainable transformation capacity of port systems.
5.4. Robustness Analysis
This study calculates the annual CPSI for coastal ports and the annual DEDI for provinces using the VHSD-EM model. The results of the Spearman test for the VHSD and EM models are presented in
Table 8.
According to the Spearman test results, the rankings derived from the vertical-horizontal grading method and the information entropy weighting method are significantly correlated at the 5% level in every year, with some years reaching the 1% significance threshold. Additionally, the correlation coefficients approach 1.000, indicating that the two methods rank the samples almost identically and are therefore highly consistent.
To examine the robustness of the composite weighting scheme, a weight sensitivity analysis was further conducted. Specifically, α was set to 0.25, 0.50, and 0.75, where α denotes the relative contribution of VHSD weights and 1 − α denotes that of EM weights. The composite weight was recalculated as $W_{\alpha} = \alpha \times W_{\mathrm{VHSD}} + (1 - \alpha) \times W_{\mathrm{EM}}$. The Spearman rank correlation coefficient was then used to compare the consistency of indicator-weight rankings under alternative α settings. Spearman correlation coefficients of CPSI ranking results are presented in
Table 9.
The results show that the overall Spearman rank correlation coefficient between α = 0.25 and the baseline setting of α = 0.50 is 0.9654, while that between α = 0.75 and α = 0.50 is 0.9258. The year-by-year results also indicate high rank stability, with mean annual Spearman coefficients of 0.9576 and 0.9072, respectively. These findings suggest that the CPSI weighting scheme is not sensitive to moderate changes in the relative contributions of VHSD and EM weights, thereby confirming the robustness of the weighting structure. The weight sensitivity analysis for DEDI follows the same principle.
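The sensitivity check described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's implementation: the weight vectors `w_vhsd` and `w_em`, the indicator matrix `scores`, and all helper names are hypothetical; the index is taken to be a weighted sum of normalized indicators; and the Spearman coefficient uses the closed-form expression that is valid only when there are no tied ranks.

```python
from typing import List, Sequence


def composite_weights(w_vhsd: Sequence[float], w_em: Sequence[float],
                      alpha: float) -> List[float]:
    """W_alpha = alpha * W_VHSD + (1 - alpha) * W_EM, element by element."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(w_vhsd, w_em)]


def index_scores(scores: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted sum of normalized indicators for each port (one row per port)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in scores]


def ranks(values: Sequence[float]) -> List[int]:
    """Rank 1 = largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); no-ties formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical data: 4 ports x 3 normalized indicators, two weight vectors.
scores = [[0.9, 0.2, 0.5], [0.4, 0.8, 0.6], [0.3, 0.3, 0.9], [0.1, 0.5, 0.2]]
w_vhsd = [0.5, 0.3, 0.2]
w_em = [0.2, 0.3, 0.5]

baseline = index_scores(scores, composite_weights(w_vhsd, w_em, 0.50))
for alpha in (0.25, 0.75):
    alt = index_scores(scores, composite_weights(w_vhsd, w_em, alpha))
    print(alpha, round(spearman(baseline, alt), 4))
```

With real data containing tied scores, a tie-aware implementation (average ranks) would be needed instead of the shortcut formula used here.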
5.5. Analysis of Evolution in the Time Dimension
This study further estimates the coastal port smartness index (CPSI) and data element development index (DEDI) for the period 2017–2024 using the VHSD-EM model, with results presented in
Figure 3.
Firstly, in terms of the overall development trend, both the smartness level of coastal ports and the provincial-level data element development level exhibit an upward trajectory, indicating a certain degree of temporal synchronization between the two. However, such synchronization does not imply a strict causal relationship; instead, it suggests a strong correlation between the improvement of the data element environment and the enhancement of port smartness during the sample period.
Secondly, from a regional perspective, the differentiation between the two indices is pronounced across coastal regions. In East China, the smartness of coastal ports and the development of data elements have mostly accelerated synchronously, with port smartness significantly outperforming data element development, and both are notably higher than the overall level of coastal areas. By contrast, the growth rates of both data elements and port smartness in Fujian are relatively moderate. In North China and the Northeast (specifically Tianjin and Liaoning), the growth of data elements and port smartness is insignificant, showing almost flat trends; while port smartness outperforms data element development, both remain below the coastal average. In South China, the growth rate of data elements is significantly higher than the national average, yet the progress of port smartness is relatively slow, leading to a clear misalignment in their development pace and a gradually widening gap. In contrast to other regions, Guangdong province exhibits a relatively high level of data element development; however, the smartness rankings of Guangzhou Port and Zhuhai Port remain relatively low, indicating a structural mismatch between the provincial-level data element advantage and the smartness performance of its ports.
5.6. Analysis of Evolution in the Spatial Dimension
Based on the calculation results of the VHSD-EM model, the average values of the coastal port smartness index (CPSI) and data element development index (DEDI) for coastal regions during the period 2017–2024 were further derived. The results are shown in
Table 10.
Firstly, regarding the total coastal port smartness index (CPSI), the smartness of coastal ports exhibits a stepped distribution with significant regional gradient disparities. Shanghai Port (0.5116) and Ningbo-Zhoushan Port (0.5022) rank first and second, with their smartness levels far outperforming other ports, forming the first tier. Qingdao Port (0.4456) follows closely, constituting the second tier. Tianjin Port (0.3480), Dalian Port (0.3413), Guangzhou Port (0.3156), and Xiamen Port (0.3150) are at a medium development level, forming the third tier. Zhuhai Port (0.2123) has the lowest CPSI, with a distinct gap from leading ports, reflecting a significant regional imbalance in the smart development of coastal ports.
As an external qualitative validation, the leading CPSI positions of Shanghai Port and Ningbo-Zhoushan Port are broadly consistent with their widely recognized roles as advanced international hub ports with strong digital infrastructure, automated terminal development, and integrated logistics capabilities. This comparison is not intended as an additional quantitative validation test, but it improves the external interpretability of the CPSI ranking results.
Secondly, regarding the WSR subsystem index of port smartness, the development of subsystems in some ports is unbalanced, with structural bottlenecks. From the Wuli (physical) layer perspective, Ningbo-Zhoushan Port (0.2473) leads with well-developed infrastructure and intelligent facilities, while Zhuhai Port (0.1086) performs the worst, indicating a substantial gap in physical facility support capabilities between leading and trailing ports. From the Shili layer perspective, Qingdao Port (0.2200) and Ningbo-Zhoushan Port (0.2175) demonstrate coordinated leadership in logistics efficiency, operational efficiency, and service quality, whereas Zhuhai Port (0.0754) shows obvious deficiencies in operational processes and service efficiency, with prominent regional differentiation characteristics. From the Renli layer perspective, Tianjin Port (0.0671) leads Shanghai Port (0.0668) by a narrow margin; Zhuhai Port and Xiamen Port (0.0085) lag significantly in talent reserves and innovation-driven capabilities, which have become bottlenecks restricting their smartness improvement. This subsystem imbalance indicates that some ports have development biases of “prioritizing hardware construction over software empowerment” or “emphasizing operational efficiency while neglecting talent cultivation”.
Thirdly, regarding the matching degree between the data element development index (DEDI) and the coastal port smartness index (CPSI), there exist significant disparities in their spatial synergy, giving rise to three distinct development patterns. The “data-leading but smartness-lagging” pattern: Guangdong province serves as a typical case. Its DEDI (0.5962) ranks first nationwide, yet the CPSI rankings of Guangzhou Port and Zhuhai Port are only 6th and 8th, respectively. This indicates that the enabling value of data elements has not been fully unleashed, with a prominent issue of inefficient conversion of data dividends. The “smartness-leading but data-lagging” pattern: For example, Tianjin’s DEDI (0.1409) is at the bottom of the sample, but its CPSI ranking outperforms its DEDI ranking, presenting a reverse imbalance. The supporting role of data elements in port smartness remains to be enhanced. The “coordinated development and virtuous cycle” pattern: Shanghai, Zhejiang province, and Shandong province exhibit high levels of both DEDI and CPSI with strong synergy, forming a positive cycle of “data element support → smartness enhancement → data value deepening”. In contrast, Fujian province and Liaoning province show a balanced feature of low rankings in both indices, leaving substantial room for improvement in their overall development level.
Fourthly, from the perspective of spatial agglomeration characteristics, a spatial pattern of “leader-led agglomeration and regional block differentiation” has taken shape. The port smartness and data element development levels in East China (with Shanghai, Zhejiang, and Shandong as prominent leaders) rank first nationwide. Leveraging a solid digital economy foundation, clear port development positioning, and robust policy support, they have formed a collaborative agglomeration effect, emerging as the core leading region for the intelligent transformation of China’s coastal ports. In South China (Guangdong), data element development advantages are prominent, yet port smartness construction has failed to keep pace, resulting in a lack of effective synergy. Ports in North China (Tianjin) and Northeast China (Liaoning) have a relatively solid foundation in port smartness, but lag in data element development, which is a key bottleneck restricting regional collaborative upgrading. This spatial distribution pattern is closely linked to regional data element resource endowments, port development positioning, policy support intensity, and industrial synergy levels, further exacerbating regional differentiation in the intelligent development of coastal ports.
6. Association Analysis Between Data Elements and Smartness of Coastal Ports
6.1. Identification of Direct Association Between Data Elements and Smartness of Coastal Ports
This study employs the coastal port smartness index (CPSI) as the dependent variable and utilizes the random forest algorithm to conduct feature importance analysis. Given that the predictive association of data elements on the enhancement of coastal port smartness is context-dependent (i.e., the association between data elements and CPSI exhibits nonlinear characteristics as the socio-economic conditions of coastal regions evolve), this paper, in addition to the core explanatory variable (development index of data elements, DEDI), selects 10 control variables as feature variables based on existing literature [24,25,26,27,28,29,30,31,32]. These variables span dimensions including economic development, policy support, talent reserve, capital investment, openness, financing capacity, innovation level, and financial development. They not only serve as important contextual predictors of coastal port smartness but are also influenced by the development of data elements to varying degrees. It should be noted that the feature importance derived from random forest analysis reflects the relative contribution of different variables to the model’s prediction of CPSI.
The specific operationalization of variables is as follows: the economic development level of coastal regions is measured by per capita regional GDP (RGDP_PC); government support for the transportation sector in coastal regions is reflected by the budgeted general public budget expenditure on transportation (BET_GPB); talent density in coastal regions is captured by the proportion of port employees with a bachelor’s degree or above (EDU) and the proportion of port technical personnel (TECH); capital investment level and openness of coastal regions are represented by per capita fixed asset investment (CFAI_PC) and foreign trade dependence (OPE), respectively; the financing level of coastal regions is measured by the social financing scale (SSF); the independent innovation capacity of coastal regions is assessed by the R&D investment intensity of port enterprises (RDI_PE) and the number of authorized invention patent applications (IPA); and the development status of digital finance in coastal regions is gauged by the digital inclusive finance index (DFI).
The model training adopted an iterative optimization strategy, integrating cross-validation and model performance metric evaluation to assess the stability of results. After training the initial model on the full sample, this study progressively eliminated low-contribution variables (e.g., CFAI_PC, RDI_PE, SSF) based on feature contribution rates, theoretical relevance, and model performance. The remaining features were then retrained, and their contribution rates recalculated, ultimately retaining eight core feature variables. This process was designed to enhance the interpretability and parsimony of the model; however, given the limited sample size, the findings should be interpreted as exploratory predictive evidence.
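The iterative procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's actual pipeline: it assumes NumPy and scikit-learn (the paper states only that Python 3.11 was used), relies on `RandomForestRegressor` with its impurity-based `feature_importances_`, uses a synthetic dataset with placeholder column names, and applies a simple rule that drops the least important feature until a target count remains, whereas the paper also weighs theoretical relevance and model performance when eliminating variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def backward_eliminate(X, y, names, keep, seed=0):
    """Refit a random forest, dropping the least important feature each
    round until `keep` features remain. Returns (names, importances, cv_r2)."""
    names = list(names)
    cols = list(range(X.shape[1]))
    while True:
        model = RandomForestRegressor(n_estimators=200, random_state=seed)
        model.fit(X[:, cols], y)
        importances = model.feature_importances_  # shares sum to 1
        if len(cols) <= keep:
            break
        weakest = int(np.argmin(importances))
        del cols[weakest]
        del names[weakest]
    # 5-fold cross-validated R^2 as a rough stability check
    cv_r2 = cross_val_score(
        RandomForestRegressor(n_estimators=200, random_state=seed),
        X[:, cols], y, cv=5, scoring="r2",
    ).mean()
    return names, importances, cv_r2


# Synthetic stand-in data (NOT the paper's sample): 64 rows, 5 features,
# where only the first two columns drive the target.
rng = np.random.default_rng(42)
X = rng.uniform(size=(64, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.05, size=64)
feature_names = ["DEDI", "DFI", "noise_a", "noise_b", "noise_c"]

kept, imp, r2 = backward_eliminate(X, y, feature_names, keep=3)
for name, share in sorted(zip(kept, imp), key=lambda t: -t[1]):
    print(f"{name}: {100 * share:.3f}%")
print(f"cross-validated R^2: {r2:.3f}")
```

Impurity-based importances can favor high-cardinality or correlated features, so with a small sample a permutation-based importance would be a reasonable cross-check.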
This variable combination not only enhances the contribution rate of data elements but also maintains a high model goodness of fit ($R^2$), thereby supporting a more cautious identification of association patterns between data elements and the improvement of coastal port smartness. The measurement results of feature variable contribution rates are presented in
Table 11.
Firstly, the data element development index (DEDI) accounts for 13.586% of the contribution rate in model prediction, the third-highest share among the eight retained features, indicating its high importance in explaining the CPSI variations among the sampled ports. This result points to a strong predictive association between the provincial-level data element development level and coastal port smartness.
Secondly, the remaining seven feature variables also exhibit varying degrees of importance in the model. The digital inclusive finance index (DFI) has the highest contribution rate (29.324%), highlighting the fundamental supporting role of the popularization and deepening of digital financial services in the smart transformation of ports; foreign trade dependence (OPE) ranks second (17.033%), indicating that the level of openness and trade activity are important external factors associated with smart port development; per capita regional GDP (RGDP_PC) contributes 10.270%, reflecting the supporting effect of regional economic strength on port smartness; the number of authorized invention patent applications (IPA) and the proportion of port technical personnel (TECH) have contribution rates of 9.351% and 8.166%, respectively, embodying the enabling role of innovation capacity and professional technical talent in port smartness; the budgeted general public budget expenditure on transportation (BET_GPB) contributes 6.881%, underscoring the important role of government policy support; and the proportion of port employees with a bachelor’s degree or above (EDU) contributes 5.389%, reflecting the supporting role of high-end talent reserves.
This indicates that port smartness is not the outcome of the standalone role of data elements, but rather the combined effect of regional economic foundation, openness level, financial services, talent structure, innovation capacity, and policy support.
6.2. Identification of Interactive Associations Between Data Elements, Other Elements, and Smartness in Coastal Ports
In the analysis of regression problems involving moderating effects, the introduction of interaction terms is a standard approach. For a given feature variable, its contribution rate calculated by the random forest algorithm without interaction terms reflects the combined effect of the variable itself and its interactions with other feature variables. Thus, if the magnitude of a feature variable’s impact on the target value varies with the values of other variables (and a high correlation exists between them), the variable’s standalone contribution rate will tend to decrease after adding interaction terms, while the interaction terms will account for a portion of the contribution. Concurrently, the model’s goodness of fit ($R^2$) will either remain stable or improve compared to the model without interaction terms.
To further investigate the interactive relationships between data elements and supporting conditions, this study incorporates first-order interaction terms of DEDI with the seven feature variables into the random forest model. It is worth noting that the contribution rate of interaction terms is used to characterize the changes in the importance of different variable combinations in model prediction, which can serve as a reference for identifying synergistic relationships, but does not equate to the causal moderating effect in the context of traditional regression.
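A minimal sketch of how such first-order interaction terms could be constructed before refitting the model is shown below. The column names follow the paper's variable abbreviations, but the function names, the `DEDI_x_` naming convention, and the sample row are hypothetical; the paper does not state whether variables were rescaled before multiplication, so plain products are assumed here.

```python
from typing import Dict, List, Sequence


def add_dedi_interactions(rows: List[Dict[str, float]],
                          features: Sequence[str],
                          core: str = "DEDI") -> List[Dict[str, float]]:
    """Return copies of `rows` extended with core-by-feature product columns."""
    extended = []
    for row in rows:
        new_row = dict(row)
        for name in features:
            new_row[f"{core}_x_{name}"] = row[core] * row[name]
        extended.append(new_row)
    return extended


def grouped_share(importances: Dict[str, float], core: str = "DEDI") -> float:
    """Total importance of the core main effect plus all its interaction terms."""
    return sum(v for k, v in importances.items()
               if k == core or k.startswith(f"{core}_x_"))


supporting = ["DFI", "OPE", "RGDP_PC", "IPA", "TECH", "BET_GPB", "EDU"]
# One hypothetical province-year observation (values are illustrative only).
sample = [{"DEDI": 0.4, "DFI": 0.5, "OPE": 0.3, "RGDP_PC": 0.6,
           "IPA": 0.2, "TECH": 0.1, "BET_GPB": 0.7, "EDU": 0.25}]
with_terms = add_dedi_interactions(sample, supporting)
print(sorted(k for k in with_terms[0] if k.startswith("DEDI_x_")))
```

The helper `grouped_share` illustrates how a main effect and its interaction terms can be summed into a single DEDI-related share once a model has been refitted on the extended feature set.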
After incorporating the interaction terms between the DEDI and each of the seven feature variables, and based on the optimal model derived from hyperparameter tuning of the random forest algorithm, the contribution rates of all feature variables—including the seven interaction terms—were finally obtained, as presented in
Table 12.
A comparison between Table 11 and Table 12 reveals a significant shift in the contribution rate structure of variables following the introduction of interaction terms:
Firstly, the standalone contribution rates of the data element (DEDI) and the seven feature variables all decline, while interaction terms account for a substantial proportion of total contributions. Specifically, the standalone contribution rate of DEDI decreased from 13.586% to 3.216%, a 76.329% reduction; the combined standalone contribution rate of the seven feature variables fell from 86.414% to 67.433%; and the combined contribution rate of the DEDI main effect and DEDI interaction terms reached 32.567%. This finding indicates that the predictive contribution of data elements is strongly conditioned by supporting regional factors; that is, the predictive association of data elements varies with changes in the economic and social conditions of coastal regions. This supports the existence of nonlinear, mutually reinforcing association patterns in the enhancement of port smartness.
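The reported figures are mutually consistent, as a short check using only the contribution rates stated above shows (the symbol $c_{\mathrm{DEDI}\times k}$, denoting the contribution rate of the interaction between DEDI and the $k$-th feature variable, is introduced here purely for illustration):

```latex
\begin{align*}
\frac{13.586 - 3.216}{13.586} &= \frac{10.370}{13.586} \approx 0.76329
  \quad (\text{a } 76.329\% \text{ reduction}),\\
\sum_{k=1}^{7} c_{\mathrm{DEDI}\times k} &= 32.567 - 3.216 = 29.351
  \quad (\text{share carried by the seven interaction terms}),\\
67.433 + 32.567 &= 100.000 .
\end{align*}
```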
Secondly, from the distribution of interaction term contribution rates, the factors with relatively high contributions are concentrated in the digital inclusive finance index (DFI), per capita regional GDP (RGDP_PC), budgeted general public budget expenditure on transportation (BET_GPB), and the number of authorized invention patent applications (IPA), reaching 6.031%, 5.870%, 5.7166%, and 4.369%, respectively. Their independent contribution rates decreased from 29.324%, 10.270%, 6.881%, and 9.351% before including interaction terms to 25.564%, 9.747%, 3.287%, and 4.629% after including them. This indicates that whether viewed in terms of overall contribution or considering interactive effects, enhancing digital finance development, promoting regional economic growth, increasing government support, and strengthening independent innovation capacity are associated with stronger coordinated improvement between data elements and port smartness.
6.3. Pathways for Enhancing Coastal Port Smartness Associated with Data Elements
6.3.1. Univariate Partial Effect
The preceding analysis has established the high importance of data elements in predicting port smartness, but their model-predicted response patterns require further examination. To identify the average marginal response of predicted CPSI values across varying DEDI levels and to explore how provinces can select appropriate pathways for port smartness enhancement based on their respective data element development statuses, this study uses a partial effect model to analyze the nonlinear association between data elements and port smartness, and further discusses potential differentiated pathways by incorporating the geographical locations of sample regions.
Based on the random forest algorithm, the univariate dynamic partial effect formula in this paper is expressed as:
$$\hat{f}_{s}(x_s) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_s, x_{C}^{(i)}\right) \tag{12}$$
Similarly, the multivariate dynamic partial effect formula is expressed as:
$$\hat{f}_{s,t}(x_s, x_t) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_s, x_t, x_{C}^{(i)}\right) \tag{13}$$
In Equations (12) and (13), $\hat{f}_{s}$ and $\hat{f}_{s,t}$ denote the functional relationships between the feature variables and coastal port smartness estimated by the random forest model, $\hat{f}$ is the fitted random forest predictor, $x_{C}^{(i)}$ is the value of the remaining feature variables for the $i$-th sample, and $n$ represents the number of training set samples in the random forest model.
In essence, the above two equations represent the discretization of integrals for continuous functions: by summing and averaging, all other variables $x_C$ are eliminated via integration, yielding the partial effect of $x_s$ on Y and the joint partial effect of $x_s$ and $x_t$ on Y. Let $x_s$ denote the data element development index; the partial effect plot of $x_s$ on coastal port smartness is presented in
Figure 4.
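The averaging behind the univariate and joint partial effects can be sketched in a model-agnostic way. The snippet below is a minimal illustration, not the authors' code: `predict` stands for any fitted model's prediction function (a random forest in the paper), and the function names, the toy predictor, and the sample rows are hypothetical.

```python
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], float]


def partial_effect_1d(predict: Predictor, data: Sequence[Row],
                      s: int, grid: Sequence[float]) -> List[float]:
    """For each grid value v, overwrite feature s with v in every training
    row and average the predictions (other features are averaged out)."""
    effects = []
    for v in grid:
        total = 0.0
        for row in data:
            modified = list(row)
            modified[s] = v
            total += predict(modified)
        effects.append(total / len(data))
    return effects


def partial_effect_2d(predict: Predictor, data: Sequence[Row], s: int, t: int,
                      grid_s: Sequence[float],
                      grid_t: Sequence[float]) -> List[List[float]]:
    """Joint partial effect surface; result[a][b] pairs grid_s[a] with grid_t[b]."""
    surface = []
    for vs in grid_s:
        line = []
        for vt in grid_t:
            total = 0.0
            for row in data:
                modified = list(row)
                modified[s], modified[t] = vs, vt
                total += predict(modified)
            line.append(total / len(data))
        surface.append(line)
    return surface


# Toy predictor standing in for a fitted model: y = 2*x0 + x1*x2.
def toy_predict(row: Row) -> float:
    return 2.0 * row[0] + row[1] * row[2]


rows = [[0.1, 0.2, 1.0], [0.5, 0.4, 2.0], [0.9, 0.6, 3.0]]
print(partial_effect_1d(toy_predict, rows, s=0, grid=[0.0, 0.5, 1.0]))
print(partial_effect_2d(toy_predict, rows, s=0, t=1,
                        grid_s=[0.0, 1.0], grid_t=[0.0, 1.0]))
```

Because the functions only need a prediction callable, the same code applies to any regressor; the training rows themselves are never modified.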
As illustrated in
Figure 4, the results of the univariate partial effect analysis reveal a pronounced nonlinear association between DEDI and CPSI.
At the stage of low DEDI levels, the predicted CPSI shows an overall upward trend with the improvement of data element development, indicating that the construction of data elements in the initial stage is associated with higher predicted comprehensive service capacity of coastal ports. Around the potential turning interval of DEDI ≈ 0.215, the predicted curve of the model exhibits a potential inflection point characterized by an increasing slope, indicating that the positive correlation between data element development and port smartness is enhanced once data element development reaches a certain level.
When DEDI rises to a medium-to-high level, the curve gradually flattens, suggesting that the marginal contribution of data element development to CPSI may experience a diminishing trend, and simply improving the regional data-element environment may not be sufficient to generate equivalent increases in predicted CPSI without complementary governance, operational, and innovation conditions.
In addition, some provinces, such as Guangdong, are characterized by high DEDI but low CPSI, which indicates that CPSI is not fully explained by DEDI, but is also jointly influenced by multiple factors such as port infrastructure, industrial collaboration, openness, and technical capabilities.
6.3.2. Joint Partial Effects
To clarify how data elements and the aforementioned supporting factors are jointly associated with port smartness, this study further employs the multivariate dynamic partial effect model to examine the joint dynamic partial effects of data elements and feature variables on coastal port smartness, thereby identifying differentiated optimization paths for port smartness enhancement across regions. Given that analyzing all feature variables individually would unduly lengthen the paper, only the four elements with the highest contribution rates of interaction terms with data elements are selected as typical cases for analysis. The model is constructed based on Equation (13), where $x_s$ denotes the data element development index (DEDI) and $x_t$ represents a selected feature element. A two-dimensional contour heat map was plotted to visualize the joint dynamic partial effects, as detailed in
Figure 5.
Figure 5 presents the joint partial effects of DEDI and four supporting factors on the predicted CPSI using two-dimensional contour heat maps. In each subplot, the horizontal axis denotes the DEDI, while the vertical axes respectively represent the digital inclusive finance index (DFI), per capita regional GDP (RGDP_PC), the budgeted general public budget expenditure on transportation (BET_GPB), and the number of authorized invention patent applications (IPA). The color gradient indicates the predicted CPSI generated by the random forest model, with the transition from red to green representing an increase in the predicted CPSI level. Contour lines represent different levels of the joint partial effect. Hollow scatter points denote raw observations, solid labeled dots indicate provincial mean positions, and red pentagrams identify the largest-gradient points on the predicted response surfaces, namely the intervals where predicted CPSI is most sensitive to changes in the combination of DEDI and the corresponding supporting factor. These points should not be interpreted as strict causal thresholds or policy optima; rather, they serve as diagnostic references for identifying potential nonlinear response intervals and regional differences in factor coordination. The relative positions of provincial mean points and largest-gradient points are further used to classify regional development stages, as shown in
Table 13.
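One way to locate a largest-gradient point on a joint partial effect surface and to compare provincial means against it is sketched below. This is a hedged illustration, not the paper's procedure: the forward-difference gradient, the four stage labels, the rule that a province has "crossed" a dimension when its mean is at or above the point's coordinate, and all names and numbers are assumptions.

```python
import math
from typing import Sequence, Tuple


def largest_gradient_point(surface: Sequence[Sequence[float]],
                           grid_x: Sequence[float],
                           grid_y: Sequence[float]) -> Tuple[float, float]:
    """Return the (x, y) grid location where the forward-difference gradient
    magnitude of surface[i][j] (i over grid_x, j over grid_y) is largest."""
    best, best_xy = -1.0, (grid_x[0], grid_y[0])
    for i in range(len(grid_x) - 1):
        for j in range(len(grid_y) - 1):
            dx = (surface[i + 1][j] - surface[i][j]) / (grid_x[i + 1] - grid_x[i])
            dy = (surface[i][j + 1] - surface[i][j]) / (grid_y[j + 1] - grid_y[j])
            magnitude = math.hypot(dx, dy)
            if magnitude > best:
                best, best_xy = magnitude, (grid_x[i], grid_y[j])
    return best_xy


def classify_stage(mean_xy: Tuple[float, float],
                   point_xy: Tuple[float, float]) -> str:
    """Label a provincial mean by which coordinates reach the reference point."""
    crossed_x = mean_xy[0] >= point_xy[0]
    crossed_y = mean_xy[1] >= point_xy[1]
    if crossed_x and crossed_y:
        return "both crossed"
    if crossed_x:
        return "DEDI crossed only"
    if crossed_y:
        return "supporting factor crossed only"
    return "neither crossed"


# Hypothetical 4x4 surface whose steepest rise starts at x = 0.2, y = 0.2.
gx = [0.1, 0.2, 0.3, 0.4]
gy = [0.1, 0.2, 0.3, 0.4]
surf = [[0.20, 0.21, 0.22, 0.23],
        [0.22, 0.23, 0.24, 0.25],
        [0.30, 0.45, 0.35, 0.36],
        [0.42, 0.46, 0.44, 0.45]]
point = largest_gradient_point(surf, gx, gy)
print(point, classify_stage((0.35, 0.15), point))
```

On a coarse grid the located point depends on the grid spacing, which is one more reason to read such points as diagnostic references rather than thresholds.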
Firstly, the synergistic state between data elements and supporting factors differs markedly across the four dimensions.
In the DFI dimension, there remains substantial room for universal improvement in the synergy between digital financial inclusion and data elements. Shanghai, Zhejiang, Guangdong, and Shandong have crossed the DEDI inflection point but have not yet entered the maximum gradient interval corresponding to DFI. This indicates that these regions possess a solid data element foundation, while the synergistic empowerment of digital finance for port intelligence has not been fully unleashed. In contrast, Fujian, Tianjin, and Liaoning have not simultaneously crossed the DEDI and DFI inflection points, suggesting that they need to both strengthen their data element infrastructure and enhance the capacity of digital financial services to support intelligent port transformation.
In the RGDP_PC dimension, the synergy between regional economic foundations and data elements presents a stratified pattern. Shanghai has simultaneously crossed the DEDI and RGDP_PC inflection points, demonstrating a well-established synergistic support between data element development and economic foundations. Zhejiang, Guangdong, and Shandong have crossed the DEDI inflection point but have not reached the high-efficiency synergy interval corresponding to RGDP_PC, reflecting the characteristic of “relatively advanced data elements yet insufficient economic support”. Fujian, Tianjin, and Liaoning remain in a state of dual lag, requiring further enhancement of regional economic strength and industrial hinterland support to improve the conversion capacity of data elements into port intelligence.
In the BET_GPB dimension, the supporting role of fiscal transportation expenditure in enabling data elements varies across regions. Guangdong has simultaneously crossed the DEDI and BET_GPB inflection points, indicating strong synergy between its data element foundation and fiscal support for transportation. Zhejiang, Shanghai, and Shandong have crossed the DEDI inflection point, but their fiscal transportation expenditure has not yet entered the high-efficiency synergy interval, suggesting that these regions need to further optimize fiscal resource allocation by directing transportation fiscal investment more precisely toward the construction of smart port infrastructure, digital platforms, and intelligent operation systems. Fujian, Liaoning, and Tianjin have not crossed either inflection point, necessitating simultaneous strengthening of data infrastructure construction and fiscal support in the transportation sector.
In the IPA dimension, the synergy between innovation capacity and data elements is relatively favorable, yet regional differentiation persists. Shanghai, Guangdong, Shandong, and Zhejiang have simultaneously crossed the DEDI and IPA inflection points, indicating a strong synergistic relationship between their data element foundation and innovation output. Although Tianjin has exceeded the IPA inflection point, it has not crossed the DEDI inflection point, exhibiting the characteristic of “relatively advanced innovation capacity yet insufficient data element foundation”. Moving forward, it should strengthen the construction of data infrastructure, data circulation systems, and port digital application scenarios to better translate innovation capacity into port intelligence performance. Fujian and Liaoning lag behind in both DEDI and IPA, requiring simultaneous reinforcement of data element accumulation and innovation capacity building.
Secondly, the coordination status between data elements and supporting factors across coastal provinces and municipalities exhibits significant regional heterogeneity, resulting in differentiated development patterns and catch-up pathways. Shanghai performs relatively well in terms of RGDP_PC and IPA, but still has room for improvement in DFI and BET_GPB, indicating that its future focus should shift from relying solely on economic and innovation advantages to strengthening digital finance support and precise allocation of fiscal resources for transportation. Guangdong excels in BET_GPB and IPA dimensions and maintains a high level of DEDI, yet it has not fully entered the efficient coordination range in DFI and RGDP_PC, suggesting the need to further improve the efficiency of transforming data elements into smart port applications and enhance the support of digital finance and regional economic quality for port intelligence. Zhejiang and Shandong are generally balanced overall, but in DFI, RGDP_PC, and BET_GPB they mainly show “having crossed the DEDI inflection point but with supporting factors not yet fully meeting standards”, indicating that they should prioritize enhancing synergies among digital finance, economic support, and fiscal investment. Tianjin holds certain advantages in IPA, but lags behind in DEDI, DFI, RGDP_PC, and BET_GPB, meaning that its advancement in port intelligence cannot rely solely on innovation foundations; it must simultaneously strengthen data infrastructure, economic underpinning, and fiscal input. Fujian and Liaoning remain relatively lagging overall, with most indicators failing to cross the inflection points, requiring priority improvements in data infrastructure, digital financial services, regional economic foundations, and innovation capacity. 
Overall, the enhancement of coastal port intelligence is not determined solely by data elements, but rather by the degree of coordinated alignment among data elements, digital finance, economic foundation, fiscal support, and innovation capability. Policies must be tailored to regional endowment differences, implementing targeted measures to unlock the synergistic value of data elements through upgrades in supporting factors.
6.4. Further Analysis and Discussion of the Guangdong Case
Taken together, the random forest and partial dependence results indicate a strong statistical association and a nonlinear response between DEDI and CPSI. However, the case of Guangdong suggests that high levels of data element development do not necessarily correspond to equally high levels of port digitalization performance. To avoid over-interpreting this phenomenon in causal terms, this paper further explores the issue from the perspectives of institutional and governance contexts.
On the one hand, Guangdong lies within the Greater Bay Area port cluster, where ports such as Guangzhou, Shenzhen, and Zhuhai differ in functional positioning, hinterland structure, port management entities, and industrial foundations. The “Outline Development Plan for the Guangdong-Hong Kong-Macao Greater Bay Area” positions Hong Kong as an international shipping center and places key ports like Guangzhou and Shenzhen within the regional integrated transportation system, proposing to strengthen inland waterways and port-access rail and highway networks centered on major coastal ports. This indicates that the digital transformation of ports in Guangdong involves complex coordination across cities, ports, and transport modes.
On the other hand, while the Guangdong port system already possesses substantial scale and regional data infrastructure, the performance of port digitalization still depends on whether data standards, platform interconnectivity, operational collaboration, and governance mechanisms are effectively integrated. In other words, data resource advantages can only be translated into improvements in CPSI when embedded in specific scenarios such as port operations, sea-rail intermodal transport, customs clearance coordination, port-logistics-trade services, and green supervision.
Based on the above exploratory analysis, Guangdong should focus on improving the efficiency of transforming its provincial data element advantages into smart port scenarios. Specifically, building upon the existing coordinated development of port clusters, efforts should be further advanced to unify data standards, interconnect platform interfaces, and coordinate business processes among major ports such as Guangzhou Port, Shenzhen Port, and Zhuhai Port, thereby promoting the in-depth application of data elements in areas including intelligent dispatching, sea-rail intermodal transport, customs clearance coordination, green supervision, and port-trade services.
For Guangzhou Port, priority should be given to strengthening its role as an international hub port, integrating port, shipping, and trade services, and enhancing cross-port intelligent dispatching capabilities. For Zhuhai Port, efforts should focus on addressing shortcomings in smart logistics platforms, digitalization of port operations, and multimodal transport coordination.
It should be noted that the explanation of the Guangdong case in this paper is intended as an exploratory discussion of mechanisms based on statistical results and policy texts, rather than a rigorous causal analysis. Future research should incorporate port-level operational data, platform development data, port governance structures, enterprise interviews, or case studies to conduct more detailed empirical analyses of data element conversion efficiency in Guangdong ports.
The Guangdong case further suggests that data-element accumulation is a necessary but not sufficient condition for port smartness enhancement. Governance capacity, cross-port coordination, data-standard unification, and scenario-based operational integration determine whether regional data resources can be converted into port-level intelligent services. This finding also has international relevance: in European and other global port systems, the same diagnostic framework may be transferable, but implementation effects are likely to vary with port governance structures, data-sharing rules, public–private coordination mechanisms, and the degree of integration among port, logistics, customs, and hinterland stakeholders.
7. Conclusions, Suggestions, Limitations, and Future Research
7.1. Conclusions
This study investigates the association between data element development and coastal port smartness, and identifies pathways for enhancement, thereby providing empirical evidence and policy implications for the smart development of coastal ports. The key conclusions are as follows:
- (1) CPSI and DEDI both exhibit an overall upward trend during the sample period, yet their development is not spatially synchronized. East China, represented by Shanghai, Zhejiang, and Shandong, displays a relatively coordinated pattern in which data element development and port smartness mutually reinforce each other. Tianjin and Liaoning demonstrate relatively stronger port-smartness foundations than their provincial data-element environments, whereas Guangdong presents a typical mismatch characterized by “data-element advancement but port-smartness lag.” Fujian and Liaoning still have considerable room for improvement in both data element development and port smartness.
- (2) Coastal port smartness displays a clear stepwise spatial structure and internal subsystem imbalance. Shanghai Port and Ningbo-Zhoushan Port form the leading tier, Qingdao Port occupies the second tier, and Tianjin, Dalian, Guangzhou, Xiamen, and Zhuhai show varying degrees of catch-up pressure. From the WSR perspective, the Wuli layer reveals gaps in infrastructure and intelligent facilities, the Shili layer reflects differences in logistics and operational efficiency, and the Renli layer highlights talent reserves and innovation capacity as key constraints for several ports.
- (3) The random forest results indicate that data elements are an important predictor of coastal port smartness, but their role should be interpreted as predictive contribution rather than strict causal effect. In the model without interaction terms, DEDI contributes 13.586% to CPSI prediction, ranking behind digital inclusive finance and openness and ahead of regional economic strength and innovation output. After introducing interaction terms, the standalone contribution of DEDI decreases to 3.216%, whereas the combined contribution of the DEDI main effect and its interaction terms accounts for 32.567%, indicating that the association between data elements and port smartness is strongly dependent on supporting regional conditions.
- (4) The partial effect results further reveal a nonlinear response of CPSI to DEDI. At low DEDI levels, data element accumulation provides a foundation for port smartness improvement; around the threshold of DEDI = 0.215, the positive association becomes more pronounced; at medium-to-high DEDI levels, the curve gradually flattens, suggesting diminishing marginal predictive gains. This pattern explains why regions with high DEDI, such as Guangdong, do not automatically achieve high CPSI: the transformation of data-element advantages into port-smartness performance also depends on infrastructure, industrial coordination, openness, digital finance, fiscal support, and innovation capacity.
- (5) The joint partial effect analysis shows significant dimensional and regional heterogeneity in the synergy between data elements and supporting factors. Digital inclusive finance, regional economic development, fiscal transportation expenditure, and innovation output all interact with DEDI, but the efficiency of these combinations differs across regions.
Overall, coastal port smartness is shaped not by data elements alone, but by the coordinated allocation of data elements and multiple supporting factors.
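The nonlinear response summarized in conclusion (4) rests on partial dependence: the feature of interest is fixed at each grid value for every observation, and the model predictions are averaged. The following is a minimal sketch of that computation. The `toy_model` function is a hypothetical stand-in for a fitted random forest, deliberately given an S-shaped response centred at DEDI = 0.215; the sample rows, the grid, and the DFI coefficient are likewise illustrative assumptions and not data or estimates from this study.

```python
import math
from statistics import fmean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the feature in every row, keep all other features as observed.
        modified = [{**row, feature: value} for row in rows]
        curve.append(fmean(predict(r) for r in modified))
    return curve


def toy_model(row):
    # Hypothetical stand-in for a fitted model: S-shaped in DEDI around 0.215,
    # plus a small linear term in one supporting factor (DFI).
    s = 1.0 / (1.0 + math.exp(-25.0 * (row["DEDI"] - 0.215)))
    return 0.2 + 0.5 * s + 0.1 * row["DFI"]


# Illustrative observations (not values from the paper).
rows = [
    {"DEDI": 0.10, "DFI": 0.2},
    {"DEDI": 0.30, "DFI": 0.5},
    {"DEDI": 0.50, "DFI": 0.8},
]
grid = [0.05, 0.15, 0.215, 0.30, 0.45, 0.60]

curve = partial_dependence(toy_model, rows, "DEDI", grid)

# Slope of the curve on each grid segment: a rough proxy for the
# marginal predictive gain per unit of DEDI.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(zip(grid, curve), zip(grid[1:], curve[1:]))]
```

In this toy setting the segment slopes are largest near the threshold and shrink toward zero at high DEDI, which mirrors the flattening pattern described above. With a real fitted model, `predict` would wrap the model's prediction call, and the shape of the curve would be an empirical result rather than a built-in assumption.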
7.2. Suggestions
The empirical findings of this paper are primarily derived from a sample of China’s coastal ports. The following recommendations therefore do not constitute a universally applicable policy framework for smart ports; rather, they are moderate, region-specific policy proposals formulated according to how well data elements and port smartness are matched in each region and where its supporting factors fall short.
Firstly, establish a full-chain data element system of “base-supply-flow-use” to help data elements cross the scale threshold. Regions where data element development has not yet reached the threshold should accelerate the improvement of data infrastructure by expanding the coverage of Internet broadband, big data platforms, and 5G networks; establish and improve data circulation rules and transaction mechanisms to facilitate cross-port, cross-department, and cross-regional data sharing and interconnection; and strengthen data security and intellectual property rights (IPR) protection while improving the efficiency of data element circulation and the quality of its application, thereby providing full-chain data support for port smartness.
Secondly, promote synergistic empowerment of data elements and supporting factors. In response to the dimensional heterogeneity in the synergistic state between data elements and supporting factors, targeted policy interventions are proposed: strengthen the recruitment and cultivation of port talents with bachelor’s degrees or above; enhance R&D investment intensity and patent commercialization efficiency; optimize fiscal expenditure structures to tilt toward smart port development; and deepen the application of digital inclusive finance in the port sector. By upgrading supporting factors, the synergistic value of data elements can be fully unlocked.
Thirdly, implement a differentiated regional catch-up strategy to advance port intelligence by category. Given the varying endowments and divergent development paths among coastal provinces and municipalities, tailored development strategies should be formulated. For regions characterized by “data leadership but lagging smartness”, such as Guangdong, efforts should focus on strengthening the integration between provincial data infrastructure and port application scenarios and on promoting data sharing among ports, customs, shipping companies, logistics platforms, and financial institutions, thereby enhancing ports’ capacity to absorb and utilize data resources. For areas like Tianjin, which exhibit “smartness leadership but data lag”, it is essential to fully leverage their innovation advantages while simultaneously strengthening data infrastructure, data circulation rules, port data platform development, and policy-based financial support. For relatively underdeveloped regions such as Fujian and Liaoning, policies should prioritize synchronized advancement in data infrastructure, digital financial services, industrial hinterland support, fiscal investment, and innovation capabilities. For regions demonstrating “coordinated development and positive feedback loops”, including Shanghai, Zhejiang, and Shandong, the focus should shift from scale expansion to quality improvement, with enhanced synergy between data elements and digital inclusive finance, regional economic support, fiscal transportation spending, and innovation output.
Fourthly, strengthen the dual drivers of digital inclusive finance and opening-up to unlock the synergistic value of supporting factors. In response to the pervasive shortcoming of lagging digital inclusive finance across all seven provinces and municipalities, the following targeted measures are proposed: the People’s Bank of China (PBC) and the National Administration of Financial Regulation (NAFR) should issue special policies to encourage financial institutions to develop financial products tailored to smart port transformation, thereby reducing the financing costs of digital upgrading; regions with opening-up advantages (e.g., Guangdong, Shanghai) should enhance the synergistic effect between trade dynamism and data element empowerment; and regions with lagging opening-up (e.g., Fujian, Liaoning) should expand outward-oriented businesses such as cross-border e-commerce and international logistics to form a virtuous cycle of “trade-driven growth → smart upgrading → data empowerment → financial innovation”, advancing the construction of world-class ports.
7.3. Limitations and Future Research
Currently, research on how data elements can empower port intelligence is still in its exploratory stage, and the theoretical framework requires further refinement. This study aims to investigate the relationship between data elements and the enhancement of coastal port smartness by establishing a scientific and reasonable indicator system. However, due to limitations in data availability and research conditions, the developed indicator system remains incomplete, and this study has certain inherent limitations.
Specifically, this study has five main limitations: the sample size is relatively small; the random forest and partial-effect models provide predictive rather than causal evidence; the DEDI is measured mainly at the provincial level, while CPSI is measured at the port level; the institutional and governance setting is China-specific; and the international generalizability of the conclusions remains limited. These caveats should be considered when interpreting the empirical results and policy implications.
Future research should focus on the following five aspects:
- (1) Indicator System Construction: As statistical data continue to improve, future work should develop a more detailed, scientific, and systematic evaluation index system that incorporates more micro-level and qualitative indicators.
- (2) Sample Size: The current study is based on a limited number of major coastal ports, corresponding to 56 port-year observations due to data constraints. Model complexity is deliberately constrained through cross-validation, parsimonious feature selection, benchmark comparison, and robustness checks. These procedures reduce, but cannot eliminate, the uncertainty associated with small-sample machine-learning analysis. To achieve more reliable results, future research should consider larger samples that also cover small- and medium-sized ports and inland river ports.
- (3) Research Methods: Exploring other ways to assess the nonlinear and spatial spillover effects of data elements, the degree of synergy between data elements and emerging technologies, and the long-term dynamic development of smart ports could lead to more comprehensive conclusions.
- (4) Causal Interpretation and Indicator Scale: The machine-learning results should be interpreted as predictive associations rather than causal effects. Future research should integrate port-level operational data, platform transaction data, enterprise interviews, and longitudinal policy shocks to better address causality and the scale mismatch between provincial DEDI indicators and port-level CPSI indicators.
- (5) International Applicability and Governance Context: The conclusions of this study are primarily applicable to the context of China’s coastal ports and cannot be directly generalized to all international port systems, because institutional arrangements, port governance models, data-sharing rules, and public–private coordination mechanisms differ across countries. Future research may further expand the sample scope by integrating micro survey data, enterprise operation data, and international port comparison data, so that the framework proposed in this study can be externally tested and revised.
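As a complement to the sample-size point in item (2), the sketch below shows one way cross-validation could be organized for a small port-year panel so that all observations of a given port are held out together. This is a leave-one-group-out scheme offered as an illustration; the study states that cross-validation was used but does not specify this particular design. The eight port names are those mentioned in the conclusions, and the layout of seven observations per port is an assumption chosen only because it is consistent with 56 port-year observations.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        # Training rows are all rows that do not belong to the held-out group.
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


ports = ["Shanghai", "Ningbo-Zhoushan", "Qingdao", "Tianjin",
         "Dalian", "Guangzhou", "Xiamen", "Zhuhai"]
N_YEARS = 7  # assumption: 8 ports x 7 years = 56 port-year observations

# One group label per observation; each port contributes N_YEARS rows.
groups = [port for port in ports for _ in range(N_YEARS)]
folds = list(leave_one_group_out(groups))
```

Holding out whole ports avoids training and testing on different years of the same port, which matters when observations within a port are strongly correlated. The trade-off is that each fold trains on fewer, less similar observations, so error estimates tend to be more conservative.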