1. Introduction
In the era of rapid digital economic growth, data have become a central production factor, profoundly transforming economic growth models and social governance systems. China’s “14th Five-Year Plan for Digital Economy Development” and the “Opinions on Establishing a Data Infrastructure System to Better Leverage the Role of Data Elements” (commonly known as the “Data Twenty Articles”) explicitly call for accelerating the marketization of data elements and refining data value assessment mechanisms to unlock the economic and social potential of data assets. The core modeling challenge in this process lies in the nonlinearity of data and the sensitivity of neural networks to their initial parameters, both of which complicate the accurate quantification of the value of data. Moreover, the rapid advancement of artificial intelligence has greatly facilitated the in-depth exploration of data elements, playing a crucial role in data analysis, intelligent forecasting, and value assessment. Despite these advances, effectively measuring and evaluating the value of data remains a significant challenge in the ongoing process of data marketization.
Existing research on data value assessment primarily follows two approaches. The first adapts conventional intangible asset valuation methods—such as the income approach [1], market approach [2], and cost approach [3]—to account for the unique characteristics of data. Wang et al. reviewed current data valuation methodologies, highlighting the importance of incorporating data’s natural increment, externalities, and multidimensional attributes; they also emphasized the need to consider spillover effects in data circulation and to evaluate value within specific application scenarios [4]. Jonathan argued that traditional valuation methods should account for market dynamics, as external economic conditions significantly influence data valuation [5]. Xu Xianchun et al. refined the cost approach to improve its applicability in data accounting [6], while Lei et al. enhanced the income approach to increase evaluation accuracy. Among these methods, the cost approach is generally preferred for its practicality and lower operational complexity, whereas the market and income approaches are more susceptible to external market fluctuations [7]. Yang divided the enterprise data assetization process into three stages—accumulation, application, and assetization—and applied the income approach to case enterprises [8]. Zhang et al. [9] suggested that the income approach better reflects the profitability of data, yielding results that are widely accepted, while Lu et al. [10] proposed mitigating the limitations of the traditional cost approach through precise market premium estimation and accurate replacement cost calculation. Lin contended that most studies of traditional valuation methods overlook practical applicability, advocating the adoption of modern evaluation techniques instead [11].
The second approach to data valuation involves developing specialized derivative models. For instance, Longstaff et al. introduced the least squares Monte Carlo (LSM) model, which integrates least squares regression with the Monte Carlo algorithm for valuation purposes [12]. Zuo et al. proposed replacing the analytic hierarchy process with a multidimensional preference linear programming method to enhance data valuation accuracy [13]. Despite these advancements, no consensus has been reached within the academic community on a standardized methodology for data value assessment; most studies remain anchored in intangible asset valuation frameworks while advocating evaluation approaches tailored to the specific attributes of the data [14].
Recent advancements in artificial intelligence have introduced new perspectives on data value assessment. Researchers have increasingly explored integrating machine learning and deep learning techniques with traditional valuation methods to optimize assessment models and improve predictive accuracy. For example, D. Niyato et al. combined the Sternberg model from economics with machine learning classification algorithms to evaluate data utility from a data science perspective [15]. Zhang proposed a deep learning-based model for analyzing both intrinsic and extrinsic data value, validating its effectiveness on production data from port enterprises [16]. Wang et al. developed a fuzzy evaluation model for big data value assessment, employing artificial neural networks to determine indicator weights and applying fuzzy comprehensive evaluation to derive the final valuation [17]. Ren constructed a data evaluation system leveraging ensemble machine learning techniques to quantify data value [18]. Wang further highlighted the capability of artificial neural networks to assess data application value objectively, mitigating the subjectivity and variability inherent in manual evaluations [19]. Yan demonstrated that machine learning-based models outperform traditional multiple linear regression, with the random forest model achieving the highest accuracy in data valuation [20].
Building on these developments, deep learning-based neural network models have gained prominence in data value assessment. Among them, the backpropagation (BP) neural network [21], recognized for its strong nonlinear mapping capability and adaptability, has demonstrated significant advantages in data prediction and analysis.
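For concreteness, a minimal NumPy sketch of such a three-layer BP network is given below. The layer sizes, sigmoid activation, and learning rate are illustrative assumptions rather than the configuration used in any of the cited studies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BPNetwork:
    """Minimal three-layer BP neural network (illustrative sketch)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Random initial weights and thresholds (biases) -- the very
        # parameters BP is sensitive to and that GA later optimizes.
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)  # hidden-layer activations
        return self.h @ self.W2 + self.b2        # linear output layer

    def train_step(self, X, y, lr=0.1):
        y_hat = self.forward(X)
        err = y_hat - y                           # output-layer error signal
        # Backpropagate the error through both layers (gradient of MSE/2).
        grad_W2 = self.h.T @ err / len(X)
        grad_b2 = err.mean(axis=0)
        delta_h = (err @ self.W2.T) * self.h * (1.0 - self.h)  # sigmoid'
        grad_W1 = X.T @ delta_h / len(X)
        grad_b1 = delta_h.mean(axis=0)
        # Gradient-descent weight update.
        self.W2 -= lr * grad_W2; self.b2 -= lr * grad_b2
        self.W1 -= lr * grad_W1; self.b1 -= lr * grad_b1
        return float((err ** 2).mean())           # current training MSE
```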
The BP algorithm has demonstrated significant potential in data valuation applications, particularly in optimizing computational efficiency and enhancing predictive accuracy. For example, Huo et al. [22] implemented a three-layer BP neural network to classify retail stores for fresh food sales forecasting; in Matlab simulations, their model exhibited accelerated convergence and robust resistance to data redundancy and noise, achieving superior accuracy in short-term stock price predictions. Further showcasing its versatility, Chen et al. [23] employed BP neural networks during the COVID-19 pandemic to predict user suitability for online education platforms, achieving a classification accuracy of 77.5%. Similarly, Kalinić et al. [24] demonstrated the BP algorithm’s predictive superiority over traditional linear models in analyzing mobile commerce consumer attitudes, highlighting its enhanced ability to capture complex behavioral patterns.
However, traditional BP neural networks exhibit inherent limitations in practice, such as susceptibility to local optima and hypersensitivity to the initial weight-threshold configuration, both of which reduce model stability and predictive reliability. Feng et al. [25] emphasized the algorithm’s critical reliance on initial weights and biases, showing that improper initialization directly degrades valuation accuracy. Deng et al. [26] identified fundamental weaknesses in BP neural networks for elemental prediction in aquatic systems, noting that their tendency toward local minima restricts global optimization and limits practical applicability despite satisfactory internal validation performance. To address these limitations, researchers have proposed alternative architectures; for instance, a radial basis function (RBF) neural network model was introduced to overcome the predictive deficiencies of conventional BP networks. Wang et al. [27] systematically analyzed the inherent trade-off between BP’s learning rate and stability, as well as the absence of systematic methods for determining the optimal number of hidden-layer neurons. Similarly, Ye et al. [28] confirmed BP’s convergence instability and structural ambiguity while implementing a modified Levenberg–Marquardt backpropagation (LM-BP) neural network to reduce model error through enhanced optimization mechanisms.
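The initialization sensitivity reported above is straightforward to reproduce. The following hypothetical demonstration reuses the BPNetwork sketch from the previous block on synthetic data; the seeds, data, and dimensions are arbitrary assumptions.

```python
# Hypothetical demonstration of initialization sensitivity: identical data
# and architecture, different random initial weights.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, (200, 5))              # synthetic inputs
y = np.sin(X.sum(axis=1, keepdims=True))          # synthetic nonlinear target

for seed in (0, 1, 2):
    net = BPNetwork(n_in=5, n_hidden=8, n_out=1, seed=seed)
    for _ in range(2000):
        mse = net.train_step(X, y, lr=0.1)
    # Different seeds typically converge to different final errors, i.e.,
    # different local minima -- the motivation for the hybrids below.
    print(f"seed={seed}: final training MSE = {mse:.5f}")
```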
To address these challenges, researchers have increasingly integrated metaheuristic optimization algorithms with backpropagation neural networks (BPNNs) in recent years. Notable advancements include hybrid architectures that combine BPNNs with the sparrow search algorithm (SSA) [29], the whale optimization algorithm (WOA) [30], particle swarm optimization (PSO) [31], and genetic algorithms (GAs) [32]. These integrations enable BPNNs to escape local optima and achieve global optimization. Among these approaches, the genetic algorithm has proven particularly effective as a global search mechanism for optimizing the initial weights and thresholds of BPNNs. This GA-BPNN hybridization resolves the instability and local-optima entrapment that arise during training, thereby enhancing both convergence efficiency and predictive performance [33]. When applied to data valuation modeling, the GA-BPNN framework significantly improves prediction efficiency while eliminating the subjective biases inherent in traditional fitting processes, enhancing the objectivity and reliability of results [34].
Zhang et al. [35] optimized a BP neural network using an improved genetic algorithm to model the relationship between weld appearance and molten pool shadow features. In experiments at welding speeds of 4.5 m/min and 3 m/min, the mean absolute percentage errors (MAPEs) for weld height and width at the two speeds remained below 4.95%, 4.81%, 5.3%, and 1.4%, respectively, demonstrating the model’s high accuracy and stability. Song et al. [36] proposed a BP neural network prediction model integrated with the AdaBoost algorithm; the average mean absolute error (MAE) decreased from 17,760.1 in the standard BP model to 5,230.6 in the AdaBoost_BP model, an approximately 70.5% reduction in error. Liang et al. [37] developed a GA-optimized BP (GA-BP) model to predict the effects of polymer fissure grouting; compared with the conventional BP model, GA-BP improved the coefficient of determination (R²) from 0.975 to 0.991 while reducing MAE by 56.6% and root mean square error (RMSE) by 40.4%, indicating significantly enhanced predictive accuracy and fit. Zhang et al. [38] applied the GA-BP neural network to the energy efficiency assessment of crude oil gathering and transportation systems; the GA-BP model increased the R² for energy utilization efficiency, thermal energy utilization efficiency, and electrical energy utilization efficiency by 1.87%, 0.56%, and 1.37%, respectively, and the R² for comprehensive energy consumption, gas consumption, and electricity consumption by 2.63%, 0.77%, and 0.31%, respectively, significantly enhancing prediction accuracy while reducing computational workload. Li et al. [39] employed an interpretable GA-BP strategy to optimize the prediction of interactions at the membrane–sludge particle interface; the mean squared errors (MSEs) for three types of interactions were reduced to 1.0816 × 10⁻⁶, 5.0089 × 10⁻⁹, and 9.0432 × 10⁻⁷, respectively, with the maximum regression coefficient (R) reaching 0.99990 and prediction errors controlled within 0.01%, effectively improving the accuracy of membrane fouling quantification. Li et al. [40], using construction engineering data from Guangdong Province, proposed a GA-BP model for project cost prediction; the R² of the testing set increased from 0.87 to 0.94 and the RMSE decreased from 1907.203 to 1281.422, significantly enhancing prediction accuracy and stability.
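A minimal sketch of the GA-BP idea discussed above follows, building on the BPNetwork sketch given earlier: each chromosome encodes one candidate set of initial weights and thresholds as a flat vector, fitness is the decoded network’s MSE, and selection, crossover, and mutation search the initialization space globally before standard BP fine-tunes the best candidate. The population size, generation count, mutation rate, and truncation-selection scheme are illustrative assumptions, not the exact settings of the studies cited above.

```python
def decode(chrom, n_in, n_hidden, n_out):
    """Map a flat chromosome onto a BPNetwork's weights and thresholds."""
    net = BPNetwork(n_in, n_hidden, n_out)
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    parts = np.split(chrom, np.cumsum(sizes)[:-1])
    net.W1 = parts[0].reshape(n_in, n_hidden)
    net.b1 = parts[1]
    net.W2 = parts[2].reshape(n_hidden, n_out)
    net.b2 = parts[3]
    return net

def ga_bp(X, y, n_in, n_hidden, n_out,
          pop_size=30, generations=40, mut_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    dim = n_in * n_hidden + n_hidden + n_hidden * n_out + n_out
    pop = rng.normal(0.0, 0.5, (pop_size, dim))   # random initial population

    def fitness(chrom):
        # Fitness = MSE of the decoded, untrained network (lower is better).
        pred = decode(chrom, n_in, n_hidden, n_out).forward(X)
        return float(((pred - y) ** 2).mean())

    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            i, j = rng.integers(0, len(parents), size=2)
            cut = rng.integers(1, dim)                      # one-point crossover
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            mask = rng.random(dim) < mut_rate               # Gaussian mutation
            child[mask] += rng.normal(0.0, 0.1, mask.sum())
            children.append(child)
        pop = np.vstack([parents, np.array(children)])

    best = pop[np.argmin([fitness(c) for c in pop])]
    net = decode(best, n_in, n_hidden, n_out)     # GA-optimized initialization
    for _ in range(2000):                         # standard BP fine-tuning
        net.train_step(X, y, lr=0.1)
    return net
```

In practice, fitness may also be evaluated after a few BP epochs per chromosome, trading additional computation for a better ranking of candidate initializations.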
This study represents the first application of a GA-optimized BP neural network to data value assessment. A comprehensive data value evaluation framework was established for different “dataset” types, consisting of 9 primary indicators—data quality, data accessibility, data coverage, data diversity, data adaptability, data volume, technical capability, market factors, and data source—and 21 secondary indicators, including producer reputation, supplier qualifications, data quality certification, timeliness, and update speed. The model designed in this study incorporates symmetrical structural features, which contribute to the stability and generalization ability of data value prediction. The GA-BP model effectively optimizes network weights and thresholds, accelerating convergence and improving prediction accuracy while substantially reducing the mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE) and increasing the coefficient of determination (R²) [41]. Furthermore, comparative analysis of predicted and actual values, alongside error distribution visualizations, thoroughly validated the advantages of GA-BP in convergence speed, prediction accuracy, and model stability. Experimental results demonstrate that the GA-BP model effectively handles data complexity and heterogeneity, outperforming BP neural networks, long short-term memory (LSTM) networks, convolutional neural networks (CNNs), random forest (RF), support vector machines (SVMs), particle swarm optimization BP (PSO-BP), whale optimization algorithm BP (WOA-BP), and sparrow search algorithm BP (SSA-BP) in both prediction accuracy and model robustness.
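For reference, the four comparison metrics used throughout this study can be computed as follows. This is a minimal sketch; y_true and y_pred are placeholder arrays of actual and predicted values.

```python
def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R² -- the metrics used to compare the models."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(((y_true - y_pred) ** 2).mean())
    rmse = mse ** 0.5
    mae = float(np.abs(y_true - y_pred).mean())
    ss_res = float(((y_true - y_pred) ** 2).sum())    # residual sum of squares
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```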
5. Conclusions
This study introduces a BP neural network model optimized by a genetic algorithm (GA). Comparative experimental analysis demonstrates that the GA-BP model significantly outperforms traditional BP neural networks in both prediction accuracy and stability. Furthermore, the GA-BP model exhibits clear advantages over popular machine learning models, including long short-term memory (LSTM) networks, convolutional neural networks (CNNs), random forest (RF), and support vector machines (SVMs), as well as BP networks optimized by particle swarm optimization (PSO), the sparrow search algorithm (SSA), and the whale optimization algorithm (WOA). These advantages are especially evident in highly nonlinear and complex tasks such as data value assessment. This is primarily attributed to the effective integration of the genetic algorithm’s global search capability with the BP neural network’s ability to model nonlinear relationships, which together overcome the local-optima problem that often hampers traditional BP networks.
In terms of application value, this study contributes significantly beyond simply applying the GA-BP neural network model to the novel field of data value assessment. Specifically, it introduces a new approach that integrates data indicators from various industries to construct a more objective and comprehensive data value assessment model, addressing the subjectivity often found in traditional evaluation methods. The model has potential applications in pricing platforms across sectors such as finance, healthcare, and manufacturing. Furthermore, the study proposes a market-driven valuation method that leverages historical transaction data from multiple sectors, enabling intelligent, cross-industry data value assessments. This methodology not only deepens the understanding of data value evaluation but also lays the foundation for a standardized framework applicable across industries. While the study provides valuable insights, limitations such as the relatively small sample size and reliance on data from a single platform may affect the generalizability of the findings.
Future research could expand in several directions. First, in terms of algorithm optimization, integrating genetic algorithms with other optimization techniques or incorporating more advanced algorithms could further improve prediction accuracy and generalization ability. Second, in terms of model architecture, introducing deep learning techniques such as attention mechanisms could enhance feature extraction. Third, in terms of application expansion, incorporating external factors such as market dynamics and policy environments into the evaluation framework could yield a more comprehensive data value assessment model. Fourth, addressing the current limitations by expanding the sample size, diversifying data sources, incorporating synthetic data, and validating the model across multiple countries would enhance its robustness and improve its applicability in international contexts. The GA-BP model proposed in this study offers an effective technical approach for data value assessment. As the digital economy continues to develop, this method is poised to play a significant role in a broader range of data-driven scenarios, supporting the market-based allocation of data resources.