Modeling and Data Analysis of Innovation Dynamics in Complex Human–AI–Content Networks: A Multimodal Graph Learning Approach

Zhou, Fangzhou; Fang, Lin; Zaki, Hafizah Omar

doi:10.3390/math14122051


by Fangzhou Zhou ^1,2,3, Lin Fang ^2,3,4 and Hafizah Omar Zaki ^1,*

¹ Faculty of Economics and Management, The National University of Malaysia, Bangi 43600, Malaysia

² School of Economics and Management, Shangluo University, Shangluo 726000, China

³ The New Style Think Tank of Shaanxi Universities Shangluo Development Research Institute, Shangluo University, Shangluo 726000, China

⁴ School of Management, Xi’an Jiaotong University, Xi’an 710049, China

^* Author to whom correspondence should be addressed.

Mathematics 2026, 14(12), 2051; https://doi.org/10.3390/math14122051 (registering DOI)

Submission received: 15 April 2026 / Revised: 1 June 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Modeling and Data Analysis of Complex Networks)


Abstract

In complex socio-technical systems, human–AI collaboration is becoming fundamental to knowledge creation, content generation, and innovation. Existing innovation models typically consider a single element in isolation, such as a single actor, an AI system alone, or a content artifact, and therefore do not capture the dynamics among these heterogeneous actors. This study introduces a Multimodal Graph Neural Network (MM-GNN) for modeling and analyzing innovation dynamics within Human–AI–Content (HAC) networks. The proposed framework represents HAC networks as dynamic tripartite graphs in which human nodes, AI agent nodes, and content nodes are connected by edges representing interactions that evolve over time. Multimodal information, including text, image, code, and structured interaction traces, is merged by attention-based fusion, while multimodal dependencies and the evolution of interactions are modeled by relation-aware graph message passing and GRU-based temporal propagation. Innovation potential is defined as a bounded composite score based on normalized novelty, entropy change, diffusion contribution, and, when available, human-rated creativity. The model is evaluated on two joint node-level tasks: classification of innovation level and regression of continuous innovation potential. Experiments on synthetic HAC datasets and selected real-world AIGC corpora show that MM-GNN outperforms graph-learning and index-based baselines, with an average F1 score of 0.87, temporal stability ρ = 0.89, and lower regression error. Ablation and visualization analyses indicate that multimodal fusion and temporal propagation improve representation quality, diffusion modeling, and interpretability. The results offer a mathematical and computational approach to studying innovation as an emergent phenomenon of dynamic human, AI, and content interactions and lay the groundwork for further validation at a broader socio-technical scale.

Keywords: complex networks; heterogeneous graphs; mathematical modeling; network data analysis; multimodal graph neural networks; temporal dynamics; human-AI-content ecosystems; information diffusion

MSC: 68T07; 68T09; 05C82

1. Introduction

Over the last few years, advances in artificial intelligence (AI) have substantially changed how knowledge is created, shared, and reused. Conventional innovation systems have dealt mostly with human-centric inputs and networks (e.g., human experts working together and human-generated content). However, with the emergence of large-scale generative models and AI-driven content-generation tools, a new paradigm is forming in which humans, AI agents, and content artifacts all engage in knowledge activities [1]. Generative AI tools produce text, images, code, and other media at an unprecedented scale and thus become new participants in the innovation process. This changing environment calls for rigorous models that capture the interaction of human actors, AI systems, and information within knowledge networks. Artificial intelligence-generated content (AIGC) is content (text, image, code, audio, or video) generated by AI systems, particularly those built on generative architectures. Generative AI is changing the way we work and communicate and makes possible systems that not only assist people but also produce something new on their own [2]. Meanwhile, AI is becoming a significant enabler of knowledge management practices, changing how knowledge is acquired, stored, and reused [3]. Concurrently, recent conceptual literature on hybrid intelligent service ecosystems describes frameworks in which human and artificial agents co-create value in service environments, extending the traditional human-only service logic [4]. These developments point to the formation of hybrid knowledge ecosystems, in which human actors, AI systems, and content artifacts (which can in turn be generative) form dynamic networks of innovation. In these ecosystems, knowledge circulates between humans and machines and becomes embedded in content. In addition, recent studies suggest that both AIGC and human-generated content (HGC) contribute to user engagement, creativity, and innovation performance. In the content-preneurship sphere, AIGC and HGC both significantly affect user engagement and innovation, with HGC emphasizing authenticity and emotional attachment [5]. Work on generative AI in innovation management highlights its strategic potential for innovation processes within organizations [6]. These findings motivate a modeling approach that combines human, AI, and content nodes in a single framework.

Although HAC networks hold considerable promise, several challenges remain critical to modeling their dynamics. Traditional network models treat innovation actors as homogeneous (e.g., firms and individuals) and focus on the knowledge flows among them. However, in HAC networks, we must distinguish between human agents, AI agents, and content artifacts (text, image, code, and audio), each with different capabilities and interaction patterns. The multimodal nature of content (e.g., image + text, code + natural language) further complicates representation and embedding. Knowledge innovation in HAC systems is dynamic: human–AI interactions evolve, content is revised, reused, and remixed, and novel artifacts generate subsequent interactions. Modeling such temporal evolution and feedback loops, in which content influences human behavior or AI retraining, is non-trivial, and existing static graph models are insufficient. Beyond modeling structure, we need to quantify innovation potential, knowledge novelty, and synergy in these networks. Information-theoretic measures (e.g., entropy and mutual information) have been used in knowledge network modeling [7], but extending them to HAC networks with AI involvement remains largely unexplored. In HAC networks, AI agents are not simply tools but actors with agency: they generate content, suggest innovations, and can collaborate. Understanding their role (as catalyst, partner, or autonomous innovator) and modeling it mathematically poses conceptual and methodological challenges. One study developed a human–AI synergy degree model in education, showing fluctuations in collaboration order and synergy in hybrid environments [8]. The scale of AI-generated content and its reuse across contexts creates extensive, complex networks of interaction. Graph learning methods must scale to such multimodal, heterogeneous, temporal networks and produce meaningful embeddings and predictions.

The primary objective of this study is to establish a formal and computational framework for modeling knowledge innovation within HAC networks. The study builds on recent developments in graph learning and information theory to model how novel knowledge structures form in AIGC-based systems through the co-evolution of humans, AI agents, and multimodal content. More precisely, we seek to (i) construct a tripartite multimodal graph model that captures the heterogeneous interactions between human, AI, and content nodes; (ii) develop an MM-GNN architecture that learns dynamic embeddings capturing emergent novelty, synergy, and diffusion in the network; and (iii) develop an information-theoretic innovation index that quantifies emergent novelty, synergy, and diffusion in the network. Empirically, the study formulates node-level prediction over dynamic HAC graphs as a joint problem. In particular, the model is trained on two inter-related tasks: (i) node-level innovation category classification, in which each human, AI, or content node is assigned to one of three innovation categories (low, medium, or high), and (ii) node-level innovation-score regression, in which each node receives a continuous innovation potential score based on information-theoretic novelty, entropy gain, reuse diffusion, and human-rated creativity. For the classification task, accuracy, F1 score, ROC-AUC, and confusion matrices are used, while for the regression task, MAE, RMSE, R², and confidence intervals are used. This joint formulation is adopted because innovation within HAC networks can be characterized both categorically (identifying the nodes with high innovation potential) and continuously (estimating the amount of innovation potential over time).

The key contributions of the paper can be summarized as follows. First, it adds to the existing literature on human–AI collaboration a mathematically grounded description of hybrid innovation systems, with explicit consideration of multimodal content as an active node type. Second, it presents a multimodal graph-learning model that integrates heterogeneous modalities via attention-based message passing, in contrast to earlier single-modality models of innovation diffusion. Third, it proposes an entropy-based measure of knowledge innovation, drawing on recent work that uses information theory to model complex knowledge networks. Lastly, the paper empirically evaluates the proposed framework on simulated and real-world AIGC data and demonstrates greater predictive ability and interpretability than existing baselines. Collectively, these aims and contributions can advance the quantitative perspective on AIGC-driven hybrid innovation ecosystems and provide conceptual and applied insights into how human creativity and AI capabilities can be combined to create new knowledge.

2. Related Work

2.1. Knowledge Network Modeling and Innovation Diffusion

Research on knowledge network modeling has a rich history in scientometrics and innovation studies, where it examines how information circulates among individuals, organizations, and artifacts. Early models viewed innovation as a diffusion process through homogeneous networks such as citation and collaboration networks [6,9]. More recent approaches extend these frameworks to capture dynamic and heterogeneous innovation processes in digital and AI-driven environments. Novillo-Villegas et al. [7] examined innovation networks in industries in terms of the entropy-based diffusion of knowledge, and the results showed a nonlinear diffusion of ideas across disciplinary lines. Similarly, Appleyard [10] applied graph-based modeling to investigate corporate innovation networks in the semiconductor industry and found that global knowledge connectivity is positively related to innovation performance. Other research, e.g., Farayola et al. [11] and Qu et al. [12], underlines the increasing complexity of knowledge ecosystems, in which knowledge is recombined through both human experts and algorithmic means. Network-based diffusion models have also been generalized to a multi-layer, temporal format by Cai et al. [13] and Zhu et al. [14], who analyzed dynamic knowledge spillovers in open innovation networks. These works provide the basis for representing innovation as a dynamic, networked process; however, they pay little attention to the active, generative role of AI in content generation.

2.2. Human–AI Collaboration and Co-Creative Systems

Since the introduction of large language models and multimodal generative systems, the role of AI in creative collaboration has attracted growing scholarly interest. Feuerriegel [2] and Adefarati et al. [3] both point out that AI is shifting from a supportive element of the workforce toward a co-creative role in contemporary organizations. Empirical evidence on human–AI collaboration by Kong et al. [8] and Fu et al. [15] indicates that patterns of knowledge innovation emerge through feedback loops and distributed cognition. Li et al. [16] and Kamnerddee et al. [17] report that AIGC tools enhance the innovativeness of creative industries, although in certain instances at the cost of originality and interpretability. Maglio and Lim [18] theorize hybrid intelligent service ecosystems in which humans and intelligent agents co-create value, and Collins et al. [19] argue that human control should be maintained while machine creativity is harnessed. Education and communication research indicates that transparency, explainability, and mutual adaptation are the key elements of successful human–AI synergy [20]. Together, these papers highlight the co-evolution of human and AI creativity but lack formal modeling of the structural and mathematical dynamics that support these systems.

2.3. Graph Representation Learning and Multimodal Fusion

Graph learning has emerged as a leading paradigm for representing complex relational data, providing powerful models of heterogeneous and dynamic networks. The original paper on graph convolutional networks (GCNs) by Kipf and Welling [21] led to many extensions, including heterogeneous graph neural networks (HGNNs) [22] and graph attention networks (GATs) [23]. These architectures have been applied to social, biological, and knowledge graphs, with promising results in representation learning and link prediction. More recent advances include multimodal fusion methods that use graph learning to incorporate text, images, and audio. He and Wang [24] introduced a multimodal graph transformer for cross-domain learning, whereas Feng et al. [25] used temporal GNNs for dynamic video–text reasoning. Wang et al. [26] designed a knowledge-aware GNN for cross-media retrieval and demonstrated that hybrid multimodal embeddings improve reasoning over knowledge graphs. Nevertheless, current multimodal GNNs are primarily intended for content understanding or recommendation rather than for modeling innovation diffusion or knowledge creation. This gap motivates applying multimodal graph learning to Human–AI–Content (HAC) networks, in which interactions are both semantic and generative.

2.4. Information-Theoretic Models of Innovation

Information theory also provides a strong basis for measuring knowledge novelty, uncertainty, and synergy in an innovation system. Shannon entropy is an essential indicator of the diversity of knowledge flows, and mutual information captures the interdependence of actors or modalities. Zhu et al. [14] used entropy-based measures to evaluate technological convergence, and Moaniba et al. [27] used mutual information to model interdisciplinary knowledge recombination. Recent works have generalized these principles to hybrid and intelligent systems. For example, Liu and Jiang [28] proposed an entropy and synergy index for open innovation that combines information-theoretic and graph-based metrics. Novillo-Villegas et al. [7] combined information entropy and network topology to describe emergent innovation hotspots. Despite these developments, little of the literature applies such measures to AIGC-based settings. Combining entropic quantification with graph neural networks may make it possible to construct models that not only learn structural embeddings but also predict innovation potential in HAC systems, bridging the gap between theory and machine learning.

2.5. Research Gap and Positioning

Despite the fact that previous studies have yielded valuable information on knowledge networks, human–AI collaboration, and graph learning, there are still significant weaknesses. The majority of the literature models human–human or human–content networks, whereas the three-part interaction between humans, AI agents, and multimodal content has not been well explored. Moreover, current multimodal GNNs rarely account for information-theoretic metrics of novelty and knowledge creation. There is also a paucity of formal models that describe co-evolutionary feedback loops between human cognition, AI generation, and content recombination—critical processes in AIGC ecosystems. This study positions itself at the intersection of these gaps. It advances the field by (i) conceptualizing HAC networks as dynamic, multimodal systems of innovation, (ii) developing a graph-learning model that fuses heterogeneous features, and (iii) embedding entropy-based innovation measures within the graph-learning process. In doing so, it integrates previously distinct strands of research, knowledge network theory [7,9], human–AI collaboration [2,8], multimodal graph learning [24,25], and information theory [14,27], into a unified mathematical framework for modeling innovation in AIGC-enabled systems. Table 1 presents a comparative overview of related studies across domains.

3. Mathematical Framework for HAC Innovation Networks

3.1. Tripartite Network Definition

An HAC system can be expressed as a dynamic tripartite graph:

$$G_t = (V_H, V_A, V_C, E_t, X_t) \tag{1}$$

where $V_H$, $V_A$, and $V_C$ represent the human, AI, and content node sets, with $V = V_H \cup V_A \cup V_C$. Edges $E_t$ denote interactions at time $t$, and $X_t$ stores node features derived from multimodal sources. Each node $v \in V$ contains a fused feature vector:

$$x_v = [x_v^{\text{text}} \,\|\, x_v^{\text{image}} \,\|\, x_v^{\text{code}} \,\|\, x_v^{\text{meta}}] \in \mathbb{R}^{d} \tag{2}$$

combining textual, visual, code, and structured attributes.

Figure 1 illustrates the architecture of the HAC tripartite network, highlighting heterogeneous node types and their interaction channels.

3.2. Multimodal Feature Representation

Let $z_{i,m}$ denote the embedding of node $i$ under modality $m$, where $m \in \{\text{text}, \text{image}, \text{code}\}$. Each modality-specific embedding is projected into a shared latent space:

$$u_{i,m} = W_m z_{i,m} + b_m \tag{3}$$

The attention score for modality $m$ is computed as

$$e_{i,m} = v^{\top} \tanh(W_a u_{i,m}) \tag{4}$$

and the normalized attention weight is

$$\alpha_{i,m} = \frac{\exp(e_{i,m})}{\sum_{r=1}^{M} \exp(e_{i,r})} \tag{5}$$

The fused multimodal node representation is then

$$x_i = \sum_{m=1}^{M} \alpha_{i,m} u_{i,m} \tag{6}$$

This attention mechanism places more weight on modalities that are more informative for the node representation. If modalities are weakly aligned or semantically divergent, their attention weights are reduced rather than the modalities being forced into a common meaning. Residual connections from each modality vector can also be retained in the final representation to preserve modality-specific information.
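As a minimal sketch of Equations (3)–(6), the following NumPy code projects each modality embedding, scores it, and forms the attention-weighted sum. The function name `fuse_modalities`, the dimensions, and the random parameters are illustrative assumptions, not the authors' implementation; subtracting the maximum score before exponentiation is a standard numerical safeguard that leaves the softmax unchanged.

```python
import numpy as np

def fuse_modalities(z, W, b, W_a, v):
    """Attention-based fusion of modality embeddings for one node.

    z, W, b are dicts keyed by modality name (hypothetical layout).
    """
    names = sorted(z)
    # Eq. (3): project every modality into the shared d-dimensional space.
    u = np.stack([W[m] @ z[m] + b[m] for m in names])   # shape (M, d)
    # Eq. (4): scalar attention score per modality.
    e = np.tanh(u @ W_a.T) @ v                          # shape (M,)
    # Eq. (5): softmax over modalities (shifted by max for stability).
    w = np.exp(e - e.max())
    alpha = w / w.sum()
    # Eq. (6): attention-weighted sum of projected embeddings.
    x = alpha @ u                                       # shape (d,)
    return x, dict(zip(names, alpha))

# Illustrative inputs: raw embedding sizes and parameters are made up.
rng = np.random.default_rng(0)
dims = {"text": 6, "image": 5, "code": 4}
d, d_a = 3, 4
z = {m: rng.normal(size=k) for m, k in dims.items()}
W = {m: rng.normal(size=(d, k)) for m, k in dims.items()}
b = {m: np.zeros(d) for m in dims}
W_a = rng.normal(size=(d_a, d))
v = rng.normal(size=d_a)

x, alpha = fuse_modalities(z, W, b, W_a, v)
```

Because the weights come from a softmax, they are strictly positive and sum to one, so the fused vector is a convex combination of the projected modality vectors.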

The cross-modal similarity between modalities $m$ and $n$ is computed after projection into the shared latent space:

$$\phi_{mn}(i) = \frac{u_{i,m}^{\top} u_{i,n}}{\| u_{i,m} \|_2 \, \| u_{i,n} \|_2 + \epsilon} \tag{7}$$

where $\epsilon$ is a small constant that keeps the formula numerically stable. This cosine-based similarity lies approximately in the range $[-1, 1]$. High values indicate semantic similarity between modalities; low or negative values indicate dissimilarity or semantic distance. The similarity score is used only as a modulating term for the attention weights and is not meant to force incompatible modalities to merge. In this way, well-aligned modalities are exploited, while the influence of noisy or mismatched modalities is attenuated.
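The similarity in Equation (7) can be sketched in a few lines. The function name and the default value of `eps` are assumptions made for illustration; the source does not state the constant used.

```python
import numpy as np

def cross_modal_similarity(u_m, u_n, eps=1e-8):
    """Cosine-style similarity of Eq. (7) between two projected modality vectors."""
    denom = np.linalg.norm(u_m) * np.linalg.norm(u_n) + eps
    return float(u_m @ u_n / denom)

# Illustrative vectors in a shared 2-dimensional latent space.
same = cross_modal_similarity(np.array([3.0, 4.0]), np.array([3.0, 4.0]))
opposite = cross_modal_similarity(np.array([3.0, 4.0]), np.array([-3.0, -4.0]))
orthogonal = cross_modal_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
missing = cross_modal_similarity(np.zeros(2), np.array([1.0, 0.0]))
```

The `eps` term is what makes the last call safe: an all-zero vector (for example, an absent modality) yields a similarity of zero rather than a division by zero, which also explains why the score is only approximately within $[-1, 1]$.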

3.3. Multimodal Graph Learning Architecture

The learning process applies a Multimodal Graph Neural Network (MM-GNN) for relational message passing. Temporal dependency is modeled using a GRU update over consecutive graph snapshots:

$$\bar{h}_i^{t} = \mathrm{GNN}(x_i^{t}, E^{t}) \tag{8}$$

$$h_i^{t} = \mathrm{GRU}(\bar{h}_i^{t}, h_i^{t-1}) \tag{9}$$

where $\bar{h}_i^{t}$ is the graph-updated representation of node $i$ at time $t$ and $h_i^{t-1}$ is the hidden state from the previous graph snapshot. The GRU module preserves temporal memory and allows the model to capture delayed innovation effects across HAC interactions.

For node $i$, the relation-aware message-passing update at layer $l$ is defined as

$$h_i^{(l+1)} = \sigma\left( W_0^{(l)} h_i^{(l)} + \sum_{r \in R} \sum_{j \in N_r(i)} \alpha_{ij}^{r,l} W_r^{(l)} h_j^{(l)} \right) \tag{10}$$

where $R$ is the set of relation types, including human–AI, human–content, and AI–content relations; $N_r(i)$ denotes the neighbors of node $i$ under relation $r$; $W_r^{(l)}$ is a relation-specific trainable matrix; $W_0^{(l)}$ is the self-loop transformation; and $\alpha_{ij}^{r,l}$ is the normalized attention coefficient for neighbor $j$ under relation $r$.
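One layer of the relation-aware update in Equation (10) can be sketched as follows. Since the text does not give the form of the attention coefficients, this sketch substitutes uniform weights $1/|N_r(i)|$ for the learned $\alpha_{ij}^{r,l}$ and uses ReLU for $\sigma$; both choices, along with the function name and data layout, are assumptions for illustration only.

```python
import numpy as np

def relation_layer(h, neighbors, W_self, W_rel):
    """One relation-aware message-passing layer in the spirit of Eq. (10).

    h:         (n, d) node states at layer l
    neighbors: dict relation -> dict node id -> list of neighbor ids
    W_self:    (d_out, d) self-loop matrix W_0
    W_rel:     dict relation -> (d_out, d) relation-specific matrix W_r
    """
    out = h @ W_self.T                      # self-loop term W_0 h_i
    for r, nbrs in neighbors.items():
        for i, js in nbrs.items():
            if not js:
                continue
            alpha = 1.0 / len(js)           # assumption: uniform instead of learned attention
            out[i] += alpha * (h[js] @ W_rel[r].T).sum(axis=0)
    return np.maximum(out, 0.0)             # assumption: ReLU as the activation sigma

# Toy HAC triangle: node 0 = human, node 1 = AI agent, node 2 = content.
h = np.eye(3)
neighbors = {
    "human-AI": {0: [1], 1: [0]},
    "human-content": {0: [2], 2: [0]},
    "AI-content": {1: [2], 2: [1]},
}
W_self = np.eye(3)
W_rel = {r: np.eye(3) for r in neighbors}
h_next = relation_layer(h, neighbors, W_self, W_rel)
```

With identity matrices, each node's new state is the sum of its own state and one message per relation, which makes the aggregation easy to check by hand.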

Temporal evolution follows gated recurrent propagation

$$H_t = \mathrm{GRU}(H_{t-1}, \tilde{H}_t) \tag{11}$$

ensuring consistent representation over time.

Figure 2 presents the MM-GNN structure that integrates attention weighting and temporal recurrence for dynamic innovation modeling.

3.4. Innovation Dynamics

Innovation potential is modeled as a time-dependent node-level quantity derived from representational novelty, local diversity change, and diffusion contribution. For node $i$ at time $t$, the innovation potential is defined as

$$\Phi_i^{t} = \lambda_1 \widetilde{NI}_i^{t} + \lambda_2 \widetilde{\Delta H}_i^{t} + \lambda_3 \widetilde{DC}_i^{t} \tag{12}$$

where $\widetilde{NI}_i^{t}$, $\widetilde{\Delta H}_i^{t}$, and $\widetilde{DC}_i^{t}$ denote normalized novelty, entropy-change, and diffusion-contribution components, respectively. When human-rated creativity is available, it is incorporated as an auxiliary component:

$$\Phi_i^{t} = \lambda_1 \widetilde{NI}_i^{t} + \lambda_2 \widetilde{\Delta H}_i^{t} + \lambda_3 \widetilde{DC}_i^{t} + \lambda_4 \widetilde{HRC}_i^{t} \tag{13}$$

The network-level innovation potential is defined as the average node-level innovation potential:

$$\Phi^{t} = \frac{1}{|V|} \sum_{i \in V} \Phi_i^{t} \tag{14}$$

The temporal change in network-level innovation is measured as

$$\Delta \Phi^{t} = \Phi^{t} - \Phi^{t-1} \tag{15}$$

Positive values of $\Delta \Phi^{t}$ indicate that network-level innovation potential is increasing, and $\Delta \Phi^{t}$ approaches zero as the network reaches stability. This formulation does not assume that an increase in entropy is synonymous with innovation. Instead, innovation is treated as a composite construct consisting of novelty, diversity, diffusion and, where available, human evaluation.
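The composite score in Equations (12)–(15) can be sketched as a weighted sum followed by a network average. The component values and weights below are hypothetical, the components are assumed to be already scaled to $[0, 1]$, and the helper names are illustrative; the paper does not report the $\lambda_k$ values in this section.

```python
def innovation_potential(components, weights):
    """Node-level score of Eq. (12)/(13): weighted sum of normalized components."""
    if set(components) != set(weights):
        raise ValueError("components and weights must share the same keys")
    if any(w < 0 for w in weights.values()) or abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(weights[k] * components[k] for k in weights)

def network_potential(node_scores):
    """Network-level score of Eq. (14): mean over all nodes."""
    return sum(node_scores) / len(node_scores)

# Hypothetical weights for novelty (NI), entropy change (dH), diffusion (DC).
weights = {"NI": 0.5, "dH": 0.3, "DC": 0.2}

# Two nodes observed at consecutive snapshots t-1 and t (made-up values).
scores_prev = [
    innovation_potential({"NI": 0.4, "dH": 0.5, "DC": 0.1}, weights),
    innovation_potential({"NI": 0.2, "dH": 0.5, "DC": 0.5}, weights),
]
scores_now = [
    innovation_potential({"NI": 0.8, "dH": 0.5, "DC": 0.2}, weights),
    innovation_potential({"NI": 0.2, "dH": 0.5, "DC": 0.5}, weights),
]

# Eq. (15): change in network-level potential between snapshots.
delta_phi = network_potential(scores_now) - network_potential(scores_prev)
```

When human-rated creativity is available, the same function applies with a fourth key and a four-element weight set, as in Equation (13).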

3.5. Stability and Analytical Properties

This subsection discusses the boundedness and stability of the proposed HAC innovation model. The innovation potential is composed of normalized information-theoretic and graph-based components, so the resulting score is bounded. More precisely, every component of the innovation score is scaled to the interval $[0, 1]$, and the weights $\lambda_k$ are all non-negative and sum to one. Consequently, the constructed innovation potential, denoted $y_i^{t}$ when used as the regression target, satisfies

$$0 \leq y_i^{t} \leq 1 \tag{16}$$

This boundedness follows directly from the convex combination of normalized components. It provides numerical stability for supervised learning and ensures that the regression target remains comparable across datasets.
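The bound can be written out in one line. Here $c_{i,k}^{t} \in [0, 1]$ is shorthand, introduced only for this derivation, for the $k$-th normalized component of the score (novelty, entropy change, diffusion contribution and, if present, human-rated creativity), with $\lambda_k \geq 0$ and $\sum_k \lambda_k = 1$:

```latex
0 \;=\; \sum_{k} \lambda_k \cdot 0
  \;\leq\; \sum_{k} \lambda_k \, c_{i,k}^{t}
  \;\leq\; \sum_{k} \lambda_k \cdot 1 \;=\; 1 .
```

Both inequalities use only $\lambda_k \geq 0$ and $0 \leq c_{i,k}^{t} \leq 1$, so the bound holds for the three-component and four-component forms alike.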

The temporal embedding update can be analyzed through a simplified linearized form of the recurrent graph propagation:

$$H^{t} = \sigma(\tilde{A} H^{t-1} W + X^{t} B) \tag{17}$$

where $H^{t}$ denotes the hidden representation at time $t$, $\tilde{A}$ is the normalized adjacency matrix, $W$ is the trainable propagation matrix, $X^{t}$ is the node-feature matrix, $B$ is a feature-projection matrix, and $\sigma$ is a nonlinear activation function.

If $\sigma$ is Lipschitz-continuous with Lipschitz constant $L_{\sigma}$, then a sufficient condition for contraction of the recurrent propagation is

$$L_{\sigma} \| \tilde{A} W \|_2 < 1 \tag{18}$$

Under this condition, small perturbations in the previous hidden state do not grow unbounded over time. This provides a sufficient, although not necessary, condition for stability of the temporal propagation. In practice, adjacency normalization, GRU gating, dropout, L2 regularization, and gradient clipping are used to improve numerical stability during training.
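The contraction condition can be checked numerically on a toy graph. Because $\tilde{A}$ acts on the node dimension and $W$ on the feature dimension in Equation (17), this sketch reads the norm in Equation (18) as the product of spectral norms $\|\tilde{A}\|_2 \|W\|_2$ (the norm of the map $H \mapsto \tilde{A} H W$); that reading, the symmetric normalization with self-loops, the use of tanh ($L_\sigma = 1$), and all numeric values are assumptions for illustration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed scheme)."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def contraction_factor(a_norm, w, lipschitz=1.0):
    """Upper bound on how much one propagation step can stretch a perturbation."""
    return lipschitz * np.linalg.norm(a_norm, 2) * np.linalg.norm(w, 2)

def propagate(h_prev, a_norm, w, x, b):
    """One step of Eq. (17) with tanh as the activation."""
    return np.tanh(a_norm @ h_prev @ w + x @ b)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)   # 4-node path graph
a_norm = normalized_adjacency(adj)
w = 0.5 * np.eye(3)                            # small propagation matrix
x, b = rng.normal(size=(4, 2)), rng.normal(size=(2, 3))

factor = contraction_factor(a_norm, w)

# Two different previous states: their distance should shrink by at least `factor`.
h1, h2 = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
before = np.linalg.norm(h1 - h2)
after = np.linalg.norm(propagate(h1, a_norm, w, x, b) - propagate(h2, a_norm, w, x, b))
```

The factor here is about 0.5, so the gap between two hidden-state trajectories at least halves at every step, which is the behavior the sufficient condition is meant to guarantee.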

Empirical evaluation on simulated HAC interactions shows monotonic growth of $\Phi^{t}$ during collaboration phases and a plateau once informational redundancy emerges. Table 2 presents the node and edge semantics in the HAC tripartite network.

4. Algorithmic Implementation

4.1. Graph Construction from AI Interaction Data

The algorithmic pipeline begins with transforming raw interaction data into a structured tripartite graph $G_t$. Interaction logs from AIGC-enabled systems contain textual prompts, AI-generated responses, editing traces, and content usage histories. Each log entry represents a potential edge connecting human, AI, and content nodes. Temporal information ($t$) specifies the sequence of interactions and is used to construct dynamic snapshots of $G_t$. Features extracted from each entity are embedded as high-dimensional vectors and normalized before graph assembly. Interaction frequency and semantic similarity determine the edge weights $w_{uv}$, ensuring that highly correlated human–AI pairs exert a more decisive influence in message passing.

Table 3 summarizes the mapping from raw data fields to graph components and the transformation functions applied during preprocessing.
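The construction of snapshots from logs can be sketched as follows. The log field names (`t`, `human`, `ai`, `content`), the choice to emit three pairwise edges per entry, and the frequency-only weighting are all assumptions for illustration; the semantic-similarity part of the edge weight is omitted because the text does not specify how the two signals are combined.

```python
from collections import defaultdict

def build_snapshots(logs):
    """Group log entries by time step and derive frequency-based edge weights."""
    counts = defaultdict(lambda: defaultdict(int))
    for entry in logs:
        h = ("H", entry["human"])
        a = ("A", entry["ai"])
        c = ("C", entry["content"])
        # Assumption: one entry contributes human-AI, human-content, AI-content edges.
        for edge in ((h, a), (h, c), (a, c)):
            counts[entry["t"]][edge] += 1
    snapshots = {}
    for t, edge_counts in sorted(counts.items()):
        top = max(edge_counts.values())
        # Normalize by the most frequent edge so weights fall in (0, 1].
        snapshots[t] = {edge: n / top for edge, n in edge_counts.items()}
    return snapshots

# Hypothetical interaction log with two time steps.
logs = [
    {"t": 0, "human": "u1", "ai": "m1", "content": "c1"},
    {"t": 0, "human": "u1", "ai": "m1", "content": "c2"},
    {"t": 1, "human": "u2", "ai": "m1", "content": "c2"},
]
snapshots = build_snapshots(logs)
```

In snapshot 0, the pair (`u1`, `m1`) interacts twice and receives weight 1.0, while each edge to a content node appears once and receives weight 0.5.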

4.2. Model Training and Optimization

The MM-GNN is trained using a joint objective consisting of classification loss, regression loss, temporal smoothness loss, and regularization loss. The total loss is

$$L = L_{cls} + \eta L_{reg} + \gamma L_{temp} + \beta \|\Theta\|_2^2 \tag{19}$$

where $L_{cls}$ is the cross-entropy loss for low-, medium-, and high-innovation classification, $L_{reg}$ is the mean squared error loss for continuous innovation-score prediction, $L_{temp}$ is the temporal smoothness penalty, $\|\Theta\|_2^2$ is the L2 regularization term, and $\eta$, $\gamma$, and $\beta$ are non-negative hyperparameters.

The classification loss is defined as

$$L_{cls} = -\sum_{i \in V} \sum_{c=1}^{3} q_{ic} \log \hat{q}_{ic} \tag{20}$$

where $q_{ic}$ is the true class indicator and $\hat{q}_{ic}$ is the predicted probability that node $i$ belongs to innovation class $c$.

The regression loss is defined as

$$L_{reg} = \frac{1}{|V|} \sum_{i \in V} (y_i^{t} - \hat{y}_i^{t})^2 \tag{21}$$

where $y_i^{t}$ is the constructed continuous innovation target and $\hat{y}_i^{t}$ is the predicted innovation score.

The temporal smoothness loss is defined as

$$L_{temp} = \frac{1}{|V|} \sum_{i \in V} \| h_i^{t} - h_i^{t-1} \|_2^2 \tag{22}$$

where $h_i^{t}$ and $h_i^{t-1}$ are node embeddings at consecutive time steps. This term discourages abrupt embedding changes unless supported by the observed interaction data.
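The joint objective of Equations (19)–(22) can be sketched in NumPy as below. The hyperparameter values `eta`, `gamma`, and `beta`, the small `eps` inside the logarithm, and the toy inputs are assumptions for illustration; the text only states that the hyperparameters are non-negative.

```python
import numpy as np

def joint_loss(q, q_hat, y, y_hat, h_t, h_prev, params,
               eta=0.5, gamma=0.1, beta=0.01, eps=1e-12):
    """Total loss of Eq. (19) from its four terms."""
    l_cls = -np.sum(q * np.log(q_hat + eps))                # Eq. (20), summed over nodes
    l_reg = np.mean((y - y_hat) ** 2)                       # Eq. (21)
    l_temp = np.mean(np.sum((h_t - h_prev) ** 2, axis=1))   # Eq. (22)
    l2 = sum(np.sum(p ** 2) for p in params)                # squared L2 norm of parameters
    return l_cls + eta * l_reg + gamma * l_temp + beta * l2

# Two nodes, three innovation classes (low, medium, high); values are made up.
q = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # one-hot labels
q_hat = q.copy()                                    # perfect class predictions
y = np.array([0.2, 0.8])
y_hat = np.array([0.2, 0.6])
h_t = np.eye(2)
h_prev = np.zeros((2, 2))
params = [np.array([1.0, 2.0])]

total = joint_loss(q, q_hat, y, y_hat, h_t, h_prev, params)
```

With these inputs the classification term is essentially zero, the regression term is 0.02, the smoothness term is 1.0, and the parameter norm is 5.0, so the weighted total is about 0.16.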

The learning rate was set to $1 \times 10^{-3}$, and L2 weight decay was set to $5 \times 10^{-4}$. The batch size was 256 nodes, with 2-hop neighbor sampling. The models were trained for up to 200 epochs with early stopping, which halted training when the validation loss had not decreased for 20 epochs. The temporal window length was fixed at $T = 10$. The dropout rate was 0.3, and gradient clipping with a maximum norm of 5.0 was used to stabilize recurrent training. Each experiment was repeated five times with different random seeds.

Figure 3 shows the optimization workflow of the MM-GNN, covering forward propagation through multimodal attention fusion, back-propagation through the temporal layers, and parameter updates.

The first stage uses modality-specific encoders to convert text, image, and code inputs into latent embeddings. The second stage is multimodal fusion, which uses attention to learn adaptive modality weights according to each modality's contribution to the node representation. The third stage carries out relation-aware graph message passing over human–AI, human–content, and AI–content edges. The fourth stage uses GRU-based temporal propagation to model temporal dependencies between graph snapshots. The final prediction layer outputs both continuous innovation scores and categorical innovation labels. In the backward pass, the classification loss, score-regression loss, temporal smoothness loss, and parameter regularization are minimized jointly to update the model parameters.

4.3. Computational Complexity and Scalability Analysis

The message-passing part is roughly linear in the number of edges for sparse graphs. The modality-fusion part grows linearly with the number of modalities $M$ and the projection dimension $d$, whereas the dense projection part adds a $d^2$ term. The per-epoch complexity can therefore be approximated as

$$O(|E| d M + |V| d^2 + T d^2) \tag{23}$$

where $|E|$ is the number of edges, $|V|$ is the number of nodes, $d$ is the embedding dimension, $M$ is the number of modalities, and $T$ is the temporal window length (the number of time steps). The last term, $O(T d^2)$, is the recurrent component added by temporal modeling.

Memory consumption grows as $O(|V| d + |E|)$.
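To see which term of Equation (23) dominates for a given graph, the expression can be evaluated directly. The functions below are a rough operation-count proxy that ignores constant factors, and the graph sizes are hypothetical rather than taken from Table 4.

```python
def epoch_cost(num_edges, num_nodes, dim, num_modalities, window):
    """Evaluate the three terms of Eq. (23) without constant factors."""
    edge_term = num_edges * dim * num_modalities   # |E| d M
    dense_projection = num_nodes * dim ** 2        # |V| d^2
    recurrence = window * dim ** 2                 # T d^2
    return edge_term + dense_projection + recurrence

def memory_cost(num_nodes, num_edges, dim):
    """Evaluate the memory estimate |V| d + |E|."""
    return num_nodes * dim + num_edges

# Hypothetical small graph: 100 nodes, 1000 edges, d = 8, M = 3, T = 10.
cost = epoch_cost(num_edges=1000, num_nodes=100, dim=8, num_modalities=3, window=10)
memory = memory_cost(num_nodes=100, num_edges=1000, dim=8)
```

For this toy configuration the edge term (24,000) outweighs the dense projection (6,400) and the recurrence (640); with a larger embedding dimension the $d^2$ terms grow faster and can take over.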

Parallelization across modalities and sparse-matrix storage substantially reduces effective runtime.

Table 4 presents comparative complexity statistics for small, medium, and large HAC graphs, showing approximate training costs per epoch.

The implementation achieves linear scalability across graph sizes while preserving temporal coherence.

Sparse tensor operations and batched attention evaluation maintain computational efficiency for large multimodal interaction datasets.

The final trained network produces stable, interpretable innovation embeddings with a feasible runtime for real-world AIGC systems.

4.4. Experimental Evaluation

To assess the effectiveness of the proposed Multimodal Graph Neural Network (MM-GNN) framework for HAC innovation modeling, experiments were conducted on both synthetic co-creation datasets and a real-world AIGC dataset [30]. We first generated controlled co-creation datasets that simulate iterative cycles of human–AI collaboration. Each sample represents a sequence of prompt–response–revision interactions among human, AI, and content nodes. Modalities include textual prompts, image or code outputs, and human annotations. The synthetic graphs were constructed to validate the model’s ability to capture dynamic feedback loops and the emergence of novelty under varying noise and temporal conditions. For empirical validation, we used data from open-source AIGC repositories and creative-collaboration platforms (e.g., AI-assisted design forums, co-writing platforms, and multimodal art datasets). These datasets contain logs of user prompts, AI generations, content reuse histories, and peer ratings.

Each dataset was split chronologically into training, validation, and test sets (70%, 15%, and 15%, respectively). The split was chronological because HAC interactions are temporal, and random splitting could leak information from future interactions into predictions about the past. Hyperparameters such as the learning rate (1 × 10⁻³), embedding dimension (512), and temporal window length (T = 10) were selected via cross-validation on held-out validation sets, with early stopping on the validation set, and the test set was reserved for the final assessment. All models were implemented in PyTorch 2.1.0 and PyTorch Geometric 2.4.0 and trained on NVIDIA A100 GPUs.

Two types of baselines were used. The first comprises graph-learning models such as GCN, HGNN, GAT, and MGTN. Like MM-GNN, these models are trained on node-level classification and regression tasks. The second comprises analytical innovation-index methods: the Entropy Network Model, Open Innovation Index, Hybrid Synergy Model, and Information-Theoretic Model. These methods do not directly output class probabilities, so their scalar innovation scores are scaled to $[0,1]$ and compared against the same continuous innovation target. For classification, the normalized scalar scores are discretized into tertiles of equal size, as for the proposed model. This allows a functional comparison under an identical evaluation protocol, while acknowledging that analytical indices and neural predictive models differ in model capacity.
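The scaling and tertile discretization applied to the index-based baselines can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names and the sample scores are hypothetical, and rank-based assignment is assumed as the way to obtain near-equal class sizes.

```python
def min_max_scale(scores):
    """Scale raw index scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def tertile_labels(scores):
    """Label each score 0 (low), 1 (medium), or 2 (high) by rank (assumed rule)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = min(2, rank * 3 // n)
    return labels


# Hypothetical raw outputs of one analytical index for six nodes.
raw_index = [12.0, 3.5, 8.1, 20.4, 15.2, 5.0]
scaled = min_max_scale(raw_index)
labels = tertile_labels(scaled)
print(scaled)
print(labels)
```

Because the labels depend only on rank order, applying the same rule to the model target and to each baseline score keeps the three classes comparable in size.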

Table 5 summarizes the main datasets used and their modality composition.

4.5. Evaluation Metrics

We evaluate the proposed model using both quantitative information-theoretic metrics and qualitative human assessments of creativity.

For each node $i$, the entropy-change component measures the change in the diversity of its local neighborhood representation:

$$\Delta H_{i}^{t} = H(P_{N(i)}^{t}) - H(P_{N(i)}^{t-1}) \tag{24}$$

where $P_{N(i)}^{t}$ denotes the distribution of latent states or interaction types in the neighborhood of node $i$ at time $t$, and $H(\cdot)$ is Shannon entropy:

$$H(P) = -\sum_{k} p_{k} \log p_{k} \tag{25}$$

A positive $\Delta H_{i}^{t}$ indicates increasing neighborhood diversity, whereas a negative value indicates decreasing diversity or convergence toward a more homogeneous interaction pattern. Since innovation may involve both exploration and consolidation, $\Delta H_{i}^{t}$ is interpreted as one component of innovation potential rather than as a complete measure of innovation by itself.
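A small sketch of Equations (24) and (25) may help make the sign convention concrete. It assumes that the neighborhood distribution is the empirical frequency of interaction types around a node and that the natural logarithm is used; the interaction-type names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """H(P) = -sum_k p_k log p_k; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def neighborhood_distribution(interaction_types):
    """Empirical distribution over interaction types in a node's neighborhood."""
    counts = Counter(interaction_types)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def entropy_change(types_prev, types_now):
    """Delta H_i^t = H(P^t) - H(P^{t-1}) for one node."""
    h_now = shannon_entropy(neighborhood_distribution(types_now))
    h_prev = shannon_entropy(neighborhood_distribution(types_prev))
    return h_now - h_prev


# Hypothetical neighborhood of one node at two consecutive snapshots.
prev = ["prompt", "prompt", "prompt", "edit"]
now = ["prompt", "edit", "rating", "reuse"]
delta_h = entropy_change(prev, now)
print(delta_h)  # positive: the neighborhood became more diverse
```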

The diffusion contribution of node $i$ is defined as the weighted influence of node $i$ on its neighboring nodes in the next graph snapshot:

$$\mathrm{DC}_{i}^{t} = \sum_{j \in N(i)} a_{ij}^{t} \cdot s_{ij}^{t+1} \tag{26}$$

where $a_{ij}^{t}$ is the normalized edge weight between nodes $i$ and $j$ at time $t$, and $s_{ij}^{t+1}$ measures subsequent reuse, activation, citation, editing, or semantic propagation from node $i$ to node $j$ at time $t + 1$. This term captures whether a node contributes to later innovation activity in the HAC network.
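Equation (26) can be illustrated with a short sketch. The normalization scheme is not specified in the text, so the example assumes that each edge weight is divided by the sum of weights over the node's neighbors; the node identifiers and signal values are hypothetical.

```python
def diffusion_contribution(edge_weights, propagation):
    """DC_i^t = sum_j a_ij^t * s_ij^{t+1} over the neighbors j of node i.

    edge_weights: {neighbor_id: raw edge weight at time t}
    propagation:  {neighbor_id: propagation signal at time t+1}; missing means 0
    Assumption: a_ij^t is the raw weight divided by the sum over all neighbors.
    """
    total = sum(edge_weights.values())
    if total == 0:
        return 0.0
    return sum((w / total) * propagation.get(j, 0.0) for j, w in edge_weights.items())


# Hypothetical node with three neighbors; only two show later reuse.
weights_t = {"c1": 2.0, "c2": 1.0, "h1": 1.0}
signal_t1 = {"c1": 1.0, "h1": 0.5}
dc = diffusion_contribution(weights_t, signal_t1)
print(dc)
```

With these values the normalized weights are 0.5, 0.25, and 0.25, so the strongly connected neighbor that reuses the content dominates the score.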

Human-rated creativity is used as an auxiliary validation and optional target component. Human evaluators assess a subset of generated or co-created artifacts according to originality, usefulness, and coherence. The aggregated score is defined as

$$\mathrm{HRC}_{i}^{t} = \omega_{1} O_{i}^{t} + \omega_{2} U_{i}^{t} + \omega_{3} C_{i}^{t} \tag{27}$$

where $O_{i}^{t}$, $U_{i}^{t}$, and $C_{i}^{t}$ denote originality, usefulness, and coherence ratings, respectively. The weights satisfy

$$\omega_{1} + \omega_{2} + \omega_{3} = 1 \tag{28}$$

In this study, equal weighting is used unless otherwise specified. Human-rated creativity is not treated as a purely objective definition of innovation. Instead, it provides an external evaluative signal against which the information-theoretic and topological measures can be compared.
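As a brief illustration of Equations (27) and (28), the sketch below aggregates three ratings with equal weights. It assumes that ratings have already been averaged across evaluators and that a 1 to 5 scale is used; neither detail is stated in the text.

```python
def human_rated_creativity(originality, usefulness, coherence,
                           weights=(1 / 3, 1 / 3, 1 / 3)):
    """HRC = w1*O + w2*U + w3*C, with weights that must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w1, w2, w3 = weights
    return w1 * originality + w2 * usefulness + w3 * coherence


# Hypothetical mean ratings for one artifact on an assumed 1-5 scale.
hrc = human_rated_creativity(4.5, 4.5, 3.5)
print(hrc)
```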

Because innovation is a latent construct that cannot be directly observed, this study defines innovation potential as an operational target constructed from measurable indicators of novelty, informational diversity, diffusion, and human evaluation. For each node $i$ at time $t$, the continuous innovation potential score is defined as

$$y_{i}^{t} = \lambda_{1} \widetilde{\mathrm{NI}}_{i}^{t} + \lambda_{2} \widetilde{\Delta H}_{i}^{t} + \lambda_{3} \widetilde{\mathrm{DC}}_{i}^{t} + \lambda_{4} \widetilde{\mathrm{HRC}}_{i}^{t} \tag{29}$$

where $\widetilde{\mathrm{NI}}_{i}^{t}$ is the normalized novelty index, $\widetilde{\Delta H}_{i}^{t}$ is the normalized entropy-change component, $\widetilde{\mathrm{DC}}_{i}^{t}$ is the normalized diffusion contribution, and $\widetilde{\mathrm{HRC}}_{i}^{t}$ is the normalized human-rated creativity score when available. The tilde symbol indicates that each component is scaled to the interval $[0,1]$ using min–max normalization within each dataset.

The novelty index represents the dissimilarity between the current representation of a node and the node’s historical representation. The entropy-change component reflects the temporal trend of the diversity of the local neighborhood of a node. The diffusion contribution value indicates the contribution degree of a node to the subsequent activation or reuse of nearby nodes. Human-rated creativity is only used when available and gives an external evaluative signal based on originality, usefulness, and coherence.

The weights $\lambda_{1}, \lambda_{2}, \lambda_{3}, \lambda_{4}$ satisfy

$$\lambda_{k} \geq 0, \quad \sum_{k=1}^{4} \lambda_{k} = 1 \tag{30}$$

In the reported experiments, equal weighting is used for datasets where all four components are available, that is, $\lambda_{1} = \lambda_{2} = \lambda_{3} = \lambda_{4} = 0.25$. For datasets without human ratings, the HRC component is omitted and the remaining three weights are renormalized as $\lambda_{1} = \lambda_{2} = \lambda_{3} = 1/3$. This formulation avoids treating human judgment as the sole source of innovation measurement while still allowing human evaluation to serve as an external validation signal.
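The construction of the composite target, including the renormalization used when human ratings are missing, can be sketched as follows. The function names and component values are hypothetical, and normalization is applied over the nodes passed in, which stands in for the per-dataset normalization described above.

```python
def min_max(values):
    """Scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def innovation_potential(ni, dh, dc, hrc=None):
    """Equal-weight composite score y for every node.

    ni, dh, dc, hrc: per-node lists of raw component values.
    With hrc given, each weight is 0.25; without it, each weight is 1/3.
    """
    components = [ni, dh, dc] + ([hrc] if hrc is not None else [])
    normalized = [min_max(c) for c in components]
    lam = 1.0 / len(normalized)
    return [lam * sum(vals) for vals in zip(*normalized)]


# Hypothetical raw components for three nodes.
ni = [0.2, 0.8, 0.5]
dh = [-0.1, 0.3, 0.1]
dc = [0.0, 0.6, 0.3]
y_without_hrc = innovation_potential(ni, dh, dc)
y_with_hrc = innovation_potential(ni, dh, dc, hrc=[3.0, 5.0, 4.0])
print(y_without_hrc)
print(y_with_hrc)
```

Because every normalized component lies in $[0,1]$ and the weights sum to 1, the resulting score is also bounded to $[0,1]$.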

The three innovation levels for classification are derived from the continuous score $y_{i}^{t}$. Nodes in the lowest tertile of $y_{i}^{t}$ (approximately 33%) are labeled as low innovation, those in the middle tertile as medium innovation, and those in the highest tertile as high innovation. This discretization yields balanced classes and avoids an arbitrary choice of thresholds. The continuous score is used for regression, and the tertile-based label is used for classification.

The novelty index measures how far the current representation of a node deviates from its historical state. For node $i$, it is defined as

$$\mathrm{NI}_{i}^{t} = D_{\mathrm{KL}}(P_{i}^{t} \,\|\, \bar{P}_{i}^{t-1}) \tag{31}$$

where $P_{i}^{t}$ is the probability distribution obtained from the current embedding of node $i$, $\bar{P}_{i}^{t-1}$ is the historical reference distribution computed from previous time steps, and $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ denotes Kullback–Leibler divergence. A higher value indicates that the node’s current representation differs more strongly from its historical state.
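Equation (31) can be illustrated with a small sketch. The text does not state how an embedding is mapped to a probability distribution or how the historical reference is formed, so the example assumes a softmax over embedding coordinates and an element-wise mean of past distributions; both choices, like the function names, are assumptions.

```python
import math


def softmax(vec):
    """Map an embedding to a probability distribution (assumed mapping)."""
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) with a small epsilon to guard against division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def novelty_index(current_embedding, past_embeddings):
    """NI = D_KL(P^t || mean of past distributions) for one node."""
    p_now = softmax(current_embedding)
    past = [softmax(e) for e in past_embeddings]
    p_ref = [sum(col) / len(past) for col in zip(*past)]  # assumed reference
    return kl_divergence(p_now, p_ref)


# Hypothetical 3-dimensional embeddings for one node.
ni_value = novelty_index([2.0, 0.0, 0.0], [[0.0, 0.0, 2.0], [0.0, 0.5, 2.0]])
ni_same = novelty_index([1.0, 2.0, 3.0], [[1.0, 2.0, 3.0]])
print(ni_value, ni_same)
```

A node whose embedding matches its history scores zero, while a node that shifts its mass to a different coordinate scores well above zero.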

The definition and role of evaluation metrics are summarized in Table 6.

5. Results

We compared the per-node innovation potential $I_{t}(v)$ of human (H), AI (A), and content (C) nodes. The evaluation measures are classification accuracy and F1-score, together with MAE and RMSE for the regression of continuous innovation potential. Table 7 summarizes the quantitative results. The proposed MM-GNN achieves the highest predictive accuracy and the lowest error across all node categories. To check whether the differences between models were statistically significant, we repeated the experiments five times with different random seeds and report mean performance over the runs. For classification metrics, paired comparisons were made across runs. For regression metrics, the absolute prediction errors were compared on the same test-set nodes. Because normality could not be assumed, the non-parametric Wilcoxon signed-rank test was used, with a significance level of 0.05.

The visual comparisons are shown in Figure 4a–d. Figure 4a shows that predicted and actual innovation potentials lie close to the diagonal, which indicates good calibration. Figure 4b shows confusion matrices with balanced performance across human, AI, and content nodes. Figure 4c compares ROC-AUC values against the baselines, with MM-GNN yielding the largest area under the curve. Figure 4d shows the error distribution histograms per modality, in which errors are tightly concentrated under multimodal fusion.

The temporal dynamics of the network innovation index $\Phi_{t}$ were used to study the diffusion pattern and system stability in Human–AI–Content (HAC) interactions. Table 8 summarizes the results. The proposed MM-GNN has the highest temporal coherence ($\rho = 0.89$) and the lowest stability variance ($\sigma_{s} = 0.058$), which indicates a stable flow of innovation over time.

The network innovation dynamics are visualized in Figure 5a–d. Figure 5a depicts the trend of $\Phi_{t}$, which increases gradually and then levels off as the system approaches equilibrium. Figure 5b shows a spatial diffusion map of the progressive diffusion from AI to human–content subnetworks. In this map, node color represents normalized innovation potential and edge thickness represents normalized interaction strength; the darker the color, the more innovative the node. The visualization indicates that innovativeness initially concentrates in the highly connected AI–content interaction subnetwork and then spreads to the human–content subnetwork. Figure 5c shows time-lag correlations, with synchronized innovation peaks at an average lag of ~3 time steps. Figure 5d is a heatmap of edge activations over temporal windows: rows are edge types, columns are time windows, and color intensity is the normalized activation strength. The observed bursts indicate that innovation does not diffuse smoothly over time.
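The text does not specify how the time-lag correlations in Figure 5c were computed. One common approach, shown here purely as an assumed illustration, is to shift one activity series against another and pick the lag with the highest Pearson correlation; the two series below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(leader, follower, max_lag):
    """Lag at which follower[t + lag] correlates most strongly with leader[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = leader[:len(leader) - lag]
        y = follower[lag:]
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores


# Hypothetical innovation activity: the follower repeats the leader 3 steps later.
ai_content = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
human_content = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
lag, lag_scores = best_lag(ai_content, human_content, max_lag=5)
print(lag)
```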

To evaluate the contribution of each modality (text, image, and code), we conducted an ablation study that removes each modality in turn from the multimodal fusion of MM-GNN. The resulting drop in performance measures how strongly innovation prediction relies on each source of information. Table 9 summarizes the quantitative impacts. The findings indicate that omitting any modality decreases predictive accuracy and innovation stability, which supports the complementary nature of multimodal fusion.

Figure 6 depicts the dynamic behaviors and interpretability analyses. As illustrated in Figure 6a, the attention weights converge toward a balanced use of modalities, with text contributing the most in all epochs. Figure 6b shows innovation variance by modality; text-based nodes have the largest variance in the early stages of learning. Figure 6c shows SHAP feature-importance rankings, in which textual and cross-modal embeddings are the most important. Lastly, Figure 6d plots ablation accuracy against epochs and shows performance recovery after multimodal reintegration.

To assess the value of temporal modeling, baseline GNNs without temporal features were compared with MM-GNN, which uses GRU-based temporal propagation. The recurrent component lets the model retain context from the previous innovation states $\Phi_{t-1}$, which improves dynamic prediction and gradient stability. The GRU hidden-state dynamics in Figure 7a are smooth overall and show a periodic activation pattern that coincides with innovation peaks. Figure 7b shows no vanishing or exploding gradients across time steps, in contrast to the static GNNs. Figure 7c shows the derivative of the innovation index, $d\Phi_{t}/dt$, which indicates faster innovation growth with recurrent integration. Lastly, Figure 7d plots temporal attention consistency, showing coordinated weighting of sequential snapshots that improves co-creation continuity and temporal consistency.

To evaluate computational scalability, we profiled MM-GNN runtime, memory usage, and training throughput across different graph sizes, embedding dimensions, and modality configurations. Figure 8 presents the results. Figure 8a shows a linear increase in runtime with the number of edges $|E|$, consistent with the $O(|E| d M)$ complexity of the algorithm in Section 4.3. Figure 8b plots GPU memory as a function of the embedding dimension $d$; memory grows roughly quadratically ($O(d^{2})$) up to $d = 512$. Figure 8c shows that throughput (samples per second) decreases as the number of modalities $M$ increases, owing to the additional attention-fusion overhead, although efficiency remains nearly linear between two and three modalities. Lastly, Figure 8d compares runtime scaling across multiple GPUs and shows near-ideal speed-up when mini-batches are distributed synchronously.

Distributed training performance was evaluated in terms of GPU utilization, synchronization latency, and inter-machine communication efficiency. The comparative statistics are shown in Figure 9. Figure 9a shows balanced GPU utilization (approximately 92%) across up to four A100 devices, which indicates an even workload distribution. Figure 9b reports the synchronization latency per training step, with a gradient-exchange overhead of less than 5 ms using the optimized NCCL back end. Training speed grows almost linearly with the number of GPUs up to four (Figure 9c), with a speed-up of 3.7× over single-device training. Figure 9d summarizes the communication overhead per epoch, which remains small (<6%) when adaptive gradient aggregation and sparse tensor compression are used.
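The reported speed-up can be restated as parallel efficiency, a standard derived measure that the text does not report explicitly. Assuming the 3.7× figure refers to $p = 4$ GPUs, as stated above:

```latex
E_{p} = \frac{S_{p}}{p} = \frac{3.7}{4} = 0.925
```

That is, training on four devices reaches about 92.5% of ideal linear scaling.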

Overall, the proposed MM-GNN framework scales well, achieves high GPU utilization, and incurs low synchronization overhead, which makes it suitable for large-scale multimodal HAC networks.

For a qualitative evaluation of the representational structure of the learned multimodal embeddings, we visualized the HAC node space in 2D projections using t-SNE dimensionality reduction. The resulting plots, shown in Figure 10, illustrate the innovation clustering behavior and temporal dynamics of human, AI, and content nodes. The t-SNE visualization of the tripartite HAC graph in Figure 10a reveals distinct but slightly overlapping clusters of human, AI, and content nodes, which indicates that cross-type embedding is effective. Figure 10b is colored by the innovation potential $I_{t}(v)$ of each node and shows that high-innovation regions form dense, mixed-modality clusters around the human–AI interface. Figure 10c overlays node degree and modality mixture, which indicates that nodes with high cross-modal connectivity are more central and have a greater influence on innovation. Lastly, Figure 10d contains a temporal embedding drift map, in which the drift vectors become smoother over time windows and move toward high-innovation attractor regions.

These qualitative visualizations support the view that the MM-GNN framework captures emergent innovation manifolds, multimodal fusion coherence, and temporal changes in the HAC embedding space.

As shown in Table 10, MM-GNN outperforms all selected state-of-the-art methods across accuracy, F1-score, error metrics, and information-theoretic gains, demonstrating superior predictive and temporal stability in HAC innovation modeling.

6. Discussion

Both the temporal and structural analyses indicate that innovation in Human–AI–Content (HAC) systems is driven by repeated cross-modal interactions that enhance the spread of ideas among heterogeneous nodes. The localized bursts and global diffusion observed in the innovation index $\Phi_{t}$ indicate that creativity in co-creation networks is neither random nor centrally controlled. Instead, it is self-organizing: the synergistic feedback among human thinkers, algorithmic dynamism, and multimodal representation supports distributed innovation within the network. The empirical results should be read in light of the joint prediction task defined above. The classification results indicate that MM-GNN is better at identifying low-, medium-, and high-innovation nodes, while the regression results indicate that MM-GNN has a lower error in estimating the continuous innovation potential. The improvement is thus not only in category prediction but also in the calibration of the estimated innovation score. This matters because innovation in HAC networks is not only a class attribute but a continuous quantity that changes across nodes and interactions.

AI nodes act as catalytic intermediaries that hasten the appearance of new conceptual combinations and connect otherwise remote semantic domains. Rather than substituting for human ideation, AI assists cognitive growth, increasing the rate and variety of idea recombination. The rate of change of innovation trajectories ($d\Phi_{t}/dt$) indicates that AI-based attention systems can sustain innovativeness and network-wide knowledge development. This positions AI agents not as creative agents in their own right but as co-evolutionary agents that enhance the collective intelligence of hybrid creative ecosystems.

Several limitations bound this framework. The innovation measurement still relies in part on proxies based on human-rated innovation, which introduces subjective bias. The HAC graph is constructed under the assumption of consistent multimodal alignment, whereas real-world co-creation is usually asynchronous and nonlinear. Moreover, the model captures only short- and mid-term dependencies through recurrent integration and does not consider long-term conceptual transformation or changes in cognitive context. Theoretically, the system is limited by the informational capacity and representational diversity of the underlying modalities. Three further limitations concern the evaluation. First, the innovation score is an operational proxy rather than a direct measure of innovation. Although it includes novelty, entropy change, diffusion contribution, and optional human evaluation, none of these mathematical indices fully reflects innovation as a social, cognitive, and technical phenomenon. Second, analytical index-based baselines and neural predictive baselines differ in model capacity, so the comparison should be viewed as a functional benchmark on a common target definition, not as an architectural comparison. Third, the proposed framework is tested largely on synthetic HAC data and a few selected AIGC corpora, and needs to be examined further in other socio-technical domains.

The subject of hybrid innovation presents serious ethical and cognitive issues. With the consistent involvement of AI agents in creative work, the problems of authorship, accountability, and epistemic transparency become highly pressing. Too much automation may lead to the dehumanization of the thought processes, and creative processes may become more of a curatorial process than a generative process. To address this, responsible hybrid systems must embed transparency, explainability, and ethical governance in their co-creation processes. Sustainable innovation should not harm human interpretive agency but harness the potential of AI to expand associatively, such that creativity will be a collaborative and ethically sound activity.

7. Conclusions and Future Directions

7.1. Conclusions on Network Modeling and Data Analysis

This paper presented a mathematical and computational framework based on a Multimodal Graph Neural Network (MM-GNN) to model and analyze innovation dynamics within complex HAC networks. By integrating heterogeneous data modalities and temporal interactions, the proposed model successfully simulates the evolution of innovation potential at both the local node and macroscopic network levels. Validated through rigorous quantitative data analysis on both synthetic complex network datasets and real-world AIGC corpora, the model demonstrated high accuracy, statistical stability, and structural interpretability. The results confirm that information diffusion and knowledge generation in HAC systems can be effectively modeled as organized topological dynamics, providing a robust, data-driven mathematical perspective on hybrid co-creation in networked environments.

7.2. Theoretical and Practical Implications

The findings offer substantial theoretical and practical contributions to the fields of network science and complex data analysis. Theoretically, this study establishes a quantitative foundation for analyzing socio-technical interactions as mathematically defined dynamic complex systems, where innovation is encoded within the evolution of multimodal topological relationships. Practically, the MM-GNN framework provides an advanced analytical tool for optimizing network structures in collaborative platforms, generative pipelines, and AI-assisted research ecosystems. By predicting high-innovation subgraphs and temporal diffusion patterns, the mathematical model facilitates targeted interventions in network-based interactions, bridging complex network data analysis with cognitive and social theories of knowledge evolution.

7.3. Future Directions in Complex Network Research

Future research will focus on causal mathematical modeling and adaptive network learning to enhance the interpretability and generalizability of the proposed framework.

Causal Network Dynamics: By extending the current model to incorporate causal graph neural networks, it becomes possible to identify directional, causal dependencies that drive network evolution, moving beyond simple statistical correlations.

Adaptive Topologies: Integrating context-dependent feedback loops and self-adaptive attention mechanisms will allow the model to respond in real-time to the non-stationary dynamics of complex networks.

Longitudinal Evolution: Analyzing long-term temporal network data will also facilitate the study of concept drift, the emergence of hierarchical topologies, and socio-semantic adaptation in dynamically evolving HAC systems.

7.4. Broader Impacts

For each dataset, the training, validation, and test sets were determined chronologically in proportions of 70%, 15%, and 15%, respectively. Because HAC interactions carry temporal information, chronological splitting was adopted to avoid the leakage of information from future interactions into past predictions that random splitting would cause. Hyperparameters were selected on the validation set with early stopping, and the test set was kept for the final evaluation.
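A chronological 70/15/15 split of this kind can be sketched as follows. The record layout (a dictionary with a `timestamp` field) and the function name are hypothetical, and integer arithmetic is used for the cut points to avoid floating-point rounding.

```python
def chronological_split(events, train_pct=70, val_pct=15):
    """Sort interaction records by time and cut them into train/val/test in order."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    n = len(ordered)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]  # the remainder, about 15%
    return train, val, test


# Hypothetical interaction log with 20 out-of-order timestamps.
events = [{"timestamp": (7 * i) % 20, "edge": ("h1", "a1")} for i in range(20)]
train, val, test = chronological_split(events)
print(len(train), len(val), len(test))
```

Every training record precedes every validation record, and every validation record precedes every test record, so no future interaction can inform a prediction about the past.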

Author Contributions

Conceptualization, F.Z.; Methodology, F.Z.; Software, F.Z.; Validation, L.F.; Formal Analysis, L.F.; Investigation, L.F.; Writing—Original Draft, H.O.Z.; Writing—Review and Editing, H.O.Z.; Visualization, H.O.Z.; Supervision, H.O.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China MOE Humanities and Social Sciences Research Youth Fund–Western and Border Regions Project, grant number 22XJC790002; the Scientific Research Plan Projects of Shaanxi Education Department, grant number 25JT007; and the Shaanxi Province 14th Five-Year Plan Educational Science Planning Project, grant number SGH25Q520. The APC was funded by the Youth Innovation Team of Shaanxi Universities, China.

Data Availability Statement

The original data presented in the study are openly available in AIGC resources at https://github.com/mengsiwei/Awesome-Physical-AIGC-lists (accessed on 3 February 2026).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hong, H.-Y.; Chen, M.-J.; Chang, C.-H.; Tseng, L.-T.; Chai, C.S. AI-supported idea-developing discourse to foster professional agency within teacher communities for STEAM lesson design in a knowledge-building environment. Comput. Educ. 2025, 229, 105241.
2. Feuerriegel, S.; Hartmann, J.; Janiesch, C.; Zschech, P. Generative AI. Bus. Inf. Syst. Eng. 2024, 66, 111–126.
3. Adefarati, T.; Sharma, G.; Bokoro, P.N.; Kumar, R. Advancing renewable-dominant power systems through internet of things and artificial intelligence: A comprehensive review. Energies 2025, 18, 5243.
4. Bartelheimer, C.; Heinz, D.; Hönigsberg, S.; Siemon, D.; Li, M.M.; Strohmann, T.; Poeppelbuss, J.; Peters, C. Conceptualizing hybrid intelligent service ecosystems. Electron. Mark. 2025, 35, 63.
5. Stanikzai, M.E.; Mittal, E. Leveraging AI-generated and human-generated content for maximized user engagement in contentpreneurs’ innovation and creativity. J. Innov. Entrep. 2025, 14, 91.
6. Mariani, M.; Dwivedi, Y.K. Generative artificial intelligence in innovation management: A preview of future research developments. J. Bus. Res. 2024, 175, 114542.
7. Novillo-Villegas, S.; Tulcanaza-Prieto, A.B.; Chantera, A.X.; Chimbo, C. Exploring a Sustainable Pathway Towards Enhancing National Innovation Capacity from an Empirical Analysis. Sustainability 2025, 17, 6922.
8. Kong, X.; Fang, H.; Chen, W.; Xiao, J.; Zhang, M. Examining human–AI collaboration in hybrid intelligence learning environments: Insight from the Synergy Degree Model. Humanit. Soc. Sci. Commun. 2025, 12, 821.
9. Wu, Y.; Ding, L.; Li, N.; Yu, X. Unveiling the influence of global innovation networks on corporate innovation: Evidence from the international semiconductor industry. Sci. Rep. 2024, 14, 11007.
10. Appleyard, M.M. How does knowledge flow? Interfirm patterns in the semiconductor industry. Strateg. Manag. J. 1996, 17, 137–154.
11. Farayola, O.A.; Abdul, A.A.; Irabor, B.O.; Okeleke, E.C. Innovative business models driven by AI technologies: A review. Comput. Sci. IT Res. J. 2023, 4, 85–110.
12. Qu, X.; Eggers, J.P.; Kumar, M.V.S. Unlocking novel knowledge recombinations: The effect of artificial intelligence on inventive activity. Strateg. Manag. J. 2026, 1–30.
13. Cai, H.; Wang, Z.; Wang, W. Spatiotemporal investigation and determinants of interprovincial innovation network from a multilayer network perspective. Technol. Anal. Strateg. Manag. 2024, 36, 2171–2186.
14. Wen, J.; Zhu, X.-R.; Wang, C.-D.; Tian, Z. A framework for personalized recommendation with conditional generative adversarial networks. Knowl. Inf. Syst. 2022, 64, 2637–2660.
15. Fu, J.; Han, H.; Su, X.; Fan, C. Towards human-AI collaborative urban science research enabled by pre-trained large language models. Urban. Inform. 2024, 3, 8.
16. Li, Y.; Gou, X.; Hu, H.; Zhang, H. Exploring the impact of innovation guidance on user participation in online communities: A mixed methods investigation of cognitive and affective perspectives. Front. Psychol. 2022, 13, 1011837.
17. Kamnerddee, C.; Putjorn, P.; Intarasirisawat, J. AI-driven design thinking: A comparative study of human-created and AI-generated UI prototypes for mobile applications. In Proceedings of the 2024 8th International Conference on Information Technology (InCIT), Chonburi, Thailand, 14–15 December 2024; pp. 237–242.
18. Maglio, P.P.; Lim, C. On the impact of autonomous technologies on human-centered service systems. In Handbook of Service Dominant Logic; SAGE: London, UK, 2018.
19. Collins, K.M.; Sucholutsky, I.; Bhatt, U.; Chandra, K.; Wong, L.; Lee, M.; Zhang, C.E.; Zhi-Xuan, T.; Ho, M.; Mansinghka, V. Building machines that learn and think with people. Nat. Hum. Behav. 2024, 8, 1851–1863.
20. Zerilli, J.; Bhatt, U.; Weller, A. How transparency modulates trust in artificial intelligence. Patterns 2022, 3, 100455.
21. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
22. Zhu, G.; Zhu, Z.; Chen, H.; Yuan, C.; Huang, Y. Hagnn: Hybrid aggregation for heterogeneous graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 14536–14550.
23. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
24. He, X.; Wang, X.E. Multimodal graph transformer for multimodal question answering. arXiv 2023, arXiv:2305.00581.
25. Feng, Z.; Zeng, Z.; Guo, C.; Li, Z. Temporal multimodal graph transformer with global-local alignment for video-text retrieval. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1438–1453.
26. Wang, T.; Li, F.; Zhu, L.; Li, J.; Zhang, Z.; Shen, H.T. Cross-modal retrieval: A systematic review of methods and future directions. arXiv 2023, arXiv:2308.14263.
27. Moaniba, I.M.; Su, H.-N.; Lee, P.-C. Knowledge recombination and technological innovation: The important role of cross-disciplinary knowledge. Innovation 2018, 20, 326–352.
28. Liu, C.-F.; Jiang, Q. Evaluation of Regional Innovation Capacity Based on Social Network Analysis and Entropy-Based GC-TOPSIS. Discret. Dyn. Nat. Soc. 2024, 2024, 3149746.
29. Zhuang, C.; Ma, Q. Dual graph convolutional networks for graph-based semi-supervised classification. In Proceedings of the 2018 World Wide Web Conference, Geneva, Switzerland, 10 April 2018; pp. 499–508.
30. Meng, S.; Luo, Y.; Liu, P. Awesome Physical AIGC Lists; GitHub repository; GitHub, Inc.: San Francisco, CA, USA, 2025; Available online: https://github.com/mengsiwei/Awesome-Physical-AIGC-lists (accessed on 3 February 2026).
31. Yoon, M.; Koh, J.Y.; Hooi, B.; Salakhutdinov, R. Multimodal graph learning for generative tasks. arXiv 2023, arXiv:2310.07478.

Figure 1. Tripartite HAC network. Human nodes (H), AI agent nodes (A), and content nodes (C) are connected through dynamic interactions. The colored bars associated with each node represent the relative composition of multimodal features in the fused node representation, where light blue denotes text features, orange denotes image features, red denotes code features, and dark blue denotes metadata or interaction features. The symbol φ denotes the normalized innovation potential score of a node, with larger values indicating higher innovation potential.

Figure 2. Multimodal Graph Neural Network architecture.

Figure 3. Optimization workflow with multimodal attention mechanism.

Figure 4. Node-level innovation prediction. (a) predicted versus observed innovation scores; (b) confusion matrix for innovation classes; (c) ROC-AUC comparison across models; and (d) error distribution across node types.

Figure 5. Network innovation dynamics.

Figure 6. Modality contribution analysis.

Figure 7. Temporal module impact.

Figure 8. Scalability profiling.

Figure 9. Distributed performance analysis.

Figure 10. Learned embedding visualization.

Table 1. Comparative overview of related studies across domains.

Ref.	Domain/Focus	Methodology	Key Contribution	Limitations Relevant to HAC Modeling
[7]	Knowledge diffusion across industries	Entropy-based network analysis	Quantified innovation propagation via entropy metrics	Lacks AI-generated content integration
[10]	Corporate innovation networks	Graph analytics of global R&D	Linked connectivity to innovation performance	Human-only agents; no multimodal content
[2]	Generative AI in business ecosystems	Conceptual analysis	Defined role of AIGC in knowledge creation	Lacks mathematical representation
[3]	AI in knowledge management	Empirical analysis	Demonstrated AI’s role in knowledge reuse	Omits co-creative network modeling
[8]	Human–AI hybrid learning	Mixed-method study	Measured synergy dynamics in hybrid classrooms	Context-specific, not generalizable
[18]	Hybrid service ecosystems	Conceptual model	Defined socio-technical co-creation	No quantitative modeling of innovation
[29]	Graph convolutional networks	Deep learning on graphs	Introduced GCN foundational model	Homogeneous node assumption
[24]	Multimodal graph transformers	Transformer–GNN hybrid	Enabled multimodal fusion across domains	Focused on classification, not innovation
[27]	Interdisciplinary knowledge modeling	Information-theoretic modeling	Quantified recombination via mutual information	No temporal or multimodal components
[28]	Open innovation quantification	Entropy–synergy hybrid index	Combined graph topology and entropy metrics	Applied to human networks only

Table 2. Node and edge semantics in the HAC tripartite network.

Node/Edge Type	Description	Feature Modality	Example Data Source
Human ($v_{H}$)	Individual participant contributing cognitive input	Text, behavioral traces	User prompts, annotations
AI Agent ($v_{A}$)	Generative or analytic subsystem producing artifacts	Model states, latent embeddings	Transformer outputs
Content ($v_{C}$)	Knowledge or media artifact representing generated output	Text, image, code	AIGC repositories
$E_{HA}$	Interaction between human and AI	Prompt–response or query relation	Dialogue exchanges
$E_{HC}$	Link between human and content	Editing, rating, or citation relation	Document revisions
$E_{AC}$	Link between AI and content	Generation or retrieval mapping	Model inference traces

Table 3. Data schema and mapping between interaction logs and graph entities.

Data Source	Field/Attribute	Graph Entity	Description	Transformation/Feature Extraction
User interaction logs	User ID, timestamp, prompt text	Human node ($v_{H}$)	Identifies human participant	Tokenization → semantic embedding
AI response metadata	Model ID, generation log, output vector	AI node ($v_{A}$)	Represents the AI agent state	Latent vector projection → normalization
Content repository	File ID, media type, content hash	Content node ($v_{C}$)	Generated artifact instance	Multimodal encoder → joint feature space
Prompt–response pair	(UserID, ModelID)	Edge $E_{HA}$	Human–AI query relation	Temporal weighting by frequency
Response–artifact link	(ModelID, FileID)	Edge $E_{AC}$	Generation or retrieval action	Similarity score → edge weight
Revision logs	(UserID, FileID)	Edge $E_{HC}$	Edit or reuse operation	Cosine distance → contextual strength
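
The mapping in Table 3 can be illustrated with a minimal sketch that turns three kinds of log records into node sets and weighted edges. The record field names (`user_id`, `model_id`, `file_id`, `similarity`, `cosine_distance`) and the function `build_hac_graph` are hypothetical and are not taken from the article. Two simplifications are assumed: the $E_{HA}$ weight is a plain interaction count rather than a time-decayed frequency, and the $E_{HC}$ strength is taken as one minus the cosine distance.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative assumptions.
prompt_logs = [
    {"user_id": "u1", "model_id": "m1"},
    {"user_id": "u1", "model_id": "m1"},
    {"user_id": "u2", "model_id": "m1"},
]
generation_logs = [
    {"model_id": "m1", "file_id": "f1", "similarity": 0.9},
    {"model_id": "m1", "file_id": "f2", "similarity": 0.6},
]
revision_logs = [
    {"user_id": "u2", "file_id": "f1", "cosine_distance": 0.25},
]


def build_hac_graph(prompt_logs, generation_logs, revision_logs):
    """Return (nodes, edges) for a tripartite human/AI/content graph."""
    nodes = {"H": set(), "A": set(), "C": set()}
    ha_counts = defaultdict(float)
    edges = {"HA": {}, "AC": {}, "HC": {}}

    # E_HA: weight each human-AI pair by how often it occurs (assumed: raw count).
    for rec in prompt_logs:
        nodes["H"].add(rec["user_id"])
        nodes["A"].add(rec["model_id"])
        ha_counts[(rec["user_id"], rec["model_id"])] += 1.0
    edges["HA"] = dict(ha_counts)

    # E_AC: use the logged similarity score directly as the edge weight.
    for rec in generation_logs:
        nodes["A"].add(rec["model_id"])
        nodes["C"].add(rec["file_id"])
        edges["AC"][(rec["model_id"], rec["file_id"])] = rec["similarity"]

    # E_HC: convert cosine distance to a strength (assumed: 1 - distance).
    for rec in revision_logs:
        nodes["H"].add(rec["user_id"])
        nodes["C"].add(rec["file_id"])
        edges["HC"][(rec["user_id"], rec["file_id"])] = 1.0 - rec["cosine_distance"]

    return nodes, edges


nodes, edges = build_hac_graph(prompt_logs, generation_logs, revision_logs)
print(sorted(nodes["H"]), sorted(nodes["A"]), sorted(nodes["C"]))
print(edges["HA"])
```

Keeping one edge dictionary per relation type mirrors the three edge sets of the tripartite definition and makes it straightforward to pass each relation to a separate, relation-aware message-passing step.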

Table 4. Time and space complexity of MM-GNN under varying graph scales.

Graph Scale	Nodes (|V|)	Edges (|E|)	Embedding Dim (d)	Modalities (M)	Avg. Runtime/Epoch (s)	GPU Memory (GB)	Complexity Order
Small	5 × 10³	2 × 10⁴	128	2	11.3	1.8	O(|E| d M)
Medium	5 × 10⁴	2 × 10⁵	256	3	74.6	5.7	O(|E| d M + |V| d²)
Large	2 × 10⁵	1 × 10⁶	512	3	338.5	18.2	O(|E| d M + T d²)

Table 5. Dataset statistics and modality composition.

Graph Scale	Nodes	Edges	Embedding Dim.	Modalities	Runtime/Epoch	GPU Memory
Small	5 × 10³	2 × 10⁴	128	3	13.9 s	2.1 GB
Medium	5 × 10⁴	2 × 10⁵	256	3	74.6 s	5.7 GB
Large	2 × 10⁵	1 × 10⁶	512	3	338.5 s	18.2 GB

Table 6. Definition and role of evaluation metrics.

Metric	Symbol	Mathematical Meaning	Used for
Accuracy	Acc.	Proportion of correctly classified nodes	Innovation-level classification
F1-score	F1	Harmonic mean of precision and recall	Classification performance
ROC-AUC	AUC	Area under the ROC curve	Class-separation ability
Mean Absolute Error	MAE	$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|$	Score regression
Root Mean Squared Error	RMSE	$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}$	Score regression
Coefficient of Determination	R²	$R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}$	Explained variance
Novelty Index	NI	Divergence between current and historical node distributions	Innovation-score component
Entropy Change	ΔH	Change in local neighborhood entropy	Innovation-score component
Diffusion Contribution	DC	Weighted influence on subsequent neighboring activity	Innovation-score component
Human-Rated Creativity	HRC	Aggregated human score for originality, usefulness, and coherence	Auxiliary validation
Temporal Correlation	ρ	Correlation between $\Phi^{t}$ and $\Phi^{t-1}$	Temporal stability
Stability Variance	σ	Variance of $\Phi^{t}$ across time windows	Dynamic stability
Diffusion Entropy	$H_{D}$	Entropy of innovation diffusion distribution over edges or neighborhoods	Network diffusion diversity
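
As a worked illustration of the three regression metrics defined in Table 6, the following sketch computes MAE, RMSE, and R² with the standard library only. The function name `regression_metrics` and the toy score vectors are illustrative assumptions, not values from the article.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for paired true and predicted scores."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)

    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy innovation scores in [0, 1]; every prediction is off by 0.05.
y_true = [0.2, 0.4, 0.6, 0.8]
y_pred = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For this toy input every absolute error is 0.05, so MAE and RMSE coincide at about 0.05, and R² is about 0.95 because the residual sum of squares (0.01) is one twentieth of the total sum of squares (0.2).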

Table 7. Node-level innovation prediction results across models and node categories on the primary pooled HAC benchmark test set.

Model	Accuracy (H)	Accuracy (A)	Accuracy (C)	F1 (Avg.)	95% CI for F1	MAE ↓	RMSE ↓	R² ↑
GCN [30]	0.74	0.69	0.72	0.71	[0.69, 0.73]	0.184	0.247	0.62
HGNN [23]	0.78	0.73	0.75	0.76	[0.74, 0.78]	0.162	0.221	0.68
GAT [24]	0.80	0.76	0.78	0.78	[0.76, 0.80]	0.151	0.213	0.71
MGTN [25]	0.82	0.79	0.81	0.80	[0.78, 0.82]	0.138	0.197	0.75
MGL-Gen [31]	0.83	0.80	0.82	0.81	[0.79, 0.83]	0.131	0.191	0.77
MM-GNN (Proposed)	0.89	0.86	0.88	0.87	[0.85, 0.89]	0.094	0.138	0.86

Note: H, A, and C denote human, AI, and content nodes, respectively. The 95% confidence intervals were estimated using bootstrap resampling of test-set predictions. R² is reported for the continuous innovation-score regression task. ↑ indicates that higher values are better, whereas ↓ indicates that lower values are better.
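
The note to Table 7 states that the confidence intervals come from bootstrap resampling of test-set predictions. A minimal sketch of one common variant, the percentile bootstrap applied to a macro-averaged F1 score, is given here. The article does not specify the number of resamples, the averaging scheme, or the interval type, so `n_boot`, the macro average, the percentile method, and the toy labels are all assumptions.

```python
import random


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels present in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, metric=macro_f1, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric over test-set predictions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample test indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Toy three-class innovation labels (0 = low, 1 = medium, 2 = high).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]
point = macro_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"macro-F1 = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

In this sketch a class that is absent from a resample is simply left out of that resample's macro average. With a realistic test set this rarely matters, but for very small or highly imbalanced sets a stratified resampling scheme may be preferable.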

Table 8. Temporal correlation and stability metrics for network-level innovation.

Model	Mean Φ_t	Temporal Corr. ρ(Φ_t, Φ_{t−1})	Innovation Lag τ (ms)	Stability Index σ_s ↓	Diffusion Entropy H_D ↑
GCN [29]	0.412	0.71	4.8	0.092	1.84
HGNN [22]	0.458	0.75	4.3	0.087	1.96
GAT [23]	0.491	0.78	3.9	0.081	2.04
MGTN [24]	0.523	0.82	3.6	0.074	2.11
MGL-Gen [31]	0.547	0.84	3.3	0.070	2.18
MM-GNN (Proposed)	0.612	0.89	2.7	0.058	2.36

Note: ↑ indicates that higher values are better, whereas ↓ indicates that lower values are better.
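
The network-level quantities in Table 8 can be sketched for a scalar time series of innovation scores. The example below assumes that ρ is the Pearson correlation between the series and its one-step lag, that the stability measure is the population variance of the series, and that diffusion entropy is the Shannon entropy (base 2) of a normalized diffusion distribution. The article may define these over node-level vectors or with a different logarithm base, and the function names and toy inputs are illustrative.

```python
import math


def lag1_correlation(phi):
    """Pearson correlation between phi[t] and phi[t-1]."""
    x, y = phi[1:], phi[:-1]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three time steps")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def population_variance(phi):
    """Variance of the score across time windows."""
    mean = sum(phi) / len(phi)
    return sum((v - mean) ** 2 for v in phi) / len(phi)


def shannon_entropy(weights):
    """Entropy in bits of non-negative weights after normalization."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy network-level innovation scores over five time windows.
phi = [0.50, 0.55, 0.60, 0.65, 0.70]
rho = lag1_correlation(phi)
var = population_variance(phi)

# Toy diffusion mass spread evenly over four neighborhoods.
h_d = shannon_entropy([1.0, 1.0, 1.0, 1.0])
print(rho, var, h_d)
```

Because the toy series grows linearly, its lag-1 correlation is 1 up to floating-point error, and the uniform four-way diffusion distribution gives an entropy of 2 bits, the maximum for four outcomes.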

Table 9. Performance impact of modality exclusion on innovation prediction in the proposed MM-GNN framework.

Configuration	Accuracy (%)	F1-Score	MAE ↓	Δ Innovation Index (%)	Relative Drop vs. Full Model (%)
Full Model (Text + Image + Code)	89.2	0.87	0.094	+0.0	0.0
—Text Only	82.6	0.78	0.141	−7.4	7.4
—Image Only	84.1	0.79	0.132	−5.9	5.9
—Code Only	85.0	0.81	0.120	−4.2	4.2
—No Image	86.5	0.83	0.111	−2.7	2.7
—No Code	87.1	0.84	0.108	−2.1	2.1

Note: ↓ indicates that lower values are better.

Table 10. Quantitative comparison of MM-GNN with selected state-of-the-art methods.

Method/Ref	Accuracy	F1-Score	MAE	RMSE	ρ (Temporal Stability)	Entropy Gain ΔH	MI Gain (bits)
GCN [29]	0.71	0.68	0.142	0.221	0.62	0.013	0.021
MGTN [24]	0.78	0.74	0.118	0.196	0.71	0.024	0.034
Entropy Network Model [7]	0.64	0.60	0.167	0.244	0.55	0.019	0.015
Open Innovation Index [28]	0.69	0.66	0.153	0.232	0.58	0.028	0.019
Hybrid Synergy Model [8]	0.73	0.70	0.131	0.209	0.66	0.022	0.028
Information-Theoretic Model [27]	0.75	0.72	0.126	0.202	0.68	0.031	0.033
Proposed MM-GNN	0.89	0.87	0.081	0.143	0.89	0.048	0.057

Note: Predictive graph-learning baselines and analytical index-based baselines are not architecturally equivalent. They are compared functionally under the same target definition, normalization procedure, and evaluation metrics. Analytical index-based models are converted into node-level scalar predictors before classification and regression evaluation.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, F.; Fang, L.; Zaki, H.O. Modeling and Data Analysis of Innovation Dynamics in Complex Human–AI–Content Networks: A Multimodal Graph Learning Approach. Mathematics 2026, 14, 2051. https://doi.org/10.3390/math14122051

