Article

A Comparative Analysis of Three Data Fusion Methods and Construction of the Fusion Method Selection Paradigm

by Ziqi Liu 1,2, Ziqiao Yin 2,3,4,5,*, Zhilong Mi 2,3, Binghui Guo 2,3,4,* and Zhiming Zheng 2,3,4
1 School of Mathematical Sciences, Beihang University, Beijing 100191, China
2 Key Laboratory of Mathematics, Informatics and Behavioral Semantics and State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
3 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Institute of Artificial Intelligence, Beihang University, Beijing 100191, China
4 Zhongguancun Laboratory, Beijing 100094, China
5 Hangzhou International Innovation Institute of Beihang University, Hangzhou 311115, China
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(8), 1218; https://doi.org/10.3390/math13081218
Submission received: 1 March 2025 / Revised: 28 March 2025 / Accepted: 3 April 2025 / Published: 8 April 2025

Abstract:
Multisource and multimodal data fusion plays a pivotal role in large-scale artificial intelligence applications involving big data. However, the choice of fusion strategy for a given scenario is often based on experimental comparisons, which increases the computational cost of model training and can leave test performance suboptimal. In this paper, we present a theoretical analysis of early fusion, late fusion, and gradual fusion methods. We derive equivalence conditions between early and late fusion within the framework of generalized linear models. Moreover, we analyze the failure conditions of early fusion in the presence of nonlinear feature-label relationships. Furthermore, we propose an approximate equation for evaluating the accuracy of early and late fusion methods as a function of sample size, feature quantity, and modality number, and we identify a critical sample size threshold at which the performance dominance of the early fusion and late fusion models undergoes a reversal. Finally, we introduce a fusion method selection paradigm for choosing the most appropriate fusion method prior to task execution and demonstrate its effectiveness through extensive numerical experiments. Our theoretical framework is expected to reduce the computational and resource costs of model construction, improving the scalability and efficiency of data fusion methods.

1. Introduction

The ongoing advancement of data acquisition technologies has precipitated a proliferation of multimodal data representation paradigms for characterizing complex phenomena. In the pursuit of enhanced representational completeness, modern analytical frameworks increasingly incorporate advanced data fusion methods to synthesize disparate information sources. The data fusion method was first proposed and widely used by the US military, where it was defined as a “multi-level process dealing with the association, correlation, and combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations, threats, and their significance” [1].
Since then, many scholars have classified data fusion methods. Durrant-Whyte classified data fusion methods into three categories according to the relations among data sources: complementary, redundant, and cooperative [2]. In Dasarathy’s classification framework, data fusion techniques were systematically organized into a tripartite structure comprising signal fusion, feature fusion, and decision fusion, with this categorization predicated on the nature of input–output data modalities and their corresponding feature attributes [3]. At present, the mainstream classification comprises early fusion (EF), intermediate fusion (IF), and late fusion (LF) [4,5]. Early fusion, also known as data-level fusion, refers to the simple concatenation of original features as the input of a predictive classifier. Intermediate fusion, also known as feature-level fusion, refers to learning edge features of the original data and fusing those features for predictive classification. Late fusion, also known as decision-level fusion, refers to training unimodal models on different data and fusing the model decisions [6,7,8]. Figure 1 shows the frameworks of these three types of data fusion methods. Comparative analysis of various data fusion methods has become a prominent focus in contemporary data fusion research.
At present, most studies on the comparative analysis of data fusion methods rely on experimental results from specific data sets, such as comparisons of early fusion and late fusion [9,10,11,12], early fusion and intermediate fusion [13,14,15], and comprehensive comparisons among the three fusion methods [16,17,18,19,20,21,22]. Few studies have compared the effects of different fusion methods based on the mechanisms of the fusion methods themselves. Pereira et al. [23], assuming a perfect model and a sufficiently large sample size, mathematically derived that early fusion outperforms late fusion in binary classification problems. Moreover, an aspect worth noting is the lack of a mechanism for selecting the fusion method before performing prediction and classification tasks.
In this paper, we address the binary classification problem within the framework of data fusion. Specifically, we provide mathematical formulations for both early fusion and late fusion methodologies. Furthermore, we derive a mathematical definition of gradual fusion (GF) as a novel intermediate fusion approach, which processes data in a hierarchical, stepwise manner for prediction tasks [4,5]. In order to construct a fusion method selection paradigm, we first compare the three fusion methods from the perspective of the fusion mechanism and, through mathematical derivation, obtain the equivalence conditions of early fusion and late fusion under the generalized linear model, as well as the failure conditions of early fusion when there is a gradual association between features and labels. Then, we propose an equation that evaluates the accuracy of early fusion and late fusion through sample size, feature number, and mode number, and we evaluate the critical sample size threshold at which the performance dominance of the early fusion and late fusion models undergoes a reversal. Finally, based on the proposed equations, we give a discriminant paradigm that provides a theoretical basis for selecting fusion models before performing a specific task. The process of constructing the fusion method selection paradigm is shown in Figure 2.
The paper is structured as follows. In Section 2, the research questions and concepts related to data fusion are described. In Section 3, the three fusion methods are compared and analyzed, and the fusion method selection paradigm is proposed. Then, the main conclusions are illustrated in Section 4 with examples. Finally, we discuss the work of this paper in detail and outline feasible directions for future research.

2. Description of the Problem and Related Concepts

Since we are concerned with the fusion of multimodal data, tasks related to data preprocessing are outside the scope of this article. We assume that the data have been filtered by variable selection, so every feature has a nonzero prediction weight with respect to the label. In this paper, the mode features are denoted as $\{X_i\}_{i=1}^{K}$, $X_i = (x_1^i, x_2^i, \ldots, x_{m_i}^i)$, where $K$ represents the number of modes. The complete set of features is denoted as $X = (x_1, \ldots, x_m)$; that is, $\bigcup_{i=1}^{K} X_i = X$. We assume that none of the modes contain cross-terms of the features and that the prediction models are all generalized linear models.
The generalized linear model is an extension of the classical linear regression model, which is capable of handling non-normally distributed response variables and allows the relationship between the linear predictor and the expected value of the response variable to be established through the link function. It is defined as follows [24].
Definition 1.
Suppose that the dependent variable $Y = (Y_1, Y_2, \ldots, Y_n)$ comprises $n$ independent observations and follows an exponential family distribution, i.e., there is a density function:
$f(Y \mid \theta, \phi) = \exp\big((Y\theta - b(\theta))/\phi + c(Y, \phi)\big),$ (1)
where $\theta$, $\phi$ are the parameters, and $b(\cdot)$ and $c(\cdot)$ are known functions. Let $\theta = K(X^T \beta)$, in which $X = (x_1, \ldots, x_m)$ is the observed value of the $m$ independent variables corresponding to $Y$, $\beta$ is the $m \times 1$ coefficient vector, and the function $K(\cdot)$ describes the association between the independent variables $X$ and the parameter $\theta$. If there is a monotone differentiable link function such that
$g(\mu) = \eta = X^T \beta,$ (2)
$E(Y) = \mu,$ (3)
the model is called a generalized linear model. The inverse of the link function, $g^{-1}(\cdot)$, is called the response function. Since the dependent variable $Y$ follows an exponential family distribution, its expectation can be calculated as $E(Y) = b'(\theta) = b'(K(X^T \beta)) = F(X)$. So, the key to a generalized linear model is to find a monotone differentiable link function $g(\cdot)$ such that $g \circ F(X) = X^T \beta$. Since $g(\cdot)$ is monotonic and differentiable and $g \circ F(X) = X^T \beta$ is linear, $F(X)$ is monotonic in every $x_i$ in $X$.
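As a concrete illustration, logistic regression is a generalized linear model with the logit link $g(\mu) = \ln(\mu/(1-\mu))$ and response function $g^{-1}(\eta) = 1/(1+e^{-\eta})$. The following minimal Python sketch (our own illustration; it assumes NumPy and scikit-learn are available, and the variable names are our choices) fits such a model and checks that the fitted linear predictor maps to the predicted mean through the response function:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 3))             # m = 3 features
beta_true = np.array([2.0, -1.0, 3.0])
mu = 1 / (1 + np.exp(-X @ beta_true))             # E(Y) = g^{-1}(X^T beta)
y = rng.binomial(1, mu)                           # Bernoulli (exponential family) labels

model = LogisticRegression(C=1e6).fit(X, y)       # large C: effectively no penalty
eta = X @ model.coef_.ravel() + model.intercept_  # linear predictor eta = X^T beta
prob = 1 / (1 + np.exp(-eta))                     # response function applied to eta
assert np.allclose(prob, model.predict_proba(X)[:, 1])  # matches sklearn's mean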
In early fusion, all features are spliced into a vector that serves as a unimodal input to the subsequent prediction model [25]. Therefore, the definition of early fusion can be given as follows.
Definition 2.
Given the features of $K$ modes, the model that satisfies the following form is called early fusion:
$g_E(\mu) = \eta_E = \sum_{i=1}^{m} w_i x_i,$ (4)
where $g_E(\cdot)$ is the link function of the generalized linear model in the early fusion, $\eta_E$ is the output, $w_i$ is the weight coefficient ($w_i \neq 0$), and the final prediction result is $g_E^{-1}(\eta_E)$.
In the late fusion, a separate model is trained for each mode, and the predictions from the individual models are aggregated to make the final prediction [4]. Therefore, the definition of late fusion can be given as follows.
Definition 3.
Given the features of $K$ modes, the model that satisfies the following form is called late fusion:
$g_{L_k}(\mu) = \eta_{L_k} = \sum_{j=1}^{m_k} w_j^k x_j^k, \quad k = 1, 2, \ldots, K, \quad x_j^k \in X_k,$ (5)
$\mathrm{output}_L = f\big(g_{L_1}^{-1}(\eta_{L_1}), g_{L_2}^{-1}(\eta_{L_2}), \ldots, g_{L_K}^{-1}(\eta_{L_K})\big),$ (6)
where $g_{L_k}(\cdot)$ $(k = 1, 2, \ldots, K)$ is the sub-model trained on the features of the $k$-th mode, $g_{L_k}^{-1}(\eta_{L_k})$ is its output, $f(\cdot)$ is the decision fusion function, and $\mathrm{output}_L$ is the final decision.
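To make Definitions 2 and 3 concrete, the following Python sketch (our own illustration; the function names and the choice of averaging as the decision fusion $f$ are assumptions, not prescribed by the paper) trains an early fusion model on the concatenated features and a late fusion model as per-mode sub-models whose probabilities are averaged:

import numpy as np
from sklearn.linear_model import LogisticRegression

def early_fusion_fit(modes, y):
    # Definition 2: splice all modal features into a single input vector
    return LogisticRegression(max_iter=100).fit(np.hstack(modes), y)

def late_fusion_fit(modes, y):
    # Definition 3: train one generalized linear sub-model g_{L_k} per mode
    return [LogisticRegression(max_iter=100).fit(Xk, y) for Xk in modes]

def late_fusion_predict(submodels, modes):
    # decision fusion f: here a simple (linear) average of sub-model outputs
    probs = [m.predict_proba(Xk)[:, 1] for m, Xk in zip(submodels, modes)]
    return (np.mean(probs, axis=0) > 0.5).astype(int)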
Gradual fusion differs from early fusion and late fusion: the features of different modes are fused step by step according to the correlation between modes. Modes with high correlation are fused first, while modes with low correlation are fused later [4,5]. Gradual fusion can be represented by a network diagram composed of modal nodes and fusion function nodes.
Definition 4.
Given the features of $K$ modes, the model that satisfies the following form is called gradual fusion:
$g_G(\mu) = \eta_G = G(\bar{X}, F),$ (7)
where $\bar{X}$ represents the set of all modal features, $F$ represents the set of fusion prediction functions (each prediction function includes the input relations between modes and functions), and $G$ therefore represents the gradual fusion model graph composed of $\bar{X}$ and $F$ as a whole.
Because the fusion functions are generalized linear models, the elements in $F$, as the outputs of the fusion prediction functions, represent the inverse functions of the link functions in the generalized linear models. The starting point of a directed edge in the network is an input of its end point, and there is exactly one fusion function node with only in-degree and no out-degree, namely the final fusion prediction function, denoted as $f$. From (7), it is easy to see that $f = g_G^{-1}(\cdot)$. In addition, every modal node in the network must have a path to the final fusion prediction function node. See Appendix A for the detailed framework of the gradual fusion model construction.
Example 1.
Consider a case with 5 modal features $\{X_i\}_{i=1}^{5}$ and 4 fusion prediction functions $\{f_1, f_2, f_3, f\}$. In the first fusion step, $X_1$ and $X_2$ are fused by $f_1$, while $X_3$ and $X_4$ are fused by $f_2$. In the second fusion step, the results of $f_1$ and $f_2$ are fused by $f_3$. Finally, $X_5$ and the result of $f_3$ are fused by $f$. The model can be represented as (8):
$g_G(\mu) = \eta_G = f\big(X_5, f_3(f_1(X_1, X_2), f_2(X_3, X_4))\big).$ (8)
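The fusion graph of Example 1 can be written down directly; in the Python sketch below, the fusion prediction functions are placeholders of our own choosing (any monotone response functions applied to linear combinations would do):

import numpy as np

sigmoid = lambda s: 1 / (1 + np.exp(-s))

# placeholder fusion prediction functions: linear combination + response function
f1 = lambda X1, X2: sigmoid(X1.sum(axis=1) + X2.sum(axis=1))   # first step
f2 = lambda X3, X4: sigmoid(X3.sum(axis=1) + X4.sum(axis=1))   # first step
f3 = lambda a, b: sigmoid(a + b)                               # second step
f = lambda X5, c: sigmoid(X5.sum(axis=1) + c)                  # final node f

X1, X2, X3, X4, X5 = (np.random.rand(10, 3) for _ in range(5))
output = f(X5, f3(f1(X1, X2), f2(X3, X4)))                     # Equation (8)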

3. Comparative Analysis and Construction of the Fusion Method Selection Paradigm

3.1. Comparative Analysis of Early Fusion and Late Fusion

Since the features have already been filtered by variable selection, we assume that the linear predictors include a nonzero term for each $x_i$, i.e., $w_j^i \neq 0$. To start with the simple case, we consider the case where $f(\cdot)$ in the late fusion is a linear function. According to the definitions of early fusion and late fusion, the two methods may be equivalent in form, so let
$\sum_{k=1}^{K} \alpha_k \eta_{L_k} = \eta_E.$ (9)
When we bring (4), (5), and (6) into Formula (9), we obtain
$\sum_{k=1}^{K} \alpha_k \sum_{j=1}^{m_k} w_j^k x_j^k = \sum_{i=1}^{m} w_i x_i.$ (10)
When Formula (10) is grouped by $x_i$, we obtain
$\sum_{i=1}^{m} x_i \sum_{k=1}^{K} \bar{w}_k^i \alpha_k = \sum_{i=1}^{m} w_i x_i,$ (11)
where $\bar{w}_k^i$ is the coefficient of $x_i$ in the $k$-th sub-model of Formula (5). A system of linear equations in $\alpha_k$ is obtained by matching the coefficients of $x_i$:
$\sum_{k=1}^{K} \bar{w}_k^1 \alpha_k = w_1, \quad \ldots, \quad \sum_{k=1}^{K} \bar{w}_k^m \alpha_k = w_m.$ (12)
Lemma 1.
(Consistency Theorem for Nonhomogeneous Linear Systems [26]). Consider the system
$a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n = b_1, \quad a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n = b_2, \quad \ldots, \quad a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n = b_m.$ (13)
The necessary and sufficient condition for the existence of solutions of the above system is $r(A) = r(\tilde{A})$, where $A$ is the coefficient matrix of (13), $\tilde{A}$ is the augmented matrix of (13), and $r(\cdot)$ represents the rank of a matrix.
Proof. 
See Appendix B for proof. □
According to Lemma 1, Equation (12) has a solution when the rank of the coefficient matrix $A$ is equal to that of the augmented matrix $\tilde{A}$. Assume that one of the solutions is $(\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_K)$; then $\sum_{k=1}^{K} \bar{\alpha}_k \eta_{L_k} = \eta_E$. We only need to specify $f$ in Formula (6) to make the early fusion and the late fusion produce the same output.
Theorem 1.
Under the condition of the generalized linear model, a sufficient condition for the equivalence of early fusion and late fusion is that the rank of the coefficient matrix $A = (\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_K)$ is equal to that of $\tilde{A} = (\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_K, w)$, where $\bar{w}_i = (\bar{w}_i^1, \ldots, \bar{w}_i^m)^T$ $(i = 1, 2, \ldots, K)$ collects the coefficients of the $i$-th sub-model of the late fusion, and $w = (w_1, \ldots, w_m)^T$ collects the coefficients of $x_i$ in the early fusion.
Proof. 
Because $r(A) = r(\tilde{A})$, according to Lemma 1 we can obtain $(\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_K)$ such that $\sum_{k=1}^{K} \bar{\alpha}_k \eta_{L_k} = \eta_E$. In the late fusion model, the outputs of the sub-models are $g_{L_k}^{-1}(\eta_{L_k})$, which can be converted back to $\eta_{L_k}$ by $g_{L_k}(\cdot)$. So, we can set
$f\big(g_{L_1}^{-1}(\eta_{L_1}), g_{L_2}^{-1}(\eta_{L_2}), \ldots, g_{L_K}^{-1}(\eta_{L_K})\big) = g_E^{-1}\Big(\sum_{k=1}^{K} \bar{\alpha}_k\, g_{L_k}\big(g_{L_k}^{-1}(\eta_{L_k})\big)\Big).$
Then, $\mathrm{output}_{EF} = \mathrm{output}_{LF}$. The early fusion model and the late fusion model map the same input to the same output, which establishes the equivalence of the two models. □
Furthermore, we note that the number of equations in (12) equals the number of features, while $K$ equals the number of sub-models in the late fusion. When the number of linearly independent equations in the system is greater than the number of unknowns, the system has no solution, so we can derive the following inference.
Inference 1.
When the number of linearly independent sub-models in the late fusion is less than the total number of features, the early fusion and the late fusion are not equivalent.
According to the conclusion of Theorem 1, we can also see that the early fusion performs a parameter search over the full feature space, while the late fusion performs parameter searches over several feature subspaces after incorporating prior knowledge of feature grouping. In general, this kind of feature grouping is a lossy compression: only when the feature grouping is accurate can the late fusion effectively reduce the parameter search space and improve the search efficiency. This can be explained in detail by information entropy theory; see Appendix A for details.
This provides a revelation: when the sample size is large enough, the model accuracy of the early fusion may be higher than that of the late fusion; however, the cost in sample size can be significant. In general, for a generalized linear model, when the number of linearly independent features is greater than the number of samples, the model will overfit with probability almost 1. In the early fusion, since the input of the model is the union of all modal features, the number of features is very large, so the requirement on the number of samples is very high: the sample size must exceed the total number of linearly independent features across all modes to substantially reduce the risk of overfitting. In the late fusion, because each mode is trained separately, the number of samples only needs to be greater than the largest number of linearly independent features in any single mode, which greatly reduces the demand for the sample size. This conclusion is illustrated by the Vapnik–Chervonenkis (VC) dimension and the Hoeffding inequality [27,28].
When training generalized linear classifiers, there are empirical errors $E_{in}$ and generalization errors $E_{out}$. The training goal is to make the empirical error and the generalization error as close as possible. Through the Hoeffding inequality, we can obtain the following formula [29]:
$p(|E_{in} - E_{out}| > \varepsilon) \le 4(2N)^{d_{vc}} \exp(-\varepsilon^2 N/8),$ (14)
where $N$ represents the sample size and $d_{vc}$ represents the VC dimension of the classifier. Since the VC dimension of a generalized linear classifier in $d$-dimensional space is $d + 1$, $d_{vc}$ can be approximated by the feature dimension $d$. Differentiating the right-hand side of Formula (14) with respect to the number of samples $N$ gives
$8(2N)^{d-1}\big(d - \varepsilon^2 N/8\big)\exp(-\varepsilon^2 N/8).$ (15)
Setting (15) equal to 0 gives $N = 8d/\varepsilon^2$. Therefore, once the sample size exceeds this value, the bound in (14) decreases as the sample size grows, so $E_{in}$ and $E_{out}$ agree with increasing probability. However, when the number of features increases, the number of samples must also increase to maintain the same guarantee. In the early fusion, when the feature dimension increases by $\Delta d_E$, the sample size needs to increase by $8\Delta d_E/\varepsilon^2$ as well. In the late fusion, the corresponding increase in the feature dimension is $\Delta d_L \le \Delta d_E$, and, therefore, the required increase in the sample size is $8\Delta d_L/\varepsilon^2 \le 8\Delta d_E/\varepsilon^2$.
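The turning point $N = 8d/\varepsilon^2$ is easy to tabulate; the short Python sketch below (our own illustration) evaluates the logarithm of the right-hand side of (14) to avoid overflow and checks that the bound decays beyond the turning point for several feature dimensions:

import numpy as np

def log_vc_bound(N, d, eps):
    # log of the right-hand side of (14): log 4 + d*log(2N) - eps^2 * N / 8
    return np.log(4) + d * np.log(2 * N) - eps**2 * N / 8

def turning_point(d, eps):
    # N = 8d / eps^2: where the bound switches from growing to decaying
    return 8 * d / eps**2

eps = 0.1
for d in (10, 50, 100):
    N_star = turning_point(d, eps)
    # bound at 2*N_star is smaller than at N_star: decay past the turning point
    print(d, N_star, log_vc_bound(2 * N_star, d, eps) < log_vc_bound(N_star, d, eps))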

3.2. Comparative Analysis of Early Fusion and Gradual Fusion

In practical applications, the relationships between features and labels frequently exhibit significant complexity, often characterized by progressive associations. Such patterns are particularly prevalent in medical domains, where the diagnosis and treatment of diseases typically involve multifaceted interactions between clinical indicators and pathological outcomes [30,31]. Under such circumstances, conventional early fusion approaches may exhibit limited capacity to capture the intricate relationships inherent in the data, consequently leading to sub-optimal model performance in terms of predictive accuracy. We illustrate this by the following theorem.
Theorem 2.
Under the condition of the generalized linear model, suppose there is a common feature $x$ among the modes and that, on the paths from the different modes containing $x$ to the final fusion prediction function, there is a common node whose fusion function is not monotonic in $x$. Then, the early fusion and the gradual fusion are not equivalent.
Proof. 
According to the theorem conditions, we assume that $X_i \cap X_j = \{x\}$. It is obvious that there must be common nodes on the paths from the different modes containing $x$ to the final fusion prediction function; at least the final fusion prediction function is one of them. We assume that the fusion prediction function at this common node has the following expression in $X_i$ and $X_j$:
$\tilde{\eta} = \tilde{f}(f_i(X_i), f_j(X_j)) = w_i f_i(X_i) + w_j f_j(X_j) + C,$ (16)
where $\tilde{f}$ represents the fusion prediction function at the common node, and $f_i$ and $f_j$ represent the fusion prediction functions at the nodes pointing to $\tilde{f}$, respectively. We treat the features of all modes other than $X_i$ and $X_j$ as constants. According to the theorem condition, $\tilde{f}$ is non-monotonic in $x$ at the current node. Since the fusion prediction functions along the remaining path are all monotonic, the final fusion prediction function is also non-monotonic in $x$, so $f = g_G^{-1}(\cdot)$ is non-monotonic in $x$. Suppose there are two multimodal feature sets $\bar{X}_1$ and $\bar{X}_2$ in which only the feature $x$ takes different values, $x_1$ and $x_2$, respectively, with $x_1 \neq x_2$. In the gradual fusion model, $\eta_G^1 = G(\bar{X}_1, F)$ and $\eta_G^2 = G(\bar{X}_2, F)$; since $g_G^{-1}(\cdot)$ is non-monotonic in $x$, we may assume without loss of generality that $g_G^{-1}(\eta_G^1) = g_G^{-1}(\eta_G^2)$. In the early fusion model, $\eta_E^1 = W\bar{X}_1 = \sum_{i=1}^{m-1} w_i x_i + w x_1$ and $\eta_E^2 = W\bar{X}_2 = \sum_{i=1}^{m-1} w_i x_i + w x_2$, so $\eta_E^1 \neq \eta_E^2$ and hence $g_E^{-1}(\eta_E^1) \neq g_E^{-1}(\eta_E^2)$. Thus, the gradual fusion maps $\bar{X}_1$ and $\bar{X}_2$ to the same output, while the early fusion maps the two data sets to different outputs, so the gradual fusion model and the early fusion model are not equivalent. □
To illustrate the content of Theorem 2, let us take a concrete example.
Example 2.
There are three modes of features, each with three-dimensional features, denoted as $X_1 = (x_1^1, x_2^1, x_3^1)$, $X_2 = (x_1^2, x_2^2, x_3^2)$, and $X_3 = (x_1^1, x_2^3, x_3^3)$; note that $X_1$ and $X_3$ share the common feature $x_1^1$. In constructing the gradual fusion model, $X_1$ and $X_2$ are fused first, and then $X_3$ is fused. The prediction model used is the logistic model. Suppose the expression for the fusion of $X_1$ and $X_2$ is $\eta_1 = \sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k$. The fused feature after the first fusion is
$1/\big(1 + \exp(-\textstyle\sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k)\big).$ (17)
Let the fused feature after the second fusion be
$1/\big(1 + \exp(-\textstyle\sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k)\big) - x_1^1/5 + x_2^3 + x_3^3.$ (18)
Then, the prediction expression after the second fusion is
$g_G(\mu) = \eta_G = 1/\big(1 + \exp(-\textstyle\sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k)\big) - x_1^1/5 + x_2^3 + x_3^3.$ (19)
We can obtain
$\partial g_G(\mu)/\partial x_1^1 = \exp(-\textstyle\sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k)/\big(1 + \exp(-\textstyle\sum_{i=1}^{2} \sum_{x_j^k \in X_i} x_j^k)\big)^2 - 1/5.$ (20)
Obviously, $\partial g_G(\mu)/\partial x_1^1$ is neither always positive nor always negative, so $g_G(\cdot)$ is not monotonic in $x_1^1$. In this case, $g_G(\cdot)$ is not a generalized linear model.
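A quick numeric check of (20) in Python (our own sketch) confirms the sign change of the derivative, and hence the non-monotonicity of $g_G(\cdot)$ in $x_1^1$:

import numpy as np

def dg_dx11(s):
    # Equation (20): s is the sum of all features in X_1 and X_2
    # (x_1^1 enters g_G both through s and through the -x_1^1/5 term)
    return np.exp(-s) / (1 + np.exp(-s))**2 - 1/5

print(dg_dx11(0.0))   # +0.05: g_G is increasing in x_1^1 here
print(dg_dx11(4.0))   # -0.18: g_G is decreasing in x_1^1 here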

3.3. Construction of the Fusion Method Selection Paradigm

Theoretically, when there is a generalized linear relationship between the labels and each modal feature, the prediction accuracy of the early fusion model can approach 1 when the sample size is large enough, while the accuracy of the late fusion model may only approach $1 - \varepsilon$ due to the loss caused by the compression of the feature space. For this reason, we propose approximate Equations (21) and (22), which evaluate the accuracy of the early fusion and late fusion models through sample size, feature quantity, and modality number. Furthermore, the suitability of early fusion and late fusion under varying sample size conditions is evaluated.
$\mathrm{Accuracy}_{EF} = 1 - \alpha_1 d_{all}/N - \alpha_2 \sigma/\sqrt[3]{N},$ (21)
$\mathrm{Accuracy}_{LF} = 1 - \alpha_1 d_{max}/N - \alpha_2 \sigma/\sqrt[3]{N} - \alpha_3 K,$ (22)
where $d_{all}$ represents the total number of features over all modes and $d_{max}$ represents the largest number of features in any single mode. $d_{all}/N$ and $d_{max}/N$ represent the overfitting risk due to the imbalance between the feature number and the sample number [32]. $\sigma/\sqrt[3]{N}$ represents the uncertainty induced by data noise [33]. $K$ represents the parameter search loss due to modal partitioning. The data signal-to-noise ratio $\sigma$ is calculated by (23):
$\sigma = \mathrm{Var}(\hat{y})/\mathrm{Var}(\epsilon),$ (23)
where $\mathrm{Var}(\hat{y})$ represents the variance of the values predicted by the linear regression model and $\mathrm{Var}(\epsilon)$ represents the variance of the residuals between the predicted values and the true labels. Through Equations (21) and (22), we calculate the critical sample size of early fusion and late fusion on the same data. Letting $\mathrm{Accuracy}_{EF} = \mathrm{Accuracy}_{LF}$, we obtain
$N_{critical} = \alpha_1 (d_{all} - d_{max})/(\alpha_3 K).$ (24)
When the sample size exceeds this threshold, the performance of the early fusion model surpasses that of the late fusion model; a small code sketch of these equations follows.
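Equations (21)–(24) translate directly into code; the Python sketch below is our own illustration, with $\alpha_1$, $\alpha_2$, and $\alpha_3$ set to the values estimated later in Section 4.4:

import numpy as np
from sklearn.linear_model import LinearRegression

A1, A2, A3 = 0.071, 0.033, 0.024    # alpha_1, alpha_2, alpha_3 (Section 4.4)

def snr(X, y):
    # Equation (23): sigma = Var(y_hat) / Var(residual) of a linear regression
    y_hat = LinearRegression().fit(X, y).predict(X)
    return np.var(y_hat) / np.var(y - y_hat)

def acc_ef(N, d_all, sigma):        # Equation (21)
    return 1 - A1 * d_all / N - A2 * sigma / N ** (1 / 3)

def acc_lf(N, d_max, sigma, K):     # Equation (22)
    return 1 - A1 * d_max / N - A2 * sigma / N ** (1 / 3) - A3 * K

def n_critical(d_all, d_max, K):    # Equation (24)
    return A1 * (d_all - d_max) / (A3 * K)

print(n_critical(d_all=300, d_max=20, K=15))   # ~55 samples in this setting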
According to Equations (21) and (22), a fusion method selection paradigm is proposed before constructing the prediction model.
(a) Samples are randomly drawn from the population with a predetermined sample size range of $[d_{all}/3, 100 d_{all}]$. For each sampled set, the signal-to-noise ratio is calculated simultaneously.
(b) Generalized linear models are employed to construct both early fusion and late fusion models. The models are trained on the sampled datasets, and their predictive accuracy is systematically recorded. Concurrently, the theoretical accuracy under the given sample conditions is computed using Equations (21) and (22).
(c) A comparative analysis is conducted between the empirical model accuracy and the theoretical accuracy. If the maximum deviation between the model accuracy and the theoretical accuracy for either early or late fusion does not exceed 20%, the task can be appropriately addressed using the corresponding generalized linear fusion approach. Conversely, if the deviation exceeds this threshold, the relationship between features and labels is deemed nonlinear at a 95% confidence level, suggesting that generalized linear fusion methods (either early or late fusion) are insufficient and that more sophisticated fusion strategies, such as gradual fusion, should be considered to capture the underlying complexity of the data. A sketch of this procedure is given after this list.
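As a minimal sketch, the paradigm can be organized as the following Python routine (our own illustration; it assumes the helpers acc_ef, acc_lf, and snr from the previous sketch, and the training routines are passed in by the user):

import numpy as np

def select_fusion_method(X_modes, y, d_all, d_max, K, eval_ef, eval_lf, threshold=0.20):
    # step (a): sample sizes spanning [d_all/3, 100*d_all]
    sizes = np.linspace(max(d_all // 3, 2), 100 * d_all, num=5, dtype=int)
    dev_ef, dev_lf = 0.0, 0.0
    for N in sizes:
        idx = np.random.choice(len(y), size=min(N, len(y)), replace=False)
        Xs = [Xm[idx] for Xm in X_modes]
        sigma = snr(np.hstack(Xs), y[idx])
        # step (b): empirical accuracy vs. theoretical accuracy (21), (22)
        emp_ef = eval_ef(Xs, y[idx])   # user-supplied: returns empirical accuracy
        emp_lf = eval_lf(Xs, y[idx])   # user-supplied: returns empirical accuracy
        dev_ef = max(dev_ef, abs(emp_ef - acc_ef(N, d_all, sigma)))
        dev_lf = max(dev_lf, abs(emp_lf - acc_lf(N, d_max, sigma, K)))
    # step (c): deviation above 20% suggests a nonlinear feature-label relation
    if min(dev_ef, dev_lf) <= threshold:
        return "generalized linear fusion (early or late)"
    return "nonlinear strategy, e.g., gradual fusion"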

4. Numerical Experiment Validation

4.1. Example of Early Fusion and Late Fusion Equivalence

Let the feature set be $X = (x_1, x_2, x_3)$, with three modes $X_1 = (x_2, x_3)$, $X_2 = (x_1, x_3)$, and $X_3 = (x_1, x_2)$. We construct a set of 1000 samples, where each $x_i$ $(i = 1, 2, 3)$ follows a uniform distribution on (0,1) and $X_i$ $(i = 1, 2, 3)$ is constructed as described above. Then, a 2-dimensional weight vector on (0,1) is randomly generated for each mode, and the average of the sums of the products of the modal features and the weights is calculated. The final step involves substituting the mean value into the logistic function to compute the variable $\tilde{y}$. In order to simulate the binary classification scenario and balance the two classes of samples, the median is taken as the threshold for mapping $\tilde{y}$ to a binary value to obtain the final label $y$. The whole process is as follows:
$\tilde{y} = 1/\big(1 + \exp(-(\textstyle\sum_{i=1}^{3} w_i \cdot X_i)/3)\big),$ (25)
$y = 0$ if $\tilde{y} \le \mathrm{median}$, and $y = 1$ if $\tilde{y} > \mathrm{median}$. (26)
Using the above three modes, logistic regression models are trained through early fusion and late fusion, respectively, with the number of iterations of the logistic regression set to 100. The expression of the early fusion is as follows:
$\ln(y/(1-y)) = 5.688 x_1 + 9.51 x_2 + 9.142 x_3.$ (27)
The expressions of the three sub-models in the late fusion are as follows:
$\ln(y/(1-y)) = 6.131 x_2 + 6.02 x_3, \quad \ln(y/(1-y)) = 2.846 x_1 + 4.401 x_3, \quad \ln(y/(1-y)) = 2.944 x_1 + 4.655 x_2.$ (28)
We can obtain the coefficient matrix $A$ and the augmented matrix $\tilde{A}$, with $\mathrm{rank}(A) = \mathrm{rank}(\tilde{A}) = 3$. So, the early fusion and the late fusion are equivalent according to Theorem 1. It can further be found that $(\alpha_1, \alpha_2, \alpha_3) = (0.803, 0.979, 0.986)$ is a solution to the system: the three sub-models of the late fusion can be mapped to the expression of the early fusion through a linear weighted combination.
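This check is easy to reproduce in Python (our own sketch, using the fitted coefficients from (27) and (28)):

import numpy as np

# columns: coefficient vectors of the three late fusion sub-models in (28),
# rows ordered by (x1, x2, x3); zeros mark features absent from a mode
A = np.array([[0.0,   2.846, 2.944],
              [6.131, 0.0,   4.655],
              [6.02,  4.401, 0.0  ]])
w = np.array([5.688, 9.51, 9.142])   # early fusion coefficients from (27)

A_aug = np.column_stack([A, w])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_aug))  # 3 3
print(np.linalg.solve(A, w))         # approximately (0.803, 0.979, 0.986)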

4.2. Comparative Experimental Analysis of Early Fusion and Late Fusion Models

Ten groups of 10-dimensional random variables on (0,1) are randomly generated as the features of 10 modes, and a weight vector on (0,1) of the corresponding dimension is generated for each mode at the same time. The logistic function values of the 10 modal features under the weights are calculated, respectively, and the obtained values are averaged and mapped to a binary value as the label; see Algorithm 1 for details. The model is a logistic model trained by early fusion and late fusion, respectively, again with 100 iterations. In the early fusion, the features of all modes are spliced into one vector input for training. In the late fusion, a sub-model is trained on each mode, and the model outputs are linearly combined to obtain the final prediction result. The samples are divided into a training set and a test set in a 7:3 ratio. The experiment is repeated 100 times, and the average is used as the result.
Algorithm 1 Data Generation Algorithm of the Late Fusion Mode
Random $W = \{w_1, w_2, \ldots, w_{10}\}$, $w_j = (w_1^j, w_2^j, \ldots, w_{10}^j)$
For $i$ in sample numbers:
  Random $\{X_1^i, X_2^i, \ldots, X_{10}^i\}$, $X_j^i = (x_1^j, x_2^j, \ldots, x_{10}^j)$, $j = 1, 2, \ldots, 10$
  Calculate $y_j^i = \mathrm{sigmoid}(w_j \cdot (X_j^i)^T)$
  Calculate $\bar{y}_i = \mathrm{mean}\{y_j^i\}_{j=1}^{10}$
$\bar{y} = (\bar{y}_i)_{i=1}^{\text{sample numbers}}$
$\delta = \mathrm{median}\{\bar{y}\}$
For each $i$: If $\bar{y}_i > \delta$: $\bar{y}_i = 1$ Else: $\bar{y}_i = 0$
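A runnable Python version of Algorithm 1 might look as follows (a sketch under our own naming conventions):

import numpy as np

def generate_late_fusion_data(n_samples, n_modes=10, dim=10, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 1, size=(n_modes, dim))                 # weight vector per mode
    X = rng.uniform(0, 1, size=(n_samples, n_modes, dim))      # modal features
    y_mode = 1 / (1 + np.exp(-np.einsum('smd,md->sm', X, W)))  # sigmoid per mode
    y_bar = y_mode.mean(axis=1)                                # average over modes
    labels = (y_bar > np.median(y_bar)).astype(int)            # median thresholding
    return [X[:, j, :] for j in range(n_modes)], labels

modes, y = generate_late_fusion_data(1000)   # 10 modes, 10 features each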
Figure 3a shows the prediction accuracy of the two fusion methods. The model accuracy of both fusion methods increases with the number of samples. In addition, when the number of samples is small, the feature compression of the late fusion shrinks the search space, so the prediction accuracy of the late fusion model is higher than that of the early fusion. When the sample is large enough, the search over the full feature space in the early fusion makes its accuracy higher than that of the late fusion. It can also be found that, as accuracy improves, the additional sample size required for the same accuracy gain keeps increasing. Furthermore, we also compared the ROC curves and F1-scores of the two fusion methods under different sample sizes, as shown in Figure 4. When the sample size is less than 100, the late fusion is better than the early fusion; when the sample size is more than 100, the early fusion is better than the late fusion, which is consistent with the conclusion on model accuracy.

4.3. Comparative Experimental Analysis of Early Fusion and Gradual Fusion Models

The gradual fusion model in this section consists of two stages. In the first stage, the features of certain modalities are grouped, and pairwise dot products are computed. In the second stage, the logarithms of the computed pairwise values, along with the remaining modality features, are spliced into one feature vector, which then serves as the input of the final prediction model. The specific experimental process is as follows.
Four groups of three-dimensional random variables on (0,1) are randomly generated as the features of four modes. In particular, the last feature of the second and fourth modes lies in (0,0.5). Then, a one-dimensional random variable on (0,1), together with the last feature of the first mode and that of the third mode each multiplied by $-1/9$, constitutes the features of the fifth mode. The corresponding features of the first and second modes, as well as of the third and fourth modes, are multiplied pairwise, and the logarithms of the resulting dot products are calculated. The two obtained values and the features of the fifth mode are combined with weight 1 and passed through the logistic function, and the result is mapped to a binary value as the label; see Algorithm 2 for details. The label is obviously non-monotonic in the last features of the first and third modes. The early fusion training process is similar to that in Section 4.2. In the gradual fusion, following the data generation process, the corresponding features of the first and second modes and of the third and fourth modes are multiplied to obtain the pairwise values as the fused features of the first step. Then, the fused features and the features of the fifth mode are used to train the logistic regression model. The samples are divided into a training set and a test set in a 7:3 ratio. The experiment is repeated 100 times, and the average is used as the result.
Algorithm 2 Data Generation Algorithm of the Gradual Fusion Mode
For $i$ in sample numbers:
  Random $X_1^i = \{x_1^i, x_2^i, x_3^i\}$, $X_2^i = \{x_4^i, x_5^i, x_6^i\}$, $X_3^i = \{x_7^i, x_8^i, x_9^i\}$, $X_4^i = \{x_{10}^i, x_{11}^i, x_{12}^i\}$, $X_5^i = \{-\frac{1}{9} x_3^i, -\frac{1}{9} x_9^i, x_{13}^i\}$, where $x_j^i \in (0,1)$ $(j \neq 6, 12)$ and $x_j^i \in (0, 0.5)$ $(j = 6, 12)$
  Calculate $y_1^i = \log(X_1^i \cdot (X_2^i)^T)$ and $y_2^i = \log(X_3^i \cdot (X_4^i)^T)$
  Calculate $\bar{y}_i = \mathrm{sigmoid}(y_1^i + y_2^i - x_3^i/9 - x_9^i/9 + x_{13}^i)$
$\bar{y} = (\bar{y}_i)_{i=1}^{\text{sample numbers}}$
$\delta = \mathrm{median}\{\bar{y}\}$
For each $i$: If $\bar{y}_i > \delta$: $\bar{y}_i = 1$ Else: $\bar{y}_i = 0$
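Similarly, a runnable Python version of Algorithm 2 might look like this (a sketch under our reading of the pseudocode, including the $-1/9$ scaling of the two shared features):

import numpy as np

def generate_gradual_fusion_data(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, size=(n_samples, 13))
    X[:, 5] *= 0.5                                   # x_6 in (0, 0.5)
    X[:, 11] *= 0.5                                  # x_12 in (0, 0.5)
    X1, X2, X3, X4 = X[:, 0:3], X[:, 3:6], X[:, 6:9], X[:, 9:12]
    X5 = np.column_stack([-X[:, 2] / 9, -X[:, 8] / 9, X[:, 12]])
    y1 = np.log(np.sum(X1 * X2, axis=1))             # log of pairwise dot products
    y2 = np.log(np.sum(X3 * X4, axis=1))
    score = 1 / (1 + np.exp(-(y1 + y2 + X5.sum(axis=1))))
    labels = (score > np.median(score)).astype(int)  # median thresholding
    return (X1, X2, X3, X4, X5), labels

modes, y = generate_gradual_fusion_data(1000)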
Figure 3b shows the prediction results of the two fusion methods. It can be found that the gradual fusion is better than the early fusion overall. When the sample size is large enough, the accuracy of the gradual fusion model tends to 1. When the sample size is small, the accuracy of the early fusion is low; as the sample size increases, the search over the full feature space improves, but its upper bound is only about 0.9, indicating that the gradual fusion method is worth considering when there is a nonlinear, gradual relationship between the features and the labels. In the same way, we also compared the ROC curves and F1-scores of the two fusion methods under different sample sizes, as shown in Figure 5. The effect of the gradual fusion is better than that of the early fusion for all sample sizes, which is consistent with the conclusion on model accuracy.

4.4. Effect Verification of the Approximate Equation and Method Selection Paradigm

To illustrate the validity of the accuracy evaluation equations, we performed a series of numerical experiments under the model scenario of this paper. First, we estimated the parameters in the equations by numerical experiments. We took $K \in [15, 20]$, $d_{max} \in [15, 25]$, and $d_{all} = K d_{max}$ and conducted multiple sets of early and late fusion experiments. Each set of experiments was repeated 100 times to obtain the average value, and the estimate of the actual critical sample value was obtained from the intersection points of the experimental accuracy curves. The values of $\alpha_1$ and $\alpha_3$ were estimated experimentally by minimizing the mean square error between the theoretical and actual values via grid search. In our experimental setup, the optimal values of $\alpha_1$ and $\alpha_3$ were 0.071 and 0.024, respectively. Figure 6a shows the distribution of the theoretical and actual critical sample values; the scattered points lie essentially around $y = x$. At the same time, Figure 6b shows the error distribution of the theoretical and actual critical sample values; the errors were less than or equal to 0.25, and most were within 0.15.
Equations (36) and (37) in [23] also describe the sample sizes $N_{EF}$ and $N_{LF}$ required for early fusion and late fusion, in which the convergence factor is 0.9. This indicates that the late fusion model may outperform the early fusion model when the sample size is between $N_{LF}$ and $N_{EF}$. We compared our calculated $N_{critical}$ with $N_{EF}$ and $N_{LF}$ in four typical cases; the experimental parameters and results are shown in Table 1. In [23], the sample sizes for early fusion and late fusion were evaluated under the premise of ensuring model accuracy, leading to a certain degree of redundancy in the sample estimates derived from $N_{EF}$ and $N_{LF}$. More importantly, that study did not account for the correlation between the number of modes and the performance of the early fusion model. In contrast, our proposed method incorporates this factor, allowing our results to more precisely characterize the critical sample threshold at which the performance reversal between early fusion and late fusion occurs.
Then, we performed early fusion experiments in which the total number of features is a multiple of 10 and the sample size is 200, 500, 1000, 5000, or 10,000. Based on a grid search over the theoretical and model accuracies, we obtained the optimal $\alpha_2$ of 0.033. Figure 7a shows the relative error between the actual model accuracy and the theoretical accuracy over all early fusion experiments; the relative error is about 0.18 with 95% confidence. Furthermore, we conducted late fusion experiments in which the number of modes is an integer in [15, 20] and the number of features of each mode is an integer in [15, 25], with the same sample sizes. Figure 7b shows the relative error between the actual model accuracy and the theoretical accuracy over all experiments; the error is about 0.14 with 95% confidence.
To verify the fusion method selection paradigm, Figure 8a shows the model accuracy and theoretical accuracy of the early fusion on gradual fusion data with five modes. When the sample size is small, the model accuracy of the early fusion differs greatly from the theoretical accuracy, with a maximum error of about 21.1%. In addition, we also simulated the Lorenz system, a typical nonlinear system, and made predictions through the early fusion of logistic regression models. Figure 8b shows the model accuracy and its theoretical accuracy; the maximum error far exceeds 20%, demonstrating the effectiveness of the proposed fusion method selection paradigm.
To further validate the model selection paradigm, we applied it to a real data set of hypoproteinemia patients from a previous study [34]. The data set includes 108 control patients and 108 experimental patients, each with four different medical examination features. According to the model selection paradigm, 50, 100, 150, and 216 samples were selected with equal label proportions to conduct early fusion and late fusion experiments. The theoretical model accuracy was also calculated, as shown in Figure 9. The results indicate that the maximum discrepancy between the actual and theoretical accuracy for both early fusion and late fusion exceeds 20%. This suggests that the applicability of these two models in this scenario is limited, thereby highlighting the effectiveness of our proposed model selection paradigm.

5. Discussion

The multisource and multimodal data fusion model has been widely used across various fields, but most research remains at the application level, and theoretical comparisons of different data fusion methods are rarely studied. Since it is difficult to express all kinds of nonlinear models uniformly and the models themselves are complex, we take the simpler generalized linear model as the starting point for analysis. This paper analyzes the similarities and differences of early fusion, late fusion, and gradual fusion from the perspective of the fusion mechanism through mathematical derivation. Furthermore, model accuracy evaluation equations are proposed for the early fusion and the late fusion, and a method for evaluating the critical sample size threshold at which the performance dominance of the early fusion and late fusion models undergoes a reversal is analyzed. Finally, we give the selection evaluation paradigm of the fusion method for the construction of fusion models.
Specifically, we first give the mathematical definitions of early fusion and late fusion. Based on these definitions, the unified expression of the generalized linear model is introduced for derivation. During the derivation, we find that the equivalence of early fusion and late fusion depends on the model parameters, so the equivalence problem is transformed into a nonhomogeneous linear system in the model parameters. By the consistency theorem for nonhomogeneous linear systems in linear algebra, we obtain the equivalence conditions of the early fusion and the late fusion. In the numerical experiments, we found that when the sample size was large, the early fusion model outperformed the late fusion model even when the features and labels largely matched the late fusion mode. One possible reason is that the early fusion searches parameters over the full feature space, while the late fusion divides the features into different modes for sub-model training, compressing the parameter search space. If the modal partition is imperfect, a certain search loss is incurred, leaving the model unable to find the optimal parameters in the full space. This is consistent with the conclusion obtained by Pereira et al. [23]. Through this analysis, we highlight that under ideal conditions, specifically when the sample size is sufficiently large, there is no need for a complex late fusion: the early fusion model can explore the entire parameter space to find the optimal solution, making it equivalent or even superior to the late fusion model. However, in real-world scenarios, where sample sizes are often limited, early fusion tends to perform poorly. In such cases, late fusion can leverage its advantages to achieve an acceptable level of prediction accuracy.
We then give a mathematical definition of gradual fusion. We consider data whose features and labels meet the gradual fusion mode, so a nonlinear relationship is introduced between the features and labels. Based on the definition of the generalized linear model, we find that once the monotonicity between intermediate hidden variables and features is lost in the process of gradual fusion, the final label and features no longer meet the definition of the generalized linear model, so the gradual fusion model and the early fusion are not equivalent. This conclusion is also verified by numerical experiments.
Furthermore, we consider whether it is possible to quantify the relation between the model accuracy of early fusion and late fusion and the data attributes. We assume that, under the generalized linear condition, the association complexity between data features and labels is low, and, combined with the literature [19], we put forward Equations (21) and (22). Based on (21) and (22), we analyze the critical sample size at which the model accuracies of early fusion and late fusion cross. Combined with numerical experiments around the critical points, we estimate the parameters through an optimization method. Moreover, the numerical experiment results show that the relative model errors of the early fusion and the late fusion are controlled at about 18% and 14%, respectively, with 95% confidence, which indicates the validity of Formulas (21) and (22). Finally, we put forward an evaluation paradigm for the selection of the fusion mode in the construction of a fusion model, which provides a theoretical basis for selecting the fusion mode before model construction.
The research in this paper is carried out under the condition of the generalized linear model, and one of the future research directions is to extend the application scope of our conclusions to a specific class of nonlinear models, such as deep learning. Deep learning fundamentally relies on two key concepts: perceptrons and activation functions. By stacking multiple layers of perceptrons, activation functions, and various additional operations, a diverse range of deep learning models can be constructed. Since a perceptron combined with a sigmoid activation function essentially forms a generalized linear model, our findings can be extended to the design of single-layer perceptrons within certain deep-learning architectures, particularly in determining sample sizes and parameter counts. In addition, the parameter optimization of Formulas (21) and (22) is also a worthy research direction. Finding better parameters can narrow the error threshold in the selection evaluation paradigm of fusion mode and make the evaluation more accurate.

6. Conclusions

In this paper, we propose a selection paradigm that provides a theoretical basis for selecting early fusion, late fusion, and gradual fusion models before performing a specific task. Unlike most recent fusion model studies that rely on experimental results, our research is based on a mechanistic understanding of the fusion process. Specifically, we establish the equivalence conditions between early fusion and late fusion within the framework of generalized linear models, along with the failure conditions of early fusion in scenarios where nonlinear correlations exist between features and labels. This enhances our understanding of the similarities and differences between the various fusion methods. Subsequently, we construct model accuracy evaluation equations that incorporate sample size, feature quantity, and modality number guided by theoretical insights, enabling a data-driven approach to fusion method evaluation. Furthermore, we propose a method for evaluating the critical sample size threshold at which the performance dominance of the early fusion and late fusion models undergoes a reversal, allowing the trade-offs between model accuracy and data size to be understood. Finally, we propose the model selection paradigm based on the accuracy evaluation equations. Numerical experimental results demonstrate that the proposed paradigm reliably identifies the optimal fusion strategy. In future research, our theoretical framework is anticipated to significantly reduce the computational and resource costs associated with model construction, thereby enhancing the scalability and efficiency of data fusion methodologies.

Author Contributions

Conceptualization, Z.L. and Z.Y.; methodology, Z.L. and Z.Y.; software, Z.L.; validation, Z.Y., Z.M. and B.G.; writing—original draft preparation, Z.L.; writing—review and editing, Z.Y., Z.M., B.G. and Z.Z.; funding acquisition, B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science and Technology Major Project, grant: 2021ZD0201302, the National Natural Science Foundation of China, grant: 12201025, the Fundamental Research Funds for the Central Universities, and the Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing.

Data Availability Statement

The data used in the paper can be generated according to Algorithms 1 and 2.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
EF Early fusion
IF Intermediate fusion
LF Late fusion
GF Gradual fusion

Appendix A

For $K$ mode features $X_1, X_2, \ldots, X_K$, the joint information entropy is defined as
$H(X_1, X_2, \ldots, X_K) = -\sum_{x_1, x_2, \ldots, x_K} p(x_1, x_2, \ldots, x_K) \log p(x_1, x_2, \ldots, x_K).$ (A1)
The larger $H(X_1, X_2, \ldots, X_K)$ is, the more information $X_1, X_2, \ldots, X_K$ contain. The mutual information of the $K$ modes is
$I(X_1, X_2, \ldots, X_K) = \sum_{i=1}^{K} H(X_i) - H(X_1, X_2, \ldots, X_K).$ (A2)
In the early fusion model, all mode features are spliced into one feature vector and then used in the downstream task. We denote the model output as $O_E$:
$O_E = f_E(X_1, X_2, \ldots, X_K),$ (A3)
so that when the function $f_E$ is sufficiently expressive, it can approximately retain all the mutual information between the joint input and the label $Y$:
$I(O_E, Y) = I((X_1, X_2, \ldots, X_K), Y).$ (A4)
In the late fusion model, a single model $h_k(X_k)$ is trained for each mode and the decisions are then combined; we denote the model output as $O_L$:
$O_L = f_L(h_1, h_2, \ldots, h_K).$ (A5)
A simple decision fusion function $f_L$, such as weighted summation, cannot accurately capture the higher-order correlations between the modes, resulting in the loss of certain mode interaction information in the output, so
$I(O_L, Y) < I((h_1, h_2, \ldots, h_K), Y) \le I((X_1, X_2, \ldots, X_K), Y).$ (A6)
To sum up, we obtain $I(O_L, Y) < I(O_E, Y)$: it is difficult for the late fusion model to achieve the same effect as the early fusion model.
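A toy numeric illustration of this inequality (our own construction: an XOR label, chosen because it makes the loss of mode interaction information visible) can be run in Python:

import numpy as np
from itertools import product

def mutual_info(a, b):
    # empirical I(A;B) in bits for two discrete arrays
    mi = 0.0
    for va, vb in product(np.unique(a), np.unique(b)):
        p_ab = np.mean((a == va) & (b == vb))
        p_a, p_b = np.mean(a == va), np.mean(b == vb)
        if p_ab > 0:
            mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

rng = np.random.default_rng(0)
X1, X2 = rng.integers(0, 2, 100000), rng.integers(0, 2, 100000)
Y = X1 ^ X2                                # label depends on mode interaction
O_E = 2 * X1 + X2                          # early fusion keeps the joint input
O_L = ((X1 + X2) / 2 > 0.5).astype(int)    # simple averaged decision fusion
print(mutual_info(O_E, Y))                 # ~1.0 bit: label information kept
print(mutual_info(O_L, Y))                 # ~0.31 bit: interaction info lost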

Appendix B

Let
$\alpha_1 = (a_{11}, \ldots, a_{m1})^T$, $\alpha_2 = (a_{12}, \ldots, a_{m2})^T$, $\ldots$, $\alpha_n = (a_{1n}, \ldots, a_{mn})^T$, $\beta = (b_1, \ldots, b_m)^T$.
Then, the nonhomogeneous linear system (13) can be written as
$x_1 \alpha_1 + x_2 \alpha_2 + \ldots + x_n \alpha_n = \beta.$
Let $S = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. The system has a solution ⇔ $\beta$ is a linear combination of $S$ ⇔ $\beta \in V(S)$ ⇔ $V(S) = V(S \cup \{\beta\})$, where $V(S)$ represents the subspace spanned by the collection $S$.
Because $S \subseteq S \cup \{\beta\}$, we have $V(S) \subseteq V(S \cup \{\beta\})$, so $V(S) = V(S \cup \{\beta\})$ ⇔ $\dim V(S) = \dim V(S \cup \{\beta\})$. Since $S$ consists of the column vectors of $A$, $\dim V(S) = \mathrm{rank}(S) = \mathrm{rank}(A)$. Similarly, $S \cup \{\beta\} = \{\alpha_1, \alpha_2, \ldots, \alpha_n, \beta\}$ consists of the column vectors of $\tilde{A}$, so $\dim V(S \cup \{\beta\}) = \mathrm{rank}(S \cup \{\beta\}) = \mathrm{rank}(\tilde{A})$. Thus, the system has a solution ⇔ $\mathrm{rank}(A) = \mathrm{rank}(\tilde{A})$. □

References

  1. White, F.E. Data fusion lexicon. JDL. Tech. Panel C 1991, 3, 19. [Google Scholar]
  2. Durrant-Whyte, H.F. Sensor models and multisensor integration. Int. J. Robot. Res. 1988, 7, 97–113. [Google Scholar] [CrossRef]
  3. Dasarathy, B.V. Sensor fusion potential exploitation-innovative architectures and illustrative applications. Proc. IEEE 1997, 85, 24–38. [Google Scholar]
  4. Lipkova, J.; Chen, R.J.; Chen, B.; Lu, M.Y.; Barbieri, M.; Shao, D.; Vaidya, A.J.; Chen, C.; Zhuang, L.; Williamson, D.F.; et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 2022, 40, 1095–1110. [Google Scholar]
  5. Stahlschmidt, S.R.; Ulfenborg, B.; Synnergren, J. Multimodal deep learning for biomedical data fusion: A review. Brief. Bioinform. 2022, 23, bbab569. [Google Scholar] [CrossRef]
  6. Zhao, F.; Zhang, C.; Geng, B. Deep Multimodal Data Fusion. ACM Comput. Surv. 2024, 56, 216. [Google Scholar]
  7. Mohsen, F.; Ali, H.; El Hajj, N.; Shah, Z. Artificial intelligence-based methods for fusion of electronic health records and imaging data. Sci. Rep. 2022, 12, 17981. [Google Scholar]
  8. Qi, P.; Chiaro, D.; Piccialli, F. FL-FD: Federated learning-based fall detection with multimodal data fusion. Inf. Fusion 2023, 99, 101890. [Google Scholar] [CrossRef]
  9. Mervitz, J.; de Villiers, J.; Jacobs, J.; Kloppers, M. Comparison of early and late fusion techniques for movie trailer genre labelling. In Proceedings of the 2020 IEEE 23rd International Conference on Information Fusion (FUSION), Rustenburg, South Africa, 6–9 July 2020; pp. 1–8. [Google Scholar]
  10. Yao, W.; Moumtzidou, A.; Dumitru, C.O.; Andreadis, S.; Gialampoukidis, I.; Vrochidis, S.; Datcu, M. Early and late fusion of multiple modalities in Sentinel imagery and social media retrieval. In Pattern Recognition. ICPR International Workshops and Challenges; Springer: Cham, Switzerland, 2021; pp. 591–606. [Google Scholar] [CrossRef]
  11. Pandeya, Y.R.; Lee, J. Deep learning-based late fusion of multimodal information for emotion classification of music video. Multimed. Tools Appl. 2021, 80, 2887–2905. [Google Scholar]
  12. Pawłowski, M.; Wróblewska, A.; Sysko-Romańczuk, S. Effective techniques for multimodal data fusion: A comparative analysis. Sensors 2023, 23, 2381. [Google Scholar] [CrossRef]
  13. Trinh, M.; Shahbaba, R.; Stark, C.; Ren, Y. Alzheimer’s disease detection using data fusion with a deep supervised encoder. Front. Dement. 2024, 3, 1332928. [Google Scholar]
14. Liu, Z.; Shen, Y.; Lakshminarasimhan, V.B.; Liang, P.P.; Zadeh, A.; Morency, L.-P. Efficient low-rank multimodal fusion with modality-specific factors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018.
15. Zadeh, A.; Chen, M.; Poria, S.; Cambria, E.; Morency, L.-P. Tensor Fusion Network for Multimodal Sentiment Analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 1103–1114.
16. Boulahia, S.Y.; Amamra, A.; Madi, M.R.; Daikh, S. Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition. Mach. Vis. Appl. 2021, 32, 121.
17. Koca, M.B.; Sevilgen, F.E. Comparative Analysis of Fusion Techniques for Integrating Single-cell Multiomics Datasets. In Proceedings of the 2024 32nd Signal Processing and Communications Applications Conference, Mersin, Turkey, 15–18 May 2024; pp. 1–4.
18. Kantarcı, M.G.; Gönen, M. Multimodal Fusion for Effective Recommendations on a User-Anonymous Price Comparison Platform. In Proceedings of the 2024 IEEE Conference on Artificial Intelligence, Singapore, 25–27 June 2024; pp. 947–952.
19. Xu, R.; Xiang, H.; Xia, X.; Han, X.; Li, J.; Ma, J. OPV2V: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication. In Proceedings of the 2022 International Conference on Robotics and Automation, Philadelphia, PA, USA, 23–27 May 2022.
20. Chen, Y.; Bruzzone, L. Self-supervised SAR-optical data fusion of Sentinel-1/-2 images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5406011.
21. Zhang, P.; Wang, D.; Yu, Z.; Zhang, Y.; Jiang, T.; Li, T. A multi-scale information fusion-based multiple correlations for unsupervised attribute selection. Inform. Fusion 2024, 106, 102276.
22. Zhang, Q.; Zhang, P.; Li, T. Information fusion for large-scale multi-source data based on the Dempster–Shafer evidence theory. Inform. Fusion 2025, 115, 102754.
23. Pereira, L.M.; Salazar, A.; Vergara, L. A comparative analysis of early and late fusion for the multimodal two-class problem. IEEE Access 2023, 11, 84283–84300.
24. Wu, X.; Zhao, X. Influence analysis of generalized linear model. Math. Stat. Appl. Probab. 1994, 4, 34–44.
25. Guo, W.; Wang, J.; Wang, S. Deep multimodal representation learning: A survey. IEEE Access 2019, 7, 63373–63394.
26. Li, S. Linear Algebra; Higher Education Press: Beijing, China, 2006.
27. Vapnik, V.N.; Chervonenkis, A.Y. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity: Festschrift for Alexey Chervonenkis; Springer: Cham, Switzerland, 2015; pp. 11–30.
28. Sontag, E.D. VC dimension of neural networks. NATO ASI Ser. F Comput. Syst. Sci. 1998, 168, 69–96.
29. Cherkassky, V.; Dhar, S.; Vovk, V.; Papadopoulos, H.; Gammerman, A. Measures of Complexity: Festschrift for Alexey Chervonenkis; Springer: Cham, Switzerland, 2015; p. 17.
30. Serebruany, V.L.; Christenson, M.J.; Pescetti, J.; McLean, R.H. Hypoproteinemia in the hemolytic-uremic syndrome of childhood. Pediatr. Nephrol. 1993, 7, 72–73.
31. Wu, S.; Shang, X.; Guo, M.; Su, L.; Wang, J. Exosomes in the diagnosis of neuropsychiatric diseases: A review. Biology 2024, 13, 387.
32. Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 2019, 1168, 022022.
33. Chabé-Ferret, S. Statistical Tools for Causal Inference; Social Science Knowledge Accumulation Initiative (SKY): 2022.
34. Liu, Z.; Kang, Q.; Mi, Z.; Yuan, Y.; Sang, T.; Guo, B.; Zheng, Z.; Yin, Z.; Tian, W. Construction of geriatric hypoalbuminemia predicting model for hypoalbuminemia patients with and without pneumonia and explainability analysis. Front. Med. 2024, 11, 1518222.
Figure 1. Three fusion method frameworks: (a) early fusion framework; (b) intermediate fusion framework; and (c) late fusion framework.
Figure 2. An illustration of the construction of the proposed fusion method selection paradigm and the relations among the three data fusion methods. Starting from the construction of the fusion method selection paradigm, we conducted comparative analyses of early–late fusion and early–gradual fusion. These analyses yielded a series of theoretical results, corresponding to the gray squares on the left and right sides of the figure. Finally, we validated our conclusions through multiple numerical experiments.
Figure 3. (a) Comparison of the accuracy of the early fusion and late fusion models. (b) Comparison of the accuracy of the early fusion and gradual fusion models. In both panels, the shaded region shows the variance over 100 experiments.
Figure 4. ROC curve and F1-score of the early fusion model and the late fusion model under different sample sizes. When the sample size is 20, the ROC curves of the two models coincide.
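To make the comparison summarized in Figure 3a and Figure 4 concrete, the following is a minimal sketch of an early-versus-late fusion experiment. It assumes synthetic Gaussian features, a linear feature–label relationship, logistic-regression classifiers for every model, and averaging of the unimodal predicted probabilities as the late fusion decision rule; none of these choices are prescribed by the paper, and the modality count K and feature counts are illustrative.

```python
# Sketch: early fusion (concatenated features, one classifier) versus
# late fusion (one classifier per modality, decisions averaged).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score

rng = np.random.default_rng(0)
K, d = 3, 50                      # modalities and features per modality (illustrative)
w = rng.normal(size=K * d)        # ground-truth linear weights

def make_data(n):
    X = rng.normal(size=(n, K * d))
    y = (X @ w + rng.normal(scale=1.0, size=n) > 0).astype(int)
    return X, y

X_test, y_test = make_data(5000)
for n in (20, 100, 1000):         # sweep the training sample size
    X, y = make_data(n)
    # Early fusion: a single classifier on the concatenated features.
    ef = LogisticRegression(max_iter=1000).fit(X, y)
    p_ef = ef.predict_proba(X_test)[:, 1]
    # Late fusion: one classifier per modality, probabilities averaged.
    p_lf = np.zeros(len(y_test))
    for k in range(K):
        cols = slice(k * d, (k + 1) * d)
        m = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
        p_lf += m.predict_proba(X_test[:, cols])[:, 1] / K
    print(f"n={n:4d}  EF AUC={roc_auc_score(y_test, p_ef):.3f} "
          f"F1={f1_score(y_test, (p_ef > 0.5).astype(int)):.3f}  "
          f"LF AUC={roc_auc_score(y_test, p_lf):.3f} "
          f"F1={f1_score(y_test, (p_lf > 0.5).astype(int)):.3f}")
```

Repeating such a run many times and recording the variance yields accuracy bands of the kind shaded in Figure 3.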
Figure 5. ROC curve and F1-score of the early fusion model and the gradual fusion model under different sample sizes.
Figure 6. (a) Scatter plot of the accuracy threshold of the early fusion and late fusion models. The X-axis represents the theoretical value derived from N_critical; the Y-axis represents the experimental result. (b) The error distribution between the theoretical and actual critical sample sizes.
Figure 7. Relative error distribution between the actual model accuracy and the theoretical accuracy. (a) Early fusion experiments. (b) Late fusion experiments.
Figure 8. The accuracy of the early fusion model and the theoretical accuracy on nonlinear data. (a) Model accuracy and theoretical accuracy of early fusion on the gradual fusion data; the smaller the sample size, the larger the error between the two. (b) Model accuracy and theoretical accuracy of early fusion on the Lorenz system data; the error between the two remains large at all sample sizes.
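The Lorenz system behind Figure 8b is the classic three-variable chaotic system. As a point of reference, the sketch below generates such nonlinear trajectory data with the standard parameters σ = 10, ρ = 28, β = 8/3; the integration span, initial condition, and sampling grid are illustrative assumptions, and the paper's exact data construction and labeling are not reproduced here.

```python
# Sketch: generating nonlinear trajectory data from the Lorenz system.
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z), x * y - beta * z]

# Integrate one trajectory and sample it on a uniform time grid.
sol = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0],
                t_eval=np.linspace(0.0, 50.0, 5000))
X = sol.y.T                       # rows are (x, y, z) samples along the trajectory
print(X.shape)                    # (5000, 3)
```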
Figure 9. The accuracy of the early fusion model, the late fusion model, and the theoretical accuracy on the hypoproteinemia data set. The maximum error between the actual and theoretical accuracy of the two models exceeds 20%, as described in the model selection paradigm.
Table 1. The experimental parameters and results of the sample size comparison experiment.

Experimental Parameters | N_actual | N_EF | N_LF | N_critical
K = 3, d_max = 50, d_all = 150 | 185 | 151 | 151 | 1170
K = 15, d_max = 10, d_all = 150 | 95 | 151 | 111 | 1106
K = 3, d_max = 150, d_all = 450 | 490 | 451 | 1151 | 1512
K = 15, d_max = 30, d_all = 450 | 340 | 451 | 131 | 1320
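As a reading aid for Table 1, the sketch below illustrates one way an empirical critical sample size such as N_actual could be located: sweep the training-set size and record the first size at which the early fusion model's held-out accuracy reaches that of the late fusion model. The accuracy curves here are synthetic placeholders; in practice they would come from experiments such as the one sketched after Figure 4, and the direction of the reversal is an assumption.

```python
# Sketch: locating the sample size at which the EF/LF dominance reverses.
import numpy as np

def empirical_critical(sizes, acc_ef, acc_lf):
    """Return the first sample size where early fusion is at least as
    accurate as late fusion, or None if no reversal occurs."""
    for n, a_ef, a_lf in zip(sizes, acc_ef, acc_lf):
        if a_ef >= a_lf:
            return n
    return None

sizes = np.arange(50, 2001, 50)
# Illustrative accuracy curves only; real curves come from the experiments.
acc_lf = 0.90 - 0.05 * np.exp(-sizes / 400)
acc_ef = 0.95 - 0.20 * np.exp(-sizes / 400)
print(empirical_critical(sizes, acc_ef, acc_lf))   # prints 450 for these curves
```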