# A Dimensionality Reduction Algorithm for Unstructured Campus Big Data Fusion


## Abstract


## 1. Introduction

- We construct a fusion model for unstructured campus data. Representation models for individual data types are mature, but no existing method integrates the video, audio, image, and other unstructured campus data into a single model. This paper proposes a fusion model for heterogeneous campus data. The model transforms the heterogeneous student data into vector form and builds a sub-tensor model for each source: class video, class images, answer audio, evaluation text, and so on. The semi-tensor product is then used to fuse tensors of different orders, merging each student's sub-tensor models into a single labeled student model.
- We extract core tensors. Once the sub-tensor models are fused, the heterogeneous data can be processed by a variety of algorithms, but the large data volume makes subsequent analysis time-consuming. This paper proposes a core tensor extraction method: the original tensor is decomposed by singular value decomposition and a smaller core tensor is extracted from it, which reduces both the storage required and the computation time.
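The core tensor extraction described above can be illustrated with a plain truncated HOSVD. This is a minimal sketch, not the icHOSVD algorithm of this paper: the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the `ranks` parameter are assumptions introduced for illustration only.

```python
import numpy as np

def unfold(T, mode):
    """Mode unfolding: axis `mode` becomes the rows, all other axes the columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, A, mode):
    """Multiply tensor T by matrix A (shape J x I_mode) along axis `mode`."""
    # tensordot puts A's free axis first; move it back to position `mode`.
    return np.moveaxis(np.tensordot(A, T, axes=(1, mode)), 0, mode)

def truncated_hosvd(T, ranks):
    """Return a small core tensor and one truncated left basis per mode."""
    factors = []
    for mode, k in enumerate(ranks):
        # Left singular vectors of the mode unfolding, truncated to k columns.
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :k])
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each basis
    return core, factors

def reconstruct(core, factors):
    """Approximate the original tensor from the core and the bases."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T
```

With `ranks` equal to the full dimensions the reconstruction is exact up to rounding; smaller ranks trade accuracy for a smaller core.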

## 2. Related Background Knowledge

- ${T}_{m}$ is the mode-$m$ unfolded matrix of tensor $T$;
- $\|T\|$ is the Frobenius norm of tensor $T$;
- ${\times}_{n}$ is the $n$-mode product of a tensor;
- $\otimes$ is the Kronecker product;
- $\propto $ is the semi-tensor product.
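The operations in this notation list can be sketched in NumPy as follows. The helper names are hypothetical, and the semi-tensor product uses the standard left semi-tensor product definition, $(A \otimes I_{t/n})(B \otimes I_{t/p})$ with $t = \operatorname{lcm}(n, p)$ for $A$ with $n$ columns and $B$ with $p$ rows; this is assumed here and may differ in detail from the definition used later in the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolded matrix: axis `mode` indexes rows, the rest are flattened."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def frobenius_norm(T):
    """Frobenius norm: square root of the sum of squared entries."""
    return np.linalg.norm(T)  # with no ord/axis, norm of the flattened array

def mode_product(T, A, mode):
    """n-mode product of tensor T with matrix A of shape (J, T.shape[mode])."""
    return np.moveaxis(np.tensordot(A, T, axes=(1, mode)), 0, mode)

def semi_tensor_product(A, B):
    """Left semi-tensor product; reduces to A @ B when A.shape[1] == B.shape[0]."""
    n, p = A.shape[1], B.shape[0]
    t = int(np.lcm(n, p))
    # Kronecker products with identities make the inner dimensions agree.
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

# Example: a 2 x 4 matrix times a 2 x 3 matrix, which ordinary
# matrix multiplication cannot handle, gives a 2 x 6 result.
A = np.arange(8.0).reshape(2, 4)
B = np.arange(6.0).reshape(2, 3)
C = semi_tensor_product(A, B)
```

The semi-tensor product is what allows matrices or unfolded tensors with mismatched inner dimensions to be multiplied, which is why it is suited to fusing tensors of different orders.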

**Theorem 1.**

**Corollary 1.**

**Theorem 2.**

## 3. icHOSVD Algorithm for Unstructured Campus Big Data Fusion

#### 3.1. Framework of the icHOSVD Algorithm

#### 3.2. Fusion Model of Unstructured Campus Data

#### 3.2.1. Subtensor Model of Heterogeneous Data from Multiple Sources

- (1) The sub-tensor representation method of video data.
- (2) The sub-tensor representation method of audio data.
- (3) The sub-tensor representation method of image data.
- (4) The sub-tensor representation method of text data.

#### 3.2.2. A Tensor Space Fusion Method Based on Semi-Tensor Product

#### 3.3. An icHOSVD Algorithm Based on Tensor

#### 3.3.1. Tensor Segmentation

#### 3.3.2. icHOSVD Algorithm

**Algorithm 1.** The recursive HOSVD algorithm.

**Input:** matrix ${M}_{i}$, matrix ${C}_{i}$

**Output:** new left unitary matrix $U$, positive semi-definite diagonal matrix $\Sigma$, right unitary matrix $V$

1. **if** $i>1$ **then**
2. $({U}_{j},{\Sigma}_{j},{C}_{j})\leftarrow \mathrm{HOSVD}({M}_{i},{C}_{i})$;
3. $\mathrm{blend}({M}_{i-1},{C}_{i-1},{U}_{i-1},{\Sigma}_{j-1},{C}_{j-1})$;
4. $i\leftarrow i-1$;
5. **else if** $i=1$
6. $\mathrm{HOSVD}({M}_{i})$
7. **end**
8. **end**
9. **return** $U,\Sigma ,V$;
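The idea of updating a decomposition when new columns ${C}_{i}$ arrive, rather than recomputing it from scratch, can be sketched with a generic incremental SVD update. This is an illustrative assumption and not the `blend` step of Algorithm 1, whose details are not given here; the function name `svd_append_columns` and its parameters are hypothetical. The sketch relies on the fact that if $M = U\Sigma V^{T}$, then $[M \mid C]$ and the smaller matrix $[U\Sigma \mid C]$ have the same left singular vectors and singular values.

```python
import numpy as np

def svd_append_columns(U, s, C, k):
    """Update a left basis and singular values after appending columns C.

    U : (m, r) left singular vectors of the existing matrix
    s : (r,)   its singular values
    C : (m, c) new columns
    k : number of singular vectors to keep after the update
    """
    # [U diag(s) | C] has the same Gram matrix (times its transpose) as
    # [M | C], so its SVD yields the updated left vectors and singular values.
    small = np.hstack([U * s, C])
    U_new, s_new, _ = np.linalg.svd(small, full_matrices=False)
    return U_new[:, :k], s_new[:k]
```

The update is exact when `U` and `s` come from an untruncated SVD and approximate when they were truncated earlier. The sketch does not track the right singular vectors $V$.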

## 4. Experiment Analysis

- (1) Time complexity.

${C}_{1}$ and ${C}_{2}$ are constants. When columns are added to the raw matrix, decomposing one unfolded matrix by singular value decomposition takes $O\left({k}^{2}n\right)$ time, where $k$ is the number of truncated left singular vectors. A $p$-order tensor has $p$ mode unfolding matrices, so the total time over all unfolded matrices is $O\left(p{k}^{2}n\right)$. The semi-product of a tensor with one truncated basis takes $O\left({k}^{2}n\right)$, so the total semi-product time is $O\left(p{k}^{2}n\right)$. The total time of the icHOSVD algorithm is therefore $O\left(1\right)+O\left(p{k}^{2}n\right)+O\left(p{k}^{2}n\right)=O\left(p{k}^{2}n\right)$.

- (2) Computation accuracy.

**Reconstruction error rate:** The formula for the reconstruction error rate is shown in Formula (18).

**Dimensionality reduction ratio:** The formula for the dimensionality reduction ratio is shown in Formula (19).
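As a rough illustration of how such metrics are commonly computed, the sketch below uses two widely used definitions: the relative Frobenius-norm error between the original and reconstructed tensors, and the ratio of stored entries (core tensor plus truncated bases) to the entries of the original tensor. These definitions and the function names are assumptions and may differ from Formulas (18) and (19).

```python
import numpy as np

def reconstruction_error_rate(T, T_hat):
    """Assumed definition: ||T - T_hat|| / ||T|| in the Frobenius norm."""
    return np.linalg.norm(T - T_hat) / np.linalg.norm(T)

def dimensionality_reduction_ratio(core, factors, original_shape):
    """Assumed definition: entries stored after reduction / original entries."""
    stored = core.size + sum(U.size for U in factors)
    return stored / np.prod(original_shape)
```

Under these assumed definitions, a lower value of either metric is better: a smaller error rate means a more faithful reconstruction, and a smaller ratio means less data to store.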

- (3) Comparison with other methods.

## 5. Summary and Outlook

