# A Method of Combining Hidden Markov Model and Convolutional Neural Network for the 5G RCS Message Filtering

## Abstract

## 1. Introduction

- This paper is the first to propose a method for RCS message filtering.
- The proposed method combines an HMM with a CNN and determines the property of an RCS message (ham or spam) from 210 text fields in which spam information may appear.
- The proposed combination method achieves promising results, with an accuracy of 0.965 and an AUC of 0.965 on the testing set.

## 2. Related Work

## 3. 5G RCS Message Filtering Models and Methods

#### 3.1. The 210 RCS Message Text Fields

#### 3.2. The HMM for RCS Message Short Text Fields Weighting

#### 3.2.1. Short Text Preprocessing

#### 3.2.2. Feature Extraction for Training Short Texts

#### 3.2.3. HMM Training with the Observation Sequence and the Hidden State Sequence

#### 3.2.4. HMM Decoding with the Observation Sequences of Testing Set

#### 3.2.5. Weighting for Other Texts with Fewer Words

#### 3.3. The CNN Model for RCS Message Filtering

#### 3.3.1. Creation of Feature Matrix

#### 3.3.2. 2D Convolution for Relevant Features Extraction

#### 3.3.3. MaxPooling to Avoid Overfitting

#### 3.3.4. Optimizations for CNN

- `Batch normalization`: Batch normalization is used to standardize the extracted relevant features between CNNs. It speeds up learning and reduces the internal covariate shift of a CNN.
- `Dropout`: Before the last fully connected layer, a dropout function runs on the output matrix of the CNNs to reduce computational complexity and redundant information. RCS message filtering is well suited to dropout because the card data of most messages are insufficient.
- `Optimizer`: The Adam optimizer is applied to optimize the parameters and thereby increase accuracy.
- `Dense`: The dense layer, also known as the fully connected layer, applies an activation function to determine the property of an RCS message. Several activation functions, including Sigmoid, Softmax and Linear, were tested. The Linear function is the current setting because it produced the best results.
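The optimizations above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the feature values, the weights, the dropout rate of 0.5, and the 0.5 decision threshold on the linear output are all assumptions made for illustration. The learnable scale and shift of batch normalization and the Adam update itself are left out for brevity.

```python
import random


def batch_normalize(batch, eps=1e-5):
    """Standardize each feature column of a batch to zero mean and unit variance."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    return [
        [(row[j] - means[j]) / (variances[j] + eps) ** 0.5 for j in range(n_features)]
        for row in batch
    ]


def dropout(vector, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


def dense_linear(vector, weights, bias):
    """Fully connected layer with a linear (identity) activation."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


rng = random.Random(0)                                  # fixed seed for repeatability
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]      # hypothetical CNN outputs
normalized = batch_normalize(features)
scores = [
    dense_linear(dropout(row, 0.5, rng), [0.4, -0.2], 0.1)  # hypothetical weights
    for row in normalized
]
# Assumed decision rule: a linear score of 0.5 or more is read as spam.
labels = ["spam" if s >= 0.5 else "ham" for s in scores]
```

At inference time dropout would be called with `training=False`, so that every unit contributes to the linear score.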

## 4. Experiments and Discussion

#### 4.1. Data Analysis

#### 4.2. Experiments and Results

#### 4.3. Discussions

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statements

## Conflicts of Interest

## References

Chosen Fields

| | Title | Content | Menu 1 BT | Menu 1 LD | Menu 1 LC | ... | Menu 4 BT | Menu 4 LD | Menu 4 LC | Card Label |
|---|---|---|---|---|---|---|---|---|---|---|
| Card1 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| Card2 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Card15 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| Message Label | | | | | | | | | | ham or spam |

BT, LD and LC are abbreviations for button text, link domain and link content, respectively.
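The count of 210 text fields is consistent with the layout of the table above: 15 cards, each with a title, a content field, and 4 menus of 3 fields (BT, LD, LC), gives 15 × (2 + 4 × 3) = 210. The sketch below enumerates the fields under that reading. It assumes that the elided cards and menus follow the same pattern as the ones shown, and the field naming scheme is hypothetical.

```python
CARDS = 15
MENUS_PER_CARD = 4
MENU_PARTS = ("BT", "LD", "LC")  # button text, link domain, link content


def text_field_names():
    """List every text field of one RCS message, card by card."""
    names = []
    for card in range(1, CARDS + 1):
        names.append(f"card{card}.title")
        names.append(f"card{card}.content")
        for menu in range(1, MENUS_PER_CARD + 1):
            for part in MENU_PARTS:
                names.append(f"card{card}.menu{menu}.{part}")
    return names


fields = text_field_names()
print(len(fields))  # 210
```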

| | Ham | Spam | Total |
|---|---|---|---|
| Total number | 381 | 59 | 440 |
| Percentage | 86.6% | 13.4% | 100% |

| | Training Set | Testing Set | Total |
|---|---|---|---|
| Total number | 293 | 147 | 440 |
| Percentage | 66.6% | 33.4% | 100% |

| Model | Actual | Predicted Ham | Predicted Spam | Predicted Ham (%) | Predicted Spam (%) | AUC |
|---|---|---|---|---|---|---|
| The proposed method | Spam | 3 | 20 | 13.0% | 86.9% | 0.965 |
| | Ham | 122 | 2 | 98.4% | 1.6% | |

| Model | Class | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| The proposed method | Ham | 0.965 | 0.983 | 0.976 | 0.979 |
| | Spam | | 0.870 | 0.909 | 0.889 |
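As a cross-check, the accuracy and per-class metrics can be recomputed from the confusion matrix counts reported above (122 and 2 for actual ham, 3 and 20 for actual spam). The sketch below uses the standard definitions of precision, recall and F-measure. The helper name `class_metrics` is hypothetical, and the paper's own computation may differ in rounding or convention.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Test-set counts keyed by (actual, predicted), taken from the confusion matrix.
confusion = {
    ("ham", "ham"): 122, ("ham", "spam"): 2,
    ("spam", "ham"): 3, ("spam", "spam"): 20,
}

total = sum(confusion.values())  # 147 test messages
accuracy = (confusion[("ham", "ham")] + confusion[("spam", "spam")]) / total

# Ham as the positive class: false positives are spam predicted as ham.
ham = class_metrics(tp=122, fp=3, fn=2)
# Spam as the positive class: false positives are ham predicted as spam.
spam = class_metrics(tp=20, fp=2, fn=3)

print(f"accuracy={accuracy:.3f}")
print("ham  precision={:.3f} recall={:.3f} f={:.3f}".format(*ham))
print("spam precision={:.3f} recall={:.3f} f={:.3f}".format(*spam))
```

The recomputed accuracy and F-measures agree with the reported values to within one unit in the third decimal place. Under these definitions, ham precision is about 0.976 and ham recall about 0.984 (spam: about 0.909 and 0.870), which is the reverse of the order in the table's Precision and Recall columns. This may reflect a different labeling convention in the original table, and the F-measure is the same either way.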

