# A Method of Combining Hidden Markov Model and Convolutional Neural Network for the 5G RCS Message Filtering


## Abstract


## 1. Introduction

- To the best of the authors' knowledge, this paper is the first to propose a method for RCS message filtering.
- The proposed method combines HMM and CNN techniques and determines the property (ham or spam) of an RCS message based on the 210 text fields where spam information may exist.
- The proposed combination method achieves promising results.
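As a rough illustration of the two-stage idea above, the sketch below weights each text field with a stand-in for HMM decoding and arranges the 210 weights as a matrix for a CNN. All names, the toy spam vocabulary, and the weighting rule are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

N_FIELDS = 210  # text fields per RCS message where spam may appear

def hmm_field_weight(text: str) -> float:
    # Stand-in for HMM decoding: returns a spam weight in [0, 1] as the
    # fraction of words drawn from a toy spam vocabulary. A real
    # implementation would run Viterbi over the word observation sequence.
    spam_words = {"win", "free", "prize"}
    words = text.lower().split()
    return sum(w in spam_words for w in words) / max(len(words), 1)

def message_to_features(fields: list[str]) -> np.ndarray:
    # Weight each of the 210 fields and arrange the weights as a
    # 15 x 14 matrix (15 cards x 14 fields) suitable for 2D convolution.
    weights = [hmm_field_weight(f) for f in fields]
    return np.array(weights).reshape(15, 14)
```

A CNN classifier would then consume this matrix in place of raw text, which is the coupling the proposed combination relies on.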

## 2. Related Work

## 3. 5G RCS Message Filtering Models and Methods

#### 3.1. The 210 RCS Message Text Fields

#### 3.2. The HMM for RCS Message Short Text Fields Weighting

#### 3.2.1. Short Text Preprocessing

#### 3.2.2. Feature Extraction for Training Short Texts

#### 3.2.3. HMM Training with the Observation Sequence and the Hidden State Sequence

#### 3.2.4. HMM Decoding with the Observation Sequences of Testing Set

#### 3.2.5. Weighting for Other Texts with Fewer Words

#### 3.3. The CNN Model for RCS Message Filtering

#### 3.3.1. Creation of Feature Matrix

#### 3.3.2. 2D Convolution for Relevant Features Extraction

#### 3.3.3. MaxPooling to Avoid Overfitting

#### 3.3.4. Optimizations for CNN

- `Batch normalization`—Batch normalization is used to standardize the extracted relevant features between convolutional layers. It not only speeds up learning but also reduces the internal covariate shift of the CNN.
- `Dropout`—Before the last fully connected layer, a dropout function runs on the output matrix of the CNN to reduce computational complexity and redundant information. RCS message filtering is a dropout-suitable scenario because the card data of most messages are sparse.
- `Optimizer`—The Adam optimizer is applied to optimize the parameters, which improves accuracy.
- `Dense`—The dense layer, also known as the fully connected layer, runs an activation function to determine the property of an RCS message. Several activation functions, such as Sigmoid, Softmax and Linear, were tested; the Linear function is the current setting as it produces the best results.
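The two regularizers listed above can be sketched in a few lines of numpy. This is a minimal illustration of the operations themselves (with inverted dropout, and function names of our own choosing), not the paper's model code, which would use standard CNN layers.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Standardize each feature across the batch: zero mean, unit variance.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, rate=0.5, rng=None, training=True):
    # Randomly zero a fraction `rate` of activations during training and
    # rescale the survivors so the expected activation is unchanged.
    # At inference time the input passes through untouched.
    if not training:
        return x
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

features = np.random.default_rng(1).normal(loc=3.0, size=(32, 8))
normed = batch_norm(features)          # per-feature mean ~0, std ~1
dropped = dropout(normed, rate=0.5)    # ~half the activations zeroed
```

Rescaling by `1 / (1 - rate)` at training time is what lets the inference path skip dropout entirely, a common design choice.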

## 4. Experiments and Discussion

#### 4.1. Data Analysis

#### 4.2. Experiments and Results

#### 4.3. Discussions

## 5. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


| Chosen Fields | | | Menu 1 | | | ... | Menu 4 | | | Card Label |
|---|---|---|---|---|---|---|---|---|---|---|
| Card1 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| Card2 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Card15 | Title | Content | BT | LD | LC | ... | BT | LD | LC | ham or spam |
| Message Label | | | | | | | | | | ham or spam |

^{a} BT, LD and LC are abbreviations for button text, link domain and link content, respectively.
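The figure of 210 text fields follows directly from the table's structure; a quick sanity check (our own arithmetic, not code from the paper):

```python
# Each of the 15 cards carries a title, a content field, and 4 suggestion
# menus, each with button text (BT), link domain (LD) and link content (LC).
fields_per_card = 2 + 4 * 3   # title + content + 4 menus x (BT, LD, LC)
total_fields = 15 * fields_per_card
print(total_fields)  # 210
```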

| | Ham | Spam | Total |
|---|---|---|---|
| Total number | 381 | 59 | 440 |
| Percentage | 86.6% | 13.4% | 100% |

| | Training Set | Testing Set | Total |
|---|---|---|---|
| Total number | 293 | 147 | 440 |
| Percentage | 66.6% | 33.4% | 100% |

| Model | Actual | Predicted Ham | Predicted Spam | Predicted Ham % | Predicted Spam % | AUC |
|---|---|---|---|---|---|---|
| The proposed method | Spam | 3 | 20 | 13.0% | 86.9% | 0.965 |
| | Ham | 122 | 2 | 98.4% | 1.6% | |

| Model | Class | Accuracy | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| The proposed method | Ham | 0.965 | 0.983 | 0.976 | 0.979 |
| | Spam | | 0.870 | 0.909 | 0.889 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gao, B.; Zhang, W.
A Method of Combining Hidden Markov Model and Convolutional Neural Network for the 5G RCS Message Filtering. *Appl. Sci.* **2021**, *11*, 6350.
https://doi.org/10.3390/app11146350
