Food safety event detection is a technique used to discover food safety events by monitoring online news. In general, a set of keywords are extracted as features to represent news, and then the news is clustered to generate events. The most popular method for news feature extraction is Term Frequency-Inverse Document Frequency (TF-IDF), however, it has some defects such as being prone to the “dimension disaster”, low computational efficiency, and a lack of semantic information. In addition, Latent Dirichlet Allocation (LDA) is also widely used in news representation. Despite its low dimension, it still suffers from some drawbacks such as the need to set a predefined number of clusters and has difficulty recognizing new events. In this paper, a method based on multi-feature fusion is proposed, which combines the TF-IDF features, the named entity features, and the headline features to represent the news. Based on the representations, the incremental clustering method is used to cluster the news documents and to detect food safety events. Compared with the traditional methods, the proposed method achieved higher Precision, Recall, and F1 scores. The proposed method can help regulatory authorities to make decisions and improve the reputation of the government, whilst reducing social anxiety and economic losses.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited