BiTTM: A Core Biterms-Based Topic Model for Targeted Analysis

Abstract: While most of the existing topic models perform a full analysis on a set of documents to discover all topics, it is noticed recently that in many situations users are interested in fine-grained topics related to some specific aspects only. As a result, targeted analysis (or focused analysis) has been proposed to address this problem. Given a corpus of documents from a broad area, targeted analysis discovers only topics related with user-interested aspects that are expressed by a set of user-provided query keywords. Existing approaches for targeted analysis suffer from problems such as topic loss and topic suppression because of their inherent assumptions and strategies. Moreover, existing approaches are not designed to address computation efficiency, while targeted analysis is supposed to provide responses to user queries as soon as possible. In this paper, we propose a core BiTerms-based Topic Model (BiTTM). By modelling topics from core biterms that are potentially relevant to the target query, on one hand, BiTTM captures the context information across documents to alleviate the problem of topic loss or suppression; on the other hand, our proposed model enables the efficient modelling of topics related to specific aspects. Our experiments on nine real-world datasets demonstrate BiTTM outperforms existing approaches in terms of both effectiveness and efficiency.


Introduction
Topic modelling as unsupervised learning has become a prevalent text mining tool for discovery of hidden semantic structures in a text body. Given a collection of documents, most of the existing topic models perform a full analysis to discover all topics occurring in the corpus. However, it was recently noticed [1] that in many situations users are interested in focused topics related to some specific aspects only. For example, given a set of Amazon product reviews, a user might be interested only in bedding products. A conventional topic model performing full analysis will identify all topics from the entire corpus such as "furniture", "food" and "clothing". Although the topic of "furniture" is related to the user interested aspect of "bedding products", it is too coarse as the user might be more interested in fine-grained topics like "bed frames" and "mattress". As a result, targeted (or focused) analysis is proposed by Wang et al. [1] to discover topics relevant to targeted aspects only. Particularly, given a corpus of documents from a broad domain and a set of user-provided keywords representing user-interested aspects, targeted analysis aims to discover topics related with the queried aspects only.
Methods for targeted analysis can be generally categorised into two groups: (1) conventional topic models incorporating filtering strategies and (2) specialised topic models. However, methods of both categories suffer from problems such as topic loss and topic suppression, because of the limitations of their respective assumptions and strategies.
For algorithms in the first group, both pre-filtering and post-filtering strategies can be adopted to empower full-analysis topic models to find topics related to queried aspects. The pre-filtering strategy retains only documents containing the query keywords and extracts topics from the retained "partial data". The quality of the discovered topics thus depends heavily on the user-supplied query keywords. If the keywords are not appropriate or comprehensive enough, many relevant documents will be filtered out, which incurs a significant topic loss. For example, if a user provides "bath" as a query keyword, documents that lack the keyword but contain synonyms like "shower" or similar words like "bathtub" will be filtered out although such documents are actually relevant. Consequently, topics are very likely to be lost when modelling from the retained partial data. A post-filtering strategy applies conventional topic models to first identify all topics in the corpus and then filters out the topics that do not contain the query keywords. However, as analysed in [1], such a strategy may result in topic suppression when the query keywords are infrequent in the corpus. Topic suppression means that topics related to the user-interested aspect are suppressed by general topics.
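The two filtering strategies can be illustrated with a minimal Python sketch. The function names `pre_filter` and `post_filter`, the whitespace tokenisation and the toy reviews are assumptions made for illustration only; they are not taken from any of the cited systems.

```python
def pre_filter(documents, query_keywords):
    """Keep only documents that contain at least one query keyword."""
    keywords = {k.lower() for k in query_keywords}
    # Hypothetical tokenisation: lower-case and split on whitespace.
    return [doc for doc in documents if keywords & set(doc.lower().split())]


def post_filter(topics, query_keywords):
    """Keep only topics whose top words include a query keyword."""
    keywords = {k.lower() for k in query_keywords}
    return [topic for topic in topics if keywords & set(topic)]


documents = [
    "this bath seat is sturdy",
    "the shower chair fits our bathtub",  # relevant, but lacks the keyword "bath"
    "great crib mattress",
]
retained = pre_filter(documents, ["bath"])

# Topics are represented here simply as lists of their top words.
topics = [["bath", "water", "seat"], ["crib", "bed", "mattress"]]
kept_topics = post_filter(topics, ["bath"])
```

In this toy run the second review is dropped by pre-filtering even though it concerns the same aspect, which is the topic-loss situation described above.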
For algorithms in the second group, TTM [1] is the first and the state-of-the-art. TTM is a sparse topic model designed to directly mine focused topics based on user-provided query keywords. TTM simulates two topic-word distributions: φ^r for relevant topics and φ^ir for irrelevant topics. It considers documents at the sentence level and introduces a variable r to indicate the status of a sentence (e.g., relevant or irrelevant). Words are then sampled from φ^r or φ^ir according to the sentence status. Although TTM can accomplish the targeted analysis to a certain extent, the effectiveness of TTM is handicapped by its scheme of processing at the sentence level and its assumption that each sentence focuses on only one topic. By considering sentences individually and separately, topic information between consecutive sentences may be lost, which results in inferior topic qualities and possible topic loss. By assuming that each sentence is related with only one aspect, it is very likely for TTM to mistakenly assign relevance status for sentences related with multiple topics, which is often the case for long sentences. The wrong assignment of sentence statuses will in turn lead to possible missing of meaningful topics.
A common challenge faced by algorithms of both categories is computation efficiency. While full analysis of topics is largely performed offline, targeted analysis is more likely to be an online module that is supposed to respond to user queries as soon as possible. However, existing algorithms for targeted analysis, especially the post-filtering strategy and the specialised topic models, are not devised to address this issue. The pre-filtering strategy may gain efficiency by modelling topics from a reduced set of "partial data", but it achieves this at the cost of losing important topics.
To address the aforementioned issues, we propose a novel Core BiTerm-based Topic Model (BiTTM) for targeted analysis, which directly models fine-grained topics related to the queried aspect from a set of core biterms. A biterm, proposed in BTM [2], is a word pair consisting of two different words that appear together in a fixed-size window and represent co-occurrence information. Building on biterms, we introduce core biterms as a set of selected biterms that have strong connections with query keywords. By modelling topics from the set of core biterms, BiTTM is expected to achieve better performance than existing specialised topic models in terms of the following aspects:
1. The existing specialised topic models for targeted analysis (i.e., TTM and APSUM [3]) process at either the sentence level or the word level so that the semantic information between consecutive sentences will be lost. In contrast, since a biterm may consist of two words coming from two successive sentences, information across the whole document can be captured by BiTTM to alleviate the issue of losing topics.
2. The TTM model samples relevance status at the sentence level, which may be too coarse. When a sentence is related to multiple topics, it is difficult to infer the relevance status of the sentence as a binary value. In contrast, the APSUM model [3] samples relevance status for individual words, which may be too specific, because it cannot handle phrases that make sense only when multiple words are considered together. Biterms, as a scheme in between sentences and words, are expected to achieve more accurate inference of relevance status.
3. Existing specialised topic models have no mechanism to accelerate computation without losing significant semantic information. In contrast, BiTTM introduces a heuristic preprocessing step based on core biterms that speeds up topic modelling while alleviating information loss, which makes it a more pragmatic solution for targeted analysis according to user queries.
To comprehensively evaluate the performance of BiTTM, extensive experiments have been conducted on real-world datasets including short texts, medium texts and long texts. Moreover, we select a large number of targets with different word and document frequencies to explore the adaptability of BiTTM to various types of queries. The experimental results show that (1) BiTTM improves the quality of topics, alleviates topic loss, and outperforms the baselines, especially for query keywords of low frequencies; (2) the time cost of BiTTM is the lowest and the most stable among the compared methods, which demonstrates the high applicability of BiTTM on datasets with different characteristics.
The remainder of this paper is organised as follows. Prior research and related works are reviewed in Section 2. We provide technical details of BiTTM in Section 3, and discuss the experimental results in Section 4. Finally, Section 5 closes this paper with some concluding remarks.

Related Work
In this section, we introduce works related to our research in three parts. Firstly, we review existing specialised topic models for targeted topic analysis. Secondly, we describe the model of BTM that introduces the concept of biterms for topic modelling. Thirdly, we discuss other topic models relevant to our proposed BiTTM.

Targeted Topic Models
Specialised topic models for targeted analysis are still rare; the existing ones are mainly used for information retrieval [4], abstract extraction [3,5] and opinion mining [1,6]. TTM [1] and APSUM [3] are the two most representative models.
Wang et al. were the first to study the problem of detecting relevant and user-concerned topics from a given dataset [1], and proposed the model TTM as illustrated in Figure 1. The main idea of TTM is to introduce a relevance variable r to indicate whether a sentence is related with a specified aspect. The variable r determines whether each word in a sentence is generated by a related topic or an irrelevant topic. Moreover, the relevant topic-word distribution φ^r is sparse because the number of words related to the target is usually less than that of the irrelevant words. The steps of the generative process are illustrated as follows:
Therefore, TTM considers the status r at the sentence level. It is difficult to determine whether a sentence is related to the target when the sentence contains multiple topics. The wrong assignment of sentence status will negatively affect the quality of topics.
APSUM [3] is a generative aspect summarisation model designed for fine-grained summaries of online reviews. Compared with TTM, APSUM is different in terms of the following two aspects. Firstly, while TTM models the relevance at the sentence level, APSUM considers at the word level. As discussed in Section 1, the former might be too coarse to determine the relevance status for sentences appropriately; the latter is not able to handle phrases where it makes sense only when multiple words are considered together. Secondly, APSUM introduces an additional component called document aggregator to mitigate the issue of aspect sparsity, which refers to the circumstances where there are not enough text data related with specific aspects. The basic idea is to cluster similar documents through document aggregator and sample topics for documents at the document aggregator level.
Essentially, both TTM and APSUM try to identify potentially related words that can serve as bridges to link relevant documents, especially those that do not contain query keywords. TTM searches for such related words in sentences containing query keywords; APSUM looks for them in semantically similar documents through the document aggregator. However, both models fail to capture the semantic information between neighbouring sentences, which does exist in natural language [7].

BTM
BTM [2] is a topic model for short texts. To alleviate the problem of insufficient information with short texts, this model extracts topics by modelling from biterms representing word co-occurrences. A biterm is an unordered word pair consisting of two different words in a fixed-length text window. BTM replaces documents with a biterm set that can reveal the correlation between words in depth.
The graphical model of BTM is depicted in Figure 2 with a generative process as follows:

For each biterm
BTM is designed for full analysis of short texts. While BTM uses biterms to capture word co-occurrences and thereby alleviate data sparsity, we borrow the idea in our targeted analysis model to capture words closely related with the query keywords provided by users.
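For reference, the generative process of BTM as published in [2] draws one corpus-level topic distribution, one word distribution per topic, and then, for each biterm, a single topic from which both words are drawn. The following Python sketch simulates that process; the function names, the symmetric priors passed as `alpha` and `beta`, and the toy vocabulary are illustrative assumptions.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet via normalised gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_btm_biterms(vocab, num_topics, num_biterms, alpha, beta, seed=0):
    """Simulate BTM: one topic per biterm, both words drawn from that topic."""
    rng = random.Random(seed)
    # Corpus-level topic distribution (not one per document).
    theta = sample_dirichlet(rng, alpha, num_topics)
    # One topic-word distribution per topic.
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    biterms = []
    for _ in range(num_biterms):
        z = rng.choices(range(num_topics), weights=theta)[0]
        w_i, w_j = rng.choices(vocab, weights=phi[z], k=2)
        biterms.append((z, w_i, w_j))
    return theta, phi, biterms


vocab = ["bath", "shower", "crib", "bed"]
theta, phi, biterms = generate_btm_biterms(vocab, num_topics=2, num_biterms=10,
                                           alpha=1.0, beta=0.5)
```

The single topic draw per biterm is the point that BiTTM later relaxes for relevant biterms.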

Other Topic Models
Topic models have been widely studied and used in different applications, such as analysing public opinions and trends [8][9][10][11][12], providing personalised user services [13][14][15] and news tracking [16][17][18]. Among a wide range of existing topic models, in this subsection, we discuss two types of topic models that are related with our BiTTM.
Firstly, since we consider user queried aspects, our model is linked with topic models considering user information (e.g., user profiles and behaviours). In the literature, there are many topic models that take into account user information to obtain in-depth analysis [19][20][21][22][23][24]. For example, Viet et al. exploit users' browsing histories to propose a keyword-topic model [19] for contextual advertising. Kalyanam et al. [20] simultaneously consider textual data and user behaviours, such as forwarding and commenting, to explore the evolution of topics. Sordo et al. [21] consider the topological changes of users' co-authorship network to identify groups of researchers. Although these models can incorporate user information into topic analysis, they cannot extract fine-grained topics related with user-interested specific aspects.
Secondly, BiTTM is also connected with sparse topic models [25][26][27][28][29][30][31][32]. The notable feature of these models is the consideration of distribution skewness that can be divided into two categories. Firstly, a document is related with only a few topics among all topics available in the data set. Secondly, a topic involves only a small part of the dictionary. A lot of sparse topic models have been devised based on the two types of distribution skewness. For example, Williamson et al. [30] and Chen et al. [28] address the document skewness, while Wang et al. [33] take into account the topic skewness. Moreover, the method called "dual-sparse topic model" [25] implements both types of skewness simultaneously. Generally speaking, the sparsity is addressed by incorporating the "Spike and Slab" priors: the "Spike" is used to control the selections of words; and the "Slab" is used to smooth distributions to avoid ill-defined distributions where some words never appear. Our BiTTM also considers both document skewness and topic skewness through the spike and slab. Differently, we address the sparsity by taking into account user-interested aspects at the same time.

BiTTM
In this section, we describe BiTTM for efficient topic analysis of targeted aspects. In Section 3.1, we introduce the concept of core biterms and the process to generate core biterms. Sections 3.2 and 3.3 discuss the generative process and inference of BiTTM, respectively.

Core Biterms
Considering the user-specified aspect usually involves only part of the data, we believe data preprocessing is an indispensable step for efficient targeted analysis. However, existing specialised topic models perform directly on the entire dataset ignoring the efficiency issue. Existing methods incorporating pre-filtering strategies, as discussed before, achieve certain efficiency by modelling topics from a reduced data set; nevertheless, the reduced data set may lose relevant documents if the targets are not expressed appropriately or comprehensively. For example, Table 1 enumerates three situations where query keywords may easily be incomplete, resulting in possible loss of relevant documents and topics.

Example 2 Domain restriction
Two candidate query keywords, "crib" and "bed", in the Amazon review dataset Baby.

Example 3 Event Description
Two candidate query keywords, "mistake" and "oscarsfail", in the Twitter dataset Oscars.

1. Synonyms. For example, if the supplied query keyword is "bath", relevant documents containing words representing similar semantics, such as "shower", may be missed.
2. Words referring to the same targeted aspect in a particular domain. For example, when the domain is confined to Amazon reviews of baby products, the keywords "crib" and "bed" represent the same aspect, although they are not exactly synonyms.
3. Words describing the same event. Users often use diverse words to refer to the same event, especially in social networks. For example, considering the Twitter dataset of Oscars, both "mistake" and "oscarsfail" are used to describe the event of a wrong envelope for the Best Picture Award.
To address the aforementioned issues, we propose an efficient data preprocessing method based on core biterms.
As introduced in BTM [2], a biterm consists of any two distinct words in a fixed-length window, so that it captures the co-occurrence information in the document. As the window may span two or more sentences, the semantic information between consecutive sentences can be captured. Compared with TTM and APSUM, processing at the level of biterms addresses the potential loss of information between successive sentences. Therefore, we consider biterms as the base unit of our preprocessing.
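A minimal Python sketch of biterm extraction is given below. The function name `extract_biterms`, the representation of a document as a flat token list and the window size are assumptions for illustration; sentence boundaries are deliberately ignored so that a window can span neighbouring sentences.

```python
def extract_biterms(tokens, window_size):
    """Return unordered pairs of distinct words that co-occur within a window."""
    biterms = []
    for i, first in enumerate(tokens):
        # Pair the word at position i with the words that follow it in the window.
        for second in tokens[i + 1:i + window_size]:
            if first != second:
                # Sorting makes the pair unordered: (a, b) and (b, a) coincide.
                biterms.append(tuple(sorted((first, second))))
    return biterms


# Tokens of two short sentences concatenated into one document.
tokens = ["rinses", "well", "dries", "quickly"]
biterms = extract_biterms(tokens, window_size=3)
```

With a window of three tokens, "rinses" is paired with "well" and "dries" but not with "quickly", which lies outside the window.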
To handle the situations exemplified in Table 1, we use "core words" to complement query keywords so that relevant documents that do not explicitly contain query keywords can be considered. Intuitively, if core words represent the same aspect indicated by query keywords, they should appear together with query keywords very often. Hence, we first extract "core words" that frequently co-occur with query keywords from biterms, and then extract frequent biterms containing core words as "core biterms". The procedure is given in Algorithm 1 and can be summarised in three steps as follows:
Step 1: Calculate the desired size of the set of core words, scw, and rank all biterms in B_all in descending order of frequency (Lines 1-2).
Step 2: Acquire core words from the most frequent biterms containing the target, and then calculate the average frequency of biterms containing core words as the threshold (Lines 3-15).
Step 3: Select core biterms according to two conditions. Firstly, the biterm has at least one core word. Secondly, the frequency of the biterm has to be greater than the threshold (Lines 16-20).
We will then model targeted topics from the generated core biterms, which yields a threefold benefit as follows: (1) the context information between neighbouring sentences is preserved; (2) sampling relevance status based on biterms is more accurate; and (3) modelling topics from core biterms is more efficient.

Algorithm 1: Preprocessing based on biterms
Input: size of dictionary W, biterms B_all
Output: biterms B_core with semantic links to the target
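The three steps summarised above can be sketched in Python as follows. This is not the authors' Algorithm 1: the function name `select_core_biterms` is hypothetical, the size of the core word set is passed in as `num_core_words` because the rule that derives it from the dictionary size is not reproduced here, and treating only the partner words of the target as core words is an assumption.

```python
from collections import Counter


def select_core_biterms(all_biterms, target_keywords, num_core_words):
    """Pick core words near the target, then frequent biterms containing them."""
    targets = set(target_keywords)
    # Step 1: rank biterms in descending order of frequency.
    ranked = Counter(all_biterms).most_common()

    # Step 2: collect core words from the most frequent biterms containing a target.
    core_words = []
    for (w1, w2), _ in ranked:
        if len(core_words) >= num_core_words:
            break
        if w1 in targets or w2 in targets:
            for word in (w1, w2):
                if word not in targets and word not in core_words:
                    core_words.append(word)
    core_words = set(core_words[:num_core_words])

    # Threshold: average frequency of the biterms that contain a core word.
    core_freqs = [f for (w1, w2), f in ranked if w1 in core_words or w2 in core_words]
    if not core_freqs:
        return core_words, []
    threshold = sum(core_freqs) / len(core_freqs)

    # Step 3: keep biterms with a core word and a frequency above the threshold.
    core_biterms = [b for b, f in ranked
                    if (b[0] in core_words or b[1] in core_words) and f > threshold]
    return core_words, core_biterms


# Toy biterms, already normalised as sorted tuples.
all_biterms = ([("bath", "shower")] * 4 + [("shower", "tub")] * 3
               + [("bath", "seat")] + [("crib", "sheet")] * 5 + [("seat", "shower")])
core_words, core_biterms = select_core_biterms(all_biterms, ["bath"], num_core_words=1)
```

In this toy input, "shower" becomes the core word for the target "bath", so the biterm (shower, tub) is retained even though it does not contain the query keyword.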

Model Description & Generative Process
In this subsection, we describe the model and the generative process of BiTTM. Table 2 lists the notations used in this paper.
The generative process is as follows:
The graphical representation of BiTTM is shown in Figure 3. Following the above procedure, the generative process can be summarised into three parts. Firstly, we draw two global parameters θ and φ^ir. The former is a topic distribution modelled on the entire corpus instead of on one document, and the latter is the topic-word distribution of the irrelevant topic. In other words, the two words in an irrelevant biterm are drawn from only one irrelevant topic. Secondly, φ^r_k is drawn for each target-relevant topic k ∈ {1, 2, ..., K}. Please note that two smoothing parameters, the smoothing prior δ and the weak smoothing prior ε, are used for dual-sparsity [25]. Thirdly, the status r of b_i is determined by both the target indicator x and Bernoulli(π_b). According to the two different types of status, relevant or irrelevant, we draw a word from φ^r or φ^ir.

Table 2. Notations used in this paper.
Notation | Meaning
B | the set of (core) biterms
W | the set of words
D | the set of documents
π_b | the Bernoulli distribution over biterm b
γ, β^ir, α | beta prior of π_b, Dirichlet prior of φ^ir, Dirichlet prior of θ
φ^ir | topic-word distribution over the irrelevant topic
φ^r_k | topic-word distribution over the kth relevant topic
p, q | beta prior of ω
δ, ε | word smoothing prior, weak word smoothing prior
ω_k | Bernoulli distribution of word selector β^r_k
x, w, z, r | target indicator, word, topic, status
β^r_{w|k}, β^r_{*|k} | word selector of word w under topic k, the sum of word selectors β^r_{w|k}

Different from the generative process of BTM, BiTTM draws two topics for a relevant biterm, and each word in the biterm may be assigned a different topic. We choose this strategy because it is inappropriate, for targeted analysis, to assume that the two words in a biterm share the same topic. For BTM, it is probably sufficient to draw one topic per biterm, as it is a full-analysis model that aims to mine coarse-grained topics.
Here is an example to elaborate the difference between full analysis and targeted analysis. When dealing with the biterms b_1 (battery, larger) and b_2 (lens, larger), BTM is prone to assign the same topic to the two biterms because of the shared word "larger". This allocation might be fine for full analysis, which does not pursue fine-grained topics and therefore does not need to distinguish between "battery" and "lens". However, for targeted analysis, "battery" and "lens" represent two different aspects and should be recognised as distinct. Note that, although the two words in a biterm are sampled independently of each other, their combined effects determine the status (i.e., relevant or irrelevant) of the biterm.
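The three-part description above can be illustrated with a small Python sketch of how a single biterm might be generated. It is a sketch under stated assumptions rather than the authors' procedure: the function name `generate_biterm` is hypothetical, the distributions are passed in as plain lists instead of being drawn from their priors, and the rule that a biterm containing a query keyword is always relevant while any other biterm follows a Bernoulli(π_b) draw is an assumed reading of how x and π_b interact.

```python
import random


def generate_biterm(rng, vocab, theta, phi_rel, phi_irr, contains_target, pi_b):
    """Draw one biterm; returns (relevant, topics, words)."""
    # Assumption: a biterm holding a query keyword (x = 1) is relevant;
    # otherwise its status r is a Bernoulli(pi_b) draw.
    relevant = contains_target or rng.random() < pi_b
    if relevant:
        # One topic per word, so the two words may belong to different topics.
        topics = tuple(rng.choices(range(len(theta)), weights=theta, k=2))
        words = tuple(rng.choices(vocab, weights=phi_rel[z])[0] for z in topics)
    else:
        # Both words come from the single irrelevant topic.
        topics = (None, None)
        words = tuple(rng.choices(vocab, weights=phi_irr, k=2))
    return relevant, topics, words


rng = random.Random(0)
vocab = ["battery", "lens", "larger"]
theta = [0.5, 0.5]
# Hypothetical relevant topics for "battery" and "lens", plus one irrelevant topic.
phi_rel = [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2]]
phi_irr = [0.1, 0.1, 0.8]
sample = generate_biterm(rng, vocab, theta, phi_rel, phi_irr, True, 0.3)
```

Because each word of a relevant biterm receives its own topic, "battery" and "lens" can be kept apart even when both co-occur with "larger".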

Inference
Following BTM [2] and TTM [1], we choose Gibbs Sampling [34] to infer the model parameters. All notations used in this section are shown in Table 2.
We first sample the status of every biterm. Intuitively, if a biterm contains a query keyword, then it is relevant to the target aspect. Let d be a binary variable, where d = 1 indicates that a biterm contains the keyword provided by users. Then, we define the probability that a biterm is relevant as

Otherwise, we define the probability as shown below:

Next, we sample the word selector β^r_w for all words w ∈ W. Applying Gibbs Sampling similar to TTM [1], we can obtain the equation P(β^r_w | β^r_{−w}, w, δ, ε, p, q) ∝ P(β^r, w | δ, ε, p, q). Then, the probability of sampling k as the topic for w_{i,m} can be computed as Equation (3).

Baselines and Metrics
Baselines. Three methods are chosen to be compared with BiTTM, including Targeted Topic Model (TTM), Biterm Topic Model-Partial Data (BTM-PD), and Biterm Topic Model with a post-filtering strategy (BTM).
• TTM. Targeted Topic Model is the first method for focused analysis that extracts related topics according to a target keyword provided by users. We select TTM rather than APSUM as the baseline of specialised topic models for targeted analysis because TTM outperforms APSUM in terms of topic coherence when the number of topics is less than 50 [3]. For targeted analysis of fine-grained topics, we believe the number of topics in a given corpus is usually less than 50. Moreover, TTM serves as the most valuable comparison because APSUM is not exactly designed for targeted analysis.
• BTM. As our model is developed based on biterms, we also compare with two variations of BTM that are adapted for targeted analysis. BTM is a state-of-the-art topic model for short texts, which also applies to long texts [2]. As a typical full-analysis model, BTM aims to find all topics (or all aspects) from the entire corpus. We then use a post-filtering strategy to eliminate topics that do not contain the target keywords. This approach is named BTM for simplicity.
• BTM-PD. This is another variation of BTM, which applies the pre-filtering strategy to perform focused analysis. We use only the subset of documents containing the target keywords to model topics. As discussed before, the pre-filtering strategy is handicapped by the variability of target keywords: relevant documents may be filtered out, so that topics may be missed.

Metrics.
We adopt two techniques to evaluate the quality of topics: topic coherence [35] and precision@n [1] (P@n for short). The former is a popular method for evaluating the quality of discovered topics [36][37][38][39][40]. As an automated evaluation metric, topic coherence mainly measures the interpretability of topics instead of target-relevance. More specifically, topic coherence measures document-level mutual information of keywords in topics, but it does not reflect the relationship between topics and targets. In order to evaluate whether topics are target-relevant, we employ the metric P@n, also used by TTM [1], which is an evaluation based on human judgment to assess the relevance between the target and topics.
Considering the M most probable words in topic k, the topic coherence of k is defined as Equation (7).

$$C(k) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{|\{doc(w_{k,m}, w_{k,l})\}| + 1}{|\{doc(w_{k,l})\}|} \qquad (7)$$

where |{doc(w_{k,m}, w_{k,l})}| is the number of documents containing both w_{k,m} and w_{k,l}; |{doc(w_{k,l})}| is the number of documents containing w_{k,l}, and w_{k,l} is the lth most probable word in topic k. For the mth most probable word, the measure considers its co-occurrence with the m − 1 more probable words. A smoothing count of 1 is added to avoid taking the logarithm of zero. The closer the measure is to zero, the more coherent the discovered topics are.
Given the set of topics discovered by all models, suppose there are Ku topics that have been verified by users to be related with the target aspect. Moreover, from all topics discovered by a particular model m, suppose there are Km topics related with the target. Then, the precision of model m at rank position n is defined as follows:

where |{CorrectWords(z)}| is the number of words, among the top n words of topic z, which are relevant to the target (note that, if a discovered topic is potentially related with multiple semantic topics, the best semantic topic based on the top 20 words will be adopted).
The two evaluation methods therefore have different merits and objectives. Topic coherence is an automated evaluation metric reflecting the interpretability of topics, whereas P@n demands human judgement and assesses the relevance between the discovered topics and the queried target. For the sake of fairness, we use P@n to evaluate all compared models (i.e., BiTTM, TTM, BTM-PD and BTM) to find out the effectiveness of the models in performing focused analysis. However, we only compare BiTTM and TTM in terms of topic coherence to evaluate the topic quality, since the other two models are variations of BTM, which is essentially designed for full analysis of topics.
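The coherence measure described above, with document co-occurrence counts and a smoothing count of 1, can be computed with a few lines of Python. The function name `topic_coherence` and the representation of documents as token lists are illustrative assumptions, and every top word is assumed to occur in at least one document so that the denominator is never zero.

```python
import math


def topic_coherence(top_words, documents):
    """Sum of log smoothed co-document ratios over ordered pairs of top words."""
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # top_words is ordered from the most to the least probable word.
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_count(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_count(top_words[l]))
    return score


documents = [["bath", "water"], ["bath", "water", "toy"], ["toy"]]
score = topic_coherence(["bath", "water"], documents)
```

Here "bath" and "water" co-occur in two documents and "bath" occurs in two documents, so the score equals log(3/2).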

Data sets & Experimental Settings
Data sets. In order to comprehensively evaluate the performance of our proposed model, we conduct experiments on different types of text. In particular, three types of documents are considered, including short, medium and long texts. For each type of documents, we select three data sets. The description of the nine datasets used in our experiments is provided in Table 3. The datasets are all publicly available at the URLs listed in the bottom of Table 3.

Experimental Settings.
In our experiments, we use various words as target queries to analyse the influence exerted by diverse targets on performance. For parameter settings, we follow the hyper-parameter setting in TTM: α = γ = 1, β^ir = 0.001, p = q = 1, and the two smoothing priors are set as δ = 0.001 and ε = 1 × 10^−7. Other baselines follow the parameter settings in their respective papers.

Quantitative Evaluation
In this subsection, we analyse the quality of discovered topics from two aspects: topic coherence (representing topic interpretability or semantic coherence) and P@n (indicating topic relevance).
Analysing the results of topic coherence: The average topic coherence achieved by BiTTM and TTM is shown in Table 4. The closer the score is to zero, the more coherent the discovered topics are. As we can see from the table, BiTTM does not match TTM on short texts in terms of topic coherence. However, as document length increases, BiTTM starts to outperform TTM. BiTTM generally works better than TTM on medium and long texts because TTM is a sentence-based model, for which the information between consecutive sentences is lost. In contrast, by considering core biterms that may come from neighbouring sentences, our BiTTM model captures the semantics crossing sentences so that more interpretable topics can be generated. However, since a short text document quite often contains only one sentence, this limitation of sentence-based TTM does not show on short texts. Overall, by beating TTM on non-short text documents, BiTTM has broader applications in text data analysis.
To evaluate the model performance with respect to different queries, we randomly sample query keywords from the documents according to word frequency distributions. We plot the comparative results of BiTTM and TTM in Figure 4, where the horizontal axis represents the word frequency of the target keyword, and the vertical axis indicates the percentage of documents containing the target. There are three types of symbol in the figure: red dots, green squares and blue triangles. Each symbol corresponds to a comparison between the topics discovered by BiTTM and TTM with respect to a query. In particular, a green square means BiTTM obtains a better topic coherence than TTM for this query, while a blue triangle implies the opposite. A red dot indicates that TTM fails to discover the specified number of topics, or the specified number of words under some topics, for this particular query. For example, we set the number of topics to 5 for the experiments in Figure 4 and consider the top 10 words for each topic. However, TTM discovers fewer than 5 topics or fewer than 10 words for a topic when handling queries corresponding to red dots. Note that this situation does not happen for BiTTM.
The most obvious trend that can be observed from Figure 4 is that the red dots usually appear in the lower left corner, the blue triangles gather in the upper right corner, and the green squares fall in between. The red dots in the lower left corner imply that TTM is prone to miss topics when dealing with infrequent targets. The blue triangles in the upper right corner suggest that TTM performs better when the targets appear very frequently in many documents. However, the number of such target keywords may be limited. On the contrary, BiTTM achieves satisfactory performance for a diverse range of targets even if they are infrequent in the corpus. This also indicates that using core words to enrich the semantic information in the context of the target keyword (i.e., the BiTTM strategy) is more effective than taking words in the same sentences as bridges to connect potentially target-related words (i.e., the TTM strategy).
Analysing the results of P@n: To calculate the measure of P@n, similar to TTM, three human labelers familiar with the data sets are engaged to label the results. The P@n values at the rank positions of 5, 10 and 20 are reported in Table 5, from which several interesting outcomes can be observed. Firstly, the performance of the two variations of BTM (i.e., BTM-PD and BTM ) is generally worse than that of the two specialised topic models (i.e., BiTTM and TTM), which demonstrates that full-analysis topic models with filtering strategies are not suitable for targeted analysis because they are prone to detect general topics instead of fine-grained target-related topics. In addition, comparing the two BTM variations, BTM-PD is better than BTM in most cases, which proves that the pre-filtering strategy is more effective in removing irrelevant words than the post-filtering strategy. Secondly, the average P@n of BiTTM achieves a gain of more than 10% compared with TTM, and more than 26% compared with BTM-PD, over all queries in the table and the settings of n. Moreover, the performance difference among the three types of document is not significant, whereas the different target queries have influence on the P@n results, which will be explained later using concrete examples. Thirdly, TTM is the second best model for P@5. However, for P@10, TTM achieves the best performance than all other models for some queries. It suggests the tendency of TTM to put target-related words in lower-ranked positions. Table 5. P@n scores of all models over a set of 18 targets on 9 data sets. n is set to 5, 10 and 20.

To explore the influence of different queries, let us take a closer look at two specific targets: "ashtray" (in the short-text data set "cigar") and "rinses" (in the medium-text data set "baby"). As shown in Figure 4, both targets are infrequent words (appearing in the lower left corner) in their respective datasets. However, the P@n scores of BiTTM and TTM for the two queries, as shown in Table 5, are remarkably different. Both models perform well with respect to "ashtray" but not with respect to "rinses", especially TTM. The P@n score of TTM for "rinses" is unsatisfactory, and several inexplicable words, such as "attention" and "entertain", appear in the discovered topics, which makes the topics hard to interpret. By examining the datasets, we find that documents containing "ashtray" consistently describe the appearance of ashtrays, such as colours and materials. That is, the documents are fairly clean and relevant, which explains why both BiTTM and TTM process the query well. In contrast, the documents containing "rinses" are mostly composed of short sentences, such as "It rinses out well and dries quickly." and "Rinses/Washes easy.", where the meaningful descriptions are hidden in the context surrounding the sentences that contain "rinses". TTM cannot handle this situation since it is a sentence-based model. The two examples explain why the performance varies with respect to query keywords.
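As a reminder of what the P@n scores above measure, the following is a minimal sketch of precision at rank n. It assumes that a word counts as relevant when more than half of the labelers mark it so, and that a topic returning fewer than n words is still divided by n; both choices are assumptions, since this section does not state how the three labelers' judgements are combined. The function names are hypothetical.

```python
from collections import Counter


def majority_relevant(labels_per_labeler):
    """Words marked relevant by more than half of the labelers.

    `labels_per_labeler` is a list of sets, one set of relevant words
    per labeler. Majority voting is an assumption of this sketch.
    """
    votes = Counter(word for labels in labels_per_labeler for word in labels)
    return {w for w, c in votes.items() if c > len(labels_per_labeler) / 2}


def precision_at_n(ranked_words, relevant_words, n):
    """Fraction of the top-n ranked words that are judged relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for word in ranked_words[:n] if word in relevant_words)
    # Dividing by n (not by the number of returned words) penalises
    # topics that contain fewer than n words.
    return hits / n
```

For instance, a hypothetical top-5 list in which three words are judged relevant yields P@5 = 0.6.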
Comparing the performance in terms of topic coherence and P@n, we notice that BiTTM is better at acquiring topics related to the target (i.e., high P@n scores) than at generating semantically coherent topics (i.e., better topic coherence values), especially for short text documents. This is because words related to the target do not necessarily have high co-occurrence, which is what topic coherence is calculated from. For instance, "Oktoberfest" is an appropriate word related to the target "place" in the dataset cigar, because a type of cigar named Quesada Oktoberfest is released in October to celebrate the famous German beer festival. However, "Oktoberfest" as a low-frequency word cannot provide enough mutual information, which directly causes the poor performance in topic coherence. Conversely, the high-frequency word "rolled" contributes to a high topic coherence score, but it is not selected by BiTTM since it is too general to describe the target "place".
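The effect described above can be illustrated with a co-occurrence-based coherence score. The sketch below assumes an average pairwise PMI computed from document-level co-occurrence; the exact coherence measure used in the experiments is not restated in this section, so this is an illustrative stand-in, and `pmi_coherence` is a hypothetical name.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated from document-level co-occurrence;
    `eps` smooths zero counts. Both are assumptions of this sketch.
    """
    doc_sets = [set(doc) for doc in documents]
    num_docs = len(doc_sets)

    def doc_prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / num_docs

    pairs = list(combinations(top_words, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for w1, w2 in pairs:
        joint = doc_prob(w1, w2)
        total += math.log((joint + eps) / (doc_prob(w1) * doc_prob(w2) + eps))
    return total / len(pairs)
```

A rare but relevant word that seldom co-occurs with the other top words pulls such a score down, even when human labelers would judge it relevant to the target.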

Time Efficiency Analysis
As mentioned before, it is ideal for targeted analysis to provide responses to user queries as soon as possible. Therefore, in this experiment, we analyse the time efficiency of the comparative models.
The average time cost of the four methods on each dataset over 40 random queries is shown in Table 6. In general, BiTTM has the best time efficiency, followed by BTM-PD. TTM is significantly slower because it has no preprocessing strategy, and BTM is the most inefficient model since it performs full analysis on the complete dataset.
To clearly demonstrate the impact of data size on time efficiency, we plot the results in Figure 5, where the grey bars denote the size of the datasets and the polylines in different colours indicate the time cost of the different methods. Since the time consumption of BTM is not comparable to that of the others, only three models (i.e., BiTTM, BTM-PD and TTM) are displayed in the figure. In general, the time cost of all methods increases with data size. However, dataset size has a greater impact on TTM than on the other two methods, which shows that TTM is not suitable for processing large data sets. In contrast, BiTTM and BTM-PD adapt better to large data sets. For these two methods, differences in data size do not change time consumption dramatically, since both have preprocessing strategies that focus on only the portion of the data related to the query targets.
The difference between BiTTM and BTM-PD is that BiTTM is faster, especially as document length increases. The reason is that the pre-filtering strategy adopted by BTM-PD is simple and coarse: it selects documents as long as they contain the query keywords. Consequently, irrelevant information contained in such documents is included and processed as well, which degrades the time efficiency of BTM-PD. To illustrate the impact of document length on time efficiency, the percentage histogram of the time cost of BiTTM, TTM and BTM-PD is plotted in Figure 6, where the average document length increases from left to right.
It can be observed that the time efficiency of TTM is worst on short texts. Recall that the topic coherence of TTM on short texts is better than that of BiTTM. This experiment shows that TTM achieves this by significantly sacrificing time efficiency, while the topic quality of TTM in terms of P@n on short texts is also worse than that of BiTTM. Moreover, the efficiency of BTM-PD is worse on long texts than on short texts. This is because BTM-PD is a biterm-based topic model and long texts generally have more biterms than short texts. Although BiTTM is also a biterm-based model, the strategy of selecting "core biterms" removes many irrelevant biterms, so the performance of BiTTM on long texts is also promising.
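The cost argument above can be made concrete with a small sketch. It assumes the keyword-containment pre-filter described for BTM-PD and the common definition of a biterm as an unordered pair of words co-occurring in the same context, with the whole document taken as the context; practical implementations often restrict pairs to a window, and the core-biterm selection of BiTTM is not reproduced here. Function names are hypothetical.

```python
from itertools import combinations


def prefilter_documents(documents, query_keywords):
    """Keep every document that contains at least one query keyword."""
    keywords = set(query_keywords)
    return [doc for doc in documents if keywords & set(doc)]


def extract_biterms(doc):
    """All unordered word pairs formed from distinct token positions.

    Assumes the whole document is one context (no sliding window).
    """
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]
```

Under this assumption, a document with n tokens yields n(n-1)/2 biterms, so 10 tokens give 45 biterms while 100 tokens give 4950. This quadratic growth is consistent with the observation that a biterm-based model without further selection slows down on long texts.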
Therefore, Figures 5 and 6 demonstrate that BiTTM can be widely applied to various types of text data, because both data size and document length have no great impact on its time efficiency, thanks to the core biterm-based preprocessing strategy.

Qualitative Evaluation
In this subsection, we present a qualitative analysis of the topics generated by the comparative models. We evaluate two aspects: the ability to discover as many fine-grained relevant topics as possible, and the ability to handle semantically approximate targets. The word frequency and document frequency of the example queries discussed below are shown in Figure 4.

Discovering Relevant Topics
We take the query "disease" in the dataset food as an example. Table 7 shows the topics discovered by the four comparative models, together with the top 10 words of each topic. The third row of Table 7 contains the labels we assign manually to summarise the semantics of each topic, where SFA is the abbreviation for Saturated Fatty Acid. Words that do not semantically align with the topics are highlighted in red. Compared with BiTTM, all the other three methods fail to identify the topic prevention, which is clearly a relevant topic of "disease". Moreover, the two BTM variations (i.e., BTM-PD and BTM) miss the topic risk. A closer look shows that this is because the two BTM models cannot distinguish between the two topics risk and research, which differ only subtly. In other words, the two BTM models discover a single topic combining research and risk. This is understandable because BTM, as a full-analysis topic model, discovers general topics. TTM succeeds in discovering both research and risk, but the topic quality is poorer than that of BiTTM (e.g., there are more highlighted words in the two topics discovered by TTM, which means more inconsistent words in its results). Therefore, BiTTM discovers more relevant and fine-grained topics than the other models for this example query.
Consider the topic SFA, which is discovered by all four models. The results of BiTTM clearly indicate that saturated fatty acids affect blood sugar and carcinogenesis, but the results of the other methods are not satisfactory. For example, TTM tends to find out which foods (e.g., tart, chip and sweetener) have unsaturated fatty acids, while BTM-PD and BTM focus on food ingredients (e.g., palm oil and protein). These results are not related to the target "disease" queried by users. Hence, the topic quality of BiTTM is also better than that of the other models in this example.

Handling Semantically Approximate Targets
When the targets supplied by users are semantically approximate, a set of similar relevant topics is expected to be discovered. We further examine the performance of the comparative models in handling semantically approximate targets. In particular, we analyse the two types of semantically approximate queries mentioned in Section 3.1: synonyms and diverse descriptions of the same event.
An example of the first type is shown in Table 8. We query the dataset "baby" with two targets, "bath" and "shower", which share similar semantics in the data set of Amazon reviews of baby products. A successful model should return similar topics. As shown in the table, BiTTM is the only model that obtains the set of four meaningful topics for both queries, while the other methods either miss topics or generate vague content for topics. For instance, except for BiTTM, the methods fail to identify the topic blanket with respect to the query "bath", while TTM and BTM can retrieve the topic with respect to the target "shower". According to the results of BiTTM, blanket is an important aspect of bath/shower, because most people cover their babies with a blanket after a bath/shower. Hence, the topic blanket is an aspect in which users are interested. Ignoring an important topic hinders downstream analysis and applications, such as high-quality personalised services and commodity recommendation systems. As another example, two topics are discovered by BiTTM only: sentiment and protection. Checking the content of the topic sentiment, we learn that users tend to include emotional expressions (e.g., "have a nice time with daughter/son") when commenting on shower/bath products. This topic thus reflects users' emotional polarity towards products, which is important for applications such as user profiling, recommendation and public opinion monitoring. The topic protection describes safety products that can be installed in tubs or on faucets. Bath safety is an important concern, especially for baby products, so it is a shortcoming of the other three methods that they ignore this topic.
Moreover, we find that BTM-PD extracts only two topics for both queries, and the content of the topics is too vague to understand (e.g., we are not able to assign semantic labels to the topics). There are six identical words between the two sets of top 10 words, which makes it very hard to distinguish between the semantics of the topics. The same situation occurs for BTM: there are two similar topics about "spout". For example, given the query "bath", the two topics have eight identical words in their top 10 words. The content of these two topics may be correct, but the information expressed is redundant. Generating near-identical topics is not useful and only increases the difficulty of further analysis.
Table 9 shows an example of the second type. Given the dataset Oscars, both "mistake" and "oscarsfail" refer to the same event, in which the Best Picture Award, which should have gone to Moonlight, was wrongly presented to La La Land because of a wrong envelope. As we can see from the table, BiTTM acquires three fine-grained relevant topics, which describe how the event develops. At the beginning of the event, two guests present the Best Picture award to La La Land, and no one is aware of the mistake. Many tweets emerge to talk about La La Land and to congratulate the actors and the producer, which can be seen from the content of the topic beginning. Next, the error is corrected and the real winner turns out to be another movie, Moonlight. The topic correction is a perfect interpretation of this stage. Note that the top 10 words of this topic with respect to the query "mistake" contain the word "oscarsfail", which demonstrates the usefulness of the core biterms strategy used by BiTTM. The third topic, discussion, covers the discussion of the actors' reactions after the mistake has happened. In contrast, TTM retrieves only the topic discussion, and the quality is not satisfactory: some irrelevant words, such as Moana (another movie), appear in the topic.
BTM-PD and BTM also discover only the topic discussion with respect to the target "oscarsfail", and the quality is low. For example, the word "documentary", which is not related to the two movies, appears in the results. Although the quality improves with respect to the target "mistake", the two topics discovered by BTM are too similar, with 6 identical words in their top 10.
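The redundancy observations above (six or eight identical words among the top 10 words of two topics) amount to a set intersection over top-word lists. The following minimal sketch shows one way to flag such pairs; the threshold and the function names are hypothetical and not part of the evaluation protocol described in the paper.

```python
from itertools import combinations


def top_word_overlap(topic_a, topic_b, top_n=10):
    """Number of words shared by the top-n word lists of two topics."""
    return len(set(topic_a[:top_n]) & set(topic_b[:top_n]))


def redundant_topic_pairs(topics, threshold, top_n=10):
    """Index pairs of topics sharing at least `threshold` top-n words."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(topics), 2)
        if top_word_overlap(a, b, top_n) >= threshold
    ]
```

A check of this kind could be run on the output of any of the compared models before manual labelling, to spot topics that repeat the same information.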

Conclusions
Targeted topic modelling is an increasingly vital task due to the prevalence of texts on the web and the limited scope of users' interests. Compared with full-analysis topic models, such as LDA [41] and BTM [2], which are designed to discover all topics in a dataset, targeted analysis models aim to perform an in-depth semantic analysis to extract the fine-grained topics with which users are concerned. In this paper, we propose a core biterm-based topic model for targeted analysis named BiTTM. Because only part of the entire dataset is related to the target aspects, and because responses to user queries must be provided efficiently, a pre-processing mechanism is indispensable. We therefore propose extracting core biterms related to the target queries (from neighbouring sentences) to preserve relevant information and to capture semantics across documents. Fine-grained topics are then modelled from the core biterms, where different topics are allowed to be sampled for each word in a biterm. Extensive experiments have been conducted to evaluate BiTTM against state-of-the-art models in terms of topic coherence, topic relevance and time efficiency on nine real-world data sets, including short, medium and long texts, with respect to various query keywords randomly sampled from the corpus. The experimental results demonstrate that BiTTM remarkably outperforms existing models in terms of both retrieving high-quality topics relevant to targets and computational efficiency.
Future research should consider the potential effects of relevance in semantic space more carefully. For example, using multi-source semantic information to compute relevance more accurately may significantly improve model performance. Recent studies [42][43][44][45][46] have shown that using word embeddings for topic modelling is promising for text analysis, and this may be a subject of future studies.