Query Recommendation Using Hybrid Query Relevance

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted, so understanding queries and mining query intention is critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. Query relevance based on query information is calculated from query co-occurrence and query embedding vectors. Adding the URL ranking to query-URL pairs lets us calculate the strength between a query and a URL more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of the proposed query recommendation algorithm, which achieves superior performance compared with other algorithms.


Introduction
As the number of web pages keeps expanding, it is increasingly difficult for users to obtain information that satisfies their needs from their original search queries [1]. Users therefore reformulate a new query that is similar to the original one but closer to their search intention. Table 1 shows some examples. For example, when users submit the query "apple" to a search engine, they may not get the information they need. The search engine then provides a series of new queries, e.g., "apple website" and "iPhone". In this way, users can choose a new query and quickly find what they want. Input queries are usually too short and ambiguous to express the user's true intention, so understanding the query and mining its intention are the key steps. A query log [2] is an important resource for mining user search behavior. Each query that a user submits to a search engine produces a series of records in the query log. The queries issued by a user within a short time usually share the same intention. A session is defined as a sequence of queries submitted to satisfy the same intention. Therefore, queries that co-occur in the same session are relevant to each other, and this co-occurrence can be used to produce recommendations. However, using query co-occurrence alone is prone to data sparsity and discards much useful information in the query log.
Clicked URLs in a query log have also been used in query recommendation [3]. Clicking behavior reveals query intention to some extent. For instance, a user may submit the initial query "apple" to the search engine with the aim of finding the "iPhone official website". If the URL of "Apple's official offer" clicked by the user is taken into account, the next recommended query will carry information about the iPhone, which is closer to the user's real search intention. Accordingly, the clicked URLs can reveal the user's search intention. Query semantics is also an important factor in understanding queries, yet both query information and URL information are based on counts and lack query semantics. To better understand query intention, we propose a query recommendation method whose model is shown in Figure 1. In our model, we mine query co-occurrence from the query log and use query semantics to calculate query relevance. At the same time, we calculate query relevance from query-URL pairs, taking the ranking of URLs into account. Combining the relevance from query information and from URL information yields the hybrid query relevance.
The three major contributions of this paper are summarized as follows: (1) Mining query information solely from a query log yields little useful information and suffers from data sparsity. Therefore, we use a corpus to train query embedding vectors, which capture query semantics, expand the query information, and improve the accuracy of the relevance between queries. (2) We combine the number of clicked URLs and the ranking of URLs in the result pages to calculate query relevance. Two different queries are more similar the more clicked URLs they share, and the higher a URL is ranked in the result page, the more related it is to the query. (3) We calculate the hybrid query relevance from query information and URL information. Queries in a session share the same query intention, and the clicked URLs help to identify that intention more accurately. Considering query information and query-URL pairs together is an effective way to understand the user's intention.

Related Work
Much research has explored query recommendation based on query logs [4][5][6]. Chen et al. [4] proposed a query suggestion method that constructs a struggling flow graph to identify the struggle phrases and mine effective representations from the query log. Zahera et al. [6] proposed a method based on clustering processes in which groups of semantically similar queries are detected. A query log contains many queries that can be used to understand search intention, and mining query information in a query log has been widely discussed. Boldi et al. [7] presented the query flow graph (QFG) model. In a QFG, each node represents a query and each edge between two nodes indicates that the two queries are consecutive in a session. Assigning a score to each query allows a random walk to be used, which extracts the relationships between queries. However, the QFG has some limitations. On the one hand, only query information is employed in the QFG, without URL information or semantic information, although the clicked URLs in the log and query semantic information help to capture query semantics and locate the user's query intention more accurately. On the other hand, a query is usually short, with an average of only two or three words, and some queries are ambiguous. Using only query information in the QFG therefore limits the understanding of query semantics and search intention. Sordoni et al. [8] proposed a novel hierarchical recurrent encoder-decoder architecture that accounts for sequences of previous queries. The queries in a session serve as training data, which makes the next-query prediction contextual. Among the methods presented, some capture word-level representations [9,10], some describe queries using different feature spaces [11], and some learn a ranking to improve the accuracy of candidate queries [6].
Clicked URLs are important features for understanding query intention. QUBIC [12] was proposed based on a query-URL bipartite graph. It extracts an affinity graph from the initial query-URL bipartite graph using only queries. The weights of the edges in the affinity graph are calculated from query-URL vectors and capture query-to-query similarity. A clustering algorithm [5] was proposed that automatically mines the major subtopics of a query from the query log, where each subtopic is represented by a cluster containing several URLs and keywords. Nevertheless, these studies use query-URL pairs for query recommendation without considering URL ranking. Ma et al. [13] applied a union matrix that combines the query-URL bipartite graph and the user-query bipartite graph to learn low-dimensional latent feature vectors of queries, and proposed a solution for calculating query similarity using those feature vectors. The query-product clickthrough bipartite graph [14] was built from search engine logs and domain-specific features such as categories and product popularity. In the approaches above, mining URL information and features yields query relevance. However, the queries submitted to search engines also reveal relations between queries. Ye et al. [15] proposed an efficient query suggestion method that calculates bidirectional transition probabilities on a query-URL graph and defines a strength metric for the query-URL edges. The query log is the main data source in those approaches. However, log files are usually sparse, and many queries and URLs have no edges between them. Therefore, using a query log alone is not enough for mining query relevance.
Existing work models either query information or query-URL pairs to calculate query relevance. In contrast, our method combines query co-occurrence and query semantics to calculate the relevance based on query information, and combines URL ranking and query-URL pairs to calculate the relevance based on URL information. Both query information and URL information are then used to calculate the final query relevance.

Our Approach
In this section, we describe how to calculate query relevance and recommend related queries based on query information and URL information. Query relevance based on query information has two parts, namely query co-occurrence in a session and query semantics. Query relevance based on URL information is calculated from query-URL pairs together with URL ranking. The query recommendation algorithm is based on hybrid query relevance, which combines the query relevance based on query information with the query relevance based on URL information.

Preliminaries
Users submit a query to the search engine and click the returned pages. If users are satisfied with the information, the search process ends; otherwise, they submit a new query that has the same search intention as the initial query. The search engine records this search behavior to form a query log. A query log contains a user ID, issued queries, clicked URLs, the ranking of the URLs, and a timestamp. From this information we can extract useful knowledge to improve the efficiency of query recommendation. A record in a query log typically has the format <user ID, query, clicked URL, ranking, timestamp>.
A session corresponds to one search intention of a user. We consider a query session as a sequence of queries $S = \{q_1, q_2, \ldots, q_n\}$, where $n$ is the number of queries in $S$. One common way to extract sessions from a query log is to use a time threshold. We take 30 min as the time threshold for session segmentation, following previous work [16]. White et al. [16] showed the probability of switching, for sessions of varying length, as measured by the number of queries in the session. On this basis, we regard 30 min as a suitable threshold for partitioning the search log into sessions.
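The time-threshold segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the 30 min threshold applies to the gap between consecutive queries of the same user, and the function name `split_sessions` and the toy records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # threshold taken from the text


def split_sessions(records):
    """Group (user_id, query, timestamp) records into per-user sessions.

    Assumption: a new session starts whenever the gap between two
    consecutive queries of the same user exceeds SESSION_GAP.
    """
    by_user = defaultdict(list)
    for user_id, query, ts in records:
        by_user[user_id].append((ts, query))

    sessions = []
    for events in by_user.values():
        events.sort(key=lambda e: e[0])  # order each user's queries by time
        current = [events[0][1]]
        for (prev_ts, _), (ts, query) in zip(events, events[1:]):
            if ts - prev_ts > SESSION_GAP:
                sessions.append(current)
                current = []
            current.append(query)
        sessions.append(current)
    return sessions


# Toy records (hypothetical data, not taken from the AOL log).
records = [
    ("u1", "apple", datetime(2006, 3, 1, 10, 0)),
    ("u1", "apple website", datetime(2006, 3, 1, 10, 5)),
    ("u1", "cheap flights", datetime(2006, 3, 1, 12, 0)),
]
sessions = split_sessions(records)
```

In this toy log the first two queries fall into one session and the third, issued almost two hours later, starts a new one.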

Query Relevance Based on Query Information
The query log of a search engine can be divided into sessions, and the queries in a session share the same query intention. We count the number of times that queries $q_i$ and $q_j$ appear in the same session with $q_j$ submitted immediately after $q_i$. We define each query pair as a tuple $[q_i, q_j, f(q_i, q_j)]$, so the query log yields many query pairs. The query relevance can be determined by the following equation [7]:

$$\mathrm{Rel}_{session}(q_i, q_j) = \frac{f(q_i, q_j)}{f(q_i)} \quad (1)$$

where $\mathrm{Rel}_{session}(q_i, q_j)$ denotes the relevance between query $q_i$ and query $q_j$ based on query co-occurrence in sessions, $f(q_i)$ denotes the number of times $q_i$ appears in the query pairs, and $f(q_i, q_j)$ denotes the number of times the pair $(q_i, q_j)$ appears in the query pairs. Due to data sparsity, many values are missing when calculating $\mathrm{Rel}_{session}(q_i, q_j)$. Moreover, we cannot calculate the query relevance accurately if the sessions do not correctly separate queries with different search intentions. Therefore, we use query semantics, in the form of query embedding vectors, to expand the query information and better understand queries. Word2vec is a widely used way to train word vectors. The relationships learned by word2vec can be expressed as linear translations. For example, the result of computing vector("King") − vector("Man") + vector("Woman") is very close to the vector of "Queen" [17,18]. Consequently, taking the element-wise sum or mean of the word embeddings over all words in a sentence also produces a vector with the potential to encode meaning [19,20].
The queries in the search log are usually short, averaging only two or three words. Therefore, we can obtain the query embedding vectors from pre-trained word embedding vectors by linear combination. Moreover, the vector of each word in the query is easily obtained through corpus training, which makes this a time-saving method.
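As a rough illustration of the session-based relevance, the sketch below counts consecutive query pairs and normalizes each pair count by the number of pairs that start with the first query. Both the ratio form and the reading of $f(q_i)$ as "pairs in which $q_i$ comes first" are assumptions made for this example; `session_relevance` is a hypothetical helper name.

```python
from collections import Counter


def session_relevance(sessions):
    """Return {(q_i, q_j): f(q_i, q_j) / f(q_i)} over consecutive query pairs.

    Assumption: f(q_i) counts the pairs whose first element is q_i.
    """
    pair_count = Counter()
    first_count = Counter()
    for session in sessions:
        for q_i, q_j in zip(session, session[1:]):
            pair_count[(q_i, q_j)] += 1
            first_count[q_i] += 1
    return {pair: n / first_count[pair[0]] for pair, n in pair_count.items()}


# Toy sessions (hypothetical data).
sessions = [
    ["apple", "apple website"],
    ["apple", "iphone"],
    ["apple", "iphone", "iphone price"],
]
rel = session_relevance(sessions)
```

Here "apple" is followed by "iphone" in two of its three pairs, so that pair receives a relevance of 2/3. Pairs that never occur are simply absent from the result, which is the sparsity problem discussed in the text.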
The calculation of query semantics can be divided into three steps (illustrated in Figure 2). First, each query is seen as a set of words, represented as $q = \{q_{w1}, q_{w2}, \ldots, q_{wn}\}$, where $q$ is the query and each $q_{wi}$ is a keyword in the query.
Second, we calculate the query embedding vector from the pre-trained word embedding vectors [19] by the following equation:

$$v_q = \frac{1}{n}\sum_{i=1}^{n} \mathrm{word2vec}(q_{wi}) \quad (2)$$

where $\mathrm{word2vec}(q_{wi})$ denotes the embedding vector of word $i$ in the query and $n$ denotes the number of words in the query. Third, we calculate the relevance of query semantics between query embedding vectors via the following formula:

$$\mathrm{Rel}_{sem}(q_i, q_j) = \frac{\sum_{k=1}^{m} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{m} x_{ik}^2}\,\sqrt{\sum_{k=1}^{m} x_{jk}^2}} \quad (3)$$

where $v_{qi}$ and $v_{qj}$ denote the query embedding vectors of $q_i$ and $q_j$, $x_{ik}$ and $x_{jk}$ denote the $k$-th values of $v_{qi}$ and $v_{qj}$, and $m$ denotes the dimension of the vectors.
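The three steps can be illustrated with a small sketch. It assumes that the query vector is the mean of the word vectors and that semantic relevance is the cosine of two query vectors; the two-dimensional `toy_vectors` stand in for pre-trained word2vec embeddings and are invented for the example, as are the helper names.

```python
import math


def query_vector(query, word_vectors):
    """Average the vectors of the query words that have an embedding.

    Words missing from word_vectors are skipped, so the mean is taken
    over the words that were found (an assumption of this sketch).
    """
    vectors = [word_vectors[w] for w in query.split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 2-dimensional embeddings; real word2vec vectors are much larger.
toy_vectors = {
    "apple": [1.0, 0.0],
    "iphone": [0.8, 0.6],
    "website": [0.0, 1.0],
}
v1 = query_vector("apple website", toy_vectors)
v2 = query_vector("iphone", toy_vectors)
rel_sem = cosine(v1, v2)
```

In practice the dictionary lookup would be replaced by a trained embedding model; the averaging and cosine steps stay the same.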
The query relevance based on query information can be obtained as follows:

$$\mathrm{Rel}_{query}(q_i, q_j) = \alpha\,\mathrm{Rel}_{session}(q_i, q_j) + (1 - \alpha)\,\mathrm{Rel}_{sem}(q_i, q_j) \quad (4)$$

where $\mathrm{Rel}_{query}(q_i, q_j)$ denotes the query relevance based on query information, $\mathrm{Rel}_{sem}(q_i, q_j)$ denotes the relevance calculated from query semantics, $\mathrm{Rel}_{session}(q_i, q_j)$ denotes the relevance of queries in the same session, and $\alpha$ denotes the weight.

Query Relevance Based on URL Information
Query information is an important factor for understanding query intention, and the clicked URLs can also help us better understand it. The more clicked URLs two queries share, the more relevant the queries are. A higher ranking means that the URL is more important. We count the number of query clicks for each URL and compute the average ranking, defined as a tuple $[Q, URL, C(Q, URL), Ave_{ranking}]$. $Q$ is a set of queries, $Q = \{q_1, q_2, \ldots, q_t\}$, where $t$ is the number of queries. $URL$ is a set of URLs, $URL = \{u_1, u_2, \ldots, u_h\}$, where $h$ is the number of URLs. $Ave_{ranking}$ is the average ranking of each URL. For a query $q_i$ in $Q$, Figure 3 shows the structure of the query, its clicked URLs, and their rankings. As Figure 3 shows, clicks from query $q_i$ go to different URLs, and each URL has a different ranking. Therefore, we calculate the average ranking of each URL by Equation (5).

$$Ave_{ranking}(u_i) = \frac{\sum_{R_{u_i} \in |u_i|} R_{u_i}}{|N_i|} \quad (5)$$

where $Ave_{ranking}(u_i)$ denotes the average ranking of URL $u_i$, $|u_i|$ denotes all of the rankings recorded for $u_i$, $R_{u_i}$ denotes one of the rankings recorded when query $q_i$ clicks $u_i$, and $|N_i|$ denotes the number of rankings in $|u_i|$.
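A minimal sketch of the average-ranking computation is given below. It assumes that the average is taken per URL over all click records in which that URL appears; `average_rankings` and the toy click records are hypothetical.

```python
from collections import defaultdict


def average_rankings(click_records):
    """Return {url: mean ranking} from (query, url, ranking) click records."""
    ranks = defaultdict(list)
    for _query, url, ranking in click_records:
        ranks[url].append(ranking)
    return {url: sum(r) / len(r) for url, r in ranks.items()}


# Toy click records (hypothetical data): (query, clicked URL, ranking).
clicks = [
    ("apple", "apple.com", 1),
    ("apple", "apple.com", 3),
    ("iphone", "apple.com", 2),
    ("apple", "wikipedia.org/wiki/Apple", 4),
]
ave = average_rankings(clicks)
```

A smaller average means the URL tends to be shown nearer the top of the result page.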
We combine the number of clicks and the ranking. Note that, in a query log, a higher-ranked URL appears nearer the top of the result page and therefore has a smaller ranking value. The strength between the query and the URL can be calculated as follows: where $C(q, u_i)$ denotes the number of times that query $q$ clicks URL $u_i$ and $|k|$ denotes the number of URLs that query $q$ clicks. Given the queries and URLs, we obtain a $t \times h$ matrix $S(s_{ij})$, where $s_{ij}$ denotes the strength between query $i$ and URL $j$, $t$ denotes the number of queries, and $h$ denotes the number of URLs. The relevance of queries based on URL information can be measured using the cosine measure [21] as follows:

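$$\mathrm{Rel}_{URL}(q_i, q_j) = \frac{S(s_i) \cdot S(s_j)}{\lVert S(s_i) \rVert\, \lVert S(s_j) \rVert} = \frac{\sum_{k=1}^{h} s_{ik}\, s_{jk}}{\sqrt{\sum_{k=1}^{h} s_{ik}^2}\,\sqrt{\sum_{k=1}^{h} s_{jk}^2}}$$

where $S(s_i)$ denotes the $i$th row of the matrix $S(s_{ij})$, $s_{ik}$ denotes an element of that row, $S(s_j)$ denotes the $j$th row of the matrix $S(s_{ij})$, and $s_{jk}$ denotes an element of that row.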
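The row-wise cosine measure can be sketched as below. The strength used to fill the matrix, click count divided by average ranking, is only a stand-in chosen so that a smaller ranking value gives a larger strength; it is an assumption of this example and not the paper's strength formula. All names and data are hypothetical.

```python
import math


def strength_matrix(click_counts, ave_ranking, queries, urls):
    """Build a t x h matrix of query-URL strengths.

    Assumed stand-in strength: clicks / average ranking, so that URLs
    shown nearer the top (small ranking value) weigh more.
    """
    return [
        [click_counts.get((q, u), 0) / ave_ranking[u] for u in urls]
        for q in queries
    ]


def row_cosine(row_a, row_b):
    """Cosine similarity between two rows of the strength matrix."""
    dot = sum(a * b for a, b in zip(row_a, row_b))
    norm_a = math.sqrt(sum(a * a for a in row_a))
    norm_b = math.sqrt(sum(b * b for b in row_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (hypothetical).
queries = ["apple", "iphone", "banana bread"]
urls = ["apple.com", "recipes.example"]
click_counts = {
    ("apple", "apple.com"): 6,
    ("iphone", "apple.com"): 4,
    ("banana bread", "recipes.example"): 5,
}
ave_ranking = {"apple.com": 2.0, "recipes.example": 1.0}

S = strength_matrix(click_counts, ave_ranking, queries, urls)
rel_url_apple_iphone = row_cosine(S[0], S[1])
rel_url_apple_banana = row_cosine(S[0], S[2])
```

Queries that click the same URLs get rows pointing in the same direction and a cosine near 1, while queries with no shared clicked URL get 0.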

Hybrid Query Relevance
Both query information and URL information can be used to understand search intentions. However, each alone has drawbacks that prevent it from yielding comprehensive query relevance. We define a hybrid query relevance that takes advantage of both as follows:

$$\mathrm{Rel}(q_i, q_j) = \beta\,\mathrm{Rel}_{query}(q_i, q_j) + (1 - \beta)\,\mathrm{Rel}_{URL}(q_i, q_j) \quad (9)$$

where $\mathrm{Rel}(q_i, q_j)$ denotes the hybrid query relevance, $\mathrm{Rel}_{query}(q_i, q_j)$ denotes the query relevance based on query information, $\mathrm{Rel}_{URL}(q_i, q_j)$ denotes the query relevance based on URL information, and $\beta$ denotes the weight. Comparative experiments to find the optimal value of $\beta$ are reported in the experimental part.
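The weighted combination can be written as a small helper, sketched below. Treating a pair that is missing from one source as having relevance 0 is an assumption of this sketch; the helper name `blend` and the toy values are hypothetical. The same helper would also serve for the α-weighted combination of session-based and semantic relevance.

```python
def blend(rel_a, rel_b, weight):
    """Return weight * rel_a + (1 - weight) * rel_b over the union of keys.

    Assumption: a pair absent from one dictionary contributes 0.0 there.
    """
    keys = set(rel_a) | set(rel_b)
    return {
        k: weight * rel_a.get(k, 0.0) + (1.0 - weight) * rel_b.get(k, 0.0)
        for k in keys
    }


# Toy relevance values (hypothetical).
rel_query = {("apple", "iphone"): 0.6}
rel_url = {("apple", "iphone"): 1.0, ("apple", "mac"): 0.5}
hybrid = blend(rel_query, rel_url, weight=0.9)
```

With a weight of 0.9, the pair found only through clicked URLs still receives a small non-zero score, which illustrates how URL information can widen the set of related queries.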
When users input a query into a search engine, we use random walk with restart [22] to recommend queries that are close to the input query. Random walk with restart is defined as Equation (10):

$$r_i = (1 - c)\,W r_i + c\,e_i \quad (10)$$

where $c$ is the restart probability, $W$ is the matrix of hybrid query relevance, $e_i$ is the initial vector, whose $i$th element is 1 and whose other elements are 0, and $r_i$ is the scoring vector.
In the recommendation process, the walk starts from the initial query, randomly selects a query adjacent to it, and moves to that query. The adjacent query then becomes the current query, and the process is repeated. Finally, we recommend to users the top-scoring queries, which are the most similar to the initial query.
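The recommendation step can be sketched with a simple power iteration. The sketch assumes the common form r = (1 − c) W r + c e of random walk with restart, with W column-normalized; the restart probability 0.15, the tolerance, and the toy relevance matrix are illustrative choices, not values from the paper.

```python
def column_normalise(M):
    """Scale each column of a square matrix so that it sums to 1."""
    n = len(M)
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
    return [
        [M[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(W, start, c=0.15, tol=1e-10, max_iter=1000):
    """Iterate r = (1 - c) * W r + c * e until the scores stop changing.

    W is a column-normalised matrix (list of rows), start is the index of
    the input query, and c is the restart probability (assumed value).
    """
    n = len(W)
    e = [1.0 if k == start else 0.0 for k in range(n)]
    r = e[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - c) * sum(W[i][j] * r[j] for j in range(n)) + c * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Toy hybrid relevance matrix (hypothetical values).
queries = ["apple", "iphone", "apple website", "banana bread"]
relevance = [
    [0.0, 0.8, 0.5, 0.0],
    [0.8, 0.0, 0.3, 0.0],
    [0.5, 0.3, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
scores = random_walk_with_restart(column_normalise(relevance), start=0)
ranked = sorted(
    (q for q in queries if q != "apple"),
    key=lambda q: scores[queries.index(q)],
    reverse=True,
)
```

The candidates are then recommended in the order given by `ranked`, with the input query itself excluded.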

Results
In this section, we first introduce the data set and evaluation methods. Then we find appropriate values of the parameters α and β by gradually adjusting their weights. Last, we validate the performance of the proposed algorithm through several experiments that compare it with other algorithms. All the recommendation algorithms are implemented in Python 2.7 on Windows 10, running on a PC with an Intel Core i5 processor (2.40 GHz) and 8 GB of RAM.

Experimental Data and Evaluation Methods
The data set used in this paper comes from the search logs of the AOL search engine from March to May 2006 (http://www.researchpipeline.com/mediawiki/index.php?title=AOL_Search_Query_Logs). This collection consists of approximately 20 million web queries collected from approximately 650,000 users over three months. We list a few examples from the AOL log in Table 2. We use 10-fold cross validation. The data set is randomly divided into ten parts, each containing approximately 3,500,000 records and 800,000 sessions. Each time, we take nine parts as the training set and one part as the test set. We repeat the experiment 10 times and report the mean value of the results.
The preprocessing of the training set involves three steps. First, we use the threshold of 30 min to divide the log into sessions, which determines whether two queries have the same search target. Subsequently, "www" and other navigation vocabulary in the queries are removed to reduce noise. Finally, we remove query-query edges that occur fewer than five times and query-URL edges that occur fewer than five times.
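The 10-fold protocol can be sketched with the standard library alone. The unit being split (records or sessions) is not specified in the text, so the sketch works on a generic list of items; the helper name and the seed are hypothetical.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random division into k parts
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# 100 placeholder items stand in for the log records.
splits = list(k_fold_splits(range(100), k=10))
```

Each item appears in exactly one test fold, and the reported metric is the mean over the ten runs.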
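The cleaning and filtering steps can be sketched as below. The list of navigation tokens is a hypothetical example, since the text names only "www"; the minimum count of five comes from the text.

```python
import re
from collections import Counter

MIN_COUNT = 5  # edges seen fewer than five times are dropped, as in the text
NAV_TOKENS = {"www", "http", "https", "com"}  # hypothetical token list


def clean_query(query):
    """Lower-case a query and drop navigation tokens such as 'www'."""
    tokens = re.split(r"[\s./:]+", query.lower())
    return " ".join(t for t in tokens if t and t not in NAV_TOKENS)


def frequent_edges(edges, min_count=MIN_COUNT):
    """Keep only edges that occur at least min_count times."""
    counts = Counter(edges)
    return {edge: n for edge, n in counts.items() if n >= min_count}


cleaned = clean_query("www.Apple.com")
kept = frequent_edges([("apple", "iphone")] * 5 + [("apple", "pie")] * 4)
```

The same `frequent_edges` helper applies to query-query edges and to query-URL edges, since both are filtered with the same threshold.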
During the test, all queries submitted after query q in the test set that belong to the same session as q form the relevant query set of q. A recommended query is considered successful if it is in the relevant query set. In this study, the first N recommended queries are selected to evaluate the precision, recall, and F1-value, which are expressed as follows:

$$\mathrm{precision} = \frac{\text{the number of correct queries}}{\text{the number of total queries}} \quad (11)$$

$$\mathrm{recall} = \frac{\text{the number of correct queries}}{\text{the number of total correct queries}} \quad (12)$$

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \quad (13)$$

We conduct multiple groups of experiments to select the value of parameter α. In the experiments, α is greater than 0 and less than or equal to 1, where α = 1 means that query semantics is not added. We vary α in steps of 0.1 and observe the effect on precision, recall, and the F1 measure. To obtain accurate results, we recommend from the Top 5 to the Top 50 queries in steps of 5 and average the values to obtain the final precision, recall, and F1 measure. The results are shown in Figure 4.
From Figure 4, we can see that incorporating query semantics improves the query recommendation results. When the parameter α is too small, the weight of query semantics is too large and we cannot get good results. This is because some queries are ambiguous, and some queries are correlated without being semantically related; the relevance of the latter can only be mined from the query log. However, using only query pairs (α = 1) does not give good recommendation results either: because of data sparsity and incorrect session partitioning, the relevance between many queries is missing. Therefore, we combine query pairs and query semantics. In Figure 4c, we can see that when α is larger than 0.3 the results are better than those obtained using only query pairs. We can also observe that precision, recall, and F1 decline beyond 0.9. To find the optimal value of α, we reduce the step of the parameter to 0.02 and conduct further experiments. The results are shown in Figure 5. Figure 5 suggests that precision, recall, and F1 decrease as the parameter increases, so α = 0.9 gives the best result. Thus, we set the parameter α to 0.9 in the later experiments.
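The three evaluation measures for a single recommendation list can be computed as in the sketch below, which takes "total queries" to be the recommended list and "total correct queries" to be the relevant query set. The helper name and the toy lists are hypothetical.

```python
def evaluate(recommended, relevant):
    """Return (precision, recall, F1) for one top-N recommendation list."""
    correct = len(set(recommended) & set(relevant))
    precision = correct / len(recommended) if recommended else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy top-5 list and relevant query set (hypothetical data).
recommended = ["iphone", "apple website", "apple pie", "ipad", "macbook"]
relevant = {"iphone", "ipad", "apple store", "itunes"}
p, r, f1 = evaluate(recommended, relevant)
```

Two of the five recommendations are relevant, so precision is 2/5 and recall is 2/4. Averaging these values over the Top 5 to Top 50 lists gives the final figures described in the text.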

Selection of Parameter β
Parameter β is the weight that balances query information against URL information. We run a series of experiments, varying β in steps of 0.1; β = 1 means that only query information is used. We again use precision, recall, and the F1 measure to evaluate the results. Queries are recommended from the Top 5 to the Top 50 in steps of 5, and we calculate the average precision, recall, and F1 measure. Figure 6 shows the results.
Figure 6a shows that the precision is not greatly improved when we add the URL information. Because of data noise, the precision is sometimes even lower than when only query information is used. However, Figure 6b shows that the recall is greatly improved. URL information serves as a complementary feature for understanding user search intention: clicked URLs let us find relevance between a wider range of queries, which is why the recall is greatly improved. Considering precision and recall together, Figure 6c shows that the recommendation performance is improved. In Figure 7, we can observe that precision, recall, and F1 decline rapidly beyond 0.9. This is because the query submitted by the user best reflects the query intention, while the clicked URL is a supplementary signal for understanding it in the process of query recommendation. Therefore, the weight of query information is relatively large, and β = 0.9 gives the best result. We set parameter β to 0.9 in the later experiments.

Evaluation of Efficiency
To examine the effectiveness of our approach, we compare the performance of the following algorithms: (1) QFG [7]: a query flow graph model that extracts queries and counts the number of query co-occurrences. (2) QUBIC [12]: a bipartite graph model that uses query information and URL information in logs to build a query-URL bipartite graph. (3) RW UQ [15]: a method that calculates bidirectional transition probabilities on a query-URL graph and defines a strength metric for the query-URL edges. (4) CQM [6]: a method based on clustering processes in which groups of semantically similar queries are detected. (5) QRSR: our method, which considers both query information and URL information. The URL-based relevance combines the query-URL pairs with URL ranking, which allows the relation between query and URL to be calculated more accurately.
The precision of the different algorithms is shown in Figure 8. From Figure 8, we can see that our method has higher precision than the other query recommendation algorithms. Using query information or URL information alone cannot yield broad and accurate relations between queries and limits the understanding of query intention. Query information and user behavior information complement each other. Representing query semantics with query embedding vectors improves the understanding of queries, and using the ranking of a URL improves the accuracy of the strength between query and URL. By combining query information and URL information, we mine more relevance between queries and understand queries and query intention more accurately.
Figure 9 shows the recall and F1-value of the different algorithms. In Figure 9a, the recall of our approach is compared with those of the other four methods. We can observe that as the number of recommendations increases, the recall of our method and of the other methods increases. However, our method has a higher recall than the other methods. The results for the F1-value are shown in Figure 9b, and the trend is the same as that observed in Figure 9a.

Conclusions
In this paper, we presented a query recommendation algorithm that understands search intention by using both query information and URL information. Query semantics was used to calculate the query relevance based on query information, and the ranking of URLs was used to better measure the strength between a query and a clicked URL. Experiments on the AOL log suggest that our method has higher precision in query recommendation. In future work, we will mine other information in the search log to bring the recommendation results closer to the query intention.

Figure 2. The process of calculating query semantics.

Figure 3. The structure of query, clicked URLs and ranking.

Figure 4. Selection of Parameter α. (a) Description of precision in the first panel. (b) Description of recall in the second panel. (c) Description of F1 in the third panel.

Figure 5. Selection of Parameter α. (a) Description of precision in the first panel. (b) Description of recall in the second panel. (c) Description of F1 in the third panel.

Figure 6. Selection of Parameter β. (a) Description of precision in the first panel. (b) Description of recall in the second panel. (c) Description of F1 in the third panel.

To find the optimal value of β, we also reduce the step of the parameter to 0.02 and conduct further experiments. The results are shown in Figure 7.

Figure 7. Selection of Parameter β. (a) Description of precision in the first panel. (b) Description of recall in the second panel. (c) Description of F1 in the third panel.

Figure 8. Comparison of precision with other algorithms.

Figure 9. Comparison of recall and F1 with other algorithms. (a) Description of recall in the first panel. (b) Description of F1 in the second panel.

Table 1. Example of query recommendation.

Table 2. Examples of AOL log.