Evaluating Retrieval Effectiveness by Sustainable Rank List

Abstract: The Internet of Things (IoT) and Big Data are among the most popular emerging fields of computer science today. IoT devices create an enormous amount of data daily at different scales; hence, search engines must meet the requirements of rapid ingestion and processing followed by accurate and fast extraction. Researchers and students from the field of computer science query search engines on these topics to reveal a wealth of IoT-related information. In this study, we evaluate the relative performance of two search engines: Bing and Yandex. This work proposes an automatic scheme that populates a sustainable optimal rank list of search results with higher precision for IoT-related topics. The proposed scheme rewrites the seed query with the help of attribute terms extracted from the page corpus. Additionally, we use newness- and geo-sensitivity-based boosting and dampening of web pages for the re-ranking process. To evaluate the proposed scheme, we use an evaluation matrix based on discounted cumulative gain (DCG), normalized DCG (nDCG), and mean average precision (MAP@n). The experimental results show that the proposed scheme achieves scores of MAP@5 = 0.60, DCG@5 = 4.43, and nDCG@5 = 0.95 for general queries; DCG@5 = 4.14 and nDCG@5 = 0.93 for time-stamp queries; and DCG@5 = 4.15 and nDCG@5 = 0.96 for geographical location-based queries. These outcomes validate the usefulness of the suggested system in helping a user to access IoT-related information.


Introduction
Data functions as a fuel that helps the Internet run. However, the profusion of data on the World Wide Web (WWW) creates considerable problems for Internet users. In the past few years, the fields of the Internet of Things (IoT) and Big Data have evolved rapidly, and their combination has enormously increased the growth of data. According to the predictions in recent studies, by 2020 there will be tens of billions of devices and sensors [1] that will generate 40 Zettabytes (40 trillion GB) of data [2]. These interconnected devices are placed in real-time environments and play their role in assembling, communicating, and distributing data. For example, modern cars produced by different companies are equipped with multiple sensors. The use of these sensor-based cars helps in collecting facts such as sales by area, fuel efficiency, route finding, tracking of lost vehicles, or driver assessment. On the other hand, devices such as wearables (e.g., fitness bands) collect data for health monitoring, goal tracking, and user location. All the data produced by these devices are stored in the cloud. The most difficult tasks are the use of data while it is on the move and the extraction of valuable information from it. This change in the field of computer science is transforming many technologies.

In this work, we first selected 10 popular IoT-related topics according to the number of searches at Google, Twitter, and LinkedIn. We then prepared 10 queries related to each of these topics with the help of students. After performing some processing on these 100 information needs, we selected 60 of them. We translated each of these information needs using an offline-made page corpus. After translation, the scheme executed the top-K translated queries on search engines and prepared a pool of documents. Afterward, the scheme calculated the text similarity score between queries and web pages. Using this PRF-based scheme, we prepared a sustainable optimal rank list to measure the relative performance of the search engines. Lastly, the proposed PRF scheme performed boosting based on
location and freshness of web pages to achieve a sustainable optimal rank. The proposed approach automatically figures out the relevance of web pages. To the best of our knowledge, this is the first study that examines the ranking performance of search engines for IoT-related queries. This study verifies the effectiveness of the proposed scheme for sustainable optimality with the help of evaluation metrics based on discounted cumulative gain (DCG), normalized DCG (nDCG), and mean average precision (MAP@n). In this work, we compare the proposed scheme's results with two different search engines (Bing and Yandex). The experimental results showed that the proposed scheme achieves satisfactory results for all the metrics used in this work. Table 1 below shows all the terms and abbreviations used in this work.

The remainder of the paper is organized as follows. Section 2 explains earlier work related to the evaluation of IR systems. Section 3 describes in detail the proposed scheme for automatic evaluation of search engines. Section 4 describes the dataset and evaluation matrix used in the proposed work. Section 5 describes the experimental results. Finally, we conclude the paper and discuss future work in Section 6.
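The evaluation metrics named above (DCG, nDCG, and MAP@n) have standard definitions; as a minimal illustration (not the paper's own implementation), they can be computed as follows, assuming graded relevance labels for DCG/nDCG and binary labels for average precision:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each graded relevance score is
    discounted by the log2 of its (1-based) rank plus one."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def average_precision(binary_rels):
    """Mean of precision@k taken at each relevant rank position."""
    hits, total = 0, 0.0
    for k, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

Cutting the input lists at the first 5 positions yields the @5 variants reported in the abstract.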

Related Work
Relevance has a vital role in information sciences and plays a key part in various IR models [13]. Human evaluation for checking web page relevance against given queries is a multifaceted procedure that needs harmonization of numerous intellectual jobs [14], which is difficult to manage for large systems such as a search engine. Relevance feedback from users has proved very helpful in evaluating the relevance of returned web pages. Relevance feedback is a technique in which we first input a query and then use its results to improve future queries; there are three kinds of relevance feedback [15].
Explicit Feedback (EF): This kind of feedback is usually produced by assessors against a group of documents retrieved for a query (relevance judgment). It can be graded using numeric, character-based, or binary (relevant or irrelevant) relevance systems. This kind of information is used to judge the relevance between the query and documents. Usually, nDCG is used for this kind of feedback.
Implicit Feedback (IF): This is based on user behaviors (view, copy, paste, etc.). In this kind of feedback, the user indirectly assesses the relevance of web pages or documents; however, the user is not aware of this process. For example, dwell time shows the time spent by a user on a web page.

Blind Feedback (BF)/Pseudo Relevance Feedback (PRF): This is an automatic way of judging the relevance between a query and documents. Using this method, we can find the top-K relevant documents among a set of results. This scheme consists of the following steps:
(1) Take the results returned by the initial query (top-K);
(2) Select the top 20 or 30 terms from these documents using some scheme;
(3) Execute query rewriting and then match with the returned documents to find the top relevant results.
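The three BF/PRF steps above can be sketched as follows; this is a minimal illustration assuming a simple term-frequency criterion for step (2) and a small hypothetical stop-word list, not the attribute-based rewriting used later in this work:

```python
from collections import Counter
import re

def prf_expand(seed_query, top_docs, n_terms=20):
    """Pseudo relevance feedback: pick the most frequent terms from the
    top-ranked documents (step 2) and append them to the seed query
    to form the rewritten query (step 3)."""
    stop = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}
    seed_terms = set(seed_query.lower().split())
    counts = Counter()
    for doc in top_docs:                      # step 1: top-K documents
        for tok in re.findall(r"[a-z0-9]+", doc.lower()):
            if tok not in stop and tok not in seed_terms:
                counts[tok] += 1
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return seed_query + " " + " ".join(expansion)
```

For example, `prf_expand("smart light", docs, n_terms=2)` appends the two most frequent non-seed terms found in `docs`.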
The results of an earlier study [16] show that query expansion based on BF/PRF can find the results that are missed by the initial query. This method relies on the choice of 300 to 530 expansion terms. The main problem with this scheme is that it can suffer from query drift because of too many terms, although the proposed scheme is able to achieve 7-25% effectiveness gains. Soboroff et al. [17] proposed a random-based mechanism to label documents as relevant or irrelevant. They randomly select documents against a query and then evaluate them. Using TREC-4 and -6, they showed that there was a low correlation among different relevance judgments for a document. They found a 0.938 correlation score for the selected ranking systems for TREC-4. Random selection was unable to achieve a high ranking for search engines, where it is difficult to find relevance judgments. In [18], the authors consider a page as an answer to the query if the page title contains the query terms. Their approach is mostly successful for navigational queries, and their scheme is based on query logs, which are not available nowadays. They performed their experiments using the Open Directory Project (ODP) and used a Mean Reciprocal Rank (MRR)-based evaluation to compare the performances of search engines. The problem with their work is that ODP does not treat all languages equally. Mahmoudi et al. [19] performed the same work for the Persian language. The study in [20] is based on finding the commonness or overlap between a query and the results given by a search engine. The authors first select a group of queries and then execute these queries on the SSEs one by one to select a certain DCV. Then, they compare the result lists of one search engine with those of the other search engines. The sum of the top results in common is used as the score. This score is used for the ranking of results (reference counting). The evaluation matrix they used is based on R-precision and precision. Joachims et al.
in [21] proposed a semiautomatic approach. With the help of a hired workforce, they collected click data. However, this human source does not offer explicit relevance judgments; the hired workforce is used to send queries and click on results for any kind of information need. They used two search engines (Google and MSN Search) and showed mixed results of both for the searches initiated by them. In AWSEEM 2004 [10], the authors gathered some 25 information needs from users and then prepared queries to represent the information needs. Once they had prepared the queries, they executed all of them on SSEs (All the Web, AltaVista, HotBot, Infoseek, Lycos, MSN, Netscape, and Yahoo). After executing all the queries, they prepared a pool of 200 results and re-ranked them based on relevancy. They applied a DCV on the re-ranked results and considered the top results relevant to the need (top-t = top-s). Then, they calculated the correlation between top-t and top-s, and also between the manual and automatic schemes. The main drawback of their approach is that they only used textual content to build the pseudo relevance set. In [22], the authors constructed a training dataset based on user behavior. They used this for learning ranking functions for IR. The implicit feedback from users proved very helpful in evaluating the relevance of returned web pages. This implicit feedback was then compared with explicit feedback. The authors used click-through information as a preference signal for results; while checking relevance in their study, they found that implicit relevance signals are less consistent. A study in [23] used user behaviors as a sign of relevance. The authors considered distinct levels for a relevance model in their work, including saving and scrolling. Based on these user actions, they decide the relevancy. This observation makes the approach powerful; however, it still needs human behavior, which is sometimes difficult to arrange, and the approach is also very difficult to implement. In [11], the authors used data fusion based on voting
methods such as Borda Count and Condorcet for the ranking of results. They prepared a pseudo relevant set with the help of data fusion. They mixed the results of the search engines using different methods and considered the top-t as the relevant ones. Based on these top-t documents, they evaluated the effectiveness of IR systems. They also considered the overlap among search engine results. The study in [24] is about the classification of queries. They assume, mostly for a single query, that if users click on a single result then it is a navigational query; otherwise, it belongs to other classes. The study in [25] measures search engine performance with the help of navigational queries and user behavioral data. For each navigational query, there is one right answer, and they calculate the MRR measure. The problems with their study are that they used only navigational queries and compared very few search engines. In [26], the authors combined different rankings for documents to find a new rank. They use a learning mechanism for finding ranking rules. They find results for queries using four selected methods and then cross-check these results against the learned ranking rules. The work in [27] uses the scheme of AWSEEM and combines it with PageRank and AlexaRank to improve the scores. The studies in [28,29] represent an effort to find trends based on the association between authors and institutions for IoT-related topics. The article [30] is a study regarding adaptive smart homes using user-created feedback, and [31] discusses the evaluation of self-monitoring devices for clinical purposes. Singh and Dwivedi in [32] study the reasons (e.g., only one document in the corpus, or the number of documents containing the query terms being the same as the total number of documents) behind the failure of cosine similarity. After studying the impacts of these factors, they proposed an enhanced form of the vector space model to improve efficiency. In [33], Lewandowski et al.
take 1000 randomly selected informational and 1000 navigational queries for a major German SE and compare their performance with Google and Bing. According to their results, Google is able to establish the precise response in 95.3% of cases, while Bing only yields the right response 76.6% of the time for navigational queries.

Sustainable Optimal Rank Preparation for Relative Assessment of IoT-Related Searches
Most users, whether from academia or the general public, are interested in knowledge of IoT-related aspects. However, because of their limited domain knowledge, they face difficulties when formulating queries. Therefore, they seek help from search engines to get the desired results. Modern search engines take seed queries as input and fetch related web pages based on some scoring criteria, ranking them accordingly. These rank scores are known as relevancy scores, which are calculated using different methods. While evaluating search engine performance, we evaluate the returned rank lists using measures such as MRR, DCG, nDCG, and MAP@n.
As we described in the previous section, PRF is an automatic way to obtain the relevance between a query and documents. This section describes the proposed scheme for checking relevance, which is based on a PRF-related approach. The proposed scheme is shown in Figure 1.
In the proposed scheme, we use 60 seed queries against 10 selected topics. These seed queries are prepared with the help of a group of students. First, we need to build a page corpus for query rewriting. Therefore, the proposed PRF scheme executes these seed queries on two search engines, Bing and Yandex, and prepares a page corpus. For this purpose, we manually selected 10 pages for each seed query and stored them in word vector form. Afterward, we rewrite these seed queries with the help of the page corpus, select the top-2 queries, and execute them on the two search engines. During our experiment, we found that this second execution provides more relevant pages than the seed query. Afterward, the proposed scheme stored the web pages using different document cutoff values (DCV) {5, 10, and 20} for each executed top-2 query. These stored pages are used to calculate the relevancy score based on text similarity and the boosting parameters of each page. Using this calculated pseudo relevance score, we re-ranked the retrieved web pages and prepared the sustainable optimal rank list for PRF. Afterward, a comparison between the sustainable optimal rank list and the search engine result lists is performed. This is how the relative performance evaluation of the search engines is conducted. We evaluate the performance with the help of DCG, nDCG, and MAP@n.

Formulation of Information Need
In the past few years, the fields of the IoT and Big Data have evolved very rapidly, and their combination has increased the growth of data. Every other day, some new research or breakthrough appears on the web, and a new company announces a new IoT product. Therefore, the topicality of IoT domains keeps changing every day. Thus, it is difficult to cover every query and topic in this research.
For this work, we selected the top 10 IoT topics shown in the study by IoT ANALYTICS [9]. These topics are listed in Table 2, according to the number of searches made on Google. We selected 10 students and taught them about these topics for two months. Afterward, we asked them to prepare at least one query about each topic as an information need for submission to the search engines. Hence, in this way, the students initially prepared 100 information needs. After removing similar or converging queries, we selected 60 seed queries for the experiments.

Query Rewriting
Queries given by inexperienced users are usually short and ineffective [13]. After obtaining the seed queries, we expanded them in an offline manner using PRF. PRF usually uses the terms for retrieval that are most frequent (term distribution) in the top-ranked relevant documents [12]. For example, suppose a student is interested in buying a smart light for his or her smart home. He or she will submit a query about "smart light" on Bing or Yandex to search the Internet, as shown in Figure 2. According to Figure 2, for the Bing search engine, the top-3 results are not relevant to the information need, and for Yandex the top result is not relevant. Consequently, it can be assumed that the rank of the results for this information need is not correct. Hence, the query should be rephrased to achieve better results. The proposed PRF scheme rewrites the seed queries using the page corpus; we predict that, by doing so, better results can be fetched to the top of the sustainable optimal rank list. Suitably, the PRF extends the seed query "smart light" with attributes such as "Company = TP-Link, Detail: Smart LED Light Bulb, Network type = Wi-Fi, Color = Dimmable White, Voltage = 50 W Equivalent, Price: $19.99." For the experimental work, we prepared an attribute corpus for all such information needs. The corpus stores each attribute of an IoT-related entity in a comma-separated-value text file. The number of attributes in each definition can be different. This work extends the queries by adding terms one after the other, producing different candidate queries.
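The attribute-based query extension described above can be sketched as follows; the `candidate_queries` helper and the attribute values are illustrative (drawn from the smart-light example), not the paper's exact implementation:

```python
def candidate_queries(seed, attributes):
    """Extend the seed query by appending attribute terms one after the
    other, yielding a growing list of candidate queries."""
    candidates, current = [], seed
    for attr in attributes:
        current = f"{current} {attr}"
        candidates.append(current)
    return candidates

# Attributes as they might appear in one row of the comma-separated
# attribute corpus for the "smart light" information need.
attrs = ["TP-Link", "Smart LED Light Bulb", "Wi-Fi", "Dimmable White"]
queries = candidate_queries("smart light", attrs)
```

Each prefix of the attribute list yields one candidate query, so a definition with n attributes produces n candidates.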

Execution of Queries
Once we have formed the candidate query set by rewriting the seed queries, the execution of these rewritten queries on Bing and Yandex fetches the result lists for the different DCV values {5, 10, and 20}. This execution of queries gives us the pool of web pages used in selecting the top-K queries and top URLs and in checking similarity with the query. After fetching the results, every list for an executed query is stored in the database. For example, if the scheme executed a rewritten query for smart light that has the keywords "TP-Link" + "Smart LED Light Bulb" on Bing, it returns the list shown in Table 3. At DCV 5, the five URLs shown are returned by Bing. In Table 3, the label shows the rank assigned by the search engine to each URL. Among the returned URLs, u1, u3 and u4 point to the same link. In an analogous manner, we store the results returned for each rewritten query.
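Storing each result list at the selected DCVs can be sketched as below; the dictionary layout is an assumption for illustration, not the paper's actual database schema:

```python
def store_results(results, dcvs=(5, 10, 20)):
    """Slice a search engine's ranked URL list at each document cutoff
    value (DCV) and record the rank label the engine assigned to each
    URL, mirroring the per-query lists stored in the database."""
    cut = {}
    for dcv in dcvs:
        cut[dcv] = [{"rank": i + 1, "url": u}
                    for i, u in enumerate(results[:dcv])]
    return cut
```

A list shorter than a cutoff simply yields all of its URLs at that DCV.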

Selecting Top-K Queries
After executing all the queries, we prepared a database of all the information needs. Then, PRF extracts the unique URLs and, using the database, counts how many queries return each URL. Subsequently, we re-ranked the URLs based on their frequency count. In this work, we selected the top-K rewritten queries using this frequency score, with the help of a random walk on a query-URL bipartite graph. Bipartite graphs represent a mapping between the query set Q and the URL set U. The scheme models the web search as a query-URL bipartite graph G = (Q, U, E), where the nodes are divided into two separate sets: Q, the set of queries, and U, the set of URLs. Every edge in set E links a query and a URL; there is an edge between a query and a URL if the query fetches the web page. Using this many-to-many relationship between queries and URLs, PRF creates the bipartite graph. In the proposed scoring scheme, each record has a query and URLs with their proper degrees. In the bipartite graph, the edges in E represent the confidence values of the query and pages in the search results. Figure 3 shows an example of the bipartite graph. In this probabilistic retrieval model, the scheme fulfills an information need with a query set Q. Given an undirected graph G = (Q, U, E) with l = |Q| queries, m = |U| URLs, and n = |E| edges, the natural random walk starts with one of these queries, qi, chosen at random, and then selects one of the fetched URLs using the transition matrix M, as shown in Table 4.
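The query-URL bipartite graph and its row-stochastic transition probabilities P(ui|qi) = 1/dqi can be sketched as follows, with hypothetical query and URL names; this is a minimal illustration of the construction, not the paper's implementation:

```python
from collections import defaultdict

def build_bipartite(fetches):
    """fetches maps each query to the URLs it returned; an edge links a
    query to every URL it fetched. Returns the forward (query -> URLs)
    and backward (URL -> queries) adjacency maps."""
    edges = {q: set(urls) for q, urls in fetches.items()}
    back = defaultdict(set)
    for q, urls in edges.items():
        for u in urls:
            back[u].add(q)
    return edges, back

def transition_probs(edges):
    """Row-stochastic transition probabilities P(u | q) = 1 / d_q for
    every URL u connected to query q, where d_q is q's degree."""
    return {q: {u: 1.0 / len(urls) for u in urls}
            for q, urls in edges.items() if urls}
```

Each row of the resulting matrix sums to 1, matching the row-stochastic property stated below.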

Table 4. Transition matrix M shows the random walk probabilities for the query-URL bipartite graph shown in Figure 3, which has four URLs and five queries.

According to the example, the random walk starts by choosing a random query qi from set Q, and then it selects a random URL ui connected to qi; from this URL ui, it then selects another connected query, and so on. This process of query-URL transitions is repeated, or it stops at a query or URL node, based on the value of n. Moreover, the scheme limits the number of transitions and keeps them according to the information need. It is also a simplifying assumption that the proposed random walk visits an edge only once; that is, once an edge connecting two nodes has been traversed, the scheme will choose a new random edge the next time. The random walk follows Algorithm 1 and executes in the following manner, where the notation is heavily influenced by the study by Jaakkola and Szummer [34]. Let U be the set of URLs and Q the set of queries; the scheme constructs a bipartite graph G with edge set E and assigns weights to the queries and URLs given by the incoming edge counts (degrees dqi and dui) of each node. The scheme defines transition probabilities Pt+1|t(ui|qi) from query qi to URL ui and vice versa, so that:

Pt+1|t(ui|qi) = 1/dqi if (qi, ui) ∈ E, and 0 otherwise, (1)

where i ranges over all URL nodes connected with the given query. The notation Pt2|t1(qi|ui) denotes the transition probability from node ui at time t1 to node qi at time t2; the transition probabilities Pt+1|t(ui|qi) and Pt+1|t(qi|ui) are not the same because the number of degrees varies across nodes. We organize the transition probabilities as a matrix M whose ui:qi entry is Pt+1|t(ui|qi). The matrix M is row-stochastic, such that its rows sum to 1.
Consequently, we perform the random walk and calculate the probability of transitioning from node qi to node ui in t steps, denoted Pt|t−1(ui|qi), which is given by the (qi, ui) entry of M^t. Algorithm 1 is the pseudocode of the random walk process on the bipartite graph. According to Algorithm 1, a single node is visited only once. Matrix R stores the degrees of the URLs for the observed qi:ui transitions. Both of these data structures help in the confidence score calculation for each query and URL. The confidence of a query or URL is the ratio of the co-occurrence of the query/URL pair in the collection of results. The ratio value is always ≤ 1, which makes it easier to process than the actual high co-occurrence values. The confidence score of query qi is calculated as follows:

conf(qi) = (Σ u∈U(qi) du) / (Σ q∈Q Σ u∈U(q) du), (2)

where the numerator is the sum of the degrees of all URLs fetched by qi and the denominator is the sum of the degrees of all URLs fetched by all queries.

Algorithm 1: Random walk on the query-URL bipartite graph
1: while the transition limit is not reached do
2:   if the current node is a query qi then
3:     ui = random(URL fetched by qi)
4:     Us(cell) = dui;  // Store the degree of the URL both in Us and in matrix R against qi:ui
5:   else
6:     ui = random(URL)
7:     Us(cell) = dui;
8:     qi+1 = random(q);  // Assign one random query that fetches the URL
9:   end if
10: end while

During each iteration, the random walk visits one of the query or URL nodes and updates the matrix R. Here, the scheme fills the cells using the source and destination node names qi:ui and enters the degree of the URL in the proper cell. Using the degrees stored in R, we calculate the confidence of each query and rank the queries according to their confidence scores. Table 5 shows that the top-2 queries in the random walk are q3 and q4, with scores of 1. Note: here, q3 and q4 are suggested for the seed query "smart light", as shown by the right-most columns.
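The confidence score of Equation (2) — the sum of the degrees of the URLs fetched by a query over the sum of the degrees of the URLs fetched by all queries — can be sketched as below; the function and variable names are illustrative:

```python
from collections import defaultdict

def confidence_scores(fetches):
    """Confidence of each query per Equation (2): the degree d_u of a
    URL is the number of queries fetching it; the numerator sums d_u
    over the URLs a query fetched, the denominator over all fetches."""
    degree = defaultdict(int)  # d_u: how many queries fetch URL u
    for urls in fetches.values():
        for u in urls:
            degree[u] += 1
    denom = sum(degree[u] for urls in fetches.values() for u in urls)
    return {q: sum(degree[u] for u in urls) / denom
            for q, urls in fetches.items()}
```

Queries whose URLs are fetched by many other queries receive higher confidence, matching the frequency-based re-ranking described above.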

Text Similarity Score
The proposed scheme uses the vector space model to calculate a text similarity score between the query and the text extracted from web pages and link URLs. The scheme starts with seed-query processing, as shown in Figure 4. First, it recognizes the language of the query and tokenizes the seed query. It then performs preprocessing such as spelling correction and segmentation, and expands the seed query using the rewriting technique explained above. Once the set of candidate queries is created, the scheme applies the top-query selection technique and selects the top-2 rewritten queries. Only these two queries are used to check text similarity against the page text and fetched URLs. Subsequently, the scheme classifies the query, finds location- and newness-related words in it, and finally identifies the entity the user is searching for.
Nowadays, search engines return results in different sections, such as related videos, news, and blogs. We therefore filter the retrieved URLs and consider only the unique web page URLs returned for the query, and then normalize them. URL normalization is the process of converting URLs into a standard, consistent format. Uniform resource locators (URLs) carry a lot of information. For example, the URL "https://www.amazon.com/TP-Link-Dimmable-Equivalent-Assistant-LB100/dp/B01HXM8XF6" suggests an "Amazon" page for the company "TP-Link", light type "Dimmable", etc. After normalization, URL segments such as "www.amazon.com" are tokenized into "www", "amazon", and "com"; from these tokens we derive features such as token lengths, orthographic features, and sequential n-grams and bi-grams. We tokenize URLs using non-alphanumeric characters as boundaries and remove English stop words, web-specific stop words, and file and domain extensions. We then generate n-grams of length 2 and 3 for the URLs and web page text, and remove punctuation marks ("," "." etc.). The n-grams extracted from URLs are used to check similarity with the query; n-grams that consist only of stop words or that contain no alphanumeric character (e.g., "at the" or "#,@") are removed. We reduce sparsity by constructing a vector over query terms only, ignoring the other terms in a web document. Bag-of-words approaches struggle when terms contain spelling mistakes (e.g., with unigrams); hence, we use n-grams rather than a collection of unigrams, counting occurrences of pairs of consecutive words.
From each URL in the list, the scheme extracts the page text using the Alchemy API (http://www.alchemyapi.com/). Each obtainable web page and URL text is saved separately to build a corpus for each IoT-related information need. Dead links are treated as irrelevant during this process. A page URL retrieved by multiple search queries is considered only once for the text similarity check, as its frequency has already been saved. The extracted text from all unique URLs and pages is stored as u_i and T_wp, and the scheme performs basic text processing on each T_wp and URL u_i. Only pages fetched by all queries, for all three selected DCV values {5, 10, and 20}, are used for the similarity check.
The scheme prepares a TF-IDF term matrix for both the query and the text of the web pages. TF-IDF is one of the oldest and best-known approaches for representing each query and document/web page as a vector. Using this representation, the PRF scheme calculates the cosine similarity between the information need, described by an attribute-oriented definition, and the text documents built from the web page text and link URLs. We remove stop words from each web page text but do not perform stemming, because stemming can change the intention or meaning. The similarities between the query and a URL, and between the query and T_wp, are named SS_u (similarity score of the URL) and SS_t (similarity score of the text), respectively. Equations (3) and (4) show the cosine similarity scores between query q_i, URL_i, and T_wp.
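The URL tokenization and n-gram filtering described above can be sketched as follows. The stop-word and extension lists are illustrative placeholders, not the paper's actual lists:

```python
import re

# Hypothetical stop-word and extension lists; the paper's exact lists are not given.
STOP_WORDS = {"the", "a", "an", "at", "in", "of", "and", "www", "http", "https"}
EXTENSIONS = {"com", "org", "net", "html", "php"}

def tokenize_url(url):
    """Split a URL on non-alphanumeric boundaries and drop stop words
    and file/domain extensions, as described in the text."""
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", url) if t]
    return [t for t in tokens if t not in STOP_WORDS and t not in EXTENSIONS]

def ngrams(tokens, n):
    """Sequential n-grams over the token list; n-grams made only of stop
    words or lacking any alphanumeric character are discarded."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [g for g in grams
            if any(w not in STOP_WORDS for w in g.split())
            and any(c.isalnum() for c in g)]

url = "https://www.amazon.com/TP-Link-Dimmable-Equivalent-Assistant-LB100/dp/B01HXM8XF6"
tokens = tokenize_url(url)
bigrams = ngrams(tokens, 2)
```

On the example URL above, this yields tokens such as "amazon", "tp", "link", and "dimmable", with "www", "https", and "com" filtered out.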
We use three factors to calculate the relevance score: query-page text similarity, query-URL text similarity, and the normalized URL frequency (F_u). Equation (5) shows the normalized F_u calculation: we divide the degree of u_i by the sum of all URL degrees. For example, the degree of URL u_1 in Table 5 is 5; we normalize it by dividing by 17, the sum of all URL degrees. Equation (6) shows the pseudo-relevance score (PR_s) for the pages fetched for an IoT-related information need.
Using PR_s, we create an initial rank of all the fetched web pages and compare it with the rank returned by the search engines.
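The three-factor relevance score can be illustrated with a short sketch. The equal (unweighted) combination of SS_u, SS_t, and F_u is an assumption, since the exact form of Equation (6) is not reproduced in this excerpt:

```python
import math
from collections import Counter

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in common)
    na = math.sqrt(sum(v * v for v in vec_a.values()))
    nb = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pseudo_relevance(query_terms, url_terms, page_terms, degree, total_degree):
    """Sketch of the PR_s combination: query-URL similarity (SS_u),
    query-page similarity (SS_t), and normalized URL frequency F_u.
    The unweighted sum here is an assumption about Equation (6)."""
    q = Counter(query_terms)
    ss_u = cosine(q, Counter(url_terms))
    ss_t = cosine(q, Counter(page_terms))
    f_u = degree / total_degree  # e.g., 5 / 17 for u_1 in Table 5
    return ss_u + ss_t + f_u
```

For the Table 5 example, F_u for u_1 is 5/17 ≈ 0.294, which is then combined with the two cosine scores.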

Boosting-Based Page Re-Rank
The initial rank based on PR_s considers all web pages fetched using the query set. The proposed PRF scheme then selects the top-n results from the PR_s rank list and applies boosting criteria based on the newness and locality of each web page. We do so for a concrete reason: the number of fetched pages can be very high, and applying boosting criteria to every fetched page would be time-consuming, so we apply boosting-based re-ranking only to the top-n results. While experimenting with different DCVs, we found that in most cases the number of irrelevant URLs increases drastically after the 20th URL. Therefore, we re-rank only the top 20 URLs after calculating the boosting scores.

Newness-Based Re-Rank
Newness-based re-ranking is built on the time feature: the most recent URLs should be ranked before older pages. We first determine whether the provided query is time-sensitive. Users often search for things or documents based on time. For example, a user searching for the "latest Smart Lights" enters those keywords as a query; the word "latest" shows that the query is time-sensitive, and the resulting rank should boost the latest pages. If the search engine ranks an older or irrelevant page higher, it creates a bad image of the search engine in the user's mind. We use a query classification method to divide queries into time-sensitive and non-time-sensitive classes. Text classifiers regularly represent a document as a bag of words (BOW). As a simple illustration, consider a query q_i whose class is given by C_t. For queries there are two classes, C_t = time-sensitive and C_nt = non-time-sensitive. We assign q_i the class with the highest posterior probability P(c_i|q_i), which can be re-expressed using Bayes' theorem:

P(c_i|q_i) = P(q_i|c_i) P(c_i) / P(q_i) ∝ P(q_i|c_i) P(c_i) (7)

If we have a vocabulary V_t with |V_t| word types, then the feature vector dimension is d = |V_t|. The Bernoulli query model represents a query as a feature vector with binary elements, taking the value 1 if the corresponding time-sensitive word exists in the query q_i and 0 otherwise. Consider the vocabulary example V_t = {latest, up-to-date, state-of-the-art, advanced, hot, modern, new, newest, etc.}. For the query "latest Smart Lights", the Bernoulli model is q_b = (1,0,0). Using this information, we can classify the query with the help of Equation (7). This classification helps us decide whether to add the time-stamp-based factor while ranking. Equation (8) shows the time-sensitive relevancy score calculation. Here, PR_st is the pseudo-relevance score for URL u_i, PR_s is the initial relevancy score, and s_new = (α or β) is the time-stamp-based score for u_i, where α is a boosting factor whose value for newness is 1 and β is a dampening factor whose value is −1. In Equation (8), C_t denotes the class of time-sensitive queries.
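The Bernoulli time-sensitivity check above can be sketched as a minimal vocabulary-membership test. For brevity, this reduces the classifier to "any V_t term present implies time-sensitive" rather than a fully trained Naive Bayes with learned priors:

```python
# The vocabulary V_t follows the example in the text and is not exhaustive.
V_T = ["latest", "up-to-date", "state-of-the-art", "advanced",
       "hot", "modern", "new", "newest"]

def bernoulli_features(query, vocab):
    """Binary feature vector: 1 if the vocabulary word occurs in the query."""
    words = query.lower().split()
    return [1 if term in words else 0 for term in vocab]

def is_time_sensitive(query, vocab=V_T):
    """Classify as time-sensitive when any V_t term appears, i.e. the
    posterior P(C_t | q) dominates under the Bernoulli model; a full
    Naive Bayes with estimated priors is omitted for brevity."""
    return any(bernoulli_features(query, vocab))
```

For the query "latest Smart Lights", the feature for "latest" is 1, so the query is classified as time-sensitive.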
Hence, we propose a Page Rank Boosting (PRB) scheme for time-sensitive queries. In PRB, we first check the time-sensitive nature of the query and, if it is time-sensitive, apply PRB to the top-20 pages from the initial rank.
Table 6 shows the effects of boosting, and the promotion or demotion of differently labeled documents from the initial rank list. According to the scheme, if no time-sensitive detail is attached to a web page, the dampening factor damps it; if a page is labeled somewhat useful and carries the time stamp new, PRB boosts its score to useful; and if the time stamp is old, PRB demotes it to not useful. In the PRB scheme, a page labeled not useful belongs to the irrelevant class, so PRB neither promotes nor demotes it. PRB helps considerably in moving the DCG of the rank list toward the ideal DCG. The numbers of relevancy labels (3) and time-sensitive stamps (3) are equal. To check the newness of web pages, PRB uses an in-house scraper to find dates in the web page HTML code. This work uses the latest date in the code as the freshness of the page rather than its inception date. If the page code does not return a date within the last year, we consider the page old. Table 6. Boosting scores for queries having a time stamp, applied to the initial scores given by the PRB scheme.

| Initial Label   | New                      | Old                              | Not Given                        |
|-----------------|--------------------------|----------------------------------|----------------------------------|
| Useful          | Useful (no change)       | Somewhat useful (damped using β) | Somewhat useful (damped using β) |
| Somewhat useful | Useful (boosted using α) | Not useful (damped using β)      | Not useful (damped using β)      |
| Not useful      | Not useful (no change)   | Not useful (no change)           | Not useful (no change)           |
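The promotion and demotion rules of Table 6 can be expressed as a small function. The numeric label encoding (2 = useful, 1 = somewhat useful, 0 = not useful) follows the relevance labels used later in the paper; the clipping behavior at the label boundaries is an assumption:

```python
# Sketch of the Table 6 boosting rules for time-sensitive queries.
# alpha (+1) boosts pages stamped "new"; beta (-1) damps "old"/undated ones.
ALPHA, BETA = 1, -1

def boost_label(label, stamp):
    """Return the adjusted relevance label after newness-based boosting.
    Pages labeled not useful (0) are left unchanged, per Table 6."""
    if label == 0:
        return 0
    if stamp == "new":
        # Somewhat useful (1) is boosted to useful; useful (2) stays.
        return min(2, label + ALPHA) if label == 1 else label
    # "old" or a missing date is damped, never below not useful.
    return max(0, label + BETA)
```

This reproduces the table: a somewhat useful page stamped new becomes useful, while a useful page with an old or missing date is damped to somewhat useful.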

Geo-Sensitive-Based Re-Rank
Text similarity and time-sensitive ranking of results help the user; however, when users are looking for retail stores or services around a particular location, they are not sufficient. Hence, we propose a PRB scheme for geo-sensitive queries. In this scheme, PRB first checks the geo-sensitive nature of the query; if the query is geo-sensitive, PRB is applied to the top-20 pages from the initial rank. For example, a user may be interested in buying some IoT-related item near his geographic location, but while searching he gets results that are not around him. PRB names such queries geo-sensitive queries. Location-mismatched results create an adverse impression of the search engine on the user. Hence, we discuss a location-sensitive boosting method for pages. As with time-sensitive queries, PRB boosts only those queries that contain a location-determining word. For example, if query q_i, "smart lights available in Daegu", contains a word such as a city name, or a phrase such as "nearest shop of smart lights", PRB classifies it as a GEO query. As with newness-based boosting, consider a query q_i whose class is given by C_g. For queries there are two classes, C_g = geo-sensitive and C_ng = non-geo-sensitive. PRB classifies q_i as the GEO class if it has the highest posterior probability P(c_g|q_i), which can be re-expressed using Bayes' theorem:

P(c_g|q_i) = P(q_i|c_g) P(c_g) / P(q_i) ∝ P(q_i|c_g) P(c_g) (9)

If PRB has a vocabulary V_g with |V_g| word types, then the feature vector dimension is d = |V_g|. The Bernoulli query model represents a query as a feature vector with binary elements, taking the value 1 if the corresponding location-sensitive word exists in the query q_i and 0 otherwise. Consider the vocabulary example V_g = {adjacent location, local, nearest, nearby, besides, site, locality, Daegu, Seoul, etc.}. For the query "Smart Lights available in Daegu", the Bernoulli model is q_b = (0,0,0,0,1). Using this model, PRB can classify the query with the help of Equation (9). This classification helps us decide whether PRB should add the location-based factor while ranking.
Equation (10) shows the geo-sensitive relevancy score calculation. Here, PR_sg is the pseudo-relevance score for URL u_i, PR_s is the initial relevancy score, and s_geo = (α or β) is the location-based score for u_i, where α is a boosting factor whose value for location equals the PR_s label and β is a dampening factor whose value is −PR_s. In Equation (10), C_g denotes the class of geo-sensitive queries.
To improve the ranking for geo-sensitive queries, PRB extracts locations from the query and URLs. If the query holds a location-oriented label such as "Daegu", PRB extracts the location-related information for URLs from the "contact us" section of the website text, such as the company address; if the query does not contain a city name, PRB extracts IP locations for matching. PRB stores the location information in a separate vector. Once we have the locations for both queries and URLs, PRB can easily compute the similarity cosine(q_i, u_i). When this cosine is 0, there is no location match and the page should be demoted; otherwise, the page should be promoted in the rank according to Table 7. The proposed boosting of the page rank list for geo-sensitive queries starts from the initial rank of the page. PRB performs a list-wise comparison of URLs while boosting. For example, suppose p_1 is the top-ranked page in the initial rank and is labeled useful. If p_1 has no location match, it should be penalized, so PRB applies the dampening factor β = −PR_s, which for a useful page is −2. If p_1 shows a location match, there is no change, as it is already labeled useful. The same kind of dampening is performed for pages with all labels. Table 7. Boosting scores for location-sensitive queries, applied to the initial scores given by the PRB scheme.
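The location-matching and boosting logic above might be sketched as follows. Location extraction from the query and the "contact us" text is omitted, and the function names are hypothetical:

```python
# Hedged sketch of geo-sensitive boosting: a cosine match between the
# location terms of a query and those scraped from a page.
def location_match(query_locations, page_locations):
    """Cosine over binary location vectors; 0.0 means no shared location."""
    q, p = set(query_locations), set(page_locations)
    if not q or not p:
        return 0.0
    return len(q & p) / ((len(q) ** 0.5) * (len(p) ** 0.5))

def geo_boost(initial_score, label_value, query_locs, page_locs):
    """Promote on a location match (alpha = PR_s label), otherwise damp
    by the same magnitude (beta = -PR_s), per the description above."""
    if location_match(query_locs, page_locs) > 0:
        return initial_score + label_value
    return initial_score - label_value
```

For a useful page (label 2) with no location match, the score is damped by 2, mirroring the p_1 example in the text.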

Dataset and Evaluation Mechanism
IoT is one of the emerging fields of the modern era, and to evaluate searches related to it we need search query logs from search engines. Nowadays, however, because of privacy and commercial factors, it is problematic to acquire real query logs from search engines such as Microsoft, Yahoo, Google, or Bing. Therefore, we prepared an in-house data set. We selected 10 students and briefed them about the 10 selected topics mentioned in Section 3.1. We then asked each of them to prepare at least one query about each topic as an information need for submission to the search engine. Accordingly, the students initially prepared 100 information needs. After removing similar queries, we selected 60 seed queries for the experiments. We then rewrote these queries and executed them on Bing and Yandex to form a pool of web pages. We executed these queries on all SSE and extracted the unique URLs, depending on the DCV, against each query in common. Consequently, the pool consists of 60 × 100 = 6000 web pages. For the initial ranking, we use the top-2 queries and pages and compute the text similarities between them.
To check the relative performance of the search engines, we compared them with the sustainable optimal rank list we prepared. In this work, we aimed to achieve an optimal rank list for testing relevance. We checked the sustainability of the optimal rank list with the help of relevance judgments prepared by students for all web pages in the pool. We gave the returned web pages to the students according to their information needs and asked them to label the pages. To evaluate the accuracy of the optimal rank list, we used the labeled web pages annotated by the students as a baseline. The students labeled pages as highly useful (2), somewhat useful (1), or not useful (0). A search system that performs close to the sustainable optimal rank list with high accuracy receives a higher evaluation score, and we expect it to attract more users. We use DCG, nDCG, and MAP@n for evaluation, checking the performance of each search engine relative to the sustainable optimal rank list at each step. DCG and nDCG rate a list higher when it fetches the largest number of relevant pages and ranks them according to their relevance scores. In the results, DCG and nDCG show the averages over all 60 information needs for each set of experiments. We obtained absolute relevance judgments from a group of students for all query-URL pairs of the dataset. We report the DCG and nDCG scores for the SSE and the rank lists prepared by the methods for the same DCV values {5, 10, and 20}. We also calculated the MAP@n scores for all three schemes at the selected DCV values, computing the precision score for each query and then its mean average.
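The evaluation measures can be sketched directly from their standard definitions. The discount log2(i+1) is the common DCG formulation and is assumed to match the paper's; "relevant" for MAP@n is assumed to mean any label greater than 0:

```python
import math

def dcg(labels, k):
    """Discounted cumulative gain over the top-k graded labels
    (2 = highly useful, 1 = somewhat useful, 0 = not useful)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg(labels, k):
    """DCG normalized by the ideal DCG (labels sorted descending)."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal else 0.0

def average_precision(labels, k):
    """AP@k treating any label > 0 as relevant; MAP@n is the mean of
    this value over all queries."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels[:k]):
        if rel > 0:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0
```

A perfectly ordered list yields nDCG = 1.0, which is why the ideal DCG in Table 8 serves as the normalization target.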

Results and Discussion
The proposed scheme uses query rewriting as the preliminary theme; after executing the rewritten queries, we prepared a pool using the DCV values {5, 10, and 20}. This variation helps demonstrate the efficiency of a search engine at different DCVs. We took the result lists and initially used student evaluation to check the relevancy of query-URL pairs for each search engine result at each DCV, using the student judgments as the gold standard. Table 8 presents the average results for both search engines and the Optimal Rank in the form of DCG, iDCG, and nDCG based on the student assessment. These results are calculated for the 60 information needs provided by the students and then averaged. Table 8 shows the DCG values for Bing and Yandex in comparison with the Optimal Rank. The results clearly show that Yandex performs better than Bing for the given information needs: Yandex fetched more relevant documents than Bing and gave them more appropriate ranks for all three DCV values. Comparing Yandex with the Optimal Rank, we found that the rewritten queries fetch more relevant results than either search engine using the seed query. However, the rank is still not ideal, as the ideal DCG values in Table 8 show. The ideal DCG of an IR system is the rank achieved after sorting results by their evaluation labels in descending order. Consequently, none of the lists achieves this ideal rank, but the Optimal Rank performs better than the others. Table 8 also shows the normalized DCG values achieved by all three. Using the DCV values {5, 10, and 20}, the results in Table 8 clearly show that as the DCV increases, the number of relevant fetched documents decreases, as reflected by the lower values in the nDCG_20 column.
Once we calculated the DCG and nDCG results for the 60 queries, we filtered the queries that contain a time stamp. Out of the 60 queries, only seven distinct student queries contain time-related words. We extracted these queries using the classification described in Section 3.6.1. After separating them, we applied the same rewriting scheme, executed the rewritten queries on the search engines to prepare the pool of web pages, extracted the time-related information from the queries and web pages using the technique in Section 3.6.1, and checked the threshold gap of one year for boosting and dampening. With this information, we re-ranked the pool of pages and prepared a list.
Table 9 shows the results calculated after re-ranking based on the newness of web pages for time-sensitive queries. For time-sensitive queries, the performance of both SSE decreases; the DCG values in Table 9 show this effect clearly for the Optimal Rank, Bing, and Yandex at all DCV values. In terms of overall performance, Optimal Rank + New, the result achieved by re-ranking the web pages, performs best. During analysis of the result lists, we found that the re-ranked list shows the latest page at the top with 89% accuracy. Table 9 also shows the ideal DCG and nDCG achieved by all the information retrieval methods; here, Yandex performs relatively better than Bing.
Table 10 presents the results for queries based on location-related information. We classified the queries using the location vocabulary shown in Section 3.6.2. Out of the 60 queries, only five contain location-related keyword(s). We rewrote these five seed queries to generate more related queries and form a pool of web pages, and then calculated the scores shown in Table 10. Table 10 shows a smaller decrease in DCG for location-based queries than for time-sensitive queries. Here, Optimal Rank + GEO shows the impact of location-based re-ranking of results. As with the earlier results, this re-ranking performs better than the three other methods, including the basic optimal rank method. The same phenomenon appears for all three evaluation values: as the DCV increases, the overall performance of the lists decreases.
The next evaluation metric we used is MAP@n, which checks the number of relevant retrieved results. This metric represents the average precision obtained by different IR systems when searching all topics at the selected DCVs. Table 11 shows the MAP@n results achieved by the three systems. The results clearly show that the optimal rank achieves the highest MAP@n. Here, n equals the DCV value; at DCV_5, the proposed scheme achieved MAP@5 = 0.60, a clear improvement over the other systems' results. Figure 5 shows the MAP@n scores achieved for each topic separately. For all topics, the proposed scheme proves its worth. The highest MAP score, 0.62, is achieved for the smart home topic, owing to the topic's popularity among Internet users and its large number of indexed pages, while the lowest MAP score, 0.57, is for smart retail-related queries. Figure 5c shows the MAP scores for DCV_20; the proposed scheme underperforms Yandex for only two topics, smart retail and smart farming, because the poor-quality page corpus for these topics prevents it from finding the right terms for query rewriting.
The obtained statistics on search engine performance evaluation can be helpful for people working with search engines, keeping them informed and supporting rational decisions.

Conclusions and Future Work
In this work, we proposed a scheme for obtaining a sustainable optimal rank list that fetches search results with higher precision for IoT-related topics. The proposed scheme rewrites the seed query with the help of attribute terms extracted from the page corpus. We also used newness- and geo-sensitivity-based boosting and dampening of web pages for re-ranking. We verified the proposed scheme for sustainable optimal ranking with the help of an evaluation metric based on DCG, nDCG, and MAP@n, comparing the proposed scheme's results with two different search engines. The experimental results showed that the scheme achieves scores of MAP@5 = 0.60, DCG_5 = 4.43, and nDCG_5 = 0.95 for general queries; DCG_5 = 4.14 and nDCG_5 = 0.93 for time-stamp queries; and DCG_5 = 4.15 and nDCG_5 = 0.96 for geographical location-based queries. The outcomes validate the usefulness of the suggested system in helping users access IoT-relevant results. In the future, we plan to expand this work toward automatic performance evaluation of SSEs with the help of text similarity, semantic features, and clustering of results.

Figure 1 .
Figure 1. Automatic evaluation of the relative performance of search engines for IoT-related queries.


Figure 2 .
Figure 2. Search results for seed query "Smart Light" for (a) Yandex and (b) Bing search engines.


Figure 3 .
Figure 3. The bipartite graph for the sample table shown in Table 4, which has four URLs and five queries.


Algorithm 1 :
Random Walk on a Bipartite Graph
Input: q_0 = seed query, run size n = |E|.
Output: sample queries and their degrees; sample pages with degrees.
1: Matrix R and list Us are empty; // R and Us will store URL degrees produced by the random walk.
2: i = 1;
3: while i ≤ n do
4:     u_i = random(URL); // assign one random URL fetched by the query.
5:     if Us(cell) != Null then
6:
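A hedged Python sketch of the random walk on the query-URL bipartite graph follows. Since Algorithm 1 is truncated after step 5, the bookkeeping beyond that point is an assumption; `graph` maps each query to the URLs it fetched:

```python
import random

def random_walk(graph, seed_query, n_steps, rng=random.Random(0)):
    """Alternate query -> URL -> query hops over the bipartite graph,
    counting how often each URL and query is visited (sampled degrees).
    The degree counters stand in for the R matrix and Us list."""
    # Reverse index: URL -> queries that fetched it.
    url_to_queries = {}
    for q, urls in graph.items():
        for u in urls:
            url_to_queries.setdefault(u, []).append(q)
    url_degree, query_degree = {}, {}
    q = seed_query
    for _ in range(n_steps):
        u = rng.choice(graph[q])                  # one random URL fetched by q
        url_degree[u] = url_degree.get(u, 0) + 1  # update URL degree (R / Us)
        q = rng.choice(url_to_queries[u])         # hop back to a query
        query_degree[q] = query_degree.get(q, 0) + 1
    return url_degree, query_degree
```

URLs visited more often by the walk end up with higher sampled degrees, which is what the R matrix feeds into the top-2 query selection of Table 5.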

Figure 4 .
Figure 4. Query model used for text similarity in the proposed scheme.


Table 1 .
Symbols and abbreviations used in the paper.

Table 2 .
Topics used for the construction of information needs.Here, the topic ranking is given according to the number of searches on Google.

Table 5 .
Top-2 query suggestions using the random-walk-on-bipartite-graph scheme with the help of the R matrix.


Table 8 .
DCG, iDCG, and nDCG scores for the search engines against the seed queries provided by the students and optimal rank using rewriting.

Table 9 .
DCG, iDCG, and nDCG scores for the search engines against the seed queries given by the students and optimal rank + timestamp-based re-ranking of results.

Table 10 .
DCG, iDCG, and nDCG scores for the search engines against the seed queries provided by the students and optimal rank + geographic location based re-ranking of results.

Table 11 .
Mean average precision scores at different DCV values for all three search lists.