Open Access Article
Peer-Review Record

Spatial Keyword Query of Region-Of-Interest Based on the Distributed Representation of Point-Of-Interest

ISPRS Int. J. Geo-Inf. 2019, 8(6), 287; https://doi.org/10.3390/ijgi8060287
Reviewer 1: Shaohua Wang
Reviewer 2: Anonymous
Reviewer 3: Imran Memon
Received: 21 April 2019 / Revised: 18 June 2019 / Accepted: 19 June 2019 / Published: 20 June 2019

Round 1

Reviewer 1 Report

This paper presents a spatial keyword query of Region-Of-Interest based on the distributed representation of Point-Of-Interest, which is a timely and interesting topic and of relevance to the IJGI journal. The spatial keyword query for ROI is essential for spatial analytics.

More specific comments can be found in the following. Some areas where I would like to see more detail:

3. Problem Statement

- Table 2. The symbols list should be reformulated with detailed information. For example, pi is the i-th POI.

4. Methods

-The overall architecture should be the overall workflow.

5. Experiment and Results

-- Figure 7 and Figure 8 should be updated with the compass, legend, and map scale.

--Please give more information about formulation (11).

-- It is hard to see Q1 with Figure 10 and Figure 11. Please redraw the maps with labels.

-- Could you give more descriptions about Figure 14? What does a different color mean in Figure 14(a)?

6. Conclusions

 -- What are the limitations of your method?


Author Response

First of all, we would like to thank you for your review sincerely. According to your comments, we have carefully considered and revised our manuscript as follows:

 

Point 1: Problem Statement:--Table 2. Symbols list should be reformulated with detail information. For example pi is the i-th POI.

 

Response 1: We formulate the formal expressions in table 2 so that they have more detailed information.

Table 2. Symbols list

Symbol | Meaning
P      | a collection of POI
pi     | i-th POI
ti     | i-th type label
Q      | a keyword query group
qi     | i-th keyword in query
R      | a ROI

 

 

Point 2: Methods:--The overall architecture should be the overall workflow.

 

Response 2: We reconsider and modify the overall workflow according to our research objectives. On this basis, we rewrite the overall architecture in strict accordance with the process of our workflow, making the methodology more logical and clear.

4.1. The Overall Architecture

The workflow diagram of our method is shown in Figure 2. First of all, we describe the data used to train our POI vectors in Section 4.2 and consider it the input of the workflow. According to specific intentions, the workflow consists of three steps:

1. Firstly, the raw data, which contain a large number of POIs with type labels, are used to construct the corpus of POIs (a corpus being an organized, computer-readable collection of text or speech in the field of NLP). The Skip-Gram model of Word2Vec is trained on the POI corpus to express POIs as high-dimensional vectors, which can capture their semantic information and environmental state. The latent semantic association of POI embedding vectors is revealed through correlation analysis; (Section 4.3)

2. Secondly, a grid division of the research region is built to acquire the candidate ROIs, each of which is viewed as a POI set. The candidate ROIs can be described in vector form using the product of step 1 (POI embedding vectors). At the same time, two variant methods of generating candidate ROIs are introduced to make the ROI vector description more reasonable; (Section 4.4)

3. Finally, the products of the previous step, the candidate ROI vectors, are considered as the input of this step. They are used to calculate the relevance score, by a similarity formula, against the query vector corresponding to the user’s query keyword group Q. Based on the query mode, the top-K ROIs related to the user’s query are returned as the final result. (Section 4.5)

                                             

Figure 2. Workflow of the spatial keyword query of ROI with the distributed representation of POIs.

In the remainder of this section, we present more details about the specific process of these steps.
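Step 2 of this workflow can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the cell size, the toy coordinates, and the two-dimensional type vectors are made up, and each candidate ROI vector is taken as the plain average of the type vectors of its POIs (the two weighting variants mentioned in step 2 are not modeled here).

```python
from collections import defaultdict


def grid_cell(x, y, cell_size):
    """Map a coordinate to the (column, row) index of its grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def candidate_rois(pois, cell_size):
    """Group POIs by grid cell; the POI set of each cell is one candidate ROI."""
    cells = defaultdict(list)
    for x, y, type_label in pois:
        cells[grid_cell(x, y, cell_size)].append(type_label)
    return dict(cells)


def roi_vector(type_labels, type_vectors):
    """Average the type vectors of the POIs in one ROI (assumed aggregation)."""
    dim = len(next(iter(type_vectors.values())))
    total = [0.0] * dim
    for label in type_labels:
        for k, value in enumerate(type_vectors[label]):
            total[k] += value
    return [value / len(type_labels) for value in total]


# Hypothetical toy data: (x, y, type label) and made-up 2-D type vectors.
pois = [
    (10, 10, "school"),
    (20, 30, "residential"),
    (120, 40, "cinema"),
    (130, 60, "cafe"),
]
type_vectors = {
    "school": [1.0, 0.0],
    "residential": [0.8, 0.2],
    "cinema": [0.0, 1.0],
    "cafe": [0.2, 0.8],
}

rois = candidate_rois(pois, cell_size=100)
roi_vectors = {cell: roi_vector(labels, type_vectors) for cell, labels in rois.items()}
```

With these toy inputs the four POIs fall into two grid cells, so two candidate ROI vectors are produced. A larger `cell_size` yields fewer, coarser candidate ROIs.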

 

Point 3: Experiment and Results:-- Figure 7 and Figure 8 should be updated with the compass, legend, and map scale.

 

Response 3: We redraw Figure 7 and Figure 8 according to your suggestion and append additional information to make them more detailed.

 

Point 4: Experiment and Results:--Please give more information about formulation (11).

 

Response 4: There was a labeling error in the previously submitted manuscript: we marked equation (12) as (11). We have fixed this error in our manuscript. Your suggestion led us to realize that neither equation (11) nor equation (12) fully expressed the specific implementation and intention. We explain these two formulas in more detail in our manuscript as follows:

(1)

$$ r_{c,D} = \frac{\mathrm{cov}(X_{c,D},\, Y)}{\sigma_{X_{c,D}}\, \sigma_{Y}} \qquad (11) $$

The Pearson correlation coefficient between variables Xc,D and Y is defined as the quotient of their covariance and the product of their standard deviations. The absolute value of the correlation coefficient |rc,D| reveals the strength of the correlation: the closer the coefficient is to 1 or -1, the stronger the correlation; the closer it is to 0, the weaker the correlation. In our metric, a larger positive Pearson correlation coefficient rc,D indicates that the POI vectors of the iteration (c,D) are more in line with the original multi-level type association, which also means that they are of high quality at this setting.
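As a small worked illustration of this definition, the sketch below computes a Pearson correlation coefficient from two equal-length lists. The names `pearson_r`, `xs`, and `ys` are illustrative only; how the paired observations of Xc,D and Y are constructed is defined in the manuscript and is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so sums are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values: perfectly increasing and perfectly decreasing relationships.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
```

The two toy calls give coefficients at the two extremes of the range, matching the interpretation that values near 1 or -1 indicate a strong correlation.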

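(2)

$$ \mathrm{Precision} = \frac{|R_{top\text{-}K} \cap R_v|}{|R_{top\text{-}K}|}, \qquad \mathrm{Recall} = \frac{|R_{top\text{-}K} \cap R_v|}{|R_v|}, \qquad F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (12) $$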

The overlap regions between the top-K results Rtop-K of the query and the rasterized region Rv are viewed as the hits, i.e., the correct ROI query results. With the number of top-K query results taken as the denominator, Precision reflects the proportion of hits in the top-K query results; with the number of ROIs in the validation set Rv taken as the denominator, Recall reflects the proportion of hits among all relevant ROI query results. The F-value is the harmonic mean of the two. Because the F-value reflects the overall performance of the query, the F-value of the query results is used as the final evaluation standard in our experiment.
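The three measures described here can be computed directly from two sets of ROI identifiers. The sketch below is illustrative only: the function name and the toy cell identifiers are assumptions, and ROIs are compared by identifier rather than by geometric overlap.

```python
def precision_recall_f(top_k, validation):
    """Return (precision, recall, F-value) for a top-K result against a validation set."""
    top_k, validation = set(top_k), set(validation)
    hits = len(top_k & validation)  # ROIs that appear in both sets
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(validation) if validation else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_value


# Toy example: 4 returned ROIs, 3 ROIs in the validation set, 2 in common.
precision, recall, f_value = precision_recall_f(
    top_k=["a", "b", "c", "d"], validation=["b", "c", "e"]
)
```

In the toy example two of the four returned ROIs are hits, and two of the three validation ROIs are found, so precision is 1/2 and recall is 2/3.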

 

Point 5: Experiment and Results:--It is hard to see Q1 with Figure 10 and Figure 11. Please redraw the maps with labels.

 

Response 5: We redrew these figures at 300 dpi to make them clearer. If there are still any problems, you can view the original figures in the Supplementary material.

 

Point 6: Experiment and Results:--Could you give more descriptions about Figure 14? What does a different color mean in Figure 14(a)?

 

Response 6: We describe it in more detail in the caption of Figure 14:

Figure 14. (a) The heat map of POIs. It reflects the combined relevance of POIs of the types Starbucks and cinema. Brighter grids mean a higher combined relevance, i.e., both types are densely distributed in this ROI, while dark grids mean the opposite. It is worth noting that grids populated with only one type of POI do not show very high correlation. (b) The top-50 query results by RALL. The top-50 query results are basically consistent with the brighter grids in (a), reflecting that our method can achieve good performance in the task of multi-keyword query.

 

Point 7: Conclusions: -- What are the limitations of your method?

 

Response 7: Following your suggestion, we have carefully reconsidered our method and its variants, and we point out their limitations in the Conclusions:

There are two limitations of our methods to be clarified: 1. The size of the grid determines the query granularity of the ROI, which affects the performance of our proposal. Unfortunately, we are not able to learn this value automatically from the target of the query, which means users need to set it based on experience; 2. The essence of the distributed representation of POI is to learn the environmental characteristics and semantic information of POIs, which means that the applicability of our method depends on the schema of the cities that constitute the POI corpus. An intuitive example is that POI vectors learned from Beijing might be effective for building a spatial keyword query of ROI in Shanghai, but might perform badly in rural towns.

 

Finally, thank you very much for your valuable suggestions. We sincerely hope that you can consider our paper after we revised it.


Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents an interesting and topical subject, an ROI top-k keyword query method, and it is worth publishing. Nevertheless, it seems to me that the text should be rearranged to clearly separate the methodological part from the results. Elements of the authors' research methodology can be found not only in the Methodology section, but also in the Related Works section and in the Results section (evaluation methodology). The description of datasets should also be placed in the methodology section. Methodology should contain both the method of search and the methodology of evaluation. Results should contain both the results of using the proposed method and the results of method evaluation.

 

Detailed comments:

 

verse 42

Is “location-based service”, should be “location-based service (LBS)”.

 

verse 45-46

POI and ROI should be explained at first use in the text, even if they are explained in the abstract.

 

verse 52-55

“At present the existing ROI exploration methods are mainly based on…” there is a lack of references confirming this sentence.

 

verse 55-61 – the same

 

verse 67 – no explanation what Word2Vec model is

 

verse 76 – “Different from traditional ROI exploration …” but why it is better?

 

 

2. Related Works

It is a mix of previous research and the authors' own research. It would be better to separate these two elements. This part should present ONLY previous research. The authors' ideas and extensions of existing research should be moved to the methodology section (verse 101, 104, 117 etc.)

 

verse 189-190

“It is worth mentioned that the similarity calculating takes into account the environment semantics of regions” it is important to explain it!

 

4.2.2.

Subsection can’t start with the figure, moreover before mentioning figure number in the text.

 

Verse 346

Please, delete word “below”

 

Verse 392-398

It is the methodology of evaluation, therefore it should be the part of the methodology section.

 

Verse 400, 508, Dataset description should be in methodology section


Author Response

First of all, we would like to thank you for your review sincerely. According to your comments, we have carefully considered and revised our manuscript as follows:

(The verse number in our response is shown under the mode of “All Markup” of the “track changes”.)

 

Point 1: Nevertheless, it seems to me that the text should be arranged in order to clearly separate the methodological part from the results. Elements of the authors' research methodology can be found not only in Methodology section, both also in the Related Works section and in the Results section (evaluation methodology). Description of datasets should be also placed in methodology section. Methodology should contain both – method of search and the methodology of evaluation. Results should contain both: results of using proposed method as well as the results of method evaluation.

 

Response 1: According to your suggestion, we have separated our methodological part from the results. We reconsidered and modified the overall architecture of Methods. We put the description of the dataset in Methods (Section 4.2). Meanwhile, the evaluation methodology, i.e., the correlation analysis of the POI vectors, is treated as part of the POI embedding and described in Methods (Section 4.3.3). However, we think that the selection of training parameters should still be placed in the experimental part for the following two reasons:

1.      The evaluation metric of parameter selection is more related with the specific implementation of Word2Vec model, rather than our ROI query methods.

2.      Putting the evaluation metric of parameter selection in the experiment section can make the description more coherent and clear.

 

Point 2: verse 42 – Is “location-based service”, should be “location-based service (LBS)”.

 

Response 2: verse 45

We fix the error according to your suggestion.

 

Point 3: verse 45-46 – POI, ROI should be explained at first use in the text, even they are explained in the abstract.

 

Response 3: We have supplemented the explanation of these concepts in the abstract when they first appeared:

verse 18-21

Compared with the traditional LBS based on Point-Of-Interest (POI), i.e., isolated location point data rich in information, an increasing number of demands have concentrated on Region-Of-Interest (ROI) exploration, i.e., geographic regions that contain many POIs and express rich environmental information.

 

Point 4: verse 52-55 –“At present the existing ROI exploration methods are mainly based on…” there is a lack of references confirming these sentence.

 

Response 4: verse 56-59

We have added references for these sentences to make them more convincing:

At present the existing ROI exploration methods are mainly based on the statistical information or the density information of query elements [6-9], such as POIs with certain keyword [10,11], neglecting the influence of regional internal characteristics and environment traits in the region, which results in the omission of effective information of the ROI.

 

Point 5: verse 55-61 – the same. There is a lack of references confirming these sentence.

 

Response 5: verse 59-65

We have added references for these sentences to make them more convincing:

Theoretically, judging how much the ROI is related to a query should take into account the distribution and the type characteristics of all geographic objects in the ROI, i.e., the regional ecology decides the association with the query requirements. However, it is still quite a challenging problem to measure the relevance between each spatial object and the query requirements [12]. Moreover, another challenging problem for ROI exploration is the spatial-keyword query with multiple query elements [11]. This type of keyword query process is usually more complex and consumes a lot of time.

 

Point 6: verse 67 – no explanation what Word2Vec model is

 

Response 6: We have added an explanation of the Word2Vec model in the introduction when it first appears:

verse 71-72

Therefore, in this paper, we attempt to construct a reasonable spatial context and use the Word2Vec model, a deep-learning language model that represents each word as a distributed representation based on its context in the document, to capture the spatial distribution features and environmental information of each type of spatial object such as POI. The model can transform a spatial object into a high-dimensional vector, which also shows the association characteristics between POI types.
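To make "based on its context" concrete, the sketch below generates the (center, context) training pairs that a Skip-Gram model learns from, given one sequence of POI type labels. It is a generic illustration of Skip-Gram pair generation, not the authors' corpus construction: how POI sequences are derived from their spatial context is defined in the manuscript, and the toy sequence and window size here are assumptions.

```python
def skipgram_pairs(sequence, window):
    """List (center, context) pairs for every item and its neighbours within `window`."""
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # an item is not its own context
                pairs.append((center, sequence[j]))
    return pairs


# Hypothetical "sentence" of POI type labels.
pairs = skipgram_pairs(["school", "bookstore", "cafe"], window=1)
```

Types that repeatedly share neighbours end up with similar vectors after training, which is the property the manuscript relies on to capture associations between POI types.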

 

 

Point 7: verse 76 – “Different from traditional ROI exploration …” but why it is better?

 

Response 7: We briefly explain the advantages and innovations of our methods in its following sentences:

verse 83

“we study ROI exploration for environment semantics and distribution characteristics.”

verse 84

“We are the first to utilize the spatial distribution and the semantic information of POI in ROI exploration.”

The specific description of the advantages of our method could be found in the second paragraph of the Introduction and Related Works. For example, we point out the drawbacks of the existing methods in the second paragraph of the Introduction: “At present the existing ROI exploration methods are mainly based on the statistical information or the density information of query elements, such as POIs with certain keyword, neglecting the influence of regional internal characteristics and environment traits in the region, which results in the omission of effective information of the ROI.”

We also compare some existing methods with our method in Related Works: “Compared with the above research, the biggest difference of this paper is introducing the concept of environmental semantics of spatial objects. With the distribution features and semantic expressions of each spatial object captured by the statistical model of deep learning, the regional features corresponding to the candidate ROI are constructed to match the query keywords.”

We verify the good performance of our method through experiments and summarize its advantages in the conclusion. This is strong evidence that it performs better than traditional methods.

Besides, our method takes into account the environment semantics and distribution characteristics of the POIs. Intuitively, more information will lead to a better result. We believe that the advantages of our method have been reflected in the text.

 

Point 8: Related Works

It is a mix of previous research and authors research. It would be better to separate these two elements. This part should present ONLY previous research. Authors ideas and extension of existing research should be moved to the methodology section (verse 101, 104, 117 etc.)

 

Response 8: We consider it necessary to mention our work in the Related Works. In this section we summarize the existing works, point out their drawbacks, and compare our methods with them. In this way, we can better demonstrate the improvement and innovation of our methods. If these were explained only in the Methods section and the comparisons were omitted, the gap between the existing works and our proposal would probably not be clear.

 

Point 9: verse 189-190

“It is worth mentioned that the similarity calculating takes into account the environment semantics of regions” it is important to explain it!

 

Response 9: According to your opinion, we explain the similarity calculation of environment semantics and describe more details about the instance in Figure 1.

verse 197-203

On the basis of the well-trained distributed representation of POIs, our method generates the corresponding vector for each candidate ROI, which contains the internal environmental information and structural characteristics of the ROI, i.e., the environmental semantics of the region. The vector corresponding to the query keyword is then treated as the search condition to find the top-K ROIs matching the query vector. The example in Figure 1 calculates the similarity score between the vector corresponding to the query keyword Q {school} and each candidate ROI vector to find the top-1 result.
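The matching step described here can be sketched as a similarity ranking. The example below assumes cosine similarity as the similarity score, which the passage above does not specify, and uses made-up two-dimensional vectors for the query keyword and four candidate ROIs; the names `cosine` and `top_k_rois` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_rois(query_vector, roi_vectors, k):
    """Return the identifiers of the k candidate ROIs most similar to the query."""
    ranked = sorted(
        roi_vectors.items(),
        key=lambda item: cosine(query_vector, item[1]),
        reverse=True,
    )
    return [roi_id for roi_id, _ in ranked[:k]]


# Hypothetical vectors: the query stands in for the keyword "school".
query_vector = [1.0, 0.0]
roi_vectors = {
    "R1": [0.9, 0.1],
    "R2": [0.1, 0.9],
    "R3": [0.5, 0.5],
    "R4": [0.0, 1.0],
}
best = top_k_rois(query_vector, roi_vectors, k=1)
```

With K = 1 only the single best-matching candidate is returned, mirroring the top-1 example with four candidate ROIs.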

 

Point 10: 4.2.2. Subsection can’t start with the figure, moreover before mentioning figure number in the text.

 

Response 10: verse 303

We put the figure after the paragraph where it is first mentioned.

 

Point 11:  verse 346

Please, delete word “below”

 

Response 11: verse 432

We delete it after revision.

 

Point 12:  verse 392-398

It is the methodology of evaluation, therefore it should be the part of the methodology section.

 

Response 12: Section 4.3.3

As answered in response 1, we have restructured our manuscript to make the evaluation be part of the methodology based on your suggestion.

 

Point 13:  Verse 400, 508, Dataset description should be in methodology section

 

Response 13: Section 4.2

As answered in response 1, we have described our dataset for training the distributed representation of POIs in methodology section on your suggestion.

 

Finally, thank you very much for your valuable suggestions. We sincerely hope that you can consider our paper after we revised it.


Author Response File: Author Response.pdf

Reviewer 3 Report

The limitations of existing work are not well analyzed.

The literature review is lacking. Consider the following references, with discussion.

- "GEO matching regions: multiple regions of interests using content based image retrieval based on relative locations." Multimedia Tools and Applications 76.14 (2017): 15377-15411.

- "Region-of-Interest Compression and View Synthesis for Light Field Video Streaming." IEEE Access 7 (2019): 41183-41192.

Figures 1 and 2 need more detailed explanation.

What are the main attributes used to calculate the region of interest? Add a table (ROI attributes). How many ROIs are calculated in Table 4?

Add the implementation algorithm and its details, and also provide the implementation code as supplementary data.

Author Response

First of all, we would like to thank you for your review sincerely. According to your comments, we have carefully considered and revised our manuscript as follows:

 

Point 1: Existing work limitation did not well analyzed.

 

Response 1: We think we have outlined the main limitation of existing work in the second paragraph of Introduction, which is the innovation point of our proposals:

-- “At present the existing ROI exploration methods are mainly based on the statistical information or the density information of query elements, such as POIs with certain keyword, neglecting the influence of regional internal characteristics and environment traits in the region, which results in the omission of effective information of the ROI.”

Meanwhile, we point out more specific limitations of existing work in Related Works:

-- “However, with the POI dataset increasing, this approach usually results in lots of time consumption as a consequence and it is quite difficult for this approach to control the size of ROIs.”

-- “One of the main challenges in this approach is that there are some difficulties in measuring the similarity between the query and each ROIs reasonably.”

-- “Zhi Y et al. propose a method based on the density statistics of POIs with related keywords in the region to measure the correlation between the target region and the query, which ignores abundant and available POI environment information.”

-- “Compared with the above research, the biggest difference of this paper is introducing the concept of environmental semantics of spatial objects.”

We believe that our analysis of the limitations of the existing work is adequate. On this basis, we added more references to the second paragraph of the Introduction in this revision to make the analysis more valid and convincing. If there are still drawbacks in the analysis of the limitations of existing work, we would be grateful for more detailed suggestions and will try our best to improve our manuscript.

 

Point 2: lack of literature review. Consider following reference with discussion.

 

- "GEO matching regions: multiple regions of interests using content based image retrieval based on relative locations." Multimedia Tools and Applications 76.14 (2017): 15377-15411.

 

- "Region-of-Interest Compression and View Synthesis for Light Field Video Streaming." IEEE Access 7 (2019): 41183-41192.

 

Response 2: We have carefully read the two articles you recommended and discussed them:

(1) The first article puts forward three methods to solve the problem of finding relative locations of ROIs: 1. Geolocation based image retrieval (GLBIR), 2. the unsupervised feature technique Principal component analysis (PCA), and 3. Multiple region-based image retrieval.

The purpose is to match the image with geographic areas, that is, to find the geographical location corresponding to the ROI in the image. The ROI mentioned in the article refers to the areas selected as the focus of image analysis.

(2) The second article presents a light field video dataset captured with a plenoptic camera and designs a new region-of-interest (ROI)-based video compression method for light field videos. Its proposal achieves high compression in bitrate and produces synthesized views of identical visual quality to their ground truth. Similarly, the ROI mentioned in this article refers to the areas of an image to which people pay attention.

It is worth noting that there is a big difference between the ROI mentioned in the two articles and that of our article. We briefly describe the concept of ROI related to our research in the Abstract:

“geographic regions that contain many POIs and express rich environmental information.”

Moreover, the detailed definition of ROI in our article is clarified in definition 3 of the Problem Statement:

Definition 2. ROI: An ROI is a relevant region R in which a certain number of POIs satisfying the query are located. With a POI regarded as an atom in this region, R is represented as a mixed POI set R = {p1, p2, p3,..., pn}, where pi is a POI with one type label ti. After the ROI division, each region is viewed as a candidate ROI to be matched. More details are explored in depth in Section 4; for now, the ROI can be treated as an abstract set.

We cite the definition of ROI from Wikipedia to further emphasize the gap:

(https://en.wikipedia.org/wiki/Region_of_interest)

“A region of interest (often abbreviated ROI), are samples within a data set identified for a particular purpose. The concept of a ROI is commonly used in many application areas. For example, in medical imaging, the boundaries of a tumor may be defined on an image or in a volume, for the purpose of measuring its size. In geographical information systems (GIS), a ROI can be taken literally as a polygonal selection from a 2D map. In computer vision and optical character recognition, the ROI defines the borders of an object under consideration. In many applications, symbolic (textual) labels are added to a ROI, to describe its content in a compact manner. Within a ROI may lie individual points of interest (POIs).”

The articles you recommend may not fit well with our research topic because they focus on the ROI in computer vision but our research topic relates to the ROI in the field of GIS. Considering your suggestions and the ambiguity about the definition of the ROI, we have added these references about the ROI definition to emphasize our research topic.

 

 

Point 3: figure 1 and 2 need more detail explanation.

 

Response 3: Based on your suggestion, we have rewritten the descriptions of Figure 1 and Figure 2 and added more detailed explanation to make the contents clearer and more intelligible:

Figure 1:

An instance of top-K similarity search is shown in Figure 1. The query group Q is {school}. There are 4 candidate ROIs to be matched. Assuming K=1, the ROI colored red is returned as the top-1 result. It is worth mentioning that the similarity calculation takes into account the environment semantics of regions. On the basis of the well-trained distributed representation of POIs, our method generates the corresponding vector for each candidate ROI, which contains the internal environmental information and structural characteristics of the ROI, i.e., the environmental semantics of the region. The vector corresponding to the query keyword is then treated as the search condition to find the top-K ROIs matching the query vector. The example in Figure 1 calculates the similarity score between the vector corresponding to the query keyword Q {school} and each candidate ROI vector to find the top-1 result.

Figure 1. The blue points represent the buildings with type label “school” and the yellow points indicate the buildings with type label “residential buildings”. The region marked in red is returned as the result of our top-1 query by matching the environment information of each candidate ROI against the query.

Figure 2:

The workflow diagram of our method is shown in Figure 2. First of all, we describe the data used to train our POI vectors in Section 4.2 and consider it the input of the workflow. According to specific intentions, the workflow consists of three steps:

1. Firstly, the raw data, which contain a large number of POIs with type labels, are used to construct the corpus of POIs (a corpus being an organized, computer-readable collection of text or speech in the field of NLP). The Skip-Gram model of Word2Vec is trained on the POI corpus to express POIs as high-dimensional vectors, which can capture their semantic information and environmental state. The latent semantic association of POI embedding vectors is revealed through correlation analysis; (Section 4.3)

2. Secondly, a grid division of the research region is built to acquire the candidate ROIs, each of which is viewed as a POI set. The candidate ROIs can be described in vector form using the product of step 1 (POI embedding vectors). At the same time, two variant methods of generating candidate ROIs are introduced to make the ROI vector description more reasonable; (Section 4.4)

3. Finally, the products of the previous step, the candidate ROI vectors, are considered as the input of this step. They are used to calculate the relevance score, by a similarity formula, against the query vector corresponding to the user’s keyword query group Q. Based on the query mode, the top-K ROIs related to the user’s query are returned as the final result. (Section 4.5)

Figure 2. Workflow of the spatial keyword query of ROI with the distributed representation of POIs.

 

 

Point 4: what are main attributes to calculate region of interest add table (ROI attributes) and how many ROI calculate in table 4.

 

Response 4: (1) As explained in the manuscript, we considered the bottom-level types of POI included in each candidate ROI to calculate the vector of candidate ROI, which includes the distribution characteristics and category information of POI in each ROI. Therefore, the label set of bottom-level types of POI in each candidate ROI can be regarded as the main attributes.

(2) Table 4 presents the results of POI vectors clustering analysis, which reveals that the POI embedding method in Section 4.3 can effectively capture semantic associations among similar POI types. This process doesn’t involve the grid division and the number of ROI.

 

Point 5: add implementation algorithm and its detail also provide the implementation code supplementary data.

 

Response 5: According to your suggestion, we have added the implementation algorithm of our method based on our workflow in Figure 2 and provided the implementation code as the supplementary data.

Our approach can be broadly divided into three steps:

- The implementation algorithm of step 1:

This step about POI embedding mainly involves the implementation of the Word2Vec model in the field of deep learning. We don’t mention the implementation algorithm of this step in our manuscript for two reasons:

1. Most research related to the methodology of this step focuses on the network structure and implementation functions of deep learning frameworks, and does not discuss the underlying algorithm implementation in depth. Because this content is quite mature, it may be redundant to show it in the paper. The same practice can be seen in classic articles in the field:

-- Mikolov, Tomas, et al. "Distributed Representations of Words and Phrases and their Compositionality." Advances in Neural Information Processing Systems 26 (2013): 3111-3119.

-- Zamani, Hamed, and W. B. Croft. "Relevance-based Word Embedding." Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), Shinjuku, Tokyo, Japan, 7-11 August 2017: 505-514.

2. In this step we only constructed a POI corpus as the input of the Word2Vec model and used the model to explore the latent environmental semantic information of POI, without substantially changing the original model.

Therefore, code for this process is available online:

https://github.com/tensorflow/tensorflow/blob/9590c4c32dd4346ea5c35673336f5912c6072bf2/tensorflow/examples/tutorials/word2vec/word2vec_basic.py

Alternatively, we suggest using the Word2Vec implementation in Gensim:

https://radimrehurek.com/gensim/

This implementation optimizes the training process and provides an interface for setting parameters.

 

•  The implementation algorithm of step 2:

In step 2, we mainly describe the implementation algorithms of the two strategies (the TF-IDF method and the Gaussian kernel) and omit the implementation of grid division because it is straightforward.

The implementation algorithm and its details of the TF-IDF Method can be found in Section 4.4.2:

Algorithm 1 describes the detailed implementation of the TF-IDF method, which is executed after grid division. Each Si in the candidate ROI set S is a POI set, where each POI has a type label tj corresponding to a POI vector vtj. First, the inverse document frequency of each tj is calculated; then, for each Si, the corresponding weight wj of each POI vector is calculated in lines 5-6. Each candidate ROI vector Ri is then obtained by Eq. (5), which takes the POI vector set v as input. Finally, the algorithm returns the candidate ROI vector set R.

Algorithm 1: TF-IDF Method

Input: (1) candidate ROI set S; (2) POI vectors set v; (3) type labels set t

Output: candidate ROI vectors set R

1: for each tj ∈ t do

2:     IDF(tj) = result by Eq. (6)

3: for each Si ∈ S do

4:     for each tj ∈ t do

5:         TF(tj) = the frequency of POI with label tj in the i-th ROI

6:         wj = IDF(tj) * TF(tj)

7:     Ri = result by Eq. (5)

8: return R
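As an illustration only, Algorithm 1 could be sketched in plain Python as follows. Eq. (5) and Eq. (6) are not reproduced in this response, so the sketch assumes the common forms: IDF(tj) = log(N / nj), where N is the number of candidate ROIs and nj the number of ROIs containing type tj, TF as the relative frequency of tj within the ROI, and Ri as the weighted sum of the POI type vectors. The function and variable names are hypothetical and are not taken from RALL.py.

```python
import math
from collections import Counter


def tfidf_roi_vectors(rois, type_vectors):
    """Build one vector per candidate ROI from TF-IDF-weighted POI type vectors.

    rois: list of candidate ROIs, each a list of POI type labels.
    type_vectors: dict mapping a type label to its embedding (list of floats).
    """
    n_rois = len(rois)
    dim = len(next(iter(type_vectors.values())))

    # Algorithm 1, lines 1-2: IDF of each type label.
    # ASSUMPTION: Eq. (6) has the standard form log(N / n_j).
    doc_freq = Counter()
    for labels in rois:
        doc_freq.update(set(labels))
    idf = {t: math.log(n_rois / doc_freq[t]) for t in doc_freq}

    roi_vectors = []
    for labels in rois:
        counts = Counter(labels)
        total = len(labels)
        vec = [0.0] * dim  # an empty ROI keeps the zero vector
        for t, count in counts.items():
            # Lines 5-6: TF (assumed to be relative frequency) times IDF.
            weight = idf[t] * (count / total)
            # Line 7, ASSUMPTION: Eq. (5) is a weighted sum of type vectors.
            for k in range(dim):
                vec[k] += weight * type_vectors[t][k]
        roi_vectors.append(vec)
    return roi_vectors
```

Under these assumptions, a type that occurs in every candidate ROI receives an IDF of zero and therefore does not contribute to any ROI vector.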

 

The implementation algorithm and its details of the Gaussian Kernel can be found in Section 4.4.3:

Considering the parameters (a and b) of the grid division, the candidate ROI vector set R can be represented as a vector matrix R(a,b) in Algorithm 2. First, lines 1-2 perform the expansion and filling process shown in Figure 4. Next, the convolution of Eq. (8) is performed on the augmented matrix for each unexpanded ROI vector R(i,j). As a result, an adjusted candidate ROI vector matrix R'(a,b) is returned.

Algorithm 2: Gaussian Kernel Method

Input: (1) candidate ROI vectors matrix R(a,b); (2) convolution kernel K

Output: adjusted candidate ROI vectors matrix R'(a,b)

1: expand R(a,b) to the size of R(a+2, b+2)

2: fill the expanded parts with the 0 vector

3: for each R(i,j) ∈ R(a+2, b+2) do

4:     if R(i,j) ∉ the expanded parts then

5:         R'(i,j) = result by Eq. (8)

6: return R'(a,b)
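A minimal sketch of Algorithm 2 in plain Python is given below for illustration. Eq. (8) and the actual kernel values are not reproduced in this response, so the sketch assumes a 3 x 3 kernel (consistent with the expansion to a+2 by b+2) applied as a weighted sum over each cell's neighbourhood. The constant KERNEL is a hypothetical discrete Gaussian approximation, and all names are illustrative rather than taken from RALL.py.

```python
# HYPOTHETICAL 3 x 3 kernel: a common discrete Gaussian approximation.
# The kernel K actually used in Eq. (8) may differ.
KERNEL = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]


def gaussian_smooth(grid, kernel):
    """Adjust each ROI vector with a 3 x 3 kernel over its neighbours.

    grid: a x b nested list of ROI vectors (each a list of floats).
    kernel: 3 x 3 nested list of weights.
    Returns a new a x b grid of adjusted vectors.
    """
    a, b = len(grid), len(grid[0])
    dim = len(grid[0][0])
    zero = [0.0] * dim  # shared, never mutated

    # Algorithm 2, lines 1-2: expand to (a+2) x (b+2), fill with zero vectors.
    padded = [[zero] * (b + 2)]
    for row in grid:
        padded.append([zero] + list(row) + [zero])
    padded.append([zero] * (b + 2))

    # Lines 3-5: visit only the unexpanded cells of the padded matrix.
    adjusted = []
    for i in range(1, a + 1):
        out_row = []
        for j in range(1, b + 1):
            vec = [0.0] * dim
            for di in range(3):
                for dj in range(3):
                    weight = kernel[di][dj]
                    neighbour = padded[i + di - 1][j + dj - 1]
                    for k in range(dim):
                        vec[k] += weight * neighbour[k]
            out_row.append(vec)
        adjusted.append(out_row)
    return adjusted
```

Because the padding consists of zero vectors, cells on the border of the grid receive a smaller total weight than interior cells in this sketch.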

 

The code implementation for this step can be found in RALL.py, provided as supplementary data.

 

•  The implementation algorithm of step 3:

In step 3, we mainly describe the implementation algorithm of the keyword query search. Because the two query modes are similar, we summarize them in a single algorithm, shown in Section 4.5.2:

Because the two query modes are highly similar, their query search implementation is shown together in Algorithm 3. Lines 1-5 generate the query vector group Qv corresponding to the keyword query group Q. Lines 6-7 average Qv. At this point, whether the query is a single-keyword or a multi-keyword query, the output is the average query vector Qmean. Then, the similarity score between each candidate ROI vector Ri and Qmean is calculated by Eq. (10) and used to sort the candidate ROI vectors in descending order. Finally, the algorithm returns the top-K ROIs Rtop-K relevant to the query.

Algorithm 3: Query Search

Input: (1) candidate ROI vectors set R; (2) keyword query group Q; (3) parameter K; (4) POI vectors set v; (5) type labels set t

Output: the top-K ROIs related to the query, Rtop-K

1: Qv = {Ø}

2: for each qj ∈ Q do

3:     for each tm ∈ t do

4:         if qj = tm then

5:             append vtm into Qv

6: if Qv ≠ {Ø} then

7:     Qmean = mean(Qv)

8: for each Ri ∈ R do

9:     Similarity_Score(Ri) = Similarity(Qmean, Ri) by Eq. (10)

10: sort R in descending order of Similarity_Score(R)

11: return the top-K ROIs Rtop-K in R
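For illustration, Algorithm 3 could be sketched in plain Python as below. Eq. (10) is not reproduced in this response, so cosine similarity is assumed as the similarity measure. Returning an empty list when no keyword matches a type label is also an assumption, since the algorithm does not specify that case. The function names are hypothetical, and the sketch returns the indices of the top-K ROIs rather than the ROI vectors themselves.

```python
import math


def cosine(u, v):
    """Cosine similarity; ASSUMED stand-in for Eq. (10)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as unrelated
    return dot / (norm_u * norm_v)


def query_top_k(roi_vectors, keywords, k, type_vectors):
    """Return the indices of the k candidate ROIs most similar to the query.

    roi_vectors: list of candidate ROI vectors.
    keywords: list of query keywords (one or several).
    type_vectors: dict mapping a POI type label to its embedding.
    """
    # Algorithm 3, lines 1-5: collect vectors of keywords matching a type label.
    query_vecs = [type_vectors[q] for q in keywords if q in type_vectors]
    if not query_vecs:
        return []  # ASSUMPTION: no matching keyword yields no result

    # Lines 6-7: average the query vectors (single or multiple keywords).
    dim = len(query_vecs[0])
    q_mean = [sum(v[d] for v in query_vecs) / len(query_vecs) for d in range(dim)]

    # Lines 8-11: score every candidate ROI, sort descending, keep the top k.
    scored = [(cosine(q_mean, roi), idx) for idx, roi in enumerate(roi_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]
```

The dictionary lookup replaces the nested loop over t in lines 2-4, which gives the same matches when type labels are unique.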

 

The code implementation for this step can be found in RALL.py, provided as supplementary data.

 

 

 

Finally, thank you very much for your valuable suggestions. We sincerely hope that you will consider our revised paper.


Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

All comments are well addressed.

Author Response

Thank you very much for your valuable suggestions.
