Peer-Review Record

Maximum Entropy Approach to Massive Graph Spectrum Learning with Applications

Algorithms 2022, 15(6), 209; https://doi.org/10.3390/a15060209
by Diego Granziol *,†, Binxin Ru, Xiaowen Dong, Stefan Zohren, Michael Osborne and Stephen Roberts
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 30 April 2022 / Revised: 26 May 2022 / Accepted: 26 May 2022 / Published: 15 June 2022

Round 1

Reviewer 1 Report

Dear Authors, the manuscript "Maximum Entropy approach to Massive Graph Spectrum learning with Applications" examines a very interesting topic, but it is very difficult to read it and keep all thoughts and notations alive. In some parts the reader may have some confusion or questions. While at the same time I have the following suggestions to make:

1. In Section 2, it is important to state whether you are only dealing with simple graphs (without loops and multiple edges). Instead of a non-negative weight $W_{ij}$ between two vertices $v_i$ and $v_j$, it would be better to write $w_{ij}$. Moreover, for unweighted graphs, it should also be noted that $w_{ij}=0$ for two unconnected nodes.

2. In Section 3, Appendix 6 is not part of [18]. Give some description of the function $\delta$ for the spectral density in Equation (2).

3. In Section 3.1, the smoothed spectral density given by (3) is not clear to me. At first, using Equation (2), I think that $p(\lambda')=\sum_{i=1}^m w_i\delta(\lambda'-\lambda_i)$, where $m$ is the number of iterative steps, as you write before. The cardinality of the vertex set $V(G)$, which you denote by $n$, can be very large. What if $m$ is much less than $n$? The potential reader must already be confused at this step. Please provide the arguments or give an idea. The last part of (3), in the form $\sum_{i=1}^n w_i k_\sigma(\lambda-\lambda_i)$, must also be justified. Note that this summation over $n$ (not over $m$) for $w_i$ is used in the next parts.

4. In lines 60-63, you write about two identical Assumptions 3.1 with no reference.


Due to the limited readability, it was unfortunately not possible for me to verify the correctness of the described results; the work is written in an unclear style. The paper is not suitable for publication in the journal Algorithms in this form.

Comments for author File: Comments.pdf

Author Response



Dear Authors, the manuscript "Maximum Entropy Approach to Massive Graph Spectrum Learning with Applications" examines a very interesting topic, but it is very difficult to read and to keep track of all the ideas and notation. In some parts the reader may be confused or have questions. At the same time, I have the following suggestions:

  1. In Section 2, it is important to state whether you are only dealing with simple graphs (without loops and multiple edges). Instead of a non-negative weight $W_{ij}$ between two vertices $v_i$ and $v_j$, it would be better to write $w_{ij}$. Moreover, for unweighted graphs, it should also be noted that $w_{ij} = 0$ for two unconnected nodes.

    We do not only consider simple graphs. Our method is completely general, as we only provide a method for estimating the spectrum of any undirected graph. Note that a directed graph would, in general, have complex eigenvalues for its adjacency matrix. We have added that $w_{ij}=0$ for two unconnected nodes.

  2. In Section 3, Appendix 6 is not part of [18]. Give some description of the function $\delta$ for the spectral density in Equation (2).

    We add the description “This can be seen as an $m$-moment matched discrete approximation to the spectral density of the graph.”
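
    To make this concrete for the record, here is a minimal NumPy/SciPy sketch of how such an $m$-moment matched discrete density is commonly obtained via the Lanczos algorithm. The plain three-term recurrence below (no reorthogonalisation, no breakdown handling) and all names are our illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_spectral_density(A, m, seed=0):
    """Discrete m-point approximation p(lambda) = sum_i w_i delta(lambda - lambda_i)
    to the spectral density of a symmetric matrix A, from m Lanczos steps.
    Sketch only: no reorthogonalisation, no breakdown (beta == 0) handling."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                  # random unit-norm starting vector
    v_prev, beta = np.zeros(n), 0.0
    alphas, betas = [], []
    for _ in range(m):
        w = A @ v - beta * v_prev           # three-term recurrence
        alpha = v @ w
        w -= alpha * v
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    # Ritz values (nodes) and weights from the m x m tridiagonal matrix; the
    # weights are the squared first components [phi_i]_1^2 of its eigenvectors.
    nodes, phi = eigh_tridiagonal(alphas, betas[:-1])
    weights = phi[0, :] ** 2
    return nodes, weights
```

    In stochastic Lanczos quadrature, such estimates are averaged over $d$ random starting vectors, which is where a cost of the form $\mathcal{O}(d \times m \times n_{nz})$, discussed later in this record, comes from.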

  3. In Section 3.1, the smoothed spectral density given by (3) is not clear to me. At first, using Equation (2), I think that $p(\lambda')=\sum_{i=1}^m w_i\delta(\lambda'-\lambda_i)$, where $m$ is the number of iterative steps, as you write before. The cardinality of the vertex set $V(G)$, which you denote by $n$, can be very large. What if $m$ is much less than $n$? The potential reader must already be confused at this step. Please provide the arguments or give an idea. The last part of (3), in the form $\sum_{i=1}^n w_i k_\sigma(\lambda-\lambda_i)$, must also be justified. Note that this summation over $n$ (not over $m$) for $w_i$ is used in the next parts.
We thank the observant reviewer for spotting the typo, which must have crept in during editing! To address the science behind the point: the smoothed spectral density in question is that of the underlying graph adjacency matrix or Laplacian, which is determined by $n$, not $m$, eigenvalues. Looking at the effect of smoothing, it is clear that the $m$ moments are changed. The argument can, of course, be repeated verbatim for the approximation using $m$ weighted pseudo-eigenvalues (which is what we obtain from the Lanczos algorithm). Hence the argument holds irrespective of whether we use the $m$-moment approximation or the $n$-eigenvalue ground truth.

We agree upon reflection that keeping the sum to $m$ is more readable and thank the reviewer for pointing this out.



If $m < n$, the same argument applies unchanged to the $m$-point Lanczos approximation.
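
To make the smoothing step concrete, here is a short sketch of Equation (3)-style kernel smoothing, together with the closed-form effect of a Gaussian kernel on the second moment; this is our illustration only, and the kernel choice and names are assumptions:

```python
import numpy as np

def smoothed_density(grid, nodes, weights, sigma):
    """p_sigma(lambda) = sum_i w_i * k_sigma(lambda - lambda_i)
    with a Gaussian kernel k_sigma, evaluated on a grid of lambda values."""
    k = np.exp(-0.5 * ((grid[:, None] - nodes[None, :]) / sigma) ** 2)
    k /= sigma * np.sqrt(2.0 * np.pi)
    return k @ weights

# Gaussian smoothing leaves the mean intact but inflates the second moment
# by sigma^2, so the m matched moments of the discrete density are changed.
nodes, weights = np.array([-1.0, 0.0, 2.0]), np.array([0.5, 0.3, 0.2])
sigma = 0.1
density = smoothed_density(np.linspace(-2.0, 3.0, 500), nodes, weights, sigma)
m2_discrete = np.sum(weights * nodes ** 2)
m2_smoothed = m2_discrete + sigma ** 2      # exact for a Gaussian kernel
```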



  4. In lines 60-63, you write about two identical Assumptions 3.1 with no reference.

Lines 60-63 contain no assumptions; the assumptions referred to are:

- The kernel function $k_\sigma(\lambda-\lambda_i)$ is supported on the whole real line $(-\infty,\infty)$.
- The kernel function $k_\sigma(\lambda-\lambda_i)$ is symmetric and possesses moments of all orders.

We understand this was not clear due to a formatting error, which has been fixed.

These assumptions are required for the proof and are met by the most commonly used Gaussian smoothing kernel, so they are justified in practice. The rest is the result of the theorem itself.

We have clarified this confusion in the text.
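
As a quick check (added here for illustration, not quoted from the manuscript), the Gaussian kernel $k_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}$ indeed satisfies both assumptions: it is symmetric by inspection, and all of its absolute moments are finite,

$$\int_{-\infty}^{\infty} |x|^k \, \frac{1}{\sigma\sqrt{2\pi}} \, e^{-x^2/(2\sigma^2)} \, dx \;=\; \sigma^k \, \frac{2^{k/2}\,\Gamma\!\big(\tfrac{k+1}{2}\big)}{\sqrt{\pi}} \;<\; \infty \qquad \text{for all integers } k \ge 0.$$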

 

 

Reviewer 2 Report

 

Comments for author File: Comments.pdf

Author Response

Comments on Manuscript 1229879 "Maximum Entropy approach to Massive Graph Spectrum learning with Applications"

The paper builds on previous work by the same authors [Ref. 15] related to a Maximum Entropy (MaxEnt) method for efficient approximation of the spectra of massive graphs. The main claims are that:

CLAIM #1) Kernel smoothing, previously used to visualize and compare graph spectral densities, may modify/affect moment evaluation;

CLAIM #2) the new MaxEnt method: (i) is computationally efficient; (ii) does not affect moment evaluation and permits the determination of analytic forms for symmetric and non-symmetric Kullback–Leibler (KL) divergences and Shannon entropy;

CLAIM #3) two example applications are provided.

Remark #1 - Unclear issue

Regarding the Lanczos algorithm, mentioned in Sec. 6, it is unclear whether the issue mentioned above in CLAIM #1 applies or not. This point should be clarified.

CONCLUSION

The paper appears, generally, clearly written, and the contents of the paper are interesting and appropriate. For approval, a revision addressing Remark #1 is required.

 

 

Just to clarify: the moment information given by the Lanczos algorithm and that given by MaxEnt are identical. However, when computing KL divergences, the densities must be smoothed so that they have the same support, and it is this smoothing that introduces the error in the moment information. Furthermore, when estimating the mass at zero, since the Lanczos algorithm is unlikely to place a point mass exactly at 0 (for a positive definite matrix this will certainly not happen), the density must again be smoothed to give an estimate of the mass at zero. As the nature of this smoothing (i.e., the bandwidth) is usually chosen arbitrarily (we may not have any ground truth), and as it respects neither moment nor constraint information (for example, that $\lambda > 0$ in certain instances), we can expect it to lead to worse performance than a method which does respect this information. This is what the study in question finds. We append this discussion to the paper:

 

"As we will further discuss in this paper, whilst this approximate spectral density respects the moment information of the original matrix, in practice, as this spectral density is discrete, it must be smoothed. This smoothing respects neither moment nor bound information (for example, the matrix may be positive definite). We hence expect a method which respects such information, when relevant, to provide superior performance; this is the subject of enquiry of this paper."
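
To illustrate why a common support is needed, here is a minimal sketch of the symmetric KL divergence between two smoothed densities evaluated on the same grid; the uniform grid and all names are our illustrative assumptions:

```python
import numpy as np

def symmetric_kl(p, q, dx, eps=1e-12):
    """Symmetric KL divergence between two densities sampled on a common
    uniform grid with spacing dx; eps guards against log(0)."""
    p = p / (np.sum(p) * dx)                # renormalise on the grid
    q = q / (np.sum(q) * dx)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps))) * dx
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps))) * dx
    return kl_pq + kl_qp
```

Without smoothing, the two discrete densities would place mass on disjoint sets of eigenvalues and the divergence would be infinite; with smoothing, the divergence is finite but the moments are perturbed, which is the trade-off discussed above.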

 

Round 2

Reviewer 1 Report

Dear Authors, despite multiple fixes, there are still some inconsistencies in your article. So, I suggest the following improvements:

1. In Equation (5), the last summation should also be over $m$ instead of $n$.

2. The text added in lines 77-78 is not complete.

3. In line 88, the notation "nz" in $\mathcal{O}(d \times m \times n_{nz})$ is not clear to me.

4. In Algorithm 1, I think that there is a problem with the indices in steps 6 and 7 for $z_i'$.

5. The square of the first component of the corresponding eigenvector, $[\phi_i]_1^2$, is in another form in the following (7).

6. It is also worth noting what $q(\lambda)$, with the corresponding coefficients $\beta_i$, means in Section 7.

7. The descriptions included in your figures must be changed; they are too large. Moreover, (a) and (b) in Figure 2, and (b) in Figure 4, are difficult to see.

8. In line 160, there is a formatting error between $p$ and $n$.

9. In line 199, there is a missing reference.

10. In line 220, instead of "node $i$" it should be "node $v_i$". Your notation $\sum_{g\in(g,1)}$ for all neighbors of $v_1$ is very confusing; similarly, $\sum_{g\in(g,m+1)}$ for $v_{m+1}$. First, I suggest defining the set of adjacent vertices for a given vertex $v_i$ (e.g., $N(v_i)$, the open neighborhood of $v_i$) and then using it in the considered summation. This is a standard technique in graph theory.

The results seem correct and the paper is suitable for publication in the journal Algorithms after minor revisions.

Comments for author File: Comments.pdf

Author Response

Thank you for continually spotting small typos.

$n_{nz}$ denotes the number of non-zero entries in the corresponding matrix (amended in the manuscript); each sparse matrix-vector product in the Lanczos iteration costs $\mathcal{O}(n_{nz})$.

Algorithm 1 has been fixed, and the issues with the figures have also been fixed, but we have kept the captions as they are because we believe they add value by explaining the figures without reference to the paper.

The coefficients $\beta_i$ in $q(\lambda)$ are explained again, although we think this is very clear in the paper: they are the Lagrange multipliers learned in the optimisation procedure. Formatting errors and missing references have been corrected, along with the use of standard graph-theory notation.
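
For context on $q(\lambda)$ and the multipliers, here is a compact sketch of the kind of MaxEnt fit being described, i.e., fitting $q(\lambda) \propto \exp(-\sum_i \beta_i \lambda^i)$ so that its power moments match given targets; the grid-based quadrature, naive monomial basis, and all names are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def maxent_density(moments, grid):
    """Fit q(lambda) = exp(-sum_i beta_i lambda^i) / Z on a uniform grid so that
    its first len(moments) power moments match `moments`, by minimising the
    convex dual objective log Z(beta) + beta . moments."""
    k = len(moments)
    P = np.vander(grid, k + 1, increasing=True)[:, 1:]   # columns lambda^1..lambda^k
    dx = grid[1] - grid[0]

    def dual(beta):
        logq = -P @ beta
        logz = np.log(np.sum(np.exp(logq)) * dx)         # log-partition function
        return logz + beta @ np.asarray(moments)

    beta = minimize(dual, np.zeros(k), method="BFGS").x
    q = np.exp(-P @ beta)
    return q / (np.sum(q) * dx), beta
```

The stationary point of the dual recovers the moment-matching conditions $\mathbb{E}_q[\lambda^i] = \mu_i$; in practice, a log-sum-exp trick and an orthogonal polynomial basis (e.g., Chebyshev) improve numerical conditioning.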

Reviewer 2 Report

See enclosed comments

Comments for author File: Comments.pdf

Author Response

Thank you for your support.
