Article
Peer-Review Record

A Block Coordinate Descent-Based Projected Gradient Algorithm for Orthogonal Non-Negative Matrix Factorization

Mathematics 2021, 9(5), 540; https://doi.org/10.3390/math9050540
by Soodabeh Asadi 1 and Janez Povh 2,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 December 2020 / Revised: 15 February 2021 / Accepted: 24 February 2021 / Published: 4 March 2021
(This article belongs to the Section Mathematics and Computer Science)

Round 1

Reviewer 1 Report

The authors propose a new algorithm for orthogonal non-negative matrix factorization.

The article is technically sound, and experiments have been correctly conducted. 

In the following some comments:

- The authors should better emphasize why there is a need for this new algorithm. The related work section should highlight the limits of the state of the art and explain how the proposed approach overcomes them.

- Literature is outdated. Please refer to more recent works such as


https://doi.org/10.1007/s40314-020-1091-2 , https://doi.org/10.3390/app9245552 ,
https://doi.org/10.1016/j.knosys.2020.106054 ,
https://doi.org/10.1177/1177932220906827, https://doi.org/10.1109/TSP.2020.2991801


  • Equation (2): could you please explain why 1 has been added to the denominator?
  • Please highlight the best values in Tables 2 and 3 in bold.
  • Please discuss the complexity order of the proposed approach and compare it with those of Algorithms 2 and 3.


Author Response

Dear reviewer 1,

We would like to start by thanking you for the time and care taken in reviewing our paper. We have rewritten several parts of the paper to make its contributions clearer and have addressed all the comments, as explained in this letter. To make the changes in this revision easier to identify, they are coloured blue.

  1. Authors should better emphasize why there is a need for this new algorithm. 

Response: We added the following paragraph at the end of Subsection 1.2: While the classic non-negative matrix factorization (NMF) problem has attracted great attention in the recent decade (see also the recent book [12]) and several methods have been devised to compute approximate optimal solutions, the problems with orthogonality constraints (ONMF and bi-ONMF) have been much less studied, and the list of available methods is much shorter. Most of them are related to the fixed point method and to some variant of update rules. In particular, meeting both orthogonality constraints in bi-ONMF, which is relevant for co-clustering of the data, is still challenging, and very limited research has been done in this direction, especially with methods not based on the fixed point approach.

2. Related work section should highlight the limits of the state-of-the-art and explain how the proposed approach overcomes them. 

Response: This is done on page 4, lines 105-108, where we note that our work also considers the deviation of the factors from orthonormality, which is important for the feasibility of the solutions. We measure this quantity for all three algorithms in all computations.

3. Literature is outdated. Please refer to more recent works such as
https://doi.org/10.1007/s40314-020-1091-2 , https://doi.org/10.3390/app9245552 ,
https://doi.org/10.1016/j.knosys.2020.106054 ,
https://doi.org/10.1177/1177932220906827, https://doi.org/10.1109/TSP.2020.2991801

Response: All the recent works suggested above are cited in the current version with relevant comments on page 3, lines 68-91. Additionally, we also introduced some other recent literature, such as the 2020 book by Gillis.

4. Equation 2) could you please explain why 1 has been added to the denominator?  

Response: A sentence was added right after Equation (2) (page 4, line 109) to explain that the 1 in the denominator prevents numerical difficulties when the data matrix R has a very small Frobenius norm.
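The safeguard described in this response can be illustrated with a minimal sketch. The exact form of Equation (2) is not reproduced in this record, so the sketch assumes an RSE of the common form ||R - WH||_F / (1 + ||R||_F); the function name `rse` is ours, not the authors'.

```python
import numpy as np

def rse(R, W, H):
    """Relative error with a safeguarded denominator.

    The +1 keeps the ratio well defined even when ||R||_F is tiny,
    avoiding division by a near-zero value.
    """
    return np.linalg.norm(R - W @ H, "fro") / (1.0 + np.linalg.norm(R, "fro"))

# With an all-zero data matrix the metric stays finite instead of hitting 0/0.
R = np.zeros((3, 3))
W = np.zeros((3, 2))
H = np.zeros((2, 3))
print(rse(R, W, H))  # 0.0
```

Without the +1, the same call would divide by zero for R = 0; with it, the error degrades gracefully as ||R||_F shrinks.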

 

5. Please highlight the best values in tables 2 and 3 with bold style 

Response: Done. The smallest values in each row of all tables are highlighted in bold.

 

6. Please discuss the complexity order of the proposed approach and compare it with those of algorithm 2 and 3 

Response: A new Section 4.5, Time complexity of all algorithms, was added to address this issue. We also prepared Figure 6 with two plots showing how the RSE decreases with the number of iterations and with time, for each algorithm, on the noisy dataset with n=200.

Reviewer 2 Report

The authors have proposed a novel algorithm for the orthonormal non-negative matrix factorization problem based on the projected gradient method. The article is well written and easy to understand. The results and discussion section needs major improvement.

Minor comment:

  • A detailed explanation of Armijo rule with reference is needed.
  • Compare the computational complexity of the proposed method with other methods.

Major Comments:

  • As the authors mentioned, NMF has applications in text mining, document classification, clustering, etc. These data contain noise, artifacts, and unwanted components. However, the synthetic data used in the article do not contain any noise. The authors should run simulations with noisy synthetic data and analyse the effect of noise on the proposed algorithm. Also, compare the proposed algorithm with others on noisy data.
  • In real-world data the decomposition rank (k) is usually unknown. The authors have shown results for p <= k. If we use p > k, how will the algorithm perform?
  • The authors use RSE (R-WH) to evaluate the performance. How close are the estimated W and H to the actual W and H?

Author Response

Dear reviewer 2,

We would like to start by thanking you for the time and care taken in reviewing our paper. We have rewritten several parts of the paper to make its contributions clearer and have addressed all the comments, as explained below. To make the changes in this revision easier to identify, they are coloured red.

Minor comment:

  1. A detailed explanation of Armijo rule with reference is needed. 

Response: We have added, on page 7 below Algorithm 3, a short introduction to the Armijo rule as the guarantee of "sufficient decrease". The Armijo condition ensures that the line search step is not too large. The work of Armijo, in which the rule originated, is also cited there.
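The sufficient-decrease idea mentioned in this response can be sketched generically. This is not the authors' Algorithm 3 (which works on projected gradients for the matrix factors); it is a minimal backtracking line search on a smooth function, assuming the standard Armijo condition f(x - αg) ≤ f(x) - σα||g||², with our own parameter names.

```python
import numpy as np

def armijo_step(f, grad_f, x, alpha0=1.0, beta=0.5, sigma=1e-4, max_backtracks=50):
    """Backtracking line search enforcing the Armijo sufficient-decrease condition.

    Shrinks the trial step alpha by the factor beta until
        f(x - alpha * g) <= f(x) - sigma * alpha * ||g||^2,
    so the accepted step is small enough to guarantee a real decrease.
    """
    g = grad_f(x)
    fx = f(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(x - alpha * g) <= fx - sigma * alpha * np.dot(g, g):
            return alpha
        alpha *= beta
    return alpha

# Example on f(x) = ||x||^2 / 2, whose gradient is x.
f = lambda x: 0.5 * np.dot(x, x)
grad = lambda x: x
x = np.array([2.0, -1.0])
alpha = armijo_step(f, grad, x)
assert f(x - alpha * grad(x)) < f(x)  # the accepted step strictly decreases f
```

In a projected gradient setting, the trial point x - αg would additionally be projected onto the feasible (e.g. non-negative) set before evaluating f.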

 

2. Compare the computational complexity of the proposed method with other methods. 

Response: This was also a comment from Reviewer 1. We added a new Section 4.5, Time complexity of all algorithms, to address this issue. Figure 6 was also prepared with two plots showing how the RSE decreases with the number of iterations and with time, for each algorithm, on the noisy dataset with n=200.

Major Comments:

  1. As the authors mentioned, NMF has applications in text mining, document classification, clustering, etc. These data contain noise, artifacts, and unwanted components. However, the synthetic data used in the article do not contain any noise. The authors should run simulations with noisy synthetic data and analyse the effect of noise on the proposed algorithm. Also, compare the proposed algorithm with others on noisy data.

Response: We prepared a third dataset, which is a noisy variant of the second dataset, and performed all computations for n=200. A new Section 4.5, Numerical results on the noisy BION dataset, involving Table 6 and Figure 5, is entirely devoted to this issue.

 

2. In real-world data the decomposition rank (k) is usually unknown. The authors have shown results for p <= k. If we use p > k, how will the algorithm perform? 

Response: In the new Section 4.5, Numerical results on the noisy BION dataset, we also included inner dimensions equal to 120% and 140% of the true inner dimension and compared the algorithms on noisy and non-noisy data for these dimensions. Table 6 and Figure 5 also depict what happens when p exceeds the true inner dimension.

3. The authors use RSE (R-WH) to evaluate the performance. How close are the estimated W and H to the actual W and H?

Response: We analysed these differences and found that they are relatively high, but mostly because of underlying permutations of the factors' columns. This issue is addressed in the last paragraph of Section 4.4.
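The permutation ambiguity behind this response can be made concrete with a small sketch. This is a generic illustration, not the authors' exact analysis: NMF-type factors are identifiable at best up to a column permutation (and scaling), so a naive Frobenius distance between the true and estimated W can be large even for a perfect recovery. The function `permuted_error` below is ours; it matches columns with the Hungarian algorithm before measuring the error.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def permuted_error(W_true, W_est):
    """Frobenius error after resolving the column-permutation ambiguity.

    Builds a pairwise column-distance cost matrix and finds the
    minimum-cost matching, then measures the error on matched columns.
    """
    k = W_true.shape[1]
    cost = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            cost[i, j] = np.linalg.norm(W_true[:, i] - W_est[:, j])
    rows, cols = linear_sum_assignment(cost)
    return np.linalg.norm(W_true[:, rows] - W_est[:, cols], "fro")

# A column-permuted copy of W is "far" from W in raw Frobenius distance,
# but matches exactly once the columns are re-aligned.
rng = np.random.default_rng(0)
W = rng.random((5, 3))
W_perm = W[:, [2, 0, 1]]
print(permuted_error(W, W_perm))  # ~0.0, although ||W - W_perm||_F > 0
```

This explains why raw differences between estimated and true factors can look "relatively high" even when the factorization itself is essentially recovered.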

 

Round 2

Reviewer 2 Report

The authors have addressed the comments.
