Thresholding Approach for Low-Rank Correlation Matrix Based on MM Algorithm

Background: Low-rank approximation is used to interpret the features of a correlation matrix with visualization tools; however, a low-rank approximation may yield estimates that are far from zero even when the corresponding original values are zero, and such results invite misinterpretation. Methods: To overcome this, we propose a novel approach for estimating a sparse low-rank correlation matrix based on threshold values. We introduce a new cross-validation function to tune the corresponding threshold values. To calculate the value of this function, the MM algorithm is used to estimate the sparse low-rank correlation matrix, and a grid search is performed to select the threshold values. Results: Through numerical simulation, we found that the false positive rate (FPR), interpretability, and average relative error of the proposed method were superior to those of the tandem approach. In an application to microarray gene expression data, the FPRs of the proposed approach with d = 2, 3, and 5 were 0.128, 0.139, and 0.197, respectively, while the FPR of the tandem approach was 0.285. Conclusions: We propose a novel approach for estimating sparse low-rank correlation matrices. Its advantage is that it yields results that are interpretable with a heatmap, thereby avoiding misinterpretation. We demonstrated the superiority of the proposed method through both numerical simulations and real examples.


Background
It is essential in applications to compute the correlation matrix and present it with a heatmap [1] to understand the relationship between variables or subjects. However, when the sample size is small or the data contain noise, the correlation matrix may be challenging to interpret with respect to the relationship between variables. Here, a correlation matrix that is easy to interpret is one that satisfies the following two properties: (a) the correlation coefficients between related variables are high, while those between unrelated variables are zero or close to zero, and (b) the number of correlation coefficients to be interpreted is small. In this paper, we deal with the problem of estimating such an easily interpretable correlation matrix.
A low-rank approximation addresses problem (a) [2]. Various methods have been proposed to estimate low-rank correlation matrices [3][4][5][6], and this estimation has several advantages. One advantage is that low-rank approximations can effectively describe a clustering structure [7], which improves interpretation. Because this property emphasizes specific relations between variables, problem (a) can be solved.
While the low-rank correlation matrix has the above advantages, (i) when the number of variables is large, the number of correlation coefficients to be interpreted becomes problematic, making interpretation difficult, and (ii) by the Eckart-Young-Mirsky theorem [8], the error between the low-rank correlation matrix and the original correlation matrix can be large; even if a true correlation coefficient is 0, it may be estimated as a value far from 0.
To overcome this, in this study we propose a new approach to estimate sparse low-rank correlation matrices. The proposed approach, combined with a heatmap, provides a visual interpretation of the relationships between the variables. Two types of techniques are available for sparse estimation of correlation and covariance matrices [9,10]. The first adds a sparsity penalty to the objective function [11][12][13][14][15][16][17][18][19]. The other uses threshold values to achieve a sparse structure: the thresholding matrix estimator was proposed in Ref. [20], and various related methods have been developed [21][22][23][24]. In addition, to estimate sparse correlation matrices, Refs. [25][26][27] used generalized thresholding operator-based methods [25]. For the estimation of sparse low-rank matrices, methods based on penalty terms have also been proposed [28,29]. Beyond that, there is an approach to estimate the covariance matrix based on the modified Cholesky decomposition (MCD) [30]. This covariance matrix estimation has beneficial properties when the number of variables is large; however, the estimate depends on the order of the variables [31]. To tackle this problem, several methods have been proposed [32][33][34][35].
The proposed approach uses hard thresholding, as in Refs. [20,24]; it is therefore easy to use and provides interpretable results. We now explain why we aim to estimate a sparse low-rank correlation matrix rather than a sparse low-rank covariance matrix. The covariance matrix depends on the scale of the original multivariate data. Since the thresholds corresponding to all covariances cannot be treated with a single value, multiple thresholds must be selected by cross-validation. In other words, estimating a low-rank covariance matrix requires a multiple-threshold approach, which takes an enormous amount of time. On the other hand, estimating a low-rank correlation matrix can be implemented with a single-threshold approach because the scale is uniform, and it can be computed relatively quickly. Furthermore, when estimating the covariance matrix, the variance of each variable must also be estimated, which increases the number of parameters compared with estimating the correlation matrix. Based on the above, this work focuses on the correlation matrix from the viewpoint of feasibility and develops a method for estimating sparse low-rank correlation matrices. For more on this discussion, see Refs. [14,36].
The proposed method can be summarized as follows: for small sample sizes and noisy data, a sparse low-rank correlation matrix can be used to emphasize the correlation coefficients between related variables. In addition, the sparsity constraints eliminate the problem of misinterpretation and facilitate interpretation. Furthermore, directly estimating the correlation matrix instead of the covariance matrix comparatively reduces the computation time. We introduce a new cross-validation function for estimating sparse low-rank correlation matrices, modifying those used in Refs. [20,24]. This cross-validation function is the mean difference between the low-rank correlation matrix and the original-scale correlation matrix. Compared with the cross-validation functions of Refs. [20,24], the value of the proposed function tends to be higher when the rank is set lower; from this property, the proposed cross-validation function is expected to choose threshold values corresponding to a sparser correlation matrix. To calculate the values of the cross-validation function, the majorize-minimization (MM) algorithm [3,4] and the hard thresholding approach are used. The proposed method has two advantages. First, the estimated sparse low-rank correlation matrix allows an easy, visual interpretation of the correlation matrix using a heatmap and avoids misinterpretation: if the true correlation coefficient is zero, the proposed method estimates the corresponding coefficient as zero. Second, the proposed approach estimates a low-rank correlation matrix in which sparse and specific relations between variables are emphasized. In addition, we focus only on positive correlation coefficients, not negative ones; with this focus, it becomes easy to interpret the features of the relations.
The rest of this paper is structured as follows. We explain the model and algorithm in Section 2. Section 3 evaluates the proposed approach and describes the numerical simulation. The results of applying the proposed method to real data are provided in Section 3. Finally, we present our conclusions in Section 4.

Adaptive Thresholding for Sparse and Low-Rank Correlation Matrix Estimation
In this section, we present the proposed approach for estimating a sparse low-rank correlation matrix. First, the estimation of a low-rank correlation matrix based on the MM algorithm [3,4] is described. Next, the hard thresholding operator and the proposed cross-validation function are described to achieve the sparse low-rank correlation structure.

Optimization Problem of Low-Rank Correlation Matrices
Let R = (r_ij), r_ij ∈ [−1, 1] (i, j = 1, 2, · · · , p) and W = (w_ij), w_ij ∈ {0, 1} (i, j = 1, 2, · · · , p) be the correlation matrix between variables and a binary matrix, respectively, where p is the number of variables and w_ii = 1 (i = 1, 2, · · · , p). Given the number of low dimensions d (≤ p), the correlation matrix R, and the binary matrix W, the optimization problem for estimating a low-rank correlation matrix Y = (y_1, y_2, · · · , y_p)^T ∈ R^{p×d} is defined as

f(Y | R, W) = Σ_i Σ_j w_ij (r_ij − y_i^T y_j)^2 → min,   (1)

subject to ||y_j|| = 1 (j = 1, 2, · · · , p).   (2)
When Y is estimated, W is fixed; in this situation, Y is calculated using the MM algorithm, as described in Section 2.1.2. A modified cross-validation function for choosing W is introduced in Section 2.1.3.

Estimation of Low-Rank Correlation Matrices Based on MM Algorithm
Here, we explain the MM algorithm for estimating a low-rank correlation matrix proposed in Ref. [4] (Algorithm 1). To estimate Y in closed form under constraint (2), the quadratic optimization problem for Y must be converted into a linear one; combining the resulting linear function with a Lagrange multiplier yields a closed-form update formula. Let y^(t) ∈ R^d be the parameter at step t of the algorithm, and let g(y | y^(t)) be a real function g : R^d × R^d → R. If g(y | y^(t)) satisfies the conditions

g(y | y^(t)) ≥ f(y) for all y ∈ R^d, and   (3)
g(y^(t) | y^(t)) = f(y^(t)),   (4)

then g(y | y^(t)) is called a majorizing function of f(y) at the point y^(t), where f : R^d → R is the original function. Simply put, in the MM algorithm one minimizes g(y | y^(t)) rather than f(y); in many situations, g(y | y^(t)) is easier to minimize. For more details on the MM algorithm, see Ref. [38].
Before deriving the majorizing function, the objective function (1) can be rewritten as

f(Y | R, W) = Σ_i [ y_i^T B_i y_i − 2 y_i^T Σ_{j≠i} w_ij r_ij y_j ] + const,   (5)

where B_i = Σ_{j≠i} w_ij y_j y_j^T. Parameter estimation of Y is carried out row by row in y_i. Let λ_i be the largest eigenvalue of B_i and let y_i^(t−1) denote y_i at step (t − 1) of the algorithm. The part of Equation (5) corresponding to y_i can be majorized as

g(y_i | y_i^(t−1)) = λ_i y_i^T y_i + 2 y_i^T (B_i − λ_i I_d) y_i^(t−1) − 2 y_i^T Σ_{j≠i} w_ij r_ij y_j + const,   (6)

y_i^T B_i y_i − 2 y_i^T Σ_{j≠i} w_ij r_ij y_j ≤ g(y_i | y_i^(t−1)).   (7)

The inequality of Equation (7) holds because B_i − λ_i I_d is negative semidefinite; in fact, if y_i = y_i^(t−1), Equations (6) and (7) become equal. Using the Lagrange multiplier method with constraint (2) and Equation (7), the update formula for y_i is derived as

y_i^(t) = v_i / ||v_i||, where v_i = (λ_i I_d − B_i) y_i^(t−1) + Σ_{j≠i} w_ij r_ij y_j.   (8)

Algorithm 1 Algorithm for estimating the low-rank correlation matrix
Input: R, d (≤ p), and a small constant ε > 0
Output: Y
Initialisation: set Y^(0) with ||y_j^(0)|| = 1 for all j, and set t ← 1
1: repeat
2:   for i = 1, 2, · · · , p do
3:     Calculate λ_i as the largest eigenvalue of B_i
4:     Update y_i based on Equation (8)
5:   end for
6:   t ← t + 1
7: until the decrease of the objective function (1) is smaller than ε

We adopt hard thresholding to estimate the sparse low-rank correlation matrix in the proposed approach. To determine the threshold values, we introduce a cross-validation function based on Ref. [20]. The purpose of this approach is quite simple: to determine the threshold values related to sparse estimation while taking the corresponding rank into account.
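As a concrete illustration, the MM iteration of Algorithm 1 can be sketched in Python. This is a minimal sketch under the stated constraints (unit-length rows y_i, binary W with w_ii = 1); the function name and defaults are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_low_rank_correlation(R, W, d, n_iter=200, tol=1e-8, seed=0):
    """MM sketch: minimize sum_ij w_ij (r_ij - y_i^T y_j)^2 s.t. ||y_i|| = 1."""
    p = R.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((p, d))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # unit-length rows
    prev = np.inf
    for _ in range(n_iter):
        for i in range(p):
            mask = W[i].astype(bool).copy()
            mask[i] = False                          # exclude j = i
            Yj = Y[mask]                             # rows y_j with w_ij = 1, j != i
            B_i = Yj.T @ Yj                          # B_i = sum_{j != i} w_ij y_j y_j^T
            lam = np.linalg.eigvalsh(B_i)[-1]        # largest eigenvalue of B_i
            # update direction from Equation (8)
            v = (lam * np.eye(d) - B_i) @ Y[i] + Yj.T @ R[i, mask]
            nv = np.linalg.norm(v)
            if nv > 0:
                Y[i] = v / nv                        # project back onto the unit sphere
        obj = np.sum(W * (R - Y @ Y.T) ** 2)
        if prev - obj < tol:                         # stop when decrease is below tol
            break
        prev = obj
    return Y
```

By the majorization property, each row update cannot increase the objective, so the iteration converges monotonically from any unit-norm initialization.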
Let h(α) ∈ (−1, 1) be a threshold value for sample correlation coefficients corresponding to the α percentile of the correlations, where α ∈ [0, 1] is the percentage point. By setting the percentile point α, the corresponding threshold value h(α) is fixed. For a correlation matrix R, the proportional threshold operator is defined through the binary matrix W_{h(α),R} = (w_ij), with w_ij = 1 if r_ij ≥ h(α) or i = j, and w_ij = 0 otherwise (9); the proportional threshold operator is used, for example, in the domain of neural science. Using W_{h(α),R}, the thresholded correlation matrix of Equation (9) can be described entrywise as

r_ij · 1[r_ij ≥ h(α)] (i ≠ j).   (10)

Here, Equation (10) modifies the original function of Ref. [20], in which r_ij · 1[|r_ij| ≥ h(α)] was used; we focus only on high positive correlation coefficients, not on negative ones. With this modification, the results become easier to interpret.
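The operator above can be sketched in a few lines of NumPy. This is a minimal sketch of the positive-only proportional thresholding described in the text; the function name is illustrative.

```python
import numpy as np

def proportional_threshold(R, alpha):
    """Hard-threshold a correlation matrix at the alpha-percentile of its
    off-diagonal entries, keeping only positive coefficients (and w_ii = 1)."""
    p = R.shape[0]
    off = R[~np.eye(p, dtype=bool)]        # off-diagonal correlations
    h = np.quantile(off, alpha)            # threshold value h(alpha)
    W = (R >= h).astype(float)             # binary pattern w_ij
    np.fill_diagonal(W, 1.0)               # diagonal always retained
    return W * R, W
```

Note that negative correlations are always zeroed, matching the modification of Equation (10) relative to the absolute-value rule of Ref. [20].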
To estimate a sparse low-rank correlation matrix, we introduce the modified proportional threshold operator based on Equation (9), because the interpretation of the proportional threshold is quite simple. Given a percentile point α, a rank d, and the correlation matrix R, the modified threshold operator is defined as follows, where Y is the estimated correlation matrix with rank d obtained by minimizing f(Y | R, W_{h(α),R}). Equation (12) differs from Equation (11) in using a low-rank correlation matrix, although W_{h(α),R} is calculated from the original correlation matrix, not from a low-rank one. To choose the threshold value h(α), cross-validation is introduced (e.g., Refs. [20,24]). The cross-validation procedure for estimating h(α) consists of four steps, as shown in Figure 1. First, the original multivariate data X ∈ R^{n×p} are split randomly into two groups, X^(1,k) ∈ R^{n1×p} and X^(2,k) ∈ R^{n2×p}, where n1 = n − ⌊n/log n⌋, n2 = ⌊n/log n⌋, k is the index of the cross-validation iteration, and ⌊·⌋ denotes the floor function; Ref. [20] determines both n1 and n2 from a theoretical point of view. Second, the correlation matrices of X^(1,k) and X^(2,k) are calculated as R^(1,k) and R^(2,k), respectively. Third, the correlation matrix Y with rank d is estimated by minimizing f(Y | R^(1,k), W_{h(α),R^(1,k)}) under constraint (2). Fourth, for fixed h(α), the first three steps are repeated K times and the proposed cross-validation function is calculated as follows.
where K is the number of cross-validation iterations. Among the candidate threshold values, h(α) is selected so that Equation (13) is minimized. The cross-validation algorithm is presented in Algorithm 2.

Algorithm 2 Cross-validation algorithm for tuning the proportional threshold
Input: candidate threshold values {h(α)}_α, X, and d (≤ p)
Output: selected threshold value h(α)
1: for each candidate h(α) do
2:   for k = 1, 2, · · · , K do
3:     Split X into X^(1,k) and X^(2,k)
4:     Calculate R^(1,k) and R^(2,k)
5:     Estimate Y by minimizing f(Y | R^(1,k), W_{h(α),R^(1,k)}) under constraint (2)
6:   end for
7:   Calculate the cross-validation function (13)
8: end for
9: Select h(α) minimizing Equation (13)
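The cross-validation loop above can be sketched as follows. The split sizes follow the n1/n2 rule stated in the text; the scoring rule below, comparing the estimate from the first split to the sample correlation matrix of the second split, is one plausible reading of Equation (13), not the authors' exact code, and the estimator is passed in as a callable so any sparse (low-rank) estimator can be plugged in.

```python
import numpy as np

def split_sizes(n):
    """n2 = floor(n / log n), n1 = n - n2, as in Ref. [20]."""
    n2 = int(np.floor(n / np.log(n)))
    return n - n2, n2

def cv_select_alpha(X, alphas, estimate, K=5, seed=0):
    """Cross-validation sketch (Algorithm 2).  `estimate(R1, alpha)` must
    return a sparse correlation estimate computed from R1."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    n1, _ = split_sizes(n)
    scores = {a: 0.0 for a in alphas}
    for _ in range(K):
        idx = rng.permutation(n)                       # random split
        R1 = np.corrcoef(X[idx[:n1]], rowvar=False)    # first split
        R2 = np.corrcoef(X[idx[n1:]], rowvar=False)    # second split
        for a in alphas:
            Yhat = estimate(R1, a)
            scores[a] += np.sum((Yhat - R2) ** 2) / K  # mean squared F-norm gap
    return min(scores, key=scores.get)                 # alpha minimizing the score
```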

Numerical Simulation and Real Example
This section presents a numerical simulation to evaluate the proposed approach. The numerical simulation was conducted based on Ref. [19], with some modifications; in practice, the size of the numerical data is matched to that of the real-data example in Section 2.2.2. In addition, we present a real example in which the proposed method is applied to a microarray gene expression dataset from Ref. [40].

Simulation Design of Numerical Simulation
In this subsection, the simulation design is presented. The framework of the numerical simulation consists of three steps. First, artificial data with a true correlation matrix are generated. Second, sparse low-rank correlation matrices are estimated using two methods, including the proposed method; in addition, a sample correlation matrix and a threshold-based sparse correlation matrix are also computed. Third, the estimated correlation matrices are evaluated and compared using several indices.
In this simulation, three kinds of correlation models are used. Let I and J be sets of indices for the rows and columns of the correlation matrices, respectively, and let I_k and J_k be defined as follows, where i_o and j_o (o = 1, 2, · · · , 100) indicate the row and column numbers, respectively. Using this notation, three true correlation models, R^(1), R^(2), and R^(3), are defined, where i_k and j_k are the maximum indices of I_k and J_k, respectively, and 1[·] represents the indicator function. The models of Equations (14) and (15) are called sparse models, while the model of Equation (16) is called a non-sparse model in Ref. [19]. The models of Equations (14) and (15) are also used in Refs. [12,13,20]; see Figure 2.
The artificial data are generated as x_i ∼ N(0_p, R^(ℓ)) (i = 1, 2, · · · , n; ℓ = 1, 2, 3), where 0_p is a zero vector of length p. In this simulation, we set p = 100 and the number of cross-validation iterations K = 5. Several scenario types are considered. For the estimation of sparse low-rank correlation matrices, there are 2 (setting 1) × 3 (setting 2) × 3 (setting 3) × 2 (setting 4: proposal and tandem) = 36 patterns. For the estimation of sparse correlation matrices without low-rank approximation, there are 2 (setting 1) × 3 (setting 3) × 3 (setting 4: sample correlation, Jiang (2013) with modification, and graphical lasso [41]) = 18 patterns; in total, there are 54 patterns in this numerical simulation. In each pattern, artificial data are generated 100 times and evaluated using several indices. Given W and rank d, the estimate Y depends on the initial parameters Y^(0) in both the proposed and tandem approaches; therefore, the low-rank matrix is estimated from 50 randomly generated initial values, and the solution with the smallest objective function value is adopted. For R^(1) and R^(2), the candidates of α range from 0.66 to 0.86 in steps of 0.02; for R^(3), they range from 0.66 to 0.82 in steps of 0.02. The range of threshold values was set with computation time in mind; the range differs for R^(3) because above 0.84 the matrix becomes too sparse to compute the low-rank solution. Next, the settings of the numerical simulation are presented; for a summary, see Table 1. Setting one evaluates the effect of the number of subjects: if the number of subjects is smaller, the estimated sparse low-rank correlation is expected to be unstable. Setting two evaluates the effect of the rank.
When a smaller rank is set, the variance between the estimated sparse low-rank correlation coefficients becomes larger, which makes the results easier to interpret. Next, as explained in Equations (14)-(16), there are three levels in setting three.
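The data-generation step can be sketched as below. The block-diagonal model here is illustrative only (the paper's exact models (14)-(16) differ in detail), and the function names are hypothetical.

```python
import numpy as np

def block_correlation(p=100, n_blocks=5, rho=0.8):
    """Illustrative sparse block-diagonal correlation model: constant
    correlation rho within each block, zero between blocks."""
    R = np.eye(p)
    size = p // n_blocks
    for b in range(n_blocks):
        s = slice(b * size, (b + 1) * size)
        R[s, s] = rho                 # fill the block
    np.fill_diagonal(R, 1.0)          # unit diagonal
    return R

def generate(n, R, seed=0):
    """Draw x_i ~ N(0_p, R), i = 1, ..., n."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n)
```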

Figure 2. True correlation models (true correlation models 1-3).

Finally, in setting four, we selected five methods: the proposed approach, the tandem approach, the sample correlation matrix, the threshold-based sparse correlation matrix estimation of Ref. [24] with modifications, and the graphical lasso [41]. The tandem approach, which combines existing methods, was included to compare against the proposed method, which accounts for the low rank when computing the cross-validation function. Jiang (2013) with modifications was included to compare the proposed method with the case where the low-rank approximation is not considered. The graphical lasso was included because it is often used to infer relationships between variables; however, since it does not directly estimate a sparse correlation matrix, the correlation matrix is obtained by estimating the sparse inverse covariance matrix and then inverting it. Finally, the sample correlation matrix results are presented for reference. Next, we explain the compared methods. The purpose of both the proposed and tandem approaches is to estimate a sparse low-rank correlation matrix. The tandem procedure consists of two steps; in the first step, the threshold value h(α) is determined based on the following cross-validation function.
Equation (17) is a modification of the cross-validation function in Ref. [24]. In the second step, using the h(α) obtained in the first step, the low-rank correlation matrix is estimated based on Ref. [4]; in short, given h(α) and R, Y is estimated by minimizing f(Y | R, W_{h(α),R}) with the constraint ||y_j|| = 1 (j = 1, 2, · · · , p). Two methods estimate a sparse correlation matrix without dimension reduction in this simulation. The first, Jiang (2013) with modifications, also consists of two steps: the first step is the same as that of the tandem approach and determines the threshold value h(α); in the second step, using this h(α), the sparse correlation matrix is calculated as the element-wise product W_{h(α),R} ∘ R. As the modification, Equation (10) is used as the threshold function (whereas r_ij · 1[|r_ij| ≥ h(α)] was used in Ref. [24]). The second method is the graphical lasso [41]; to select the tuning parameter, the CVglasso package [42] in R is used, and the correlation matrix is then calculated from the estimated sparse inverse covariance matrix. As in Ref. [19], we adopt four evaluation indices. To evaluate the fit between the estimated sparse low-rank correlation matrix and the true correlation matrix, the average relative errors of the Frobenius norm (F-norm) and the spectral norm (S-norm) are adopted as follows, where ||·||_S indicates the spectral norm, R̂ is an estimator of the sparse low-rank correlation matrix, and R^(ℓ) (ℓ = 1, 2, 3) is the true correlation matrix corresponding to Equation (14), Equation (15), and Equation (16), respectively. In addition, to evaluate the results on sparseness, the true positive rate (TPR) and the false positive rate (FPR) are defined as follows, where |·| indicates the cardinality of a set, R̂ = (r̂_ij), and R^(ℓ) = (r^(ℓ)_ij) (ℓ = 1, 2, 3). Finally, to evaluate interpretability, we adopt the index in Equation (22).
Equation (22) ranges from 0 to 1; the closer it is to 1, the sparser the estimated correlation matrix.
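The evaluation indices can be sketched as follows. These are plausible formalizations of the quantities described in the text (the paper's exact Equations (19)-(22) were not fully recoverable from the extraction), with illustrative function names.

```python
import numpy as np

def relative_error_F(R_hat, R_true):
    """Relative error in Frobenius norm."""
    return np.linalg.norm(R_hat - R_true) / np.linalg.norm(R_true)

def relative_error_S(R_hat, R_true):
    """Relative error in spectral norm."""
    return np.linalg.norm(R_hat - R_true, 2) / np.linalg.norm(R_true, 2)

def tpr_fpr(R_hat, R_true, eps=1e-12):
    """TPR/FPR over off-diagonal entries: a 'positive' is a nonzero estimate."""
    off = ~np.eye(R_true.shape[0], dtype=bool)
    true_nz = (np.abs(R_true) > eps) & off
    true_z = (np.abs(R_true) <= eps) & off
    est_nz = np.abs(R_hat) > eps
    tpr = (est_nz & true_nz).sum() / max(true_nz.sum(), 1)
    fpr = (est_nz & true_z).sum() / max(true_z.sum(), 1)
    return tpr, fpr

def sparsity(R_hat, eps=1e-12):
    """Fraction of off-diagonal entries estimated as zero (Equation (22)-style)."""
    off = ~np.eye(R_hat.shape[0], dtype=bool)
    return (np.abs(R_hat[off]) <= eps).mean()
```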

Application of Microarray Gene Expression Dataset
Here, we present the results of applying both the proposed approach and the tandem approach to the microarray gene expression dataset in Ref. [40]. This real application aims to evaluate the differences between two classes of genes in the results of estimating sparse low-rank correlation matrices. Concretely, in this real example, true correlation coefficients between classes are assumed to be zero, and the FPR of the proposed approach is compared with that of the tandem approach.
In Ref. [25], the same dataset was used to evaluate their method. Specifically, we use the dataset provided by the R package "MADE4" [43], which includes 64 training samples and 306 genes. The data cover four types of small round blue cell tumors of childhood (SRBCT): neuroblastoma (NB), rhabdomyosarcoma (RMS), Burkitt lymphoma, a subset of non-Hodgkin lymphoma (BL), and the Ewing family of tumors (EWS); that is, there are four sample classes in this dataset. As in Ref. [25], the genes are classified into two classes, "informative" and "non-informative", where "informative" genes carry information that discriminates the four classes and "non-informative" genes do not.
Next, to construct the "informative" and "non-informative" classes, the F statistic is calculated for each gene as follows, where G is the number of classes (NB, RMS, BL, and EWS), n_g is the number of subjects in class g, x̄_gj is the mean of class g for gene j, x̄_j is the mean of gene j, and s_gj is the sample variance of class g for gene j. If F_j is relatively high, gene j is considered "informative" because it tends to carry information that discriminates the classes. Based on the calculated F_j, the top 40 and bottom 60 genes are assigned to the "informative" and "non-informative" classes, respectively. The correlation matrix of these 100 genes was then calculated and used as the input data; see Figure 3.
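The per-gene screening statistic described above is the standard one-way ANOVA F statistic; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def f_statistics(X, labels):
    """One-way ANOVA F statistic per gene (column): between-class variance
    over within-class variance, as described in the text."""
    classes = np.unique(labels)
    G = len(classes)
    n = X.shape[0]
    grand = X.mean(axis=0)                 # overall mean per gene
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for g in classes:
        Xg = X[labels == g]
        ng = Xg.shape[0]
        between += ng * (Xg.mean(axis=0) - grand) ** 2
        within += (ng - 1) * Xg.var(axis=0, ddof=1)
    return (between / (G - 1)) / (within / (n - G))
```

Ranking the genes by this statistic and taking the top 40 and bottom 60 reproduces the "informative"/"non-informative" split used in the text.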
In this example, the true correlation coefficients between classes are considered to be zero. To compare the results of the proposed approach with those of the tandem approach, the FPR is calculated; for the tandem approach, see Section 2.2.1. In this application, the true correlations between genes belonging to the "informative" class and genes belonging to the "non-informative" class are considered to be 0; therefore, the FPR denominator is 2 × 40 × 60 = 4800. The TPR is not reported because the true structure is difficult to determine: correlations within each class are not necessarily non-zero. In addition, to evaluate the interpretability of the estimated correlation matrix, we adopt the index in Equation (23), where IS and NS are the sets of genes belonging to the "informative" and "non-informative" classes, respectively; the larger Equation (23) is, the sparser the correlations within the same class are considered to be. For the rank, we set d = 2, 3, and 5. The candidates of α for determining the threshold value range from 0.50 to 0.83 in steps of 0.01 for both approaches, and the algorithms are started from 50 different initial parameters. In addition, as in the numerical simulation, the sample correlation matrix, Jiang (2013) with modifications, and the graphical lasso are also employed.

Results
This section presents the results of the numerical simulation and real application.

Simulation Result
In this subsection, we present the simulation results by true correlation model. Tables 2-4 show the FPRs, TPRs, and sparsity obtained for R^(1), R^(2), and R^(3), respectively; each cell indicates the mean of the corresponding index. Here, R^(2) is a non-sparse correlation matrix, so its FPR cannot be calculated, and neither the TPR nor the FPR of the sample correlation matrix can be calculated because the sample correlation matrix is not sparse. From the results of the numerical simulation, the FPRs of the proposed approach were the lowest among all methods in all situations, while its TPRs tended to be inferior to those of the other approaches. Simply put, the proposed approach yields a sparser low-rank correlation matrix than the tandem approach when a smaller rank is used. In addition, the sparsity of the proposed approach was higher than that of the other methods; from these results, we found that the proposed method provides more interpretable results than the tandem method. For the graphical lasso, the correlation matrices are estimated as non-sparse even though the estimated inverse covariance matrices are sparse; therefore, both the TPRs and FPRs were high, and the sparsity of the graphical lasso tends to be 0.00 except in the case of R^(2). For the relative error of the F-norm, Figures 4-6 show the results of applying these methods to R^(1), R^(2), and R^(3), respectively. The median of the proposed approach was lower than that of the tandem approach in each pattern, and its interquartile range was smaller than that of the tandem approach in each pattern. Therefore, we confirmed that the results of the proposed approach are effective and stable compared with those of the tandem approach.
As the rank is set larger, the results of both approaches become lower and closer to those of Jiang (2013) with modifications in all situations. Among these methods, Jiang (2013) with modifications gives the best relative error of the F-norm for R^(1) and R^(3), while the graphical lasso (glasso) is best for R^(2); this is natural given the properties of low-rank approximation. As with the F-norm, the S-norm results for R^(1), R^(2), and R^(3) are shown in Figures 7-9, respectively; the tendencies for the S-norm are quite similar to those for the F-norm. From these results, we observe that the result of the proposed approach with rank 5 is quite close to that of Jiang (2013) with modifications.
For the estimated correlation matrices, Figures 10-12 correspond to true correlation models one, two, and three with n = 50, respectively; likewise, Figures 13-15 correspond to true correlation models one, two, and three with n = 75. From Figures 10, 12, 13 and 15, we found that the estimated correlation matrices of the proposed approach tend to estimate zeros correctly compared with those of the tandem approach; this tendency is visually confirmed especially when the rank is set lower. In addition, as the rank is set larger, the estimated correlation matrices tend to be close to the results of Jiang (2013) with modifications.

Figure 5. Relative errors of F-norm for R^(2) with n = 50 and n = 75; the vertical axis indicates the relative error of the F-norm.

Figure 10. Examples of estimated correlation matrices for true correlation model 1 (n = 50).

Figure 11. Examples of estimated correlation matrices for true correlation model 2 (n = 50).

Figure 12. Examples of estimated correlation matrices for true correlation model 3 (n = 50).

Figure 13. Examples of estimated correlation matrices for true correlation model 1 (n = 75).

Figure 14. Examples of estimated correlation matrices for true correlation model 2 (n = 75).

Figure 15. Examples of estimated correlation matrices for true correlation model 3 (n = 75).

Result of Application of Microarray Gene Expression Dataset
In this subsection, the results for the microarray gene expression dataset are shown. For the estimated original correlation matrix, Jiang (2013) with modifications, the proposed approach, the tandem approach, and the graphical lasso, see Figure 16. The percentage points of d = 2, 3, and 5 in the proposed approach were estimated as α = 0.82, 0.81, and 0.75, respectively, while the percentage points of the tandem approach and Jiang (2013) with modifications were both α = 0.65. The estimated results of Jiang (2013) with modifications are also presented in Figure 16; however, its FPR was higher than those of the proposed approach. Note that the FPR is unaffected by the choice of rank in the tandem approach. From these results, the estimated sparse low-rank correlation matrix tends to be sparser when the rank is set lower; this can be confirmed visually in Figure 16. In addition, as the rank is set larger, the estimated correlations of the proposed approach become similar to those of the tandem approach. Using the heatmap, we also confirmed that the estimated sparse low-rank correlations between genes belonging to the "informative" class tend to be similar to the results obtained in Ref. [25].
Next, Table 5 shows the FPRs of the proposed approach, the tandem approach, Jiang (2013) with modifications, and the graphical lasso. The FPRs of the proposed approach with d = 2, 3, and 5 were all lower than those of the tandem approach, Jiang (2013) with modifications, and the graphical lasso. This tendency can also be confirmed visually in Figure 16: compared with the tandem approach, the proposed approach was able to estimate the correlation coefficients between classes as zero. This tendency is consistent with the results of the numerical simulations.

Figure 16. Estimated sparse low-rank correlation matrices with d = 2, 3, and 5 (proposed and tandem approaches), sample correlation matrix, and sparse correlation matrices without rank reduction (Jiang (2013) with modification and graphical lasso).
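To make the comparison above concrete, the tandem-style baseline (a rank-d approximation followed by hard thresholding at the α-percentage point) and the FPR metric can be sketched in NumPy as follows. This is an illustrative reading of the tandem pipeline, not the paper's MM-based estimator; the function names and the small numerical floor on the threshold are our own assumptions.

```python
import numpy as np

def tandem_sparse_lowrank(R, d, alpha):
    """Tandem-style baseline (illustrative, not the paper's exact algorithm):
    (1) rank-d eigen-approximation of the correlation matrix R,
    (2) rescaling to unit diagonal,
    (3) hard thresholding at the alpha-percentage point of |r_ij|."""
    w, V = np.linalg.eigh(R)
    idx = np.argsort(w)[::-1][:d]              # keep the d largest eigenvalues
    L = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
    Rd = L @ L.T
    s = np.sqrt(np.diag(Rd))
    Rd = Rd / np.outer(s, s)                   # force unit diagonal
    np.fill_diagonal(Rd, 1.0)
    off = np.abs(Rd[np.triu_indices_from(Rd, k=1)])
    # alpha-percentage point as threshold; tiny floor absorbs fp noise
    tau = max(np.quantile(off, alpha), 1e-12)
    Rs = np.where(np.abs(Rd) >= tau, Rd, 0.0)
    np.fill_diagonal(Rs, 1.0)
    return Rs

def fpr(R_hat, R_true):
    """False positive rate: fraction of truly zero off-diagonal
    coefficients that are estimated as nonzero."""
    mask = ~np.eye(R_true.shape[0], dtype=bool)
    zero_true = (R_true == 0) & mask
    return np.count_nonzero(R_hat[zero_true]) / np.count_nonzero(zero_true)
```

For a block-diagonal true correlation matrix, this baseline recovers the within-block structure while the FPR counts any spurious between-block coefficients.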

Conclusions
This study proposed a novel method for estimating sparse low-rank correlation matrices based on the MM algorithm. The approach overcomes a problem in estimating low-rank correlation matrices. Low-rank approximation is a potent tool that provides a straightforward interpretation of the features because the contrast of the estimated coefficients becomes larger; however, its estimates can lead to misinterpretations: even if a true correlation coefficient is zero, the corresponding coefficient estimated by low-rank approximation without sparse estimation may be far from zero. To confirm the efficiency of the proposed method, we performed numerical simulations and a real-data experiment using a microarray gene expression dataset. In the real example, the FPRs of the proposed approach with d = 2, 3, and 5 were 0.128, 0.139, and 0.197, respectively, whereas those of the tandem approach and Jiang (2013) with modifications were both 0.285. We were therefore able to confirm that the FPR of the proposed approach was the best, irrespective of the rank. Similarly, the numerical simulations confirmed that the FPR of the proposed approach was superior to those of the tandem approach and Jiang (2013) with modifications. The simulations also showed that the relative errors of the proposed approach were smaller than those of the tandem approach; that is, the proposed approach approximates the true correlation matrix better than the tandem approach does.
Next, we comment on each of the compared methods. The sample correlation matrix was used as a reference for comparison. For Jiang (2013) with modifications, the numerical simulation results confirm that its F-norm and S-norm are better than those of the proposed and tandem approaches because no low-rank approximation is performed; it is also better than the sample correlation matrix, which has no sparse constraints. The graphical lasso is a method for sparse estimation of the inverse covariance matrix, and the correlation matrix computed from it is not necessarily sparse. Consequently, its FPR, sparsity, and within-class sparsity were worse than those of the other methods in both the numerical simulations and the real-data application. In other words, when a sparse correlation matrix is desired, it is better to estimate the correlation or covariance matrix directly. The TPR, FPR, and sparsity of the tandem approach are almost identical to those of Jiang (2013) with modifications. The tandem approach has a higher FPR than the proposed method and worse results for sparsity and within-class sparsity, which are indicators of interpretability. Furthermore, the proposed method performed better on the F-norm and S-norm because the tandem approach does not take the threshold values into account during estimation. However, the numerical simulation results show that the TPR of the tandem approach tends to be better than that of the proposed method.
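The remark that the graphical lasso does not yield a sparse correlation matrix can be illustrated with a small numerical example: a sparse (tridiagonal) precision matrix whose implied correlation matrix is fully dense. This sketch only shows the precision-to-correlation relationship with NumPy; it does not run the graphical lasso itself, and the example matrix is our own.

```python
import numpy as np

# Sparse (tridiagonal) precision matrix: variables 1 and 3 are
# conditionally independent, so Theta[0, 2] = 0.
Theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

Sigma = np.linalg.inv(Theta)        # implied covariance matrix
s = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(s, s)          # implied correlation matrix

# Theta is sparse, but R is dense: R[0, 2] = 1/3, not 0.
```

This is why sparsifying the precision matrix, as the graphical lasso does, generally leaves the marginal correlation matrix dense.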
Here, we summarize the recommendations for the proposed approach. First, the proposed approach provides an interpretable sparse low-rank correlation matrix, as shown by the sparsity results of the numerical simulations and the within-class sparsity of the real example. Second, in the real example, the sparse low-rank correlation matrix estimated by the proposed approach was close to that of a previous study [21], even though an additional constraint, the low-rank approximation, is imposed by the proposed method. Third, the proposed approach reduced the FPR compared with the tandem approach and the graphical lasso in this study. In short, the proposed approach can avoid misinterpretations compared with those methods.
Although the proposed approach showed promising results in both experiments, several limitations need further investigation. First, when the rank is set low, the TPRs of the proposed method were lower than those of the tandem approach; there is a trade-off between the choice of rank and the corresponding TPR, F-norm, S-norm, sparsity, and within-class sparsity. Therefore, a more effective method for determining the rank should be developed; one possible solution is the introduction of nuclear-norm regularization [44]. Second, when the percentage point is set significantly higher, it may be challenging to obtain the sparse low-rank correlation matrix because the update formula for the low-rank correlation matrix becomes difficult to compute. Third, the proposed approach was sparser than the tandem approach in the numerical simulations and real examples, but this was not shown theoretically; this point remains a topic for future research. Finally, it should be noted that, for simplicity of interpretation, the proposed approach focuses only on positive correlations and does not consider negative ones. However, the method can be extended to handle both positive and negative correlation coefficients.
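The nuclear-norm regularization mentioned above can be sketched via singular value soft-thresholding, the proximal operator of the nuclear norm: instead of fixing the rank d in advance, a regularization weight λ shrinks small singular values to zero and thereby selects the rank automatically. This is a generic sketch of the regularizer [44] in NumPy, not part of the proposed algorithm.

```python
import numpy as np

def svt(A, lam):
    """Singular value soft-thresholding: the proximal operator of
    lam * ||X||_* (nuclear norm). Singular values below lam are set
    to zero, so the rank is chosen by lam rather than fixed a priori."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```

For example, a matrix with singular values (3, 1, 0.2) and λ = 0.5 is mapped to one with singular values (2.5, 0.5, 0), i.e., rank 2.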
Author Contributions: K.T. constructed the proposed approach and designed the numerical simulations and the real application; Y.F. and K.T. implemented the algorithm for the artificial and real data and wrote the manuscript; S.H. modified the proposed method to solve the practical problems and reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: The gene expression dataset belongs to the R package 'MADE4', available at: https://www.bioconductor.org/packages/release/bioc/html/made4.html (accessed on 17 April 2022).

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: