Article

Structural Results on the HMLasso

by Shin-ya Matsushita 1,* and Hiromu Sasaki 2
1 Department of Management Science and Engineering, Akita Prefectural University, Yurihonjo City 015-0055, Japan
2 IT Development Center Akita, JTEKT, Akita City 010-0001, Japan
* Author to whom correspondence should be addressed.
Axioms 2025, 14(11), 843; https://doi.org/10.3390/axioms14110843
Submission received: 31 July 2025 / Revised: 5 November 2025 / Accepted: 12 November 2025 / Published: 17 November 2025

Abstract

HMLasso (Lasso with High Missing Rate) is a useful technique for sparse regression when a high-dimensional design matrix contains a large number of missing entries. To solve HMLasso, an appropriate positive semidefinite symmetric matrix must be obtained. In this paper, we present two structural results on the HMLasso problem. These results allow existing acceleration algorithms for strongly convex functions to be applied to solve the HMLasso problem.

1. Introduction

Let X be an n × p design matrix and let y be an n-dimensional response vector. Consider the following standard linear regression model:
$$y = X\beta + \epsilon, \qquad (1)$$
where $\epsilon$ is a noise term. A popular modeling choice is a sparsity assumption on the regression vector. The Lasso [1] (least absolute shrinkage and selection operator) is among the most popular procedures for estimating an unknown sparse regression vector in a high-dimensional linear model. The Lasso is formulated as the following $\ell_1$-penalized regression problem:
$$\min_{\beta}\ \frac{1}{2n}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1, \qquad (2)$$
where $\alpha > 0$ is a regularization parameter and $\|\cdot\|_1$ (resp. $\|\cdot\|_2$) is the $\ell_1$ (resp. $\ell_2$) norm. Here, we consider the case where the design matrix X contains missing data. Missing data are prevalent and often unavoidable, and they affect not only the representativeness and quality of the data but also the results of Lasso regression. It is therefore crucial to develop methods that can handle missing data. HMLasso [2] was proposed to address this issue effectively and is formulated as follows:
$$\min_{\Sigma}\ \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{s.t.}\quad \Sigma \succeq 0, \qquad (3)$$
$$\min_{\beta}\ \frac{1}{2}\beta^{\top}\hat{\Sigma}\beta - \rho_{\mathrm{pair}}^{\top}\beta + \alpha\|\beta\|_1, \qquad (4)$$
where $\odot$ is the Hadamard product, $\|\cdot\|_F$ is the Frobenius norm, $S_{\mathrm{pair}}, W \in \mathbb{R}^{p \times p}$ and $\rho_{\mathrm{pair}} \in \mathbb{R}^{p}$ are defined from X and y (see Section 2 for details), and $\hat{\Sigma} = \operatorname{argmin}_{\Sigma \succeq 0} \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$, i.e., $\hat{\Sigma}$ is the solution to problem (3). It is known that problem (4) can be written equivalently as the Lasso (2), and in the literature on sparse regression the algorithm most commonly used for minimizing a sum of two convex functions is the proximal gradient method. The key to solving HMLasso is therefore how fast and accurately problem (3) can be solved.
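For orientation, when the design matrix is fully observed, the plain Lasso (2) can be solved directly with off-the-shelf software. The following minimal sketch (synthetic data; NumPy and scikit-learn assumed, neither of which is used in the paper's experiments) fits (2); scikit-learn's Lasso uses exactly the $1/(2n)$ scaling appearing in (2).

```python
# Minimal illustration of the Lasso (2) on fully observed, synthetic data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 300, 27
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 0.8]          # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# scikit-learn's Lasso minimizes (1/(2n))||y - X beta||_2^2 + alpha*||beta||_1,
# matching (2) with alpha as the regularization parameter.
model = Lasso(alpha=0.1).fit(X, y)
print(np.nonzero(model.coef_)[0])         # indices of the selected variables
```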
To address this challenge, we establish two structural results on the HMLasso problem. First, we show that the gradient of the objective function of (3) is Lipschitz continuous; second, we show that the objective function of (3) is strongly convex. These results guarantee that accelerated algorithms for strongly convex functions can be applied to solve (3). Finally, we conduct numerical experiments on the real dataset considered in [2]. The numerical results show that an accelerated algorithm for strongly convex functions is effective for solving problem (3).

2. Preliminaries

This section reviews basic definitions, facts, and notation that will be used throughout the paper.
Let $S^p$ denote the set of $p \times p$ real symmetric matrices and let $S_+^p \subseteq S^p$ denote the set of positive semidefinite matrices. For $A, B \in \mathbb{R}^{n \times p}$, $A \odot B$ denotes the Hadamard (entrywise) product. Let $D \subseteq \mathbb{R}^{p \times p}$. The indicator function $i_D : \mathbb{R}^{p \times p} \to (-\infty, \infty]$ of D is defined by
$$i_D(B) := \begin{cases} 0 & (B \in D) \\ \infty & (\text{otherwise}). \end{cases}$$
Let $f : \mathbb{R}^{p \times p} \to (-\infty, \infty]$ be a proper and convex function. The proximal mapping $\mathrm{prox}_f$ of f is defined by
$$\mathrm{prox}_f(\Sigma) := \operatorname*{argmin}_{Z \in \mathbb{R}^{p \times p}} \left\{ f(Z) + \frac{1}{2}\|Z - \Sigma\|_F^2 \right\}.$$
Let $\Sigma \in S^p$. Then there exist a diagonal matrix $\Lambda \in \mathbb{R}^{p \times p}$ and a $p \times p$ orthogonal matrix $U$ such that $\Sigma = U \Lambda U^{\top}$. Let $P_{S_+^p}$ be the projection from $S^p$ onto $S_+^p$. Then the following holds (see, for example, [3] (Example 29.31), [4] (Theorem 6.3)):
$$P_{S_+^p}(\Sigma) = U \Lambda_+ U^{\top},$$
where $\Lambda_+$ is the diagonal matrix obtained from $\Lambda$ by setting the negative entries to 0.
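A minimal sketch of this projection (NumPy assumed; not part of the paper's MATLAB implementation): symmetrize, eigendecompose, clip the negative eigenvalues to zero, and rebuild the matrix.

```python
import numpy as np

def proj_psd(Sigma: np.ndarray) -> np.ndarray:
    """Projection onto the positive semidefinite cone in the Frobenius norm:
    P_{S_+^p}(Sigma) = U Lambda_+ U^T, where Sigma = U Lambda U^T."""
    Sigma = (Sigma + Sigma.T) / 2          # guard against round-off asymmetry
    eigvals, U = np.linalg.eigh(Sigma)     # eigendecomposition of a symmetric matrix
    return U @ np.diag(np.clip(eigvals, 0.0, None)) @ U.T
```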
Let $X \in \mathbb{R}^{n \times p}$ be a design matrix. Set
$$I_{jk} := \{\, i : X_{ij} \text{ and } X_{ik} \text{ are observed} \,\}$$
and let $n_{jk}$ be the number of elements of $I_{jk}$. We define the matrices $S_{\mathrm{pair}}$ and $W$ as follows:
$$(S_{\mathrm{pair}})_{jk} := \begin{cases} \dfrac{1}{n_{jk}} \displaystyle\sum_{i \in I_{jk}} x_{ij} x_{ik} & (I_{jk} \neq \emptyset) \\ 0 & (I_{jk} = \emptyset), \end{cases} \qquad (W)_{jk} := \frac{n_{jk}}{n}. \qquad (6)$$
We also define a vector $\rho_{\mathrm{pair}}$ as follows:
$$(\rho_{\mathrm{pair}})_j := \begin{cases} \dfrac{1}{n_{jj}} \displaystyle\sum_{i \in I_{jj}} x_{ij} y_i & (I_{jj} \neq \emptyset) \\ 0 & (I_{jj} = \emptyset). \end{cases}$$
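These pairwise quantities can be computed directly from a design matrix whose missing entries are stored as NaN. The sketch below (NumPy assumed; function and variable names are illustrative) follows the definitions above, returning 0 whenever a pair of columns has no jointly observed rows.

```python
import numpy as np

def pairwise_stats(X: np.ndarray, y: np.ndarray):
    """Compute S_pair, W, and rho_pair from a design matrix X whose
    missing entries are stored as NaN (y is assumed fully observed)."""
    n, p = X.shape
    obs = ~np.isnan(X)                 # obs[i, j]: X[i, j] is observed
    X0 = np.where(obs, X, 0.0)         # zero out missing entries
    obs_f = obs.astype(float)

    n_jk = obs_f.T @ obs_f             # n_jk[j, k] = |I_jk|
    W = n_jk / n

    cross = X0.T @ X0                  # sums of x_ij * x_ik over i in I_jk
    S_pair = np.divide(cross, n_jk, out=np.zeros_like(cross), where=n_jk > 0)

    n_jj = np.diag(n_jk)               # |I_jj|: rows where column j is observed
    rho_num = X0.T @ y                 # sums of x_ij * y_i over i in I_jj
    rho_pair = np.divide(rho_num, n_jj, out=np.zeros_like(rho_num), where=n_jj > 0)
    return S_pair, W, rho_pair
```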
Let $L \in [0, \infty)$ and let $h : \mathbb{R}^{p \times p} \to \mathbb{R} \cup \{\infty\}$ be a differentiable function. The gradient $\nabla h$ of $h$ is said to be L-Lipschitz continuous if
$$\|\nabla h(\Sigma_1) - \nabla h(\Sigma_2)\|_F \le L \|\Sigma_1 - \Sigma_2\|_F \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p \times p}).$$
This condition is often called L-smoothness in the literature. Let $h : \mathbb{R}^{p \times p} \to \mathbb{R}$ be L-smooth on $\mathbb{R}^{p \times p}$. Then we can upper bound the function $h$ as
$$h(\Sigma_1) \le h(\Sigma_2) + \langle \nabla h(\Sigma_2), \Sigma_1 - \Sigma_2 \rangle + \frac{L}{2}\|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p \times p})$$
(see, for example, [5] (Theorem 5.8), [6] (Theorem A.1)).
Let $\mu \in (0, \infty)$ and let $g : \mathbb{R}^{p \times p} \to \mathbb{R} \cup \{+\infty\}$. $g$ is said to be $\mu$-strongly convex if for each $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p \times p}$ and $\lambda \in (0, 1)$ we have
$$g(\lambda \Sigma_1 + (1 - \lambda)\Sigma_2) \le \lambda g(\Sigma_1) + (1 - \lambda) g(\Sigma_2) - \frac{\mu}{2}\lambda(1 - \lambda)\|\Sigma_1 - \Sigma_2\|_F^2. \qquad (10)$$
Suppose that $g$ is $\mu$-strongly convex and differentiable. Then (10) is equivalent to the following condition:
$$\langle \nabla g(\Sigma_1) - \nabla g(\Sigma_2), \Sigma_1 - \Sigma_2 \rangle \ge \mu \|\Sigma_1 - \Sigma_2\|_F^2 \qquad (\Sigma_1, \Sigma_2 \in \mathbb{R}^{p \times p}). \qquad (11)$$

3. Main Results

In this section, we present two structural results on the HMLasso problem (3).
Define $f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2$. The gradient $\nabla f$ of $f$ is then given by
$$\nabla f(\Sigma) = \nabla\!\left( \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \right) = W \odot W \odot (\Sigma - S_{\mathrm{pair}}) \qquad (12)$$
(see [2]).

3.1. Lipschitz Continuity

The first structural result concerns the gradient of the objective function in (3): we show that $\nabla f$ is Lipschitz continuous.
Lemma 1.
The gradient $\nabla f$ of $f$ is Lipschitz continuous with Lipschitz constant $\|W \odot W\|_F$.
Proof. 
Let $\bar{\alpha}$ be defined by $\bar{\alpha} := \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4$. By (12), we obtain
$$\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 = \|W \odot W \odot (\Sigma_1 - S_{\mathrm{pair}}) - W \odot W \odot (\Sigma_2 - S_{\mathrm{pair}})\|_F^2 = \|W \odot W \odot (\Sigma_1 - \Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4 \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2.$$
On the other hand,
$$\|W \odot W\|_F^2\, \|\Sigma_1 - \Sigma_2\|_F^2 = \left( \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^4 \right) \sum_{j=1}^{p}\sum_{k=1}^{p} \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2 = \bar{\alpha} \sum_{j=1}^{p}\sum_{k=1}^{p} \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2.$$
This implies
$$\|W \odot W\|_F^2\, \|\Sigma_1 - \Sigma_2\|_F^2 - \|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} \left( \bar{\alpha} - W_{jk}^4 \right) \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2 \ge 0,$$
where the last inequality follows from $\bar{\alpha} - W_{jk}^4 \ge 0$ for all $j$ and $k$. Hence,
$$\|\nabla f(\Sigma_1) - \nabla f(\Sigma_2)\|_F \le \|W \odot W\|_F\, \|\Sigma_1 - \Sigma_2\|_F.$$
   □

3.2. Strong Convexity

We next show that $f$ is strongly convex. This is our second result.
Lemma 2.
$f$ is $\min_{l,m}(W \odot W)_{l,m}$-strongly convex.
Proof. 
We verify (11) with constant $\mu = \min_{l,m}(W \odot W)_{l,m}$. Let $\Sigma_1, \Sigma_2 \in \mathbb{R}^{p \times p}$. Then
$$\langle \nabla f(\Sigma_1) - \nabla f(\Sigma_2), \Sigma_1 - \Sigma_2 \rangle = \langle W \odot W \odot (\Sigma_1 - \Sigma_2), \Sigma_1 - \Sigma_2 \rangle = \sum_{j=1}^{p}\sum_{k=1}^{p} W_{jk}^2 \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2 \ge \sum_{j=1}^{p}\sum_{k=1}^{p} \min_{l,m}(W \odot W)_{l,m} \left( \Sigma_{1,jk} - \Sigma_{2,jk} \right)^2 = \min_{l,m}(W \odot W)_{l,m}\, \|\Sigma_1 - \Sigma_2\|_F^2,$$
where the inequality follows from $W_{jk}^2 \ge \min_{l,m}(W \odot W)_{l,m}$ for all $j$ and $k$. This implies that $f$ is $\min_{l,m}(W \odot W)_{l,m}$-strongly convex.    □
Remark 1.
From Lemmas 1 and 2, the objective function of (3) is strongly convex with a Lipschitz continuous gradient. Note that when the objective function is known to be strongly convex and smooth, faster convergence can be guaranteed than for a merely convex function (see, for example, [6]).
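For problem (3), the gradient (12) and both constants are available in closed form. A minimal sketch, assuming NumPy and the notation of the lemmas:

```python
import numpy as np

def grad_f(Sigma: np.ndarray, W: np.ndarray, S_pair: np.ndarray) -> np.ndarray:
    """Gradient of f(Sigma) = (1/2)||W ⊙ (Sigma - S_pair)||_F^2, as in (12)."""
    return W * W * (Sigma - S_pair)    # entrywise (Hadamard) products

def constants(W: np.ndarray):
    """Smoothness and strong convexity constants from Lemmas 1 and 2."""
    L = np.linalg.norm(W * W, "fro")   # Lipschitz constant of grad f (Lemma 1)
    mu = float(np.min(W * W))          # strong convexity constant (Lemma 2)
    return L, mu
```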

4. Numerical Experiments

In this section, we consider a strongly convex variant of FISTA [6] (Algorithm 1) for solving problem (3).

4.1. Strongly Convex FISTA

Let $f : \mathbb{R}^{p \times p} \to \mathbb{R}$ be an L-smooth and $\mu$-strongly convex function with $\mathrm{dom}(f) = \mathbb{R}^{p \times p}$, and let $h : \mathbb{R}^{p \times p} \to \mathbb{R} \cup \{\infty\}$ be a convex function with $\{\Sigma : h(\Sigma) < \infty\} \neq \emptyset$. We consider the problem of minimizing the sum of $f$ and $h$:
$$\min_{\Sigma \in \mathbb{R}^{p \times p}} f(\Sigma) + h(\Sigma). \qquad (13)$$
Forward-backward splitting strategies are the classical methods for solving (13). In the context of acceleration, the fast iterative shrinkage-thresholding algorithm (FISTA) was introduced by Beck and Teboulle [7] based on the idea of forward-backward splitting. This topic is addressed in many references; we refer to [3,5,6].
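For intuition, one forward-backward (proximal gradient) iteration for (13) is a gradient step on the smooth part $f$ followed by the proximal mapping of $h$. A generic sketch (names are illustrative):

```python
def forward_backward_step(Sigma, grad_f, prox_h, step):
    """One proximal gradient (forward-backward) iteration for min f + h:
    a gradient step on the smooth part f followed by the proximal mapping of h.
    prox_h(V, t) should return prox_{t*h}(V); for an indicator function this is
    simply the projection onto the constraint set, independent of t."""
    return prox_h(Sigma - step * grad_f(Sigma), step)
```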
Here, we focus on the following strongly convex variant of FISTA with backtracking, investigated in [6].
Algorithm 1 Strongly convex FISTA [6]
Input: An initial point $\Sigma_0 \in \mathbb{R}^{p \times p}$ and an initial estimate $L_0 > \mu$.
1: Initialize $\Phi_0 = \Sigma_0$, $t_0 = 0$, and some $\alpha > 1$.
2: for $k = 0, \ldots$ do
3:    $L_{k+1} = L_k$
4:    while true do
5:       $q_{k+1} = \mu / L_{k+1}$
6:       $t_{k+1} = \dfrac{2 t_k + 1 + \sqrt{4 t_k + 4 q_{k+1} t_k^2 + 1}}{2(1 - q_{k+1})}$
7:       set $\tau_k = \dfrac{(t_{k+1} - t_k)(1 + q_{k+1} t_k)}{t_{k+1} + 2 q_{k+1} t_k t_{k+1} - q_{k+1} t_k^2}$ and $\delta_k = \dfrac{t_{k+1} - t_k}{1 + q_{k+1} t_{k+1}}$
8:       $\Psi_k = \Sigma_k + \tau_k (\Phi_k - \Sigma_k)$
9:       $\Sigma_{k+1} = \mathrm{prox}_{h / L_{k+1}}\!\left( \Psi_k - \frac{1}{L_{k+1}} \nabla f(\Psi_k) \right)$
10:      $\Phi_{k+1} = (1 - q_{k+1}\delta_k)\Phi_k + q_{k+1}\delta_k \Psi_k + \delta_k (\Sigma_{k+1} - \Psi_k)$
11:      if $f(\Sigma_{k+1}) \le f(\Sigma_k) + \langle \nabla f(\Sigma_k), \Sigma_{k+1} - \Sigma_k \rangle + \frac{L_{k+1}}{2}\|\Sigma_{k+1} - \Sigma_k\|_F^2$ holds then
12:         break {k will be incremented.}
13:      else
14:         $L_{k+1} = \alpha L_{k+1}$ {Recompute new $L_{k+1}$.}
15:      end if
16:   end while
17: end for
Output: An approximate solution $\Sigma_{k+1}$.
Remark 2.
The Lipschitz and strong convexity constants of the objective function affect the convergence of the algorithm (see [6]), but they depend strongly on the number of missing values in the data (see Lemmas 1 and 2 and (6)). The algorithm above includes a backtracking procedure for the smoothness constant, which allows for robust behavior even when the design matrix has many missing values.
Remark 3.
We demonstrate how strongly convex FISTA can be applied to (3). Set
$$f(\Sigma) := \frac{1}{2}\|W \odot (\Sigma - S_{\mathrm{pair}})\|_F^2 \quad \text{and} \quad h(\Sigma) := i_{S_+^p}(\Sigma).$$
In this case, problem (3) is a special instance of problem (13). The gradient $\nabla f$ can be computed by (12). Moreover, the proximal mapping $\mathrm{prox}_h$ is the projection $P_{S_+^p}$ (see, for example, [3] (Example 12.25)), and hence the computations in the algorithm are simple.
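The following is a compact sketch of how Algorithm 1 might be instantiated for problem (3). It is written in Python with NumPy for illustration only (the experiments in Section 4.2 were run in MATLAB); the function and variable names are illustrative, and the fixed iteration budget is an assumption rather than part of Algorithm 1.

```python
import numpy as np

def scfista_hmlasso(S_pair, W, Sigma0, L0=1.0, alpha=1.1, max_iter=500):
    """Sketch of strongly convex FISTA (Algorithm 1) applied to problem (3):
    f(Sigma) = (1/2)||W ⊙ (Sigma - S_pair)||_F^2, h = indicator of the PSD cone."""
    def f(S):
        return 0.5 * np.linalg.norm(W * (S - S_pair), "fro") ** 2

    def grad_f(S):                          # gradient (12)
        return W * W * (S - S_pair)

    def proj_psd(S):                        # prox of h = projection onto S_+^p
        S = (S + S.T) / 2
        vals, U = np.linalg.eigh(S)
        return U @ np.diag(np.clip(vals, 0.0, None)) @ U.T

    mu = float(np.min(W * W))               # strong convexity constant (Lemma 2)
    L = max(L0, 1.01 * mu)                  # enforce the requirement L0 > mu
    Sigma, Phi, t = Sigma0, Sigma0, 0.0

    for _ in range(max_iter):
        L_next = L
        while True:                         # backtracking on the smoothness constant
            q = mu / L_next
            t_next = (2 * t + 1 + np.sqrt(4 * t + 4 * q * t ** 2 + 1)) / (2 * (1 - q))
            tau = (t_next - t) * (1 + q * t) / (t_next + 2 * q * t * t_next - q * t ** 2)
            delta = (t_next - t) / (1 + q * t_next)
            Psi = Sigma + tau * (Phi - Sigma)
            Sigma_next = proj_psd(Psi - grad_f(Psi) / L_next)
            Phi_next = (1 - q * delta) * Phi + q * delta * Psi + delta * (Sigma_next - Psi)
            # Backtracking test on the quadratic upper bound; if it fails, increase L.
            gap = Sigma_next - Sigma
            if f(Sigma_next) <= f(Sigma) + np.sum(grad_f(Sigma) * gap) \
                    + 0.5 * L_next * np.linalg.norm(gap, "fro") ** 2:
                break
            L_next = alpha * L_next
        Sigma, Phi, t, L = Sigma_next, Phi_next, t_next, L_next
    return Sigma
```

With S_pair and W built as in Section 2, a call such as scfista_hmlasso(S_pair, W, np.zeros((p, p))) returns an approximate solution of (3).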

4.2. Residential Building Dataset

We compared the performance of the strongly convex FISTA (SCFISTA) and the alternating direction method of multipliers (ADMM) used in [2] for solving the HMLasso subproblem (3). All experiments were conducted on a PC with an Apple M1 Max CPU and 32 GB of RAM. The methods were implemented in MATLAB (R2025a).
The numerical experiment uses the Residential Building dataset from the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Residential+Building+Data+Set (accessed on 20 May 2024). The data consist of n = 300 samples and p = 27 variables. We set the average missing rates to 20%, 40%, 60%, and 80%. For the parameters $L_0$, $\alpha$, and $\mu$ in Algorithm 1, we set $L_0 := 1$, $\alpha := 1.1$, and $\mu := \min_{l,m}(W \odot W)_{l,m}$. We chose 10 random initial points $\Sigma_0^{(i)} \in \mathbb{R}^{p \times p}$ ($i = 1, 2, \ldots, 10$), with all entries of each initial point drawn from a standard normal distribution. The figures report the following quantities:
$$D_k^{(i)} := \|\Sigma_k^{(i)} - \Sigma_{\mathrm{cvx}}\|_F \quad \text{and} \quad D_k := \frac{1}{10}\sum_{i=1}^{10} D_k^{(i)},$$
where $\Sigma_{\mathrm{cvx}}$ denotes the solution obtained by CVX and $\{\Sigma_k^{(i)}\}$ is the sequence generated from $\Sigma_0^{(i)}$ by SCFISTA or ADMM.
Figure 1 and Figure 2 show the relation between the distance to the solution and the iteration number. As these figures show, both SCFISTA and ADMM converge to the solution as the number of iterations increases. Moreover, the results indicate that the accelerated algorithm for strongly convex functions is the more effective way to solve the HMLasso problem.

5. Conclusions

In this paper, we presented two structural results on the HMLasso problem: Lipschitz continuity of the gradient of the objective function and strong convexity of the objective function. These results allow existing acceleration algorithms for strongly convex functions to be applied to the HMLasso problem. Our numerical experiments suggest that accelerated algorithms for strongly convex functions are computationally attractive.

Author Contributions

Methodology, Writing—Original Draft Preparation, S.-y.M.; Software, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by JSPS KAKENHI, Grant Number 23K03235.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to Takayasu Yamaguchi of Akita Prefectural University for providing the initial inspiration for this research. This work was supported by the Research Institute for Mathematical Sciences, an International Joint Usage/Research Center located in Kyoto University.

Conflicts of Interest

Author Hiromu Sasaki was employed by the company JTEKT. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  2. Takada, M.; Fujisawa, H.; Nishikawa, T. HMLasso: Lasso with High Missing Rate. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), Macao, 10–16 August 2019; pp. 3541–3547.
  3. Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed.; CMS Books in Mathematics; Springer: Berlin/Heidelberg, Germany, 2017.
  4. Escalante, R.; Raydan, M. Alternating Projection Methods; SIAM: Philadelphia, PA, USA, 2011.
  5. Beck, A. First-Order Methods in Optimization; MOS-SIAM Series on Optimization; SIAM: Philadelphia, PA, USA, 2017.
  6. D'Aspremont, A.; Scieur, D.; Taylor, A. Acceleration Methods. Found. Trends Optim. 2021, 5, 1–245.
  7. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
Figure 1. The numerical comparisons of SCFISTA and ADMM at average missing rates of 20% (left) and 40% (right) for the design matrix in (3).
Figure 2. The numerical comparisons of SCFISTA and ADMM at average missing rates of 60% (left) and 80% (right) for the design matrix in (3).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
