CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems

Bolotov, Ilya

doi:10.3390/math13213476

Open AccessArticle

CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems

by

Ilya Bolotov

Faculty of International Relations, Prague University of Economics and Business, W. Churchill Sq. 1938/4, Žižkov, 130 67 Prague, Czech Republic

Mathematics 2025, 13(21), 3476; https://doi.org/10.3390/math13213476

Submission received: 24 September 2025 / Revised: 25 October 2025 / Accepted: 27 October 2025 / Published: 31 October 2025

(This article belongs to the Section D: Statistics and Operational Research)

Download

Browse Figures

Versions Notes

Abstract

This paper develops the linear-algebraic foundations of the Convex Least Squares Programming (CLSP) estimator and constructs its modular two-step convex optimization framework, capable of addressing ill-posed and underdetermined problems. After reformulating a problem in its canonical form,

A^{(r)} z^{(r)} = b

, Step 1 yields an iterated (if

r > 1

) minimum-norm least-squares estimate

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

on a constrained subspace defined by a symmetric idempotent

Z

(reducing to the Moore–Penrose pseudoinverse when

Z = I

). The optional Step 2 corrects

{\hat{z}}^{(r)}

by solving a convex program, which penalizes deviations using a Lasso/Ridge/Elastic net-inspired scheme parameterized by

α \in [0, 1]

and yields

{\hat{z}}^{*}

. The second step guarantees a unique solution for

α \in (0, 1]

and coincides with the Minimum-Norm BLUE (MNBLUE) when

α = 1

. This paper also proposes an analysis of numerical stability and CLSP-specific goodness-of-fit statistics, such as partial

R^{2}

, normalized RMSE (NRMSE), Monte Carlo t-tests for the mean of NRMSE, and condition-number-based confidence bands. The three special CLSP problem cases are then tested in a 50,000-iteration Monte Carlo experiment and on simulated numerical examples. The estimator has a wide range of applications, including interpolating input–output tables and structural matrices.

Keywords:

convex optimization; modular estimators; least squares; generalized inverse; regularization; normalized RMSE; Monte Carlo simulation

MSC:

90C25; 65F22; 65F20; 65F35; 65C05

JEL Classification:

C13; C61; C63; C15

1. Introduction

As the existing literature on optimization—for example, Nocedal and Wright [1] (pp. 1–9) and Boyd and Vandenberghe [2] (pp. 1–2)—points out, constrained optimization plays an important role not only in natural sciences, where it was first developed by Euler, Fermat, Lagrange, Laplace, Gauss, Cauchy, Weierstrass, and others, but also in economics and its numerous applications, such as econometrics and operations research, including managerial planning at different levels, logistics optimization, and resource allocation. The history of modern constrained optimization began in the late 1940s to 1950s, when economics, already heavily reliant on calculus since the marginalist revolution of 1871–1874 [3] (pp. 43–168), and the emerging discipline of econometrics, standing on the shoulders of model fitting [4] (pp. 438–457), internalized a new array of quantitative methods referred to as programming after the U.S. Army’s logistic jargon. Programming—including classical, linear, quadratic, and dynamic variants—originated with Kantorovich in economic planning in the USSR and was independently discovered and developed into a mathematical discipline by Dantzig in the United States (who introduced the bulk of the terminology and the simplex algorithm) [5] (pp. 1–51, [6]). Since the pivotal proceedings published by Koopmans [7], programming has become an integral part of mathematical economics [8,9,10], in both microeconomic analysis (an overview is provided in Intriligator and Arrow [11] (pp. 76–91) and macroeconomic modeling, such as Leontief’s input–output framework outlined in Koopmans [7] (pp. 132–173) and, in an extended form, in Dorfman et al. [12] (pp. 204–264).

Calculus, model fitting, and programming are, however, subject to strong assumptions, including function smoothness, full matrix ranks, well-posedness, well-conditionedness, feasibility, non-degeneracy, and non-cycling (in the case of the simplex algorithm) as summarized by Nocedal and Wright [1] (pp. 304–354, 381–382) and Intriligator and Arrow [11] (pp. 15–76). These limitations have been partially circumvented by developing more strategy-based than analysis-based, computationally intensive constraint programming algorithms; consult Frühwirth and Abdennadher [13] and Rossi et al. [14] for an overview. Still, specific cases of constrained optimization problems in econometric applications, such as incorporating the textbook two-quarter business-cycle criterion into a general linear model [15] or solving underdetermined linear programs to estimate an investment matrix [16], remain unresolved within the existing methodologies and algorithmic frameworks. In response, this text develops a modular two-step convex programming methodology to formalize such problems as a linear system of the form

A z = b

, including linear(ized) constraints

C x + S y^{*} = b_{1 : k}

, with

k \leq m

, where

A \in R^{m \times n}

is a partitioned matrix,

C \in R^{k \times p}

and

S \in R^{k \times (n - p)}

are its submatrices,

b \in R^{m}

is a known vector, and

z \in R^{n}

is the vector of unknowns, which includes both the target solution subvector

x \in R^{p}

and an auxiliary slack-surplus subvector

y^{*} \in R^{n - p}

, introduced to accommodate inequalities in constraints.

The methodology (Convex Least Squares Programming, or CLSP, framework) comprises selectively omittable and repeatable actions for enhanced flexibility—and consists of two steps: (a) the first, grounded in the theory of (constrained) pseudoinverses as summarized in Rao and Mitra [17], Ben-Israel and Greville [18], Lawson and Hanson [19], and Wang et al. [20], is iterable to refine the solution and serves as a mandatory baseline in cases where the second step becomes infeasible (due to mutually inconsistent constraints)—as exemplified by Whiteside et al. [21] for regression-derived constraints and by Blair [22,23] in systems with either too few or too many effective constraints; and (b) the second, grounded in the theory of regularization and convex programming as summarized in Tikhonov et al. [24], Gentle [4], Nocedal and Wright [1], and Boyd and Vandenberghe [2], provides an optional correction of the first step, drawing on the logic of Lasso, Ridge, and Elastic Net to address ill-posedness and compensate for constraint violations or residual approximation errors resulting from the first-step minimum-norm LS estimate. Hence, it can be claimed that the proposed framework is grounded in the general algorithmic logic of Wolfe [25] and Dax [26] as further formalized by Osborne [27], who were among the first to introduce an additional estimation step into simplex-based solvers, and is comparable to the algorithms of Übi [28,29,30,31,32], the closest and most recent elaborations on the topic.

The aim of this work is to define the linear algebra foundations (the theoretical base) of the CLSP framework, present supporting theorems and lemmas, develop tools for analyzing numerical stability of matrix

A

, and discuss the goodness of fit of vector

{\hat{z}}^{*}

, illustrating them on simulated examples. Topics such as the maximum likelihood estimation of

{\hat{z}}^{*}

and corresponding information criteria (Akaike and Bayesian) are beyond its present scope. The text is organized as follows. Section 2 summarizes the historical development and recent advances in convex optimization, motivating the formulation of a new methodology. Section 3 presents the formalization of the estimator, while Section 4 develops a sensitivity analysis for

C

. Section 5 introduces goodness-of-fit statistics, including the normalized root mean square error (NRMSE) and its sample-mean test. Section 6 presents special cases, and Section 7 reports the results of a Monte Carlo experiment and solves simulated problems via CLSP-based Python 3 modules. Section 8 concludes with a general discussion. Throughout, the following notation is used: bold uppercase letters (e.g.,

A, B, C, G, M,

and

S

) denote matrices; bold lowercase letters (e.g.,

b, x, y,

and

z

) denote column vectors; italic letters (e.g.,

i, j, k, m, n, p, λ, ω

) denote scalars; the transpose of a matrix is denoted by the superscript ⊤ (e.g.,

X^{⊤}

); the inverse by

- 1

(e.g.,

Q^{- 1}

); generalized inverses are indexed using curly braces (e.g.,

G^{{1, 2}}

); the Moore–Penrose pseudoinverse is denoted by dagger (e.g.,

A^{†}

);

ℓ_{p}

norms, where

1 \leq p \leq \infty

, are denoted by double bars (e.g.,

{∥ \cdot ∥}_{2}

); and condition numbers by

κ (\cdot)

. All functions, scalars, vectors, and matrices are defined over the real numbers (

R

).

2. Historical and Conceptual Background of the CLSP Framework

The methodology of convex optimization—formally,

x^{*} = \arg \min_{x \in R^{n}} f (x)

, subject to convex inequality constraints

g_{i} (x) \leq 0

and affine equality constraints

h_{j} (x) = 0

as defined by Boyd and Vandenberghe [2] (pp. 7, 127–129)—has evolved over more than two centuries through several milestones, including generalized inverses and linear and (convex) quadratic programming, each relaxing the strong assumptions of its predecessors. As documented in the seminal works of Rao and Mitra [17] (pp. vii–viii) and Ben-Israel and Greville [18] (pp. 4–5, 370–374), the first pseudoinverses (original term, now typically reserved for

A^{†}

) or generalized inverses (current general term) emerged in the theory of integral operators—introduced by Fredholm in 1903 and further developed by Hurwitz in 1912—and, implicitly, in the theory of differential operators by Hilbert in 1904, whose work was subsequently extended by Myller in 1906, Westfall in 1909, Bounitzky in 1909, Elliott in 1928, and Reid in 1931. The cited authors attribute their first application to matrices to Moore in 1920 under the term general reciprocals [33] (though some sources suggest he may have formulated the idea as early as 1906), with independent formulations by Siegel in 1937 and Rao in 1955, and generalizations to singular operators by Tseng in 1933, 1949, and 1956, Murray and von Neumann in 1936, and Atkinson in 1952–1953. A theoretical consolidation occurred with Bjerhammar’s [34] rediscovery of Moore’s formulation, followed by Penrose’s [35,36] introduction of four conditions defining the unique least-squares minimum-norm generalized inverse

G \equiv A^{†}

, which, such that

G \in R^{n \times m}

, can be expressed in Equation (1):

(i) A G A = A (ii) G A G = G (iii) {(A G)}^{⊤} = A G (iv) {(G A)}^{⊤} = G A

(1)

In econometrics, it led to conditionally unbiased (minimum bias) estimators [37] and a redefinition of the Gauss–Markov theorem [38] in the 1960s, superseding earlier efforts [39].

Further contributions to the theory, calculation, and diverse applications of

G

were made in the 1950s to early 1970s by Rao and Mitra [17,40] (a synthesis of the authors’ previous works from 1955–1969), Ben-Israel and Greville [18], Greville [41,42,43,44], Cline [45], Cline and Greville [46], Golub and Kahan [47], Ben-Israel and Cohen [48], and Lewis and Newman [49] (the most cited sources on the topic in the Web of Science database, although the list is not exhaustive; consult Rao and Mitra [17] (pp. vii–viii) for an extended one), which, among others, introduced the concept of a

{\cdot}

-generalized inverse

G \in A^{{\cdot}}

, where

A^{{\cdot}} \subseteq R^{n \times m}

is the space of the

{\cdot}

-generalized inverses of

A \in R^{m \times n}

— hereafter,

{i, j}

-inverses for

i > 1

,

{i, j, k}

-inverses, Drazin inverses, and higher-order

G

are disregarded because of their inapplicability to the CLSP framework—by satisfying a subset or all of the (i)–(iv) conditions in Equation (1) as described in Table 1. Formally, for

G^{{1}} \in A^{{1}}

,

I_{n} \in R^{n \times n}

,

I_{m} \in R^{m \times m}

, and arbitrary matrices

U, V \in R^{n \times m}

, any

{1}

-inverse, including all

{1, j}

-inverses and

A^{†}

, can be expressed as

G = G^{{1}} + (I_{n} - G^{{1}} A) U + V (I_{m} - A G^{{1}})

, out of which only

G \equiv A^{†}

and a constrained

G \equiv G^{{1, 2}} \in R^{n \times m}

—here, the Bott–Duffin inverse [50], expressed in modern notation as

A_{S}^{†} = P_{S}^{L} {(P_{S}^{L} A P_{S}^{R})}^{†} P_{S}^{R}

, where

P_{S}^{L} \in R^{m \times m}

and

P_{S}^{R} \in R^{n \times n}

are orthogonal (perpendicular) projectors onto the rows and columns defined by an orthonormal matrix

S

—can be uniquely defined (

A_{S}^{†}

under certain conditions) and qualify as minimum-norm unbiased estimators in the sense of Chipman [37] and Price [38].

Starting from the early 1960s, Cline [45], Rao and Mitra [17] (pp. 64–71), Meyer [52], Ben-Israel and Greville [18] (pp. 175–200), Hartwig [53], Campbell and Meyer [51] (pp. 53–61), Rao and Yanai [54], Tian [55], Rakha [56], Wang et al. [20] (pp. 193–210), and Baksalary and Trenkler [57], among others, extended the formulas for

A^{†}

to partitioned matrices, including the general row-wise case,

A_{1} \in R^{m_{1} \times n}

,

A_{2} \in R^{m_{2} \times n}

,

m_{1} + m_{2} = m

—and, equivalently, column-wise one,

A_{1} \in R^{m \times n_{1}}

,

A_{2} \in R^{m \times n_{2}}

, and

n_{1} + n_{2} = n

(Equation (2), used in the numerical stability analysis of the CLSP estimator for

C_{canon} = [C S]

in the decomposition

A = B C_{canon} + R

given

A \in R^{m \times n}

,

C_{canon} \in R^{k \times n}

, and

B = A C_{canon}^{†} = A {[C S]}^{†} \in R^{m \times k}

):

{[\begin{matrix} A_{1} & A_{2} \end{matrix}]}^{†} = {({[\begin{matrix} A_{1} & A_{2} \end{matrix}]}^{⊤} [\begin{matrix} A_{1} & A_{2} \end{matrix}])}^{†} {[\begin{matrix} A_{1} & A_{2} \end{matrix}]}^{⊤} = {[\begin{matrix} A_{1}^{⊤} A_{1} & A_{1}^{⊤} A_{2} \\ A_{2}^{⊤} A_{1} & A_{2}^{⊤} A_{2} \end{matrix}]}^{†} [\begin{matrix} A_{1}^{⊤} \\ A_{2}^{⊤} \end{matrix}]

(2)

where

rank ([A_{1} A_{2}]) \leq rank (A_{1}) + rank (A_{2})

, with equality iff

R (A_{1}) \cap R (A_{2}) = {0}

, and strict inequality iff

R (A_{1}) \cap R (A_{2}) \neq {0}

. For the original definitions and formulations, consult Rao and Mitra [17] (pp. 64–66) and Ben-Israel and Greville [18] (pp. 175–200).

To sum up,

G

(especially

A^{†}

), SVD, and regularization techniques—namely, Lasso (based on the

ℓ_{1}

-norm), Tikhonov regularization (hereafter, referred to as Ridge regression, based on the

ℓ_{2}

-norm), and Elastic Net (a convex combination of

ℓ_{1}

and

ℓ_{2}

norms) [4] (pp. 477–479)—have become an integral part of estimation (model-fitting), where

A^{†}

can be defined as a minimum-norm best linear unbiased estimator (MNBLUE), reducing to the classical BLUE in (over)determined cases. These methods have been applied in (a) natural sciences since the 1940s [24] (pp. 68–156) [34,50] (pp. 85–172, [58]) [59,60,61], and (b) econometrics since the 1960s (pp. 36–157, 174–232, [19]) [37,38] (pp. 61–106, [62]) (pp. 37–338, [63]) in both unconstrained and equality-constrained forms, thereby relaxing previous rank restrictions.

The incorporation of inequality constraints into optimization problems during the 1930s–1950s (driven by the foundational contributions of Kantorovich [64], Koopmans [7], and Dantzig [6], along with the later authors [5]) marked the next milestone in the development of convex optimization under the emerging framework of programming, namely for linear cases (LP)—formally,

x^{*} = \arg \max_{x \in R^{n}} c^{⊤} x

subject to

A x \leq b

,

x \geq 0

(with the dual

λ^{*} = \arg \min_{λ \in R^{m}} b^{⊤} λ

subject to

A^{⊤} λ \geq c)

, where

A \in R^{m \times n}

,

b \in R^{m}

, and

c \in R^{n}

—and convex quadratic ones (QP)—formally,

x^{*} = \arg \min_{x \in R^{n}} \{\frac{1}{2} x^{⊤} Q x + c^{⊤} x\}

subject to

A x \leq b

(with the dual

λ^{*} = \arg \max_{λ \in R_{+}^{m}} \{- \frac{1}{2} {(A^{⊤} λ + c)}^{⊤} Q^{- 1} (A^{⊤} λ + c) - b^{⊤} λ\}

), where

Q \in R^{n \times n}

is symmetric positive definite,

c \in R^{n}

,

A \in R^{m \times n}

, and

b \in R^{m}

—[2] (pp. 146–213) (for a classification of programming, consult Figure 1) (pp. 355–597, [1]) (pp. 1–31, [6]), with Dantzig’s simplex and Karmarkar’s interior-point algorithms being efficient solvers [65,66].

In the late 1950s–1970s, the application of LP and QP extended to LS problems (which can be referred to as a major generalization of mainstream convex optimization methods; consult Boyd and Vandenberghe [2] (pp. 291–349) for a comprehensive modern overview) with the pioneering works of Wagner [67] and Sielken and Hartley [68], expanded by Kiountouzis [69] and Sposito [70], who first substituted LS with LP-based (

ℓ_{1}

-norm) least absolute (LAD) and (

ℓ_{\infty}

-norm) least maximum (LDP) deviations and derived unbiased (

ℓ_{p}

-norm,

1 < p < \infty

) estimators with non-unique solutions, whilst, in the early 1990s, Stone and Tovey [66] demonstrated algorithmic equivalence between LP algorithms and iteratively reweighted LS. Judge and Takayama [71] and Mantel [72] reformulated (multiple) LS with inequality constraints as QP, and Lawson and Hanson [19] (pp. 144–147) introduced, among others, the non-negative least squares (NNLS) method. These and further developments are reflected in the second step of the CLSP framework, where

\hat{z} = G b

is corrected under a regularization scheme inspired by Lasso (

α = 0

), Ridge (

α = 1

), and Elastic Net (

0 < α < 1

) regularization, where

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{(1 - α) {∥z - \hat{z}∥}_{1} + α {∥z - \hat{z}∥}_{2}^{2}\}

, subject to

A z = b

.

The final frontier that motivates the formulation of a new framework is the class of ill-posed problems in Hadamard’s sense [4] (pp. 143, 241), i.e., systems with no (i) feasibility,

x \notin F : = {x | \forall_{i, j} g_{i} (x) \leq 0, h_{j} (x) = 0}

, (ii) uniqueness,

\exists x_{1} \neq x_{2} : A x_{1} = A x_{2}

, (iii) stability,

\forall ε > 0, \exists δ > 0 : ∥b - b_{δ}∥ < δ \Rightarrow ∥A^{†} b - A^{†} b_{δ}∥ \geq ε

, or formulation—as a well-posed LS, LP, or QP problem—of solution under mainstream assumptions, where

A \in R^{m \times n}

,

x \in R^{n}

, and

b \in R^{m}

, such as the ones described in Bolotov [15], Bolotov [16]: namely, (a) underdetermined,

rank (A) < min (m, n)

, and/or ill-conditioned linear systems,

κ (A) = {∥ A ∥}_{2} {∥ A^{†} ∥}_{2} ≫ 1

,

A \in R^{m \times n}

, in cases of LS and LP, where solutions are either non-unique or highly sensitive to perturbations [22,73]; and (b) degenerate,

# {active constraints at x} > dim (x) = n

, with cycling behavior,

c^{⊤} x^{(k + 1)} = c^{⊤} x^{(k)}

, and

x^{(k + 1)} \neq x^{(k)}

,

x \in R^{n}

, in all LP and QP problems, where efficient methods, such as the simplex algorithm, may fail to converge [27,74]. Such problems have been addressed since the 1940s–1970s by (a) regularization (i.e., a method of perturbations) and (b) problem-specific algorithms [2] (pp. 455–630). For LS cases, the list includes, among others, Lasso, Ridge regularization, and Elastic Net [4] (pp. 477–479), constrained LS methods [75], restricted LS [76], LP-based sparse recovery approaches [73], Gröbner bases [77], as well as derivatives of eigenvectors and Nelson-type, BD-, QR- and SVD-based algorithms [78]. For LS and QP, among others, we have Wolfe’s regularization-based technique for simplex [25] and its modifications [27,74], Dax’s LS-based steepest-descent algorithm [26], a primal-dual NNLS-based algorithm (LSPD) [79], as well as modifications of the ellipsoid algorithm [80]. A major early attempt to unify (NN)LS, LP, and QP within a single framework to solve both well- and ill-posed convex optimization problems was undertaken by Übi [28,29,30,31,32], who proposed a series of problem-specific algorithms (VD, INE, QPLS, and LS1). Figure 2 compares all the mentioned methods by functionality.

To sum up, based on the citation and reference counts in modern literature mapping software (here, Litmaps), as well as in-group relevance (i.e., the number of citations within the collection) of the above-cited works (consult Figure 3), this text incorporates the seminal studies and an almost exhaustive representation of prior research on the topic in question.

3. Construction and Formalization of the CLSP Estimator

To accommodate the above-described problem classes, a modular two-step CLSP consists of a first (compulsory) step—minimum-norm LS estimation, based on

A^{†}

or

A_{S}^{†}

(hereafter denoted as

A_{Z}^{†}

to match

z

)—followed by a second (optional) step, a regularization-inspired correction. This structure ensures that CLSP is able to yield a unique solution under a strictly convex second-step correction and extends the classical LS framework in both scope and precision, providing a generalized approach that reduces the reliance on problem-specific algorithms [28,29,30,31,32]. The estimator’s algorithmic flow is illustrated in Figure 4 and formalized below.

The first process block in the CLSP algorithmic flow denotes the transformation of the initial problem—

{\hat{x}}_{M}^{*} = {Proj}_{x_{M}} \{Q_{X} [\begin{matrix} x_{M} \\ x_{L} \end{matrix}] \in R^{p} : \forall_{1 \leq i \leq s} g_{i} (x_{M}, x_{L}) ⋛ γ_{i}, \forall_{j} h_{j} (x_{M}, x_{L}) = η_{j}\}

, where

{Proj}_{x_{M}}

is the projection onto the

x_{M}

coordinates,

x_{M} \in R^{p - l}

is the vector of model (target) variables to be estimated (

{\hat{x}}_{M}^{*}

),

x_{L} \in R^{l}

,

l \geq 0

, is a vector of latent variables that appear only in the constraint functions

g_{i} (\cdot)

and

h_{j} (\cdot)

, such that

Q_{X} [\begin{matrix} x_{M} \\ x_{L} \end{matrix}] = x \in R^{p}

,

Q_{X}

is the permutation matrix, and

g_{i} (\cdot) ⋛ γ_{i}

and

h_{j} (\cdot) = η_{j}

are the linear(ized) inequality and equality constraints,

\forall_{i, j} γ_{i}, η_{j} \in b

—to the canonical form

A^{(r)} z^{(r)} = b

(a term first introduced in LP [6] (pp. 75–81)), where (a)

A^{(r)} \in R^{m \times n}

is the block design matrix consisting of a

\forall_{i} g_{i} (\cdot)

-and-

\forall_{j} h_{j} (\cdot)

constraints matrix

C \in R^{k \times p}

, a model matrix

M = Q_{M}^{L} [\begin{matrix} M_{partial} \\ 0 \end{matrix}] Q_{M}^{R} \in R^{(m - k) \times p}

, a sign slack matrix

S = Q_{S}^{L} [\begin{matrix} S_{s} \subseteq \forall_{σ_{1}, \dots, σ_{s} \in {- 1, + 1}} diag (σ_{1}, \dots, σ_{s}) \\ 0 \end{matrix}] Q_{S}^{R} \in R^{k \times (n - p)}

, and either a zero matrix

0

(if

r = 1

) or a reverse-sign slack matrix

S_{residual} = [0 diag (- sign ({(b - A^{(r - 1)} {\hat{z}}^{(r - 1)})}_{k + 1 : m}))] \in R^{(m - k) \times (n - p)}

(if

r > 1

) (see Equation (3)):

A^{(r)} = [\begin{matrix} C & S \\ M & \{\begin{matrix} 0, & initial iteration, r = 1 \\ S_{residual}, & subsequent iterations, r > 1 \end{matrix} \end{matrix}]

(3)

and (b)

z^{(r)} = [\begin{matrix} x^{(r)} \\ {(y^{(r)})}^{*} \end{matrix}] \in R^{n}

is the full vector of unknowns, comprising (model)

x^{(r)} \in R^{p}

and (slack)

{(y^{(r)})}^{*} = [\begin{matrix} y^{(r)} \\ y_{residual}^{(r)} \end{matrix}] \in R^{n - p}

,

y_{i}^{(r)} \geq 0

and

y_{j}^{(r)} = 0

—an equivalent problem [2] (pp. 130–132), where

g_{i} {(\cdot)}^{a} = g_{i} (\cdot) + S_{i i} y_{i}^{(r)} = 0

and

h_{j} {(\cdot)}^{a} = h_{j} (\cdot)

. The constraint functions

g_{i} {(\cdot)}^{a}

and

h_{j} {(\cdot)}^{a}

must be linear(ized), so that

C, S \subseteq A^{(r)} \Leftrightarrow C x^{(r)} + S {(y^{(r)})}^{*} = b_{1 : k}

,

k \leq m

.

In practical implementations, for problems with

(m > m_{E}) \lor (n > n_{E})

, where E is an estimability limit, a block-wise estimation can be performed through the partitioning

A^{(r)} = \{A_{s u b s e t, i_{E}, j_{E}}^{(r)}\}

,

m_{subset} = m_{reduced} - 1

,

n_{subset} = n_{reduced} - 1

,

i_{E} \in {1, \dots, ⌈\frac{m}{m_{subset}}⌉}

, and

j_{E} \in {1, \dots, ⌈\frac{n}{n_{subset}}⌉}

, leading to a total of

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉)

matrices

A_{reduced}^{(r)} \in R^{m_{reduced} \times n_{reduced}}

, composed of

A_{subset}^{(r)}

and

\sum a_{i, j}^{(r)}

,

i \notin I_{subset} \subset {1, \dots, m}

,

j \notin J_{subset} \subset {1, \dots, n}

, and a total of

(⌈\frac{m}{m_{subset}}⌉)

vectors

b_{r .} \in R^{m_{reduced}}

(as formalized in Equation (4)):

A_{reduced}^{(r)} = Q_{A}^{I} [\begin{matrix} A_{I_{subset}, J_{subset}}^{(r)} & \sum_{j \notin J_{subset}} a_{i, j}^{(r)} \\ \sum_{i \notin I_{subset}} a_{i, j}^{(r)} & \sum_{i \notin I_{subset}} \sum_{j \notin J_{subset}} a_{i, j}^{(r)} \end{matrix}] Q_{A}^{J}, b_{r .} = Q_{b}^{I} [\begin{matrix} b_{I_{subset}} \\ \sum_{i \notin I_{subset}} b_{i} \end{matrix}]

(4)

\sum a_{i, j}^{(r)}

are then treated as slack rows/columns, i.e., incorporated into

S

, to act as balancing adjustments since the reduced model, by construction, tends to inflate the estimates

{\hat{a}}_{i_{E}, j_{E}}

. However, full compensation of information loss, caused by aggregation, is infeasible; hence, estimable (smaller) full models are always preferred over reduced (per parts) estimation.

The second process block, i.e., the first step of the CLSP estimator, denotes obtaining an (iterated if

r > 1

) estimate

{\hat{z}}^{(r)} = G^{(r)} b

(alternatively,

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉)

-times

{\hat{z}}_{reduced}^{(r)} = G_{reduced}^{(r)} b_{r .}

with

{\hat{z}}_{i_{E}, j_{E}}^{(r)} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{(r), (i_{E}, j_{E})}\}

) through the (constrained) Bott–Duffin inverse

G^{(r)} \equiv {(A_{Z}^{(r)})}^{†} \in R^{n \times m}

[50] (pp. 49–64, [20]), defined on a subspace specified by a symmetric idempotent matrix

Z \in R^{n \times n}

(regulating solution-space exclusion through binary entries 0/1, such that

P_{Z}^{L} = Z

restricts the domain of estimated variables to the selected subspace, while

P_{Z}^{R} = I_{m}

leaves the data vector, i.e., the input

b

, unprojected).

G^{(r)} \equiv {(A_{Z}^{(r)})}^{†}

reduces to the Moore–Penrose pseudoinverse

G^{(r)} \equiv {(A^{(r)})}^{†} \in R^{n \times m}

when

Z = I_{n}

, and equals the standard inverse

G^{(r)} \equiv {(A^{(r)})}^{- 1}

iff

Z = I_{n} \land rank (A^{(r)}) = m = n

(the application of a left-sided Bott–Duffin inverse to

A^{(r)} z^{(r)} = b

is given by Equation (5)):

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b = [{(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤}] b, Z^{2} = Z = Z^{⊤}

(5)

The number of iterations (r) increases until a termination condition is met, such as error

|\frac{1}{\sqrt{m}} {∥(b - A^{(r)} {\hat{z}}^{(r)})∥}_{2} - \frac{1}{\sqrt{m}} {∥(b - A^{(r - 1)} {\hat{z}}^{(r - 1)})∥}_{2}| < ε

or

r > r_{limit}

, whichever occurs first—implemented in CLSP-based software—although any goodness-of-fit metric can be employed. The described condition, maximizing the fit quality—especially under a missing second step—is, however, heuristic, with a formal proof of convergence lying beyond the scope of this work. In practical implementations,

{\hat{z}}^{(r)}

is efficiently computed using the SVD (see Appendix A.1) with optional Ridge regularization to stabilize a nearly singular system.

The third process block, i.e., the second step of the CLSP estimator, denotes obtaining a corrected final solution

{\hat{z}}^{*}

(alternatively,

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉)

-times

{\hat{z}}_{reduced}^{*}

with

{\hat{z}}_{i_{E}, j_{E}}^{*} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{*, (i_{E}, j_{E})}\}

)—unless the algorithmic flow terminates after Step 1, in which case

{\hat{z}}^{*} = {\hat{z}}^{(r)}

—by penalizing deviations from the (iterated if

r > 1

) estimate

{\hat{z}}^{(r)}

reflecting Lasso (

α = 0 \Leftrightarrow {\hat{z}}^{*} = {\hat{z}}^{(r)} + \hat{u}

, where

\hat{u} = \arg \min_{u} {∥ u ∥}_{1}, s . t . A^{(r)} ({\hat{z}}^{(r)} + u) = b

), Ridge (

α = 1 \Leftrightarrow {\hat{z}}^{*} = {\hat{z}}^{(r)} + {(A^{(r)})}^{†} A^{(r)} \hat{u}

, where

{(A^{(r)})}^{†} A^{(r)} = P_{R (A^{⊤})}

, a projector equal to

I

iff

A^{(r)}

is of full column rank, and

\hat{u} = \arg \min_{u} {∥ u ∥}_{2}^{2}, s . t . A^{(r)} ({\hat{z}}^{(r)} + u) = b

), and Elastic Net (

0 < α < 1 \Leftrightarrow {\hat{z}}^{*} = {\hat{z}}^{(r)} + (1 - α) \hat{u} + α {(A^{(r)})}^{†} A^{(r)} \hat{u}

, where

\hat{u} = \arg \min_{u} \{{(1 - α) ∥ u ∥}_{1} + α {∥ u ∥}_{2}^{2}\}, s . t . A^{(r)} ({\hat{z}}^{(r)} + u) = b

) (pp. 306–310, [2]) (pp. 477–479, [4]) (the combination of Lasso, Ridge, and Elastic Net corrections of

{\hat{z}}^{*}

is given by Equation (6)):

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{(1 - α) {∥z - {\hat{z}}^{(r)}∥}_{1} + α {∥z - {\hat{z}}^{(r)}∥}_{2}^{2}\}, α \in [0, 1], s . t . A^{(r)} z = b

(6)

where the parameter

α

may be selected arbitrarily or determined from the input

b

using an appropriate criterion, such as minimizing prediction error via cross-validation, satisfying model-selection heuristics (e.g., AIC or BIC), or optimizing residual metrics under known structural constraints. Alternatively,

α

may be fixed at benchmark values, e.g.,

α \in {0, \frac{1}{2}, 1}

, or based on an error rule, e.g.,

α = min (\frac{{(\frac{1}{\sqrt{m}} {∥(b - A^{(r)} {\hat{z}}^{*})∥}_{2})}_{α = 0}}{{(\frac{1}{\sqrt{m}} {∥(b - A^{(r)} {\hat{z}}^{*})∥}_{2})}_{α = 0} + {(\frac{1}{\sqrt{m}} {∥(b - A^{(r)} {\hat{z}}^{*})∥}_{2})}_{α = 1} + ε}, 1)

,

ε \to 0^{+}

—in CLSP-based software. In practical implementations,

{\hat{z}}^{*}

, obtained from solving a convex optimization problem, is efficiently computed with the help of numerical solvers.

To sum up, the definition of the process blocks ensures that the CLSP estimator yields a unique final solution

{\hat{z}}^{*}

under regularization weight

α \in (0, 1]

owing to the strict convexity of the objective function; see Theorem 1 (although Theorem 1 is mathematically straightforward, it is included for completeness because its corollaries establish the conditions for uniqueness and convergence that underlie special cases). For

α = 0

, the solution remains convex but may not be unique unless additional structural assumptions are imposed. In the specific case of

α = 1

, the estimator reduces to the Minimum-Norm Best Linear Unbiased Estimator (MNBLUE), or, alternatively, to a block-wise

m_{subset} \times n_{subset}

-MNBLUE across

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉)

reduced problems (for a proof, consult Theorem 2 and its corollaries). It is, however, important to note that—when compared to the classical BLUE—the CLSP, as an MNBLUE estimator, by definition, is not constrained by the same Gauss–Markov conditions and remains well-defined under rank-deficient or underdetermined systems (

m < n

) by introducing the minimum-norm condition as a substitute for the homoscedasticity and full-rank assumptions of the original theorem. Therefore, the MNBLUE represents a generalized or, for ill-posed cases, a problem-class-specific extension of the BLUE, applicable to systems which violate one or more Gauss–Markov requirements while still preserving (a) linearity, (b) unbiasedness, and (c) minimum variance (as demonstrated in Theorem 2, although a formal theoretical treatment of the MNBLUE is beyond the scope of this work).

Theorem 1.

Let

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

be the r-iterated estimate obtained in the first step of the CLSP algorithm, and let

{\hat{z}}^{*}

be the second-step correction obtained through convex programming by solving

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{(1 - α) {∥z - {\hat{z}}^{(r)}∥}_{1} + α {∥z - {\hat{z}}^{(r)}∥}_{2}^{2}\}, α \in (0, 1], s . t . A^{(r)} z = b

(i.e., the regularization parameter excludes a pure Lasso-type correction), then the final solution

{\hat{z}}^{*}

is unique.

Proof.

Let

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

denote the linear estimator obtained via the Bott–Duffin inverse, defined on a subspace determined by a symmetric idempotent matrix

Z

, and producing a conditionally unbiased estimate over that subspace. The Bott–Duffin inverse

{(A_{Z}^{(r)})}^{†}

is given by

{(A_{Z}^{(r)})}^{†} = {(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤}

, and the estimate

{\hat{z}}^{(r)}

is unique if

rank (A^{(r)} Z) = dim (R (Z))

. In this case,

{\hat{z}}^{(r)}

is the unique minimum-norm solution in

R (Z)

. In all other cases, the solution set is affine and given by

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b + [ker (A^{(r)} Z) \cap R (Z)]

, where the null-space component represents degrees of freedom not fixed by the constraint

A^{(r)} Z

, and hence

{\hat{z}}^{(r)}

is not unique. At the same time, the second-step estimate

{\hat{z}}^{*}

is obtained by minimizing the function

(1 - α) {∥z - {\hat{z}}^{(r)}∥}_{1} + α {∥z - {\hat{z}}^{(r)}∥}_{2}^{2}

over the affine (hence convex and closed) constraint set

F^{(r)} = \{z \in R^{n} | A^{(r)} z = b\}

. Under

α \in (0, 1]

, the quadratic term

α {∥z - {\hat{z}}^{(r)}∥}_{2}^{2}

contributes a positive-definite Hessian

2 α I ≻ 0

to the objective function, making it a strictly convex parabola-like curvature, and, given

F^{(r)} \neq \emptyset

, the minimizer

z^{*}

exists and is unique. Therefore, the CLSP estimator with

α \in (0, 1]

yields a unique

{\hat{z}}^{*}

. □

Corollary 1.

Let the final solution be

{\hat{z}}^{*} = {\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

, where

{(A_{Z}^{(r)})}^{†}

denotes the Bott–Duffin inverse on a subspace defined by a symmetric idempotent matrix

Z

. Then (one-step)

{\hat{z}}^{*}

is unique iff

rank (A^{(r)} Z) = dim (R (Z))

. Else, the solution set is affine and given by

{\hat{z}}^{*} \in {(A_{Z}^{(r)})}^{†} b + [ker (A^{(r)} Z) \cap R (Z)]

, which implies that the minimizer

z^{*}

is not unique, i.e.,

# \arg \min (\cdot) > 1

.

Corollary 2.

Let

{\hat{z}}^{*}

be the final solution obtained in the second step of the CLSP estimator by solving

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{{∥z - {\hat{z}}^{(r)}∥}_{1}\}, s . t . A^{(r)} z = b

(i.e., the regularization parameter is set to

α = 0

, corresponding to a Lasso correction). Then the solution

{\hat{z}}^{*}

exists and the problem is convex. The solution is unique iff the following two conditions are simultaneously satisfied: (1) the affine constraint set

F^{(r)} = \{z \in R^{n} | A^{(r)} z = b\}

intersects the subdifferential of

∥ z - {\hat{z}}^{(r)} ∥_{1}

at exactly one point, and (2) the objective function is strictly convex on

F^{(r)}

, which holds when

F^{(r)}

intersects the interior of at most one orthant in

R^{n}

. In all other cases, the minimizer

z^{*}

is not unique, and the set of optimal solutions forms a convex subset of the feasible region

F^{(r)}

; that is, the final solution

{\hat{z}}^{*} \in \arg \min_{z \in F^{(r)}} {∥z - {\hat{z}}^{(r)}∥}_{1}

, where

\arg \min (\cdot)

is convex and

# \arg \min (\cdot) > 1

.

Theorem 2.

Let

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

be the r-iterated estimate obtained in the first step of the CLSP algorithm, and let

{\hat{z}}^{*}

be the second-step correction obtained through convex programming by solving

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{{∥z - {\hat{z}}^{(r)}∥}_{2}^{2}\}, s . t . A^{(r)} z = b

(i.e., the regularization parameter is

α = 1

, corresponding to a Ridge correction), then

{\hat{z}}^{*}

is the Minimum-Norm Best Linear Unbiased Estimator (MNBLUE) of

z

under the linear(ized) constraints set by the canonical form of the CLSP.

Proof.

Let

{\hat{z}}^{(r)} = {(A_{Z}^{(r)})}^{†} b

denote the linear estimator obtained via the Bott–Duffin inverse, defined on a subspace determined by a symmetric idempotent matrix

Z

, and producing a conditionally unbiased estimate over that subspace. The Bott–Duffin inverse is

{(A_{Z}^{(r)})}^{†} = {(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤}

and the estimate

{\hat{z}}^{(r)}

is unique if

rank (A^{(r)} Z) = dim (R (Z))

. Given the linear model derived from the canonical form

b = A^{(r)} z^{(r)} + ε

, where

ε

are residuals with

E [ε] = 0

, it follows that

E [{\hat{z}}^{(r)}] = E [{(A_{Z}^{(r)})}^{†} b] = {(A_{Z}^{(r)})}^{†} E [b] = {(A_{Z}^{(r)})}^{†} A^{(r)} z^{(r)}

. Substituting

{(A_{Z}^{(r)})}^{†}

yields

E [{\hat{z}}^{(r)}] = {(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤} A^{(r)} z^{(r)}

. By the generalized projection identity

{(A_{Z}^{(r)})}^{†} A^{(r)} Z = Z

, one obtains

E [{\hat{z}}^{(r)}] = Z z^{(r)}

, which proves conditional unbiasedness on

R (Z)

(and full unbiasedness if

Z = I

). Subsequently, the second-step estimate

{\hat{z}}^{*}

is obtained by minimizing the squared Euclidean distance between

{\hat{z}}^{(r)}

and

z

, subject to the affine constraint

A^{(r)} z = b

, which corresponds, in geometric and algebraic terms, to an orthogonal projection of

{\hat{z}}^{(r)}

onto the affine (hence convex and closed) subspace

F^{(r)} = \{z \in R^{n} | A^{(r)} z = b\}

. By Theorem 1, the unique minimizer

z^{*}

exists and is given explicitly by

z^{*} = {\hat{z}}^{(r)} + {(A^{(r)})}^{†} (b - A^{(r)} {\hat{z}}^{(r)})

. This expression satisfies the first-order optimality condition of the convex program and ensures that

z^{*}

is both feasible and closest (in the examined

ℓ_{2}

-norm) to

{\hat{z}}^{(r)}

among all solutions in

F^{(r)}

. Therefore,

z^{*}

is (1) linear (being an affine transformation of a linear estimator), (2) unbiased (restricted to the affine feasible space), (3) minimum-norm (by construction), and (4) unique (by strict convexity). By the generalized Gauss–Markov theorem (Chapter 7, Definition 4, p. 139, [17]) (Chapter 8, Section 3.2, Theorem 2, p. 287, [18]), it follows that

{\hat{z}}^{*}

is the Minimum-Norm Best Linear Unbiased Estimator (MNBLUE) under the set of linear(ized) constraints

A^{(r)} z = b

—i.e., the one with the smallest dispersion (in the Löwner sense) among the class of unbiased minimum second-norm least-squares estimators

{\hat{z}}^{*} = \arg \min_{z} {∥b - A z∥}_{2}^{2} = G^{{1, 3, 4}} b + ε

, where

E (ε) = 0

and

Var (ε) = 0

given the Ridge-inspired convex programming correction (residual sterilization),

{\hat{z}}^{*} = \arg \min_{z \in R^{n}} \{{∥z - {\hat{z}}^{(r)}∥}_{2}^{2}\}, s . t . A^{(r)} z = b

, ensuring strict satisfaction of the problem constraints

A^{(r)} z = b

, under the assumption of a feasible Step 2. □

Corollary 3.

Let the canonical system

A^{(r)} z^{(r)} = b

be consistent and of full column rank,

A^{(r)} \in R^{m \times n}

,

m \geq n

, and

rank (A^{(r)}) = n

. Suppose that the CLSP algorithm terminates after Step 1 and that

Z = I_{n}

, such that

{\hat{z}}^{*} = {\hat{z}}^{(r)} = {(A^{(r)})}^{†} b = {({(A^{(r)})}^{⊤} A^{(r)})}^{- 1} {(A^{(r)})}^{⊤} b

. Then, provided the classical linear model

b = A^{(r)} z^{(r)} + ε

, where

E [ε] = 0

and

Cov (ε) = σ^{2} I_{m}

, holds, the CLSP estimator is equivalent to OLS while

{\hat{z}}^{*}

is the Best Linear Unbiased Estimator (BLUE).

Corollary 4.

Let

{\hat{z}}_{reduced}^{(r)} = A_{Z_{reduced}, reduced}^{†} b_{r .}

be the r-iterated estimate obtained via the Bott–Duffin inverse, defined on a subspace determined by a symmetric idempotent matrix

Z_{reduced}

, and producing a conditionally unbiased estimate over that subspace. Subsequently, the estimate

{\hat{z}}_{reduced}^{*}

is obtained by minimizing the squared Euclidean distance between

{\hat{z}}_{reduced}^{(r)}

and

z_{reduced}

, subject to the affine constraint

A_{reduced}^{(r)} z_{reduced} = b_{r .}

, such that

{\hat{z}}_{i_{E}, j_{E}}^{*} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{*, (i_{E}, j_{E})}\}

. Then each

{\hat{z}}_{reduced}^{*}

is the Minimum-Norm Best Linear Unbiased Estimator (MNBLUE) under the linear(ized) constraints defined by the canonical form of each reduced system of the CLSP, and

{\hat{z}}^{*}

is a block-wise

m_{subset} \times n_{subset}

-MNBLUE corresponding to each of the

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉)

reduced problems.

4. Numerical Stability of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$

The numerical stability of solutions, potentially affecting solver convergence in practical implementations, at each step of the CLSP estimator—both the (iterated if

r > 1

) first-step estimate

{\hat{z}}^{(r)}

(alternatively,

{\hat{z}}_{i_{E}, j_{E}}^{(r)} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{(r), (i_{E}, j_{E})}\}

) and the final solution

{\hat{z}}^{*}

(alternatively,

{\hat{z}}_{i_{E}, j_{E}}^{*} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{*, (i_{E}, j_{E})}\}

)—depends on the condition number of the design matrix,

κ (A^{(r)})

, which, given its block-wise formulation in Equation (3), can be analyzed as a function of the “canonical” constraints matrix

C_{canon} = [C S] \in R^{k \times n}

as a fixed, r-invariant, part of

A^{(r)}

. For both full and reduced problems, sensitivity analysis of

κ (A^{(r)})

with respect to constraints (i.e., rows) in

C_{canon}

, especially focused on their reduction (i.e., dropping), is performed based on the decomposition (Equations (7) and (8)):

A^{(r)} = B^{(r)} C_{canon} + R^{(r)} = B^{(r)} [\begin{matrix} C & S \end{matrix}] + R^{(r)}, R^{(r)} = A^{(r)} (I - P_{R ({[C S]}^{⊤})})

(7)

where

B^{(r)} = A^{(r)} C_{canon}^{†} \in R^{m \times k}

denotes the constraint-induced, r-variant, part of

A

, and, when the rows of

A^{(r)}

lie entirely within

R (C_{canon})

, i.e.,

A^{(r)} = A^{(r)} P_{R ({[C S]}^{⊤})} \Leftrightarrow R^{(r)} = 0

,

\begin{matrix} κ (A^{(r)}) & \leq κ (B^{(r)}) \cdot κ (C_{canon}) \\ κ (A^{(r)}) + Δ κ_{A} & \leq (κ (B^{(r)}) + Δ κ_{B}) \cdot (κ (C_{canon}) + Δ κ_{C}) \\ = κ (B^{(r)}) \cdot κ (C_{canon}) + Δ κ_{B} \cdot κ (C_{canon}) + κ (B^{(r)}) \cdot Δ κ_{C} + Δ κ_{B} \cdot Δ κ_{C} \end{matrix}

(8)

i.e., a change in

κ (A^{(r)})

consists of a (lateral) constraint-side effect,

κ (B^{(r)}) \cdot Δ κ_{C}

, induced by structural modifications in

C_{canon}

, an (axial) model-side effect,

Δ κ_{B} \cdot κ (C_{canon})

, reflecting propagation into

B^{(r)}

—where

B^{(r)} = A^{(r)} C_{canon}^{†}

and

A^{(r)} = f (C_{canon}, M_{canon}^{(r)})

,

M_{canon}^{(r)} = g (M, ε^{(r - 1)})

, and

ε^{(r - 1)} = b - A^{(r - 1)} {\hat{z}}^{(r - 1)}

(for a “partitioned” decomposition of

B^{(r)}

given the structure of

C_{canon}^{†}

, see Appendix A.2)—and the (non-linear) term,

Δ κ_{B} \cdot Δ κ_{C}

.

The condition number of

C_{canon} \in R^{k \times n}

,

κ (C_{canon})

, the change of which determines the constraint-side effect,

κ (B^{(r)}) \cdot Δ κ_{C}

, can itself be analyzed with the help of pairwise angular measure between row vectors

c_{i}^{⊤}, c_{j}^{⊤} \in C_{canon}

, where

i, j \in {1, \dots, k}

. For any two

c_{i}^{⊤}, c_{j}^{⊤} \in R^{1 \times n}

(alternatively, in a centered form,

{\tilde{c}}_{i}^{⊤} = c_{i}^{⊤} - {\bar{c}}_{i}, {\tilde{c}}_{j}^{⊤} = c_{j}^{⊤} - {\bar{c}}_{j} \in R^{1 \times n}

, with

{\bar{c}}_{i} = \frac{1}{n} \sum_{k = 1}^{n} c_{i, k}, {\bar{c}}_{j} = \frac{1}{n} \sum_{k = 1}^{n} c_{j, k}

),

i \neq j

, the cosine of the angle between them

cos (θ_{i, j})

(alternatively, the Pearson correlation coefficient

ρ = cos ({\tilde{θ}}_{i, j})

) captures their angular alignment—values close to

\pm 1

indicate near-collinearity and potential ill-conditioning, whereas values near 0 imply near-orthogonality and improved numerical stability (for the pairwise definition and its potential aggregated measures, consult Equations (9) and (10)):

\forall_{c_{i}^{⊤}, c_{j}^{⊤} \in C_{canon}, i \neq j} cos (θ_{i, j}) = \frac{c_{i}^{⊤} \cdot c_{j}}{∥ c_{i}^{⊤} ∥_{2} \cdot {∥ c_{j}^{⊤} ∥}_{2}} \in [- 1, 1]

(9)

The pairwise angular measure can be aggregated across j (i.e., for each row

c_{i}^{⊤}

) and jointly across

i, j

(i.e., for the full set of

(\binom{k}{2}) = \frac{k (k - 1)}{2}

unique row pairs of

C_{canon}

), yielding a scalar summary statistic of angular alignment, e.g., the Root Mean Square Alignment (RMSA):

{RMSA}_{i} = \sqrt{\frac{1}{k - 1} \sum_{\begin{matrix} j = 1 \\ j \neq i \end{matrix}}^{k} {cos}^{2} (θ_{i, j})}, RMSA = \sqrt{\frac{2}{k (k - 1)} \sum_{i = 1}^{k - 1} \sum_{j = i + 1}^{k} {cos}^{2} (θ_{i, j})}

(10)

where

{RMSA}_{i}

captures the average angular alignment of constraint i with the rest, while RMSA evaluates the overall constraint anisotropy of

C_{canon}

. As with individual cosines

cos (θ_{i, j})

(alternatively,

ρ = cos ({\tilde{θ}}_{i, j})

), values near 1 suggest collinearity and potential numerical instability, while values near 0 reflect angular dispersion and reduced condition number

κ (C_{canon})

. The use of squared cosines

{cos}^{2} (θ_{i, j})

(alternatively,

ρ^{2} = {cos}^{2} ({\tilde{θ}}_{i, j})

) in contrast with other metrics (e.g., absolute values) maintains consistency with the Frobenius norms in

κ (C_{canon})

and assigns greater penalization to near-collinear constraints in

C_{canon}

. Furthermore, a combination of

{RMSA}_{i}

,

i \in {1, \dots, k}

, and the above-described marginal effects, obtained by excluding row i from

C_{canon}

—namely,

κ (B^{(r)}) \cdot Δ κ_{C}^{(- i)}

,

Δ κ_{B}^{(- i)} \cdot κ (C_{canon})

, and

Δ κ_{B}^{(- i)} \cdot Δ κ_{C}^{(- i)}

if

R^{(r)} = 0

—together with the corresponding changes in the goodness-of-fit statistics from the solutions

{\hat{z}}^{(r) (- i)}

and

{\hat{z}}^{* (- i)}

(consult Section 5), can be visually presented over either all i, or a filtered subset, such that

{RMSA}_{i} \geq threshold

as depicted in Figure 5.

To sum up, in the CLSP estimator, a feasible solution

{\hat{z}}^{*}

(in limiting cases,

{\hat{z}}^{*} \equiv {\hat{z}}^{(r)}

), by the theoretical definition of the Bott–Duffin (Moore–Penrose) inverse (consult Section 2), always exists for the canonical form

A^{(r)} z^{(r)} = b

(therefore, there is no threshold value

κ (A^{(r)}) > κ {(A^{(r)})}_{threshold}

that would imply the nonexistence of a solution). Still, ill-conditioning of the design matrix

A^{(r)}

might hinder the software-dependent computation of (a) the SVD in Step 1 and (b) the convex optimization problem in Step 2 on an ad hoc basis, thereby manifesting the ill-posedness of the problem. The proposed decomposition and

{RMSA}_{i}

(RMSA) metric, quantifying the anisotropy of

C_{canon}

, provide a means to analyze the origin of instability and optimize the constraint structure of

A^{(r)}

—e.g., by mitigating the (numerical) issues discussed in Whiteside et al. [21] and Blair [22,23]—before re-estimation can be performed.

5. Goodness of Fit of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$

Given the hybrid structure of the CLSP estimator—comprising a least-squares-based (iterated if

r > 1

) initial estimate

{\hat{z}}^{(r)}

(or

{\hat{z}}_{i_{E}, j_{E}}^{(r)} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{(r), (i_{E}, j_{E})}\}

), followed by a regularization-inspired final solution

{\hat{z}}^{*}

(or

{\hat{z}}_{i_{E}, j_{E}}^{*} = \{{\hat{z}}_{reduced, 1 : m_{subset}, 1 : n_{subset}}^{*, (i_{E}, j_{E})}\}

)—and the potentially ill-posed or underdetermined nature of the problems it addresses, standard goodness-of-fit methodology of classical regression—such as explained variance measures (e.g., the coefficient of determination,

R^{2}

), error-based metrics (e.g., Root Mean Square Error, RMSE, for comparability with the above-described RMSA), hypothesis tests (e.g., F-tests in the analysis of variance and t-tests at the coefficient level), or confidence intervals (e.g., based on Student’s t- and the normal distribution)—are not universally applicable (with the exception of RMSE, these measures are valid only under classical Gauss–Markov assumptions, i.e., for overdetermined problems where

{\hat{z}}^{*} \equiv {\hat{z}}^{(r)}

). The same limitation applies to model-selection criteria, such as

R_{adjusted}^{2}

, AIC, and BIC, that presuppose the existence of a maximizable likelihood function and well-defined (i.e., greater than zero) degrees of freedom, not satisfied in the examined class of problems (equally, the formulation of a likelihood function for the two-step CLSP estimator lies beyond the scope of this work). This text, therefore, discusses the applicability of selected alternative statistics, most of which are robust to underdeterminedness: partial

R^{2}

(i.e., the adaptation of

R^{2}

to the CLSP structure), normalized RMSE, t-test for the mean of the NRMSE, and a diagnostic interval based on the condition number of

A^{(r)}

—implemented in CLSP-based software (see Equations (11)–(18)).

For the explained variance measures, the block-wise formulation of the full vector of unknowns

z^{(r)} = [\begin{matrix} x^{(r)} \\ {(y^{(r)})}^{*} \end{matrix}] \in R^{n}

—obtained from

x^{(r)} = Q_{X} [\begin{matrix} x_{M}^{(r)} \\ x_{L}^{(r)} \end{matrix}] \in R^{p}

(where

x_{M}^{(r)} \in R^{p - l}

is the vector of model (target) variables and

x_{L}^{(r)} \in R^{l}

,

l \geq 0

, is a vector of latent variables present only in the constraints) and a vector of slack variables

{(y^{(r)})}^{*} \in R^{n - p}

—the design matrix

A^{(r)} = [\begin{matrix} C_{canon} \\ M_{canon}^{(r)} \end{matrix}] \in R^{m \times n}

, where

M_{canon}^{(r)} = [M {0, r = 1; S_{residual}, r > 1}] \in R^{(m - k) \times n}

is the “canonical” model matrix, and the input

b = [\begin{matrix} b_{C} \\ b_{M} \end{matrix}] \in R^{m}

, all necessitate a “partial”

R^{2}

(hereafter, the term “partial” will be reserved for statistics related to vector

x^{*} \subseteq z

) to isolate the estimated variables

{\hat{x}}^{*}

, where, given the vector of ones

1 = {1, \dots, 1}^{⊤}

,

R_{partial}^{2} = 1 - \frac{{∥b_{M} - M {\hat{x}}^{*}∥}_{2}^{2}}{{∥b_{M} - {\bar{b}}_{M} 1∥}_{2}^{2}}, {\bar{b}}_{M} = \frac{1}{m - k} \sum_{i = 1}^{m - k} b_{M, i}

(11)

provided that

rank (A^{(r)}) = n

,

m \geq n

, i.e., applicable strictly to (over)determined problems. If

C = \emptyset \Leftrightarrow k = 0 \land y = \emptyset \Leftrightarrow n = p

, then

A^{(r)} \equiv M

and

R_{partial}^{2}

reduces to the classical

R^{2}

. Overall,

R_{partial}^{2}

has limited use for the CLSP estimator and is provided for completeness.

Next, for the error-based metrics, independent of

rank (A^{(r)})

and thus more robust across underdetermined cases (in contrast to

R_{partial}^{2}

), a Root Mean Square Error (RMSE) can be defined for both (a) the full solution

{\hat{z}}^{*}

, via

RMSE = \frac{1}{\sqrt{m}} {∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}

, serving as an overall measure of fit, and (b) its variable component

{\hat{x}}^{*}

, via

{RMSE}_{partial}^{(r)} = \frac{1}{\sqrt{m - k}} {∥b_{k + 1 : m} - M {\hat{x}}^{*}∥}_{2}

, quantifying the fit quality with respect to the estimated target variables only (in alignment with

R_{partial}^{2}

). Then, to enable cross-model comparisons and especially its use in hypothesis testing (see below), RMSE must be normalized—typically by the standard deviation of the reference input,

σ_{b}

or

σ_{b_{M}}

—yielding the Normalized Root Mean Square Error (NRMSE) comparable across datasets and/or models (e.g., in the t-test for the mean of the NRMSE):

NRMSE = \frac{\frac{1}{\sqrt{m}} {∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}}{σ_{b}}, {NRMSE}_{partial} = \frac{\frac{1}{\sqrt{m - k}} {∥b_{k + 1 : m} - M {\hat{x}}^{*}∥}_{2}}{σ_{b_{M}}}

(12)

The standard deviation is preferred in statistical methodology over the alternatives (e.g., max-min scaling or range-based metrics) because it expresses residuals in units of their variability, producing a scale-free measure analogous to the standard normal distribution. It is also more common in practical implementations (such as Python’s sklearn or MATLAB).

Also, for the hypothesis tests, classical ANOVA-based F-tests and coefficient-level t-tests—relying on variance decomposition and residual degrees of freedom—are applicable exclusively to the least-squares-based (iterated if

r > 1

) first-step estimate

{\hat{z}}^{(r)}

and to the final solution

{\hat{z}}^{*} = {\hat{z}}^{(r)}

, given a strictly overdetermined

A^{(r)} z^{(r)} = b

, i.e.,

rank (A^{(r)}) = n

,

m > n

(an OLS case), and under the assumption of homoscedastic and normally distributed residuals. Then, in the case of the F-tests, the test statistics follow the distributions (a)

F \sim F (q, m - n)

and (b)

F_{partial} \sim F (q_{partial}, m - n)

, a Wald-type test on a linear restriction—with the degrees of freedom

q = n - 1

and

q_{partial} = p - 1

if

x^{(r)}

is a vector of coefficients from a linear(ized) model with an intercept and

q = n

and

q_{partial} = p

otherwise—yielding (a)

H_{0} : {\hat{z}}^{(r)} = 0

and (b)

H_{0}^{partial} : {\hat{x}}^{(r)} = R {\hat{z}}^{(r)} = 0

, where

R \in R^{p \times n}

is a restriction matrix:

F = \frac{\frac{1}{q} {∥A^{(r)} {\hat{z}}^{(r)}∥}_{2}^{2}}{\frac{1}{m - n} {∥b - A^{(r)} {\hat{z}}^{(r)}∥}_{2}^{2}}, F_{partial} = \frac{\frac{1}{q_{partial}} {(R {\hat{z}}^{(r)})}^{⊤} {[R {({(A^{(r)})}^{⊤} A^{(r)})}^{- 1} R^{⊤}]}^{- 1} (R {\hat{z}}^{(r)})}{\frac{1}{m - n} {∥b - A^{(r)} {\hat{z}}^{(r)}∥}_{2}^{2}}

(13)

In the case of the t-tests, (a)

\forall_{i \in {1, \dots, n}} t_{i} \sim t_{m - n}

(n test statistics) and (b)

\forall_{i \in {1, \dots, p}}

t_{partial, i} \equiv t_{i}

(p test statistics), given that the partial result

{\hat{x}}^{(r)}

or

{\hat{x}}^{*} = {\hat{x}}^{(r)}

is a subvector of length p of

{\hat{z}}^{(r)}

(also

{\hat{z}}^{*} = {\hat{z}}^{(r)}

), yielding (a)

\forall_{i \in {1, \dots, n}} H_{0} : {\hat{z}}_{i}^{(r)} = 0

and (b)

\forall_{i \in {1, \dots, p}} H_{0}^{partial} : {\hat{x}}_{i}^{(r)} = 0

:

\forall_{i \in {1, \dots, n}} t_{i} = \frac{{\hat{z}}_{i}^{(r)}}{(\frac{1}{\sqrt{m - n}} {∥b - A^{(r)} {\hat{z}}^{(r)}∥}_{2}) \cdot \sqrt{{[{({(A^{(r)})}^{⊤} A^{(r)})}^{- 1}]}_{i i}}}, \forall_{i \in {1, \dots, p}} t_{partial, i} \equiv t_{i}

(14)

An alternative hypothesis test—robust to both

rank (A^{(r)})

and the step of the CLSP algorithm, i.e., valid for the final solution

{\hat{z}}^{*}

(in contrast to the classical ANOVA-based F-tests or coefficient-level t-tests)—can be a

t

-test for the mean of the NRMSE, comparing the observed

{NRMSE}_{obs}

(as

μ_{0}

), both full and partial, to the mean of a simulated sample

{NRMSE}_{sim}^{(τ)}

, generated via Monte Carlo simulation, typically from a uniformly (a structureless flat baseline) or normally (a “canonical” choice) distributed random input

b^{(τ)} \sim U (0, I_{m})

or

b^{(τ)} \sim N (0, I_{m})

—both the distributions being standard (the choice of distributions is, therefore, analogous to employing standard weakly or non-informative priors, with the exception of Jeffrey’s prior, in Bayesian inference)—for

τ = 1, \dots, T

, where T is the sample size, yielding

t \sim t_{T - 1}

with

H_{0} : {NRMSE}_{obs} = {\bar{NRMSE}}_{sim}

and

H_{1} : {NRMSE}_{obs} \neq {\bar{NRMSE}}_{sim}

for the two-sided test and

H_{1} : {NRMSE}_{obs} ≷ {\bar{NRMSE}}_{sim}

for the one-sided one:

t = \frac{{\bar{NRMSE}}_{sim} - {NRMSE}_{obs}}{\sqrt{\frac{1}{T (T - 1)} \sum_{τ = 1}^{T} {({NRMSE}_{sim}^{(τ)} - {\bar{NRMSE}}_{sim})}^{2}}}, {\bar{NRMSE}}_{sim} = \frac{1}{T} \sum_{τ = 1}^{T} {NRMSE}_{sim}^{(τ)}

(15)

justified when

b^{(τ)}

is normally distributed or approximately normally distributed based on the Lindeberg–Lévy Central Limit Theorem (CLT), i.e., when

T > 30

(as a rule of thumb) for

b^{(τ)} \sim U (0, I_{m})

. Then,

H_{0}

denotes a good fit for

{\hat{z}}^{*}

in the sense that

{NRMSE}_{obs}

does not significantly deviate from the simulated distribution, i.e.,

H_{0}

should not be rejected for the CLSP model to be considered statistically consistent. In practical implementations (such as Python 3, R 3, Stata 14, SAS 9.4, Julia 1.0, and MATLAB 2015a/Octave 4), T typically ranges from 50 to 50,000.

Finally, for the confidence intervals, classical formulations for (a)

z_{i}^{(r)}

,

i \in {1, \dots, n}

, and (b)

x_{i}^{(r)}

,

i \in {1, \dots, p}

—based on Student’s t- and, asymptotically, the standard normal distribution,

t \sim t_{m - n}

and

z \sim N (0, 1)

—are equivalently exclusively applicable to the least-squares-based (iterated if

r > 1

) first-step estimate

{\hat{z}}^{(r)}

and to the final solution

{\hat{z}}^{*} = {\hat{z}}^{(r)}

under a strictly overdetermined

A^{(r)} z^{(r)} = b

, i.e.,

rank (A^{(r)}) = n

,

m > n

(an OLS case), and assuming homoscedastic and normally distributed residuals. Then, provided

α \in (0, 1)

, such as

α \in {0.01, 0.05, 0.1}

, the confidence intervals for (a)

z_{i}^{(r)}

and (b)

x_{i}^{(r)}

are

\begin{matrix} \forall_{i \in {1, \dots, n}} z_{i}^{(r)} \in {\hat{z}}_{i}^{(r)} \pm t_{m - n, 1 - α / 2} \cdot \frac{1}{\sqrt{m - n}} {∥b - A^{(r)} {\hat{z}}^{(r)}∥}_{2} \cdot \sqrt{{[{({(A^{(r)})}^{⊤} A^{(r)})}^{- 1}]}_{i i}}, \\ \forall_{i \in {1, \dots, p}} x_{i}^{(r)} \equiv z_{i}^{(r)} \end{matrix}

(16)

given that the partial result

{\hat{x}}^{(r)}

or

{\hat{x}}^{*} = {\hat{x}}^{(r)}

is a subvector of length p of

{\hat{z}}^{(r)}

or

{\hat{z}}^{*} = {\hat{z}}^{(r)}

. For

n > 30

, the distribution of the test statistic,

t_{m - n, 1 - α / 2}

, approaches a standard normal one,

Z_{1 - α / 2}

, based on the CLT, yielding an alternative notation for the confidence intervals:

\begin{matrix} \forall_{i \in {1, \dots, n}} z_{i}^{(r)} \in {\hat{z}}_{i}^{(r)} \pm Z_{1 - α / 2} \cdot \frac{1}{\sqrt{m - n}} {∥b - A^{(r)} {\hat{z}}^{(r)}∥}_{2} \cdot \sqrt{{[{({(A^{(r)})}^{⊤} A^{(r)})}^{- 1}]}_{i i}}, \\ \forall_{i \in {1, \dots, p}} x_{i}^{(r)} \equiv z_{i}^{(r)} \end{matrix}

(17)

An alternative “confidence interval”—equivalently robust to both

rank (A^{(r)})

and the step of the CLSP algorithm, i.e., valid for the final solution

{\hat{z}}^{*}

(in contrast to the classical Student’s t- and, asymptotically, the standard normal distribution-based formulations)—can be constructed deterministically via condition-weighted bands, relying on componentwise numerical sensitivity. Let the residual vector

b - A^{(r)} {\hat{z}}^{*}

be a (backward) perturbation in

b

, i.e.,

Δ b = b - A^{(r)} {\hat{z}}^{*}

. Then, squaring both sides of the classical first-order inequality

\frac{∥ Δ {\hat{z}}^{*} ∥_{2}}{∥ {\hat{z}}^{*} ∥_{2}} \leq κ (A^{(r)}) \cdot \frac{{∥ Δ b ∥}_{2}}{{∥ b ∥}_{2}}

yields

\frac{\sum_{i = 1}^{n} {(Δ {\hat{z}}_{i}^{*})}^{2}}{\sum_{i = 1}^{n} {({\hat{z}}_{i}^{*})}^{2}} \leq κ {(A^{(r)})}^{2} \cdot \frac{{∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}^{2}}{{∥ b ∥}_{2}^{2}}

, where

κ (A^{(r)}) = ∥ A^{(r)} ∥_{2} {∥ {(A^{(r)})}^{†} ∥}_{2} = \frac{σ_{\max} (A^{(r)})}{σ_{rank (A^{(r)})} (A^{(r)})}

—

σ_{\max}

and

σ_{rank (A^{(r)})}

being the biggest and smallest singular values of

A^{(r)}

—iff

σ_{rank (A^{(r)})} > 0 \land {∥ b ∥}_{2} > 0

. Under a uniform relative squared perturbation (a heuristic allocation of error across components as a simplification, not an assumption) of

z^{*}

, rearranging terms and taking square roots of both sides gives

\forall_{i \in {1, \dots, n}} | Δ {\hat{z}}_{i}^{*} | \leq | {\hat{z}}_{i}^{*} | \cdot κ (A^{(r)}) \cdot \frac{{∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}}{{∥ b ∥}_{2}}

and a condition-weighted “confidence” band:

\forall_{i \in {1, \dots, n}} z_{i}^{*} \in {\hat{z}}_{i}^{*} \cdot (1 \pm κ (A^{(r)}) \cdot \frac{{∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}}{{∥ b ∥}_{2}}), \forall_{i \in {1, \dots, p}} x_{i}^{*} \equiv z_{i}^{*}

(18)

which is a diagnostic interval based on the condition number for

z^{*}

, consisting of (1) a canonical-form system conditioning,

κ (A^{(r)})

, (2) a normalized model misfit,

\frac{{∥b - A^{(r)} {\hat{z}}^{*}∥}_{2}}{{∥ b ∥}_{2}}

, and (3) the “scale” of the final solution,

{\hat{z}}^{*}

, without probabilistic assumptions. A perturbation in one or more of these three components, e.g., caused by a change in RMSA resulting from dropping selected constraints (consult Section 4), will affect the limits of

z^{*}

. Under a perfect fit, the interval collapses to

z_{i}^{*} \in {\hat{z}}_{i}^{*} \pm 0

and, for

κ (A^{(r)}) ≫ 1

, tends to be very wide. Overall, in practical implementations, where the squared perturbations may violate the uniformity simplification, an aggregated (e.g., average) width of the diagnostic interval for vectors

z^{*}

and

x^{*}

becomes more informative, as it represents an adjusted goodness-of-fit statistic—normalized error weighted by the condition number of the design matrix

A^{(r)}

.

To sum up, the CLSP estimator requires a goodness-of-fit framework that reflects both its algorithmic structure and numerical stability. Classical methods remain informative but are valid only under strict overdetermination (

m > n

), full-rank design matrix

A^{(r)}

, and distributional assumptions (primarily, homoscedasticity and normality of residuals). In contrast, the proposed alternatives—(a) NRMSE and

{NRMSE}_{partial}

, (b) t-tests for the mean of NRMSE with the help of Monte Carlo sampling, and (c) condition-weighted confidence bands—are robust to underdeterminedness and ill-posedness, making them preferable in practical implementations (e.g., in existing CLSP-based software for Python, R, and Stata).

6. Special Cases of CLSP Problems: APs, CMLS, and LPRLS/QPRLS

The structure of the design matrix

A^{(r)}

in the CLSP estimator, as defined in Equation (3) (consult Section 3), allows for accommodating a selection of special-case problems, out of which three cases are covered (each case will use modified notation for i, j, m, and p), but the list may not be exhaustive. Among others, the CLSP estimator is, given its ability to address ill-posed or underdetermined problems under linear(ized) constraints, efficient in addressing what can be referred to as allocation problems (APs) (or, for flow variables, tabular matrix problems, TMs)—i.e., in most cases, underdetermined problems involving matrices of dimensions

m \times p

to be estimated, subject to known row and column sums, with the degrees of freedom (i.e., nullity) equal to

n - rank (A^{(r)}) \geq m p + s^{*} - (\frac{m}{i} + \frac{p}{j}) - q

, where

0 \leq s^{*} \leq (\frac{m}{i} + \frac{p}{j}) + q

is the number of active (non-zero) slack variables and

0 \leq q \leq rank (M_{partial}) \leq min (k, m p)

is the number of known model (target) variables (e.g., a zero diagonal)—whose design matrix,

A^{(r)} \in R^{(\frac{m}{i} + \frac{p}{j} + k) \times n}

, comprises a (by convention) row-sum-column-sum constraints matrix

C = {[{((I_{\frac{m}{i}} \otimes 1_{i}) \otimes 1_{p})}^{⊤} | {(1_{m} \otimes (I_{\frac{p}{j}} \otimes 1_{j}))}^{⊤}]}^{⊤} \in R^{(\frac{m}{i} + \frac{p}{j}) \times m p}

(where

1 = {1, \dots, 1}

), a model matrix

M = Q_{M}^{L} [\begin{matrix} M_{partial} \\ 0 \end{matrix}] Q_{M}^{R} \in R^{k \times m p}

—in a trivial case,

M_{partial} \subseteq I_{m p}

—a sign slack matrix

S = Q_{S}^{L} [\begin{matrix} S_{s} \subseteq \forall_{σ_{1}, \dots, σ_{s} \in {- 1, + 1}} diag (σ_{1}, \dots, σ_{s}) \\ 0 \end{matrix}] Q_{S}^{R} \in R^{(\frac{m}{i} + \frac{p}{j}) \times (n - m p)}

, and either a zero matrix

0

(if

r = 1

) or a (standard) reverse-sign slack matrix

S_{residual} \in R^{k \times (n - m p)}

(provided

r > 1

) (as given by Equation (19) extending Equation (3)):

A^{(r)} = [\begin{matrix} [\begin{matrix} (I_{\frac{m}{i}} \otimes 1_{i}) \otimes 1_{p} \\ 1_{m} \otimes (I_{\frac{p}{j}} \otimes 1_{j}) \end{matrix}] & S \\ M & \{\begin{matrix} 0, & initial iteration, r = 1 \\ S_{residual}, & subsequent iterations, r > 1 \end{matrix} \end{matrix}]

(19)

where

\frac{m}{i}

and

\frac{p}{j}

denote groupings of the m rows and p columns, respectively, into i and j homogeneous blocks (when no grouped row or column sums are available,

i = j = 1

); with real-world examples including: (a) input–output tables (national, inter-country, or satellite); (b) structural matrices (e.g., trade, country-product, investment, or migration); (c) financial clearing and budget-balancing; and (d) data interpolations (e.g., quarterly data from annual totals). Given the available literature in 2025, the first pseudoinverse-based method of estimating input–output tables was proposed in Pereira-López et al. [81] and the first APs (TMs)-based study was conducted in Bolotov [16], attempting to interpolate a world-level (in total, 232 countries and dependent territories) “bilateral flow” matrix of foreign direct investment (FDI) for the year 2013, based on known row and column totals from UNCTAD data (aggregate outward and inward FDI). The estimation employed a minimum-norm least squares solution under APs (TMs)-style equality constraints, based on the Moore–Penrose pseudoinverse of a “generalized Leontief structural matrix”, rendering it equivalent to the initial (non-corrected) solution in the CLSP framework,

{\hat{z}}^{(1)} = A_{Z}^{(1) †} b

. As a rule of thumb, AP (TM)-type problems are rank deficient for

m > 2 \land p > 2

with the CLSP being a unique (if

α \in (0, 1]

) and an MNBLUE (if

α = 1

) estimator (consult Section 3).

In addition to the APs (TMs), the CLSP estimator is also efficient in addressing what can be referred to as constrained-model least squares (CMLS) (or, more generally, regression problems, RPs)—i.e., (over)determined problems involving vectors of dimension p to be estimated with (standard) OLS degrees of freedom, subject to, among others, linear(ized) data matrix

D \in R^{k \times p}

-related (in)equality constraints,

\forall_{i \in {1, \dots, t}} C_{i} \equiv T_{i} D \in R^{m_{i} \times p}

, where

T_{i} \in R^{m_{i} \times k}

is a transformation matrix of

D

, such as the i-th difference or shift (lag/lead), and

C_{i} x ⋛ γ_{i}

—whose design matrix,

A^{(r)} \in R^{(\sum_{i = 1}^{u} m_{i} + k) \times n}

, consists of a u-block constraints matrix

C = [\begin{matrix} C_{1} \\ \dots \\ C_{u} \end{matrix}] \in R^{(\sum_{i = 1}^{u} m_{i}) \times p}

(

\forall_{i \in {1, \dots, u}} C_{i} \in R^{m_{i} \times p}

),

u \geq t

, a data matrix as the model matrix

M \equiv D \in R^{k \times p}

, a (standard) sign slack matrix

S \in R^{(\sum_{i = 1}^{u} m_{i}) \times (n - p)}

, and either a zero matrix

0

(if

r = 1

) or a (standard) reverse-sign slack matrix

S_{residual} \in R^{k \times (n - p)}

(when

r > 1

) (as given by Equation (20) extending Equation (3), with

M

substituted by data matrix

D

):

A^{(r)} = [\begin{matrix} [\begin{matrix} C_{1} \\ \dots \\ C_{u} \end{matrix}] & S \\ D & \{\begin{matrix} 0, & initial iteration, r = 1 \\ S_{residual}, & subsequent iterations, r > 1 \end{matrix} \end{matrix}]

(20)

with real-world examples including (a) previously infeasible textbook-definition econometric models of economic (both micro and macro) variables (e.g., business-cycle models), and (b) additional constraints applied to classical econometric models (e.g., demand analysis). Given the available literature in 2025, the first studies structurally resembling CMLS (RPs) were conducted in Bolotov [15,82], focusing on the decomposition of the long U.S. GDP time series into trend (

y_{t}

) and cyclical (

c_{t}

) components—using exponential trend, moving average, Hodrick-Prescott filter, and Baxter-King filter—under constraints on its first difference, based on the U.S. National Bureau of Economic Research (NBER) delimitation of the U.S. business cycle.

c_{t}

was then modeled with the help of linear regression (OLS), based on an n-th order polynomial with extreme values smoothed via a factor of

\frac{1}{10^{n - i}}

and simultaneously penalized via an n-th or (

n + 1

)-th root,

c_{t} = β_{0} + β_{1} \int (\sqrt[{n, n + 1}]{\prod_{i = 1}^{i = n} (t - t_{i}) \cdot \frac{1}{10^{n - i}}}) d t + ε_{t}

, where

β_{0}, β_{1}

are the model parameters,

\forall_{1 \leq i \leq n} t_{i}

are the values of the (externally given) stationary points, and

ε_{t}

is the error. The order of the polynomial and the ad hoc selection of smoothing and penalizing factors, however, render such a method inferior to the CLSP. Namely, the unique (if

α \in (0, 1]

) and an MNBLUE (if

α = 1

) CLSP estimator (consult Section 3) allows (a) the presence of inequality constraints and (b) their ill-posed formulations.

Finally, the CLSP estimator can be used to address (often unsolvable using classical solvers) underdetermined and/or ill-posed—caused by too few and/or mutually inconsistent or infeasible constraints as in the sense of Whiteside et al. [21] and Blair [22,23]—LP and QP problems, an approach hereafter referred to as linear/quadratic programming via regularized least squares (LPRLS/QPRLS): CLSP substitutes the original objective function of the LP/QP problem with the canonical form

A^{(r)} z^{(r)} = b

(i.e., focusing solely on the problem’s constraints, without distinguishing between the LP and QP cases) with

{\hat{x}}_{M}^{*} = {Proj}_{x_{M}^{(r)}} \{Q_{X} [\begin{matrix} x_{M}^{(r)} \\ x_{L}^{(r)} \end{matrix}] \in R^{p} : \forall_{1 \leq i \leq s} g_{i} (x_{M}^{(r)}, x_{L}^{(r)}) ⋛ γ_{i}, \forall_{j} h_{j} (x_{M}^{(r)}, x_{L}^{(r)}) = η_{j}\} \subseteq z^{*}

being the solution of the original problem, where

Q_{X}

is a permutation matrix and

g_{i} (\cdot) ⋛ γ_{i}

and

h_{j} (\cdot) = η_{j}

represent the linear(ized) inequality and equality constraints,

\forall_{i, j} γ_{i}, η_{j} \in b

, and the degrees of freedom are equal, under the classical formalization of a primal LP/QP problem (consult Section 2), to

n - rank (A^{(r)}) \geq (p + s) - (m_{u b} + m_{e q} + m_{n n}) = p - m_{e q}

, where p is the length of

x_{M}^{(r)}

,

s = m_{u b} + m_{n n}

is the number of introduced slack variables, while

m_{u b} \geq 0

,

m_{e q} \geq 0

, and

m_{n n} \equiv p

denote the numbers of upper-bound, equality, and non-negativity constraints, respectively (under the standard assumption that all model variables are constrained to be non-negative). In the LPRLS/QPRLS, the design matrix,

A^{(r)} \in R^{(m_{u b} + m_{e q} + m_{n n}) \times n}

, by the definition of LP/QP, is r-invariant and consists of a block constraints matrix

C = [\begin{matrix} C_{u b} \\ C_{e q} \\ C_{n n} \end{matrix}] \in R^{(m_{u b} + m_{e q} + m_{n n}) \times p}

, where

C_{u b} \in R^{m_{u b} \times p}

,

C_{e q} \in R^{m_{e q} \times p}

, and

C_{n n} \equiv I_{p} \in R^{p \times p}

, a (standard) sign slack matrix

S \in R^{(m_{u b} + m_{e q} + m_{n n}) \times (n - p)}

, and

M_{canon} = \emptyset

(as given by Equation (21) extending Equation (3), under the condition

M = S_{residual} = \emptyset

):

A^{(r)} = A^{(1)} = [\begin{matrix} [\begin{matrix} C_{u b} \\ C_{e q} \\ C_{n n} \end{matrix}] & S \end{matrix}]

(21)

Given the available literature in 2025, the first documented attempts to incorporate LS into LP and QP included, among others, Dax’s LS-based steepest-descent algorithm [26] and the primal-dual NNLS-based algorithm (LSPD) [79], whose structural restrictions—in terms of well-posedness and admissible constraints—can be relaxed through LPRLS/QPRLS, still guaranteeing a unique (if

α \in (0, 1]

) and an MNBLUE (if

α = 1

) solution (consult Section 3).

To sum up, the CLSP framework can be applied to three special classes of problems—allocation (APs (TMs)), constrained-model least squares (CMLS (RPs)), and regularized linear or quadratic programming (LPRLS/QPRLS)—differing only in the nature of their constraint blocks, the treatment of slack or residual components, and rank properties of their design matrix

A^{(r)}

, while the estimation method and goodness-of-fit statistics remain identical. The three examined cases correspond to the overwhelming majority of potential real-world uses of the estimator, whereas custom (non-described) problem types are relatively scarce. The AP (TM) case is of particular methodological importance as a means of estimating, among others, input–output tables (without reliance on historical observations).

7. Monte Carlo Experiment and Numerical Examples

This work will finally demonstrate the capability of the CLSP estimator to undergo large-scale Monte Carlo experiments across its special cases—namely, the AP (TM), CMLS (RP), and LPRLS/QPRLS problems (illustrated below on the example of the APs (TMs))—which can be replicated to assess its explanatory power and to calibrate key parameters (r and

α

) for real-world applications in future research. However, the construction of a complete econometric framework—requiring not only the results of such calibration but also the formal derivation of its maximum likelihood function (for model selection) and an extension of the Gauss–Markov theorem (to explicitly incorporate MNBLUE cases)—lies beyond the scope of this theoretical text, whose aim is to establish the linear-algebraic foundations of the estimator and to discuss its numerical stability and potential measures of its goodness of fit. Therefore, the Monte Carlo experiment is complemented by simulated numerical examples serving as a proof of the estimator’s practical implementability and the estimability of its diagnostic metrics (i.e., RMSA, NRMSE, t-tests, and diagnostic intervals).

Proceeding to the Monte Carlo experiment, the standardized form of

C

in the APs (TMs), i.e.,

C = f (m, i, p, j)

, with given

S

and r, enables the large-scale simulation of the distribution of NRMSE—a robust goodness-of-fit metric for the CLSP—through repeated random trials under varying matrix dimensions

m \times p

, row and column group sizes i and j, and composition of

M

, inferring asymptotic percentile means and standard deviations for practical implementations (e.g., the t-test for the mean). This text presents the results of a simulation study on the distribution of the NRMSE from the first-step estimate

{\hat{z}}^{(1)}

(implementable in standard econometric software without CLSP modules) for

i = 1

,

j = 1

,

S = \emptyset

,

Z = I_{m + p + k} \Leftrightarrow {(A_{Z}^{(1)})}^{†} = {(A^{(1)})}^{†}

, and

r = 1

, conducted in Stata/MP (set to version 14.0) with the help of Mata’s 128-bit floating-point cross-product-based quadcross() for greater precision and SVD-based svsolve() with a strict tolerance of c("epsdouble") for increased numerical stability, and a random-variate seed 123456789 (see Listing A1 containing a list of dependencies [83] and the Stata .do-file). In this 50,000-iterations Monte Carlo simulation study, spanning matrix dimensions from 1 × 2 and 2 × 1 up to 50 × 50, random normal input vectors

b \sim N (0, I_{m + p + k})

were generated for each run, applied with and without zero diagonals (i.e., if, under

l = 0 \Leftrightarrow x_{L} = \emptyset

,

x_{M} = x = z_{1 : m p} \in R^{m p}

is reshaped into a (

m \times p

)-matrix

X_{M} = X \in R^{m \times p}

, then

\forall_{i \in {1, \dots, min (m, p)}} X_{i i} = 0

). Thus, a total of 249.9 million observations (

= (50 \cdot 50 - 1) \cdot 2 \cdot 50, 000

) was obtained via the formula

NRMSE = \frac{1}{\sqrt{m}} {∥ b - A^{(1)} ({(A^{(1)})}^{†} b) ∥}_{2} / σ_{b}

, resulting in 4998 aggregated statistics of the asymptotic distribution, assuming convergence: mean, sd, skewness, kurtosis, min, max, as well as p1–p99 as presented in Figure 6 (depicting the intensity of its first two moments, mean and sd) and Table 2 (reporting a subset of all thirteen statistics for

m, p mod 10 = 0

).

From the results of the Monte Carlo experiment, it is observable that (1) the NRMSE from the first-step estimate

{\hat{z}}^{(1)}

and its distributional statistics exhibit increasing stability and boundedness as matrix dimensions

m \times p

increase—specifically, for

m, p ≫ 1

, both the mean and sd of NRMSE tend to converge, indicating improved estimator performance (e.g., under no diagonal restrictions, at

m = p = 5

,

mean = 1.84

and

sd = 1.29

, while at

m = p = 45

,

mean = 0.21

and

sd = 0.16

); (2) the zero-diagonal constraints (

\forall_{i \in {1, \dots, min (m, p)}} X_{i i} = 0

) reduce the degrees of freedom and lead to a much more uniform and dimensionally stable distribution of NRMSE across different matrix sizes (e.g., at

m = p = 5

, mean drops to

1.17

and sd to

0.85

, while at

m = p = 45

, mean rises slightly to

0.26

and sd to

0.19

); and (3) the distribution of NRMSE exhibits mild right skew and leptokurtosis—less so for the zero-diagonal case: under no diagonal restriction, the average skewness is

0.97

with a coefficient of variation of

38.16 %

(i.e., >0.00), and the average kurtosis is

4.15

with a variation of

304.61 %

, whereas under the zero-diagonal constraint, skewness averages

0.95

with only

8.57 %

variation, and kurtosis averages

3.76

with

25.94 %

variation (i.e., >3.00). To sum up, the larger and less underdetermined (i.e., in an AP (TM) problem, the larger the

M^{(1)}

block of

A^{(1)}

) the canonical system

A^{(1)} z^{(1)} = b

, the lower the estimation error (i.e., the mean NRMSE) and its variance, and, provided a stable limiting law for NRMSE exists, the skewness and kurtosis from the simulations drift toward 0 and 3, respectively, consistent with—though not proving—a convergence toward a standard-normal distribution. This leads to a straightforward conclusion that the CLSP estimator’s Step 1 performance improves monotonically with increasing problem size and rank, indicating that larger and better-conditioned systems yield more stable normalized errors, thereby allowing future research to concentrate on the calibration of the parameters r and, on the inclusion of Step 2,

α

.

For an illustration of the implementability of the estimator for the APs (TMs), CMLS (RPs), and LPRLS/QPRLS, the author’s cross-platform Python 3 module, pyclsp (version ≥ 1.6.0, available on PyPI) [84]—together with its interface co-modules for the APs (TMs), pytmpinv (version ≥ 1.2.0, available on PyPI) [85], and the LPRLS/QPRLS, pylppinv (version ≥ 1.3.0, available on PyPI) [86] (a co-module for the CMLS (RPs) is not provided, as the variability of

\forall_{i \in {1, \dots, t}} C_{i} \equiv T_{i} D

within the

C = [\begin{matrix} C_{1} \\ \dots \\ C_{u} \end{matrix}]

,

u \geq t

, block of

A^{(r)}

prevents efficient generalization)—is employed for a sample (dataset) of i random elements

X_{i}

(scalars, vectors, or matrices depending on context) and their transformations, drawn independently from the standard normal distribution, i.e.,

{X_{1}, \dots, X_{i}} \overset{iid}{\sim} N (0, 1)

, under the mentioned random-variate seed 123456789 (configured for consistency with the above-described Monte Carlo experiment). Thus, using the cases of (a) a (non-negative symmetric) input–output table and (b) a (non-negative) zero-diagonal trade matrix as two AP (TM) numerical examples, this text simulates problems similar to the ones addressed in Pereira-López et al. [81] and Bolotov [16]: an underdetermined

X \in R^{m \times p}

is estimated—from row sums, column sums, and k known values of two randomly generated matrices (a)

X_{true} = 0.5 \cdot (| X_{1} | + | X_{1} |^{⊤})

and (b)

X_{true, i, j} = (| X_{1}|)_{i, j} (1 - 1_{{i = j \leq min (m, p)}})

,

{(X_{1})}_{i, j} \overset{iid}{\sim} N (0, 1)

, subject to (a)

X_{i, j} \geq 0

and (b)

X_{i, j} \geq 0 \land \forall_{i \in {1, \dots, min (m, p)}} X_{i i} = 0

—via CLSP assuming

x = vec (X) \subseteq z

,

M \equiv {(I_{m p})}_{I}

,

b = [\begin{matrix} b_{row sums} \\ b_{column sums} \\ b_{known values} \end{matrix}] = [\begin{matrix} X_{true} 1 \\ {(1^{⊤} X_{true})}^{⊤} \\ {(vec (X_{true}))}_{I} \end{matrix}]

, and

I \sim U \{1, \dots, m p\} \land | I | = k

. The Python 3 code, based on pytmpinv and two other modules, installable by executing pip install matplotlib numpy pytmpinv==1.2.0, for (a)

m = p = 20

and

k = 0.1 \cdot m p = 40

and (b)

m = p = 40

—estimated with the help of

m_{subset} \times n_{subset}

-MNBLUE across

(⌈\frac{m}{m_{subset}}⌉ \cdot ⌈\frac{n}{n_{subset}}⌉) = (⌈\frac{40}{20 - 1}⌉ \cdot ⌈\frac{40}{20 - 1}⌉) = 9

reduced models

m_{reduced} = p_{reduced} \leq 20

assuming an estimability constraint of

m p \leq 400

—and

k = 0.2 \cdot m p = 320

, with a non-iterated (

r = 1

), unique (

α > 0

), and MNBLUE (

α = 1

) (two-step) CLSP solution, is implemented in (a) Listing 1 and (b) Listing 2 (e.g., to be executed in a Jupyter Notebook 6.5 or later).

Listing 1. Simulated numerical example for a symmetric input-output-table-based AP (TM) problem.

In case (a) (see Listing 1 for code), the number of model (target) variables

# {\hat{x}}_{M}^{*}

is

m p = 400

with a nullity of

dim (ker (A^{(1)})) = n - rank (A^{(1)}) = 321

, corresponding to 80.25% of the total unknowns. Given the simulation of

X_{true} \in R^{20 \times 20}

, a matrix unknown in real-world applications—i.e., CLSP is used to estimate the elements of an existing matrix from its row sums,

X_{true} 1 \in R^{20}

, column sums,

{(1^{⊤} X_{true})}^{⊤} \in R^{20}

, and a randomly selected 10% of its entries,

{(vec (X_{true}))}_{I} \in R^{40}

—the model’s goodness of fit can be measured by a user-defined

R^{2} \approx max (0, 1 - \frac{{∥vec (X_{true}) - {\hat{x}}_{M}^{*}∥}_{2}^{2}}{{∥vec (X_{true}) - 1 \frac{1}{400} \sum_{i = 1}^{400} (vec {(X_{true})}_{i})∥}_{2}^{2}}) \approx 0.278256

(i.e., CLSP achieves an improvement of

Δ R^{2} = 0.178256

over a hypothetical naïve predictor reproducing the known 10% entries of

X_{true, I}

and yielding a

R^{2} \approx 0.1

but still a modest value of

R^{2}

, i.e., a relatively large error,

NRMSE = 12.649111

) with

{\hat{x}}_{M}^{*}

lying within wide condition-weighted diagnostic intervals

{\hat{x}}_{M}^{*} \in [min ({\hat{x}}_{M, lower}^{*}), max ({\hat{x}}_{M, upper}^{*})] = [- 8.0847 \times 10^{15}, 8.0847 \times 10^{15}]

reflecting the ill-conditioning of the strongly rectangular

A^{(1)} \in R^{80 \times 400}

,

κ (A^{(1)}) = 4.9968 \times 10^{14}

, and only the left-sided Monte Carlo-based t-test for the mean of the NRMSE (on a sample of 30 NRMSE values obtained by substituting

b

with

{(b^{(τ)})}_{i} \overset{iid}{\sim} N (0, 1)

) suggesting consistency with expectations (i.e., with the

p - value \approx 1

). In terms of numerical stability,

κ (C_{canon}) = 4.8023 \times 10^{14} \approx κ (A^{(1)})

with a low

{RMSA}_{i \in {1, \dots, 40}} = RMSA = 0.0400

, as confirmed by the correlogram produced in matplotlib, indicating a well-chosen constraints matrix, even given the underdeterminedness of the model (which also prevents a better fit).

Listing 2. Simulated numerical example for a (non-negative) trade-matrix-based AP (TM) problem.

In case (b) (see Listing 2 for code), the corresponding number of model (target) variables in each of the

(⌈\frac{40}{19}⌉ \cdot ⌈\frac{40}{19}⌉) = 9

reduced design submatrices

A_{subset}^{(1)} \subset A_{reduced}^{(1)}

,

# {\hat{x}}_{M, reduced}^{*}

, is

m_{subset} p_{subset} \leq 361

—where

m_{subset} \leq 20 - 1 = 19

and

p_{subset} \leq 20 - 1 = 19

, one row and one column being reserved for

\sum_{j \notin J} a_{i, j}^{(1)}

and

\sum_{i \notin I} a_{i, j}^{(1)}

, which enter

A_{reduced}^{(1)}

as

S = [\begin{matrix} I_{m - m_{subset}} 0 \\ 0 I_{p - p_{subset}} \end{matrix}]

and the vector of slack variables

{({\hat{y}}_{reduced}^{*})}^{*} = [\begin{matrix} {\hat{y}}_{reduced}^{*} \\ {\hat{y}}_{residual, reduced}^{*} \end{matrix}] \subset {\hat{z}}_{reduced}^{*}

, so that

C_{reduced} x_{reduced} + S_{reduced} y_{reduced}^{*} = b_{r ., 1 : k}

(i.e., the slack matrix compensates for the unaccounted row and column sums in the reduced models as opposed to the full one, such as case (a))—with a nullity

dim (ker (A_{reduced}^{(1)})) = n_{reduced} - rank (A_{reduced}^{(1)}) \in [2, 290]

(per reduced model), corresponding to 25.00–72.68% of the total unknowns (per reduced model)—to compare, a full model, under the same inputs and no computational constraints (i.e.,

m p ≰ m_{E} p_{E}

), would have a nullity

dim (ker (A^{(1)})) = n - rank (A^{(1)}) = 1168

, corresponding to 73.00% of the total unknowns. In the examined example—based on

X_{true} 1 \in R^{40}

,

{(1^{⊤} X_{true})}^{⊤} \in R^{40}

, and a randomly selected 20% of entries of the true matrix

X_{true} \in R^{40 \times 40}

—the reduced-model block solution’s goodness of fit could not be efficiently measured by a user-defined

R^{2} \approx max (0, 1 - \frac{{∥vec (X_{true}) - vec (\{{\hat{x}}_{M, reduced, 1 : m_{subset}, 1 : n_{subset}}^{*, (i_{E}, j_{E})}\})∥}_{2}^{2}}{{∥vec (X_{true}) - 1 \frac{1}{1600} \sum_{i = 1}^{1600} (vec {(X_{true})}_{i})∥}_{2}^{2}}) \approx 0

(i.e., the block matrix constructed from

(⌈\frac{40}{19}⌉ \cdot ⌈\frac{40}{19}⌉) = 9

reduced-model estimates led to

R^{2} \approx max (0, - 5.435377)

but to an error proportionate to the one in case (a),

{NRMSE}_{reduced} \in [3.464102, 15.748016]

(per reduced model))—in contrast, a full model would achieve

R^{2} \approx max (0, 1 - \frac{{∥vec (X_{true}) - {\hat{x}}_{M}^{*}∥}_{2}^{2}}{{∥vec (X_{true}) - 1 \frac{1}{1600} \sum_{i = 1}^{1600} (vec {(X_{true})}_{i})∥}_{2}^{2}}) \approx 0.278427

but at a cost of a greater error,

NRMSE = 29.427878

—with

{\hat{x}}_{M, reduced}^{*}

lying within strongly varying condition-weighted diagnostic intervals

{\hat{x}}_{M, reduced}^{*} \in [min ({\hat{x}}_{M, lower, reduced}^{*}), max ({\hat{x}}_{M, upper, reduced}^{*})]

, where

min ({\hat{x}}_{M, lower, reduced}^{*}) \in [- 330.638753, - 2.5009 \times 10^{- 19}]

(per reduced model) and

max ({\hat{x}}_{M, upper, reduced}^{*}) \in [- 2.5456 \times 10^{- 19}, 336.547225]

(per reduced model), and varying results of Monte Carlo-based t-tests for the mean of the

{NRMSE}_{reduced}

(on a sample of 30

{NRMSE}_{reduced}

values obtained by substituting

b_{r .}

with

{(b_{r .}^{(τ)})}_{i} \overset{iid}{\sim} N (0, 1)

), where the p-values range is

4.8022 \times 10^{- 09}

–1.000000 (left-sided), 0.000000–1.000000 (two-sided), and

3.7021 \times 10^{- 20}

–1.000000 (right-sided) (per reduced model)—alternatively, a full model would lead to wider condition-weighted diagnostic intervals

{\hat{x}}_{M}^{*} \in [min ({\hat{x}}_{M, lower}^{*}), max ({\hat{x}}_{M, upper}^{*})] = [- 1.5357 \times 10^{16}, 1.5357 \times 10^{16}]

(i.e., reflecting the ill-conditioning of the strongly rectangular

A^{(1)} \in R^{400 \times 1600}

,

κ (A^{(1)}) = 2.2196 \times 10^{14}

) and only the left-sided Monte Carlo-based t-test for the mean of the NRMSE (on a sample of 30 NRMSE values obtained by substituting

b

with

{(b^{(τ)})}_{i} \overset{iid}{\sim} N (0, 1)

) suggesting consistency with expectations (i.e., with the

p - value \approx 1

). In terms of numerical stability,

κ (C_{canon, reduced}) \in [2.236068, 6.244998] < κ (A_{reduced}^{(1)}) \in [4.509742, 9.325735]

(per reduced model), which indicates well-conditioning of all the reduced models—conversely, in a full model,

κ (C_{canon}) = 1.0907 \times 10^{14} < κ (A^{(1)})

(therefore, a full model ensures an overall better fit but a lower fit quality, i.e., a trade-off).

As the CMLS (RP) numerical example, this text addresses a problem similar to the one solved in Bolotov [15,82]: a coefficient vector

β

from a (time-series) linear regression model in its classical (statistical) notation

y_{t} = x_{t} β + ϵ_{t}, s . t . \sum_{j = 1}^{i + 1} β_{j} = c \land \forall_{t \in P} Δ y_{t} = 0 \land \forall_{t \in P} Δ^{2} y_{t} ⪌ 0

with

Δ^{n} y_{t}

denoting n-th order differences (i.e., the discrete analogue of

\frac{d^{n} y_{t}}{d t^{n}}

)—where

y_{t}

is the dependent variable,

x_{t} = [1, x_{t, 1}, \dots, x_{t, i}]

is the vector of regressors with a constant,

ϵ_{t}

is the model’s error (residual), c is a constant, and

P ⫋ {t_{j} : j = 1, \dots, k}

is the set of stationary points—is estimated on a simulated sample (dataset) with the help of CLSP assuming

x_{M} \equiv β \in R^{p}

and

D \equiv X = [1, x_{1}, \dots, x_{i}] \in R^{k \times p}

,

p = i + 1

. The Python 3 code, based on numpy and pyclsp modules, installable by executing pip install numpy pyclsp==1.6.0, for

k = 500

,

p = 6

,

c = 1

,

y = D β_{true} + ϵ

, where

\forall_{j \in {2, \dots, 6}} D_{\cdot, j} \equiv X_{j}

,

{(X_{j})}_{i} \overset{iid}{\sim} N (0, 1)

,

X_{j} \in R^{500}

,

β_{true} = \frac{X_{β}}{1^{⊤} X_{β}}

,

{(X_{β})}_{i} \overset{iid}{\sim} N (0, 1)

,

X_{β} \in R^{6}

,

ϵ \equiv X_{ϵ}

,

{(X_{ϵ})}_{i} \overset{iid}{\sim} N (0, 1)

,

X_{ϵ} \in R^{500}

, and

P = {t : Δ y = 0}

, with a non-iterated (

r = 1

), unique (

α > 0

), and MNBLUE (

α = 1

) two-step CLSP solution (consistent with the ones from cases (a) and (b) for APs (TMs)), is implemented in Listing 3 (e.g., for a Jupyter Notebook).

Listing 3. Simulated numerical example for a (time-series) stationary-points-based CMLS (RP) problem.

Compared to the true values

β_{true} = [0.1687, - 0.3593, 0.5969, - 0.0628, - 0.1742, 0.8307]

, the CMLS (RP) estimate is

\hat{β} = {\hat{x}}_{M}^{*} = [0.2024, - 0.1121, 0.2095, - 0.0396, - 0.0296, 0.2611]

with a modest

R_{partial}^{2} = 0.293328

(i.e., a greater error,

{NRMSE}_{partial} = 0.840638

), moderate condition-weighted diagnostic intervals

{\hat{x}}_{M}^{*} \in [- 29.7780, 30.3003]

, and only the right-sided Monte Carlo-based t-test for the mean of the

{NRMSE}_{partial}

(on a sample of 30

{NRMSE}_{partial}

values obtained by substituting

b

with

{(b^{(τ)})}_{i} \overset{iid}{\sim} N (0, 1)

)—in the example under scrutiny,

{NRMSE}_{partial}

is preferable to NRMSE due to

{\hat{y}}^{*} ≫ {\hat{x}}^{*}

—suggesting consistency with expectations (i.e., with the

p - value \approx 1

). Similarly, in terms of numerical stability,

κ (C_{canon}) = 138.6109 > κ (A^{(1)}) = 127.8287

, indicating that the constraint block is ill-conditioned, most likely, because of the imposed data- (rather than theory-) based definition of stationary points, which rendered Step 2 infeasible (

{\hat{x}}_{M}^{*} = {\hat{x}}_{M}^{(1)}

) (limiting the fit quality).

Finally, in the case of a LPRLS/QPRLS numerical example, this text simulates potentially ill-posed LP (QP) problems, similar to the ones addressed in Whiteside et al. [21] and Blair [22,23]: a solution

{\hat{x}}^{*} ⋛ 0

in its classical LP (QP) notation

x^{*} = \arg \max_{x \in R^{p}} c^{⊤} x

(

x^{*} = \arg \min_{x \in R^{p}} \{\frac{1}{2} x^{⊤} Q x + c^{⊤} x\}

),

s . t . A x \leq b \equiv A_{u b} x \leq b_{u b} \land A_{e q} x = b_{e q}

—where

Q \in R^{p \times p}

is a symmetric positive definite matrix,

c \in R^{p}

,

A = [\begin{matrix} A_{u b} \\ A_{e q} \end{matrix}] \in R^{(m_{u b} + m_{e q}) \times p}

, and

b = [\begin{matrix} b_{u b} \\ b_{e q} \end{matrix}] \in R^{m_{u b} + m_{e q}}

—is estimated from two randomly generated (coefficient) matrices

A_{u b} \equiv X_{1}

,

{(X_{1})}_{i, j} \overset{iid}{\sim} N (0, 1)

, and

A_{e q} \equiv X_{2}

,

{(X_{2})}_{i, j} \overset{iid}{\sim} N (0, 1)

, and two (right-hand-side) vectors

b_{u b} \equiv X_{3}

,

{(X_{3})}_{i} \overset{iid}{\sim} N (0, 1)

, and

b_{e q} \equiv X_{4}

,

{(X_{4})}_{i} \overset{iid}{\sim} N (0, 1)

, via CLSP assuming

C = [\begin{matrix} A_{u b} \\ A_{e q} \end{matrix}]

,

S = Q_{S}^{L} [\begin{matrix} S_{s} \subseteq I_{m_{u b}} \\ 0 \end{matrix}] Q_{S}^{R}

, and

b = [\begin{matrix} b_{u b} \\ b_{e q} \end{matrix}]

, where

Q_{S}^{L}

and

Q_{S}^{R}

are permutation matrices, while omitting

Q

and

c

. The Python 3 code, based on numpy and pylppinv modules, installable by executing pip install numpy pylppinv==1.3.0, for

m_{u b} = 50

,

m_{e q} = 25

, and

p = 500

, with a non-iterated (

r = 1

), unique (

α > 0

), and MNBLUE (

α = 1

) (two-step) CLSP solution (analogously consistent with the ones from cases (a) and (b) for APs (TMs) and the one from CMLS (RPs)), is implemented in Listing 4 (e.g., for a Jupyter Notebook).

Listing 4. Simulated numerical example for a underdetermined and potentially infeasible LP (QP) problem.

The nullity (i.e., underdeterminedness) of the CLSP design matrix,

A^{(1)}

, is

dim (ker (A^{(1)})) = n - rank (A^{(1)}) = 475

, corresponding to 86.36% of the total unknowns, accompanied by a greater (relatively high) error,

NRMSE = 12.247449

, moderate condition-weighted diagnostic intervals

{\hat{x}}_{M}^{*} \in [- 1.2919, 1.3918]

, and only the right-sided Monte Carlo-based t-test for the mean of the NRMSE (on a sample of 30 NRMSE values obtained by substituting

b

with

{(b^{(τ)})}_{i} \overset{iid}{\sim} N (0, 1)

) suggesting consistency with expectations (i.e., with the

p - value \approx 1

). Similarly, in terms of numerical stability,

κ (C_{canon}) = 2.2168 = κ (A^{(1)}) = 2.2168

, indicating that the constraint block is well-conditioned despite a strongly rectangular

A^{(1)} \in R^{75 \times 550}

. Hence, CLSP can be efficiently applied to AP (TM), CMLS (RP), and LPRLS/QPRLS special cases using Python 3, and the reader may wish to experiment with the code in Listings 1–4 by relaxing its uniqueness (setting

α = 0

) or the MNBLUE characteristic (setting

α \in [0, 1)

).

To sum up, while the simulated numerical examples for the APs (TMs), CMLS (RPs), and LPRLS/QPRLS problems successfully illustrate the theoretical assumptions and numerical behavior of the CLSP estimator (as described in Section 3, Section 4 and Section 5 of this work), they remain limited by simplification and a priori construction. First, each estimation was performed under a fixed

r = 1

(i.e., without iterative refinement) and a unique MNBLUE configuration (

α = 1

), thus omitting the calibration of these parameters. Second, in the AP (TM) case with 10% known values and a non-zero diagonal, similar to the CMLS (RP) one, the estimator achieved a modest

R^{2} \approx 0.3

, consistent with the degree of underdeterminedness, while the reduced models (e.g., zero-diagonal, block-decomposed) performed substantially worse due to aggregation and block-wise estimation (backing the recommendation to prefer full models over reduced ones). Finally, the estimator, given its formalization, is computationally demanding, with the SVD-based Step 1 complexity of

O (max (m, n) min {(m, n)}^{2})

for

A^{(r)} \in R^{m \times n}

in each iteration (in CLSP-based software built on Python’s SciPy, R, and Stata) and convex-programming-based Step 2 complexity of

O (k n n z (A^{(r)}))

or

O (n^{3})

, where k is the number of solver iterations and

n n z (\cdot)

is the number of non-zero elements in

A^{(r)} \in R^{m \times n}

, (in CLSP-based software built on Python’s CVXPY and R’s CVXR), not including eventual repeated estimations for

{RMSA}_{i}

(in which

A^{(r)}

is reduced by one row) and Monte Carlo-based t-test—to exemplify, a

20 \times 20

AP (TM) problem with 10% known values and a non-zero diagonal has an

A^{(r)} \in R^{80 \times 400}

requiring r-times 2.56 million floating-point operations in Step 1 and up to 320 million under the SCS solver in Step 2, both eventually repeated twenty times to calculate

{RMSA}_{i}

and at least thirty times (per CLT) for Monte Carlo-based t-test. Despite its complexity, the CLSP framework remains practically viable and is ready for parameter calibration with subsequent application to real-world problems.

8. Discussion and Conclusions

The proposed modular two-step Convex Least Squares Programming (CLSP) estimator represents an attempt to unify historical efforts developed mainly in the 1960s–1990s to address ill-posed problems in both LS and LP (QP)—namely, the mentioned Wolfe [25], Stoer [75], Waterman [76], Dax [26], Osborne [27], George and Osborne [74], Barnes et al. [79], Donoho and Tanner [73], Grafarend and Awange [77], Bei et al. [80], and Qian et al. [78]—by correcting (1) the estimation, based on a Moore–Penrose pseudoinverse

A^{†}

(which has been applied in an ad hoc manner in the field, particularly in econometrics in the 2010s; consult Pereira-López et al. [81] and Bolotov [16]) and a Bott–Duffin inverse

A_{Z}^{†}

(based on the literature research, this text may be the first to consider a constrained generalized inverse to generalize an

A^{†}

-based estimator) with the help of a (2) regularization-inspired correction [4] (pp. 477–479), achieving greater fit quality in terms of ensuring strict satisfaction of problem constraints (previously infeasible under an

A^{†}

-based solution), while yielding a (potentially) unique and MNBLUE solution (by setting

α = 1

). At the same time, CLSP is well grounded in the algorithmic logic of the works in question, such as Wolfe [25] and Dax [26], as further formalized by, e.g., Osborne [27]. However, the primary strength of the framework in question lies in its “generality”: in contrast to the (individual) problem-specific algorithms, such as the ones developed by Übi [28,29,30,31,32]—routines combining (NN)LS, LP, and QP within a single framework to solve both well- and ill-posed convex optimization problems—CLSP provides a unified estimator with a proper methodology for numerical stability and goodness-of-fit assessment, centered around RMSA and NRMSE. A secondary strength is its practical “versatility”: through at least three classes of special cases—APs (TMs), CMLS (RPs), and LPRLS/QPRLS—it allows estimation of (rectangular) input–output tables (national, inter-country, or satellite), structural matrices (such as trade, country-product, investment, or migration data), and previously infeasible textbook-definition models of economic variables and/or LP (QP) problems. Furthermore, the inclusion of

A_{Z}^{†}

allows probabilistic and Bayesian applications since its subspace restrictions can be interpreted in terms of random variables and prior distributions, as well as Monte Carlo-based inference. To conclude, CLSP has already been fully implemented in Python 3 (pyclsp, pytmpinv, and pylppinv) and is to be ported into R and Stata in the future, making it a readily available alternative to purely theoretical or not freely accessible algorithms or proprietary packages, and applicable in a real-world setting (such as econometric research, replacing [15,16,81] with the two-step corrected estimation).

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The numerical examples used in the manuscript are based on simulated data. The Python 3.10 and Stata 14 code for replication purposes is provided in Listings 1–4 and A1.

Acknowledgments

All arguments, structure, and conclusions in this manuscript are the author’s own. During the preparation of the manuscript, the author recurred to ChatGPT-5 (OpenAI) solely to improve clarity and language, including refinement of style, formal precision in proofs of theorems, and assistance with interpretation of data and results. No content was adopted without the author’s review and editing, and the author claims full responsibility for the final content of the manuscript.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript (in order of their appearance in the sections):

CLSP	Convex Least Squares Programming
LS	Least Squares
OLS	Ordinary Least Squares
NNLS	Non-Negative Least Squares
SVD	Singular Value Decomposition
LP	Linear Programming
QP	Quadratic Programming
Lasso	Least Absolute Shrinkage and Selection Operator
Ridge	Ridge Regression (Tikhonov regularization)
MNBLUE	Minimum-Norm Best Linear Unbiased Estimator
BLUE	Best Linear Unbiased Estimator
RMSA	Root Mean Square Alignment
RMSE	Root Mean Square Error
NRMSE	Normalized Root Mean Square Error
ANOVA	Analysis of Variance
CLT	Lindeberg–Lévy Central Limit Theorem
APs	Allocation Problems
TMs	Tabular Matrix Problems
UNCTAD	UN Trade and Development
CMLS	Constrained-Model Least Squares
RPs	Regression Problems
NBER	U.S. National Bureau of Economic Research
LPRLS	Linear Programming via Regularized Least Squares
QPRLS	Quadratic Programming via Regularized Least Squares
iid	independent and identically distributed

Appendix A

This appendix presents (a) an equation of the Bott–Duffin inverse,

{(A_{Z}^{(r)})}^{†}

, based on the singular value decomposition (SVD), as well as (b) a partitioned decomposition of

B^{(r)}

(consult Equations (A1)–(A3) and, for the history of all three concepts, Section 2 and Section 4).

Appendix A.1

Let

A^{(r)} \in R^{m \times n}

be a real matrix, and let

Z \in R^{n \times n}

be a symmetric idempotent matrix (i.e.,

Z^{2} = Z = Z^{⊤}

) defining a subspace restriction. Then, the Bott–Duffin inverse

{(A_{Z}^{(r)})}^{†} \in R^{n \times m}

is given by the formula

{(A_{Z}^{(r)})}^{†} = {(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤}

, as defined in Equation (5), where the projected Gram matrix

Z {(A^{(r)})}^{⊤} A^{(r)} Z

is computable via its SVD:

Z {(A^{(r)})}^{⊤} A^{(r)} Z = U Σ V^{⊤} = U Σ U^{⊤}

(A1)

where

U, V \in R^{n \times n}

are orthogonal (since the projected Gram matrix is symmetric positive semidefinite,

U = V

), and

Σ = diag (σ_{1}, \dots, σ_{rank}, 0, \dots, 0) \in R^{n \times n}

contains the singular values. Hence, for

Σ^{†} = diag (σ_{1}^{- 1}, \dots, σ_{rank}^{- 1}, 0, \dots, 0)

,

U Σ^{†} U^{⊤}

substitutes

{(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†}

:

{(A_{Z}^{(r)})}^{†} = {(Z {(A^{(r)})}^{⊤} A^{(r)} Z)}^{†} Z {(A^{(r)})}^{⊤} = (U Σ^{†} U^{⊤}) Z {(A^{(r)})}^{⊤}

(A2)

Appendix A.2

Let

rank ([C S]) \leq rank (C) + rank (S)

in the analysis of numerical stability of the first estimate

{\hat{z}}^{(r)}

as defined in Equation (2). Then a decomposition of

B^{(r)}

reveals how the interaction of constraints

C

and slack structure

S

forms the curvature of the solution space:

\begin{matrix} B^{(r)} & = A^{(r)} {[\begin{matrix} C & S \end{matrix}]}^{†} \\ = f ([\begin{matrix} C & S \end{matrix}], g (M, b - A^{(r - 1)} {\hat{z}}^{(r - 1)})) \cdot {[\begin{matrix} C^{⊤} C & C^{⊤} S \\ S^{⊤} C & S^{⊤} S \end{matrix}]}^{†} [\begin{matrix} C^{⊤} \\ S^{⊤} \end{matrix}] \end{matrix}

(A3)

While the block

C^{⊤} C

captures the contribution of the constraints and

S^{⊤} S

reflects the curvature introduced by slack variables, the off-diagonal blocks

C^{⊤} S

and

S^{⊤} C

encode cross-interactions between constraints and slacks: if these off-diagonal blocks vanish, the effects are orthogonal and additive (in the column space sense since

C^{⊤} S = 0

implies

R (C) ⊥ R (S)

); otherwise,

C

is orthogonalized with respect to

R (S)

, which is affected by

(I - S S^{†}) C

. Hence, the presence of

S

in

[C S]

can improve or worsen the ill-posedness of the canonized problem: namely,

C^{⊤} S \neq 0

tends to worsen the condition number

κ ([C S])

.

Appendix B

This appendix presents Stata code, i.e., a .do-file, for the Monte Carlo experiment, compatible with version 14.0 and above (Stata/MP with at least four CPUs is recommended for replication—specifically, 50,000 iterations require substantial CPU time and memory).

Stata/MP 14.0 dependencies:

TITLE

’SUMMARIZEBY’: module to use statsby functionality with summarize

DESCRIPTION/AUTHOR(S)

This routine combines the summarize command with statsby. It iteratively collects summarize results for each variable with statsby and saves the result to memory or to a Stata .dta file.

KW: data management

KW: summarize

KW: statsby

Requires: Stata version 8

Distribution–Date: 20221012

Author: Ilya Bolotov, Prague University of Economics and Business

Support: email ilya.bolotov@vse.cz

INSTALLATION FILES

summarizeby.ado

summarizeby.sthlp

(type ssc install summarizeby to install)

Listing A1. Stata code from the Monte Carlo experiment dedicated to mapping the distribution of NRMSE.

References

Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer Series in Operations Research; Springer: New York, NY, USA, 2006; p. 664. [Google Scholar] [CrossRef]
Boyd, S.; Vandenberghe, L. Convex Optimization, 1st ed.; Cambridge University Press: Cambridge, UK, 2004; p. 727. [Google Scholar] [CrossRef]
Sydsæter, K.; Hammond, P.; Seierstad, A.; Strøm, A. Further Mathematics for Economic Analysis, 2nd ed.; FT Prentice Hall: Harlow, UK; München, Germany, 2011; p. 616. [Google Scholar]
Gentle, J.E. Matrix Algebra: Theory, Computations and Applications in Statistics, 3rd ed.; Springer Texts in Statistics; Springer: Cham, Switzerland, 2024; p. 725. [Google Scholar] [CrossRef]
Dantzig, G.B. Reminiscences about the Origins of Linear Programming. Mem. Am. Math. Soc. 1984, 48, 1–11. [Google Scholar] [CrossRef]
Dantzig, G.B. Linear Programming and Extensions, 1st ed.; Reprint of 1963; Princeton Landmarks in Mathematics and Physics; Princeton University Press: Princeton, NJ, USA, 1998; p. 656. [Google Scholar]
Koopmans, T.C. Activity Analysis of Production and Allocation: Proceedings of a Conference, 1st ed.; Wiley: New York, NY, USA, 1951; p. 404. [Google Scholar]
Allen, R.G.D. Mathematical Economics, 2nd ed.; Reprint of 1959; Palgrave Macmillan: London, UK, 1976; p. 812. [Google Scholar] [CrossRef]
Lancaster, K. Mathematical Economics, 1st ed.; Dover Publications: New York, NY, USA, 1987; p. 411. [Google Scholar]
Intriligator, M.D. Mathematical Optimization and Economic Theory, 1st ed.; Reprint of 1971; Classics in Applied Mathematics; SIAM: Philadelphia, PA, USA, 2002; p. 508. [Google Scholar] [CrossRef]
Intriligator, M.D.; Arrow, K.J. Handbook of Mathematical Economics, 1st ed.; Handbooks in Economics; North-Holland: Amsterdam, The Netherlands; New York, NY, USA, 1981; p. 378. [Google Scholar]
Dorfman, R.; Samuelson, P.A.; Solow, R.M. Linear Programming and Economic Analysis, 1st ed.; Reprint of 1958; Dover Books on Advanced Mathematics; Dover Publications: New York, NY, USA, 1987; p. 525. [Google Scholar]
Frühwirth, T.; Abdennadher, S. Essentials of Constraint Programming, 1st ed.; Cognitive Technologies; Springer: Berlin/Heidelberg, Germany, 2003; p. 144. [Google Scholar] [CrossRef]
Rossi, F.; van Beek, P.; Walsh, T. (Eds.) Handbook of Constraint Programming, 1st ed.; Foundations of Artificial Intelligence; Elsevier: Amsterdam, The Netherlands; Boston, MA, USA, 2006; Volume 2, p. 955. [Google Scholar]
Bolotov, I. Modeling of Time Series Cyclical Component on a Defined Set of Stationary Points and Its Application on the U.S. Business Cycle. In Proceedings of the the 8th International Days of Statistics and Economics. Melandrium, Prague, Czech Republic, 11–13 September 2014; pp. 151–160. [Google Scholar]
Bolotov, I. Modeling Bilateral Flows in Economics by Means of Exact Mathematical Methods. In Proceedings of the 9th International Days of Statistics and Economics. Melandrium, Prague, Czech Republic, 10–12 September 2015; pp. 199–208. [Google Scholar]
Rao, C.R.; Mitra, S.K. Generalized Inverse of Matrices and Its Applications, 1st ed.; Wiley Series in Probability and Mathematical Statistics; Wiley: New York, NY, USA, 1971; p. 240. [Google Scholar] [CrossRef]
Ben-Israel, A.; Greville, T.N.E. Generalized Inverses: Theory and Applications, 2nd ed.; Reprint of 2003; CMS Books in Mathematics; Springer: New York, NY, USA, 2006; p. 420. [Google Scholar] [CrossRef]
Lawson, C.L.; Hanson, R.J. Solving Least Squares Problems, 1st ed.; Reprint of 1974; Classics in Applied Mathematics; SIAM: Philadelphia, PA, USA, 1995; p. 337. [Google Scholar] [CrossRef]
Wang, G.; Wei, Y.; Qiao, S. Generalized Inverses: Theory and Computations, 1st ed.; Developments in Mathematics; Springer: Singapore, 2018; Volume 53, p. 397. [Google Scholar] [CrossRef]
Whiteside, M.M.; Choi, B.; Eakin, M.; Crockett, H. Stability of Linear Programming Solutions Using Regression Coefficients. J. Stat. Comput. Simul. 1994, 50, 131–146. [Google Scholar] [CrossRef]
Blair, C. Random Linear Programs with Many Variables and Few Constraints. Math. Program. 1986, 34, 62–71. [Google Scholar] [CrossRef]
Blair, C. Random Inequality Constraint Systems with Few Variables. Math. Program. 1986, 35, 135–139. [Google Scholar] [CrossRef]
Tikhonov, A.N.; Goncharskiy, A.V.; Stepanov, V.V.; Yagola, A.G. Chislennyye Metody Resheniya Nekorrektnykh Zadach, 2nd ed.; Nauka: Moscow, Russia, 1990; p. 232. [Google Scholar]
Wolfe, P. A Technique for Resolving Degeneracy in Linear Programming. J. Soc. Ind. Appl. Math. 1963, 11, 205–211. [Google Scholar] [CrossRef]
Dax, A. Linear Programming via Least Squares. Linear Algebra Its Appl. 1988, 111, 313–324. [Google Scholar] [CrossRef]
Osborne, M.R. Degeneracy: Resolve or Avoid? J. Oper. Res. Soc. 1992, 43, 829–835. [Google Scholar] [CrossRef]
Übi, E. Exact and Stable Least Squares Solution to the Linear Programming Problem. Cent. Eur. J. Math. 2005, 3, 228–241. [Google Scholar] [CrossRef]
Übi, E. On Stable Least Squares Solution to the System of Linear Inequalities. Open Math. 2007, 5, 373–385. [Google Scholar] [CrossRef]
Übi, E. A Numerically Stable Least Squares Solution to the Quadratic Programming Problem. Open Math. 2008, 6, 171–178. [Google Scholar] [CrossRef]
Übi, E. Mathematical Programming via the Least-Squares Method. Open Math. 2010, 8, 795–806. [Google Scholar] [CrossRef]
Übi, E. Linear Inequalities via Least Squares. Proc. Est. Acad. Sci. 2013, 62, 238–248. [Google Scholar] [CrossRef]
Dresden, A. The Fourteenth Western Meeting of the American Mathematical Society. Bull. Am. Math. Soc. 1920, 26, 385–397. [Google Scholar] [CrossRef]
Bjerhammar, A. Rectangular Reciprocal Matrices, With Special Reference to Geodetic Calculations. Bull. Géodésique 1951, 20, 188–220. [Google Scholar] [CrossRef]
Penrose, R. A Generalized Inverse for Matrices. Math. Proc. Camb. Philos. Soc. 1955, 51, 406–413. [Google Scholar] [CrossRef]
Penrose, R. On Best Approximate Solutions of Linear Matrix Equations. Math. Proc. Camb. Philos. Soc. 1956, 52, 17–19. [Google Scholar] [CrossRef]
Chipman, J.S. On Least Squares with Insufficient Observations. J. Am. Stat. Assoc. 1964, 59, 1078–1111. [Google Scholar] [CrossRef]
Price, C.M. The Matrix Pseudoinverse and Minimal Variance Estimates. SIAM Rev. 1964, 6, 115–120. [Google Scholar] [CrossRef]
Plackett, R.L. Some Theorems in Least Squares. Biometrika 1950, 37, 149–157. [Google Scholar] [CrossRef]
Rao, C.R.; Mitra, S.K. Generalized Inverse of a Matrix and Its Applications. In Theory of Statistics: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume I, 1st ed.; Le Cam, L.M., Neyman, J., Scott, E.L., Eds.; University of California Press: Berkeley, CA, USA, 1972; pp. 601–620. [Google Scholar]
Greville, T.N.E. The Pseudoinverse of a Rectangular Matrix and Its Statistical Applications. In Proceedings of the Annual Meeting of the American Statistical Association, Chicago, IL, USA, 27–30 December 1958; American Statistical Association: Alexandria, VA, USA, 1958; pp. 116–121. [Google Scholar]
Greville, T.N.E. The Pseudoinverse of a Rectangular or Singular Matrix and Its Application to the Solution of Systems of Linear Equations. SIAM Rev. 1959, 1, 38–43. [Google Scholar] [CrossRef]
Greville, T.N.E. Some Applications of the Pseudoinverse of a Matrix. SIAM Rev. 1960, 2, 15–22. [Google Scholar] [CrossRef]
Greville, T.N.E. Note on Fitting of Functions of Several Independent Variables. J. Soc. Ind. Appl. Math. 1961, 9, 109–115. [Google Scholar] [CrossRef]
Cline, R.E. Representations for the Generalized Inverse of a Partitioned Matrix. J. Soc. Ind. Appl. Math. 1964, 12, 588–600. [Google Scholar] [CrossRef]
Cline, R.E.; Greville, T.N.E. An Extension of the Generalized Inverse of a Matrix. SIAM J. Appl. Math. 1970, 19, 682–688. [Google Scholar] [CrossRef]
Golub, G.H.; Kahan, W. Calculating the Singular Values and Pseudo-Inverse of a Matrix. SIAM J. Numer. Anal. 1965, 2, 205–224. [Google Scholar] [CrossRef]
Ben-Israel, A.; Cohen, D. On Iterative Computation of Generalized Inverses and Associated Projections. SIAM Rev. 1966, 8, 410–419. [Google Scholar] [CrossRef]
Lewis, T.O.; Newman, T.G. Pseudoinverses of Positive Semidefinite Matrices. SIAM J. Appl. Math. 1968, 16, 701–703. [Google Scholar] [CrossRef]
Bott, R.; Duffin, R.J. On the Algebra of Networks. Trans. Am. Math. Soc. 1953, 74, 99–109. [Google Scholar] [CrossRef]
Campbell, S.L.; Meyer, C.D. Generalized Inverses of Linear Transformations, 1st ed.; Reprint of 1979; Classics in Applied Mathematics; SIAM: Philadelphia, PA, USA, 2009; p. 272. [Google Scholar] [CrossRef]
Meyer, C.D., Jr. Generalized Inverses and Ranks of Block Matrices. SIAM J. Appl. Math. 1973, 25, 597–602. [Google Scholar] [CrossRef]
Hartwig, R.E. Block Generalized Inverses. Arch. Ration. Mech. Anal. 1976, 61, 197–251. [Google Scholar] [CrossRef]
Rao, C.R.; Yanai, H. Generalized Inverses of Partitioned Matrices Useful in Statistical Applications. Linear Algebra Its Appl. 1985, 70, 105–113. [Google Scholar] [CrossRef]
Tian, Y. The Moore-Penrose Inverses of m x n Block Matrices and Their Applications. Linear Algebra Its Appl. 1998, 283, 35–60. [Google Scholar] [CrossRef]
Rakha, M.A. On the Moore-Penrose Generalized Inverse Matrix. Appl. Math. Comput. 2004, 158, 185–200. [Google Scholar] [CrossRef]
Baksalary, O.M.; Trenkler, G. On Formulae for the Moore-Penrose Inverse of a Columnwise Partitioned Matrix. Appl. Math. Comput. 2021, 403, 125913. [Google Scholar] [CrossRef]
Albert, A.E. Regression and the Moore-Penrose Pseudoinverse, 1st ed.; Mathematics in Science and Engineering; Academic Press: New York, NY, USA, 1972; p. 180. [Google Scholar]
Dokmanić, I.; Kolundžija, M.; Vetterli, M. Beyond Moore-Penrose: Sparse Pseudoinverse. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 6526–6530. [Google Scholar] [CrossRef]
Baksalary, O.M.; Trenkler, G. The Moore-Penrose Inverse: A Hundred Years on a Frontline of Physics Research. Eur. Phys. J. H 2021, 46, 9. [Google Scholar] [CrossRef]
Mortari, D. Least-Squares Solution of Linear Differential Equations. Mathematics 2017, 5, 48. [Google Scholar] [CrossRef]
Getson, A.J.; Hsuan, F.C. {2}-Inverses and Their Statistical Application, 1st ed.; Reprint of 1988; Lecture Notes in Statistics; Springer: New York, NY, USA, 2012; p. 110. [Google Scholar] [CrossRef]
Björck, A. Numerical Methods for Least Squares Problems, 1st ed.; SIAM: Philadelphia, PA, USA, 1996; p. 408. [Google Scholar]
Kantorovich, L.V. Matematicheskiye Metody Organizatsii i Planirovaniia Proizvodstva, reprint ed.; Izdatel’skiy dom S.-Peterb. gos. un-ta: Saint Petersburg, Russia, 2012; p. 96. [Google Scholar]
Shamir, R. The Efficiency of the Simplex Method: A Survey. Manag. Sci. 1987, 33, 301–334. [Google Scholar] [CrossRef]
Stone, R.E.; Tovey, C.A. The Simplex and Projective Scaling Algorithms as Iteratively Reweighted Least Squares Methods. SIAM Rev. 1991, 33, 220–237. [Google Scholar] [CrossRef]
Wagner, H.M. Linear Programming Techniques for Regression Analysis. J. Am. Stat. Assoc. 1959, 54, 206–212. [Google Scholar] [CrossRef]
Sielken, R.L.; Hartley, H.O. Two Linear Programming Algorithms for Unbiased Estimation of Linear Models. J. Am. Stat. Assoc. 1973, 68, 639–641. [Google Scholar] [CrossRef]
Kiountouzis, E.A. Linear Programming Techniques in Regression Analysis. Appl. Stat. 1973, 22, 69. [Google Scholar] [CrossRef]
Sposito, V.A. On Unbiased Lp Regression Estimators. J. Am. Stat. Assoc. 1982, 77, 652–653. [Google Scholar] [CrossRef]
Judge, G.G.; Takayama, T. Inequality Restrictions in Regression Analysis. J. Am. Stat. Assoc. 1966, 61, 166–181. [Google Scholar] [CrossRef]
Mantel, N. Restricted Least Squares Regression and Convex Quadratic Programming. Technometrics 1969, 11, 763–773. [Google Scholar] [CrossRef]
Donoho, D.L.; Tanner, J. Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming. Proc. Natl. Acad. Sci. USA 2005, 102, 9446–9451. [Google Scholar] [CrossRef] [PubMed]
George, K.; Osborne, M.R. On Degeneracy in Linear Programming and Related Problems. Ann. Oper. Res. 1993, 46–47, 343–359. [Google Scholar] [CrossRef]
Stoer, J. On the Numerical Solution of Constrained Least-Squares Problems. SIAM J. Numer. Anal. 1971, 8, 382–411. [Google Scholar] [CrossRef]
Waterman, M.S. A Restricted Least Squares Problem. Technometrics 1974, 16, 135–136. [Google Scholar] [CrossRef]
Grafarend, E.W.; Awange, J.L. Algebraic Solutions of Systems of Equations. In Linear and Nonlinear Models, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 527–569. [Google Scholar] [CrossRef]
Qian, J.; Andrew, A.L.; Chu, D.; Tan, R.C.E. Methods for Solving Underdetermined Systems. Numer. Linear Algebra Appl. 2018, 25, 17. [Google Scholar] [CrossRef]
Barnes, E.; Chen, V.; Gopalakrishnan, B.; Johnson, E.L. A Least-Squares Primal-Dual Algorithm for Solving Linear Programming Problems. Oper. Res. Lett. 2002, 30, 289–294. [Google Scholar] [CrossRef]
Bei, X.; Chen, N.; Zhang, S. Solving Linear Programming with Constraints Unknown. In Proceedings of the 42nd International Colloquium, ICALP 2015, Kyoto, Japan, 6–10 July 2015; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9134, pp. 129–142. [Google Scholar] [CrossRef]
Pereira-López, X.; Fernández-Fernández, M.; Carrascal-Incera, A. Rectangular Input-output Models by Moore-Penrose Inverse. Rev. Electrónica De Comun. Y Trab. De Asepuma 2014, 15, 13–24. [Google Scholar]
Bolotov, I. The Problem of Relationships Between Conditional Statements and Arithmetic Functions. Mundus Symb. 2012, 20, 5–12. [Google Scholar]
Bolotov, I. SUMMARIZEBY: Stata Module to Use Statsby Functionality with Summarize, version 1.1.2; Boston College Department of Economics: Boston, MA, USA, 2022. Available online: https://ideas.repec.org/c/boc/bocode/s458870.html (accessed on 26 October 2025).
Bolotov, I. PYCLSP: Modular Two-Step Convex Optimization Estimator for Ill-Posed Problems, version 1.3.0. 2025. Available online: https://pypi.org/project/pyclsp/ (accessed on 26 October 2025).
Bolotov, I. PYTMPINV: Tabular Matrix Problems via Pseudoinverse Estimation, version 1.2.0. 2025. Available online: https://pypi.org/project/pytmpinv/ (accessed on 26 October 2025).
Bolotov, I. PYLPPINV: Linear Programming via Pseudoinverse Estimation, version 1.3.0. 2025. Available online: https://pypi.org/project/pylppinv/ (accessed on 26 October 2025).

Figure 1. LP and QP in programming. Adapted from Dantzig (1990, Figure 1.4.1, p. 8) [6] (pp. 1–11).

Figure 2. Applicability of selected types of convex optimization methods across problem classes.

Figure 3. Map of citation and reference counts for the above-cited sources, where hollow circles are suggested items. Monographs not indexed in the database are denoted as (?), with Lawson and Hanson being included in the bibliography as a 1995 reprint [19]. Adapted from Litmaps (Shared Citations and References), available at https://www.litmaps.com/ (accessed on 30 September 2025).

Figure 4. Algorithmic flow of the CLSP Estimator. Decision nodes (diamonds) indicate modularity.

Figure 5. Schematic correlogram of constraint rows i with high alignment (

{RMSA}_{i} \geq threshold

). Each row encodes directional alignment and symbolic marginal effects on model conditioning and fit.

Figure 5. Schematic correlogram of constraint rows i with high alignment (

{RMSA}_{i} \geq threshold

). Each row encodes directional alignment and symbolic marginal effects on model conditioning and fit.

Figure 6. Heatmaps (based on seaborn’s viridis colormap) of means,

mean = \frac{1}{50, 000} \sum_{i = 1}^{50, 000} {NRMSE}_{i}

, and standard deviations,

sd = \sqrt{\frac{1}{50, 000 - 1} \sum_{i = 1}^{50, 000} {({NRMSE}_{i} - mean)}^{2}}

, computed from the NRMSE of

{\hat{z}}^{(1)} = {(A^{(1)})}^{†} b

, where

x = vec (X \in R^{m \times p}, 1 \leq m, p \leq 50) \subseteq z

, under two structural variants: with and without a zero diagonal (

\forall_{i \in {1, \dots, min (m, p)}} X_{i i} = 0

). Color reflects the magnitude of the statistic, with lighter shades indicating higher values. The figure was prepared using matplotlib in Python 3.

Figure 6. Heatmaps (based on seaborn’s viridis colormap) of means,

mean = \frac{1}{50, 000} \sum_{i = 1}^{50, 000} {NRMSE}_{i}

, and standard deviations,

sd = \sqrt{\frac{1}{50, 000 - 1} \sum_{i = 1}^{50, 000} {({NRMSE}_{i} - mean)}^{2}}

, computed from the NRMSE of

{\hat{z}}^{(1)} = {(A^{(1)})}^{†} b

, where

x = vec (X \in R^{m \times p}, 1 \leq m, p \leq 50) \subseteq z

, under two structural variants: with and without a zero diagonal (

\forall_{i \in {1, \dots, min (m, p)}} X_{i i} = 0

). Color reflects the magnitude of the statistic, with lighter shades indicating higher values. The figure was prepared using matplotlib in Python 3.

Table 1. Selected types of generalized inverses

G \in A^{{\cdot}}

, where

A^{{\cdot}} \subseteq R^{n \times m}

,

x \in R^{n}

, and

b \in R^{m}

.

Table 1. Selected types of generalized inverses

G \in A^{{\cdot}}

, where

A^{{\cdot}} \subseteq R^{n \times m}

,

x \in R^{n}

, and

b \in R^{m}

.

Type of Inverse	Terminology	Properties
${1}$	Equation Solving Inverse	$G \in A^{{1}}$ iff $A G A = A$ . Hence, $G b$ is a solution to $A x = b$ for all $b \in R (A)$ .
${1, 2}$	Reflexive Inverse	$G \in A^{{1, 2}}$ iff $A G A = A$ and $G A G = G$ . Each ${1, 2}$ -inverse defines a direct sum $N (A) \oplus R (G) = R^{n}$ , $R (A) \oplus N (G) = R^{m}$ . For complementary subspaces $(N, R)$ , $G_{N, R}$ is unique, with $R (G_{N, R}) = R$ and $N (G_{N, R}) = N$ .
${1, 3}$	Least Squares Inverse	$G \in A^{{1, 3}}$ iff $A G A = A$ and ${(A G)}^{⊤} = A G$ . Hence, $G b$ is the least-squares solution minimizing the norm ${∥A x - b∥}_{2}^{2}$ .
${1, 4}$	Minimum Norm Inverse	$G \in A^{{1, 4}}$ iff $A G A = A$ and ${(G A)}^{⊤} = G A$ . Hence, $G b$ is the minimum-norm solution of $A x = b$ for all $b \in R (A)$ .
${1, 2, 3, 4}$	Moore–Penrose Inverse	The unique $G \equiv A^{†} \in A^{{1, 2, 3, 4}}$ is the least-squares minimum-norm solution.

Source: Adapted from Campbell and Meyer (2009, Table 6.1, p. 96) [51] (pp. 91–119) as a summary of Rao and Mitra [17] (pp. 44–71), Rao and Mitra [40], Ben-Israel and Greville [18] (pp. 40–51), and Wang et al. [20] (pp. 10–19, 33–42).

Table 2. Subset (at

m, p mod 10 = 0

) of the summary statistics for NRMSE from the simulation study.

Table 2. Subset (at

m, p mod 10 = 0

) of the summary statistics for NRMSE from the simulation study.

m	p	Mean	sd	Skewness	Kurtosis	Min	Max	p1	p5	p25	p50	p75	p95	p99
Without Zero Diagonal
10	10	000.81	000.59	000.82	003.24	000.00	003.52	000.01	000.06	000.33	000.70	001.18	001.94	002.47
10	20	001.53	001.14	000.92	003.60	000.00	008.10	000.02	000.12	000.62	001.31	002.21	003.70	004.80
10	30	005.30	003.96	000.95	003.69	000.00	027.25	000.09	000.41	002.13	004.51	007.65	012.95	016.90
10	40	007.56	005.67	000.97	003.77	000.00	046.23	000.12	000.61	003.02	006.45	010.90	018.59	024.18
10	50	025.08	018.95	000.99	003.84	000.00	127.63	000.39	001.98	009.99	021.10	036.31	061.64	080.75
20	10	000.57	000.43	000.93	003.60	000.00	002.73	000.01	000.04	000.23	000.49	000.83	001.39	001.79
20	20	007.90	005.85	000.89	003.48	000.00	040.71	000.13	000.64	003.18	006.77	011.48	019.17	024.57
20	30	004.10	003.03	000.92	003.61	000.00	021.59	000.06	000.33	001.68	003.50	005.92	009.92	012.87
20	40	005.85	004.37	000.95	003.71	000.00	031.73	000.09	000.47	002.36	005.00	008.44	014.27	018.65
20	50	008.52	006.36	000.94	003.63	000.00	043.37	000.14	000.68	003.44	007.22	012.36	020.73	026.90
30	10	002.89	002.16	000.97	003.78	000.00	015.48	000.05	000.23	001.16	002.47	004.17	007.05	009.19
30	20	002.85	002.11	000.92	003.58	000.00	014.28	000.05	000.23	001.16	002.43	004.12	006.88	008.93
30	30	004.44	003.32	000.94	003.66	000.00	022.66	000.07	000.35	001.78	003.76	006.43	010.85	014.08
30	40	008.73	006.50	000.95	003.69	000.00	048.55	000.14	000.69	003.56	007.47	012.61	021.13	027.85
30	50	005.61	004.22	000.96	003.69	000.00	031.96	000.09	000.44	002.25	004.75	008.10	013.70	017.94
40	10	000.55	000.41	000.98	003.80	000.00	003.10	000.01	000.04	000.22	000.47	000.79	001.35	001.76
40	20	004.67	003.48	000.93	003.66	000.00	025.54	000.07	000.35	001.87	003.97	006.74	011.34	014.83
40	30	001.39	001.04	000.95	003.69	000.00	006.70	000.02	000.11	000.56	001.18	002.01	003.40	004.39
40	40	000.27	000.20	000.96	003.71	000.00	001.48	000.00	000.02	000.11	000.23	000.39	000.67	000.87
40	50	000.29	000.21	000.95	003.69	000.00	001.58	000.00	000.02	000.11	000.24	000.41	000.70	000.91
50	10	003.74	002.82	000.99	003.84	000.00	023.34	000.06	000.29	001.50	003.16	005.37	009.20	012.13
50	20	001.65	001.24	000.95	003.68	000.00	008.17	000.03	000.13	000.67	001.40	002.39	004.05	005.26
50	30	005.24	003.90	000.94	003.69	000.00	028.36	000.08	000.42	002.12	004.47	007.56	012.74	016.54
50	40	000.47	000.35	000.97	003.73	000.00	002.40	000.01	000.04	000.19	000.40	000.67	001.15	001.49
50	50	000.23	000.17	000.94	003.64	000.00	001.21	000.00	000.02	000.09	000.20	000.33	000.56	000.73
10	10	000.71	000.52	000.85	003.37	000.00	003.20	000.01	000.06	000.29	000.62	001.04	001.71	002.20
10	20	000.69	000.51	000.92	003.60	000.00	003.54	000.01	000.05	000.28	000.59	001.00	001.67	002.16
10	30	001.93	001.44	000.95	003.74	000.00	010.17	000.03	000.15	000.78	001.64	002.77	004.69	006.13
10	40	004.65	003.50	000.97	003.76	000.00	023.68	000.07	000.37	001.86	003.92	006.72	011.43	014.89
10	50	001.94	001.47	000.98	003.84	000.00	012.42	000.03	000.15	000.77	001.64	002.81	004.75	006.25
20	10	000.70	000.52	000.92	003.59	000.00	003.70	000.01	000.06	000.28	000.60	001.02	001.70	002.20
20	20	000.70	000.52	000.93	003.57	000.00	003.68	000.01	000.05	000.28	000.60	001.01	001.71	002.21
20	30	000.42	000.31	000.94	003.68	000.00	002.24	000.01	000.03	000.17	000.36	000.61	001.02	001.34
20	40	000.27	000.20	000.94	003.64	000.00	001.27	000.00	000.02	000.11	000.23	000.39	000.65	000.85
20	50	002.06	001.55	000.97	003.81	000.00	011.72	000.03	000.16	000.83	001.75	002.98	005.04	006.64
30	10	000.45	000.34	000.95	003.67	000.00	002.40	000.01	000.04	000.18	000.38	000.65	001.10	001.43
30	20	000.42	000.32	000.95	003.71	000.00	002.16	000.01	000.03	000.17	000.36	000.61	001.03	001.34
30	30	000.45	000.34	000.96	003.76	000.00	002.53	000.01	000.03	000.18	000.38	000.65	001.10	001.43
30	40	000.34	000.26	000.97	003.73	000.00	001.88	000.01	000.03	000.14	000.29	000.50	000.85	001.10
30	50	000.57	000.43	000.96	003.75	000.00	003.21	000.01	000.05	000.23	000.48	000.82	001.39	001.81
40	10	000.34	000.26	000.96	003.76	000.00	001.83	000.01	000.03	000.14	000.29	000.49	000.84	001.08
40	20	000.33	000.25	000.98	003.77	000.00	001.68	000.01	000.03	000.13	000.28	000.47	000.80	001.05
40	30	000.63	000.47	000.97	003.80	000.00	003.36	000.01	000.05	000.25	000.53	000.90	001.52	001.99
40	40	000.38	000.28	000.98	003.82	000.00	002.23	000.01	000.03	000.15	000.32	000.54	000.92	001.20
40	50	000.30	000.23	000.97	003.78	000.00	001.76	000.00	000.02	000.12	000.26	000.44	000.75	000.98
50	10	000.61	000.46	000.97	003.74	000.00	003.16	000.01	000.05	000.24	000.52	000.88	001.48	001.95
50	20	000.17	000.13	000.97	003.76	000.00	000.87	000.00	000.01	000.07	000.14	000.24	000.41	000.54
50	30	000.32	000.24	000.97	003.82	000.00	002.10	000.00	000.03	000.13	000.28	000.47	000.80	001.03
50	40	000.61	000.46	000.95	003.67	000.00	003.04	000.01	000.05	000.24	000.52	000.89	001.49	001.95
50	50	000.68	000.51	000.96	003.70	000.00	003.76	000.01	000.05	000.27	000.58	000.98	001.66	002.16

Source: self-prepared.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bolotov, I. CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems. Mathematics 2025, 13, 3476. https://doi.org/10.3390/math13213476

AMA Style

Bolotov I. CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems. Mathematics. 2025; 13(21):3476. https://doi.org/10.3390/math13213476

Chicago/Turabian Style

Bolotov, Ilya. 2025. "CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems" Mathematics 13, no. 21: 3476. https://doi.org/10.3390/math13213476

APA Style

Bolotov, I. (2025). CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems. Mathematics, 13(21), 3476. https://doi.org/10.3390/math13213476

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems

Abstract

1. Introduction

2. Historical and Conceptual Background of the CLSP Framework

3. Construction and Formalization of the CLSP Estimator

4. Numerical Stability of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$

5. Goodness of Fit of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$

6. Special Cases of CLSP Problems: APs, CMLS, and LPRLS/QPRLS

7. Monte Carlo Experiment and Numerical Examples

8. Discussion and Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A

Appendix A.1

Appendix A.2

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

CLSP: Linear Algebra Foundations of a Modular Two-Step Convex Optimization-Based Estimator for Ill-Posed Problems

Abstract

1. Introduction

2. Historical and Conceptual Background of the CLSP Framework

3. Construction and Formalization of the CLSP Estimator

4. Numerical Stability of the Solutions z ^ ( r ) and z ^ *

5. Goodness of Fit of the Solutions z ^ ( r ) and z ^ *

6. Special Cases of CLSP Problems: APs, CMLS, and LPRLS/QPRLS

7. Monte Carlo Experiment and Numerical Examples

8. Discussion and Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A

Appendix A.1

Appendix A.2

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

4. Numerical Stability of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$

5. Goodness of Fit of the Solutions ${\hat{z}}^{(r)}$ and ${\hat{z}}^{*}$