Article

Normal System in Laplace Expansion and Related Regression Modeling Problems

Independent Researcher, Minneapolis, MN 55305, USA
Symmetry 2025, 17(5), 668; https://doi.org/10.3390/sym17050668
Submission received: 31 March 2025 / Revised: 23 April 2025 / Accepted: 25 April 2025 / Published: 27 April 2025
(This article belongs to the Section Mathematics)

Abstract

This paper considers some innovative theoretical features and practical applications of the normal system of equations used for estimating parameters in multiple linear regression. The Laplace expansion of a determinant by cofactors and the double Laplace expansion are employed for resolving the normal system. Additional features are described, including the ridge regularization applied directly to the normal system, the geometric interpretation as a unique hyperplane through the points of special weighted means, Mahalanobis distances from the observations to these means for the linear link functions, and multidimensional interpolation. These properties are useful for a better understanding and interpretation of multiple regression, and the numerical examples demonstrate the convenience and applicability of these tools in data modeling.

1. Introduction

Multiple regression is one of the most widely used tools of applied statistical modeling, employed for solving various data-fitting problems, analyzing the predictors’ impact on the outcome variable, and making predictions. The construction of linear models is commonly performed by using the criterion of minimizing the total squared errors of the response variable, and such a model is known as the ordinary least squares (OLS) regression. Multiple textbooks, monographs, research papers, and internet sources are devoted to OLS models and their applications in statistical modeling—see, for example, [1,2,3,4,5,6,7,8,9,10].
The minimization of the OLS objective produces the so-called normal system of equations for the estimation of the model coefficients, and the current paper is devoted to the consideration of this normal system, its various properties, and its relations with several other theoretical and practical questions. It describes the classic Laplace expansion and its double form for presenting a determinant via the sum of the elements in a row, a column, or both, multiplied by their cofactors, which allows the calculation of determinants via those of lower order [11,12,13]. The paper demonstrates that the double Laplace expansion is a useful tool for presenting statistical modeling relations.
Another property obtained from the normal system of equations defines the geometric meaning of multiple linear regression as a hyperplane going through some special points [14,15]: a model with n parameters goes through n points of the weighted mean values. These mean values can be calculated from the data, and a normal system can be interpreted as equations of interpolation through the points of the weighted means [16,17]. Generally, the multivariate interpolation presents a difficult problem requiring grid approximations and other special techniques [18,19,20,21,22]; however, with the linear interpolation via the points of weighted means, the multivariate interpolation can be easily performed.
The normal system of equations can sometimes be based on an ill-conditioned matrix, with its determinant close to zero. An inversion of such a matrix can produce big values, inducing inflated parameters of regression. In such cases, known as the effects of multicollinearity among the predictors, regularization in the forms of ridge regression, LASSO, elastic net, and other techniques [23,24,25,26,27,28,29,30,31] is applied. A penalizing function in these approaches is added to the OLS objective, which produces parameters of the model adjusted due to the applied restrictions. This paper considers the possibility of applying a penalizing function directly to the normal system of equations, which corresponds to modified parameters of the model and has interesting features of its own.
The work also considers the estimation of the logistic regression, which is commonly built as a nonlinear model by the maximum likelihood criterion. As is well known, generalized linear models (GLM) can be represented via linear link functions [32,33,34]. Finding how to transform an outcome variable and obtain the linear link helps reduce a nonlinear estimation problem to a linear one. The paper employs the Mahalanobis distance, known for scaling multivariate observations in data clustering, segmentation, fusion, and other practical needs [35,36]. This distance is used for measuring how far the observations are from the weighted means of the predictors used for interpolation by the normal system. A binary outcome variable can be weighted by these distances in relation to the centers of the weighted means of the predictors. The weighted means of the binary outcome can serve as the estimated proportions, which, in turn, can be used as the dependent variable values in the linear link function. Then, the parameters of the logistic regression can be found by solving the normal system of the linearized problem.
The paper is structured as follows. After Section 1 (Introduction), Section 2 describes the Laplace expansion and its double form, and Section 3 presents the relations of the multiple regression in terms of the Laplace expansion. Section 4 considers the geometrical properties of multiple regression as hyperplane interpolation via the points of weighted means. Section 5 describes ridge regularization of the normal system, and Section 6 defines Mahalanobis distances from the observations to the weighted means for building a linear link function of the nonlinear model. Section 7 illustrates the techniques by numerical examples and demonstrates that these approaches are useful in various data modeling problems. Finally, Section 8 summarizes the results.

2. Laplace Expansion and Its Double Form

To start with the main formulation of the Laplace expansion, consider a square matrix A of the nth order, with elements $a_{jk}$, where j, k = 1, 2, …, n:
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \tag{1}$$
Due to the Laplace expansion [13], the determinant det(A) of the matrix (1) can be presented as the sum of the elements of any row multiplied by the cofactors of these elements, so for any jth row the determinant equals
$$\det A = \sum_{k=1}^{n} (-1)^{j+k} a_{jk}\, m_{jk}, \tag{2}$$
where $m_{jk}$ are the minors, that is, the determinants of the (n − 1)th order submatrices obtained by removing the jth row and the kth column from the original matrix (1). The signed minors $(-1)^{j+k} m_{jk}$ are the cofactors of the elements $a_{jk}$, and the expression (2) is called the cofactor expansion of the determinant. Similarly, a determinant can be expanded as the sum of the elements of any column multiplied by their cofactors.
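A minimal numerical sketch of the cofactor expansion (2) in R (the environment used for the examples in Section 7); the function name laplace_row is illustrative only:
```r
# Sketch: the cofactor expansion (2) along any row reproduces the built-in det().
laplace_row <- function(A, j = 1) {
  n <- ncol(A)
  terms <- sapply(seq_len(n), function(k) {
    m_jk <- det(A[-j, -k, drop = FALSE])   # minor: drop the jth row and kth column
    (-1)^(j + k) * A[j, k] * m_jk          # element times its cofactor
  })
  sum(terms)
}

set.seed(1)
A <- matrix(rnorm(16), 4, 4)
c(laplace_row(A, j = 2), det(A))           # the two values coincide
```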
By the same pattern, a determinant can be expressed via the double sum of the products of elements in any row and column, weighted by their cofactors as well. Indeed, expanding the minors $m_{jk}$ in (2) by the elements in a jth row yields the expression for the dual expansion.
Let us consider such a formula in more detail. Suppose that matrix A (1) is extended to a matrix B of the (n + 1)th order by an additional first row with elements $x_k$, an additional first column with elements $b_j$, and an additional element y in the upper left corner:
$$B = \begin{pmatrix} y & x_1 & x_2 & \cdots & x_n \\ b_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ b_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}. \tag{3}$$
The determinant of matrix B can be reduced to the following formula:
$$\det B = \det A \cdot y - \sum_{j=1}^{n}\sum_{k=1}^{n} (-1)^{j+k} m_{jk}\, b_j x_k. \tag{4}$$
This formula is expressed via the determinant of matrix A and the double sum of weighted products of the elements $x_k$ in the first row and $b_j$ in the first column. The weights in the sum (4) are defined by the minors $m_{jk}$ of matrix A, or, more exactly, by the signed minors $(-1)^{j+k} m_{jk}$, which are the cofactors of the elements $a_{jk}$ of the matrix (1).
By assuming that matrix A is not singular, so its determinant differs from zero, it is possible to represent the expression (4) as
$$\det B = \det A \left( y - \sum_{j=1}^{n}\sum_{k=1}^{n} \frac{(-1)^{j+k} m_{jk}}{\det A}\, b_j x_k \right). \tag{5}$$
The elements of the inverted matrix $A^{-1}$ are defined by the cofactors of the transposed matrix divided by the determinant of the matrix, so we can rewrite the result (5) in the matrix form:
$$\det B = \det A \left( y - \sum_{j=1}^{n}\sum_{k=1}^{n} \left(A^{-1}\right)_{kj} b_j x_k \right) = \det A \left( y - x' A^{-1} b \right), \tag{6}$$
where x and b are the column vectors with the elements $x_k$ and $b_j$ in (3), respectively, and the prime denotes transposition, so $x'$ is a row vector.
If the determinant of the matrix (3) equals zero, det(B) = 0, then, for a non-singular matrix A, the result (6) yields the equation
$$y = x' A^{-1} b. \tag{7}$$
We will return to this bilinear form in further consideration.
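As a quick numerical check of the identities (6) and (7), the following R sketch (with random data; the names are chosen here for illustration) builds the bordered matrix (3) and compares its determinant with det(A)(y − x′A⁻¹b):
```r
# Sketch: verify det(B) = det(A) * (y - x' A^{-1} b) for the bordered matrix (3).
set.seed(2)
n <- 3
A <- crossprod(matrix(rnorm(n * n), n))    # a non-singular n x n matrix
b <- rnorm(n); x <- rnorm(n); y <- rnorm(1)

B <- rbind(c(y, x), cbind(b, A))           # matrix (3) of order n + 1
c(det(B), det(A) * (y - drop(t(x) %*% solve(A) %*% b)))   # equal values, as in (6)

# setting y = x' A^{-1} b makes det(B) vanish, which is the content of (7)
y0 <- drop(t(x) %*% solve(A) %*% b)
det(rbind(c(y0, x), cbind(b, A)))          # zero up to rounding error
```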

3. Multiple Linear Regression in Laplace Expansion

Let us briefly recall some main relations of regression modeling. The multiple linear regression for the dependent variable y and n independent variables xk (k = 1, 2, …, n) can be written as
$$y_i = a_0 + a_1 x_{i1} + a_2 x_{i2} + \cdots + a_n x_{in} + \varepsilon_i, \tag{8}$$
where i denotes observations (i = 1, 2, …, N), the intercept $a_0$ and the regression coefficients $a_j$ are the unknown parameters of the model, and $\varepsilon$ is the error term. The parameters are estimated by minimizing the sum of squared deviations, which is the OLS criterion:
$$S^2 = \sum_{i=1}^{N} \varepsilon_i^2 = \sum_{i=1}^{N} \left( y_i - a_0 - a_1 x_{i1} - \cdots - a_n x_{in} \right)^2. \tag{9}$$
Setting the derivatives of the expression (9) with respect to the unknown parameters equal to zero yields the so-called normal system:
$$\begin{aligned}
\sum_{i=1}^{N} y_i &= a_0 N + a_1 \sum_{i=1}^{N} x_{i1} + \cdots + a_n \sum_{i=1}^{N} x_{in}, \\
\sum_{i=1}^{N} x_{i1} y_i &= a_0 \sum_{i=1}^{N} x_{i1} + a_1 \sum_{i=1}^{N} x_{i1}^2 + \cdots + a_n \sum_{i=1}^{N} x_{i1} x_{in}, \\
&\;\;\vdots \\
\sum_{i=1}^{N} x_{in} y_i &= a_0 \sum_{i=1}^{N} x_{in} + a_1 \sum_{i=1}^{N} x_{in} x_{i1} + \cdots + a_n \sum_{i=1}^{N} x_{in}^2 .
\end{aligned} \tag{10}$$
Dividing the first equation in (10) by N defines the intercept in the regression model:
$$a_0 = \bar{y} - a_1 \bar{x}_1 - \cdots - a_n \bar{x}_n. \tag{11}$$
Substituting the intercept (11) into the other n equations of (10) reduces them to the following equation in matrix form:
$$C_{xx}\, a = c_{xy}, \tag{12}$$
in which the matrix $C_{xx}$ and the vector $c_{xy}$ of the nth order are defined as follows:
$$C_{xx} = X' X, \qquad c_{xy} = X' y. \tag{13}$$
In the relations (13), X denotes the N × n matrix of the centered observations of the predictors, y is the vector of the centered observations of the dependent variable, and X’ is the transposed matrix. The symmetric matrix $C_{xx}$ and the vector $c_{xy}$ in (12) and (13) are the sample covariance matrix of the predictors x among themselves and the vector of their covariances with the dependent variable y, respectively. Resolving Equation (12) for the vector a of the coefficients of regression yields the expression
$$a = C_{xx}^{-1} c_{xy}, \tag{14}$$
where $C_{xx}^{-1}$ is the inverted matrix. With the obtained parameters (14), the model (8) for the centered variables x and y can be presented as follows:
$$y = x' a = x' C_{xx}^{-1} c_{xy}, \tag{15}$$
where $x'$ is the row vector of predictors and y is the value of the dependent variable predicted by the regression.
The obtained solution (15) coincides with Equation (7). Thus, if in the matrix (3) we use the covariances, with the vector $b = c_{xy}$ and the matrix $A = C_{xx}$, then the regression model corresponds to the following equation for the determinant of the matrix (3):
$$\begin{vmatrix} y & x' \\ c_{xy} & C_{xx} \end{vmatrix} = 0, \tag{16}$$
which reduces to the solution (15), proving that the regression model can be expressed in terms of the Laplace expansion (7).
For an explicit example, let us use standardized variables, centered and normalized by their standard deviations, so that, in place of the covariances, we have correlations. The model with two predictors can be presented as follows:
$$y = \beta_1 x_1 + \beta_2 x_2, \tag{17}$$
where the parameters are so-called beta coefficients of the normalized equation. The matrix and vector of correlations (13) for the model (17) are
$$C_{xx} = \begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix}, \qquad c_{xy} = \begin{pmatrix} r_{1y} \\ r_{2y} \end{pmatrix}. \tag{18}$$
Then, relation (16) can be written explicitly as
$$\begin{vmatrix} y & x_1 & x_2 \\ r_{1y} & 1 & r_{12} \\ r_{2y} & r_{12} & 1 \end{vmatrix} = 0. \tag{19}$$
Applying the Laplace expansion (2) by the elements of the first row, which are the names of the variables, produces the equation
$$y \begin{vmatrix} 1 & r_{12} \\ r_{12} & 1 \end{vmatrix} - x_1 \begin{vmatrix} r_{1y} & r_{12} \\ r_{2y} & 1 \end{vmatrix} + x_2 \begin{vmatrix} r_{1y} & 1 \\ r_{2y} & r_{12} \end{vmatrix} = 0. \tag{20}$$
By finding determinants of the second order, we resolve Equation (20) for the variable y:
$$y = \frac{r_{1y} - r_{2y} r_{12}}{1 - r_{12}^2}\, x_1 + \frac{r_{2y} - r_{1y} r_{12}}{1 - r_{12}^2}\, x_2 . \tag{21}$$
This is the two-predictor regression (17) with beta coefficients. The Laplace expansion by the first column in (19) yields the equation
$$y = \frac{x_1 - x_2 r_{12}}{1 - r_{12}^2}\, r_{1y} + \frac{x_2 - x_1 r_{12}}{1 - r_{12}^2}\, r_{2y}, \tag{22}$$
which can also be regrouped to the solution (21). Substituting values of two predictors into the determinant (19) or into the explicit Equations (21) and (22) yields a predicted value of the dependent variable.
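To illustrate, here is a brief R sketch (with simulated data, not the paper’s example) checking that the beta coefficients from the expansion (21) coincide with OLS on the standardized variables:
```r
# Sketch: beta coefficients of the two-predictor standardized model from (21)
# versus lm() applied to the standardized data.
set.seed(3)
N <- 50
x1 <- rnorm(N); x2 <- 0.6 * x1 + rnorm(N); y <- x1 + 0.5 * x2 + rnorm(N)
zs <- data.frame(scale(cbind(y, x1, x2)))   # standardized variables

r12 <- cor(x1, x2); r1y <- cor(x1, y); r2y <- cor(x2, y)
c((r1y - r2y * r12) / (1 - r12^2),          # beta1 from (21)
  (r2y - r1y * r12) / (1 - r12^2))          # beta2 from (21)
coef(lm(y ~ x1 + x2 - 1, data = zs))        # the same beta coefficients
```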

4. Multivariate Interpolation in Laplace Expansion

The Laplace expansion has a useful geometric interpretation. Consider a system of linear equations with the given coefficients $a_{jk}$ and $b_j$, and variables $x_k$:
$$\begin{aligned}
b_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n, \\
b_2 &= a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n, \\
&\;\;\vdots \\
b_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n .
\end{aligned} \tag{23}$$
This system geometrically corresponds to the hyperplane of the nth order going through n multidimensional points with the coordinates defined by the coefficients in each equation of (23), which are $(b_1, a_{11}, a_{12}, \ldots, a_{1n})$, $(b_2, a_{21}, a_{22}, \ldots, a_{2n})$, …, $(b_n, a_{n1}, a_{n2}, \ldots, a_{nn})$. A practical way of finding the hyperplane going via these given points is known in analytic geometry [14,15]. It involves solving the following determinant equation:
$$\begin{vmatrix} y & x_1 & x_2 & \cdots & x_n \\ b_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ b_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = 0. \tag{24}$$
The elements of this determinant match the matrix (3), and Equation (24) corresponds to expression (16). The first row of (24) contains the names of the variables, and the other rows contain the parameters of the system (23). Substituting the names in the first row of (24) with the values in any other row leads to a determinant with two identical rows. Such a determinant equals zero, so this hyperplane goes exactly via each of the multidimensional points.
The problem of multivariate interpolation becomes easily solvable by the linear interpolation in the approach (23) and (24): indeed, using the values of the predictors at a needed point in place of the names in (24) yields the value of the outcome. Expanding the determinant by the elements of the first row produces the equation of the hyperplane in an explicit formula for interpolation. The linear interpolation can be extended to polynomial interpolation by many variables if, besides the linear items, we also include quadratic terms, mixed effects, higher powers of variables, and other kinds of polynomial or nonlinear items in the model. All such items can be denoted as new variables and used in the determinant (24) for multivariate interpolation.
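A small R sketch of this interpolation (with randomly generated points; the function and variable names are illustrative) follows; it uses the linearity of the determinant (24) in y to solve the equation:
```r
# Sketch: linear interpolation through n points via the determinant Equation (24).
# det of the bordered matrix is linear in y, so the root is -det(B|y=0)/det(A).
interp_hyperplane <- function(A, b, x_new) {
  B0 <- rbind(c(0, x_new), cbind(b, A))   # bordered matrix (24) with y set to 0
  -det(B0) / det(A)                       # y solving det = 0; equals x' A^{-1} b
}

set.seed(4)
n <- 3
A <- matrix(runif(n * n), n); b <- runif(n)
x_new <- runif(n)
c(interp_hyperplane(A, b, x_new),
  drop(t(x_new) %*% solve(A) %*% b))      # same value, Equation (7)
c(interp_hyperplane(A, b, A[1, ]), b[1])  # the hyperplane reproduces each given point
```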
Let us return to the normal system (10) of the non-centered and non-standardized variables and show how it can be formulated in terms of multivariate interpolation through a set of special multidimensional points. The OLS model with 1 + n parameters of the intercept and coefficients of regression (9) can be presented as the hyperplane going through the 1 + n points of the weighted mean values of all variables. More specifically, assuming that the total of any x differs from zero, we divide each jth equation in (10) by the coefficient at the intercept $a_0$ in it, so this system reduces to the following one:
$$\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N} y_i &= a_0 + a_1 \frac{1}{N}\sum_{i=1}^{N} x_{i1} + \cdots + a_n \frac{1}{N}\sum_{i=1}^{N} x_{in}, \\
\frac{\sum_{i=1}^{N} x_{i1} y_i}{\sum_{i=1}^{N} x_{i1}} &= a_0 + a_1 \frac{\sum_{i=1}^{N} x_{i1} x_{i1}}{\sum_{i=1}^{N} x_{i1}} + \cdots + a_n \frac{\sum_{i=1}^{N} x_{i1} x_{in}}{\sum_{i=1}^{N} x_{i1}}, \\
&\;\;\vdots \\
\frac{\sum_{i=1}^{N} x_{in} y_i}{\sum_{i=1}^{N} x_{in}} &= a_0 + a_1 \frac{\sum_{i=1}^{N} x_{in} x_{i1}}{\sum_{i=1}^{N} x_{in}} + \cdots + a_n \frac{\sum_{i=1}^{N} x_{in} x_{in}}{\sum_{i=1}^{N} x_{in}} .
\end{aligned} \tag{25}$$
The system (25) can be represented via the mean values, denoted by bars:
$$\begin{aligned}
\bar{y} &= a_0 + a_1 \bar{x}_1 + \cdots + a_n \bar{x}_n, \\
\bar{y}_{w1} &= a_0 + a_1 \bar{x}_{1.w1} + \cdots + a_n \bar{x}_{n.w1}, \\
&\;\;\vdots \\
\bar{y}_{wn} &= a_0 + a_1 \bar{x}_{1.wn} + \cdots + a_n \bar{x}_{n.wn},
\end{aligned} \tag{26}$$
with the means and weighted means defined as
$$\begin{aligned}
&\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \quad \bar{x}_1 = \frac{1}{N}\sum_{i=1}^{N} x_{i1}, \quad \ldots, \quad \bar{x}_n = \frac{1}{N}\sum_{i=1}^{N} x_{in}, \\
&\bar{y}_{w1} = \sum_{i=1}^{N} w_{i1} y_i, \quad \bar{x}_{1.w1} = \sum_{i=1}^{N} w_{i1} x_{i1}, \quad \ldots, \quad \bar{x}_{n.w1} = \sum_{i=1}^{N} w_{i1} x_{in}, \\
&\qquad\vdots \\
&\bar{y}_{wn} = \sum_{i=1}^{N} w_{in} y_i, \quad \bar{x}_{1.wn} = \sum_{i=1}^{N} w_{in} x_{i1}, \quad \ldots, \quad \bar{x}_{n.wn} = \sum_{i=1}^{N} w_{in} x_{in}.
\end{aligned} \tag{27}$$
The weights of the ith observations in the total of each variable $x_j$ are
$$w_{ij} = \frac{x_{ij}}{\sum_{i=1}^{N} x_{ij}}, \qquad \sum_{i=1}^{N} w_{ij} = 1, \tag{28}$$
so the weights for each jth variable total one. Thus, $\bar{y}_{wj}$ in (26) and (27) are the mean values of y, and $\bar{x}_{k.wj}$ are the mean values of $x_k$, weighted by the jth set of weights built from $x_j$ (j = 1, 2, …, n).
Consequently, the normal system (10) can be reduced to system (26), where each equation describes the hyperplane going through the corresponding (1 + n)-dimensional points defined in the rows of (27). Therefore, linear regression (8) can be seen geometrically as a hyperplane going through the 1 + n points of the mean values (27) of variables weighted by each other variable. Such a hyperplane can be built similarly to the procedure described in (23) and (24), so, for the normal system (26), there is the determinant equation:
$$\begin{vmatrix} y & x_0 & x_1 & \cdots & x_n \\ \bar{y} & 1 & \bar{x}_1 & \cdots & \bar{x}_n \\ \bar{y}_{w1} & 1 & \bar{x}_{1.w1} & \cdots & \bar{x}_{n.w1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \bar{y}_{wn} & 1 & \bar{x}_{1.wn} & \cdots & \bar{x}_{n.wn} \end{vmatrix} = 0. \tag{29}$$
Unlike in (24), there is also a column with $x_0$ for the intercept, which corresponds to the variable identically equal to one.
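The following R sketch (simulated data; names are illustrative) computes the weighted-mean points (27)–(28) and checks that the fitted OLS hyperplane passes through all of them, as the system (26) states:
```r
# Sketch: the OLS hyperplane goes through the 1 + n points of weighted means (27).
set.seed(5)
N <- 40; n <- 3
X <- matrix(runif(N * n, 1, 5), N, n)       # positive predictors, totals nonzero
y <- drop(2 + X %*% c(1, -0.5, 0.3) + rnorm(N))
fit <- lm(y ~ X)

W <- sweep(X, 2, colSums(X), "/")           # weights (28): columns of W sum to one
centers <- rbind(colMeans(X), t(W) %*% X)   # x-coordinates of the 1 + n mean points
y_means <- c(mean(y), t(W) %*% y)           # weighted means of y for the same points
cbind(fitted_at_centers = drop(cbind(1, centers) %*% coef(fit)),
      y_means = y_means)                    # the two columns coincide
```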
Expanding the determinant (29) by the elements of the first row produces an explicit formula for the hyperplane, which is also the regression model. For an explicit example of the model with one predictor, Equation (29) becomes
$$\begin{vmatrix} y & x_0 & x_1 \\ \bar{y} & 1 & \bar{x}_1 \\ \bar{y}_{w1} & 1 & \bar{x}_{1.w1} \end{vmatrix} = y \begin{vmatrix} 1 & \bar{x}_1 \\ 1 & \bar{x}_{1.w1} \end{vmatrix} - x_0 \begin{vmatrix} \bar{y} & \bar{x}_1 \\ \bar{y}_{w1} & \bar{x}_{1.w1} \end{vmatrix} + x_1 \begin{vmatrix} \bar{y} & 1 \\ \bar{y}_{w1} & 1 \end{vmatrix} = 0. \tag{30}$$
With the identity $x_0 = 1$, finding the second-order determinants and resolving Equation (30) for the dependent variable y produces the following expression:
$$y = \frac{\bar{y}\, \bar{x}_{1.w1} - \bar{y}_{w1}\, \bar{x}_1}{\bar{x}_{1.w1} - \bar{x}_1} + \frac{\bar{y}_{w1} - \bar{y}}{\bar{x}_{1.w1} - \bar{x}_1}\, x_1 . \tag{31}$$
With only one predictor, $x_1$ can be written simply as x and $w_1$ as w, so the pair regression (31) reduces to
$$y = \frac{\bar{y}\,\bar{x}_w - \bar{y}_w\, \bar{x}}{\bar{x}_w - \bar{x}} + \frac{\bar{y}_w - \bar{y}}{\bar{x}_w - \bar{x}}\, x = \left( \bar{y} - \frac{\bar{y}_w - \bar{y}}{\bar{x}_w - \bar{x}}\, \bar{x} \right) + \frac{\bar{y}_w - \bar{y}}{\bar{x}_w - \bar{x}}\, x = a_0 + a_1 x . \tag{32}$$
The regression line (32) goes via the two points of means and weighted means of the variables because the equality $x = \bar{x}$ yields $y = \bar{y}$, and the equality $x = \bar{x}_w$ yields $y = \bar{y}_w$. The slope in (32) can be transformed to the expression
$$a_1 = \frac{\bar{y}_w - \bar{y}}{\bar{x}_w - \bar{x}} = \frac{\dfrac{\sum_{i=1}^{N} y_i x_i}{\sum_{i=1}^{N} x_i} - \bar{y}}{\dfrac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} x_i} - \bar{x}} = \frac{\sum_{i=1}^{N} y_i x_i - N \bar{y}\,\bar{x}}{\sum_{i=1}^{N} x_i^2 - N \bar{x}^2} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)}, \tag{33}$$
which is the quotient of the sample covariance of x with y and the variance of the independent variable x, so it coincides with the regular definition of the slope for the pair regression. The intercept in (32) is
$$a_0 = \bar{y} - \frac{\bar{y}_w - \bar{y}}{\bar{x}_w - \bar{x}}\, \bar{x} = \bar{y} - a_1 \bar{x}, \tag{34}$$
which corresponds to the relation (11) for the case of one predictor.
The normal system of equations considered above in (10), (25) and (26), or in the matrix form (12), can contain an ill-conditioned matrix with the determinant close to zero, so inversion of such a matrix produces large values, inducing inflated parameters of regression. This situation requires imposing regularization with a penalizing function, which is commonly added to the OLS objective. However, as shown below, it is possible to apply a penalizing function directly to the normal system.

5. Regularization of the Normal System

For a clear exposition, let us start with the regular ridge regression (RR). The OLS criterion (9) in the matrix form for the standardized variables, with the added penalizing function of the quadratic norm of the standardized beta-coefficients, is defined as
$$S^2 = \| y - X \beta_{RR} \|^2 + k \| \beta_{RR} \|^2 = 1 - 2 \beta_{RR}' c_{xy} + \beta_{RR}' C_{xx} \beta_{RR} + k\, \beta_{RR}' \beta_{RR}, \tag{35}$$
where $\beta_{RR}$ denotes the vector of ridge regression parameters and k is a small positive ridge parameter. The notations (13) are used for the assumed standardized variables. Minimizing the objective (35) with respect to this vector yields the system of equations
$$(C_{xx} + k I)\, \beta_{RR} = c_{xy}, \tag{36}$$
where I is the identity matrix of the nth order. Equation (36) presents the normal system (12) with the added scalar matrix of the constant k. Resolving (36) yields
$$\beta_{RR} = (C_{xx} + k I)^{-1} c_{xy}, \tag{37}$$
which is the vector of the beta-coefficients for the RR model. The ridge regression solution (37) exists even for a singular correlation matrix Cxx, and it reduces to the OLS solution (14) when k reaches zero.
Instead of penalizing the OLS objective (35), it is possible to apply the ridge regularization term to the normal system (12) for new parameters:
$$NS^2 = \| c_{xy} - C_{xx} \beta_{NS} \|^2 + k \| \beta_{NS} \|^2 = c_{xy}' c_{xy} - 2 \beta_{NS}' C_{xx} c_{xy} + \beta_{NS}' C_{xx}^2 \beta_{NS} + k\, \beta_{NS}' \beta_{NS}, \tag{38}$$
where NS denotes the normal system. The derivative of the objective (38) with respect to the vector $\beta_{NS}$ produces the equation
$$(C_{xx}^2 + k I)\, \beta_{NS} = C_{xx}\, c_{xy}; \tag{39}$$
then, the ridge solution by the regularized normal system becomes as follows:
$$\beta_{NS} = (C_{xx}^2 + k I)^{-1} C_{xx}\, c_{xy}. \tag{40}$$
A similar solution, corresponding to the parameter k = 1, was obtained for NS regularization by another criterion in [30] (note that there is a typo with the missing sign of inversion for the matrix in formula (25) of that work).
Another useful step in building a regularized solution $\beta$ is the additional adjustment to the vector $\beta_{adj}$ by a constant term q, which improves the quality of the data fit:
$$\beta_{adj} = q\, \beta . \tag{41}$$
For any obtained regularized solution of beta-coefficients $\beta$, the term q is defined as
$$q = \frac{\beta' c_{xy}}{\beta' C_{xx} \beta}. \tag{42}$$
The maximum value of the coefficient of multiple determination, often used for the estimation of the regression quality of fit, with the adjustment (41) and (42), equals
$$R_{adj}^2 = \frac{(\beta' c_{xy})^2}{\beta' C_{xx} \beta}. \tag{43}$$
More detail on this adjustment parameter is given in [30].
Substituting the ridge solutions (37) into (42) and using the obtained constant in (41) leads to the following adjusted ridge solution:
$$\beta_{RR.adj} = \frac{c_{xy}' (C_{xx} + k I)^{-1} c_{xy}}{c_{xy}' (C_{xx} + k I)^{-2} C_{xx}\, c_{xy}}\, (C_{xx} + k I)^{-1} c_{xy}. \tag{44}$$
Similarly, by using the normal system solution (40), we obtain the term (42) and the adjusted solution:
$$\beta_{NS.adj} = \frac{c_{xy}' (C_{xx}^2 + k I)^{-1} C_{xx}\, c_{xy}}{c_{xy}' (C_{xx}^2 + k I)^{-2} C_{xx}^3\, c_{xy}}\, (C_{xx}^2 + k I)^{-1} C_{xx}\, c_{xy}. \tag{45}$$
Note that the symmetric matrices in the quotients (44) and (45) commute. The adjusted solutions (44) and (45) produce the maximum possible data fit among the vectors with the structures (37) or (40), respectively.
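A compact R sketch of the two regularized solutions (37) and (40) with the adjustment (41)–(43) is given below; Cxx and cxy stand for the correlation matrix and the correlation vector of standardized data, and the function name is illustrative:
```r
# Sketch: ridge of the OLS objective (37), ridge of the normal system (40),
# and the adjustment (41)-(43) applied to either solution.
ridge_solutions <- function(Cxx, cxy, k) {
  I <- diag(nrow(Cxx))
  b_rr <- solve(Cxx + k * I, cxy)                  # Equation (37)
  b_ns <- solve(Cxx %*% Cxx + k * I, Cxx %*% cxy)  # Equation (40)
  adjust <- function(b) {                          # q from (42), beta_adj from (41)
    q <- drop(t(b) %*% cxy) / drop(t(b) %*% Cxx %*% b)
    drop(q * b)
  }
  r2_adj <- function(b) drop(t(b) %*% cxy)^2 / drop(t(b) %*% Cxx %*% b)  # (43)
  list(rr = drop(b_rr), rr_adj = adjust(b_rr), rr_R2 = r2_adj(b_rr),
       ns = drop(b_ns), ns_adj = adjust(b_ns), ns_R2 = r2_adj(b_ns))
}
# usage: with standardized data, Cxx <- cor(X); cxy <- cor(X, y); ridge_solutions(Cxx, cxy, 0.4)
```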

6. Mahalanobis Distance to the Points of Means

The weighted mean values (27) can also be applied for facilitating the estimation of some nonlinear models by their transformation to the corresponding linear link functions. Let us consider grouping the observations on the variables $x_{ij}$ around the points (27) of the weighted means $\bar{x}_{j.wk}$ of the predictors. As a measure of distance from a multivariate observation to the points of mean values for variables measured in different units, the Mahalanobis distance can be employed. The n + 1 points of means correspond to the n + 1 centers with the coordinates given in the kth rows of (27), without the mean values of the y variable. A distance from an ith point to the kth center can be defined by the Mahalanobis distance as follows:
$$d_{ik} = \sqrt{(x_i - m_k)'\, C_{xx}^{-1}\, (x_i - m_k)}, \tag{46}$$
where $C_{xx}^{-1}$ is the inverted covariance matrix, $x_i$ is the column vector of the ith observation values of all x variables, and $m_k$ is the column vector of the means of the x variables in each kth row of (27):
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{in}), \qquad m_k = \left( \bar{x}_{1.wk}, \bar{x}_{2.wk}, \ldots, \bar{x}_{n.wk} \right), \tag{47}$$
including the first row in (27) with the non-weighted mean values. A smaller $d_{ik}$ (46) corresponds to a higher level of belonging of the ith observation to the kth cluster of mean values, so the weights of such a belonging could be defined by the reciprocal distances $1/d_{ik}$. However, such a metric cannot be used in the case $d_{ik} = 0$, which can happen in real data.
The weights for the correspondence of an ith observation $x_i$ to a kth center $m_k$ (47) can be better defined via the probability density function (pdf) of the multinormal distribution based on the Mahalanobis distance (46):
$$p(d_{ik}) = \frac{1}{\sqrt{\det(2\pi C_{xx})}} \exp\!\left( -\frac{d_{ik}^2}{2} \right). \tag{48}$$
A smaller $d_{ik}$ corresponds to a larger pdf value (48), and for $d_{ik} = 0$ the pdf reaches its maximum, so there is no singularity. It is convenient to use the weights $v_{ik}$ of belonging of an ith observation to a kth group, defined by the pdf values (48) normalized within each ith observation:
$$v_{ik} = \frac{p(d_{ik})}{\sum_{k=0}^{n} p(d_{ik})}. \tag{49}$$
So, for each ith observation, the total of the weights across the groups of means equals one, $\sum_{k=0}^{n} v_{ik} = 1$. With the weights (49), the weighted means of the dependent variable can be found in relation to each kth group. Note that the group k = 0 corresponds to the predictor $x_0$ of the intercept in the first row of the Formulae (25)–(27).
Suppose the dependent variable y is presented by a binary variable with 0 and 1 values. With the weights (49), the weighted means for y are
$$\bar{y}_k = \sum_{i=1}^{N} v_{ik}\, y_i. \tag{50}$$
The binary outcome model is commonly built in the form of the logistic regression by the maximum likelihood criterion. With this model, the probability in each ith point of observations can be found by the expression
$$p_i = \frac{\exp(b_0 + b_1 x_{i1} + \cdots + b_n x_{in})}{1 + \exp(b_0 + b_1 x_{i1} + \cdots + b_n x_{in})}, \tag{51}$$
where $b_j$ are the estimated logit regression parameters. The model (51) can be transformed into the linear link function:
$$\ln\!\left( \frac{p_i}{1 - p_i} \right) = b_0 + b_1 x_{i1} + \cdots + b_n x_{in}. \tag{52}$$
It is clear that the binary 0-or-1 values of the original outcome variable $y_i$ cannot be used under the logarithm in (52). However, we can define the new dependent variable z on the left-hand side of (52) via the proportions (50) as follows:
$$z_k = \ln\!\left( \frac{\bar{y}_k}{1 - \bar{y}_k} \right). \tag{53}$$
Then, using the values (53) as the dependent variable in Equation (26), we solve this normal system of equations with respect to the parameters of logistic regression in the linearized form (52). The obtained parameters can serve in the logit Equation (51) for estimating probability at each observation point.
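A sketch of the whole procedure (46)–(53) in R follows. It is an illustrative implementation, not the author's code; in particular, the weighted means used for (50) are normalized here over the observations so that they stay within (0, 1) and can serve as proportions in (53).
```r
# Sketch: Mahalanobis-based linear link for a binary outcome y (0/1 values).
mahal_linear_link <- function(X, y) {
  W <- sweep(X, 2, colSums(X), "/")             # weights (28)
  centers <- rbind(colMeans(X), t(W) %*% X)     # the 1 + n centers m_k from (27)
  Cxx <- cov(X)                                 # covariance matrix of the predictors

  d2 <- apply(centers, 1, function(m) mahalanobis(X, m, Cxx))  # squared distances (46)
  pd <- exp(-d2 / 2)              # kernel of the pdf (48); the constant cancels in (49)
  V  <- pd / rowSums(pd)          # weights v_ik normalized within each observation, (49)

  ybar <- drop(t(V) %*% y) / colSums(V)         # weighted proportions for (50), kept in (0, 1)
  z <- log(ybar / (1 - ybar))                   # linear-link outcomes (53)
  solve(cbind(1, centers), z)                   # normal system (26) solved for the parameters
}
# usage: parameters for the logit form (51), e.g. mahal_linear_link(X, origin) with the data of Section 7
```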

7. Numerical Examples

From the MASS package of the R software, the dataset Cars93 was taken for the numerical examples. There are 93 observations on different car brands and models, measured by 27 numerical, ordinal, and nominal variables. The following eight numerical variables were used as predictors: x1—Engine size (liters), x2—Horsepower, x3—Rev. per mile (engine revolutions per mile in highest gear), x4—Fuel tank capacity (US gallons), x5—Length (inches), x6—Wheelbase (inches), x7—Width (inches), x8—Weight (pounds). As the dependent variable y, the Price (in USD 1000) is taken for the linear models with a numerical outcome, and the binary variable Origin (USA and non-USA companies as 1 and 0 values, respectively) is taken for the logistic regression.
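The data preparation can be sketched in R as follows (assuming the Cars93 data frame of the MASS package with its standard column names; the variable names defined here are illustrative):
```r
# Sketch of the data setup for the numerical examples.
library(MASS)
data(Cars93)
X <- as.matrix(Cars93[, c("EngineSize", "Horsepower", "Rev.per.mile",
                          "Fuel.tank.capacity", "Length", "Wheelbase",
                          "Width", "Weight")])
price  <- Cars93$Price                        # numeric outcome for Tables 1 and 2
origin <- as.numeric(Cars93$Origin == "USA")  # binary outcome (1 = USA) for Tables 3 and 4

Cxx <- cor(X)                                 # correlation matrix of the predictors, as in (13)
cxy <- cor(X, price)                          # correlations with the outcome
```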
Table 1, in its first two numerical columns, presents the pair correlations r of the dependent variable of Price with the predictors and the beta-coefficients of the OLS linear regression (14) by normalized predictors.
The next five columns display the ridge regressions (37) adjusted by the relations (41) and (42) or (44) with different values of the ridge parameter k. The OLS model corresponds to the case k = 0, and all regressions show how their coefficients behave in profiling by the k parameter. The last three rows in Table 1 present the coefficient of multiple determination R2, the parameter of adjustment q (42), and the coefficient R2adj adjusted by this parameter. With bigger k, the quality of fit R2 of the ridge regressions diminishes, although the adjustment parameter q improves it to R2adj values.
Coefficients of the pairwise regressions of the price on each single predictor coincide with the pair correlations for the standardized data. Assuming that all the pairwise models show a meaningful relation of the outcome with the predictors, we can notice that, in the OLS multiple regression, the parameter at x3 becomes positive, and at x7 it becomes negative, so of the opposite directions to those observed in the pairwise relations. Such a change of sign in the coefficients corresponds to the effects of multicollinearity among the predictors in the multiple regression. Ridge regression makes the model less prone to the collinearity impact, and by increasing the parameter k, we can reach a multiple ridge regression with coefficients of the same signs as in the pairwise models or in the correlations. However, as we see in Table 1, only with the ridge parameter growing to the rather big value of k = 1 does the coefficient at x3 become negative, as in the correlation, while the coefficient at x7 still cannot reach a positive value as in the pair relation.
A possibility to achieve multiple regression coefficients with the signs coinciding with the pairwise models is the ridge penalizing applied to the normal system itself, as described in the derivation (38)–(40). Table 2 is built similarly to the previous table, starting with the correlations and the OLS model, but the next columns present the results of the ridge regularization of the normal system with the adjustment of (40)–(42) or (45).
The results in Table 2 demonstrate that, already with the ridge parameter k = 0.4, the coefficients in the multiple regression all have the same signs as the coefficients in the pairwise models or the correlations. It means that ridge penalizing—not in the original OLS objective but in the normal system—presents a better way to obtain interpretable solutions for multiple linear models.
Let us consider the Origin binary variable in the OLS modeling by the same eight predictors. Its solution can be reduced to the normal system of equations presented via the weighted mean values, as described in relations (25)–(27). Table 3 shows the coefficients of the normal system given in the weighted mean values (26). The first column in Table 3 presents the weighted means of the dependent variable. The next column, of the variable x0, corresponds to the intercept. After that, the weighted means of the predictors are on the right-hand side. The values in each row define the nine multidimensional points through which the hyperplane of the linear regression (8) goes. Solving this normal system produces the coefficients of the OLS regression model.
For the Origin binary outcome, Table 4 in the first two numeric columns presents the non-standardized coefficients of the OLS and the logit regressions. The last row shows the coefficient of multiple determination R2 for the OLS model and the pseudo-R2 for the logit model (defined as one minus the quotient of the residual deviance by the null deviance). These models have a similar quality of fit and prediction.
The last column in Table 4 shows the estimation via the linear link by the Mahalanobis distance and weighted proportions described in the formulae (46)–(53). The R2 in this case equals one, but it only indicates that this model corresponds to the interpolation via the points of the weighted means, so its residuals equal zero. The three solutions in Table 4 look different, but their correlations are very high. Correlating without the intercepts yields 0.9496 for OLS and Logit, 0.8996 for OLS and Linear-link, and 0.9906 for Logit and Linear-link. With the intercepts, these values become 0.9986, 0.9990, and 0.9998, respectively.
With these three models, we can estimate the fitted values for the outcome variable y and find correlations between them, which are 0.9119 between OLS and Logit, 0.9875 between OLS and Linear-link, and 0.9527 between the Logit and Linear-link models. Thus, these vectors of fitted values are highly correlated; however, some OLS predictions can turn out to be negative or above one, while both Logit and Linear-link always produce meaningful probability values in the [0, 1] range. The predicted values can also be dichotomized to 0 or 1 for values below 0.5 and equal to or higher than 0.5; then, it is possible to build the cross-tables of counts for the original versus the predicted values. The quality of prediction can be evaluated by the hit rate, which is the share of correct 0 and 1 predictions of the dependent binary variable among the total number of observations. The hit rate reached by each of the three predictions is about 83%, so, in general, all these models are of a good quality of fit and prediction.
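The hit-rate calculation referred to above can be sketched as follows (an illustrative helper, not code from the paper):
```r
# Sketch: hit rate of dichotomized predictions against the binary outcome.
hit_rate <- function(fitted_values, y, cut = 0.5) {
  pred <- as.numeric(fitted_values >= cut)    # dichotomize at 0.5
  mean(pred == y)                             # share of correct 0/1 predictions
}
# usage, e.g. for the OLS model: hit_rate(fitted(lm(origin ~ X)), origin)
```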

8. Summary

The work described several theoretical relations and practical applications based on the normal system of equations used for the estimation of parameters in multiple linear regression. The considered techniques further develop the approaches started in the works [16,17,30,31,36]. Resolving the normal system and finding new relations are described in terms of the Laplace expansion of the determinant by its cofactors and of the double Laplace expansion. The new approaches include the ridge regularization of the normal system itself and the geometric interpretation of the normal system as a unique hyperplane going via the multivariate points of weighted means. The hyperplane definition in the determinant equation can be used for the multivariable interpolation of linear or nonlinear functions. Mahalanobis distances from the observations to the weighted means are built to find the weighted proportions of the binary outcome variable and to reduce the logistic regression to the linear link function and multidimensional interpolation by it. The numerical examples demonstrate the usefulness of the described tools in data modeling. In particular, they show that ridge penalizing not in the original OLS criterion but in the normal system proves to be a better way of obtaining an interpretable solution for multiple linear models, and that the logistic model can be substituted by the estimation via its linear link with the proportions for the binary outcome evaluated in the Mahalanobis distance approach. Future research can extend the Mahalanobis metrics to generalizations like Bregman, Jensen, and other divergences [37,38]. These properties are convenient for a better understanding and interpretation of multiple regression modeling.

Funding

This research received no external funding.

Data Availability Statement

The dataset for the numerical examples is available in the MASS library of the R software.

Acknowledgments

The author is grateful to the two anonymous reviewers for their comments and suggestions which helped to improve the paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Rencher, A.C.; Schaalje, G.B. Linear Models in Statistics; Wiley: Hoboken, NJ, USA, 2008. [Google Scholar]
  2. Andersen, P.K.; Skovgaard, L.T. Regression with Linear Predictors; Springer: New York, NY, USA, 2010. [Google Scholar]
  3. Hocking, R.R. Methods and Applications of Linear Models: Regression and the Analysis of Variance; Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
  4. Gentle, J.E. Matrix Algebra: Theory, Computations and Applications in Statistics; Springer: Cham, Switzerland, 2017. [Google Scholar]
  5. Young, D.S. Handbook of Regression Methods; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  6. Demidenko, E. Advanced Statistics with Applications in R; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar]
  7. Irizarry, R.A. Introduction to Data Science: Data Analysis and Prediction Algorithms with R; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
  8. Efron, B.; Hastie, T. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science; Cambridge University Press: Cambridge, UK, 2021. [Google Scholar]
  9. Faraway, J.J. Linear Models with Python; CRC Press: Boca Raton, FL, USA, 2021. [Google Scholar]
  10. Montgomery, D.C.; Peck, D.A.; Vining, G.G. Introduction to Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2021. [Google Scholar]
  11. DeAlba, L.M. Determinants and eigenvalues. In Handbook of Linear Algebra; Hogben, L., Ed.; Chapman & Hall: New York, NY, USA; CRC Press: Boca Raton, FL, USA, 2007. [Google Scholar]
  12. Tsatsomeros, M. Matrix equalities and inequalities. In Handbook of Linear Algebra; Hogben, L., Ed.; Chapman & Hall: New York, NY, USA; CRC Press: Boca Raton, FL, USA, 2007. [Google Scholar]
  13. Laplace Expansion. 2025. Available online: https://en.wikipedia.org/wiki/Laplace_expansion (accessed on 24 April 2025).
  14. Korn, G.A.; Korn, T.M. Mathematical Handbook for Scientists and Engineers: Definitions, Theorems, and Formulas for Reference and Review; Courier Corporation: Chelmsford, MA, USA, 2000. [Google Scholar]
  15. Mathematics. 2023. Available online: https://math.stackexchange.com/questions/2723294/how-to-determine-the-equation-of-the-hyperplane-that-contains-several-points (accessed on 24 April 2025).
  16. Lipovetsky, S.; Conklin, M. Regression as weighted mean of partial lines: Interpretation, properties, and extensions. Int. J. Math. Educ. Sci. Technol. 2001, 32, 697–706. [Google Scholar] [CrossRef]
  17. Lipovetsky, S. Multiple regression model as interpolation through the points of weighted means. J. Data Sci. Intell. Syst. 2024, 2, 205–211. [Google Scholar] [CrossRef]
  18. Gasca, M.; Sauer, T. Polynomial interpolation in several variables. Adv. Comput. Math. 2000, 12, 377–410. [Google Scholar] [CrossRef]
  19. Jetter, K.; Buhmann, M.D.; Haussmann, W.; Schaback, R.; Stöckler, J. Topics in Multivariate Approximation and Interpolation; Elsevier: New York, NY, USA, 2005. [Google Scholar]
  20. Olver, P.J. On multivariate interpolation. Stud. Appl. Math. 2006, 116, 201–240. [Google Scholar] [CrossRef]
  21. Steffensen, J.F. Interpolation; Dover: New York, NY, USA, 2006. [Google Scholar]
  22. Multivariate Interpolation. 2025. Available online: https://en.wikipedia.org/wiki/Multivariate_interpolation (accessed on 24 April 2025).
  23. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  24. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  25. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction; Springer: New York, NY, USA, 2001. [Google Scholar]
  26. Hawkins, D.M.; Yin, X. A faster algorithm for ridge regression of reduced rank data. Comput. Stat. Data Anal. 2002, 40, 253–262. [Google Scholar] [CrossRef]
  27. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–489. [Google Scholar] [CrossRef]
  28. Liu, X.Q.; Gao, F. Linearized Ridge Regression Estimator in Linear Regression. Commun. Stat. Theory Methods 2011, 40, 2182–2192. [Google Scholar] [CrossRef]
  29. Hansen, B.E. The Risk of James–Stein and Lasso Shrinkage. Econom. Rev. 2016, 35, 1456–1470. [Google Scholar] [CrossRef]
  30. Lipovetsky, S. Enhanced ridge regressions. Math. Comput. Model. 2010, 51, 338–348. [Google Scholar] [CrossRef]
  31. Lipovetsky, S. Regressions regularized by correlations. J. Mod. Appl. Stat. Methods 2018, 17, 1–16. [Google Scholar] [CrossRef]
  32. McCullagh, P.; Nelder, J.A. Generalized Linear Models; CRC Press: Boca Raton, FL, USA, 1999. [Google Scholar]
  33. Dobson, A. An Introduction to Generalized Linear Models; Chapman & Hall: New York, NY, USA; CRC Press: Boca Raton, FL, USA, 2002. [Google Scholar]
  34. Grafarend, E.W.; Awange, J.L. Applications of Linear and Nonlinear Models: Fixed Effects, Random Effects, and Total Least Squares; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  35. Rao, C.R. Linear Statistical Inference and Its Applications, 2nd ed.; Wiley: New York, NY, USA, 2009. [Google Scholar]
  36. Lipovetsky, S. Data fusion in several algorithms. Adv. Adapt. Data Anal. 2013, 5, 1–12. [Google Scholar] [CrossRef]
  37. Adamcik, M. The information geometry of Bregman divergences and some applications in multi-expert reasoning. Entropy 2014, 16, 6338–6381. [Google Scholar] [CrossRef]
  38. Nielsen, F. On a variational definition for the Jensen-Shannon symmetrization of distances based on the information radius. Entropy 2021, 23, 464. [Google Scholar] [CrossRef] [PubMed]
Table 1. Price modeling: correlations, OLS, and Ridge regression models.

Predictors          r        OLS      k = 0.2  k = 0.4  k = 0.6  k = 0.8  k = 1.0
x1  Engine size     0.597    0.229    0.138    0.125    0.120    0.117    0.115
x2  Horsepower      0.788    0.743    0.595    0.497    0.432    0.386    0.352
x3  Rev. per mile  −0.426    0.127    0.088    0.052    0.028    0.011   −0.001
x4  Fuel tank       0.619    0.083    0.130    0.143    0.144    0.143    0.140
x5  Length          0.504    0.138    0.089    0.075    0.071    0.071    0.071
x6  Wheelbase       0.501    0.282    0.123    0.091    0.081    0.077    0.075
x7  Width           0.456   −0.642   −0.276   −0.159   −0.100   −0.064   −0.040
x8  Weight          0.647    0.128    0.168    0.161    0.153    0.148    0.143
R2                           0.721    0.671    0.627    0.593    0.568    0.547
q                            1.000    1.154    1.217    1.259    1.294    1.326
R2adj                        0.721    0.683    0.647    0.620    0.598    0.582
Table 2. Price modeling: correlations, OLS, and Ridge Normal System models.

Predictors          r        OLS      k = 0.2  k = 0.4  k = 0.6  k = 0.8  k = 1.0
x1  Engine size     0.597    0.229    0.099    0.104    0.105    0.105    0.104
x2  Horsepower      0.788    0.743    0.437    0.336    0.279    0.243    0.218
x3  Rev. per mile  −0.426    0.127    0.039   −0.006   −0.028   −0.041   −0.050
x4  Fuel tank       0.619    0.083    0.184    0.168    0.154    0.144    0.137
x5  Length          0.504    0.138    0.025    0.033    0.043    0.051    0.056
x6  Wheelbase       0.501    0.282    0.013    0.025    0.037    0.045    0.052
x7  Width           0.456   −0.642   −0.023    0.027    0.047    0.059    0.066
x8  Weight          0.647    0.128    0.153    0.139    0.131    0.126    0.122
R2                           0.721    0.601    0.557    0.530    0.511    0.497
q                            1.000    1.128    1.135    1.131    1.126    1.122
R2adj                        0.721    0.683    0.609    0.565    0.537    0.517
Table 3. Origin modeling: Normal system for OLS regression via the weighted means.

        y        x0   x1      x2       x3      x4      x5      x6      x7      x8
x0      0.51613  1    2.6677  143.828  2332.2  16.665  183.20  103.95  69.376  3072.9
x1      0.59331  1    3.0668  158.578  2174.8  17.622  187.59  105.87  70.637  3264.7
x2      0.52938  1    2.9413  162.695  2224.8  17.505  186.10  105.14  70.254  3229.9
x3      0.46889  1    2.4877  137.206  2436.8  16.243  181.08  103.03  68.755  2981.6
x4      0.52813  1    2.8211  151.085  2273.3  17.303  185.17  104.95  69.964  3175.6
x5      0.53052  1    2.7316  146.103  2305.2  16.843  184.36  104.39  69.621  3110.4
x6      0.52498  1    2.7171  145.483  2311.7  16.826  183.98  104.39  69.574  3106.3
x7      0.52790  1    2.7162  145.647  2311.3  16.806  183.85  104.24  69.580  3100.7
x8      0.53669  1    2.8342  151.176  2262.9  17.221  185.44  105.08  70.004  3184.9
Table 4. Origin modeling: OLS, Logit, and Linear link by Mahalanobis distances.

Predictors                  OLS       Logit      Linear Link by Mahalanobis
x0  Intercept              −3.8091   −38.2030   −15.9008
x1  Engine size             0.1663     3.7593     1.3595
x2  Horsepower             −0.0025    −0.0357    −0.0172
x3  Rev. per mile          −0.0003    −0.0015     0.0008
x4  Fuel tank capacity     −0.0414    −0.3962    −0.0564
x5  Length                  0.0031     0.0288     0.0294
x6  Wheelbase              −0.0008    −0.0792     0.0194
x7  Width                   0.0914     0.8695     0.1660
x8  Weight                 −0.0004    −0.0043    −0.0016
R2                          0.4217     0.4599     1.0000
