Review

Isotonic and Convex Regression: A Review of Theory, Algorithms, and Applications

Department of Decision Sciences and Marketing, Adelphi University, Garden City, NY 11530, USA
Mathematics 2026, 14(1), 147; https://doi.org/10.3390/math14010147
Submission received: 17 November 2025 / Revised: 18 December 2025 / Accepted: 28 December 2025 / Published: 30 December 2025
(This article belongs to the Special Issue Stochastic Simulation: Theory and Applications)

Abstract

Shape-restricted regression provides a flexible framework for estimating an unknown relationship between input variables and a response when little is known about the functional form, but qualitative structural information is available. In many practical settings, it is natural to assume that the response changes in a systematic way as inputs increase, such as increasing, decreasing, or exhibiting diminishing returns. Isotonic regression incorporates monotonicity constraints, requiring the estimated function to be nondecreasing with respect to its inputs, while convex regression imposes convexity constraints, capturing relationships with increasing or decreasing marginal effects. These shape constraints arise naturally in a wide range of applications, including economics, operations research, and modern data-driven decision systems, where they improve interpretability, stability, and robustness without relying on parametric model assumptions or tuning parameters. This review focuses on isotonic and convex regression as two fundamental examples of shape-restricted regression. We survey their theoretical properties, computational formulations based on optimization, efficient algorithms, and practical applications, and we discuss key challenges such as non-smoothness, boundary overfitting, and scalability. Finally, we outline open problems and directions for future research.

1. Introduction

In many statistical and data-driven problems, the goal is to understand how a response variable changes as one or more input variables vary. In practice, the exact functional relationship is often unknown and difficult to specify parametrically. Nevertheless, domain knowledge frequently provides qualitative information about the structure of the relationship. For example, demand typically decreases as price increases, waiting times in queueing systems decrease as service capacity increases, and costs often exhibit diminishing returns as resources grow.
Shape-restricted regression is a class of nonparametric methods that incorporate qualitative prior knowledge by constraining the shape of the estimated function, rather than specifying a parametric form. Common shape constraints include monotonicity and convexity, which arise naturally in many applications.
Imposing such constraints fundamentally changes the fitted function. Isotonic regression enforces monotonicity and typically yields a piecewise constant estimator, while convex regression produces a piecewise linear fit with monotone slopes. These estimators can be viewed as projections of the data onto a space of functions satisfying the prescribed shape constraints.
By preventing local fluctuations and enforcing global structural properties, shape-restricted regression often improves stability, interpretability, and robustness relative to unconstrained nonparametric methods. When qualitative relationships are known a priori, such as increasing risk with exposure or diminishing returns, these methods provide flexible, data-driven models that remain aligned with domain knowledge. Isotonic and convex regression are two fundamental and widely studied examples of this broader framework; Figure 1 and Figure 2 later in the paper provide visual illustrations of how monotonicity and convexity constraints shape the fitted functions in practice.
With this motivation in mind, we consider the problem of estimating an unknown regression function from noisy observations. Let $X_1, \ldots, X_n$ denote the input variables and $Y_1, \ldots, Y_n$ the corresponding responses, where each observation is subject to random noise.
Mathematically, we want to estimate an unknown function $f_0 : \Omega \subseteq \mathbb{R}^d \to \mathbb{R}$ from observations $(X_1, Y_1), \ldots, (X_n, Y_n)$, where we assume
$$Y_i = f_0(X_i) + \varepsilon_i,$$
for $i = 1, \ldots, n$, $\Omega$ is a closed convex subset of $\mathbb{R}^d$, and the $\varepsilon_i$'s are independent and identically distributed (iid) mean zero random variables with finite variance $\sigma^2$. This type of problem arises in many settings. For instance, $x$ may represent a parameter in a stochastic system, such as the service rate of a single-server queue, and $f_0(x)$ a performance metric, such as the steady-state mean waiting time in that single-server queue when the service rate equals $x$.
In many applications, there is no closed-form formula for $f_0(x)$. In such cases, a practical approach is to simulate the system at selected design points $x = X_1, \ldots, X_n$, obtain simulated outputs $Y_1, \ldots, Y_n$ as estimates of $f_0(X_1), \ldots, f_0(X_n)$, respectively, and use the data $(X_1, Y_1), \ldots, (X_n, Y_n)$ to approximate $f_0$: this is a typical regression problem. The problem becomes challenging when the relationship cannot be placed into a simple parametric form. However, when $f_0$ is known to satisfy certain structural properties such as monotonicity or convexity, shape-restricted regression offers powerful estimators.
Given such prior shape information, a natural estimator of $f_0$ fits a function $f$ with the prescribed shape to the data set by solving
$$\underset{f \in \mathcal{F}}{\text{minimize}} \ \frac{1}{n} \sum_{i=1}^n \left( Y_i - f(X_i) \right)^2, \qquad (1)$$
where $\mathcal{F}$ is the class of functions satisfying the assumed shape constraint. For example, if $f_0$ is known to be nondecreasing, we can set $\mathcal{F} = \mathcal{F}_m$, where
$$\mathcal{F}_m = \left\{ f : \Omega \to \mathbb{R} \mid f(v) \le f(w) \ \text{if} \ v_i \le w_i \ \text{for} \ i = 1, \ldots, d, \ v = (v_1, \ldots, v_d), \ w = (w_1, \ldots, w_d) \right\}.$$
On the other hand, if $f_0$ is known to be convex, then we can set $\mathcal{F} = \mathcal{F}_c$, where
$$\mathcal{F}_c = \left\{ f : \Omega \to \mathbb{R} \mid f \ \text{is convex, i.e.,} \ f(\alpha v + (1-\alpha) w) \le \alpha f(v) + (1-\alpha) f(w) \ \text{for} \ \alpha \in [0,1] \ \text{and} \ v, w \in \Omega \right\}.$$
Throughout this paper, we focus on these two problems: the case $\mathcal{F} = \mathcal{F}_m$, which leads to isotonic regression, and the case $\mathcal{F} = \mathcal{F}_c$, which leads to convex regression.
One of the first questions that arises is how to compute the solution to (1) numerically when $\mathcal{F} = \mathcal{F}_m$ or $\mathcal{F} = \mathcal{F}_c$. Below, we describe how the solution to (1) can be obtained by solving quadratic programming (QP) problems.

1.1. QP Formulation for Isotonic Regression When d = 1

We begin with the one-dimensional case to provide a simple and intuitive introduction to isotonic regression. In this setting, the monotonicity constraint reduces to a set of ordered inequalities on the fitted values, which clearly illustrates the basic structure of the estimator and its connection to quadratic programming. This case serves as a conceptual foundation for the more general multivariate formulation discussed later.
When $f_0$ is nondecreasing in $x$, as when price increases with demand, we estimate $f_0$ by fitting a nondecreasing function $f$ that minimizes the squared errors $\sum_{i=1}^n (Y_i - f(X_i))^2$. Writing $f_i \equiv f(X_i)$ and assuming the design is ordered so that $X_1 < \cdots < X_n$, the existence of a nondecreasing fit through the $(X_i, f_i)$'s is equivalent to $f_1 \le \cdots \le f_n$. Thus, the estimator solves the following quadratic programming (QP) problem:
$$\underset{f_1, \ldots, f_n \in \mathbb{R}}{\text{minimize}} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 \quad \text{subject to} \quad f_1 \le f_2 \le \cdots \le f_n. \qquad (2)$$
Let $\hat{f}_1^m, \ldots, \hat{f}_n^m$ be a solution to (2). Any nondecreasing function interpolating $((X_i, \hat{f}_i^m) : 1 \le i \le n)$ defines the isotonic regression estimator, and we denote it by $\hat{f}_n^m(\cdot)$; see, for example, Brunk [1].
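For concreteness, the following is a minimal Python sketch of how (2) might be solved with an off-the-shelf convex optimization modeling package; the use of cvxpy, the synthetic data, and the default solver are illustrative assumptions rather than the setup used elsewhere in this paper.

```python
# Minimal sketch: univariate isotonic regression as the QP (2), using cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))            # ordered design points X_1 < ... < X_n
y = np.sqrt(x) + 0.1 * rng.standard_normal(n)    # noisy responses Y_i (illustrative data)

f = cp.Variable(n)                               # fitted values f_1, ..., f_n
objective = cp.Minimize(cp.sum_squares(y - f) / n)
constraints = [f[0:n-1] <= f[1:n]]               # f_1 <= f_2 <= ... <= f_n
cp.Problem(objective, constraints).solve()

f_hat = f.value                                  # isotonic fitted values at the X_i's
```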

1.2. QP Formulation for Convex Regression When d = 1

As with isotonic regression, we first consider convex regression in the one-dimensional setting to highlight its essential structure in the simplest possible form. When the covariate is scalar, convexity translates into monotonicity of successive slopes, leading to transparent linear inequality constraints. This formulation provides intuition for how convexity is enforced and motivates the multivariate extension presented in subsequent sections.
Suppose instead that $f_0$ is convex. Writing $f_i \equiv f(X_i)$ and assuming the design is ordered so that $X_1 < \cdots < X_n$, a convex fit through the $(X_i, f_i)$'s exists if and only if
$$\frac{f_2 - f_1}{X_2 - X_1} \le \cdots \le \frac{f_n - f_{n-1}}{X_n - X_{n-1}}.$$
Hence, the least-squares convex fit solves
$$\underset{f_1, \ldots, f_n \in \mathbb{R}}{\text{minimize}} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 \quad \text{subject to} \quad \frac{f_2 - f_1}{X_2 - X_1} \le \cdots \le \frac{f_n - f_{n-1}}{X_n - X_{n-1}}. \qquad (3)$$
Let $\hat{f}_1^c, \ldots, \hat{f}_n^c$ be a solution to (3). Any convex function interpolating $((X_i, \hat{f}_i^c) : 1 \le i \le n)$ is a convex regression estimator, and we denote it by $\hat{f}_n^c(\cdot)$; see, for example, Hildreth [2].
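A similar sketch applies to (3): the constraints below encode the ordered-slopes characterization of a convex fit. Again, the use of cvxpy and the synthetic data are illustrative assumptions.

```python
# Minimal sketch: univariate convex regression as the QP (3), using cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n = 60
x = np.sort(rng.uniform(-1.0, 1.0, n))           # ordered design points
y = x**2 + 0.1 * rng.standard_normal(n)          # noisy responses from a convex function

f = cp.Variable(n)
# successive slopes (f_{i+1} - f_i) / (X_{i+1} - X_i) must be nondecreasing
left = cp.multiply(f[1:n-1] - f[0:n-2], 1.0 / (x[1:n-1] - x[0:n-2]))
right = cp.multiply(f[2:n] - f[1:n-1], 1.0 / (x[2:n] - x[1:n-1]))
constraints = [left <= right]

cp.Problem(cp.Minimize(cp.sum_squares(y - f) / n), constraints).solve()
f_hat = f.value                                  # convex fitted values at the X_i's
```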

1.3. Scope, Motivation, and Organization of the Review

Over the past several decades, isotonic and convex regression have attracted sustained interest due to their ability to incorporate qualitative structural information into nonparametric estimation; see [3,4] for recent survey papers. Despite extensive theoretical and methodological development, the literature remains fragmented across statistical theory, optimization-based computation, and applied practice.
One central issue concerns computation. Although isotonic and convex regression estimators can be formulated as solutions to constrained least-squares problems, their practical implementation depends critically on efficient algorithms capable of handling large and possibly high-dimensional data sets. Understanding how these estimators can be computed numerically, and what algorithmic trade-offs arise in practice, is therefore a key focus of this review.
A second major theme involves statistical properties. A substantial body of work has established consistency, convergence rates, and limiting distributions under various assumptions, yet these results differ markedly between isotonic and convex regression and between univariate and multivariate settings. Clarifying these similarities and differences is essential for understanding when and how shape-restricted regression methods can be effectively applied.
Beyond theory and computation, isotonic and convex regression play an important role in applications across economics, operations research, and modern data-driven decision systems. At the same time, practical use of these methods reveals recurring challenges, including non-smoothness of the estimators, boundary overfitting, and scalability limitations. Addressing these issues has motivated a range of extensions and regularization strategies, many of which introduce new theoretical and computational questions.
The objective of this paper is to provide an integrated review of isotonic and convex regression that connects these methodological, theoretical, and practical perspectives. We emphasize a unified optimization-based viewpoint, highlight common challenges, and identify open problems that remain insufficiently explored.
The remainder of the paper is organized as follows: Section 2 reviews quadratic programming formulations of isotonic and convex regression. Section 3 focuses on isotonic regression, covering statistical properties, algorithms, applications, and challenges. Section 4 examines convex regression in greater detail. Section 5 presents concluding remarks, and Section 6 discusses open challenges and directions for future research.

1.4. Notation and Definitions

Throughout this paper, we view vectors as columns and write $x^T$ to denote the transpose of a vector $x \in \mathbb{R}^d$. For $x \in \mathbb{R}^d$, we write its $i$th component as $x_i$, so $x = (x_1, \ldots, x_d)^T$. We define $\|x\| = (x_1^2 + \cdots + x_d^2)^{1/2}$ and $\|x\|_\infty = \max_{1 \le i \le d} |x_i|$.
For any convex set $S \subseteq \mathbb{R}^d$ and a convex function $g : S \to \mathbb{R}$, a vector $\beta \in \mathbb{R}^d$ is said to be a subgradient of $g$ at $x \in S$ if $g(y) \ge g(x) + \beta^T (y - x)$ for all $y \in S$. We denote a subgradient of $g$ at $x \in S$ by $\mathrm{subgrad}\, g(x)$. The set of all subgradients of $g$ at $x$ is called the subdifferential of $g$ at $x$, denoted by $\partial g(x)$. When a function $g : S \to \mathbb{R}$ is differentiable at $x$, we denote its gradient at $x$ by $\nabla g(x)$. When $g : \mathbb{R} \to \mathbb{R}$ is differentiable, we denote its derivative at $x$ by $g'(x)$.
For a sequence of random variables $(Z_n : n \ge 1)$ and a sequence of positive real numbers $(\alpha_n : n \ge 1)$, we say $Z_n = O_p(\alpha_n)$ as $n \to \infty$ if, for any $\epsilon > 0$, there exist constants $c$ and $n_0$ such that $P(|Z_n / \alpha_n| > c) < \epsilon$ for all $n \ge n_0$.
For any sequences of real numbers $(a_n : n \ge 1)$ and $(b_n : n \ge 1)$, $a_n = O(b_n)$ if there exist a positive real number $M$ and a real number $n_0$ such that $|a_n| \le M b_n$ for all $n \ge n_0$.
For any sequence of random variables $(A_n : n \ge 1)$ and any random variable $B$, $A_n \Rightarrow B$ means $A_n$ converges in distribution to $B$, or $A_n$ converges weakly to $B$, as $n \to \infty$.

2. The Basic QP Formulation

In this section, we describe how the isotonic and convex regression estimators can be found by solving QP problems. Since there are many efficient algorithms for solving QP problems, expressing the isotonic and convex regression estimators as solutions to QP problems offers a basic way to compute them numerically. In d = 1 , (2) and (3) already provide QP formulations for the isotonic and convex regression estimators, respectively. We now present multivariate analogues.

2.1. Isotonic Regression When d > 1

The main challenge in finding the QP formulation for the multivariate isotonic regression estimator is how to express the monotonicity of a multivariate function as linear inequalities, as in the constraints of (2). We say a function $g$ is nondecreasing if
$$g(v) \le g(w) \quad \text{whenever} \quad v \preceq w$$
for $v = (v_1, \ldots, v_d)$ and $w = (w_1, \ldots, w_d)$ in $\mathbb{R}^d$, where $v \preceq w$ indicates $v_i \le w_i$ for $1 \le i \le d$.
Using this definition, the nondecreasing function with the minimum sum of squared errors can be found by solving the following QP problem in the decision variables $f_1, \ldots, f_n \in \mathbb{R}$:
$$\text{minimize} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 \quad \text{subject to} \quad f_i \le f_j \ \text{if} \ X_i \preceq X_j \ \text{for} \ 1 \le i, j \le n. \qquad (4)$$
Let $\hat{f}_1^m, \ldots, \hat{f}_n^m$ be a solution to (4).
To clarify the form of the resulting estimator, solving the quadratic program (4) yields fitted values f ^ 1 m , , f ^ n m at the observed design points X 1 , , X n . The multivariate isotonic regression estimator is then defined as any nondecreasing function that interpolates these fitted values. In practice, the estimator is typically taken to be a piecewise constant function over regions of the covariate space determined by the order constraints.
As a result, the estimator can be readily evaluated at any point $x \in \Omega$ by assigning it the largest fitted value among the design points $X_i \preceq x$ under the partial order; i.e.,
$$\hat{f}_n^m(x) = \begin{cases} \min_{1 \le i \le n} \hat{f}_i^m, & \text{if} \ X_i \not\preceq x \ \text{for all} \ 1 \le i \le n, \\ \max \left\{ \hat{f}_i^m : X_i \preceq x \right\}, & \text{otherwise.} \end{cases}$$
This representation highlights both the interpretability and the practical usability of the multivariate isotonic regression estimator once the underlying optimization problem has been solved.
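The evaluation rule above is straightforward to implement once the fitted values are available. The short sketch below assumes a design matrix X of shape (n, d) and a vector f_hat of fitted values obtained by solving (4); both names are hypothetical placeholders.

```python
# Sketch: evaluating the multivariate isotonic regression estimator at a new point x.
import numpy as np

def eval_isotonic(x, X, f_hat):
    """Largest fitted value over design points X_i <= x componentwise; if no design
    point is dominated by x, return the smallest fitted value."""
    dominated = np.all(X <= x, axis=1)     # indicator of X_i <= x in the partial order
    if not dominated.any():
        return float(f_hat.min())
    return float(f_hat[dominated].max())
```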

2.2. Convex Regression When d > 1

Again, the key question is how to express the existence of a convex function passing through $(X_1, f_1), \ldots, (X_n, f_n)$ as a set of linear constraints. One should notice that there exists a convex function passing through $(X_1, f_1), \ldots, (X_n, f_n)$ if and only if there exist $\beta_1, \ldots, \beta_n \in \mathbb{R}^d$ satisfying
$$f_i \ge f_j + \beta_j^T (X_i - X_j) \quad \text{for} \ 1 \le i, j \le n;$$
see page 338 of Boyd and Vandenberghe [5]. Here, $\beta_i$ serves as a subgradient of the convex function $f$ at $X_i$. Hence, the least-squares convex fit can be found by solving the following QP problem in the decision variables $f_1, \ldots, f_n \in \mathbb{R}$ and $\beta_1, \ldots, \beta_n \in \mathbb{R}^d$:
$$\text{minimize} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 \quad \text{subject to} \quad f_i \ge f_j + \beta_j^T (X_i - X_j) \ \text{for} \ 1 \le i, j \le n. \qquad (5)$$
Let $\hat{f}_1^c, \ldots, \hat{f}_n^c, \hat{\beta}_1, \ldots, \hat{\beta}_n$ be a solution to (5). Any convex function passing through $(X_1, \hat{f}_1^c), \ldots, (X_n, \hat{f}_n^c)$ can serve as the convex regression estimator in the multidimensional case. However, to remove any ambiguity, we define the convex regression estimator $\hat{f}_n^c(\cdot)$ as follows:
$$\hat{f}_n^c(x) = \max_{1 \le i \le n} \left\{ \hat{f}_i^c + \hat{\beta}_i^T (x - X_i) \right\}$$
for $x \in \Omega$.
In (2)–(5), we established the isotonic and convex regression estimators as solutions to QP problems. Since there are many efficient numerical algorithms to solve QP problems, these QP formulations enable us to compute the isotonic and convex regression estimators numerically.
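As an illustration of how (5) might be set up in practice, the following sketch builds the $O(n^2)$ constraints explicitly with cvxpy and evaluates the resulting max-of-hyperplanes estimator. The data, the small problem size, and the solver defaults are illustrative assumptions, and no attempt is made at the scalability improvements discussed in Section 4.2.

```python
# Minimal sketch: multivariate convex regression as the QP (5), using cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, d = 40, 2
X = rng.uniform(-1.0, 1.0, (n, d))
y = np.sum(X**2, axis=1) + 0.1 * rng.standard_normal(n)

f = cp.Variable(n)          # fitted values f_1, ..., f_n
B = cp.Variable((n, d))     # subgradients beta_1, ..., beta_n (stored as rows)

constraints = [f[i] >= f[j] + cp.sum(cp.multiply(B[j, :], X[i] - X[j]))
               for i in range(n) for j in range(n) if i != j]
cp.Problem(cp.Minimize(cp.sum_squares(y - f) / n), constraints).solve()

def eval_convex(x, X, f_hat, B_hat):
    """Evaluate max_i { f_hat_i + beta_hat_i^T (x - X_i) } at a new point x."""
    return float(np.max(f_hat + np.sum(B_hat * (x - X), axis=1)))

value_at_origin = eval_convex(np.zeros(d), X, f.value, B.value)
```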

3. Isotonic Regression

In this section, we review statistical properties, interesting applications, and some challenges of the isotonic regression estimator. We also review some algorithms for computing the isotonic regression estimator numerically.

3.1. Statistical Properties

3.1.1. Univariate Case

The isotonic regression estimator in one and multiple dimensions was first introduced by Brunk [1]. In one dimension, Brunk [6] established its strong consistency. More specifically, Brunk [6] proved that if $\Omega = (a, b) \subseteq \mathbb{R}$, then for any $c > a$ and $d < b$,
$$\max_{c \le x \le d} \left| \hat{f}_n^m(x) - f_0(x) \right| \to 0$$
as $n \to \infty$ with probability one.
Brunk [7] established the asymptotic distribution theory for the isotonic regression estimator when $d = 1$. Brunk [7] proved that when $f_0'(x) \ne 0$,
$$n^{1/3} \left( \frac{\hat{f}_n^m(x) - f_0(x)}{\left( 4 \sigma^2 f_0'(x) \right)^{1/3}} \right) \Rightarrow \operatorname*{argmin}_{t \in \mathbb{R}} \left\{ W(t) + t^2 \right\}$$
as $n \to \infty$, where $W(\cdot)$ is the two-sided Brownian motion starting from 0.
The $\ell_2$-risk of the isotonic regression estimator, defined by
$$R(f_0, \hat{f}_n^m) \equiv E \left[ \frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n^m(X_i) - f_0(X_i) \right)^2 \right],$$
is proven to have the following bound:
$$R(f_0, \hat{f}_n^m) \le C \, \sigma^{4/3} V(f_0)^{2/3} n^{-2/3}, \qquad (6)$$
where $V(f_0)$ is the total variation of $f_0$ and $C > 0$ is a universal constant. Meyer and Woodroofe [8] proved (6) under the assumption that the $\varepsilon_i$'s are normally distributed, and Zhang [9] proved (6) under the more general assumption that the $\varepsilon_i$'s are iid with mean zero and variance $\sigma^2$.

3.1.2. Multivariate Case

In multiple dimensions, Lim [10] established the strong consistency. Corollary 3.1 of Lim [10] proved that when $\Omega = (0, 1)^d$, for any $\epsilon > 0$,
$$\sup_{x \in [\epsilon, 1 - \epsilon]^d} \left| \hat{f}_n^m(x) - f_0(x) \right| \to 0$$
as $n \to \infty$ with probability one.
Lim [10] also established the rate of convergence and proved that
$$E \left[ \left( \tilde{f}_n^m(X) - f_0(X) \right)^2 \right] = \begin{cases} O_p\!\left( n^{-2/3} \right), & \text{if} \ d = 1, \\ O_p\!\left( n^{-1/2} (\log n)^2 \right), & \text{if} \ d = 2, \\ O_p\!\left( n^{-1/(2(d-1))} \right), & \text{if} \ d > 2, \end{cases}$$
where the expectation is taken with respect to $X$, $X$ has the same distribution as the $X_i$'s, and $\tilde{f}_n^m(\cdot)$, a variant of the isotonic regression estimator, minimizes the sum of squared errors $\sum_{i=1}^n (Y_i - f(X_i))^2$ over all nondecreasing functions $f$ satisfying $|f(x)| \le C$ for all $x \in \Omega$, for some pre-set constant $C$.
Bagchi and Dhar [11] derived the pointwise limit distribution of the isotonic regression estimator in multiple dimensions. In particular, Bagchi and Dhar [11] proved
$$n^{1/(d+2)} \left( \hat{f}_n^m(x) - f_0(x) \right) \Rightarrow H$$
as $n \to \infty$ for some non-degenerate limit distribution $H$.
Han et al. [12] showed that the $\ell_2$-risk of a multivariate isotonic regression estimator is bounded by a constant multiple of $n^{-1/d} (\log n)^4$ for all $d \ge 1$.
These results reveal a pronounced dependence of the convergence behavior on the dimension d. In particular, the rapid deterioration of rates as d increases highlights the intrinsic difficulty of multivariate isotonic regression, even under strong shape constraints. This dimensional effect partially explains the limited practical adoption of multivariate isotonic regression and underscores the need for improved algorithms and regularization strategies.

3.2. Computational Algorithms

Formulations (2) and (3) are QPs with linear constraints, and hence, can be solved by off-the-shelf solvers such as CVX (a package for disciplined convex programming), CPLEX (IBM ILOG CPLEX Optimization Studio), or MOSEK (a commercial optimization solver for large-scale convex problems).
However, there exist algorithms tailored to the specific setting of the isotonic regression estimator. In d = 1 , Barlow et al. [13] discussed a graphical procedure to compute the isotonic regression estimator by identifying it as the greatest common minorant for a cumulative sum diagram, and proposed a tabular procedure called the pool-adjacent-violators (PAV) algorithm. The PAV algorithm computes the isotonic regression estimator in O ( n ) steps. Various modifications of the PAV algorithm have been proposed in the literature; see Brunk [1], Mukerjee [14], Mammen [15], Friedman and Tibshirani [16], Ramsay [17], or Hall and Huang [18], among many others.
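To make the univariate case concrete, the following compact Python sketch implements the pool-adjacent-violators idea with equal weights, written directly from the description above; it assumes the responses are already ordered by their design points and is not the tabular procedure of Barlow et al. [13] verbatim.

```python
# Sketch: pool-adjacent-violators (PAV) for univariate isotonic regression (equal weights).
import numpy as np

def pav(y):
    """Return the nondecreasing least-squares fit to y (y ordered by the design points)."""
    y = np.asarray(y, dtype=float)
    blocks = []                                   # each block is [block mean, block size]
    for value in y:
        blocks.append([value, 1])
        # pool adjacent blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(int(size), mean) for mean, size in blocks])

# Example: pav([1.0, 3.0, 2.0, 4.0]) returns array([1. , 2.5, 2.5, 4. ]).
```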
However, for regression functions of more than one variable, the problem of computing isotonic regression estimators is substantially more difficult, and good algorithms are not available except for special cases. Most of the literature deals with algorithms for multivariate isotonic regression on a grid (see Gebhardt [19], Dykstra and Robertson [20], Lee [21], or Qian and Eddy [22], among others), but little work has been done for the case of random design.
Table 1 summarizes the main computational approaches discussed above.
From a methodological standpoint, existing algorithms for isotonic regression reveal a clear tension between computational efficiency and general applicability. While the pool-adjacent-violators algorithm is exact and optimal in the univariate setting, its reliance on an ordered design severely limits its extension to higher dimensions. Multivariate formulations, on the other hand, are conceptually straightforward but quickly become computationally prohibitive due to the explosion of monotonicity constraints. As a result, many practical implementations either restrict attention to structured designs or sacrifice exactness for scalability, highlighting a gap between theoretical formulations and real-world deployment.

3.3. Applications

In many applications in economics, operations research, biology, and the medical sciences, monotonicity is often assumed to hold or is mathematically proven to hold. For example, in economics, economists often assume that demand slopes downward as a function of price. Monotonicity also applies to indirect utility, expenditure, production, profit, and cost functions; see Gallant and Golub [23], Matzkin [24], or Aït-Sahalia and Duarte [25], among many others. In operations research, the steady-state mean waiting time of a customer in a single-server queue is proven to be nonincreasing as a function of the service rate; see Weber [26]. In biology, the weight or height of a growing organism is known to be nondecreasing over time. In the medical sciences, blood pressure is believed to be a monotone function of tobacco use and body weight (see Moolchan et al. [27]), and the probability of contracting cancer depends monotonically on certain factors such as smoking frequency, drinking frequency, and weight.

Illustration of Isotonic Regression

To illustrate how isotonic regression operates in practice, we consider a simple univariate example in which $f_0$ represents the expected average waiting time of the first 5000 customers in an M/M/1 queue with a first-in/first-out discipline, unit arrival rate, and a service rate $x \in [1.2, 1.3]$, initialized as empty and idle. The service rates are set as $X_i = 1.2 + 0.1(i-1)/n + 0.1/(2n)$ for $1 \le i \le n$. For each $X_i$, the corresponding $Y_i$ is the average waiting time of the first 5000 customers in this M/M/1 queue with a unit arrival rate and a service rate of $X_i$. Once the pairs $(X_i, Y_i)$ for $1 \le i \le n$ are generated, we computed the isotonic regression estimator by solving (2) using CVX.
Figure 1 displays the observed data along with the isotonic regression estimator. The mean squared error of the isotonic regression estimator, $\frac{1}{n} \sum_{i=1}^n (\hat{f}_n^m(X_i) - f_0(X_i))^2$, is 0.0596, while the mean squared error of the $Y_i$'s, $\frac{1}{n} \sum_{i=1}^n (Y_i - f_0(X_i))^2$, is 0.2743. This shows the improved performance of the isotonic regression estimator compared to the raw $Y_i$'s.
The estimator enforces monotonicity while fitting the data in a least-squares sense, resulting in a piecewise constant function. This example highlights two key features of isotonic regression: (i) the automatic enforcement of shape constraints without tuning parameters, and (ii) the tendency toward non-smoothness and boundary effects, which motivate the extensions discussed later in this paper.
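A sketch of this experiment is given below. It simulates the average waiting time of the first 5000 customers at each design point via the Lindley recursion and fits a monotone curve with scikit-learn's IsotonicRegression in place of the CVX formulation used above; since waiting times decrease in the service rate, a nonincreasing fit is requested. The sample size n = 50, the random seed, and the use of scikit-learn are illustrative assumptions.

```python
# Sketch: reproducing the flavor of the M/M/1 illustration with scikit-learn.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def avg_wait(rate, n_customers=5000, rng=None):
    """Average waiting time of the first n_customers in an M/M/1 queue with unit
    arrival rate and service rate `rate`, started empty and idle (Lindley recursion)."""
    rng = np.random.default_rng() if rng is None else rng
    inter = rng.exponential(1.0, n_customers)           # interarrival times
    service = rng.exponential(1.0 / rate, n_customers)  # service times
    w = np.zeros(n_customers)
    for k in range(1, n_customers):
        w[k] = max(0.0, w[k - 1] + service[k - 1] - inter[k])
    return w.mean()

rng = np.random.default_rng(0)
n = 50
X = 1.2 + 0.1 * np.arange(n) / n + 0.1 / (2 * n)        # design points X_i in [1.2, 1.3]
Y = np.array([avg_wait(x, rng=rng) for x in X])

iso = IsotonicRegression(increasing=False)              # waiting time decreases in the rate
f_hat = iso.fit_transform(X, Y)                         # monotone fitted values at the X_i's
```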

3.4. Challenges

Two common issues with the isotonic regression estimator are (i) non-smoothness (piecewise constant fits) and (ii) boundary overfitting ("spiking"). Figure 1 displays an instance of the true regression function $f_0$, the $(X_i, Y_i)$'s, and the isotonic regression estimator. The isotonic regression estimator in Figure 1 is piecewise constant, and hence non-smooth. It also overfits near the boundary of its domain.

3.4.1. Non-Smoothness of Isotonic Regression

To overcome the non-smoothness of the isotonic regression estimator, several researchers proposed variants of the isotonic regression estimator. Ramsay [17] considered the univariate case and proposed estimating f 0 as a nonnegative linear combination of the I-splines given the knot sequence X 1 , , X n in R . The I-splines are monotone by definition, so the corresponding estimator is guaranteed to be monotone. This method, however, can be sensitive to the number and placement of knots and the order of the I-splines. Mammen [15] also considered the univariate case and proposed computing the isotonic regression estimator and then making it smooth by applying kernel regression. He and Shi [28] considered the univariate case and proposed fitting normalized B-splines to the data set given the knot sequence X 1 , , X n in R . This method is also sensitive to the number and placement of the knots. See Mammen and Thomas-Agnan [29] and Meyer [30] for other approaches in the univariate case.
In multiple dimensions, Dette and Scheder [31] proposed a procedure that smooths the data points using a kernel-type method and then applies an isotonization step to each coordinate so that the resulting estimate maintains the monotonicity in each coordinate.

3.4.2. Overfitting of Isotonic Regression

The isotonic regression estimator is known to be inconsistent at the boundaries; this is called the "spiking" problem. See Figure 1 for an illustration of this phenomenon. Proposition 1 of Lim [32] proved that when $\Omega = [a, b]$, $\hat{f}_n^m(a)$ does not converge to $f_0(a)$ in probability as $n \to \infty$. One popular remedy is to use a penalized isotonic regression estimator, which minimizes the sum of squared errors plus a penalty term over all monotone functions. A penalized isotonic regression estimator can be expressed as the solution to the following problem:
$$\underset{f \in \mathcal{F}_m}{\text{minimize}} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda_n P_n(f), \qquad (7)$$
where $\lambda_n \ge 0$ is a smoothing parameter and $P_n(\cdot)$ is a penalty term. Wu et al. [33] considered the case where $P_n(f)$ equals the range of $f$ divided by $n$. They proved that when $d = 1$, the solution to (7) (often referred to as the "bounded isotonic regression") evaluated at $x \in \Omega$ converges to $f_0(x)$ in probability as $n \to \infty$. This proves that the bounded isotonic regression estimator is consistent at the boundary, unlike the traditional isotonic regression estimator. Wu et al. [33] also proved similar results for the $d = 2$ case when the covariates are on a grid. Luss and Rosset [34] proposed an algorithm for solving the bounded isotonic regression problem in the multivariate case. On the other hand, Lim [32] considered the case where $P_n(f) = \max_{x \in \Omega} |f(x)|$ for $d = 1$ and established the strong consistency of the corresponding estimator at each $x \in \Omega$, thereby proving that the estimator is consistent at the boundary of $\Omega$. Lim [32] also derived a convergence rate of $n^{-1/3}$ for the proposed estimator.
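As a simple illustration of the range-penalty idea of Wu et al. [33] in the univariate case, the sketch below adds the penalty $(f_n - f_1)/n$ (the range of a nondecreasing fit divided by $n$) to the isotonic QP. The data, the value of the smoothing parameter, and the use of cvxpy are illustrative assumptions.

```python
# Sketch: penalized ("bounded") isotonic regression with a range penalty, d = 1.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, lam = 100, 1.0
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sqrt(x) + 0.1 * rng.standard_normal(n)

f = cp.Variable(n)
range_penalty = (f[n - 1] - f[0]) / n             # range of a nondecreasing fit
objective = cp.Minimize(cp.sum_squares(y - f) / n + lam * range_penalty)
cp.Problem(objective, [f[0:n-1] <= f[1:n]]).solve()
f_hat = f.value
```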
Even though several approaches have been proposed for the $d = 1$ case, limited work is available for the multivariate case. The overfitting problem of the isotonic regression estimator when $d > 1$ remains a promising topic for future research.

4. Convex Regression

In this section, we review statistical properties, interesting applications, and some challenges of the convex regression estimator. We also review some algorithms for computing the convex regression estimator numerically.

4.1. Statistical Properties

4.1.1. Univariate Case

The convex regression estimator, along with its QP formulation in (3), was first introduced by Hildreth [2] for the $d = 1$ case. Hanson and Pledger [35] established strong consistency on compact subsets of $\Omega$. They proved that, if $\Omega = (a, b)$, then for any $c > a$ and $d < b$,
$$\max_{c \le x \le d} \left| \hat{f}_n^c(x) - f_0(x) \right| \to 0$$
as $n \to \infty$ with probability one.
Results on rates of convergence were obtained by Mammen [36] and Groeneboom et al. [37]. In particular, Mammen [36] showed that at any interior point $x \in \Omega$,
$$\hat{f}_n^c(x) - f_0(x) = O_p\!\left( n^{-2/5} \right)$$
as $n \to \infty$, provided that the second derivative of $f_0$ at $x$ does not vanish.
Groeneboom et al. [37] identified the limiting distribution of $n^{2/5} (\hat{f}_n^c(x) - f_0(x))$ at a fixed point $x \in \Omega$. Groeneboom et al. [37] proved that if the errors are iid sub-Gaussian with mean zero and $f_0''(x) \ne 0$, then
$$n^{2/5} \left( \hat{f}_n^c(x) - f_0(x) \right) \Rightarrow H''(0),$$
where $H$ denotes the greatest convex minorant (i.e., envelope) of integrated Brownian motion with quartic drift $t^4$, and $H''(0)$ is its second derivative at 0. The envelope can be viewed as a cubic curve lying above and touching the integrated Brownian motion plus $t^4$.
More recently, Chatterjee [38] derived an upper bound on the $\ell_2$-risk of the convex regression estimator. Under the assumption of iid Gaussian errors with mean zero, it was shown that
$$R(f_0, \hat{f}_n^c) \equiv E \left[ \frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n^c(X_i) - f_0(X_i) \right)^2 \right] \le C n^{-4/5}$$
for all $n \ge 1$, where $C$ is a constant.

4.1.2. Multivariate Case

The multivariate convex regression estimator and its QP formulation in (5) were first introduced by Allon et al. [39]. Strong consistency over any compact subset of the interior of $\Omega$ was established by Seijo and Sen [40] and Lim and Glynn [41]. Seijo and Sen [40] proved that, for any compact subset $A$ of the interior of $\Omega$,
$$\max_{x \in A} \left| \hat{f}_n^c(x) - f_0(x) \right| \to 0$$
as $n \to \infty$ with probability one under some modest assumptions.
Rates of convergence were obtained by Lim [42]. In particular, Lim [42] considered a variant of the convex regression estimator, say $\tilde{f}_n^c(\cdot)$, which minimizes the sum of squared errors $\sum_{i=1}^n (Y_i - f(X_i))^2$ over the set of convex functions $f$ with subgradients bounded uniformly by a constant. Lim [42] showed that
$$\frac{1}{n} \sum_{i=1}^n \left( \tilde{f}_n^c(X_i) - f_0(X_i) \right)^2 = \begin{cases} O_p\!\left( n^{-4/(4+d)} \right), & \text{if} \ d < 4, \\ O_p\!\left( (\log n)\, n^{-1/2} \right), & \text{if} \ d = 4, \\ O_p\!\left( n^{-2/d} \right), & \text{if} \ d > 4, \end{cases}$$
as $n \to \infty$.
A bound on the global risk has been established by Han and Wellner [43] as follows:
$$E \left[ \int_\Omega \left( \bar{f}_n^c(x) - f_0(x) \right)^2 \lambda(x) \, dx \right] = \begin{cases} O\!\left( n^{-4/(4+d)} \right), & \text{if} \ d < 4, \\ O\!\left( (\log n)\, n^{-1/2} \right), & \text{if} \ d = 4, \\ O\!\left( n^{-2/d} \right), & \text{if} \ d > 4, \end{cases}$$
as $n \to \infty$, where $\lambda(\cdot)$ is the density function of $X_1$ and $\bar{f}_n^c(\cdot)$ minimizes the sum of squared errors $\sum_{i=1}^n (Y_i - f(X_i))^2$ over the set of convex functions $f$ bounded uniformly by a constant.
The limiting distribution of $r_n^{-1} (\hat{f}_n^c(x) - f_0(x))$ for an appropriate rate of convergence $r_n$ has not been established so far, and this remains an open problem.

4.2. Computational Algorithms

In the univariate case, Dent [44] introduced a QP formulation of the convex regression estimator as in (3). Dykstra [45] developed an iterative algorithm to compute the convex regression estimator numerically. The procedure proposed by Dykstra [45] is guaranteed to converge to the convex regression estimator as the number of iterations approaches infinity. For other algorithms for solving the QP in (3), see Wu [46] and Fraser and Massam [47].
For the multivariate setting, Allon et al. [39] first presented the QP formulation given in (5). However, this formulation requires O ( n 2 ) linear constraints, which limits its applicability to data sets of only a few hundred observations under current computational capabilities. To address this, Mazumder et al. [48] proposed an algorithm based on the augmented Lagrangian method. Their approach was able to solve (5) for n = 1000 and d = 10 within a few seconds.
To mitigate the computational burden, several authors (Magnani and Boyd [49]; Aguilera et al. [50]; Hannah and Dunson [51]) proposed approximating the convex regression estimator by the maximum of a small set of hyperplanes. Specifically, they solved the optimization problem
$$(\alpha, \beta, K) = \operatorname*{argmin}_{\alpha_k \in \mathbb{R},\, \beta_k \in \mathbb{R}^d,\, K \in \mathbb{N}} \ \sum_{i=1}^n \left( Y_i - \max_{k = 1, \ldots, K} \left\{ \alpha_k + \beta_k^T X_i \right\} \right)^2,$$
where each $(\alpha_k, \beta_k) \in \mathbb{R} \times \mathbb{R}^d$ defines a hyperplane and $\mathbb{N}$ is the set of positive integers. Their estimator $\hat{f}_n(\cdot)$ is then given by the maximum over these hyperplanes:
$$\hat{f}_n(x) = \max_{k = 1, \ldots, K} \left\{ \alpha_k + \beta_k^T x \right\}.$$
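The following is a rough sketch of this idea in the spirit of the least-squares partitioning heuristic of Magnani and Boyd [49]: points are alternately assigned to the hyperplane attaining the maximum, and each hyperplane is then refit by ordinary least squares. The number of hyperplanes K, the random initialization, and the iteration count are illustrative assumptions, and the procedure is a heuristic rather than the exact estimator.

```python
# Sketch: fitting a max-affine approximation by alternating partitioning and least squares.
import numpy as np

def fit_max_affine(X, y, K=5, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, K, size=n)                # random initial partition
    alpha = np.zeros(K)
    beta = np.zeros((K, d))
    for _ in range(n_iter):
        for k in range(K):
            idx = labels == k
            if idx.sum() < d + 1:                      # skip groups too small to fit
                continue
            A = np.column_stack([np.ones(idx.sum()), X[idx]])
            coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
            alpha[k], beta[k] = coef[0], coef[1:]
        labels = np.argmax(alpha + X @ beta.T, axis=1) # reassign to the active hyperplane
    return alpha, beta

def predict_max_affine(x, alpha, beta):
    """Evaluate the fitted max-affine function at a point x."""
    return float(np.max(alpha + beta @ x))
```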
More recently, to handle larger data sets, Bertsimas and Mundru [52] introduced a penalized convex regression estimator defined as the solution to the following QP problem in the decision variables $f_1, \ldots, f_n \in \mathbb{R}$ and $\beta_1, \ldots, \beta_n \in \mathbb{R}^d$:
$$\text{minimize} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 + \frac{\lambda_n}{n} \sum_{i=1}^n \| \beta_i \|^2 \quad \text{subject to} \quad f_i \ge f_j + \beta_j^T (X_i - X_j), \quad 1 \le i, j \le n, \qquad (8)$$
where $\lambda_n > 0$ is a smoothing parameter. They proposed a cutting-plane algorithm for solving (8) and reported that it can handle data sets with $n = 10{,}000$ and $d = 10$ within minutes.
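For small instances, (8) can also be passed directly to a generic solver without the cutting-plane machinery; the sketch below does so with cvxpy, and the data, the value of $\lambda_n$, and the problem size are illustrative assumptions (this is not the algorithm of Bertsimas and Mundru [52]).

```python
# Sketch: the penalized convex regression problem (8) for a small instance, using cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, d, lam = 40, 3, 0.1
X = rng.uniform(-1.0, 1.0, (n, d))
y = np.sum(X**2, axis=1) + 0.1 * rng.standard_normal(n)

f = cp.Variable(n)
B = cp.Variable((n, d))                                # rows are the subgradients beta_i
constraints = [f[i] >= f[j] + cp.sum(cp.multiply(B[j, :], X[i] - X[j]))
               for i in range(n) for j in range(n) if i != j]
objective = cp.Minimize(cp.sum_squares(y - f) / n + (lam / n) * cp.sum_squares(B))
cp.Problem(objective, constraints).solve()
```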
Another strategy to alleviate the computational burden of the convex regression estimator is to reformulate the problem as a linear programming (LP) problem rather than a quadratic program. Luo and Lim [53] proposed estimating the convex function that minimizes the sum of absolute deviations (instead of the sum of squared errors). The resulting estimator can be obtained by solving the following LP problem in the decision variables $f_1, \ldots, f_n \in \mathbb{R}$ and $\beta_1, \ldots, \beta_n \in \mathbb{R}^d$:
$$\text{minimize} \ \frac{1}{n} \sum_{i=1}^n \left| Y_i - f_i \right| \quad \text{subject to} \quad f_i \ge f_j + \beta_j^T (X_i - X_j) \ \text{for} \ 1 \le i, j \le n.$$
Luo and Lim [53] further showed that the corresponding estimator converges almost surely to the true function over any compact subset of the interior of $\Omega$ as $n \to \infty$.
Table 2 summarizes the main computational approaches discussed above.
Taken together, these algorithmic developments highlight a fundamental trade-off between statistical exactness and computational scalability. While the exact QP formulation provides a principled estimator with strong theoretical guarantees, its quadratic growth in constraints severely limits its applicability in large-scale settings. Approximate and penalized methods alleviate this burden, but at the cost of introducing additional tuning parameters or approximation error. Understanding this trade-off remains central to the practical deployment of convex regression.

4.3. Applications

Convexity and concavity play an important role across various fields, including economics and operations research. In economics, for instance, it is common to assume diminishing marginal productivity with respect to input resources, which implies that the production function is concave in its inputs (Allon et al. [39]). Also, demand (Varian [54]) and utility (Varian [55]) functions are often assumed to be convex/concave. In operations research, the steady-state waiting time in a single-server queue has been shown to be convex in the service rate; see, for example, Weber [26]. An additional example is provided in Demetriou and Tzitziris [56], where infant mortality rates are modeled as exhibiting increasing returns with respect to the gross domestic product (GDP) per capita, leading to the assumption that the infant mortality function is convex in GDP per capita. For further applications of convex regression, see Johnson and Jiang [4].

Illustration of Convex Regression

To illustrate how convex regression operates in practice, we consider a simple univariate example in which $f_0(x)$ is the steady-state mean waiting time of a customer in a single-server queue with a first-in/first-out discipline, where the service times are iid exponential random variables with mean $1/x$ for $x \in (1.2, 1.7)$ and the interarrival times are iid exponential random variables with mean 1.
The $X_i$'s are drawn uniformly from $(1.2, 1.7)$. For each $X_i$, $Y_i$ is generated by averaging the waiting times of the first 5000 customers in this single-server queue, initialized empty and idle, with service rate $X_i$. Once the $(X_i, Y_i)$'s are obtained, we computed the convex regression estimator by solving (3) using CVX.
Figure 2 displays the observed data along with the convex regression estimator. The mean squared error of the convex regression estimator, $\frac{1}{n} \sum_{i=1}^n (\hat{f}_n^c(X_i) - f_0(X_i))^2$, is 0.0132, while the mean squared error of the $Y_i$'s, $\frac{1}{n} \sum_{i=1}^n (Y_i - f_0(X_i))^2$, is 0.0554. This shows the improved performance of the convex regression estimator compared to the raw $Y_i$'s.
The estimator enforces convexity while fitting the data in a least-squares sense, resulting in a piecewise linear function. This example highlights two key features of convex regression: (i) the automatic enforcement of shape constraints without tuning parameters, and (ii) the tendency toward non-smoothness and boundary effects, which motivate the extensions discussed in the next subsection.

4.4. Challenges

The convex regression estimator faces two primary limitations: (i) non-smoothness (piecewise linear fits) and (ii) overfitting near the boundary of its domain (“spiking”). Figure 2 illustrates these issues by displaying the true regression function f 0 , the observed data ( X i , Y i ) , and the convex regression estimator. As shown in Figure 2, the convex regression estimator is piecewise linear and therefore non-smooth. It also exhibits boundary overfitting.

4.4.1. Non-Smoothness of Convex Regression

In the univariate setting, several approaches have been proposed to address the lack of smoothness. Mammen and Thomas-Agnan [29] developed a two-step procedure in which the data are first smoothed using nonparametric smoothing methods, followed by convex regression applied to the smoothed values. Alternatively, Aït-Sahalia and Duarte [25] suggested first estimating the convex regression estimator and then applying local polynomial smoothing to the estimator. Birke and Dette [57] proposed a different strategy: smoothing the data with a nonparametric method such as the kernel, local polynomial, series, or spline methods, computing its derivative, applying isotonic regression to the derivative, and integrating to obtain a convex estimator. Other methods for the univariate case are discussed in Du et al. [58].
In the multivariate case, Mazumder et al. [48] constructed a smooth approximation to the convex regression estimator that remains uniformly close to the original estimator.

4.4.2. Overfitting of Convex Regression

As with isotonic regression, convex regression tends to overfit near the domain boundaries, often producing extremely large subgradients and inconsistent estimates of $f_0$ near the boundary. For $d = 1$, Ghosal and Sen [59] showed that the convex regression estimator fails to converge in probability to the true value at boundary points and that its subgradients at the boundary are unbounded in probability as $n \to \infty$.
A common strategy to mitigate overfitting is to constrain the subgradients. Lim [42] proposed bounding the $\|\cdot\|_\infty$ norm of the subgradients, adding the constraints $\|\beta_i\|_\infty \le C$ for all $i$, for some fixed $C > 0$, to the constraints of (5). Instead, Mazumder et al. [48] added $\|\beta_i\| \le C$ to the constraints of (5).
Another widely used approach is penalization, which adds a penalty term to the objective function of (1). Chen et al. [60] and Bertsimas and Mundru [52] studied a penalized multivariate convex regression estimator defined as the solution to the following QP problem in the decision variables $f_1, \ldots, f_n \in \mathbb{R}$ and $\beta_1, \ldots, \beta_n \in \mathbb{R}^d$:
$$\text{minimize} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f_i)^2 + \frac{1}{n} \sum_{i=1}^n \| \beta_i \|^2 \quad \text{subject to} \quad f_i \ge f_j + \beta_j^T (X_i - X_j) \ \text{for} \ 1 \le i, j \le n.$$
Lim [61] generalized this idea and proposed the following three alternative approaches:
$$\begin{aligned}
(\mathrm{A}) \quad & \hat{f}_n^A = \operatorname*{argmin}_{f \in \mathcal{F}_c} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda_n J(f) \\
(\mathrm{B}) \quad & \hat{f}_n^B = \operatorname*{argmin}_{f \in \mathcal{F}_c} \ \frac{1}{n} \sum_{i=1}^n (Y_i - f(X_i))^2 \quad \text{subject to} \quad J(f) \le u_n \\
(\mathrm{C}) \quad & \hat{f}_n^C = \operatorname*{argmin}_{f \in \mathcal{F}_c} \ J(f) \quad \text{subject to} \quad \frac{1}{n} \sum_{i=1}^n (Y_i - f(X_i))^2 \le s_n
\end{aligned}$$
for some smoothing parameter $\lambda_n \ge 0$ and constants $u_n, s_n \ge 0$, where $J(f)$ is a penalty term measuring the overall magnitude of the subgradient of $f$.
In particular, Lim [61] considered the case where
$$J(f) = \max_{1 \le i \le d} \sup_{x \in \Omega} \left| \frac{\partial f}{\partial x_i}(x) \right|,$$
and $\partial f / \partial x_i(x)$ represents the partial derivative of $f$ with respect to the $i$th component of $x$. Lim [61] established the uniform strong consistency of $\hat{f}_n^A$ and $\hat{f}_n^C$ and their derivatives. Specifically, she showed that
$$\sup_{x \in \Omega} \left| \hat{f}_n^A(x) - f_0(x) \right| \to 0$$
and
$$\sup_{x \in \Omega} \ \sup_{\beta \in \partial \hat{f}_n^A(x)} \left\| \beta - \nabla f_0(x) \right\| \to 0$$
as $n \to \infty$ with probability one. Analogous results were shown for $\hat{f}_n^C$. These results confirm that penalization successfully eliminates boundary overfitting.
Finally, Lim [61] derived convergence rates for $\hat{f}_n^A$ and $\hat{f}_n^C$. For $\hat{f}_n^A$, she proved
$$\frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n^A(X_i) - f_0(X_i) \right)^2 = \begin{cases} O_p\!\left( n^{-4/(4+d)} \right), & \text{if} \ d < 4, \\ O_p\!\left( n^{-1/2} \log n \right), & \text{if} \ d = 4, \\ O_p\!\left( n^{-2/d} \right), & \text{if} \ d > 4, \end{cases}$$
as $n \to \infty$, and for $\hat{f}_n^C$, she established
$$\frac{1}{n} \sum_{i=1}^n \left( \hat{f}_n^C(X_i) - f_0(X_i) \right)^2 = \begin{cases} O_p\!\left( n^{-1/2} \log n \right), & \text{if} \ d < 4, \\ O_p\!\left( n^{-1/2} \log n \right), & \text{if} \ d = 4, \\ O_p\!\left( n^{-2/d} \right), & \text{if} \ d > 4, \end{cases}$$
as $n \to \infty$.

5. Connections to Contemporary Machine Learning

Beyond their traditional applications in economics and operations research, isotonic and convex regression have found increasing use in contemporary machine learning, where structural constraints are often imposed to improve interpretability, stability, and compliance with domain knowledge.
A prominent example is probability calibration in classification and ranking systems. In large-scale recommender systems and online advertising platforms, raw model outputs (e.g., click-through rate (CTR) scores) are often poorly calibrated as probabilities. Isotonic regression is widely used as a post-hoc calibration method in CTR prediction and recommendation systems, enforcing monotonicity between model scores and observed click frequencies while preserving predictive ordering.
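A minimal example of this use case is sketched below with scikit-learn's IsotonicRegression; the synthetic scores and click labels are illustrative assumptions and stand in for the output of a real CTR model.

```python
# Sketch: post-hoc probability calibration of raw model scores with isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(4)
scores = rng.uniform(0.0, 1.0, 2000)             # raw, possibly miscalibrated model scores
clicks = rng.binomial(1, scores**2)              # true click probability differs from the score

calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(scores, clicks)                   # monotone map from score to click frequency

calibrated = calibrator.predict(np.array([0.2, 0.5, 0.9]))
```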
Shape constraints also arise naturally in learning-to-rank and monotone prediction problems, where certain features are known a priori to have a monotone effect on the response. Such constraints are increasingly emphasized in interpretable and trustworthy machine learning, particularly in high-stakes applications such as credit scoring, hiring, and healthcare, where monotonicity is often required for regulatory or ethical reasons.
From an optimization perspective, convex regression connects directly to learning convex functions and convex potentials, which appear in areas such as structured prediction, energy-based modeling, and optimal transport. In these settings, convex regression provides a principled way to estimate convex functions from data while preserving global shape properties. Related ideas also appear in modern machine learning through neural network architectures designed to learn convex functions. A notable example is Input Convex Neural Networks (ICNNs) [62], which impose architectural constraints to guarantee convexity with respect to the inputs. While ICNNs are not direct implementations of the nonparametric convex regression methods reviewed here, they are conceptually motivated by the same need to learn convex cost or loss functions from data. This parallel highlights the continued relevance of convexity constraints across both classical statistical estimation and contemporary machine learning.
Finally, the computational challenges discussed in this paper—scalability, overfitting, and the need for efficient algorithms—are closely aligned with current concerns in large-scale machine learning. Recent developments such as augmented Lagrangian methods, cutting-plane algorithms, and penalized formulations mirror optimization techniques widely used in modern ML, highlighting shape-restricted regression as a natural and increasingly relevant component of the contemporary machine learning toolkit.

6. Future Directions

This section outlines several open problems that, in our view, remain both important and insufficiently explored. The goal of this section is not only to list unresolved questions, but also to provide readers with a research roadmap highlighting where significant theoretical, computational, and applied advances can be made. Each problem below is motivated by practical limitations of existing methods and points to concrete opportunities for future research.

6.1. Problem 1—Scalable Algorithms for Large-Scale Data

Modern applications increasingly involve data sets with millions or even billions of observations. For example, in large-scale hiring platforms or online advertising systems, isotonic regression is routinely used for score calibration, yet existing algorithms are unable to scale to such data sizes.
Despite decades of progress in computational optimization, no algorithm is currently capable of computing isotonic or convex regression estimators efficiently at this scale under general designs. Developing scalable algorithms—possibly through distributed optimization, streaming methods, or approximation schemes—remains a critical challenge. This problem is particularly relevant to researchers working at the intersection of optimization, machine learning, and large-scale data analysis.

6.2. Problem 2—Algorithms for Multivariate Isotonic Regression

While univariate isotonic regression enjoys highly efficient algorithms such as the pool-adjacent-violators method, the multivariate case remains far less understood. In particular, for isotonic regression under random designs in higher dimensions, efficient and practically implementable algorithms are largely unavailable.
Most existing work focuses on grid-based designs or highly structured settings, leaving a substantial gap between theory and practice. Closing this gap would significantly expand the applicability of isotonic regression in modern data-driven problems involving multiple covariates.

6.3. Problem 3—Pointwise Limit Distribution of Multivariate Convex Regression

Although many global theoretical properties of convex regression estimators have been established, a central question remains unresolved: the pointwise limiting distribution of the multivariate convex regression estimator.
Specifically, it is unknown whether
$$r_n^{-1} \left( \hat{f}_n^c(x) - f_0(x) \right)$$
converges weakly to a non-degenerate limit for an appropriate normalization r n , and if so, what the form of this limit might be. Resolving this problem would place multivariate convex regression on a theoretical footing comparable to its univariate counterpart and would substantially deepen our understanding of the local behavior of shape-restricted estimators.

6.4. Problem 4—Incorporating Smoothness into Shape-Restricted Regression in Multiple Dimensions

A well-known limitation of isotonic and convex regression is their inherent lack of smoothness, resulting in piecewise constant or piecewise linear estimators. While several smoothing and regularization techniques have been proposed, most are either heuristic or limited to low-dimensional settings.
Developing principled methods that simultaneously enforce shape constraints and smoothness in multiple dimensions remains an open challenge. Progress in this direction would be particularly valuable for applications requiring stable derivatives, such as sensitivity analysis, optimization, and decision-making systems.

6.5. Problem 5—Overfitting in Isotonic and Convex Regression

Both isotonic and convex regression are known to suffer from overfitting near the boundaries of the design space. Existing remedies typically rely on penalization or hard constraints on the estimator or its subgradients, which introduce tuning parameters that are often selected in an ad-hoc or problem-specific manner.
This raises several fundamental questions: How should such tuning parameters be chosen in a principled way? Are there alternative approaches to controlling overfitting that avoid explicit penalization altogether? Addressing these questions would improve both the theoretical robustness and practical usability of shape-restricted regression methods.

6.6. Problem 6—Theory for Penalized Isotonic and Convex Regression

While penalization has proven effective in mitigating non-smoothness and boundary overfitting, the theoretical understanding of penalized shape-restricted estimators remains incomplete. For isotonic regression, most existing results are confined to the univariate setting, and extensions to higher dimensions are largely unexplored.
For convex regression, many foundational questions remain open, including optimal convergence rates, adaptive behavior, and the interaction between penalization and dimensionality. Advancing the theory of penalized isotonic and convex regression would provide essential guidance for both methodological development and practical implementation.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Brunk, H.D. Maximum likelihood estimates of monotone parameters. Ann. Math. Stat. 1955, 26, 607–616. [Google Scholar] [CrossRef]
  2. Hildreth, C. Point estimates of ordinates of concave functions. J. Am. Stat. Assoc. 1954, 49, 598–619. [Google Scholar] [CrossRef]
  3. Guntuboyina, A.; Sen, B. Nonparametric Shape-Restricted Regression. Stat. Sci. 2018, 33, 568–594. [Google Scholar] [CrossRef]
  4. Johnson, A.L.; Jiang, D.R. Shape Constraints in Economics and Operations Research. Stat. Sci. 2018, 33, 527–546. [Google Scholar] [CrossRef]
  5. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  6. Brunk, H.D. On the estimation of parameters restricted by inequalities. Ann. Math. Stat. 1958, 29, 437–454. [Google Scholar] [CrossRef]
  7. Brunk, H.D. Estimation of isotonic regression. In Nonparametric Techniques in Statistical Inference; Cambridge University Press: New York, NY, USA, 1970; pp. 177–195. [Google Scholar]
  8. Meyer, M.; Woodroofe, M. On the degrees of freedom in shape-restricted regression. Ann. Stat. 2000, 28, 1083–1104. [Google Scholar] [CrossRef]
  9. Zhang, C.H. Risk bounds in isotonic regression. Ann. Stat. 2002, 30, 528–555. [Google Scholar] [CrossRef]
  10. Lim, E. The Limiting Behavior of Isotonic and Convex Regression Estimators When The Model Is Misspecified. Electron. J. Stat. 2020, 14, 2053–2097. [Google Scholar] [CrossRef]
  11. Bagchi, P.; Dhar, S.S. A study on the least squares estimator of multivariate isotonic regression function. Scand. J. Stat. 2020, 47, 1192–1221. [Google Scholar] [CrossRef]
  12. Han, Q.; Wang, T.; Chatterjee, S.; Samworth, R.J. Isotonic Regression in General Dimensions. Ann. Stat. 2019, 47, 2440–2471. [Google Scholar] [CrossRef]
  13. Barlow, R.E.; Bartholomew, R.J.; Bremner, J.M.; Brunk, H.D. Statistical Inference Under Order Restrictions; Wiley: New York, NY, USA, 1972. [Google Scholar]
  14. Mukerjee, H. Monotone nonparametric regression. Ann. Stat. 1988, 16, 741–750. [Google Scholar] [CrossRef]
  15. Mammen, E. Estimating a smooth monotone regression function. Ann. Stat. 1991, 19, 724–740. [Google Scholar] [CrossRef]
  16. Friedman, J.; Tibshirani, R. The monotone smoothing of scatterplots. Technometrics 1984, 26, 243–250. [Google Scholar] [CrossRef]
  17. Ramsay, J.O. Monotone regression splines in action. Stat. Sci. 1988, 3, 425–461. [Google Scholar] [CrossRef]
  18. Hall, P.; Huang, L.S. Nonparametric kernel regression subject to monotonicity constraints. Ann. Stat. 2001, 29, 624–647. [Google Scholar] [CrossRef]
  19. Gebhardt, F. An algorithm for monotone regression with one or more independent variables. Biometrika 1970, 57, 263–271. [Google Scholar] [CrossRef]
  20. Dykstra, R.L.; Robertson, T. An algorithm for isotonic regression for two or more independent variables. Ann. Stat. 1982, 10, 708–716. [Google Scholar] [CrossRef]
  21. Lee, C.I.C. The min-max algorithm and isotonic regression. Ann. Stat. 1983, 11, 467–477. [Google Scholar] [CrossRef]
  22. Qian, S.; Eddy, W.F. An algorithm for isotonic regression on ordered rectangular grids. J. Comput. Graph. Stat. 1996, 5, 225–235. [Google Scholar] [CrossRef]
  23. Gallant, A.; Golub, G.H. Imposing curvature restrictions on flexible functional forms. J. Econ. 1984, 26, 295–321. [Google Scholar] [CrossRef]
  24. Matzkin, R.L. Restrictions of economic theory in nonparametric methods. In Handbook of Econometrics; North-Holland: Amsterdam, The Netherlands, 1994; Volume IV, pp. 2523–2558. [Google Scholar]
  25. Aït-Sahalia, Y.; Duarte, J. Nonparametric option pricing under shape restrictions. J. Econom. 2003, 116, 9–47. [Google Scholar] [CrossRef]
  26. Weber, R.R. A note on waiting times in single server queues. Oper. Res. 1983, 31, 950–951. [Google Scholar] [CrossRef]
  27. Moolchan, E.T.; Hudson, D.L.; Schroeder, J.R.; Sehnert, S.S. Heart rate and blood pressure responses to tobacco smoking among African-American adolescents. J. Natl. Med. Assoc. 2004, 96, 767. [Google Scholar]
  28. He, X.; Shi, P. Monotone B-spline smoothing. J. Am. Stat. Assoc. 1998, 93, 643–650. [Google Scholar] [CrossRef]
  29. Mammen, E.; Thomas-Agnan, C. Smoothing splines and shape restrictions. Scand. J. Stat. 1999, 26, 239–252. [Google Scholar] [CrossRef]
  30. Meyer, M.C. Inference using shape-restricted regression splines. Ann. Stat. 2008, 2, 1013–1033. [Google Scholar] [CrossRef]
  31. Dette, H.; Scheder, R. Strictly Monotone and Smooth Nonparametric Regression for Two or More Variables. Can. J. Stat. 2006, 34, 535–561. [Google Scholar] [CrossRef]
  32. Lim, E. An estimator for isotonic regression with boundary consistency. Stat. Probab. Lett. 2025, 226, 110513. [Google Scholar] [CrossRef]
  33. Wu, J.; Meyer, M.C.; Opsomer, J.D. Penalized isotonic regression. J. Stat. Plan. Inference 2015, 161, 12–24. [Google Scholar] [CrossRef]
  34. Luss, R.; Rosset, S. Bounded isotonic regression. Electron. J. Stat. 2017, 11, 4488–4514. [Google Scholar] [CrossRef]
  35. Hanson, D.L.; Pledger, G. Consistency in concave regression. Ann. Stat. 1976, 4, 1038–1050. [Google Scholar] [CrossRef]
  36. Mammen, E. Nonparametric regression under qualitative smoothness assumptions. Ann. Stat. 1991, 19, 741–759. [Google Scholar] [CrossRef]
  37. Groeneboom, P.; Jongbloed, G.; Wellner, J.A. Estimation of a convex function: Characterizations and Asymptotic Theory. Ann. Stat. 2001, 29, 1653–1698. [Google Scholar] [CrossRef]
  38. Chatterjee, S. An improved global risk bound in concave regression. Electron. J. Stat. 2016, 10, 1608–1629. [Google Scholar] [CrossRef]
  39. Allon, G.; Beenstock, M.; Hackman, S.; Passy, U.; Shapiro, A. Nonparametric estimation of concave production technologies by entropy. J. Appl. Econom. 2007, 22, 795–816. [Google Scholar] [CrossRef]
  40. Seijo, E.; Sen, B. Nonparametric least squares estimation of a multivariate convex regression function. Ann. Stat. 2011, 39, 1633–1657. [Google Scholar] [CrossRef]
  41. Lim, E.; Glynn, P.W. Consistency of Multidimensional Convex Regression. Oper. Res. 2012, 60, 196–208. [Google Scholar] [CrossRef]
  42. Lim, E. On convergence rates of convex regression in multiple dimensions. INFORMS J. Comput. 2014, 26, 616–628. [Google Scholar] [CrossRef]
  43. Han, Q.; Wellner, J.A. Multivariate convex regression: Global risk bounds and adaptation. arXiv 2016. [Google Scholar] [CrossRef]
  44. Dent, W. A note on least squares fitting of function constrained to be either non-negative, nondecreasing, or convex. Manag. Sci. 1973, 20, 130–132. [Google Scholar] [CrossRef]
  45. Dykstra, R.L. An algorithm for restricted least squares regression. J. Am. Stat. Assoc. 1983, 78, 837–842. [Google Scholar] [CrossRef]
  46. Wu, C.F. Some algorithms for concave and isotonic regression. TIMS Stud. Manag. Sci. 1982, 19, 105–116. [Google Scholar]
  47. Fraser, D.A.S.; Massam, H. A mixed primal-dual bases algorithm for regression under inequality constraints. Application to concave regression. Scand. J. Stat. 1989, 16, 65–74. [Google Scholar]
  48. Mazumder, R.; Iyengar, A.C.G.; Sen, B. A Computational Framework for Multivariate Convex Regression and Its Variants. J. Am. Stat. Assoc. 2019, 114, 318–331. [Google Scholar] [CrossRef]
  49. Magnani, A.; Boyd, S. Convex piecewise-linear fitting. Optim. Eng. 2009, 10, 1–17. [Google Scholar] [CrossRef]
  50. Aguilera, N.; Forzani, L.; Morin, P. On uniform consistent estimators for convex regression. J. Nonparametr. Stat. 2011, 23, 897–908. [Google Scholar] [CrossRef]
  51. Hannah, L.A.; Dunson, D.B. Multivariate Convex Regression with Adaptive Partitioning. J. Mach. Learn. Res. 2013, 14, 3261–3294. [Google Scholar]
  52. Bertsimas, D.; Mundru, N. Sparse Convex Regression. INFORMS J. Comput. 2021, 33, 262–279. [Google Scholar] [CrossRef]
  53. Luo, Y.; Lim, E. On consistency of absolute deviations estimators of convex functions. Int. J. Stat. Probab. 2016, 5, 1–18. [Google Scholar] [CrossRef]
  54. Varian, H.R. The nonparametric approach to demand analysis. Econometrica 1982, 50, 945–973. [Google Scholar] [CrossRef]
  55. Varian, H.R. The nonparametric approach to production analysis. Econometrica 1984, 52, 579–597. [Google Scholar] [CrossRef]
  56. Demetriou, I.; Tzitziris, P. Infant mortality and economic growth: Modeling by increasing returns and least squares. In Proceedings of the World Congress on Engineering, London, UK, 5–7 July 2017; Volume 2. [Google Scholar]
  57. Birke, M.; Dette, H. Estimating a convex function in nonparametric regression. Scand. J. Stat. 2007, 34, 384–404. [Google Scholar] [CrossRef]
  58. Du, P.; Parmeter, C.F.; Racine, J.S. Nonparametric kernel regression with multiple predictors and multiple shape constraints. Stat. Sin. 2013, 23, 1347–1371. [Google Scholar]
  59. Ghosal, P.; Sen, B. On univariate convex regression. Sankhya A 2017, 79, 215–253. [Google Scholar] [CrossRef]
  60. Chen, X.; Lin, Q.; Sen, B. On degrees of freedom of projection estimators with applications to multivariate nonparametric regression. J. Am. Stat. Assoc. 2020, 115, 173–186. [Google Scholar] [CrossRef]
  61. Lim, E. Convex Regression with a Penalty. arXiv 2025, arXiv:2509.19788. [Google Scholar] [CrossRef]
  62. Amos, B.; Kolter, J.Z. Input Convex Neural Networks. In Proceedings of the 34th International Conference on Machine Learning; PMLR Inc.: Cambridge, MA, USA, 2017. [Google Scholar]
Figure 1. The solid line is $f_0$, the circles are the $(X_i, Y_i)$ values, and the dotted line is the isotonic regression estimator.
Figure 2. The solid line represents $f_0$, the circles denote the observed data $(X_i, Y_i)$, and the dotted line corresponds to the convex regression estimator.
Table 1. Comparison of computational algorithms for isotonic regression.

Method | Dimension | Design | Complexity
PAV algorithm [13] | d = 1 | Ordered | O(n)
Graphical/cumulative sum | d = 1 | Ordered | O(n)
General QP solvers | Any d | Any | Polynomial
Grid-based methods | d > 1 | Grid | —
Table 2. Comparison of computational algorithms for convex regression.

Method | Dimension | Remarks
QP formulation [2] | d = 1 | Exact solution
Iterative projection [45] | d = 1 | Guaranteed convergence
Standard multivariate QP [39] | d > 1 | Severe computational burden
Augmented Lagrangian [48] | d > 1 | Scales to $n \approx 10^3$
Cutting-plane methods [52] | d > 1 | Handles $n \approx 10^4$
Hyperplane approximation | d > 1 | Approximate estimator
