2.1. Definitions and Statement of the Problem
2.1.1. Image Partition into Regions
We first introduce notation and definitions to present the proposed algorithms in a formal way. We consider depth or disparity images $\mathbf{G}\in {\mathbb{R}}^{{n}_{r}\times {n}_{c}}$ having ${n}_{r}$ rows and ${n}_{c}$ columns. The set of pixels of the image is denoted ${\mathsf{\Omega}}^{0}=\{1,\dots ,{n}_{r}\}\times \{1,\dots ,{n}_{c}\}$. Two pixels $(i,j)$ and $({i}^{\prime},{j}^{\prime})$ are connected in connectivity 4 if $(i-{i}^{\prime},j-{j}^{\prime})\in \{(1,0),(-1,0),(0,1),(0,-1)\}$ and in connectivity 8 if $(i-{i}^{\prime},j-{j}^{\prime})\in {\{-1,0,1\}}^{2}\setminus \{(0,0)\}$ (unless otherwise specified we assume in this paper that the connectivity level is 4). A label image, $\mathbf{X}\in {\mathbb{N}}^{{n}_{r}\times {n}_{c}}$, specifies a label $X(i,j)$ for each pixel $(i,j)\in {\mathsf{\Omega}}^{0}$. A region ${\mathsf{\Omega}}_{\ell}\subset {\mathsf{\Omega}}^{0}$ is a subset of ${\mathsf{\Omega}}^{0}$ and is specified in the label image $\mathbf{X}$ by the equivalence $(i,j)\in {\mathsf{\Omega}}_{\ell}\iff X(i,j)=\ell $. The region is said to be a connected component if any two of its pixels are linked by a path within ${\mathsf{\Omega}}_{\ell}$, consecutive pixels on the path being connected in connectivity 4. A partition of the image into regions is denoted as a set $\mathcal{P}=\{{\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{L}\}$ and can be unequivocally described by a label image with $L$ labels, which we denote $\mathbf{X}\left(\mathcal{P}\right)$.
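The definitions above translate directly into code. The following is a minimal sketch (the helper names `neighbors4` and `region` are ours, not from the paper) of 4-connectivity and of recovering a region $\mathsf{\Omega}_\ell$ from a label image $\mathbf{X}$, using 0-based indexing:

```python
import numpy as np

def neighbors4(i, j, n_r, n_c):
    """4-connected neighbors of pixel (i, j): offsets (+-1, 0) and (0, +-1)."""
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ii, jj = i + di, j + dj
        if 0 <= ii < n_r and 0 <= jj < n_c:
            yield ii, jj

def region(X, label):
    """Pixels of the region Omega_label: all (i, j) with X[i, j] == label."""
    return set(zip(*np.where(X == label)))
```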
2.1.2. Representing the Region’s Contours
The contour, or boundary, of any region is formed of horizontal and vertical crack edges: a horizontal crack edge image ${\mathcal{H}}_{X}\in {\{0,1\}}^{{n}_{r}\times {n}_{c}}$ specifies by ${\mathcal{H}}_{X}(i,j)=1$ that $X(i-1,j)\ne X(i,j)$, and similarly the vertical crack edge image ${\mathcal{V}}_{X}\in {\{0,1\}}^{{n}_{r}\times {n}_{c}}$ specifies by ${\mathcal{V}}_{X}(i,j)=1$ that $X(i,j-1)\ne X(i,j)$. We define the contour matrix for the label matrix $\mathbf{X}$ to be the concatenated matrix ${\mathcal{C}}_{X}=\left[\begin{array}{cc}{\mathcal{H}}_{X}& {\mathcal{V}}_{X}\end{array}\right]$. A crack edge is also called a contour element. A label image $\mathbf{X}$ has a unique associated contour matrix ${\mathcal{C}}_{X}=\left[\begin{array}{cc}{\mathcal{H}}_{X}& {\mathcal{V}}_{X}\end{array}\right]$. Conversely, a contour image $\mathcal{C}$ can be processed by a region labeling routine, $\mathbf{X}=\mathrm{RegLab}\left(\mathcal{C}\right)$, to construct a label image $\mathbf{X}$ where each connected component has a distinct label (see, e.g., the BSD benchmarking software [35]). We note that for any label matrix $\mathbf{X}$ where each region is a connected component, it holds that ${\mathbf{X}}^{\prime}=\mathrm{RegLab}\left({\mathcal{C}}_{X}\right)$ differs from $\mathbf{X}$ only by a permutation of the labels. The set of contour elements set to one in ${\mathcal{C}}_{X}$ that form the outside border of a region $\mathsf{\Omega}$ is denoted $\Gamma \left(\mathsf{\Omega}\right)$.
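The crack-edge images are cheap to compute with array shifts. A minimal sketch (function name `crack_edges` is ours; first row of $\mathcal{H}_X$ and first column of $\mathcal{V}_X$ stay zero since those pixels have no upper/left neighbor):

```python
import numpy as np

def crack_edges(X):
    """Contour matrix C_X = [H_X  V_X] of a label image X.

    H[i, j] = 1 iff X[i-1, j] != X[i, j] (horizontal crack edge),
    V[i, j] = 1 iff X[i, j-1] != X[i, j] (vertical crack edge)."""
    n_r, n_c = X.shape
    H = np.zeros((n_r, n_c), dtype=np.uint8)
    V = np.zeros((n_r, n_c), dtype=np.uint8)
    H[1:, :] = (X[1:, :] != X[:-1, :])
    V[:, 1:] = (X[:, 1:] != X[:, :-1])
    return np.hstack([H, V])  # shape (n_r, 2 * n_c)
```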
2.1.3. Representing a Hierarchical Segmentation
In the literature dealing with hierarchical segmentations, the representation of a sequence of segmentations is given by the ultrametric contour map (UCM), which can be formally defined as a contour matrix $\mathcal{U}\in {\mathbb{R}}^{{n}_{r}\times 2{n}_{c}}$ having real entries, as opposed to the contour matrix $\mathcal{C}$, which has binary elements. It is usual to normalize the real value of the contour element $\mathcal{U}(i,j)$ to the range $[0,1]$ and then consider the value as the probability that a contour element separates two adjacent pixels having different labels. However, we keep the UCM matrix integer-valued, with the elements specifying a persistency level. By thresholding the elements of the UCM matrix $\mathcal{U}$ at a threshold ${\tau}_{\ell}$ one obtains a binary matrix ${\mathcal{C}}_{\ell}$. Using a decreasing sequence of thresholds one obtains a sequence of binary contour images ${\mathcal{C}}_{1},\dots ,{\mathcal{C}}_{L}$, corresponding to nested segmentations ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{L}$ of the image $\mathbf{G}$, which together form a hierarchical segmentation.
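The thresholding step can be sketched in a few lines (function name `nested_contours` is ours). Lower thresholds keep more contour elements, so the resulting contour images, and hence the segmentations, are nested:

```python
import numpy as np

def nested_contours(U, thresholds):
    """Threshold an integer-valued UCM U at a decreasing sequence of
    persistency thresholds tau_1 > tau_2 > ... to obtain the binary
    contour images C_1, ..., C_L of a hierarchical segmentation."""
    return [(U >= tau).astype(np.uint8) for tau in thresholds]
```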
Consider two consecutive nested segmentations ${\mathbf{X}}_{\ell}$ and ${\mathbf{X}}_{\ell +1}$, and two neighboring regions, ${\mathsf{\Omega}}_{{\ell}_{1}}$ and ${\mathsf{\Omega}}_{{\ell}_{2}}$, in ${\mathbf{X}}_{\ell +1}$ that were obtained by splitting a single region ${\mathsf{\Omega}}_{{\ell}_{1,2}}$ in ${\mathbf{X}}_{\ell}$. The split is obtained by setting to one the contour elements from the set $\Delta \Gamma =\Gamma \left({\mathsf{\Omega}}_{{\ell}_{1}}\right)\cap \Gamma \left({\mathsf{\Omega}}_{{\ell}_{2}}\right)$. The cost of the split in terms of bitrate, $\mathcal{L}(\Delta \Gamma )$, can be approximated as proportional to the number of contour elements in the set $\Delta \Gamma $, hence $\mathcal{L}(\Delta \Gamma )=c\,|\Delta \Gamma |$, as is done in most papers using MDL merging–splitting optimization [7,24].
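Representing each border $\Gamma(\mathsf{\Omega})$ as a set of crack-edge coordinates makes the split cost a one-line set intersection. A minimal sketch (the name `split_cost` and the `('H'/'V', i, j)` edge encoding are our illustrative conventions):

```python
def split_cost(border1, border2, c=1.0):
    """Approximate bit cost of splitting a region into two neighbors:
    the new contour elements are those shared by both outside borders,
    Delta_Gamma = Gamma(Omega_1) & Gamma(Omega_2), with cost
    c * |Delta_Gamma|.  Borders are sets of ('H'|'V', i, j) tuples."""
    delta_gamma = set(border1) & set(border2)
    return c * len(delta_gamma)
```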
2.1.4. Polynomial Surface for Approximating the Disparity Map over a Region
We consider the following two-dimensional polynomials: ${P}_{0}(i,j)={\theta}_{0}$, ${P}_{1}(i,j)={\theta}_{0}+{\theta}_{1}i+{\theta}_{2}j$, and denote generically ${P}_{\theta}(i,j)={\phi}_{k}{(i,j)}^{T}\theta $, where the elements of the regression vector
${\phi}_{k}(i,j)$ are monomials in the variables
i and
j. The main model considered in this paper is the reconstruction
$\mathcal{S}(\mathsf{\Theta},\mathcal{P})\in {\mathbb{R}}^{{n}_{r}\times {n}_{c}}$ of the image
$\mathbf{G}$, as a function of a partition
$\mathcal{P}=\{{\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{L}\}$ and a set of polynomial parameter vectors
$\mathsf{\Theta}=\{{\theta}_{1},\dots ,{\theta}_{L}\}$, where the reconstruction surface
$\mathcal{S}$ for a pixel
$(i,j)$ belonging to region
${\mathsf{\Omega}}_{\ell}$ is obtained with the parameters
${\theta}_{\ell}$, as
${\mathcal{S}}_{i,j}={P}_{{\theta}_{\ell}}(i,j)$.
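Fitting the planar model $P_1$ over one region is an ordinary least-squares problem in the regression vectors $\phi(i,j)=(1,i,j)^T$. A minimal sketch (function names are ours, not from the paper):

```python
import numpy as np

def fit_plane(G, pixels):
    """Least-squares fit of P_1(i, j) = theta_0 + theta_1*i + theta_2*j
    to the disparity values G(i, j) over the pixels of one region."""
    A = np.array([[1.0, i, j] for i, j in pixels])   # regression vectors phi(i, j)
    g = np.array([G[i, j] for i, j in pixels])
    theta, *_ = np.linalg.lstsq(A, g, rcond=None)
    return theta                                     # [theta_0, theta_1, theta_2]

def reconstruct(theta, i, j):
    """Reconstruction surface value S(i, j) = P_theta(i, j)."""
    return theta[0] + theta[1] * i + theta[2] * j
```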
Finally, we denote the code length necessary for representing the parameters as
$\mathcal{L}\left(\mathsf{\Theta}\right)={\sum}_{\ell =1}^{L}\mathcal{L}\left({\theta}_{\ell}\right)$, where we assume that the elements of
${\theta}_{\ell}$ are quantized to a finite precision and are encoded by Golomb–Rice coding (hence assuming a geometric distribution of the parameters). The image of contours
${\mathcal{C}}_{X}$ is encoded by the CERV algorithm [
5] and the resulting codelength is denoted
${\mathcal{L}}_{C}\left({\mathcal{C}}_{X}\right)$, or for short
$\mathcal{L}\left(\mathbf{X}\right)$.
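The Golomb–Rice code length for the quantized parameters is easy to evaluate without actually encoding. The sketch below assumes the standard Rice code with parameter $k$ and a zig-zag map from signed to non-negative integers; the paper does not specify these details, so treat them as illustrative:

```python
def rice_code_length(value, k):
    """Bits of the Golomb-Rice code of a non-negative integer: unary
    quotient (value >> k) plus one stop bit, plus k remainder bits."""
    return (value >> k) + 1 + k

def params_code_length(quantized_thetas, k=2):
    """L(Theta) = sum over regions of the per-parameter code lengths.
    Signed quantized parameters are mapped to non-negative integers by
    the zig-zag map 0,-1,1,-2,2 -> 0,1,2,3,4 before Rice coding."""
    total = 0
    for theta in quantized_thetas:
        for v in theta:
            n = 2 * v if v >= 0 else -2 * v - 1  # zig-zag map (assumption)
            total += rice_code_length(n, k)
    return total
```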
The goal of this paper is to start from a given disparity map image $\mathbf{G}$ and to find a sequence of partitions ${\mathcal{P}}_{1},\dots ,{\mathcal{P}}_{N}$ (or equivalently a sequence of label images ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{N}$) and the corresponding polynomial models ${\mathsf{\Theta}}_{1},\dots ,{\mathsf{\Theta}}_{N}$ satisfying two desiderata:
the rate–distortion description $({R}_{n},{D}_{n})$, with ${R}_{n}={\mathcal{L}}_{C}\left({\mathcal{C}}_{n}\right)+\mathcal{L}\left({\mathsf{\Theta}}_{n}\right)$ and ${D}_{n}={\parallel \mathbf{G}-\mathcal{S}\left({\mathsf{\Theta}}_{n},{\mathcal{P}}_{n}\right)\parallel}^{2}$, should be competitive with the rate–distortion of lossy compression algorithms at very low bitrates. The aim is to extract relevant information from $\mathbf{G}$, to encode it efficiently, and to use it for obtaining a reconstruction with a small distortion, as in lossy compression tasks, but with the following additional desideratum on the relevance of the segmentation for the objects in the image.
The sequence of partitions ${\mathcal{P}}_{1},\dots ,{\mathcal{P}}_{N}$ should compare favorably with the hierarchical partitions obtained from the color information of the same scene, having a (recall, precision) diagram competitive with existing state-of-the-art boundary detection or segmentation algorithms for finding general structure in images.
2.1.5. Statement of the Problem
We start by defining the disparity map model, consisting of a partition of the image pixels into regions, and of a polynomial surface inside each region. Then we describe the iterative process for obtaining a partition of the image into regions, with a polynomial surface model for reconstructing the depth inside each region, where the optimality criterion is the overall codelength for encoding (describing) the partition and the polynomial models for all regions, subject to a given allowed distortion over each region.
Given the disparity map image $\mathbf{G}$ we define a partition $\mathcal{P}=\{{\mathsf{\Omega}}_{\ell};\ell =1,\dots ,L\}$ of the image support, ${\mathsf{\Omega}}^{0}$, into L disjoint regions ${\mathsf{\Omega}}_{\ell};\ell =1,\dots ,L$, such that ${\bigcup}_{\ell =1}^{L}{\mathsf{\Omega}}_{\ell}={\mathsf{\Omega}}^{0}$.
The minimum description length criterion consists of the cost $\mathcal{L}\left({\mathcal{P}}_{n}\right)$ of transmitting a segmentation ${\mathcal{P}}_{n}$, evaluated by the implementable codelength obtained by context-based coding of the segmentation [5], and of the cost $\mathcal{L}\left({\mathsf{\Theta}}_{n}\right)$ of encoding the parameters of all polynomial models. The precise cost of encoding any segment of the contour can be extracted during the coding process, and we denote by $\mathcal{L}(\Gamma )$ the codelength of encoding the contour segment $\Gamma $. We denote by $\Gamma \left(\mathsf{\Omega}\right)$ the outer contour of a connected region $\mathsf{\Omega}$.
For any given distortion $D$ one needs to solve the optimization problem
$$\min_{\mathcal{P},\mathsf{\Theta}}\;\mathcal{L}\left(\mathcal{P}\right)+\mathcal{L}\left(\mathsf{\Theta}\right)\quad \text{subject to}\quad {\parallel \mathbf{G}-\mathcal{S}\left(\mathsf{\Theta},\mathcal{P}\right)\parallel}^{2}\le D. \tag{2}$$
2.2. Algorithm for Hierarchical Segmentation Based on Persistency of Contours of the Segmentations Generated by Iterative Piecewise Polynomial Modeling
The two components of the model are the following: (a) the segmentation and (b) the set of polynomial models, one for each region of the segmentation. Estimating the model that directly gives the minimum solution to the optimization problem (2) for a given $D$ is approached by first finding a set of good “optimal” segmentations, and then checking the distortion corresponding to each segmentation, thus building an RD plot of solutions of (2).
The segmentation problem is sometimes seen as the estimation of a latent variable, defined for each pixel, and we introduced the label image notation $\mathbf{X}$ for this latent variable.
A simple attempt at finding a good model (including the segmentation $\mathbf{X}$ and the polynomial models) is in the spirit of the K-means iterative algorithm, rephrased as a K-models algorithm: fix a desired number of regions ${N}_{reg}$ in the segmentation and initialize a partition of the image into ${N}_{reg}$ regions. In a first stage, fit the best polynomial model over each region, and in a second stage re-partition the image into ${N}_{reg}$ regions, so that each pixel $(i,j)$ is associated with the model that gives the smallest reconstruction error of $G(i,j)$. A true K-means or K-models algorithm would iterate the two stages until convergence, if that ever occurs. However, we take a different route: we operate with a large ${N}_{reg}$, in a heavily “oversegmented” regime, with ${N}_{reg}$ several times larger than the final intended maximum number of regions, and we are not interested in iterating until stabilization of the ${N}_{reg}$ regions. Instead, we are interested in exploring as much variability in the region boundaries as possible. Since the re-estimation makes the regions change their boundaries, we track during the process, for each contour element, say $\mathcal{H}(i,j)$, the number of times it was part of a region boundary. We call $counts\left(\mathcal{H}\right(i,j\left)\right)$ the persistency degree of the contour element, and we build our segmentations by considering the contour elements progressively, in decreasing order of their degree of persistency.
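One round of the plain K-models iteration described above can be sketched as follows. This is a simplified illustration (per-pixel reassignment, no connectedness constraint, and generic `fit`/`predict` callables supplied by the caller), not the paper's full Algorithm 1:

```python
import numpy as np

def k_models_step(G, X, fit, predict):
    """One K-models round: fit a model per region of the label image X,
    then reassign every pixel to the model with the smallest
    reconstruction error of G(i, j)."""
    labels = list(np.unique(X))
    # Stage 1: fit the best model over each current region.
    models = {l: fit(G, list(zip(*np.where(X == l)))) for l in labels}
    # Stage 2: re-partition the image pixel by pixel.
    X_new = np.empty_like(X)
    n_r, n_c = G.shape
    for i in range(n_r):
        for j in range(n_c):
            errs = [(abs(predict(models[l], i, j) - G[i, j]), l) for l in labels]
            X_new[i, j] = min(errs)[1]
    return X_new, models
```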
There are a few problems with the simple K-models approach; we discuss them below and, at the same time, introduce the algorithmic steps that correct them.
We introduce several regularization options to this algorithm, resulting in Algorithm 1. Even with the introduced changes, we notice that the iterative re-estimation has a high variability of the region contours decided at consecutive iterations. In Figure 2, we show in panels 1 and 2 of the middle row one detail of the image Adirondack. Panels 1 to 3 in the top row show consecutive segmentations obtained during re-estimation. Since there are very many models initially (one for each $(11\times 11)$ patch), on the long board which is the armrest of the chair there are several patches with similar, almost planar models, which compete with each other during the re-estimation, and one sees the high variability of the contours of these models within the armrest in panels 1 to 3. However, the outline of the armrest remains a clear part of the region boundaries in all iterations. The main feature of our algorithm is to let many fitting polynomial surfaces compete during the re-estimation iterations, resulting in many contour pixels that change from one iteration to the next, but also in contour elements that remain “persistent” from one iteration to the next. We keep track of the persistency of all contour elements in the image, and after a number of iterations (${n}_{iter}=40$ in all experiments) we check the persistency of each contour element and use the most persistent elements to obtain contours that are true outlines of distinct objects or object parts. To show the final result of both Algorithms 1 and 2, we show in Figure 2, middle row, panel 3, that after carefully selecting the contour elements in Algorithm 2, using a rate–distortion marking of the regions, we obtain a segmentation very relevant for the object parts (the presented segmentation is obtained in Algorithm 2 after including persistent contours, resulting in 243 regions in the segmentation image). One can see that the collected persistent contours were successful in providing meaningful image features, resulting in a convincing segmentation.
Algorithm 1 Hierarchical segmentations based on persistency of contours generated by iterative piecewise polynomial modeling

Input: The input disparity map $\mathbf{G}$.

Stage 1. Find persistent contours in the image $\mathbf{G}$: iterate finding the best fitting models for the current image partition, and then finding the best image partition for the current set of polynomial models. At each iteration mark the boundaries of the partition’s regions and add the binary edge matrix to the overall contour persistency matrix.

1.0 Initialize the partition ${\mathcal{P}}_{0}$ as being formed of $\lceil \frac{{n}_{r}}{{L}_{s}}\rceil \times \lceil \frac{{n}_{c}}{{L}_{s}}\rceil $ disjoint square regions $({L}_{s}\times {L}_{s})$. The corresponding label image is denoted ${\mathbf{X}}_{0}$. The overall contour persistency matrix is set to $\mathcal{U}=\mathbf{0}\in {\mathbb{R}}^{{n}_{r}\times 2{n}_{c}}$.

1.1 For $n=1,\dots ,{n}_{iter}$ // Iterate a re-estimation algorithm ${n}_{iter}$ times

1.1.1 // Re-estimation iteration for finding a new set of models ${\Theta}^{\prime}=\{{\theta}_{\ell},\;\ell =1,\dots ,{n}_{cc}^{\prime}\}$ and their number ${n}_{cc}^{\prime}$

1.1.1.1 Decompose the image ${\mathbf{X}}_{n-1}$ into connected components, denote ${n}_{cc}$ their number, and denote ${\mathcal{P}}_{n-1}$ the partition into the regions ${\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{{n}_{cc}}$, so that ${\mathsf{\Omega}}_{\ell}=\{(i,j)\mid {X}_{n-1}(i,j)=\ell \}$.

1.1.1.2 For each region ${\mathsf{\Omega}}_{\ell}\in {\mathcal{P}}_{n-1}$

1.1.1.2.1 If the cardinality of ${\mathsf{\Omega}}_{\ell}$ is larger than ${N}_{S}$, estimate the parameters ${\theta}_{\ell}$ of the polynomial surface model by minimizing ${\sum}_{(i,j)\in {\mathsf{\Omega}}_{\ell}}{(G(i,j)-{P}_{{\theta}_{\ell}}(i,j))}^{2}$. Otherwise set the model ${\theta}_{\ell}$ to the empty set.

1.1.1.3 Denote ${n}_{cc}^{0}$ the number of non-empty models estimated in the previous step. Process the set of parameter vectors $\{{\theta}_{\ell},\;\ell =1,\dots ,{n}_{cc}\}$ to select a subset ${\Theta}^{\prime}=\{{\theta}_{\ell}^{\prime},\;\ell =1,\dots ,{n}_{cc}^{\prime}\}$ of ${n}_{cc}^{\prime}\le {N}_{cc}$ models. If ${n}_{cc}^{0}>{N}_{cc}$, group similar models together to obtain ${n}_{cc}^{\prime}\le {N}_{cc}$ models.

1.1.2 // Re-estimation iteration for finding a new partition ${\mathcal{P}}_{n}$

1.1.2.1 Initialization for the new ${\mathcal{P}}_{n}$: number of regions ${r}_{n}=0$; reconstruction image ${\mathbf{R}}_{n}=\mathbf{0}$ and label image ${\mathbf{X}}_{n}=\mathbf{0}$, with $\mathbf{0}$ the all-zeros $({n}_{r}\times {n}_{c})$ matrix.

1.1.2.2 // Use the competition of the models of ${\Theta}^{\prime}$ for defining the new partition
For $\ell =1,\dots ,{n}_{cc}^{\prime}$

1.1.2.2.1 Consider the model ${\theta}_{\ell}^{\prime}$.

1.1.2.2.2 Initialize the winning binary image $\mathbf{B}=\mathbf{0}\in {\{0,1\}}^{{n}_{r}\times {n}_{c}}$, then go over all pixels $(i,j)\in {\mathsf{\Omega}}^{0}$ and set $B(i,j)=1$ if $|{P}_{{\theta}_{\ell}^{\prime}}(i,j)-G(i,j)|<|{R}_{n}(i,j)-G(i,j)|$. Find all connected components of $\mathbf{B}$ and denote ${\mathsf{\Omega}}^{*}$ the largest of them.

1.1.2.2.3 If the cardinality of ${\mathsf{\Omega}}^{*}$ is larger than a given size, ${N}_{S}$, then a new region is declared, $r\leftarrow r+1$ and ${\mathsf{\Omega}}_{r}={\mathsf{\Omega}}^{*}$.

1.1.2.2.4 Include the new region ${\mathsf{\Omega}}_{r}$ in the partition ${\mathcal{P}}_{n}$, by updating the label image ${X}_{n}(i,j)=r$ and the corresponding reconstruction ${R}_{n}(i,j)={P}_{{\theta}_{\ell}^{\prime}}(i,j)$, for each $(i,j)\in {\mathsf{\Omega}}_{r}$.

1.1.2.3 Construct the contour image ${\mathcal{C}}_{n}$ for ${\mathbf{X}}_{n}$ and add it to the overall UCM matrix, $\mathcal{U}\leftarrow \mathcal{U}+{\mathcal{C}}_{n}$.

Stage 2. Construct a hierarchical segmentation from the contour persistency matrix $\mathcal{U}$, filtering out small regions; accumulate the result into ${\mathcal{U}}^{\prime}$.

2.1 For ${i}_{p}={n}_{iter},{n}_{iter}-1,\dots ,{n}_{min}$ (iterate the persistency level from highest to lowest)

2.1.1 Construct a current contour image $\mathcal{C}$ having $\mathcal{C}(i,j)=1$ if $\mathcal{U}(i,j)\ge {i}_{p}$.

2.1.2 Find the label image $\mathbf{X}$ corresponding to $\mathcal{C}$; if ${i}_{p}={n}_{iter}$, set ${\mathbf{X}}_{{i}_{p}}=\mathbf{X}$ and continue to ${i}_{p}={n}_{iter}-1$.

2.1.3 Find all connected components of the label image $\mathbf{X}$.

2.1.4 Initialize ${\mathbf{X}}_{{i}_{p}}={\mathbf{X}}_{{i}_{p}+1}$ (the labels of the previous partition).

2.1.5 For all connected components of the label image $\mathbf{X}$ larger than ${N}_{2}$

2.1.5.1 If the connected component ${\mathsf{\Omega}}_{\ell}$ has holes, fill each hole that is smaller than ${N}_{H}$ pixels and then copy the filled ${\mathsf{\Omega}}_{\ell}$ to ${\mathbf{X}}_{{i}_{p}}$ as a new region.

2.1.6 Construct the contour map matrix ${\mathcal{C}}_{{i}_{p}}$ corresponding to the label image ${\mathbf{X}}_{{i}_{p}}$ and update the UCM matrix, ${\mathcal{U}}^{\prime}\leftarrow {\mathcal{U}}^{\prime}+{\mathcal{C}}_{{i}_{p}}$.

2.2 Rename the sequence ${\mathbf{X}}_{{n}_{iter}},\dots ,{\mathbf{X}}_{{n}_{min}}$ as ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{N}$.

Output: The ultrametric contour map matrix ${\mathcal{U}}^{\prime}$ and the sequence of segmentations ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{N}$, dubbed Hierarchical segmentations A.

Algorithm 2 Hierarchical partition based on (description length – distortion) optimization

Input: The sequence of segmentations ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{N}$ from Algorithm 1.

Stage 1. Extract a catalog $\mathcal{O}$ of large regions (possible objects) from ${\mathbf{X}}_{1},\dots ,{\mathbf{X}}_{N}$. Each entry ${\mathcal{O}}_{p}$ in the catalog corresponds to a large connected component region, ${\mathsf{\Omega}}_{p}$, and is stored as a set of pixels ${\mathcal{O}}_{p}^{S}={\mathsf{\Omega}}_{p}$.

1.1 For $r=1,\dots ,N$ (iterate from the coarsest segmentation ${\mathbf{X}}_{1}$ to the finest segmentation ${\mathbf{X}}_{N}$)

1.1.1 Find ${\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{m}$, all connected components of the label image ${\mathbf{X}}_{r}$.

1.1.2 For $q=1,\dots ,m$ (go over ${\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{m}$)

1.1.2.1 If the size of ${\mathsf{\Omega}}_{q}$ is smaller than 0.95 of the size of the parent region in ${\mathbf{X}}_{r-1}$, but the cardinality of ${\mathsf{\Omega}}_{q}$ is larger than ${N}_{S}$, then the connected component is included in the catalog as a new region: $p\leftarrow p+1$, stored as ${\mathcal{O}}_{p}^{S}={\mathsf{\Omega}}_{q}$.

1.1.2.2 Estimate the polynomial model parameters ${\theta}_{p}$.

Stage 2. Construct a new sequence of segmentations ${\mathbf{X}}_{1}^{\prime},\dots ,{\mathbf{X}}_{N}^{\prime}$ based on the (description length – distortion) optimization.

2.1 Initialize the current reconstruction image $\mathbf{R}=\mathbf{0}$ and the current label image ${\mathbf{X}}_{0}^{\prime}=\mathbf{0}$.

2.2 For $n=1,\dots ,N$ (add to ${\mathbf{X}}_{n-1}^{\prime}$ a new region to form ${\mathbf{X}}_{n}^{\prime}$)

2.2.1 For $p=1,\dots ,P$ (for all large regions from the catalog $\mathcal{O}$ that were not yet chosen)

2.2.1.1 Evaluate the candidate region ${\mathcal{O}}_{p}$: find all regions ${\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{m}$ from ${\mathbf{X}}_{n-1}^{\prime}$ overlapped (partially) by ${\mathcal{O}}_{p}$.

2.2.1.2 For $q=1,\dots ,m$ (go over ${\mathsf{\Omega}}_{1},\dots ,{\mathsf{\Omega}}_{m}$)

2.2.1.2.1 If the MSE of the current reconstruction $\mathbf{R}$ over ${\mathsf{\Omega}}_{q}$ is better than the MSE over ${\mathsf{\Omega}}_{q}$ of the surface generated by ${\theta}_{p}$, then carve out the set ${\mathsf{\Omega}}_{q}$ from the candidate object: ${\mathcal{O}}_{p}^{S}\leftarrow {\mathcal{O}}_{p}^{S}\setminus {\mathsf{\Omega}}_{q}$.

2.2.1.3 If the remaining size of the region ${\mathcal{O}}_{p}$ is larger than ${N}_{S}$, denote $r$ the largest label of ${\mathbf{X}}_{n}^{\prime}$ and set in ${\mathbf{X}}_{n}^{\prime}$ the pixels from ${\mathcal{O}}_{p}^{S}$ as a new region with label $r+1$.

2.2.1.4 Fit a new polynomial model ${\theta}_{p}$ over ${\mathcal{O}}_{p}^{S}$, and find $\Delta MSE$, the improvement in the MSE over ${\mathcal{O}}_{p}^{S}$ of the new polynomial surface, compared to the current reconstruction $\mathbf{R}$.

2.2.1.5 Find the description codelength for specifying the better reconstruction, i.e., the description length of the polynomial $\mathcal{L}\left({\theta}_{p}\right)$ and the description length $\mathcal{L}(\Gamma )$ of the additional contour for specifying ${\mathcal{O}}_{p}^{S}$. Construct the ratio ${\lambda}_{p}=\frac{\Delta MSE}{\mathcal{L}\left({\theta}_{p}\right)+\mathcal{L}(\Gamma )}$.

2.2.2 Pick from all candidate regions the one with the highest ${\lambda}_{p}$ and call the winning candidate index ${p}^{*}$.

2.3 If ${\lambda}_{{p}^{*}}$ is smaller than a threshold ${\lambda}_{0}$, stop adding regions and exit; else add the new region, by modifying ${\mathbf{R}}_{n}$ and ${\mathbf{X}}_{n}$ accounting for ${\mathcal{O}}_{{p}^{*}}^{S}$.

Output: The sequence of segmentations ${\mathbf{X}}_{1}^{\prime},\dots ,{\mathbf{X}}_{N}^{\prime}$, dubbed Hierarchical segmentation B.

Now we present the main particularities of running Algorithm 1 followed by Algorithm 2, as cures to the problems of K-means clustering. First, we do not know a priori a suitable number of regions corresponding to a given distortion level $D$. For that reason we let the number of regions ${N}_{reg}={n}_{cc}$ change during the re-estimation iterations, and ${N}_{reg}$ is decided implicitly by the selection decisions at each step. We initialize the algorithm with a partition into squares for simplicity, with a square side of 11 pixels. This is similar to the initialization of segmentation algorithms based on superpixels. The initial ${N}_{reg}$ is in the order of thousands, resulting in a heavy oversegmentation of the image.
A major problem of the partition re-estimation is that when assigning each pixel $(i,j)$ to the model that achieves the smallest reconstruction error of $G(i,j)$, there might be very many good models that also represent well the other pixels within a neighborhood of $(i,j)$, so that the neighborhood of $(i,j)$ may contain many different labels of winning models. A given model might thus win many patches distributed over the image, with each patch having many holes due to the many similar competitors.
To tackle this problem we adopt several changes to the simple K-models structure of the algorithm. We enforce that during the $n$th re-estimation of the partition, a given model has associated only one connected component (the largest one) out of all connected components where the model was winning over the best current reconstruction. We go over the models in such an order that we first treat the models having a smaller winning patch, and we sequentially mark the winning patches in a label image ${\mathbf{X}}_{n}$, overwriting the labels created by earlier patches. At the end of this marking process the label image ${\mathbf{X}}_{n}$ remains with the labels of the models having large winning patches. This process is described in Algorithm 1 at Step 1.1.2.2, Use the competition of the models of ${\mathsf{\Theta}}^{\prime}$ for defining the new partition. The label image ${\mathbf{X}}_{n}$ can remain with many undecided pixels, since we restricted the marking of the winning patches to (large) connected components only. All pixels with label 0 will be considered again in the decomposition of ${\mathbf{X}}_{n}$ into connected components at the next run of Step 1.1.1.1, and hence the number of models considered again in Step 1.1.1 may grow larger than ${N}_{cc}$.
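Keeping only the largest connected component of a model's winning binary image $\mathbf{B}$ can be sketched with a breadth-first search over 4-connected pixels (function name `largest_component` is ours; a labeling routine from an image library would do the same job):

```python
import numpy as np
from collections import deque

def largest_component(B):
    """Largest 4-connected component of a binary winning image B,
    returned as a boolean mask (the single patch a model may keep)."""
    n_r, n_c = B.shape
    seen = np.zeros((n_r, n_c), dtype=bool)
    best = np.zeros((n_r, n_c), dtype=bool)
    best_size = 0
    for si in range(n_r):
        for sj in range(n_c):
            if B[si, sj] and not seen[si, sj]:
                comp, queue = [], deque([(si, sj)])
                seen[si, sj] = True
                while queue:                      # BFS over this component
                    i, j = queue.popleft()
                    comp.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < n_r and 0 <= jj < n_c \
                                and B[ii, jj] and not seen[ii, jj]:
                            seen[ii, jj] = True
                            queue.append((ii, jj))
                if len(comp) > best_size:
                    best_size = len(comp)
                    best[:] = False
                    for p in comp:
                        best[p] = True
    return best
```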
The number of models ${n}_{cc}^{0}$ that are re-estimated based on the new partition might be too large, exceeding our desired level ${N}_{cc}$. We use a very simple reduction of their number, by grouping the “similar” models together in the following way: we quantize each model with decreasing precision, $Q\left({\theta}_{\ell ,r}\right)=\lfloor {\theta}_{\ell ,r}{2}^{{n}_{b}}\rceil $ for ${n}_{b}=10,9,\dots ,-10$, and for each ${n}_{b}$ we check how many quantized models are distinct in the sequence of parameter vectors $Q\left({\theta}_{1}\right),\dots ,Q\left({\theta}_{{n}_{cc}}\right)$, picking as ${n}_{b}$ the first value for which the number of distinct models remains below ${N}_{cc}$. This is the process described at Step 1.1.1.3.
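The quantization-based grouping can be sketched as follows (function name `group_models` is ours; the rounding $\lfloor \cdot \rceil$ is implemented with `np.round`):

```python
import numpy as np

def group_models(thetas, N_cc):
    """Reduce the number of models by quantizing with decreasing
    precision Q(theta) = round(theta * 2**n_b) for n_b = 10, 9, ..., -10,
    keeping the first n_b for which the number of distinct quantized
    parameter vectors drops to N_cc or fewer."""
    for n_b in range(10, -11, -1):
        distinct = {tuple(np.round(np.asarray(t) * 2.0 ** n_b).astype(int))
                    for t in thetas}
        if len(distinct) <= N_cc:
            return n_b, distinct
    return -10, distinct
```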
At each iteration of the re-estimation process we take the contour elements set to 1 in ${\mathcal{C}}_{{\mathbf{X}}_{n}}$ and increment the contour persistency matrix $\mathcal{U}$ at the corresponding locations. The contour persistency matrix is an $({n}_{r}\times 2{n}_{c})$ matrix, where the first half block $\mathcal{U}(1:{n}_{r},1:{n}_{c})$ counts how often the labels $X(i-1,j)$ and $X(i,j)$ are different (horizontal crack edges) and the second half $\mathcal{U}(1:{n}_{r},({n}_{c}+1):2{n}_{c})$ counts how often the labels $X(i,j-1)$ and $X(i,j)$ are different (vertical crack edges).
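The per-iteration update of the persistency matrix is a single vectorized accumulation. A minimal sketch (function name `update_persistency` is ours; 0-based indexing, with the left half of $U$ counting horizontal crack edges and the right half vertical ones):

```python
import numpy as np

def update_persistency(U, X):
    """Add the binary contour image of the current labels X to the
    persistency matrix U of shape (n_r, 2 * n_c): left half counts
    horizontal crack edges X[i-1, j] != X[i, j], right half counts
    vertical crack edges X[i, j-1] != X[i, j]."""
    n_r, n_c = X.shape
    U[1:, :n_c] += (X[1:, :] != X[:-1, :])
    U[:, n_c + 1:] += (X[:, 1:] != X[:, :-1])
    return U
```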
When the re-estimation iterations of Stage 1 are finished, we pass to Stage 2, to analyze the persistency levels marked in the matrix $\mathcal{U}$, whose maximum possible value is ${n}_{iter}$. At each persistency level ${i}_{p}$ we create the contour matrix and then find the associated label matrix ${\mathbf{X}}_{{i}_{p}}$. We want to avoid too-small regions in ${\mathbf{X}}_{{i}_{p}}$, and for that we decompose the image into connected components, fill for each of them the holes that are smaller than a fixed ${N}_{H}$ (we have used ${N}_{H}=50$), and use the filled connected components as new labels in ${\mathbf{X}}_{{i}_{p}}$. The detailed description is presented in the panel of Algorithm 1.
To illustrate the re-estimation process, we show in Figure 2, bottom panel, the evolution of some of the meaningful variables in Algorithm 1, Stage 1: the number of connected components found in Step 1.1.1.1, ${n}_{cc}$, is marked #Connect. Comp.; the number of models estimated at the large connected components, ${n}_{cc}^{0}$, is marked #Large Connect. Comp.; the number ${n}_{cc}^{\prime}$ of models forced to be smaller than ${N}_{cc}$ is marked #Kept Models; the number of pixels remaining unclassified (unlabeled) after Step 1.1.2.2 is marked #Unclassified Pixels; finally, the number of contour elements (crack edges) in $\mathcal{H}$ and $\mathcal{V}$ that are set to one at Step 1.1.2.3 is marked #Marked Crack Edges. It is seen that the variables in the re-estimation algorithm change at each iteration, inducing variability in the segmentations obtained at each iteration, which is our main goal in the iteration process.